The hybrid deployment splits the stack: inference stays on your home GPU (Ollama + gateway + Cloudflare Tunnel), while the entity worker runs on Railway with Postgres for durable memory.

Architecture

The hybrid stack splits across two hosts:
| Home PC | Railway |
| --- | --- |
| Ollama (Gemma 4) | Bumblebee worker |
| Inference Gateway | Telegram / Discord |
| Cloudflare Tunnel | Postgres (memory) |
| | Volume (/app/data) |
The Railway worker sends inference requests to your home GPU through the Cloudflare Tunnel. Everything else — platforms, memory, tools — runs in the cloud container.
Optional: For product-style harness testing with hosted frontier models (OpenRouter, Venice AI) instead of the home gateway, set BUMBLEBEE_INFERENCE_PROVIDER to openrouter or venice and set the matching API key on the worker. The codebase and license are the same either way; see Hosted inference (testing).

Setup wizard

bumblebee setup --profile hybrid
The wizard walks through .env, the gateway token, optional automated Cloudflare Tunnel and DNS setup, an optional home stack start, health checks, entity selection, and Railway variables, volume, and deploy. For a structured walkthrough of every step (readiness checks, tunnel automation, health probes, Railway flags, troubleshooting), see Setup & onboarding.
To set up only the gateway piece:
bumblebee gateway setup

Manual setup

Home side

  1. Set the gateway token in .env:
INFERENCE_GATEWAY_TOKEN=your_secret_here
  2. Configure and start the Cloudflare Tunnel pointing at your gateway (default 127.0.0.1:8010).
  3. Start the home stack:
bumblebee gateway on
On Windows this runs scripts/gateway.ps1; on macOS and Linux it runs scripts/gateway.sh. See Gateway for dependencies (curl, cloudflared, Ollama, correct Python).
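Before pointing the tunnel at the gateway, it can help to confirm that something is actually listening on the default address. The following is a minimal sketch, not part of Bumblebee: `gateway_is_listening` is a hypothetical helper that only checks that a TCP connection to 127.0.0.1:8010 succeeds. It does not verify the gateway token or any HTTP endpoint.

```python
import socket


def gateway_is_listening(host: str = "127.0.0.1", port: int = 8010,
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: it only proves that a process accepts
    connections on the port, not that the process is the gateway.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `gateway_is_listening()` after `bumblebee gateway on` should return True once the home stack is up. A False result suggests the gateway is not running or is bound to a different port.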

Railway side

  1. Link the repo:
railway link
  2. Set environment variables on the worker service.
Default hybrid (home GPU + tunnel):
BUMBLEBEE_DEPLOYMENT_MODE=hybrid_railway
BUMBLEBEE_INFERENCE_PROVIDER=remote_gateway
BUMBLEBEE_INFERENCE_BASE_URL=https://your-tunnel.example.com
BUMBLEBEE_INFERENCE_GATEWAY_TOKEN=your_secret_here
BUMBLEBEE_ENTITY=canary
DATABASE_URL=postgresql://...
TELEGRAM_TOKEN=your_telegram_token
Optional hosted brain for evaluation (no tunnel; set model IDs to provider slugs): for example, set BUMBLEBEE_INFERENCE_PROVIDER=openrouter, OPENROUTER_API_KEY=..., and BUMBLEBEE_INFERENCE_PASS_NUM_CTX=false. For full detail, see Hosted inference (testing).
  3. Deploy:
npm run deploy:canary
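A missing variable is an easy mistake to make before deploying, so a small preflight check can be useful. The sketch below is illustrative and not part of Bumblebee. The variable names come from the default hybrid list above, but treating these six as required (and TELEGRAM_TOKEN as optional, since Discord is also supported) is an assumption, and `missing_hybrid_vars` is a hypothetical helper.

```python
import os
from typing import Mapping

# Names taken from the default hybrid configuration above.
# Assumption: these six are required; platform tokens such as
# TELEGRAM_TOKEN depend on which platform you enable.
REQUIRED_HYBRID_VARS = (
    "BUMBLEBEE_DEPLOYMENT_MODE",
    "BUMBLEBEE_INFERENCE_PROVIDER",
    "BUMBLEBEE_INFERENCE_BASE_URL",
    "BUMBLEBEE_INFERENCE_GATEWAY_TOKEN",
    "BUMBLEBEE_ENTITY",
    "DATABASE_URL",
)


def missing_hybrid_vars(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_HYBRID_VARS
            if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_hybrid_vars(os.environ)
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        print("All default hybrid variables are set.")
```

The function takes any mapping, so it can be run against `os.environ` locally or against a dictionary parsed from an exported variable list.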

Execution

Shell, filesystem, and code tools execute in the Railway container by default (when RAILWAY_ENVIRONMENT is set).
| Scenario | Where tools run |
| --- | --- |
| On Railway, no RPC URL | In the container |
| On Railway, with RPC URL | RPC host (falls back to container if unreachable) |
| On your laptop, hybrid mode | Blocked unless tools.execution.allow_local: true |
To hard-block local execution:
BUMBLEBEE_EXECUTION_REQUIRE_RAILWAY=true
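The routing rules above can be read as a small decision function. The sketch below models the scenarios as described. It is not Bumblebee's actual implementation, and `resolve_tool_target` and its parameter names are hypothetical. It also assumes that BUMBLEBEE_EXECUTION_REQUIRE_RAILWAY=true takes precedence over tools.execution.allow_local, which is one reading of "hard-block".

```python
from typing import Optional


def resolve_tool_target(on_railway: bool,
                        rpc_url: Optional[str] = None,
                        rpc_reachable: bool = True,
                        allow_local: bool = False,
                        require_railway: bool = False) -> str:
    """Return "container", "rpc", "local", or "blocked".

    on_railway stands in for RAILWAY_ENVIRONMENT being set.
    """
    if on_railway:
        # With an RPC URL, tools run on the RPC host; an unreachable
        # host falls back to the container.
        if rpc_url and rpc_reachable:
            return "rpc"
        return "container"
    # Off Railway (for example, a laptop in hybrid mode).
    if require_railway:
        # Assumption: the hard block overrides allow_local.
        return "blocked"
    return "local" if allow_local else "blocked"
```

For example, `resolve_tool_target(on_railway=False)` returns "blocked", which matches the laptop scenario with allow_local left at its default.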

Docker

The Dockerfile copies canary.example.yaml to canary.yaml in the image, so BUMBLEBEE_ENTITY=canary works without committing local YAML. The railway.json start command auto-selects worker or API role based on BUMBLEBEE_RAILWAY_ROLE.

Persistent Python environment on the volume

The worker and API processes do not rely only on packages baked into the image layer. At startup, docker/entrypoint-railway.sh:
  1. Uses BUMBLEBEE_EXECUTION_WORKSPACE_DIR (default /app/data if unset) as the mount where the “canonical machine” lives.
  2. Creates $WORKSPACE/.venv and installs /app[railway,api,full] into that venv when pyproject.toml changes (SHA stamp in .venv/.pyproject_sha).
  3. Sets HOME to $WORKSPACE/.home, PIP_CACHE_DIR, XDG_CACHE_HOME, and PLAYWRIGHT_BROWSERS_PATH under the volume so optional extras and browser binaries survive redeploys as long as the volume is attached.
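The reinstall-on-change behavior in step 2 amounts to comparing a stored digest of pyproject.toml with the current one. The real logic lives in the shell script docker/entrypoint-railway.sh. The Python sketch below only illustrates the idea, and the use of SHA-256 and the helper names are assumptions.

```python
import hashlib
from pathlib import Path


def pyproject_digest(pyproject: Path) -> str:
    """Hex digest of the file contents (SHA-256 is an assumption)."""
    return hashlib.sha256(pyproject.read_bytes()).hexdigest()


def needs_reinstall(pyproject: Path, stamp: Path) -> bool:
    """True when no stamp exists or the stamp differs from the current digest."""
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != pyproject_digest(pyproject)


def record_install(pyproject: Path, stamp: Path) -> None:
    """Write the current digest to the stamp file after a successful install."""
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(pyproject_digest(pyproject) + "\n")
```

Because the stamp sits inside the venv on the volume, a redeploy with an unchanged pyproject.toml can skip the install, while any edit to the file changes the digest and triggers a fresh install.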
Set BUMBLEBEE_EXECUTION_WORKSPACE_DIR=/app/data on the service (see Setup wizard). The image still contains a bootstrap install for debugging; set BUMBLEBEE_SKIP_VOLUME_VENV=1 to run with the image Python only. As a result, the install-time extras in pyproject.toml (full covers voice, PDF, YouTube, Playwright, and fal, plus railway and api) live in the volume-backed venv and persist across redeploys, rather than existing only in the ephemeral container root.