“Portable” here means: the entity and worker can run anywhere, while weights and Ollama stay where you want them — usually your own GPU — without handing prompts to a third-party API.Documentation Index
Fetch the complete documentation index at: https://docs.bumbleagi.com/llms.txt
Use this file to discover all available pages before exploring further.
Layers
- Ollama (or any backend the gateway forwards to) runs the models.
- Inference gateway exposes a small OpenAI-compatible HTTP API: health, models list, chat completions, embeddings. Everything is bearer-authenticated.
- Tunnel or edge (e.g. Cloudflare Tunnel) exposes only that HTTP port to the internet — not your whole LAN.
- Worker or laptop sets
BUMBLEBEE_INFERENCE_PROVIDER=remote_gatewayand pointsBUMBLEBEE_INFERENCE_BASE_URLat the tunnel URL.
Why a dedicated gateway
The gateway is intentionally narrow:- No shell, filesystem, entity tools, or admin UI on that port.
- Tunneled origin should terminate at the gateway, not at a catch-all reverse proxy that also exposes SSH or NAS UIs.
bumblebee gateway helpers.
Swap the middle mile
As long as the worker sees a stable HTTPS URL and passes the same bearer token, you can replace pieces of the chain:- Different tunnel (Tailscale Funnel, frp, WireGuard + nginx, corporate egress) — still forward to
127.0.0.1:<gateway_port>. - Different edge auth (Cloudflare Access, mTLS in front of the gateway) — ensure the client still reaches an OpenAI-compatible
/v1/chat/completionsand/v1/embeddingswith a token the gateway accepts (or terminate auth at the edge and forward with a static internal bearer).
BUMBLEBEE_INFERENCE_BASE_URL resolves and the token matches.
Worker agents and hybrid deploy
On Railway, the worker agent (bumblebee worker) holds Telegram/Discord sessions, Postgres memory, and the daemon. It does not need a GPU if remote_gateway is configured: every reflex/deliberate/embed call crosses the tunnel to your gateway → Ollama.
That pattern makes the social and memory footprint portable while keeping inference sovereignty on a machine you control.
Step-by-step: Hybrid Railway.
Local vs remote in one codebase
| Setting | Effect |
|---|---|
deployment.mode: local (default) | inference.provider → local Ollama URL unless overridden. |
hybrid_railway / remote_gateway | HTTP client to gateway; same entity code paths. |
openrouter / venice | Optional hosted OpenAI-compatible API (Bearer key) for harness / product testing with frontier models—still the same entity code paths. Not a fork; local-first defaults unchanged. See Hosted inference (testing). |
bumblebee gateway setup.
Mental model
Treat inference as a replaceable endpoint (local socket vs tunneled HTTPS) and treat entity state as durable data (SQLite file vsDATABASE_URL). Bumblebee keeps cognition and memory logic the same; you choose where each layer runs.