The inference gateway is a lightweight FastAPI proxy that sits between Cloudflare Tunnel and your local Ollama instance. It handles bearer auth, request forwarding, and timeout management.Documentation Index
Fetch the complete documentation index at: https://docs.bumbleagi.com/llms.txt
Use this file to discover all available pages before exploring further.
Optional hosted APIs (OpenRouter, Venice) for evaluation use the same OpenAI-compatible client path but do not use this gateway—they call the vendor’s
/v1 directly. That is documented under Hosted inference (testing) and does not change the local-first default.Setup
.env setup. When the platform helper script is present (see below), the wizard can start the stack immediately after.
Install
The gateway requires thegateway extra:
Controls
--leave-ollama-running to off or restart if you only want to bounce the tunnel and gateway.
bumblebee gateway on | off | status | restart invoke a platform helper script from the repo: scripts/gateway.ps1 on Windows, or scripts/gateway.sh on macOS and Linux (bash). gateway setup works on any OS for tunnel/token/.env configuration even if you start processes manually.Configuration
Environment variables (set in.env or system env):
| Variable | Default | Purpose |
|---|---|---|
INFERENCE_GATEWAY_TOKEN | — | Bearer token for auth (must match worker’s BUMBLEBEE_INFERENCE_GATEWAY_TOKEN) |
OLLAMA_BASE_URL | http://127.0.0.1:11434 | Upstream Ollama URL |
INFERENCE_GATEWAY_HOST | 127.0.0.1 | Gateway listen address |
INFERENCE_GATEWAY_PORT | 8010 | Gateway listen port |
INFERENCE_GATEWAY_BACKEND_TIMEOUT | 120 | Timeout for Ollama requests (seconds) |
INFERENCE_GATEWAY_MAX_BODY_BYTES | 8000000 | Max request body size |
Security
- Bind to
127.0.0.1only — the tunnel handles external access - Terminate the tunnel at the gateway, not a broad reverse proxy
- Use a strong, random bearer token
- The token lookup order:
INFERENCE_GATEWAY_TOKEN→BUMBLEBEE_INFERENCE_GATEWAY_TOKEN→.envkeys
Helper scripts and OS notes
The home stack needs Ollama, cloudflared,curl, and a Python interpreter where python -m bumblebee.inference_gateway resolves (e.g. your uv/venv environment). If the wrong interpreter is picked up, set BUMBLEBEE_PYTHON to the full path to python3.
On macOS, install Ollama and cloudflared (e.g. Homebrew — cloudflared is often under /opt/homebrew/bin or /usr/local/bin). The CLI and scripts look there when cloudflared is not on PATH.
Windows
Batch shortcuts (optional):macOS and Linux
The same operations use bash (thebumblebee gateway CLI calls this for you):
.env and imports resolve the same way as the CLI.