Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.bumbleagi.com/llms.txt

Use this file to discover all available pages before exploring further.

The inference gateway is a lightweight FastAPI proxy that sits between Cloudflare Tunnel and your local Ollama instance. It handles bearer auth, request forwarding, and timeout management.
Optional hosted APIs (OpenRouter, Venice) for evaluation use the same OpenAI-compatible client path but do not use this gateway—they call the vendor’s /v1 directly. That is documented under Hosted inference (testing) and does not change the local-first default.

Setup

bumblebee gateway setup
Interactive walkthrough: tunnel configuration, gateway env vars, and .env setup. When the platform helper script is present (see below), the wizard can start the stack immediately after.

Install

The gateway requires the gateway extra:
pip install 'bumblebee[gateway]'

Controls

bumblebee gateway on          # start Ollama + gateway + cloudflared
bumblebee gateway off         # stop everything
bumblebee gateway status      # check what's running
bumblebee gateway restart     # off then on (2s pause)
Add --leave-ollama-running to off or restart if you only want to bounce the tunnel and gateway.
bumblebee gateway on | off | status | restart invoke a platform helper script from the repo: scripts/gateway.ps1 on Windows, or scripts/gateway.sh on macOS and Linux (bash). gateway setup works on any OS for tunnel/token/.env configuration even if you start processes manually.

Configuration

Environment variables (set in .env or system env):
VariableDefaultPurpose
INFERENCE_GATEWAY_TOKENBearer token for auth (must match worker’s BUMBLEBEE_INFERENCE_GATEWAY_TOKEN)
OLLAMA_BASE_URLhttp://127.0.0.1:11434Upstream Ollama URL
INFERENCE_GATEWAY_HOST127.0.0.1Gateway listen address
INFERENCE_GATEWAY_PORT8010Gateway listen port
INFERENCE_GATEWAY_BACKEND_TIMEOUT120Timeout for Ollama requests (seconds)
INFERENCE_GATEWAY_MAX_BODY_BYTES8000000Max request body size

Security

  • Bind to 127.0.0.1 only — the tunnel handles external access
  • Terminate the tunnel at the gateway, not a broad reverse proxy
  • Use a strong, random bearer token
  • The token lookup order: INFERENCE_GATEWAY_TOKENBUMBLEBEE_INFERENCE_GATEWAY_TOKEN.env keys

Helper scripts and OS notes

The home stack needs Ollama, cloudflared, curl, and a Python interpreter where python -m bumblebee.inference_gateway resolves (e.g. your uv/venv environment). If the wrong interpreter is picked up, set BUMBLEBEE_PYTHON to the full path to python3. On macOS, install Ollama and cloudflared (e.g. Homebrewcloudflared is often under /opt/homebrew/bin or /usr/local/bin). The CLI and scripts look there when cloudflared is not on PATH.

Windows

Batch shortcuts (optional):
scripts/windows/gateway-on.cmd
scripts/windows/gateway-off.cmd
scripts/windows/gateway-status.cmd
scripts/windows/gateway-restart.cmd
Direct PowerShell:
powershell -NoProfile -ExecutionPolicy Bypass -File .\scripts\gateway.ps1 on

macOS and Linux

The same operations use bash (the bumblebee gateway CLI calls this for you):
bash scripts/gateway.sh on
Run from the repository root so .env and imports resolve the same way as the CLI.