Gateway - Bumblebee

The inference gateway is a lightweight FastAPI proxy that sits between Cloudflare Tunnel and your local Ollama instance. It handles bearer auth, request forwarding, and timeout management.
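As a rough illustration of the bearer auth step, the check might look like the sketch below. The gateway's actual implementation is not shown here; `is_authorized` is a hypothetical helper, and the constant-time comparison is an assumed (though common) design choice.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Return True only when the header carries the expected bearer token.

    Hypothetical helper for illustration; not the gateway's own code.
    """
    if not authorization_header or not expected_token:
        # No header, or no token configured: refuse rather than allow.
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(presented.strip().encode(), expected_token.encode())
```

A request whose `Authorization` header is `Bearer <token>` passes only when `<token>` matches the configured value exactly. Refusing when no token is configured keeps a misconfigured gateway closed by default.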

Optional hosted APIs (OpenRouter, Venice) used for evaluation share the same OpenAI-compatible client path but bypass this gateway: they call the vendor's /v1 endpoint directly. That setup is documented under Hosted inference (testing) and does not change the local-first default.

Setup

bumblebee gateway setup

This is an interactive walkthrough covering tunnel configuration, gateway environment variables, and .env setup. When the platform helper script is present (see Helper scripts and OS notes), the wizard can start the stack immediately afterwards.

Install

The gateway requires the gateway extra:

pip install 'bumblebee[gateway]'

Controls

bumblebee gateway on          # start Ollama + gateway + cloudflared
bumblebee gateway off         # stop everything
bumblebee gateway status      # check what's running
bumblebee gateway restart     # off then on (2s pause)

Add --leave-ollama-running to off or restart if you only want to bounce the tunnel and gateway.

The on, off, status, and restart subcommands invoke a platform helper script from the repo: scripts/gateway.ps1 on Windows, or scripts/gateway.sh (bash) on macOS and Linux. bumblebee gateway setup works on any OS for tunnel, token, and .env configuration, even if you start the processes manually.

Configuration

Environment variables (set in .env or system env):

Variable	Default	Purpose
`INFERENCE_GATEWAY_TOKEN`	—	Bearer token for auth (must match worker’s `BUMBLEBEE_INFERENCE_GATEWAY_TOKEN`)
`OLLAMA_BASE_URL`	`http://127.0.0.1:11434`	Upstream Ollama URL
`INFERENCE_GATEWAY_HOST`	`127.0.0.1`	Gateway listen address
`INFERENCE_GATEWAY_PORT`	`8010`	Gateway listen port
`INFERENCE_GATEWAY_BACKEND_TIMEOUT`	`120`	Timeout for Ollama requests (seconds)
`INFERENCE_GATEWAY_MAX_BODY_BYTES`	`8000000`	Max request body size (bytes)
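To make the defaults in the table concrete, here is a minimal sketch of reading these variables with their documented fallbacks. `GatewayConfig` and `load_config` are hypothetical names, and treating a missing token as an empty string is an assumption; the gateway's real loader may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayConfig:
    """Hypothetical container for the settings listed in the table."""
    token: str
    ollama_base_url: str
    host: str
    port: int
    backend_timeout: float  # seconds
    max_body_bytes: int


def load_config(env=None):
    """Read gateway settings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    return GatewayConfig(
        # No documented default; an empty string here is an assumption.
        token=env.get("INFERENCE_GATEWAY_TOKEN", ""),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://127.0.0.1:11434"),
        host=env.get("INFERENCE_GATEWAY_HOST", "127.0.0.1"),
        port=int(env.get("INFERENCE_GATEWAY_PORT", "8010")),
        backend_timeout=float(env.get("INFERENCE_GATEWAY_BACKEND_TIMEOUT", "120")),
        max_body_bytes=int(env.get("INFERENCE_GATEWAY_MAX_BODY_BYTES", "8000000")),
    )
```

Passing a plain dict instead of relying on `os.environ` makes it easy to check an override, for example `load_config({"INFERENCE_GATEWAY_PORT": "9000"}).port`.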

Security

Bind to 127.0.0.1 only; the tunnel handles external access.
Terminate the tunnel at the gateway, not at a broad reverse proxy.
Use a strong, random bearer token.
Token lookup order: INFERENCE_GATEWAY_TOKEN → BUMBLEBEE_INFERENCE_GATEWAY_TOKEN → .env keys.
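The lookup order and the "strong, random token" advice can be sketched as follows. This is an illustration under assumptions: `resolve_token` and `new_token` are hypothetical helpers, and the sketch assumes the .env fallback consults the same two key names in the same order, which the text above does not spell out.

```python
import secrets

# Order taken from the lookup description; the reuse of the same keys
# for the .env fallback is an assumption.
TOKEN_KEYS = ("INFERENCE_GATEWAY_TOKEN", "BUMBLEBEE_INFERENCE_GATEWAY_TOKEN")


def resolve_token(environ, dotenv):
    """Return the first non-empty token: process env first, then .env values."""
    for source in (environ, dotenv):
        for key in TOKEN_KEYS:
            value = source.get(key, "").strip()
            if value:
                return value
    return None


def new_token():
    """Generate a URL-safe random token from 32 bytes of OS randomness."""
    return secrets.token_urlsafe(32)
```

Here `environ` would typically be `os.environ` and `dotenv` a dict of parsed .env entries. A value printed by `new_token()` can be pasted into .env as `INFERENCE_GATEWAY_TOKEN`.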

Helper scripts and OS notes

The home stack needs Ollama, cloudflared, curl, and a Python interpreter where python -m bumblebee.inference_gateway resolves (e.g. your uv/venv environment). If the wrong interpreter is picked up, set BUMBLEBEE_PYTHON to the full path of the python3 you want. On macOS, install Ollama and cloudflared, for example with Homebrew, which often places cloudflared under /opt/homebrew/bin or /usr/local/bin. The CLI and scripts check those directories when cloudflared is not on PATH.
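The discovery behaviour described above could be approximated like this. It is a sketch, not the CLI's actual code: `find_cloudflared` and `pick_python` are hypothetical names, and falling back to the current interpreter when BUMBLEBEE_PYTHON is unset is an assumption.

```python
import os
import shutil
import sys

# Directories named in the text as common Homebrew install locations.
FALLBACK_DIRS = ("/opt/homebrew/bin", "/usr/local/bin")


def find_cloudflared(which=shutil.which, is_file=os.path.isfile):
    """Prefer cloudflared on PATH, then try the known fallback directories."""
    found = which("cloudflared")
    if found:
        return found
    for directory in FALLBACK_DIRS:
        candidate = f"{directory}/cloudflared"
        if is_file(candidate):
            return candidate
    return None


def pick_python(env=None):
    """Use BUMBLEBEE_PYTHON when set; otherwise the running interpreter (assumed)."""
    env = os.environ if env is None else env
    return env.get("BUMBLEBEE_PYTHON") or sys.executable
```

The `which` and `is_file` parameters exist only so the lookup can be exercised without touching the real filesystem.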

Windows

Batch shortcuts (optional):

scripts/windows/gateway-on.cmd
scripts/windows/gateway-off.cmd
scripts/windows/gateway-status.cmd
scripts/windows/gateway-restart.cmd

Direct PowerShell:

powershell -NoProfile -ExecutionPolicy Bypass -File .\scripts\gateway.ps1 on

macOS and Linux

The same operations use bash (the bumblebee gateway CLI calls this for you):

bash scripts/gateway.sh on

Run from the repository root so .env and imports resolve the same way as the CLI.
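For readers scripting around the helpers, the per-platform dispatch described in this section might be expressed as below. `helper_command` is a hypothetical function that only builds the argument list from the commands shown above; how the bumblebee CLI selects and launches the script internally is not documented here.

```python
import platform

ACTIONS = ("on", "off", "status", "restart")


def helper_command(action, system=None):
    """Build the argv for the platform helper script (run from the repo root)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown gateway action: {action!r}")
    system = platform.system() if system is None else system
    if system == "Windows":
        return [
            "powershell", "-NoProfile", "-ExecutionPolicy", "Bypass",
            "-File", r".\scripts\gateway.ps1", action,
        ]
    # macOS and Linux share the bash helper.
    return ["bash", "scripts/gateway.sh", action]
```

The resulting list can be handed to `subprocess.run(..., cwd=repo_root)`, where `repo_root` is the checkout directory, so that .env and imports resolve as they do for the CLI.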
