
Bumblebee is local-first: the default path runs open weights on your hardware via Ollama, without shipping prompts to a third-party inference API. That ethos is unchanged in the open-source project. This page documents an optional inference mode for a different job: exercising the harness as a product—routing, memory, tools, platforms, and cognition loops—against the most capable hosted models you choose (often called “frontier” or “bleeding edge” APIs). It is a test and evaluation lever, not a replacement for self-hosted inference as the recommended default.

Why this exists

| Goal | Typical setup |
|---|---|
| Day-to-day use, privacy, no per-token bill for core chat | local or remote_gateway → your Ollama |
| Stress-test behavior, compare model families, demo “best case” replies | openrouter or venice → hosted OpenAI-compatible API |
The implementation is intentionally thin: it reuses the same OpenAICompatibleTransport and provider surface as Ollama and the home gateway. There is no separate “cloud edition” of the harness, only a different BUMBLEBEE_INFERENCE_PROVIDER value and API key.

Supported providers

The harness recognizes two named presets (you can override the base URL if a vendor changes endpoints):
| Provider | BUMBLEBEE_INFERENCE_PROVIDER | API key env (default) | Default base URL (root before /v1) |
|---|---|---|---|
| OpenRouter | openrouter | OPENROUTER_API_KEY | https://openrouter.ai/api |
| Venice AI | venice | VENICE_API_KEY | https://api.venice.ai/api |
Both speak the OpenAI-compatible surface the harness already uses: /v1/chat/completions, /v1/embeddings, /v1/models. Health checks use GET /v1/models (not the home gateway’s GET /health on the tunnel root).
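As a rough illustration of how the preset table and the base-URL override interact, here is a minimal Python sketch. ProviderPreset, PRESETS, and resolve_base_url are hypothetical names, not the harness's actual code; only the provider names, key variables, and default URLs come from the table. It assumes an empty override falls back to the preset default, which matches the YAML comment on base_url later on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderPreset:
    key_env: str   # default env var that holds the Bearer key
    base_url: str  # OpenAI-compat root, ending right before /v1


# Values taken from the preset table; the structure itself is hypothetical.
PRESETS = {
    "openrouter": ProviderPreset("OPENROUTER_API_KEY", "https://openrouter.ai/api"),
    "venice": ProviderPreset("VENICE_API_KEY", "https://api.venice.ai/api"),
}


def resolve_base_url(provider: str, override: Optional[str] = None) -> str:
    """Return the base URL for a hosted preset.

    A non-blank override (e.g. BUMBLEBEE_INFERENCE_BASE_URL) wins; an empty or
    missing override falls back to the preset default.
    """
    preset = PRESETS.get(provider)
    if preset is None:
        raise ValueError(f"not a hosted preset: {provider!r}")
    if override and override.strip():
        # Trailing slashes are trimmed so later path joins stay clean.
        return override.strip().rstrip("/")
    return preset.base_url


if __name__ == "__main__":
    print(resolve_base_url("openrouter"))  # preset default
    print(resolve_base_url("venice", "https://example.invalid/api/"))  # override
```

The example.invalid URL is a placeholder for whatever endpoint a vendor might move to.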

Base URL convention

The transport builds API paths as {base_url}/v1/..., same as local Ollama (http://127.0.0.1:11434 + /v1/...). For hosted vendors, BUMBLEBEE_INFERENCE_BASE_URL must be the prefix that ends right before /v1—for OpenRouter that is https://openrouter.ai/api, not a URL that already contains /v1. If you omit BUMBLEBEE_INFERENCE_BASE_URL, the preset defaults above apply.
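The base-URL rule can be expressed as a small path-joining helper. This is a sketch under assumptions: api_url is a hypothetical function, and rejecting a base that already ends in /v1 (rather than silently stripping it) is a design choice made here for illustration, not documented harness behavior. The health-check target for hosted presets is then simply api_url(base, "models").

```python
def api_url(base_url: str, path: str) -> str:
    """Build {base_url}/v1/{path} for an OpenAI-compatible host.

    base_url must be the prefix that ends right before /v1, e.g.
    http://127.0.0.1:11434 for Ollama or https://openrouter.ai/api for OpenRouter.
    """
    root = base_url.rstrip("/")
    if root.endswith("/v1"):
        # Guard against .../v1/v1/... when a URL that already has /v1 is pasted in.
        raise ValueError(f"base URL must end before /v1, got {base_url!r}")
    return f"{root}/v1/{path.lstrip('/')}"


if __name__ == "__main__":
    print(api_url("http://127.0.0.1:11434", "chat/completions"))
    print(api_url("https://openrouter.ai/api", "models"))  # health-check target
```

Failing loudly on a trailing /v1 surfaces the misconfiguration at startup instead of as a 404 on the first request.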

Bearer token and custom key variable

Requests send Authorization: Bearer <key> using:
  1. OPENROUTER_API_KEY or VENICE_API_KEY by default, or
  2. The env var named in BUMBLEBEE_INFERENCE_API_KEY_ENV, or
  3. Harness inference.api_key_env in YAML when it is not the generic gateway name BUMBLEBEE_INFERENCE_GATEWAY_TOKEN (that name stays reserved for the home tunnel gateway).
This keeps gateway tokens and hosted API keys from colliding conceptually, even though both are Bearer headers.
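The page lists three sources for the Bearer key but does not spell out their precedence. The sketch below assumes the explicit overrides win over the preset default: BUMBLEBEE_INFERENCE_API_KEY_ENV first, then a YAML api_key_env that is not the gateway name, then the preset variable. key_env_name and auth_header are hypothetical helpers that read from a plain mapping so they can be exercised without touching os.environ.

```python
from typing import Dict, Mapping, Optional

# Reserved for the home tunnel gateway; never treated as a hosted API key name.
GATEWAY_TOKEN_ENV = "BUMBLEBEE_INFERENCE_GATEWAY_TOKEN"
DEFAULT_KEY_ENV = {"openrouter": "OPENROUTER_API_KEY", "venice": "VENICE_API_KEY"}


def key_env_name(
    provider: str,
    env: Mapping[str, str],
    yaml_api_key_env: Optional[str] = None,
) -> str:
    """Pick the env var that holds the Bearer key (the precedence here is assumed)."""
    override = env.get("BUMBLEBEE_INFERENCE_API_KEY_ENV", "").strip()
    if override:
        return override
    if yaml_api_key_env and yaml_api_key_env != GATEWAY_TOKEN_ENV:
        return yaml_api_key_env
    return DEFAULT_KEY_ENV[provider]


def auth_header(
    provider: str,
    env: Mapping[str, str],
    yaml_api_key_env: Optional[str] = None,
) -> Dict[str, str]:
    """Return the Authorization header, failing loudly if the key is missing."""
    name = key_env_name(provider, env, yaml_api_key_env)
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set")
    return {"Authorization": f"Bearer {key}"}
```

In practice you would pass os.environ as env; the mapping parameter only exists to keep the sketch testable.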

Configuration

For OpenRouter:

BUMBLEBEE_INFERENCE_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-...

# Hosted stacks usually reject Ollama-only fields (e.g. options.num_ctx):
BUMBLEBEE_INFERENCE_PASS_NUM_CTX=false

Or for Venice:

BUMBLEBEE_INFERENCE_PROVIDER=venice
VENICE_API_KEY=...
BUMBLEBEE_INFERENCE_PASS_NUM_CTX=false

Optional overrides:
| Variable | Purpose |
|---|---|
| BUMBLEBEE_INFERENCE_BASE_URL | Custom OpenAI-compat root (before /v1) |
| BUMBLEBEE_INFERENCE_TIMEOUT | HTTP timeout (seconds) |
| BUMBLEBEE_INFERENCE_API_KEY_ENV | Env var name that holds the Bearer key |
For the full tables, see Environment variables. Copy-paste examples also live in the repo's .env.example under “Hosted brain”.
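A hedged sketch of reading the hosted-mode variables into one settings object. HostedSettings and load_hosted_settings are hypothetical; treating unset or blank values as "use the default" and parsing the timeout as a float number of seconds are assumptions, not documented harness behavior.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

HOSTED_PROVIDERS = ("openrouter", "venice")


@dataclass(frozen=True)
class HostedSettings:
    provider: str
    base_url: Optional[str]     # None means "use the preset default"
    timeout_s: Optional[float]  # None means "use the harness default" (assumed)
    api_key_env: Optional[str]  # None means "use the preset key variable"


def _optional(env: Mapping[str, str], name: str) -> Optional[str]:
    # Unset and blank values are treated the same way (an assumption).
    value = env.get(name, "").strip()
    return value or None


def load_hosted_settings(env: Mapping[str, str]) -> HostedSettings:
    provider = env.get("BUMBLEBEE_INFERENCE_PROVIDER", "").strip()
    if provider not in HOSTED_PROVIDERS:
        raise ValueError(f"expected one of {HOSTED_PROVIDERS}, got {provider!r}")
    timeout = _optional(env, "BUMBLEBEE_INFERENCE_TIMEOUT")
    return HostedSettings(
        provider=provider,
        base_url=_optional(env, "BUMBLEBEE_INFERENCE_BASE_URL"),
        timeout_s=float(timeout) if timeout is not None else None,
        api_key_env=_optional(env, "BUMBLEBEE_INFERENCE_API_KEY_ENV"),
    )
```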

Harness YAML

In configs/default.yaml or an overlay, you can set:
inference:
  provider: openrouter   # or venice — often set via env instead
  base_url: ""           # empty uses provider default
  pass_num_ctx: false    # recommended for strict OpenAI-compat hosts
BUMBLEBEE_INFERENCE_PASS_NUM_CTX=true|false still overrides pass_num_ctx when set.
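The override rule for pass_num_ctx amounts to "env wins when present, otherwise YAML". The sketch below assumes a blank env value counts as unset and accepts only true or false (case-insensitive), since those are the values the page documents. effective_pass_num_ctx is a hypothetical helper that takes the already-loaded YAML value as a plain bool rather than parsing YAML itself.

```python
from typing import Mapping


def effective_pass_num_ctx(yaml_value: bool, env: Mapping[str, str]) -> bool:
    """Env var wins when set; otherwise keep inference.pass_num_ctx from YAML."""
    raw = env.get("BUMBLEBEE_INFERENCE_PASS_NUM_CTX", "").strip().lower()
    if not raw:
        return yaml_value  # unset or blank: YAML decides (blank handling is assumed)
    if raw == "true":
        return True
    if raw == "false":
        return False
    raise ValueError(f"BUMBLEBEE_INFERENCE_PASS_NUM_CTX must be true or false, got {raw!r}")
```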

Models must match the host

models.reflex, models.deliberate, models.embedding, and any per-entity cognition.*_model overrides must use IDs your host actually serves (OpenRouter model slugs, Venice model ids, etc.). The harness will call /v1/models where supported; if your chosen id is wrong, you will see errors from the API, not from Bumblebee-specific logic. Embedding models in particular must exist on that provider if you rely on vector memory features.
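To catch a wrong model id before a run fails mid-conversation, you could compare your configured ids against the host's model list. This sketch assumes the standard OpenAI-style list shape ({"data": [{"id": ...}, ...]}); served_model_ids, missing_models, and fetch_models are hypothetical helpers built on the standard library only, and the ids in the example are placeholders, not real slugs. Some hosts may not include every model type in their default listing, so treat a miss as a prompt to check the vendor's docs rather than proof the id is invalid.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, List, Set


def served_model_ids(models_response: Dict[str, Any]) -> Set[str]:
    """Collect ids from an OpenAI-style list response: {"data": [{"id": ...}, ...]}."""
    return {m["id"] for m in models_response.get("data", []) if "id" in m}


def missing_models(configured: Iterable[str], models_response: Dict[str, Any]) -> List[str]:
    """Return configured ids that the host did not list, sorted for stable output."""
    served = served_model_ids(models_response)
    return sorted({m for m in configured if m not in served})


def fetch_models(base_url: str, api_key: str, timeout: float = 30.0) -> Dict[str, Any]:
    """GET {base_url}/v1/models with a Bearer key (makes a real network call)."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder ids standing in for models.reflex / models.deliberate / models.embedding.
    sample = {"object": "list", "data": [{"id": "vendor/chat-a"}, {"id": "vendor/embed-b"}]}
    print(missing_models(["vendor/chat-a", "vendor/chat-typo", "vendor/embed-b"], sample))
```

Against a live host you would replace sample with fetch_models(base_url, key).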

Local vs hybrid Railway

| Deployment | Brain |
|---|---|
| BUMBLEBEE_DEPLOYMENT_MODE=local | Hosted API from your laptop / CI |
| BUMBLEBEE_DEPLOYMENT_MODE=hybrid_railway | Hosted API from the Railway worker (and the API service, if it runs LLM paths). Set the same BUMBLEBEE_INFERENCE_* variables and API key on bumblebee-worker / bumblebee-api. No home tunnel or INFERENCE_GATEWAY_TOKEN is required for this mode. |
Using hosted inference on Railway does not change Postgres, volumes, or platform behavior—only where /v1/chat/completions is executed.
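Mirroring the variables onto bumblebee-worker and bumblebee-api is a manual step. As a convenience, a short script can pick the relevant entries out of your local .env. This is a sketch with assumptions: parse_dotenv handles only simple KEY=VALUE lines (no multi-line values or variable expansion), vars_to_mirror is a hypothetical helper, and it deliberately skips BUMBLEBEE_INFERENCE_GATEWAY_TOKEN because hosted mode does not need the home gateway token.

```python
from typing import Dict, Mapping

HOSTED_KEY_VARS = {"OPENROUTER_API_KEY", "VENICE_API_KEY"}
SKIP = {"BUMBLEBEE_INFERENCE_GATEWAY_TOKEN"}  # home gateway token; not needed for hosted mode


def parse_dotenv(text: str) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and comments, strips matching quotes.

    Not a full dotenv implementation (no multi-line values or ${VAR} expansion).
    """
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def vars_to_mirror(env: Mapping[str, str]) -> Dict[str, str]:
    """Select BUMBLEBEE_INFERENCE_* settings plus whichever key variable they point at."""
    wanted = set(HOSTED_KEY_VARS)
    custom = env.get("BUMBLEBEE_INFERENCE_API_KEY_ENV", "").strip()
    if custom:
        wanted.add(custom)
    return {
        k: v
        for k, v in env.items()
        if k not in SKIP and (k.startswith("BUMBLEBEE_INFERENCE_") or k in wanted)
    }
```

The output contains secrets, so review it before pasting values into Railway.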

Setup wizard

After you choose hybrid or local in bumblebee setup, the wizard offers an optional step that configures OpenRouter or Venice, writes the keys into .env, sets BUMBLEBEE_INFERENCE_PASS_NUM_CTX=false, and can prompt for a custom base URL. For cloud bodies, you must mirror those variables in Railway yourself (the wizard explains this). For structured onboarding, see Setup & onboarding.

Relationship to other docs

  • Portable inference — mental model: inference as a replaceable endpoint; hosted presets are another endpoint, not a fork.
  • Gateway — home appliance for your Ollama; orthogonal to OpenRouter/Venice.
  • Hybrid Railway — default hybrid brain is still tunnel + gateway; this page is the alternative brain for evaluation.

Operational notes

  • Cost and data handling are between you and the provider; read their terms. This mode is opt-in and key-driven.
  • Tool calling and long contexts depend on the model and provider, not on Bumblebee’s local-first defaults.
  • If a host returns errors on extra JSON fields, keep pass_num_ctx off and avoid provider-specific assumptions beyond OpenAI-compat chat/embeddings.
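The last note translates into one request-building decision: only attach Ollama's options.num_ctx when pass_num_ctx is on. A minimal sketch, assuming a plain OpenAI-style chat body; build_chat_payload is a hypothetical helper, not the transport's actual code.

```python
from typing import Any, Dict, List, Optional


def build_chat_payload(
    model: str,
    messages: List[Dict[str, str]],
    *,
    pass_num_ctx: bool,
    num_ctx: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a /v1/chat/completions body using only standard OpenAI-compat fields by default."""
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    if pass_num_ctx and num_ctx is not None:
        # Ollama-only extension; strict hosted APIs may reject unknown fields like this.
        payload["options"] = {"num_ctx": num_ctx}
    return payload
```

With pass_num_ctx false, the body carries only model and messages, which is the safest shape for a strict OpenAI-compatible host.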

Summary

Open-source posture: defaults stay local and self-hosted; the repo remains Apache 2.0 and fully runnable without any hosted API. Product testing posture: flip BUMBLEBEE_INFERENCE_PROVIDER to openrouter or venice, set the key, align model ids, and run the same harness to see how your entity behaves on the strongest models you subscribe to—without changing the core architecture or licensing story.