Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.bumbleagi.com/llms.txt

Use this file to discover all available pages before exploring further.

The default deployment mode. Ollama handles inference, Bumblebee handles everything else — all on your machine.

Prerequisites

Ollama installed and in your PATH
Models pulled: gemma4:26b and nomic-embed-text
GPU with 16 GB+ VRAM (see Hardware)

Running

bumblebee talk canary --ollama
Direct conversation, no daemon. Good for testing and development.
The --ollama flag probes the configured URL (default http://localhost:11434) and runs ollama serve in the background if the endpoint is unreachable. It does not re-download models unless --pull-models is added.

What runs

ComponentDetails
OllamaChat models (reflex + deliberate) and embedding model, loaded on demand
DaemonHeartbeat, soma ticks, memory consolidation, drive checks, wake cycles
PlatformsEvery platform in entity YAML (CLI, Telegram, Discord)
ToolsShell, filesystem, code execution — all on local machine

Stopping

bumblebee stop
FlagEffect
--dry-runShow what would be stopped
--skip-gatewayDon’t run gateway shutdown
--leave-ollama-runningKeep Ollama alive

Ollama troubleshooting

If Ollama falls back to CPU unexpectedly, run npm run ollama:reset. This stops everything, clears stale processes, sets safe defaults (OLLAMA_MAX_LOADED_MODELS=1, OLLAMA_KEEP_ALIVE=60s), restarts the gateway, and warms the model.

Configuration

deployment:
  mode: local

ollama:
  base_url: "http://localhost:11434"
  timeout: 120
  retry_attempts: 3

models:
  reflex: "gemma4:26b"
  deliberate: "gemma4:26b"
  embedding: "nomic-embed-text"
Override models per entity under cognition.reflex_model / cognition.deliberate_model.
Optional: To evaluate the same harness against hosted frontier models (OpenRouter, Venice) on this machine, switch BUMBLEBEE_INFERENCE_PROVIDER and set the provider key — local-first defaults and licensing are unchanged. See Hosted inference (testing).