Local deployment - Bumblebee

The default deployment mode: Ollama handles inference, Bumblebee handles everything else, and all of it runs on your machine.

Prerequisites

Ollama installed and in your PATH

Models pulled: gemma4:26b and nomic-embed-text

GPU with 16 GB+ VRAM (see Hardware)
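To confirm that both prerequisite models are pulled, you can query Ollama's GET /api/tags endpoint, which lists the models available locally. This is a standalone sketch, not part of Bumblebee. REQUIRED_MODELS, normalize, fetch_tags, and missing_models are hypothetical names, and the sketch assumes the default base URL.

```python
import json
import urllib.request

# Hypothetical constant mirroring the prerequisite list above.
REQUIRED_MODELS = ["gemma4:26b", "nomic-embed-text"]


def normalize(name: str) -> str:
    """Ollama reports an untagged model with an explicit ':latest' tag."""
    return name if ":" in name else f"{name}:latest"


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Return the parsed JSON body of GET /api/tags on the Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def missing_models(required: list, tags_payload: dict) -> list:
    """List the required models that do not appear in an /api/tags payload."""
    installed = {normalize(m["name"]) for m in tags_payload.get("models", [])}
    return [name for name in required if normalize(name) not in installed]
```

With Ollama running, missing_models(REQUIRED_MODELS, fetch_tags()) returns an empty list when both prerequisites are in place; anything it returns can be fetched with ollama pull.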

Running

CLI only

bumblebee talk canary --ollama

Direct conversation with no daemon. Good for testing and development.

Full daemon

bumblebee run canary --ollama

Starts the heartbeat, soma, memory consolidation, and initiative, plus every platform defined in the entity YAML.

Fresh machine

bumblebee run canary --ollama --pull-models

Same as the full daemon, but also downloads the chat and embedding models from the Ollama library.

The --ollama flag probes the configured URL (default http://localhost:11434) and, if the endpoint is unreachable, starts ollama serve in the background. It does not download models unless --pull-models is also passed.
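The probe-then-start behavior of --ollama can be approximated outside Bumblebee as below. This is a sketch under stated assumptions, not Bumblebee's implementation: ollama_reachable and ensure_ollama are hypothetical names, any HTTP response counts as "reachable", and the 0.5-second polling interval and 15-second wait are arbitrary. It also assumes the base URL matches the address ollama serve listens on (set by the OLLAMA_HOST environment variable, default 127.0.0.1:11434).

```python
import shutil
import subprocess
import time
import urllib.error
import urllib.request


def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """True if an HTTP GET against base_url gets any response at all."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


def ensure_ollama(base_url: str = "http://localhost:11434", wait_seconds: float = 15.0) -> bool:
    """Start `ollama serve` in the background if base_url does not answer."""
    if ollama_reachable(base_url):
        return True
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama is not on PATH")
    # Discard server output; the child keeps running after this returns.
    subprocess.Popen(["ollama", "serve"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if ollama_reachable(base_url):
            return True
        time.sleep(0.5)
    return False
```

Polling after the launch matters because ollama serve takes a moment to bind its port; returning False after the deadline lets the caller report the failure instead of hanging.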

What runs

Component	Details
Ollama	Chat models (reflex and deliberate) and the embedding model, loaded on demand
Daemon	Heartbeat, soma ticks, memory consolidation, drive checks, wake cycles
Platforms	Every platform defined in the entity YAML (CLI, Telegram, Discord)
Tools	Shell, filesystem, and code execution, all on the local machine

Stopping

bumblebee stop

Flag	Effect
`--dry-run`	Show what would be stopped without stopping it
`--skip-gateway`	Skip the gateway shutdown step
`--leave-ollama-running`	Keep the Ollama server running

Ollama troubleshooting

If Ollama falls back to CPU unexpectedly, run npm run ollama:reset. This stops everything, clears stale processes, sets safe defaults (OLLAMA_MAX_LOADED_MODELS=1, OLLAMA_KEEP_ALIVE=60s), restarts the gateway, and warms the model.
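The two safe defaults are standard Ollama server environment variables, read when ollama serve starts. If you launch the server yourself instead of through npm run ollama:reset, a sketch like the following applies them. It covers only the environment step (it does not clear stale processes, restart the gateway, or warm the model), and safe_ollama_env and start_ollama_with_safe_defaults are hypothetical names.

```python
import os
import subprocess
from typing import Mapping, Optional

# The safe defaults named in the troubleshooting note.
SAFE_DEFAULTS = {
    "OLLAMA_MAX_LOADED_MODELS": "1",  # keep at most one model loaded at a time
    "OLLAMA_KEEP_ALIVE": "60s",       # unload a model after 60 s without requests
}


def safe_ollama_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of base (default: os.environ) with the safe defaults applied."""
    env = dict(os.environ if base is None else base)
    env.update(SAFE_DEFAULTS)
    return env


def start_ollama_with_safe_defaults() -> subprocess.Popen:
    """Launch `ollama serve` in the background with the safe defaults set.

    The server reads these variables only at startup, so an already running
    server must be stopped first for them to take effect.
    """
    return subprocess.Popen(["ollama", "serve"], env=safe_ollama_env())
```

Copying the environment rather than mutating os.environ keeps the overrides scoped to the Ollama process.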

Configuration

deployment:
  mode: local

ollama:
  base_url: "http://localhost:11434"
  timeout: 120
  retry_attempts: 3

models:
  reflex: "gemma4:26b"
  deliberate: "gemma4:26b"
  embedding: "nomic-embed-text"
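The ollama block sets timeout and retry_attempts, but this page does not define their exact semantics. Assuming timeout is a per-request limit in seconds and retry_attempts is the total number of tries, a client might apply them roughly like this. with_retries and its backoff parameter are hypothetical, not Bumblebee's code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[float], T],
    timeout: float = 120,
    retry_attempts: int = 3,
    backoff: float = 1.0,
) -> T:
    """Invoke call(timeout) up to retry_attempts times in total (assumed semantics).

    Network failures (OSError, which includes urllib's URLError and socket
    timeouts) trigger another try after an exponentially growing pause;
    the last error is re-raised once every attempt has failed.
    """
    if retry_attempts < 1:
        raise ValueError("retry_attempts must be at least 1")
    last_error: Optional[OSError] = None
    for attempt in range(retry_attempts):
        try:
            return call(timeout)
        except OSError as err:
            last_error = err
            if attempt < retry_attempts - 1:
                time.sleep(backoff * 2 ** attempt)
    raise last_error
```

Any request function that accepts a timeout in seconds, such as a small wrapper around urllib.request.urlopen, can be passed as call.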

To override the models for a single entity, set cognition.reflex_model and/or cognition.deliberate_model in that entity's configuration.
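The precedence this implies (entity override first, global models block otherwise) can be sketched as follows. It assumes both YAML files have already been parsed into dicts, for example with PyYAML's yaml.safe_load, and that an absent key means "use the global value". resolve_models is a hypothetical helper, and only the two chat-model roles are overridable because those are the only keys named here.

```python
from typing import Any, Dict

ROLES = ("reflex", "deliberate")


def resolve_models(global_config: Dict[str, Any], entity_config: Dict[str, Any]) -> Dict[str, str]:
    """Merge per-entity cognition overrides over the global models block.

    A missing or empty cognition.<role>_model keeps the global value; the
    embedding model is never overridden here.
    """
    models = dict(global_config.get("models") or {})
    cognition = entity_config.get("cognition") or {}
    for role in ROLES:
        override = cognition.get(f"{role}_model")
        if override:
            models[role] = override
    return models
```

The global dict is copied before merging, so resolving one entity never changes the defaults seen by another.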

Optional: to evaluate the same harness against hosted frontier models (OpenRouter, Venice) on this machine, set BUMBLEBEE_INFERENCE_PROVIDER to the hosted provider and supply that provider's API key. Local-first defaults and licensing are unchanged. See Hosted inference (testing).
