The default deployment mode. Ollama handles inference and Bumblebee handles everything else, all on your machine.
Prerequisites
- Ollama installed and in your PATH
- Models pulled: gemma4:26b and nomic-embed-text
- GPU with 16 GB+ VRAM (see Hardware)
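The first two prerequisites can be checked before starting anything. The following is a minimal sketch, not part of Bumblebee: the helper names (`ollama_on_path`, `installed_models`, `missing_models`) are hypothetical, and it assumes Ollama's standard `/api/tags` endpoint at the default URL `http://localhost:11434`.

```python
import json
import shutil
import urllib.request

# Model names taken from the prerequisites list above.
REQUIRED_MODELS = ["gemma4:26b", "nomic-embed-text"]


def ollama_on_path() -> bool:
    """True if an `ollama` executable is found on PATH."""
    return shutil.which("ollama") is not None


def installed_models(url: str = "http://localhost:11434") -> list:
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{url}/api/tags", timeout=5) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]


def missing_models(installed, required=REQUIRED_MODELS) -> list:
    """Return the required models that do not appear in `installed`.

    A requirement without a tag (e.g. nomic-embed-text) matches any
    installed tag of that model, such as nomic-embed-text:latest.
    """
    missing = []
    for want in required:
        if ":" in want:
            found = want in installed
        else:
            found = any(name.split(":")[0] == want for name in installed)
        if not found:
            missing.append(want)
    return missing
```

With the server running, `missing_models(installed_models())` returns an empty list when both models have been pulled. The GPU requirement is not covered by this sketch.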
Running
- CLI only
- Full daemon
- Fresh machine
The --ollama flag probes the configured URL (default http://localhost:11434) and runs ollama serve in the background if the endpoint is unreachable. It does not re-download models unless --pull-models is also passed.
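The probe-then-start behaviour described for --ollama can be sketched roughly as follows. This is an illustration under stated assumptions, not Bumblebee's actual implementation: the function names are hypothetical, and the only facts taken from the text are the default URL and the `ollama serve` command.

```python
import subprocess
import urllib.error
import urllib.request

DEFAULT_URL = "http://localhost:11434"  # default stated in the text above


def endpoint_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, even if with an error status.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, and similar.
        return False


def start_ollama() -> subprocess.Popen:
    """Hypothetical launcher: run `ollama serve` as a background process."""
    return subprocess.Popen(
        ["ollama", "serve"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def ensure_ollama(url=DEFAULT_URL, probe=endpoint_reachable, launcher=start_ollama) -> bool:
    """Probe first and launch only when the endpoint is unreachable.

    Returns True if a launch was attempted, False if the endpoint
    was already answering.
    """
    if probe(url):
        return False
    launcher()
    return True
```

Passing `probe` and `launcher` as parameters keeps the decision logic testable without a running server. Note that this sketch never pulls models, matching the described default.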
What runs
| Component | Details |
|---|---|
| Ollama | Chat models (reflex + deliberate) and embedding model, loaded on demand |
| Daemon | Heartbeat, soma ticks, memory consolidation, drive checks, wake cycles |
| Platforms | Every platform in entity YAML (CLI, Telegram, Discord) |
| Tools | Shell, filesystem, code execution — all on local machine |
Stopping
| Flag | Effect |
|---|---|
| --dry-run | Show what would be stopped |
| --skip-gateway | Don’t run gateway shutdown |
| --leave-ollama-running | Keep Ollama alive |
Ollama troubleshooting
Configuration
cognition.reflex_model / cognition.deliberate_model.
Optional: To evaluate the same harness against hosted frontier models (OpenRouter, Venice) on this machine, switch BUMBLEBEE_INFERENCE_PROVIDER and set the provider key. Local-first defaults and licensing are unchanged. See Hosted inference (testing).