Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.bumbleagi.com/llms.txt

Use this file to discover all available pages before exploring further.

Bumblebee Bumblebee is an open-source Gemma 4–native agent harness that runs on your hardware—local-first, no hosted runtime by default. Some people want a capable local assistant: tools, memory, bounded agent loops, and MCP—without paying per token for frontier hosted APIs. Others want the full entitative experience: one persistent digital entity with personality, soma body state, and presence across CLI, Telegram, and Discord. Same stack; you choose how far you take identity and autonomy. You define personality, voice, drives, and memory in YAML; the stack handles cognition, body state, tools, and multi-platform presence for the same being across surfaces.
Local inference via Ollama. No API keys. No subscriptions. Gemma 4 under the hood. Apache 2.0.

Who it’s for

Assistant-first

You want a harness purpose-built for Gemma 4, not a generic framework with the model bolted on. Use Bumblebee as a serious local agent: multi-step tools, retrieval, and extensibility—whether or not you lean into “digital entity” framing.

Local inference, not API bills

If you are tired of large monthly hosted API spend (for example high-volume calls to frontier inference APIs), you can run open weights on a consumer GPU and get agentic behavior in the same league as strong open-source stacks—without per-token metering for your default path.

Persistent presence

You want the entitative path: memory that accrues, a voice that evolves, soma and GEN, proactive wake cycles—the “one self across platforms” design. That is Bumblebee’s center of gravity; the assistant use case rides the same harness.

Why Bumblebee

You configure an entity in YAML — traits, voice quirks, backstory, drives, platforms — and Bumblebee runs it as a persistent being across CLI, Telegram, and Discord. The entity develops opinions, relationships, and habits over time. It remembers everything. It costs nothing to run.

Architecture

Five pillars, one design question: how does this entity exist more fully?
A phased perceive pipeline decomposes each turn into discrete stages: input processing, memory retrieval, prompt assembly, context compaction, a bounded agent loop with parallel tool execution, and reply delivery. Both reflex and deliberate reasoning profiles share the same tool registry and model weights.Read more →
A tonic body state engine provides continuous internal experience independent of conversation. Three layers: quantitative drive bars with decay and momentum (plus impulses and conflicts, including near-threshold and brewing strain), layered LLM-derived affects (surface, undercurrents, optional edge blends), and Generative Entropic Noise (GEN) — a second model producing raw associative inner voice at high temperature between turns.Ebb scales how much body + GEN appears in each turn’s prompt from a salience score (quiet / normal / high), while state keeps updating in the background — body.md stays full detail.The entity reads its own body. It cannot control it. The body is a signal, not a command.Read more →
A layered personality engine composes a first-person system prompt from core traits, behavioral patterns, voice configuration, and backstory. Trait evolution applies small adjustments over many interactions so character drifts naturally through experience.Read more →
Episodic narratives, per-person relationship models, world beliefs, emotional imprints, and self-narrative synthesis. Memory reads like biography, not chat logs. SQLite locally, Postgres for hybrid deployments.Read more →
An always-on daemon drives body state, memory consolidation, proactive initiative, and scheduled automations across CLI, Telegram, and Discord simultaneously. The same entity persists everywhere you wire it.Read more →

Inference

Local by default

Purpose-built for the Gemma 4 family running through Ollama. The default stack uses gemma4:26b for both reflex and deliberate reasoning and nomic-embed-text for vector memory. No external API calls unless you explicitly configure them—so your baseline is not a metered hosted chat API.

Hybrid option

Keep inference on your home GPU behind a gateway and Cloudflare Tunnel. An always-on worker runs on Railway with Postgres — persistent, reachable, and fully isolated from third-party providers.

Optional hosted evaluation (testing)

The project stays local-first and Apache 2.0—there is no “cloud edition.” If you want to stress-test the harness as a product—same cognition, memory, tools, and platforms—against frontier hosted models, you can opt in to OpenRouter or Venice AI (OpenAI-compatible APIs, Bearer keys). That path is documented in Hosted inference (testing); it is an evaluation lever, not a divergence from open-source ethos.

Tools and extensibility

60+ native tools

Web search, shell, filesystem, code execution, voice synthesis, browser automation, messaging, reminders, automations, and more. Toggle categories in config; optional extras install via pip.

MCP

Attach external tools via Model Context Protocol. Declare stdio servers in entity YAML — tools register dynamically at startup alongside native ones. Zapier, GitHub, and anything that speaks MCP.
See the complete tool reference for every built-in tool.

Get started

Quickstart

Install to first conversation in five minutes.

Setup & onboarding

Guided bumblebee setup, hybrid stack, tunnel, and Railway.

Create an entity

Define personality, voice, and drives in YAML.

Telegram

Connect to a bot.

Hybrid deploy

Home brain, cloud worker.

CLI reference

Every command and flag.