Yes. The project is often described in entitative terms because memory, tools, and presence are designed around one persistent self, but you can treat it as a Gemma 4–native local assistant: CLI or platforms, tools, retrieval, and agent loops without emphasizing character roleplay. The harness is not tied to one aesthetic. Configuration and prompts are yours; the stack is the same.
It depends on your bar. Bumblebee’s default path runs Gemma 4 via Ollama on your GPU, so you avoid per-token billing for core inference. Many users get agentic behavior comparable to strong open-source agent setups on similar hardware. Frontier hosted models can still win on raw capability for some workloads; this project optimizes for ownership, repeatability, and Gemma-specific integration, not for matching a proprietary API on every benchmark.
Entitative is shorthand for entity-first: the system is organized around a single, named digital self, not around anonymous chat threads or a grab-bag of unrelated tasks. In practice, memory, habits, and voice accumulate for that entity across sessions and surfaces (CLI, Telegram, Discord). You are not starting fresh every time. You are continuing the same presence, with resets and tools available when you intend to use them. It is a design stance: many frameworks optimize for stateless or disposable conversations. Bumblebee optimizes for a persistent self you own: local inference, your disks, your rules.
No, not for the default inference path. Core chat and embeddings use Ollama on your GPU; no hosted chat API is required. You can optionally add Firecrawl for enhanced web search or Fal for image generation. Optional: for evaluation (e.g. comparing harness behavior on frontier hosted models), you can configure OpenRouter or Venice AI. It is the same Apache 2.0 repo, with opt-in keys only. See Hosted inference (testing).
Yes, as an optional mode: set BUMBLEBEE_INFERENCE_PROVIDER to openrouter or venice, add the matching API key, align model IDs with that provider, and usually set BUMBLEBEE_INFERENCE_PASS_NUM_CTX=false. The harness code paths are unchanged; you are only swapping the OpenAI-compatible HTTP endpoint. This does not replace the local-first recommendation — it is for testing and product-style evaluation. See Hosted inference (testing).
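The checklist above can be sketched as a small pre-flight check. This is an illustration only: `hosted_inference_warnings` is a hypothetical helper, not part of the harness, and it inspects only the two variable names mentioned above. The API key variable is left out because its name depends on the provider.

```python
HOSTED_PROVIDERS = {"openrouter", "venice"}


def hosted_inference_warnings(env):
    """Return warnings for a hosted-inference setup (hypothetical helper).

    `env` is any mapping of environment variable names to strings,
    for example `os.environ`.
    """
    warnings = []
    provider = env.get("BUMBLEBEE_INFERENCE_PROVIDER", "").strip().lower()
    if provider not in HOSTED_PROVIDERS:
        # Not a hosted provider: the local Ollama path needs no extra checks here.
        return warnings
    pass_num_ctx = env.get("BUMBLEBEE_INFERENCE_PASS_NUM_CTX", "").strip().lower()
    if pass_num_ctx != "false":
        warnings.append(
            "BUMBLEBEE_INFERENCE_PASS_NUM_CTX is usually set to false "
            "for hosted providers"
        )
    return warnings
```

Calling `hosted_inference_warnings(os.environ)` before starting the harness would surface the most common omission without touching any harness code.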
  • Python 3.11+ and uv (recommended) or pip
  • Ollama with gemma4:26b and nomic-embed-text
  • A GPU with 16 GB+ VRAM for the recommended experience (see Hardware)
CPU-only via Ollama works for experiments but expect slow turns.
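The requirements above can be checked with a short script. This is a minimal sketch, assuming Ollama is listening on its default local address and using its model-listing endpoint (`/api/tags`); the helper names are hypothetical and not part of Bumblebee.

```python
import json
import sys
import urllib.request

REQUIRED_MODELS = ("gemma4:26b", "nomic-embed-text")


def python_ok(version_info=sys.version_info):
    """True when the interpreter is Python 3.11 or newer."""
    return tuple(version_info[:2]) >= (3, 11)


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return required models absent from an Ollama /api/tags response."""
    installed = {m.get("name", "") for m in tags_payload.get("models", [])}
    missing = []
    for want in required:
        # A model pulled without an explicit tag is listed as "<name>:latest".
        found = any(
            name == want or name.startswith(want + ":") for name in installed
        )
        if not found:
            missing.append(want)
    return missing


def fetch_tags(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models are installed."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With Ollama running, `missing_models(fetch_tags())` lists anything that still needs an `ollama pull`. GPU memory is not checked here.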
bumblebee talk <entity> starts a terminal-only conversation: no background daemon, no Telegram, no Discord. Ideal for quick tests and development. bumblebee run <entity> starts the full presence loop: the daemon plus every platform listed in your entity YAML, with heartbeat, soma, memory consolidation, wake cycles, and automations.
Hybrid keeps inference on your home GPU behind a gateway and Cloudflare Tunnel while an always-on worker runs on Railway with Postgres. You get persistence and reachability without sending inference to a third-party API. See Setup & onboarding for the guided wizard, then Hybrid Railway for architecture and manual variables.
By default, each entity uses a SQLite file at ~/.bumblebee/entities/<name>/memory.db. When DATABASE_URL is set, which is typical for hybrid Railway deployments, the harness uses Postgres instead. knowledge.md and journal.md are always on disk. On Railway, they live on the volume when BUMBLEBEE_EXECUTION_WORKSPACE_DIR is set.
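The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the harness's own code: `memory_backend` is a hypothetical name, and the only inputs are the entity name and the DATABASE_URL variable.

```python
from pathlib import Path


def memory_backend(entity, env, home=None):
    """Pick the memory store for an entity (hypothetical helper).

    Returns ("postgres", url) when DATABASE_URL is set, otherwise
    ("sqlite", path) pointing at the per-entity memory.db file.
    """
    database_url = env.get("DATABASE_URL", "").strip()
    if database_url:
        return ("postgres", database_url)
    base = Path(home) if home is not None else Path.home()
    db_path = base / ".bumblebee" / "entities" / entity / "memory.db"
    return ("sqlite", str(db_path))
```

For example, `memory_backend("ada", os.environ)` returns the SQLite path on a plain local install and the Postgres URL once DATABASE_URL is exported. The entity name `ada` is a placeholder.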
No. Platform commands like /reset clear rolling chat turns for the current session. They do not wipe episodic memory, beliefs, relationships, or other data in the database. A full experiential wipe is intentional and host-side: bumblebee wipe <entity> --yes. Always back up first.
Yes. Entities can create their own automations (scheduled routines), update their own knowledge.md entries, and write to their own journal. The emergence system can even suggest new routines based on the entity’s memory and relationships. All self-modification happens through the standard tool system: the entity uses create_automation, update_knowledge, and write_journal like any other tool.
A second model produces continuous internal commentary: raw associative material at high temperature. The main model reads these fragments as its own stream of consciousness. GEN runs on a timer during silence (~60s) and also regenerates after every conversation turn, so the subconscious stays current during active exchanges. After 30 minutes of silence, the entity already has 30 minutes of inner voice accumulated. GEN reads bars, affects, appraisal tags, recent journal entries, and the last 8 messages of conversation, which gives it real substance to riff on rather than thin structural events. See Soma architecture.
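The two triggers described above (a silence timer and a regeneration after each turn) can be sketched as a small scheduler. This is a simplified model, not the actual GEN implementation: the class name, method names, and the exact 60-second interval are assumptions, and the call to the second model is left out.

```python
from collections import deque

SILENCE_INTERVAL_S = 60.0  # the text says "~60s"; the exact value is assumed
CONTEXT_MESSAGES = 8       # GEN reads the last 8 conversation messages


class GenScheduler:
    """Decides when inner commentary should be regenerated (hypothetical)."""

    def __init__(self, interval_s=SILENCE_INTERVAL_S):
        self.interval_s = interval_s
        self.last_run = None
        # A bounded deque keeps only the most recent messages.
        self.recent = deque(maxlen=CONTEXT_MESSAGES)

    def on_turn(self, message, now):
        """Called after every conversation turn: always regenerate."""
        self.recent.append(message)
        self.last_run = now
        return True

    def on_tick(self, now):
        """Called periodically during silence: regenerate once per interval."""
        if self.last_run is None or now - self.last_run >= self.interval_s:
            self.last_run = now
            return True
        return False
```

Timestamps are plain seconds, so the scheduler can be driven by `time.monotonic()` in a loop or by fixed values when testing.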
Add a telegram or discord entry under presence.platforms in your entity YAML. Set bot tokens in .env (matching token_env in the YAML). Start with bumblebee run so platforms connect. You can restrict access with allowed_user_ids and configure operator-only commands with operator_user_ids. See the Telegram and Discord guides.
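A quick way to catch a mismatch between token_env and .env is to check that every named variable is actually set. The sketch below assumes the platform entries have already been loaded into a list of dictionaries, each with a token_env key; that shape, the helper name, and the variable names in the usage note are assumptions, so consult the Telegram and Discord guides for the real schema.

```python
def missing_tokens(platforms, env):
    """Return token_env names that are referenced but unset (hypothetical).

    `platforms` is an assumed in-memory form of presence.platforms:
    a list of dicts, each optionally carrying a "token_env" key.
    `env` is any mapping of environment variables, e.g. os.environ.
    """
    missing = []
    for entry in platforms:
        var = entry.get("token_env")
        # An empty string counts as unset, since a blank token cannot connect.
        if var and not env.get(var):
            missing.append(var)
    return missing
```

For instance, `missing_tokens([{"token_env": "EXAMPLE_BOT_TOKEN"}], os.environ)` returns `["EXAMPLE_BOT_TOKEN"]` until that placeholder variable is defined.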
Set autonomy.enabled: false in your entity YAML (configs/entities/<name>.yaml). That stops the daemon from running autonomous wake (timer- and body-driven full perceive cycles). Note: with autonomy off, legacy drive-based initiative can still send an occasional proactive message when drives cross their threshold. See Disabling autonomous wake and Presence for how to reason about cooldowns and other outbound paths (e.g. automations).
In local mode, images and audio from chats are saved to disk. On ephemeral cloud disks (e.g. Railway), those files disappear on redeploy. Set BUMBLEBEE_ATTACHMENTS_BACKEND=object_s3_compat with BUMBLEBEE_S3_* variables for durable blob storage. The setup wizard prompts for this on the hybrid path.
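The switch described above can be modeled as a small settings reader. This is a sketch only: the helper name and the `local_disk` label are hypothetical, and because the individual BUMBLEBEE_S3_* variable names are not listed here, the code simply collects whatever carries that prefix.

```python
S3_PREFIX = "BUMBLEBEE_S3_"


def attachment_settings(env):
    """Summarize attachment storage settings from the environment.

    Returns a dict with the chosen backend and any BUMBLEBEE_S3_*
    values, keyed by the lowercased suffix after the prefix.
    """
    backend = env.get("BUMBLEBEE_ATTACHMENTS_BACKEND", "").strip()
    if backend != "object_s3_compat":
        # "local_disk" is an illustrative label for the default behavior.
        return {"backend": "local_disk", "s3": {}}
    s3 = {
        key[len(S3_PREFIX):].lower(): value
        for key, value in env.items()
        if key.startswith(S3_PREFIX)
    }
    return {"backend": backend, "s3": s3}
```

An empty `s3` dictionary alongside `object_s3_compat` is a sign that the storage variables were never set.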
An optional enhanced web scraping and search API. When FIRECRAWL_API_KEY is set, the harness prefers Firecrawl for fetch_url and search_web — richer results than the default DuckDuckGo backend. Entirely optional.
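The preference rule above amounts to a single check on the environment. A minimal sketch, with a hypothetical function name and illustrative return labels:

```python
def web_backend(env):
    """Return which backend fetch_url and search_web would prefer.

    Firecrawl is chosen only when FIRECRAWL_API_KEY is set to a
    non-blank value; otherwise the default DuckDuckGo backend is used.
    """
    if env.get("FIRECRAWL_API_KEY", "").strip():
        return "firecrawl"
    return "duckduckgo"
```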
The completion gate treats mid-turn say() text as part of what the user already saw. If the user asked for tangible work (code, files, commands, etc.) but no work tools actually ran, only chat-style tools such as think / say, a small reflex judge can return CONTINUE so the loop nudges the model to use real tools (write_file, run_command, …) instead of stopping on intent alone. That judgment is intent-based, not a fixed list of English phrases. See Cognition → completion gate.
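The decision described above can be sketched as a pure function. This is a simplified model, not the harness's implementation: in the real gate the "user wants tangible work" signal comes from a reflex judge model, whereas here it is passed in as a boolean. The function name and the `DONE` label are hypothetical; `CONTINUE` and the tool names come from the description.

```python
CHAT_TOOLS = frozenset({"think", "say"})


def completion_gate(wants_tangible_work, tools_called):
    """Return "CONTINUE" when work was requested but only chat tools ran.

    `wants_tangible_work` stands in for the judge's intent verdict.
    `tools_called` is the list of tool names used during the turn.
    """
    ran_work_tool = any(name not in CHAT_TOOLS for name in tools_called)
    if wants_tangible_work and not ran_work_tool:
        return "CONTINUE"
    return "DONE"
```

For example, a turn that only called `think` and `say` after a request to write a file yields `CONTINUE`, while the same turn with `write_file` in the list is allowed to finish.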
Apache License 2.0 — usable for personal and commercial projects. Fully open source on GitHub.
An upcoming optional spatial workstation for the same entities — 3D body in any space, WebSocket + Spatial Action protocol, same inference stack. Not required for CLI, Telegram, or Discord. Coming later.