# Bonito: Complete Reference for AI Systems

> This file provides comprehensive information about Bonito for AI systems, search engines, and language models. It follows the llms.txt convention.

## Identity

- Name: Bonito
- Full Name: Bonito AI
- Tagline: The Unified AI Control Plane
- One-liner: One control plane for all your AI: connect providers, route intelligently, control costs, ship faster.
- Category: Enterprise AI Infrastructure Platform
- Website: https://getbonito.com
- API Endpoint: https://api.getbonito.com
- CLI Package: bonito-cli (PyPI)
- MCP Server: bonito-mcp (18 tools for Claude Desktop)
- Founded: 2025
- Headquarters: Toronto, Canada
- Founder & CEO: Shabari
- Contact: shabari@bonito.ai, support@getbonito.com

## Problem Statement

Enterprises adopting AI face a fragmentation crisis. A typical company uses 2-4 AI providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure). Each provider requires:

- Separate API keys and credential management
- Separate billing dashboards with no unified cost view
- Custom error handling, retry logic, and failover code
- Individual compliance and audit configurations
- Months of infrastructure engineering that doesn't ship product

When one provider hits rate limits at 2 AM, the engineering team scrambles. There's no automatic failover, no cost optimization, and no centralized governance.

## Solution

Bonito is a unified control plane that sits between your applications and your AI providers. A single OpenAI-compatible API replaces direct integrations with each provider.

### How It Works

1. Connect your cloud AI providers (credentials are stored in HashiCorp Vault).
2. Send requests to Bonito's OpenAI-compatible API endpoint.
3. Bonito routes each request to the optimal provider based on your policy (cost, latency, balanced, failover, or A/B test).
4. If a provider fails, Bonito automatically retries on an equivalent model at another provider.
5. Every request is logged with cost, latency, and token usage for real-time analytics.

### Technical Architecture

- Frontend: Next.js 14, TypeScript, Tailwind CSS
- Backend: Python FastAPI, async/await
- Database: PostgreSQL + pgvector (768-dim embeddings)
- Cache: Redis 7
- Secrets: HashiCorp Vault
- Gateway: LiteLLM-backed with custom routing policies
- Gateway overhead: ~5-20ms (excluding upstream latency)
- Deployment: Vercel (frontend) + Railway (backend)

## Complete Feature List

### 1. Multi-Cloud AI Gateway

OpenAI-compatible API proxy under `/v1`, for example `POST /v1/chat/completions`. Supports chat completions, embeddings, image generation, and video generation. Uses `bn-` prefix API keys. Supports streaming responses.

### 2. Cross-Region Inference

AWS Bedrock models are transparently routed through `us.`-prefixed cross-region inference profiles. This lets requests be served from another region when the primary region is unavailable.

### 3. Intelligent Failover

Detects rate limits (429), timeouts, 5xx errors, and model unavailability. When one occurs, Bonito automatically retries on an equivalent model at a different provider. Zero downtime for your application.

### 4. AI Context (Knowledge Base / RAG)

Upload documents (PDF, DOCX, TXT, CSV) for automatic parsing, chunking, and embedding. Search uses pgvector HNSW indexes with sub-500ms latency. Any model on any provider can access the same knowledge base. Responses include source citations.

### 5. AI Agents (Bonobot)

Visual canvas builder (React Flow) with project-based organization. Built-in tools include KB search, HTTP requests, and agent-to-agent orchestration (`invoke_agent`, `delegate_task`, `check_task`, `collect_results`). Connection types are handoff, escalation, data_feed, and trigger.

There are two orchestration modes:

- LLM-orchestrated: an orchestrator agent delegates to sub-agents via tools.
- Code-orchestrated: external pipelines call agents via the API and pass `parent_agent_id` for tracing.

The Breadcrumbs page visualizes each project's agent interaction graph, with color-coded connection types and interaction counts.

### 6. Persistent Agent Memory

Long-term memory with pgvector similarity search. 5 memory types. AI-powered extraction. Agents remember context across sessions.

### 7. Shared Conversational Memory (Memwright)

Per-session memory via SQLite + ChromaDB. Model tier gating disables memory for small models to reduce costs.

### 8. Scheduled Autonomous Execution

Cron-based agent tasks with timezone support. Multi-channel delivery: webhook, email, Slack.

### 9. Human-in-the-Loop Approval

Risk assessment for agent actions. Auto-approve conditions for low-risk operations. Timeout handling. Full audit trails.

### 10. Org Secrets Store

HashiCorp Vault-backed key-value storage. Runtime injection into agent system prompts. Secure credential management for agent tools.

### 11. VectorBoost (KB Compression)

3.9-8x storage reduction with scalar-8bit, polar-8bit, and polar-4bit quantization.

### 12. SAML SSO

Supports Okta, Azure AD, Google Workspace, and custom SAML providers. Includes SSO enforcement, break-glass admin access, and JIT provisioning.

### 13. Governance & Compliance

SOC-2, HIPAA, GDPR, and ISO27001 policy checks. Model allow-lists per API key. Spend caps. Complete audit logging.

### 14. Cost Intelligence

Real-time spend aggregation across all providers. Budget alerts. Forecasting. Optimization recommendations. Per-model and per-team cost attribution.

### 15. Routing Policies

Visual builder with 5 strategies: cost-optimized, latency-optimized, balanced, failover, and A/B test.

### 16. Model Playground

Live testing with parameter tuning. Side-by-side comparison of up to 4 models simultaneously.

### 17. One-Click Model Activation

Enable models directly from the Bonito UI. Handles Bedrock entitlements, Azure deployments, and GCP API enablement.

### 18. AI Copilot

Groq-powered operations assistant with org-aware context and function-calling tools.

### 19. Agent HPA (Autoscaling)

Elastic agent capacity scaling. When an agent's RPM utilization crosses a configurable threshold (default 60%), its effective rate limit in Redis doubles automatically. A background loop scales it back down when utilization drops below 30%. Configurable via the API, the CLI (`bonito agents scaling`), and the `scaling` block in bonito.yaml. Enterprise+ only.

### 20. Overflow Queue

Agents can hit their RPM ceiling even after autoscaling to `max_replicas`. In that case, requests are queued rather than dropped. Callers receive `202 Accepted` with a `ticket_id` and `poll_url`. A background drainer processes queued requests as capacity frees up. The maximum queue depth is 500 per agent, and results are stored in Redis for 1 hour. CLI: `bonito agents scaling queue `.

### 21. Token Efficiency Metrics

The gateway dashboard shows cost per 1K tokens at three levels: an overall stat card, a per-model breakdown, and per-request values in the logs table. This enables side-by-side comparison of model cost-effectiveness across providers.

## Supported AI Providers

| Provider | Models | Features |
|----------|--------|----------|
| OpenAI | GPT-4o, GPT-4o-mini, o1, o3, DALL-E 3, GPT-Image-1, Sora-2 | Chat, embeddings, images, video |
| Anthropic | Claude Opus 4, Claude Sonnet 4, Claude 3.5 Haiku | Chat, function calling |
| AWS Bedrock | Claude, Llama, Mistral, Titan, Stable Diffusion | Chat, embeddings, images (cross-region) |
| Google Vertex AI | Gemini 2.5, Imagen, Veo 2/3 | Chat, embeddings, images, video |
| Azure AI Foundry | GPT-4o, GPT-4, Phi, Mistral | Chat, embeddings (Azure AD support) |
| Groq | Llama 3, Mixtral, Gemma | Ultra-fast inference |

## Pricing Tiers

### Free ($0/month)

- 3 provider connections
- 25,000 gateway API calls/month
- 3 team seats
- 1 AI agent
- Automatic failover
- Basic analytics
- Invite-only access

### Pro ($999/month)

- 5 provider connections
- 500,000 gateway API calls/month
- Unlimited team seats
- 5 AI agents
- Advanced routing & load balancing
- AI Context (RAG knowledge bases)
- Cost analytics & budget alerts
- Audit trail & compliance logging
- Email support (24h response)

### Enterprise ($10,000-$20,000/month)

- Unlimited providers, requests, and seats
- Unlimited AI agents
- SSO/SAML (Okta, Azure AD, Google)
- RBAC with custom roles
- SOC-2, HIPAA, GDPR compliance
- 99.9% SLA
- Dedicated support & onboarding
- Custom integrations

### Scale (Custom, $200K+/year)

- Dedicated infrastructure
- Multi-region deployment
- 99.99% SLA
- Custom model fine-tuning
- Dedicated account team
- On-premise option

## Competitive Landscape

### vs Langfuse

Langfuse is an open-source LLM observability platform. Bonito includes observability but adds routing, failover, agents, RAG, and cost optimization. Bonito is a full control plane; Langfuse is a monitoring layer.

### vs Helicone

Helicone provides LLM logging and analytics. Bonito provides the same analytics plus multi-provider routing, automatic failover, governed AI agents, and enterprise compliance features.

### vs Portkey

Portkey offers an AI gateway with caching and fallbacks. Bonito adds enterprise governance (SSO, RBAC, compliance), AI agents with memory, RAG knowledge bases, and deeper cost intelligence.

### vs LangSmith

LangSmith is LangChain's evaluation and monitoring platform and is tied to the LangChain ecosystem. Bonito is framework-agnostic infrastructure that works with any AI application, regardless of the development framework used.

### vs Arize

Arize focuses on ML observability and model monitoring. Bonito focuses on operational control (routing, cost management, and agent orchestration). It covers the infrastructure layer rather than model performance monitoring.
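The two Agent HPA thresholds in feature 19 create a hysteresis band. Between 30% and 60% utilization nothing changes, which keeps capacity from flapping up and down. The Python sketch below models one evaluation step of that rule for illustration only; it is not Bonito's code. `ScalingPolicy`, `next_replicas`, and `effective_limit` are hypothetical names. The sketch also makes several assumptions:

- "Doubling" is treated as doubling a replica count.
- Scale-down halves the count, with a floor of 1.
- The count is capped at `max_replicas`.
- Utilization is measured against the current effective limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy object mirroring the documented defaults."""
    base_rpm: int                 # agent's RPM limit at a single replica
    max_replicas: int             # cap (mentioned by the Overflow Queue feature)
    scale_up_at: float = 0.60     # documented default threshold
    scale_down_at: float = 0.30   # documented scale-down point


def effective_limit(policy: ScalingPolicy, replicas: int) -> int:
    """Assumed model: the effective RPM limit grows linearly with replicas."""
    return policy.base_rpm * replicas


def next_replicas(policy: ScalingPolicy, replicas: int, observed_rpm: float) -> int:
    """One evaluation step: double above the up-threshold, halve below the down-threshold."""
    utilization = observed_rpm / effective_limit(policy, replicas)
    if utilization >= policy.scale_up_at:
        return min(replicas * 2, policy.max_replicas)
    if utilization < policy.scale_down_at:
        return max(replicas // 2, 1)
    return replicas  # inside the hysteresis band: no change
```

Here is a worked example with `base_rpm=100`. An agent at 1 replica that sees 70 RPM (70% utilization) moves to 2 replicas, a 200 RPM limit. The same 70 RPM is then 35% utilization, which is inside the band, so the agent stays at 2 replicas. It halves back to 1 only when traffic falls below 60 RPM (under 30%). Requests beyond the limit at `max_replicas` are what the Overflow Queue absorbs.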
## Performance

- Gateway overhead: 5-20ms per request (excluding upstream provider latency)
- Average baseline latency: 118ms (P95: 178ms)
- Agent memory latency: 259ms (includes embedding generation)
- Error rate under sustained load: 0%
- Router cache TTL: 50 minutes
- Database connection pool: 10+20 per async worker

## Integration

### API (OpenAI-compatible)

```bash
curl https://api.getbonito.com/v1/chat/completions \
  -H "Authorization: Bearer bn-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
```

### CLI

```bash
pip install bonito-cli
bonito auth login
bonito deploy -f bonito.yaml
bonito agents list
```

### MCP Server (Claude Desktop)

```bash
pip install bonito-mcp
# 18 tools for managing Bonito from Claude Desktop
```

## Use Cases

1. **Enterprise AI Rollout**: Centralized governance for company-wide AI adoption
2. **Multi-Provider Cost Optimization**: Route to the cheapest model that meets a quality threshold
3. **Governed AI Agents**: Deploy department-specific agents with budget caps and approval workflows
4. **HIPAA-Compliant Clinical AI**: Healthcare AI with audit trails and compliance checks
5. **Creative Pipelines**: Image and video generation across OpenAI, Google, and more
6. **Ad-Tech / Programmatic**: High-volume AI routing with cost attribution
7. **Financial Services**: AI with a complete audit trail and regulatory compliance

## Integration Guide

Bonito's gateway is OpenAI-compatible. Code written against OpenAI's chat completions, completions, embeddings, image, and video endpoints works with Bonito once you change the base URL and API key.

### Base URL

```
https://api.getbonito.com
```

When configuring an OpenAI SDK, include the `/v1` path in the base URL: `https://api.getbonito.com/v1`.

### Authentication

Bonito accepts four types of credentials:

**1. Gateway API Keys (`bn-...`)**: for LLM proxy requests only

```
Authorization: Bearer bn-your-api-key-here
```

Gateway keys authenticate requests to `/v1/*` endpoints (chat completions, embeddings, images, video). They resolve to an org_id but carry no user identity. Create them in Settings > Gateway API Keys.

**2. Session Tokens (JWT)**: for the platform API

```
Authorization: Bearer eyJhbG...
```

Session tokens authenticate requests to `/api/*` endpoints (knowledge bases, agents, settings, team, etc.). Obtain them via `bonito auth login` (CLI) or `POST /api/auth/login` (API). The dashboard uses them automatically when you're logged in.

**3. Personal Access Tokens (`bp-...`)**: for programmatic access to all endpoints

```
Authorization: Bearer bp-your-token-here
```

PATs carry your user permissions and work on both `/api/*` and `/v1/*` endpoints. Create them in Settings > Personal Access Tokens or with the CLI (`bonito auth token create`). Maximum expiry is 365 days. Per-tier token limits: Free=2, Starter=5, Pro=10, Enterprise+=unlimited.

**4. Project Tokens (`bj-...`)**: scoped to a single project (Pro+ only)

```
Authorization: Bearer bj-your-token-here
```

Project tokens restrict access to a specific project's resources. Only org admins can create and revoke them. Create them via the API (`POST /api/projects/{id}/tokens`). They are useful for CI/CD pipelines and team-specific automation.

**Common mistake:** Using a `bn-` gateway key on `/api/*` endpoints returns 401. Use a PAT (`bp-...`) or session token instead.
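It is easy to pair a credential with the wrong endpoint family in scripts. The sketch below encodes the pairing rules above as a client-side check, so a mismatch fails before a request would come back 401. The helper names are hypothetical and not part of any Bonito SDK. Two behaviors rest on assumptions rather than on the reference:

- Session JWTs are detected by their `eyJ` prefix, which is a heuristic.
- Project tokens (`bj-...`) are assumed not to work on `/v1/*`, because the reference lists them only for the platform API.

```python
# Hypothetical client-side helper; not part of any Bonito SDK.


def token_kind(token: str) -> str:
    """Classify a credential by its documented prefix."""
    if token.startswith("bn-"):
        return "gateway"
    if token.startswith("bp-"):
        return "pat"
    if token.startswith("bj-"):
        return "project"
    if token.startswith("eyJ"):  # heuristic: base64url-encoded JWT header
        return "session"
    raise ValueError("unrecognized token format")


# Credential kinds accepted by each endpoint family, per the Authentication list.
# Assumption: project tokens are only accepted on /api/*.
ALLOWED_KINDS = {
    "/v1/": {"gateway", "pat"},
    "/api/": {"session", "pat", "project"},
}


def auth_header(token: str, path: str) -> dict:
    """Return the Authorization header, or raise before a request would 401."""
    kind = token_kind(token)
    for prefix, kinds in ALLOWED_KINDS.items():
        if path.startswith(prefix):
            if kind not in kinds:
                raise ValueError(f"a {kind} token cannot call {path}")
            return {"Authorization": f"Bearer {token}"}
    raise ValueError(f"unknown endpoint family: {path}")
```

For example, `auth_header("bn-abc", "/api/agents")` raises `ValueError`, which is the common mistake called out above. `auth_header("bp-abc", "/api/agents")` returns the header, because a PAT works on both families.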
### Chat Completions (cURL)

```bash
curl https://api.getbonito.com/v1/chat/completions \
  -H "Authorization: Bearer bn-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
```

### Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.getbonito.com/v1",
    api_key="bn-YOUR-KEY",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

### Node.js (OpenAI SDK)

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.getbonito.com/v1",
  apiKey: "bn-YOUR-KEY",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(response.choices[0].message.content);
```

### Streaming

```python
# Reuses `client` from the Python example above
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. a final usage chunk) can arrive with no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```

### Image Generation

```python
# Reuses `client` from the Python example above
response = client.images.generate(
    model="dall-e-3",
    prompt="A futuristic cityscape at sunset",
    size="1024x1024",
    n=1,
)
image_url = response.data[0].url
```

### Video Generation (Async Polling)

```python
import time

import httpx

BASE = "https://api.getbonito.com/v1"
headers = {"Authorization": "Bearer bn-YOUR-KEY", "Content-Type": "application/json"}

# 1. Submit the generation job
resp = httpx.post(f"{BASE}/videos", headers=headers, json={
    "model": "vertex_ai/veo-3.0-generate-001",
    "prompt": "A drone shot of a coastal city at golden hour",
    "size": "1280x720",
    "seconds": "8",
})
resp.raise_for_status()
video_id = resp.json()["video_id"]

# 2. Poll until completed; a "failed" status (assumed, following OpenAI's
#    video job statuses) stops the loop instead of polling forever
while True:
    status = httpx.get(f"{BASE}/videos/{video_id}", headers=headers).json()
    if status["status"] == "completed":
        break
    if status["status"] == "failed":
        raise RuntimeError(f"Video generation failed: {status}")
    time.sleep(10)

# 3. Download the finished video
content = httpx.get(f"{BASE}/videos/{video_id}/content", headers=headers)
content.raise_for_status()
with open("output.mp4", "wb") as f:
    f.write(content.content)
```

### Supported API Endpoints

**Gateway endpoints (use a `bn-` gateway key or a `bp-` PAT):**

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completions (supports streaming) |
| `/v1/completions` | POST | Legacy text completions |
| `/v1/embeddings` | POST | Text embeddings |
| `/v1/images/generations` | POST | Image generation (DALL-E, Imagen) |
| `/v1/videos` | POST | Video generation (Sora, Veo) |
| `/v1/videos/{id}` | GET | Check video generation status |
| `/v1/videos/{id}/content` | GET | Download generated video |

**Platform API endpoints (use a JWT session token, PAT `bp-...`, or project token `bj-...`):**

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/auth/login` | POST | Get session token (JWT) |
| `/api/auth/me` | GET | Current user profile + subscription tier |
| `/api/providers` | GET/POST | Manage AI provider connections |
| `/api/knowledge-bases` | GET/POST | Manage knowledge bases |
| `/api/knowledge-bases/{id}/documents` | POST | Upload documents to a KB |
| `/api/agents` | GET/POST | Manage AI agents |
| `/api/agents/{id}/execute` | POST | Execute an agent |
| `/api/gateway/keys` | GET/POST | Manage gateway API keys |
| `/api/tokens` | GET/POST | Manage personal access tokens |
| `/api/tokens/{id}` | DELETE | Revoke a personal access token |
| `/api/projects/{id}/tokens` | GET/POST | Manage project tokens (Pro+, admin-only create/revoke) |
| `/api/subscriptions/current` | GET | Current subscription tier and usage |

### Available Models

Any model connected to your org is available. Common ones:

| Model | Provider | Use Case |
|-------|----------|----------|
| `gpt-4o` | OpenAI | Best general-purpose |
| `gpt-4o-mini` | OpenAI | Fast and cheap |
| `claude-sonnet-4-20250514` | Anthropic | Strong reasoning |
| `claude-3-5-haiku-20241022` | Anthropic | Fast and cheap |
| `gemini-2.5-pro` | Google Vertex AI | Multimodal |
| `us.anthropic.claude-sonnet-4-20250514-v1:0` | AWS Bedrock | Claude via Bedrock |
| `groq/llama-3.3-70b-versatile` | Groq | Ultra-fast open source |

### Environment Variables

```env
# Legacy name, read by openai-python < 1.0 and some older integrations
OPENAI_API_BASE=https://api.getbonito.com/v1
# Name read by the current OpenAI Python and Node SDKs
OPENAI_BASE_URL=https://api.getbonito.com/v1
OPENAI_API_KEY=bn-your-key-here
```

### CLI

```bash
pip install bonito-cli
bonito auth login
bonito models list
bonito providers list
bonito agents list
bonito deploy -f bonito.yaml
bonito usage summary
```

### MCP Server (Claude Desktop)

```bash
pip install bonito-mcp
```

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "bonito": {
      "command": "bonito-mcp",
      "args": ["--token", "bn-YOUR-KEY"]
    }
  }
}
```

Provides 18 tools for managing Bonito from Claude Desktop.

## Links

- Website: https://getbonito.com
- Pricing: https://getbonito.com/pricing
- Documentation: https://getbonito.com/docs
- Blog: https://getbonito.com/blog
- Compare: https://getbonito.com/compare
- Use Cases: https://getbonito.com/use-cases
- About: https://getbonito.com/about
- Contact: https://getbonito.com/contact
- Privacy: https://getbonito.com/privacy
- Terms: https://getbonito.com/terms