Bonito Docs
Everything you need to connect your cloud AI providers, deploy models, and route requests through a single gateway.
Getting Started
Bonito is a unified AI gateway that connects your AI providers (AWS Bedrock, Azure OpenAI, Google Cloud Vertex AI, OpenAI Direct, Anthropic Direct, and Groq) and lets you manage all your models from a single dashboard. You get one API endpoint, one place to track costs, and one control plane for your entire AI stack.
Quick start (5 minutes)
1. Sign up at getbonito.com/register — one account covers your entire organization.
2. Go to Providers → Add Provider and connect at least one cloud provider (AWS, Azure, or GCP).
3. Bonito validates your credentials and syncs all available models automatically.
4. Enable the models you want — click Enable on any model or use bulk activation for up to 20 at once.
5. Go to Gateway → Create Key to generate an API key.
6. Point any OpenAI-compatible SDK at https://getbonito.com/v1 with your new key.
# Make your first request through Bonito
curl -X POST https://getbonito.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_BONITO_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic.claude-3-sonnet-20240229-v1:0",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 256
}'
Provider Setup
Connect your cloud provider accounts so Bonito can discover your available models, route requests, and track costs. Each provider requires different credentials.
AWS Bedrock
To connect AWS, you need an IAM Access Key ID and Secret Access Key. Bonito validates them using STS and checks Bedrock permissions automatically.
| Field | Description |
|---|---|
| Access Key ID | Your IAM access key |
| Secret Access Key | Your IAM secret key |
Azure OpenAI
Azure requires a service principal with access to your Azure OpenAI resource. The endpoint must be a custom-subdomain URL (e.g., https://your-resource.openai.azure.com/), not a generic regional endpoint.
| Field | Description |
|---|---|
| Tenant ID | Azure AD tenant ID |
| Client ID | Service principal application ID |
| Client Secret | Service principal secret |
| Subscription ID | Your Azure subscription |
| Resource Group | Resource group with your OpenAI resource |
| Endpoint URL | Custom subdomain endpoint URL |
https://eastus.api.cognitive.microsoft.com/ will not work. You must use an Azure OpenAI resource with a custom subdomain.
Google Cloud (Vertex AI)
GCP requires your Project ID and a Service Account JSON key file. Paste the entire JSON contents — Bonito validates the format in the browser before sending.
| Field | Description |
|---|---|
| Project ID | Your GCP project ID |
| Service Account JSON | Full JSON key file contents |
OpenAI Direct
For teams using OpenAI directly (not through Azure). Connect with just an API key to access GPT-4o, GPT-4o mini, and other OpenAI models.
| Field | Description |
|---|---|
| API Key | Your API key from platform.openai.com |
Anthropic Direct
For teams that want to use Claude models directly through Anthropic without going through AWS Bedrock. Connect with just an API key.
| Field | Description |
|---|---|
| API Key | Your API key from console.anthropic.com |
Groq
Ultra-fast inference for open-source models like Llama 3.3 and Mixtral. Groq's LPU hardware delivers extremely low latency. Connect with just an API key.
| Field | Description |
|---|---|
| API Key | Your API key from console.groq.com |
Permissions & IAM
Bonito supports two IAM setup modes for every provider. Choose based on your security requirements.
Quick Start
Attach a single managed role with broad permissions. Fast to set up, ideal for evaluation and testing.
Enterprise (Recommended)
Separate least-privilege policies per capability. Only grant the exact permissions each feature needs.
AWS Bedrock permissions
In Enterprise mode, each capability has its own policy so you only grant what you need:
| Policy | Actions | Required? |
|---|---|---|
| Core | ListFoundationModels, GetFoundationModel, InvokeModel, InvokeModelWithResponseStream, sts:GetCallerIdentity | Always |
| Provisioning | Create/Get/Update/Delete/ListProvisionedModelThroughput | If deploying reserved capacity |
| Model Activation | PutFoundationModelEntitlement | If enabling models from Bonito UI |
| Cost Tracking | ce:GetCostAndUsage, GetCostForecast, GetDimensionValues, GetTags | If you want spend visibility |
Example IAM policy (core only — minimum to get started):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:ListFoundationModels",
"bedrock:GetFoundationModel",
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": ["sts:GetCallerIdentity"],
"Resource": "*"
}
]
}
Azure permissions
Quick Start: Assign Cognitive Services Contributor on the Azure OpenAI resource.
Enterprise: Create a custom role with only the exact permissions Bonito uses — account read, deployments read/write/delete, models read, and inference actions.
az role definition create --role-definition '{
"Name": "Bonito AI Operator",
"Actions": [
"Microsoft.CognitiveServices/accounts/read",
"Microsoft.CognitiveServices/accounts/deployments/read",
"Microsoft.CognitiveServices/accounts/deployments/write",
"Microsoft.CognitiveServices/accounts/deployments/delete",
"Microsoft.CognitiveServices/accounts/models/read"
],
"DataActions": [
"Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action",
"Microsoft.CognitiveServices/accounts/OpenAI/deployments/completions/action",
"Microsoft.CognitiveServices/accounts/OpenAI/deployments/embeddings/action"
],
"AssignableScopes": ["/subscriptions/YOUR_SUBSCRIPTION_ID"]
}'
Optionally add Cost Management Reader at subscription scope for spend visibility.
GCP permissions
Quick Start: Assign roles/aiplatform.user to the service account.
Enterprise: Create a custom role with discovery (publishers.get, publisherModels.get), invocation (endpoints.predict), endpoint management (create/get/list/update/delete/deploy/undeploy), model metadata (models.list, models.get), and project validation (resourcemanager.projects.get).
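The Enterprise custom role can be defined in a YAML file and created with `gcloud iam roles create --file`. The sketch below maps the capabilities listed above to permission strings; the exact permission names (especially the publisher-model ones) are assumptions — verify them against the GCP IAM permissions reference before applying:

```yaml
# bonito-role.yaml — least-privilege custom role sketch for Bonito.
# Create with (project ID is a placeholder):
#   gcloud iam roles create bonitoOperator --project=YOUR_PROJECT --file=bonito-role.yaml
title: Bonito AI Operator
stage: GA
includedPermissions:
  # Discovery (permission names assumed from the capability list above)
  - aiplatform.publisherModels.get
  # Invocation
  - aiplatform.endpoints.predict
  # Endpoint management
  - aiplatform.endpoints.create
  - aiplatform.endpoints.get
  - aiplatform.endpoints.list
  - aiplatform.endpoints.update
  - aiplatform.endpoints.delete
  - aiplatform.endpoints.deploy
  - aiplatform.endpoints.undeploy
  # Model metadata
  - aiplatform.models.list
  - aiplatform.models.get
  # Project validation
  - resourcemanager.projects.get
```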
Set iam_mode = "least_privilege" for Enterprise or "managed" for Quick Start.
Model Management
Once a provider is connected, Bonito automatically syncs all available models. You can view, search, filter, and enable models from a single catalog.
One-click model activation
Models with a 🔒 icon exist in your provider's catalog but aren't yet enabled in your cloud account. Instead of switching to each provider's console, enable them directly from Bonito:
1. Go to the Models page and find the model you want to enable.
2. Click the Enable button on the model card.
3. Bonito handles the provider-specific activation (Bedrock entitlements, Azure deployments, GCP API enablement).
4. Some models may require approval from the provider and won't activate instantly.
Playground
Test any enabled chat model directly in the browser. The Playground supports single-model chat and side-by-side comparison mode (up to 4 models). Token usage and cost appear after each response. Only chat-capable, enabled models are shown in the picker.
Deployments
Deploy AI models directly into your cloud from the Bonito UI — no console-hopping required. Bonito creates real deployments in your cloud account.
| Provider | Deployment Type | What Bonito Creates |
|---|---|---|
| AWS Bedrock | On-demand or Provisioned Throughput | On-demand: validates access. PT: creates reserved capacity with commitment (1 week–6 months) |
| Azure OpenAI | Model deployment with TPM capacity | Creates a deployment on your Azure OpenAI resource (Standard or GlobalStandard tier) |
| GCP Vertex AI | Serverless (no provisioning needed) | Verifies access — GCP models are serverless by default |
Note: Provisioned Throughput deployments require the bedrock:CreateProvisionedModelThroughput IAM permission.
Gateway API
Bonito provides an OpenAI-compatible API endpoint so you can use any connected model with tools that support the OpenAI format. One API key, all your providers.
Endpoint
POST https://getbonito.com/v1/chat/completions
Authentication
Generate API keys from the Gateway page in the dashboard. Include your key in the Authorization header:
Authorization: Bearer YOUR_BONITO_API_KEY
Example: Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
base_url="https://getbonito.com/v1",
api_key="YOUR_BONITO_API_KEY"
)
response = client.chat.completions.create(
model="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
Example: curl
curl https://getbonito.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_BONITO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic.claude-3-sonnet-20240229-v1:0",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 256
}'
Model names
Use the provider-native model IDs shown on the Models page in Bonito. For example: anthropic.claude-3-sonnet-20240229-v1:0 for AWS Bedrock, gpt-4o for Azure, gemini-1.5-pro for GCP.
When using a routing policy, pass the policy name as the model field instead of a specific model ID.
Routing Policies
Routing policies let you automatically select the best model for each request based on your priorities. Create policies from Routing → Create Policy in the dashboard.
Cost-Optimized
Automatically selects the cheapest capable model for each request. Route routine traffic to economy models and save 40–70% versus using a single premium model for everything.
Failover Chain
Define a primary model and one or more fallbacks. If the primary fails or is unavailable, Bonito automatically tries the next model in the chain. Great for high-availability use cases.
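The failover behavior described above can be sketched in a few lines. This is an illustration of the chain semantics, not Bonito's implementation — `send` stands in for whatever function actually makes the provider request:

```python
def call_with_failover(chain, send):
    """Try each model in the chain in order; return (model, response) for
    the first one that succeeds. `chain` is a list of model IDs and `send`
    is a callable that makes the request and raises on failure."""
    last_error = None
    for model in chain:
        try:
            return model, send(model)
        except Exception as exc:  # provider error, timeout, rate limit, ...
            last_error = exc
    raise RuntimeError(f"all models in chain failed: {last_error}")
```

With a chain like `["primary-model", "fallback-model"]`, a timeout on the primary transparently yields the fallback's response.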
A/B Testing
Split traffic between models using percentage weights (must sum to 100). Test new models in production with controlled rollout — e.g., send 90% to your current model and 10% to a new one.
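A sketch of the weight rules above — weights must sum to 100, and hashing the request ID gives each request a sticky bucket so the same caller keeps hitting the same arm. This illustrates the percentage-split semantics, not Bonito's internal routing code:

```python
import hashlib

def validate_weights(weights):
    # Bonito requires A/B weights to sum to exactly 100
    if sum(weights.values()) != 100:
        raise ValueError("A/B weights must sum to 100")

def pick_model(weights, request_id):
    """Deterministically bucket a request into one of the weighted models."""
    validate_weights(weights)
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    cumulative = 0
    for model, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return model
```

For the 90/10 rollout example, `{"current-model": 90, "new-model": 10}` sends roughly 10% of request IDs to the new model.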
Cross-Region Inference
Bonito automatically creates cross-region inference profiles for AWS Bedrock models. If us-east-1 is overloaded or experiencing downtime, requests seamlessly route to us-west-2 or eu-west-1 without any action on your part.
How it works
- Bonito handles the us. prefix routing at the gateway level, so cross-region inference profiles are created and managed automatically.
- No configuration needed — just connect your AWS provider and Bonito takes care of the rest.
- If a primary region is throttled or returns errors, requests are retried in an alternate region transparently.
True high-availability
Cross-region inference combines with multi-provider failover for true high-availability. For example, if Claude on Bedrock fails across all regions, Bonito can fall back to Anthropic Direct automatically. No other platform (Portkey, LiteLLM, Helicone) offers automatic cross-region inference profiles with intelligent failover built in.
Model Aliases
Bonito supports shorthand model aliases that resolve to provider-specific versioned model IDs. This means you can reference models by simple names and switch providers without changing your code.
How aliases work
When you send a request with an alias like claude-sonnet, Bonito resolves it to the correct model ID for your active provider. On Bedrock, that becomes anthropic.claude-3-sonnet-20240229-v1:0. On Anthropic Direct, it becomes claude-3-sonnet-20240229.
- Switch providers without changing a single line of code in your application.
- Aliases are updated automatically when new model versions are released.
- Works with routing policies — aliases resolve before routing rules are applied.
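The resolution step can be pictured as a lookup table keyed by alias and provider. The table below is a hypothetical sketch using the claude-sonnet example from this section; Bonito maintains the real mapping server-side:

```python
# Hypothetical alias table mirroring the resolution described above
ALIASES = {
    "claude-sonnet": {
        "bedrock": "anthropic.claude-3-sonnet-20240229-v1:0",
        "anthropic": "claude-3-sonnet-20240229",
    },
}

def resolve(model, provider):
    """Resolve an alias to the provider-specific model ID.
    Unknown names pass through unchanged, so provider-native IDs keep working."""
    return ALIASES.get(model, {}).get(provider, model)
```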
from openai import OpenAI
client = OpenAI(
base_url="https://getbonito.com/v1",
api_key="YOUR_BONITO_API_KEY"
)
# Use an alias instead of a provider-specific model ID
response = client.chat.completions.create(
model="claude-sonnet", # Resolves automatically
messages=[{"role": "user", "content": "Hello!"}]
)
Notifications
Bonito sends in-app notifications for important events across the platform so you never miss a deployment status change or cost alert.
Notification types
- Deployment lifecycle — creation, scaling, completion, and failure alerts for deployments across all providers.
- Spend alerts — get notified when costs approach or exceed your configured budget thresholds.
- Model activation — confirmation when models are enabled or if activation requires provider approval.
- Provider health — alerts when a provider connection has issues or needs credential rotation.
The notification bell in the dashboard header shows your unread count. Click to see the full list with read/unread states. You can configure alert rules for budget thresholds with email and in-app delivery preferences.
Cost Management
Monitor AI spending across all connected providers from a single dashboard. Bonito pulls real cost data from your cloud accounts and shows breakdowns by model, provider, and time period.
What you get
- Aggregated costs across AWS, Azure, and GCP with daily/weekly/monthly views.
- Cost forecast with projected spending and confidence bounds.
- Per-model and per-provider breakdowns to identify expensive workloads.
- Budget alerts — set thresholds and get notified before you exceed them.
- Optimization recommendations — Bonito suggests cheaper model alternatives and cross-provider routing opportunities.
CLI Tool
bonito-cli is a Python CLI for managing your Bonito resources from the terminal. It's useful for scripting, CI/CD pipelines, and terminal-first workflows.
Installation
pip install bonito-cli
Authentication
# Login with your Bonito credentials
bonito auth login

# Or set your API key directly
export BONITO_API_KEY=your-key-here
Common commands
# List connected providers
bonito providers list

# List available models
bonito models list

# Create a gateway API key
bonito gateway keys create --name "my-key"

# List routing policies
bonito routing list

# Check costs
bonito costs summary
Run bonito --help for the full list of commands and options.
Integrations
Claude Cowork
Claude Cowork is Anthropic's agentic desktop application. The Bonito plugin gives Claude deep knowledge of your AI infrastructure, so you can deploy providers, create agents, configure routing, and analyze costs through natural conversation.
Install the Plugin
Install from the Claude plugin marketplace or via Claude Code:
claude plugin install bonito
What the Plugin Adds
6 domain skills that Claude draws on automatically when relevant:
- deploy-stack: Deploy infrastructure from a bonito.yaml config
- manage-providers: Connect and manage cloud AI providers
- create-agent: Create BonBon agents and Bonobot orchestrators
- gateway-routing: Configure failover, cost-optimized routing, A/B testing
- cost-analysis: Analyze spending and recommend optimizations
- debug-issues: Troubleshoot gateway, provider, and agent problems
MCP Server
The plugin connects to the Bonito MCP server, which exposes 18 tools for direct API access. Install via PyPI or Docker:
pip install bonito-mcp
Or run with Docker:
docker run -e BONITO_API_KEY=your-key -p 8080:8080 bonitoai/mcp-server
Claude Desktop Configuration
Add this to your Claude Desktop MCP config:
{
"mcpServers": {
"bonito": {
"command": "bonito-mcp",
"env": {
"BONITO_API_KEY": "your-bonito-api-key"
}
}
}
}
Declarative Config (bonito.yaml)
Define your entire AI stack in a single bonito.yaml file. Providers, agents, MCP servers, knowledge bases, routing, and triggers — all in one place. Deploy with a single command.
bonito deploy -f bonito.yaml
Example: AWS Bedrock stack
A complete stack using AWS Bedrock with two agents and a knowledge base:
version: "1"
name: my-ai-stack
gateway:
providers:
- name: aws
priority: 1
models:
- anthropic.claude-3-sonnet-20240229-v1:0
- amazon.nova-pro-v1:0
- amazon.titan-embed-text-v2:0
region: us-east-1
access_key: ${AWS_ACCESS_KEY_ID}
secret_key: ${AWS_SECRET_ACCESS_KEY}
routing:
strategy: cost-optimized
fallback: true
agents:
support-bot:
type: bonbon
mode: simple
display_name: Support Agent
model:
primary: anthropic.claude-3-sonnet-20240229-v1:0
fallback: amazon.nova-pro-v1:0
system_prompt: |
You are a helpful support agent...
rag:
knowledge_base: company-docs
code-reviewer:
type: bonbon
mode: advanced
display_name: Code Reviewer
model:
primary: anthropic.claude-3-sonnet-20240229-v1:0
mcp_servers:
- github
knowledge_bases:
company-docs:
description: Internal documentation
sources:
- type: directory
path: ./docs/
glob: "**/*.md"
embedding:
model: amazon.titan-embed-text-v2:0
provider: aws
Example: Multi-provider stack
A stack using Groq, Anthropic, OpenAI, and GCP with multi-agent orchestration:
version: "1"
name: multi-provider-stack
gateway:
providers:
- name: groq
priority: 1
models: [llama-3.3-70b-versatile, mixtral-8x7b-32768]
api_key: ${GROQ_API_KEY}
- name: anthropic
priority: 1
models: [claude-sonnet-4-20250514]
api_key: ${ANTHROPIC_API_KEY}
- name: openai
priority: 2
models: [gpt-4o, gpt-4o-mini]
api_key: ${OPENAI_API_KEY}
- name: gcp
priority: 2
models: [gemini-1.5-pro, gemini-1.5-flash]
project_id: ${GCP_PROJECT_ID}
service_account: ${GCP_SERVICE_ACCOUNT_JSON}
routing:
strategy: cost-optimized
fallback: true
retry_attempts: 2
agents:
orchestrator:
type: bonobot
display_name: Operations Center
model:
primary: claude-sonnet-4-20250514
fallback: gpt-4o
delegates:
- agent: fast-responder
domains: [triage, alerts, summaries]
- agent: deep-analyst
domains: [code-review, analysis, research]
fast-responder:
type: bonbon
mode: simple
model:
primary: groq/llama-3.3-70b-versatile
system_prompt: |
You are a fast-response agent for triage...
deep-analyst:
type: bonbon
mode: advanced
model:
primary: claude-sonnet-4-20250514
fallback: gpt-4o
mcp_servers:
- github
- jira
Use environment variable references like ${AWS_ACCESS_KEY_ID} to keep secrets out of your config file. Bonito resolves them at deploy time from your environment or a .env file.
BonBon Agents
BonBon is Bonito's managed agent service. Create AI agents with custom system prompts, connect them to knowledge bases for RAG, and deploy them as embeddable chat widgets or API endpoints — all without managing infrastructure.
Agent tiers
Simple — $49/mo
Pre-built agent with a system prompt, optional knowledge base, and an embeddable widget. Ideal for FAQ bots, customer support, and internal assistants. Deploy in minutes.
Advanced — $99/mo
Agent with MCP tool integration, multiple knowledge bases, webhook triggers, and custom workflows. Built for agents that need to interact with external systems.
Creating an agent
1. Go to Agents → Create Agent in the dashboard.
2. Choose a tier (Simple or Advanced) and give your agent a name.
3. Write a system prompt that defines your agent's personality, constraints, and behavior.
4. Optionally attach a knowledge base for RAG-powered responses.
5. Select the backing model (or models for Advanced tier).
6. Deploy — Bonito gives you a widget embed code and an API endpoint.
# Create an agent via CLI
bonito agents create \
  --name "Support Bot" \
  --tier simple \
  --model "anthropic.claude-3-sonnet-20240229-v1:0" \
  --system-prompt "You are a helpful support agent for Acme Corp..."

# List your agents
bonito agents list

# Get agent details
bonito agents get --id ag_abc123
System prompts
System prompts define how your agent behaves. Write clear instructions about the agent's role, tone, constraints, and what it should or shouldn't do. You can update the system prompt at any time without redeploying.
You are a customer support agent for Acme Corp.

Rules:
- Only answer questions about Acme products and services.
- If you don't know the answer, say so and offer to connect with a human.
- Be friendly, concise, and professional.
- Never make up pricing or feature information — use the knowledge base.
- Respond in the same language as the customer.
RAG integration
Connect a knowledge base to give your agent access to your documents. When a user asks a question, Bonito retrieves relevant chunks from your KB and includes them in the context before the model generates a response.
# Attach a knowledge base to an agent
bonito agents update --id ag_abc123 \
  --knowledge-base kb_xyz789

# Attach multiple KBs (Advanced tier only)
bonito agents update --id ag_abc123 \
  --knowledge-base kb_xyz789 \
  --knowledge-base kb_docs456
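The retrieve-then-generate flow described above can be sketched with a deliberately naive scorer. Bonito's actual retrieval uses embeddings; this stand-in ranks chunks by word overlap with the question purely to illustrate how retrieved chunks end up in the model's context:

```python
def build_rag_prompt(question, chunks, top_k=5):
    """Rank chunks by word overlap with the question (a toy stand-in for
    embedding similarity) and prepend the top-k to the model context."""
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    context = "\n\n".join(scored[:top_k])
    return f"Use the following context to answer.\n\n{context}\n\nQuestion: {question}"
```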
Widget embedding
Every BonBon agent gets an embeddable chat widget. Add it to any website with a single script tag:
<!-- Add to your website -->
<script
  src="https://getbonito.com/widget.js"
  data-agent-id="ag_abc123"
  data-theme="dark"
  data-position="bottom-right"
  async
></script>
You can also call the agent's API endpoint directly: POST /v1/agents/ag_abc123/chat with the same OpenAI-compatible format.
Bonobot Orchestrator
Bonobot is Bonito's multi-agent orchestration layer. It acts as a front-door agent that classifies user intent, delegates to specialized sub-agents, and synthesizes their responses into a unified reply. Think of it as a dispatcher that routes conversations to the right expert.
How it works
1. A user sends a message to the Bonobot endpoint.
2. The orchestrator classifies the user's intent using a fast classification model.
3. Based on the intent, it delegates the request to one or more specialized BonBon agents.
4. Each sub-agent processes its part using its own system prompt, model, and knowledge base.
5. Bonobot synthesizes the responses and returns a single, coherent answer.
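The delegation step can be sketched as a lookup against the delegation map, with the "general" entry acting as the catch-all. This mirrors the single-intent case; multi-delegation (covered under Response synthesis) would return several agents:

```python
def delegate(intent, delegation_map):
    """Return the agent ID for a classified intent, falling back to the
    'general' entry when no specific intent matches."""
    entry = delegation_map.get(intent) or delegation_map.get("general")
    if entry is None:
        raise KeyError(f"no agent for intent {intent!r} and no 'general' fallback")
    return entry["agent_id"]
```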
Delegation map
The delegation map defines which sub-agents handle which intents. Configure it as a JSON mapping of intent patterns to agent IDs:
{
"delegation_map": {
"billing": {
"agent_id": "ag_billing01",
"description": "Handles billing, invoices, and payment questions"
},
"technical_support": {
"agent_id": "ag_techsup01",
"description": "Handles technical issues, bugs, and troubleshooting"
},
"sales": {
"agent_id": "ag_sales01",
"description": "Handles pricing, demos, and feature inquiries"
},
"general": {
"agent_id": "ag_general01",
"description": "Fallback for anything that doesn't match a specific intent"
}
}
}
Creating a Bonobot
# Create an orchestrator
bonito bonobot create \
  --name "Customer Hub" \
  --classifier-model "anthropic.claude-3-haiku-20240307-v1:0" \
  --delegation-map ./delegation.json

# Update the delegation map
bonito bonobot update --id bot_abc123 \
  --delegation-map ./updated-delegation.json

# Test intent classification
bonito bonobot classify --id bot_abc123 \
  --message "I need a refund for my last invoice"
Response synthesis
When a request touches multiple intents (e.g., "I want to upgrade my plan and fix a bug"), Bonobot can delegate to multiple agents in parallel and merge their responses. Enable multi-delegation in the orchestrator settings:
{
"multi_delegation": true,
"synthesis_model": "anthropic.claude-3-sonnet-20240229-v1:0",
"synthesis_prompt": "Combine the following specialist responses into a single, coherent answer."
}
Code Review
Bonito's GitHub App provides AI-powered code review on pull requests. Install it from the Bonito dashboard, connect your GitHub repos, and get automatic reviews on every PR.
What it does
- Automatically reviews PRs for security vulnerabilities, performance issues, and code quality.
- Posts structured findings directly as PR comments with clear explanations and suggestions.
- Multiple review personas available — default professional tone, or fun characters for a lighter touch.
- Configure which repos and branches trigger automatic reviews from the dashboard.
Getting started
1. Go to Code Review in the Bonito dashboard.
2. Click Install GitHub App and authorize access to your repositories.
3. Select which repos and branches should trigger automatic reviews.
4. Open a pull request — Bonito reviews it and posts comments within minutes.
MCP Integration
MCP (Model Context Protocol) lets your BonBon agents call external tools — databases, APIs, code execution, file systems, and more. Register MCP servers with Bonito and connect them to your agents so they can take actions, not just answer questions.
Supported transports
SSE (Server-Sent Events)
Connect to remote MCP servers over HTTP. Best for hosted tools, third-party integrations, and production deployments.
stdio (Standard I/O)
Run MCP servers as local subprocesses. Best for development, local tools, and self-hosted servers.
Registering an MCP server
{
"name": "github-tools",
"transport": "sse",
"url": "https://mcp.example.com/github/sse",
"headers": {
"Authorization": "Bearer ghp_xxxxxxxxxxxx"
},
"tools": ["create_issue", "search_repos", "get_pull_request"]
}
# Register via CLI
bonito mcp register \
--name "github-tools" \
--transport sse \
--url "https://mcp.example.com/github/sse"
# List registered MCP servers
bonito mcp list
# Test a tool call
bonito mcp test --server "github-tools" --tool "search_repos" \
--params '{"query": "bonito"}'
stdio configuration
For stdio-based MCP servers, provide the command and arguments to launch the server process:
{
"name": "sqlite-tools",
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-sqlite", "./data/mydb.sqlite"],
"env": {
"NODE_ENV": "production"
}
}
Connecting MCP to agents
Once registered, attach MCP servers to your BonBon agents. The agent's model will automatically see the available tools and can call them during conversations:
# Connect MCP server to an agent
bonito agents update --id ag_abc123 \
  --mcp-server "github-tools" \
  --mcp-server "sqlite-tools"

# Remove an MCP server from an agent
bonito agents update --id ag_abc123 \
  --remove-mcp-server "sqlite-tools"
Knowledge Bases
Knowledge Bases power RAG (Retrieval-Augmented Generation) for your BonBon agents. Upload documents, and Bonito chunks, embeds, and indexes them so your agents can retrieve relevant context at query time.
Creating a knowledge base
1. Go to Knowledge Bases → Create in the dashboard, or use the CLI.
2. Give it a name and optional description.
3. Choose an embedding model (defaults to a high-quality model on your connected providers).
4. Configure chunking strategy (size, overlap).
5. Upload your documents.
# Create a knowledge base
bonito kb create --name "Product Docs" \
  --embedding-model "amazon.titan-embed-text-v2:0" \
  --chunk-size 512 \
  --chunk-overlap 50

# Upload documents
bonito kb upload --id kb_xyz789 ./docs/*.pdf
bonito kb upload --id kb_xyz789 ./faq.md
bonito kb upload --id kb_xyz789 https://example.com/api-reference.html

# Check indexing status
bonito kb status --id kb_xyz789
Supported file formats
Bonito accepts PDF, Markdown, plain text, HTML, DOCX, and CSV files. Each file is parsed, split into chunks, and embedded using your chosen embedding model.
Chunking strategies
| Strategy | Description | Best For |
|---|---|---|
| fixed | Split by token count with configurable overlap | General purpose, predictable chunk sizes |
| semantic | Split at natural boundaries (paragraphs, sections) | Long-form documents, articles |
| sentence | Split by sentence with grouping | FAQ, short-form content |
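The fixed strategy is easy to picture: windows of `chunk-size` tokens, each overlapping the previous by `chunk-overlap` tokens. A minimal sketch (the defaults mirror the CLI example above; Bonito's real chunker operates on parsed documents, not raw token lists):

```python
def chunk_fixed(tokens, size=512, overlap=50):
    """Split a token list into fixed-size windows with the given overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # how far each new window advances
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]
```

With the defaults, a 1,200-token document yields three chunks, and the last 50 tokens of each chunk reappear at the start of the next so no sentence is cut off without context.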
Embedding models
Bonito uses embedding models from your connected providers. Any enabled embedding model can be used:
- AWS Bedrock: amazon.titan-embed-text-v2:0, cohere.embed-english-v3
- Azure OpenAI: text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
- GCP Vertex AI: textembedding-gecko, text-embedding-004
Querying
You can query a knowledge base directly to test retrieval before connecting it to an agent:
# Query a knowledge base directly
bonito kb query --id kb_xyz789 \
  --query "What is the refund policy?" \
  --top-k 5

# Connect KB to an agent (see BonBon Agents section)
bonito agents update --id ag_abc123 \
  --knowledge-base kb_xyz789
Managed Inference
Managed Inference gives you zero-config access to AI models without connecting any cloud provider. No API keys, no cloud accounts, no setup — just start making requests. Bonito handles provider selection, routing, and billing.
How it works
1. Sign up for Bonito and create a gateway API key — that's it.
2. Use any supported model by name in your API requests.
3. Bonito routes your request to the optimal provider automatically.
4. You're billed through Bonito based on token usage — no separate cloud bills.
from openai import OpenAI
# No cloud provider setup needed — just your Bonito key
client = OpenAI(
base_url="https://getbonito.com/v1",
api_key="YOUR_BONITO_API_KEY"
)
# Use any supported model
response = client.chat.completions.create(
model="claude-3-sonnet", # Bonito routes to the best provider
messages=[{"role": "user", "content": "Explain quantum computing"}]
)
print(response.choices[0].message.content)
Supported models
Managed Inference supports a curated set of popular models across providers. Use simplified model names — Bonito resolves them to provider-specific IDs:
| Model | Provider | Use Case |
|---|---|---|
| claude-3-sonnet | Anthropic | General purpose, balanced performance |
| claude-3-haiku | Anthropic | Fast, cost-effective tasks |
| gpt-4o | OpenAI | Multimodal, high performance |
| gpt-4o-mini | OpenAI | Lightweight, budget-friendly |
| gemini-1.5-pro | Google | Long context, multimodal |
When to use Managed vs BYOC
Managed Inference
- Quick start — no cloud setup
- Don't want to manage API keys
- Prototyping and development
- Teams without cloud accounts
Bring Your Own Cloud
- Data residency requirements
- Existing cloud commitments/discounts
- Full provider control
- Enterprise compliance needs
Triggers
Triggers let you invoke BonBon agents automatically based on events — incoming webhooks, cron schedules, slash commands, or custom events. Instead of waiting for a user to open a chat widget, triggers bring your agents into workflows programmatically.
Trigger types
Webhook
HTTP endpoint that invokes your agent when called. Connect to GitHub, Stripe, Slack, or any service that sends webhooks. The request payload is passed as context to the agent.
Scheduled (Cron)
Run your agent on a schedule using cron expressions. Great for daily reports, periodic data processing, health checks, and recurring tasks.
Slash Command
Register slash commands in Slack or Discord that invoke your agent. Users type /ask-support how do I reset my password? and get an agent response inline.
Event
Trigger agents from internal Bonito events — new document indexed in a KB, agent error threshold exceeded, or cost alert fired.
Creating triggers
# Create a webhook trigger
bonito triggers create \
  --agent-id ag_abc123 \
  --type webhook \
  --name "GitHub PR Review"
# Output: Webhook URL → https://getbonito.com/hooks/tr_wh_abc123

# Create a scheduled trigger (daily at 9 AM UTC)
bonito triggers create \
  --agent-id ag_abc123 \
  --type cron \
  --schedule "0 9 * * *" \
  --name "Daily Summary" \
  --input "Generate a summary of yesterday's support tickets"

# Create a slash command trigger
bonito triggers create \
  --agent-id ag_abc123 \
  --type slash-command \
  --platform slack \
  --command "/ask-support" \
  --name "Slack Support"

# List triggers for an agent
bonito triggers list --agent-id ag_abc123
Webhook payload
When a webhook trigger fires, the HTTP request body is passed to the agent as context. You can define a template to extract specific fields:
{
"trigger_id": "tr_wh_abc123",
"payload_template": "New {{event}} from {{repository.full_name}}: {{pull_request.title}}",
"headers_to_forward": ["X-GitHub-Event"],
"secret": "whsec_xxxxxxxx"
}
Set the secret field to verify webhook signatures. Bonito validates the HMAC signature on incoming requests and rejects unverified payloads.
Observability
Bonito provides built-in observability for every request flowing through the gateway and every agent interaction. Track tokens, latency, costs, and errors across your entire AI stack without any additional tooling.
Request tracing
Every API request gets a unique trace ID. View the full lifecycle of a request — from gateway receipt through provider routing to response delivery:
# View recent requests
bonito logs list --limit 20

# Get details for a specific trace
bonito logs get --trace-id tr_xxxxxxxxxxxx

# Filter by model, status, or time range
bonito logs list \
  --model "claude-3-sonnet" \
  --status error \
  --since "2024-01-15T00:00:00Z"
What's tracked
| Metric | Description |
|---|---|
| Tokens (in/out) | Input and output token counts per request |
| Latency | Time to first token (TTFT) and total response time |
| Cost | Estimated cost per request based on provider pricing |
| Model | Which model and provider served the request |
| Status | Success, error, rate-limited, or timed-out |
| Agent | Which BonBon agent handled the request (if applicable) |
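The per-request cost estimate in the table above is a function of token counts and per-token prices. A sketch of the arithmetic — the price table here is illustrative, not Bonito's live pricing data:

```python
# Hypothetical per-1K-token prices (input, output) in USD, for illustration only
PRICES_PER_1K = {"claude-3-sonnet": (0.003, 0.015)}

def estimate_cost(model, tokens_in, tokens_out):
    """Estimate request cost from token counts and per-1K-token prices."""
    p_in, p_out = PRICES_PER_1K[model]
    return tokens_in / 1000 * p_in + tokens_out / 1000 * p_out
```

At these example rates, a request with 1,000 input and 1,000 output tokens would be estimated at $0.018.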
Per-agent analytics
Each BonBon agent has its own analytics dashboard showing conversation volume, average response time, token usage, cost breakdown, and error rates over time.
# Get agent analytics
bonito agents analytics --id ag_abc123 \
  --period 7d

# Export analytics as CSV
bonito agents analytics --id ag_abc123 \
  --period 30d \
  --format csv > agent-analytics.csv
Cost monitoring
Observability integrates with Cost Management to show real-time spend. Set budget alerts per agent, per model, or globally:
# Set a per-agent budget alert
bonito alerts create \
  --scope agent \
  --agent-id ag_abc123 \
  --threshold 500 \
  --period monthly \
  --notify email,in-app

# Set a global daily spend alert
bonito alerts create \
  --scope global \
  --threshold 100 \
  --period daily
Troubleshooting
"Connected! Found 0 models"
Your credentials connected successfully, but model listing failed silently. Check:
- Azure: Make sure your Endpoint URL is an Azure OpenAI resource endpoint with a custom subdomain, not a generic regional endpoint. Also verify the resource group is correct.
- GCP: Ensure the Vertex AI API is enabled in your project.
- AWS: Verify your IAM user has Bedrock permissions and you're in a region where Bedrock is available.
Models showing a 🔒 lock icon
The model exists in your provider's catalog but isn't enabled. Click the Enable button in Bonito, or enable it directly in your cloud console (AWS Bedrock → Model Access, Azure → create a deployment, GCP → enable the Vertex AI API).
Playground returns a 500 error
This typically means the model isn't a chat model (embedding or completion-only models can't be used in the Playground), the model isn't enabled in your account, or the model isn't available in your region.
Rate limit or timeout errors
Wait 30–60 seconds and try again. This can happen when making many rapid changes. If 502 errors persist, the backend service may need attention — contact support@getbonito.com.
Need more help?
Can't find what you're looking for? Our team is here to help.