Documentation

Bonito Docs

Everything you need to connect your cloud AI providers, deploy models, and route requests through a single gateway.

Getting Started

Bonito is a unified AI gateway that connects your AI providers (AWS Bedrock, Azure OpenAI, Google Cloud Vertex AI, OpenAI Direct, Anthropic Direct, and Groq) and lets you manage all your models from a single dashboard. You get one API endpoint, one place to track costs, and one control plane for your entire AI stack.

Quick start (5 minutes)

  1. Sign up at getbonito.com/register — one account covers your entire organization.
  2. Go to Providers → Add Provider and connect at least one cloud provider (AWS, Azure, or GCP).
  3. Bonito validates your credentials and syncs all available models automatically.
  4. Enable the models you want — click Enable on any model or use bulk activation for up to 20 at once.
  5. Go to Gateway → Create Key to generate an API key.
  6. Point any OpenAI-compatible SDK at getbonito.com/v1 with your new key.
# Make your first request through Bonito
curl -X POST https://getbonito.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_BONITO_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-sonnet-20240229-v1:0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
  }'
Tip
Bonito's gateway is fully compatible with the OpenAI Chat Completions format. Any tool that supports OpenAI (LangChain, LlamaIndex, custom apps) works out of the box — just change the base URL and API key.

Provider Setup

Connect your cloud provider accounts so Bonito can discover your available models, route requests, and track costs. Each provider requires different credentials.

AWS Bedrock

To connect AWS, you need an IAM Access Key ID and Secret Access Key. Bonito validates them using STS and checks Bedrock permissions automatically.

Field               Description
Access Key ID       Your IAM access key
Secret Access Key   Your IAM secret key

Azure OpenAI

Azure requires a service principal with access to your Azure OpenAI resource. The endpoint must be a custom-subdomain URL (e.g., https://your-resource.openai.azure.com/), not a generic regional endpoint.

Field             Description
Tenant ID         Azure AD tenant ID
Client ID         Service principal application ID
Client Secret     Service principal secret
Subscription ID   Your Azure subscription
Resource Group    Resource group with your OpenAI resource
Endpoint URL      Custom subdomain endpoint URL
Warning
A generic regional endpoint like https://eastus.api.cognitive.microsoft.com/ will not work. You must use an Azure OpenAI resource with a custom subdomain.
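A quick way to check an endpoint before saving it (a heuristic sketch, not Bonito's actual validator):

```python
from urllib.parse import urlparse

def is_custom_subdomain_endpoint(url: str) -> bool:
    """Heuristic: custom-subdomain Azure OpenAI endpoints end in
    .openai.azure.com; generic regional endpoints do not."""
    host = urlparse(url).hostname or ""
    return host.endswith(".openai.azure.com")

print(is_custom_subdomain_endpoint("https://your-resource.openai.azure.com/"))    # True
print(is_custom_subdomain_endpoint("https://eastus.api.cognitive.microsoft.com/"))  # False
```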

Google Cloud (Vertex AI)

GCP requires your Project ID and a Service Account JSON key file. Paste the entire JSON contents — Bonito validates the format in the browser before sending.

Field                  Description
Project ID             Your GCP project ID
Service Account JSON   Full JSON key file contents
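The browser-side format check amounts to something like this sketch (an illustration of the idea, not Bonito's actual code; the field names are the ones found in any GCP service-account key file):

```python
import json

# Fields present in every GCP service-account key file
REQUIRED_FIELDS = {"type", "project_id", "private_key_id", "private_key", "client_email"}

def looks_like_service_account_key(raw: str) -> bool:
    """Parse the pasted text and confirm it has the shape of a
    service-account key (a format check only, not a credential test)."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return False
    if not isinstance(data, dict):
        return False
    return REQUIRED_FIELDS <= data.keys() and data.get("type") == "service_account"

sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key_id": "abc123",
    "private_key": "-----BEGIN PRIVATE KEY-----...",
    "client_email": "bonito@my-project.iam.gserviceaccount.com",
})
print(looks_like_service_account_key(sample))          # True
print(looks_like_service_account_key('{"type": "user"}'))  # False
```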
Tip
You can update credentials at any time without re-entering everything. Go to Providers → click a provider → change only the fields you need. Blank fields keep their current values.

OpenAI Direct

For teams using OpenAI directly (not through Azure). Connect with just an API key to access GPT-4o, GPT-4o mini, and other OpenAI models.

Field     Description
API Key   Your API key from platform.openai.com

Anthropic Direct

For teams that want to use Claude models directly through Anthropic without going through AWS Bedrock. Connect with just an API key.

Field     Description
API Key   Your API key from console.anthropic.com

Groq

Ultra-fast inference for open-source models like Llama 3.3 and Mixtral. Groq's LPU hardware delivers the lowest latency in the industry. Connect with just an API key.

Field     Description
API Key   Your API key from console.groq.com

Permissions & IAM

Bonito supports two IAM setup modes for every provider. Choose based on your security requirements.

Quick Start

Attach a single managed role with broad permissions. Fast to set up, ideal for evaluation and testing.

Enterprise (Recommended)

Separate least-privilege policies per capability. Only grant the exact permissions each feature needs.

AWS Bedrock permissions

In Enterprise mode, each capability has its own policy so you only grant what you need:

Policy             Actions                                                   Required?
Core               ListFoundationModels, GetFoundationModel, InvokeModel,    Always
                   InvokeModelWithResponseStream, sts:GetCallerIdentity
Provisioning       Create/Get/Update/Delete/ListProvisionedModelThroughput   If deploying reserved capacity
Model Activation   PutFoundationModelEntitlement                             If enabling models from Bonito UI
Cost Tracking      ce:GetCostAndUsage, GetCostForecast,                      If you want spend visibility
                   GetDimensionValues, GetTags

Example IAM policy (core only — minimum to get started):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:GetFoundationModel",
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["sts:GetCallerIdentity"],
      "Resource": "*"
    }
  ]
}

Azure permissions

Quick Start: Assign Cognitive Services Contributor on the Azure OpenAI resource.

Enterprise: Create a custom role with only the exact permissions Bonito uses — account read, deployments read/write/delete, models read, and inference actions.

az role definition create --role-definition '{
  "Name": "Bonito AI Operator",
  "Actions": [
    "Microsoft.CognitiveServices/accounts/read",
    "Microsoft.CognitiveServices/accounts/deployments/read",
    "Microsoft.CognitiveServices/accounts/deployments/write",
    "Microsoft.CognitiveServices/accounts/deployments/delete",
    "Microsoft.CognitiveServices/accounts/models/read"
  ],
  "DataActions": [
    "Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action",
    "Microsoft.CognitiveServices/accounts/OpenAI/deployments/completions/action",
    "Microsoft.CognitiveServices/accounts/OpenAI/deployments/embeddings/action"
  ],
  "AssignableScopes": ["/subscriptions/YOUR_SUBSCRIPTION_ID"]
}'

Optionally add Cost Management Reader at subscription scope for spend visibility.

GCP permissions

Quick Start: Assign roles/aiplatform.user to the service account.

Enterprise: Create a custom role with discovery (publishers.get, publisherModels.get), invocation (endpoints.predict), endpoint management (create/get/list/update/delete/deploy/undeploy), model metadata (models.list, models.get), and project validation (resourcemanager.projects.get).

Tip
Bonito's IaC templates (Terraform) support both modes for all providers. Set iam_mode = "least_privilege" for enterprise or "managed" for quick start.

Model Management

Once a provider is connected, Bonito automatically syncs all available models. You can view, search, filter, and enable models from a single catalog.

One-click model activation

Models with a 🔒 icon exist in your provider's catalog but aren't yet enabled in your cloud account. Instead of switching to each provider's console, enable them directly from Bonito:

  1. Go to the Models page and find the model you want to enable.
  2. Click the Enable button on the model card.
  3. Bonito handles the provider-specific activation (Bedrock entitlements, Azure deployments, GCP API enablement).
  4. Some models may require approval from the provider and won't activate instantly.
Tip
Use bulk activation to enable up to 20 models at once — select them and click "Enable Selected".

Playground

Test any enabled chat model directly in the browser. The Playground supports single-model chat and side-by-side comparison mode (up to 4 models). Token usage and cost appear after each response. Only chat-capable, enabled models are shown in the picker.

Deployments

Deploy AI models directly into your cloud from the Bonito UI — no console-hopping required. Bonito creates real deployments in your cloud account.

Provider        Deployment Type                       What Bonito Creates
AWS Bedrock     On-demand or Provisioned Throughput   On-demand: validates access. PT: creates reserved capacity with commitment (1 week–6 months)
Azure OpenAI    Model deployment with TPM capacity    Creates a deployment on your Azure OpenAI resource (Standard or GlobalStandard tier)
GCP Vertex AI   Serverless (no provisioning needed)   Verifies access — GCP models are serverless by default
Warning
AWS Provisioned Throughput costs real money ($20+/hr per model unit) and requires a minimum 1-month commitment. Use on-demand for testing. Also requires the bedrock:CreateProvisionedModelThroughput IAM permission.
Note
Azure deployments require TPM quota for the model in your subscription. If you get a quota error, request an increase in Azure Portal → Quotas.

Gateway API

Bonito provides an OpenAI-compatible API endpoint so you can use any connected model with tools that support the OpenAI format. One API key, all your providers.

Endpoint

POST https://getbonito.com/v1/chat/completions

Authentication

Generate API keys from the Gateway page in the dashboard. Include your key in the Authorization header:

Authorization: Bearer YOUR_BONITO_API_KEY

Example: Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://getbonito.com/v1",
    api_key="YOUR_BONITO_API_KEY"
)

response = client.chat.completions.create(
    model="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Example: curl

curl https://getbonito.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_BONITO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-3-sonnet-20240229-v1:0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
  }'

Model names

Use the provider-native model IDs shown on the Models page in Bonito. For example: anthropic.claude-3-sonnet-20240229-v1:0 for AWS Bedrock, gpt-4o for Azure, gemini-1.5-pro for GCP.

When using a routing policy, pass the policy name as the model field instead of a specific model ID.

Routing Policies

Routing policies let you automatically select the best model for each request based on your priorities. Create policies from Routing → Create Policy in the dashboard.

Cost-Optimized

Automatically selects the cheapest capable model for each request. Route routine traffic to economy models and save 40–70% versus using a single premium model for everything.
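Conceptually, the selection works like this sketch (hypothetical catalog and prices for illustration only, not Bonito's actual data or algorithm):

```python
# Hypothetical catalog with illustrative prices and capability sets
CATALOG = [
    {"model": "gpt-4o",         "price": 5.00, "capabilities": {"chat", "vision"}},
    {"model": "gpt-4o-mini",    "price": 0.15, "capabilities": {"chat"}},
    {"model": "claude-3-haiku", "price": 0.25, "capabilities": {"chat"}},
]

def cheapest_capable(required: set) -> str:
    """Return the lowest-priced model whose capabilities cover the request."""
    candidates = [m for m in CATALOG if required <= m["capabilities"]]
    if not candidates:
        raise ValueError(f"no model supports {required}")
    return min(candidates, key=lambda m: m["price"])["model"]

print(cheapest_capable({"chat"}))            # gpt-4o-mini
print(cheapest_capable({"chat", "vision"}))  # gpt-4o
```

Routine chat traffic lands on the cheapest capable model; requests that need extra capabilities still reach a premium model.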

Failover Chain

Define a primary model and one or more fallbacks. If the primary fails or is unavailable, Bonito automatically tries the next model in the chain. Great for high-availability use cases.

A/B Testing

Split traffic between models using percentage weights (must sum to 100). Test new models in production with controlled rollout — e.g., send 90% to your current model and 10% to a new one.
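Weighted selection can be sketched as follows (illustration only; model names are placeholders):

```python
import random

def pick_variant(weights, rng=random):
    """Choose a model according to percentage weights (must sum to 100)."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    roll = rng.uniform(0, 100)
    cumulative = 0
    for model, weight in weights.items():
        cumulative += weight
        if roll < cumulative:
            return model
    return model  # covers the roll == 100 edge case

weights = {"current-model": 90, "new-model": 10}
counts = {m: 0 for m in weights}
for _ in range(10_000):
    counts[pick_variant(weights)] += 1
print(counts)  # roughly a 90/10 split
```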

Tip
Use the "Test" button on any routing policy to dry-run model selection and verify your configuration before going live.

Cross-Region Inference

Bonito automatically creates cross-region inference profiles for AWS Bedrock models. If us-east-1 is overloaded or experiencing downtime, requests seamlessly route to us-west-2 or eu-west-1 without any action on your part.

How it works

  • Bonito handles the us. prefix routing at the gateway level, so cross-region inference profiles are created and managed automatically.
  • No configuration needed — just connect your AWS provider and Bonito takes care of the rest.
  • If a primary region is throttled or returns errors, requests are retried in an alternate region transparently.

True high-availability

Cross-region inference combines with multi-provider failover for true high-availability. For example, if Claude on Bedrock fails across all regions, Bonito can fall back to Anthropic Direct automatically. No other platform (Portkey, LiteLLM, Helicone) offers automatic cross-region inference profiles with intelligent failover built in.

Tip
Cross-region inference works out of the box for all Bedrock models. Pair it with a failover routing policy that includes a direct provider (OpenAI, Anthropic, Groq) for maximum resilience.

Model Aliases

Bonito supports shorthand model aliases that resolve to provider-specific versioned model IDs. This means you can reference models by simple names and switch providers without changing your code.

How aliases work

When you send a request with an alias like claude-sonnet, Bonito resolves it to the correct model ID for your active provider. On Bedrock, that becomes anthropic.claude-3-sonnet-20240229-v1:0. On Anthropic Direct, it becomes claude-3-sonnet-20240229.

  • Switch providers without changing a single line of code in your application.
  • Aliases are updated automatically when new model versions are released.
  • Works with routing policies — aliases resolve before routing rules are applied.
For example, with the Python SDK:
from openai import OpenAI

client = OpenAI(
    base_url="https://getbonito.com/v1",
    api_key="YOUR_BONITO_API_KEY"
)

# Use an alias instead of a provider-specific model ID
response = client.chat.completions.create(
    model="claude-sonnet",  # Resolves automatically
    messages=[{"role": "user", "content": "Hello!"}]
)

Notifications

Bonito sends in-app notifications for important events across the platform so you never miss a deployment status change or cost alert.

Notification types

  • Deployment lifecycle — creation, scaling, completion, and failure alerts for deployments across all providers.
  • Spend alerts — get notified when costs approach or exceed your configured budget thresholds.
  • Model activation — confirmation when models are enabled or if activation requires provider approval.
  • Provider health — alerts when a provider connection has issues or needs credential rotation.

The notification bell in the dashboard header shows your unread count. Click to see the full list with read/unread states. You can configure alert rules for budget thresholds with email and in-app delivery preferences.

Cost Management

Monitor AI spending across all connected providers from a single dashboard. Bonito pulls real cost data from your cloud accounts and shows breakdowns by model, provider, and time period.

What you get

  • Aggregated costs across AWS, Azure, and GCP with daily/weekly/monthly views.
  • Cost forecast with projected spending and confidence bounds.
  • Per-model and per-provider breakdowns to identify expensive workloads.
  • Budget alerts — set thresholds and get notified before you exceed them.
  • Optimization recommendations — Bonito suggests cheaper model alternatives and cross-provider routing opportunities.
Note
Cost tracking requires the cost-related IAM permissions for each provider (AWS Cost Explorer, Azure Cost Management Reader, GCP Billing Viewer). These permissions are optional — the platform works without them; you just won't see spend data.

CLI Tool

bonito-cli is a Python CLI for managing your Bonito resources from the terminal. It's useful for scripting, CI/CD pipelines, and terminal-first workflows.

Installation

pip install bonito-cli

Authentication

# Login with your Bonito credentials
bonito auth login

# Or set your API key directly
export BONITO_API_KEY=your-key-here

Common commands

# List connected providers
bonito providers list

# List available models
bonito models list

# Create a gateway API key
bonito gateway keys create --name "my-key"

# List routing policies
bonito routing list

# Check costs
bonito costs summary

Run bonito --help for the full list of commands and options.

Integrations

Bonito + Claude Cowork

Claude Cowork is Anthropic's agentic desktop application. The Bonito plugin gives Claude deep knowledge of your AI infrastructure, so you can deploy providers, create agents, configure routing, and analyze costs through natural conversation.

Install the Plugin

Install from the Claude plugin marketplace or via Claude Code:

claude plugin install bonito

What the Plugin Adds

6 domain skills that Claude draws on automatically when relevant:

  • deploy-stack: Deploy infrastructure from a bonito.yaml config
  • manage-providers: Connect and manage cloud AI providers
  • create-agent: Create BonBon agents and Bonobot orchestrators
  • gateway-routing: Configure failover, cost-optimized routing, A/B testing
  • cost-analysis: Analyze spending and recommend optimizations
  • debug-issues: Troubleshoot gateway, provider, and agent problems

MCP Server

The plugin connects to the Bonito MCP server, which exposes 18 tools for direct API access. Install via PyPI or Docker:

pip install bonito-mcp

Or run with Docker:

docker run -e BONITO_API_KEY=your-key -p 8080:8080 bonitoai/mcp-server

Claude Desktop Configuration

Add this to your Claude Desktop MCP config:

{
  "mcpServers": {
    "bonito": {
      "command": "bonito-mcp",
      "env": {
        "BONITO_API_KEY": "your-bonito-api-key"
      }
    }
  }
}
Tip
The MCP server works with any MCP-compatible client including Cursor, Windsurf, and other tools that support the Model Context Protocol.

Learn More

Declarative Config (bonito.yaml)

Define your entire AI stack in a single bonito.yaml file. Providers, agents, MCP servers, knowledge bases, routing, and triggers — all in one place. Deploy with a single command.

bonito deploy -f bonito.yaml

Example: AWS Bedrock stack

A complete stack using AWS Bedrock with two agents and a knowledge base:

version: "1"
name: my-ai-stack

gateway:
  providers:
    - name: aws
      priority: 1
      models:
        - anthropic.claude-3-sonnet-20240229-v1:0
        - amazon.nova-pro-v1:0
        - amazon.titan-embed-text-v2:0
      region: us-east-1
      access_key: ${AWS_ACCESS_KEY_ID}
      secret_key: ${AWS_SECRET_ACCESS_KEY}

  routing:
    strategy: cost-optimized
    fallback: true

agents:
  support-bot:
    type: bonbon
    mode: simple
    display_name: Support Agent
    model:
      primary: anthropic.claude-3-sonnet-20240229-v1:0
      fallback: amazon.nova-pro-v1:0
    system_prompt: |
      You are a helpful support agent...
    rag:
      knowledge_base: company-docs

  code-reviewer:
    type: bonbon
    mode: advanced
    display_name: Code Reviewer
    model:
      primary: anthropic.claude-3-sonnet-20240229-v1:0
    mcp_servers:
      - github

knowledge_bases:
  company-docs:
    description: Internal documentation
    sources:
      - type: directory
        path: ./docs/
        glob: "**/*.md"
    embedding:
      model: amazon.titan-embed-text-v2:0
      provider: aws

Example: Multi-provider stack

A stack using Groq, Anthropic, OpenAI, and GCP with multi-agent orchestration:

version: "1"
name: multi-provider-stack

gateway:
  providers:
    - name: groq
      priority: 1
      models: [llama-3.3-70b-versatile, mixtral-8x7b-32768]
      api_key: ${GROQ_API_KEY}

    - name: anthropic
      priority: 1
      models: [claude-sonnet-4-20250514]
      api_key: ${ANTHROPIC_API_KEY}

    - name: openai
      priority: 2
      models: [gpt-4o, gpt-4o-mini]
      api_key: ${OPENAI_API_KEY}

    - name: gcp
      priority: 2
      models: [gemini-1.5-pro, gemini-1.5-flash]
      project_id: ${GCP_PROJECT_ID}
      service_account: ${GCP_SERVICE_ACCOUNT_JSON}

  routing:
    strategy: cost-optimized
    fallback: true
    retry_attempts: 2

agents:
  orchestrator:
    type: bonobot
    display_name: Operations Center
    model:
      primary: claude-sonnet-4-20250514
      fallback: gpt-4o
    delegates:
      - agent: fast-responder
        domains: [triage, alerts, summaries]
      - agent: deep-analyst
        domains: [code-review, analysis, research]

  fast-responder:
    type: bonbon
    mode: simple
    model:
      primary: groq/llama-3.3-70b-versatile
    system_prompt: |
      You are a fast-response agent for triage...

  deep-analyst:
    type: bonbon
    mode: advanced
    model:
      primary: claude-sonnet-4-20250514
      fallback: gpt-4o
    mcp_servers:
      - github
      - jira
Tip
Use environment variable references like ${AWS_ACCESS_KEY_ID} to keep secrets out of your config file. Bonito resolves them at deploy time from your environment or a .env file.

BonBon Agents

BonBon is Bonito's managed agent service. Create AI agents with custom system prompts, connect them to knowledge bases for RAG, and deploy them as embeddable chat widgets or API endpoints — all without managing infrastructure.

Agent tiers

Simple — $49/mo

Pre-built agent with a system prompt, optional knowledge base, and an embeddable widget. Ideal for FAQ bots, customer support, and internal assistants. Deploy in minutes.

Advanced — $99/mo

Agent with MCP tool integration, multiple knowledge bases, webhook triggers, and custom workflows. Built for agents that need to interact with external systems.

Creating an agent

  1. Go to Agents → Create Agent in the dashboard.
  2. Choose a tier (Simple or Advanced) and give your agent a name.
  3. Write a system prompt that defines your agent's personality, constraints, and behavior.
  4. Optionally attach a knowledge base for RAG-powered responses.
  5. Select the backing model (or models for Advanced tier).
  6. Deploy — Bonito gives you a widget embed code and an API endpoint.
# Create an agent via CLI
bonito agents create \
  --name "Support Bot" \
  --tier simple \
  --model "anthropic.claude-3-sonnet-20240229-v1:0" \
  --system-prompt "You are a helpful support agent for Acme Corp..."

# List your agents
bonito agents list

# Get agent details
bonito agents get --id ag_abc123

System prompts

System prompts define how your agent behaves. Write clear instructions about the agent's role, tone, constraints, and what it should or shouldn't do. You can update the system prompt at any time without redeploying.

You are a customer support agent for Acme Corp.

Rules:
- Only answer questions about Acme products and services.
- If you don't know the answer, say so and offer to connect with a human.
- Be friendly, concise, and professional.
- Never make up pricing or feature information — use the knowledge base.
- Respond in the same language as the customer.

RAG integration

Connect a knowledge base to give your agent access to your documents. When a user asks a question, Bonito retrieves relevant chunks from your KB and includes them in the context before the model generates a response.
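The retrieve-then-generate flow can be sketched as follows (a minimal illustration with a stand-in retriever, not Bonito's actual pipeline or prompt format):

```python
def build_rag_prompt(question, retrieve, top_k=3):
    """Retrieve the top-k chunks and place them ahead of the user question."""
    chunks = retrieve(question, top_k)
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Stand-in retriever returning canned chunks instead of a real KB query
def fake_retrieve(query, k):
    corpus = [
        "Refunds are issued within 14 days of purchase.",
        "Contact support to start a refund request.",
    ]
    return corpus[:k]

prompt = build_rag_prompt("What is the refund policy?", fake_retrieve)
print(prompt)
```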

# Attach a knowledge base to an agent
bonito agents update --id ag_abc123 \
  --knowledge-base kb_xyz789

# Attach multiple KBs (Advanced tier only)
bonito agents update --id ag_abc123 \
  --knowledge-base kb_xyz789 \
  --knowledge-base kb_docs456

Widget embedding

Every BonBon agent gets an embeddable chat widget. Add it to any website with a single script tag:

<!-- Add to your website -->
<script
  src="https://getbonito.com/widget.js"
  data-agent-id="ag_abc123"
  data-theme="dark"
  data-position="bottom-right"
  async
></script>
Tip
You can also use the agent via the API directly. Send messages to POST /v1/agents/ag_abc123/chat with the same OpenAI-compatible format.

Bonobot Orchestrator

Bonobot is Bonito's multi-agent orchestration layer. It acts as a front-door agent that classifies user intent, delegates to specialized sub-agents, and synthesizes their responses into a unified reply. Think of it as a dispatcher that routes conversations to the right expert.

How it works

  1. A user sends a message to the Bonobot endpoint.
  2. The orchestrator classifies the user's intent using a fast classification model.
  3. Based on the intent, it delegates the request to one or more specialized BonBon agents.
  4. Each sub-agent processes its part using its own system prompt, model, and knowledge base.
  5. Bonobot synthesizes the responses and returns a single, coherent answer.
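The flow above can be sketched as follows (stand-in classifier, agents, and synthesis step for illustration; the real orchestrator uses models for each stage):

```python
def orchestrate(message, classify, agents, synthesize):
    """Classify intent, delegate to the matching sub-agent, synthesize the reply."""
    intent = classify(message)                     # fast classification
    agent = agents.get(intent, agents["general"])  # delegation map lookup
    replies = [agent(message)]                     # sub-agent responds
    return synthesize(replies)                     # unified answer

# Stand-ins for the classifier model, sub-agents, and synthesis model
classify = lambda msg: "billing" if "invoice" in msg.lower() else "general"
agents = {
    "billing": lambda msg: "Your invoice is available under Billing.",
    "general": lambda msg: "How can I help?",
}
synthesize = " ".join

print(orchestrate("Where is my invoice?", classify, agents, synthesize))
```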

Delegation map

The delegation map defines which sub-agents handle which intents. Configure it as a JSON mapping of intent patterns to agent IDs:

{
  "delegation_map": {
    "billing": {
      "agent_id": "ag_billing01",
      "description": "Handles billing, invoices, and payment questions"
    },
    "technical_support": {
      "agent_id": "ag_techsup01",
      "description": "Handles technical issues, bugs, and troubleshooting"
    },
    "sales": {
      "agent_id": "ag_sales01",
      "description": "Handles pricing, demos, and feature inquiries"
    },
    "general": {
      "agent_id": "ag_general01",
      "description": "Fallback for anything that doesn't match a specific intent"
    }
  }
}

Creating a Bonobot

# Create an orchestrator
bonito bonobot create \
  --name "Customer Hub" \
  --classifier-model "anthropic.claude-3-haiku-20240307-v1:0" \
  --delegation-map ./delegation.json

# Update the delegation map
bonito bonobot update --id bot_abc123 \
  --delegation-map ./updated-delegation.json

# Test intent classification
bonito bonobot classify --id bot_abc123 \
  --message "I need a refund for my last invoice"
Note
Bonobot is a separate add-on ($349/mo hosted) for fully custom multi-agent orchestration. Each sub-agent in the delegation map can be a BonBon agent or another Bonobot.

Response synthesis

When a request touches multiple intents (e.g., "I want to upgrade my plan and fix a bug"), Bonobot can delegate to multiple agents in parallel and merge their responses. Enable multi-delegation in the orchestrator settings:

{
  "multi_delegation": true,
  "synthesis_model": "anthropic.claude-3-sonnet-20240229-v1:0",
  "synthesis_prompt": "Combine the following specialist responses into a single, coherent answer."
}

Code Review

Bonito's GitHub App provides AI-powered code review on pull requests. Install it from the Bonito dashboard, connect your GitHub repos, and get automatic reviews on every PR.

What it does

  • Automatically reviews PRs for security vulnerabilities, performance issues, and code quality.
  • Posts structured findings directly as PR comments with clear explanations and suggestions.
  • Multiple review personas available — default professional tone, or fun characters for a lighter touch.
  • Configure which repos and branches trigger automatic reviews from the dashboard.

Getting started

  1. Go to Code Review in the Bonito dashboard.
  2. Click Install GitHub App and authorize access to your repositories.
  3. Select which repos and branches should trigger automatic reviews.
  4. Open a pull request — Bonito reviews it and posts comments within minutes.
Note
Free tier includes 5 code reviews per month per repo. Upgrade for unlimited reviews and priority processing.

MCP Integration

MCP (Model Context Protocol) lets your BonBon agents call external tools — databases, APIs, code execution, file systems, and more. Register MCP servers with Bonito and connect them to your agents so they can take actions, not just answer questions.

Supported transports

SSE (Server-Sent Events)

Connect to remote MCP servers over HTTP. Best for hosted tools, third-party integrations, and production deployments.

stdio (Standard I/O)

Run MCP servers as local subprocesses. Best for development, local tools, and self-hosted servers.

Registering an MCP server

{
  "name": "github-tools",
  "transport": "sse",
  "url": "https://mcp.example.com/github/sse",
  "headers": {
    "Authorization": "Bearer ghp_xxxxxxxxxxxx"
  },
  "tools": ["create_issue", "search_repos", "get_pull_request"]
}
# Register via CLI
bonito mcp register \
  --name "github-tools" \
  --transport sse \
  --url "https://mcp.example.com/github/sse"

# List registered MCP servers
bonito mcp list

# Test a tool call
bonito mcp test --server "github-tools" --tool "search_repos" \
  --params '{"query": "bonito"}'

stdio configuration

For stdio-based MCP servers, provide the command and arguments to launch the server process:

{
  "name": "sqlite-tools",
  "transport": "stdio",
  "command": "npx",
  "args": ["-y", "@modelcontextprotocol/server-sqlite", "./data/mydb.sqlite"],
  "env": {
    "NODE_ENV": "production"
  }
}

Connecting MCP to agents

Once registered, attach MCP servers to your BonBon agents. The agent's model will automatically see the available tools and can call them during conversations:

# Connect MCP server to an agent
bonito agents update --id ag_abc123 \
  --mcp-server "github-tools" \
  --mcp-server "sqlite-tools"

# Remove an MCP server from an agent
bonito agents update --id ag_abc123 \
  --remove-mcp-server "sqlite-tools"
Warning
MCP tool execution happens server-side. Make sure your MCP servers are secured — Bonito passes through authentication headers but does not sandbox tool execution.

Knowledge Bases

Knowledge Bases power RAG (Retrieval-Augmented Generation) for your BonBon agents. Upload documents, and Bonito chunks, embeds, and indexes them so your agents can retrieve relevant context at query time.

Creating a knowledge base

  1. Go to Knowledge Bases → Create in the dashboard, or use the CLI.
  2. Give it a name and optional description.
  3. Choose an embedding model (defaults to a high-quality model on your connected providers).
  4. Configure chunking strategy (size, overlap).
  5. Upload your documents.
# Create a knowledge base
bonito kb create --name "Product Docs" \
  --embedding-model "amazon.titan-embed-text-v2:0" \
  --chunk-size 512 \
  --chunk-overlap 50

# Upload documents
bonito kb upload --id kb_xyz789 ./docs/*.pdf
bonito kb upload --id kb_xyz789 ./faq.md
bonito kb upload --id kb_xyz789 https://example.com/api-reference.html

# Check indexing status
bonito kb status --id kb_xyz789

Supported file formats

Bonito accepts PDF, Markdown, plain text, HTML, DOCX, and CSV files. Each file is parsed, split into chunks, and embedded using your chosen embedding model.

Chunking strategies

Strategy   Description                                          Best For
fixed      Split by token count with configurable overlap       General purpose, predictable chunk sizes
semantic   Split at natural boundaries (paragraphs, sections)   Long-form documents, articles
sentence   Split by sentence with grouping                      FAQ, short-form content
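The fixed strategy can be sketched as follows (an illustration of the windowing idea, not Bonito's actual chunker):

```python
def fixed_chunks(tokens, size=512, overlap=50):
    """Windows of `size` tokens, each sharing `overlap` tokens with the previous one."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = list(range(1200))  # stand-in for tokenized document text
chunks = fixed_chunks(tokens)
print([len(c) for c in chunks])               # [512, 512, 276]
print(chunks[0][-50:] == chunks[1][:50])      # True: 50-token overlap
```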

Embedding models

Bonito uses embedding models from your connected providers. Any enabled embedding model can be used:

  • AWS Bedrock: amazon.titan-embed-text-v2:0, cohere.embed-english-v3
  • Azure OpenAI: text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
  • GCP Vertex AI: textembedding-gecko, text-embedding-004

Querying

You can query a knowledge base directly to test retrieval before connecting it to an agent:

# Query a knowledge base directly
bonito kb query --id kb_xyz789 \
  --query "What is the refund policy?" \
  --top-k 5

# Connect KB to an agent (see BonBon Agents section)
bonito agents update --id ag_abc123 \
  --knowledge-base kb_xyz789
Tip
Start with a chunk size of 512 tokens and 50-token overlap. Adjust based on your content — shorter chunks work better for precise Q&A, longer chunks for summarization tasks.

Managed Inference

Managed Inference gives you zero-config access to AI models without connecting any cloud provider. No API keys, no cloud accounts, no setup — just start making requests. Bonito handles provider selection, routing, and billing.

How it works

  1. Sign up for Bonito and create a gateway API key — that's it.
  2. Use any supported model by name in your API requests.
  3. Bonito routes your request to the optimal provider automatically.
  4. You're billed through Bonito based on token usage — no separate cloud bills.
from openai import OpenAI

# No cloud provider setup needed — just your Bonito key
client = OpenAI(
    base_url="https://getbonito.com/v1",
    api_key="YOUR_BONITO_API_KEY"
)

# Use any supported model
response = client.chat.completions.create(
    model="claude-3-sonnet",  # Bonito routes to the best provider
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

print(response.choices[0].message.content)

Supported models

Managed Inference supports a curated set of popular models across providers. Use simplified model names — Bonito resolves them to provider-specific IDs:

| Model | Provider | Use Case |
| --- | --- | --- |
| claude-3-sonnet | Anthropic | General purpose, balanced performance |
| claude-3-haiku | Anthropic | Fast, cost-effective tasks |
| gpt-4o | OpenAI | Multimodal, high performance |
| gpt-4o-mini | OpenAI | Lightweight, budget-friendly |
| gemini-1.5-pro | Google | Long context, multimodal |
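
Conceptually, resolving a simplified name works like the lookup below. The mapping shown is purely illustrative (Bonito maintains the real table server-side, and actual provider model IDs vary by region):

```python
# Hypothetical alias table — real IDs depend on provider and region.
MODEL_ALIASES = {
    "claude-3-sonnet": ("anthropic", "anthropic.claude-3-sonnet-20240229-v1:0"),
    "claude-3-haiku": ("anthropic", "anthropic.claude-3-haiku-20240307-v1:0"),
    "gpt-4o": ("openai", "gpt-4o"),
}

def resolve(model_name):
    """Return (provider, provider_model_id) for a simplified model name."""
    try:
        return MODEL_ALIASES[model_name]
    except KeyError:
        raise ValueError(f"Unsupported managed model: {model_name}")

provider, model_id = resolve("claude-3-sonnet")
```

This is why the same request body works whether you send a simplified name to Managed Inference or a full provider ID to your own connected provider.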

When to use Managed vs BYOC

Managed Inference

  • Quick start — no cloud setup
  • No API keys to manage
  • Prototyping and development
  • Teams without cloud accounts

Bring Your Own Cloud

  • Data residency requirements
  • Existing cloud commitments/discounts
  • Full provider control
  • Enterprise compliance needs
Note
Managed Inference and BYOC (Bring Your Own Cloud) can be used simultaneously. Use managed models for quick experiments and your own providers for production workloads.

Triggers

Triggers let you invoke BonBon agents automatically based on events — incoming webhooks, cron schedules, slash commands, or custom events. Instead of waiting for a user to open a chat widget, triggers bring your agents into workflows programmatically.

Trigger types

Webhook

HTTP endpoint that invokes your agent when called. Connect to GitHub, Stripe, Slack, or any service that sends webhooks. The request payload is passed as context to the agent.

Scheduled (Cron)

Run your agent on a schedule using cron expressions. Great for daily reports, periodic data processing, health checks, and recurring tasks.

Slash Command

Register slash commands in Slack or Discord that invoke your agent. Users type /ask-support how do I reset my password? and get an agent response inline.

Event

Trigger agents from internal Bonito events — new document indexed in a KB, agent error threshold exceeded, or cost alert fired.

Creating triggers

# Create a webhook trigger
bonito triggers create \
  --agent-id ag_abc123 \
  --type webhook \
  --name "GitHub PR Review"

# Output: Webhook URL → https://getbonito.com/hooks/tr_wh_abc123

# Create a scheduled trigger (daily at 9 AM UTC)
bonito triggers create \
  --agent-id ag_abc123 \
  --type cron \
  --schedule "0 9 * * *" \
  --name "Daily Summary" \
  --input "Generate a summary of yesterday's support tickets"

# Create a slash command trigger
bonito triggers create \
  --agent-id ag_abc123 \
  --type slash-command \
  --platform slack \
  --command "/ask-support" \
  --name "Slack Support"

# List triggers for an agent
bonito triggers list --agent-id ag_abc123

Webhook payload

When a webhook trigger fires, the HTTP request body is passed to the agent as context. You can define a template to extract specific fields:

{
  "trigger_id": "tr_wh_abc123",
  "payload_template": "New {{event}} from {{repository.full_name}}: {{pull_request.title}}",
  "headers_to_forward": ["X-GitHub-Event"],
  "secret": "whsec_xxxxxxxx"
}
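
The `{{...}}` placeholders resolve against the webhook's JSON body, with dotted paths walking into nested objects. A hedged sketch of that substitution (field names follow the GitHub example above; this is illustrative, not Bonito's internal template engine):

```python
import re

def render_template(template, payload):
    """Replace {{dotted.path}} placeholders with values from a nested dict."""
    def lookup(match):
        value = payload
        for key in match.group(1).split("."):
            value = value[key]  # descend one level per dotted segment
        return str(value)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)

payload = {
    "event": "pull_request",
    "repository": {"full_name": "acme/api"},
    "pull_request": {"title": "Fix rate limiting"},
}
template = "New {{event}} from {{repository.full_name}}: {{pull_request.title}}"
print(render_template(template, payload))
# → New pull_request from acme/api: Fix rate limiting
```
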
Tip
Use the secret field to verify webhook signatures. Bonito validates the HMAC signature on incoming requests and rejects unverified payloads.
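
Signature verification computes an HMAC of the raw request body with the shared secret and compares it against the header the sender supplies. A minimal sketch using GitHub's `sha256=<hex>` header format (header names and encodings vary by service):

```python
import hashlib
import hmac

def verify_signature(secret, body, signature_header):
    """Return True if signature_header matches HMAC-SHA256 of the raw body."""
    expected = "sha256=" + hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison guards against timing attacks.
    return hmac.compare_digest(expected, signature_header)

# Simulate what a sender like GitHub would compute and attach.
secret = "whsec_xxxxxxxx"
body = b'{"event": "pull_request"}'
header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

assert verify_signature(secret, body, header)          # genuine payload
assert not verify_signature(secret, b"tampered", header)  # modified body
```
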

Observability

Bonito provides built-in observability for every request flowing through the gateway and every agent interaction. Track tokens, latency, costs, and errors across your entire AI stack without any additional tooling.

Request tracing

Every API request gets a unique trace ID. View the full lifecycle of a request — from gateway receipt through provider routing to response delivery:

# View recent requests
bonito logs list --limit 20

# Get details for a specific trace
bonito logs get --trace-id tr_xxxxxxxxxxxx

# Filter by model, status, or time range
bonito logs list \
  --model "claude-3-sonnet" \
  --status error \
  --since "2024-01-15T00:00:00Z"

What's tracked

| Metric | Description |
| --- | --- |
| Tokens (in/out) | Input and output token counts per request |
| Latency | Time to first token (TTFT) and total response time |
| Cost | Estimated cost per request based on provider pricing |
| Model | Which model and provider served the request |
| Status | Success, error, rate-limited, or timed-out |
| Agent | Which BonBon agent handled the request (if applicable) |
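
The per-request cost estimate is derived from the token counts and per-million-token prices. A sketch of the arithmetic with illustrative prices (the rates below are placeholders, not actual provider pricing):

```python
# Illustrative prices in USD per 1M tokens — NOT real provider rates.
PRICES = {
    "claude-3-sonnet": {"input": 3.00, "output": 15.00},
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimated USD cost for one request from token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = estimate_cost("claude-3-sonnet", input_tokens=1_200, output_tokens=400)
print(f"${cost:.4f}")  # → $0.0096  (1200×$3 + 400×$15, per million tokens)
```
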

Per-agent analytics

Each BonBon agent has its own analytics dashboard showing conversation volume, average response time, token usage, cost breakdown, and error rates over time.

# Get agent analytics
bonito agents analytics --id ag_abc123 \
  --period 7d

# Export analytics as CSV
bonito agents analytics --id ag_abc123 \
  --period 30d \
  --format csv > agent-analytics.csv

Cost monitoring

Observability integrates with Cost Management to show real-time spend. Set budget alerts per agent, per model, or globally:

# Set a per-agent budget alert
bonito alerts create \
  --scope agent \
  --agent-id ag_abc123 \
  --threshold 500 \
  --period monthly \
  --notify email,in-app

# Set a global daily spend alert
bonito alerts create \
  --scope global \
  --threshold 100 \
  --period daily
Note
Observability data is retained for 90 days by default. Contact support for extended retention or data export to your own analytics pipeline.

Troubleshooting

"Connected! Found 0 models"

Your credentials connected successfully, but model listing failed silently. Check:

  • Azure: Make sure your Endpoint URL is an Azure OpenAI resource endpoint with a custom subdomain, not a generic regional endpoint. Also verify the resource group is correct.
  • GCP: Ensure the Vertex AI API is enabled in your project.
  • AWS: Verify your IAM user has Bedrock permissions and you're in a region where Bedrock is available.

Models showing a 🔒 lock icon

The model exists in your provider's catalog but isn't enabled. Click the Enable button in Bonito, or enable it directly in your cloud console (AWS Bedrock → Model Access, Azure → create a deployment, GCP → enable the Vertex AI API).

Playground returns a 500 error

This typically means the model isn't a chat model (embedding or completion-only models can't be used in the Playground), the model isn't enabled in your account, or the model isn't available in your region.

Rate limit or timeout errors

Wait 30–60 seconds and try again — this usually happens when making many rapid changes. If errors persist (for example, repeated 502 responses), the backend service may need attention; contact support@getbonito.com.

Need more help?

Can't find what you're looking for? Our team is here to help.