Changelog
What's new in Bonito. ~25 ships in May 2026 alone — gateway, agents, KB, infra, governance. The cadence below is the cadence.
May 2026
Custom error pages (5/30)
Branded 404, 403, 500, 503, and global error pages with Bonito fish theme, Framer Motion animations, and contextual messaging. 503 includes 60s auto-retry countdown.
Starter tier — $199/mo (5/28)
New tier between Free and Pro: 3 providers, 100K req/mo, 5 seats, 2 agents, RAG (2 KBs), analytics, audit trail, CLI, email support. Designed for teams that want to swipe a card without procurement approval.
Personal Access Tokens + Project Tokens (5/27)
Three token types now live: gateway keys (bn-), personal access tokens (bp-) carrying user permissions across all endpoints, and project tokens (bj-) scoped to a single project (Pro+, admin-only). Per-tier caps enforced at create time.
Per-org log retention by tier (5/27)
Retention runs per-org: Free=30d, Pro=60d, Enterprise/Scale=90d. Settings UI shows tier-appropriate options with locked indicators for higher tiers.
GCS log sink org-partitioned (5/27)
Restructured to {org_id}/{log_type}/{YYYY}/{MM}/{DD}/{HH}.ndjson across 10 log types (gateway, agent, auth, kb, admin, deployment, billing, compliance, approval, system). Per-org GCS lifecycle rules for tier-based retention.
UX onboarding improvements (5/27)
Settings page shows subscription tier badge. Sidebar nav items requiring a higher tier show Pro/Ent badges. KB cards show pending-state guidance. /api/auth/me returns subscription_tier.
Token efficiency metrics on Gateway dashboard (5/26)
Cost-per-1K-tokens at three levels: overall stat card, per-model in breakdown, and per-request in the logs table. Side-by-side model cost-effectiveness comparison across providers.
Overflow Queue for agents (5/25)
Redis-backed FIFO queue per agent. When RPM ceilings are hit, requests are queued not dropped — 202 Accepted with ticket_id and poll_url. Background drainer (2s interval) retries as capacity frees up. Max depth 500, result TTL 1h.
Agent HPA — Phase 1 (5/25)
Elastic agent capacity. Virtual mode doubles effective RPM in Redis when utilization crosses threshold (default 60%). Scale-down via background loop (30s). Configurable via API, CLI, and bonito.yaml scaling block. Enterprise+ only.
KB vector dimension upgrade — 768 → 1024 (5/25)
Pgvector column migrated from vector(768) to vector(1024) to match Titan Embed V2 native dimensions. Migration NULLs existing embeddings, alters the column, and backfills KBs to embedding_dimensions=1024.
Production reliability — 6 fixes (5/25)
Pgvector greenlet_spawn fix (codec registration moved to checkout event). KB delete cascade fix (raw SQL bypasses ORM ARRAY/vector coercion). Alembic multiple-heads merge after branched migration. Ingestion error handler rollback. GCS fast-fail on missing credentials. Embedding timeout raised 30s → 90s.
KB search quality (5/24)
Tool-search threshold lowered from 0.7 to 0.5 to match RAG injection. Added MODEL_MAX_DIMENSIONS map to clamp requested embedding dimensions to model max — fixes silent ingestion failures with GCP text-embedding-005 (768 max) at KB default 1024.
Gateway Vault fallback (5/24)
_get_provider_credentials() now uses Vault → encrypted DB fallback chain across all provider lookups. No more single-point-of-failure on Vault availability.
External orchestration / Breadcrumbs tracing (5/23)
POST /api/agents/{id}/execute accepts optional parent_agent_id. When set, a synthetic invoke_agent delegation record is logged in the parent's session, so code-orchestrated pipelines appear in Breadcrumbs with zero latency impact. CLI flag: --parent-agent.
Agent Health dashboard (5/23)
Platform admin page at /admin/agent-health cross-references agent model_ids against available provider models to detect deprecated or unroutable models. Background check runs after every 24h model sync. Per-agent health badges: Healthy, Deprecated, No Route, Warning.
Gateway duplicate-provider fix (5/23)
_get_provider_credentials() now keys by provider UUID instead of provider_type — fixes silent credential overwrites when orgs have multiple providers of the same type. All provider CRUD endpoints now call reset_router() for immediate cache invalidation instead of waiting 50min TTL.
Image generation endpoint (5/20)
POST /v1/images/generations live across dall-e-3, dall-e-2, gpt-image-1. Same bn- key as chat. Powers creative-asset workflows: brand-asset pipelines, marketing visuals, campaign generation.
Video generation endpoints (5/20)
POST /v1/videos (submit), GET /v1/videos/{id} (status), GET /v1/videos/{id}/content (download) across OpenAI Sora-2 and Vertex AI Veo 2.0/3.0/3.1. Credentials injected from Vault/DB. Per-second cost tracking.
Sentry tracking — backend + frontend (5/12)
sentry-sdk[fastapi] initializes before FastAPI app with environment-aware sampling (20% prod, 100% dev). @sentry/nextjs SDK with client/server/edge configs, instrumentation hook, global error boundary, and source-map upload via withSentryConfig.
API schema hardening — extra="forbid" (5/12)
All Bonobot create/update schemas now reject unknown fields with 422 instead of silently dropping them. Covers AgentCreate, AgentUpdate, AgentConnectionCreate, AgentGroupCreate/Update, AgentExecuteRequest, AgentScheduleCreate/Update.
All 6 providers connectable via UI (5/06)
Connect modal + onboarding wizard fixed for all 6 providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure AI, Groq). Anthropic validation uses /v1/models instead of hardcoded model. Connect modal uses apiRequest() for JWT auth.
Background model sync — 24h (5/06)
model_sync.py runs every 24h and syncs models for all active providers. Anthropic now uses live API with static pricing fallback. Wired into FastAPI lifespan.
Credential storage fix (5/06)
Legacy POST/PATCH /api/providers endpoints now encrypt credentials at write time (were storing plain JSON). DB fallback auto-migrates plain JSON → AES-256-GCM on read. Bedrock _check_model_access fixed to use real API.
Admin access requests UI (5/06)
Admin page at /admin/access-requests for invite-only registration approval. Submit → admin approve → invite code → register flow, rate-limited at 5 req/60s.
March–April 2026
Bonobot agent framework
Enterprise agent framework with visual canvas (React Flow), project-based organization, built-in tools (KB search, HTTP, agent-to-agent), and enterprise security: default-deny tool policy, budget stops, rate limiting, SSRF protection, full audit trail.
RAG knowledge bases on pgvector HNSW
Cross-cloud RAG pipeline: upload/parse/chunk/embed docs, pgvector HNSW search, gateway context injection on chat completions, source citations on every response.
VectorBoost — KB compression
3.9-8x storage reduction across scalar-8bit, polar-8bit, polar-4bit compression methods. Enterprise+ configurable per KB.
SAML SSO across 4 IdPs
Okta, Azure AD, Google Workspace, Custom SAML. SSO enforcement, break-glass admin, JIT provisioning.
Persistent agent memory
Long-term agent memory with pgvector similarity search, 5 memory types, AI-powered extraction. Cross-session continuity for production agent deployments.
Scheduled autonomous execution
Cron-based agent tasks with timezone support and multi-channel delivery (webhook, email, Slack).
Approval queue / Human-in-the-loop
Risk assessment per tool call, auto-approve conditions, timeout handling, full audit trails. Enterprise governance for agent-initiated actions.
Org Secrets Store — Vault-backed
HashiCorp Vault-backed key-value secrets, runtime injection into agent system prompts. Org-scoped, never exposed to model providers.
Memwright — shared conversational memory
Per-session memory via SQLite + ChromaDB. Model tier gating (zero memory for small models, full context for premium tiers).
February 2026
Deployment Provisioning
Deploy AI models directly into your cloud from the Bonito UI. AWS Provisioned Throughput (reserved capacity), Azure OpenAI deployments (Standard/GlobalStandard TPM), and GCP Vertex AI serverless — all without leaving the dashboard.
Least-Privilege Permissions
Two IAM modes for every provider: Quick Start (managed roles for fast setup) and Enterprise (separate least-privilege policies per capability). Only grant the exact permissions each feature needs.
Routing Policies
Cost-optimized routing, failover chains, and A/B testing with weight-based model selection. Route requests intelligently across providers and models with dry-run testing.
Notifications
In-app notification system for deployment lifecycle events, spend alerts, model activation confirmations, and provider health updates. Configurable alert rules with email and in-app delivery.
Bonito CLI
Python CLI (bonito-cli) for terminal-based management. Manage providers, models, gateway keys, routing policies, and costs from your terminal or CI/CD pipelines.
AI Copilot
Natural language assistant for managing your AI infrastructure. Ask questions about costs, configure routing, and analyze provider health.
Enhanced Cost Analytics
Breakdown by model, provider, team, and application. Export reports in CSV and PDF formats.
January 2026
One-Click Model Activation
Enable models directly from the Bonito dashboard without leaving to your cloud console. Supports individual enable and bulk activation (up to 20 models at once). Works across AWS Bedrock entitlements, Azure deployments, and GCP API enablement.
API Gateway v2
OpenAI-compatible gateway endpoint with intelligent request routing, automatic failover, and support for routing policies. One API key for all your providers.
Google Vertex AI Support
Full integration with Google Vertex AI including Gemini models. Connect and manage alongside your other providers.
Audit Trail
Complete audit logging for every API call, configuration change, and team action. Export for compliance reporting.
December 2025
Budget Alerts
Set spending thresholds per provider, per team, or globally. Receive email and in-app notifications before you exceed them.
Azure OpenAI Integration
Connect your Azure OpenAI deployments alongside AWS Bedrock for multi-cloud routing strategies.
November 2025
Public Launch
Bonito is live! Unified multi-cloud AI management with support for AWS Bedrock, Azure OpenAI, and Google Cloud Vertex AI.