Changelog

What's new in Bonito. ~25 ships in May 2026 alone — gateway, agents, KB, infra, governance. The cadence below is the cadence.

May 2026

Custom error pages (5/30)

Branded 404, 403, 500, 503, and global error pages with Bonito fish theme, Framer Motion animations, and contextual messaging. 503 includes 60s auto-retry countdown.

Starter tier — $199/mo (5/28)

New tier between Free and Pro: 3 providers, 100K req/mo, 5 seats, 2 agents, RAG (2 KBs), analytics, audit trail, CLI, email support. Designed for teams that want to swipe a card without procurement approval.

Personal Access Tokens + Project Tokens (5/27)

Three token types now live: gateway keys (bn-), personal access tokens (bp-) carrying user permissions across all endpoints, and project tokens (bj-) scoped to a single project (Pro+, admin-only). Per-tier caps enforced at create time.

Per-org log retention by tier (5/27)

Retention runs per-org: Free=30d, Pro=60d, Enterprise/Scale=90d. Settings UI shows tier-appropriate options with locked indicators for higher tiers.

GCS log sink org-partitioned (5/27)

Restructured to {org_id}/{log_type}/{YYYY}/{MM}/{DD}/{HH}.ndjson across 10 log types (gateway, agent, auth, kb, admin, deployment, billing, compliance, approval, system). Per-org GCS lifecycle rules for tier-based retention.

UX onboarding improvements (5/27)

Settings page shows subscription tier badge. Sidebar nav items requiring a higher tier show Pro/Ent badges. KB cards show pending-state guidance. /api/auth/me returns subscription_tier.

Token efficiency metrics on Gateway dashboard (5/26)

Cost-per-1K-tokens at three levels: overall stat card, per-model in breakdown, and per-request in the logs table. Side-by-side model cost-effectiveness comparison across providers.

Overflow Queue for agents (5/25)

Redis-backed FIFO queue per agent. When RPM ceilings are hit, requests are queued not dropped — 202 Accepted with ticket_id and poll_url. Background drainer (2s interval) retries as capacity frees up. Max depth 500, result TTL 1h.

Agent HPA — Phase 1 (5/25)

Elastic agent capacity. Virtual mode doubles effective RPM in Redis when utilization crosses threshold (default 60%). Scale-down via background loop (30s). Configurable via API, CLI, and bonito.yaml scaling block. Enterprise+ only.

KB vector dimension upgrade — 768 → 1024 (5/25)

Pgvector column migrated from vector(768) to vector(1024) to match Titan Embed V2 native dimensions. Migration NULLs existing embeddings, alters the column, and backfills KBs to embedding_dimensions=1024.

Production reliability — 6 fixes (5/25)

Pgvector greenlet_spawn fix (codec registration moved to checkout event). KB delete cascade fix (raw SQL bypasses ORM ARRAY/vector coercion). Alembic multiple-heads merge after branched migration. Ingestion error handler rollback. GCS fast-fail on missing credentials. Embedding timeout raised 30s → 90s.

KB search quality (5/24)

Tool-search threshold lowered from 0.7 to 0.5 to match RAG injection. Added MODEL_MAX_DIMENSIONS map to clamp requested embedding dimensions to model max — fixes silent ingestion failures with GCP text-embedding-005 (768 max) at KB default 1024.

Gateway Vault fallback (5/24)

_get_provider_credentials() now uses Vault → encrypted DB fallback chain across all provider lookups. No more single-point-of-failure on Vault availability.

External orchestration / Breadcrumbs tracing (5/23)

POST /api/agents/{id}/execute accepts optional parent_agent_id. When set, a synthetic invoke_agent delegation record is logged in the parent's session, so code-orchestrated pipelines appear in Breadcrumbs with zero latency impact. CLI flag: --parent-agent.

Agent Health dashboard (5/23)

Platform admin page at /admin/agent-health cross-references agent model_ids against available provider models to detect deprecated or unroutable models. Background check runs after every 24h model sync. Per-agent health badges: Healthy, Deprecated, No Route, Warning.

Gateway duplicate-provider fix (5/23)

_get_provider_credentials() now keys by provider UUID instead of provider_type — fixes silent credential overwrites when orgs have multiple providers of the same type. All provider CRUD endpoints now call reset_router() for immediate cache invalidation instead of waiting 50min TTL.

Image generation endpoint (5/20)

POST /v1/images/generations live across dall-e-3, dall-e-2, gpt-image-1. Same bn- key as chat. Powers creative-asset workflows: brand-asset pipelines, marketing visuals, campaign generation.

Video generation endpoints (5/20)

POST /v1/videos (submit), GET /v1/videos/{id} (status), GET /v1/videos/{id}/content (download) across OpenAI Sora-2 and Vertex AI Veo 2.0/3.0/3.1. Credentials injected from Vault/DB. Per-second cost tracking.

Sentry tracking — backend + frontend (5/12)

sentry-sdk[fastapi] initializes before FastAPI app with environment-aware sampling (20% prod, 100% dev). @sentry/nextjs SDK with client/server/edge configs, instrumentation hook, global error boundary, and source-map upload via withSentryConfig.

API schema hardening — extra="forbid" (5/12)

All Bonobot create/update schemas now reject unknown fields with 422 instead of silently dropping them. Covers AgentCreate, AgentUpdate, AgentConnectionCreate, AgentGroupCreate/Update, AgentExecuteRequest, AgentScheduleCreate/Update.

All 6 providers connectable via UI (5/06)

Connect modal + onboarding wizard fixed for all 6 providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure AI, Groq). Anthropic validation uses /v1/models instead of hardcoded model. Connect modal uses apiRequest() for JWT auth.

Background model sync — 24h (5/06)

model_sync.py runs every 24h and syncs models for all active providers. Anthropic now uses live API with static pricing fallback. Wired into FastAPI lifespan.

Credential storage fix (5/06)

Legacy POST/PATCH /api/providers endpoints now encrypt credentials at write time (were storing plain JSON). DB fallback auto-migrates plain JSON → AES-256-GCM on read. Bedrock _check_model_access fixed to use real API.

Admin access requests UI (5/06)

Admin page at /admin/access-requests for invite-only registration approval. Submit → admin approve → invite code → register flow, rate-limited at 5 req/60s.

March–April 2026

Bonobot agent framework

Enterprise agent framework with visual canvas (React Flow), project-based organization, built-in tools (KB search, HTTP, agent-to-agent), and enterprise security: default-deny tool policy, budget stops, rate limiting, SSRF protection, full audit trail.

RAG knowledge bases on pgvector HNSW

Cross-cloud RAG pipeline: upload/parse/chunk/embed docs, pgvector HNSW search, gateway context injection on chat completions, source citations on every response.

VectorBoost — KB compression

3.9-8x storage reduction across scalar-8bit, polar-8bit, polar-4bit compression methods. Enterprise+ configurable per KB.

SAML SSO across 4 IdPs

Okta, Azure AD, Google Workspace, Custom SAML. SSO enforcement, break-glass admin, JIT provisioning.

Persistent agent memory

Long-term agent memory with pgvector similarity search, 5 memory types, AI-powered extraction. Cross-session continuity for production agent deployments.

Scheduled autonomous execution

Cron-based agent tasks with timezone support and multi-channel delivery (webhook, email, Slack).

Approval queue / Human-in-the-loop

Risk assessment per tool call, auto-approve conditions, timeout handling, full audit trails. Enterprise governance for agent-initiated actions.

Org Secrets Store — Vault-backed

HashiCorp Vault-backed key-value secrets, runtime injection into agent system prompts. Org-scoped, never exposed to model providers.

Memwright — shared conversational memory

Per-session memory via SQLite + ChromaDB. Model tier gating (zero memory for small models, full context for premium tiers).

February 2026

Deployment Provisioning

Deploy AI models directly into your cloud from the Bonito UI. AWS Provisioned Throughput (reserved capacity), Azure OpenAI deployments (Standard/GlobalStandard TPM), and GCP Vertex AI serverless — all without leaving the dashboard.

Least-Privilege Permissions

Two IAM modes for every provider: Quick Start (managed roles for fast setup) and Enterprise (separate least-privilege policies per capability). Only grant the exact permissions each feature needs.

Routing Policies

Cost-optimized routing, failover chains, and A/B testing with weight-based model selection. Route requests intelligently across providers and models with dry-run testing.

Notifications

In-app notification system for deployment lifecycle events, spend alerts, model activation confirmations, and provider health updates. Configurable alert rules with email and in-app delivery.

Bonito CLI

Python CLI (bonito-cli) for terminal-based management. Manage providers, models, gateway keys, routing policies, and costs from your terminal or CI/CD pipelines.

AI Copilot

Natural language assistant for managing your AI infrastructure. Ask questions about costs, configure routing, and analyze provider health.

Enhanced Cost Analytics

Breakdown by model, provider, team, and application. Export reports in CSV and PDF formats.

January 2026

One-Click Model Activation

Enable models directly from the Bonito dashboard without leaving to your cloud console. Supports individual enable and bulk activation (up to 20 models at once). Works across AWS Bedrock entitlements, Azure deployments, and GCP API enablement.

API Gateway v2

OpenAI-compatible gateway endpoint with intelligent request routing, automatic failover, and support for routing policies. One API key for all your providers.

Google Vertex AI Support

Full integration with Google Vertex AI including Gemini models. Connect and manage alongside your other providers.

Audit Trail

Complete audit logging for every API call, configuration change, and team action. Export for compliance reporting.

December 2025

Budget Alerts

Set spending thresholds per provider, per team, or globally. Receive email and in-app notifications before you exceed them.

Azure OpenAI Integration

Connect your Azure OpenAI deployments alongside AWS Bedrock for multi-cloud routing strategies.

November 2025

Public Launch

Bonito is live! Unified multi-cloud AI management with support for AWS Bedrock, Azure OpenAI, and Google Cloud Vertex AI.