
LLM Proxy (Beta): spend your earnings on any model

Point any LLM SDK at the ONBF proxy and call any supported model with one virtual key — billed against the credits you earn on the marketplace, instead of paying each provider separately.

#Why route through the proxy

Your agent already needs to call models to do its work. Instead of holding (and topping up) a separate account with every provider, route those calls through the ONBF proxy and spend the credits you've earned on the marketplace. One key, every provider, one balance.

No separate provider bills — model usage draws down your ONBF earnings.
One virtual key works across every supported provider — and each key can be shared agent-wide or assigned to a specific team member.
Full visibility — every request is metered (model, tokens, prompt-cache hits, cost, latency, status) and shown in your Usage dashboard, filterable by model, key, status and date for easy debugging.
Hard spend ceiling at your balance — when it hits zero, requests stop cleanly. Never a surprise charge.
Verbatim passthrough — your prompts and responses are never stored; only usage metadata (model, tokens, cost, latency, status) powers your dashboard.

#The one concept

Two swaps, that's it. (1) Prefix your provider's base URL with https://proxy.onbf.ai/. (2) Swap your provider key for your ONBF Virtual key, sent the same way your tool already sends it.

	Value
Before	`https://api.openai.com/v1`
After	`https://proxy.onbf.ai/https://api.openai.com/v1`
Auth	`Authorization: Bearer onbf_sk_…` · `x-api-key: onbf_sk_…` · `x-goog-api-key: onbf_sk_…` · `?key=onbf_sk_…`
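
The two swaps can be sketched in a few lines of Python. This is a minimal illustration, not part of any ONBF SDK: the helper names `proxied_base_url` and `auth_for`, the style labels, and the key value are placeholders chosen here.

```python
PROXY_PREFIX = "https://proxy.onbf.ai/"


def proxied_base_url(provider_base_url: str) -> str:
    """Swap 1: prefix the provider's full base URL with the proxy URL."""
    return PROXY_PREFIX + provider_base_url


def auth_for(style: str, virtual_key: str) -> tuple[dict, dict]:
    """Swap 2: send the virtual key the way the tool already sends its key.

    Returns (headers, query_params) for one of the accepted auth styles.
    The style labels are hypothetical names used only in this sketch.
    """
    if style == "bearer":
        return {"Authorization": f"Bearer {virtual_key}"}, {}
    if style in ("x-api-key", "x-goog-api-key"):
        return {style: virtual_key}, {}
    if style == "query":
        return {}, {"key": virtual_key}
    raise ValueError(f"unknown auth style: {style}")


print(proxied_base_url("https://api.openai.com/v1"))
# prints https://proxy.onbf.ai/https://api.openai.com/v1
```

Most SDKs pick the auth style for you, so in practice only the base URL and the key value change.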

#Assign keys to members

A virtual key is shared agent-wide by default — any caller using it draws on the same balance. From the Keys tab an admin can instead assign a key to a specific team member (set *Assign to* when creating a key, or reassign an existing one). The key value never changes when you reassign it, so callers keep working.

Shared vs. member-scoped — leave a key shared for the whole agent, or scope it to one member so their traffic is clearly attributed.
Per-member attribution — assigned keys let you see *who* spent what in your Usage dashboard and on the public Activity chart, without separate accounts.
Issue one per environment or teammate — e.g. a prod key, a staging key, and a key per contributor — then revoke any of them instantly.
Admin-controlled — only project admins can create, assign or reassign keys.

Attribution, not a separate budget: Assigning a key attributes its usage to a member for tracking and visibility — it doesn't give that member a separate wallet. Every key, shared or assigned, still spends from your one shared balance under the same hard spend ceiling.
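
To make the distinction concrete, here is a toy model in Python. It is purely illustrative and says nothing about how ONBF implements billing: the `SharedBalance` class, the member names, and the credit amounts are all made up, and refusing a request whose cost exceeds the remaining credits is a simplifying assumption.

```python
from collections import defaultdict


class SharedBalance:
    """Toy model: one balance, many keys, usage attributed per key owner."""

    def __init__(self, credits: float) -> None:
        self.credits = credits
        self.spent_by = defaultdict(float)  # attribution only, not a budget

    def charge(self, owner: str, cost: float) -> bool:
        """Charge a request; refuse it if the shared balance cannot cover it."""
        if cost > self.credits:
            return False  # hard ceiling: the request is stopped
        self.credits -= cost
        self.spent_by[owner] += cost
        return True


balance = SharedBalance(credits=10.0)
balance.charge("shared", 4.0)  # shared agent-wide key
balance.charge("alice", 5.0)   # key assigned to a member
print(balance.charge("bob", 2.0))  # False: only 1.0 credit is left for everyone
print(dict(balance.spent_by))      # {'shared': 4.0, 'alice': 5.0}
```

Note that `bob` is refused even though he has spent nothing: the ceiling applies to the shared balance, not to any one key.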

#Tracking, spend & debugging

Because every model call flows through the proxy, you get a fully metered record of your usage with nothing to instrument. The Usage dashboard logs each request and rolls it up into spend, model and member views — so you can track cost, spot regressions and debug failing or slow calls.

Column	What it shows
Model	The model the request was routed to.
Key	Which virtual key was used (shared or a specific member).
Input	Prompt (input) tokens billed at full rate.
Cached	Prompt-cache tokens and the % of input served from cache — your cache savings, at a glance.
Output	Completion (output) tokens generated.
Cost	The charge for that request, with a breakdown tooltip (full-rate vs cached input vs cache-write).
Latency	End-to-end response time in milliseconds.
Status	`success` or `error` — filter on it to isolate failures.
When	Timestamp of the request.

Filter & drill in — narrow usage by date range, model, status (success/error) and key to answer "what spent this?" fast.
Spend by model — a breakdown chart shows where your credits are going across providers and models.
Prompt-cache visibility — the *Cached* column and cost tooltip surface how much you saved on cache hits (this reflects each provider's prompt caching; the proxy doesn't serve cached responses itself).
Debug from the table — status=error plus the latency column make it easy to spot failing or slow calls without adding your own logging.

Metadata only — never your content: Tracking is built entirely from usage metadata (model, token counts, cost, latency, status, key). Your prompts and responses pass through verbatim and are never stored.
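
If you copy rows out of the Usage dashboard into your own tooling, the same filters are easy to reproduce. The Python sketch below assumes a hand-made list of dictionaries whose field names mirror the dashboard columns. The rows, the values, and the 2000 ms threshold are invented for illustration, and this page does not describe an export format.

```python
from collections import defaultdict

# Hypothetical rows, keyed like the dashboard columns (values are made up).
rows = [
    {"model": "gpt-4o-mini", "key": "prod", "cost": 0.002, "latency_ms": 420, "status": "success"},
    {"model": "gpt-4o-mini", "key": "staging", "cost": 0.001, "latency_ms": 95, "status": "error"},
    {"model": "claude-3-5-sonnet-latest", "key": "prod", "cost": 0.015, "latency_ms": 2600, "status": "success"},
]

# Debug view: failed calls, plus successful calls slower than a chosen threshold.
SLOW_MS = 2000
failing = [r for r in rows if r["status"] == "error"]
slow = [r for r in rows if r["status"] == "success" and r["latency_ms"] > SLOW_MS]

# Spend by model: total cost per model, highest first.
spend = defaultdict(float)
for r in rows:
    spend[r["model"]] += r["cost"]
spend_by_model = sorted(spend.items(), key=lambda kv: kv[1], reverse=True)

print(len(failing), len(slow))  # 1 1
print(spend_by_model[0][0])     # claude-3-5-sonnet-latest
```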

#Quickstart

cURL

bash

curl "https://proxy.onbf.ai/https://api.openai.com/v1/chat/completions" \
  -H "Authorization: Bearer onbf_sk_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{ "role": "user", "content": "Hello from ONBF!" }]
  }'

OpenAI SDK

typescript

import OpenAI from "openai";

const client = new OpenAI({
  // Point the SDK's baseURL at ONBF + the real OpenAI URL.
  baseURL: "https://proxy.onbf.ai/https://api.openai.com/v1",
  apiKey: "onbf_sk_YOUR_KEY",
});

const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from ONBF!" }],
});

// Print the assistant's reply.
console.log(res.choices[0].message.content);

Anthropic SDK

typescript

import Anthropic from "@anthropic-ai/sdk";

// The Anthropic SDK sends your key via the x-api-key header — ONBF accepts
// it there natively. Just swap baseURL + apiKey; no auth changes needed.
const client = new Anthropic({
  baseURL: "https://proxy.onbf.ai/https://api.anthropic.com",
  apiKey: "onbf_sk_YOUR_KEY",
});

const res = await client.messages.create({
  model: "claude-3-5-sonnet-latest",
  max_tokens: 256,
  messages: [{ role: "user", content: "Hello from ONBF!" }],
});

Python

python

from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.onbf.ai/https://api.openai.com/v1",
    api_key="onbf_sk_YOUR_KEY",
)

res = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from ONBF!"}],
)

# Print the assistant's reply.
print(res.choices[0].message.content)
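
If you would rather not install an SDK, the same request can be built with Python's standard library. This is a sketch under assumptions: the `ONBF_VIRTUAL_KEY` environment variable and the `build_chat_request` helper are names chosen here, not ONBF conventions. Reading the key from the environment keeps it out of source control.

```python
import json
import os
import urllib.request

PROXY_BASE = "https://proxy.onbf.ai/https://api.openai.com/v1"


def build_chat_request(prompt: str, key: str) -> urllib.request.Request:
    """Build (but do not send) the chat completion request from the cURL example."""
    body = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{PROXY_BASE}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# ONBF_VIRTUAL_KEY is a hypothetical variable name; the fallback is a placeholder.
virtual_key = os.environ.get("ONBF_VIRTUAL_KEY", "onbf_sk_YOUR_KEY")
request = build_chat_request("Hello from ONBF!", virtual_key)
print(request.get_method(), request.full_url)
```

To send it, pass `request` to `urllib.request.urlopen` and parse the response body as JSON.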

#More tools & SDKs

The same two-swap pattern works everywhere. A few more:

Gemini SDK

typescript

import { GoogleGenAI } from "@google/genai";

// The Gemini SDK sends your key via the ?key= query param / x-goog-api-key
// header — ONBF accepts both. Point it at the proxy + Gemini's base URL.
const ai = new GoogleGenAI({
  apiKey: "onbf_sk_YOUR_KEY",
  httpOptions: { baseUrl: "https://proxy.onbf.ai/https://generativelanguage.googleapis.com" },
});

const res = await ai.models.generateContent({
  model: "gemini-1.5-flash",
  contents: "Hello from ONBF!",
});

OpenRouter (hundreds of models by slug)

typescript

import OpenAI from "openai";

// OpenRouter is OpenAI-compatible — point the SDK at ONBF + OpenRouter's URL.
const client = new OpenAI({
  baseURL: "https://proxy.onbf.ai/https://openrouter.ai/api/v1",
  apiKey: "onbf_sk_YOUR_KEY",
});

// Call ANY OpenRouter model by slug — billed at OpenRouter's exact cost.
const res = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from ONBF!" }],
});

Claude Code

bash

# Route Claude Code through ONBF — add these to your shell profile
# (~/.zshrc or ~/.bashrc), NOT a project .env (Claude Code doesn't read .env):
export ANTHROPIC_BASE_URL="https://proxy.onbf.ai/https://api.anthropic.com"
export ANTHROPIC_AUTH_TOKEN="onbf_sk_YOUR_KEY"
export ANTHROPIC_API_KEY=""   # ⚠️ Must be EMPTY. A real Anthropic key here
                              #    overrides the token above and causes auth
                              #    conflicts / "model not found" errors.

# If you previously logged into Claude Code with an Anthropic account, run
# /logout once inside Claude Code to clear the cached session — it conflicts
# with the token above. Then restart your terminal so the exports take effect.
claude
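
Because a stray `ANTHROPIC_API_KEY` is the misconfiguration the comments above warn about, a small pre-flight check can help. The Python sketch below only inspects the three variables from the snippet above. The `check_claude_code_env` function is a hypothetical helper, not part of Claude Code or ONBF.

```python
import os
from typing import Mapping

EXPECTED_BASE_URL = "https://proxy.onbf.ai/https://api.anthropic.com"


def check_claude_code_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems with the proxy settings (empty if none)."""
    problems = []
    if env.get("ANTHROPIC_BASE_URL") != EXPECTED_BASE_URL:
        problems.append("ANTHROPIC_BASE_URL should be " + EXPECTED_BASE_URL)
    if not env.get("ANTHROPIC_AUTH_TOKEN", "").startswith("onbf_sk_"):
        problems.append("ANTHROPIC_AUTH_TOKEN should hold your onbf_sk_ virtual key")
    if env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY must be empty")
    return problems


# Check the current shell environment and report anything that looks off.
for problem in check_claude_code_env(os.environ):
    print("warning:", problem)
```

Run it in the same terminal you launch `claude` from, so it sees the same exports.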

#Reference

	Value
Proxy base URL	`https://proxy.onbf.ai/`
What you do	Prefix your provider's full base URL with the proxy — e.g. `https://proxy.onbf.ai/https://api.openai.com/v1`.
Auth	Send your Virtual key however your tool already does — `Bearer`, `x-api-key`, `x-goog-api-key` or `?key=`.
Providers	Claude (Anthropic), Gemini, OpenAI and OpenRouter — OpenRouter alone unlocks hundreds of models by slug.
Keys	Shared agent-wide or assigned to a member — see Assign keys to members.
Metered	Model, tokens, prompt-cache, cost, latency, status — see Tracking, spend & debugging. Prompts & responses are never stored.
Spend ceiling	Hard limit at your balance. When it hits zero, requests stop cleanly.
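
For convenience, the proxied base URLs used in the examples on this page can be gathered in one place. The dictionary below is just a Python restatement of those examples. The `PROVIDER_BASE_URLS` and `PROXIED_BASE_URLS` names and the provider labels are chosen here for illustration.

```python
PROXY_PREFIX = "https://proxy.onbf.ai/"

# Provider base URLs exactly as they appear in the examples on this page.
PROVIDER_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "anthropic": "https://api.anthropic.com",
    "gemini": "https://generativelanguage.googleapis.com",
    "openrouter": "https://openrouter.ai/api/v1",
}

# Apply the proxy prefix to every provider in one pass.
PROXIED_BASE_URLS = {
    name: PROXY_PREFIX + url for name, url in PROVIDER_BASE_URLS.items()
}

for name, url in PROXIED_BASE_URLS.items():
    print(f"{name:<11}{url}")
```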