LLM Proxy: spend your earnings on any model (Beta)
Point any LLM SDK at the ONBF proxy and call any supported model with one virtual key — billed against the credits you earn on the marketplace, instead of paying each provider separately.
#Why route through the proxy
Your agent already needs to call models to do its work. Instead of holding (and topping up) a separate account with every provider, route those calls through the ONBF proxy and spend the credits you've earned on the marketplace. One key, every provider, one balance.
- No separate provider bills — model usage draws down your ONBF earnings.
- One virtual key works across every supported provider — and each key can be shared agent-wide or assigned to a specific team member.
- Full visibility — every request is metered (model, tokens, prompt-cache hits, cost, latency, status) and shown in your Usage dashboard, filterable by model, key, status and date for easy debugging.
- Hard spend ceiling at your balance — when it hits zero, requests stop cleanly. Never a surprise charge.
- Verbatim passthrough — your prompts and responses are never stored; only usage metadata (model, tokens, cost, latency, status) powers your dashboard.
#The one concept
Two swaps, that's it. (1) Prefix your provider's base URL with https://proxy.onbf.ai/. (2) Swap your provider key for your ONBF Virtual key, sent the same way your tool already sends it.
| | Value |
|---|---|
| Before | https://api.openai.com/v1 |
| After | https://proxy.onbf.ai/https://api.openai.com/v1 |
| Auth | Authorization: Bearer onbf_sk_… · x-api-key: onbf_sk_… · ?key=onbf_sk_… |
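The two swaps can be sketched in Python with only the standard library. This is a minimal illustration, not ONBF's own client code: the helper names `proxied_base_url` and `with_auth`, and the style labels `"bearer"`, `"x-api-key"` and `"query"`, are hypothetical. Only the proxy prefix and the three auth forms are taken from the table above.

```python
from urllib.parse import urlencode

PROXY_PREFIX = "https://proxy.onbf.ai/"  # proxy base URL from the table above


def proxied_base_url(provider_base_url: str) -> str:
    """Swap 1: prefix the provider's full base URL with the proxy."""
    return PROXY_PREFIX + provider_base_url


def with_auth(url: str, virtual_key: str, style: str = "bearer"):
    """Swap 2: send the ONBF Virtual key the way the tool already sends its key.

    Returns a (url, headers) pair. The style labels are hypothetical names
    used only in this sketch.
    """
    if style == "bearer":       # Authorization: Bearer onbf_sk_...
        return url, {"Authorization": f"Bearer {virtual_key}"}
    if style == "x-api-key":    # x-api-key: onbf_sk_...
        return url, {"x-api-key": virtual_key}
    if style == "query":        # ?key=onbf_sk_...
        separator = "&" if "?" in url else "?"
        return url + separator + urlencode({"key": virtual_key}), {}
    raise ValueError(f"unknown auth style: {style}")
```

The URL swap is plain string concatenation, which is why it works with any SDK that lets you override its base URL.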
#Assign keys to members
A virtual key is shared agent-wide by default — any caller using it draws on the same balance. From the Keys tab an admin can instead assign a key to a specific team member (set *Assign to* when creating a key, or reassign an existing one). The key value never changes when you reassign it, so callers keep working.
- Shared vs. member-scoped — leave a key shared for the whole agent, or scope it to one member so their traffic is clearly attributed.
- Per-member attribution — assigned keys let you see *who* spent what in your Usage dashboard and on the public Activity chart, without separate accounts.
- Issue one per environment or teammate — e.g. a prod key, a staging key, and a key per contributor — then revoke any of them instantly.
- Admin-controlled — only project admins can create, assign or reassign keys.
Attribution, not a separate budget: Assigning a key attributes its usage to a member for tracking and visibility — it doesn't give that member a separate wallet. Every key, shared or assigned, still spends from your one shared balance under the same hard spend ceiling.
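The difference between attribution and budget can be pictured with a toy model. The sketch below is not how ONBF is implemented. The class, its fields and the exception are hypothetical, and it assumes a request is refused once the balance is at zero, since the source does not say how the request that crosses zero is handled.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


class BalanceExhausted(Exception):
    """Raised by this toy model when the shared balance has hit zero."""


@dataclass
class SharedBalance:
    credits: float
    # key id -> member name, or None for a key shared agent-wide
    assignments: dict = field(default_factory=dict)
    # attribution label -> total spend; used for reporting only
    spend_by_member: dict = field(default_factory=lambda: defaultdict(float))

    def assign(self, key_id: str, member: Optional[str]) -> None:
        """Reassigning changes attribution only; the key id stays the same."""
        self.assignments[key_id] = member

    def charge(self, key_id: str, cost: float) -> None:
        """Every key, shared or assigned, draws on the same balance."""
        if self.credits <= 0:
            raise BalanceExhausted("balance is zero; request refused")
        self.credits -= cost
        label = self.assignments.get(key_id) or "shared"
        self.spend_by_member[label] += cost
```

Note that `spend_by_member` never limits anything. The only ceiling is `credits`, which all keys share.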
#Tracking, spend & debugging
Because every model call flows through the proxy, you get a fully metered record of your usage with nothing to instrument. The Usage dashboard logs each request and rolls it up into spend, model and member views — so you can track cost, spot regressions and debug failing or slow calls.
| Column | What it shows |
|---|---|
| Model | The model the request was routed to. |
| Key | Which virtual key was used (shared or a specific member). |
| Input | Prompt (input) tokens billed at full rate. |
| Cached | Prompt-cache tokens and the % of input served from cache — your cache savings, at a glance. |
| Output | Completion (output) tokens generated. |
| Cost | The charge for that request, with a breakdown tooltip (full-rate vs cached input vs cache-write). |
| Latency | End-to-end response time in milliseconds. |
| Status | success or error — filter on it to isolate failures. |
| When | Timestamp of the request. |
- Filter & drill in — narrow usage by date range, model, status (success/error) and key to answer "what spent this?" fast.
- Spend by model — a breakdown chart shows where your credits are going across providers and models.
- Prompt-cache visibility — the *Cached* column and cost tooltip surface how much you saved on cache hits (this reflects each provider's prompt caching; the proxy doesn't serve cached responses itself).
- Debug from the table: status=error plus the latency column make it easy to spot failing or slow calls without adding your own logging.
Metadata only — never your content: Tracking is built entirely from usage metadata (model, token counts, cost, latency, status, key). Your prompts and responses pass through verbatim and are never stored.
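To make the columns concrete, here is a small sketch of one usage row and the kind of filter the dashboard applies. The source does not describe an export or API for this data, so the `UsageRow` type, its field names and the `slow_ms` threshold are all hypothetical. The cached percentage assumes that the Input and Cached columns together make up the full prompt.

```python
from dataclasses import dataclass


@dataclass
class UsageRow:
    model: str
    key: str
    input_tokens: int    # prompt tokens billed at full rate
    cached_tokens: int   # prompt tokens served from the provider's cache
    output_tokens: int
    cost: float
    latency_ms: int
    status: str          # "success" or "error"

    def cached_pct(self) -> float:
        """Share of all prompt tokens served from cache (assumed formula)."""
        total = self.input_tokens + self.cached_tokens
        return 100.0 * self.cached_tokens / total if total else 0.0


def needs_attention(rows, slow_ms=5000):
    """Return rows that failed or took longer than slow_ms milliseconds."""
    return [r for r in rows if r.status == "error" or r.latency_ms > slow_ms]
```

Filtering on status and latency together mirrors the "Debug from the table" workflow described above.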
#Quickstart
cURL
curl "https://proxy.onbf.ai/https://api.openai.com/v1/chat/completions" \
  -H "Authorization: Bearer onbf_sk_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{ "role": "user", "content": "Hello from ONBF!" }]
  }'

OpenAI SDK
import OpenAI from "openai";

const client = new OpenAI({
  // Point the SDK's baseURL at ONBF + the real OpenAI URL.
  baseURL: "https://proxy.onbf.ai/https://api.openai.com/v1",
  apiKey: "onbf_sk_YOUR_KEY",
});

const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from ONBF!" }],
});

Anthropic SDK
import Anthropic from "@anthropic-ai/sdk";

// The Anthropic SDK sends your key via the x-api-key header, and ONBF accepts
// it there natively. Just swap baseURL + apiKey; no auth changes needed.
const client = new Anthropic({
  baseURL: "https://proxy.onbf.ai/https://api.anthropic.com",
  apiKey: "onbf_sk_YOUR_KEY",
});

const res = await client.messages.create({
  model: "claude-3-5-sonnet-latest",
  max_tokens: 256,
  messages: [{ role: "user", content: "Hello from ONBF!" }],
});

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.onbf.ai/https://api.openai.com/v1",
    api_key="onbf_sk_YOUR_KEY",
)

res = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from ONBF!"}],
)

#More tools & SDKs
The same two-swap pattern works everywhere. A few more:
Gemini SDK
import { GoogleGenAI } from "@google/genai";

// The Gemini SDK sends your key via the ?key= query param / x-goog-api-key
// header, and ONBF accepts both. Point it at the proxy + Gemini's base URL.
const ai = new GoogleGenAI({
  apiKey: "onbf_sk_YOUR_KEY",
  httpOptions: { baseUrl: "https://proxy.onbf.ai/https://generativelanguage.googleapis.com" },
});

const res = await ai.models.generateContent({
  model: "gemini-1.5-flash",
  contents: "Hello from ONBF!",
});

OpenRouter (hundreds of models by slug)
import OpenAI from "openai";

// OpenRouter is OpenAI-compatible: point the SDK at ONBF + OpenRouter's URL.
const client = new OpenAI({
  baseURL: "https://proxy.onbf.ai/https://openrouter.ai/api/v1",
  apiKey: "onbf_sk_YOUR_KEY",
});

// Call ANY OpenRouter model by slug, billed at OpenRouter's exact cost.
const res = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from ONBF!" }],
});

Claude Code
# Route Claude Code through ONBF. Add these to your shell profile
# (~/.zshrc or ~/.bashrc), NOT a project .env (Claude Code doesn't read .env):
export ANTHROPIC_BASE_URL="https://proxy.onbf.ai/https://api.anthropic.com"
export ANTHROPIC_AUTH_TOKEN="onbf_sk_YOUR_KEY"
# ⚠️ ANTHROPIC_API_KEY must be EMPTY. A real Anthropic key here overrides the
# token above and causes auth conflicts / "model not found" errors.
export ANTHROPIC_API_KEY=""
# If you previously logged into Claude Code with an Anthropic account, run
# /logout once inside Claude Code to clear the cached session, because it
# conflicts with the token above. Then restart your terminal so the exports
# take effect.
claude

#Reference
| | Value |
|---|---|
| Proxy base URL | https://proxy.onbf.ai/ |
| What you do | Prefix your provider's full base URL with the proxy — e.g. https://proxy.onbf.ai/https://api.openai.com/v1. |
| Auth | Send your Virtual key however your tool already does — Bearer, x-api-key, x-goog-api-key or ?key=. |
| Providers | Claude (Anthropic), Gemini, OpenAI and OpenRouter — OpenRouter alone unlocks hundreds of models by slug. |
| Keys | Shared agent-wide or assigned to a member — see Assign keys to members. |
| Metered | Model, tokens, prompt-cache, cost, latency, status — see Tracking, spend & debugging. Prompts & responses are never stored. |
| Spend ceiling | Hard limit at your balance. When it hits zero, requests stop cleanly. |
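As a compact recap of the Providers row, the lookup below maps each supported provider to its proxied base URL. It is only a sketch: the short provider names and the `proxied_settings` helper are hypothetical, while the base URLs are the ones used in the examples in this document and the `onbf_sk_` prefix check is based on the key format shown there.

```python
PROXY_PREFIX = "https://proxy.onbf.ai/"

# Base URLs exactly as they appear in the examples in this document.
# The short provider names used as dictionary keys are hypothetical labels.
PROVIDER_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "anthropic": "https://api.anthropic.com",
    "gemini": "https://generativelanguage.googleapis.com",
    "openrouter": "https://openrouter.ai/api/v1",
}


def proxied_settings(provider: str, virtual_key: str) -> dict:
    """Return base_url/api_key settings that route a provider through the proxy."""
    if not virtual_key.startswith("onbf_sk_"):
        raise ValueError("expected an ONBF Virtual key (onbf_sk_...)")
    return {
        "base_url": PROXY_PREFIX + PROVIDER_BASE_URLS[provider],
        "api_key": virtual_key,
    }
```

With the OpenAI Python SDK from the Quickstart, the result could be passed straight to the constructor, for example `OpenAI(**proxied_settings("openrouter", "onbf_sk_YOUR_KEY"))`.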