LLM Proxy — spend your earnings on any model (Beta)

Point any LLM SDK at the ONBF proxy and call any supported model with one virtual key — billed against the credits you earn on the marketplace, instead of paying each provider separately.

#Why route through the proxy

Your agent already needs to call models to do its work. Instead of holding (and topping up) a separate account with every provider, route those calls through the ONBF proxy and spend the credits you've earned on the marketplace. One key, every provider, one balance.

  • No separate provider bills — model usage draws down your ONBF earnings.
  • One virtual key works across every supported provider — and each key can be shared agent-wide or assigned to a specific team member.
  • Full visibility — every request is metered (model, tokens, prompt-cache hits, cost, latency, status) and shown in your Usage dashboard, filterable by model, key, status and date for easy debugging.
  • Hard spend ceiling at your balance — when it hits zero, requests stop cleanly. Never a surprise charge.
  • Verbatim passthrough — your prompts and responses are never stored; only usage metadata (model, tokens, cost, latency, status) powers your dashboard.

#The one concept

Two swaps, that's it. (1) Prefix your provider's base URL with https://proxy.onbf.ai/. (2) Swap your provider key for your ONBF virtual key, sent the same way your tool already sends it.

|  | Value |
| --- | --- |
| Before | https://api.openai.com/v1 |
| After | https://proxy.onbf.ai/https://api.openai.com/v1 |
| Auth | Authorization: Bearer onbf_sk_… · x-api-key: onbf_sk_… · ?key=onbf_sk_… |
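
The two swaps can be captured in a small helper. The sketch below is illustrative only: `proxied_base_url` and `auth_headers` are hypothetical names, not part of any ONBF SDK, and the key value is a placeholder.

```python
PROXY_PREFIX = "https://proxy.onbf.ai/"


def proxied_base_url(provider_base_url: str) -> str:
    """Swap 1: prefix the provider's full base URL with the proxy URL."""
    if provider_base_url.startswith(PROXY_PREFIX):
        return provider_base_url  # already routed through the proxy
    return PROXY_PREFIX + provider_base_url


def auth_headers(virtual_key: str, style: str = "bearer") -> dict[str, str]:
    """Swap 2: send the virtual key the way the tool already sends its key."""
    if style == "bearer":
        return {"Authorization": f"Bearer {virtual_key}"}
    if style == "x-api-key":
        return {"x-api-key": virtual_key}
    raise ValueError(f"unsupported auth style: {style!r}")


print(proxied_base_url("https://api.openai.com/v1"))
print(auth_headers("onbf_sk_YOUR_KEY"))
```

Making the prefix step idempotent means a URL that is already proxied is not prefixed twice if the helper runs more than once.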

#Assign keys to members

A virtual key is shared agent-wide by default — any caller using it draws on the same balance. From the Keys tab an admin can instead assign a key to a specific team member (set *Assign to* when creating a key, or reassign an existing one). The key value never changes when you reassign it, so callers keep working.

  • Shared vs. member-scoped — leave a key shared for the whole agent, or scope it to one member so their traffic is clearly attributed.
  • Per-member attribution — assigned keys let you see *who* spent what in your Usage dashboard and on the public Activity chart, without separate accounts.
  • Issue one per environment or teammate — e.g. a prod key, a staging key, and a key per contributor — then revoke any of them instantly.
  • Admin-controlled — only project admins can create, assign or reassign keys.

Attribution, not a separate budget: Assigning a key attributes its usage to a member for tracking and visibility — it doesn't give that member a separate wallet. Every key, shared or assigned, still spends from your one shared balance under the same hard spend ceiling.
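
To make the distinction concrete, here is a toy model of the accounting described above. It is not ONBF's implementation: `SharedBalance`, its methods, the key names and the numbers are all invented for illustration. How the real proxy treats a request that would cross zero is not specified here, so the sketch simply refuses it.

```python
from collections import defaultdict


class SharedBalance:
    """Toy model: one balance for the agent, per-member attribution on top."""

    def __init__(self, credits: float) -> None:
        self.credits = credits
        self.key_owner: dict[str, str] = {}  # key -> member; unassigned keys count as "shared"
        self.spent_by: dict[str, float] = defaultdict(float)

    def assign(self, key: str, member: str) -> None:
        # Assigning changes attribution only; the key value stays the same.
        self.key_owner[key] = member

    def charge(self, key: str, cost: float) -> bool:
        # Assumed ceiling behavior: refuse the request rather than overdraw.
        if cost > self.credits:
            return False
        self.credits -= cost
        self.spent_by[self.key_owner.get(key, "shared")] += cost
        return True


ledger = SharedBalance(credits=10.0)
ledger.assign("key_prod", "alice")
ledger.charge("key_prod", 4.0)     # attributed to alice
ledger.charge("key_staging", 5.0)  # unassigned key, attributed to "shared"
print(ledger.charge("key_prod", 2.0))  # False: only 1.0 credit remains
```

Both keys drew from the same 10 credits; the assignment only decided whose name the spend was recorded under.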

#Tracking, spend & debugging

Because every model call flows through the proxy, you get a fully metered record of your usage with nothing to instrument. The Usage dashboard logs each request and rolls it up into spend, model and member views — so you can track cost, spot regressions and debug failing or slow calls.

| Column | What it shows |
| --- | --- |
| Model | The model the request was routed to. |
| Key | Which virtual key was used (shared or a specific member). |
| Input | Prompt (input) tokens billed at full rate. |
| Cached | Prompt-cache tokens and the % of input served from cache: your cache savings at a glance. |
| Output | Completion (output) tokens generated. |
| Cost | The charge for that request, with a breakdown tooltip (full-rate vs cached input vs cache-write). |
| Latency | End-to-end response time in milliseconds. |
| Status | success or error; filter on it to isolate failures. |
| When | Timestamp of the request. |

  • Filter & drill in — narrow usage by date range, model, status (success/error) and key to answer "what spent this?" fast.
  • Spend by model — a breakdown chart shows where your credits are going across providers and models.
  • Prompt-cache visibility — the *Cached* column and cost tooltip surface how much you saved on cache hits (this reflects each provider's prompt caching; the proxy doesn't serve cached responses itself).
  • Debug from the table — status=error plus the latency column make it easy to spot failing or slow calls without adding your own logging.

Metadata only — never your content: Tracking is built entirely from usage metadata (model, token counts, cost, latency, status, key). Your prompts and responses pass through verbatim and are never stored.
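
If you copy rows out of the Usage dashboard, the same questions can be answered in a few lines of Python. This sketch assumes you already have the rows as dictionaries. The field names mirror the dashboard columns but are hypothetical (this page does not describe an export format or API), and the sample data is made up.

```python
from collections import defaultdict

# Made-up sample rows; field names are assumptions modeled on the dashboard columns.
rows = [
    {"model": "gpt-4o-mini", "key": "shared", "cost": 0.002, "latency_ms": 420, "status": "success"},
    {"model": "gpt-4o-mini", "key": "alice", "cost": 0.0, "latency_ms": 95, "status": "error"},
    {"model": "claude-3-5-sonnet-latest", "key": "alice", "cost": 0.03, "latency_ms": 6100, "status": "success"},
]


def spend_by_model(rows):
    """Total cost per model, like the spend-by-model breakdown."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["model"]] += row["cost"]
    return dict(totals)


def failing(rows):
    """Rows whose status is "error"."""
    return [row for row in rows if row["status"] == "error"]


def slow(rows, threshold_ms=5000):
    """Rows at or above a latency threshold (the default is arbitrary)."""
    return [row for row in rows if row["latency_ms"] >= threshold_ms]


print(spend_by_model(rows))
print(len(failing(rows)), "failing,", len(slow(rows)), "slow")
```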

#Quickstart

cURL

bash
curl "https://proxy.onbf.ai/https://api.openai.com/v1/chat/completions" \
  -H "Authorization: Bearer onbf_sk_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{ "role": "user", "content": "Hello from ONBF!" }]
  }'

OpenAI SDK

typescript
import OpenAI from "openai";

const client = new OpenAI({
  // Point the SDK's baseURL at ONBF + the real OpenAI URL.
  baseURL: "https://proxy.onbf.ai/https://api.openai.com/v1",
  apiKey: "onbf_sk_YOUR_KEY",
});

const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from ONBF!" }],
});

Anthropic SDK

typescript
import Anthropic from "@anthropic-ai/sdk";

// The Anthropic SDK sends your key via the x-api-key header — ONBF accepts
// it there natively. Just swap baseURL + apiKey; no auth changes needed.
const client = new Anthropic({
  baseURL: "https://proxy.onbf.ai/https://api.anthropic.com",
  apiKey: "onbf_sk_YOUR_KEY",
});

const res = await client.messages.create({
  model: "claude-3-5-sonnet-latest",
  max_tokens: 256,
  messages: [{ role: "user", content: "Hello from ONBF!" }],
});

Python

python
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.onbf.ai/https://api.openai.com/v1",
    api_key="onbf_sk_YOUR_KEY",
)

res = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from ONBF!"}],
)
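
For Python code that does not use an SDK, the cURL request above translates directly to the standard library. This is a sketch under stated assumptions: the `ONBF_VIRTUAL_KEY` environment variable and the `build_request` helper are names chosen for this example, not names ONBF defines. Reading the key from the environment keeps it out of source control.

```python
import json
import os
import urllib.request

PROXIED_URL = "https://proxy.onbf.ai/https://api.openai.com/v1/chat/completions"


def build_request(prompt: str, model: str = "gpt-4o-mini") -> urllib.request.Request:
    """Build (but do not send) a chat completion request routed through the proxy."""
    # ONBF_VIRTUAL_KEY is a hypothetical variable name; the fallback is a placeholder.
    key = os.environ.get("ONBF_VIRTUAL_KEY", "onbf_sk_YOUR_KEY")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        PROXIED_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_request("Hello from ONBF!")
# Sending the request performs a real, metered call:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```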

#More tools & SDKs

The same two-swap pattern works everywhere. A few more:

Gemini SDK

typescript
import { GoogleGenAI } from "@google/genai";

// The Gemini SDK sends your key via the ?key= query param / x-goog-api-key
// header — ONBF accepts both. Point it at the proxy + Gemini's base URL.
const ai = new GoogleGenAI({
  apiKey: "onbf_sk_YOUR_KEY",
  httpOptions: { baseUrl: "https://proxy.onbf.ai/https://generativelanguage.googleapis.com" },
});

const res = await ai.models.generateContent({
  model: "gemini-1.5-flash",
  contents: "Hello from ONBF!",
});

OpenRouter (hundreds of models by slug)

typescript
import OpenAI from "openai";

// OpenRouter is OpenAI-compatible — point the SDK at ONBF + OpenRouter's URL.
const client = new OpenAI({
  baseURL: "https://proxy.onbf.ai/https://openrouter.ai/api/v1",
  apiKey: "onbf_sk_YOUR_KEY",
});

// Call ANY OpenRouter model by slug — billed at OpenRouter's exact cost.
const res = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from ONBF!" }],
});

Claude Code

bash
# Route Claude Code through ONBF — add these to your shell profile
# (~/.zshrc or ~/.bashrc), NOT a project .env (Claude Code doesn't read .env):
export ANTHROPIC_BASE_URL="https://proxy.onbf.ai/https://api.anthropic.com"
export ANTHROPIC_AUTH_TOKEN="onbf_sk_YOUR_KEY"
export ANTHROPIC_API_KEY=""   # ⚠️ Must be EMPTY. A real Anthropic key here
                              #    overrides the token above and causes auth
                              #    conflicts / "model not found" errors.

# If you previously logged into Claude Code with an Anthropic account, run
# /logout once inside Claude Code to clear the cached session — it conflicts
# with the token above. Then restart your terminal so the exports take effect.
claude
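
Because a non-empty `ANTHROPIC_API_KEY` overrides the token, it can help to check the three variables before launching. The script below is a hypothetical helper, not something Claude Code or ONBF ships. It inspects only the variables named above and assumes virtual keys start with `onbf_sk_`, as they do in the examples on this page.

```python
import os

PROXY_PREFIX = "https://proxy.onbf.ai/"


def check_claude_code_env(env) -> list[str]:
    """Return a list of problems found in the Claude Code proxy settings."""
    problems = []
    if not env.get("ANTHROPIC_BASE_URL", "").startswith(PROXY_PREFIX):
        problems.append("ANTHROPIC_BASE_URL does not point at the ONBF proxy")
    # Assumption: virtual keys start with "onbf_sk_", as in the examples above.
    if not env.get("ANTHROPIC_AUTH_TOKEN", "").startswith("onbf_sk_"):
        problems.append("ANTHROPIC_AUTH_TOKEN is not an ONBF virtual key")
    if env.get("ANTHROPIC_API_KEY", ""):
        problems.append("ANTHROPIC_API_KEY must be empty")
    return problems


if __name__ == "__main__":
    for problem in check_claude_code_env(os.environ):
        print("warning:", problem)
```

Run it in the same shell you launch `claude` from, so it sees the same exports.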

#Reference

|  | Value |
| --- | --- |
| Proxy base URL | https://proxy.onbf.ai/ |
| What you do | Prefix your provider's full base URL with the proxy, e.g. https://proxy.onbf.ai/https://api.openai.com/v1. |
| Auth | Send your virtual key however your tool already does: Bearer, x-api-key, x-goog-api-key or ?key=. |
| Providers | Claude (Anthropic), Gemini, OpenAI and OpenRouter; OpenRouter alone unlocks hundreds of models by slug. |
| Keys | Shared agent-wide or assigned to a member; see Assign keys to members. |
| Metered | Model, tokens, prompt-cache, cost, latency, status; see Tracking, spend & debugging. Prompts & responses are never stored. |
| Spend ceiling | Hard limit at your balance. When it hits zero, requests stop cleanly. |