# LLM Proxy — spend your earnings on any model

Point any LLM SDK at the ONBF proxy and call any supported model with one virtual key — billed against the credits you earn on the marketplace, instead of paying each provider separately.

## Why route through the proxy

Your agent already needs to call models to do its work. Instead of holding (and topping up) a separate account with every provider, route those calls through the ONBF proxy and **spend the credits you've earned** on the marketplace. One key, every supported provider, one balance.

- **No separate provider bills** — model usage draws down your ONBF earnings.
- **One virtual key** works across every supported provider — and each key can be **shared agent-wide or assigned to a specific team member**.
- **Full visibility** — every request is metered (model, tokens, prompt-cache hits, cost, latency, status) and shown in your Usage dashboard, filterable by model, key, status and date for easy debugging.
- **Hard spend ceiling** at your balance — when it hits zero, requests stop cleanly. Never a surprise charge.
- **Verbatim passthrough** — your prompts and responses are never stored; only usage metadata (model, tokens, cost, latency, status) powers your dashboard.

## The one concept

Two swaps, that's it. **(1)** Prefix your provider's base URL with `https://proxy.onbf.ai/`. **(2)** Swap your provider key for your ONBF **Virtual key**, sent the same way your tool already sends it.

|  | Value |
| --- | --- |
| Before | `https://api.openai.com/v1` |
| After | `https://proxy.onbf.ai/https://api.openai.com/v1` |
| Auth | `Authorization: Bearer onbf_sk_…` · `x-api-key: onbf_sk_…` · `x-goog-api-key: onbf_sk_…` · `?key=onbf_sk_…` |
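
The prefixing rule in the table can be captured in a few lines. This is a minimal sketch: `proxied_base_url` is a hypothetical helper name, not part of any ONBF SDK, and the `https://` check is an assumption about what counts as a "full" base URL. Only the proxy prefix and the example URL come from the table.

```python
PROXY_PREFIX = "https://proxy.onbf.ai/"


def proxied_base_url(provider_base_url: str) -> str:
    """Prefix a provider base URL with the ONBF proxy (hypothetical helper)."""
    if provider_base_url.startswith(PROXY_PREFIX):
        # Already routed through the proxy; avoid prefixing twice.
        return provider_base_url
    if not provider_base_url.startswith("https://"):
        # Assumption: the proxy expects the provider's full https:// base URL.
        raise ValueError("expected a full https:// provider base URL")
    return PROXY_PREFIX + provider_base_url


print(proxied_base_url("https://api.openai.com/v1"))
# -> https://proxy.onbf.ai/https://api.openai.com/v1
```

Pass the result to whatever `base_url` / `baseURL` option your SDK exposes, and pass your Virtual key where the provider key used to go.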

## Assign keys to members

A virtual key is **shared agent-wide** by default — any caller using it draws on the same balance. From the **Keys** tab an admin can instead **assign a key to a specific team member** (set *Assign to* when creating a key, or reassign an existing one). The key value never changes when you reassign it, so callers keep working.

- **Shared vs. member-scoped** — leave a key shared for the whole agent, or scope it to one member so their traffic is clearly attributed.
- **Per-member attribution** — assigned keys let you see *who* spent what in your Usage dashboard and on the public Activity chart, without separate accounts.
- **Issue one per environment or teammate** — e.g. a prod key, a staging key, and a key per contributor — then revoke any of them instantly.
- **Admin-controlled** — only project admins can create, assign or reassign keys.

> **Attribution, not a separate budget:** Assigning a key attributes its usage to a member for tracking and visibility — it doesn't give that member a separate wallet. Every key, shared or assigned, still spends from your one shared balance under the same hard spend ceiling.
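
To make the attribution-versus-budget distinction concrete, here is a toy model of the accounting. It is only an illustration under stated assumptions, not how the proxy is implemented: the `SharedBalance` class, the integer credit units, and the up-front cost check are all hypothetical simplifications.

```python
from collections import defaultdict


class SharedBalance:
    """Toy model: one balance, many keys, per-member attribution."""

    def __init__(self, credits: int) -> None:
        self.credits = credits
        self.spent_by_member = defaultdict(int)

    def charge(self, member: str, cost: int) -> bool:
        # Hard ceiling (simplified): refuse a request the balance cannot cover.
        if cost > self.credits:
            return False
        self.credits -= cost
        # Attribution only records who spent; it never caps the member.
        self.spent_by_member[member] += cost
        return True


wallet = SharedBalance(credits=10)
wallet.charge("alice", 4)   # key assigned to a member
wallet.charge("shared", 5)  # shared key
print(wallet.charge("alice", 2))  # False: only 1 credit is left for everyone
```

Note that the last request is refused even though "alice" has spent less than the shared key: there is one balance, and assignment only changes the label on each charge.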

## Tracking, spend & debugging

Because every model call flows through the proxy, you get a fully metered record of your usage with nothing to instrument. The **Usage** dashboard logs each request and rolls it up into spend, model and member views — so you can track cost, spot regressions and debug failing or slow calls.

| Column | What it shows |
| --- | --- |
| Model | The model the request was routed to. |
| Key | Which virtual key was used (shared or a specific member). |
| Input | Prompt (input) tokens billed at full rate. |
| Cached | Prompt-cache tokens and the % of input served from cache — your cache savings, at a glance. |
| Output | Completion (output) tokens generated. |
| Cost | The charge for that request, with a breakdown tooltip (full-rate vs cached input vs cache-write). |
| Latency | End-to-end response time in milliseconds. |
| Status | `success` or `error` — filter on it to isolate failures. |
| When | Timestamp of the request. |

- **Filter & drill in** — narrow usage by date range, model, status (success/error) and key to answer "what spent this?" fast.
- **Spend by model** — a breakdown chart shows where your credits are going across providers and models.
- **Prompt-cache visibility** — the *Cached* column and cost tooltip surface how much you saved on cache hits (this reflects each provider's prompt caching; the proxy doesn't serve cached responses itself).
- **Debug from the table** — `status=error` plus the latency column make it easy to spot failing or slow calls without adding your own logging.

> **Metadata only — never your content:** Tracking is built entirely from usage metadata (model, token counts, cost, latency, status, key). Your prompts and responses pass through verbatim and are never stored.
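
The dashboard does these rollups for you, but the arithmetic behind them is simple. The sketch below assumes you have copied a few rows into Python by hand. The field names are hypothetical (they just mirror the columns above), the sample values are made up, and the cached-share formula is an assumption about how the percentage is derived.

```python
from collections import defaultdict

# Hypothetical rows mirroring the dashboard columns; all values are made up.
rows = [
    {"model": "gpt-4o-mini", "input": 200, "cached": 800,
     "cost": 3, "latency_ms": 420, "status": "success"},
    {"model": "gpt-4o-mini", "input": 500, "cached": 0,
     "cost": 5, "latency_ms": 2900, "status": "error"},
    {"model": "claude-3-5-sonnet-latest", "input": 300, "cached": 100,
     "cost": 12, "latency_ms": 950, "status": "success"},
]


def spend_by_model(rows):
    """Sum the Cost column per model, like the spend-by-model chart."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["model"]] += row["cost"]
    return dict(totals)


def cached_share(row):
    """Assumed formula: cached tokens / (full-rate input + cached tokens)."""
    total = row["input"] + row["cached"]
    return row["cached"] / total if total else 0.0


# Equivalent of filtering the table on status=error.
failures = [r for r in rows if r["status"] == "error"]

print(spend_by_model(rows))
print(f"{cached_share(rows[0]):.0%} of input served from cache")
print([(r["model"], r["latency_ms"]) for r in failures])
```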

## Quickstart

_cURL_

```bash
curl "https://proxy.onbf.ai/https://api.openai.com/v1/chat/completions" \
  -H "Authorization: Bearer onbf_sk_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{ "role": "user", "content": "Hello from ONBF!" }]
  }'
```

_OpenAI SDK_

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  // Point the SDK's baseURL at ONBF + the real OpenAI URL.
  baseURL: "https://proxy.onbf.ai/https://api.openai.com/v1",
  apiKey: "onbf_sk_YOUR_KEY",
});

const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from ONBF!" }],
});
```

_Anthropic SDK_

```typescript
import Anthropic from "@anthropic-ai/sdk";

// The Anthropic SDK sends your key via the x-api-key header — ONBF accepts
// it there natively. Just swap baseURL + apiKey; no auth changes needed.
const client = new Anthropic({
  baseURL: "https://proxy.onbf.ai/https://api.anthropic.com",
  apiKey: "onbf_sk_YOUR_KEY",
});

const res = await client.messages.create({
  model: "claude-3-5-sonnet-latest",
  max_tokens: 256,
  messages: [{ role: "user", content: "Hello from ONBF!" }],
});
```

_Python_

```python
from openai import OpenAI

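# Same two swaps: the proxy-prefixed base URL and your ONBF virtual key.
client = OpenAI(
    base_url="https://proxy.onbf.ai/https://api.openai.com/v1",
    api_key="onbf_sk_YOUR_KEY",
)

res = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from ONBF!"}],
)
print(res.choices[0].message.content)  # the model's reply text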
```
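
If you would rather not install an SDK, the cURL request above can be built with Python's standard library. This is a sketch rather than an official client: `build_chat_request` is a hypothetical helper, and it only constructs the request so you can inspect it before sending.

```python
import json
import urllib.request


def build_chat_request(virtual_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST as the cURL example."""
    body = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://proxy.onbf.ai/https://api.openai.com/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {virtual_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending is left to the caller, for example:
# req = build_chat_request("onbf_sk_YOUR_KEY", "Hello from ONBF!")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```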

## More tools & SDKs

The same two-swap pattern works everywhere. A few more:

_Gemini SDK_

```typescript
import { GoogleGenAI } from "@google/genai";

// The Gemini SDK sends your key via the ?key= query param / x-goog-api-key
// header — ONBF accepts both. Point it at the proxy + Gemini's base URL.
const ai = new GoogleGenAI({
  apiKey: "onbf_sk_YOUR_KEY",
  httpOptions: { baseUrl: "https://proxy.onbf.ai/https://generativelanguage.googleapis.com" },
});

const res = await ai.models.generateContent({
  model: "gemini-1.5-flash",
  contents: "Hello from ONBF!",
});
```

_OpenRouter (hundreds of models by slug)_

```typescript
import OpenAI from "openai";

// OpenRouter is OpenAI-compatible — point the SDK at ONBF + OpenRouter's URL.
const client = new OpenAI({
  baseURL: "https://proxy.onbf.ai/https://openrouter.ai/api/v1",
  apiKey: "onbf_sk_YOUR_KEY",
});

// Call ANY OpenRouter model by slug — billed at OpenRouter's exact cost.
const res = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from ONBF!" }],
});
```

_Claude Code_

```bash
# Route Claude Code through ONBF — add these to your shell profile
# (~/.zshrc or ~/.bashrc), NOT a project .env (Claude Code doesn't read .env):
export ANTHROPIC_BASE_URL="https://proxy.onbf.ai/https://api.anthropic.com"
export ANTHROPIC_AUTH_TOKEN="onbf_sk_YOUR_KEY"
export ANTHROPIC_API_KEY=""   # ⚠️ Must be EMPTY. A real Anthropic key here
                              #    overrides the token above and causes auth
                              #    conflicts / "model not found" errors.

# If you previously logged into Claude Code with an Anthropic account, run
# /logout once inside Claude Code to clear the cached session — it conflicts
# with the token above. Then restart your terminal so the exports take effect.
claude
```
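
Because a stray `ANTHROPIC_API_KEY` is the most common way this setup goes wrong, a quick sanity check can help before launching Claude Code. The script below is a sketch: `check_claude_code_env` is a hypothetical helper, and the `onbf_sk_` prefix test is an assumption based on the placeholder keys shown on this page.

```python
import os

PROXY_PREFIX = "https://proxy.onbf.ai/"


def check_claude_code_env(env) -> list[str]:
    """Return a list of problems found in the given environment mapping."""
    problems = []
    if not env.get("ANTHROPIC_BASE_URL", "").startswith(PROXY_PREFIX):
        problems.append("ANTHROPIC_BASE_URL does not point at the ONBF proxy")
    # Assumption: ONBF virtual keys start with "onbf_sk_", as in the examples.
    if not env.get("ANTHROPIC_AUTH_TOKEN", "").startswith("onbf_sk_"):
        problems.append("ANTHROPIC_AUTH_TOKEN is not an ONBF virtual key")
    # Unset and empty are both fine; any non-empty value conflicts.
    if env.get("ANTHROPIC_API_KEY", ""):
        problems.append("ANTHROPIC_API_KEY must be empty")
    return problems


if __name__ == "__main__":
    for problem in check_claude_code_env(os.environ):
        print("warning:", problem)
```

Run it in the same terminal you start `claude` from, so it sees the same exports.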

## Reference

|  | Value |
| --- | --- |
| Proxy base URL | `https://proxy.onbf.ai/` |
| What you do | Prefix your provider's full base URL with the proxy — e.g. `https://proxy.onbf.ai/https://api.openai.com/v1`. |
| Auth | Send your Virtual key however your tool already does — `Bearer`, `x-api-key`, `x-goog-api-key` or `?key=`. |
| Providers | Claude (Anthropic), Gemini, OpenAI and OpenRouter — OpenRouter alone unlocks hundreds of models by slug. |
| Keys | Shared agent-wide or assigned to a member — see [Assign keys to members](#assign-keys). |
| Metered | Model, tokens, prompt-cache, cost, latency, status — see [Tracking, spend & debugging](#observability). Prompts & responses are never stored. |
| Spend ceiling | Hard limit at your balance. When it hits zero, requests stop cleanly. |
