# Replies — sending assistant messages

After your webhook fires, your agent sends assistant messages by calling the MCP `post_reply` tool, authorized with the run-scoped `mcp.token` from the webhook payload.

## The contract

> **Where the MCP token comes from:** Every reply is sent through the MCP `post_reply` tool and authorized by the run-scoped `mcp.token` delivered in the `agent.run.created` webhook — see **[Your Webhook](/docs/agent-webhook)**. This page is about sending assistant messages *back* into the conversation.

1. Your webhook fires and you've acknowledged it with a `2xx` (see [Your Webhook](/docs/agent-webhook)).
2. You do your work, then call the MCP `post_reply` tool with `Authorization: Bearer <mcp.token>`.
3. Send `{ message, idempotencyKey }`. The first reply closes the active user turn; later replies append while the token is valid.

> **Replies are not jobs:** Use `post_reply` for assistant messages. Use **[Jobs](/docs/jobs)** when the user needs to approve scoped or billable work before your agent starts.

## A minimal agent

End to end: acknowledge the webhook, then post your answer as a reply through the MCP `post_reply` tool using `mcp.token`.

```javascript
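// Explicit import: the global `crypto` object is not available by default in
// every Node.js release, and callMcpTool below uses crypto.randomUUID().
import crypto from "node:crypto";
import express from "express";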

const app = express();
app.use(express.json());

// ONBF POSTs here when a user sends your agent a message.
app.post("/onbf/webhook", async (req, res) => {
  const event = req.body;

  // 1. ACK immediately (2xx) so ONBF marks the run "running". Do the real
  //    work AFTER responding — you reply asynchronously through Passport MCP.
  res.sendStatus(200);

  if (event.type !== "agent.run.created") return;

  // The HTTP response has already been sent, so a rejection past this point
  // would go unhandled. Catch it and log it instead.
  try {
    // 2. Optionally read the conversation before answering.
    const history = await callMcpTool(event.mcp, "get_conversation_history", {
      limit: 50,
    });

    // 3. Do your work (call an LLM, run a tool, etc.).
    const answer = await doWork(event.input.message, history);

    // 4. Deliver the result with post_reply. The first reply closes the active
    //    user turn; later post_reply calls append messages while mcp.token lives.
    await callMcpTool(event.mcp, "post_reply", {
      message: answer,
      idempotencyKey: `reply:${event.run.id}:1`,
    });
  } catch (err) {
    console.error("Failed to handle run", event.run?.id, err);
  }
});

async function callMcpTool(mcp, name, args) {
  const res = await fetch(mcp.url, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "accept": "application/json, text/event-stream",
      "authorization": `Bearer ${mcp.token}`,
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: crypto.randomUUID(),
      method: "tools/call",
      params: { name, arguments: args },
    }),
  });
  if (!res.ok) throw new Error(await res.text());
  return res.json();
}

async function doWork(message, history) {
  return `You said: ${message}`;
}

app.listen(3000);

// MCP endpoint is "https://onbf.ai/api/mcp".
```
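`callMcpTool` above returns the parsed JSON-RPC body without inspecting it. A JSON-RPC 2.0 response can report a failure in an `error` object even when the HTTP status is `2xx`, and an MCP `tools/call` result can flag a tool-level failure with `isError: true`. The sketch below shows one way to surface both cases. `unwrapToolResult` is a hypothetical helper, not part of ONBF's API, and it assumes the server answers with a plain JSON body rather than an event stream, which this page does not specify.

```javascript
// Hypothetical helper (not part of ONBF): turns a parsed JSON-RPC response
// body into either the tool result or a thrown Error.
function unwrapToolResult(body) {
  // JSON-RPC 2.0: a failed request carries `error` instead of `result`.
  if (body.error) {
    throw new Error(`MCP error ${body.error.code}: ${body.error.message}`);
  }

  // MCP tools/call: a tool that ran but failed sets `isError: true` and
  // usually describes the problem in its text content parts.
  const result = body.result ?? {};
  if (result.isError) {
    const details = (result.content ?? [])
      .filter((part) => part.type === "text")
      .map((part) => part.text)
      .join("\n");
    throw new Error(`Tool failed: ${details || "no details"}`);
  }

  return result;
}
```

With this in place, `unwrapToolResult(await callMcpTool(event.mcp, "post_reply", args))` would throw on either kind of failure instead of returning silently.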

## Multiple replies and errors

`post_reply` intentionally has no status field. Every call appends an assistant message. The first call also completes the run if it is still open; later calls only append messages.

| Scenario | Effect | Input |
| --- | --- | --- |
| First `post_reply` | Appends an assistant message and completes the active run if it is still queued/dispatching/running. | `message` required |
| Later `post_reply` | Appends another assistant message while the MCP session token is valid. The run status is not reopened. | `message` required |
| Tool/retry failure | If you can reach MCP, post a user-facing explanation. If you cannot, let the run time out or fail naturally. | `message` required |

_Posting multiple assistant messages_

```javascript
// Send multiple assistant messages by calling post_reply more than once.
// The first post_reply closes the active user turn. Later calls append normal
// messages while the MCP session token remains valid.

await callMcpTool(event.mcp, "post_reply", {
  message: "Working on it — pulled 42 tickets…",
  idempotencyKey: `reply:${event.run.id}:progress-1`,
});

await callMcpTool(event.mcp, "post_reply", {
  message: "Categorized them into 5 themes…",
  idempotencyKey: `reply:${event.run.id}:progress-2`,
});

await callMcpTool(event.mcp, "post_reply", {
  message: "Here's your summary: …",
  idempotencyKey: `reply:${event.run.id}:final`,
});
```

_Posting an error-style assistant message_

```javascript
// There is no separate "failed reply" status. If you can reach MCP, post a
// user-facing apology or explanation. If you cannot post anything, do nothing;
// ONBF will mark the run timed out/failed through the normal run lifecycle.
await callMcpTool(event.mcp, "post_reply", {
  message: "I couldn't reach the upstream model. Please try again.",
  idempotencyKey: `reply:${event.run.id}:error`,
});
```
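The two cases from the table (work fails but MCP is reachable, and MCP itself is unreachable) can be combined into one wrapper. This is a minimal sketch under stated assumptions: `answerOrExplain` is a hypothetical name, and `callTool` and `doWork` are passed in as parameters so the sketch stays self-contained; in the minimal agent they would be `callMcpTool` and your own work function.

```javascript
// Hypothetical wrapper: posts the answer, or an explanation if the work fails.
// `callTool(mcp, name, args)` and `doWork(message)` are injected by the caller.
async function answerOrExplain(event, callTool, doWork) {
  let message;
  let keySuffix = "final";

  try {
    message = await doWork(event.input.message);
  } catch (err) {
    // The work failed, but MCP may still be reachable: explain it to the user.
    message = "Something went wrong on our side. Please try again.";
    keySuffix = "error";
  }

  try {
    return await callTool(event.mcp, "post_reply", {
      message,
      idempotencyKey: `reply:${event.run.id}:${keySuffix}`,
    });
  } catch (err) {
    // MCP could not be reached: there is nothing left to post, so log and
    // leave the run to time out or fail through the normal run lifecycle.
    console.error("post_reply failed", err);
    return null;
  }
}
```

Using distinct key suffixes for the answer and the error message keeps a retried error reply from being mistaken for the final answer.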

## Idempotency, expiry and cancellation

- **Idempotent:** pass a stable `idempotencyKey`. Retrying the same key in the same conversation returns the existing message instead of inserting a duplicate.
- **Expiry:** `mcp.expiresAt` sets how long the session token is accepted for `post_reply` and job tools. This is independent of the run's initial reply timeout.
- **Late replies:** if the run expired but the MCP token is still valid, `post_reply` can append the message while the run remains expired for diagnostics.
- **Cancellation:** if the user stopped the bound run, `post_reply` is rejected so a cancelled run cannot produce a late assistant bubble.
- **Trust the token, not ids:** the target user, conversation, project and run are resolved from the MCP token binding — never from caller-supplied ids.
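The idempotency and expiry rules above suggest a simple retry pattern: build the key once, reuse it on every attempt, and stop once the token has expired. The sketch below is illustrative only. `postReplyWithRetry`, its options, and the injected `callTool` are hypothetical names, and it assumes `mcp.expiresAt` is a timestamp string that `Date.parse` understands (for example ISO 8601); if it is not, the expiry check is skipped.

```javascript
// Hypothetical helper: retries post_reply with one stable idempotencyKey.
// `callTool(mcp, name, args)` is injected, e.g. callMcpTool from the agent.
async function postReplyWithRetry(event, callTool, message, keySuffix, options = {}) {
  const { maxAttempts = 3, baseDelayMs = 200 } = options;

  // Built once, outside the loop, so every retry sends the same key.
  const idempotencyKey = `reply:${event.run.id}:${keySuffix}`;

  // Assumption: expiresAt is parseable by Date.parse; otherwise this is NaN.
  const expiresAtMs = Date.parse(event.mcp.expiresAt);

  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (Number.isFinite(expiresAtMs) && Date.now() >= expiresAtMs) {
      throw new Error("mcp.token has expired; giving up on post_reply");
    }
    try {
      return await callTool(event.mcp, "post_reply", { message, idempotencyKey });
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Linear backoff between attempts.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  throw lastError;
}
```

This sketch retries on any error. A rejection caused by cancellation will not succeed on retry, so a fuller version would detect that case and stop early; how the rejection is reported is not covered on this page.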
