# Rate Limits

Amdahl enforces rate limits at several layers so that bursty traffic from one caller cannot starve the platform. This page documents what is live today, what is on the roadmap, and the client-side patterns you should adopt now so your integration behaves correctly once per-key throttles ship.

## Current state

### Global per-IP limiter

Every request to the platform passes through a single global limiter applied at the HTTP edge. The defaults are:

- **100 requests per minute per IP** for production traffic on `/api/platform/v1/*`
- **500 requests per minute per IP** in local development

The limiter is shared across all endpoints under the platform API. A client that makes 60 `data.query` calls and 40 `artifacts.list` calls in the same minute has used its entire 100-request budget.

### OAuth dynamic client registration

Dynamic client registration under RFC 7591 has its own tighter bucket:

- **5 registrations per minute per IP** at `POST /oauth/register`

This is deliberate. Client registration is a write operation that provisions long-lived credentials, so the protection is stricter than read traffic.

### Agent session cost caps

Agent sessions have one soft cap that functions as a turn-based rate limit:

- **`maxTurns`** is the per-profile turn budget. When it is exhausted, the session pauses with a `continue_or_finish` request rather than ending outright. You decide whether to grant more turns.

See [Agents guide](./agents.md) for how to handle the `continue_or_finish` pause.

## Per-key and per-tool limits (roadmap)

Per-API-key and per-tool rate limits are planned but not yet enforced. When they land:

- Limits will be scoped to the API key, not the IP, so shared infrastructure stops being a noisy neighbor.
- Read-heavy tools (`data.query`, `context.ask`, `artifacts.get`) will get higher budgets than write-heavy tools (`artifacts.create`, `agents.start`).
- Burst allowances will let callers spike briefly before sustained throttling kicks in.

Until per-key limits ship, treat the global limit of 100 requests per minute as your effective ceiling. Read the rate-limit headers and implement the backoff logic described below so your code keeps working when the tighter per-key limits arrive.
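Burst allowances are commonly modeled as a token bucket: a bucket holds up to `capacity` tokens, each request spends one, and tokens refill at a steady rate. The platform's eventual algorithm is not specified on this page, so the sketch below is purely illustrative. The class name `TokenBucket` and both parameters are hypothetical, and you could use the same structure client-side to pace your own calls.

```typescript
// Illustrative token bucket (an assumption, not the platform's documented algorithm).
// `capacity` is the burst size; `refillPerSecond` is the sustained rate.
class TokenBucket {
  private readonly capacity: number
  private readonly refillPerSecond: number
  private tokens: number
  private lastRefillMs: number

  constructor(capacity: number, refillPerSecond: number, nowMs: number = Date.now()) {
    this.capacity = capacity
    this.refillPerSecond = refillPerSecond
    this.tokens = capacity // start full so an initial burst is allowed
    this.lastRefillMs = nowMs
  }

  // Returns true and spends a token if one is available, false otherwise.
  tryTake(nowMs: number = Date.now()): boolean {
    if (nowMs > this.lastRefillMs) {
      const elapsedSec = (nowMs - this.lastRefillMs) / 1000
      this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond)
      this.lastRefillMs = nowMs
    }
    if (this.tokens < 1) return false
    this.tokens -= 1
    return true
  }
}
```

A bucket with a small `capacity` and a `refillPerSecond` matched to the sustained limit permits a short spike, then settles to the steady rate.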

## Response headers

Every response from the platform API carries rate-limit headers that let you monitor your budget without a separate bookkeeping layer:

| Header                  | Meaning                                         |
| ----------------------- | ----------------------------------------------- |
| `X-RateLimit-Limit`     | Total requests permitted in the current window  |
| `X-RateLimit-Remaining` | Requests still available in the current window  |
| `X-RateLimit-Reset`     | Unix timestamp (seconds) when the window resets |

Read these on every response, not just on errors. If `X-RateLimit-Remaining` is close to zero, slow down before you hit a `429`.
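One way to act on these headers is to parse them after every call and pause when the remaining budget runs low. The sketch below assumes the three headers carry plain integers as described in the table. The names `HeaderReader`, `parseRateLimit`, `shouldSlowDown`, and `msUntilReset`, and the 10% threshold, are illustrative choices and not part of the platform.

```typescript
// Minimal shape shared by fetch's `Headers` and simple test doubles.
interface HeaderReader {
  get(name: string): string | null
}

interface RateLimitState {
  limit: number
  remaining: number
  resetEpochSeconds: number
}

// Reads one numeric header; returns null when it is missing or not a number.
function readNumber(headers: HeaderReader, name: string): number | null {
  const raw = headers.get(name)
  if (raw === null || raw.trim() === "") return null
  const value = Number(raw)
  return Number.isFinite(value) ? value : null
}

// Parses the three X-RateLimit-* headers; returns null if any is unusable.
function parseRateLimit(headers: HeaderReader): RateLimitState | null {
  const limit = readNumber(headers, "X-RateLimit-Limit")
  const remaining = readNumber(headers, "X-RateLimit-Remaining")
  const resetEpochSeconds = readNumber(headers, "X-RateLimit-Reset")
  if (limit === null || remaining === null || resetEpochSeconds === null) return null
  return { limit, remaining, resetEpochSeconds }
}

// True when the remaining budget is at or below `threshold` of the limit (assumed 10%).
function shouldSlowDown(state: RateLimitState, threshold = 0.1): boolean {
  return state.remaining <= state.limit * threshold
}

// Milliseconds until the window resets, never negative.
function msUntilReset(state: RateLimitState, nowMs: number = Date.now()): number {
  return Math.max(0, state.resetEpochSeconds * 1000 - nowMs)
}
```

With `fetch`, you would pass `response.headers` to `parseRateLimit` and, if `shouldSlowDown` returns true, wait up to `msUntilReset` before sending the next request.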

## 429 response shape

When you hit a limit the platform returns HTTP status `429` with the standard error envelope:

```json
{
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded. Try again in 42 seconds.",
    "details": {
      "retry_after_seconds": 42,
      "limit": 100,
      "window_seconds": 60
    }
  }
}
```

The `Retry-After` header is also set, in seconds, so you can rely on either the envelope detail or the header.
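Because the wait time appears in two places, a client can check the header first and fall back to the envelope. The sketch below assumes `Retry-After` is expressed in seconds, as stated above, and that the body matches the envelope shown. The function name `retryDelayMs` and the `RateLimitedBody` type are illustrative.

```typescript
// Subset of the 429 envelope that this helper reads.
interface RateLimitedBody {
  error?: {
    code?: string
    details?: { retry_after_seconds?: number }
  }
}

// Returns the server-requested wait in milliseconds, or null if none was given.
// Prefers the Retry-After header, then the envelope's retry_after_seconds.
function retryDelayMs(retryAfterHeader: string | null, body: unknown): number | null {
  if (retryAfterHeader !== null && retryAfterHeader.trim() !== "") {
    const headerSeconds = Number(retryAfterHeader)
    if (Number.isFinite(headerSeconds) && headerSeconds >= 0) return headerSeconds * 1000
  }
  const bodySeconds = (body as RateLimitedBody | null)?.error?.details?.retry_after_seconds
  if (typeof bodySeconds === "number" && Number.isFinite(bodySeconds) && bodySeconds >= 0) {
    return bodySeconds * 1000
  }
  return null
}
```

With `fetch`, a call would look like `retryDelayMs(response.headers.get("Retry-After"), await response.json())`.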

## Backoff strategy

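When you receive a `429`, back off and retry. The recommended pattern is exponential backoff with full jitter, capped at 60 seconds. The example assumes the thrown error exposes the HTTP status as `status` and, optionally, the parsed `Retry-After` value as `retryAfterSeconds`; adapt both names to your HTTP client:

```typescript
// Retries `fn` on HTTP 429 using exponential backoff with full jitter.
// Assumption: the thrown error carries `status` and, optionally,
// `retryAfterSeconds` (parsed from Retry-After by your HTTP client wrapper).
async function withBackoff<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let attempt = 0
  while (true) {
    try {
      return await fn()
    } catch (err: any) {
      if (err?.status !== 429 || attempt >= maxAttempts - 1) throw err
      // Exponential ceiling, capped at 60 seconds.
      const capMs = Math.min(60_000, 1000 * 2 ** attempt)
      // Full jitter, floored at 1 second so retries never form a tight loop.
      const jitterMs = Math.max(1000, Math.random() * capMs)
      // Honor Retry-After when the server supplied one.
      const retryAfterMs = (Number(err.retryAfterSeconds) || 0) * 1000
      // Surface waits beyond the 60-second cap instead of hiding a long stall.
      if (retryAfterMs > 60_000) throw err
      await new Promise(r => setTimeout(r, Math.max(jitterMs, retryAfterMs)))
      attempt++
    }
  }
}
```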

Three rules:

1. Honor `Retry-After` first. If the header is present, sleep at least that many seconds before retrying.
2. Never retry faster than 1 second. Tight retry loops make the problem worse.
3. Cap at 60 seconds. Beyond that, surface the error to the caller rather than hiding a long stall.

## Best practices

- **Batch where possible.** `data.query` can filter and aggregate in a single call; avoid making ten `GET` requests for data you can pull in one.
- **Cache tool metadata.** The tool registry (`tools/_index.md`, scopes, data models) changes on the order of weeks, not seconds. Cache it on your side and refresh daily, not per-request.
- **Do not poll faster than 1/sec.** When polling `agents.status`, 2 to 5 seconds is the right cadence for interactive UI and 15 to 30 seconds is right for batch integrations. Prefer the SSE stream at `GET /api/platform/v1/agents/:session_id/stream` for sub-second updates.
- **Spread write bursts.** Creating 500 artifacts in 10 seconds will tip you into throttling. Pace them to stay under the per-minute budget (at 100 requests per minute, 500 creates need at least five minutes) or enqueue them on your side.
- **Key your retries by idempotency.** If you retry a `POST`, make sure the server-side call is idempotent or carries a dedupe key, so a successful-but-timed-out first attempt plus a successful retry do not create two records.
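The "spread write bursts" advice can be sketched as a small client-side pacer that runs tasks sequentially with a minimum gap between starts. The helper names `pacingIntervalMs` and `runPaced`, and the default of using 80% of the budget, are assumptions made for illustration and not part of the platform.

```typescript
// Gap between request starts that keeps a caller under `limit` requests per
// `windowSeconds`, using only `share` (between 0 and 1) of the budget.
function pacingIntervalMs(limit: number, windowSeconds: number, share = 0.8): number {
  return Math.ceil((windowSeconds * 1000) / (limit * share))
}

// Runs tasks one at a time, starting each at least `intervalMs` after the
// previous one started. `sleep` is injectable so tests can skip real waiting.
async function runPaced<T>(
  tasks: Array<() => Promise<T>>,
  intervalMs: number,
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms)),
): Promise<T[]> {
  const results: T[] = []
  for (let i = 0; i < tasks.length; i++) {
    const startedAt = Date.now()
    results.push(await tasks[i]())
    // Wait out the remainder of the interval, except after the last task.
    if (i < tasks.length - 1) {
      const elapsedMs = Date.now() - startedAt
      if (elapsedMs < intervalMs) await sleep(intervalMs - elapsedMs)
    }
  }
  return results
}
```

For the current global limit you might call `runPaced(tasks, pacingIntervalMs(100, 60))`, which leaves headroom for other traffic sharing the same IP budget. Running tasks sequentially also means a failed task rejects the whole batch, so wrap individual tasks in your retry logic if partial progress matters.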

## See also

- [API reference: errors](./api-reference/errors.md) for the full error envelope and every code
- [Agents guide: rate limits](./agents.md) for the session-level budget mechanics
- [Webhooks guide](./webhooks.md) for how retries handle your endpoint's `429` responses
