Rate limits & quotas

We enforce two independent limits on every request: a short-window rate limit (per minute) and a long-window monthly quota. Both are advertised in response headers so you can plan around them.

Last updated: April 25, 2026

Two limits, one purpose

Rate limit — protects the platform from bursts. Counted per calendar minute (UTC) per key — i.e. the bucket resets at the top of every wall-clock minute, not 60s after your first request. Hitting it returns 429 rate_limited with a Retry-After header pointing at the next minute boundary.
Monthly quota — the total number of requests included in your subscription. Counted per calendar month (UTC), per key. Hitting it returns 429 quota_exceeded.

Both limits are tracked at the key level, not the account level. If you have three keys on the Pro tier, each gets its own 300 req/min budget and its own 100 000 req/month bucket.
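Because the rate-limit bucket is aligned to wall-clock minutes, the time until the next reset can be worked out from the current UTC time alone. A minimal sketch; `seconds_until_next_minute` is a hypothetical helper name, not part of any SDK:

```python
import time


def seconds_until_next_minute(now=None):
    """Seconds from `now` (Unix time) until the next calendar-minute boundary."""
    if now is None:
        now = time.time()
    # Unix time is UTC-based, so minute boundaries fall on multiples of 60.
    return 60 - (now % 60)
```

A request sent 5 seconds into a minute would see the bucket refill 55 seconds later, regardless of when the first request in that minute was made.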

Per-tier limits

Tier	Rate limit	Monthly quota	Bulk export
Starter	60 / min	10 000	—
Pro	300 / min	100 000	✓
Enterprise	1 200 / min	1 000 000	✓
Enterprise+	Custom	Custom	✓
Partner (invite-only, free)	Custom	Custom	✓

Response headers

Every successful response carries:

Header	Meaning
X-RateLimit-Limit	Your per-minute rate ceiling.
X-RateLimit-Remaining	Requests left in the current calendar-minute bucket.
X-RateLimit-Reset	Unix timestamp (seconds) when the next calendar-minute bucket starts.
X-Quota-Limit	Monthly quota for this key.
X-Quota-Remaining	Requests left in the current month.
X-Quota-Reset	Unix timestamp when the monthly bucket resets (00:00 UTC on the 1st).
X-Computed-At	Server-side response generation time, milliseconds since the Unix epoch. Useful for measuring observed API latency separate from network/edge-cache delay.
X-Schema-Version	The response-shape contract this body was built against (currently `v1`). Watching this value for changes lets clients detect a schema upgrade before they parse the body.
Last-Modified	HTTP-date of the most recent `updatedAt` in the response body. SupraBench scores are dynamic — they shift whenever a new submission lands or a bench is re-rated — so this header (and the per-row `updatedAt` field) is the canonical "as-of" stamp. Combined with `If-Modified-Since` on a follow-up request you can short-circuit unchanged data (304 support arriving with v1.1).
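The six limit headers in the table can be collected into one small structure so the rest of a client can reason about its remaining budget. This is a sketch only: `Budget` and `parse_budget` are hypothetical names, and it assumes all six headers are present and hold integer values.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    rate_limit: int
    rate_remaining: int
    rate_reset: int        # Unix seconds
    quota_limit: int
    quota_remaining: int
    quota_reset: int       # Unix seconds


def parse_budget(headers):
    """Build a Budget from a mapping of response headers.

    Assumption: `headers` is case-insensitive (as in most HTTP clients)
    or uses exactly the header names shown in the table.
    """
    return Budget(
        rate_limit=int(headers["X-RateLimit-Limit"]),
        rate_remaining=int(headers["X-RateLimit-Remaining"]),
        rate_reset=int(headers["X-RateLimit-Reset"]),
        quota_limit=int(headers["X-Quota-Limit"]),
        quota_remaining=int(headers["X-Quota-Remaining"]),
        quota_reset=int(headers["X-Quota-Reset"]),
    )
```

A client could, for example, pause until `rate_reset` when `rate_remaining` reaches zero instead of waiting for a 429.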

Why two timestamps?

X-Computed-At is "when the server packaged this response". Last-Modified is "when the data inside last changed". They diverge whenever you hit an edge cache (X-Computed-At = original computation time, Last-Modified = data age) or whenever you hit an endpoint with a stale-but-still-valid cached body (both timestamps can lag wall-clock time by up to the cache TTL). For freshness-sensitive workflows such as leaderboard mirrors and public dashboards, read both.
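To make the difference concrete, both ages can be computed from the two headers. The sketch below assumes `X-Computed-At` holds integer milliseconds and `Last-Modified` is a standard HTTP-date in GMT; `response_ages` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def response_ages(headers, now=None):
    """Return (seconds since the response was computed,
    seconds since the underlying data last changed)."""
    if now is None:
        now = datetime.now(timezone.utc)
    # X-Computed-At is milliseconds since the Unix epoch.
    computed = datetime.fromtimestamp(
        int(headers["X-Computed-At"]) / 1000, tz=timezone.utc
    )
    # Last-Modified is an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    modified = parsedate_to_datetime(headers["Last-Modified"])
    return (now - computed).total_seconds(), (now - modified).total_seconds()
```

A large first value suggests the response came from a cache; a large second value only means the data itself has not changed recently.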

Handling 429

When you exceed either limit, you get HTTP 429 with a body:

{
  "error": {
    "code": "rate_limited",
    "message": "> 60 req/min",
    "hint": "Slow down or upgrade tier."
  }
}

The code field disambiguates the two cases: rate_limited (try again next minute) versus quota_exceeded (try again next month, or upgrade). A Retry-After header is also set with the recommended wait time in seconds.

Exponential backoff

For rate_limited, wait for the number of seconds given in Retry-After; if the header is missing, fall back to exponential backoff with a capped delay. Don't hammer a rate-limited endpoint: repeated 429s aren't free for either of us.

# curl's --retry treats 429 as a transient error and honors Retry-After
# (curl 7.66.0 and later); --retry-all-errors additionally retries
# non-transient failures. curl won't cap the wait the way a real program
# should, so for anything non-trivial use a real HTTP client.
curl --retry 5 --retry-all-errors --retry-delay 1 --retry-max-time 60 \
  "https://api.suprabench.com/v1/models" \
  -H "Authorization: Bearer $SUPRABENCH_KEY"

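import os
import time

import requests

KEY = os.environ["SUPRABENCH_KEY"]  # API key read from the environment

def call(url, *, max_retries=5):
    delay = 1  # fallback wait (seconds) when Retry-After is missing
    for attempt in range(max_retries):
        r = requests.get(url, headers={"Authorization": f"Bearer {KEY}"}, timeout=30)
        if r.status_code != 429:
            return r
        # Prefer the server's Retry-After; otherwise back off exponentially.
        wait = int(r.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay = min(delay * 2, 60)
    r.raise_for_status()  # still 429 after all attempts: raises HTTPError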

// Assumes Node.js 18+ (built-in fetch); supply KEY another way in browsers.
const KEY = process.env.SUPRABENCH_KEY;

async function call(url, { maxRetries = 5 } = {}) {
  let delay = 1; // fallback wait (seconds) when Retry-After is missing
  for (let i = 0; i < maxRetries; i++) {
    const r = await fetch(url, { headers: { Authorization: `Bearer ${KEY}` } });
    if (r.status !== 429) return r;
    const wait = Number(r.headers.get("Retry-After")) || delay;
    await new Promise(res => setTimeout(res, wait * 1000));
    delay = Math.min(delay * 2, 60);
  }
  throw new Error("rate-limited after retries");
}

import (
    "fmt"
    "net/http"
    "os"
    "strconv"
    "time"
)

func call(url string, maxRetries int) (*http.Response, error) {
    delay := time.Second // fallback wait when Retry-After is missing
    for i := 0; i < maxRetries; i++ {
        req, err := http.NewRequest("GET", url, nil)
        if err != nil {
            return nil, err
        }
        req.Header.Set("Authorization", "Bearer "+os.Getenv("SUPRABENCH_KEY"))
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return nil, err
        }
        if resp.StatusCode != http.StatusTooManyRequests {
            return resp, nil // caller is responsible for closing resp.Body
        }
        wait := delay
        if h := resp.Header.Get("Retry-After"); h != "" {
            if secs, err := strconv.Atoi(h); err == nil {
                wait = time.Duration(secs) * time.Second
            }
        }
        resp.Body.Close()
        time.Sleep(wait)
        // Double the fallback delay, capped at 60 seconds.
        delay *= 2
        if delay > 60*time.Second {
            delay = 60 * time.Second
        }
    }
    // The last response body is already closed, so return only the error.
    return nil, fmt.Errorf("rate-limited after %d retries", maxRetries)
}

For quota_exceeded, exponential backoff doesn't help — wait for the next month or upgrade. Detect the code and surface a useful message to your user/operator.
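One way to tell the two cases apart is to branch on the `error.code` field of the 429 body before deciding whether to retry. A minimal sketch; `classify_429` and its return values are hypothetical, chosen only for illustration:

```python
def classify_429(body):
    """Map a parsed 429 error body (a dict) to an action.

    Returns "retry" for rate_limited, "stop" for quota_exceeded,
    and "unknown" for anything else.
    """
    code = body.get("error", {}).get("code")
    if code == "rate_limited":
        return "retry"   # next minute's bucket will refill
    if code == "quota_exceeded":
        return "stop"    # backoff won't help until next month or an upgrade
    return "unknown"
```

A retry loop could call this on `r.json()` and return early on "stop" rather than sleeping through attempts that cannot succeed.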

Use the cache

Most read endpoints carry a Cache-Control: public, max-age=N header, where N is 300 (5 minutes) for rankings, 3600 (1 hour) for tags, and 86400 (24 hours) for bulk exports. If you're polling rankings every second from one process, you're wasting your quota, because there's nothing newer to fetch. A simple in-process cache that respects max-age will cut consumption by 100× or more for many workloads.
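A simple in-process cache of this kind might look like the sketch below. `TTLCache` and `max_age` are hypothetical names, and the `fetch` callable (returning the body plus the raw Cache-Control value) is an assumption made so the example stays independent of any particular HTTP client.

```python
import re
import time


def max_age(cache_control):
    """Extract max-age (seconds) from a Cache-Control value, or 0 if absent."""
    m = re.search(r"max-age=(\d+)", cache_control or "")
    return int(m.group(1)) if m else 0


class TTLCache:
    """Tiny in-process cache keyed by URL; entries expire after max-age."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch    # callable: url -> (body, cache_control_value)
        self._clock = clock
        self._entries = {}     # url -> (expires_at, body)

    def get(self, url):
        entry = self._entries.get(url)
        if entry and self._clock() < entry[0]:
            return entry[1]    # still fresh: no request is sent
        body, cache_control = self._fetch(url)
        self._entries[url] = (self._clock() + max_age(cache_control), body)
        return body
```

With max-age=300, a process that calls `get` once per second would send one request every five minutes instead of 300.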

Cloudflare, Varnish, Fastly, and even requests-cache (with its cache_control option enabled) can respect our headers. Responses served from your own cache don't count against your quota; only requests that reach our origin do.

Quota resets

The monthly bucket resets at 00:00 UTC on the 1st of each month, regardless of when you subscribed.
Unused quota does not roll over to the next month.
The first month after a fresh subscription is pro-rated by remaining days, so you don't pay full price for half a month.
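The reset rule can be checked locally against the X-Quota-Reset header. The sketch below computes the next 00:00 UTC on the 1st of a month; `next_quota_reset` is a hypothetical helper, and the server's header remains the authoritative value.

```python
from datetime import datetime, timezone


def next_quota_reset(now=None):
    """Return the next 00:00 UTC on the 1st of a month, strictly after `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1   # December rolls over to January
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)
```

Calling `int(next_quota_reset().timestamp())` gives a Unix timestamp in the same units as X-Quota-Reset.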

Upgrading mid-cycle

Switching tiers takes effect immediately. Stripe pro-rates the difference and charges or refunds accordingly. The new monthly quota replaces the old one, and requests you have already made this month still count against it. For example, if you've used 9 800 of 10 000 on Starter and upgrade to Pro, you immediately have 100 000 − 9 800 = 90 200 requests for the rest of the calendar month.
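The upgrade arithmetic is a single subtraction. In the sketch below, `remaining_after_upgrade` is a hypothetical helper, and clamping at zero is an assumption for the case where usage already exceeds the new quota; the page does not describe that case.

```python
def remaining_after_upgrade(used_this_month, new_monthly_quota):
    """Requests left for the rest of the month after a mid-cycle tier change."""
    # Assumption: the remainder never goes negative.
    return max(new_monthly_quota - used_this_month, 0)
```

For the Starter-to-Pro example, `remaining_after_upgrade(9_800, 100_000)` evaluates to 90 200.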

Need a custom limit for a research project or a launch spike? Email us — we usually grant short-term boosts within a business day.
