Rate limits & quotas

We enforce two independent limits on every request: a short-window rate limit (per minute) and a long-window monthly quota. Both are advertised in response headers so you can plan around them.

Last updated: April 25, 2026

Two limits, one purpose

Both limits are tracked at the key level, not the account level. If you have three keys on the Pro tier, each gets its own 300 req/min budget and its own 100 000 req/month bucket.

Per-tier limits

| Tier | Rate limit | Monthly quota | Bulk export |
|---|---|---|---|
| Starter | 60 / min | 10 000 | |
| Pro | 300 / min | 100 000 | |
| Enterprise | 1 200 / min | 1 000 000 | |
| Enterprise+ | Custom | Custom | |
| Partner (invite-only, free) | Custom | Custom | |

Response headers

Every successful response carries:

| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Your per-minute rate ceiling. |
| X-RateLimit-Remaining | Requests left in the current calendar-minute bucket. |
| X-RateLimit-Reset | Unix timestamp (seconds) when the next calendar-minute bucket starts. |
| X-Quota-Limit | Monthly quota for this key. |
| X-Quota-Remaining | Requests left in the current month. |
| X-Quota-Reset | Unix timestamp when the monthly bucket resets (00:00 UTC on the 1st). |
| X-Computed-At | Server-side response generation time, in milliseconds since the Unix epoch. Useful for measuring observed API latency separately from network and edge-cache delay. |
| X-Schema-Version | The response-shape contract this body was built against (currently v1). Watching this value for bumps lets clients gate on a schema upgrade before the response shape changes under their parser. |
| Last-Modified | HTTP-date of the most recent updatedAt in the response body. SupraBench scores are dynamic (they shift whenever a new submission lands or a bench is re-rated), so this header and the per-row updatedAt field are the canonical "as-of" stamp. Combined with If-Modified-Since on a follow-up request, it will let you short-circuit unchanged data (304 support arrives with v1.1). |
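As a rough illustration of how a client might use the limit headers listed above, the sketch below parses them into a small record and works out how long to pause before the next request. The `Budget` class and `pause_seconds` helper are hypothetical names chosen for this example and are not part of any SupraBench SDK; only the header names come from this page.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Budget:
    """Snapshot of the limit headers from one response (hypothetical helper)."""
    rate_remaining: int
    rate_reset: int       # Unix seconds
    quota_remaining: int
    quota_reset: int      # Unix seconds

    @classmethod
    def from_headers(cls, headers: Mapping[str, str]) -> "Budget":
        return cls(
            rate_remaining=int(headers["X-RateLimit-Remaining"]),
            rate_reset=int(headers["X-RateLimit-Reset"]),
            quota_remaining=int(headers["X-Quota-Remaining"]),
            quota_reset=int(headers["X-Quota-Reset"]),
        )


def pause_seconds(budget: Budget, now: Optional[float] = None) -> float:
    """Seconds to wait before the next request; 0 if both budgets have room."""
    now = time.time() if now is None else now
    if budget.quota_remaining <= 0:
        # Monthly quota exhausted: nothing will succeed until the monthly reset.
        return max(0.0, budget.quota_reset - now)
    if budget.rate_remaining <= 0:
        # Per-minute budget exhausted: wait for the next calendar-minute bucket.
        return max(0.0, budget.rate_reset - now)
    return 0.0
```

Pausing proactively like this avoids spending a request on a 429 that the headers already predicted.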

Why two timestamps?

X-Computed-At is "when the server packaged this response". Last-Modified is "when the data inside last changed". They diverge whenever you hit an edge cache (X-Computed-At is the original computation time, Last-Modified is the data age) or an endpoint serving a stale-but-still-valid cached body (both timestamps can lag wall-clock time by up to the cache TTL). For freshness-sensitive workflows such as leaderboard mirrors and public dashboards, read both.
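To make the distinction concrete, here is a minimal sketch that turns the two headers into ages in seconds. The `freshness` function is a hypothetical helper; it assumes X-Computed-At is an integer millisecond timestamp and Last-Modified is a standard HTTP-date, as described above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness(headers: Mapping[str, str], now: Optional[datetime] = None) -> dict:
    """Return the age of the packaged response and of the underlying data."""
    now = now or datetime.now(timezone.utc)
    # X-Computed-At: milliseconds since the Unix epoch.
    computed_at = datetime.fromtimestamp(
        int(headers["X-Computed-At"]) / 1000, tz=timezone.utc
    )
    # Last-Modified: an HTTP-date such as "Sat, 25 Apr 2026 11:00:00 GMT".
    last_modified = parsedate_to_datetime(headers["Last-Modified"])
    return {
        "response_age_s": (now - computed_at).total_seconds(),
        "data_age_s": (now - last_modified).total_seconds(),
    }
```

A large `response_age_s` suggests the body came from a cache; a large `data_age_s` simply means the scores have not changed recently.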

Handling 429

When you exceed either limit, you get HTTP 429 with a body:

{
  "error": {
    "code": "rate_limited",
    "message": "> 60 req/min",
    "hint": "Slow down or upgrade tier."
  }
}

The code field disambiguates the two cases: rate_limited (try again next minute) versus quota_exceeded (try again next month, or upgrade). A Retry-After header is also set with the recommended wait time in seconds.
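The branching described above could be sketched as follows. `classify_429` is a hypothetical helper, and treating an unrecognized body as retryable is an assumption made for this example rather than documented behavior.

```python
import json
from typing import Optional, Tuple


def classify_429(body: str, retry_after: Optional[str] = None) -> Tuple[str, Optional[int]]:
    """Return (action, wait_seconds) for a 429 response body and Retry-After value."""
    try:
        code = json.loads(body)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        code = None  # body was not the documented error shape
    wait = int(retry_after) if retry_after and retry_after.isdigit() else None
    if code == "quota_exceeded":
        return ("stop", wait)  # retrying within the minute will not help
    # rate_limited, or (by assumption) anything unrecognized: safe to retry later
    return ("retry", wait)
```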

Exponential backoff

For rate_limited, retry with exponential backoff, and when a Retry-After header is present, wait at least that long. Don't hammer a rate-limited endpoint: repeated 429s aren't free for either of us.

curl:

# curl's --retry already treats HTTP 429 as a transient error and obeys
# Retry-After (since curl 7.66.0), so --retry-all-errors is not needed.
# Without --retry-delay, curl doubles the wait between attempts, starting
# at 1 second. --retry-max-time bounds the total time spent retrying.
# For anything non-trivial use a real HTTP client.
curl --retry 5 --retry-max-time 60 \
  "https://api.suprabench.com/v1/models" \
  -H "Authorization: Bearer $SUPRABENCH_KEY"
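
Python:

import os
import time

import requests

KEY = os.environ["SUPRABENCH_KEY"]

def call(url, *, max_retries=5):
    """GET url, retrying on HTTP 429 with exponential backoff."""
    delay = 1  # seconds
    for attempt in range(max_retries):
        r = requests.get(url, headers={"Authorization": f"Bearer {KEY}"}, timeout=30)
        if r.status_code != 429:
            return r
        if attempt == max_retries - 1:
            break  # out of retries; don't sleep before giving up
        # Prefer the server's Retry-After; fall back to our own backoff.
        wait = int(r.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay = min(delay * 2, 60)
    r.raise_for_status()  # raises requests.HTTPError for the final 429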

JavaScript:

const KEY = process.env.SUPRABENCH_KEY; // assumes Node.js

async function call(url, { maxRetries = 5 } = {}) {
  let delay = 1; // seconds
  for (let i = 0; i < maxRetries; i++) {
    const r = await fetch(url, { headers: { Authorization: `Bearer ${KEY}` } });
    if (r.status !== 429) return r;
    // Prefer the server's Retry-After (seconds); fall back to our own backoff.
    const wait = Number(r.headers.get("Retry-After")) || delay;
    await new Promise(res => setTimeout(res, wait * 1000));
    delay = Math.min(delay * 2, 60);
  }
  throw new Error("rate-limited after retries");
}

Go:

import (
    "fmt"
    "net/http"
    "os"
    "strconv"
    "time"
)

// call GETs url, retrying on HTTP 429 with exponential backoff.
func call(url string, maxRetries int) (*http.Response, error) {
    delay := time.Second
    for i := 0; i < maxRetries; i++ {
        req, err := http.NewRequest("GET", url, nil)
        if err != nil {
            return nil, err
        }
        req.Header.Set("Authorization", "Bearer "+os.Getenv("SUPRABENCH_KEY"))
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return nil, err
        }
        if resp.StatusCode != http.StatusTooManyRequests {
            return resp, nil // caller is responsible for closing resp.Body
        }
        // Prefer the server's Retry-After; fall back to our own backoff.
        wait := delay
        if secs, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil {
            wait = time.Duration(secs) * time.Second
        }
        resp.Body.Close()
        time.Sleep(wait)
        // Double the fallback delay, capped at 60 seconds.
        delay *= 2
        if delay > 60*time.Second {
            delay = 60 * time.Second
        }
    }
    return nil, fmt.Errorf("rate-limited after %d retries", maxRetries)
}

For quota_exceeded, exponential backoff doesn't help — wait for the next month or upgrade. Detect the code and surface a useful message to your user/operator.

Use the cache

Most read endpoints carry a Cache-Control: public, max-age=… header: 300 seconds (5 minutes) for rankings, 1 hour for tags, and 24 hours for bulk exports. If you're polling rankings every second from one process, you're wasting your quota, because within those 5 minutes there's nothing newer to fetch. A simple in-process cache that respects max-age will cut consumption by 100× or more for many workloads.
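A minimal sketch of such an in-process cache is shown below. `MaxAgeCache` and its `fetch` callback are hypothetical: `fetch` is assumed to take a URL and return a `(headers, body)` pair, so it can wrap whatever HTTP client you already use. The sketch only looks at `max-age` and ignores other Cache-Control directives.

```python
import re
import time
from typing import Callable, Dict, Mapping, Tuple

_MAX_AGE = re.compile(r"max-age=(\d+)")


def max_age(headers: Mapping[str, str]) -> int:
    """Extract max-age (seconds) from Cache-Control; 0 if absent."""
    match = _MAX_AGE.search(headers.get("Cache-Control", ""))
    return int(match.group(1)) if match else 0


class MaxAgeCache:
    """Serve repeated GETs from memory until the response's max-age expires."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[Mapping[str, str], bytes]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}  # url -> (expiry, body)

    def get(self, url: str) -> bytes:
        now = self._clock()
        hit = self._entries.get(url)
        if hit is not None and now < hit[0]:
            return hit[1]  # still fresh: no request, no quota spent
        headers, body = self._fetch(url)
        ttl = max_age(headers)
        if ttl > 0:
            self._entries[url] = (now + ttl, body)
        return body
```

With a 300-second max-age, a process that asks for rankings once per second makes one upstream request every five minutes instead of 300.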

Cloudflare, Varnish, Fastly and even requests-cache (with its cache_control option enabled) will respect our headers. Requests served from your edge cache are not counted against your quota; only requests that reach our origin are.

Quota resets

Upgrading mid-cycle

Switching tiers takes effect immediately. Stripe pro-rates the difference and charges/refunds accordingly. The new monthly quota replaces the old one (so if you've used 9 800 / 10 000 on Starter and upgrade to Pro, you immediately have 100 000 − 9 800 = 90 200 requests for the rest of this billing month).
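The arithmetic in that example can be written as a one-line helper. `remaining_after_upgrade` is a hypothetical name, and clamping the result at zero (for a switch to a tier whose quota is already used up) is an assumption of this sketch, not documented behavior.

```python
def remaining_after_upgrade(used: int, new_quota: int) -> int:
    """Requests left this billing month after switching to a tier with new_quota."""
    # Usage carries over; the new quota replaces the old ceiling.
    return max(0, new_quota - used)  # clamp at 0: an assumption, see lead-in


# The Starter-to-Pro example from the text: 9 800 used, Pro quota of 100 000.
print(remaining_after_upgrade(9_800, 100_000))  # 90200
```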

Need a custom limit for a research project or a launch spike? Email us — we usually grant short-term boosts within a business day.