Rate limits & quotas
We enforce two independent limits on every request: a short-window rate limit (per minute) and a long-window monthly quota. Both are advertised in response headers so you can plan around them.
Two limits, one purpose
- Rate limit: protects the platform from bursts. Counted per calendar minute (UTC), per key; the bucket resets at the top of every wall-clock minute, not 60 s after your first request. Hitting it returns 429 rate_limited with a Retry-After header pointing at the next minute boundary.
- Monthly quota: the total number of requests included in your subscription. Counted per calendar month (UTC), per key. Hitting it returns 429 quota_exceeded.
Both limits are tracked at the key level, not the account level. If you have three keys on the Pro tier, each gets its own 300 req/min budget and its own 100 000 req/month bucket.
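Because the rate-limit bucket resets on the wall-clock minute rather than 60 s after your first request, a client can work out locally how long it has until the next bucket. The following is a minimal sketch; the function name `seconds_until_next_minute` is hypothetical and not part of the API, and it assumes your local clock is reasonably in sync with the server's.

```python
import time
from typing import Optional


def seconds_until_next_minute(now: Optional[float] = None) -> float:
    """Seconds from `now` (Unix time) until the next UTC calendar-minute boundary."""
    if now is None:
        now = time.time()
    # Unix time is counted in UTC, so minute boundaries are multiples of 60.
    return 60.0 - (now % 60.0)
```

When a response is available, prefer the server-provided Retry-After or X-RateLimit-Reset values over a locally computed wait, since they do not depend on your clock.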
Per-tier limits
| Tier | Rate limit | Monthly quota | Bulk export |
|---|---|---|---|
| Starter | 60 / min | 10 000 | — |
| Pro | 300 / min | 100 000 | ✓ |
| Enterprise | 1 200 / min | 1 000 000 | ✓ |
| Enterprise+ | Custom | Custom | ✓ |
| Partner (invite-only, free) | Custom | Custom | ✓ |
Response headers
Every successful response carries:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Your per-minute rate ceiling. |
| X-RateLimit-Remaining | Requests left in the current calendar-minute bucket. |
| X-RateLimit-Reset | Unix timestamp (seconds) when the next calendar-minute bucket starts. |
| X-Quota-Limit | Monthly quota for this key. |
| X-Quota-Remaining | Requests left in the current month. |
| X-Quota-Reset | Unix timestamp when the monthly bucket resets (00:00 UTC on the 1st). |
| X-Computed-At | Server-side response generation time, milliseconds since the Unix epoch. Useful for measuring observed API latency separate from network/edge-cache delay. |
| X-Schema-Version | The response-shape contract this body was built against (currently v1). Bump-watching this lets clients gate on schema upgrades before parsing changes. |
| Last-Modified | HTTP-date of the most recent updatedAt in the response body. SupraBench scores are dynamic — they shift whenever a new submission lands or a bench is re-rated — so this header (and the per-row updatedAt field) is the canonical "as-of" stamp. Combined with If-Modified-Since on a follow-up request you can short-circuit unchanged data (304 support arriving with v1.1). |
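As an illustration of how a client might use the limit headers above, here is a small sketch that parses them and spreads the remaining per-minute budget evenly over the rest of the current bucket. The `Limits` class and both function names are hypothetical helpers, and the sketch assumes all six headers are present and contain plain integers.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Limits:
    rate_limit: int
    rate_remaining: int
    rate_reset: int       # Unix seconds
    quota_limit: int
    quota_remaining: int
    quota_reset: int      # Unix seconds


def parse_limits(headers: Mapping[str, str]) -> Limits:
    """Read the rate-limit and quota headers from a response.

    A plain dict is case-sensitive; HTTP client libraries usually expose a
    case-insensitive mapping, which works here as well.
    """
    return Limits(
        rate_limit=int(headers["X-RateLimit-Limit"]),
        rate_remaining=int(headers["X-RateLimit-Remaining"]),
        rate_reset=int(headers["X-RateLimit-Reset"]),
        quota_limit=int(headers["X-Quota-Limit"]),
        quota_remaining=int(headers["X-Quota-Remaining"]),
        quota_reset=int(headers["X-Quota-Reset"]),
    )


def pacing_delay(limits: Limits, now: float) -> float:
    """Seconds to wait between requests so the remaining budget lasts until reset."""
    window = max(limits.rate_reset - now, 0.0)
    if limits.rate_remaining <= 0:
        return window  # budget exhausted: wait for the next bucket
    return window / limits.rate_remaining
```

Even pacing is only one possible policy; a bursty client might instead send freely and pause only when X-RateLimit-Remaining reaches zero.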
Why two timestamps?
X-Computed-At is "when the server packaged this response". Last-Modified is "when the data inside last changed". They diverge whenever a response is served from an edge cache (X-Computed-At stays at the original computation time, Last-Modified reflects the age of the data) or from a stale-but-still-valid cached body (both timestamps can lag wall-clock time by up to the cache TTL). For freshness-sensitive workflows such as leaderboard mirrors and public dashboards, read both.
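To make the distinction concrete, the sketch below turns both headers into ages relative to the client's clock. The function name `freshness` and the returned keys are hypothetical, and the result is only as accurate as the local clock.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Mapping, Optional


def freshness(headers: Mapping[str, str], now: Optional[datetime] = None) -> Dict[str, float]:
    """Return how old the response and its underlying data are, in seconds."""
    if now is None:
        now = datetime.now(timezone.utc)
    # X-Computed-At is milliseconds since the Unix epoch.
    computed_at = datetime.fromtimestamp(
        int(headers["X-Computed-At"]) / 1000, tz=timezone.utc
    )
    # Last-Modified is an HTTP-date, e.g. "Wed, 01 May 2024 12:00:00 GMT".
    last_modified = parsedate_to_datetime(headers["Last-Modified"])
    return {
        "response_age_s": (now - computed_at).total_seconds(),
        "data_age_s": (now - last_modified).total_seconds(),
    }
```

A large `response_age_s` suggests the body came from a cache; a large `data_age_s` only means the data has not changed recently.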
Handling 429
When you exceed either limit, you get HTTP 429 with a body:
```json
{
  "error": {
    "code": "rate_limited",
    "message": "> 60 req/min",
    "hint": "Slow down or upgrade tier."
  }
}
```
The code field disambiguates the two cases:
rate_limited (try again next minute) versus
quota_exceeded (try again next month, or upgrade).
A Retry-After header is also set with the recommended
wait time in seconds.
Exponential backoff
For rate_limited, wait the number of seconds given in Retry-After before retrying.
If the header is missing, fall back to exponential backoff capped at 60 seconds.
Don't hammer the endpoint: repeated 429s aren't free for either of us.
```shell
# With --retry, curl (7.66.0 or later) treats 429 as a transient error and
# honors Retry-After; --retry-all-errors is not needed for that.
# --retry-max-time bounds the total time spent retrying, but for anything
# non-trivial use a real HTTP client.
curl --retry 5 --retry-delay 1 --retry-max-time 60 \
  "https://api.suprabench.com/v1/models" \
  -H "Authorization: Bearer $SUPRABENCH_KEY"
```
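```python
import os
import time

import requests

KEY = os.environ["SUPRABENCH_KEY"]


def call(url, *, max_retries=5):
    """GET url, retrying on 429 up to max_retries times."""
    delay = 1  # fallback wait in seconds, doubled after each 429
    for attempt in range(max_retries):
        r = requests.get(url, headers={"Authorization": f"Bearer {KEY}"}, timeout=30)
        if r.status_code != 429:
            return r
        # Prefer the server's Retry-After; fall back to exponential backoff.
        wait = int(r.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay = min(delay * 2, 60)
    r.raise_for_status()  # still 429 after all retries: raises HTTPError
```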
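```javascript
// Node.js: read the key from the environment.
const KEY = process.env.SUPRABENCH_KEY;

async function call(url, { maxRetries = 5 } = {}) {
  let delay = 1; // fallback wait in seconds, doubled after each 429
  for (let i = 0; i < maxRetries; i++) {
    const r = await fetch(url, { headers: { Authorization: `Bearer ${KEY}` } });
    if (r.status !== 429) return r;
    // Prefer the server's Retry-After; fall back to exponential backoff.
    const wait = Number(r.headers.get("Retry-After")) || delay;
    await new Promise(res => setTimeout(res, wait * 1000));
    delay = Math.min(delay * 2, 60);
  }
  throw new Error("rate-limited after retries");
}
```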
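```go
import (
	"fmt"
	"net/http"
	"os"
	"strconv"
	"time"
)

// call GETs url, retrying on 429 up to maxRetries times.
func call(url string, maxRetries int) (*http.Response, error) {
	delay := time.Second // fallback wait, doubled after each 429
	for i := 0; i < maxRetries; i++ {
		req, err := http.NewRequest("GET", url, nil)
		if err != nil {
			return nil, err
		}
		req.Header.Set("Authorization", "Bearer "+os.Getenv("SUPRABENCH_KEY"))
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests {
			return resp, nil
		}
		// Prefer the server's Retry-After; fall back to exponential backoff.
		wait := delay
		if h := resp.Header.Get("Retry-After"); h != "" {
			if secs, err := strconv.Atoi(h); err == nil {
				wait = time.Duration(secs) * time.Second
			}
		}
		resp.Body.Close()
		time.Sleep(wait)
		if delay *= 2; delay > 60*time.Second {
			delay = 60 * time.Second
		}
	}
	// The last response body is already closed, so return only the error.
	return nil, fmt.Errorf("rate-limited after %d retries", maxRetries)
}
```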
For quota_exceeded, exponential backoff doesn't help
— wait for the next month or upgrade. Detect the code and surface
a useful message to your user/operator.
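One way to separate the two cases before deciding whether to retry is sketched below. The exception class `QuotaExceeded` and the function `retry_delay_for_429` are hypothetical names; the sketch assumes the 429 body has already been parsed from JSON into a dict with the shape shown under "Handling 429".

```python
from typing import Any, Mapping


class QuotaExceeded(Exception):
    """Raised when the monthly quota is exhausted; retrying will not help."""


def retry_delay_for_429(
    body: Mapping[str, Any], headers: Mapping[str, str], fallback: float = 1.0
) -> float:
    """Return the seconds to wait for rate_limited; raise for quota_exceeded."""
    code = body.get("error", {}).get("code")
    if code == "quota_exceeded":
        raise QuotaExceeded(
            "Monthly quota exhausted for this key: "
            "wait for the monthly reset or upgrade the tier."
        )
    if code == "rate_limited":
        try:
            return float(headers.get("Retry-After", fallback))
        except ValueError:
            return fallback  # header present but not a plain number of seconds
    raise ValueError(f"unexpected 429 error code: {code!r}")
```

A retry loop can call this on every 429, sleep for the returned delay, and let `QuotaExceeded` propagate to whatever layer reports errors to the user or operator.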
Use the cache
Most read endpoints carry a Cache-Control: public, max-age=<seconds> header;
max-age is 300 (5 minutes) for rankings, 1 h for tags, and 24 h for bulk exports.
If you're polling rankings every second from one process, you're
wasting your quota, because there's nothing newer to fetch. A simple
in-process cache that respects max-age will cut
consumption by 100× or more for many workloads.
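A simple in-process cache of this kind could look like the sketch below. The class `MaxAgeCache`, the helper `max_age_seconds`, and the `fetch(url) -> (body, headers)` callable are all hypothetical; the sketch only looks at max-age and ignores other Cache-Control directives such as no-store.

```python
import re
import time
from typing import Any, Callable, Dict, Mapping, Tuple


def max_age_seconds(cache_control: str) -> int:
    """Extract max-age from a Cache-Control value; 0 if absent."""
    match = re.search(r"max-age=(\d+)", cache_control or "")
    return int(match.group(1)) if match else 0


class MaxAgeCache:
    """Caches response bodies per URL for as long as max-age allows."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[Any, Mapping[str, str]]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # performs the real request
        self._clock = clock  # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}  # url -> (expires_at, body)

    def get(self, url: str) -> Any:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: no request, no quota used
        body, headers = self._fetch(url)
        ttl = max_age_seconds(headers.get("Cache-Control", ""))
        if ttl > 0:
            self._entries[url] = (now + ttl, body)
        return body
```

With max-age=300, a process polling once per second through such a cache would send one request every five minutes instead of 300.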
Cloudflare, Varnish, Fastly and even requests-cache
(with cache_control=True) can honor our headers. Requests served
from your own cache are not counted against your quota; only
requests that reach our origin are.
Quota resets
- The monthly bucket resets at 00:00 UTC on the 1st of each month, regardless of when you subscribed.
- Unused quota does not roll over to the next month.
- The first month after a fresh subscription is pro-rated by remaining days, so you don't pay full price for half a month.
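The reset rule in the first bullet can be computed locally, for example to show users when their quota refills. This is a sketch with a hypothetical function name; the X-Quota-Reset header carries the same instant as a Unix timestamp and should be preferred when a response is at hand.

```python
from datetime import datetime, timezone


def next_quota_reset(now: datetime) -> datetime:
    """Return 00:00 UTC on the 1st of the month following `now`.

    `now` should be timezone-aware; a naive datetime would be interpreted
    as local time by astimezone().
    """
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)
```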
Upgrading mid-cycle
Switching tiers takes effect immediately. Stripe pro-rates the difference and charges/refunds accordingly. The new monthly quota replaces the old one (so if you've used 9 800 / 10 000 on Starter and upgrade to Pro, you immediately have 100 000 − 9 800 = 90 200 requests for the rest of this billing month).
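The arithmetic in that example generalizes as follows. The quota values are copied from the per-tier table; the function name is hypothetical, and clamping at zero is an assumption for cases this page does not describe (for instance, usage above the new tier's quota).

```python
# Monthly quotas from the per-tier table (custom tiers omitted).
MONTHLY_QUOTA = {"Starter": 10_000, "Pro": 100_000, "Enterprise": 1_000_000}


def remaining_after_upgrade(used: int, new_tier: str) -> int:
    """Requests left this month after switching to `new_tier`."""
    # Usage so far carries over; only the ceiling changes.
    # Clamping at 0 is an assumption, not documented behavior.
    return max(MONTHLY_QUOTA[new_tier] - used, 0)


print(remaining_after_upgrade(9_800, "Pro"))  # 90200
```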
Need a custom limit for a research project or a launch spike? Email us — we usually grant short-term boosts within a business day.