Benchmarks GA on launch

List the benchmarks SupraBench tracks, with their per-dimension community ratings, raw quality score and Bench Score (the actual ranking weight in the Supra-Score aggregation).

Last updated: April 18, 2026 Reading time: 3 min

List benchmarks

GET https://api.suprabench.com/v1/benches

Returns benchmarks sorted by effectiveWeight (Bench Score) descending. Hidden / disputed benchmarks are excluded.

Query parameters

Parameter	Type	Description
limit optional	integer	1–500. Default 100. Clamped silently.

Response — array of `BenchSummary`

Field	Type	Description
slug	string	URL-safe identifier.
name	string	Display name.
description	string	One-paragraph human description.
url	string	Canonical benchmark page (paper, leaderboard or repo).
isOfficial	boolean	`true` if the benchmark has an official URL on the source allowlist.
tags	string[]	Capability tags voted up by the community.
scaleMin	number	Native scale lower bound.
scaleMax	number	Native scale upper bound.
qualityScore	number	0–100, mean of the four quality dimensions × 20. Input into `effectiveWeight`.
dimensions	object	Per-dimension averages on a 1–5 scale: `relevance`, `contamination`, `discriminability`, `reproducibility`, `difficulty`. Keys may evolve additively.
modelCount	integer	Distinct models with at least one valid score.
raterCount	integer	Number of users who rated this benchmark.
effectiveWeight	number	Bench Score on 0–100. Equal to quality × difficulty × headroom, further adjusted by the benchmark's net-upvote share. This is the direct ability weight applied to a model's normalised score on this bench when computing the SupraScore; model coverage is used separately as evidence confidence.

Example

curl "https://api.suprabench.com/v1/benches?limit=2" \
  -H "Authorization: Bearer $SUPRABENCH_KEY"

benches = requests.get(
    "https://api.suprabench.com/v1/benches",
    params={"limit": 2},
    headers={"Authorization": f"Bearer {KEY}"},
).json()
for b in benches:
    print(f"{b['name']:25s} weight={b['effectiveWeight']:.1f}")

const r = await fetch("https://api.suprabench.com/v1/benches?limit=2", {
  headers: { Authorization: `Bearer ${KEY}` },
});
const benches = await r.json();
for (const b of benches) {
  console.log(`${b.name.padEnd(25)} weight=${b.effectiveWeight.toFixed(1)}`);
}

req, _ := http.NewRequest("GET",
    "https://api.suprabench.com/v1/benches?limit=2", nil)
req.Header.Set("Authorization", "Bearer "+os.Getenv("SUPRABENCH_KEY"))
resp, _ := http.DefaultClient.Do(req)
defer resp.Body.Close()

var benches []struct {
    Name            string  `json:"name"`
    EffectiveWeight float64 `json:"effectiveWeight"`
}
json.NewDecoder(resp.Body).Decode(&benches)
for _, b := range benches {
    fmt.Printf("%-25s weight=%.1f\n", b.Name, b.EffectiveWeight)
}

[
  {
    "slug": "swe-bench-verified",
    "name": "SWE-bench Verified",
    "description": "500 human-verified GitHub issues, agent must produce a passing patch end-to-end.",
    "url": "https://www.swebench.com/",
    "isOfficial": true,
    "tags": ["coding", "agentic"],
    "scaleMin": 0,
    "scaleMax": 100,
    "qualityScore": 87.4,
    "dimensions": {
      "relevance": 4.6,
      "contamination": 4.5,
      "discriminability": 3.9,
      "reproducibility": 4.3,
      "difficulty": 4.7
    },
    "modelCount": 41,
    "raterCount": 23,
    "effectiveWeight": 56.8
  },
  {
    "slug": "mmlu-pro",
    "name": "MMLU-Pro",
    "description": "Reasoning-heavy multiple-choice across 14 academic disciplines, harder than original MMLU.",
    "url": "https://arxiv.org/abs/2406.01574",
    "isOfficial": true,
    "tags": ["reasoning", "knowledge"],
    "scaleMin": 0,
    "scaleMax": 1,
    "qualityScore": 81.2,
    "dimensions": {
      "relevance": 4.4,
      "contamination": 4.1,
      "discriminability": 3.7,
      "reproducibility": 3.9,
      "difficulty": 4.0
    },
    "modelCount": 67,
    "raterCount": 19,
    "effectiveWeight": 41.0
  }
]

Errors

Status	Code	When
400	invalid_param	Non-integer `limit`.

Previous ← Models Next Best by tag →