Benchmarks GA on launch

List the benchmarks SupraBench tracks, with their per-dimension community ratings, raw quality score and Bench Score (the actual ranking weight in the Supra-Score aggregation).

Last updated: April 18, 2026 Reading time: 3 min

List benchmarks

GET https://api.suprabench.com/v1/benches

Returns benchmarks sorted by effectiveWeight (Bench Score) descending. Hidden / disputed benchmarks are excluded.

Query parameters

ParameterTypeDescription
limit optional integer 1–500. Default 100. Clamped silently.

Response — array of BenchSummary

FieldTypeDescription
slugstringURL-safe identifier.
namestringDisplay name.
descriptionstringOne-paragraph human description.
urlstringCanonical benchmark page (paper, leaderboard or repo).
isOfficialbooleantrue if the benchmark has an official URL on the source allowlist.
tagsstring[]Capability tags voted up by the community.
scaleMinnumberNative scale lower bound.
scaleMaxnumberNative scale upper bound.
qualityScorenumber0–100, mean of the four quality dimensions × 20. Input into effectiveWeight.
dimensionsobjectPer-dimension averages on a 1–5 scale: relevance, contamination, discriminability, reproducibility, difficulty. Keys may evolve additively.
modelCountintegerDistinct models with at least one valid score.
raterCountintegerNumber of users who rated this benchmark.
effectiveWeightnumberBench Score on 0–100. Equal to quality × difficulty × headroom, further adjusted by the benchmark's net-upvote and model-coverage shares. This is the actual ranking weight applied to a model's normalised score on this bench when computing the Supra-Score.

Example

curl "https://api.suprabench.com/v1/benches?limit=2" \
  -H "Authorization: Bearer $SUPRABENCH_KEY"
benches = requests.get(
    "https://api.suprabench.com/v1/benches",
    params={"limit": 2},
    headers={"Authorization": f"Bearer {KEY}"},
).json()
for b in benches:
    print(f"{b['name']:25s} weight={b['effectiveWeight']:.1f}")
const r = await fetch("https://api.suprabench.com/v1/benches?limit=2", {
  headers: { Authorization: `Bearer ${KEY}` },
});
const benches = await r.json();
for (const b of benches) {
  console.log(`${b.name.padEnd(25)} weight=${b.effectiveWeight.toFixed(1)}`);
}
req, _ := http.NewRequest("GET",
    "https://api.suprabench.com/v1/benches?limit=2", nil)
req.Header.Set("Authorization", "Bearer "+os.Getenv("SUPRABENCH_KEY"))
resp, _ := http.DefaultClient.Do(req)
defer resp.Body.Close()

var benches []struct {
    Name            string  `json:"name"`
    EffectiveWeight float64 `json:"effectiveWeight"`
}
json.NewDecoder(resp.Body).Decode(&benches)
for _, b := range benches {
    fmt.Printf("%-25s weight=%.1f\n", b.Name, b.EffectiveWeight)
}
[
  {
    "slug": "swe-bench-verified",
    "name": "SWE-bench Verified",
    "description": "500 human-verified GitHub issues, agent must produce a passing patch end-to-end.",
    "url": "https://www.swebench.com/",
    "isOfficial": true,
    "tags": ["coding", "agentic"],
    "scaleMin": 0,
    "scaleMax": 100,
    "qualityScore": 87.4,
    "dimensions": {
      "relevance": 4.6,
      "contamination": 4.5,
      "discriminability": 3.9,
      "reproducibility": 4.3,
      "difficulty": 4.7
    },
    "modelCount": 41,
    "raterCount": 23,
    "effectiveWeight": 56.8
  },
  {
    "slug": "mmlu-pro",
    "name": "MMLU-Pro",
    "description": "Reasoning-heavy multiple-choice across 14 academic disciplines, harder than original MMLU.",
    "url": "https://arxiv.org/abs/2406.01574",
    "isOfficial": true,
    "tags": ["reasoning", "knowledge"],
    "scaleMin": 0,
    "scaleMax": 1,
    "qualityScore": 81.2,
    "dimensions": {
      "relevance": 4.4,
      "contamination": 4.1,
      "discriminability": 3.7,
      "reproducibility": 3.9,
      "difficulty": 4.0
    },
    "modelCount": 67,
    "raterCount": 19,
    "effectiveWeight": 41.0
  }
]

Errors

StatusCodeWhen
400invalid_paramNon-integer limit.