Benchmarks GA on launch
List the benchmarks SupraBench tracks, with their per-dimension community ratings, raw quality score and Bench Score (the actual ranking weight in the Supra-Score aggregation).
List benchmarks
GET
https://api.suprabench.com/v1/benches
Returns benchmarks sorted by effectiveWeight (Bench Score)
descending. Hidden / disputed benchmarks are excluded.
Query parameters
| Parameter | Type | Description |
|---|---|---|
| limit optional | integer | 1–500. Default 100. Clamped silently. |
Response — array of BenchSummary
| Field | Type | Description |
|---|---|---|
| slug | string | URL-safe identifier. |
| name | string | Display name. |
| description | string | One-paragraph human description. |
| url | string | Canonical benchmark page (paper, leaderboard or repo). |
| isOfficial | boolean | true if the benchmark has an official URL on the source allowlist. |
| tags | string[] | Capability tags voted up by the community. |
| scaleMin | number | Native scale lower bound. |
| scaleMax | number | Native scale upper bound. |
| qualityScore | number | 0–100, mean of the four quality dimensions × 20. Input into effectiveWeight. |
| dimensions | object | Per-dimension averages on a 1–5 scale: relevance, contamination, discriminability, reproducibility, difficulty. Keys may evolve additively. |
| modelCount | integer | Distinct models with at least one valid score. |
| raterCount | integer | Number of users who rated this benchmark. |
| effectiveWeight | number | Bench Score on 0–100. Equal to quality × difficulty × headroom, further adjusted by the benchmark's net-upvote and model-coverage shares. This is the actual ranking weight applied to a model's normalised score on this bench when computing the Supra-Score. |
Example
curl "https://api.suprabench.com/v1/benches?limit=2" \
-H "Authorization: Bearer $SUPRABENCH_KEY"
benches = requests.get(
"https://api.suprabench.com/v1/benches",
params={"limit": 2},
headers={"Authorization": f"Bearer {KEY}"},
).json()
for b in benches:
print(f"{b['name']:25s} weight={b['effectiveWeight']:.1f}")
const r = await fetch("https://api.suprabench.com/v1/benches?limit=2", {
headers: { Authorization: `Bearer ${KEY}` },
});
const benches = await r.json();
for (const b of benches) {
console.log(`${b.name.padEnd(25)} weight=${b.effectiveWeight.toFixed(1)}`);
}
req, _ := http.NewRequest("GET",
"https://api.suprabench.com/v1/benches?limit=2", nil)
req.Header.Set("Authorization", "Bearer "+os.Getenv("SUPRABENCH_KEY"))
resp, _ := http.DefaultClient.Do(req)
defer resp.Body.Close()
var benches []struct {
Name string `json:"name"`
EffectiveWeight float64 `json:"effectiveWeight"`
}
json.NewDecoder(resp.Body).Decode(&benches)
for _, b := range benches {
fmt.Printf("%-25s weight=%.1f\n", b.Name, b.EffectiveWeight)
}
[
{
"slug": "swe-bench-verified",
"name": "SWE-bench Verified",
"description": "500 human-verified GitHub issues, agent must produce a passing patch end-to-end.",
"url": "https://www.swebench.com/",
"isOfficial": true,
"tags": ["coding", "agentic"],
"scaleMin": 0,
"scaleMax": 100,
"qualityScore": 87.4,
"dimensions": {
"relevance": 4.6,
"contamination": 4.5,
"discriminability": 3.9,
"reproducibility": 4.3,
"difficulty": 4.7
},
"modelCount": 41,
"raterCount": 23,
"effectiveWeight": 56.8
},
{
"slug": "mmlu-pro",
"name": "MMLU-Pro",
"description": "Reasoning-heavy multiple-choice across 14 academic disciplines, harder than original MMLU.",
"url": "https://arxiv.org/abs/2406.01574",
"isOfficial": true,
"tags": ["reasoning", "knowledge"],
"scaleMin": 0,
"scaleMax": 1,
"qualityScore": 81.2,
"dimensions": {
"relevance": 4.4,
"contamination": 4.1,
"discriminability": 3.7,
"reproducibility": 3.9,
"difficulty": 4.0
},
"modelCount": 67,
"raterCount": 19,
"effectiveWeight": 41.0
}
]
Errors
| Status | Code | When |
|---|---|---|
| 400 | invalid_param | Non-integer limit. |