@techfren Coding LLM Benchmarks

Unofficial Aider Polyglot Benchmarks with sorting per language

My current mission is to find the coding LLMs that offer the best pass rate and speed for the money.

Join the community Discord to chat or submit a model for benchmarking. You can also submit a pull request or issue on GitHub.

Speed in seconds per test case. Lower is better.
Rank  Model                         Pass Rate  Speed per Case  Cost
1     DeepSeek R1 0528 (DeepInfra)  71.6%      330.1s          $8.020
2     MS R1                         56.9%      374.8s          $0.000
3     qwen3 235B                    54%        380.4s          $2.426
4     DeepSeek R1                   52%        419.2s          $6.192
5     Flash 2.5 Thinking            48.9%      71.4s           $5.000
6     R1T-Chimera                   48.4%      186.4s          $0.000
7     Flash 2.5 Thinking            47.6%      93.1s           $6.000
8     DeepSeek Chat v3              44.9%      65.4s           $1.355
9     Qwen3 30B                     39.6%      192.3s          $1.364
10    GPT-4.1-mini                  35.7%      40.6s           $2.170
11    Grok-3-mini-beta              30.2%      64.2s           $0.778
12    Qwen 2.5 Coder 32B            11.6%      100.6s          $0.894
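
Since the goal is value and speed for the money, the leaderboard figures can also be re-sorted offline. The sketch below is one possible way to do that and is not part of the benchmark itself. The `Result` type, the helper names, and the "pass-rate points per dollar" metric are all assumptions made for illustration. The numbers are copied from the leaderboard above. Models listed at $0.000 are left out of the value ranking because the ratio is undefined for them.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    """One leaderboard row (hypothetical container, not from the benchmark code)."""
    model: str
    pass_rate: float         # percent of test cases passed
    seconds_per_case: float  # lower is better
    cost_usd: float          # cost as listed on the leaderboard


# Rows copied from the leaderboard, in rank order.
RESULTS = [
    Result("DeepSeek R1 0528 (DeepInfra)", 71.6, 330.1, 8.020),
    Result("MS R1", 56.9, 374.8, 0.000),
    Result("qwen3 235B", 54.0, 380.4, 2.426),
    Result("DeepSeek R1", 52.0, 419.2, 6.192),
    Result("Flash 2.5 Thinking", 48.9, 71.4, 5.000),
    Result("R1T-Chimera", 48.4, 186.4, 0.000),
    Result("Flash 2.5 Thinking", 47.6, 93.1, 6.000),
    Result("DeepSeek Chat v3", 44.9, 65.4, 1.355),
    Result("Qwen3 30B", 39.6, 192.3, 1.364),
    Result("GPT-4.1-mini", 35.7, 40.6, 2.170),
    Result("Grok-3-mini-beta", 30.2, 64.2, 0.778),
    Result("Qwen 2.5 Coder 32B", 11.6, 100.6, 0.894),
]


def pass_rate_per_dollar(r: Result) -> Optional[float]:
    """Assumed value metric: pass-rate points per dollar; None if cost is zero."""
    if r.cost_usd <= 0:
        return None
    return r.pass_rate / r.cost_usd


def rank_by_value(results: list[Result]) -> list[Result]:
    """Rank paid models by the assumed value metric, best first."""
    paid = [r for r in results if r.cost_usd > 0]
    return sorted(paid, key=lambda r: r.pass_rate / r.cost_usd, reverse=True)


def rank_by_speed(results: list[Result]) -> list[Result]:
    """Rank all models by seconds per test case, fastest first."""
    return sorted(results, key=lambda r: r.seconds_per_case)


if __name__ == "__main__":
    for r in rank_by_value(RESULTS):
        print(f"{r.model:30s} {pass_rate_per_dollar(r):6.2f} points per dollar")
```

A single ratio hides the trade-off between accuracy and price, so it is best read next to the raw pass rate and speed columns, not as a replacement for them.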
