@techfren Coding LLM Benchmarks

Unofficial Aider Polyglot Benchmarks with per-language sorting

My current mission is to find the coding LLMs that offer the best value and speed for the money.

Join the community Discord to chat or submit a model for benchmarking. You can also submit a pull request or issue on GitHub.

Speed in seconds per test case. Lower is better.
Rank  Model               Pass Rate  Speed per Case  Cost
1     MS R1               56.9%      374.8s          $0.000
2     qwen3 235B          54%        380.4s          $2.426
3     DeepSeek R1         52%        419.2s          $6.192
4     Flash 2.5 Thinking  48.9%      71.4s           $5.000
5     R1T-Chimera         48.4%      186.4s          $0.000
6     Flash 2.5 Thinking  47.6%      93.1s           $6.000
7     DeepSeek Chat v3    44.9%      65.4s           $1.355
8     Qwen3 30B           39.6%      192.3s          $1.364
9     GPT-4.1-mini        35.7%      40.6s           $2.170
10    Grok-3-mini-beta    30.2%      64.2s           $0.778
11    Grok-3-mini-beta    27.1%      180.3s          $2.165
12    Grok-3-mini-beta    22.2%      68.8s           $1.010
13    GLM-4               12.9%      1281.0s         $0.000
14    Qwen 2.5 Coder 32B  11.6%      100.6s          $0.894
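No single column captures "value and speed for money", so one way to read the table is to keep only the entries that no other entry beats on pass rate, speed, and cost at the same time (a Pareto front). Below is a minimal Python sketch of that idea, using the rows above. The `Entry` type and the `dominates` and `pareto_front` functions are hypothetical names introduced only for illustration. The page does not say why some entries cost $0.000 (a free tier, a local run, or an unrecorded cost are all possible), so the sketch assumes the listed values at face value.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    rank: int
    model: str
    pass_rate: float  # percent of test cases passed (higher is better)
    speed_s: float    # seconds per test case (lower is better)
    cost_usd: float   # cost as listed; $0.000 taken at face value (assumption)


# Rows copied from the leaderboard. Repeated model names are separate
# benchmark entries, so the rank serves as the unique key.
ENTRIES = [
    Entry(1, "MS R1", 56.9, 374.8, 0.000),
    Entry(2, "qwen3 235B", 54.0, 380.4, 2.426),
    Entry(3, "DeepSeek R1", 52.0, 419.2, 6.192),
    Entry(4, "Flash 2.5 Thinking", 48.9, 71.4, 5.000),
    Entry(5, "R1T-Chimera", 48.4, 186.4, 0.000),
    Entry(6, "Flash 2.5 Thinking", 47.6, 93.1, 6.000),
    Entry(7, "DeepSeek Chat v3", 44.9, 65.4, 1.355),
    Entry(8, "Qwen3 30B", 39.6, 192.3, 1.364),
    Entry(9, "GPT-4.1-mini", 35.7, 40.6, 2.170),
    Entry(10, "Grok-3-mini-beta", 30.2, 64.2, 0.778),
    Entry(11, "Grok-3-mini-beta", 27.1, 180.3, 2.165),
    Entry(12, "Grok-3-mini-beta", 22.2, 68.8, 1.010),
    Entry(13, "GLM-4", 12.9, 1281.0, 0.000),
    Entry(14, "Qwen 2.5 Coder 32B", 11.6, 100.6, 0.894),
]


def dominates(a: Entry, b: Entry) -> bool:
    """True if a is no worse than b on every column and strictly better on at least one."""
    no_worse = (
        a.pass_rate >= b.pass_rate
        and a.speed_s <= b.speed_s
        and a.cost_usd <= b.cost_usd
    )
    strictly_better = (
        a.pass_rate > b.pass_rate
        or a.speed_s < b.speed_s
        or a.cost_usd < b.cost_usd
    )
    return no_worse and strictly_better


def pareto_front(entries: list[Entry]) -> list[Entry]:
    """Keep entries that no other entry dominates (O(n^2), fine for a small table)."""
    # An entry never dominates itself, so no self-exclusion is needed.
    return [e for e in entries if not any(dominates(o, e) for o in entries)]


if __name__ == "__main__":
    for e in pareto_front(ENTRIES):
        print(f"{e.rank:>2}  {e.model:<20} {e.pass_rate:5.1f}%  {e.speed_s:7.1f}s  ${e.cost_usd:.3f}")
```

Run against the rows above, this keeps ranks 1, 4, 5, 7, 9 and 10; every other entry is matched or beaten on all three columns by at least one of them. Because a $0.000 cost beats every paid entry on that column, it may be worth filtering those rows out before comparing if their cost was simply not recorded.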
