@techfren Coding LLM Benchmarks

Unofficial Aider Polyglot Benchmarks with sorting per language

My current mission is to find the coding LLMs that offer the best performance and speed for the money.

Join the community Discord to chat or submit a model for benchmarking. You can also submit a pull request or issue on GitHub.

Speed in seconds per test case. Lower is better.
| Rank | Model | Pass Rate | Speed per Case | Cost |
|------|-------|-----------|----------------|------|
| 1 | DeepSeek R1 0528 (DeepInfra) | 71.6% | 330.1s | $8.020 |
| 2 | DeepSeek-TNG-R1T2-Chimera | 64.4% | 263.2s | $0.000 |
| 3 | ERNIE-4.5-300B | 61.3% | 149.9s | $1.534 |
| 4 | MS R1 | 56.9% | 374.8s | $0.000 |
| 5 | qwen3 235B | 54% | 380.4s | $2.426 |
| 6 | DeepSeek R1 | 52% | 419.2s | $6.192 |
| 7 | Flash 2.5 Thinking | 48.9% | 71.4s | $5.000 |
| 8 | R1T-Chimera | 48.4% | 186.4s | $0.000 |
| 9 | Flash 2.5 Thinking | 47.6% | 93.1s | $6.000 |
| 10 | DeepSeek Chat v3 | 44.9% | 65.4s | $1.355 |
| 11 | Qwen3 30B | 39.6% | 192.3s | $1.364 |
| 12 | GPT-4.1-mini | 35.7% | 40.6s | $2.170 |
| 13 | Grok-3-mini-beta | 30.2% | 64.2s | $0.778 |
| 14 | Qwen 2.5 Coder 32B | 11.6% | 100.6s | $0.894 |
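
One rough way to compare value for money across these results is to divide pass rate by cost. The sketch below is only an illustration: the metric `points_per_dollar` and the helper `rank_by_value` are hypothetical names, not something the benchmark itself reports, and the rows are a small subset copied from the results above. Entries listed at $0.000 are treated as infinitely good value under this metric, which is an assumption you may want to handle differently.

```python
# A few rows copied from the results above:
# (model, pass rate in %, seconds per case, cost in USD).
ROWS = [
    ("DeepSeek R1 0528 (DeepInfra)", 71.6, 330.1, 8.020),
    ("ERNIE-4.5-300B", 61.3, 149.9, 1.534),
    ("DeepSeek Chat v3", 44.9, 65.4, 1.355),
    ("GPT-4.1-mini", 35.7, 40.6, 2.170),
    ("Grok-3-mini-beta", 30.2, 64.2, 0.778),
]


def points_per_dollar(pass_rate, cost):
    """Hypothetical value metric: pass-rate points per dollar in the Cost column."""
    if cost == 0:
        # Assumption: a listed cost of zero counts as unbeatable value.
        return float("inf")
    return pass_rate / cost


def rank_by_value(rows):
    """Return rows sorted from best to worst value under the metric above."""
    return sorted(rows, key=lambda r: points_per_dollar(r[1], r[3]), reverse=True)


for model, rate, secs, cost in rank_by_value(ROWS):
    print(f"{model}: {points_per_dollar(rate, cost):.1f} points per dollar")
```

The metric ignores speed entirely. A model that is cheap but slow can still rank highly, so it is best read alongside the Speed per Case column rather than instead of it.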
