@techfren Coding LLM Benchmarks

Unofficial Aider Polyglot benchmark results, sortable per language

My current mission is to find the coding LLMs that offer the best value for money, in both quality and speed.

Join the community Discord to chat or submit a model for benchmarking. You can also submit a pull request or open an issue on GitHub.

Speed in seconds per test case. Lower is better.
Rank  Model                          Pass rate  Speed per case  Cost
1     DeepSeek R1 0528 (DeepInfra)   71.6%      330.1s          $8.020
2     ERNIE-4.5-300B                 61.3%      149.9s          $1.534
3     MS R1                          56.9%      374.8s          $0.000
4     qwen3 235B                     54.0%      380.4s          $2.426
5     DeepSeek R1                    52.0%      419.2s          $6.192
6     Flash 2.5 Thinking             48.9%      71.4s           $5.000
7     R1T-Chimera                    48.4%      186.4s          $0.000
8     Flash 2.5 Thinking             47.6%      93.1s           $6.000
9     DeepSeek Chat v3               44.9%      65.4s           $1.355
10    Qwen3 30B                      39.6%      192.3s          $1.364
11    GPT-4.1-mini                   35.7%      40.6s           $2.170
12    Grok-3-mini-beta               30.2%      64.2s           $0.778
13    Qwen 2.5 Coder 32B             11.6%      100.6s          $0.894
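The table is ranked by pass rate. To weigh speed and cost as well, the rows can be re-sorted and scored offline. The Python below is a minimal sketch under stated assumptions: the rows are copied from the table, the helper names `by_speed` and `pass_rate_per_dollar` are hypothetical, and "pass-rate points per listed dollar" is a hypothetical value metric, not one the leaderboard itself uses. It also assumes nothing about whether the listed cost covers a full benchmark run. The two $0.000 rows are skipped by the value metric because they would divide by zero.

```python
from dataclasses import dataclass


@dataclass
class Result:
    model: str
    pass_rate: float      # percent of test cases passed
    secs_per_case: float  # seconds per test case (lower is better)
    cost_usd: float       # cost exactly as listed in the table


# Rows copied from the leaderboard table above.
RESULTS = [
    Result("DeepSeek R1 0528 (DeepInfra)", 71.6, 330.1, 8.020),
    Result("ERNIE-4.5-300B", 61.3, 149.9, 1.534),
    Result("MS R1", 56.9, 374.8, 0.000),
    Result("qwen3 235B", 54.0, 380.4, 2.426),
    Result("DeepSeek R1", 52.0, 419.2, 6.192),
    Result("Flash 2.5 Thinking", 48.9, 71.4, 5.000),
    Result("R1T-Chimera", 48.4, 186.4, 0.000),
    Result("Flash 2.5 Thinking", 47.6, 93.1, 6.000),
    Result("DeepSeek Chat v3", 44.9, 65.4, 1.355),
    Result("Qwen3 30B", 39.6, 192.3, 1.364),
    Result("GPT-4.1-mini", 35.7, 40.6, 2.170),
    Result("Grok-3-mini-beta", 30.2, 64.2, 0.778),
    Result("Qwen 2.5 Coder 32B", 11.6, 100.6, 0.894),
]


def by_speed(rows):
    """Hypothetical helper: fastest first (lowest seconds per case)."""
    return sorted(rows, key=lambda r: r.secs_per_case)


def pass_rate_per_dollar(rows):
    """Hypothetical value metric: pass-rate points per listed dollar.

    Rows listed at $0.000 are skipped to avoid dividing by zero.
    """
    scored = [(r.model, r.pass_rate / r.cost_usd) for r in rows if r.cost_usd > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for r in by_speed(RESULTS)[:3]:
        print(f"{r.model}: {r.secs_per_case}s per case")
    for model, score in pass_rate_per_dollar(RESULTS)[:3]:
        print(f"{model}: {score:.1f} pass-rate points per $")
```

With this particular metric, ERNIE-4.5-300B, Grok-3-mini-beta and DeepSeek Chat v3 come out on top, while GPT-4.1-mini is the fastest per case. A different weighting, for example one that penalises slow models, would reorder the list.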

Last benchmark added: