Benchmark Results - 20260307T170059
Model Selection
| Slot | Role | Model | Composite Score |
|---|---|---|---|
| 1 | General (Primary) | llama3.2:3b | 0.967 |
| 2 | General (Secondary) | llama3.2:3b | 0.967 |
| 3 | Coding (Primary) | deepseek-coder-v2 | 0.738 |
| 4 | Coding (Secondary) | qwen2.5-coder:7b | 0.63 |
Detailed Metrics
deepseek-coder-v2
- Category: coding
- Coding Quality: 0.667
- General Quality: 0.918
- Avg Tokens/sec: 20.2
- Latency (ms): 1744.5
- Coding Composite: 0.738
- General Composite: 0.852
qwen2.5-coder:7b
- Category: coding
- Coding Quality: 0.64
- General Quality: 0.922
- Avg Tokens/sec: 11.2
- Latency (ms): 1211.5
- Coding Composite: 0.63
- General Composite: 0.757
llama3.2:3b
- Category: general
- Coding Quality: 0.607
- General Quality: 0.991
- Avg Tokens/sec: 22.5
- Latency (ms): 576.1
- Coding Composite: 0.794
- General Composite: 0.967
Scoring Formula
- Composite = quality * 0.45 + token_speed_normalized * 0.30 + latency_score * 0.25
- token_speed_normalized: Avg Tokens/sec normalized against a 22 tok/sec ceiling (hardware-observed max)
- Coding quality: has_def×0.20 + has_return×0.20 + has_docstring×0.15 + has_type_hint×0.15 + has_code_block×0.10 + has_assert×0.08 + has_test_def×0.07 + has_import×0.05
- Category (checked in order, first match wins): override dict → quality delta (coding_avg - general_avg >= 0.1) → name pattern (coder/codestral/codellama/starcoder) → general
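
The rules above can be sketched in Python as follows. This is a minimal illustration, not the benchmark's actual code: all function and constant names are hypothetical. Two details are assumptions because the report does not state them: speed is clamped to 1.0 above the 22 tok/sec ceiling, and `latency_score` is a linear falloff with a 5000 ms cutoff.

```python
from typing import Optional

# Weights copied from the Scoring Formula list.
QUALITY_W, SPEED_W, LATENCY_W = 0.45, 0.30, 0.25
SPEED_CEILING = 22.0  # tok/sec, the stated hardware-observed max

CODING_WEIGHTS = {
    "has_def": 0.20,
    "has_return": 0.20,
    "has_docstring": 0.15,
    "has_type_hint": 0.15,
    "has_code_block": 0.10,
    "has_assert": 0.08,
    "has_test_def": 0.07,
    "has_import": 0.05,
}

NAME_PATTERNS = ("coder", "codestral", "codellama", "starcoder")


def latency_score(latency_ms: float, cutoff_ms: float = 5000.0) -> float:
    """ASSUMPTION: linear falloff to 0 at cutoff_ms; not defined in the report."""
    return max(0.0, 1.0 - latency_ms / cutoff_ms)


def composite(quality: float, tok_per_sec: float, latency_ms: float) -> float:
    """Weighted composite of quality, normalized speed, and latency score."""
    # ASSUMPTION: speeds above the ceiling are clamped to 1.0.
    speed = min(tok_per_sec / SPEED_CEILING, 1.0)
    return (QUALITY_W * quality
            + SPEED_W * speed
            + LATENCY_W * latency_score(latency_ms))


def coding_quality(flags: dict) -> float:
    """Sum the weights of every check that passed (flags maps name -> bool)."""
    return sum(w for name, w in CODING_WEIGHTS.items() if flags.get(name))


def categorize(model: str, coding_avg: float, general_avg: float,
               overrides: Optional[dict] = None) -> str:
    """Apply the category rules in order; the first match wins."""
    overrides = overrides or {}
    if model in overrides:
        return overrides[model]
    if coding_avg - general_avg >= 0.1:
        return "coding"
    if any(p in model.lower() for p in NAME_PATTERNS):
        return "coding"
    return "general"
```

Under these assumptions, `composite(0.991, 22.5, 576.1)` comes out at about 0.967 and `composite(0.667, 20.2, 1744.5)` at about 0.738, which agrees with the reported composites for llama3.2:3b (general) and deepseek-coder-v2 (coding). The agreement supports the assumed latency mapping but does not confirm it.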