
Benchmark Results - 20260310T110632

Model Selection (6-slot / 2-socket)

| Slot | Socket | Role | Model | Composite Score |
|------|--------|------|-------|-----------------|
| 1 | Node 1 (port 11434) | General (locked) | llama3.2:3b | 0.814 |
| 2 | Node 1 (port 11434) | General (locked) | llama3.1:8b | 0.621 |
| 5 | Node 1 (port 11434) | General (rotate) | gemma3:12b-it-q4_K_M | 0.483 |
| 3 | Node 0 (port 11435) | Coding (locked) | deepseek-coder-v2:16b | 0.738 |
| 4 | Node 0 (port 11435) | Coding (locked) | deepseek-coder-v2:latest | 0.735 |
| 6 | Node 0 (port 11435) | Coding (rotate) | qwen2.5-coder:latest | 0.667 |

Detailed Metrics

codellama:34b

  • Category: coding
  • Coding Quality: 0.833
  • General Quality: 0.586
  • Avg Tokens/sec: 3.2
  • Latency (ms): 4244.1
  • Coding Composite: 0.437
  • General Composite: 0.326

deepseek-coder-v2:latest

  • Category: coding
  • Coding Quality: 0.833
  • General Quality: 0.885
  • Avg Tokens/sec: 25.0
  • Latency (ms): 1543.2
  • Coding Composite: 0.735
  • General Composite: 0.758

deepseek-coder-v2:16b

  • Category: coding
  • Coding Quality: 0.833
  • General Quality: 0.885
  • Avg Tokens/sec: 24.5
  • Latency (ms): 1415.1
  • Coding Composite: 0.738
  • General Composite: 0.762

qwen2.5-coder:14B

  • Category: coding
  • Coding Quality: 0.85
  • General Quality: 0.931
  • Avg Tokens/sec: 6.6
  • Latency (ms): 2195.9
  • Coding Composite: 0.572
  • General Composite: 0.609

qwen2.5-coder:latest

  • Category: coding
  • Coding Quality: 0.85
  • General Quality: 0.91
  • Avg Tokens/sec: 12.8
  • Latency (ms): 1228.2
  • Coding Composite: 0.667
  • General Composite: 0.694

llama3.1:8b

  • Category: general
  • Coding Quality: 0.823
  • General Quality: 0.877
  • Avg Tokens/sec: 11.8
  • Latency (ms): 2249.3
  • Coding Composite: 0.596
  • General Composite: 0.621

qwen2.5-coder:7b

  • Category: coding
  • Coding Quality: 0.85
  • General Quality: 0.91
  • Avg Tokens/sec: 12.7
  • Latency (ms): 1231.9
  • Coding Composite: 0.666
  • General Composite: 0.693

gemma3:12b-it-q4_K_M

  • Category: general
  • Coding Quality: 0.873
  • General Quality: 0.966
  • Avg Tokens/sec: 6.4
  • Latency (ms): 6355.8
  • Coding Composite: 0.441
  • General Composite: 0.483

llama3.2:3b

  • Category: general
  • Coding Quality: 0.89
  • General Quality: 0.954
  • Avg Tokens/sec: 22.3
  • Latency (ms): 644.2
  • Coding Composite: 0.785
  • General Composite: 0.814

Scoring Formula

  • Composite = quality * 0.45 + token_speed_normalized * 0.30 + latency_score * 0.25
  • Speed normalized against 40 tok/sec ceiling (hardware-observed max)
  • Coding quality (per-prompt), by prompt type:
      • code_gen: has_def×0.20 + has_return×0.20 + has_docstring×0.15 + has_type_hint×0.15 + has_code_block×0.10 + has_assert×0.08 + has_test_def×0.07 + has_import×0.05
      • debug: has_def×0.30 + has_return×0.30 + has_code_block×0.25 + has_assert×0.15
      • refactor: has_def×0.25 + has_return×0.25 + has_code_block×0.20 + has_type_hint×0.15 + has_import×0.15
  • Category (checked in order, first match wins): override dict → quality delta (coding if coding_avg - general_avg >= 0.1) → name pattern (coding if the name contains coder/codestral/codellama/starcoder) → otherwise general
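
The scoring rules above can be sketched in Python as follows. This is a minimal illustration, not the benchmark's actual code: the function names, the `overrides` argument and the clamp of speed at 1.0 are hypothetical. The report also does not define `latency_score`; the sketch assumes a linear score against a 5000 ms ceiling, clamped at 0, because that choice reproduces the reported composites to within rounding.

```python
import re

# Weights and speed ceiling as stated in the Scoring Formula.
QUALITY_W, SPEED_W, LATENCY_W = 0.45, 0.30, 0.25
SPEED_CEILING_TOK_S = 40.0

# ASSUMPTION: not stated in the report. Inferred from the published numbers.
LATENCY_CEILING_MS = 5000.0


def composite(quality, tokens_per_sec, latency_ms):
    """Weighted sum of quality, normalized speed and latency score."""
    # ASSUMPTION: speed is capped at 1.0 once it reaches the ceiling.
    speed = min(tokens_per_sec / SPEED_CEILING_TOK_S, 1.0)
    # ASSUMPTION: linear latency score, 0 at or beyond the ceiling.
    latency = max(0.0, 1.0 - latency_ms / LATENCY_CEILING_MS)
    return quality * QUALITY_W + speed * SPEED_W + latency * LATENCY_W


# Rubric weights for the "debug" prompt type, copied from the report.
DEBUG_WEIGHTS = {
    "has_def": 0.30,
    "has_return": 0.30,
    "has_code_block": 0.25,
    "has_assert": 0.15,
}


def rubric_score(features, weights):
    """Sum the weights of the features that are present in a response."""
    return sum(w for name, w in weights.items() if features.get(name))


# Name fragments listed in the report's category rule.
NAME_PATTERN = re.compile(r"coder|codestral|codellama|starcoder")


def categorize(name, coding_avg, general_avg, overrides=None):
    """Apply the category chain in order; the first matching rule wins."""
    overrides = overrides or {}
    if name in overrides:
        return overrides[name]
    if coding_avg - general_avg >= 0.1:
        return "coding"
    if NAME_PATTERN.search(name.lower()):
        return "coding"
    return "general"


if __name__ == "__main__":
    # Inputs taken from the Detailed Metrics entries.
    print(round(composite(0.954, 22.3, 644.2), 3))   # llama3.2:3b, general
    print(round(composite(0.966, 6.4, 6355.8), 3))   # gemma3:12b-it-q4_K_M, general
    print(categorize("deepseek-coder-v2:16b", 0.833, 0.885))
```

With these assumptions the two composites print as 0.814 and 0.483, matching the report. A few other entries differ by 0.001, which may be because the listed quality values are already rounded. In this run no model has a coding quality at least 0.1 above its general quality, so every coding assignment would come from the name pattern under this reading of the rule.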