
Benchmark Results - 20260308T215747

Model Selection (6-slot / 2-socket)

| Slot | Socket | Role | Model | Composite Score |
|------|--------|------|-------|-----------------|
| 1 | Node 1 (port 11434) | General (locked) | llama3.2:3b | 0.45 |
| 2 | Node 1 (port 11434) | General (locked) | mistral-nemo:latest | 0.45 |
| 5 | Node 1 (port 11434) | General (rotate) | none | N/A |
| 3 | Node 0 (port 11435) | Coding (locked) | qwen2.5-coder:7b | 0.371 |
| 4 | Node 0 (port 11435) | Coding (locked) | qwen2.5-coder:7b | 0.371 |
| 6 | Node 0 (port 11435) | Coding (rotate) | none | N/A |

Detailed Metrics

llama3.2:3b

  • Category: general
  • Coding Quality: 0.917
  • General Quality: 1.0
  • Avg Tokens/sec: 0.1
  • Latency (ms): 9999
  • Coding Composite: 0.413
  • General Composite: 0.45

qwen2.5-coder:7b

  • Category: coding
  • Coding Quality: 0.823
  • General Quality: 0.85
  • Avg Tokens/sec: 0.1
  • Latency (ms): 9999
  • Coding Composite: 0.371
  • General Composite: 0.383

mistral-nemo:latest

  • Category: general
  • Coding Quality: 0.85
  • General Quality: 1.0
  • Avg Tokens/sec: 0.1
  • Latency (ms): 9999
  • Coding Composite: 0.383
  • General Composite: 0.45

Scoring Formula

  • Composite = quality * 0.45 + token_speed_normalized * 0.30 + latency_score * 0.25
  • token_speed_normalized: Avg Tokens/sec normalized against a 40 tok/sec ceiling (hardware-observed max)
  • Coding quality (per-prompt), by prompt type:
      • code_gen: has_def×0.20 + has_return×0.20 + has_docstring×0.15 + has_type_hint×0.15 + has_code_block×0.10 + has_assert×0.08 + has_test_def×0.07 + has_import×0.05
      • debug: has_def×0.30 + has_return×0.30 + has_code_block×0.25 + has_assert×0.15
      • refactor: has_def×0.25 + has_return×0.25 + has_code_block×0.20 + has_type_hint×0.15 + has_import×0.15
  • Category (checked in order): override dict → quality delta (coding_avg - general_avg >= 0.1) → name pattern (coder/codestral/codellama/starcoder) → general (default)
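
The rules in this section can be sketched in Python as follows. This is a minimal illustration, not the benchmark's actual code: the function names, the boolean-flag dictionary, and the `overrides` argument are hypothetical. The report does not say how latency_score is derived from latency, so it is passed in as a ready-made value. Clamping the speed term at 1.0 and treating the category chain as "first match wins" are also assumptions.

```python
# Sketch of the scoring rules listed above. All names are illustrative assumptions.

SPEED_CEILING_TOK_PER_SEC = 40.0  # from the report: hardware-observed max

# Per-prompt coding rubric weights, copied from the report (each set sums to 1.0).
CODING_WEIGHTS = {
    "code_gen": {
        "has_def": 0.20, "has_return": 0.20, "has_docstring": 0.15,
        "has_type_hint": 0.15, "has_code_block": 0.10, "has_assert": 0.08,
        "has_test_def": 0.07, "has_import": 0.05,
    },
    "debug": {
        "has_def": 0.30, "has_return": 0.30, "has_code_block": 0.25,
        "has_assert": 0.15,
    },
    "refactor": {
        "has_def": 0.25, "has_return": 0.25, "has_code_block": 0.20,
        "has_type_hint": 0.15, "has_import": 0.15,
    },
}

# Name fragments from the report that mark a model as a coding model.
NAME_PATTERNS = ("coder", "codestral", "codellama", "starcoder")


def composite(quality, tokens_per_sec, latency_score):
    """Composite = quality*0.45 + token_speed_normalized*0.30 + latency_score*0.25."""
    # Assumption: normalization is a division by the ceiling, capped at 1.0.
    speed = min(tokens_per_sec / SPEED_CEILING_TOK_PER_SEC, 1.0)
    return quality * 0.45 + speed * 0.30 + latency_score * 0.25


def coding_quality(prompt_type, flags):
    """Sum the weights of every rubric flag that is present in a response."""
    weights = CODING_WEIGHTS[prompt_type]
    return sum(w for name, w in weights.items() if flags.get(name, False))


def categorize(model, coding_avg, general_avg, overrides=None):
    """Resolve a model's category; assumes the first matching rule wins."""
    if overrides and model in overrides:
        return overrides[model]
    if coding_avg - general_avg >= 0.1:
        return "coding"
    if any(pattern in model.lower() for pattern in NAME_PATTERNS):
        return "coding"
    return "general"
```

As a consistency check, `composite(0.917, 0.1, 0.0)` rounds to 0.413, which matches the Coding Composite reported for llama3.2:3b when latency_score is taken to be 0. Likewise, `categorize("qwen2.5-coder:7b", 0.823, 0.85)` falls through the quality-delta rule and is classified as "coding" by the name pattern.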