
Fix warmup service bugs and add CLAUDE.md

Three bugs fixed in the model warm-up pipeline:
- warmup.sh.j2: replace undefined slot1_model/slot2_model/slot3_model/slot4_model
  variables with correct model_selection.slot*_general/coding references; skip
  slot4 warmup when value is 'none'
- 04_models.yml: add missing ollama_api_key Vault lookup to vars block so the
  warmup script template can resolve the variable
- 04_models.yml: fix warmup service template path (templates/systemd/, not
  templates/ollama/)

Also adds CLAUDE.md with project guidance and updated benchmark results from
today's run.
Shaun Arman, 6 days ago
parent commit 9472c9dbe4

+ 113 - 0
CLAUDE.md

@@ -0,0 +1,113 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Common Commands
+
+```bash
+# Full deployment
+ansible-playbook playbooks/site.yml
+
+# Run a single playbook
+ansible-playbook playbooks/03_benchmark.yml
+
+# Run with tags (each playbook defines granular tags)
+ansible-playbook playbooks/site.yml --tags ollama,docker
+
+# Benchmark and update warm-up slots in one shot
+ansible-playbook playbooks/03_benchmark.yml && ansible-playbook playbooks/04_models.yml
+
+# Override slot 4 with a specific model
+ansible-playbook playbooks/04_models.yml -e "slot4_model=qwen2.5-coder:7b"
+
+# Run against a subset of hosts
+ansible-playbook playbooks/09_nginx.yml --limit nginx_proxy
+
+# Lint playbooks
+ansible-lint playbooks/
+
+# Install Galaxy dependencies
+ansible-galaxy collection install -r requirements.yml
+
+# Check mode (dry run)
+ansible-playbook playbooks/site.yml --check --diff
+```
+
+## Required Local Configuration
+
+Two gitignored files must exist before any playbook runs:
+
+**`inventory/local.yml`** — per-host SSH overrides:
+```yaml
+all:
+  hosts:
+    ai_server:
+      ansible_host: <actual_ip>
+      ansible_user: <ssh_user>
+    nginx_proxy:
+      ansible_host: <actual_ip>
+    coredns_host:
+      ansible_host: <actual_ip>
+```
+
+**`local.yml`** — play-level variable overrides (domain, platform_name, SSL cert paths, etc.)
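A hypothetical sketch of what `local.yml` might contain — `domain` and `platform_name` come from the description above, but the exact SSL variable names are assumptions, not the repo's actual keys:

```yaml
# local.yml — play-level overrides (example values only)
domain: example.internal
platform_name: "My AI Platform"
# SSL cert path variable names below are illustrative assumptions
ssl_cert_path: /etc/ssl/certs/example.internal.pem
ssl_key_path: /etc/ssl/private/example.internal.key
```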
+
+Vault runtime credentials live in `vault/.vault-token` and `vault/.vault-init.json` (written by `01_vault.yml` on first run).
+
+## Architecture
+
+### Three-Host Model
+
+```
+nginx_proxy (172.0.0.30)     — NGINX TLS termination for all public-facing services
+ai_server (172.0.0.100)      — Ollama, Keycloak, Qdrant, Open WebUI, Vault, OpenClaw
+coredns_host (172.0.0.29)    — CoreDNS zone file, Vault data mount
+```
+
+Vault runs on `ai_server` at `127.0.0.1:8202` only; NGINX proxies `https://vault.tftsr.com → ai_server:8202`. The same NGINX-as-TLS-terminator pattern applies to all services.
+
+### Playbook Sequence
+
+`site.yml` imports `00_preflight.yml` through `11_vault_oidc.yml` in order. Each can be run standalone, but the canonical order matters on a first run because:
+- `01_vault.yml` must precede all others (secrets don't exist yet)
+- `05_keycloak.yml` must precede `07_openwebui.yml` (OIDC client_secret written to Vault by Keycloak role, read by OpenWebUI role)
+- `03_benchmark.yml` must precede `04_models.yml` (produces `model_selection.json`)
+
+### Secrets Flow
+
+All credentials live exclusively in Vault under `secret/data/{{ vault_project_slug }}/*`. Playbooks retrieve them using either:
+- `community.hashi_vault.hashi_vault` lookup plugin
+- `ansible.builtin.uri` REST calls with `X-Vault-Token` header from `vault/.vault-token`
+
+**Idempotency rule:** secrets are written to Vault only when the key does not already exist. Re-running never rotates credentials. To rotate: `vault kv delete secret/<slug>/<path>` then re-run the relevant playbook.
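The write-only-if-absent rule can be illustrated with a minimal Python sketch — a plain dict stands in for the Vault KV store here; this is not the playbooks' actual implementation:

```python
import secrets

def ensure_secret(store, path, length=32):
    """Write a credential only if absent; re-runs never rotate it."""
    if path in store:
        return False                          # existing key: left untouched
    store[path] = secrets.token_urlsafe(length)  # first run: generate and write
    return True
```

Deleting the key (the `vault kv delete` step above) is what makes the next run regenerate it.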
+
+### Dynamic Benchmark → Model Slot Pipeline
+
+`03_benchmark.yml` runs every locally installed Ollama model against 6 scored prompts (3 coding, 3 general) plus 1 latency test, computes per-model scores, and writes `benchmarks/results/model_selection.json`. `04_models.yml` reads that JSON to decide which models to pull and keep warm.
+
+**Composite score formula:**
+```
+composite = quality × 0.45
+          + min(tokens_per_sec / 30, 1.0) × 0.30
+          + max(1 - ttft_ms / 5000, 0) × 0.25
+```
+
+**Slot classification:** if `coding_composite - general_composite >= 0.15` (configurable via `benchmark_coding_threshold`), model goes to a coding slot; otherwise general.
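The formula and threshold above can be sketched in Python — a minimal reimplementation for illustration, not the repo's actual benchmark code:

```python
def composite(quality, tokens_per_sec, ttft_ms):
    """0.45 × quality + 0.30 × capped speed + 0.25 × floored latency score."""
    speed = min(tokens_per_sec / 30, 1.0)   # normalized against 30 tok/s ceiling
    latency = max(1 - ttft_ms / 5000, 0)    # floored at 0
    return quality * 0.45 + speed * 0.30 + latency * 0.25

def classify(coding_composite, general_composite, threshold=0.15):
    """Coding slot only if the coding score leads by at least the threshold."""
    return "coding" if coding_composite - general_composite >= threshold else "general"
```

Plugging in the `llama3.2:3b` figures from the latest run (quality 0.949, 21.8 tok/s, 697.1 ms) reproduces its 0.86 general composite to rounding.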
+
+**4 warm-up slots always hot in RAM:**
+- Slots 1–2: top general-purpose models by composite score
+- Slots 3–4: top coding models by composite score
+- Slot 4 is user-rotatable via `-e slot4_model=<name>` without re-benchmarking
+
+`04_models.yml` creates named Ollama Modelfiles (`coder-128k`, `coder-32k`, `llama-family`, `gemma-family`) and an `ollama-warmup.service` systemd one-shot that pre-loads all 4 slots after Ollama starts.
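The slot resolution — including the `none` skip fixed in `warmup.sh.j2` and the `slot4_model` override — amounts to the following hedged Python sketch (keys mirror `model_selection.json`; the shell template is the real implementation):

```python
def warm_slots(selection, slot4_override=""):
    """Resolve the four warm-up slots; empty or 'none' entries are skipped."""
    slot4 = slot4_override or selection["slot4_coding"]  # -e slot4_model=... wins
    slots = [selection["slot1_general"],
             selection["slot2_general"],
             selection["slot3_coding"],
             slot4]
    # skip 'none'/empty slot 4 — the bug fixed in warmup.sh.j2
    return [m for m in slots if m and m != "none"]
```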
+
+### Key Variables
+
+All tuneable defaults live in `inventory/group_vars/all.yml`. The two most commonly changed clusters:
+
+- **`candidate_models`** list — which models to auto-pull before benchmarking
+- **`benchmark_thresholds`** block — min scores and normalization ceiling
+
+`ollama_numa_node` and `ollama_cpu_affinity` are tuned for the Dell M630 dual-socket layout (NUMA node 1 holds ~120 GB free RAM); adjust these for other hardware.
+
+### Docker Services
+
+Keycloak, Qdrant, and Open WebUI run as Docker containers managed by `community.docker.docker_container`. Service-to-service calls use `host.docker.internal` (Docker bridge). Ollama and Vault run as native systemd services, not containers.

+ 41 - 0
benchmarks/results/benchmark_20260307T161036.md

@@ -0,0 +1,41 @@
+# Benchmark Results - 20260307T161036
+
+## Model Selection
+| Slot | Role | Model | Composite Score |
+|------|------|-------|----------------|
+| 1 | General (Primary) | deepseek-coder-v2 | 0.0 |
+| 2 | General (Secondary) | qwen2.5-coder:7b | 0.0 |
+| 3 | Coding (Primary) | deepseek-coder-v2 | 0.0 |
+| 4 | Coding (Secondary) | none | N/A |
+
+## Detailed Metrics
+### deepseek-coder-v2
+- **Category**: general
+- **Coding Quality**: 0
+- **General Quality**: 0
+- **Avg Tokens/sec**: 0.0
+- **Latency (ms)**: 9999
+- **Coding Composite**: 0.0
+- **General Composite**: 0.0
+### qwen2.5-coder:7b
+- **Category**: general
+- **Coding Quality**: 0
+- **General Quality**: 0
+- **Avg Tokens/sec**: 0.0
+- **Latency (ms)**: 9999
+- **Coding Composite**: 0.0
+- **General Composite**: 0.0
+### llama3.2:3b
+- **Category**: general
+- **Coding Quality**: 0
+- **General Quality**: 0
+- **Avg Tokens/sec**: 0.0
+- **Latency (ms)**: 9999
+- **Coding Composite**: 0.0
+- **General Composite**: 0.0
+
+## Scoring Formula
+- Composite = quality * 0.45 + token_speed_normalized * 0.30 + latency_score * 0.25
+- Speed normalized against 30 tok/sec ceiling
+- Coding quality: has_def×0.20 + has_return×0.20 + has_assert×0.15 + has_test_def×0.15 + has_docstring×0.15 + has_type_hint×0.15
+- Category: coding if (coding_composite - general_composite) >= 0.1, else general

+ 41 - 0
benchmarks/results/benchmark_20260307T161220.md

@@ -0,0 +1,41 @@
+# Benchmark Results - 20260307T161220
+
+## Model Selection
+| Slot | Role | Model | Composite Score |
+|------|------|-------|----------------|
+| 1 | General (Primary) | deepseek-coder-v2 | 0.0 |
+| 2 | General (Secondary) | qwen2.5-coder:7b | 0.0 |
+| 3 | Coding (Primary) | deepseek-coder-v2 | 0.0 |
+| 4 | Coding (Secondary) | none | N/A |
+
+## Detailed Metrics
+### deepseek-coder-v2
+- **Category**: general
+- **Coding Quality**: 0
+- **General Quality**: 0
+- **Avg Tokens/sec**: 0.0
+- **Latency (ms)**: 9999
+- **Coding Composite**: 0.0
+- **General Composite**: 0.0
+### qwen2.5-coder:7b
+- **Category**: general
+- **Coding Quality**: 0
+- **General Quality**: 0
+- **Avg Tokens/sec**: 0.0
+- **Latency (ms)**: 9999
+- **Coding Composite**: 0.0
+- **General Composite**: 0.0
+### llama3.2:3b
+- **Category**: general
+- **Coding Quality**: 0
+- **General Quality**: 0
+- **Avg Tokens/sec**: 0.0
+- **Latency (ms)**: 9999
+- **Coding Composite**: 0.0
+- **General Composite**: 0.0
+
+## Scoring Formula
+- Composite = quality * 0.45 + token_speed_normalized * 0.30 + latency_score * 0.25
+- Speed normalized against 30 tok/sec ceiling
+- Coding quality: has_def×0.20 + has_return×0.20 + has_assert×0.15 + has_test_def×0.15 + has_docstring×0.15 + has_type_hint×0.15
+- Category: coding if (coding_composite - general_composite) >= 0.1, else general

+ 41 - 0
benchmarks/results/benchmark_20260307T161721.md

@@ -0,0 +1,41 @@
+# Benchmark Results - 20260307T161721
+
+## Model Selection
+| Slot | Role | Model | Composite Score |
+|------|------|-------|----------------|
+| 1 | General (Primary) | llama3.2:3b | 0.821 |
+| 2 | General (Secondary) | deepseek-coder-v2 | 0.792 |
+| 3 | Coding (Primary) | llama3.2:3b | 0.766 |
+| 4 | Coding (Secondary) | none | N/A |
+
+## Detailed Metrics
+### deepseek-coder-v2
+- **Category**: general
+- **Coding Quality**: 0.6
+- **General Quality**: 0.959
+- **Avg Tokens/sec**: 20.3
+- **Latency (ms)**: 1853.1
+- **Coding Composite**: 0.63
+- **General Composite**: 0.792
+### qwen2.5-coder:7b
+- **Category**: general
+- **Coding Quality**: 0.6
+- **General Quality**: 0.918
+- **Avg Tokens/sec**: 12.5
+- **Latency (ms)**: 1186.0
+- **Coding Composite**: 0.585
+- **General Composite**: 0.729
+### llama3.2:3b
+- **Category**: general
+- **Coding Quality**: 0.75
+- **General Quality**: 0.873
+- **Avg Tokens/sec**: 21.4
+- **Latency (ms)**: 728.7
+- **Coding Composite**: 0.766
+- **General Composite**: 0.821
+
+## Scoring Formula
+- Composite = quality * 0.45 + token_speed_normalized * 0.30 + latency_score * 0.25
+- Speed normalized against 30 tok/sec ceiling
+- Coding quality: has_def×0.20 + has_return×0.20 + has_assert×0.15 + has_test_def×0.15 + has_docstring×0.15 + has_type_hint×0.15
+- Category: coding if (coding_composite - general_composite) >= 0.1, else general

+ 41 - 0
benchmarks/results/benchmark_20260307T163017.md

@@ -0,0 +1,41 @@
+# Benchmark Results - 20260307T163017
+
+## Model Selection
+| Slot | Role | Model | Composite Score |
+|------|------|-------|----------------|
+| 1 | General (Primary) | llama3.2:3b | 0.86 |
+| 2 | General (Secondary) | deepseek-coder-v2 | 0.781 |
+| 3 | Coding (Primary) | llama3.2:3b | 0.748 |
+| 4 | Coding (Secondary) | none | N/A |
+
+## Detailed Metrics
+### deepseek-coder-v2
+- **Category**: general
+- **Coding Quality**: 0.55
+- **General Quality**: 0.948
+- **Avg Tokens/sec**: 19.8
+- **Latency (ms)**: 1875.8
+- **Coding Composite**: 0.602
+- **General Composite**: 0.781
+### qwen2.5-coder:7b
+- **Category**: general
+- **Coding Quality**: 0.6
+- **General Quality**: 0.895
+- **Avg Tokens/sec**: 12.3
+- **Latency (ms)**: 2501.0
+- **Coding Composite**: 0.518
+- **General Composite**: 0.65
+### llama3.2:3b
+- **Category**: general
+- **Coding Quality**: 0.7
+- **General Quality**: 0.949
+- **Avg Tokens/sec**: 21.8
+- **Latency (ms)**: 697.1
+- **Coding Composite**: 0.748
+- **General Composite**: 0.86
+
+## Scoring Formula
+- Composite = quality * 0.45 + token_speed_normalized * 0.30 + latency_score * 0.25
+- Speed normalized against 30 tok/sec ceiling
+- Coding quality: has_def×0.20 + has_return×0.20 + has_assert×0.15 + has_test_def×0.15 + has_docstring×0.15 + has_type_hint×0.15
+- Category: coding if (coding_composite - general_composite) >= 0.1, else general

+ 59 - 371
benchmarks/results/model_selection.json

@@ -1,401 +1,89 @@
 {
     "all_metrics": {
-        "codellama:13b-instruct-q5_K_M": {
-            "avg_tok_per_sec": 4.1,
+        "deepseek-coder-v2": {
+            "avg_tok_per_sec": 19.8,
             "category": "general",
-            "coding_composite": 0.568,
-            "coding_quality": 0.804,
-            "general_composite": 0.508,
-            "general_quality": 0.671,
-            "latency_ms": 1126.4,
-            "latency_score": 0.775,
-            "toks_norm": 0.041
-        },
-        "codestral:22b-v0.1-q4_K_M": {
-            "avg_tok_per_sec": 2.3,
-            "category": "general",
-            "coding_composite": 0.32,
-            "coding_quality": 0.696,
-            "general_composite": 0.406,
-            "general_quality": 0.887,
-            "latency_ms": 58429.3,
-            "latency_score": 0,
-            "toks_norm": 0.023
-        },
-        "deepseek-coder-v2:16b-lite-instruct-q4_K_M": {
-            "avg_tok_per_sec": 21.3,
-            "category": "general",
-            "coding_composite": 0.618,
-            "coding_quality": 0.855,
-            "general_composite": 0.683,
-            "general_quality": 1.0,
-            "latency_ms": 1617.0,
-            "latency_score": 0.677,
-            "toks_norm": 0.213
-        },
-        "deepseek-r1:14b": {
-            "avg_tok_per_sec": 6.4,
-            "category": "general",
-            "coding_composite": 0.519,
-            "coding_quality": 0.853,
-            "general_composite": 0.562,
+            "coding_composite": 0.602,
+            "coding_quality": 0.55,
+            "general_composite": 0.781,
             "general_quality": 0.948,
-            "latency_ms": 2677.7,
-            "latency_score": 0.464,
-            "toks_norm": 0.064
-        },
-        "dolphin-mixtral:8x7b": {
-            "avg_tok_per_sec": 4.8,
-            "category": "general",
-            "coding_composite": 0.451,
-            "coding_quality": 0.755,
-            "general_composite": 0.437,
-            "general_quality": 0.725,
-            "latency_ms": 3065.7,
-            "latency_score": 0.387,
-            "toks_norm": 0.048
-        },
-        "gpt-oss:20b": {
-            "avg_tok_per_sec": 10.3,
-            "category": "general",
-            "coding_composite": 0.471,
-            "coding_quality": 0.978,
-            "general_composite": 0.447,
-            "general_quality": 0.925,
-            "latency_ms": 8158.0,
-            "latency_score": 0,
-            "toks_norm": 0.103
-        },
-        "mistral:7b-instruct": {
-            "avg_tok_per_sec": 12.1,
-            "category": "general",
-            "coding_composite": 0.417,
-            "coding_quality": 0.846,
-            "general_composite": 0.359,
-            "general_quality": 0.717,
-            "latency_ms": 6696.2,
-            "latency_score": 0,
-            "toks_norm": 0.121
-        },
-        "phi4:14b": {
-            "avg_tok_per_sec": 6.6,
-            "category": "general",
-            "coding_composite": 0.457,
-            "coding_quality": 0.904,
-            "general_composite": 0.469,
-            "general_quality": 0.931,
-            "latency_ms": 4394.9,
-            "latency_score": 0.121,
-            "toks_norm": 0.066
+            "latency_ms": 1875.8,
+            "latency_score": 0.625,
+            "toks_norm": 0.661
         },
-        "qwen2.5-coder:14b-instruct-q4_K_M": {
-            "avg_tok_per_sec": 4.9,
+        "llama3.2:3b": {
+            "avg_tok_per_sec": 21.8,
             "category": "general",
-            "coding_composite": 0.393,
-            "coding_quality": 0.84,
-            "general_composite": 0.396,
-            "general_quality": 0.848,
-            "latency_ms": 6865.3,
-            "latency_score": 0,
-            "toks_norm": 0.049
-        },
-        "qwen2.5-coder:7b-instruct-q4_K_M": {
-            "avg_tok_per_sec": 11.5,
-            "category": "general",
-            "coding_composite": 0.593,
-            "coding_quality": 0.83,
-            "general_composite": 0.619,
-            "general_quality": 0.887,
-            "latency_ms": 1301.7,
-            "latency_score": 0.74,
-            "toks_norm": 0.115
-        },
-        "qwen2.5-coder:7b-instruct-q5_K_M": {
-            "avg_tok_per_sec": 9.0,
+            "coding_composite": 0.748,
+            "coding_quality": 0.7,
+            "general_composite": 0.86,
+            "general_quality": 0.949,
+            "latency_ms": 697.1,
+            "latency_score": 0.861,
+            "toks_norm": 0.728
+        },
+        "qwen2.5-coder:7b": {
+            "avg_tok_per_sec": 12.3,
             "category": "general",
-            "coding_composite": 0.496,
-            "coding_quality": 0.81,
-            "general_composite": 0.548,
-            "general_quality": 0.925,
-            "latency_ms": 2900.9,
-            "latency_score": 0.42,
-            "toks_norm": 0.09
-        },
-        "qwen2.5-coder:7b-instruct-q6_K": {
-            "avg_tok_per_sec": 5.9,
-            "category": "general",
-            "coding_composite": 0.536,
-            "coding_quality": 0.832,
-            "general_composite": 0.576,
-            "general_quality": 0.919,
-            "latency_ms": 2112.8,
-            "latency_score": 0.577,
-            "toks_norm": 0.059
-        },
-        "qwen3-coder-next:latest": {
-            "avg_tok_per_sec": 4.6,
-            "category": "general",
-            "coding_composite": 0.444,
-            "coding_quality": 0.785,
-            "general_composite": 0.492,
-            "general_quality": 0.892,
-            "latency_ms": 3462.7,
-            "latency_score": 0.307,
-            "toks_norm": 0.046
-        },
-        "qwen3-coder:30b": {
-            "avg_tok_per_sec": 7.9,
-            "category": "general",
-            "coding_composite": 0.584,
-            "coding_quality": 0.885,
-            "general_composite": 0.578,
-            "general_quality": 0.872,
-            "latency_ms": 1769.0,
-            "latency_score": 0.646,
-            "toks_norm": 0.079
-        },
-        "qwen3.5:35b": {
-            "avg_tok_per_sec": 5.3,
-            "category": "general",
-            "coding_composite": 0.411,
-            "coding_quality": 0.879,
-            "general_composite": 0.466,
-            "general_quality": 1.0,
-            "latency_ms": 133176.0,
-            "latency_score": 0,
-            "toks_norm": 0.053
+            "coding_composite": 0.518,
+            "coding_quality": 0.6,
+            "general_composite": 0.65,
+            "general_quality": 0.895,
+            "latency_ms": 2501.0,
+            "latency_score": 0.5,
+            "toks_norm": 0.41
         }
     },
     "coding_ranking": [],
     "general_ranking": [
         {
-            "composite": 0.683,
+            "composite": 0.86,
             "metrics": {
-                "avg_tok_per_sec": 21.3,
+                "avg_tok_per_sec": 21.8,
                 "category": "general",
-                "coding_composite": 0.618,
-                "coding_quality": 0.855,
-                "general_composite": 0.683,
-                "general_quality": 1.0,
-                "latency_ms": 1617.0,
-                "latency_score": 0.677,
-                "toks_norm": 0.213
+                "coding_composite": 0.748,
+                "coding_quality": 0.7,
+                "general_composite": 0.86,
+                "general_quality": 0.949,
+                "latency_ms": 697.1,
+                "latency_score": 0.861,
+                "toks_norm": 0.728
             },
-            "name": "deepseek-coder-v2:16b-lite-instruct-q4_K_M"
+            "name": "llama3.2:3b"
         },
         {
-            "composite": 0.619,
+            "composite": 0.781,
             "metrics": {
-                "avg_tok_per_sec": 11.5,
+                "avg_tok_per_sec": 19.8,
                 "category": "general",
-                "coding_composite": 0.593,
-                "coding_quality": 0.83,
-                "general_composite": 0.619,
-                "general_quality": 0.887,
-                "latency_ms": 1301.7,
-                "latency_score": 0.74,
-                "toks_norm": 0.115
-            },
-            "name": "qwen2.5-coder:7b-instruct-q4_K_M"
-        },
-        {
-            "composite": 0.578,
-            "metrics": {
-                "avg_tok_per_sec": 7.9,
-                "category": "general",
-                "coding_composite": 0.584,
-                "coding_quality": 0.885,
-                "general_composite": 0.578,
-                "general_quality": 0.872,
-                "latency_ms": 1769.0,
-                "latency_score": 0.646,
-                "toks_norm": 0.079
-            },
-            "name": "qwen3-coder:30b"
-        },
-        {
-            "composite": 0.576,
-            "metrics": {
-                "avg_tok_per_sec": 5.9,
-                "category": "general",
-                "coding_composite": 0.536,
-                "coding_quality": 0.832,
-                "general_composite": 0.576,
-                "general_quality": 0.919,
-                "latency_ms": 2112.8,
-                "latency_score": 0.577,
-                "toks_norm": 0.059
-            },
-            "name": "qwen2.5-coder:7b-instruct-q6_K"
-        },
-        {
-            "composite": 0.562,
-            "metrics": {
-                "avg_tok_per_sec": 6.4,
-                "category": "general",
-                "coding_composite": 0.519,
-                "coding_quality": 0.853,
-                "general_composite": 0.562,
+                "coding_composite": 0.602,
+                "coding_quality": 0.55,
+                "general_composite": 0.781,
                 "general_quality": 0.948,
-                "latency_ms": 2677.7,
-                "latency_score": 0.464,
-                "toks_norm": 0.064
-            },
-            "name": "deepseek-r1:14b"
-        },
-        {
-            "composite": 0.548,
-            "metrics": {
-                "avg_tok_per_sec": 9.0,
-                "category": "general",
-                "coding_composite": 0.496,
-                "coding_quality": 0.81,
-                "general_composite": 0.548,
-                "general_quality": 0.925,
-                "latency_ms": 2900.9,
-                "latency_score": 0.42,
-                "toks_norm": 0.09
-            },
-            "name": "qwen2.5-coder:7b-instruct-q5_K_M"
-        },
-        {
-            "composite": 0.508,
-            "metrics": {
-                "avg_tok_per_sec": 4.1,
-                "category": "general",
-                "coding_composite": 0.568,
-                "coding_quality": 0.804,
-                "general_composite": 0.508,
-                "general_quality": 0.671,
-                "latency_ms": 1126.4,
-                "latency_score": 0.775,
-                "toks_norm": 0.041
-            },
-            "name": "codellama:13b-instruct-q5_K_M"
-        },
-        {
-            "composite": 0.492,
-            "metrics": {
-                "avg_tok_per_sec": 4.6,
-                "category": "general",
-                "coding_composite": 0.444,
-                "coding_quality": 0.785,
-                "general_composite": 0.492,
-                "general_quality": 0.892,
-                "latency_ms": 3462.7,
-                "latency_score": 0.307,
-                "toks_norm": 0.046
-            },
-            "name": "qwen3-coder-next:latest"
-        },
-        {
-            "composite": 0.469,
-            "metrics": {
-                "avg_tok_per_sec": 6.6,
-                "category": "general",
-                "coding_composite": 0.457,
-                "coding_quality": 0.904,
-                "general_composite": 0.469,
-                "general_quality": 0.931,
-                "latency_ms": 4394.9,
-                "latency_score": 0.121,
-                "toks_norm": 0.066
-            },
-            "name": "phi4:14b"
-        },
-        {
-            "composite": 0.466,
-            "metrics": {
-                "avg_tok_per_sec": 5.3,
-                "category": "general",
-                "coding_composite": 0.411,
-                "coding_quality": 0.879,
-                "general_composite": 0.466,
-                "general_quality": 1.0,
-                "latency_ms": 133176.0,
-                "latency_score": 0,
-                "toks_norm": 0.053
-            },
-            "name": "qwen3.5:35b"
-        },
-        {
-            "composite": 0.447,
-            "metrics": {
-                "avg_tok_per_sec": 10.3,
-                "category": "general",
-                "coding_composite": 0.471,
-                "coding_quality": 0.978,
-                "general_composite": 0.447,
-                "general_quality": 0.925,
-                "latency_ms": 8158.0,
-                "latency_score": 0,
-                "toks_norm": 0.103
-            },
-            "name": "gpt-oss:20b"
-        },
-        {
-            "composite": 0.437,
-            "metrics": {
-                "avg_tok_per_sec": 4.8,
-                "category": "general",
-                "coding_composite": 0.451,
-                "coding_quality": 0.755,
-                "general_composite": 0.437,
-                "general_quality": 0.725,
-                "latency_ms": 3065.7,
-                "latency_score": 0.387,
-                "toks_norm": 0.048
-            },
-            "name": "dolphin-mixtral:8x7b"
-        },
-        {
-            "composite": 0.406,
-            "metrics": {
-                "avg_tok_per_sec": 2.3,
-                "category": "general",
-                "coding_composite": 0.32,
-                "coding_quality": 0.696,
-                "general_composite": 0.406,
-                "general_quality": 0.887,
-                "latency_ms": 58429.3,
-                "latency_score": 0,
-                "toks_norm": 0.023
-            },
-            "name": "codestral:22b-v0.1-q4_K_M"
-        },
-        {
-            "composite": 0.396,
-            "metrics": {
-                "avg_tok_per_sec": 4.9,
-                "category": "general",
-                "coding_composite": 0.393,
-                "coding_quality": 0.84,
-                "general_composite": 0.396,
-                "general_quality": 0.848,
-                "latency_ms": 6865.3,
-                "latency_score": 0,
-                "toks_norm": 0.049
+                "latency_ms": 1875.8,
+                "latency_score": 0.625,
+                "toks_norm": 0.661
             },
-            "name": "qwen2.5-coder:14b-instruct-q4_K_M"
+            "name": "deepseek-coder-v2"
         },
         {
-            "composite": 0.359,
+            "composite": 0.65,
             "metrics": {
-                "avg_tok_per_sec": 12.1,
+                "avg_tok_per_sec": 12.3,
                 "category": "general",
-                "coding_composite": 0.417,
-                "coding_quality": 0.846,
-                "general_composite": 0.359,
-                "general_quality": 0.717,
-                "latency_ms": 6696.2,
-                "latency_score": 0,
-                "toks_norm": 0.121
+                "coding_composite": 0.518,
+                "coding_quality": 0.6,
+                "general_composite": 0.65,
+                "general_quality": 0.895,
+                "latency_ms": 2501.0,
+                "latency_score": 0.5,
+                "toks_norm": 0.41
             },
-            "name": "mistral:7b-instruct"
+            "name": "qwen2.5-coder:7b"
         }
     ],
-    "slot1_general": "deepseek-coder-v2:16b-lite-instruct-q4_K_M",
-    "slot2_general": "qwen2.5-coder:7b-instruct-q4_K_M",
-    "slot3_coding": "deepseek-coder-v2:16b-lite-instruct-q4_K_M",
+    "slot1_general": "llama3.2:3b",
+    "slot2_general": "deepseek-coder-v2",
+    "slot3_coding": "llama3.2:3b",
     "slot4_coding": "none"
 }

+ 2 - 1
playbooks/04_models.yml

@@ -12,6 +12,7 @@
     model_selection_file: "{{ playbook_dir }}/../benchmarks/results/model_selection.json"
     modelfiles_dir: /mnt/ai_data/ollama_models/modelfiles
     slot4_model: ""
+    ollama_api_key: "{{ lookup('community.hashi_vault.hashi_vault', vault_secret_prefix ~ '/ollama:api_key token=' ~ lookup('ansible.builtin.file', vault_token_file) ~ ' url=' ~ vault_url) }}"
 
   tasks:
     # ── Load benchmark results ───────────────────────────────────────
@@ -181,7 +182,7 @@
 
     - name: "Models | Template warmup systemd service"
       ansible.builtin.template:
-        src: "{{ playbook_dir }}/../templates/ollama/ollama-warmup.service.j2"
+        src: "{{ playbook_dir }}/../templates/systemd/ollama-warmup.service.j2"
         dest: /etc/systemd/system/ollama-warmup.service
         mode: "0644"
         owner: root

+ 6 - 4
templates/ollama/warmup.sh.j2

@@ -18,9 +18,11 @@ warmup_model() {
     echo "[warmup] Done: $model"
 }
 
-warmup_model "{{ slot1_model }}"
-warmup_model "{{ slot2_model }}"
-warmup_model "{{ slot3_model }}"
-warmup_model "{{ slot4_model }}"
+warmup_model "{{ model_selection.slot1_general }}"
+warmup_model "{{ model_selection.slot2_general }}"
+warmup_model "{{ model_selection.slot3_coding }}"
+{% if model_selection.slot4_coding | length > 0 and model_selection.slot4_coding != 'none' %}
+warmup_model "{{ model_selection.slot4_coding }}"
+{% endif %}
 
 echo "[warmup] All models warmed up."