Role: models

Purpose

Manage the Ollama model lifecycle — pulling models, creating custom Modelfile configurations, and running warm-up services to ensure models are loaded into RAM at boot time across both NUMA instances.

6-Slot System

Slot	Instance	Port	Role	Selection	Rotation
1	Node 1	11434	General (locked)	Top general composite	Re-benchmark only
2	Node 1	11434	General (locked)	2nd general composite	Re-benchmark only
5	Node 1	11434	General (rotate)	3rd general composite	`-e slot5_model=<name>`
3	Node 0	11435	Coding (locked)	Top coding composite	Re-benchmark only
4	Node 0	11435	Coding (locked)	2nd coding composite	Re-benchmark only
6	Node 0	11435	Coding (rotate)	3rd coding composite	`-e slot6_model=<name>`

Slot Rotation

Rotate the general slot on Node 1 (port 11434):

ansible-playbook playbooks/04_models.yml -K -e @local.yml -e "slot5_model=mistral:latest"

Rotate the coding slot on Node 0 (port 11435):

ansible-playbook playbooks/04_models.yml -K -e @local.yml -e "slot6_model=llama3.1:70b"

Both at once:

ansible-playbook playbooks/04_models.yml -K -e @local.yml \
  -e "slot5_model=mistral:latest" -e "slot6_model=command-r:35b"

Reset both rotate slots back to benchmark recommendations:

ansible-playbook playbooks/04_models.yml -K -e @local.yml

Modelfile Configurations

Custom Modelfile variants are created for fine-tuned context windows:

Custom Model	Base Slot	Context	Port	Use Case
`coder-128k`	slot3_coding	32768	11435	Primary coding (large context)
`coder-32k`	slot4_coding	32768	11435	Secondary coding
`coder-rotate`	slot6_coding_rotate	32768	11435	Rotatable coding model
`llama-family`	llama3.2:3b	8192	11434	Family-safe general assistant
`gemma-family`	llama3.1:8b	8192	11434	Family-safe general assistant

These aliases are excluded from benchmarking via benchmark_skip_aliases — their 32k-token parameter allocations stall the benchmark loop with 285-second responses.

Warm-up Services

Two oneshot systemd services pre-load models after their respective Ollama instances start:

Service	Warms	Instance
`ollama-warmup.service`	slots 1, 2, 5	Node 1 (port 11434)
`ollama-warmup-node0.service`	slots 3, 4, 6	Node 0 (port 11435)

OLLAMA_KEEP_ALIVE=-1 keeps models pinned once loaded. The warmup services only need to run once after boot; subsequent requests hit already-loaded models immediately.

Check warmup status:

systemctl status ollama-warmup ollama-warmup-node0

Re-run warmup manually (e.g. after rotating a slot):

systemctl restart ollama-warmup          # Node 1 general models
systemctl restart ollama-warmup-node0    # Node 0 coding models

model_selection.json

playbooks/04_models.yml reads benchmarks/results/model_selection.json:

{
  "slot1_general": "llama3.1:8b",
  "slot2_general": "mistral:latest",
  "slot5_general_rotate": "llama3.2:3b",
  "slot3_coding": "deepseek-coder-v2:16b",
  "slot4_coding": "qwen2.5-coder:7b",
  "slot6_coding_rotate": "codegemma:7b",
  "general_ranking": [...],
  "coding_ranking": [...],
  "all_metrics": { ... }
}

README.md 4.0 KB

Lien permanent Historique Raw

Role: models

Purpose

6-Slot System

Slot Rotation

Modelfile Configurations

Warm-up Services

model_selection.json

Tags

README.md 4.0 KB Lien permanent Historique Raw

Role: models

Purpose

6-Slot System

Slot Rotation

Modelfile Configurations

Warm-up Services

model_selection.json

Tags

README.md 4.0 KB

Lien permanent Historique Raw