# Role: models ## Purpose Manage the Ollama model lifecycle — pulling models, creating custom Modelfile configurations, and running warm-up services to ensure models are loaded into RAM at boot time across both NUMA instances. ## 6-Slot System | Slot | Instance | Port | Role | Selection | Rotation | |------|---------------|-------|------------------|--------------------------------|-----------------------------| | 1 | Node 1 | 11434 | General (locked) | Top general composite | Re-benchmark only | | 2 | Node 1 | 11434 | General (locked) | 2nd general composite | Re-benchmark only | | 5 | Node 1 | 11434 | General (rotate) | 3rd general composite | `-e slot5_model=` | | 3 | Node 0 | 11435 | Coding (locked) | Top coding composite | Re-benchmark only | | 4 | Node 0 | 11435 | Coding (locked) | 2nd coding composite | Re-benchmark only | | 6 | Node 0 | 11435 | Coding (rotate) | 3rd coding composite | `-e slot6_model=` | ## Slot Rotation Rotate the general slot on Node 1 (port 11434): ```bash ansible-playbook playbooks/04_models.yml -K -e @local.yml -e "slot5_model=mistral:latest" ``` Rotate the coding slot on Node 0 (port 11435): ```bash ansible-playbook playbooks/04_models.yml -K -e @local.yml -e "slot6_model=llama3.1:70b" ``` Both at once: ```bash ansible-playbook playbooks/04_models.yml -K -e @local.yml \ -e "slot5_model=mistral:latest" -e "slot6_model=command-r:35b" ``` Reset both rotate slots back to benchmark recommendations: ```bash ansible-playbook playbooks/04_models.yml -K -e @local.yml ``` ## Modelfile Configurations Custom Modelfile variants are created for fine-tuned context windows: | Custom Model | Base Slot | Context | Port | Use Case | |-----------------|--------------|---------|-------|----------------------------------| | `coder-128k` | slot3_coding | 32768 | 11435 | Primary coding (large context) | | `coder-32k` | slot4_coding | 32768 | 11435 | Secondary coding | | `coder-rotate` | slot6_coding_rotate | 32768 | 11435 | Rotatable coding model | | `llama-family` | llama3.2:3b | 8192 | 11434 | Family-safe general assistant | | `gemma-family` | llama3.1:8b | 8192 | 11434 | Family-safe general assistant | **These aliases are excluded from benchmarking** via `benchmark_skip_aliases` — their 32k-token parameter allocations stall the benchmark loop with 285-second responses. ## Warm-up Services Two oneshot systemd services pre-load models after their respective Ollama instances start: | Service | Warms | Instance | |------------------------------|---------------------|---------------------| | `ollama-warmup.service` | slots 1, 2, 5 | Node 1 (port 11434) | | `ollama-warmup-node0.service`| slots 3, 4, 6 | Node 0 (port 11435) | `OLLAMA_KEEP_ALIVE=-1` keeps models pinned once loaded. The warmup services only need to run once after boot; subsequent requests hit already-loaded models immediately. Check warmup status: ```bash systemctl status ollama-warmup ollama-warmup-node0 ``` Re-run warmup manually (e.g. after rotating a slot): ```bash systemctl restart ollama-warmup # Node 1 general models systemctl restart ollama-warmup-node0 # Node 0 coding models ``` ## model_selection.json `playbooks/04_models.yml` reads `benchmarks/results/model_selection.json`: ```json { "slot1_general": "llama3.1:8b", "slot2_general": "mistral:latest", "slot5_general_rotate": "llama3.2:3b", "slot3_coding": "deepseek-coder-v2:16b", "slot4_coding": "qwen2.5-coder:7b", "slot6_coding_rotate": "codegemma:7b", "general_ranking": [...], "coding_ranking": [...], "all_metrics": { ... } } ``` ## Tags ```bash ansible-playbook playbooks/site.yml --tags models -K -e @local.yml ansible-playbook playbooks/site.yml --tags models-warmup -K -e @local.yml ```