CLAUDE.md 5.3 KB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Common Commands

# Full deployment
ansible-playbook playbooks/site.yml -K -e @local.yml

# Run a single playbook
ansible-playbook playbooks/03_benchmark.yml -K -e @local.yml

# Run with tags (each playbook defines granular tags)
ansible-playbook playbooks/site.yml --tags ollama,docker -K -e @local.yml

# Benchmark and update warm-up slots in one shot
ansible-playbook playbooks/03_benchmark.yml -K -e @local.yml && \
ansible-playbook playbooks/04_models.yml -K -e @local.yml

# Rotate general slot (Node 1, port 11434)
ansible-playbook playbooks/04_models.yml -K -e @local.yml -e "slot5_model=mistral:latest"

# Rotate coding slot (Node 0, port 11435)
ansible-playbook playbooks/04_models.yml -K -e @local.yml -e "slot6_model=llama3.1:70b"

# Run against a subset of hosts
ansible-playbook playbooks/09_nginx.yml --limit nginx_proxy -K -e @local.yml

# Lint playbooks
ansible-lint playbooks/

# Install Galaxy dependencies
ansible-galaxy collection install -r requirements.yml

# Check mode (dry run)
ansible-playbook playbooks/site.yml --check --diff -K -e @local.yml

Required Local Configuration

Two gitignored files must exist before any playbook runs:

inventory/local.yml — per-host SSH overrides:

all:
  hosts:
    ai_server:
      ansible_host: <actual_ip>
      ansible_user: <ssh_user>
    nginx_proxy:
      ansible_host: <actual_ip>
    coredns_host:
      ansible_host: <actual_ip>

local.yml — play-level variable overrides (domain, platform_name, SSL cert paths, etc.)

Vault runtime credentials live in vault/.vault-token and vault/.vault-init.json (written by 01_vault.yml on first run).

Architecture

Three-Host Model

nginx_proxy (172.0.0.30)     — NGINX TLS termination for all public-facing services
ai_server (172.0.0.100)      — Ollama, Keycloak, Qdrant, Open WebUI, Vault, OpenClaw
coredns_host (172.0.0.29)    — CoreDNS zone file, Vault data mount

Vault runs on ai_server at 127.0.0.1:8202 only; NGINX proxies https://vault.tftsr.com → ai_server:8202. The same NGINX-as-TLS-terminator pattern applies to all services.

Playbook Sequence

site.yml imports 00_preflight.yml through 11_vault_oidc.yml in order. Each can be run standalone. The canonical sequence matters for first-run because:

  • 01_vault.yml must precede all others (secrets don't exist yet)
  • 05_keycloak.yml must precede 07_openwebui.yml (OIDC client_secret written to Vault by Keycloak role, read by OpenWebUI role)
  • 03_benchmark.yml must precede 04_models.yml (produces model_selection.json)

Secrets Flow

All credentials live exclusively in Vault under secret/data/{{ vault_project_slug }}/*. Playbooks retrieve them using either:

  • community.hashi_vault.hashi_vault lookup plugin
  • ansible.builtin.uri REST calls with X-Vault-Token header from vault/.vault-token

Idempotency rule: secrets are written to Vault only when the key does not already exist. Re-running never rotates credentials. To rotate: vault kv delete secret/<slug>/<path> then re-run the relevant playbook.

Dynamic Benchmark → Model Slot Pipeline

03_benchmark.yml tests every locally-installed Ollama model against 6 prompts (3 coding, 3 general + 1 latency test), scores each, and writes benchmarks/results/model_selection.json. 04_models.yml reads that JSON to decide which models to pull and keep warm.

Composite score formula:

composite = (quality × 0.45) + (tokens_per_sec / ceiling, capped at 1.0) × 0.30 + (1 - ttft_ms/5000, floored at 0) × 0.25

benchmark_toks_norm_ceiling defaults to 40 (dual-socket target).

Slot classification: if coding_composite - general_composite >= 0.10 (configurable via benchmark_coding_threshold), model goes to a coding slot; otherwise general.

6 warm-up slots across two NUMA instances:

  • Node 1 (port 11434): slots 1–2 locked general + slot 5 rotatable general
  • Node 0 (port 11435): slots 3–4 locked coding + slot 6 rotatable coding
  • Slots 5/6 rotatable via -e slot5_model=<name> / -e slot6_model=<name> without re-benchmarking

04_models.yml creates Modelfiles (coder-128k, coder-32k, coder-rotate, llama-family, gemma-family) and two warmup services: ollama-warmup.service (Node 1) and ollama-warmup-node0.service (Node 0).

Benchmark alias filter: benchmark_skip_aliases in group_vars/all.yml lists the Modelfile aliases — the benchmark playbook excludes these from the test loop to prevent 32k-token KV-cache allocations from stalling the run.

Key Variables

All tuneable defaults live in inventory/group_vars/all.yml. The two most commonly changed clusters:

  • candidate_models list — which models to auto-pull before benchmarking
  • benchmark_thresholds block — min scores and normalization ceiling

ollama_numa_node and ollama_cpu_affinity are tuned for the Dell M630 dual-socket layout (NUMA node 1 holds ~120 GB free RAM); adjust these for other hardware.

Docker Services

Keycloak, Qdrant, and Open WebUI run as Docker containers managed by community.docker.docker_container. Service-to-service calls use host.docker.internal (Docker bridge). Ollama and Vault run as native systemd services, not containers.