ai-platform -- Local AI Server Automation
Ansible automation for full lifecycle management of a server as a local AI inference platform. This project provisions, configures, benchmarks, and maintains every service required to run Ollama-based LLM inference behind NGINX reverse proxy with SSO, vector search (RAG), DNS, secret management, and Telegram bot access -- all driven by a single ansible-playbook deploy_ai.yml command.

25 Commits

2 Branches

0 Releases

Shaun Arman 342cbd123d Merge branch 'feature/three-pass-benchmark' of sarman/tftsr_ai into master		1 day ago
benchmarks	d9450d0c08 fix(benchmark): refine model selection and enhance evaluation metrics	1 day ago
inventory	3b9e8951df fix(benchmark): prevent deepseek:latest re-pull; Run 7 achieves target Node 0 layout	1 day ago
playbooks	bf99e921b9 refactor(benchmark): remove Handoff documentation and update benchmark metrics	1 day ago
roles	55d412f85d Add three-pass benchmark with size-aware tier routing	5 days ago
templates	bf99e921b9 refactor(benchmark): remove Handoff documentation and update benchmark metrics	1 day ago
tftsr_nginx-hardening	f188c046ed Initial commit	5 days ago
vault	c9457bb38b Initial release: full-stack local AI platform automation	5 days ago
.gitignore	c9457bb38b Initial release: full-stack local AI platform automation	5 days ago
CLAUDE.md	55d412f85d Add three-pass benchmark with size-aware tier routing	5 days ago
README.md	55d412f85d Add three-pass benchmark with size-aware tier routing	5 days ago
ansible.cfg	c9457bb38b Initial release: full-stack local AI platform automation	5 days ago
deploy_ai.yml	c9457bb38b Initial release: full-stack local AI platform automation	5 days ago
requirements.yml	c9457bb38b Initial release: full-stack local AI platform automation	5 days ago

ai-platform -- Local AI Server Automation

Ansible automation for full lifecycle management of a server as a local AI inference platform. This project provisions, configures, benchmarks, and maintains every service required to run Ollama-based LLM inference behind NGINX reverse proxy with SSO, vector search (RAG), DNS, secret management, and Telegram bot access -- all driven by a single ansible-playbook deploy_ai.yml command.

Architecture

                         ┌──────────────┐
                         │   Internet   │
                         └──────┬───────┘
                                │
                       ┌────────▼────────┐
                       │  nginx_proxy    │
                       │  192.168.1.30   │
                       │  NGINX reverse  │
                       │  proxy + TLS    │
                       └──┬──────────┬───┘
                          │          │
          ┌───────────────▼┐    ┌────▼──────────────────────┐
          │ coredns_host   │    │ ai_server                 │
          │ 192.168.1.29   │    │ 192.168.1.100             │
          │                │    │                           │
          │ - CoreDNS      │    │ - Ollama (LLM inference)  │
          └────────────────┘    │ - Open WebUI              │
                                │ - Keycloak (SSO/OIDC)     │
                                │ - HashiCorp Vault         │
                                │ - Qdrant (vector DB)      │
                                │ - OpenClaw (Telegram bot) │
                                └───────────────────────────┘

Infrastructure Map

Host	IP Address	Purpose
`nginx_proxy`	192.168.1.30	NGINX reverse proxy, TLS termination
`coredns_host`	192.168.1.29	CoreDNS
`ai_server`	192.168.1.100	Ollama, Open WebUI, Keycloak, Vault, Qdrant, OpenClaw

These are the default values in inventory/group_vars/all.yml. Override for your environment — see Configuration below.

Service URLs

Service	URL (default `domain: example.com`)
Open WebUI	https://ollama-ui.example.com
Ollama API	https://ollama-api.example.com
Keycloak	https://idm.example.com
Vault	https://vault.example.com

Configuration

All environment-specific values are variables with generic defaults in inventory/group_vars/all.yml. Override them in local.yml (gitignored).

Variable	Default	Description
`domain`	`example.com`	Base domain for all service URLs
`ai_server_ip`	`192.168.1.100`	IP of the AI inference server
`nginx_proxy_ip`	`192.168.1.30`	IP of the NGINX reverse proxy
`coredns_host_ip`	`192.168.1.29`	IP of the CoreDNS host
`ansible_user`	`admin`	SSH user on all managed hosts
`platform_name`	`"AI Platform"`	Display name used in WebUI, Keycloak, and summaries
`vault_project_slug`	`"ai-platform"`	Slug for Keycloak realm name and Vault secret paths
`nginx_ssl_cert`	`/etc/nginx/ssl/{{ domain }}.crt`	Path to TLS certificate on nginx_proxy
`nginx_ssl_key`	`/etc/nginx/ssl/{{ domain }}.key`	Path to TLS private key on nginx_proxy

If you use Let's Encrypt, override nginx_ssl_cert and nginx_ssl_key in local.yml to point to your certbot paths (e.g. /etc/letsencrypt/live/your-domain/fullchain.pem).

Setup: two gitignored local files

Configuration is split across two gitignored files — create both before first run.

inventory/local.yml — SSH connection details (host IPs and user):

# inventory/local.yml
all:
  hosts:
    ai_server:
      ansible_host: 10.0.1.50
      ansible_user: myuser
    nginx_proxy:
      ansible_host: 10.0.1.10
      ansible_user: myuser
    coredns_host:
      ansible_host: 10.0.1.9
      ansible_user: myuser

Ansible reads the inventory/ directory automatically (ansible.cfg sets inventory = inventory/), so inventory/local.yml is merged with inventory/hosts.yml on every run — no extra flags needed.

The inventory/ directory also contains group_vars/ and host_vars/, which ensures Ansible finds them regardless of which playbook is run directly.

local.yml — play variables (domain, platform identity, SSL certs, etc.):

# local.yml
domain: mylab.internal
ai_server_ip: 10.0.1.50
nginx_proxy_ip: 10.0.1.10
coredns_host_ip: 10.0.1.9
platform_name: "My AI Platform"
vault_project_slug: my-ai
nginx_ssl_cert: /etc/letsencrypt/live/mylab.internal/fullchain.pem
nginx_ssl_key: /etc/letsencrypt/live/mylab.internal/privkey.pem

ai_server_ip, nginx_proxy_ip, and coredns_host_ip appear in both files. inventory/local.yml controls where Ansible SSHs to; local.yml controls what gets rendered into config files and DNS records.

Alternative: inline `-e` flags (no local.yml)

ansible-playbook deploy_ai.yml -K \
  -e "domain=mylab.internal" \
  -e "ai_server_ip=10.0.1.50" \
  -e "nginx_proxy_ip=10.0.1.10" \
  -e "coredns_host_ip=10.0.1.9" \
  -e "platform_name='My AI Platform'" \
  -e "vault_project_slug=my-ai" \
  -e "nginx_ssl_cert=/etc/letsencrypt/live/mylab.internal/fullchain.pem" \
  -e "nginx_ssl_key=/etc/letsencrypt/live/mylab.internal/privkey.pem"

inventory/local.yml must still exist for SSH to work — inline -e flags cannot set per-host connection variables.

Prerequisites

Ansible 2.14+
Python 3.9+
SSH access to all 3 hosts
sudo privileges on all 3 hosts

Ansible Galaxy collections:

ansible-galaxy collection install -r requirements.yml

First-Run Quickstart

git clone <repo>
cd ai-platform
ansible-galaxy collection install -r requirements.yml

# 1. Create inventory/local.yml with your host IPs and SSH user (gitignored)
# 2. Create local.yml with your domain, platform name, SSL cert paths, etc. (gitignored)
# See the Configuration section above for the contents of each file.

# 3. Deploy
ansible-playbook deploy_ai.yml -K -e @local.yml

-K prompts for the sudo (become) password on the remote hosts.

Credential Management

All secrets (API keys, passwords, OIDC client secrets) are stored in HashiCorp Vault and only written once — re-running any playbook will never overwrite an existing secret. This means deploy_ai.yml is safe to re-run at any time without breaking running services.

Credential rotation

To rotate a specific credential, delete it from Vault and re-run the full deploy:

# Example: rotate Keycloak credentials
vault kv delete secret/<vault_project_slug>/keycloak
ansible-playbook deploy_ai.yml -K -e @local.yml

New credentials will be generated, stored in Vault, and all dependent services (Keycloak, Open WebUI, Vault OIDC) will be redeployed in the correct order automatically.

Vault login

Vault UI supports two login methods:

Token — use the root token from vault/.vault-init.json (emergency/admin use only)
OIDC — select method OIDC, role default, click Sign in with OIDC Provider, authenticate via Keycloak. Only users with the ai-admin Keycloak role can log in.

User Roles

Users are created in Keycloak at https://idm.<domain>/admin/. Assign roles from the platform realm (not the master realm):

Role	Open WebUI	Vault OIDC
`ai-user`	✅ Standard access	❌ Blocked
`ai-admin`	✅ Admin access	✅ Full access
(none)	❌ Blocked	❌ Blocked

Connecting Coding Agents

The platform exposes two API endpoints for coding tools (aider, Continue.dev, Cursor, etc.). Users should connect via Open WebUI — it enforces Keycloak authentication and issues per-user API keys. Direct Ollama access is for service accounts and admin use only.

Option A — Via Open WebUI (recommended for users)

Each user authenticates through Keycloak and has their own API key. Open WebUI exposes an OpenAI-compatible API that all major coding agent tools support.

Step 1 — Generate your personal API key:

Browse to https://ollama-ui.<domain> and log in via SSO
Click your avatar (top-right) → Settings → Account
Scroll to API Keys → Create new secret key
Copy the key — it is only shown once

Step 2 — Configure your coding tool:

Setting	Value
Base URL	`https://ollama-ui.<domain>/api`
API key	your personal Open WebUI key
Model	any model name shown in the WebUI

Aider:

aider --openai-api-base https://ollama-ui.<domain>/api \
      --openai-api-key  <your-openwebui-key> \
      --model           deepseek-coder-v2:latest

Continue.dev (~/.continue/config.json):

{
  "models": [
    {
      "title": "AI Platform",
      "provider": "openai",
      "model": "deepseek-coder-v2:latest",
      "apiBase": "https://ollama-ui.<domain>/api",
      "apiKey": "<your-openwebui-key>"
    }
  ]
}

Cursor / VS Code — add a custom OpenAI-compatible provider pointing to https://ollama-ui.<domain>/api with your personal key.

Option B — Direct Ollama API (admin / service accounts only)

The Ollama API endpoint is protected by a single shared key stored in Vault. It is intended for internal service-to-service calls and admin use — not for individual users.

Retrieve the Ollama API key from Vault:

vault kv get -field=api_key secret/<vault_project_slug>/ollama

Setting	Value
Base URL	`https://ollama-api.<domain>/v1`
API key	Ollama API key from Vault
Model	any installed Ollama model name

Aider:

aider --openai-api-base https://ollama-api.<domain>/v1 \
      --openai-api-key  <ollama-api-key> \
      --model           deepseek-coder-v2:latest

Note: Direct Ollama access bypasses Keycloak auth and usage tracking. Rotate the key via vault kv delete secret/<vault_project_slug>/ollama and re-run playbooks/02_infrastructure.yml.

Recommended models for coding

The benchmark playbook automatically selects the best coding models and keeps them warm. Check the current slot assignments in benchmarks/results/model_selection.json:

python3 -m json.tool benchmarks/results/model_selection.json | grep slot

Slots 3–6 are coding-classified models, all running on the Node 0 instance at port 11435. Use slot3_coding (the highest-scoring coding model) as your primary model. Connect coding tools directly to https://ollama-api.<domain> (proxied from port 11434, Node 1) or to Open WebUI which load-balances across both instances.

Day-2 Operations

Full deploy / idempotent re-run:

ansible-playbook deploy_ai.yml -K -e @local.yml

Pre-flight checks only:

ansible-playbook deploy_ai.yml -K -e @local.yml --tags preflight

Skip benchmarking on re-runs (faster):

ansible-playbook deploy_ai.yml -K -e @local.yml --skip-tags benchmark

Vault only:

ansible-playbook playbooks/01_vault.yml -K -e @local.yml

Docker + Ollama only:

ansible-playbook playbooks/02_infrastructure.yml -K -e @local.yml

Re-benchmark all installed models:

ansible-playbook playbooks/03_benchmark.yml -K -e @local.yml

Benchmark specific models only:

ansible-playbook playbooks/03_benchmark.yml -K -e @local.yml \
  -e "benchmark_models=qwen2.5-coder:14b-instruct-q4_K_M,codestral:22b-v0.1-q4_K_M"

Override tier boundaries or timeouts (see benchmarks/README.md):

ansible-playbook playbooks/03_benchmark.yml -K -e @local.yml \
  -e "benchmark_small_max_gb=8 benchmark_medium_max_gb=20"

Pull recommended models if scores are below threshold:

ansible-playbook playbooks/03_benchmark.yml -K -e @local.yml -e "pull_if_better=true"

Update warm-up slots after a benchmark:

ansible-playbook playbooks/04_models.yml -K -e @local.yml

Rotate slot 5 (general) or slot 6 (coding) to a specific model:

# Swap general rotate slot
ansible-playbook playbooks/04_models.yml -K -e @local.yml -e "slot5_model=mistral:latest"

# Swap coding rotate slot
ansible-playbook playbooks/04_models.yml -K -e @local.yml -e "slot6_model=llama3.1:70b"

# Both at once
ansible-playbook playbooks/04_models.yml -K -e @local.yml -e "slot5_model=mistral:latest" -e "slot6_model=command-r:35b"

# Reset both rotate slots back to benchmark recommendations
ansible-playbook playbooks/04_models.yml -K -e @local.yml

Redeploy Keycloak only:

ansible-playbook playbooks/05_keycloak.yml -K -e @local.yml

Redeploy Open WebUI only:

ansible-playbook playbooks/07_openwebui.yml -K -e @local.yml

Update NGINX configs only:

ansible-playbook playbooks/09_nginx.yml -K -e @local.yml

Update CoreDNS records only:

ansible-playbook playbooks/10_coredns.yml -K -e @local.yml

Configure Keycloak SSO login for Vault UI:

ansible-playbook playbooks/11_vault_oidc.yml -K -e @local.yml

Model Slot System

Six models are kept warm across two Ollama instances (OLLAMA_MAX_LOADED_MODELS=3 each, OLLAMA_KEEP_ALIVE=-1). Slots are filled automatically by the benchmark playbook — no model names are hardcoded.

NUMA Node 1 — ollama.service     — port 11434  (general models)
NUMA Node 0 — ollama-node0.service — port 11435 (coding models)

Slot	Instance	Port	Role	Selection	Rotation
1	Node 1	11434	General primary (locked)	Top general composite score	Replaced only by re-benchmark
2	Node 1	11434	General secondary (locked)	2nd general composite score	Replaced only by re-benchmark
5	Node 1	11434	General rotate	3rd general composite score	`-e slot5_model=<name>`
3	Node 0	11435	Coding primary (locked)	Top coding composite score	Replaced only by re-benchmark
4	Node 0	11435	Coding secondary (locked)	2nd coding composite score	Replaced only by re-benchmark
6	Node 0	11435	Coding rotate	3rd coding composite score	`-e slot6_model=<name>`

Classification rule: a model is classified coding if its coding composite score exceeds its general composite score by ≥ 0.10; otherwise general.

Modelfile aliases (coder-128k, coder-32k, coder-rotate, llama-family, gemma-family) are excluded from benchmarking to prevent KV-cache allocation stalls.

Verification Steps

After a full deploy_ai.yml run, verify the deployment (substitute your actual domain and IPs):

Vault health -- curl -s https://vault.example.com/v1/sys/health returns initialized: true, sealed: false
Vault OIDC login -- select OIDC method, role default, authenticate with an ai-admin Keycloak user
Ollama API -- curl -s https://ollama-api.example.com/api/tags returns model list
Open WebUI -- browse to https://ollama-ui.example.com, SSO login works with ai-user or ai-admin
Keycloak admin -- browse to https://idm.example.com/admin/, login with admin credentials from Vault
Qdrant health -- curl -s http://<ai_server_ip>:6333/healthz returns OK
CoreDNS resolution -- dig @<coredns_host_ip> vault.example.com returns <nginx_proxy_ip>
NGINX configs -- ssh <nginx_proxy_ip> 'sudo nginx -t' passes
OpenClaw -- send a message to the Telegram bot, confirm response using slot1_general model
Benchmark report -- check benchmarks/results/benchmark_<timestamp>.md for latest results
Node 0 Ollama -- curl -s -H "Authorization: Bearer <key>" http://<ai_server_ip>:11435/api/tags returns model list
Both warmup services -- systemctl status ollama-warmup ollama-warmup-node0 both show active (exited)

Role Reference

Role	README	Purpose
preflight	roles/preflight/README.md	Pre-flight validation
hashi_vault	roles/hashi_vault/README.md	HashiCorp Vault deployment
docker	roles/docker/README.md	Docker CE installation
ollama	roles/ollama/README.md	Ollama inference server
benchmark	roles/benchmark/README.md	Model benchmarking
models	roles/models/README.md	Model lifecycle management
keycloak	roles/keycloak/README.md	Keycloak SSO/OIDC
qdrant	roles/qdrant/README.md	Qdrant vector database
openwebui	roles/openwebui/README.md	Open WebUI deployment
openclaw	roles/openclaw/README.md	OpenClaw Telegram bot
nginx	roles/nginx/README.md	NGINX reverse proxy
coredns	roles/coredns/README.md	CoreDNS zone management

Security Notes

vault/.vault-init.json and vault/.vault-token are gitignored -- they contain Vault unseal keys and root tokens. Never commit these files.
local.yml and inventory/local.yml are gitignored -- they contain your environment-specific IPs, usernames, and cert paths. Never commit these files.
All service secrets (database passwords, API keys, OIDC client secrets) are stored in HashiCorp Vault and injected at deploy time. Secrets are never regenerated unless explicitly deleted from Vault.
Ollama API is protected by OLLAMA_API_KEY to prevent unauthenticated access.
TLS termination happens at the NGINX reverse proxy layer.
Open WebUI and Vault UI both require a valid Keycloak role to access via SSO.

README.md