ai-platform -- Local AI Server Automation
Ansible automation for full lifecycle management of a server as a local AI inference platform. This project provisions, configures, benchmarks, and maintains every service required to run Ollama-based LLM inference behind NGINX reverse proxy with SSO, vector search (RAG), DNS, secret management, and Telegram bot access -- all driven by a single ansible-playbook deploy_ai.yml command.
|
|
1 day ago | |
|---|---|---|
| benchmarks | 1 day ago | |
| inventory | 1 day ago | |
| playbooks | 1 day ago | |
| roles | 5 days ago | |
| templates | 1 day ago | |
| tftsr_nginx-hardening | 5 days ago | |
| vault | 5 days ago | |
| .gitignore | 5 days ago | |
| CLAUDE.md | 5 days ago | |
| README.md | 5 days ago | |
| ansible.cfg | 5 days ago | |
| deploy_ai.yml | 5 days ago | |
| requirements.yml | 5 days ago |
Ansible automation for full lifecycle management of a server as a
local AI inference platform. This project provisions, configures, benchmarks, and
maintains every service required to run Ollama-based LLM inference behind NGINX
reverse proxy with SSO, vector search (RAG), DNS, secret management, and Telegram
bot access -- all driven by a single ansible-playbook deploy_ai.yml command.
┌──────────────┐
│ Internet │
└──────┬───────┘
│
┌────────▼────────┐
│ nginx_proxy │
│ 192.168.1.30 │
│ NGINX reverse │
│ proxy + TLS │
└──┬──────────┬───┘
│ │
┌───────────────▼┐ ┌────▼──────────────────────┐
│ coredns_host │ │ ai_server │
│ 192.168.1.29 │ │ 192.168.1.100 │
│ │ │ │
│ - CoreDNS │ │ - Ollama (LLM inference) │
└────────────────┘ │ - Open WebUI │
│ - Keycloak (SSO/OIDC) │
│ - HashiCorp Vault │
│ - Qdrant (vector DB) │
│ - OpenClaw (Telegram bot) │
└───────────────────────────┘
| Host | IP Address | Purpose |
|---|---|---|
nginx_proxy |
192.168.1.30 | NGINX reverse proxy, TLS termination |
coredns_host |
192.168.1.29 | CoreDNS |
ai_server |
192.168.1.100 | Ollama, Open WebUI, Keycloak, Vault, Qdrant, OpenClaw |
These are the default values in
inventory/group_vars/all.yml. Override for your environment — see Configuration below.
| Service | URL (default domain: example.com) |
|---|---|
| Open WebUI | https://ollama-ui.example.com |
| Ollama API | https://ollama-api.example.com |
| Keycloak | https://idm.example.com |
| Vault | https://vault.example.com |
All environment-specific values are variables with generic defaults in
inventory/group_vars/all.yml. Override them in local.yml (gitignored).
| Variable | Default | Description |
|---|---|---|
domain |
example.com |
Base domain for all service URLs |
ai_server_ip |
192.168.1.100 |
IP of the AI inference server |
nginx_proxy_ip |
192.168.1.30 |
IP of the NGINX reverse proxy |
coredns_host_ip |
192.168.1.29 |
IP of the CoreDNS host |
ansible_user |
admin |
SSH user on all managed hosts |
platform_name |
"AI Platform" |
Display name used in WebUI, Keycloak, and summaries |
vault_project_slug |
"ai-platform" |
Slug for Keycloak realm name and Vault secret paths |
nginx_ssl_cert |
/etc/nginx/ssl/{{ domain }}.crt |
Path to TLS certificate on nginx_proxy |
nginx_ssl_key |
/etc/nginx/ssl/{{ domain }}.key |
Path to TLS private key on nginx_proxy |
If you use Let's Encrypt, override
nginx_ssl_certandnginx_ssl_keyinlocal.ymlto point to your certbot paths (e.g./etc/letsencrypt/live/your-domain/fullchain.pem).
Configuration is split across two gitignored files — create both before first run.
inventory/local.yml — SSH connection details (host IPs and user):
# inventory/local.yml
all:
hosts:
ai_server:
ansible_host: 10.0.1.50
ansible_user: myuser
nginx_proxy:
ansible_host: 10.0.1.10
ansible_user: myuser
coredns_host:
ansible_host: 10.0.1.9
ansible_user: myuser
Ansible reads the inventory/ directory automatically (ansible.cfg sets
inventory = inventory/), so inventory/local.yml is merged with
inventory/hosts.yml on every run — no extra flags needed.
The inventory/ directory also contains group_vars/ and host_vars/, which
ensures Ansible finds them regardless of which playbook is run directly.
local.yml — play variables (domain, platform identity, SSL certs, etc.):
# local.yml
domain: mylab.internal
ai_server_ip: 10.0.1.50
nginx_proxy_ip: 10.0.1.10
coredns_host_ip: 10.0.1.9
platform_name: "My AI Platform"
vault_project_slug: my-ai
nginx_ssl_cert: /etc/letsencrypt/live/mylab.internal/fullchain.pem
nginx_ssl_key: /etc/letsencrypt/live/mylab.internal/privkey.pem
ai_server_ip,nginx_proxy_ip, andcoredns_host_ipappear in both files.inventory/local.ymlcontrols where Ansible SSHs to;local.ymlcontrols what gets rendered into config files and DNS records.
-e flags (no local.yml)ansible-playbook deploy_ai.yml -K \
-e "domain=mylab.internal" \
-e "ai_server_ip=10.0.1.50" \
-e "nginx_proxy_ip=10.0.1.10" \
-e "coredns_host_ip=10.0.1.9" \
-e "platform_name='My AI Platform'" \
-e "vault_project_slug=my-ai" \
-e "nginx_ssl_cert=/etc/letsencrypt/live/mylab.internal/fullchain.pem" \
-e "nginx_ssl_key=/etc/letsencrypt/live/mylab.internal/privkey.pem"
inventory/local.ymlmust still exist for SSH to work — inline-eflags cannot set per-host connection variables.
Ansible Galaxy collections:
ansible-galaxy collection install -r requirements.yml
git clone <repo>
cd ai-platform
ansible-galaxy collection install -r requirements.yml
# 1. Create inventory/local.yml with your host IPs and SSH user (gitignored)
# 2. Create local.yml with your domain, platform name, SSL cert paths, etc. (gitignored)
# See the Configuration section above for the contents of each file.
# 3. Deploy
ansible-playbook deploy_ai.yml -K -e @local.yml
-Kprompts for the sudo (become) password on the remote hosts.
All secrets (API keys, passwords, OIDC client secrets) are stored in HashiCorp Vault
and only written once — re-running any playbook will never overwrite an existing
secret. This means deploy_ai.yml is safe to re-run at any time without breaking
running services.
To rotate a specific credential, delete it from Vault and re-run the full deploy:
# Example: rotate Keycloak credentials
vault kv delete secret/<vault_project_slug>/keycloak
ansible-playbook deploy_ai.yml -K -e @local.yml
New credentials will be generated, stored in Vault, and all dependent services (Keycloak, Open WebUI, Vault OIDC) will be redeployed in the correct order automatically.
Vault UI supports two login methods:
vault/.vault-init.json (emergency/admin use only)OIDC, role default, click Sign in with OIDC Provider,
authenticate via Keycloak. Only users with the ai-admin Keycloak role can log in.Users are created in Keycloak at https://idm.<domain>/admin/. Assign roles
from the platform realm (not the master realm):
| Role | Open WebUI | Vault OIDC |
|---|---|---|
ai-user |
✅ Standard access | ❌ Blocked |
ai-admin |
✅ Admin access | ✅ Full access |
| (none) | ❌ Blocked | ❌ Blocked |
The platform exposes two API endpoints for coding tools (aider, Continue.dev, Cursor, etc.). Users should connect via Open WebUI — it enforces Keycloak authentication and issues per-user API keys. Direct Ollama access is for service accounts and admin use only.
Each user authenticates through Keycloak and has their own API key. Open WebUI exposes an OpenAI-compatible API that all major coding agent tools support.
Step 1 — Generate your personal API key:
https://ollama-ui.<domain> and log in via SSOStep 2 — Configure your coding tool:
| Setting | Value |
|---|---|
| Base URL | https://ollama-ui.<domain>/api |
| API key | your personal Open WebUI key |
| Model | any model name shown in the WebUI |
Aider:
aider --openai-api-base https://ollama-ui.<domain>/api \
--openai-api-key <your-openwebui-key> \
--model deepseek-coder-v2:latest
Continue.dev (~/.continue/config.json):
{
"models": [
{
"title": "AI Platform",
"provider": "openai",
"model": "deepseek-coder-v2:latest",
"apiBase": "https://ollama-ui.<domain>/api",
"apiKey": "<your-openwebui-key>"
}
]
}
Cursor / VS Code — add a custom OpenAI-compatible provider pointing to
https://ollama-ui.<domain>/api with your personal key.
The Ollama API endpoint is protected by a single shared key stored in Vault. It is intended for internal service-to-service calls and admin use — not for individual users.
Retrieve the Ollama API key from Vault:
vault kv get -field=api_key secret/<vault_project_slug>/ollama
| Setting | Value |
|---|---|
| Base URL | https://ollama-api.<domain>/v1 |
| API key | Ollama API key from Vault |
| Model | any installed Ollama model name |
Aider:
aider --openai-api-base https://ollama-api.<domain>/v1 \
--openai-api-key <ollama-api-key> \
--model deepseek-coder-v2:latest
Note: Direct Ollama access bypasses Keycloak auth and usage tracking. Rotate the key via
vault kv delete secret/<vault_project_slug>/ollamaand re-runplaybooks/02_infrastructure.yml.
The benchmark playbook automatically selects the best coding models and keeps them warm.
Check the current slot assignments in benchmarks/results/model_selection.json:
python3 -m json.tool benchmarks/results/model_selection.json | grep slot
Slots 3–6 are coding-classified models, all running on the Node 0 instance at port 11435.
Use slot3_coding (the highest-scoring coding model) as your primary model. Connect coding
tools directly to https://ollama-api.<domain> (proxied from port 11434, Node 1) or to
Open WebUI which load-balances across both instances.
Full deploy / idempotent re-run:
ansible-playbook deploy_ai.yml -K -e @local.yml
Pre-flight checks only:
ansible-playbook deploy_ai.yml -K -e @local.yml --tags preflight
Skip benchmarking on re-runs (faster):
ansible-playbook deploy_ai.yml -K -e @local.yml --skip-tags benchmark
Vault only:
ansible-playbook playbooks/01_vault.yml -K -e @local.yml
Docker + Ollama only:
ansible-playbook playbooks/02_infrastructure.yml -K -e @local.yml
Re-benchmark all installed models:
ansible-playbook playbooks/03_benchmark.yml -K -e @local.yml
Benchmark specific models only:
ansible-playbook playbooks/03_benchmark.yml -K -e @local.yml \
-e "benchmark_models=qwen2.5-coder:14b-instruct-q4_K_M,codestral:22b-v0.1-q4_K_M"
Override tier boundaries or timeouts (see benchmarks/README.md):
ansible-playbook playbooks/03_benchmark.yml -K -e @local.yml \
-e "benchmark_small_max_gb=8 benchmark_medium_max_gb=20"
Pull recommended models if scores are below threshold:
ansible-playbook playbooks/03_benchmark.yml -K -e @local.yml -e "pull_if_better=true"
Update warm-up slots after a benchmark:
ansible-playbook playbooks/04_models.yml -K -e @local.yml
Rotate slot 5 (general) or slot 6 (coding) to a specific model:
# Swap general rotate slot
ansible-playbook playbooks/04_models.yml -K -e @local.yml -e "slot5_model=mistral:latest"
# Swap coding rotate slot
ansible-playbook playbooks/04_models.yml -K -e @local.yml -e "slot6_model=llama3.1:70b"
# Both at once
ansible-playbook playbooks/04_models.yml -K -e @local.yml -e "slot5_model=mistral:latest" -e "slot6_model=command-r:35b"
# Reset both rotate slots back to benchmark recommendations
ansible-playbook playbooks/04_models.yml -K -e @local.yml
Redeploy Keycloak only:
ansible-playbook playbooks/05_keycloak.yml -K -e @local.yml
Redeploy Open WebUI only:
ansible-playbook playbooks/07_openwebui.yml -K -e @local.yml
Update NGINX configs only:
ansible-playbook playbooks/09_nginx.yml -K -e @local.yml
Update CoreDNS records only:
ansible-playbook playbooks/10_coredns.yml -K -e @local.yml
Configure Keycloak SSO login for Vault UI:
ansible-playbook playbooks/11_vault_oidc.yml -K -e @local.yml
Six models are kept warm across two Ollama instances (OLLAMA_MAX_LOADED_MODELS=3 each, OLLAMA_KEEP_ALIVE=-1). Slots are filled automatically by the benchmark playbook — no model names are hardcoded.
NUMA Node 1 — ollama.service — port 11434 (general models)
NUMA Node 0 — ollama-node0.service — port 11435 (coding models)
| Slot | Instance | Port | Role | Selection | Rotation |
|---|---|---|---|---|---|
| 1 | Node 1 | 11434 | General primary (locked) | Top general composite score | Replaced only by re-benchmark |
| 2 | Node 1 | 11434 | General secondary (locked) | 2nd general composite score | Replaced only by re-benchmark |
| 5 | Node 1 | 11434 | General rotate | 3rd general composite score | -e slot5_model=<name> |
| 3 | Node 0 | 11435 | Coding primary (locked) | Top coding composite score | Replaced only by re-benchmark |
| 4 | Node 0 | 11435 | Coding secondary (locked) | 2nd coding composite score | Replaced only by re-benchmark |
| 6 | Node 0 | 11435 | Coding rotate | 3rd coding composite score | -e slot6_model=<name> |
Classification rule: a model is classified coding if its coding composite score exceeds its general composite score by ≥ 0.10; otherwise general.
Modelfile aliases (coder-128k, coder-32k, coder-rotate, llama-family, gemma-family) are excluded from benchmarking to prevent KV-cache allocation stalls.
After a full deploy_ai.yml run, verify the deployment (substitute your actual domain and IPs):
curl -s https://vault.example.com/v1/sys/health returns initialized: true, sealed: falsedefault, authenticate with an ai-admin Keycloak usercurl -s https://ollama-api.example.com/api/tags returns model listai-user or ai-adminadmin credentials from Vaultcurl -s http://<ai_server_ip>:6333/healthz returns OKdig @<coredns_host_ip> vault.example.com returns <nginx_proxy_ip>ssh <nginx_proxy_ip> 'sudo nginx -t' passesbenchmarks/results/benchmark_<timestamp>.md for latest resultscurl -s -H "Authorization: Bearer <key>" http://<ai_server_ip>:11435/api/tags returns model listsystemctl status ollama-warmup ollama-warmup-node0 both show active (exited)| Role | README | Purpose |
|---|---|---|
| preflight | roles/preflight/README.md | Pre-flight validation |
| hashi_vault | roles/hashi_vault/README.md | HashiCorp Vault deployment |
| docker | roles/docker/README.md | Docker CE installation |
| ollama | roles/ollama/README.md | Ollama inference server |
| benchmark | roles/benchmark/README.md | Model benchmarking |
| models | roles/models/README.md | Model lifecycle management |
| keycloak | roles/keycloak/README.md | Keycloak SSO/OIDC |
| qdrant | roles/qdrant/README.md | Qdrant vector database |
| openwebui | roles/openwebui/README.md | Open WebUI deployment |
| openclaw | roles/openclaw/README.md | OpenClaw Telegram bot |
| nginx | roles/nginx/README.md | NGINX reverse proxy |
| coredns | roles/coredns/README.md | CoreDNS zone management |
vault/.vault-init.json and vault/.vault-token are gitignored -- they contain
Vault unseal keys and root tokens. Never commit these files.local.yml and inventory/local.yml are gitignored -- they contain your
environment-specific IPs, usernames, and cert paths. Never commit these files.OLLAMA_API_KEY to prevent unauthenticated access.