Shaun Arman sarman

sarman merged pull request sarman/tftsr_ai#1

feat(benchmark): three-pass benchmark pipeline with NUMA fix and model diversity

1 day ago

sarman closed pull request sarman/tftsr_ai#1

feat(benchmark): three-pass benchmark pipeline with NUMA fix and model diversity

1 day ago

sarman created pull request sarman/tftsr_ai#1

feat(benchmark): three-pass benchmark pipeline with NUMA fix and model diversity

1 day ago

sarman pushed to master at sarman/tftsr_ai

  • fb7c5f1061 Tune Ollama performance: fix HT regression, add OS tuning, KV cache quant
    - Remove stale play-level vars from 02_infrastructure.yml that overrode group_vars/all.yml and silently re-enabled 28 HT threads + full HT affinity list on every site.yml run; correct values (14 physical cores, physical-only CPU affinity) now flow exclusively from group_vars
    - Add os-tune task block: sysctl (numa_balancing=0, swappiness=1, overcommit_memory=1), THP=madvise, CPU governor=performance; each setting persisted via /etc/sysctl.d/ or a oneshot systemd service
    - Add OLLAMA_KV_CACHE_TYPE=q8_0 to override.conf.j2; halves KV cache memory bandwidth vs fp16 with negligible quality loss
    - Promote ollama thread/affinity config to group_vars with corrected physical-core-only values; add ollama_binary_path var
    - Refine benchmark scoring: per-prompt quality weights for debug and refactor prompts; update toks_norm_ceiling to 22.5 tok/sec observed
    - Add baseline_models group var; use it in 04_models.yml instead of hardcoded list; fix gemma-family Modelfile to use llama3.1:8b
    - Add optional AWS Bedrock OpenAI-compatible API integration to 07_openwebui.yml; token stored/retrieved from Vault, conditionally wired into Open WebUI container env
    - Commit latest benchmark runs and updated model_selection.json (gemma3:12b added to general pool, slot2_general populated)
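
The "halves KV cache memory" claim for q8_0 can be checked with a little arithmetic. The sketch below assumes the GGML q8_0 block layout (32 int8 values plus one fp16 scale, so 34 bytes per 32 values) and uses illustrative model-shape numbers that are assumptions, not values from this repository. The function name `kv_cache_bytes` is hypothetical.

```python
# Bytes per cached value for each KV cache type.
FP16_BYTES_PER_VALUE = 2.0
Q8_0_BLOCK_VALUES = 32   # values per q8_0 block
Q8_0_BLOCK_BYTES = 34    # 32 int8 values + one fp16 scale
Q8_0_BYTES_PER_VALUE = Q8_0_BLOCK_BYTES / Q8_0_BLOCK_VALUES  # 1.0625


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: float) -> float:
    """Approximate KV cache size: keys and values for every layer and position."""
    values = 2 * n_layers * n_kv_heads * head_dim * context_len  # 2 = K and V
    return values * bytes_per_value


# Illustrative shape (assumed): 32 layers, 8 KV heads, head_dim 128, 8192-token context.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
fp16_size = kv_cache_bytes(**shape, bytes_per_value=FP16_BYTES_PER_VALUE)
q8_size = kv_cache_bytes(**shape, bytes_per_value=Q8_0_BYTES_PER_VALUE)
ratio = q8_size / fp16_size

print(f"fp16: {fp16_size / 2**20:.0f} MiB, q8_0: {q8_size / 2**20:.0f} MiB, ratio: {ratio:.3f}")
```

Under these assumptions the ratio is independent of the model shape, since only the per-value byte cost changes; it comes out slightly above one half because of the per-block scale.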

5 days ago

sarman pushed to master at sarman/tftsr_ai

  • 75e9ea03bc Fix benchmark scoring, classification, and Keycloak userinfo roles
    Benchmark (03_benchmark.yml, all.yml):
    - Add has_code_block and has_import signals to coding heuristic; reduce has_assert/has_test_def weights so debug/refactor prompts aren't penalised
    - Add has_list and has_detail to general heuristic; cut length_score weight from 0.60 to 0.35 to reduce verbosity dominance
    - Replace composite-delta classification (always ~0) with 3-tier logic: override dict -> raw quality delta -> name pattern (coder/codestral/etc.)
    - Lower toks_norm ceiling 30 -> 22 to match observed Dell M630 hardware max
    - Add model_category_overrides variable for manual classification escape hatch
    - Result: deepseek-coder-v2 and qwen2.5-coder:7b now correctly land in slots 3/4 (coding); duplicate llama3.2:3b in slot 3 eliminated
    Keycloak (05_keycloak.yml):
    - Add oidc-usermodel-realm-role-mapper to open-webui client so realm_access.roles is included in the userinfo endpoint response; fixes Open WebUI resetting OIDC users to pending on every login despite having ai-admin in Keycloak
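
The 3-tier classification described in this commit (override dict, then raw quality delta, then name pattern) might look roughly like the following. This is a sketch under stated assumptions, not the playbook's actual logic: the function name `classify_model`, the `delta_threshold` value, the symmetric treatment of the delta, and the exact pattern list are all hypothetical (the commit names only "coder/codestral/etc.").

```python
from typing import Mapping, Optional

# Hypothetical name fragments; the commit does not list the full set.
CODING_NAME_PATTERNS = ("coder", "codestral")


def classify_model(
    name: str,
    coding_quality: float,
    general_quality: float,
    overrides: Optional[Mapping[str, str]] = None,
    delta_threshold: float = 0.1,  # assumed value, not from the source
) -> str:
    """Return "coding" or "general": override -> quality delta -> name pattern."""
    # Tier 1: a manual override (cf. model_category_overrides) always wins.
    if overrides and name in overrides:
        return overrides[name]

    # Tier 2: compare raw quality scores when the gap is large enough to trust.
    delta = coding_quality - general_quality
    if delta >= delta_threshold:
        return "coding"
    if delta <= -delta_threshold:
        return "general"

    # Tier 3: inconclusive delta, so fall back to the model name.
    lowered = name.lower()
    if any(pattern in lowered for pattern in CODING_NAME_PATTERNS):
        return "coding"
    return "general"


print(classify_model("qwen2.5-coder:7b", 0.6, 0.6))
print(classify_model("llama3.2:3b", 0.6, 0.6))
```

Ordering the tiers this way keeps the manual escape hatch authoritative while letting measured quality outrank a name that merely looks like a coding model.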

6 days ago

sarman pushed to master at sarman/tftsr_ai

  • 068427e60d Improve benchmark scoring and promote thresholds to group vars
    - Move benchmark_toks_norm_ceiling and benchmark_coding_threshold into group_vars/all.yml so they can be tuned per environment without touching playbook code
    - Fix min_composite_score to reference benchmark_thresholds instead of a hardcoded 0.50 default
    - Add ollama_api_key Vault lookup and Authorization header to benchmark API calls (API key auth was silently bypassed before)
    - Expand code quality scoring: add has_assert, has_test_def, has_docstring, has_type_hint signals alongside existing has_def/has_return
    - Reference benchmark_coding_threshold variable in category classification and benchmark report output
    - Fix min_composite_score | float cast to avoid Jinja2 type comparison errors
  • 9472c9dbe4 Fix warmup service bugs and add CLAUDE.md
    Three bugs fixed in the model warm-up pipeline:
    - warmup.sh.j2: replace undefined slot1_model/slot2_model/slot3_model/slot4_model variables with correct model_selection.slot*_general/coding references; skip slot4 warmup when value is 'none'
    - 04_models.yml: add missing ollama_api_key Vault lookup to vars block so the warmup script template can resolve the variable
    - 04_models.yml: fix warmup service template path (templates/systemd/, not templates/ollama/)
    Also adds CLAUDE.md with project guidance and updated benchmark results from today's run.
  • View comparison for these 2 commits »
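
The code-quality signals named in commit 068427e60d (has_def, has_return, has_assert, has_test_def, has_docstring, has_type_hint) could be detected with simple pattern checks over a model's response. The sketch below is only an illustration: the regular expressions, the equal default weights, and the function names `detect_signals` and `code_quality` are assumptions, since the feed does not show how the playbook computes them.

```python
import re
from typing import Dict, Optional

# Hypothetical regexes, one per signal named in the commit message.
SIGNAL_PATTERNS: Dict[str, str] = {
    "has_def": r"^\s*def\s+\w+\s*\(",
    "has_return": r"^\s*return\b",
    "has_assert": r"^\s*assert\b",
    "has_test_def": r"^\s*def\s+test_\w*\s*\(",
    "has_docstring": "\"\"\"|'''",
    "has_type_hint": r"def\s+\w+\s*\([^)]*:\s*\w+|\)\s*->\s*\w+",
}


def detect_signals(text: str) -> Dict[str, bool]:
    """Report which signals appear anywhere in the response text."""
    return {
        name: bool(re.search(pattern, text, re.MULTILINE))
        for name, pattern in SIGNAL_PATTERNS.items()
    }


def code_quality(text: str, weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted fraction of signals present, in [0, 1]. Equal weights are assumed."""
    weights = weights or {name: 1.0 for name in SIGNAL_PATTERNS}
    signals = detect_signals(text)
    total = sum(weights.values())
    return sum(weights[name] for name, hit in signals.items() if hit) / total


sample = 'def add(a: int, b: int) -> int:\n    """Add two ints."""\n    return a + b\n'
print(detect_signals(sample))
print(round(code_quality(sample), 3))
```

Passing a custom `weights` dict is one way to express the later change in 75e9ea03bc, where has_assert and has_test_def were down-weighted so that debug and refactor prompts are not penalised.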

6 days ago

sarman pushed to master at sarman/tftsr_ai

  • c9457bb38b Initial release: full-stack local AI platform automation
    Provisions and manages a 3-host local AI inference platform via Ansible:
    Infrastructure:
    - HashiCorp Vault (systemd, KV v2) for centralized secret management with idempotent secret generation; credentials never overwritten on re-run
    - Docker CE + Ollama with NUMA/CPU affinity tuning for Dell M630 hardware
    - Keycloak 24.x SSO/OIDC with KC_PROXY_HEADERS and full HTTPS hostname config
    - Open WebUI with Keycloak OIDC, Qdrant RAG, and role-based access control
    - Qdrant vector database for RAG pipelines
    - NGINX reverse proxy with Let's Encrypt TLS termination
    - CoreDNS zone management with automatic container reload
    - OpenClaw Telegram bot (Python, python-telegram-bot) proxying to Ollama
    - Vault OIDC login via Keycloak; ai-admin role required
    Automation:
    - Full deploy in dependency order via deploy_ai.yml (idempotent, safe to re-run)
    - Model benchmarking with composite scoring; auto-selects 4 warm-up slots
    - Slot 4 rotatable at runtime: -e slot4_model=<name>
    - Credential rotation: delete Vault path, re-run deploy_ai.yml
    Configuration:
    - All environment-specific values are variables with generic defaults
    - Two gitignored local files: inventory/local.yml (SSH), local.yml (vars)
    - Zero hardcoded IPs, domains, usernames, or platform names in tracked files
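
The idempotent secret generation and delete-to-rotate behaviour described above can be illustrated with a small stand-in. This is a sketch only: a plain dict plays the role of the Vault KV store, and `ensure_secret` and the example path are hypothetical names, not part of the repository or of any Vault client API.

```python
import secrets
from typing import MutableMapping


def ensure_secret(store: MutableMapping[str, str], path: str, nbytes: int = 32) -> str:
    """Return the secret at `path`, generating one only if none exists yet."""
    existing = store.get(path)
    if existing is not None:
        # Re-run: keep the stored credential untouched.
        return existing
    value = secrets.token_urlsafe(nbytes)
    store[path] = value
    return value


store: dict = {}                                   # stand-in for a KV backend
first = ensure_secret(store, "example/db_password")
second = ensure_secret(store, "example/db_password")   # simulated re-run
print(first == second)

del store["example/db_password"]                   # rotation: delete the path...
rotated = ensure_secret(store, "example/db_password")  # ...then re-run
print(rotated != first)
```

Checking for an existing value before writing is what makes a re-run safe, and it is also why deleting the path is enough to trigger rotation on the next run.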

6 days ago

sarman created new branch master at sarman/tftsr_ai

6 days ago

sarman created repository sarman/tftsr_ai

6 days ago