Articles
Technical articles and research notes by ksh3, covering infrastructure, LLM, software tools, workflows, and more.
Personal tech articles and research notes from work and hobby projects.
Infrastructure (1 article)
Server hardware, network topology, container orchestration, monitoring, and GPU environment documentation.
Key topics: AMD EPYC 9175F, MikroTik RouterOS, Podman/Quadlet, Ubuntu Server, Prometheus/Grafana, 10GbE Networking, PostgreSQL, LLM Stack Deployment
Latest articles:
2025-03-01
Implementation of asynchronous data persistence for UI prompts and LLM responses using Dagster orchestration and NATS pub/sub messaging, including event routing, audit logging, and distributed system …
LLM Research (13 articles)
Large language model benchmarks, CPU/GPU inference validation, quantization testing, and optimization research.
Key topics: DeepSeek V3.2, Qwen3, Kimi K2.5, GLM-4.7, Llama 4, Hermes, MiniMax, EPYC 9175F inference optimization, GGUF quantization
Latest articles:
2026-02-27
Complete record of running the 229B MoE model MiniMax-2.5 on EPYC 9175F + RTX PRO 6000. Expert Offload benchmarks across three quantization levels (IQ5_K/IQ4_NL/IQ3_S), plus one-shot web generation …
2026-02-27
Qwen3.5-397B-A17B (397B total / 17B active MoE) deployed with IQ4_NL quantization on an EPYC 9175F + GPU hybrid setup. 28 consecutive inference runs averaged 22.5 tok/s text generation (TG), with peak prompt processing (PP) of 372 tok/s. Documenting …
2026-02-27
Llama-4-Scout (17B active / 16-expert MoE) benchmarked with Q6_K CPU inference on EPYC 9175F and nvfp4 GPU inference on RTX PRO 6000 Blackwell Max-Q. CPU: 17 tok/s vs. GPU: 30-60 tok/s. Validating mmap cache …
Software Tools (8 articles)
Development tools, IDE configurations, MCP integrations, code analysis utilities, and web project implementations.
Key topics: VS Code Server, Zed, Serena MCP, ctree, Dagster, Django, Lightdash, shelpa
Latest articles:
2026-02-27
Two code generation validations using the 400B-class MoE model Qwen3.5-397B. One-shot generation of a 6-page dental clinic static site (HTML+Tailwind+Alpine.js) and single-turn generation of a …
2025-03-01
code-tree architecture and tool specifications, an implementation strategy for context compression and token cost reduction, and the operational flow through MCP integration
2025-03-01
Architecture design of an MCP-compliant virtual shell server (shelpa-mcp): command routing, pipeline stage management, session CWD, and dual-write tee implementation. Ultimately abandoned due to model …
Workflows (2 articles)
Development workflows, coding philosophy, AI agent configurations, prompt specifications, and automation practices.
Key topics: Coding philosophy, LLM agent operations, LTX2 prompt specifications, bilingual proofreading, local AI development environment
Latest articles:
2026-02-27
Structured prompt specifications for LTX-2 video generation. Covers the 36-scene horror scenario template with mandatory dialogue, cinematic shot design principles, and multi-scene visual continuity …
2026-02-26
This document defines AI prompts for engineers to translate English to Japanese and to proofread Japanese into 'English-translation-friendly Japanese'.
Architecture (1 article)
System architecture designs, distributed pipeline patterns, and migration records.
Key topics: Rust, NATS, Dagster, OpenAI Proxy, SSE Streaming, Go Migration
Latest articles:
2026-02-27
Rust (axum) OpenAI-compatible proxy, NATS Core/JetStream event relay, Dagster oneshot job execution, PG idempotency design, Qdrant semantic cache, SSE streaming, Quadlet/systemd integration. Plus the …

