Articles

Personal tech articles and research notes from work and hobby projects.

Infrastructure (1 article)

Server hardware, network topology, container orchestration, monitoring, and GPU environment documentation.

Key topics: AMD EPYC 9175F, MikroTik RouterOS, Podman/Quadlet, Ubuntu Server, Prometheus/Grafana, 10GbE Networking, PostgreSQL, LLM Stack Deployment

Latest articles:

Dagster + NATS Event-Driven Pipeline Design and Implementation
2025-03-01
Implementation of asynchronous data persistence for UI prompts and LLM responses using Dagster orchestration and NATS pub/sub messaging, including event routing, audit logging, and distributed system …

Browse all articles →

LLM Research (13 articles)

Large language model benchmarks, CPU/GPU inference validation, quantization testing, and optimization research.

Key topics: DeepSeek V3.2, Qwen3, Kimi K2.5, GLM-4.7, Llama 4, Hermes, MiniMax, EPYC 9175F inference optimization, GGUF quantization

Latest articles:

MiniMax-2.5 (229B MoE) Expert Offload and Web Generation: IQ5_K to IQ3_S
2026-02-27
Complete record of running the 229B MoE model MiniMax-2.5 on EPYC 9175F + RTX PRO 6000. Expert Offload benchmarks across three quantization levels (IQ5_K/IQ4_NL/IQ3_S), plus one-shot web generation …

Qwen3.5-397B IQ4_NL Measured: 22.5tok/s Average from 28 Runs, Hybrid Offload Config and 400B-Class MoE Daily Viability
2026-02-27
Qwen3.5-397B-A17B (397B total / 17B active MoE) deployed with IQ4_NL quantization on EPYC 9175F + GPU hybrid setup. 28 consecutive inference runs averaging TG 22.5tok/s, peak PP 372tok/s. Documenting …

Llama-4-Scout-17B-16E Measured: CPU Q6_K 17tok/s vs GPU nvfp4 60tok/s, Cache Strategy and 100K Context Boundary
2026-02-27
Llama-4-Scout (17B active / 16-expert MoE) benchmarked on EPYC 9175F CPU Q6_K inference and RTX PRO 6000 Blackwell Max-Q GPU nvfp4 inference. CPU 17tok/s vs GPU 30-60tok/s. Validating mmap cache …

Browse all articles →

Software Tools (8 articles)

Development tools, IDE configurations, MCP integrations, code analysis utilities, and web project implementations.

Key topics: VS Code Server, Zed, Serena MCP, ctree, Dagster, Django, Lightdash, shelpa

Latest articles:

Qwen3.5-397B Autonomous Code Generation: From Dental Clinic Sites to Django CMS Foundations
2026-02-27
Two code generation validations using the 400B-class MoE model Qwen3.5-397B. One-shot generation of a 6-page dental clinic static site (HTML+Tailwind+Alpine.js) and single-turn generation of a …

code-tree Specification, Design Intent, and Expected Effects — LLM Context Optimization Tool
2025-03-01
Architecture and tool specifications for code-tree, its implementation strategy for context compression and token cost reduction, and its operational flow through MCP integration.

shelpa-mcp: Design Record of a Scrapped Virtual Pipeline
2025-03-01
Architecture design of an MCP-compliant virtual shell server (shelpa-mcp): command routing, pipeline stage management, session CWD, and dual-write tee implementation — ultimately abandoned due to model …

Browse all articles →

Workflows (2 articles)

Development workflows, coding philosophy, AI agent configurations, prompt specifications, and automation practices.

Key topics: Coding philosophy, LLM agent operations, LTX-2 prompt specifications, bilingual proofreading, local AI development environment

Latest articles:

LTX-2 Video Generation Prompt Engineering: From 36-Scene Horror to Cinematic Continuity Pipelines
2026-02-27
Structured prompt specifications for LTX-2 video generation. Covers the 36-scene horror scenario template with mandatory dialogue, cinematic shot design principles, and multi-scene visual continuity …

Bilingual AI Proofreading and Translation Prompt Definitions
2026-02-26
Defines AI prompts that engineers can use to translate English into Japanese and to proofread Japanese into 'English-translation-friendly Japanese'.

Browse all articles →

Architecture (1 article)

System architecture designs, distributed pipeline patterns, and migration records.

Key topics: Rust, NATS, Dagster, OpenAI Proxy, SSE Streaming, Go Migration

Latest articles:

Rust + NATS + Dagster AI Factory: OpenAI Proxy, Idempotent Design, SSE Streaming, and Go Migration Record
2026-02-27
Rust (axum) OpenAI-compatible proxy, NATS Core/JetStream event relay, Dagster oneshot job execution, PG idempotency design, Qdrant semantic cache, SSE streaming, Quadlet/systemd integration. Plus the …

Browse all articles →