A note on KV cache, MQA/GQA, MLA, and why long-context models are becoming memory systems.
Writing
Technical essays and research notes on AI infrastructure, agent systems, and the mechanics behind modern models.
Essays
Notes
Paper notes
Dense notes on papers worth understanding deeply, not summaries written for engagement.
Work in public, but keep the bar high.