// Writing
Thinking out loud.
Writing on AI infrastructure, developer tooling, and technical content strategy. Practitioner-level posts from someone who builds the systems and writes about them.
Why AI Agents Keep Failing in Production and What the Field Is Doing About It
I have spent two years watching agents fail in production. Here is what I keep seeing and what the field is starting to do about it.
A Taxonomy of AI Agents That Actually Explains What You Are Building
Most AI agent taxonomies are either too academic or too vague to be useful. Here is the classification I use when I need to decide what kind of agent to build.
Python model.predict(): The Function That Turns Data Into Decisions
A practical guide to model.predict() across scikit-learn, Keras, PyTorch, and XGBoost—what it does, how it behaves differently across frameworks, and the gotchas that will bite you in production.
RAG vs Memory: What AI Developers Need to Know
Understand the fundamental differences between RAG and memory systems for LLM applications, when to use each, and how to combine them in production.
Short-Term Memory for AI Agents: A Practical Guide
Context windows are not memory. Here is what every engineer building production AI agents needs to understand about token budgets, overflow handling, and how short-term and long-term memory actually work together.
AI Memory Management for LLMs: What Actually Works
A senior engineer's breakdown of what memory management for LLMs actually looks like in production: eviction strategies, KV cache management, importance-weighted retention, and why your agent keeps forgetting things.
Context Windows vs Memory: Why They Are Not the Same Thing
A 1M token context window is not memory. Treating it like one is how you build expensive systems that still forget what they were doing last Tuesday.
State of AI Agent Memory in 2026
The memory stack for AI agents has exploded into a fragmented mess of competing approaches. Here is what actually works, what is still research, and why the next 18 months will sort the winners from the wreckage.
The BEAM Memory Benchmark: Why 1M Context Windows Are Not Enough
The BEAM benchmark reveals that LLMs fail catastrophically at retrieving facts from the middle of long contexts. Here is what the data actually shows, why it happens, and what matters for real deployments.
How Memory Works in HyperAgents
A deep dive into how HyperAgents retain context across interactions, structure layered memory, and handle session continuity in production.
Memory for Voice AI Agents: What Text Chatbots Cannot Do
Voice AI agents live or die by how they manage memory across a real-time streaming pipeline. Text chatbots solve memory with RAG. Voice agents need something different.
Memory Hierarchy in AI Systems: From Sensory to Semantic
How layered memory architecture helps AI systems achieve long-term context, personalization, and continuous learning — and why flat memory fails.
How Memory Works in Claude Code
A practical guide to understanding how Claude Code retains context across sessions, uses project files, and manages long-term memory for coding tasks.
How Memory Works in DeerFlow
A deep dive into the memory architecture of DeerFlow: layered context passing, session state files, sub-agent isolation, and how it compares to Letta, AutoGen, and CrewAI.
Technical Writing for Engineers: The 80/20 Guide
Most engineering documentation fails for the same reasons. Here is what actually moves the needle.
LLM Token Budgets: A Practical Guide to Cost Control
Real numbers, real pricing, and concrete strategies for keeping your LLM spend predictable.
RAG Evaluation Metrics: What Actually Matters
A practical guide to RAGAs, recall, precision, and the metrics that separate production RAG systems from prototypes.
What Nobody Tells You About Error Handling in Production AI Agents
Hard-won lessons from running AI agents in production: the error patterns that actually break systems, and the patterns that fix them.
The 800ms Barrier: Profiling the Latency Chain of a Real-Time Gemini 3.1 Voice Agent
I built a sub-second latency voice assistant and profiled every millisecond of the Audio-to-Audio request/response loop on a MacBook Air M2. Here is the bottleneck analysis.
Context Engineering as Heap Management: Measuring Accuracy vs. KV Cache Eviction
VRAM is too expensive to waste on low-attention tokens. I benchmarked KV cache eviction strategies to treat LLM context like a managed heap, reaching 90% pruning with zero recall loss.