// Writing
Thinking out loud.
Writing on AI infrastructure, developer tooling, and technical content strategy. Practitioner-level posts from someone who builds the systems and writes about them.
Fine-Tuning vs RAG for Agent Memory: When Each Approach Makes Sense
Fine-tuning and RAG solve different parts of the agent memory problem. Here is how to decide which one you actually need.
Why Your Agent Remembers the Wrong Thing: Memory Attribution Failures
Memory attribution failures cause AI agents to act on outdated or misassigned context. Here is what I found examining the failure patterns.
Shared Memory vs Isolated Memory in Multi-Agent Workflows
How to choose between shared and isolated memory architectures for multi-agent systems, with trade-offs from production deployments.
Building a Customer Support Agent with Persistent Memory: A Worked Example
I built a customer support agent that actually remembers across sessions. Here is what I learned about memory architecture, serialization trade-offs, and the failure modes that will kill your deployment.
LLM Inference Optimization: What Actually Works in Production
A practical breakdown of the inference optimization techniques that move the needle (batching, quantization, caching, and attention kernels), with concrete numbers and the trade-offs between them.
Why Your Coding Agent Keeps Forgetting Everything: Memory Persistence in AI Coding Assistants
The memory persistence patterns that actually work for AI coding assistants, and why most agents lose context between sessions.
Contextual Compression for Agent Memory: What Stays and What Goes
How agents decide what to keep in memory when context space is finite, and the three compression strategies that actually work.
Memory Versioning and Audit Trails for Regulated AI Agents
If your agent overwrites its memory, you cannot pass a compliance audit. How to build append-only memory versioning and trace agent reasoning.
Episodic, Semantic, and Working Memory in AI Agents: A Practical Map
AI agents juggle three distinct memory types. Getting them wrong is the source of most agent memory failures I see in production.
Memory Serialization: How Agents Persist State Across Sessions
Why agents forget everything on restart, and the serialization patterns that actually solve it.
The HERMES.md Bug That Silently Burned $200 in Claude Code Credits
A case-sensitive string in git commit messages routes Claude Code API requests through extra usage billing instead of plan quota. Here is how it works and what it means for teams.
The Memory Hierarchy: Why RAG Alone Is Not Enough for Agent Memory
RAG handles document retrieval well. It handles agent memory poorly, because agents need episodic recall, working context, and cross-session persistence that a vector store cannot provide.
Why Agent Memory Retrieval Is Asymmetric and Why It Breaks Your RAG Pipeline
The retrieval patterns that work for a query do not work when the agent is the one retrieving, and this asymmetry silently breaks production RAG systems.
Multi-Agent vs Single-Agent Systems: The Real Trade-offs
The decision between one agent and many is not about capability. It is about failure modes, latency, and operational complexity.
The Agent Design Space: A Map of What Engineers Are Actually Building
After surveying production agents across industries, the design space clusters into patterns. Here is what I found.
When to Build an Agent and When to Build a Smarter Assistant
The difference between an AI agent and a smart assistant comes down to one thing: who drives the loop.
Lambda Calculus as AI Reasoning Benchmark
I have used lambda calculus to test whether AI systems can actually reason through composition, or whether they are just pattern-matching their way to plausible outputs.
Apple's iCloud Keychain Escrow Security: How It Works and Why It Matters
Apple's iCloud Keychain escrow uses a key-splitting architecture that makes escrow records cryptographically inaccessible without the device passcode. This is how it actually works.
Matz's Ruby AOT Compiler: How Spinel Differs from YJIT/MJIT and What It Means for Production Ruby
Spinel compiles Ruby to standalone native binaries with 11.6x speedups over miniruby. I dug into the architecture to understand why AOT beats JIT for long-running services.
The Anatomy of an Agent Loop: Perceive, Think, Act, Remember
The agent loop is not one thing. It is four distinct phases that run in sequence, and understanding each one is how you debug what breaks.