// Writing

Thinking out loud.

Writing on AI infrastructure, developer tooling, and technical content strategy. Practitioner-level posts from someone who builds the systems and writes about them.

Fine-Tuning vs RAG for Agent Memory: When Each Approach Makes Sense
Fine-tuning and RAG solve different parts of the agent memory problem. Here is how to decide which one you actually need.
8 min read ai-agents rag fine-tuning
Why Your Agent Remembers the Wrong Thing: Memory Attribution Failures
Memory attribution failures cause AI agents to act on outdated or misassigned context. Here is what I found examining the failure patterns.
6 min read ai-agents agent-memory production-errors
Shared Memory vs Isolated Memory in Multi-Agent Workflows
How to choose between shared and isolated memory architectures for multi-agent systems, with trade-offs from production deployments.
6 min read ai-agents multi-agent agent-memory
Building a Customer Support Agent with Persistent Memory: A Worked Example
I built a customer support agent that actually remembers across sessions. Here is what I learned about memory architecture, serialization trade-offs, and the failure modes that will kill your deployment.
9 min read ai-agents memory customer-support
LLM Inference Optimization: What Actually Works in Production
A practical breakdown of the inference optimization techniques that move the needle (batching, quantization, caching, and attention kernels), with concrete numbers and the trade-offs between them.
6 min read ai llm infrastructure
Why Your Coding Agent Keeps Forgetting Everything: Memory Persistence in AI Coding Assistants
The memory persistence patterns that actually work for AI coding assistants, and why most agents lose context between sessions.
8 min read ai-agents coding memory
Contextual Compression for Agent Memory: What Stays and What Goes
How agents decide what to keep in memory when context space is finite, and the three compression strategies that actually work.
7 min read ai-agents agent-memory context-windows
Memory Versioning and Audit Trails for Regulated AI Agents
If your agent overwrites its memory, you cannot pass a compliance audit. How to build append-only memory versioning and trace agent reasoning.
6 min read ai agents agent-memory compliance
Episodic, Semantic, and Working Memory in AI Agents: A Practical Map
AI agents juggle three distinct memory types. Getting them wrong is the source of most agent memory failures I see in production.
6 min read ai-agents agent-memory llm-architecture
Memory Serialization: How Agents Persist State Across Sessions
Why agents forget everything on restart, and the serialization patterns that actually solve it.
6 min read ai-agents agent-memory agent-architecture
The HERMES.md Bug That Silently Burned $200 in Claude Code Credits
A case-sensitive string in git commit messages routes Claude Code API requests through extra usage billing instead of plan quota. Here is how it works and what it means for teams.
6 min read claude anthropic billing
The Memory Hierarchy: Why RAG Alone Is Not Enough for Agent Memory
RAG handles document retrieval well. It handles agent memory poorly, because agents need episodic recall, working context, and cross-session persistence that a vector store cannot provide.
6 min read ai-agents agent-memory rag
Why Agent Memory Retrieval Is Asymmetric and Why It Breaks Your RAG Pipeline
The retrieval patterns that work for a query do not work when the agent is the one retrieving, and this asymmetry silently breaks production RAG systems.
5 min read ai-agents agent-memory rag
Multi-Agent vs Single-Agent Systems: The Real Trade-offs
The decision between one agent and many is not about capability. It is about failure modes, latency, and operational complexity.
5 min read ai agents multi-agent systems agent architecture
The Agent Design Space: A Map of What Engineers Are Actually Building
After surveying production agents across industries, the design space clusters into patterns. Here is what I found.
6 min read ai agents architecture
When to Build an Agent and When to Build a Smarter Assistant
The difference between an AI agent and a smart assistant comes down to one thing: who drives the loop.
5 min read ai agents ai architecture agent design
Lambda Calculus as AI Reasoning Benchmark
I have used lambda calculus to test whether AI systems can actually reason through composition, or whether they are just pattern-matching their way to plausible outputs.
6 min read ai reasoning benchmarking formal methods
Apple's iCloud Keychain Escrow Security: How It Works and Why It Matters
Apple's iCloud Keychain escrow uses a key-splitting architecture that makes escrow records cryptographically inaccessible without the device passcode. This is how it actually works.
6 min read security apple cryptography
Matz's Ruby AOT Compiler: How Spinel Differs from YJIT/MJIT and What It Means for Production Ruby
Spinel compiles Ruby to standalone native binaries with 11.6x speedups over miniruby. I dug into the architecture to understand why AOT beats JIT for long-running services.
6 min read ruby compilers performance
The Anatomy of an Agent Loop: Perceive, Think, Act, Remember
The agent loop is not one thing. It is four distinct phases that run in sequence, and understanding each one is how you debug what breaks.
7 min read ai agents agent architecture loop design