// Writing

Thinking out loud.

Writing on AI infrastructure, developer tooling, and technical content strategy. Practitioner-level posts from someone who builds the systems and writes about them.

Fine-Tuning vs RAG for Agent Memory: When Each Approach Makes Sense

Fine-tuning and RAG solve different parts of the agent memory problem. Here is how to decide which one you actually need.

8 min read ai-agents rag fine-tuning

Why Your Agent Remembers the Wrong Thing: Memory Attribution Failures

Memory attribution failures cause AI agents to act on outdated or misassigned context. Here is what I found examining the failure patterns.

6 min read ai-agents agent-memory production-errors

Shared Memory vs Isolated Memory in Multi-Agent Workflows

How to choose between shared and isolated memory architectures for multi-agent systems, with trade-offs from production deployments.

6 min read ai-agents multi-agent agent-memory

Building a Customer Support Agent with Persistent Memory: A Worked Example

I built a customer support agent that actually remembers across sessions. Here is what I learned about memory architecture, serialization trade-offs, and the failure modes that will kill your deployment.

9 min read ai-agents memory customer-support

LLM Inference Optimization: What Actually Works in Production

A practical breakdown of the inference optimization techniques that move the needle (batching, quantization, caching, and attention kernels), with concrete numbers and the trade-offs between them.

6 min read ai llm infrastructure

Why Your Coding Agent Keeps Forgetting Everything: Memory Persistence in AI Coding Assistants

The memory persistence patterns that actually work for AI coding assistants, and why most agents lose context between sessions.

8 min read ai-agents coding memory

Contextual Compression for Agent Memory: What Stays and What Goes

How agents decide what to keep in memory when context space is finite, and the three compression strategies that actually work.

7 min read ai-agents agent-memory context-windows

Memory Versioning and Audit Trails for Regulated AI Agents

If your agent overwrites its memory, you cannot pass a compliance audit. How to build append-only memory versioning and trace agent reasoning.

6 min read ai agents agent-memory compliance

Episodic, Semantic, and Working Memory in AI Agents: A Practical Map

AI agents juggle three distinct memory types. Getting them wrong is the source of most agent memory failures I see in production.

6 min read ai-agents agent-memory llm-architecture

Memory Serialization: How Agents Persist State Across Sessions

Why agents forget everything on restart, and the serialization patterns that actually solve it.

6 min read ai-agents agent-memory agent-architecture

The HERMES.md Bug That Silently Burned $200 in Claude Code Credits

A case-sensitive string in git commit messages routes Claude Code API requests through extra usage billing instead of plan quota. Here is how it works and what it means for teams.

6 min read claude anthropic billing

The Memory Hierarchy: Why RAG Alone Is Not Enough for Agent Memory

RAG handles document retrieval well. It handles agent memory poorly, because agents need episodic recall, working context, and cross-session persistence that a vector store cannot provide.

6 min read ai-agents agent-memory rag

Why Agent Memory Retrieval Is Asymmetric and Why It Breaks Your RAG Pipeline

The retrieval patterns that work for a user's query do not work when the agent itself is the one retrieving, and this asymmetry silently breaks production RAG systems.

5 min read ai-agents agent-memory rag

Multi-Agent vs Single-Agent Systems: The Real Trade-offs

The decision between one agent and many is not about capability. It is about failure modes, latency, and operational complexity.

5 min read ai agents multi-agent systems agent architecture

The Agent Design Space: A Map of What Engineers Are Actually Building

After surveying production agents across industries, the design space clusters into patterns. Here is what I found.

6 min read ai agents architecture

When to Build an Agent and When to Build a Smarter Assistant

The difference between an AI agent and a smart assistant comes down to one thing: who drives the loop.

5 min read ai agents ai architecture agent design

Lambda Calculus as an AI Reasoning Benchmark

I have used lambda calculus to test whether AI systems can actually reason through composition, or whether they are just pattern-matching their way to plausible outputs.

6 min read ai reasoning benchmarking formal methods

Apple's iCloud Keychain Escrow Security: How It Works and Why It Matters

Apple's iCloud Keychain escrow uses a key-splitting architecture that makes escrow records cryptographically inaccessible without the device passcode. This is how it actually works.

6 min read security apple cryptography

Matz's Ruby AOT Compiler: How Spinel Differs from YJIT/MJIT and What It Means for Production Ruby

Spinel compiles Ruby to standalone native binaries with 11.6x speedups over miniruby. I dug into the architecture to understand why AOT beats JIT for long-running services.

6 min read ruby compilers performance

The Anatomy of an Agent Loop: Perceive, Think, Act, Remember

The agent loop is not one thing. It is four distinct phases that run in sequence, and understanding each one is how you debug what breaks.

7 min read ai agents agent architecture loop design