Writing

Posts and essays. Each card links out to where the writing actually lives.

Posts

The State of MCP Servers: I measured 10 MCPs under 4 models

I ran mcp-dyno across 10 popular MCP servers and 4 models — 40 runs — and 'tokens per task' lied every time: the premium model wasn't the leanest, two search servers were 8× apart on payload, and some tools were unusable on first reach. What a six-pillar measurement reveals that a single number hides.

Substack

I tried to improve my MCP. The hard part was knowing if I had.

Optimizing an MCP server is the easy part; knowing whether you actually improved it is the hard part. How a noisy benchmark — and a 'cheaper, faster' server that turned out to be 23% correct — led me to build mcp-dyno, a CLI that measures an MCP across five lenses with real error bars.

Substack

What benchmarking my own MCP server taught me about building tools for LLMs

A journey from 22 hand-built tools to a single execute_code tool, through a regression I caused myself, to an ablation methodology that fixed it. With actual numbers.

Substack

I built an AI round table from 4 years of Lenny's Podcast — here's what almost killed it, and what saved it

A 10-day buildathon story about fake quotes, 286 guests, design dead-ends, and an accidental poker game. Plus the LLM failure mode the entire architecture was built to prevent.

Substack