Field notes

Notes from the work.

Short pieces on AI, building, and what we're noticing along the way. Written when we have something specific to say.

Counter-take

11 May 2026 6 min read

The Demo Trap

Most AI demos optimize for convincing the room. The ones that translate to production optimize for telling the truth. Four sins of demos that lie, and the test that catches them.

Playbook

11 May 2026 7 min read

When Your Eval Set Stops Telling the Truth

Your eval set is supposed to be the truth. It can quietly stop being that. Four ways evals lie, the quarterly audit that catches them, and the discipline of killing examples.

Playbook

04 May 2026 7 min read

Latency Budgets for AI Features

Cost gets a ceiling. Latency rarely does — until users churn. The interactive bar, the budget hierarchy, and the four levers when you're over the line.

Playbook

07 Apr 2026 7 min read

Structured Outputs and the Validation Trap

How to get a model to return clean JSON without paying the retry tax. Schema design, the validation pattern, and the three schema failures that quietly leak budget.

Decision framework

17 Mar 2026 7 min read

Picking a Model Size for a Given Task

Smaller models do most of the work — when given the right work. A five-step process for sizing models to tasks, and the three signals you've picked wrong.

Counter-take

23 Feb 2026 8 min read

Tool Use vs. Agents: Knowing When to Add Steps

Most "we need an agent" problems are tool-use problems, and most tool-use problems are prompt problems. The hierarchy of complexity — and the cost of skipping a rung.

Counter-take

02 Feb 2026 7 min read

Retrieval That Earns Its Keep

Most RAG isn't worth it. A four-question test for when to add retrieval, the three failure modes that turn it into a debugging burden, and what to try first.

Discipline

12 Jan 2026 6 min read

Prompt Versioning Without the Hairball

How to keep a prompt from becoming 30 untracked variants in 30 places. A four-rule discipline that scales from one prompt to a hundred.

Playbook

01 Dec 2025 7 min read

Cost Ceilings for AI Features

Most AI features die of cost, not quality. Set the unit-economics ceiling before you ship, watch the four cost vectors, and know the three levers when you're over budget.

Playbook

10 Nov 2025 7 min read

Logging for LLM Systems

What to capture before you regret not capturing it. The minimum log schema — and the three questions it should let you answer in under five minutes.

Design PT-BR

20 Oct 2025 6 min read

Where Humans Belong in Your AI Loop

Every AI feature has humans somewhere. Most teams put them in the wrong place. Four placement modes — and a four-question test for picking the right one.

In Portuguese →

Playbook PT-BR

29 Sep 2025 8 min read

Evals Before Features

The unit-test playbook for LLM systems. How to build your first 50-example eval set in a week — and why every team that skips this step pays for it later.

In Portuguese →

Diagnostic PT-BR

08 Sep 2025 7 min read

The AI Ambition Gap

Almost every team has shipped an AI demo. Almost none have shipped an AI feature their users rely on every day. A diagnostic — and a three-question filter to get unstuck.

In Portuguese →

Colophon · Set in Inter · From Rio de Janeiro · RSS · 21x · BR

Get in touch

Working on something hard?

We help founders and teams turn AI ambition into systems that ship and stay shipped. If that's you, write to us — short and direct is fine.

Talk to us