
Lenny's Round Table

A virtual round table built on 4 years of Lenny's Podcast. Drag three guests onto a felt, ask a question, and they respond as themselves — grounded in their real transcripts, with verbatim-validated quotes. They address each other. Sometimes they disagree. Sometimes only one of three has anything to say and the other two stay quiet.

AI Agents · RAG · Persona Design · Buildathon

What it does

You drag three guests from a 287-person roster — Marty Cagan, Keith Rabois, Elena Verna, April Dunford, whoever — onto seats at a felt table, ask a question about something you're working on, and the seated guests respond as themselves. Every response is grounded in stance cards extracted from the guest's real episodes; tier-1 responses include a verbatim quote with the timestamp it came from. The guests address each other, not just you. When three seats are filled, a "Deal a hand" button appears and you can play no-limit Hold'em with the same three guests, each with persona-driven betting behaviour. Public visitors land in a demo mode that replays curated conversations turn-by-turn; an invite-token path unlocks the live experience.

Why I built it

Lenny opened the buildathon with a dataset of newsletters and podcast transcripts: four years, 355 newsletters, 299 podcasts. During brainstorming I kept coming back to the idea of an advisory board. I'm a product manager. I read the newsletter and listen to the podcast specifically because I want to learn about topics that resonate with my work. But what if I could just… ask the guests directly about a specific question or project I'm working on? Not one-on-one, but as an informal round table. The hypothesis: a small advisory dynamic (three guests, real disagreements, real citations) would be more useful than another search box or another database over the same corpus.

How it works

Each guest is an offline-baked persona pack: a ~50–80 KB JSON file produced by a 9-stage Node pipeline. The stages are preprocess (filter the guest's speech turns out of speaker-tagged transcripts) → bio → voice anchors (5–10 verbatim quotes for tone imitation) → stance cards → frameworks → adjacent topics → chunk embeddings → cross-guest disagreement graph → assemble. Stance cards are the core unit: ~30–80 "this guest believes X" claims per guest, each with a verbatim quote, citation, confidence score, and topic tag. Every quote is programmatically validated against the transcript; if it doesn't appear there as a contiguous substring, the stance is dropped. End state: 287 personas, 3,644 verbatim-validated quotes, and 82 cross-guest disagreements, all committed as plain JSON.
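The stance-card validation rule is small enough to sketch in a few lines of TypeScript. This is a minimal illustration, not the pipeline's code: the `StanceCard` fields, the `validateStances` name, and the whitespace normalization are all assumptions.

```typescript
// Hypothetical stance-card shape; field names are illustrative, not the project's actual schema.
interface StanceCard {
  claim: string;      // "this guest believes X"
  quote: string;      // supposedly verbatim supporting quote
  citation: string;   // e.g. episode id plus timestamp
  confidence: number; // 0..1
  topic: string;
}

// Assumption: collapse whitespace runs so transcript line breaks don't defeat the match.
// The write-up doesn't say whether the real pipeline normalizes anything.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Keep only cards whose quote occurs as a contiguous substring of the guest's transcript;
// everything else is dropped rather than repaired.
function validateStances(cards: StanceCard[], transcript: string): StanceCard[] {
  const haystack = normalizeWhitespace(transcript);
  return cards.filter((card) => {
    const needle = normalizeWhitespace(card.quote);
    return needle.length > 0 && haystack.includes(needle);
  });
}
```

Dropping rather than repairing keeps the rule binary: a card either carries a quote that can be found in the transcript, or it never reaches the persona pack.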

A single user message goes through three server phases. Intent scoring runs a small Claude model in parallel for each seated guest; each call returns want_to_speak, a one-line intent, and the stance cards the guest would lean on. A threshold filter then picks who actually responds, so a guest with nothing useful to add stays silent. Tiered response runs a larger model for the selected guests. Its prompt includes the guest's persona pack, the names of the other seated guests so they can address each other, and the disagreement-graph entries that pair this guest with the others at the table. Responses are constrained to a 5-tier hierarchy: tier 1 must include a verbatim quote, tier 4 is free synthesis with no quote, and tier 5 is "I have nothing useful here." Quote-bearing responses are validated at runtime against the persona pack's quote pool; if the model invents a quote, the response is auto-downgraded to tier 4 instead of publishing the fake.
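A minimal sketch of the intent-scoring fan-out and threshold filter, assuming `want_to_speak` is a 0..1 score and a cutoff of 0.5 (neither the score range nor the real threshold is stated). `IntentScore`, `selectSpeakers`, `pickResponders`, and `scoreGuest` are hypothetical names; `scoreGuest` stands in for the small-model call.

```typescript
// Hypothetical result of one per-guest intent-scoring call. The write-up names want_to_speak,
// a one-line intent, and the stance cards to lean on; a 0..1 numeric score is an assumption.
interface IntentScore {
  guest: string;
  want_to_speak: number;
  intent: string;
  stance_ids: string[];
}

// Assumed cutoff; the real threshold isn't published.
const SPEAK_THRESHOLD = 0.5;

// Threshold filter: guests below the cutoff stay silent; the rest are ordered by score.
function selectSpeakers(scores: IntentScore[], threshold: number = SPEAK_THRESHOLD): IntentScore[] {
  return scores
    .filter((s) => s.want_to_speak >= threshold)
    .sort((a, b) => b.want_to_speak - a.want_to_speak);
}

// Fan out one scoring call per seated guest in parallel, then apply the filter.
async function pickResponders(
  seated: string[],
  scoreGuest: (guest: string) => Promise<IntentScore>,
): Promise<IntentScore[]> {
  const scores = await Promise.all(seated.map((g) => scoreGuest(g)));
  return selectSpeakers(scores);
}
```

Ordering survivors by score is one plausible way to decide who speaks first; the write-up doesn't say how speaking order is chosen.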

The roster sidebar uses hybrid search. Name-shaped queries ("marty" → Marty Cagan) get a substring filter. For topic queries ("PLG", "how should I price my SaaS?"), two engines run in parallel on every keystroke: a server-side keyword recommender with a PM-specific synonym map (pricing → monetization, revenue, packaging; ~10 ms and $0 per query), and a client-side semantic search using Xenova/all-MiniLM-L6-v2, loaded in the browser on first focus (~6 MB of int8 weights plus a ~110 KB pre-built persona index). Keyword results render first; semantic results silently replace them when they arrive. A three-mode deployment (demo / production / dev) gates the LLM endpoints with per-IP rate limiting, same-origin checks, and strict zod bounds on every input.
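The keyword half of the hybrid search might look roughly like the sketch below. Only the pricing synonyms come from the description above; the other map entries, the `GuestProfile` shape, and hit-count scoring are assumptions. The semantic half (Transformers.js embeddings) and the render-then-replace behaviour are left out.

```typescript
// Illustrative synonym map. Only the "pricing" row comes from the write-up; the others are hypothetical.
const SYNONYMS = new Map<string, string[]>([
  ["pricing", ["monetization", "revenue", "packaging"]],
  ["price", ["pricing", "monetization", "revenue", "packaging"]], // hypothetical alias
  ["plg", ["product-led", "self-serve"]],                         // hypothetical
]);

// Hypothetical per-guest keyword profile, e.g. derived from stance-card topic tags.
interface GuestProfile {
  name: string;
  terms: string[]; // lowercased
}

function tokenize(query: string): string[] {
  return query.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Add synonyms for each query token so "price" also matches guests tagged "monetization".
function expandQuery(tokens: string[]): Set<string> {
  const expanded = new Set(tokens);
  for (const token of tokens) {
    for (const syn of SYNONYMS.get(token) ?? []) expanded.add(syn);
  }
  return expanded;
}

// Score = number of a guest's terms hit by the expanded query; no model, no network call.
function recommendGuests(query: string, guests: GuestProfile[], limit = 5): string[] {
  const terms = expandQuery(tokenize(query));
  return guests
    .map((g) => ({ name: g.name, score: g.terms.filter((t) => terms.has(t)).length }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((r) => r.name);
}
```

A map lookup plus a count needs no model download, which is why this half can answer before the in-browser embedding model is ready.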

What I learned

Three things stuck.

Hallucinations don't look like hallucinations when an LLM imitates a real person. The model will invent quotes fluently, in the person's voice, with a confidence indistinguishable from accuracy. I found this on day three, when Marty Cagan "quoted" something he'd never said. The product isn't shippable unless that's prevented. The architecture that survived: extract stances offline with verbatim quotes, validate every quote as a substring of the source transcript, and drop any stance whose quote doesn't validate. At runtime, classify every response into a 5-tier grounding hierarchy and auto-downgrade to a no-quote tier if the model's quote fails validation. Keep the answer. Drop the unverifiable quote. Never publish the fake.
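The runtime half of that rule (keep the answer, drop the quote, downgrade) can be sketched as follows. This assumes the quote arrives as its own field and that a fragment of any validated pool entry counts as grounded; `GuestReply`, `quoteIsGrounded`, and `enforceGrounding` are hypothetical names.

```typescript
// Tiers per the write-up: 1 = verbatim quote, 4 = free synthesis with no quote, 5 = nothing useful.
// What tiers 2 and 3 require isn't described, so this sketch checks any reply that carries a quote.
type Tier = 1 | 2 | 3 | 4 | 5;

// Hypothetical reply shape: the quote is a separate field, not embedded in the text.
interface GuestReply {
  tier: Tier;
  text: string;
  quote?: string;
}

// Assumption: a quote passes if it appears inside some entry of the persona pack's
// validated quote pool, which lets the model quote a fragment of a longer passage.
function quoteIsGrounded(quote: string, pool: readonly string[]): boolean {
  const q = quote.trim();
  return q.length > 0 && pool.some((entry) => entry.includes(q));
}

function enforceGrounding(reply: GuestReply, pool: readonly string[]): GuestReply {
  if (reply.quote !== undefined) {
    // Unverifiable quote: keep the answer, drop the quote, downgrade to free synthesis.
    return quoteIsGrounded(reply.quote, pool) ? reply : { tier: 4, text: reply.text };
  }
  // A tier-1 reply without a quote breaks its own contract, so it is downgraded too.
  return reply.tier === 1 ? { tier: 4, text: reply.text } : reply;
}
```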

Curated intent and recommended intent are different products that share a UI. With 25 hand-picked guests, the search box was a substring filter on names, and that was enough: you knew who you were looking for. With 286, picking turned into decision paralysis. Testers told me, "I don't know who to pick. Can you recommend who to talk to based on what I'm working on?" My first attempt was an "Auto-Pick" button that chose guests semi-randomly. It was awful; random selection threw away the one thing that made the experience yours. What worked instead was making the search box itself the recommender, routing by input shape: a name-shaped query gets a substring filter, and a topic-shaped query gets a hybrid keyword + semantic recommendation. No tab, no toggle. The user shouldn't have to pick which mode they're in; the system should detect it.
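One way to route by input shape, as a sketch: treat short queries whose words all prefix-match a roster name as name lookups, and send everything else to the recommender. The three-word limit and the prefix rule are assumptions, and `routeQuery` and `Route` are hypothetical names; the write-up doesn't describe the real detector.

```typescript
// Hypothetical routing result: matching roster names, or a topic query for the recommender.
type Route =
  | { kind: "name"; matches: string[] }
  | { kind: "topic"; query: string };

// Assumed heuristic for "name-shaped": at most three words, and every word is a prefix
// of some word in a roster name.
const MAX_NAME_WORDS = 3;

function routeQuery(query: string, roster: readonly string[]): Route {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  if (words.length > 0 && words.length <= MAX_NAME_WORDS) {
    const matches = roster.filter((name) => {
      const nameWords = name.toLowerCase().split(/\s+/);
      return words.every((w) => nameWords.some((n) => n.startsWith(w)));
    });
    if (matches.length > 0) return { kind: "name", matches };
  }
  // Anything that isn't clearly a name falls through to the topic recommender.
  return { kind: "topic", query: query.trim() };
}
```

Falling through to the topic route when no name matches means a misclassified query still gets recommendations instead of an empty list.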

The thing that makes a demo memorable is almost never on the spec. Eight days in, mostly as a joke, I built a poker side-mode where the three seated guests play no-limit Hold'em with persona-driven betting behaviour pulled from a per-guest decision function. It's not a serious poker product. It's the 90-second emotional cold-open that makes the guests feel like people, not chatbots. Every single early tester clicked the poker button before asking a real question. Every one. The lesson: leave room in the schedule for the late-stage detour that arrives uninvited.
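The per-guest decision function isn't described, so the following is only a hypothetical illustration of persona-driven betting: two invented knobs (`aggression`, `bluffRate`) bias a pot-odds baseline. `decideAction` and the type names are made up for this sketch, and it ignores stack sizes, position, and betting rounds.

```typescript
// Hypothetical persona knobs; the real decision function's inputs aren't described.
interface PokerPersona {
  aggression: number; // 0..1, how readily a strong hand becomes a raise
  bluffRate: number;  // 0..1, chance of raising when the odds say fold
}

type PokerAction =
  | { type: "fold" }
  | { type: "call" } // a call of 0 chips is a check
  | { type: "raise"; amount: number };

// strength: estimated hand equity in 0..1; toCall and pot are in chips.
// rand is injectable so a persona's behaviour can be replayed deterministically.
function decideAction(
  persona: PokerPersona,
  strength: number,
  toCall: number,
  pot: number,
  rand: () => number = Math.random,
): PokerAction {
  // Pot odds: the equity needed for a call to break even.
  const potOdds = toCall === 0 ? 0 : toCall / (pot + toCall);
  const roll = rand();
  if (strength >= potOdds) {
    // Profitable spot: aggressive personas turn more of these into raises.
    if (roll < persona.aggression * strength) {
      return { type: "raise", amount: Math.max(1, Math.round(pot * (0.5 + persona.aggression))) };
    }
    return { type: "call" };
  }
  // Unprofitable spot: fold unless the persona decides to bluff.
  if (roll < persona.bluffRate) {
    return { type: "raise", amount: Math.max(1, Math.round(pot * 0.75)) };
  }
  return { type: "fold" };
}
```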

Stack

Next.js · TypeScript · Anthropic · AI SDK · Transformers.js · dnd-kit · Framer Motion · Tailwind · Replit