Long-form Audio is built for podcasts, audiobooks, dialogues, and bedtime stories: multiple speakers, each with their own voice, taking turns.
Title and structure
Give your episode a clear title, because the title shapes how Lumen paces transitions. A two-speaker conversation is the default structure; add a third speaker for round-tables or three-narrator audiobooks.
Cast your voices
Tap each speaker to assign a voice. Pick voices that contrast — different accent, different timbre, different gender if appropriate. Contrast is what makes a conversation feel alive.
Write turns, not paragraphs
Short turns sound natural. Long, monologue-style turns sound like reading aloud. Aim for turns of one to three sentences most of the time, with an occasional longer turn for substance.
Use natural conversation moves
Real conversations have:
- Interruptions — “Wait, hold on—”
- Agreement noises — “Yeah, exactly.”
- Self-corrections — “Well actually, I think—”
- Tangents — “Speaking of which, I read…”
Sprinkle these in. Lumen's model handles them naturally.
Generate and review
A 10-minute podcast renders in about 90 seconds. If a turn sounds wrong, swap that speaker's voice or rewrite the turn — don't re-render the whole episode.