The voices in Lumen are the same studio-grade voices you'd pay for on the biggest TTS platforms. The three sliders below them are what most people miss.
Stability
Stability controls how consistent the voice is across the read. Lower stability means more emotional range — pauses, sighs, energetic accents. Higher stability means a steadier, more predictable read.
- 0.3 — expressive, theatrical. Audiobooks, dialogue, storytelling.
- 0.5 — balanced. The default for most uses.
- 0.8 — flat, consistent. Tutorials, news, voiceover.
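If you set these values from a script rather than by hand, the three starting points above can be kept in a small lookup table. This is only a sketch: the preset names and the `stability_for` helper are made up for illustration and are not part of Lumen.

```python
# Hypothetical preset table. The numbers come from the list above;
# the preset names are illustrative, not Lumen identifiers.
STABILITY_PRESETS = {
    "expressive": 0.3,  # audiobooks, dialogue, storytelling
    "balanced": 0.5,    # the default for most uses
    "steady": 0.8,      # tutorials, news, voiceover
}


def stability_for(preset: str) -> float:
    """Return the suggested stability, falling back to the balanced default."""
    return STABILITY_PRESETS.get(preset, STABILITY_PRESETS["balanced"])


audiobook_stability = stability_for("expressive")
fallback_stability = stability_for("podcast")  # unknown name, so balanced
```

Treat the numbers as starting points and adjust by ear for each voice.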
Similarity
Similarity controls how closely the output matches the original voice sample. Higher similarity preserves more of the original voice's quirks — accent thickness, breath patterns, micro-pauses.
For most purposes, leave this at 0.75. Bump it to 0.9 if you're cloning your own voice and want maximum resemblance.
Style exaggeration
This one is subtle. At 0 it neutralises the voice's personality. At 1 it cranks the voice's defining traits to 11. For most voices, leaving this between 0.1 and 0.2 produces the most natural read. Push it to 0.5 or higher only for characters that should feel exaggerated.
Speaker boost
Speaker boost costs a little extra processing and improves resemblance to the original voice. Worth turning on for hero content.
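To keep the four settings together, for example when saving a preset per project, a small record type is enough. The sketch below is an assumption throughout: `VoiceSettings` and its field names are hypothetical rather than Lumen's own, the 0 to 1 range is inferred from the values quoted in this article, and 0.15 is simply the midpoint of the 0.1 to 0.2 range suggested for style exaggeration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the settings described above."""

    stability: float = 0.5            # balanced default
    similarity: float = 0.75          # suggested for most purposes
    style_exaggeration: float = 0.15  # assumed midpoint of 0.1 to 0.2
    speaker_boost: bool = False

    def __post_init__(self) -> None:
        # Assumption: every slider runs from 0 to 1.
        for name in ("stability", "similarity", "style_exaggeration"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


# A steady tutorial read, and a cloned voice tuned for maximum resemblance.
tutorial = VoiceSettings(stability=0.8)
cloned = VoiceSettings(similarity=0.9, speaker_boost=True)

print(asdict(tutorial))
print(asdict(cloned))
```

The validation catches typos such as 8 instead of 0.8 before a long render starts.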
Writing for TTS
Use ellipses for thinking pauses. Use em-dashes for sharp breaks. Use commas generously. Avoid abbreviations: write “Doctor” instead of “Dr.”, because TTS treats punctuation as a pacing cue and can read the period as the end of a sentence.
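For longer scripts, abbreviations can be expanded automatically before the text goes to the voice. The following is a minimal sketch, not a Lumen feature: apart from “Dr.”, the entries in the table are illustrative, and the `expand_abbreviations` helper is hypothetical.

```python
import re

# Illustrative table; extend it with the abbreviations your scripts use.
ABBREVIATIONS = {
    "Dr.": "Doctor",
    "Mr.": "Mister",
    "vs.": "versus",
    "approx.": "approximately",
}

# Match an abbreviation only when it is not glued to other word characters.
_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")(?!\w)"
)


def expand_abbreviations(text: str) -> str:
    """Replace each known abbreviation with its spoken form."""
    return _PATTERN.sub(lambda match: ABBREVIATIONS[match.group(1)], text)


script = "Dr. Lee vs. Mr. Kim"
print(expand_abbreviations(script))
```

A lookup like this cannot tell “Dr.” for Doctor from “Dr.” for Drive, and it swallows the period when an abbreviation ends a sentence, so skim the expanded script before rendering.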