The sonic atelier

Refining voice
for the social era.

Curate the perfect tone. Refine the pacing. Bring your text to life, then export it as a TikTok-ready social video. All on-device. No account. No catch.

Enter the studio · How it works ↓

Built for laptop + Wi-Fi · ~300MB first-run download for text-to-speech, ~700MB if you add karaoke captions · cached after the first run
Chrome · Edge · Firefox · WebAssembly

Hear the voices

28 distinct personas running entirely on your hardware. Ideal for ASMR, podcasts, and cinematic narration.

Real samples, pre-rendered with the same Kokoro engine that runs in your browser. When you use the studio, your text never leaves your device.

5 of the 28 voices

Heart

WARM · CONVERSATIONAL · DEEP

"Hi. You just found a voice studio that costs nothing."

What you shape

Three dimensions.
One craft.

Tone

28 sculpted voices across American and British English. Each tagged, each curated, each character-driven. Pick the one that fits the moment.

Pacing

Insert silences with [pause:500]. Adjust playback speed. Tune karaoke word-highlights to hit the beat. Rhythm is half the performance.
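To make the pause markup concrete, here is a minimal sketch of how a script containing [pause:500] tags could be split into speech and silence segments. It is an illustration, not PixVoice's actual parser: the function name split_pauses is hypothetical, and the number is assumed to be a duration in milliseconds.

```python
import re
from typing import List, Tuple, Union

# Assumption: the tag is written [pause:N], where N is a whole number of
# milliseconds. The source only shows the example [pause:500].
PAUSE_TAG = re.compile(r"\[pause:(\d+)\]")

Segment = Tuple[str, Union[str, int]]


def split_pauses(script: str) -> List[Segment]:
    """Split a script into ("speech", text) and ("pause", ms) segments."""
    segments: List[Segment] = []
    cursor = 0
    for match in PAUSE_TAG.finditer(script):
        # Text between the previous tag (or the start) and this tag.
        text = script[cursor:match.start()].strip()
        if text:
            segments.append(("speech", text))
        segments.append(("pause", int(match.group(1))))
        cursor = match.end()
    # Whatever follows the last tag is spoken too.
    tail = script[cursor:].strip()
    if tail:
        segments.append(("speech", tail))
    return segments


print(split_pauses("Hi. [pause:500] You just found a voice studio."))
# [('speech', 'Hi.'), ('pause', 500), ('speech', 'You just found a voice studio.')]
```

Each speech segment would go to the voice engine, and each pause would become a stretch of silence of that length between clips.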

Frame

Export as a vertical video with burned-in karaoke captions in six curated styles. TikTok, Reels, Shorts, ready to upload without ever touching another editor.

Under the hood

No servers.
No invoices.

PixVoice runs Kokoro-82M for speech and Whisper-base for word-level alignment on capable desktop devices (mobile falls back to phoneme timing). Everything stays in your browser via WebAssembly. Models cache on first visit, so later sessions load in seconds. No accounts, no quotas.

First-run download

Kokoro-82M (text-to-speech): ~300MB
Whisper-base (only for karaoke captions): ~400MB
Total if you use karaoke: ~700MB
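As a rough worked example of the figures above, the sketch below adds up the first-run download and estimates how long it might take. The sizes come from the list above; the 50 Mbps link speed and both function names are assumptions for illustration only, and real download times will vary.

```python
# Approximate model sizes from the list above, in megabytes.
MODEL_SIZES_MB = {
    "kokoro_82m_tts": 300,
    "whisper_base_captions": 400,
}


def first_run_download_mb(with_karaoke: bool) -> int:
    """Total first-run download, with or without karaoke captions."""
    total = MODEL_SIZES_MB["kokoro_82m_tts"]
    if with_karaoke:
        total += MODEL_SIZES_MB["whisper_base_captions"]
    return total


def download_seconds(size_mb: float, link_mbps: float) -> float:
    """Idealized transfer time: megabytes * 8 bits, divided by megabits/s."""
    return size_mb * 8 / link_mbps


# Assumed 50 Mbps Wi-Fi link (hypothetical, not a measured figure).
print(first_run_download_mb(with_karaoke=False))  # 300
print(first_run_download_mb(with_karaoke=True))   # 700
print(download_seconds(700, 50))                  # 112.0
```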

Once cached, the models never re-download, so every subsequent visit loads in about 1.5 seconds.

The studio awaits.

Enter the studio
