Daneel can now read answers aloud, and take your questions by voice. You will find a mic button in every chat composer, a Play button on every assistant message, and a fresh Speech section in Settings where you choose how it all sounds.
It is the kind of feature that looks small on a screenshot and changes how you use the extension. Read a long technical reply while you tidy your desk. Dictate a question while the other hand is on the trackpad. Have a research session narrate itself in the background.
Three voices, your choice
Text-to-speech in Daneel is not one engine hiding behind a toggle. It is a real picker, and the three options make sense for three different moments.
System voices, the default, come straight from your operating system through the browser's Speech Synthesis API. Nothing downloads. Nothing streams out. Your machine has had these voices for years; Daneel just puts them to work. They sound fine, they start instantly, and they cover every language your OS does.
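For the curious, putting those system voices to work is only a few lines against the Speech Synthesis API. This is a sketch, not the extension's actual code: `pickVoice` is a hypothetical helper, and only the browser calls in the trailing comments are real API.

```typescript
// The shape of a speechSynthesis.getVoices() entry that matters here.
interface VoiceInfo {
  name: string;
  lang: string;      // BCP 47 tag, e.g. "en-US"
  default: boolean;  // the OS default voice
}

// Hypothetical helper: first voice matching the requested language,
// else the platform default, else whatever is first in the catalog.
function pickVoice(voices: VoiceInfo[], lang: string): VoiceInfo | undefined {
  const want = lang.toLowerCase();
  return (
    voices.find(v => v.lang.toLowerCase().startsWith(want)) ??
    voices.find(v => v.default) ??
    voices[0]
  );
}

// In the browser:
//   const voice = pickVoice(speechSynthesis.getVoices(), "en");
//   const u = new SpeechSynthesisUtterance(text);
//   if (voice) u.voice = voice as SpeechSynthesisVoice;
//   speechSynthesis.speak(u);
```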
Kokoro 82M is the one to pick when you want everything to stay on your device. It is a full neural TTS model, 82 million parameters, with 54 voices across seven languages. You download it once (around 326 megabytes) and from then on, every word of every assistant reply is synthesized locally on your GPU. No round-trip, no telemetry, no cloud. The voices sing, too: expressive, natural, and a notable step up from the flat cadence of most system voices.
Google Cloud voices live behind an advanced toggle because they stream text to Google's servers. They are hidden by default to keep the privacy story honest. When you opt in, though, the quality is genuinely remarkable, arguably the smoothest prosody you can get in a browser today. A good fit for the times when voice naturalness matters more than data locality.
All three sit behind the same interface. Switch between them in Settings, and the next click on Play uses the new provider. No restart, no reload.
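In code terms, "the same interface" means something like the following. The names here are hypothetical, not the extension's actual types; the point is that switching providers is just swapping the object behind one interface, which is why the very next Play click uses the new engine.

```typescript
// One interface, three providers.
interface TtsProvider {
  readonly id: "system" | "kokoro" | "google";
  listVoices(): Promise<string[]>;
  speak(text: string, voice: string, rate: number): Promise<void>;
  stop(): void;
}

class SpeechController {
  private providers: Map<string, TtsProvider>;
  private active: TtsProvider;

  constructor(providers: Map<string, TtsProvider>, initial: string) {
    this.providers = providers;
    this.active = providers.get(initial)!;
  }

  // Swapping providers preempts anything in flight; no restart, no reload.
  setProvider(id: string): void {
    this.active.stop();
    this.active = this.providers.get(id) ?? this.active;
  }

  // Play always routes to whatever `active` points to right now.
  play(text: string, voice: string, rate: number): Promise<void> {
    return this.active.speak(text, voice, rate);
  }
}
```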
Kokoro, the locality story
The fun engineering is all in Kokoro. It runs entirely in the browser, on WebGPU, inside a dedicated Web Worker hosted by the extension's headless page. Synthesis produces raw 24 kHz PCM, which is piped into a host-side AudioContext that schedules chunks back-to-back with sample-accurate timing. The result is gapless playback across sentences, even for long replies.
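The scheduling trick is small enough to show. This is a sketch with hypothetical names, not the extension's actual code: a playback cursor tracks where the previous chunk ends in AudioContext time, and each new chunk starts at the later of the cursor and "now," so on-time chunks butt up against each other exactly while a late chunk starts immediately instead of overlapping.

```typescript
// Compute where a newly synthesized chunk should start playing.
function scheduleChunk(
  cursor: number,       // where the last scheduled chunk ends (seconds)
  currentTime: number,  // AudioContext.currentTime when this chunk is ready
  duration: number,     // this chunk's length in seconds
): { start: number; cursor: number } {
  const start = Math.max(cursor, currentTime);  // never schedule in the past
  return { start, cursor: start + duration };
}

// Host-side playback (browser only), assuming `samples` is 24 kHz mono PCM:
//   const ctx = new AudioContext({ sampleRate: 24000 });
//   const buf = ctx.createBuffer(1, samples.length, 24000);
//   buf.copyToChannel(samples, 0);
//   const src = ctx.createBufferSource();
//   src.buffer = buf;
//   src.connect(ctx.destination);
//   src.start(start);  // `start` from scheduleChunk, sample-accurate
```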
A few decisions worth naming. We run Kokoro in fp32 on WebGPU, not the smaller quantized build, because the quantized build drags its dequantization ops onto the CPU; avoiding that single bottleneck takes synthesis from 20 seconds down to 1. We chunk long replies using Kokoro-tuned bounds, with a forward-merge pass that protects the model from very short inputs, where its prosody can wobble. Each new Play click cleanly preempts any prior one, and synthesis pipelines one chunk ahead of playback, so you never hear a pause between sentences.
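The forward-merge pass is easy to sketch. The bound and names below are illustrative, not the extension's tuned values, and the sentence split is deliberately naive; the real chunker is more careful with abbreviations and also enforces an upper bound per chunk.

```typescript
// Illustrative lower bound, not the extension's tuned value.
const MIN_CHARS = 20;

// Naive sentence split on terminal punctuation.
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  return parts.map(s => s.trim()).filter(s => s.length > 0);
}

// Forward-merge: a chunk below the minimum absorbs the sentences after
// it until it is long enough, so the model never sees a tiny input
// where its prosody can wobble. (A short final chunk has nothing left
// to absorb and passes through as-is.)
function chunkForTts(text: string): string[] {
  const sentences = splitSentences(text);
  const chunks: string[] = [];
  let i = 0;
  while (i < sentences.length) {
    let chunk = sentences[i++];
    while (chunk.length < MIN_CHARS && i < sentences.length) {
      chunk += " " + sentences[i++];
    }
    chunks.push(chunk);
  }
  return chunks;
}
```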
Because the AudioContext lives on the host page and not the tab you are browsing, audio keeps playing when you switch tabs. A small thing that is easy to miss until you switch away from a long article and the narration just continues.
Hold Alt+Space to dictate
Speech recognition uses the browser's built-in recognizer for now, which is cloud-backed on Chrome (audio streams to Google). The mic button lives in the composer, the keyboard shortcut is Alt+Space from anywhere on the page, and the transcript lands in your input box without auto-submitting. You get to read it before you send. When Airplane Mode is on, the mic disables itself with an explanatory tooltip, no accidental cloud round-trips.
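The two guards around dictation are simple enough to sketch. Everything here is hypothetical naming, not the extension's actual code; the shape of the logic is the point: the shortcut must be Alt+Space with no other modifiers, and Airplane Mode wins over everything else because Chrome's built-in recognizer streams audio to Google.

```typescript
// Does a keydown event match Alt+Space, with nothing else held?
interface KeyEventLike {
  altKey: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
  code: string;  // physical key, e.g. "Space"
}

function isDictationShortcut(e: KeyEventLike): boolean {
  return e.altKey && !e.ctrlKey && !e.metaKey && e.code === "Space";
}

// Airplane Mode disables the mic unconditionally.
function micAvailable(airplaneMode: boolean, recognizerSupported: boolean): boolean {
  return recognizerSupported && !airplaneMode;
}

// In the browser, roughly:
//   window.addEventListener("keydown", (e) => {
//     if (isDictationShortcut(e) &&
//         micAvailable(settings.airplaneMode, "webkitSpeechRecognition" in window)) {
//       startDictation();  // transcript lands in the composer; nothing auto-submits
//     }
//   });
```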
A fully local speech-to-text option, based on the Moonshine model family, is the next piece. The catalog entries are already there, marked "Coming soon." The same privacy story as Kokoro, applied in the other direction.
The settings worth knowing
A single Speech section in Settings holds everything. Toggle either side on or off, pick a provider, pick a specific voice from the list (Web Speech gives you the OS catalog, Kokoro its 54), drag the rate slider between 0.5× and 2.0×, and flip Auto-read if you would rather every assistant message play itself without needing the button. There is also a Test button next to the voice picker, which is genuinely useful when you are trying to decide between, say, Bella and Nicole on Kokoro.
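Under the hood, those settings amount to roughly this shape. Field names here are hypothetical, not the extension's actual schema; the one detail worth showing is the rate clamp, which mirrors the slider's 0.5×–2.0× range so a bad stored value can never escape it.

```typescript
// Hypothetical shape of the Speech settings.
interface SpeechSettings {
  ttsEnabled: boolean;
  sttEnabled: boolean;
  provider: "system" | "kokoro" | "google";
  voice: string;
  rate: number;       // playback rate, 1.0 = normal
  autoRead: boolean;  // speak every assistant message automatically
}

const MIN_RATE = 0.5;
const MAX_RATE = 2.0;

function clampRate(rate: number): number {
  if (!Number.isFinite(rate)) return 1.0;  // corrupt value: fall back to normal
  return Math.min(MAX_RATE, Math.max(MIN_RATE, rate));
}
```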
The download for Kokoro runs inline in the card, with progress. You can remove the model any time and the card goes back to offering the download, no disk left behind.
What is next
Moonshine local STT is the next shipping piece, bringing the same locality guarantee to dictation. Beyond that, we are looking at giving users a dtype choice for Kokoro (q8 for WASM-only devices, smaller download), adding a quick-switch shortcut for voices, and surfacing a visible "now speaking" indicator for longer replies.
For now, open Settings → Speech, download Kokoro if you have the patience for a one-time 326 MB pull, and let the assistant read you the next long reply. You might find it is the way you actually wanted to consume it all along.