The voice names and the number of English voices suggest that it's using Kokoro. Kokoro also supports other languages. Any plans to expose those?
Do AI summaries also run locally or is there a hosted model involved? What model is that?
Hi! Yes, the premium voices are Kokoro. I’m only exposing the English voices right now because the rest of the pipeline around them is English-first and custom, especially pronunciation/G2P, QA, and timestamp awareness. I’d like to expand that over time, but I don’t want to overpromise multilingual support before the surrounding stack is ready. So I'm taking it one language at a time based on demand and feedback.
AI summaries are currently generated remotely, not locally. They currently use gpt-4o-mini. TTS and OCR run on-device; summarization is the cloud-backed feature.