New Kitten TTS V0.8 models are out in three variants: 80M, 40M, and 14M parameters. The largest model has the highest quality, and the 14M variant reaches a new SOTA in expressivity among similarly sized models despite being under 25 MB. All models are highly expressive and realistic, with high-quality voices. Kitten TTS is an open-source series of tiny, expressive text-to-speech models for on-device applications, built with <3 by KittenML.
This release supports English text-to-speech in eight voices: four male and four female. Most models are quantized to int8 + fp16 and run via ONNX Runtime. They are designed to run literally anywhere: Raspberry Pi, low-end smartphones, wearables, browsers, etc. No GPU required! This release bridges the gap between on-device and cloud models for TTS applications.
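Since the runtime is plain ONNX, you can sanity-check the CPU-only path with stock onnxruntime. This is just an illustrative sketch: the model file name is a placeholder, and the real input/output signature should be read off the exported graph itself.

```python
import onnxruntime as ort

# Load the exported model with the CPU execution provider only (no CUDA/GPU needed).
# "kitten_tts_nano.onnx" is a placeholder file name, not the actual release artifact.
sess = ort.InferenceSession("kitten_tts_nano.onnx", providers=["CPUExecutionProvider"])

# Inspect what the graph expects before feeding it text/phoneme inputs.
for inp in sess.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print("output:", out.name, out.shape, out.type)
```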
Multi-lingual support is planned for the future.
We'd love your feedback! On-device AI is currently bottlenecked by the availability of tiny, performant models. Over the next few months, we're trying to change that by releasing open-source models that can unlock on-device voice agents and applications.
Code, weights, and more information are available on our GitHub: https://github.com/KittenML/KittenTTS
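For a feel of what usage looks like, here's a rough quick-start sketch. The package import, model identifier, voice name, and output sample rate below are assumptions on my part; the repo README is the source of truth for the actual API.

```python
# Rough quick-start sketch -- names below (package, model id, voice, sample rate)
# are assumptions; check the KittenTTS README for the real API.
from kittentts import KittenTTS
import soundfile as sf

tts = KittenTTS("KittenML/kitten-tts-nano")   # assumed model identifier
audio = tts.generate(
    "Kitten TTS runs on CPU, no GPU required.",
    voice="expr-voice-2-f",                   # assumed name for one of the eight voices
)

sf.write("output.wav", audio, 24000)          # sample rate assumed to be 24 kHz
```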
Some actual audio examples would be nice. I'd like to see what this is before taking the time to run it
we also launched on reddit and got great feedback on r/LocalLLaMA. the video with samples is posted there too.
hi, the readme on the github has a video. all of the audio in it is output from the models ^^
would love the feedback.
Thank you, this is what I'm excited about. I could run this on a raspberry pi and easily build a locally run home assistant.