The endless loop grew tiresome: downloading models, scripting tests, watching them falter on my setup while big tech smirks from afar. Trial-and-error was a slow grind, so Speechos came about: a web UI to drop in audio, switch models on the fly, and compare the results side by side.
All local; your data stays put. Mic input or file upload. It auto-detects GPU/CPU/RAM to pick sensible defaults, and everything can be overridden.
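Hardware-based defaults along these lines can be sketched with the standard library alone. This is an illustrative guess at the idea, not Speechos's actual detection code; probing for `nvidia-smi` as a GPU signal and the POSIX `sysconf` RAM lookup are my assumptions:

```python
import os
import shutil

def detect_hardware() -> dict:
    """Probe GPU presence, CPU count, and total RAM (illustrative sketch)."""
    # GPU: treat a working `nvidia-smi` on PATH as "CUDA GPU available".
    has_gpu = shutil.which("nvidia-smi") is not None
    cpus = os.cpu_count() or 1
    # Total physical RAM in GiB (POSIX-only; psutil would be the portable choice).
    ram_gib = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
    return {"gpu": has_gpu, "cpus": cpus, "ram_gib": round(ram_gib, 1)}
```

A UI like this would call something similar once at startup and seed the model pickers from the result.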
Built-in (no Docker): faster-whisper (tiny to large-v3), Vosk, Wav2Vec2, Piper, Kokoro, Bark, eSpeak, Chatterbox, emotion2vec+, HuBERT, Resemblyzer, Silero VAD.
Docker extras: XTTS, ChatTTS, Orpheus, Fish-Speech, Qwen3-TTS, Parler, MeloTTS, Speaches, NeMo, PyAnnote, and more.
Python/FastAPI backend, Next.js frontend, managed with uv and pnpm. ./dev.sh starts everything. MIT-licensed; scales from basic models on a 2GB-RAM CPU box to the full load on a 24GB GPU.
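The 2GB-to-24GB scaling could work as a simple tiering function like the one below. This is a hypothetical sketch: the memory cutoffs and per-tier model sets are my assumptions, not Speechos's actual configuration.

```python
def model_tier(ram_gib: float, has_gpu: bool, vram_gib: float = 0.0) -> list[str]:
    """Map available memory to an illustrative set of enabled models."""
    if has_gpu and vram_gib >= 24:
        # Full load: large STT plus the heavyweight generative TTS models.
        return ["faster-whisper-large-v3", "Bark", "Chatterbox", "Kokoro"]
    if has_gpu and vram_gib >= 8:
        return ["faster-whisper-medium", "Kokoro", "Piper"]
    if ram_gib >= 4:
        return ["faster-whisper-small", "Piper", "Silero VAD"]
    # 2GB CPU basics: the lightest STT/TTS pair.
    return ["faster-whisper-tiny", "eSpeak"]
```

For example, `model_tier(2, False)` lands on the tiny-model tier, while a 24GB GPU unlocks everything.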
Grab it if it speaks to you.