Any self-hosted speech-to-text / text-to-speech LLM available?

Zeon@lemmy.world · 2 days ago

Any self-hosted speech-to-text / text-to-speech LLM available?

GameGod@lemmy.ca · edit-2 20 hours ago

Whisper is the way to go for speech-to-text (edit: had that backwards). whisper.cpp is decently fast too: https://github.com/ggerganov/whisper.cpp/releases/tag/v1.7.1 Get the binaries from the link on that page (god, GitHub usability sucks).

snekerpimp@lemmy.world · 20 hours ago

I thought Whisper was hallucinating huge chunks of text in that medical transcription app. Is it more reliable with smaller chunks?

Windex007@lemmy.world · 2 days ago

Whisper is fantastic and comes in different-sized models, so you can zero in on whatever gives you the best mix of speed and accuracy for the hardware you’ll be running it on.
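
For anyone wanting to try the size trade-off, here is a rough Python sketch. It assumes the `openai-whisper` package (`pip install openai-whisper`) and uses approximate VRAM figures as I recall them from that project's README, so treat the numbers as ballpark. `pick_model` and `transcribe` are hypothetical helper names for illustration, not part of Whisper itself.

```python
# Approximate VRAM needs in GB, ordered smallest to largest.
# Assumption: ballpark figures recalled from the openai/whisper README.
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model(available_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits the budget.

    Falls back to "tiny" if nothing fits.
    """
    best = "tiny"
    # Dicts preserve insertion order, so later matches are larger models.
    for name, need_gb in APPROX_VRAM_GB.items():
        if need_gb <= available_gb:
            best = name
    return best


def transcribe(path: str, available_gb: float) -> str:
    """Transcribe an audio file with the largest model that fits."""
    # Imported lazily so pick_model() works without the package installed.
    import whisper

    model = whisper.load_model(pick_model(available_gb))
    return model.transcribe(path)["text"]
```

Something like `transcribe("meeting.wav", available_gb=4)` would load the `small` model under these assumptions (`meeting.wav` is a placeholder filename). Bigger models are slower but usually more accurate, so it is worth starting small and stepping up only if the output is not good enough.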