https://github.com/viitor-ai/viitor-voice-nar
While it is not the only benchmark that matters, such a low WER (word error rate) for both English and Chinese is pretty impressive. It works out to about one word wrong out of every 100, which in TTS evaluation is excellent and implies the synthesized speech is highly intelligible.
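
To make the "one word in 100" figure concrete, here is a minimal sketch of how WER is usually computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. This is not the project's own evaluation script. The function name `word_error_rate` and the toy sentences are hypothetical, and whitespace tokenization is an assumption that suits English only; Chinese text would need character-level or segmenter-based tokens. In TTS evaluation the hypothesis is typically an ASR transcript of the synthesized audio.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()   # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Toy example: a 100-word reference with exactly one word transcribed wrongly.
reference = " ".join(f"w{i}" for i in range(100))
hypothesis = reference.replace("w50", "oops")
wer = word_error_rate(reference, hypothesis)
print(f"{wer:.2%}")  # prints 1.00%
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.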
