I tried generating speech from subtitles, and when I set beams=3 or 4, I get WAV files with roughly 20 s of extra silence.
https://imgur.com/a/iJ2N9JM
The first track is the original audio.
The second is the voice generated with beams=2.
The third is the voice generated with beams=3; the same happens with beams=4.
The strange thing is that these 20 seconds of silence are added only to a few sentences, such as "Jak wam idzie?", "nie ładnie", and "co kiedy"; the rest of the WAV files are fine:
https://imgur.com/el4Z5yt
Why is the silence not added to other short sentences such as "Nieładnie!", "Tato.", etc.?
I'm using the inference code from the documentation on this page:
https://coqui-tts.readthedocs.io/en/latest/models/xtts.html
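
To check which generated clips are affected without inspecting each waveform by eye, here is a minimal sketch that measures the silence at the end of a clip and optionally trims it. It is a workaround for inspection, not a fix for the underlying behaviour. The function names, the amplitude threshold, and the 24 kHz default sample rate are assumptions for illustration, and it assumes the extra silence sits at the end of the clip and that the samples are floats in the range -1.0 to 1.0.

```python
def trailing_silence_seconds(samples, sample_rate=24000, threshold=1e-3):
    """Return the duration (in seconds) of near-silent samples at the end.

    `samples` is a sequence of floats; `threshold` is an assumed
    amplitude below which a sample counts as silent.
    """
    last_loud = -1
    # Walk backwards until the first sample louder than the threshold.
    for i in range(len(samples) - 1, -1, -1):
        if abs(samples[i]) > threshold:
            last_loud = i
            break
    silent_count = len(samples) - (last_loud + 1)
    return silent_count / sample_rate


def trim_trailing_silence(samples, sample_rate=24000, threshold=1e-3,
                          keep_seconds=0.2):
    """Cut trailing silence, keeping `keep_seconds` of it as padding."""
    silent = trailing_silence_seconds(samples, sample_rate, threshold)
    extra = int(round(max(0.0, silent - keep_seconds) * sample_rate))
    return samples[:len(samples) - extra] if extra else samples


if __name__ == "__main__":
    # Synthetic clip: 1 s of signal followed by 2 s of silence.
    clip = [0.5] * 24000 + [0.0] * 48000
    print(trailing_silence_seconds(clip))
    print(len(trim_trailing_silence(clip)) / 24000)
```

Running `trailing_silence_seconds` over every generated clip and listing those above a few seconds would show whether the problem correlates with particular input sentences.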