Thank you for this work.
I tried to synthesize Japanese text using the CosyVoice2-0.5B model. After many attempts with various clean prompt_speech audio clips, the output still has quite loud music/noise in the background. Why is this happening, and how can I fix it?
for i, j in enumerate(cosyvoice.inference_cross_lingual('この新しいシステムのパフォーマンスには感心しています。楽しい会話が行われているのが聞こえます。', prompt_speech_16k, stream=False, text_frontend=False)):
    torchaudio.save('JA_SP.wav', j['tts_speech'], cosyvoice.sample_rate)
Thank you for your prompt answer. May I ask why this is the case?
During training, for some languages such as Chinese, English, Japanese, and Korean, we add a language tag at the start of the sentence. For others, such as Cantonese, we use an instruction like '请用粤语讲这句话' ("Please say this sentence in Cantonese"). Please check out the demo page for details.
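To illustrate the tagging idea, here is a minimal sketch of prepending a language tag to the input text before calling inference. The helper `add_language_tag` and the table `ASSUMED_LANGUAGE_TAGS` are hypothetical names, and the tag spellings are assumptions that are not confirmed in this thread, so please check them against the demo page first.

```python
# Hypothetical helper, not part of CosyVoice. The tag spellings below are
# assumptions; verify them against the demo page before relying on them.
ASSUMED_LANGUAGE_TAGS = {
    "zh": "<|zh|>",
    "en": "<|en|>",
    "jp": "<|jp|>",
    "ko": "<|ko|>",
}


def add_language_tag(text: str, lang: str) -> str:
    """Prepend the (assumed) language tag for `lang` unless it is already there."""
    tag = ASSUMED_LANGUAGE_TAGS[lang]  # raises KeyError for unknown languages
    if text.startswith(tag):
        return text
    return tag + text


tagged_text = add_language_tag("この新しいシステムのパフォーマンスには感心しています。", "jp")
print(tagged_text)
```

The tagged string would then be passed as the first argument to `cosyvoice.inference_cross_lingual(...)` in place of the raw Japanese text, with the other arguments unchanged.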
JA_SP_sample.zip
Thank you.