background music/noise in the synthesized samples #863

taalua · 2025-01-10T04:15:34Z

Hi,

Thank you for this work.
I tried to synthesize Japanese text using the CosyVoice2-0.5B model. I tried many times with various clean prompt_speech audio clips, but the output always has quite loud music/noise in the background. Why is this happening, and how can I fix it?

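```python
import torchaudio

# `cosyvoice` (a loaded CosyVoice2-0.5B model) and `prompt_speech_16k` are defined earlier
text = 'この新しいシステムのパフォーマンスには感心しています。楽しい会話が行われているのが聞こえます。'
for i, j in enumerate(cosyvoice.inference_cross_lingual(text, prompt_speech_16k, stream=False, text_frontend=False)):
    torchaudio.save(f'JA_SP_{i}.wav', j['tts_speech'], cosyvoice.sample_rate)  # one file per output chunk
```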

JA_SP_sample.zip

Thank you.

aluminumbox · 2025-01-10T06:44:28Z

Change it to `<|jp|>この新しいシステ` (i.e., prepend the `<|jp|>` language tag to the Japanese input text).
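A minimal sketch of the suggested change, assuming the same `inference_cross_lingual(text, prompt_speech_16k, stream=False, text_frontend=False)` call as in the original snippet. The helpers `tag_japanese` and `synthesize_japanese` are hypothetical names introduced here for illustration; the only piece taken from the reply is the `<|jp|>` prefix. The model is passed in as a parameter, so any object with the same method can stand in for it.

```python
# Hypothetical helpers: prepend the <|jp|> tag before cross-lingual synthesis.
JP_TAG = "<|jp|>"  # language tag suggested in the reply above


def tag_japanese(text: str) -> str:
    """Return text with the <|jp|> tag at the start, added only once."""
    return text if text.startswith(JP_TAG) else JP_TAG + text


def synthesize_japanese(cosyvoice, text, prompt_speech_16k):
    """Run cross-lingual synthesis on tagged Japanese text.

    `cosyvoice` is assumed to be a loaded CosyVoice2 model; any object that
    exposes the same inference_cross_lingual method works.
    Returns the generated waveforms, one per output chunk.
    """
    outputs = cosyvoice.inference_cross_lingual(
        tag_japanese(text), prompt_speech_16k, stream=False, text_frontend=False
    )
    return [out["tts_speech"] for out in outputs]
```

Each returned waveform can then be written out as in the original snippet, e.g. `torchaudio.save(f'JA_SP_{i}.wav', wav, cosyvoice.sample_rate)`.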

taalua · 2025-01-10T21:44:39Z

Thank you for your prompt answer.
May I ask why this is the case?

aluminumbox · 2025-01-11T15:54:47Z

> Thank you for your prompt answer. May I ask, why this is the case?

During training, for some languages such as Chinese, English, Japanese, and Korean, we add a language tag at the start of the sentence. For other languages such as Cantonese, we use an instruction like '请用粤语讲这句话' ("Please say this sentence in Cantonese"). Please check out the demo page for details.
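The two conditioning schemes described above can be sketched as a small routing helper. This is an illustration under stated assumptions, not CosyVoice code: `prepare_input`, `LANGUAGE_TAGS`, and `INSTRUCTS` are hypothetical names, only the `<|jp|>` tag and the Cantonese instruction text come from this thread, and the `<|zh|>`, `<|en|>`, `<|ko|>` strings are assumptions to verify against the demo page.

```python
# Hypothetical helper: choose between a language tag and an instruction.

# Only "<|jp|>" is confirmed in this thread; the zh/en/ko tag strings
# are assumptions to check against the CosyVoice demo page.
LANGUAGE_TAGS = {
    "zh": "<|zh|>",
    "en": "<|en|>",
    "ja": "<|jp|>",
    "ko": "<|ko|>",
}

# Languages conditioned with an instruction instead of a tag.
# "请用粤语讲这句话" = "Please say this sentence in Cantonese".
INSTRUCTS = {
    "yue": "请用粤语讲这句话",
}


def prepare_input(text: str, lang: str) -> dict:
    """Return the text to synthesize plus an optional instruction string."""
    if lang in LANGUAGE_TAGS:
        # Tagged languages: the tag goes at the very start of the sentence.
        return {"text": LANGUAGE_TAGS[lang] + text, "instruct": None}
    if lang in INSTRUCTS:
        # Instructed languages: leave the text untagged, pass the instruction separately.
        return {"text": text, "instruct": INSTRUCTS[lang]}
    raise ValueError(f"no known conditioning for language {lang!r}")
```

When `instruct` is `None`, the tagged `text` would go to `inference_cross_lingual` as in the original snippet; otherwise `text` and `instruct` would go to the model's instruct interface (for CosyVoice2 presumably `inference_instruct2(text, instruct, prompt_speech_16k)`, which is an assumption worth checking).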
