background music/noise in the synthesized samples #863

Open
taalua opened this issue Jan 10, 2025 · 3 comments


taalua commented Jan 10, 2025

Hi,

Thank you for this work.
I tried to synthesize Japanese text using the CosyVoice2-0.5B model. After many attempts with various clean prompt_speech audio files, the output still contains quite loud music/noise in the background. Why is this happening, and how can I fix it?

for i, j in enumerate(cosyvoice.inference_cross_lingual('この新しいシステムのパフォーマンスには感心しています。楽しい会話が行われているのが聞こえます。', prompt_speech_16k, stream=False, text_frontend=False)):
    torchaudio.save('JA_SP.wav', j['tts_speech'], cosyvoice.sample_rate)

JA_SP_sample.zip

Thank you.

@aluminumbox
Collaborator

Change it to <|jp|>この新しいシステ


taalua commented Jan 10, 2025

Thank you for your prompt answer.
May I ask why this is the case?

@aluminumbox
Collaborator

> Thank you for your prompt answer. May I ask why this is the case?

During training, for some languages, such as Chinese, English, Japanese, and Korean, we add a language tag at the start of the sentence. For others, such as Cantonese, we use an instruction like '请用粤语讲这句话' ("Please say this sentence in Cantonese"). Please check the demo page for details.
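
As a rough illustration of the language-tag advice above, here is a minimal sketch of a helper that prepends the tag before the text is passed to `inference_cross_lingual`. Only `<|jp|>` is confirmed in this thread; the other tag strings and the helper name `with_language_tag` are assumptions made for illustration, so check them against the demo page before relying on them.

```python
# Sketch only: "<|jp|>" comes from this thread. The other tag strings are
# assumptions and should be verified against the CosyVoice demo page.
LANGUAGE_TAGS = {
    "ja": "<|jp|>",
    "zh": "<|zh|>",  # assumed
    "en": "<|en|>",  # assumed
    "ko": "<|ko|>",  # assumed
}


def with_language_tag(text: str, lang: str) -> str:
    """Prepend the language tag for `lang` unless the text already starts with it.

    `with_language_tag` is a hypothetical helper, not part of CosyVoice.
    """
    tag = LANGUAGE_TAGS[lang]
    if text.startswith(tag):
        return text
    return tag + text


tts_text = with_language_tag(
    "この新しいシステムのパフォーマンスには感心しています。楽しい会話が行われているのが聞こえます。",
    "ja",
)

# The tagged text would then replace the raw string in the original call:
# for i, j in enumerate(cosyvoice.inference_cross_lingual(
#         tts_text, prompt_speech_16k, stream=False, text_frontend=False)):
#     torchaudio.save('JA_SP.wav', j['tts_speech'], cosyvoice.sample_rate)
```

The synthesis call is left commented out so the snippet runs without a loaded model. The helper leaves already-tagged text unchanged, so applying it twice does not duplicate the tag.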
