Voice cloning from long audio #886

Open
xipingL opened this issue Jan 15, 2025 · 2 comments

Comments

@xipingL

xipingL commented Jan 15, 2025

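Is cloning from a single long audio clip supported? As far as I can tell, the project offers zero-shot cloning from a reference of only a few seconds. If I provide a single 3-5 minute recording together with its reference text, how can I get a good cloning result? Or is there an approach or project you would recommend? Thanks!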

@aluminumbox
Collaborator

Split the audio into sentences and pick one high-quality segment, or extract the embeddings yourself and average them.
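
The averaging option could look roughly like the sketch below. It is not CosyVoice code: `embed_fn` is a hypothetical callable standing in for whatever speaker-embedding extractor you use, and `split_fixed` cuts at fixed time windows only to keep the example short, whereas the suggestion above is to cut at sentence boundaries. Each chunk embedding is L2-normalized before averaging so that louder or longer chunks do not dominate the mean.

```python
import math
from typing import Callable, List, Sequence


def split_fixed(samples: Sequence[float], sample_rate: int,
                max_seconds: float) -> List[Sequence[float]]:
    """Cut a waveform into consecutive windows of at most max_seconds.

    Assumption: fixed windows stand in for real sentence splitting.
    """
    step = int(sample_rate * max_seconds)
    if step <= 0:
        raise ValueError("window length must be at least one sample")
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def average_embedding(chunks: Sequence[Sequence[float]],
                      embed_fn: Callable[[Sequence[float]], Sequence[float]]
                      ) -> List[float]:
    """Embed every chunk with embed_fn (hypothetical) and return the
    normalized mean of the normalized chunk embeddings."""
    if not chunks:
        raise ValueError("need at least one chunk")
    embs = [l2_normalize(embed_fn(chunk)) for chunk in chunks]
    dim = len(embs[0])
    mean = [sum(e[d] for e in embs) / len(embs) for d in range(dim)]
    return l2_normalize(mean)
```

The averaged vector replaces only the speaker embedding. The prompt speech tokens would still have to come from one short segment, paired with the transcript of that same segment.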

@xipingL
Author

xipingL commented Jan 15, 2025

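> Split the audio into sentences and pick one high-quality segment, or extract the embeddings yourself and average them.

Extracting the embedding works fine, but extracting speech_token raises an error. Or should I just skip extracting it?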

  File "C:\Users\root\Desktop\workspace\_tts\tts-collect\webui.py", line 89, in cosyvoice_gen
    for i, j in enumerate(cosyvoice.inference_zero_shot(text, prompt_text, prompt_speech_16k, stream=False)):
  File "C:\Users\root\Desktop\workspace\_tts\tts-collect\CosyVoice\cosyvoice\cli\cosyvoice.py", line 69, in inference_zero_shot
    model_input = self.frontend.frontend_zero_shot(i, prompt_text, prompt_speech_16k)
  File "C:\Users\root\Desktop\workspace\_tts\tts-collect\CosyVoice\cosyvoice\cli\frontend.py", line 150, in frontend_zero_shot
    speech_token, speech_token_len = self._extract_speech_token(prompt_speech_16k)
  File "C:\Users\root\Desktop\workspace\_tts\tts-collect\CosyVoice\cosyvoice\cli\frontend.py", line 83, in _extract_speech_token
    speech_token = self.speech_tokenizer_session.run(None,
  File "C:\ProgramData\miniconda3\envs\tts-show\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Add node. Name:'/Add_2' Status Message: /Add_2: right operand cannot broadcast on dim 0 LeftShape: {1,2341,1280}, RightShape: {1500,1280}
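
The shapes in the error show an `Add` node trying to add a table with 1500 rows to a feature sequence of 2341 frames, which is what you would see if the speech tokenizer had a fixed-length positional table and the prompt audio were too long for it. The sketch below assumes (not confirmed in this thread) that the tokenizer produces about 50 frames per second, so that 1500 frames correspond to 30 seconds, and trims the 16 kHz prompt to fit. All function names here are illustrative and not part of CosyVoice.

```python
from typing import Sequence


def frames_to_seconds(num_frames: int, frames_per_second: int = 50) -> float:
    """Convert encoder frames to seconds (50 frames/s is an assumption)."""
    return num_frames / frames_per_second


def max_prompt_samples(sample_rate: int = 16000, max_frames: int = 1500,
                       frames_per_second: int = 50) -> int:
    """Largest number of samples that still fits into max_frames frames."""
    return sample_rate * max_frames // frames_per_second


def trim_prompt(samples: Sequence[float], sample_rate: int = 16000,
                max_frames: int = 1500,
                frames_per_second: int = 50) -> Sequence[float]:
    """Keep only the leading part of the prompt that fits the frame limit."""
    limit = max_prompt_samples(sample_rate, max_frames, frames_per_second)
    return samples[:limit]
```

Under that assumption the failing prompt was roughly 2341 / 50 ≈ 46.8 seconds long, while the limit would be 30 seconds (480000 samples at 16 kHz). If you trim the audio, `prompt_text` has to be shortened to match what remains, which is one reason to cut at a sentence boundary instead of a hard sample limit.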
