Add main track option to be used for speaker wav #15
base: main
…he voice track that is passed must be cut based on the subtitles. Signed-off-by: k-aito <[email protected]>
Signed-off-by: k-aito <[email protected]>
Hi, thanks for your contribution, I really appreciate it, and sorry for the late review. I suggest some changes to your PR because this code can cause an infinite loop:
```python
while True:
    try:
        tts_method(f"{entry_data['text']}",file_path=audio_path,**convert_param,**kwargs)
        break
    except Exception as e:
        print(f"Exception: {e}")
        print(f"Actual text: {entry_data['text']}")
        entry_data['text'] = input("New text: ")
```
This results in an infinite loop if the error is not caused by sentence splitting. Limit the while loop to a few attempts, and give the user a way to exit the loop or to end the script. If possible, only handle the sentence-splitting error; if any other error happens, let the script crash.
By the way, can you give an example that reproduces the sentence-splitting error? I can't reproduce it.
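A minimal sketch of what the reviewer asks for: a bounded number of attempts, a way to skip the entry or quit, and only one error type handled. It assumes the splitting failure surfaces as the `IndexError` shown in the traceback later in this thread (note that an unrelated bug could also raise `IndexError`). `synthesize_with_retry`, `SkipEntry`, and the `s`/`q` answers are hypothetical names, not part of sub-to-audio.

```python
class SkipEntry(Exception):
    """Hypothetical signal that the user chose to skip this subtitle entry."""


def synthesize_with_retry(tts_method, text, max_attempts=3, prompt=input, **tts_kwargs):
    """Call tts_method(text, **tts_kwargs), letting the user edit the text on failure.

    Only IndexError (assumed to be the sentence-splitting failure) is caught;
    any other exception propagates and stops the script.
    Returns the text that was finally synthesized.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            tts_method(text, **tts_kwargs)
            return text
        except IndexError as e:
            print(f"Attempt {attempt}/{max_attempts} failed: {e}")
            print(f"Current text: {text!r}")
            if attempt == max_attempts:
                raise  # give up: re-raise the last IndexError
            answer = prompt("New text (s = skip entry, q = quit): ").strip()
            if answer == "q":
                raise SystemExit("Aborted by user")
            if answer == "s":
                raise SkipEntry(text) from e
            if answer:
                text = answer  # empty input retries the same text
```

Inside the per-entry loop from the diff, the `while True:` block could become a call such as `synthesize_with_retry(tts_method, entry_data['text'], file_path=audio_path, **convert_param, **kwargs)` wrapped in `try`/`except SkipEntry: continue`, assuming that loop iterates over the subtitle entries. How a skipped entry should be handled downstream (for example, inserting silence) is left open here.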
Hello, you're right, I didn't think of a way to quit or skip. Do you have a preferred way to do that, or some code in mind? If you want to modify it directly, I don't mind.
I can reproduce the error with this code (the Python script and the .ass file I used for testing):
```
$ cat test.py
from subtoaudio import SubToAudio
sub = SubToAudio(fairseq_language="fra")
subtitle = sub.subtitle("test.ass")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="precise", output_path="vf.mp3", voice_conversion=False, speaker_wav="vo_vocal.wav")
$ cat test.ass
[Script Info]
; Script generated by FFmpeg/Lavc60.3.100
ScriptType: v4.00+
PlayResX: 384
PlayResY: 288
ScaledBorderAndShadow: yes
YCbCr Matrix: None

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,16,&Hffffff,&Hffffff,&H0,&H0,0,0,0,0,100,100,0,0,1,1,0,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:08.18,0:00:12.01,Default,,0,0,0,,Salut. Tu es la...?
Dialogue: 0,0:00:09.18,0:10:12.01,Default,,0,0,0,,1.
```
Maybe it's some combination of factors, such as using French and the "." being discarded. In any case, I get the following exception:
```
Traceback (most recent call last):
  File "/home/user/sub-to-audio/test.py", line 5, in <module>
    sub.convert_to_audio(sub_data=subtitle, tempo_mode="precise", output_path="vf.mp3", voice_conversion=False, speaker_wav="vo_vocal.wav")
  File "/home/user/sub-to-audio/subtoaudio/subtoaudio.py", line 120, in convert_to_audio
    tts_method(f"{entry_data['text']}",file_path=audio_path,**convert_param,**kwargs)
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/api.py", line 403, in tts_to_file
    wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/api.py", line 341, in tts
    wav = self.synthesizer.tts(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/utils/synthesizer.py", line 390, in tts
    outputs = synthesis(
              ^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/tts/utils/synthesis.py", line 221, in synthesis
    outputs = run_model_torch(
              ^^^^^^^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/tts/utils/synthesis.py", line 53, in run_model_torch
    outputs = _func(
              ^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/tts/models/vits.py", line 1150, in inference
    attn = generate_path(w_ceil.squeeze(1), attn_mask.squeeze(1).transpose(1, 2))
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
```
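Following the guess that some fragments end up with nothing the model can pronounce, a heuristic pre-check could flag subtitle texts containing a sentence fragment with no letters at all (such as `1.` or a lone `.`) before they reach TTS. It is only an assumption that such fragments trigger this `IndexError`; the regex below is a rough stand-in, not the splitter TTS actually uses, and `letterless_fragments` / `flag_suspicious_entries` are hypothetical helpers.

```python
import re

# Rough stand-in for a sentence splitter (NOT the one TTS uses):
# split on whitespace that follows ".", "!", "?" or "…".
_SENTENCE_BREAK = re.compile(r"(?<=[.!?…])\s+")


def letterless_fragments(text):
    """Return the sentence fragments of `text` that contain no letters at all."""
    fragments = [f for f in _SENTENCE_BREAK.split(text.strip()) if f]
    return [f for f in fragments if not any(ch.isalpha() for ch in f)]


def flag_suspicious_entries(texts):
    """Map entry index -> letterless fragments, for entries that have any."""
    flagged = {}
    for i, text in enumerate(texts):
        bad = letterless_fragments(text)
        if bad:
            flagged[i] = bad
    return flagged
```

For the two dialogue lines in `test.ass`, this flags only the second entry (`1.`). Whether such an entry should be spelled out (for example `un.` for a French model) or skipped is a separate choice.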
```python
if cut_speaker_wav:
    t1 = entry_data['start_time']
    t2 = entry_data['end_time']
    vocal = fullvocal[t1:t2]
    vocal.export(speaker_wav, format="wav")
```
I'm not sure if this is your intention. Does this mean that the voice of every subtitle entry will follow the original voice track at that point in the subtitles? So, is this a full voiceover?
It kind of does; at least, that was my idea. Ideally there would be a specific WAV for each speaker, but that would require a lot of preparation.
Let me describe my use case; it may make this easier to understand. My goal was to dub a movie that only exists with subtitles, which my friend cannot read, so I wanted to generate a dub.
First, I extract the vocals using https://github.com/Anjok07/ultimatevocalremovergui. It takes some time, but it gives me an "instrumental" track and a vocal track.
I use the vocal track with sub-to-audio so that each subtitle uses its own portion of the voice as speaker_wav. There are probably issues, for example when one subtitle contains two speakers, but the idea was to get something working with limited effort.
Finally, I mix the instrumental track, the dubbed track, and the vocals at a lower volume, because the extracted vocals sometimes also contain non-speech sounds like crying or laughing.
Don't hesitate to ping me if something in my explanation is unclear.
Have a good day
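The last step of that workflow (instrumental + dub + quieter original vocals) can be sketched with NumPy, assuming all three tracks have already been decoded to mono float arrays at a common sample rate; loading and saving the files is left out. The -12 dB default for the original vocals is an arbitrary illustrative value, not one given in this thread.

```python
import numpy as np


def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def mix_tracks(instrumental, dub, vocals, vocals_db=-12.0):
    """Sum three mono float tracks, attenuating the original vocals by `vocals_db` dB.

    Tracks are 1-D arrays of samples in [-1, 1] at the same sample rate; shorter
    tracks are treated as silence past their end. The result is clipped to [-1, 1].
    """
    tracks = [np.asarray(t, dtype=np.float64) for t in (instrumental, dub, vocals)]
    gains = (1.0, 1.0, db_to_gain(vocals_db))
    out = np.zeros(max(len(t) for t in tracks))
    for track, gain in zip(tracks, gains):
        out[: len(track)] += gain * track
    return np.clip(out, -1.0, 1.0)
```

With pydub, which the diff's `fullvocal[t1:t2]` and `.export(...)` calls suggest is already in use, a similar result is roughly `instrumental.overlay(dub).overlay(vocals - 12)`, keeping in mind that `overlay` keeps the length of the segment it is called on.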
Hello,
This PR adds an option to pass the complete vocal track as speaker_wav; the track is then cut according to the subtitle timings. I don't think it's the best approach, but it provides a way to clone the voice from the short vocal segment matching each subtitle.
I also hit some exceptions that made the script crash. I think the sentence splitting sometimes goes wrong and produces fragments that TTS cannot synthesize, such as "Salut.." and ".". The while loop lets the user enter modified text so that the script can continue. You are free to make any modifications if you think there is a better way to integrate this.
Thanks for your script. Have a good day.
Cheers