Add main track option to be used for speaker wav #15

k-aito · 2023-12-05T22:24:14Z

Hello,

This PR adds an option to treat the speaker_wav file as a full-length voice track that is cut per subtitle. I don't think it's the best approach, but it gives a way to run voice cloning on the short vocal segment that matches each subtitle.

I also hit some exceptions that crashed the script. I think the sentence splitting sometimes goes wrong and produces text that TTS cannot synthesize, such as "Salut.." or ".". The while loop lets the user enter modified text so that the conversion can continue.

You are free to make any modifications if you think there could be a better integration.

Thanks for your script. Have a good day.

Cheers

bnsantoso

Hi, thanks for your contribution, it is much appreciated, and sorry for the late review. I suggest some changes to your PR because it can cause an infinite loop.

bnsantoso · 2023-12-09T21:07:35Z

subtoaudio/subtoaudio.py

+        while True:
+          try:
+            tts_method(f"{entry_data['text']}",file_path=audio_path,**convert_param,**kwargs)
+            break
+          except Exception as e:
+            print(f"Exception: {e}")
+            print(f"Actual text: {entry_data['text']}")
+            entry_data['text'] = input("New text: ")


This will result in an infinite loop if the error is not caused by sentence splitting. Please limit the while loop to a few attempts and give the user a way to quit the loop or end the script. If possible, handle only the sentence-splitting error; if any other error happens, let the script crash.

By the way, can you give an example that reproduces the sentence-splitting error? I can't reproduce it.
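
For illustration, a bounded version of that loop could look like the sketch below. It is only a sketch under assumptions: `synthesize_with_retry`, `max_attempts`, and the injectable `ask` callback are hypothetical names that are not part of sub-to-audio, and it treats `IndexError` (the exception type in the traceback reported in this thread) as the only recoverable error.

```python
def synthesize_with_retry(tts_method, text, max_attempts=3, ask=input, **tts_kwargs):
    """Call tts_method(text, **tts_kwargs), allowing a limited number of text fixes.

    Returns the text that was synthesized, or None if the user skipped the entry.
    Only IndexError is caught (an assumption based on the reported traceback);
    any other exception propagates and stops the script.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            tts_method(text, **tts_kwargs)
            return text
        except IndexError as exc:
            print(f"Attempt {attempt}/{max_attempts} failed: {exc}")
            print(f"Actual text: {text}")
            if attempt == max_attempts:
                raise  # out of attempts: fail loudly instead of looping forever
            answer = ask("New text (empty to skip, 'q' to quit): ").strip()
            if answer == "":
                return None  # skip this subtitle entry
            if answer.lower() == "q":
                raise SystemExit("Aborted by user")
            text = answer
    return None
```

With this shape, the call site would pass the existing arguments through, for example `synthesize_with_retry(tts_method, entry_data['text'], file_path=audio_path, **convert_param, **kwargs)`, and skip the entry when `None` comes back.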

k-aito · 2023-12-11

Hello, it's true, I didn't think of a way to quit or skip. Do you have a preferred way to do that, or maybe some code in mind? If you want to modify it directly, I don't mind.

I could reproduce the error with the following code (the Python script and the .ass file used for testing):

$ cat test.py
from subtoaudio import SubToAudio
sub = SubToAudio(fairseq_language="fra")
subtitle = sub.subtitle("test.ass")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="precise", output_path="vf.mp3", voice_conversion=False, speaker_wav="vo_vocal.wav")

$ cat test.ass
[Script Info]
; Script generated by FFmpeg/Lavc60.3.100
ScriptType: v4.00+
PlayResX: 384
PlayResY: 288
ScaledBorderAndShadow: yes
YCbCr Matrix: None
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,16,&Hffffff,&Hffffff,&H0,&H0,0,0,0,0,100,100,0,0,1,1,0,2,10,10,10,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:08.18,0:00:12.01,Default,,0,0,0,,Salut. Tu es la...?
Dialogue: 0,0:00:09.18,0:10:12.01,Default,,0,0,0,,1.

Maybe it's a combination of factors, such as using French and the "." being discarded. In any case, I get the following exception:

Traceback (most recent call last):
  File "/home/user/sub-to-audio/test.py", line 5, in <module>
    sub.convert_to_audio(sub_data=subtitle, tempo_mode="precise", output_path="vf.mp3", voice_conversion=False, speaker_wav="vo_vocal.wav")
  File "/home/user/sub-to-audio/subtoaudio/subtoaudio.py", line 120, in convert_to_audio
    tts_method(f"{entry_data['text']}",file_path=audio_path,**convert_param,**kwargs)
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/api.py", line 403, in tts_to_file
    wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/api.py", line 341, in tts
    wav = self.synthesizer.tts(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/utils/synthesizer.py", line 390, in tts
    outputs = synthesis(
              ^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/tts/utils/synthesis.py", line 221, in synthesis
    outputs = run_model_torch(
              ^^^^^^^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/tts/utils/synthesis.py", line 53, in run_model_torch
    outputs = _func(
              ^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/tts/models/vits.py", line 1150, in inference
    attn = generate_path(w_ceil.squeeze(1), attn_mask.squeeze(1).transpose(1, 2))
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
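
One way to probe the "discarded punctuation" hypothesis is to check, before calling TTS, whether an entry contains any letters at all. The sketch below is an illustration under assumptions, not a confirmed fix: `has_speakable_text` is a hypothetical helper, and treating digits and punctuation as unpronounceable is only a guess about the French model's vocabulary.

```python
def has_speakable_text(text: str) -> bool:
    """Return True if the text contains at least one alphabetic character."""
    return any(ch.isalpha() for ch in text)


# The two subtitle texts from test.ass, plus a bare "." fragment.
entries = [
    {"text": "Salut. Tu es la...?"},
    {"text": "1."},
    {"text": "."},
]

# Keep only entries that have something a letter-based model could pronounce.
speakable = [entry for entry in entries if has_speakable_text(entry["text"])]
for entry in speakable:
    print(entry["text"])
```

Note that a filter like this would also drop entries such as "1." that another model might be able to read, so it is better suited to diagnosing the failure than to silently skipping subtitles.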

bnsantoso · 2023-12-09T21:41:06Z

subtoaudio/subtoaudio.py

+        if cut_speaker_wav:
+          t1        = entry_data['start_time']
+          t2        = entry_data['end_time']
+          vocal     = fullvocal[t1:t2]
+          vocal.export(speaker_wav, format="wav")
+


I'm not sure if this is your intention. Does this mean that every subtitle entry's voice will change according to the original voice track of the subtitle? So, is this a full voiceover?

k-aito · 2023-12-11

It kind of does, or at least that was my idea. The best option would be a specific wav for each speaker, but that would require a lot of preparation.

Let me describe my use case; it may make this easier to understand. My goal was to dub a movie that only exists with subtitles, which my friend cannot read, so I wanted to generate a dub.

First, I extract the vocals using https://github.com/Anjok07/ultimatevocalremovergui. It takes some time, but it gives me an "instrumental" track and a vocal track.

I use the vocal track with sub-to-audio so that each subtitle uses its own portion of the voice track as speaker_wav. There are probably issues, for example if one subtitle contains two speakers, but the idea was to have something that works with limited effort.
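
To make the per-subtitle cut concrete, here is a small sketch of the same idea using only Python's standard-library `wave` module. It is not the PR's implementation (the diff slices an audio object directly); `cut_wav` is a hypothetical helper, and it assumes start and end times are given in milliseconds and that the vocal track is an uncompressed PCM WAV file.

```python
import os
import tempfile
import wave


def cut_wav(src_path, dst_path, start_ms, end_ms):
    """Copy the [start_ms, end_ms) span of a PCM WAV file into a new WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Convert milliseconds to frame indices and clamp them to the file length.
        start_frame = max(0, min(start_ms * rate // 1000, total))
        end_frame = max(start_frame, min(end_ms * rate // 1000, total))
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # same channels, sample width, and rate
        dst.writeframes(frames)  # the frame count in the header is fixed on close
    return dst_path


# Demo on a synthetic 1-second mono track of silence, standing in for the vocal track.
tmp_dir = tempfile.mkdtemp()
full_path = os.path.join(tmp_dir, "vo_vocal.wav")
with wave.open(full_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 16-bit samples
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000)

clip_path = cut_wav(full_path, os.path.join(tmp_dir, "clip.wav"), 250, 750)
with wave.open(clip_path, "rb") as clip:
    clip_frames = clip.getnframes()
    clip_rate = clip.getframerate()
print(clip_frames / clip_rate, "seconds")
```

Each subtitle entry would then get its own short clip as the voice reference, which is also why an entry with two speakers, or a very short one, may give a poor clone.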

At the end, I mix the instrumental track, the dubbed track, and the vocals at a lower volume, because the extracted vocal track sometimes also contains non-speech sounds like crying or laughing.
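
As a rough illustration of that final mix, the sketch below adds three mono 16-bit sample sequences together, with the original vocals scaled down. It is an assumption-based sketch rather than the tooling actually used here: `mix_tracks` and `vocal_gain` are hypothetical names, and a real mix would normally be done in an audio editor or with a dedicated audio library.

```python
from array import array


def mix_tracks(instrumental, dub, vocals, vocal_gain=0.3):
    """Mix three mono 16-bit PCM tracks; the original vocals are attenuated.

    Tracks may have different lengths; missing samples count as silence.
    The sum is clipped to the signed 16-bit range to avoid wrap-around.
    """
    length = max(len(instrumental), len(dub), len(vocals))
    mixed = array("h")
    for i in range(length):
        total = 0.0
        if i < len(instrumental):
            total += instrumental[i]
        if i < len(dub):
            total += dub[i]
        if i < len(vocals):
            total += vocals[i] * vocal_gain
        mixed.append(max(-32768, min(32767, int(round(total)))))
    return mixed


# Tiny example with made-up sample values.
instrumental = array("h", [1000, 2000])
dub = array("h", [100])
vocals = array("h", [1000, 1000, 1000])
mixed = mix_tracks(instrumental, dub, vocals, vocal_gain=0.5)
print(list(mixed))
```

Keeping the vocals at a low gain preserves laughs, cries, and similar sounds from the original track without letting the original dialogue compete with the dub.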

Don't hesitate to ping me if something is not clear in my explanation.

Have a good day

k-aito added 2 commits December 5, 2023 23:02

Add the argument cut_speaker_wav:bool can be used to mention that t…

56f5a8a

…he voice track that is passed must be cut based on the subtitles. Signed-off-by: k-aito <[email protected]>

Add a way to change the text if an exception is raised

12125ff

Signed-off-by: k-aito <[email protected]>

bnsantoso requested changes Dec 9, 2023

k-aito commented Dec 5, 2023

bnsantoso left a comment

bnsantoso Dec 9, 2023

k-aito Dec 11, 2023

bnsantoso Dec 9, 2023

k-aito Dec 11, 2023
