Add main track option to be used for speaker wav #15

Open
wants to merge 2 commits into
base: main

@k-aito k-aito commented Dec 5, 2023

Hello,

This PR adds an option to treat the speaker_wav file as the complete vocal track for the subtitles, so that it is cut per subtitle entry. I don't think it's the ideal approach, but it gives a way to clone the voice from the short vocal segment that belongs to each subtitle.

I also hit some exceptions that crashed the script. I think the sentence splitting sometimes goes wrong and produces fragments that TTS cannot synthesize, such as "Salut.." or ".". The while loop lets the user enter modified text so that the conversion can continue.

You are free to make any modifications if you think there could be a better integration.

Thanks for your script. Have a good day.

Cheers

…he voice track that is passed must be cut based on the subtitles.

Signed-off-by: k-aito <[email protected]>
Owner

@bnsantoso bnsantoso left a comment

Hi, thanks for your contribution, it is much appreciated, and sorry for the late review. I suggest some changes to your PR because it can cause an infinite loop.

Comment on lines +132 to +139
while True:
    try:
        tts_method(f"{entry_data['text']}",file_path=audio_path,**convert_param,**kwargs)
        break
    except Exception as e:
        print(f"Exception: {e}")
        print(f"Actual text: {entry_data['text']}")
        entry_data['text'] = input("New text: ")
Owner

This will result in an infinite loop if the error is not caused by sentence splitting. Limit the while loop to a few attempts and give the user a way to quit the loop or end the script. If possible, handle only the sentence-splitting error; if any other error occurs, let the script crash.

By the way, can you give an example that reproduces the sentence-splitting error? I can't reproduce it.
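
The suggestion above (a bounded number of attempts, a way to quit or skip, and catching only the relevant error) could be sketched roughly as follows. This is an illustration, not code from the PR: `synthesize_with_retry`, `SkipEntry`, and the `ask` parameter are hypothetical names, and catching `IndexError` is an assumption based on the traceback posted in this thread.

```python
class SkipEntry(Exception):
    """Raised when the user chooses to skip the current subtitle entry."""


def synthesize_with_retry(synthesize, text, max_attempts=3, ask=input,
                          retry_on=(IndexError,)):
    """Call synthesize(text), letting the user fix the text a few times.

    Only exceptions listed in retry_on are handled (assumption: the
    sentence-splitting failure surfaces as IndexError). Any other
    exception propagates and crashes the script, as the review asks.
    Returns the text that was finally synthesized.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            synthesize(text)
            return text
        except retry_on as exc:
            print(f"Attempt {attempt}/{max_attempts} failed: {exc!r}")
            print(f"Current text: {text!r}")
            if attempt == max_attempts:
                raise  # give up instead of looping forever
            answer = ask("New text (empty to skip, 'q' to quit): ").strip()
            if answer.lower() == "q":
                raise SystemExit("Aborted by user.")
            if not answer:
                raise SkipEntry(text)
            text = answer
```

In the conversion loop, `synthesize` could be something like `lambda t: tts_method(t, file_path=audio_path, **convert_param, **kwargs)`, with the caller catching `SkipEntry` to move on to the next subtitle entry.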

Author

Hello, it's true, I didn't think of a way to quit or skip. Do you have a preferred way to do that, or some code in mind? If you want to modify the PR directly, I don't mind.

I could reproduce the error with the following files (the Python script and the .ass subtitle used for testing):

$ cat test.py  
from subtoaudio import SubToAudio

sub = SubToAudio(fairseq_language="fra")
subtitle = sub.subtitle("test.ass")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="precise", output_path="vf.mp3", voice_conversion=False, speaker_wav="vo_vocal.wav")
$ cat test.ass 
[Script Info]
; Script generated by FFmpeg/Lavc60.3.100
ScriptType: v4.00+
PlayResX: 384
PlayResY: 288
ScaledBorderAndShadow: yes
YCbCr Matrix: None

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,16,&Hffffff,&Hffffff,&H0,&H0,0,0,0,0,100,100,0,0,1,1,0,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:08.18,0:00:12.01,Default,,0,0,0,,Salut. Tu es la...?
Dialogue: 0,0:00:09.18,0:10:12.01,Default,,0,0,0,,1.

Maybe it's a combination of factors, such as using French and "." being discarded. In any case, I get the following exception.

Traceback (most recent call last):
  File "/home/user/sub-to-audio/test.py", line 5, in <module>
    sub.convert_to_audio(sub_data=subtitle, tempo_mode="precise", output_path="vf.mp3", voice_conversion=False, speaker_wav="vo_vocal.wav")
  File "/home/user/sub-to-audio/subtoaudio/subtoaudio.py", line 120, in convert_to_audio
    tts_method(f"{entry_data['text']}",file_path=audio_path,**convert_param,**kwargs)
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/api.py", line 403, in tts_to_file
    wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/api.py", line 341, in tts
    wav = self.synthesizer.tts(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/utils/synthesizer.py", line 390, in tts
    outputs = synthesis(
              ^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/tts/utils/synthesis.py", line 221, in synthesis
    outputs = run_model_torch(
              ^^^^^^^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/tts/utils/synthesis.py", line 53, in run_model_torch
    outputs = _func(
              ^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/sub-to-audio/env/lib/python3.11/site-packages/TTS/tts/models/vits.py", line 1150, in inference
    attn = generate_path(w_ceil.squeeze(1), attn_mask.squeeze(1).transpose(1, 2))
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
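
One possible way to avoid sending such entries to TTS at all is a small pre-check on the subtitle text. The sketch below rests on an unverified guess: that the failure comes from entries such as "1." or "." that contain no letters, so nothing is left to synthesize once punctuation and digits are dropped. `has_speakable_text` and `split_entries` are hypothetical helpers, not part of subtoaudio; only the `'text'` key is taken from the code in this thread.

```python
def has_speakable_text(text: str) -> bool:
    """Return True if the text contains at least one alphabetic character.

    Assumption: entries made only of digits and punctuation are the ones
    that break synthesis. This has not been confirmed against the model.
    """
    return any(ch.isalpha() for ch in text)


def split_entries(entries):
    """Partition subtitle entries into (speakable, skipped) lists.

    Each entry is expected to be a dict with a 'text' key, mirroring
    entry_data['text'] in the snippet under review.
    """
    speakable, skipped = [], []
    for entry in entries:
        target = speakable if has_speakable_text(entry["text"]) else skipped
        target.append(entry)
    return speakable, skipped
```

With the two dialogue lines from test.ass, "Salut. Tu es la...?" would be kept and "1." would be set aside, which could also help narrow down which entry actually triggers the exception.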

Comment on lines +125 to +130
if cut_speaker_wav:
    t1 = entry_data['start_time']
    t2 = entry_data['end_time']
    vocal = fullvocal[t1:t2]
    vocal.export(speaker_wav, format="wav")

Owner

I'm not sure if this is your intention. Does this mean that every subtitle entry's voice will change according to the original voice track of the subtitle? So, is this a full voiceover?

Author

It kind of does, or at least that was my idea. The best would be to have a specific wav for each speaker, but that would require a lot of preparation.

I can describe my use case; maybe that will make it easier to understand. My goal was to dub a movie that only exists with subtitles, which my friend cannot read, so I wanted to generate a dub.

First, I extract the vocals using https://github.com/Anjok07/ultimatevocalremovergui. It takes some time, but I end up with an "instrumental" track and a vocal track.

I use the vocal track with sub-to-audio so that each subtitle uses its own portion of voice as speaker_wav. There are probably issues, like if one subtitle contains two speakers, but the idea was to get something working with limited effort.

At the end, I mix the instrumental track, the dubbed track, and the vocals at a lower volume, because the vocal extraction sometimes also picks up non-speech sounds like crying or laughing.

Don't hesitate to ping me if something in my explanation is unclear.

Have a good day
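
For readers who want to try the per-subtitle cutting described above without the rest of the pipeline, here is a minimal sketch using only Python's standard `wave` module. It is not the PR's implementation (the diff slices an audio object with `fullvocal[t1:t2]` and calls `export`); `cut_wav` is a hypothetical helper, and it assumes the start and end times are given in milliseconds and that the input is an uncompressed PCM WAV file.

```python
import wave


def cut_wav(src_path, dst_path, start_ms, end_ms):
    """Copy the [start_ms, end_ms) portion of a PCM WAV file to dst_path.

    Times are assumed to be in milliseconds. Out-of-range values are
    clamped to the length of the source file. Returns the number of
    frames written.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        # Convert milliseconds to frame indices and clamp them.
        start = min(max(0, int(start_ms * rate / 1000)), total)
        end = min(max(start, int(end_ms * rate / 1000)), total)
        src.setpos(start)
        frames = src.readframes(end - start)
        params = src.getparams()

    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width and rate; the frame count in the
        # header is corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return end - start
```

A call such as `cut_wav("vo_vocal.wav", "clip.wav", 8180, 12010)` would extract the segment for the first dialogue line of the test file. Very short subtitles yield very short reference clips, which may not be enough for good voice cloning.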

2 participants