feat: Add support for word-level progress tracking in TextToSpeech #131

Open
Dhruv-1105 opened this issue Jul 9, 2024 · 2 comments

@Dhruv-1105
Contributor

Is your feature request related to a problem? Please describe:
In many applications that use text-to-speech (TTS), it is essential to track the progress of spoken words to provide features such as synchronized text highlighting. Currently, the @capacitor-community/text-to-speech package does not offer a way to get real-time updates on the specific words being spoken, which limits its utility in such scenarios.

Describe the solution you'd like:
I propose adding support for an onRangeStart event that emits the start and end indices of the currently spoken word, along with the spoken word itself. This feature would allow developers to track which word is being spoken in real-time and implement functionalities such as synchronized text highlighting.
The implementation involves the following changes:

  • TextToSpeech.java:
    Added an UtteranceProgressListener that listens for onRangeStart events and passes the start and end indices of the spoken word, along with the word itself, to the result callback.
    // Note: Android only invokes onRangeStart on API level 26 (Android 8.0) and later.
    @Override
    public void onRangeStart(String utteranceId, int start, int end, int frame) {
        // start and end are character offsets into the text passed to speak()
        String spokenWord = text.substring(start, end);
        Log.d("TTS", "Spoken word: " + spokenWord);
        resultCallback.onRangeStart(start, end, spokenWord);
    }
  • TextToSpeechPlugin.java:
    Added a method to handle the onRangeStart callback and emit the event.
    @PluginMethod
    public void speak(PluginCall call) {
        // existing code...
        SpeakResultCallback resultCallback = new SpeakResultCallback() {
            @Override
            public void onRangeStart(int start, int end, String spokenWord) {
                JSObject ret = new JSObject();
                ret.put("start", start);
                ret.put("end", end);
                ret.put("spokenWord", spokenWord);
                // Emit an event instead of resolving the call: a PluginCall can be
                // resolved only once, and this callback fires for every spoken word.
                notifyListeners("onRangeStart", ret);
            }
        };
        // existing code...
    }
  • definitions.ts:
    Added an addListener method to listen for onRangeStart events.
    addListener(eventName: 'onRangeStart', listenerFunc: (info: { start: number; end: number; spokenWord: string }) => void): Promise<PluginListenerHandle>;
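
To illustrate how an app might consume this event, here is a minimal sketch of the highlighting step in TypeScript. It assumes the payload shape from the proposed definitions.ts signature; the RangeStartInfo interface name and the highlightSpokenWord helper are hypothetical and not part of the plugin.

```typescript
// Payload shape taken from the proposed addListener signature.
// The interface name itself is an assumption made for this sketch.
interface RangeStartInfo {
  start: number;
  end: number;
  spokenWord: string;
}

// Hypothetical helper: wraps the currently spoken range in square brackets.
// A real app would more likely wrap it in a <mark> or a styled <span>.
function highlightSpokenWord(text: string, info: RangeStartInfo): string {
  // Clamp the offsets so an unexpected range cannot slice outside the text.
  const start = Math.max(0, Math.min(info.start, text.length));
  const end = Math.max(start, Math.min(info.end, text.length));
  if (start === end) {
    return text; // nothing to highlight
  }
  return text.slice(0, start) + '[' + text.slice(start, end) + ']' + text.slice(end);
}

// Possible wiring, assuming the listener proposed above is available:
//   await TextToSpeech.addListener('onRangeStart', (info) => {
//     render(highlightSpokenWord(text, info)); // render() is app-specific
//   });
```

Keeping the string manipulation in a pure function means it can be unit tested without a device; only the listener registration depends on the plugin.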

Describe alternatives you've considered:
An alternative approach could be to periodically poll the TTS engine for its current progress, but this would be less efficient and more complex to implement. Integrating directly with the UtteranceProgressListener provides a more reliable and accurate solution.

Additional context:
This feature is critical for applications that need to provide synchronized text highlighting, karaoke-style text displays, or any other feature that requires real-time tracking of spoken words. Adding this capability to the @capacitor-community/text-to-speech package will significantly enhance its usability for a broader range of applications.

@robingenz added the feature label Jul 9, 2024
@Dhruv-1105
Contributor Author

Please check the following PR for this issue:
#132

@bridgecode

Hello, I'm pretty new to Capacitor and not sure if this is the correct place to put this, but I tried to implement this feature in my Vue/Vite/Ionic/Capacitor app and I'm having trouble getting it to work. I'm not sure whether this only works on a device; I was trying to use it in Chrome so I could debug and get it working across all applications with this version. I would prefer not to use the Web SpeechSynthesisUtterance API in parallel, to avoid different behavior between Web and Mobile. Is it possible to get it working in my local web environment, or will this only work on a device? I can use an emulator, but I'd prefer to have it working on the web as well.

Thanks in advance for any info, or for a CodePen example I can look at with a console log of the start/end/frame. Also, thanks for the hard work getting this feature out, I'm really excited for this! (Please lmk if I should put this in a separate issue.)
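
For reference on the web side of this question, below is a hedged sketch of how a browser's Web Speech API boundary event could be mapped onto the same { start, end, spokenWord } shape. It is not the plugin's web implementation; the rangeFromBoundary helper and the RangeStartInfo name are hypothetical. Note that not every browser or voice fires boundary events, and charLength may be missing, so the helper falls back to scanning for the next whitespace.

```typescript
// Same payload shape as the proposed onRangeStart event.
// The interface name is an assumption made for this sketch.
interface RangeStartInfo {
  start: number;
  end: number;
  spokenWord: string;
}

// Hypothetical helper: converts the charIndex / charLength reported by a
// SpeechSynthesisUtterance 'boundary' event into a start/end range.
function rangeFromBoundary(text: string, charIndex: number, charLength?: number): RangeStartInfo {
  const start = Math.max(0, Math.min(charIndex, text.length));
  let end: number;
  if (charLength !== undefined && charLength > 0) {
    end = Math.min(start + charLength, text.length);
  } else {
    // Fallback when charLength is unavailable: the word ends at the next
    // whitespace character, or at the end of the text.
    const offset = text.slice(start).search(/\s/);
    end = offset === -1 ? text.length : start + offset;
  }
  return { start, end, spokenWord: text.slice(start, end) };
}

// Possible browser wiring (runs only where speechSynthesis exists):
//   const utterance = new SpeechSynthesisUtterance(text);
//   utterance.onboundary = (e) => {
//     if (e.name === 'word') {
//       console.log(rangeFromBoundary(text, e.charIndex, e.charLength));
//     }
//   };
//   speechSynthesis.speak(utterance);
```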

