From 8843fd4adfd08af4741d4e1a6f0c88ca2e13aad0 Mon Sep 17 00:00:00 2001
From: Fangjun Kuang
Date: Thu, 2 Jan 2025 19:04:17 +0800
Subject: [PATCH] Add RTF for TTS models (#690)

---
 .../onnx/tts/pretrained_models/index.rst      |   1 +
 .../onnx/tts/pretrained_models/matcha.rst     |  31 ++
 .../source/onnx/tts/pretrained_models/rtf.rst |  94 ++++++
 .../onnx/tts/pretrained_models/vits.rst       | 296 +++++++++++++++++-
 4 files changed, 421 insertions(+), 1 deletion(-)
 create mode 100644 docs/source/onnx/tts/pretrained_models/rtf.rst

diff --git a/docs/source/onnx/tts/pretrained_models/index.rst b/docs/source/onnx/tts/pretrained_models/index.rst
index d8c21ae67..56df695ce 100644
--- a/docs/source/onnx/tts/pretrained_models/index.rst
+++ b/docs/source/onnx/tts/pretrained_models/index.rst
@@ -14,5 +14,6 @@ This page list pre-trained models for text-to-speech.
 .. toctree::
    :maxdepth: 5
 
+   ./rtf
    ./matcha
    ./vits
diff --git a/docs/source/onnx/tts/pretrained_models/matcha.rst b/docs/source/onnx/tts/pretrained_models/matcha.rst
index b2559c9c9..2c9df3182 100644
--- a/docs/source/onnx/tts/pretrained_models/matcha.rst
+++ b/docs/source/onnx/tts/pretrained_models/matcha.rst
@@ -166,6 +166,8 @@ Generate speech with Python script
+.. _matcha-icefall-zh-baker:
+
 matcha-icefall-zh-baker (Chinese, 1 female speaker)
 ---------------------------------------------------
 
@@ -368,3 +370,32 @@ After running, it will generate a file ``matcha-baker-zh-2.wav`` in the current
+
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+
+   for t in 1 2 3 4; do
+     build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --matcha-acoustic-model=./matcha-icefall-zh-baker/model-steps-3.onnx \
+       --matcha-vocoder=./hifigan_v2.onnx \
+       --matcha-lexicon=./matcha-icefall-zh-baker/lexicon.txt \
+       --matcha-tokens=./matcha-icefall-zh-baker/tokens.txt \
+       --matcha-dict-dir=./matcha-icefall-zh-baker/dict \
+       --output-filename=./matcha-baker-0.wav \
+       "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
+   done
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 0.892 | 0.536 | 0.432 | 0.391 |
+   +-------------+-------+-------+-------+-------+
+
diff --git a/docs/source/onnx/tts/pretrained_models/rtf.rst b/docs/source/onnx/tts/pretrained_models/rtf.rst
new file mode 100644
index 000000000..43a7577c0
--- /dev/null
+++ b/docs/source/onnx/tts/pretrained_models/rtf.rst
@@ -0,0 +1,94 @@
+RTF of pre-trained models
+==========================
+
+The following table lists the RTF of pre-trained models on
+``Raspberry Pi 4 Model B Rev 1.5``.
+
+.. list-table::
+
+   * - Number of threads
+     - 1
+     - 2
+     - 3
+     - 4
+     -
+   * - :ref:`vits-melo-tts-zh_en`
+     - 6.727
+     - 3.877
+     - 2.914
+     - 2.518
+     - 163 MB
+   * - :ref:`vits-piper-en_US-glados`
+     - 0.812
+     - 0.480
+     - 0.391
+     - 0.349
+     - 61 MB
+   * - :ref:`vits-piper-en_US-libritts_r-medium`
+     - 0.790
+     - 0.493
+     - 0.392
+     - 0.357
+     - 75 MB
+   * - :ref:`vits-model-vits-ljspeech`
+     - 6.057
+     - 3.517
+     - 2.535
+     - 2.206
+     - 109 MB
+   * - :ref:`vits-model-vits-vctk`
+     - 6.079
+     - 3.483
+     - 2.537
+     - 2.226
+     - 116 MB
+   * - :ref:`sherpa-onnx-vits-zh-ll`
+     - 4.275
+     - 2.494
+     - 1.840
+     - 1.593
+     - 116 MB
+   * - :ref:`vits-zh-hf-fanchen-C`
+     - 4.306
+     - 2.451
+     - 1.846
+     - 1.600
+     - 116 MB
+   * - :ref:`vits-zh-hf-fanchen-wnj`
+     - 4.276
+     - 2.505
+     - 1.827
+     - 1.608
+     - 116 MB
+   * - :ref:`vits-zh-hf-theresa`
+     - 6.032
+     - 3.448
+     - 2.566
+     - 2.210
+     - 117 MB
+   * - :ref:`vits-zh-hf-eula`
+     - 6.011
+     - 3.473
+     - 2.537
+     - 2.231
+     - 117 MB
+   * - :ref:`vits-model-aishell3`
+     - 0.365
+     - 0.220
+     - 0.171
+     - 0.156
+     - 30 MB
+   * - :ref:`vits-model-en_US-lessac-medium`
+     - 0.774
+     - 0.482
+     - 0.390
+     - 0.357
+     - 61 MB
+   * - :ref:`matcha-icefall-zh-baker`
+     - 0.892
+     - 0.536
+     - 0.432
+     - 0.391
+     - 73 MB
+
+
diff --git a/docs/source/onnx/tts/pretrained_models/vits.rst b/docs/source/onnx/tts/pretrained_models/vits.rst
index ad8efda22..53fca2c15 100644
--- a/docs/source/onnx/tts/pretrained_models/vits.rst
+++ b/docs/source/onnx/tts/pretrained_models/vits.rst
@@ -17,7 +17,6 @@ The following table summarizes the information of all models in this page.
    You can try all the models at the following huggingface space.
    ``_.
 
-
 .. hint::
 
    You can find Android APKs for each model at the following page
@@ -328,6 +327,31 @@ After running, it will generate a file ``zh-en-3.wav`` in the current directory.
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+   for t in 1 2 3 4; do
+     ./build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --vits-model=./vits-melo-tts-zh_en/model.onnx \
+       --vits-lexicon=./vits-melo-tts-zh_en/lexicon.txt \
+       --vits-tokens=./vits-melo-tts-zh_en/tokens.txt \
+       --vits-dict-dir=./vits-melo-tts-zh_en/dict \
+       "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
+   done
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 6.727 | 3.877 | 2.914 | 2.518 |
+   +-------------+-------+-------+-------+-------+
+
 .. _vits-piper-en_US-glados:
 
 vits-piper-en_US-glados (English, 1 speaker)
@@ -565,6 +589,30 @@ and ``glados-bug.wav`` in the current directory.
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+   for t in 1 2 3 4; do
+     ./build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --vits-model=./vits-piper-en_US-glados/en_US-glados.onnx\
+       --vits-tokens=./vits-piper-en_US-glados/tokens.txt \
+       --vits-data-dir=./vits-piper-en_US-glados/espeak-ng-data \
+       "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone."
+   done
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 0.812 | 0.480 | 0.391 | 0.349 |
+   +-------------+-------+-------+-------+-------+
+
 .. _vits-piper-en_US-libritts_r-medium:
 
 vits-piper-en_US-libritts_r-medium (English, 904 speakers)
@@ -773,6 +821,29 @@ and ``libritts-armstrong-500.wav`` in the current directory.
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+   for t in 1 2 3 4; do
+     ./build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --vits-model=./vits-piper-en_US-libritts_r-medium/en_US-libritts_r-medium.onnx \
+       --vits-tokens=./vits-piper-en_US-libritts_r-medium/tokens.txt \
+       --vits-data-dir=./vits-piper-en_US-libritts_r-medium/espeak-ng-data \
+       "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone."
+   done
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 0.790 | 0.493 | 0.392 | 0.357 |
+   +-------------+-------+-------+-------+-------+
 
 .. _vits-model-vits-ljspeech:
 
@@ -912,6 +983,30 @@ After running, it will generate a file ``armstrong.wav`` in the current director
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+   for t in 1 2 3 4; do
+     ./build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --vits-model=./vits-ljs/vits-ljs.onnx \
+       --vits-lexicon=./vits-ljs/lexicon.txt \
+       --vits-tokens=./vits-ljs/tokens.txt \
+       "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone."
+   done
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 6.057 | 3.517 | 2.535 | 2.206 |
+   +-------------+-------+-------+-------+-------+
+
 .. _vits-model-vits-vctk:
 
 VCTK (English, multi-speaker, 109 speakers)
@@ -1116,6 +1211,30 @@ It will generate 3 files: ``einstein-30.wav``, ``franklin-66.wav``, and ``martin
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+   for t in 1 2 3 4; do
+     ./build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --vits-model=./vits-vctk/vits-vctk.onnx \
+       --vits-lexicon=./vits-vctk/lexicon.txt \
+       --vits-tokens=./vits-vctk/tokens.txt \
+       "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone."
+   done
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 6.079 | 3.483 | 2.537 | 2.226 |
+   +-------------+-------+-------+-------+-------+
+
 .. _sherpa-onnx-vits-zh-ll:
 
 csukuangfj/sherpa-onnx-vits-zh-ll (Chinese, 5 speakers)
@@ -1275,6 +1394,31 @@ Please check the file sizes of the downloaded model:
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+   for t in 1 2 3 4; do
+     ./build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --vits-model=./sherpa-onnx-vits-zh-ll/model.onnx \
+       --vits-dict-dir=./sherpa-onnx-vits-zh-ll/dict \
+       --vits-lexicon=./sherpa-onnx-vits-zh-ll/lexicon.txt \
+       --vits-tokens=./sherpa-onnx-vits-zh-ll/tokens.txt \
+       '当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔.'
+   done
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 4.275 | 2.494 | 1.840 | 1.593 |
+   +-------------+-------+-------+-------+-------+
+
 .. _vits-zh-hf-fanchen-C:
 
 csukuangfj/vits-zh-hf-fanchen-C (Chinese, 187 speakers)
@@ -1435,6 +1579,31 @@ You can download the model using the following commands::
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+   for t in 1 2 3 4; do
+     ./build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --vits-model=./vits-zh-hf-fanchen-C/vits-zh-hf-fanchen-C.onnx \
+       --vits-dict-dir=./vits-zh-hf-fanchen-C/dict \
+       --vits-lexicon=./vits-zh-hf-fanchen-C/lexicon.txt \
+       --vits-tokens=./vits-zh-hf-fanchen-C/tokens.txt \
+       "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
+   done
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 4.306 | 2.451 | 1.846 | 1.600 |
+   +-------------+-------+-------+-------+-------+
+
 .. _vits-zh-hf-fanchen-wnj:
 
 csukuangfj/vits-zh-hf-fanchen-wnj (Chinese, 1 male)
@@ -1523,6 +1692,31 @@ You can download the model using the following commands::
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+   for t in 1 2 3 4; do
+     ./build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --vits-model=./vits-zh-hf-fanchen-wnj/vits-zh-hf-fanchen-wnj.onnx \
+       --vits-dict-dir=./vits-zh-hf-fanchen-wnj/dict \
+       --vits-lexicon=./vits-zh-hf-fanchen-wnj/lexicon.txt \
+       --vits-tokens=./vits-zh-hf-fanchen-wnj/tokens.txt \
+       "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
+   done
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 4.276 | 2.505 | 1.827 | 1.608 |
+   +-------------+-------+-------+-------+-------+
+
 .. _vits-zh-hf-theresa:
 
 csukuangfj/vits-zh-hf-theresa (Chinese, 804 speakers)
@@ -1614,6 +1808,32 @@ You can download the model with the following commands::
+
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+   for t in 1 2 3 4; do
+     ./build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --vits-model=./vits-zh-hf-theresa/theresa.onnx \
+       --vits-dict-dir=./vits-zh-hf-theresa/dict \
+       --vits-lexicon=./vits-zh-hf-theresa/lexicon.txt \
+       --vits-tokens=./vits-zh-hf-theresa/tokens.txt \
+       "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
+   done
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 6.032 | 3.448 | 2.566 | 2.210 |
+   +-------------+-------+-------+-------+-------+
+
 .. _vits-zh-hf-eula:
 
 csukuangfj/vits-zh-hf-eula (Chinese, 804 speakers)
@@ -1706,6 +1926,30 @@ You can download the model using the following commands::
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+   for t in 1 2 3 4; do
+     ./build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --vits-model=./vits-zh-hf-eula/eula.onnx \
+       --vits-dict-dir=./vits-zh-hf-eula/dict \
+       --vits-lexicon=./vits-zh-hf-eula/lexicon.txt \
+       --vits-tokens=./vits-zh-hf-eula/tokens.txt \
+       "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
+   done
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 6.011 | 3.473 | 2.537 | 2.231 |
+   +-------------+-------+-------+-------+-------+
 
 .. _vits-model-aishell3:
 
@@ -1986,6 +2230,32 @@ The Python script also supports rule-based text normalization.
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+   for t in 1 2 3 4; do
+     build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --vits-model=./vits-icefall-zh-aishell3/model.onnx \
+       --vits-lexicon=./vits-icefall-zh-aishell3/lexicon.txt \
+       --vits-tokens=./vits-icefall-zh-aishell3/tokens.txt \
+       --sid=66 \
+       "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔."
+   done
+
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 0.365 | 0.220 | 0.171 | 0.156 |
+   +-------------+-------+-------+-------+-------+
+
 .. _vits-model-en_US-lessac-medium:
 
 en_US-lessac-medium (English, single-speaker)
@@ -2154,3 +2424,27 @@ After running, it will generate a file ``armstrong-piper-en_US-lessac-medium.wav
+
+RTF on Raspberry Pi 4 Model B Rev 1.5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We use the following command to test the RTF of this model on Raspberry Pi 4 Model B Rev 1.5:
+
+.. code-block:: bash
+
+   for t in 1 2 3 4; do
+     ./build/bin/sherpa-onnx-offline-tts \
+       --num-threads=$t \
+       --vits-model=./vits-piper-en_US-lessac-medium/en_US-lessac-medium.onnx \
+       --vits-data-dir=./vits-piper-en_US-lessac-medium/espeak-ng-data \
+       --vits-tokens=./vits-piper-en_US-lessac-medium/tokens.txt \
+       "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone."
+   done
+
+The results are given below:
+
+   +-------------+-------+-------+-------+-------+
+   | num_threads | 1     | 2     | 3     | 4     |
+   +=============+=======+=======+=======+=======+
+   | RTF         | 0.774 | 0.482 | 0.390 | 0.357 |
+   +-------------+-------+-------+-------+-------+
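
Reviewer note: the patch reports RTF values but never defines the metric. As a reading aid, here is a minimal Python sketch. It assumes the usual definition of real-time factor (synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time). The helper names ``rtf``, ``is_real_time`` and ``speedup`` are hypothetical and not part of sherpa-onnx. The only data used is the ``matcha-icefall-zh-baker`` row from the table above.

```python
from typing import Dict


def rtf(elapsed_seconds: float, audio_seconds: float) -> float:
    """Real-time factor, assumed here to be elapsed time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return elapsed_seconds / audio_seconds


def is_real_time(rtf_value: float) -> bool:
    """An RTF below 1 means audio is produced faster than it plays back."""
    return rtf_value < 1.0


def speedup(rtf_by_threads: Dict[int, float]) -> Dict[int, float]:
    """Speedup of each thread count relative to the single-thread run."""
    base = rtf_by_threads[1]
    return {n: base / value for n, value in rtf_by_threads.items()}


# RTF values for matcha-icefall-zh-baker, copied from the table in this patch.
matcha_rtf = {1: 0.892, 2: 0.536, 3: 0.432, 4: 0.391}

if __name__ == "__main__":
    for threads, factor in speedup(matcha_rtf).items():
        print(f"{threads} thread(s): RTF={matcha_rtf[threads]:.3f}, "
              f"speedup={factor:.2f}x, "
              f"real-time={is_real_time(matcha_rtf[threads])}")
```

Read this way, the tables suggest that going from 1 to 4 threads gives well under a 4x gain, and that the models with a single-thread RTF above 4 stay slower than real time on this board even with 4 threads.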