[ros_speech_recognition] Add vosk engine #462

nakane11 · 2023-05-07T06:15:28Z

This PR enables the Vosk engine in speech recognition.
More recognizer APIs became available as of SpeechRecognition==3.9.0.
Vosk is a speech recognition toolkit that works offline and supports Japanese.

Sample (tested with PR2)

$ wget https://alphacephei.com/vosk/models/vosk-model-small-ja-0.22.zip -P /tmp
$ unzip /tmp/vosk-model-small-ja-0.22.zip -d /tmp

<launch>
  <arg name="audio_topic" default="/audio" doc="Name of audio topic captured from microphone" />
  <arg name="voice_topic" default="/speech_to_text" doc="Name of text topic of recognized speech" />
  <arg name="n_channel" default="1" doc="Number of channels of audio topic and microphone. Run '$ pactl list short sources' to check your hardware" />
  <arg name="depth" default="16" doc="Bit depth of audio topic and microphone. Run '$ pactl list short sources' to check your hardware" />
  <arg name="sample_rate" default="16000" doc="Sample rate of audio topic and microphone. Run '$ pactl list short sources' to check your hardware"/>
  <arg name="device" default="" doc="Card and device number of microphone (e.g. hw:0,0). You can check the card and device numbers with '$ arecord -l', then use hw:[card number],[device number]" />
  <arg name="engine" default="Vosk" doc="Speech-to-text engine: Vosk, Google, GoogleCloud, Sphinx, Wit, Bing, Houndify, IBM" />
  <arg name="language" default="en-US" doc="Speech to text language. For Japanese, set ja" />
  <arg name="continuous" default="true" doc="If false, the /speech_recognition service is advertised. If true, the /speech_to_text topic is published." />
  <arg name="auto_start" default="true" doc="Whether speech_recognition starts automatically. Effective only when continuous is true" />

  <arg name="self_cancellation" default="true" doc="If true, do not recognize audio while the robot is speaking." />
  <arg name="tts_tolerance" default="1.0" doc="Tolerance in seconds for determining whether the robot is speaking" />
  <arg name="tts_action_names" default="['sound_play']" doc="TTS action names. Audio output by these action servers is ignored by speech_recognition" />

  <node name="speech_recognition"
        pkg="ros_speech_recognition" type="speech_recognition_node.py"
        respawn="true"
        output="screen">
    <rosparam subst_value="true">
      audio_topic: $(arg audio_topic)
      voice_topic: $(arg voice_topic)
      n_channel: $(arg n_channel)
      depth: $(arg depth)
      sample_rate: $(arg sample_rate)
      engine: $(arg engine)
      language: $(arg language)
      continuous: $(arg continuous)
      auto_start: $(arg auto_start)
      self_cancellation: $(arg self_cancellation)
      tts_tolerance: $(arg tts_tolerance)
      tts_action_names: $(arg tts_action_names)
      vosk_model_path: /tmp/vosk-model-small-ja-0.22
    </rosparam>
  </node>

</launch>
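To check a downloaded model outside ROS, a minimal sketch using the vosk Python package directly (vosk.Model and KaldiRecognizer) may help. It is not the node's implementation: the helper names recognize_wav and extract_text are hypothetical, and the input is assumed to be a mono, 16-bit PCM WAV file, ideally at 16 kHz to match the launch defaults.

```python
import json
import wave


def extract_text(final_result_json):
    """Return the "text" field of a Vosk FinalResult() JSON string ("" if absent)."""
    return json.loads(final_result_json).get("text", "")


def recognize_wav(model_path, wav_path, chunk_frames=4000):
    """Recognize a mono, 16-bit PCM WAV file with a Vosk model directory."""
    # Imported lazily so extract_text() works even without vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_path)
    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        # The recognizer must be told the sample rate of the incoming audio.
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(chunk_frames)
            if not data:
                break
            rec.AcceptWaveform(data)
    # FinalResult() flushes the decoder and returns a JSON string.
    return extract_text(rec.FinalResult())
```

For example, recognize_wav("/tmp/vosk-model-small-ja-0.22", "sample.wav") should return the recognized text, where sample.wav is a placeholder recording.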

nakane11 · 2023-05-07T06:32:16Z

Misrecognition of speech is rare, but to suppress false recognitions during silence I use a filter like https://github.com/nakane11/navigation_pr2/blob/d57d4253b36c96a0679753cda7a254dd8e333c1d/node_scripts/filter_vosk.py
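The linked filter is not reproduced here. As a rough illustration of the idea only, a hypothetical post-filter could drop results that are empty, very short, or consist solely of filler sounds such as "ああ". The filler set and the min_chars threshold below are assumptions to tune per robot.

```python
import unicodedata

# Hypothetical filler-only results to discard; tune for your environment.
DEFAULT_FILLERS = frozenset({"あ", "ああ", "えー", "えっと", "ん"})


def is_probably_noise(text, fillers=DEFAULT_FILLERS, min_chars=2):
    """Return True if a recognition result looks like a false positive from silence."""
    # NFKC maps the ideographic space to an ASCII space; then drop all spaces,
    # since Japanese results may contain spaces between tokens.
    normalized = unicodedata.normalize("NFKC", text).replace(" ", "")
    if len(normalized) < min_chars:
        return True
    return normalized in fillers
```

A node would publish a result only when is_probably_noise(text) is False.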

mqcmd196 · 2023-05-09T07:46:41Z

How about downloading the model the first time this node is launched?

mqcmd196

How about adding a script that downloads Vosk models to ~/.ros/, as jsk_recognition does, and selecting the model with the language arg?
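A minimal sketch of this suggestion, assuming a hypothetical cache directory ~/.ros/vosk_models and a hypothetical language-to-model table (the ja entry is the model used in this PR; the en-US entry is an assumption), might look like this. The download URL follows the pattern of the wget command in the PR description.

```python
import os
import tempfile
import urllib.request
import zipfile

# Hypothetical language -> model table.
VOSK_MODELS = {
    "ja": "vosk-model-small-ja-0.22",
    "en-US": "vosk-model-small-en-us-0.15",
}
BASE_URL = "https://alphacephei.com/vosk/models/"


def ensure_vosk_model(language, cache_dir=os.path.expanduser("~/.ros/vosk_models")):
    """Return the model directory for `language`, downloading it on first use."""
    name = VOSK_MODELS.get(language)
    if name is None:
        raise ValueError("no Vosk model registered for language %r" % language)
    model_dir = os.path.join(cache_dir, name)
    if os.path.isdir(model_dir):
        return model_dir  # already cached: no network access
    os.makedirs(cache_dir, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, name + ".zip")
        urllib.request.urlretrieve(BASE_URL + name + ".zip", archive)
        with zipfile.ZipFile(archive) as zf:
            # The archive unpacks into a top-level <name>/ directory.
            zf.extractall(cache_dir)
    return model_dir
```

The first call per language downloads and unzips the archive; later calls return the cached directory without touching the network.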

nakane11 · 2023-05-09T09:43:38Z

Thank you!
It's a good idea. I'll update it.

nakane11 · 2023-05-12T07:53:22Z

@mqcmd196
If vosk_model_path is specified, it takes priority.
If vosk_model_path is not set and language is en-US or ja, the models already downloaded to trained_data are used.
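The priority described above can be sketched as a small resolver. The function name and the directory names under trained_data are assumptions for illustration. Returning None signals that no model was found, so the caller can log an error instead of passing None on to vosk.

```python
import os

# Hypothetical directory names under trained_data/.
TRAINED_MODELS = {
    "ja": "vosk-model-small-ja-0.22",
    "en-US": "vosk-model-small-en-us-0.15",
}


def resolve_vosk_model_path(vosk_model_path, language, data_path):
    """Pick the model directory: explicit ~vosk_model_path first, then trained_data/."""
    if vosk_model_path:
        return vosk_model_path
    name = TRAINED_MODELS.get(language)
    if name is None:
        return None  # unsupported language: let the caller report an error
    path = os.path.join(data_path, name)
    return path if os.path.isdir(path) else None
```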

ros_speech_recognition/src/ros_speech_recognition/recognize_vosk.py

tkmtnt7000 · 2023-05-16T03:40:12Z

With the current implementation, I get the following error when I don't set ~vosk_model_path manually, as the README says.

tsukamoto@tsukamoto-desktop-ryzen ~/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/src (vosk *%) 
$ ROSCONSOLE_FORMAT='[${severity}] [${time}] [${node}]: [${message}]' roslaunch ros_speech_recognition speech_recognition.launch engine:=Vosk launch_sound_play:=false language:=ja
... logging to /home/tsukamoto/.ros/log/d506cd58-f38e-11ed-9492-937962485673/roslaunch-tsukamoto-desktop-ryzen-1067313.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
WARNING: disk usage in log directory [/home/tsukamoto/.ros/log] is over 1GB.
It's recommended that you use the 'rosclean' command.

started roslaunch server http://tsukamoto-desktop-ryzen:43425/

SUMMARY
========

PARAMETERS
 * /audio_capture/channels: 1
 * /audio_capture/depth: 16
 * /audio_capture/device: 
 * /audio_capture/format: wave
 * /audio_capture/sample_rate: 16000
 * /rosdistro: noetic
 * /rosversion: 1.16.0
 * /speech_recognition/audio_topic: /audio
 * /speech_recognition/auto_start: True
 * /speech_recognition/continuous: True
 * /speech_recognition/depth: 16
 * /speech_recognition/enable_sound_effect: False
 * /speech_recognition/engine: Vosk
 * /speech_recognition/language: ja
 * /speech_recognition/n_channel: 1
 * /speech_recognition/sample_rate: 16000
 * /speech_recognition/self_cancellation: True
 * /speech_recognition/tts_action_names: ['sound_play']
 * /speech_recognition/tts_tolerance: 1.0
 * /speech_recognition/voice_topic: /speech_to_text

NODES
  /
    audio_capture (audio_capture/audio_capture)
    speech_recognition (ros_speech_recognition/speech_recognition_node.py)
    speech_recognition_candidates_to_string (ros_speech_recognition/speech_recognition_candidates_to_string.py)

ROS_MASTER_URI=http://localhost:11311

process[audio_capture-1]: started with pid [1067327]
process[speech_recognition-2]: started with pid [1067328]
process[speech_recognition_candidates_to_string-3]: started with pid [1067333]
[WARN] [1684207609.205358] [/speech_recognition_candidates_to_string]: [[/speech_recognition_candidates_to_string] subscribes topics only with child subscribers. Set '~always_subscribe' as True to have it subscribe always.]
[ERROR] [1684207609.253494] [/speech_recognition]: [action 'sound_play' is not initialized.]
[INFO] [1684207609.270212] [/speech_recognition]: [Enabled continuous mode]
[INFO] [1684207609.270874] [/speech_recognition]: [Auto start: True]
[INFO] [1684207609.753036] [/speech_recognition]: [Set minimum energy threshold to 358.4727721173039]
[WARN] [1684207613.506773] [/speech_recognition]: [data_path: /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data]
[WARN] [1684207613.507372] [/speech_recognition]: [model_path_before: None]
[INFO] [1684207613.507909] [/speech_recognition]: [Loading model from /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22]
LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=13 max-active=7000 lattice-beam=4
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:6:7:8:9:10
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:282) Loading HCL and G from /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22/graph/HCLr.fst /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():model.cc:308) Loading winfo /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22/graph/phones/word_boundary.int
[INFO] [1684207614.227814] [/speech_recognition]: [Result: b'\xe3\x81\x82\xe3\x81\x82']
[INFO] [1684207619.698366] [/speech_recognition]: [Loading model from None]
lang None does not exist
[ERROR] [1684207622.279914] [/speech_recognition]: [Unexpected error: (<class 'SystemExit'>, SystemExit(1), <traceback object at 0x7f722adf6b40>)]
Exception ignored in: <function Model.__del__ at 0x7f722cb3faf0>
Traceback (most recent call last):
  File "/home/tsukamoto/ros/fetch_ws/devel/.private/ros_speech_recognition/share/ros_speech_recognition/venv/lib/python3.8/site-packages/vosk/__init__.py", line 60, in __del__
    _c.vosk_model_free(self._handle)
AttributeError: 'Model' object has no attribute '_handle'
[INFO] [1684207634.369576] [/speech_recognition]: [Loading model from None]
lang None does not exist
[ERROR] [1684207635.364335] [/speech_recognition]: [Unexpected error: (<class 'SystemExit'>, SystemExit(1), <traceback object at 0x7f722b0571c0>)]
Exception ignored in: <function Model.__del__ at 0x7f722cb3faf0>
Traceback (most recent call last):
  File "/home/tsukamoto/ros/fetch_ws/devel/.private/ros_speech_recognition/share/ros_speech_recognition/venv/lib/python3.8/site-packages/vosk/__init__.py", line 60, in __del__
    _c.vosk_model_free(self._handle)
AttributeError: 'Model' object has no attribute '_handle'
[INFO] [1684207638.883466] [/speech_recognition]: [Loading model from None]
lang None does not exist
[ERROR] [1684207640.547415] [/speech_recognition]: [Unexpected error: (<class 'SystemExit'>, SystemExit(1), <traceback object at 0x7f7229de8e00>)]
Exception ignored in: <function Model.__del__ at 0x7f722cb3faf0>
Traceback (most recent call last):
  File "/home/tsukamoto/ros/fetch_ws/devel/.private/ros_speech_recognition/share/ros_speech_recognition/venv/lib/python3.8/site-packages/vosk/__init__.py", line 60, in __del__
    _c.vosk_model_free(self._handle)
AttributeError: 'Model' object has no attribute '_handle'
^C[speech_recognition_candidates_to_string-3] killing on exit
[speech_recognition-2] killing on exit
[audio_capture-1] killing on exit
[speech_recognition-2] escalating to SIGTERM
[speech_recognition-2] escalating to SIGKILL
Shutdown errors:
 * process[speech_recognition-2, pid 1067328]: required SIGKILL. May still be running.
shutting down processing monitor...
... shutting down processing monitor complete
done
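In the log above, the first request loads the ja model from trained_data, but later requests log "Loading model from None"; vosk then prints "lang None does not exist" and raises SystemExit, which the node reports as an unexpected error. The AttributeError in Model.__del__ presumably follows because the half-constructed Model never got a handle. One way to avoid this, sketched here under assumptions, is to validate the path before calling the constructor and reuse the loaded model. The function load_vosk_model and its loader parameter are hypothetical; loader defaults to vosk.Model and is injectable for testing.

```python
import os

_model_cache = {}


def load_vosk_model(model_path, loader=None):
    """Load (once) and return the Vosk model at model_path.

    Raises ValueError instead of letting vosk exit the process when the
    model directory is missing.
    """
    if not model_path or not os.path.isdir(model_path):
        raise ValueError("Vosk model directory not found: %r" % (model_path,))
    if model_path not in _model_cache:
        if loader is None:
            from vosk import Model as loader  # imported lazily
        _model_cache[model_path] = loader(model_path)
    return _model_cache[model_path]
```

Caching by path also avoids reloading the model (the i-vector extractor and graph seen in the log) on every recognition request.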

ros_speech_recognition/src/ros_speech_recognition/recognize_vosk.py

tkmtnt7000 · 2023-05-19T02:50:38Z

The test fails on the venv dependency-locking check.

[ros_speech_recognition:results] Full test results for 'test_results/ros_speech_recognition/venv_check-ros_speech_recognition-requirements.xml'
[ros_speech_recognition:results] -------------------------------------------------
[ros_speech_recognition:results] <?xml version='1.0' encoding='utf-8'?>
[ros_speech_recognition:results] <testsuite name="venv_check" tests="1" failures="1" errors="0"><testcase name="check_locked" classname="catkin_virtualenv.Venv"><failure message="/home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/requirements.txt is not fully locked">Consider defining INPUT_REQUIREMENTS to have catkin_virtualenv generate a lock file for this package.
[ros_speech_recognition:results] See https://github.com/locusrobotics/catkin_virtualenv/blob/master/README.md#locking-dependencies.
[ros_speech_recognition:results] The following changes would fully lock /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/requirements.txt:
[ros_speech_recognition:results] --- 
[ros_speech_recognition:results] 
[ros_speech_recognition:results] +++ 
[ros_speech_recognition:results] 
[ros_speech_recognition:results] @@ -1,2 +1,12 @@
[ros_speech_recognition:results] 
[ros_speech_recognition:results] +certifi==2023.5.7
[ros_speech_recognition:results] +cffi==1.15.1
[ros_speech_recognition:results] +charset-normalizer==3.1.0
[ros_speech_recognition:results] +idna==3.4
[ros_speech_recognition:results] +pycparser==2.21
[ros_speech_recognition:results] +requests==2.30.0
[ros_speech_recognition:results]  speechrecognition==3.9.0
[ros_speech_recognition:results] +srt==3.5.3
[ros_speech_recognition:results] +tqdm==4.65.0
[ros_speech_recognition:results] +urllib3==2.0.2
[ros_speech_recognition:results]  vosk==0.3.45
[ros_speech_recognition:results] +websockets==11.0.3</failure></testcase></testsuite>
[ros_speech_recognition:results] -------------------------------------------------
[ros_speech_recognition:results] test_results/ros_speech_recognition/venv_check-ros_speech_recognition-requirements.xml: 1 tests, 0 errors, 1 failures, 0 skipped
[ros_speech_recognition:results] Summary: 4 tests, 0 errors, 1 failures, 0 sk

You should either set CHECK_VENV to FALSE in CMakeLists.txt or add the other dependencies suggested by the test to requirements.txt.

Co-authored-by: Naoto Tsukamoto <[email protected]>

nakane11 · 2023-05-21T05:19:06Z

@tkmtnt7000 Thank you. I set CHECK_VENV to FALSE.

tkmtnt7000 · 2023-05-31T07:51:10Z

ros_speech_recognition/requirements.txt

+SpeechRecognition==3.9.0
+vosk==0.3.45


I am very sorry for the late comment.
I just found that neither SpeechRecognition==3.9.0 nor vosk is compatible with Python 3.4, so the indigo test will certainly fail.

Option 1: drop indigo support, as in 2c9e010.

Option 2: keep indigo support, as in e8742b0.

I think it may be OK to drop indigo support, depending on the #471 discussion.

k-okada · 2023-06-08T10:07:55Z

closed via #474

github-actions bot added the ros_speech_recognition label May 7, 2023

mqcmd196 requested changes May 9, 2023


mqcmd196 added the WaitForFix label May 12, 2023

mqcmd196 force-pushed the vosk branch from 81c92b8 to 5cab532 May 16, 2023 01:28

tkmtnt7000 reviewed May 16, 2023


ros_speech_recognition/src/ros_speech_recognition/recognize_vosk.py (outdated)

tkmtnt7000 requested changes May 16, 2023


ros_speech_recognition/src/ros_speech_recognition/recognize_vosk.py

tkmtnt7000 reviewed May 16, 2023


ros_speech_recognition/src/ros_speech_recognition/recognize_vosk.py (outdated)

tkmtnt7000 added the WaitForCI label May 18, 2023

nakane11 and others added 6 commits May 20, 2023 13:31

[ros_speech_recognition] Update dependencies

2199382

[ros_speech_recognition] Use vosk engine

e6956bc

[ros_speech_recognition] transform vosk result

7f1d084

[ros_speech_recognition] add vosk_model_path description to README

46bd2a2

[ros_speech_recognition] Install vosk models

a81d301

[ros_speech_recognition] Use installed vosk model

e181b80

Co-authored-by: Naoto Tsukamoto <[email protected]>

nakane11 force-pushed the vosk branch from 7b20eda to e181b80 May 20, 2023 04:34

Merge branch 'master' into vosk

0952c9f

tkmtnt7000 reviewed May 31, 2023


k-okada mentioned this pull request Jun 7, 2023

[ros_speech_recognition] Add vosk engine #474

Merged

k-okada closed this Jun 8, 2023

nakane11 deleted the vosk branch June 8, 2023 11:38
