[ros_speech_recognition] Add vosk engine #462

Closed
wants to merge 7 commits into from


nakane11
Member

@nakane11 nakane11 commented May 7, 2023

This PR enables the Vosk engine in ros_speech_recognition.
SpeechRecognition==3.9.0 and later provide more recognizer APIs.
Vosk is a speech recognition toolkit that works offline and supports Japanese.

Sample (tested with PR2)

$ wget https://alphacephei.com/vosk/models/vosk-model-small-ja-0.22.zip -P /tmp
$ unzip /tmp/vosk-model-small-ja-0.22.zip -d /tmp
<launch>
  <arg name="audio_topic" default="/audio" doc="Name of audio topic captured from microphone" />
  <arg name="voice_topic" default="/speech_to_text" doc="Name of text topic of recognized speech" />
  <arg name="n_channel" default="1" doc="Number of channels of the audio topic and microphone. Run '$ pactl list short sources' to check your hardware" />
  <arg name="depth" default="16" doc="Bit depth of the audio topic and microphone. Run '$ pactl list short sources' to check your hardware" />
  <arg name="sample_rate" default="16000" doc="Frame rate of the audio topic and microphone. Run '$ pactl list short sources' to check your hardware"/>
  <arg name="device" default="" doc="Card and device number of the microphone (e.g. hw:0,0). Check the card and device numbers with '$ arecord -l', then use hw:[card number],[device number]" />
  <arg name="engine" default="Vosk" doc="Speech to text engine: Google, GoogleCloud, Sphinx, Wit, Bing, Houndify, IBM, Vosk" />
  <arg name="language" default="en-US" doc="Speech to text language. For Japanese, set ja" />
  <arg name="continuous" default="true" doc="If false, /speech_recognition service is published. If true, /speech_to_text topic is published." />
  <arg name="auto_start" default="true" doc="Whether speech_recognition starts automatically or not. This parameter works when continuous is true" />

  <arg name="self_cancellation" default="true" doc="If true, do not recognize audio while the robot is speaking." />
  <arg name="tts_tolerance" default="1.0" doc="Tolerance in seconds for deciding whether the robot is speaking" />
  <arg name="tts_action_names" default="['sound_play']" doc="TTS action names. Audio output by these action servers is ignored by speech_recognition" />

  <node name="speech_recognition"
        pkg="ros_speech_recognition" type="speech_recognition_node.py"
        respawn="true"
        output="screen">
    <rosparam subst_value="true">
      audio_topic: $(arg audio_topic)
      voice_topic: $(arg voice_topic)
      n_channel: $(arg n_channel)
      depth: $(arg depth)
      sample_rate: $(arg sample_rate)
      engine: $(arg engine)
      language: $(arg language)
      continuous: $(arg continuous)
      auto_start: $(arg auto_start)
      self_cancellation: $(arg self_cancellation)
      tts_tolerance: $(arg tts_tolerance)
      tts_action_names: $(arg tts_action_names)
      vosk_model_path: /tmp/vosk-model-small-ja-0.22
    </rosparam>
  </node>

</launch>

@nakane11
Member Author

nakane11 commented May 7, 2023

Misrecognition of actual speech is rare, but to suppress spurious results during silence, I use a filter like https://github.com/nakane11/navigation_pr2/blob/d57d4253b36c96a0679753cda7a254dd8e333c1d/node_scripts/filter_vosk.py
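For readers who cannot open the linked script, here is a minimal, hypothetical sketch of such a silence filter; it is not the linked `filter_vosk.py`. It assumes the input is either the JSON string returned by Vosk's `KaldiRecognizer.FinalResult()` (e.g. `{"text" : "..."}`) or already-extracted text (possibly UTF-8 bytes, as in the `Result: b'\xe3\x81\x82\xe3\x81\x82'` line in the log below). The filler set is an assumption; "ああ" is used only because it is the kind of short output that log shows.

```python
import json

# Hypothetical filler outputs to drop; tune for your environment.
DEFAULT_FILLERS = frozenset({"ああ"})


def filter_vosk_result(raw_result, fillers=DEFAULT_FILLERS, min_chars=1):
    """Return the recognized text, or None if the result looks like silence.

    raw_result may be a Vosk JSON result string, plain text, or UTF-8 bytes.
    """
    if isinstance(raw_result, bytes):
        raw_result = raw_result.decode("utf-8")
    try:
        # Vosk results are JSON objects with a "text" field.
        text = json.loads(raw_result).get("text", "")
    except (ValueError, AttributeError):
        # Not a JSON object: treat the input as already-extracted text.
        text = raw_result
    text = text.strip()
    if len(text) < min_chars or text in fillers:
        return None  # empty or filler-only result: probably silence
    return text
```

Returning `None` lets the caller simply skip publishing instead of emitting an empty or filler-only message.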

@mqcmd196
Member

mqcmd196 commented May 9, 2023

How about downloading the model the first time this node is launched?

Member

@mqcmd196 mqcmd196 left a comment


How about adding a script that downloads Vosk models to ~/.ros/, as jsk_recognition does, and selecting the model with the language arg?
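A minimal sketch of such a download-on-first-use helper, assuming models are fetched from the same `https://alphacephei.com/vosk/models/<name>.zip` location used in the sample at the top of this PR and that each archive unpacks to a directory named after the model. The cache directory `~/.ros/vosk_models` and the function name `ensure_vosk_model` are hypothetical.

```python
import os
import shutil
import tempfile
import urllib.request
import zipfile

# Hypothetical cache directory under ~/.ros/.
DEFAULT_CACHE_DIR = os.path.join(os.path.expanduser("~"), ".ros", "vosk_models")
# Base URL from the wget command in the sample above.
MODEL_BASE_URL = "https://alphacephei.com/vosk/models"


def ensure_vosk_model(model_name, cache_dir=DEFAULT_CACHE_DIR, base_url=MODEL_BASE_URL):
    """Return the directory of an unpacked Vosk model, downloading it on first use."""
    model_dir = os.path.join(cache_dir, model_name)
    if os.path.isdir(model_dir):
        return model_dir  # already cached: no network access
    os.makedirs(cache_dir, exist_ok=True)
    url = "{}/{}.zip".format(base_url, model_name)
    # Work inside cache_dir so the final move is a same-filesystem rename and
    # an interrupted run never leaves a half-extracted model under model_dir.
    with tempfile.TemporaryDirectory(dir=cache_dir) as tmp:
        zip_path = os.path.join(tmp, model_name + ".zip")
        with urllib.request.urlopen(url) as resp, open(zip_path, "wb") as out:
            shutil.copyfileobj(resp, out)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        extracted = os.path.join(tmp, model_name)
        if not os.path.isdir(extracted):
            raise RuntimeError("{} did not unpack to {}/".format(url, model_name))
        shutil.move(extracted, model_dir)
    return model_dir
```

Usage (assumption): `ensure_vosk_model("vosk-model-small-ja-0.22")` downloads once and returns the cached path on every later launch.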

@nakane11
Member Author

nakane11 commented May 9, 2023

Thank you!
It's a good idea. I'll update it.

@nakane11
Member Author

@mqcmd196
If vosk_model_path is specified, it takes priority.
If vosk_model_path is not set and language is en-US or ja, the corresponding model already downloaded to trained_data is used.
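The priority rule described above can be sketched as a small pure function. The `ja` model name matches the model loaded from `trained_data` in the log below; the `en-US` entry (`vosk-model-small-en-us-0.15`) is an assumption, since this thread does not name the English model. The function name is hypothetical.

```python
import os

# Language -> model directory expected under trained_data.
# "ja" matches the log in this thread; the "en-US" name is an assumption.
BUNDLED_MODELS = {
    "ja": "vosk-model-small-ja-0.22",
    "en-US": "vosk-model-small-en-us-0.15",
}


def resolve_vosk_model_path(vosk_model_path, language, data_path):
    """Return the model directory to load, following the priority rule above."""
    # An explicitly set ~vosk_model_path always wins (an empty string counts as unset).
    if vosk_model_path:
        return vosk_model_path
    name = BUNDLED_MODELS.get(language)
    if name is None:
        raise ValueError(
            "No bundled Vosk model for language {!r}; set ~vosk_model_path".format(language))
    return os.path.join(data_path, name)
```

Raising an explicit error for unsupported languages keeps a missing model from surfacing later as a `None` path.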

@tkmtnt7000
Member

With the current implementation, I get the following error when I leave ~vosk_model_path unset, as the README says.

tsukamoto@tsukamoto-desktop-ryzen ~/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/src (vosk *%) 
$ ROSCONSOLE_FORMAT='[${severity}] [${time}] [${node}]: [${message}]' roslaunch ros_speech_recognition speech_recognition.launch engine:=Vosk launch_sound_play:=false language:=ja
... logging to /home/tsukamoto/.ros/log/d506cd58-f38e-11ed-9492-937962485673/roslaunch-tsukamoto-desktop-ryzen-1067313.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
WARNING: disk usage in log directory [/home/tsukamoto/.ros/log] is over 1GB.
It's recommended that you use the 'rosclean' command.

started roslaunch server http://tsukamoto-desktop-ryzen:43425/

SUMMARY
========

PARAMETERS
 * /audio_capture/channels: 1
 * /audio_capture/depth: 16
 * /audio_capture/device: 
 * /audio_capture/format: wave
 * /audio_capture/sample_rate: 16000
 * /rosdistro: noetic
 * /rosversion: 1.16.0
 * /speech_recognition/audio_topic: /audio
 * /speech_recognition/auto_start: True
 * /speech_recognition/continuous: True
 * /speech_recognition/depth: 16
 * /speech_recognition/enable_sound_effect: False
 * /speech_recognition/engine: Vosk
 * /speech_recognition/language: ja
 * /speech_recognition/n_channel: 1
 * /speech_recognition/sample_rate: 16000
 * /speech_recognition/self_cancellation: True
 * /speech_recognition/tts_action_names: ['sound_play']
 * /speech_recognition/tts_tolerance: 1.0
 * /speech_recognition/voice_topic: /speech_to_text

NODES
  /
    audio_capture (audio_capture/audio_capture)
    speech_recognition (ros_speech_recognition/speech_recognition_node.py)
    speech_recognition_candidates_to_string (ros_speech_recognition/speech_recognition_candidates_to_string.py)

ROS_MASTER_URI=http://localhost:11311

process[audio_capture-1]: started with pid [1067327]
process[speech_recognition-2]: started with pid [1067328]
process[speech_recognition_candidates_to_string-3]: started with pid [1067333]
[WARN] [1684207609.205358] [/speech_recognition_candidates_to_string]: [[/speech_recognition_candidates_to_string] subscribes topics only with child subscribers. Set '~always_subscribe' as True to have it subscribe always.]
[ERROR] [1684207609.253494] [/speech_recognition]: [action 'sound_play' is not initialized.]
[INFO] [1684207609.270212] [/speech_recognition]: [Enabled continuous mode]
[INFO] [1684207609.270874] [/speech_recognition]: [Auto start: True]
[INFO] [1684207609.753036] [/speech_recognition]: [Set minimum energy threshold to 358.4727721173039]
[WARN] [1684207613.506773] [/speech_recognition]: [data_path: /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data]
[WARN] [1684207613.507372] [/speech_recognition]: [model_path_before: None]
[INFO] [1684207613.507909] [/speech_recognition]: [Loading model from /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22]
LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=13 max-active=7000 lattice-beam=4
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:6:7:8:9:10
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:282) Loading HCL and G from /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22/graph/HCLr.fst /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():model.cc:308) Loading winfo /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22/graph/phones/word_boundary.int
[INFO] [1684207614.227814] [/speech_recognition]: [Result: b'\xe3\x81\x82\xe3\x81\x82']
[INFO] [1684207619.698366] [/speech_recognition]: [Loading model from None]
lang None does not exist
[ERROR] [1684207622.279914] [/speech_recognition]: [Unexpected error: (<class 'SystemExit'>, SystemExit(1), <traceback object at 0x7f722adf6b40>)]
Exception ignored in: <function Model.__del__ at 0x7f722cb3faf0>
Traceback (most recent call last):
  File "/home/tsukamoto/ros/fetch_ws/devel/.private/ros_speech_recognition/share/ros_speech_recognition/venv/lib/python3.8/site-packages/vosk/__init__.py", line 60, in __del__
    _c.vosk_model_free(self._handle)
AttributeError: 'Model' object has no attribute '_handle'
[INFO] [1684207634.369576] [/speech_recognition]: [Loading model from None]
lang None does not exist
[ERROR] [1684207635.364335] [/speech_recognition]: [Unexpected error: (<class 'SystemExit'>, SystemExit(1), <traceback object at 0x7f722b0571c0>)]
Exception ignored in: <function Model.__del__ at 0x7f722cb3faf0>
Traceback (most recent call last):
  File "/home/tsukamoto/ros/fetch_ws/devel/.private/ros_speech_recognition/share/ros_speech_recognition/venv/lib/python3.8/site-packages/vosk/__init__.py", line 60, in __del__
    _c.vosk_model_free(self._handle)
AttributeError: 'Model' object has no attribute '_handle'
[INFO] [1684207638.883466] [/speech_recognition]: [Loading model from None]
lang None does not exist
[ERROR] [1684207640.547415] [/speech_recognition]: [Unexpected error: (<class 'SystemExit'>, SystemExit(1), <traceback object at 0x7f7229de8e00>)]
Exception ignored in: <function Model.__del__ at 0x7f722cb3faf0>
Traceback (most recent call last):
  File "/home/tsukamoto/ros/fetch_ws/devel/.private/ros_speech_recognition/share/ros_speech_recognition/venv/lib/python3.8/site-packages/vosk/__init__.py", line 60, in __del__
    _c.vosk_model_free(self._handle)
AttributeError: 'Model' object has no attribute '_handle'
^C[speech_recognition_candidates_to_string-3] killing on exit
[speech_recognition-2] killing on exit
[audio_capture-1] killing on exit
[speech_recognition-2] escalating to SIGTERM
[speech_recognition-2] escalating to SIGKILL
Shutdown errors:
 * process[speech_recognition-2, pid 1067328]: required SIGKILL. May still be running.
shutting down processing monitor...
... shutting down processing monitor complete
done
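In the log above, the first recognition loads the model from `trained_data`, but every later call logs `Loading model from None`. When given no path, vosk's `Model` appears to fall back to its own language lookup, which prints `lang None does not exist` and exits; the `'Model' object has no attribute '_handle'` messages then come from `Model.__del__` running on the half-constructed object. A minimal sketch of one way to avoid this, assuming the path is resolved once at startup: load the model once and reuse it. The class name `VoskModelCache` and its `loader` argument (used to inject a fake in tests) are hypothetical.

```python
class VoskModelCache(object):
    """Load a Vosk model once and reuse it for every recognition call."""

    def __init__(self, model_path, loader=None):
        if not model_path:
            # Fail early with a clear message instead of passing None to
            # vosk.Model, whose fallback lookup ends the process.
            raise ValueError("Vosk model path is not set")
        self._model_path = model_path
        self._loader = loader
        self._model = None

    def get(self):
        """Return the cached model, loading it on the first call only."""
        if self._model is None:
            loader = self._loader
            if loader is None:
                from vosk import Model  # imported lazily; needs the vosk package
                loader = Model
            self._model = loader(self._model_path)
        return self._model
```

Keeping the resolved path in the cache object, rather than re-reading a variable that may have been reset, is what prevents the `None` from reaching vosk on the second call.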

@tkmtnt7000
Member

The test fails on the venv dependency-locking check.

[ros_speech_recognition:results] Full test results for 'test_results/ros_speech_recognition/venv_check-ros_speech_recognition-requirements.xml'
[ros_speech_recognition:results] -------------------------------------------------
[ros_speech_recognition:results] <?xml version='1.0' encoding='utf-8'?>
[ros_speech_recognition:results] <testsuite name="venv_check" tests="1" failures="1" errors="0"><testcase name="check_locked" classname="catkin_virtualenv.Venv"><failure message="/home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/requirements.txt is not fully locked">Consider defining INPUT_REQUIREMENTS to have catkin_virtualenv generate a lock file for this package.
[ros_speech_recognition:results] See https://github.com/locusrobotics/catkin_virtualenv/blob/master/README.md#locking-dependencies.
[ros_speech_recognition:results] The following changes would fully lock /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/requirements.txt:
[ros_speech_recognition:results] --- 
[ros_speech_recognition:results] 
[ros_speech_recognition:results] +++ 
[ros_speech_recognition:results] 
[ros_speech_recognition:results] @@ -1,2 +1,12 @@
[ros_speech_recognition:results] 
[ros_speech_recognition:results] +certifi==2023.5.7
[ros_speech_recognition:results] +cffi==1.15.1
[ros_speech_recognition:results] +charset-normalizer==3.1.0
[ros_speech_recognition:results] +idna==3.4
[ros_speech_recognition:results] +pycparser==2.21
[ros_speech_recognition:results] +requests==2.30.0
[ros_speech_recognition:results]  speechrecognition==3.9.0
[ros_speech_recognition:results] +srt==3.5.3
[ros_speech_recognition:results] +tqdm==4.65.0
[ros_speech_recognition:results] +urllib3==2.0.2
[ros_speech_recognition:results]  vosk==0.3.45
[ros_speech_recognition:results] +websockets==11.0.3</failure></testcase></testsuite>
[ros_speech_recognition:results] -------------------------------------------------
[ros_speech_recognition:results] test_results/ros_speech_recognition/venv_check-ros_speech_recognition-requirements.xml: 1 tests, 0 errors, 1 failures, 0 skipped
[ros_speech_recognition:results] Summary: 4 tests, 0 errors, 1 failures, 0 sk

You should either set CHECK_VENV to FALSE in CMakeLists.txt or add the other dependencies the test suggests to requirements.txt.

@nakane11
Member Author

@tkmtnt7000 Thank you. I set CHECK_VENV to FALSE.

Comment on lines +1 to +2
SpeechRecognition==3.9.0
vosk==0.3.45
Member

@tkmtnt7000 tkmtnt7000 May 31, 2023


I am very sorry for the late comment.
I just found that neither SpeechRecognition==3.9.0 nor vosk is compatible with Python 3.4, so the indigo test will surely fail.

Member


  1. Drop indigo support, as in 2c9e010
  2. Support indigo, as in e8742b0

I think it may be OK to drop indigo support, depending on the discussion in #471.
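If indigo is kept, one runtime-side piece is to enable the Vosk engine only when the package can actually be imported, so the node degrades instead of crashing. This is a hypothetical sketch, not the approach in e8742b0 (whose contents are not shown here); the `ENGINE_MODULES` mapping and `engine_available` name are assumptions. It does not solve installing the pinned requirements on Python 3.4, which still needs separate handling in the build.

```python
import importlib.util

# Hypothetical mapping from engine name to the extra module it requires.
ENGINE_MODULES = {"Vosk": "vosk"}


def engine_available(engine):
    """Return True if every extra module needed by `engine` is importable."""
    module = ENGINE_MODULES.get(engine)
    if module is None:
        return True  # no extra module required in this sketch
    # find_spec returns None for a missing top-level module without importing it.
    return importlib.util.find_spec(module) is not None
```

The node could check `engine_available(engine)` at startup and log an error or fall back to another engine when it returns False.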

@k-okada
Member

k-okada commented Jun 8, 2023

closed via #474

@k-okada k-okada closed this Jun 8, 2023
@nakane11 nakane11 deleted the vosk branch June 8, 2023 11:38