[ros_speech_recognition] Add vosk engine #462
Misrecognition of actual speech is rare, but to avoid false recognitions during silence, I use a filter such as https://github.com/nakane11/navigation_pr2/blob/d57d4253b36c96a0679753cda7a254dd8e333c1d/node_scripts/filter_vosk.py
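For illustration only, a silence filter of this kind might look like the sketch below. It is not the linked filter_vosk.py: the names `is_probably_noise` and `filter_transcripts`, the filler list, and the minimum length are all assumptions made for the example.

```python
# Hypothetical post-filter for recognizer output; NOT the linked script.
# Assumed set of short utterances that tend to appear when nobody is speaking.
FILLERS = {"ああ", "あ", "えー", "うん"}


def is_probably_noise(transcript, fillers=FILLERS, min_chars=2):
    """Return True when a transcript looks like a silence misrecognition."""
    text = transcript.strip()
    if len(text) < min_chars:
        # Empty or one-character results are treated as noise (assumption).
        return True
    return text in fillers


def filter_transcripts(transcripts):
    """Keep only the transcripts that pass the noise check."""
    return [t for t in transcripts if not is_probably_noise(t)]
```

A stoplist like this is crude; it would need tuning per language and per microphone setup.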
How about downloading the model the first time this node is launched?
How about adding a script that downloads vosk models to ~/.ros/, like jsk_recognition does, and selecting the model with the language arg?
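A rough sketch of what such a download helper could look like is given below. Everything in it is an assumption for illustration: the `~/.ros/vosk` directory, the language-to-model table, the download host, the function names, and the expectation that each archive unpacks into a directory named after the model.

```python
import os
import urllib.request
import zipfile

# Assumed language -> model-name table; extend as needed.
MODELS = {
    "ja": "vosk-model-small-ja-0.22",
    "en-US": "vosk-model-small-en-us-0.15",
}
BASE_URL = "https://alphacephei.com/vosk/models/"  # assumed download host


def model_location(language, root="~/.ros/vosk"):
    """Return (local_dir, archive_url) for the model of the given language."""
    if language not in MODELS:
        raise ValueError("no Vosk model registered for language %r" % language)
    name = MODELS[language]
    return os.path.join(os.path.expanduser(root), name), BASE_URL + name + ".zip"


def ensure_model(language, root="~/.ros/vosk"):
    """Download and unpack the model on first use, then return its path."""
    path, url = model_location(language, root)
    if not os.path.isdir(path):
        parent = os.path.dirname(path)
        os.makedirs(parent, exist_ok=True)
        archive, _ = urllib.request.urlretrieve(url)  # saved to a temp file
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(parent)  # assumes a top-level dir named after the model
    return path
```

The node could then call `ensure_model(language)` once at startup and pass the returned path to the recognizer.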
Thank you!
@mqcmd196
ros_speech_recognition/src/ros_speech_recognition/recognize_vosk.py
With the current implementation, I get the following error when I don't set
tsukamoto@tsukamoto-desktop-ryzen ~/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/src (vosk *%)
$ ROSCONSOLE_FORMAT='[${severity}] [${time}] [${node}]: [${message}]' roslaunch ros_speech_recognition speech_recognition.launch engine:=Vosk launch_sound_play:=false language:=ja
... logging to /home/tsukamoto/.ros/log/d506cd58-f38e-11ed-9492-937962485673/roslaunch-tsukamoto-desktop-ryzen-1067313.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
WARNING: disk usage in log directory [/home/tsukamoto/.ros/log] is over 1GB.
It's recommended that you use the 'rosclean' command.
started roslaunch server http://tsukamoto-desktop-ryzen:43425/
SUMMARY
========
PARAMETERS
* /audio_capture/channels: 1
* /audio_capture/depth: 16
* /audio_capture/device:
* /audio_capture/format: wave
* /audio_capture/sample_rate: 16000
* /rosdistro: noetic
* /rosversion: 1.16.0
* /speech_recognition/audio_topic: /audio
* /speech_recognition/auto_start: True
* /speech_recognition/continuous: True
* /speech_recognition/depth: 16
* /speech_recognition/enable_sound_effect: False
* /speech_recognition/engine: Vosk
* /speech_recognition/language: ja
* /speech_recognition/n_channel: 1
* /speech_recognition/sample_rate: 16000
* /speech_recognition/self_cancellation: True
* /speech_recognition/tts_action_names: ['sound_play']
* /speech_recognition/tts_tolerance: 1.0
* /speech_recognition/voice_topic: /speech_to_text
NODES
/
audio_capture (audio_capture/audio_capture)
speech_recognition (ros_speech_recognition/speech_recognition_node.py)
speech_recognition_candidates_to_string (ros_speech_recognition/speech_recognition_candidates_to_string.py)
ROS_MASTER_URI=http://localhost:11311
process[audio_capture-1]: started with pid [1067327]
process[speech_recognition-2]: started with pid [1067328]
process[speech_recognition_candidates_to_string-3]: started with pid [1067333]
[WARN] [1684207609.205358] [/speech_recognition_candidates_to_string]: [[/speech_recognition_candidates_to_string] subscribes topics only with child subscribers. Set '~always_subscribe' as True to have it subscribe always.]
[ERROR] [1684207609.253494] [/speech_recognition]: [action 'sound_play' is not initialized.]
[INFO] [1684207609.270212] [/speech_recognition]: [Enabled continuous mode]
[INFO] [1684207609.270874] [/speech_recognition]: [Auto start: True]
[INFO] [1684207609.753036] [/speech_recognition]: [Set minimum energy threshold to 358.4727721173039]
[WARN] [1684207613.506773] [/speech_recognition]: [data_path: /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data]
[WARN] [1684207613.507372] [/speech_recognition]: [model_path_before: None]
[INFO] [1684207613.507909] [/speech_recognition]: [Loading model from /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22]
LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=13 max-active=7000 lattice-beam=4
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:6:7:8:9:10
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:282) Loading HCL and G from /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22/graph/HCLr.fst /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():model.cc:308) Loading winfo /home/tsukamoto/ros/fetch_ws/src/jsk-ros-pkg/jsk_3rdparty/ros_speech_recognition/trained_data/vosk-model-small-ja-0.22/graph/phones/word_boundary.int
[INFO] [1684207614.227814] [/speech_recognition]: [Result: b'\xe3\x81\x82\xe3\x81\x82']
[INFO] [1684207619.698366] [/speech_recognition]: [Loading model from None]
lang None does not exist
[ERROR] [1684207622.279914] [/speech_recognition]: [Unexpected error: (<class 'SystemExit'>, SystemExit(1), <traceback object at 0x7f722adf6b40>)]
Exception ignored in: <function Model.__del__ at 0x7f722cb3faf0>
Traceback (most recent call last):
File "/home/tsukamoto/ros/fetch_ws/devel/.private/ros_speech_recognition/share/ros_speech_recognition/venv/lib/python3.8/site-packages/vosk/__init__.py", line 60, in __del__
_c.vosk_model_free(self._handle)
AttributeError: 'Model' object has no attribute '_handle'
[INFO] [1684207634.369576] [/speech_recognition]: [Loading model from None]
lang None does not exist
[ERROR] [1684207635.364335] [/speech_recognition]: [Unexpected error: (<class 'SystemExit'>, SystemExit(1), <traceback object at 0x7f722b0571c0>)]
Exception ignored in: <function Model.__del__ at 0x7f722cb3faf0>
Traceback (most recent call last):
File "/home/tsukamoto/ros/fetch_ws/devel/.private/ros_speech_recognition/share/ros_speech_recognition/venv/lib/python3.8/site-packages/vosk/__init__.py", line 60, in __del__
_c.vosk_model_free(self._handle)
AttributeError: 'Model' object has no attribute '_handle'
[INFO] [1684207638.883466] [/speech_recognition]: [Loading model from None]
lang None does not exist
[ERROR] [1684207640.547415] [/speech_recognition]: [Unexpected error: (<class 'SystemExit'>, SystemExit(1), <traceback object at 0x7f7229de8e00>)]
Exception ignored in: <function Model.__del__ at 0x7f722cb3faf0>
Traceback (most recent call last):
File "/home/tsukamoto/ros/fetch_ws/devel/.private/ros_speech_recognition/share/ros_speech_recognition/venv/lib/python3.8/site-packages/vosk/__init__.py", line 60, in __del__
_c.vosk_model_free(self._handle)
AttributeError: 'Model' object has no attribute '_handle'
^C[speech_recognition_candidates_to_string-3] killing on exit
[speech_recognition-2] killing on exit
[audio_capture-1] killing on exit
[speech_recognition-2] escalating to SIGTERM
[speech_recognition-2] escalating to SIGKILL
Shutdown errors:
* process[speech_recognition-2, pid 1067328]: required SIGKILL. May still be running.
shutting down processing monitor...
... shutting down processing monitor complete
done
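The log above appears to show the first load succeeding with an explicit path, and later loads receiving `None`, after which the process hits `SystemExit(1)` instead of an ordinary exception. One possible guard, sketched under assumptions, is to resolve and validate the path before any model is constructed. The function name `resolve_model_path` and the default table are hypothetical and are not taken from this PR.

```python
import os


def resolve_model_path(model_path, language, data_path):
    """Return a usable model directory, or raise ValueError early.

    Hypothetical helper: it falls back to a per-language default under
    data_path when model_path is None, so None never reaches the recognizer.
    """
    defaults = {"ja": "vosk-model-small-ja-0.22"}  # assumed default per language
    if model_path is None:
        if language not in defaults:
            raise ValueError("no default Vosk model for language %r" % language)
        model_path = os.path.join(data_path, defaults[language])
    if not os.path.isdir(model_path):
        raise ValueError("Vosk model directory not found: %s" % model_path)
    return model_path
```

Resolving the path once and reusing the result on every recognition call would also avoid reloading the model each time.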
ros_speech_recognition/src/ros_speech_recognition/recognize_vosk.py
The test fails at the venv dependency-locking step.
You should set
Co-authored-by: Naoto Tsukamoto <[email protected]>
@tkmtnt7000 Thank you. I set CHECK_VENV to FALSE.
SpeechRecognition==3.9.0
vosk==0.3.45
I am very sorry for the late comment.
I just found that neither SpeechRecognition==3.9.0 nor vosk is compatible with Python 3.4, so the indigo test will surely fail.
Closed via #474
This PR enables the Vosk engine in speech recognition.
More APIs became available after SpeechRecognition==3.9.0.
Vosk is an offline speech recognition toolkit that supports Japanese.
Sample (tested with PR2)