Wav2vec resources

Wav2Vec tools:

Huggingface

Advantage: compatible with other huggingface layers

Disadvantage: at the time of writing, it does not support decoding with a language model; support may be added in a later release.
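For context, decoding without a language model usually means greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, and drop the blank token. The sketch below is a minimal, library-free illustration of that idea, not the Hugging Face implementation. The toy vocabulary, the function name `greedy_ctc_decode`, and the choice of `0` as the blank id and `|` as the word delimiter are all assumptions made for the example.

```python
from itertools import groupby


def greedy_ctc_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding: collapse repeated ids, drop blanks, join tokens."""
    # Collapse runs of identical ids (e.g. 2, 2, 2 -> 2).
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Remove the blank token, which only separates repeated characters.
    chars = [id_to_token[i] for i in collapsed if i != blank_id]
    # Turn the word delimiter into spaces (assumed convention for this sketch).
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary, for illustration only.
vocab = {0: "<pad>", 1: "|", 2: "h", 3: "i"}

# Per-frame argmax ids, as they might come out of a CTC model.
frames = [2, 2, 0, 3, 3, 1, 1, 0, 2, 3]
print(greedy_ctc_decode(frames, vocab))
```

Because each frame is decoded independently, nothing here corrects acoustically plausible but unlikely word sequences; that is the gap a language model would fill.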

Good resource to take a look at: Fine-Tune Wav2Vec for English ASR with Huggingface Transformers

If you want to follow the provided resource with your custom dataset, take a look at this notebook.

Fairseq

Advantage: supports decoding with a language model. All hyperparameters are set in a single YAML file.

Disadvantage: requires a lot of compute (the original study used 24 GPUs, and training runs out of memory (OOM error) on Google Colab).

TODO: Upload the custom scripts written to preprocess the data structure.
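Until those scripts are uploaded, here is a rough sketch of one typical preprocessing step: building a manifest file. It assumes the fairseq wav2vec manifest layout of a TSV whose first line is the audio root directory, followed by one `relative_path<TAB>num_samples` row per file; please check this against your fairseq version. The function name `write_manifest` is hypothetical, and the standard-library `wave` module is used here only to keep the example dependency-free (it reads PCM WAV files only).

```python
import os
import wave


def write_manifest(audio_root, manifest_path):
    """Write a TSV manifest: root dir on line 1, then 'rel_path<TAB>num_samples'."""
    audio_root = os.path.abspath(audio_root)
    rows = []
    for dirpath, _, filenames in os.walk(audio_root):
        for name in filenames:
            if not name.lower().endswith(".wav"):
                continue  # skip non-audio files such as the manifest itself
            full_path = os.path.join(dirpath, name)
            with wave.open(full_path, "rb") as wav_file:
                num_samples = wav_file.getnframes()
            rows.append((os.path.relpath(full_path, audio_root), num_samples))
    rows.sort()  # deterministic order regardless of filesystem listing

    with open(manifest_path, "w", encoding="utf-8") as out:
        out.write(audio_root + "\n")
        for rel_path, num_samples in rows:
            out.write(f"{rel_path}\t{num_samples}\n")
    return rows
```

Calling `write_manifest("data/wavs", "data/train.tsv")` (both paths are placeholders) would produce one manifest; a separate call per split gives train and validation files.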

Install flashlight:

How to fix the "CANNOT FIND FFTW3LibraryDependency" error: GitHub issue

How to export the MKLROOT path: GitHub issue

SpeechBrain

Advantage: compatible with Huggingface. Supports decoding with a language model. Good documentation.

Disadvantage: requires Python >= 3.8. You need to understand the structure of its YAML files and custom functions.

TODO:

Wav2Vec2.0 with huggingface and fairseq using SpeechBrain HERE

Pretrain and finetune the model using huggingface HERE
