Wav2vec resources

Seongjin Park edited this page Jul 7, 2021 · 13 revisions

Wav2Vec tools:

  1. Huggingface
  • If you want to follow the provided resource with your custom dataset, take a look at this notebook.
  1. Fairseq
  • Advantage: supports language-model decoding. All hyperparameters are set in a YAML file.
  • Disadvantage: requires substantial compute (the original study used 24 GPUs, and training hits an out-of-memory (OOM) error on Google Colab).
  • TODO: Upload the custom scripts written to preprocess the data structure.
  • Install flashlight:
  1. SpeechBrain
  • Advantage: compatible with Huggingface. Supports language models. Good documentation.
  • Disadvantage: requires Python >= 3.8. You need to understand the structure of the YAML files and the custom functions.
  • TODO:
  • Wav2Vec2.0 with huggingface and fairseq using SpeechBrain HERE
  • Pretrain and finetune the model using huggingface HERE
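
For the Fairseq data-preparation step mentioned above, here is a minimal sketch of building a training manifest from a folder of WAV files. It is not the custom preprocessing script referred to in the TODO. It assumes the TSV layout produced by fairseq's `wav2vec_manifest.py` helper (root directory on the first line, then one `relative/path<TAB>num_samples` row per file), and `write_manifest` is a hypothetical name chosen for illustration. It uses only the standard library, so it handles uncompressed PCM WAV files only.

```python
import wave
from pathlib import Path


def write_manifest(audio_root, manifest_path):
    """Write a fairseq-style TSV manifest for every .wav file under audio_root.

    Hypothetical helper: assumes the manifest layout is
    '<root dir>' on line 1, then '<relative path>\t<num samples>' per file.
    Returns the number of audio files listed.
    """
    root = Path(audio_root).resolve()
    rows = []
    for wav_path in sorted(root.rglob("*.wav")):
        # Read the sample count from the WAV header (PCM files only).
        with wave.open(str(wav_path), "rb") as wav_file:
            num_frames = wav_file.getnframes()
        rows.append(f"{wav_path.relative_to(root).as_posix()}\t{num_frames}")

    with open(manifest_path, "w", encoding="utf-8") as manifest:
        manifest.write(f"{root}\n")  # first line: root directory
        for row in rows:
            manifest.write(row + "\n")
    return len(rows)
```

Calling `write_manifest("data/wavs", "train.tsv")` (both paths are placeholders) would list every WAV file under `data/wavs`. Splitting the rows into train and validation sets is left out of this sketch.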