SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

This repository is the official implementation of "SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation".

Paper: arxiv
Demo page: Audio Samples
Checkpoints: Hugging Face (currently only the checkpoints are available)

Contact:

Koichi SAITO: koichi.saito@sony.com

Info

[2024/12/04] We're planning to open-source the codebase and checkpoints of the DiT backbone with the full-band text-to-sound setting, as well as the downstream tasks.

Checkpoints

Download the teacher model's checkpoints and the AudioLDM-s-full checkpoint (used for the VAE and vocoder), and place them in soundctm/ckpt.
SoundCTM checkpoint on AudioCaps (ema=0.999, 30K training iterations)

For inference, both AudioLDM-s-full (for VAE's decoder+Vocoder) and SoundCTM checkpoints will be used.
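
Before running inference, it can help to confirm that the downloaded files are actually in soundctm/ckpt. The following is a minimal sketch, not part of the repository: the helper name and the file names in the example are placeholders (assumptions), so replace them with the names of the files you downloaded.

```python
from pathlib import Path


def missing_checkpoints(ckpt_dir, expected_names):
    """Return the names from expected_names that are not regular files in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in expected_names if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical file names: replace them with the files you actually downloaded.
    expected = ["audioldm-s-full.ckpt", "teacher.pt", "soundctm.pt"]
    missing = missing_checkpoints("soundctm/ckpt", expected)
    if missing:
        print("Missing checkpoint files:", ", ".join(missing))
    else:
        print("All expected checkpoint files were found.")
```

The check only looks at file presence; it does not verify that a checkpoint is complete or loadable.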

Prerequisites

Install Docker on your server and build the Docker image:

docker build -t soundctm .

Then run the scripts inside the container.

Training

Please see ctm_train.sh and ctm_train.py, and modify the folder paths depending on your environment.

Then run bash ctm_train.sh

Inference

Please see ctm_inference.sh and ctm_inference.py, and modify the folder paths depending on your environment.

Then run bash ctm_inference.sh

Numerical evaluation

Please see numerical_evaluation.sh and numerical_evaluation.py, and modify the folder paths depending on your environment.

Then run bash numerical_evaluation.sh

Dataset

Follow the instructions given in the AudioCaps repository for downloading the data. The data locations need to be specified in ctm_train.sh. You can also see some examples in data/train.csv.
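
After downloading, a quick sanity check is to confirm that every audio file listed in your CSV exists on disk. The sketch below is an illustration only: the helper name is hypothetical, and the name of the column that holds the audio path is an assumption you must supply after looking at the header of data/train.csv.

```python
import csv
from pathlib import Path


def find_missing_audio(csv_path, path_column):
    """Return the values in path_column that do not point to an existing file."""
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early if the assumed column name is not in the CSV header.
        if path_column not in (reader.fieldnames or []):
            raise KeyError(f"column {path_column!r} not found in {csv_path}")
        for row in reader:
            if not Path(row[path_column]).is_file():
                missing.append(row[path_column])
    return missing
```

Relative paths in the CSV are resolved against the current working directory, so run the check from the same directory you launch training from.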

WandB for logging

The training code also requires a Weights & Biases account to log the training outputs and demos. Create an account and log in with:

$ wandb login

Alternatively, pass an API key through the environment variable WANDB_API_KEY. (You can obtain the API key from https://wandb.ai/authorize after logging in to your account.)

$ export WANDB_API_KEY="12345x6789y..."
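
The training process can only use WANDB_API_KEY if the variable is part of the environment it inherits. As a rough sketch (the helper name is hypothetical and not part of the repository), the following check can be run with the same shell and container setup as the training script to see whether the key is visible from Python:

```python
import os


def wandb_key_available(environ=None):
    """Return True if WANDB_API_KEY is set to a non-empty value."""
    env = os.environ if environ is None else environ
    return bool(env.get("WANDB_API_KEY", "").strip())


if __name__ == "__main__":
    if wandb_key_available():
        print("WANDB_API_KEY is visible to this process.")
    else:
        print("WANDB_API_KEY is not set in this process's environment.")
```

This covers only the environment-variable route; it says nothing about credentials saved by wandb login.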

Citation

@article{saito2024soundctm,
  title={SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation}, 
  author={Koichi Saito and Dongjun Kim and Takashi Shibuya and Chieh-Hsin Lai and Zhi Zhong and Yuhta Takida and Yuki Mitsufuji},
  journal={arXiv preprint arXiv:2405.18503},
  year={2024}
}

Reference

Part of the code is borrowed from the following repos. We would like to thank the authors of these repos for their contributions.

https://github.com/sony/ctm

https://github.com/declare-lab/tango

https://github.com/haoheliu/AudioLDM

https://github.com/haoheliu/audioldm_eval
