Authored by Marcus Gursch (commit bb9f92dc)

Artificial Bandwidth Extension of Speech using Generative Neural Networks

Setup

# create and activate python venv
python3 -m venv env
source ./env/bin/activate
python -m pip install -r requirements.txt  # latest versions
# python -m pip install -r requirements_freeze.txt  # specific versions
# get VCTK
source ./env/bin/activate
python src/dataset_init.py

This will download VCTK, resample it to 16 kHz, create the train/test split, and remove speakers p280 and p315.
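The resampling step can be sketched as follows. This is a minimal sketch using SciPy's polyphase resampler; the README does not say which tool dataset_init.py actually uses, so the library choice here is an assumption:

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

# VCTK is distributed at 48 kHz; the dataset script brings it to 16 kHz.
SRC_RATE = 48_000
DST_RATE = 16_000

def resample_to_16k(audio: np.ndarray, src_rate: int = SRC_RATE) -> np.ndarray:
    """Resample a mono waveform to 16 kHz via polyphase filtering."""
    g = gcd(DST_RATE, src_rate)
    return resample_poly(audio, up=DST_RATE // g, down=src_rate // g)

# one second of audio at 48 kHz becomes 16k samples at 16 kHz
x = np.zeros(SRC_RATE)
y = resample_to_16k(x)
```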

Training

For two-step trainings, first run train_[...]_step_1.py. This creates a run directory in runs/[...]_step_1 in which training metrics and state_dict snapshots are stored. A state_dict from step 1, typically the latest one, then has to be passed to train_[...]_step_2.py.
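Picking "the latest" snapshot from a step-1 run directory can be sketched like this. The `*.pt` naming pattern and the mtime-based ordering are assumptions; adapt them to the actual layout of the runs/ directory:

```python
from pathlib import Path

def latest_state_dict(run_dir: str, pattern: str = "*.pt") -> Path:
    """Return the most recently modified state_dict snapshot in run_dir.

    The '*.pt' pattern is a guess at the snapshot naming scheme used by
    the step-1 training scripts.
    """
    snapshots = sorted(Path(run_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    if not snapshots:
        raise FileNotFoundError(f"no snapshots matching {pattern} in {run_dir}")
    return snapshots[-1]
```

The returned path would then be handed to the corresponding step-2 script.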

train_all.py automates this process and trains all models in about 2-4 days on a single GPU.

Inference

See demo_inference.ipynb for details on how to instantiate a model and run inference on it.
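At a high level, bandwidth-extension inference brings narrowband speech up to the 16 kHz target rate and lets the network reconstruct the missing high band. The sketch below shows only that outer pipeline; the placeholder `model` stands in for a trained network, which is instantiated as shown in demo_inference.ipynb:

```python
import numpy as np
from scipy.signal import resample_poly

def extend_bandwidth(narrowband: np.ndarray, model) -> np.ndarray:
    """Upsample 8 kHz narrowband speech to 16 kHz, then let the model
    fill in the missing 4-8 kHz band. `model` is a stand-in for a
    trained network loaded per demo_inference.ipynb."""
    upsampled = resample_poly(narrowband, up=2, down=1)  # 8 kHz -> 16 kHz
    return model(upsampled)

# identity "model", used here only to exercise the pipeline shape
wideband = extend_bandwidth(np.zeros(8_000), model=lambda x: x)
```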

Evaluation

  • eval.ipynb evaluates models using non-additive synthesis.
  • Likewise, eval_additive.ipynb runs evaluation with additive synthesis. Both notebooks store results per test sample as pandas dataframes.
  • eval_noise.ipynb runs inference on noisy speech.
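The per-sample result storage used by the evaluation notebooks can be sketched as follows; the column names and metric values here are illustrative placeholders, not the notebooks' actual schema:

```python
import os
import tempfile

import pandas as pd

# one row per test sample; the metric column is a placeholder
records = [
    {"sample": "p225_001", "metric_a": 1.23},
    {"sample": "p225_002", "metric_a": 1.08},
]
df = pd.DataFrame.from_records(records).set_index("sample")

# persist per-sample results, as the evaluation notebooks do
out_path = os.path.join(tempfile.gettempdir(), "results.pkl")
df.to_pickle(out_path)
```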