Authored by Marcus Gursch (commit bb9f92dc)

Artificial Bandwidth Extension of Speech using Generative Neural Networks

Setup

# create and activate python venv
python3 -m venv env
source ./env/bin/activate
python -m pip install -r requirements.txt  # latest versions
# python -m pip install -r requirements_freeze.txt  # specific versions
# get VCTK
source ./env/bin/activate
python src/dataset_init.py

This will download VCTK, resample it to 16 kHz, create the train/test split, and remove speakers p280 and p315.
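The resampling step can be sketched as follows. This is a minimal sketch using SciPy's polyphase resampler; the README does not say which tool dataset_init.py actually uses, so the library choice here is an assumption:

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

# VCTK is distributed at 48 kHz; the dataset script brings it to 16 kHz.
SRC_RATE = 48_000
DST_RATE = 16_000

def resample_to_16k(audio: np.ndarray, src_rate: int = SRC_RATE) -> np.ndarray:
    """Resample a mono waveform to 16 kHz via polyphase filtering."""
    g = gcd(DST_RATE, src_rate)
    return resample_poly(audio, up=DST_RATE // g, down=src_rate // g)

# one second of audio at 48 kHz becomes 16k samples at 16 kHz
x = np.zeros(SRC_RATE)
y = resample_to_16k(x)
```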

Training

For two-step trainings, first run train_[...]_step_1.py. This creates a run directory in runs/[...]_step_1 in which training metrics and state_dict snapshots are stored. A state_dict from step 1, typically the latest one, then has to be passed to train_[...]_step_2.py.
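Picking "the latest" snapshot from a step-1 run directory can be sketched like this. The `*.pt` naming pattern and the mtime-based ordering are assumptions; adapt them to the actual layout of the runs/ directory:

```python
from pathlib import Path

def latest_state_dict(run_dir: str, pattern: str = "*.pt") -> Path:
    """Return the most recently modified state_dict snapshot in run_dir.

    The '*.pt' pattern is a guess at the snapshot naming scheme used by
    the step-1 training scripts.
    """
    snapshots = sorted(Path(run_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    if not snapshots:
        raise FileNotFoundError(f"no snapshots matching {pattern} in {run_dir}")
    return snapshots[-1]
```

The returned path would then be handed to the corresponding step-2 script.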

train_all.py automates this process and trains all models in about 2-4 days on a single GPU.

Inference

See demo_inference.ipynb for details on how to instantiate a model and run inference on it.
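At a high level, bandwidth-extension inference brings narrowband speech up to the 16 kHz target rate and lets the network reconstruct the missing high band. The sketch below shows only that outer pipeline; the placeholder `model` stands in for a trained network, which is instantiated as shown in demo_inference.ipynb:

```python
import numpy as np
from scipy.signal import resample_poly

def extend_bandwidth(narrowband: np.ndarray, model) -> np.ndarray:
    """Upsample 8 kHz narrowband speech to 16 kHz, then let the model
    fill in the missing 4-8 kHz band. `model` is a stand-in for a
    trained network loaded per demo_inference.ipynb."""
    upsampled = resample_poly(narrowband, up=2, down=1)  # 8 kHz -> 16 kHz
    return model(upsampled)

# identity "model", used here only to exercise the pipeline shape
wideband = extend_bandwidth(np.zeros(8_000), model=lambda x: x)
```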

Evaluation

  • eval.ipynb evaluates models using non-additive synthesis.
  • Likewise, eval_additive.ipynb runs evaluation with additive synthesis. Both notebooks store results per test sample as pandas dataframes.
  • eval_noise.ipynb runs inference on noisy speech.
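The per-sample result storage used by the evaluation notebooks can be sketched as follows; the column names and metric values here are illustrative placeholders, not the notebooks' actual schema:

```python
import os
import tempfile

import pandas as pd

# one row per test sample; the metric column is a placeholder
records = [
    {"sample": "p225_001", "metric_a": 1.23},
    {"sample": "p225_002", "metric_a": 1.08},
]
df = pd.DataFrame.from_records(records).set_index("sample")

# persist per-sample results, as the evaluation notebooks do
out_path = os.path.join(tempfile.gettempdir(), "results.pkl")
df.to_pickle(out_path)
```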