# Artificial Bandwidth Extension of Speech using Generative Neural Networks
## Setup
```bash
# create and activate python venv
python3 -m venv env
source ./env/bin/activate
python -m pip install -r requirements.txt  # latest versions
# python -m pip install -r requirements_freeze.txt  # specific versions
```
```bash
# get VCTK
source ./env/bin/activate
python src/dataset_init.py
```
This will download VCTK, resample it to 16 kHz, create the train/test split, and remove speakers p280 and p315.
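For orientation, the resampling and speaker filtering amount to something like the sketch below. The directory layout, use of torchaudio, and file handling are assumptions; `src/dataset_init.py` is the authoritative implementation.

```python
# Illustrative sketch of the preprocessing done by dataset_init.py.
# Paths and directory layout are hypothetical.
from pathlib import Path
import torchaudio

EXCLUDED_SPEAKERS = {"p280", "p315"}  # removed per the note above
TARGET_SR = 16_000

for wav_path in Path("data/VCTK/wav48").rglob("*.wav"):
    speaker = wav_path.parent.name
    if speaker in EXCLUDED_SPEAKERS:
        continue
    waveform, sr = torchaudio.load(str(wav_path))
    if sr != TARGET_SR:
        # resample to the 16 kHz target rate
        waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=TARGET_SR)
    out_path = Path("data/VCTK/wav16") / speaker / wav_path.name
    out_path.parent.mkdir(parents=True, exist_ok=True)
    torchaudio.save(str(out_path), waveform, TARGET_SR)
```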
## Training
For two-step trainings, first run the corresponding `train[...]_step_1.py`. This creates a run directory in `runs/[...]_step_1` in which training metrics and `state_dict` snapshots are stored. A `state_dict` from step 1, typically the latest one, then has to be passed to `train_[...]_step_2.py`.
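Picking the latest snapshot could look like the following sketch; the run directory name, the `.pt` file pattern, and the use of modification time are assumptions, and the argument format expected by the step-2 script may differ.

```python
# Hypothetical: select the most recent step-1 snapshot to hand to step 2.
from pathlib import Path
import torch

run_dir = Path("runs/model_step_1")  # placeholder run directory name
latest = max(run_dir.glob("*.pt"), key=lambda p: p.stat().st_mtime)

state_dict = torch.load(latest, map_location="cpu")
# The step-2 script would then restore it with model.load_state_dict(state_dict).
```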
`train_all.py` automates this process and trains all models in about 2-4 days on a single GPU.
## Inference
See `demo_inference.ipynb` for details on how to instantiate a model and run inference on it.
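Roughly, inference boils down to the sketch below. The module path `src.model`, class name `Generator`, checkpoint path, and tensor shapes are all placeholders; follow the notebook for the actual names.

```python
# Minimal inference sketch with hypothetical names; see demo_inference.ipynb.
import torch
import torchaudio

from src.model import Generator  # hypothetical module and class name

model = Generator()
model.load_state_dict(torch.load("runs/model_step_2/latest.pt", map_location="cpu"))
model.eval()

# Bandwidth-extend a narrowband recording (assumed shape: (channels, samples)).
narrowband, sr = torchaudio.load("narrowband_8k.wav")
with torch.no_grad():
    wideband = model(narrowband.unsqueeze(0)).squeeze(0)
torchaudio.save("wideband_16k.wav", wideband, 16_000)
```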
## Evaluation
- `eval.ipynb` evaluates models using non-additive synthesis.
- Likewise, `eval_additive.ipynb` runs evaluation with additive synthesis. Both notebooks store results per test sample as pandas dataframes (see the sketch after this list).
- `eval_noise.ipynb` runs inference on noisy speech.
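To inspect the stored results outside the notebooks, something like the following should suffice; the pickle format and file name are assumptions, so check how the notebooks actually serialize their dataframes.

```python
# Sketch of loading per-sample evaluation results; file name is hypothetical.
import pandas as pd

df = pd.read_pickle("runs/model_step_2/eval_results.pkl")
print(df.describe())               # distribution of metrics over test samples
print(df.mean(numeric_only=True))  # mean of each numeric metric column
```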