VPVID: Variance-Preserving Velocity-Guided Interpolant Diffusion for Speech Enhancement and Dereverberation

G. Yang, Y. J. Wei, B. Niu, and Y. Q. Wang

School of Computer Science and Engineering, Northeastern University, China

Abstract

Diffusion-based generative models for speech enhancement often struggle to balance enhancement quality against inference cost. To address this, we propose Variance-Preserving Velocity-guided Interpolant Diffusion (VPVID), a framework that achieves competitive enhancement performance while remaining computationally efficient. Our approach builds on a scalable interpolant framework that reconstructs the reverse diffusion process from velocity terms and state variables. Instead of a traditional score-matching objective, we train with a velocity-based loss that directly estimates the instantaneous rate of change of the interpolant, which yields more stable training and more efficient learning of the data distribution. For sampling, we combine stochastic diffusion sampling with probability flow ordinary differential equations (ODEs), augmented by an adaptive corrector mechanism, giving a flexible strategy that trades off quality against efficiency. Extensive experiments on the VoiceBank-DEMAND and WSJ0-CHiME3 datasets show that VPVID significantly outperforms existing baselines across multiple metrics, with particularly strong noise separation: SI-SIR improves by up to 4.7 dB. VPVID also runs inference up to 7× faster than existing diffusion-based methods while maintaining strong speech enhancement and dereverberation performance.
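
To make the idea of a velocity-based objective concrete, the following is a minimal sketch and not the VPVID formulation itself, which this page does not spell out. It assumes a generic variance-preserving interpolant x_t = alpha(t) x0 + sigma(t) eps with a trigonometric schedule satisfying alpha(t)^2 + sigma(t)^2 = 1, and takes the regression target to be its time derivative. The schedule and the function names (`vp_coeffs`, `interpolant_and_velocity`, `velocity_loss`) are illustrative assumptions, and conditioning on the noisy input is omitted.

```python
import numpy as np


def vp_coeffs(t):
    """Assumed trigonometric schedule with alpha(t)^2 + sigma(t)^2 = 1."""
    alpha = np.cos(0.5 * np.pi * t)
    sigma = np.sin(0.5 * np.pi * t)
    # Time derivatives of the two coefficients.
    d_alpha = -0.5 * np.pi * np.sin(0.5 * np.pi * t)
    d_sigma = 0.5 * np.pi * np.cos(0.5 * np.pi * t)
    return alpha, sigma, d_alpha, d_sigma


def interpolant_and_velocity(x0, eps, t):
    """Return the state x_t and its instantaneous rate of change dx_t/dt."""
    alpha, sigma, d_alpha, d_sigma = vp_coeffs(t)
    x_t = alpha * x0 + sigma * eps
    v_t = d_alpha * x0 + d_sigma * eps
    return x_t, v_t


def velocity_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((v_pred - v_target) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal(8)    # stand-in for a clean speech feature
    eps = rng.standard_normal(8)   # Gaussian noise sample
    x_t, v_t = interpolant_and_velocity(x0, eps, t=0.3)
    # A hypothetical network would predict v_t from (x_t, t); zeros stand in here.
    print(velocity_loss(np.zeros_like(v_t), v_t))
```

Under these assumptions, a trained velocity predictor could be integrated with a simple ODE solver to move from noise toward clean speech, which is the role the probability flow ODE plays in the sampling strategy described above.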

Speech Enhancement Experiments (WSJ0-CHiME3)

Audio samples demonstrating noise reduction performance on the WSJ0-CHiME3 dataset.

Sample 1: 051o0211

Clean (Target)

Clean spectrogram

Noisy (Input)

Noisy spectrogram

SGMSEP

SGMSEP spectrogram

VPIDM

VPIDM spectrogram

FLOWSE

FLOWSE spectrogram

VPVID (Ours)

VPVID spectrogram

VPVID-PC (Ours)

VPVID-PC spectrogram

VPVID-ODE (Ours)

VPVID-ODE spectrogram

Sample 2: 22ga010f

Clean (Target)

Clean spectrogram

Noisy (Input)

Noisy spectrogram

SGMSEP

SGMSEP spectrogram

VPIDM

VPIDM spectrogram

FLOWSE

FLOWSE spectrogram

VPVID (Ours)

VPVID spectrogram

VPVID-PC (Ours)

VPVID-PC spectrogram

VPVID-ODE (Ours)

VPVID-ODE spectrogram

Sample 3: 422c020o

Clean (Target)

Clean spectrogram

Noisy (Input)

Noisy spectrogram

SGMSEP

SGMSEP spectrogram

VPIDM

VPIDM spectrogram

FLOWSE

FLOWSE spectrogram

VPVID (Ours)

VPVID spectrogram

VPVID-PC (Ours)

VPVID-PC spectrogram

VPVID-ODE (Ours)

VPVID-ODE spectrogram

Sample 4: 423o0304

Clean (Target)

Clean spectrogram

Noisy (Input)

Noisy spectrogram

SGMSEP

SGMSEP spectrogram

VPIDM

VPIDM spectrogram

FLOWSE

FLOWSE spectrogram

VPVID (Ours)

VPVID spectrogram

VPVID-PC (Ours)

VPVID-PC spectrogram

VPVID-ODE (Ours)

VPVID-ODE spectrogram

Speech Dereverberation Experiments (WSJ0-Reverb)

Audio samples demonstrating reverberation removal performance on the WSJ0-Reverb dataset.

Sample 1: 441c0208

Anechoic (Target)

Anechoic spectrogram

Reverb (Input)

Reverb spectrogram

SGMSEP

SGMSEP spectrogram

VPIDM

VPIDM spectrogram

FLOWSE

FLOWSE spectrogram

VPVID (Ours)

VPVID spectrogram

VPVID-PC (Ours)

VPVID-PC spectrogram

VPVID-ODE (Ours)

VPVID-ODE spectrogram

Sample 2: 441o030y

Anechoic (Target)

Anechoic spectrogram

Reverb (Input)

Reverb spectrogram

SGMSEP

SGMSEP spectrogram

VPIDM

VPIDM spectrogram

FLOWSE

FLOWSE spectrogram

VPVID (Ours)

VPVID spectrogram

VPVID-PC (Ours)

VPVID-PC spectrogram

VPVID-ODE (Ours)

VPVID-ODE spectrogram

Sample 3: 442o0301

Anechoic (Target)

Anechoic spectrogram

Reverb (Input)

Reverb spectrogram

SGMSEP

SGMSEP spectrogram

VPIDM

VPIDM spectrogram

FLOWSE

FLOWSE spectrogram

VPVID (Ours)

VPVID spectrogram

VPVID-PC (Ours)

VPVID-PC spectrogram

VPVID-ODE (Ours)

VPVID-ODE spectrogram

Sample 4: 447c020s_597_1.07_-8.1

Anechoic (Target)

Anechoic spectrogram

Reverb (Input)

Reverb spectrogram

SGMSEP

SGMSEP spectrogram

VPIDM

VPIDM spectrogram

FLOWSE

FLOWSE spectrogram

VPVID (Ours)

VPVID spectrogram

VPVID-PC (Ours)

VPVID-PC spectrogram

VPVID-ODE (Ours)

VPVID-ODE spectrogram