Non-Intrusive Binaural Speech Intelligibility Prediction From Discrete Latent Representations

A. F. McKinney and B. Cauchi
IEEE Signal Processing Letters
Non-intrusive speech intelligibility (SI) prediction from binaural signals is useful in many applications. However, most existing signal-based measures are designed to be applied to single-channel signals. Measures specifically designed to take into account the binaural properties of the signal are often intrusive (i.e., they require access to a clean speech signal) and typically rely on combining both channels into a single-channel signal before making predictions. This letter proposes a non-intrusive SI measure that computes features from a binaural input signal using a combination of vector quantization (VQ) and contrastive predictive coding (CPC) methods. Vector quantized contrastive predictive coding (VQ-CPC) feature extraction does not rely on any model of the auditory system. Instead, it is trained to maximise the mutual information between the input signal and the output features. The computed VQ-CPC features are input to a predicting function parameterized by a neural network. Two predicting functions are considered in this letter. Both the feature extractor and the predicting functions are trained on simulated binaural signals with isotropic noise. They are tested on simulated signals with isotropic and real noise. For all signals, the ground-truth scores are given by the (intrusive) deterministic binaural short-time objective intelligibility (DBSTOI) measure. Results are presented in terms of correlations and mean-squared error (MSE). They demonstrate that VQ-CPC features capture information relevant to modelling SI and outperform all the considered benchmarks, even when evaluated on data comprising different noise field types.
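The following minimal pure-Python sketch illustrates the VQ-CPC idea. It assumes the standard CPC formulation:

- each encoder frame is replaced by its nearest codebook entry;
- a step-specific linear map W_k turns a context vector c_t into a prediction of the latent k steps ahead;
- the InfoNCE loss scores the true future latent against negatives using dot products. Minimising this loss maximises a lower bound on mutual information.

All names (`quantize`, `predict`, `info_nce`) and the toy codebook, frames and predictor are hypothetical illustrations. They are not the letter's encoder, codebook size, context network or training configuration.

```python
import math


def quantize(frames, codebook):
    """Map each frame to its nearest codebook entry (squared Euclidean distance).

    Returns the quantized frames and the chosen code indices.
    """
    indices = []
    for z in frames:
        dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return [list(codebook[i]) for i in indices], indices


def predict(w_k, context):
    """Step-specific linear prediction W_k c_t of the latent k steps ahead."""
    return [sum(w * c for w, c in zip(row, context)) for row in w_k]


def info_nce(prediction, positive, negatives):
    """InfoNCE loss for one prediction: cross-entropy of picking the true
    future latent (candidate 0) among all candidates, with dot-product scores."""
    candidates = [positive] + list(negatives)
    scores = [sum(p * x for p, x in zip(prediction, cand)) for cand in candidates]
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[0]


if __name__ == "__main__":
    # Hypothetical toy values: a 3-entry codebook and 2-D "encoder" frames.
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    frames = [[0.9, 0.1], [0.1, 0.8], [0.05, -0.1]]
    quantized, idx = quantize(frames, codebook)
    print(idx)  # [1, 2, 0]

    # Hypothetical context c_t (first quantized frame) and a predictor W_k
    # that maps code 1 onto code 2, i.e. "code 1 is followed by code 2".
    w_k = [[0.0, 0.0], [1.0, 0.0]]
    pred = predict(w_k, quantized[0])
    # Negatives are the other toy frames here; in practice they are usually
    # sampled from other time steps or utterances.
    loss = info_nce(pred, positive=quantized[1], negatives=[quantized[0], quantized[2]])
    print(loss)  # below log(3), the loss of an uninformative prediction
```

The sketch covers only the forward computation. Training would also need:

- an autoregressive network to produce c_t from past quantized frames;
- a way to pass gradients through the non-differentiable nearest-neighbour lookup (commonly a straight-through estimator).

Both are omitted here.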
Augmented Auditive Intelligence