IMPROVEMENT OF INFERENCE-TIME PREDICTION FOR SPEECH EMOTION RECOGNITION USING ITERATIVE kNN MAJORITY VOTING ON WavLM FEATURE EMBEDDINGS

FALANA, John Oluwaseun; Covenant University Dissertation

Files

Primary Pages from FALANA JOHN OLUWASEUN 23PCG02638 final printing copy.pdf (287.52 KB)

Date

2025-08

Authors

FALANA, John Oluwaseun

Covenant University Dissertation

Publisher

Covenant University Ota

Abstract

Prediction inconsistency and poorly defined decision boundaries in high-dimensional embedding spaces limit the performance of Speech Emotion Recognition (SER) systems. This study proposes a post-processing framework that applies iterative k-Nearest Neighbors (kNN) majority voting to refine the output of a fine-tuned WavLM model without retraining. Using CREMA-D, an English-language dataset of 7,442 samples, embeddings were extracted and iteratively relabelled according to the local neighborhood structure of the latent space. This refinement improved label consistency by applying proximity-based corrections at inference time. Model performance was evaluated with standard SER metrics (accuracy and F1-score) and with t-SNE visualization. Results show that repeated kNN refinement improves both classification accuracy and the separation of decision boundaries, yielding a 1.87% improvement in F1-score over the baseline, compared with a 0.67% improvement for the SCL+kNN approach. The approach is model-agnostic, efficient, and data-centric, offering a viable alternative to computationally expensive retraining, and it highlights the value of embedding-space operations for improving SER reliability in real-world settings.
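The abstract describes iterative kNN majority-vote relabelling but does not state the distance metric, the value of k, the number of iterations, or whether neighbours are drawn from the test embeddings themselves or from a labelled reference set. The sketch below is therefore only an illustration under stated assumptions: cosine similarity on L2-normalised embeddings, neighbours taken from the same set being refined, synchronous updates, ties resolved in favour of the current label, and a stop when no label changes. The function name `knn_majority_refine` and all parameter defaults are hypothetical, not taken from the dissertation.

```python
import numpy as np


def knn_majority_refine(embeddings, labels, k=5, max_iters=10, num_classes=None):
    """Hypothetical sketch: iteratively relabel each point by the majority label
    of its k nearest neighbours (cosine similarity), stopping at convergence."""
    x = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(labels, dtype=np.int64).copy()  # never mutate the caller's labels
    n = x.shape[0]
    if n < 2:
        return y  # no neighbours to vote
    if num_classes is None:
        num_classes = int(y.max()) + 1
    k = min(k, n - 1)

    # Assumption: L2-normalise so the dot product equals cosine similarity.
    x = x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)  # a point is not its own neighbour

    # Indices of the k most similar points per row (order within the k is irrelevant).
    neighbours = np.argpartition(-sim, kth=k - 1, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)

    for _ in range(max_iters):
        # Count neighbour votes per class, using labels from the previous pass.
        votes = np.zeros((n, num_classes), dtype=np.int64)
        np.add.at(votes, (rows, y[neighbours].ravel()), 1)
        # Tie-break assumption: keep the current label if it is among the top-voted classes.
        keep = votes[np.arange(n), y] == votes.max(axis=1)
        new_y = np.where(keep, y, votes.argmax(axis=1))
        if np.array_equal(new_y, y):
            break  # converged: no label changed in this pass
        y = new_y
    return y


# Toy example: two well-separated clusters; point 2 is initially mislabelled as class 1.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.95, -0.05],
                [0.0, 1.0], [0.1, 0.9], [0.05, 1.0], [-0.05, 0.95]])
initial = np.array([0, 0, 1, 0, 1, 1, 1, 1])
refined = knn_majority_refine(emb, initial, k=3)
print(refined.tolist())  # → [0, 0, 0, 0, 1, 1, 1, 1]
```

In this toy case the mislabelled point sits among three class-0 neighbours, so one voting pass corrects it and the second pass detects convergence. Synchronous updates were chosen so that the result does not depend on the order in which points are visited; an in-place (asynchronous) variant is an equally plausible reading of the abstract.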

Keywords

Speech Emotion Recognition, Human-Computer Interaction, K-Nearest Neighbors, Self-Supervised Learning, WavLM

URI

https://repository.covenantuniversity.edu.ng/handle/123456789/50404

Collections

Programme: Computer Science
