Department of Computer and Information Sciences
Permanent URI for this community: http://itsupport.cu.edu.ng:4000/handle/123456789/28739
Welcome to the Page of Computer and Information Sciences
Item
IMPROVEMENT OF INFERENCE-TIME PREDICTION FOR SPEECH EMOTION RECOGNITION USING ITERATIVE kNN MAJORITY VOTING ON WavLM FEATURE EMBEDDINGS (Covenant University Ota, 2025-08)
FALANA, John Oluwaseun; Covenant University Dissertation
Prediction inconsistency and poorly defined decision boundaries in high-dimensional embedding spaces limit the performance of Speech Emotion Recognition (SER) systems. This study proposes a post-processing framework that applies iterative k-Nearest Neighbors (kNN) majority voting to refine the output of a fine-tuned WavLM model without requiring retraining. Using CREMA-D, an English-language dataset with 7,442 samples, embeddings were extracted and iteratively relabelled based on local neighborhood structure in the latent space. This refinement process enhanced label consistency and leveraged proximity-based corrections at inference time. Model performance was evaluated using standard SER metrics (accuracy and F1-score) and t-SNE visualization. Results show that repeated kNN refinement improves both classification accuracy and the clarity of decision boundaries: the F1 score improves by 1.87% over the baseline, compared with a 0.67% improvement over the baseline for the SCL+kNN approach. The approach is model-agnostic, efficient, and data-centric, offering a viable alternative to computationally expensive retraining. It highlights the value of embedding-space operations for improving SER reliability in real-world settings.
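The iterative kNN majority voting described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation. The function name `iterative_knn_vote`, the Euclidean distance, the choice to exclude each point from its own vote, the synchronous update, the stopping rule, and the toy 2-D points standing in for WavLM embeddings are all assumptions made for illustration.

```python
import math
from collections import Counter


def iterative_knn_vote(embeddings, labels, k=3, max_iters=10):
    """Relabel each point by majority vote of its k nearest neighbours,
    repeating until the labels stop changing or max_iters is reached.

    Hypothetical helper: distance metric, tie handling and stopping rule
    are assumptions, not details taken from the source.
    """
    current = list(labels)
    for _ in range(max_iters):
        updated = []
        for i, point in enumerate(embeddings):
            # Rank every other point by Euclidean distance to this one.
            order = sorted(
                (j for j in range(len(embeddings)) if j != i),
                key=lambda j: math.dist(point, embeddings[j]),
            )
            # Vote using the labels from the previous pass (synchronous update).
            votes = Counter(current[j] for j in order[:k])
            updated.append(votes.most_common(1)[0][0])
        if updated == current:  # converged: a full pass changed nothing
            break
        current = updated
    return current


# Toy stand-in for model embeddings and predictions: two tight clusters,
# each containing one prediction that disagrees with its neighbours.
toy_embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
]
toy_predictions = [
    "angry", "angry", "angry", "happy",
    "happy", "happy", "angry", "happy",
]

refined = iterative_knn_vote(toy_embeddings, toy_predictions, k=3)
print(refined)
```

On this toy input the two outlying predictions are pulled to the label of their cluster, which is the kind of proximity-based correction the abstract refers to. In a real pipeline the embeddings and initial labels would come from the fine-tuned model, and a library nearest-neighbour index would replace the quadratic distance loop.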