Department of Computer and Information Sciences

Permanent URI for this community: http://itsupport.cu.edu.ng:4000/handle/123456789/28739




  • Item
    DEVELOPMENT OF A REGULARIZATION-BASED FAIRNESS-AWARE LOSS FUNCTION FOR MITIGATING POPULARITY BIAS IN MOVIE RECOMMENDER SYSTEMS
    (Covenant University Ota, 2025-08) IDOWU, Esther Oluwaseyi; Covenant University Dissertation
    Recommender systems are a cornerstone of the movie entertainment industry, driving user engagement and personalizing content delivery to enhance customer experience. However, popularity bias, in which certain content is disproportionately recommended, can limit the visibility of diverse content, undermining innovation and customer satisfaction. This study proposed a regularization-based fairness-aware mechanism designed to mitigate popularity bias in recommender systems. The proposed mechanism integrates fairness-aware loss functions, such as exposure fairness loss, fairness-aware ranking loss, and disparate impact loss, into the Neural Collaborative Filtering recommendation algorithm. The fairness-aware model was then integrated into a movie recommendation system, which was evaluated for technical effectiveness in terms of recommendation accuracy, exposure fairness, and usability in real-world entertainment business contexts. All models achieved very high exposure fairness (≥0.9995). The proposed REBFAL model matched the best baselines in long-tail coverage (0.2987) while showing a slightly higher exposure imbalance (0.7487), reflecting a trade-off between fairness and exposure distribution.
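    To make the idea of a regularization-based fairness-aware loss concrete, here is a minimal NumPy sketch. It assumes a single hypothetical exposure penalty (the squared gap between the mean predicted score of popular and long-tail items) added to binary cross-entropy with a weight `lam`; the dissertation's actual exposure fairness, ranking, and disparate impact losses are not reproduced here. `long_tail_coverage` uses one common definition (the share of long-tail items recommended to at least one user), which may differ from the study's.

```python
import numpy as np


def bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy, a usual pointwise loss for implicit-feedback NCF."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


def exposure_gap(y_pred, is_long_tail):
    """Hypothetical penalty: squared gap in mean score between popular and long-tail items."""
    y_pred = np.asarray(y_pred, dtype=float)
    is_long_tail = np.asarray(is_long_tail, dtype=bool)
    popular_mean = y_pred[~is_long_tail].mean()
    tail_mean = y_pred[is_long_tail].mean()
    return float((popular_mean - tail_mean) ** 2)


def fairness_aware_loss(y_true, y_pred, is_long_tail, lam=0.1):
    """Accuracy term plus a lam-weighted fairness regularizer (assumed form)."""
    return bce(y_true, y_pred) + lam * exposure_gap(y_pred, is_long_tail)


def long_tail_coverage(rec_lists, long_tail_items):
    """Fraction of long-tail items recommended to at least one user."""
    recommended = {item for recs in rec_lists for item in recs}
    tail = set(long_tail_items)
    return len(recommended & tail) / len(tail)


# Toy scores: two popular items scored high, two long-tail items scored low.
y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.8, 0.2, 0.1]
tail_mask = [False, False, True, True]
print(fairness_aware_loss(y_true, y_pred, tail_mask, lam=0.0))
print(fairness_aware_loss(y_true, y_pred, tail_mask, lam=1.0))
print(long_tail_coverage([["a", "t1"], ["b", "t2"]], ["t1", "t2", "t3", "t4"]))
```

    Raising `lam` pushes training toward more even scores across popular and long-tail items at some cost in pointwise accuracy, which is the kind of fairness/accuracy trade-off the abstract describes.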
  • Item
    DEVELOPMENT OF A TRANSFER LEARNING PIPELINE FOR PROSTATE CANCER AGGRESSIVENESS CLASSIFICATION
    (Covenant University Ota, 2025-08) OLUSUYI, Fiyinfoluwa Ruth; Covenant University Dissertation
    Prostate cancer is a leading malignancy in men, and accurate assessment of tumor aggressiveness is crucial for guiding treatment. While multi-parametric MRI (mpMRI) is now the established standard for non-invasive diagnosis, its interpretation can be subjective. Deep learning has shown promise, but limited data poses a challenge. This study addresses this limitation by developing a comprehensive transfer learning pipeline for automated prostate cancer aggressiveness classification using mpMRI data. The public PROSTATEx dataset was processed into 2D image patches that combine T2-weighted, ADC, and high b-value DWI sequences as 3-channel inputs. Seven state-of-the-art pre-trained Convolutional Neural Network (CNN) architectures (EfficientNet-B0, ResNet18, VGG16, DenseNet121, MobileNetV3, InceptionV3, and ShuffleNet V2) were fine-tuned within a consistent framework that used a WeightedRandomSampler and regularization to address class imbalance. Performance was evaluated on a separate validation set using standard metrics, including accuracy, F1-score, specificity, and AUC. EfficientNet-B0 emerged as the best architecture, achieving an overall accuracy of nearly 97% and a macro F1-score of 0.96, which highlights the effectiveness of modern, efficient network designs. The lightweight MobileNetV3 delivered nearly identical performance, with approximately 96% accuracy and a macro F1-score of about 0.96. ShuffleNet V2, DenseNet121, and ResNet18 also proved highly effective, with accuracies between 94% and 96%. VGG16 and InceptionV3 did not reach the level of the leading architectures, with accuracies of 72% and 90%, respectively.
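    The input construction and class-imbalance handling can be illustrated without a deep learning framework. This NumPy sketch stacks three co-registered sequence patches into one 3-channel input and assumes inverse-class-frequency sample weights, which is one common weighting scheme (the dissertation's exact weights are not stated). The 64×64 patch size and the 9:1 label split are hypothetical. In PyTorch, the same weights would be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)`.

```python
import numpy as np


def make_three_channel_patch(t2, adc, dwi):
    """Stack co-registered T2W, ADC and high-b DWI patches as a (3, H, W) input."""
    return np.stack([t2, adc, dwi], axis=0).astype(np.float32)


def inverse_frequency_weights(labels):
    """Per-sample weight of 1 / (number of samples in that sample's class)."""
    _, inverse, counts = np.unique(np.asarray(labels), return_inverse=True, return_counts=True)
    return 1.0 / counts[inverse]


labels = np.array([0] * 90 + [1] * 10)  # hypothetical 9:1 class imbalance
weights = inverse_frequency_weights(labels)

# Weighted sampling with replacement, which is what WeightedRandomSampler performs.
rng = np.random.default_rng(0)
drawn = rng.choice(len(labels), size=10_000, replace=True, p=weights / weights.sum())
print("share of class 1 after resampling:", labels[drawn].mean())

patch = make_three_channel_patch(*(np.zeros((64, 64)) for _ in range(3)))
print(patch.shape)
```

    With inverse-frequency weights each class carries the same total weight, so mini-batches drawn this way are roughly balanced even though the minority class is rare in the data.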
  • Item
    DEVELOPMENT OF A GENE FUSION DETECTION VALIDATION FRAMEWORK FOR LONG-READ RNA SEQUENCING USING ALIGNMENT EVIDENCE AND MACHINE LEARNING
    (Covenant University Ota, 2025-08) AMOS, IISOMINEA ISOZO; Covenant University Dissertation
    Gene fusions are critical drivers of cancer and serve as diagnostic and therapeutic biomarkers. Detecting them reliably from long-read RNA sequencing (RNA-Seq) data remains challenging due to high error rates and complex transcript structures. Current methods often depend on matched whole-genome sequencing (WGS) data, which may be unavailable or uninformative when fusions are expressed without clear genomic breakpoints. To address this, a long-read fusion validation pipeline was developed, optimized for transcript-level evidence by removing reliance on genomic data and focusing on functionally expressed fusions. The pipeline integrates alignment support from realigned soft-clipped reads, supplementary alignments, and full-length chimeric reads to validate transcripts. A Random Forest model was further trained using features derived from validated events to refine classification. Applied to five cancer cell line datasets, with emphasis on breast cancer, the pipeline achieved a 68.1% overall validation rate and a 77.8% rate in MCF7. It distinguished true fusions, deprioritized database-reported false positives, and highlighted high-confidence novel candidates. Known fusions such as BCAS4–BCAS3 were confirmed, while MOV10–RHOC emerged as a biologically relevant novel fusion, supported by multiple evidence types and recurring in both MCF7 and K562. Another candidate, CLUL1–TYMS, detected across four cell lines, likely represents transcriptional read-through. Benchmarking against experimentally validated fusion transcripts, rather than DNA-based tools, established a transcript-focused alternative for fusion discovery. The resulting dataset will be made publicly available to support benchmarking and machine learning research. The framework enables high-confidence detection of transcript-level fusions in cancer and shows strong potential for biomarker discovery and precision oncology.
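    The evidence-integration step can be pictured as a per-candidate tally of the three alignment evidence types. The Python sketch below is a hypothetical illustration: the `FusionEvidence` schema, the thresholds `min_types=2` and `min_reads=3`, and the candidate names are assumptions, not the pipeline's actual criteria. The `features()` vector shows the kind of input a downstream classifier (for example scikit-learn's `RandomForestClassifier`) could be trained on.

```python
from dataclasses import dataclass


@dataclass
class FusionEvidence:
    """Read support for one candidate fusion transcript (hypothetical schema)."""
    name: str
    softclip_reads: int       # realigned soft-clipped reads spanning the junction
    supplementary_reads: int  # reads with a supplementary alignment to the partner gene
    chimeric_reads: int       # full-length reads covering both partner genes

    def counts(self):
        return (self.softclip_reads, self.supplementary_reads, self.chimeric_reads)

    def evidence_types(self) -> int:
        """Number of distinct evidence types with at least one supporting read."""
        return sum(1 for n in self.counts() if n > 0)

    def features(self):
        """Raw counts, number of evidence types, and total support."""
        return [*self.counts(), self.evidence_types(), sum(self.counts())]


def is_validated(ev: FusionEvidence, min_types: int = 2, min_reads: int = 3) -> bool:
    """Hypothetical rule: at least `min_types` evidence types and `min_reads` reads."""
    return ev.evidence_types() >= min_types and sum(ev.counts()) >= min_reads


def validation_rate(candidates) -> float:
    return sum(is_validated(c) for c in candidates) / len(candidates)


candidates = [
    FusionEvidence("candidate_1", softclip_reads=2, supplementary_reads=1, chimeric_reads=0),
    FusionEvidence("candidate_2", softclip_reads=0, supplementary_reads=0, chimeric_reads=5),
]
for c in candidates:
    print(c.name, is_validated(c), c.features())
print("validation rate:", validation_rate(candidates))
```

    Requiring more than one independent evidence type is one way to down-weight artifacts that only a single alignment signal supports.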
  • Item
    A COMPUTATIONAL FRAMEWORK FOR PREDICTING COMPOUND-PROTEIN INTERACTION FOR PROSTATE CANCER THERAPEUTIC DISCOVERY
    (Covenant University Ota, 2025-08) AGBI, Mayowa; Covenant University Dissertation
    Prostate cancer (PCa) is a major public health issue globally. In sub-Saharan Africa, where diagnostic and treatment resources are limited, it accounts for high mortality. The conventional drug discovery process is lengthy, expensive, and often inadequate for complex, treatment-resistant prostate cancers. In this study, a deep learning framework was developed to predict Compound-Protein Interactions (CPI) for prostate cancer drug discovery. An end-to-end machine learning pipeline was implemented using curated datasets from Zenodo, ChEMBL, BindingDB, and UniProt. Compounds were represented as 2048-bit Morgan fingerprints reduced to 200 dimensions via Principal Component Analysis (PCA), and proteins as 100-dimensional 3-mer Word2Vec embeddings. These features were fed into a dual-input deep neural network optimized with binary cross-entropy loss, the Adam optimizer, and dropout regularization. The model identified five novel bioactive compounds targeting prostate cancer biomarker proteins, and model confidence was used to prioritize predicted interactions for AR, SRC, and EGFR. Molecular docking in PyRx and AutoDock Vina, followed by visualization in Discovery Studio, indicated strong binding affinities (-7.2 to -10 kcal/mol) and structural complementarity, supporting their therapeutic potential. Integrating molecular docking added translational value to the predictions. These results point to a disease-specific platform for in silico drug discovery in prostate cancer. By coupling deep learning with structure-based validation, this study offers a promising approach to prioritizing candidate compounds. It also provides a viable foundation for experimental validation and combinatorial therapy design, taking a further step toward machine learning-assisted precision oncology.
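    The feature construction described above (fingerprint compression with PCA, and protein 3-mers as Word2Vec "words") can be sketched with NumPy alone. This is an illustrative sketch: the random binary matrix is a placeholder for real 2048-bit Morgan fingerprints (which a cheminformatics toolkit such as RDKit would generate), the protein sequence is a made-up fragment, and PCA is computed via SVD rather than with whatever library the study used. The resulting 3-mer token lists are what a Word2Vec model (for example gensim's) would be trained on to obtain 100-dimensional protein embeddings.

```python
import numpy as np


def protein_kmers(sequence, k=3):
    """Split an amino-acid sequence into overlapping k-mer 'words' for Word2Vec."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def pca_reduce(X, n_components=200):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


rng = np.random.default_rng(42)
fingerprints = rng.integers(0, 2, size=(300, 2048))  # placeholder for Morgan bit vectors
compound_features = pca_reduce(fingerprints, n_components=200)
print(compound_features.shape)
print(protein_kmers("MKTAYIAK"))
```

    In a dual-input network, the 200-dimensional compound vector and a protein vector (for instance, an average of its 3-mer embeddings, which is an assumption here) would enter separate branches before being combined for the interaction prediction.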
  • Item
    CONSTRUCTING GENE REGULATORY NETWORKS FOR BREAST CANCER STEM CELLS USING SINGLE-CELL MULTI-OMICS
    (Covenant University Ota, 2025-08) UJOH, Treasure Ulonna; Covenant University Dissertation
    Breast cancer mortality is primarily driven by metastasis and therapeutic relapse, processes strongly linked to breast cancer stem cells (BCSCs). These cells are believed to orchestrate tumor initiation, resistance, and recurrence through complex gene regulatory networks (GRNs) that remain poorly characterized. This study aimed to construct and compare GRNs of BCSCs and normal mammary stem cells (MaSCs) using a single-cell multi-omics framework. Publicly available datasets containing single-cell RNA sequencing (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) profiles were sourced from normal breast tissue, primary tumors, recurrent tumors, and BCSC-enriched mammospheres. The datasets underwent strict preprocessing, including filtering, normalization, and quality control, and were then integrated with scGLUE, a model that uses a graph neural network to embed multiple omics layers, including transcriptomic and epigenomic data, in a single latent space while preserving biological identity. The pySCENIC pipeline, which combines co-expression analysis, cis-regulatory motif enrichment, and pruning, was then used to reconstruct GRNs and generate high-confidence regulons for BCSCs and MaSCs. Comparative network analysis revealed extensive “regulatory rewiring” in BCSCs, with transcription factors such as JUNB, FOSB, LEF1, SOX4, and MAFB emerging as master regulators that were absent or significantly altered in normal stem cells. Functional enrichment of BCSC-exclusive targets highlighted pathways central to metastasis and recurrence, including extracellular matrix remodeling, adhesion, migration, and growth factor signaling. Disease ontology mapping further confirmed strong associations with invasive breast carcinoma and therapy resistance.
Collectively, this study provides one of the first integrated single-cell GRN maps contrasting BCSCs with their normal counterparts, establishing mechanistic links between regulatory rewiring and cancer hallmarks. The identification of BCSC-specific master regulators offers promising therapeutic entry points for interventions aimed at eradicating the root drivers of breast cancer relapse and metastasis.
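    The comparative step can be illustrated as set operations on regulons. This Python sketch assumes regulons are available as mappings from each transcription factor to its set of target genes (for example, after exporting pySCENIC results) and uses Jaccard overlap as a simple, hypothetical rewiring score; the study's actual comparison metrics are not specified here, and all TF and gene names are placeholders.

```python
def compare_regulons(bcsc, masc):
    """Compare two regulon sets, each a dict mapping TF -> set of target genes.

    Returns TFs found only in BCSCs, a Jaccard overlap score for each shared TF
    (low values suggest rewired targets), and the BCSC-exclusive targets of each
    shared TF (candidates for functional enrichment).
    """
    exclusive_tfs = sorted(set(bcsc) - set(masc))
    jaccard, gained_targets = {}, {}
    for tf in sorted(set(bcsc) & set(masc)):
        a, b = bcsc[tf], masc[tf]
        union = a | b
        jaccard[tf] = len(a & b) / len(union) if union else 1.0
        gained_targets[tf] = sorted(a - b)
    return exclusive_tfs, jaccard, gained_targets


# Hypothetical regulons; TF and gene names are placeholders, not study results.
bcsc_regulons = {"TF_A": {"g1", "g2", "g3"}, "TF_B": {"g4", "g5"}, "TF_C": {"g6"}}
masc_regulons = {"TF_A": {"g1", "g7"}, "TF_B": {"g4", "g5"}}

exclusive, overlap, gained = compare_regulons(bcsc_regulons, masc_regulons)
print(exclusive)
print(overlap)
print(gained)
```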
  • Item
    DEVELOPMENT OF A MULTI-LABEL CLASSIFIER FOR PREDICTING GENETIC MARKERS ASSOCIATED WITH MULTI-DRUG RESISTANCE IN Plasmodium falciparum STRAINS
    (Covenant University Ota, 2025-08) OGUNDIMU, Temitayo Ayomikun; Covenant University Dissertation
    Malaria caused by Plasmodium falciparum is an infectious disease of global health importance. Its control is complicated by the parasite’s ability to acquire resistance to multiple antimalarial drugs simultaneously, a phenomenon known as multidrug resistance (MDR). Single-label models predict resistance to only one drug at a time and therefore cannot capture these complex resistance patterns, limiting their utility for real-world surveillance. To bridge this gap, this study developed and evaluated four multi-label classification models (Random Forest with Binary Relevance (RFDTBR), Ensemble of Classifier Chains (ECCJ48), Ensemble of Binary Relevance (EBRJ48), and a Backpropagation Neural Network (BPNN)) using genomic and phenotypic data for five key antimalarials. RFDTBR and EBRJ48 outperformed the other models in predicting exact MDR profiles, while BPNN was the fastest. Across all models, prediction performance was lowest for sulfadoxine-pyrimethamine. Specific genomic features consistently emerged as key predictors across all models. These findings demonstrate the value of multi-label learning for comprehensive MDR prediction. The study also identified effective models and genomic regions that warrant further investigation, paving the way for improved resistance surveillance.
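    Predicting an exact MDR profile is stricter than predicting each drug separately. The NumPy sketch below contrasts three standard multi-label metrics: exact match ratio (subset accuracy, where the whole profile must be right), Hamming loss (the fraction of wrong isolate-drug labels), and per-drug accuracy. The isolates and the five drug columns are hypothetical, and the dissertation's exact evaluation metrics may differ.

```python
import numpy as np


def exact_match_ratio(Y_true, Y_pred):
    """Share of isolates whose whole resistance profile is predicted correctly."""
    Y_true, Y_pred = np.asarray(Y_true), np.asarray(Y_pred)
    return float(np.mean(np.all(Y_true == Y_pred, axis=1)))


def hamming_loss(Y_true, Y_pred):
    """Share of individual isolate-drug labels that are wrong."""
    return float(np.mean(np.asarray(Y_true) != np.asarray(Y_pred)))


def per_drug_accuracy(Y_true, Y_pred):
    """Accuracy for each drug column; reveals drugs that are hard to predict."""
    return np.mean(np.asarray(Y_true) == np.asarray(Y_pred), axis=0)


# Rows: isolates; columns: five hypothetical drugs (1 = resistant, 0 = sensitive).
Y_true = [[1, 0, 1, 0, 0],
          [0, 1, 1, 0, 1]]
Y_pred = [[1, 0, 1, 0, 0],
          [0, 1, 0, 0, 1]]
print(exact_match_ratio(Y_true, Y_pred))
print(hamming_loss(Y_true, Y_pred))
print(per_drug_accuracy(Y_true, Y_pred))
```

    Binary relevance methods train one classifier per drug column independently, whereas classifier chains pass earlier label predictions as inputs to later classifiers, which lets them model co-resistance between drugs.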
  • Item
    IMPROVEMENT OF INFERENCE-TIME PREDICTION FOR SPEECH EMOTION RECOGNITION USING ITERATIVE kNN MAJORITY VOTING ON WavLM FEATURE EMBEDDINGS
    (Covenant University Ota, 2025-08) FALANA, John Oluwaseun; Covenant University Dissertation
    Prediction inconsistency and poorly separated decision boundaries in high-dimensional embedding spaces limit the performance of Speech Emotion Recognition (SER) systems. This study proposes a post-processing framework that applies iterative k-Nearest Neighbors (kNN) majority voting to refine the output of a fine-tuned WavLM model without retraining. Embeddings were extracted from CREMA-D, an English-language dataset of 7,442 samples, and iteratively relabelled based on the local neighborhood structure of the latent space. This refinement improved label consistency by applying proximity-based corrections at inference time. Model performance was evaluated using standard SER metrics (accuracy and F1-score) and t-SNE visualization. Results show that repeated kNN refinement improves both classification accuracy and the clarity of decision boundaries, raising the F1-score by 1.87% over the baseline, compared with a 0.67% gain for the SCL+kNN approach. The approach is model-agnostic, efficient, and data-centric, offering a viable alternative to computationally expensive retraining, and it highlights the value of embedding-space operations for improving SER reliability in real-world settings.
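    The refinement step can be sketched as repeated majority voting over each sample's nearest neighbors in embedding space. This NumPy version is a hypothetical illustration: it uses Euclidean distance, excludes each point from its own neighborhood, keeps the current label when it ties for the majority (otherwise taking the smallest tied label), and stops early once labels stop changing. The study's actual distance metric, `k`, number of iterations, and tie-breaking rule are not stated here, and the toy 2-D points stand in for WavLM embeddings.

```python
import numpy as np


def iterative_knn_vote(embeddings, labels, k=5, iterations=3):
    """Refine labels by repeated majority voting over each point's k nearest neighbors."""
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels).copy()
    # Pairwise squared Euclidean distances; a point is not its own neighbor.
    sq_norms = np.sum(X ** 2, axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]  # computed once: embeddings never change

    for _ in range(iterations):
        new_y = y.copy()
        for i in range(len(y)):
            values, counts = np.unique(y[neighbors[i]], return_counts=True)
            winners = values[counts == counts.max()]
            if y[i] not in winners:  # keep the current label if it ties for the majority
                new_y[i] = winners[0]
        if np.array_equal(new_y, y):  # converged: no label changed
            break
        y = new_y
    return y


# Toy 2-D "embeddings": two clusters; one point in cluster A carries a wrong label.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
predicted = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
refined = iterative_knn_vote(points, predicted, k=3, iterations=5)
print(refined)
```

    Only labels change between iterations, so the neighbor lists are built once; for a dataset the size of CREMA-D, a dense distance matrix is still manageable, while larger corpora would call for an approximate nearest-neighbor index.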
  • Item
    GENOME-WIDE IDENTIFICATION OF SHORT TANDEM REPEATS ASSOCIATED WITH MULTI-DRUG RESISTANCE IN Plasmodium falciparum STRAINS
    (Covenant University Ota, 2025-08) EMMANUELLA EKURI MAMTUMAMBOH; Covenant University Dissertation
    Antimalarial drug resistance in Plasmodium falciparum threatens global malaria control, and while single nucleotide polymorphisms (SNPs) are well-studied, the role of short tandem repeats (STRs) remains underexplored. This study investigates the contribution of pathogenic STRs to drug resistance using STR genotypes called with HipSTR, phenotypic resistance data, and machine learning models. Allele frequency analysis revealed consistently lower alternative allele frequencies in resistant strains across all 14 chromosomes, with strong selective signals on chromosomes 2, 3, 4, 8, and 13. Population differentiation analyses (PCA and FST) identified key resistance loci near PfKelch13 and plasmepsin 2/3, along with potential novel resistance regions. A logistic regression model trained on STR alleles achieved perfect classification (AUC = 1.00), demonstrating the strong predictive power of STRs in distinguishing resistant from sensitive parasites. Top STRs showed both known and novel associations with resistance, reinforcing the polygenic nature of antimalarial resistance. These findings establish STRs as important genetic markers for resistance surveillance and highlight their potential utility in guiding malaria treatment strategies.
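    Population differentiation at a single multi-allelic STR locus can be quantified directly from allele frequencies. The sketch below uses Nei's G_ST, (H_T - H_S) / H_T, with expected heterozygosity H = 1 - Σp_i² and equal weighting of the resistant and sensitive groups. This is one standard estimator chosen for illustration, not necessarily the FST estimator used in the study, and the allele frequencies are hypothetical.

```python
import numpy as np


def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2) over the alleles (repeat lengths) at one STR locus."""
    p = np.asarray(freqs, dtype=float)
    return 1.0 - float(np.sum(p ** 2))


def nei_gst(freqs_resistant, freqs_sensitive):
    """Nei's G_ST between two groups, weighting both groups equally."""
    p1 = np.asarray(freqs_resistant, dtype=float)
    p2 = np.asarray(freqs_sensitive, dtype=float)
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2  # mean within-group
    h_t = expected_heterozygosity((p1 + p2) / 2)                           # pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical frequencies of three repeat-length alleles at one locus.
resistant = [0.8, 0.15, 0.05]
sensitive = [0.3, 0.5, 0.2]
print(round(nei_gst(resistant, sensitive), 3))
```

    Values near 0 mean the two groups share the same allele distribution; values near 1 mean they are fixed for different alleles, the pattern expected at loci under strong drug selection.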
  • Item
    PREDICTION OF THE SPREAD OF MALARIA IN PLATEAU STATE: A MACHINE LEARNING APPROACH
    (Covenant University Ota, 2025-08) EGAH, Daniel Owhlama; Covenant University Dissertation
    Malaria remains a major public health concern in Plateau State, Nigeria, with seasonal surges driven by climatic, environmental, and socio-economic factors. Despite various control interventions, locally adapted predictive models are scarce, limiting proactive disease control. This study aimed to develop and evaluate machine learning models capable of forecasting malaria incidence across the state, thereby supporting targeted prevention and control strategies. A ten-year dataset (2014–2023) covering confirmed malaria cases, rainfall, temperature, and relative humidity for all 17 Local Government Areas (LGAs) of Plateau State was preprocessed through cleaning, normalization, and integration of climatic and epidemiological variables. Three supervised machine learning algorithms, namely Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were trained for both regression and classification tasks, and their performance was evaluated using Mean Squared Error (MSE), the Coefficient of Determination (R²), accuracy, precision, recall, and F1-score. For classification, the Random Forest model achieved the highest accuracy (63.4%) with balanced precision and recall, followed by XGBoost, while SVM exhibited higher recall for class 0 but markedly lower performance for class 1. For regression, XGBoost outperformed all models, yielding the lowest MSE (554,539) and highest R² (0.587), followed by Random Forest (R² = 0.562), while SVM recorded a negative R² (-0.037), indicating poor fit. The study concludes that tree-based ensemble models, particularly XGBoost, offer superior predictive capabilities for malaria incidence in Plateau State.
It is recommended that such predictive models be integrated into the state’s malaria surveillance systems, retrained periodically with updated climatic and epidemiological data, and expanded to include socio-economic and intervention coverage variables for improved accuracy and operational relevance.
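    A negative R² can look surprising, so a quick calculation helps. With R² = 1 - SS_res / SS_tot, a model scores below zero whenever its squared errors exceed those of simply predicting the mean of the observed values, which is what the negative SVM result indicates. The case counts below are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical monthly case counts for one LGA.
observed = [120, 150, 300, 260]
mean_only = [207.5] * 4              # always predicting the mean gives R^2 = 0
poor_fit = [300, 260, 120, 150]      # errors larger than the mean baseline give R^2 < 0
print(r2(observed, mean_only), r2(observed, poor_fit))
print(mse(observed, poor_fit))
```

    Because MSE is in squared case units, its magnitude depends on the scale of the case counts, whereas R² is scale-free and easier to compare across models.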
  • Item
    CONSTRUCTION AND ANALYSIS OF AN miRNA-mRNA NETWORK TO IDENTIFY HUB REGULATORS IN DROUGHT-STRESSED COWPEA
    (Covenant University Ota, 2025-08) AWHA, Oghenetega Joel; Covenant University Dissertation
    Drought stress severely threatens cowpea (Vigna unguiculata), a crop vital to food security in sub-Saharan Africa. Although cowpea is naturally resilient, the molecular mechanisms governing its drought response are poorly understood. This study aimed to map the post-transcriptional regulatory network of microRNAs (miRNAs) and their messenger RNA (mRNA) targets to identify key regulators of drought tolerance. Differentially expressed miRNAs and mRNAs were identified by applying a bioinformatics approach to public RNA-sequencing data from drought-stressed cowpea. These were then used to construct an integrated miRNA-mRNA regulatory network, and centrality analysis was applied to pinpoint the most influential "hub" regulators. The analysis revealed a core set of hub miRNAs, including the conserved miR396, miR172, and miR156 families, that orchestrate the drought response. These hubs regulate critical target genes involved in stress hormone synthesis (ACC oxidase), protein stability (Heat Shock Protein 70), and antioxidant defense (Sulfate Transporter, Cysteine proteinase). The study also identified 31 novel miRNAs, which still require laboratory validation. These findings highlight a coordinated survival strategy in which a few master-switch miRNAs control a wide array of protective functions. This research provides a foundational "molecular blueprint" of cowpea's drought response, identifying high-priority candidate genes for developing climate-resilient varieties. By elucidating these regulatory hubs, this work offers a clear path for targeted breeding and biotechnological interventions to safeguard this essential crop.
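    Degree centrality is the simplest way to find hubs in a bipartite miRNA-mRNA network: count how many distinct targets each miRNA regulates. The Python sketch below assumes the network is given as (miRNA, target) edge pairs and normalizes degree by the number of distinct targets. The identifiers are placeholders, and the study may have combined degree with other centrality measures.

```python
from collections import Counter


def mirna_degree_centrality(edges):
    """Normalized degree of each miRNA in a bipartite miRNA -> mRNA network.

    Degree is divided by the number of distinct mRNA targets, so a value of 1.0
    means the miRNA targets every mRNA in the network.
    """
    unique_edges = list(dict.fromkeys(edges))  # drop duplicate edges, keep order
    n_targets = len({target for _, target in unique_edges})
    degree = Counter(mirna for mirna, _ in unique_edges)
    return {mirna: d / n_targets for mirna, d in degree.items()}


def top_hubs(edges, top_n=2):
    """The top_n miRNAs ranked by normalized degree."""
    centrality = mirna_degree_centrality(edges)
    return sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical edges with placeholder miRNA and gene identifiers.
edges = [("miR_A", "gene1"), ("miR_A", "gene2"), ("miR_A", "gene3"),
         ("miR_B", "gene1"), ("miR_B", "gene4"), ("miR_C", "gene2"),
         ("miR_A", "gene1")]  # duplicate edge, counted once
print(top_hubs(edges))
```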