Department of Computer and Information Sciences
Permanent URI for this community: http://itsupport.cu.edu.ng:4000/handle/123456789/28739
Welcome to the Page of Computer and Information Sciences
Item A COMPUTATIONAL FRAMEWORK FOR PREDICTING COMPOUND-PROTEIN INTERACTION FOR PROSTATE CANCER THERAPEUTIC DISCOVERY (Covenant University Ota, 2025-08) AGBI, Mayowa; Covenant University Dissertation
Prostate cancer (PCa) is a major public health issue globally. In sub-Saharan Africa, where diagnostic and treatment resources are limited, it accounts for high mortality. The conventional approach to drug discovery is lengthy, expensive, and often insufficient to address complex, treatment-resistant prostate cancers. In this study, a deep learning computational framework to predict Compound-Protein Interactions (CPI) for prostate cancer drug discovery was developed. An end-to-end machine learning pipeline was implemented using curated datasets from Zenodo, ChEMBL, BindingDB, and UniProt. Compounds were represented as 2048-bit Morgan fingerprints reduced to 200 dimensions via Principal Component Analysis (PCA), and proteins as 100-dimensional 3-mer Word2Vec embeddings. These features were fed into a dual-input deep neural network optimized with binary cross-entropy loss, the Adam optimizer, and dropout regularization. The model identified five novel bioactive compounds targeting prostate cancer biomarker proteins. Model confidence was used to prioritize predicted interactions for AR, SRC, and EGFR. Molecular docking in PyRx and AutoDock Vina, followed by visualization in Discovery Studio, supported strong binding affinity (-7.2 to -10) and structural complementarity, indicating therapeutic potential. Integrating molecular docking added translational value to the predictions. The results point to a disease-specific platform for in silico drug discovery in prostate cancer. This study opens a promising path toward prioritizing candidate compounds by coupling deep learning with structure-based confirmation.
It provides a viable basis for integration with experimental validation and combinatorial therapy design, taking a step further toward machine learning-assisted precision oncology.
Item A MULTI-DOCUMENT SUMMARIZATION APPROACH FOR QUERY-DRIVEN NON-FACTOID QUESTION-ANSWERING SYSTEM (Covenant University Ota, 2025-07) EFOSA-ZUWA, Emmanuel Temidire; Covenant University Dissertation
In Natural Language Processing (NLP), Question Answering Systems (QAS) are essential for facilitating efficient access to relevant information. Traditional QAS approaches, which typically involve decomposing user queries, retrieving relevant documents, and ranking potential answers, often struggle with non-factoid questions that require detailed, context-rich responses synthesized from multiple sources. While existing research has focused heavily on passage selection and ranking, many methods fail to produce a coherent answer, leaving the challenge of multi-source summarization largely unresolved. This study presents a transfer learning-based QAS framework that addresses non-factoid queries through multi-source summarization. The framework follows a multi-stage methodology incorporating question paraphrasing, contradiction detection, sentence embedding and pruning, and a hybrid approach combining extractive and abstractive summarization techniques. Quantitative and qualitative analyses were conducted on benchmark datasets, including WikiHow QA and PubMedQA, to evaluate its effectiveness. The proposed system achieved strong quantitative results, with scores on WikiHow QA (ROUGE-1: 34.10, ROUGE-2: 12.30, ROUGE-L: 32.10, BLEU: 25.14, BERTScore: 95.17) and PubMedQA (ROUGE-1: 42.30, ROUGE-2: 16.10, ROUGE-L: 33.40, BLEU: 31.66, BERTScore: 95.72), demonstrating its ability to generate accurate and contextually relevant answers.
Qualitative evaluations also yielded promising outcomes, with average ratings of 4.37 for information, 4.16 for conciseness, 4.20 for readability, and 4.01 for correctness on a 5-point scale, confirming the model’s effectiveness in delivering accurate and comprehensible responses. This transfer learning-based QAS framework contributes meaningfully to advancements in NLP and offers valuable support for researchers and developers working on intelligent, explainable, and practical question answering systems.
Item ADAPTING MOBILESAM FOR FEW-SHOT SEGMENTATION OF PROSTATE CANCER IN HISTOPATHOLOGY IMAGES (Covenant University Ota, 2025-08) ANTHONY, Micheal IdediA; Covenant University Dissertation
Segmenting prostate cancer in tissue images is difficult because of irregular gland shapes, broken tissue structures, and the scarcity of labelled images for training. This study introduces FrozenSE-SAM, a segmentation method that works well even with small datasets. It combines a frozen MobileSAM encoder with a lightweight decoder enhanced by Squeeze-and-Excitation (SE) blocks and is trained using Focal Tversky Loss, which helps focus on difficult regions. Unlike older methods that need extra shape information or many labels, FrozenSE-SAM can directly segment tumour regions without prompts. It was trained on only 35 tissue microarray (TMA) cores from the Gleason 2019 dataset and tested on 100 new samples. The model achieved a Dice score of 68.45%, which is better than U-Net (60.72%), Swin-UNETR (58.12%), and a Signed Distance Function (SDF) based model (62.77%). On boundary accuracy, FrozenSE-SAM also performed better, with HD95 = 0.0228 mm and ASD = 0.0056 mm, compared to the SDF model (HD95 = 0.0328 mm, ASD = 0.0072 mm); U-Net and Swin-UNETR scored worse. Qualitative visual results also confirmed that FrozenSE-SAM was better at outlining complex tumour regions.
It could accurately segment cribriform and fused glands without including nearby healthy tissue. In contrast, the SDF model produced blurry edges and missed finer structures, leading to under-segmentation. These results show that FrozenSE-SAM is a strong, reliable method for prostate cancer segmentation, especially in real-world situations with limited data.
Item AN OPTIMIZED DEEP-FOREST MODEL USING A MODIFIED DIFFERENTIAL EVOLUTION OPTIMIZATION ALGORITHM: A CASE OF HOST-PATHOGEN PROTEIN-PROTEIN INTERACTION PREDICTION (Covenant University Ota, 2025-04) EMMANUEL JERRY DAUDA; Covenant University Thesis
Deep forest is an advanced ensemble learning technique that employs forest structures within a cascade framework, leveraging deep architectures to enhance predictive performance by adaptively capturing high-level feature representations. Despite its promise, deep forest models often face critical challenges, including manual hyperparameter optimization and inefficiencies in computational time and memory usage. To address these limitations, Bayesian optimization, a prominent model-based hyperparameter optimization method, is frequently utilized, with Differential Evolution (DE) serving as the acquisition function in recent implementations. However, DE's reliance on random index selection for constructing donor vectors introduces inefficiencies, as suboptimal or redundant indices may hinder the search for optimal solutions. This study introduces an optimized deep forest algorithm that integrates a modified DE acquisition function into Bayesian optimization to improve host-pathogen protein-protein interaction (HPPPI) prediction. The modified DE approach incorporates a weighted and adaptive donor vector selection mechanism, enhancing the exploration and exploitation of hyperparameter configurations.
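For readers unfamiliar with the random index selection mentioned above, the following is a minimal sketch of the classic DE/rand/1 donor construction, in which three population members are drawn uniformly at random. It illustrates the standard baseline only; it is not the weighted, adaptive selection proposed in the thesis, whose details are not given here. The function name, the scale factor `f`, and the toy population are assumptions made for illustration.

```python
import random


def de_rand_1_donor(population, target_idx, f=0.5, rng=random):
    """Classic DE/rand/1 mutation: donor = x_r1 + f * (x_r2 - x_r3).

    r1, r2, r3 are distinct indices drawn uniformly at random,
    all different from the target index.
    """
    candidates = [i for i in range(len(population)) if i != target_idx]
    r1, r2, r3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [a + f * (b - c) for a, b, c in zip(x1, x2, x3)]


# Hypothetical population of four 2-D hyperparameter vectors.
population = [[0.1, 10.0], [0.3, 20.0], [0.5, 30.0], [0.7, 40.0]]
donor = de_rand_1_donor(population, target_idx=0, f=0.5, rng=random.Random(0))
print(donor)
```

Because the three indices are chosen without regard to fitness, a poor or redundant member is as likely to be picked as a good one, which is the inefficiency the abstract points to.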
Performance evaluations using 10-fold cross-validation on human–Plasmodium falciparum (PF) protein sequence datasets sourced from reputable databases demonstrated the model's superiority over traditional Bayesian optimization, genetic algorithms, evolutionary strategies, and conventional machine learning models. The optimized framework achieved an accuracy of 89.3%, sensitivity of 85.4%, precision of 91.6%, and Area Under the Receiver Operating Characteristic Curve (AUROC) of 89.1%, surpassing existing methods. Additionally, the model exhibited reduced computational time and memory usage. The optimized DF was deployed as a web-based pipeline, DFH3PI (Deep Forest Host-Pathogen Protein-Protein Interaction Prediction), which successfully identified three potential human–PF PPIs previously classified as non-interacting: P50250–P08319, Q8ILI6–O94813, and Q7KQL3–Q96GQ7. These findings not only present the potential of DFH3PI for advancing HPPPI prediction but also establish the optimized deep forest framework as a transformative tool in computational biology. Its ability to combine accuracy and efficiency marks a significant step forward in predictive modeling.
Item CONSTRUCTING GENE REGULATORY NETWORKS FOR BREAST CANCER STEM CELLS USING SINGLE-CELL MULTI-OMICS (Covenant University Ota, 2025-08) UJOH, Treasure Ulonna; Covenant University Dissertation
Breast cancer mortality is primarily driven by metastasis and therapeutic relapse, processes strongly linked to breast cancer stem cells (BCSCs). These cells are believed to orchestrate tumor initiation, resistance, and recurrence through complex gene regulatory networks (GRNs) that remain poorly characterized. This study aimed to construct and compare GRNs of BCSCs and normal mammary stem cells (MaSCs) using a single-cell multi-omics framework.
Publicly available datasets containing single-cell RNA sequencing (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) profiles were sourced from normal breast tissue, primary tumors, recurrent tumors, and BCSC-enriched mammospheres. The datasets underwent strict preprocessing involving filtering, normalization, and quality control. They were then integrated with the scGLUE model, which uses a graph neural network to embed multiple omics layers, including transcriptomic and epigenomic data, into a single latent space while conserving biological identity. The pySCENIC pipeline, which combines co-expression analysis, cis-regulatory motif enrichment, and pruning to reconstruct gene regulatory networks, was then used to generate high-confidence regulons for BCSCs and MaSCs. Comparative network analysis revealed extensive “regulatory rewiring” in BCSCs, with transcription factors such as JUNB, FOSB, LEF1, SOX4, and MAFB emerging as master regulators absent or significantly altered in normal stem cells. Functional enrichment of BCSC-exclusive targets highlighted pathways central to metastasis and recurrence, including extracellular matrix remodeling, adhesion, migration, and growth factor signaling. Disease ontology mapping further confirmed strong associations with invasive breast carcinoma and therapy resistance. Collectively, this study provides one of the first integrated single-cell GRN maps contrasting BCSCs with their normal counterparts, establishing mechanistic links between regulatory rewiring and cancer hallmarks.
The identification of BCSC-specific master regulators offers promising therapeutic entry points for interventions aimed at eradicating the root drivers of breast cancer relapse and metastasis.
Item CONSTRUCTION AND ANALYSIS OF AN miRNA-mRNA NETWORK TO IDENTIFY HUB REGULATORS IN DROUGHT-STRESSED COWPEA (Covenant University Ota, 2025-08) AWHA, Oghenetega Joel; Covenant University Dissertation
Drought stress severely threatens cowpea (Vigna unguiculata), a crop vital to food security in sub-Saharan Africa. Although cowpea is naturally resilient, the molecular mechanisms governing its drought response are poorly understood. This study aimed to map the post-transcriptional regulatory network of microRNAs (miRNAs) and their messenger RNA (mRNA) targets to identify key regulators of drought tolerance. Using a bioinformatics approach on public RNA-sequencing data from drought-stressed cowpea, differentially expressed miRNAs and mRNAs were identified. These were then used to construct an integrated miRNA-mRNA regulatory network, and centrality analysis was applied to pinpoint the most influential "hub" regulators. The analysis revealed a core set of hub miRNAs, including the conserved miR396, miR172, and miR156 families, that orchestrate the drought response. These hubs regulate critical target genes involved in stress hormone synthesis (ACC oxidase), protein stability (Heat Shock Protein 70), and antioxidant defense (Sulfate Transporter, Cysteine proteinase). Additionally, 31 novel miRNAs were identified, although these require laboratory validation. This discovery highlights a coordinated survival strategy where a few master-switch miRNAs control a wide array of protective functions. This research provides a foundational "molecular blueprint" of cowpea's drought response, identifying high-priority candidate genes for developing climate-resilient varieties.
By elucidating these regulatory hubs, this work offers a clear path for targeted breeding and biotechnological interventions to safeguard this essential crop.
Item DEVELOPMENT OF A GENE FUSION DETECTION VALIDATION FRAMEWORK FOR LONG-READ RNA SEQUENCING USING ALIGNMENT EVIDENCE AND MACHINE LEARNING (Covenant University Ota, 2025-08) AMOS, IISOMINEA ISOZO; Covenant University Dissertation
Gene fusions are critical drivers of cancer and serve as diagnostic and therapeutic biomarkers. Detecting them reliably from long-read RNA sequencing (RNA-Seq) data remains challenging due to high error rates and complex transcript structures. Current methods often depend on matched whole-genome sequencing (WGS) data, which may be unavailable or uninformative when fusions are expressed without clear genomic breakpoints. To address this, a long-read fusion validation pipeline was developed, optimized for transcript-level evidence by removing reliance on genomic data and focusing on functionally expressed fusions. The pipeline integrates alignment support from realigned soft-clipped reads, supplementary alignments, and full-length chimeric reads to validate transcripts. A Random Forest model was further trained using features derived from validated events to refine classification. Applied to five cancer cell line datasets, with emphasis on breast cancer, the pipeline achieved a 68.1% overall validation rate and 77.8% in MCF7. It distinguished true fusions, deprioritized database-reported false positives, and highlighted high-confidence novel candidates. Known fusions such as BCAS4–BCAS3 were confirmed, while MOV10–RHOC emerged as a biologically relevant novel fusion supported by multiple evidence types and recurrent in MCF7 and K562.
Another candidate, CLUL1–TYMS, detected across four lines, likely represents a transcriptional read-through. Benchmarking against experimentally validated fusion transcripts, rather than DNA-based tools, established a transcript-focused alternative for fusion discovery. This dataset will be made publicly available to support benchmarking and machine learning research. The framework enables high-confidence detection of transcript-level fusions in cancer and shows strong potential for biomarker discovery and precision oncology.
Item DEVELOPMENT OF A MULTI-LABEL CLASSIFIER FOR PREDICTING GENETIC MARKERS ASSOCIATED WITH MULTI-DRUG RESISTANCE IN Plasmodium falciparum STRAINS (Covenant University Ota, 2025-08) OGUNDIMU, Temitayo Ayomikun; Covenant University Dissertation
Malaria is an infectious disease of global health importance caused by Plasmodium falciparum. Its control is complicated by the parasite’s ability to gain resistance to multiple antimalarial drugs simultaneously, a phenomenon known as multidrug resistance (MDR). Single-label models predict resistance to only one drug at a time and therefore cannot capture these complex resistance patterns, limiting their utility for real-world surveillance. To bridge this gap, this study developed and evaluated four advanced multi-label classification models: Random Forest with Binary Relevance (RFDTBR), Ensemble of Classifier Chains (ECCJ48), Ensemble of Binary Relevance (EBRJ48), and a Backpropagation Neural Network (BPNN), using genomic and phenotypic data for five key antimalarials. Notably, RFDTBR and EBRJ48 outperformed the others in predicting exact MDR profiles, while BPNN ran faster than the other models. Performance was lowest for Sulfadoxine-Pyrimethamine across the models. Specific genomic features consistently emerged as key predictive factors across all models. These findings demonstrate the value of multi-label learning for comprehensive MDR prediction.
Also, effective models and genomic regions were identified that warrant further investigation, paving the way for improved resistance surveillance.
Item DEVELOPMENT OF A REGULARIZATION-BASED FAIRNESS-AWARE LOSS FUNCTION FOR MITIGATING POPULARITY BIAS IN MOVIE RECOMMENDER SYSTEMS (Covenant University Ota, 2025-08) IDOWU, Esther Oluwaseyi; Covenant University Dissertation
Recommender systems are a cornerstone of the movie entertainment industry, driving user engagement and personalizing content delivery to enhance customer experience. However, popularity bias, where certain content is disproportionately recommended, can limit the visibility of diverse content, undermining innovation and customer satisfaction. This study proposed a regularization-based fairness-aware mechanism designed to mitigate popularity bias in recommender systems. The proposed mechanism integrates fairness-aware loss functions, such as exposure fairness loss, fairness-aware ranking loss, and disparate impact loss, into the Neural Collaborative Filtering recommendation algorithm. The fairness-aware model was integrated into a movie recommendation system. The system was evaluated for technical effectiveness in terms of recommendation accuracy, fairness in exposure, and usability in real-world entertainment business contexts. All models achieved very high exposure fairness (≥0.9995). REBFAL matched the best baselines in long-tail coverage (0.2987) while showing a slightly higher exposure imbalance (0.7487), reflecting a trade-off between fairness and distribution.
Item DEVELOPMENT OF A TRANSFER LEARNING PIPELINE FOR PROSTATE CANCER AGGRESSIVENESS CLASSIFICATION (Covenant University Ota, 2025-08) OLUSUYI, Fiyinfoluwa Ruth; Covenant University Dissertation
Prostate cancer is a leading malignancy in men, where accurate aggressiveness assessment is crucial for guiding treatment.
While multi-parametric MRI (mpMRI) is now the established standard for non-invasive diagnosis, its interpretation can be subjective. Deep learning has shown promise, but limited data poses a challenge. This study addresses this limitation by developing a comprehensive transfer learning pipeline for automated prostate cancer aggressiveness classification using mpMRI data. The public PROSTATEx dataset was processed into 2D image patches combining T2-weighted, ADC, and high b-value DWI sequences as 3-channel inputs. Seven state-of-the-art pre-trained Convolutional Neural Network (CNN) architectures, including EfficientNet-B0, ResNet18, VGG16, DenseNet121, MobileNetV3, InceptionV3, and ShuffleNet V2, were fine-tuned using a consistent framework incorporating WeightedRandomSampler and regularization to address class imbalance. Performance evaluation was carried out on a separate validation set using a range of standard metrics, including accuracy, F1-score, specificity, and AUC. The findings identified EfficientNet-B0 as the superior architecture. It delivered the best performance, achieving an overall accuracy of nearly 97% and a macro F1-score of 0.96. This result highlights the exceptional effectiveness of modern, efficient network designs. Remarkably, the lightweight MobileNetV3 delivered nearly identical performance, also achieving a 96% accuracy and macro F1-score. Other architectures, including ShuffleNet V2, DenseNet121, and ResNet18, also proved highly effective with accuracies between 94-96%. The VGG16 and InceptionV3 models did not reach the same level of performance as the leading architectures, with accuracies of 0.72 and 0.90, respectively.
Item DEVELOPMENT OF AN ENHANCED EXPLAINABLE TRANSFORMER INPUT SAMPLING METHOD FOR NOISY IMAGE ENVIRONMENTS (Covenant University Ota, 2025-08) ADJAOKE, WO-O TERTIUS; Covenant University Dissertation
Vision Transformers (ViTs) excel in computer vision tasks due to their ability to capture long-range dependencies, but their complex decision-making processes pose interpretability challenges, especially for low-quality images affected by noise, blur, or degradation. This study enhances the Transformer Input Sampling (TIS) method, introducing RobustTIS to improve explanation reliability in such conditions. The original TIS was evaluated on clean and degraded ImageNet images, revealing reduced performance under severe noise like Gaussian and motion blur. RobustTIS incorporates attention-guided clustering, adaptive token selection, noise-tolerant scoring, and sparsity regularization, achieving improved insertion (0.6755 vs. 0.6643), deletion (0.1849 vs. 0.1926), and sparseness (0.3360 vs. 0.3163) scores, with equivalent max-sensitivity (0.1177). Applied to PathMNIST and UC Merced datasets, RobustTIS generated precise saliency maps for medical and surveillance tasks despite low resolution and noise. However, it requires higher computational resources (31.40s, 1689.25 MB peak GPU) than TIS (8.85s, 801.04 MB). Quantitative and qualitative evaluations confirm RobustTIS’s enhanced robustness and interpretability, though its computational cost suggests a trade-off.
This work advances ViT-based explainable AI, offering practical benefits for medical imaging and surveillance, and lays a foundation for future research into efficient, trustworthy AI systems in challenging imaging environments.
Item GENOME-WIDE IDENTIFICATION OF SHORT TANDEM REPEATS ASSOCIATED WITH MULTI-DRUG RESISTANCE IN Plasmodium falciparum STRAINS (Covenant University Ota, 2025-08) EMMANUELLA EKURI MAMTUMAMBOH; Covenant University Dissertation
Antimalarial drug resistance in Plasmodium falciparum threatens global malaria control, and while single nucleotide polymorphisms (SNPs) are well-studied, the role of short tandem repeats (STRs) remains underexplored. This study investigates the contribution of pathogenic STRs to drug resistance using STR genotypes from HipSTR, phenotypic resistance data, and machine learning models. Allele frequency analysis revealed consistently lower alternative allele frequencies in resistant strains across all 14 chromosomes, with strong selective signals on chromosomes 2, 3, 4, 8, and 13. Population differentiation analyses (PCA, FST) identified key resistance loci near PfKelch13 and plasmepsin 2/3, along with potential novel resistance regions. A logistic regression model trained on STR alleles achieved perfect classification (AUC = 1.00), demonstrating the strong predictive power of STRs in distinguishing resistant from sensitive parasites. Top STRs showed both known and novel associations with resistance, reinforcing the polygenic nature of antimalarial resistance.
These findings establish STRs as important genetic markers for resistance surveillance and highlight their potential utility in guiding malaria treatment strategies.
Item IMPROVEMENT OF INFERENCE-TIME PREDICTION FOR SPEECH EMOTION RECOGNITION USING ITERATIVE kNN MAJORITY VOTING ON WavLM FEATURE EMBEDDINGS (Covenant University Ota, 2025-08) FALANA, John Oluwaseun; Covenant University Dissertation
Prediction inconsistency and poor decision boundaries in high-dimensional embedding spaces limit the performance of Speech Emotion Recognition (SER) systems. This study proposes a post-processing framework that applies iterative k-Nearest Neighbors (kNN) majority voting to refine the output of a fine-tuned WavLM model without requiring retraining. Using CREMA-D, an English dataset with 7,442 samples, embeddings were extracted and iteratively relabelled based on local neighborhood structure in the latent space. This refinement process enhanced label consistency and leveraged proximity-based corrections at inference time. Model performance was evaluated using standard SER metrics (accuracy and F1-score) and t-SNE visualization. Results show that repeated kNN refinement improves both classification accuracy and the clarity of decision boundaries, with a 1.87% improvement in F1 score over the baseline, compared to a 0.67% improvement over the baseline for the SCL+kNN approach. The approach is model-agnostic, efficient, and data-centric, offering a viable alternative to computationally expensive retraining.
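The iterative relabelling idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it uses Euclidean distance, excludes each point from its own neighbourhood, updates all labels synchronously, and stops when labels no longer change. The function name `knn_refine`, the value of `k`, and the toy 2-D "embeddings" are hypothetical.

```python
import math
from collections import Counter


def knn_refine(embeddings, labels, k=3, max_iters=10):
    """Iteratively replace each label with the majority label of its k nearest neighbours."""
    labels = list(labels)
    # Neighbourhoods depend only on the embeddings, so compute them once.
    neighbours = []
    for i, e in enumerate(embeddings):
        others = [j for j in range(len(embeddings)) if j != i]
        others.sort(key=lambda j: math.dist(e, embeddings[j]))
        neighbours.append(others[:k])
    for _ in range(max_iters):
        # Ties are broken in favour of the label seen first among the nearest neighbours.
        new_labels = [
            Counter(labels[j] for j in nbrs).most_common(1)[0][0]
            for nbrs in neighbours
        ]
        if new_labels == labels:  # no label changed: converged
            break
        labels = new_labels
    return labels


# Toy example: two well-separated clusters, with one prediction (index 2) off.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
predicted = ["angry", "angry", "happy", "angry",
             "happy", "happy", "happy", "happy"]
refined = knn_refine(embeddings, predicted, k=3)
print(refined)
```

In this toy case the stray "happy" label inside the first cluster is outvoted by its neighbours, while the remaining labels are left unchanged.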
This work highlights the value of embedding-space operations for improving SER reliability in real-world settings.
Item IMPROVEMENT OF PARTICLE SWARM ALGORITHM WITH NAME-ENTITY RECOGNITION TECHNIQUE FOR OPTIMISED TASK SCHEDULING IN PUBLIC CLOUD SERVICES (Covenant University Ota, 2025-08) ASSOGBA, Danielle Agossi Jovincia; Covenant University Dissertation
Public cloud services have transformed IT infrastructure by offering scalable, on-demand resources, yet efficient task scheduling remains a critical NP-hard challenge, often leading to latency, energy inefficiency, and poor resource utilization. Traditional Particle Swarm Optimization (PSO) algorithms, while effective, suffer from premature convergence and limited adaptability in dynamic environments. This study proposes an enhanced PSO model, named NERPSO, that integrates Named Entity Recognition (NER) techniques to process unstructured task data, enabling intelligent and adaptive scheduling in public cloud services. Task datasets generated with CloudSim Plus were used to simulate real-world cloud workloads, with middleware interfacing the NER module with the PSO load balancer. The methodology includes training NER models on historical task annotations using spaCy, leveraging Word2Vec for semantic enhancement, and conducting comparative tests across scenarios of 100, 200, 300, … 1000 tasks. The approach evaluates performance through simulation phases, comparing baseline PSO, Reinforcement Learning PSO, and the proposed NERPSO. Results demonstrate that NERPSO significantly outperforms traditional PSO and Reinforcement Learning PSO, achieving lower average response times (2364.60 ms for 300 cloudlets), higher throughput (123.0821 tasks/sec), and improved load balance (85.5% for 100 cloudlets). Precision and F1 scores also improved (92% and 84.83% for 100 and 300 cloudlets, respectively), validating the efficacy of NER integration.
The study concludes that NERPSO offers a robust solution for optimized task scheduling in public cloud environments, with its context-aware capabilities reducing latency and improving response time and throughput relative to baseline PSO. The findings support its potential for adoption and suggest further research into energy-efficient enhancements.
Item OPTIMISATION OF QUERY ALGORITHM FOR SUFFIX-BASED DATA STRUCTURES USING HIERARCHICAL ORGANISATION AND DYNAMIC PRUNING (Covenant University Ota, 2025-08) OLUYINKA, Oluwatimilehin Oyebola; Covenant University Dissertation
Suffix trees and suffix arrays are fundamental data structures for processing big textual data. However, the memory costs of suffix trees remain a bottleneck, and the logarithmic query time of suffix arrays is suboptimal. To address these problems, this study combines the best features of both data structures by representing the suffixes of the reference string in a query-aware hierarchical data structure of nodes and edges. The goal is to ensure that substring searches can be performed using less space than suffix trees and in less time than querying suffix arrays. The hierarchy is constructed by scanning through the suffix array, representing the fixed-length prefixes of suffixes as edge labels and storing the corresponding indices in buckets. To minimise space cost, only a single level of the hierarchy is stored in memory. The novelty of this study lies in the introduction of a tunable pruning mechanism, controlled by a pruning parameter at every level of the hierarchy, that eliminates suffixes irrelevant to the current search. The tunability of the pruning parameter provides a balance between memory and time cost to suit every data type. The performance of the algorithm was evaluated with natural language and biological sequences.
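As background for the "logarithmic query time" of suffix arrays mentioned above, here is a minimal sketch of the standard baseline: a naively built suffix array queried with two binary searches. It is not the hierarchical, pruned structure proposed in the dissertation. The function names are illustrative, and the naive construction is chosen for brevity rather than efficiency.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text using the suffix array sa."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(sa)                                  # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))   # [1, 3]
```

Each binary search takes O(log n) steps with an O(m) string comparison per step, for a pattern of length m in a text of length n; this is the query cost that the hierarchy and pruning described above aim to improve on.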
For natural languages with alphabet sizes greater than 20 letters, lower values of the pruning parameter offer the best time and memory costs, with costs dropping further as query strings grow longer. For biological sequences with alphabet sizes of less than 20 letters, the pruning parameter could be adjusted for a trade-off between memory and time. The algorithm was benchmarked against suffix trees, suffix arrays, and enhanced suffix arrays. It offers at least an 8x reduction in space consumption across all data types, accompanied by 1.97x better query time for specific data types in pattern matching and string searching operations on large textual datasets.
Item PREDICTION OF PLASTIC DEGRADING ENZYMES WITH PROTEIN SEQUENCES AND 3D STRUCTURES INTEGRATION USING CONVOLUTIONAL NEURAL NETWORK (Covenant University Ota, 2025-08) AKINYEMI, Priscilla Oluwatomi; Covenant University Dissertation
The growing problem of plastic waste has made the discovery of plastic-degrading enzymes (PDEs) essential, requiring innovative computational solutions. This study proposes a deep learning framework to predict PDEs by integrating features from protein sequence embeddings and 3D structures. A curated dataset of 1,791 protein sequences, comprising both plastic-degrading and non-plastic-degrading enzymes, was analyzed. ESM-2 language model representations were obtained for the sequences, while structural features were computed from AlphaFold2-predicted structures via graph neural networks. These multimodal features were fed into a Convolutional Neural Network (CNN), which achieved an accuracy of 97.7% and an F1 score of 0.94, representing the state of the art. The trained model was then applied to a list of twenty-one (21) unannotated enzymes. Six of these unannotated proteins, with UniProt IDs A0A6J6HCC9, A0A6J7GSX4, A0A6J6XVW8, A0A6J7ECY4, A0A6J6SVN6, and A0A6J6T2V9, showed a predicted degradation probability of over 70%.
This study facilitates the identification of possible PDEs using integrated sequence and structural data, supporting more accurate enzyme classification as well as sustainable environmental applications.
Item PREDICTION OF THE SPREAD OF MALARIA IN PLATEAU STATE: A MACHINE LEARNING APPROACH (Covenant University Ota, 2025-08) EGAH, Daniel Owhlama; Covenant University Dissertation
Malaria remains a major public health concern in Plateau State, Nigeria, with seasonal surges driven by climatic, environmental, and socio-economic factors. Despite various control interventions, locally adapted predictive models are scarce, limiting proactive disease control measures. This study aimed to develop and evaluate machine learning models capable of forecasting malaria incidence across the state, thereby supporting targeted prevention and control strategies. A ten-year dataset (2014–2023) covering confirmed malaria cases, rainfall, temperature, and relative humidity for all 17 Local Government Areas (LGAs) of Plateau State was preprocessed through cleaning, normalization, and integration of climatic and epidemiological variables. Three supervised machine learning algorithms, Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were trained for both regression and classification tasks, and their performance was evaluated using Mean Squared Error (MSE), Coefficient of Determination (R²), accuracy, precision, recall, and F1-score. For classification, the Random Forest model achieved the highest accuracy (63.4%) with balanced precision and recall, followed by XGBoost, while SVM exhibited higher recall for class 0 but markedly lower performance for class 1. For regression, XGBoost outperformed all models, yielding the lowest MSE (554,539) and highest R² (0.587), followed by Random Forest (R² = 0.562), while SVM recorded a negative R² (-0.037), indicating poor fit.
The study concludes that tree-based ensemble models, particularly XGBoost, offer superior predictive capabilities for malaria incidence in Plateau State. It is recommended that such predictive models be integrated into the state’s malaria surveillance systems, retrained periodically with updated climatic and epidemiological data, and expanded to include socio-economic and intervention coverage variables for improved accuracy and operational relevance.
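The regression metrics reported in the abstract above, including the negative R² for SVM, can be made concrete with a small worked example. The sketch below computes MSE and R² from their standard definitions; the case counts are invented purely for illustration and are not taken from the study.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical monthly case counts (illustrative only).
observed = [100, 200, 300, 400]

# Predicting the mean of the observations everywhere gives R² = 0 ...
print(r2(observed, [250, 250, 250, 250]))
# ... and any constant further from the mean gives a negative R².
print(r2(observed, [300, 300, 300, 300]))
print(mse(observed, [300, 300, 300, 300]))
```

A negative R² therefore means the model's squared error exceeds that of simply predicting the mean of the observed values.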