Department of Computer and Information Sciences

Permanent URI for this community: http://itsupport.cu.edu.ng:4000/handle/123456789/28739

Welcome to the Page of Computer and Information Sciences

Search Results

Now showing 1 - 6 of 6
  • Item
    OPTIMISATION OF QUERY ALGORITHM FOR SUFFIX-BASED DATA STRUCTURES USING HIERARCHICAL ORGANISATION AND DYNAMIC PRUNING
    (Covenant University Ota, 2025-08) OLUYINKA, Oluwatimilehin Oyebola; Covenant University Dissertation
    Suffix trees and suffix arrays are fundamental data structures for processing big textual data. However, the memory cost of suffix trees remains a bottleneck, and the logarithmic query time of suffix arrays is suboptimal. To address these problems, this study combines the best features of both data structures by representing the suffixes of the reference string in a query-aware hierarchical data structure of nodes and edges. The goal is to ensure that substring searches can be performed using less space than suffix trees and in less time than querying with suffix arrays. The hierarchy is constructed by scanning through the suffix array, representing the fixed-length prefixes of suffixes as edge labels and storing the corresponding indices in buckets. To minimise space cost, only a single level of the hierarchy is stored in memory. The novelty of this study lies in the introduction of a tunable pruning mechanism, controlled by a pruning parameter at every level of the hierarchy, that eliminates suffixes irrelevant to the current search. The tunability of the pruning parameter provides a balance between memory and time cost to suit each data type. The performance of the algorithm was evaluated with natural language and biological sequences. For natural languages with alphabet sizes greater than 20 letters, lower values of the pruning parameter offer the best time and memory costs, and these costs fall further as query strings grow longer. For biological sequences with alphabet sizes of less than 20 letters, the pruning parameter can be adjusted for a trade-off between memory and time. The algorithm was benchmarked against suffix trees, suffix arrays, and enhanced suffix arrays. It offers at least an 8x reduction in space consumption across all data types, accompanied by a 1.97x better query time for specific data types, for pattern matching and string searching operations on big textual datasets.
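The bucket idea described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it builds only one level of buckets over a naively constructed suffix array and omits the pruning parameter entirely. The function names and the prefix length `k` are assumptions made for illustration.

```python
from collections import defaultdict


def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a small example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_buckets(text, suffix_array, k):
    # Group suffix start positions by their length-k prefix (the "edge label").
    # Suffixes shorter than k get a shorter label.
    buckets = defaultdict(list)
    for pos in suffix_array:
        buckets[text[pos:pos + k]].append(pos)
    return dict(buckets)


def find_occurrences(text, buckets, k, pattern):
    if len(pattern) >= k:
        # Only one bucket can contain matches: the one labelled pattern[:k].
        candidates = buckets.get(pattern[:k], [])
    else:
        # Short pattern: every bucket whose label starts with it may match.
        candidates = [
            pos
            for label, positions in buckets.items()
            if label.startswith(pattern)
            for pos in positions
        ]
    # Verify each candidate against the text and report positions in order.
    return sorted(pos for pos in candidates if text.startswith(pattern, pos))


text = "banana"
suffix_array = build_suffix_array(text)
buckets = build_buckets(text, suffix_array, k=2)
print(find_occurrences(text, buckets, 2, "ana"))  # [1, 3]
```

Looking up a bucket by label replaces the binary search over the full suffix array for the first `k` characters; the remaining characters are still checked directly against the text in this simplified version.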
  • Item
    DEVELOPMENT OF A TRANSFER LEARNING PIPELINE FOR PROSTATE CANCER AGGRESSIVENESS CLASSIFICATION
    (Covenant University Ota, 2025-08) OLUSUYI, Fiyinfoluwa Ruth; Covenant University Dissertation
    Prostate cancer is a leading malignancy in men, where accurate aggressiveness assessment is crucial for guiding treatment. While multi-parametric MRI (mpMRI) is now the established standard for non-invasive diagnosis, its interpretation can be subjective. Deep learning has shown promise, but limited data poses a challenge. This study addresses this limitation by developing a comprehensive transfer learning pipeline for automated prostate cancer aggressiveness classification using mpMRI data. The public PROSTATEx dataset was processed into 2D image patches combining T2-weighted, ADC, and high b-value DWI sequences as 3-channel inputs. Seven state-of-the-art pre-trained Convolutional Neural Network (CNN) architectures (EfficientNet-B0, ResNet18, VGG16, DenseNet121, MobileNetV3, InceptionV3, and ShuffleNet V2) were fine-tuned using a consistent framework incorporating WeightedRandomSampler and regularization to address class imbalance. Performance evaluation was carried out on a separate validation set using a range of standard metrics, including accuracy, F1-score, specificity, and AUC. The findings identified EfficientNet-B0 as the superior architecture. It delivered the best performance, achieving an overall accuracy of nearly 97% and a macro F1-score of 0.96. This result highlights the effectiveness of modern, efficient network designs. Remarkably, the lightweight MobileNetV3 delivered nearly identical performance, achieving 96% for both accuracy and macro F1-score. Other architectures, including ShuffleNet V2, DenseNet121, and ResNet18, also proved highly effective, with accuracies between 94% and 96%. The VGG16 and InceptionV3 models did not reach the same level of performance as the leading architectures, with accuracies of 72% and 90%, respectively.
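To illustrate the class-imbalance handling mentioned in this abstract: PyTorch's `WeightedRandomSampler` draws samples in proportion to per-sample weights, and a common choice is the inverse of each sample's class frequency. The abstract does not say how the weights were computed, so the scheme below is an assumption, and the sketch uses only the standard library to show the effect rather than reproducing the pipeline. Function names are hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    # Each sample gets weight 1 / (size of its class), so every class
    # carries the same total weight regardless of how many samples it has.
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def draw_balanced_indices(labels, num_samples, seed=0):
    # Sample indices with replacement in proportion to the weights,
    # which is the behaviour a weighted sampler provides during training.
    weights = inverse_frequency_weights(labels)
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


labels = [0] * 8 + [1] * 2            # 80% / 20% imbalance (made-up data)
indices = draw_balanced_indices(labels, num_samples=1000)
minority_share = sum(labels[i] for i in indices) / len(indices)
print(round(minority_share, 2))       # close to 0.5 rather than 0.2
```

In a PyTorch setting, a weight list of this kind would typically be passed as the `weights` argument of `torch.utils.data.WeightedRandomSampler`.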
  • Item
    A MULTI-DOCUMENT SUMMARIZATION APPROACH FOR QUERY-DRIVEN NON-FACTOID QUESTION-ANSWERING SYSTEM
    (Covenant University Ota, 2025-07) EFOSA-ZUWA, Emmanuel Temidire; Covenant University Dissertation
    In Natural Language Processing (NLP), Question Answering Systems (QAS) are essential for facilitating efficient access to relevant information. Traditional QAS approaches, which typically involve decomposing user queries, retrieving relevant documents, and ranking potential answers, often struggle with non-factoid questions that require detailed, context-rich responses synthesized from multiple sources. While existing research has focused heavily on passage selection and ranking, many methods fail to produce a coherent answer, leaving the challenge of multi-source summarization largely unresolved. This study presents a transfer learning-based QAS framework that addresses non-factoid queries through multi-source summarization. The framework follows a multi-stage methodology incorporating question paraphrasing, contradiction detection, sentence embedding and pruning, and a hybrid approach combining extractive and abstractive summarization techniques. To evaluate its effectiveness, quantitative and qualitative analyses were conducted using benchmark datasets, including WikiHow QA and PubMedQA. The proposed system achieved strong quantitative results, with scores on WikiHow QA (ROUGE-1: 34.10, ROUGE-2: 12.30, ROUGE-L: 32.10, BLEU: 25.14, BERTScore: 95.17) and PubMedQA (ROUGE-1: 42.30, ROUGE-2: 16.10, ROUGE-L: 33.40, BLEU: 31.66, BERTScore: 95.72), demonstrating its ability to generate accurate and contextually relevant answers. Qualitative evaluations also yielded promising outcomes, with average ratings of 4.37 for information, 4.16 for conciseness, 4.20 for readability, and 4.01 for correctness on a 5-point scale, confirming the model’s effectiveness in delivering accurate and comprehensible responses. This transfer learning-based QAS framework contributes to advancements in NLP and offers valuable support for researchers and developers working on intelligent, explainable, and practical question answering systems.
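For readers unfamiliar with the ROUGE-1 and ROUGE-2 scores reported in this abstract, the following is a simplified sketch of how n-gram overlap between a generated answer and a reference is scored. It uses whitespace tokenisation with lowercasing and no stemming, so it will not reproduce the figures of standard ROUGE tooling, and the abstract does not state which variant (recall or F1) was reported. The example sentences are made up.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) for n-gram overlap."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(rouge_n(candidate, reference, n=1))  # 5 of 6 unigrams overlap
print(rouge_n(candidate, reference, n=2))  # 3 of 5 bigrams overlap
```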
  • Item
    AN OPTIMIZED DEEP-FOREST MODEL USING A MODIFIED DIFFERENTIAL EVOLUTION OPTIMIZATION ALGORITHM: A CASE OF HOST-PATHOGEN PROTEIN-PROTEIN INTERACTION PREDICTION
    (Covenant University Ota, 2025-04) EMMANUEL JERRY DAUDA; Covenant University Thesis
    Deep forest is an advanced ensemble learning technique that employs forest structures within a cascade framework, leveraging deep architectures to enhance predictive performance by adaptively capturing high-level feature representations. Despite its promise, deep forest models often face critical challenges, including manual hyperparameter optimization and inefficiencies in computational time and memory usage. To address these limitations, Bayesian optimization, a prominent model-based hyperparameter optimization method, is frequently utilized, with Differential Evolution (DE) serving as the acquisition function in recent implementations. However, DE's reliance on random index selection for constructing donor vectors introduces inefficiencies, as suboptimal or redundant indices may hinder the search for optimal solutions. This study introduces an optimized deep forest algorithm that integrates a modified DE acquisition function into Bayesian optimization to improve host-pathogen protein-protein interaction (HPPPI) prediction. The modified DE approach incorporates a weighted and adaptive donor vector selection mechanism, enhancing the exploration and exploitation of hyperparameter configurations. Performance evaluations using 10-fold cross-validation on human–Plasmodium falciparum (PF) protein sequence datasets sourced from reputable databases demonstrated the model's superiority over traditional Bayesian optimization, genetic algorithms, evolutionary strategies, and conventional machine learning models. The optimized framework achieved an accuracy of 89.3%, sensitivity of 85.4%, precision of 91.6%, and Area Under the Receiver Operating Characteristic Curve (AUROC) of 89.1%, surpassing existing methods. Additionally, the model exhibited reduced computational time and memory usage. 
    The optimized DF was deployed as a web-based pipeline, DFH3PI (Deep Forest Host-Pathogen Protein-Protein Interaction Prediction), which successfully identified three potential human–PF PPIs previously classified as non-interacting: P50250–P08319, Q8ILI6–O94813, and Q7KQL3–Q96GQ7. These findings not only present the potential of DFH3PI for advancing HPPPI prediction but also establish the optimized deep forest framework as a transformative tool in computational biology. Its ability to combine accuracy and efficiency marks a significant step forward in predictive modeling.
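The donor vector step that this abstract refers to can be sketched as follows. In the classic DE/rand/1 scheme, three distinct population members other than the target are picked uniformly at random and combined as x_r1 + F * (x_r2 - x_r3). The optional `weights` argument below is a hypothetical stand-in for a weighted selection; it is not the thesis's modified mechanism, whose details are not given in the abstract. All names are assumptions.

```python
import random


def weighted_sample_without_replacement(items, weights, k, rng):
    # Draw k distinct items, each draw proportional to the remaining weights.
    # Weights must be positive.
    items, weights = list(items), list(weights)
    chosen = []
    for _ in range(k):
        pick = rng.choices(range(len(items)), weights=weights, k=1)[0]
        chosen.append(items.pop(pick))
        weights.pop(pick)
    return chosen


def donor_vector(population, target_index, scale=0.5, weights=None, rng=random):
    # Candidate indices exclude the target, as in standard DE mutation.
    candidates = [i for i in range(len(population)) if i != target_index]
    if weights is None:
        # Classic DE/rand/1: uniform random choice of three distinct indices.
        r1, r2, r3 = rng.sample(candidates, 3)
    else:
        # Hypothetical weighted choice, e.g. favouring fitter members.
        r1, r2, r3 = weighted_sample_without_replacement(
            candidates, [weights[i] for i in candidates], 3, rng
        )
    x1, x2, x3 = population[r1], population[r2], population[r3]
    # Donor = base vector plus scaled difference of two other vectors.
    return [a + scale * (b - c) for a, b, c in zip(x1, x2, x3)]


population = [[0.0, 0.0], [1.0, 1.0], [2.0, 4.0], [3.0, 9.0]]
print(donor_vector(population, target_index=0, rng=random.Random(1)))
```

With uniform selection, poor or redundant members are as likely to be chosen as good ones, which is the inefficiency the abstract attributes to random index selection.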