Department of Computer and Information Sciences

Permanent URI for this community: http://itsupport.cu.edu.ng:4000/handle/123456789/28739

    OPTIMISATION OF QUERY ALGORITHM FOR SUFFIX-BASED DATA STRUCTURES USING HIERARCHICAL ORGANISATION AND DYNAMIC PRUNING
    (Covenant University Ota, 2025-08) OLUYINKA, Oluwatimilehin Oyebola; Covenant University Dissertation
    Suffix trees and suffix arrays are fundamental data structures for processing large volumes of text. However, the memory cost of suffix trees remains a bottleneck, and the logarithmic query time of suffix arrays is suboptimal. To address these problems, this study combines the strengths of both data structures by representing the suffixes of the reference string in a query-aware hierarchical structure of nodes and edges. The goal is to support substring searches that use less space than suffix trees and run faster than queries on suffix arrays. The hierarchy is constructed by scanning the suffix array, representing fixed-length prefixes of the suffixes as edge labels and storing the corresponding suffix indices in buckets. To minimise space cost, only a single level of the hierarchy is held in memory at a time. The novelty of this study lies in a tunable pruning mechanism, controlled by a pruning parameter at every level of the hierarchy, that eliminates suffixes irrelevant to the current search. Tuning the pruning parameter trades memory cost against time cost, so the structure can be adapted to different data types. The algorithm was evaluated on natural-language text and biological sequences. For natural languages, with alphabets of more than 20 letters, lower values of the pruning parameter give the best time and memory costs, and these costs fall further as query strings grow longer. For biological sequences, with alphabets of fewer than 20 letters, the pruning parameter can be adjusted to trade memory against time. Benchmarked against suffix trees, suffix arrays, and enhanced suffix arrays, the algorithm achieves at least an 8x reduction in space consumption across all data types and 1.97x faster query times on specific data types for pattern matching and string searching on large textual datasets.
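The abstract does not give pseudocode, so the following is only a minimal illustrative sketch of the general idea it describes, not the dissertation's algorithm. It scans a suffix array, groups candidate suffixes into buckets keyed by a fixed-length edge label, keeps only one level of buckets in memory at a time, and prunes every bucket whose label disagrees with the query before descending a level. The parameter `p` (edge-label length per level) is our assumption standing in for the pruning parameter; the dissertation's pruning parameter may be defined differently. The names `build_suffix_array`, `level_buckets`, and `search` are hypothetical, and the suffix array is built naively for brevity.

```python
from collections import defaultdict


def build_suffix_array(text: str) -> list[int]:
    """Naive suffix array (sorts full suffixes); fine for a small illustration."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def level_buckets(text: str, candidates: list[int], depth: int, p: int) -> dict:
    """Group candidate suffix indices by their edge label at one level.

    Hypothetical scheme: the label is the p-character slice starting
    depth * p characters into each suffix (shorter near the end of text).
    """
    buckets = defaultdict(list)
    offset = depth * p
    for i in candidates:  # candidates are scanned in suffix-array order
        buckets[text[i + offset : i + offset + p]].append(i)
    return buckets


def search(text: str, sa: list[int], query: str, p: int) -> list[int]:
    """Return the sorted start positions of `query` in `text`.

    Only the buckets of the current level exist at any time; buckets whose
    label does not start with the query's next chunk are pruned.
    """
    candidates = sa
    depth = 0
    while depth * p < len(query):
        chunk = query[depth * p : (depth + 1) * p]
        buckets = level_buckets(text, candidates, depth, p)
        # Keep only suffixes whose label agrees with this chunk of the query.
        candidates = [
            i for label, idxs in buckets.items() if label.startswith(chunk) for i in idxs
        ]
        if not candidates:
            return []  # every suffix was pruned: no occurrence
        depth += 1
    return sorted(candidates)
```

For example, `search("banana", build_suffix_array("banana"), "ana", 2)` returns `[1, 3]`. Under this assumed scheme, a larger `p` means fewer levels per query but potentially more distinct labels (and so more buckets) per level, which loosely mirrors the memory/time trade-off the abstract attributes to its pruning parameter.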