DEVELOPMENT OF A COMPUTATIONAL PIPELINE FOR THE IDENTIFICATION OF NON-CODING RNAs FROM NEXT GENERATION SEQUENCING DATA

NDIFON, NAOMI SIJE-OKIM, Covenant University, Theses Masters

Files

Pages from NDIFON Naomi_22PBF02395 MSc Dissertation.pdf (190.21 KB)

Description

Recent advances in genomics have revealed the critical roles that non-coding RNAs play in disease occurrence, progression, and population disparities in patient treatment outcomes. With the evolution of Next Generation Sequencing (NGS) techniques and the generation of genomic big data, researchers are increasingly able to explore the functions of these non-coding RNAs. However, efficient exploration requires user-friendly computational tools that streamline and centralize data analysis, particularly for identifying non-coding RNAs within large volumes of NGS data. Current computational pipelines for non-coding RNA identification are often limited to detecting a single class of non-coding RNA and do not integrate the latest standalone tools. Consequently, these pipelines are inefficient as workflows, because they prevent the comprehensive analysis of diverse non-coding RNA classes within a single framework. The aim of this study was to develop a computational pipeline for identifying multiple classes of non-coding RNAs, namely microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs), from NGS data. This aim was achieved by developing a script for each of the selected software tools and incorporating these scripts as individual processes within a unified Nextflow script. The software tools integrated into the pipeline are miRDeep2, mirnovo, and sRNAtoolbox for the identification of miRNAs; CIRI and KNIFE for the identification of circRNAs; and PLEK and LncDC for the identification of lncRNAs. Nextflow was used as the scientific workflow management system, and Docker was used to containerize all the integrated tools and their software dependencies for ease of use and reproducibility across different computing environments.
The pipeline was then evaluated using the test data provided with each of the individual software tools, and it successfully identified all the reported miRNAs, lncRNAs, and circRNAs, demonstrating its effectiveness on these datasets. Beyond reducing execution time, the pipeline streamlines the analysis of non-coding RNAs and eliminates the need for separate software installation and environment setup, thereby reducing the user's workload.
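
The structure described above (one containerized process per tool, grouped by non-coding RNA class) can be illustrated with a minimal Python sketch. This is not the thesis's Nextflow script: the tool-to-class mapping comes from the description, but the `Process` type, the `plan` and `docker_command` helpers, the `ncrna-pipeline/...` image names, and the `/data` mount point are all hypothetical, and tool-specific command-line arguments are deliberately left out.

```python
from dataclasses import dataclass

# Tool-to-class mapping as listed in the description above.
TOOLS_BY_CLASS = {
    "miRNA": ["miRDeep2", "mirnovo", "sRNAtoolbox"],
    "circRNA": ["CIRI", "KNIFE"],
    "lncRNA": ["PLEK", "LncDC"],
}


@dataclass(frozen=True)
class Process:
    """One pipeline step: a single tool run inside its own container."""
    rna_class: str
    tool: str
    image: str  # hypothetical image tag, not taken from the thesis


def image_for(tool):
    # Assumed naming scheme; the real image names are not given in the source.
    return f"ncrna-pipeline/{tool.lower()}:latest"


def plan(selected_classes):
    """Return one Process per tool for each requested non-coding RNA class."""
    processes = []
    for rna_class in selected_classes:
        if rna_class not in TOOLS_BY_CLASS:
            raise ValueError(f"unknown ncRNA class: {rna_class}")
        for tool in TOOLS_BY_CLASS[rna_class]:
            processes.append(Process(rna_class, tool, image_for(tool)))
    return processes


def docker_command(process, data_dir):
    """Build a `docker run` argument list that mounts data_dir at /data.

    Only generic Docker options are used; arguments for the individual
    tools are omitted because they are not described in the source.
    """
    return ["docker", "run", "--rm", "-v", f"{data_dir}:/data", process.image]


if __name__ == "__main__":
    # Print the planned steps for two of the three classes.
    for p in plan(["miRNA", "circRNA"]):
        print(p.rna_class, p.tool, " ".join(docker_command(p, "/path/to/reads")))
```

Keeping the class-to-tool mapping in one table mirrors the idea of a single entry point: a user selects the RNA classes of interest, and the individual tools and their environments are resolved without per-tool installation.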

Keywords

QA75 Electronic computers. Computer science

URI

https://repository.covenantuniversity.edu.ng/handle/123456789/49050

Collections

EPrint
