DEVELOPMENT OF A GENE FUSION DETECTION VALIDATION FRAMEWORK FOR LONG-READ RNA SEQUENCING USING ALIGNMENT EVIDENCE AND MACHINE LEARNING
| dc.contributor.author | AMOS, IISOMINEA ISOZO | |
| dc.contributor.author | Covenant University Dissertation | |
| dc.date.accessioned | 2025-10-06T18:25:14Z | |
| dc.date.issued | 2025-08 | |
| dc.description.abstract | Gene fusions are critical drivers of cancer and serve as diagnostic and therapeutic biomarkers. Detecting them reliably from long-read RNA sequencing (RNA-Seq) data remains challenging because of high error rates and complex transcript structures. Current methods often depend on matched whole-genome sequencing (WGS) data, which may be unavailable, or uninformative when fusions are expressed without clear genomic breakpoints. To address this, a long-read fusion validation pipeline was developed that relies on transcript-level evidence alone, removing the dependence on genomic data and focusing on functionally expressed fusions. The pipeline validates candidate transcripts by integrating alignment support from three sources: realigned soft-clipped reads, supplementary alignments, and full-length chimeric reads. A Random Forest model was then trained on features derived from the validated events to refine classification. Applied to five cancer cell line datasets, with emphasis on breast cancer, the pipeline achieved an overall validation rate of 68.1% and a rate of 77.8% in the MCF7 breast cancer cell line. It distinguished true fusions, deprioritized false positives reported in fusion databases, and highlighted high-confidence novel candidates. Known fusions such as BCAS4–BCAS3 were confirmed, while MOV10–RHOC emerged as a biologically relevant novel fusion, supported by multiple evidence types and recurrent in MCF7 and K562. Another candidate, CLUL1–TYMS, detected in four cell lines, likely represents transcriptional read-through. Benchmarking against experimentally validated fusion transcripts, rather than against DNA-based tools, established a transcript-focused alternative for fusion discovery. The resulting dataset will be made publicly available to support benchmarking and machine learning research. The framework enables high-confidence detection of transcript-level fusions in cancer and shows strong potential for biomarker discovery and precision oncology. | |
| dc.identifier.uri | https://repository.covenantuniversity.edu.ng/handle/123456789/50418 | |
| dc.language.iso | en | |
| dc.publisher | Covenant University Ota | |
| dc.subject | Gene Fusion | |
| dc.subject | Fusion Transcript | |
| dc.subject | Cancer Genomics | |
| dc.subject | Gene Fusion Validation | |
| dc.subject | Fusion Detection | |
| dc.subject | Long-Read Sequencing | |
| dc.subject | RNA-Seq | |
| dc.subject | Machine Learning | |
| dc.subject | Precision Oncology | |
| dc.title | DEVELOPMENT OF A GENE FUSION DETECTION VALIDATION FRAMEWORK FOR LONG-READ RNA SEQUENCING USING ALIGNMENT EVIDENCE AND MACHINE LEARNING | |
| dc.type | Thesis |
Files
Original bundle
- Name: Pages from Final-Amos_Iisominea_Thesis_Submission_Aug26_2025_Final.pdf
- Size: 243.23 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed to upon submission