Abstract: In recent years, a vast amount of sequencing data has been generated and large improvements have been made to reference genome sequences. Despite these advances, significant portions of reads still do not map to reference genomes, and these reads have been considered junk or artificial sequences. Recent studies have shown that these reads can be useful, e.g., for refining reference genomes or detecting contaminating microorganisms present in the analyzed biological samples. A special case of this is RNA sequencing (RNA-Seq) reads that come from tissue transcriptomes. Unmapped reads from RNA-Seq have received much less attention than those from whole-genome sequencing. In particular, in the horse, an analysis of unmapped RNA reads has not been performed yet. Thus, in this study, we analyzed the unmapped reads originating from the RNA-Seq performed through the Functional Annotation of Animal Genomes (FAANG) project in the horse, using eight different tissues from two mares. We demonstrated that unmapped reads from RNA-Seq could be easily assembled into transcripts relating to many important genes present in the sequences of other mammals. Large portions of these transcripts did not have coding potential and, thus, can be considered non-coding RNA. Moreover, reads that were not mapped to the reference genome but aligned to the entries in the NCBI database of horse proteins were enriched for biological processes that largely correspond to the functions of the organ from which the RNA was isolated and thus are presumably true transcripts of genes associated with cell metabolism in those tissues. In addition, a portion of reads aligned to common pathogenic or neutral microbiota, of which the most common was Brucella spp. These data suggest that unmapped reads can be an important target for in-depth analysis that may substantially enrich the results of initial RNA-Seq experiments for various tissues and organs.
This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.
This research explores the value of unmapped RNA sequencing (RNA-Seq) reads from various horse tissues. It shows that these reads can be assembled into transcripts relating to important genes found in other mammals, and that analyzing them may enrich the results of initial RNA-Seq experiments.
Background of the Study
Despite advances in generating sequencing data and improving reference genome sequences, a significant portion of reads still do not map to reference genomes and have often been dismissed as junk or artificial sequences.
Recent studies have shown that unmapped reads can be useful, for example for refining reference genomes or detecting contaminating microorganisms in the analyzed samples. Unmapped reads from RNA sequencing (RNA-Seq) have received much less attention than those from whole-genome sequencing.
This study focuses on unmapped RNA-Seq reads in the horse, for which no such analysis had previously been performed.
Methods and Findings
The research team analyzed unmapped reads from RNA-Seq data generated through the Functional Annotation of Animal Genomes (FAANG) project in the horse, covering eight different tissues from two mares.
These unmapped reads could be readily assembled into transcripts relating to many important genes present in the sequences of other mammals.
Large portions of these assembled transcripts lack coding potential and can therefore be considered non-coding RNA.
Reads that did not map to the reference genome but aligned to entries in the NCBI database of horse proteins were enriched for biological processes that largely correspond to the functions of the organ from which the RNA was isolated.
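As an illustration of the starting point of such an analysis, the sketch below shows one way unmapped reads can be picked out of alignment output. It assumes the aligner wrote SAM-formatted text, in which the FLAG field carries the bit 0x4 for an unmapped segment. The function name and the toy records are hypothetical, and the summary does not state which tools the authors actually used for this step.

```python
# Minimal sketch: select unmapped reads from SAM-formatted alignment text.
# The tab-separated layout and the 0x4 flag bit follow the SAM format;
# the function name and the toy records below are hypothetical.

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_read_names(sam_lines):
    """Return the names of reads whose FLAG field has the unmapped bit set."""
    names = []
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        read_name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED_FLAG:
            names.append(read_name)
    return names


# Toy input: one mapped read (flag 0) and two unmapped reads (flags 4 and 77).
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCA\tIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGCAT\tIIIII",
]
unmapped = unmapped_read_names(toy_sam)
```

In practice this selection is usually done on compressed BAM files with a dedicated tool rather than by parsing text, but the flag test is the same.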
Implications of the Findings
This suggests that these unmapped reads are presumably true transcripts of genes associated with cell metabolism in those tissues.
In addition, a portion of the reads aligned to common pathogenic or neutral microbiota, most commonly Brucella spp.
The authors conclude that in-depth analysis of unmapped reads may substantially enrich the results of initial RNA-Seq experiments on various tissues and organs.
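To make the idea of enrichment for biological processes more concrete, the following is a minimal sketch of a common over-representation test based on the hypergeometric distribution. This is an assumption for illustration only: the summary does not say which statistical test or tool the authors used, and the function name, parameter names and numbers below are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes annotated with the biological process
    sample:     number of genes in the list being tested
    hits:       genes in the list that carry the annotation
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 20 background genes, 5 of them annotated with a
# process, and a tested list of 5 genes of which 3 carry that annotation.
p = enrichment_p_value(population=20, annotated=5, sample=5, hits=3)
```

A small p-value indicates that the annotation appears in the tested list more often than expected by chance. When many processes are tested at once, the p-values are normally corrected for multiple testing.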
Cite This Article
Gurgul A, Szmatoła T, Ocłoń E, Jasielczuk I, Semik-Gurgul E, Finno CJ, Petersen JL, Bellone R, Hales EN, Ząbek T, Arent Z, Kotula-Balak M, Bugno-Poniewierska M. (2022). Another lesson from unmapped reads: in-depth analysis of RNA-Seq reads from various horse tissues. J Appl Genet, 63(3), 571-581. https://doi.org/10.1007/s13353-022-00705-z