Another lesson from unmapped reads: in-depth analysis of RNA-Seq reads from various horse tissues.
Abstract: In recent years, a vast amount of sequencing data has been generated and large improvements have been made to reference genome sequences. Despite these advances, significant portions of reads still do not map to reference genomes and these reads have been considered as junk or artificial sequences. Recent studies have shown that these reads can be useful, e.g., for refining reference genomes or detecting contaminating microorganisms present in the analyzed biological samples. A special case of this is RNA sequencing (RNA-Seq) reads that come from tissue transcriptomes. Unmapped reads from RNA-Seq have received much less attention than those from whole-genome sequencing. In particular, in the horse, an analysis of unmapped RNA reads has not been performed yet. Thus, in this study, we analyzed the unmapped reads originating from the RNA-Seq performed through the Functional Annotation of Animal Genomes (FAANG) project in the horse, using eight different tissues from two mares. We demonstrated that unmapped reads from RNA-Seq could be easily assembled into transcripts relating to many important genes present in the sequences of other mammals. Large portions of these transcripts did not have coding potential and, thus, can be considered as non-coding RNA. Moreover, reads that were not mapped to the reference genome but aligned to the entries in the NCBI database of horse proteins were enriched for biological processes that largely correspond to the functions of the organ from which RNA was isolated and thus are presumably true transcripts of genes associated with cell metabolism in those tissues. In addition, a portion of reads aligned to the common pathogenic or neutral microbiota, of which the most common was Brucella spp. These data suggest that unmapped reads can be an important target for in-depth analysis that may substantially enrich results of initial RNA-Seq experiments for various tissues and organs.
© 2022. The Author(s), under exclusive licence to Institute of Plant Genetics Polish Academy of Sciences.
Publication Date: 2022-06-07
PubMed ID: 35670911
PubMed Central: 5657046
DOI: 10.1007/s13353-022-00705-z
- Journal Article
Summary
This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.
This research explores the value of unmapped RNA sequencing (RNA-Seq) reads from various horse tissues. It suggests that these reads can be assembled into transcripts corresponding to important genes found in other mammals, and that analyzing them may enrich the results of initial RNA-Seq experiments.
Background of the Study
- Despite advances in generating sequencing data and improving reference genome sequences, a significant proportion of reads still do not map to these genomes and are often dismissed as junk or artificial sequences.
- Recent studies have shown that unmapped reads can be useful for refining reference genomes and detecting contaminating microorganisms in biological samples. Unmapped reads from RNA sequencing (RNA-Seq) have received much less attention than those from whole-genome sequencing.
- This study focuses on unmapped RNA reads from the horse, for which no such analysis had previously been performed.
Methods and Findings
- The research team analyzed unmapped reads from RNA-Seq data generated through the Functional Annotation of Animal Genomes (FAANG) project, covering eight different tissues from two mares.
- These unmapped reads could be readily assembled into transcripts related to many important genes present in the sequences of other mammals.
- Large portions of these assembled transcripts lack coding potential and can therefore be considered non-coding RNA.
- Reads that did not map to the reference genome but aligned to entries in the NCBI database of horse proteins were enriched for biological processes that largely correspond to the functions of the organ from which the RNA was isolated.
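As an illustration of the first step described above, the following is a minimal sketch of how unmapped reads might be separated from an alignment file. It is not the authors' pipeline, which the summary does not describe in detail. It assumes alignments are available as SAM-format text lines and relies only on the SAM convention that bit 0x4 of the FLAG field marks a read as unmapped. The function names and the example records are hypothetical.

```python
# Minimal sketch (not the study's actual workflow): select unmapped reads
# from SAM-format text using the FLAG field, where bit 0x4 means "unmapped".

UNMAPPED_BIT = 0x4  # SAM FLAG bit: segment unmapped


def is_unmapped(flag: int) -> bool:
    """Return True if the SAM FLAG marks the read as unmapped."""
    return bool(flag & UNMAPPED_BIT)


def unmapped_read_names(sam_lines):
    """Collect read names (QNAME) of unmapped records, skipping header lines."""
    names = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if is_unmapped(int(fields[1])):
            names.append(fields[0])
    return names


# Hypothetical example records, invented for illustration only.
example_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCATTGCA\tIIIIIIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGGCCCAAAT\tIIIIIIIIII",
    "read4\t16\tchr2\t500\t60\t10M\t*\t0\t0\tCCCCGGGGAA\tIIIIIIIIII",
]

if __name__ == "__main__":
    print(unmapped_read_names(example_sam))
```

In practice, dedicated tools are normally used for this filtering on compressed alignment files; the sketch only shows the underlying flag test. Reads selected this way would then be passed on to assembly and database alignment.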
Implications of the Findings
- The enrichment for organ-specific biological processes suggests that these unmapped reads are presumably true transcripts of genes associated with cell metabolism in the respective tissues.
- In addition, a portion of reads aligned to common pathogenic or neutral microbiota, of which the most common was Brucella spp.
- The research suggests that in-depth analysis of unmapped reads could substantially enrich the results obtained from initial RNA-Seq experiments on various tissues and organs.
Cite This Article
APA
Gurgul A, Szmatoła T, Ocłoń E, Jasielczuk I, Semik-Gurgul E, Finno CJ, Petersen JL, Bellone R, Hales EN, Ząbek T, Arent Z, Kotula-Balak M, Bugno-Poniewierska M.
(2022).
Another lesson from unmapped reads: in-depth analysis of RNA-Seq reads from various horse tissues.
J Appl Genet, 63(3), 571-581.
https://doi.org/10.1007/s13353-022-00705-z
Researcher Affiliations
- Center for Experimental and Innovative Medicine, University of Agriculture in Krakow, Rędzina 1c, 30-248, Kraków, Poland. artur.gurgul@urk.edu.pl.
- Center for Experimental and Innovative Medicine, University of Agriculture in Krakow, Rędzina 1c, 30-248, Kraków, Poland.
- Center for Experimental and Innovative Medicine, University of Agriculture in Krakow, Rędzina 1c, 30-248, Kraków, Poland.
- Center for Experimental and Innovative Medicine, University of Agriculture in Krakow, Rędzina 1c, 30-248, Kraków, Poland.
- Department of Animal Molecular Biology, National Research Institute of Animal Production, Krakowska 1, 32-083, Balice, Poland.
- Department of Population Health and Reproduction, University of California Davis School of Veterinary Medicine, Davis, CA, USA.
- Department of Animal Science, University of Nebraska Lincoln, Lincoln, NB, USA.
- Department of Population Health and Reproduction, University of California Davis School of Veterinary Medicine, Davis, CA, USA.
- Veterinary Genetics Laboratory, University of California Davis School of Veterinary Medicine, Davis, CA, USA.
- Department of Population Health and Reproduction, University of California Davis School of Veterinary Medicine, Davis, CA, USA.
- Department of Animal Molecular Biology, National Research Institute of Animal Production, Krakowska 1, 32-083, Balice, Poland.
- Center for Experimental and Innovative Medicine, University of Agriculture in Krakow, Rędzina 1c, 30-248, Kraków, Poland.
- University Centre of Veterinary Medicine, University of Agriculture in Krakow, Mickiewicza 24/28, 30-059, Krakow, Poland.
- Department of Animal Reproduction, Anatomy and Genomics, University of Agriculture in Kraków, al. Mickiewicza 24/28, 30-059, Kraków, Poland.
MeSH Terms
- Animals
- Base Sequence
- Female
- Genome / genetics
- High-Throughput Nucleotide Sequencing
- Horses / genetics
- Mammals / genetics
- RNA / genetics
- RNA-Seq
- Sequence Analysis, RNA
- Transcriptome / genetics
Grant Funding
- L40 TR001136 / NCATS NIH HHS
Citations
This article has been cited 1 time.
- Cappelletti E, Piras FM, Sola L, Santagostino M, Petersen JL, Bellone RR, Finno CJ, Peng S, Kalbfleisch TS, Bailey E, Nergadze SG, Giulotto E. The localization of centromere protein A is conserved among tissues. Commun Biol 2023 Sep 21;6(1):963.