Comparison of the Equine Reference Sequence with Its Sanger Source Data and New Illumina Reads.
Abstract: The reference assembly for the domestic horse, EquCab2, published in 2009, was built using approximately 30 million Sanger reads from a Thoroughbred mare named Twilight. Contiguity in the assembly was facilitated using nearly 315 thousand BAC end sequences from Twilight's half brother Bravo. Since then, it has served as the foundation for many genome-wide analyses that include not only the modern horse, but ancient horses and other equid species as well. As data mapped to this reference has accumulated, consistent variation between mapped datasets and the reference, in terms of regions with no read coverage, single nucleotide variants, and small insertions/deletions, has become apparent. In many cases, it is not clear whether these differences are the result of true sequence variation between the research subjects' and Twilight's genome or due to errors in the reference. EquCab2 is regarded as "The Twilight Assembly." The objective of this study was to identify inconsistencies between the EquCab2 assembly and the source Twilight Sanger data used to build it. To that end, the original Sanger and BAC end reads have been mapped back to this equine reference and assessed with the addition of approximately 40X coverage of new Illumina Paired-End sequence data. The resulting mapped datasets identify those regions with low Sanger read coverage, as well as variation in genomic content that is not consistent with either the original Twilight Sanger data or the new genomic sequence data generated from Twilight on the Illumina platform. As the haploid EquCab2 reference assembly was created using Sanger reads derived largely from a single individual, the vast majority of variation detected in a mapped dataset composed of those same Sanger reads should be heterozygous. In contrast, homozygous variations would represent either errors in the reference or contributions from Bravo's BAC end sequences.
Our analysis identifies 720,843 homozygous discrepancies between new, high throughput genomic sequence data generated for Twilight and the EquCab2 reference assembly. Most of these represent errors in the assembly, while approximately 10,000 are demonstrated to be contributions from another horse. Other results are presented that include the binary alignment map file of the mapped Sanger reads, a list of variants identified as discrepancies between the source data and resulting reference, and a BED annotation file that lists the regions of the genome whose consensus was likely derived from low coverage alignments.
Publication Date: 2015-06-24
PubMed ID: 26107638
PubMed Central: PMC4479572
DOI: 10.1371/journal.pone.0126852
- Journal Article
- Research Support, N.I.H., Extramural
- Research Support, Non-U.S. Gov't
Summary
This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.
This research article identifies and assesses inconsistencies between the EquCab2 assembly, the reference genome for the domestic horse, and the sequence data used to build it. The aim is to determine whether differences observed between mapped datasets and the reference reflect true sequence variation or errors in the reference.
Objectives of the Study
- The main goal of this research was to identify differences between the EquCab2 assembly, which is the reference genome for the domestic horse, and the Twilight Sanger data that contributed to its construction. The study specifically sought to evaluate whether discrepancies were due to real sequence variations or were artefacts.
- Another objective was to identify those areas that had low Sanger read coverage and to evaluate genomic content variations inconsistent with either the original Twilight Sanger data or the new genomic sequence data.
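- Lastly, it aimed to distinguish heterozygous from homozygous variation in the mapped data. Because the reference was built largely from one individual, heterozygous variation is expected, whereas homozygous variation would indicate either errors in the reference or contributions from Bravo's BAC end sequences.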
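The heterozygous-versus-homozygous reasoning above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `VariantCall` record, the `classify` function, and the cut-offs `min_depth`, `het_fraction`, and `hom_fraction` are all hypothetical names and values chosen for illustration. The idea is only that, when the source animal's own reads are mapped back to the reference, a site where nearly all reads disagree with the reference is a candidate discrepancy, while a roughly even split is ordinary heterozygosity.

```python
from typing import NamedTuple


class VariantCall(NamedTuple):
    """Hypothetical per-site read counts from the source animal's mapped reads."""
    chrom: str
    pos: int        # 1-based position on the reference
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the alternate allele


def classify(call: VariantCall,
             min_depth: int = 4,
             het_fraction: float = 0.2,
             hom_fraction: float = 0.9) -> str:
    """Label a site by how the reads split between the two alleles.

    The three thresholds are illustrative assumptions, not values from the study.
    """
    depth = call.ref_reads + call.alt_reads
    if depth < min_depth:
        return "low_coverage"
    alt_fraction = call.alt_reads / depth
    if alt_fraction >= hom_fraction:
        # Nearly every read disagrees with the reference: a possible
        # reference error or a contribution from a different animal.
        return "homozygous_discrepancy"
    if alt_fraction >= het_fraction:
        # Expected outcome when one diploid individual is collapsed
        # into a haploid reference.
        return "heterozygous"
    return "matches_reference"


calls = [
    VariantCall("chr1", 100, 5, 6),
    VariantCall("chr1", 200, 0, 12),
    VariantCall("chr1", 300, 1, 1),
]
labels = [classify(c) for c in calls]
```

In practice, genotypes would come from a variant caller rather than raw fraction cut-offs; the sketch only shows why homozygous calls are the ones worth flagging.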
Approach and Methodology
- The researchers mapped the original Sanger and BAC end reads back to the equine reference and assessed them alongside approximately 40X coverage of new Illumina paired-end sequence data from Twilight.
- From the mapped datasets, they identified regions with low Sanger read coverage, as well as variation in genomic content that was not consistent with either the original Twilight Sanger data or the new Illumina data.
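As a rough illustration of how low-coverage regions can be turned into a BED annotation, the Python sketch below merges consecutive low-depth positions into intervals. It assumes per-base depths are already available as a sequence of integers; the function names `low_coverage_regions` and `to_bed` and the `max_depth` cut-off are hypothetical and not taken from the study.

```python
from typing import Iterable, List, Optional, Tuple

# (chrom, start, end) using BED conventions: 0-based start, exclusive end.
BedInterval = Tuple[str, int, int]


def low_coverage_regions(chrom: str,
                         depths: Iterable[int],
                         max_depth: int = 2) -> List[BedInterval]:
    """Merge runs of positions with depth <= max_depth into BED intervals.

    max_depth is an illustrative threshold, not a value from the study.
    """
    regions: List[BedInterval] = []
    start: Optional[int] = None
    position = -1
    for position, depth in enumerate(depths):
        if depth <= max_depth:
            if start is None:
                start = position  # open a new low-coverage run
        elif start is not None:
            regions.append((chrom, start, position))  # close the run
            start = None
    if start is not None:
        # The run extends to the end of the sequence.
        regions.append((chrom, start, position + 1))
    return regions


def to_bed(regions: List[BedInterval]) -> str:
    """Format intervals as tab-separated BED lines."""
    return "\n".join(f"{c}\t{s}\t{e}" for c, s, e in regions)


example = low_coverage_regions("chr1", [5, 1, 0, 2, 7, 8, 0, 0])
bed_text = to_bed(example)
```

Per-base depths for a real BAM file would normally be produced by a dedicated tool; the sketch covers only the interval-merging step.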
Key Findings
- The study identified 720,843 homozygous discrepancies between the new high-throughput genomic sequence data generated for Twilight and the EquCab2 reference assembly.
- Most of these discrepancies represent errors in the assembly, while approximately 10,000 are shown to be contributions from another horse.
- The study also provided a binary alignment map file of the mapped Sanger reads, a list of variants identified as discrepancies between the source data and the resulting reference, and a BED annotation file that lists the regions of the genome whose consensus was likely derived from low coverage alignments.
Implications of the Research
- The study helps explain the discrepancies seen between mapped datasets and the EquCab2 reference by separating likely reference errors from true sequence variation.
- These findings can guide future genetic studies of horses and potentially other equid species. Knowing where the reference is erroneous or derived from low-coverage alignments allows variants called against it to be interpreted with more confidence.
Cite This Article
APA
Rebolledo-Mendez J, Hestand MS, Coleman SJ, Zeng Z, Orlando L, MacLeod JN, Kalbfleisch T. (2015). Comparison of the Equine Reference Sequence with Its Sanger Source Data and New Illumina Reads. PLoS One, 10(6), e0126852. https://doi.org/10.1371/journal.pone.0126852
Researcher Affiliations
- Department of Biochemistry and Molecular Biology, School of Medicine, University of Louisville, Louisville, Kentucky, United States of America.
- Maxwell H. Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Lexington, Kentucky, United States of America.
- Maxwell H. Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Lexington, Kentucky, United States of America.
- Department of Computer Science, University of Kentucky, Lexington, Kentucky, United States of America.
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark.
- Maxwell H. Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Lexington, Kentucky, United States of America.
- Department of Biochemistry and Molecular Biology, School of Medicine, University of Louisville, Louisville, Kentucky, United States of America; Intrepid Bioinformatics, Louisville, Kentucky, United States of America.
MeSH Terms
- Animals
- Genome
- High-Throughput Nucleotide Sequencing
- Horses / genetics
- Sequence Analysis, DNA
Grant Funding
- P20 GM103436 / NIGMS NIH HHS
- 5P20GM103436-13 / NIGMS NIH HHS
Conflict of Interest Statement
Ted Kalbfleisch is the CEO of Intrepid Bioinformatics. There are no patents, products in development or marketed products to declare. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.
Citations
This article has been cited 4 times.
- Durward-Akhurst SA, Schaefer RJ, Grantham B, Carey WK, Mickelson JR, McCue ME. Genetic Variation and the Distribution of Variant Types in the Horse. Front Genet 2021;12:758366.
- Raudsepp T, Finno CJ, Bellone RR, Petersen JL. Ten years of the horse reference genome: insights into equine biology, domestication and population dynamics in the post-genome era. Anim Genet 2019 Dec;50(6):569-597.
- Kalbfleisch TS, Rice ES, DePriest MS Jr, Walenz BP, Hestand MS, Vermeesch JR, O Connell BL, Fiddes IT, Vershinina AO, Saremi NF, Petersen JL, Finno CJ, Bellone RR, McCue ME, Brooks SA, Bailey E, Orlando L, Green RE, Miller DC, Antczak DF, MacLeod JN. Improved reference genome for the domestic horse increases assembly contiguity and composition. Commun Biol 2018;1:197.
- Hestand MS, Kalbfleisch TS, Coleman SJ, Zeng Z, Liu J, Orlando L, MacLeod JN. Annotation of the Protein Coding Regions of the Equine Genome. PLoS One 2015;10(6):e0124375.