PloS one 2015; 10(6); e0126852; doi: 10.1371/journal.pone.0126852

Comparison of the Equine Reference Sequence with Its Sanger Source Data and New Illumina Reads.

Rebolledo-Mendez, J · Hestand, MS · Coleman, SJ · Zeng, Z · Orlando, L · MacLeod, JN · Kalbfleisch, T

Abstract: The reference assembly for the domestic horse, EquCab2, published in 2009, was built using approximately 30 million Sanger reads from a Thoroughbred mare named Twilight. Contiguity in the assembly was facilitated using nearly 315 thousand BAC end sequences from Twilight's half brother Bravo. Since then, it has served as the foundation for many genome-wide analyses that include not only the modern horse, but ancient horses and other equid species as well. As data mapped to this reference has accumulated, consistent variation between mapped datasets and the reference, in terms of regions with no read coverage, single nucleotide variants, and small insertions/deletions, has become apparent. In many cases, it is not clear whether these differences are the result of true sequence variation between the research subjects' and Twilight's genome or due to errors in the reference. EquCab2 is regarded as "The Twilight Assembly." The objective of this study was to identify inconsistencies between the EquCab2 assembly and the source Twilight Sanger data used to build it. To that end, the original Sanger and BAC end reads have been mapped back to this equine reference and assessed with the addition of approximately 40X coverage of new Illumina Paired-End sequence data. The resulting mapped datasets identify those regions with low Sanger read coverage, as well as variation in genomic content that is not consistent with either the original Twilight Sanger data or the new genomic sequence data generated from Twilight on the Illumina platform. As the haploid EquCab2 reference assembly was created using Sanger reads derived largely from a single individual, the vast majority of variation detected in a mapped dataset composed of those same Sanger reads should be heterozygous. In contrast, homozygous variations would represent either errors in the reference or contributions from Bravo's BAC end sequences.
Our analysis identifies 720,843 homozygous discrepancies between new, high throughput genomic sequence data generated for Twilight and the EquCab2 reference assembly. Most of these represent errors in the assembly, while approximately 10,000 are demonstrated to be contributions from another horse. Other results are presented that include the binary alignment map file of the mapped Sanger reads, a list of variants identified as discrepancies between the source data and resulting reference, and a BED annotation file that lists the regions of the genome whose consensus was likely derived from low coverage alignments.


Publication Date: 2015-06-24; PubMed ID: 26107638; PubMed Central: PMC4479572; DOI: 10.1371/journal.pone.0126852




Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Summary

This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.

This research article identifies and assesses inconsistencies in EquCab2, the reference genome assembly for the domestic horse. The aim is to determine whether differences observed between mapped datasets and the reference reflect genuine sequence variation or errors in the reference.

Objectives of the Study

The main goal of this research was to identify differences between the EquCab2 assembly, which is the reference genome for the domestic horse, and the Twilight Sanger data that contributed to its construction. The study specifically sought to evaluate whether discrepancies were due to real sequence variations or were artefacts.
Another objective was to identify those areas that had low Sanger read coverage and to evaluate genomic content variations inconsistent with either the original Twilight Sanger data or the new genomic sequence data.
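Lastly, it aimed to separate heterozygous from homozygous variants in the mapped data. Because the reference was built largely from one individual, most variation in Twilight's own reads should be heterozygous, so homozygous variants point to either errors in the reference or contributions from Bravo's BAC end sequences.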
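The heterozygous versus homozygous distinction above can be illustrated with a minimal sketch that labels a site from its read counts. This is not the variant-calling method used in the study; the `SiteCounts` type, the `classify_site` function, and all thresholds are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    ref_reads: int  # reads supporting the reference base at this site
    alt_reads: int  # reads supporting the alternate base at this site


def classify_site(counts, min_depth=8, het_low=0.2, hom_high=0.9):
    """Label a site from its alternate-allele fraction.

    All thresholds are illustrative assumptions, not values from the study.
    """
    depth = counts.ref_reads + counts.alt_reads
    if depth < min_depth:
        return "low_coverage"  # too few reads to make a call
    alt_fraction = counts.alt_reads / depth
    if alt_fraction >= hom_high:
        # Nearly all of the individual's reads disagree with the reference:
        # a candidate reference error or a contribution from another animal.
        return "homozygous_alt"
    if alt_fraction >= het_low:
        return "heterozygous"  # expected for true variation within one animal
    return "reference"


print(classify_site(SiteCounts(ref_reads=0, alt_reads=30)))
print(classify_site(SiteCounts(ref_reads=14, alt_reads=16)))
```

Under these assumed thresholds, a site where every read from the source animal carries the alternate base is flagged as homozygous, which is the class of discrepancy the study counts.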

Approach and Methodology

The researchers mapped the original Sanger and BAC end reads back to the EquCab2 reference and added approximately 40X coverage of new Illumina paired-end sequence data generated from Twilight.
From the mapped datasets, they identified regions with low Sanger read coverage, along with reference content that is inconsistent with both the original Twilight Sanger data and the new Illumina data.
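As a rough illustration of how low-coverage regions can be reported, the sketch below collapses per-base read depth into intervals and formats them as BED lines (0-based, half-open coordinates). The function names, the toy depth values, and the depth threshold are assumptions for this example; the study's actual threshold and tooling are not stated in this summary.

```python
def low_coverage_intervals(depths, min_depth):
    """Return half-open (start, end) runs where depth is below min_depth."""
    intervals = []
    run_start = None
    for pos, depth in enumerate(depths):
        if depth < min_depth:
            if run_start is None:
                run_start = pos  # a low-coverage run begins here
        elif run_start is not None:
            intervals.append((run_start, pos))  # run ended at the previous base
            run_start = None
    if run_start is not None:
        intervals.append((run_start, len(depths)))  # run extends to sequence end
    return intervals


def to_bed_lines(chrom, intervals):
    """Format intervals as tab-separated BED lines: chrom, start, end."""
    return [f"{chrom}\t{start}\t{end}" for start, end in intervals]


# Toy per-base depths for a hypothetical 6-base sequence.
depths = [5, 5, 0, 1, 6, 0]
for line in to_bed_lines("chr1", low_coverage_intervals(depths, min_depth=2)):
    print(line)
```

In practice the per-base depths would come from the mapped alignment file rather than a hand-written list.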

Key Findings

The study identified 720,843 homozygous discrepancies between the new high-throughput genomic sequence data generated for Twilight and the EquCab2 reference assembly.
Most of these discrepancies represent errors in the assembly, while approximately 10,000 were shown to be contributions from another horse.
The study also provided a binary alignment map file of the mapped Sanger reads, a list of variants identified as discrepancies between the source data and the resulting reference, and a BED annotation file that lists the regions of the genome whose consensus was likely derived from low coverage alignments.
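To put the two reported counts in proportion, here is a small worked calculation. The figure of 10,000 is the rounded approximation given above, so the results are approximate; the remainder is only an upper bound on assembly errors, since the study says "most", not all, of the discrepancies are errors.

```python
total_homozygous = 720_843  # homozygous discrepancies reported in the study
from_other_horse = 10_000   # "approximately 10,000"; rounded in the source

# Upper bound on discrepancies that could be assembly errors.
remaining = total_homozygous - from_other_horse

# Share of discrepancies attributed to another horse.
share_other = from_other_horse / total_homozygous

print(remaining)
print(f"{share_other:.1%}")
```

This works out to roughly 1.4% of the homozygous discrepancies being attributed to another horse.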

Implications of the Research

By distinguishing reference errors from true sequence variation, the study helps explain why datasets mapped to EquCab2 show consistent discrepancies with the reference.
These findings can guide future genetic studies of horses and potentially other equid species. Knowing where the reference is erroneous or supported only by low-coverage alignments allows variants called in those regions to be interpreted with appropriate caution, and supports more confident analysis of the genetic structure and diversity of horses.

Cite This Article

APA

Rebolledo-Mendez J, Hestand MS, Coleman SJ, Zeng Z, Orlando L, MacLeod JN, Kalbfleisch T. (2015). Comparison of the Equine Reference Sequence with Its Sanger Source Data and New Illumina Reads. PLoS One, 10(6), e0126852. https://doi.org/10.1371/journal.pone.0126852

Publication

PloS one

ISSN: 1932-6203

NlmUniqueID: 101285081

Country: United States

Language: English

Volume: 10

Issue: 6

Pages: e0126852

PII: e0126852

Researcher Affiliations

Rebolledo-Mendez, Jovan

Department of Biochemistry and Molecular Biology, School of Medicine, University of Louisville, Louisville, Kentucky, United States of America.

Hestand, Matthew S

Maxwell H. Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Lexington, Kentucky, United States of America.

Coleman, Stephen J

Maxwell H. Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Lexington, Kentucky, United States of America.

Zeng, Zheng

Department of Computer Science, University of Kentucky, Lexington, Kentucky, United States of America.

Orlando, Ludovic

Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark.

MacLeod, James N

Maxwell H. Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Lexington, Kentucky, United States of America.

Kalbfleisch, Ted

Department of Biochemistry and Molecular Biology, School of Medicine, University of Louisville, Louisville, Kentucky, United States of America; Intrepid Bioinformatics, Louisville, Kentucky, United States of America.

MeSH Terms

Animals
Genome
High-Throughput Nucleotide Sequencing
Horses / genetics
Sequence Analysis, DNA

Grant Funding

P20 GM103436 / NIGMS NIH HHS
5P20GM103436-13 / NIGMS NIH HHS

Conflict of Interest Statement

Ted Kalbfleisch is the CEO of Intrepid Bioinformatics. There are no patents, products in development or marketed products to declare. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.


Citations

This article has been cited 4 times.

Durward-Akhurst SA, Schaefer RJ, Grantham B, Carey WK, Mickelson JR, McCue ME. Genetic Variation and the Distribution of Variant Types in the Horse. Front Genet 2021;12:758366.
doi: 10.3389/fgene.2021.758366; pubmed: 34925451
Raudsepp T, Finno CJ, Bellone RR, Petersen JL. Ten years of the horse reference genome: insights into equine biology, domestication and population dynamics in the post-genome era. Anim Genet 2019 Dec;50(6):569-597.
doi: 10.1111/age.12857; pubmed: 31568563
Kalbfleisch TS, Rice ES, DePriest MS Jr, Walenz BP, Hestand MS, Vermeesch JR, O'Connell BL, Fiddes IT, Vershinina AO, Saremi NF, Petersen JL, Finno CJ, Bellone RR, McCue ME, Brooks SA, Bailey E, Orlando L, Green RE, Miller DC, Antczak DF, MacLeod JN. Improved reference genome for the domestic horse increases assembly contiguity and composition. Commun Biol 2018;1:197.
doi: 10.1038/s42003-018-0199-z; pubmed: 30456315
Hestand MS, Kalbfleisch TS, Coleman SJ, Zeng Z, Liu J, Orlando L, MacLeod JN. Annotation of the Protein Coding Regions of the Equine Genome. PLoS One 2015;10(6):e0124375.
doi: 10.1371/journal.pone.0124375; pubmed: 26107351
