Abstract: Single nucleotide polymorphism (SNP) arrays are commonly used for studying the genomic structure and diversity of livestock breeds, but whole-genome sequencing (WGS) provides higher-resolution genomic data. Genotype imputation has become a standard practice for increasing the genomic resolution of association studies. This work aimed to extend imputation to biodiversity analyses, comparing SNP array data before and after imputation. A 40 k SNP dataset of 281 horses from 12 breeds (the SNP array dataset) was imputed to sequence level using a reference panel of 327 sequenced individuals, generating approximately 9 million markers after filtering (the imputed dataset). Both datasets were used to study genetic variability, population structure and runs of homozygosity (ROH). Results: Genetic indices and relationships showed similar trends for both datasets, with high Pearson correlations and Mantel test values (> 0.8) indicating that the imputed data are a reliable alternative to SNP array data for genetic studies. Multidimensional scaling and admixture analyses highlighted how the genetic proximity between breeds observed for the SNP array dataset was amplified by the imputation process for breeds with a few sequences included in the WGS reference panel. ROH investigation showed overlapping homozygosity regions between the two datasets, highlighting the benefits of having more markers for gene and QTL annotation. Of the 141 ROH islands identified in the SNP array dataset, 79 overlapped perfectly with those found in the imputed data. Validation with the reference panel of 327 sequenced horses revealed a single ROH island on ECA11 shared across all three datasets, containing genes associated with morphology and behavioral traits. Conclusions: High correlations between SNP array and imputed data indicate that imputed genotypes provide a reliable alternative for assessing population structure and genetic diversity in horse breeds. Specifically, imputation can enhance the detection of ROH and the annotation of genes within ROH islands, with the reliability of these results depending on the quality of the reference panel and its representation of the studied breeds, among other factors.
This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.
Overview
This study compares SNP array data with the same genotypes imputed to whole-genome sequence (WGS) level, using both to analyze population structure and runs of homozygosity (ROH) hotspots in horse breeds.
It evaluates whether imputed data, generated by enhancing lower-resolution SNP array data using a WGS reference panel, can reliably replace SNP array data for genetic diversity and structure studies.
Introduction to Genetic Tools Used
SNP arrays: Tools that genotype a fixed number of known single nucleotide polymorphisms (SNPs), typically tens of thousands, useful for genomic diversity and structure analysis but with limited resolution.
Whole-genome sequencing (WGS): Provides comprehensive genomic data at single-nucleotide resolution, offering greater detail but at higher cost and complexity.
Genotype imputation: A statistical method that predicts unobserved genotypes by leveraging a reference panel of sequenced genomes to increase marker density from SNP arrays to sequence-level, aimed at improving genomic resolution without the full cost of WGS.
Research Objectives
To extend genotype imputation to biodiversity analysis in horse populations by imputing SNP array data to sequence-level marker density.
To compare key genetic analyses – including measures of genetic variability, population structure, and ROH detection – between the original SNP array data and the imputed high-density data.
Methodology
Samples: 281 horses from 12 breeds, genotyped with a SNP array to give a dataset of about 40,000 SNPs (40k).
Imputation: A reference panel of 327 sequenced horses was used to impute the 40k SNP data to sequence level, yielding approximately 9 million markers after filtering.
Analyses performed on both datasets:
Genetic variability indices
Population structure assessments via multidimensional scaling and admixture analyses
Runs of homozygosity (ROH) identification and characterization
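To make the ROH step concrete, here is a minimal Python sketch of a consecutive-runs scan over one chromosome of one animal. It assumes genotypes coded 0/1/2 (1 = heterozygous) with no missing calls, and marker positions in base pairs. The function name find_roh and the thresholds min_snps and min_length_bp are hypothetical illustrations, not the software or settings used in the study.

```python
def find_roh(genotypes, positions, min_snps=5, min_length_bp=1000):
    """Return (start_bp, end_bp, n_snps) for each run of homozygous calls.

    genotypes: 0/1/2 codes for one animal on one chromosome (1 = heterozygous).
    positions: base-pair position of each marker, in the same order.
    The thresholds are illustrative defaults only.
    """
    runs = []
    start = None  # index of the first SNP of the run currently being extended
    # A trailing heterozygous sentinel closes a run that reaches the last marker.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1:
            if start is None:
                start = i
            continue
        if start is not None:
            n_snps = i - start
            length_bp = positions[i - 1] - positions[start]
            if n_snps >= min_snps and length_bp >= min_length_bp:
                runs.append((positions[start], positions[i - 1], n_snps))
            start = None
    return runs


# Toy data (not from the study): 14 markers spaced 500 bp apart.
toy_genotypes = [0, 2, 2, 0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 2]
toy_positions = [i * 500 for i in range(14)]
print(find_roh(toy_genotypes, toy_positions))
# → [(0, 2000, 5), (4000, 6500, 6)]
```

A denser marker set changes which runs pass thresholds of this kind, which is one reason ROH calls from array and sequence-level data can differ.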
Key Findings
Genetic variability and relationships:
Both datasets showed similar trends in genetic indices and pairwise genetic relationships among breeds.
Concordance between the datasets was high, with Pearson correlation and Mantel test values above 0.8, indicating that the imputed data are a reliable alternative for genetic studies.
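For readers unfamiliar with the Mantel test, the following Python sketch shows the idea: correlate the off-diagonal entries of two distance matrices (for example, between-breed distances from the SNP array data and from the imputed data), then judge significance by permuting the rows and columns of one matrix. The function names and the one-sided permutation scheme are assumptions made for illustration; the summary does not state which software or settings the authors used.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle entries after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(d1, d2, n_perm=999, seed=0):
    """Return (r, p) for two symmetric distance matrices of the same size.

    p is a one-sided permutation p-value: the share of permutations whose
    correlation is at least as large as the observed one.
    """
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # relabel the breeds in the second matrix
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy distances between five hypothetical breeds (not study data).
coords = [0, 1, 3, 7, 15]
d_array = [[abs(a - b) for b in coords] for a in coords]
d_imputed = [[2 * v for v in row] for row in d_array]  # same pattern, rescaled
r, p = mantel_test(d_array, d_imputed)
print(round(r, 3), p)
```

Permuting whole rows and columns together, rather than individual cells, is what distinguishes the Mantel test from an ordinary correlation test: the entries of a distance matrix are not independent of one another.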
Population structure:
Multidimensional scaling and admixture analyses were used to describe the genetic proximity between breeds.
Imputation amplified the proximity observed in the SNP array data for breeds with only a few sequences in the WGS reference panel, indicating that reference panel representation affects the results.
Runs of homozygosity (ROH):
The homozygosity regions identified in the two datasets overlapped.
The imputed dataset, with more markers, allowed improved annotation of genes and quantitative trait loci (QTL) within ROH islands.
Of the 141 ROH islands found using the SNP array data, 79 overlapped perfectly with ROH islands identified in the imputed data.
Validation against the 327 sequenced horses of the reference panel revealed a single ROH island on equine chromosome 11 (ECA11) shared across all three datasets; it contains genes associated with morphology and behavioral traits.
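The comparison of ROH islands between datasets comes down to interval overlap, which can be sketched in Python as follows. Islands are represented as (chromosome, start, end) tuples. The summary does not define "perfect" overlap, so the sketch offers two hypothetical rules: any shared base, and full containment of one island within another. The coordinates are invented for illustration and are not study results.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def contained(a, b):
    """True if interval a lies entirely within interval b."""
    return a[0] == b[0] and b[1] <= a[1] and a[2] <= b[2]


def count_shared(islands_a, islands_b, rule=overlaps):
    """Count islands in islands_a that match at least one island in islands_b."""
    return sum(any(rule(a, b) for b in islands_b) for a in islands_a)


# Invented island coordinates, for illustration only.
array_islands = [("11", 100, 200), ("11", 500, 900), ("3", 50, 80)]
imputed_islands = [("11", 90, 210), ("11", 850, 1000), ("4", 50, 80)]

print(count_shared(array_islands, imputed_islands))             # → 2
print(count_shared(array_islands, imputed_islands, contained))  # → 1
```

The count depends on the matching rule, so a figure such as "79 of 141" is only interpretable together with the overlap definition given in the original study.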
Conclusions and Implications
Imputed genotypes strongly correlate with SNP array data and thus represent a reliable alternative for assessing population structure and genetic diversity in horses.
Imputation can enhance the detection of ROH and the annotation of genes within ROH islands, giving better insight into genomic regions linked to important traits.
The success of imputation-based analyses depends on factors such as the quality and breed representation within the WGS reference panel.
This approach can optimize cost-effectiveness by leveraging existing SNP array data for high-resolution genetic analyses without requiring full WGS for all samples.
Cite This Article
APA
Chessari G, Reich P, Criscione A, Falker-Gieske C, Mastrangelo S, Tumino S, Bordonaro S, Marletta D, Tetens J.
(2025).
Comparison between SNP array and imputed data to estimate population structure and ROH hotspots in horse breeds.
BMC Genomics, 26(1), 1086.
https://doi.org/10.1186/s12864-025-12256-8
Chessari, Giorgio
Department of Agriculture, Food and Environment, University of Catania, Catania, 95131, Italy. giorgio.chessari@unict.it.
Department of Animal Sciences, Georg-August-University Göttingen, Göttingen, 37077, Germany. giorgio.chessari@unict.it.
Reich, Paula
Department of Animal Sciences, Georg-August-University Göttingen, Göttingen, 37077, Germany.
Center for Integrated Breeding Research (CiBreed), Georg-August-University Göttingen, Göttingen, 37075, Germany.
Criscione, Andrea
Department of Agriculture, Food and Environment, University of Catania, Catania, 95131, Italy.
Falker-Gieske, Clemens
Department of Animal Sciences, Georg-August-University Göttingen, Göttingen, 37077, Germany.
Center for Integrated Breeding Research (CiBreed), Georg-August-University Göttingen, Göttingen, 37075, Germany.
Mastrangelo, Salvatore
Department of Agricultural, Food and Forestry Sciences, University of Palermo, Palermo, 90128, Italy.
Tumino, Serena
Department of Agriculture, Food and Environment, University of Catania, Catania, 95131, Italy.
Bordonaro, Salvatore
Department of Agriculture, Food and Environment, University of Catania, Catania, 95131, Italy.
Marletta, Donata
Department of Agriculture, Food and Environment, University of Catania, Catania, 95131, Italy.
Tetens, Jens
Department of Animal Sciences, Georg-August-University Göttingen, Göttingen, 37077, Germany.
Center for Integrated Breeding Research (CiBreed), Georg-August-University Göttingen, Göttingen, 37075, Germany.
MeSH Terms
Animals
Polymorphism, Single Nucleotide
Horses / genetics
Homozygote
Breeding
Genotype
Genetics, Population
Whole Genome Sequencing
Genomics / methods
Conflict of Interest Statement
Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.