Maize’s popularity and high demand have led to large R&D infrastructures for breeding pipelines and motivated the adoption of cutting-edge technologies, including genotyping. However, even for large companies and institutions, genotyping maize at the scale required for genomic prediction can be costly. Other factors driving the need for efficient breeding programs include limited experimental field space and unpredictable environmental conditions that can leave field data skewed or missing. Accordingly, computational tools are becoming increasingly important.
To save cost and time, a company could test fewer progenies or genotype fewer markers. Neither option is ideal: both can damage breeding pipelines, either by shrinking the breeding program or by overlooking important data. Maize researchers and companies want to reduce their costs without compromising the number of progenies they test or the amount of data and knowledge they generate.
Imputation addresses this problem: it optimizes a genotyping strategy by maximizing the information generated from a minimal marker set. It can support any genotyping application because it draws on a specific breeding program’s genetic background to define a small, accurate subset of markers that serves as a skeleton for imputing a much larger marker dataset. NRGene’s genotyping solution, SNPer™, does just that. The aim is to reduce genotyping costs by minimizing the number of markers genotyped directly in the lab while still delivering maximum marker data through computational algorithms. Because genotyping costs scale with the number of markers genotyped but not with the number imputed, substituting imputation for genotyping can greatly reduce the cost of acquiring genotypic data.
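To make the skeleton idea concrete, here is a minimal sketch of one simple imputation technique, flanking-marker imputation. It is not SNPer™’s algorithm, which the text does not describe. It assumes an inbred progeny (for example a doubled haploid) from a biparental cross with fully genotyped parents. For each genotyped skeleton marker we know which parent, "A" or "B", the progeny inherited it from. An untyped marker is filled in when both flanking skeleton markers trace to the same parent, and left missing when they disagree, since a crossover lies somewhere in that interval. The function names `impute_from_skeleton` and `origin_to_alleles` and the data layout are hypothetical. Production tools typically use haplotype-based statistical models that also tolerate genotyping errors.

```python
from bisect import bisect_left
from typing import Optional


def impute_from_skeleton(
    skeleton_pos: list[int],
    skeleton_origin: list[str],
    target_pos: list[int],
) -> list[Optional[str]]:
    """Hypothetical flanking-marker imputation for one inbred progeny.

    skeleton_pos: sorted positions of directly genotyped (skeleton) markers.
    skeleton_origin: parent ("A" or "B") each skeleton allele was inherited from.
    target_pos: positions of markers to impute.
    Returns the inferred parental origin at each target, or None when the
    flanking skeleton markers disagree (a crossover lies in the interval).
    """
    imputed: list[Optional[str]] = []
    for pos in target_pos:
        i = bisect_left(skeleton_pos, pos)
        if i < len(skeleton_pos) and skeleton_pos[i] == pos:
            imputed.append(skeleton_origin[i])  # marker was genotyped directly
            continue
        left = skeleton_origin[i - 1] if i > 0 else None
        right = skeleton_origin[i] if i < len(skeleton_pos) else None
        if left is None or right is None:
            # Chromosome end: only one flanking skeleton marker exists.
            imputed.append(left or right)
        elif left == right:
            imputed.append(left)  # no crossover detected between flanks
        else:
            imputed.append(None)  # crossover in interval: leave missing
    return imputed


def origin_to_alleles(
    origins: list[Optional[str]],
    parent_a: list[str],
    parent_b: list[str],
) -> list[Optional[str]]:
    """Translate parental origin at each target marker into an allele call."""
    return [
        None if o is None else (parent_a[k] if o == "A" else parent_b[k])
        for k, o in enumerate(origins)
    ]


# Three skeleton markers imputing five target markers.
origins = impute_from_skeleton([10, 50, 90], ["A", "A", "B"], [5, 20, 50, 70, 95])
print(origins)  # ['A', 'A', 'A', None, 'B']
print(origin_to_alleles(origins, ["C", "G", "T", "A", "A"], ["T", "G", "C", "G", "T"]))
# ['C', 'G', 'T', None, 'T']
```

The denser the skeleton, the fewer intervals contain an undetected crossover, which is why choosing skeleton markers well for a given germplasm matters as much as the imputation step itself.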
It is not unusual to obtain accurate data for ten times as many markers as are directly genotyped. For example, without marker optimization and data imputation, one might need to genotype 5,000 or more loci in a breeding population to obtain data for 5,000 loci that are non-redundant and informative for genomic selection. With imputation, however, one might genotype 500 loci and accurately impute data for the remaining 4,500. The resulting dataset is nearly identical to one obtained by directly genotyping all 5,000 loci, but lab costs are greatly reduced.
Quality control of genotypic data is an important step in the imputation process. Erroneous data can be detected and corrected, and progenies whose genotypic data are inconsistent with pedigree records can be identified and discarded. The imputed data can then be used to make genomic predictions for new progenies just as if the data came from direct genotyping in the lab. Because these predictions are made before field testing, breeders can send to the field only those individuals expected to perform best. Taken together, marker optimization and data imputation allow breeding programs to reduce genotyping costs, make accurate genomic predictions, use field resources efficiently, and improve year-over-year gains.
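The pedigree check described above can be sketched as follows, under simplifying assumptions: all lines are inbred (homozygous), and a progeny’s allele at each locus should match one of its two recorded parents. A locus where the progeny carries an allele seen in neither parent counts as a conflict. Progenies whose conflict rate exceeds a tolerance are flagged for removal. The function names and the 5% threshold (`max_rate`) are hypothetical, chosen to allow for some genotyping error. This covers only pedigree consistency, not the broader error correction the text mentions.

```python
from typing import Optional


def pedigree_inconsistency_rate(
    progeny: list[Optional[str]],
    parent_a: list[str],
    parent_b: list[str],
) -> float:
    """Fraction of called loci where the progeny allele matches neither parent."""
    called = 0
    conflicts = 0
    for p, a, b in zip(progeny, parent_a, parent_b):
        if p is None:
            continue  # missing call: nothing to check
        called += 1
        if p not in (a, b):
            conflicts += 1
    return conflicts / called if called else 0.0


def flag_progenies(
    progenies: dict[str, list[Optional[str]]],
    parent_a: list[str],
    parent_b: list[str],
    max_rate: float = 0.05,  # hypothetical tolerance for genotyping error
) -> list[str]:
    """Names of progenies whose genotypes conflict with their recorded parents."""
    return [
        name
        for name, calls in progenies.items()
        if pedigree_inconsistency_rate(calls, parent_a, parent_b) > max_rate
    ]


parent_a = ["A", "C", "G", "T", "A", "C"]
parent_b = ["G", "C", "A", "T", "T", "G"]
progenies = {
    "DH_001": ["A", "C", "A", "T", "T", "C"],  # every allele from a parent
    "DH_002": ["C", "C", "G", "A", "A", None],  # 2 of 5 called loci conflict
}
print(flag_progenies(progenies, parent_a, parent_b))  # ['DH_002']
```

A progeny like DH_002 is more likely a seed mix-up or mislabeled sample than a product of the recorded cross, so discarding it before prediction keeps it from distorting the training data.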
Understanding the full picture is important, but doing so at the lowest possible cost is essential. Whether to catch mistakes in advance or to develop markers for accurate breeding predictions, companies have adopted marker optimization combined with low-cost genomic imputation to support their genotyping programs.