Mapping copy number variation by population-scale genome sequencing.
Mills RE., Walter K., Stewart C., Handsaker RE., Chen K., Alkan C., Abyzov A., Yoon SC., Ye K., Cheetham RK., Chinwalla A., Conrad DF., Fu Y., Grubert F., Hajirasouliha I., Hormozdiari F., Iakoucheva LM., Iqbal Z., Kang S., Kidd JM., Konkel MK., Korn J., Khurana E., Kural D., Lam HYK., Leng J., Li R., Li Y., Lin C-Y., Luo R., Mu XJ., Nemesh J., Peckham HE., Rausch T., Scally A., Shi X., Stromberg MP., Stütz AM., Urban AE., Walker JA., Wu J., Zhang Y., Zhang ZD., Batzer MA., Ding L., Marth GT., McVean G., Sebat J., Snyder M., Wang J., Ye K., Eichler EE., Gerstein MB., Hurles ME., Lee C., McCarroll SA., Korbel JO., 1000 Genomes Project.
Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide-resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole-genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high-frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.