Genomes project publishes map of genetic variation
1 Nov 2012
DNA sequences have been made available by the 1000 Genomes Project to help discover the genetic causes of disease.
In a DNA version of ‘spot-the-difference’, European Molecular Biology Laboratory (EMBL) scientists and their collaborators studied the genomes of 1092 healthy people from Europe, East Asia, sub-Saharan Africa and the Americas, systematically tracking what makes us different from each other.
“The 1000 Genomes Project has achieved something truly exceptional in providing this powerful baseline of human variation,” said Paul Flicek of EMBL-EBI, who co-chairs the project’s Data Coordination Centre (DCC).
The five-year project, a collaboration between scientists, charities and businesses, has taken advantage of the increasing speed and falling costs of sequencing machines.
As well as providing a clearer picture of which DNA sequences are common and which are rare, the results could help the ongoing search for genetic links to disease.
Jan Korbel from EMBL Heidelberg pointed out the advantages of combining information on large-scale structural variation, such as deletions and duplications of long stretches of DNA, with data on changes at a smaller scale.
“This integrated view of genome variation will be extremely useful for understanding cause and consequence, and hence provide an invaluable context for future medical studies,” Korbel said.
“When people find a SNP, a single-letter change, that’s associated with a disease, they can now see if there’s a change in a larger chunk of the genome that’s always inherited alongside that SNP, and could cause the disease.”
The results also open up new avenues for researchers interested in how different genetic sequences have spread across human populations, for instance those carried to the Americas by European settlers.
Ensuring that the project’s results are useful to researchers working in a wide range of fields is the mission of Flicek’s data coordination team.
“Like ENCODE and other massive datasets, it is crucial that people working in all areas of human health and biomedical research can make the most of it. Our role has been to make these data not just freely available but truly accessible.”
To that end, the scientists have already made the current results available to the scientific community.
“The results of this first phase are in the 1000 Genomes browser, which has a whole suite of Ensembl-based tools that help you make practical use of the data,” said Laura Clarke of EMBL-EBI, Technical Lead for the DCC.
“For example, it lets you look at shared patterns of variance, which can be a good indicator of whether a particular genetic factor is related to disease. Another very practical tool lets you take just a slice of the data, so you don’t have to download the whole massive dataset.”