Applied Biosystems analysed a human genome sequence for less than $60,000, the commercial price of all reagents needed to complete the project.
Applied Biosystems reports a significant development in the quest to lower the cost of DNA sequencing.
Scientists from the company have sequenced a human genome using its next-generation genetic analysis platform.
The sequence data generated by this project reveal numerous previously unknown and potentially medically significant genetic variations.
They also provide a high-resolution, whole-genome view of the structural variants in a human genome, making this one of the most in-depth analyses of any human genome sequence.
Applied Biosystems is making this information available to the worldwide scientific community through a public database hosted by the National Center for Biotechnology Information (NCBI).
This project was completed at a fraction of the cost of any previously released human genome data, including the approximately $300 million spent on the Human Genome Project.
The cost of the Applied Biosystems sequencing project is less than the $100,000 milestone set forth by the industry for the new generation of DNA sequencing technologies, which are beginning to gain wider adoption by the scientific community.
The availability of this sequence data in the public domain is expected to help scientists gain a greater understanding of human genetic variation and potentially help them to explain differences in individual susceptibility to disease and response to treatment, which is the goal of personalised medicine.
Although most human genetic information is the same in all people, researchers are generally more interested in studying the small percentage of genetic material that varies among individuals.
They seek to characterise that variation as either single-base changes, or as a series of larger stretches of sequence variation known as structural variants.
Structural variants, which include insertions, deletions, inversions and translocations of DNA sequences ranging from a few to millions of base pairs, have a higher potential of impacting genes and thus contributing to human disease.
Under the direction of Kevin McKernan, Applied Biosystems's senior director of scientific operations, the scientists resequenced a human DNA sample that was included in the International HapMap Project.
The team used the company's Solid System to generate 36 gigabases of sequence data in seven runs of the system, achieving throughput of up to nine gigabases per run, which is the highest throughput reported by any of the providers of DNA sequencing technology.
The 36 gigabases cover the contents of the human genome more than 12 times, which helped the scientists to determine the precise order of DNA bases and to confidently identify the millions of single-base variations (SNPs) present in a human genome.
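The coverage figure follows from simple arithmetic, sketched below in Python. The total yield and run count come from the text; the genome size of roughly 3 gigabases is an assumption added here for illustration, so the result is approximate.

```python
# Figures stated in the article.
TOTAL_BASES = 36_000_000_000   # 36 gigabases of sequence data
RUNS = 7                       # number of instrument runs

# Assumption (not stated in the article): a haploid human genome
# of roughly 3 gigabases.
GENOME_SIZE = 3_000_000_000


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


coverage = fold_coverage(TOTAL_BASES, GENOME_SIZE)
mean_gb_per_run = TOTAL_BASES / RUNS / 1e9

print(f"Sequence coverage: about {coverage:.1f}-fold")
print(f"Mean yield per run: about {mean_gb_per_run:.1f} gigabases")
```

Under this assumption the mean yield per run is lower than the nine-gigabase figure quoted above, which is consistent with that figure being a peak rather than an average.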
The team also analysed the areas of the human genome that contain the structural variation between individuals.
These regions of structural variation were revealed by greater than 100-fold physical coverage, which shows positions of larger segments of the genome that may vary relative to the human reference genome.
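Physical coverage counts the whole fragment spanned by a read pair, including the unsequenced stretch between the two reads, which is why it can be far higher than sequence coverage. The Python sketch below shows the calculation; the pair count, insert size, read length and genome size are all hypothetical values chosen for illustration and are not figures from the project.

```python
def physical_coverage(num_pairs: int, insert_size: int, genome_size: int) -> float:
    """Fold coverage counting the full span of each paired-end fragment."""
    return num_pairs * insert_size / genome_size


def sequence_coverage(num_pairs: int, read_length: int, genome_size: int) -> float:
    """Fold coverage counting only the bases actually read (two reads per pair)."""
    return num_pairs * 2 * read_length / genome_size


# Hypothetical inputs, for illustration only.
NUM_PAIRS = 200_000_000
INSERT_SIZE = 1_500            # bases spanned by each fragment
READ_LENGTH = 25               # bases read from each end
GENOME_SIZE = 3_000_000_000

phys = physical_coverage(NUM_PAIRS, INSERT_SIZE, GENOME_SIZE)
seq = sequence_coverage(NUM_PAIRS, READ_LENGTH, GENOME_SIZE)
print(f"Physical coverage: {phys:.0f}-fold, sequence coverage: {seq:.1f}-fold")
```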
"We believe this project validates the promise of next-generation sequencing technologies, which is to lower the cost and increase the speed and accuracy of analysing human genomic information," said McKernan.
"With each technological milestone, we are moving closer to realising the promise of personalised medicine."
McKernan's team used the Solid System's ultra-high-throughput capabilities to obtain deep sequence coverage of the genome of an anonymous African male of the Yoruba people of Ibadan, Nigeria, who participated in the International HapMap Project.
The scientists were able to perform an in-depth analysis of structural variants by creating multiple paired-end libraries of genomic sequence that included a wide range of insert sizes.
Most inserts exceeded 1000 bases.
The Solid System has the ability to analyse paired-end libraries with large insert sizes.
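As an illustration of how paired-end libraries can expose structural variants, the Python sketch below compares where the two ends of a fragment map on the reference genome with the insert-size range expected for the library. It is a simplified, generic approach rather than the method used in the project; the `MappedPair` type, the `classify_pair` function and the thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    """Hypothetical record of where the two ends of one fragment mapped."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int
    expected_orientation: bool  # True if the two ends face as the library predicts


def classify_pair(pair: MappedPair, min_insert: int, max_insert: int) -> str:
    """Label a read pair by comparing its mapping with the expected insert size."""
    if pair.chrom_a != pair.chrom_b:
        return "possible translocation"
    if not pair.expected_orientation:
        return "possible inversion"
    span = abs(pair.pos_b - pair.pos_a)
    if span > max_insert:
        # Ends map too far apart: the sample may lack sequence present in the reference.
        return "possible deletion"
    if span < min_insert:
        # Ends map too close together: the sample may carry extra sequence.
        return "possible insertion"
    return "concordant"


# Hypothetical library with inserts expected between 1000 and 2000 bases.
pairs = [
    MappedPair("chr1", 10_000, "chr1", 11_500, True),
    MappedPair("chr1", 10_000, "chr1", 15_000, True),
    MappedPair("chr1", 10_000, "chr1", 10_300, True),
    MappedPair("chr1", 10_000, "chr1", 11_500, False),
    MappedPair("chr1", 10_000, "chr7", 11_500, True),
]
labels = [classify_pair(p, 1_000, 2_000) for p in pairs]
print(labels)
```

In practice a single discordant pair proves little; deep physical coverage matters because many independent pairs supporting the same event are needed before a variant can be called.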
For the millions of SNPs identified in the project, the Solid System's two-base encoding chemistry discriminated random or systematic errors from true SNPs to reveal these SNPs with greater than 99.94 percent sequencing accuracy.
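The article does not describe how two-base encoding works. The Python sketch below follows the scheme as it is commonly described for colour-space sequencing: each of four "colours" encodes a pair of adjacent bases, so every base is interrogated twice. A true single-base change then alters two adjacent colours in a constrained way, whereas an isolated colour change more likely reflects a measurement error. The function names and example sequences are hypothetical, and the classifier is deliberately simplified (it ignores, for example, changes at the ends of a read).

```python
# Two-bit base codes; the colour for a pair of bases is the XOR of their codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colours(seq: str) -> list[int]:
    """Encode a base sequence as one colour per adjacent pair of bases."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def classify_mismatch(ref_colours: list[int], read_colours: list[int]) -> str:
    """Simplified interpretation of colour differences between reference and read."""
    diffs = [i for i, (r, o) in enumerate(zip(ref_colours, read_colours)) if r != o]
    if not diffs:
        return "match"
    if len(diffs) == 1:
        return "likely measurement error"
    if len(diffs) == 2 and diffs[1] == diffs[0] + 1:
        i = diffs[0]
        # Changing one base leaves the XOR of its two flanking colours unchanged.
        if ref_colours[i] ^ ref_colours[i + 1] == read_colours[i] ^ read_colours[i + 1]:
            return "consistent with SNP"
    return "unclear"


reference = to_colours("ACGTAC")    # [1, 3, 1, 3, 1]
snp_read = to_colours("ACTTAC")     # G changed to T: [1, 2, 0, 3, 1]
error_read = [1, 3, 0, 3, 1]        # one colour misread

print(classify_mismatch(reference, snp_read))
print(classify_mismatch(reference, error_read))
```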
Another important attribute of the Solid System is that, unlike other available DNA sequencing platforms, the system is inherently scalable to support higher levels of throughput without requiring changes to the system's hardware.
The high throughput, accuracy and paired-end analysis capability of the Solid System are expected to continue to reduce the cost of conducting studies of complex genomes and how variation in these genomes contributes to conditions such as cancer, diabetes and heart disease, among others.
Associating genetic variation with cancer and other diseases.
As in-depth resequencing efforts continue to reveal previously uncharacterised genetic variation in human genomes, researchers such as John McPherson at the Ontario Institute for Cancer Research expect to be able to associate these genetic variants with diseases such as cancer.
McPherson is cataloguing genetic alterations that occur in different types of cancers to better classify tumours and identify the important early events driving the disease.
These provide critical targets for refining and developing new targeted treatments and diagnostic tools.
"Paired-end sequencing is an essential component of whole genome analysis," said McPherson.
"The tight fragment size range provided by the Solid protocols allows the identification of a wide range of insertion and deletion sizes.
Structural rearrangements are readily identified and deep genome coverage easily attained due to the high throughput of this platform."
Evan Eichler, an associate professor of genome sciences at the University of Washington's School of Medicine and a Howard Hughes Medical Institute Investigator, focuses his research on the role of duplicate regions and structural variation in the human genome.
Using computational and experimental approaches, he investigates the architecture of these regions and their role in evolution and disease.
"To understand the extent and prevalence of structural variation in the human genome, which is still largely unknown, my lab has been applying traditional sequencing methods with good results, but much more needs to be discovered at a faster pace," said Eichler.
"The human paired-end data being released is of such depth that discovering smaller structural events at higher resolution becomes possible.
The availability of this dataset in the public domain will accelerate our understanding of structural variation in normal and disease states, and open the door to a faster exploration of this type of genetic diversity across human populations."
Developing software analysis tools for next-generation sequencing.
Next-generation sequencing platforms have enabled researchers to generate more genetic data than ever before.
Applied Biosystems's human resequencing effort has produced one of the most comprehensive genomic datasets, which is expected to provide researchers with libraries of sequence data that can serve as a model for preparing and analysing samples of other complex genomes in future projects.
Applied Biosystems expects that the public availability of the human sequence data will help drive innovation and speed the development of new bioinformatics tools.
These new tools are expected to help researchers interpret the data, which provide clues to various aspects of health and disease.
In addition to the full human dataset, subsets of sequence data are available at NCBI.
These datasets can be accessed by independent academic and commercial software developers to further enable the development of analytical tools.
Applied Biosystems is making an analysis tool available through the Solid System software development community, which is expected to help independent software providers to interpret the subsets of data.
Through its software development community, Applied Biosystems has established relationships with scientists and bioinformatics companies to help scientists address next-generation sequencing bioinformatics challenges and develop tools that are expected to advance data analysis and management.
The data have also been deposited at the National Center for Biotechnology Information (NCBI), which is part of the National Library of Medicine, National Institutes of Health (Bethesda MD USA).