CLC bio has just released a scientific white paper which confirms that, in benchmarking tests, its new algorithm for assembly of Next Generation Sequencing data is the fastest available.
Not only is CLC bio's algorithm considerably faster, it also delivers better-quality results than the other algorithms benchmarked in the white paper.
Todd Michael, assistant professor at Rutgers University, states: "The speed of CLC bio's new algorithm for reference assembly of Next Generation Sequencing data raises the bar to a level currently unmatched by any competitor.
"If CLC bio continues this impressive rate of development, and eventually handles SOLiD's color space analysis equally convincingly, this could easily become a de facto tool for scientists working with Next Generation Sequencing analysis."
Where other algorithms take around three to four hours to assemble 8.5 million reads against a whole human genome, CLC bio's assembly algorithm completed the same calculation in little more than half an hour. That is at least five times faster than the closest competitor.
For larger data sets, the speed-up is even greater: when assembling 86 million reads against the whole human genome, CLC bio's assembly algorithm is more than 14 times faster, meaning an assembly that normally takes almost 40 hours can be done in less than two.
At the same time, CLC bio's algorithm delivers higher-quality results, with more than 85% accuracy compared to around 83% for the other algorithms in the white paper.
Another highly interesting aspect of the improved assembly algorithm is its modest physical memory requirement: at no point during the benchmark tests did CLC bio's algorithm require more than 8 GB of RAM.
The benchmark tests were conducted by assembling both 8.5 million and 86 million reads against a whole human genome, kindly supplied by the Beijing Genomics Institute.
The data set used for the benchmark tests was sequenced on Illumina's Solexa platform, with a read length of 35 nucleotides.
When the assembly algorithm is released in August, it will be available both as a command-line version on CLC bio's Bioinformatics Cell platform and through CLC Genomics Workbench, which offers an intuitive graphical interface for analysing and visualising Next Generation Sequencing data.
CLC bio's white paper is free to download.