Sage-N Research has defined the next-generation Sequest standard specifically for the throughput and sensitivity required by translational proteomics research.
It is especially suitable for research involving phosphorylation and other protein post-translational modifications (PTMs) important to cancer and stem-cell research.
The Sequest 3G standard is defined in close collaboration with Dr John R Yates of the Scripps Research Institute, the primary co-inventor of the original Sequest search engine (SE).
It defines a single common standard for similarity scores, search statistics and file formats to provide a robust foundation that meets the needs of translational research, including support for high-accuracy mass spectrometers and dissociation technologies such as electron-transfer dissociation (ETD).
Translational proteomics technology has evolved specifically to characterise low-abundance proteins and protein PTMs accurately in complex cell lysates, in contrast with first-generation proteomics, which was optimised for simpler protein mixtures.
Therefore, a translational proteomics search engine must: be optimised for high-throughput searching of large data sets with more than 100,000 spectra; include digital signal processing (DSP) to improve sensitivity for noisy spectra; provide robust PTM search, including support for ETD; and be extensible to support multiple similarity scores to improve specificity.
Sage-N claims the Sequest 3G proteomics SE is the first commercially robust version to incorporate all of these requirements.
It is the latest generation of Sorcerer-Sequest, Sage-N Research's re-implementation of the Sequest algorithm, which uses a proprietary, patent-pending indexed search optimised for high-throughput and on-the-fly PTM searching.
Comprehensive PTM searches against species-specific protein sequences can now exceed 100,000 spectra per hour even on a low-end Sorcerer system, which the company says is several orders of magnitude faster than common PC software covering the same search space.
Sequest 3G improves on previous Sequest versions by using the DSP-based cross-correlation score (XCorr) as the primary score, while preserving the 'preliminary score' (Sp) value for the top 500 XCorr hits.
This feature improves search sensitivity for phosphorylated peptides, while maintaining backwards compatibility with existing Sequest-based workflows.
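As a rough illustration of what a cross-correlation score measures, the sketch below follows a common textbook formulation of XCorr: the dot product of the observed and theoretical spectra at zero offset, minus the average dot product over nearby non-zero offsets. It is not Sage-N's implementation. The function name `xcorr`, the pre-binned list inputs and the default window of 75 bins are assumptions made for this example.

```python
def xcorr(observed, theoretical, max_offset=75):
    """Textbook-style cross-correlation score between two binned spectra.

    Both inputs are equal-length lists of binned intensities (an assumption
    of this sketch). The score is the dot product at zero offset minus the
    mean dot product over the non-zero offsets in [-max_offset, +max_offset].
    """
    if len(observed) != len(theoretical):
        raise ValueError("spectra must be binned to the same length")
    n = len(observed)

    def dot_at(offset):
        # Dot product with the observed spectrum shifted by `offset` bins;
        # bins shifted outside the spectrum contribute nothing.
        total = 0.0
        for i in range(n):
            j = i + offset
            if 0 <= j < n:
                total += observed[j] * theoretical[i]
        return total

    offsets = [t for t in range(-max_offset, max_offset + 1) if t != 0]
    background = sum(dot_at(t) for t in offsets) / len(offsets)
    return dot_at(0) - background


# Toy spectra: peaks in bins 1, 4 and 8 of a 10-bin spectrum.
spectrum = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
decoy = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
match_score = xcorr(spectrum, spectrum, max_offset=3)
decoy_score = xcorr(spectrum, decoy, max_offset=3)
```

Subtracting the off-centre average penalises candidates that line up with the spectrum just as well when shifted, which is the behaviour expected of random matches.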
In addition, Sequest 3G includes the calculation of the 'E-value' statistical parameter for each spectrum.
This parameter, popularised by the Blast algorithm, estimates how many matches with an equal or higher XCorr score would be expected by chance alone.
More importantly, the E-value allows a more statistically rigorous estimation of the significance of the top XCorr hit and can replace the simple but problematic 'delta-Cn' (dCn) parameter for searching small protein databases using high mass-accuracy data.
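The difference between the two parameters can be sketched in a few lines. The article does not say how Sequest 3G derives its E-value, so the estimate below is an assumption for illustration only: it treats every candidate except the top hit as a random match, fits a straight line to the logarithm of the survival counts, and extrapolates to the top score. The function names `delta_cn` and `e_value` are hypothetical.

```python
import math


def delta_cn(scores):
    """Relative gap between the best and second-best score."""
    ranked = sorted(scores, reverse=True)
    best, second = ranked[0], ranked[1]
    return (best - second) / best


def e_value(scores):
    """Estimate how many candidates would reach the top score by chance.

    Assumption of this sketch: all candidates below the top hit are random
    matches, and the count of random matches scoring >= s falls off
    log-linearly with s. Tied scores are counted approximately.
    """
    ranked = sorted(scores, reverse=True)
    top, rest = ranked[0], ranked[1:]
    if len(rest) < 3:
        raise ValueError("need at least three lower-scoring candidates")
    # rest[k] is matched or exceeded by k + 1 of the random candidates.
    xs = rest
    ys = [math.log10(k + 1) for k in range(len(rest))]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("lower-scoring candidates all share one score")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Extrapolate the fitted line to the top score.
    return 10 ** (slope * top + intercept)


# Synthetic background whose survival counts are exactly log-linear.
background = [2.0 - math.log10(k + 1) for k in range(100)]
strong_hit = [4.0] + background
weak_hit = [2.5] + background
```

In this sketch dCn depends only on the top two scores, whereas the E-value estimate draws on the whole score distribution for the spectrum.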
These enhancements update the venerable Sequest standard for translational proteomics research using modern mass spectrometers, just as proteomics practitioners are moving from simple protein identification to profiling molecular pathways.
Leading proteomics labs are routinely using different search engines in parallel to achieve higher sensitivity and specificity.
This is commonly done with multiple software copies running on separate servers, but it can be implemented more efficiently with a highly sensitive first-pass filter that reports, for example, the top 50 candidates, which are then rescored in a second pass using more sophisticated, compute-intensive scoring modules.
This multiple-pass search-engine architecture, in which the first pass is the Sequest 3G engine, is the basis of the Sorcerer Search Engine Architecture available on the Sorcerer platform, starting with v4.0.
The second-pass rescore modules can implement different similarity scores without searching the protein sequences again.
This allows more sophisticated fragmentation models, for improved sensitivity and specificity, that would normally be prohibitive with one-pass search engines.
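The multiple-pass idea can be sketched as follows. This is a toy illustration under stated assumptions, not the Sorcerer implementation: spectra and candidate peptides are represented as sets of integer fragment bins, and the names `two_pass_search`, `shared_peaks` and `matched_fraction` are hypothetical.

```python
def shared_peaks(spectrum, fragments):
    """Cheap first-pass score: number of fragment bins found in the spectrum."""
    return len(spectrum & fragments)


def matched_fraction(spectrum, fragments):
    """Stand-in for a costlier rescore: fraction of predicted fragments matched."""
    return len(spectrum & fragments) / len(fragments) if fragments else 0.0


def two_pass_search(spectrum, candidates, first_pass, rescorers, keep=50):
    """Rank all candidates with a fast score, then rescore only the shortlist."""
    first = [(first_pass(spectrum, frags), name) for name, frags in candidates.items()]
    first.sort(key=lambda pair: pair[0], reverse=True)
    results = []
    for score, name in first[:keep]:
        entry = {"candidate": name, "first_pass": score}
        # The second pass works on the shortlist only; the full candidate
        # set is never searched again.
        for label, rescore in rescorers.items():
            entry[label] = rescore(spectrum, candidates[name])
        results.append(entry)
    return results


spectrum = {100, 200, 300, 400}
candidates = {
    "PEPTIDEA": {100, 200, 300, 400, 500, 600, 700, 800},
    "PEPTIDEB": {100, 200, 300},
    "PEPTIDEC": {900},
}
hits = two_pass_search(
    spectrum, candidates, shared_peaks, {"fraction": matched_fraction}, keep=2
)
best = max(hits, key=lambda h: h["fraction"])
```

Because each rescore module only sees the shortlist, its cost scales with the `keep` parameter rather than with the size of the protein database, which is what makes more expensive scoring models affordable in the second pass.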
Specialised rescore modules, including open-source modules, will be available for different mass spectrometers and technologies including ETD and PTM site localisation.
Sequest 3G is developed and maintained exclusively by Sage-N Research and is available for licensing within third-party bioinformatics software suites for a variety of mass spectrometers and technologies.