Final Report
08 Aug 2018
In a typical mass spectrometry (MS) workflow, experimental mass spectra are matched against a protein database, resulting in a ranked list of peptide matches, each characterised by a score that depends on the search engine used. Additional work is needed to derive a measure of confidence that a peptide-spectrum match with a given score is correct. In OpenMS, the postprocessing tool that assigns confidence to peptide-spectrum matches is called IDPEP. As a simplified re-implementation of PeptideProphet, IDPEP has been known to be unstable in some cases. The aim of this project was to investigate whether the well-known tool Percolator could be used instead and, if so, to establish Percolator as the go-to tool for posterior error probability (PEP) estimation in OpenMS.
- Compare OpenMS’s current peptide search engine postprocessor tool IDPEP with Percolator as an alternative method on a smaller dataset (Jupyter Notebooks) and a larger dataset (Jupyter Notebook, Jupyter Notebook), with three different search engines (X!Tandem, MS-GF+, Comet), at the peptide and protein level (Code Base)
- Fit Percolator better into a typical OpenMS workflow (Pull Request)
- Improve Percolator’s performance by adding features computed across multiple replicate runs (Pull Request) and investigate their effect on performance (Jupyter Notebook, Jupyter Notebook)
- Other: Fix existing feature (Pull Request)
- Add new Percolator features in the case of multiple fractions
- Test against a larger protein database (e.g. UniProt)
Have a look for yourself:
Here are some results for an example dataset (iPRG2016 study, sample A1, search engine: MS-GF+). Shown are the number of correct identifications at various PEP thresholds, as well as threshold-free ROC curves (True Positive Rate (TPR) vs. False Positive Rate (FPR)) with the corresponding AUC values. Compared with IDPEP, Percolator increases the number of correctly identified peptides by roughly 50%, which is a substantial gain. Additionally, we gain 5% in area under the ROC curve, which assures us that finding more correctly identified peptides doesn’t come at the cost of collecting many additional false positives. Percolator’s performance is stable across various search engines and datasets of varying size (data not shown). All in all, Percolator has proved itself to be a worthy successor to IDPEP.
Figures (not reproduced here): number of correct identifications; ROC curves.
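The two kinds of comparison described above (correct identifications at a PEP threshold, and a threshold-free ROC curve with its AUC) can be sketched in a few lines of Python. This is only an illustrative sketch, not the code from the project's notebooks: the function names, the `is_correct` ground-truth labels, and the toy numbers are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def correct_ids_at_threshold(
    peps: Sequence[float], is_correct: Sequence[bool], threshold: float
) -> int:
    """Count identifications accepted at PEP <= threshold that are labelled correct."""
    return sum(1 for pep, ok in zip(peps, is_correct) if pep <= threshold and ok)


def percent_change(baseline: float, new: float) -> float:
    """Relative change of `new` with respect to `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


def roc_curve(
    peps: Sequence[float], is_correct: Sequence[bool]
) -> Tuple[List[float], List[float]]:
    """Return (FPR, TPR) points obtained by sweeping the PEP threshold upward."""
    n_pos = sum(is_correct)
    n_neg = len(is_correct) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one correct and one incorrect identification")

    # Most confident identifications (lowest PEP) come first.
    order = sorted(range(len(peps)), key=lambda i: peps[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Identifications with identical PEP are accepted together.
        j = i
        while j < len(order) and peps[order[j]] == peps[order[i]]:
            if is_correct[order[j]]:
                tp += 1
            else:
                fp += 1
            j += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
        i = j
    return fpr, tpr


def auc(fpr: Sequence[float], tpr: Sequence[float]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
        for k in range(1, len(fpr))
    )


if __name__ == "__main__":
    # Toy data, invented for illustration only (not results from the report).
    toy_peps = [0.01, 0.02, 0.10, 0.30, 0.60, 0.90]
    toy_labels = [True, True, False, True, False, False]

    print("correct IDs at PEP <= 0.05:",
          correct_ids_at_threshold(toy_peps, toy_labels, 0.05))
    toy_fpr, toy_tpr = roc_curve(toy_peps, toy_labels)
    print("AUC:", auc(toy_fpr, toy_tpr))
```

To compare two postprocessors in this way, one would run the same functions on the PEPs produced by each tool for the same set of identifications and then pass the two counts to `percent_change`, or compare the two AUC values directly.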