OpenMS-PEP Google Summer of Code 2018

Final Report

In a typical mass spectrometry (MS) workflow, experimental mass spectra are matched against a protein database, resulting in a ranked list of peptide matches, each characterised by a score that depends on the search engine used. Additional work is needed to derive a measure of confidence that a peptide-spectrum match (PSM) with a given score is correct, commonly expressed as a posterior error probability (PEP). In OpenMS, the postprocessing tool that assigns this confidence to PSMs is called IDPEP. As a simplified re-implementation of PeptideProphet, IDPEP has been known to be unstable in some cases. The aim of this project was to investigate whether the well-known tool Percolator could be used instead and, if so, to establish Percolator as the go-to tool for PEP estimation in OpenMS.

Work accomplished:

Future work:

  • Add new Percolator features in the case of multiple fractions
  • Test against a larger protein database (e.g. UniProt)

Have a look for yourself:

Here are some results for an example dataset (iPRG2016 study, sample A1, search engine: MSGF+). Shown are the number of correct identifications at various PEP thresholds, as well as threshold-free ROC curves (True Positive Rate (TPR) vs. False Positive Rate (FPR)) and the corresponding AUC values. With Percolator, the number of correctly identified peptides increases by roughly 50%, which is a substantial gain. Additionally, we gain 5% in area under the ROC curve, which assures us that finding more correctly identified peptides doesn’t come at the cost of collecting a large number of additional false positives. Percolator’s performance is stable across various search engines and datasets of varying size (data not shown). All in all, Percolator has proved itself to be a worthy successor to IDPEP.

[Figures: Number of correct identifications; ROC curves]
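
For readers who want to reproduce this kind of comparison on their own data, here is a minimal sketch of the two metrics discussed above: the number of correct identifications at a given PEP threshold, and the area under the ROC curve. It is not the evaluation code used in this project. It assumes each PSM is available as a (PEP, is_correct) pair, where the ground-truth label comes from the design of the benchmark dataset; the function names and the toy values are hypothetical and chosen only for illustration.

```python
def correct_ids_at_threshold(psms, threshold):
    """Count PSMs labeled correct whose PEP is at or below the threshold."""
    return sum(1 for pep, is_correct in psms if is_correct and pep <= threshold)


def roc_points(psms):
    """Return (FPR, TPR) points obtained by sweeping the PEP threshold upward."""
    total_pos = sum(1 for _, is_correct in psms if is_correct)
    total_neg = len(psms) - total_pos
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one correct and one incorrect PSM")

    ordered = sorted(psms, key=lambda psm: psm[0])  # most confident first
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ordered):
        pep = ordered[i][0]
        # PSMs with identical PEP are accepted together at the same threshold.
        while i < len(ordered) and ordered[i][0] == pep:
            if ordered[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / total_neg, tp / total_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: (PEP, is_correct) pairs, not results from this project.
toy_psms = [
    (0.001, True), (0.01, True), (0.02, False), (0.05, True),
    (0.2, False), (0.5, True), (0.9, False), (0.95, False),
]

n_correct = correct_ids_at_threshold(toy_psms, 0.05)
toy_auc = auc(roc_points(toy_psms))
print(n_correct, toy_auc)
```

Running the same functions on the PEPs produced by two different tools for the same set of PSMs gives directly comparable counts and AUC values.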