Performance Contribution of Distinct Feature Sets
13 Aug 2018
To determine which new features are worth keeping, I analysed how much each group of features contributes to Percolator’s performance improvement. The idea is to leave out a single feature (or a group of features) and measure the resulting drop in performance. For a fair comparison, it is important to account for correlations between features: if only one of two strongly correlated features is removed, the remaining one can compensate, and the drop understates the contribution of both.
Shown are pairwise Pearson correlations (all p-values < 0.001). We can identify two groups of features that are more strongly correlated with each other than with the remaining features:
REP:replicateSpectra: Both features are based on PSMs that not only share the same (unmodified) peptide sequence but also the same precursor ion. (Label: “Extra precursor features”)
REP:siblingIons: These features are based on PSMs that only share the same (unmodified) peptide sequence. (Label: “Extra peptide features”)
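The grouping step can be illustrated with a minimal sketch. It computes pairwise Pearson correlations and merges features whose absolute correlation exceeds a threshold. The feature names, the toy values, and the 0.8 threshold are all assumptions made for illustration; they are not the actual Percolator features or the data behind the figure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlated_groups(features, threshold=0.8):
    """Merge features with |r| >= threshold into groups (connected components)."""
    parent = {name: name for name in features}

    def find(name):
        # Follow parent links to the group representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(features, 2):
        if abs(pearson(features[a], features[b])) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in features:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical feature columns (one value per PSM), invented for this sketch.
toy_features = {
    "peptide_feat_1": [1, 2, 3, 4, 5, 6],
    "peptide_feat_2": [2, 4, 5, 8, 10, 13],
    "precursor_feat_1": [5, 1, 4, 2, 6, 3],
    "precursor_feat_2": [10, 3, 8, 4, 13, 6],
}

groups = correlated_groups(toy_features)
for group in groups:
    print(group)
```

A fixed threshold is only one way to form groups; reading them off a correlation heat map by eye, as done here, works just as well when there are few features.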
Percolator was then run in four configurations:
- Default features (Label: “No extra features”)
- Default features + Extra peptide features
- Default features + Extra precursor features
- Default features + Extra peptide features + Extra precursor features (Label: “All extra features”)
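The four configurations are simply all subsets of the two extra feature groups added to the defaults. A minimal sketch of how they could be enumerated and compared against the baseline follows. The feature names are placeholders, and `evaluate` is a hypothetical callback standing in for "run Percolator with these features and count identifications"; the stub used here only counts features so that the sketch runs, and its numbers carry no meaning.

```python
from itertools import combinations

# Placeholder names; the real feature lists are not reproduced here.
DEFAULT_FEATURES = ["default_feature_1", "default_feature_2"]
EXTRA_GROUPS = {
    "Extra peptide features": ["peptide_feat_1", "peptide_feat_2"],
    "Extra precursor features": ["precursor_feat_1", "precursor_feat_2"],
}


def build_configurations(default, extra_groups):
    """Return {label: feature list} for every subset of the extra groups."""
    names = list(extra_groups)
    configs = {}
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            if not subset:
                label = "No extra features"
            elif len(subset) == len(names):
                label = "All extra features"
            else:
                label = "Default + " + " + ".join(subset)
            features = list(default)
            for name in subset:
                features.extend(extra_groups[name])
            configs[label] = features
    return configs


def gains_over_baseline(configs, evaluate, baseline="No extra features"):
    """Score every configuration and report the difference to the baseline."""
    base_score = evaluate(configs[baseline])
    return {label: evaluate(feats) - base_score for label, feats in configs.items()}


configs = build_configurations(DEFAULT_FEATURES, EXTRA_GROUPS)

# Stand-in scorer (assumption): replace with a real run that returns
# the number of correct identifications for the given feature list.
stub_gains = gains_over_baseline(configs, evaluate=len)
for label, gain in stub_gains.items():
    print(f"{label}: {gain:+d}")
```

Adding or removing whole groups, rather than single features, keeps correlated features together, so a feature left in the model cannot mask the effect of a removed one.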
The plots below reveal that most of the performance improvement stems from the extra peptide features; the extra precursor features contribute very little.
Number of correct identifications