percolator / percolator

Semi-supervised learning for peptide identification from shotgun proteomics datasets


Example

The data sets used in Percolator's original publication are all available from the Noble lab web site. To run a quick test example, use the following command sequence:

$ mkdir test; cd test
$ wget -q http://noble.gs.washington.edu/proj/percolator/data/yeast-01.sqt.tar.gz
$ tar xvzf yeast-01.sqt.tar.gz
$ percolator yeast-01.sqt yeast-01.shuffled.sqt > yeast-01.psms
Percolator version 1.11, Build Date May 27 2009 09:50:32
Copyright (c) 2006-9 University of Washington. All rights reserved.
Written by Lukas Käll (lukall@u.washington.edu) in the
Department of Genome Sciences at the University of Washington.
Issued command:
percolator yeast-01.sqt yeast-01.shuffled.sqt
Started Tue Jun 16 21:01:14 2009 on unknown_host
Hyperparameters fdr=0.01, Cpos=0, Cneg=0, maxNiter=10
69705 records in file yeast-01.sqt
69705 records in file yeast-01.shuffled.sqt
Train/test set contains 69705 positives and 69705 negatives, size ratio=1 and pi0=1
selecting cpos by cross validation
selecting cneg by cross validation
Estimating 7084 over q=0.01 in initial direction
Reading in data and feature calculation took 4.5 cpu seconds or 5 seconds wall time
---Training with Cpos selected by cross validation, Cneg selected by cross validation, fdr=0.01
Iteration 1 : After the iteration step, 10183[1] positives with q<0.01 were estimated by cross validation
Iteration 2 : After the iteration step, 11273 positives with q<0.01 were estimated by cross validation
Iteration 3 : After the iteration step, 11551 positives with q<0.01 were estimated by cross validation
Iteration 4 : After the iteration step, 11665 positives with q<0.01 were estimated by cross validation
Iteration 5 : After the iteration step, 11689 positives with q<0.01 were estimated by cross validation
Iteration 6 : After the iteration step, 11691 positives with q<0.01 were estimated by cross validation
Iteration 7 : After the iteration step, 11694 positives with q<0.01 were estimated by cross validation
Iteration 8 : After the iteration step, 11697 positives with q<0.01 were estimated by cross validation
Iteration 9 : After the iteration step, 11713 positives with q<0.01 were estimated by cross validation
Iteration 10 : After the iteration step, 11722 positives with q<0.01 were estimated by cross validation
Obtained weights (only showing weights of first cross validation set)
first line contains normalized weights, second line the raw weights
lnrSp	deltLCn	deltCn	Xcorr	Sp	IonFrac	Mass	PepLen	Charge1	Charge2	Charge3	enzN	enzC	enzInt	lnNumSP	dM	absdM	m0
-0.372	0.131	0.602	0.474[2]	0.0658	-0.029	0.708	-0.519	0.137	0.0303	-0.0584	1.02	1.1	-1.53	0.0182	0.224	-0.128[3]	-5.91
-0.2	0.786	6.9	0.788	0.000248	-0.179	0.00117	-0.0943	1.34	0.0606	-0.117	2.56	2.52	-1.15	0.585	0.348	-0.347	-14.7
After all training done, 11540 positives with q<0.01 were found when measuring on the test set
Found 11540 peptides scoring over 1% FDR level on testset
Merging results from 3 datasets
Calibrating statistics - estimating pi_0
Selecting pi_0=0.815[4]
Calibrating statistics - calculating q values
New pi_0 estimate on merged list gives 11786 over q=0.01
Calibrating statistics - calculating Posterior error probabilities (PEPs)
Binned data into 500 bins for PEP calcuation
Processing took 163.5 cpu seconds or 164 seconds wall time

We have labeled a couple of interesting features in the output above with bracketed references ([x]):

  1. The number of PSMs passing a q-value threshold of 0.01, as estimated by cross-validation
  2. The weight of Xcorr is positive, indicating that a high Xcorr gives a better hit; that is a good sign
  3. The weight of absdM is negative, indicating that a large difference between observed and calculated mass gives a worse score; that is also a good sign
  4. We estimate that 81.5% of the target PSMs are incorrect matches (pi_0=0.815)
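Annotations 1 and 4 refer to q-values and pi_0. As a rough illustration of how these two quantities relate, the Python sketch below computes target-decoy q-values from a list of target scores and a list of decoy scores, scaling the decoy-based FDR estimate by pi_0 and by the target/decoy size ratio. It is a simplified sketch, not Percolator's own code: the function name `q_values` and the toy scores are hypothetical, and pi_0 is passed in as a parameter rather than estimated.

```python
from typing import List, Sequence


def q_values(target_scores: Sequence[float],
             decoy_scores: Sequence[float],
             pi0: float = 1.0) -> List[float]:
    """Return one q-value per target score, in the input order.

    For a score threshold t the FDR is estimated as
        pi0 * (N_targets / N_decoys) * #decoys>=t / #targets>=t
    and each q-value is the smallest FDR over all thresholds that
    still accept that PSM (hypothetical helper, for illustration only).
    """
    ratio = len(target_scores) / len(decoy_scores)  # "size ratio" in the log
    order = sorted(range(len(target_scores)), key=lambda i: -target_scores[i])
    decoys = sorted(decoy_scores, reverse=True)

    fdr = []
    n_decoys = 0
    for rank, i in enumerate(order, start=1):
        t = target_scores[i]
        # Count decoys scoring at least as high as the current target.
        while n_decoys < len(decoys) and decoys[n_decoys] >= t:
            n_decoys += 1
        fdr.append(min(1.0, pi0 * ratio * n_decoys / rank))

    # Walk from the lowest-scoring target upwards, keeping a running minimum,
    # so q-values never decrease as the score decreases.
    q = [0.0] * len(target_scores)
    running = 1.0
    for k in range(len(order) - 1, -1, -1):
        running = min(running, fdr[k])
        q[order[k]] = running
    return q


# Toy scores (hypothetical), using the pi_0 selected in the log above.
targets = [10, 9, 8, 7, 6, 5]
decoys = [7.5, 4, 3, 2, 1, 0]
q = q_values(targets, decoys, pi0=0.815)
print(sum(1 for x in q if x < 0.01))  # → 3
```

Lowering pi_0 below 1 shrinks every FDR estimate proportionally, so more PSMs can pass a fixed q-value threshold than with the conservative default pi_0=1.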
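Annotations 2 and 3 read meaning into the sign of individual weights. The following minimal sketch shows why the sign matters, assuming that the final score is a linear function of the features: a weighted sum of the feature values plus the m0 column as an offset, with the raw weights applied to unnormalized feature values. The weights are copied from the second weight line of the log (first cross-validation set only); the name `linear_score` and the all-zero example PSM are hypothetical.

```python
from typing import Dict

# Feature names and raw weights copied from the log above
# (first cross-validation set only); m0 is assumed to be the bias term.
FEATURES = ["lnrSp", "deltLCn", "deltCn", "Xcorr", "Sp", "IonFrac", "Mass",
            "PepLen", "Charge1", "Charge2", "Charge3", "enzN", "enzC",
            "enzInt", "lnNumSP", "dM", "absdM"]
RAW_WEIGHTS = [-0.2, 0.786, 6.9, 0.788, 0.000248, -0.179, 0.00117,
               -0.0943, 1.34, 0.0606, -0.117, 2.56, 2.52,
               -1.15, 0.585, 0.348, -0.347]
M0 = -14.7


def linear_score(psm: Dict[str, float]) -> float:
    """Weighted sum of the feature values plus the bias (assumed linear model)."""
    return M0 + sum(w * psm[name] for name, w in zip(FEATURES, RAW_WEIGHTS))


# Hypothetical PSM: all features zero, then vary one feature at a time.
base = {name: 0.0 for name in FEATURES}
higher_xcorr = dict(base, Xcorr=1.0)
larger_mass_error = dict(base, absdM=1.0)

print(linear_score(higher_xcorr) > linear_score(base))       # True
print(linear_score(larger_mass_error) < linear_score(base))  # True
```

Under this model, increasing a feature with a positive weight raises the score and increasing one with a negative weight lowers it. Because some features are related (for example dM and absdM), reading a single weight in isolation is only a rough guide.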
Last edited by percolator, Tue Jun 16 15:42:19 -0700 2009