ProteinCenter User Manual
Chapter 13. Protein data view

13.1. The ProteinData view

There are four different levels at which protein data can be explored. These are accessed through the Peptides, ProteinData, Proteins and Clusters panes in the workbench major module.

The protein data view is the basic level, where each line corresponds to a protein with supplementary data/measurements. In this view (see below) there may be multiple lines for the same protein when multiple measurements exist for that protein. The supplementary data are the data that can be imported with the protein accession keys. Refer to Chapter 21, Import for more details on importing data, and to Appendix B, Supplementary Data for more details on defined supplementary data.

In the Proteins view, the supplementary data are summaries for each protein.

In the Clusters view, sets of proteins can be grouped together and data are summaries for the cluster.

Note that when dataset folders are created from the lookup or basket, no supplementary data exist and thus columns 8-23 will not be shown. For imported data, only the relevant columns (those for which data exist) are shown.

Below is a short description of the most common columns in the protein data view. To manually select which columns to display, see Section 5.4.5, “Selecting which columns to display”.

Figure 13.1. ProteinData view - Part 1

  1. Acc. Key - preferred accession key

  2. Pep - the number of peptides imported

  3. Description - the general descriptive information stored with the particular protein key

  4. Key - imported accession key

  5. Select protein - check box for selection of protein entries

  6. Add protein to Basket - click the basket to add this protein to the basket (see Chapter 8, The Protein Basket)

  7. Cluster - the cluster identifier

  8. Figure 13.2. ProteinData view - Part 2

  9. L1 - Generic Link

  10. PP - Protein Probability

  11. AQR - Average Quantitative Ratio

  12. E - Error for average quantitative ratio

  13. PV - P-Value for average quantitative ratio

  14. GD1 - Generic Decimal number

  15. NP - Number of Peptides

  16. GI1 - Generic Integer

  17. GI2 - Generic Integer

  18. UI - Unambiguous Identification

  19. IS - Indistinguishable Subgroup

  20. PTM - Post-Translational Modification

  21. GS1 & GS2 - Generic Strings

    Figure 13.3. ProteinData view - Part 3, quantitation data

  22. QR# - Quantitation ratios: QR1-QR3 for iTRAQ 4-plex, QR1-QR7 for iTRAQ 8-plex, and QR2-QR5 for SILAC

  23. QN# - iTRAQ quantitation number of peptides

  24. QSD# - Standard deviation for ratios: QSD1-QSD3 for iTRAQ 4-plex, QSD1-QSD7 for iTRAQ 8-plex, and QSD2-QSD5 for SILAC

    Figure 13.4. ProteinData view - Part 4, emPAI data

  25. Imported emPAI - Exponentially modified protein abundance index. See Ishihama et al., 2005

  26. emPAI calculated by ProteinCenter using the formula defined in Ishihama et al., 2005. The number of experimental ('observed') peptides is the number of unique tryptic peptide sequences for the protein, and the number of theoretical ('observable') peptides is the number of unique peptide sequences found by a theoretical tryptic digestion of the protein sequence. Before this calculation is carried out, the peptides are filtered so that only peptides with masses between 700 and 2800 Da are considered. An incompletely digested peptide is treated as if all peptides resulting from a perfect tryptic digestion of its sequence were observed. A semi- or non-tryptic peptide sequence is considered evidence of its corresponding fully tryptic peptide, if the peptide sequence part passes the mass filter
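
The calculation described in item 26 can be illustrated with a small sketch. In Ishihama et al., 2005, emPAI is defined as 10^(observed/observable) - 1. The code below is not ProteinCenter's implementation; it is a minimal illustration under several assumptions that the manual does not state: trypsin is taken to cleave after K or R except before P, the 700-2800 Da filter is applied to monoisotopic masses of the fully tryptic peptides, and semi- or non-tryptic peptides are not handled. All function names are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the sum of residue masses


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def tryptic_digest(seq):
    """Perfect digestion: cleave after K/R unless followed by P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        is_last = i == len(seq) - 1
        if is_last or (aa in "KR" and seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    return peptides


def filtered_tryptic_set(sequences, lo=700.0, hi=2800.0):
    """Unique fully tryptic peptides from the sequences that pass the mass filter."""
    kept = set()
    for seq in sequences:
        for pep in tryptic_digest(seq):
            if lo <= peptide_mass(pep) <= hi:
                kept.add(pep)
    return kept


def empai(protein_seq, imported_peptides):
    """emPAI = 10 ** (observed / observable) - 1."""
    observable = filtered_tryptic_set([protein_seq])
    # Re-digesting the imported peptides treats a missed-cleavage peptide
    # as evidence for each of its fully tryptic parts.
    observed = filtered_tryptic_set(imported_peptides) & observable
    if not observable:
        return 0.0
    return 10 ** (len(observed) / len(observable)) - 1


if __name__ == "__main__":
    protein = "AAAAAAAAAK" + "GGGGGGGGGGGGR" + "AK" + "WWWWK"
    # One imported peptide with a missed cleavage covers 2 of 3 observable peptides.
    print(empai(protein, ["AAAAAAAAAKGGGGGGGGGGGGR"]))
```

In this toy protein the dipeptide AK falls below 700 Da and is therefore excluded from the observable set, so the ratio in the exponent is 2/3 rather than 2/4.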

Furthermore, there are two optional columns not shown by default, which can be activated by the column selector:

  • Mono. mass - The monoisotopic mass of the protein sequence, including all modifications on the peptides

  • Avg. mass - The average mass of the protein sequence, including all modifications on the peptides

If peptides are imported, the "Pep" column will show the number of unique peptide sequences. Clicking the number opens a view of these peptides, as shown in the following figure:

Figure 13.5. A row in the protein data view expanded to show peptides

Below is a short description of each column:

  1. Click the number of unique peptide sequences to display the detailed peptide information. Click it again to hide it

  2. Peptide - peptide sequence imported. Modified amino acids are displayed in red

  3. Probability - the peptide score given by the search engine

  4. NSP - number of sibling peptides

  5. NSP-Probability - Peptide probability adjusted for number of sibling peptides

  6. Modification - if the peptide was found to be modified, this column contains the name of the modification(s)

  7. Position - if the peptide was found to be modified, this column contains the modification position number(s)

  8. AA - if the peptide was found to be modified, this column contains the modified amino acid(s)

  9. Observed - the observed amino acid mass(es)

  10. Mono. mass - the monoisotopic amino acid mass(es)

  11. Avg. mass - the average amino acid mass(es)

  12. QR1 & QR2 & QR3 - iTRAQ or SILAC quantitation ratios

  13. pI - the predicted isoelectric point of the peptide. Calculated by ProteinCenter on import based on Bjellqvist et al., 1993. The values are calculated from the peptide sequence only; post-translational modifications are not taken into consideration

  14. GRAVY - the grand average of hydropathy value, i.e. the sum of hydropathy values of all the amino acids in the sequence, divided by the number of residues. Calculated by ProteinCenter on import based on Kyte & Doolittle, 1982
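
The GRAVY definition in item 14 (sum of hydropathy values divided by the number of residues) can be sketched as follows. The hydropathy values are those published by Kyte & Doolittle, 1982; the function name `gravy` is hypothetical and this is an illustration of the definition, not ProteinCenter's own code. As with pI, the sketch considers the plain peptide sequence only.

```python
# Kyte & Doolittle (1982) hydropathy index per amino acid.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean hydropathy value over all residues."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    return sum(HYDROPATHY[aa] for aa in sequence) / len(sequence)


if __name__ == "__main__":
    # A hydrophobic peptide gives a positive value, a charged one a negative value.
    print(round(gravy("ILV"), 3))
    print(round(gravy("KDE"), 3))
```

Positive values indicate a predominantly hydrophobic peptide and negative values a predominantly hydrophilic one.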