ProteinCenter User Manual

ProteinCenter Manual

This user manual is for ProteinCenter Release 3.15.

ProteinCenter is a trademark of Thermo Fisher Scientific.

2017


Table of Contents

I. Getting Started & Tutorials
1. Getting Started
1.1. Login
1.2. Looking up a protein
1.3. Protein neighbors
1.4. Looking up a peptide
1.5. Selecting, saving and deleting within a dataset
1.6. Import a dataset and view experimental data, proteins and groups
1.6.1. Import a dataset
1.6.2. View a dataset in the Protein Data view
1.6.3. View a dataset in the Proteins view
1.6.4. View a dataset in the Clusters view
1.7. Filters
1.8. Reports
1.9. Comparison of datasets
1.10. Context-sensitive online help
2. Tutorials
2.1. Single protein bioinformatical analysis
2.1.1. Example – GI:6807647
2.2. Bioinformatical and statistical analysis of a protein dataset
2.2.1. Traditional approach
2.2.2. Using ProteinCenter
2.3. Is the unknown protein really unknown?
2.4. Converting accession numbers
2.5. Bioinformatical analysis of experimental data using peptide clustering
2.6. Comparison of proteomics datasets
2.7. Outdated accession — but is the protein still valid?
2.8. Multiple alignment
II. User manual
3. Introduction
3.1. The four main concepts of ProteinCenter
3.1.1. Nr. 1 - All data and tools under one umbrella
3.1.2. Nr. 2 - Efficient large scale analysis from A-Z
3.1.3. Nr. 3 - Automation
3.1.4. Nr. 4 - A simplistic powerful interface
4. The Protein record
4.1. The basics of a protein record
4.2. Versioning and outdated entries
4.3. Preferred accession
4.3.1. Why a preferred type of protein keys?
4.3.2. How the optimal description is picked automatically
5. Overview of the interface
5.1. Browsing the interface
5.2. The Navigation Menu
5.3. The Lookup, Workspace, Basket switchboard
5.4. General utilities
5.4.1. Status field
5.4.2. Status bar menu
5.4.3. Paging functionality
5.4.4. Sorting columns
5.4.5. Selecting which columns to display
5.4.6. Mouse-over information
6. Login
6.1. Login
7. The Lookup
7.1. Lookup by GI or accession code
7.1.1. Introduction
7.1.2. How to look up a protein key
7.2. Lookup by peptide
7.2.1. Introduction
7.2.2. How to look up proteins by tryptic peptides
7.3. Lookup by gene symbol
7.3.1. Introduction
7.3.2. How to look up proteins by gene symbols
7.4. Lookup by annotation
7.4.1. Introduction
7.4.2. How to look up proteins by annotation
8. The Protein Basket
9. The Workspace
9.1. Description of the Workspace
9.1.1. Workspace controls
9.1.2. Folder tree
9.1.3. How to merge datasets
9.1.4. Searching the workspace
10. The ProteinCard
10.1. Introducing the ProteinCard
10.2. ProteinCard summary sections
10.2.1. Keys details
10.2.2. Peptides details
10.2.3. Sequence details
10.2.4. Similar proteins details
10.2.5. Features details
10.2.6. Molecular Functions details
10.2.7. Cellular Components details
10.2.8. Biological Processes details
10.2.9. Interactions & Pathways details
10.2.10. Diseases details
10.2.11. External links
11. Filters
11.1. Introduction to filters
11.2. How to use filters
11.3. Categories of filters
11.3.1. Accession Key
11.3.2. Alternatively Spliced Gene
11.3.3. Cluster Anchor
11.3.4. Cluster Size
11.3.5. Disease
11.3.6. Enzyme
11.3.7. Gene Id
11.3.8. Alternative Gene Key
11.3.9. Gene Symbols
11.3.10. Gene Official Symbol
11.3.11. Gene Summary
11.3.12. Chromosome
11.3.13. GO Cellular Component
11.3.14. GO Molecular Function
11.3.15. GO Biological Process
11.3.16. GO ID
11.3.17. GO Source
11.3.18. GO Evidence Code
11.3.19. InterPro Accession
11.3.20. InterPro Description
11.3.21. Interaction Source
11.3.22. Outdated
11.3.23. Pathway
11.3.24. Peptide Modification
11.3.25. Peptide Modification (Unannotated)
11.3.26. Peptide Length
11.3.27. Peptide Sequence
11.3.28. Peptide Unique Count
11.3.29. PFAM Accession
11.3.30. PFAM Description
11.3.31. Post Translational Modification
11.3.32. Protein Description
11.3.33. Protein Function
11.3.34. Protein Keyword
11.3.35. Protein Sequence
11.3.36. Protein Sequence Length
11.3.37. Signal
11.3.38. Source Database
11.3.39. Taxonomy
11.3.40. Transmembrane Count
11.3.41. Supplementary data
12. Peptides view
12.1. The peptides view
13. Protein data view
13.1. The ProteinData view
14. Proteins view
14.1. The proteins view
15. Genes View
15.1. The genes view
16. Clusters View
16.1. The clusters view
16.2. Introduction to clustering
16.3. Types of clustering
16.3.1. Clustering based on sequence similarity
16.3.2. Clustering based on peptide sharing
16.4. The biological significance of clustering levels
16.5. The many advantages to clustering
16.5.1. Clustering reduces the complexity of analysis
16.5.2. Clustering ensures better annotation coverage
16.5.3. Clustering reduces redundancy
16.5.4. Grouping related proteins
16.5.5. Grouping of alleles, fragments, isoforms
16.5.6. Dataset comparisons
16.5.7. Peptide sharing
16.6. How to cluster datasets
16.6.1. Clustering a dataset
16.6.2. How to cluster different parts of a dataset at different clustering levels
16.6.3. Using preferred data from a particular species as anchor
16.6.4. How to uncluster a dataset
16.6.5. How to use imported clusters
16.6.6. Different clustering algorithms
17. Profiling View
17.1. Profiling
17.1.1. The profiling view
17.1.2. Profiling algorithm details
17.2. How to profile datasets
17.2.1. Profiling a comparison or dataset
18. Heat Maps view
18.1. Heat maps
18.1.1. The heat map view
18.2. How to generate a heat map
18.2.1. Generating a heat map for a comparison or dataset
19. Dataset comparison
19.1. How to compare datasets
19.1.1. Choose datasets for a comparative analysis
19.1.2. Comparing datasets
19.1.3. Compare clustered datasets
20. The Alignment Viewer
20.1. Overview mode
20.2. Sequence information view
20.3. Comparing datasets mode
20.4. Alignment method
20.4.1. Alignment anchor selection
21. Import
21.1. How to import data
21.1.1. A note on ambiguous protein keys
21.2. Importing data from CSV files
21.2.1. A simple list of proteins
21.2.2. A list of proteins with peptides
21.2.3. Peptide modifications
21.2.4. Grouped proteins
21.2.5. Taxonomy limitation
21.2.6. Proteins from a list of gene identifiers
21.2.7. Quantitation
21.2.8. An example using more of the available columns
21.2.9. Including URLs and file links
21.2.10. Regional settings and the CSV format
21.2.11. Handling of incorrect values
21.3. Importing data from other software using CSV files
21.3.1. ProteinPilot from Applied BioSystems
21.3.2. MaxQuant from the Max Planck Institute of Biochemistry
21.3.3. Spectrum Mill from Agilent
21.3.4. ProteinLynx Global Server from Waters
21.3.5. VEMS III software from SDU
21.3.6. The Elucidator from Rosetta Biosoftware
21.3.7. ProteinProphet (Trans-Proteomic Pipeline) from ISB
21.3.8. Sorcerer from SageNResearch
21.4. Importing data from XML-based formats
21.4.1. ProtXML
21.4.2. Mascot XML
21.4.3. X! Tandem XML
21.4.4. PRIDE XML
21.4.5. BioWorks XML
21.4.6. Quantitation support for XML formats
21.5. Other formats
21.5.1. MSQuant
21.6. Direct upload from other applications
21.6.1. Phenyx
21.7. Identification of proteins and peptides
22. μLIMS
22.1. Editing link & text annotation
22.2. Permissions
23. Export
23.1. How to export data
23.1.1. Exporting the contents of the basket
23.2. Data formats for exported data
23.2.1. Proteins CSV / Protein Data CSV formats
23.2.2. Protein Genes CSV format
23.2.3. Genes CSV / Gene Data CSV formats
23.2.4. Peptides CSV format
23.2.5. Protein FASTA format
23.2.6. Protein interactions format
23.3. Using exported data
23.3.1. Analyzing interaction networks in Cytoscape
24. Reports
24.1. How to create reports
24.2. The Proteins report
24.3. The GO Slim reports
25. Statistics
25.1. The Statistics view
25.2. Introduction to the Statistics view
25.2.1. Summary view
25.2.2. Details view
25.3. Types of statistics
25.3.1. Statistics based on all data
25.3.2. Statistics based on selected data
25.3.3. Statistics based on cluster anchors
25.4. How to calculate statistics
25.4.1. How to calculate statistics for a single dataset
25.4.2. How to calculate statistics for comparison datasets
25.4.3. Significance statistics
25.4.4. Significance statistics theory
25.5. How to copy images from ProteinCenter
25.5.1. How to use images from ProteinCenter in MS PowerPoint
25.5.2. How to use images from ProteinCenter in Adobe Photoshop
26. Vocabulary
27. FAQ
28. Known issues
28.1. Known bugs
28.1.1. Attempt to import non existing file in Internet Explorer
28.1.2. Graphics not displayed after lengthy operations
29. References
III. Version history
30. Version History for ProteinCenter
30.1. Updates in ProteinCenter 3.15
30.2. Updates in ProteinCenter 3.14
30.3. Updates in ProteinCenter 3.13
30.4. Updates in ProteinCenter 3.12
30.5. Updates in ProteinCenter 3.11
30.6. Updates in ProteinCenter 3.10
30.7. Updates in ProteinCenter 3.9
30.8. Updates in ProteinCenter 3.8
30.9. Updates in ProteinCenter 3.7
30.10. Updates in ProteinCenter 3.6
30.11. Updates in ProteinCenter 3.5
30.12. Updates in ProteinCenter 3.4
30.13. Updates in ProteinCenter 3.3
30.14. Updates in ProteinCenter 3.2
30.15. Updates in ProteinCenter 3.1
30.16. Updates in ProteinCenter 3.0
30.17. Updates in ProteinCenter 2.8
30.18. Updates in ProteinCenter 2.7
30.19. Updates in ProteinCenter 2.6
30.20. Updates in ProteinCenter 2.5
30.21. Updates in ProteinCenter 2.2
30.22. Updates in ProteinCenter 2.0
30.23. Updates in ProteinCenter 1.4
30.24. Updates in ProteinCenter 1.3
30.25. Updates in ProteinCenter 1.2
30.26. Updates in ProteinCenter 1.1
IV. Administration Manual
31. Installation
31.1. Server Installation
31.1.1. Linux
31.1.2. DNS
32. Administrative Settings
32.1. License
32.1.1. Add or Upgrade License
32.1.2. License Information
32.2. Users
32.2.1. User State
32.2.2. Authentication method
32.2.3. Users and Roles
32.2.4. Add user
32.3. Communication
32.3.1. HTTP proxy server
32.4. Update
32.4.1. Field explanation
33. Content Update
34. System Maintenance
34.1. Workspace
34.2. MySQL
34.2.1. Start and stop
34.3. TomEE
34.3.1. Start and stop
V. Appendices
A. Data Sources
A.1. Protein Data Sources
A.2. Gene Data Sources
A.3.
B. Supplementary Data
B.1. Protein supplementary data
B.1.1. Calculated emPAI
B.2. Gene supplementary data
B.3. Peptide supplementary data
C. Abbreviations for commonly used organisms
D. GOslim Categories
D.1. GeneOntology
D.1.1. Cellular component
D.1.2. Molecular function
D.1.3. Biological process

List of Figures

1.1. Login menu
1.2. Import pane
2.1. Proteins view for the comparison.
2.2. Initial view of the Clusters pane.
2.3. Clustering by shared peptides
2.4. The clustered data
2.5. Viewing a single cluster
2.6. Viewing the peptides in the ProteinCard
2.7. Viewing only what the two datasets have in common
2.8. Excluding clusters with proteins from a particular set
2.9. Displaying the comparison in the Proteins view
2.10. Setting up the clustering
2.11. The result of the clustering
2.12. Clusters containing proteins from all sets
2.13. Showing clusters not containing plasma proteins (Actual clusters not displayed)
2.14. Editing filter definitions
2.15. Choosing extracellular
2.16. The filter has been saved
2.17. Filtering result
2.18. Calculating statistics for all datasets
2.19. Venn diagram for two datasets
2.20. Venn diagram for three datasets
2.21. Statistics for GO Slim Molecular Function (Truncated)
2.22. Clustering the proteins in the set according to shared peptide evidence
2.23. The result of the clustering
3.1. ProteinCenter transforms lists of experimentally identified proteins to useful biological information
3.2. ProteinCenter user interface components
4.1. Some less informative entries in GenBank
4.2. Multiple keys and descriptions for the same protein entry
5.1. The Navigation menu - major modules in the top line and minor in the lower line
5.2. The switchboard for switching between the three ways of working with data.
5.3. The status field
5.4. The status bar menu
5.5. The Settings page
5.6. The About box
5.7. The Data Release Statistics page
5.8. The paging functionality
5.9. The column selection drop-down menu
6.1. Login menu
7.1. The Lookup Component
8.1. The Basket
9.1. The Workspace
10.1. ProteinCard for protein with UniProt accession P06213
10.2. Revision history at NCBI for GI 34867630
10.3. Flagging of a dead protein
11.1. The Filter menu
11.2. The Filter definition menu
11.3. Filtering by peptide modifications
11.4. Filtering by peptide modifications
12.1. Peptides view
12.2. Data handling functions
13.1. ProteinData view - Part 1
13.2. ProteinData view - Part 2
13.3. ProteinData view - Part 3, quantitation data
13.4. ProteinData view - Part 4, emPAI data
13.5. A row in the protein data view expanded to show peptides
14.1. The protein view (truncated to the right)
14.2. Expanding protein data inside the protein view
15.1. The Genes view
16.1. The Clusters view
16.2. The blast alignment between two proteins in a cluster
16.3. Clustering indistinguishable proteins
16.4. The cluster menu
17.1. The profiling view
17.2. The profiling menu
18.1. The heat map view
18.2. Example of quantitation coloring on a KEGG pathway map
18.3. The heat map menu
20.1. The Cluster view with alignment link
20.2. The Alignment view of selected protein cluster
20.3. The Alignment view showing sequence information
20.4. The Alignment view scrollbar clicked
20.5. The Alignment view when comparing datasets
21.1. Import pane
21.2. Selecting the destination category for the imported dataset
21.3. Selecting a file for import
21.4. Upon import, selecting a dataset and clicking the μLIMS pane, detailed information about the dataset will be displayed
21.5. An imported file results in a new data folder
21.6. The spreadsheet format for a dataset consisting of protein keys
21.7. The save as menu in Excel
21.8. The flat format in the CSV file
21.9. Excel view of an import file using many of the available columns
21.10. Exporting the Mascot Search Results from the Mascot application
21.11. Specifying what to include in the exported file
21.12.
21.13. Exporting the search results
21.14. Saving the X! Tandem results as XML
21.15. Selecting the Protein Information display in BioWorks 3.3
21.16. Exporting to XML from BioWorks 3.3
21.17. The Proteins Overview page in the Phenyx application
21.18. The uploaded proteins displayed in ProteinCenter
21.19. Bringing up the ProteinCard for a particular protein in ProteinCenter
22.1. The μLIMS page in view mode
22.2. The μLIMS page in edit mode
22.3. The μLIMS permissions section
23.1. The Export Pane
24.1. The General Info page
24.2. The proteins report
25.1. The Statistics View for a single dataset ('TestSet')
25.2. Details for General Information
25.3. Details for GO Cellular Components
25.4. Calculating statistics for a single folder
25.5.
25.6. Calculating statistics for comparison
25.7. Statistics for the comparison of YeastExp1 and YeastRef
25.8. Over- and under-represented Molecular functions in YeastExp1 compared to YeastRef
32.1. Administration pane
32.2. Users pane
32.3. Communications pane
32.4. Update information icon
32.5. Update pane

List of Tables

16.1. Members of a cluster do not have to be similar - they only have to be similar to the anchor
16.2. Types of proteins that are grouped, based on clustering level
19.1. Comparing datasets
21.1. Legal column headers for all protein based CSV importers
21.2. Sequence encoded peptide modifications supported in the CSV format
21.3. Legal column headers for Genes proteins CSV importer
21.4. Data imported from ProteinPilot protein summary files
21.5. Data imported from ProteinPilot peptide summary files
21.6. Data imported from MaxQuant protein group files
21.7. Data imported from MaxQuant peptide files
21.8. Data imported using the Spectrum Mill format
21.9. Data imported using the ProteinLynx format
21.10. Data headers in ProteinProphet and ProteinCenter
21.11. Primary measure of identification for proteins and peptides
23.1. Data description for proteins and protein data CSV export formats
23.2. Data description for gene-centric CSV export format, where protein annotations are aggregated onto genes
23.3. Data description for gene-centric CSV export format, where genes and their annotations originate from gene data (not aggregated from proteins)
23.4. Data description for peptide-centric CSV export format
26.1. Vocabulary
32.1. User values
32.2. HTTP proxy settings
32.3. Update information
A.1. Protein Data Sources
A.2. Gene Data Sources
A.3. Description of the Most Prominent Data Sources in ProteinCenter
B.1. Protein supplementary data
B.2. Gene supplementary data
B.3. Peptide supplementary data
C.1. Abbreviations for commonly used organisms
D.1. GOslim categories for Cellular Component
D.2. GOslim categories for Molecular Function
D.3. GOslim categories for Biological Process