Every protein in ProteinCenter has a ProteinCard which provides concise, precise and focused annotation. All information is displayed in a single screen, to provide an instant overview without the need to scroll down pages. This is achieved by providing sets of summary information, and for each category of information the details can be displayed on the same page by a single click.
The ProteinCard is split into a summary view (the upper part, framed in green in the figure below) and a details view (the lower part, framed in red). The summary view is in turn split into a number of summary sections, for which more detailed information may be displayed in the details view. The ProteinCard in the figure can be displayed in ProteinCenter by entering an accession key (e.g. P06213) in the lookup (see Chapter 7, The Lookup), selecting a match from the list and (if necessary) navigating to the ProteinCard by selecting the ProteinCard pane marked by 1 in the figure.
In the following list the different summary sections are described:
The ProteinCard pane.
A page counter allowing navigation to the next or the previous protein in a dataset. In this example the dataset contains only 1 protein, so the selected protein is number 1 of 1. In case the protein being looked at is not in your dataset this section can also display an accession key or a series of accession keys. If, for example, you click the key of a protein in the similar proteins list (e.g. "Acc1") and then again click on a key ("Acc2") in that protein's list of similar proteins, "Acc1->Acc2" will be displayed.
The basket icon, for adding the viewed protein to the basket (for details see Chapter 8, The Protein Basket).
The full workspace path to the dataset from which the protein was selected.
Official symbols (in bold), alternative symbols and the official name for the gene.
Species name (linked to the NCBI taxonomy browser), chromosome number and chromosomal location. The icon indicates that the gene is spliced.
Keys - Includes an accession key from each of the sources included in ProteinCenter, if any such exists for the given protein. Click Keys to retrieve all accession keys for the protein in the details view as described in Section 10.2.1, “Keys details”.
Peptides - The overview shows the number of experimental peptides on the protein with the number of unique peptides in parentheses (in case of unmatched peptides, the number of unique peptides is split into matched and unmatched), followed by the protein coverage percentage. Note that if you look up a protein, it will not have any peptides; in this example, however, the protein has been selected from a dataset which contains peptides. This particular protein has seven peptides, of which only four are unique. Click the header to see peptide details as described in Section 10.2.2, “Peptides details”.
Sequence - The summary includes only the sequence length. Click header to see sequence details as described in Section 10.2.3, “Sequence details”.
Similar proteins - Click the header to see "similar proteins", i.e. proteins whose sequences are homologous to that of the given protein. The search uses post-processed BLAST jobs, so it may take a considerable amount of time when run against the entire ProteinCenter sequence database. The results are presented with functional annotation as in the proteins view, and include links to detailed alignments as described in Section 10.2.4, “Similar proteins details”. In the text box, a specific similarity threshold (60-100%) can be applied to limit the number of similar proteins shown. The "Lookup similar proteins" button to the right retrieves annotation from the similar proteins, which is filled into categories 13-17 in this list, further enriching the information content for the protein. When the similar proteins have been retrieved, their number is displayed after the Similar Proteins button (see Section 10.2.4, “Similar proteins details”).
Features - A graphical overview of the sequence features. Different features can be removed from the view. The number of features is listed. The details view lists the features - see Section 10.2.5, “Features details”.
Gene & Protein summary - An abbreviation of the textual information registered for the protein and its corresponding gene. Click the header to display the full text in the details view.
Molecular Functions - Summarized Gene ontology annotation and IUBMB Enzyme Nomenclature - click header to get detailed GO information as described in Section 10.2.6, “Molecular Functions details”.
Cellular Components - Summarized Gene ontology annotation for subcellular localization - click header to get detailed GO information as described in Section 10.2.7, “Cellular Components details”.
Biological Processes - Summarized Gene ontology annotation - click header to get detailed GO information as described in Section 10.2.8, “Biological Processes details”.
Interactions & Pathways - Number of annotated protein interactions and pathways in which the gene is involved. Click header to see details as described in Section 10.2.9, “Interactions & Pathways details”.
Diseases - Disease annotation from UniProt. Click header to see details as described in Section 10.2.10, “Diseases details”.
External links - A set of external links - as described in Section 10.2.11, “External links”.
Details - Displays detailed annotation for the chosen category of annotation, in the given example Gene & Protein Summary.
More detailed information on the details view for the individual sections is found in the subsections of this chapter.
The following section shows examples of the detailed information for the summarized parts.
The gene and protein function summary is shown as default.
All accession keys for a given protein are summarized.
Click the blue header to display the details of the keys. In this summary section the best choice of accession key for each source is shown if such exists for the given protein. In this example only a GI and a UniProt accession exist. The type shown in the proteins view is configured on a per-user basis - see Section 4.3, “Preferred accession”.
The best choice of a representative GI from NCBI. The details of how the identifier is chosen are explained in Section 4.3.2, “How the optimal description is picked automatically”.
The best choice of a UniProt accession.
Primary key - the accession key from the database from which the sequence was imported. It is linked to the original database record in the source database. The preferred type of accession (in this case NRDB) is emphasized.
Src - An abbreviation of the source database - for details see Section 4.3, “Preferred accession”.
Secondary key - secondary accession keys are either:
Alternative keys used in the source database.
Reference from another database, where this key acts as primary identifier (e.g. an IPI key originally deduced from an Ensembl record would list that Ensembl identifier as a secondary key).
Source for the secondary key.
Description - the original description for the original database entry.
Outdated protein keys are flagged by the exclamation mark sign, and the keys are linked to the outdating history (at their respective source database) as explained in the following section. In this case we see that the protein has an outdated IPI source.
Keep in mind that even though a single accession key is outdated as in the given example, it does not necessarily imply that the protein does not exist - other entries may provide evidence for the protein.
Also keep in mind that on the ProteinCard, a protein is considered a specific amino acid sequence for a given species - see Chapter 4, The Protein record.
The tracking of outdated entries can be very valuable in the analysis of old datasets. There is a very large turnover of accession numbers, especially those associated with predicted sequences, which has resulted in the outdating of more than 90 million source records. Therefore, a protein key used to store a protein identification from an experiment may quickly become outdated - and in certain cases, e.g. at GenBank, the outdated entry may be removed completely, making the accession number worthless unless the sequence has been stored, too. By tracking the outdated accession numbers, old datasets may still be analyzed and compared to new datasets.
In ProteinCenter™ protein keys are flagged as outdated when they no longer appear in the source database (because they have been outdated or replaced). Outdated proteins are linked to revision history at the original sources.
For a given protein record in ProteinCenter™, a number of associated accession numbers may have been outdated, but as long as there is just one live accession number the record is still "live". I.e. a protein record is not dead just because one particular external database considers it a wrong entry, as long as another external database considers the protein sequence to be a valid entry.
A protein record in ProteinCenter™ will be considered dead if all accession numbers are outdated. ProteinCenter™ will however maintain the basic information about the dead protein. Dead proteins are flagged wherever they occur using the exclamation mark sign:
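The live/dead rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not ProteinCenter's implementation: the `Accession` class, its field names and the placeholder keys are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Accession:
    """Hypothetical container for one accession key of a protein record."""
    key: str        # accession key as stored by the source database
    source: str     # abbreviation of the source database
    outdated: bool  # True when the key no longer appears in its source


def is_live(accessions: List[Accession]) -> bool:
    """A record is live while at least one accession is not outdated."""
    return any(not acc.outdated for acc in accessions)


def is_dead(accessions: List[Accession]) -> bool:
    """A record is dead when every one of its accessions is outdated."""
    return len(accessions) > 0 and all(acc.outdated for acc in accessions)


# Placeholder keys and sources, not real accessions.
record = [
    Accession("KEY_A", "SRC1", outdated=True),
    Accession("KEY_B", "SRC2", outdated=False),
]
print("live" if is_live(record) else "dead")
```

A record with one outdated and one live key is still reported as live, which matches the rule that a single outdated accession does not invalidate the protein.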
Although ProteinCenter strives to incorporate all existing and outdated accessions, the set of outdated accessions is not guaranteed to be complete. When encountering accessions (from the supported source databases) that are not found in the system, we encourage users to send them to:
The peptide overview section shows the total number of peptides and the number of unique peptides for the selected protein (in case of unmatched peptides, the number of unique peptides is split into matched and unmatched), followed by the protein coverage percentage. Note that the section is not displayed for proteins that do not have any peptides.
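As a rough illustration of the three numbers in the peptide overview, the sketch below computes them from peptide positions. It rests on two assumptions that this chapter does not spell out: that "unique" means distinct start-end positions (peptides at the same position differing only in their modifications count once), and that coverage is the share of residues covered by at least one peptide. The function name and the input format are hypothetical.

```python
from typing import List, Tuple


def peptide_overview(protein_length: int,
                     peptides: List[Tuple[int, int]]) -> Tuple[int, int, float]:
    """Return (total peptides, unique peptides, coverage in percent).

    peptides holds 1-based, inclusive (start, end) positions.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")

    # Assumption: peptides sharing the same position count as one unique peptide.
    unique_positions = set(peptides)

    # Assumption: a residue is covered if at least one peptide spans it.
    covered = set()
    for start, end in unique_positions:
        covered.update(range(start, end + 1))

    coverage = 100.0 * len(covered) / protein_length
    return len(peptides), len(unique_positions), coverage


# Made-up positions on a made-up protein of 200 residues.
total, unique, coverage = peptide_overview(200, [(11, 20), (11, 20), (15, 30)])
print(f"{total} peptides ({unique} unique), coverage {coverage:.1f}%")
```

Overlapping peptides are counted only once per residue, so the coverage never exceeds 100%.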
In this example the protein has seven peptides, four of them unique, and a protein coverage of 2.6%. When clicking the Peptides header, the details view will display the sequence with the first peptide (sorted by position in the protein sequence) as shown in the following figure:
The peptide details view contains an overview of the sequence with the possibility to zoom in on areas around peptides. Referring to the figure we see:
A list of the peptides for the protein denoted by their start and end positions in the protein sequence. In this example there are four peptides at position 1182-1191. Peptides at the same position will have different modifications. Click on one of the peptides and the system will zoom in on that peptide. Each dataset has its own color. This is very useful when comparing datasets.
The position of the first peptide.
The peptide sequence can be seen on the zoomed-in view of the sequence (corresponding to the frame around the first peptide above). Modifications are shown as white letters: 'P' for phosphorylations, 'M' for any other modification, and 'X' for multiple modifications on one site. N- and C-terminal modifications are shown on the first and last amino acid respectively.
Glyco and Phospho modifications (PTMs) are shown on the whole sequence as Gs and Ps.
Other peptides found in the sequence. Choosing the 4th peptide (by clicking 1182-1191) moves the zoomed-in focus to that peptide as shown in the following figure. Peptides can be selected by clicking on the colored spots in the sequence overview.
The chosen peptide is highlighted (it has a more solid color and a black box around it).
Details on the peptide are the dataset name (if comparison dataset), the peptide sequence, the position in the protein sequence, the initial probability score and (if any) the number of sibling peptides.
The peptide modifications, including mass and position.
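The letter used to mark a modified residue in the zoomed-in view ('P', 'M' or 'X', as described for the first figure) follows a simple rule that can be sketched as below. The function name, the use of plain strings for modification names and the detection of phosphorylations by a "phospho" name prefix are assumptions made for this example only.

```python
from typing import List


def modification_symbol(modifications: List[str]) -> str:
    """Return the letter drawn on a residue, or '' if the residue is unmodified.

    modifications lists the names of all modifications on one site.
    """
    if not modifications:
        return ""
    if len(modifications) > 1:
        return "X"  # multiple modifications on one site
    # Assumption: phosphorylations are recognizable by their name.
    if modifications[0].lower().startswith("phospho"):
        return "P"
    return "M"      # any other single modification


for mods in ([], ["Phospho"], ["Oxidation"], ["Phospho", "Oxidation"]):
    print(mods, "->", repr(modification_symbol(mods)))
```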
The sequence details provide the full protein sequence.
Click the blue sequence header to display the sequence in the details view:
The length of the protein sequence (in amino acids).
The monoisotopic and average molecular weight of the protein sequence.
The sequence in GenBank format.
The protein sequence in FASTA format for use in other tools.
The nucleotide sequences can be found by following links to genome browsers or the NT link (when available) in the external link menu described in Section 10.2.11, “External links”.
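For readers who want to reproduce the two molecular weights outside ProteinCenter, the sketch below sums standard amino acid residue masses and adds one water molecule. How ProteinCenter itself computes the values is not stated in this chapter, so small differences are possible; the residue masses are rounded reference values, modifications are ignored, and the function name is hypothetical.

```python
# Monoisotopic and average residue masses (Da) of the 20 standard amino acids.
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
AVERAGE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MONOISOTOPIC = 18.01056
WATER_AVERAGE = 18.01528


def molecular_weight(sequence: str, monoisotopic: bool = True) -> float:
    """Sum the residue masses of an unmodified sequence and add one water."""
    table = MONOISOTOPIC if monoisotopic else AVERAGE
    water = WATER_MONOISOTOPIC if monoisotopic else WATER_AVERAGE
    try:
        residues = sum(table[aa] for aa in sequence.upper())
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None
    return residues + water


# The FASTA sequence copied from the details view can be passed in directly
# once the header line and line breaks are removed.
print(round(molecular_weight("GA"), 4))
print(round(molecular_weight("GA", monoisotopic=False), 4))
```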
The Similar Proteins view provides detailed information about homologous proteins.
Clicking the blue header will display the details of the latest similar proteins action in the detail view.
A field for entering a sequence similarity threshold (at least 60%). When leaving the field blank a threshold of 98% will be assumed.
Click the icon to fetch the similar (homologous) proteins.
The number of similar proteins (in this case 25) will be shown in the overview. In addition, this action will fill in annotations from the similar proteins in the relevant annotation fields:
The details view for similar proteins shows every protein in the format resembling the general Proteins view. It is also possible to select all, deselect all, delete from list, and create a new dataset from the similar proteins.
Click the Accession Key to go to the ProteinCard for this protein.
Click the percent sequence similarity to see a pairwise alignment. The alignment will open in a new window, which can be closed without leaving ProteinCenter. The similarity is given as the percentage of identical residues over the complete length of the shorter of the two proteins; hence the local BLAST alignment may show higher similarity for only part of the sequence. Note that only a part of the alignment is shown in the figure.
Dataset - this column is highlighted for proteins that are also found in the original dataset/lookup, from which the protein was derived. In this particular case, none of the similar proteins are in the selected dataset.
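The similarity percentage and the threshold field described in this section can be sketched as follows. The example assumes that a gapped pairwise alignment of the two sequences is already available (with '-' as the gap character); the function names are hypothetical, and the code only mirrors the rules stated in the text: identical residues divided by the length of the shorter protein, a permitted threshold range of 60-100%, and a default of 98% when the field is left blank.

```python
def percent_similarity(aligned_a: str, aligned_b: str) -> float:
    """Identical residues as a percentage of the shorter protein's length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    # Ungapped lengths of the two proteins.
    length_a = len(aligned_a.replace("-", ""))
    length_b = len(aligned_b.replace("-", ""))
    shorter = min(length_a, length_b)
    if shorter == 0:
        raise ValueError("sequences must not be empty")
    return 100.0 * identical / shorter


def parse_threshold(field_text: str) -> float:
    """Interpret the threshold field: blank means 98%, valid range is 60-100%."""
    text = field_text.strip().rstrip("%")
    if not text:
        return 98.0
    value = float(text)
    if not 60.0 <= value <= 100.0:
        raise ValueError("threshold must be between 60 and 100")
    return value


# Toy alignment: the second protein lacks one residue but is otherwise identical.
similarity = percent_similarity("ACDEFG", "ACD-FG")
print(similarity, similarity >= parse_threshold(""))
```

Because the denominator is the shorter protein, a fragment that matches a longer protein perfectly still scores 100%, which is why the value can differ from the identity reported for a local BLAST alignment.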
Sequence features consist of a selection of sequence features from UniProt, from various conserved domain predictions and from the computational enrichment undertaken by ProteinCenter.
The features overview is a graphical overview of the protein features.
The different types of features can be removed from the view.
Mouse over will provide additional info about the feature. Clicking on the feature will open the source web page for that feature.
A textual representation of the features will be shown in the details view when clicking the Features header:
The details view shows the features sorted according to their start positions in the protein sequence. Note that the figure only shows some of the features - the overview reveals that there are 94 features.
The source of the feature:
The category - depending on source:
UniProt - Key names for features from UniProt (see UniProt documentation online).
InterPro - Identifier from the original method used. The various methods can be derived from the prefix of the key, e.g. PS99999 for ProSite, PR99999 for Prints, PF99999 for PFAM (see InterPro documentation for details).
Tmap - Transmembrane prediction.
From - amino acid start position.
To - amino acid end position.
Accession identifier for the domain linked to InterPro or PFAM.
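The prefix rule for InterPro category keys mentioned above can be expressed as a small lookup. Only the three prefixes named in this section are included; InterPro integrates further methods, for which the InterPro documentation should be consulted. The function name is hypothetical and the keys are the placeholder keys used in the text.

```python
# Prefixes taken from the examples in this section; not an exhaustive list.
PREFIX_TO_METHOD = {
    "PS": "ProSite",
    "PR": "Prints",
    "PF": "PFAM",
}


def method_for_key(key: str) -> str:
    """Derive the originating method from the two-letter prefix of a key."""
    return PREFIX_TO_METHOD.get(key[:2].upper(), "unknown")


for key in ("PS99999", "PR99999", "PF99999"):
    print(key, "->", method_for_key(key))
```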
The information in the summary is a consolidation of Gene Ontology data and EC information - Please refer to GOslim Appendix Section D.1.2, “Molecular function” for details.
If similar proteins have been retrieved (see Section 10.2.4, “Similar proteins details” ), annotation for these are also displayed - those only occurring in similar proteins are shown in a normal font (not bold). The number in parentheses is the count of similar proteins that the given trait is found in.
Those in bold are annotations that are directly associated with the selected protein.
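The way the summary combines annotation of the selected protein with annotation from similar proteins can be sketched as below. The function name, the use of sets of term names and the example terms are assumptions for illustration; the sketch only reproduces the display rule described above (direct annotations in bold, and a count of the similar proteins carrying each trait).

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def summarize_annotations(direct: Set[str],
                          similar: Iterable[Set[str]]) -> List[Tuple[str, bool, int]]:
    """Return (term, is_direct, similar_count) for every term to display.

    direct  - terms annotated on the selected protein (shown in bold)
    similar - one set of terms per retrieved similar protein
    """
    counts: Counter = Counter()
    for terms in similar:
        counts.update(set(terms))  # each similar protein counts once per term
    all_terms = sorted(direct | set(counts))
    return [(term, term in direct, counts[term]) for term in all_terms]


# Illustrative terms only.
summary = summarize_annotations(
    {"kinase activity"},
    [{"kinase activity", "ATP binding"}, {"ATP binding"}],
)
for term, is_direct, count in summary:
    style = "bold" if is_direct else "normal"
    print(f"{term} ({count}) [{style}]")
```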
In the details panel, the individual (not summarized) GO terms, IDs and names are shown.
The GO ID is linked to the QuickGO browser at EBI.
Evidence codes for GO annotation. Use mouse-over to get the non-abbreviated evidence code. More info at the Gene Ontology consortium.
List of PMID links to PubMed.
Go Slim indicates the basic GO Slim category for the GO term.
The name is the description for a GO term - created by the Gene Ontology consortium.
Enzymes with EC number for IUBMB Enzyme Nomenclature are displayed with links to detailed information at the International Union of Biochemistry and Molecular Biology.
The Cellular Components section works exactly as the Molecular Functions section.
The information in the summary is a consolidation of Gene Ontology data - Please refer to GOslim Appendix Section D.1.1, “Cellular component” for details.
As with Molecular Functions, similar protein annotation may be retrieved - see above.
The details are outlined as for Molecular Functions - see above.
The Biological Processes section works exactly as the Molecular Functions and the Cellular Components sections.
The information in the summary is a consolidation of Gene Ontology data - Please refer to GOslim Appendix Section D.1.3, “Biological process” for details.
As with Molecular Functions, similar protein annotation may be retrieved - see above.
The details are outlined as for Molecular Functions and Cellular Components - see above.
The Interactions & Pathways summary includes information about KEGG, UniProt, Wiki and Reactome pathways, as well as the number of interactions (IntAct, MIPS, STRING) the protein is involved in:
Note: KEGG pathways are only shown if the license grants that right.
The details pane is a list of interacting proteins similar to the list in the Proteins pane - see Chapter 14, Proteins view for a description of the included columns.
An additional column indicates whether the interacting proteins are found in the selected dataset. In this example none of the interacting proteins are in the selected dataset.
Click the Accession Key to go to the ProteinCard for this protein.
The Diseases section works like the GO sections. If similar proteins have been retrieved, their disease information is also included in the overview.
The overview contains truncated disease descriptions. Clicking on the header displays the full descriptions in the details view:
As for GO annotation, it is possible to bring in annotation from similar proteins. The numbers indicate the number of times this disease description occurs in the set of similar proteins:
The summary provides the full disease annotation (for the selected protein only):
A general set of external links is provided. Note that a number of external links are found associated with specific annotation such as protein keys, conserved domains, GO domains etc.
The external link menu provides a number of links applicable for the particular protein.
HPRD - HUMAN Protein Reference Database (commercial fee).
MIM - OMIM - Online Mendelian Inheritance in Man at NCBI.
Entrez Gene - a searchable database of genes, from RefSeq genomes, and defined by sequence and/or located in the NCBI Map Viewer.
BLINK - "BLAST Link" displays the results of BLAST searches that have been done for every protein sequence in the Entrez Proteins data domain.
UniRef100 - UniRef Entry Viewer at 100% sequence similarity (at UniProt).
UniRef90 - UniRef Entry Viewer at 90% sequence similarity (at UniProt).
UniRef50 - UniRef Entry Viewer at 50% sequence similarity (at UniProt).
PubMed - A simple search with all protein synonyms against all PubMed literature.
SNPs - Lookup of SNPs (Single Nucleotide Polymorphisms) in dbSNP at NCBI.
Nt - A search against Entrez for nucleotides associated with any of the protein GIs for the given protein - Note that not all proteins have nucleotide records at Entrez.
ESBL - Text search of the Ensembl protein record against the Ensembl genome browser.
NCBI map - The Map Viewer supports search and display of genomic information by chromosomal position.
HomoloGene - HomoloGene is an NCBI system for automated detection of homologous proteins among the annotated genes of several completely sequenced eukaryotic genomes.
UniGene - UniGene is an experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters.
RATMAP - The Rat Genome Database is focused on presenting rat genes, DNA markers, QTLs, etc. that are localized to chromosomes.
MGD - Mouse Genome Database.
MGI - Mouse Genome Informatics.
SGD - SGD is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae.
RGD - The Rat Genome Database (RGD) curates and integrates rat genetic and genomic data.
WORMB - WormBase - A database of C. elegans.
FLYDB - A Database of the Drosophila Genome.
ZFIN - The Zebrafish Information Network.
IMGT - The international ImMunoGeneTics information system® (http://imgt.cines.fr) is a high-quality integrated knowledge resource specialized in the immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC), immunoglobulin superfamily and related proteins of the immune system (RPI) of human and other vertebrate species.
PDB - The RCSB PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease.
IntActAll - Interactions at EBI IntAct.
MIPS - Interactions at MIPS.
STRING - Interactions at STRING.
Note that a number of other links are found in other views - e.g. links to UniProt, NCBI and IPI are found in the details view of protein keys, where each accession key is linked to the resource from which it originates. Only links applicable to the selected protein are shown.
© 2005-2017 Thermo Fisher Scientific