The lookup menu is located on the switchboard and is identified with a binoculars icon. It allows users to look up proteins based on:
Protein keys (accession numbers, GIs or entry names)
Tryptic peptides (sequences)
Entry field for protein keys.
Press the button to look up protein keys and display information for the corresponding proteins.
Entry field for peptide sequences.
Press the button to look up tryptic peptides and display information for the corresponding proteins.
Entry field for gene symbols.
Press the button to look up gene symbols and display information for the corresponding proteins.
Protein annotation type selection.
Entry field for annotation descriptions.
Press the button to look up annotation identifiers for the given description.
Entry text area for annotation identifier specifications.
Press the button to look up proteins annotated with the given identifiers.
As described in Chapter 4, The Protein record, each protein in ProteinCenter™ is a unified record of a number of external protein sequence records. For every species, external protein sequence records containing the same amino acid sequence are consolidated into a single protein in ProteinCenter™.
Therefore, many different accession numbers can be associated with a single protein record in ProteinCenter™, as shown in the example below. Here, all the accession numbers, the GIs and the single IPI reference are keys for the same protein sequence in the same species, and hence all these keys represent redundant entries of the same protein. In ProteinCenter™ there is only one protein record per specific isoform in a given species.
The original database entries may contain different annotations, which are captured in the single protein record in ProteinCenter™. Cross-linking and mapping different types of accession numbers represent a significant workload for many researchers wishing to compare datasets from their own experiments, from databases and from the literature. By providing a complete mapping of protein keys to protein sequences, ProteinCenter™ is the ultimate bookkeeper that allows researchers to handle and analyze data independently of the ProteinID types.
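The consolidation rule described above (one protein record per amino acid sequence per species, with every external key attached to it) can be sketched in a few lines of Python. This is only an illustration: the record layout, the `consolidate` function, and the toy keys and sequences are hypothetical and say nothing about how ProteinCenter™ is implemented internally.

```python
from collections import defaultdict


def consolidate(records):
    """Group external records that share both species and sequence.

    Returns a dict mapping (species, sequence) to the list of external
    protein keys that point at that unified protein record.
    """
    unified = defaultdict(list)
    for record in records:
        unified[(record["species"], record["sequence"])].append(record["key"])
    return dict(unified)


# Toy input: keys and sequences are made up for illustration.
external_records = [
    {"key": "KEY_A", "species": "Homo sapiens", "sequence": "MKTAYIAK"},
    {"key": "KEY_B", "species": "Homo sapiens", "sequence": "MKTAYIAK"},
    {"key": "KEY_C", "species": "Mus musculus", "sequence": "MKTAYIAK"},
]

proteins = consolidate(external_records)
for (species, _sequence), keys in proteins.items():
    print(species, keys)
```

In this sketch the two human records collapse into one protein with two keys, while the identical mouse sequence stays a separate protein because consolidation is done per species.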
The lookup menu provides the users with a fast way to look up protein entries associated with a specific protein key (accession numbers, GIs, IPI identifiers, etc.). ProteinCenter™ contains protein sequences from all the popular sources, cf. Table A.1, “Protein Data Sources”.
When performing a lookup, it is not necessary to specify whether the key is a GI, an IPI identifier, etc.
The protein keys should be entered in their own standard format, as in these examples:
Please note that these examples do not represent a comprehensive list with regard to formats for valid protein keys.
Both primary and secondary protein keys may be used. Primary protein keys are e.g. the first accession number of a UniProt record. Researchers who wish to cite entries from UniProt in their publications are strongly advised (by UniProt) to always use primary accession numbers. Over time, however, a primary UniProt ProteinID may become secondary and even ambiguous. To address this potential problem, ProteinCenter™ allows the use of ambiguous secondary protein keys, by displaying the various choices of protein keys to the user in the cases where multiple proteins are associated with a secondary protein key.
The letters in the protein keys can be either uppercase or lowercase. Version numbers (".1" or ".2" etc) are optional. For more details on versioning, please refer to Section 4.2, “Versioning and outdated entries”.
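The two rules above (letter case does not matter, the version suffix is optional) amount to a small normalization step before the lookup. The sketch below assumes that keys are normalized to uppercase and that a version is always a trailing `.` followed by digits; both the `parse_protein_key` helper and these assumptions are illustrative, not a description of the actual ProteinCenter™ parser.

```python
import re

# An accession, optionally followed by ".<digits>" as the version suffix.
KEY_PATTERN = re.compile(r"^(?P<accession>.+?)(?:\.(?P<version>\d+))?$")


def parse_protein_key(raw_key):
    """Split a protein key into (accession, version or None)."""
    match = KEY_PATTERN.match(raw_key.strip().upper())
    if match is None:
        raise ValueError(f"empty protein key: {raw_key!r}")
    version = match.group("version")
    return match.group("accession"), (int(version) if version else None)


print(parse_protein_key("np_001007236.1"))
print(parse_protein_key("P13073"))
```

Under these assumptions, `np_001007236.1` and `NP_001007236.1` parse to the same accession and version, and a key without a suffix yields `None` as its version.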
Please note that you may only enter one protein key at a time, and that the accession keys in the first column of the Proteins view differ depending on the configuration of preferred accession keys.
In this example a UniProt accession number is looked up:
In the lookup menu, find the text box for entering a protein accession.
Type the accession number P13073 into the box.
The associated proteins are fetched, and in this case a single protein record is returned and displayed in the selected view. In the Proteins view, the result is presented as follows:
The individual columns are described in Chapter 14, Proteins view.
Depending on the preferred choice of protein keys, different protein keys and descriptions may be shown, but it is always the exact same protein entry that is returned. In the example above, the preferred source is NRDB, so a GI and the NRDB description are displayed. If the ProteinCard pane had been selected prior to the lookup, the ProteinCard for this particular protein record would be displayed.
If no proteins are found for the protein key, the message "Search returned 0 matches." is displayed.
For more information regarding the choice of accession type and description to display, please refer to Section 4.3, “Preferred accession”.
If the version number is included with the accession number (e.g. NP_001007236.1), the specific entry is fetched by the lookup routine, ignoring any occurrence of the accession number with other versions. This allows the user to retrieve proteins by outdated accession numbers.
In the following example, the version of the protein key is included. Hence the particular protein entry corresponding to that accession.version key is returned. This occurs independent of whether the protein entry is live or outdated, and hence independent of whether a newer version exists or not.
In the lookup menu, use the lookup by protein key.
Type the protein key XP_007651.14 into the box, including the version number.
The associated proteins are fetched and a single protein record is displayed in the Proteins view (if selected):
In this example, the retrieved protein entry is outdated because all associated protein keys are obsolete. In the ProteinCard, the protein key is linked to the newer version (see Chapter 10, The ProteinCard).
If the version number is not included (e.g. NP_001007236), the system returns all versions, e.g. proteins with keys NP_001007236.1, NP_001007236.2, etc.
For more details on versioning and live vs. obsolete accession numbers, please refer to Section 4.2, “Versioning and outdated entries” and Section 10.2.1.1, “Flagging of outdated protein keys in ProteinCenter”.
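The difference between a versioned and an unversioned lookup can be illustrated with a toy index. In the sketch below, the index layout (accession mapped to a dictionary of versions), the `lookup_versions` function, and the accession `XX_000001` with its entries are all hypothetical; only the behaviour (an exact version returns one entry whether live or outdated, no version returns every version) follows the description above.

```python
def lookup_versions(index, key):
    """Return the entries for a key, honouring an optional version suffix."""
    accession, _, version = key.strip().upper().partition(".")
    versions = index.get(accession, {})
    if version:
        # Explicit version: return only that entry, live or outdated.
        entry = versions.get(int(version))
        return [] if entry is None else [entry]
    # No version given: return every stored version of the accession.
    return [versions[v] for v in sorted(versions)]


# Toy index: accession -> {version: entry}. All values are made up.
toy_index = {
    "XX_000001": {
        1: {"key": "XX_000001.1", "status": "outdated"},
        2: {"key": "XX_000001.2", "status": "live"},
    }
}

print(lookup_versions(toy_index, "XX_000001.1"))
print(lookup_versions(toy_index, "xx_000001"))
```

The first call returns only the outdated entry, even though a newer version exists; the second returns both versions.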
The lookup feature also allows users to look up protein entries by one or more tryptic peptides. Only proteins that contain the specified set of peptides are returned.
This functionality allows users with data derived from mass spectrometry to quickly evaluate the information content of tryptic peptides, i.e. to assess whether the peptides are information rich (pointing to only a few proteins) or not. It also allows users to identify other peptides related to the same set of proteins.
ProteinCenter™ enables instant lookup of completely cleaved tryptic peptides from any of the proteins in the database. Peptides must have a minimum length of 5, as shorter peptides will tend to return a disproportionately large number of false positives. Any search on peptides will return a maximum of 10000 hits.
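For readers who want to check which peptides of a sequence would qualify, a complete tryptic digest can be sketched as follows. The sketch assumes the commonly used trypsin rule (cleave C-terminal to K or R, but not when the next residue is P); the text above does not state which cleavage rule ProteinCenter™ applies, so treat that rule, the `tryptic_peptides` function, and the toy sequence as assumptions. Only the minimum length of 5 comes from the text.

```python
import re

MIN_PEPTIDE_LENGTH = 5  # minimum peptide length stated in the text


def tryptic_peptides(sequence, min_length=MIN_PEPTIDE_LENGTH):
    """Fully digest a sequence and keep peptides of at least min_length."""
    # Zero-width split after K or R, unless the next residue is P
    # (splitting on empty matches requires Python 3.7 or later).
    fragments = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    return [fragment for fragment in fragments if len(fragment) >= min_length]


# Toy sequence, made up for illustration.
print(tryptic_peptides("MKTGWGSRAAAAAK"))
```

Here the fragment `MK` is discarded because it is shorter than five residues, while `TGWGSR` and `AAAAAK` are kept.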
Examples of valid peptides:
The peptide lookup only searches ProteinCenter™ for tryptic peptides - peptides imported by users will not be searched.
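The matching rule for multi-peptide lookups (a protein is returned only if it contains every specified peptide, and at most 10000 hits are returned) corresponds to a set intersection over a peptide index. The index layout, the `lookup_by_peptides` function, and the protein identifiers below are hypothetical and serve only to illustrate the rule.

```python
MAX_HITS = 10000  # maximum number of hits stated in the text


def lookup_by_peptides(peptide_index, peptides, max_hits=MAX_HITS):
    """Return the proteins that contain every peptide in the query."""
    hits = None
    for peptide in peptides:
        matches = peptide_index.get(peptide.strip().upper(), set())
        # Keep only proteins that matched all peptides seen so far.
        hits = set(matches) if hits is None else hits & matches
    return sorted(hits or ())[:max_hits]


# Toy index: peptide -> protein identifiers. All values are made up.
toy_peptide_index = {
    "TGWGSR": {"PROT_1", "PROT_2", "PROT_3"},
    "AAAAAK": {"PROT_2", "PROT_3"},
}

print(lookup_by_peptides(toy_peptide_index, ["TGWGSR"]))
print(lookup_by_peptides(toy_peptide_index, ["TGWGSR", "AAAAAK"]))
```

Adding a second peptide narrows the result from three proteins to two, which is the sense in which a peptide set becomes more information rich as it points to fewer proteins.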
In this example a single tryptic peptide is used to look up a set of proteins:
In the lookup menu, locate the text field for peptide sequences.
Specify the peptide TGWGSR in the box.
To inspect the returned protein for a particular species, use the sorting function.
Proteins can also be found by their gene symbols. ProteinCenter looks for the official symbol supplied by Entrez and Ensembl, and for alternative symbols from other sources.
Lookup by gene symbol is similar to lookup by protein key or by tryptic peptides.
In the lookup menu, select the text field for gene symbols.
Specify the gene symbol MSI in the box.
The Gene column shows the official symbol for the gene. Note that the lookup for gene symbols is not case sensitive. Also note that the three bottom proteins have Ebp as their official symbol. This means that they must have MSI as one of their alternative symbols. This can be seen in the ProteinCard. To go to the ProteinCard, click one of the accession keys in the first column. If IPI00137471.2 is clicked, the following ProteinCard will appear (shown only in part):
This protein does indeed have the alternative gene symbol mSI, which is why it is returned by the 'MSI' lookup.
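The behaviour seen in this example (a case-insensitive match against either the official symbol or any alternative symbol) can be sketched as below. The record layout and the `lookup_by_gene_symbol` function are hypothetical; the IPI00137471.2 entry mirrors the example above, and the other two records are made up.

```python
def lookup_by_gene_symbol(proteins, symbol):
    """Return keys of proteins whose official or alternative symbol matches."""
    wanted = symbol.strip().casefold()
    hits = []
    for protein in proteins:
        symbols = [protein["official_symbol"], *protein["alternative_symbols"]]
        # Compare case-insensitively against every known symbol.
        if wanted in (s.casefold() for s in symbols):
            hits.append(protein["key"])
    return hits


toy_proteins = [
    # Made-up record whose official symbol is the query itself.
    {"key": "PROT_1", "official_symbol": "MSI", "alternative_symbols": []},
    # Mirrors the example in the text: official symbol Ebp, alternative mSI.
    {"key": "IPI00137471.2", "official_symbol": "Ebp", "alternative_symbols": ["mSI"]},
    # Made-up record that should not match.
    {"key": "PROT_3", "official_symbol": "GENE_X", "alternative_symbols": ["GENE_Y"]},
]

print(lookup_by_gene_symbol(toy_proteins, "MSI"))
```

The lookup returns both the protein whose official symbol is MSI and the one that only carries mSI as an alternative symbol.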
Targeted proteomics requires the retrieval of proteins based on the particular annotation of interest. ProteinCenter implements this in a two-step process: an initial (and optional) search on annotation identifiers based on their description, followed by a protein search based on the given annotation identifiers.
In the lookup menu, select the particular type of annotation in the drop-down box.
(Optionally) Search for annotation identifiers based on their description (or name, title, GO term or similar), using the lookup button. This lists the annotations with at least a partial match, along with their exact descriptions, in the annotation textarea below.
Add, delete or edit annotation identifiers in the annotation textarea. Each identifier must be kept on its own line. The identifiers must adhere to the standard definition for that particular annotation:
| Annotation type | Identifier definition | Format |
|---|---|---|
| Taxonomy | Taxonomy ID as defined by NCBI | ####### |
| GO category | 'GO:' followed by 7 digit identifier | GO:####### |
| Enzyme Code | 'EC:' followed by four dot-separated numbers | EC:#.#.#.# |
| KEGG pathway | 3-letter taxonomy acronym and 5 digit pathway identifier | ORG##### |
| PFAM domain | 'PF' followed by 5 digit identifier | PF##### |
| InterPro domain | 'IPR' followed by 6 digit identifier | IPR###### |
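Before pasting a long list into the annotation textarea, it can be useful to check each line against the formats in the table. The sketch below translates the table into regular expressions. The `check_identifiers` function is hypothetical, and two points are assumptions rather than documented behaviour: taxonomy IDs are accepted with any number of digits, and prefixes are matched case-sensitively apart from the KEGG organism acronym.

```python
import re

# One pattern per annotation type, derived from the table above.
ID_PATTERNS = {
    "Taxonomy": r"\d+",  # assumption: any number of digits
    "GO category": r"GO:\d{7}",
    "Enzyme Code": r"EC:\d+\.\d+\.\d+\.\d+",
    "KEGG pathway": r"[A-Za-z]{3}\d{5}",
    "PFAM domain": r"PF\d{5}",
    "InterPro domain": r"IPR\d{6}",
}


def check_identifiers(annotation_type, text):
    """Split textarea content into (valid, invalid) identifiers, one per line."""
    pattern = re.compile(ID_PATTERNS[annotation_type])
    valid, invalid = [], []
    for line in text.splitlines():
        identifier = line.strip()
        if not identifier:
            continue  # ignore blank lines
        (valid if pattern.fullmatch(identifier) else invalid).append(identifier)
    return valid, invalid


print(check_identifiers("GO category", "GO:0005739\nGO:123\n"))
```

The first identifier has the required seven digits and is reported as valid; the second is reported as invalid because it has only three.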
© 2005-2017 Thermo Fisher Scientific