Comparing datasets of hundreds of proteins can be very difficult, because proteins are stored under protein keys that carry no information about which protein they denote, which isoform it is, or whether the protein is processed or a fragment. Comparing larger datasets, or a large number of datasets, manually is almost impossible, and repeating such an analysis is impractical.
With ProteinCenter, multiple datasets of thousands of proteins can readily be compared. The process is fast, which allows for repeated analysis (e.g. switching different filters on, exploring proteins of different compartments, etc.). Even larger datasets can be analyzed, although this will affect the speed of the analysis.
The compare menu appears for all compare folders. This chapter explains how to select datasets for comparison (and create a compare folder) and how to compare them.
Switch to turn on and off the comparison
Selection of which datasets to show. See Table 19.1, “Comparing datasets”.
Reset the compare settings
This section explains how to compare datasets. Comparison is a two-step process, summarized below.
First, choose the datasets to include and create a compare folder.
Then compare them by selecting datasets in the compare menu.
Prior to a comparative analysis, choose the datasets that are to be compared.
Click the yellow folder icons of datasets to include them in the comparison. Selected datasets will have a green check mark on the folder icon. Clicking the folder icon again will remove the check mark.
Click the description of the category in which the new "compare folder" containing the result of the comparison should be placed.
Click the 'compare dataset' button.
Next, a compare folder is created. The folder contains each of the datasets, but it is the compare folder itself (not its subfolders) that should be selected in the subsequent comparison analysis. A compare folder can be moved, renamed, deleted, etc. just like other folders, but it cannot be used in another comparison.
In the various data views, an extra column appears that shows which datasets contain a particular protein. Each dataset has its own column and color.
Once the compare folder has been created, the actual comparison is undertaken using the comparison filter in the compare menu. When a compare folder is selected, the compare menu is always shown next to the filters menu.
The following table gives examples of how to specify a comparison with the logical operators AND, OR and NOT.
The individual datasets can be selected by clicking the white field, and deselected by clicking the (now colored) field again.
Table 19.1. Comparing datasets
|Compare command|Result set|
| |Proteins that occur in both dataset 1 and 2|
| |Proteins that occur in dataset 1 or 2|
| |Proteins that occur in dataset 1 or 2 but not in 3|
| |Proteins that occur in dataset 3 and in either 1 or 2|
| |Proteins that occur in at least 2 datasets. This may be combined with the other commands shown above|
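The result sets in Table 19.1 correspond to ordinary set operations. The following Python sketch is only an illustration of that logic, not ProteinCenter functionality: the protein keys (`P1` to `P6`) and the variable names are hypothetical, and each dataset is assumed to be representable as a set of protein keys.

```python
from collections import Counter

# Hypothetical protein keys, used only to illustrate the logic.
dataset1 = {"P1", "P2", "P3", "P4"}
dataset2 = {"P2", "P3", "P5"}
dataset3 = {"P3", "P4", "P6"}

# Proteins that occur in both dataset 1 and 2 (logical AND).
in_1_and_2 = dataset1 & dataset2

# Proteins that occur in dataset 1 or 2 (logical OR).
in_1_or_2 = dataset1 | dataset2

# Proteins that occur in dataset 1 or 2 but not in 3 (OR combined with NOT).
in_1_or_2_not_3 = (dataset1 | dataset2) - dataset3

# Proteins that occur in dataset 3 and in either 1 or 2.
in_3_and_1_or_2 = dataset3 & (dataset1 | dataset2)

# Proteins that occur in at least 2 datasets: count the number of
# datasets each key appears in and keep keys with a count of 2 or more.
counts = Counter(key for ds in (dataset1, dataset2, dataset3) for key in ds)
in_at_least_2 = {key for key, n in counts.items() if n >= 2}
```

With these example keys, `in_1_and_2` is `{"P2", "P3"}` and `in_at_least_2` is `{"P2", "P3", "P4"}`.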
After a comparison filter has been applied, the resulting subset of proteins can be analyzed and/or saved to a new folder, and analysis can continue.
The various biological filters can be combined to exclude or include certain subsets of the complete merged dataset. This makes it possible, for example, to restrict a comparison to membrane proteins, or to exclude proteins from certain species.
The filter setting for a comparison including filters could look like this:
This would imply: Show me all proteins that are:
Found in dataset 1 and 2
But not in dataset 3
And that are human proteins
And are annotated as either membrane or Golgi proteins
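The combined query above can be read as a set comparison followed by an annotation filter. The sketch below is a hedged illustration in Python: the `Protein` record, its fields, the annotation values and all protein keys are assumptions made for this example and do not reflect how ProteinCenter stores annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """Hypothetical annotation record for one protein key."""
    species: str
    locations: frozenset


# Hypothetical annotations, keyed by protein key.
annotations = {
    "A": Protein("Homo sapiens", frozenset({"membrane"})),
    "B": Protein("Homo sapiens", frozenset({"nucleus"})),
    "C": Protein("Mus musculus", frozenset({"membrane"})),
    "D": Protein("Homo sapiens", frozenset({"Golgi"})),
    "E": Protein("Homo sapiens", frozenset({"membrane"})),
    "F": Protein("Homo sapiens", frozenset({"Golgi"})),
}

dataset1 = {"A", "B", "C", "D", "E", "F"}
dataset2 = {"A", "B", "C", "D", "F"}
dataset3 = {"D"}


def passes_filters(key):
    """Human protein annotated as membrane or Golgi (or both)."""
    protein = annotations[key]
    return (protein.species == "Homo sapiens"
            and bool(protein.locations & {"membrane", "Golgi"}))


# Found in dataset 1 and 2, but not in dataset 3 ...
compared = (dataset1 & dataset2) - dataset3
# ... and passing the species and compartment filters.
result = {key for key in compared if passes_filters(key)}
```

Here `result` is `{"A", "F"}`: `B` fails the compartment filter, `C` is not human, `D` also occurs in dataset 3, and `E` is missing from dataset 2.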
For more information about filters in general see Chapter 11, Filters.
Rather than just comparing proteins, it may be useful to compare proteins in clusters. The basic idea is that proteins are considered the same if they appear in the same cluster.
Hence, this allows the user to choose the level at which the comparison is undertaken. Some rough examples, based on the clustering level:
100% to be able to compare proteins at the level of fragments vs full length
98% to be able to compare proteins at the level of alleles
95% to be able to compare at the level of highly similar proteins – most likely alleles and splice variants
60% to be able to compare at the level of homologous proteins
For example, the 98% similarity level allows you to ask questions like:
"Show me all proteins which occur in multiple datasets, whether they represent one allele or another".
"Show me proteins occurring in dataset A but not in dataset B, unless these proteins are merely different alleles of proteins occurring in dataset B".
Obviously, the higher the level of similarity, the more groups are created.
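The difference between comparing proteins and comparing clusters can be sketched as follows. This is a minimal Python illustration under stated assumptions: the mapping `cluster_of` stands in for a clustering at some chosen similarity level (for instance 98%), and the protein keys and cluster identifiers are hypothetical.

```python
# Hypothetical cluster assignment at a chosen similarity level.
# "A1" and "A2" represent two alleles that fall into the same cluster.
cluster_of = {
    "A1": "cluster_A",
    "A2": "cluster_A",
    "B1": "cluster_B",
    "C1": "cluster_C",
}

dataset_a = {"A1", "B1"}
dataset_b = {"A2", "C1"}


def clusters(dataset):
    """Return the set of cluster identifiers represented in a dataset."""
    return {cluster_of[key] for key in dataset}


# Protein-level comparison: in dataset A but not in dataset B.
# "A1" is reported, because its key differs from "A2".
protein_level = dataset_a - dataset_b

# Cluster-level comparison: proteins in dataset A whose cluster has no
# member in dataset B. "A1" is dropped, since "A2" shares its cluster.
clusters_only_in_a = clusters(dataset_a) - clusters(dataset_b)
cluster_level = {key for key in dataset_a
                 if cluster_of[key] in clusters_only_in_a}
```

With this example data, `protein_level` is `{"A1", "B1"}`, whereas `cluster_level` is `{"B1"}`.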
For more information on the biological significance of different similarity levels please refer to Section 16.4, “The biological significance of clustering levels”.
For details on how to cluster datasets see Section 16.6, “How to cluster datasets”.
© 2005-2017 Thermo Fisher Scientific