ProteinCenter User Manual

Chapter 19. Dataset comparison

Table of Contents

19.1. How to compare datasets
19.1.1. Choose datasets for a comparative analysis
19.1.2. Comparing datasets
19.1.3. Compare clustered datasets

Comparing datasets of hundreds of proteins can be very difficult, because proteins are stored by protein keys, which carry no information about which protein or isoform is meant, or whether the protein is processed or a fragment. Comparing larger datasets, or a large number of datasets, manually is almost impossible, and repeating such an analysis is not practical.

With ProteinCenter, multiple datasets of thousands of proteins can readily be compared. The process is fast, which allows for repeated analysis (e.g. trying different filter settings, exploring proteins of different compartments, etc.). Even larger datasets can be analyzed, although this will affect the speed of the analysis.

The compare menu appears for all compare folders. This chapter explains how to select datasets for a comparison (and create a compare folder), and how to compare them.

  1. Switch to turn on and off the comparison

  2. Selection of which datasets to show. See how in Table 19.1, “Comparing datasets”

  3. Reset the compare settings

19.1. How to compare datasets

This section explains how to compare datasets. Comparison is a two-step process, summarized below.

  1. First, choose the datasets that should be included, and create a compare folder

  2. Then, compare them by selecting datasets in the compare menu

19.1.1. Choose datasets for a comparative analysis

Prior to a comparative analysis, choose the datasets that are to be compared.

  1. Click the yellow folder icons of datasets to include them in the comparison. Selected datasets will have a green check mark on the folder icon. Clicking the folder icon again will remove the check mark.

  2. Click the description of the category in which the new "compare folder" containing the result of the comparison should be placed

  3. Click the 'compare dataset' button

A compare folder is then created. It contains each of the selected datasets, but it is the compare folder itself (not its subfolders) that must be selected for the subsequent comparison analysis. Like any other folder, a compare folder can be moved, renamed, deleted, etc., but it cannot be used in another comparison.

In the various data views an extra column appears, showing which datasets contain a particular protein. Each dataset has a unique column and color - as in the example shown here with three datasets.
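The idea behind this extra column can be sketched in a few lines of Python. This is only an illustration, not ProteinCenter code; the dataset names and protein keys below are hypothetical.

```python
# Hypothetical datasets, each a set of protein keys (illustration only).
datasets = {
    "dataset 1": {"P1", "P2"},
    "dataset 2": {"P2", "P3"},
    "dataset 3": {"P3"},
}

# The merged view lists every protein key once.
all_keys = sorted(set().union(*datasets.values()))

# For each protein, record which datasets contain it.
membership = {
    key: [name for name, keys in datasets.items() if key in keys]
    for key in all_keys
}

for key, names in membership.items():
    print(key, ", ".join(names))
```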

19.1.2. Comparing datasets

Once the compare folder has been created, the actual comparison is undertaken using the comparison filter in the compare menu. When a compare folder is selected, the compare menu is always shown next to the filters menu.

The following table gives examples of how to specify a comparison with the logical operators AND, OR and NOT.

An individual dataset is selected by clicking its white field, and deselected by clicking the field (which is now colored) again.

Table 19.1. Comparing datasets

Compare command | Result set
Proteins that occur in both dataset 1 and 2

Proteins that occur in dataset 1 or 2

Proteins that occur in dataset 1 or 2 but not in 3

Proteins that occur in dataset 3 and in either 1 or 2

Proteins that occur in at least 2 datasets. This may be combined with the other commands shown above
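The result sets in Table 19.1 correspond to ordinary set operations. The following Python sketch is purely illustrative and does not use any ProteinCenter interface; the protein keys and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical protein keys for three datasets (illustration only).
dataset1 = {"P1", "P2", "P3"}
dataset2 = {"P2", "P3", "P4"}
dataset3 = {"P3", "P5"}

# Proteins that occur in both dataset 1 and 2 (AND).
in_1_and_2 = dataset1 & dataset2

# Proteins that occur in dataset 1 or 2 (OR).
in_1_or_2 = dataset1 | dataset2

# Proteins that occur in dataset 1 or 2 but not in 3 (OR, NOT).
in_1_or_2_not_3 = (dataset1 | dataset2) - dataset3

# Proteins that occur in dataset 3 and in either 1 or 2.
in_3_and_1_or_2 = dataset3 & (dataset1 | dataset2)

# Proteins that occur in at least 2 datasets: count occurrences per key.
counts = Counter(key for ds in (dataset1, dataset2, dataset3) for key in ds)
in_at_least_2 = {key for key, n in counts.items() if n >= 2}
```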

After a comparison filter has been applied, the resulting subset of proteins may be analyzed and/or saved to a new folder, and the analysis can continue. An example is shown below.

The various biological filters can be combined with the comparison to include or exclude certain subsets of the complete merged dataset. This makes it possible, for example, to restrict a comparison to membrane proteins, or to exclude proteins from certain species.

The filter setting for a comparison including filters could look like this:

This would imply: Show me all proteins that are:

  1. Found in dataset 1 and 2

  2. But not in dataset 3

  3. And that are human proteins

  4. And are annotated as either membrane or Golgi proteins
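The four conditions above combine into a single logical expression. A minimal Python sketch follows, assuming a hypothetical `Protein` record with species, location and dataset-membership fields; none of these names come from ProteinCenter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    # Hypothetical record, for illustration only.
    key: str
    species: str
    locations: frozenset   # annotated cellular locations
    datasets: frozenset    # numbers of the datasets containing the protein


def matches(p: Protein) -> bool:
    return (
        {1, 2} <= p.datasets                              # in dataset 1 and 2
        and 3 not in p.datasets                           # but not in dataset 3
        and p.species == "human"                          # human proteins only
        and bool(p.locations & {"membrane", "golgi"})     # membrane or Golgi
    )


proteins = [
    Protein("A", "human", frozenset({"membrane"}), frozenset({1, 2})),
    Protein("B", "human", frozenset({"golgi"}), frozenset({1, 2, 3})),
    Protein("C", "mouse", frozenset({"membrane"}), frozenset({1, 2})),
    Protein("D", "human", frozenset({"nucleus"}), frozenset({1, 2})),
    Protein("E", "human", frozenset({"golgi", "nucleus"}), frozenset({1, 2})),
]

selected = [p.key for p in proteins if matches(p)]
```

Here only proteins A and E pass all four conditions: B is also found in dataset 3, C is not human, and D has neither a membrane nor a Golgi annotation.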

For more information about filters in general see Chapter 11, Filters.

19.1.3. Compare clustered datasets

Rather than just comparing proteins, it may be useful to compare proteins in clusters. The basic idea is that proteins are considered the same if they appear in the same cluster.

This lets the user choose the level at which the comparison is made. Some rough examples, based on the clustering level:

  • 100% to be able to compare proteins at the level of fragments vs full length

  • 98% to be able to compare proteins at the level of alleles

  • 95% to be able to compare at the level of highly similar proteins – most likely alleles and splice variants

  • 60% to be able to compare at the level of homologous proteins

For example, the 98% similarity level allows you to ask questions like:

"Show me all proteins which occur in multiple datasets, whether they represent one allele or another".

"Show me proteins occurring in dataset A but not in dataset B, unless these proteins are merely different alleles of proteins occurring in dataset B".

Obviously, the higher the level of similarity, the more groups are created.
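The difference between comparing by protein key and comparing by cluster can be sketched as follows. This is an illustration only: the mapping `cluster_of` from protein key to cluster ID is a hypothetical stand-in for the result of clustering at one similarity level, and the keys are invented.

```python
# Hypothetical mapping from protein key to cluster ID at one similarity level.
# A1 and A2 stand for two alleles that fall into the same cluster.
cluster_of = {"A1": "c1", "A2": "c1", "B1": "c2", "C1": "c3"}

dataset_a = {"A1", "B1"}
dataset_b = {"A2", "C1"}


def clusters(dataset):
    """Return the set of cluster IDs represented in a dataset."""
    return {cluster_of[key] for key in dataset}


# Comparison by protein key: A1 counts as unique to dataset A.
only_in_a_by_key = dataset_a - dataset_b

# Comparison by cluster: A1 is dropped because its cluster also occurs
# in dataset B (via A2), so only B1 remains.
clusters_b = clusters(dataset_b)
only_in_a_by_cluster = {
    key for key in dataset_a if cluster_of[key] not in clusters_b
}
```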

For more information on the biological significance of different similarity levels please refer to Section 16.4, “The biological significance of clustering levels”.

For details on how to cluster datasets, see Section 16.6, “How to cluster datasets”.