ProteinCenter User Manual

Chapter 20. The Alignment Viewer

Table of Contents

20.1. Overview mode
20.2. Sequence information view
20.3. Comparing datasets mode
20.4. Alignment method
20.4.1. Alignment anchor selection

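The Alignment Viewer is used to align proteins and to examine the peptides (if any) used to identify them. It works in two modes: an overview mode (see Section 20.1, “Overview mode”) and a sequence information view (see Section 20.2, “Sequence information view”).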

Currently only clusters of proteins can be aligned and viewed. Any kind of clustering can be used.

20.1. Overview mode

Figure 20.1. The Cluster view with alignment link

  1. Select a dataset and click the Cluster tab.

  2. Cluster the dataset (here using 1 common peptide) by clicking the Cluster all data button.

  3. Click the 'Cluster Name' link (here IPI00023542.5).

  4. A new window will open and show the alignment (here 2 proteins and their peptides). Several alignment windows can be opened at the same time.

Figure 20.2. The Alignment view of selected protein cluster

The Alignment View displays a list of all the protein accession codes with the aligned sequences next to them. The first protein is the anchor of the cluster. The alignment is built by aligning the neighbor proteins to the anchor one at a time using BLAST.

  1. The background of the sequences is white, so any gaps in the sequences will appear as white.

  2. Each protein sequence is colored in light blue.

  3. When neighbors are aligned, gaps are applied to the neighbor sequences. Gaps can be either deletion gaps (shown as small triangles) or insertion gaps (white spaces, see 1). A deletion gap hides some sequence information so that the rest of the sequence aligns. An insertion gap adds spaces to the sequence so that the sequences align. The deletion gap markers can be green, red, or yellow. Green means that the marker is hiding some sequence information. Red means that it is also hiding one or more peptides. Click a marker to show the hidden information. While the hidden information is shown, its starting position is marked with a yellow triangle.

  4. The peptides (if any) are shown in their dataset color (see Chapter 19, Dataset comparison). Hover over a peptide with the mouse pointer to highlight all identical peptides. Start and end positions are highlighted with two black bars, and a tooltip shows information about the peptide. Peptide modifications are shown below the sequences, but they are not included in the comparison of peptides, where only the amino acids are used.

  5. A ruler above the sequences shows their length and the approximate position of the peptides.

  6. Wherever the amino acid sequences differ from the anchor, an orange rectangle is displayed below the sequences. In the sequence information view, a letter or a number is displayed: a single letter if only 1 amino acid differs from the anchor, otherwise a number (from 2 to 9) indicating the number of differing sequences. A '+' is appended to the number when it exceeds 9.

  7. All peptide modifications are shown on the bottom line. Blue rectangles mark their positions.

  8. It is possible to change the colors by clicking a particular color representative and moving the color sliders at the bottom of the page. This can help to visualize the differences better (please note that the color change is currently not persistent).

  9. Click a protein accession code to realign using that protein as the anchor (as highlighted in Figure 20.3, “The Alignment view showing sequence information”).

  10. Click on 'Click here to show alignment with sequence information' to go into the second display mode.
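The peptide highlighting described in item 4 compares peptides by their amino acids only. A minimal Python sketch of that idea follows. The `Peptide` record, its field names, and the example peptide sequences are hypothetical and only illustrate the rule; they are not taken from ProteinCenter.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Peptide:
    """Hypothetical peptide record; field names are illustrative only."""
    dataset: str
    sequence: str              # bare amino acid sequence
    modifications: tuple = ()  # e.g. (("Oxidation", 2),); ignored when comparing


def group_identical(peptides):
    """Group peptides by amino acid sequence only, ignoring modifications."""
    groups = defaultdict(list)
    for pep in peptides:
        groups[pep.sequence].append(pep)
    return dict(groups)


def peptide_span(protein_sequence, peptide):
    """Return the 1-based inclusive (start, end) of the first match, or None."""
    idx = protein_sequence.find(peptide.sequence)
    if idx < 0:
        return None
    return idx + 1, idx + len(peptide.sequence)


# Made-up example data: two datasets report the same peptide, one with a modification.
peptides = [
    Peptide("Demoset2", "LVNELTEFAK"),
    Peptide("Demoset3", "LVNELTEFAK", (("Oxidation", 2),)),
    Peptide("Demoset3", "AEFVEVTK"),
]
groups = group_identical(peptides)
```

Hovering over either copy of the first peptide would highlight both members of its group, since the modification does not take part in the comparison.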

20.2. Sequence information view

Figure 20.3. The Alignment view showing sequence information

The second mode shows the sequence information (the amino acids) and a scrollbar for moving the window through the whole amino acid sequence. Here all the amino acids are shown at their aligned positions, as well as on the differences and modifications lines. N- and C-terminal modifications show up as '{' and '}' respectively.
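The symbol shown on the differences line can be sketched in Python as below. This is one reading of the rule given in the overview list, under stated assumptions: the letter shown is taken to be the differing neighbor residue, gaps are represented as '-', and counts above 9 are rendered as "9+". The function name is hypothetical.

```python
def difference_marker(anchor_residue, neighbor_residues):
    """Return the differences-line symbol for one alignment column.

    Assumptions (not confirmed by the manual): a single difference shows the
    differing residue itself, and any count above 9 is rendered as "9+".
    """
    differing = [r for r in neighbor_residues if r != anchor_residue]
    if not differing:
        return ""                   # column is identical to the anchor
    if len(differing) == 1:
        return differing[0]         # exactly one sequence differs: show its letter
    if len(differing) <= 9:
        return str(len(differing))  # 2 to 9 differing sequences: show the count
    return "9+"                     # more than 9 differing sequences
```

For example, `difference_marker("A", ["A", "G", "A"])` yields the single letter of the one differing neighbor, while a column where two neighbors differ yields a count.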

When several peptides overlap, their color becomes darker. This effect is achieved by using transparent colors, and accumulating color intensity. Use the color sliders below to adjust that effect. This way it is easier to see where many peptides cover the same sequence regions.
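The darkening effect can be illustrated with standard alpha compositing. The Python sketch below counts how many peptides cover each residue and computes the resulting opacity when the same semi-transparent color is painted once per peptide. The function names and the per-layer alpha value are assumptions for illustration; the manual does not state how the viewer computes its colors.

```python
def coverage_depth(sequence_length, peptide_spans):
    """Count how many peptides cover each residue.

    peptide_spans holds 1-based inclusive (start, end) positions.
    """
    depth = [0] * sequence_length
    for start, end in peptide_spans:
        for pos in range(start - 1, end):
            depth[pos] += 1
    return depth


def stacked_opacity(alpha, layers):
    """Opacity after painting the same color `layers` times with opacity `alpha`.

    Each layer lets a fraction (1 - alpha) of the background through, so the
    combined result is 1 - (1 - alpha) ** layers.
    """
    return 1.0 - (1.0 - alpha) ** layers


depth = coverage_depth(6, [(1, 3), (2, 5)])
opacities = [stacked_opacity(0.5, d) for d in depth]
```

In this sketch, lowering the per-layer alpha plays the role of the color sliders: regions covered by many peptides stay distinguishable for longer before they saturate.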

Figure 20.4. The Alignment view scrollbar clicked

Click and drag the scrollbar to move the window. Alternatively, click to the left or right of the scrollbar to move the window a whole page left or right.

On the scrollbar, all the peptides are shown in overview, so it is easy to find them on the sequence.

20.3. Comparing datasets mode

Figure 20.5. The Alignment view when comparing datasets

When comparing datasets, all peptides matching the same protein are shown on the same sequence, one below the other, each in its own dataset color.

In the figure above, peptides from 'Demoset2' are blue, and peptides from 'Demoset3' are light green.

When the mouse pointer is hovering over a peptide modification, all peptides with that modification in that position are highlighted (thick black box).

20.4. Alignment method

The alignment method is not global in the sense of attempting to align every sequence with every other sequence (as, for example, ClustalW does). Rather, it is anchor-centric: every member sequence is aligned individually, in the best possible way, with the anchor sequence. The alignment algorithm seeks to create the best alignment given all identified local sub-sequence alignments and peptide evidence.
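The anchor-centric structure can be sketched in Python as follows. The viewer itself uses BLAST for the pairwise step; this sketch substitutes the standard library's `difflib.SequenceMatcher` purely as a stand-in so the example is self-contained, and it ignores peptide evidence. The function names and the example sequences are hypothetical.

```python
from difflib import SequenceMatcher


def align_to_anchor(anchor, members):
    """Align each member sequence to the anchor independently.

    Returns, per member, a list of (anchor_start, member_start, length)
    matching blocks with 0-based starts. difflib is a stand-in for BLAST here.
    """
    alignments = {}
    for name, seq in members.items():
        matcher = SequenceMatcher(None, anchor, seq, autojunk=False)
        alignments[name] = [
            (b.a, b.b, b.size)
            for b in matcher.get_matching_blocks()
            if b.size > 0  # drop difflib's zero-length terminator block
        ]
    return alignments


def anchor_centric_comparisons(n):
    """Pairwise alignments needed for a cluster of n proteins (one is the anchor)."""
    return n - 1


def all_against_all_comparisons(n):
    """Pairwise comparisons if every sequence were compared with every other."""
    return n * (n - 1) // 2


result = align_to_anchor("MKTAYI", {"same": "MKTAYI", "inserted": "MKTGAYI"})
```

For a cluster of 5 proteins the anchor-centric approach needs 4 pairwise alignments, whereas comparing every sequence with every other would need 10.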

20.4.1. Alignment anchor selection

Given that every member sequence is always aligned to the anchor, changing the anchor sequence will have a significant effect on the alignment configuration. This allows a user to achieve different kinds of alignments, depending on the subject at hand. Generally good all-round alignments will be created by selecting anchors with the following characteristics:

  • Sequences with the greatest sequence similarity to all other members.

  • Sequences with the most peptides in common with all other members.

  • Longer sequences.

  • Sequences with many peptides.
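The characteristics above can be turned into a rough ranking of anchor candidates. The Python sketch below is only an illustration: the manual gives no weights or priorities, so the order used here (similarity first, then shared peptides, then length, then peptide count) is an assumption, and `difflib.SequenceMatcher.ratio()` stands in for a real similarity measure. All names and example data are hypothetical.

```python
from difflib import SequenceMatcher


def rank_anchor_candidates(sequences, peptides):
    """Rank cluster members as anchor candidates, best first.

    sequences: {accession: amino acid sequence}
    peptides:  {accession: set of peptide sequences identified for it}
    The priority order of the four criteria is an assumption of this sketch.
    """
    scores = {}
    for acc, seq in sequences.items():
        others = [o for o in sequences if o != acc]
        similarity = sum(
            SequenceMatcher(None, seq, sequences[o], autojunk=False).ratio()
            for o in others
        )
        shared = sum(len(peptides[acc] & peptides[o]) for o in others)
        scores[acc] = (similarity, shared, len(seq), len(peptides[acc]))
    return sorted(scores, key=scores.get, reverse=True)


# Made-up cluster of three members.
sequences = {"P1": "MKTAYIAKQR", "P2": "MKTAYIAK", "P3": "MKTAYI"}
peptides = {"P1": {"MKTAYIAK"}, "P2": {"MKTAYIAK", "TAYIAK"}, "P3": set()}
ranking = rank_anchor_candidates(sequences, peptides)
```

A ranking like this only suggests a starting point. Clicking another accession code in the viewer realigns the cluster with that protein as the anchor.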