A short introduction to ProteinCenter
ProteinCenter is a protein-centric bioinformatics system that integrates data and tools and serves as a powerful personal discovery workbench. Analysing the exponentially growing amount of data is one of the biggest challenges in the life science industry today. Meeting this challenge requires solutions that integrate dispersed datasets into a highly controlled and consolidated information repository, which is a prerequisite for applying sophisticated data analysis tools successfully. ProteinCenter is such a solution, and it incorporates expert knowledge at the data integration level as well. It is a compact standalone bioinformatics system that integrates experimental proteomics data with biomedical annotation.
ProteinCenter gives small to medium-sized research groups an easily maintainable in-house bioinformatics information system for storage, analysis and data mining of proprietary proteomics data in the context of public sequence information. ProteinCenter enables semi-automated analysis and data processing directed toward speeding up drug discovery as well as other types of proteomics analysis of protein identifications. ProteinCenter solves many of the main issues associated with analysing results from proteomics experiments. For example, it makes it very easy to transform lists of protein identifiers into useful biological information, as shown in the simple workflow diagram below. Among the advantages of the system are that it enables:
Very fast and easy analysis of large sets of proteins
Functional analysis, localization analysis, tryptic peptide analysis
Comparisons of large protein datasets - independent of source - UniProt, Swiss-Prot, Trembl, PIR, IPI, NCBI, RPF, PDB, Embl, Ensembl, TPE, TPG, SGD, FlyBase, TAIR, PlasmoDB etc.
A very powerful data integration framework minimizing propagation of cross referencing errors
Easy creation of summary statistics and reports
Figure 3.1. ProteinCenter transforms lists of experimentally identified proteins to useful biological information
ProteinCenter contains protein sequences and accession keys from eukaryotic and prokaryotic protein records in UniProt, Swiss-Prot, Trembl, PlasmoDB, CMR, TubercuList, TriTrypDB, PSE, PIR, IPI, NCBI, RPF, PDB, Embl, Ensembl, TPE, TPG, TAIR, SGD and FlyBase.
There are four concepts that describe the main ideas behind ProteinCenter.
Simply put, the four concepts describe how ProteinCenter is a single piece of bioinformatics software that allows a wide range of tools and incongruous data sources to work together efficiently. It automates the time-consuming tasks of keeping track of data relations, ensuring a quick and smooth transition from lists of data to biological answers.
Handling dispersed data often takes a very large proportion of the time spent on bioinformatics analysis. To obtain a biological overview of a set of proteins, information must be gathered from a number of tools and databases. Besides the actual retrieval of information, this task also requires a great deal of consolidation of results, removal of redundancy and regular bookkeeping to keep track of outdated ProteinIDs and the relationships between the proteins associated with these ProteinIDs.
In ProteinCenter, a range of databases, datasets and predicted results from disparate sources have been consolidated into a single data source devoid of redundancy. To ensure that cross-referencing errors are not propagated, data are integrated by remapping the references using the original data wherever possible. The system automatically keeps track of relations between ProteinIDs and sequences, which allows a much broader coverage of the proteins being annotated. Furthermore, the precomputed data analysis uses only high-quality tools with well-chosen sets of parameters, which ensures a comprehensive and reliable annotation. Thus, bioinformatics expertise is built into the system, with the intention of allowing the user to explore experimental datasets in the context of easily accessible data and tools.
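The idea of tying identifiers from several sources to a single sequence record can be illustrated with a small sketch. This is not ProteinCenter's implementation, only a minimal example of the principle, assuming that each input record is a pair of an accession key and a sequence. The function name and the accession keys below are hypothetical.

```python
from collections import defaultdict


def consolidate_by_sequence(records):
    """Group (accession, sequence) pairs so each distinct sequence appears once.

    Illustrative only: a real integration would also handle fragments,
    isoforms and outdated identifiers.
    """
    by_sequence = defaultdict(set)
    for accession, sequence in records:
        # Normalise case and surrounding whitespace so that trivially
        # different copies of the same sequence collapse into one record.
        by_sequence[sequence.strip().upper()].add(accession)
    return {seq: sorted(accs) for seq, accs in by_sequence.items()}


# Hypothetical accession keys from three imagined source databases.
records = [
    ("ACC_A1", "MKTAYIAKQR"),
    ("ACC_B7", "mktayiakqr"),  # same sequence, different source
    ("ACC_C3", "MSLLTEVETP"),
]
consolidated = consolidate_by_sequence(records)
for sequence, accessions in consolidated.items():
    print(sequence, accessions)
```

Keying the records on the sequence itself, rather than on cross-references supplied by the source databases, reflects the approach described above of remapping references from the original data.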
One major aim of ProteinCenter is to allow users to spend their time on resolving the complexity of biology rather than on the complexity of bioinformatics data formats.
One principal task for ProteinCenter is to help transform lists of proteins to biological discoveries in a fast and efficient manner. The idea is to ensure a smooth workflow from the protein identification to the biological story hidden in the dataset. This implies:
Smooth import of experimental datasets (including those derived from the literature)
Smooth integration of experimental data with the biological annotation
Easy application of filters to isolate subsets of data of particular interest
Easy comparison of datasets
Fast and easy removal of redundancy in experimental data
Easy access to detailed information and original data sources
Easy export of final results as datasets or reports
The goal of A-Z analysis is that you, as a user, can take your set of proteins and perform a number of analyses that lead to answers completing a given workflow. Some examples include:
Quickly import and evaluate experimental protocols - checking whether the outcome of analysis matches the experiment, e.g. to ask questions like: Did my prefractionation isolate primarily mitochondrial proteins? Or have I isolated a set of small proteins?
Import a number of datasets and compare them to find fractions of data that are specific to a certain dataset or common to all. Such analysis can readily be done using ProteinCenter because the integration component and the clustering functionality allow for a comprehensive comparison of datasets without ever worrying whether two identifiers represent e.g. alleles or fragments of a given protein.
Import a highly redundant dataset (e.g. due to MS data not distinguishing various isoforms) and get a reliable overview of the distribution of e.g. subcellular components, function etc.
Repeat the analysis as often as needed (e.g. using different filters to exclude or include particular data)
These examples draw on some of the ProteinCenter functionality that ensures efficiency. The clustering functionality deals with redundancy in a single step (the clustering can be used as is, although it is also possible to refine it further). Data originating from different source databases may be merged and compared, since the relationships between ProteinIDs are handled automatically. Instant analysis of, for example, subcellular localization is possible because comprehensive annotation is already integrated with the sequence data.
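The dataset comparison described above can be sketched in a few lines. This is only an illustration of the principle, not the way ProteinCenter performs it. It assumes a precomputed mapping from each identifier to a cluster of related proteins; the function name, identifiers and cluster labels are all hypothetical.

```python
def compare_by_cluster(dataset_a, dataset_b, cluster_of):
    """Compare two identifier lists at the level of protein clusters."""
    clusters_a = {cluster_of[protein_id] for protein_id in dataset_a}
    clusters_b = {cluster_of[protein_id] for protein_id in dataset_b}
    return {
        "common": clusters_a & clusters_b,
        "only_a": clusters_a - clusters_b,
        "only_b": clusters_b - clusters_a,
    }


# Hypothetical identifiers and cluster assignments, for illustration only.
cluster_of = {
    "SRC1_0001": "cluster_1",
    "SRC2_0451": "cluster_1",  # same protein, recorded in another source
    "SRC1_0002": "cluster_2",
    "SRC2_0999": "cluster_3",
}
experiment_a = ["SRC1_0001", "SRC1_0002"]
experiment_b = ["SRC2_0451", "SRC2_0999"]

comparison = compare_by_cluster(experiment_a, experiment_b, cluster_of)
print(comparison)
```

Although the two experiments share no identifier, comparing at cluster level shows that they have one protein in common, which is why the source database of each identifier does not need to match.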
Automation is a key component in ProteinCenter and is the way to ensure that users can focus on the biological analysis. In biology there will always be a need for additional tools for particular specialized analyses, but within the domain of ProteinCenter there is no need to worry about keeping track of computing infrastructure or the organization of tools and data. All information for a given protein is consolidated into one record, which provides rich, non-redundant information without compromising the important ability to distinguish between different isoforms. Information is automatically consolidated and redundancy is removed. This allows an enrichment of the annotation, which ensures consistency and comprehensiveness.
Data is automatically updated and integrated in the system, and essential bioinformatics analyses and predictions are automatically computed. No IT staff is needed.
The user can add, analyse and compare a number of datasets from different sources and feel assured that the experimental data are treated consistently and analysed in the context of up-to-date and comprehensive data.
Bioinformatics analysis of large sets of proteins often involves a large number of data files, web browser windows and tools. It is easy to lose track of where you are.
ProteinCenter provides a plain yet powerful interface. The components of the interface are shown in the figure below. At the top, there is a navigation bar with panes for navigating the major and minor modules of the system. On the left, there is a choice between three components: 1) a search menu, 2) a shopping basket for storing interesting proteins and 3) the workspace for protein datasets. On the right is the area in which data and tools are shown, and above it are the selection menus for applying filters. With this setup it is possible to keep track of which data are being analysed and the context of these data. This includes handling multiple stored datasets while building up new datasets (or creating some on the fly) and exploring specific proteins (and their similarity neighbors).
See more details on the interface in the individual chapters of this user manual.
© 2005-2017 Thermo Fisher Scientific