11/19/2014: Presentation: Ruben Acuña, Zoé Lacroix, Jacques Chomilier, and Nikolaos Papandreou. SMIR: a method to predict the residues involved in the core of a protein. European Conference on Computational Biology 2014, 7 - 10 Sep 2014, E03. Now available online.
The project SPROUTS (Structural Prediction for PRotein FOlding UTility System) was initiated in 2006 with the aim of comparing and integrating various structural analyses and presenting a combined view of the results to scientists. The first database, compiled in 2008, presented data capturing representative folds together with predictions, produced by seven programs, of the critical residues expected to belong to the folding nucleus of 429 structures. The complexity of managing seven different tools, the execution time, and the size of the results motivated the development of a database to organize the data and provide a meaningful interface to the scientist.
Originally, 10 structures corresponding to a total of 1211 amino acids were processed by 5 different programs for the 19 possible mutations of each amino acid. The output at the end of the experiment thus consisted of 115045 pieces of data. The execution of the different programs produced one file per amino acid, for a total of more than 7200 files to manipulate. It was clear that this solution was not viable for open and easy access. A database was therefore necessary, and it also offered the opportunity to make this information available to the whole scientific community.
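The figure of 115045 follows directly from the numbers above: one value per amino acid, per possible substitution, per program. The short Python sketch below reproduces that count. The function names and the one-letter residue alphabet are illustrative assumptions and are not part of the SPROUTS code base.

```python
# Illustrative sketch only: these names are hypothetical, not SPROUTS code.

# Standard one-letter codes for the 20 proteinogenic amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def point_mutations(residue):
    """Return the 19 residues that can replace `residue` at one position."""
    return [aa for aa in AMINO_ACIDS if aa != residue]


def count_predictions(n_residues, n_programs):
    """Count output values: one per residue, per substitution, per program."""
    mutations_per_residue = len(AMINO_ACIDS) - 1  # 19 possible substitutions
    return n_residues * mutations_per_residue * n_programs


# Original experiment: 1211 amino acids processed by 5 programs.
total = count_predictions(n_residues=1211, n_programs=5)
print(total)  # 115045
```

Because the count grows linearly with both the number of residues and the number of programs, the volume quickly becomes impractical to handle as flat files as more structures are added.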
After our work on making the data publicly accessible, the database grew to more than 200 structures, computed for a total of around 16500 amino acids. The second aim of our work was to offer simple and user-friendly tools to better visualize and analyze the results obtained. We created three visualization and analysis methods. The first displays raw ΔΔG values in a table. The second is a 2D graph of a computed stability score for each residue of a given sequence and for each tool. The last is based on a Jmol applet that can represent the 3D structure of a given protein, with symbols representing the information stored in the database. These visualization modes offer different ways to look at the data stored in the database and will suit scientists querying it whether they are more used to handling 3D protein structures or 1D/2D sequence problems. While this version of the system supported scientists' interactions with protein data, scientists were limited to proteins that were already in the database, or had to contact the project team to request that the database be extended to their proteins of interest.
It became apparent that the database needed to be extended with a submission server through which users could submit their own proteins and populate the SPROUTS database. For that purpose, the script-based system originally used to compute the data was turned into a full scientific workflow with online submission. With this addition, our database has grown to nearly 900 proteins.
We plan to continue our work on integrating these data and their analyses with other structural bioinformatics concepts in order to improve related methods. Our aim is to provide a meta server devoted to the characterisation of the folding core of proteins. Our particular goals include adding tools, improving visualization methods, integrating other databases, and adding new related concepts. We invite you to contact us regarding any tool(s) you think would help improve our system.
SPROUTS has two general functions.
The first is to provide existing mutation data given a protein specified by a PDB ID. This is Query mode.
The second is to generate new mutation data based on a new PDB ID or on user input. This is Submit mode.
When using SPROUTS, please reference: Lonquety, M., Lacroix, Z., Papandreou, N., and Chomilier, J. (2009) SPROUTS: a database for the evaluation of protein stability upon point mutation. Nucleic Acids Res., 37, D374-D379.