Protein solubility is an important property in industrial and therapeutic applications. Predicting it remains a challenge, despite a growing understanding of the relevant physicochemical properties. This tool predicts protein solubility from sequence. Using available data for Escherichia coli protein solubility in a cell-free expression system, 35 sequence-based properties are calculated. Feature weights are determined from the separation of low- and high-solubility subsets. The model returns a predicted scaled solubility. A protein with a scaled solubility value greater than 0.45 is predicted to be more soluble than the average soluble E. coli protein from the experimental solubility dataset of Niwa et al. 2009 (doi: 10.1073/pnas.0811922106), and a protein with a lower scaled solubility value is predicted to be less soluble.
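To give a feel for what a "sequence-based property" can look like, here is a minimal Python sketch that computes amino acid composition fractions, sequence length, and one simple charge-related composite from a raw sequence. The function name `composition_features` and the choice of features are illustrative assumptions; the text above does not list the 35 properties the tool actually uses, and this sketch does not reproduce its weights or its model.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence):
    """Return simple sequence-derived features for a raw protein sequence.

    Hypothetical helper for illustration only: these are NOT claimed to be
    the 35 properties used by the tool described above.
    """
    # Drop whitespace and line breaks, and normalise to upper case.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")

    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")

    counts = Counter(seq)
    n = len(seq)

    # Fraction of each standard amino acid in the sequence.
    features = {aa: counts[aa] / n for aa in AMINO_ACIDS}
    features["length"] = float(n)

    # Example composite (an assumption): fraction of basic residues (K, R)
    # minus fraction of acidic residues (D, E).
    features["KR_minus_DE"] = (
        features["K"] + features["R"] - features["D"] - features["E"]
    )
    return features
```

A call such as `composition_features("MKKLDE")` returns a dictionary keyed by one-letter residue codes plus `length` and `KR_minus_DE`. A real predictor would combine many such values with learned weights.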
Description
Predicting the soluble expression of proteins in E. coli is a challenging task. Solubility is highly dependent on the method of protein construction, protein expression conditions, protein origin, the need for post-translational modifications, and the sequence itself.
To make the scaled solubility value more intuitive, we have calculated it for several commonly used fusion tags:
- TrxA: 0.765
- GST: 0.416
- SUMO: 0.932
- NusA: 0.748
- MBP: 0.588
- GFP: 0.592
- HaloTag: 0.430
- GB1: 0.899
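The fusion tag values above can be read against the 0.45 reference point with a few lines of Python. This is only a sketch of the comparison: the names `THRESHOLD`, `FUSION_TAGS`, and `compare_to_average` are hypothetical, the tag values are copied from the list above, and the handling of a value exactly equal to 0.45 is an assumption, since the description only covers values above and below it.

```python
# Reference point from the description: values above 0.45 are predicted to be
# more soluble than the average soluble E. coli protein in the dataset.
THRESHOLD = 0.45

# Scaled solubility values for the fusion tags listed above.
FUSION_TAGS = {
    "TrxA": 0.765,
    "GST": 0.416,
    "SUMO": 0.932,
    "NusA": 0.748,
    "MBP": 0.588,
    "GFP": 0.592,
    "HaloTag": 0.430,
    "GB1": 0.899,
}


def compare_to_average(scaled_solubility, threshold=THRESHOLD):
    """Label a scaled solubility value relative to the reference point."""
    if scaled_solubility > threshold:
        return "higher"
    if scaled_solubility < threshold:
        return "lower"
    # Assumption: the description does not say how to treat an exact tie.
    return "equal"


# Rank the tags from most to least soluble by predicted scaled solubility.
ranked = sorted(FUSION_TAGS.items(), key=lambda item: item[1], reverse=True)

for name, value in ranked:
    print(f"{name:8s} {value:.3f}  {compare_to_average(value)} than average")
```

With these values, only GST and HaloTag fall below the reference point; the other six tags are predicted to be more soluble than the average soluble E. coli protein.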
Reference
- Hebditch M, Carballo-Amador MA, Charonis S, Curtis R, Warwicker J. Protein-Sol: a web tool for predicting protein solubility from sequence. Bioinformatics (2017).