Protein solubility is an important property in industrial and therapeutic applications, yet predicting it remains a challenge despite a growing understanding of the relevant physicochemical properties. This tool predicts protein solubility from sequence. Using available data for Escherichia coli protein solubility in a cell-free expression system, 35 sequence-based properties are calculated, and feature weights are determined from the separation of low and high solubility subsets. The model returns a predicted scaled solubility. Any protein with a scaled solubility value greater than 0.45 is predicted to be more soluble than the average soluble E. coli protein in the experimental solubility dataset (Niwa et al. 2009, doi:10.1073/pnas.0811922106), and any protein with a lower value is predicted to be less soluble.
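As a rough illustration of the ideas above, the following Python sketch computes one simple family of sequence-based properties (amino acid composition) and interprets a scaled solubility value against the 0.45 threshold. It is not the tool's implementation: the 35 properties and their weights are not listed here, so composition is used only as an assumed example of a sequence-based property, and the function names `composition` and `interpret` are hypothetical. How a value of exactly 0.45 is handled is also an assumption.

```python
from collections import Counter
from typing import Dict

# Threshold taken from the description above: values above it are predicted
# to be more soluble than the average soluble E. coli protein in the dataset.
AVERAGE_SCALED_SOLUBILITY = 0.45

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> Dict[str, float]:
    """Return the fraction of each standard amino acid in a raw sequence.

    Assumed example of a sequence-based property; the tool's actual
    35 properties are not reproduced here.
    """
    # Raw sequences may be pasted with line breaks or spaces; strip them.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in STANDARD_AMINO_ACIDS}


def interpret(scaled_solubility: float) -> str:
    """Compare a scaled solubility value with the dataset average."""
    if scaled_solubility > AVERAGE_SCALED_SOLUBILITY:
        return "more soluble than average"
    if scaled_solubility < AVERAGE_SCALED_SOLUBILITY:
        return "less soluble than average"
    # Assumption: the description does not say how to read exactly 0.45.
    return "at the average"
```

For example, `interpret(0.60)` reports a protein as more soluble than average, while `interpret(0.30)` reports it as less soluble.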
1. Enter a single protein (raw sequence):
Predicting the soluble expression of proteins in E. coli is a challenging task. Solubility is highly dependent on the method of protein construction, protein expression conditions, protein origin, the need for post-translational modifications, and the sequence itself.
To make the scaled solubility value more intuitive, we have calculated solubility values for several commonly used fusion tags.