PepSMI: Convert Peptide to SMILES string

Protein solubility is an important property in industrial and therapeutic applications. Predicting it remains a challenge, despite a growing understanding of the relevant physicochemical properties. This tool predicts protein solubility from sequence. It calculates 35 sequence-based properties, using available data for Escherichia coli protein solubility in a cell-free expression system. Feature weights are determined from the separation of low- and high-solubility subsets of that data. The model returns a predicted scaled solubility value. A protein with a scaled solubility value greater than 0.45 is predicted to be more soluble than the average soluble E. coli protein in the experimental solubility dataset of Niwa et al. (2009, doi:10.1073/pnas.0811922106), and a protein with a lower scaled solubility value is predicted to be less soluble.
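To give a feel for what "sequence-based properties" can look like, here is a minimal Python sketch that computes amino acid composition and a few composite charge features from a raw sequence. It is an illustration only: the function name `composition_features` and the choice of composites (K-R, D-E, K+R-D-E) are assumptions made for this example, and the sketch does not reproduce the tool's 35 properties or its feature weights.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence: str) -> dict[str, float]:
    """Return per-residue fractions plus a few composite features.

    Hypothetical helper for illustration; not the tool's actual feature set.
    """
    # Accept raw sequence input: drop whitespace and normalise case.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")

    unknown = set(seq) - set(STANDARD_AA)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")

    counts = Counter(seq)
    n = len(seq)

    # Fraction of each standard amino acid (0.0 when absent).
    features = {aa: counts[aa] / n for aa in STANDARD_AA}

    # Composite features built from the fractions (assumed examples).
    features["K-R"] = features["K"] - features["R"]
    features["D-E"] = features["D"] - features["E"]
    features["K+R-D-E"] = (
        features["K"] + features["R"] - features["D"] - features["E"]
    )
    return features
```

Calling `composition_features("MKT AYIAK")` strips the space before counting, so pasted sequences with line breaks or spacing are handled the same way as a single unbroken string.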

1. Enter a single protein (raw sequence):

Description

Predicting the soluble expression of proteins in E. coli is a challenging task. Solubility is highly dependent on the method of protein construction, protein expression conditions, protein origin, the need for post-translational modifications, and the sequence itself.

To make the scaled solubility value easier to interpret, we have calculated it for several commonly used fusion tags:

TrxA: 0.765
GST: 0.416
SUMO: 0.932
NusA: 0.748
MBP: 0.588
GFP: 0.592
HaloTag: 0.430
GB1: 0.899
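The fusion tag values can be read against the 0.45 reference point described earlier. The short Python sketch below does this; the names `interpret_scaled_solubility` and `rank_tags` are hypothetical helpers written for this example, and treating a value of exactly 0.45 as "not above average" is an assumption, since the text only defines the greater-than and lower-than cases.

```python
# Reference point: predicted scaled solubility of the average soluble
# E. coli protein in the experimental dataset (value from the text above).
AVERAGE_THRESHOLD = 0.45

# Scaled solubility values for the fusion tags listed above.
TAG_SOLUBILITY = {
    "TrxA": 0.765,
    "GST": 0.416,
    "SUMO": 0.932,
    "NusA": 0.748,
    "MBP": 0.588,
    "GFP": 0.592,
    "HaloTag": 0.430,
    "GB1": 0.899,
}


def interpret_scaled_solubility(value: float) -> str:
    """Label a scaled solubility value relative to the 0.45 reference."""
    # Assumption: a value exactly equal to the threshold is not "above".
    if value > AVERAGE_THRESHOLD:
        return "above average"
    return "below average"


def rank_tags(tags: dict[str, float]) -> list[tuple[str, float, str]]:
    """Sort tags from highest to lowest value and attach a label to each."""
    ordered = sorted(tags.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, value, interpret_scaled_solubility(value))
        for name, value in ordered
    ]


if __name__ == "__main__":
    for name, value, label in rank_tags(TAG_SOLUBILITY):
        print(f"{name:8s} {value:.3f}  {label}")
```

By this reading, GST and HaloTag fall below the reference point and the other six tags fall above it.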

Reference

Hebditch M, Carballo-Amador MA, Charonis S, Curtis R, Warwicker J. Protein-Sol: a web tool for predicting protein solubility from sequence. Bioinformatics (2017).