PepSMI: Convert Peptide to SMILES string

Protein solubility is an important property in industrial and therapeutic applications, yet it remains difficult to predict despite a growing understanding of the relevant physicochemical properties. This tool predicts protein solubility from sequence. It calculates 35 sequence-based properties, with feature weights determined from the separation of low- and high-solubility subsets in available data on Escherichia coli protein solubility in a cell-free expression system. The model returns a predicted scaled solubility. A value greater than 0.45 means the protein is predicted to be more soluble than the average soluble E. coli protein in the experimental solubility dataset (Niwa et al., 2009; doi:10.1073/pnas.0811922106); a lower value means it is predicted to be less soluble.
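The decision rule described above can be written as a minimal sketch. The function name and the returned labels are hypothetical, chosen only for illustration; the 0.45 reference value is the one quoted in the text. The text does not say how a value of exactly 0.45 is treated, so the sketch reports that case separately rather than guessing.

```python
# Reference value quoted in the text: the average for soluble E. coli
# proteins in the experimental dataset.
AVERAGE_SCALED_SOLUBILITY = 0.45


def interpret_scaled_solubility(value: float) -> str:
    """Compare a predicted scaled solubility with the dataset average.

    Hypothetical helper: the labels are illustrative, not the tool's output.
    """
    if value > AVERAGE_SCALED_SOLUBILITY:
        return "predicted more soluble than average"
    if value < AVERAGE_SCALED_SOLUBILITY:
        return "predicted less soluble than average"
    # The source text does not define this boundary case.
    return "equal to the average (not covered by the stated rule)"


if __name__ == "__main__":
    for score in (0.30, 0.45, 0.80):
        print(f"{score:.2f}: {interpret_scaled_solubility(score)}")
```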

1. Enter a single protein (raw sequence):
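A raw sequence here means the one-letter amino acid string without a FASTA header. The sketch below shows one way to tidy pasted input before submitting it. It is an assumption about sensible pre-processing, not a description of the tool's own validation: the helper name is hypothetical, and restricting input to the 20 standard residues is an illustrative choice.

```python
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_raw_sequence(text: str) -> str:
    """Return an upper-case sequence with all whitespace removed.

    Hypothetical helper; raises ValueError for FASTA headers, empty input,
    or characters outside the 20 standard residues.
    """
    if text.lstrip().startswith(">"):
        raise ValueError("FASTA header found; paste the raw sequence only")
    # str.split() with no arguments splits on any whitespace, including newlines.
    sequence = "".join(text.split()).upper()
    if not sequence:
        raise ValueError("empty sequence")
    unexpected = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if unexpected:
        raise ValueError(f"unexpected characters: {''.join(unexpected)}")
    return sequence


if __name__ == "__main__":
    cleaned = clean_raw_sequence("mkt ayi\nAKQR")
    print(cleaned, len(cleaned))
```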



Predicting the soluble expression of proteins in E. coli is a challenging task. Solubility is highly dependent on the method of protein construction, protein expression conditions, protein origin, the need for post-translational modifications, and the sequence itself.

To make the predicted solubility value easier to interpret, we calculated it for several commonly used fusion tags:

  • TrxA: 0.765
  • GST: 0.416
  • SUMO: 0.932
  • NusA: 0.748
  • MBP: 0.588
  • GFP: 0.592
  • HaloTag: 0.430
  • GB1: 0.899
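To see how these tags compare with one another and with the 0.45 average, the values can be ranked as in the short sketch below. The variable names are illustrative; the numbers are copied from the list above.

```python
# Predicted scaled solubility for common fusion tags, as listed above.
FUSION_TAG_SOLUBILITY = {
    "TrxA": 0.765,
    "GST": 0.416,
    "SUMO": 0.932,
    "NusA": 0.748,
    "MBP": 0.588,
    "GFP": 0.592,
    "HaloTag": 0.430,
    "GB1": 0.899,
}

# Average for soluble E. coli proteins in the reference dataset, per the text.
AVERAGE_SCALED_SOLUBILITY = 0.45

# Sort from highest to lowest predicted solubility.
ranked = sorted(FUSION_TAG_SOLUBILITY.items(), key=lambda item: item[1], reverse=True)

# Tags whose predicted value falls below the dataset average.
below_average = [name for name, value in ranked if value < AVERAGE_SCALED_SOLUBILITY]

if __name__ == "__main__":
    for name, value in ranked:
        relation = "above" if value > AVERAGE_SCALED_SOLUBILITY else "below"
        print(f"{name:<8} {value:.3f}  {relation} average")
```

With the values listed, SUMO and GB1 rank highest, and only GST and HaloTag fall below the 0.45 average.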


  • Hebditch M, Carballo-Amador MA, Charonis S, Curtis R, Warwicker J. Protein-Sol: a web tool for predicting protein solubility from sequence. Bioinformatics (2017).