Peptide Solubility Prediction

This tool uses a deep-learning, sequence-based model to predict peptide solubility.

The tool is intended for peptides or proteins shorter than 200 residues that are expressed in E. coli, although its predictions may also apply more broadly.

It performs especially well on short peptides (<50 residues), with an AUROC of 0.95 and an accuracy of 91.3%, outperforming the existing method DSResSol at predicting the solubility of short peptides.
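For readers less familiar with the two reported metrics, the sketch below shows how AUROC and accuracy are conventionally computed from binary solubility labels (1 = positive, 0 = negative) and predicted probabilities. AUROC uses the pairwise (Mann-Whitney) formulation; the 0.5 decision threshold for accuracy and the toy labels and scores are assumptions for illustration, not values or code from this tool.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of sequences whose thresholded score matches the true label.
    The 0.5 threshold is an assumed default, not taken from the tool."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.
    Ties count as half a win (Mann-Whitney formulation of AUROC)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration only).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1]
print(f"AUROC={auroc(labels, scores):.3f}, accuracy={accuracy(labels, scores):.3f}")
# prints "AUROC=0.889, accuracy=0.667"
```

Note that AUROC is threshold-free while accuracy depends on the chosen cutoff, which is why the two numbers can differ noticeably on the same predictions.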

The training data comprise 18,453 sequences (47.6% positive and 52.4% negative) sourced from PROSO II, with lengths spanning a wide range (18 to 198 residues).
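Converting the stated class proportions into approximate sequence counts (approximate because the percentages are themselves rounded to one decimal place) shows that the dataset is close to balanced:

```latex
\[
N_{+} \approx 0.476 \times 18{,}453 \approx 8{,}784,
\qquad
N_{-} \approx 0.524 \times 18{,}453 \approx 9{,}669,
\qquad
N_{+} + N_{-} = 18{,}453
\]
```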

Accuracy is lower for long sequences (>100 residues).

PROSO II defines a soluble sequence as one that is transfectable, expressible, secretable, separable, and soluble in the E. coli expression system.

Maximum input length: 200 residues.
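Before submitting a raw sequence, it can help to normalize and check it locally. The sketch below enforces the 200-residue limit stated above and flags which reported accuracy regime a sequence falls into. Several pieces are assumptions, not the tool's documented behavior: that only the 20 canonical one-letter amino-acid codes are accepted, and the hypothetical `encode` helper, which illustrates one common way a sequence model consumes peptides (integer codes 1 to 20, right-padded with 0 to a fixed length) rather than this tool's actual encoding.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: 20 canonical residues
MAX_LEN = 200  # maximum input length stated for the tool


def validate_sequence(raw: str) -> str:
    """Normalize a raw one-letter sequence and enforce the input limits."""
    seq = "".join(raw.split()).upper()  # drop spaces/newlines, uppercase
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - set(STANDARD_AA))
    if bad:
        raise ValueError(f"non-standard residues: {''.join(bad)}")
    if len(seq) > MAX_LEN:
        raise ValueError(f"sequence has {len(seq)} residues; maximum is {MAX_LEN}")
    return seq


def length_regime(seq: str) -> str:
    """Map length onto the accuracy regimes described for this tool."""
    n = len(seq)
    if n < 50:
        return "short (<50): best reported performance"
    if n > 100:
        return "long (>100): lower reported accuracy"
    return "intermediate (50-100)"


def encode(seq: str, max_len: int = MAX_LEN) -> list[int]:
    """Hypothetical integer encoding: residues map to 1..20, padding is 0."""
    index = {aa: i + 1 for i, aa in enumerate(STANDARD_AA)}
    return [index[aa] for aa in seq] + [0] * (max_len - len(seq))


# Usage: melittin (26 residues), typed with stray spaces and lowercase.
seq = validate_sequence("gigavlk vltt glpal isw ikrkrqq")
print(len(seq), length_regime(seq))
print(encode(seq)[:5])
```

Raising `ValueError` early keeps malformed input (for example a FASTA header line or an ambiguous `X`) from silently reaching the model.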

1. Enter a single peptide (raw sequence):


References

  • Ansari M, White AD. Serverless Prediction of Peptide Properties with Recurrent Neural Networks. J Chem Inf Model. 2023 Apr 24;63(8):2546-2553. doi: 10.1021/acs.jcim.2c01317. Epub 2023 Apr 3. PMID: 37010950; PMCID: PMC10131225.
  • Investigating Active Learning and Meta-Learning for Iterative Peptide Design. J Chem Inf Model. 2021.
  • PROSO II–a new method for protein solubility prediction. FEBS J. 2012.