ProtSol is a protein solubility prediction tool built on pretrained protein language models and deep neural networks. Given only amino acid sequences, it predicts the probability that each protein is soluble in the Escherichia coli expression system, supporting protein expression, purification, and structure-function studies. The model is trained on the large-scale, debiased UESolDS dataset and reaches state-of-the-art performance on the independent test set (AUC 0.8356; MCC 0.50 at the 0.84 threshold), outperforming traditional approaches.
ProtSol requires neither protein structural information nor multiple sequence alignments. Predictions are made directly from the amino acid sequence, which keeps the tool fast and easy to use and helps it generalize.
1. Protein Sequences (up to 10 FASTA entries):
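As a rough illustration of preparing input for the form above, the following sketch parses FASTA text, enforces the 10-entry limit, and reports sequence and residue counts. It is not ProtSol's own parser: the function name `parse_fasta` and the restriction to the 20 standard amino acids are assumptions made for this example.

```python
from typing import List, Tuple

MAX_ENTRIES = 10  # limit stated on the input form
# Assumption: only the 20 standard amino acids are accepted;
# the alphabet the tool itself accepts is not documented here.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str, max_entries: int = MAX_ENTRIES) -> List[Tuple[str, str]]:
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))

    if len(records) > max_entries:
        raise ValueError(f"{len(records)} entries given, at most {max_entries} allowed")
    for name, seq in records:
        if not seq:
            raise ValueError(f"entry '{name}' has no sequence")
        invalid = set(seq) - STANDARD_AA
        if invalid:
            raise ValueError(f"entry '{name}' contains invalid residues: {sorted(invalid)}")
    return records


# Made-up example sequences, for illustration only.
example = """>protA
MKTAYIAKQR
QISFVKSHFS
>protB
MSDNE
"""
records = parse_fasta(example)
total = sum(len(seq) for _, seq in records)
print(f"Parsed sequences: {len(records)}, total residues: {total}")
# prints "Parsed sequences: 2, total residues: 25"
```

Validating locally before submission catches missing headers and non-standard residues early; whether the tool accepts ambiguity codes such as X or B would need to be checked against the tool itself.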
The current model uses a decision threshold of 0.84, the value that maximizes MCC on the test set. Of the thresholds evaluated, it gives the highest precision (0.8792) and specificity (0.9269) and therefore the fewest false positives, i.e., insoluble proteins misclassified as soluble. This setting suits users focused on lab purification who want to avoid wasting experimental resources. The cost is lower recall (0.5323): nearly half of the soluble proteins in the test set are misclassified as insoluble.
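A minimal sketch of how a predicted probability might be turned into a label at this threshold is shown below. The function name `classify`, the inclusive comparison (`>=`), and the example scores are all assumptions for illustration; the tool's actual tie-handling is not stated here.

```python
THRESHOLD = 0.84  # operating threshold stated in the text


def classify(prob_soluble: float, threshold: float = THRESHOLD) -> str:
    """Label a sequence from its predicted solubility probability.

    Assumption: a score equal to the threshold counts as soluble.
    """
    if not 0.0 <= prob_soluble <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return "soluble" if prob_soluble >= threshold else "insoluble"


# Hypothetical scores, not real model output.
scores = {"protA": 0.91, "protB": 0.84, "protC": 0.62}
calls = {name: classify(p) for name, p in scores.items()}
print(calls)
# prints {'protA': 'soluble', 'protB': 'soluble', 'protC': 'insoluble'}
```

Users who care more about not missing soluble proteins than about avoiding false positives could pass a lower `threshold`, accepting lower precision in exchange for higher recall.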
Model Performance Metrics
Impact of different thresholds on model performance:
Test setting | Threshold | Accuracy | AUC | AUPR | F1 | MCC | Precision | Recall | Specificity
----------------------------------------------------------------------------------------------------------------
Default | 0.500 | 0.6929 | 0.8356 | 0.8493 | 0.7451 | 0.4230 | 0.6367 | 0.8978 | 0.4880
Best MCC | 0.840 | 0.7297 | 0.8356 | 0.8493 | 0.6631 | 0.4998 | 0.8792 | 0.5323 | 0.9269
Best F1 | 0.460 | 0.6778 | 0.8356 | 0.8493 | 0.7418 | 0.4097 | 0.6188 | 0.9259 | 0.4299
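To make the threshold-dependent metrics above concrete, the sketch below computes accuracy, precision, recall, specificity, F1, and MCC from confusion-matrix counts at a chosen threshold, using their standard definitions. The labels and scores are a small made-up toy set, not ProtSol data, and `confusion_metrics` is a hypothetical helper. AUC and AUPR are omitted because they do not depend on the threshold, which is why they are identical across all three settings.

```python
import math
from typing import Dict, Sequence


def confusion_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Threshold-dependent metrics; label 1 = soluble, 0 = insoluble."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0  # assumption: inclusive comparison
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def safe_div(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "acc": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Toy data for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.95, 0.88, 0.70, 0.55, 0.80, 0.60, 0.40, 0.10]

for thr in (0.50, 0.84):
    m = confusion_metrics(y_true, y_prob, thr)
    print(f"thr={thr:.2f} " + " ".join(f"{k}={v:.4f}" for k, v in m.items()))
```

On this toy set, raising the threshold from 0.50 to 0.84 lifts specificity from 0.5 to 1.0 while recall drops from 1.0 to 0.5, the same direction of trade-off reported for the model above.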
BENCHMARK Performance Metrics (Threshold = 0.84)
Class | Samples | Precision | Recall | F1 | MCC | AUC | AUPR | Specificity
----------------------------------------------------------------------------------------------------------------
Overall | 3995 | 0.8798 | 0.5317 | 0.6629 | 0.5000 | 0.8356 | 0.8493 | 0.9274
================================================================================================================
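As a quick sanity check on the reported figures, F1 is the harmonic mean of precision and recall, so each reported F1 can be recomputed from the other two columns. The sketch below does this for the values listed on this page; the dictionary keys are labels chosen for this example, and small differences are expected because the published values are rounded to four decimals.

```python
# (precision, recall, reported F1) copied from the metrics on this page.
reported = {
    "test thr=0.500": (0.6367, 0.8978, 0.7451),
    "test thr=0.840": (0.8792, 0.5323, 0.6631),
    "test thr=0.460": (0.6188, 0.9259, 0.7418),
    "benchmark thr=0.84": (0.8798, 0.5317, 0.6629),
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


diffs = {}
for name, (p, r, f1_reported) in reported.items():
    f1 = f1_from(p, r)
    diffs[name] = abs(f1 - f1_reported)
    print(f"{name}: recomputed F1={f1:.4f}, reported F1={f1_reported:.4f}")
```

All four recomputed values agree with the reported F1 to within rounding error.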
Last updated: 2026-04-09