ProtSA is a sequence-only predictor of protein structural properties. Given an amino acid sequence, it outputs three residue-level quantities: relasa (relative solvent-accessible surface area), plddt (local confidence), and sec (secondary structure, H/E/C). The model is trained with a staged curriculum: Stage 1 learns stable residue-level structural representations, mainly through node regression and secondary structure recognition; Stage 2 then adds more complete structural constraints for joint optimization. The design is leakage-free: both training and inference use sequence-side information only. On the test set, the final version reaches a relasa MAE of 0.0952 (R2 0.7540), a plddt MAE of 3.27 on the 0-100 scale, and an overall secondary structure accuracy of 0.9424. It is intended to balance accuracy with inference efficiency and is suitable as a front end for large-scale structural annotation of sequences.
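To make the three outputs concrete, here is a minimal sketch of a per-residue result record and a consistency check. The class name `ResiduePrediction`, the function `validate`, and the value ranges (relasa in [0, 1], plddt in [0, 100]) are assumptions made for illustration; they are not taken from the ProtSA code.

```python
from dataclasses import dataclass
from typing import List

SEC_CLASSES = ("H", "E", "C")  # helix, strand, coil


@dataclass(frozen=True)
class ResiduePrediction:
    """Hypothetical per-residue record mirroring the three outputs."""
    residue: str   # one-letter amino acid code
    relasa: float  # relative solvent accessibility, assumed in [0, 1]
    plddt: float   # local confidence, assumed on the 0-100 scale
    sec: str       # one of SEC_CLASSES


def validate(sequence: str, preds: List[ResiduePrediction]) -> None:
    """Raise ValueError unless there is one well-formed record per residue."""
    if len(preds) != len(sequence):
        raise ValueError(
            f"expected {len(sequence)} predictions, got {len(preds)}"
        )
    for i, (aa, p) in enumerate(zip(sequence, preds)):
        if p.residue != aa:
            raise ValueError(f"position {i}: residue mismatch")
        if not 0.0 <= p.relasa <= 1.0:
            raise ValueError(f"position {i}: relasa out of range")
        if not 0.0 <= p.plddt <= 100.0:
            raise ValueError(f"position {i}: plddt out of range")
        if p.sec not in SEC_CLASSES:
            raise ValueError(f"position {i}: unknown sec label {p.sec!r}")
```

A check of this kind is cheap to run after inference and catches length mismatches before the annotations are written downstream.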
Input: protein sequence (max 1024 aa)
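The 1024-residue input limit can be enforced before inference. The sketch below parses FASTA text or a bare sequence and reports the number of sequences and total residues. The function names, the restriction to the 20 standard amino acids, and the reading of the limit as applying to each sequence separately are assumptions; the actual ProtSA input handling may differ.

```python
from typing import List, Optional, Tuple

MAX_LEN = 1024  # input limit stated above
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard residues only


def parse_sequences(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs.

    Lines that are not preceded by a '>' header are joined into a
    single unnamed sequence.
    """
    records: List[Tuple[str, str]] = []
    name: Optional[str] = None
    chunks: List[str] = []

    def flush() -> None:
        nonlocal name, chunks
        if chunks:
            seq = "".join(chunks).upper()
            records.append((name or f"seq{len(records) + 1}", seq))
        name, chunks = None, []

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            name = line[1:].strip() or None
        else:
            chunks.append(line)
    flush()
    return records


def summarize(records: List[Tuple[str, str]]) -> Tuple[int, int]:
    """Validate records and return (number of sequences, total residues)."""
    for name, seq in records:
        if len(seq) > MAX_LEN:
            raise ValueError(f"{name}: {len(seq)} aa exceeds {MAX_LEN}")
        bad = set(seq) - VALID_AA
        if bad:
            raise ValueError(f"{name}: unsupported residues {sorted(bad)}")
    return len(records), sum(len(seq) for _, seq in records)
```

Rejecting over-length input up front avoids silent truncation, which would otherwise shift residue indices in the output.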
Test Set Metrics (focused on relasa/plddt/sec)
============================================================
Model Metrics (Stage2, Sequence-only)
============================================================
[Residue-level Regression]
------------------------------------------------------------
Target          MAE      R2       Note
------------------------------------------------------------
relasa          0.0952   0.7540   relative solvent accessibility
plddt_norm      0.0327   0.7446   normalized pLDDT (0-1)
plddt(0-100)    3.27     -        converted from plddt_norm
------------------------------------------------------------
rsa_pcc         0.8685
------------------------------------------------------------
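For reference, the regression metrics in the table can be computed as in the sketch below. It uses the standard definitions of MAE, R2, and the Pearson correlation coefficient. Reading rsa_pcc as the Pearson correlation between predicted and true relasa is an assumption, as is pooling all residues before averaging; the function names and toy values are illustrative only.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pcc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def to_plddt_scale(mae_norm: float) -> float:
    """Convert an MAE on normalized pLDDT (0-1) to the 0-100 scale."""
    return mae_norm * 100.0


# Toy values for illustration only, not ProtSA data.
true_vals = [0.0, 0.5, 1.0]
pred_vals = [0.1, 0.5, 0.9]
toy_mae = mae(true_vals, pred_vals)
toy_r2 = r2(true_vals, pred_vals)
toy_pcc = pcc(true_vals, pred_vals)
```

Because MAE is linear in the scale of the target, the plddt(0-100) row follows directly from plddt_norm: 0.0327 multiplied by 100 gives 3.27. R2 is scale-invariant, which is why no separate value is listed for that row.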
============================================================
Secondary Structure Classification
============================================================
Class   Support    Prec     Recall   F1       Acc
--------------------------------------------------
H        913987    0.9557   0.9589   0.9573   0.9589
E        404837    0.9369   0.9107   0.9236   0.9107
C       1064603    0.9331   0.9403   0.9367   0.9403
--------------------------------------------------
Macro                                0.9392   0.9424
(Macro row: macro-averaged F1 and overall accuracy)
Confusion Matrix (rows=true, cols=pred):
             H         E         C
H       876424       779     36784
E         1168    368678     34991
C        39487     24059   1001057
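The per-class figures in the classification table can be re-derived from the confusion matrix. The sketch below does so with plain Python; the function names are illustrative, and the only inputs are the counts listed above. Per-class Acc is the fraction of true members of a class that are predicted correctly, which is the same quantity as recall.

```python
from typing import Dict, List

LABELS = ("H", "E", "C")
# Rows are true classes, columns are predicted classes.
CM: List[List[int]] = [
    [876424, 779, 36784],
    [1168, 368678, 34991],
    [39487, 24059, 1001057],
]


def per_class_metrics(cm: List[List[int]]) -> Dict[str, Dict[str, float]]:
    """Support, precision, recall, and F1 for each class."""
    n = len(cm)
    out: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(LABELS):
        tp = cm[i][i]
        support = sum(cm[i])                       # row sum: true count
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        prec = tp / predicted
        rec = tp / support
        f1 = 2 * prec * rec / (prec + rec)
        out[label] = {
            "support": support, "prec": prec, "recall": rec, "f1": f1,
        }
    return out


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    metrics = per_class_metrics(cm)
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


def accuracy(cm: List[List[int]]) -> float:
    """Overall accuracy: diagonal sum over total count."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


metrics = per_class_metrics(CM)
overall_macro_f1 = macro_f1(CM)
overall_accuracy = accuracy(CM)
```

The largest off-diagonal counts are between H and C and between E and C, while direct H/E confusion is rare (779 and 1168 residues).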
Last updated: 2026-04-30