Description of the server

Transmembrane proteins (TMPs) are located in different membranes and provide gates between the inner and outer sides of cells or organelles. From a structural point of view, regions embedded in the membrane are highly ordered; however, tails and connecting loops may contain flexible segments that can serve as linkers or binding sites, or exhibit short linear motifs.

MemDis is the first disorder prediction method that is specific to membrane proteins. We considered the special characteristics of transmembrane proteins both when constructing the dataset and when training our method.

MemDis was trained using X-ray crystallography data: we used the MobiDB database as the source and selected only those proteins where 90% of the structures were in agreement. Electron densities, and therefore coordinates, belonging to residues and segments that cannot adopt a stable structure are missing from the final structure, which gives a complementary indication of protein disorder. We selected these regions along with ordered segments and trained convolutional neural networks (CNNs) and bidirectional long short-term memory (LSTM) networks to predict disorder propensity (Figure 1).
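The labeling idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the MemDis pipeline: it assumes that "agreement" is evaluated per residue across several structures of the same protein, and the function name `consensus_labels`, the input layout and the handling of ambiguous residues are all hypothetical.

```python
from typing import List, Optional


def consensus_labels(observed: List[List[bool]],
                     agreement: float = 0.9) -> List[Optional[int]]:
    """Derive per-residue order/disorder labels from several structures.

    observed[s][i] is True when residue i has coordinates in structure s
    and False when it is missing from that structure.
    Returns 1 (disordered), 0 (ordered) or None (structures disagree).
    The per-residue reading of the 90% rule is an assumption.
    """
    n_structures = len(observed)
    labels: List[Optional[int]] = []
    for column in zip(*observed):  # one column per residue position
        n_observed = sum(column)
        n_missing = n_structures - n_observed
        if n_observed >= agreement * n_structures:
            labels.append(0)      # resolved in almost every structure
        elif n_missing >= agreement * n_structures:
            labels.append(1)      # missing in almost every structure
        else:
            labels.append(None)   # ambiguous, excluded from training
    return labels


# Ten hypothetical structures covering three residues:
# residue 0 always resolved, residue 1 never resolved, residue 2 resolved in half.
structures = [[True, False, i < 5] for i in range(10)]
labels = consensus_labels(structures)
```

Residues labeled `None` in this sketch would simply be left out of the training data, which is one way to keep only positions where the structures agree.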

Figure 1: Data preparation for the training of MemDis. First, we selected protein fragments based on the available PDB information. Extracellular distant (distance from the membrane >15 AA), extracellular proximal (<15 AA), intracellular distant and intracellular proximal residues from these fragments were fed into the appropriate CNN, also considering information from residues within 5 AA of the residue of interest. The LSTM was trained on the full-length protein fragments, considering the preceding 10 AA.

MemDis separates different subcellular localizations based on protein topology: tail and loop segments of membrane proteins are further divided into intracellular and extracellular, membrane-proximal and membrane-distant regions. We trained a different CNN for each localization, as segments are exposed to different environments that can affect folding. In addition, MemDis considers membrane protein-specific features, among them topology and the length of membrane segments (which carries information about the tilt of the segments).
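The split into four non-membrane regions can be illustrated with a short sketch. It assumes a per-residue topology string with the hypothetical alphabet `I` (intracellular), `O` (extracellular) and `M` (membrane), measures distance as the number of residues to the nearest membrane residue in the sequence, and treats a distance of exactly 15 as proximal, since the text only states >15 and <15. None of these choices is confirmed as the MemDis implementation.

```python
from typing import List


def assign_regions(topology: str, cutoff: int = 15) -> List[str]:
    """Label every residue by side of the membrane and distance from it.

    topology uses 'M' for membrane residues, 'I' for intracellular and
    'O' for extracellular ones (a hypothetical encoding).
    """
    membrane_positions = [i for i, t in enumerate(topology) if t == "M"]
    if not membrane_positions:
        raise ValueError("topology contains no membrane residues")

    regions: List[str] = []
    for i, t in enumerate(topology):
        if t == "M":
            regions.append("membrane")
            continue
        # Sequence distance (in residues) to the closest membrane residue.
        distance = min(abs(i - m) for m in membrane_positions)
        side = "intracellular" if t == "I" else "extracellular"
        zone = "distant" if distance > cutoff else "proximal"
        regions.append(f"{side}_{zone}")
    return regions


# A hypothetical bitopic protein: 20 intracellular, 5 membrane, 20 extracellular residues.
regions = assign_regions("I" * 20 + "M" * 5 + "O" * 20)
```

Each label could then select which region-specific network receives the residue, in line with the one-CNN-per-localization design described above.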

We evaluated MemDis on an independent test set that does not share similar sequences with the training and validation sets. We measured the accuracy of popular disorder prediction methods and compared them with the sensitive and specific settings of MemDis. MemDis outperforms currently available state-of-the-art methods on this membrane protein-specific dataset (Table 1). Although some methods offer similar or slightly higher specificity, they barely predict any disordered residues. In contrast, MemDis produces the highest MCC, AUC and balanced accuracy values.


Table 1: Performance of disorder prediction methods on the independent membrane protein test set. TP, FP, FN, TN: true positives, false positives, false negatives, true negatives; BAcc: balanced accuracy; Sens: sensitivity; Spec: specificity; MCC: Matthews correlation coefficient; PPV: positive predictive value; F1: F1 score; SOV: segment overlap; AUC: area under the curve.

Method              TP    FP    FN    TN    BAcc  Sens  Spec  MCC   PPV   F1    SOV   AUC
Disembl rem465      1460  313   3199  5124  0.63  0.31  0.94  0.34  0.82  0.45  0.72  0.77
IUPred long         1441  264   3218  5173  0.63  0.31  0.95  0.35  0.85  0.45  0.63  0.78
Disembl hot loops   2350  1086  2309  4351  0.65  0.50  0.80  0.32  0.68  0.58  0.67  0.74
IUPred short        1650  340   3009  5097  0.65  0.35  0.94  0.37  0.83  0.50  0.67  0.79
Espritz DisProt     391   228   4268  5209  0.52  0.08  0.96  0.09  0.63  0.15  0.57  0.67
Espritz NMR         2217  762   2442  4675  0.67  0.48  0.86  0.37  0.74  0.58  0.72  0.75
Espritz X-ray       1758  351   2901  5086  0.66  0.38  0.94  0.38  0.83  0.52  0.73  0.76
GlobPlot            1476  611   3183  4826  0.60  0.32  0.89  0.25  0.71  0.44  0.60  0.45
MemDis specific     2220  291   2439  5146  0.71  0.48  0.95  0.49  0.88  0.62  0.76  0.84
MemDis sensitive    3258  1030  1401  4407  0.75  0.70  0.81  0.51  0.76  0.73  0.78  0.83
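Most columns of Table 1 follow directly from the four confusion-matrix counts. The sketch below uses the standard textbook definitions (the function name `confusion_metrics` is ours, not part of the server) and recomputes the values for the MemDis specific setting. Segment overlap and AUC cannot be derived from these counts alone, so they are not covered.

```python
import math
from typing import Dict


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute per-residue performance measures from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. both classes are present
    and the method predicts at least one residue of each class.
    """
    sensitivity = tp / (tp + fn)   # fraction of disordered residues found
    specificity = tn / (tn + fp)   # fraction of ordered residues kept
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
        "ppv": ppv,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Counts taken from the "MemDis specific" row of Table 1.
memdis_specific = confusion_metrics(tp=2220, fp=291, fn=2439, tn=5146)
for name, value in memdis_specific.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the results agree with the tabulated row (balanced accuracy 0.71, sensitivity 0.48, specificity 0.95, MCC 0.49, PPV 0.88, F1 0.62). The same check also makes the trade-off visible: a method with high specificity but few true positives, such as Espritz DisProt, ends up with a low MCC.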


We also checked a handful of well-defined examples where the output of MemDis is supported by evidence from the literature (Figure 2).

Figure 2: Case studies using the specific setting on the MemDis server.
A) Syntaxin-1A is a nervous system protein that plays a role in the fusion of synaptic vesicles with the plasma membrane via formation of the SNARE complex. Munc18a controls SNARE assembly through its interaction with the syntaxin N-peptide, which is disordered. The protein also contains a linker region between the Habc and SNARE domains (18337752).
B) Integrin alpha-IIb is a receptor protein with a disordered cytosolic tail that exhibits short linear motifs proposed to play a role in SARS-CoV-2 infection (33436497, 33436498).
C) Stannin is a small bitopic transmembrane protein in which a flexible linker connects the CXC metal-binding motif and the 14-3-3-zeta binding domain (16246365).
D) GPCRs are a large family of receptor proteins with 7 transmembrane helices. The N- and C-terminal regions and the third intracellular loop (ICL3) are considered to be disordered. The C-terminal and ICL3 segments mediate interactions with signaling partners. The role of the N-terminal region is not fully understood; however, it exhibits many PTM sites, suggesting that modifications might occur during sorting (25198166).
