Methods. 2021 Sep 21. pii: S1046-2023(21)00222-X. [Epub ahead of print]
Protein adenosine diphosphate-ribosylation (ADPr) is the covalent attachment of one or more ADP-ribose moieties to a target protein, and it regulates the biological functions of that protein. Identifying ADPr sites across the proteome is an essential step towards fully understanding the regulatory mechanism of ADP-ribosylation. Because experimental approaches are costly and time-consuming, a computational tool for predicting ADPr sites is needed. Serine has recently been found to be the major residue type for ADP-ribosylation, but no predictor for serine ADPr sites is available. In this study, we collected thousands of experimentally validated human ADPr sites on serine residues and constructed several machine-learning classifiers. We found that a hybrid model, dubbed DeepSADPr, which integrates a one-dimensional convolutional neural network (CNN) with the One-Hot encoding approach and the word-embedding approach, compared favourably with the other models in both ten-fold cross-validation and an independent test. Its AUC value reached 0.935 for ten-fold cross-validation. Its sensitivity, accuracy and Matthews correlation coefficient reached 0.933, 0.867 and 0.740, respectively, at a fixed specificity of 0.80. Overall, DeepSADPr is the first classifier for predicting serine ADPr sites and is available at http://www.bioinfogo.org/DeepSADPr.
Keywords: ADP-ribosylation; Post-translational modification; convolutional neural network; deep learning