The benchmark datasets provided in nhKcr were used for modelling. They were obtained through the following steps. First, 19,287 experimentally proven Kcr proteins were filtered from the UniProt database. Then, redundant segments were removed with the CD-HIT package at a threshold of 0.3, and the remaining sequences were split into samples with a fixed window length of 29. Finally, 12,262 positive samples and 60,101 negative samples (written as 12,262/60,101) were collected for the training dataset, and 3,343/15,010 samples for the testing dataset. In addition, samples containing the autofilled residue “O” or the rare amino acid “U” were deleted to avoid interference. Ultimately, the training and testing datasets consisted of 12,022/59,226 and 3,252/14,792 samples, respectively. The resulting imbalance ratio (IR) between Kcr and non-Kcr samples is approximately 1:5.
Training samples: 12,022/59,226
Testing samples: 3,252/14,792
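The windowing and filtering steps described above can be illustrated with a minimal Python sketch. It is not the nhKcr pipeline itself. It assumes that each window is centred on a lysine (K) with 14 flanking residues on either side, and that positions beyond the protein termini are filled with “O”. The function names and the toy sequence are hypothetical.

```python
WINDOW = 29            # fixed window length stated in the text
FLANK = WINDOW // 2    # assumption: 14 residues on each side of the central K
PAD = "O"              # autofilled residue named in the text
EXCLUDED = {"O", "U"}  # samples containing these residues are dropped


def extract_windows(sequence):
    """Return (position, window) pairs, one per lysine in the sequence."""
    padded = PAD * FLANK + sequence + PAD * FLANK
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # In padded coordinates the lysine sits at i + FLANK, so the
            # slice [i, i + WINDOW) is centred on it.
            windows.append((i, padded[i:i + WINDOW]))
    return windows


def keep_sample(window):
    """True if the window contains neither 'O' nor 'U'."""
    return not (EXCLUDED & set(window))


def imbalance_ratio(n_pos, n_neg):
    """Number of negative samples per positive sample."""
    return n_neg / n_pos


# Hypothetical toy sequence: one lysine near the N-terminus (its window
# needs padding) and one far enough from both termini to need none.
toy_sequence = "MK" + "A" * 20 + "K" + "A" * 20
all_windows = extract_windows(toy_sequence)
kept_windows = [w for _, w in all_windows if keep_sample(w)]

# Ratios computed from the final sample counts reported in the text.
train_ir = imbalance_ratio(12022, 59226)
test_ir = imbalance_ratio(3252, 14792)
```

With the reported counts, the ratio is about 4.9 negatives per positive for the training set and about 4.5 for the testing set, in line with the approximate 1:5 IR stated above.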
The CNN algorithm was implemented using TensorFlow and Keras. The associated Python scripts and models are provided on GitHub at github.com/lijundou/iKcr_CNN.git so that large-scale predictions and further improvements can be carried out locally. Both CPU-based and GPU-based models are provided. For fast prediction, we recommend installing the GPU version of the TensorFlow package if a GPU is available.
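Before choosing between the CPU and GPU models, it may help to check whether TensorFlow can see a GPU on the local machine. The sketch below is an illustrative check, not part of the released scripts. It assumes TensorFlow 2.x, where `tf.config.list_physical_devices` is available, and the helper name `detect_device` is hypothetical.

```python
def detect_device():
    """Report which kind of device TensorFlow would be able to use."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return "tensorflow-not-installed"
    # An empty list means no GPU is visible to TensorFlow.
    gpus = tf.config.list_physical_devices("GPU")
    return "GPU" if gpus else "CPU"


device = detect_device()
print(f"Detected device: {device}")
```

If the result is "CPU" on a machine that does have a GPU, the installed TensorFlow build or the driver setup is the likely cause, and the CPU-based model remains usable in the meantime.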