In the rapidly evolving field of ribonucleic acid (RNA) secondary structure prediction, integrating domain-specific knowledge with advanced machine learning techniques offers a promising way to improve prediction accuracy, particularly when high-quality quantitative data are limited. This study introduces an approach termed 'Structural Stability-Aware Deep Learning.' First, I developed an Energy Neural Network (EnergyNN) that combines Convolutional Neural Networks (CNNs) and Graph Convolutional Networks (GCNs) to estimate an RNA's free energy from its sequence and secondary structure. An energy loss, calculated from both the real and predicted structures, was then incorporated as an additional training signal into existing machine learning-based RNA secondary structure frameworks without altering their architectures. Because structural stability correlates with free energy, this loss brings structural stability considerations into the machine learning model, significantly improving the accuracy, generalizability, and interpretability of RNA secondary structure predictions. I tested this integration on two different models, E2EFold and UFold. Evaluation on established benchmark datasets shows that the structural stability-aware versions consistently outperform the original E2EFold and UFold models in F1 score, precision, and recall. Moreover, the approach shows potential for broad applicability across RNA types and machine learning methods, representing a step toward better generalization of machine learning models in RNA research. Beyond advancing RNA secondary structure prediction, this study lays groundwork for future research that integrates domain knowledge into machine learning frameworks.
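The idea of adding an energy term to a host model's loss without touching its architecture can be illustrated with a minimal sketch. This is not the EnergyNN described here: as an assumption for illustration, the learned CNN/GCN energy estimator is replaced by a hypothetical toy lookup (`toy_energy`) that sums fixed per-pair contributions over a (possibly soft) contact matrix, the energy loss is assumed to be the absolute gap between the energies of the predicted and true structures, and `lam` is a hypothetical weighting hyperparameter.

```python
import numpy as np

# HYPOTHETICAL per-pair energies (kcal/mol); a toy stand-in for EnergyNN,
# not real nearest-neighbor parameters.
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def toy_energy(seq, contact):
    """Toy energy estimate: pair energies weighted by a symmetric contact map.

    Only the strict upper triangle is summed so each base pair counts once.
    A soft (probabilistic) contact map keeps the estimate differentiable.
    """
    n = len(seq)
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            weights[i, j] = PAIR_ENERGY.get((seq[i], seq[j]), 0.0)
    return float(np.sum(np.triu(contact * weights, k=1)))


def energy_loss(seq, pred_contact, true_contact):
    """Assumed form: absolute energy gap between predicted and true structures."""
    return abs(toy_energy(seq, pred_contact) - toy_energy(seq, true_contact))


def total_loss(base_loss, seq, pred_contact, true_contact, lam=0.1):
    """Host model's unchanged loss plus a weighted energy term (lam is hypothetical)."""
    return base_loss + lam * energy_loss(seq, pred_contact, true_contact)
```

For example, for the sequence `GGGAAACCC` with a true structure of three G-C pairs and a prediction that recovers only two of them, the toy energies are -9 and -6, so the energy loss is 3. Because the energy term is simply added to the existing loss, the host model (E2EFold or UFold in this study) keeps its architecture while being penalized for predicting structures whose estimated stability differs from that of the real structure.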