Enhancing Heart Disease Prediction with Data Augmentation and ML Classifiers

Year : 2025

Publisher : Institute of Electrical and Electronics Engineers Inc.

Source Title : 2025 International Conference on Artificial Intelligence and Machine Vision, AIMV 2025

Document Type :

Abstract

Heart disease is a leading cause of death worldwide, and early prediction is vital for prevention and treatment. This project applies machine learning methods to the Framingham Heart Study dataset for the early prediction of Coronary Heart Disease (CHD). The dataset is highly imbalanced, with CHD cases making up only 16% of the records, which degrades model accuracy. To overcome this, two data augmentation techniques, SMOTE (Synthetic Minority Over-sampling Technique) and cGAN (conditional Generative Adversarial Network), are applied to create synthetic CHD cases. Four machine learning algorithms are compared: Random Forest, XGBoost, SVM, and MLP. XGBoost achieved the highest AUC-ROC, 0.973, when trained on cGAN-augmented data, and cGAN augmentation significantly improved recall and overall model performance. This study highlights the potential of combining machine learning with data augmentation to improve CHD prediction.
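
To illustrate the kind of oversampling the abstract refers to, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority neighbours. It is not the authors' implementation. The function name `smote_like_oversample`, the two-feature toy data, and the parameter values are assumptions made for illustration, and only the Python standard library is used. The cGAN approach is not shown.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea).

    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy data with a 16% positive rate, mirroring the imbalance described above.
# The feature values are made up and are not Framingham data.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(84)]
minority = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(16)]

# Generate enough synthetic positives to balance the two classes.
synthetic = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + synthetic
print(len(majority), len(balanced_minority))
```

In a real pipeline, oversampling of this kind is normally applied to the training split only, so that the test set keeps the original class ratio. A maintained implementation such as `SMOTE` from the imbalanced-learn package would usually be preferred over a hand-written version.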