Comparison of Machine Learning Models for Heart Disease Classification with Web-Based Implementation
Abstract
According to research published in 2018 by the Health Ministry of Indonesia, heart disease has become one of the most concerning diseases in Indonesia: 15 out of every 1,000 Indonesians suffer from it. Data published by the ministry also attribute 3 million premature deaths (under 60 years old) in 2013 to heart disease. This research therefore aims to develop a web-based system that helps health workers screen for heart disease and produce an early diagnosis. Five models were evaluated, and the one with the best metrics was selected for the final system: Logistic Regression, Decision Tree, Random Forest, Naïve Bayes, and K-Nearest Neighbours. SMOTE and ADASYN were used to deal with imbalanced data, and the resulting balanced data served as additional training scenarios so that results could be compared with those of models trained on the imbalanced data. The models were assessed using cross-validation together with accuracy, precision, recall, F1-score, and the ROC curve with AUC. Random Forest trained on data balanced with ADASYN achieved the highest AUC, 0.920, while Logistic Regression scored lowest with an AUC of 0.500. Random Forest was therefore selected as the algorithm for the final system. The system passed black-box testing and is ready to be implemented.
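The two ideas the abstract leans on most, oversampling the minority class and ranking models by AUC, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote_like` and `roc_auc`, the toy data, and the parameter defaults are all assumptions made for illustration, and a real pipeline would more likely rely on an established library. The first function creates synthetic minority samples by interpolating between a sample and one of its nearest minority neighbours, which is the core idea behind SMOTE (ADASYN additionally generates more samples for minority points that are harder to learn, which is not shown here). The second computes AUC as the fraction of positive/negative pairs in which the positive case receives the higher score.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the positive
    case gets the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy minority class (made-up feature vectors, not the study's data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=5)
print(len(new_points))  # 5

# Three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Under this pairwise definition, a model that assigns every case the same score obtains an AUC of exactly 0.5, so a value of 0.500 corresponds to ranking no better than chance.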
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who have published articles in JAIC agree that:
1. The article has not been published previously in another scientific journal, proceedings, or other electronic journal.
2. Full rights to a submitted article pass to the management of JAIC Politeknik Negeri Batam.
3. The article may be shared with the public to increase references to and citations of the published manuscript.