Comparison of Clustering Algorithms: Fuzzy C-Means, K-Means, and DBSCAN for House Classification Based on Specifications and Price
Abstract
This study compares three clustering algorithms, Fuzzy C-Means, K-Means, and DBSCAN, for grouping houses by specification and price. The dataset includes price, building area, land area, number of bedrooms, number of bathrooms, and garage availability. Cluster quality was evaluated with the Silhouette Score and the Davies-Bouldin Score, which measure how compact and well separated the clusters are. K-Means performed best, reaching the highest Silhouette Score of 0.7702 with two clusters, followed by Fuzzy C-Means, which handled overlapping clusters well. DBSCAN was effective at detecting outliers but performed suboptimally on this housing dataset. These findings suggest that K-Means is the most suitable clustering method for housing data, while Fuzzy C-Means and DBSCAN remain alternatives depending on the data characteristics. The results are expected to make house search and classification more efficient and to give developers additional insight when shaping housing market strategies.
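The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the data is synthetic (invented group centers and spreads, not the Kaggle dataset), the standardization step, the DBSCAN parameters `eps` and `min_samples`, and the helper names `make_synthetic_houses`, `fuzzy_c_means`, and `score` are all assumptions. Fuzzy C-Means is written out by hand with the standard membership and center updates and is scored on hard labels taken from the largest membership.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler


def make_synthetic_houses(n_per_group=100, seed=0):
    """Made-up rows: price, building area, land area, bedrooms, bathrooms, garage.

    All columns are continuous stand-ins; real listings would have integer
    room counts and a binary garage flag.
    """
    rng = np.random.default_rng(seed)
    centers = np.array([
        [1.0e9, 60.0, 80.0, 2.0, 1.0, 0.0],    # hypothetical small houses
        [6.0e9, 250.0, 300.0, 5.0, 4.0, 1.0],  # hypothetical large houses
    ])
    spread = np.array([2.0e8, 10.0, 15.0, 0.5, 0.5, 0.1])
    rows = [c + rng.normal(size=(n_per_group, 6)) * spread for c in centers]
    return np.vstack(rows)


def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means; returns (hard labels, membership matrix)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), n_clusters))
    u /= u.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]  # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = np.fmax(dist, 1e-12) ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return u.argmax(axis=1), u


def score(X, labels):
    """(silhouette, Davies-Bouldin) on non-noise points, or None if < 2 clusters."""
    mask = labels != -1  # DBSCAN marks noise points with the label -1
    if len(set(labels[mask])) < 2:
        return None
    return (silhouette_score(X[mask], labels[mask]),
            davies_bouldin_score(X[mask], labels[mask]))


# Standardize so that price (in the billions) does not dominate the distances.
X = StandardScaler().fit_transform(make_synthetic_houses())

results = {
    "K-Means": score(X, KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)),
    "Fuzzy C-Means": score(X, fuzzy_c_means(X, n_clusters=2)[0]),
    "DBSCAN": score(X, DBSCAN(eps=1.0, min_samples=5).fit_predict(X)),
}
for name, res in results.items():
    print(name, res)
```

A higher Silhouette Score and a lower Davies-Bouldin Score both indicate better-separated clusters. The numbers printed here depend entirely on the synthetic data and will not match the 0.7702 reported in the abstract.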
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who have published articles in JAIC agree that:
1. The article has not been previously published in any other scientific journal, proceedings, or electronic journal.
2. Full rights to the submitted article are transferred to the management of JAIC, Politeknik Negeri Batam.
3. The article may be shared with the public to increase references to and citations of the published manuscript.