Detecting Fake Reviews Using BERT and Sublinear_TF Methods on Hotel Reviews in the Lombok Tourism Area

  • Zulpan Hadi Universitas Teknologi Mataram
  • M. Zulpahmi Universitas Teknologi Mataram
  • Zaenudin Universitas Teknologi Mataram
  • Akmaludin Asrory Universitas Teknologi Mataram

Abstract

The number of visitors to Lombok, one of Indonesia's best-known tourist destinations, increased from 400,595 in 2020 to 1,376,295 in 2022. Although the government supports the hotel industry, fake reviews remain a significant problem that can damage hotel reputations and mislead tourists. This study applies two feature extraction techniques, BERT and Sublinear_TF, to detect fake hotel reviews from three main areas: Gili Trawangan, Senggigi, and Kuta. BERT represents each review using the context in which its words appear, while Sublinear_TF reduces the weight of frequently repeated common words so that more informative words stand out. The features from each technique were classified with SVM and Random Forest. The larger and more diverse Gili Trawangan dataset gave the best classification results: BERT reached an accuracy of 0.79 with SVM and 0.84 with Random Forest, the highest in the study, while Sublinear_TF reached 0.82 with SVM and 0.83 with Random Forest. On the smaller and more homogeneous Senggigi and Kuta datasets, accuracy was lower for both techniques. The top result of BERT with Random Forest is attributed to BERT's ability to capture broader language context. Overall, the results suggest that the size and variety of the dataset strongly influence the performance of fake review classification techniques.
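
As a rough illustration of the Sublinear_TF idea described above, the sketch below assumes the name refers to sublinear term-frequency scaling, in which a raw count tf is replaced by 1 + ln(tf), as in the `sublinear_tf` option of scikit-learn's `TfidfVectorizer`. The abstract does not state the exact formula, the IDF settings, or the tokenization used, so the function name `sublinear_tf` and the sample review are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def sublinear_tf(tokens):
    """Map each term to 1 + ln(raw count), so repeated terms grow slowly.

    Assumption: this mirrors sublinear term-frequency scaling only;
    no IDF weighting or normalization is applied here.
    """
    counts = Counter(tokens)
    return {term: 1.0 + math.log(count) for term, count in counts.items()}


# Hypothetical review text, not from the study's dataset.
review = "great hotel great staff great pool clean room"
weights = sublinear_tf(review.split())

# A word that occurs three times gets weight 1 + ln(3), not 3,
# while a word that occurs once keeps weight 1.
for term, weight in sorted(weights.items()):
    print(f"{term}: {weight:.3f}")
```

Under this scaling, a word repeated many times in one review contributes only logarithmically more than a word that appears once, which is one way to keep repetitive common words from dominating the feature vector.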

Published
2024-11-20
How to Cite
HADI, Zulpan et al. Detecting Fake Reviews Using BERT and Sublinear_TF Methods on Hotel Reviews in the Lombok Tourism Area. JOURNAL OF APPLIED INFORMATICS AND COMPUTING, [S.l.], v. 8, n. 2, p. 550-556, nov. 2024. ISSN 2548-6861. Available at: <http://704209.wb34atkl.asia/index.php/JAIC/article/view/8721>. Date accessed: 28 nov. 2024. doi: https://doi.org/10.30871/jaic.v8i2.8721.
Section
Articles
