Imbalanced Text Classification on Tourism Reviews using Ada-boost Naïve Bayes

Ika Oktavia Suzanti; Fajrul Ihsan Kamil; Eka Mala Sari Rochman; Huzain Azis; Alfa Faridh Suni; Fika Hastarita Rachman; Firdaus Solihin

doi:10.31961/eltikom.v9i1.1496

Authors

Ika Oktavia Suzanti, Universitas Trunojoyo Madura, Indonesia
Fajrul Ihsan Kamil, Universitas Trunojoyo Madura, Indonesia
Eka Mala Sari Rochman, Universitas Trunojoyo Madura, Indonesia
Huzain Azis, Universiti Kuala Lumpur, Malaysia
Alfa Faridh Suni, Newcastle University, United Kingdom
Fika Hastarita Rachman, Universitas Trunojoyo Madura, Indonesia
Firdaus Solihin, Universitas Trunojoyo Madura, Indonesia

DOI:

https://doi.org/10.31961/eltikom.v9i1.1496

Keywords:

Imbalanced Data, Naïve Bayes, Sentiment Analysis, Text Classification, Text Mining

Abstract

Hidden paradise is a term that aptly describes the island of Madura, which offers diverse tourism potential. Through the Google Maps application, tourists can access sentiment-based information about attractions in Madura, which serves both as a reference before visiting and as evaluation material for the local government. The Multinomial Naïve Bayes method is used for text classification because it is simple and effective for text mining tasks. Sentiment is classified into three categories: positive, negative, and mixed. Initial analysis revealed that the sentiment data were imbalanced, with most reviews being positive. To address this, both oversampling and undersampling were applied to achieve a more balanced class distribution. In addition, the Adaptive Boosting (AdaBoost) ensemble method was used to improve the accuracy of the Multinomial Naïve Bayes model. The dataset was split into training and testing sets at ratios of 60:40, 70:30, and 80:20 to evaluate the model’s stability and reliability. The highest F1-score, 84.1%, was achieved by Multinomial Naïve Bayes with Adaptive Boosting, which outperformed the model without boosting, whose accuracy was 76%.
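The resampling step and the choice of F1-score over accuracy can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which oversampling or undersampling variant was used, nor how the F1-score was averaged, so random duplication, random removal, and macro averaging are assumptions here. The function names and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(texts, labels):
    """Collect the texts belonging to each sentiment label."""
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    return by_class


def random_oversample(texts, labels, seed=0):
    """Duplicate random minority samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


def random_undersample(texts, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = min(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        out_texts.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_texts, out_labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up data: 8 positive reviews, 1 negative, 1 mixed.
toy_texts = [f"review {i}" for i in range(10)]
toy_labels = ["positive"] * 8 + ["negative", "mixed"]

# A classifier that always answers "positive" looks good on accuracy only.
always_positive = ["positive"] * len(toy_labels)
print(round(accuracy(toy_labels, always_positive), 3))  # 0.8
print(round(macro_f1(toy_labels, always_positive), 3))  # 0.296

_, balanced_labels = random_oversample(toy_texts, toy_labels)
print(Counter(balanced_labels))
```

On this toy set, a model that ignores the minority classes still reaches 80% accuracy while its macro F1 stays near 30%, which is why a result reported as an F1-score and one reported as accuracy are not directly comparable on imbalanced data. Resampling is normally applied to the training split only, so that the test split keeps the original class distribution.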
