Classification of Diabetes Mellitus using Decision Trees

Authors

  • Rinci Kembang Hapsari, Institut Teknologi Adhi Tama Surabaya, Indonesia
  • Abdullah Harits Salim, Institut Teknologi Sepuluh November, Indonesia
  • Leonardo Fahsi Oktavian, Institut Teknologi Adhi Tama Surabaya, Indonesia
  • Aldy Ramadhan Fitra, Institut Teknologi Adhi Tama Surabaya, Indonesia

DOI:

https://doi.org/10.31961/eltikom.v9i2.1461

Keywords:

classification, decision tree, diabetes mellitus, pre-processing

Abstract

Diabetes mellitus is a global health concern whose prevalence and incidence are rising sharply worldwide, including in Indonesia. Factors that contribute to its onset include heredity, age, weight, and blood pressure. Key measures to prevent and control the disease include managing blood sugar levels, maintaining a balanced diet, exercising regularly, and undergoing early screening when necessary. Early diagnosis is essential to reduce both the number of cases and the associated risks. This study aims to detect diabetes mellitus using a classification technique, the decision tree. The method involves several subprocesses within the classification procedure. The first stage, data preprocessing, comprises feature selection and data cleaning. The preprocessed data are then passed to the classification stage, where the learning subprocess generates a decision tree model. Model construction employs pruning, followed by training and performance evaluation. The study uses a diabetes dataset obtained from kaggle.com, consisting of 768 records. The dataset includes the attributes Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, Body Mass Index (BMI), Diabetes Pedigree Function, and Age, together with the label Outcome. Testing was conducted using decision trees with maximum depths of 3, 5, 7, 10, and 15. The highest accuracy (88.56%) occurred at a maximum depth of 5, while the highest recall (100%), precision (47.37%), and specificity (95.85%) were all obtained at a maximum depth of 3.
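The four evaluation measures named in the abstract (accuracy, precision, recall, and specificity) can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch of that calculation, not the authors' implementation: the function name `classification_metrics`, the convention of returning 0.0 for an undefined ratio, and the toy label lists are assumptions made for illustration and are not taken from the study's data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and specificity for binary labels.

    Labels are assumed to be 1 (diabetic, as in the Outcome column) and
    0 (non-diabetic). The function name and the zero-division convention
    are illustrative assumptions.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def safe_div(num: int, den: int) -> float:
        # Assumed convention: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": safe_div(tp, tp + fp),    # share of predicted positives that are correct
        "recall": safe_div(tp, tp + fn),       # share of actual positives that are found
        "specificity": safe_div(tn, tn + fp),  # share of actual negatives that are found
    }


# Toy labels (hypothetical, not from the study): every positive case is found,
# but two negatives are flagged as positive.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In the toy example, recall reaches 1.0 while precision stays at 0.6, which shows how a model can find every positive case and still produce false alarms. This is why the measures are usually read together when comparing tree depths.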

Published

22-12-2025

Issue

Vol. 9, No. 2 (Dec. 2025)

Section

Articles

How to Cite

[1] 2025. Classification of Diabetes Mellitus using Decision Trees. Jurnal ELTIKOM: Jurnal Teknik Elektro, Teknologi Informasi dan Komputer. 9, 2 (Dec. 2025), 107–114. DOI: https://doi.org/10.31961/eltikom.v9i2.1461.
