Type 2 Diabetes Mellitus (T2DM) is associated with several complications. Previous research classified the prognosis of T2DM complications using a patient dataset from Panti Rapih Hospital, Yogyakarta, drawn from six years of patient medical records. In this dataset, 29% of the values were missing. Previous studies compared several imputation methods for these missing values, and the results showed that Linear Regression gave the best imputation performance compared with other methods, including MEAN and KNN. However, Linear Regression imputation could not impute all the missing values: it yielded 598 instances without missing values out of the 700 instances in the dataset. This study therefore proposes an ensemble method that imputes all missing values. The proposed Ensemble Majority Voting method takes the imputation results from three base learners (MEAN, KNN, and Linear Regression) and votes by calculating the MODE of those results. The voting results are used as the imputed values to build a new imputed dataset. The results show that Ensemble Majority Voting can impute all the missing values. Furthermore, when the imputed dataset was used to classify the prognosis of T2DM patients with complications, the method improved the performance of the Decision Tree and Support Vector Machine classifiers. The best accuracy increased by 1.2%.
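The voting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `vote` and `ensemble_impute`, the rounding of candidates before voting, and the tie-break rule (the earliest base learner wins when no value repeats) are all assumptions made for illustration. The three imputed datasets are supplied as plain lists, with `None` marking a cell that a base learner could not fill.

```python
from collections import Counter


def vote(candidates, decimals=0):
    """Return the mode of the non-missing candidate values.

    Assumptions (not from the source): candidates are rounded to
    `decimals` places so continuous imputations can coincide, and a tie
    is resolved in favour of the earliest candidate in the list.
    """
    available = [round(c, decimals) for c in candidates if c is not None]
    if not available:
        return None  # no base learner produced a value for this cell
    counts = Counter(available)
    best = max(counts.values())
    # Pick the earliest candidate that reaches the highest count.
    return next(v for v in available if counts[v] == best)


def ensemble_impute(original, imputed_sets, decimals=0):
    """Fill each missing cell of `original` with the voted value.

    `original` is a list of rows with None for missing cells;
    `imputed_sets` holds one same-shaped dataset per base learner.
    Observed cells are copied unchanged.
    """
    result = []
    for i, row in enumerate(original):
        new_row = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
            else:
                candidates = [s[i][j] for s in imputed_sets]
                new_row.append(vote(candidates, decimals))
        result.append(new_row)
    return result


# Hypothetical toy data: two missing cells, three base-learner outputs.
original = [[5.0, None], [None, 2.0], [7.0, 3.0]]
mean_out = [[5.0, 2.4], [6.0, 2.0], [7.0, 3.0]]
knn_out = [[5.0, 3.0], [6.0, 2.0], [7.0, 3.0]]
lr_out = [[5.0, 3.0], [None, 2.0], [7.0, 3.0]]  # one cell left unfilled

filled = ensemble_impute(original, [mean_out, knn_out, lr_out])
print(filled)
```

In this toy example the cell that the third base learner left empty is still filled, because the vote is taken over whichever candidates are available. That mirrors the abstract's point that the ensemble can fill values Linear Regression alone could not.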