LIGHT GRADIENT BOOSTING WITH HYPERPARAMETER TUNING OPTIMIZATION FOR COVID-19 PREDICTION BASED ON BLOOD AND AGE

research
  • 27 Feb
  • 2023

The coronavirus disease 2019 (COVID-19) pandemic has caused a large number of deaths worldwide. COVID-19 screening is needed to identify whether a suspected case is positive, and it can help reduce the spread of the disease. The polymerase chain reaction (PCR) test for COVID-19 analyzes respiratory specimens. Blood tests can also indicate whether a person has been infected with SARS-CoV-2. In addition, age contributes to susceptibility to COVID-19 transmission. This study evaluates light gradient boosting with the SMOTE-Tomek sampling technique and hyperparameter tuning, using blood and age parameters for COVID-19 screening. The experiments use test data from the Albert Einstein Hospital in Brazil, consisting of 5,644 samples, of which 559 are patients infected with SARS-CoV-2. The study proposes improved data preprocessing that uses KNN Imputer to handle the large proportion of missing values. Existing classification methods, namely Random Forest, Extra Trees, AdaBoost, Gradient Boosting, and Light Gradient Boosting, are tested to measure how well they predict patients infected with SARS-CoV-2. Hyperparameter tuning is then applied to light gradient boosting to optimize the results. The results show that the proposed light gradient boosting with the ROS sampling technique and hyperparameter tuning achieves an accuracy of 98.58%, recall of 98.58%, precision of 98.61%, F1-score of 98.61%, and AUC of 0.9682.
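To make the two preprocessing ideas in the abstract concrete, the sketch below shows k-nearest-neighbor imputation of missing values followed by random oversampling of the minority class, assuming that "ROS" denotes random oversampling. It is a simplified pure-Python illustration, not the thesis's implementation: the function names, the toy feature values, and the choice of k = 2 are all hypothetical, and the distance rule is only one plausible way to compare rows that have missing entries.

```python
import math
import random


def nan_distance(a, b):
    """Euclidean distance over features present in both rows (None = missing)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    # Rescale so that rows sharing fewer features are not favoured.
    scale = len(a) / len(shared)
    return math.sqrt(scale * sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that feature over the
    k nearest rows that actually have the feature."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (nan_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if donors:
                filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: three numeric features per patient, None = missing.
rows = [
    [0.20, 1.0, None],
    [0.30, 1.2, 4.0],
    [0.90, 3.0, 9.0],
    [0.25, None, 5.0],
    [0.80, 2.8, 8.0],
]
labels = [0, 0, 1, 0, 0]  # 1 = infected (minority class)

filled = knn_impute(rows, k=2)
balanced_rows, balanced_labels = random_oversample(filled, labels)
print(filled)
print(balanced_labels)
```

In practice, oversampling is applied only to the training split so that duplicated rows do not leak into the evaluation data.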


 
