The spread of coronavirus disease 2019 (COVID-19) has not ended almost two years after the virus was first identified. The virus spreads through contact with infected people, sneezing, or coughing, and older people are more vulnerable to it. A country's geographic extent and a region's temperature may also be factors in the spread of COVID-19. Several of these transmission factors can serve as variables when the data are processed with data mining. The models tested are Decision Tree, Naive Bayes, Logistic Regression, Random Forest, k-NN, and Light Gradient Boosting. Light Gradient Boosting gave the best results, with an accuracy of 90.03%, an AUC of 96.66%, a recall of 92.35%, and a precision of 92.33%. However, the feature importance scores show that the added weather features have more influence than the population movement features.
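For readers unfamiliar with the four reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and AUC are computed for a binary classifier. The labels and scores are toy values invented for illustration, not the thesis data, and the function names and the 0.5 decision threshold are assumptions of this sketch rather than anything taken from the study.

```python
# Illustrative sketch only: toy labels and scores, NOT the thesis dataset.
# All function names and the 0.5 threshold are assumptions for this example.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    # Share of all predictions that are correct.
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    # Share of predicted positives that are truly positive.
    return tp / (tp + fp)

def recall(tp, fn):
    # Share of true positives that the model found.
    return tp / (tp + fn)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: true classes and model scores for eight cases.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
prec = precision(tp, fp)
rec = recall(tp, fn)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} auc={auc:.4f}")
```

Accuracy, precision, and recall depend on the chosen threshold, while AUC is computed from the raw scores, which is why a model can report an AUC higher than its accuracy.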
Thesis