DISTRIBUTION BASED BALANCE AND BAGGING C4.5 METHOD FOR SOFTWARE DEFECT PREDICTION

research • 06 Mar 2023


High-quality software is software in which no defects are found during either inspection or testing. Software defects discovered at the end of a project systematically push project completion past its planned schedule. The NASA MDP Repository datasets consist of software metrics and are frequently used in software defect prediction research. The main problem with software-metrics datasets is class imbalance: defective modules (the minority class) are fewer than non-defective modules (the majority class), and this imbalance can degrade classification performance. Class imbalance can be handled with two kinds of approach, data-level approaches and algorithm-level (ensemble) approaches, while classification is the most popular approach to software defect prediction itself. In this study, class imbalance is handled by integrating Distribution Based Balance with Bagging, using C4.5 and Naïve Bayes as base classifiers. The results show that the proposed model achieves higher classification accuracy and AUC: an average accuracy of 93.84% and an average AUC of 0.939, with an average AUC improvement of 0.34. C4.5 outperformed Naïve Bayes: its final average accuracy across the classification models was 82.42% with an AUC of 0.738, exceeding the Naïve Bayes baseline by 4.4% in accuracy and 0.023 in AUC. The proposed model is the best-performing model in this study for handling class imbalance in software defect prediction.
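The abstract does not spell out the Distribution Based Balance procedure or the Bagging configuration, so the following Python sketch only illustrates the general shape of the pipeline under stated assumptions. "Distribution Based Balance" is approximated here as oversampling the defective (minority) class by drawing synthetic rows from an independent normal distribution fitted to each feature of that class. C4.5 is replaced by a hypothetical single-split stand-in (`fit_stump`) that uses C4.5's gain-ratio criterion but grows no full tree and does no pruning. The ensemble score is the fraction of bagged models voting "defective", and AUC is computed from its pairwise definition (a random positive outscores a random negative, ties counting one half). All function names and the toy data are hypothetical; this is not the authors' implementation.

```python
import math
import random
import statistics
from collections import Counter


def distribution_based_oversample(X, y, minority_label, rng):
    """Assumed DBB variant: fit an independent normal distribution to each feature
    of the minority class and sample synthetic minority rows until both classes
    have the same size. Binary labels and a non-empty minority class are assumed."""
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    columns = list(zip(*minority))  # one tuple of minority values per feature
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    X_new, y_new = [list(row) for row in X], list(y)
    for _ in range(n_majority - len(minority)):
        X_new.append([rng.gauss(m, s) for m, s in zip(means, stdevs)])
        y_new.append(minority_label)
    return X_new, y_new


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def fit_stump(X, y):
    """Hypothetical stand-in for C4.5: one binary split on a numeric threshold,
    chosen by gain ratio among splits whose information gain is at least the
    average gain. No recursive tree growth and no pruning."""
    base, n = entropy(y), len(y)
    candidates = []  # (gain, gain_ratio, feature, threshold, left_labels, right_labels)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:  # guard against float midpoint rounding
                continue
            gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
            split_info = entropy([0] * len(left) + [1] * len(right))  # > 0: both sides non-empty
            candidates.append((gain, gain / split_info, j, t, left, right))
    if not candidates:  # every feature is constant: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return lambda row: majority
    avg_gain = sum(c[0] for c in candidates) / len(candidates)
    eligible = [c for c in candidates if c[0] >= avg_gain - 1e-12]  # tolerance for float rounding
    _, _, feat, thr, left, right = max(eligible, key=lambda c: c[1])
    left_label = Counter(left).most_common(1)[0][0]
    right_label = Counter(right).most_common(1)[0][0]
    return lambda row: left_label if row[feat] <= thr else right_label


def bagging(X, y, n_estimators, rng):
    """Bagging: train each base learner on a bootstrap sample drawn with replacement."""
    n = len(y)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def vote_score(models, row, positive_label):
    """Fraction of ensemble members voting for the positive (defective) class."""
    return sum(model(row) == positive_label for model in models) / len(models)


def auc(scores, labels, positive_label):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive_label]
    neg = [s for s, lab in zip(scores, labels) if lab != positive_label]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(42)

    def toy_data(n_clean, n_defective):
        # Hypothetical two-metric data: defective modules have shifted metric values.
        X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_clean)]
        X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(n_defective)]
        return X, [0] * n_clean + [1] * n_defective

    X_train, y_train = toy_data(90, 10)
    X_test, y_test = toy_data(45, 5)
    X_bal, y_bal = distribution_based_oversample(X_train, y_train, 1, rng)
    print("class counts after balancing:", Counter(y_bal))
    models = bagging(X_bal, y_bal, n_estimators=25, rng=rng)
    scores = [vote_score(models, row, 1) for row in X_test]
    print("test AUC:", round(auc(scores, y_test, 1), 3))
```

Balancing is applied only to the training split, so the test split keeps its natural imbalance when AUC is measured. In a real experiment, `fit_stump` would be replaced by a full C4.5 learner (or by Naïve Bayes for the comparison baseline) and the toy data by the NASA MDP datasets.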

