High-quality software is software in which no defects are found during inspection or testing. Defects discovered late in a project systematically cause the project to overrun its planned schedule. The NASA MDP Repository datasets are software metrics datasets that are frequently used in software defect prediction research. The main problem with software metrics datasets is class imbalance: defective instances (the minority class) are far fewer than non-defective instances (the majority class), and this imbalance can degrade classification performance. Two approaches can address class imbalance, the data-level approach and the algorithm-level (ensemble) approach, while classification is the most popular approach to software defect prediction. This study addresses class imbalance by integrating Distribution Based Balance with Bagging, using C4.5 and Naïve Bayes as base classifiers. The results show that the proposed model achieves higher classification accuracy and AUC: an average accuracy of 93.84% and an average AUC of 0.939, with an average percentage increase in AUC of 0.34. The C4.5 classifier performs better than Naïve Bayes: the final average accuracy of the classification model is 82.42% with an AUC of 0.738, ahead of the Naïve Bayes comparison algorithm by 4.4% in accuracy and 0.023 in AUC. Among the models evaluated in this study, the proposed model performed best at handling class imbalance in software defect prediction.
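The pipeline described in the abstract (balance the classes at the data level, then train a bagged ensemble) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: Distribution Based Balance is read here as drawing synthetic minority instances from per-feature normal distributions fitted to the minority class; a small Gaussian Naïve Bayes stands in as the base classifier (C4.5 is not implemented); and the two-metric toy data, the function names, and the ensemble size of 11 are all hypothetical. The thesis may implement each step differently.

```python
import math
import random
import statistics


def balance_by_distribution(rows, labels, rng):
    """Oversample smaller classes up to the largest class size by sampling each
    feature from a normal distribution fitted to that class (assumed reading
    of Distribution Based Balance, not the thesis's confirmed procedure)."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, group in by_class.items():
        columns = list(zip(*group))
        means = [statistics.fmean(c) for c in columns]
        stdevs = [statistics.pstdev(c) for c in columns]
        for _ in range(target - len(group)):
            out_rows.append([rng.gauss(m, s) for m, s in zip(means, stdevs)])
            out_labels.append(label)
    return out_rows, out_labels


def train_naive_bayes(rows, labels):
    """Fit a Gaussian Naive Bayes model: class prior plus per-feature mean and variance."""
    model = {}
    for label in set(labels):
        group = [r for r, l in zip(rows, labels) if l == label]
        columns = list(zip(*group))
        model[label] = (
            len(group) / len(rows),
            [statistics.fmean(c) for c in columns],
            [max(statistics.pvariance(c), 1e-9) for c in columns],  # avoid zero variance
        )
    return model


def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior for one instance."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)


def train_bagging(rows, labels, n_models, rng):
    """Train one base classifier per bootstrap sample (drawn with replacement)."""
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_naive_bayes([rows[i] for i in idx],
                                        [labels[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Combine the base classifiers by majority vote."""
    votes = [predict_naive_bayes(m, row) for m in models]
    return max(set(votes), key=votes.count)


# Hypothetical toy data: 90 non-defective (0) and 10 defective (1) modules,
# each described by two made-up metrics (size, complexity).
rng = random.Random(0)
clean = [[rng.gauss(50, 10), rng.gauss(5, 1)] for _ in range(90)]
defective = [[rng.gauss(200, 20), rng.gauss(20, 3)] for _ in range(10)]
rows = clean + defective
labels = [0] * 90 + [1] * 10

balanced_rows, balanced_labels = balance_by_distribution(rows, labels, rng)
models = train_bagging(balanced_rows, balanced_labels, 11, rng)
print(predict_bagging(models, [210.0, 22.0]), predict_bagging(models, [45.0, 5.0]))
```

Balancing is applied before bootstrapping so that every bootstrap sample is drawn from a roughly even class mix; an odd ensemble size avoids tied votes in the two-class case.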
Thesis