Datasets commonly used for software defect prediction tend to have unevenly distributed classes (class imbalance), which can degrade the performance of prediction models. This study proposes applying the Random Forest (RF) classification method with two data-level approaches, Random Over Sampling (ROS) and SMOTE, to improve classifier performance. The results show that the data-level approaches can handle the imbalance in the datasets, yielding very good accuracy. The best accuracy was obtained by the Random Forest model optimized with ROS resampling: average accuracy was 0.936 for RF+ROS, 0.907 for RF+SMOTE, and 0.869 for RF alone. The average AUC of the Random Forest models optimized with ROS and SMOTE also rose into the excellent range, at 0.980 for RF+ROS and 0.961 for RF+SMOTE, compared with 0.806 for RF alone.
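The two data-level approaches named above can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the toy feature rows, and the binary-label assumption are all hypothetical, and only the Python standard library is used. ROS duplicates existing minority rows, while SMOTE interpolates between a minority row and one of its nearest minority neighbours (Chawla et al., 2002).

```python
import random

def random_over_sample(X, y, minority, rng):
    """ROS: duplicate randomly chosen minority rows until both classes match.

    Assumes binary labels (hypothetical helper, not from the paper).
    """
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    deficit = (len(y) - len(minority_rows)) - len(minority_rows)
    extra = [list(rng.choice(minority_rows)) for _ in range(max(deficit, 0))]
    return X + extra, y + [minority] * len(extra)

def smote(X, y, minority, rng, k=3):
    """SMOTE: add synthetic minority rows on the line segment between a
    minority row and one of its k nearest minority neighbours.

    Assumes binary labels and at least two minority rows.
    """
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    deficit = (len(y) - len(minority_rows)) - len(minority_rows)
    synthetic = []
    for _ in range(max(deficit, 0)):
        base = rng.choice(minority_rows)
        # Nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)

# Toy data: six non-defective modules (0) and two defective ones (1).
rng = random.Random(0)
X = [[10.0, 1.0], [12.0, 2.0], [11.0, 1.0], [9.0, 1.0], [13.0, 2.0],
     [10.0, 2.0], [40.0, 8.0], [45.0, 9.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_ros, y_ros = random_over_sample(X, y, minority=1, rng=rng)
X_sm, y_sm = smote(X, y, minority=1, rng=rng)
print(y_ros.count(0), y_ros.count(1))
print(y_sm.count(0), y_sm.count(1))
```

In a pipeline like the one described, the balanced rows would then be passed to a Random Forest classifier. Resampling is normally applied to the training split only, so that accuracy and AUC are measured on data with the original class distribution.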