Demand for high-quality software continues to grow. Software quality is usually measured by the number of defects in the product, and finding and fixing those defects at the testing stage is expensive and time-consuming. To date, however, no software defect prediction model has been generally accepted, because almost all classification algorithms perform very poorly on data with imbalanced classes. The k-Nearest Neighbor (k-NN) method is one of the most popular and widely applied methods for building software defect prediction models, but it has no mechanism for handling class imbalance, which results in low accuracy. This study applies the Random Walk Over-Sampling (RWO-S) resampling method to address this problem and compares it with Random Over-Sampling (ROS). The experiments compare k-NN without resampling against k-NN integrated with resampling (ROS and RWO-S). The results show that applying resampling increases the accuracy, sensitivity, precision, and AUC of k-NN. The model proposed in this study, RWO-S combined with k-NN, performs better than the other models.
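The comparison described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names and toy data are hypothetical, only continuous attributes are handled, and the RWO-S step (each attribute shifted by a Gaussian random step scaled by sigma / sqrt(n) over the minority class) follows a common description of the method and is an assumption here.

```python
import math
import random
import statistics


def random_over_sample(minority, n_new, rng):
    """ROS: duplicate randomly chosen minority rows (with replacement)."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def random_walk_over_sample(minority, n_new, rng):
    """RWO-S-style sketch: shift each attribute of a minority row by a random
    step scaled by that attribute's standard error (sigma / sqrt(n))."""
    n = len(minority)
    columns = list(zip(*minority))
    # Assumption: population standard deviation; the sample version is also defensible.
    step = [statistics.pstdev(col) / math.sqrt(n) for col in columns]
    synthetic = []
    for i in range(n_new):
        base = minority[i % n]  # cycle through the minority rows
        synthetic.append([v - s * rng.gauss(0.0, 1.0) for v, s in zip(base, step)])
    return synthetic


def knn_predict(train_x, train_y, query, k=3):
    """Plain k-NN: Euclidean distance, majority vote (math.dist needs Python 3.8+)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


# Hypothetical toy data: label 0 = non-defective (majority), 1 = defective (minority).
majority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3],
            [0.9, 0.7], [1.3, 1.0], [1.0, 1.2], [0.7, 0.9]]
minority = [[3.0, 3.0], [3.2, 2.9], [2.8, 3.1]]

rng = random.Random(0)
n_new = len(majority) - len(minority)  # rows needed to balance the classes

ros_rows = random_over_sample(minority, n_new, rng)
rwo_rows = random_walk_over_sample(minority, n_new, rng)

# Balanced training set built with the RWO-S-style rows.
train_x = majority + minority + rwo_rows
train_y = [0] * len(majority) + [1] * (len(minority) + len(rwo_rows))
prediction = knn_predict(train_x, train_y, [2.9, 3.0], k=3)
```

ROS only repeats existing minority rows, whereas the random-walk variant creates new rows near them, so the two balanced training sets differ even though their class counts are the same.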