Software metrics datasets are generally imbalanced. An imbalanced class
distribution and irrelevant attributes can degrade the performance of
software defect prediction models, because classifiers tend to predict
the majority class at the expense of the minority class. This research
uses a public dataset from
NASA (National Aeronautics and Space Administration) MDP (Metrics Data
Program) repository. This research aims to reduce the influence of class
imbalance in the dataset so that the classification performance of
software defect prediction improves. The proposed model applies feature
selection with particle swarm optimization (PSO), data-level approaches
using Random Under-Sampling (RUS) and SMOTE (Synthetic Minority
Over-sampling Technique), and a Bagging ensemble with a Naive Bayes
classifier. The results show that the proposed model improves the
performance of Naive Bayes, with overall AUC values above 0.8.
A statistical test indicates a significant difference between the Naive
Bayes model and the proposed model: the p-value (0.043) is smaller than
the alpha value (0.05).
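
The two data-level approaches named above can be illustrated with a minimal sketch. It is not the implementation used in this research: the function names `random_under_sample` and `smote`, the toy data, the neighbour count `k`, and the choice to balance the classes exactly are all assumptions made for illustration, and binary labels are assumed. PSO feature selection and the Bagging ensemble are not shown.

```python
import random

def random_under_sample(X, y, minority=1, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    kept = rng.sample(majority_idx, min(len(minority_idx), len(majority_idx)))
    idx = sorted(minority_idx + kept)
    return [X[i] for i in idx], [y[i] for i in idx]

def smote(X, y, minority=1, k=3, seed=0):
    """Add synthetic minority rows until the classes are balanced.

    Each synthetic row is a random point on the segment between a minority
    row and one of its k nearest minority neighbours (assumed value of k).
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    X_new, y_new = list(X), list(y)
    if len(minority_rows) < 2:
        return X_new, y_new  # nothing to interpolate between
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    for _ in range(max(0, n_needed)):
        base = rng.choice(minority_rows)
        # Nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        X_new.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new

# Toy data (hypothetical): 8 non-defective modules (0), 3 defective modules (1).
X = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.1], [1.8, 1.9], [2.1, 2.0],
     [1.1, 1.7], [1.9, 2.3], [5.0, 6.0], [5.5, 6.2], [6.0, 5.8]]
y = [0] * 8 + [1] * 3

X_rus, y_rus = random_under_sample(X, y)
X_sm, y_sm = smote(X, y)
print(y_rus.count(0), y_rus.count(1))  # 3 3
print(y_sm.count(0), y_sm.count(1))    # 8 8
```

RUS shrinks the majority class and therefore discards data, while SMOTE grows the minority class with interpolated rows. Either balanced set could then be passed to the classifier.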