Characterization Of Composition Error Summary Using Machine Learning Techniques And Natural Language Processing

Mars Caroline Wibowo; Budi Raharjo

doi:10.51903/pixel.v16i1.1885

Authors

Mars Caroline Wibowo Universitas Sains dan Teknologi Komputer
Budi Raharjo Universitas Sains dan Teknologi Komputer

DOI:

https://doi.org/10.51903/pixel.v16i1.1885

Keywords:

Composition Error Summary, Machine Learning, Natural Language Processing

Abstract

As software grows more complex, it becomes more expensive to maintain. At the same time, modern software systems offer many Composition choices that can be adjusted to the needs of the user. Fixing an error involves analyzing Error Summaries and modifying code, so making the bug-fixing steps as efficient and effective as possible keeps maintenance costs low. The purpose of this research is to build a machine learning tool that identifies Composition Error Summaries and determines the specific Composition choices involved, in order to save cost, time, and effort. A t-test was applied to assess the statistical significance of the performance metrics, and an F-test was used to test the variances. The classifiers used in this study are “All Words” (AW), “Highly Informative Words” (H-IW), and “Highly Informative Words plus Bigram” (H-WB). Identical validation and Vexed validation techniques were used to measure the effectiveness of the machine learning tool. The results indicate that the tool is capable of identifying Composition Error Summaries and the Composition choices associated with a given Error Summary. This research thereby demonstrates the practicality of machine learning techniques for corrective tasks related to Error Summaries. The results also show that Composition and non-Composition Error Summaries have contrasting characteristics that machine learning tools can exploit. The tool could be improved in several areas to make it more powerful. The array identification part of the current study has limitations: arrays contain different words, and the Composition recognition tool tends to prefer Compositions with more words. Improvements could therefore take into account the semantics of Error Summaries, equivalent terms, and n-grams.
In addition, the present characterization structure, which relies on machine learning and Natural Language Processing, still has room for improvement. For future research, it is highly recommended to clean up the Error Summaries first, before applying the operations used in the present study.
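
To make the three classifier variants named in the abstract more concrete, the following sketch builds word features for each of them in plain Python. It is an illustration under stated assumptions, not the tool described in this study: the scoring rule for "highly informative" words (difference in per-class document frequency), the use of every adjacent word pair as a bigram, all function names, and the four sample reports are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def all_words(tokens):
    """AW: every distinct word in the report becomes a boolean feature."""
    return {tok: True for tok in tokens}


def informative_words(labeled_docs, top_n):
    """Pick the top_n words that separate the two classes best.

    Assumption: a word's score is the absolute difference between the
    fraction of Composition reports and the fraction of other reports
    that contain it. The study does not state which score it used.
    """
    doc_freq = {True: Counter(), False: Counter()}
    n_docs = Counter()
    for text, is_composition in labeled_docs:
        n_docs[is_composition] += 1
        doc_freq[is_composition].update(set(tokenize(text)))

    def score(word):
        pos = doc_freq[True][word] / max(1, n_docs[True])
        neg = doc_freq[False][word] / max(1, n_docs[False])
        return abs(pos - neg)

    vocab = set(doc_freq[True]) | set(doc_freq[False])
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(vocab, key=lambda w: (-score(w), w))
    return set(ranked[:top_n])


def hiw_features(tokens, informative):
    """H-IW: keep only words from the informative set."""
    return {tok: True for tok in tokens if tok in informative}


def hwb_features(tokens, informative):
    """H-WB: informative words plus adjacent word pairs (bigrams).

    Simplification: all adjacent pairs are kept, with no bigram ranking.
    """
    feats = hiw_features(tokens, informative)
    for pair in zip(tokens, tokens[1:]):
        feats[pair] = True
    return feats


# Hypothetical sample data: (report text, is it a Composition report?)
reports = [
    ("Crash when cache_size option is set to zero in config file", True),
    ("Server ignores timeout option from config file", True),
    ("Null pointer in parser when input is empty", False),
    ("Button label is misaligned in dialog", False),
]

informative = informative_words(reports, top_n=3)
print(sorted(informative))  # ['config', 'file', 'option']
```

Each function returns a dictionary of boolean features, a shape that a Naive Bayes style text classifier can typically consume. Comparing classifiers trained on the three dictionaries is one way the AW, H-IW, and H-WB variants could be evaluated against each other.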
