Seminars & Colloquia

Naeem Seliya

Florida Atlantic University

"Software Quality Analysis with Limited Defect Data"

Tuesday, March 22, 2005, 3:00 PM
Location: 402-A, Withers NCSU Historical Campus


Abstract: Software reliability can be improved by identifying and enhancing program modules that are of poor quality, i.e., modules with software defects or faults. A software quality model is typically trained using software measurement and defect data (the number of faults or a quality-based class label) obtained from a previous system release or a similar project. The model is then applied to modules of a project currently under development to predict their quality. However, such a supervised learning approach assumes that defect data are available for all modules in the training data. In practice, various issues in software development limit the availability of defect data for all modules in the training data. Consequently, the available labeled training dataset can be so small that a supervised learning approach yields a model with poor software quality estimation, i.e., poor generalization accuracy.

The problem involves a training dataset that consists of a small portion of labeled modules and a large portion of unlabeled modules. We investigate semi-supervised learning for software quality estimation with limited defect data. The hypothesis is that a supervised learning scheme aided by the software measurements of the unlabeled program modules will improve generalization accuracy. The semi-supervised software quality modeling scheme applies the Expectation Maximization (EM) algorithm iteratively. Our investigation uses empirical studies of software measurement data obtained from two NASA software projects. The results show that a semi-supervised software quality model achieves significantly better generalization accuracy than a model trained only on the available labeled dataset. The study also provides insight into the characteristics of software modules that remain unlabeled after the semi-supervised learning process is completed.
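The abstract does not spell out the exact procedure, so the following is only a rough illustration of the general idea of EM-based semi-supervised classification, not the speaker's method. It assumes a two-class Gaussian naive Bayes model over per-module metrics: labeled modules keep their known class, the E-step assigns soft class probabilities to the unlabeled modules, and the M-step refits class priors, means, and variances on all modules. The class encoding (0 = not fault-prone, 1 = fault-prone), the example metrics (lines of code, cyclomatic complexity), the data values, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Model parameters (hypothetical layout): (class priors, per-class means, per-class variances)
Params = Tuple[List[float], List[List[float]], List[List[float]]]


def _fit(points: Sequence[Sequence[float]], resp: Sequence[Sequence[float]],
         n_classes: int, var_floor: float) -> Params:
    """M-step: weighted maximum-likelihood estimates for Gaussian naive Bayes.

    resp[i][c] is the weight (responsibility) of class c for module i.
    """
    d = len(points[0])
    priors, means, variances = [], [], []
    for c in range(n_classes):
        w = [r[c] for r in resp]
        total = sum(w)
        priors.append(total / len(points))
        mu = [sum(wi * p[j] for wi, p in zip(w, points)) / total for j in range(d)]
        var = [max(var_floor,
                   sum(wi * (p[j] - mu[j]) ** 2 for wi, p in zip(w, points)) / total)
               for j in range(d)]
        means.append(mu)
        variances.append(var)
    return priors, means, variances


def _posteriors(x: Sequence[float], params: Params) -> List[float]:
    """E-step for one module: P(class | metrics), computed in log space for stability."""
    priors, means, variances = params
    log_scores = []
    for c, prior in enumerate(priors):
        s = math.log(prior)
        for j, xj in enumerate(x):
            v = variances[c][j]
            s -= 0.5 * (math.log(2 * math.pi * v) + (xj - means[c][j]) ** 2 / v)
        log_scores.append(s)
    top = max(log_scores)
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def semi_supervised_em(labeled: Sequence[Sequence[float]], labels: Sequence[int],
                       unlabeled: Sequence[Sequence[float]], n_classes: int = 2,
                       iterations: int = 50, var_floor: float = 1e-6) -> Params:
    if set(labels) != set(range(n_classes)):
        raise ValueError("every class needs at least one labeled module")
    # Labeled modules keep fixed one-hot responsibilities throughout.
    fixed = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    # Start from the purely supervised model fitted on labeled modules only.
    params = _fit(labeled, fixed, n_classes, var_floor)
    points = list(labeled) + list(unlabeled)
    for _ in range(iterations):
        soft = [_posteriors(x, params) for x in unlabeled]          # E-step
        params = _fit(points, fixed + soft, n_classes, var_floor)  # M-step
    return params


def predict(x: Sequence[float], params: Params) -> int:
    post = _posteriors(x, params)
    return max(range(len(post)), key=lambda c: post[c])


# Hypothetical metrics per module: [lines of code, cyclomatic complexity]
labeled = [[10, 2], [12, 3], [200, 25], [220, 30]]
labels = [0, 0, 1, 1]
unlabeled = [[11, 2], [15, 4], [9, 1], [210, 28], [190, 22], [230, 27]]
params = semi_supervised_em(labeled, labels, unlabeled)
print(predict([13, 3], params), predict([205, 26], params))  # prints: 0 1
```

Running a fixed number of iterations keeps the sketch short; a fuller version would stop once the log-likelihood stops improving. This sketch also gives every unlabeled module a soft label, whereas the abstract mentions modules that remain unlabeled after the process, which suggests the presented scheme differs in how it commits labels.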

Short Bio: Naeem Seliya is a Ph.D. candidate in the Department of Computer Science and Engineering at Florida Atlantic University, Boca Raton, Florida. He received his M.S. in Computer Science from Florida Atlantic University in 2001. His research interests include software engineering (software quality and reliability, software measurements, software architecture and design, software safety, and software process economics); data mining and machine learning; computational intelligence; computer data security; and bioinformatics. He has published over 25 articles in refereed technical conferences and journals and is a member of the IEEE, the IEEE Computer Society, and the ACM.

Host: Ana Anton, Computer Science, NCSU
