Selecting a defect model for maintenance resource planning and software insurance
Paul Li
Carnegie Mellon University
[email protected]

Presentation Overview
Creating a predictive model for software defect occurrences is a first step toward dealing with the consequences of software failure in commercial software systems. The model form and parameterization of the Weibull and Gamma distributions fit the field defect occurrence characteristics of widely-used commercial software systems. The next step is to use information available prior to release to estimate the fitted model parameters.

The real world problem and the research problem
Real World Problem: Consequences of defects in commercial software systems include costs to consumers, in the form of losses associated with failures, and costs to producers, in the form of maintenance costs for repairing the underlying faults.
Research Framework: A set of composable tools to help producers manage and evaluate the risks and uncertainties associated with commercial software systems: a defect prediction model, a defect attribution method, a loss model, and a cost-to-repair model.
Research Problem: A defect prediction model that takes information available before release and estimates the number of field defects at any time after release.

The fault model
• Fault duration: permanent (reproducible).
• Fault manifestation: deviation from expected behavior, as perceived and reported by a user in the field.
• Fault source: any mistake at the code level.
• Granularity: a clearly identified software component.
• Fault profile expectation: random, arbitrary, and unforeseen.

The research setting
1. Determine the defect model that best describes the field defect occurrences, and derive the parameters of the best-fitting model for each release.
2. For each release, use information available prior to release to predict the best-fitting model parameters.
3. Use data as it becomes available after release to adjust defect estimates.
4. Identify and incorporate additional predictors to improve predictions.

Previous works
Recall from previous talks that:
• We are looking at the number of user-reported defects from widely-used, multi-release commercial software systems.
• We think that the functional form and parameterization of the Gamma and Weibull models make them better suited to describe the ramping-up characteristics seen in these commercial systems.

Comparing the fit of defect models
• There is a set of parameterized defect model classes, each with its own form and parameterization.
• We select commonly accepted classes of models: Exponential, Gamma, Weibull, Power, and Logarithmic.
• For each class, we find the model parameters that best fit the actual field defect data for releases of two widely-used, multi-release commercial software systems, and compare the fits.

[Figure: A middleware system (MW R1), defect occurrences over time: actual occurrences overlaid with the Exponential, Weibull, Gamma, Power, and Logarithmic modeled occurrences]
[Figure: An operating system (OS R1, OS R2, OS R3, OS R4), defect occurrences over time]
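The fit comparison can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure used in the study: the five functions encode the model forms given with the parameter-variance results, read as cumulative expected occurrences mu(t), and `sum_abs_difference` assumes the comparison is made period by period against the number of defects reported in each period, using mu(t) - mu(t-1) as the model's expected count. All function names are hypothetical.

```python
import math
from typing import Callable, Sequence

# Hypothetical helpers: each returns the expected cumulative number of defect
# occurrences mu(t), using the model forms and parameter names from the slides.


def exponential(t: float, N: float, beta: float) -> float:
    return N * (1.0 - math.exp(-t / beta))


def weibull(t: float, N: float, alpha: float, beta: float) -> float:
    return N * (1.0 - math.exp(-(t ** alpha) / beta))


def gamma(t: float, alpha: float, beta: float) -> float:
    # alpha scales the curve; the (1 + t/beta) factor gives the ramp-up shape
    return alpha * (1.0 - (1.0 + t / beta) * math.exp(-t / beta))


def power(t: float, alpha: float, beta: float) -> float:
    return alpha * t ** beta


def logarithmic(t: float, alpha: float, beta: float) -> float:
    return beta * math.log(t / alpha + 1.0)


def sum_abs_difference(mu: Callable[[float], float], counts: Sequence[float]) -> float:
    """Sum over periods t = 1..T of |(mu(t) - mu(t-1)) - counts[t-1]|.

    Assumption: counts[t-1] is the number of defects reported in period t,
    and the model's expected count for that period is mu(t) - mu(t-1).
    """
    return sum(abs((mu(t) - mu(t - 1)) - c) for t, c in enumerate(counts, start=1))
```

With each class's best-fit parameters in hand, the class with the smallest total for a release is the best-fitting class under this measure: a lower sum of absolute differences means a closer fit.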
Difference in estimates
Sum of absolute differences between the best-fit model estimates in each model class and the actual defect occurrences, for operating system (OS) and middleware (MW) releases:

Release   Exponential   Weibull   Gamma   Power   Logarithmic
OS R1     115           69        83      144     127
OS R2     58            44        43      91      73
OS R3     216           151       184     361     263
OS R4     58            57        75      87      70
MW R1     69            52        52      88      79

[Figures: defect occurrences over time for OS R1, OS R2, OS R3, OS R4, and MW R1]

Variance in parameter values
Model forms:
• Exponential: $N\left(1 - e^{-t/\beta}\right)$
• Weibull: $N\left(1 - e^{-t^{\alpha}/\beta}\right)$
• Gamma: $\alpha\left(1 - \left(1 + t/\beta\right) e^{-t/\beta}\right)$
• Power: $\alpha\, t^{\beta}$
• Logarithmic: $\beta \ln\left(t/\alpha + 1\right)$

Percentage deviation from the mean of each model parameter across OS releases:

Parameter         OS R1   OS R2   OS R3   OS R4
Exponential N     36%     51%     121%    34%
Exponential Beta  39%     9%      16%     13%
Weibull N         39%     51%     123%    34%
Weibull Alpha     17%     3%      3%      10%
Weibull Beta      104%    26%     34%     44%
Gamma Alpha       38%     51%     123%    34%
Gamma Beta        29%     3%      13%     13%
Power Alpha       51%     42%     106%    13%
Power Beta        8%      8%      8%      8%
Log Alpha         104%    28%     33%     44%
Log Beta          12%     54%     107%    41%

Variance in parameter values (2)
Percentage deviation from the mean (OS average and middleware) of each model parameter, using the same model forms:

Parameter         OS (average)   MW
Exponential N     10%            10%
Exponential Beta  36%            36%
Weibull N         18%            18%
Weibull Alpha     2%             2%
Weibull Beta      30%            30%
Gamma Alpha       18%            18%
Gamma Beta        23%            23%
Power Alpha       42%            42%
Power Beta        10%            10%
Log Alpha         52%            52%
Log Beta          12%            12%

Validity of results
External validity
• Real, widely-used (>1000 users), multi-release commercial software systems.
• From one software-producing organization.
Internal validity
• Best currently available models.
• Parameters estimated by likelihood maximization under a non-homogeneous Poisson process (NHPP) model.
• Fitting done with a grid search.
• Releases are in the late stages of their release life.
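The validity slide says parameters were found by likelihood maximization under a non-homogeneous Poisson process (NHPP), using a grid search. A minimal sketch of that idea follows, under stated assumptions and not the authors' code: defect counts are grouped into equal periods (the slides do not state the period length), the Exponential form serves as the example model, and the function names and grid values are hypothetical. The last helper assumes "percentage deviation from the mean" means |value - mean| / mean.

```python
import itertools
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical helpers: a sketch of NHPP grid-search fitting, not the original code.


def exponential(t: float, N: float, beta: float) -> float:
    """Exponential mean value function N(1 - exp(-t/beta)) from the slides."""
    return N * (1.0 - math.exp(-t / beta))


def nhpp_log_likelihood(mu: Callable[[float], float], counts: Sequence[float]) -> float:
    """Grouped-data NHPP log-likelihood, up to the constant -sum(log(n_i!)).

    Assumption: counts[i] is the number of defects reported in period i + 1,
    whose expected count under the model is m = mu(i + 1) - mu(i).
    """
    ll = 0.0
    for t, n in enumerate(counts, start=1):
        m = mu(t) - mu(t - 1)
        if m < 0.0 or (m == 0.0 and n > 0):
            return -math.inf  # not a valid NHPP fit for these counts
        if m > 0.0:
            ll += n * math.log(m) - m  # Poisson log-probability without log(n!)
    return ll


def grid_search_fit(
    model: Callable[..., float],
    grid: Dict[str, Sequence[float]],
    counts: Sequence[float],
) -> Tuple[Optional[Dict[str, float]], float]:
    """Evaluate every parameter combination in the grid; keep the most likely."""
    names = list(grid)
    best_params: Optional[Dict[str, float]] = None
    best_ll = -math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        ll = nhpp_log_likelihood(lambda t: model(t, **params), counts)
        if ll > best_ll:
            best_params, best_ll = params, ll
    return best_params, best_ll


def percent_deviation_from_mean(values: Sequence[float]) -> List[float]:
    """|value - mean| / mean, in percent, for one parameter across releases."""
    mean = sum(values) / len(values)
    return [100.0 * abs(v - mean) / mean for v in values]
```

A finer grid finds parameters closer to the true maximum at the cost of more likelihood evaluations. Under this definition of percentage deviation, two values always get the same percentage, which is consistent with the identical OS (average) and MW columns in the second variance table.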
The next step
• We have parameter values for the best-fitting model in each class of models.
• We have limited pre-release information for each release.
• Determine how well the pre-release information can predict the best-fitting parameter values.

The research problem and the real world
Real World Solution:
• Software consumers can select the software product that meets their risk profiles and buy insurance to hedge risks.
• Software producers can allocate the appropriate amount of maintenance resources and make informed decisions during development.
• A policy tool based on insurance rates can influence and encourage engineering and development decisions.
Research Framework: Together with other research pieces, we can produce products such as software insurance, a maintenance resource planner, and an effect estimator for changes in development.
Research Solution: A defect prediction model that uses information available prior to release to estimate the number of defect occurrences in the field at any time.

The End
Thank you. Please send suggestions and email to [email protected]