Stevens Institute of Technology
School of Systems and Enterprises

FINAL REPORT
SSW-533 Cost Estimation and Metrics
Dr. Ye Yang
Fall 2016

Defect Removal Efficiency Analysis

Constantine Davantzis
Chandra Pradyumna Adusumilli
Danielle Romanoff

https://github.com/CDavantzis/GitHub-Issues

Terms

Defect removal efficiency (DRE) - a powerful metric used to measure test effectiveness. It tells us how many bugs we found out of the set of bugs we could have found. Calculating this metric requires two inputs: the number of bugs found during development and the number of defects detected by end users. The formula for DRE is:

    DRE = bugs found during development / (bugs found during development + defects detected by end users) x 100%

Summary

Problem - One of the reasons quality control doesn't work as well with software as it does with manufactured products is that it is difficult to define meaningful quality metrics, along with target values, for intellectual products such as design documents. While you might not be able to completely control the quality of software by controlling the quality of its evolving components, it might be possible to suggest more general but still useful quality control guidelines for the work products of software development. The table below shows combinations of quality control factors that can lead to high, average, or poor defect removal efficiency (DRE). To top 98 percent in defect removal efficiency, there are at least eight forms of testing that should take place: 1 - unit test; 2 - function test; 3 - regression test; 4 - component test; 5 - performance test; 6 - usability test; 7 - system test; 8 - acceptance or beta test. [1]

Solution - A defect found early, or before a customer finds it, has value. To the developer, the value lies in a higher-quality product that will have a better reputation; in addition, when a defect is found early, the cost to fix it is substantially less than if it is found later in the project.
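The DRE calculation defined in the Terms section can be written as a small helper function. This is our own sketch; the function and parameter names are of our choosing, not part of the team's script:

```python
def defect_removal_efficiency(found_in_development, found_by_end_users):
    """DRE: the share of all known defects that were caught before release."""
    total = found_in_development + found_by_end_users
    if total == 0:
        raise ValueError("no defects recorded")
    return 100.0 * found_in_development / total
```

For example, a project that caught 49 defects during development while end users reported 1 has a DRE of 98 percent, right at the threshold discussed above.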
To the customer, the value of a defect found by the developer is that the customer will not encounter the problems caused by the defect. If processes, people, and tools were perfect, perhaps we could achieve "zero defects." However, defects are a fact of life with the resources we work with at present. Another given is that defects can cause serious problems in a project if they are not managed properly. Defect correction consumes resources and can even cause new defects. Inefficient defect removal is a common software development problem. For this project, we will be defining a set of metrics to:
● Evaluate the efficiency of defect removal within a project.
● Estimate the impact of defect removal within a project.
● Improve defect removal efficiency within a project.
It is critical to track defect removal efficiency because defects that exist within a software system affect the software's quality. They cost money, delay projects, and in some cases even place lives at risk.

Contents

Terms
Summary
Introduction
Proposed Metrics
Results and Discussion
    Data Collection
    Graphs
        Assignees Per Issue
        Comments Per Issue
        Days to Close
        Issues Per Label
        Issues Raised By Contributor
        Issue Rates
Limitations
Conclusions
Reflection
References
Appendices

Introduction

In software development, many defects emerge during the development process. If carried out long term, this project would be successful if, based on the research, defects during the software development life cycle were reduced. Software is unique in how it impacts so many other fields, in some cases to the point that without the software, the product may not operate. Present-day organizations face steep competition and are under great pressure to produce innovative technology with few to no defects. Defect removal effectiveness is a direct indicator of the capability of a software development process in removing defects before the software is delivered.
It is one of few, perhaps the only, process indicators that bear a direct correlation with the quality of the software's field performance. [3][4]

How do we define a defect? Are there different levels of defect severity? Is there a specific point at which we should have found all defects?

A defect is any blemish, imperfection, or undesired behavior that occurs in the product. There are indeed different levels of severity or impact for defects; the level is determined by the amount of negative impact the defect has on the software. It is important to determine the level of impact of a defect so it can be prioritized appropriately for removal. While ideally all defects should be found and software should be deployed with no defects at all, as said earlier, defects are a fact of life. The earlier in the software life cycle we detect a defect, the less it will cost to fix or remove it. But there is no specific point at which all defects should be found. Unfortunately, some defects do not show up until the software is being used regularly. The more research and analysis we have on defect removal efficiency, the closer we are to preventing defects from occurring in the first place, eliminating them very early in the software development life cycle, and/or delivering products that are 100% defect free.

Proposed Metrics

The team collected issue data from GitHub. GitHub is an online web-based project repository home to over 10 million projects. Because GitHub provides a way to track project issues, the team had access to a large amount of data to analyze. The team decided to collect data from the following four open source projects: Emby, Angular, Material Design Lite, and YouTube-DL. Emby is an open-source home media server, similar to Plex, which allows users to stream their own content to various devices. Angular is a framework for building mobile and desktop web applications.
Material Design Lite is a CSS framework that allows web developers to easily design sites that follow Google's Material Design look. YouTube-DL is a Python console application that lets the user save videos from YouTube and other websites for offline viewing.

The GQM model below shows the team's proposed metrics.

Number of Comments Per Issue - The team proposed this metric because we wanted to see the level of contributor engagement with a specific issue.

Number of Issues Raised Per Contributor - The team proposed this metric because we thought it would be useful to see which contributor is best at discovering bugs within the project.

Number of Issues Closed Per Contributor - The team proposed this metric because we thought it would be useful to see which contributor is best at resolving bugs within the project.

Time Taken for Each Issue To Be Closed - The team proposed this metric because it would allow us to see how long bugs stay in a system.

Number of Issues Closed and Open Per Milestone - The team proposed collecting milestone data because it would allow us to see issue removal patterns associated with project milestones.

Number of Issues Per Tag - The team proposed this metric because it would allow us to see what types of defects are currently or have been affecting the project.

Number of Issues Per Priority - The team proposed this metric because we thought it would be useful to see the severity of the defects that are currently or have been affecting the project.

Results and Discussion

Data Collection

In order to collect the issue data, the team created a script that interfaces with the GitHub API. The specific API endpoint the team used has the pattern "GET /repos/:owner/:repo/issues". Full information on using this API endpoint is available at https://developer.github.com/v3/issues/#list-issues-for-a-repository.
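A rough sketch of calling this endpoint with the Requests library appears below. This is our condensed illustration, not the team's actual script: it authenticates (optionally), asks for 100 issues per page, and walks the pages until an empty one is returned, combining everything into a single list.

```python
def fetch_page(owner, repo, page, auth=None):
    """Fetch one page of up to 100 issues from the GitHub issues endpoint."""
    import requests  # third-party; imported here so the paging logic is testable without it
    resp = requests.get(
        "https://api.github.com/repos/{0}/{1}/issues".format(owner, repo),
        params={"state": "all", "per_page": 100, "page": page},
        auth=auth,  # a (username, password) tuple raises the API rate limit
    )
    resp.raise_for_status()
    return resp.json()

def all_issues(owner, repo, auth=None, get_page=fetch_page):
    """Walk every page and combine the results into a single list of issues."""
    issues, page = [], 1
    while True:
        batch = get_page(owner, repo, page, auth)
        if not batch:  # an empty page means we have run out of issues
            break
        issues.extend(batch)
        page += 1
    return issues
```

Keeping the page fetcher as a swappable argument makes the pagination logic easy to test without touching the network.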
The team's Python script to save a project's repository data is available in our GitHub repository at https://github.com/CDavantzis/GitHub-Issues/blob/master/github.py. To avoid API limits, our script allowed users to authenticate, which drastically increased the rate limit. We also provided the "per_page" parameter to the API endpoint to increase the number of issues per page from the default of 30 to the maximum of 100, which allowed the team to get over three times more issues per request. When saving the information locally, the script combined all the pages of issues into a single list and saved them in a JSON file. The formatted saved JSON data tends to be quite large; the largest was Angular's file, with 923,777 lines of issue data and a file size of 56.3 MB. While the files can get quite large, the format allowed the team to easily view the information in a text editor and later load it into Python for analysis. With some projects having issue data this large, it was clear that using the API each time we wanted to analyze the data would be unrealistic, and we would need to save this data locally.

The team made use of a popular third-party Python module called Requests, which allowed our code to be extremely clean and easy to understand. Our script could be run from a console with the following command, which saved the JSON data file to the "data" folder in the current directory:

github.py --auth "USERNAME" "PASSWORD" --save_repo_issues "REPO_OWNER" "REPO_NAME"

Being able to run this script from the console provided our team with two benefits. The first was that we didn't need to save our GitHub authentication data within the script. Saving this information within the file would have been bad practice in terms of security; saving usernames and passwords in plain text on any system is a security concern.
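A command line like the one above can be parsed with Python's standard argparse module. The sketch below is our illustration of how the flags could be wired up, not the team's actual implementation:

```python
import argparse

def build_parser():
    """Command-line interface matching the invocation shown above."""
    parser = argparse.ArgumentParser(prog="github.py")
    parser.add_argument("--auth", nargs=2, metavar=("USERNAME", "PASSWORD"),
                        help="authenticate to raise the GitHub API rate limit")
    parser.add_argument("--save_repo_issues", nargs=2,
                        metavar=("REPO_OWNER", "REPO_NAME"),
                        help="save the repository's issues as JSON under ./data")
    return parser
```

Because the credentials arrive as arguments, they need not be stored in the source file.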
Contributors will also need to remember to remove their credentials from the code before syncing to GitHub, since our script is open source. The second benefit is that, because this script is separate from our analysis script, the code can easily be reused and expanded in the future for different purposes.

The following files are the four JSON files for the data we collected. Due to their size, they need to be viewed as raw text on GitHub. Depending on internet speeds and the device being used, it may be necessary to download the files before viewing them. Additionally, some text editors may have trouble opening files this large.

https://github.com/CDavantzis/GitHub-Issues/blob/master/data/angular_angular_issues_1479782810.json
https://github.com/CDavantzis/GitHub-Issues/blob/master/data/google_material-design-lite_issues_1478412302.json
https://github.com/CDavantzis/GitHub-Issues/blob/master/data/MediaBrowser_Emby_issues_1478411769.json
https://github.com/CDavantzis/GitHub-Issues/blob/master/data/rg3_youtube-dl_issues_1479788479.json

Graphs

The data sets and high-quality images of the graphs are linked in the appendices. Matplotlib was used to graph the data. When analyzing the data, we filter for issues that carry a label containing the word "bug" to ensure we are only analyzing defects; without this filter we would also be analyzing issues such as feature requests. We filter the data at this stage so we can compare defect-specific results to whole-system results if we choose. Data sets and high-quality images can also be found in the following folder on the team's GitHub:

https://github.com/CDavantzis/GitHub-Issues/tree/master/results

Assignees Per Issue

The assignees per issue graph is a bar graph that allows the team to compare the number of closed and open issues for a project, separated by how many assignees the issues have in GitHub.
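The bug-label filter and the per-assignee grouping described above can be sketched as follows. This is our illustration rather than the team's analysis script; the field names ("labels", "name", "assignees", "state") follow GitHub's issue JSON:

```python
from collections import Counter

def bug_issues(issues):
    """Keep only issues that carry a label whose name contains 'bug'."""
    return [issue for issue in issues
            if any("bug" in label["name"].lower() for label in issue["labels"])]

def assignees_per_issue(issues):
    """Tally issues by (number of assignees, open/closed state), mirroring the bar graph."""
    return Counter((len(issue["assignees"]), issue["state"]) for issue in issues)
```

The same filtered list can then feed each of the graphs discussed in this section.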
The assignees per issue bar graph for Angular shows the team that there are roughly as many issues assigned to no one as there are issues assigned to one person. We can also see a larger number of closed issues and a smaller number of open issues when the issue has an assignee. The team can conclude that issues are resolved more reliably when they have been assigned to someone. In the Material Design Lite project, we can see that issues don't get assigned to individuals. About 67% of the issues are closed. There have also been significantly fewer issues found in this project compared to Angular, both projects by Google; this can be attributed to the type of projects they are.

From the above graph we can see that issues for the Emby project have all been closed, and they are never assigned to individuals. We will see later that the fact that Emby has no open issues influences the other graphs. YouTube-DL is the third project that doesn't assign specific defect issues to individuals. We can see that about 75% of the defects have been closed for this system. YouTube-DL is a program that continually needs to adapt to other sites, so it is impressive that this large a percentage of defects has been closed. Looking at the repository's history, one can see that there is an extremely active community keeping YouTube-DL up to date. This activity is apparent in later graphs.

Comments Per Issue

The comments per issue graph is a histogram that allows us to view the number of issues in the project that have a certain number of comments. This histogram is useful for viewing contributor engagement. It separates open and closed issues to make clear whether the issues with a specific number of comments are open or closed. For the Angular project, we can see that the distribution of comments closely matches an exponentially decaying trend.
A majority of issues in this project have an engagement of zero to ten comments, although there are outlier issues with 80 comments or more. For Material Design Lite, the number of comments does not match the same clear exponentially decaying trend, although there is still a larger number of issues with fewer comments. As with Angular, a majority of issues have an engagement of zero to ten comments. The greatest outlier for Material Design Lite is smaller, at 35 comments.

For Emby, the project once again had a majority of issues within the range of zero to ten comments, although the histogram shows an inconsistent trend within that range. The greatest outlier in Emby was 60 comments on an issue. In this graph we don't see any open issues because all issues relating to defects have been closed. Like Angular, YouTube-DL showed a fairly clear exponentially decaying trend for comments per issue. Unlike the other projects, user engagement appeared to drop off after five comments rather than ten. The greatest outlier for YouTube-DL was 45 comments on an issue.

Days to Close

The days to close graph is a histogram that shows us how long it takes to fix and then close an issue. All four histograms show a similar pattern. At the beginning of each project, there were far more issues open than at the end, and on each histogram a huge number of issues were closed quickly. As expected, as the days went on, fewer issues remained open, and toward the end of each histogram there are almost no issues left to resolve. An increase in the histogram can be attributed to issues being pushed to a later release.

Issues Per Label

The issues per label graph is a horizontal bar graph that allows the user to see how many issues are tagged with specific labels. Because we filtered on issues tagged as a bug, we can see in all our graphs that every issue carries a bug label.
We split our graphs between closed and open issues to see if there are noticeable differences in the labels that closed and open issues have been tagged with. The names of the labels can be seen more clearly in the full-sized graphs listed in the appendices. The Angular project breaks tags up into various types. One type of tag is effort: we can see that there are more closed issues tagged easy than medium, and more tagged medium than hard. There is also a severity scale from 1 to 6, and a majority of closed issues have a severity of 3, which stands for broken. Open issues in Angular show similar patterns, with more easy issues than hard issues, and most of the open issues have a severity of 3.

Material Design Lite assigns priority to issues with the tags p0, p1, and p2, where zero represents the highest priority and two the lowest. We can see in the above graphs that among closed issues there are more high-priority issues than low-priority issues. On the other hand, among open issues there are more low-priority issues than high-priority issues. This shows the team is removing higher-priority issues at a faster rate than low-priority issues.

Emby currently has only closed issues, so we have not included the open-issue graph. We can see from this chart how many of the issues were completed and how many won't be fixed. There are roughly even numbers of issues that won't be fixed and issues that were duplicates. There is also a tag called "no-repro"; these are issues people reported but the developers couldn't replicate.

The team can see from the graphs for YouTube-DL that the most common type of closed issue is external bugs. These issues have been closed because they don't directly involve YouTube-DL.
The second most common tag for closed issues, and the most common for open issues, is geo-restricted, which means the video source is not available in the region of the end user. Once again, this is not an issue that can be addressed by the developers. YouTube-DL lets end users submit bugs via the GitHub repository, which could explain why there are so many issues that the developers aren't able to address.

Issues Raised By Contributor

As with days to close, the histograms here follow a very similar pattern. As expected, more people found more issues early in the project. As time went on, there were most likely fewer issues to find, and with fewer issues, fewer contributors would find them. The only project that varies somewhat from the pattern, and even then not by much, is Emby. That graph appears to have waves of increased reporting and issues, which could coincide with changes to the code or release dates. Additional data could provide some insight about that.

Issue Rates

For each project we collected data on, there are two graphs below. The first for each project shows the monthly issue rates without the cumulative rates; the second shows the cumulative rates. The team separated the graphs to make the non-cumulative data easier to view and understand. The cumulative lines are the integrals of the corresponding arrival and removal rate lines. Because the cumulative count behaves like a continuous function of time, it could theoretically be fit to a closed-form function, and that function could then be used as a prediction model for future projects. Subtracting the cumulative removals from the cumulative arrivals gives the total number of defects in the system at any given time, which is shown by the count line. We have included similar graphs calculated with the daily rates in our appendices.
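The subtraction just described can be sketched in a few lines. This is our illustration; the interval counts in the example are made-up numbers, not project data:

```python
from itertools import accumulate

def defects_in_system(arrivals, removals):
    """Open-defect count per interval: cumulative arrivals minus cumulative removals.

    `arrivals` and `removals` are per-interval (e.g. monthly) defect counts.
    """
    return [a - r for a, r in zip(accumulate(arrivals), accumulate(removals))]
```

For instance, monthly arrivals of [5, 3, 2] against removals of [1, 4, 3] leave 4, then 3, then 2 defects open at the end of each month.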
Limitations

Software systems are intangible - you cannot see and feel them the way you can a manufactured product such as a car, keyboard, or camera. As a result, managers need documents to assess progress. This can cause problems:
• Timing of progress deliverables may not match the time needed to complete an activity
• The need to produce documents constrains process iteration
• The time taken to review and approve documents is significant
Though the data collected was for four projects that we were not developing, these limitations could in fact affect our results.

Many metrics have parameters such as "total number of defects," e.g., total number of requirements defects. Clearly, we only ever know about the defects that are found, so we never know the "true" value of many of these metrics. Further, as we find more defects, this number will increase. Hopefully, defect discovery tails off asymptotically over time, i.e., we find fewer defects as time goes on, especially after release. So metrics that require "total defects" information will change over time but hopefully converge eventually. The later in the life cycle we compute the metric, the more meaningful the results. If and when we use these metrics, we must be aware of this effect and account for it.

Another limitation when using data from four different projects from four different companies or organizations is the range in policies, practices, employees and their abilities, and methods. Data from a single organization has far more validity: analyzing one organization reduces the number of uncontrollable factors that need to be taken into account. The data we collect could easily be influenced by underreporting of problems found; depending on the work environment, employees may feel the need to hide their mistakes. Finally, this type of data only tells the whole story once the project is completed.
The team had several limitations regarding data. In the process of doing additional research, it became apparent there were great measurements we could have used for additional analysis. However, in order to use some of those tools, we would have needed additional data from the projects. Some of that data we could have gotten had we known earlier; for most of it, however, we would have needed additional access to the code and developers.

Conclusions

Using the number of defects that arrived in each time interval, we found the K value at each month. We then averaged the K value for the project and used it to find the predicted number of defects at a given time. With the figures below, the projected number of defects at any month can be predicted. By far, Emby has the smallest number of projected defects. The factors that lead to that low predicted value are unknown at the present time. As such, if doing additional research pertaining to predicting defects, it would be beneficial to have additional background information. Some items to look at might be the size of the team, the abilities of the team members, and the approach the team is taking to developing the system; the list could go on. It would be fascinating to do some research on defects found and the work environment of the development team, or the age of the team members. Again, that list could continue. Due to the lack of additional relevant data that could be used in conjunction with the defect arrivals and removals, it would be difficult to predict other defect projections or to establish correlations between various factors.

Angular
● Maximum arrival is 113 defects at f(22).
● Average K value is 2440.70.
● The predicted number of defects at f(34) (June 1, 2017) is 59.89.

Material Design Lite
● Maximum arrival is 32 defects at f(6).
● Average K value is 108.60.
● The predicted number of defects at f(34) (November 1, 2017) is 1.87.

Emby
● Maximum arrival is 54 defects at f(2).
● Average K value is 54.44.
● The predicted number of defects at f(34) is 0.002.

YouTube-DL
● Maximum arrival is 23 defects at f(59).
● Average K value is 1190.92.
● The predicted number of defects at f(73) (June 1, 2017) is 16.62.

Reflection

● The validity of the data is not a question for us. Data was retrieved directly from GitHub: four open source projects were accessed, and data pulled directly from those projects was used.
● The only limitations in this project had to do with the API. This was addressed in the code, so it did not affect the data collected.
● What we could do differently: we gathered a great amount of data, which isn't in and of itself a problem. What we could have done is gather additional data directly connected to defect arrival and removal:
○ LOC over the same time duration as the defect arrivals and removals
○ Release dates for all four projects
● To further our research, the team would like to explore different methods that would allow us to predict future defect information.
● The source code can be scaled to work with any automated program with an API facility.
● Including releases in the models for agile projects (where the frequency of release is consistent) would allow better formulation of the results.
● We were able to analyze only the release branches of the repositories, as other branches are not public. If organizations are willing to share the data of their development branches, we can generate a more detailed report on the bugs before release.
● The statistics generated were found to be extremely helpful for an organization analyzing the arrival of defects, so that it can allocate the resources necessary to remove them.

References

[1] Capers Jones. "Minimizing the Risk of Litigation Problems Noted in Breach of Contract Litigation." CrossTalk, September/October 2016, pp. 4-10.
[2] "Issues." GitHub Developer Guide. Web.
[3] Suma V. and Gopalakrishnan Nair T.R.
"Defect Management Strategies in Software Development." In Recent Advances in Technologies, Maurizio A. Strangio (Ed.), 2009.
[4] Stephen H. Kan. Metrics and Models in Software Quality Engineering, 2nd ed. Pearson, 2003.

Appendices

Third-Party Libraries - third-party Python libraries used by our script:
● Matplotlib
● Requests

Scripts - scripts the team developed for the project:
● Issue Downloader
○ Downloaded Data
● Issue Analysis and Grapher
○ Results (individual results linked below)

Angular - Original Repository
● Assignees Per Issue: Data / Graph
● Comments Per Issue: Data / Graph
● Days to Close Issue: Data / Graph
● Issues Per Label: Data / Graph (Closed) / Graph (Open)
● Issues Raised By Contributor: Data / Graph
● Rates (Daily): Data / Graph (With Cumulative) / Graph (Without Cumulative)
● Rates (Monthly): Data / Graph (With Cumulative) / Graph (Without Cumulative)

Material Design Lite - Original Repository
● Assignees Per Issue: Data / Graph
● Comments Per Issue: Data / Graph
● Days to Close Issue: Data / Graph
● Issues Per Label: Data / Graph (Closed) / Graph (Open)
● Issues Raised By Contributor: Data / Graph
● Rates (Daily): Data / Graph (With Cumulative) / Graph (Without Cumulative)
● Rates (Monthly): Data / Graph (With Cumulative) / Graph (Without Cumulative)

Emby - Original Repository
● Assignees Per Issue: Data / Graph
● Comments Per Issue: Data / Graph
● Days to Close Issue: Data / Graph
● Issues Per Label: Data / Graph (Closed) / Graph (Open)
● Issues Raised By Contributor: Data / Graph
● Rates (Daily): Data / Graph (With Cumulative) / Graph (Without Cumulative)
● Rates (Monthly): Data / Graph (With Cumulative) / Graph (Without Cumulative)

YouTube-DL - Original Repository
● Assignees Per Issue: Data / Graph
● Comments Per Issue: Data / Graph
● Days to Close Issue: Data / Graph
● Issues Per Label: Data / Graph (Closed) / Graph (Open)
● Issues Raised By Contributor: Data / Graph
● Rates (Daily): Data / Graph (With Cumulative) / Graph (Without Cumulative)
● Rates (Monthly): Data / Graph (With Cumulative) / Graph (Without Cumulative)