
Defect Removal Efficiency
Stevens Institute of Technology
School of Systems and Enterprises
FINAL REPORT
SSW-533 Cost Estimation and Metrics
Dr. Ye Yang
Fall 2016
Defect Removal Efficiency Analysis
Constantine Davantzis
Chandra Pradyumna Adusumilli
Danielle Romanoff
https://github.com/CDavantzis/GitHub-Issues
Terms
Defect removal efficiency (DRE) is a metric used to measure test effectiveness: it tells us how many defects we found out of the set of defects we could have found. Calculating it requires two inputs: the number of defects found during development and the number of defects detected by end users. The formula is:

DRE = defects found during development / (defects found during development + defects found by end users) × 100%
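As a worked sketch, the formula translates directly into code (the function name and inputs are our own illustration):

```python
def dre(found_in_dev, found_by_users):
    """Defect removal efficiency: share of all known defects caught before release."""
    total = found_in_dev + found_by_users
    return 100.0 * found_in_dev / total
```

For example, a team that finds 90 defects before release while end users report 10 more achieves a DRE of 90%.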
Summary
Problem - One of the reasons quality control doesn't work as well with software as it does with manufactured
products is that it's difficult to define meaningful quality metrics along with target values for intellectual products
such as design documents. While you might not be able to completely control the quality of software by
controlling the quality of its evolving components, it might be possible to suggest more general but still useful
quality control guidelines for the work products of software development.
The table below shows combinations of quality control factors that can lead to high, average, or poor defect removal efficiency (DRE). In order to top 98 percent in defect removal efficiency, there are at least eight forms of testing that should take place: 1 — unit test; 2 — function test; 3 — regression test; 4 — component test; 5 — performance test; 6 — usability test; 7 — system test; 8 — acceptance or beta test. [1]
Solution - A defect found early, before a customer finds it, has value. To the developer, the value lies in a higher quality product with a better reputation; in addition, the cost to fix a defect found early is substantially less than if it is found later in the project. To the customer, the value of a defect found by the developer is that the customer will never encounter the problems the defect would have caused.
If processes, people, and tools were perfect, perhaps we could achieve "zero defects." However, defects are a
fact of life with the resources we work with at present. Another given is that defects can cause serious
problems in a project if they are not managed properly. Defect correction consumes resources and can even
cause new defects.
Inefficient defect removal is a common software development problem.
For this project, we will be defining a set of metrics to:
● Evaluate the efficiency of defect removal within a project.
● Estimate the impact of defect removal within a project.
● Improve defect removal efficiency within a project.
It is critical to track the defect removal efficiency because defects that exist within a software system affect the
software's quality. They cost money, delay projects, and even place lives at risk in some cases.
Contents
Terms
Summary
Introduction
Proposed Metrics
Results and Discussion
    Data Collection
    Graphs
        Assignees Per Issue
        Comments Per Issue
        Days to Close
        Issues Per Label
        Issues Raised By Contributor
        Issue Rates
Limitations
Conclusions
Reflection
References
Appendices
Introduction
In software development, many defects emerge during the development process. If carried out long term, this project would be successful if, based on the research, defects during the software development life cycle were reduced. Software is unique in how it impacts so many other fields, in some cases to the point that without the software the product may not operate. Present-day organizations face steep competition and are under great pressure to produce innovative technology with few to no defects.
Defect removal effectiveness is a direct indicator of the capability of a software development process in
removing defects before the software is delivered. It is one of few, perhaps the only, process indicators that
bear a direct correlation with the quality of the software's field performance. [3][4]
How do we define a defect? Are there different levels of defect severity? Is there a specific point at which we should have found all defects? A defect is any blemish, imperfection, or undesired behavior that occurs in the product. There are indeed different levels of severity or impact for defects; the level is determined by the amount of negative impact the defect has on the software. It is important to determine the level of impact of a defect so it can be prioritized appropriately for removal.
While ideally all defects should be found and software should be deployed with no defects at all, as said earlier, defects are a fact of life. The earlier in the software life cycle we detect a defect, the less it will cost to fix or remove it. But there is no specific point at which all defects should be found; unfortunately, some defects do not show up until the software is being used regularly.
The more research and analysis we do on defect removal efficiency, the closer we are to finding a way to prevent defects from occurring in the first place, to eliminate them very early in the software development life cycle, or to deliver products that are 100% defect free.
Proposed Metrics
The team collected issue data from GitHub, an online web-based project repository home to over 10 million projects. Because GitHub provides a way to track project issues, the team had access to a large amount of data to analyze. The team decided to collect data from the following four open source projects: Emby, Angular, Material Design Lite, and YouTube-DL. Emby is an open-source home media server, similar to Plex, which allows users to stream their own content to various devices. Angular is a framework for building mobile and desktop web applications. Material Design Lite is a CSS framework that allows web developers to easily design sites that follow Google's Material Design look. YouTube-DL is a Python console application that lets the user save videos from YouTube and other websites for offline use. The GQM model below shows the team's proposed metrics.
Number of Comments Per Issue - The team proposed we collect this metric because we wanted to see the level of engagement by contributors in a specific issue.
Number of Issues Raised Per Contributor - The team proposed we collect this metric because we thought it would be useful to see which contributors are best at discovering bugs within the project.
Number of Issues Closed Per Contributor - The team proposed we collect this metric because we thought it would be useful to see which contributors are best at resolving bugs within the project.
Time Taken for Each Issue To Be Closed - The team proposed we collect this metric because it would allow us to see how long bugs stay in a system.
Number of Issues Closed And Open Per Milestone - The team proposed we collect milestone data because it would allow us to see issue removal patterns associated with project milestones.
Number of Issues Per Tag - The team proposed we collect this metric because it would allow us to see what types of defects are currently or have been affecting the project.
Number of Issues Per Priority - The team proposed we collect this metric because we thought it would be useful to see the severity of the defects that are currently or have been affecting the project.
Results and Discussion
Data Collection
In order to collect the issue data, the team created a script that interfaces with the GitHub API. The specific API endpoint the team used has the pattern "GET /repos/:owner/:repo/issues". Full information on using this API endpoint is available at https://developer.github.com/v3/issues/#list-issues-for-a-repository.
The team's Python script to save a project's repository data is available in our GitHub repository at https://github.com/CDavantzis/GitHub-Issues/blob/master/github.py. In order to avoid API limits, our script allowed users to authenticate, which drastically increased the rate limit. We also provided the "per_page" parameter to the API endpoint to increase the number of issues per page from the default of 30 to the maximum of 100, which allowed the team to get over three times more issues per request. When saving the information locally, the script combined all the pages of issues into a single list and saved them in a JSON file. The formatted saved JSON data tends to be quite large: the largest was Angular's file, with 923,777 lines of issue data and a file size of 56.3 MB. While the size can get quite large, the file format allowed the team to easily view the information in a text editor and later load it into Python for analysis. With some projects having issue data this large, it was clear that using the API each time we wanted to analyze the data would be unrealistic, and we would need to save this data locally. The team made use of a popular third-party Python module called Requests, which allowed our code to be extremely clean and easy to understand. Our script could be run from a console; the following command saves the data JSON file to the "data" folder in the current directory:
github.py --auth "USERNAME" "PASSWORD" --save_repo_issues "REPO_OWNER" "REPO_NAME"
Being able to run this script from the console provided our team with two benefits. The first was that we didn't need to save our GitHub authentication data within the script. Saving this information within the file would have been bad practice in terms of security: saving usernames and passwords in plain text on any system is a security concern, and since our script is open source, contributors would also need to remember to remove their credentials from the code before syncing to GitHub. The second is that, because this script is separate from our analysis script, the code can easily be reused and expanded in the future for different purposes.
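As a rough sketch of what such a paginated download can look like (the endpoint pattern comes from the GitHub v3 API cited above; the function names are our own illustration, not the actual github.py code):

```python
API = "https://api.github.com"

def issues_url(owner, repo):
    """Build the list-issues endpoint for one repository."""
    return f"{API}/repos/{owner}/{repo}/issues"

def fetch_all_issues(owner, repo, auth=None):
    """Page through all issues, 100 per page, into a single list."""
    import requests  # third-party; deferred so the URL helper works without it
    issues, page = [], 1
    while True:
        resp = requests.get(
            issues_url(owner, repo),
            params={"state": "all", "per_page": 100, "page": page},
            auth=auth,  # a (username, password) tuple raises the rate limit
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:  # an empty page means we are past the last one
            break
        issues.extend(batch)
        page += 1
    return issues

# Usage sketch (writes every issue for a repository to one JSON file):
#   import json
#   data = fetch_all_issues("rg3", "youtube-dl", auth=("USERNAME", "PASSWORD"))
#   with open("data/issues.json", "w") as f:
#       json.dump(data, f, indent=2)
```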
The following are the four JSON files for the data we collected. Due to their size, they need to be viewed as raw text on GitHub. Depending on internet speeds and the device being used, it may be necessary to download the files before viewing them. Additionally, some text editors may have trouble opening files this large.
https://github.com/CDavantzis/GitHub-Issues/blob/master/data/angular_angular_issues_1479782810.json
https://github.com/CDavantzis/GitHub-Issues/blob/master/data/google_material-design-lite_issues_1478412302.json
https://github.com/CDavantzis/GitHub-Issues/blob/master/data/MediaBrowser_Emby_issues_1478411769.json
https://github.com/CDavantzis/GitHub-Issues/blob/master/data/rg3_youtube-dl_issues_1479788479.json
Graphs
The data sets and high-quality images of the graphs are linked in the appendices. Matplotlib was used to graph the data. When analyzing the data we filter the issues to those containing a tag with the word "bug" in it, to ensure we are only analyzing defects; without this filter we would also be analyzing issues such as feature requests. We filter the data at this stage so that, if we choose, we can compare defect-specific results to whole-system results. Data sets and high-quality images can also be found in the following folder on the team's GitHub:
https://github.com/CDavantzis/GitHub-Issues/tree/master/results
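The bug filter itself is simple: keep any issue with at least one label whose name contains "bug". A sketch in the spirit of our analysis script (the function names are illustrative; the field names follow the GitHub issue JSON):

```python
def is_bug(issue):
    """True if any label on the issue contains the word 'bug'."""
    return any("bug" in label["name"].lower() for label in issue["labels"])

def bugs_only(issues):
    """Filter a list of GitHub issue dicts down to defect reports."""
    return [issue for issue in issues if is_bug(issue)]
```

A label such as "type: Bug" passes the filter, while "feature request" does not.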
Assignees Per Issue
The assignees per issue graph is a bar graph that allows the team to compare the number of closed and open
issues for a project, and separate that information based on how many assignees the issues have in GitHub.
The assignees per issue bar graph for Angular shows the team that there are roughly as many issues assigned to no one as there are issues assigned to one person. We can also see that there are more closed issues and fewer open issues when the issue has an assignee. The team can conclude that issues get resolved better when they have been assigned to someone.
In the Material Design Lite project, we can see that issues don't get assigned to individuals. About 67% of the issues are closed. There have also been significantly fewer issues found in this project compared to Angular, both projects by Google; this can be attributed to the types of projects they are.
From the above graph we can see that issues for the Emby project have all been closed, and they are never assigned to individuals. We will see later that the fact that Emby has no open issues influences the other graphs.
YouTube-DL is the third project that doesn't assign specific defect issues to individuals. We can see that about 75% of the defects have been closed for this system. YouTube-DL is a program that continually needs to adapt to other sites, so it is impressive that such a large percentage of defects is closed. Looking at the repository's history, one can see that there is an extremely active community keeping YouTube-DL up to date. This activity is apparent in later graphs.
Comments Per Issue
The comments per issue graph is a histogram that shows the number of issues in the project that have a certain number of comments. This histogram is useful for viewing contributor engagement. It separates open and closed issues to give the user a better understanding of whether the issues with a specific number of comments are open or closed.
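A sketch of how such a histogram can be produced with Matplotlib (the field names follow the GitHub issue JSON; the function names and output path are our own illustration):

```python
def comment_counts_by_state(issues):
    """Split per-issue comment counts into closed and open lists."""
    closed = [i["comments"] for i in issues if i["state"] == "closed"]
    opened = [i["comments"] for i in issues if i["state"] == "open"]
    return closed, opened

def plot_comments_histogram(issues, out_path):
    """Render a stacked closed/open comments-per-issue histogram to a file."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to file, no display needed
    import matplotlib.pyplot as plt
    closed, opened = comment_counts_by_state(issues)
    plt.hist([closed, opened], bins=30, stacked=True, label=["closed", "open"])
    plt.xlabel("comments per issue")
    plt.ylabel("number of issues")
    plt.legend()
    plt.savefig(out_path)
```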
For the Angular project we can see that the distribution of comments closely matches an exponentially decaying trend. A majority of issues in this project have an engagement of zero to ten comments, although there are outlier issues with 80 comments or more.
For Material Design Lite the number of comments does not match the same clear exponentially decaying trend, although there are still more issues with fewer comments. Similarly to Angular, a majority of issues have an engagement of zero to ten comments. The outliers for Material Design Lite are smaller, topping out at 35 comments.
For Emby, the project once again had a majority of issues within the range of zero to ten comments, although this histogram showed an inconsistent trend within that range. The greatest outlier in number of comments in Emby was 60 comments on an issue. In this graph we don't see any open issues because all issues relating to defects have been closed.
Like Angular, YouTube-DL showed a fairly clear exponentially decaying trend for comments per issue. Unlike the other projects, user engagement appeared to drop off after five comments rather than ten. The greatest outlier for number of comments on an issue was 45 for YouTube-DL.
Days to Close
The days to close graph is a histogram that shows us how long it takes to fix an issue and then close it. All four histograms show a similar pattern. At the beginning of each project, far more issues were opened than at the end, and on each histogram a huge number of issues were closed quickly. As expected, as the days went on, fewer issues remained open, and toward the end of each histogram there are almost no issues left to resolve. An increase in the histogram can be attributed to issues being pushed to a later release.
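The duration is computed from the created_at and closed_at timestamps the API returns with each issue; a minimal sketch (the function name is ours):

```python
from datetime import datetime

ISO = "%Y-%m-%dT%H:%M:%SZ"  # GitHub's issue timestamp format

def days_to_close(issue):
    """Whole days between creation and closing; None for still-open issues."""
    if issue.get("closed_at") is None:
        return None
    created = datetime.strptime(issue["created_at"], ISO)
    closed = datetime.strptime(issue["closed_at"], ISO)
    return (closed - created).days
```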
Issues Per Label
The issues per label graph is a horizontal bar graph that allows the user to see how many issues are tagged with specific labels. Because we filtered on issues that are tagged as a bug, we can see in all our graphs that every issue is tagged as a bug. We split our graphs between closed and open issues to see if there are noticeable differences in the types of labels closed and open issues have been tagged with. The names of the labels can be seen more clearly in the full-sized graphs linked in the appendices.
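Counting label occurrences per state is a straightforward pass over the issue list. A sketch (the function name is ours; the field names follow the GitHub issue JSON):

```python
from collections import Counter

def label_counts(issues, state):
    """Count how often each label name appears among issues in one state."""
    counts = Counter()
    for issue in issues:
        if issue["state"] != state:
            continue
        for label in issue["labels"]:
            counts[label["name"]] += 1
    return counts
```

The resulting Counter feeds directly into a horizontal bar chart of label names versus counts.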
The Angular project breaks tags up into various types. One type of tag is effort: we can see that more closed issues are tagged easy than medium, and more are tagged medium than hard. There is also a scale of severity from 1 to 6, and a majority of closed issues have a severity of 3, which stands for broken. Open issues in Angular show similar patterns, having more easy issues than hard issues, and most of the open issues have a severity of 3.
Material Design Lite assigns priority to issues with the tags p0, p1, and p2, where zero represents the highest priority and two the lowest. We can see in the above graphs that among closed issues there are more high-priority issues than low-priority ones, while among open issues there are more low-priority issues than high-priority ones. This shows the team is doing well, removing the higher-priority issues at a faster rate than the low-priority ones.
There are only closed issues currently in Emby, so we have not included the open issue graph. We can see from this chart how many of the issues were completed and how many won't be fixed. There are roughly as many issues that won't be fixed as issues that were duplicates. There is also a tag called "no-repro"; these are issues people reported that couldn't be replicated by the developers.
The team can see from the graphs for YouTube-DL that the most common type of closed issue is external-bug. These issues have been closed because they don't directly involve YouTube-DL. The second most common tag for closed issues, and the most common for open issues, is geo-restricted, which means the video source is not available in the region of the end user. Once again, this is not an issue that can be addressed by the developers. YouTube-DL lets end users submit bugs via the GitHub repository, which could explain why there are so many issues that can't be addressed by the developers.
Issues Raised By Contributor
Similarly to days to close, the histograms here follow a very similar pattern. As expected, more people found more issues early in the project; as time went on, there were most likely fewer issues to find, and with fewer issues, fewer contributors would find them. The only project that varies somewhat from the pattern, and even then not by much, is Emby. That graph appears to have waves of increased reporting and issues, which could coincide with changes to the code or release dates; additional data could provide some insight about that.
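Per-contributor counts only need the user field of each issue. A sketch (the function name is ours):

```python
from collections import Counter

def issues_per_reporter(issues):
    """Number of issues opened by each GitHub login."""
    return Counter(issue["user"]["login"] for issue in issues)
```

Counter.most_common then gives the top bug reporters directly.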
Issue Rates
For each project we collected data on, there are two graphs below. The first for each project shows the monthly issue rates without the cumulative rates; the second shows the cumulative rates. The team separated the graphs to make it easier to view and understand the non-cumulative data. The cumulative lines are the integrals of the corresponding arrival and removal rate lines. Because the cumulative count is effectively a continuous function of time, it could theoretically be fit with a function, and the fitted function could then be used as a prediction model for future projects. Subtracting the cumulative removals from the cumulative arrivals gives the total number of defects in the system at any given time, which is shown by the count line. We have included similar graphs calculated with the daily rates in our appendices.
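The subtraction described above can be sketched directly from monthly arrival and removal counts (the function name is ours):

```python
from itertools import accumulate

def open_counts(arrivals, removals):
    """Defects present each interval: cumulative arrivals minus cumulative removals."""
    cum_arrivals = accumulate(arrivals)
    cum_removals = accumulate(removals)
    return [a - r for a, r in zip(cum_arrivals, cum_removals)]

# e.g. open_counts([5, 3, 2], [1, 4, 3]) -> [4, 3, 2]
```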
Limitations
Software systems are intangible: you cannot see and feel them the way you can a manufactured product like a car, keyboard, or camera. As a result, managers need documents to assess progress. This can cause problems:
• Timing of progress deliverables may not match the time needed to complete an activity
• The need to produce documents constrains process iteration
• The time taken to review and approve documents is significant
Though the data collected was for four projects that we were not developing, these limitations could in fact
affect our results.
Many metrics have parameters such as "total number of defects", e.g. total number of requirements defects. Clearly, we only ever know about the defects that are found, so we never know the "true" value of many of these metrics. Further, as we find more defects, this number will increase. Hopefully, defect discovery is asymptotic over time, i.e. we find fewer defects as time goes along, especially after release. So metrics that require "total defects" information will change over time, but hopefully converge eventually. The later in the life cycle we compute the metric, the more meaningful the results. If and when we use these metrics, we must be aware of this effect and account for it.
Another limitation of using data from four different projects, from four different companies or organizations, is the range in policies, practices, employees and their abilities, and methods. Data from a single organization has far more validity: analyzing one organization reduces the number of uncontrollable factors that must be taken into account.
The data we collect could easily be influenced by underreporting of problems found; depending on the work environment, employees may feel the need to hide their mistakes. Finally, effective analysis of this type of data is limited by the fact that it only tells the whole story once the project is completed.
The team had several limitations regarding data. In the process of doing additional research, it became
apparent there were great measurements we could have used for additional analysis. However, in order to use
some of those tools, we would have needed additional data from the projects. Some of that data we could have
gotten had we known earlier. Most of it however, we would have needed additional access to the code and
developers.
Conclusions
Using the number of defects that arrived at each time interval, we found the K value at each month. We then
averaged the K value for the project and used it to find the predicted number of defects at a certain time. With
the figures below, the projected number of defects at any month can be predicted. By far, Emby has the
smallest number of projected defects. The factors that lead to that low predicted value are unknown at the
present time.
As such, if doing additional research pertaining to predicting defects, it would be beneficial to have additional
background information. Some items to look at might be the size of the team; the abilities of the team
members; the approach the team is taking to developing the system. The list could go on. It would be
fascinating to do some research on defects found and the work environment of the development team. Or the
age of the team members. Again, that list could continue. Due to the lack of additional relevant data that could
be used in conjunction with the defect arrivals and removals, it would be difficult to predict other defect
projections. It would also be difficult to figure out a correlation between various factors.
Angular
● Maximum arrival is 113 defects, at f(22).
● Average K value is 2440.70.
● The predicted number of defects at f(34), June 1, 2017, is 59.89.
Material Design Lite
● Maximum arrival is 32 defects, at f(6).
● Average K value is 108.60.
● The predicted number of defects at f(34), November 1, 2017, is 1.87.
Emby
● Maximum arrival is 54 defects, at f(2).
● Average K value is 54.44.
● The predicted number of defects at f(34) is 0.002.
YouTube-DL
● Maximum arrival is 23 defects, at f(59).
● Average K value is 1190.92.
● The predicted number of defects at f(73), June 1, 2017, is 16.62.
Reflection
● The validity of the data is not in question for us. Data was retrieved directly from GitHub: four open source projects were accessed, and data pulled directly from those projects was used.
● The only limitations in this project had to do with the API. This was addressed in the code so it did not affect the data collected.
● What we could do differently: we gathered a great amount of data, which isn't in and of itself a problem. What we could have done is gather additional data directly connected to defect arrival and removal:
○ LOC over the same time span as the defect arrivals and removals
○ Release dates for all four projects looked at
● To further our research, the team would like to explore different methods that would allow us to predict future defect information.
● The source code can be scaled to work with any automated program with an API facility.
● Including releases in the models for agile projects (where the frequency of releases is consistent) would allow better formulation of the results.
● We were able to analyze only the release branches of the repositories, as other branches are not public. If organizations are willing to share the data of their development branches, we can generate a more detailed report on the bugs before release.
● The statistics generated were found to be extremely helpful for an organization to analyze the arrival of defects so that it can allocate the necessary resources to remove them.
References
[1] Capers Jones. "Minimizing the Risk of Litigation Problems Noted in Breach of Contract Litigation." CrossTalk, September/October 2016, pages 4-10.
[2] "Issues." Issues | GitHub Developer Guide. N.p., n.d. Web.
[3] Suma V. and Gopalakrishnan Nair T.R. "Defect Management Strategies in Software Development." In Recent Advances in Technologies, Maurizio A. Strangio (Ed.), 2009.
[4] Stephen H. Kan. Metrics and Models in Software Quality Engineering, 2nd ed. Pearson, 2003.
Appendices
Third Party Libraries - Third-party Python libraries used by our script
● Matplotlib
● Requests
Scripts - Scripts the team developed for our project
● Issue Downloader
○ Downloaded Data
● Issue Analysis and Grapher
○ Results (individual results linked below)
Angular - Original Repository
● Assignees Per Issue: Data / Graph
● Comments Per Issue: Data / Graph
● Days to Close Issue: Data / Graph
● Issues Per Label: Data / Graph (Closed) / Graph (Open)
● Issues Raised By Contributor: Data / Graph
● Rates (Daily): Data / Graph (With Cumulative) / Graph (Without Cumulative)
● Rates (Monthly): Data / Graph (With Cumulative) / Graph (Without Cumulative)
Material Design Lite - Original Repository
● Assignees Per Issue: Data / Graph
● Comments Per Issue: Data / Graph
● Days to Close Issue: Data / Graph
● Issues Per Label: Data / Graph (Closed) / Graph (Open)
● Issues Raised By Contributor: Data / Graph
● Rates (Daily): Data / Graph (With Cumulative) / Graph (Without Cumulative)
● Rates (Monthly): Data / Graph (With Cumulative) / Graph (Without Cumulative)
Emby - Original Repository
● Assignees Per Issue: Data / Graph
● Comments Per Issue: Data / Graph
● Days to Close Issue: Data / Graph
● Issues Per Label: Data / Graph (Closed) / Graph (Open)
● Issues Raised By Contributor: Data / Graph
● Rates (Daily): Data / Graph (With Cumulative) / Graph (Without Cumulative)
● Rates (Monthly): Data / Graph (With Cumulative) / Graph (Without Cumulative)
YouTube-DL - Original Repository
● Assignees Per Issue: Data / Graph
● Comments Per Issue: Data / Graph
● Days to Close Issue: Data / Graph
● Issues Per Label: Data / Graph (Closed) / Graph (Open)
● Issues Raised By Contributor: Data / Graph
● Rates (Daily): Data / Graph (With Cumulative) / Graph (Without Cumulative)
● Rates (Monthly): Data / Graph (With Cumulative) / Graph (Without Cumulative)