
CS 5150
Software Engineering
Lecture 21
Reliability 3
Administration
Final presentations
Sign up for your presentations now.
Failures and Faults
Failure: Software does not deliver the service expected by
the user (e.g., mistake in requirements, confusing user
interface)
Fault (BUG): Programming or design error whereby the
delivered system does not conform to specification (e.g.,
coding error, interface error)
Failure of Requirements
An actual example
• The head of an organization is not paid his salary because it is
greater than the maximum allowed by the program.
(Requirements problem.)
Terminology
Fault avoidance
Build systems with the objective of creating fault-free (bug-free) software
Fault tolerance
Build systems that continue to operate when faults
(bugs) occur
Fault detection (testing and validation)
Detect faults (bugs) before the system is put into
operation, or after release when they are discovered.
Defensive Programming
Murphy's Law:
If anything can go wrong, it will.
Defensive Programming:
• Redundant code is incorporated to check system state
after modifications.
• Implicit assumptions are tested explicitly.
• Risky programming constructs are avoided.
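The three defensive-programming practices above can be sketched in Python as follows. The `Account` class is hypothetical, invented for illustration. Implicit assumptions (positive integer amounts, sufficient funds) are tested explicitly by raising exceptions rather than using `assert`, since assertions are skipped when Python runs with `-O`. A redundant check re-derives the balance from the transaction history after every change, and money is held in integer cents to avoid the risky construct of floating-point currency.

```python
class Account:
    """Hypothetical account used only to illustrate defensive programming."""

    def __init__(self, opening_balance: int) -> None:
        # Implicit assumption tested explicitly: a whole number of cents, never negative.
        if not isinstance(opening_balance, int) or opening_balance < 0:
            raise ValueError(f"invalid opening balance: {opening_balance!r}")
        self.opening_balance = opening_balance
        self.balance = opening_balance
        self.transactions = []  # signed amounts in cents

    def _check_state(self) -> None:
        # Redundant check: the balance must agree with the transaction history.
        expected = self.opening_balance + sum(self.transactions)
        if self.balance != expected or self.balance < 0:
            raise RuntimeError("account state is inconsistent")

    def withdraw(self, amount: int) -> None:
        # Assumptions about the caller are checked, not trusted.
        if not isinstance(amount, int) or amount <= 0:
            raise ValueError(f"withdrawal must be a positive integer: {amount!r}")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        self.transactions.append(-amount)
        self._check_state()  # verify system state after the modification
```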
Fault Tolerance
Aim: A system that continues to operate when problems occur.
Examples:
• Invalid input data (e.g., in a data processing application)
• Overload (e.g., in a networked system)
• Hardware failure (e.g., in a control system)
General Approach:
• Failure detection
• Damage assessment
• Fault recovery
• Fault repair
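A minimal sketch of the four-step approach for the first example, invalid input data in a data processing application. The record format (`name,amount`), the function name, and the 10% tolerance threshold are all assumptions for illustration. Detection validates each record; damage assessment decides whether the batch is too damaged to continue; recovery carries on with the valid records; and "repair" here only hands back the rejected records so a person can correct and resubmit them.

```python
def process_batch(lines, max_bad_fraction=0.1):
    """Sum the amount field of hypothetical 'name,amount' records."""
    valid, rejected = [], []

    # 1. Failure detection: validate every record before trusting it.
    for lineno, line in enumerate(lines, start=1):
        fields = line.split(",")
        try:
            if len(fields) != 2:
                raise ValueError("expected 2 fields")
            valid.append(int(fields[1]))
        except ValueError as err:
            rejected.append((lineno, line, str(err)))

    # 2. Damage assessment: is the damage confined enough to continue?
    if lines and len(rejected) / len(lines) > max_bad_fraction:
        raise RuntimeError(
            f"{len(rejected)} of {len(lines)} records invalid; batch abandoned"
        )

    # 3. Fault recovery: carry on with the records that passed validation.
    total = sum(valid)

    # 4. Fault repair: return rejected records so they can be fixed and resubmitted.
    return total, rejected
```

With nine good records and one bad one, the batch still completes and reports the bad record; a batch that is mostly garbage is abandoned instead of silently producing a wrong total.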
Fault Tolerance: Recovery
Backward Recovery
• Record system state at specific events (checkpoints). After
failure, recreate state at last checkpoint.
• Combine checkpoints with system log (audit trail of
transactions) that allows transactions from last checkpoint to
be repeated automatically.
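Backward recovery with checkpoints plus a transaction log can be sketched as a toy in-memory key-value store. The class and method names are hypothetical, and a real system would keep both the checkpoint and the log on durable storage rather than in memory. Each transaction is logged before it is applied; a checkpoint snapshots the whole state and discards the older log; recovery restores the last checkpoint and then replays the logged transactions.

```python
import copy


class RecoverableStore:
    """Toy key-value store illustrating checkpoints plus a transaction log."""

    def __init__(self):
        self.state = {}
        self._checkpoint = {}
        self._log = []  # transactions applied since the last checkpoint

    def apply(self, key, value):
        # Write the audit trail first, so the transaction can be repeated later.
        self._log.append((key, value))
        self.state[key] = value

    def checkpoint(self):
        # Record the full system state; earlier log entries are no longer needed.
        self._checkpoint = copy.deepcopy(self.state)
        self._log.clear()

    def recover(self):
        # Backward recovery: recreate the state at the last checkpoint...
        self.state = copy.deepcopy(self._checkpoint)
        # ...then automatically repeat every transaction logged since then.
        for key, value in self._log:
            self.state[key] = value
```

Because each logged transaction simply sets a key to a value, replaying it more than once gives the same result, which keeps a repeated recovery safe in this toy version.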
Recovery Software is Difficult to Test
Example
After an entire network is hit by lightning, the restart crashes
because of overload. (Problem of incremental growth.)
Fixing Bugs
Isolate the bug
  Intermittent --> repeatable
  Complex example --> simple example
Understand the bug and its context
  Root cause
  Dependencies
  Structural interactions
Fix the bug
  Design changes
  Documentation changes
  Code changes
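The "complex example --> simple example" step can often be partly automated. Below is a minimal sketch of a greedy reducer: it repeatedly deletes one element of a failing input and keeps the deletion whenever the failure still reproduces. It is a much simplified relative of Andreas Zeller's delta debugging (ddmin), which removes chunks of decreasing size and usually needs fewer test runs. The `fails` predicate is a hypothetical stand-in for running the program on an input and checking for the failure; it must be deterministic, so an intermittent bug should be made repeatable first.

```python
from collections.abc import Callable, Sequence
from typing import TypeVar

T = TypeVar("T")


def shrink_failing_input(items: Sequence[T], fails: Callable[[list], bool]) -> list:
    """Greedily remove elements while the failure still reproduces."""
    current = list(items)
    if not fails(current):
        raise ValueError("input does not reproduce the failure")
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]  # drop element i
            if fails(candidate):
                current = candidate  # smaller input, same failure: keep it
                changed = True
                break
    # No single element can now be removed without losing the failure.
    return current
```

For example, with a predicate that fails whenever both 3 and 7 are present, `shrink_failing_input([1, 3, 5, 7, 9], lambda xs: 3 in xs and 7 in xs)` returns `[3, 7]`.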
Moving the Bugs Around
Fixing bugs is an error-prone process!
• When you fix a bug, fix its environment
• Bug fixes need static and dynamic testing
• Repeat all tests that have the slightest relevance
(regression testing)
Bugs have a habit of returning!
• When a bug is fixed, add the failure case to the
test suite for future regression testing.
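Adding the failure case to the test suite can be sketched with Python's standard `unittest` module. The `validate_salary` function is hypothetical, loosely modeled on the earlier salary example: imagine a version that rejected any salary above a hard-coded maximum. Once fixed, the reported failure case becomes a permanent test, alongside a check that neighbouring behaviour still works.

```python
import unittest


def validate_salary(amount_cents: int) -> bool:
    # Hypothetical fixed version: the old hard-coded maximum has been removed,
    # and any non-negative whole-cent amount is accepted.
    return isinstance(amount_cents, int) and amount_cents >= 0


class SalaryRegressionTests(unittest.TestCase):
    def test_salary_above_old_maximum_is_accepted(self):
        # The failure case from the bug report, kept so the bug cannot quietly return.
        self.assertTrue(validate_salary(500_000_000))

    def test_negative_salary_still_rejected(self):
        # Related behaviour re-checked as part of regression testing.
        self.assertFalse(validate_salary(-1))
```

Saved in a test module, these run with the rest of the suite via `python -m unittest`.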
Persistence
Most people work around problems. The best
people track them down and fix them!
Difficult Bugs
Some bugs may be extremely difficult to track down and
isolate. This is particularly true of intermittent failures.
• A large central computer stops a few times every
month with no dump or other diagnostic.
• A database load dies after running for several days
with no diagnostics.
• An image processing system runs correctly, but uses
huge amounts of memory.
Such problems may require months of effort to track down.
For a fictional example, see Ellen Ullman, The Bug: A
Novel (Doubleday, 2003).
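Intermittent failures are hard partly because a failing run cannot be replayed. One common source of nondeterminism, pseudo-random choices, can be made repeatable by drawing all randomness from a single seed and logging it. The sketch below assumes that is the only nondeterminism involved; failures caused by timing or thread interleaving need other tools. `run_trial` is a hypothetical stand-in for the real workload.

```python
import random
import secrets
from typing import Optional


def run_trial(seed: int) -> list:
    """Hypothetical workload whose behaviour depends on random choices."""
    rng = random.Random(seed)  # all randomness flows from one recorded seed
    return [rng.randint(0, 99) for _ in range(5)]


def run_with_recorded_seed(seed: Optional[int] = None):
    if seed is None:
        seed = secrets.randbits(32)  # a fresh seed for normal runs
    print(f"seed={seed}")  # log it so a failing run can be replayed exactly
    return seed, run_trial(seed)
```

When a run fails, calling `run_with_recorded_seed(seed)` with the logged value reproduces the same sequence of choices, turning an intermittent failure into a repeatable one.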
Tracking Down a Difficult Bug: The Heisenbug
Tracking Down a Difficult Bug: Make3D
[Figure: memory usage by function, with cv::fastmalloc labeled]
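The profile on this slide comes from a C++ program (the `cv::` prefix suggests OpenCV). As a loosely analogous sketch in Python, the standard-library `tracemalloc` module can show which source lines own the most live memory; it groups allocations by line, file, or traceback rather than strictly by function. `build_cache` is a hypothetical memory hog standing in for the real culprit.

```python
import tracemalloc


def build_cache(n):
    # Hypothetical function that holds on to far more memory than it needs.
    return [bytes(1000) for _ in range(n)]


def top_allocations(limit=3):
    tracemalloc.start()
    cache = build_cache(10_000)  # keep a reference so the memory is still live
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    del cache
    # Group live allocations by source line, largest total size first.
    stats = snapshot.statistics("lineno")
    for stat in stats[:limit]:
        print(stat)  # file:line, total size, number of blocks
    return stats[:limit]
```

Here the top entry should point at the list comprehension inside `build_cache`, much as the slide's profile singles out one allocator.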
Bugs in System Software
Even system software from good manufacturers may
contain bugs:
• Built-in function in Fortran run-time environment (e^0 = 0)
• The string-to-number function that was very slow with
integers
• The preload system with the memory leak
Bugs in Hardware
Three times in my career I have encountered hardware bugs:
• The microfilm plotter with the missing byte (1:1023)
• Microcode for virtual memory management
• The Sun page fault that IBM paid to fix
Each problem was actually a bug in embedded software/firmware
Deciding whether to Fix a Bug: Creating a Problem for Customers
Sometimes customers will build applications that rely upon a
bug. Fixing the bug will break the applications.
• An application crashes with an emulator, even though
the emulator is bug free. (Compensating bug problem.)
• The graphics package with rotation about the Z-axis in
the wrong direction.
• The 3-pixel rendering problem with Internet Explorer.
With each of these bugs, the code was easy to fix, but releasing
the fix would cause problems for existing programs.
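One common way to ship such a fix without breaking existing programs is to keep the old behaviour as the default and let new code opt in to the corrected behaviour. The sketch below applies that idea to the rotation example. The function name and the `legacy_direction` flag are hypothetical, and the slide does not say which way the package rotated; this sketch assumes the old code turned clockwise for positive angles, whereas the conventional right-handed rotation about Z is counter-clockwise.

```python
import math


def rotate_z(x, y, angle_deg, legacy_direction=True):
    """Rotate point (x, y) about the Z-axis (hypothetical graphics API).

    legacy_direction=True reproduces the old clockwise rotation that existing
    programs compensate for; new code passes legacy_direction=False to get
    the conventional counter-clockwise rotation.
    """
    theta = math.radians(angle_deg)
    if legacy_direction:
        theta = -theta  # preserve the historical (wrong) direction
    c, s = math.cos(theta), math.sin(theta)
    return (x * c - y * s, x * s + y * c)
```

Defaulting to the legacy behaviour protects programs that already compensate for the bug; a later release could warn when the flag is left unset before eventually flipping the default.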
Deciding whether to Fix a Bug: Bugs and Features
Validation: Are we building the right product?
Verification: Are we building the product right?
It is sometimes difficult to distinguish between the two.
That's not a bug. That's a feature!
Often users will report that a program behaves in a manner
that they consider wrong, even though it is behaving as
intended.
The decision whether this is a bug should be made by the
client, not by the developers.
Reliability: Adapting Small Teams to Large Projects
Small teams and small projects have advantages for reliability:
• Small-group communication cuts the need for intermediate
documentation, yet reduces misunderstanding.
• Small projects are easier to test and make reliable.
• Small projects have shorter development cycles. Mistakes in
requirements are less likely and less expensive to fix.
• When one project is completed it is easier to plan for the next.
Improved reliability is one of the reasons that Agile development
has become popular over the past few years.
An Old Question:
Safety-Critical Software
A software system fails and several lives are lost. An inquiry
discovers that the test plan did not consider the case that caused
the failure. Who is responsible?
(a) The testers for not noticing the missing cases?
(b) The test planners for not writing the complete test plan?
(c) The managers for not having checked the test plan?
(d) The client for not having done a thorough acceptance test?
Software Developers and Testers: Responsibilities
• Carrying out assigned tasks thoroughly and in a
professional manner
• Being committed to the entire project -- not just tasks that
have been assigned
• Resisting pressures to cut corners on vital tasks
• Alerting colleagues and management to potential problems
early
Computing Management Responsibility
• Organization culture that expects quality
• Appointment of suitably qualified people to vital tasks
(e.g., testing safety-critical software)
• Establishing and overseeing the software development
process
• Providing time and incentives that encourage quality
work
• Working closely with the client
• Accepting responsibility for the work of the team
Client Responsibility
• Organization culture that expects quality
• Appointment of suitably qualified people to vital tasks
(e.g., technical team that will build a critical system)
• Reviewing requirements and design carefully
• Establishing and overseeing the acceptance process
• Providing time and incentives that encourage quality
work
• Working closely with the software team
• Accepting responsibility for the resulting product