The Present and Future Role of Probability in Software Engineering
Trial Lecture Friday 30 December
Siv Hilde Houmb
Siv HIlde Houmb, The Present and Future Role of Probability in Software Engineering
Outline
• Short introduction to SE and Probability
• Probability as Decision Support for SE
• Project/Software Estimation in SE
• The Present Role of Probability in Project Estimation
• The Future Role of Probability in Project Estimation
– Treat estimation as a probabilistic phenomenon
– Combine estimation with quantitative risk assessment to yield
realistic estimates
– Estimation as a System Identification problem
• Concluding Remarks
• References
Software Engineering
Software engineering (SE) is the
application of a systematic, disciplined,
quantifiable approach to the development,
operation, and maintenance of software [1].
[Figure: Software Processes (Requirement, Design, Implementation, Maintenance; milestones M1 to M3) supported by Development, Product Management and Project Management Methodology and Tools. End-Product qualities: Security, Safety, Reliability. Project dimensions: Cost and Time.]
Probability Theory
Probability theory is the branch of
mathematics concerned with analysis of
random phenomena [1].
• Probability theory is concerned with random
variables, stochastic processes and events
• These are mathematical abstractions of non-deterministic
events or measured quantities that may either be
single occurrences or evolve over time in an
apparently random fashion
Probability as Decision Support in SE
• The goal of a software project is to produce reliable
and effective software that meets its requirements
and expectations
• This involves many DECISIONS
– Which methodology and tools are most effective?
– What are the project risks?
– What are expected and realizable quality goals (safety, security,
reliability)?
– What are the risks to meeting these goals?
– How much should and will the development cost in money and
resources?
– What are the risks to meeting the deadline, budget?
Estimation in SE
The motivation for looking into software/project estimation
is to find out whether the situation there is the same as in
security “estimation” (quantification)
[Figure: estimation process. Boxes: User Specs; Count Function Points (UFP); Compute TCF; Apply Organization Particulars; Product Size; Estimation tool (COCOMO); Development Process; Size; Cost, Duration/Time, Schedule; Planning Tool; Project Plans.]
From page 184 in Bernstein & Yuhas (2005) [10]
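The UFP and TCF steps can be illustrated with the usual function point adjustment. This is a minimal sketch, assuming the commonly cited formula TCF = 0.65 + 0.01 × (sum of 14 general system characteristics, each rated 0 to 5); the formula is not stated on the slide, and the ratings and the UFP count below are hypothetical.

```python
def technical_complexity_factor(ratings):
    """TCF = 0.65 + 0.01 * sum of the 14 characteristic ratings (each 0..5)."""
    if len(ratings) != 14 or any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("expected 14 ratings in the range 0..5")
    return 0.65 + 0.01 * sum(ratings)


def adjusted_function_points(ufp, ratings):
    """Scale the unadjusted function point count (UFP) by the TCF."""
    return ufp * technical_complexity_factor(ratings)


# Hypothetical project: every characteristic rated "average" (3), 1000 UFP.
hypothetical_ratings = [3] * 14
tcf = technical_complexity_factor(hypothetical_ratings)
fp = adjusted_function_points(1000, hypothetical_ratings)
print(f"TCF = {tcf:.2f}, adjusted size = {fp:.0f} FP")
```

The adjusted size is what would then be fed into an estimation tool such as COCOMO.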
Project Estimation
• Concerns estimating time (T), schedule (S) and
cost (C)
– Time is often aggregated into a quantitative value denoted Effort
– Cost C must usually be within some Budget B
• Decisions concern arriving at estimates of
time and budget that are close to the “real” values
• The ultimate goal is S=T and B=C, meaning
that the a priori estimates equal the a posteriori
observations
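One common way to quantify how far the a priori estimates (S, B) are from the observations (T, C) is the magnitude of relative error (MRE). The measure is not named on the slide and is used here only as an illustration; all figures are hypothetical.

```python
def magnitude_of_relative_error(estimate, actual):
    """MRE = |actual - estimate| / actual; 0 means the estimate was exact."""
    if actual <= 0:
        raise ValueError("actual value must be positive")
    return abs(actual - estimate) / actual


# Hypothetical project: schedule S vs. observed time T, budget B vs. observed cost C.
schedule_months, observed_months = 10.0, 12.5
budget, observed_cost = 400_000.0, 500_000.0

mre_time = magnitude_of_relative_error(schedule_months, observed_months)
mre_cost = magnitude_of_relative_error(budget, observed_cost)
print(f"time MRE = {mre_time:.2f}, cost MRE = {mre_cost:.2f}")
```

The goal S=T and B=C corresponds to an MRE of zero for both quantities.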
Estimation Methods
• (Repeatable) Wideband Delphi Estimation
– Uses two types of meetings, kick-off and estimation meetings, and
results in an agreed-upon estimate
• PROBE (Proxy Based Estimation)
– Individual estimates based on a database storing the prior experience of
a particular engineer
• COCOMO II (COnstructive COst MOdel)
– Uses 5 scale factors and 17 cost drivers (Post-Architecture model),
together with size, to derive the required effort
• The Planning Game
– For XP (Extreme Programming)
– Estimation as a game between the engineers and the stakeholders
using user/usage stories
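As an illustration of how COCOMO II turns size and drivers into effort, here is a minimal sketch of the Post-Architecture effort equation, PM = A × Size^E × ∏EM with E = B + 0.01 × ΣSF. It assumes the published COCOMO II.2000 constants (A = 2.94, B = 0.91) and nominal scale-factor values; none of these numbers appear on the slide, and an organization would normally recalibrate them.

```python
from math import prod

# Assumed COCOMO II.2000 calibration constants (not from the slides).
A = 2.94
B = 0.91

# Assumed nominal ratings of the five scale factors.
NOMINAL_SCALE_FACTORS = {
    "PREC": 3.72, "FLEX": 3.04, "RESL": 4.24, "TEAM": 3.29, "PMAT": 4.68,
}


def cocomo2_effort(ksloc, scale_factors=NOMINAL_SCALE_FACTORS, effort_multipliers=()):
    """Effort in person-months for a size given in thousands of source lines."""
    exponent = B + 0.01 * sum(scale_factors.values())
    # prod(()) is 1, i.e. all cost drivers at their nominal rating.
    return A * ksloc ** exponent * prod(effort_multipliers)


nominal_effort = cocomo2_effort(100.0)
print(f"100 KSLOC, all drivers nominal: {nominal_effort:.0f} person-months")
```

Note that the result is a single point value; the equation by itself says nothing about the uncertainty around it.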
The Present Role of Probability (1)
• Estimation is treated as traditional business analysis
with little real statistics: one attempts to obtain
best estimates with little or no formal
representation of the uncertainty in
the estimates
• Software estimation's cone of uncertainty
– Uncertainty in the estimates
– Variability in the events and activities of a project
The Present Role of Probability (2)
– "If I give you another week to work on your
estimate, can you refine it so that it contains less
uncertainty?" – This does not work
– The reason the estimates contain variability is that
the software project itself contains variability, not
that the estimates vary at different points
in time
– The only way to reduce the variability in the
estimate is to reduce the variability in the project
itself or to capture it in the estimation model
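One way to capture project variability in an estimation model is to describe each activity by a distribution rather than a single number and to simulate the total. The sketch below uses Monte Carlo sampling with triangular distributions; this technique is chosen here for illustration and is not prescribed by the slides, and the task figures are hypothetical.

```python
import random

# Hypothetical tasks: (optimistic, most likely, pessimistic) effort in person-days.
TASKS = [(5, 8, 20), (10, 15, 40), (3, 4, 9)]


def simulate_total_effort(tasks, runs=10_000, seed=1):
    """Return sorted simulated totals, sampling each task from a triangular distribution."""
    rng = random.Random(seed)
    totals = [
        sum(rng.triangular(low, high, mode) for low, mode, high in tasks)
        for _ in range(runs)
    ]
    totals.sort()
    return totals


def percentile(sorted_values, p):
    """Simple empirical percentile of an already sorted list (0 <= p <= 1)."""
    index = min(len(sorted_values) - 1, int(p * len(sorted_values)))
    return sorted_values[index]


totals = simulate_total_effort(TASKS)
p10, p50, p90 = (percentile(totals, p) for p in (0.10, 0.50, 0.90))
print(f"10%: {p10:.1f}  50%: {p50:.1f}  90%: {p90:.1f} person-days")
```

The spread between the 10% and 90% values reflects variability in the project itself, so more time spent on the estimate would not shrink it.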
Interpretation of Probability
• Classical (frequentist) interpretation
– Estimate P’ of the probability P of event E, where uncertainty is an estimate of the
relative distance between the actual P and P’
– Uses/requires historical or empirical data
• Bayesian or subjectivist interpretation – B. de Finetti (1973) [12,13]
– The belief P’ of event E, where uncertainty expresses how certain
the source is that P’ is P
– Uses expert judgment, historical data and other available sources
• Predictive Bayesian interpretation – T. Aven (2003) [21]
– Estimate a subject’s uncertainty of P for future event E
– Uses expert judgment on historical data and experience
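The difference between the interpretations can be made concrete with a small sketch. Take E to be "a project overruns its budget". The frequentist estimate uses only the observed relative frequency, while the Bayesian estimate combines an expert's prior belief with the same data. The Beta prior and all counts below are hypothetical, and the Beta-binomial model is an assumption made for this illustration.

```python
from fractions import Fraction


def relative_frequency(events, trials):
    """Classical estimate P' of P(E): observed fraction of trials in which E occurred."""
    return Fraction(events, trials)


def beta_posterior_mean(prior_a, prior_b, events, trials):
    """Bayesian estimate: mean of the Beta(prior_a + events, prior_b + non-events) posterior."""
    return Fraction(prior_a + events, prior_a + prior_b + trials)


# Hypothetical data: 3 overruns observed in 4 past projects.
overruns, projects = 3, 4

# Hypothetical expert belief, encoded as Beta(2, 8): overruns expected in about 20% of projects.
frequentist = relative_frequency(overruns, projects)
bayesian = beta_posterior_mean(2, 8, overruns, projects)
print(f"frequentist: {float(frequentist):.2f}, Bayesian: {float(bayesian):.2f}")
```

With so little data the two estimates differ strongly; as more projects are observed, the data dominate the prior and the estimates converge.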
Future Role of Probability (1)
1. Treat project estimation as a probabilistic
phenomenon according to the subjectivist or
Bayesian interpretation (general idea from Pfleeger
(1999) [14] and Fenton et al. (1999/2000/2004)
[11,19,20])
• Rather than treating the variability in the process/software
and the uncertainty in the estimates separately, based on
historical data under the classical interpretation of probability,
one can combine them under the subjectivist or Bayesian
interpretation of probability
Future Role of Probability (2)
2. Combine estimation with quantitative risk
assessment to yield realistic estimates (Kansala
(1997) [6]) and to support trade-off analysis (Fenton
et al. (1999/2000/2004) [11,19,20])
3. Estimation seen as a (probabilistic) System
Identification Problem (from Ramil (2000) [7])
Project Estimation as a probabilistic
phenomenon
• COCOMO II was developed from 83 projects, examining the
relation between a priori values (estimates) and a posteriori
values (observations) using regression analysis
• Regression-type models (Chulani, Boehm and Steece (2000)
[24])
– Might lead to misunderstanding about cause and effect
– Represent a static analysis and need a lot of historical data (no
missing data points and no outliers)
– Assume that there is an actual value
– Treat estimates as “objective”
– Assume that all goes well during the project and do not account
for the variability
– Must be calibrated to an organization
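To show what a regression-type model looks like in its simplest form, the sketch below fits effort = a × size^b by ordinary least squares on log-transformed values. It is not the actual COCOMO II calibration procedure, and the data are synthetic, generated from a known power law so that the fit can be checked.

```python
import math


def fit_power_law(sizes, efforts):
    """Least-squares fit of log(effort) = log(a) + b * log(size); returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic "historical" projects: size in KSLOC, effort generated as 3.0 * size ** 1.1.
sizes = [10, 20, 50, 100, 200]
efforts = [3.0 * s ** 1.1 for s in sizes]

a, b = fit_power_law(sizes, efforts)
print(f"effort ~ {a:.2f} * size^{b:.2f}")
```

The fit returns one pair of coefficients and hence one point estimate per project size. It needs complete data for every project and encodes no cause-and-effect structure, which are the limitations listed above.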
A Causal Approach to Estimation
• Using causal models (as in software reliability engineering)
based on the metrics in e.g. COCOMO II extends the estimation
model to a dynamic decision-support and risk analysis tool
• Causal analysis using Bayesian Belief Nets (BBNs) (Fenton et
al. (1999/2004) [11,20])
– Diverse process and product variables (to express variability in each and
the dependencies between them)
– Empirical evidence and expert judgment
– Genuine cause and effect relationships
– Uncertainty
– Incomplete information
• No additional metrics are needed, neither in data collection nor in the
sophistication of the metrics
– The BBN topology simply expresses the current metrics, extended to
handle disparate information sources, as sets of conditional probability
statements
BBN Topology for Estimation
• A BBN consists of
– A graphical network (DAG) with nodes and arcs
• Nodes represent uncertain variables
• Arcs model the causal relationships between the variables
– Probability tables
• Provide the probabilities of each state of the variable for a
node
• We will look at a BBN topology for estimating
resources in a project from Fenton and Neil (1999)
[11] that can be used both for trade-off analysis and
for project risk analysis
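The two ingredients, a DAG and probability tables, can be sketched in a few lines. The toy network below has two parent nodes (problem size, required accuracy) and one child (required resources). It is a hypothetical illustration, not the subnet from Fenton and Neil [11], and all probabilities are invented.

```python
from itertools import product

# Prior tables for the two parent nodes (hypothetical values).
P_SIZE = {"small": 0.6, "large": 0.4}
P_ACCURACY = {"normal": 0.7, "high": 0.3}

# Conditional table for the child node: P(resources = "high" | size, accuracy).
P_HIGH_RESOURCES = {
    ("small", "normal"): 0.1,
    ("small", "high"): 0.4,
    ("large", "normal"): 0.5,
    ("large", "high"): 0.9,
}


def prob_high_resources(evidence=None):
    """P(resources = high | evidence on the parents), by enumerating parent states."""
    evidence = evidence or {}
    numerator = 0.0
    normaliser = 0.0
    for size, accuracy in product(P_SIZE, P_ACCURACY):
        # Skip parent states that contradict the entered evidence.
        if evidence.get("size", size) != size:
            continue
        if evidence.get("accuracy", accuracy) != accuracy:
            continue
        weight = P_SIZE[size] * P_ACCURACY[accuracy]
        normaliser += weight
        numerator += weight * P_HIGH_RESOURCES[(size, accuracy)]
    return numerator / normaliser


no_evidence = prob_high_resources()
large_project = prob_high_resources({"size": "large"})
large_and_accurate = prob_high_resources({"size": "large", "accuracy": "high"})
print(f"{no_evidence:.3f} -> {large_project:.3f} -> {large_and_accurate:.3f}")
```

Entering evidence on a parent node is a simple form of "what-if" analysis: the belief about the required resources shifts as more is known about the project. Real tools handle much larger networks and propagate evidence in both directions.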
Classical versus Causal [11]
Required Resources Subnet
(from Fenton and Neil (1999) [11])
Example:
Problem Size of 1400-1500 FP
Example:
Require High Accuracy
Propagating Evidence in BBN (1)
• Evidence propagates through the topology using
Bayes method
• Bayes method
– Estimate a prior probability (P(A)) – initial belief
– Collect evidence/information (P(B))
• Perform experiments
• Historical data
• Collect expert opinions
• Other information sources
– Update prior estimates to a posterior estimate (P(A|B))
Propagating Evidence in BBN (2)
• Bayes rule is used to update the network
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
• Bayes rule updates our belief about a hypothesis A in
the light of new evidence B
– Our prior belief P(A) is updated to the posterior belief P(A|B) by
multiplying P(A) by the likelihood P(B|A) that B will occur if
A is true, and dividing by P(B)
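A single update step can be worked through numerically. In the sketch below, A is the hypothesis "the project will overrun" and B is the evidence "requirements changed late"; both the events and the probabilities are hypothetical. P(B) is obtained from the law of total probability.

```python
def bayes_update(prior, likelihood, likelihood_if_false):
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    return likelihood * prior / evidence


prior_overrun = 0.3          # P(A): initial belief
p_change_if_overrun = 0.8    # P(B|A)
p_change_if_on_track = 0.2   # P(B|not A)

posterior_overrun = bayes_update(prior_overrun, p_change_if_overrun, p_change_if_on_track)
print(f"P(A) = {prior_overrun:.2f} -> P(A|B) = {posterior_overrun:.3f}")
```

Observing B raises the belief in A because B is four times more likely when A is true than when it is not.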
Concluding Remark
– Benefits of Causal Model/BBN (1)
• Explicit modeling of variability in projects and
uncertainty in estimates
• Explicit modeling of cause-effect relationships
• Can combine diverse types of information
• Makes explicit those assumptions that were
previously hidden
• Intuitive graphical format makes it easier to
understand chains of complex and seemingly
contradictory reasoning
Concluding Remark
– Benefits of Causal Model/BBN (2)
• Ability to forecast missing data
• Support for ‘what-if?’ analysis and forecasting of
effects of process changes
• Use of subjectively or objectively derived probability
distributions
• Rigorous mathematical semantics for the model
• No need to do any of the complex Bayesian
calculations by hand, as tools like HUGIN do that
"Not everything that can be counted counts,
and not everything that counts can be counted."
Origin unknown; placed on Albert Einstein’s office door at Princeton
References (1)
1. IEEE Standard Glossary of Software Engineering Terminology, IEEE std
610.12-1990.
2. A. Stellman and J. Green. Applied Software Project Management. O’Reilly,
2005.
3. B. Boehm et al. Software Cost Estimation with Cocomo II. Addison Wesley,
2000.
4. B. Boehm. Software Engineering Economics. Englewood Cliffs, N.J,
Prentice-Hall, 1981.
5. F. P. Brooks Jr. The Mythical Man Month. Essays on Software Engineering.
Addison Wesley, USA (1975).
6. K. Kansala. Integrating Risk Assessment with Cost Estimation. IEEE Software,
Vol. 14, No. 4 (1997), pp. 61-67.
7. J. Ramil. Why COCOMO Works Revisited or Feedback Control as a Cost
Factor. Submitted to FEAST 2000 International Workshop on Feedback in
Software and Business Processes, July 10-12, Imperial College, London, 2000.
8. S. McConnell. Software Estimation: Demystifying the Black Art. Microsoft
Press, 2006.
9. B. Boehm and K. Sullivan. Software economics status and prospects.
Information and Software Technology, Volume 41, No. 14, (1999), pp. 937-946.
References (2)
10. L. Bernstein and C.M. Yuhas. Trustworthy Systems Through Quantitative
Software Engineering. John Wiley & Sons, 2005.
11. N. Fenton and M. Neil. Software Metrics and Risk. 2nd European Software
Measurement Conference. 1999.
12. B. de Finetti. Theory of Probability Volume 1. John Wiley & Sons, 1973.
13. B. de Finetti. Theory of Probability Volume 2. John Wiley & Sons, 1973.
14. S.L. Pfleeger. Albert Einstein and Empirical Software Engineering. IEEE
Computer, 32(10):32-38, October 1999.
15. C. Wohlin, P. Runeson, M. Höst, M.C. Ohlsson, B. Regnell and
A. Wesslén. Experimentation in Software Engineering: An Introduction. Kluwer
Academic Publishers, 2000.
16. R. Cooke. Experts in Uncertainty: Opinion and Subjective Probability in
Science. Oxford University Press, 1991.
17. M. Jørgensen and K.J. Moløkken-Østvold. How Large Are Software Cost
Overruns? Critical Comment on the Standish Group’s CHAOS Report.
Information and Software Technology, 48(4):297-301, 2006.
References (3)
18. M. Jørgensen. Estimation of Software Development Work Effort: Evidence on
Expert Judgment and Formal Models. International Journal of Forecasting
23(3):449-462, 2007.
19. N. Fenton and M. Neil. Software metrics: roadmap. In Proceedings of 22nd
International Conference on Software Engineering: Future of Software
Engineering Track, pages 357-370, 2000.
20. N. Fenton, W. Marsh, M. Neil, P. Cates, S. Forey and M. Tailor. Making
resource decisions for software projects. Proceedings of the 26th International
Conference on Software Engineering (ICSE04), pages 391-406, 2004.
21. T. Aven. Foundations of Risk Analysis: A Knowledge and Decision-Oriented
Perspective. Wiley, 2003.
22. S. Chulani. Incorporating Bayesian Analysis to Improve the Accuracy of
COCOMO II and Its Quality Model Extension. Ph.D. Qualifying Exam Report,
University of Southern California, February 1998.
23. S. Chulani, B. Boehm and B. Steece. Bayesian Analysis of Empirical Software
Engineering Cost Models. IEEE Transactions on Software Engineering, Special
Issue on Empirical Methods in Software Engineering, Vol. 25, No. 4, July/August
1999.
References (4)
24. S. Chulani, B. Boehm and B. Steece. From Multiple Regression to Bayesian
Analysis for Calibrating COCOMO II. Proceedings of the 21st Annual
Conference of the International Society of Parametric Analysts (ISPA), 2000.
25. H. Wang, F. Peng, C. Zhang and A. Pietschker. Software Project Level
Estimation Model Framework based on Bayesian Belief Networks. Proceedings
of the Sixth International Conference on Quality Software. IEEE Computer
Society, pages 209-218, 2006.
26. I. Stamelos, L. Angelis, P. Dimou and E. Sakellaris. On the use of bayesian
belief networks for the prediction of software productivity. Information &
Software Technology, 45(1):51-60, 2003.
Project Resource BN
(from Fenton et al. (2004) [20])
Subnet for Total Effective Effort
(from Fenton et al. (2004) [20])
Example: Resource Prediction
(from Fenton et al. (2004) [20])