
Magne Jørgensen
Simula Research Laboratory
University of Oslo
Scienta
Who am I?
• Researcher (100%), professor (20%), consultant (10%)
• Research on human judgment, empirical methods, software
cost estimation, software project management
• Working with (and in) industry using:
– Action research
– Data analytics (project data analysis)
– Surveys (interviews, questionnaires)
– Controlled experiments
• Member of a national board advising Norwegian public IT projects
• Founded and actively promoting
evidence-based software engineering to:
– Software professionals
– Students
Motivation:
From myths and fashion to an
evidence-based discipline!
Both industry and academia see and claim
patterns where there are none,
and miss true patterns.
Confirmation bias in industry:
“I see it when I believe it” vs “I believe it when I see it”
• Experimental design:
– Data sets with randomly generated performance data comparing “traditional”
and “agile” methods.
– Survey of each developer’s prior belief in agile methods.
• Question: How much do you, based on the data set, agree with: “Use of agile
methods has caused better performance when looking at the combination of
productivity and user satisfaction.”
• Result:
– The developers’ prior belief in agile determined what they saw in the
randomly generated data.
[Figure: individual value plot of productivity (function points/work-day, scale 0–9)
by user satisfaction (dissatisfied, satisfied, very satisfied);
panel variable: development method (Agile vs Traditional).]
Any patterns here? Randomness?
(random = each position in the square was equally likely to be generated)
How many would show a pattern if allowed to remove 1-2 ”outliers”?
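How much apparent pattern “outlier” removal can create is easy to demonstrate. A hypothetical sketch (not the study’s actual data): generate purely random points, then remove the pair of points whose removal maximizes the apparent correlation.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(7)
# 20 points where every position in the unit square is equally likely
xs = [rng.random() for _ in range(20)]
ys = [rng.random() for _ in range(20)]

r_all = abs(pearson(xs, ys))

# "Remove 1-2 outliers": pick the pair of points whose removal maximizes |r|
r_best = r_all
for i in range(20):
    for j in range(i + 1, 20):
        keep = [k for k in range(20) if k not in (i, j)]
        r = abs(pearson([xs[k] for k in keep], [ys[k] for k in keep]))
        r_best = max(r_best, r)

print(f"|r| with all points: {r_all:.2f}, after removing 2 'outliers': {r_best:.2f}")
```

Searching over which points to drop is exactly the freedom an analyst has when labelling inconvenient points as outliers, and it reliably inflates the correlation found in pure noise.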
Sometimes we don’t see the patterns …
Assume a sequence of coin tosses and two people (A and B) who play against
each other. The one who bets on the two-coin sequence that occurs first wins
the game.
Person A bets on: Head-Head
Person B bets on: Tail-Head
Do they have the same probability of winning?
Example:
Tail-Tail-Head-Head-Head-Tail-...  Tail-Head occurs first and B wins
Answer: It is three times more likely to observe Tail-Head before Head-Head!
(If you don’t believe me, we can make a bet where I bet 20 Euro on Tail-Head and
you 10 Euro on Head-Head. First to win ten times keeps the 30 Euro.)
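The claim is easy to check by simulation (a sketch, not part of the original talk). The intuition: Head-Head can only win if the very first two tosses are both heads; as soon as a single tail appears, Tail-Head must occur before Head-Head.

```python
import random

def winner(rng):
    """Toss a fair coin until Head-Head (A) or Tail-Head (B) occurs first."""
    last_two = []
    while True:
        # keep only the two most recent tosses
        last_two = (last_two + [rng.choice("HT")])[-2:]
        if last_two == ["H", "H"]:
            return "A"
        if last_two == ["T", "H"]:
            return "B"

rng = random.Random(42)
trials = 100_000
b_wins = sum(winner(rng) == "B" for _ in range(trials))
print(b_wins / trials)  # close to 0.75: Tail-Head wins three times as often
```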
Many beliefs are based on
very poor empirical basis and
are easy to manipulate
The ease of creating beliefs:
Are risk-willing or risk-averse developers better?
Study design: Research evidence + self-generated argument.
Group A (B): Evidence and argument in favour of the risk-willing (risk-averse).
Question: Based on your experience, do you think that
risk-willing programmers are better than risk-averse programmers?
1 (totally agree) – 5 (no difference) – 10 (totally disagree)

Average answers:
                                 Group A   Group B
Manipulated                        3.3       5.4
Debriefing (informed that the
evidence was misleading)           3.5       5.0
2 weeks later                      3.5       4.9
Neutral group                      5.0
Effect sizes in studies on pair programming
Source: Hannay, Jo E., et al. "The effectiveness of pair programming: A meta-analysis."
Information and Software Technology 51.7 (2009): 1110-1122.
You may at this stage start to wonder ...
HOW IS THIS CONNECTED TO
INDUSTRY COLLABORATION?
First: What is Evidence-based
software engineering (EBSE)?
Evidence-based software engineering (EBSE)
1. Convert a relevant problem or need for information
into an answerable question.
2. Search the literature and practice-based experience
for the best available evidence to answer the
question. (+ create own local evidence, if needed)
3. Critically appraise the evidence for its validity,
impact, and applicability.
4. Integrate the appraised evidence with practical
experience and the client's values and
circumstances to make decisions about practice.
5. Evaluate performance in comparison with previous
performance and seek ways to improve it.
How can this help
collaboration with industry?
Warning: Unsurprisingly, it does
not address all collaboration
challenges
EBSE Step 1:
Support in formulating the question
• Example: A company wants to know whether agile methods lead to
improvement or not!
• Our role: Tell them that they need a more precise (answerable)
question. Help them with this by:
– Clarifying which agile practices are/will be implemented. What is
the context? Improvement compared to what?
– Clarifying which aspects of project success are important.
– Including question elements focused on understanding the
mechanisms that make agile practices work well/not work well.
• My own experience: This step is key to succeeding with industry
collaboration. It avoids wasting time on non-answerable questions and
collaborations that lead to nothing.
– A project I participated in a few years ago spent man-years
on studying whether RUP worked well or not, without really
defining what RUP is or what ”works well” means.
One example (not perfect, but …)

                  Agile   Frequent deliveries   Flexible scope
                          to production
Client benefits    16%           22%                 29%
Functionality      22%           29%                 16%
Tech. quality      21%            6%                 32%
Budget control      2%           22%                 29%
Time control        8%           11%                 24%
Efficiency         11%            5%                 24%

Example of finding: Agile without frequent delivery to
production and flexible scope had a negative effect on success
measured as client benefit.
Also teach industry to question claims like this …
14% of Waterfall and 42% of Agile projects are successful
(source: The Standish Group, The Chaos Manifesto 2012)
Successful = “On cost, on schedule and with specified
functionality”
Can you spot a serious error in this analysis?
EBSE Steps 2 and 3:
Identify, generate and evaluate evidence
• Three main sources of evidence:
– Research
– Practice-based experience
– Local experiments
• Our role: Once an answerable question has been formulated in collaboration
with ”industry” (a company or companies), help them collect all three
types of evidence:
– Primary and secondary studies (systematic literature reviews)
– Collection of practice-based experience by interviewing software
professionals
– Design of local experiments (from piloting to controlled experiments)
• What we need to offer: Competence in critical collection, evaluation and
summary of evidence. Good study designs.
• What we get: Lots of good research data and, hopefully, interesting results.
Many companies (especially agile and
lean ones) are willing to experiment
• Example experiences with different models for experimentation (local
evidence):
– Piloting new technology on a project while researchers observe how
it goes, summarize, and compare with the default technology.
• Easy to start, but the data are complex to analyze. Typically only
allowed on small, non-critical projects.
– Including a new tool/process in several existing projects (most
recently we introduced “benefit management” in existing processes)
and observing/measuring what happens.
• Requires some skill in convincing people that it’s worth it (their
time is expensive).
• Using research grants to pay for the extra effort works well.
– Randomized controlled trials (randomized treatment).
• Costly and may require funding for participation.
• Why are so many skeptical about participation payment? It’s
common practice elsewhere.
EBSE Step 4:
Integrate evidence
Many possible presentation formats:
• Evidence briefings (see sites.google.com/site/eseportal/evidencebriefings)
• Guidelines, checklists, principles, …
• Presentations
• Reports with concrete recommendations
Our role: Critical evaluation and summary of evidence (relevance and
validity) [and possibly teaching them how to critically evaluate
evidence].
What we get: Evidence summaries for selected contexts.
Example:
Presentation of integrated evidence as
principles
Example of an evidence-based principle:
7.1 Keep forecasting methods simple.
Description: Complex methods may include errors that
propagate through the system or mistakes that are difficult to
detect. Select simple methods initially (Principle 6.6). Then
use Occam’s Razor; that is, use simple procedures unless
you can clearly demonstrate that you must add complexity.
Purpose: To improve the accuracy and use of forecasts.
Conditions: Simple methods are important when many people
participate in the forecasting process and when the users
want to know how the forecasts are made. They are also
important when uncertainty is high and few data are
available.
Strength of evidence: Strong empirical evidence. Many
analysts find this principle to be counterintuitive.
Source of evidence: This principle is based on evidence
reviewed by Allen and Fildes (2001), Armstrong (1985),
Duncan, Gorr and Szczypula (2001), and Wittink and
Bergestuen (2001).
www.forecasting.com
EBSE Step 5:
Evaluate outcome
• The industry is poor at evaluating the effect of a
process/tool change, but has a strong wish to know it.
– Sometimes, however, they don’t really want to know if the
outcome is ”bad” …
• Typical situation: They have much data, but no one with the
competence to analyse them.
• Our role: Support with expertise on measurement, study
design and analysis.
• What’s in it for us: Lots of data to analyse, but be critical
of data quality. In my experience, in at least 50% of
the cases the quality is rubbish.
Such analyses may be complex …
A company measured an increase in the productivity
of an IT department (function points/man-month).
Everybody was happy, especially since this
“proved” that their newly implemented
incremental processes were successful.
To my surprise, when I grouped the projects into
those using PowerBuilder, those using Cobol
(and a third group), I found a productivity
decrease in both groups.
Was my analysis incorrect?
Period 1        FP     Effort   Productivity
PowerBuilder     500     500        1.0
Cobol           2000    4500        0.44
Total           2500    5000        0.50

Period 2        FP     Effort   Productivity   Change in prod.
PowerBuilder    2000    1800        0.9           -0.1
Cobol           1000    3000        0.33          -0.11
Total           3000    4800        0.63          +0.13

Arithmetic ”explanation”: a/b + c/d ≠ (a+c)/(b+d)
Also called ”Simpson’s paradox” or a ”missing variable” problem (the
proportion of work done in the different groups should have been
included in the analysis).
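The arithmetic is easy to reproduce. A minimal sketch with hypothetical numbers close to the slide’s (not the company’s actual data), chosen so that each row’s productivity equals FP/effort:

```python
# Hypothetical (FP delivered, effort in man-months) per technology and period.
# Each technology's productivity drops from period 1 to period 2, yet the
# pooled productivity rises, because the share of the work shifts toward
# the more productive technology (PowerBuilder).
period1 = {"PowerBuilder": (500, 500), "Cobol": (2000, 4500)}
period2 = {"PowerBuilder": (1800, 2000), "Cobol": (1000, 3000)}

def productivity(fp, effort):
    return fp / effort

def pooled(period):
    total_fp = sum(fp for fp, _ in period.values())
    total_effort = sum(effort for _, effort in period.values())
    return total_fp / total_effort

for tech in period1:
    print(tech, round(productivity(*period1[tech]), 2),
          "->", round(productivity(*period2[tech]), 2))  # both decrease
print("Pooled:", pooled(period1), "->", pooled(period2))  # 0.5 -> 0.56
```

The pooled number is an effort-weighted average of the subgroup numbers, so shifting effort toward the high-productivity subgroup can raise the total even while every subgroup gets worse.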
To summarize …
• The software industry/companies have
challenges/questions, but frequently lack the competence
to properly address them (unless being taught EBSE, of
course ;-))
– We have much of that competence. Or at least we
should have it.
• EBSE may be used as a top-level framework/checklist for
us, as researchers, when collaborating with them on
these challenges. This has the potential to give us valid
and relevant (= convincing) empirical research.
– Many issues, some of them really core to successful
collaboration, are not addressed in EBSE.
• All collaborations need to cover all five steps, and all
steps need collaborations.