LOG6306:
Empirical Studies on
Software Patterns
Yann-Gaël Guéhéneuc
Empirical Studies
This work is licensed under a Creative
Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
Types of Studies
2/97
Types of Studies
Scientific Method
– Study and creation of new knowledge
based on physical evidence
– Usage of observations, hypotheses,
and logic to explain natural
phenomena through theories
Prediction of reproducible theories
through experiments and the
development of new technologies
Aristotle
384 BC–7 March 322 BC
Sir Isaac Newton
4 January 1643–31 March 1727
4/97
Types of Studies
Digression…
– Plato’s allegory of the cave
– « Zhuangzi dreamt that he was a butterfly » (莊周夢蝶)
– Descartes’ evil genius
Plato, 428 BC–346 BC
Zhuangzi, 369 BC–286 BC
René Descartes, 31 March 1596–11 February 1650
6/97
Types of Studies
Digression…
– Refutability (or falsifiability)
See http://karl-popper.over-blog.com/
Sir Karl Raimund Popper
28 July 1902–17 September 1994
8/97
Introduction
Need for quality predictive models
Heterogeneity of systems, processes, and
organisations
Lack of general predictive models
Need to conduct empirical studies to
adapt, validate, or build predictive
models
9/97
Steps of an Empirical Study
1. Objective of the study
2. Conception
3. Data collection techniques
4. Practical considerations
5. Data analysis techniques
6. Application of the results
10/97
Objective of the Study
Choose the type of study
Define and write hypotheses
Define and study the variables
Interpret and generalise the results
12/97
Choose the Type of Study
In summary
– Survey
• Retrospective study of a situation
• Comparison with similar situations
– Case study
• Identification of variables that can impact the results of an activity
• Documentation of the activity: inputs, constraints, resources, and outputs
– Experiment
• Identification of variables that can impact the results of an activity
• Changes to these variables to study their impact on the results
13/97
Choose the Type of Study
Survey
– To generalise the data gathered on a few hundred
occurrences to all the possible occurrences
14/97
Choose the Type of Study
Survey
Sources: http://globalnews.ca/news/2075610/prime-minister-tom-mulcair-new-seat-projections-poll-show-ndp-surging-across-canada/ and
http://globalnews.ca/news/2278263/liberals-continue-to-surge-widen-lead-in-seat-projections/
15/97
Choose the Type of Study
Case study
– Detailed study of one occurrence of a
phenomenon
• Case
– Understanding of the reasons why the case
occurred as it did
– Generation of hypotheses, low generalisation
… A well known case?
16/97
Choose the Type of Study
Case study
– Detailed study of one occurrence of a
phenomenon
• Case
– Understanding of the reasons why the case
occurred as it did
– Generation of hypotheses, low generalisation
… A well known case?
Process improvements
Choose the Type of Study
Case study
– Illustration
– Exploration
– “Critical” occurrence
– Implementation of a program (in general ☺)
– Impact of a program
– Accumulation
18/97
Choose the Type of Study
Experiment
– Actions and objective observations performed to
solve a problem, answer some questions and/or
confirm/refute some hypotheses about a
phenomenon
– “Hard” sciences vs. “Soft” sciences
“Why don’t rabbits wear glasses?”
19/97
Choose the Type of Study
Experiment
– Actions and objective observations performed to
solve a problem, answer some questions and/or
confirm/refute some hypotheses about a
phenomenon
– “Hard” sciences vs. “Soft” sciences
“Why don’t rabbits wear glasses?”
Because they eat carrots…?
Choose the Type of Study
Controlled experiment
– Comparison of the results of an experiment with
a control group that is quasi-identical except for the
aspects on which the experiment focuses
– Generalisability
Ex.: “This computation is slower with this
new algorithm”
Ex.: “placebo and drug”
21/97
Choose the Type of Study
Natural experiment
– No variable is changed
– Observation (not manipulation) of variables
when all other variables remain constant
Ex.: “Suns are hydrogen clouds that
collapsed”
22/97
Choose the Type of Study
Quasi-experiment
– When it is impossible to physically (concretely)
test the hypotheses
– When it is impossible to create a true control
group for practical/ethical reasons
– When there cannot be equivalence between the
study and the control groups
Ex.: “A vaccine against cancer”
23/97
Define and Write Hypotheses
The objective of the study must be clear
– Hypotheses
A hypothesis is
– A prediction relating one variable and one
behaviour (making up a phenomenon)
– A temporary affirmation that describes or
explains a phenomenon
– An expected explanation…
24/97
Define and Write Hypotheses
A hypothesis comes from
– Theories
– Observations
– Previous data
Ex.: “Programs written in Java are better than
those written in C”
25/97
Define and Write Hypotheses
A hypothesis may generate multiple
operational hypotheses
– Concrete examples of the hypothesis
– Main quality of a hypothesis: it can be made
operational
Ex.: “The development time of the same
program is lower in Java than in C”
26/97
Define and Write Hypotheses
An (operational) hypothesis involves multiple
variables and measures
– Independent variables
– Dependent variables
– Measures
27/97
Define and Study the Variables
Once you have defined a hypothesis, you must
identify the variables (and their measures)
that may impact its truth
You must assess your degree of control over
each variable
– Independent variables
– Dependent variables
28/97
Define and Study the Variables
The state or independent variable is a
variable that you can change, whose
changes impact the observed behaviour, and
that characterises the study object and the
study results
Study results are described by the values of
the dependent variables
29/97
Define and Study the Variables
To confirm/refute the hypothesis, you must
show beyond doubt the relation between
independent and dependent variables
– Change the independent variables
– Keep constant all other possible variables
30/97
Define and Study the Variables
Mitigating variables
– Unknown variables
– Variables that you cannot keep constant or
measure effectively
Ex.: “Programs written in Java are faster
than in C in Windows”
31/97
Define and Study the Variables
To suppress mitigating variables, you must
try to keep their impact constant
Ex.: “Programs written in Java are faster
than in C in Windows across 30 runs on 30
different installations”
32/97
Define and Study the Variables
The null hypothesis plays a particular role
– It supports alternative hypotheses
– “There is no difference”, i.e., no relation between
a variable and a behaviour
Ex.: “There is no difference between the
development times of a program in Java and
in C”
… Importance?
33/97
Define and Study the Variables
Statistical significance
– Confidence that the results of the study are not due to
chance
– Statistical tests on a set of values yield a
probability, the p-value
The p-value is the probability of obtaining results at least
as extreme as those observed if the null hypothesis were true
35/97
Define and Study the Variables
p ≤ 0.05 (α = 0.05)
– Significant difference between the results and
results obtained randomly
– Results this extreme would occur by chance fewer than 5
times out of 100
Ronald A. Fisher, 1925
– 1 out of 20
– 22 trials
Ronald A. Fisher
17 February 1890–29 July 1962
37/97
Define and Study the Variables
With a survey, you cannot define variables
With a case study, independent variables take the
“typical” values of the considered case
With an experiment, you perform a sampling
according to the independent variables
– An experiment provides more generalisability than other
types of studies
38/97
Interpret and Generalise the Results
The quantitative measures related to the
variables must be clear and direct
Ex.: Instead of “Programs written in Java are
better than those written in C”, write
“Programs written in Java are faster than in C
in Windows across 30 runs on 30 different
installations”
39/97
Interpret and Generalise the Results
The relation between the measures and the
variables must be clear and documented
Ex.: Instead of “Programs written in Java are
better than those written in C”, write
“Programs written in Java are faster, in ms,
than in C in Windows across 30 runs on 30
different installations”
40/97
Interpret and Generalise the Results
Surveys and case studies allow you to
confirm a theory or hypothesis for one
organisation or case
Experiments allow you to confirm a theory or
hypothesis more generally
41/97
Interpret and Generalise the Results
Relation between variables and behaviour
– A relation can be suggested by a survey or a
case study
– An experiment can assess the degree of a
relation (statistically)
42/97
Interpret and Generalise the Results
Evaluation of the exactness of models
– Experiments can confirm/refute the exactness of
models but the models must not influence the
study design and the sampling
– Ex.: Literary Digest survey in 1936 in the USA
for the election between Roosevelt and Landon
(see http://www.math.uah.edu/stat/data/LiteraryDigest.html)
43/97
Interpret and Generalise the Results
Ex.: remember…
44/97
Interpret and Generalise the Results
Ex.: remember…
Aristotle, 384 BC–7 March 322 BC
Galileo Galilei, 15 February 1564–8 January 1642
Johannes Kepler, 27 December 1571–15 November 1630
45/97
Aristotle
384 BC–Mar 7, 322 BC
Galileo Galilei
Feb 15, 1564–Jan 8, 1642
Isaac Newton
Dec 25, 1642–Mar 20, 1727
Max Tegmark
May 5, 1967–
46/97
Steps of an Empirical Study
1. Objective of the study
2. Conception
3. Data collection techniques
4. Practical considerations
5. Data analysis techniques
6. Application of the results
47/97
Conception of an Experiment
Objective: test a hypothesis
Definitions
– Changes to the independent variables are called treatments
– A single trial is a case
– An experiment is made of several trials
– Experimental objects are the objects to which the treatments are applied
– People applying the treatments are the subjects of the experiment
• Participants!
– Control objects are objects on which the treatments are not applied
and against which the results of the treatments are compared
– Dependent variables are expected to change because of the treatments
– Independent variables impact the treatments and the results of the
experiment indirectly but with a clear relation
48/97
Conception of an Experiment
Structured method to determine the relation
between independent variables and some
dependent variables to test some
hypotheses
49/97
Conception of an Experiment
Ex.: testing a program
– Function f with 4 parameters
• A, three possible values
• B, three possible values
• C, three possible values
• D, three possible values
A × B × C × D = 3 × 3 × 3 × 3 = 81 combinations to test
Conception of the experiment to test f?
50/97
Conception of an Experiment
f is a printing function
                  Level 1   Level 2   Level 3
A: # originals    1         11        51
B: duplex         1 to 1    1 to 2    2 to 2
C: output         None      Yes       Staples
D: interruption   None      Panel     Tray
51/97
Conception of an Experiment
Creation of a test plan
Test   A    B        C         D
1      1    1 to 1   None      None
2      1    1 to 2   Yes       Panel
3      1    2 to 2   Staples   Tray
4      11   1 to 1   Yes       Tray
5      11   1 to 2   Staples   None
6      11   2 to 2   None      Panel
7      51   1 to 1   Staples   Panel
8      51   1 to 2   None      Tray
9      51   2 to 2   Yes       None
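A quick way to see why 9 tests can stand in for the 81 combinations is to check that the plan above contains every pair of levels for every pair of parameters. The sketch below does that check; the function names `full_factorial` and `covers_all_pairs` are illustrative and not part of the course material.

```python
from itertools import combinations, product

# Levels copied from the printing-function slide.
LEVELS = {
    "A": ["1", "11", "51"],
    "B": ["1 to 1", "1 to 2", "2 to 2"],
    "C": ["None", "Yes", "Staples"],
    "D": ["None", "Panel", "Tray"],
}

# The 9-test plan from the slide, one tuple (A, B, C, D) per test.
PLAN = [
    ("1", "1 to 1", "None", "None"),
    ("1", "1 to 2", "Yes", "Panel"),
    ("1", "2 to 2", "Staples", "Tray"),
    ("11", "1 to 1", "Yes", "Tray"),
    ("11", "1 to 2", "Staples", "None"),
    ("11", "2 to 2", "None", "Panel"),
    ("51", "1 to 1", "Staples", "Panel"),
    ("51", "1 to 2", "None", "Tray"),
    ("51", "2 to 2", "Yes", "None"),
]


def full_factorial(levels):
    """Enumerate every combination of levels (the 'complete' plan)."""
    return list(product(*levels.values()))


def covers_all_pairs(plan, levels):
    """True if, for each pair of parameters, every pair of levels is tested."""
    names = list(levels)
    for i, j in combinations(range(len(names)), 2):
        wanted = set(product(levels[names[i]], levels[names[j]]))
        seen = {(test[i], test[j]) for test in plan}
        if seen != wanted:
            return False
    return True


print(len(full_factorial(LEVELS)))     # size of the complete plan
print(covers_all_pairs(PLAN, LEVELS))  # pairwise coverage of the 9 tests
```

The plan trades exhaustive coverage for pairwise coverage: interactions among three or more parameters are not all exercised.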
52/97
Conception of an Experiment
How to reduce the 81
combinations to 9 tests
– Madhav S. Phadke ; Quality Engineering Using
Robust Design ; Prentice Hall, November, 1989
– Genichi Taguchi ; System of Experimental
Design ; Don Clausing, editor ; UNIPUB/Krass
International Publications, Volume 1–2, 1987
53/97
Conception of an Experiment
How to reduce the number of combinations?
– OFAT (one-factor-at-a-time)
– “Complete” (full) factorial experimental plan
• levels^variables: 2^k, 3^l…, e.g., 3^3
– “Fractional” factorial experimental
plan
• Part of the complete plan
• Choice of interesting tests
• Danger of confounding
54/97
Conception of an Experiment
Conception according to the objectives
– To test a hypothesis
– But for which objective?
• Comparison
• Selection
• Response
• Mixing
• Regression
55/97
Conception of an Experiment
Comparison
– One or several variables
– One important variable
– Importance of this variable?
56/97
Conception of an Experiment
Selection
– Several variables
– What are the most important variables?
57/97
Conception of an Experiment
Response (Response Surface Method)
– Several variables
– What is the shape of the response surface?
• To optimise a process
• To identify weak points
• To improve the reliability of a process
58/97
Conception of an Experiment
Mixing
– Several variables
• The variables are proportions of a mix
– What are the best proportions?
59/97
Conception of an Experiment
Regression
– Several continuous variables
– What is the mathematical relation between
independent and dependent variables?
– What are the best parameters of the
mathematical model?
60/97
Conception of an Experiment
Preparation
– Prepare the subjects to apply the treatments
• Training
• Written procedure
Run
– Application of the treatments to the objects by
the subjects (participants) following the
experimental plan
61/97
Conception of an Experiment
Success factors
– Have “good” objectives
• Not too few or too many variables
• Not to measure incorrectly the variables
– Quantitative measures
• Boolean
• Variations
62/97
Conception of an Experiment
Success factors
– Replication to lessen the impact of mitigating
variables
Signal/noise ratio ∆/σ and number of trials
∆/σ    Trials
1.0    64
1.4    32
2.0    16
2.8    8
– Randomisation of the trials
• Ex.: “Developers work harder on…”
63/97
Conception of an Experiment
Success factors
– Filtering out known variations
• Homogeneous blocks
– Folding of the effects
• When the combination of two variables has a
significant impact on the results
– Iterative runs of the experiments
• Selection, then response/regression…
– Confirmation of the results
• Confidence, uncertainty
64/97
Conception of an Experiment
Ex.: one variable, two treatments
– New development process
– Quality of the programs with this new process
– Independent variable: development process
– Dependent variable: number of faults post-release
65/97
Conception of an Experiment
Ex.: one variable, two treatments
– Conception of a randomised experiment
– Let
• $\mu_i$ be the average of the dependent variable for
treatment $i$
• $y_{ij}$ be the $j$-th measure of the dependent variable for
treatment $i$
• $n$ be the number of measures for $i = 1$ and $m$ for $i = 2$
– Hypotheses
• $H_0$: $\mu_1 = \mu_2$
• $H_1$: $\mu_1 \neq \mu_2$
66/97
Conception of an Experiment
Ex.: one variable, two treatments
– Analyses
– t-test
• Reject when p ≤ α, with α = 0.05
• Rejection of the null hypothesis in favour of the alternative hypothesis
– Mann-Whitney
67/97
Conception of an Experiment
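Ex.: one variable, two treatments
– t-test
• α = 0.05
• Rejection of the null hypothesis in favour of the alternative hypothesis
• Rejection if $|t_0| > t_{\alpha/2,\, n+m-2}$
• With $t_0 = \dfrac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\frac{1}{n} + \frac{1}{m}}}$
• With $S_p^2 = \dfrac{(n-1) S_1^2 + (m-1) S_2^2}{n + m - 2}$
(where $S_1^2$ and $S_2^2$ are the variances of the samples)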
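The two-treatment t-test can be sketched in a few lines, assuming the classical pooled-variance form of the statistic (equal variances in both groups). The fault counts below are invented for illustration only, and the critical value 2.306 is the two-sided value for α = 0.05 and 8 degrees of freedom read from a standard t table.

```python
import math
from statistics import mean, variance


def pooled_t_test(sample_1, sample_2):
    """Return (t0, degrees of freedom) for a pooled two-sample t-test."""
    n, m = len(sample_1), len(sample_2)
    # Pooled variance: weighted average of the two sample variances.
    pooled_variance = (
        (n - 1) * variance(sample_1) + (m - 1) * variance(sample_2)
    ) / (n + m - 2)
    t0 = (mean(sample_1) - mean(sample_2)) / math.sqrt(
        pooled_variance * (1 / n + 1 / m)
    )
    return t0, n + m - 2


# Hypothetical post-release fault counts (not data from the course).
old_process = [12, 15, 11, 14, 13]
new_process = [9, 8, 11, 7, 10]

t0, dof = pooled_t_test(old_process, new_process)
T_CRITICAL = 2.306  # t table, alpha = 0.05 two-sided, 8 degrees of freedom
print(f"t0 = {t0:.2f}, dof = {dof}, reject H0: {abs(t0) > T_CRITICAL}")
```

With real data, the normality and equal-variance assumptions should be checked first; when they do not hold, a rank-based test such as Mann-Whitney is the usual fallback.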
68/97
Conception of an Experiment
Ex.: one variable, two treatments
– Mann-Whitney
• If the assumptions of the t-test are not valid
• Order the data in the samples
• Compute $U$
• With $T$ the sum of the ranks of the smaller sample
• Reject the null hypothesis if $U$ is lower
than the value in a table (available on-line at
http://fsweb.berry.edu/academic/education/vbissonnette/tables/mwu.pdf)
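The ranking step of Mann-Whitney can be sketched as follows. This assumes one common textbook form of the statistic, U1 = n1·n2 + n1(n1+1)/2 − R1 with R1 the rank sum of the first sample, and reports the smaller of U1 and U2; tied values receive the average of their ranks. The data are hypothetical.

```python
def rank_sums(sample_1, sample_2):
    """Return the rank sums of both samples, using average ranks for ties."""
    pooled = sorted(
        [(value, 0) for value in sample_1] + [(value, 1) for value in sample_2]
    )
    sums = [0.0, 0.0]
    i = 0
    while i < len(pooled):
        j = i
        # Find the run of tied values starting at position i.
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for _, group in pooled[i:j]:
            sums[group] += average_rank
        i = j
    return sums


def mann_whitney_u(sample_1, sample_2):
    """Return min(U1, U2), to be compared against a critical-value table."""
    n1, n2 = len(sample_1), len(sample_2)
    r1, _ = rank_sums(sample_1, sample_2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return min(u1, u2)


# Hypothetical post-release fault counts (not data from the course).
old_process = [12, 15, 11, 14, 13]
new_process = [9, 8, 11, 7, 10]
print(mann_whitney_u(old_process, new_process))
```

The null hypothesis is rejected when the returned value is lower than the tabulated critical value for the two sample sizes.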
69/97
Steps of an Empirical Study
1. Objective of the study
2. Conception
3. Data collection techniques
4. Practical considerations
5. Data analysis techniques
6. Application of the results
70/97
Data Collection Techniques
Participants’ selection
– Participants apply the treatments to the objects
– Simple random sampling
• Participants chosen randomly from a list
– Systematic sampling
• First participant chosen randomly
• Next participants are every n-th member of the list
71/97
Data Collection Techniques
Participants’ selection
– Stratified random sampling
• The population is divided in strata
• Each stratum has a known distribution
• Simple random sampling in each stratum
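The three random selection schemes (simple, systematic, stratified) can be sketched with the standard `random` module. The participant list, the strata, and the sampling fraction below are all hypothetical; proportional allocation per stratum is one possible choice among several.

```python
import random


def simple_random_sample(population, k, rng):
    """Pick k participants uniformly at random from the list."""
    return rng.sample(population, k)


def systematic_sample(population, step, rng):
    """Pick a random start, then every step-th member of the list."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(strata, fraction, rng):
    """Simple random sampling inside each stratum, proportional to its size."""
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(42)  # fixed seed so the selection can be reproduced
developers = [f"dev{i:03d}" for i in range(100)]  # hypothetical list
strata = {"junior": developers[:60], "senior": developers[60:]}

print(simple_random_sample(developers, 10, rng))
print(systematic_sample(developers, 10, rng))
print(stratified_sample(strata, 0.1, rng))
```

Recording the seed alongside the study data keeps the participants' selection reproducible.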
72/97
Data Collection Techniques
Participants’ selection
– Method of quotas
• Participants are chosen in different categories of the
overall population
• Socio-demographic criteria, ex.: sex, age,
employment, province…
• Fast because participants can be “exchanged” from
within a same category
• Impossible to compute a margin of error but close to
random sampling
73/97
Data Collection Techniques
Participants’ selection
– Convenience sampling
• The closest and most easily accessible participants
are invited to participate
74/97
Data Collection Techniques
Margin of error
– Elections in the USA in 2004, survey of October
2nd 2004 by Newsweek
• Kerry: 47%
• Bush: 45%
• Nader: 2%
using a simple random sampling of 1,013 people
75/97
Data Collection Techniques
Margin of error
– Standard deviation for Kerry, p = 0.47
and N = 1,013
Standard deviation = $\sqrt{p(1-p)/N} = \sqrt{0.47 \times 0.53 / 1013}$
≈ 0.016
± 1 s.d. ⇒ confidence interval of 68%
± 2 s.d. ⇒ confidence interval of 95%
± 2.58 s.d. ⇒ confidence interval of 99%
76/97
Data Collection Techniques
Margin of error
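– Margin of error at 99% computed for p = 0.5 is
$2.58 \times \sqrt{0.5 \times 0.5 / 1013} \approx 0.041$, i.e., about ±4.1%
– This is the upper bound of the margin of error: the interval is widest
when p = 0.5 and narrows as p moves away from 0.5 (in either direction)
– Margin of error changes with p!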
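The margin-of-error arithmetic for the Newsweek survey can be reproduced with the usual formula for a proportion under simple random sampling, sqrt(p(1 − p)/N), multiplied by the number of standard deviations for the chosen confidence level. The function names are illustrative.

```python
import math


def standard_error(p, n):
    """Standard deviation of a sample proportion p for a sample of size n."""
    return math.sqrt(p * (1 - p) / n)


def margin_of_error(p, n, z):
    """Half-width of the confidence interval at z standard deviations."""
    return z * standard_error(p, n)


N = 1013  # sample size of the survey

# Kerry, p = 0.47: one standard deviation.
print(round(standard_error(0.47, N), 4))

# Worst case p = 0.5 at 99% confidence (2.58 standard deviations).
print(round(margin_of_error(0.5, N, 2.58), 4))

# The margin shrinks as p moves away from 0.5, e.g. for Nader at p = 0.02.
print(round(margin_of_error(0.02, N, 2.58), 4))
```

Quoting the margin computed at p = 0.5 is conservative: it is the largest value the margin can take for a given sample size.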
77/97
Data Collection Techniques
Data format
– Choose simple textual formats over complex
binary formats…
– CSV (and ARFF)
– XML
– Binaries (Excel and others…)
78/97
Data Collection Techniques
CSV
– Comma-Separated Values
– De facto industry standard (Microsoft Excel)
– Data format: ASCII (Unicode possible)
– Data stored by lines, separated by commas, with
possibly quotes
79/97
Data Collection Techniques
CSV
– Semantics of the data encoded outside of the
CSV file, dependent on the analysis software
John,Doe,120 jefferson st.,Ritoide, NJ, 08075
Jack,McGinnis,220 hobo Av.,Phila, PA,09119
"John ""Da Man""",Repici,120 Jefferson St.,Ritoide, NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, 91234
,Blankman,,SomeTown, SD, 00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123
80/97
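Because a CSV file carries no semantics, the analysis script must supply the column names itself. The sketch below parses three of the sample lines with Python's standard `csv` module; the column names in `COLUMNS` are an assumption made for illustration, since the file does not state them.

```python
import csv
import io

# Three records copied from the CSV example, including quoted fields.
RAW = '''John,Doe,120 jefferson st.,Ritoide, NJ, 08075
"John ""Da Man""",Repici,120 Jefferson St.,Ritoide, NJ,08075
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123
'''

# Assumed column names: the semantics live outside the CSV file.
COLUMNS = ["first", "last", "street", "city", "state", "zip"]

rows = [
    dict(zip(COLUMNS, (field.strip() for field in record)))
    for record in csv.reader(io.StringIO(RAW))
]

for row in rows:
    print(row["first"], "|", row["street"], "|", row["zip"])
```

Keeping the zip code as text preserves its leading zeros, which a numeric conversion would silently drop.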
Data Collection Techniques
ARFF
– Attribute-Relation File Format
– Similar to CSV but with headers
@RELATION Metrics-Roles
@ATTRIBUTE CBO REAL
@ATTRIBUTE DIT REAL
@ATTRIBUTE NOC REAL
@ATTRIBUTE WMC REAL
@ATTRIBUTE Roles {0,1}
@DATA
0.0,0.0,5.0,1.0,0
0.0,0.0,9.0,0.0,0
0.0,0.0,0.0,1.0,1
81/97
Data Collection Techniques
XML
– eXtensible Markup Language
– Fashionable but verbose
– Data format: Unicode
– Data recorded in a tree
– Syntax defined in a DTD file
(Document Type Definition)
– Semantics encoded in the tag names
82/97
Data Collection Techniques
XML
<designPatterns>
<program type="Java">
<name>1 - QuickUML 2001</name>
<designPattern name="Abstract Factory">
<microArchitectures>
<microArchitecture number="1">
<roles>
<clients>
<client><entity>diagram.DefaultFigureEditor</entity></cli
<client><entity>diagram.DiagramUI</entity></client>
</clients>
<abstractProducts>
<abstractProduct><entity>diagram.FigureRenderer</entity><
</abstractProducts>
<products>
<product><entity>diagram.DefaultFigureRenderer</entity></
<product><entity>uml.diagram.ClassRenderer</entity></prod
83/97
Data Collection Techniques
Binaries
– Often proprietary
– Impossible to read without a tool
– Require precise definition of their format
– Small footprint
– « Safe »
84/97
Data Collection Techniques
Binaries
(Raw content of a binary spreadsheet file opened as text: unreadable, apart from a few
recognisable fragments such as the author name “Yann-Gaël Guéhéneuc”, the font
name “Arial”, and number formats like “#,##0.00”)
85/97
Steps of an Empirical Study
1. Objective of the study
2. Conception
3. Data collection techniques
4. Practical considerations
5. Data analysis techniques
6. Application of the results
86/97
Practical Considerations
Analyses
– Validate the measures taken during the experiments
– Analyse the results using known statistical tests and tools
Diffusion and decision making
– Conclusions must be explained
– Conclusions must be reproducible
– Three usages of the results
• Apply suggested changes
• Apply changes at a bigger scale
• Conduct further experiments for more precise results
87/97
Practical Considerations
Types of errors
– Experimental errors
– Observation errors
– Measure errors
– Experimental resources variations
– Impact of combined variables overlooked
Mitigation
– Replication of the experiment rather than the measure
– Application of a random method to avoid bias
88/97
Practical Considerations
Four main types of threats to validity exist
– Threats to construct validity concern “whether the theoretical
constructs are interpreted and measured correctly”
– Threats to internal validity concern the causal
relationship that can cause the observed effects
– Threats to external validity pertain to the generalisability
of the results
– Threats to conclusion validity are due to the choices of the
analyses performed on the collected data to test the
hypotheses
89/97
Practical Considerations
Four main types of threats to validity exist
– Threats to construct validity concern “whether the theoretical
constructs are interpreted and measured correctly”
– Threats to internal validity concern the causal
relationship that can cause the observed effects
– Threats to external validity pertain to the generalisability
of the results
– Threats to conclusion validity are due to the choices of the
analyses performed on the collected data to test the
hypotheses
Very important
90/97
Steps of an Empirical Study
1. Objective of the study
2. Conception
3. Data collection techniques
4. Practical considerations
5. Data analysis techniques
6. Application of the results
91/97
Data Analysis Techniques
Experiments ⇒ Data
– Number of faults per reviews…
– Values of dozens of metrics…
Correlations
Rules (see also next course)
Mario Piattini, Coral Calero, Houari Sahraoui, and Hakim Lounis ; Object-relational
database metrics ; Revue L’Objet, Hermès Sciences, March, 2001
92/97
Example of Object
-- Composite types are declared first so that the tables can use them.
CREATE TYPE address AS(
  street CHAR(30),
  city CHAR(20),
  state CHAR(2),
  zip INTEGER);
CREATE TYPE location AS(
  building CHAR(4),
  office CHAR(4),
  "table" CHAR(4));  -- quoted because TABLE is a reserved word
CREATE TABLE subs(
  idsubs INTEGER,
  name VARCHAR(20),
  subs_add address,
  PRIMARY KEY (idsubs));
CREATE TABLE dep(
  iddep INTEGER,
  name VARCHAR(20),
  dep_loc location,
  budget DECIMAL (8,2),
  PRIMARY KEY (iddep));
-- Association table between subs and dep.
CREATE TABLE subs_dep(
  idsubs INTEGER,
  iddep INTEGER,
  PRIMARY KEY (idsubs,iddep),
  FOREIGN KEY (idsubs) REFERENCES subs(idsubs),
  FOREIGN KEY (iddep) REFERENCES dep(iddep));
CREATE TABLE employee(
  idemp INTEGER,
  name VARCHAR(40),
  emp_date date,
  emp_loc location,
  emp_add address,
  manager INTEGER,
  dep INTEGER,
  PRIMARY KEY (idemp),
  FOREIGN KEY (manager) REFERENCES employee(idemp),
  FOREIGN KEY (dep) REFERENCES dep(iddep));
93/97
Examples of Metrics
TS = TSSC + TSCC
– TSSC: size of simple columns ≈ NSA
– TSCC: size of complex columns ≈ NCC
– NSA: number of simple attributes
– NCC: number of complex attributes
RD: number of foreign keys (reference degree)
DRT: depth of the relational tree of a table (the longest chain of
references between this table and any other)
PCC: percentage of complex columns
NIC: number of classes with complex attributes
NSC: number of classes shared between this table and
other tables
94/97
Examples of Metric Values
       SUBS    DEP     SUBS_DEP   EMPLOYEE
TS     4       4.5     2          8.5
RD     0       0       2          2
DRT    0       0       1          2
PCC    33%     25%     0%         28.57%
NIC    1       1       0          2
NSC    1       1       0          2
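Two of these metrics follow directly from the table definitions: PCC is the share of columns whose type is a user-defined (complex) type, and RD is the number of foreign keys. The sketch below recomputes both from a hand-transcribed summary of the schema; the `TABLES` structure is an illustrative encoding, not a format used in the course.

```python
# User-defined composite types in the example schema.
COMPLEX_TYPES = {"address", "location"}

# Column types and foreign-key counts transcribed from the CREATE TABLE statements.
TABLES = {
    "SUBS": (["INTEGER", "VARCHAR", "address"], 0),
    "DEP": (["INTEGER", "VARCHAR", "location", "DECIMAL"], 0),
    "SUBS_DEP": (["INTEGER", "INTEGER"], 2),
    "EMPLOYEE": (
        ["INTEGER", "VARCHAR", "date", "location", "address", "INTEGER", "INTEGER"],
        2,
    ),
}


def pcc(column_types):
    """Percentage of columns whose type is a complex (user-defined) type."""
    complex_count = sum(1 for t in column_types if t in COMPLEX_TYPES)
    return 100 * complex_count / len(column_types)


def metrics(tables):
    """Return {table: (PCC rounded to 2 decimals, RD)} for every table."""
    return {
        name: (round(pcc(types), 2), foreign_keys)
        for name, (types, foreign_keys) in tables.items()
    }


for name, (pcc_value, rd_value) in metrics(TABLES).items():
    print(f"{name}: PCC = {pcc_value}%, RD = {rd_value}")
```

TS is left out because it depends on the sizes assigned to complex columns, which the metric definitions only approximate.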
95/97
Example of Experiment
Estimate the maintainability of object-relational databases using these metrics
What experiment(s)?
96/97
Example of Experiment
Participants
– Two groups of different participants
– Choice of the participants
• Convenience sampling
• 4 people in DIRO
• 4 people in a Spanish university
97/97
Example of Experiment
Objects
– 5 object-relational databases
Database   Number of tables   Average attributes/table   Average complex attributes/table
Airlines   6                  4.16                       1.83
Animals    10                 2.7                        0.6
Library    12                 2.91                       0.75
Movies     9                  4.33                       0.88
Treebase   13                 3.46                       0.86
98/97
Example of Experiment
Independent variables
– Metrics
(Participants’ knowledge?)
(Databases complexity?)
Dependent variable: comprehension
– 1 if 10 of the 12 metric values are correctly
evaluated by the participants
– 0 otherwise
99/97
Example of Experiment
Treatments
– Computation of TS, DRT, and RD
– For each table in a random order
– Time limited to 2 minutes per metric and table
100/97
RoC Quantitative Analysis
Rules
– Bayesian classifier
– Conditional probability
– Exclusive and exhaustive classes
– Independent attributes
– Possibility to have unknown values
– 10 applications of RoC on 50 examples obtained
from the 50 tables of the 5 databases
101/97
RoC Quantitative Analysis
Results of the classification with RoC
               Spain     DIRO
Correct        407       369
Incorrect      93        131
Unclassified   0         0
Precision      81.4 %    73.8 %
Coverage       100.0 %   100.0 %
102/97
RoC Quantitative Analysis
TS            0       1
(1 - 3)       0.136   0.543
(3 - 5)       0.193   0.233
(5 - 10)      0.336   0.129
(10 - 17.5)   0.336   0.095

DRT           0       1
0             0.336   0.336
1             0.221   0.371
2             0.193   0.233
3             0.250   0.060

RD            0       1
0             0.319   0.316
1             0.319   0.247
2             0.148   0.316
3             0.09    0.040
4             0.062   0.040
5             0.062   0.040

PCC           0       1
(0 - 25)      0.471   0.603
(25 - 80)     0.529   0.397

NIC           0       1
0             0.257   0.517
1             0.114   0.241
2             0.229   0.069
3             0.143   0.034
4             0.143   0.069
5             0.057   0.034
6             0.057   0.034
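These tables give, for each class (0 or 1), the probability of each attribute value. A Bayesian classifier with independent attributes multiplies them to score a table. The sketch below uses only the TS and DRT tables and assumes equal prior probabilities for both classes; the priors are not given in the slides, so this is an illustration of the principle rather than a reimplementation of RoC.

```python
# P(value | class) copied from the TS and DRT tables: (class 0, class 1).
TS = {
    "(1 - 3)": (0.136, 0.543),
    "(3 - 5)": (0.193, 0.233),
    "(5 - 10)": (0.336, 0.129),
    "(10 - 17.5)": (0.336, 0.095),
}
DRT = {
    0: (0.336, 0.336),
    1: (0.221, 0.371),
    2: (0.193, 0.233),
    3: (0.250, 0.060),
}

# Assumption: both classes are equally likely a priori.
PRIORS = (0.5, 0.5)


def posterior(ts_bin, drt, priors=PRIORS):
    """Normalised class probabilities, treating attributes as independent."""
    scores = [priors[c] * TS[ts_bin][c] * DRT[drt][c] for c in (0, 1)]
    total = sum(scores)
    return [score / total for score in scores]


def classify(ts_bin, drt):
    """Return the most probable class (0 or 1)."""
    probabilities = posterior(ts_bin, drt)
    return probabilities.index(max(probabilities))


print(classify("(1 - 3)", 1))      # small table, shallow reference tree
print(classify("(10 - 17.5)", 3))  # large table, deep reference tree
```

Adding RD, PCC, and NIC would mean multiplying in one more factor per attribute in `posterior`.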
103/97
C4.5 Quantitative Analysis
Rule 1: TS ≤ 9 ∧ DRT = 0 ∧ NSC = 0 ⇒ class 1 [84.1%]
Rule 2: TS ≤ 3 ∧ RD > 1 ⇒ class 1 [82.0%]
Rule 3: TS ≤ 9 ∧ DRT ≤ 2 ∧ NIC > 0 ∧ NSC = 0 ⇒ class 1 [82.0%]
Rule 4: TS > 9 ⇒ class 0 [82.2%]
Rule 5: DRT > 2 ⇒ class 0 [82.0%]
Default class: 0
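The C4.5 rules translate directly into a small decision function. The sketch below assumes the rules are tried in the order listed and that the first matching rule decides, with class 0 as the fallback; the sample inputs are the SUBS and SUBS_DEP rows of the metric values table.

```python
def classify(ts, drt, rd, nic, nsc):
    """Apply the five rules in order; fall back to the default class 0."""
    if ts <= 9 and drt == 0 and nsc == 0:  # Rule 1
        return 1
    if ts <= 3 and rd > 1:  # Rule 2
        return 1
    if ts <= 9 and drt <= 2 and nic > 0 and nsc == 0:  # Rule 3
        return 1
    if ts > 9:  # Rule 4
        return 0
    if drt > 2:  # Rule 5
        return 0
    return 0  # Default class


# Metric values of two tables from the example schema.
subs = dict(ts=4, drt=0, rd=0, nic=1, nsc=1)
subs_dep = dict(ts=2, drt=1, rd=2, nic=0, nsc=0)

print(classify(**subs))      # no rule matches, default class
print(classify(**subs_dep))  # matched by Rule 2
```

Unlike the Bayesian tables, each prediction here can be traced back to a single readable rule.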
104/97
C4.5 Quantitative Analysis
                           Predicted maintainability
                           0        1        Completeness
Actual maintainability 0   28       0        100%
Actual maintainability 1   3        19       86.36%
Correctness                90.32%   100%
Accuracy of 94%, DIRO data
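The percentages in this table can be recomputed from the four counts: completeness is taken per actual class (row), correctness per predicted class (column), and the overall figure is the share of correctly classified tables. The dictionary layout below is an illustrative encoding.

```python
# Counts from the table, keyed by (actual class, predicted class).
CONFUSION = {(0, 0): 28, (0, 1): 0, (1, 0): 3, (1, 1): 19}


def completeness(actual):
    """Share of tables of an actual class that were predicted as that class."""
    row_total = sum(CONFUSION[(actual, predicted)] for predicted in (0, 1))
    return CONFUSION[(actual, actual)] / row_total


def correctness(predicted):
    """Share of tables predicted as a class that really belong to it."""
    column_total = sum(CONFUSION[(actual, predicted)] for actual in (0, 1))
    return CONFUSION[(predicted, predicted)] / column_total


def accuracy():
    """Share of all tables that were classified correctly."""
    correct = CONFUSION[(0, 0)] + CONFUSION[(1, 1)]
    return correct / sum(CONFUSION.values())


for cls in (0, 1):
    print(f"class {cls}: completeness = {completeness(cls):.2%}, "
          f"correctness = {correctness(cls):.2%}")
print(f"accuracy = {accuracy():.0%}")
```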
105/97
Steps of an Empirical Study
1. Objective of the study
2. Conception
3. Data collection techniques
4. Practical considerations
5. Data analysis techniques
6. Application of the results
106/97
Application of the Results
New development process
Prediction of the maintainability
New hypotheses, new experiments…
107/97