Alain Casali
Christian Ernst
Industrial Problem
Given a supply chain (in micro-electronics), we want to
find links between some parameters’ values and values of a
specific attribute of the supply chain (the yield).
Positive (and/or negative) association rules are
not suitable in our context.
We use correlation tests because:
correlation is a statistically more significant measure;
the measure takes into account not only the presence but also the
absence of the items;
the measure is non-directional, and can thus highlight more
complex existing links than a “simple” implication.
Dexa'09 - Extracting Decision Correlation Rules
Outline
Preliminaries
Decision Correlation Rules
Contingency Vectors
LHS-χ2 algorithm
Experimental Analysis
Conclusion
Literal Set
A literal set $X\overline{Y}$ is composed of:
a positive part ($X$);
a negative part ($\overline{Y}$);
The variation of a literal set $X\overline{Y}$ encompasses all the
combinations that we can obtain from $XY$.
Ex: $Var(AB) = \{AB, A\overline{B}, \overline{A}B, \overline{A}\,\overline{B}\}$
The support of a literal set is the number of
transactions that contain its positive part and contain
no 1-item of its negative part.
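The two definitions above can be sketched in a few lines of Python. This is only an illustration: the names `RELATION`, `variation` and `support` are assumptions of the sketch, and the data are the items of the ten transactions used as the running example in these slides.

```python
from itertools import combinations

# Items of the ten transactions of the running example, indexed by Tid.
RELATION = {
    1: "BCF", 2: "BCF", 3: "BCE", 4: "F", 5: "BDF",
    6: "BF", 7: "BCF", 8: "AE", 9: "BCF", 10: "BF",
}

def variation(pattern):
    """Return every (positive part, negative part) split of the pattern."""
    items = sorted(pattern)
    splits = []
    for size in range(len(items), -1, -1):
        for positive in combinations(items, size):
            negative = tuple(i for i in items if i not in positive)
            splits.append((positive, negative))
    return splits

def support(positive, negative, relation=RELATION):
    """Count transactions holding all positive items and no negative item."""
    return sum(
        1
        for items in relation.values()
        if set(positive) <= set(items) and not set(negative) & set(items)
    )

# Print the support of each literal set in the variation of BF.
for positive, negative in variation("BF"):
    print(positive, negative, support(positive, negative))
```

The four supports printed for BF are the four cells of its contingency table.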
Correlation rule and χ2 (1)
Contingency table
Each cell of the contingency table (CT) of a
pattern X contains the support of all literal
sets $Y\overline{Z}$ related to its variation:

Tid   Item   Target
1     BCF    T1
2     BCF    T1
3     BCE    T1
4     F      T1
5     BDF    T2
6     BF
7     BCF
8     AE
9     BCF
10    BF

CT(BF)           B    $\overline{B}$    ∑ line
F                7    1                 8
$\overline{F}$   1    1                 2
∑ column         8    2                 10

Expected Value
Correlation rule and χ2 (2)
Computation of χ2 (Brin’97)
Makes the link between real support and
theoretical support (expected value)
$$\chi^2(X) = \sum_{Y\overline{Z} \in CT(X)} \frac{\left(Supp(Y\overline{Z}) - E(Y\overline{Z})\right)^2}{E(Y\overline{Z})}$$
⇒ χ2(BF) ≈ 1.67
Correlation rate
read from a table giving the percentile values of the χ2 distribution with a single
degree of freedom (existence of a bijection)
Correlation(BF) ≈ 85%
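As a minimal sketch of this computation, the code below applies the usual Pearson convention, which is an assumption here: the expected value of a cell is its line total times its column total divided by the number of transactions. The correlation rate is taken as the cumulative distribution of χ2 with one degree of freedom, which equals erf(√(χ2/2)). The function names are hypothetical.

```python
import math

def chi2(table):
    """Pearson chi-squared of a contingency table given as a list of rows."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Assumed convention: expected support under independence.
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

def correlation_rate(chi2_value):
    """CDF of the chi-squared distribution with one degree of freedom."""
    return math.erf(math.sqrt(chi2_value / 2))

ct_bf = [[7, 1], [1, 1]]  # rows: F, not F; columns: B, not B
value = chi2(ct_bf)
print(round(value, 2), round(correlation_rate(value), 2))
```

With the cells 7, 1, 1, 1 this convention gives roughly 1.41 and a rate near 76%, whereas the slide reports 1.67 and 85%, so the authors' exact convention may differ from the one assumed here.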
Related Constraints
Anti-monotone constraint (Cochran
criteria):
no cell of the CT may have a null
value;
at least p% of the CT’s cells must have
a support greater than or equal to
MinSup;
Monotone constraint
X is a valid correlation rule when:
χ2(X) ≥ MinCor
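Both tests are simple predicates over the cells of a contingency table. The sketch below is one possible encoding; the function names are hypothetical, and `p` is passed as a fraction (0.5 for 50%), which is a choice of this sketch and not something the slides specify.

```python
def satisfies_cochran(cells, min_sup, p):
    """Anti-monotone test on the cell supports of a contingency table.

    p is a fraction of the cells (an assumption of this sketch).
    """
    no_empty_cell = all(count > 0 for count in cells)
    frequent_cells = sum(count >= min_sup for count in cells)
    return no_empty_cell and frequent_cells >= p * len(cells)

def is_correlation_rule(chi2_value, min_cor):
    """Monotone test: the pattern is correlated enough."""
    return chi2_value >= min_cor

print(satisfies_cochran([7, 1, 1, 1], min_sup=1, p=1.0))
print(satisfies_cochran([7, 1, 1, 0], min_sup=1, p=0.5))
```

Because an empty cell stays empty in every superset of the pattern, a failed Cochran test allows the whole branch to be pruned.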
Browsing the search space
Levelwise algorithms are used to browse the search
space;
Levelwise algorithms are well suited when:
the relation is on the disk;
we have anti-monotone constraints.
Problem: memory requirement for the contingency tables at level i, with n = |I|:
$$O\left(2^{i+1} \cdot C_n^i\right)$$
Example with |I| = 1000

Level   Memory requirement
2       4 MB
3       2.5 GB
4       1.3 TB
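The figures in the table can be checked with a short computation. The sketch assumes the bound counts bytes, which is consistent with the orders of magnitude on the slide; the function name is hypothetical.

```python
import math

def levelwise_ct_memory(n_items, level):
    """Bound 2^(i+1) * C(n, i) on contingency-table storage at level i."""
    return 2 ** (level + 1) * math.comb(n_items, level)

for level in (2, 3, 4):
    print(level, levelwise_ct_memory(1000, level))
```

For |I| = 1000 this yields about 4.0e6, 2.7e9 and 1.3e12, in line with 4 MB, 2.5 GB (counted in binary units) and 1.3 TB.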
Lectic Order & Lectic Search (LS)
Goal: enumerate the combinations (powerset lattice) with a balanced tree
Start point: 2 vectors; the 1st one is empty, the 2nd one contains the list of the
items
Create 2 branches:
left: prune the last element of the 2nd vector (recursive call)
right: add the last element of the 2nd vector to the first (recursive call)
Stop: when the 2nd vector is empty, then output the 1st vector
(,ABC)
    left: (,AB)
        left: (,A)
            left: (,)
            right: (A,)
        right: (B,A)
            left: (B,)
            right: (AB,)
    right: (C,AB)
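The recursion described above fits in a few lines. This is a sketch with items held in plain strings; the function name `lectic_search` is an assumption of the example.

```python
def lectic_search(first, second):
    """Yield every subset of the items, following the two-branch recursion."""
    if not second:
        yield first  # stop: the 2nd vector is empty
        return
    rest, last = second[:-1], second[-1]
    yield from lectic_search(first, rest)         # left: prune the last element
    yield from lectic_search(last + first, rest)  # right: move it to the 1st vector

print(list(lectic_search("", "ABC")))
```

Each of the 2^|I| combinations is produced exactly once, and the recursion depth never exceeds the number of items.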
Decision Correlation Rules
We are interested in rules satisfying both
constraints:
χ2(X) ≥ MinCor
X contains 1 value of the target attribute
Problem:
there is no function f such that
χ2(X ∪ A) = f(χ2(X), supp(A))
Contingency Vector (1)
Equivalence class associated with a literal set
$[Y\overline{Z}] = \{\, i \in Tid(r) \mid Y \subseteq Tid(i) \text{ and } Z \cap Tid(i) = \emptyset \,\}$
Ex: $[B\overline{F}] = \{3\}$
Contingency Vector of a pattern X
Set of equivalence classes of the
variation of X
Ex: $CV(BF) = \{[\overline{B}\,\overline{F}], [\overline{B}F], [B\overline{F}], [BF]\}$
= {{8}, {4}, {3}, {1,2,5,6,7,9,10}}
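A direct, set-based reading of these definitions is sketched below on the running example. The names `RELATION`, `equivalence_class` and `contingency_vector` are hypothetical, and this naive version rescans the relation for every class.

```python
from itertools import product

# Items of the ten transactions of the running example, indexed by Tid.
RELATION = {
    1: "BCF", 2: "BCF", 3: "BCE", 4: "F", 5: "BDF",
    6: "BF", 7: "BCF", 8: "AE", 9: "BCF", 10: "BF",
}

def equivalence_class(present, absent, relation=RELATION):
    """Tids whose items include all of `present` and none of `absent`."""
    return {
        tid
        for tid, items in relation.items()
        if set(present) <= set(items) and not set(absent) & set(items)
    }

def contingency_vector(pattern, relation=RELATION):
    """Map each literal set of the variation to its class of Tids."""
    vector = {}
    for flags in product((False, True), repeat=len(pattern)):
        present = "".join(i for i, f in zip(pattern, flags) if f)
        absent = "".join(i for i, f in zip(pattern, flags) if not f)
        vector[(present, absent)] = equivalence_class(present, absent, relation)
    return vector

for literal_set, tids in contingency_vector("BF").items():
    print(literal_set, sorted(tids))
```

The classes are pairwise disjoint and cover all ten Tids, so their sizes are exactly the cells of CT(BF).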
Tid   Item   Target
1     BCF    T1
2     BCF    T1
3     BCE    T1
4     F      T1
5     BDF    T2
6     BF
7     BCF
8     AE
9     BCF
10    BF
Contingency Vector (2)
The contingency vector is a partition of the Tids
Recurrence relation:
$CV(X \cup A) = (CV(X) \cap [A]) \cup (CV(X) \cap [\overline{A}])$
In practice:
Additions in
binary logic
CV(B) + CV(F) = CV(BF)

Tid       1   2   3   4   5   6   7   8   9   10
CV(B)     1   1   1   0   1   1   1   0   1   1
CV(F)     1   1   0   1   1   1   1   0   1   1
CV(BF)    11  11  10  01  11  11  11  00  11  11
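One way to realise this step in code is to keep, for each Tid, a small integer whose bits record the presence of each item of the pattern. The sketch below reads the slide's "addition" as a shift followed by a bitwise OR, which is an interpretation; the helper names are hypothetical.

```python
# Items of the ten transactions of the running example, indexed by Tid.
RELATION = {
    1: "BCF", 2: "BCF", 3: "BCE", 4: "F", 5: "BDF",
    6: "BF", 7: "BCF", 8: "AE", 9: "BCF", 10: "BF",
}
TIDS = sorted(RELATION)

def item_vector(item):
    """Presence bit of a single item for every Tid."""
    return [1 if item in RELATION[tid] else 0 for tid in TIDS]

def extend(codes, bits):
    """Append one item's presence bit to each transaction's code."""
    return [(code << 1) | bit for code, bit in zip(codes, bits)]

cv_b = item_vector("B")
cv_bf = extend(cv_b, item_vector("F"))
print([format(code, "02b") for code in cv_bf])
```

Extending a pattern by one item costs a single pass over the Tids, with no access to the relation itself.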
Contingency Vector (3)
CV(B) + CV(F) =

Tid       1   2   3   4   5   6   7   8   9   10
CV(BF)    11  11  10  01  11  11  11  00  11  11

Computation of
the contingency
table
"Distribution" of the codes of CV(BF):

CT[BF]   $\overline{B}\,\overline{F}$   $\overline{B}F$   $B\overline{F}$   $BF$
         1                              1                 1                 7
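Counting how often each code occurs is enough to obtain the contingency table. A minimal sketch, with the codes of CV(BF) copied from the table above and a hypothetical function name:

```python
from collections import Counter

# Codes of CV(BF) for Tids 1 to 10; first bit is B, second bit is F.
CV_BF = [0b11, 0b11, 0b10, 0b01, 0b11, 0b11, 0b11, 0b00, 0b11, 0b11]

def contingency_table(codes, length):
    """Cell supports in code order 00, 01, 10, 11 for a 2-item pattern."""
    counts = Counter(codes)
    return [counts[cell] for cell in range(2 ** length)]

print(contingency_table(CV_BF, 2))
```

A code that never occurs gets a count of zero, which is exactly the situation the Cochran criteria reject.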
LHS-χ2 Algorithm
Modification of LS in order to include the contingency
vectors;
If we are on a node:
Call to the left branch: we do nothing;
Before calling the right branch:
Computation of the new contingency vector;
Test of the anti monotone constraints;
[Add current pattern to the positive border]
Test of the monotone constraints;
Computation of the χ2
If all tests are OK, then output the pattern and its χ2
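The steps above can be sketched as follows. This is an illustrative reading, not the authors' implementation: it omits the optional positive border, treats "contains one target value" as part of the final test, uses the independence-based expected value for χ2, takes `p` as a fraction, and runs on a small made-up relation `TOY`. All names are hypothetical.

```python
from collections import Counter

def chi2(codes, length):
    """Chi-squared of a pattern from its contingency vector (one code per Tid)."""
    n = len(codes)
    observed = Counter(codes)
    # Relative frequency of each item of the pattern (one bit per item).
    freq = [sum((c >> j) & 1 for c in codes) / n for j in range(length)]
    total = 0.0
    for cell in range(2 ** length):
        expected = n
        for j in range(length):
            expected *= freq[j] if (cell >> j) & 1 else 1 - freq[j]
        total += (observed[cell] - expected) ** 2 / expected
    return total

def lhs_chi2(relation, items, targets, min_sup, p, min_cor):
    tids = sorted(relation)
    bits = {i: [int(i in relation[t]) for t in tids] for i in items}
    rules = []

    def search(pattern, codes, remaining):
        if not remaining:
            return
        rest, last = remaining[:-1], remaining[-1]
        search(pattern, codes, rest)  # left branch: nothing to compute
        # Before the right branch: new contingency vector.
        new_codes = [(c << 1) | b for c, b in zip(codes, bits[last])]
        new_pattern = (last,) + pattern
        n_cells = 2 ** len(new_pattern)
        cells = Counter(new_codes)
        # Anti-monotone constraints: a failure prunes the right branch.
        if len(cells) < n_cells:
            return
        if sum(v >= min_sup for v in cells.values()) < p * n_cells:
            return
        # Monotone constraint, restricted to patterns with one target value.
        if sum(i in targets for i in new_pattern) == 1:
            value = chi2(new_codes, len(new_pattern))
            if value >= min_cor:
                rules.append((new_pattern, value))
        search(new_pattern, new_codes, rest)  # right branch

    search((), [0] * len(tids), list(items))
    return rules

# Made-up relation for illustration only; "t1" plays the target value.
TOY = {
    1: {"A", "t1"}, 2: {"A", "t1"}, 3: {"A", "t1"}, 4: {"A"},
    5: {"B"}, 6: {"B"}, 7: {"B", "t1"}, 8: {"A", "B"},
}
rules = lhs_chi2(TOY, ["A", "B", "t1"], {"t1"}, min_sup=1, p=1.0, min_cor=0.5)
for pattern, value in rules:
    print(pattern, round(value, 3))
```

Only the contingency vectors on the current recursion path are alive at any time, which is what keeps the memory footprint small compared with a levelwise traversal.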
Memory Requirements
What storage is needed?
Contingency vectors of the 1-items:
|I|*|r| bits
Current contingency vectors (including the previous
ones kept by the recursive calls):
|I|*|I|*|r| bits in theory
|I|*|r| bytes in practice, since patterns never exceed
a length of 8
Finally we need: |r|*(|I|+|I|/8) bytes
this result has to be compared with $O\left(2^{i+1} \cdot C_n^i\right)$
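The total can be evaluated directly. The sketch below plugs in the sizes reported for the STMicroelectronics file in the experimental slides (492 transactions, 3384 items); the function name is hypothetical.

```python
import math

def lhs_memory_bytes(n_items, n_rows):
    """Storage estimate |r| * (|I| + |I|/8) in bytes."""
    one_item_vectors = math.ceil(n_items * n_rows / 8)  # |I|*|r| bits
    current_vectors = n_items * n_rows                  # |I|*|r| bytes
    return one_item_vectors + current_vectors

print(lhs_memory_bytes(3384, 492))
```

This comes to under 2 MB, far below the levelwise bound even at low levels.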
Experimental Analysis (1)
Experiments are run on a PC with a 1.8 GHz processor
and 2 GB of RAM
Files are provided by 2 manufacturers
(STMicroelectronics and ATMEL)

                  STMicroelectronics   ATMEL
# transactions    492                  426
# items           3384                 1136
Experimental Analysis (2)
Conclusion
We have discovered new parameters having an
influence on the yield (more than 25% of them were not
previously known);
Response times are 30% to 70% better with LHS-χ2
than with a levelwise algorithm;
Perspectives:
Use of a “divide and conquer” strategy for better
performance;
“Cleaning” / transformation of the original data;
Generalization of the rules by integrating literal sets.