Identifying partial orders in sequences
(with Galois connections)
Gemma Casas Garriga
Data Description
• A sequence is an ordered list of sets of items:
<(I1) (I2) … (IM)>
For example,
< (a c) (d b) (e) (a) >
• We consider a set of sequences D to be analyzed.
Id
D = {s1, … , sN}
where each si is a
sequence.
1
2
3
< (a) (b) (c) (d) >
< (b) (c) (d) (a) >
< (b) (c) (a) (d) >
Basic Definitions
• A general partial order in D is an acyclic directed
graph, indicating an order between items (such an
hybrid episode).
A
B
C
D
•The support of a poset in D is the number of input
sequences that are compatible with it.
1
A
B
C
D
< (a) (b) (c) (d) >
2
< (b) (c) (d) (a) >
3
< (b) (c) (a) (d) >
Problem Formulation
• Goal: to identify partial orders and their support
(alternatively, whose support is over a minimum user-specified
threshold)
• Problem: redundancies and complexity ...
For ex. both P and P’ are compatible with
the same input sequences, but P is more
“informative” than P’
1
< (a) (b) (c) (d) >
2
< (b) (c) (d) (a) >
3
< (b) (c) (a) (d) >
P’
P
B
C
A
A
B
C
D
P’ P
Problem Formulation
• If P’ P we say that P is more specific than P’.
• Specificity relation is different from classical
inclusion of episodes.
||
A
,
B
,
C
,
D
||
A
B
C
D
Goal redefined: to identify the most specific posets
among those occurring in the same input seqs
(alternatively, with support over a minimum threshold).
Example
Input Seqs where it is
contained
||
1
2
3
B
C
D
,
A
||
1,2,3
A
B
< (a) (b) (c) (d) >
2,3
C
D
< (b) (c) (d) (a) >
A
1,3
D
< (b) (c) (a) (d) >
B
C
A
B
C
D
1
B
C
D
A
2
B
C
A
D
3
Motivation
• Ordering relationships are useful in many domains: web
mining, monitoring of processes, e-comerce ...
• The most specific episodes give a general view of D,
summarizing
all
the
input
sequences
without
redundancies.
Addressing the Problem
• Observation: Identifying such structures directly from the
data is a complex task (specificity relation).
• A proposal:
– Constructing partial orders out of their maximal paths.
– That is, finding those subsequences in D that will identify
maximal paths of the final partial orders.
A
B
Two max paths:
• <(b) (c) (a)>
C
D
• <(b) (c) (d)>
A proposal
||
B
C
B
D
,
A
Set of all seqs.
identifying max. paths:
A
<(a)>
D
<(b) (c) (d)>
C
A
<(b) (c) (a)>
D
B
A
||
<(a) (d)>
C
B
C
D
<(a) (b) (c) (d)>
<(b) (c) (d) (a)>
B
C
D
A
B
C
A
D
<(b) (c) (a) (d)>
What are these sequences?
Result 1
Theorem: sequences identifying maximal paths of the most specific
partial orders are closed sequential patterns.
• Closed sequential patterns (or closed sequences) are maximal
among those having the same number of occurences (support) in D.
1
< (a) (b) (c) (d) >
2
< (b) (c) (d) (a) >
3
< (b) (c) (a) (d) >
• < (a) (d) > is closed.
• < (b) (d) > is not cloased, since it
is contained in <(b) (c) (d)> that has
the same support.
• Some algorithms for minig closed seqs: CloSpan, BIDE, TSP ...
How to construct posets out of closed
sequences?
||
B
C
D
,
A
<(b) (c) (d)>
C
D
<(b) (c) (a)>
A
<(a) (d)>
D
B
A
<(a) (b) (c) (d)>
C
B
Closed Sequences
<(a)>
A
B
||
C
D
<(b) (c) (d) (a)>
<(b) (c) (a) (d)>
B
C
D
A
Some
closed
sequences may identify maximal
paths
of
different partial orders.
B
C
A
D
Grouping closed sequences
||
B
C
,
D
A
<(b) (c) (d)>
A
B
||
<(a)>
C
D
<(a) (d)>
A
D
B
<(a) (b) (c) (d)>
C
A
B
C
D
B
C
D
A
B
C
A
D
<(b) (c) (d) (a)>
<(b) (c) (a) (d)>
<(b) (c) (a)>
Result 2
• It is possible to characterize a closure operator
sets of sequences.
working on
• A closure operator satisfies the three basic closure axioms:
Monotonicity, Extensivity, and Idempotency.
• Broadly:
Given any set of sequences S, (S) returns the set of
maximal sequences that are present in the same input
sequences where S is contained.
1
< (a) (b) (c) (d) >
2
< (b) (c) (d) (a) >
3
< (b) (c) (a) (d) >
({< (b) (c) >}) = {<(a)>, <(b) (c) (d)> }
Result 2
• A set of sequences S is closed if it coincides with its closure:
(S) = S
Lemma: individually, sequences in a closed set S, are
closed sequential patterns.
1
({<(a)>, < (b) (c) (d)>}) = {<(a)>, <(b) (c) (d)> }
< (a) (b) (c) (d) >
2
< (b) (c) (d) (a) >
3
< (b) (c) (a) (d) >
Both <(a)> and <(b) (c) (d)> are
closed sequential patterns.
Result 2
Theorem: closed sets of sequences identify the maximal
paths of the same partial order.
Closed set of sequences
{<(a)> , <(b) (c) (d)> }
1
2
3
Partial Orders
||
B
C
< (a) (b) (c) (d) >
< (b) (c) (d) (a) > {<(b) (c) (a)> , <(b) (c) (d)>}
,
D
A
B
C
D
< (b) (c) (a) (d) >
A
{<(a) (d)> , <(b) (c) (d)>}
D
B
C
A
||
Lattice of Closed Sets of Sequences
Conclusions
• General partial orders in sequential data can be identified by:
– Mining closed sequential patterns and their support
(CloSpan, BIDE …).
– Grouping closed sequential patterns in closed sets of
sequences, according to operator .
– Getting final partial orders from those agrupations.
• This transformation represents an important algorithmic
simplification w.r.t. previous approaches of mining episodes
directly from the data.
© Copyright 2026 Paperzz