LEARNING PATTERN LANGUAGES FROM A SMALL
NUMBER OF HELPFULLY CHOSEN EXAMPLES
A Thesis
Submitted to the Faculty of Graduate Studies and Research
In Partial Fulfillment of the Requirements
For the Degree of
Master of Science
In
Computer Science
University of Regina
By
Zeinab Mazadi
Regina, Saskatchewan
August, 2013
Copyright 2013: Zeinab Mazadi
UNIVERSITY OF REGINA
FACULTY OF GRADUATE STUDIES AND RESEARCH
SUPERVISORY AND EXAMINING COMMITTEE
Zeinab Mazadi, candidate for the degree of Master of Science in Computer Science, has
presented a thesis titled, Learning Pattern Languages from a Small Number of
Helpfully Chosen Examples, in an oral examination held on August 26, 2013. The
following committee members have found the thesis acceptable in form and content, and
that the candidate demonstrated satisfactory knowledge of the subject material.
External Examiner:
Dr. Douglas Farenick,
Department of Mathematics & Statistics
Supervisor:
Dr. Sandra Zilles, Department of Computer Science
Committee Member:
Dr. Boting Yang, Department of Computer Science
Committee Member:
Dr. Robert Hilderman, Department of Computer Science
Chair of Defense:
Dr. Ronald Martin, Faculty of Education
Abstract
A pattern is a string containing variable symbols and constants. The language of
a pattern is the set of all strings obtained by replacing all variables in the pattern
with non-empty strings. Patterns and their languages were introduced by Angluin in
1980. Since that time, learning of pattern languages has been a topic of great interest
in the research area of computational learning theory, mainly because of its relevance
for many applications. Areas in which patterns are suitable for modelling data are
for example bioinformatics (e.g., when representing sets of amino acid sequences) or
text mining (e.g., for automated information extraction).
This thesis studies learning of pattern languages in the context of computational
learning theory. Computational learning theory studies various models of learning and
investigates the learnability of classes of languages using each learning model. Moreover, determining the number of data points (sample complexity) and the amount
of computational time (time complexity) required for learning a particular class of
languages in a particular model is a main goal in computational learning theory.
In this thesis we focus on learning classes of pattern languages in a model called
“learning from helpful examples” or “learning from teachers”. In particular, we are
interested in determining the worst case number of examples that need to be communicated between a teacher and a learner to identify the target language from all
other languages in the class.
The sample complexity measures we use are the teaching dimension in the classic
teaching model and the recursive teaching dimension in the cooperative teaching
model that was introduced later. We are interested in computing these parameters
for certain classes of pattern languages over various sizes of alphabets from which the
constant symbols can be taken. In particular, we aim to determine the effect of the
alphabet size on these parameters.
We study arbitrary patterns, regular patterns and one-variable patterns over three
types of alphabets, namely finite alphabets of size at least two, singleton alphabets
and infinite alphabets. Our results show that the alphabet size influences the sample complexity parameters in some cases but not in others.
Moreover, we demonstrate that more advanced models of teaching, like the recursive
teaching model, can be more sample-efficient than the classic teaching model when
learning certain kinds of pattern languages.
Acknowledgements
First and foremost, I offer my sincerest gratitude to my supervisor, Dr. Sandra Zilles,
who has supported me in completing my master's degree with patience, motivation, enthusiasm, and immense knowledge. I deeply appreciate her effort and advice in improving
my presentation and writing skills. I could not have imagined having a better advisor
and mentor for my graduate studies.
Besides my supervisor, I would like to thank Dr. Boting Yang and Dr. Robert
J. Hilderman for being on my committee and for the time and effort they put into
reading my thesis.
My sincere thanks also go to my family: my mother Hajar Mousavipour Shirazi,
who has always supported me in my life, and my brother Mahmoud Mazadi, who encouraged and motivated me to pursue my graduate studies in Canada.
Last but not least, I would like to thank the Faculty of Graduate Studies and
Research (Scholarship, Research Award) and the Department of Computer Science
(Teaching Assistantships) for the financial support which made this research possible.
Table of Contents

Abstract . . . . . . i
Acknowledgements . . . . . . iii
Table of Contents . . . . . . iv
List of Tables . . . . . . viii

1 Introduction . . . . . . 1
1.1 Learning Models . . . . . . 1
1.2 Pattern Languages . . . . . . 4
1.3 Contribution of This Thesis . . . . . . 6
1.4 Organization . . . . . . 7

2 Preliminaries and Background . . . . . . 9
2.1 Patterns and Pattern Languages . . . . . . 9
2.1.1 Interesting Special Classes of Pattern Languages . . . . . . 11
2.1.2 Learning Pattern Languages . . . . . . 13
2.2 Models of Teaching . . . . . . 16
2.3 Useful Tools . . . . . . 21

3 Finite Alphabets of Size at Least Two . . . . . . 24
3.1 Arbitrary Patterns . . . . . . 24
3.1.1 Teaching Dimension . . . . . . 25
3.1.2 Recursive Teaching Dimension . . . . . . 28
3.2 Regular Patterns . . . . . . 30
3.2.1 Teaching Dimension . . . . . . 31
3.2.2 Recursive Teaching Dimension . . . . . . 34
3.3 One-Variable Patterns . . . . . . 38
3.3.1 Teaching Dimension . . . . . . 39
3.3.2 Recursive Teaching Dimension . . . . . . 41

4 Alphabets of Size One . . . . . . 45
4.1 Arbitrary Patterns . . . . . . 45
4.1.1 Teaching Dimension . . . . . . 45
4.1.2 Recursive Teaching Dimension . . . . . . 46
4.2 Regular Patterns . . . . . . 54
4.2.1 Teaching Dimension . . . . . . 54
4.2.2 Recursive Teaching Dimension . . . . . . 56
4.3 One-Variable Patterns . . . . . . 57
4.3.1 Teaching Dimension . . . . . . 57
4.3.2 Recursive Teaching Dimension . . . . . . 57

5 Infinite Alphabets . . . . . . 59
5.1 Arbitrary Patterns . . . . . . 59
5.1.1 Teaching Dimension . . . . . . 59
5.1.2 Recursive Teaching Dimension . . . . . . 60
5.2 Regular Patterns . . . . . . 63
5.2.1 Teaching Dimension . . . . . . 64
5.2.2 Recursive Teaching Dimension . . . . . . 64
5.3 One-Variable Patterns . . . . . . 64
5.3.1 Teaching Dimension . . . . . . 64
5.3.2 Recursive Teaching Dimension . . . . . . 65

6 Conclusions . . . . . . 66
6.1 Arbitrary Patterns . . . . . . 67
6.2 Regular Patterns . . . . . . 67
6.3 One-Variable Patterns . . . . . . 68
6.4 Limitations and Open Problems . . . . . . 69

Bibliography . . . . . . 71
List of Tables

2.1 Class of all pattern languages ΠL over Σ = {a, b}. . . . . . . 17
2.2 Class L = {Li | i ∈ N} where L0 = ∅, Li = {wi} and (w1, w2, w3, . . .) is a repetition-free enumeration of Σ∗. . . . . . . 18
3.1 Class of all regular pattern languages (RΠL) over Σ = {a, b}. . . . . . . 33
3.2 Class of all regular pattern languages (RΠL) over Σ = {a, b}. The examples used in the recursive teaching protocol are marked in brackets. . . . . . . 38
3.3 Class of all one-variable pattern languages (1V ΠL) over Σ = {a, b}. The examples used in the recursive teaching protocol are marked in brackets. . . . . . . 44
4.1 Recursive teaching sets for languages generated by patterns of length 1 with respect to L≥1. . . . . . . 49
4.2 Recursive teaching sets for languages generated by patterns of length 2 with respect to L≥2. . . . . . . 49
4.3 Recursive teaching sets for languages generated by patterns of length 3 with respect to L≥3. . . . . . . 49
4.4 Recursive teaching sets for languages generated by patterns of length 4 with respect to L≥4. . . . . . . 50
4.5 Recursive teaching sets for languages generated by patterns of length 5 with respect to L≥5. . . . . . . 50
4.6 Recursive teaching sets for languages generated by patterns of length 6 with respect to L≥6. . . . . . . 50
4.7 Recursive teaching sets for languages generated by patterns of length 7 with respect to L≥7. . . . . . . 50
4.8 Recursive teaching sets for languages generated by patterns of length 8 with respect to L≥8. . . . . . . 51
4.9 Recursive teaching sets for languages generated by patterns of length 9 with respect to L≥9. . . . . . . 51
4.10 Recursive teaching sets for languages generated by patterns of length 10 with respect to L≥10. . . . . . . 51
4.11 Recursive teaching sets for languages generated by some patterns of length 9 with respect to L≥9. . . . . . . 53
6.1 An overview of teaching dimension and recursive teaching dimension for learning classes of languages generated by arbitrary patterns, regular patterns and one-variable patterns over finite and infinite alphabets. . . . . . . 69
Chapter 1
Introduction
Computational learning theory deals with theoretical guarantees on the runtime complexity, memory complexity and sample complexity of learning algorithms. Sample
complexity refers to the amount of training data needed by a learning algorithm to
complete its learning task successfully. In this thesis, we study the sample complexity
of learning pattern languages in two different formal models of teaching, i.e., models
in which the training examples are chosen by a helpful teacher.
1.1 Learning Models
In 1967, Gold introduced the model of language identification in the limit using
positive examples. In this model, in each time step, the learner is provided strings
from the target language (i.e., positive examples) and it has to generate a hypothesis
based on the information received so far. The learner is called successful if it keeps
generating the same hypothesis after some point and this hypothesis represents the
target language L, as long as the infinite input sequence contains all and only the
elements of L. Since the learner has to be able to cope with any such input sequence,
sample complexity is not a meaningful notion in this model.
By contrast, sample complexity has been studied in the model of learning from
randomly chosen examples (e.g., PAC-learning [30]), in models of learning from information requested by the learner (e.g., query learning [2]), and in models of learning
from helpful examples provided by the environment (e.g., teaching [8, 27]).
Valiant introduced the model of probably approximately correct learning, or PAC-learning [30]. In PAC-learning there is a probability distribution over the instance
space from which the examples are drawn at random. Here the examples are labelled,
i.e., each example is an instance w together with the label + if w belongs to the target
concept (e.g., a target language) and with the label − otherwise. Also, this model
introduces two parameters: δ, known as the “confidence”, and ε, known as the “accuracy”. A
concept is PAC-learned by a learning algorithm if and only if, for every distribution
over instances, the learning algorithm with parameters δ and ε will, with probability
at least 1 − δ, generate a hypothesis whose error compared to the target concept is
less than ε. The error is the probability that the distribution chooses
an instance on which the hypothesis and the target concept disagree.
Learning from queries was introduced by Angluin in 1988 [2]. It is based on the
existence of a truthful oracle that answers some particular type of queries about the
target language. The learner must generate its only hypothesis about the target
language after asking finitely many queries. The sample complexity of the query
learner is determined by the number of queries the learner asks to identify the target
language.
A teacher-oriented model of learning was introduced independently by Goldman
and Kearns [7] and Shinohara and Miyano [27]. Here the teacher aims to facilitate
and speed up the learning process by presenting helpful examples, in the form of a so-called teaching set, to the learner. The sample complexity is measured by a parameter
named teaching dimension. The teaching dimension for a class of languages is the
minimum number of labelled examples needed by the teacher to uniquely identify any
target language from the underlying class.
Consider for example the class L consisting of the following languages:
• L0 = {a, b, ab, bb},
• L1 = {a, b, ab},
• L2 = {a, b, bb},
• L3 = {a, ab, bb},
• L4 = {b, ab, bb}.
Then the language L1 can be uniquely identified from a single example, namely (bb, −),
because L1 is the only language in L not containing the word bb. Similarly, L2 , L3 ,
and L4 can each be taught using just one example; we say they each have a teaching set
of size 1. L0 , however, requires a teaching set of 4 examples, namely (a, +), (b, +),
(ab, +), (bb, +). Thus the teaching dimension of the class is 4.
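The teaching sets in this small example can be verified by exhaustive search. The following Python sketch (the helper names and the choice of instance space, the union of all words occurring in the class, are ours, not from the literature) finds a smallest teaching set for each language:

```python
from itertools import combinations

# The finite class from the example above.
classes = {
    "L0": {"a", "b", "ab", "bb"},
    "L1": {"a", "b", "ab"},
    "L2": {"a", "b", "bb"},
    "L3": {"a", "ab", "bb"},
    "L4": {"b", "ab", "bb"},
}
instances = sorted(set().union(*classes.values()))

def is_teaching_set(target, sample):
    # A sample (labelled according to the target) is a teaching set iff
    # no other language in the class agrees with the target on all of it.
    return all(
        any((w in lang) != (w in classes[target]) for w in sample)
        for name, lang in classes.items() if name != target
    )

def teaching_set_size(target):
    # Size of a smallest teaching set for the target language.
    for k in range(len(instances) + 1):
        if any(is_teaching_set(target, set(s)) for s in combinations(instances, k)):
            return k

sizes = {name: teaching_set_size(name) for name in classes}
print(sizes)                 # L1..L4 have teaching sets of size 1, L0 of size 4
print(max(sizes.values()))   # the teaching dimension of the class: 4
```

The worst case is L0: since every word of the instance space belongs to L0, any three positive examples leave some other language consistent, so all four are needed.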
This thesis studies the teaching complexity for the class of pattern languages and
some interesting subclasses.
Another computational interactive teaching and learning model, introduced by
Zilles, Lange, Holte and Zinkevich [31] is the recursive teaching model in which both
teacher and learner cooperate to reduce the teaching complexity. In order to minimize
the sample complexity, teacher and learner share some knowledge about choosing
teaching sets. This model improves on the sample complexity of the classic teaching
model. The sample complexity is determined by a measure called recursive teaching
dimension.
This thesis studies the sample complexity of learning pattern languages using the
recursive teaching protocol in comparison to the classic protocol. In particular, it
aims at comparing the teaching dimension to the recursive teaching dimension for
various classes of pattern languages.
1.2 Pattern Languages
Based on Angluin’s definition [1], a pattern is a non-empty finite string containing
constants and variable symbols. The language of a pattern is the set of words obtained
by replacing variable symbols of the pattern by non-empty words. Learning of pattern
languages has been a topic of great interest in the research area of computational
learning theory, mainly because of its relevance for many applications. Areas in
which patterns are suitable for modelling data are for example bioinformatics (e.g.,
when representing sets of amino acid sequences) or text mining (e.g., for automated
information extraction).
For example, consider the pattern
α = x1 x2 AAx3 x1
over an alphabet that consists of 20 symbols representing the 20 different amino acids.
In α, the symbols x1 , x2 , and x3 represent variables and the symbol A represents an
amino acid. The pattern α then corresponds to the set of all amino acid sequences
of length n ≥ 6 that start and end with the same substring and that contain the
substring “AA” somewhere within positions 3 through n − 2. As another example,
consider the pattern
author: x1 ; title: x2
over the alphabet
Σ = {a,b,. . . ,z,A,B,. . . ,Z,;,:} .
It describes all strings that start with the string “author:”, followed by a non-empty
string over Σ, followed by the string “; title:”, which is again followed by a non-empty
string over Σ.
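Since each variable in this pattern occurs only once, it can be sketched as a regular expression in which every variable becomes a non-empty capture group. The sample record below is hypothetical, and spacing is omitted because the alphabet Σ above contains no space symbol:

```python
import re

# The pattern "author:" x1 ";title:" x2, with x1 and x2 non-empty.
rx = re.compile(r"^author:(.+);title:(.+)$")

m = rx.match("author:DAngluin;title:FindingPatternsCommonToASetOfStrings")
print(m.group(1))   # the author substring matched by x1
print(m.group(2))   # the title substring matched by x2
```

Note that if the title itself contained the string ";title:", the match would be ambiguous; the regex engine simply commits to one decomposition.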
Angluin showed that the class of all pattern languages is learnable in the limit
from positive examples [1].
However, some research shows that the size of the underlying alphabet, from which
the constant symbols are taken, affects the learnability of the class of erasing
pattern languages (in which variables can be replaced with the empty string) in Gold’s
learning model [23]. Reidenbach showed that the class of all erasing pattern languages
over alphabets of size two is not learnable from positive examples in the limit [21].
The problem of learning non-erasing pattern languages in Angluin's model of query
learning has been studied in [26, 13, 2]. The authors of [5] studied learning of one-variable erasing
pattern languages (which are generated by patterns that contain only one variable,
possibly with repetitions) using both Gold's model of learning and Angluin's query
learning. They present a learning algorithm that improves on
the time complexity of Angluin's original algorithm [1]. Moreover, [5] showed that
their algorithm can learn one-variable patterns using a polynomial number of superset
queries in the query learning model.
Nessel and Lange [20] studied the problem of learning erasing pattern languages
using Angluin’s model of query learning. Additionally, they considered a model in
which the learner receives an initial string from the target language before asking
queries. This model of query learning, called learning from queries with additional
information, was introduced by Marron [18]. Moreover, [15, 17, 16] showed that
asking queries about languages that do not belong to the target class may decrease
the number of queries that the learner needs to identify the target language.
According to Shinohara [28], a pattern is called regular if every variable in the
pattern occurs at most once. It was shown in [28, 29] that the class of erasing regular pattern
languages is learnable in Gold's learning model.
To the best of my knowledge, the sample complexity of teaching pattern languages
has never been studied before.
1.3 Contribution of This Thesis
This thesis focuses on learning the class of non-erasing pattern languages introduced
by Angluin [1] from teachers, in particular on both the teaching dimension and the
recursive teaching dimension of (subclasses of) the class of such pattern languages.
Furthermore, this thesis investigates whether or not the alphabet size influences
these complexity parameters.
First we determine the teaching dimension and the recursive teaching dimension
for various classes of pattern languages over different types of alphabets.
We prove that, while each language of the underlying class has finite teaching
dimension, for some classes there is no upper bound on the teaching dimension. In
all cases for which we could determine both complexity parameters, the recursive
teaching dimension was smaller than the teaching dimension. This shows that, in
the case of learning pattern languages, the advanced model of recursive teaching
improves the sample complexity when compared to the classic teaching model.
Finally, our results show that the alphabet size has some impact on both the
teaching dimension and the recursive teaching dimension in some cases. Moreover,
we introduce the first example of a class of pattern languages that has infinite teaching
dimension but a recursive teaching dimension of 2.
1.4 Organization
Chapter 2 provides the reader with the relevant background and definitions on patterns and their languages. In particular, it presents some interesting subclasses of
pattern languages and gives the reader a brief history on learning pattern languages.
Furthermore, Chapter 2 discusses models of teaching, mainly the classic teaching protocol and the recursive teaching protocol, and presents some useful results that will
be of help in various proofs throughout this thesis.
In Chapter 3, we investigate learning of some interesting classes of pattern languages over finite alphabets of size at least two in the two aforementioned models of
teaching. In some cases, we show that a class of pattern languages can be taught with
fewer examples when using the recursive teaching protocol than when using the classic teaching protocol. In particular, we prove that there is no finite upper
bound on the number of examples needed for teaching with the classic protocol, for
the classes of languages generated by arbitrary patterns, regular patterns
and one-variable patterns over finite alphabets of size at least two. However, determining
the exact worst-case number of examples needed when teaching arbitrary pattern languages with
the recursive teaching protocol remains an open problem.
Since we are interested in investigating the impact of the alphabet size on the
learning complexity, Chapter 4 studies the learning of classes of languages generated
by arbitrary patterns, regular patterns and one-variable patterns over singleton alphabets. Again, we prove that the recursive teaching protocol reduces the number
of examples needed by the teacher to teach languages generated by regular patterns
and one-variable patterns. In contrast to the case of larger alphabets, we show that
there is a finite upper bound on the number of examples needed for classic teaching
of regular pattern languages. However, the teacher still needs more examples than
when using the recursive teaching protocol.
Chapter 5 investigates whether or not there is any change in the number of examples needed by the teacher to teach pattern languages over countably infinite alphabets when using either the classic teaching protocol or the recursive teaching protocol.
It turns out that even for infinite alphabets there is no finite upper bound on the number of examples needed in the classic teaching model for any of the considered classes.
However, infinite alphabets help us to determine the worst case number of examples
needed in the recursive teaching model, for the class of all pattern languages. How
many examples are needed for languages in this class using the recursive teaching
protocol over finite alphabets is still an open question.
In Chapter 6, we summarize our results on learning pattern languages using both
the classic teaching protocol and the recursive teaching protocol and we discuss some
remaining open problems.
Chapter 2
Preliminaries and Background
This chapter introduces the fundamental concepts dealt with in this thesis and provides basic related background knowledge as well as a few first results on teaching
pattern languages. The latter will be useful tools throughout this thesis.
2.1 Patterns and Pattern Languages
To define patterns and their languages, let us first introduce some necessary notation.
Let Σ be any countable set, called the alphabet. Elements of Σ are called constants. A word w over Σ is a finite sequence of symbols from Σ. The length of w is
denoted by |w|. The empty word is denoted by ε. Σ∗ is the set of all finite words
over Σ including ε. The set of all finite non-empty words over Σ is denoted by Σ+ .
A language L over Σ is a subset L ⊆ Σ∗ . The notation wk is used to denote the
word ww . . . w consisting of k copies of w. We fix a countable set X = {x1 , x2 , . . .} which is disjoint from Σ. Its
elements are called variables. X + denotes the set of all finite non-empty strings over
X.
Definition 2.1. [1] A pattern α is a finite non-empty string over Σ ∪ X. Π denotes
the set of all patterns over Σ.
Let α be a pattern. α is a constant-free pattern if α ∈ X + . As opposed to that,
α is called variable-free if α ∈ Σ+ . The length of α is denoted by |α| and equals
the number of symbols (constants and variables) composing the pattern, counting
repetition. For instance, let Σ = {a, b}. Then α = ax1 x2 ax1 b is a pattern over Σ and
|α| = 6. Note that we do not include any reference to Σ in the notation Π, as the
underlying alphabet will always be clear from the context.
Let α be a pattern and let |α| = m. Then, for all i with 1 ≤ i ≤ m, α[i] denotes
the symbol at position i in α. Moreover, for all i, j with 1 ≤ i ≤ j ≤ m, α[i : j]
denotes the sub-pattern of α which starts at position i and ends at position j, i.e.,
α[i : j] equals α[i] . . . α[j] if i < j, and α[i] otherwise.¹
V ars(α) denotes the set of variables appearing in the pattern α. Also, Vα is
defined as the set of positions in α that contain a variable, i.e., Vα = {i | α[i] ∈ X}.
Moreover, |α|x denotes the number of occurrences of the variable x in α.
Based on Angluin’s definition [1], we call θ a substitution if θ is a word homomorphism from Π to Σ+ that leaves every constant unchanged. For a pattern α, θ(α) is the word obtained by replacing all
variables in α by their images under the substitution θ. Θ denotes the set of all substitutions with respect to Σ. Note that no θ ∈ Θ can map any variable to the empty
string.
Again according to Angluin’s work [1], we define the language of a pattern as
follows:
Definition 2.2. [1] Let α be a pattern over Σ. The language of α over Σ, denoted
by L(α), is the set of all words obtained by substituting every variable symbol by a
non-empty word, i.e., L(α) = {θ(α) | θ ∈ Θ}. If w ∈ L(α), we also say α generates
¹For the sake of convenience, we identify the symbol α[i] ∈ Σ ∪ X with the sequence (α[i]) ∈
(Σ ∪ X)∗ of length 1.
w. By ΠL we denote the set of all languages generated by patterns in Π.
Consider for example the patterns α1 = x1 , α2 = x1 x2 , α3 = x1 x1 , α4 = ax1
over Σ = {a, b}. Then L(α1 ) = Σ+ , L(α2 ) = Σ+ \ {a, b} = {w ∈ Σ+ | |w| ≥
2}, L(α3 ) = {ww | w ∈ Σ+ }, and L(α4 ) = {aw | w ∈ Σ+ } = {w ∈ Σ+ | |w| ≥
2 and w begins with a}.
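Membership w ∈ L(α) can be tested computationally by compiling a pattern into a regular expression with backreferences: the first occurrence of each variable becomes a non-empty capture group, and every later occurrence becomes a backreference, which enforces that all occurrences of a variable receive the same word. The sketch below is our own illustration (patterns are given as lists of symbols, and any symbol starting with "x" is assumed to be a variable):

```python
import re

def pattern_to_regex(symbols):
    # First occurrence of a variable -> capturing group (.+);
    # later occurrences -> backreference, forcing identical values.
    groups, parts = {}, []
    for s in symbols:
        if s.startswith("x"):                  # variable symbol
            if s in groups:
                parts.append("\\%d" % groups[s])
            else:
                groups[s] = len(groups) + 1
                parts.append("(.+)")
        else:                                  # constant symbol
            parts.append(re.escape(s))
    return re.compile("".join(parts))

def generates(symbols, word):
    return pattern_to_regex(symbols).fullmatch(word) is not None

print(generates(["x1", "x2"], "a"))      # False: L(x1 x2) requires length >= 2
print(generates(["x1", "x1"], "abab"))   # True:  ww with w = ab
print(generates(["x1", "x1"], "aba"))    # False
print(generates(["a", "x1"], "ab"))      # True:  begins with a
```

This reproduces the memberships implied by the examples above; note that the general membership problem for pattern languages is NP-complete, and the regex engine's backtracking reflects that worst-case cost.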
2.1.1 Interesting Special Classes of Pattern Languages
In this subsection we study some special classes of pattern languages.
Shinohara studied learning of regular pattern languages [28]. His definition of the
term “regular pattern” is as follows:
Definition 2.3. [28] Let Σ be the set of constants and α be a pattern over Σ. Then α
is called a regular pattern if each variable in α occurs at most once. The set of all
regular patterns over Σ is denoted by RΠ. Also, RΠL denotes the set of all languages
generated by patterns in RΠ.
As an example, α = x1 ax2 is a regular pattern over Σ = {a, b} since α has at
most one occurrence of each variable symbol. However, the pattern β = x1 ax1 over
Σ = {a, b} is non-regular because the variable x1 occurs twice in β. Of course, the
substituted word must be the same for every occurrence of a given variable. Therefore, the
word baa belongs to L(α) but not to L(β).
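The claim about baa can be checked with a regular-expression view of the two patterns: a regular pattern compiles to independent non-empty groups, while a repeated variable needs a backreference. This is a quick sketch, not the thesis's formalism:

```python
import re

regular     = re.compile(r"^(.+)a(.+)$")   # alpha = x1 a x2: variables independent
non_regular = re.compile(r"^(.+)a\1$")     # beta  = x1 a x1: both x1's must agree

print(bool(regular.match("baa")))      # True:  x1 = b, x2 = a
print(bool(non_regular.match("baa")))  # False: no word w with w a w = baa
```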
From results by Shinohara it follows that the class of regular pattern languages
is polynomial time learnable using positive examples. A class L is polynomial time
learnable from positive examples if and only if there exists a learner that identifies
every language in L in the limit from positive examples and processes each new input
in polynomial time with respect to the length of the words the learner has been
presented so far [28, 29].
Angluin introduced the class of one-variable pattern languages and gave a polynomial-time algorithm that finds a descriptive one-variable pattern for a given sample
[1]. The pattern α is called descriptive of the sample set S ⊆ L(α) if for every pattern
β for which S ⊆ L(β), L(β) is not a proper subset of L(α) [1]. According to Angluin,
the definition for one-variable patterns is as follows:
Definition 2.4. [1] Let Σ be the set of constants and α be a pattern over Σ. Then
α is called a one-variable pattern if |V ars(α)| ≤ 1. The set of all one-variable patterns
over Σ is denoted by 1V Π. Also, 1V ΠL denotes the set of all languages generated by
patterns in 1V Π.
Based on Definition 2.4, α = ax1 bx1 abx1 is a one-variable pattern over Σ = {a, b}
since x1 is the only variable occurring in α, no matter how many occurrences it has.
Later, we will discuss the learnability of RΠL and 1V ΠL in detail.
Let Π̂ be a set of patterns over Σ and let n ∈ N, n ≥ 1. We define Π̂≥n = {α ∈
Π̂ | |α| ≥ n}. Accordingly, the set of languages over Σ generated by patterns in Π̂≥n
is denoted by Π̂L≥n .
Moreover, for n, k ∈ N we define Π̂k≥n = {α ∈ Π̂≥n | |V ars(α)| = k} as the class of
all patterns that have length n or greater and have exactly k variables. Also, Π̂Lk≥n denotes
the set of languages over Σ generated by elements of Π̂k≥n .
Also, for any n, k ∈ N, n > 0, k ≤ n, let Π̂L≤k n denote the set of all pattern
languages generated by patterns of length n over Σ containing at most k variables,
i.e., Π̂L≤k n = {L(α) | α ∈ Π̂, |α| = n, |V ars(α)| ≤ k}.
In the pattern languages introduced by Angluin, the variables are substituted with
non-empty words, and hence they are usually called non-erasing pattern languages
in the literature. The empty substitution was considered by Shinohara, coining the
term erasing pattern languages [28]. Consider for example the pattern α = x1 ax2
over the alphabet Σ = {a, b}. The non-erasing pattern language generated by α
is L(α) = {aaa, aab, . . .} = {w ∈ Σ∗ | w contains the symbol a in between two
non-empty words}, while the erasing pattern language generated by α is LE (α) =
{a, ba, . . .} = {w ∈ Σ∗ | w contains the symbol a}. In particular, note that the
shortest word in L(α) is of length |α|, while the shortest word in LE (α) has length
1 < |α|. On the one hand, this small difference between erasing and non-erasing
languages often makes the erasing languages more suitable for applications. For
example, Shinohara and Arikawa [29] studied learning of finite unions of erasing
regular pattern languages to classify protein data.
On the other hand, the learning algorithms, if they exist at all, are much more
complicated for the erasing pattern languages [20]. Since in this thesis we focus only
on Angluin’s non-erasing pattern languages, we usually omit the term “non-erasing”.
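In a regular-expression sketch, the contrast between the two semantics for α = x1 ax2 amounts to whether a variable may take the empty word, i.e. (.+) versus (.*):

```python
import re

# alpha = x1 a x2 under the two semantics (our own illustration):
non_erasing = re.compile(r"^(.+)a(.+)$")   # variables map to non-empty words
erasing     = re.compile(r"^(.*)a(.*)$")   # variables may also map to epsilon

print(bool(non_erasing.match("a")))   # False: shortest word has length |alpha| = 3
print(bool(erasing.match("a")))       # True:  both variables erased
```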
2.1.2 Learning Pattern Languages
This section will study the problem of learning pattern languages and will review
some literature on a variety of learning models. Most of the research presented in
this section is based on Gold’s model of learning and Angluin’s query learning.
In 1967, Gold studied language identification in the limit using positive examples.
A positive example of a pattern language over Σ is a word that belongs to the language.
As opposed to that, a negative example is a word in the complement of the language.
In Gold’s model of learning, every language is represented by a “hypothesis”. The
learner is presented an infinite sequence of all positive examples from the target
language L, one at a time, and at every time step it has to provide a hypothesis
based on the information it has received so far. The learner identifies L in the limit
if the learner repeats a single hypothesis at each time step after some finite time, and
this hypothesis represents the language L, independent of the given sequence of all
positive examples for L. A class of languages is identifiable in the limit if there exists
a learner that identifies every language in the class in the limit [6]. Many studies on
learning languages have been published based on Gold’s Model [21, 28, 12, 1, 14].
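The flavour of Gold's criterion can be conveyed by the classical identification-by-enumeration strategy: after each positive example, conjecture the first language in a fixed enumeration that contains all data seen so far. The sketch below is our own toy illustration over a finite class of finite languages (the names and representation are ours); for it to identify in the limit, the enumeration must be arranged so that no proper sublanguage of the target precedes the target, which holds here but not for arbitrary classes.

```python
# Identification by enumeration: a toy Gold-style learner.  `hypotheses` is a
# fixed enumeration of candidate languages (finite sets of words); `text` is a
# stream of positive examples of the target.  After each example, the learner
# conjectures the index of the first hypothesis containing all data seen so far.
def learn_in_the_limit(hypotheses, text):
    seen = set()
    for w in text:
        seen.add(w)
        yield next(i for i, lang in enumerate(hypotheses) if seen <= lang)

# Target {"a", "b"}, presented as the text b, a, a, b:
hypotheses = [{"a"}, {"b"}, {"a", "b"}]
guesses = list(learn_in_the_limit(hypotheses, ["b", "a", "a", "b"]))
# Once both words have appeared, the conjecture stabilizes on index 2.
```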
Angluin showed that the class of all (non-erasing) pattern languages is learnable
in Gold’s learning model [1]. However, if the size of the underlying alphabet is 2,
3, or 4, then the class of erasing pattern languages is not learnable in the limit [23].
Certain subclasses of erasing pattern languages are learnable in the limit, but the
known learning algorithms are more complex than in the corresponding non-erasing
case [21, 22, 11, 25, 24]. One example of a subclass of erasing pattern languages that
is learnable in the limit is the class of erasing regular pattern languages [28, 29].
Another learning model, called query learning, was introduced by Angluin in 1988
[2]. A query learner has access to an oracle that answers questions about the target
language truthfully. This learning model is a model of one-shot learning, i.e., the
learner has to provide its only hypothesis after asking finitely many queries. The query
learner is called successful if the provided hypothesis describes the target language.
The efficiency of the learner is determined by the number of queries the learner asks
in the worst case to make its final hypothesis.
Various types of queries were considered by Angluin. When learning from superset
queries, for example, the learner picks a language L′ and asks whether or not L′ is a
superset of the target language L. The oracle will answer truthfully, where the answer
‘no’ is supplemented with a counterexample. The counterexample is any arbitrary
word witnessing the negative reply, e.g., a word w ∈ L \ L′ in the case of the superset
query “is L′ ⊇ L?”. No matter how the oracle selects these counterexamples, the
learner must be successful. Angluin also studied a restricted version of query learning
in which the learner is not provided with a counterexample when the answer is ‘no’ [2].
In all of these models, the learner is only allowed to ask queries about the languages
14
that belong to the target class. By contrast, Lange and Zilles [15] considered learning
with extra queries which allows the learner to query languages that are not in the
target class.
An initial positive example is provided in the model of learning from queries with
additional information, which was studied by Marron [18]. In this version, the learner
is provided with a word that belongs to the target language, at the beginning of the
learning process. Based on the initial example, the learner is able to make more
effective queries, which helps in decreasing the total number of questions to identify
the target language. In the case of learning pattern languages, usually the most
helpful initial example is the shortest word that a pattern generates since it informs
the learner either of the subsequence of constant symbols contained in the target
pattern (in the erasing case) or of the length of the target pattern (in the
non-erasing case).
The papers [26, 13, 2] have studied the problem of learning non-erasing pattern
languages using queries while [5] has considered the learning of one-variable erasing
pattern languages. Nessel and Lange [20] studied the problem of language identification using Angluin’s model of query learning with and without additional information
for the class of erasing pattern languages. Also, [15, 17, 16] showed that using extra
queries to learn erasing pattern languages is more powerful and more efficient than the
original query learning model. In doing so, the authors revealed some similarities between Gold’s learning model and query learning, which helped to transfer learnability
results from one model to the other.
2.2 Models of Teaching
This section will discuss the complexity of learning from teachers. First we provide
definitions for some teaching complexity measures. Then some teaching models will
be reviewed.
Most studies in Computational Learning Theory deal with learning models in
which the learner either actively requests information or is given data examples that
are chosen at random or adversarially. By contrast, Shinohara and Miyano [27] studied
the complexity of teaching and defined the notion of teachability. They considered a model of learning called learning from good examples. A helpful
teacher aims to reduce the total number of labelled examples presented in what is
called a teaching set to teach an unknown target language. The same teaching model
was independently developed by Goldman and Kearns [7].
A labelled example for a language L is either a pair (w, +) if w ∈ L or a pair
(w, −) if w ∉ L.
A teaching set for a language L in a class L is a set of labelled examples that
uniquely identifies L in L in the sense that L is the only language in L consistent
with that set (see Definition 2.5 below). We say that L is consistent with a set
{(w1, l1), . . . , (wm, lm)} of labelled examples if for all i ∈ {1, . . . , m}, li = + for
wi ∈ L and li = − for wi ∉ L. If L is a pattern language generated by a pattern α,
we also say “α is consistent with S” to mean “L(α) is consistent with S”.
An important issue which is discussed in learning from helpful teachers is the
problem of “collusion” or “coding tricks”. Since the teacher and learner cooperate to
minimize the number of required labelled examples, they might agree on some coding
scheme. As an example, if both the set of all words and the class of potential target
languages have a fixed enumeration, then teacher and learner can agree on using the
Language     a    b    w ∈ Σ∗, |w| ≥ 2 . . .
L(a)         +    −    − . . .
L(b)         −    +    − . . .
L(x1)        +    +    + . . .
L(x1 x2)     −    −    + . . .
...         ...  ...   ± . . .

Table 2.1: Class of all pattern languages ΠL over Σ = {a, b}.
ith word (with the corresponding label) to teach the ith language in the target class.
In case of such a coding trick, teaching becomes trivial. Thus each teacher-directed
learning protocol proposes a way to prevent collusion between teacher and learner.
In [7], Goldman and Kearns defined a teaching complexity measure named teaching
dimension by considering the teacher directed learning model introduced by [9, 27].
In order to avoid collusion, the learner is required to return a hypothesis describing
the target language L even when a superset of a teaching set for L is presented, as
long as this superset is consistent with L.
Goldman and Kearns’s definition of teaching set in the context of learning a class
of languages is as follows:
Definition 2.5. [7] Let Σ be any alphabet and L be any class of languages over Σ.
Let L ∈ L. Let S be a set of labelled examples that is consistent with L. S is called a
teaching set for L with respect to L if L is the only language in L consistent with S.
By TS(L, L) we denote the set of all minimum teaching sets for L with respect to L,
i.e., S ∈ TS(L, L) if and only if S is a teaching set for L with respect to L and there
is no teaching set T for L with respect to L such that |T | < |S|.
Example 2.1. Consider the class of pattern languages over Σ = {a, b} as shown in
Table 2.1. Then S = {(a, +), (b, +)} ∈ TS(L(x1 ), ΠL) since patterns of length greater
than |x1 | do not generate words of length |x1 |. Moreover, L(a) and L(b) are not
consistent with S because each of these languages contains only one word. Thus the
only language consistent with S is L(x1). Furthermore, there is no set T with |T| = 1
that is a teaching set for L(x1) with respect to ΠL. To see this, first note that for any
word w, the language L(w) is consistent with T = {(w, +)}, which means T is not a
teaching set for L(x1) with respect to ΠL. Second, no labelled example (w, −) is
consistent with L(x1), since L(x1) contains every non-empty word. Hence, L(x1)
does not have a teaching set consisting of a single positive example or a single
negative example.
Definition 2.6. (Based on [7]) Let L be any class of languages and let L ∈ L. The
teaching dimension of L with respect to L, denoted by TD(L, L), is the smallest size
of any teaching set for L with respect to L. Further, the teaching dimension of L,
denoted by TD(L), is defined by TD(L) = sup{TD(L, L) | L ∈ L}.²
Language    ε    a    b    w ∈ Σ∗ . . .
L0          −    −    −    − . . .
L1          +    −    −    − . . .
L2          −    +    −    − . . .
L3          −    −    +    − . . .
...        ...  ...  ...   ± . . .

Table 2.2: Class L = {Li | i ∈ N} where L0 = ∅, Li = {wi} and (w1, w2, w3, . . .) is a
repetition-free enumeration of Σ∗.
Example 2.2. Let (w1, w2, w3, . . .) be a repetition-free enumeration of Σ∗. Let L =
{Li | i ∈ N} be defined by L0 = ∅ and Li = {wi } (see Table 2.2). Since each word
wi for i ≥ 1 belongs only to language Li in L, we can use the set Si = {(wi , +)} as a
teaching set for Li with respect to L when i ≥ 1. Thus, TD(Li , L) = 1 for all i ≥ 1.
But L0 has no positive example to distinguish it from other languages in the class L.
² Note that there are two possible reasons for TD(L) to be infinite: (i) if the teaching dimension
for a particular language in the class is infinite; (ii) if each language in L has finite teaching
dimension but there is no finite upper bound on TD(L, L) for all L ∈ L.
Thus, we have to use only negative examples to distinguish L0 from other languages in
L. For every set of negative examples not covering all of Σ∗ , and thus in particular for
every finite set of negative examples, there is a language L ≠ L0 in L that is consistent
with the set. Thus, TD(L0 , L) = ∞. Therefore, TD(L) = sup{1, ∞} = ∞.
In other words, the teaching dimension of a class L is the worst-case number of
examples a teacher has to present for any target language L ∈ L so that L is the only
language in L that is consistent with the chosen examples.
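When both the class and the pool of candidate words are finite, the teaching sets and teaching dimension of Definitions 2.5 and 2.6 can be computed by exhaustive search. The following Python sketch is our own illustration (languages are modelled as finite sets of words; all names are ours):

```python
from itertools import combinations

# A language is consistent with a sample iff every label agrees with membership.
def is_consistent(lang, sample):
    return all((label == "+") == (w in lang) for w, label in sample)

# All minimum teaching sets for `target` within the finite class `cls`,
# drawing labelled examples from the finite word pool `universe`.
def minimum_teaching_sets(target, cls, universe):
    pool = [(w, "+" if w in target else "-") for w in universe]
    for size in range(len(pool) + 1):
        found = [set(s) for s in combinations(pool, size)
                 if all(not is_consistent(lang, s)
                        for lang in cls if lang != target)]
        if found:                       # these samples have minimum size = TD
            return found
    return []
```

For the class of Example 2.2 restricted to the words a and b, minimum_teaching_sets({"a"}, [set(), {"a"}, {"b"}], ["a", "b"]) returns only the set {(a, +)}, while teaching the empty language requires both negative examples, mirroring the fact that TD(L0, L) grows without bound as the word pool grows.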
Shinohara and Miyano introduced a model of polynomial-time learning from a
helpful teacher and showed that if RP ≠ NP, then there exists a class that is polynomial-time learnable from selected examples but is not polynomial-time PAC-learnable [27,
4, 19].
In some models, collusion between teacher and learner is avoided by having an
adversary modify the teaching set provided by the teacher. The paper [10] studied a
teacher/learner pair in which the learner must remain successful when the teacher is
replaced by such an adversarial teacher. In order to teach a “smarter
learner”, Goldman and Mathias proposed a teacher/learner pair in which the
teacher is not required to teach all consistent learners. However, in their model the
learner must still be successful if the adversarial teacher embeds the original teaching
set into its own [8]. Balbach [3] introduced the Balbach teaching protocol and Zilles et
al. [31] proposed the subset teaching protocol, both of which are cooperative models
of learning from a helpful teacher.
Recently, in the context of learning finite concept classes, Zilles, Lange, Holte and
Zinkevich introduced a cooperative model of teaching and learning, called recursive
teaching protocol (RTP) [31], in which a teacher selects a helpful set of labelled examples consistent with the target concept. The sets chosen in this model can be
much smaller than the smallest teaching sets according to Definition 2.5. To avoid
collusion, an adversary provides the learner with a superset of a given teaching set
and the learner has to return a hypothesis that is consistent with the target concept.
The key idea in [31] is that the teacher first builds teaching sets for some concepts
from the underlying class that have the smallest minimal teaching dimension. Then,
those concepts will be removed from the class and the teacher will continue with
the remaining class. The corresponding complexity measure is called the recursive
teaching dimension (RTD).
The definition of RTD, rephrased in terms of learning infinite classes of languages,
is as follows:
Definition 2.7. (Based on [31]) Let L be a class of languages and (L1, L2, L3, . . .) a
possibly infinite sequence of subclasses Li ⊆ L such that Li ∩ Lj = ∅ for i ≠ j and
∪i∈N Li = L. STS = ((L1, d1), (L2, d2), . . .) is called a subclass teaching sequence of
L if for all i ≥ 1, Li ⊆ {L ∈ L̄i | di = TD(L, L̄i) ≤ TD(L′, L̄i) for all L′ ∈ L̄i},
where L̄1 = L and L̄j+1 = L \ (L1 ∪ . . . ∪ Lj) for all j ≥ 1. Any set S ∈ TS(L, L̄i)
is called a recursive teaching set for L ∈ L with respect to STS, for any i ≥ 1 and
any L ∈ Li. Further, the quantity RTD(L) = sup{di | i ≥ 1} is called the recursive
teaching dimension of L.
It should be noted that the quantity RTD(L) is well-defined as it does not depend
on the particular choice of subclass teaching sequence.
Example 2.3. Consider the class introduced in Example 2.2. Let L1 = {L ∈ L |
|L| = 1} and L2 = {∅}. Every language Li ∈ L1 can be taught with a single positive
example wi, since Li is the only language in L consistent with (wi, +). That is,
TD(L, L̄1) = 1 for all L ∈ L1.
After removing all languages in L1 from the underlying class, the teacher uses the
empty set to teach the languages in L2, which contains only L0. Thus
TD(L, L̄2) = 0 for all L ∈ L2.
The corresponding subclass teaching sequence is STS = ((L1, 1), (L2, 0)). Hence,
RTD(L) = sup{0, 1} = 1.
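Definition 2.7 translates into a simple procedure for finite classes over a finite word pool: repeatedly compute each remaining language's teaching dimension, record the minimum, and remove the easiest-to-teach languages. The sketch below is our own illustration (finite sets of words stand in for languages; all names are ours):

```python
from itertools import combinations

# Teaching dimension of `target` within `cls`, examples drawn from `universe`.
def teaching_dim(target, cls, universe):
    pool = [(w, w in target) for w in universe]
    for size in range(len(pool) + 1):
        for s in combinations(pool, size):
            # s is a teaching set iff every other language disagrees somewhere
            if all(any((w in lang) != lbl for w, lbl in s)
                   for lang in cls if lang != target):
                return size
    return float("inf")

# RTD via the recursive teaching protocol: peel off the languages whose
# teaching dimension within the remaining class is minimal (Definition 2.7).
def recursive_teaching_dimension(cls, universe):
    remaining, rtd = list(cls), 0
    while remaining:
        dims = [teaching_dim(lang, remaining, universe) for lang in remaining]
        d = min(dims)
        rtd = max(rtd, d)
        remaining = [lang for lang, dim in zip(remaining, dims) if dim != d]
    return rtd
```

On the restriction of Example 2.2's class to the words a and b, the first round removes {a} and {b} with one example each, and the second round teaches ∅ with zero examples, so the computed value is 1, matching RTD(L) = 1 from Example 2.3.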
It was shown that the recursive teaching dimension can be interpreted as the
worst-case number of examples a teacher has to provide in order to teach any concept
in the given class, when using a specific protocol, called the recursive teaching protocol
[31]. The sets used for teaching in this protocol are derived from the subclass teaching
sequence in which Li = {L ∈ L̄i | di = TD(L, L̄i) ≤ TD(L′, L̄i) for all L′ ∈ L̄i} for
all i. The recursive teaching protocol can be much more efficient than the classic
teaching protocol, and never less efficient, i.e., RTD(L) ≤ TD(L) for all L, while for
every r there is a class Lr such that RTD(Lr) = 1 and TD(Lr) = r [31].
In this thesis, I aim to answer the question of whether using the recursive teaching
protocol yields an advantage over the classic teaching protocol when learning classes
of pattern languages. In particular, I am interested in quantifying the number of
examples a teacher needs in the worst case under either protocol.
2.3 Useful Tools
In this section we provide some useful results that will be of help in various proofs
throughout this document. First, we show that, under certain assumptions on the
underlying class of pattern languages, any teaching set for any pattern language must
contain at least one positive example.
Lemma 2.1. Let Σ be any countable alphabet and let L be any class of pattern
languages over Σ that contains all languages generated by variable-free patterns. Let
L ∈ L. Then no finite teaching set for L with respect to L consists of negative
examples only.
Proof. Assume there are some L ∈ L and some w1, . . . , wm ∈ Σ+ such that S =
{(w1, −), . . . , (wm, −)} ∈ TS(L, L). Let α1, α2 ∈ Σ+ \ {w1, . . . , wm} with α1 ≠ α2.
Both L(α1) and L(α2) belong to L and are consistent with S; since at least one of
them differs from L, we have S ∉ TS(L, L), a contradiction.
Lemma 2.2. Let Σ be any countable alphabet and let L be any class of pattern
languages over Σ that contains all languages generated by constant-free patterns. Let
L ∈ L. Then no finite teaching set for L with respect to L consists of negative
examples only.
Proof. Assume there are some L ∈ L and some w1, . . . , wm ∈ Σ+ such that S =
{(w1, −), . . . , (wm, −)} ∈ TS(L, L). Let k = max{|wi| | 1 ≤ i ≤ m}. Let α1 =
x1 . . . xk+1 and α2 = x1 . . . xk+2. Both L(α1) and L(α2) are consistent with S, since
all their words are longer than every wi; at least one of them differs from L, hence
S ∉ TS(L, L), a contradiction.
Next, we observe that positive examples alone cannot teach any pattern language
(except for L(x1 )) either, as soon as L(x1 ) is contained in the underlying class.
Lemma 2.3. Let Σ be any countable alphabet and let L be any class of pattern
languages over Σ that contains L(x1). Let L ∈ L. If L ≠ L(x1), then no teaching set
for L with respect to L consists of positive examples only.
Proof. Since L(x1 ) contains all non-empty words over Σ, it is consistent with every
set of positive examples. Hence, for any L ∈ L with L ≠ L(x1), there is no teaching
set, with respect to L, consisting of only positive examples.
Our next result will show that the teaching dimension of any pattern language is
finite, even when the underlying class contains all pattern languages. To prove this,
we first introduce the notion of learning from membership queries.
In Angluin’s model of learning with membership queries [2], the learner selects a
word w ∈ Σ∗ and asks the oracle whether w belongs to the target language or not.
The oracle answers “yes” if w belongs to the target language and “no” otherwise.
The learner is successful for the target language L if it can uniquely identify L after
finitely many membership queries. A class of languages is said to be learnable from
membership queries if there is a learner that successfully identifies every language
in that class from membership queries. It is an immediate consequence of results
by Angluin [1] that the class of all pattern languages is learnable from membership
queries, independent of the underlying alphabet.
Theorem 2.1. Let Σ be any countable alphabet and let L be any class of pattern
languages over Σ. Let L ∈ L. Then TD(L, L) is finite.
Proof. Let A be a query learning algorithm that learns L using membership queries.
Let w1, . . . , wm ∈ Σ+ be the words queried by A when learning L ∈ L. Then
{(w1, l1), . . . , (wm, lm)} is a finite teaching set for L in L, where, for 1 ≤ i ≤ m,
li = + if wi ∈ L and li = − if wi ∉ L. Indeed, for any L′ ∈ L consistent with this
set, the oracle for L′ returns the same answers to the same queries, so A produces
the same (correct) hypothesis, which implies L′ = L.
Chapter 3
Finite Alphabets of Size at Least Two
In this chapter we will investigate how many examples a teacher needs to teach an
unknown target language from some interesting classes of pattern languages when
dealing with a finite alphabet containing at least 2 constants. The teacher may use
either the teaching protocol (TP) or the recursive teaching protocol (RTP).
3.1 Arbitrary Patterns
This section addresses the learnability of the class of all pattern languages over Σ
(ΠL for short) using both TP and RTP. The main goal is to determine the teaching
dimension and the recursive teaching dimension for the class of all pattern languages.
In particular, we are interested in the question whether RTD is strictly smaller than
TD for the class ΠL. Unfortunately, we cannot answer this question fully. While
we prove that TD(ΠL) is infinite, determining RTD(ΠL) remains an open problem.
We will consider the subclasses of regular pattern languages and 1-variable pattern
languages separately in subsequent sections.
3.1.1 Teaching Dimension
In order to find the teaching dimension for the class of all pattern languages, we first
refer to Lemma 2.1 and Lemma 2.3 to prove that there is no teaching set containing
only negative examples or containing only positive examples, for any language in the
underlying class of all pattern languages. As a result of these lemmas, the learner
cannot uniquely identify any language from the class of all pattern languages using
a single example, positive or negative. In other words, the teaching dimension of any
language within the class of all pattern languages is at least two.
In the following theorem we formalize this idea for every class of pattern languages
that contains L(x) and all languages generated by variable-free patterns. Obviously,
Theorem 3.1 applies to the class of all pattern languages since this class contains L(x)
and all variable-free pattern languages.
Theorem 3.1. Let Σ be any countable alphabet. Let L be any class of pattern languages that contains L(x) and all languages generated by variable-free patterns. Let
L ∈ L. Then TD(L, L) ≥ 2.
Proof. Let L ∈ L. First, consider the case where L ≠ L(x). Assume there is a set
S ∈ TS(L, L) where |S| = 1. So, S contains either only one negative example or only
one positive example. Based on Lemma 2.1 and Lemma 2.3, S is not a teaching set
for L with respect to L.
Second, consider the case where L = L(x). Any S ∈ TS(L, L) with |S| = 1 would
then consist of a single positive example (w, +). However, the language generated
by the variable-free pattern w, which belongs to L, would then be consistent with S.
Hence, S cannot be a teaching set for L with respect to L.
Corollary 3.1. Let Σ be any countable alphabet. Let α be any pattern over Σ. Then
TD(L(α), ΠL) is a finite number greater than 1.
Proof. Immediate from Theorem 2.1 and Theorem 3.1.
Even without restriction on the alphabet size, each pattern language can be distinguished from all other pattern languages with a finite teaching set. As we will
show next, there is no upper bound on the size of the smallest possible teaching sets
over all pattern languages, if the underlying alphabet is finite and has at least two
symbols. We will later generalize this result to any countable alphabet.
Theorem 3.2. Let Σ be a finite set of at least two constants. Then TD(ΠL) = ∞.
Proof. Let Σ be a finite alphabet of at least two constants. Assume TD(ΠL) = n <
∞. Let k = |Σ|^(n−1) + 1. Since TD(L(x1 . . . xk), ΠL) ≤ n, by Lemma 2.3 there is a
teaching set S for L(x1 . . . xk) with respect to ΠL that contains at most n − 1 positive
examples. Let {w1, w2, . . . , wn−1} be a set of n − 1 distinct words in L(x1 . . . xk)
containing these positive examples. Since S is a teaching set for L(x1 . . . xk) with
respect to ΠL, there is no pattern α with {w1, w2, . . . , wn−1} ⊆ L(α) ⊂ L(x1 . . . xk).
Let m = min{|w1|, . . . , |wn−1|}. Note that m ≥ k = |Σ|^(n−1) + 1.
Let l1 = |Σ|^(n−2) + 1. There must be l1 many positions p^1_1, . . . , p^1_{l1} ≤ m and a
constant σ1 ∈ Σ such that w1[p^1_1] = w1[p^1_2] = . . . = w1[p^1_{l1}] = σ1. Otherwise, no symbol
in Σ would occur at least l1 times in w1[1 : m], which would imply
m ≤ (l1 − 1)|Σ| = |Σ|^(n−1) < k ≤ m.
Similarly, if l2 = |Σ|^(n−3) + 1, there must be l2 many positions p^2_1, . . . , p^2_{l2} among
p^1_1, . . . , p^1_{l1} and a constant σ2 ∈ Σ such that w2[p^2_1] = w2[p^2_2] = . . . = w2[p^2_{l2}] = σ2.
Otherwise,
l1 ≤ (l2 − 1)|Σ| = |Σ|^(n−2) < l1.
In general, for 1 ≤ i ≤ n − 1, let
li = |Σ|^(n−1−i) + 1.
Then there are li many positions in which each word wj for 1 ≤ j ≤ i has only
repetitions of some constant σj ∈ Σ. Finally, there must be l_{n−1} = 2 positions in
which w_{n−1} repeats a single constant, such that all words w1, . . . , wn−2 also have
repeated constants in these same positions.
Let us call these two positions p1 and p2. Let x ∈ X \ {x1, . . . , xk}. Thus,
α = x1 . . . x_{p1−1} x x_{p1+1} . . . x_{p2−1} x x_{p2+1} . . . xk
is consistent with all positive and negative examples contained in S: since L(α) ⊂
L(x1 . . . xk), α is consistent with all negative examples in S, and because of the
choice of p1 and p2, α is also consistent with all positive examples in
{(w1, +), (w2, +), . . . , (wn−1, +)}. Hence, the learner is not able to uniquely identify
L(x1 . . . xk) from the at most n labelled examples in S. Therefore, there is no upper
bound on the number of examples the teacher needs to provide when teaching
languages from ΠL. This implies TD(ΠL) = ∞.
The following corollaries introduce two subclasses of pattern languages over finite
Σ which have an infinite teaching dimension.
From the proof of Theorem 3.2 we infer that, as long as all constant-free pattern
languages belong to the target class, the teaching dimension of the underlying class
over any finite alphabet with at least two symbols is infinite.
Corollary 3.2. Let Σ be a finite set of at least two constants. Let L be the class of
all pattern languages generated by constant-free patterns. Then TD(L) = ∞.
Proof. Immediate consequence of the proof of Theorem 3.2.
In the proof of Theorem 3.2, we further see that the class of pattern languages
consisting only of languages of the form L(x1 . . . xk) (for k ≥ 1) and
L(x1 . . . xj−1 xi xj+1 . . . xk) (for 1 ≤ i < j ≤ k) has an infinite teaching dimension.
Corollary 3.3. Let Σ be a finite set of at least two constants. Let L be the class
of all pattern languages generated by patterns of the form x1 . . . xk (for k ≥ 1) or
x1 . . . xj−1 xi xj+1 . . . xk (for 1 ≤ i < j ≤ k). Then TD(L) = ∞.
Proof. Immediate consequence of the proof of Theorem 3.2.
3.1.2 Recursive Teaching Dimension
We will next show the existence of a class of pattern languages which has finite
recursive teaching dimension while it has infinite teaching dimension.
Lemma 3.1. Let |Σ| ≥ 2. Let n ∈ N, n ≥ 1. Let L = {L(β) | β = x1 . . . xk
or β = x1 . . . xj−1 xi xj+1 . . . xk for some i, j with 1 ≤ i < j ≤ k, k ≥ n} and let
α = x1 . . . xj−1 xi xj+1 . . . xn for some i, j with 1 ≤ i < j ≤ n. Then TD(L(α), L) = 2.
In particular, S = {(w1, +), (w2, −)} ∈ TS(L(α), L), where |w1| = |w2| = |α| and
1. w1 ∈ L(α), w1[i] = w1[j], w1[l] = w1[l′] for all l, l′ ∈ {1, . . . , n} \ {i, j}, and
w1[i] ≠ w1[l] for all l ∈ {1, . . . , n} \ {i, j};
2. w2[i] ≠ w2[j] and w2[l] = w1[l] for all l ∈ {1, . . . , n} \ {i, j}.
Proof. Since |w1 | = |α|, the example (w1 , +) distinguishes L(α) from all L(β) ∈ L
where |β| > |α|.
Moreover, the pattern x1 . . . xn and every constant-free pattern of length n that
repeats a variable in two positions outside {i, j} generate w2 and are therefore
inconsistent with (w2, −). Every constant-free pattern of length n that repeats a
variable in exactly one of the positions i, j and some third position does not generate
w1 and is therefore inconsistent with (w1, +). Thus, S distinguishes L(α) from all
L(β) ∈ L where |β| = |α|. Finally, it is obvious that L(α) does not have a teaching
set of size 1 with respect to L.
Lemma 3.2. Let |Σ| ≥ 2. Let n ∈ N, n ≥ 1. Let L = {L(β) | β = x1 . . . xk , k ≥ n or
β = x1 . . . xj−1 xi xj+1 . . . xk , k > n} and let L(α) = L(x1 . . . xn ). Then TD(L(α), L) =
1. In particular, S = {(w1 , +)} ∈ TS(L(α), L) where |w1 | = |α|.
Proof. Since x1 . . . xn is the only pattern of length n generating a language in L,
and |w1| = n, L(x1 . . . xn) is the only language in L consistent with (w1, +). Thus
S = {(w1, +)} ∈ TS(L(α), L) and TD(L(α), L) = 1.
Using Lemma 3.1 and Lemma 3.2 we can prove that the class mentioned in Corollary 3.3 has infinite teaching dimension but finite recursive teaching dimension.
Theorem 3.3. Let Σ be a finite set of at least two constants. There exists a class L
of pattern languages over Σ such that TD(L) = ∞ and RTD(L) = 2.
Proof. Let z = |Σ| ≥ 2. Let L be the class of all pattern languages generated by
patterns of the form x1 . . . xk (for k ≥ 1) or x1 . . . xj−1 xi xj+1 . . . xk (for 1 ≤ i < j ≤ k).
By Corollary 3.3, this class has infinite teaching dimension. We claim that RTD(L) = 2.
To see this, we show that
STS = ((L1, 1), (L2, 1), . . . , (L2z−1, 1), (L^1_1, 2), (L^1_2, 1), (L^1_3, 2), (L^1_4, 1), . . .)
is a subclass teaching sequence for L, where
1. L1 = {L(x1)},
2. for all k ∈ {1, . . . , z − 1}, L2k = {L(x1 . . . xk+1)},
3. for all k ∈ {1, . . . , z − 1}, L2k+1 = {L(x1 . . . xj−1 xi xj+1 . . . xk+1) | 1 ≤ i < j ≤
k + 1},
4. for all k ≥ 1, L^1_{2k−1} = {L(x1 . . . xj−1 xi xj+1 . . . xz+k) | 1 ≤ i < j ≤ z + k},
5. for all k ≥ 1, L^1_{2k} = {L(x1 . . . xz+k)}.
To see that TD(L, L̄r) = 1 for all r ∈ {1, . . . , 2z − 1} and all L ∈ Lr, note that Σ
has z many symbols; denote them by σ1, . . . , σz.
First, L(x1) obviously has a teaching set of size 1 with respect to L, since L(x1)
is the only language in L generating words of length 1.
Second, for 1 ≤ k ≤ z − 1, the set {(σ1 σ2 . . . σk+1, +)} is a teaching set of size 1
for L(x1 . . . xk+1) with respect to L̄2k.
Third, for 1 ≤ k ≤ z − 1, the set {(σ1 . . . σj−1 σi σj+1 . . . σk+1, +)} is a teaching set
of size 1 for L(x1 . . . xj−1 xi xj+1 . . . xk+1) with respect to L̄2k+1.
Applying Lemma 3.1 and Lemma 3.2 to the remainder of STS shows that STS is
indeed a subclass teaching sequence for L.
Finally, by Definition 2.7, RTD(L) = sup{1, 2} = 2. Thus TD(L) = ∞ while
RTD(L) = 2.
Theorem 3.3 shows that, for some infinite classes of languages, the teacher can
reduce the number of examples required to teach an unknown language by using the
recursive teaching protocol, which is a collusion-free protocol.
3.2 Regular Patterns
In this section we will study the teaching dimension and recursive teaching dimension
for the class of regular pattern languages over Σ (RΠL for short). Recall that a pattern
is called regular if every variable occurs at most once in it. We denote the set of all
regular patterns by RΠ.
3.2.1 Teaching Dimension
This subsection quantifies the number of examples the teacher needs to teach
languages from the class of regular pattern languages using the teaching protocol.
Since L(x) and all languages generated by variable-free patterns belong to RΠL, we
can still use Lemma 2.1 and Lemma 2.3 to show that TD(L(α), RΠL) > 1 for any
regular pattern α.
Moreover, the following proposition shows that every regular pattern language
generated by a pattern of length at least 2 containing at least one variable has a
teaching dimension of at least 3.
Proposition 3.1. Let Σ be any countable set. Let α ∈ RΠ be a pattern that contains
at least one variable and let |α| ≥ 2. Then TD(L(α), RΠL) ≥ 3.
Proof. We prove that L(α) does not have a teaching set of size 2. Assume, for the
purpose of contradiction, that TD(L(α), RΠL) ≤ 2. Because of Lemma 2.1 and
Lemma 2.3, every teaching set for L(α) contains both positive and negative examples.
Thus assume S = {(w1, +), (w2, −)} ∈ TS(L(α), RΠL). But then the variable-free
pattern β = w1 is consistent with S, and L(β) ≠ L(α) since L(α) is infinite. So
S ∉ TS(L(α), RΠL) and TD(L(α), RΠL) ≥ 3.
Example 3.1. Consider the class of regular pattern languages over Σ = {a, b} (see
Table 3.1). There is only one language generated by a constant-free pattern of length
1, namely L(x1). Its smallest teaching set is S1 = {(a, +), (b, +)}. For the variable-free
languages, the teacher has to use both positive and negative examples, as we proved
in Lemma 2.1 and Lemma 2.3. In this case, the sets S2 = {(a, +), (b, −)} and S3 =
{(b, +), (a, −)} are teaching sets for L(a) and L(b), respectively.
But for L(α), where |α| ≥ 2 and α has at least one variable, there is no set of
only two labelled examples that uniquely identifies L(α) among all languages in RΠL.
In particular, any set S = {(w1, +), (w2, −)} is not a teaching set for L(α) since the
variable-free language L(w1 ) is consistent with S. Moreover, based on Lemma 2.1 and
Lemma 2.3, L(α) cannot have any teaching sets containing only negative examples or
only positive examples. Thus the teaching dimension of the class is at least 3.
The next theorem proves that every language generated by a constant-free regular
pattern of length at least 2 has a teaching dimension of 3 with respect to the class of
regular pattern languages.
Theorem 3.4. Let |Σ| ≥ 2. Let α ∈ RΠ be a constant-free pattern with |α| = n ≥ 2.
Then TD(L(α), RΠL) = 3. In particular, S = {(a^n, +), (b^n, +), (a^{n−1}, −)} belongs to
TS(L(α), RΠL), where a, b ∈ Σ, a ≠ b.
Proof. Based on Proposition 3.1, TD(L(α), RΠL) ≥ 3. Since |a^n| = |b^n| = n, there
is no β ∈ RΠ with |β| > n that is consistent with S. Furthermore, patterns
that contain at least one constant are not consistent with S. Finally, all constant-free
regular patterns of length at most n − 1 generate a^{n−1} and are thus not consistent
with S. Therefore, the only language in RΠL consistent with S is L(α).
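As a sanity check of this result (our own addition, with illustrative helper names), one can enumerate all short regular patterns over Σ = {a, b} and test the proposed teaching set for n = 3 against each of them; every consistent pattern should be constant-free of length 3 and hence generate L(α). Boolean labels stand in for + and −, and the length bound 4 suffices because longer patterns generate no word of length 3.

```python
from itertools import product

# Non-erasing membership test: variables start with "x", constants are letters.
def matches(alpha, w, binding=None):
    binding = dict(binding or {})
    if not alpha:
        return w == ""
    head, rest = alpha[0], alpha[1:]
    if not head.startswith("x"):
        return w.startswith(head) and matches(rest, w[1:], binding)
    if head in binding:
        v = binding[head]
        return w.startswith(v) and matches(rest, w[len(v):], binding)
    return any(matches(rest, w[i:], {**binding, head: w[:i]})
               for i in range(1, len(w) + 1))

# All regular patterns of length <= max_len over {a, b}: each position holds a
# constant or a fresh variable (variables never repeat in a regular pattern).
def regular_patterns(max_len):
    for n in range(1, max_len + 1):
        for shape in product(["a", "b", "x"], repeat=n):
            yield [s if s != "x" else f"x{i}" for i, s in enumerate(shape)]

# Theorem 3.4's teaching set for a constant-free pattern of length n = 3,
# with True/False standing in for the labels + and -.
S = [("aaa", True), ("bbb", True), ("aa", False)]
consistent = [p for p in regular_patterns(4)
              if all(matches(p, w) == lbl for w, lbl in S)]
# `consistent` contains exactly the constant-free pattern of length 3.
```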
Theorem 3.5. Let |Σ| ≥ 2. Then TD(RΠL) ≥ 4.
Proof. Let α = ax1b. Based on Lemma 2.1, Lemma 2.2 and Lemma 2.3, we know
that every teaching set for L(α) must contain both positive and negative examples.
We prove that there is no teaching set of three labelled examples for L(α) with
respect to RΠL. Obviously, no set S of size three containing only one positive
example (w1, +) can be a teaching set for L(α), since the language generated by the
variable-free pattern β = w1 is consistent with S.
Furthermore, no set S = {(w1, +), (w2, +), (w3, −)} with w1, w2, w3 ∈ Σ+ is a
teaching set for L(α), because of the following reasoning:
Language    a    b    aa   ab   ba   bb   w ∈ Σ∗, |w| ≥ 3 . . .
L(a)        +    −    −    −    −    −    − . . .
L(b)        −    +    −    −    −    −    − . . .
L(x1)       +    +    +    +    +    +    + . . .
L(aa)       −    −    +    −    −    −    − . . .
L(bb)       −    −    −    −    −    +    − . . .
L(ab)       −    −    −    +    −    −    − . . .
L(ba)       −    −    −    −    +    −    − . . .
L(ax1)      −    −    +    +    −    −    ± . . .
L(x1 a)     −    −    +    −    +    −    ± . . .
L(bx1)      −    −    −    −    +    +    ± . . .
L(x1 b)     −    −    −    +    −    +    ± . . .
L(x1 x2)    −    −    +    +    +    +    + . . .
...

Table 3.1: Class of all regular pattern languages (RΠL) over Σ = {a, b}.
• If |w3 | < |α|, then the language generated by β = x1 x2 x3 is consistent with S.
• If |w3| ≥ |α|, then w3 either does not start with a or does not end with b which
means that either L(ax1 x2 ) or L(x1 x2 b) is consistent with S, respectively.
Finally, S = {(aab, +), (abb, +), (bbb, −), (aba, −)} ∈ TS(L(α), RΠL). Since S contains a positive example of length |α|, no pattern longer than α is consistent with S.
Also, S contains more than one positive example, so no language generated by a
variable-free pattern is consistent with S. Moreover, the two negative examples in
S distinguish L(α) from all languages generated by other patterns of length at most
|α| that contain a variable, which can be explained as follows.
Since α starts and ends with constants, any pattern of length |α| that is consistent
with the two positive examples and is different from α must begin or end with a
variable. Any such pattern is inconsistent with at least one of the two negative
examples (bbb, −), (aba, −). Patterns of length less than 3 that are not constant-free
and are consistent with the two positive examples must either start with a or end
with b; the languages generated by these patterns are therefore inconsistent with the
two negative examples. Finally, the constant-free patterns of length less than 3,
namely x1 and x1x2, generate bbb and are thus inconsistent with (bbb, −).
Therefore, the only language in RΠL that is consistent with S is L(α) and S is a
teaching set for L(α) with respect to the class of all regular pattern languages.
Thus, TD(L(α), RΠL) = 4 and TD(RΠL) ≥ 4.
3.2.2
Recursive Teaching Dimension
In this section we will demonstrate that the recursive teaching protocol reduces the
number of examples needed to teach regular pattern languages. In particular, we
will prove that the recursive teaching dimension of the class of all regular pattern
languages is 2, in contrast to its teaching dimension being at least 4.
According to Theorem 3.4, the teaching dimension of constant-free pattern languages with respect to RΠL is 3, independent of the length of the underlying pattern. Moreover,
the existence of constant-free languages is the only obstacle to teaching every language
generated by a pattern with at least one variable using a set of two positive examples.
Similarly, variable-free languages are the only obstacle to teaching with a set containing
one positive and one negative example. We will show how to arrange regular pattern
languages in a subclass teaching sequence for RΠL in order to obtain a recursive
teaching dimension of 2.
We will first show that variable-free languages have a teaching dimension of 2
with respect to RΠL. As we know, every variable-free pattern language L = L(w) is
consistent with only one positive example (w, +), which does not form a teaching set
by itself. Lemma 3.3 identifies which negative example the teacher may use together
with (w, +) as a teaching set for L(w).
Lemma 3.3. Let |Σ| ≥ 2 and let n ∈ N, n ≥ 2. Let α ∈ RΠ contain at least one
variable. Then for any w ∈ Σ+, if w ∈ L(α) then w^n ∈ L(α).

Proof. Let α ∈ RΠ with |Vars(α)| ≥ 1 and w ∈ L(α). Let w = σ1 σ2 . . . σm. Then
there exists a variable x ∈ Vars(α) that is substituted by the subword σi . . . σj, for some
i ≤ j, when α generates w. To generate w^n, we change the substitution by replacing
x with σi . . . σm w^{n−2} σ1 . . . σj. So α is consistent with {(w, +), (w^n, +)} for every
n ≥ 2.
Theorem 3.6. Let |Σ| ≥ 2. Let α ∈ RΠ be a variable-free pattern. Then
TD(L(α), RΠL) = 2.
Proof. Based on Lemma 2.1 and Lemma 2.3, TD(L(α), RΠL) > 1. Now we prove
that L(α) has a teaching set of size 2. Since α = w ∈ Σ+, the pattern α generates only
the word w, so the only positive example for L(α) is (w, +). According to Lemma 3.3,
each regular pattern β ≠ w that generates w also generates w^n for every n ≥ 2, so each
such β is consistent with {(w, +), (w^n, +)}. Therefore, we can distinguish α from all
such β using the negative example (w^n, −). Thus, for any n ≥ 2, {(w, +), (w^n, −)} ∈
TS(L(α), RΠL).
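The same brute-force regex encoding used above can confirm Theorem 3.6 on a small instance; this is our own sketch, assuming Σ = {a, b}, with w = ab and the negative example w² = abab.

```python
import re
from itertools import product

# Theorem 3.6 instance: alpha = w = "ab", S = {(w, +), (w^2, -)}
S = [("ab", True), ("abab", False)]

def consistent(p):
    # 'v' marks a (distinct, non-repeated) variable position
    rx = re.compile("".join(".+" if c == "v" else c for c in p))
    return all((rx.fullmatch(w) is not None) == label for w, label in S)

# the positive example (ab, +) rules out every regular pattern longer than 2
hits = ["".join(p) for L in (1, 2) for p in product("abv", repeat=L)
        if consistent(p)]
print(hits)  # ['ab'] -- only the variable-free pattern itself survives
```

Every pattern with a variable also generates abab (as Lemma 3.3 predicts), so the negative example eliminates all of them.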
Consider the two patterns α = x1 and β = x1 x2 over Σ = {a, b} (see Table 3.1).
Clearly, L(β) ⊂ L(α) so L(α) is consistent with all positive examples from L(β).
Thus, no set containing only positive examples can be a teaching set for L(β) using
the teaching dimension protocol. But assume the teacher uses the recursive teaching
protocol and sorts the languages in the subclass teaching sequence in increasing order
of the length of the generating pattern. In this case, the teacher uses a set of two
positive examples for teaching L(α) and teaches L(a) and L(b) using a set containing
a single positive example and a single negative example for each. Then, after all
languages generated by patterns of length 1 have been “removed” from the class,
the teacher can use a set of two positive examples of length 2 as a teaching set for
L(β). Note that, before “removing” L(α) (i.e., passing L(α) in the subclass teaching
sequence) we cannot find any teaching set of size two for L(β) with respect to the
underlying class.
Lemma 3.4. Let |Σ| ≥ 2. Let n, k ∈ N, 1 ≤ k < n. Let α ∈ RΠ with |α| = n
and |Vα| = k. Then TD(L(α), RΠL≥n+1 ∪ RΠL_n^{≤k}) = 2. In particular, for any
w1, w2 ∈ L(α) satisfying |w1| = |w2| = n and w1[i] ≠ w2[i] for all i ∈ Vα, we have
{(w1, +), (w2, +)} ∈ TS(L(α), RΠL≥n+1 ∪ RΠL_n^{≤k}).

Proof. Let S = {(w1, +), (w2, +)}. Since |w1| = |w2| = n, no pattern of length at
least n + 1 generates w1, so every pattern in the target class that is consistent with
S has length exactly n and at most k variables. Let β be such a pattern with
L(β) ≠ L(α). Since β has at most k = |Vα| variables and is not a variable renaming
of α, there is some position i ∈ {1, . . . , n} with β[i] ∈ Σ such that either i ∈ Vα or
β[i] ≠ α[i]. Every w ∈ L(β) with |w| = n satisfies w[i] = β[i], because each variable
of β must be substituted by a single symbol to generate a word of length n. If
i ∈ Vα, then β cannot generate both w1 and w2, since w1[i] ≠ w2[i]; otherwise
w1[i] = α[i] ≠ β[i], so β does not generate w1. Therefore, the only pattern in the
target class consistent with S is α, and S ∈ TS(L(α), RΠL≥n+1 ∪ RΠL_n^{≤k}).
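For a concrete instance of Lemma 3.4, take α = ax1 (so n = 2, k = 1) and the two positive examples aa and ab, which differ exactly at the variable position. The following brute force is our own sketch ('v' marks a variable position, matched by the regex fragment ".+"): patterns of length at least 3 are already excluded by the length-2 positive examples, so only length-2 patterns with at most one variable need checking.

```python
import re
from itertools import product

# Lemma 3.4 instance: alpha = a x1 (n = 2, k = 1), S = {(aa, +), (ab, +)}
S = [("aa", True), ("ab", True)]

def consistent(p):
    rx = re.compile("".join(".+" if c == "v" else c for c in p))
    return all((rx.fullmatch(w) is not None) == label for w, label in S)

# the positives of length 2 exclude every pattern of length >= 3, so only
# length-2 patterns with at most k = 1 variable remain to check
hits = ["".join(p) for p in product("abv", repeat=2)
        if "".join(p).count("v") <= 1 and consistent(p)]
print(hits)  # ['av'] -- i.e., the pattern a x1
```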
While the teaching dimension of the class of regular pattern languages over alphabets of at least two constants is at least 4, the following theorem shows that the
recursive teaching dimension of this class is 2.
Theorem 3.7. Let |Σ| ≥ 2. Then RTD(RΠL) = 2.
Proof. Let Pi,j ⊂ RΠ be defined by Pi,j = {α ∈ RΠ | |α| = i and |Vars(α)| = j}.
Let Li,j = {L(α) | α ∈ Pi,j}. Let

    L≥l = ⋃_{i ≥ l, 0 ≤ j ≤ i} Li,j.
As a subclass teaching sequence for RΠL we choose the sequence
STS = ((L1,1, d1,1 ), (L1,0 , d1,0), (L2,2 , d2,2 ), (L2,1, d2,1 ), (L2,0 , d2,0), . . .)
36
where for all i ≥ 1 and 0 ≤ j ≤ i,

    di,j = TD(L, L \ (L1,1 ∪ L1,0 ∪ . . . ∪ Li−1,0 ∪ Li,i ∪ . . . ∪ Li,j+1))

for all L ∈ Li,j.
Based on Lemma 3.4, every pattern in Li,i , for any i, has a teaching set of size
two with respect to L≥i . Therefore di,i ≤ 2 for all i. In particular, d1,1 = 2.
Again by Lemma 3.4, every pattern in Li,j for 0 < j < i has a teaching set of size
two with respect to L≥i . Thus di,j ≤ 2 where 0 < j < i.
Finally, Theorem 3.6 shows that all variable-free pattern languages, i.e., all languages in Li,0 for any i > 0 have a teaching set of size two with respect to L≥i . Thus
di,j ≤ 2 for j = 0.
Therefore, RTD(RΠL) = sup{di,j | i ≥ 1, 0 ≤ j ≤ i} = 2.
Example 3.2. Consider learning the class of all regular pattern languages over Σ =
{a, b} using the recursive teaching protocol. The subclass teaching sequence mentioned
in Theorem 3.7 starts with L1,1 = {L(x1 )}, and the teacher teaches L(x1 ) using two
positive examples (see Table 3.2.a). The sequence continues with L1,0 = {L(a), L(b)}.
Note that the teaching dimension of L1,0 is one after removing L1,1 . At this point, we
have dealt with all languages generated by patterns of length one and the remaining
class is L≥2 . In this class, the teacher can teach L2,2 = {L(x1 x2 )} using two positive
examples (see Table 3.2.b). Proceeding in this manner, we will see that the sequence
STS = ((L1,1 , d1,1 ), (L1,0 , d1,0), (L2,2 , d2,2 ), . . .) is a subclass teaching sequence for the
class of all regular pattern languages that witnesses RTD(RΠL) = 2.
(a) L≥1:

Language    a    b    aa   ab   ...
L(x1)       [+]  [+]  +    +    + ...
L(a)        [+]  -    -    -    - ...
L(b)        -    [+]  -    -    - ...
L(x1x2)     -    -    +    +    + ...
...

(b) L≥2:

Language    aa   ab   ba   bb   ...
L(x1x2)     +    [+]  [+]  +    + ...
L(x1a)      [+]  -    [+]  -    ± ...
L(bx1)      -    -    [+]  [+]  ± ...
L(x1b)      -    [+]  -    [+]  ± ...
L(ax1)      [+]  [+]  -    -    ± ...
...

Table 3.2: Class of all regular pattern languages (RΠL) over Σ = {a, b}. The examples used in the recursive teaching protocol are marked in brackets.
3.3
One-Variable Patterns
In this section we study the teaching dimension and recursive teaching dimension
of the class of one-variable pattern languages over a finite alphabet of at least two
constants (1VΠL for short). Angluin introduced the class of one-variable pattern languages in 1980 [1]. Recall that a pattern α is called one-variable if |Vars(α)| ≤ 1;
the single variable, if there is one, may occur repeatedly in α (so possibly |Vα| > 1).
Angluin proved that, given a sample set S ⊆ L(α), a polynomial-time
algorithm finds a one-variable pattern β descriptive of S with respect to the class of
all one-variable pattern languages [1].
3.3.1
Teaching Dimension
We have seen in Theorem 3.2 that the teaching dimension of the class of all pattern
languages over a finite alphabet of at least two constants is infinite. We will now
show that even the subclass of one-variable pattern languages has infinite teaching
dimension.
Our first result shows that the teacher must present at least three labelled examples
to teach any language generated by a pattern that contains a variable, with respect to
the class of all one-variable pattern languages over Σ.
Theorem 3.8. Let Σ be any countable set. Let α ∈ 1V Π contain exactly one variable
and let |α| ≥ 2. Then TD(L(α), 1V ΠL) ≥ 3.
Proof. Assume for the purpose of contradiction that TD(L(α), 1V ΠL) ≤ 2. Because
of Lemma 2.1 and Lemma 2.3, every teaching set for L(α) has both positive and
negative examples. Thus, assume S = {(w1 , +), (w2, −)} ∈ TS(L(α), 1V ΠL). But
then the variable-free pattern β = w1 is consistent with S. So, S is not a teaching
set for L(α) with respect to 1V ΠL and TD(L(α), 1V ΠL) ≥ 3.
Example 3.3. Consider the following two patterns α, β ∈ 1VΠ: α = x^k, β = x^l,
where k = 3 · 5 · 7 = 105, l = 3 · 5 · 7 · 11 = 1155 and x ∈ X. Let Div(A) = {t |
A mod t = 0}. For every t ∈ Div(k), the language L(x^t) contains all of L(α).
Div(k) = {1, 3, 5, 7, 15, 21, 35, 105}, and the three elements of Div(k) that have two
prime factors in common with k are as follows:
• t1 = 15 = 3 · 5,
• t2 = 21 = 3 · 7,
• t3 = 35 = 5 · 7.
Obviously, L(x^{ti}) ⊃ L(α) for i ∈ {1, 2, 3}. Therefore, negative examples using
words in L(x^{ti}) \ L(α) are needed to distinguish L(α) from L(x^{ti}). For i ≠ j, i, j ∈
{1, 2, 3}, we get L(x^{ti}) ∩ L(x^{tj}) = {u^{3·5·7·r} | u ∈ Σ+, r ∈ N, r ≥ 1} = {u^{k·r} | u ∈ Σ+, r ∈
N, r ≥ 1} = L(α). Hence no single word can serve as a negative example for two of
these languages, so three negative examples are needed to distinguish L(α)
from the three languages L(x^{ti}), i ∈ {1, 2, 3}.
Now consider the pattern β. We have
Div(l) = {1, 3, 5, 7, 11, 15, 21, 33, 35, 55, 77, 105, 165, 231, 385, 1155},
and there are four divisors ti ∈ Div(l), 1 ≤ i ≤ 4, that have three prime factors
in common with l. Again, by the same reasoning as above, we need exactly one
negative example per index i ∈ {1, . . . , 4} to distinguish L(β) from each L(x^{ti}), 1 ≤
i ≤ 4, i.e., four negative examples in total.
In general, for patterns of the form γ = x^k where k is the product of m distinct
prime numbers, we need at least m negative examples to distinguish L(γ) from the
languages generated by the patterns x^t with t ∈ Div(k) such that t and k have
m − 1 prime factors in common.
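The divisor bookkeeping in Example 3.3 reduces to elementary arithmetic: any two of the chosen divisors have least common multiple k, so their languages intersect exactly in L(α). A quick check of this, under the example's numbers (our own sketch):

```python
from math import gcd

k = 3 * 5 * 7                                 # alpha = x^105
div_k = [t for t in range(1, k + 1) if k % t == 0]

# the three divisors of k sharing two prime factors with k
ts = [15, 21, 35]
# words u^(t*r) lie in L(x^t); common words of L(x^ti) and L(x^tj) are the
# powers whose exponent is a multiple of lcm(ti, tj)
lcms = {ti * tj // gcd(ti, tj) for ti in ts for tj in ts if ti != tj}
print(div_k, lcms)  # [1, 3, 5, 7, 15, 21, 35, 105] {105}
```

Since every pairwise lcm equals k = 105, each pairwise intersection collapses to L(α), which is why one negative example per divisor is unavoidable.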
Obviously, based on Theorem 2.1, the teacher needs finitely many labelled examples to teach any language in the class of one-variable pattern languages. But the
following theorem proves that there is no finite upper bound for the size of a smallest
possible teaching set of each language with respect to the class of all one-variable
pattern languages.
Theorem 3.9. Let Σ be any countable set. Then TD(1V ΠL) = ∞.
Proof. Assume TD(1VΠL) = m for some m ∈ N. Let

    k = p1 · p2 · . . . · pm

be the product of m distinct prime numbers p1, p2, . . . , pm. Let α = x^k for some
x ∈ X. By assumption, TD(L(α), 1VΠL) ≤ m, i.e., there is a set S of at most m
labelled examples such that S ∈ TS(L(α), 1VΠL). By Lemma 2.1, S contains at
least one positive example, and thus S contains at most m − 1 negative examples.

For i ∈ {1, . . . , m}, define ti = k/pi. Then L(x^{t1}), . . . , L(x^{tm}) are pairwise distinct languages in 1VΠL \ {L(α)}, all of which contain L(α). Therefore, for each
i ∈ {1, . . . , m}, there is a word wi ∈ L(x^{ti}) \ L(α) such that (wi, −) ∈ S.

Since S contains at most m − 1 negative examples, there must be two distinct
indices i, j ∈ {1, . . . , m} such that wi = wj. In particular, wi ∈ (L(x^{ti}) ∩ L(x^{tj})) \ L(α).
However,

    L(x^{ti}) ∩ L(x^{tj}) = {u^{(k/pi)·r} | u ∈ Σ+, r ∈ N, r ≥ 1} ∩ {u^{(k/pj)·r′} | u ∈ Σ+, r′ ∈ N, r′ ≥ 1}
                          = {u^{k·r″} | u ∈ Σ+, r″ ∈ N, r″ ≥ 1} = L(α).
Hence, (L(x^{ti}) ∩ L(x^{tj})) \ L(α) = ∅, in contradiction to the fact that wi ∈ (L(x^{ti}) ∩
L(x^{tj})) \ L(α). Therefore, no such m exists, i.e., TD(1VΠL) = ∞.
Theorem 3.9 provides an alternative proof for Theorem 3.2, i.e., for the fact that
the class of all pattern languages over any finite alphabet of size at least two has
infinite teaching dimension.
3.3.2
Recursive Teaching Dimension
In this section, we calculate the RTD for the class of all one-variable pattern languages
over finite non-singleton Σ.
Lemma 3.5. Let Σ be any countable set. Let α ∈ 1VΠ and let k ∈ N, k > 0,
n = |α| ≥ 2. Then TD(L(α), 1VΠL^k_{≥n}) > 1.

Proof. Based on Lemma 2.1, L(α) has no teaching set consisting of a single negative
example. Thus, assume for the purpose of contradiction that S = {(w, +)} ∈
TS(L(α), 1VΠL^k_{≥n}) for some w ∈ Σ+. All patterns of the form β = w[1] . . . w[i −
1] x w[i + 1] . . . w[n], for 1 ≤ i ≤ n, are consistent with S. Thus S ∉ TS(L(α),
1VΠL^k_{≥n}).
The following lemma considers the class consisting of all languages generated by
variable-free patterns of length n together with all languages generated by one-variable
patterns of length greater than n. With respect to this class, every language generated
by a variable-free pattern of length n has a teaching dimension of one.
Lemma 3.6. Let n, k ∈ N, n, k > 0. Let α = w ∈ Σ+ with n = |α|. Then
TD(L(α), 1VΠL^k_{>n} ∪ 1VΠL^0_n) = 1. In particular, the set S = {(w, +)} belongs to
TS(L(α), 1VΠL^k_{>n} ∪ 1VΠL^0_n).

Proof. The target class contains all variable-free patterns of length n and no other patterns of length n. Patterns of length greater than n do not generate strings of
length n. Hence, the only pattern in the class consistent with this positive example is
the pattern α.
Lemma 3.7. Let n, k ∈ N, n, k > 0, and let α ∈ 1VΠ contain a variable, n = |α|.
Then TD(L(α), 1VΠL≥n) = 2. In particular, the set S = {(w1, +), (w2, +)} belongs
to TS(L(α), 1VΠL≥n), where w1, w2 ∈ L(α), |w1| = |w2| = n and w1[i] ≠ w2[i] for all
i ∈ Vα.

Proof. Based on Lemma 3.5, TD(L(α), 1VΠL≥n) > 1. Every pattern in the class
consistent with S has length exactly n, and each of its variable occurrences must be
substituted by a single symbol to generate w1 or w2. Since w1[i] ≠ w2[i] for all
i ∈ Vα, only L(α) is consistent with S.
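Membership in a one-variable pattern language can be tested with a backreference regex: the first occurrence of the variable is captured and every later occurrence must repeat it. The sketch below is our own ('x' marks the variable) and verifies the Lemma 3.7 instance α = xx with the two positives aa and bb, which differ at both variable positions; longer patterns are already excluded by the length-2 positives.

```python
import re
from itertools import product

def one_var_regex(p):
    # 'x' marks occurrences of the single variable: the first occurrence is
    # captured, later occurrences must repeat it via a backreference
    out, seen = [], False
    for c in p:
        if c == "x":
            out.append(r"\1" if seen else "(.+)")
            seen = True
        else:
            out.append(c)
    return re.compile("".join(out))

S = [("aa", True), ("bb", True)]   # Lemma 3.7 instance for alpha = xx

def consistent(p):
    rx = one_var_regex(p)
    return all((rx.fullmatch(w) is not None) == label for w, label in S)

# the length-2 positives already exclude all longer one-variable patterns
survivors = ["".join(p) for p in product("abx", repeat=2)
             if consistent("".join(p))]
print(survivors)  # ['xx']
```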
While Theorem 3.9 showed that the class of all one-variable pattern languages has
infinite teaching dimension over any countable set of constant symbols, the following
theorem shows that, when using the recursive teaching protocol, there is a finite upper
bound on the number of examples the teacher needs in order to teach each language
of the underlying class.
Theorem 3.10. Let Σ be any countable set. Then RTD(1V ΠL) = 2.
Proof. Let Σ be any countable set. Let Pi,j ⊂ 1VΠ be defined by Pi,j = {α ∈ 1VΠ |
|α| = i and |Vars(α)| = j}. Let Li,j = {L(α) | α ∈ Pi,j}. Let

    L≥l = ⋃_{i ≥ l, 0 ≤ j ≤ 1} Li,j.
As a subclass teaching sequence for 1V ΠL we choose the sequence
STS = ((L1,1 , d1,1 ), (L1,0, d1,0 ), (L2,1 , d2,1), (L2,0 , d2,0 ), . . .)
where for all i ≥ 1:
    di,0 = TD(L, L \ (L1,1 ∪ L1,0 ∪ . . . ∪ Li−1,0 ∪ Li,1)) for all L ∈ Li,0,
    di,1 = TD(L, L \ (L1,1 ∪ L1,0 ∪ . . . ∪ Li−1,0)) for all L ∈ Li,1.
Based on Lemma 3.7, every pattern in Li,1 , for any i, has a teaching set of size two
with respect to L≥i ; therefore di,1 = 2 for all i.
Finally, Lemma 3.6 shows that all variable-free pattern languages, i.e., all languages in Li,0 for any i > 0, have a teaching set of size one with respect to
1VΠL^k_{>i} ∪ 1VΠL^0_i. Thus di,0 = 1.
Therefore, RTD(1VΠL) = sup{di,j | i ≥ 1, 0 ≤ j ≤ 1} = 2.
Example 3.4. Consider learning the class of all one-variable pattern languages over
Σ = {a, b} using the recursive teaching protocol. The subclass teaching sequence
mentioned in Theorem 3.10 starts with L1,1 = {L(x)}, and the teacher teaches L(x)
using two positive examples (see Table 3.3.a). The sequence continues with L1,0 =
{L(a), L(b)}. Note that the teaching dimension of L1,0 is two after removing L1,1 .
At this point, we have dealt with all languages generated by patterns of length one
and the remaining class is L≥2 . In this class, the teacher can teach each language
in L2,1 = {L(ax), L(xa), L(bx), L(xb), L(xx)} using two positive examples (see Table
3.3.b). Therefore, the sequence STS = ((L1,1 , d1,1 ), (L1,0 , d1,0), (L2,1 , d2,1 ), . . .) is a
subclass teaching sequence for the class of all one-variable pattern languages that
witnesses RTD(1V ΠL) = 2.
(a) L≥1:

Language    a    b    aa   ab   ...
L(x)        [+]  [+]  +    +    + ...
L(a)        [+]  [-]  -    -    - ...
L(b)        [-]  [+]  -    -    - ...
L(xx)       -    -    +    -    ± ...
...

(b) L≥2:

Language    aa   ab   ba   bb   ...
L(xa)       [+]  -    [+]  -    ± ...
L(bx)       -    -    [+]  [+]  ± ...
L(xb)       -    [+]  -    [+]  ± ...
L(ax)       [+]  [+]  -    -    ± ...
L(xx)       [+]  -    -    [+]  ± ...
...

Table 3.3: Class of all one-variable pattern languages (1VΠL) over Σ = {a, b}. The
examples used in the recursive teaching protocol are marked in brackets.
Chapter 4
Alphabets of Size One
In this chapter we investigate how the alphabet size can affect the number of examples needed by a teacher to teach a language from the underlying class of pattern
languages, both when using the teaching dimension and when using the recursive
teaching dimension. In particular, we focus on alphabets of size one and compare the
TD and RTD values obtained for such alphabets to those for larger finite alphabets.
4.1
Arbitrary Patterns
In this section we study the effect of alphabets of size one on both teaching dimension
and recursive teaching dimension of the class of all pattern languages.
4.1.1
Teaching Dimension
Theorem 3.2 showed that the class of all pattern languages has infinite teaching
dimension over finite alphabets of size at least two. The proof of this theorem, as
given in Section 3.1.1, does not apply to the case of a singleton alphabet.
However, in Theorem 3.9 we showed that the subclass of all one-variable pattern languages already has infinite teaching dimension independent of the underlying
alphabet.
An immediate consequence is that the class of all pattern languages over a singleton alphabet also has infinite teaching dimension. Note that in the case of the class
of all pattern languages, the alphabet size plays no role when it comes to the teaching
dimension itself.
Theorem 4.1. Let |Σ| = 1. Then TD(ΠL) = ∞.
Proof. Immediate from Theorem 3.9.
4.1.2
Recursive Teaching Dimension
In this subsection we determine the recursive teaching dimension of the class of all
pattern languages over alphabets of size one. In this case, the position of constants and
variables in a pattern is not important because all possible substitutions for variables
are strings consisting of only one symbol. Moreover, because the alphabet contains
only one constant symbol, many different patterns generate the same language. For
example, consider two patterns α = ax1 a and β = x1 x2 x3 over Σ = {a}. Since any
substitution θ is a word homomorphism from Π to Σ+ = {a}+ , L(α) = L(β).
If we consider the set of all patterns over Σ = {a}, then this set can be partitioned
into three parts covering all pattern languages over Σ:
• Pat1 = {a^n | n ∈ N, n ≥ 1},
• Pat2 = {α | α ∈ Π and α has at least one non-repeated variable},
• Pat3 = {α | α ∈ Π \ Σ∗ and α has only repeated variables}.
This partition will be helpful in our proofs in this chapter.
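The three-way partition can be phrased as a simple test on variable multiplicities. The helper below is our own illustration; a unary pattern is written as a string over the constant 'a' plus arbitrary variable letters.

```python
from collections import Counter

def part(pattern):
    # multiplicity of each variable symbol (every letter except the constant 'a')
    counts = Counter(c for c in pattern if c != "a")
    if not counts:
        return "Pat1"                     # variable-free: a^n
    if any(m == 1 for m in counts.values()):
        return "Pat2"                     # some variable occurs exactly once
    return "Pat3"                         # all variables are repeated

print([part(p) for p in ("aaa", "axa", "xx", "xxya")])
# ['Pat1', 'Pat2', 'Pat3', 'Pat2']
```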
Lemma 4.1. Let Σ = {a}, let α ∈ Pat2 and |α| = n. Then L(α) = {a^l | l ≥ n}.

Proof. Let α ∈ Pat2, |α| = n, and let x1 ∈ X be a non-repeated variable in α. If
w ∈ L(α), then w ∈ {a}+ and |w| ≥ |α| = n, and thus w ∈ {a^l | l ≥ n}. Hence
L(α) ⊆ {a^l | l ≥ n}.

Conversely, if w = a^l for some l ≥ n, then θ(α) = w for the substitution θ with

    θ(x) = a for x ∈ X, x ≠ x1, and θ(x1) = a^{l−n+1}.

Therefore, w ∈ L(α) and {a^l | l ≥ n} ⊆ L(α).
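Over the unary alphabet, Lemma 4.1 can be spot-checked with regexes on the words a^l (our own sketch): any two patterns with a non-repeated variable and the same length, here ax1a and x1x2x3, generate exactly the lengths l ≥ 3.

```python
import re

def lengths(regex, max_len=10):
    # lengths l <= max_len such that a^l lies in the language of the regex
    rx = re.compile(regex)
    return {l for l in range(1, max_len + 1) if rx.fullmatch("a" * l)}

# 'a.+a' encodes a x1 a; '.+.+.+' encodes x1 x2 x3 (variables -> non-empty words)
print(lengths("a.+a"), lengths(".+.+.+"))  # both are {3, 4, ..., 10}
```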
Corollary 4.1. Let Σ = {a}. Let α, β ∈ Pat2 . If |α| = |β| then L(α) = L(β).
Proof. Immediate from Lemma 4.1.
We see that all regular patterns that are not variable-free are contained in Pat2 .
In other words, all regular patterns over Σ containing at least one variable generate
the same language.
Based on Lemma 2.1 and Lemma 2.2, no language in ΠL has a teaching set
containing only one negative example. Also, we can show that there is no teaching
set containing only one positive example for any language over Σ in ΠL≥n for n > 1.
Lemma 4.2. Let Σ be any countable set. Let n ∈ N, n > 0, and L(α) ∈ ΠL≥n. Then
TD(L(α), ΠL≥n) > 1.

Proof. No teaching set for L(α) with respect to ΠL≥n consists of only a single negative
example (w, −), because any two patterns longer than w would both be consistent with
such a set.

For n = 1 we already know that there is no teaching set containing only one positive
example for L(α) with respect to ΠL. For n > 1, assume first that α is not constant-free.
Then no singleton S = {(w, +)} with w ∈ L(α) is a teaching set for L(α): since
|w| ≥ |α|, there is a constant-free pattern β ∈ Π≥n with |β| = |α|, and L(β) is
consistent with S.

Now assume α is constant-free and w ∈ L(α). Then S = {(w, +)} is not a teaching
set for L(α), since the variable-free pattern β = w satisfies |β| ≥ n and L(β) is
consistent with S.

Consequently, there is no teaching set for L(α) consisting of either a single
negative example or a single positive example with respect to ΠL≥n. Therefore,
TD(L(α), ΠL≥n) > 1.
The following lemma provides the example the teacher can use to distinguish the
languages generated by patterns from Pat2 from all other languages in ΠL≥n.

Lemma 4.3. Let |Σ| = 1, α ∈ Pat3 and let |α| = n. Then a^{n+1} ∉ L(α).
Proof. All patterns in Pat3 are non-regular patterns that have only repeated variables.
Hence, lengthening the substitution of a single variable changes the length of the
generated word by at least 2, so the minimum possible difference in length between
two words w1, w2 ∈ L(α) with |w1| ≠ |w2| is 2. Since the shortest word in L(α) is a^n,
it follows that a^{n+1} ∉ L(α).
Lemma 4.4. Let |Σ| = 1, let α ∈ Pat2 and let |α| = n. Then TD(L(α), ΠL≥n) = 2.
In particular, S = {(a^n, +), (a^{n+1}, +)} ∈ TS(L(α), ΠL≥n).

Proof. Based on Lemma 4.2, the teacher needs more than one labelled example to
teach any language in the underlying class. Let xi be a non-repeated variable in α.
The substitution that maps all variables to the string “a” generates a^n from α. The
substitution that maps xi to “aa” and all other variables to “a” generates a^{n+1}. So
L(α) is consistent with S. Obviously, all variable-free pattern languages in the class
are inconsistent with S, and so is every language generated by a pattern of length
greater than n, since such a pattern cannot generate a^n. Additionally, Lemma 4.3
shows that the languages generated by patterns from Pat3 of length n are inconsistent
with S, and by Lemma 4.1 every remaining pattern of length n generates L(α).
Therefore, S is a teaching set for L(α) with respect to ΠL≥n.
Let L≥n = {L(α) | α ∈ Π and |α| ≥ n}. In the following tables we show a possible
order of processing pattern languages so as to obtain a recursive teaching dimension
of 2 with respect to L≥n. The first column shows a selected pattern among all patterns
that generate the same language. If ki ≥ 1 is the length of the word substituted for
variable xi, which has li occurrences in α, and v = |Vars(α)|, then Len = c + Σ_{i=1}^{v} ki · li,
where c is the number of constants in α, is the polynomial that determines the lengths
of the words α can generate. The third column shows the recursive teaching set for
the corresponding language.
Language    Len    Recursive Teaching Set
L(x1)       k1     {(a, +), (aa, +)}
L(a)        1      {(a, +), (aa, −)}

Table 4.1: Recursive teaching sets for languages generated by patterns of length 1
with respect to L≥1.
Language    Len        Recursive Teaching Set
L(x1x2)     k1 + k2    {(a^2, +), (a^3, +)}
L(aa)       2          {(a^2, +), (a^4, −)}
L(x1x1)     2k1        {(a^2, +), (a^4, +)}

Table 4.2: Recursive teaching sets for languages generated by patterns of length 2
with respect to L≥2.
Language     Len             Recursive Teaching Set
L(x1x2x3)    Σ_{i=1}^3 ki    {(a^3, +), (a^4, +)}
L(a^3)       3               {(a^3, +), (a^9, −)}
L(ax1x1)     2k1 + 1         {(a^3, +), (a^5, +)}
L(x1x1x1)    3k1             {(a^3, +), (a^6, +)}

Table 4.3: Recursive teaching sets for languages generated by patterns of length 3
with respect to L≥3.
Language       Len             Recursive Teaching Set
L(x1 . . . x4) Σ_{i=1}^4 ki    {(a^4, +), (a^5, +)}
L(a^4)         4               {(a^4, +), (a^{28}, −)}
L(x1x1x2x2)    2k1 + 2k2       {(a^4, +), (a^6, +)}

Table 4.4: Recursive teaching sets for languages generated by patterns of length 4
with respect to L≥4.
Language        Len             Recursive Teaching Set
L(x1 . . . x5)  Σ_{i=1}^5 ki    {(a^5, +), (a^6, +)}
L(a^5)          5               {(a^5, +), (a^{5!+5}, −)}
L(x1^2 x2^3)    2k1 + 3k2       {(a^5, +), (a^7, +)}

Table 4.5: Recursive teaching sets for languages generated by patterns of length 5
with respect to L≥5.
Language           Len              Recursive Teaching Set
L(x1 . . . x6)     Σ_{i=1}^6 ki     {(a^6, +), (a^7, +)}
L(a^6)             6                {(a^6, +), (a^{6!+6}, −)}
L(x1^3 x2^2 a)     3k1 + 2k2 + 1    {(a^6, +), (a^{13}, +)}
L(x1^2 x2^2 x3^2)  2k1 + 2k2 + 2k3  {(a^6, +), (a^8, +)}
L(x1^3 x2^3)       3k1 + 3k2        {(a^6, +), (a^9, +)}
L(x1^5 a)          5k1 + 1          {(a^6, +), (a^{11}, +)}
L(x1^4 aa)         4k1 + 2          {(a^6, +), (a^{10}, +)}
L(x1^6)            6k1              {(a^6, +), (a^{12}, +)}

Table 4.6: Recursive teaching sets for languages generated by patterns of length 6
with respect to L≥6.
Language          Len              Recursive Teaching Set
L(x1 . . . x7)    Σ_{i=1}^7 ki     {(a^7, +), (a^8, +)}
L(a^7)            7                {(a^7, +), (a^{7!+7}, −)}
L(x1^5 aa)        5k1 + 2          {(a^7, +), (a^{19}, −)}
L(x1^4 a^3)       4k1 + 3          {(a^7, +), (a^{13}, −)}
L(x1^3 a^4)       3k1 + 4          {(a^7, +), (a^{11}, −)}
L(x1^2 x2^2 a^3)  2k1 + 2k2 + 3    {(a^7, +), (a^{14}, −)}
L(x1^3 x2^4)      3k1 + 4k2        {(a^7, +), (a^9, −)}
L(x1^2 x2^5)      2k1 + 5k2        {(a^7, +), (a^{10}, −)}
L(x1^2 x2^3 aa)   2k1 + 3k2 + 2    {(a^7, +), (a^{10}, +)}

Table 4.7: Recursive teaching sets for languages generated by patterns of length 7
with respect to L≥7.
Language          Len              Recursive Teaching Set
L(x1 . . . x8)    Σ_{i=1}^8 ki     {(a^8, +), (a^9, +)}
L(a^8)            8                {(a^8, +), (a^{8!+8}, −)}
L(x1^3 x2^2 a^3)  3k1 + 2k2 + 3    {(a^8, +), (a^{10}, +)}
L(x1^5 a^3)       5k1 + 3          {(a^8, +), (a^{11}, −)}
L(x1^3 a^5)       3k1 + 5          {(a^8, +), (a^{28}, −)}
L(x1^4 x2^3 a)    4k1 + 3k2 + 1    {(a^8, +), (a^{12}, +)}
L(x1^5 x2^3)      5k1 + 3k2        {(a^8, +), (a^{13}, +)}

Table 4.8: Recursive teaching sets for languages generated by patterns of length 8
with respect to L≥8.
Language        Len             Recursive Teaching Set
L(x1 . . . x9)  Σ_{i=1}^9 ki    {(a^9, +), (a^{10}, +)}
L(a^9)          9               {(a^9, +), (a^{9!+9}, −)}

Table 4.9: Recursive teaching sets for languages generated by patterns of length 9
with respect to L≥9.
Language           Len               Recursive Teaching Set
L(x1 . . . x10)    Σ_{i=1}^{10} ki   {(a^{10}, +), (a^{11}, +)}
L(a^{10})          10                {(a^{10}, +), (a^{10!+10}, −)}
L(x1^5 x2^5)       5k1 + 5k2         {(a^{10}, +), (a^{22}, −)}
L(x1^8 x2^2)       8k1 + 2k2         {(a^{10}, +), (a^{19}, −)}
L(x1^3 x2^3 a^4)   3k1 + 3k2 + 4     {(a^{10}, +), (a^{20}, −)}
L(x1^3 x2^7)       3k1 + 7k2         {(a^{10}, +), (a^{18}, −)}
L(x1^5 x2^4 a)     5k1 + 4k2 + 1     {(a^{10}, +), (a^{16}, −)}
L(x1^3 x2^5 aa)    3k1 + 5k2 + 2     {(a^{10}, +), (a^{14}, −)}
L(x1^3 x2^2 a^5)   3k1 + 2k2 + 5     {(a^{10}, +), (a^{13}, +)}
L(x1^3 x2^3 x3^4)  3k1 + 3k2 + 4k3   {(a^{10}, +), (a^{12}, −)}
L(x1^2 x2^5 a^3)   2k1 + 5k2 + 3     {(a^{10}, +), (a^{13}, −)}

Table 4.10: Recursive teaching sets for languages generated by patterns of length 10
with respect to L≥10.
It is clear from Tables 4.1 through 4.8 that we were able to find a subclass teaching
sequence for the class of pattern languages generated by patterns of length at most 8
over singleton alphabets by hand. We obtained a similar result for patterns of length
10, see Table 4.10. For patterns of length 9, Table 4.9 shows a partial result.
The most general method for teaching arbitrary pattern languages over singleton
alphabets using the recursive teaching protocol is to order them by length. Moreover,
we can teach the languages generated by variable-free patterns and by constant-free
patterns at the same time, using two examples for each:

Lemma 4.4 proves that we can teach the languages generated by constant-free
patterns with two positive examples.

As for the variable-free patterns, consider L(a^9) with respect to L≥9 \
{L(x1 . . . x9)}. To distinguish this language from all languages generated by
patterns of length greater than 9 in the underlying class, we reveal the labelled
example (a^9, +), since all longer patterns are inconsistent with this example. Moreover,
we can distinguish L(a^9) from all languages generated by patterns of length
9 with one negative example (a^z, −), where z = 2 · 3 · . . . · 9 + 9. Note that every
pattern α of length 9 in the underlying class has k = |Vα| with 1 ≤ k ≤ 9, and thus α
contains 9 − k constants. Now k | (z − (9 − k)), i.e., k | ((2 · 3 · . . . · 9) + k), holds for
every such k, so a^z belongs to L(α). Hence any pattern α of length 9 in the underlying
class is inconsistent with (a^z, −). Thus {(a^9, +), (a^z, −)} is a smallest teaching set
for L(a^9) with respect to L≥9 \ {L(x1 . . . x9)}.
The following lemma proves that every variable-free pattern can be learned with
two labelled examples.
Lemma 4.5. Let Σ = {a} and let n ∈ N, n ≥ 1. Let L≥n = {L(α) | α ∈ Π, |α| ≥ n}. Then
TD(L(a^n), L≥n) = 2. In particular, S = {(a^n, +), (a^{n!+n}, −)} ∈ TS(L(a^n), L≥n).

Proof. Lemma 4.2 proved that L(a^n) is not learnable using a single labelled example
with respect to the underlying class. Thus TD(L(a^n), L≥n) ≥ 2.

Now we prove that TD(L(a^n), L≥n) = 2. Let z = n! + n. We claim S =
{(a^n, +), (a^z, −)} ∈ TS(L(a^n), L≥n). To prove this, note that L(a^n) is consistent
with S, so it remains to prove that there is no other language in L≥n consistent with
S. For any language L ∈ L≥n \ {L(a^n)}, if L is generated by a pattern of length
greater than n, then L does not contain a^n, which means L is inconsistent with S.

So suppose L is generated by a pattern of length n. Since L ≠ L(a^n), every
pattern generating L must contain at least one variable. Note further that every
pattern generating L is of length n. Let α be any pattern generating L and let
k = |Vα|. Then α contains n − k occurrences of constants. Thus α generates every
word a^l where l = n − k + r · k for r ≥ 1. In particular, α generates the word a^m
where m = n − k + [(k − 1)! · n · (n − 1) · . . . · (k + 1) + 1] · k = n − k + n! + k = z. Thus
L(α) is inconsistent with S and S = {(a^n, +), (a^{n!+n}, −)} ∈ TS(L(a^n), L≥n).
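The key divisibility fact behind Lemma 4.5, namely that k divides z − (n − k) = n! + k for every 1 ≤ k ≤ n, is easy to confirm (our own check):

```python
from math import factorial

n = 9
z = factorial(n) + n   # the negative example a^z from Lemma 4.5
# a pattern of length n with k variable occurrences and n - k constants
# generates a^z by substituting each variable with a^((z - (n - k)) // k),
# which is possible exactly because k divides z - (n - k) = n! + k
ok = all((z - (n - k)) % k == 0 for k in range(1, n + 1))
print(ok)  # True
```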
Therefore, variable-free patterns and constant-free patterns are contained in the
set of patterns whose languages can be processed first, with two labelled examples
each, when applying the recursive teaching protocol by increasing pattern length.
In addition, we have found the following teaching sets for some other patterns of
length 9 using the same argument. Note, however, that this argument only works for
patterns α for which |Vα| is a prime number or a prime power.
Language     z                            Recursive Teaching Set
L(x1^9)      2 · 3 · 4 · 5 · 7 · 8 + 9    {(a^9, +), (a^z, −)}
L(ax1^8)     3 · 4 · 5 · 7 · 9 + 9        {(a^9, +), (a^z, −)}
L(a^3 x1^7)  3 · 4 · 5 · 6 · 8 · 9 + 9    {(a^9, +), (a^z, −)}

Table 4.11: Recursive teaching sets for languages generated by some patterns of length
9 with respect to L≥9.
4.2
Regular Patterns
This section investigates the learning of the class of all regular pattern languages over
an alphabet of size one. We study the impact of the alphabet size on both teaching
complexity criteria, the teaching dimension and the recursive teaching dimension.
4.2.1
Teaching Dimension
Theorem 3.5 showed that the class of all regular pattern languages has a teaching
dimension of at least four over any finite alphabet of size at least two. In this
section we find a finite upper bound on the size of a sample set the teacher needs
to present in order to teach each language of the underlying class. The following
lemmas show that the teacher needs at most three labelled examples to teach each
language in the class of all regular pattern languages over Σ = {a}. As we discussed in
Section 4.1.2, since the alphabet contains only one constant symbol, many patterns
generate the same language. In particular, by Corollary 4.1, all regular patterns of
the same length that are not variable-free generate the same language. Therefore,
for a language generated by a pattern that is not variable-free, no labelled example
is needed to distinguish it from the languages generated by other patterns of the
same length in the underlying class.
Lemma 4.6. Let Σ = {a} and let n ∈ N, n ≥ 1, α = an . Then TD(L(α), RΠL) = 2.
In particular, S = {(an , +), (an+1, −)} ∈ TS(L(α), RΠL).
Proof. Based on Lemma 2.1, Lemma 2.2 and Lemma 2.3 there is no teaching set of
size one for L(α).
Since every pattern of length l ∈ {1, . . . , n} that has at least one variable over
Σ generates a language containing a^{n+1}, all languages generated by these patterns are inconsistent with S.
Furthermore, every pattern of length greater than n generates no string of length n,
so its language is inconsistent with (a^n, +).
Moreover, all variable-free patterns β ≠ α are inconsistent with (a^n, +). Therefore,
S is a teaching set for L(α) with respect to RΠL.
Lemma 4.7. Let Σ = {a} and let α be any regular pattern of length n that is not
variable-free. Then TD(L(α), RΠL) = 3. In particular, S =
{(a^n, +), (a^{n+1}, +), (a^{n−1}, −)} is a teaching set for L(α) with respect to RΠL.
Proof. According to Lemma 2.1, Lemma 2.2 and Lemma 2.3 there is no teaching
set of size one for L(α) with respect to RΠL. Additionally these lemmas show that
there is no teaching set containing only negative examples or only positive examples.
Thus, for the purpose of contradiction, assume S1 = {(w1, +), (w2, −)} is a teaching
set for L(α). Obviously, the language generated by the variable-free pattern β = w1
is consistent with S1. Therefore, S1 is not a teaching set for L(α) with respect to
RΠL. Hence, the teacher needs more than two labelled examples to teach L(α) with
respect to RΠL.
We know that all variable-free pattern languages are inconsistent with S because
it contains two positive examples. Since S contains a positive example of length n,
all languages generated by patterns of length greater than n are inconsistent with
S. Finally, all languages generated by patterns that are not variable-free and have a
length less than n are not consistent with (a^{n−1}, −). By Corollary 4.1 we know all
patterns of length n that are not variable-free generate the same language over Σ.
Therefore, S is a teaching set for L(α) with respect to RΠL.
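Over Σ = {a}, a regular pattern of length n generates either the single string a^n (if it is variable-free) or every a^m with m ≥ n (if it contains a variable). The teaching sets of Lemmas 4.6 and 4.7 can therefore be checked by a small brute-force search; the encoding below and the length bound max_len are illustrative assumptions of this sketch, not part of the thesis.

```python
# A regular pattern over {a} of length n is determined, up to language
# equivalence, by n and whether it contains a variable. Strings a^m are
# encoded by their length m; label True stands for '+', False for '-'.

def lang(n, has_var):
    return (lambda m: m >= n) if has_var else (lambda m: m == n)

def consistent(member, sample):
    return all(member(m) == label for m, label in sample)

def survivors(sample, max_len=30):
    """All candidate languages (up to max_len) consistent with the sample."""
    return [(n, has_var)
            for n in range(1, max_len + 1)
            for has_var in (False, True)
            if consistent(lang(n, has_var), sample)]

n = 5
# Lemma 4.6: {(a^n, +), (a^{n+1}, -)} teaches the variable-free L(a^n).
assert survivors([(n, True), (n + 1, False)]) == [(n, False)]
# Lemma 4.7: {(a^n, +), (a^{n+1}, +), (a^{n-1}, -)} teaches the language
# of a regular pattern of length n that contains a variable.
assert survivors([(n, True), (n + 1, True), (n - 1, False)]) == [(n, True)]
```

Patterns longer than max_len cannot be consistent with a positive example of length at most max_len, so the bound does not affect the outcome.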
Recall that the teacher needs at least four labelled examples to teach some languages over a finite alphabet of size at least two with respect to the class of all regular
pattern languages when using the teaching dimension protocol. By contrast, we have shown
that at most three labelled examples suffice to teach any language in the underlying
class over an alphabet of size one using the same protocol.
Clearly, reducing the size of the alphabet improved the teaching dimension for the
class of all regular pattern languages.
Theorem 4.2. Let Σ = {a}. Then TD(RΠL) = 3.
Proof. Based on Lemma 4.6, the teaching dimension of every language generated by a
variable-free pattern over Σ with respect to the class of all regular pattern languages is 2.
Moreover, Lemma 4.7 showed that the teaching dimension of every language
generated by a pattern that has at least one variable over Σ is 3.
Finally, based on Definition 2.6:
TD(RΠL) = sup{TD(L, RΠL) | L ∈ RΠL} = sup{2, 3} = 3.
4.2.2 Recursive Teaching Dimension
We demonstrate that reducing the size of the alphabet does not improve the
number of labelled examples the teacher needs in the worst case to teach a language
in the class of regular pattern languages using the recursive teaching protocol.
Theorem 3.7 places no restriction on the size of the alphabet. Thus the same
reasoning shows that the recursive teaching dimension is two for the class of all
regular pattern languages over any alphabet of size one.
Theorem 4.3. Let |Σ| = 1. Then RTD(RΠL) = 2.
Proof. Immediate from Theorem 3.7.
Note that here RTD(RΠL) < TD(RΠL), so the recursive teaching protocol is
more efficient than the teaching dimension protocol, when it comes to teaching regular
pattern languages over singleton alphabets.
4.3 One-Variable Patterns
In this section we consider learning the class of all one-variable pattern languages
over an alphabet of size one. We prove that neither the teaching dimension nor the
recursive teaching dimension improves when teaching this class over an alphabet
containing only one constant symbol, in comparison with teaching it over finite
alphabets of size at least two.
4.3.1 Teaching Dimension
This section aims to quantify the number of labelled examples needed by the teacher
to teach a language in the class of one-variable pattern languages over alphabets of
size one using the teaching dimension protocol.
Since there is no restriction on the size of the alphabet in Theorem 3.9, the class
of one-variable pattern languages over alphabets of size one has infinite teaching
dimension.
Theorem 4.4. Let |Σ| = 1. Then TD(1V ΠL) = ∞.
Proof. Immediate from Theorem 3.9.
4.3.2 Recursive Teaching Dimension
We showed that the recursive teaching dimension is two for the class of one-variable
pattern languages over finite alphabets of size at least two. It turns out that Theorem
3.10 holds for any size of alphabet. Thus, the following theorem shows that the
recursive teaching dimension is two for the class of one-variable pattern languages
over an alphabet containing only a single constant symbol.
Theorem 4.5. Let |Σ| = 1. Then RTD(1V ΠL) = 2.
Proof. Immediate from Theorem 3.10.
Chapter 5
Infinite Alphabets
This chapter investigates whether or not using an infinite alphabet can affect the
number of examples needed by the teacher to teach an unknown target language in
a class of pattern languages.
5.1 Arbitrary Patterns
We proved in Theorem 3.9 that the class of one-variable pattern languages has an
infinite teaching dimension over any countable alphabet. Since this subclass is contained in the class of all pattern languages, and a teaching set with respect to the larger
class must in particular distinguish the target from all one-variable pattern languages,
it follows that the teaching dimension is infinite for the class of all pattern languages.
Moreover, we will prove that the recursive teaching dimension is two for the class
of arbitrary pattern languages over infinite alphabets.
5.1.1 Teaching Dimension
As stated above, it has already been proven that the class of all pattern languages
over infinite alphabets has infinite teaching dimension.
Theorem 5.1. Let Σ be any countably infinite alphabet. Then TD(ΠL) = ∞.
Proof. Immediate from Theorem 3.9.
5.1.2 Recursive Teaching Dimension
While using an infinite alphabet yields no improvement in the teaching dimension of the
class of all pattern languages, it does reduce the recursive teaching dimension compared
to the case of alphabets of size one. The following example explains the order
in which languages are "removed" from the underlying class to obtain a recursive teaching
dimension of two when using infinite alphabets.
Example 5.1. Let ΠL≥5 be the class of languages generated by patterns of length 5
or greater over an infinite alphabet. Let Mα = {i1, . . . , is} be the multi-set of numbers
of occurrences of variables in any pattern α. Consider the following patterns:
α1 = x1 x2 x3 x4 x5, Mα1 = {1, 1, 1, 1, 1},
α2 = x1 x2 x3 x4 x4, Mα2 = {1, 1, 1, 2},
α3 = x1 x2 x2 x3 x3, Mα3 = {1, 2, 2},
α4 = x1 x2 x3 x3 x3, Mα4 = {1, 1, 3},
α5 = x1 x1 x2 x2 x2, Mα5 = {2, 3},
α6 = x1 x2 x2 x2 x2, Mα6 = {1, 4},
α7 = x1^5, Mα7 = {5}.
Since we still have variable-free patterns in the underlying class, we cannot find any
teaching set containing a single positive example or a single negative example. Thus,
we aim at finding teaching sets of size two for the languages generated by the patterns
above. Obviously, there is no teaching set of size two for any L(αl), 2 ≤ l ≤ 7,
with respect to the underlying class, since L(α1) and all variable-free pattern
languages of length at least 5 are contained in ΠL≥5.
Therefore, in order to have a recursive teaching dimension of two for those languages we must place L(α1 ) first in a subclass teaching sequence.
Comparing the multi-set for each L(αl), 2 ≤ l ≤ 7, with Mα1, every Mαl can
be written as Mαl = {j1, . . . , jt}, where jk = Σ_{i∈Ik} i for all k ∈ {1, . . . , t} and
I1 ∪ . . . ∪ It is a partition of Mα1.
On the other hand, there is no such relationship between Mα3 and Mα4 . It turns
out that these two languages can be included in the same subclass in the subclass
teaching sequence. Obviously, this subclass comes after the subclass containing L(α2 )
and the subclass containing L(α1 ) in the subclass teaching sequence.
Definition 5.1. Let {i1, . . . , is} and {j1, . . . , jt} be multi-sets. {i1, . . . , is} ≺ {j1, . . . , jt}
iff there is a partition I1 ∪ . . . ∪ It of {i1, . . . , is} into multi-sets such that
jk = Σ_{i∈Ik} i for all k ∈ {1, . . . , t}.
Definition 5.2. For any pattern α ∈ Π, Mα denotes the multi-set of numbers of
occurrences of variables in α.
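The relation of Definition 5.1 can be checked by brute force for small multi-sets. The following sketch (the function names are ours, not the thesis's) tests whether one multi-set can be partitioned into blocks whose sums form the other, and then peels the seven multi-sets of Example 5.1 into levels, mirroring the order of the subclass teaching sequence: a multi-set is placed only after every strictly finer multi-set has been removed.

```python
from itertools import combinations

def refines(I, J):
    """Definition 5.1 as a predicate: can I be partitioned into len(J)
    blocks whose sums are exactly the elements of J? (Brute force,
    intended only for small multi-sets.)"""
    I, J = sorted(I), sorted(J)
    if sum(I) != sum(J):
        return False
    if not J:
        return not I
    target = J[-1]  # match the largest required block sum first
    for r in range(1, len(I) + 1):
        for idxs in combinations(range(len(I)), r):
            if sum(I[i] for i in idxs) == target:
                rest = [I[i] for i in range(len(I)) if i not in idxs]
                if refines(rest, J[:-1]):
                    return True
    return False

# The variable-occurrence multi-sets of alpha_1 ... alpha_7 in Example 5.1.
multisets = [(1, 1, 1, 1, 1), (1, 1, 1, 2), (1, 2, 2), (1, 1, 3),
             (2, 3), (1, 4), (5,)]

def levels(msets):
    """Repeatedly remove the multi-sets that have no strictly finer
    multi-set remaining -- the order in which the corresponding languages
    can be taught with two examples each."""
    remaining, out = list(msets), []
    while remaining:
        layer = [M for M in remaining
                 if not any(N != M and refines(N, M) for N in remaining)]
        out.append(layer)
        remaining = [M for M in remaining if M not in layer]
    return out

# (1, 2, 2) and (1, 1, 3) are incomparable, so they share a level,
# matching the observation about alpha_3 and alpha_4 in the text.
assert not refines((1, 2, 2), (1, 1, 3))
assert not refines((1, 1, 3), (1, 2, 2))
order = levels(multisets)
assert order[0] == [(1, 1, 1, 1, 1)]   # alpha_1 is taught first
```

Running `levels` on all seven multi-sets reproduces the ordering discussed in Example 5.1, with α1 first and α7 last.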
Lemma 5.1. Let Σ be any countably infinite alphabet. Let α ∈ Π and let Lα =
{L(β) | |α| = |β| and Mα ≺ Mβ} ∪ {L(β) | |α| = |β| and |Vα| ≥ |Vβ| and Mα ⊀ Mβ and Mβ ⊀ Mα} ∪ {L(β) | |β| > |α|}. Then the following two statements hold:
1. If Mα ≠ ∅, then TD(L(α), Lα) = 2. In particular, S = {(w1, +), (w2, +)} is a
teaching set for L(α) with respect to Lα, where |w1| = |w2| = |α| = n, the
sets {w1[i] | i ∈ Vα}, {w2[i] | i ∈ Vα}, and {α[i] | i ∉ Vα} are pairwise disjoint,
and ws[i] ≠ ws[j] for s ∈ {1, 2} and all 1 ≤ i < j ≤ n with α[i] ≠ α[j].
2. If Mα = ∅, i.e., if α ∈ Σ+, then TD(L(α), Lα) = 1. In particular, S =
{(α, +)} ∈ TS(L(α), Lα).
Proof.
1. In order to prove (1), we first show that there is no language in Lα \
{L(α)} that is consistent with S. Let β ∈ Π be such that L(β) ∈ Lα \ {L(α)}.
Then the following cases should be considered:
1.1. |β| > |α|. Then w1, w2 ∉ L(β) and thus L(β) is not consistent with S.
1.2. |β| = |α| and Mα ≺ Mβ. In this case, Mβ is obtained by summing the blocks
of a partitioning of Mα, which means |Vars(α)| > |Vars(β)|. Thus there
are i, j ∈ {1, . . . , n} such that α[i] ≠ α[j] but β[i] = β[j]. Hence, by
the choice of w1 and w2, L(β) is not consistent with S.
1.3. |β| = |α| and either Mα = Mβ, or Mα ⊀ Mβ and Mβ ⊀ Mα. Note that
then |Vα| ≥ |Vβ|, because L(β) ∈ Lα. We consider two subcases:
1.3.1. There is i ∈ {1, . . . , n} such that β[i] ∈ Σ and α[i] ≠ β[i]. In this
subcase, L(β) is not consistent with S.
1.3.2. For all i ∈ {1, . . . , n}, if β[i] ∈ Σ then α[i] = β[i]. It follows from
|Vα| ≥ |Vβ| that |Vα| = |Vβ|. Now we have to consider two subcases
again:
1.3.2.1. For all i, j ∈ {1, . . . , n}, if β[i] = β[j] ∈ X then α[i] = α[j]. Then
Mβ ≺ Mα, and hence Mα = Mβ. But then, since β[i] ∈ Σ implies
α[i] = β[i], we obtain L(α) = L(β), which is a contradiction.
1.3.2.2. There are i, j ∈ {1, . . . , n} with β[i] = β[j] ∈ X but α[i] ≠ α[j].
Then w1[i] ≠ w1[j], and, since |w1| = |β|, it follows that L(β) is
inconsistent with S.
Finally, no single labelled example can teach L(α) in this case: any positive
example (w, +) with w ∈ L(α) is also consistent with the language of the
variable-free pattern w, which belongs to Lα, and any negative example (w, −)
is consistent with the language of every variable-free pattern of length n other
than w. Hence TD(L(α), Lα) = 2.
2. To prove (2), suppose Mα = ∅. Then Lα contains only languages generated by
patterns longer than α and languages generated by variable-free patterns of
length |α|. All languages generated by patterns longer than α are inconsistent
with S, since every string they generate is longer than α. Moreover, every
language generated by a variable-free pattern β ≠ α is inconsistent with (α, +).
Therefore, TD(L(α), Lα) = 2 if Mα ≠ ∅ and TD(L(α), Lα) = 1 otherwise.
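The two positive examples of Lemma 5.1 can be illustrated concretely. In the sketch below (an encoding we choose for illustration, not part of the thesis), variables are modelled as strings like 'x1' and constants as integers drawn from an infinite alphabet; since the witnesses have the same length as the pattern, non-erasing substitution forces every variable onto a single symbol, which makes the membership test trivial.

```python
def make_witness(alpha, fresh):
    """Sketch of w1/w2 from Lemma 5.1: each distinct variable of alpha
    gets its own fresh symbol (so the witness has length |alpha|);
    constants are copied unchanged. `fresh` must supply symbols that
    occur neither in alpha nor in the other witness."""
    it, sub, w = iter(fresh), {}, []
    for s in alpha:
        if isinstance(s, str):          # variable position
            if s not in sub:
                sub[s] = next(it)
            w.append(sub[s])
        else:                           # constant position
            w.append(s)
    return tuple(w)

def member_same_length(beta, w):
    """Is w in L(beta), given |w| = |beta|? With non-erasing substitutions
    and equal lengths, each variable must map to exactly one symbol."""
    sub = {}
    for s, c in zip(beta, w):
        if isinstance(s, str):
            if sub.setdefault(s, c) != c:
                return False
        elif s != c:
            return False
    return True

alpha = ['x1', 'x2', 'x2', 0]           # 0 plays the role of a constant
w1 = make_witness(alpha, [101, 102])    # disjoint fresh-symbol pools
w2 = make_witness(alpha, [201, 202])
assert member_same_length(alpha, w1) and member_same_length(alpha, w2)

# Same occurrence multi-set, different variable arrangement: rejected.
beta = ['x1', 'x1', 'x2', 0]
assert not member_same_length(beta, w1)

# The variable-free pattern equal to w1 shows why a single positive
# example is not enough: it agrees with (w1, +) but not with (w2, +).
assert member_same_length(list(w1), w1)
assert not member_same_length(list(w1), w2)
```

The disjointness of the two fresh-symbol pools is exactly what rules out the variable-free pattern w1 in the last check, mirroring the argument for the lower bound of two in case (1) of the lemma.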
Theorem 5.2. Let Σ be any countably infinite alphabet. Then RTD(ΠL) = 2.
Proof. Let L−1 = ∅ and, for i ≥ 0, let
Li = {L(β) | β ∈ Π with [|β| ≤ |α| and Mα ⊀ Mβ] for all α such that L(α) ∈ ΠL \ ⋃_{j≤i−1} Lj and |α| ≥ |β|}.
As a subclass teaching sequence we choose
STS = ((L0, d0), (L1, d1), (L2, d2), . . .),
where, for all i ≥ 0 and all L ∈ Li,
di = TD(L, ΠL \ ⋃_{j≤i−1} Lj).
Note that, for any γ with L(γ) ∈ Li, the set ΠL \ ⋃_{j≤i−1} Lj corresponds to the
set Lγ in the formulation of Lemma 5.1. Thus, by Lemma 5.1, if Mγ ≠ ∅, then
TD(L(γ), ΠL \ ⋃_{j≤i−1} Lj) = 2, and otherwise TD(L(γ), ΠL \ ⋃_{j≤i−1} Lj) = 1.
Therefore, RTD(ΠL) = sup{di | i ≥ 0} = 2.
5.2 Regular Patterns
This section investigates whether or not an infinite alphabet reduces the number of
examples the teacher needs to reveal to teach the target language to the learner.
5.2.1 Teaching Dimension
Theorem 3.5 did not require finiteness of the underlying alphabet. Therefore, the
teaching dimension of the class of all regular pattern languages over infinite alphabets
is at least four.
Theorem 5.3. Let Σ be any countably infinite alphabet. Then TD(RΠL) ≥ 4.
Proof. Immediate from Theorem 3.5.
5.2.2 Recursive Teaching Dimension
Theorem 3.7 did not require finiteness of the underlying alphabet. Hence we immediately obtain that the recursive teaching dimension of the class of all regular pattern
languages over infinite alphabets is two.
Theorem 5.4. Let Σ be any countably infinite alphabet. Then RTD(RΠL) = 2.
Proof. Immediate from Theorem 3.7.
5.3 One-Variable Patterns
As we demonstrated, the class of one-variable pattern languages has infinite teaching
dimension over any countable alphabet. Similarly, Theorem 3.10, which establishes
recursive teaching dimension two, holds for any countable alphabet.
5.3.1 Teaching Dimension
Since there is no condition on the alphabet in Theorem 3.9, we refer to this theorem
to show that there is no finite upper bound on the number of labelled examples the
teacher needs to teach a language in the class of one-variable pattern languages over
an infinite alphabet.
Theorem 5.5. Let Σ be any countably infinite alphabet. Then TD(1V ΠL) = ∞.
Proof. Immediate from Theorem 3.9.
5.3.2 Recursive Teaching Dimension
Similarly, choosing an infinite alphabet does not affect the recursive teaching dimension for the class of all one-variable pattern languages, as shown in Theorem 3.10.
Theorem 5.6. Let Σ be any countably infinite alphabet. Then RTD(1V ΠL) = 2.
Proof. Immediate from Theorem 3.10.
Chapter 6
Conclusions
In conclusion, we summarize our results on learning interesting classes of pattern
languages over different alphabets using both the teaching protocol and the recursive
teaching protocol, and we address some open problems concerning the learning of
certain subclasses of pattern languages.
This thesis documents the first research to have studied the learning of infinite
classes using the recursive teaching protocol. Furthermore, this thesis has found the
first example of classes of pattern languages that have infinite teaching dimension
while being learnable with a finite number of examples using the recursive teaching
protocol. Our proofs, in particular the subclass teaching sequences we provide, give
insights into the structure of pattern languages and could potentially be useful for
future research in various contexts.
Since this thesis studied the learning of the class of non-erasing pattern languages
using the recursive teaching protocol, one possible direction for future research is to
study whether or not the recursive teaching protocol decreases the sample complexity
for the class of erasing pattern languages when compared to the teaching protocol.
6.1 Arbitrary Patterns
In this thesis, we have investigated learning the class of all pattern languages using
both the teaching protocol and the recursive teaching protocol. We showed that using
the recursive teaching protocol improves the teaching dimension in some cases.
While we have not found a proof of improvement using the recursive teaching
protocol for finite alphabets, our results show that fewer labelled examples are needed
to teach every language in this class over infinite alphabets when using the recursive
teaching protocol. In fact, there is no finite upper bound on the number of labelled
examples the teacher needs to teach pattern languages using the teaching protocol
while the recursive teaching dimension for the underlying class is 2.
Moreover, we proved that the size of the alphabet does not play any role with respect
to the teaching dimension of the class of pattern languages: this class has infinite
teaching dimension over both finite and infinite alphabets. In learning pattern
languages over singleton alphabets, we obtained a recursive teaching dimension of 2 for
languages generated by patterns of length at most 8. Unfortunately, we have not
been able to determine an upper bound on the recursive teaching dimension for the
class of pattern languages over singleton alphabets; this remains an open problem.
6.2 Regular Patterns
For learning the class of regular pattern languages, we have shown that the underlying
class has a recursive teaching dimension smaller than its teaching dimension over
singleton alphabets. In order to teach each language in the class of regular pattern
languages, the teacher needs to provide the learner with only 2 examples using the
recursive teaching protocol, while three labelled examples are needed using the
teaching protocol.
Moreover, we have proven that the size of the alphabet does not have any effect
on the recursive teaching dimension for the underlying class. In fact, the recursive
teaching dimension is 2 over both finite and infinite alphabets.
We have found a regular pattern language that has teaching dimension 4 over
finite alphabets of at least two symbols. Thus, we have shown that the teaching
dimension of the class of regular pattern languages over such alphabets is at least 4.
However, using the recursive teaching protocol the teacher needs only two examples
to teach each regular pattern language. Unfortunately, we have not determined an
upper bound on the teaching dimension for the class of regular pattern languages over
infinite alphabets.
6.3 One-Variable Patterns
For learning the class of one-variable pattern languages, we have shown that the
size of the alphabet affects neither the teaching dimension nor the recursive teaching
dimension. However, there is a clear improvement in using the recursive teaching
protocol when compared to the teaching protocol.
Our results show that the teacher needs to present only two labelled examples
when using the recursive teaching protocol for teaching one-variable pattern
languages over both finite and infinite alphabets. However, there is no finite upper
bound on the teaching dimension for the underlying class over either finite or infinite
alphabets.
Therefore, using the recursive teaching protocol optimizes the number of labelled
examples needed to identify each language of the class of one-variable pattern languages.
Table 6.1 provides an overview of the results we have obtained in this thesis.
Class                   2 ≤ |Σ| < ∞        |Σ| = 1            |Σ| = ∞
Arbitrary Patterns      TD = ∞, RTD ≥ 2    TD = ∞, RTD ≥ 2    TD = ∞, RTD = 2
Regular Patterns        TD ≥ 4, RTD = 2    TD = 3, RTD = 2    TD ≥ 4, RTD = 2
One-Variable Patterns   TD = ∞, RTD = 2    TD = ∞, RTD = 2    TD = ∞, RTD = 2
Table 6.1: An overview of teaching dimension and recursive teaching dimension for
learning classes of languages generated by arbitrary patterns, regular patterns and
one-variable patterns over finite and infinite alphabets.
6.4 Limitations and Open Problems
All the techniques and theorems we have presented are useful for studying the class of
non-erasing pattern languages. For the class of erasing pattern languages, our
techniques might be helpful, but they are not sufficient for studying the teaching of
the full class of erasing pattern languages or of some of its interesting subclasses.
Moreover, we address some problems on learning pattern languages with the
classic teaching protocol and the recursive teaching protocol that are still open. We
have proven that the teacher needs a smaller number of examples to teach some
classes of pattern languages using the recursive teaching protocol, but there remain
some questions for future research.
Open problem 6.1. What is the recursive teaching dimension for the class of arbitrary pattern languages over finite alphabets of size at least two?
In fact, we are interested in finding a best possible subclass teaching sequence for
the class of arbitrary pattern languages, since such a sequence could also give new
insights into the structure of pattern languages.
Open problem 6.2. What is the recursive teaching dimension of the class of arbitrary pattern languages over singleton alphabets?
We have found a subclass teaching sequence for arbitrary patterns of length at
most 8 but we were not able to find a particular subclass teaching sequence that is
applicable to the whole class.
Open problem 6.3. What is the teaching dimension of the class of regular pattern
languages over finite alphabets of size at least two?
We know that the teaching dimension is at least 4 for the class of regular pattern
languages over finite alphabets of size at least two, but we are interested in finding a
tight upper bound for it.
Open problem 6.4. What is the teaching dimension for the class of regular pattern
languages over infinite alphabets?