Automatic Construction of Concept Maps

Belinda Ng’asia Wafula
Master’s Thesis
Faculty of Science and Forestry
School of Computing
May 2016
UNIVERSITY OF EASTERN FINLAND, Faculty of Science and Forestry, Joensuu
School of Computing
Student, Belinda Ng’asia Wafula: Automatic Construction of Concept Maps
Master’s Thesis, 82 p., 7 appendices (20 p.)
Supervisor of the Master’s Thesis: PhD Wilhelmiina Hämäläinen
May 2016
Abstract:
A concept map is a graphical representation of the concepts and relations of some knowledge domain as understood by the user. A common issue is the difficulty of creating and evaluating different concept maps. Not even a human expert can say for certain what a "correct" concept map should look like, hence the need for semi-automatic or automatic concept map generation. In this thesis, we give a literature review of different automatic and semi-automatic methods for constructing concept maps. Then we introduce a new automatic method for constructing concept maps. The heuristic applied to extract concepts is term occurrence; a similar principle is applied to extract relations. Initial results show that sensible concepts and nouns occur more frequently in the test material, and that more sensible relations between concepts also occur more frequently in the text. With syntactic analysis and auxiliary ontologies, term occurrence can be seen as a viable approach to constructing fully automatic concept maps.
Keywords: Concept maps, automatic construction, algorithm, text material
CR Categories (ACM Computing Classification System, 1998 version): K.3.1 Computer Uses in Education, H.3.1 Content Analysis and Indexing, I.2.7 Natural Language Processing
Acknowledgement
The start and completion of this thesis would not have been possible without the
abundant support from a number of people.
First and foremost, I would like to express my utmost gratitude to my supervisor, PhD Wilhelmiina Hämäläinen, for the endless support, guidance and encouragement offered over the course of my study.
I would also like to acknowledge Professor Pasi Fränti of the School of Computing, University of Eastern Finland, as the second examiner of this thesis, and I am gratefully indebted to him for taking the time to review it.
Last but not least, I would like to express my gratitude to my parents, siblings
and friends for their support and continuous encouragement throughout the years,
without which, this would not have been possible. Thank you.
List of Abbreviations and Symbols
ACMC          Automatic Concept Map Constructor
DM            Knowledge Discovery in DB (first three chapters)
TFCS          Theoretical Foundations of Computer Science
SW            Scientific Writing material
s             sentence
wi            word
Mw            total number of extracted words
ci            concept (represented by a word or a group of words)
m(wi)         absolute frequency of word wi (the number of times wi occurs in the text)
mrel(wi)      relative frequency of word wi
m(ci)         absolute frequency of concept ci (the number of times ci occurs in the text)
minc          threshold for concepts
wi wj         consecutive words corresponding to a compound concept
m(wi wj)      absolute frequency of consecutive words (compound concept); the number of times the compound concept occurs in the text
mincc         frequency threshold for compound concepts
ms(ci, cj)    number of times ci and cj occur in the same sentence in the text (absolute frequency)
mrel(ci, cj)  relative frequency of co-occurrence of concepts (ci, cj) in a sentence
Mr            total number of extracted co-occurring concepts in a sentence
minr          threshold for co-occurrence of concepts in a sentence
Contents

1 Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Organization of the thesis

2 Concept maps
  2.1 An overview of concept maps
  2.2 Applications and uses of concept maps

3 Semi-automatic construction of concept maps
  3.1 Textstorm
  3.2 Clouds
  3.3 CmapTools
    3.3.1 Suggesters for concepts
    3.3.2 Suggesters for propositions, concept maps and multimedia resources
    3.3.3 Suggesters for relevant topics
  3.4 Semi-automatic construction of topic ontology
  3.5 Comparisons

4 Fully automatic construction of concept maps
  4.1 GNOSIS
  4.2 Relex
  4.3 Concept Frame Graph
  4.4 Using concept maps in digital libraries as a cross-language resource discovery tool
  4.5 Identifying and extracting relations in text
  4.6 Leximancer
  4.7 Two-phase concept map construction
  4.8 Related systems
  4.9 Comparisons

5 ACMC: An automatic concept map constructor
  5.1 Overview
  5.2 Concept extraction
    5.2.1 Extracting and counting the frequency of words
    5.2.2 Pruning stop words
    5.2.3 Plurals
    5.2.4 Pruning infrequent concepts
  5.3 Identifying compound concepts
  5.4 Relation extraction
  5.5 Development ideas

6 Tests and Experiments
  6.1 Test cases
  6.2 Data material
  6.3 Test measures
  6.4 Results
    6.4.1 Overview of ACMC concept maps
    6.4.2 ACMC concept maps
  6.5 Discussion

7 Conclusions

Appendices
A DM book concept map produced by Leximancer
B TFCS concept map produced by Leximancer
C Scientific Writing concept map produced by Leximancer
D Stop words list
E Irregular plurals
F Hand-made concept map from TFCS
G Hand-made concept map from Scientific writing material
H Hand-made concept map from Data mining material

References
Chapter 1
Introduction
Concept maps have been defined as graphical representations of concepts and their
inter-relationships that are intended to represent the knowledge structure that humans store in their minds [NG84].
In this thesis we give a systematic overview of existing approaches for automatic and
semi-automatic concept map generation. We also introduce a new fully automatic
method for generating concept maps from text based on frequencies of occurrence
and co-occurrence of concepts.
1.1 Motivation
For educational purposes, concept maps are used as a learning tool for students. An effective concept map can be considered one that is easily understood by a second party. Constructing an effective concept map is sometimes considered a complex task, as users may find it difficult to remember some concepts of a certain topic, hence the need for automatic construction of concept maps. By constructing concept maps, students can get an overview of a given topic. Teachers can also use concept maps as a reference tool to see if all relevant relations are represented in their material.
1.2 Objectives
The first objective of this research is to investigate computational approaches applied in the construction of concept maps from text. We concentrate on automatic methods for constructing concept maps, although we also review some semi-automatic methods based on interaction with the user.
The second objective is to develop and evaluate a new fully automatic method for constructing concept maps from text-based learning material. We design, implement and test the system. The new algorithm employs the frequency of occurrence of a term in a text to extract and select salient concepts. Potential relations are identified if extracted concepts occur in the same sentence.
Lastly, we evaluate the frequency-based approach used to construct concept maps automatically and how it works with different kinds of texts.
1.3 Organization of the thesis
The organization of this thesis is as follows: We begin in Chapter 2 by presenting the
notion of concept maps, their uses, types and how they are constructed. Chapters 3
and 4 describe semi-automatic and automatic methods of constructing concept maps
respectively. In Chapter 5 we introduce a new method for constructing concept maps
from text. The experiments are reported in Chapter 6 and the final conclusions are
drawn in Chapter 7.
Chapter 2
Concept maps
This chapter briefly describes the theoretical foundations of concept maps and summarily reviews their uses and applications.
2.1 An overview of concept maps
Novak [NG84] defines concept maps as "representations of concepts and their interrelationships that are intended to represent the knowledge structure that humans store in their minds". A concept map can also be described as a graphical representation of the user's knowledge in a given domain [MSS99]. A concept is described as some regularity within a group of facts and is designated by some sign or symbol. Usually, concepts are represented by words or word groups (especially nouns and noun phrases). Novak gives the example of a "chair", which is a label/sign for an instrument with four legs, a surface to sit on and a back to rest against.
Concept maps are composed of concepts which are enclosed in circles or boxes
(nodes) and relations between concepts, indicated by a connecting line linking two
concepts. Words or groups of words (typically verbs) on the connecting line depict
a labeled relation, and are known as linking phrases [SKUP+ 04]. Relations between
concepts can be represented as an unlabelled line between two nodes (Figure 2.1), a
labelled line describing the relationship (Figure 2.2), an arrow showing the direction
of the relationship between the concepts (Figure 2.2) or a line and a special symbol
at the end of the line showing the type of relationship (Figure 2.3).
Figure 2.1: An unlabelled and non-directional relationship between two concepts.
Figure 2.2: Labelled and directional relationships between concepts.
The process of constructing concept maps begins with identifying a familiar domain. The topic of a concept map can be a text or a particular problem or question to focus on. The next step involves identifying key concepts, from the most general and inclusive to the less inclusive concepts that apply to the domain in focus. The last step is to identify relations between concepts and to find the appropriate words to describe the relations, so that meaningful propositions are formed [NC06].
In its simplest form, a concept map has two nodes and a connecting line between them. Types of concept maps can be viewed from two extremes: hierarchical or tree-structured maps and mind maps. Novak [NC06] deems hierarchical concept maps to be ideal. They are constructed in a tree-like manner, with the more general concepts at the top of the map and the more specific concepts hierarchically at the bottom. On the other hand, mind maps are constructed freely from a key idea, allowing any kind of association. Figure 2.4 shows an example of a hierarchical concept map and Figure 2.5 shows an example of a mind map.

Figure 2.3: Types of relationships between concepts.
In some cases, concept maps consist of extensions that clarify and complement the
concepts. Such extensions include resources such as Web pages, pictures, examples
and text in the concept map [nHC+ 04].
2.2 Applications and uses of concept maps
Concept maps have been widely used in education. They have been demonstrated
to be a successful instructional tool to help learners in their understanding process.
Concept maps are popular as they aid in creative thinking, knowledge extraction,
planning, note taking, summarization [SRF03], idea generation, knowledge creation
[AKM+03] and as assessment [HBN96] and evaluation tools [MMJ94]. Concept maps can also be used to summarize papers. According to [RF05], a concept map can be as good a summary as an abstract, and is easier to automatically prepare and translate than a written abstract.
David et al. [DSB] have used concept maps and concept questions at university level to help engineering students in their conceptual understanding of the discipline and stimulate thinking. In [WSL06], concept maps have been used in searching through historical archives. These maps provide a representation of the important retrieved entities, which might be used in later searches.

Figure 2.4: Sample concept map of a concept map [NC06].

Figure 2.5: An example of a mind map representing the author's understanding of an Educational Technology course.

Maria [Jak03] demonstrated the application of concept maps in conjunction with practical and cognitive apprenticeships
to teach and improve programming skills in holistic learners. The use of concept
maps proved to stimulate meaningful learning in undergraduate medical students
taking a PBL (problem-based learning) course [RFP06]. McClure et al. [MSS99] researched the use of concept maps to assess learners' knowledge of certain concepts.
The use of concept maps is not restricted to education; they are also used in business planning, public administration and the health sector, among others. Concept maps have been employed in community mental health [JBS00] for program planning and evaluation purposes. Compared to other knowledge elicitation tools, concept mapping is considered an efficient method for generating models of domain knowledge [HCCN02]. When integrated with other systems, concept maps have been used as interfaces for intelligent software (i.e., knowledge-based systems and tutoring systems) in various domains [CCH+03].
From an educational instructor's point of view, concept maps can be used to reveal a learner's understanding or misconceptions [RRS98] of a certain knowledge domain. There are no "correct" concept maps, but often the teacher's concept map is used as a reference map [dRdCJF04]. However, a teacher's map reflects the teacher's way of thinking. For a more objective map, a different construction approach is needed. Automatically constructed concept maps are less biased, easy to generate and can be used as reference maps. Hideo et al. [FYI02] developed a concept mapping software that "supports the externalization of ideas, reflection on thinking processes and dialogues" by permitting several users to collaboratively construct one concept map. Several tools, such as CmapTools [LMR+03], Clouds [POC00], Leximancer [SH05] and GNOSIS [GS94], attempt to construct concept maps automatically or in interaction with the user.
In summary, a concept map is a type of knowledge representation for developing mental schemas or mind maps that act as a reference for future actions and thinking [BB00]. Concept maps can be applied in different areas and are not limited to the education field.
A common issue arising is the difficulty in evaluating different concept maps. Not even a human expert can say for certain what a "correct" concept map should look like. Therefore, it can be hypothesized that an automatically constructed concept map has a reduced degree of bias compared to a manually constructed one.
Chapter 3
Semi-automatic construction of concept maps
Semi-automatic construction of concept maps is an approach where a software tool is used to create concept maps with the help of the user.

In this chapter, we review four tools dedicated to assisting in the process of constructing concept maps: Clouds [POC00], Textstorm [APC01], CmapTools [LMR+03], and a tool for semi-automatically constructing topic ontologies [FMG05]. These methods suggest elements (concepts, topics or relations) based on a given domain. As these methods are semi-automatic, we also discuss the role of the user in the construction process. In the sections below, we present the algorithms used in the existing methods for extracting and suggesting elements, and give a short summary and comparison of the introduced tools.
3.1 Textstorm
With no prior knowledge about the domain in focus, Textstorm [APC01] parses and tags a text file, producing binary predicates (e.g., "eat(cow, plants)"). The system feeds the output into another system, Clouds [PC00].
Textstorm tags a text file using WordNet [MBF+90]. The predicates built by parsing sentences map relations between two concepts. Since concepts in a text are not always referred to by the same name, Textstorm uses the synonymy relationship from WordNet to find concepts previously referred to with a different name. In Textstorm, relations are identified as verbs in a sentence, with the subject as the first concept and the object (verbal phrase) as the second concept in the predicate (e.g., from the sentence "Jupiter is a big planet", Textstorm builds the predicate "isa(Jupiter, big)"). The resulting predicates act as inputs to Clouds [PC00], a system that, through interaction with the user, builds a complete concept map.
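The "Jupiter is a big planet" example can be imitated with a toy pattern matcher. The regex sketch below only covers the exact sentence shape of that example; Textstorm itself relies on a full parser and WordNet rather than anything this simple.

```python
import re

def build_isa_predicate(sentence):
    """Toy sketch of Textstorm-style output: match 'X is a Y Z' and
    emit isa(X, Y), the predicate form quoted in the text."""
    match = re.match(r"(\w+) is a (\w+) \w+", sentence)
    return f"isa({match.group(1)}, {match.group(2)})" if match else None

print(build_isa_predicate("Jupiter is a big planet"))  # isa(Jupiter, big)
```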
3.2 Clouds
Clouds [POC00] is a program that suggests concepts and relations to the user. The user first "feeds" the program with the basic concepts in the domain, providing the concepts to be focused on based on the questions posed by Clouds. Three algorithms are used: one selects which concepts to work with in the concept map, and the other two, based on inductive learning, suggest concepts and relations for the concept map.

First, the domain knowledge is given as an ontology of primitive concepts (an isa-tree). The first three levels of the tree are fixed, but the user can then add new concepts to the tree. Figure 3.1 shows a graphical representation of the first three levels provided to Clouds as the ontology base.
The tasks performed by Clouds are as follows:
1. Clouds starts by selecting the most relevant but not fully explained concepts
to work with from the map. The relevance of a concept is defined as follows:
Definition 1
Rel(c1, c2) = number of separate paths between c1 and c2
AbsRel(c1) = Σ_ci Rel(c1, ci)
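Definition 1 can be sketched in code. [POC00] does not spell out what counts as a "separate" path, so the sketch below assumes simple (non-revisiting) paths in a directed graph; the dictionary encoding of the map is likewise an illustrative assumption.

```python
def rel(graph, c1, c2):
    """Rel(c1, c2): number of simple paths from c1 to c2."""
    def count_paths(node, visited):
        if node == c2:
            return 1
        return sum(count_paths(nxt, visited | {nxt})
                   for nxt in graph.get(node, ()) if nxt not in visited)
    return count_paths(c1, {c1})

def abs_rel(graph, c1):
    """AbsRel(c1): sum of Rel(c1, ci) over all other concepts ci."""
    return sum(rel(graph, c1, ci) for ci in graph if ci != c1)

g = {"tree": ["fruit", "plant"], "plant": ["fruit"], "fruit": []}
print(rel(g, "tree", "fruit"), abs_rel(g, "tree"))  # 2 3
```

In the toy graph there are two paths from "tree" to "fruit" (direct, and via "plant"), so "fruit" contributes more to AbsRel("tree") than "plant" does.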
Figure 3.1: First three levels of the ontology base used in Clouds.
2. The program aims to find, for each relation, components that are related
by the given relation. With the help of the isa-tree, the program analyzes
the existing relation, establishing which categories are typically linked by the
relations. For instance,
Example 1 If it is observed that:
produce(apple tree, apple)
produce(pear tree, pear)
and from the domain knowledge, it is known that
isa(pear tree, tree)
isa(apple tree, tree)
isa(pear, fruit)
isa(apple, fruit)
a generalization is deduced:
produce(tree, fruit)
Generalization is defined as obtaining categories up the tree, while specialization is obtaining categories down the tree. Clouds searches for the most general specializations that "avoid" this new observation. The results of the search are split into binary predicates that represent the pairs of categories of the arguments that cover the positive examples. The final step involves explaining relations in a given context. The algorithm used to learn the relations is based on inductive logic programming [MF90]. In the map, the context is defined as the relations each argument has with other concepts, up to a predefined depth. In this phase, generalization may happen in two ways, universal quantification or dropping of a term, illustrated in Figures 3.2 and 3.3 [POC00]. The dropping of a term occurs when a new observation reflects an over-specialization of a clause.
Figure 3.2: Joining the contexts leads to a generalization.
Figure 3.3: Universal quantification of the first argument of eat.
Specialization occurs when a negative example is given. Clouds selects clauses that "cover" the example and finds all positive declared predicates covered by the selected clauses. A new hypothesis is generated by adding to the previous clause terms that are satisfied by the positive example.
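The generalization deduced in Example 1 can be sketched as follows. The isa-tree is encoded as a child-to-parent dictionary, an illustrative simplification of Clouds' ontology, and the function only lifts arguments one level, whereas Clouds searches the whole tree.

```python
def generalize(observations, isa_parent):
    """Lift each argument of the observed relation instances to its
    isa-parent; a generalization holds if all instances agree."""
    lifted = {(isa_parent.get(a, a), isa_parent.get(b, b))
              for a, b in observations}
    return lifted if len(lifted) == 1 else None

produces = {("apple tree", "apple"), ("pear tree", "pear")}
isa_parent = {"apple tree": "tree", "pear tree": "tree",
              "apple": "fruit", "pear": "fruit"}
print(generalize(produces, isa_parent))  # {('tree', 'fruit')}
```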
3.3 CmapTools
CmapTools [LMR+03] is a tool that allows the user and the program to interactively construct concept maps. CmapTools allows integration of other multimedia resources into the concept maps. Starting from an incomplete concept map, CmapTools suggests concepts, propositions, other concept maps and new topics to the user.
3.3.1 Suggesters for concepts
In CmapTools, the concept suggester is a module for searching and suggesting new concepts [CCA+02] to be used in the concept map.
Based on the current map, the system mines the Web for relevant documents,
which are cached to be used for mining concepts at a later stage. The current
map is converted into a text query, which is used to retrieve additional relevant
documents. From the collected documents, a search is made for the documents
related to the current concept map.
To find concepts to suggest, the system searches the retrieved documents for concepts already in the map. For each concept found, neighboring words, defined by a distance threshold (currently 3 words), are identified as potential concept suggestions. The frequency of the terms is used to determine the concepts to be suggested.
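A minimal sketch of this neighboring-word heuristic, assuming whitespace tokenization and plain frequency ranking; the real suggester works on cached Web documents with more careful text processing.

```python
from collections import Counter

def suggest_concepts(documents, map_concepts, window=3, top=5):
    """Collect words within `window` positions of any concept already
    in the map (the 3-word distance threshold from the text) and rank
    candidates by raw frequency."""
    candidates = Counter()
    for doc in documents:
        words = doc.lower().split()
        for i, w in enumerate(words):
            if w in map_concepts:
                nearby = words[max(0, i - window):i + window + 1]
                candidates.update(x for x in nearby
                                  if x not in map_concepts)
    return [w for w, _ in candidates.most_common(top)]

docs = ["concept maps support meaningful learning",
        "concept maps aid meaningful learning"]
suggestions = suggest_concepts(docs, {"maps"})
```

Words that repeatedly appear near a mapped concept ("meaningful", "learning") rise to the top of the candidate list, while concepts already in the map are excluded.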
3.3.2 Suggesters for propositions, concept maps and multimedia resources
This part of the system applies case-based reasoning [Kol93, Lea96] to provide
proposition and concept map suggestions by analyzing prior knowledge models.
When a user wants to "extend" a concept map, the system views the original map and prior concept maps as examples of how that concept was extended in the past. A category index, computed from the concept map library, is used to organize concept maps into a hierarchical structure. The index of each category maintains references to the original map and a cluster representative. The cluster representative is used to determine if a new concept map is related to the maps in the category.
Concept map similarity is computed from a vector representation of the concept maps. The system assigns higher weights to keywords from the top of a concept map and lower weights to those from the bottom. The weight of a keyword i of concept k in map Cj is computed as:

w_ijk = freq_ijk × (αn + βm) × (1 / (d + 1))^δ

where Cj is a concept map of the library of maps L and freq_ijk is the raw frequency of keyword i in the label of concept k.

The total weight of keyword i in Cj is the sum of the weights w_ijk over all concepts k in map Cj.
Users can initiate a new search for concepts or multimedia resources by selecting the concepts for which extensions are sought. The suggester converts the map into a vector representation and extracts keywords selected by the user or the suggester. The keywords are used to search for suggestions in a case, while the vector is used to perform a binary search for the best-fitting category.
The extracted suggestions are ranked by means of a keyword correlation metric, which is based on the distance between concepts within a concept map. The distance-based correlation between keywords i and j is computed as:

M_χ(i, j) = (1 / (|Θ_i| + |Θ_j|)) × Σ_{C ∈ (Θ_i ∩ Θ_j)} 2 / D_C(i, j)

where Θ_i and Θ_j are the sets of maps in χ containing keywords i and j, and D_C is computed as 1 + the minimum number of links between concepts containing i and j.

From the keywords (i, j), the rank is computed by taking i from the potential suggestions and j from the selected concepts in the map being constructed.
3.3.3 Suggesters for relevant topics
EXTENDER (EXtensive Topic Extender from New Data Exploring Relationships) is a module that suggests novel topics, presented as small collections of terms, to be included in the knowledge model. The approach mines the Web using information automatically gained from the current concept map. The system takes the knowledge model as input and mines the Web for topics related to the current model. New information is used to guide further searches, with each generated topic specified by a set of weighted terms. EXTENDER goes through the following steps to generate topics:
1. Apply topological analysis to convert the concept map to a vector form and generate the initial corpus
2. Combine weighted terms to produce the first generation of artificial topics
3. Repeat steps 4-10 until the final generation of topics
4. Define a similarity threshold using the diversity factor
5. Define the context for the search using
6. Generate queries for a Web search engine
7. Filter irrelevant results using the context and similarity threshold
8. Identify relevant novel keywords and update the corpus
9. Use the diversity factor to integrate returned results with prior information and complete the term-web page matrix
10. Apply term clustering to the term-web page matrix to obtain a new generation of artificial topics.
3.4 Semi-automatic construction of topic ontology
Ontologies can be interpreted as a special case of concept maps; a topic ontology is one example. A topic ontology is defined as a set of topics connected with different types of relations [FMG05]. The method proposed in [FMG05] applies Latent Semantic Indexing (LSI) [DDF+00] and K-Means clustering [JMF99] to discover and suggest topics within a corpus.

The text documents are first converted into a vector representation using the standard Bag-of-Words (BOW) model and TFIDF weighting [Sal91]. Cosine similarity, the similarity between two documents, is computed as the cosine of the angle between their vector representations.
Topics are extracted from documents using LSI, which applies Singular Value Decomposition (SVD) to the BOW representation to detect words with similar meanings. The K-Means clustering algorithm is used to cluster documents that share similar words.
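The BOW/TFIDF representation and cosine similarity can be sketched in plain Python. The tf and idf variants below (raw counts, log(N/df)) are common choices but not necessarily the exact weighting of [FMG05], and the LSI/SVD and K-Means steps are omitted.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Bag-of-Words with TFIDF weighting: tf = raw term count,
    idf = log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(w for words in tokenized for w in set(words))
    n = len(docs)
    return [{w: count * math.log(n / df[w])
             for w, count in Counter(words).items()}
            for words in tokenized]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vecs = tfidf_vectors(["topic ontology construction",
                      "topic ontology learning",
                      "concept map drawing"])
```

Documents sharing weighted terms ("topic ontology ...") end up with a positive cosine similarity, while documents with no common terms score zero, which is exactly the signal the clustering step groups on.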
Keywords are extracted from documents using two methods. The first is keyword extraction using the centroid vector of a topic; in this context, the centroid is the sum of all vectors of the documents inside the topic, and keywords are selected based on the weights in the centroid vector. The second method uses a Support Vector Machine (SVM) binary classifier [Joa99]. The authors use the following example to illustrate this method. Suppose A is a topic to be described with keywords. All documents with A as a subtopic are marked as negative, and documents from topic A are marked as positive. If a document has both positive and negative marks, it is marked as positive. An SVM classifier is used to classify the centroid of topic A. Keywords are the words whose weights in the SVM normal vector contribute most when deciding whether the centroid is positive.
In recent years, other systems related to semi-automatic construction of concept maps, by building and learning ontologies, have been developed. SOAT [WH02] uses part-of-speech structure and rules for the Chinese language to extract concepts and relations. Onto-Learn [NVG03], ASIUM [SZL14] and Adaptiva [BCW02] use similar methods: they apply linguistic patterns and machine learning in the extraction process. Text-To-Onto [MS01, MS04] employs different extraction approaches and combines the results to support the construction of ontologies.

Table 3.1: Screen showing the relations found for a concept.
3.5 Comparisons
Chapter 4
Fully automatic construction of concept maps
Fully automatic construction of concept maps means that the system constructs a concept map from a source, for instance a text document, without the user being involved in the process. In this chapter, we present an overview of the different approaches to automatic concept map construction, discussing existing works and applications that have contributed to progress in the field.

We will then compare the different applications discussed. We look at the different forms of resources and initial inputs used in the construction of concept maps, the different methods and approaches utilized in the process, and the final products produced by each of the discussed applications.
4.1 GNOSIS
Gaines and Shaw [GS94] developed a system called GNOSIS. This system automatically produces concept maps purely based on the occurrence of words in a sentence, a technique commonly used in information retrieval systems [CLR86]. The system is able to extract related concepts but does not label the relations found. Unfortunately, not much has been documented on the algorithms used to extract the concepts and their unlabeled relations in GNOSIS.
4.2 Relex
A somewhat more sophisticated approach to generating concept maps automatically was taken by Richardson and Goertzel [RGFP06]. The Relex tool uses grammatical analysis to extract noun phrases and noun-verb-noun relations. Relex uses template matching algorithms to convert syntactic dependencies into graphs of semantic primitives [Wie96]. Relex converts passive and active forms into the same representation and assigns tenses and numbers to sentence parts. The system uses CMU's link parser [ST93] and WordNet for morphological functions [Fel98].
4.3 Concept Frame Graph
In [RT02], a collection of documents is represented as a special kind of concept map, called a concept frame graph. The nodes of the graph are described as concept frames.

Definition 2
A concept frame is an object represented as [NAME, SYNSET, RELS, CONTEXTS], where NAME is the name of the concept, SYNSET is a set of synonyms of the concept, and RELS is a set that describes the relations of the concept with other concepts. Each relation is represented as a tuple (AgentCF, REL, ObjectCF), where AgentCF and ObjectCF are pointers to concept frames and REL is the relation between them. CONTEXTS is an optional set of text segments corresponding to each relation tuple in RELS.
The concept frame graph is constructed in the following steps:
• Pre-processing: menu bars or formatting specifications are removed from
documents.
Figure 4.1: Graphical representation of a concept frame.
• Named entity recognition: all entities are identified using a co-occurrence resolution algorithm [ZS01] and are then extracted from the documents.
• Grammatical analysis: parts of speech are tagged, using a set of self-devised rules, resulting in an NVN 3-tuple (NC, VC, NC), where NC is a noun clause and VC is a verb clause.
• Word sense disambiguation: this step involves sense disambiguation of the
extracted noun clauses. A handcrafted algorithm, using WordNet [Fel98],
picks the correct word sense based on the context of the noun clause.
• Clustering: a fuzzy ART [CGR91] based clustering algorithm is applied to cluster the disambiguated noun clauses. For this purpose, the noun clauses are first converted to vectors. All key terms are extracted from the part-of-speech information to form a weight vector c = (c1, c2, ..., cm), where m denotes the number of features extracted and ci the term frequency of term i, i = 1, ..., m.
The vector is then normalized by dividing all elements by maxi ci.
• Frame filling: a collection of cluster members forms a SYNSET. RELS are formed by generalizing the NVN 3-tuples. Sentence fragments corresponding to the relation tuples are collected to form the CONTEXTS. The name of the frame is established as the most dominant member of the SYNSET.
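The vectorization in the clustering step can be sketched as follows. The vocabulary and the term counts are invented examples; dividing by the maximum component follows the normalization described above.

```python
from collections import Counter

def to_weight_vector(noun_clause_terms, vocabulary):
    """Build the term-frequency weight vector c = (c1, ..., cm) for one
    noun clause, then normalize it by its maximum component."""
    counts = Counter(noun_clause_terms)
    c = [counts[t] for t in vocabulary]
    peak = max(c) or 1          # avoid division by zero for empty clauses
    return [ci / peak for ci in c]

vocab = ["association", "rule", "mining"]
print(to_weight_vector(["association", "rule", "rule"], vocab))  # → [0.5, 1.0, 0.0]
```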
4.4 Using concept maps in digital libraries as a cross-language resource discovery tool
Richardson and Fox [RF05] used an approach to grammatical analysis somewhat similar to that of [RT02]. The notion of part-of-speech is applied to find noun phrases in electronic theses and dissertations (ETDs). These are extracted using the MontyTagger [Liu03] program and are used as nodes in the concept map. Verbs or prepositions are used as the links between the nodes. The selection of the linking word is based on its frequency of occurrence in the document compared to its frequency in the language as a whole. They had two methods for selecting concept maps. In the first, only the most important concepts, judged against the overall document, were selected and included in the concept map; they also built maps based on each chapter of the document. The nodes of the maps were identified based on their part-of-speech. The second method used chapter and section headings as a skeleton concept map, then selected terms based on how frequently they appear together with words in chapter and section headings. Several relation extraction methods were tested. Pearson's chi-squared, Dice's coefficient and mutual information were found not to be ideal, as they favored uncommon terms in the text. Association rules [CGR91] and t-scores produced relevant relations.
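As a rough illustration of why t-scores favor genuinely associated terms, the standard collocation t-score (not necessarily the exact formula used in [RF05]) compares the observed pair frequency with the frequency expected under independence; the counts below are invented.

```python
import math

def t_score(pair_freq, freq1, freq2, n_words):
    """Collocation t-score: (observed - expected) / sqrt(observed),
    where expected = freq1 * freq2 / n_words under independence."""
    expected = freq1 * freq2 / n_words
    return (pair_freq - expected) / math.sqrt(pair_freq)

# A pair seen 30 times where independence predicts about 2 scores high:
print(round(t_score(30, 200, 300, 30000), 2))  # → 5.11
```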
4.5 Identifying and extracting relations in text
Other similar research has been done by Roy and Yael [BR]. A collection of extractors, referred to as Textract, identifies terms such as people's names, places, organizations, abbreviations and other special single words in document collections.
One particular extractor, the name extractor [RW33], identifies capitalized words and selected prepositions as potential names. These names are categorized based on their types, for example, a person or a place.
The frequency of occurrence of the concepts identified by the extractors serves to
identify the most significant concepts.
4.6 Leximancer
A more sophisticated system for constructing concept maps automatically has been developed [SH05]. Leximancer [Smi05] is a data mining tool that extracts information from text documents and represents the information as main concepts and their relations. Concepts in Leximancer are defined as a "collection of terms that provide evidence for the use of the concept in the text". In addition, Leximancer identifies proper names (words that start with capital letters) as potential candidate concepts. Leximancer extracts and measures the frequency of the main concepts.
The concept extraction phase begins with Leximancer identifying "seed" words, which form the starting points of the concepts. These are the most frequently appearing words that are not stop words. Concepts are established based on the seed word and the words associated with it, by identifying words that occur close to the seed words. This process is known as concept learning. Learning of concepts involves the following steps:
1. The relevancies of a seed word and all other words in the document are calculated.
2. If the relevancies fall above a set threshold, the words are added to the concept
definition list.
3. The relevancies between the other words in the document and the new concept definition list are calculated.
4. If the relevancies fall above the threshold, the words are added to the concept
definition list again.
5. The learning stops when the number of sentence blocks classified by each concept remains stable.
Leximancer determines relationships by measuring the closeness and frequency of the extracted concepts in the text. A window (a specified length of words or sentences) is moved sequentially through the text, and the concepts within this window are marked. The frequency of co-occurring concepts against all others is calculated, resulting in a concept co-occurrence matrix. Leximancer uses Bayesian decision theory and word association norms to compute the relevancy measures.
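The co-occurrence counting step can be sketched as follows. The sentence window length and the example text are invented; Leximancer's actual window and relevancy weighting are more involved than this.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_matrix(sentences, concepts, window=2):
    """Slide a window of `window` sentences through the text and count
    how often each pair of concepts appears inside the same window."""
    matrix = defaultdict(int)
    for start in range(len(sentences) - window + 1):
        block = " ".join(sentences[start:start + window]).lower()
        present = [c for c in concepts if c in block]
        for a, b in combinations(sorted(present), 2):
            matrix[(a, b)] += 1
    return dict(matrix)

sents = ["Data mining finds patterns.",
         "Frequent patterns need support thresholds.",
         "Support is a frequency measure."]
print(cooccurrence_matrix(sents, ["patterns", "support", "frequency"]))
```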
4.7 Two-phase concept map construction
In other instances, work has focused on extracting relations. Sue et al. [SWST04] propose a different approach, Two-phase concept map construction (TP-CMC), which constructs concept maps for a course from historical test records. TP-CMC involves two phases: grade fuzzy association rule mining and concept map construction. TP-CMC uses a table, the Test item concept mapping table, that records the related concepts of each test item in a quiz. The algorithm does not aim to extract concepts, as the concepts have already been established; rather, it identifies prerequisite relationships among the concepts in the test items and constructs a concept map based on these relations.
Grade fuzzy association rule mining phase. This phase involves the following
steps:
1. Grade fuzzification: this process applies fuzzy set theory to convert numeric grade data into the symbolic notation "Low", "Mid" and "High", representing low, middle and high grades respectively.
2. Anomaly diagnosis: the discrimination of an item is used to set good test items apart from bad ones. This step aims to refine the input data by removing redundant data that should not be used in the concept map. If the discrimination of an item is too low (most students get high scores or most get low scores), the item is considered redundant. To remove redundancy, Fuzzy item analysis for norm-referencing (FIA-NR) is applied to the input data.
3. Fuzzy data mining: in this step, the algorithm recognizes relationships between two test items. The Look-ahead fuzzy association rule mining algorithm [TTL01] is used to find fuzzy associations between the test items.
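The grade fuzzification step can be sketched with simple triangular membership functions. The breakpoints below (20, 50, 80) are invented for illustration and are not the ones used in [SWST04].

```python
def fuzzify_grade(score, low_peak=20.0, mid_peak=50.0, high_peak=80.0):
    """Map a numeric grade (0-100) to fuzzy memberships in the symbolic
    grades Low, Mid and High using triangular membership functions."""
    def tri(x, left, peak, right):
        if x <= left or x >= right:
            return 0.0
        if x <= peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)

    return {
        "Low": 1.0 if score <= low_peak else tri(score, 0, low_peak, mid_peak),
        "Mid": tri(score, low_peak, mid_peak, high_peak),
        "High": 1.0 if score >= high_peak else tri(score, mid_peak, high_peak, 101),
    }

print(fuzzify_grade(65))  # → {'Low': 0.0, 'Mid': 0.5, 'High': 0.5}
```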
Concept map construction phase. In this phase, further refined association rules, based on observations of real learning situations, are used to analyze the prerequisite relationships between learning concepts in quizzes. A proposed algorithm, the Concept map construction algorithm, is used to find the corresponding concepts of concept sets and to construct the concept map. The algorithm is based on the Test item concept mapping table and the prerequisite relationships. Finally, a cycle detection process is used to detect and delete unwanted prerequisite relationships that form a cycle between concepts.
4.8 Related systems
[Coo] concentrates on finding relations in a document collection in the biomedical domain. Relations between terms are computed based on proximity and frequency: if two terms often occur near each other, the relation between them is stronger than if they occurred close together only once.
Furthermore, the weights of the relations are computed using the following formula:
m = log[(totalterms × paircount) / (freq1 × freq2)]
where totalterms is the total number of unique terms in the collection, paircount is the number of documents in which both terms occur, and freq1 and freq2 are the frequencies of the two terms. The values of m (the mutual information of the term pair) lie between 0 and 100. Figure 4.2, cited from [Coo], shows the output produced.
Figure 4.2: Screen showing the relations found to a concept.
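The weight formula can be evaluated directly. The counts in the example are invented, and we use the natural logarithm since [Coo] does not specify the base.

```python
import math

def relation_weight(total_terms, pair_count, freq1, freq2):
    """m = log((total_terms * pair_count) / (freq1 * freq2)),
    following the relation weighting formula of [Coo]."""
    return math.log((total_terms * pair_count) / (freq1 * freq2))

# Two terms that co-occur in 40 documents of a hypothetical collection:
print(round(relation_weight(total_terms=5000, pair_count=40,
                            freq1=120, freq2=90), 3))
```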
4.9 Comparisons
Table 4.1 summarizes the approaches for automatically constructing concept maps that were discussed earlier.
Table 4.1: Summary of the discussed approaches for automatic concept map construction.
Chapter 5
ACMC: An automatic concept map constructor
In this chapter, we introduce ACMC, an automatic concept map constructor. The
ACMC will enable learners, instructors and evaluators to construct concept maps
automatically from text (learning material).
5.1 Overview
The main processes involved in ACMC are the extraction of words from the text, finding significant concepts, and the extraction of potential relations between the extracted concepts. The basic method used by ACMC to extract concepts is term occurrence. No syntactic analysis, auxiliary ontologies or other resources are needed by ACMC to extract concepts. Potential relations between concepts are extracted based on whether the concepts occur in the same sentence.
As input, ACMC requires a number of external data sources for the construction of the concept map. ACMC requires a file containing the text material from which to extract concepts and relations, and two additional files. The first contains a list of stop words, words that are so common that they are useless to index; ACMC uses this file to access the stop words that will be eliminated from the list of extracted words. The second file contains a list of identified irregular plurals and their corresponding singulars. Examples of irregular plurals are indices, theses and automata, among others. See appendix E.
ACMC produces a concept map in text form: a list of concepts in which single-word concepts appear as one word and compound concepts as two words. The relations extracted can be between two single-word concepts, between two compound concepts, or between a single-word concept and a compound concept. Each concept in a relation is enclosed in curly brackets.
The main principle applied by ACMC in the extraction of concepts is based on calculating the frequencies of word occurrences or co-occurrences of two words. Note that the occurrence of a word refers to how many times the word has occurred in the text, in either its plural or singular form. The co-occurrence of two words refers to how often the two consecutive words occur in the same sentence. A similar approach is used to establish potential relations.
The following steps describe the method used in the extraction of concepts and relations:
1. Extract words from a given text.
2. Calculate the frequencies of the extracted words.
3. Remove stop words.
4. Merge plurals and singular forms of words.
5. Prune infrequent words.
6. Construct compound words.
7. Calculate the frequencies of compound words.
8. Find relations between concepts in a sentence.
9. Prune infrequent relations.
10. Display the map.
5.2 Concept extraction
ACMC extracts concepts from text by first reading a file containing the text material. ACMC extracts words and counts the frequency of the words. All extracted
words are put into a binary search tree.
5.2.1 Extracting and counting the frequency of words
ACMC reads a given text file and extracts the words. The frequency of a word is calculated by associating a counter with each new word and increasing the counter by 1 every time the word is encountered. A balanced binary tree (a red-black tree) is used to store the extracted words for efficient indexing [CSRL01].
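The counting step can be sketched as follows. A Python dict-based counter plays the role of the red-black tree (both give efficient keyed lookup), and the simple regular-expression tokenizer is a stand-in of our own, not ACMC's actual word extractor.

```python
import re
from collections import Counter

def count_words(text):
    """Extract lowercase word tokens from the text and count how often
    each occurs; a word's counter grows every time it is encountered."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)

freqs = count_words("Concept maps represent concepts. A concept map has relations.")
print(freqs["concept"], freqs["maps"])  # → 2 1
```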
5.2.2 Pruning stop words
Stop words are words that are so common that they are useless to index. Stop words are part of English grammar and are used to form grammatically correct sentences; scientific texts are no exception. In English, the most common stop words include "a", "of", "the" and "you". See appendix D.
ACMC uses a list of the 100 most common stop words in English [Tex]. In addition, we have added to the list a number of words which occur often but are meaningless for concept maps, such as "too", "section" and single letters of the alphabet. On the other hand, some stop words, such as "time", were excluded from the list, as they were considered important concepts in the domain.
5.2.3 Plurals
In the text, a concept may occur in both its singular and plural forms. For this reason, we combine the singular and plural forms to represent one concept and to get the correct frequency of the concept.
The most common plural forms are identified with the following heuristic: if a word ends with 's', and the same word without the 's' occurs in the text, the two are considered the same concept. A slight problem is that words are not identified as nouns, and verbs can also end with 's'.
If a word is an irregular plural form, its corresponding singular form is obtained
from a list. See appendix E.
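The plural-merging heuristic can be sketched as follows. The small irregular-plural map stands in for the full list of appendix E.

```python
# A stand-in for the irregular plural list of appendix E:
IRREGULAR = {"indices": "index", "theses": "thesis", "automata": "automaton"}

def merge_plurals(freqs):
    """Fold plural counts into singular ones: an irregular plural is
    looked up in a table; otherwise a word ending in 's' is merged with
    the same word without the 's' if that form also occurs in the text."""
    merged = dict(freqs)
    for word in list(merged):
        if word in IRREGULAR:
            singular = IRREGULAR[word]
        elif word.endswith("s") and word[:-1] in merged:
            singular = word[:-1]
        else:
            continue
        merged[singular] = merged.get(singular, 0) + merged.pop(word)
    return merged

print(merge_plurals({"map": 3, "maps": 2, "indices": 1, "index": 4}))
# → {'map': 5, 'index': 5}
```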
5.2.4 Pruning infrequent concepts
Assumption: the higher the frequency of a concept, the more significant it is.
We set the threshold on the absolute frequency of occurrence of a concept to ≥ 3. The user can choose the threshold, but should note the following:
• The size of the text document (a larger document would require a higher threshold).
• The writing style used in the text. For instance, text written as bullets and lists would require a different threshold from text written in a book format.
• The contents of the text. For instance, text that consists mostly of formulas and equations would require a lower threshold than plain prose.
We define the relative frequency of a concept as its absolute frequency divided by the total number of concepts extracted:
mrel(wi) = m(wi) / Mw.
5.3 Identifying compound concepts
The Merriam-Webster dictionary defines a compound as a word consisting of components that are themselves words. In the context of this chapter, we define a compound concept as two words wi wj that occur consecutively in the same sentence, for example "association rule".
The pseudocode for extracting compound concepts from the given text and pruning infrequent compound concepts is given in Algorithm 1.
Algorithm 1 FindCompoundConcepts(DataFile, tree, FreqThreshold)
  For all sentences s = w1 w2 w3 ... wn in DataFile
    For i = 1 to n-1
      w1 = wi
      IF ((tree.Find(w1) != NULL) AND (i < n-1))
        w2 = wi+1
        IF (tree.Find(w2) != NULL)
          CompArray[w1.index][w2.index]++
  size = CompArray.Length
  For i = 1 to size-1
    For j = i+1 to size
      Freq = CompArray[i][j]
      IF (Freq ≥ minc)
        Output compound concept wi wj and Freq
        IF (Freq / m(wi) ≥ mincc)
          Remove wi from the tree    // wi occurs seldom alone
        IF (Freq / m(wj) ≥ mincc)
          Remove wj from the tree    // wj occurs seldom alone
In principle, compound concepts with more than two consecutive words can be identified in a similar way, if the heuristic is extended to accommodate them.
Once the compound concepts were established, a set threshold, mincc = 0.90, was used to prune infrequent compound concepts. The user can choose the threshold, but should note that it needs to be large enough.
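Algorithm 1 can be sketched in runnable form as follows. A dict of counted pairs replaces the CompArray, a plain dict of word frequencies replaces the tree, and the thresholds are the ones stated above; the sentences and counts are invented examples.

```python
from collections import Counter

def find_compound_concepts(sentences, concept_freqs, min_c=3, min_cc=0.90):
    """Count consecutive pairs of known concepts; keep pairs with
    frequency >= min_c, and drop a component word from the concept set
    when it occurs almost only inside the compound (ratio >= min_cc)."""
    pairs = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for w1, w2 in zip(words, words[1:]):
            if w1 in concept_freqs and w2 in concept_freqs:
                pairs[(w1, w2)] += 1

    compounds = {}
    for (w1, w2), freq in pairs.items():
        if freq < min_c:
            continue
        compounds[(w1, w2)] = freq
        for w in (w1, w2):
            base = concept_freqs.get(w)
            if base and freq / base >= min_cc:   # w occurs seldom alone
                concept_freqs.pop(w)
    return compounds

freqs = {"association": 4, "rule": 5, "mining": 3}
sents = ["association rule mining finds rules",
         "an association rule has support",
         "association rule mining again",
         "one more association rule here"]
print(find_compound_concepts(sents, freqs))  # → {('association', 'rule'): 4}
```

Here "association" is dropped from the single-word concept set because all four of its occurrences are inside the compound, while "rule" survives (4/5 < 0.90).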
34CHAPTER 5. ACMC: AN AUTOMATIC CONCEPT MAP CONSTRUCTOR
5.4 Relation extraction
In extracting potential relations, we assume that two concepts are related if they appear in the same sentence. It is worth noting that two consecutive concepts are not considered related if they form a compound concept. The pseudocode for extracting potential relations between concepts is given in Algorithm 2.
Algorithm 2 FindRelations(DataFile, tree)
  For all sentences s = w1 w2 w3 ... wn in DataFile
    For i = 1 to n-1
      w1 = wi
      IF ((tree.Find(w1) != NULL) AND (i < n-1))
        w3 = wi + wi+1              // compound concept
        IF (tree.Find(w3) != NULL)
          j = i + 2
        ELSE IF (tree.Find(w1) != NULL)
          j = i + 1
        IF ((tree.Find(w1) != NULL) OR (tree.Find(w3) != NULL))
          w2 = wj
          IF ((tree.Find(w2) != NULL) AND (j < n))
            w4 = wj + wj+1          // compound concept
          IF (tree.Find(w2) != NULL)
            IF (tree.Find(w1) != NULL)
              Relations[w1.index][w2.index]++
            IF (tree.Find(w3) != NULL)
              Relations[w3.index][w2.index]++
          IF (tree.Find(w4) != NULL)
            IF (tree.Find(w1) != NULL)
              Relations[w1.index][w4.index]++
            IF (tree.Find(w3) != NULL)
              Relations[w3.index][w4.index]++
  size = Relations.Length
  For i = 1 to size-1
    For j = i+1 to size
      Freq = Relations[i][j] + Relations[j][i]
      IF (Freq ≥ minr)
        Output relation wi, wj and Freq
In the heuristic used to prune infrequent relations, we assume that the higher the frequency with which two concepts co-occur in a sentence, the more significant the relation is.
The last step involves displaying the results. It is worth noting that the displayed results contain only concepts that participate in relations. The results could be seen as small groups of concept maps instead of one large connected concept map.
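A simplified runnable version of the relation extraction can be sketched as follows: concepts co-occurring in a sentence are counted as potentially related, compounds are matched greedily as single units, and consecutive words that form a compound are not counted as related to each other. The matching here is a simplification of Algorithm 2, and the example data are invented.

```python
from collections import Counter
from itertools import combinations

def find_relations(sentences, concepts, compounds, min_r=2):
    """Scan each sentence left to right, matching compound concepts
    first and single-word concepts second; count every unordered pair
    of matched concepts in the sentence as a potential relation."""
    relations = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        found, i = [], 0
        while i < len(words):
            if i + 1 < len(words) and (words[i], words[i + 1]) in compounds:
                found.append(words[i] + " " + words[i + 1])
                i += 2                    # the compound consumes two words
            else:
                if words[i] in concepts:
                    found.append(words[i])
                i += 1
        for a, b in combinations(sorted(set(found)), 2):
            relations[(a, b)] += 1
    return {pair: f for pair, f in relations.items() if f >= min_r}

concepts = {"support", "rule"}
compounds = {("association", "rule")}
sents = ["association rule needs support",
         "support of an association rule",
         "rule without support"]
print(find_relations(sents, concepts, compounds))
# → {('association rule', 'support'): 2}
```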
5.5 Development ideas
Concept maps are considered a graphical representation of one's knowledge in a certain domain. ACMC is an application that aims at automatically constructing concept maps from text. The current implementation of ACMC does not offer a graphical user interface, which is a problem for users who are not familiar with ACMC. Creating a GUI that draws the concepts and the relations extracted from the text would make it easier for the user to comprehend the concept map and would show the results in a visual form. Educationally, this might also help students identify which concepts are more relevant once they are given a list of concepts.
The approach used for extracting concepts in ACMC is based on the frequency of occurrence of a word. Other approaches could be integrated into ACMC, for instance checking for concepts in chapter headings, section headings, topic and introductory sentences of paragraphs, emphasized words, and table and figure captions. These could be implemented in ACMC as additional techniques for extracting concepts. To further substantiate the significance of concepts, the concepts could be passed through a series of tests to establish their relevance to the text. For instance, ACMC could check whether an extracted word appears in a heading or topic sentence or is emphasized; such concepts could be considered more significant than concepts appearing only in normal text.
In ACMC, compound concepts are considered to be two consecutive words. This notion could be expanded by taking into account that compound concepts can consist of more than two words.
Only one measure for extracting relations (co-occurrence of concepts in the same sentence) has been implemented in ACMC. Other means of finding relations could be analyzed and implemented as well. Potential relations could be determined from concepts appearing in the same paragraph as well as in the same sentence. The significance of a relation could then be weighted such that a relation occurring both within a sentence and within a paragraph weighs more than a relation appearing only within a paragraph or only within a sentence.
Chapter 6
Tests and Experiments
In this chapter, we present the results obtained from the experiments performed on the ACMC. We make comparisons between the ACMC-constructed concept maps and the manually constructed concept maps. We also present the results obtained from running the test data through Leximancer, a tool for automatic concept map construction.
6.1 Test cases
After the design and implementation of the ACMC application, several tests were
made.
• Basic statistics: Given different thresholds, the following requirements were
tested:
– The program extracts single word concepts and their frequencies correctly.
– The program extracts compound concepts and their frequencies correctly.
– The program identifies singular and plural words correctly and combines
their frequencies.
– The program extracts potential relations:
1. between single word concepts
2. between single word and compound concepts
3. between compound concepts
• Comparison to human-drawn maps: we compared the results from the human-constructed maps against the results obtained from the ACMC.
• Comparison to Leximancer: we compared the results from the human-constructed maps and the ACMC against the results from Leximancer. This test served to show whether there were differences among the different approaches applied to extract concepts and construct concept maps.
6.2 Data material
One focus of the testing was to see how the ACMC would work with different types of learning material. Three different test data sets were used, and four different frequency thresholds were applied to each. The lowest frequency threshold used was 3, which in this chapter is sometimes referred to as 'All'. This was based on the assumption that sensible concepts would appear at least three times in the test data, while most non-sensible words would still be pruned out. It is worth mentioning that the test data were all in LaTeX format.
• Test data 1: a data mining book. The first three chapters of Knowledge Discovery in DB: The search for frequent patterns by Heikki Mannila and Hannu Toivonen. This learning material was written in a book format that included chapters, sections, sub-sections and full sentences explaining the concepts in the text, a total of 37 pages. In the context of this chapter, this text material is abbreviated as DM.
• Test data 2: Theoretical Foundations of Computer Science by Wilhelmiina Hämäläinen. The text material consisted of slides with bullets and lists and very few topic sentences, a total of 164 pages. In the context of this chapter, this text material is abbreviated as TFCS.
• Test data 3: the first 92 pages of Scientific Writing material by Wilhelmiina Hämäläinen. This had a writing style similar to TFCS, but contained more topic sentences on each concept. In the context of this chapter, this text material is abbreviated as SCIWRI.
• Human-constructed maps: we had hand-drawn concept maps (drawn by the author) of the above test data; see appendices F, G and H. These maps were used as reference maps against which to compare the automatically constructed concept maps. The concepts and relations in the human-constructed maps were regarded as "relevant".
Figure 6.1 shows a count of all the concepts and relations obtained from the manually constructed concept maps for each test data set. From the human-constructed concept map of the DM book, 61 concepts were counted, of which 38 were compound concepts and 44 were nouns; 73 relations were counted. The TFCS map contained 56 concepts, of which 34 were compound concepts and 35 were nouns, and 61 relations. The SCIWRI map contained a total of 102 concepts, of which 34 were compound concepts and 55 were nouns, and 68 relations.
Figure 6.1: Manually extracted concepts.
6.3 Test measures
To determine how well ACMC works, we calculated precision, recall and F-measure. Precision and recall are related measures which capture different aspects of comparison. In the context of this chapter, precision is defined as the fraction of retrieved concepts that are relevant; in simpler terms, precision measures how well the ACMC weeds out what is not wanted. Recall is defined as the fraction of relevant concepts that are retrieved; recall measures how well ACMC finds what is wanted. In many situations, a single measure that combines precision and recall is appropriate for comparisons. In this context, the F-measure summarizes how well the ACMC concepts match the human-constructed map.
The measures are computed as follows:
Precision = |HC ∩ PC| / |PC|
Recall = |HC ∩ PC| / |HC|
F-measure = 2 · (Precision · Recall) / (Precision + Recall)
Here, HC stands for the set of concepts or relations found in the human constructed
concept maps and PC stands for the set of concepts or relations produced by ACMC.
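The three measures can be computed directly from the two concept sets; the example sets below are invented for illustration.

```python
def prf(human_concepts, acmc_concepts):
    """Precision, recall and F-measure of the ACMC output (PC)
    against the human-constructed reference map (HC)."""
    overlap = len(human_concepts & acmc_concepts)
    precision = overlap / len(acmc_concepts)
    recall = overlap / len(human_concepts)
    f_measure = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f_measure

hc = {"concept map", "relation", "node", "link"}
pc = {"concept map", "relation", "frequency", "node", "tree", "word"}
p, r, f = prf(hc, pc)
print(round(p, 2), round(r, 2), round(f, 2))  # → 0.5 0.75 0.6
```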
6.4 Results
In this section, we present the results obtained from testing the different components of the ACMC, together with the results obtained from the human-constructed concept maps. Later in the chapter, we make comparisons between the ACMC-constructed and the manually constructed concept maps.
6.4.1 Overview of ACMC concept maps
The first step in the ACMC is to access and load the three text files needed. To test this component, we specified the three file names on the command line, and the ACMC loaded the files successfully.
We tested that the ACMC was able to extract words from a given text, excluding the stop words, and count their frequencies. The ACMC was able to identify plural and singular words in the text, count their frequencies, and merge them if both occurred in the text. Figure 6.2 shows a section of the output produced by the ACMC: extracted concepts and their frequencies. The ACMC was able to extract potential relations, as shown by the sample of the results in figure 6.3.
Figure 6.2: Section of the results produced by the ACMC: Extracted words and their frequencies.
Figure 6.3: Section of the results produced by the ACMC: Potential relations and their absolute
frequencies.
6.4.2 ACMC concept maps
Table 6.1 gives a summary of the results obtained after testing the ACMC with the given test data. From the table, it can be seen that ACMC extracted 689 words with frequency ≥ 3, excluding stop words, from the DM book. Of the extracted concepts, 148 were compound words and 232 were found to be nouns. 181 concepts had frequency ≥ 10, of which 19 were compound words and 102 were nouns. 103 extracted concepts had frequency ≥ 15, of which 10 were compound words and 68 were nouns. The ACMC extracted 138 potential relations with frequency ≥ 7.
From the TFCS data, the ACMC extracted 916 words, of which 277 were compound concepts and 244 were nouns. Of all the extracted concepts, 251 words, including 30 compound words and 97 nouns, had frequency ≥ 10; 191 concepts, including 15 compound words and 67 nouns, had frequency ≥ 15. The ACMC extracted 498 potential relations with frequency ≥ 7, of which 19 were found to be sensible relations.
Table 6.1: Concepts extracted by the ACMC.
896 potential concepts were extracted when tested with the Scientific Writing material. Of these, 147 were compound words and 321 were nouns. 236 of the extracted concepts had frequency ≥ 10, of which 5 were compound words and 121 were nouns. 161 extracted concepts, including 1 compound word and 72 nouns, had frequency ≥ 15. 73 relations were extracted.
It was observed that the more sensible single-word concepts, compound concepts and relations had higher frequencies. The highlighted rows in table 6.2 show the most frequent sensible and non-sensible concepts and relations.
Comparison to Human drawn maps
A comparison between the manually constructed and ACMC-constructed concept maps was made, and the results are summarized in table 6.3. From the DM book test data, 32 concepts, of which 14 were compound concepts and 24 were nouns, and no relations were found to appear in both the manually constructed and the ACMC-constructed concept maps. The TFCS test data produced 31 concepts, of which 13 were compound concepts and 15 were nouns, and no relations common to both maps. Of the 62 concepts in both the manually and
Table 6.2: List of the most sensible and non-sensible extracted words.
Table 6.3: Concepts, compound concepts, nouns and relations that appear in both manual and
ACMC constructed maps.
ACMC-constructed concept maps, 13 were compound words and 32 were nouns, and no potential relations were common.
Precision, Recall and F-Measure
The precision, recall and F-measure of extracted concepts, compound words and relations for each test data set were calculated at different thresholds. The results are presented in table 6.4.
From table 6.4, it can generally be observed that the precision is lower than the recall, with a few exceptions. The exceptions occurred when calculating precision and recall of compound words with frequency ≥ 7, apart from the TFCS test data, where the precision was higher for compound words with frequency ≥ 10. It can also be observed that, in most cases, the precision of extracted concepts increased while the recall decreased as the frequency threshold increased. A similar trend was observed with compound words and nouns.
The highest values of precision and recall were observed within different thresholds and different tested components (extracted concepts, compound words, nouns and relations).
Generally, SW has the highest precision (1) for compound words of frequency ≥ 15, followed by 0.6 for compound words of frequency ≥ 10 of the same test data, and DM (0.5) for compound words of frequency ≥ 15. DM has the highest recall (0.6167) for extracted concepts of frequency ≥ 3, followed by TFCS (0.5818) for nouns of frequency ≥ 3 and DM (0.4211) for compound words of frequency ≥ 3.
The highest precision values of different components were observed at frequency of
Table 6.4: Precision, recall and F-Measure of different test data.
≥ 15, with relations being an exception. The highest recall values of all components tested were observed at frequency ≥ 3.
For two of the test data sets, DM and TFCS, there was a tendency for the F-measure to increase with the frequency threshold for extracted words and compound words, except at the highest threshold, where the F-measure dropped. The F-measure decreased with increasing frequency threshold for the SW test data, and for relations it decreased with increasing threshold for all test data. The highest F-measure (0.2857) was observed for the compound words of the DM test data.
The frequency threshold affected the results in that, as the threshold increased, a larger fraction of the retrieved concepts were relevant, while the number of retrieved concepts decreased. The results show that the ACMC performed better at retrieving compound concepts from the SW test data and at extracting relevant concepts from the DM test data. The low precision and recall values observed for the relations component show that ACMC did not perform well in extracting relevant relations from the test data.
From the observed measures above, we can conclude that ACMC works better with the DM test data than with the TFCS or SW test data. This conclusion is based on the observation that the DM test data has a better precision/recall combination and F-measure; it produced reasonably high precision and recall values, even though the values fall within different thresholds. The SW test data produced the highest precision value (1), but did not offer a suitable recall to match. At this point, the best frequency threshold to be used in ACMC cannot be determined, as the highest precision and recall values fall within different frequency thresholds.
Comparison to Leximancer
Each of the test data was run through Leximancer and the results can be seen
in Figure 6.4. Leximancer extracted 116 concepts from the DM book, 154 concepts
from the TFCS test data and 104 concepts from the Scientific Writing test data.
See Appendices A, B and C for the results obtained from Leximancer.
Figure 6.4: Concepts extracted from Leximancer.
Figure 6.4 summarizes the comparisons made between the manual concept maps
and the Leximancer concept maps. It was observed that 18 concepts from the
DM book appeared in both the manual concept maps and the Leximancer concept
maps. The same number of concepts was observed for the TFCS material. 19
concepts appeared in both the manually constructed and the Leximancer-constructed
concept maps for the Scientific Writing material.
Comparing the ACMC-constructed and Leximancer-constructed concept maps led to the values
displayed in Figure 6.4. The DM book and the TFCS test data had 85 and 101 concepts,
respectively, appearing in both the Leximancer and the ACMC extracted concepts. 101 concepts
from the Scientific Writing test data appeared in both the ACMC and the Leximancer
constructed concept maps.
Leximancer can be seen as more of a graphical tool than a statistical one; therefore,
the relations extracted by Leximancer could not be compared to the relations
extracted by the ACMC or to those in the human-made maps.
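The overlap counts reported here reduce to a set intersection of the concept lists; a minimal sketch (with illustrative concept sets, not the actual test data):

```python
def concept_overlap(map_a, map_b):
    """Number of concepts appearing in both concept maps, as used for
    the manual/Leximancer/ACMC comparisons."""
    return len(set(map_a) & set(map_b))

# Illustrative example only:
n = concept_overlap({"data", "mining", "cluster"},
                    {"data", "cluster", "rule"})
# n = 2
```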
6.5 Discussion
The experimental results showed that there is a difference in the number of concepts
extracted from the different test data. This can be attributed to the fact that the
test data had different contents, from different courses, and were of different sizes.
Another contributing factor is writing style: each test data delivered its contents
in a different style.
It was observed that, comparably, when a higher threshold of ≥ 15 was used, more
sensible concepts were produced. This means that a large number of sensible concepts
lay at frequencies of ≥ 15. A slightly lower threshold of ≥ 10 produced sensible
compound concepts. We could explain these observations with the phrase: "the more
a concept appears in the text, the more relevant it is". A higher threshold produced
more sensible concepts. Based on the heuristic used, if concepts are significant in
the given text material, they tend to occur frequently in the text. Other concepts
occur in the text, but less frequently; this could mean that they are either less
significant or less inclusive. More inclusive concepts occur frequently in a given
text because they encompass its general ideas and are therefore mentioned often. Less
inclusive concepts, on the other hand, cover specific areas of the text and hence
occur less frequently.
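The term-occurrence heuristic discussed above could be sketched as follows (the stop-word excerpt and the example text are illustrative; this is not the actual ACMC implementation):

```python
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "in", "is", "to", "an"}  # excerpt only

def extract_concepts(text, threshold):
    """Term-occurrence heuristic: keep a word as a candidate concept
    when it is not a stop word and occurs at least `threshold` times."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= threshold}

text = "An automaton reads a word. The automaton accepts the word."
extract_concepts(text, 2)   # {'automaton', 'word'}
extract_concepts(text, 3)   # set() -- a higher threshold keeps fewer, more frequent terms
```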
The results showed that a large number of the extracted relations had frequencies of
≤ 3, and only the small number of extracted relations with frequencies of ≥ 3 were
seen as sensible. As the extraction of potential relations between concepts
relied on the extracted concepts, non-sensible concepts produced non-sensible relations. Since the ACMC does not indicate the direction of a relationship, the relations
between, e.g., 'automaton' and 'finite automaton' and between 'finite automaton'
and 'automaton' are identified as one relationship.
It is important to note that the method used by the ACMC to extract concepts
and relations was based purely on term occurrence; no syntactic analysis or auxiliary
ontologies were used. The ACMC was not able to identify features such as equations,
tables, formulas and algorithms, and was therefore unable to differentiate the text
in these features from the normal text of the material. This could explain
why the ACMC produced some non-sensible concepts and relations.
Here, we refer to concepts and relations that are not relevant to the text material as
non-sensible. For instance, parts of equations, misspelt words, non-English words
(as some texts contained Finnish words) and verbs were considered non-sensible
concepts.
In the implementation of the ACMC, we defined compound concepts as two consecutive
words. The ACMC is therefore restricted to finding compound concepts that are
exactly two words long, which results in the extraction of incomplete, and therefore
non-sensible, compound concepts. For example, the ACMC produced 'nondeterministic
finite' and 'finite automaton' as compound concepts, where the correct concept is
'nondeterministic finite automaton'. Extraction of non-sensible concepts in turn
resulted in the extraction of non-sensible relations, for example a relation between
'nondeterministic finite' and 'automaton'. In cases where, for instance,
'nondeterministic finite' and 'finite automaton' were extracted as having a relation,
transitivity could be used as a heuristic to identify longer concepts, in this case
'nondeterministic finite automaton'. Other heuristics could also be integrated into
the ACMC, for instance checking for concepts in chapter and section headings, in the
topic and introductory sentences of paragraphs, and among emphasized words. These
considerations could produce better results.
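The transitivity heuristic suggested above could, for instance, join two extracted bigrams that share a boundary word (a hypothetical sketch, not part of the current ACMC implementation):

```python
def merge_bigrams(bigrams):
    """Join compound concepts that share a boundary word, e.g.
    ('nondeterministic', 'finite') + ('finite', 'automaton')
    -> ('nondeterministic', 'finite', 'automaton')."""
    merged = set()
    for a, b in bigrams:
        for c, d in bigrams:
            if b == c:                 # second word of one = first word of the other
                merged.add((a, b, d))
    return merged

trigrams = merge_bigrams({("nondeterministic", "finite"),
                          ("finite", "automaton")})
# -> {("nondeterministic", "finite", "automaton")}
```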
Additional heuristics for finding more sensible relations could include checking for
potential relations in topic sentences, as such sentences briefly introduce the
concepts to be discussed. The ACMC checks for relations between concepts within a
single sentence; potential relations could be checked for in whole paragraphs as well.
Significance weights could then be assigned to relations such that a relation between
concepts that co-occur within a sentence (and hence within a paragraph) weighs more
than a relation between concepts that co-occur only within a paragraph. The ACMC
does not indicate the direction or names of relations; this functionality could be
implemented and added to the ACMC as a future development.
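The suggested weighting of sentence-level versus paragraph-level co-occurrence could be sketched as follows (the weight values and the naive full-stop sentence splitting are illustrative assumptions, not part of the ACMC):

```python
from collections import Counter
from itertools import combinations

def score_relations(paragraphs, concepts, w_sentence=2, w_paragraph=1):
    """Score candidate relations by co-occurrence: a pair of concepts
    appearing in the same sentence gets a larger weight than a pair
    that only co-occurs somewhere within the same paragraph."""
    scores = Counter()
    for paragraph in paragraphs:
        in_paragraph = {c for c in concepts if c in paragraph}
        for pair in combinations(sorted(in_paragraph), 2):
            scores[pair] += w_paragraph
        for sentence in paragraph.split("."):       # naive sentence split
            in_sentence = {c for c in concepts if c in sentence}
            for pair in combinations(sorted(in_sentence), 2):
                scores[pair] += w_sentence
    return scores

scores = score_relations(["the automaton reads the tape. the tape moves."],
                         {"automaton", "tape"})
# score 3 = 1 (paragraph co-occurrence) + 2 (first sentence)
```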
There was an observable difference between the number of concepts extracted by the
ACMC and the number of concepts in the manually constructed maps. This difference
can be attributed to the style used in constructing the concept maps. While the
ACMC used frequency of occurrence to identify and extract concepts from the text,
the manually constructed concept maps did not follow any particular method of
extracting concepts and relations: we read and understood the text, and from that
understanding derived concepts and relations for the given test data. The sensible
concepts extracted by the ACMC included only general concepts, as these occurred
frequently compared to the less inclusive concepts. The human-constructed concept
maps, on the other hand, included both general and less inclusive concepts,
differentiated by their position in the map (we used a treelike/hierarchical
structure, with the more general concepts at the top and the less inclusive concepts
at the bottom). A different approach from the one used by the ACMC was used to
establish relations in the manually constructed concept maps: we relied on our
understanding of the text to discover relations between concepts, and some of these
relations did not occur within a single sentence, whereas the ACMC extracted
relations only if concepts appeared in the same sentence. From the results, a large
number of the relations extracted by the ACMC was deemed non-sensible. This accounts
for the difference between the results produced by the ACMC and the manual
concept maps.
Chapter 7
Conclusions
In general, it is difficult for an individual to construct an effective concept map,
as concept maps vary from one individual to another. An individual's understanding
and perspective of the subject concerned accounts for the variation between concept
maps of the same subject. Automatic or semi-automatic construction of concept maps
reduces the bias that an individual might introduce in attempting to create a
"good" concept map.
In this research, we have discussed several semi-automatic and fully automatic
approaches for constructing concept maps. Semi-automatic construction of concept
maps requires some assistance from the user to complete the concept map: the tool
retrieves information for the concept map and suggests concepts or relations to the
user. In automatic construction of concept maps, the whole map is constructed
automatically, with no assistance from the user.
In this thesis, we introduced a method for constructing concept maps automatically
from text. The method selects concepts based on the frequency of terms in the text,
and relations based on the frequency of co-occurrence in the same sentence. Our
experiments suggest that the method can select relevant concepts, and especially
compound concepts, well, but the extraction of relevant relations would require
further research.
Appendix A
DM book concept map produced
by Leximancer
Figure A.1: Concept map for the DM book created by Leximancer.
Figure A.2: List of concepts from DM book extracted by Leximancer.
Appendix B
TFCS concept map produced by
Leximancer
Figure B.1: Concept map for the TFCS created by Leximancer.
Figure B.2: List of concepts from TFCS extracted by Leximancer.
Appendix C
Scientific Writing concept map
produced by Leximancer
Figure C.1: Concept map created for Scientific Writing material created by Leximancer.
Figure C.2: List of concepts from Scientific Writing material extracted by Leximancer.
Appendix D
Stop words list
the
of
and
to
a
in
that
is
he
for
it
with
as
his
on
be
at
by
I
this
not
are
but
from
have
an
they
which
one
you
were
her
all
she
there
would
their
we
him
been
has
when
who
will
more
no
if
out
so
said
what
up
its
about
into
than
them
can
only
other
new
some
time
could
these
two
may
then
do
first
any
my
now
such
like
our
over
man
me
even
most
made
after
also
did
many
before
must
where
much
your
way
well
through
back
years
down
etc
might
how
or
x
s
r
p
g
e
c
b
section
z
u
t
o
n
m
l
k
f
d
h
j
q
v
w
y
too
There were a few additions to the top 100 stop words from http://www.edict.com.hk/TextAnalyser/wordlists.htm. Such additions included letters of the alphabet
and a few words, for instance "section". These additions were made after running
several tests on the given test material: the results contained single letters of
the alphabet as concepts, as they were used frequently in the text. The text also
contained words such as "section", "chapter" and "example", which stem from the
LaTeX commands used in the text.
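The resulting filter could be expressed as follows (a minimal sketch; the top-100 excerpt is abbreviated and the function name is illustrative):

```python
import string

TOP_100 = {"the", "of", "and", "to", "a", "in"}   # excerpt of the downloaded list
EXTRA = {"section", "chapter", "example"}         # frequent LaTeX-related words
STOP_WORDS = TOP_100 | EXTRA | set(string.ascii_lowercase)

def is_candidate(word):
    """Reject stop words, including single letters of the alphabet,
    before frequency counting."""
    return word.lower() not in STOP_WORDS
```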
Appendix E
Irregular plurals
Plural        Singular
halves        half
lives         life
axes          axis
matrices      matrix
children      child
people        person
automata      automaton
vertices      vertex
indices       index
appendices    appendix
theses        thesis
parentheses   parenthesis
analyses      analysis
bases         basis
emphases      emphasis
series        series
criteria      criterion
phenomena     phenomenon
mice          mouse
cargo         cargo
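Such an irregular-plural table is naturally used as a lookup before any default singularization rule (a hypothetical sketch; the trailing-'s' fallback is an illustrative simplification):

```python
IRREGULAR = {
    "halves": "half", "lives": "life", "axes": "axis",
    "matrices": "matrix", "children": "child", "people": "person",
    "automata": "automaton", "vertices": "vertex", "indices": "index",
    "appendices": "appendix", "theses": "thesis",
    "parentheses": "parenthesis", "analyses": "analysis",
    "bases": "basis", "emphases": "emphasis", "series": "series",
    "criteria": "criterion", "phenomena": "phenomenon", "mice": "mouse",
}

def singularize(word):
    """Normalise a word before counting: consult the irregular table
    first, otherwise strip a trailing 's' as a crude default rule."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    return word[:-1] if word.endswith("s") and len(word) > 3 else word
```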
Appendix F
Hand-made concept map from
TFCS
Figure F.1: Hand-made concept map constructed from TFCS test data.
Appendix G
Hand-made concept map from
Scientific writing material
Note: The figure shows part of the hand constructed concept map from the Scientific
Writing material.
Figure G.1: Hand-made concept map constructed from Scientific writing test data.
Appendix H
Hand-made concept map from
Data mining material
Note: The figure shows part of the hand constructed concept map from the Data
mining material.
Figure H.1: Hand-made concept map constructed from Data mining test data.
Bibliography
[AKM+ 03]
Harith Alani, Sanghee Kim, David E. Millard, Mark J. Weal, Wendy
Hall, Paul H. Lewis, and Nigel R. Shadbolt. Automatic ontology-based
knowledge extraction from web documents. IEEE Intelligent Systems,
18(1):14–21, 2003.
[APC01]
A. O. Alves, F. C. Pereira, and A. Cardoso. Automatic reading and
learning from text. In ISAI’2001: Proceedings of the International
Symposium on Artificial Intelligence, pages 302–310, December 2001.
[BB00]
E. Bruillard and G.L. Baron. Computer-based concept mapping: a
review of a cognitive tool for students. In ICEUT ’00: Proceedings of
Conference on Educational Uses of Information and Communication
Technologies, pages 331–338. Publishing House of Electronics Industry
(PHEI), 2000.
[BCW02]
Christopher Brewster, Fabio Ciravegna, and Yorick Wilks.
User-
centred ontology learning for knowledge management. In NLDB ’02:
Proceedings of the 6th International Conference on Applications of
Natural Language to Information Systems-Revised Papers, pages 203–
207, London, UK, 2002. Springer-Verlag.
[BR]
R. Byrd and Y. Ravin. Identifying and extracting relations in text.
[CCA+ 02]
Alberto J. Cañas, Marco Carvalho, Marco Arguedas, D.B. Leake, A. Maguitman, and T. Reichherzer. Mining the web to suggest concepts during concept mapping: Preliminary results. XIII Simpósio Brasileiro
de Informática na Educação, 2002.
[CCH+ 03]
J.W. Coffey, A.J. Cañas, G. Hill, R. Carff, T. Reichherzer, and N. Suri.
Knowledge modeling and the creation of El-Tech: a performance support and training system for electronic technicians. Expert Systems
with Applications, 25:483–492, November 2003.
[CGR91]
Gail A. Carpenter, Stephen Grossberg, and David B. Rosen. Fuzzy
ART: Fast stable learning and categorization of analog patterns by an
adaptive resonance system. Neural Networks, 4(6):759–771, 1991.
[CLR86]
M. Callon, J. Law, and A. Rip, editors. Mapping the dynamics of science and technology: Sociology of science in the real world. Macmillan,
London, 1986.
[Coo]
James W. Cooper. Visualization of relational text information for
biomedical knowledge discovery.
[CSRL01]
Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E.
Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education,
2nd edition, 2001.
[DDF+ 00]
Scott C. Deerwester, Susan T. Dumais, George W. Furnas, Thomas K.
Landauerr, and Richard A. Harshman. Indexing by latent semantic
analysis. Journal of the American Society for Information Science,
41:391– 407, 2000.
[dRdCJF04] F.E.L. da Rocha, J.V. da Costa Jr, and E.L. Favero. A new approach
to meaningful learning assessment using concept maps: ontologies and
genetic algorithms. In CMC ’04: Proceedings of the First International
Conference on Concept Mapping, 2004.
[DSB]
David L. Darmofal, Diane H. Soderholm, and Doris R. Brodeur. Using concept maps and concept questions to enhance conceptual understanding.
[Fel98]
Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database
(Language, Speech, and Communication). The MIT Press, May 1998.
[FMG05]
Blaz Fortuna, Dunja Mladenic, and Marko Grobelnik. Semi-automatic
construction of topic ontology. In Conference on Data Mining and
Data Warehouses (SiKDD 2005), October 2005.
[FYI02]
Hideo Funaoi, Etsuji Yamaguchi, and Shigenori Inagaki. Collaborative concept mapping software to reconstruct learning processes. In
ICCE ’02: Proceedings of the International Conference on Computers
in Education, page 306, Washington, DC, USA, 2002. IEEE Computer
Society.
[GS94]
Brian R. Gaines and Mildred L. G. Shaw. Using knowledge acquisition
and representation tools to support scientific communities. In AAAI
’94: Proceedings of the twelfth national conference on Artificial intelligence (vol. 1), pages 707–714, Menlo Park, CA, USA, 1994. American
Association for Artificial Intelligence.
[HBN96]
H. E. Herl, E. L. Baker, and D. Niemi. Construct validation of an
approach to modeling cognitive structure of u.s. history knowledge. In
Journal of Educational Research, volume 89, pages 206–218, 1996.
[HCCN02]
Robert R. Hoffman, John W. Coffey, Mary Jo Carnot, and Joseph D.
Novak. An empirical comparison of methods for eliciting and modeling expert knowledge. In Proceedings of the Human Factors and
Ergonomics Society 46th Annual Meeting. Human Factors and Ergonomics Society, 2002.
[Jak03]
Maria Jakovljevic. Concept mapping and appropriate instructional
strategies in promoting programming skills of holistic learners. In
SAICSIT ’03: Proceedings of the 2003 annual research conference
of the South African institute of computer scientists and information
technologists on Enablement through technology, pages 308–315, Republic of South Africa, 2003. South African Institute for Computer
Scientists and Information Technologists.
[JBS00]
J.A. Johnsen, D.E. Biegel, and R. Shafran. Concept mapping in mental health: uses and adaptations. Evaluation and Program Planning,
23:67–75, February 2000.
[JMF99]
A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review.
ACM Comput. Surv., 31(3):264–323, 1999.
[Joa99]
Thorsten Joachims. Making large-scale SVM learning practical. In
B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel
Methods - Support Vector Learning. MIT Press, 1999.
[Kol93]
Janet Kolodner. Case-based reasoning. Morgan Kaufmann Publishers
Inc., San Francisco, CA, USA, 1993.
[Lea96]
David B. Leake. Case-Based Reasoning: Experiences, Lessons and
Future Directions. MIT Press, Cambridge, MA, USA, 1996.
[Liu03]
H. Liu. MontyTagger. Cambridge, Mass., 1.2 edition, 2003.
[LMR+ 03]
David B. Leake, Ana Maguitman, Thomas Reichherzer, Alberto J.
Cañas, Marco Carvalho, Marco Arguedas, Sofia Brenes, and Tom Eskridge. Aiding knowledge capture by searching for extensions of knowledge models. In K-CAP ’03: Proceedings of the 2nd international
conference on Knowledge capture, pages 44–53, New York, NY, USA,
2003. ACM Press.
[MBF+ 90]
George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek
Gross, and Katherine J. Miller. Introduction to wordnet: An on-line
lexical database. Int J Lexicography, 3(4):235–244, January 1990.
[MF90]
S. Muggleton and C. Feng. Efficient induction of logic programs. In
Proceedings of the 1st Conference on Algorithmic Learning Theory,
pages 368–381. Ohmsma, Tokyo, Japan, 1990.
[MMJ94]
K. M. Markham, J. J. Mintzes, and M. G. Jones. The concept map as
a research and evaluation tool: Further evidence of validity. Journal
of Research in Science Teaching, 31:91–101, 1994.
[MS01]
Alexander Maedche and Steffen Staab. Learning ontologies for the
semantic web. In Semantic Web 2001 (at WWW10), May 1, 2001,
Hongkong, China, 2001.
[MS04]
Alexander Maedche and Steffen Staab. Ontology learning. In Steffen
Staab and Rudi Studer, editors, Handbook on Ontologies, International
Handbooks on Information Systems, pages 173–190. Springer, 2004.
[MSS99]
J.R. McClure, B. Sonak, and H.K. Suen. Concept map assessment of
classroom learning: Reliability, validity, and logical practicality. Journal of Research in Science Teaching, pages 475–492, 1999.
[NC06]
Joseph D. Novak and Alberto J. Cañas. The theory underlying concept
maps and how to construct them. Technical report, Florida Institute
for Human and Machine Cognition, 2006.
[NG84]
Joseph D. Novak and Bob Gowin. Learning how to learn. Cambridge
University Press, Cambridge, 1984.
[nHC+ 04]
A. J. Cañas, G. Hill, R. Carff, N. Suri, J. Lott, and T. Eskridge.
Cmaptools: A knowledge modeling and sharing environment. In A. J.
Cañas, J. D. Novak, and F. M. Gonzalez, editors, Concept maps:
Theory, methodology, technology. Proceedings of the first international
conference on concept mapping, volume I, pages 125–133. Universidad
Publica de Navarra, Pamplona, Spain, 2004.
[NVG03]
Roberto Navigli, Paola Velardi, and Aldo Gangemi. Ontology learning and its application to automated terminology translation. IEEE
Intelligent Systems, 18(1):22–31, 2003.
[PC00]
Francisco C. Pereira and Amílcar Cardoso. Clouds: A module for automatic learning of concept maps. In Michael Anderson, Peter Cheng,
and Volker Haarslev, editors, Diagrams, volume 1889 of Lecture Notes
in Computer Science, pages 468–470. Springer, 2000.
[POC00]
Francisco Câmara Pereira, A. Oliveira, and Amílcar Cardoso. Extracting concept maps with clouds. In Proceedings of the Argentine Symposium of Artificial Intelligence (ASAI), 2000.
[RF05]
Ryan Richardson and Edward A. Fox. Using concept maps in digital libraries as a cross-language resource discovery tool. In JCDL
’05: Proceedings of the 5th ACM/IEEE-CS joint conference on Digital
libraries, pages 256–257, New York, NY, USA, 2005. ACM Press.
[RFP06]
A.B. Rendas, M. Fonseca, and P.R. Pinto. Toward meaningful learning in undergraduate medical education using concept maps in a pbl
pathophysiology course. Adv Physiol Educ, 30(1):23–29, 2006.
[RGFP06]
Ryan Richardson, Ben Goertzel, Edward A. Fox, and Hugo Pinto. Automatic creation and translation of concept maps for computer science-related theses and dissertations. In Proceedings of 2nd Concept Mapping Conference, pages 160–163. Citeseer, 2006.
[RRS98]
Diana C. Rice, Joseph M. Ryan, and Sara M. Samson. Using concept
maps to assess student learning in the science classroom: Must different methods compete? Journal of Research in Science Teaching,
35(10):1103–1127, 1998.
[RT02]
Kanagasabai Rajaraman and Ah-Hwee Tan. Knowledge discovery from
texts: a concept frame graph approach. In CIKM ’02: Proceedings of
the eleventh international conference on Information and knowledge
management, pages 669–671, New York, NY, USA, 2002. ACM Press.
[RW33]
Y. Ravin and N. Wacholder. Extracting names from natural-language
text, 1997.
[Sai01]
H. Saito. A semi-automatic construction method of concept map based
on dialog contents, 2001.
[Sal91]
Gerard Salton. Developments in automatic text retrieval. Science,
253:974–979, 1991.
[SH05]
Andrew E. Smith and Michael S. Humphreys. Evaluation of unsupervised semantic mapping of natural language with leximancer concept
mapping. Behavior Research Methods, 38:262–279, 2005.
[SKUP+ 04] J.E. Sims-Knight, R.L. Upchurch, N. Pendergrass, T. Meressi,
P. Fortier, P. Tchimev, R. VonderHeide, and M. Page. Using concept maps to assess design process knowledge. In FIE 2004: Frontiers
in Education, volume F1G-6-10 Vol 2, October 2004.
[Smi05]
Andrew E. Smith. Leximancer manual (Version 2.2). University of
Queensland, 2005.
[SRF03]
Rao Shen, Ryan Richardson, and Edward A. Fox. Concept maps as
visual interfaces to digital libraries: summarization, collaboration, and
automatic generation, May 2003.
[ST93]
D. D. Sleator and D. Temperley. Parsing English with a link grammar.
In Third International Workshop on Parsing Technologies, 1993.
[SWST04]
Pei-Chi Sue, Jui-Feng Weng, Jun-Ming Su, and Shian-Shyong Tseng.
A new approach for constructing the concept map. In ICALT ’04: Proceedings of the IEEE International Conference on Advanced Learning Technologies, pages 76–80, 2004.
[SZL14]
Chen Sun, Ming Zhao, and Yangjing Long. Learning concepts and
taxonomic relations by metric learning for regression. Communications
in Statistics - Theory and Methods, 43(14):2938–2950, 2014.
[Tex]
Word frequency list. http://www.edict.com.hk/TextAnalyser/wordlists.htm. Virtual Language Center, University of Victoria, Hong Kong. Accessed: 2006-09-30.
[TTL01]
Chang-Jiun Tsai, Shian-Shyong Tseng, and Chih-Yang Lin. A two-phase fuzzy mining and learning algorithm for adaptive learning environment. In ICCS ’01: Proceedings of the International Conference
on Computational Science-Part II, pages 429–438, London, UK, 2001.
Springer-Verlag.
[WH02]
Shih-Hung Wu and Wen-Lian Hsu. SOAT: A semi-automatic domain
ontology acquisition tool from chinese corpus. In COLING, 2002.
[Wie96]
Anna Wierzbicka. Semantics: Primes and Universals. Oxford University Press, New York, 1996.
[WSL06]
Ryen W. White, Hyunyoung Song, and Jay Liu. Concept maps to
support oral history search and use. In Gary Marchionini, Michael L.
Nelson, and Catherine C. Marshall, editors, JCDL, pages 192–193.
ACM, 2006.
[ZS01]
GuoDong Zhou and Jian Su. Named entity recognition using an HMM-based chunk tagger. In ACL ’02: Proceedings of the 40th Annual
Meeting on Association for Computational Linguistics, pages 473–480,
Morristown, NJ, USA, 2002. Association for Computational Linguistics.