Extraction of Synonyms in User-Generated Content Bachelor Thesis

Alex Oberhauser
[email protected]
Supervisor: Dr. Anna Fensel
STI Innsbruck, Austria
University of Innsbruck, Austria
January 4, 2011
Abstract
The Semantic Web offers a great opportunity to gain more information from existing data,
not only in quantity but also in quality, than is possible with the current web. One major
improvement is the identification of new relations between objects. This thesis addresses the
problem of computing synonyms in order to form a search query that is relevant for the
current context. For this purpose the work focuses on gathering, evaluating and classifying
the current context on the one side and on gathering synonym sets on the other side. After
the two information clouds are computed, their intersection returns all suitable synonyms in
this context. Additionally, the context-aware synonyms are extended with Hyponyms and
Hypernyms.
To compute the context I developed and analyzed a web crawler that gathers context
information from a set of public RDF files that describe the user.
The final work was then integrated, in a slightly adapted form, into the
m:Ciudad framework.
Keywords
Context-Aware Synonyms, Semantic Web, Linked Open Data, RDF,
m:Ciudad, UDL
Contents
1 Introduction
2 Motivation & Problem Statement
3 Approach
3.1 Lexicographical Synonyms (Core)
3.2 Conceptional Synonyms (Extension)
3.3 Context
3.4 Intersection Algorithm
4 Implementation
4.1 Used Technology
4.2 Architecture
4.2.1 Package: synonyms
4.2.2 Package: context
4.3 Server/Client Architecture
4.3.1 Server
4.3.2 Client
4.4 Integrated Version - The Library
5 Evaluation
6 Further Work
A Appendix - RDF example files
B Appendix - Android Client
1 Introduction
The next generation of the web will not only be more interlinked [20] than the web known
today, but will also be available on many mobile devices. The availability of data on the go
expands the context and makes it possible to guess the semantics of a search query more
precisely. For example, the search could now depend on the location or on personal
interests. It is also conceivable that the appointments in the calendar are used to check
whether particular synonyms are useful for the current context.
The thesis was written in the research area of the Semantic Web [25]. The term
Semantic Web was coined by Tim Berners-Lee [4]. It extends the current Social
Web (Web 2.0) with semantics. The main goal of the extension is a web that is machine
readable and understandable across different systems. One widely used approach, which is
also used in this thesis, is RDF [27] as data exchange format with RDF Schema [26] as
language definition. RDF consists of triples of the form <Subject, Predicate, Object>. This
concept, in combination with unique ontology URIs, can describe most real-world relations.
For exchange the XML syntax is commonly used, which makes the technology compatible with
current web technology. Other representations are N3 [24], N-Triples [23], TriG [11],
TriX [18], Turtle [30] and RDFa [28]. The success of this technology is based on linking
multiple RDF files. This concept is named Linked Data [20]. The linked structure makes it
possible to crawl a network of RDF files knowing only one starting point. Additionally,
there is a query language named SPARQL [29]. SPARQL is used to query the stored data
set with the help of the triple format. The query in Listing 1 returns all name and e-mail
combinations of all FOAF [5] entities in the data set. Such a query has two different term
types. Variables are marked by a question mark as prefix. Constants carry the namespace as
prefix, separated from the property by a colon, which makes the term unique across all
properties.
Listing 1: SPARQL example
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE
{
?x foaf:name ?name .
?x foaf:mbox ?mbox
}
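To illustrate how variables and constants interact, the following minimal Python sketch evaluates the two triple patterns of Listing 1 against a tiny in-memory data set. It is not a SPARQL engine and not part of the thesis implementation; the sample triples and the helper names `match_pattern` and `query` are assumptions made only for this illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data set of (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:alice", FOAF + "mbox", "mailto:alice@example.org"),
    ("ex:bob", FOAF + "name", "Bob"),  # Bob has no mbox
]


def match_pattern(pattern, triple, binding):
    """Return the binding extended so that pattern matches triple, else None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # Variable: bind it, or check the value it is already bound to.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            # Constant: must be identical to the term in the triple.
            return None
    return result


def query(patterns, triples):
    """Join all patterns; each result is one consistent variable binding."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


PATTERNS = [("?x", FOAF + "name", "?name"), ("?x", FOAF + "mbox", "?mbox")]
rows = [(b["?name"], b["?mbox"]) for b in query(PATTERNS, TRIPLES)]
print(rows)
```

Because both patterns share the variable ?x, only entities that have a name as well as a mailbox contribute a row; in this sample data that is Alice but not Bob.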
One major scenario for context-aware synonyms is mobile search queries that
are related to the current context of the user. The available context information
could be used to enrich the search query or to filter out unneeded search results.
Either way the information could be improved by the developed algorithm. The
application area of mobile devices is important because there we have context
information but limited resources to visualize a lot of information.
The work starts in section 2 with the specification of the problem and explains why
such context-aware synonyms are needed in future applications. After that,
section 3 gives a conceptual solution for computing sets of synonyms that
suit a concrete context. Section 4 explains the theoretical solution on the basis of a
concrete implementation. The last two sections evaluate the current work and related work
and give an outlook on further work.
2 Motivation & Problem Statement
With knowledge stored in a way that everybody can easily access, and with all knowledge
present on a worldwide network, the problem has shifted from searching for information
(quantity) to filtering the gained information (quality). The problem is no longer to
find information about a particular topic, but to filter the mass of gained data and to
convert it, if needed, to a machine-readable format. Depending on the search term
this can be time-intensive work and is not always successful. This is even
worse on mobile devices, which have limited resources and whose users do not
always have the time to search intensively for information.
Context-awareness will be even more important on the Future Internet
[16], where each thing is connected to the internet or at least to a subnetwork. In
such a scenario the context could be described more precisely than is possible
with today's technology. More information means not only that the context
could be described more precisely, but also that it is more important to filter
the gained data.
The problem that this thesis tries to solve is the ambiguity of natural
language, with the help of the current environment or context. Although synonyms are by
definition words with the same meaning, it is possible that one input
word evaluates to several semantically different synonym blocks. For example,
AI could evaluate to two semantically different synonym blocks: one for the
term Artificial Intelligence and the other for Amnesty International. To find
the right blocks for the current situation the context information is evaluated,
and on the basis of the result the right synonym blocks are chosen.
To formalize the problem we introduce the following sets:
$S_{input} = \bigcup_{i \in I} S_i$ ... Synonyms for the input term input. $S_i$ is one synonym block (synset), containing synonyms with the same semantics.
$C_{agent} = C_{DYN} \cup C_{STAT}$ ... The context information of the agent that triggers the search. $C_{DYN}$ is the dynamic and $C_{STAT}$ is the static context.
$R$ ... The context-aware synonym result set.
The two sets $S_{input}$ and $C_{agent}$ are gathered from different sources. On the
basis of these two sets the result set $R$ is computed. The following formula
shows how the result is defined.
$$\exists i\,(\exists t\,(t \in S_i \land t \in C_{agent}) \implies \forall r\,(r \in S_i \land r \in R))$$
The formula above claims that there exists a term that is part of one (or
maybe more) synonym block(s) and the context set. That implies that all terms
in the found synonym block(s) are also part of the result set. If this statement
holds we have found context-aware synonyms.
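The selection rule can be sketched in a few lines of Python. This is a minimal illustration of the formula under the assumption that synsets and the context are plain sets of already normalized terms; the function name `context_aware_synonyms` and the sample data are hypothetical and not taken from the implementation.

```python
def context_aware_synonyms(synsets, context):
    """Return the union of all synsets that share at least one term with the context."""
    result = set()
    for synset in synsets:
        if synset & context:   # there is a t with t in S_i and t in C_agent
            result |= synset   # every r in S_i becomes part of R
    return result


# Hypothetical synonym blocks for the input term "AI".
S_INPUT = [
    {"AI", "artificial intelligence"},
    {"AI", "Amnesty International"},
]
# Hypothetical context cloud, C_DYN and C_STAT already merged.
C_AGENT = {"artificial intelligence", "semantic web"}

R = context_aware_synonyms(S_INPUT, C_AGENT)
print(sorted(R))
```

Only the first block intersects the context, so the Amnesty International block is left out. If the context contained the ambiguous term AI itself, both blocks would match, which is the case where the results have to be prioritized.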
This thesis addresses neither the problem of verifying data nor the filtering of a
concrete search result. The focus of this work is to compute suitable
synonyms for the current context. A possible use scenario is the filtering of a
search result with the help of the computed synonyms of an input search term.
Let's consider a first scenario, the search for AI. AI is an abbreviation that
has more than one meaning and is well suited to show how important it is to
filter the mass of information gained from a simple search. In our scenario AI
means Artificial Intelligence, but in other scenarios it is possible that the same
abbreviation means Amnesty International.
As a precondition the context cloud of the agent that triggers the search should
include the term Artificial Intelligence or some related term that is part of the
same synonym block. Later, in a second scenario, we show that this term could
also be part of the context of a related entity. This indirect context evaluation
has the effect that the result has a lower accuracy.
Scenario 1 - Initial Situation
A search for AI without the help of context filtering returns approximately
449 million search results¹. The first two results shown in the search excerpt in
Figure 1 are about the topic artificial intelligence and the third is about Amnesty
International. Although we had a clear topic in mind, we were not able to find
only the right results with our short and ambiguous search term.
Figure 1: Excerpt of the search result AI searched on http://www.google.com (not filtered)
Scenario 1 - Improved Situation
The Semantic Web represents each concept in the form of a URI (short for
Uniform Resource Identifier) or its successor, the IRI (Internationalized Resource
Identifier). Such a concept identifier is the concatenation of an ontology URI/IRI
and the concept name. The resulting concept, assuming the same ontology is
used, is no longer ambiguous. The developed algorithm goes a step further
and combines disambiguation with context information. The gained results are
not only about a unique topic², but also relevant for the current situation.
¹ Searched on 11 May 2010 on http://www.google.com
² It is possible that results are found about more than one topic if the context information
and the synonyms match more than one subject. If this is the case the results are prioritized.
As we have seen in the previous scenario, there is a need for filtering
search results. Let's consider the same scenario, but now with a wrapper around
the search engine that removes all results that do not include at least one of
our context-aware synonyms³. The exact computation is discussed in the
following sections. For now it is only important that the context of the user
includes tags that describe artificial intelligence, so that search results about topics
that do not occur in the context are no longer returned.
Figure 2: Excerpt of the search result AI searched on http://www.google.com (filtered)
Scenario 2 - Related Context Information
The second scenario shows how the algorithm should react if there is no useful
information in the context of the searching agent, but there is some in a related
context data set.
Let's assume that an agent searches for the term dive and there is no related
information in its context cloud, but in the context of a known agent we find the
term swimming. With the help of a context similarity algorithm we determine
that the two agents are 75 % similar. The search result should
now include the terms swim, swimming, diving and dive with the priority 0.75
(or 75 %).
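The behaviour described in this scenario could be sketched as follows. The sketch assumes that a synonym block matched by the agent's own context receives priority 1.0 and that a block matched only through a related agent receives that agent's similarity value; the function name `prioritized_synonyms` and the data are hypothetical.

```python
def prioritized_synonyms(synsets, own_context, related_contexts):
    """Map each context-aware synonym to a priority in (0, 1].

    related_contexts is a list of (context_terms, similarity) pairs.
    """
    priorities = {}
    for synset in synsets:
        if synset & own_context:
            priority = 1.0  # assumption: direct matches are fully trusted
        else:
            # Best similarity among the related agents whose context matches.
            priority = max(
                (sim for terms, sim in related_contexts if synset & terms),
                default=0.0,
            )
        if priority > 0.0:
            for term in synset:
                priorities[term] = max(priority, priorities.get(term, 0.0))
    return priorities


# Hypothetical synonym block for the input term "dive".
synsets = [{"dive", "diving", "swim", "swimming"}]
own_context = {"semantic web"}
related_contexts = [({"swimming", "robotics"}, 0.75)]

result = prioritized_synonyms(synsets, own_context, related_contexts)
print(result)
```

With this data every term of the block is returned with priority 0.75, because the only match comes from the related agent.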
The thesis should show that the use of context information in combination
with synonym computation is a powerful mechanism to improve the quality
of gained data sets. The developed algorithm could be used in a variety of
use cases, such as filtering search queries on the internet or suggesting
free-time activities on mobile devices. Another real-world example is the
suggestion of movies. In such a case the developed algorithm could support your
choice by computing the synonyms for the keywords of each movie and then
filtering the computed set on the basis of the agent's context. The first step, the
computation of synonyms, is needed to achieve a higher probability of a match.
The second one takes your preferences into account.
³ The wrapper around the search engine is not part of the thesis, but illustrates how
the context-aware synonym library could be used in practice.
3 Approach
The following section describes the general approach that is used for the formal
definition of the solution and for the implementation. The phases explained here are
described in more detail in the following subsections.
Figure 3: The workflow of the computation of context-aware synonyms.
As you can see in Figure 3, the computation starts with the interaction with
an agent. It is important to note that an agent does not have to be a human being,
but could also be a piece of software.
There are two types of interaction with the entity. First there is a "passive"
or indirect way of communication. In this phase the context that is relevant for the
next phase of interaction is generated from different sources that are related to the
agent. The gained data could be stored in an RDF file,
but a better solution is to store it in a more sophisticated system, such
as an RDF repository. The static context in particular changes rarely and is
used for each computation. The easy access through a SPARQL endpoint and
the scalability of an RDF repository are big advantages.
The next phase is the "active" part, where an input string is given. This
string is used to query different dictionaries that return the synonyms. A dictionary
that suits the computation of context-aware synonyms perfectly is WordNet from
Princeton University [21]. The output is grouped into blocks, called synsets. All
synonyms in a synset have semantically the same meaning, and the block has a natural
language description of the meaning with example sentences. To simplify the computation
in later steps and to make it possible to expand the approach with other dictionaries,
the data is abstracted to an RDF file. In addition to simplifying the computation, this
abstraction hides the underlying implementation of the data retrieval part, so that the
software can later be expanded with additional dictionaries without changing the
algorithm that is responsible for the generation of context-aware synonyms. After the
synonyms have been gathered it is possible to compute the translation of each synonym in
the newly gained set. In this step the context is used for the first time, to obtain the
languages that are spoken by the agent. The expanded set is written to the
same RDF file in which the synonyms were saved in the previous step. In the scope of
this work this generated file is called the synonym cloud, analogous to the
context cloud.
The preparation for the algorithm consists of two major parts. One is the
computation of synonyms and the other the computation of context information.
After that the intersection of the two sets is performed with the help of a
SPARQL query. This approach is possible because the different
data sets are abstracted to the RDF format.
As an optional and less accurate extension the result set could be extended by
super- and sub-concepts (also called Hypernyms and Hyponyms). This method
should only be used if more context-aware synonyms are needed and
high accuracy is not a mandatory condition.
3.1 Lexicographical Synonyms (Core)
The first major task of the process is to compute all synonyms of a given input
word. In this step it does not matter whether the input comes from a human being, a
group or a piece of software. Later, when we have to compute the context in which
the synonyms were requested, we will see that this makes a difference.
As the computation of synonyms is a well-researched area, there are a lot of
databases that could be used. For the implementation part WordNet [21] and LexVo [9]
are used. Another possibility is to use a library that simplifies the dictionary
access, such as Apache Lucene [10]. If there are also non-English search terms, the
library has to be extended with dictionaries for the respective language. On the other
hand, an extension with domain-specific dictionaries is possible too. The extension
makes it possible to obtain the most accurate output for different use scenarios.
WordNet [21] is an English lexical database that suits the current work perfectly,
because the search results are grouped in so-called synsets. Synsets are
blocks of synonyms with associated example sentences and explanations of what
the synonyms in the block mean. This structure is well suited for a first
pre-computation that yields logical blocks of synonyms with semantically the
same meaning. To be able to perform computations on the data, for example to
unify the data with the context, the data has to be transformed into an RDF data
structure. Once the new, well-formed data structure is available it is possible
to extend each synonym in the block with multilingual synonyms. At this point
the LexVo [9] database is used. Although there are a lot of good multilingual
dictionaries around, LexVo [9] is used because of the easy access (a unique URI for
each word), the RDF format and the fact that the dictionary returns all multilingual
translations for an English word. With the information about which languages the
user speaks it is possible to gain a first context-aware set of results.
Through the conversion of the gained synonyms to a uniform format it is
possible to extend or substitute the computation with other dictionaries. Another
advantage is the normalization of the terms, which makes a match possible
even if the spelling differs slightly. An extension with technical dictionaries for
special domains is conceivable, for example. The ontology used to store the data
is SKOS [7]. SKOS is an abbreviation for Simple Knowledge Organization System and is
used to store knowledge about objects in a semantic way. For the
conversion from the source data format to the SKOS ontology and for
querying the abstracted data sets the framework uses Jena [2].
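The normalization step can be illustrated with a short Python sketch. The rule used here (lower-casing and dropping every character that is not a letter or digit) is an assumption inferred from the normalized values that appear in the listings of this section, for example computer_science and Computer Science both becoming computerscience. The function name `normalize` is hypothetical and the actual rule of the implementation may differ.

```python
def normalize(term):
    """Lower-case a term and drop all characters that are not letters or digits."""
    return "".join(ch for ch in term.lower() if ch.isalnum())


def same_term(a, b):
    """Two spellings match if their normalized forms are identical."""
    return normalize(a) == normalize(b)


for spelling in ["computer_science", "Computer Science", "computer-science"]:
    print(spelling, "->", normalize(spelling))
```

Comparing normalized forms instead of raw labels lets a synonym such as computer_science match a context tag written as Computer Science.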
The excerpt of an RDF file in Listing 2 shows what a synonym block looks like.
Listing 2: Synset block in raw RDF format.
<skos:Collection rdf:about="http://example.org/synonyms/AI_noun.cognition-def2">
  <rdf:object>nouns denoting cognitive processes and contents</rdf:object>
  <skos:scopeNote>noun.cognition</skos:scopeNote>
  <skos:definition>the branch of computer science that deal with writing computer programs that can solve problems creatively</skos:definition>
  <skos:example>workers in AI hope to imitate or duplicate intelligence in computers and robots</skos:example>
  <skos:hasMember>
    <rdf:Bag rdf:about="http://example.org/synonyms/computer_science">
      <skos:note>computerscience</skos:note>
      <skos:altLabel xml:lang="en">computer_science</skos:altLabel>
    </rdf:Bag>
  </skos:hasMember>
  <skos:hasMember>
    <rdf:Bag rdf:about="http://example.org/synonyms/computing">
      <skos:note>computing</skos:note>
      <skos:altLabel xml:lang="en">computing</skos:altLabel>
    </rdf:Bag>
  </skos:hasMember>
  <skos:hasMember rdf:resource="http://example.org/synonyms/AI"/>
  <skos:hasMember>
    <rdf:Bag rdf:about="http://example.org/synonyms/artificial_intelligence">
      <skos:note>artificialintelligence</skos:note>
      <skos:altLabel xml:lang="en">artificial_intelligence</skos:altLabel>
    </rdf:Bag>
  </skos:hasMember>
</skos:Collection>
The synonym blocks are not sorted and are hard for a human being to read. With
the help of XSLT it is possible to output the result in a human-readable form.
The XSLT shown in Listing 3 outputs the data as shown in Figure 4.
Listing 3: Synonym cloud XSLT for visualization in HTML
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:skos="http://www.w3.org/2009/08/skos-reference/skos.rdf#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:template match="/">
    <html>
      <body>
        <div style="text-align:center">
          <h1><font color="red">Synonym Blocks gained from CAS framework</font></h1>
        </div>
        <xsl:for-each select="//skos:Collection">
          <b>Definition: </b> <xsl:value-of select="skos:definition"/><br/>
          <b>Category: </b> <xsl:value-of select="skos:scopeNote"/> (<xsl:value-of select="rdf:object"/>)<br/>
          <b>Example: </b> <xsl:value-of select="skos:example"/><br/>
          <ul>
            <xsl:for-each select="skos:hasMember/rdf:Bag">
              <li> <xsl:value-of select="skos:altLabel"/><br/>
                <i>normalized: </i> <xsl:value-of select="skos:note"/></li>
            </xsl:for-each>
          </ul>
          <p/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Figure 4: Excerpt of synset block visualized in HTML
3.2 Conceptional Synonyms (Extension)
A useful extension to the lexicographical synonyms (see section 3.1) are conceptional
synonyms, which extend the input with super-concepts (Hypernyms) and
sub-concepts (Hyponyms).
In contrast to the lexicographical synonyms, the conceptional synonyms
are less accurate. For this reason they are implemented as an extension to the
context-aware synonyms.
The tree in Figure 5 shows the extension of the input artificial
intelligence with sub- and super-concepts of depth one.
Figure 5: Excerpt of Hypernyms and Hyponyms for the search term artificial
intelligence
They are less accurate because they extend the result set with additional
information, and depending on the depth of the search (the tree above has depth one,
because it contains only direct sub- or super-concepts) this extension could be too
general or too specific. During the tests of the implementation a maximum depth
of two proved useful. If there is no need for a large amount of additional information,
a depth of one is enough. The appropriate depth depends heavily on the use case of the
extension, so it can be changed dynamically depending on the use case
in which the library is used.
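The effect of the depth parameter can be illustrated with a small breadth-first expansion. The taxonomy below is a hypothetical toy example, not the content of Figure 5 and not data from dbpedia.org, and the names `invert` and `expand` are assumptions for this sketch.

```python
from collections import deque

# Hypothetical taxonomy: term -> direct super-concepts (hypernyms).
HYPERNYMS = {
    "artificial intelligence": ["computer science"],
    "machine learning": ["artificial intelligence"],
    "robotics": ["artificial intelligence"],
    "computer science": ["science"],
}


def invert(relation):
    """Turn a term -> super-concepts map into a term -> sub-concepts map."""
    inverted = {}
    for child, parents in relation.items():
        for parent in parents:
            inverted.setdefault(parent, []).append(child)
    return inverted


HYPONYMS = invert(HYPERNYMS)


def expand(term, relation, max_depth):
    """Collect all concepts reachable within max_depth steps, with their depth."""
    found = {}
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow the relation beyond the configured depth
        for neighbour in relation.get(current, []):
            if neighbour != term and neighbour not in found:
                found[neighbour] = depth + 1
                queue.append((neighbour, depth + 1))
    return found


print(expand("artificial intelligence", HYPERNYMS, 1))
print(expand("artificial intelligence", HYPERNYMS, 2))
print(expand("artificial intelligence", HYPONYMS, 1))
```

Keeping the depth alongside each concept would allow a caller to rank direct super- or sub-concepts above more distant ones, which is one way to limit the loss of accuracy described above.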
Because of the wide range of the data set from dbpedia.org [22], the results from
the previous step could be extended with a lot of additional terms from the given
taxonomy. This extension lowers the quality of the search result. In the worst case
the computed context-aware synonyms are extended with
terms that are not related to the current context and only weakly related to the
synonyms. For this reason this part was developed only for the integration into
the m:Ciudad [14] project and is not part of the main context-aware synonym
algorithm.
3.3 Context
The second major information cloud, besides the synonyms, is the context. It is
useful to split the context into two major categories. One is the Static Context
and the other is the Dynamic Context. The separation is made because the
computation on the two sets differs and, in addition, the Static Context should not
be touched if something in the Dynamic Context changes. Another advantage
of the distinction is that the two context types can have different
priorities. Logically the same context split was made in [15], although with
different terminology: there the dynamic context was called environmental context
and the static context was called personal context.
Although the context is classified into two parts, throughout the implementation
the two context types are stored in the same format. The difference is
indicated only by the related priority⁴. The priority value can be chosen freely and
depends on the scenario in which the algorithm is used. For example, in a scenario
where long-term synonyms are searched the static context could have a
higher priority. The principle behind the abstraction is the same as described
in section 3.1. Here we do not use the SKOS [7] ontology, but the SCOT [3] ontology.
SCOT is an abbreviation for Social Semantic Cloud of Tags. Listing 4 shows an
excerpt of a context tag in raw RDF format. The list:priority value defines the
priority that indicates whether the term is part of the static or the dynamic context.
The term in the example is part of the static context. This relation is implicit:
in the current example we have defined that the static context has
the priority 1.0. There is no other indication of which type of context the term
belongs to. That makes the data structure very flexible for extension.
Listing 4: Excerpt of the context term computer science
<scot:cooccure_tag>
  <scot:Tag rdf:about="http://example.org/context/computerscience">
    <scot:name>computerscience</scot:name>
    <scot:own_afrequency>2</scot:own_afrequency>
    <scot:own_rfrequency>1.3605442</scot:own_rfrequency>
    <list:priority>1.0</list:priority>
    <scot:synonym>computerscience</scot:synonym>
    <scot:synonym>Computer Science</scot:synonym>
    <scot:cooccure_with>http://koni.networld.to/foaf.rdf#me</scot:cooccure_with>
    <scot:cooccure_with>http://devnull.networld.to/foaf.rdf#me</scot:cooccure_with>
  </scot:Tag>
</scot:cooccure_tag>
To obtain a more human-readable output we use the XML stylesheet in Listing
5. With the help of this XSLT we receive an ordered output in HTML format,
as shown in Figure 6.
⁴ http://crschmidt.net/ns/list#priority
Listing 5: Context cloud XSLT for visualization in HTML
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:list="http://crschmidt.net/ns/list#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:scot="http://scot-project.org/scot/ns#">
  <xsl:template match="/">
    <html>
      <body>
        <h1><font color="red">Context Cloud - Sorted by Importance</font></h1>
        <h2><xsl:value-of select="//scot:Tagcloud/dc:title"/></h2>
        <xsl:value-of select="//scot:Tagcloud/dc:description" /><p/>
        <table border="1">
          <tr bgcolor="#9acd32">
            <th>#</th>
            <th>Tag Name</th>
            <th>Priority</th>
            <th>Absolute Frequency</th>
            <th>Relative Frequency</th>
            <th>URI</th>
          </tr>
          <xsl:for-each select="//scot:Tag">
            <xsl:sort select="scot:own_afrequency * list:priority"
                data-type="number"
                order="descending"/>
            <tr>
              <td><xsl:value-of select="scot:own_afrequency * list:priority"/></td>
              <td><xsl:value-of select="scot:name"/></td>
              <td><xsl:value-of select="list:priority"/></td>
              <td><xsl:value-of select="scot:own_afrequency"/></td>
              <td><xsl:value-of select="scot:own_rfrequency"/></td>
              <td><xsl:value-of select="@rdf:about"/></td>
            </tr>
          </xsl:for-each>
        </table>
        <p/>
        ---<br/>
        <i>#</i>: The sorting value is calculated as follows: <i>Absolute Frequency * Priority</i>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
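The ordering applied by the stylesheet (absolute frequency multiplied by priority, descending) can be reproduced outside XSLT. The following Python sketch assumes the tags have already been read into simple objects; the `Tag` class and all sample values except the computerscience entry from Listing 4 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    name: str
    afrequency: int   # corresponds to scot:own_afrequency
    priority: float   # corresponds to list:priority

    @property
    def sort_value(self):
        """Importance as used for the ordering: absolute frequency * priority."""
        return self.afrequency * self.priority


def sort_by_importance(tags):
    """Return the tags ordered from most to least important."""
    return sorted(tags, key=lambda tag: tag.sort_value, reverse=True)


tags = [
    Tag("computerscience", 2, 1.0),  # values from Listing 4
    Tag("robotics", 3, 0.5),         # hypothetical lower-priority tag
    Tag("semanticweb", 5, 1.0),      # hypothetical
]

for tag in sort_by_importance(tags):
    print(tag.name, tag.sort_value)
```

A frequent tag with a low priority can therefore rank below a rarer tag with full priority, as robotics does here.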
Figure 6: Excerpt of the sorted context visualized in HTML
The following two sections explain the two context types in more depth.
Static Context
As the name implies, the Static Context changes only from time to time, usually
when the agent's interests change. This type of context is an extension
of the general definition of the concept of a context. In general it can be
said that this type of context describes the agent that searches for synonyms.
If the agent is a person, this could be the person's interests or the tags of the
person's publications. If the gained information is not enough to compute suitable
context-aware synonyms, it can be extended with context information from a
social network. Similar to the approach described in [31], the similarity algorithm
presented in this section takes into account the connection between two (or more) agents.
Useful connection information from a social network are the context clouds of
known agents, further called "friends". As a friend we define a known agent with
a similarity greater than S, where S is called the friendship threshold and is expressed
as a percentage value. More generally we reach the following formula:
$$friend_{2,1} = sim(agent_1, agent_2) > S$$
$friend_{2,1}$ ... If true, $agent_2$ is defined as a friend of $agent_1$.
$sim(agent_1, agent_2)$ ... Computes the similarity of the two agents as a percentage value.
$S$ ... Threshold value that indicates who is a friend. It should be set during the integration to gain the optimal performance for the scenario.
The term friend is more abstract than the real-world term, because it is
possible that the real agents have never met each other. It could be interpreted
as possible friends. Nevertheless it is useful to define the relation friend in
such a way that we can say:
The higher the value of $sim(agent_1, agent_2)$, the more likely it is
that the parts of one agent's context that do not match also describe the
other agent.
An argument for this definition is that people with the same background or
social status are likely to have similar interests. That means that if the majority of
the interests are the same, we can assume that the two agents are similar, with
a likelihood given by the presented algorithm.
In the scope of this work and for the implementation the sim function is
defined as follows.
$$sim(agent_1, agent_2) = \frac{\sum^{\#(matching)} absoluteFrequency}{\sum^{\#(total)} absoluteFrequency} \tag{1}$$
$$sim_{agent_1,agent_2} = \frac{sim(agent_1, agent_2) + sim(agent_2, agent_1)}{2} \tag{2}$$
Figure 7: Compute the similarity of two Contexts
For convenience the implementation uses FOAF [5] files to describe an agent.
The FOAF file can point to other RDF files that can be used to evaluate further data
related to the static context, for example published articles that are tagged.
The use of linked data as abstraction is useful to consolidate information from
different sources, with the advantage that the underlying algorithm stays the same
when sources change.
To handle the intersection of synonyms and context we have to abstract the
information further. After the evaluation of the FOAF file and all related files
that include static context we have a further RDF file written in the SCOT [3]
ontology. This information cloud is the first to include priority values as well.
It is important to mention that the priority consists of two types of values. The
first is the quantity and the second the quality. The quantity only counts how
often a tag occurs in the whole context. The value is stored as an absolute
amount. The relative value is not stored explicitly, so the blocks stay consistent
even if the total amount of tags changes. If needed, the relative value is computed
from the total amount and the absolute values. The other value is the quality.
For the own context the quality is 1 (100% accuracy). For tags added from
the context of friends the quality value is $sim(agent_{me}, agent_{friend})$ if the tag
does not occur in the own context; otherwise it is also 1.
In the following example the abbreviation M is used for myself
and F for friend. Because the similarity function is bidirectional, the agents
are interchangeable. Figure 8 shows the two entities, the shared context tags
and the absolute frequency in the context cloud for the example scenario.
Figure 8: Excerpt of context cloud relation.
Context Tags             Absolute Frequency   Common
Artificial Intelligence  3                    Yes
Computer Science         1                    Yes
Semantic Web             5                    Yes
Business                 1                    No
Computer Security        1                    No
Computer Network         1                    No
Diving                   1                    No
Backtracker              2                    No
MokSec                   2                    No
Research                 3                    No
Networld                 1                    No
Networld Team            1                    No
Total                    22
Table 1: Tags that occur in the context cloud of M
We first compute the similarity of the context cloud of M in relation to F
and vice versa with the help of formula (1).
Context Tags             Absolute Frequency   Common
Artificial Intelligence  1                    Yes
Computer Science         1                    Yes
Semantic Web             1                    Yes
Robotics                 1                    No
Web 2.0                  1                    No
Total                    5
Table 2: Tags that occur in the context cloud of F
$$sim(agent_M, agent_F) = \frac{3+1+5}{3+1+1+1+5+1+1+2+2+3+1+1} = \frac{9}{22} \approx 0.41$$
$$sim(agent_F, agent_M) = \frac{1+1+1}{1+1+1+1+1} = \frac{3}{5} = 0.60$$
The average of these two values is computed with formula (2).
$$sim = \frac{0.41 + 0.60}{2} = \frac{1.01}{2} \approx 0.51$$
We compute the similarity for M and F in both directions and then take the average. The average is needed because the friend relationship is bidirectional, so the two agents cannot have different friend values. More generally speaking, M cannot be more of a friend to F than F is to M. Such an asymmetric measure of friendship may occur in real-world scenarios, but it is not permitted in our use case, where we compute the similarity of interests.
In addition, the average weakens the effect that the number of tags in the two context clouds may differ.
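The computation above can be sketched in Java as follows. This is a minimal sketch, not the thesis implementation: the class and method names are hypothetical, and the directed similarity is read off the worked example as the summed frequency of the shared tags divided by the total frequency of the agent's own cloud.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch of the similarity and extension steps; names are hypothetical. */
class ContextSimilarity {

    /** Share of the total tag frequency in 'own' that belongs to tags also present in 'other'. */
    static double directedSimilarity(Map<String, Integer> own, Map<String, Integer> other) {
        int common = 0;
        int total = 0;
        for (Map.Entry<String, Integer> e : own.entrySet()) {
            total += e.getValue();
            if (other.containsKey(e.getKey())) {
                common += e.getValue();
            }
        }
        return total == 0 ? 0.0 : (double) common / total;
    }

    /** Average of both directions, so that both agents receive the same value. */
    static double similarity(Map<String, Integer> a, Map<String, Integer> b) {
        return (directedSimilarity(a, b) + directedSimilarity(b, a)) / 2.0;
    }

    /** Quality values of the extended cloud: 1.0 for own tags, sim for tags only the friend has. */
    static Map<String, Double> extendedQuality(Map<String, Integer> own, Map<String, Integer> friend) {
        double sim = similarity(own, friend);
        Map<String, Double> quality = new LinkedHashMap<>();
        for (String tag : own.keySet()) {
            quality.put(tag, 1.0);
        }
        for (String tag : friend.keySet()) {
            quality.putIfAbsent(tag, sim);
        }
        return quality;
    }

    /** Context cloud of M with the frequencies from Table 1. */
    static Map<String, Integer> cloudOfM() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("Artificial Intelligence", 3);
        m.put("Computer Science", 1);
        m.put("Semantic Web", 5);
        m.put("Business", 1);
        m.put("Computer Security", 1);
        m.put("Computer Network", 1);
        m.put("Diving", 1);
        m.put("Backtracker", 2);
        m.put("MokSec", 2);
        m.put("Research", 3);
        m.put("Networld", 1);
        m.put("Networld Team", 1);
        return m;
    }

    /** Context cloud of F with the frequencies from Table 2. */
    static Map<String, Integer> cloudOfF() {
        Map<String, Integer> f = new LinkedHashMap<>();
        f.put("Artificial Intelligence", 1);
        f.put("Computer Science", 1);
        f.put("Semantic Web", 1);
        f.put("Robotics", 1);
        f.put("Web 2.0", 1);
        return f;
    }
}
```

With the two example clouds, directedSimilarity yields 9/22 and 3/5. Without intermediate rounding the average is about 0.505; the value 0.51 in the text results from averaging the rounded values 0.41 and 0.60.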
After the computation shown above, we receive the new, extended context cloud described in Table 3.
Context Tags             Absolute Frequency  Similarity  Common
Artificial Intelligence  3                   1           Yes
Computer Science         1                   1           Yes
Semantic Web             5                   1           Yes
Business                 1                   1           No
Computer Security        1                   1           No
Computer Network         1                   1           No
Diving                   1                   1           No
Backtracker              2                   1           No
MokSec                   2                   1           No
Research                 3                   1           No
Networld                 1                   1           No
Networld Team            1                   1           No
Robotics                 1                   0.51        No
Web 2.0                  1                   0.51        No
Total                    24
Table 3: Tags that occur in the context cloud of M after extension
Dynamic Context
The definition of the dynamic context could be taken from the entry for the word context in any English dictionary. The following definition was taken from WordNet [21].
S: (n) context, circumstance, setting (the set of facts or circumstances that surround a situation or event) "the historical context"
For the purpose of this thesis the following dynamic context parts are conceivable:
Location A location could be expressed as a street name, a city name, ... or, more accurately, as GPS coordinates in the form latitude/longitude. Indirect location data is also possible, such as GSM cells or unique wireless hotspots (the MAC address of the router could be used to identify a hotspot).
Appointments Each appointment is sorted into a category or has a description. This information could be used to extract useful tags for the context.
For the computation of synonyms it is not always possible to use all available context data types. A good example is the location given as GPS coordinates. The developed algorithm cannot use raw GPS coordinates, but only concepts that describe the currently valid context. One conceivable approach to solve this problem is to evaluate the coordinates to a location concept. For example, a GPS coordinate could evaluate to the concept university. With this location specification the synonyms can be computed. In the integration into the m:Ciudad [14] framework the locations are returned in this manner.
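The evaluation of raw coordinates to a concept could look like the following sketch. It assumes a hypothetical lookup table of rectangular regions; the class names, the region model and the coordinates used in the usage note are invented for illustration and are not taken from the thesis or from m:Ciudad.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Illustrative sketch: maps GPS coordinates to a location concept via a hypothetical region table. */
class LocationConcepts {

    /** A rectangular area (in degrees) that is labelled with a concept such as "university". */
    static final class Region {
        final String concept;
        final double minLat, maxLat, minLon, maxLon;

        Region(String concept, double minLat, double maxLat, double minLon, double maxLon) {
            this.concept = concept;
            this.minLat = minLat;
            this.maxLat = maxLat;
            this.minLon = minLon;
            this.maxLon = maxLon;
        }

        boolean contains(double lat, double lon) {
            return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
        }
    }

    private final List<Region> regions = new ArrayList<>();

    void register(Region region) {
        regions.add(region);
    }

    /** Returns the concept of the first registered region that contains the coordinate, if any. */
    Optional<String> resolve(double lat, double lon) {
        for (Region r : regions) {
            if (r.contains(lat, lon)) {
                return Optional.of(r.concept);
            }
        }
        return Optional.empty();
    }
}
```

After registering, for example, new Region("university", 47.26, 47.27, 11.34, 11.35) (made-up bounds), a coordinate inside that box resolves to the tag university, which can then be added to the dynamic context like any other tag.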
3.4 Intersection Algorithm
Because the two data sets are well prepared, the actual intersection step works in a stable and reliable way. It does not matter which sources are added to the sets: thanks to the abstraction to a uniform data format, the intersection is always the same SPARQL query. Figure 9 visualizes the concept of the data abstraction that is used throughout the work. The picture shows the abstraction from different data sources (Syn 1 to Syn X and Context 1 to Context Y), but also the possibility to formalize the produced output in different data formats (Result View 1 to Result View Z). This makes the algorithm portable and easy to integrate into third-party applications. The colored entities are the parts used by the algorithm to compute context-aware synonyms. RDF Synonym Cloud and RDF Context Cloud are the input sets, and Context-Aware Synonyms is the output set after a successful computation.
Figure 9: Data Abstraction Visualization
In addition, the intersection is simplified because dynamic and static context are handled in the same way. The difference between the two context types changes nothing in the computation and does not complicate the output. More precisely, the context types are consolidated into one big context and the difference is shown only indirectly by the priority of the entries. To keep the flexibility, the relative weighting of dynamic and static context can be changed by the developer. In this work the dynamic context has a priority that is 1.5 times that of the static context. These values are chosen to demonstrate a real-time scenario, in which the dynamic context information is more important and the context of the searching agent is only used to optimize the search results. One such scenario is the search for facilities on the go, where the current location and time classifications play a bigger role in a first pre-filtering step. In a second step the personal preferences are included.
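The consolidation of the two context types can be sketched as follows. The sketch makes two assumptions that the text does not fix: the value 1.5 is applied as a multiplicative weight on the priority of dynamic entries, and a tag that occurs in both contexts keeps the higher of the two priorities. All names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch: merges static and dynamic context into one cloud of tag priorities. */
class ContextConsolidation {

    /** Weight of dynamic entries relative to static ones; adjustable by the developer. */
    static final double DEFAULT_DYNAMIC_WEIGHT = 1.5;

    static Map<String, Double> consolidate(Map<String, Double> staticContext,
                                           Map<String, Double> dynamicContext,
                                           double dynamicWeight) {
        // Static entries keep their priority unchanged.
        Map<String, Double> merged = new LinkedHashMap<>(staticContext);
        for (Map.Entry<String, Double> e : dynamicContext.entrySet()) {
            double weighted = e.getValue() * dynamicWeight;
            // Assumption: for tags present in both contexts the higher priority wins.
            merged.merge(e.getKey(), weighted, Math::max);
        }
        return merged;
    }
}
```

Because the result is a single map from tag name to priority, the later intersection step does not need to know which context type an entry came from.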
Context ∩ Synonyms
The SPARQL query in Listing 6 is used to find all synonym blocks that include a term with the same name as a tag in the context. For the intersection the two ontologies SKOS [7] and SCOT [3] are used.
Listing 6: Intersection of Synonyms and Static Context with SPARQL
PREFIX scot: <http://scot-project.org/scot/ns#>
PREFIX skos: <http://www.w3.org/2009/08/skos-reference/skos.rdf#>

SELECT DISTINCT *
FROM </path/to/searchterm.rdf>
FROM </path/to/contextcloud.rdf>
WHERE {
  # Synonym block with a member whose normalized name is ?tagname
  ?synset   skos:hasMember  ?bagLabel .
  ?bagLabel skos:note       ?tagname .
  # Context tag with the same normalized name (the actual intersection)
  ?tagLabel scot:name       ?tagname .
  ?tagLabel scot:synonym    ?orgname .
  # All synonyms and the definition of the matching block
  ?synset   skos:hasMember  ?members .
  ?members  skos:altLabel   ?synonym .
  ?synset   skos:definition ?definition .
}
The SPARQL query in Listing 6 is executed over all data sources specified in the FROM clauses. Using the same variable ?tagname for the two properties skos:note and scot:name performs the intersection of the two data sources and returns the identifiers of all matching subtrees. From these subtrees all synonyms and additional information are extracted.
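The join performed by Listing 6 can also be illustrated without an RDF store. The following sketch uses plain Java collections instead of Jena and SPARQL, so it is a simplified stand-in and not the thesis implementation. The class names are hypothetical, and the normalization (lower case, separators removed) is an assumption derived from the example files in the appendix, where three-toed sloth appears as threetoedsloth.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative sketch of the intersection of synonym blocks and context tags. */
class SynonymIntersection {

    /** One synonym block: an identifier plus the labels of its members. */
    static final class SynBlock {
        final String id;
        final List<String> labels;

        SynBlock(String id, String... labels) {
            this.id = id;
            this.labels = Arrays.asList(labels);
        }
    }

    /** Assumed normalization: lower case, with whitespace, underscores and hyphens removed. */
    static String normalize(String label) {
        return label.toLowerCase(Locale.ROOT).replaceAll("[\\s_-]+", "");
    }

    /** Returns every block that has at least one member whose normalized label is a context tag name. */
    static List<SynBlock> intersect(List<SynBlock> blocks, Set<String> contextTagNames) {
        List<SynBlock> result = new ArrayList<>();
        for (SynBlock block : blocks) {
            for (String label : block.labels) {
                if (contextTagNames.contains(normalize(label))) {
                    result.add(block);
                    break; // one matching member is enough to select the whole block
                }
            }
        }
        return result;
    }
}
```

For the search term AI and a context that contains artificialintelligence, only the block with the members computer science, computing, AI and artificial intelligence is selected; the blocks about insemination and sloths are dropped because none of their members occurs in the context.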
4 Implementation
A further goal of this thesis was an implementation that proves that the developed algorithm works as expected. In addition, this implementation was then integrated, in a slightly modified form, into the m:Ciudad [14] framework.
The following section describes a concrete implementation of the computation of context-aware synonyms that was discussed theoretically in the previous sections. The implementation can be divided into the following parts:
Framework The core part that handles the computation is written as a framework and can be deployed as a library in other applications. This part was also used for the integration into the m:Ciudad framework [14].
Local Test Implementations For testing the framework and for use during development there are two implementations that print the received data to the standard output. One implementation uses the wrapper class ComputeCAS; the other uses the components directly and can be used in a more flexible way, for example if other dictionaries or context sources have to be integrated.
Server/Client Implementation Another implementation, mainly written for presentation purposes and to demonstrate a real-world scenario, is a server/client architecture with the computation on the server side and the visualization on the client side.
Before the different implementations are described, the technologies used are explained.
4.1 Used Technology
The implementation was written in the programming language Java, version 1.6 [1]. The core implementation depends on the following libraries:
Jena [2] Writing, reading and querying of the RDF files, especially for the abstraction of the context and the synonyms.
JWI [17] The WordNet API library that queries the dictionary, which has to be stored locally.
For demonstration purposes a mobile application that runs on the Android platform is used. The minimal requirement to run the client is Android 1.5 (platform API level 3) [12], although it should also run on newer devices. The code on the mobile device is written in a subset of Java, but the generated class files run on the Dalvik Virtual Machine [13]. This part is optional and is intended only for demonstration purposes and as a real-world example of how the framework could be used.
4.2 Architecture
Figure 10: Architecture view from a developer perspective.
Figures 10 and 11 explain the implementation of the algorithm from the perspective of a developer who uses the framework. The ComputeCAS class is a wrapper that simplifies the use. Because the components are loosely coupled, the framework is flexible and can be adapted easily. Loose coupling means that the component that gathers the synonyms (the SynToRDF class) and the component that computes the context (the Context class) are completely independent. The IntersectionAlgorithm uses only the abstracted RDF files from the other two classes. The dependence exists not in the code, but on the data representation layer. This makes the use of RDF ontologies a central point. In addition, one component can easily be substituted by another implementation as long as the abstracted data does not change. Figure 10 shows the static dependencies between the classes. The sequence diagram in Figure 11, on the other hand, shows the interaction between the classes at runtime. The returned context-aware synonyms are encapsulated in the IResultSet interface. The concrete implementation ResultSet includes CASTag objects that implement the IResultTag interface. The context-aware synonyms have no semantic representation of their own, which makes it possible to store the result in different data formats without much effort.
Figure 11: Sequence diagram from the perspective of the ComputeCAS class.
4.2.1 Package: synonyms
To simplify the extension of the framework with other dictionaries, the synonyms package abstracts the data gained from different dictionaries and writes it to a special RDF file that mainly uses the SKOS [7] ontology. Each of the dictionaries used implements the following interface.
Listing 7: Synonym Java Interface
package interfaces;

import java.util.Vector;

public interface ISynonyms {
    public Vector<String> getWordDefinition(String word);
    public Vector<SynEntry> getSynonymList(String word);
}
Figure 12: Use Case diagram that describes what task have to be fulfilled to compute context-aware synonyms
Implementing this interface ensures that the intersection algorithm works on the newly added set. The other two important classes in the package are SynEntry and SynToRDF. SynEntry implements the interface ISynEntry (see Listing 8) and combines synonyms with the same meaning into a logical block. Such a block includes the following values:
word The search term that is entered by an agent.
word type The word type, such as adjective, adverb, noun or verb.
word type description A short description of the word type.
language list For each synonym there exists a related language.
synonym list Synonyms that have the same meaning.
Listing 8: SynEntry Java Interface
package interfaces;

import java.util.HashMap;

public interface ISynEntry {
    public abstract void setSynonym(String synonym, String language);
    public abstract void setHypernym(String synonym, String language);
    public abstract void setHyponym(String synonym, String language);
    public abstract void setWordTypeDefinition(String wordtypeDefintion);

    public abstract String getWord();
    public abstract String getDefinition();
    public abstract String getWordType();
    public abstract String getWordTypeDefinition();
    public abstract HashMap<String, String> getSynMap();
    public abstract HashMap<String, String> getHypernymMap();
    public abstract HashMap<String, String> getHyponymMap();
}
The class SynToRDF handles the writing of the abstracted synonyms. For example, it writes the translated words into the correct block. If a dictionary is added or removed, this class has to be changed as well. SynToRDF then writes all synonyms gained for a certain search term to an RDF file.
4.2.2 Package: context
Figure 13: Describes the dependencies in the context package.
On the context side the class ContextTag is used to ensure the right handling of the context abstraction. To simplify the access, a manager class named ContextTagCloud is used. This class handles the set of all context tags and can be used to extend the context cloud with information from different sources.
Listing 9: Context Tag Java Interface
package interfaces;

import java.util.Vector;

public interface IContextTag {
    public abstract void setTagName(String tagName);
    public abstract void setOrgSpelling(String tagOrgName);
    public abstract void setAbsoluteFrequency(int absoluteFrequency);
    public abstract void setPriority(float priority);
    public abstract void setCooccurURI(String uri);
    public abstract void incrementFrequency();

    public abstract String getTagName();
    public abstract int getAbsoluteFrequency();
    public abstract float getPriority();
    public abstract Vector<String> getCooccurURI();
    public abstract Vector<String> getOrgSpelling();
}
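How a tag cloud manager might collect tags can be sketched as follows. This is not the ContextTag or ContextTagCloud class of the framework: the class names, fields and the normalization rule are assumptions, modelled loosely on the interface above and on the example context file in the appendix, where the name semanticweb is stored together with the original spellings semanticweb and Semantic Web.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch of a context tag cloud; not the framework's ContextTagCloud. */
class SimpleContextTagCloud {

    /** One context tag with its normalized name, frequency, priority and original spellings. */
    static final class Tag {
        final String name;
        int absoluteFrequency = 0;
        float priority = 1.0f;
        final List<String> originalSpellings = new ArrayList<>();

        Tag(String name) {
            this.name = name;
        }
    }

    private final Map<String, Tag> tags = new LinkedHashMap<>();

    /** Assumed normalization: lower case, with whitespace, underscores and hyphens removed. */
    static String normalize(String spelling) {
        return spelling.toLowerCase(Locale.ROOT).replaceAll("[\\s_-]+", "");
    }

    /** Adds one occurrence of a tag; different spellings of the same tag share one entry. */
    void add(String spelling) {
        String name = normalize(spelling);
        Tag tag = tags.get(name);
        if (tag == null) {
            tag = new Tag(name);
            tags.put(name, tag);
        }
        tag.absoluteFrequency++;
        if (!tag.originalSpellings.contains(spelling)) {
            tag.originalSpellings.add(spelling);
        }
    }

    /** Returns the tag with the given normalized name, or null if it is unknown. */
    Tag get(String name) {
        return tags.get(name);
    }

    /** Sum of all absolute frequencies, needed to derive relative frequencies on demand. */
    int totalFrequency() {
        int total = 0;
        for (Tag tag : tags.values()) {
            total += tag.absoluteFrequency;
        }
        return total;
    }
}
```

Adding Semantic Web and semanticweb to the cloud produces a single entry with frequency 2 and two original spellings, which is the shape needed for matching against the normalized synonym names.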
4.3 Server/Client Architecture
The following section briefly describes the proof-of-concept implementation that shows how the framework could be used in a real-world scenario.
4.3.1 Server
The server component uses a very simple proprietary protocol (footnote 5) to exchange information with the client. In this scenario the server is responsible for the synonym computation and for the abstraction of the context information. The client sends the context data in the form of a URL to a FOAF file. For the test scenario this context is sufficient, also because the publications and the related tags are reachable from the FOAF file.
4.3.2 Client
The client is a simple GUI front end that makes it easier to interact with the framework. As shown in Figure 14a, the main part consists only of an input field and a search button. After setting the server variables and the URL to the context data (see Figures 14b and 14c) it is possible to search for synonyms. For this purpose the client first sends the context URLs to the server and then the search term. After the computation of the synonyms on the server side the result is sent back to the client. The client notifies the user with a message at the top left of the screen and prints the results in a list (see Figure 14d).
Implementing the client on a mobile device is useful not only because it allows computing context-aware synonyms on the go, but also because it provides a dynamic context for the current user. For most people such a device is private equipment that stores a lot of information about them. This makes it possible to compute a detailed dynamic context.
Footnote 5: CSV (comma-separated values) is used for the exchange of request/response messages.
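The exchange of CSV messages between client and server could be sketched as follows. The actual message layout of the protocol is not documented here, so everything in this sketch is hypothetical: the class name, the quoting rules and the message keywords used in the usage note are assumptions for illustration only. Line breaks inside fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of a CSV line codec; the real protocol format is an assumption. */
class CsvMessages {

    /** Joins fields into one CSV line; fields that contain a comma or a quote are quoted. */
    static String encode(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String field = fields.get(i);
            if (field.contains(",") || field.contains("\"")) {
                // Quotes inside a quoted field are doubled.
                sb.append('"').append(field.replace("\"", "\"\"")).append('"');
            } else {
                sb.append(field);
            }
        }
        return sb.toString();
    }

    /** Splits one CSV line into its fields, honoring quoted fields and doubled quotes. */
    static List<String> decode(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quoted) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"');
                        i++; // skip the second quote of the escaped pair
                    } else {
                        quoted = false;
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                quoted = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

A client could, under these assumptions, first send a line such as CONTEXT followed by the FOAF URL and then a line such as SEARCH followed by the search term, and decode the server's answer line into the list of synonyms.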
4.4 Integrated Version - The Library
A slightly adapted version was integrated into the m:Ciudad framework [14]. m:Ciudad is a service infrastructure that allows users to generate composite services on the fly. One difference from the test implementation is that the context is not related to one agent, but to a group of agents, a community. Another difference is that the data sources are not RDF files; the data is read directly from the internal data store. For a detailed comparison between the two implementations see Table 4. The integration into the m:Ciudad framework shows how flexibly the implementation can be used.
                Standalone Test Application                              m:Ciudad Integration
Context Source  FOAF file with related information (e.g. publications)  UDL-SP (m:Ciudad Service Profile)
Context for...  one agent                                                a community
Synonym Source  WordNet, LexVo                                           WordNet, DBpedia.org
Table 4: Differences between the standalone test application and the m:Ciudad integration
5 Evaluation
The focus of this thesis was to develop an algorithm that computes synonyms that are relevant for a given context. Through the abstraction of the two sets it was possible to use well-defined, publicly available data sets, such as WordNet for the computation of the synonym blocks in English and LexVo [9] for the extension with multilingual terms. Even more important is the abstraction of the context information, because different data sources have to be evaluated to reach a nearly complete representation of the real-world environment. This approach makes the framework easily extensible with additional synonym and context sources. The choice of this architecture also mitigates the fact that the algorithm is only as good as the underlying data sets. If the data sets are incomplete or do not define the right context, the algorithm returns wrong results. The decision which data sets should be used depends on the scenario and can only be made during the integration. In addition to the flexible core algorithm, a second approach was developed that extends the synonyms with hypernyms and hyponyms. This algorithm uses the dbpedia.org [22] data set. This implementation is not related to any context and should be used only as an extension of the core part if there are not enough results. Another extension could be the prediction of words in a sentence as described in [19]. This extension was not part of this work because there the context is related only to the current sentence and not to single agents or groups.
The developed algorithm bridges the gap between context information systems, such as those described in Context-Aware Services for Mobile Users - Technology and User Experiences [15] and Context-Aware Query Processing on the Semantic Web [6], and synonym databases (e.g. WordNet [21]). In contrast to the papers about context computation, this work assumes that the context is available and starts from this more abstract layer. This makes it possible to shift the focus away from data gathering towards the intersection of context and synonyms.
After the design of the computation of context-aware synonyms, the implementation was integrated into the m:Ciudad context-aware search engine for mobile services that are generated on the fly [8]. The purpose of the algorithm is to use the internally stored context of user groups and the given search term to optimize the search results. Because of the broader context information (footnote 6), the additional extension with hypernyms and hyponyms was used.
The major drawback of the presented solution is that it works only if the underlying data sets are correct and include enough content to reason about. This problem can be minimized by choosing well-known synonym databases on the one side and by using information from data sets that are generated by the user on the other side. For example, context information could be filtered from social networks or from mobile devices.
To reach an optimal result set the algorithm should receive scenario-related data as input. For example, domain-specific synonym dictionaries give the best results for the synonym sets. The same is true for the context information: different sources should be chosen that describe the agent as completely as possible.
The algorithm is most successful if the input term is an abbreviation that expands to several semantically different synonym blocks and the searching agent has domain-specific information in his context that is related to the search term.
6 Further Work
The algorithm presented in this thesis shows one possible way to compute context-aware synonyms. The related application provides a highly flexible implementation that can be used as a starting point for a concrete real-world integration. To use the algorithm productively, the data sources should be improved and adapted to the scenario. The improvement concerns not only domain-specific synonym dictionaries, but also the context information that describes the searching agent or group. Context information could be extracted, for example, from social networks.
The focus of this work was only the computation of context-aware synonyms and not the creation of an ontology that represents the search results. In the presented scenario, and probably in most scenarios, the gained data has to be handled by a third-party application. So it makes sense to have an ontology that represents context-aware synonyms.
A further improvement of the algorithm is to parse the definitions and the example sentences in the synonym cloud and to take the gained information into account during the intersection process. This extension requires understanding the semantics of natural-language sentences.
A good approach to an optimized output is the improvement and/or the extension of the underlying data sets.
Footnote 6: m:Ciudad services use context information related to groups and not to single agents.
A Appendix - RDF example files
Listing 10: Synonyms for AI abstracted as RDF file
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2009/08/skos-reference/skos.rdf#">
  <skos:Collection rdf:about="http://example.org/synonyms/AI_noun.act-def4">
    <rdf:object>nouns denoting acts or actions</rdf:object>
    <skos:scopeNote>noun.act</skos:scopeNote>
    <skos:definition>the introduction of semen into the oviduct or uterus by some means other than sexual intercourse</skos:definition>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/AI">
        <skos:note>ai</skos:note>
        <skos:altLabel xml:lang="en">AI</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/insemination">
        <skos:note>insemination</skos:note>
        <skos:altLabel xml:lang="en">insemination</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/artificial_insemination">
        <skos:note>artificialinsemination</skos:note>
        <skos:altLabel xml:lang="en">artificial insemination</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
  </skos:Collection>
  <skos:Collection rdf:about="http://example.org/synonyms/AI_noun.animal-def3">
    <rdf:object>nouns denoting animals</rdf:object>
    <skos:scopeNote>noun.animal</skos:scopeNote>
    <skos:definition>a sloth that has three long claws on each forefoot and each hindfoot</skos:definition>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/three-toed_sloth">
        <skos:note>threetoedsloth</skos:note>
        <skos:altLabel xml:lang="en">three-toed sloth</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/Bradypus_tridactylus">
        <skos:note>bradypustridactylus</skos:note>
        <skos:altLabel xml:lang="en">Bradypus tridactylus</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/tree_sloth">
        <skos:note>treesloth</skos:note>
        <skos:altLabel xml:lang="en">tree sloth</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/sloth">
        <skos:note>sloth</skos:note>
        <skos:altLabel xml:lang="en">sloth</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/ai">
        <skos:note>ai</skos:note>
        <skos:altLabel xml:lang="en">ai</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
  </skos:Collection>
  <skos:Collection rdf:about="http://example.org/synonyms/AI_noun.group-def1">
    <rdf:object>nouns denoting groupings of people or objects</rdf:object>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/authority">
        <skos:note>authority</skos:note>
        <skos:altLabel xml:lang="en">authority</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:scopeNote>noun.group</skos:scopeNote>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/Army_Intelligence">
        <skos:note>armyintelligence</skos:note>
        <skos:altLabel xml:lang="en">Army Intelligence</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/office">
        <skos:note>office</skos:note>
        <skos:altLabel xml:lang="en">office</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:definition>an agency of the United States Army responsible for providing timely and relevant and accurate and synchronized intelligence to tactical and operational and strategic level commanders</skos:definition>
    <skos:hasMember rdf:resource="http://example.org/synonyms/AI"/>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/federal_agency">
        <skos:note>federalagency</skos:note>
        <skos:altLabel xml:lang="en">federal agency</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/agency">
        <skos:note>agency</skos:note>
        <skos:altLabel xml:lang="en">agency</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/government_agency">
        <skos:note>governmentagency</skos:note>
        <skos:altLabel xml:lang="en">government agency</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/bureau">
        <skos:note>bureau</skos:note>
        <skos:altLabel xml:lang="en">bureau</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
  </skos:Collection>
  <skos:Collection rdf:about="http://example.org/synonyms/AI_noun.cognition-def2">
    <rdf:object>nouns denoting cognitive processes and contents</rdf:object>
    <skos:scopeNote>noun.cognition</skos:scopeNote>
    <skos:definition>the branch of computer science that deal with writing computer programs that can solve problems creatively</skos:definition>
    <skos:example>workers in AI hope to imitate or duplicate intelligence in computers and robots</skos:example>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/computer_science">
        <skos:note>computerscience</skos:note>
        <skos:altLabel xml:lang="en">computer science</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/computing">
        <skos:note>computing</skos:note>
        <skos:altLabel xml:lang="en">computing</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
    <skos:hasMember rdf:resource="http://example.org/synonyms/AI"/>
    <skos:hasMember>
      <rdf:Bag rdf:about="http://example.org/synonyms/artificial_intelligence">
        <skos:note>artificialintelligence</skos:note>
        <skos:altLabel xml:lang="en">artificial intelligence</skos:altLabel>
      </rdf:Bag>
    </skos:hasMember>
  </skos:Collection>
</rdf:RDF>
Listing 11: Context abstracted as RDF file
<?xml v e r s i o n=” 1 . 0 ”?>
<r d f :RDF
xmlns : l i s t =” h t t p : / / c r s c h m i d t . n e t / ns / l i s t #”
xmlns : r d f=” h t t p : / /www. w3 . o r g /1999/02/22 − r d f −syntax−ns#”
xmlns : s c o t=” h t t p : / / s c o t −p r o j e c t . o r g / s c o t / ns#”>
<s c o t : C o o c u r r e n c e r d f : about=” h t t p : / / n e t w o r l d . t o /? s i o c t y p e=
p o s t&amp ; s i o c i d =97”>
<s c o t : c o o c c u r e t a g >
<s c o t : Tag r d f : about=” h t t p : / / example . o r g / c o n t e x t /
semanticweb ”>
< l i s t : p r i o r i t y >1.0</ l i s t : p r i o r i t y >
<s c o t : c o o c c u r e w i t h >h t t p : // n e t w o r l d . t o /?
s i o c t y p e=p o s t&amp ; s i o c i d =97</ s c o t :
cooccure with>
<s c o t : c o o c c u r e w i t h >h t t p : // d e v n u l l . n e t w o r l d . t o /
f o a f . r d f#me</ s c o t : c o o c c u r e w i t h >
<s c o t : c o o c c u r e w i t h >h t t p : // n e t w o r l d . t o /?
s i o c t y p e=p o s t&amp ; s i o c i d =207</ s c o t :
cooccure with>
<s c o t : o w n a f r e q u e n c y >7</ s c o t : o w n a f r e q u e n c y >
<s c o t : c o o c c u r e w i t h >h t t p : // n e t w o r l d . t o /?
s i o c t y p e=p o s t&amp ; s i o c i d =229</ s c o t :
cooccure with>
<s c o t : synonym>semanticweb </ s c o t : synonym>
<s c o t : c o o c c u r e w i t h >h t t p : // k o n i . n e t w o r l d . t o / f o a f
. r d f#me</ s c o t : c o o c c u r e w i t h >
<s c o t : synonym>Semantic Web</ s c o t : synonym>
<s c o t : c o o c c u r e w i t h >h t t p : // n e t w o r l d . t o /?
s i o c t y p e=p o s t&amp ; s i o c i d =339</ s c o t :
cooccure with>
<s c o t : c o o c c u r e w i t h >h t t p : // n e t w o r l d . t o /?
s i o c t y p e=p o s t&amp ; s i o c i d =172</ s c o t :
cooccure with>
<s c o t : name>semanticweb </ s c o t : name>
<s c o t : o w n r f r e q u e n c y >22.580645 </ s c o t :
own rfrequency>
</ s c o t : Tag>
</ s c o t : c o o c c u r e t a g >
<s c o t : c o o c c u r e t a g >
<s c o t : Tag r d f : about=” h t t p : / / example . o r g / c o n t e x t /
networldteam ”>
<s c o t : name>networldteam </ s c o t : name>
<s c o t : o w n a f r e q u e n c y >1</ s c o t : o w n a f r e q u e n c y >
<s c o t : o w n r f r e q u e n c y >3.2258062 </ s c o t :
own rfrequency>
< l i s t : p r i o r i t y >1.0</ l i s t : p r i o r i t y >
<s c o t : synonym>Networld Team</ s c o t : synonym>
<s c o t : c o o c c u r e w i t h >h t t p : // n e t w o r l d . t o /?
s i o c t y p e=p o s t&amp ; s i o c i d =97</ s c o t :
cooccure with>
</ s c o t : Tag>
</ s c o t : c o o c c u r e t a g >
<s c o t : c o o c c u r e t a g >
      <scot:Tag rdf:about="http://example.org/context/backtracker">
        <scot:name>backtracker</scot:name>
        <scot:own_afrequency>2</scot:own_afrequency>
        <scot:own_rfrequency>6.4516125</scot:own_rfrequency>
        <list:priority>1.0</list:priority>
        <scot:synonym>Backtracker</scot:synonym>
        <scot:cooccure_with>http://networld.to/?sioc_type=post&amp;sioc_id=97</scot:cooccure_with>
        <scot:cooccure_with>http://networld.to/?sioc_type=post&amp;sioc_id=172</scot:cooccure_with>
      </scot:Tag>
    </scot:cooccure_tag>
    <scot:cooccure_tag>
      <scot:Tag rdf:about="http://example.org/context/artificialintelligence">
        <scot:cooccure_with>http://networld.to/?sioc_type=post&amp;sioc_id=97</scot:cooccure_with>
        <list:priority>1.0</list:priority>
        <scot:cooccure_with>http://networld.to/?sioc_type=post&amp;sioc_id=229</scot:cooccure_with>
        <scot:synonym>Artificial Intelligence</scot:synonym>
        <scot:cooccure_with>http://koni.networld.to/foaf.rdf#me</scot:cooccure_with>
        <scot:own_rfrequency>12.903225</scot:own_rfrequency>
        <scot:own_afrequency>4</scot:own_afrequency>
        <scot:cooccure_with>http://devnull.networld.to/foaf.rdf#me</scot:cooccure_with>
        <scot:name>artificialintelligence</scot:name>
        <scot:synonym>artificialintelligence</scot:synonym>
      </scot:Tag>
    </scot:cooccure_tag>
    <scot:cooccure_tag>
      <scot:Tag rdf:about="http://example.org/context/moksec">
        <scot:name>moksec</scot:name>
        <scot:own_afrequency>2</scot:own_afrequency>
        <scot:own_rfrequency>6.4516125</scot:own_rfrequency>
        <list:priority>1.0</list:priority>
        <scot:synonym>MokSec</scot:synonym>
        <scot:cooccure_with>http://networld.to/?sioc_type=post&amp;sioc_id=97</scot:cooccure_with>
        <scot:cooccure_with>http://networld.to/?sioc_type=post&amp;sioc_id=172</scot:cooccure_with>
      </scot:Tag>
    </scot:cooccure_tag>
    <scot:cooccure_tag>
      <scot:Tag rdf:about="http://example.org/context/research">
        <scot:name>research</scot:name>
        <scot:own_afrequency>3</scot:own_afrequency>
        <scot:own_rfrequency>9.677419</scot:own_rfrequency>
        <list:priority>1.0</list:priority>
        <scot:synonym>Research</scot:synonym>
        <scot:cooccure_with>http://networld.to/?sioc_type=post&amp;sioc_id=97</scot:cooccure_with>
        <scot:cooccure_with>http://networld.to/?sioc_type=post&amp;sioc_id=207</scot:cooccure_with>
        <scot:cooccure_with>http://networld.to/?sioc_type=post&amp;sioc_id=229</scot:cooccure_with>
      </scot:Tag>
    </scot:cooccure_tag>
    <scot:cooccure_tag>
      <scot:Tag rdf:about="http://example.org/context/networld">
        <scot:name>networld</scot:name>
        <scot:own_afrequency>1</scot:own_afrequency>
        <scot:own_rfrequency>3.2258062</scot:own_rfrequency>
        <list:priority>1.0</list:priority>
        <scot:synonym>Networld</scot:synonym>
        <scot:cooccure_with>http://networld.to/?sioc_type=post&amp;sioc_id=97</scot:cooccure_with>
      </scot:Tag>
    </scot:cooccure_tag>
  </scot:Coocurrence>
  <scot:Coocurrence rdf:about="http://networld.to/?sioc_type=post&amp;sioc_id=333">
    <scot:cooccure_tag>
      <scot:Tag rdf:about="http://example.org/context/uncategorized">
        <scot:name>uncategorized</scot:name>
        <scot:own_afrequency>1</scot:own_afrequency>
        <scot:own_rfrequency>3.2258062</scot:own_rfrequency>
        <list:priority>1.0</list:priority>
        <scot:synonym>Uncategorized</scot:synonym>
        <scot:cooccure_with>http://networld.to/?sioc_type=post&amp;sioc_id=333</scot:cooccure_with>
      </scot:Tag>
    </scot:cooccure_tag>
  </scot:Coocurrence>
  <scot:Coocurrence rdf:about="http://devnull.networld.to/foaf.rdf#me">
    <scot:cooccure_tag rdf:resource="http://example.org/context/semanticweb"/>
    <scot:cooccure_tag>
      <scot:Tag rdf:about="http://example.org/context/diving">
        <scot:name>diving</scot:name>
        <scot:own_afrequency>1</scot:own_afrequency>
        <scot:own_rfrequency>3.2258062</scot:own_rfrequency>
        <list:priority>1.0</list:priority>
        <scot:synonym>Diving</scot:synonym>
        <scot:cooccure_with>http://devnull.networld.to/foaf.rdf#me</scot:cooccure_with>
      </scot:Tag>
    </scot:cooccure_tag>
    <scot:cooccure_tag>
      <scot:Tag rdf:about="http://example.org/context/computernetwork">
        <scot:name>computernetwork</scot:name>
        <scot:own_afrequency>1</scot:own_afrequency>
        <scot:own_rfrequency>3.2258062</scot:own_rfrequency>
        <list:priority>1.0</list:priority>
        <scot:synonym>Computer Network</scot:synonym>
        <scot:cooccure_with>http://devnull.networld.to/foaf.rdf#me</scot:cooccure_with>
      </scot:Tag>
    </scot:cooccure_tag>
    <scot:cooccure_tag rdf:resource="http://example.org/context/artificialintelligence"/>
    <scot:cooccure_tag>
      <scot:Tag rdf:about="http://example.org/context/computerscience">
        <scot:name>computerscience</scot:name>
        <scot:own_afrequency>2</scot:own_afrequency>
        <scot:own_rfrequency>6.4516125</scot:own_rfrequency>
        <list:priority>1.0</list:priority>
        <scot:synonym>computerscience</scot:synonym>
        <scot:synonym>Computer Science</scot:synonym>
        <scot:cooccure_with>http://koni.networld.to/foaf.rdf#me</scot:cooccure_with>
        <scot:cooccure_with>http://devnull.networld.to/foaf.rdf#me</scot:cooccure_with>
      </scot:Tag>
    </scot:cooccure_tag>
    <scot:cooccure_tag>
      <scot:Tag rdf:about="http://example.org/context/computersecurity">
        <scot:name>computersecurity</scot:name>
        <scot:own_afrequency>1</scot:own_afrequency>
        <scot:own_rfrequency>3.2258062</scot:own_rfrequency>
        <list:priority>1.0</list:priority>
        <scot:synonym>Computer Security</scot:synonym>
        <scot:cooccure_with>http://devnull.networld.to/foaf.rdf#me</scot:cooccure_with>
      </scot:Tag>
    </scot:cooccure_tag>
    <scot:cooccure_tag>
      <scot:Tag rdf:about="http://example.org/context/business">
        <scot:name>business</scot:name>
        <scot:own_afrequency>1</scot:own_afrequency>
        <scot:own_rfrequency>3.2258062</scot:own_rfrequency>
        <list:priority>1.0</list:priority>
        <scot:synonym>Business</scot:synonym>
        <scot:cooccure_with>http://devnull.networld.to/foaf.rdf#me</scot:cooccure_with>
      </scot:Tag>
    </scot:cooccure_tag>
  </scot:Coocurrence>
  <scot:Coocurrence rdf:about="http://networld.to/?sioc_type=post&amp;sioc_id=172">
    <scot:cooccure_tag rdf:resource="http://example.org/context/semanticweb"/>
    <scot:cooccure_tag rdf:resource="http://example.org/context/backtracker"/>
    <scot:cooccure_tag rdf:resource="http://example.org/context/moksec"/>
  </scot:Coocurrence>
  <scot:TagCloud>
    <scot:has_tag rdf:resource="http://example.org/context/computernetwork"/>
    <scot:has_tag rdf:resource="http://example.org/context/artificialintelligence"/>
    <scot:has_tag rdf:resource="http://example.org/context/backtracker"/>
    <scot:has_tag>
      <scot:Tag rdf:about="http://example.org/context/web20">
        <scot:name>web20</scot:name>
        <scot:own_afrequency>1</scot:own_afrequency>
        <scot:own_rfrequency>3.2258062</scot:own_rfrequency>
        <list:priority>0.45</list:priority>
        <scot:synonym>web20</scot:synonym>
        <scot:cooccure_with>http://koni.networld.to/foaf.rdf#me</scot:cooccure_with>
      </scot:Tag>
    </scot:has_tag>
    <scot:has_tag>
      <scot:Tag rdf:about="http://example.org/context/mobiledevices">
        <scot:name>mobiledevices</scot:name>
        <scot:own_afrequency>1</scot:own_afrequency>
        <scot:own_rfrequency>3.2258062</scot:own_rfrequency>
        <list:priority>1.0</list:priority>
        <scot:synonym>Mobile Devices</scot:synonym>
        <scot:cooccure_with>http://networld.to/?sioc_type=post&amp;sioc_id=339</scot:cooccure_with>
      </scot:Tag>
    </scot:has_tag>
    <scot:has_tag rdf:resource="http://example.org/context/networld"/>
    <scot:total_tags>31</scot:total_tags>
    <scot:has_tag rdf:resource="http://example.org/context/semanticweb"/>
    <scot:has_tag rdf:resource="http://example.org/context/computerscience"/>
    <scot:has_tag>
      <scot:Tag rdf:about="http://example.org/context/football">
        <scot:name>football</scot:name>
        <scot:own_afrequency>1</scot:own_afrequency>
        <scot:own_rfrequency>3.2258062</scot:own_rfrequency>
        <list:priority>0.45</list:priority>
        <scot:synonym>football</scot:synonym>
        <scot:cooccure_with>http://koni.networld.to/foaf.rdf#me</scot:cooccure_with>
      </scot:Tag>
    </scot:has_tag>
    <scot:has_tag rdf:resource="http://example.org/context/uncategorized"/>
    <scot:has_tag rdf:resource="http://example.org/context/computersecurity"/>
    <scot:has_tag rdf:resource="http://example.org/context/networldteam"/>
    <scot:has_tag>
      <scot:Tag rdf:about="http://example.org/context/robotics">
        <scot:name>robotics</scot:name>
        <scot:own_afrequency>1</scot:own_afrequency>
        <scot:own_rfrequency>3.2258062</scot:own_rfrequency>
        <list:priority>0.45</list:priority>
        <scot:synonym>robotics</scot:synonym>
        <scot:cooccure_with>http://koni.networld.to/foaf.rdf#me</scot:cooccure_with>
      </scot:Tag>
    </scot:has_tag>
    <scot:has_tag rdf:resource="http://example.org/context/moksec"/>
    <scot:has_tag rdf:resource="http://example.org/context/research"/>
    <scot:has_tag rdf:resource="http://example.org/context/business"/>
    <scot:has_tag rdf:resource="http://example.org/context/diving"/>
  </scot:TagCloud>
  <scot:Coocurrence rdf:about="http://networld.to/?sioc_type=post&amp;sioc_id=229">
    <scot:cooccure_tag rdf:resource="http://example.org/context/semanticweb"/>
    <scot:cooccure_tag rdf:resource="http://example.org/context/artificialintelligence"/>
    <scot:cooccure_tag rdf:resource="http://example.org/context/research"/>
  </scot:Coocurrence>
  <scot:Coocurrence rdf:about="http://koni.networld.to/foaf.rdf#me">
    <scot:cooccure_tag rdf:resource="http://example.org/context/semanticweb"/>
    <scot:cooccure_tag rdf:resource="http://example.org/context/artificialintelligence"/>
    <scot:cooccure_tag rdf:resource="http://example.org/context/computerscience"/>
    <scot:cooccure_tag rdf:resource="http://example.org/context/robotics"/>
    <scot:cooccure_tag rdf:resource="http://example.org/context/football"/>
    <scot:cooccure_tag rdf:resource="http://example.org/context/web20"/>
  </scot:Coocurrence>
  <scot:Coocurrence rdf:about="http://networld.to/?sioc_type=post&amp;sioc_id=207">
    <scot:cooccure_tag rdf:resource="http://example.org/context/semanticweb"/>
    <scot:cooccure_tag rdf:resource="http://example.org/context/research"/>
  </scot:Coocurrence>
  <scot:Coocurrence rdf:about="http://networld.to/?sioc_type=post&amp;sioc_id=339">
    <scot:cooccure_tag rdf:resource="http://example.org/context/semanticweb"/>
    <scot:cooccure_tag rdf:resource="http://example.org/context/mobiledevices"/>
  </scot:Coocurrence>
</rdf:RDF>
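The relative frequencies in the listing above are consistent with the absolute frequency of a tag divided by the value of scot:total_tags (31), expressed in percent. The following Java sketch illustrates this relationship under that assumption. The class and method names are hypothetical and are not part of the thesis implementation; the tag names and counts are copied from the listing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TagFrequency {

    /**
     * Relative frequency in percent, assuming it is defined as the
     * absolute frequency divided by the total number of tags.
     */
    static double relativeFrequency(int absoluteFrequency, int totalTags) {
        if (totalTags <= 0) {
            throw new IllegalArgumentException("totalTags must be positive");
        }
        return 100.0 * absoluteFrequency / totalTags;
    }

    public static void main(String[] args) {
        int totalTags = 31; // value of scot:total_tags in the listing

        // Absolute frequencies (scot:own_afrequency) of some tags in the listing.
        Map<String, Integer> absolute = new LinkedHashMap<>();
        absolute.put("semanticweb", 7);
        absolute.put("artificialintelligence", 4);
        absolute.put("research", 3);
        absolute.put("backtracker", 2);
        absolute.put("networld", 1);

        for (Map.Entry<String, Integer> entry : absolute.entrySet()) {
            double relative = relativeFrequency(entry.getValue(), totalTags);
            System.out.printf("%-24s %d -> %.4f%%%n",
                    entry.getKey(), entry.getValue(), relative);
        }
    }
}
```

The sketch computes in double precision, so its results agree with the stored values (for example 22.580645 for semanticweb) only up to small rounding differences in the last digits.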
B Appendix - Android Client
Figure 14: Context-Aware Synonym (CAS) Client. Panels: (a) Main Window with Options, (b) Settings, (c) Settings - FOAF URL, (d) Result - Search Term "AI".
Listings
1 SPARQL example . . . 4
2 Synset block in raw RDF format. . . . 10
3 Synonym cloud XSLT for visualization in HTML . . . 11
4 Excerpt of the context term computer science . . . 13
5 Excerpt of the context term computer science . . . 13
6 Intersection of Synonyms and Static Context with SPARQL . . . 20
7 Synonym Java Interface . . . 23
8 SynEntry Java Interface . . . 24
9 Context Tag Java Interface . . . 26
10 Synonyms for AI abstracted as RDF file . . . 30
11 Context abstracted as RDF file . . . 33
List of Figures
1 Excerpt of the search result AI searched on http://www.google.com (not filtered) . . . 6
2 Excerpt of the search result AI searched on http://www.google.com (filtered) . . . 7
3 The workflow of the computation of context-aware synonyms. . . . 8
4 Excerpt of synset block visualized in HTML . . . 11
5 Excerpt of Hypernyms and Hyponyms for the search term artificial intelligence . . . 12
6 Excerpt of the sorted context visualized in HTML . . . 15
7 Compute the similarity of two Contexts . . . 16
8 Excerpt of context cloud relation. . . . 17
9 Data Abstraction Visualization . . . 20
10 Architecture view from a developer perspective. . . . 22
11 Sequence diagram from the perspective of the ComputeCAS class. . . . 23
12 Use Case diagram that describes which tasks have to be fulfilled to compute context-aware synonyms . . . 24
13 Describes the dependencies in the context package. . . . 25
14 Context-Aware Synonym (CAS) Client . . . 39
Acknowledgement
This work is part of the m:Ciudad [14] project carried out at the University of Innsbruck and is supported by STI Innsbruck and the Telecommunications Research Center Vienna (FTW).
References
[1] JavaSE 1.6 API. http://java.sun.com/javase/6/docs/api/. [Online; accessed 19-April-2010].
[2] Jena - A Semantic Web Framework for Java. http://jena.sourceforge.net/. [Online; accessed 19-April-2010].
[3] Social Semantic Cloud of Tags. http://scot-project.org/. [Online; accessed 08-April-2010].
[4] T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. Scientific American, May 2001.
[5] Dan Brickley and Libby Miller. The Friend of a Friend (FOAF) project. http://www.foaf-project.org/. [Online; accessed 19-February-2010].
[6] Andrew Burton-Jones, Sandeep Purao, and Veda C. Storey. Context-Aware Query Processing on the Semantic Web.
[7] W3 Consortium. Simple Knowledge Organization System (SKOS). http://www.w3.org/2004/02/skos/. [Online; accessed 19-February-2010].
[8] J. Danado, M. Davies, P. Ricca, and A. Fensel. An Authoring Tool for User Generated Mobile Services. In Proceedings of the 3rd Future Internet Symposium (FIS'10), 20-22 September 2010, Berlin, Germany, 2010.
[9] Gerard de Melo. Lexvo. http://lexvo.org/. [Online; accessed 24-February-2010].
[10] Apache Software Foundation. Lucene. http://lucene.apache.org/. [Online; accessed 14-July-2010].
[11] Freie Universität Berlin. TRiG Notation. http://www4.wiwiss.fu-berlin.de/bizer/TriG/. [Online; accessed 23-October-2010].
[12] Google. Android 1.5 - API Level 3. http://developer.android.com/sdk/android-1.5.html. [Online; accessed 19-April-2010].
[13] Google. Dalvik VM Internals. http://sites.google.com/site/io/dalvik-vm-internals. [Online; accessed 19-April-2010].
[14] ICT (Information and Communication Technologies). m:Ciudad FP7. http://www.mciudad-fp7.org/. [Online; accessed 19-February-2010].
[15] Juha Kolari, Timo Laakko, Tapio Hiltunen, Veikko Ikonen, Minna Kulju, Raisa Suihkonen, Santtu Toivonen, and Tytti Virtanen. Context-aware services for mobile users - technology and user experiences. 2004.
[16] Friedemann Mattern and Christian Floerkemeier. Vom Internet der Computer zum Internet der Dinge. Informatik Spektrum, 33(2):107–121, April 2010.
[17] MIT. MIT Java Wordnet Interface. http://projects.csail.mit.edu/jwi/. [Online; accessed 19-April-2010].
[18] Nokia. TRiX Notation. http://sw.nokia.com/trix/trix.html. [Online; accessed 23-October-2010].
[19] Michael Putcher. Performance Evaluation of WordNet-based Semantic Relatedness Measures for Word Prediction in Conversational Speech. December 2010.
[20] Tom Heath. Linked Data - Connect Distributed Data across the Web. http://linkeddata.org/. [Online; accessed 19-February-2010].
[21] Princeton University. Wordnet. http://wordnet.princeton.edu/. [Online; accessed 24-February-2010].
[22] University Leipzig, Free University Berlin, and OpenLink Software. DBpedia.org. http://dbpedia.org. [Online; accessed 24-October-2010].
[23] W3C. N-Triples Notation. http://www.w3.org/TR/rdf-testcases/#ntriples. [Online; accessed 23-October-2010].
[24] W3C. N3 Notation. http://www.w3.org/DesignIssues/Notation3.html. [Online; accessed 23-October-2010].
[25] W3C. Official Semantic Web Site. http://semanticweb.org. [Online; accessed 04-August-2010].
[26] W3C. RDF Schema. http://www.w3.org/TR/rdf-schema/. [Online; accessed 04-August-2010].
[27] W3C. RDF Syntax Grammar. http://www.w3.org/TR/rdf-syntax-grammar/. [Online; accessed 04-August-2010].
[28] W3C. RDFa Notation. http://www.w3.org/TR/xhtml-rdfa-primer/. [Online; accessed 23-October-2010].
[29] W3C. SPARQL. http://www.w3.org/TR/rdf-sparql-query/. [Online; accessed 23-October-2010].
[30] W3C. Turtle Notation. http://www.w3.org/TeamSubmission/turtle/. [Online; accessed 23-October-2010].
[31] Anna V. Zhdanova, Livia Predoiu, Tassilo Pellegrini, and Dieter Fensel. A Social Networking Model of a Web Community. In Proceedings of the 10th International Symposium on Social Communication, 2007.