Context based semantic data retrieval

Simone Santini and Alexandra Dumitrescu
Escuela Politécnica Superior, Universidad Autónoma de Madrid
Ciudad Universitaria de Cantoblanco
Calle Francisco Tomás y Valiente, 11
28049 - Madrid (España)
(simone.santini,doina.dumitrescu)@uam.es
Abstract. The large quantity of information accessible across the Internet has raised, in a new and urgent form, the information retrieval problem. Search engines, services that, based on keywords, return a list of more or less relevant documents, represent a first response to this problem. Nevertheless, in spite of their great utility, search engines suffer from several limitations that affect the precision of the results of the queries and, therefore, their usefulness. Solutions based on formal annotations, like those being proposed for the semantic web, suffer from important limitations, since they do not take into account the context in which the search is performed. In this work we present a search system based on the idea that each search is performed in the context of a certain activity, a context which will be formalized.
1 Introduction
The internet is today a truly impressive reservoir of information, so much so
that finding what one is after has become a challenge, one that needs the intervention of sophisticated technical instruments. Search engines have become a
fundamental component of the organization of the Web, but still, they are far
from perfect. It is well known that they are very sensitive to linguistic phenomena
like polysemy and synonymy [4], which affect the precision of the results of the
queries. (Polysemy refers to a word that has multiple meanings; synonymy to
different words that have the same meaning.) It is by now widely acknowledged
that the solution to this problem passes through the introduction of semantics
in the search process. The semantic web [2] solution to the “semantic problem”
consists of formalizing the meaning of the information that will be published in
the web, expressing it in a formal and unambiguous language. This line of work
presents many limitations and problems, of both a conceptual and a pragmatic
nature. From the conceptual point of view, the formalization of the information
does not consider that the meaning is not a property of a document, but the
result of a dialectical process of interpretation, a process that always forms part
of a given activity and that, therefore, develops in a context that depends on this
activity [16]. From the pragmatic point of view, the success of the formalization
depends on the will of the person who publishes the information to provide a
complete and precise annotation (meta-data). The sociology and the economic
reality of the web have led several authors to doubt that these prospects
are realistic [5].
Our work is framed in a somewhat dual perspective: recognizing the priority of the context in the interpretation of documents, our effort is directed not towards the formal representation of documents, but towards the representation (formal or otherwise) of the context and the activity in which the search is made. In this, we follow Sperber and Wilson [17, 13] in turning the traditional relationship between context and meaning topsy-turvy. The common way of seeing this relation is to consider the effect of the context on the meaning: interpreting a document in a context C supposes giving it a different meaning than if the document were interpreted in a context C′. Sperber and Wilson, and we with them, consider the opposite relation: the meaning of a document is determined by the change that the document causes in the context of interpretation. Suppose, for example, that we are in a context C and, in this context, we interpret a document d. Taking in the meaning of d supposes a change in our context, from C to C′. We postulate that the meaning of the document is the difference that d causes to C.
2 The context
Various attempts have been made to determine the context of a user's search [3, 12]. For instance, the Watson project [3] is a system that uses the text of the Word document or Web page that the user is editing or viewing to automatically guess the context information that will be added to the search query. The Remembrance Agent [14], while a user is manipulating a document in the Emacs editor, makes suggestions about documents possibly relevant to the current user situation, based on this text and on the user's email messages and research papers. Systems that infer context information automatically have the advantage that they do not require the involvement of the user.
Following the same line, in our work the context is represented by the user's set of documents. The computer contains documents of different types that the user has organized in a way that reflects associations important to the activity being carried out; these documents, and their organization, contain important information about the context of the search, information that we will use for disambiguation and focusing. The information on the context, in other words, comes from two different sources: on one hand, the analysis of the contents of the documents found in the user's computer (or, as we shall see, some of them); on the other, the analysis of the structure of the folders in which the user has organized these documents. We always consider the search as part of an activity, as can be implemented, for example, by embedding the search programs as plug-ins in applications such as word processors and spreadsheets. In other words, when the user searches for something, he is always working on a document located in a certain folder. The context of the computer, relativized with reference to this folder, will be used to modify and focus the search and to filter the
results. For example, the word "bank" can mean a financial institution or the shore of a river or lake, depending on the context in which it appears. If at a given time a user works with documents related to financial data and makes a query about "bank", she probably refers to the bank as a financial institution, and the results should be retrieved with this meaning of the word. The ambiguity of meaning (generated, in our example, by the presence of the polysemic word "bank") is such an essential characteristic of natural language that many modern theories of interpretation deny that a word can be assigned a single, constant meaning, with the consequence that a text has no fixed meaning [6, 1]; its meaning is determined not only by its contents, but also by the activity in which the interpretation is placed. For instance, if the user is a biologist, it is very likely that the documents in his computer are organized so that a certain folder contains working papers, in which case the word "bank" probably refers to the bank of a river, and perhaps another folder contains information on the bank accounts of the biologist, in which case the meaning of the word "bank" will be that of a financial institution. This example shows that we cannot assume a one-to-one correspondence between users and contexts: a user is characterized not by a unique context, but by a multitude of contexts, one for each of his activities.
3 Implementing Context
The practical problems posed by the general orientation presented here include
how to capture ongoing activities, how to represent them and formalize them,
in such a way that they can be used as a basis for data access. For many people,
an increasing number of activities is carried out with the help of digital devices
of various nature, devices that contain data produced in the course of these
activities. These data form the digital trace of the activity, and can be used to
represent its context. We will restrict our attention to activities that take place
in an ordinary computer, model the context of these activities, and use it for
image search. Suppose that we are preparing a presentation for a conference to
which we have submitted a paper and that, during this process, we need to look
for a figure for the presentation. In order to prepare the presentation, we have
created a document in a directory (let us say the directory presentation) where
we have possibly copied some documents that we thought might be useful. This
directory is likely to be placed in a hierarchy. Its sibling directories will contain
documents somehow related to the topic at hand although, probably, not so
directly as those that can be found in the work directory, the primary context.
The siblings of the conference directory (and their descendants) will contain
documents related to our general area of activity. These constitute the accessory
context. This information, suitably encoded, will constitute the context of the
activity. Our context representation is based on a self-organizing map that we
lay out in a suitable space of words to constitute a latent semantic manifold,
that is, a non-linear low-dimensional subspace of the word space that captures
important semantic regularities among words. (Our latent semantic manifold
is to be contrasted with linear models such as the latent semantic subspaces
[4].) The technique is based on the self-organizing map WEBSOM [10], but
while WEBSOM and other semantic techniques have so far been used mainly
for the representation of databases, we shall use them as a context representation
technique, and as a technique for the creation of queries.
We shall divide the construction of the representation in two parts. First, we
build a low level representation that captures the syntactic regularities that exist
in the documents of the context. This process results in a context representation
in the form of a point cloud in a vector space whose axes represent words. In
this space we then train the self-organizing map to constitute the latent semantic
manifold that will be our final context representation and our query tool.
3.1 The point cloud
In each document directory of the user's computer we build a complete context representation, all the way to the latent semantics manifold, that depends on the documents contained in the context of that directory. Here, we will consider a working directory D (our primary context) and a collection of directories K = {D1, . . . , Dk} that constitutes its accessory context. The point cloud representation of the context of D is called the index of D. This index is assembled based on information derived separately for each directory, information that depends only on the documents of the directory it represents. We call this the generator of the
directory. Let gen(D) be the generator of directory D. We build a generator for
D and for each directory in its context: gen(D1 ), . . . , gen(Dk ), then we merge
them using a suitable indexing function f to obtain the index of the directory
D, that is,
ind(D) = f(gen(D); gen(D1), . . . , gen(Dk)).    (1)
Let us begin by considering the construction of the generator of a directory.
We begin by collecting all documents in the directory in a single, large document. To this document we apply standard algorithms for stopword removal and
stemming. The result is a series of stems of significant words (see figure 1).
From this sequence of stems we consider groups of n consecutive stems, which
form what we call word groups. In figure 1 we have illustrated the case n = 2,
which is the one that we shall consider in this paper. Correspondingly, we will
talk about word pairs rather than groups. Let the words (terms) found in the directory be [t_1, . . . , t_W], let (ij) be the pair formed by the word t_i followed by the word t_j, let P be the set of all pairs found in the documents, and let N_ij be the number of times that the pair (ij) appears. Then, in the generator of the directory D, the pair (ij) is given the weight

ω^D_ij = N_ij / Σ_{(hk)∈P} N_hk .    (2)
The set of all these weighted pairs constitutes the representation of the directory, that is, its generator. Note that the weighting scheme that we are using is fairly simple: it does not take into account the document frequency of a word, as is the case, for instance, of the tf/idf weighting scheme. The reason for this is that, in our case, we do not have a collection of documents against which to compare the frequencies that we find in a document: all we have is a single context, with the frequencies relative to it. Note that it is conceptually impossible to have such a reference collection: such a collection would ipso facto become part of the context, and its word frequencies would have to be counted together with those of the context.

fourscore and seven years ago our forefathers came to these shores
→ (stop word removal, stemming) →
fourscore seven year forefather come shore
→ pairs: (fourscore,seven), (seven,year), (year,forefather), (forefather,come), (come,shore)

Fig. 1. Initial steps for the construction of the generator. The text of the documents in the directory is first processed by removing stopwords and doing stemming on the remaining words. From the list of stems we then extract all pairs of consecutive words.
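As a concrete illustration, the generator construction just described can be sketched as follows. This is a minimal sketch, not the authors' implementation: the stopword list and the suffix-stripping stemmer are toy stand-ins for the "standard algorithms" mentioned in the text, and all names are illustrative.

```python
from collections import Counter

# Toy stopword list; a real system would use a full standard list.
STOPWORDS = {"and", "ago", "our", "to", "these", "the", "a", "of", "in"}

def stem(word):
    # Toy suffix stripping standing in for a real stemmer (e.g. Porter's).
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def generator(text, n=2):
    """Weighted word-group representation of a directory's text, per eq. (2)."""
    stems = [stem(w) for w in text.lower().split() if w not in STOPWORDS]
    groups = [tuple(stems[i:i + n]) for i in range(len(stems) - n + 1)]
    counts = Counter(groups)
    total = sum(counts.values())          # denominator of eq. (2)
    return {pair: c / total for pair, c in counts.items()}
```

For the example text of figure 1, this produces five consecutive pairs, each with weight 1/5.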
In order to create the index, we put together the generators of the directory D and all the directories of its accessory context K through a linear combination of the weights of homologous word pairs. Let 0 ≤ γ ≤ 1 be a constant. Then for each pair (ij) we define the index weight as

w^D_ij = γ ω^D_ij + (1 − γ)/|K| Σ_{F∈K} ω^F_ij .    (3)
The pairs with these weights constitute the point cloud representation of the context of the directory D, that is, its index. The points are represented in a vector space whose axes are the words t_1, . . . , t_W. In this space, the pair (ij), with index weight w^D_ij, is represented by the point

p_ij = (0, . . . , 0, w^D_ij, 0, . . . , 0, w^D_ij, 0, . . . , 0),    (4)

where the two non-zero components are in positions i and j. Each point p_ij lies in the two-dimensional sub-space determined by the axes t_i and t_j (see figure 2). At the end of this step, the context of the directory D is represented by a set of points I_D in this space, a point for each word pair found in the directory D or in one of the directories of its context K.

Fig. 2. Position of the points of a point cloud in the word space (the example shows the points (seven,year) and (year,forefather) in the sub-spaces spanned by the corresponding axes).
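Under the same assumptions as before, the merge of eq. (3) and the point cloud of eq. (4) can be sketched as follows; generators are dicts mapping word pairs to weights, and the vocabulary is the fixed list [t_1, . . . , t_W]. Names are illustrative.

```python
import numpy as np

def index_weights(gen_D, accessory_gens, gamma=0.5):
    """Index weight of eq. (3): gamma * omega^D + (1-gamma)/|K| * sum over K."""
    pairs = set(gen_D).union(*accessory_gens) if accessory_gens else set(gen_D)
    k = len(accessory_gens)
    out = {}
    for p in pairs:
        acc = sum(g.get(p, 0.0) for g in accessory_gens)
        out[p] = gamma * gen_D.get(p, 0.0) + ((1 - gamma) / k * acc if k else 0.0)
    return out

def point_cloud(weights, vocab):
    """Eq. (4): each pair (t_i, t_j) becomes a point with w^D_ij on axes i and j."""
    pos = {t: a for a, t in enumerate(vocab)}
    points = []
    for (ti, tj), w in weights.items():
        p = np.zeros(len(vocab))
        p[pos[ti]] = w
        p[pos[tj]] = w
        points.append(p)
    return np.array(points)
```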
3.2 The latent semantics manifold
The point cloud thus built is used as the training data for a self-organizing map deployed in the term space. The map is a grid of elements called neurons, each one of which is a point in the word space and is identified by two integer indices; that is, a neuron is given as

[µν] = (u^µν_1, . . . , u^µν_T),  1 ≤ µ ≤ N, 1 ≤ ν ≤ M.    (5)
The map is discrete, two dimensional with the 4-neighborhood topology. That
is, given the neuron [µν], its neighbors are the neurons [(µ − 1)ν], [(µ + 1)ν],
[µ(ν − 1)], and [µ(ν + 1)]. We can visualize the map as a grid laid out in the
word space with rods joining neighboring neurons. Given two neurons, [µν] and
[ζξ], we can measure their distance in two ways:
i) as points in the word space, that is, assuming that the metric is Euclidean, as

d([ζξ], [µν]) = [ Σ_{i=1}^T (u^ζξ_i − u^µν_i)² ]^{1/2} .    (6)

Note that this distance can be computed between any point p in the word space and a neuron:

d(p, [µν]) = [ Σ_{i=1}^T (p_i − u^µν_i)² ]^{1/2} .    (7)
ii) as points in the grid using the graph distance between them (also called the
chemical distance): δ([ζξ], [µν]) = |ζ − µ| + |ξ − ν|. If the neurons are dense
and form a continuum, this distance reduces to a geodesic distance in the
two-dimensional manifold of the map [15].
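The two distances can be sketched directly (a hypothetical pair of helpers, with a neuron held as a coordinate vector and as a pair of grid indices):

```python
import numpy as np

def word_space_distance(p, neuron):
    """Euclidean distance in the word space, eqs. (6)-(7)."""
    return float(np.sqrt(np.sum((np.asarray(p) - np.asarray(neuron)) ** 2)))

def graph_distance(a, b):
    """Chemical distance on the grid: delta([zeta xi],[mu nu]) = |zeta-mu| + |xi-nu|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])
```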
On this map we define a neighborhood function, h(t, n), which depends on two
parameters t, n ∈ N; n is the graph distance between a given neuron (the neuron whose neighborhood we are determining) and another neuron, t is a time
parameter that increases as learning proceeds. The function h(t, n) represents
the “degree of neighborhood-ness” of two neurons at a distance n at time t; for
it we postulate the following properties:
i) ∀t.(t ≥ 0 ⇒ h(t, 0) = 1);
ii) ∀t, n.(t ≥ 0 ∧ n ≥ 0 ⇒ 0 ≤ h(t, n) ≤ 1);
iii) ∀t, n.(t ≥ 0 ∧ n ≥ 0 ⇒ h(t, n) ≥ h(t + 1, n), h(t, n) ≥ h(t, n + 1));
The degree to which neuron [ζξ] belongs to the neighborhood of neuron [µν]
at time t is given by h(t, δ([ζξ], [µν])). Condition iii) localizes the neighborhood
around [µν] and causes it to “shrink” in time. In addition to the neighborhood
we define a learning parameter α(t), t ∈ N such that
i) ∀t.(t ≥ 0 ⇒ 0 ≤ α(t) ≤ 1);
ii) ∀t.(t ≥ 0 ⇒ α(t) ≥ α(t + 1));
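The paper postulates properties of h and α but does not fix their form; one standard choice satisfying them (the exponential decay constants below are our assumptions) is:

```python
import math

def h(t, n, sigma0=3.0, tau=10.0):
    """Neighborhood function: h(t, 0) = 1, values in [0, 1],
    non-increasing in both n and t (the neighborhood shrinks)."""
    sigma = sigma0 * math.exp(-t / tau)     # shrinking neighborhood radius
    return math.exp(-n / max(sigma, 1e-9))

def alpha(t, alpha0=0.9, tau=10.0):
    """Learning rate: in [0, 1] and monotonically decreasing in t."""
    return alpha0 * math.exp(-t / tau)
```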
In order to create the latent semantic manifold for a directory D, all the
points in the index ID are presented to the map, and the training algorithm is
applied. We call the presentation of a point p ∈ ID an event of learning, and
the presentation of all the points of ID an epoch. Learning consists of a number
of epochs, counted by a counter t. The neurons of the map are at first spread
randomly in the word space; then, for each event consisting of the presentation
of the point p, the following learning steps take place:
i) the neuron that is closest to p according to the word space distance is found:

[∗] = arg min_{[µν]} d(p, [µν]);    (8)

ii) the neuron [∗] and all its neighbors are shifted towards p. The amount of this shift depends on the learning parameter α and on the distance from [∗] on the map:

∀[µν]: [µν] ← [µν] + α(t) h(t, δ([∗], [µν])) · (p − [µν]).    (9)
After learning, each neuron crystallizes in a final position in the word space. The neuron [µν], placed at point (u^µν_1, . . . , u^µν_T), induces a weighting on the words of the context given by

t_1 : u^µν_1, . . . , t_T : u^µν_T .    (10)
This observation will be instrumental in deriving our simplified query scheme,
as will be shown in the next section.
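Putting the pieces together, one training epoch per eqs. (8)-(9) can be sketched as follows, with the neurons stored as an N × M × T array; h and α are the neighborhood function and learning rate above (any concrete choice for them is an assumption):

```python
import numpy as np

def train_epoch(neurons, points, t, alpha, h):
    """One epoch: present every point of the index I_D to the map (eqs. 8-9)."""
    N, M, _ = neurons.shape
    for p in points:
        # eq. (8): best-matching neuron under the word-space distance
        d = np.linalg.norm(neurons - p, axis=2)
        best = np.unravel_index(np.argmin(d), (N, M))
        # eq. (9): shift [*] and its neighbors towards p
        for mu in range(N):
            for nu in range(M):
                delta = abs(mu - best[0]) + abs(nu - best[1])
                neurons[mu, nu] += alpha(t) * h(t, delta) * (p - neurons[mu, nu])
    return neurons
```

Repeated over epochs, with h and α shrinking, this is the standard Kohonen update restricted to the 4-neighborhood grid of the text.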
3.3 The query
The training procedure described in the previous section results in a context
representation for each working directory. The same representation is used as a
basis for the contextualization of queries done in a working directory D.
We begin with the terms entered by the user, which we call the inquiry, to
distinguish it from the conceptual query that we will create. The inquiry may
be composed of a set of keywords, a sentence, even a whole paragraph. In any
case, we process it, as we did with the documents during context creation, by
removing the stop words it contains and by stemming. The result is a series of
stems (keywords) Y = {t_k1, . . . , t_kq}. For the sake of generality, we assume that the user has associated weights {u_k1, . . . , u_kq} to these terms¹. The inquiry can thus be represented as a point in the word space:

q = Σ_{r=1}^q u_kr e_kr ,    (11)

where e_i = (0, . . . , 0, 1, 0, . . . , 0), with the 1 in position i. The inquiry modifies the context by subjecting
it to a sort of partial learning. Let [∗] be the neuron in the map closest to the
inquiry point q. The map is updated, through a learning iteration, in such a way
that the neuron [∗] gets closer to the point q by a factor φ, with 0 < φ ≤ 1.
That is, given that the distance between q and [∗] is d(q, [∗]) before the context
modification, it will be (1 − φ)d(q, [∗]) afterwards (see figure 3).

Fig. 3. Displacement of the map towards the query point q to create the target context.

All the neurons
in the neighborhood of [∗] will be updated by changing their position as

[µν]′ ← [µν] + φ h(t, δ([∗], [µν])) · (q − [µν]).    (12)

¹ In the most common applications, the user will not specify any weights, and all the weights will be equal; we use this formulation so as not to rule out the possibility that in some applications of context search weights will indeed be available.
This is the target context of the query. According to the semantic model that we are using, the target semantics for our query is given by the difference between the target context and the original one:

[µν]~ = [µν]′ − [µν] = φ h(t, δ([∗], [µν])) · (q − [µν]).    (13)

The values [µν]~ in a neighborhood of [∗] constitute our complete query expression. As we shall mention in the conclusion, one of the purposes of our continuing activity is the development of a server capable of serving queries expressed in this guise.
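The query contextualization of eqs. (11)-(13) can be sketched as follows; `inquiry_point` builds q from the weighted stems, and `contextual_query` returns both the target context and the difference map. Names are illustrative, not the authors' implementation.

```python
import numpy as np

def inquiry_point(term_weights, vocab):
    """q = sum_r u_{k_r} e_{k_r}, per eq. (11)."""
    q = np.zeros(len(vocab))
    for term, u in term_weights.items():
        q[vocab.index(term)] = u
    return q

def contextual_query(neurons, best, q, phi, h, t):
    """Partial learning step (eq. 12) and difference map (eq. 13)."""
    N, M, _ = neurons.shape
    target = neurons.copy()
    diff = np.zeros_like(neurons)
    for mu in range(N):
        for nu in range(M):
            delta = abs(mu - best[0]) + abs(nu - best[1])
            shift = phi * h(t, delta) * (q - neurons[mu, nu])   # eq. (12)
            target[mu, nu] += shift
            diff[mu, nu] = shift                                # eq. (13)
    return target, diff
```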
3.4 The approximate query

In this paper, however, we are considering client-only approaches to contextual queries, so we have to transform the map [µν]~ into a query that today's servers can handle. To this end, we resort to the observation made in a previous section,
that each neuron, positioned in the word space, induces a weighting of the words in the context. Let us consider a (small) neighborhood N of [∗], the neuron closest to the inquiry. For each [µν] ∈ N we have a weighting of each term as in (10). The raw weight of the term t_k is the sum of the weights that it receives from all the neurons in N:

ẑ_k = Σ_{[µν]∈N} u^[µν]_k .    (14)
This weighting scheme potentially assigns a weight to all the words of the context but, in practice, the vast majority of these weights will be zero or very close to zero. We consider a set Z consisting of the terms with the highest weights. This set can be chosen based on two policies, specified during configuration: either one selects a fixed number of words with the highest weights, or one selects all the words whose weight is above a fixed threshold. Be that as it may, once we have the set Z we can normalize the weights with respect to the maximum to obtain the final weighting scheme z_k = ẑ_k / z_M, where z_M = max{ẑ_k | k ∈ Z}. Due to the preliminary operations that we performed, the axes of the word space, that is, the terms t_k, are stems rather than actual words. We use a file in which we saved the correspondence between stems and words and use, in lieu of the axis term t_i, the corresponding word τ_i. The query is then the set of weighted words

{(τ_k, z_k) | k ∈ Z}.    (15)
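The approximate query of eqs. (14)-(15) can be sketched as follows, using the fixed-number selection policy; the stem-to-word table is the saved correspondence mentioned above, here passed as a dict, and all names are illustrative.

```python
import numpy as np

def approximate_query(neurons, neighborhood, vocab, stem_to_word, top=5):
    """Sum neuron weights over a neighborhood of [*] (eq. 14), keep the top
    terms, normalize by the maximum, and map stems back to words (eq. 15)."""
    raw = np.zeros(len(vocab))
    for mu, nu in neighborhood:
        raw += neurons[mu, nu]                  # z_hat_k of eq. (14)
    order = np.argsort(raw)[::-1][:top]         # fixed-number selection policy
    z_max = raw[order[0]] if raw[order[0]] > 0 else 1.0
    return [(stem_to_word.get(vocab[k], vocab[k]), float(raw[k] / z_max))
            for k in order]
```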
4 Experiments
As we saw, the approach presented here focuses on expanding the query by adding contextual information, so as to retrieve documents relevant not only to the user query but also to the contextual information. This approach presents a significant advantage because it helps improve the precision of the results, as we can see in this section.
Textual search results. We conducted searches for two distinct working contexts, from two distinct areas: computer science and neurophysiology. The computer science documents were taken from the real context of a computer science professor. For neurophysiology we used a collection of documents gathered during the preparation of a report. It must be noted that neither of these collections was prepared specifically for the experiment: they represent the context as it was found on the computer of one of the authors. Since in our experiments we used the Google (www.google.com) search engine, which does not permit assigning a weight to each keyword, we approximated weighting by ranking the keywords in order of importance and repeating the words that are more important. For each area, we conducted searches for 30 different queries with words from the context. For each query we measured the precision of the first eight results. Since in real scenarios it is not possible to know the number of relevant documents in the collection, recall cannot be calculated. The results are shown in figure 4.
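The keyword-repetition workaround described above can be sketched as follows; the rule mapping a weight to a repetition count is an assumption, since the text does not specify one.

```python
def weighted_to_repeated(weighted_terms, max_reps=3):
    """Turn (word, weight in [0, 1]) pairs into a repetition-based query string,
    approximating per-keyword weights on an engine without weighted queries."""
    parts = []
    for word, w in sorted(weighted_terms, key=lambda x: -x[1]):
        parts.extend([word] * max(1, round(w * max_reps)))
    return " ".join(parts)
```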
Fig. 4. Precision of the results for the computing and neurophysiology contexts (filled points: with context; open points: without context).
In both cases, the use of context entailed a statistically significant improvement in precision but, from a superficial look at the data, the improvement
be several, some of which are probably due to the inherent characteristics of
the search engine rather than to the use of context, but it is likely that, on
the context side, the difference is due in good part to the different nature of the
words used in the two contexts. Neurophysiologists, by and large, use neologisms
with relatively uncommon Greek roots to indicate important concepts so, in this
case, even a single word is sufficient to characterize the context with sufficient
precision, and further contextual information brings only marginal advantages.
Computing, on the other hand, tends to borrow words from other areas without modifying them, or making only superficial adjustments. That is, computing
words are much more ambiguous than neurophysiological ones, and are by themselves a poor characterizer of context. In this case, we can expect that a more
complete characterization of context will bring the greatest advantages, as is
indeed observed in the experiments.
In the following, we consider some of the most representative queries: the word relationship in the context of neurophysiology, and the word executed in that of computing. In the case of the non-contextual query relationship (that is, a query containing only that word), in the majority of the results returned the word had the meaning of a couple's relationship, while for the contextual query the results were more relevant to the context used. The word executed is also a good example of how context makes a difference: the documents retrieved without involving the context mostly refer to execution as the death penalty, and not to the execution of a program, as is more probable in the context of computing. In these cases the results were very different, and we could clearly see that the context made the difference when it comes to improving the precision of the results. Nevertheless, on closer analysis, it was seen that for some words the context did not make any difference, as these words contained all the contextual information necessary for retrieving good results (e.g. cell and molecule in the neurophysiology field, and algorithm in computing).
Fig. 5. Precision of the image results for the computing and neurophysiology contexts (filled points: with context; open points: without context).
Image search. The same approach was tested for image retrieval using the Google image search engine. In order to conduct the searches we used the computing and neurophysiology contexts from the previous tests. The results are shown in figure 5. In each case a result was considered relevant if the picture, in the user's judgment, could conceivably be used as a technical illustration in a presentation or a paper on the subject of the context (computing or neurophysiology). As we can see, the results are a good starting point; nevertheless, it must be stressed again that these are preliminary results, which we use only as an indication of the viability of context use in multimedia. We believe that a proper way of using context in this case entails the creation of multimedia contexts and the design of specialized context-sensitive search engines.
5 Conclusions
In this work we presented the preliminary results of a semantic search system based on the formalization of the search context. This solution contrasts with, on one hand, and complements, on the other, the solutions currently studied for the semantic web, which are based on the formalization of the contents of the documents. We have presented results suggesting that the context can be an important factor in the focusing and disambiguation of searches. This work is an interesting starting point for a number of possible research directions, among which we underline: on one hand, the creation of a search server capable of managing contextual searches; on the other, the integration of semantic web techniques and the study of techniques for the representation of the context that can be integrated with the logic descriptions typical of the semantic web.
Acknowledgment
This work was supported in part by the Consejería de Educación de la Comunidad Autónoma de Madrid, under the grant CCG08-UAM/TIC/4303, Búsqueda
basada en contexto como alternativa semántica al modelo ontológico. Alexandra
Dumitrescu was also in part supported by the European Social Fund, Universidad
Autónoma de Madrid.
References
1. Roland Barthes. S/Z. Paris:Seuil, 1976.
2. T. Berners-Lee. The semantic web. Database and Network journal, 36(3):7–10,
2006.
3. J. Budzik and K. J. Hammond. User interactions with everyday applications as context for just-in-time information access. In IUI '00: Proceedings of the 5th International Conference on Intelligent User Interfaces (New York, NY, USA, 2000), ACM, pp. 44–51.
4. Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407, 1990.
5. Cory Doctorow. Metacrap: Putting the torch to seven straw-men of the meta-utopia. Published on-line, 2001.
6. H. G. Gadamer. Truth and Method. Continuum, London and new York, 1975.
7. T. Gruber. Toward principles for the design of ontologies used for knowledge
sharing. International Journal of Human-Computer Studies, 43(4):907–28, 1995.
8. T. Gruber. Ontology of folksonomy: A mash-up of apples and oranges. International Journal on Semantic Web and Information Systems, 3(1):1–11, 2007.
9. E. J. Glover, S. Lawrence, M. D. Gordon, W. P. Birmingham, and C. L. Giles. Web search—your way. Communications of the ACM, 44(12):97–102, 2001.
10. S. Kaski. Computationally efficient approximation of a probabilistic model for document representation in the WEBSOM full-text analysis method. Neural Processing Letters, 5(2), 1997.
11. T. Kohonen. Self-organizing maps. Heidelberg, Berlin, New York:Springer-Verlag,
2001.
12. S. Lawrence. Context in web search. IEEE Data Engineering Bulletin, 23(3):25–32, 2000.
13. José Carlo Rodríguez. Jugadas, partidas y juegos de lenguaje: el significado como modificación del contexto. Asunción: Centro de documentos y estudios, 2003.
14. B. J. Rhodes and T. Starner. Remembrance Agent: A continuously running automated information retrieval system. In Proceedings of the 1st International Conference on the Practical Applications of Intelligent Agents and Multi-Agent Technologies (1996), pp. 487–495.
15. Simone Santini. The self-organizing field. IEEE Transactions on Neural Networks,
7(6):1415–23, 1996.
16. Simone Santini and Alexandra Dumitrescu. Context-based retrieval as an alternative to document annotation. In Proceedings of OntoImage 2008, Workshop of
LREC 2008. LREC, 2008.
17. D. Sperber and D. Wilson. Relevance. Communication and Cognition. Blackwell,
Malden, Oxford, Victoria, Berlin, 2002.