An Agent for Semi-automatic Management of Emails
Fangfang Xia (a) and Liu Wenyin (b)
(a) Dept. of Computer Science & Technology, Tsinghua University, Beijing 100084, China
(b) Dept. of Computer Science, City University of Hong Kong, Hong Kong SAR, China
[email protected]; [email protected]
ABSTRACT
Recent growth in the use of email for communication and the corresponding growth in the volume of email have made
automatic processing of email desirable. However, most existing systems fail to work in practice due to low
classification accuracy and inconvenient user interfaces. In this paper, we present an adaptive Personal Email Agent
(PEA) which can learn the mail handling preferences of its user and automatically categorize and manage its user’s
emails. One of the key ideas in this approach is extracting both the high-level semantic features (e.g., concept
information) from the body text and other low-level email features (e.g., sender, time, importance, etc.) from the entire
email message for similarity assessment based on the standard Information Retrieval (IR) approach. Another main
contribution of our work is establishing both global and local information space models for building relevance categories
based on the user’s folders. In addition, a query refinement strategy is incorporated to make the agent act as an
incremental learner: it adjusts its working strategy based only on new examples, avoiding a complete re-training on all
previous examples. To test the effectiveness of our system, we conducted experiments on its two main functions, email
retrieval and relevance categorization, and obtained promising preliminary results.
Keywords: Email Overload, Email Management, Example-based Learning, Information Retrieval, Content-based
Retrieval, Relevance Categories, Query Refinement, Personal Email Agent (PEA)
1. INTRODUCTION
The explosion in electronic communication is dramatically changing the way people interact with one another. Email
overload [1,2] has become a growing problem as more and more users have embraced online technologies in recent
years. According to Forrester Research, 7 trillion emails were sent per day in 2002, and an estimated 81 percent of
organizations that introduced email to improve their efficiency now complain that email is becoming a victim of its own
success. IDC estimates that in 2002 the average business user spent over 2.4 hours a day just dealing with an average of
30 work-related messages [2]. These numbers continue to grow.
To address the problem of email overload, many researchers have evaluated common manual management
strategies for emails, including prioritizers, archivers [3], no filers, spring cleaners, frequent filers [2], and folderless
cleaners [4]. Whittaker and Sidner [2] found that a major aim of filing is to reduce the huge number of undifferentiated
inbox items to a relatively small set of folders, each containing multiple related messages. Bälter [5] developed a
mathematical model showing that storage time is the major time consumer for users with more than a thousand stored
messages and that the best long-term strategy is to use folders sparingly (4 to 20) in combination with the search
functionality. He suggested that users who want to use folders employ agents that automatically suggest folders for
archiving, since such agents could reduce the storage time drastically, and a larger number of folders may help reduce
the time to retrieve a message.
Hence, early research focused on a variety of machine learning techniques to classify emails into folders. Among
the well-known prototypes, SwiftFile used shortcut buttons to archive messages into folders, but only when initiated by
the user [6]. Mock used a nearest-neighbor classifier to group inbox emails into categories in his experimental
framework [7]. Some projects, such as Enfish Onespace and Metastorm’s infowise, use information retrieval techniques
to measure similarities among folders or individual messages [8]. Other companies, such as Abridge, Plumtree, and
Tacit, use rules or user-supplied categories to group emails. There are also flexible email organizers. For example, the
Gnus news and mail reading system [9], distributed with recent versions of GNU Emacs, has hooks that allow
installation of arbitrary programs for filtering and foldering news and mail. Furthermore, there are several open-source
email readers that could be modified to include a hook for arbitrary classifiers [10].
With the vast amount of interest and research devoted to automatic email categorization, why hasn’t the concept
been incorporated into existing email readers? The current difficulties with automatic email organization lie in the
following aspects. First, the user’s folders are usually not well organized, and they change over time as new messages
are received; this inbox irregularity sets hurdles for accurate classification. Second, most of the learning algorithms are
based on statistics, and for these algorithms to perform well a large amount of data must be on hand; the training time is
usually considerable. Third, many of the current algorithms do not learn incrementally: they update by requiring a
complete re-training on all data, including the original training messages. Fourth, most existing systems provide limited
user-oriented functions; they do not allow classification into multiple categories, and they use implicit rules that users
cannot adjust.
In this paper, we focus on the issue of automatic categorization to save time on archiving (when there are a large
number of folders) and present an example-based semi-automatic learning approach for this purpose. A prototype
system, the Personal Email Agent (PEA), is built based on this approach; it can adapt to an individual user by learning
his/her email management preferences from the interaction examples between the user and the email system. Based on
the user’s preferences, PEA can automatically categorize and manage his/her incoming and/or stored emails. One of the
key ideas in this approach is extracting both the high-level semantic features (e.g., concept information) from the text
and other low-level email features (e.g., sender, time, importance, etc.) from the entire email message for similarity
assessment. Another main contribution of our work is establishing both global and local information space models for
building relevance categories based on the user’s folders. In addition, a query refinement strategy is incorporated to make
the agent act as an incremental learner. Experiments have shown the effectiveness of the proposed approach.
The remainder of this paper is structured as follows. In Section 2, we present our solution, the Personal Email
Agent, and describe its system architecture and user interface. We then present the core algorithms and other
implementation details in Section 3. We show the preliminary experimental results of the agent in Section 4. Finally, we
conclude and present some directions for future work.
2. SOLUTIONS
Many of the classification difficulties described above may be alleviated through better classifiers; another way to
resolve them is to sidestep the entire problem with an alternate technology. We adopt one such technology, Relevance
Categories [8], which addresses some of the same information management issues as automatic classification while
avoiding many of the problems discussed in the previous section.
In order to utilize as much detailed information as possible, we extract all useful features from an email message,
including sender, recipients, time, topic, body, etc. Different methods are then employed to compute the per-feature
similarities. The overall similarity between two messages is the weighted sum of these feature similarities. Note that
different sets of weights are assigned to the features in different folders. Learning from the user’s feedback, the weights
can be adjusted automatically to represent more exactly the user’s preferences over the diverse features within one
folder and thus refine the query of this folder.
2.1 Architecture
The architecture of our agent system is shown in Figure 1. The system consists of two components: the user interface,
and the core component of Personal Email Agent. The user interface is divided into four parts: two functional parts and
two peripheral ones. The functional parts include an email retrieval interface and an email classification interface, both
of which provide user feedback interfaces. The system configuration part is where the user can set the parameters and
manually adjust part of the folder space coefficients. The non-feedback function part consists of some auxiliary functions
such as events logging and message filing according to their category. In the core component, we have three spaces, i.e.,
the weights space, the local information space and the global information space, five modules, namely, the feature
extractor, the nearest-neighbor similarity evaluator, the inverted indexer, the email matcher and the relevance categorizer,
and finally two databases, which store low-level features and high-level semantic features, respectively. They work
together to perform both the functional and the feedback routines.
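For illustration, one category’s slice of these three spaces could be held in a structure like the following minimal Python sketch (the type and field names are ours; the paper does not prescribe concrete data structures):

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class CategorySpaces:
    # Global information space: top-N terms and their frequencies,
    # computed over all messages in the category.
    global_terms: Dict[str, int] = field(default_factory=dict)
    # Local information space: per-message feature vectors kept for
    # nearest-neighbor comparison against incoming mail.
    local_vectors: List[Dict[str, float]] = field(default_factory=list)
    # Weights space: per-feature weights refined from user feedback.
    weights: Dict[str, float] = field(default_factory=dict)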
A typical scenario of the system is as follows. Upon installation of the agent, the feature extractor scans all the emails
in the user’s personal folder; both low-level and high-level features of the emails are extracted and the corresponding
databases are constructed. Then, the nearest-neighbor similarity evaluator and the inverted indexer work simultaneously.
The indexer builds the global information space for each folder according to the existing inbox structure; the evaluator
compares emails within each folder to set up the local information space and decide the initial weights for the features.
Once the three space models are available, the matcher compares the user’s query with the local space model of emails
to yield the retrieval results; the outcome is given in the form of a ranked list. The user can mark irrelevant emails that
are ranked improperly high, and thus negative feedback is applied. The relevance categorizer is triggered when a new
message comes in or the user adjusts the inbox structure, e.g., moving emails from one folder to another or creating new
folders. On these occasions, the agent first updates its database and space models and then refreshes its classification.
The agent learns from user feedback by refining its inner space models to yield more accurate results in the future.
Figure 1. Architecture of the PEA
2.2 User Interface
We implement our Personal Email Agent as an add-in to Microsoft Outlook 2002 on Windows XP. The basic interface is
a supplemental command bar which is indicated within the red (or gray) rectangle (containing the “Email Retrieval”,
“Archive”, and “Settings” buttons) in the upper-right part of Figure 2.
Upon first startup, a scanning process is performed which automatically creates a category out of every folder the
user maintains. The messages in the folder are then associated with that category. While the agent is enabled, new
emails are automatically classified into the best matching folder; they are only grouped together, not moved
immediately. The user can view inbox emails grouped into categories and make the mails actually go to their assigned
folders simply by clicking the “Archive” button. When the user manually adjusts the categorization result in the inbox
or moves mail from one folder to another, relevance feedback is provided and the learning process is then triggered. On
these occasions, the agent will automatically show the accompanying changes it has made, and the user can cancel some
of them. The “Email Retrieval” button aids users who wish to search for emails. This function provides
the capability to quickly display a list of messages ranked by relevance (using the similarity metrics) to the selected
messages. In this manner, other messages in the same thread or on the same topic will be displayed at the top of the list.
The feedback mechanism is also provided for the email retrieval function. Finally, the “Settings” button is for users to
access and change the agent’s parameters such as constants and feature weights. Users can also enable or disable some
non-feedback functions and change the running modes there.
Figure 2. User Interface of the PEA
3. ALGORITHMS AND IMPLEMENTATIONS
3.1 Feature Extracting and Similarity Assessment
There are two kinds of features used in our agent. One is low-level features, such as sender, time, importance, etc. The
other is high-level semantic features extracted from the subject and body of an email. We first compute the similarity
between two emails at each level and then calculate the weighted sum as the overall similarity.
We implement the relevant-email retrieval functionality of our agent by similarity assessment. All emails are
compared with the query email and then sorted in descending order of similarity. A high rank usually indicates
significant relevancy.
3.1.1 Low-level features
We extract eight basic low-level features in our agent: sender, recipients, creation time, importance, body format, and
three Boolean variables (IsRead, IsReplied, and IsWithAttachment). To compute the similarity, we also incorporate an
additional feature, “sender-recipients”, which is useful on some particular occasions. This is not another independent
feature; we add it mostly because of the following concern: quite frequently, a user wants to keep all his correspondence
with a person in the same folder. However, neither the sender nor the recipients feature alone can help him. For
example, two emails, one from A to B and the other from B to A, are obviously related, but the similarities calculated
based on sender and recipients are both 0. In such a case, the sender-recipients feature merges the sender and receivers
into one set, and the similarity calculated on it should be 1. This feature is also useful for work groups. The similarity
corresponding to each of the features is computed differently; the detailed calculation methods will be presented in an
extended version of this paper.
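As an illustrative sketch, the sender-recipients similarity can be computed as a set overlap on the merged address sets. Since the exact formula is deferred to the extended version, a Jaccard-style measure is assumed here; it reproduces the behavior described above:

def sender_recipients_similarity(email_a, email_b):
    # Merge sender and recipients into one address set per email.
    set_a = {email_a["sender"]} | set(email_a["recipients"])
    set_b = {email_b["sender"]} | set(email_b["recipients"])
    if not set_a or not set_b:
        return 0.0
    # Assumed Jaccard overlap; the paper's actual formula may differ.
    return len(set_a & set_b) / len(set_a | set_b)

# One mail from A to B and one from B to A: the merged sets are both
# {A, B}, so the similarity is 1 even though the sender and recipients
# similarities are both 0.
mail1 = {"sender": "[email protected]", "recipients": ["[email protected]"]}
mail2 = {"sender": "[email protected]", "recipients": ["[email protected]"]}
assert sender_recipients_similarity(mail1, mail2) == 1.0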
3.1.2 High-level features
We extract two high-level features in our agent: subject and body. Since both are text features, we use the same method
to compute their similarities. Our implementation is based upon an inverted index with integrated TF/IDF [11] values.
The detailed algorithm will be presented in an extended version of this paper.
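The following is a minimal sketch of such an inverted index with TF/IDF weighting. It is illustrative only, since the paper defers the actual algorithm to the extended version; the whitespace tokenization in particular is our simplification:

import math
from collections import Counter, defaultdict

def build_index(docs):
    # docs: {doc_id: text}. The index maps term -> {doc_id: term frequency}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    n = len(docs)
    idf = {term: math.log(n / len(postings)) for term, postings in index.items()}
    return index, idf

def text_similarity(query_text, doc_id, index, idf):
    # Score a document against the query terms by summed TF/IDF weights.
    return sum(index.get(term, {}).get(doc_id, 0) * idf.get(term, 0.0)
               for term in set(query_text.lower().split()))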
3.1.3 The overall similarity
Although there are many sophisticated similarity assessment methods, we use the simplest model to obtain the overall
similarity. With high-level and low-level similarities calculated separately, the overall similarity is simply calculated as
a linear combination of them.
Note that different folders are assigned different sets of weights, and these weights are consistently refined by the
user’s feedback. This is the key point for our agent to gain intelligence and will be further discussed in the following
sections.
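A minimal sketch of this linear combination; the weight values shown are illustrative, not those of the agent:

def overall_similarity(feature_sims, weights):
    # feature_sims: per-feature similarity scores for a pair of emails.
    # weights: the per-folder weight vector refined from user feedback.
    return sum(weights.get(f, 0.0) * s for f, s in feature_sims.items())

sims = {"sender": 1.0, "subject": 0.4, "body": 0.6}
folder_weights = {"sender": 0.5, "subject": 0.2, "body": 0.3}
print(overall_similarity(sims, folder_weights))  # 0.76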
3.2 Folder Space and Relevance Categories
A key function of our agent is to classify emails according to existing folders. Section 3.1 gives an algorithm for
computing the similarity between two individual email messages. In order to assess the similarity between a message
and a folder, we should also build a user folder space model, through which the nature of different folders can be well
characterized.
Many existing systems achieve this goal by assigning each folder a vector compatible with the email vector. Since
such a vector is usually the average of all the emails in the folder, its weakness in classifying is obvious, as described in
Section 1. To utilize as much detailed information as possible, we explore both global and local properties of a folder in
establishing its space model. (More exactly, “folder” here should be replaced by “relevance category”, a concept that
will be discussed shortly.)
Global Information: Global information of a folder is the semantic information of all the messages in that folder.
(As we shall introduce the relevance categories concept in the following text, the messages in the category linked with
this folder should also be included.) The messages are concatenated and treated like a single document. The N most
frequent terms (from either the body or the subject field) and their term frequencies are extracted. (In our agent, N is set
to 50 by default.) The resulting terms comprise part of the query for the category that the folder represents. Note that as
the set of messages changes, the queries are simple to update: all that is required is to re-compute the term frequencies.
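A minimal sketch of this extraction step (N defaults to 50, as in the agent; the tokenization is our simplification):

from collections import Counter

def global_query(messages, n=50):
    # Concatenate all messages in the category and treat them as one document.
    combined = " ".join(m["subject"] + " " + m["body"] for m in messages)
    # Updating after the folder changes is just re-running this count.
    return Counter(combined.lower().split()).most_common(n)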
Local Information: Local information of a folder is obtained by the simple nearest-neighbor method. Given a target
message to classify, its features are extracted and compared to all messages in the folder using the algorithm introduced
in Section 3.1. The top M matches are averaged as the local measure for the category (M is set to 3 by default in our
agent). The introduction of local information is helpful because some users maintain overly generic folders (e.g.,
“Projects”) encompassing multiple unrelated sub-categories. It is also useful when dealing with topic drift.
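A minimal sketch of this local measure (M defaults to 3, as in the agent; message_sim stands for the pairwise similarity of Section 3.1):

def local_similarity(target, folder_messages, message_sim, m=3):
    # Compare the target against every message in the folder and
    # average the top-M matches as the category's local measure.
    scores = sorted((message_sim(target, msg) for msg in folder_messages),
                    reverse=True)
    top = scores[:m]
    return sum(top) / len(top) if top else 0.0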
The basic concept of Relevance Categories [8] is to provide the same functionality as regular folders or categories.
Users can assign emails to categories, or remove them from categories, just as they normally do. Relevance Categories
are initially built from the existing folders in the user’s inbox. When new emails come in, they are automatically
assigned to one category by our agent. The user can manually correct wrong classifications or assign one email to
multiple categories. On these occasions, our agent refines the queries based on the feedback, trying to match the user’s
subjective intention more precisely. Otherwise, the newly assigned emails are regarded as members of their categories
from then on, even though their actual movement to the destination folders is not applied until the user explicitly
performs the “Archive” function of our agent.
In the computation of the email-category similarity, a unique weight vector indicating the user’s preference placed on
different features is assigned to each category to obtain the weighted feature sum. Apart from the global and local
information, this weight vector is another important part of the folder space model, which alone builds up the Weights
Space. How to compute the weight vector and adjust it based on user feedback thus becomes the central problem in our
query refinement strategy.
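A sketch of the resulting assignment step. How the global and local measures are folded into the per-feature scores is deferred by the paper, so the feature_sims callable below is an assumed placeholder; each category object carries its own weight vector, as in the earlier CategorySpaces sketch:

def best_category(email, categories, feature_sims):
    # feature_sims(email, category) is assumed to return per-feature
    # similarity scores (Section 3.1 plus the global/local measures).
    def score(category):
        sims = feature_sims(email, category)
        return sum(category.weights.get(f, 0.0) * s for f, s in sims.items())
    return max(categories, key=score)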
3.3 Query Refinement Strategy
Queries are created for each relevance category. Corresponding to the folder space model, the query refinement strategy
for our agent could also be divided into two parts, the global query refinement and the local query refinement.
Global query refinement is an approach to the precise representation of the global semantic feature of a category.
Negative training could be employed for emails the user explicitly denotes as not belonging to the category. These might
arise in the agent’s email retrieval function if the user wishes to apply corrective action to highly ranked messages so that
they are displayed toward the bottom of the list. To apply negative training, the N most frequent terms are extracted from
the negative examples and subtracted from the N most frequent terms from the positive examples. This may result in
some terms with negative frequencies.
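A minimal sketch of this subtraction, assuming a query is a plain term-to-frequency map:

from collections import Counter

def refine_global_query(positive_texts, negative_texts, n=50):
    pos = dict(Counter(" ".join(positive_texts).lower().split()).most_common(n))
    neg = dict(Counter(" ".join(negative_texts).lower().split()).most_common(n))
    query = dict(pos)
    # Subtract negative-example frequencies; terms may legitimately
    # end up with negative frequencies, as noted above.
    for term, freq in neg.items():
        query[term] = query.get(term, 0) - freq
    return query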
Local query refinement is mainly the adjustment of the weight vector mentioned in Section 3.2. Our agent learns
from user feedback so that the weight vector tallies more and more closely with the user’s subjective emphasis on the
features. The detailed algorithm is presented in an extended version of this paper.
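Since the detailed rule is deferred, the following is only an assumed perceptron-style sketch of what one local refinement step might look like, not the agent’s actual algorithm:

def refine_weights(weights, feature_sims, correct, lr=0.1):
    # Reinforce features that agreed with a confirmed assignment; damp
    # them after a user correction. The update rule and learning rate
    # are our assumptions.
    sign = 1.0 if correct else -1.0
    for f, s in feature_sims.items():
        weights[f] = max(0.0, weights.get(f, 0.0) + sign * lr * s)
    total = sum(weights.values()) or 1.0
    return {f: w / total for f, w in weights.items()}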
4. PERFORMANCE EVALUATION
In order to test the two main functions of our agent, email retrieval and email classification, we designed two
corresponding experiments. Since the effectiveness of relevance categories on the purely semantic feature, i.e., our
global information space, has been tested by Mock over the Reuters-21578 corpus [8], we concentrate only on the
overall performance of our agent on the multi-feature basis. The test data are mainly the daily emails of the authors. The
volume is not very large (about 1000 messages), but it represents a typical user’s situation well.
4.1 Retrieval Accuracy
In this experiment, we randomly select a number of emails belonging to the same category as query (positive
feedback) examples and perform email retrieval. The number is kept below 20, since a user usually does not have the
patience to select more than 5 emails in each iteration or go through more than 4 iterations. Since we use exactly 100
emails as the ground truth for each query and we also check only the first 100 retrieved emails, precision and recall have
the same value (see the derivation below); we therefore use the term “accuracy” to refer to both. The results are shown
in Figure 3, with the x axis being the number of query (positive feedback) emails and the y axis the average retrieval
accuracy. As the figure shows, the average accuracy of email retrieval exceeds 50% when the number of query emails
reaches 10.
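This equality follows directly from the definitions. Writing G for the 100-email ground-truth set of a query and R for the first 100 ranked emails we inspect:

\text{precision} = \frac{|R \cap G|}{|R|} = \frac{|R \cap G|}{100} = \frac{|R \cap G|}{|G|} = \text{recall}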
Figure 3. Retrieval Accuracy (average accuracy in %, plotted against the number of query emails, from 1 to 19)
4.2 Categorization Accuracy and Feature Abilities
The second experiment evaluates the performance of the categorizer on learning a user’s mail sorting preferences from
hand-sorted mails. The input data are six months of the first author’s sorted mails. Table 1 shows the folders and
distribution of messages in the data set. These data pose an interesting challenge for a learning system. Not only is the
distribution of messages in the folders highly non-uniform, but the selection of folders for messages is also strongly
idiosyncratic. While the content of the folder “FROM HER” was exclusively determined by a single keyword match
(sender=”Arendt”), other folders were not determined by a single keyword match with the “from” or “to” fields, but
rather by the subjective judgment of the first author of this paper of what folder would be the best mnemonic for later
retrieval of the message based on its content, time, recipients, etc. For example, the “REMINDER” folder maintains
only emails received within the most recent week, while the “E-MAGAZINE” folder contains various HTML messages
that the first author subscribed to from various websites. In this case, the task of the agent is to learn a model of the
user’s email sorting preferences.
Table 1. Hand-Archived Emails in Our Experiments.

Folder Name        Email Count    Percentage
CS 91                       62         5.87%
E-MAGAZINE                 317        30.0%
FROM HER                    96         9.08%
MISCELLANEOUS               80         7.57%
PERSONAL                   126        11.9%
PHILOSPHY GROOP             30         2.84%
PROJECTS                   247        23.4%
REMINDER                    21         1.99%
SOCCER                      78         7.38%
Total Examples            1057       100%
Figure 4. (a) Categorization Accuracy and (b) Feature Discrimination Abilities
The results of this experiment are shown in Figures 4(a) and (b). Through learning, the agent achieves 82% test
accuracy after 100 training examples and 87% after 200. The weights of the features begin to reveal the user’s differing
emphasis on them as the number of training examples increases. We show only three of the features in the figure;
however, the trends are clear, which shows that the agent is capable of learning a user’s preferences through our query
refinement strategy.
The strategy of our agent has many advantages. First, relevance categories are not “hard” folders; they are merely
an add-on to existing categories and can be ignored or used exactly like normal categories without impacting
performance. Therefore, the errors made by our agent are more likely to be tolerated by users. Second, thanks to the
simple similarity-computing algorithm, the management performed by our agent remains possible in the presence of
sparse data. Third, since both high-level and low-level features are extracted, the agent can handle diverse occasions
well; it clearly surpasses traditional classifiers, which focus only on text features, in dealing with categories like
“FROM HER” in the above experiment. Fourth, the incorporation of global and local information enables the agent to
fit the various user inboxes that are not well organized. In addition, the query refinement can be done quickly, avoiding
the intensive computation at the adjustment stage that troubles most classifiers.
5. CONCLUSION AND FUTURE WORK
We present an intelligent agent which can learn from the user’s interactions with the email system and hence can
semi-automatically manage the user’s emails. The features that distinguish our system from existing email retrieval or
management approaches are fourfold. First, different features of emails are extracted, with corresponding similarity
assessment methods designed for them. The employment of both high-level semantic features and other low-level
features enables our agent to perform ambidextrously. Second, the adoption of relevance categories for our UI sidesteps
some of the common hurdles that its peer systems normally face. Though the concept of relevance categories is really a
step back from pure categorization, it allows for multiple or overlapping categories and is more likely to be tolerated by
users when classification errors occur. Third, a unique space model is established for each user folder based on both
global and local information of its encompassing emails. This makes it possible for the agent to fit a user’s email sorting
habits, which may be extremely idiosyncratic. Fourth, an efficient query refinement strategy is presented to facilitate the
learning process.
The next phase is to further refine our space models. For example, noun phrase extraction, better term selection, use
of more terms, support for languages other than English and for mixed languages, variation of test parameters and
assumptions, and different similarity metrics might significantly improve the categorization accuracy. Additional work
is also required to quantify the performance of the current classification algorithms with both test data and user studies.
In addition, much work remains to be completed in code enhancements, such as hooking into more Outlook events,
database integration for classifiers, or MS .NET upgrades. Finally, new experiments that integrate classification and
information retrieval techniques across email and into calendaring, notes, or other types of data may also be explored.
REFERENCES
1. Email overload--facts and figures: an e-mountain of e-mail. http://www.amikanow.com/corporate/email_overload.htm
2. Whittaker S and Sidner C. Email overload: exploring personal information management of email. SIGCHI’96, pp. 276-283.
3. Pliskin N. Interacting with electronic mail can be a dream or a nightmare: a user’s point of view. Interacting with Computers 1(3):259-272.
4. Bälter O. Strategies for organizing email messages. SIGCHI’97, pp. 21-38.
5. Bälter O. Keystroke level analysis of email message organization. SIGCHI’2000, pp. 105-112.
6. Segal R and Kephart J. Incremental learning in SwiftFile. ICML’2000.
7. Mock K. An experimental framework for email categorization and management. SIGIR’2001.
8. Mock K. Dynamic email organization via relevance categories. ICTAI’99.
9. Ingebrigtsen LM. Gnus network user services. http://www.gnus.org/.
10. Malone TW, Lai KY, and Fry C. Experiments with Oval: a radically tailorable tool for cooperative work. ACM TOIS 13(2):177-205.
11. Salton G. Automatic Text Processing. Addison-Wesley, 1989.