ppt - Jaime Teevan

Surviving the Information Explosion
Jaime Teevan, MIT
with Christine Alvarado, Mark Ackerman and David Karger
Let Me Interview You!

Web:
– What’s the last Web page you visited? How did you get there?
– Have you looked for anything on the Web?

Email:
– What’s the last email you read? What did you do with it?
– Have you gone back to an email you’ve read before?

Files:
– What’s the last file you looked at? How did you get to it?
– Have you looked for a file?
Overview
Introduction
Related Work
Study Methodology
Results: Search
Discussion
The Information Explosion
You must extract information from:
→ 3 billion Web pages (Google)
→ Dozens of incoming emails daily
→ Hundreds of files on your personal computer
Haystack: Personal Information Storage
[Diagram: Haystack with Email, Web pages, Files, Calendar, Contacts]
Haystack: Personal Information Storage
What was that paper I read last week about Information Retrieval?
[Diagram: Haystack]
Haystack: Personal Information Storage
Ah yes! Thank you.
[Diagram: Haystack]
Supporting Information Interaction
Treat different corpora the same?
Provide access to meta-data?
– Keyword search (XP, advanced search)
– Browse (Hearst)
We don’t really know …
→ Understand access in the wild!
Overview
Introduction
Related Work
– Interaction by corpus
– How people search
Study Methodology
Results: Search
Discussion
Interaction By Corpus
Paper documents
– [Malone, 1983], [Whittaker & Hirschberg, 2001]
Files
– [Barreau & Nardi, 1995]
Email/Calendar
– [Whittaker & Sidner, 1996], [Bellotti & Smith, 2000]
Web
– [Abrams, et al. 1998], [Byrne, et al. 1999]
How People Look for Information
Focus: Web
Log analysis
– [Catledge & Pitkow, 95], [Tauscher & Greenberg 97]
Controlled tasks/environment
– [Baldonado & Winograd, 1997], [Spool, 1998]
Situated navigation
– Micronesian islanders [Suchman, 1987]
– Electronic [Marchionini, 1995], [Hearst, 2000]
– Information scent [Chi, Pirolli, Chen & Pitkow, 2001]
Method
Subjects
– 15 MIT CS graduate students (5 women, 10 men)
Setup
– 10 short interviews (~ 5 min.)
– 1 long interview (~ 45 min.)
Topics
– Web, Email, Files
Short Interviews
Modified diary study [Palen, 2002]
Randomly interrupted participant
Two question types
– Last email/file/Web page looked at
– Last email/file/Web page looked for
Goal: Discover patterns in searching and browsing
Long Interviews
“Guided tour” of subject’s Web space, email, and file system
Goals:
– Discover organizational patterns
– Discover problems in organizational structure
– Relate organization to search/browse behavior
Overview
Introduction
Related Work
Study Methodology
Results: Search
– What and how
– Relating what and how
– Individual strategies
Discussion
Complex Information Spaces
People had complex spaces
Felt in control
“That’s an interesting question. I think my email is the worst, because I have so much of it. And there are people on the other end who expect me to reply to it. My file system is pretty well organized. I have to go through it every once in a while, every couple of months and just kind of push things into the right folders and delete the old stuff. The Web just works, usually.”
What People Look For
Specific Information
– A small fact
– E.g., URL, phone number, appointment time
General Information
– A broad set of information
– E.g., good sneakers to buy, info on cancer
Specific Document
– The actual document
– E.g., a file to print, an email to reply to
How People Look For Information
The last thing you looked for on the Web
– Did you use a search engine?
Search is more than just keyword search
Browse, use bookmarks, type URLs
“I was looking to figure out where Glaris was. When I lived in Switzerland there were only a few reasonable mapping places of the country. And so I had bookmarked [the Switzerland map site].”
Strategies Looking for Information
Teleporting
– Traditional search
– Jump directly to target
– Specify everything up front
Orienteering
– Use local navigation
– [O’Day and Jeffries, 1993]
– Could include keyword search
Example: Orienteering
Interviewer: Have you looked for anything on the Web today?
Jim: I had to look for the office number of the Harvard professor.
I: So how did you go about doing that?
J: I went to the homepage of the Math department at Harvard
[…]
J: I knew that she had a very small Web page saying, “I’m here at
Harvard. Here’s my contact information.”
[…]
I: So you went to the Math department, and then what did you do
over there?
J: It had a place where you can find people and I went to that page
and they had a dropdown list of visiting faculty, and so I went to
that link and I looked for her name and there it was.
Example: Teleporting
What if Jim had teleported instead?
→ Could have typed into a search engine:
“Connie Monroe, office number”
“Keyword Search” and “Browse”
Teleporting
– Traditional search
– Jump directly to target
– Specify everything up front
Orienteering
– Use local navigation
– [O’Day and Jeffries, 1993]
– Could include keyword search
Relating How and What
            Specific  General  Document
Orienteer   47        19       41
Teleport    34        23       17

People orienteer a lot
What people look for is related to how they look
Surprise: Orienteer to specific information
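
The numbers on this slide can be read as per-category proportions. The following is a minimal Python sketch, assuming the values are counts of reported searches (the slide does not state the unit). The names `searches` and `orienteer_share` are illustrative and do not come from the talk.

```python
# Values copied from the slide; treating them as counts is an assumption.
searches = {
    "Orienteer": {"Specific": 47, "General": 19, "Document": 41},
    "Teleport": {"Specific": 34, "General": 23, "Document": 17},
}


def orienteer_share(table):
    """Return, per information type, the fraction of searches that were orienteering."""
    shares = {}
    for kind, orienteered in table["Orienteer"].items():
        total = orienteered + table["Teleport"][kind]
        shares[kind] = orienteered / total
    return shares


shares = orienteer_share(searches)
for kind, share in shares.items():
    print(f"{kind}: {share:.0%} orienteering")
```

Under that assumption, orienteering accounts for more than half of the searches for specific information, which is the surprise the slide points out, and for the largest share of document searches.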
Why So Much Orienteering?
Your last email search
– What were you looking for?
– Did you know what email contained that information?
People look for the information source
Specific information searches → Document searches
Looking for the Source: Example
“I was looking to figure out where Glaris was.
When I lived in Switzerland there were only a few
reasonable mapping places of the country. And so
I had bookmarked [the Switzerland map site].”
Looking for the Source: Example
Interviewer: Have you looked for anything on the Web today?
Jim: I had to look for the office number of the Harvard professor.
I: So how did you go about doing that?
J: I went to the homepage of the Math department at Harvard
[…]
J: I knew that she had a very small Web page saying, “I’m here at
Harvard. Here’s my contact information.”
[…]
I: So you went to the Math department, and then what did you do
over there?
J: It had a place where you can find people and I went to that page
and they had a dropdown list of visiting faculty, and so I went to
that link and I looked for her name and there it was.
Individual Strategies
Search strategies varied by individual
Pilers: Pile information
Filers: File information
Where was the last email you found?
– Inbox?
– Elsewhere?
File or Pile Email
[Chart: # of searches (0 to 8) vs. % found in Inbox (0 to 100); groups labeled Filer and Piler]
How Individuals Search For Files
[Chart: file searches per participant (A to M), split into Keyword Search and Orienteering, axis 0 to 9; Filers labeled “Teleport”, Pilers labeled “Orienteer”]
Overview
Introduction
Related Work
Study Methodology
Results
Discussion
– Understanding and applying what we learn
– Future work
Understanding Teleporting v. Orienteering
Why was orienteering chosen over teleporting?
→ Teleporting doesn’t work
→ Teleporting requires too much cognitive effort
→ Risk of over-specifying target
→ Orienteering gives knowledge of the source
→ Teleporting a failure mode
  – Can’t associate information with source
  – Can’t find the information source
Understanding Filers v. Pilers
Why do filers teleport more than pilers?
→ Irony: Those with good organization don’t take advantage of it
→ Filers have strictly organized information
  → Are used to defining meta-data for their information
→ Pilers loosely organize their information
  → Are used to associative navigating
Haystack: Applying What We Learn
Using meta-data: Support orienteering
– Not about having the perfect search interface
– Need ability to prompt
Individualized support
– Pilers/filers
– Learning individual behaviors
Future Work: Search
Previously viewed information
Causes of failure
Searches across corpus
Getting help from others
Future Work: Organization
Consistency of organization across corpus
Corpora boundaries
Context used in organization
Organization’s effect on search
Conclusion
Look at search in the wild
Strategies: Teleport/Orienteer
Individual strategies
Future systems should:
– Support orienteering
– Provide individualized support
Questions?
Contact us with comments:
- [email protected]
- [email protected]
To learn more about Haystack:
http://haystack.lcs.mit.edu
Relating How and Corpus
            Email  Files  Web
Orienteer   59     42     19
Teleport    6      10     64

Email and files: Almost always orienteered
Easy to associate information with document
Web: Teleported much more often
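
The contrast between corpora can be checked with a short calculation. This is a minimal Python sketch, assuming the slide's data labels read as Orienteer 59/42/19 and Teleport 6/10/64 for Email/Files/Web, and that they are counts of reported searches. Both the reading and the unit are assumptions, and the names `by_corpus` and `teleport_share` are illustrative.

```python
# Assumed reading of the slide's data labels (not confirmed by the source).
by_corpus = {
    "Email": {"Orienteer": 59, "Teleport": 6},
    "Files": {"Orienteer": 42, "Teleport": 10},
    "Web": {"Orienteer": 19, "Teleport": 64},
}


def teleport_share(counts):
    """Fraction of one corpus's searches that were teleporting."""
    return counts["Teleport"] / (counts["Orienteer"] + counts["Teleport"])


teleport_shares = {corpus: teleport_share(c) for corpus, c in by_corpus.items()}
for corpus, share in teleport_shares.items():
    print(f"{corpus}: {share:.0%} teleporting")
```

Under this reading, teleporting makes up a small share of email and file searches and the large majority of Web searches, in line with the statements on the slide.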
Relating What and Corpus
[Chart: searches by corpus (Email, Files, Web) and information type (Specific, General, Document); data labels as extracted: Email 39, Files 7, Web 33, then 10, 08, 7, 35, 30, 14]
Email searches were primarily for specific information
File searches were primarily for documents
Web searches were more evenly distributed