Overspecified reference in hierarchical
domains: measuring the benefits for readers
Ivandre Paraboni *
Judith Masthoff #
Kees van Deemter #
* = University of Sao Paulo
# = University of Aberdeen
What this is about
• Generation of Referring Expressions (GRE)
• A referring expression is overspecified
if a clear referring expression can still be obtained
by removing a property
• Informally:
overspecified = logically redundant
Introduction to the problem
Suppose
– I live on Western Road, the longest street in
Aberdeen
– I live at number 968. No other house in Aberdeen
has that number
“Number 968, Aberdeen” is a distinguishing description,
but it’s not very useful
It’s better to add logically redundant information,
e.g., “968 Western Road, Aberdeen” , or even
“968 Western Road, Bon Accord, Aberdeen”
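The definition of overspecification can be sketched on the house example. This is only an illustration under stated assumptions: the domain is a toy list of houses in which every entry except 968 Western Road is made up, and the helper names (`referents`, `is_distinguishing`, `is_overspecified`) are hypothetical, not part of any published GRE algorithm.

```python
# Toy domain. Only "968 Western Road, Aberdeen" comes from the example;
# the other houses are invented so that the checks have something to exclude.
HOUSES = [
    {"number": 968, "street": "Western Road", "city": "Aberdeen"},
    {"number": 12, "street": "Western Road", "city": "Aberdeen"},
    {"number": 12, "street": "Union Street", "city": "Aberdeen"},
    {"number": 968, "street": "High Street", "city": "Dundee"},
]


def referents(description, domain):
    """Return all objects in the domain that match every property."""
    return [obj for obj in domain
            if all(obj.get(k) == v for k, v in description.items())]


def is_distinguishing(description, target, domain):
    """True if the description picks out the target and nothing else."""
    return referents(description, domain) == [target]


def is_overspecified(description, target, domain):
    """True if dropping some single property still distinguishes the target."""
    if not is_distinguishing(description, target, domain):
        return False
    for prop in description:
        shorter = {k: v for k, v in description.items() if k != prop}
        if shorter and is_distinguishing(shorter, target, domain):
            return True
    return False


target = HOUSES[0]
minimal = {"number": 968, "city": "Aberdeen"}
longer = {"number": 968, "street": "Western Road", "city": "Aberdeen"}
print(is_overspecified(minimal, target, HOUSES))  # minimal description
print(is_overspecified(longer, target, HOUSES))   # street is redundant
```

In this toy domain the street name is logically redundant, which is exactly the kind of redundancy the slides argue can still be useful to the reader.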
Overspecification in
referring expressions
• Produced by any GRE algorithm that does not achieve “Full
Brevity” (Dale 1989)
• Investigated in its own right by e.g.
– Arts 2004 (role of location; purely empirical)
– Jordan 2000 (overspec in specific situations,
e.g., when a sale is confirmed)
– Horacek 2005 (overspec when there is
uncertainty about applicability of properties)
Our focus:
• The need for overspecification when a
large domain is not fully known in
advance to a hearer. Typical examples
involve space or time:
– A house in a city, a photocopier in a building,
a picture in a document
– (An event or object in time, e.g., ‘the minister
of the colonies in the XYZ government’ )
• This talk: empirical validation of
algorithms
Caveat
• Overspecification can make it easier to
identify the referent ...
• ... but it is bound to lengthen reading times
• Our terminology: we expect
overspecification
– to make interpretation harder
– to make resolution easier
Short history ...
Paraboni & van Deemter (INLG-2002):
• A simple theory of the way in which hearers perform
search. Ancestral Search (AS)
• Two types of situations that AS predicts to be
problematic for hearers: Lack of Orientation (LO) and
Dead End (DE).
• An algorithm (in two flavours) that adds redundant
information when AS predicts these problems
• An experiment to test whether these algorithms improve
the output of GRE
(1) Lack of Orientation (LO)
[Figure: tree of the University of Brighton domain, with the Watts and Cockcroft buildings, their wings (North, West, South) and rooms (auditorium, library). Example description: “the West Wing”]
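(2) Dead End (DE)
[Figure: tree of the University of Brighton domain, with the Watts and Cockcroft buildings, their wings (North, West, South) and rooms (auditorium, library); one North Wing is marked with a question mark. Example description: “the library in the North Wing”]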
Explanation (informal!)
• Why are LO and DE bad?
• Ancestral Search (AS):
“Search locally, then one level up at a time”
• Essentially, this is just salience (cf.
Krahmer & Theune 2000) applied to
hierarchies
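A minimal sketch of the “search locally, then one level up at a time” idea follows. It is an assumption-laden illustration, not the authors' formal definition of Ancestral Search: the `Node` class, the `ancestral_search` function and the small tree (loosely modelled on the University of Brighton example) are all hypothetical.

```python
class Node:
    """A node in a domain hierarchy, linked to its parent."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def subtree(self):
        """Yield this node and all nodes below it, depth first."""
        yield self
        for child in self.children:
            yield from child.subtree()


def ancestral_search(start, label):
    """Look for `label` at or below `start`; if absent, retry one level up.

    Returns the node found (or None) and the number of levels climbed.
    """
    scope, levels_up = start, 0
    while scope is not None:
        for node in scope.subtree():
            if node.label == label:
                return node, levels_up
        scope, levels_up = scope.parent, levels_up + 1
    return None, levels_up


# Hypothetical tree; the real figure may be arranged differently.
university = Node("University of Brighton")
watts = Node("Watts building", university)
cockcroft = Node("Cockcroft building", university)
watts_north = Node("North Wing", watts)
cockcroft_north = Node("North Wing", cockcroft)
library = Node("library", cockcroft_north)

found, climbed = ancestral_search(watts_north, "library")
print(found.label, climbed)
```

A hearer standing in the Watts building's North Wing has to climb two levels before the library comes into scope, which gives a rough feel for why a bare “the library” can be costly to resolve.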
Summary of Experiment 1:
Descriptions compared by subjects
• 15 subjects were shown documents
from which most of the words were
deleted
• Binary forced choice between two
expressions that refer to document parts:
1. the obvious minimal description
2. the redundant description
generated by our algorithm
What the subjects chose between (example)
Hypotheses & Outcomes
• Hyp 1: In problematic situations,
redundant descriptions are preferred
• Hyp 2: In non-problematic situations,
non-redundant descriptions are preferred
• Outcomes:
– Hyp 1: overwhelmingly confirmed
– Hyp 2: trend in the right direction (57%),
but not statistically significant. (Too few
subjects?)
Limitations of first experiment
• This experiment was hybrid: partly about
reading, partly about writing
• It did not teach us why redundant descriptions
were preferred (in problematic cases)
• We think this was because non-redundant
descriptions caused problems for resolution ...
• ... but the experiment did not address resolution
separately. (Subjects may have balanced interpretation
and resolution when judging).
What next?
• Therefore, a new experiment was called for,
which addresses resolution only.
• Documents as our domain again
• Add hyperlinks to support non-linear search
through the document
• Track readers’ resolution (i.e., search) process
• Intricate experiment, hence a new author:
Judith Masthoff (University of Aberdeen)
Experiment 2:
Tracking resolution
• Effect of logical redundancy on the
performance of readers
• Focussing on resolution
Experimental Design
• 40 subjects completed experiment
• Within-subjects design:
each subject shown 20 documents
• Order of documents randomized
• Documents were made to look different
• Reader had knowledge of hierarchical structure
• Reader was given a task: “Please click on ...”
• Navigation actions recorded
Reader Location
“Let’s talk about helicopters.
Please click on picture 4 in part C”
Hypothesis 1
• In a problematic (DE/LO) situation, the number
of navigation actions required for a long (FI/SL)
description is smaller than that required for a
minimal description.
• Informally: redundancy helps resolution!
(in problematic situations)
But ...
• it seems likely that redundant information
will always help resolution
• so let’s compare the “Gain” in
problematic/unproblematic situations
Hypothesis 2
• The Gain achieved by a long description over
a minimal description will be larger in a
problematic situation than in a non-problematic
situation
• Informally: redundancy helps especially in
problematic situations
But ...
• Even more redundancy might have helped even more
• The obvious candidate: a complete description
• Compare cases where our algorithm prescribes a
complete description with ones where it does not.
• We want b to be greater than a:
a = Gain(complete-description,
incomplete-description-generated-by-algorithm)
b = Gain(complete-description-generated-by-algorithm,
incomplete-description)
Hypothesis 3
• The Gain of a complete description
over a less complete one
will be larger for a situation in which our
algorithms generated the complete description,
than for a situation in which our algorithms
generated the less complete description.
Results: Hypothesis 1
Do redundant descriptions benefit
problematic situations?
[Chart: number of clicks (axis 0 to 9) for minimal (MD) and long (SL/FI) descriptions in situations 1 DE, 2 DE, 3 LO, 4 LO, 5 LO, 6 LO]
Yes!
Results: Hypothesis 2
Do redundant descriptions benefit problematic
situations MORE than non-problematic situations?
[Chart: number of clicks (axis 0 to 5) for minimal (MD) and long descriptions in situations 1 DE, 7 NONE, 2 DE, 8 NONE]
Comparing like with like
• General Linear Model (GLM) with repeated
measures
• Comparison of similar situations, e.g. 2 and 7
sit2&7: minimal = “pic.3 in part A”
redundant = “pic.3 in part A of section 2”
sit2: reader is in same section as target
sit7: reader is in a different section
Results: Hypothesis 2
Do redundant descriptions benefit problematic
situations MORE than non-problematic situations?
[Chart: number of clicks (axis 0 to 5) for minimal (MD) and long descriptions in situations 1 DE, 7 NONE, 2 DE, 8 NONE]
Yes!
Results: Hypothesis 3
Are our algorithms economical with redundancy?
[Chart: number of clicks (axis 0 to 6) for not complete and complete descriptions in situations 3 LO, 5 LO, 4 LO, 6 LO, with FI labels]
Yes!
How much overspecification is optimal?
[Figure: tree of the University of Brighton domain, with the Watts and Cockcroft buildings, their wings (North, West, South) and rooms (auditorium, library), annotated with candidate descriptions: “The auditorium”, “The ... in the North Wing”, “The ... on this campus”, “The ... in the Watts building”]
• Which of all these descriptions is best?
• Depends on issues other than the structure of the
domain, e.g.,
– how much time/space does the speaker/writer have
available?
– how important is it that misunderstandings are
avoided? [cf. Van Deemter et al., this conference]
– is there room for negotiation through dialogue?
[cf. Khan et al., this conference]
In the setting of this experiment
• We did not find a point beyond which
overspecification backfires
• We did find a point of “diminishing returns”
for resolution speed
• Given that interpretation deteriorates
with every added property, the figures are
suggestive
Getting a feeling for the numbers
• Nonproblematic situations (situations 7 and 8):
– short descr: 1.53 clicks (2 properties)
– redundant (other): 1.34 clicks (3 properties)
• Problematic situations (situations 3 and 4):
– short descr: 4.05 clicks (1 property)
– redundant (algorithm): 1.77 clicks (2 properties)
– redundant (other): 1.31 clicks (3 properties)
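The arithmetic behind these figures can be sketched as follows. The slides do not give a formula for “Gain”, so this sketch assumes it is simply the number of clicks saved by the longer description; the names `CLICKS`, `gain` and `marginal_gains` are hypothetical.

```python
# Mean clicks per description, keyed by number of properties,
# copied from the figures above.
CLICKS = {
    "nonproblematic": {2: 1.53, 3: 1.34},
    "problematic": {1: 4.05, 2: 1.77, 3: 1.31},
}


def gain(clicks_short, clicks_long):
    """Clicks saved by the longer description (assumed definition of Gain)."""
    return clicks_short - clicks_long


def marginal_gains(clicks_by_properties):
    """Gain from each added property, in order of increasing length."""
    sizes = sorted(clicks_by_properties)
    return [round(gain(clicks_by_properties[a], clicks_by_properties[b]), 2)
            for a, b in zip(sizes, sizes[1:])]


for situation, clicks in CLICKS.items():
    print(situation, marginal_gains(clicks))
```

Under that assumption, the first added property in problematic situations saves 2.28 clicks and the next one only 0.46, while the added property in nonproblematic situations saves 0.19. This is one way to read the “diminishing returns” mentioned earlier.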
Conclusion
• Overspec can have many reasons
(Jordan 2000, Horacek 2005)
• Overspec isn’t always equally necessary
• Focus on overspec for guiding “resolution”
• The optimum amount of overspec
is hard to determine
• But we have found a point of diminishing
returns, based on the need to avoid DE and LO.
Additional slides
[ A medical comparison
• A hospital with two types of patients, all of
whom have coughing (cf., clicking!) as
their main symptom
– chest infections (serious patients)
– throat infections (light patients)
• you can administer 1, 2, or 3 pills (cf.,
properties). But pills can be harmful, so
the doctor uses them sparingly
The doctor’s regime:
• light patients should get 1 pill
• serious patients should get 2 pills on a
normal night, and 3 pills on a bad night
Is this a wise regime?
Tests were done ...
Test of effectiveness of pills
1. Serious patients who get their 2 or 3 pills start
coughing less
2. Serious patients benefit more from getting their
prescribed high number of pills (as opposed to just 1)
than light patients
3. Focus on serious patients. Try giving the ones that are
having a good night 3 pills (i.e. one more than
prescribed). They benefit less (from getting 3 instead
of 2 pills) than the ones that are having a bad night
benefitted (from getting 3 instead of 2 pills).
]
Results on Search Behaviour
[Chart: number of subjects (axis 0 to 10) against number of deviations from Ancestral Search in the first navigation action (axis 0 to 12), for 12 documents with incomplete descriptions]