Segmentation for Efficient
Supervised Language Annotation
with an Explicit Cost-Utility Tradeoff
Matthias Sperber, Mirjam Simantzik, Graham Neubig,
Satoshi Nakamura, Alex Waibel
Karlsruhe Institute of Technology (Germany)
Mobile Technologies (Germany)
Nara Institute of Science and Technology (Japan)
Transactions of the Association for Computational Linguistics (TACL), April 2014
6/23/14
Institute for Anthropomatics, Department of Computer Science
User Study: Supervision efficiency drops for short segments!
[Figure: average supervision time per token [sec] (0–6) vs. segment length [# tokens] (1–19), for the transcription task and the word segmentation task]
Supervision Strategies
Example: Let’s manually correct an automatic speech transcript…
Reference: It was a bright cold day in April, and the clocks were striking thirteen.
Speech recognizer: It was a bright cold (they) in (apron), and (a) clocks were striking thirteen.

Previous approaches:
•  Sentence-based baseline: correcting whole sentences [Sperber+2013] wastes effort!
•  Word-based baseline: correcting individual words [Sanchez-Cortina+2012] causes frequent context switches → high cognitive overhead!

Proposed method: moderate-sized segments with a high percentage of errors!
Roadmap
1.  Segmentation Approach
2.  Predictive User Models
3.  User Study
Our approach to efficient supervised language annotation
Given some language corpus…
1.  Predict the utility and cost of supervising any given segment
•  according to supervision mode (e.g. typing, respeaking, skipping)
2.  Define a constrained optimization goal
•  e.g. minimize annotation time to remove at least 100 errors
3.  Choose segments (locations/lengths & supervision modes) such that human supervision yields maximum efficiency according to (1) and (2)
User study:
•  Post-editing for speech transcription
•  Active learning for word segmentation
Optimization Framework
Constrained shortest path problem:
[Graph: nodes 1 (at) 2 (what’s) 3 a 4 bright 5 cold 6 … ; edges are candidate segments, each labeled with a supervision mode and predicted utility/cost, e.g. [SKIP:0/0], [RESPEAK:1.5/2], [TYPE:2/5]]
[TYPE:2/5] reads: “typing corrects ~2 words, needs ~5 seconds”
Optimization Framework
Constrained shortest path problem:
•  Define as integer linear program (ILP)
•  Use off-the-shelf ILP solver to find near-optimal solution
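The search over segmentations and supervision modes can be sketched in miniature. The following is a hypothetical dynamic program, not the paper’s ILP, that maximizes predicted utility under an integer time budget; the user model and all numbers are illustrative, not from the paper.

```python
from functools import lru_cache

def best_plan(n_tokens, predict, max_len, budget):
    """Choose a segmentation and a mode per segment maximizing utility
    subject to total predicted cost <= budget (integer time units).
    predict(i, j, mode) -> (utility, cost) for the segment [i, j)."""
    modes = ("TYPE", "RESPEAK", "SKIP")

    @lru_cache(maxsize=None)
    def dp(i, b):
        if i == n_tokens:
            return 0.0, ()
        best_u, best_p = float("-inf"), ()
        for j in range(i + 1, min(n_tokens, i + max_len) + 1):
            for m in modes:
                util, cost = predict(i, j, m)
                if cost <= b:
                    rest_u, rest_p = dp(j, b - cost)
                    if util + rest_u > best_u:
                        best_u, best_p = util + rest_u, ((i, j, m),) + rest_p
        return best_u, best_p

    return dp(0, budget)

# Toy user model (illustrative): typing fixes all predicted errors but
# is slow, respeaking is faster but misses some, skipping is free.
errors = [1, 1, 0, 0, 0, 1]  # predicted errors per token

def toy_predict(i, j, mode):
    e = sum(errors[i:j])
    if mode == "TYPE":
        return float(e), 2 * (j - i) + 1
    if mode == "RESPEAK":
        return 0.7 * e, (j - i) + 1
    return 0.0, 0  # SKIP

utility, plan = best_plan(len(errors), toy_predict, max_len=4, budget=8)
print(round(utility, 2))  # best here: TYPE the two error regions, SKIP the rest
```

The real problem adds constraints across criteria, which is why the paper uses an ILP solver instead of this single-budget recursion.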
Roadmap
1.  Segmentation Approach
2.  Predictive User Models. Here:
•  Cost model: supervision time
•  Utility models:
   •  # removed errors (post-editing setting)
   •  Model improvement (active-learning setting)
3.  User Study
Predicting supervision time
Input features (segment length, segment confidence, audio duration, …) → regressor (trained via annotator enrollment) → predicted supervision time
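A minimal sketch of this step, assuming a plain linear regressor (the paper does not commit to this exact model here); the enrollment data below is synthetic and the feature weights are invented for illustration:

```python
import numpy as np

# Synthetic enrollment data: one row per supervised segment ->
# [segment length, segment confidence, audio duration].
X = np.array([
    [3, 0.9, 1.2],
    [8, 0.6, 3.5],
    [5, 0.8, 2.0],
    [12, 0.4, 5.1],
    [7, 0.3, 2.9],
])
# Observed supervision times in seconds, generated here from a known
# linear rule (0.8*len - 2*conf + 1.5*dur + 1) so the fit is exact.
y = 0.8 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + 1.0

A = np.c_[X, np.ones(len(X))]            # add an intercept column
w, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares

def predict_time(length, confidence, duration):
    """Predicted supervision time [sec] for a candidate segment."""
    return float(np.dot(w, [length, confidence, duration, 1.0]))
```

In the actual setup these predictions feed the cost side of the segment graph, so the regressor is fit once per annotator during enrollment.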
Post-editing utility: # removed errors
# errors removed by human = (reliability of human annotator) × (# errors in automatic output)
•  # errors in automatic output ≈ Σ_{t ∈ tokens} uncertainty(t)
•  Reliability: derived from a fixed human error rate (e.g. from enrollment)
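A worked instance of this utility model; the uncertainty values are invented, and taking reliability as one minus the enrollment error rate is my assumption about how the fixed rate enters:

```python
# Expected errors removed = reliability * sum of per-token uncertainties,
# where each uncertainty approximates the token's error probability.

def errors_removed(uncertainties, human_error_rate):
    reliability = 1.0 - human_error_rate  # assumption: reliability = 1 - error rate
    return reliability * sum(uncertainties)

# Tokens of "It was a bright cold (they) in (apron)", with high
# uncertainty on the misrecognized words (illustrative values).
unc = [0.05, 0.05, 0.02, 0.1, 0.1, 0.9, 0.1, 0.8]
print(errors_removed(unc, human_error_rate=0.1))  # ~1.9 expected errors removed
```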
Active learning utility: model improvement
Token uncertainties u_1, u_2, …, u_n are rescaled to u'_1, u'_2, …, u'_n such that util(tok) ∝ model improvement.
Segment utility = sum of the scaled utilities of the segment’s tokens.
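In sketch form (the proportionality constant is my placeholder, not the paper’s estimator), the point is that after rescaling, segment utilities stay additive, which is what the graph-based segment search requires:

```python
# Rescale token uncertainties so a token's utility is proportional to
# the model improvement its label is expected to bring; a segment's
# utility is then just the sum over its tokens.

def scale_uncertainties(u, improvement_per_unit_uncertainty=0.01):
    # improvement_per_unit_uncertainty is an illustrative constant
    return [ui * improvement_per_unit_uncertainty for ui in u]

def segment_utility(scaled, i, j):
    """Utility of supervising the token span [i, j)."""
    return sum(scaled[i:j])

u = [0.2, 0.9, 0.1, 0.7]
scaled = scale_uncertainties(u)
```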
Roadmap
1.  Segmentation Approach
2.  Predictive User Models
3.  User Study
•  Post-editing task for speech transcription
•  Active-learning task for word segmentation
Experiment: Post-Editing for Speech Transcription
•  Optimization goal: given a faulty transcript, improve from 20% to 15% WER as fast as possible
•  2 supervision modes: TYPE, SKIP
•  3 participants corrected 2 TED transcripts (24 minutes of audio)
•  Preparative step: annotator enrollment (200 random segments)
Experiment: Post-Editing for Speech Transcription
•  Sentence-based, confidence-order baseline [Sperber+2013]:
   •  Fixed segmentation into sentence-like units (average length: 8.6 words)
   •  Order segments by average word confidence
   •  Choose top-n segments such that the user model predicts 15% WER after supervision
•  Proposed method:
   •  Directly optimize the segmentation according to the optimization goal
•  Participants transcribe 2 talks twice (once baseline, once proposed method)
   •  Switch order of methods after the 1st talk to minimize learning bias
User Interface
Simulation (w/ perfect time predictions & annotator accuracy)
[Figure: target WER after post-editing [%] (0–20) vs. post-editing time [min] (0–50). Sentence-based: linear baseline, confidence-order baseline. Explicitly segmented: proposed method, proposed w/ oracle utilities. Annotations at the 10% target WER mark 7 min and 10 min, highlighting the gap between the explicitly segmented methods and the sentence-based baselines.]
Real User Study

             Confidence baseline    Proposed method
Participant  WER      Time          WER      Time
P1           12.26    44:05         12.18    33:01
P2           12.75    36:19         12.77    29:54
P3           12.70    52:42         12.50    37:57
Avg.         12.57    44:22         12.48    33:37

25.2% speed-up!
Problem w/ WER prediction: reached ~12% WER, even though we targeted only 15%…
Word Segmentation Experiment: Summary
•  Active learning scenario for Japanese word segmentation
•  Optimize model improvement, s.t. a 30-minute time budget
•  Baseline: word-based uncertainty sampling [Neubig+2011]
[Figure: classification accuracy (0.955–0.965) vs. annotation time [min] (0–30); the proposed method stays above the baseline]
•  Improvement for expert annotator over baseline
•  Time prediction in absolute terms was inaccurate: training data mismatch
•  For non-experts (students), improvement less clear, due to high annotation error rate
Conclusion
Annotating sentences wastes effort:
✗ “It was a bright cold (they) in (apron), and (a) clocks were striking thirteen.”
Annotating tokens leads to frequent context switches:
✗ “It was a bright cold (they) in (apron), and (a) clocks were striking thirteen.”
Instead, we propose to use compact segments that yield optimal annotation efficiency according to cost and utility predictions:
✔ “It was a bright cold (they) in (apron), and (a) clocks were striking thirteen.”
•  Some future work…
   •  Larger-scale user study
   •  Improve user modeling
   •  Investigate other tasks such as translation
   •  …
Thank you for your attention! Any questions?
Formal Problem Definition
•  Corpus of words w1…wn, e.g. an ASR transcript
•  Set of “supervision modes”, e.g. K = {TYPE, RESPEAK, SKIP}
•  Set of optimization criteria, e.g. L = {supervision time, # of errors removed}
•  User models u_{l,k} predict the outcome of supervising any given segment w.r.t. supervision mode k and criterion l
   •  e.g., a user model would probably predict that:
      •  Long segments have higher cost than short segments
      •  TYPING is slow, but corrects many errors
      •  RESPEAKING is fast, but does not correct all errors
      •  SKIPPING a segment costs no time and corrects no errors
•  Specify a constrained optimization problem
   •  One criterion is the objective, the others are constraints
   •  e.g.: correct as many errors as possible, given a 2h time budget
→ Want to find the segmentation and supervision modes that optimize this objective!
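The specification above can be written down as a small data structure; the field names and the toy user models below are mine, not the paper’s:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Segment = Tuple[int, int]  # token span [i, j)

@dataclass
class SupervisionProblem:
    corpus: List[str]                  # w1..wn, e.g. an ASR transcript
    modes: List[str]                   # K, e.g. ["TYPE", "RESPEAK", "SKIP"]
    criteria: List[str]                # L, e.g. ["time", "errors_removed"]
    # user_models[criterion][mode](segment) -> predicted value (u_{l,k})
    user_models: Dict[str, Dict[str, Callable[[Segment], float]]]
    objective: str = "errors_removed"  # the criterion to optimize
    budgets: Dict[str, float] = field(default_factory=dict)  # constraints

# Toy instance: skipping is free and fixes nothing; typing costs
# 2 s/token and removes 0.2 errors/token (illustrative numbers).
problem = SupervisionProblem(
    corpus="it was a bright cold day".split(),
    modes=["TYPE", "SKIP"],
    criteria=["time", "errors_removed"],
    user_models={
        "time": {"TYPE": lambda s: 2.0 * (s[1] - s[0]),
                 "SKIP": lambda s: 0.0},
        "errors_removed": {"TYPE": lambda s: 0.2 * (s[1] - s[0]),
                           "SKIP": lambda s: 0.0},
    },
    budgets={"time": 7200.0},  # the "2h time budget" example
)
```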
Appendix: Integer Linear Program Formulation
Appendix: Example Segments
Appendix: Word Segmentation Experiment Details
Appendix: Possible Supervision Modes
•  Input modalities [Suhm et al., 2001]
•  Several annotators with different expertise and cost [Donmez & Carbonell, 2008]
•  Correct (post-edit) vs. annotate from scratch in MT [Specia, 2011]