Task and Workflow Design II
KSE 801
Uichin Lee
Contents
• Turkomatic: divide and conquer strategy for
performing more “challenging tasks” in MTurk
• TurKontrol: decision-theoretic approach for
work-flow control (e.g., how many
improve/vote tasks?)
• Turkalytics: monitoring workers’ behavior
remotely
Turkomatic: Automatic Recursive
Task and Workflow Design for
Mechanical Turk
CHI'11 WIP
Turkomatic
• Turkomatic interface accepts task
requests written in natural language
• Subdivide phase:
– For each request, it posts a HIT to MTurk, asking workers to break the task
down into a set of logical subtasks
– Each subtask is then automatically
reposted to MTurk; a subtask can be
further broken down
• Merge phase:
– Once all subtasks are completed, HITs
are posted asking workers to combine
subtask solutions into a coherent whole
• The end result will then be delivered to
the requester
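The subdivide and merge phases above can be sketched as a recursion. This is only an illustration, not Turkomatic's code: `run_task`, the three callables, and `max_depth` are hypothetical names, and the toy functions stand in for HITs that real workers would answer.

```python
from typing import Callable, List

# Hypothetical stand-ins for HITs answered by workers on MTurk.
Subdivide = Callable[[str], List[str]]    # returns [] if the task needs no split
Solve = Callable[[str], str]              # solves a simple task directly
Merge = Callable[[str, List[str]], str]   # combines subtask solutions


def run_task(task: str, subdivide: Subdivide, solve: Solve, merge: Merge,
             max_depth: int = 3) -> str:
    """Recursively subdivide a task, solve the leaves, and merge the results."""
    # Subdivide phase: ask a "worker" to break the task down (depth-limited).
    subtasks = subdivide(task) if max_depth > 0 else []
    if not subtasks:
        return solve(task)
    solutions = [run_task(s, subdivide, solve, merge, max_depth - 1)
                 for s in subtasks]
    # Merge phase: ask a "worker" to combine the subtask solutions.
    return merge(task, solutions)


# Toy workers (assumptions for the demo only).
def toy_subdivide(task: str) -> List[str]:
    parts = task.split(" and ")
    return parts if len(parts) > 1 else []


def toy_solve(task: str) -> str:
    return task.upper()


def toy_merge(task: str, solutions: List[str]) -> str:
    return "; ".join(solutions)


result = run_task("write intro and write body and write conclusion",
                  toy_subdivide, toy_solve, toy_merge)
print(result)
```

The depth limit is a design choice of this sketch; it simply keeps the recursion from running forever if workers keep splitting a task.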
Subdivide Phase
• Decomposition of tasks, and the creation of
solution elements
Divide and Merge
Evaluation
• Tasks:
– Producing a written essay in response to a prompt: “please
write a five-paragraph essay on the topic of your choice”
– Solving an example SAT test “Please solve the 16-question SAT
located at http://bit.ly/SATexam”
– Payment: $0.10 to $0.40 per HIT
• Each “subdivide” or “merge” HIT received answers within 4
hours; solutions to the initial task were completed within
72 hours
• Essay: the final essay (about “university legacy admissions”)
displayed a reasonably good understanding of a topic; yet
the writing quality is often mixed
• SAT: the task was divided into 12 subtasks (containing 1-3
questions); the score was 12/17
Decision-Theoretic Control of
Crowd-Sourced Workflows
Peng Dai, Mausam, Daniel S. Weld
AAAI 2010
Motivation
• Iterative workflow (i.e., improve and vote) used in
TurKit has the following problems:
– What is the optimal number of iterations?
– How many ballots (votes) should we use?
– How do answers change if the workers are more/less
skilled?
Iterative workflow
TurKontrol: Computation Model
• Text α is improved to text α’ (after improve task)
• Given a pair (α, α’), a series of votes $b_k$ can be collected
to judge which one is better
TurKontrol: Computation Model
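• Text α: quality density function $f_Q(q)$ (prior)
• A worker x takes an improvement job and
submits α’
• Text α’ done by worker x:
quality density function $f_{Q'|q,x}(q')$ (posterior)
• Quality density function of text α’, obtained by averaging over the
unknown quality q of α: $f_{Q'}(q') = \int f_{Q'|q,x}(q') \, f_Q(q) \, dq$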
TurKontrol: Computation Model
• Voting:
– A series of n votes: $b = b_1, b_2, \ldots, b_n$ where $b_i \in \{1, 0\}$
– Posterior densities after n votes: $f_{Q|b}(q)$ and $f_{Q'|b}(q')$
• Difficulty:
– The closer the two results are in quality, the more difficult they are to judge
– $d(q, q') = 1 - |q - q'|^M$, where M is a constant and $d \in [0, 1]$
• Accuracy (of a worker x)
– $a_x(d) = \frac{1}{2}[1 + (1 - d)^r]$, where r is a knob for controlling the accuracy distribution
– If the i-th worker $x_i$ has accuracy $a_{x_i}(d)$, the vote $b_i$ picks the better text with probability $a_{x_i}(d)$
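A minimal numeric sketch of the difficulty and accuracy formulas. The default values of M and r are assumptions chosen for illustration, not values from the paper.

```python
def difficulty(q: float, q_new: float, M: float = 0.5) -> float:
    """d(q, q') = 1 - |q - q'|^M; equals 1 when the two texts are equally good."""
    return 1.0 - abs(q - q_new) ** M


def accuracy(d: float, r: float = 1.0) -> float:
    """a_x(d) = 1/2 * [1 + (1 - d)^r]; ranges from 0.5 (a guess) to 1.0."""
    return 0.5 * (1.0 + (1.0 - d) ** r)


for q, q_new in [(0.5, 0.5), (0.25, 0.5), (0.0, 1.0)]:
    d = difficulty(q, q_new)
    print(f"q={q}, q'={q_new}: difficulty={d:.2f}, accuracy={accuracy(d):.2f}")
```

With identical qualities the vote is a coin flip (accuracy 0.5), and the accuracy grows toward 1 as the quality gap widens.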
TurKontrol: Computation Model
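• For a given pair (α, α’), the posterior densities of
(Q, Q’) are $f_{Q|b}(q)$ and $f_{Q'|b}(q')$
• By Bayes' rule: $f_{Q|b}(q) \propto P(b \mid q) \, f_Q(q)$,
where $P(b \mid q) = \int P(b \mid q, q') \, f_{Q'}(q') \, dq'$ (and symmetrically for Q’)
• Since we don’t know which worker will cast a vote, the accuracy of an
average worker is used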
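The posterior update can be sketched on a discretized quality grid. This is an illustrative approximation under stated assumptions: the grid, the values of M and r, the uniform prior, and the vote model (a vote for α’ is cast with probability a(d) when q’ > q, and 1 − a(d) otherwise) are all choices of this sketch.

```python
GRID = [0.1, 0.3, 0.5, 0.7, 0.9]   # discretized quality levels (assumption)


def vote_likelihood(b, q, q_new, M=0.5, r=1.0):
    """P(vote b | q, q') for an average worker; b = 1 means "α' is better"."""
    d = 1.0 - abs(q - q_new) ** M
    a = 0.5 * (1.0 + (1.0 - d) ** r)
    p_new = a if q_new > q else 1.0 - a
    return p_new if b == 1 else 1.0 - p_new


def update(joint, b):
    """One Bayesian update of the joint belief over (q, q') after vote b."""
    n = len(GRID)
    post = [[joint[i][j] * vote_likelihood(b, GRID[i], GRID[j])
             for j in range(n)] for i in range(n)]
    z = sum(map(sum, post))            # normalizing constant
    return [[p / z for p in row] for row in post]


def expected_qualities(joint):
    """Expected quality of the old text (rows) and the new text (columns)."""
    n = len(GRID)
    e_old = sum(GRID[i] * joint[i][j] for i in range(n) for j in range(n))
    e_new = sum(GRID[j] * joint[i][j] for i in range(n) for j in range(n))
    return e_old, e_new


n = len(GRID)
joint = [[1.0 / (n * n)] * n for _ in range(n)]   # uniform prior (assumption)
for b in (1, 1, 0):                                # two votes for α', one for α
    joint = update(joint, b)
e_old, e_new = expected_qualities(joint)
print(f"E[q] = {e_old:.3f}, E[q'] = {e_new:.3f}")
```

Because the majority of votes favors α’, the belief shifts so that the expected quality of α’ ends up above that of α.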
TurKontrol: Computation Model
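[Figure: text α (prior $f_Q(q)$) goes through an Improve task (cost c_imp) to produce α’ ($f_{Q'}(q')$); each Vote task (cost c_b) updates the posteriors $f_{Q|b}(q)$, $f_{Q'|b}(q')$ to $f_{Q|b+1}(q)$, $f_{Q'|b+1}(q')$. Inset: utility function, plotted as utility vs. quality]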
TurKontrol: Computation Model
• Utility estimation of a pair (α, α’), for (1) an improve task and
(2) a vote task
– (1) utility of an improve task
– (2) utility of a vote task
– Notation: U: utility function; c_b: vote cost; c_imp: improve cost
• Decision making:
– Three options: (a) vote, (b) improve, or (c) accept
– k-step lookahead: evaluate all sequences of k decisions,
and find the sub-sequence with the highest utility
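A one-step version of this decision can be sketched as follows. It compares only "accept" against "one more vote" (the improve option is left out for brevity), and everything numeric is an assumption: the quality grid, M and r, the uniform belief, and the exponential form of the convex utility.

```python
import math

GRID = [0.1, 0.3, 0.5, 0.7, 0.9]   # discretized quality levels (assumption)


def utility(q):
    """A convex utility with maximum 1000 at q = 1 (hypothetical form)."""
    return 1000.0 * (math.exp(q) - 1.0) / (math.e - 1.0)


def p_vote_new(q, q_new, M=0.5, r=1.0):
    """P(an average worker votes for α' | q, q')."""
    d = 1.0 - abs(q - q_new) ** M
    a = 0.5 * (1.0 + (1.0 - d) ** r)
    return a if q_new > q else 1.0 - a


def accept_value(joint):
    """Expected utility of accepting whichever text currently looks better."""
    n = len(GRID)
    u_old = sum(joint[i][j] * utility(GRID[i]) for i in range(n) for j in range(n))
    u_new = sum(joint[i][j] * utility(GRID[j]) for i in range(n) for j in range(n))
    return max(u_old, u_new)


def vote_value(joint, c_b=10.0):
    """Expected utility of buying one more vote and then accepting."""
    n = len(GRID)
    total = 0.0
    for b in (1, 0):
        weighted = [[joint[i][j] * (p_vote_new(GRID[i], GRID[j]) if b == 1
                                    else 1.0 - p_vote_new(GRID[i], GRID[j]))
                     for j in range(n)] for i in range(n)]
        p_b = sum(map(sum, weighted))              # probability of this outcome
        if p_b > 0:
            post = [[w / p_b for w in row] for row in weighted]
            total += p_b * accept_value(post)
    return total - c_b


def decide(joint, c_b=10.0):
    return "vote" if vote_value(joint, c_b) > accept_value(joint) else "accept"


n = len(GRID)
joint = [[1.0 / (n * n)] * n for _ in range(n)]   # uniform belief (assumption)
print(decide(joint), accept_value(joint), vote_value(joint))
```

A k-step lookahead would apply the same expectation recursively over sequences of k decisions; the one-step case is shown because it already exposes the trade-off between the information a vote brings and its cost.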
Numerical Results
• Convex utility function with max 1000
• Fixed cost (improve, vote) = (30, 10)
• Net utility: utility of submitted artifact − payment to workers
• TurKit: performs as many iterations as possible (max allowance 400)
• TurKontrol (2): lookahead of 2
cf: accuracy of workers $a_x(d) = \frac{1}{2}[1 + (1 - d)^r]$
Turkalytics: Real-time Analytics
for Human Computation
Paul Heymann and Hector Garcia-Molina
WWW'11
Basic Buyer human programming
• A human program generates forms; advertised through a marketplace.
• Workers look at posts, and then complete the forms for compensation.
Game Maker human programming
• The programmer writes a human program and a game.
• The game implements features to make it fun and difficult to cheat.
• The human program loads and dumps data from the game.
Human Processing programming
• Task description:
– Input, output, web forms, human driver, other
information
– Human task instance
• Human drivers: interact with workers
– Functions: initialization (forms, games), retrieving results
– “Human Program” accesses workers via “human drivers”
• Recruiters: post task instances into the marketplaces,
(by working with marketplace drivers)
– Marketplace driver provides an interface to marketplaces
Turkalytics
• Challenge: collecting reliable data about the
workers and the tasks they perform
• Why?
– If a task is not being completed, is it because no
workers are seeing it? Is it because the task is
currently being offered at too low a price?
– How does the task completion time break down?
– Do workers spend more time previewing tasks or
doing them?
– Do they take long breaks?
– Which are the more “reliable” workers?
Interaction Model
• Search-Preview-Accept (SPA) model
Interaction Model
• Search-Continue-RapidAccept-Accept-Preview (SCRAP)
– Continue: continue completing a task that was accepted but
not submitted
– RapidAccept: accept the next task in a HITGroup
w/o previewing it
Turkalytics Data Models
Turkalytics Architecture
[Figure: client-side JavaScript (ta.js) sends log messages (JSON) via Ajax POST to the Log Server; the Analysis Server obtains the log messages from the Log Server]
Implementation: client-side Javascript
• Requester embeds a Turkalytics script (ta.js) into
a HIT (when designing a HIT)
– Monitoring: Detect relevant worker data and actions.
– Sending: Log events by making image requests to the
log server (ajax: POST)
Implementation:
ta.js -- client-side JavaScript
• ta.js’s monitoring activities:
– Client Information: Worker’s screen resolution? What
plugins are supported? Can ta.js set cookies?
– DOM Events: Over the course of a page view, the
browser emits various events (e.g., load, submit,
beforeunload, and unload events)
– Activity: listens on a second-by-second basis for the
mousemove, scroll and keydown events to determine
if the worker is active or inactive.
– Form Contents: examines forms on the page and their
contents; logs initial form contents, incremental
updates, and final state.
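The second-by-second activity check can be illustrated offline. The real ta.js runs in the browser; this sketch only shows the counting logic on a recorded list of events, and the function name and the (timestamp, event type) input format are assumptions.

```python
ACTIVITY_EVENTS = {"mousemove", "scroll", "keydown"}


def activity_summary(events, start, end):
    """Return (active_seconds, total_seconds) for a page view.

    `events` is a list of (timestamp_in_seconds, event_type) pairs. A
    one-second slot counts as active if it contains at least one
    mousemove, scroll, or keydown event.
    """
    active_slots = {
        int(t - start)
        for t, kind in events
        if kind in ACTIVITY_EVENTS and start <= t < end
    }
    return len(active_slots), int(end - start)


events = [
    (0.2, "mousemove"),
    (0.7, "keydown"),     # same one-second slot as the mousemove
    (3.1, "scroll"),
    (3.9, "click"),       # not one of the monitored activity events
    (5.0, "load"),        # DOM event, not worker activity
]
active, total = activity_summary(events, start=0.0, end=10.0)
print(active, total)
```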
Implementation: log/analysis
• Log Server:
– Simple web app built on Google’s App Engine.
– Receives logging events from clients running ta.js and
saves them to a data store.
• IP address, user agent, and referer, etc
• Analysis Server:
– Periodically polls the log server to download any new
events that have been received
– Each event is inserted into the DB, taking the following into account:
• Time constraints: data availability to the analysis server
• Dependencies: events may depend on one another
• Incomplete input: not all events may have been received yet
• Unknown input: unexpected input may be received
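One possible way to handle dependencies, incomplete input, and unknown input during insertion is sketched below. This is not the Turkalytics implementation: the `Ingestor` class, the event type names, and the `id` / `depends_on` fields are hypothetical.

```python
KNOWN_TYPES = {"client_info", "dom_event", "activity", "form"}  # hypothetical


class Ingestor:
    """Inserts events into an in-memory "DB", tolerating out-of-order arrival."""

    def __init__(self):
        self.db = {}        # event id -> event
        self.pending = {}   # missing parent id -> events waiting for it
        self.unknown = []   # events of unexpected type, kept for inspection

    def insert(self, event):
        # Unknown input: set aside instead of failing.
        if event.get("type") not in KNOWN_TYPES:
            self.unknown.append(event)
            return
        # Dependencies / incomplete input: wait until the parent has arrived.
        parent = event.get("depends_on")
        if parent is not None and parent not in self.db:
            self.pending.setdefault(parent, []).append(event)
            return
        self.db[event["id"]] = event
        # Release any events that were waiting for this one.
        for child in self.pending.pop(event["id"], []):
            self.insert(child)


ing = Ingestor()
ing.insert({"id": 2, "type": "activity", "depends_on": 1})   # arrives early
ing.insert({"id": 3, "type": "mystery"})                      # unexpected type
ing.insert({"id": 1, "type": "dom_event"})                    # parent arrives
print(list(ing.db), len(ing.unknown), ing.pending)
```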
Implementation: analysis
[Figure: example log message with annotated fields: the type of data (event) sent, the actual data for the given type, detailed info about the task, and the session ID]
Experiments
• Tasks:
– Named Entity Recognition (NER): This task, posted in groups of
200 by a researcher in Natural Language Processing, asks
workers to label words in a Wikipedia article if they correspond
to people, organizations, locations, or demonyms. (2,000 HITs, 1
HIT Type, more than 500 workers.)
– Turker Count (TC): This task, posted once a week by a professor
of business at U.C. Berkeley, asks workers to push a button, and
is designed just to gauge how many workers are present in the
marketplace. (2 HITs, 1 HIT Type, more than 1,000 workers
each.)
– Create Diagram (CD): This task, posted by the authors, asked
workers to draw diagrams for this paper based on hand drawn
sketches
Experiments: origin of workers
• GeoLite City DB from MaxMind to geolocate
all remote users by IP address
Experiments: worker characteristics
Experiments: states/actions
• RapidAccept is quite popular (Continue is rare)
Experiments: # previews
• Artificial recency for NER/CD (the tasks are kept near the top of the task list):
NER and CD exhibit a less severe drop in previews than TC
Experiments: activity vs. delay
• Average active and total seconds for each worker
who completed the NER task (correlation 0.88)
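The correlation between active and total seconds is a Pearson coefficient, which can be computed as below. The per-worker numbers are made up for illustration and are not data from the experiments.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-worker averages (seconds); NOT the experiment's data.
active_seconds = [40, 55, 70, 120, 200]
total_seconds = [60, 90, 95, 210, 260]
print(round(pearson(active_seconds, total_seconds), 2))
```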
Discussion
• Multi-tasking users? Activity vs. working time
• Privacy??
– We can collect as much data as we want
– How about Google Analytics? Any web pages that we visit
can collect such information…
• False data injection?
• How can we better utilize the dataset?
– Re-designing existing tasks, pricing, etc. (or mining user
behavior?)
Summary
• Turkomatic: divide and conquer strategy for
performing more “challenging tasks” in MTurk
• TurKontrol: decision-theoretic approach for
work-flow control (e.g., how many
improve/vote tasks?)
• Turkalytics: monitoring workers’ behavior
remotely