slides

Loren Terveen
Computer Science & Engineering
The University of Minnesota
August 2011
 Theory
 Simulation
 Lab studies
 Surveys
 Qualitative studies
 Build and learn
 (e.g., Google, Facebook, Wikipedia)
 Build To Learn
GroupLens Research
• Create new interaction / social computing techniques
• Do empirical, quantitative research
• Learn from what we and others build
 Data
 Experimental Control
1. Learning from others’ data
2. Learning from our own data
3. Exercising experimental control
 Q&A systems
 Wikipedia
WP:Clubhouse? An Exploration of Wikipedia’s Gender Imbalance. Lam, S.K., Uduwage, A., Dong, Z., Sen, S., Musicant, D.R., Terveen, L., Riedl, J. WikiSym 2011.
NICE: Social translucence through UI intervention. Halfaker, A., Song, B., Stuart, D.A., Kittur, A., Riedl, J. WikiSym 2011.
Don't bite the Newbies: How Reverts Affect the Quantity and Quality of Wikipedia Work. Halfaker, A., Kittur, A., Riedl, J. WikiSym 2011.
Mentoring in Wikipedia: A Clash of Cultures. Musicant, D., Ren, Y., Johnson, J., Riedl, J. WikiSym 2011.
The Effects of Group Composition on Decision Quality in a Social Production Community. Lam, S.K., Karim, J., Riedl, J. Group 2010.
The Effects of Diversity on Group Productivity and Member Withdrawal in Online Volunteer Groups. Chen, J., Ren, Y., Riedl, J. CHI 2010.
rv you're dumb: Identifying Discarded Work in Wiki Article History. Ekstrand, M.D., Riedl, J.T. WikiSym 2009.
A Jury of Your Peers: Quality, Experience and Ownership in Wikipedia. Halfaker, A., Kittur, N., Kraut, R., Riedl, J. WikiSym 2009.
Is Wikipedia Growing a Longer Tail? Lam, S.K., Riedl, J. Group 2009.
Wikipedians are born, not made: a study of power editors on Wikipedia. Panciera, K., Halfaker, A., Terveen, L. Group 2009.
SuggestBot: Using Intelligent Task Routing to Help People Find Work in Wikipedia. Cosley, D., Frankowski, D., Terveen, L., Riedl, J. IUI 2007.
Creating, Destroying, and Restoring Value in Wikipedia. Priedhorsky, R., Chen, J., Lam, S.K., Panciera, K., Terveen, L., Riedl, J. Group 2007.
 WP:Clubhouse? An Exploration of
Wikipedia’s Gender Imbalance.
Lam, S.K., Uduwage, A., Dong, Z.,
Sen, S., Musicant, D.R., Terveen, L.,
Riedl, J.
 www.grouplens.org/node/466
 http://www.nytimes.com/2011/01/31/business/media/31link.html?_r=1&src=busln
 A topic generally restricted to teenage girls, like
friendship bracelets, can seem short at four paragraphs
when compared with lengthy articles on something boys
might favor, like, toy soldiers or baseball cards, whose
voluminous entry includes a detailed chronological
history of the subject.
 (BTW, it’s not about the friendship bracelets)
 Only 16% of new editors joining Wikipedia during
2009 identified themselves as women
 Women made only 9% of the edits by this cohort
 New women editors are more likely to stop editing and
leave Wikipedia when their edits are reverted
 Topics of particular interest to women appear to get
less (and poorer) coverage in Wikipedia
 (Hmm… maybe Wikipedia has a low collective IQ!)
 Come to Wikisym to get the details!
 MovieLens
 Cyclopath
 How do contributors to open content systems become
contributors?
 Inspired by…
 Wikipedians fill different niches than non-Wikipedians
 Wikipedians branch out to new areas and topics as
they mature
 Wikipedians take on more “community work” as they
mature

Qualitative study with nine participants self-reporting
Evidence for “becoming”?
Quantity of work
Quality of work
Nature of work
Are Wikipedians Born or Made?
A registered editor with 250+ edits over
his/her lifetime
If editors reach 250 edits within our data set,
they are labeled Wikipedian from the beginning
English Wikipedia dump (January 13, 2008)
Edits from bots and other non-human means
removed
We counted:
Only registered editors
Wikipedians (users with 250+ edits) - 38K
Non-Wikipedians - random sample of 38K
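The labeling rule above (registered editors only, bots removed, 250+ lifetime edits means "Wikipedian", plus an equal-sized random sample of everyone else) can be sketched roughly as follows. The input format, function names, and fixed random seed are assumptions for illustration, not the study's actual pipeline.

```python
import random
from collections import Counter

WIKIPEDIAN_THRESHOLD = 250  # from the slide: 250+ edits over the editor's lifetime


def label_editors(edits, bots=frozenset(), threshold=WIKIPEDIAN_THRESHOLD):
    """Split editors into Wikipedians and non-Wikipedians.

    edits: iterable of (editor_name, is_registered) pairs, one per edit
           (hypothetical format; a real dump needs parsing first).
    bots:  names of bot accounts whose edits are ignored.
    """
    counts = Counter(
        name for name, is_registered in edits
        if is_registered and name not in bots
    )
    wikipedians = {name for name, n in counts.items() if n >= threshold}
    non_wikipedians = set(counts) - wikipedians
    return wikipedians, non_wikipedians


def matched_sample(non_wikipedians, size, seed=0):
    """Draw a random comparison sample of non-Wikipedians of the given size."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    pool = sorted(non_wikipedians)
    return set(rng.sample(pool, min(size, len(pool))))
```

Because the label depends only on the lifetime total, an editor who eventually reaches the threshold counts as a Wikipedian from the first edit, which is what lets early behavior be compared across the two groups.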
[Chart: edits per day per editor; labels: “User days”, “Day 1”]
Wikipedians are
Born
Made
 Is a user’s fate sealed?
 Measure: Persistent Word Revisions (PWRs)
 Proportion of words added that persist five revisions
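A rough sketch of the measure described above: the share of words an edit adds that are still present five revisions later. It assumes each revision is a plain list of words and compares bags of words rather than word positions, so it is a simplification and may differ from the metric as computed in the study. The function name is hypothetical.

```python
from collections import Counter


def persistent_word_proportion(revisions, i, window=5):
    """Fraction of the words added in revisions[i] that survive `window` later revisions.

    revisions: list of revisions in order, each a list of words (assumed format).
    Returns None if the edit adds no words or too few later revisions exist.
    """
    if i < 1 or i + window >= len(revisions):
        return None
    before = Counter(revisions[i - 1])
    # Words (with multiplicity) that revision i added on top of the previous one.
    surviving = Counter(revisions[i]) - before
    total = sum(surviving.values())
    if total == 0:
        return None
    for later in revisions[i + 1 : i + window + 1]:
        # Keep only added words still present beyond the pre-edit baseline.
        surviving &= Counter(later) - before
    return sum(surviving.values()) / total
```

Averaging this value over an editor's edits gives one per-editor quality score that can be tracked over time.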
Wikipedians are
Born
Made
 Other quality metrics?
 Conjecture: Wikipedians take on community
maintenance work over time
 Several ways to formalize
 Editing in “talk” (and other) namespaces
 (Nope: still “born”)
 Referring to “community norms” (Wikipedia policies) to
explain edits
Wikipedians are
Born
Made
 Learning norms vs. learning to appeal to the norms?
 Training: effective editing
Common pattern: Initial burst of activity, decline, steady state
 Wikipedians look different from day one
 Little evidence for “Becoming Wikipedian”: Wikipedians are
born, not made




Can we reconcile?
This is depressing!
Possible responses:
Early interventions
Change the culture
Systemic initiatives, e.g., APS Wikipedia Initiative:
http://www.psychologicalscience.org/index.php/members/apswikipedia-initiative
 Accept the reality of the long tail



 We can’t ask Wikipedia users about our
interpretations
 What if the learning happened before users
registered?
 As of September 2009, we identified:
 1172 “unambiguous” users
 268 of these users made some edits
 440 “ambiguous” users
 For unambiguous users
 Day 1 = First time a user came to the site (not the day
they registered)
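As a small illustration of this alignment, the sketch below counts a user's edits by "user day", where Day 1 is the first visit rather than the registration date. The function name and the use of calendar dates as input are assumptions.

```python
from collections import Counter
from datetime import date


def edits_per_user_day(first_visit, edit_dates):
    """Map user-day number (1 = day of first visit) to the number of edits that day.

    first_visit: date of the user's first visit to the site.
    edit_dates:  iterable of dates, one per edit (assumed format).
    """
    counts = Counter()
    for d in edit_dates:
        offset = (d - first_visit).days
        if offset >= 0:  # ignore anything recorded before the first visit
            counts[offset + 1] += 1
    return counts
```

Averaging these counts across users, day by day, would give an edits-per-day curve comparable to the Wikipedia one.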
Same pattern as
for Wikipedia
[Chart: # of users (0-300), Do Not Edit vs. Do Edit]
[Chart: # of users (0-800); axis bins include <= 5 min., <= 15, <= 30, <= 60 and 0, 1-50, 51-100, 101-250, 251-500, 501-1000, 1001+; annotation: “A minute or two”]
“Born, Not Made”
still seems true
 Cyclopath user surveys – Wikisym 2011 paper
 Why these patterns?
 What ‘triggers’ initial contribution?
 And how might we nurture ongoing participation?
 Cyclopath contextual interviews
 planned
 Motivating participation: How can we get more
work done in open content systems?
 Idea: match users with tasks they’re likely to be
interested in and capable of doing
 Requirements:
Introduce task-matching algorithms/interfaces
Assign users to different conditions
Gather data necessary for evaluation
Survey users
Intelligent Task Routing
Goals: Get work done; Nurture new users; Serve community
Tools: Recommender algorithms; Interaction design
Theory: Collective Effort Model; Social Influence
MovieLens
Task: Edit movie content
Strategies (with theory-based rationale):
High Pred: Pick movies the system thinks the user will really like (individual value of outcomes)
Rare Rated: Pick movies the user has rated that few others have (lower effort for a given performance)
Needs Work: Pick movies that are missing the most information (contribution matters to group)
Random: Pick random movies (baseline)
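The four task-selection strategies can be sketched roughly as below. The data model (dictionaries with `predictions`, `rated`, `num_ratings`, `missing_fields`) and the round-robin assignment of users to conditions are assumptions for illustration; the deployed MovieLens code and the study's assignment procedure are not shown in the slides.

```python
import random


def high_pred(user, movies, k):
    """Movies the system predicts the user will like most."""
    return sorted(movies, key=lambda m: user["predictions"].get(m["id"], 0.0),
                  reverse=True)[:k]


def rare_rated(user, movies, k):
    """Movies the user has rated that few others have rated."""
    rated = [m for m in movies if m["id"] in user["rated"]]
    return sorted(rated, key=lambda m: m["num_ratings"])[:k]


def needs_work(user, movies, k):
    """Movies missing the most information."""
    return sorted(movies, key=lambda m: m["missing_fields"], reverse=True)[:k]


def random_pick(user, movies, k, seed=0):
    """Baseline: random movies (fixed seed is an assumption, for repeatability)."""
    return random.Random(seed).sample(movies, min(k, len(movies)))


STRATEGIES = {
    "HighPred": high_pred,
    "RareRated": rare_rated,
    "NeedsWork": needs_work,
    "Random": random_pick,
}


def assign_condition(user_id):
    """Assign a user to one of the four groups (round-robin, an assumption)."""
    names = sorted(STRATEGIES)
    return names[user_id % len(names)]


def route_tasks(user_id, user, movies, k=5):
    """Pick k movies for the user to edit, using the strategy of their group."""
    return STRATEGIES[assign_condition(user_id)](user, movies, k)
```

Keeping every strategy behind the same `(user, movies, k)` signature makes it easy to compare conditions on the same counts: editors, contributions, and fields filled in.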
 Assign MovieLens users to four groups, one per algorithm
 About 2,000 subjects, 200 contributors
 Count # editors, contributions, fields
Editing behavior by strategy
[Bar chart: count (0-250) by metric (number of editors, number of edits, fields filled in) for HighPred, RareRated, NeedsWork, Random]
Rare rated: dominant
Needs work: bang for buck
Random: not bad here
High prediction: lousy
 Task matching worked
 Familiarity of user with task was most helpful
 Reduces effort
 Increases value
 Note: we’ve tried this approach in Wikipedia and
Cyclopath, too
 Different issues
 Generality
 MovieLens
 14 years of continuous development
 Several complete software architecture / UI redos (and
another needed!)
 1 full-time software engineer
 Much graduate student time over the years
 ~140K lines of code, in multiple languages
 1 full-time software engineer
 Grad students: expectation they will spend 25-30% of
their time on ‘development’ tasks
 Looming tasks:
 UI redesign / reimplementation
 Expanding geographic coverage
 Significant resources devoted to development
 But: typically enables new experiments and/or builds
the user community
 And: funding for these resources often came only due
to the success of the system/community
 Fewer papers
 But: papers of a type that would be impossible
otherwise
 We can investigate questions in different settings,
applying different methods: cumulative science
 Cycloplan (in collab. with Metropolitan Council)
 Planners can develop ideas informed by usage data (“What if
I add a trail here?”)
 Planners can share plans with public
 Public can explore plans, give feedback (“How much would
my route be improved with this trail?”)
 Public can share concerns directly to relevant officials
 Participatory Crowdsourcing (in collab. with IBM)
 Citizens as sensors
 Continua of participation; incentives
 Models for participation in open content systems
 Roles, privileges, processes: Nupedia vs. Wikipedia
 Models for volunteer participation
 Initial vs. ongoing
 http://www.grouplens.org/biblio
 The GroupLens Research Group, particularly:
John Riedl
Joe Konstan
Reid Priedhorsky
Dan Cosley
Katie Panciera
 And:
 Tom Erickson, IBM
 Me:
 [email protected]
 Twitter: @lorenterveen