Encoding wikipage to PML

Investigations into Trust for
Collaborative Information Repositories:
A Wikipedia Case Study
Deborah L. McGuinness,
Co-Director Knowledge Systems,
Artificial Intelligence Lab, Stanford University
[email protected]
Joint work with:
Honglei Zeng, Paulo Pinheiro da Silva, Li Ding,
Dhyanesh Narayanan, and Mayukh Bhaowal
Big Picture

Research theme
Make question answering systems more operational to users (agents/humans) by providing explanations for answers…
In many settings, explanations require some notion of trust in information and/or sources
May 22, 2006
MTW - McGuinness
Trust is a Critical Emerging Component in
Social Collaborative Information Spaces

Goal: Allow users to access, view, and analyze information informed by
trust ratings. This enables users (and agents) to:
 Assess the trustworthiness of documents that are collaboratively created
and updated
 Monitor the changes in trustworthiness of dynamic documents and
provide timely notifications of possible malicious content modification
 Identify trustworthy information with visualization tools
 Access shareable trust information among heterogeneous systems
 Enable new design paradigms for Wikis with built-in trust components
– e.g., target text analytic tools at more trustworthy documents or
document fragments within a larger resource such as Wikipedia
Some Issues Relevant to Collaborative
Information Repositories/Wikis and Trust



Revisions: a key characteristic of Wikis
 Some social collaborative spaces, such as Wikis, allow (and sometimes
promote) updates to posts from others. Note that this differs from
traditional bulletin boards, archived mailing lists, etc. that only support
revision by way of follow-up posts
Rating-based systems
 Some web systems support and encourage explicit ratings of
contributors and contributions
 Wikis have no explicit trust encoding support
 Simple rating schemes may not work (e.g., an article rated trustworthy
may no longer be trustworthy once it is modified)
We are exploring computational approaches to trust exploiting prominent
Wiki features including:
 Citation-based trust approach (Wiki articles are interlinked via
citations/hyperlinks)
 Revision-history based trust approach
Terms

Concepts
 Article
 Version (of an article)
 Fragment
 Author
Relations
 An article may have multiple versions, each of which reflects the
modification made by an author on a previous version
 A version can be split into multiple fragments, each of which is
entirely contributed by a single author
[Diagram: Article hasVersion:[1,n] Version; Version hasAuthor:[1,m] Author;
Version hasFragment:[1,p] Fragment; Fragment hasAuthor:[1,1] Author]
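The concepts and cardinalities above could be modeled roughly as follows. This is only an illustrative sketch: the class and field names are hypothetical, and deriving a version's author list from its fragments is an assumption, not something the slides specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Author:
    name: str


@dataclass
class Fragment:
    text: str
    author: Author  # hasAuthor:[1,1], exactly one author per fragment


@dataclass
class Version:
    fragments: List[Fragment]  # hasFragment:[1,p]

    def __post_init__(self) -> None:
        if not self.fragments:
            raise ValueError("a version needs at least one fragment")

    @property
    def authors(self) -> List[Author]:
        # hasAuthor:[1,m], derived here from the fragments (an assumption)
        unique: List[Author] = []
        for fragment in self.fragments:
            if fragment.author not in unique:
                unique.append(fragment.author)
        return unique


@dataclass
class Article:
    title: str
    versions: List[Version]  # hasVersion:[1,n]

    def __post_init__(self) -> None:
        if not self.versions:
            raise ValueError("an article needs at least one version")


# Two versions of an article, the second adding a fragment by another author.
trovatore = Author("Trovatore")
anonymous = Author("130.94.162.64")
first = Fragment("Natural number can mean either a positive integer or a "
                 "non-negative integer", trovatore)
second = Fragment("The former definition is generally used in number theory, "
                  "while the latter is preferred in set theory.", anonymous)
article = Article("Natural number", [Version([first]), Version([first, second])])
```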

Citation-based Trust


Derive trust based on the citation
relationships among articles
 For example, a well-cited article may
be more trustworthy than an article
that has no citations
In the same family as the well known
(Google) PageRank.
Link-ratio Algorithm



Link-ratio of an article (i.e., the page with title x): the ratio
between the number of citation occurrences of the
encyclopedia term x and the total number of occurrences of x
(citations and non-citations).
 For example, “Seattle” appears 3855 times in Wikipedia,
1408 of which are citations (the other mentions are not hyperlinked).
The link-ratio value of “Seattle” is 1408/3855 ≈ 0.365.
Generally speaking, the higher the link-ratio value of an article,
the more trustworthy the article is.
Issue: there may be no incentive to link to an encyclopedia entry
(e.g. the “love” article vs. the “Gauss's law” article)
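The link-ratio definition above can be sketched in a few lines. The function names are hypothetical, and treating only the plain `[[term]]` wiki-link form as a citation is a simplifying assumption (piped links, redirects and case variants are ignored).

```python
import re
from typing import Tuple


def link_ratio(citation_count: int, total_count: int) -> float:
    """Ratio of citation occurrences of a term to all of its occurrences."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    if not 0 <= citation_count <= total_count:
        raise ValueError("citation_count must lie between 0 and total_count")
    return citation_count / total_count


def count_mentions(term: str, wikitext: str) -> Tuple[int, int]:
    """Return (citations, total occurrences) of term in wiki markup.

    Assumption: a citation is written exactly as [[term]]. Each such link
    contains the term once, so it is also counted in the total.
    """
    escaped = re.escape(term)
    citations = len(re.findall(r"\[\[" + escaped + r"\]\]", wikitext))
    total = len(re.findall(r"\b" + escaped + r"\b", wikitext))
    return citations, total


# The "Seattle" figures quoted on the slide.
seattle_ratio = link_ratio(1408, 3855)

# A tiny made-up text: one link and two plain mentions.
sample = "[[Seattle]] is a city. Seattle has rain. Visit Seattle."
sample_counts = count_mentions("Seattle", sample)
```

Keeping the counting step separate from the ratio makes it easy to swap in a different notion of "citation" without touching the ratio itself.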
Revision History-based Trust (an example of
the “natural number” article in Wikipedia)
v0: Oct 7, 2005 (Trovatore isAuthorOf)
Natural number can mean either a positive integer (1, 2, 3, ...) or a
non-negative integer (0, 1, 2, 3, …)
Content insertion
v1: Dec 1, 2005 (130.94.162.64 isAuthorOf)
The former definition is generally used in number theory, while the
latter is preferred in set theory.
When 130.94.162.64 (an anonymous author) inserted new content into the
“natural number” page, originated by Trovatore, there could be an assumption
of implicit trust in the original document fragment(s).
Deriving Trust from Revision History

Revision operations (insertion, deletion, modification) imply
trust.
 The trustworthiness of the revised article depends on the
trustworthiness of the previous version, the author of the
last revision, and the amount of text involved in the last
revision.

Revision history is widely available in cooperative information
systems:
 Collaborative Software Development (CVS)
 Cooperative Document Authoring (Wikipedia)
A formulation of Revision Trust


(Assumption) The trustworthiness of a new article fragment is
(only) dependent on its author.
(Assumption) The trustworthy content of a revised fragment f'
is the trustworthy content of the previous fragment f minus
the trustworthy content that the author a removed from f (e.g.,
a fragment f could be more trustworthy if the deletion made by
a has removed inaccuracies in f)
$$ t_{f'} = \frac{t_f \, |f| - (1 - t_a) \, |D|}{|f'|} $$
t_f, t_{f'}, t_a are the trust values of f, f' and a respectively; |f|, |f'|
and |D| are the sizes of f, f' and D (D is the deleted text).
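As a worked illustration of the deletion case, the sketch below computes the revised trust value as the trustworthy content of f, minus the trustworthy content removed by author a, divided by the size of f'. The function and parameter names are hypothetical, and the example numbers are invented for illustration.

```python
def revised_fragment_trust(t_f: float, size_f: float, t_a: float,
                           size_deleted: float, size_f_new: float) -> float:
    """Trust of the revised fragment f' after author a deletes text D from f.

    t_f          trust value of the previous fragment f
    size_f       |f|, size of the previous fragment
    t_a          trust value of the revising author a
    size_deleted |D|, size of the deleted text
    size_f_new   |f'|, size of the revised fragment
    """
    if size_f_new <= 0:
        raise ValueError("the revised fragment must be non-empty")
    trustworthy_before = t_f * size_f
    # A fully trusted author (t_a = 1) is taken to remove no trustworthy content.
    trustworthy_removed = (1.0 - t_a) * size_deleted
    return (trustworthy_before - trustworthy_removed) / size_f_new


# Made-up example: a 100-unit fragment with trust 0.5; an author with trust 0.9
# deletes 20 units, leaving 80. Result: (50 - 2) / 80, about 0.6.
example_trust = revised_fragment_trust(0.5, 100, 0.9, 20, 80)
```

The sketch does not clamp its result. For some inputs (for example t_f = 1 and t_a = 1 with a non-empty deletion) the raw expression exceeds 1, so a real implementation would presumably need to bound or normalize it.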
Inference Web and PML






Inference Web is an infrastructure for providing explanations of results
from web applications. It provides tools such as browsers, abstractors,
checkers, summarizers, combiners to manipulate and present justifications.
Proof Markup Language (PML) is the interlingua representation language for
Inference Web. It is designed to encode the information agents may need in
order to evaluate results, including where information came from and how it
was manipulated.
PML has an OWL encoding (and XML serialization)
PML can be (and has been) used to represent justifications of information
manipulation steps done by theorem provers (e.g., JTP, SNARK), text
analytic tools (e.g., UIMA), task processors (e.g., SPARK), rule
engines/systems (e.g., CWM, Cybercop), etc.
The main components concern inference representation and provenance
issues such as author, source, etc.
Our current work expands PML to include representation primitives for
trust.
A Sample PML encoding
http://inferenceweb.stanford.edu/2006/02/example1-iw-wiki.owl
<iw:NodeSet rdf:about="http://foto.stanford.edu/mediawiki-1.4.12/index.php/Natural_number">
<iw:hasConclusion>In mathematics, a natural number is either a positive integer … </iw:hasConclusion>
<iw:hasLanguage rdf:resource="http://inferenceweb.stanford.edu/registry/LG/English.owl#English"/>
<iw:isConsequentOf>
<iw:InferenceStep>
<iw:hasRule rdf:resource="http://inferenceweb.stanford.edu/registry/DPR/Told.owl#Told"/>
<iw:hasInferenceEngine rdf:resource="http://inferenceweb.stanford.edu/registry/IE/CitationTrust.owl#CitationTrust"/>
<iw:hasSourceUsage>
<iw:SourceUsage>
<iw:hasSource>
<iw:Source rdf:about="http://inferenceweb.stanford.edu/wp/registry/PER/Alexandrov.owl#Alexandrov"/>
</iw:hasSource>
</iw:SourceUsage>
</iw:hasSourceUsage>
</iw:InferenceStep>
</iw:isConsequentOf>
<!-- fragment -->
</iw:NodeSet>
<iw:AggregatedTrustRelation>
<iw:hasTrustingParty rdf:resource="http://inferenceweb.stanford.edu/wp/registry/ORG/wikipedia.owl#wikipedia"/>
<iw:hasTrustedParty rdf:resource="http://foto.stanford.edu/mediawiki-1.4.12/index.php/Natural_number"/>
<iw:hasTrustValue rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.1766</iw:hasTrustValue>
</iw:AggregatedTrustRelation>
<!-- fragment trust -->
<iw:AggregatedTrustRelation>
<iw:hasTrustingParty rdf:resource="http://inferenceweb.stanford.edu/wp/registry/ORG/wikipedia.owl#wikipedia"/>
<iw:hasTrustedParty rdf:resource="http://inferenceweb.stanford.edu/wp/registry/PER/Alexandrov.owl#Alexandrov"/>
<iw:hasTrustValue rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.1766</iw:hasTrustValue>
<!-- author trust -->
</iw:AggregatedTrustRelation>
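A consumer of this markup could pull the trust relations out with a standard XML parser. The sketch below is illustrative only: the slide shows no namespace declarations, so the `iw` namespace URI used here is an assumption, the `rdf:RDF` wrapper is added so that the snippet parses on its own, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace URI for the iw: prefix; the slide does not declare it.
IW_NS = "http://inferenceweb.stanford.edu/2004/07/iw.owl#"

# The author-trust relation from the slide, wrapped so that it is well-formed.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:iw="{IW_NS}">
  <iw:AggregatedTrustRelation>
    <iw:hasTrustingParty rdf:resource="http://inferenceweb.stanford.edu/wp/registry/ORG/wikipedia.owl#wikipedia"/>
    <iw:hasTrustedParty rdf:resource="http://inferenceweb.stanford.edu/wp/registry/PER/Alexandrov.owl#Alexandrov"/>
    <iw:hasTrustValue rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.1766</iw:hasTrustValue>
  </iw:AggregatedTrustRelation>
</rdf:RDF>"""


def read_trust_relations(xml_text):
    """Return a list of (trusting party, trusted party, trust value) tuples."""
    root = ET.fromstring(xml_text)
    prefixes = {"iw": IW_NS}
    resource = "{%s}resource" % RDF_NS
    relations = []
    for relation in root.iter("{%s}AggregatedTrustRelation" % IW_NS):
        trusting = relation.find("iw:hasTrustingParty", prefixes).get(resource)
        trusted = relation.find("iw:hasTrustedParty", prefixes).get(resource)
        value = float(relation.find("iw:hasTrustValue", prefixes).text)
        relations.append((trusting, trusted, value))
    return relations


parsed_relations = read_trust_relations(SAMPLE)
```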
Proof Markup Language:
Node Sets and Inference Steps
Conclusion:
In mathematics, a natural number is either a positive integer
(1, 2, 3, 4, ...) or a non-negative integer (0, 1, 2, 3, 4, ...).
Encoding this conclusion in PML:
iw:NodeSet
  iw:isConsequentOf
    iw:InferenceStep
      iw:hasRule: Direct Assertion (DA)
      iw:hasSourceUsage: articleID, author, timestamp
  iw:hasConclusion: In mathematics, a natural number is either a positive integer (1, 2, 3, 4, ...) or a non-negative integer (0, 1, 2, 3, 4, ...).
  iw:hasLanguage: en
Proof Markup Language:
Aggregated Trust Relation
A trivial conclusion:
In mathematics, a natural number is either a positive integer
(1, 2, 3, 4, ...) or a non-negative integer (0, 1, 2, 3, 4, ...).
Encoding trust conclusion in PML:
iw:AggregatedTrustRelation
  iw:hasTrustedParty: Wikipedia author
  iw:hasTrustingParty: Wikipedia
  iw:hasTrustValue: 0.1766
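A trust relation with these three properties could be emitted as XML along the following lines. This is a minimal sketch rather than the IW tooling: the `iw` namespace URI is an assumption, the helper name is hypothetical, and the party URIs are the registry entries that appear in the sample encoding.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace URI for the iw: prefix; not stated on the slides.
IW_NS = "http://inferenceweb.stanford.edu/2004/07/iw.owl#"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"

# Make the serializer use the familiar prefixes instead of ns0, ns1.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("iw", IW_NS)


def build_trust_relation(trusting_party, trusted_party, trust_value):
    """Build an iw:AggregatedTrustRelation element for one trust statement."""
    resource = "{%s}resource" % RDF_NS
    relation = ET.Element("{%s}AggregatedTrustRelation" % IW_NS)
    ET.SubElement(relation, "{%s}hasTrustingParty" % IW_NS,
                  {resource: trusting_party})
    ET.SubElement(relation, "{%s}hasTrustedParty" % IW_NS,
                  {resource: trusted_party})
    value = ET.SubElement(relation, "{%s}hasTrustValue" % IW_NS,
                          {"{%s}datatype" % RDF_NS: XSD_FLOAT})
    value.text = str(trust_value)
    return relation


author_trust = build_trust_relation(
    "http://inferenceweb.stanford.edu/wp/registry/ORG/wikipedia.owl#wikipedia",
    "http://inferenceweb.stanford.edu/wp/registry/PER/Alexandrov.owl#Alexandrov",
    0.1766,
)
author_trust_xml = ET.tostring(author_trust, encoding="unicode")
```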
Application: Trust View in Wikipedia
[Figure: architecture of the Trust View application]
 Wikipedia Database: article, revision, author
 Wikipedia DB processor: output is Article D with (version, author)+, citations, …
 Fragmentation Service: input is Article D with (version, author)+; output is Article D with (fragment, version)+ and (fragment, author)+
 Trust Valuation Service: output is Article D with (fragment, trust)+, (version, trust)+, (author, trust)+
 Trust Rendering Service: input/output are PML for D and HTML for D
 User clicks the “trust” tab or the “pml” tab in Wikipedia to open the corresponding view
Wikipedia Article without Trust View
Wikipedia Article with Citation Trust View
Multiple Trust View Tab
Fragments are colored per their
trust values computed from
Citation Trust (default mode).
Wikipedia Article with Revision Trust View
Fragments are colored per their
trust values computed from
Revision Trust.
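Coloring fragments by trust value could be done roughly as sketched below. The slides do not specify the palette or the markup, so the two endpoint colors, the linear interpolation, and the function names are all assumptions made for illustration.

```python
import html
from typing import Iterable, Tuple

# Hypothetical palette: low trust gets a pale red background, high trust white.
LOW_TRUST_RGB = (255, 200, 200)
HIGH_TRUST_RGB = (255, 255, 255)


def trust_to_hex_color(trust: float) -> str:
    """Linearly interpolate between the two palette colors; trust is clamped to [0, 1]."""
    t = min(1.0, max(0.0, trust))
    channels = [round(low + (high - low) * t)
                for low, high in zip(LOW_TRUST_RGB, HIGH_TRUST_RGB)]
    return "#{:02x}{:02x}{:02x}".format(*channels)


def render_fragments(fragments: Iterable[Tuple[str, float]]) -> str:
    """Wrap each (text, trust) fragment in a span with a trust-based background."""
    spans = []
    for text, trust in fragments:
        spans.append('<span style="background-color: {}">{}</span>'.format(
            trust_to_hex_color(trust), html.escape(text)))
    return "".join(spans)


rendered = render_fragments([
    ("Natural number can mean either a positive integer or a non-negative integer. ", 1.0),
    ("The former definition is generally used in number theory.", 0.0),
])
```

Because the palette is isolated in two constants, the same rendering step can serve both the Citation Trust and Revision Trust views; only the trust values passed in differ.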
Conclusion




Inference Web and PML can be used to support encoding and presentation
of trust related to information in social collaborative information
repositories such as Wikis.
We have designed and implemented a simple trust representation that
extends PML and included support for the extension in our IW tools.
We expect that more sophisticated trust modeling and trust processing will be
required.
We are investigating
 Models of trust
 Trust aggregation from multiple sources and multiple algorithms
 Refinements and usage of revision-based trust
 Additional trust approaches and their combination
 New applications utilizing (sharable) trust information
More info:
Inference Web: iw.stanford.edu
Simple examples of PML markup with wiki demo:
foto.stanford.edu/mediawiki-1.4.12/index.php/Main_Page
[email protected]
Extra
Abstract PML
[Diagram]
 wiki:ArticleVersion (http://.../title=Natural_number) wiki:hasFragmentList: iw:NodeSet (“In mathematics, a natural number is either a positive integer …”), …, iw:NodeSet (fragment n)
 iw:NodeSet iw:hasSource: iw:Person (Oleg Alexandrov), …, iw:Person (author m)
 iw:AggregatedTrust (fragment trust is 0.1766): iw:hasTrustingParty is iw:Organization Wikipedia; iw:hasTrustedParty is the iw:NodeSet
 iw:AggregatedTrust (author trust is 0.1766): iw:hasTrustingParty is iw:Organization Wikipedia; iw:hasTrustedParty is iw:Person Oleg Alexandrov
Note: Green nodes are in IW registry