Linking Open Data
Linking the world of data
Iftikhar Alam
What is happening now
User-generated content is growing
tremendously
Post-Ajax era…
Blogs, wikis, comments, tags
Isolated content badly needs to get
connected.
An idea of Tim Berners-Lee (Semantic Web)
The world is connected, so data,
information and knowledge should be
connected too.
Old terms
Data
Structured format
Closed environment (a database, treated as a set)
Slow growth (in the context of databases)
The order in which results are displayed is not an issue
SELECT * FROM students;
Information
Unstructured format
Open environment (for everyone), no
restriction on its contents
The order of results is an important issue
Old terms…
Knowledge: contextualizing information
Comprehend the perceived information
Add context (HP, Core i7, 1TB, 8GB RAM)…
Context ultimately determines what’s actually what.
Contextualization aids comprehension. For example, an
arithmetic problem may not seem very practical until it is seen
within a story problem; the real-life situation contextualizes the
math problem and makes it more understandable.
Pythagoras's theorem
$a^2 + b^2 = c^2$
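To make the point concrete, here is the same formula placed inside a small story problem (the numbers are an illustrative choice, not from the slides): a 5 m ladder leans against a wall with its foot 3 m from the wall; how high up the wall does it reach?

```latex
a^2 + b^2 = c^2, \qquad a = 3\,\text{m},\; c = 5\,\text{m}
\;\Rightarrow\; b = \sqrt{c^2 - a^2} = \sqrt{25 - 9}\,\text{m} = \sqrt{16}\,\text{m} = 4\,\text{m}
```

The bare formula says nothing about ladders; the story supplies the context that tells us which quantity is a, b or c.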
DBpedia
Extracts structured data from Wikipedia
Example: Wasim Akram
Date of birth, name, matches, etc.
Database to RDF
Apply SPARQL statements:
PREFIX plant: <http://www.linkeddatatools.com/plants#>
SELECT ?name
FROM <http://www.linkeddatatools.com/plantsdata/plants.rdf>
WHERE {
  ?planttype plant:planttype ?name .
}
ORDER BY ?name
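What the query does can be mimicked in plain Python over a list of triples: match the single triple pattern, bind ?name to each matching object, then sort. This is a hypothetical sketch: the plant data are invented for illustration, and the SPARQL engine (and the remote file named in FROM) is replaced by an in-memory list.

```python
# Namespace taken from the query's PREFIX; the triples below are hypothetical data.
PLANT = "http://www.linkeddatatools.com/plants#"

TRIPLES = [
    (PLANT + "tulip", PLANT + "planttype", "flowering plants"),
    (PLANT + "oak", PLANT + "planttype", "deciduous trees"),
    (PLANT + "pine", PLANT + "planttype", "evergreen trees"),
    (PLANT + "oak", PLANT + "height", "tall"),  # different predicate: not matched
]


def select_names(triples, predicate):
    """Evaluate { ?planttype <predicate> ?name } and apply ORDER BY ?name.
    Like SPARQL without DISTINCT, duplicate bindings are kept."""
    names = [o for (s, p, o) in triples if p == predicate]
    return sorted(names)


print(select_names(TRIPLES, PLANT + "planttype"))
# ['deciduous trees', 'evergreen trees', 'flowering plants']
```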
What is in our daily life?
Access data
Manipulate data (add, delete, change)
Process data
Generate information (tables, forms)
Create knowledge (reports, papers, …)
Data is our life
Data is our daily bread
Do we have identifiers for data?
Not really important if the data is small and individual (in your
class, we don't need roll numbers)
Really important if the data is huge and connected
Do we need identifiers for our data?
Why do we need our name or NIC number?
Can you refer to someone without an identifier?
"A person with a good heart"…?
Make our busy life less messy
We only have 24 hours a day, no more
Add identifiers to our data
Give each data item a unique identifier that everyone agrees on:
the perfect world of our dreams
We would have no integration problems; most IT
departments could be closed
Different groups give different identifiers to the same
data: we can live with that, it is closer to our daily
life, and standardization bodies and IT people are helping
us (an NIC is required for SIM registration, but it is not used as a
roll number or registration number).
We are happy that we can refer to data
Where are our data?
In computers
On the Web
In my paper notes
In printed books
…
Data are being digitized and made available online
Web Data
Web data
Data on the Web
Online journal
Blog
Wiki
…
Data in the physical world
Yourself
Table
Book in library
Computer you are using
…
The boundary is blurring
A paper is both in your hand and on the Web
How to refer to data
Web data
DOI (Digital Object Identifier)
Alphanumeric strings that begin with "10." (as seen in APA-style references)
OpenID (one identity for logging into many websites, similar in spirit to logging in with Facebook)
URI (blog, wiki, homepage, …)
…
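A DOI becomes an HTTP-dereferenceable link by prefixing it with the doi.org resolver. A minimal sketch, assuming only that a well-formed DOI starts with "10." and contains a "/"; it does not percent-encode unusual characters, and the sample DOI string is illustrative.

```python
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Turn a DOI (optionally written with a 'doi:' prefix) into a resolver URL.
    Only a light sanity check is performed."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # DOI names start with the "10." directory indicator and contain a "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return DOI_RESOLVER + doi


print(doi_to_url("doi:10.1000/182"))  # https://doi.org/10.1000/182
```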
URI (Uniform Resource Identifier)
To identify or name a resource on the Internet
The main purpose is to enable interaction with
representations of the resource over a network,
typically WWW, using specific protocols
URN – like a person’s name
urn:isbn:0-486-27557-4 – the book “Romeo and Juliet”
URL – like a street address
http://www.slis.indiana.edu
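The name-versus-address distinction shows up when the two example URIs are parsed with Python's urllib.parse. This is a rough heuristic sketch (a URN is recognized by its urn: scheme, a URL by having a network location), not a complete classification of URI schemes; the `classify` helper is hypothetical.

```python
from urllib.parse import urlparse


def classify(uri):
    """Roughly classify a URI as a URN (a name) or a URL (a location)."""
    parts = urlparse(uri)
    if parts.scheme == "urn":
        # urn:<namespace>:<namespace-specific string>, e.g. urn:isbn:...
        namespace, _, name = parts.path.partition(":")
        return ("URN", namespace, name)
    if parts.netloc:
        # scheme://host/... tells a client where and how to fetch the resource.
        return ("URL", parts.scheme, parts.netloc)
    return ("other", parts.scheme, parts.path)


print(classify("urn:isbn:0-486-27557-4"))      # ('URN', 'isbn', '0-486-27557-4')
print(classify("http://www.slis.indiana.edu"))  # ('URL', 'http', 'www.slis.indiana.edu')
```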
Linked Data
A term coined by Tim Berners-Lee
It describes HTTP-based data access by
reference on the Web
The current Web is changing from hypertext links
(linking documents) to hyperdata links (linking data)
Data are small components of resources
It drills down into the details of resources
Linked data provides a powerful mechanism for
meshing disparate and heterogeneous data
Sir Tim Berners-Lee's vision
“The Semantic Web isn’t just about putting data on the
web. It is about making links”.
Four Rules for linking data
Use URIs as names for things
Use HTTP URIs so that people can look up those names
When someone looks up a URI, provide useful information
(URI dereferencing)
Include links to other URIs, so that they can discover more
things
“Breaking them does not destroy anything, but misses an
opportunity to make data interconnected. This in turn
limits the ways it can later be reused in unexpected
ways. It is the unexpected re-use of information which is
the value added by the web.”
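Rules 2 and 3 can be sketched with Python's standard library: check that a name is an HTTP(S) URI, then build the GET request a client would send to dereference it, asking for RDF through the Accept header (content negotiation). This is a hedged sketch: the Accept string is one reasonable choice rather than a mandated value, and the request is only constructed, never sent. Servers following the common "Cool URIs" practice often answer such a request with a 303 redirect to an RDF or HTML description.

```python
from urllib.parse import urlparse
from urllib.request import Request

# One reasonable preference order: Turtle, then RDF/XML, then HTML.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"


def follows_rule_2(uri):
    """Rule 2: names should be HTTP(S) URIs so that people can look them up."""
    return urlparse(uri).scheme in ("http", "https")


def dereference_request(uri):
    """Rule 3: build (but do not send) a GET request that prefers RDF.
    Sending it would be urllib.request.urlopen(req)."""
    if not follows_rule_2(uri):
        raise ValueError(f"cannot dereference a non-HTTP URI: {uri}")
    return Request(uri, headers={"Accept": RDF_ACCEPT})


req = dereference_request("http://dbpedia.org/resource/Berlin")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```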
W3C SWEO Linking Open Data Project
"The goal of the W3C SWEO Linking Open Data community project is to extend the
Web with a data commons by publishing various open datasets as RDF on the Web
and by setting RDF links between data items from different data sources."
The project aims to:
Publish existing open-license datasets as linked
data on the Web
Interlink things between different data sources
Develop clients and applications that consume
linked data from the Web
Bubbles (LOD cloud) in May 2007
Over 500M RDF triples
Around 120K RDF links between data sources
Bubbles (LOD cloud) in April 2008
Over 2B RDF triples
Around 3M RDF links
Bubbles now: [figure: current LOD cloud diagram]
Semantic search engines
Use RDF
Sindice
Falcons (temporarily not available)
More on the W3C website
Organizations participating in the LOD
community
Academic
MIT, Univ Southampton, DERI, Open Univ,
Univ London, Univ Hannover, Penn State Univ,
Univ Leipzig, Univ Karlsruhe, Joanneum (AT),
Free Univ Berlin, Cyc, SouthEast Univ (CN), …
Commercial
BBC, OpenLink, Talis, Zitgist, Garlik, Mondeca,
Renault, Boab Interactive
What are Linked Data?
Linked Data require RDF
But not all RDF data are linked data
Your RDF data must comply with
the four rules set out by
Berners-Lee
What is RDF?
Basic Ideas behind RDF
RDF uses Web identifiers (URIs) to identify
resources
RDF describes resources with properties and
property values
Everything can be represented as triples
The essence of RDF is the (s,p,o) triple
Resource (subject) --Property (predicate)--> Value (object)
A subject has a property (predicate) whose value is the object: (s, p, o)
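An (s, p, o) triple maps naturally onto a three-field tuple, and a graph onto a set of them. A minimal sketch with hypothetical example.org URIs; every term is a plain string here, so URIs and literals are not distinguished.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described (a URI)
    predicate: str  # the property (a URI)
    obj: str        # the property value (a URI or a literal)


EX = "http://example.org/smith#"    # hypothetical resource namespace
FAM = "http://example.org/family#"  # hypothetical property namespace

graph = {
    Triple(EX + "albert", FAM + "hasChild", EX + "brian"),
    Triple(EX + "albert", FAM + "hasChild", EX + "carol"),
    Triple(EX + "brian", FAM + "name", "Brian"),  # literal value
}


def describe(graph, subject):
    """All (property, value) pairs of one resource, sorted for stable output."""
    return sorted((t.predicate, t.obj) for t in graph if t.subject == subject)


for p, o in describe(graph, EX + "albert"):
    print(f"albert has property {p} with value {o}")
```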
RDF Triples
Triple
A resource (subject) is anything that can be identified:
a URI or a blank node
A property (predicate) is one of the features of the resource:
a URI
A property value (object) is the value of a property, which
can be a literal or another resource: a URI, a literal or a blank node
Literals can be the object of an RDF statement, but cannot be the subject
or the predicate
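The position rules can be made explicit by tagging each term with its kind and checking each slot. A minimal sketch; the URI, BNode and Literal classes and the `check_triple` function are hypothetical helpers defined here, not a library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str  # a blank node: a resource without a URI


@dataclass(frozen=True)
class Literal:
    value: str


def check_triple(s, p, o):
    """Return the list of RDF position-rule violations (empty if valid)."""
    errors = []
    if not isinstance(s, (URI, BNode)):
        errors.append("subject must be a URI or blank node")
    if not isinstance(p, URI):
        errors.append("predicate must be a URI")
    if not isinstance(o, (URI, BNode, Literal)):
        errors.append("object must be a URI, blank node or literal")
    return errors


EX = "http://example.org/"  # hypothetical namespace
print(check_triple(URI(EX + "albert"), URI(EX + "name"), Literal("Albert")))  # []
print(check_triple(Literal("Albert"), URI(EX + "name"), URI(EX + "albert")))
# ['subject must be a URI or blank node']
```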
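Do you have linked data?
Linked data are just RDF triples, e.g.:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fam="http://example.org/family#">
  <rdf:Description rdf:about="http://example.org/smith#albert">
    <fam:hasChild rdf:resource="http://example.org/smith#brian"/>
    <fam:hasChild rdf:resource="http://example.org/smith#carol"/>
  </rdf:Description>
</rdf:RDF>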
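To see that this RDF/XML really is just a set of triples, a minimal sketch with Python's standard xml.etree.ElementTree reads them back out. It handles only a tiny subset of RDF/XML (rdf:Description elements with rdf:about, and children carrying rdf:resource or text); a real RDF parser covers far more. The fam: namespace URI is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Well-formed RDF/XML; the fam: namespace URI is hypothetical.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fam="http://example.org/family#">
  <rdf:Description rdf:about="http://example.org/smith#albert">
    <fam:hasChild rdf:resource="http://example.org/smith#brian"/>
    <fam:hasChild rdf:resource="http://example.org/smith#carol"/>
  </rdf:Description>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' form into a full URI."""
    ns, local = tag[1:].split("}", 1)
    return ns + local


def triples_from_rdfxml(text):
    """Extract (s, p, o) triples; a child's rdf:resource becomes a URI object,
    otherwise its text content is used as a literal (both kept as strings)."""
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                obj = (prop.text or "").strip()
            triples.append((subject, expand(prop.tag), obj))
    return triples


for t in triples_from_rdfxml(DOC):
    print(t)
```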
How can I get RDF triples?
Relational database:
D2R tools can convert them for you
RDFizers from SIMILE:
Can convert JPEG, MARC/MODS, OAI-PMH, OCW (MIT
OpenCourseWare), email, BibTeX, Java, Javadoc, etc. to RDF
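The basic idea behind relational-to-RDF conversion can be sketched in a few lines: each row becomes a resource, each column a property, each cell a value. This is a simplified, hypothetical mapping (the base URI, the URI patterns and the sample rows are invented for illustration); real tools such as D2R use a configurable mapping and differ in the details.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base URI for minted resources


def table_to_triples(conn, table, key):
    """Map each row to a subject URI (table + key value) and each non-key
    column to a predicate URI, with the cell value as the object.
    `table` and `key` are assumed to be trusted identifiers."""
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for col in columns:
            if col != key:
                triples.append((subject, f"{BASE}{table}#{col}", record[col]))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, program TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ali", "BSCS"), (2, "Sara", "MSIT")],  # hypothetical rows
)
for triple in table_to_triples(conn, "students", "id"):
    print(triple)
```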
Rules of thumb
Understand your data
Decide what you want to express in your data
Do not reinvent: REUSE!
Potential ontologies/vocabularies
FOAF, SIOC, Geo
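Reusing a vocabulary means naming your properties and classes with its namespace URIs instead of inventing new ones. A small sketch that expands prefixed names against the published namespaces of FOAF, SIOC and the W3C Basic Geo (WGS84) vocabulary; the `expand` helper is illustrative, not part of any library.

```python
# Namespace URIs of the reused vocabularies.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "sioc": "http://rdfs.org/sioc/ns#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def expand(curie):
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand("foaf:name"))  # http://xmlns.com/foaf/0.1/name
print(expand("geo:lat"))    # http://www.w3.org/2003/01/geo/wgs84_pos#lat
```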
URI aliases
Different URIs for the same non-information resource
(e.g., Berlin)
Use owl:sameAs to link these URI aliases
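Because owl:sameAs is symmetric and transitive, a data consumer can merge URI aliases into equivalence classes. A minimal sketch using a union-find structure; apart from the DBpedia resource URI for Berlin, the alias URIs are hypothetical.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBR = "http://dbpedia.org/resource/"

TRIPLES = [
    (DBR + "Berlin", SAME_AS, "http://example.org/cities/berlin"),               # hypothetical alias
    ("http://example.org/cities/berlin", SAME_AS, "http://example.net/place/berlin"),  # hypothetical alias
    (DBR + "Berlin", "http://www.w3.org/2000/01/rdf-schema#label", "Berlin"),     # not a sameAs link
]


def alias_groups(triples):
    """Group URIs connected by owl:sameAs links into sets of aliases."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)  # union the two alias sets

    groups = {}
    for uri in parent:
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


print(alias_groups(TRIPLES))
```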
More principles
Linked Data is simply about using the Web
to create typed links between data from
different sources.
The principles of Linked Data are to:
Use the RDF data model to publish structured
data on the Web
Use RDF links to interlink data from different
data sources
Use HTTP URIs to identify resources
Avoid other URI schemes (such as URNs or DOIs)
Power of Linked Data
[Figure: an RDF graph around the resource ying (rdf:type foaf:Person; foaf:name "Ying Ding"; foaf:knows Stefan; publications via foaf:publication / dblp:publications; foaf:based_near db:Galway), where db:Galway has dp:population 72K and skos:subject dp:Cities_in_Ireland, a category it shares with dp:Dublin]
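Following typed links is what makes such a graph useful: starting from a person, one can hop to the place they are based near and on to other cities in the same category, without any source-specific API. A minimal sketch over the figure's labels; how the edges connect is a reading of the flattened figure and is an assumption, and the data sit in a plain Python list rather than a real triple store.

```python
# Edges as read from the figure (the exact wiring is an assumption);
# prefixed names are kept as plain strings instead of full URIs.
GRAPH = [
    ("ying", "rdf:type", "foaf:Person"),
    ("ying", "foaf:name", "Ying Ding"),
    ("ying", "foaf:knows", "Stefan"),
    ("ying", "foaf:based_near", "db:Galway"),
    ("db:Galway", "dp:population", "72K"),
    ("db:Galway", "skos:subject", "dp:Cities_in_Ireland"),
    ("dp:Dublin", "skos:subject", "dp:Cities_in_Ireland"),
]


def objects(subject, predicate):
    """Follow links forward: the values of `predicate` on `subject`."""
    return [o for (s, p, o) in GRAPH if s == subject and p == predicate]


def subjects(predicate, obj):
    """Follow links backward: resources whose `predicate` has value `obj`."""
    return [s for (s, p, o) in GRAPH if p == predicate and o == obj]


def sibling_places(person):
    """Places in the same category as the place `person` is based near."""
    result = []
    for place in objects(person, "foaf:based_near"):
        for category in objects(place, "skos:subject"):
            result += [c for c in subjects("skos:subject", category) if c != place]
    return result


print(objects("db:Galway", "dp:population"))  # ['72K']
print(sibling_places("ying"))                 # ['dp:Dublin']
```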
What can LOD bring?
It lifts the current document Web up to a data Web
LOD browsers let you navigate between
different data sources by following RDF links.
It can drill down to finer-grained
information,
allowing more precise searches on the Web
and making question answering on the Web
possible
Mashing up different data through RDF links
Making it easier to build applications on top
Document Web vs. Data Web
Aspect | Document Web | Data Web
Glue | Hyperlinks | RDF links
Data | HTML pages | RDF triples
Query results | HTML pages, which cannot be further processed | RDF triples, which can easily be further processed (e.g., by web services)
Integration | Data are interlinked, but not integrated | Data are interlinked and integrated, and links are typed
Data access | Through different APIs | Through a single, standardized access mechanism (maybe it will be called the "LOD API" in the future?)
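One existing standardized access mechanism is the W3C SPARQL Protocol, in which a query is sent to an endpoint over HTTP, for example as the `query` parameter of a GET request. A minimal sketch that only builds such a URL; the endpoint address is hypothetical, and a real client would typically also send an Accept header such as application/sparql-results+json.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical SPARQL endpoint

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?person foaf:name ?name }"""


def sparql_get_url(endpoint, query):
    """Encode a query as the `query` parameter of an HTTP GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


print(sparql_get_url(ENDPOINT, QUERY))
```

Whatever the data source, the client only needs this one pattern (endpoint URL plus query string), which is the contrast with per-site APIs drawn in the comparison.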
More about LOD
LOD Wiki
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
Tutorial on how to publish LOD data
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
Further readings and tools
W3C Track LOD WWW2008
http://www.w3.org/2008/Talks/WWW2008-W3CTrack-LOD.pdf
Linked Data Planet in New York 2008
http://linkeddata.org/slides/2008-06-nyc-ldp.pdf
LDOW2008 workshop at WWW2008
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol369/
ISWC 2008 LOD tutorial
http://events.linkeddata.org/iswc2008tutorial
LOD mailing list