
SHOE
A Knowledge Representation
Language for Internet Applications
The Problem
• HTML was never meant for computer
consumption; its purpose is to display
data for humans to read.
• The "knowledge" on a web page is in a
human-readable language (usually English),
laid out with tables and graphics and frames
in ways that we as humans comprehend
visually.
• Even with state-of-the-art natural language
technology, getting a computer to read and
understand web documents is very difficult.
• This makes it very difficult to create an
intelligent agent that can wander the web on
its own, reading and comprehending web
pages as it goes.
The Solution
• SHOE!
• Simple HTML Ontology Extensions
• SHOE eliminates this problem by making it
possible for web pages to include
knowledge that intelligent agents can
actually read.
The Internet changes things
• The Web is a Knowledge Base.
• A massive source of information for agents
to make intelligent queries on.
• Requires a shift in our view of what a KB is
and what a KR language should be designed
for.
The Web as Knowledge Base
• The Web is massive
– Most KR systems have semantics too rich to
scale well
– Reasoning in many KR languages is NP-hard
– KR for the Web must make
complexity/expressivity trade-offs
Web as KB (cont’d)
• The Web is an “Open World”
– A Web agent is not free to assume it has
gathered all available information.
– Many KR systems assume a “closed world.”
– On the Web, it is unlikely that any KB
describing it could ever be complete.
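A minimal sketch of what the open-world assumption means for a single query. The knowledge base is assumed to be a plain set of ground facts (tuples), and all names are hypothetical. A missing fact yields "unknown" rather than "no", because the agent cannot assume it has gathered every page; the closed-world function is included only for contrast.

```python
from enum import Enum

Fact = tuple[str, ...]  # e.g. ("worksFor", "ex:alice", "ex:umd"); names are hypothetical


class Answer(Enum):
    YES = "yes"          # the fact was found in the gathered data
    UNKNOWN = "unknown"  # not found; in an open world this is NOT evidence of falsehood


def ask_open_world(kb: set[Fact], fact: Fact) -> Answer:
    """Answer a ground query without assuming the knowledge base is complete."""
    return Answer.YES if fact in kb else Answer.UNKNOWN


def ask_closed_world(kb: set[Fact], fact: Fact) -> bool:
    """For contrast: a closed-world system treats anything missing as false."""
    return fact in kb


kb = {("worksFor", "ex:alice", "ex:umd")}
print(ask_open_world(kb, ("worksFor", "ex:bob", "ex:umd")).value)  # unknown
print(ask_closed_world(kb, ("worksFor", "ex:bob", "ex:umd")))      # False
```

Note that the open-world version never answers "no" at all; a definite negative would require negation or a completeness guarantee, neither of which a Web agent has.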
The Web is Dynamic
• The Web changes faster than any bot or
agent can keep up with.
• A KR system must assume that data can be,
and often will be, out of date.
• Without a unifying ontological framework,
web agents will struggle to cross-map
conflicting knowledge structures.
• The Web’s KR framework must be flexible
yet general enough to handle the online
economy of ideas.
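One way an agent might act on the assumption that its data is out of date: record when each page was fetched and re-fetch anything older than a chosen maximum age. The `CachedPage` record, the URLs, and the age threshold are assumptions for illustration; the slides do not prescribe a refresh policy.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CachedPage:
    """Claims an agent extracted from one page, stamped with the fetch time (hypothetical)."""
    url: str
    fetched_at: float  # seconds since the epoch
    claims: set[tuple[str, ...]] = field(default_factory=set)


def is_stale(page: CachedPage, max_age: float, now: Optional[float] = None) -> bool:
    """True if the cached copy is older than `max_age` seconds."""
    current = time.time() if now is None else now
    return current - page.fetched_at > max_age


def urls_to_refresh(cache: list[CachedPage], max_age: float,
                    now: Optional[float] = None) -> list[str]:
    """URLs whose cached claims should be re-fetched before being relied on."""
    return [page.url for page in cache if is_stale(page, max_age, now)]


cache = [
    CachedPage("http://www.example.edu/~alice/", fetched_at=1_000.0),
    CachedPage("http://www.example.edu/~bob/", fetched_at=9_000.0),
]
print(urls_to_refresh(cache, max_age=3_600.0, now=10_000.0))
# ['http://www.example.edu/~alice/']
```

Passing `now` explicitly keeps the check deterministic; in a running agent it would default to the current time.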
Web as KB redux
• Viewing the Web as a Knowledge Base
changes the way we must look at KR and
KR languages.
• Web systems cannot assume that all of the
information is correct and consistent.
• Authority on the Internet is distributed.
No Central Control
• Each page’s reliability must be questioned.
• No guarantee on the availability of
information.
• Information from different sources can be in
disagreement, leading to inconsistency.
• Web Hoaxes
• On the Web no one knows you’re a dog.
Ontology
• Modern KR systems are designed around
the concept of categorization.
– Allows reasoning about the generality of a
concept
– Allows specification of relationships between
these concepts.
• Such ontologies allow one to define what is
relevant and what is to be ignored.
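A small sketch of reasoning about the generality of a concept: categories form a hierarchy, and a category's generalizations are everything reachable through its parents. The hierarchy below is a hypothetical example and allows more than one parent per category.

```python
# Hypothetical category hierarchy: each category maps to its direct parents.
ISA: dict[str, list[str]] = {
    "Thing": [],
    "Person": ["Thing"],
    "Organization": ["Thing"],
    "Student": ["Person"],
    "Professor": ["Person"],
    "GraduateStudent": ["Student"],
}


def ancestors(category: str) -> set[str]:
    """All categories that subsume `category` (its generalizations), excluding itself."""
    seen: set[str] = set()
    stack = list(ISA.get(category, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(ISA.get(parent, []))
    return seen


def is_a(category: str, general: str) -> bool:
    """True if `general` is `category` itself or one of its generalizations."""
    return category == general or general in ancestors(category)


print(sorted(ancestors("GraduateStudent")))  # ['Person', 'Student', 'Thing']
print(is_a("Person", "Student"))             # False
```

An agent interested only in people can use `is_a(c, "Person")` to decide which categorized instances are relevant and ignore the rest.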
Ontologies on the Web
• Ontologies on the Web can be used to
structure information if we take into
account the properties discussed earlier.
• Let’s look at some of the problems that may
be solved with the use of ontologies
Heterogeneity
• Many file formats and protocols:
– images, music, movies, VR files
– HTTP, FTP, Telnet, Gopher
• Automated indexing is difficult.
• All of these resources are potentially useful
to someone.
• Need a method to specify what information
these sources contain.
Lack of Structure
• HTML structure is used primarily for
presentation rather than information
retrieval.
• Semantic meaning is difficult to infer from
HTML despite limited support for semantic
information (META tags, etc.).
• XML will allow semi-structured documents,
but will need some form of ontology.
• No structures for classification or reasoning.
Contextual Dependency
• Reading documents, people draw on
contextual knowledge (domain, language)
to interpret statements.
• Context required to disambiguate terms and
provide framework for understanding
• Ontologies provide a mechanism by which
context can be encoded on web pages or
other repositories of web-based information.
The SHOE Language
Basic Structure
• Ontologies
– define rules guiding what kinds of assertions
may be made and what kinds of inferences may
be drawn on ground assertions
• Instances
– entities which make assertions based on those
rules
Basic Structure
• SHOE treats assertions as claims made by
specific instances, rather than as facts to be
gathered as generally recognized truth.
• SHOE syntax is an application extension of
HTML
– also available in XML syntax
– SHOE is also designed for more general
distributed-knowledge and agent issues.
SHOE Ontologies
• SHOE has flexible facilities for ontologies
to be derived from one or more
superontologies in a multiple-inheritance
scheme, or for later versions to modify
earlier versions.
• Four basic data types
– strings, numbers, dates and boolean values
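A sketch of how an ontology derived from several superontologies might expose the categories it inherits. The `Ontology` class is a hypothetical in-memory model, not SHOE's markup; names are qualified with the defining ontology's name so that two parents using the same local name do not collide.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical in-memory model of an ontology (not SHOE's own syntax)."""
    name: str
    version: str
    extends: list["Ontology"] = field(default_factory=list)
    categories: set[str] = field(default_factory=set)

    def visible_categories(self) -> set[str]:
        """Categories defined here plus those inherited from every superontology.

        Names are qualified as "ontology.category". Assumes the `extends`
        graph has no cycles.
        """
        visible = {f"{self.name}.{c}" for c in self.categories}
        for parent in self.extends:
            visible |= parent.visible_categories()
        return visible


base = Ontology("base", "1.0", categories={"Entity"})
people = Ontology("people", "1.0", extends=[base], categories={"Person"})
orgs = Ontology("orgs", "2.0", extends=[base], categories={"Organization"})
dept = Ontology("dept", "1.0", extends=[people, orgs], categories={"Department"})
print(sorted(dept.visible_categories()))
# ['base.Entity', 'dept.Department', 'orgs.Organization', 'people.Person']
```

`dept` inherits `base.Entity` through both parents (a diamond), but set union keeps it only once.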
SHOE Ontologies
• An additional URL type is under
consideration.
• An ontology can define additional arbitrary
types
• An ontology can make category definitions
which specify the categories under which
instances can be classified.
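The four basic data types (strings, numbers, dates, booleans), plus types an ontology adds, can be pictured as a registry of parsers that turn a claimed value's text into a typed value. The type names, the accepted lexical forms, and the example `url` type are assumptions for illustration, not SHOE's specification.

```python
from datetime import date
from typing import Any, Callable


def parse_boolean(text: str) -> bool:
    """Accept a few common spellings; the exact lexical forms are an assumption."""
    lowered = text.strip().lower()
    if lowered in ("true", "yes"):
        return True
    if lowered in ("false", "no"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


# Parsers for the four basic data types (type names chosen for illustration).
PARSERS: dict[str, Callable[[str], Any]] = {
    "string": str,
    "number": float,
    "date": date.fromisoformat,  # assumes ISO 8601 dates such as "1999-05-01"
    "boolean": parse_boolean,
}


def define_type(name: str, parser: Callable[[str], Any]) -> None:
    """Register an additional type, as an ontology might define its own."""
    if name in PARSERS:
        raise ValueError(f"type already defined: {name!r}")
    PARSERS[name] = parser


def parse_value(type_name: str, text: str) -> Any:
    """Convert the text of a claimed value into the declared type."""
    return PARSERS[type_name](text)


def parse_url(text: str) -> str:
    """Hypothetical extra type: a URL must at least use an http(s) scheme."""
    if not text.startswith(("http://", "https://")):
        raise ValueError(f"not a URL: {text!r}")
    return text


define_type("url", parse_url)
print(parse_value("number", "42"))        # 42.0
print(parse_value("date", "1999-05-01"))  # 1999-05-01
```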
SHOE Ontologies
• Relational Definitions
– <RELATION> tags specify the format of n-ary
relational claims made by instances regarding
other instances and data
• Inferential Declarations
– <DEF-INFERENCE> tags can specify
additional inferences agents may freely make
on ground information.
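A sketch of how an agent might check an n-ary relational claim against its relation definition: the arity must match, and each argument must fit the type declared for its position. Here an argument typed `number` must parse as a number and any other type name is treated as a category the argument must have been claimed to belong to. The `RelationDef` model and the `taught` relation are hypothetical, and subsumption between categories is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationDef:
    """Hypothetical model of a relation definition: a name plus one type per argument."""
    name: str
    arg_types: tuple[str, ...]


def conforms(rel: RelationDef, args: tuple[str, ...],
             categories: dict[str, set[str]]) -> bool:
    """True if a claimed relation matches the definition's arity and argument types.

    `categories` maps each instance key to the categories it has been claimed
    to belong to (direct membership only; no is-a reasoning here).
    """
    if len(args) != len(rel.arg_types):
        return False
    for expected, value in zip(rel.arg_types, args):
        if expected == "number":
            try:
                float(value)
            except ValueError:
                return False
        elif expected not in categories.get(value, set()):
            return False
    return True


# A ternary relation: who taught which course in which year (names are hypothetical).
taught = RelationDef("taught", ("Professor", "Course", "number"))
cats = {"ex:bob": {"Professor"}, "ex:cmsc421": {"Course"}}
print(conforms(taught, ("ex:bob", "ex:cmsc421", "1999"), cats))  # True
print(conforms(taught, ("ex:cmsc421", "ex:bob", "1999"), cats))  # False
```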
LKite: Using a URL as the ID gives agents
the ability to determine if an instance is
really what it claims to be.
SHOE Instances
• Serve two functions:
– Instances are arbitrary objects, like those in an
object-oriented database system.
– Instances are elements responsible for making
claims.
• Each instance has a unique ID
– SHOE recommends, but does not require, that the
ID be based on the URL of the page where the
instance is found.
SHOE Instances
• Instances may specify delegate instances.
• Within an instance may be found category
claims and relation claims made by that
instance:
– category claim: instance x should be
categorized under category y.
– relational claim: instance claims that an n-ary
relation exists.
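A minimal model of an instance and the two kinds of claims it makes. The `Instance` class, its method names, and the example keys are hypothetical; the key follows the suggested convention of being based on a page URL, and delegate instances are not modeled.

```python
from dataclasses import dataclass, field

Assertion = tuple[str, ...]  # (relation name, argument 1, ..., argument n)


@dataclass
class Instance:
    """Hypothetical model of an instance: a unique key plus the claims it makes."""
    key: str  # suggested (not required) to be based on the page URL
    category_claims: set[tuple[str, str]] = field(default_factory=set)
    relation_claims: set[Assertion] = field(default_factory=set)

    def claim_category(self, subject: str, category: str) -> None:
        """Claim that instance `subject` should be categorized under `category`."""
        self.category_claims.add((subject, category))

    def claim_relation(self, relation: str, *args: str) -> None:
        """Claim that the n-ary relation `relation` holds among `args`."""
        self.relation_claims.add((relation, *args))


alice = Instance("http://www.example.edu/~alice/")
alice.claim_category(alice.key, "cs.Student")
alice.claim_relation("cs.advisor", alice.key, "http://www.example.edu/~bob/")
print(len(alice.category_claims), len(alice.relation_claims))  # 1 1
```

The claims stay attached to the instance that made them, so an agent always knows who asserted what.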
Formal Definition
• We’ll skip the details today, but in brief:
– SHOE’s semantic knowledge consists of a set
of claims, made by instances, about
relationships between ground atomic elements
(numbers, strings, instances, etc.)
– Claims are either ground claims explicitly
stated in instances or claims SHOE has inferred
via the simple rules defined in an ontology.
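Assuming the ontology's rules are Horn-clause style (a conjunction of atoms implying one atom), the inferred claims can be computed by naive forward chaining: apply every rule to the known facts until nothing new appears. This is a sketch, not SHOE's rule syntax; rules are Python tuples, terms starting with `?` are variables, claimants are dropped for brevity, and the example rules are hypothetical.

```python
from typing import Optional

Atom = tuple[str, ...]          # e.g. ("advisor", "ex:alice", "ex:bob")
Rule = tuple[list[Atom], Atom]  # (body atoms, head atom); "?x" marks a variable


def match(pattern: Atom, fact: Atom, binding: dict[str, str]) -> Optional[dict[str, str]]:
    """Extend `binding` so that `pattern` equals `fact`; return None if impossible."""
    if len(pattern) != len(fact):
        return None
    extended = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if extended.setdefault(p, f) != f:
                return None  # variable already bound to a different value
        elif p != f:
            return None      # constant mismatch
    return extended


def infer(facts: set[Atom], rules: list[Rule]) -> set[Atom]:
    """Naive forward chaining to a fixpoint.

    Assumes every variable in a rule's head also appears in its body,
    so each derived fact is ground.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            bindings: list[dict[str, str]] = [{}]
            for atom in body:  # join the body atoms left to right
                next_bindings = []
                for b in bindings:
                    for f in known:
                        m = match(atom, f, b)
                        if m is not None:
                            next_bindings.append(m)
                bindings = next_bindings
            for b in bindings:
                derived = tuple(b.get(term, term) for term in head)
                if derived not in known:
                    known.add(derived)
                    changed = True
    return known


facts = {
    ("advisor", "ex:alice", "ex:bob"),
    ("subOrganizationOf", "ex:cs", "ex:cmns"),
    ("subOrganizationOf", "ex:cmns", "ex:umd"),
}
rules = [
    ([("advisor", "?s", "?p")], ("Student", "?s")),  # anyone with an advisor is a student
    ([("subOrganizationOf", "?a", "?b"), ("subOrganizationOf", "?b", "?c")],
     ("subOrganizationOf", "?a", "?c")),             # subOrganizationOf is transitive
]
print(sorted(infer(facts, rules) - facts))
# [('Student', 'ex:alice'), ('subOrganizationOf', 'ex:cs', 'ex:umd')]
```

Naive evaluation re-derives old facts on every pass; semi-naive evaluation, which joins only against newly derived facts, is the usual optimization when the rule set grows.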
Language Features
• Compatibility with HTML/XML
– application of SGML
– HTML-compatible syntax defined in an SGML
DTD derived from the HTML DTD.
– XML version:
• has familiar format
• can be analyzed and processed through DOM
• With XSL, SHOE markup can be both machine-
and human-readable.
Language Features
• Prevention of Contradiction
– assertions are permitted, retractions are not
– no negation
– no single-valued relations (relational sets
having only one value or a fixed number of
values)
– includes claimant as part of a claimed assertion.
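A sketch of why including the claimant avoids contradiction: each stored claim is a (claimant, assertion) pair, so two sources disagreeing only means "page 1 says X" and "page 2 says Y". With no negation and no single-valued relations, both can be kept, and the agent decides whom to trust. The claim layout and trust policy are assumptions for illustration.

```python
from collections import defaultdict

Assertion = tuple[str, ...]
Claim = tuple[str, Assertion]  # (claimant key, assertion)


def who_claims(claims: set[Claim]) -> dict[Assertion, set[str]]:
    """Group claims by assertion so an agent can see which sources assert what."""
    grouped: defaultdict[Assertion, set[str]] = defaultdict(set)
    for claimant, assertion in claims:
        grouped[assertion].add(claimant)
    return dict(grouped)


def believe(claims: set[Claim], trusted: set[str]) -> set[Assertion]:
    """Accept only assertions made by claimants the agent has chosen to trust."""
    return {assertion for claimant, assertion in claims if claimant in trusted}


claims = {
    ("ex:page1", ("age", "ex:alice", "23")),
    ("ex:page2", ("age", "ex:alice", "32")),
}
# Both claims coexist: they are statements about what each page says.
print(believe(claims, {"ex:page1"}))  # {('age', 'ex:alice', '23')}
```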
Language Features
• Extensibility and Versioning
– Shared Ontologies: two ontologies referring to
a common concept should both extend an
ontology in which that concept is defined.
– Each version of an ontology is a separate file
with a unique version number
– All versions of an ontology are accessible
– Ontologies can specify backward-compatibility
– Depends on compliance of ontology designers
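A sketch of how declared backward compatibility might be used: each version is a separate record with its own version number, and a newer version can list the earlier versions it remains compatible with. The `OntologyVersion` record, the ontology ID, and the URLs are hypothetical; as the slide notes, this only works if designers declare compatibility honestly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyVersion:
    """Hypothetical record for one version of an ontology (each version is its own file)."""
    ontology_id: str
    version: str
    url: str
    backward_compatible_with: frozenset[str] = frozenset()


def can_interpret(reader: OntologyVersion, data_ontology: str, data_version: str) -> bool:
    """True if data committed to (data_ontology, data_version) can be read with `reader`:
    either the exact version, or one that `reader` declares itself compatible with."""
    if reader.ontology_id != data_ontology:
        return False
    return data_version == reader.version or data_version in reader.backward_compatible_with


v1 = OntologyVersion("cs-dept", "1.0", "http://example.org/cs-dept-1.0.html")
v2 = OntologyVersion("cs-dept", "2.0", "http://example.org/cs-dept-2.0.html",
                     frozenset({"1.0"}))
print(can_interpret(v2, "cs-dept", "1.0"))  # True
print(can_interpret(v1, "cs-dept", "2.0"))  # False
```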
Related Work
• HTML
• Wrappers
• Ontobroker
• Web Analysis and Visualization
Environment (WAVE)
• Ontology Markup Language (OML)
• Conceptual Knowledge Markup Language
(CKML)
SHOE vs. RDF
• RDF drawbacks:
– RDF is a semantic network without inheritance;
just nodes connected with named links
– RDF has no mechanism for defining general
inferences
– no way to map between different
representations of the same concept.
– RDF schema can’t rename properties to a local
vocabulary (no equivalence)
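A sketch of the kind of renaming the slide says RDF Schema lacks: a hypothetical alias table maps terms in a local vocabulary to terms in a shared ontology, and claims are rewritten into the shared vocabulary before reasoning. The table contents and the `cs.` prefix are assumptions for illustration.

```python
# Hypothetical alias table: local vocabulary term -> term in a shared ontology.
RENAMES: dict[str, str] = {
    "prof": "Lecturer",
    "Lecturer": "cs.Professor",
    "boss": "cs.advisor",
}


def canonical(term: str) -> str:
    """Follow renamings until reaching a term with no further alias."""
    seen: set[str] = set()
    while term in RENAMES:
        if term in seen:
            raise ValueError(f"rename cycle at {term!r}")
        seen.add(term)
        term = RENAMES[term]
    return term


def normalize(assertion: tuple[str, ...]) -> tuple[str, ...]:
    """Rewrite the predicate of an assertion into the shared vocabulary."""
    predicate, *args = assertion
    return (canonical(predicate), *args)


print(canonical("prof"))                               # cs.Professor
print(normalize(("boss", "ex:alice", "ex:bob")))       # ('cs.advisor', 'ex:alice', 'ex:bob')
```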
SHOE vs. RDF
• RDF Drawbacks (cont’d):
– no way to track revision of a schema unless
schema maintainer uses a consistent naming
scheme for the URIs.
– Use of XML namespaces makes it difficult to
distinguish RDF from markup of a different DTD.
LKite: Ensuring that two object references are
matched when they conceptually refer to the same
object is an open problem.
Language Features
• Other features:
– Separation of ontologies and instances (unlike
RDF)
– N-ary relations
– Uniqueness of identification
• the system will only interpret two objects as
equivalent when they are truly equivalent
Final Notes
• Concerns:
– Versioning compliance depends on cooperation
of ontology designers
– Reliance on “market forces” to weed out bad
ontologies
– Relies on a central repository of ontologies
– Scalability yet to be proved
– Usability also yet to be proved (simple tools needed)
– Language issues (instance vs. category)