Linking Environmental Data and Samples May 30

The last decimetre for linked data and
geo-samples: when pragmatics ace
semantics and syntax
Linking Environmental Data and Samples
May 30, 2017
Canberra, AUSTRALIA
Peter Fox; [email protected], @taswegian –
Tetherless World Constellation
Rensselaer Polytechnic Institute
*also Woods Hole Oceanographic Institution
Samples and Data?
Samples - born digital?
• Nope
• Well, mostly not
– A topic for a bar discussion (or panel)
• But, increasingly their metadata can be!
• Opportunity
– Ineffective – lack of current technical
capability, e.g. linked-data
Premise 1
• “we” remain comfortable for “commodity
offerings to rule our science”
– Without being part of the change
Premise 2
• Producers rather than consumers
dictate what is captured and in what
format
– Approach what technical solutions are
pursued from a “use” perspective
– Where would we be if OGC had been
really interested in samples?
Premise 3
• “we've” not hacked into the data/
metadata generation pipelines to the
extent that we must so that consumers
(the pragmatists) can use and learn
from extant samples and sample
collections
– Stay tuned for this part…
Caveat emptor
• Parts of what follows DO exist …
• However, not as part of a coherent
whole; neither architecture nor services
• Thanks to Bryan Broderic and Mark
Gahegan for long-in-the-past
discussions but they cannot be held
responsible for what comes next ;-)
Premise 1
• “we” remain comfortable for “commodity
offerings to rule our science”
– Without being part of the change
Metadata encoding(s)
• EXIF/IPTC for images
• GeoTIFF for images
• IGSN for samples (XML)
• Collect more automatically of course…
• However, it is how you use it, structure
it, and what “schemas” to adopt…
Schema.org/datasets (slide from Ruth Duerr)
NSIDC landing page (slide from Ruth Duerr)
What “Google” sees… (slide from Ruth Duerr)
Thus…please…
• A Samples and SampleCollection
extension for schema.org
• Revisions to Dataset and
DatasetCollection schema to add
“sample” fields, e.g. fromSample,
isSampleType, …
• This would mean
– IGSN metadata in RDF (RDFa)
– Landing pages (Samples, Collections, ?)
– And more …
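As a rough sketch of what such an extension could look like, the snippet below builds JSON-LD for a sample landing page and for a dataset that points back to its sample. `Sample`, `SampleCollection`, `fromSample` and `isSampleType` are the proposals from this slide, not terms schema.org defines; the `ex:` namespace, the IGSN value and the names are placeholders invented for illustration.

```python
import json

# Hypothetical namespace for the proposed extension terms; schema.org
# does not define Sample, SampleCollection, fromSample or isSampleType.
EX = "https://example.org/sample-ext#"
CONTEXT = {"@vocab": "https://schema.org/", "ex": EX}


def sample_jsonld(igsn, name, sample_type, collection):
    """Return JSON-LD for a sample landing page (illustrative only)."""
    return {
        "@context": CONTEXT,
        "@type": "ex:Sample",
        "identifier": igsn,
        "name": name,
        "ex:isSampleType": sample_type,
        # Reusing schema.org's isPartOf for collection membership is an assumption.
        "isPartOf": {"@type": "ex:SampleCollection", "name": collection},
    }


def dataset_jsonld(name, sample_igsn):
    """Return JSON-LD for a Dataset that links back to its source sample."""
    return {
        "@context": CONTEXT,
        "@type": "Dataset",
        "name": name,
        "ex:fromSample": {"@type": "ex:Sample", "identifier": sample_igsn},
    }


# Placeholder identifier and names, not real records.
sample = sample_jsonld("IGSN:XXXX00001", "Example drill core", "core", "Example collection")
dataset = dataset_jsonld("X-ray scan of example drill core", sample["identifier"])
print(json.dumps([sample, dataset], indent=2))
```

Embedding output like this in a landing page is one way a crawler could see the sample-to-dataset link; whether RDFa or JSON-LD is the better carrier is left open here.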
Premise 2
• Producers rather than consumers dictate
what is captured and in what format, so
– Approach what technical solutions are pursued
from a “use” perspective
– Cf. schema.org, just presented; this has
implications for which fields are in the schema
and which are populated, e.g.
• isdivisible
• isavailable (vs. exists)
– sx.igsn.org (cf. dx.crossref.org or …) with
content negotiation (sample “citation” – maybe)
– Add samples to your ORCID profile ;-)
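A minimal sketch of the content-negotiation idea: the same sample identifier is requested with different `Accept` headers, so a person gets a landing page and a machine gets RDF or a formatted citation. `sx.igsn.org` is the hypothetical resolver named on the slide and is not assumed to exist; the media types offered and the identifier are also assumptions. The request is only built, never sent.

```python
import urllib.request

# Hypothetical resolver from the slide; not known to exist.
RESOLVER = "https://sx.igsn.org/"

# Media types a client might ask for. Which of these a sample resolver
# would honour is an assumption made for this example.
FORMATS = {
    "landing_page": "text/html",
    "rdf": "text/turtle",
    "json_ld": "application/ld+json",
    "citation": "text/x-bibliography",
}


def build_request(igsn, want):
    """Build (but do not send) a content-negotiated request for a sample."""
    return urllib.request.Request(
        RESOLVER + igsn,
        headers={"Accept": FORMATS[want]},
        method="GET",
    )


# Placeholder identifier, not a registered IGSN.
req = build_request("XXXX00001", "citation")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

The point of the design is that one identifier serves both audiences: the consumer chooses the representation, rather than the producer fixing it in advance.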
Data-Information-Knowledge Ecosystem
Figure labels: Producers, Consumers; Experience; Data (Creation, Gathering); Information (Presentation, Organization); Knowledge (Integration, Conversation); Inference; Context; Sample; Transduction
Producers: Quality Control, Fitness for Purpose, Trustee
Consumers: Quality Assessment, Fitness for Use, Trustor
Semiotic model
Semiotics
• Semiotics, also called semiotic studies or
semiology, is the study of sign processes
(semiosis), or signification and communication,
signs and symbols
Figure labels: Compute Entropy/Conditional Entropy; “Safety/navigability”; “Egg code”; “Thickness, age, etc. of ice”
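The figure names entropy and conditional entropy without saying which variables they are computed over. As one possible reading, the sketch below treats an ice-chart code as the sign and a navigability call as what a user needs from it, and measures how much uncertainty about the latter remains once the former is known. The observations are toy values invented for illustration, not data from the talk.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy H(X), in bits, of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(pairs):
    """H(Y|X), in bits, from (x, y) pairs, using H(X, Y) - H(X)."""
    return entropy(pairs) - entropy([x for x, _ in pairs])


# Toy data, invented for illustration: (ice-chart code, navigability call).
observations = [
    ("thin", "safe"), ("thin", "safe"),
    ("thick", "unsafe"), ("thick", "unsafe"),
]

h_nav = entropy([y for _, y in observations])      # uncertainty about navigability
h_nav_given_code = conditional_entropy(observations)  # what is left once the code is known
print(h_nav, h_nav_given_code)
```

In this toy case the code determines the call exactly, so the conditional entropy drops to zero; real charts would leave some residual uncertainty.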
Premise 3
• “we've” not hacked into the data/
metadata generation pipelines to the
extent that we must so that consumers
(the pragmatists) can use and learn
from extant samples and sample
collections
– See Premise 1 and 2
– Revisit Architecture(s) and Services
– Conceptual and Logical models v. schema
(Information) Architecture
• Definition:
– “is the art of expressing a model or
concept of information used in activities
that require explicit details of complex
systems” (wikipedia)
– “… I mean architect as in the creating of
systemic, structural, and orderly principles
to make something work - the thoughtful
making of either artifact, or idea, or policy
that informs because it is clear.” Wurman
More detail to connect us
• “The term information architecture
describes a specialized skill set which
relates to the interpretation of information
and expression of distinctions between
signs and systems of signs.” (wikipedia,
emphasis added)
“Information architecture is the
categorization of information into a
coherent structure, preferably one
that the most people can
understand quickly, if not inherently.”
Semiotic triangle
• When you build an information system
(elements, relations, operation), it has
“SYMBOLS” to stand for “SOMETHING”
• Design of your symbols and how they go
together (architecture) enables the
“THOUGHT” (or not)
Figure labels: “A sample I can USE.”; “Sample, type, etc.”
And yet, I’m still not done..
http://4.bp.blogspot.com/-7mYclB2oypk/TWrlhBPvHxI/AAAAAAAAALc/mwjhBbuZ9kU/s1600/yawn4.jpg
+ Provenance (please)
Mapping the many sample use cases into PROV-O
Figure labels: Investigator, Drill core, X-ray (each attached to an isA link)
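One way to read the isA links is that each sample concept is typed with a PROV-O class. The sketch below assumes a drill core is a `prov:Entity`, an investigator a `prov:Agent`, and an X-ray scan a `prov:Activity`; that pairing, the `example.org` namespace and the individual names are assumptions for illustration, while the classes and properties are standard PROV-O terms.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "https://example.org/"  # placeholder namespace, invented for illustration

# Assumed isA pairing: individual -> PROV-O class.
IS_A = {
    "drillCore1": "Entity",
    "investigator1": "Agent",
    "xray1": "Activity",
}

# Relations between the individuals, using standard PROV-O properties.
RELATIONS = [
    ("drillCore1", "wasAttributedTo", "investigator1"),
    ("xray1", "used", "drillCore1"),
    ("xray1", "wasAssociatedWith", "investigator1"),
]


def to_ntriples():
    """Serialise the typing and relations as N-Triples lines."""
    lines = [f"<{EX}{s}> <{RDF_TYPE}> <{PROV}{c}> ." for s, c in IS_A.items()]
    lines += [f"<{EX}{s}> <{PROV}{p}> <{EX}{o}> ." for s, p, o in RELATIONS]
    return lines


for line in to_ntriples():
    print(line)
```

Other use cases (sub-sampling, curation, loans) would add further activities and derived entities along the same pattern.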
Born digital?
• Until samples and people are born
digital, social and cultural
considerations will be present (whole
session on that later in the week, so I
did not want to pre-empt it)
• We know how to do everything I
presented (and more I did not)
• Ready? Go.
• Me: [email protected], @taswegian
Informatics enables a new approach
• Use cases
– Pragmatics
• Stakeholders
• Distributed
authority
• Access control
• Semantics!
• Maintaining
Identity
RPI Tetherless World Constellation tw.rpi.edu
• Government Data
• Health care/Life Sciences
• Environmental Informatics
• Future Web (Web Science, Policy, Social): Hendler
• Xinformatics (Data Science, Semantic eScience, Data Frameworks): Fox
• Semantic Foundations (Knowledge Provenance; Inference, Trust): McGuinness
• Senior scientists + ~40 = Post-docs, Staff, Grad, UGrad
• Lots of technology but the oldest building on campus!
Met-uh-dat-ah