Digital Business Pathway

Markets for (Big) Data
Aija Leiponen, Cornell University
Joint work with Pantelis Koutroumpis and Llewellyn Thomas,
Imperial College London
2013: everybody is talking about big data
but… what is it?
Gartner: Emerging Technologies Hype Cycle, 2013
Web 3.0?
• The capacity to capture and store information is growing exponentially
• Sensor networks, social networks, administrative data, health records
• A boon for social science… and for business innovation?
Communication revolutions

Printing press
Steam engine
Telegraph
Telephone
Radio
Television

Networked data?
Agenda for today
1. How are data different from other intangible/digital assets?
2. How are data currently being traded, and how do the economic features of data influence the trading mechanisms?
3. How will the Internet of Things emerge, considering the economic features of data and the available and emerging trading mechanisms?
1. Creation of data value
• "22.7": observational data point from some instrument. Value?
• "°C": units, i.e. what is being measured. Metadata is crucial.
• "sensor 2292334": which instrument made the observation. Inalienability & provenance.
• "24032017", 60.1699°N 24.9384°E, "18.6 22.1 25.3 24.0 22.7 19.9": when and where observed; time series. Connected data.
• "Is it a lot or a little?": what is the environment in which the observation is made? How does it matter?
• Who cares about this? Judgment (models, analyses) & context (who, how).
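The layering on this slide can be sketched as a data structure: a bare value gains meaning as units, instrument, time, place and neighbouring observations are attached. This is a minimal sketch; the `Observation` class and its field names are hypothetical, and comparing the reading with the series mean is just one crude way to answer "Is it a lot or a little?".

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class Observation:
    """A sensor reading plus the metadata that gives it value (hypothetical schema)."""
    value: float     # the bare data point
    unit: str        # what is being measured
    sensor_id: str   # which instrument made the observation (provenance)
    date: str        # when, in the slide's DDMMYYYY form
    lat: float       # where: degrees north
    lon: float       # where: degrees east
    series: list[float] = field(default_factory=list)  # connected data: the same sensor over time

    def context(self) -> str:
        """A crude answer to 'is it a lot or a little?': compare with the series mean."""
        if not self.series:
            return "unknown (no connected data)"
        avg = mean(self.series)
        if self.value > avg:
            return "above average"
        if self.value < avg:
            return "below average"
        return "average"

obs = Observation(22.7, "°C", "sensor 2292334", "24032017", 60.1699, 24.9384,
                  [18.6, 22.1, 25.3, 24.0, 22.7, 19.9])
print(obs.context())  # above average (the series mean is about 22.1)
```

Drop any field and the reading loses part of its value: without `unit` the number cannot be interpreted, and without `series` there is nothing to compare it with.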
Data value capture
How can value (profit) be appropriated from data? That is, which aspects of data can generate market power?
• Control the data resource
• Control the metadata
• Control the connected data
• Control the analytical tools, models, intelligence
• Control the enabling platform
How to control the data resource AND maximize its value?
• NO intellectual property rights for data
• Secrecy: embed the data in a service
– Cannot license the data itself
• Database right (EU): prevents others from extracting and selling the whole database or a substantial part of it
– Does not cover insubstantial subsets
• Contracts: license the data via contractual agreement
– Can sue for breach of contract, but cannot prevent third parties from using the data
• Verification technologies: attach a distributed ledger to the data and track its trading
– Works fully with parties who care about provenance; perhaps not with others
• Closed network of partners: share data within a consortium through a combination of contracts, trust, reputation effects, monitoring and consortium rules
– The network/market must be small to be monitored and governed effectively
– No broader legal recourse in case of breach
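The "verification technologies" option can be illustrated with a toy append-only log in which each trade record carries a fingerprint of the data and the hash of the previous record, so any later rewrite of the trading history is detectable. This is a single-copy hash-chain sketch assuming SHA-256 fingerprints; it omits the consensus protocol that makes a real distributed ledger distributed, and `TradeLedger` and its methods are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

def record_hash(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of a trade record."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

class TradeLedger:
    """Toy hash-chained log of data trades (hypothetical; a single copy, no consensus)."""

    def __init__(self):
        self.blocks = []

    def record_trade(self, dataset: bytes, seller: str, buyer: str) -> dict:
        body = {
            "data_fingerprint": hashlib.sha256(dataset).hexdigest(),  # identifies the data
            "seller": seller,
            "buyer": buyer,
            "prev_hash": self.blocks[-1]["hash"] if self.blocks else GENESIS,
        }
        block = {**body, "hash": record_hash(body)}
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Recompute every hash and link; any edited record makes this False."""
        prev = GENESIS
        for block in self.blocks:
            body = {k: v for k, v in block.items() if k != "hash"}
            if body["prev_hash"] != prev or record_hash(body) != block["hash"]:
                return False
            prev = block["hash"]
        return True

    def provenance(self, dataset: bytes) -> list:
        """Owners of a dataset in trading order (assumes each trade starts from the last buyer)."""
        fingerprint = hashlib.sha256(dataset).hexdigest()
        trades = [b for b in self.blocks if b["data_fingerprint"] == fingerprint]
        if not trades:
            return []
        return [trades[0]["seller"]] + [t["buyer"] for t in trades]

ledger = TradeLedger()
readings = b"18.6 22.1 25.3 24.0 22.7 19.9"
ledger.record_trade(readings, "sensor owner", "aggregator A")
ledger.record_trade(readings, "aggregator A", "analytics firm")
print(ledger.provenance(readings))  # ['sensor owner', 'aggregator A', 'analytics firm']
print(ledger.verify())              # True
ledger.blocks[0]["buyer"] = "someone else"  # rewrite history
print(ledger.verify())              # False
```

The sketch also shows the limit noted on the slide: a buyer who copies `readings` and passes it on off-ledger leaves no trace, so the mechanism only binds parties who care about provenance.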
Differences between data and other intangible assets
Attribute | Record data | Content | Software | Currency | Invention
Information type | Raw records or structured databases | Knowledge (insights) | Knowledge (instructions) | Pure value | Knowledge (instructions)
Good type | Intermediate/final | Final | Final | Final | Intermediate
Alienability | Low/medium | Medium | High | High | High
Inferability | High | Low | Low | Zero | Zero
Excludability | Limited | Variable | Variable | High | Variable
Protection method | Secrecy or timing | Copyright or timing | Copyright, or patents in some cases | Distributed ledgers or other verification tech | Patents or secrecy
Protection aspect | Reuse | Expression (patterns) | Expression (patterns) or insight (invention) | Transaction value | Insight
Fungibility
Variable
Low
Low
Low
?
High
Characteristics of different data sources
Source of data | Confidentiality | Duration/useful life | Alienability | Fungibility | Inferability
Health care | High | >50 years | Low (health, retail, social network, locational) | High? | Medium?
Public sector administration | Medium | >50 years | Medium | Low | Low
Manufacturing/operations (sensor networks) | Medium | 10-20 years | Medium | Low | Medium?
Individual behavior | High | 1-5 years | Low (health, retail, social network) | High | High
Personal location data | Medium | 1-5 years | Medium | High | High
Summary (1)
The economics of data goods depends on an analysis of data characteristics.
• Data are heterogeneous across contexts
• Describing and classifying data and their institutional framework are necessary for understanding their commercialization potential
• Data goods differ substantially from other intangible goods in how their value is affected by:
– Excludability (protection)
– Provenance (metadata)
– Alienability (ongoing implications for subjects)
– Inferability (implications of data integration for subjects)
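Inferability is where data differ most from other intangibles: datasets that look harmless on their own can reveal facts about individuals once integrated. A toy sketch with made-up records; the quasi-identifier columns (postcode, birth year) and the `link` helper are hypothetical.

```python
# Two made-up datasets, each "anonymous" on its own.
health_records = [   # no names, only postcode and birth year
    {"postcode": "00100", "birth_year": 1970, "diagnosis": "diabetes"},
    {"postcode": "00150", "birth_year": 1985, "diagnosis": "asthma"},
]
loyalty_cards = [    # names, but no health information
    {"name": "Customer X", "postcode": "00100", "birth_year": 1970},
    {"name": "Customer Y", "postcode": "00200", "birth_year": 1990},
]

def link(left, right, keys):
    """Hypothetical helper: join two record lists on shared quasi-identifier columns."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    return [{**left_row, **right_row}
            for left_row in left
            for right_row in index.get(tuple(left_row[k] for k in keys), [])]

for match in link(health_records, loyalty_cards, ["postcode", "birth_year"]):
    print(match["name"], "->", match["diagnosis"])  # Customer X -> diabetes
```

Neither seller disclosed a named diagnosis, yet the buyer of both datasets can infer one; this is why the value and risk of data depend on what else it can be combined with.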
(2) Data Market Design
Market efficiency requires (A. Roth):
• Thickness/liquidity
• Low transaction costs
• Limited strategic behavior by participants
• Provenance
• Excludability
• Stable matching: no two participants would both prefer each other to their current matches
• Lack of "repugnance" (appropriateness/fairness)
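The stable-matching criterion can be made concrete with the deferred-acceptance algorithm of Gale and Shapley, which Roth's market-design work builds on and which always produces a stable matching. A minimal one-to-one sketch, assuming hypothetical data buyers and sellers who each rank every participant on the other side; real data markets rarely have such complete rankings, so this only illustrates the criterion.

```python
def deferred_acceptance(buyer_prefs, seller_prefs):
    """Buyer-proposing Gale-Shapley. Assumes equal numbers and complete, strict rankings.
    Returns {buyer: seller}."""
    rank = {s: {b: i for i, b in enumerate(prefs)} for s, prefs in seller_prefs.items()}
    next_choice = {b: 0 for b in buyer_prefs}  # index of the next seller each buyer proposes to
    held = {}                                  # seller -> buyer tentatively held
    free = list(buyer_prefs)
    while free:
        buyer = free.pop()
        seller = buyer_prefs[buyer][next_choice[buyer]]
        next_choice[buyer] += 1
        current = held.get(seller)
        if current is None:
            held[seller] = buyer
        elif rank[seller][buyer] < rank[seller][current]:
            held[seller] = buyer
            free.append(current)               # the displaced buyer proposes again later
        else:
            free.append(buyer)                 # rejected: try the next seller
    return {b: s for s, b in held.items()}

def is_stable(matching, buyer_prefs, seller_prefs):
    """True if no buyer and seller both prefer each other to their partners (no blocking pair)."""
    buyer_of = {s: b for b, s in matching.items()}
    for b, prefs in buyer_prefs.items():
        for s in prefs[: prefs.index(matching[b])]:  # sellers b ranks above its match
            if seller_prefs[s].index(b) < seller_prefs[s].index(buyer_of[s]):
                return False
    return True

buyers = {"B1": ["S1", "S2"], "B2": ["S1", "S2"]}   # hypothetical rankings
sellers = {"S1": ["B2", "B1"], "S2": ["B1", "B2"]}
matching = deferred_acceptance(buyers, sellers)
print(matching)                              # {'B2': 'S1', 'B1': 'S2'}
print(is_stable(matching, buyers, sellers))  # True
```

Both buyers want S1, but S1 prefers B2; pairing B1 with S1 instead would leave a blocking pair (B2, S1), which `is_stable` detects.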
Types of market matching mechanisms
Matching | Marketplace design | Terms of exchange | Examples
One-to-one | 1. Bilateral | Negotiated | Data brokers
One-to-many | 2. Dispersal | Standardized | Twitter API
Many-to-one | 3. Harvest | Implicit barter | Google Services
Many-to-many | 4. Multilateral | Standardized or negotiated | InfoChimps, Microsoft Azure
“The (unfulfilled) promise of Data Marketplaces”, P. Koutroumpis, A. Leiponen, L. Thomas
1. Bilateral: Proprietary data vs. other IP licenses
License feature | Data | Patents | Trademarks | Copyrights
License duration | 1-2 years | 10-20 years | Up to 20 years | 1-5 years
Exclusivity | Rare | Frequent | Often regional | Rare
Confidentiality | Frequent | Rare | Rare | Rare
Use restrictions | Abundant | Concise | Specific | Concise
Warranty | ‘As is’ | Frequent | -- | --
Obligation & remedy | Correct/refund/replace/update | -- | -- | --
Audit | Frequent | -- | -- | --
Modal fee schedule | Annual subscription | % of sales or flat fee | NA | Per device
“Data Contracts”, P. Koutroumpis, A. Leiponen, L. Thomas & J. Wu (2016)
2. Dispersal:
366 Open Data Contracts (T&C)
[Figure: pie chart of contracts (Academic 37%, Government 21%, Commercial 19%, Non-profit 17%, Personal 4%, International 2%) and bar charts on a 0–100% scale for contract type (Proprietary license, Open Database Commons, GNU, FOI / Open Government), data sharing (Sharing permitted, Share alike, Not noted, No sharing) and commercial use (Commercial use permitted, No commercial use permitted, Not noted)]
3. Harvest
Facebook Terms; Data Policy
Sharing Your Content and Information

You own all of the content and information you post on Facebook, and you can control how it
is shared through your privacy and application settings.

For content that is covered by intellectual property rights, like photos and videos (IP content), you
specifically give us the following permission: you grant us a non-exclusive, transferable, sub-licensable,
royalty-free, worldwide license to use any IP content that you post on or in connection with Facebook
(IP License).

When you publish content or information using the Public setting, it means that you are allowing
everyone, including people off of Facebook, to access and use that information, and to associate it
with you (i.e., your name and profile picture).
About Advertisements and Other Commercial Content Served or Enhanced by Facebook

You give us permission to use your name, profile picture, content, and information in connection
with commercial, sponsored, or related content (such as a brand you like) served or enhanced by us.

You permit a business or other entity to pay us to display your name and/or profile picture with
your content or information, without any compensation to you. If you have selected a specific
audience for your content or information, we will respect your choice when we use it.

We do not give your content or information to advertisers without your consent.

You understand that we may not always identify paid services and communications as such.
https://www.facebook.com/legal/terms/update
4. Multilateral: Data Platform
[Diagram: a data marketplace connecting supply (data providers), demand (customers) and complements (algorithm providers, expert advice)]
• Selling data through a platform
• The platform provider takes the risk, provides services and takes a cut
• Technical challenges in standardization and rights management
• Strategic challenges in revenue sharing, chicken & egg (switching costs), loss of control, etc.
Summary (2): Data marketplaces meet Roth
Marketplace design | Bilateral | Dispersal | Harvest | Multilateral
Liquidity | Low | High | High | High
Transaction costs | High | Low | Low | Low
Provenance | Clear | Unclear | Unclear | Medium
Excludability | Medium | Low | Low | Low
Stability of matching | Low? | ? | ? | High?
• Market liquidity and stability are inversely related to transaction costs and excludability (strategic behavior)
• With current data market mechanisms, you can achieve large markets with little control or small markets with greater control
(3) How do we build an Industrial Internet?
(a) Isolated industrial clusters/data pools (cf. patent pools)
(b) Adopt verification technologies such as distributed ledgers
(a) Common Pool Resources (Ostrom 1990)
• Common pool resource (CPR): it is costly, but not impossible, to exclude potential beneficiaries from obtaining benefits from its use
• CPRs are prone to the Tragedy of the Commons (TOTC)
• Collective action resolves the TOTC and maintains the resource if:
– Clearly defined boundaries identify legitimate users
– Rules define how the CPR should be used, with metarules for changing the rules
– Effective monitoring enforces the rules and boundaries
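Ostrom's conditions map naturally onto access-control logic for a data pool. A hypothetical sketch: an explicit member set (boundaries), a set of permitted uses (rules), a strict-majority vote to change those uses (one simple metarule), and a log of refused requests (monitoring). The class name, the majority threshold and the use categories are all assumptions.

```python
class DataPool:
    """Toy consortium data pool following Ostrom's conditions (hypothetical names and thresholds)."""

    def __init__(self, members, permitted_uses):
        self.members = set(members)                # boundaries: the legitimate users
        self.permitted_uses = set(permitted_uses)  # rules: how the pooled data may be used
        self.log = []                              # monitoring: every request is recorded

    def access(self, member, use):
        """Grant access only to members, and only for permitted uses; log the outcome."""
        allowed = member in self.members and use in self.permitted_uses
        self.log.append((member, use, allowed))
        return allowed

    def change_rules(self, new_uses, votes_in_favour):
        """Metarule: the rules change only if a strict majority of members votes for it."""
        supporters = set(votes_in_favour) & self.members  # votes from outsiders do not count
        if 2 * len(supporters) > len(self.members):
            self.permitted_uses = set(new_uses)
            return True
        return False

    def violations(self):
        """Refused requests, i.e. attempted breaches of the boundaries or the rules."""
        return [(member, use) for member, use, allowed in self.log if not allowed]

pool = DataPool({"firm A", "firm B", "firm C"}, {"quality analysis"})
pool.access("firm B", "quality analysis")    # True
pool.access("outsider", "quality analysis")  # False: outside the boundary
pool.access("firm B", "resale")              # False: not a permitted use
print(pool.violations())  # [('outsider', 'quality analysis'), ('firm B', 'resale')]
print(pool.change_rules({"quality analysis", "predictive maintenance"}, ["firm A", "firm B"]))  # True
```

The governance works here because the member set is small and known; with an anonymous, open-ended set of participants, both the vote and the monitoring log would be much harder to run.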
Smart steel data consortium

SSAB recently finalised an R&D project exploring SmartSteel, a digital platform enabling
steel to be ‘loaded with knowledge’.

A unique identity code in the steel plate links the plate to its information, providing customers and their machinery with appropriate data and instructions to help them select and use SSAB steels.

The idea is to share expert knowledge in steel.

A platform built on cloud-based data that contains instructions for different stakeholders in the value
chain on how to use the steel. “By accessing and adding data on the platform, our customers would be
able to make optimal use of the steel and avoid costly and time-consuming failures and misuse.”

Pilot R&D project: SSAB, Meyer Turku, Cajo Technologies, Aalto U, VTT & DIMECC

If steel could provide all the data accumulated during the manufacturing and transportation chain, it
would help us significantly and would be the first step towards transparent value chains.

SSAB invites customers, process equipment manufacturers and other actors to join the development
work.
(b) Decentralized Data Platform: a distributed ledger for data?
[Diagram: Decentralized marketplace. User content & sensor data pass through tagging & cleaning, trading between aggregators, and processing (participants A–I); each transaction (transactionXX1, transactionXX2, …) is recorded on a public ledger]
• "Bottom-up" approach to information exchange
• Users and sensors collect data
• Aggregators can buy/sell data for profit; data owners get paid and have control over future uses
• Processing, analysis and insights are separate
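The claim that data owners "get paid and have control over future uses" can be sketched as a settlement rule an aggregator might apply when reselling pooled data: leave out owners who have not consented to the buyer's use, then split the price net of the aggregator's fee in proportion to records contributed. The proportional split, the 10% fee and all names here are assumptions for illustration, not a description of any existing platform.

```python
from dataclasses import dataclass, field

@dataclass
class Contribution:
    owner: str
    records: int                                           # records this owner put into the bundle
    permitted_uses: set[str] = field(default_factory=set)  # uses the owner has consented to

def settle_sale(contributions, price_cents, use, aggregator_fee_pct=10):
    """Hypothetical settlement rule for one resale of a pooled data bundle.

    Owners who have not permitted `use` are excluded (their control over future uses);
    the price net of the aggregator's fee is split among the remaining owners in
    proportion to records contributed. Amounts are integer cents.
    Returns (payouts_by_owner, aggregator_cents).
    """
    eligible = [c for c in contributions if use in c.permitted_uses and c.records > 0]
    if not eligible:
        return {}, 0                          # nobody consented, so no sale takes place
    owners_share = price_cents * (100 - aggregator_fee_pct) // 100
    total_records = sum(c.records for c in eligible)
    payouts = {c.owner: owners_share * c.records // total_records for c in eligible}
    aggregator_cents = price_cents - sum(payouts.values())  # fee plus any rounding remainder
    return payouts, aggregator_cents

bundle = [
    Contribution("user 1", 300, {"research", "advertising"}),
    Contribution("sensor owner", 100, {"research"}),
    Contribution("user 2", 600, {"advertising"}),
]
print(settle_sale(bundle, 10_000, "research"))
# ({'user 1': 6750, 'sensor owner': 2250}, 1000)
```

Integer cents keep the split exact; rounding remainders go to the aggregator so that payouts never exceed the price.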
Multilateral marketplaces meet Roth
Marketplace design | Bilateral | Dispersal | Harvest | Centralized multilateral | Decentralized multilateral with DLT | Collective action/consortium
Liquidity | Low | High | High | High | High | Medium/low
Transaction costs | High | Low | Low | Low | Low/medium | High
Provenance | Clear | Unclear | Unclear | Medium | Clear | Clear
Excludability | Medium | Low | Low | Low | High? | Medium
Stability of matching | Low? | ? | ? | High? | High? | Low?
• Distributed Ledger Technologies could conceivably enable large-scale, anonymous multilateral data markets by enforcing excludability
• Data consortia can enable small-scale markets based on identity and reputation, but will they be sufficiently valuable and stable?
http://hackingdistributed.com/2016/08/04/byzcoin/
https://www.technologyreview.com/s/600781/technical-roadblock-might-shatter-bitcoin-dreams/
Multilateral marketplaces meet Ostrom
Marketplace design | Boundaries | Rules | Monitoring | Types of data
Bilateral | Clear | Strong | Effective | High value/high confidentiality
Dispersal | Unclear | Weak | Minimal | Low value/low confidentiality
Harvest | Unclear | Weak | Minimal | Low value/low confidentiality
Centralized multilateral | Medium | Medium | Weak | Medium value?/medium confidentiality?
Decentralized multilateral with DLT | Unnecessary | Strong | Effective | High value/high confidentiality
Collective action/consortium | Clear | Strong | Effective | High value/high confidentiality, few sources
Collective governance is feasible in small settings; verification technology is required to achieve large scale for high-value data
Summary (3)
• Data really is a different kind of intellectual asset
– It requires careful attention to technical and institutional detail
• Trading regimes: secrecy & trust, or verification technology (DLT), or ‘FREE’
– Bilateral trading sets up a complex relationship with remedies, audits and subscriptions as contractual features
– Decentralized multilateral trading based on verification technology is anonymous and one-off; it is probably suited to high-value data because of computing costs
– Collective data pooling can resolve control problems but will not create market liquidity