Quantifying Qualitative Data – Big Time!
Jukka Sihvonen©
Department of Accounting and Finance
Workflow            Financial research      Literary research
Data acquisition    minute(s)               month(s)
Data analysis       minute(s)               month(s)
Results             Quantitative            Qualitative
                    - Replicable            - Not replicable
                    - Scalable              - Not scalable
                    - Easily revisable      - Revision is hard
Service
Jukka Sihvonen© | Big Data and Digitalization
Traditional way
Qualitative data
- Photos
- Movies
- Speech
- Text
Analysis
at the office
by hand
Results
Application Programming Interface (API)
Data refining API
Data API
Qualitative data
- Photos
- Movies
- Speech
- Text
Analysis API
Intelligence API
Replicability – Speed – Possibilities
Results
Data APIs – retrieve machine readable data
Twitter example: @univaasa
Request …
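import twitter  # python-twitter package

# my_credentials: placeholder dict with consumer_key, consumer_secret,
# access_token_key and access_token_secret
api = twitter.Api(**my_credentials)
tweets = api.GetSearch(term="univaasa", count=10)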
Twitter API
... Response
Examples
- Wikipedia
- Facebook
- Twitter
- Instagram
- Suomi24
- Yle
- Google Maps
- Accuweather
- Scopus
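A data API typically answers a request with structured text such as JSON, which is what makes the data machine readable. The sketch below is a minimal illustration, assuming a response shaped like the made-up `sample_response` string. The field names `user`, `text` and `created` are hypothetical and do not reproduce the schema of any service listed above.

```python
import json

# Made-up response body for illustration only; real APIs define their own fields.
sample_response = """
[
  {"user": "example_account", "text": "First example post", "created": "2017-01-01"},
  {"user": "example_account", "text": "Second example post", "created": "2017-01-02"}
]
"""


def to_rows(response_body):
    """Parse a JSON array of posts into (created, user, text) tuples."""
    posts = json.loads(response_body)
    return [(p["created"], p["user"], p["text"]) for p in posts]


rows = to_rows(sample_response)
for row in rows:
    print(row)
```

Once the response is a list of tuples, it can be written to a spreadsheet or passed on to a refining or analysis API.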
Data refining APIs – metadata and converting
Examples
- Voice to gender
- Names to ethnicity
- Novel to characters
- Articles to abstracts
- Email to language
Microsoft Cognitive API
Speech to text
Photo to text
HTML to text
PDF to text
Analysis APIs – insights from non-numerical data
Emotions…
anger: 0.00,
contempt: 0.00,
disgust: 0.00,
fear: 0.00,
happiness: 0.73,
neutral: 0.26,
sadness: 0.00,
surprise: 0.00
Me!
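An emotion result like the one above becomes a single data point once it is reduced to its strongest label. The sketch below uses the scores from the slide; the function name `dominant_emotion` is an illustrative choice, not part of any API.

```python
# Emotion scores as shown on the slide.
scores = {
    "anger": 0.00, "contempt": 0.00, "disgust": 0.00, "fear": 0.00,
    "happiness": 0.73, "neutral": 0.26, "sadness": 0.00, "surprise": 0.00,
}


def dominant_emotion(emotion_scores):
    """Return the label with the highest score."""
    return max(emotion_scores, key=emotion_scores.get)


print(dominant_emotion(scores))  # happiness
```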
Personality…
Rifleman’s Creed:
“This is my rifle. There are many like it, but this one is mine. My rifle is my best friend. It is my life. I must master it as I must master my life. My rifle, without me, is useless...”
Result: inner-directed, strict, shrewd, skeptical, restrained
Concepts…
Lord’s Prayer:
“Our Father, who art in heaven, hallowed be thy name, thy kingdom come, thy will be done, on earth as it is in heaven. Give us this day our daily bread and forgive us our debts…”
Result: Christianity, Christian prayer, Linguistics, Gospel of Matthew, Lord's Prayer
Examples
face to emotion, speech to person, diary to personality, story to concept, correspondence to tone
Intelligence APIs – teach machine to classify
Training data
Three Musketeers          Moby Dick          Crime and Punishment
Count de Rochefort        Ishmael            Dmitri Razumikhin
Monsieur Bonacieux        Captain de Deer    Andrei Lebezyatnikov
Duke of Buckingham        Dough Boy          Nastasya Petrovna
Bazin                     Starbuck           Porfiry Petrovich
Felton                    Ahab               Katerina Marmeladov
Anne of Austria           Flask              Pyotr Luzhin
Athos                     Fedallah           Pulcheria Raskolnikov
Kitty                     Stubb              Alexander Zamyotov
Constance de Bonacieux    Captain Bildad     Ilya Petrovich
Planchet                  Queequeg           Rodion Raskolnikov
Monsieur de Tréville      Daggoo             Lizaveta Ivanovna
Milady de Winter          Elijah             Sofya Marmeladov
Porthos                   Captain Peleg      Alyona Ivanovna
Testing data
Character              Which book?             Accuracy
D'Artagnan             Three Musketeers        True
Grimaud                Three Musketeers        True
Aramis                 Three Musketeers        True
Mousqueton             Three Musketeers        True
Captain Boomer         Moby Dick               True
Father Mapple          Moby Dick               True
Tashtego               Moby Dick               True
Pip                    Moby Dick               True
Semyon Marmeladov      Crime and Punishment    True
Arkady Svidrigailov    Crime and Punishment    True
Zossimov               Moby Dick               False
Avdotya Raskolnikov    Crime and Punishment    True
IBM Watson
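The True/False outcomes in the testing data can be summarised as a single accuracy figure. The sketch below only counts the outcomes listed on the slide; it does not call IBM Watson or reproduce how the classifier was trained.

```python
# Accuracy column of the testing data, in slide order (True = correct book).
outcomes = [True, True, True, True, True, True,
            True, True, True, True, False, True]


def accuracy(results):
    """Share of test cases classified correctly."""
    return sum(results) / len(results)


print(f"{sum(outcomes)} of {len(outcomes)} correct ({accuracy(outcomes):.0%})")
```

The single miss is Zossimov, a Crime and Punishment character assigned to Moby Dick.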
Scaling up – literary research
For-loop that sends textual material to API:
For Each Book in Library:
    For Each Page in Book:
        Result(Page) <= Send(Page, API)
    Next Page
Next Book
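The loop above could be written in Python roughly as follows. This is a sketch under stated assumptions: `send`, `toy_sentiment` and `analyze_library` are hypothetical names, and the word-counting scorer is only a local placeholder for a real sentiment API call.

```python
def send(page, api):
    """Hypothetical stand-in for one API call: apply `api` to the page text."""
    return api(page)


def toy_sentiment(text):
    """Placeholder scorer: count of 'good' minus count of 'bad'."""
    words = text.lower().split()
    return words.count("good") - words.count("bad")


def analyze_library(library, api):
    """Return {(book_title, page_number): result} for every page."""
    results = {}
    for title, pages in library.items():
        for number, page in enumerate(pages, start=1):
            results[(title, number)] = send(page, api)
    return results


# Made-up input: one book with three short pages.
library = {"Example book": ["a good day", "a bad bad night", "nothing new"]}
results = analyze_library(library, toy_sentiment)
print(results)
```

Keying the results by book and page keeps every score traceable to its source text, which is what makes the analysis replicable.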
Empirical exercise: Anne Frank’s diary
Library: none
Book: The Diary of a Young Girl
API: Watson Sentiment Analysis
Figure: Cumulative Standardized Sentiment 1942 – 1944
Figure annotations:
- Fritz Pfeffer joins the annex
- “The sun is shining … I think spring is inside me. I feel spring awakening”
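The slide does not say how the cumulative standardized sentiment series was computed. One plausible reading, assumed here, is to z-score each entry's sentiment against the whole sample and then take the running sum. The input scores below are made up for illustration.

```python
from itertools import accumulate
from statistics import mean, pstdev


def cumulative_standardized(scores):
    """Z-score each value, then return the running sum of the z-scores."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: every z-score is zero.
        return [0.0] * len(scores)
    return list(accumulate((s - mu) / sigma for s in scores))


page_scores = [1.0, 3.0, 1.0, 3.0]  # made-up per-entry sentiment scores
print(cumulative_standardized(page_scores))  # [-1.0, 0.0, -1.0, 0.0]
```

Under this assumption the series rises during stretches of above-average sentiment and falls during below-average ones, and it always ends at zero because the z-scores sum to zero over the full sample.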
Scaling up – communication studies
For-loop that sends images to API:
For Each Movie in Collection:
    For Each Frame in Movie:
        Result(Frame) <= Send(Frame, API)
    Next Frame
Next Movie
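For video, the per-frame results usually need to be aggregated before they say anything. The sketch below tallies the strongest emotion in each frame. `fake_emotion_api` is a hypothetical stand-in that returns made-up scores already attached to each frame; a real run would send image data to an emotion API instead.

```python
from collections import Counter


def dominant(scores):
    """Label with the highest score in one frame's result."""
    return max(scores, key=scores.get)


def tally_emotions(frames, api):
    """Send each frame to `api` and count the dominant emotion per frame."""
    return Counter(dominant(api(frame)) for frame in frames)


def fake_emotion_api(frame):
    """Hypothetical stand-in: frames here already carry their scores."""
    return frame["scores"]


# Made-up frames for illustration only.
frames = [
    {"scores": {"anger": 0.6, "neutral": 0.4}},
    {"scores": {"anger": 0.2, "neutral": 0.8}},
    {"scores": {"anger": 0.7, "neutral": 0.3}},
]
counts = tally_emotions(frames, fake_emotion_api)
print(counts.most_common(1))  # [('anger', 2)]
```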
Empirical exercise: Trump-Clinton Debate
Collection: none
Movie: Final presidential debate
API: Microsoft Emotion API
Findings
- Trump’s primary facial expression is anger, more so when he does not have the floor
- Clinton emphasizes her message by raising her eyebrows and expresses integrity by smiling
Key takeaways
- Non-numeric data can be obtained and analyzed at an industrial scale
- The scientific workflow can be automated by chaining APIs cleverly
- Huge implications for what can be researched and at what cost
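Chaining APIs amounts to feeding the output of one step into the next. The sketch below shows the idea with plain function composition. `fetch`, `html_to_text` and `count_positive` are hypothetical local stand-ins for a data API, a refining API and an analysis API; none of them calls a real service.

```python
from functools import reduce


def chain(*steps):
    """Compose steps left to right: each output is the next step's input."""
    return lambda data: reduce(lambda value, step: step(value), steps, data)


def fetch(source):
    """Stand-in for a data API: return made-up HTML for a source name."""
    return f"<p>{source} text with good news</p>"


def html_to_text(html):
    """Stand-in for a refining API: strip the paragraph tags."""
    return html.replace("<p>", "").replace("</p>", "")


def count_positive(text):
    """Stand-in for an analysis API: count occurrences of 'good'."""
    return text.split().count("good")


pipeline = chain(fetch, html_to_text, count_positive)
print(pipeline("example"))  # 1
```

Because the pipeline is a single callable, rerunning the whole study on new material is one function call.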