Quantifying Qualitative Data – Big Time!
Jukka Sihvonen©
Department of Accounting and Finance

Workflow
                    Financial research     Literary research
Data acquisition    minute(s)              month(s)
Data analysis       minute(s)              month(s)
Results             Quantitative           Qualitative
                    - Replicable           - Not replicable
                    - Scalable             - Not scalable
                    - Easily revisable     - Revision is hard

Jukka Sihvonen© | Big Data and Digitalization

Traditional way
Qualitative data
- Photos
- Movies
- Speech
- Text
Analysis at the office by hand
Results

Application Programming Interface (API)
Qualitative data
- Photos
- Movies
- Speech
- Text
APIs
- Data API
- Data refining API
- Analysis API
- Intelligence API
Results
Replicability – Speed – Possibilities

Data APIs – retrieve machine readable data
Twitter example: @univaasa
Request
    …
    import twitter
    # my_credentials: dict with consumer_key, consumer_secret, access_token_key, access_token_secret
    api = twitter.Api(**my_credentials)
    tweets = api.GetSearch(term="univaasa", count=10)
Twitter API
    ...
Response
Examples
- Wikipedia
- Facebook
- Twitter
- Instagram
- Suomi24
- Yle
- Google Maps
- Accuweather
- Scopus

Data refining APIs – metadata and converting
Examples
- Voice to gender
- Names to ethnicity
- Novel to characters
- Articles to abstracts
- Email to language
- Speech to text
- Photo to text
- HTML to text
- PDF to text
Microsoft Cognitive API

Analysis APIs – insights from non-numerical data
Emotions… (photo caption: Me!)
anger: 0.00, contempt: 0.00, disgust: 0.00, fear: 0.00, happiness: 0.73, neutral: 0.26, sadness: 0.00, surprise: 0.00
Personality…
This is my rifle. There are many like it, but this one is mine. My rifle is my best friend. It is my life. I must master it as I must master my life. My rifle, without me, is useless...
inner-directed, strict, shrewd, skeptical, restrained
Concepts…
Our Father, who art in heaven, hallowed be thy name, thy kingdom come, thy will be done, on earth as it is in heaven.
Give us this day our daily bread and forgive us our debts…
Christianity, Christian prayer, Linguistics, Gospel of Matthew, Lord's Prayer
Source texts: Rifleman’s Creed (personality), Lord’s Prayer (concepts)
Examples
- face to emotion
- speech to person
- diary to personality
- story to concept
- correspondence to tone

Intelligence APIs – teach machine to classify
Training data
Three Musketeers          Moby Dick          Crime and Punishment
Count de Rochefort        Ishmael            Dmitri Razumikhin
Monsieur Bonacieux        Captain de Deer    Andrei Lebezyatnikov
Duke of Buckingham        Dough Boy          Nastasya Petrovna
Bazin                     Starbuck           Porfiry Petrovich
Felton                    Ahab               Katerina Marmeladov
Anne of Austria           Flask              Pyotr Luzhin
Athos                     Fedallah           Pulcheria Raskolnikov
Kitty                     Stubb              Alexander Zamyotov
Constance de Bonacieux    Captain Bildad     Ilya Petrovich
Planchet                  Queequeg           Rodion Raskolnikov
Monsieur de Tréville      Daggoo             Lizaveta Ivanovna
Milady de Winter          Elijah             Sofya Marmeladov
Porthos                   Captain Peleg      Alyona Ivanovna

Testing data
Character              Which book?             Accuracy
D'Artagnan             Three Musketeers        True
Grimaud                Three Musketeers        True
Aramis                 Three Musketeers        True
Mousqueton             Three Musketeers        True
Captain Boomer         Moby Dick               True
Father Mapple          Moby Dick               True
Tashtego               Moby Dick               True
Pip                    Moby Dick               True
Semyon Marmeladov      Crime and Punishment    True
Arkady Svidrigailov    Crime and Punishment    True
Zossimov               Moby Dick               False
Avdotya Raskolnikov    Crime and Punishment    True
IBM Watson

Scaling up – literary research
For-loop that sends textual material to API:
    For Each Book in Library:
        For Each Page in Book:
            Result(Page) <= Send(Page, API)
        Next Page
    Next Book
Empirical exercise: Anne Frank’s diary
Library: none
Book: The Diary of a Young Girl
API: Watson Sentiment Analysis
[Figure: Cumulative Standardized Sentiment, 1942 – 1944; annotation: Fritz Pfeffer joins the annex]
“The sun is shining … I think spring is inside me.
I feel spring awakening”

Scaling up – communication studies
For-loop that sends images to API:
    For Each Movie in Collection:
        For Each Frame in Movie:
            Result(Frame) <= Send(Frame, API)
        Next Frame
    Next Movie
Empirical exercise: Trump-Clinton Debate
Collection: none
Movie: Final presidential debate
API: Microsoft Emotion API
- Trump’s primary facial expression is anger, more so when he does not have the floor
- Hillary emphasizes her message by raising her eyebrows and expresses integrity by smiling

Key takeaways
- Non-numeric data can be obtained and analyzed at an industrial scale
- The scientific workflow can be automated by chaining APIs cleverly
- Huge implications for what can be researched and at what cost
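
Sketch: the scaling-up loop in Python
The nested for-loops on the two "Scaling up" slides could be written roughly as follows. This is a minimal sketch, not the code behind the slides: analyze_collection, cumulative_standardized and toy_sentiment are hypothetical names, toy_sentiment is a local stand-in for a remote sentiment API, and the second toy page is made up for illustration. The slides do not say how the "Cumulative Standardized Sentiment" series was computed; z-scoring the page scores and taking a running sum is one plausible reading, assumed here.

```python
from itertools import accumulate
from statistics import mean, pstdev
from typing import Callable, Dict, List, Tuple


def analyze_collection(
    collection: Dict[str, List[str]],
    send: Callable[[str], float],
) -> Dict[Tuple[str, int], float]:
    """Send every unit (page, frame) of every item (book, movie) to an API."""
    results: Dict[Tuple[str, int], float] = {}
    for item_name, units in collection.items():        # For Each Book in Library
        for index, unit in enumerate(units):            # For Each Page in Book
            results[(item_name, index)] = send(unit)    # Result(Page) <= Send(Page, API)
    return results


def cumulative_standardized(scores: List[float]) -> List[float]:
    """Assumed definition: z-score the scores, then take the running sum."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: every z-score is zero by convention here.
        return [0.0] * len(scores)
    return list(accumulate((s - mu) / sigma for s in scores))


def toy_sentiment(text: str) -> float:
    """Hypothetical stand-in for a remote sentiment API call."""
    positive = {"sun", "shining", "spring"}
    negative = {"afraid", "dark"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits_pos = sum(w in positive for w in words)
    hits_neg = sum(w in negative for w in words)
    return float(hits_pos - hits_neg)


# Toy input: one "book" with three "pages" (the middle page is invented).
library = {
    "toy diary": [
        "The sun is shining.",
        "It is dark and I am afraid.",
        "I feel spring awakening.",
    ]
}

results = analyze_collection(library, toy_sentiment)
scores = [results[("toy diary", i)] for i in range(len(library["toy diary"]))]
series = cumulative_standardized(scores)
print(scores)
print(series)
```

Passing the API call in as the send argument keeps the loop identical for pages and frames: replacing toy_sentiment with a function that posts an image to an emotion API would give the communication-studies variant without touching analyze_collection.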