Eindhoven University of Technology
MASTER
Hybrid recommendation systems combining user-preferences with domain-expert knowledge
Tufiș, V.
Award date: 2014
Vlad Tufiș
Hybrid recommendation systems
combining user-preferences
with domain-expert knowledge
School of Science
Thesis submitted for examination for the degree of Master of
Science in Technology.
Espoo 30.06.2014
Thesis supervisor:
Prof. Heikki Saikkonen
Thesis advisor:
Lic. Tech. Håkan Mitts
Aalto University
School of Science
Abstract of the Master's Thesis
Author: Vlad Tufiș
Title: Hybrid recommendation systems combining user-preferences
with domain-expert knowledge
Date: 30.06.2014
Language: English
Number of pages: 10+69
Department of Computer Science and Engineering
Professorship: Software Systems
Code: T-106
Supervisor: Prof. Heikki Saikkonen
Advisor: Lic. Tech. Håkan Mitts
The ever-growing popularity and adoption of smartphones in everyday life has transformed these devices from mere productivity tools into true real-life companions. However, by staying connected all the time, users generate large quantities of data, which in turn overwhelm them at later points in time, making it virtually impossible to choose the right piece of information in an optimal way. One solution to this problem is to equip applications with intelligent modules able to filter out non-relevant information and present highly focused information that is relevant for the end-user.
This work will focus on one subclass of such intelligent modules: recommendation systems (RS). Recommendation systems can efficiently filter large amounts of information and match it against a previously inferred user profile. However, two aspects are worth noting. First, from a technical standpoint, there is the challenge of understanding the domain of applicability and making the right design decisions that will yield higher accuracy for the RS, which in turn leads to increased user satisfaction. Second, from a business perspective, an increasing number of economic agents realize that having an intelligent algorithm as part of their value proposition might be the difference between success and failure. The question in this situation is therefore "how to design an intelligent module such that it will stand out from the crowd while providing accurate and valuable information?" In particular, this work will focus on recommendation systems that combine user preferences with domain-expert knowledge. As an operating domain I chose a mixture of fitness, occupational health, and wellbeing, and I will provide answers to the following questions: (1) How can domain-expert information be used to enhance user-preference based recommendations? (2) What user benefits can be achieved by augmenting preference-based recommendations with domain-expert information?
Keywords: recommendation system, expert system, domain-expert, knowledge,
hybrid, mobile, health
Hosting Institutions and Organizations
EIT ICT Labs
EIT ICT Labs is an initiative of the European Union, implemented by the European Institute of Innovation and Technology to establish Knowledge and Innovation Communities. EIT ICT Labs creates a platform that brings together researchers, members of academia, and business people in order to drive European leadership in ICT innovation for economic growth and quality of life.
EIT ICT Labs offers higher education in ICT, integrated with innovation and entrepreneurial education through its Doctoral School, Master School, Open School and Summer School. The EIT ICT Labs Master School [1] is a joint initiative of leading technical universities and business schools in Europe, coupled with mentoring and partnering from leading European research organizations and business partners. The Master School offers two-year educational programs along seven technical majors. Each program includes a minor in Innovation & Entrepreneurship and features geographical mobility between the first and second year of studies. A winter school, a summer school, and an internship in a company are also compulsory elements of the master program.
Eindhoven University of Technology
Eindhoven University of Technology is a top university in the Netherlands, known worldwide for the quality of its educational programs and research activities. It is ranked 106th in the world according to the Times Higher Education World University Rankings of 2013-2014, the best Dutch engineering and science university according to the Study Guide to Universities 2013, and the best university in the Netherlands according to the weekly magazine Elsevier.
Eindhoven University of Technology was the entry-point university in this Master program, providing fundamental education in Business Information Systems, Business Process Management Systems, Introduction to Services, and Innovation and Entrepreneurship.
Aalto University
Aalto University is a new university founded in 2010, but with centuries of experience. Aalto University was created from the merger of three top Finnish universities,
The Helsinki School of Economics, Helsinki University of Technology and The University of Art and Design Helsinki, to encompass and stimulate new joint research
and teaching programs.
Aalto School of Science and Technology is located in Otaniemi, the largest technology, innovation, and business hub in Finland and Northern Europe with respect to the number of companies and R&D centers located in the area. Through its close connections with industry, Aalto University provides students with excellent research and entrepreneurial opportunities.
Aalto School of Science was the exit-point university in this Master program, providing education in the areas of Digital Services, Smart Spaces, Multimedia and Mobile Services.
[1] http://www.eitictlabs.eu/education/master-school/
Framgo
Framgo is a Finnish start-up founded in September 2012, operating in the domain of occupational health and wellbeing [2].
Framgo provided the necessary setup for completing the internship required by the EIT ICT Labs Master School.
[2] http://www.framgo.com
Acknowledgement
To my family, who have constantly supported me throughout my studies. To my girlfriend, for agreeing to spend most of the last two years apart. You were the driving force that pushed me to complete this program. Thank you!
I would like to thank Prof. Heikki Saikkonen and especially Håkan Mitts for their valuable feedback and the insightful conversations we had during the preparation of this thesis. A special thank you goes to Prof. Mykola Pechenizkiy at Eindhoven University of Technology, for providing me with the theoretical foundation needed to complete this thesis.
Last, but not least, I would like to thank all my colleagues from the past two years. You were part of my learning experience; I had a wonderful time with you and I am very happy to have known you.
Otaniemi, 30.06.2014
Vlad Tufiș
Contents

Abstract
Hosting Institutions and Organizations
Acknowledgement
Contents
Abbreviations
1 Introduction
  1.1 Problem description
  1.2 Research questions
  1.3 Thesis scope
  1.4 Thesis structure
2 Theoretical Background
  2.1 Recommendation systems basics
  2.2 Personalization process
  2.3 Similarity measures and distance metrics
    2.3.1 Utility matrix
    2.3.2 Jaccard index
    2.3.3 Cosine similarity
    2.3.4 Euclidean distance
  2.4 Content-based filtering
    2.4.1 A basic architecture
    2.4.2 Content-based recommendation advantages
    2.4.3 Examples from literature
  2.5 Collaborative filtering
    2.5.1 A basic architecture
    2.5.2 User-User filtering
    2.5.3 Item-Item filtering
    2.5.4 Collaborative filtering advantages
    2.5.5 Examples from literature
  2.6 Demographic filtering
    2.6.1 Examples from literature
  2.7 Common problems and limitations of recommendation systems
    2.7.1 Over-specialization
    2.7.2 Limited content analysis
    2.7.3 Cold-start (new-user)
    2.7.4 Cold-start (new-item)
    2.7.5 Serendipity
    2.7.6 Shilling-attacks
    2.7.7 Gray sheep
  2.8 Hybrid filtering
    2.8.1 Classic examples of hybridization
    2.8.2 "Exotic" hybrid approaches
  2.9 Summary
3 Expert systems
  3.1 Expert systems architecture
  3.2 The knowledge acquisition process
  3.3 Limitations and pitfalls
    3.3.1 Choosing the right problem
    3.3.2 Collaborating with the domain-expert
    3.3.3 Liability issues
  3.4 Combining recommendation systems and expert systems
  3.5 Summary
4 OmaTauko - Concept Description
  4.1 OmaTauko - Concept description
  4.2 Domain model
5 System Design and Implementation Details
  5.1 Motivation for a domain-expert enhanced recommendation system
  5.2 Domain expert involvement
  5.3 System description
    5.3.1 Objective
    5.3.2 Choosing the similarity measure
    5.3.3 Choosing the similarity threshold
    5.3.4 Oskar architecture
6 System Evaluation
  6.1 Experiment design
  6.2 Results and discussion
7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future work
8 Appendix - Survey questions
List of Figures

1 Personalization process
2 Utility matrix
3 Content based recommendation architecture
4 Collaborative filtering architecture
5 Collaborative filtering architecture - utility matrix
6 The long tail
7 Demographic filtering architecture
8 Expert systems architecture
9 Application starting screen
10 Break configuration
11 Performing an exercise
12 Monthly statistics view
13 Overall statistics view
14 Scheduling reminders
15 User details
16 OmaTauko - Domain model
17 Domain-expert defined relative weights
18 Similarity metric comparison, t = <m, d, c>
19 Similarity metric comparison, t = <m, m, m, m, d, d, d, d, c, c, c>
20 Similarity metric comparison, t = <m1, m2, m3, m4, d, c>
21 Heat-chart map of similarities, wd1, τ = 0.8
22 Oskar architecture
23 Distribution of skipped tasks over break duration
24 Histogram of recommended/skipped exercises
25 Recommendation system success from a user's standpoint
26 Questionnaire results
List of Tables

1 Long tail data source
2 Recommendation systems - overview
3 Recommendation systems vs. Expert systems
Abbreviations

RS      Recommendation system
CBR     Content-based recommendation
CF      Collaborative filtering
UM      Utility matrix
ES      Expert system
KE      Knowledge engineer
KB      Knowledge base
tf-idf  Term frequency - inverse document frequency
UX      User experience
UI      User interface
1 Introduction
Due to constant advancements in technology over the last 20 years, people are now able to stay connected to the Internet from virtually any place, at any point in time. Whether they use PCs, laptops, or mobile devices, they have constant access to the Internet and can consume a wide variety of information.
In particular, the adoption of mobile technologies in everyday life has impacted every aspect of how people communicate and run their businesses. In the U.S. alone, mobile device sales are expected to reach $215 million by the end of 2016, an increase of 25% compared to 2009, while revenue from mobile data usage is expected to reach $180 billion, a staggering 85% increase over the same period in the same market [11].
The mobile phone is no longer a device used only to make phone calls and send text messages, but rather a facilitator of interactivity and information sharing, a true real-life companion. The multitude of sensors with which a mobile phone is nowadays equipped, coupled with the forward thinking and creativity of entrepreneurs, has led to the creation of millions of applications in a myriad of domains ranging from navigation, shopping, tourism, fitness and workout, up to more sensitive domains like healthcare or automotive (e.g., augmented-reality GPS software) where security, privacy, and precision are highly important.
1.1 Problem description
However, by always staying connected, people generate an immense amount of data. For example, 90% of the data currently available has been generated in the past two years alone [14]. This process of constantly generating more and more data has created a big problem for the user: information overload. To alleviate this problem, researchers and business people have proposed and successfully deployed a number of information filtering techniques; search engines, automated information retrieval techniques (web-crawlers), and recommendation systems all solve the same problem of information overload.
This work will focus on a particular class of information filtering techniques, namely recommendation systems. Recommendation systems have been a hot area of research in recent years and have been successfully deployed in a variety of domains to boost business value. Typical areas include recommendations of movies, music, books, or research articles, with large vendors like Netflix, Amazon, or eBay investing significant resources in developing state-of-the-art recommendation systems that increase user experience, customer satisfaction, and customer retention, and ultimately increase revenues.
An equally hot topic in recent years, growing more popular with the proliferation of mobile devices, is mHealth. mHealth is the term coined for "medical and public health care practice supported by mobile devices, such as mobile phones, patient monitoring devices, PDAs, and other wireless devices" [11]. According to the same report, 31% of cell phone users have used their phones to search for health information. With an increase of 134% over one year (from 2010 to 2011), reaching 18.5 million mobile users looking for health information (personal health, fitness, wellness, and information on health services), this category is the one gaining the most popularity in the mobile user segment. Furthermore, as of April 2012, 40,000 mHealth apps were available across all major mobile platforms (iOS, Android, Windows Phone), with 70% of them targeted at laymen and everyday use [11].
However, more is not always better. As Azar et al. claim in [4], many mobile apps fall short of incorporating evidence-based content and theory-based strategies that would lead to behavioral changes in users' health habits. The most important reason for this failure is that, in their hunt to grow their user base, app vendors focus more on the social aspect of the app (turning the app into a social game, sharing results with the community, performing actions that are popular in the community, getting approval from the community, etc.). I consider that the focus of such apps should be on what people need to do rather than on what they would like to do; since users typically do not know how to articulate a need, especially in a very specific domain such as healthcare, domain-expert help is much needed. Costabile et al. [8] define a domain-expert as a professional with extensive knowledge in a specific domain. Examples of domain-experts include accountants in accountancy, lawyers in law, automotive engineers in the automotive industry, medics in healthcare, or personal trainers in fitness.
Additionally, the financial implications of focusing on treating rather than preventing are significant. Health care costs rise to 17.9% of gross domestic product (GDP) in the US, 8-9% in Europe, and 4.5% in China [1]. Moreover, the cost of treating chronic diseases represents 75% of all health care costs [1]. The need to prevent instead of treat is logical and obvious.
Mobile technology has the potential to solve at least part of this pressing problem by preventing acute problems (such as neck and back pain, stress, obesity) from developing into chronic ones. By integrating domain-expert knowledge (fitness training, nutrition advice, weight-loss advice, etc.) into information systems, coupling them with the power of recommendation systems to guide users through a vast universe of information, and using mobile phones as the delivery platform, end-users can gain access to a wealth of high-quality information regardless of their location, or the location of the domain-expert.
1.2 Research questions
A solution to the problem I intend to tackle can be provided by answering the following research questions:
1. How can domain-expert information be used to enhance user-preference based recommendations?
In order to answer this question, I will implement a recommendation system that mixes domain-expert knowledge with previously collected user-preferences. The domain-expert knowledge will be provided by a fitness personal trainer, while user-preferences are collected continuously via a mobile application. Sections 5.2 and 5.3 will describe in detail the two information sources I intend to use, as well as the architecture of the recommendation system.
2. What user benefits can be achieved by augmenting preference-based recommendations with domain-expert information?
The answer to this question will consist of an evaluation of the recommendation system along two major coordinates. An objective measurement will assess the performance of the recommendation system by counting the number of successful recommendations explicitly indicated by users through a survey. A subjective measurement will evaluate the benefits of the proposed recommendation system along several dimensions. Section 6 will detail the methodology employed for the evaluation, the results, and the complementary discussion.
1.3 Thesis scope
This work is focused on the implementation and evaluation of a recommendation system that leverages the knowledge of a domain-expert in order to provide better-tailored recommendations to its users. The domain chosen for building the recommendation system is occupational health and wellbeing. The decision is based on the fact that this work was performed during an internship at a technology-based Finnish startup which operates in this domain.
The commercial nature of the product which will benefit from this recommendation system made it impossible to test the recommendation system in a real-life scenario. Instead, in order to evaluate the system, a survey that emulates the behavior of the real system was designed, and 35 participants were included in the evaluation. Most of the participants encountered the recommendation system and the product for the first time when filling in the survey; therefore, a short description of the product and a use-case were provided to each participant so that they could better understand the scenario.
1.4 Thesis structure
The first section presents a short introduction to the topic, the problem formulation, and the research questions which are addressed.
Sections 2 and 3 provide the theoretical background of this thesis and define the key concepts. First, the implications and benefits of the personalization process are described. Then the focus moves to the notion of a recommender system, typical algorithms used to provide recommendations, and connected terminology. Finally, a brief overview of expert systems and their architecture is offered. Throughout both chapters, the existing body of literature is used to provide various examples of the notions being defined.
Section 4 presents the current state of OmaTauko, a commercial product-service system which enables end-users to take and track micro-breaks aimed at decreasing their musculoskeletal problems and increasing their energy levels.
Section 5 presents the design and implementation details of Oskar, a recommendation system which blends user preferences and domain-expert knowledge to recommend physical exercises that fit the user's profile.
Section 6 presents the experiment design and data collection methods, and discusses the obtained results as well as the limitations of the proposed approach. Finally, Section 7 presents future directions of research and the conclusion of this study.
2 Theoretical Background
This chapter explains the key concepts present in this work. First, a definition of the recommendation system is provided. Next, the personalization process is presented and some popular metrics used for measuring similarity between the entities participating in such a system are introduced. The main approaches to providing recommendations are then detailed, before discussing some of the problems and limitations recommendation systems have to face.
2.1 Recommendation systems basics
The existing body of literature provides various definitions for a recommendation
system. For example, Meteren et al. [33] define RSs as a special type of information
filtering systems, where information filtering is concerned with selecting a subset
of items from a large collection which is likely to be deemed interesting and useful
by the user. Similar to [33], Bogers et al. [5] define RSs as a class of personalized
information filtering technologies whose aim is to identify which items from a catalog
are likely to be of interest for a user.
In [6] a recommendation system is described as a solution to the information overload problem; it entails delivering personalized information services which ideally help the user select the desired item. Ricci [26] adopts a similar definition of the RS, as an information provider and a facilitator in the decision-making process.
In [20], the author argues that a recommendation system makes use of justifications to recommend products to customers and ensure the customers like those
products. The justifications can be obtained from preferences specified explicitly or
induced from past user behavior.
This work will use the definition provided by Semeraro et al. [19], according to which recommendation systems are one way to guide a user through a large space of possible alternatives, towards those items considered to be of interest. They involve predicting user responses to various options and suggesting those options to the end-user based on either her own or the community's past behavior. In that sense, recommendation systems are both an information-filtering and a personalization technique.
Based on the type of information used for computing the predictions, recommendation systems can be classified into two major categories: content-based systems, which examine the properties of the recommended items, and collaborative systems, which exploit the similarities between users or items [18]. Both methods have strengths as well as weaknesses, and the adoption of one over the other depends on the context of the application, as well as the type of data they make use of.
A smaller category of recommendation systems leverages the demographic information of users (age, gender, location, income) in order to provide demographic-based recommendations. Also, in order to compensate for the weaknesses of content-based filtering and collaborative filtering, it is common practice to merge the two methods into a hybrid recommendation system.
However, before discussing the different flavors of recommendation techniques,
the personalization process is presented, followed by some definitions of the popular
metrics used in this domain.
2.2 Personalization process
Adomavicius and Tuzhilin [2] define personalization as the tailoring of offerings
(content, services, product recommendations) from providers to consumers based
on existing knowledge about their preferences and behavior. The personalization
activity is typically performed with particular goals in mind, such as improving the customer's experience when interacting with a product or service, increasing customer retention and satisfaction, or increasing sales.
Personalization is an information-intensive activity: it involves operating on large data sets, rapid data collection, and processing of large volumes of information; moreover, the results of the analysis have to be quickly actionable. For these reasons, personalization is better suited to the online world than to the offline one.
In [2] the personalization activity is viewed as a process consisting of three main
stages. First, the provider needs to understand the customer through data collection
and analysis. The output of this stage is a comprehensive repository of user profiles
storing information about their behavior on the online platform. In the second stage,
the information stored in the profiles is matched against certain rules in order to
deliver personalized offerings to the users. The final stage involves measuring the impact of personalization and adjusting the personalization strategy to better suit the customers upon their next visit to the platform. A representation of the personalization process is illustrated in Fig. 1.
Figure 1: Personalization process - adapted from [2]
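The three-stage process above (understand the customer, match and deliver, measure and adjust) can be sketched as a minimal loop. All function names, the toy matching rule, and the sample data below are illustrative assumptions, not part of [2]:

```python
# Minimal sketch of the three-stage personalization loop adapted from [2]:
# (1) build user profiles from collected data, (2) match profiles against
# items to deliver offerings, (3) measure the impact and feed it back.
# Every name and the toy rule here are illustrative assumptions.

def build_profiles(interactions):
    """Stage 1: aggregate raw (user, item) interactions into per-user profiles."""
    profiles = {}
    for user, item in interactions:
        profiles.setdefault(user, set()).add(item)
    return profiles

def match(profiles, catalog):
    """Stage 2: deliver items the user has not seen yet (a deliberately toy rule)."""
    return {user: sorted(catalog - seen) for user, seen in profiles.items()}

def measure(offerings, accepted):
    """Stage 3: fraction of delivered offerings the users actually acted on."""
    delivered = sum(len(items) for items in offerings.values())
    return accepted / delivered if delivered else 0.0

interactions = [("alice", "yoga"), ("alice", "stretch"), ("bob", "yoga")]
catalog = {"yoga", "stretch", "walk"}
offerings = match(build_profiles(interactions), catalog)
print(offerings)                       # {'alice': ['walk'], 'bob': ['stretch', 'walk']}
print(measure(offerings, accepted=2))  # 2 of 3 offerings accepted
```

The measurement result would then feed back into stage 1 on the users' next visit, closing the loop described in the text.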
A similar approach is presented in [34], which breaks down the personalization process into five steps: user identification, user data collection, user data interpretation, deciding upon the personalization itself, and adaptation to the new context. On top of the definition provided in [2], the authors of [34] emphasize the importance of including the user in all stages of the personalization process and identify a set of issues that impact a user's experience when interacting with a personalization system:
• predictability - the user must be able to predict the outcomes of her actions
before the new content is generated
• comprehensibility - the user must be able to understand how she is being modeled by the system and how the personalization process works
• controllability - stemming from predictability, a user should be able to control
her user-model and what content will be generated
• unobtrusiveness - the user can complete her task using the information system
without being distracted by the personalization process
• privacy - the user should not have the feeling that the user model violates her
privacy
• breadth of experience - the user should not be prevented from discovering new
items using the information system; that is, the personalization process does
not develop only in one direction.
• system competence - the user must not have the feeling that the system generates faulty recommendations or that the user model is built in a faulty manner [34].
As presented in [2], the process starts with the task of data acquisition from various sources. These include explicit user actions (such as filling in a profile, rating items, bookmarking websites, or liking pages) or implicit actions coupled with various heuristics to interpret them (e.g., spending more than t seconds on a web-page means that the user deems it relevant for her needs; watching a video until the end means that the user has enjoyed it).
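The implicit-feedback heuristics above (dwell time, video completion) could be sketched as follows; the threshold value and the event structure are illustrative assumptions, not part of the original system:

```python
# Sketch of an implicit-feedback heuristic: infer an interest signal from
# dwell time and video completion. The 30-second threshold and the event
# dictionaries are illustrative assumptions.

DWELL_THRESHOLD_S = 30  # the "more than t seconds" heuristic from the text

def implicit_score(event):
    """Map a raw interaction event to an inferred relevance signal in [0, 1]."""
    if event["type"] == "page_view":
        # A long dwell time is read as implicit interest.
        return 1.0 if event["dwell_s"] > DWELL_THRESHOLD_S else 0.0
    if event["type"] == "video":
        # Watching a video to the end is read as enjoyment.
        return event["watched_fraction"]  # 1.0 means watched fully
    return 0.0  # unknown event types carry no signal

events = [
    {"type": "page_view", "dwell_s": 45},
    {"type": "page_view", "dwell_s": 5},
    {"type": "video", "watched_fraction": 1.0},
]
print([implicit_score(e) for e in events])  # [1.0, 0.0, 1.0]
```

Scores like these would then be aggregated into the user profile alongside explicit ratings.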
Once the data is collected, it needs to be structured into user profiles. User profiles are matched against item profiles to determine the extent to which a set of items fits the user's needs and wants. Matching techniques include recommendation systems, rule-based or case-based expert systems, and statistics-based approaches [2]. The matching process generates lists or sets of relevant items. A list may be ordered by relevance or predicted rating, or left unsorted. An explanation of why the delivered items are considered relevant is usually included.
The final stage involves measuring the impact of the personalization process. To
this end, metrics should be defined and performance goals set (e.g., the accuracy of
the predictor has to be at least 80%, meaning that 80% of the predicted ratings
were estimated correctly). The information collected in the measurement stage can
then be used as feedback and integrated at each stage of the personalization process
to improve the performance, or refine the behavior of each component.
2.3
Similarity measures and distance metrics
Before moving on to describe the various categories of recommendation systems, I
will first summarize some of the popular notions and metrics used in this domain;
namely, the notion of utility matrix, Jaccard index, cosine similarity and Euclidean
distance will be considered.
2.3.1
Utility matrix
The utility matrix (UM) (Fig.2) captures the preference relationship between a user
and an item. One dimension of the matrix represents the users of the system;
the other dimension represents the items present in the system, which should be
recommended to the users.
Figure 2: Utility matrix representing the ratings of four users over six items
An element of the matrix, located at the intersection of row i and column j,
represents the rating user_i has awarded to item_j, under the assumption that users
are represented on rows and items on columns. An example of a utility matrix
consisting of four users and six items is represented in Fig. 2. In this example, the
users were able to award ratings on a scale from 1 to 5. If the ratings are binary
(e.g., 0 - not liked, 1 - liked), then the elements of the UM would modify accordingly.
2.3.2
Jaccard index
The Jaccard index is a metric measuring the overlap, or similarity, between two
finite sets, and it is defined by the formula:
J(A, B) = |A ∩ B| / |A ∪ B|   (1)

By convention, if A = B = ∅, then J(A, B) = 1.
If A and B have the same elements, then

A ∩ B = A ∪ B ⇒ J(A, B) = 1   (2)

On the other hand, if A ∩ B = ∅, then

J(A, B) = 0   (3)

From (2) and (3) it results that

0 ≤ J(A, B) ≤ 1   (4)
The intuition behind using the Jaccard index as a similarity measure between
two items represented as sets of elements is that, the more different two sets are,
the fewer elements their intersection will contain relative to their union, and thus
the ratio between the sizes of the intersection and the union will be closer to 0.
Conversely, the more similar the two sets are, the more elements their intersection
will contain, approaching the number of elements in their union; thus, the ratio
will be closer to 1.
The rows or columns of a UM, such as the one in Fig. 2, can be interpreted as
vectors. Alternatively, a processing step can be applied to the UM such that, instead
of holding ratings on a discrete scale from min to max, the matrix is binarized with
a given threshold (for example, all ratings ≤ 2 are discarded and all ratings ≥ 3 are
replaced by 1); the result will then contain only 1s and empty cells.
One could then interpret the matrix rows or columns as sets, with element_i ∈
set_j if UM(i, j) = 1.
Then, the Jaccard index, as defined by (1), can be computed for two sets,
set_j1 and set_j2, regardless of whether they represent users or items.
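To make the set interpretation concrete, the following sketch (my own illustration, not code from any cited system) computes the Jaccard index of equation (1) for two hypothetical binarized UM rows:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index of two finite sets, per equation (1)."""
    if not a and not b:
        return 1.0  # J(emptyset, emptyset) is defined as 1
    return len(a & b) / len(a | b)

# Binarized utility-matrix rows interpreted as sets of liked items
# (hypothetical users; ratings >= 3 kept, the rest discarded).
user1 = {"I1", "I2", "I4"}
user2 = {"I2", "I4", "I6"}

print(jaccard(user1, user2))  # 2 shared items out of 4 distinct -> 0.5
```

The same function applies unchanged whether the sets represent users or items.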
2.3.3
Cosine similarity
The cosine similarity is derived from the dot product of two vectors a and b.
a · b = ||a|| · ||b|| · cos(θ)   (5)

Therefore,

sim(a, b) = cos(θ) = (a · b) / (||a|| · ||b||) = Σ_{i=0}^{n} (a_i · b_i) / (||a|| · ||b||)   (6)

and

||v|| = √( Σ_{i=0}^{n} (v_i)² )   (7)
The cosine similarity measures the angle between two vectors a and b. The
intuition behind using this metric as a similarity measure is that the smaller the
angle between two vectors of features, the more similar they are. In a vector space
in which all the elements of a vector are positive, the cosine similarity will range
from 0 to 1, where 1 indicates complete overlapping (or complete similarity), and
0 indicates orthogonality (or complete dissimilarity) between the two considered
vectors.
A potential shortcoming of this method is that the cosine similarity fails to
capture the difference in magnitude between the two vectors. For example, assume
two vectors, v1 = <1, 1, 1> and v2 = <10, 10, 10>, in a 3-dimensional space.
According to (6):

cos(θ) = (v1 · v2) / (||v1|| · ||v2||) = (10 + 10 + 10) / (√3 · √300) = 30/30 = 1   (8)
According to this metric, the two vectors considered are 100% similar. Obviously
this is not the case, as v2 is 10 times larger than v1 in every dimension.
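A minimal sketch of equation (6), using the two hypothetical vectors above, makes the shortcoming visible:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, per equation (6)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1 = [1, 1, 1]
v2 = [10, 10, 10]
# Parallel vectors of very different magnitudes still score 1,
# illustrating the shortcoming discussed above.
print(cosine_similarity(v1, v2))  # ~1.0 (up to floating-point error)
```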
2.3.4
Euclidean distance
The Euclidean distance between two vectors a and b is the length of the line segment
connecting them. Suppose a = <a1, a2, ..., an> and b = <b1, b2, ..., bn>. Then, the
Euclidean distance is defined by the equation:

d(a, b) = √( Σ_{i=0}^{n} (a_i − b_i)² )   (9)
The Euclidean distance is a positive measure, bounded below by 0. That is, if two
vectors are identical, the distance between them is 0 and their similarity reaches
its maximum. The more different two vectors are, the larger the distance between
them will be.
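As a minimal illustration (my own sketch, reusing the hypothetical vectors from section 2.3.3):

```python
import math

def euclidean_distance(a, b):
    """Length of the segment between two vectors, per equation (9)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

v1 = [1, 1, 1]
v2 = [10, 10, 10]
# Unlike cosine similarity, the distance reflects the difference
# in magnitude between the two vectors.
print(euclidean_distance(v1, v2))  # sqrt(243), roughly 15.59
print(euclidean_distance(v1, v1))  # 0.0 for identical vectors
```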
As compared to (6), equation (9) also takes into account the magnitude of the
vectors. Using the same example from section 2.3.3, the Euclidean distance between
v1 and v2 is:

d(v1, v2) = √(81 + 81 + 81) ≈ 15.6   (10)

2.4
Content-based filtering
In a content-based filtering setting, the system recommends to the user items similar
to the ones the user has liked in the past. Content-based filtering techniques analyze
common features among items and select new items based on the correlations between those and the user's past preferences [2][33]. Similar definitions are provided
in [25] and [23] which describe the outcome of content-based recommendations as
resulting from the analysis of items rated by the user in the past, and matching
them against candidates from the set of unrated documents.
Content-based recommendations (CBR) assume that each item in the system
has a profile attached to it. An item profile is a collection of properties which can
be extracted for an item [18]. For example, suppose an information system provides
recommendations for recipes, such as in [32]. A recipe can be represented as a vector
of properties including cuisine, list of ingredients (set), diet type, region, occasion,
number of servings, etc. Determining the elements of the item profile is the first
major design decision to be taken when developing a content-based recommendation
system [25].
Ideally, all the items in a collection have the same set of properties,
and data sparsity³ is not an issue. In this situation, a similarity measure can be
computed (e.g., cosine similarity or the Jaccard index) to determine to what extent
a previously rated item is similar to a new one. If the result meets a certain
threshold, then the new item is recommended to the user. Choosing the similarity metric
and the similarity threshold represents the other major design decision that needs to be
taken when developing a content-based recommendation system [25].
³ item profiles consistently have the same properties
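The threshold-based matching described above can be sketched as follows; the recipe names, the numeric feature encoding and the 0.8 threshold are all hypothetical choices for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity, per equation (6)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(x * x for x in b)))

def recommend(liked_profile, candidates, threshold=0.8):
    """Return candidate items whose profile similarity to a
    previously liked item meets the chosen threshold."""
    return [item for item, profile in candidates.items()
            if cosine_similarity(liked_profile, profile) >= threshold]

# Hypothetical recipe profiles encoded as binary feature vectors.
liked = [1, 0, 1, 1]
candidates = {"recipe_a": [1, 0, 1, 0], "recipe_b": [0, 1, 0, 0]}
print(recommend(liked, candidates))  # ['recipe_a']
```

Both major design decisions appear explicitly here: the metric (cosine similarity) and the threshold (0.8).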
2.4.1
A basic architecture
A basic architecture of a CBR is presented in Fig. 3. First, the information is
collected from various information sources (bottom-left corner of the figure). For
example, one could use a crawler to harvest tweets, recipes, images or articles from
various third party platforms. Most likely, the information will not be in a structured
form that would allow easy usage and straightforward inclusion in a recommendation
engine. Therefore, the next step is adding structure to it, cleaning it from noise
and extracting only those features that are relevant for the specific situation (the
Content Analysis block). The items are then stored in a database, or other form
of persistent storage, for future reference (the Items Collection block). The typical
form of representing an item for a CBR system is a vector of features.
Figure 3: Content based recommendation - basic architecture. Adapted from [19]
A user will start using the platform, and will be presented with items. The
system typically has a way of collecting feedback from the user, for the items she
is viewing. For collecting feedback, two techniques are available: explicit feedback
and implicit feedback [19]. In an explicit feedback setting, a user is requested to
explicitly provide a rating for the item (typically on a scale from a minimum to a
maximum — 1 to 5 stars, or a metaphor that is translated into a scale — various
emoticons to express the feeling about an item). Using implicit feedback entails the
tracking of user actions and assigning them a weight (e.g., viewing a video until the
end earns it 5 points, while only viewing the page containing it, 2 points). In either
case, the feedback is being stored for future reference (the Feedback block, top-right
corner).
The system periodically analyses those items that the user has recently rated and
infers a profile (the Profile Learning block). When providing new recommendations,
new items are extracted from the Items repository, and compared against the user
profile. The Filtering block uses a similarity measure to determine which of those
unseen items best fit the current profile of the user. Usually, only top-k items will
make it to the final recommendation list.
The process then enters a new iteration in which the user views the recommended
items, judges and rates them, and her profile is updated for the next round of
recommendations.
2.4.2
Content-based recommendation advantages
Semeraro et al. present in [19] a list of advantages that CBR systems possess. First,
CBRs rely only on the ratings of the user for whom recommendations are actually
provided.
This user-independence leads to another attractive feature: CBR systems do
not suffer from the first-rater problem, which means that they are capable of
recommending items newly added to the system, items that have not been rated yet by
any user.
Finally, a CBR can be easily explained to the active user. Such an explanation
could be, for example: “You are seeing this item because you previously liked items
B, C and D, which are similar to the current one”. Providing explanations for the
recommendations is a good way to increase the trust of the user in the recommendation
system [19].
2.4.3
Examples from literature
[36] presents BlogMuse - an application built to help blog writers connect with their
audience. Readers who want to read about a certain topic but are not able to find
it, can submit a request in the system. The request is then routed to potential
matching users based on the interests from their profiles. In that sense, BlogMuse
implements a simple form of content-based recommendation in which an author that
has indicated an interest in a topic will be notified when that topic is being requested
by the potential audience.
If the writer decides to write about the topic, the requester is also notified. In
order to support audiences larger than one person, a topic submitted by a person is
public and can be viewed by other community members. The topics can be rated and
the voters are notified when someone has written about that topic. Also, whenever
a topic's interest is increased as a result of a rating, potential matching authors
are notified and can decide to write about the topic. Therefore, BlogMuse also
implements a collaborative approach to recommending topics, such that the more
readers request a topic, the more likely it is to be recommended to a potential author.
A more classical CBR approach is presented in [33]. PRES - Personalized
REcommender System - is a recommendation system exploiting CBR techniques to
create dynamic hyperlinks for web pages containing advice for “do it yourself” home
improvements. The architecture of PRES is typical for a CBR system and hence,
similar to the one presented in Fig. 3. A user profile is learned from the feedback
the user is providing; the RS compares the user profile with the documents in the
collection and feeds a list of recommendations ranked on various dimensions such as
novelty, similarity, proximity and relevance [33].
Due to the nature of the domain, the authors argue that the profile learned by
PRES has to be very dynamic — it is highly likely that a user is not interested
in performing the same home improvement over a very short period of time. As a
consequence, the authors use implicit feedback heuristics to infer user-preferences
from their actions instead of asking users to provide explicit feedback for the pages
they visited. Thus, the more time a user spends on a page, the more relevant the
document is considered to be for her. However, the authors claim that using the same
heuristic to detect non-relevant documents is not suitable because a small amount of
time spent on reading a document might also be an indication that the document is
too similar to one she has previously read. For this reason, they do not use negative
examples to learn the user profile.
PRES uses the relevance feedback model introduced by Rocchio [21] and defined
by the following equation:

P_m = α·P + β · (1/|D_r|) · Σ_{D_j ∈ D_r} D_j + γ · (1/|D_nr|) · Σ_{D_j ∈ D_nr} D_j   (11)

where:
P_m - the updated user profile
P - the initial user profile
D_r - the set of relevant documents
D_nr - the set of non-relevant documents
α, β, γ - constants to control the relative importance of the initial profile,
and the sets of relevant and non-relevant documents
However, because PRES does not make use of negative examples, γ is set to 0.
Furthermore, β is set to 1 because a document is only considered to be relevant or
not; α is set to a value between 0 and 1 via experimentation to reduce the weights
in the current profile.
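Under these simplifications (β = 1, γ = 0), the profile update of equation (11) can be sketched as follows; the profile vectors and the value of α are hypothetical:

```python
def update_profile(profile, relevant_docs, alpha=0.5):
    """Rocchio update (equation 11) with beta = 1 and gamma = 0,
    as in PRES: the new profile is the down-weighted old profile
    plus the centroid of the relevant document vectors.
    alpha is a hypothetical value chosen by experimentation."""
    n = len(relevant_docs)
    centroid = [sum(doc[i] for doc in relevant_docs) / n
                for i in range(len(profile))]
    return [alpha * p + c for p, c in zip(profile, centroid)]

# Hypothetical term-weight vectors for a profile and two relevant documents.
profile = [0.2, 0.8, 0.0]
relevant = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
print(update_profile(profile, relevant))  # [0.6, 0.9, 1.0]
```

Because γ = 0, non-relevant documents never enter the computation, exactly as PRES intends.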
The documents are parsed offline on a periodical basis and represented in the
document collection as vectors of features. The term frequency-inverse document
frequency (tf-idf) measure is used to determine discriminative terms in a document,
while cosine similarity, as defined by equation (6), is used to compute similarity
between the items in a user's profile and a document in the collection. For sorting
the results, the authors use two approaches: recommending top-k results ranked by
similarity, or recommending all results that comply with a similarity threshold.
To evaluate the performance of the RS, the authors used two metrics: precision
- how many of the retrieved documents are actually relevant; recall - how many
of the relevant documents were actually retrieved. The most important finding of
the study is that the precision varies depending on the considered topic [33]. The
authors attribute this finding to the intrinsic properties of the similar documents
(similar documents contain different terms which cannot be associated). I consider
that this problem could have been overcome if a synonym dictionary had been
used when parsing the documents to merge terms with similar meanings.
Finally, one of the approaches presented by Pazzani in [25] uses a content-based
approach to build user profiles based on the descriptions of the items in his system.
In order to learn the user profile, he uses text-mining techniques to parse the
descriptions of the items and applies the Winnow algorithm⁴ to identify only the relevant
attributes in a pool of many possible attributes. Once the relevant features are
identified on a per-item basis, a user profile is built by inspecting the items which
were previously rated by the user.
I consider content-based recommendation to be relevant for this work. Therefore,
part of the algorithm described in section 5 will employ this technique. Specifically,
I am interested in using the vector space representation used in [33] as well as
heuristics to infer the user feedback from her actions rather than explicitly requesting
feedback. I will also build user profiles based on the items previously rated by
the user.
2.5
Collaborative filtering
Unlike content-based filtering, collaborative filtering (CF) techniques leverage the
correlation between users with similar tastes [33]. The assumption behind collaborative filtering techniques is that if two users have similar behaviors with respect
to a set of items, they will have a similar behavior over other unseen items as well
[31]. The key difference is that when computing similarity, collaborative methods
usually exploit the rating behavior of the users, instead of looking at the features of
the users, or the features of the items.
There are two main categories of CF techniques. Memory-based techniques use
the whole (or a subset of the) dataset to compute ratings on the go. Such methods
are easy to implement and largely deployed in commercial systems such as Amazon
[31]. Model-based techniques use the ratings awarded by users for items to estimate
and learn user models that will generate rating predictions. The rest of this section
will focus on the first category, as it is more popular and easier to grasp. Henceforth,
when the term “collaborative filtering” is used, it refers to memory-based
collaborative filtering techniques. Providing the fundamentals for the second
category is out of scope for this thesis; however, [31] contains a comprehensive review
of model-based CF techniques.
2.5.1
A basic architecture
The architecture of a collaborative filtering system is depicted in Fig. 4.
⁴ http://en.wikipedia.org/wiki/Winnow_(algorithm)
According to Fig. 4, User A has rated positively items I1 , I2 and I3 , while User B
has rated positively items I1 , I2 and I4 ; User C has provided ratings for all I1 , I2 , I3
and I4 but consistently smaller than both User A and User B. In this situation, users
A and B seem to be similar; therefore, if A is the active user — the user for whom
a recommendation will be computed — then she should be recommended I4.
Alternatively, if user B is the active user, then she should be recommended I3.
A more formal representation of the collaborative filtering architecture can be
achieved by making use of the utility matrix introduced in section 2.3.1; it is
presented in Fig. 5.
Figure 4: Collaborative filtering architecture
      I1   I2   I3   I4
UA     4    5    4    ?
UB     5    3    ?    4
UC     1    2    1    3

Figure 5: Collaborative filtering architecture - utility matrix representation of
figure 4
2.5.2
User-User filtering
In a user-centered approach to the CF technique, the algorithm tries to infer the
ratings User_i will award to unrated items by comparing her rating behavior against
users who have already rated those items. The system will estimate that User_i will
award the same rating to an item as the user who is most similar to User_i.
For example, consider the UM from Fig. 5 and suppose we want to predict the
rating UA will give for I4 . Both UB and UC have rated I4 , therefore, two similarities
have to be computed between the vectors UA , UB and UA , UC . Let us assume we are
using the Euclidean distance, as defined by (9), as a similarity measure between the
two vectors. Because data sparsity - the lack of ratings - is an issue for memory-based
CF techniques, only items that have been co-rated by both users are considered
when computing the similarity between two users.
Thus, d(UA, UB) = √5, while d(UA, UC) = √27, and because d(UA, UB) <
d(UA, UC), UA is more similar in behavior to UB than to UC; therefore the predicted
rating of UA for I4 is 4, which is a strong indication that the item should be
recommended.
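The computation above can be sketched as follows, restricting the distance to co-rated items (my own illustration over the data from Fig. 5):

```python
import math

# Utility matrix from Fig. 5; missing ratings are simply absent.
ratings = {
    "UA": {"I1": 4, "I2": 5, "I3": 4},
    "UB": {"I1": 5, "I2": 3, "I4": 4},
    "UC": {"I1": 1, "I2": 2, "I3": 1, "I4": 3},
}

def corated_distance(u, v):
    """Euclidean distance restricted to items rated by both users."""
    common = ratings[u].keys() & ratings[v].keys()
    return math.sqrt(sum((ratings[u][i] - ratings[v][i]) ** 2
                         for i in common))

print(corated_distance("UA", "UB"))  # sqrt(5),  about 2.24
print(corated_distance("UA", "UC"))  # sqrt(27), about 5.20
# UB is closer to UA, so UB's rating for I4 (4) becomes the prediction.
```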
2.5.3
Item-Item filtering
The item-centered approach is orthogonal to the user-centered approach. Instead of
computing the similarity between the vectors of ratings awarded by two users, the
item-item CF computes the similarity between the ratings awarded by all the users
to a pair of items. Let us consider again the example from Fig. 5 and assume we
would like to determine the ratings UB will give for I3 . Because UB has awarded
ratings to items I1 , I2 , I4 , we need to compute three similarities, between the pairs
(I3, I1), (I3, I2) and (I3, I4). Using (9) and the same heuristic as in the user-user
filtering technique - here, only users who have rated both items of a pair are
considered - we obtain the following values:
d            value
d(I3, I1)    0
d(I3, I2)    √2
d(I3, I4)    √4 = 2
I1 best approximates the rating UB will give for I3; hence the predicted
rating is 5. Again, this is a strong indication that I3 will be liked by UB and
therefore it should be recommended.
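The same computation can be sketched for the item-centered case; the per-item dictionary below is simply the transpose of the utility matrix in Fig. 5:

```python
import math

# Ratings from Fig. 5, organized per item; missing ratings are absent.
item_ratings = {
    "I1": {"UA": 4, "UB": 5, "UC": 1},
    "I2": {"UA": 5, "UB": 3, "UC": 2},
    "I3": {"UA": 4, "UC": 1},
    "I4": {"UB": 4, "UC": 3},
}

def corated_distance(i, j):
    """Euclidean distance over users who rated both items."""
    common = item_ratings[i].keys() & item_ratings[j].keys()
    return math.sqrt(sum((item_ratings[i][u] - item_ratings[j][u]) ** 2
                         for u in common))

for other in ("I1", "I2", "I4"):
    print(other, corated_distance("I3", other))
# I1 -> 0.0, I2 -> sqrt(2), I4 -> 2.0: I1 is closest to I3, so UB's
# rating for I1 (5) becomes the prediction for I3.
```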
2.5.4
Collaborative filtering advantages
The memory-based CF techniques used for recommendations have a series of advantages that make them attractive and very popular to use; [31], present a list of
these advantages.
First, as compared to their content-based counterparts, CF recommendation
systems do not require a representation of the items in a potentially n-dimensional
space. This solves a big challenge because, in a content-based setting, a pre-processing
step is often required to extract the relevant features for each item.
Second, and connected to the first point, when adding new items in the system
no pre-processing steps are required. An item is simply added in the collection with
no rating and can immediately be included in the recommendation process.
Finally, another major advantage of memory-based CF techniques is their low
complexity in implementation. Usually, a matrix representation of the problem and
choosing a similarity metric is all that is needed to set up such a recommendation
system.
2.5.5
Examples from literature
Pazzani presents in [25] a memory-based collaborative recommendation system
which predicts the ratings that users award to restaurants. The author uses implicit
feedback methods to infer a user's rating behavior. Thus, if a user has added
the restaurant's web-page to her online profile, the restaurant is considered to have
received a positive rating.
The author addresses both user-centric and item-centric approaches in his
experiment. As a similarity metric, he chooses the Pearson r correlation coefficient,
defined by equation (12), to find the degree of correlation between two
users; next, the algorithm predicts the rating for an item as a weighted average of
the ratings the other users have awarded for the same item.
r(x, y) = Σ_{d∈docs} (R_{x,d} − R̄_x)(R_{y,d} − R̄_y) / √( Σ_{d∈docs} (R_{x,d} − R̄_x)² · Σ_{d∈docs} (R_{y,d} − R̄_y)² )   (12)

where R_{x,d} is the rating awarded by user x to document d in the collection and
R̄_x is the average rating awarded by user x to all the documents she has rated.
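A minimal sketch of equation (12) over two hypothetical users follows; in this sketch each mean is taken over the user's own rated documents, and the sums run over the documents rated by both:

```python
import math

def pearson_r(rx, ry):
    """Pearson correlation (equation 12) between two users' ratings."""
    common = rx.keys() & ry.keys()
    mean_x = sum(rx.values()) / len(rx)
    mean_y = sum(ry.values()) / len(ry)
    num = sum((rx[d] - mean_x) * (ry[d] - mean_y) for d in common)
    den = math.sqrt(sum((rx[d] - mean_x) ** 2 for d in common)
                    * sum((ry[d] - mean_y) ** 2 for d in common))
    return num / den

# Hypothetical ratings over a shared set of documents.
x = {"d1": 5, "d2": 3, "d3": 1}
y = {"d1": 4, "d2": 2, "d3": 1}
print(pearson_r(x, y))  # close to 1: the users rank documents similarly
```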
Another example of collaborative filtering is detailed in [5]; the authors use web-page tag aggregation - or folksonomies - to enrich the recommendations delivered
using CF techniques. A folksonomy is defined as a tripartite graph consisting of
users, web-pages and tags. A user is connected to a web-page if she has added that
page to her profile. A tag is connected to both a web-page and a user, if the user has
used that tag to mark that specific web-page. Thus, the folksonomy is represented
as 3D matrix, containing implicitly collected information, with all items added by a
user receiving a score of 1 in the matrix.
The authors build standard user-user and item-item CF algorithms using a
k-Nearest Neighbor approach (only the top-k most similar users or items are
considered for predicting ratings) and leveraging aggregated information from the matrix
representation of the folksonomy:
• R - matrix containing ratings awarded by the users for the pages. This is a
binary matrix containing 0 if a user has not added the web-page to her profile,
and 1 if she did.
• UI - matrix containing how many tags each user has assigned to each item
• UT - matrix specifying how many times a user has used a certain tag to
annotate the items
• IT - matrix specifying how many times a tag has been used to annotate a
page
To compute similarity between users and items they use the cosine similarity measure
as defined in equation (6).
Furthermore, they investigate the possibility of using tag overlap to measure
similarity between users or items. To this end, they use the matrices UT and IT
and compare the performance of the Jaccard index, Dice's coefficient⁵ and cosine
similarity in measuring similarity.
The results show that the user-user CF algorithm performs better than its
item-item counterpart, because the average number of items per user is much higher
than the average number of users per item, which in turn leads to less sparse user
vectors when computing the similarity measurement [5]. For the tag-overlap
similarity the results are mirrored; the authors attribute this finding to the
higher average number of tags per item, compared to the average number of items per user.
The findings from [5] are consistent with real-world behavior, where it is more
common to have many users each purchasing a large number of different items than
to have a small set of very popular items purchased by all the users. For this reason,
user-centric CF approaches are more popular and more frequently deployed
in real-life situations.
The Long Tail
Moreover, the findings from [5] are also backed up by the long-tail theory, which sits
at the root of recommendation systems. According to the long-tail theory, there
is a significantly larger pool of unpopular or unrated items compared to a smaller
pool of very popular ones [3]. The purpose of recommendation systems is to
recommend less popular items residing in the long tail to a larger number of users;
hence the phrase “less is more”. To better illustrate this concept, consider the plot
in Fig. 6, in which the x-axis represents the items participating in a recommendation
system, while the y-axis shows the number of times an item has been rated. The
plot was obtained using the data from Table 1.
⁵ https://www.google.fi/?gfe_rd=cr&ei=M76NU7flH-nJ8gfa_4HoBg
Item        I1   I2   I3   I4   I5   I6   I7   I8   ...   I100
# Ratings  100   65   45   35   20   10    5    5   ...      5

Table 1: Distribution of ratings over a pool of 100 items
As can easily be seen, the head of the tail (suppose we define as belonging to
the head only those items that have at least 20 ratings, that is I1 − I5)
contains a total of 265 ratings, while the long tail contains 480, almost double the
amount. For the sake of argument, let us suppose that the action of rating implies
that the user has bought the item, and that all items have the same price. Thus, even
though items I1 − I5 are very popular among the community of users, they have
generated only about 55% of the revenue generated by the rest of the items, I6 − I100,
residing in the long tail. Hence the motivation of recommendation systems to recommend
new, unseen items.
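The head and tail totals can be verified with a few lines over the distribution from Table 1:

```python
# Ratings distribution from Table 1: I1..I6 as listed, then
# items I7..I100 each with 5 ratings.
ratings = [100, 65, 45, 35, 20, 10] + [5] * 94

head = sum(r for r in ratings if r >= 20)  # items with at least 20 ratings
tail = sum(r for r in ratings if r < 20)   # the long tail

print(head, tail)  # 265 480
```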
Figure 6: The long tail
We limit the discussion of the various collaborative filtering techniques at
this point. Although collaborative filtering is very popular and widely deployed in
real-life scenarios, it will not be implemented in the recommendation system presented
in this paper. The reason I do not consider it relevant for this study is that
the addressed domain — health and well-being — is of such a nature that contribution
from the community is much less important than the domain expert's knowledge and
the content of the recommended items.
2.6
Demographic filtering
A method that combines concepts from both content-based and collaborative
recommendations is demographic filtering. When implementing demographic filtering,
a user is represented as a set of features, much like in a content-based approach.
However, the vector of features characterizing a user consists of demographic data
such as gender, age, income, location or profession. Most of the features that define
a user's profile are typically collected in an initialization step, shortly after the user's
registration on the platform; usually, the features are specified explicitly.
The purpose of demographic filtering is to identify typologies of users — users
that like a similar product — and focus the offering for that specific market segment
[25]. In order to identify classes of users, user-user similarity is usually computed
using one of the metrics mentioned in section 2.3 (cosine similarity, Euclidean distance, Jaccard coefficient). In that sense, demographic filtering uses concepts typical
for the user-centric collaborative approach.
Fig. 7 presents the concept of user similarity from a demographic perspective.
Figure 7: Demographic filtering architecture
2.6.1
Examples from literature
Recognizing that eliciting user demographics can be quite a challenge, Pazzani
describes in [25] an alternative approach to obtain demographic information for the
users of his system. By crawling and text-mining the home-pages of existing users,
the author minimizes the effort required to obtain demographic information.
with the content-based algorithm of the author, which was described in section 2.4.3,
the Winnow algorithm is used to learn the relevant characteristics of home pages.
User similarity is then calculated between inferred profiles and a recommendation
is computed.
Vozalis and Margaritis use demographic data to enhance the results of the baseline collaborative user-user and item-item recommendation algorithms [35]. A user
in their system has three demographic components: age - split into four categories,
gender - split into two categories, and occupation - split into 21 categories; for each
component, one state of the category is possible at a time. The resulting vector of
features has 27 dimensions; an example is
u = <0,1,0,0, 0,1, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1>,
representing a user in the second age category (e.g., 25-35 years old), female,
and having a job that fits in category 21 (e.g., accountant).
For computing similarity between the vectors of demographic data, cosine similarity
is used as a metric. The similarity score is then multiplied by the correlation score
between the two vectors of ratings awarded by the users for various items and thus
an enhanced metric is obtained.
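The enhanced metric from [35] can be sketched as follows; the two one-hot user vectors and the rating-correlation score below are hypothetical illustrations:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity, per equation (6)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(x * x for x in b)))

# One-hot demographic vectors (4 age + 2 gender + 21 occupation slots):
# hypothetical users sharing age and gender but not occupation.
u1 = [0, 1, 0, 0] + [0, 1] + [0] * 20 + [1]
u2 = [0, 1, 0, 0] + [0, 1] + [1] + [0] * 20

demo_sim = cosine_similarity(u1, u2)      # 2 shared features out of 3 each
rating_correlation = 0.8                  # hypothetical rating-based score
enhanced = demo_sim * rating_correlation  # combined metric, as in [35]
print(round(demo_sim, 3), round(enhanced, 3))
```

Note that with one-hot vectors all magnitudes are comparable, which mitigates the magnitude-blindness of cosine similarity discussed in section 2.3.3.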
Demographic data is an important element for the domain addressed in this paper,
as well as for the proposed recommendation system. Equally relevant is the
representation of the vector of features as a binary vector in a multi-dimensional
space; such a representation, although it significantly increases the dimensionality
of the working space, compensates for the shortcoming of the cosine-similarity
method of not accounting for the magnitude of the vectors. Consequently, this
representation will be considered as an alternative representation in the vector-space
model of the items, and its performance will be tested in order to take an informed
design decision with respect to the implementation of the proposed algorithm.
However, at the point of development, I did not possess enough accurate
demographic information about the users to properly leverage it in the system. Thus, the
lack of consistent demographic data constitutes a limitation of this system, and the
implementation of a demographic component is deferred to future work.
2.7
Common problems and limitations of recommendation systems
Taken individually, each of the above methods has certain problems and limitations
which will be discussed in the following paragraphs.
2.7.1
Over-specialization
Over-specialization refers to a RS's incapacity to recommend items substantially
different from the ones the user has rated highly. Over-specialization is a
limitation of CBR systems. For example, in a system recommending news articles, if
the user has indicated as relevant some articles on the theme “Nokia acquired by
Microsoft”, the vast majority of the future recommendations will be on the same
theme.
Techniques to overcome over-specialization include building hybrid RSs (using a mix of CBR and CF), including a small random component in the final result [19], or mixing the set resulting from CBR with the set resulting from demographic filtering. One could also prevent the recommendation of overly similar results by shifting the threshold of the similarity measure [19] (i.e., instead of recommending items that score 80% or higher in the similarity test, recommend items whose similarity lies in a range between 60% and 75%).
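The similarity-band idea can be sketched in a few lines (the threshold values follow the example above; the item names are invented):

```python
def band_filter(scored_items, low=0.60, high=0.75):
    """Keep items whose similarity to the user profile falls inside a band,
    excluding near-duplicates above `high` to curb over-specialization."""
    return [(item, s) for item, s in scored_items if low <= s <= high]

candidates = [("a", 0.91), ("b", 0.72), ("c", 0.66), ("d", 0.40)]
band_filter(candidates)
# the 0.91 near-duplicate and the 0.40 poor match are both dropped
```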
2.7.2 Limited content analysis
CBR systems are also limited with respect to the size and contents of the items' profiles. Whether the data cleansing is performed manually or automatically, only a limited set of features can be extracted; often a trade-off needs to be made with regard to what an item profile should contain. For example, in a recipe recommendation system, some recipes crawled from one website are characterized by features like cuisine, ingredients and occasion, while recipes from a different website are characterized by ingredients, number of servings and diet. A decision has to be made with respect to which feature set should be used. Of course, a middle solution would be to use both, but then data-sparsity issues may arise, as different item profiles will not exhibit the same consistency.
Moreover, to accurately represent items for a CBR, domain-expert knowledge is often needed [19]. Using again the recipe recommendation system, one would have to validate that the recipes indeed have a minimum set of features that deems them usable for the users of the system. The role of the domain-expert will be further detailed in section 3.
2.7.3 Cold-start (new-user)
Both CBR and CF systems struggle to provide recommendations for users who have newly entered the system. Without a minimal base of reasoning, a CBR will not be able to provide reliable recommendations. In the CF setting, the problem is related to the data-sparsity problem: the UM used to compute similarity between new users and new items is usually very sparse (provided the user base and/or the collection of items is large), as only a small fraction of the user-item pairs will typically have a rating assigned.
In a CF system, a new user corresponds to an empty row (or column, depending on how the information is represented) in the UM. A user for whom no prior ratings are available will not be recommended any items: her vector of ratings will always have all elements equal to 0 and, according to equation (6), the dot-product will also be 0; therefore, a rating prediction cannot be computed.
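The all-zeros situation is easy to see numerically:

```python
import numpy as np

new_user = np.zeros(5)                 # no ratings yet
other_user = np.array([4, 0, 5, 3, 0])  # an established user's ratings

dot = float(new_user @ other_user)  # 0.0 regardless of the other vector
# With a zero dot-product, every similarity weight involving the new user
# is zero, so a weighted-average prediction has no signal to draw on.
```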
2.7.4 Cold-start (new-item)
This problem is common for CF systems. An item without any prior ratings cannot yield rating predictions, for the same reasons as above. CBRs, though, are not affected by this shortcoming, because this class of recommendation systems focuses on the content of items.
2.7.5 Serendipity
Serendipity is the counterpart of over-specialization. It is a desired feature of a RS, and a limitation if the RS does not possess it. Serendipity is the ability of the RS to provide surprising recommendations that the user would otherwise not have the chance to come across [19]. There is a clear distinction between the novelty property of a RS and serendipity, and it stems from the probability of the user discovering the recommended item: a novel item has a higher probability of being discovered even if not recommended. Thus, an item that is serendipitous is also novel, but the reverse does not hold [19].
2.7.6 Shilling-attacks
The “shilling-attack” method exploits a vulnerability of collaborative recommendation systems. Shilling-attacks entail creating fake user profiles which all rate a specific set of target items in the same manner. The same fake profiles then rate other items such that their rating behavior becomes similar to that of regular users. The final outcome is that, in a user-centric CF approach, regular users will be recommended the target items, on the account that the fake user has a rating behavior similar to the active user's [7]. There are two forms of shilling-attacks: “push” (promote) attacks, in which the target items are rated highly and the outcome is the one previously mentioned; and “nuke” (demote) attacks, in which the target items are rated poorly and, as a result, do not get the chance of being recommended due to their poor average score.
2.7.7 Gray sheep
“Gray-sheep” users are defined as users whose tastes are unusual compared to the rest of the community [12]. Their ratings partially agree with some users and partially disagree with others. Two potential problems may result from this scenario: first, “gray-sheep” users may not receive accurate recommendations due to their inconsistent purchasing/rating behavior; second, their contribution to the system is unreliable and might affect the quality of the recommendations for the other users. The authors of [12] present a solution for detecting “gray-sheep” users: they adapt the k-Means++ clustering algorithm to cluster the utility matrix and detect gray-sheep users, and use the results to generate accurate recommendations for this category of users.
2.8 Hybrid filtering
To compensate for the disadvantages previously mentioned, and augment the strong
points of each approach, various hybrid methods have emerged.
One trivial approach would be to separately implement a content-based system
and a collaborative one. Each system will generate a set of results which can then
be combined in a final list, using various heuristics to rank the list in a way which
is meaningful for the user [2]. Another approach, as previously seen in section 2.6,
is to use demographic data to boost the results of collaborative filtering.
Over time, practitioners have observed the advantages of building hybrid recommendation systems. The next section aims at presenting some of the most interesting examples encountered in the literature. Approaches that are a hybrid of CBR and CF techniques will be presented, but more “exotic” implementations are also considered.
2.8.1 Classic examples of hybridization
Collaboration via content
In line with one of the hybridization techniques proposed by Adomavicius and Tuzhilin [2], Pazzani [25] merges content-based and collaborative techniques in a single recommendation model. His method - collaboration via content - exploits the content-based profile of each user (described in section 2.4.3) to compute similarity between pairs of users using the Pearson measure defined by equation (12). Ratings are predicted using the same technique described in section 2.5.5: by computing a weighted average of the ratings all users have awarded for a certain restaurant, where the weights are the previously calculated correlation factors.
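A rough sketch of this prediction scheme (the data structures and toy profiles below are my own illustration, not Pazzani's implementation):

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation between two users' content-based profiles."""
    return float(np.corrcoef(u, v)[0, 1])

def predict(target_profile, neighbors, item):
    """Weighted average of the neighbors' ratings for `item`, with each
    neighbor weighted by her profile correlation to the target user."""
    num = den = 0.0
    for profile, ratings in neighbors:
        if item in ratings:
            w = pearson(target_profile, profile)
            num += w * ratings[item]
            den += abs(w)
    return num / den if den else None

# Toy profiles over three content features; both neighbors rated item "r1":
predict([5, 3, 4], [([4, 2, 3], {"r1": 4}), ([5, 4, 5], {"r1": 3})], "r1")
```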
The results obtained with this approach (accuracy of 70.1% for the predicted
ratings) outperform the user-user CF approach (67.9%), item-item CF approach
(57.9%) and the pure content-based approach (61.5%).
Merging all algorithms together
Another approach, again proposed by Pazzani in [25], merges all four previously mentioned algorithms in the following manner:
1. each algorithm runs to provide a list of recommendations, out of which the top 5 are retained;
2. each recommendation receives a number of points equal to 6 − k, where k is its rank (i.e., rank 1 - 5 points, rank 2 - 4 points, etc.);
3. the results are aggregated into one list, with the points for the same recommendation being added together.
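The rank-scoring steps above can be sketched as follows (the list contents are illustrative):

```python
from collections import Counter

def aggregate(top_lists):
    """Rank k in a top-5 list earns 6 - k points; points for the same
    recommendation are summed across all contributing algorithms."""
    points = Counter()
    for top in top_lists:
        for rank, item in enumerate(top, start=1):
            points[item] += 6 - rank
    return [item for item, _ in points.most_common()]

aggregate([["a", "b", "c"], ["b", "d", "a"]])
# "b" collects 4 + 5 = 9 points and therefore ranks first
```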
On average, this method is successful in 72.1% of the cases and outperforms all the other methods, thus demonstrating that hybrid techniques can successfully be used to overcome the limitations of stand-alone methods.
Quickstep
Quickstep [23] is a hybrid RS combining CBR and CF techniques for research paper
recommendations. The research papers are represented as vectors of relevant terms.
When parsing a paper, term weights are computed as

w = term frequency / total number of terms    (13)

and a pre-processing step is implemented that applies Porter stemming (eliminating suffixes, prefixes, etc.) and removes stop words and common words.
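Equation (13) amounts to a relative term frequency; a minimal sketch, assuming tokenization, stemming and stop-word removal have already happened:

```python
from collections import Counter

def term_weights(tokens):
    """Weight of each term = its frequency / total number of terms (eq. 13)."""
    total = len(tokens)
    counts = Counter(tokens)
    return {t: c / total for t, c in counts.items()}

term_weights(["recommend", "paper", "recommend", "topic"])
# {"recommend": 0.5, "paper": 0.25, "topic": 0.25}
```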
As in [33], the authors use implicit feedback heuristics that assign different values to user actions (i.e., browsing a paper, following a recommendation, rating a paper as relevant or not); however, providing explicit feedback is also possible and helps enhance the system's accuracy. Based on the user actions, a topic interest value is computed.
For labeling new research papers, an inductive supervised learning method (nearest-neighbor) is used in conjunction with a multi-class representation, where each class represents a research-paper topic. The authors investigate the possibility of augmenting user profiles with a research-paper ontology. Thus, when a paper receives interest (as described above), its immediate super-class receives a share of that interest (i.e., 50%), the next super-class a smaller share (25%), and so on until the top level of the ontology is reached [23].
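A sketch of this interest propagation, assuming a simple parent-pointer ontology and a halving share per level (the topic names are invented):

```python
def propagate_interest(topic, parents, amount, share=0.5):
    """Give `topic` the full interest, then pass a halving share up the
    chain of super-classes (50%, 25%, ...) until the root is reached."""
    interest = {}
    node, value = topic, amount
    while node is not None:
        interest[node] = interest.get(node, 0.0) + value
        node = parents.get(node)
        value *= share
    return interest

ontology = {"neural-networks": "machine-learning", "machine-learning": "ai", "ai": None}
propagate_interest("neural-networks", ontology, 1.0)
# {"neural-networks": 1.0, "machine-learning": 0.5, "ai": 0.25}
```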
The recommendations are delivered following a matching between the user‘s
current topic of interest and the papers classified as belonging to those topics. The
confidence with which a recommendation is delivered is obtained by the equation:
recommendation confidence = classification confidence × topic interest value (14)
The collaborative component consists of users being able to provide new examples of topics and to correct papers that were assigned to a wrong class.
I consider the augmentation of the topic list with a research-topic ontology to be valuable, as it might alleviate the over-specialization problem to which the content-based filtering component is prone. This is all the more important as the efficiency of the collaborative component (which assumes a user will explicitly modify a paper's class) is questionable and conflicts with the authors' decision to use implicit feedback techniques to increase the unobtrusiveness of the system.
Personalized Learning Recommendation System (PLRS)
Lu [20] presents PLRS, a framework for personalized learning recommender systems, consisting of four components: student profile builder, student requirement
identification, learning material matching analysis and learning recommendation
generation.
The student profile is built from a mix of implicit and explicit student actions. As a source of intentional (implicit) information, PLRS uses web-mining techniques to analyze the click-stream of a student in an e-learning platform. This analysis reveals the behavior of a student in terms of what materials she is viewing and what she considers to be of interest, and serves as the basis for the CBR component of the recommendation algorithm.
The student requirement identification component uses multi-criteria analysis to build a model of the student. The reason provided by the author is that student requirements are difficult to approximate through precise values, and fuzzy values would better fit this domain (requirements are represented as “important”, “less important”, etc.). Furthermore, the criteria for student requirement identification are enriched through a mix of demographic filtering (collecting requirements from students with similar learning styles and membership in different academic groups - business faculty, science faculty) as well as collaborative filtering (inspecting other students' access to learning materials).
The learning material matching analysis makes use of a set of matching rules
and a learning material tree to match a set of requirements against a learning material set. Finally, the recommendations are delivered to the student using a top-N
technique.
2.8.2 “Exotic” hybrid approaches
Conversational recommendation system - MobyRek
Ricci and Nguyen [27] challenge the efficiency of implicit user feedback as a source of information in recommendation systems, and underline two major problems with this method: first, user actions need to be interpreted and translated into meaningful user profiles; second, implicit feedback is often noisy, as the reason for and objective of a user action varies from user to user and from context to context.
With this in mind, they propose MobyRek, a conversational recommendation system developed for mobile platforms. Conversational recommendation systems assume human-computer interaction in successive cycles, the result of the recommendation being adjusted after each cycle until it converges to the user's desired outcome. MobyRek recommends restaurants to users, where a restaurant is modeled as a vector in an n-dimensional vector space.
A user query is composed of three parts: the logical query (QL) models the “must” conditions that need to be independently satisfied by the recommendations; the favorite pattern (p) models the “should” conditions, of which as many as possible should be satisfied; finally, a vector (w) reflects the importance of some features over others.
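One possible reading of this three-part query as code (the scoring rule and feature names are my assumptions; [27] defines its own similarity computation):

```python
def score(item, must, pattern, weights):
    """Items failing any 'must' condition are filtered out; survivors are
    ranked by the weighted fraction of 'should' conditions they satisfy."""
    if not all(cond(item) for cond in must):
        return None  # excluded by the logical query Q_L
    total = sum(weights.values())
    hit = sum(w for feat, w in weights.items() if item.get(feat) == pattern.get(feat))
    return hit / total if total else 0.0

restaurant = {"cuisine": "italian", "price": 2, "terrace": True}
score(restaurant,
      must=[lambda r: r["price"] <= 3],
      pattern={"cuisine": "italian", "terrace": False},
      weights={"cuisine": 2.0, "terrace": 1.0})
# cuisine matches (weight 2) but terrace does not, so the score is 2/3
```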
When the user initializes a recommendation session, the user‘s past history is
consulted to retrieve an initial list of recommendations. A list of ranked recommendations (based on the vector w) is delivered for the user, in which case one of three
possibilities may occur. If the user considers one of the recommendations appropriate, she may choose it and the process terminates; the recommendation is added to
the user‘s history and will be used in the future to provide new recommendations.
If none of the recommendations is appropriate and the user terminates the session,
the current case is recorded as a failed one and, again, used for future reference.
Finally, the user may consider that a recommendation might suit her needs but
some features are not completely in line with her requirements. She can choose to
criticize the recommendation and indicate new features, as well as their type - “wish”
or “must”. A new set of recommendations will be computed and again, the three
situations are possible.
Results of MobyRek's evaluation show that the critique-based RS converges to a successful recommendation in 2-3 cycles. There may be situations in which this approach is effective (especially on a desktop/laptop computer); however, given users' reluctance to provide feedback on recommendations (even in one cycle, and even when the recommendation is a good match), I consider that a critique-based RS would generally have a hard time eliciting user preferences, particularly when accessed from a mobile platform.
Networks of recipe ingredients
Teng et al. [32] present a recommendation system for recipes that leverages the information encoded in a network of ingredients. Parsing the collection of recipes and their ingredients, the authors use pointwise mutual information6 to determine which ingredients occur together, and build the complement network; the substitute network, derived from mining user-generated content for suggested recipe modifications, indicates which ingredients can be replaced by others.
To predict recipe ratings, the authors apply stochastic gradient boosted trees and support vector machine techniques [32] and demonstrate that the structure of the ingredient network contains valuable information, which improves the recommendation results.
At its core, the method is another hybrid approach, in which the content-based component is represented by the ingredients included in each recipe, while the contributions of the community of users are leveraged to determine the substitute network of ingredients.
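Pointwise mutual information over ingredient co-occurrence counts can be sketched as follows (the counts are invented):

```python
import math

def pmi(count_xy, count_x, count_y, n):
    """Pointwise mutual information of two ingredients, computed from their
    co-occurrence count, individual counts, and total number of recipes."""
    p_xy = count_xy / n
    p_x, p_y = count_x / n, count_y / n
    return math.log2(p_xy / (p_x * p_y))

# Two ingredients appearing in 100 recipes each out of 1000, co-occurring in 40:
pmi(40, 100, 100, 1000)
# positive: they co-occur 4x more often than independence would predict
```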
2.9 Summary
For a convenient overview of this chapter, table 2 summarizes the main concepts discussed above: the main types of recommendation techniques, the data each technique uses, the advantages and disadvantages of each approach, and examples from the reviewed literature.
6 http://en.wikipedia.org/wiki/Pointwise_mutual_information
Content-based filtering
• Data used: features of items; vector-space representation of an item; the active user's ratings for items
• Advantages: relies only on the ratings the active user has awarded; does not suffer from the first-rater problem; easy explanation of the recommendation
• Disadvantages: over-specialization; limited content analysis; limited serendipity
• Examples: Pazzani [25], BlogMuse [36], PRES [33]

Collaborative filtering
• Data used: ratings awarded by users to items (user-centric or item-centric)
• Advantages: low development complexity; data pre-processing not required; item dimensionality reduction
• Disadvantages: cold-start; “gray sheep”; shilling-attacks; scalability issues
• Examples: Pazzani [25], Folksonomies [5]

Demographic filtering
• Data used: user demographic data
• Advantages: can be used to enhance the recommendations of the previous approaches
• Disadvantages: demographic data is not accurate and is difficult to collect
• Examples: Pazzani [25], Vozalis et al. [35]

Hybrid filtering
• Data used: merges concepts from several of the above approaches, either sequentially or in parallel
• Advantages: able to overcome the weak points of individual approaches while building on their strong points
• Disadvantages: high development complexity
• Examples: collaboration via content and merging several techniques together [25], Quickstep [23], PLRS [20], MobyRek [27], Teng et al. [32]

Table 2: Recommendation systems - overview
3 Expert systems
Edward Feigenbaum, considered to be the father of expert systems (ES), defines
them as “an intelligent computer program that uses knowledge and inference procedures to solve problems that are difficult enough to require significant expertise”
[28].
DeTore [9] defines ES as computer programs that exploit knowledge from human experts in order to solve problems in a non-procedural manner. A similar definition is provided in [30], where ES are perceived as computerized systems with embedded human expert problem-solving knowledge and inference capabilities. Williams [37] sees the potential of expert systems as alternatives to human experts and, again, as deployable in a wide range of narrow domains.
Sasikumar et al. [28] refer to ES as applications which should be able to solve
very complex problems at least as well as human experts. In fulfilling this goal,
they do not make use of algorithms, but rather rules of thumb from a very specific
domain. Singh [29] reinforces the statement that ES should be deployed in a very
specific and limited domain.
A few recurrent topics emerge from the above definitions. First, there is unanimous agreement that an expert system should embed the knowledge of a human expert. In that sense, DeTore [9] makes a clear distinction between knowledge and information: while information can exist by itself, knowledge is information processed such that a decision can be made based on it.
Second, an expert system‘s domain of applicability should be very narrow. This
idea is strongly connected to the one above. Because an expert system should
replace the interaction of the user with the human expert, the person that provides
the domain-knowledge for the system needs to be highly proficient in that domain.
A human individual can achieve excellence in a specific domain only if she dedicates
the majority of her time investigating that domain. Hence, an expert system which
will leverage the deep, focused knowledge of a human individual, will typically be
very narrowly scoped.
Finally, an expert system should mimic the interaction between the user and
the human expert from one end of the experience to the other. This is typically
achieved through a set of “if-then” rules, which are fired in a cascading manner [28].
For an enhanced user experience the line of reasoning used to infer certain facts can
be explained at the end of the decision making process [37].
3.1 Expert systems architecture
The default architecture of an ES is presented in Fig. 8.
The knowledge base (KB) contains the domain specific knowledge. It is the
task of the knowledge engineer to collect and encode the human knowledge from a
domain-expert, such that a computer can understand it. The KB typically consists
of both theoretical as well as practical knowledge (heuristics and rules of thumb)
[9]. The knowledge can be represented either as past cases, or if-then rules.
The working memory represents a set of facts used to describe a particular situation. It encompasses all the inputs of a program and determines the starting point
of the inference engine. Unlike an algorithm, an ES can start at different points of
its flow, depending on the current data [9].
Figure 8: Expert systems - basic architecture
The inference engine is the heart of ES; it schedules the rules from the KB
to determine their sequence of execution and fires them using the data from the
Working memory block as input parameters [28]. The inference engine works by
reasoning (chaining facts) about the problem at hand. It does so in one of two
possible ways: forward chaining or backward chaining.
Forward chaining assumes constructing a solution starting from the initial information. The approach is suitable in situations in which there are a small number of initial conditions and a large number of potential solutions [9]. For this reason, forward chaining is also considered a data-driven approach to solving a problem.
Backward chaining selects a possible answer and navigates backwards to see if the input parameters match. It is a suitable technique when there are many initial conditions, but few possible results. Backward chaining is considered to be a hypothesis-driven approach.
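A minimal forward-chaining loop over if-then rules might look as follows (the wine-themed facts are purely illustrative):

```python
def forward_chain(facts, rules):
    """Repeatedly fire rules whose premises are all satisfied, adding their
    conclusions to the working memory until nothing new can be inferred."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and premises <= facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [({"red", "dry"}, "tannic"),
         ({"tannic", "beef"}, "recommend-cabernet")]
forward_chain({"red", "dry", "beef"}, rules)
# infers "tannic" first, which in turn triggers "recommend-cabernet"
```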
The user interface is the part through which the user interacts with the expert system, and the main entry point of data into the program. Depending on the type of application, an ES can communicate with its users in either an interactive or a non-interactive way [9]. For example, a wine-advisor expert system might model the interaction with the user as a series of questions: each time the user provides an answer, the ES adjusts the interaction to fit the newly enriched context. On the other hand, a recommendation system that makes use of domain-expert knowledge might deliver recommendations in a non-interactive way, simply by populating the working memory with facts from the user's past behavior.
3.2 The knowledge acquisition process
The knowledge acquisition process has two actors. On one hand, there is the knowledge engineer (KE), a person with extensive knowledge in using and building expert systems; to some extent, the KE should also act as a solution architect when defining how the rules should fire. On the other hand, there is the domain expert, who provides the knowledge to be represented in the KB. The KE observes the domain-expert taking decisions and reasoning about various situations; in addition, the domain-expert should explain her reasoning to the KE. The task of the KE is to accurately translate the domain-expert's knowledge into the set of rules which will later be incorporated in the KB.
When eliciting knowledge, the KE should consult a number of different sources (such as textbooks or reference manuals) to reinforce, enrich and understand the knowledge shared by the domain-expert [28].
3.3 Limitations and pitfalls
Sasikumar et al. list in [28] a set of possible characteristics a domain should have in
order to support the decision of building an ES for it:
• A domain expert should be available and willing to share her knowledge about
the area
• The problem the system is trying to solve could be solved by talking to the
domain-expert in person
• The domain expert can solve the problem in a short amount of time
• The domain expert builds up her skills gradually as she solves more cases
• There is a book or manual which contains the same knowledge as the domain
expert possesses
Even though some, or all of the above items may be present in a situation,
there are still limitations and pitfalls of which ES users should be aware. They are
discussed in the following paragraphs.
3.3.1 Choosing the right problem
Choosing too difficult a problem will require more resources: for example, the problem may span several domains, thus requiring more domain-experts and perhaps more KEs; it can also translate into an increased number of rules in the KB, which will negatively impact the development time and even the quality and performance of the final system.
Also, from a business standpoint, the problem solved by the ES must justify the
costs involved by the development.
Finally, from a technological perspective, for a problem to be suitable for an ES,
it should not be easily solvable using an algorithmic approach.
3.3.2 Collaborating with the domain-expert
The interaction with a domain expert can be a tedious and sometimes frustrating
process. As stated earlier, the knowledge acquisition process should happen with
the KE observing actions performed by the domain-expert. It might happen that
the domain-expert does not find enough time to schedule meetings for interviews
and observations through which the knowledge is elicited.
Provided the KE and the domain-expert find common ground for the observations to take place, the KE must then demonstrate sufficient skill to interpret the rules, which the domain-expert may specify in either a too simplistic or a too complex way.
Finally, the KE might find herself in the situation of having to cooperate with a domain-expert who doubts the effectiveness of an ES, which makes the cooperation more difficult.
3.3.3 Liability issues
Although it might not always be the case, managers who decide to implement an ES as part of their offering should be aware of the implications that potential failures entail. All actors involved in the life-cycle of an ES are, to some extent, subject to legal action [24]. For example, communication problems during the knowledge acquisition process could lead to situations where the KE, the domain-expert or the company owning the ES might be charged with negligence. On one hand, KEs can misinterpret information transmitted by the domain-expert, or they can invalidate it due to biased opinions or self-overrated capabilities. To the same extent, domain-experts might not be able to properly articulate the knowledge they are trying to pass on, or might not completely recall their line of reasoning with respect to specific situations. Companies may come within the scope of the law simply by being situated higher in the hierarchy responsible for the ES [24].
End-users are also responsible for their actions with respect to an ES, especially in the interactive subclass of ESs: their (possibly erroneous) responses and (possibly poorly) formulated queries impact the final recommendation an ES will provide.
This subsection will stop here, as the liability issues connected to ES operation are far more involved and out of scope for this study. For further law-related details the reader is referred to [24].
3.4 Combining recommendation systems and expert systems
The approach of merging recommendation systems and expert systems into a single information system is not new. This section reports on previous attempts to create hybrid systems that strive to exploit the positive aspects of RS and ES, while countering their negative aspects.
Fuzzy cognitive agents
Miao et al. [22] present a new type of recommendation system called fuzzy cognitive agents. A fuzzy cognitive agent provides recommendations based on the current user's preferences, other users' common preferences and domain-expert knowledge. The agent's knowledge model is represented as a fuzzy cognitive map: a weighted, signed and directed graph consisting of concepts and weights, defined as a 2-tuple M_FCA = {C, W} where:
• C = {c_i | c_i ∈ [−1, 1]} is the set of concepts
• W = {w_ij | w_ij ∈ [−1, 1]} is the set of weights, with i, j = 1 : n
The edges in the map indicate cause-effect relationships between two concepts, while the weight of an edge defines the strength of the relationship: a positive weight means that the higher concept_i is, the higher concept_j will be; a negative weight means that the higher concept_i is, the lower concept_j will be. The value c_i ∈ [−1, 1] indicates to what extent the concept is present in the map.
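The summary above does not reproduce the map's inference rule; assuming the standard fuzzy-cognitive-map iteration (a squashed weighted sum over cause concepts), one update step can be sketched as:

```python
import math

def fcm_step(concepts, weights):
    """One standard FCM update: each concept becomes a squashed weighted sum
    of its cause concepts (tanh keeps values in [-1, 1]). Concepts with no
    incoming edges decay to 0 in this simple variant."""
    n = len(concepts)
    return [math.tanh(sum(weights[i][j] * concepts[i] for i in range(n)))
            for j in range(n)]

# 3 concepts: price, mileage, satisfaction; price and mileage lower satisfaction.
W = [[0.0, 0.0, -0.7],
     [0.0, 0.0, -0.5],
     [0.0, 0.0,  0.0]]
fcm_step([0.8, 0.6, 0.0], W)
# satisfaction turns negative under high price and mileage
```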
The system proposed in [22] is designed for a used-car online store. As such, the domain-expert has identified five attributes with which buyers are usually concerned when considering the purchase of a second-hand car: price, model (particularities about the engine, e.g., 2.0 hybrid), mileage, age and make (the brand of the car, e.g., Toyota Prius). She has also identified the relationships between these attributes and the customer's satisfaction degree: the higher the price, age or mileage, the lower the satisfaction degree; in turn, the model and make attributes are positively correlated with the satisfaction degree. Moreover, price is negatively correlated with age and mileage (the higher the mileage, or the older the car, the lower the price), while the model and make attributes are positively correlated with the price.
The recommendation is computed as a mix of the user's currently elicited preferences and other users' preferences. Initially, the knowledge agent is initialized with the domain-expert's information. The map is then adjusted by applying case-based reasoning over other users' past history, and neural-network learning to infer the community's general preferences. The current preferences are elicited through explicit interaction with the user. Two recommendation lists are delivered to the user: one which takes into account all three information sources, and one which does not take into account the user's individual preferences.
The authors run two experiments: one in which the knowledge agent's fuzzy cognitive map is transferred onto a neural network without taking into account the weights of the map, and a second one including the weights. The results indicate a mean square error of 0.01% and 0.005%, respectively, while the accuracy reaches 72.8% and 79.6%.
Multi-agent expert system for electronic store
Lee [17] presents a multi-agent system that uses domain-expert knowledge and collaborative filtering techniques to provide product recommendations for an online
34
electronics store. The ES he develops does not build user profiles for capturing user
preferences, but rather uses the ephemeral information provided by the user while
visiting the online store. He motivates his design decision by underlining the low
frequency with which users purchase electronics items [17].
The ES multi-agent system comprises four agents. The interface agent collects
a set of requirements the user indicates for the future purchase. Because not all
the users have the required knowledge to provide quantitative information about
electronic products, the interface agent collects the requirements in a qualitative
way and sends the result to the decision-making agent.
The domain-expert interacts with the knowledge agent to share her knowledge about the products existing in the system. Following their interaction, a product which initially has a profile consisting of quantitative features will also have a correspondent consisting of qualitative ones. Several domain-experts can input their knowledge about a certain product; when this happens, their input is combined, with the weights of all the experts being equal.
The decision-making agent receives the qualitative information from the interface
agent and compares it against the qualitative features resulted from the knowledge
acquisition process. A product is recommended such that it has the largest benefit
indicator value and the smallest cost indicator value, that is, the recommended
product is positioned closest to the best solution and farthest away from the worst
solution that matches the criteria [17].
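Lee's "closest to the best solution, farthest from the worst" criterion resembles the TOPSIS closeness coefficient from multi-attribute decision making; the following is a minimal sketch of that idea, with hypothetical product scores (the actual attribute set of [17] is not reproduced here):

```python
import math

def closeness(scores, ideal, anti_ideal):
    """TOPSIS-style closeness: near the ideal solution, far from the anti-ideal."""
    d_best = math.dist(scores, ideal)
    d_worst = math.dist(scores, anti_ideal)
    return d_worst / (d_best + d_worst)  # in [0, 1]; higher is better

# Hypothetical products scored on (benefit, cost-inverted) indicators in [0, 1].
products = {
    "camera_a": (0.9, 0.4),
    "camera_b": (0.6, 0.8),
    "camera_c": (0.3, 0.2),
}
ideal = (1.0, 1.0)       # largest benefit indicator, smallest cost indicator
anti_ideal = (0.0, 0.0)  # smallest benefit indicator, largest cost indicator

# Rank products by closeness; the first entry would be the recommendation.
ranked = sorted(products, key=lambda p: closeness(products[p], ideal, anti_ideal),
                reverse=True)
```

The product positioned simultaneously near the ideal and far from the anti-ideal ends up first, which is the ranking behavior the decision-making agent implements.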
Finally, to minimize the interaction between the user and the interface agent, the behavior-matching agent analyses the current user's behavior when answering the qualitative questions and tries to identify users who are similar, in terms of behavior, to the current user. Thus, after each adjustment of preferences, the user is recommended products that were previously recommended to the matching users.
The idea of incorporating the knowledge of several domain-experts is valuable, but the reader should recall the limitations of ES mentioned in section 3.3.2. While the automation of the knowledge acquisition process is definitely a plus of the system presented in [17], combining the domain-experts' knowledge in equal parts might prove problematic in certain situations (e.g., a domain-expert who does not fully understand the purpose of the application, or its interface).
A recommendation system for the same domain but with a different implementation is presented in [6]. Many of the authors' assumptions and design decisions are similar to the ones from [17]. For example, it is assumed that customers do not have enough knowledge to answer quantitative questions when eliciting user needs, so qualitative ones are used instead; it is also assumed that customers do not buy electronic products very often, hence there is no need to store past user history; finally, domain-expert knowledge is used to translate user preferences into quantitative metrics, as well as to accurately represent the products in the database.
The difference between the two is that, whereas in [17] the author uses multi-attribute decision making to simultaneously consider the customer's needs, Cao et al. [6] translate user preferences into triangular fuzzy numbers and compute similarity measures between two sets of fuzzy numbers. Further technical details can be obtained by consulting source [6].
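As an illustration of the fuzzy-number approach in [6], a common similarity for triangular fuzzy numbers on [0, 1] is one minus the mean absolute difference of their three defining points; the label-to-number mapping below is an assumption of mine, not taken from [6]:

```python
def tfn_similarity(a, b):
    """Similarity of two triangular fuzzy numbers on [0, 1]: one minus the
    mean absolute difference of the three defining points (a1, a2, a3)."""
    return 1 - sum(abs(x - y) for x, y in zip(a, b)) / 3

# Hypothetical mapping from qualitative answers to triangular fuzzy numbers.
labels = {
    "low": (0.0, 0.1, 0.3),
    "medium": (0.3, 0.5, 0.7),
    "high": (0.7, 0.9, 1.0),
}

# Compare a user's stated preference with an expert-assigned product feature.
match = tfn_similarity(labels["medium"], labels["high"])
```

Identical fuzzy numbers yield a similarity of 1, and increasingly different qualitative answers yield proportionally lower scores, which is the matching behavior the qualitative-elicitation design relies on.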
GymSkill
GymSkill [16] is a smartphone application aimed at addressing several shortcomings identified by the authors in a previous extensive app review. Thus, GymSkill consists of an exercise database, a module for collecting sensor data (using RFID, accelerometer and magnetometer data), a module for evaluating the user's skill and presenting feedback, and a module that recommends new exercises based on the current skill. GymSkill is designed for balance-board exercises. When performing an exercise session, the user is required to place the smartphone on which GymSkill is running on the balance board. The balance board is augmented with an RFID tag 7, which enables the phone, through the accelerometer and magnetometer sensors, to record deviations from the initial position [16].
After the completion of the exercise, the recorded data is evaluated against ground-truth data and feedback is presented to the user. New exercises are recommended to the user based on the previous skill assessment. The domain-expert's involvement in this system consists in providing the ground-truth facts, based on which the end-user receives customized feedback.
The evaluation of GymSkill shows that the application might help in reaching a training goal and can provide long-term motivation for the end-user, as well as recommendations aimed at improving certain parts of the human body in a systematic way; thus, the integration of domain-expert knowledge in the system proves worthwhile.
Wine advisor expert system
Finally, Dinuca and Istrate [10] present a wine advisor expert system. In this scenario, rather than encoding domain-expert knowledge in lists of weights and rankings of importance [17], [6], an "if-then-else" rule-based approach coupled with forward chaining, as defined in section 3.1, is preferred. I consider the approach suitable and will adapt it for the system proposed in section 5. However, [10] lacks evidence with respect to the evaluation of the proposed ES; therefore, a comparison between the results of my implementation and [10] will not be possible.
3.5 Summary
This section summarizes the expert system concepts defined above by comparing them to the recommendation system concepts presented in section 2.9; next, areas in which merging recommendation systems with expert systems can prove efficient are outlined. The results of the comparison are enclosed in table 3.
7 http://en.wikipedia.org/wiki/Radio-frequency_identification
Concept: Recommendation systems
Advantages:
• Easy implementation for memory-based techniques
• Easy explanation for CBR
Disadvantages:
• Over-specialization (CBR)
• Cold-start (CF)
• Limited content analysis

Concept: Expert systems
Advantages:
• Work better on very narrow domains
• Easy implementation of rule-based decision systems
Disadvantages:
• Potential faulty collaboration between KE and domain-expert
• Liability issues

Table 3: Recommendation systems vs. Expert systems
By augmenting a recommendation system with the knowledge of a domain-expert, several of the negative aspects of both approaches can be improved. First, as discussed in section 2.7, one major problem of content-based filtering techniques is over-specialization. As a reminder, over-specialization refers to the recommendation system's inability to recommend items very different from the ones a user has previously indicated as relevant. By plugging in a component that leverages a domain-expert's knowledge, recommendations can be enriched with potentially novel items. The assumption behind this statement is that a domain-expert is able to tell what a user needs, which may be different from what a user likes. A good example where domain-expert knowledge is used for this purpose is presented in [6].
Second, a domain-expert enhanced recommendation system can overcome the cold-start problem of a collaborative filtering technique. Cold-start refers to the inability of a new product (with very few ratings) to participate in the recommendation process, or to the inability of a user with no ratings to be recommended any items. By using knowledge from past experience, a domain-expert can better articulate the needs of a new user; alternatively, the rules based on which the expert system works can lead to recommending items newly added to the system.
Next, in a content-based approach, an item is typically represented as a vector in an n-dimensional space. Determining the dimension of the vector space is a design decision which has direct implications on the result of the similarity metric being used in the system and, consequently, affects the results of the recommendations (e.g., two items represented in a 2-dimensional space may be fundamentally different from the same two items with n extra dimensions in a different space). A domain-expert can provide new interpretations for the values of an item's features and can potentially enrich the recommendation list or improve the novelty of the results.
Finally, the data used by a recommendation system can be mined, and use-cases can be built and then used to improve the collaboration process between the knowledge engineer and the domain-expert; the knowledge engineer can use the data to better illustrate the behavior of a particular user, while the domain-expert can better exemplify the result of applying a certain piece of knowledge to a specific scenario.
These ideas represent the main arguments behind the decision to build the hybrid recommendation system, which is described in section 5.
4 OmaTauko - Concept Description
This section briefly presents the current state of OmaTauko - a commercial system that will benefit from the development of the hybrid recommendation system proposed in this study. First, the fundamental idea of OmaTauko is presented, together with an explanation of the user's basic interaction with the system. Next, the entities involved in the system are described by detailing the domain model.
4.1 OmaTauko - Concept description
Framgo is a start-up founded in September 2012 in Helsinki, Finland, active in the domain of occupational health and well-being. Framgo provides occupational health services and products to small and medium-sized companies along three coordinates:
• The digital service Oma Tauko
• An ergonomics division selling products related to ergonomics - the products included in this offering consist of small add-ons, such as back supports for office chairs, ergonomic mice and keyboards, or larger ones, such as ergonomic chairs or tables with adjustable height
• A wearable device able to measure muscle and fat tissue, heart rate, blood pressure and several other physiological markers.
OmaTauko is the most developed component of the company's offering, being already launched on the Finnish market. OmaTauko is an occupational health product-service system (PSS) which provides a way to take short micro-breaks by combining physical workout gear with a smartphone application. The best way to decrease musculoskeletal problems and prevent a number of connected illnesses is regular physical exercise. Additionally, according to [13], 12 minutes of break per day give people energy and decrease stress levels. The exercises featured in the mobile app are specifically designed by a personal trainer (who in this context acts as the domain-expert) to decrease neck, shoulder and back pain.
The contents of OmaTauko can be tailored to a customer's needs; a starter package consists of the following elements:
• four kettlebells with the weights 8 kg, 6 kg, 4 kg and 2.5 kg
• a foam roller
• a balance board
• individual stress relief balls for each employee
• a specially designed shelf for storing and easily accessing the equipment
• an introduction session held by a personal trainer at the customer's location
In addition to the starter package, the employees of the company gain access
to the smartphone application. At the moment, Framgo addresses two out of the
three major mobile platforms - Windows Phone (WP) and iOS - as well as the web
platform.
From a technological perspective, OmaTauko is architected as a client-server
application. The server-side component is developed in Python using the Flask
framework and communicates with a PostgreSQL database. Both the WP and the
iOS clients are built as thin clients in terms of data storage with all the content
being served over the network; therefore, in order for a user to be able to use the
application, she needs to have access to an Internet connection.
When using the application, the user has an overview of the weekly progress as well as of the current day. A day in which the user has completed the target of 12 minutes of break is marked differently from a day that was either partially completed or not started at all (Fig. 9).
Figure 9: Application starting screen
The user can choose to start a new break; when she does so, she is required to
select the devices (tennis ball, kettle bell, own weight, foam roller) with which she
would like to exercise during the break. She is also requested to select the duration
of the break (current possible values are 2, 4 or 6 minutes) (Fig. 10). Once she
does that, she is offered a list of tasks which corresponds to her selection. When
visualizing an exercise, the user is shown a title and a short description, together with a video playing in an infinite loop. Should she decide the current exercise is not relevant for her break, or too difficult, she has the option of skipping it and moving on to the next one (Fig. 11). However, if too many exercises are skipped in a break, such that the selected break duration can no longer be completed, the process is aborted and the user is asked to provide feedback with respect to the reason for skipping all the exercises.
Figure 10: Break configuration
Figure 11: Performing an exercise
While using the app, the user also has access to a small set of statistical data, such as a monthly overview of her breaks, the total number of completed, partially completed or incomplete days (Fig. 12), the current number of completed days in a row, or the distribution of exercises with respect to the types of devices available (e.g., 55% of the exercises have been completed using a kettlebell, while 45% have been completed doing stretching routines) (Fig. 13).
Figure 12: Monthly statistics view
Figure 13: Overall statistics view
The user has the option of scheduling customizable recurrent in-app reminders that let her know when it is time to take a micro-break in order to refill her energy levels (Fig. 14). Finally, in the settings area, the user can fill in a minimal set of personal information (name, birth date, gender) (Fig. 15).
Figure 14: Scheduling reminders
Figure 15: User details
4.2 Domain model
The UML diagram representing the fraction of the domain model that is relevant for this work is presented in figure 16.
Figure 16: OmaTauko - Domain model
User
A user gains access to the system following a registration process. After she registers,
she is required to introduce a minimal set of demographic information (date of birth
and gender). However, at this point in time the demographic information is not mandatory; therefore, it cannot be used in the recommendation algorithm.
Basetask
A basetask is an exercise which a user should perform during a break. It is identified
in the system by the following attributes:
• name - the title of the task
• description - a short description of the movement the user should perform during the exercise
• category - the type of exercise, based on which the duration of a task is defined. For the time being, all the tasks have the same type and a duration of 30 seconds. Future development will address other types of tasks with variable durations
• device id - the device with which the task should be executed; a basetask is
performed with exactly one device
• muscle group - the muscle group targeted to be improved through the current
task; a basetask addresses one primary muscle group
• complexity - the complexity of the movement required by the current task; a
basetask has exactly one complexity level
Devices
A device represents a physical object with which a task is completed. The current version of the system includes tasks that can be performed with two physical devices - kettlebells and tennis-balls - and tasks that can be performed without any devices - the user's own body weight and stretching moves. As such, even though body weight or stretching movements cannot be considered proper devices, they are included in the list and the user can select them to indicate her preference of exercise.
Other devices considered for inclusion in the system are a foam-roller, a gym stick, an elastic band and a balance-board.
A device can be associated with several tasks.
Muscle groups
The tasks in the system have been designed by a professional trainer to reduce musculoskeletal problems and pain in three major areas of the body, as well as to improve overall posture. The areas addressed by the exercises are the back, shoulders and legs. A small subset of exercises is designed for miscellaneous tasks, such as simple massage techniques or wrist rotations. In the system, these exercises fall under the category labeled "maintenance".
A muscle group can be associated with several tasks.
Complexity levels
Complexity levels capture differences between various tasks along two major coordinates: the number of muscle groups involved in a movement (e.g., often a movement does not completely isolate a muscle group but instead targets a primary group and incidentally trains a secondary one) and the complexity of the movement (e.g., flexion, joint rotation, rotation and translation, etc.).
A complexity level can be associated with several tasks.
Break
A break represents a set of tasks executed by a user in a certain order, such that the user exercises for at least the period of time indicated at the beginning of the break (e.g., 2 minutes). A completed break is identified in the system by the following elements:
• id - an identifier that uniquely points to a break and indicates which tasks have been included in that break
• user id - the id of the user who has performed the break
• date - the moment in time when the user performed the break
• duration - the duration of the break, measured in seconds; a completed break has a duration equal to the duration indicated by the user when she started the break; a break that was not completed (as a result of skipping too many tasks) is saved in the database with a duration of 0 seconds; a break that was aborted is not saved in the database
Completed task
A completed task is an instance of a basetask; several completed tasks are part of
a break. The duration of the break dictates the number of tasks included in that
break. Considering that for the moment all the tasks have a duration of 30 seconds,
the number of tasks for each type of break is as follows:
• a 2-minute break consists of 8 basetasks, out of which at least 4 need to be completed
• a 4-minute break consists of 12 basetasks, out of which at least 8 need to be completed
• a 6-minute break consists of 16 basetasks, out of which at least 12 need to be completed
The role of the 4 extra tasks included in each break is to allow the user to skip a task should she not like it or find it too complex.
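The relationship between break duration, offered tasks and required tasks described above can be sketched as follows (the function and constant names are mine, not from the system):

```python
TASK_DURATION_S = 30  # every current basetask lasts 30 seconds
SKIP_ALLOWANCE = 4    # extra tasks per break so the user may skip a few

def break_composition(duration_min):
    """Return (tasks offered, tasks that must be completed) for a break."""
    required = duration_min * 60 // TASK_DURATION_S
    return required + SKIP_ALLOWANCE, required

offered, required = break_composition(2)  # → (8, 4), matching the list above
```

The same formula reproduces the 12/8 and 16/12 counts for the 4- and 6-minute breaks.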
A completed task is identified through the id of the basetask, the id of the break
in which it was included and a boolean flag indicating if the task has been completed
or not. The flag is set to TRUE if the task is watched until the end without skipping
it, or set to FALSE if a “skip” event has occurred before the duration of the task
has expired.
5 System Design and Implementation Details
This chapter delves into the details of Oskar - a recommendation system that makes use of domain-expert knowledge and user preferences to give users tailored recommendations. Oskar is the recommendation engine that powers the OmaTauko product-service system.
First, a motivation for the decision to build a domain-expert knowledge-enhanced recommendation system is provided. Next, the overall architecture of Oskar and its components is detailed.
5.1 Motivation for a domain-expert enhanced recommendation system
Having described the current state of OmaTauko, thoroughly detailed the domain model and drawn on the insights presented in section 3.5, I will now provide the reasons why I consider that a recommendation system leveraging the knowledge of a domain-expert would be a good fit for the case of OmaTauko.
First, the domain addressed by OmaTauko is highly specialized and very narrow. Occupational health is a sub-domain of health and well-being, and I consider that a system targeted at this domain (or any of its sub-domains) should benefit from the knowledge of a domain-expert. Furthermore, OmaTauko recommends physical exercises to its end-users. The exercises involve various types of movements and working with potentially heavy objects; thus, the risk of misuse and injury exists. The domain-expert is already part of the OmaTauko offering through his participation in the videos used to illustrate the exercises. However, I consider it highly relevant that the domain-expert should be involved in the decision process as well, in order to enhance the quality of recommendations for the end-users. This decision is also backed up by findings from the literature, which show that too few health and well-being mobile apps include evidence-based content and theory-based strategies that would lead to a significant improvement in the user's life [4], [16].
Second, the current pool of basetasks already contains over 100 items. This, coupled with the fact that the primary delivery platform for OmaTauko is mobile devices, renders the display of all the exercises virtually impossible. One solution would be to present the basetasks in a hierarchical view and allow the user to select a basetask herself. However, this approach would involve a lot of user interaction with the system, and the whole time allocated for the break would likely be spent browsing the list of exercises. In the current format, OmaTauko allows a user to start a break and exercise with only three touches of the screen. Another argument against this solution is that it would allow the user to repeatedly select the same exercise(s) and potentially over-develop only one part of the body. A recommendation system solves these issues by rotating the exercises in an intelligent manner, such that a user does not have to repeat an exercise on successive days.
I am opting for a hybrid recommendation system which blends a content-based filtering technique with the knowledge elicited from a domain-expert. I base my decision to include domain-expert knowledge in the recommendation system on the fact that, in a setting such as health and well-being, only the knowledge of a domain-expert can lead to recommendations that are in line with the user's current needs. The choice of a CBR component over a collaborative approach is motivated by the fact that users are quite different in terms of physical condition and endurance; what is suitable for one user may be harmful for another and, therefore, the voice of the community should not have a high impact on the final recommendations. User demographics could potentially have improved this setting; however, at the moment, due to technical limitations and the lack of reliable information, a demographic component cannot be implemented. Finally, with respect to over-specialization - the major shortcoming of the CBR approach - I expect this to be compensated by the domain-expert component, as will be illustrated shortly.
5.2 Domain expert involvement
For the purpose of collecting the domain knowledge required for our RS, I collaborated with a domain-expert. In this particular scenario, the domain-expert is a personal trainer with advanced knowledge of fitness training techniques and training schedule creation.
Over the course of several meetings, we discussed the aspects that should be taken into consideration when building the list of recommended tasks included in a break. We encoded the results of our discussions in "if-then" rules, as presented in [10]. The domain-expert is actively involved in three of the four components of the algorithm, namely: the user level decision component, the expert recommendation component and the tasklist ranking component.
Additionally, using concepts presented in [17] and [22], I asked the domain-expert
to assign weights to the following concepts in the system: devices, muscle groups,
complexity levels. By using these weights, I wanted to better capture the differences
between various types of devices or muscle groups. For example, working with
a kettlebell is fundamentally different than working with a tennis-ball. Similarly,
performing an exercise for legs has a different impact than performing an exercise
for shoulders. Moreover, I used the weights to compute various similarity measures
and decide on the one that best captures the difference between two basetasks. The
results are synthesized in section 5.3.2.
5.3 System description
5.3.1 Objective
The entities participating in the proposed recommendation system are:
• active user - the user who explicitly takes a break and is recommended a list of exercises
• basetask - an exercise with a title and description, which must be executed with one of the available devices, having a well-defined complexity level and targeting a primary muscle group
• user preferences
– a list of breaks performed in the past and their corresponding tasks. The user preferences aggregate the total duration of the workout over a certain period of time, and whether the tasks were liked by the user or not. The system differentiates between liked and unliked tasks by using an implicit-feedback heuristic: if an exercise runs its course until the end of its duration (e.g., 30 seconds), it is inferred that it was a suitable exercise and it is marked as liked; if an exercise is skipped, regardless of whether it was started or not, it is inferred that the exercise was not suitable and it is marked as unliked
– break duration - the duration of the current break, in minutes, explicitly
indicated by the user through the user interface
– list of devices - a list of devices with which the user would like to perform
the exercises in the current break; explicitly defined by the user through
the user interface
• domain-expert knowledge
– set of rules according to which the expert recommendation should be
computed; the rules are detailed in section 5.3.4
– set of weights associated with the concepts of muscle group, devices and
complexity levels, aimed to better discriminate between different values
of the same concept
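The implicit-feedback heuristic described above can be sketched as follows (the record format and function name are illustrative, not the actual database schema):

```python
from collections import Counter

def preference_profile(history):
    """Aggregate implicit feedback: each record is (basetask_id, completed);
    a task watched to the end counts as liked, a skipped one as unliked."""
    likes, dislikes = Counter(), Counter()
    for task_id, completed in history:
        (likes if completed else dislikes)[task_id] += 1
    return likes, dislikes

history = [(7, True), (7, True), (12, False), (3, True), (12, False)]
likes, dislikes = preference_profile(history)  # likes[7] == 2, dislikes[12] == 2
```

Counting likes and dislikes per basetask is one simple way to turn the raw break history into the aggregated user preferences the algorithm consumes.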
Thus, the problem the algorithm is trying to solve can be stated as follows.
Problem statement
Given the user's explicit preferences with respect to the duration of the break and the devices with which she would like to exercise, the algorithm makes use of past user preferences and domain-expert knowledge to compute a list of exercises that fit the current level of the user, the desired duration and the selected devices.
5.3.2 Choosing the similarity measure
In order to decide which similarity measure to use, I performed several experiments computing various similarity measures between each possible pair of tasks from our collection. I searched for a similarity measure that would transition smoothly from the minimum possible value (0) to the maximum possible value (1) across the whole set of possible pairs. I was interested in a smooth transition such that, when applying a similarity threshold, I would not lose too many items due to large gaps between similarity scores.
Relative weights
First, I tried an approach in which a task has exactly one muscle group associated with it. Thus, a task is represented by a vector in a 3-dimensional space with the following configuration:
• t = <muscle_group, device, complexity>
where the concepts of muscle group, device and complexity were encoded using the values from Fig. 17. The weights were assigned by the domain-expert and are designed to capture the differences between two elements from the same category. For example, an exercise for the back is closer in execution to an exercise aimed at the shoulders than to an exercise aimed at the legs. All of them are fundamentally different from the maintenance exercises. Likewise, an exercise performed with the user's body weight is more similar to an exercise performed with a kettlebell, and both are fundamentally different from an exercise performed with a tennis ball.
Muscle group   Weight      Device       Weight      Complexity   Weight
Maintenance    0.001       Tennisball   0.001       Low          0.01
Shoulder       0.1         Stretching   0.01        Medium       0.1
Back           0.3         Bodyweight   0.3         High         1
Leg            1           Kettlebell   1

Figure 17: Domain-expert defined relative weights
I measured the cosine similarity, the Jaccard index (interpreting a tuple as a set) and the Euclidean distance. The results are shown in Fig. 18.
Not surprisingly, the Jaccard index is not a good similarity measure in the current representation, due to the small number of distinct elements in a tuple. The Euclidean distance performs better than the Jaccard index; however, there are still areas on the plot where the transition from one pair to another happens in significant steps. Cosine similarity captured best the pattern I was looking for; however, cosine similarity is unable to capture the difference in magnitude between two vectors. For example, consider two tasks t1 = <0.001, 0.001, 0.01> and t2 = <1, 1, 1>, corresponding to the two most different tasks in the system. According to (6), sim(t1, t2) ≈ 0.6, while the Euclidean distance between the two, as defined by (9), is d(t1, t2) = 0.98, which corresponds to a similarity of 0.02, capturing the difference between the two items more accurately.
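The magnitude argument above can be checked numerically; a small sketch, assuming plain cosine similarity for equation (6) and a Euclidean distance normalized by √n for equation (9) (the normalization is my reading of the thesis's earlier definitions, not confirmed by this chapter):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity: direction only, ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def euclid_sim(a, b):
    """Similarity from a Euclidean distance normalized to [0, 1] by the
    maximal possible distance sqrt(n); the normalization is an assumption."""
    return 1 - math.dist(a, b) / math.sqrt(len(a))

t1 = (0.001, 0.001, 0.01)  # the two most different tasks in the system
t2 = (1.0, 1.0, 1.0)

# Cosine remains high despite the huge magnitude gap between t1 and t2,
# while the normalized Euclidean similarity is close to 0.
```

This reproduces the qualitative behavior described in the text: cosine similarity cannot penalize the magnitude difference, while the distance-based similarity can.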
[Plot omitted: similarity values for cosine similarity, Jaccard index and Euclidean distance across all task pairs.]
Figure 18: Similarity metric comparison for task representation t = <m, d, c>
11-dimensional, boolean representation
In the second approach, I increased the number of dimensions in which a task is represented, in a fashion similar to the one described in [35]. Thus, a task is represented in an 11-dimensional space with the following format:
• features 1-4 encode the muscle group targeted by the task; only one feature in positions 1-4 has a value of 1, while the others have a value of 0; the sequence of features 1-4 is <shoulder, back, leg, maintenance>
• features 5-8 encode the device with which the task should be performed; only one feature in positions 5-8 has a value of 1, while the others have a value of 0; the sequence of features 5-8 is <tennisball, stretching, bodyweight, kettlebell>
• features 9-11 encode the complexity of the task; only one feature in positions 9-11 has a value of 1, while the others have a value of 0; the sequence of features 9-11 is <low, medium, high>
Therefore, a task targeted at the legs, of medium complexity and executed with body weight, has the encoding t = <0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0>.
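The one-hot encoding above can be sketched as follows (a minimal illustration; the list order follows the feature sequences given in the text):

```python
MUSCLE = ["shoulder", "back", "leg", "maintenance"]
DEVICE = ["tennisball", "stretching", "bodyweight", "kettlebell"]
COMPLEXITY = ["low", "medium", "high"]

def encode(muscle, device, complexity):
    """11-dimensional boolean encoding: one-hot blocks for muscle group
    (features 1-4), device (5-8) and complexity (9-11), concatenated."""
    return tuple(
        [int(m == muscle) for m in MUSCLE]
        + [int(d == device) for d in DEVICE]
        + [int(c == complexity) for c in COMPLEXITY]
    )

encode("leg", "bodyweight", "medium")
# → (0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0), as in the example above
```

Every encoded task has exactly three 1-bits, one per category, which is what makes the Jaccard index so coarse in this space.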
For this approach I only considered cosine similarity and Euclidean distance, ruling out the Jaccard index from the start because the only distinct elements in this space are 0 and 1, so the Jaccard index would have had only three potential values: {0, 0.5, 1}.
The results obtained using this representation were worse than the results yielded by the first approach (Fig. 19).
[Plot omitted: similarity values for cosine similarity and Euclidean distance across all task pairs.]
Figure 19: Similarity metric comparison for task representation t = <m, m, m, m, d, d, d, d, c, c, c>
6-dimensional, fuzzy representation of the muscle groups
Finally, in the third approach I represented the task as a 6-tuple, t = <m1, m2, m3, m4, d, c>, with the following features:
• features 1-4 capture the targeted muscle groups of the exercise; in this approach I leverage the fact that the exercises in the system do not fully isolate a muscle group: a task primarily addresses a dominant muscle group, but at least one other group may be trained involuntarily and to a small extent. The sum of the first four components should be 1.
• feature 5 represents the device with which the task should be performed
• feature 6 represents the complexity of the task
• each of the components has a weight associated with it, defined as:

    w_mi = 0.25, for i = 1, ..., 4;  w_d = 1;  w_c = 1        (15)
The domain-expert has indicated the dominant muscle group for each exercise and, if the task does not fully isolate the movement, at least one adjacent muscle group that is incidentally trained. I tested this representation with cosine similarity, the classic Euclidean distance, and three variations of a weighted distance defined by the equation:

    wd_k(a, b) = ( Σ_{i=1}^{n} w_i · (a_i − b_i)² / Σ_{i=1}^{n} w_i )^k        (16)
for k = 0.5, 1, 2. The similarity between two items, when using the wd_k distances, is defined as:

    s_wdk(a, b) = 1 − wd_k(a, b)        (17)
[Plot omitted: similarity values across all task pairs for the metrics d, wd1, wd2, wd1/2, cosine similarity (sim) and Jaccard index (jac).]
Figure 20: Similarity metric comparison for task representation t = <m1, m2, m3, m4, d, c>
The results from Fig. 20 suggest the following:
• The Jaccard index (curve jac - purple) exhibits very sharp transitions from one similarity value to the next; therefore, it is not the desired metric
• Cosine similarity (curve sim - orange) has a smooth transition over the range of possible values, and its shape would recommend it as a good candidate; however, as discussed earlier, it does not account for the magnitude of the vectors between which the similarity is computed, so it will not be able to detect differences in magnitude for the components d and c of the task. Consequently, I discarded this measure as well
• The Euclidean distance (curve d - red) exhibits a very sharp increase around the similarity value of 0.65. Moreover, if a threshold of 0.6 were used to define similarity between two items, only approx. 600 pairs of items would pass it when using this distance, which corresponds to 35 distinct items (k items generate k(k − 1)/2 pairs). Considering that some of the items might not fit the user-preference requirement, I considered the resulting number too low, and lowering the threshold would defeat the purpose of item similarity. Therefore, I discarded this measure as well
• The weighted distance with k = 1/2 (curve wd1/2 - black) has a shape similar to curve d, so the same arguments stand and I did not consider the metric suitable
• The weighted distance with k = 2 (curve wd2 - blue) is not discriminative enough for the current set of items: even a restrictive threshold of 0.8 would retain approx. 2200 pairs (approx. 65 items)
• The weighted distance with k = 1 (curve wd1 - green) is the metric I decided to use, since it exhibits the best trade-off in terms of discriminative power and shape
5.3.3 Choosing the similarity threshold
In order to determine the value of the threshold used for deciding on the similarity
between two items, I have computed the similarity of each item against all the other
items, and plotted the values in a heat-chart map presented in Fig. 21.
Figure 21: Heat-chart map of the similarities between all pairs of items,
wd1 , τ = 0.8
The matrix has dimension 78 × 78, with the element Sim(i, j) representing the similarity between the item at position i and the item at position j. Due to the symmetry of the considered metric (wd_1(i, j) = wd_1(j, i)), the matrix is symmetric with respect to the main diagonal.
In order to determine a suitable value for the similarity threshold (τ), I applied the following coloring rule, for τ = 0.8:

\mathrm{colorcell}(i, j, \tau) = \begin{cases} \text{white}, & Sim(i, j) < \tau \\ \text{red}, & Sim(i, j) = \tau \\ \text{green}, & Sim(i, j) = 1 \\ \text{linearly interpolated between red and green}, & \tau < Sim(i, j) < 1 \end{cases}
Fig. 21 suggests that, with only three exceptions, τ = 0.8 is a good value for the
similarity threshold, as for each item there is a list of at least 30 potential similar
candidates.
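The threshold analysis above can be reproduced in outline as follows. The task vectors here are randomly generated stand-ins for the 78 real items (which are not public), and the helper names are illustrative only:

```python
import random

WEIGHTS = [0.25, 0.25, 0.25, 0.25, 1.0, 1.0]

def wd_similarity(a, b, k=1.0):
    # s_wd_k = 1 - wd_k (Eqs. 16-17)
    num = sum(w * (x - y) ** 2 for w, x, y in zip(WEIGHTS, a, b))
    return 1.0 - (num / sum(WEIGHTS)) ** k

def random_task(rng):
    # Four fuzzy muscle-group memberships that sum to 1,
    # plus normalized device and complexity features.
    cuts = sorted(rng.random() for _ in range(3))
    groups = [cuts[0], cuts[1] - cuts[0], cuts[2] - cuts[1], 1.0 - cuts[2]]
    return groups + [rng.random(), rng.random()]

rng = random.Random(42)
tasks = [random_task(rng) for _ in range(78)]

# 78 x 78 similarity matrix; symmetric because wd_1(i, j) = wd_1(j, i).
sim = [[wd_similarity(a, b) for b in tasks] for a in tasks]

tau = 0.8
# Number of similar candidates each item would have at threshold tau.
candidates = [sum(1 for j in range(78) if j != i and sim[i][j] >= tau)
              for i in range(78)]
print(min(candidates), max(candidates))
```

The same per-item candidate counts, computed on the real items, are what Fig. 21 visualizes row by row.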
5.3.4 Oskar architecture
The architecture of Oskar is presented in Fig. 22. The green (dark) blocks represent the areas where the domain-expert is involved, while the yellow (light) blocks
represent the areas where user preferences are captured or exploited.
User Interface
The user interface module provides the user with means to interact with the system.
In the current setting the user has two available platforms of interaction: mobile
(using touch technology) and web (using classic keyboard and mouse interaction).
The user interface is the entry-point in the recommendation process. From the
starting screen (Fig. 9) the user indicates that she would like to start a break.
The next step is to select the devices with which she would like to exercise during
the break, and the duration of the break. Once this information is provided, the
recommendation process is initiated.
User Level Update Component
The User Level Update Component takes as input the user's current level and her past history and, using domain-expert-defined rules, computes an updated user level. The current possible levels are beginner, intermediate and advanced.
The domain-expert has defined rules for upgrading/downgrading from one user-level
to another as follows:
• [Upgrade rule #1] a user advances from beginner level to intermediate level if she has exercised an average of 40 minutes per week over the past three weeks; the reason behind this rule is that beginner users have to show a fair level of commitment and develop a healthy exercising habit
• [Upgrade rule #2] a user advances from intermediate level to advanced level if she has exercised an average of 55 minutes per week over the past two weeks; the reason behind this rule is that an intermediate user, who has already developed a healthy exercising habit, must show an increased level of commitment for the results to start paying off
• [Upgrade rule #3] a user maintains her advanced level if she has exercised an average of 60+ minutes per week in the past week; the intuition behind this rule is that staying in top shape should be harder than getting there
• [Downgrade rule] a user is downgraded to the previous level if she fails to achieve the average weekly break duration required for her level
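One possible encoding of these rules follows. The thresholds come from the rules above, but the function signature is mine, and the treatment of the downgrade rule (which weekly average a user must maintain to keep each level) is my interpretation:

```python
def updated_level(level, weekly_minutes):
    """Return the user's new level.

    weekly_minutes: average exercised minutes per week, most recent
    week first (e.g. [45, 40, 38] covers the past three weeks).
    """
    def avg(n):
        window = weekly_minutes[:n]
        return sum(window) / n if len(window) == n else 0.0

    if level == "beginner":
        # Upgrade rule #1: 40 min/week over the past three weeks.
        return "intermediate" if avg(3) >= 40 else "beginner"
    if level == "intermediate":
        # Upgrade rule #2: 55 min/week over the past two weeks.
        if avg(2) >= 55:
            return "advanced"
        # Downgrade rule (interpreted): fall back to beginner if the
        # intermediate requirement is no longer met.
        return "intermediate" if avg(3) >= 40 else "beginner"
    # Upgrade rule #3 / downgrade rule: advanced needs 60+ min last week.
    return "advanced" if avg(1) >= 60 else "intermediate"
```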
Figure 22: Oskar architecture
Training Type Decision Component
The training type component receives as input the updated user level and the user's past preferences, and decides on the type of training for the next break. Two options are available: holistic training and focused training.
A holistic training is an exercise plan for a break that targets all the muscle groups the system addresses (e.g., back, shoulder, leg, maintenance). By contrast, a focused training is an exercise plan for a break that targets at most two muscle groups.
The decision regarding the type of training is made using the following rules:
• [Holistic training rule #1] If the user level is beginner, then the training
is holistic
• [Holistic training rule #2] If the user level is at least intermediate and, in the last week, she did not have at least three days with at least three trainings each, then the training is holistic
• [Focused training rule] otherwise (the else branch of [Holistic training rule #2]), the training is focused
The decision regarding which muscle groups should be included in the break is
made as follows:
• [Muscle groups rule #1] If the training type is holistic, then all the muscle
groups are included in the training
• [Muscle groups rule #2] If the training type is focused, the next muscle
group(s) that should be trained is(are) selected. The sequence in which the
muscle groups should be trained is defined by the domain-expert.
• [Muscle groups rule #3] If the duration of the training is 2 or 4 minutes,
only one group is included; if the duration is 6 minutes, two muscle groups are
included
The result of this component is a list of muscle groups which should be targeted
by the exercises in the new break.
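These decision rules can be sketched as follows; the function names are illustrative, and the expert-defined muscle-group sequence is passed in as a parameter rather than taken from the real system:

```python
MUSCLE_GROUPS = ["back", "shoulder", "leg", "maintenance"]

def training_type(level, active_days_last_week):
    """Holistic/focused decision. active_days_last_week counts the
    days in the last week with at least three trainings."""
    if level == "beginner":
        return "holistic"                      # Holistic rule #1
    if active_days_last_week < 3:
        return "holistic"                      # Holistic rule #2
    return "focused"                           # Focused rule

def groups_for_break(t_type, duration_min, sequence, next_idx):
    """Muscle groups targeted by the next break.

    sequence: domain-expert-defined training order (rule #2);
    next_idx: position of the next group to train in that order.
    """
    if t_type == "holistic":
        return list(sequence)                  # Muscle groups rule #1
    count = 2 if duration_min == 6 else 1      # Muscle groups rule #3
    return [sequence[(next_idx + i) % len(sequence)] for i in range(count)]
```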
Domain Expert Recommendation Component
The domain-expert recommendation component uses the muscle groups and list of
devices and returns all the tasks that match these restrictions. The result serves as
one of the inputs for the tasklist ranking component.
User Preferences Recommendation Component
The user preferences recommendation component implements a content-based filtering technique as described in section 2.4.
First, the date of the last workout for the current user is retrieved. If the date of the last break is at most a week in the past, then the tasks included in all the breaks performed in the last week are collected, and the 15 most recent are taken as the user's history. This number is a control parameter and can be adjusted for fine-tuning the algorithm.
The reason I decided on this value is that in the current setting, in which all tasks have a duration of 30 seconds, 15 tasks are sufficient to cover 9 minutes of breaks. Considering that only breaks of 2, 4 and 6 minutes are currently possible, this condition is strong enough to guarantee that a user does not perform the same tasks twice in consecutive breaks (the worst case scenario is a break of 6 minutes followed by another break of 6 minutes).
If the date of the last break is more than a week in the past, then the user history consists of the last 5 completed breaks. This parameter is also subject to fine-tuning. Again, the size of the user's history is truncated to at most 15 tasks, this time for optimization purposes. The final set of at most 15 tasks is referred to as the user_history_set.
Next, the remaining tasks, which are not included in the user history set, are
retrieved. This set is referred to as the candidate_set. A similarity measure between
each element from the user_history_set and each element from the candidate_set is
computed, and only those tasks for which the similarity meets a predefined threshold
are kept. The resulting set of similar tasks is delivered as the second input for the
ranking component.
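The construction of the two sets can be outlined as below; the break records and the similarity function are simplified stand-ins for the actual data model, and the names are illustrative:

```python
from datetime import date, timedelta

HISTORY_SIZE = 15      # control parameter (see above)
FALLBACK_BREAKS = 5    # breaks used after more than a week of inactivity

def user_history_set(breaks, today):
    """breaks: list of (break_date, [task_id, ...]), newest first."""
    if not breaks:
        return []
    if today - breaks[0][0] <= timedelta(days=7):
        # Active in the last week: take the last week's tasks.
        recent = [b for b in breaks if today - b[0] <= timedelta(days=7)]
    else:
        # Inactive for over a week: fall back to the last 5 breaks.
        recent = breaks[:FALLBACK_BREAKS]
    tasks = [t for _, task_ids in recent for t in task_ids]
    return tasks[:HISTORY_SIZE]

def similar_candidates(history, all_tasks, sim, tau):
    """Tasks outside the history whose similarity to at least one
    history task meets the threshold tau; sim(a, b) -> [0, 1]."""
    candidate_set = [t for t in all_tasks if t not in history]
    return [c for c in candidate_set
            if any(sim(h, c) >= tau for h in history)]
```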
Ranking Component
The objective of the ranking component is to merge the two lists of tasks (expert recommendations and user-preferred tasks), select a number of tasks according to the duration of the break, and order them in such a way that the number of skipped tasks is minimized.
• [Ranking rule #1] The tasks recommended by the domain expert have priority over user-preferred similar tasks
• [Ranking rule #2] If the duration is 2 minutes, the tasks with the more
important devices (Fig. 17) are scheduled in the beginning of the break
• [Ranking rule #3] If the duration is 4 or 6 minutes, no two tasks that are
targeted for the same muscle group should come in succession
• [Ranking rule #4] If the duration is 4 or 6 minutes, the tasks with low
complexity come before the tasks with higher complexity (to allow the user a
minimal warm-up period)
An additional four tasks are added to the final list of recommendations, in order
to give the end-user some room to tailor her training, if the domain-expert recommendation does not fully suit her. The extra tasks are primarily retrieved from the
user-preferred set.
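A simplified ordering procedure consistent with these rules might look as follows. The task dictionaries and the device-priority map are illustrative assumptions; the tasks-per-break counts (8, 12 and 16 for 2-, 4- and 6-minute breaks) follow the figures given in the evaluation chapter:

```python
TASKS_PER_BREAK = {2: 8, 4: 12, 6: 16}

def rank_tasks(expert_tasks, user_tasks, duration_min, device_priority):
    """Merge and order tasks for one break.

    Each task is a dict with 'id', 'device', 'muscle_group', 'complexity';
    device_priority maps device -> rank (lower = more important).
    """
    # Ranking rule #1: expert recommendations before user-preferred tasks.
    pool = expert_tasks + [t for t in user_tasks if t not in expert_tasks]
    selected = pool[:TASKS_PER_BREAK[duration_min]]
    if duration_min == 2:
        # Ranking rule #2: most important devices first.
        selected.sort(key=lambda t: device_priority[t["device"]])
        return selected
    # Ranking rule #4: low-complexity tasks first (warm-up) ...
    selected.sort(key=lambda t: t["complexity"])
    # Ranking rule #3: ... while avoiding two consecutive tasks for
    # the same muscle group whenever possible.
    ordered = []
    while selected:
        prev = ordered[-1]["muscle_group"] if ordered else None
        nxt = next((t for t in selected if t["muscle_group"] != prev),
                   selected[0])
        selected.remove(nxt)
        ordered.append(nxt)
    return ordered
```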
Feedback Component
The ranked list of tasks is delivered to the end-user so that she can start her break. If the user completes an exercise, it is marked as liked. Alternatively, if the user skips an exercise, it is marked as not liked. Therefore, an implicit feedback collection technique is used, similar to the ones presented in [25] or [33]. A finished break, whether fully completed or fully skipped, is saved in the Usage Log component for future reference.
6 System Evaluation

6.1 Experiment design
For the purpose of evaluating the recommendation system, I designed a survey to measure the performance of Oskar from two perspectives. An objective measurement was aimed at capturing the global success rate of the recommendation system, defined as:

GRS_{success} = \frac{\#\,\text{relevant breaks}}{\#\,\text{total breaks}}    (18)

where a break is considered relevant if fewer than 5 of its tasks were skipped.
A second objective measurement was aimed at capturing the success rate of the recommendations from a user's perspective, defined by the equation:

URS_{success} = \frac{\#\,\text{relevant breaks per user}}{\#\,\text{total breaks per user}}    (19)
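Both success rates reduce to simple ratio computations; a minimal sketch (the function names are mine):

```python
def is_relevant(skipped_count):
    """A break is relevant if fewer than 5 of its tasks were skipped."""
    return skipped_count < 5

def grs_success(skip_counts):
    """Eq. (18): global success rate over all evaluated breaks.
    skip_counts holds the number of skipped tasks per break."""
    return sum(is_relevant(s) for s in skip_counts) / len(skip_counts)

def urs_success(skips_per_user):
    """Eq. (19): per-user success rate, given {user: [skip_count, ...]}."""
    return {u: grs_success(s) for u, s in skips_per_user.items()}
```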
A subjective measurement was aimed at capturing the user benefits that can
be achieved through such a recommendation system, that blends domain-expert
knowledge with user preferences. For this measurement, several dimensions were
considered:
• Motivation - is the system able to motivate the user to exercise regularly?
• Usefulness - is the system able to help prevent the occurrence of neck and back pain resulting from spending long hours in front of the computer?
• Fitness relevance - can the system help in reaching a training goal?
• Freshness - is the system able to recommend fresh items over a period of time, such that a user is motivated to keep using the system?
• Customization - is the system able to provide tailored recommendations, in accordance with the user's preferences and her skill level?
The questions used in the survey can be found in section 8.
In order to generate accurate recommendations, past user history was required for both the domain-expert-based component and the user-preferences-based one. However, the commercial nature of the system and the external context were important factors that prevented the evaluation of the system in a full-fledged real-life scenario. Specifically, OmaTauko's customers were not willing to participate in a research project and generate enough information to help Oskar provide accurate recommendations. Reasons for declining varied from lack of interest to lack of resources (e.g., time, personnel). Therefore, the evaluation scenario had to be slightly adapted to allow proper evaluation.
To this end, exploiting a shortcoming of the current setting - namely that, before implementing Oskar, recommendations were generated in a purely random fashion - I generated several user profiles, with three weeks of user-history behavior attached to those profiles. The generated user behaviors fit 4 categories:
• “power” users - work 5 days per week, an average of 12+ minutes per day
• “average+” users - work 4 days per week, an average of 8 minutes per day
• “average-” users - work 3 days per week, an average of 4-6 minutes per day
• “lazy” users - work 2 days per week, an average of 2-4 minutes per day
The contents of each break were generated randomly using the old random algorithm. The rating behavior of the users was generated in such a way that all the breaks were considered relevant; each break contained 3 skipped tasks, in order to better capture the user's preference with respect to a particular task. It should be noted that this heuristic differs from a real-life scenario, where a user would probably consistently indicate that she does not like a task. To compensate for this shortcoming, when the user-preferences-based recommendation is computed, a task is considered part of the user's history and contributes to the recommendation process only if, in the considered period of time, it was liked more times than it was skipped.
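The four synthetic user categories can be generated along these lines; this is a sketch, and the exact per-day minute values and the day selection are illustrative assumptions within the ranges listed above:

```python
import random

# Weekly activity patterns for the four synthetic user categories.
PROFILES = {
    "power":    {"days": 5, "minutes": (12, 14)},   # "12+" taken as 12-14
    "average+": {"days": 4, "minutes": (8, 8)},
    "average-": {"days": 3, "minutes": (4, 6)},
    "lazy":     {"days": 2, "minutes": (2, 4)},
}

def generate_week(category, rng):
    """One synthetic week: {weekday: exercised_minutes}."""
    spec = PROFILES[category]
    days = rng.sample(range(7), spec["days"])
    lo, hi = spec["minutes"]
    # Break durations are multiples of 2 minutes (2-, 4-, 6-minute breaks).
    return {d: 2 * rng.randint(lo // 2, hi // 2) for d in sorted(days)}

def generate_history(category, weeks, seed=0):
    rng = random.Random(seed)
    return [generate_week(category, rng) for _ in range(weeks)]
```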
Having user profiles generated and in place, an anonymous online survey was
designed to collect the desired information. The survey consisted of 5 steps:
• the first three steps were aimed at collecting the respondent's opinion about recommended breaks. The respondent was asked to evaluate three breaks - 2 minutes, 4 minutes and 6 minutes long - for which the equipment (tennis ball, kettle-bell, stretching, body-weight) was randomly assigned. The user was shown the targeted muscle groups for each break, as well as the title, description and exercise video of each task. The rating of each exercise was captured through an explicit action: pressing an OK button to indicate the relevance of the exercise, or pressing a Skip button to indicate its irrelevance. Moreover, when indicating relevance/irrelevance, the participants were instructed to evaluate the exercise from the following perspectives: compliance with personal fitness level, complexity of the movement, and whether it is indeed relevant for the targeted muscle groups.
• in the fourth step, the respondent was asked to fill in a questionnaire consisting
of five closed questions (1-5 Likert scale) in order to capture her opinion about
the five previously mentioned user-benefits.
• the fifth step consisted of collecting minimal demographic information from
the respondent: age and gender
It is worth noting that each time a survey respondent was recommended a new break, the break was generated for a previously created user profile that fit one of the categories "power", "average+", "average-" or "lazy". In this way, a relatively small number of users was used to evaluate recommendations for three times as many synthetic user profiles from the database.
6.2 Results and discussion
The survey was sent to 50 participants over the course of a week; no incentive for
participating in the survey was provided. 35 participants successfully filled in the
survey, resulting in a participation rate of 70%. The gender distribution of the participants was 28.5% women and 71.5% men (age mean: 26.65 years, standard deviation: 3.11 years). All the participants in the survey were involved in a form of
work which entailed spending many hours in front of the computer, thus fitting the
market segment targeted by OmaTauko.
The first finding of this study is related to the accuracy of the recommendation system. A total of 35 users successfully participated in the survey; for each user, 3 breaks were generated, one for each available duration (2, 4 and 6 minutes), resulting in a total of 105 breaks. Using the heuristic previously described - a break is successful if it contains fewer than 5 skipped tasks - the analysis revealed a total of 77 successful breaks, leading to GRS_success = 73.3%.
Next, I was interested in finding out if there is a pattern in the nature of the
skipped breaks with respect to the duration of the break.
[Figure 23: Distribution of skipped tasks over break duration (y-axis: # of skipped breaks, 0-15; x-axis: break duration in minutes, 2/4/6).]
As expected, the 6-minute breaks had the highest incidence of being skipped, while the 2-minute breaks were skipped only once. One reason for this might be the excitement level of the survey respondents.
On one hand, a 2-minute break contains only 8 exercises, hence the list was rather short; moreover, most of the users were in contact with the concept of OmaTauko and Oskar for the first time, and the novelty of the system might have contributed to the high success rate of the 2-minute breaks, which were provided to the user first. On the other hand, a 4-minute break contains 12 exercises, while a 6-minute break contains 16 exercises. The possibly high number of exercises included in a 4- or 6-minute break, coupled with the repetitive nature of the task of rating exercises, might have contributed to a significant drop in the success rate of 4- and 6-minute breaks, compared to 2-minute breaks.
More importantly, as previously described, a survey participant was not assigned only one user profile from the database; instead, she rated recommendations provided based on three different user profiles. This detail might have led to situations in which a respondent was asked to rate, in a 6-minute break, an exercise which she had previously completed in a 2-minute break. It is highly likely that in these situations respondents skipped the exercise the second time it was recommended.
Fig. 24 displays, for each task, the frequency with which it was recommended as well as the frequency with which it was skipped. As the figure suggests, all the tasks were involved in the recommendation process for the considered user-base, demonstrating that the algorithm is able to cover the whole dataset. Each task was recommended at least 6 times, and on average 15.88 times. With respect to the number of skipped tasks, apart from 5 items, all the others were skipped at least once.
[Figure 24: Histogram of recommended/skipped exercises (y-axis: # of recommendations/skips, 0-35; x-axis: Task ID, 0-70; series: Skipped, Recommended).]
There were 10 tasks that were skipped in more than 50% of the cases in which they occurred in the recommendations. 6 of these tasks targeted the shoulders, one involved working with the body weight, and the remaining three involved working with a tennis ball.
Fig. 25 illustrates the success of the recommendation system from a user's perspective.
[Figure 25: URS_success (y-axis: # of users, 0-15; x-axis: # skipped breaks / # recommended breaks, 0/3 to 3/3).]
As the figure suggests, 48.6% (17) of the survey respondents did not invalidate any break through their skipping behavior, 22.9% (8) indicated that one break did not contain enough relevant exercises, while the remaining 28.5% (10) indicated two such breaks. None of the users invalidated all three recommended breaks.
Finally, the results of the subjective evaluation of the recommendation system
are presented in Fig. 26. The survey captured the user benefits of Oskar in terms
of 5 dimensions.
First, with respect to motivation, 80% of the users provided positive feedback (either Strongly Agree or Agree) regarding the system's capability to increase end-users' motivation to exercise regularly.
Second, regarding the system's usefulness in preventing the occurrence of neck and back problems, 85.7% (30 users) responded positively, while only one user disagreed with the statement.
Third, in terms of the system's capability to help reach a training goal, 48.5% of the respondents replied positively, 28.5% were neutral and the remaining 23% were skeptical. The somewhat poor results on this dimension might be attributed to the fact that OmaTauko is not necessarily aimed at regular training and achieving fitness goals, but rather at preventing musculoskeletal problems. In that sense, the users' responses illustrate that this question might not have been relevant in the context of the survey and the described service.
In terms of the system's capacity to recommend fresh tasks on a regular basis, 80% of the respondents believed the system is able to provide fresh, non-redundant recommendations, while the remaining 20% were neutral on this dimension. None of the respondents answered this question negatively.
Finally, in terms of the system's capability to provide tailored recommendations in accordance with the user's profile, 77% replied positively (34% Strongly Agree, 43% Agree), while only 5% replied negatively. The high scores of the last two items support the results obtained through the objective measurements (recommendation success rate from a global and user perspective) and validate the findings of the survey.
[Figure 26: Subjective evaluation of the recommendation system. Bar chart of the number of respondents per answer (Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree) for each of the five dimensions: Motivation, Usefulness, Fitness, Freshness, Customization.]
In conclusion, the evaluation results are generally positive and show that such a recommendation system would be of good use to end-users. However, the small sample of survey respondents and the pseudo-synthetic nature of part of the data represent an important limitation of this study, which should be addressed in future research. The next logical step is to confirm the results of this evaluation by deploying the recommendation system in a real-life scenario and performing relevant quantitative and qualitative analyses over a longer period of time.
7 Conclusion and Future Work

7.1 Conclusion
Information overload refers to the situation in which a user's access to information is limited by the high number of available options, which add overhead to the decision process of selecting the relevant information. This work has tackled the problem of information overload in the domain of occupational health and well-being. In a world where individuals spend increasing amounts of time connected and generating data, the problem of information overload is becoming increasingly relevant. Recommendation systems are an information-filtering technique with the potential to solve the problem of information overload; they guide the user through a large universe of information towards items that are likely to be relevant for her.
The second facet of the problem consists of the significant resources (material and human) spent in the domain of healthcare, and the high costs of treating a set of diseases that could otherwise be easily prevented at smaller cost. The adoption of mHealth - medical and health-related services and products supported by mobile devices - has made it possible for a set of conditions (e.g., obesity, circulatory diseases, stress, musculoskeletal problems) to be prevented by making use of information consumed through a mobile phone and a data connection.
In such a context, this study has highlighted the importance of users having access to high-quality, health-related information through their mobile devices, and suggested as a solution the integration of domain-expert knowledge into recommendation systems, in order to provide relevant information for end-users.
Two research questions were addressed. First, this study attempted to find out how domain-expert knowledge can be used to enhance user-preference-based recommendations. As an answer, Oskar - a hybrid recommendation system that blends domain-expert knowledge with user preferences - was implemented and presented. Oskar is the recommendation system that powers OmaTauko, a health and well-being product-service system which enables end-users to take and track micro-breaks aimed at decreasing their musculoskeletal problems and increasing their energy levels.
Second, this study tried to elicit the user benefits that can be achieved by augmenting preference-based recommendations with domain-expert information. Evaluation results indicate that 73.3% of the recommendations were accurate; moreover, all the evaluation participants were able to complete at least one break, with 48.6% of them finding all three recommended breaks relevant and completing them all. Furthermore, on average, 80% of the respondents provided positive feedback with respect to the system's ability to motivate them, its usefulness, fitness relevance, freshness and demonstrated customization capabilities.
7.2 Future work
Future development of this work should focus on a number of topics. The first
major area that should be addressed is a more thorough evaluation of the system.
The commercial nature of the service has made evaluation quite difficult and the
sample used for measuring the performance of the recommendation system was
small. This is clearly a limitation of this study and one step to overcome it would
be to arrange a real-life scenario with the deployment of the recommendation system
in a production environment. As a suggested method, I recommend collecting users' behavior over a certain period of time using the current state of the system (without a recommendation system in place). Next, without letting the user-base know that a major feature has been released, deploy the recommendation system and repeat the same measurements performed in the first part of the experiment. At the end of the second period, the users' behavior before and after the installation of the recommendation system can be compared, in order to objectively measure the impact of the recommendation system and the accuracy of the recommendations.
Second, with respect to the system's user experience and user interface (UX & UI), an explanation interface for the provided recommendations should be developed, in order to motivate the decisions of recommending one item instead of another. In the same area of UX & UI, the user should be made aware of the rules used by the recommendation system, and appropriate elements should be included (e.g., letting the user know how much she still needs to work until she reaches the next level, and what the benefits of the next level are).
Finally, in order to mitigate end-users' possible reluctance to use the service, it should include functionality that allows exercising in groups; accordingly, a group-recommendation system would be of high value in this context. Group recommendation systems aggregate the models of individual users to provide a meaningful recommendation for the active group [15]. Enabling end-users to work in groups is likely to reduce the social pressure which some of the end-users might experience; this direction of development also provides opportunities for including a collaborative-filtering component in the recommendation algorithm, allowing the user community to have a stronger voice in the recommendation process.
8 Appendix - Survey questions
1. [MOTIVATION] This system would motivate me to exercise regularly.
A. Strongly Disagree  B. Disagree  C. Neutral  D. Agree  E. Strongly Agree
2. [USEFULNESS] This system will help in preventing the occurrence of neck and back problems associated with long hours spent sitting in front of the computer.
A. Strongly Disagree  B. Disagree  C. Neutral  D. Agree  E. Strongly Agree
3. [FITNESS RELEVANCE] This system could help reaching a training goal.
A. Strongly Disagree  B. Disagree  C. Neutral  D. Agree  E. Strongly Agree
4. [FRESHNESS] I am satisfied with the variety of exercises provided per break
(in terms of not repeating the same exercises over a certain period of time).
A. Strongly Disagree  B. Disagree  C. Neutral  D. Agree  E. Strongly Agree
5. [CUSTOMIZATION] I could use a system that would provide more tailored
exercise suggestions (with respect to my preferences and skill level).
A. Strongly Disagree  B. Disagree  C. Neutral  D. Agree  E. Strongly Agree
References
[1] mhealth - mobile technology poised to enable a new era in health care, Tech.
report, Ernst & Young, 2012.
[2] Gediminas Adomavicius and Alexander Tuzhilin, Personalization technologies a process oriented perspective, Communications of the ACM 48 (2005), no. 10.
[3] Chris Anderson, The long tail: Why the future of business is selling less of
more, Hyperion ebook, 2009.
[4] Kristen M.J. Azar, Lenard I. Lesser, Brian Y. Laing, Janna Stephens, Magi S.
Aurora, Lora E. Burke, and Latha P. Palaniappan, Mobile applications for
weight management, American Journal of Preventive Medicine 5 (2013), no. 45,
583–589.
[5] Toine Bogers and Antal van den Bosch, Collaborative and content-based filtering
for item recommendation on social bookmarking websites, Proceedings of the
ACM RecSys’09, Workshop on Recommender Systems & the Social Web (2009).
[6] Yukun Cao and Yunfeng Li, An intelligent fuzzy-based recommendation system
for consumer electronic products, Elsevier, Expert Systems with Applications
33 (2007).
[7] Paul-Alexandru Chirita, Wolfgang Nejdl, and Cristian Zamfir, Preventing
shilling attacks in online recommender systems, Proceedings of the 7th annual
ACM international workshop on Web information and data management (New
York, NY, USA), ACM, 2005, pp. 67–74.
[8] M.F. Costabile, D. Fogli, C. Letondal, P. Mussio, and A. Piccinno, Domain-expert users and their needs of software development, Proceedings Session on End-User Development held at HCI International 2003 Conference (2003).
[9] Arthur W. DeTore, An introduction to expert systems, Journal of Insurance
Medicine 21 (1989), no. 4.
[10] Elena Claudia Dinuca and Mihai Istrate, Wine advisor expert system using
decision rules, Annals of the University of Oradea, Economic Science Series 22
(2013), no. 1, 1853–1864.
[11] Deloitte Center for Health Solutions, mhealth in an mworld - how mobile technology is transforming health care, Tech. report, Deloitte Center for Health
Solutions, 2012.
[12] Mustansar Ghazanfar and Adam Prugel-Bennett, Fulfilling the needs of gray-sheep users in recommender systems: a clustering solution, 2011 International Conference on Information Systems and Computational Intelligence, January 2011.
[13] 60 minutes of exercise per week can change everything, http://www.healthisajourney.net/fitness-community-blog/100-60-minutes-of-exercise-a-week-can-change-everything.
[14] Big data, for better or worse: 90% of world's data generated over last two years, http://www.sciencedaily.com/releases/2013/05/130522085217.htm.
[15] Anthony Jameson, More than the sum of its members: Challenges for group
recommender systems, Proceedings of the Working Conference on Advanced
Visual Interfaces (New York, NY, USA), AVI ’04, ACM, 2004, pp. 48–54.
[16] Matthias Kranz, Andreas Möller, Nils Hammerla, Stefan Diewald, Thomas Plötz, Patrick Olivier, and Luis Roalter, The mobile fitness coach - towards individualized skill assessment using personalized mobile devices, Elsevier, Pervasive & Mobile Computing (2012).
[17] Wei-Po Lee, Applying domain knowledge and social information to product analysis and recommendations - an agent-based decision support system, Expert
Systems 21 (2004), no. 3.
[18] Jure Leskovec, Anand Rajaraman, and Jeff Ullman, Mining of massive datasets,
2 ed., 2013.
[19] Pasquale Lops, Marco de Gemmis, and Giovanni Semeraro, Recommender systems handbook, ch. 3, Springer, 2011.
[20] Jie Lu, A personalized e-learning material recommender system, Proceedings of
the 2nd International Conference on Information Technology for Application
(ICITA 2004) (2004).
[21] Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to information retrieval, Cambridge University Press, 2009.
[22] Chunyan Miao, Qiang Yang, Haijing Fang, and Angela Goh, A cognitive approach for agent-based personalized recommendation, Elsevier, Knowledge-Based Systems (2007), no. 20.
[23] Stuart E. Middleton, David C. De Roure, and Nigel R. Shadbolt, Capturing
knowledge of user preferences - ontologies in recommender systems, Proceedings
of the 1st international conference on Knowledge capture (2001).
[24] Kathleen Mykytyn, Peter P. Mykytyn Jr., and Craig W. Slinkman, Expert systems - a question of liability?, MIS Quarterly 14 (1990), no. 1, 27–42.
[25] Michael J. Pazzani, A framework for collaborative, content-based and demographic filtering, Journal of Artificial Intelligence Review - Special issue on
data mining on the Internet (1999).
[26] Francesco Ricci, Travel recommendation systems, IEEE Intelligent Systems (Nov-Dec 2002).
[27] Francesco Ricci and Quang Nhat Nguyen, Critique-based mobile recommender
systems, ÖGAI Journal, ÖGAI Press 24 (2005), no. 4.
[28] M. Sasikumar, S. Ramani, S. Muthu Raman, K.S.R. Anjaneyulu, and R. Chandrasekar, A practical introduction to rule based expert systems, Narosa Publishing House, New Delhi, 2007.
[29] A. Singh, Knowledge based expert systems in organization of higher learning,
Proceedings of the International Conference and Workshop on Emerging Trends
in Technology (New York, NY, USA), ICWET ’10, ACM, 2010, pp. 571–574.
[30] Il-Yeol Song and Joseph LaGue, Predicting expert system success: An expert
system for expert systems, Proceedings of the 1990 ACM SIGBDP Conference
on Trends and Directions in Expert Systems (New York, NY, USA), SIGBDP
’90, ACM, 1990, pp. 88–110.
[31] Xiaoyuan Su and Taghi M. Khoshgoftaar, A survey of collaborative filtering techniques, Advances in Artificial Intelligence (2009).
[32] Chun-Yuen Teng, Yu-Ru Lin, and Lada Adamic, Recipe recommendation using
ingredients networks, Proceedings of the 4th International Conference on Web
Science (WebSci’12) (2012).
[33] Robin van Meteren and Maarten van Someren, Using content-based filtering for
recommendation, Proceedings of the Machine Learning in the New Information
Age: MLnet/ECML2000 Workshop (2000).
[34] Lex van Velsen, Thea van der Geest, and Michaël Steehouder, The contribution
of technical communicators to the user-centered design process of personalized
systems, Technical Communication 57 (May 2010), no. 2.
[35] Manolis Vozalis and Konstantinos G. Margaritis, On the enhancement of collaborative filtering by demographic data, Web Intelligence and Agent Systems 4
(2006), no. 2, 117–138.
[36] Werner Geyer and Casey Dugan, Inspired by the audience - a topic suggestion system for blog writers and readers, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2010).
[37] Joseph Williams, When expert systems are wrong, Proceedings of the 1990 ACM
SIGBDP Conference on Trends and Directions in Expert Systems (New York,
NY, USA), SIGBDP ’90, ACM, 1990, pp. 661–669.