IV. Improved filtering algorithm based on user similarity

An Improved Collaborative Recommendation Algorithm Based Optimized User
Similarity
Chen Hao, Li Zhongkun, Hu Wei
College of Computer Science and Electronic Engineering,
Hunan University, Changsha, 410000, China
E-mail:[email protected]
Abstract—Traditional collaborative filtering recommendation algorithms suffer from several problems, such as data sparsity, cold start, limited recommendation accuracy and poor timeliness. To resolve these problems, many scholars at home and abroad have proposed a variety of solutions, which have achieved fairly good results. However, the traditional collaborative filtering algorithm still neglects the rating scale difference on individual items when it calculates the similarity between users over an item set. Although the Adjusted Cosine Similarity Algorithm and the Pearson Similarity Algorithm have mitigated this issue, the rating scale difference on single items remains. When two users' score vectors on a common item set differ significantly, they may still yield similar resultant vectors. The frequent occurrence of this phenomenon directly affects the accuracy of the user similarity calculation and, in turn, the accuracy of the predicted scores for the target user. To address the problem, we introduce a balancing factor into the traditional cosine similarity algorithm, which accounts for the item rating scale differences between users. On this basis, we propose an improved collaborative filtering recommendation algorithm based on optimized user similarity (ICFOS). The effectiveness of the scheme is verified experimentally. The results show that the balancing factor can considerably improve the accuracy of user similarity and yield better recommendations.

Index Terms—collaborative recommendation; user similarity; rating scale difference; balance factor.

I. Introduction

Collaborative filtering is one of the most successful techniques among recommendation technologies. With the development of personalized recommendation technology, personalized recommendation for e-commerce has brought enormous commercial benefits, and its efficient, accurate and personalized features serve the public in all areas of social life. Through the joint efforts of researchers across industries, research and development of this technology has never stopped. Personalized recommendation technology has become increasingly mature, and collaborative filtering has become one of the most widely applied techniques.

With the rapid increase in the number of recommended commodities and network users, collaborative filtering recommendation technology faces multiple challenges, such as timeliness and data sparsity, but these have not diminished its popularity. Researchers have improved the quality of collaborative filtering in many ways, for example collaborative filtering based on clustering, probabilistic collaborative filtering algorithms, and collaborative filtering based on neural networks and matrix decomposition. Approaches based on a variety of models, such as the probability model, the Bayes model, the maximum entropy model, Gibbs sampling and linear regression, have also performed well.
Building on these studies, this paper makes use of users' rating histories and of the discrepancies between users' scores on single items, combines them with the traditional user similarity measures, and proposes a collaborative recommendation algorithm based on improved user similarity. The feasibility and the performance improvement of the collaborative recommendation algorithm are verified through experiments.

II. Related Work

The basis of the collaborative filtering recommendation algorithm is that similar users have similar preferences. Finding the collaborative neighbors of the target user is one of the core steps of the algorithm. The user-based collaborative filtering algorithm finds the target user's nearest neighbors and uses them to make recommendations for the target user. To improve the accuracy of collaborative filtering recommendation, solve the cold start problem and reduce computational complexity, scholars have done a great deal of research on user similarity computation and nearest neighbor selection. Linas Baltrunas and Francesco Ricci proposed an item-based context-aware splitting collaborative filtering algorithm, which splits an item according to two contextual conditions and was evaluated on synthetic data sets with matrix factorization and neighborhood-based CF. A recommender system can build a model of a user's interests from past behavior and use a personalized recommendation algorithm to recommend goods with which the user has not yet interacted.

To improve the quality of collaborative filtering recommendation, researchers have proposed a variety of recommendation algorithms. For example, Pirasteh reduced matrix sparsity and addressed the cold start problem by using film genre and director information [4], which requires additional genre and director data; Pitsilis derived hypothetical trust relationships from the system's user rating data, which can alleviate some cold-start and sparsity problems [5], although the trust relationships in this method do not come from real social networks; Kumar used a matrix decomposition technique [3] to reduce the dimension of the matrix and improve the accuracy of recommendation systems; Tsai C F and Hung C proposed a phased score prediction method [8], which first clusters users and merchandise using the initial scoring matrix, and then predicts and recommends commodities by applying weighted non-negative matrix factorization to the clustering results; Zhang Juan, Hu Xuegang and Zhang Yuhong proposed a clustering algorithm applied to user-based collaborative recommendation [9].

In recent years, learning across multiple domains has become a new research direction [6-7], and many scholars have begun to apply transfer learning methods to collaborative filtering. They have proposed many collaborative filtering algorithms based on transfer learning. There are model-based transfer approaches such as the Rating Matrix Generative Model, RMG-M [10], Collective Matrix Factorization, CMF [11], Coordinate System Transfer, CST [12], etc. There are also sample-based transfer methods, such as Transfer by Integrative Factorization, TIF [13]. Some scholars have improved the similarity calculation using social network structure. R. Zheng, in 2007, proposed a method that calculates distances on a social network graph and uses the weighted distance as a correction to the similarity between users. The closer two users are, the more similar they are. However, with the rapid increase of Internet users, the computational complexity of the distance between users grows rapidly as well. To some extent, this work improved the performance of the recommendation algorithm and alleviated the cold start problem; more importantly, it opened a new perspective for researchers.

These studies solved the problems of cold start and data sparsity in the collaborative filtering algorithm to a certain extent and improved recommendation performance. However, none of them takes into account a phenomenon that occurs frequently when user similarity is calculated: when two users' score vectors and single scores are quite different, it is still easy to obtain similar sum vectors, which gives the users a high similarity. Consider three users with the score vectors $r = (r_1, r_2, r_3, r_4)$, $r' = (r_1', r_2', r_3', r_4')$ and $r'' = (r_1'', r_2'', r_3'', r_4'')$, as shown in Figure 1:

Figure 1. Comparison of different sum vectors
Figure 1 shows that the component vectors $r_1, r_2, r_3, r_4$, $r_1', r_2', r_3', r_4'$ and $r_1'', r_2'', r_3'', r_4''$ differ greatly in scale, yet their sum vectors $r$, $r'$ and $r''$ are highly similar. Real data contain a large amount of data of this type. If only a simple cosine similarity algorithm is used to calculate user similarity, it inevitably leads to a large number of deviations in the user similarity calculation. In view of this, scholars proposed the Adjusted Cosine Similarity Algorithm (ACSA) and the Pearson Similarity Algorithm (PSA), which take the rating scale difference between users into account. Experiments have shown that the two algorithms improve the accuracy of user similarity to some extent.

III. User-based collaborative filtering algorithm

The user-based collaborative filtering algorithm first calculates the similarity between the target user and every other user. The M users with the highest similarity values then form the nearest neighbor set. Next, the target user's scores on the target items are predicted from the ratings given by all neighbor users. Finally, the predicted scores are sorted in descending order and the top-N items are recommended to the target user.

A. The Cosine Similarity Algorithm

There are mainly three similarity measures between users: the Standard Cosine Similarity Algorithm, the Adjusted Cosine Similarity Algorithm and the Pearson Similarity Algorithm.

The Cosine Similarity Algorithm:

$$\mathrm{sim}(u_a, u_b) = \frac{\sum_{i \in I} R_{a,i}\, R_{b,i}}{\sqrt{\sum_{i \in I} R_{a,i}^{2}}\ \sqrt{\sum_{i \in I} R_{b,i}^{2}}} \tag{1}$$

The Adjusted Cosine Similarity Algorithm:

$$\mathrm{sim}(u_a, u_b) = \frac{\sum_{i \in I_{a,b}} (R_{a,i} - \bar{R}_a)(R_{b,i} - \bar{R}_b)}{\sqrt{\sum_{i \in I_a} (R_{a,i} - \bar{R}_a)^{2}}\ \sqrt{\sum_{i \in I_b} (R_{b,i} - \bar{R}_b)^{2}}} \tag{2}$$

The Pearson Similarity Algorithm:

$$\mathrm{sim}(u_a, u_b) = \frac{\sum_{i \in I_{a,b}} (R_{a,i} - \bar{R}_a)(R_{b,i} - \bar{R}_b)}{\sqrt{\sum_{i \in I_{a,b}} (R_{a,i} - \bar{R}_a)^{2}}\ \sqrt{\sum_{i \in I_{a,b}} (R_{b,i} - \bar{R}_b)^{2}}} \tag{3}$$

In equations (1), (2) and (3), $R_{a,i}$ is the rating of user $u_a$ on item $i$, $\bar{R}_a$ is the average of user $u_a$'s ratings, $I$ is the item set, $I_{a,b}$ is the set of items that users $u_a$ and $u_b$ have both rated, $I_a$ is the set of items rated by user $u_a$, and $I_b$ is the set of items rated by user $u_b$.

B. Generate neighbor set

The neighbor set is the collection of users whose preferences are similar to those of the current target user. The KNN (K-Nearest-Neighbor) technique [17] is commonly used to select the neighborhood in user-based collaborative recommendation systems. It uses the similarity as a weight and selects the top-K users as the neighbor set of the target user.

C. Generate recommendation set

After the neighbor set of the target user is selected, the neighbors' scores on an item are combined with the similarity between the users to predict the target user's score on the test item. The top-N scores in the result set are selected as the recommendation results. Assuming that the target user is $u$ and the test item is $i$, the predicted score of $i$ is:

$$P_{u,i} = \bar{R}_u + \frac{\sum_{l \in N_u} \mathrm{sim}(u, l)\,(R_{l,i} - \bar{R}_l)}{\sum_{l \in N_u} \mathrm{sim}(u, l)} \tag{4}$$

In equation (4), $N_u$ is the nearest neighbor set of user $u$ and $R_{l,i}$ is the rating of user $l$ on item $i$.

IV. Improved filtering algorithm based on user similarity

Finding the collaborative neighbor set of the target user accurately is the core of the collaborative filtering algorithm. By calculating the similarity between two users, the traditional user-based collaborative filtering algorithm finds the target user's top-K nearest neighbors and then makes recommendations through them. The accuracy of the user similarity algorithm therefore directly affects the performance of the recommendation algorithm. However, the traditional cosine similarity algorithm does not take a frequently occurring phenomenon into account when it calculates the similarity of users: when two users' scoring vectors and single scores are quite different, it is still easy to obtain similar sum vectors, which gives the users a high similarity. The improved collaborative recommendation algorithm introduces a balancing factor that considers the single-score difference between users within the neighbor set. To obtain a set of similar users with better recommendation quality, we combine the traditional algorithms with the proposed balance factor. That is, the algorithm considers both the similarity between users and their single-score difference in the neighbor set.

To demonstrate the phenomenon, assume that user 1's score vector is $R_1 = (5,4,4,3)$ and user 2's score vector is $R_2 = (3,2,2,1)$. After each user's average rating is subtracted, the deviation vectors of user 1 and user 2 are $R_{differ1} = (1,0,0,-1)$ and $R_{differ2} = (1,0,0,-1)$. Figure 2 compares these vectors.

Figure 2. Comparison of different sum vectors

Figure 2 shows that, under the Adjusted Cosine Similarity Algorithm and the Pearson Similarity Algorithm, two users can obtain a high similarity even when their scores on every single item in the item set differ greatly. Although this result has some justification because it is based on each user's average score, such a high similarity is still not desirable. With a large amount of data this phenomenon occurs frequently, so it needs to be addressed. The next section presents the concept of the balance factor, which aims at this problem.

A. Balance factor

The user-based collaborative filtering recommendation algorithm tends to produce a high user similarity even when the users' rating scales on single items differ greatly. Existing algorithms do not use the differences in users' single-item rating scale as a weight to balance the similarity calculation results. To solve this problem, this paper presents the concept of the balance factor, which brings the differences of users' rating scale into the user similarity calculation to make up for this shortcoming of the traditional similarity calculation methods. The scale difference between user $u_a$ and user $u_b$ is calculated as follows:

$$\mathrm{sumDiffer}(u_a, u_b) = \sqrt{\frac{\sum_{i \in I_{a,b}} (R_{a,i} - R_{b,i})^{2}}{M}} \tag{5}$$
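The motivating example can be checked numerically. The following is a minimal sketch rather than the authors' implementation. It assumes that both users rated the same four items, so that the Adjusted Cosine and Pearson measures coincide, and it assumes a root-mean-square reading of the scale difference in equation (5). The function names `centered_cosine` and `sum_differ` are hypothetical.

```python
import math

def centered_cosine(ra, rb):
    """Cosine of the mean-centered rating vectors over the co-rated items."""
    mean_a = sum(ra) / len(ra)
    mean_b = sum(rb) / len(rb)
    da = [x - mean_a for x in ra]
    db = [y - mean_b for y in rb]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
    # A user with constant ratings has a zero-length centered vector.
    return numerator / denominator if denominator else 0.0

def sum_differ(ra, rb):
    """Assumed RMS form of the scale difference: sqrt(sum((Ra_i - Rb_i)^2) / M)."""
    m = len(ra)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)) / m)

R1 = [5, 4, 4, 3]  # user 1's scores from the example
R2 = [3, 2, 2, 1]  # user 2's scores from the example

similarity = centered_cosine(R1, R2)
scale_difference = sum_differ(R1, R2)
print(similarity, scale_difference)
```

Under these assumptions the similarity equals 1 up to floating-point rounding, although the two users differ by two rating points on every item, and the scale difference equals 2.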
The balance factor is calculated as follows:

$$\theta(u_a, u_b) = \omega^{\mathrm{sumDiffer}(u_a, u_b)}, \quad 0 < \omega < 1 \tag{6}$$

Here $\mathrm{sumDiffer}(u_a, u_b)$ is the scale difference between user $u_a$ and user $u_b$, $\theta(u_a, u_b)$ is the balance factor between user $u_a$ and user $u_b$, $\omega$ is the weight index of the balance factor, which has to be adjusted repeatedly to obtain a relatively accurate value, $I_{a,b}$ is the set of items that users $u_a$ and $u_b$ have both rated, $R$ is a user's rating on an item, and $M$ is the number of items in $I_{a,b}$. By calculation, $\mathrm{sumDiffer}(u_a, u_b) \in [0,4]$ and $\theta(u_a, u_b) \in (0,1]$. When the score difference between users $\mathrm{sumDiffer}(u_a, u_b)$ is low, the balance factor $\theta(u_a, u_b)$ tends to 1, so it has little impact on the result of the initial similarity calculation. The higher the score difference $\mathrm{sumDiffer}(u_a, u_b)$ is, the smaller $\theta(u_a, u_b)$ becomes. In this way the results calculated by the traditional cosine similarity algorithm are balanced and a more accurate similarity between users is obtained.

B. Improved similarity algorithm

$$\mathrm{Imp\_sim}(u_a, u_b) = \mathrm{sim}(u_a, u_b) \times \theta(u_a, u_b) \tag{7}$$

Here the improved user similarity $\mathrm{Imp\_sim}(u_a, u_b)$ multiplies the result of the traditional Adjusted Cosine Similarity Algorithm, $\mathrm{sim}(u_a, u_b)$, by the balance factor $\theta(u_a, u_b)$. The resulting similarities are then used to select the neighbor set, and items are finally recommended according to formula (4). The user similarity calculation flow chart is shown below:

[Flow chart with the nodes: user1, user2, co-items, sumDiffer(u_a, u_b), θ(u_a, u_b), traditional similarity algorithm, user similarity]

Figure 3. User similarity calculation flow chart

The user similarity calculation can run offline, which reduces the running time of recommendation, improves its speed and thus addresses the real-time requirement of recommending. The offline similarity results can be updated once every several weeks.

V. Experimental results and analysis

To verify the superiority of the improved collaborative recommendation algorithm, we designed the following experiments, which compare it with the traditional collaborative filtering recommendation algorithm.

A. Dataset

In the experiments, we use data from the MovieLens recommender system. MovieLens is a web-based research recommender system that debuted in Fall 1997. Each week hundreds of users visit MovieLens to rate and receive recommendations for movies. The MovieLens data set contains more than 100 thousand ratings from more than 940 users on 1680 movies. In this data set, users' scores range from 1 to 5, where "5" means "like very much" and "1" means "do not like", and the sparsity of the data is 93.7%. For this experiment, 10 thousand ratings were chosen at random from the MovieLens data set and randomly divided into two parts: 30% as the test set and the remaining 70% as the training set.

B. Metrics

Accuracy is the most basic indicator of a recommendation system. The ultimate aim of the improved collaborative recommendation algorithm in this paper is to improve the accuracy of the results, so we mainly consider the accuracy of the algorithm. To evaluate the recommendation accuracy of the improved recommendation algorithm, we used the root mean square error (RMSE) and the mean absolute error (MAE). MAE and RMSE measure the deviation of recommendations from their true user-specified values. They are obtained from the deviation between users' actual scores and the predicted scores. The lower the values of RMSE and MAE, the higher the accuracy of the recommendation algorithm. Formally,

$$\mathrm{MAE} = \frac{\sum_{j=1}^{N} \left| p_j - r_j \right|}{N} \tag{8}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{j=1}^{N} (p_j - r_j)^{2}}{N}} \tag{9}$$

Here, $p_j$ is the predicted score of the target user on item $i_j$, $N$ is the number of items predicted and $r_j$ is the real score of the target user on item $i_j$.

C. Result Analysis

In the experiment, the weight index $\omega$ of the balance factor $\theta(u_a, u_b)$ was adjusted through repeated experiments. The neighbor set size is $K \in \{10, 20, 30, 40, 50, 60\}$. Experiments were done to compare the results obtained under the different neighbor set sizes. The comparison of recommendation accuracy for different $\omega$ is shown in TABLE I and TABLE II and in Figure 4, Figure 5 and Figure 6.
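The two metrics of equations (8) and (9) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. The function names `mae` and `rmse` and the sample scores are hypothetical and are not taken from the experiments.

```python
import math

def mae(predicted, actual):
    """Mean absolute error: sum(|p_j - r_j|) / N, as in equation (8)."""
    n = len(predicted)
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / n

def rmse(predicted, actual):
    """Root mean square error: sqrt(sum((p_j - r_j)^2) / N), as in equation (9)."""
    n = len(predicted)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / n)

# Hypothetical predicted and real scores for four test items.
predicted_scores = [4.0, 3.0, 5.0, 2.0]
real_scores = [5.0, 3.0, 4.0, 4.0]

print(mae(predicted_scores, real_scores))
print(rmse(predicted_scores, real_scores))
```

Because RMSE squares each deviation before averaging, a single large error raises RMSE more than it raises MAE, which is why the two values are usually reported together.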
TABLE I. MAE in the condition of different ω and different K based on ACSA

K      ω=0.70    ω=0.80    ω=0.85    ω=0.88    ω=0.90    ω=0.92    ω=0.95
10     0.89816   0.88376   0.90487   0.88718   0.86898   0.90601   0.89816
20     0.82951   0.81764   0.82472   0.83788   0.82653   0.82275   0.82951
30     0.8222    0.80508   0.81241   0.82005   0.80155   0.80717   0.8222
40     0.81525   0.78634   0.80916   0.81897   0.79985   0.80924   0.81525
50     0.8173    0.81802   0.80825   0.81277   0.79139   0.78732   0.8173
60     0.78884   0.79861   0.80417   0.78817   0.79062   0.80871   0.78884
MAE    0.82602   0.81938   0.81658   0.8202    0.80732   0.8206    0.84417
TABLE II. MAE in the condition of different ω and different K based on PSA

K      ω=0.70    ω=0.80    ω=0.85    ω=0.88    ω=0.90    ω=0.92    ω=0.95
10     0.89716   0.88476   0.88387   0.85618   0.83998   0.86101   0.89716
20     0.81851   0.80664   0.81372   0.82798   0.81763   0.81375   0.81851
30     0.81222   0.81508   0.81341   0.82125   0.80045   0.80727   0.81222
40     0.80625   0.78734   0.79998   0.80797   0.79875   0.79824   0.80625
50     0.80773   0.80912   0.79975   0.80157   0.78799   0.78752   0.80773
60     0.78884   0.79861   0.80997   0.78707   0.78282   0.80891   0.78884
MAE    0.82172   0.81647   0.81938   0.81790   0.80460   0.81264   0.82120
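To make the role of the weight index concrete, the sketch below evaluates the balance factor of equation (6) and the improved similarity of equation (7) over the grid of weight index values used in the tables. It is a minimal sketch, assuming the reading "balance factor = weight index raised to the power of the scale difference". The function names and the input pair (similarity 1.0, scale difference 2.0) are hypothetical and describe two users whose mean-centered ratings agree while their raw ratings differ by two points per item.

```python
def balance_factor(scale_difference, omega):
    """Assumed form of equation (6): omega ** sumDiffer, with 0 < omega < 1."""
    if not 0.0 < omega < 1.0:
        raise ValueError("omega must lie strictly between 0 and 1")
    return omega ** scale_difference

def improved_similarity(similarity, scale_difference, omega):
    """Equation (7): the traditional similarity scaled by the balance factor."""
    return similarity * balance_factor(scale_difference, omega)

# Weight index grid taken from the column headers of the tables.
OMEGAS = [0.70, 0.80, 0.85, 0.88, 0.90, 0.92, 0.95]

for omega in OMEGAS:
    # Hypothetical pair of users: similarity 1.0, scale difference 2.0.
    print(omega, improved_similarity(1.0, 2.0, omega))
```

A smaller weight index penalizes the same scale difference more strongly, and a scale difference of zero leaves the traditional similarity unchanged.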
Figure 4. MAE based on ACSA (MAE versus K for ω = 0.80, 0.85, 0.88, 0.90, 0.92, 0.95)

Figure 5. MAE based on PSA (MAE versus K for ω = 0.80, 0.85, 0.88, 0.90, 0.92, 0.95)

Figure 6. MAE in the condition of different ω

These figures show that the MAE is relatively low overall and reaches its lowest point when ω = 0.90. This means that the improved algorithm in this paper gives more accurate recommendation results when ω = 0.90, so all results of the improved collaborative filtering algorithm are obtained with ω = 0.90.

Experiments were also done to compare the results obtained under different neighbor set sizes. The size of the user's neighbor set was increased from 10 to 60 in steps of 10. We then observed the recommendation effect under each neighbor set size, using RMSE and MAE as the evaluation criteria of recommendation quality. The comparison of recommendation accuracy is shown in Figure 7 and Figure 8.

Figure 7. MAE under different neighbors' data set (CF versus ICFOS, MAE versus K)

Figure 8. RMSE under different neighbors' data set (CF versus ICFOS, RMSE versus K)

As can be seen from the figures, with neighbor sets ranging from 10 to 60 the improved recommendation algorithm performs significantly better than the conventional recommendation algorithm. The reason is that we added an index of the scoring scale difference between users, which makes the calculation of user similarity more accurate and therefore makes the predicted scores more accurate. The figures also show that when the selected neighbor set is small, the results obtained by traditional collaborative filtering are not ideal, which evidently reflects the data sparsity problem. The improved recommendation algorithm, however, still obtains better recommendation results when the system suffers from data sparsity.

VI. Conclusion

This paper analyzes a problem ignored by traditional collaborative filtering technology: the rating scale difference between users on individual items. It introduces the balance factor $\theta(u_a, u_b)$ into the traditional similarity computation of the collaborative recommendation algorithm, which improves collaborative recommendation. Experimental results show that the method can effectively solve the problem of high similarity within the neighbor set caused by users' rating scale differences on individual items, which improves the accuracy of user similarity and the quality of the recommendations. The improved collaborative recommendation algorithm is expected to have good application prospects.
REFERENCES
[1] L. Lu, M. Medo, C. H. Yeung, Y. C. Zhang, Z. K. Zhang, and T. Zhou, Recommender systems, Physics Reports, vol. 519, no. 1, pp. 1-49, Oct. 2012.
[2] R. Burke, Hybrid recommender systems: Survey and experiments, User Modeling and User-Adapted Interaction, vol. 12, no. 4, pp. 331-370, Nov. 2002.
[3] Kumar R, Verma B K, Rastogi S S. Social Popularity based SVD++ Recommender System [J]. International Journal of Computer Applications, 2014, 87:33-37.
[4] Pirasteh P, Jung J J, Hwang D. Item-Based Collaborative Filtering with Attribute Correlation: A Case Study on Movie Recommendation [M]. Intelligent Information and Database Systems. Springer International Publishing, 2014: 245-252.
[5] Pitsilis G, Knapskog S J. Social Trust as a solution to address sparsity-inherent problems of Recommender systems [J]. arXiv preprint arXiv, 2011, 19:332-344.
[6] Li Bin. Cross-domain Collaborative Filtering: A Brief Survey [C]. Proceedings of the 23rd International Conference on Tools with Artificial Intelligence. [S. l.]: IEEE Press, 2011: 1085-1086.
[7] Ning X, Karypis G. Multi-task Learning for Recommender System [C]. Proceedings of the 2nd Asian Conference on Machine Learning. Tokyo, Japan: [s. n.], 2010: 269-284.
[8] Tsai C F, Hung C. Cluster ensembles in collaborative filtering recommendation [J]. Applied Soft Computing, 2012, 12(4): 1417-1425.
[9] Zhang Juan, Hu Xuegang, Zhang Yuhong, et al. An efficient ensemble method for classifying skewed data streams [C]. Proceedings of the 7th International Conference on Intelligent Computing: Bio-inspired Computing and Applications. 2011: 144-151.
[10] Li Bin, Yang Qiang, Xue Xiangyang. Transfer Learning for Collaborative Filtering via a Rating-matrix Generative Model [C]. Proceedings of the 26th Annual International Conference on Machine Learning. Quebec, Canada: [s. n.], 2009: 617-624.
[11] Singh A P, Gordon G J. Relational Learning Via Collective Matrix Factorization [C]. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [S. l.]: ACM Press, 2008: 650-658.
[12] Pan Weike, Evan W X, Lin N, et al. Transfer Learning in Collaborative Filtering for Sparsity Reduction [C]. Proceedings of the 24th AAAI Conference on Artificial Intelligence. [S. l.]: AAAI Press, 2010: 230-235.
[13] Pan Weike, Evan W X, Yang Qiang. Transfer Learning in Collaborative Filtering with Uncertain Ratings [C]. Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Canada: AAAI Press, 2012: 662-668.
[14] L. Ungar and D. Foster, “Clustering methods for collaborative filtering,” in Recommender Systems—Papers From the AAAI Workshop, Madison, WI, July 1998.
[15] T. Hofmann and J. Puzicha, “Latent class models for collaborative filtering,” in Proc. 17th Int. Joint Conf. Artificial Intelligence, 1999, pp. 688–693.
[16] J. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. Gordon, and J. Riedl, “Applying collaborative filtering to usenet news,” Commun. ACM, vol. 40, no. 3, pp. 77–87, Mar. 1997.
[17] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, John Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews [C]. Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, 1994: 175-186.