An Improved Collaborative Recommendation Algorithm Based Optimized User Similarity Chen Hao ,Li Zhongkun,Hu Wei College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China E-mail:[email protected] Abstract—There are lots of issues existed in traditional I. collaborative filtering recommendation algorithm such as Introduction Collaborative filtering recommendation system is data scarcities, cold boot, recommendation accuracy and timeliness. To resolve these problems, a large number of one domestic and foreign scholars put up a variety of solutions, recommended to a certain degree, which achieved relatively desired results. recommendation But there are still existed neglected issues of rating scale development of personalized recommendation technology, difference for all projects in the traditional collaborative personalized recommendations for e-commerce has filtering algorithm while calculating the similarity between brought enormous commercial interests, and it serves the users within the project set. Despite the Adjusted Cosine public in all areas of social life for the efficient, accurate Similarity Algorithm and the Pearson Similarity Algorithm and personalized features. By the join efforts of have improved the issue, it is still existed that users have the researchers of all the industries, the research and problem of single rating scale difference of the project. development of this technology have never stopped. When the users have significant differences for the score Personalized recommendation technology has become vectors on a common set, they may have chance to get increasingly similar resultant vector results. The substantial presence of technology this kind of phenomena have a direct impact on the recommended accuracy of user similarity calculation, furthermore, it will commodities quantities and network users, collaborative affect the target user's predicted score accuracy. As to the filtering recommendation technology faces multiple problem, we try to join a balancing factor into the challenges, such as timeliness and data scarcities, but this traditional cosine similarity algorithm, which is used for did not affect the popularity of collaborative filtering calculating the project rating scale differences between users. technology totally. Researchers improve their quality of Besides, improved collaborative filtering, such as collaborative filtering collaborative filtering recommendation algorithm based on based on clustering, the probability of collaborative optimized user similarity (ICFOS). The effectiveness of the filtering algorithms, collaborative filtering based on scheme is verified through the experiment. From the result neural we can get the conclusion that the balancing factor can decomposition, based on a variety of models such as the largely improve the accuracy of user similarity, and get probability model, Bayes model abstraction , maximum better recommendation effects. entropy model, Gibbs abstract, linear regression, also Index Terms—collaborative recommendation; user similarity; shine. we are planning to rating scale difference; balance factor. propose an of the most successful techniques application among technologies. mature , and becomes one technique. networks, the of personalized With the collaborative of With collaborative the the filtering most the rapid widely increasing filtering, matrix Based on these studies, the paper take advantage of user ratings history information, a single score data discrepancies between users and the combination with F, Hung C proposed a phased score predicting method[8], traditional user similarity measure algorithm, proposed which used clustering method of initial scoring matrix to collaborative recommendation algorithm based on the cluster users and merchandise firstly, and then predicted improvements of user similarity, and it verified the and recommended the commodity by the weighted feasibility of non-negative matrix factorization method on the results of through earlier commodity clustering. Zhang Juan, Hu Xuegang, and collaborative performance recommendation improvements algorithm experiments . Zhang Yuhong proposed a clustering algorithm which II. The applied to the user's collaborative recommendation[9];In Related Work basement of collaborative recent years, multiple areas of learning have become a filtering new research direction [6-7] , there have been many recommendation algorithm is similar preferences of scholars who began to focus on learning methods to similar users. Finding synergy target audience is one of migrate applications to collaborative filtering algorithm. the core steps of collaborative filtering algorithm. Migration scholars proposed many learning methods User-based collaborative filtering algorithm is used to corresponding collaborative filtering algorithm. There are find the target user's nearest neighbor to coordinate target model-based migrating approaches such as Rating Matrix users. In order to improve the accuracy of collaborative Generative filtering recommendation technology, solve the cold start Model, Factorization, CMF RMG-M[10], [11] Collective Matrix , Coordinate System Transfer, problem and reduce the computational complexity, CST scholars in the user similarity computing and nearest are also existed, such as the method of Transfer by neighbor users selected aspects of a lot of research. Linas Integrative Factorization, TIF[13]. There are some scholars Baltrunas and Francesco Ricci proposed project-based based on the improvement of social network structure context-aware splitting collaborative filtering algorithm, similarity calculation. Based on a social network diagram which will split the project in two contexts condition and to calculate the distance , R, Zheng, in 2007, proposed a assessment with the matrix decomposition synthetic data method uses a weighted distance as a similarity index sets and neighbor CF. Recommended system can correction between users. The closer the users become, establish the user's interest model based on users’ past the larger similarities they have. However, with the rapid behavior information,and recommend the goods that increase of Internet users, the computational complexity their behavior had not yet produced to users by of the distance between the users grows rapidly personalized recommendation algorithm. In order to accordingly. To some extent, the performance of improve filtering recommendation algorithm has been improved and the recommendation, researchers have proposed a variety of cold starting problem has been alleviated, more recommendation algorithm. For example, Piraste reduced importantly, it opened a totally new perspective for the matrix scarcities and solved the problem of cold start research staffs. the quality of collaborative through the use of information and the type of film directors [4] [12] , etc. Besides, sample-based migrating methods These studies solved the problems of cold start and , which requires the use of additional data sparseness in the collaborative filtering algorithm to information and directing Genre;Pitsilis established a a certain extent, which improved the recommendation hypothetical trust relationship through the usage of a performance. However, they all did not take the existence system user ratings data. By this way you can solve some of a large number of phenomena into account while cold-start problems and scarcities problems[5] , but the calculating the user similarity. When two users were quite trust relationships in this method is not the true social different facing grading vector and scores, it is easily to networks problems;Kumar used a matrix decomposition get a similar sum vector, which leads users to get high technique[3] to reduce the dimension of the matrix to similarity. It is assumed that the user's score two vectors, improve the accuracy of recommendation systems;Tsai C respectively , r (r1 , r2 , r3 , r4 ) r ' (r1' , r2' , r3' , r4' ) , r '' (r1'' , r2'' , r3'' , r4'' ) ,as shown in Figure 1: r4 r1 r3 r2 r4'' r1'' ' 1 r r 2' r 3' r2'' r3'' r4' r '' r' r Figure 1.Comparison of different sum vectors Obviously, it can be seen from the Figure 1: There is a great scale of differences among vector r1 , r2 , r3 , r4 , '' '' '' '' r1' , r2' , r3' , r4' , r1 , r2 , r3 , r4 Adjusted Cosine Similarity Algorithm and the Pearson Similarity Algorithm. The Cosine Similarity Algorithm: . But their sum vectors r 、 r ' 、 r '' R sim (u a , u b ) has a highly similarity. In real data, there are a huge number of this type of data. If there is merely a simple iI adjusted cosine similarity algorithm used to calculate user sim (u a , u b ) user similarity calculations deviation. a ,i iI a , b (1) iI R b,i 2 ( Ra ,i - R a )( Rb,i - R b ) 2 (Ra ,i - R a) iI a In view of this, scholars proposed the Adjusted Cosine Similarity Algorithm(ACSA) and the Pearson (2) (R b,i - R b ) 2 iI b The Pearson Similarity Algorithm: Similarity Algorithm(PSA) which have taken the rating sim (u a , u b ) scale difference between users into account. The iI a , b ( Ra ,i - R a )( Rb,i - R b ) 2 (Ra ,i - R a) iI a , b experiment proved the two algorithms to some extent, III. 2 The Adjusted Cosine Similarity Algorithm: similarity, it will inevitably leads to a large number of which improved the accuracy of user similarity. Ra ,i Rb ,i iI iI a , b Equation (1)(2)(3): R a ,i is rating of user (R b,i - R b ) (3) 2 u a on item i. User-based collaborative filtering algorithm User-based collaborative filtering algorithm need to Here Ra is the average of the user’s ratings. I a ,b calculate the similarity with the target users of other users firstly; Afterwards, we choose a higher similarity value M Indicates the items that user u a and u b co-evaluated. users compose nearest neighbor set. Then predicting the target user's all neighbor users target items in the Ia is item set that user ua rated. Ib is item set that collection all ratings is necessary. And then we have to calculate the scores in descending order, select the top-N user ub rated. items scores in descending order of recommendation to the target user. B. Generate neighbor set Neighbor set is a collection of the users that has A. The Cosine Similarity Algorithm When calculating the users’ similarity, there are similar preferences with current target user. The technology called KNN(K-Nearest-Neighbor)[17] is mainly three kinds of similarity measure between users. always used for selecting neighborhood in user-based Including the Standard Cosine Similarity Algorithm, the collaborative recommendation system. It used the similarity as a weight to select top-K user as a neighbor score vector is R2 (3,2,2,1) . The user 1’s rating and the set to the target user. user 2’s rating difference vector on the project is C. Generate recommendation set Rdiffer2 (1,0,0,1) . As shown in the following figure After the neighbor set of the target user is selected, it combined with all the neighbors’ scores of the project and compares their vectors. the similarity between the users to predict the target user's R1 (5,4,4,3) scores on the test project. Selecting Top-N scores of data R2 (3,2,2,1) from the result set as the recommended results. It is assumed that the target user is u , the test program is i, the predicted score of i is: Pu ,i Ru lNu sim (u, l ) ( Rl ,i Rl ) lNu Equation (4): Nu sim (u, l ) (4) Rdiffer 2 (1,0,0,1) Rdiffer1 (1,0,0,1) Figure 2.Comparison of different sum vectors is the nearest neighbors set of From the Figure 2 shows, when the Adjusted cosine user u and Rl ,i is rating scores of user l on item i. similarity algorithm’s and Pearson similarity algorithm’s score on a single project has a large difference in the IV. collection of items, there will be a high degree of user Improved filtering algorithm based on user similarity phenomenon. Although this phenomenon is similarity Finding the collaborative neighbor set of the target user accurately is the core of collaborative filtering algorithm. By calculating the similarity between two users, traditional user-based collaborative filtering algorithm find the target user's top-K nearest neighbors, then achieve the recommendation by the nearest neighbor. Thus the accuracy of user similarity algorithm will directly affect the performance of the algorithm recommended. However the traditional cosine similarity algorithm does not take a phenomenon abound into based on the users’ average score that there is a certain rationality, it is also not a normal phenomenon even if the results is obtained with a high degree of similarity. In the case of a large amount of data, this phenomenon is also reasonable to happen, so that it is also necessary to improve this phenomenon. The next section will present the concept of the balance factor aimed at improving the problem. A. Balance factor The user-based collaborative filtering account when calculating the similarity of users. When recommendation algorithm has the problem of the high two users’ scoring vector and single score were quite degree of user similarity when calculating user similarity different, it also very easy to get a similar sum vector, on the condition of high differences of users’ single rating which leads users to get high similarity. Improved scale. The currently proposed algorithm doesn’t regard collaborative the the differences of users’ single rating scale as a weight to introduction of a balancing factor, which considering the balance the similarity calculation results. To solve this single difference between users within the neighbor set. problem, this paper presents the concept of balance factor, Therefore, to get a set of similar users with better which will take the differences of users’ rating scale into recommendation quality, we need combine traditional account to the user similarity calculation to make up for algorithms and proposed a similar balance factor. That is to say, it considered the similarity between the user and this shortcoming of traditional similarity calculation method. The scale differences between user u a and user the single score difference in the neighbor set. In order to u b is calculated as follows: recommendation algorithm is prove the existence of the phenomenon, we assume that the user 1’s score vector is R1 (5,4,4,3) and the user 2’s sumDiffer( u a , u b ) iIa ,b ( Ra ,i Rb ,i ) 2 user1 (5) user2 M Balance factor is calculated as follows: (ua , ub ) sumDiffer(u a ,ub ) ,0 1 (6) co-items Here sumDiffer (ua , ub ) is the scale differences sumDiffer (ua , ub ) between user u a and user u b , (u a , ub ) is the balance factor between user u a and user u b , is (ua , ub ) weight index of balance factor that requires repeated tra-similarity algorithm amendments to get a relatively accurate value, I a ,b is the items set that user u a and u b co-evaluated, R is item’s user similarity rating of user and M is the count of I a ,b . By calculated we can get sumDiffer(ua , ub ) [0,4] , (u a , ub ) (0,1) . When the Figure 3. User similarity calculation flow chart score difference between users sumDiffer (ua , ub ) is low, the balance factor proposed in this paper (u a , ub ) is tending to be 1. So that there will The user similarity calculation process can run offline, so you can reduce the recommended running time and be less impact on the results of the initial similarity problem of the real-time for recommending. We can calculation. However, the higher the score difference between users sumDiffer(ua , ub ) is, the smaller (u a , ub ) update the user similarity calculation results run offline improve the speed of recommendation, which solve the several week a time. is. Then we can balance the results calculated by the V. Experimental results and analysis traditional cosine similarity algorithm, so as to get more To accurate similarity between users. verify the superiority of the improved collaborative recommendation algorithm, we make the B. Improved similarity algorithm Imp_sim (u a , u b ) sim (u a , u b ) (u a , ub ) following experimental design that compared the (7) traditional collaborative filtering recommendation algorithm. Here, improved user similarity calculation method A. Dataset Imp_sim (u a , u b ) is based on the traditional Adjusted In the experiments ,we use data from our MovieLens cosine similarity algorithm of adding a balance index recommender system. MovieLens is a web-based research (u a , ub ) to the results. Then use the similarity recommender system that debuted in Fall 1977. Each week hundreds of users visit MovieLens to rate and calculation result neighbor set, and recommend projects receive recommendations for movies. The MovieLens according to the formula (2) at last. data set contains more than 100 thousand ratings, more The user similarity calculation flow chart shown below: than 940 users and 1680 movies. In this data set, the user's score in the range of 1-5, "5" means "like very much", "1" means "do not like", and the sparsity of the data is 93.7%. Randomly 10 thousand ratings of the MovieLens data set are chosen into this experiment and randomly divided into 70 and 30% that 30% selected as RMSE part of the test set, and the remaining 70% as part of the N ( p j 1 training set. B. Metrics j rj ) 2 / N (9) Here, p i is the predicted score to target user on the The accuracy of the recommendation system is the most basic indicator. The ultimate aim of the improved project i j , N is the count of projects predicted and r j collaborative recommendation algorithm is to improve the accuracy of the results in this paper, thus we mainly is the real score to target user on the project i j . consider on the accuracy of the algorithm. In order to evaluate the recommendation accuracy of the improved C. Result Analysis recommendation algorithm, we used the root mean square error (RMSE) and the mean absolute error (MAE) to In the experiment, the weight index of the balance factor (u a , ub ) RMSE are measures of the deviation of recommendations The neighbor from their true user-specified values. RMSE and MAE Experiments were done to compare the results obtained values can be obtained by calculating the score deviation under the different neighbor recommended set condition. between the actual score and the predicted score between Comparison of recommendation accuracy at the condition users. The lower the value of RMSE and MAE, the higher of different the accuracy of the algorithm recommended. Formally, Figure 4 ,Figure 5 and Figure 6. measure the effect of recommended system. MAE and N MAE p j 1 j set is K {10,20,30,40,50,60} . is shown as TABLE I and TABLE II, rj (8) N TABLE I. MAE corrected after repeated experiments. MAE in the condition of different and different K based ACSA 0.70 0.80 0.85 0.88 0.90 0.92 0.95 10 0.89816 0.88376 0.90487 0.88718 0.86898 0.90601 0.89816 20 0.82951 0.81764 0.82472 0.83788 0.82653 0.82275 0.82951 30 0.8222 0.80508 0.81241 0.82005 0.80155 0.80717 0.8222 40 0.81525 0.78634 0.80916 0.81897 0.79985 0.80924 0.81525 50 0.8173 0.81802 0.80825 0.81277 0.79139 0.78732 0.8173 60 0.78884 0.79861 0.80417 0.78817 0.79062 0.80871 0.78884 MAE 0.82602 0.81938 0.81658 0.8202 0.80732 0.8206 0.84417 TABLE II. MAE in the condition of different and different K based PSA MAE 0.70 0.80 0.85 0.88 0.90 0.92 0.95 10 0.89716 0.88476 0.88387 0.85618 0.83998 0.86101 0.89716 20 0.81851 0.80664 0.81372 0.82798 0.81763 0.81375 0.81851 30 0.81222 0.81508 0.81341 0.82125 0.80045 0.80727 0.81222 40 0.80625 0.78734 0.79998 0.80797 0.79875 0.79824 0.80625 50 0.80773 0.80912 0.79975 0.80157 0.78799 0.78752 0.80773 60 0.78884 0.79861 0.80997 0.78707 0.78282 0.80891 0.78884 MAE 0.82172 0.81647 0.81938 0.81790 0.80460 0.81264 0.82120 recommendation results when =0.80 =0.85 =0.88 =0.90 =0.92 =0.95 0.90 0.87 results from MAE =0.90, so that the all improved collaborative filtering algorithm are obtained under the condition of =0.90. Experiments were done compared the results obtained 0.84 the in different neighbor recommended set condition. The user's neighbors set increased from 10 to 60, 0.81 interval to 10. Then we observed recommended effect of 0.78 user neighbor set under different conditions,and regarded 10 20 30 40 50 60 RMSE and MAE as evaluation criteria of recommended K quality. Comparison of recommendation accuracy is Figure 4. MAEbased on ACSA shown as Figure 7 and Figure 8. 1.0 =0.80 =0.85 =0.88 =0.90 =0.92 =0.95 0.90 0.88 0.86 0.9 MAE MAE 0.84 CF ICFOS 0.82 0.8 0.80 0.78 0.7 0.76 10 20 30 40 K Figure 5. MAE based on PSA 50 10 60 20 30 40 50 60 K Figure 7. MAE under different neigbors' data set 0.86 CF ICFOS 1.2 0.84 RMSE MAE 0.82 1.1 0.80 1.0 0.78 0.80 Figure 6. 0.85 MAE in 0.90 0.95 1.00 the condition of different 10 20 30 40 50 60 K Figure 8. RMSE under different neigbors' data set These figures show the results of the MAE As can be seen from the figure, the effect of the recommended overall trend is relatively low and MAE is at the lowest point when =0.90. It means that in this improved recommendation algorithm whose neighbor set paper, the improved algorithm can get a more accurate effect of conventional recommendation algorithm. The ranging from 10 and 60 is significantly better than the reason is that we added a scoring scale index difference between users, making a more accurate calculation of the user similarity, thus making a more accurate prediction [11] score. But it also can be seen from the figure, when the selected neighbor set is small, the results obtained by the [12] traditional collaborative filtering is not ideal, obviously there is a data sparse problem. However, the improved recommended algorithm can also get [13] better recommendation results which data sparsity problem of [14] the system is in the presence. VI. [15] Conclusion This paper analyzes the ignored problem of [16] traditional collaborative filtering technology to individual project rating scale difference between users, taking the balance factor (u a , ub ) into the traditional similarity computing in recommendation the study algorithm, of the which collaborative improved the collaborative recommendation. Experimental results show that the method can effectively solve the problem of the high similarity of neighbor set caused by user rating scale of individual project differences, which enhanced the similarity of user accuracy, and perfected the quality of the recommended. The improved collaborative recommendation algorithm is bound to have a good application prospect. REFERENCES [1] L. Lu, M. Medo, C. H. Yeung, Y. C. Zhang, Z. K. Zhang, and T. Zhou, Recommender systems, Physics Reports,vol. 519, no. 1, pp.1-49, Oct. 2012. [2] R. Burke, Hybrid recommender systems: Surveyand experiments, User Modeling and User-AdaptedInteraction, vol. 12, no. 4, pp. 331-370, Nov. 2002. [3] Kumar R, Verma B K, Rastogi S S. Social Popularity based SVD++ Recommender System [J]. International Journal of Computer Applications, 2014, 87:33-37 [4] Pirasteh P, Jung J J, Hwang D. Item-Based Collaborative Filtering with Attribute Correlation: A Case Study on Movie Recommendation [M]. Intelligent Information and Database Systems. Springer International Publishing, 2014: 245-252. [5] Pitsilis G, Knapskog S J. Social Trust as a solution to address sparsity-inherent problems of Recommender systems [J]. arXiv preprint arXiv,2011,19:332-344. [6] Li Bin. Cross-domain Collaborative Filtering: A Brief Survey [C]. Proceedings of the 23rd International Conference on Tools with Artificial Intelligence. [S. l. ]:IEEE Press,2011:1085-1086. [7] Ning X,Karypis G. Multi-task Learning for Recommender System[C]. Proceedings of the 2nd Asian Conference on Machine Learning. Tokyo,Japan:[s. n. ],2010:269-284. [8] Tsai C F, Hung C. Cluster ensembles in collaborative filtering recommendation[J]. Applied Soft Computing, 2012, 12(4): 1417-1425 [9] Zhang Juan, Hu Xuegang, Zhang Yuhong, et al. An efficient ensemble method for classifying skewed data streams[C]. Proceedings of the 7th International Conference on Intelligent Computing:Bio-inspired Computing and Applications. 2011: 144-151. [10] Li Bin,Yang Qiang,Xue Xiangyang. Transfer Learning for [17] Collaborative Filtering via a Rating-matrix Generative Model[C] . Proceedings of the 26th Annual International Conference on Machine Learning. Quebec, Canada:[s. n. ],2009:617-624. Singh A P, Gordon G J. Relational Learning Via Collective Matrix Factorization[C]. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [ S. l. ]:ACM Press,2008:650-658. Pan Weike,Evan W X,Lin N,et al. Transfer Learning in Collaborative Filtering for Sparsity Reduction [ C ] . Proceedings of the 24th AAAI Conference on Artificial Intelligence. [S. l. ]:AAAI Press,2010:230-235. Pan Weike,Evan W X,Yang Qiang. Transfer Learning in Collaborative Filtering with Uncertain Ratings[C] . Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Canada: AAAI Press, 2012: 662-668. L. Ungar and D. Foster, “Clustering methods for collaborative filtering,” in Recommender Systems—Papers From the AAAI Workshop, Madison, WI, July 1998. T. Hofmann and J. Puzicha, “Latent class models for collaborative filtering,” in Proc. 17th Int. Joint Conf. Artificial Intelligence, 1999, pp.688–693. J. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. Gordon, andJ. Riedl, “Applying collaborative filtering to usenet news,” Commun. ACM, vol. 40, no. 3, pp. 77–87, Mar. 1997. Paul Resnick, Neophytos Iacovou, Mitesh Suchak,Peter Bergstron, John Riedl. Grouplens. An open Architecture for Collaborative Filtering of Net News[C]. Proceeding of the 1994 ACM conference on Computer supported cooperative wookm, 1994:175-186.
© Copyright 2026 Paperzz