International Journal of Recent Innovation in Engineering and Research
Scientific Journal Impact Factor - 3.605 by SJIF
e-ISSN: 2456-2084

STRUCTURED DATA ANALYTICS OF SOCIAL NETWORK REVIEWS USING SENTIMENT ANALYSIS - SVM ON HADOOP

Rajamohan V.R¹ and Dr. M Krishnamurthy²
¹ PG Scholar, Department of Computer Science and Engineering, KCG College of Technology, Chennai, India
² Professor, Department of Computer Science and Engineering, KCG College of Technology, Chennai, India

Abstract- Millions of users store their information on social media every day. This data is gathered as Big Data and analysed to extract useful information. Sentiment dictionaries contain numerous inaccuracies, so they cannot reliably categorize opinion results. Sentiment-based analysis is the key to categorizing user feedback. The FSM and EEM algorithms used for word processing are not efficient enough for analysing the results. The proposed model implements the SVM algorithm to analyse Big Data collected through a social network. A Twitter-like application is created and the users' tweets are processed. The tweets are the input to the Big Data HDFS system: the data is stored in the DataNodes and the index is maintained in the NameNode. Tweets are clustered and classified based on the extracted keywords. They are analysed using sentiment analysis, and positive and negative tweets are classified using the SVM algorithm. MapReduce is also implemented. The results are used to conclude whether the analysed data is useful or not.

Key Words- Sentiment analysis, opinion poll, social network, SVM algorithm, Big Data, Twitter Analysis.

I. INTRODUCTION

The digital age, also referred to as the information society, is characterized by ever-growing volumes of information.
Driven by the current generation of web applications, nearly limitless connectivity, and an insatiable desire to upload information, particularly among younger generations, the volume of user-generated social media content is growing rapidly and is likely to attract even more people in the future. People using the Web are constantly invited to share their opinions and preferences with the rest of the world, which has led to an explosion of opinionated blogs, reviews of products and services, and comments on tweets, pictures, or anything else. This type of web-based content is increasingly recognized as a source of data that has added value for multiple application domains. People use social media every day to upload new information at any time, covering many kinds of content such as news, status updates, geographic reports, and daily breaking information.

The opinions expressed in various web and media outlets (e.g., blogs, newspapers) are an important yardstick for the success of a product or a government policy. For instance, a product with consistently good reviews is likely to sell well. The general approach to determining the overall orientation (i.e., positive or negative) of a sentence or document is to analyse the orientations of its individual words. Sentiment dictionaries are utilized to facilitate the summarization. There are numerous works that, given a sentiment lexicon, analyse the structure of a sentence or document to infer its orientation, the holder of an opinion, the sentiment of the opinion, etc. Several independent sentiment dictionaries have been created manually or automatically, e.g., General Inquirer (GI), OpinionFinder (OF), Appraisal Lexicon (AL), SentiWordNet (SWN) and Q-WordNet (QW). QW and SWN are lexical resources that classify the synsets (senses) in WordNet according to their polarities. We call them sentiment sense dictionaries (SSD). They consist of words manually annotated with their corresponding polarities.
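
The word-level approach described above can be illustrated with a minimal sketch. The tiny lexicon and the function name `sentence_orientation` are hypothetical and chosen only for illustration; the entries are not taken from GI, OF, AL, SWN or QW.

```python
import re

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
# These entries are illustrative assumptions, not taken from any real dictionary.
TOY_LEXICON = {
    "good": 1, "great": 1, "reliable": 1,
    "bad": -1, "slow": -1, "broken": -1,
}


def sentence_orientation(sentence, lexicon=TOY_LEXICON):
    """Sum the polarities of the known words and map the total to a label."""
    words = re.findall(r"[a-z']+", sentence.lower())
    total = sum(lexicon.get(word, 0) for word in words)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(sentence_orientation("The camera is great and reliable"))
print(sentence_orientation("The battery is bad and the screen is broken"))
```

Because the label depends entirely on the polarity attached to each word, an inaccurate or inconsistent dictionary entry directly changes the orientation assigned to a sentence.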
We have noticed the following problems with the sentiment dictionaries. They exhibit substantial (intra-dictionary) inaccuracies. For example, European has a negative polarity in QW, while intuitively this synset has a neutral polarity. They also have (inter-dictionary) inconsistencies. For example, the adjective fast is positive in AL and negative in OF. These sentiment dictionaries do not address the concept of polarity consistency of words/synsets. We concentrate on the concept of (in)consistency in this paper. This paper defines consistency among the polarities of words/synsets within and across sentiment dictionaries and gives methods to check it.

@IJRIER-All rights Reserved -2017, Volume: 02 Issue: 04 April – 2017 (IJRIER)

We provide two examples to illustrate the problem addressed in this paper. The first example is the verbs deny and disprove, which have positive and negative polarities, respectively, in OF. According to WordNet, both words have a unique sense, which they share: disprove, deny (prove to be false), “The physicist disproved his colleagues’ theories”. Assuming that WordNet has complete information about the two words, it is rather strange that the words have distinct polarities. By manually checking two authoritative English dictionaries, Oxford and Cambridge, we confirmed that the information about deny and disprove in WordNet is the same as that in these dictionaries. So, the problem seems to originate in OF.

The second example is the verbs tantalize and taunt, which have positive and negative polarities, respectively, in OF. They also have a unique sense in WordNet, which they share. Again, there is a contradiction. In this case the Oxford dictionary mentions a sense of bait that is missing from WordNet: “excite the senses or desires of (someone)”. This sense conveys a positive polarity.
Hence, bait conveys a positive sentiment when used with this sense.

In summary, these dictionaries have conflicting information. Their use in sentiment analysis tasks can give conflicting results. Manual checking of sentiment dictionaries for inconsistency is a difficult endeavour. We regard words such as deny and disprove as inconsistent, and we aim to remove such inconsistencies from sentiment dictionaries. Note that an inconsistency found via polarity analysis is not exclusively attributable to one party, i.e., either the sentiment dictionary or WordNet. Instead, as the above examples emphasize, some inconsistencies lie in the sentiment dictionaries, while others lie in WordNet. Hence, a by-product of our polarity consistency analysis is that it can also find likely places where WordNet needs closer attention.

II. RELATED WORK

There is a large body of work on text mining. This paper considers user tweets about recent trends or news, and the analysis gives the count of users who take each side of a topic. The proposed work builds on several existing papers.

Pedro Domingos and Geoff Hulten [8] proposed “Mining High-Speed Data Streams” for mining high-speed data, where large amounts of data are analysed in a short period of time. They also proposed mapping and searching the content to access accurate text data. They use a decision tree to sort and store the data in a database, which makes it easy to retrieve data quickly and to search the text data with the respective search key. The very fast decision tree (VFDT) learner handles attributes poorly and needs a large amount of memory for text mining. The drawback of this approach is that the accuracy of the result depends on the complexity of the concept.

E. Breck, Y. Choi, and C. Cardie [5] proposed “Identifying expressions of opinion in context”, which describes how users express opinions in text data. Extracting information about subjectivity is an area of great interest to a variety of public and private interests.
The authors argue that completing this research successfully requires the same expression-level identification as in factual information extraction. Their method is the first to directly approach the task of extracting these expressions. It identifies whether a user expression is happy or sad, good or bad. It captures only 5% of the user expressions in that research. The authors also aim to build on this expression-level identification towards systems that present the user with a comprehensive view of the opinions expressed in text.

Available Online at: www.ijrier.com

A. L. Maas, R. E. Daly, P. Pham, D. Huang, A. Ng, and C. Potts [7], in “Learning word vectors for sentiment analysis”, describe text mining of a large data set to produce a robust benchmark for movie reviews. That work is similar to this paper but covers only movie reviews, whereas this paper gathers all user tweets and analyses the required text data from the data sets. The authors presented a vector space model that learns word representations capturing semantic and sentiment information. The model’s probabilistic foundation gives a theoretically justified technique for word vector induction as an alternative to the overwhelming number of matrix factorization-based techniques commonly used. The model is parametrized as a log-bilinear model following recent success in using similar techniques for language models (Bengio et al., 2003; Collobert and Weston, 2008; Mnih and Hinton, 2007), and it is related to probabilistic latent topic models (Blei et al., 2003; Steyvers and Griffiths, 2006). The authors parametrize the topical component of the model in a manner that aims to capture word representations instead of latent topics. In their experiments, the method performed better than LDA, which models latent topics directly.
The authors extended the unsupervised model to incorporate sentiment information and showed how this extended model can leverage the abundance of sentiment-labeled texts available online to yield word representations that capture both sentiment and semantic relations. They show the benefit of such representations on two sentiment classification tasks, using existing datasets as well as a larger one that the authors release for future research. These tasks involve relatively simple sentiment information, but the model is highly flexible in this regard; it can be used to characterize a wide variety of annotations, and thus is broadly applicable in the growing areas of sentiment analysis and retrieval.

C. Danescu-Niculescu-Mizil, G. Kossinets, J. Kleinberg, and L. Lee [2], in “How opinions are received by online communities”, examine opinions on amazon.com through questions such as “What did Y think of X?” and also “What did Z think of Y’s opinion of X?”, analysing these datasets in terms of social behaviour. This relates to the current paper, which analyses the text data of social network users. The authors observed that helpfulness evaluations on a site like Amazon.com provide a way to assess how opinions are evaluated by members of an online community at a very large scale. One result shows that helpfulness depends not just on the content of a review, but also on the relation of its score to other scores. This dependence on the score contrasts with a number of theories from sociology and social psychology, but is consistent with a simple and natural model of individual bias in the presence of a mixture of opinion distributions. The authors identify several directions for further research. First, the robustness of their results across independent populations suggests that the phenomenon may be relevant to other settings in which the evaluation of expressed opinions is a key social dynamic.
Variations in the effect (such as the magnitude of deviations above or below the mean) can be used to form hypotheses about differences in the collective behaviour of the underlying populations. Finally, it would also be very interesting to consider social feedback mechanisms that might be capable of modifying the observed effects, and to consider the possible outcomes of such a design problem for systems enabling the expression and dissemination of opinions.

M. Kim and E. Hovy [3], in “Determining the sentiment of opinions”, studied opinion polling on social information. Sentiment recognition is a challenging and difficult part of understanding opinions. The authors plan to extend their work to more difficult cases such as sentences with weak-opinion-bearing words or sentences with multiple opinions about a topic. To improve identification of the holder, they plan to use a parser to associate regions more reliably with holders. They also intend to try other learning techniques, such as decision word lists or SVM. Nonetheless, as their experiments show, encouraging results can be obtained even with relatively simple models and only a small amount of manual seeding effort.

III. PROPOSED SYSTEM

A. TWITTER APPLICATION

This project starts with a Twitter-like application, built using advanced Java technologies such as JSP and Servlets. In the application, a user registers a new login with personal details such as name, age, sex, mail ID, mobile number, user name and password. After registration, the user can log in to a personal account, see other users' tweets, and post new tweets that other users can see. The server stores the user tweets and maintains the chat domain. The admin has full control to create the mapping and start the sentiment analysis.

B. DATABASE

The database stores the user name and password, which are used to verify the user at login. The database also stores the user tweets and retrieves them for all other users.
New tweets are stored in the database. The admin accesses the database, collects the required tweets and data, and controls which tweets are gathered from the database.

C. HADOOP

Hadoop is operated from the command line. It loads its initial information from the lib and jar files in the package, which holds what the Hadoop software needs to run. The Hadoop code holds the SVM algorithm, which counts the positive and negative tweets. First it gathers the required information from the lib and jar files. Then it distributes the data across the NameNode and DataNodes. The NameNode stores metadata about the data held in the DataNodes, and the DataNodes store the actual data. In a multi-node cluster, the NameNode and DataNodes run on different machines. There is only one NameNode in a cluster and many DataNodes, which is why the NameNode is called a single point of failure. There is, however, a Secondary NameNode (SNN), which can run on a different system. It does not act as a NameNode, but stores an image of the primary NameNode at certain checkpoints and is used as a backup to restore the NameNode.

The JobTracker is a master node that creates and runs jobs. The JobTracker, which also runs on the NameNode, allocates jobs to the TaskTrackers, which run on the DataNodes; the TaskTrackers run the tasks and report task status to the JobTracker. The ResourceManager (RM) is responsible for tracking the resources in a cluster and scheduling applications (e.g., MapReduce jobs). Users are free to pick any storage to set up RM restart, but must use the ZooKeeper-based state store to support RM high availability. The ZooKeeper-based state store supports a fencing mechanism to avoid a split-brain situation, in which multiple RMs assume they are active and can edit the state store at the same time.

IV.
SVM ALGORITHM

The SVM algorithm produces the counts of positive and negative user tweets. A Hadoop command submits the tweets to be analysed by the SVM algorithm. The system architecture describes the flow of this paper. It starts with user tweets stored in the database. The server then passes the data to the Hadoop server for sentiment analysis, where Hadoop uses the SVM algorithm to count the positive and negative tweets. The SVM algorithm takes as input the user tweets, the keyword and alternative keyword, and the files that hold the positive and negative words. Using these words, the SVM algorithm separates the user tweets and counts each class. The result is passed to the Hadoop GUI result page for the user to view.

Figure: SYSTEM ARCHITECTURE

The SVM algorithm works on the text mining dataset. It first gathers the dataset from the database, then filters the data by the user's search keyword to minimize the dataset, reducing it to the required tweets. Hadoop then works on this dataset. It separates the data into a positive tweet count and a negative tweet count, using the keywords stored in the set files in the database. Hadoop takes each user tweet and searches it for positive words, counting +1 for each, and for negative keywords, counting -1 for each. If the final word count is a positive integer, the tweet is counted as a positive tweet; otherwise it is counted as a negative tweet.

IV. RESULT AND ANALYSIS

A sample result is given for the search keyword “KCG COLLEGE” and alternative keyword “KCTECH”: the GUI result page shows a positive tweet count of 1022, a negative tweet count of 213, and 51827 tweets overall in the database. The graph below shows the results of different analyses with different tweet counts. The first analysis asks whether KCG COLLEGE is good or bad.
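
Counts of this kind follow from the scoring rule described in the SVM ALGORITHM section. The following is a minimal sketch of that rule, written as plain Python map and reduce steps rather than as an actual Hadoop job. The word lists, sample tweets and function names are hypothetical, and the sketch implements the keyword-count rule as the text describes it, not a trained support vector machine.

```python
from collections import Counter

# Hypothetical word lists standing in for the "set files" mentioned in the text.
POSITIVE_WORDS = {"good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "worst"}


def matches_query(tweet, keyword, alt_keyword):
    """Keep only tweets that mention the search keyword or its alternative."""
    text = tweet.lower()
    return keyword.lower() in text or alt_keyword.lower() in text


def map_tweet(tweet):
    """Map step: score +1 per positive word, -1 per negative word, emit (label, 1)."""
    score = 0
    for word in tweet.lower().split():
        word = word.strip(".,!?\"'")
        if word in POSITIVE_WORDS:
            score += 1
        elif word in NEGATIVE_WORDS:
            score -= 1
    # As in the text: a positive total is a positive tweet, anything else is negative.
    label = "positive" if score > 0 else "negative"
    return (label, 1)


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per label."""
    totals = Counter()
    for label, count in pairs:
        totals[label] += count
    return dict(totals)


def analyse(tweets, keyword, alt_keyword):
    """Filter by keyword, then run the map and reduce steps."""
    selected = [t for t in tweets if matches_query(t, keyword, alt_keyword)]
    return reduce_counts(map_tweet(t) for t in selected)


sample_tweets = [
    "KCG COLLEGE has great labs and good faculty",
    "KCTECH canteen is bad, worst food",
    "KCG COLLEGE is good but parking is bad",
    "Unrelated tweet about the weather",
]
print(analyse(sample_tweets, "KCG COLLEGE", "KCTECH"))
```

Under this rule a tweet whose positive and negative words cancel out falls into the negative class, which is worth keeping in mind when reading the reported counts.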
For this query, 1022 users tweeted positively and 213 users tweeted negatively. The next analysis concerns the Sunandha murder case: 422 users tweeted that it was murder and 22 users tweeted that it was suicide. The next analysis concerns the Cauvery river water issue: 1088 users supported the need for river water and 248 users opposed giving the river water to Tamil Nadu. The final analysis in this graph concerns the farmer issue: 1180 users supported providing farmers with what they need and 210 users did not. In this way, the paper analyses many issues and recent trends through user tweets.

Figure: GRAPH FOR DIFFERENT ANALYSIS RESULT

V. CONCLUSION AND FUTURE WORK

This paper addresses the problem of checking polarity consistency for sentiment word dictionaries. This project proves that this problem is NP-complete and shows that, in practice, polarity inconsistencies of words both within a dictionary and across dictionaries can be obtained using SAT solvers. Sets of inconsistent words are pinpointed, which allows the dictionaries to be improved. This project gives the result of analysed tweets that hold positive and negative reviews about any topic. The server holds the final output of the analysis, and the administrator then decides whether the final result is a positive or a negative opinion. The system uses MapReduce and delivers the result to the user and the admin. The SVM method is used to perform the reduce function through the category and sub-category given by the administrator.

Future work will gather data sets from different social media applications, and more data sets overall, to analyse user opinions. Bayesian data analysis is intended to improve the parameters used to select the data required for analysis. It observes the data format and groups similar data into a collective database from which the relevant data can be fetched. This helps sort the relevant data easily for MapReduce.
Bayesian data analysis and the SVM algorithm provide the required information to the server or admin to proceed further. Hadoop then gives the count of the user tweets to the administrator.

REFERENCES

[1] B. Pang and L. Lee, “A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts,” in Proc. 42nd Annu. Meeting Assoc. Comput. Linguistics, 2004, pp. 271–278.
[2] C. Danescu-Niculescu-Mizil, G. Kossinets, J. Kleinberg, and L. Lee, “How opinions are received by online communities: A case study on amazon.com helpfulness votes,” in Proc. 18th Int. Conf. World Wide Web, 2009, pp. 141–150.
[3] M. Kim and E. Hovy, “Determining the sentiment of opinions,” in Proc. 20th Int. Conf. Comput. Linguistics, 2004, pp. 1367–1373.
[4] H. Takamura, T. Inui, and M. Okumura, “Extracting semantic orientations of words using spin model,” in Proc. 43rd Annu. Meeting Assoc. Comput. Linguistics, 2005, pp. 133–140.
[5] E. Breck, Y. Choi, and C. Cardie, “Identifying expressions of opinion in context,” in Proc. 20th Int. Joint Conf. Artif. Intell., 2007, pp. 2683–2688.
[6] X. Ding and B. Liu, “Resolving object and attribute coreference in opinion mining,” in Proc. 23rd Int. Conf. Comput. Linguistics, 2010, pp. 268–276.
[7] A. L. Maas, R. E. Daly, P. Pham, D. Huang, A. Ng, and C. Potts, “Learning word vectors for sentiment analysis,” in Proc. 49th Annu. Meeting Assoc. Comput. Linguistics, 2011, pp. 142–150.
[8] P. Domingos and G. Hulten, “Mining High-Speed Data Streams,” Sixth ACM SIGKDD International Conference, 2000.
[9] K. Schouten and F. Frasincar, “Survey on Aspect-Level Sentiment Analysis,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, March 2016.
[10] R. Chen and K. Sivakumar, “Collective Mining of Bayesian Networks from Distributed Heterogeneous Data,” Knowledge and Information Systems, March 2014, vol. 6, no. 2, pp. 164–187.
[11] L. Jia, C.
Yu, and W. Meng, “The Effect of Negation on Sentiment Analysis and Retrieval Effectiveness,” in Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM 2009). ACM, 2009, pp. 1827–1830.
[12] M. Thelwall, K. Buckley, and G. Paltoglou, “Sentiment Strength Detection for the Social Web,” Journal of the American Society for Information Science and Technology, vol. 63, no. 1, pp. 163–173, 2012.
[13] R. Narayanan, B. Liu, and A. Choudhary, “Sentiment Analysis of Conditional Sentences,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2009). ACL, 2009, pp. 180–189.
[14] H. Lakkaraju, C. Bhattacharyya, I. Bhattacharya, and S. Merugu, “Exploiting Coherence for the Simultaneous Discovery of Latent Facets and Associated Sentiments,” in SIAM International Conference on Data Mining 2011 (SDM 2011). SIAM, 2011, pp. 498–509.