Why Semantic Analysis is Better than Sentiment Analysis
A White Paper by T.R. Fitz-Gibbon, Chief Scientist, Networked Insights

“I like it,” “I don’t like it” or “I have no opinion” – sentiment is widely used to measure how customers view a company’s products and services. After all, who doesn’t want to be liked?

But does sentiment tell you what you really need to know? Sometimes it does: for example, when you want to understand what people are saying that could affect your brand image. Or you may be interested in how your product fares in a straight-up comparison with a competitor’s. Other times, though, sentiment may not provide the insights you’re after. This can be especially true when you’re trying to wade through the huge numbers of mentions and comments appearing in the social media world.

A promising alternative to sentiment analysis is “semantic analysis.” Don’t be turned off by the name. Simply put, semantic analysis is a way to distill and create structure around mountains of unstructured data – blog posts, social network chatter, tweets and more – without preconceived ideas of whether or how they are related. Networked Insights’ new Topic Discovery Engine (TDE) is a semantic analysis system finely tuned to discover topics in social media posts.

Semantic analysis refers to a group of methods that allow machines to discover the fundamental patterns of words or phrases that act as building blocks in a large set of text. Topics, themes, sentiment and similar elements of meaning appear as intricate weavings of those fundamental patterns. In fact, a valuable type of semantic analysis is topic discovery: the summarization of large amounts of text by automatically discovering the topics and themes within.
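To make the idea of topic discovery concrete, here is a minimal sketch that groups posts by the words they share. It is not TDE’s algorithm, which this paper does not describe; it assumes simple word counts and cosine similarity, and the function names, stopword list, threshold and sample posts are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "is", "to", "my", "i", "on", "for", "of", "and", "it"}

def bag_of_words(post):
    """Lowercase, tokenize, and count the non-stopword terms in a post."""
    tokens = re.findall(r"[a-z0-9]+", post.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def group_posts(posts, threshold=0.3):
    """Greedy single-pass grouping: a post joins the first group whose
    accumulated term counts are similar enough, else it starts a new group."""
    groups = []  # each group: {"terms": Counter, "posts": [str]}
    for post in posts:
        vec = bag_of_words(post)
        for group in groups:
            if cosine(vec, group["terms"]) >= threshold:
                group["terms"].update(vec)
                group["posts"].append(post)
                break
        else:
            groups.append({"terms": Counter(vec), "posts": [post]})
    return groups

def top_terms(group, n=3):
    """Summarize a group by its most frequent terms."""
    return [term for term, _ in group["terms"].most_common(n)]

# Hypothetical posts; none carries obvious positive or negative sentiment.
posts = [
    "battery drains fast on my tablet",
    "tablet battery drains overnight",
    "price drop rumor for next gen phone",
    "next gen phone price leak",
]
groups = group_posts(posts)
for group in groups:
    print(top_terms(group), len(group["posts"]))
```

No sentiment label is ever assigned here: every post lands in some group, and each group is summarized by its most frequent terms, which a person can then interpret.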
networkedinsights.com 608.237.1867 [email protected] © Networked Insights, Inc. 2011, all rights reserved

By grouping social media posts based on semantic similarity, rather than preset sentiment categories such as positive, negative and neutral, TDE can help you uncover important information: for example, what exactly people are saying about your product or service; where and how they use it; the features they use most; and the enhancements or new offerings they’re interested in. All of this information can ultimately drive product development, new revenue streams, and strategies for marketing, advertising and media planning.

Why sentiment falls short

One problem with sentiment analysis is what it cannot tell you, because it considers only a small amount of the available data. Our experience shows that, on average, only about 10 percent of posts actually contain sentiment, either positive or negative, and that’s a generous estimate (Figure 1). This means nine out of 10 posts are neutral, revealing no sentiment, and are effectively ignored by the analysis. Thus, with sentiment analysis you’re making decisions based on what only 10 percent of the posts are saying.

Figure 1. Percentage of posts that contain sentiment (categories: Positive, Negative, None, Unknown). Data is based on a 500-post sentiment study we conducted. The posts were classified by 20 people each. Posts were assigned to a sentiment category based on a majority vote. Only about 10% of posts were found to contain sentiment.

The 90 percent of posts that do not reveal sentiment are not all irrelevant; they just don’t fall cleanly into the restrictive positive-negative view of semantics or meaning that sentiment analysis adheres to. For example, many posts about a particular smartphone may come from dedicated, loyal fans who simply have questions about using the device. These are potentially valuable posts, as they indicate what users want from the device, problems they may be having with it, and features that could be improved. However, customer questions such as these are rarely classified as positive or negative, so they would be missed by sentiment analysis.

A second problem with sentiment analysis deals with statistical confidence in data. All methods of sentiment analysis rely on example data to design, test or validate the analysis. The accuracy and value of sentiment analysis depend directly on the quality or confidence of the example data.

Because sentiment is subjective, this example data is based on majority opinion rather than truth. For practical reasons, we cannot determine the majority opinion of all readers for each post. Instead, the example data is obtained from a small sample of human readers labeling posts with the type of sentiment they contain (for example: positive, negative or neutral).

Many companies report that, on average, approximately 65 to 75 percent of readers agree on the sentiment of a post. Assuming one of these companies asks four people about the sentiment of each post, which is very likely, statistics tells us that the company is no more than 35 percent confident it actually has a positive post when its readers identify one. The accompanying graph demonstrates this fact. Data with such low confidence is a poor foundation for sentiment analysis and largely leaves it up to chance: ask a different set of four readers or use a different set of posts, and results could be drastically different.

[Graph: Confidence intervals for a sample size of four readers. Axis: percent agreement; marked levels: 95% and 35%.]

Sentiment analysis is not inherently bad; for particular types of questions, it may be the right tool. But if you use it, make sure the data underlying the analysis is sound and that valuable data is not being ignored.
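The labeling and confidence argument above can be illustrated with a short sketch. It makes two assumptions that do not come from the paper: a post with no majority label is marked "Unknown", and the interval is a Wilson score interval at roughly the 95 percent level. The function names and sample votes are hypothetical.

```python
import math
from collections import Counter

def majority_label(votes):
    """Return the label chosen by more than half of the readers,
    or "Unknown" when no label reaches a majority (an assumption)."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else "Unknown"

def wilson_interval(agree, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 is roughly 95%."""
    p = agree / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical votes from four readers on a single post.
votes = ["Positive", "Positive", "Positive", "None"]
label = majority_label(votes)

# 75 percent agreement with 4 readers versus the same rate with 20 readers.
low4, high4 = wilson_interval(3, 4)
low20, high20 = wilson_interval(15, 20)

print(label)
print("4 readers:", round(low4, 2), round(high4, 2))
print("20 readers:", round(low20, 2), round(high20, 2))
```

Under this approximation, three agreeing readers out of four give an interval that still includes 50 percent, so a majority among all readers is not established at the 95 percent level. The same 75 percent agreement among 20 readers does clear that bar. The sketch does not try to reproduce the 35 percent figure, whose calculation the paper does not spell out.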
Graph caption (horizontal axis: sentiment of a post): When three out of four readers agree on the sentiment of a post, 35% is the highest confidence level at which a majority of readers can be expected to consider the post positive. Normally, statistical significance at the 95% level is desired (for research and opinion polls). Most sentiment data only achieves statistical significance at the 35% level. Thus, most sentiment data is not statistically significant (at the 95% level).

Semantic analysis gives you much more

If you really want to discover and understand the conversations around your company, products, services and brand, you need to be open to what all of the data tells you. Semantic analysis is a better way to do that than sentiment analysis for several reasons.

In contrast to sentiment analysis, semantic analysis can take every post from a data set into account and can even identify clear trends within groups of posts. It’s not limited to a positive-negative framework and doesn’t exclude neutral posts, unlike sentiment analysis in the smartphone example previously discussed. In this way, semantic analysis gives you clear insights into what’s happening in the aggregate across a large number of posts without your having to read all of them, an inefficient or impossible task. In short, semantic analysis can find any trend in the data as long as it exists in significant enough numbers.

Another important advantage of semantic analysis is that it isn’t restricted by a narrow view of meaning or semantics. Sentiment, after all, is semantics: “What is the author trying to communicate in this post?” But people rarely post to a social network with the intent of simply expressing that they either like or dislike a product, company or idea; most forms of meaning are more complex and varied. Semantic analysis reveals the meaning or topics that sentiment analysis ignores.

A final advantage of semantic analysis is unique to Networked Insights. Our TDE uses an advanced form of semantic analysis to produce “topic trees” – it organizes the topics it discovers into a tree-like structure, allowing you to drill into a topic to see the subtopics within it. A tree structure is highly effective for organizing large amounts of data. It makes the process of finding valuable insights, quite literally, exponentially faster than having to search a flat set of topics.

[Figure: Networked Insights’ “topic tree” using semantic analysis. The size of the node represents volume of conversation. Node labels include iPad 2, Android Tablet, retina display, price drop, Motorola Xoom and Android Honeycomb.]

In the end, it’s about you and what you’re looking for

Ultimately, you are the best judge of information about your company. You understand your domain best: which topics are important and which are not. At the same time, it’s important to inject subjectivity into the process as late as possible to avoid biasing the analytic results.

Semantic analysis with TDE considers these factors. Rather than having a machine or human readers judge the subjective sentiment of every post and then aggregate some output, TDE groups similar posts and summarizes the topics.
Then, at the last stage, you or another qualified professional can examine the output and decide which topics are relevant, which are not and what they mean in the given context.

A tool for these times

Social media information is expanding at a challenging pace, and valuable nuggets can come from the most unexpected places. Semantic analysis with TDE can help you harness and make sense of it all.

Most exciting, automatic topic discovery with TDE gives you tremendous latitude around how you approach the analysis. You don’t have to be certain about what you’re looking for. Instead, it’s a journey of discovery, not a set path that may lead to inadequate insights or misleading conclusions.

With TDE’s semantic analysis, you can cost-effectively learn volumes about how your company and your products and services are being judged in the marketplace – so much that you’ll have little time to be sentimental.

We love the challenge of finding insights in all this data – our challenge is your success!

Networked Insights was founded in 2006 by industry leaders and seasoned entrepreneurs in the fields of social media and customer intelligence. Headquarters are in Madison, WI, with offices in New York and Chicago.

T.R. Fitz-Gibbon is the chief scientist at Networked Insights. His team designs the Natural Language Processing and Artificial Intelligence algorithms that power the company’s software. His background is in electrical engineering, computer engineering, and computer science with a focus on machine learning. T.R.’s passion lies in using machine learning and big-data techniques to find great solutions to problems that are too large and complex to have perfect solutions.