Why Semantic Analysis is Better than Sentiment Analysis

A White Paper by T.R. Fitz-Gibbon, Chief Scientist, Networked Insights
“I like it,” “I don’t like it” or “I have no opinion” –
sentiment is widely used to measure how customers view
a company’s products and services. After all, who doesn’t
want to be liked?
But does sentiment tell you what you really need to know?
Sometimes it does. For example, you may want to understand
what people are saying that could affect your brand image,
or you may be interested in how your product fares in a
straight-up comparison with a competitor’s.
Other times, though, sentiment may not provide the
insights you’re after. This can be especially true when
you’re trying to wade through the huge numbers of
mentions and comments appearing in the social media
world. A promising alternative to sentiment analysis is
“semantic analysis.”
Don’t be turned off by the name. Simply put, semantic
analysis is a way to distill and create structure around
mountains of unstructured data – blog posts, social
network chatter, tweets and more – without preconceived
ideas of whether or how they are related.
Semantic analysis refers to a group of methods that allow
machines to discover the fundamental patterns of words
or phrases that act as building blocks in a large set of text.
Topics, themes, sentiment and similar elements of meaning appear as intricate weavings of those fundamental
patterns. In fact, a valuable type of semantic analysis
is topic discovery: the summarization of large amounts
of text by automatically discovering the topics and
themes within.
Networked Insights’ new Topic Discovery Engine (TDE)
is a semantic analysis system finely tuned to discover
topics in social media posts.
networkedinsights.com 608.237.1867 [email protected]
© Networked Insights, Inc. 2011, all rights reserved
By grouping social media posts based on semantic
similarity, rather than preset sentiment categories such as
positive, negative and neutral, TDE can help you uncover
important information – for example, what exactly people
are saying about your product or service; where and how
they use it; the features they use most; and the enhancements or new offerings they’re interested in. All of this
information can ultimately drive product development,
new revenue streams, and strategies for marketing,
advertising and media planning.
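The idea of grouping posts by semantic similarity, rather than by preset sentiment categories, can be illustrated with a minimal Python sketch. This is not TDE's algorithm, which this paper does not describe. It assumes a simple bag-of-words representation, cosine similarity, a greedy single-pass grouping and a hypothetical similarity threshold of 0.3. The function names, stopword list and sample posts are all invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a far larger one.
STOPWORDS = {"the", "a", "an", "my", "is", "it", "to", "on", "of",
             "and", "i", "for", "how", "do", "does", "in", "with"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric words that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def group_posts(posts, threshold=0.3):
    """Greedy grouping: each post joins the most similar existing group,
    or starts a new group if no group reaches the threshold."""
    groups = []  # each group: {"centroid": Counter, "posts": [str, ...]}
    for post in posts:
        vec = Counter(tokenize(post))
        best, best_sim = None, threshold
        for group in groups:
            sim = cosine(vec, group["centroid"])
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append({"centroid": Counter(vec), "posts": [post]})
        else:
            best["centroid"].update(vec)  # accumulate word counts
            best["posts"].append(post)
    return groups


def top_terms(group, n=3):
    """Summarize a group by its most frequent words."""
    return [word for word, _ in group["centroid"].most_common(n)]


# Invented sample posts; none of them expresses clear sentiment.
sample_posts = [
    "How do I sync photos on my tablet?",
    "Battery life on this phone is short",
    "Can the tablet sync photos over wifi?",
    "Phone battery drains fast",
]
for group in group_posts(sample_posts):
    print(top_terms(group), len(group["posts"]))
```

Note that every post ends up in some group, including questions that a positive-negative classifier would label neutral, which is the point the paragraph above makes.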
Why sentiment falls short
One problem with sentiment analysis is what it cannot
tell you, because it considers only a small share of the
available data. Our experience shows that, on average, only
about 10 percent of posts actually contain sentiment, either
positive or negative, and that’s a generous estimate
(Figure 1). This means nine out of 10 posts are neutral,
revealing no sentiment, and are effectively ignored
by the analysis. Thus, with sentiment analysis you’re
making decisions based on what only 10 percent of the
posts are saying.
[Figure 1: Percentage of posts that contain sentiment]
The 90 percent of posts that do not reveal sentiment
are not all irrelevant; they just don’t fall cleanly into the
restrictive positive-negative view of semantics or meaning that sentiment analysis adheres to. For example, many
posts about a particular smartphone may come from
dedicated, loyal fans who simply have questions about
using the device. These are potentially valuable posts as
they indicate what users want from the device, problems
they may be having with it, and features that could be
improved. However, customer questions such as these are
rarely classified as positive or negative, so they would be
missed by sentiment analysis.
A second problem with sentiment analysis deals with
statistical confidence in data. All methods of sentiment
analysis rely on example data to design, test or validate
the analysis. The accuracy and value of sentiment analysis
depend directly on the quality of, and confidence in, the
example data.
Figure 1 (categories: Positive, Negative, None, Unknown).
Data is based on a 500-post sentiment study we conducted.
Each post was classified by 20 people and assigned to a
sentiment category based on a majority vote. Only about
10% of posts were found to contain sentiment.
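The majority-vote labeling described in the Figure 1 caption can be sketched in a few lines of Python. This is an illustrative reconstruction, not the study's actual procedure. The function names are hypothetical, and treating "Unknown" as "no label won more than half of the votes" is an assumption, since the paper does not define that category.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by more than half of the readers.

    Assumption: a post with no strict majority is labeled "unknown".
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else "unknown"


def share_with_sentiment(votes_per_post):
    """Fraction of posts whose majority label is positive or negative."""
    labels = [majority_label(votes) for votes in votes_per_post]
    with_sentiment = sum(1 for label in labels
                         if label in ("positive", "negative"))
    return with_sentiment / len(labels)
```

Applied to a study like the one in Figure 1, `votes_per_post` would hold 500 lists of 20 votes each, and the paper reports that the resulting share was about 10 percent.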
Because sentiment is subjective, this example data is based
on majority opinion rather than truth. For practical
reasons, we cannot determine the majority opinion of all
readers for each post. Instead, the example data is obtained
from a small sample of human readers labeling posts with
the type of sentiment they contain (for example: positive,
negative or neutral).
Many companies report that, on average, approximately
65 to 75 percent of readers agree on the sentiment of a
post. Assuming one of these companies asks four people
about the sentiment of each post, which is very likely,
statistics tells us that the company is no more than 35 percent
confident it actually has a positive post when its readers
identify one. The accompanying graph demonstrates this fact.
Data with such low confidence is a poor foundation for
sentiment analysis and largely leaves it up to chance – ask a
different set of four readers or use a different set of posts,
and results could be drastically different.
[Figure: Confidence intervals for a sample size of four readers. Axis label: Percent agreement. Marked values: 95% and 35%.]
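One way to see why four readers give weak evidence is an exact binomial calculation, sketched below in Python. This is not the calculation behind the paper's 35 percent figure, which is not spelled out, and its result is not expected to match that figure. It assumes readers vote independently and asks a simpler question: if the wider population were split 50/50 on a post, how often would at least three of four sampled readers still call it positive? The function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


# Four readers, three or more say "positive", true split is 50/50.
four_readers = prob_at_least(3, 4, 0.5)
print(four_readers)  # 0.3125

# Same 75 percent agreement, but with 20 readers (15 or more of 20).
twenty_readers = prob_at_least(15, 20, 0.5)
print(round(twenty_readers, 4))  # 0.0207
```

Under these assumptions, a three-of-four result arises by chance almost a third of the time, far short of the conventional 95% standard, while the same level of agreement among 20 readers would be much harder to explain by chance.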
Sentiment analysis is not inherently bad; for particular
types of questions, it may be the right tool. But if you use
it, make sure the data underlying the analysis is sound and
valuable data is not being ignored.
Semantic analysis gives you much more
If you really want to discover and understand the
conversations around your company, products, services
and brand, you need to be open to what all of the data tells
you. Semantic analysis is a better way to do that than
sentiment analysis for several reasons.
In contrast to sentiment analysis, semantic analysis can
take every post from a data set into account and can even
identify clear trends within groups of posts.
Figure caption (horizontal axis: sentiment of a post):
When three out of four readers agree on the sentiment of a
post, 35% is the highest confidence level that ensures a
majority of readers would consider a post positive.
Normally, statistical significance at the 95% level is
desired (for research and opinion polls). Most sentiment
data only achieves statistical significance at the 35%
level. Thus, most sentiment data is not statistically
significant (at the 95% level).
It’s not limited to a positive-negative framework and
doesn’t exclude neutral posts, unlike sentiment analysis
in the smartphone example previously discussed. In this
way, semantic analysis gives you clear insights into what’s
happening in the aggregate across a large number of posts
without your having to read all of them, an inefficient
or impossible task. In short, semantic analysis can find
any trend in the data as long as it exists in significant
enough numbers.
Another important advantage of semantic analysis is
that it isn’t restricted by a narrow view of meaning or
semantics. Sentiment, after all, is semantics: “What is
the author trying to communicate in this post?” But
people rarely post to a social network with the intent of
simply expressing that they either like or dislike a product,
company or idea; most forms of meaning are more
complex and varied. Semantic analysis reveals the
meaning or topics that sentiment analysis ignores.
A final advantage of semantic analysis is unique to
Networked Insights. Our TDE uses an advanced form
of semantic analysis to produce “topic trees” – it organizes
the topics it discovers into a tree-like structure, allowing
you to drill into a topic to see the subtopics within it.
A tree structure is highly effective for organizing large
amounts of data. It makes the process of finding valuable
insights, quite literally, exponentially faster than having to
search a flat set of topics.
[Figure: Networked Insights’ “topic tree” using semantic analysis. Node labels: iPad 2; Android Tablet; iPhone 4; dual core; buy an iPad; retina display; next gen iPads; price drop; Verizon leak; Motorola Xoom; PlayBook, RIM; Android; guess iPad 2 specs; Android, Google; HTC Flyer, tablet; Android Honeycomb. The size of the node represents volume of conversation.]
In the end, it’s about you and what you’re looking for
Ultimately, you are the best judge of information about
your company. You understand your domain best, which
topics are important and which are not. At the same time,
it’s important to inject subjectivity into the process as late
as possible to avoid biasing the analytic results.
Semantic analysis with TDE considers these factors.
Rather than having a machine or human readers judge
the subjective sentiment of every post and then aggregate
some output, TDE groups similar posts and summarizes
the topics. Then, at the last stage, you or another qualified
professional can examine the output and decide which
topics are relevant, which are not and what they mean in
the given context.
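The topic-tree idea described earlier, combined with the late-stage human review described above, can be sketched as a small Python data structure. This is an assumption-laden illustration, not TDE's implementation. The class and method names are hypothetical. The node labels come from the topic-tree figure, but the parent-child assignments and the post volumes are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    label: str
    volume: int = 0  # posts assigned directly to this topic
    children: list = field(default_factory=list)

    def total_volume(self):
        """Posts in this topic plus all of its subtopics."""
        return self.volume + sum(c.total_volume() for c in self.children)

    def drill(self, *path):
        """Follow a sequence of child labels down from this node."""
        node = self
        for label in path:
            matches = [c for c in node.children if c.label == label]
            if not matches:
                raise KeyError(label)
            node = matches[0]
        return node


# Hypothetical tree: structure and volumes are made up for illustration.
root = TopicNode("all posts", children=[
    TopicNode("iPad 2", 10, [
        TopicNode("retina display", 40),
        TopicNode("price drop", 25),
    ]),
    TopicNode("Android Tablet", 5, [
        TopicNode("Motorola Xoom", 30),
        TopicNode("Android Honeycomb", 20),
    ]),
])

# A reviewer scans the top-level topics first, then drills into one.
for topic in root.children:
    print(topic.label, topic.total_volume())
print(root.drill("iPad 2", "price drop").volume)
```

The design choice this illustrates is that the reviewer inspects only the children of the node currently of interest, instead of scanning every topic in one flat list, and applies judgment about relevance only at that final step.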
A tool for these times
Social media information is expanding at a challenging
pace, and valuable nuggets can come from the most
unexpected places. Semantic analysis with TDE can help
you harness and make sense of it all. Most exciting,
automatic topic discovery with TDE gives you tremendous
latitude around how you approach the analysis. You don’t
have to be certain about what you’re looking for.
Instead, it’s a journey to discovery, not a set path that may
lead to inadequate insights or misleading conclusions. With
TDE’s semantic analysis, you can cost-effectively learn
volumes about how your company and your products and
services are being judged in the marketplace – so much
that you’ll have little time to be sentimental.
We love the challenge of finding insights in all this
data – our challenge is your success!
Networked Insights was founded in 2006 by industry leaders and seasoned
entrepreneurs in the fields of social media and customer intelligence. Headquarters
are in Madison, WI, with offices in New York and Chicago.
T.R. Fitz-Gibbon is the chief scientist at Networked Insights. His team designs
the Natural Language Processing and Artificial Intelligence algorithms that power
the company’s software. His background is in electrical engineering, computer
engineering, and computer science with a focus on machine learning. T.R.’s passion
lies in using machine learning and big-data techniques to find great solutions to
problems that are too large and complex to have perfect solutions.