Computational Sustainability: Papers from the 2015 AAAI Workshop

Toward Social Media Opinion Mining for Sustainability Research

Rundong Du, Zhongming Lu, Arka Pandit, Da Kuang, John Crittenden, Haesun Park
Georgia Institute of Technology, Atlanta, Georgia 30332

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

We propose to introduce social media opinion mining research into the field of computational sustainability. Opinion mining from social media can be a faster and less expensive alternative to the traditional surveys and polls on which much sustainability research is based. We describe a framework for such analysis, examine the challenges it raises and the current status of research on those challenges, and propose possible research directions for tackling them.

Introduction

In the area of urban sustainability and resilience, it is usually important to understand people's attitudes towards certain products, amenities, or designs. For example, urban planners and policy makers may want to know how people choose between a "conventional sprawling community" and a "smart growth neighborhood" (Lu et al. 2014); green product manufacturers may want to understand what makes consumers choose non-sustainable products over their green products, or the other way around (Pickett-Baker and Ozaki 2008; Yam-Tang and Chan 1998); and sustainability researchers and educators may want to know which sustainability-related topics people are discussing and what their attitudes towards those topics are.

Traditionally, this information is obtained by conducting and analyzing surveys. However, conducting surveys is usually expensive: a survey with 1000 respondents usually takes tens of thousands of dollars to run (Braunsberger, Wybenga, and Gates 2007). On the other hand, in today's digital age, hundreds of millions of people share their thoughts and opinions every day on social media such as Twitter, and most social media content can be freely accessed by the public. It would therefore be of great help if we could extract sustainability-related opinions from social media as a faster and less expensive alternative to traditional survey and polling methods. Such analysis also significantly increases the variety of opinions that can be captured, since people talk about all kinds of things on social media.

For these reasons, we propose to bring social media opinion mining research, based on topic modeling and sentiment analysis, into the field of computational sustainability. We present a general framework for mining topics and opinions from social media, then summarize related work and explain why new methods are needed. We use Twitter as the running example in this paper, but the framework also applies to other social media.

Framework for Mining Topics and Opinions from Social Media

For survey and polling methods, the first step is usually to carefully design questions that reflect the information we want to gather. Mining opinions from social media, however, is a rather passive process, which requires a different workflow. Because tweets about sustainability are relatively rare compared to those discussing hot topics, it is better to first learn what kind of information we can actually acquire from social media, so that we can design questions and analysis methods in a more targeted and efficient way. We therefore propose the framework illustrated in Figure 1.

[Figure 1: Illustration of our proposed framework. A pipeline runs from social media (e.g., Twitter) through Step 1, representative data collection (a data collector); Step 2, targeted topic modeling, which produces clusters of related documents; and Step 3, opinion analysis, which produces a meaningful report.]

In this framework, the first step is to collect data from the social media platform. For Twitter, this step is usually done by querying the Twitter API. In our experience, while people do talk about sustainability on Twitter, it is hardly a hot topic. Directly querying Twitter's sample API, which only returns a small random sample of all public statuses, yields very few tweets about sustainability, far from enough for analysis: in a 5-month sample containing 293 million tweets, we found fewer than ten thousand tweets containing the keywords we are interested in. It is therefore better to use the filter API to collect tweets containing a set of predefined keywords.
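To make this collection step concrete, the following minimal sketch streams keyword-matched tweets to a file. It is only an illustration under assumptions: it uses the Tweepy library's streaming interface as it existed in Tweepy 3.x (both Tweepy and Twitter's API access model have changed since), and the keyword list, credential strings, and output file name are placeholders rather than part of our method.

    import json
    import tweepy  # assumes Tweepy 3.x; streaming classes differ in later versions

    # Illustrative, predefined sustainability-related keywords (placeholders).
    KEYWORDS = ["sustainability", "smart growth", "solar panel", "green energy"]

    class KeywordListener(tweepy.StreamListener):
        """Appends every matching tweet to a file, one JSON object per line."""
        def __init__(self, out_path):
            super().__init__()
            self.out = open(out_path, "a")

        def on_status(self, status):
            record = {"id": status.id_str,
                      "created_at": str(status.created_at),
                      "text": status.text}
            self.out.write(json.dumps(record) + "\n")

        def on_error(self, status_code):
            # Returning False disconnects the stream, e.g. on rate limiting (420).
            return status_code != 420

    if __name__ == "__main__":
        # Placeholder credentials; real values come from a Twitter developer account.
        auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
        auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
        stream = tweepy.Stream(auth=auth, listener=KeywordListener("tweets.jsonl"))
        # filter() returns only tweets matching the track keywords, unlike the
        # sample endpoint, which returns a small random sample of all statuses.
        stream.filter(track=KEYWORDS)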
The second step is to discover and clean up topics from the collected data using a topic model (e.g., Blei, Ng, and Jordan 2003). The collected data may concern a specific product or a general topic. In the case of a specific product, such as solar panels, topic discovery can help manufacturers understand how different attributes of their products affect consumers' choices. In the case of a general topic, such as distributed energy, topic discovery is essential for researchers to get a more concrete view of the thoughts people express on social media. However, traditional topic models are good at discovering the salient topics, or major themes, underlying a text collection, and the specific sustainability-related keywords we are targeting might not be revealed in the discovered topics. We therefore propose targeted topic models that aim at finding topics related to one's needs and producing higher-quality document clusters, so that related documents are separated from noise.
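To illustrate what a standard, untargeted topic model produces at this step, here is a minimal sketch using TF-IDF and nonnegative matrix factorization (NMF) from scikit-learn. This is the kind of baseline discussed above, not the targeted topic model we propose; the helper function name and parameter values are illustrative assumptions.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import NMF

    def discover_topics(tweets, n_topics=10, n_top_terms=8):
        """Baseline NMF topic discovery over a list of tweet texts (illustrative)."""
        # Term weighting; the document-frequency thresholds are illustrative choices.
        vectorizer = TfidfVectorizer(max_df=0.5, min_df=5, stop_words="english")
        X = vectorizer.fit_transform(tweets)        # documents x terms
        model = NMF(n_components=n_topics, init="nndsvd", random_state=0)
        W = model.fit_transform(X)                  # document-topic weights
        H = model.components_                       # topic-term weights
        terms = vectorizer.get_feature_names_out()  # needs a recent scikit-learn
        topics = [[terms[i] for i in row.argsort()[::-1][:n_top_terms]] for row in H]
        clusters = W.argmax(axis=1)                 # hard cluster label per tweet
        return topics, clusters

Inspecting the top terms of each topic and the resulting clusters reveals whether the sustainability-related themes we are targeting were actually found, which is exactly where targeted topic models are needed.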
After the second step, documents are clustered into the discovered topics, and we obtain a good picture of the potential topics for which we can design questions. The last step is then to apply sentiment/opinion analysis to the clustered documents. This helps product manufacturers learn which attributes of their products the public likes or dislikes, and helps researchers learn which green technologies are being adopted more widely by the public. To answer specific survey problems, one may need to design specialized methods to analyze the data.

Challenges, Related Work and Future Directions

The idea of using Twitter as a polling mechanism is not new. O'Connor et al. (2010) revealed the potential of using text streams as a substitute or supplement for traditional polling by showing correlations between sentiment word frequencies in contemporaneous Twitter messages and survey results on consumer confidence and political opinions. They then proposed the more general goal of developing "query-driven sentiment analysis where one can ask more varied questions of what people are thinking based on text they are already writing," which is very challenging and requires further research. Their illuminating work also suggests that more advanced NLP techniques may enable more accurate estimation of public opinion. They referred to such NLP techniques as "message retrieval" (identifying messages related to the topic) and "opinion estimation" (determining the sentiment of these messages); these correspond to the two fundamental tasks in our framework, targeted topic discovery and sentiment analysis.

While much research has been done on topic modeling and sentiment analysis, new methods still need to be developed for social media content. For topic discovery, researchers have built statistical models (Blei, Ng, and Jordan 2003; Hofmann 1999) and developed fast, large-scale algorithms (Kim and Park 2011; Kuang and Park 2013). However, those methods do not usually perform well on Twitter data because of the restricted length of Twitter messages, and they are not able to find high-quality targeted topics. Traditional sentiment analysis methods have similar issues on Twitter data (Kouloumpis, Wilson, and Moore 2011). Sentiment analysis already poses many open challenges, and features of Twitter messages such as restricted length, casual language, the mixed use of symbols (like hashtags) and words, and frequent grammar and spelling errors make such analysis even harder. Some efforts have been made to address these issues (Ramage, Dumais, and Liebling 2010; Hong and Davison 2010; Agarwal et al. 2011; Go, Bhayani, and Huang 2009), but building robust and trustworthy topic models and sentiment analysis methods for such data remains an open challenge.

A series of research efforts may be needed to tackle these challenges in query-driven analysis, targeted topic discovery, and sentiment analysis. Although solving some of these problems perfectly may be as hard as achieving human-level language understanding, much can still be done to develop approximation algorithms or alternative methods that are good enough for practical use, as in many other AI areas. For query-driven analysis, we can investigate different types of survey questions and establish a general framework for converting survey problems into data analysis tasks; we can also identify and exploit the types of survey questions that can be answered without very accurate sentiment analysis. For targeted topic discovery, we are currently working on better pre-processing techniques and encoding methods for Twitter text, which may make existing topic models more accurate on Twitter data; we are also looking for an unsupervised measure of topic/cluster quality to address the problem of evaluating targeted topic models without labeled data. For sentiment analysis, which is harder, we can try to develop aggregate opinion mining methods that do not require highly accurate sentiment analysis of individual messages (O'Connor et al. 2010).
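As a crude but concrete illustration of such aggregate opinion estimation, the sketch below counts, within each topic cluster, how many messages contain positive or negative lexicon words, in the spirit of the sentiment ratios used by O'Connor et al. (2010). The tiny word lists and the (cluster_id, text) input format are assumptions made for illustration; a real system would use a full sentiment lexicon or classifier.

    from collections import defaultdict

    # Tiny placeholder lexicons; a real system would use a full sentiment lexicon.
    POSITIVE = {"good", "great", "love", "efficient", "clean", "affordable"}
    NEGATIVE = {"bad", "expensive", "ugly", "unreliable", "hate", "waste"}

    def aggregate_sentiment(labeled_tweets):
        """Estimate an aggregate sentiment ratio per topic cluster.

        `labeled_tweets` is an iterable of (cluster_id, text) pairs, e.g. the
        output of the topic-discovery step. Only message-level word matches are
        counted, so no individual tweet needs to be classified precisely.
        """
        pos = defaultdict(int)
        neg = defaultdict(int)
        for cluster_id, text in labeled_tweets:
            words = set(text.lower().split())
            if words & POSITIVE:
                pos[cluster_id] += 1
            if words & NEGATIVE:
                neg[cluster_id] += 1
        # Ratio of positive to negative message counts (add-one smoothing).
        return {c: (pos[c] + 1) / (neg[c] + 1) for c in set(pos) | set(neg)}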
Conclusion

Opinion mining based on topic modeling and sentiment analysis is a promising alternative to surveys and polling for sustainability researchers, urban planners, and green product manufacturers. Much research remains to be done in this area, which can not only enrich the field of computational sustainability but also motivate new research in natural language processing that is applicable to many other areas.

References

Agarwal, A.; Xie, B.; Vovsha, I.; Rambow, O.; and Passonneau, R. 2011. Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social Media, 30–38. Association for Computational Linguistics.

Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3:993–1022.

Braunsberger, K.; Wybenga, H.; and Gates, R. 2007. A comparison of reliability between telephone and web-based surveys. Journal of Business Research 60(7):758–764.

Go, A.; Bhayani, R.; and Huang, L. 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1–12.

Hofmann, T. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 50–57. ACM.

Hong, L., and Davison, B. D. 2010. Empirical study of topic modeling in Twitter. In Proceedings of the First Workshop on Social Media Analytics, 80–88. ACM.

Kim, J., and Park, H. 2011. Fast nonnegative matrix factorization: An active-set-like method and comparisons. SIAM Journal on Scientific Computing 33(6):3261–3281.

Kouloumpis, E.; Wilson, T.; and Moore, J. 2011. Twitter sentiment analysis: The good the bad and the OMG! ICWSM 11:538–541.

Kuang, D., and Park, H. 2013. Fast rank-2 nonnegative matrix factorization for hierarchical document clustering. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 739–747. ACM.

Lu, Z.; Southworth, F.; Crittenden, J.; and Dunham-Jones, E. 2014. Market potential for smart growth neighbourhoods in the USA: A latent class analysis on heterogeneous preference and choice. Urban Studies 0042098014550956.

O'Connor, B.; Balasubramanyan, R.; Routledge, B. R.; and Smith, N. A. 2010. From tweets to polls: Linking text sentiment to public opinion time series. ICWSM 11:122–129.

Owoputi, O.; O'Connor, B.; Dyer, C.; Gimpel, K.; Schneider, N.; and Smith, N. A. 2013. Improved part-of-speech tagging for online conversational text with word clusters. In HLT-NAACL, 380–390.

Pang, B., and Lee, L. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1-2):1–135.

Pickett-Baker, J., and Ozaki, R. 2008. Pro-environmental products: Marketing influence on consumer purchase decision. Journal of Consumer Marketing 25(5):281–293.

Ramage, D.; Dumais, S. T.; and Liebling, D. J. 2010. Characterizing microblogs with topic models. ICWSM 10:1–1.

Yam-Tang, E. P., and Chan, R. Y. 1998. Purchasing behaviours and perceptions of environmentally harmful products. Marketing Intelligence & Planning 16(6):356–362.