Social Media Mining: An Introduction

TJTSD66: Advanced Topics in Social Media
(Social Media Mining)
Introduction
Dr. WANG, Shuaiqiang @ CS & IS, JYU
Email: [email protected]
Homepage: http://users.jyu.fi/~swang/
Most of the content is provided by the website http://dmml.asu.edu/smm/
About Me - Experience
• Education
– Aug. 2009 – Oct. 2009, exchange Ph.D. student in CS,
HKBU, Hong Kong, China
– Sep. 2004 – Dec. 2009, Ph.D. in CS, SDU, China
– Sep. 2000 – Jul. 2004, B.Eng. in CS, SDU, China
• Work Experience
– Sep. 2014 – Present, Postdoc Researcher of CS, JYU,
Finland
– Mar. 2011 – Jul. 2014, Assoc. Prof. of CS, SDUFE,
China
– Jan. 2010 – Feb. 2011, Postdoc fellow of CS, TSU, TX,
USA
Social Media Mining
Introduction
Slide 2 of 30
2
About Me - Research
• Research Interests
– Recommender Systems; Information Retrieval; Data
Mining; Machine Learning
• Publications
– 20+ papers, including 6 JUFO-3 papers
– See http://users.jyu.fi/~swang/
• Students
– 2 doctoral students
– 2 master's students
Self-motivated students with good programming
skills and a mathematical background are welcome!
About the Course
• The growth of social media over the last decade
has revolutionized the way individuals interact
and industries conduct business.
• We attempt to deeply understand and process
this data for interdisciplinary research, novel
algorithms, and tool development.
• You will learn the main techniques and skills for
social media mining
– Fundamental concepts, emerging issues, effective
algorithms, and possible applications for social data
mining.
Contents
• Part I Essentials
– Graph Essentials
– Network Measures
– Network Models
– Data Mining Essentials
• Part II Communities and Interactions
– Community Analysis
– Information Diffusion in Social Media
• Part III Applications
– Influence and Homophily
– Recommendation in Social Media
Textbook
Most of my slides come from here!
http://dmml.asu.edu/smm/
Reference for Social Networks
Book: http://www.cs.cornell.edu/home/kleinber/networks-book/
Class: http://www.ymsir.com/networks/
Reference for Data Mining
http://hanj.cs.illinois.edu/bk3/
Assessment
• Assessment criteria
– Individual assignment: 20%
– Group work deliverable and presentation: 30%
– Final exam: 50%
• Final exam: a written exam based on the
course material takes place after the course.
Exam dates are provided in Korppi.
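As a small worked example of how the three assessment weights combine, the Python sketch below computes a weighted total. It assumes that each component is scored on a 0-100 scale, which the slides do not state; the function name `weighted_score` and the sample scores are hypothetical.

```python
# Weights taken from the assessment criteria (20% / 30% / 50%).
WEIGHTS = {"assignment": 0.20, "group_work": 0.30, "exam": 0.50}


def weighted_score(scores):
    """Combine component scores (assumed 0-100 each) into one total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores, for illustration only.
example = {"assignment": 80, "group_work": 90, "exam": 70}
print(round(weighted_score(example), 1))
```

With these made-up scores the total is 16 + 27 + 35 = 78, so the final exam contributes as much as the other two components together.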
Individual assignment
• Choose 4 chapters (or subchapters). There are
two options:
– Technically oriented option. Each student is required to
implement at least 4 social/data mining algorithms. No two
algorithms may come from the same chapter. The
datasets can be either downloaded from the internet or
created artificially on your own. Any programming language
is acceptable.
– Report-oriented option. Each student is required to
write 4 reports on different potential applications of
social/data mining algorithms. Each report should include:
(1) a description and motivation of the application scenario,
(2) a problem formulation (as a social/data
mining problem), (3) possible solutions and algorithms,
(4) expected results and conclusions, and (5) key
literature.
Group work
• The group work continues throughout the duration of
the course. There are again two options:
– Technically oriented option. Students are expected to apply
theoretical knowledge to solve practical problems. Each
group consists of 4-5 students. The group work includes
conceiving a social media application scenario, designing and
implementing web-based software or a mobile app,
and giving a presentation. The software/app should have a user-friendly
interface and apply at least one social mining algorithm. Any
programming language is acceptable. Any improvement to
an existing algorithm is a big plus.
– Paper-oriented option. Students are expected to write a
research paper. Each group consists of 3-5 students. The paper
should use social/data mining algorithms to analyze
data and draw conclusions. It can be an extension or a
combination of your previous individual reports, but at least
50% of the material should be new.
Facebook
• What kinds of information can be found in Facebook?
• Where do you think Facebook can use your data?
Amazon
Yelp
Twitter
Objectives of Our Course
• Understand social aspects of the Web
– Social Theories + Social media + Mining
– Learn how to collect, clean, and represent social
media data
– Learn how to measure important properties of social media
and simulate social media models
– Find and analyze communities in social media
– Understand friendships in social media, perform
recommendations, and analyze behavior
• Study or ask interesting research issues
– e.g., start-up ideas
• Learn representative algorithms and tools
Social Media
Definition
Social Media is the use of electronic and Internet
tools for the purpose of sharing and discussing
information and experiences with other human
beings in more efficient ways.
Social Media Mining is the process
of representing, analyzing, and
extracting meaningful patterns from
social media data
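As a minimal sketch of the three steps named in this definition (representing, analyzing, and extracting patterns), the Python snippet below stores a toy friendship list as an adjacency map, counts each user's friends, and reports the best-connected user. The user names, the edge list, and the helper names `build_graph`, `degrees`, and `most_connected` are hypothetical and are not taken from the course material.

```python
from collections import defaultdict

# Hypothetical undirected friendship edges, made up for illustration.
EDGES = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]


def build_graph(edges):
    """Represent: turn an edge list into an adjacency map."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def degrees(graph):
    """Analyze: count the number of friends of each user."""
    return {user: len(friends) for user, friends in graph.items()}


def most_connected(graph):
    """Extract a simple pattern: the user with the highest degree.

    Ties are broken alphabetically by iterating over sorted names.
    """
    deg = degrees(graph)
    return max(sorted(deg), key=deg.get)


graph = build_graph(EDGES)
print(degrees(graph))
print(most_connected(graph))
```

Real social media data would need collection and cleaning before this step; the sketch only illustrates the shape of the pipeline.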
Social Media Mining Challenges
1. Big Data Paradox
   1. Social media data is big, yet not evenly distributed.
   2. Often, little data is available for an individual.
2. Obtaining Sufficient Samples
   1. Are our samples reliable representatives of the full data?
3. Noise Removal Fallacy
   1. Too much removal makes the data sparser.
   2. The definition of noise is relative, complicated, and task-dependent.
4. Evaluation Dilemma
   1. When there is no ground truth, how can you evaluate?
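The Big Data Paradox can be illustrated with a toy computation: a dataset may look large in total while most individuals contribute very little. The Python sketch below uses a hypothetical activity log (the user IDs and counts are invented) and compares the mean and the median number of posts per user; the helper name `summarize` is also an assumption for illustration.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical activity log: one entry per post, labelled by its author.
POSTS = ["u1"] * 90 + ["u2"] * 4 + ["u3"] * 3 + ["u4"] * 2 + ["u5"] * 1


def summarize(posts):
    """Return total volume and per-user statistics for a post log."""
    counts = Counter(posts)          # posts per user
    values = sorted(counts.values())
    return {
        "total_posts": len(posts),
        "users": len(counts),
        "mean_per_user": mean(values),
        "median_per_user": median(values),
    }


summary = summarize(POSTS)
print(summary)
```

In this made-up log a single user accounts for most of the posts, so the mean per user is far above the median. A model trained on the aggregate would still have very little data for a typical individual.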
Publications: Data Mining
• Conferences
– KDD: ACM SIGKDD Conference on Knowledge Discovery
and Data Mining
– ICDM: IEEE International Conference on Data Mining
– SDM: SIAM Conference on Data Mining
– ECML/PKDD: European Conference on Machine
Learning and Principles and Practice of Knowledge
Discovery in Databases
• Journals
– TKDE: IEEE Transactions on Knowledge and Data
Engineering
– TKDD: ACM Transactions on Knowledge Discovery from
Data
– DMKD: Data Mining and Knowledge Discovery
– KAIS: Knowledge and Information Systems
Publications: WWW and Social Networks
• Conferences
– WWW: International World Wide Web Conference
– ICWSM: International AAAI Conference on Web and
Social Media
• Journals
– TWEB: ACM Transactions on the Web
– WWWJ: World Wide Web Journal
Publications: Information Retrieval
• Conferences
– SIGIR: ACM SIGIR Conference on Research and
Development in Information Retrieval
– CIKM: ACM International Conference on
Information and Knowledge Management
– WSDM: ACM International Conference on Web
Search and Data Mining
– ECIR: European Conference on Information
Retrieval
• Journals
– TOIS: ACM Transactions on Information Systems
– IPM: Information Processing & Management
– IRJ: Information Retrieval Journal
Publications: Artificial Intelligence
• Conferences
– IJCAI: International Joint Conference on Artificial
Intelligence
– AAAI: AAAI Conference on Artificial Intelligence
– ECAI: European Conference on Artificial Intelligence
– RecSys: ACM Conference on Recommender Systems
• Journals
– AIJ: Artificial Intelligence
– JAIR: Journal of Artificial Intelligence Research
– TIST: ACM Transactions on Intelligent Systems and
Technology
Publications: Natural Language Processing
• Conferences
– ACL: Annual Meeting of the Association for
Computational Linguistics
– EMNLP: Conference on Empirical Methods in
Natural Language Processing
– Coling: International Conference on Computational
Linguistics
– NAACL: North American Chapter of the Association
for Computational Linguistics
• Journals
– CL: Computational Linguistics
– TACL: Transactions of the Association for
Computational Linguistics
Publications: Image Processing
• Conferences
– CVPR: IEEE Conference on Computer Vision and
Pattern Recognition
– MM: ACM Conference on Multimedia
– CHI: ACM Conference on Human Factors in
Computing Systems
• Journals
– TPAMI: IEEE Transactions on Pattern Analysis and
Machine Intelligence
– TMM: IEEE Transactions on Multimedia
– PR: Pattern Recognition
– TOCHI: ACM Transactions on Computer-Human
Interaction
Publications: CS Journals and Magazines
• Journals
– JACM: Journal of the ACM
– JASIST: Journal of the Association for Information
Science and Technology
• Magazines
– Communications of the ACM
– IEEE Computer
– IEEE Internet Computing
– IEEE Intelligent Systems
– SIGIR Forum
– KDD Explorations
Homework
• Find your group members
• Group meeting
– Discuss your topic: Title+Motivation+Objectives
• Choose a contact person for each group
• Each contact person sends an email with the course code
(TJTSD66) as the subject to Mr. Denis
Kotkov ([email protected]), indicating:
– Your group members, including YOURSELF
– Your choice: technically oriented or paper-oriented
– Your topic: Title+Motivation+Objectives
– Deadline: 02/11/2015 (next Monday), 6pm
Any Questions?