Major Encounters to Search the Social Network

International Journals of Advanced Research in
Computer Science and Software Engineering
ISSN: 2277-128X (Volume-7, Issue-6)
Research Article
June 2017
Charu Pujara1, Anuradha Pillai2, Dimple Juneja3
1 Research Scholar, YMCA University of Science and Technology, Haryana, India
2 Department of CE, YMCA University of Science and Technology, Haryana, India
3 Department of MCA, NIT, Kurukshetra, Haryana, India
Abstract— With the growing use of online social networks, people tend to maintain accounts on multiple networks
such as Facebook, Twitter and LinkedIn for interacting, networking, promoting business and so on. Web-based social
networking is changing the way data flows within and between communities of people. A user links the information from
his multiple accounts on one common platform to use the services of different networks, thus achieving network
independence. Aggregating user information provides a viable way to collect information from various social
networks into one integrated presentation. In this paper, major issues in searching the social network are discussed and a
novel approach is proposed to aggregate and extract useful information from social networks.
Keywords— Social Network, TF-IDF, Hierarchical Agglomerative Clustering, Machine Learning
I. INTRODUCTION
Social networks like Facebook, Bebo and Delicious are popular networks that provide web services to users. Users
create profiles, publish data by sharing, scrapping, tweeting etc., interact with others and generate online content on the
network. Nowadays, users communicate, connect and collaborate using the services of social networks. Numerous studies
have examined the structure of social networks and their properties. A user can express a query in a formal
language and knows the expected output of the query, but cannot specify the best solution or rank the users. It is beyond
the capability of a user, or even a domain expert, to configure the machine learning process and set keywords for the most
accurate and precise output. This paper presents an open vision of the challenges faced in designing an intelligent social
network database system. Such a system needs an intelligent machine learning process as an important part of
the query processor to supervise and improve upon the user’s implementations.
Social networks hold a large amount of data, with an exponential rise in the number of users. Every social
network has its own protocols and query language to provide the expected results to the user resourcefully and precisely.
Facebook has developed FQL (Facebook Query Language) [5], which is a language similar to SQL. Twitter provides
search based on words, particular people and places, with further properties related to the tweet. LinkedIn provides job
search and functionality to search for people. Each of these social networks is restricted to its own network and
provides a specialized solution for that network. Processing a query on the social network is a major challenge in
industry as well as in academia. Tools are available to analyze a user’s network, closeness, cliques and other
properties of the network. Social networks can be seen as graphs, and specialized databases are designed for querying
graph-based structures. It is evident from the research that there is a need for an intelligent social network
database which can query the graph-based database and analyze the properties of nodes [21].
Alex Patriquin of Compete.com reported on the overlap of users having accounts at various online social
network services [20]. A 2009 study of 11,000 users reported that the majority of MySpace, LinkedIn, and Twitter users
also have Facebook accounts [22].
Site        LinkedIn  Facebook  Friendster  Bebo  Orkut  Plaxo  Ning  MySpace  Hi5
LinkedIn         100         2           6     1      8     54    19        0    1
Facebook          42       100          23    25     26     48    35       20   24
Friendster         8         2         100     2      4      8     6        1    4
Bebo               4         4           5   100      3      5     6        3    7
Orkut              3         1           1     0    100      4     2        0    2
Plaxo              3         9           0     0      1    100     2        0    0
Ning               8         1           2     1      2     14   100        0    0
MySpace           32        64          49    65     29     34    44      100   69
Hi5                2         2           4     3      7      2     1        1  100
This paper is divided into six sections: Section 1 gives an introduction; Section 2 discusses the related work.
Section 3 throws light on major challenges in designing an intelligent social network database. Section 4 proposes the
novel approach, Section 5 describes the similarity and evaluation measures and Section 6 concludes the paper.
© www.ijarcsse.com, All Rights Reserved
II. RELATED WORK
A social network can be represented as a collection of nodes and edges, where each user is a node and an edge
between two nodes represents a relation such as friend or follower, depending upon the protocol of the social network or
group of people. The nodes can have various attributes like user name, display name, location etc. Edges can be
hyper-edges, to easily represent groups of people and the interactions between them [4]. In a weighted graph, the system
may generate the weight according to the interaction of the user with the group or with another user. The edges can also
be linked with the public and private attributes of the user and their corresponding values [5]. Queries refer to the
structure of the graph, and the query processor is the engine which extracts the result from the database according to the
user’s interest, using the defined keywords of the network while hiding the details of the underlying data storage [6].
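As an illustration of the graph view described above, the following is a minimal sketch in Python, not the data model of any cited system. The class name `SocialGraph`, its methods and the sample users are all hypothetical; edge weights are assumed to grow by a fixed amount with every interaction.

```python
from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    """Weighted, undirected graph of users (hypothetical helper, not from the paper)."""
    attributes: dict = field(default_factory=dict)   # user -> {attribute: value}
    weights: dict = field(default_factory=dict)      # user -> {neighbour: weight}

    def add_user(self, user, **attrs):
        # Node attributes such as display name or location.
        self.attributes[user] = dict(attrs)
        self.weights.setdefault(user, {})

    def add_interaction(self, a, b, amount=1.0):
        # Each interaction between two users increases the weight of their edge.
        for u, v in ((a, b), (b, a)):
            self.weights.setdefault(u, {})
            self.weights[u][v] = self.weights[u].get(v, 0.0) + amount

    def neighbours(self, user):
        return sorted(self.weights.get(user, {}))


# Made-up sample data.
g = SocialGraph()
g.add_user("alice", display_name="Alice", location="Delhi")
g.add_user("bob", display_name="Bob", location="Pune")
g.add_user("carol", display_name="Carol", location="Delhi")
g.add_interaction("alice", "bob")
g.add_interaction("alice", "bob")
g.add_interaction("bob", "carol", amount=0.5)

# An attribute query expressed against the graph, not against the storage layout.
in_delhi = sorted(u for u, a in g.attributes.items() if a.get("location") == "Delhi")
```

Keeping attributes and weights in separate dictionaries is only one possible layout; a graph database would hide this choice behind its query language.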
Numerous attempts have been made to model the database, each with its own underlying theory, rules,
principles, terms and level of development. However, according to the study, only a few have the potential to meet the
requirements of online social network databases [3]. A relational data model is not suitable for representing the relations
that exist between users of an online social network. A graph is a good way to represent the interrelationships over
a set of data entities and can be taken as a conceptual tool to represent network-related data. Graph theory plays an
important role in many areas like databases, computer science and other disciplines. Generally, graph models are
motivated by real-life applications where information about data inter-connectivity is as important as, or more important
than, the data itself. Therefore, despite the wealth of social network structures and analysis, there is still a need for new
designs, methods and specifically data management systems. Graph data models and related query technologies [1] offer
notable advantages for discovering relationships in huge data sets and can be the base for many new functions which can
serve future online social networks efficiently. Dries et al. [11] combine analysis, clustering and querying capabilities
for social networks. SoQL [3] and SociQL [5] are grounded on relational tables and focus mainly on centrality measures of
the social network. SNQL [4] is founded on GraphLog and addresses querying and manipulation of data. SocialScope [16]
ranks the output of the query based on social associations.
III. MAJOR CHALLENGES
A high level of personal information about a user, and the inter-communication between users, can be extracted from
the social network. Structuring the data as a theoretical graph data model yields a useful database. Graphs are
foundational in mathematics and computer science, and graph theory is supported by a huge amount of formal study and
analysis. The system can use effective graph algorithms which are specially designed for the graph data structures
discussed. Graphs can represent online social network services easily, and graph databases are sufficient to
meet the present and future needs of the social web [2]. Building a database system that can answer a query written in
natural language for a social network is extremely challenging. The database system should be capable of managing,
processing and analyzing complex, heterogeneous, temporal and voluminous graph-structured data. A more flexible and
dynamic system is required, which allows applications and users to add new facts and associations as they
want. Designing a new system that can handle voluminous data and manage heterogeneous resources
effectively in an online social network requires reconsidering all aspects of a DBMS, including data modeling, storage
management, indexing and query processing [2].
Some of the main issues in answering a user’s query specified in natural language are listed below:
1. Extracting and mapping the entities from the query is a major challenge, and assigning the semantic information of the
entities extracted from the user’s query to the attributes of the social network is another.
2. There is a need for an optimal operational learning process and for accomplishing it effectively in a query processor.
3. Leveraging the results of one query to speed up the learning for a different query is also an interesting and
important problem.
4. A high-correlation mechanism is required to bridge the gap between the user’s intent in querying the system and the
results that the user gets from the system.
5. There is a need to develop a better privacy mechanism that can restrict the use of valuable information about the user
by others.
6. Due to the enormous size of the social network and the perpetual growth of the still-developing network, there is a
need for an effective mechanism to determine the closeness of two nodes, measure centrality etc.
7. Providing optimal, flexible and dynamic query execution in minimal time is another important task.
8. Handling hypothetical questions, managing time and sources within a network, incorporating collaboration into the
query language and updating the coordination of the networks is another hurdle to cross.
9. Displaying effective results of different statistics over the network is another challenge.
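To make issue 6 concrete, the sketch below shows one common way to measure how close two nodes are (hop count by breadth-first search) and how central a node is (closeness centrality). It is a small illustration on made-up data, assuming an unweighted adjacency-list graph; the function names are hypothetical, and a naive traversal like this would not scale to a full social network.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness_centrality(adjacency, node):
    """(number of reachable nodes) / (sum of distances to them); 0.0 if isolated."""
    dist = shortest_path_lengths(adjacency, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Toy friendship graph (made-up data): a path a - b - c - d.
friends = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
hops_a_to_d = shortest_path_lengths(friends, "a")["d"]
```

In this toy graph the inner node `b` has a higher closeness than the end node `a`, which matches the intuition that it reaches everyone in fewer hops.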
There are social network aggregators, such as Hootsuite and Flock, that integrate the services provided by multiple
online social networks. Various solutions are available to integrate social networks, but none has
tried to integrate the information available within multiple social networks. None of the aggregators has mined
multiple social networks and extracted useful information in natural language after collecting data from different
social networks. A comparative study of the services of various social network aggregators is explored and presented in
table 2.
Flock [18] can get updates from friends, status updates and photos submitted at multiple social networks,
whereas SocConnect users can create personalized social and semantic contexts for their social data. Users can
combine and cluster friends and rate the network [17]. Hootsuite [19] helps organizations and businesses to
collaboratively execute promotions across multiple social networks, and XeeMe [18] organizes the entire social presence
of the user, discovers new networks and people, and develops the user’s presence and influence. However, none of the
aggregators has mined multiple social networks and extracted useful information after collecting data from different
social networks. All the social network aggregators discussed can integrate Facebook, LinkedIn and Twitter accounts, but
none of them can handle a search query written in natural language.
IV. PROPOSED WORK
Figure 1 Architecture of Social Search Engine
To start with, there are numerous possible useful implementations for each function. Given that there
are various implementations, the question is which one ought to be chosen to answer a specific query. The database is
created and aggregated from the information available in multiple social networks. The data is extracted
according to the user’s requirements. Similar data are clustered using a machine learning algorithm, and the
clustered data are then ranked with the help of a machine learning algorithm. Natural language
techniques can be applied to extract the semantic meaning of the query in order to understand the need of the user, and
the query is then passed to the database engine. The extracted output is displayed to the user according to the user’s
preference. The complete architecture is shown in figure 1. The content generated by users, the interactions among users
and their relationships are the information to be analyzed and extracted according to the user’s criteria. Clustering
groups the useful values from the ocean of data to identify and discover new knowledge about further aspects of the user,
such as interest in a particular group or connections to people of the same age group. Selecting the best match for the
user’s objective in querying the system and providing the results is accomplished by the ranking algorithm.
Using machine learning, the database framework can attempt to improve the quality of the search
results when the query is repeated. For search queries that run frequently, and for which highly
effective result ranking is essential, a learning procedure is very valuable. Thus, in a nutshell, our
vision is of a natural language query processor, incorporating the services of social networks, that can also learn how to
answer queries.
In the proposed work, profile information of 5000 users is extracted from Twitter and Bing Search and
used as the input data. The input query is processed to extract the keywords for retrieving the relevant results
from the database. Characters are converted to lower case, stop words are removed, extra white space is stripped and
stemming is done using Porter’s suffix-stripping algorithm in order to map the entities into a unique form. User profiles
are aggregated from multiple social networks into one unique profile.
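The query preprocessing steps described above can be sketched as follows. This is an illustration only: the stop-word list is a tiny made-up sample, punctuation removal is an added assumption, and `crude_stem` is a deliberately simple suffix stripper standing in for Porter’s algorithm, which has many more rules. All names are hypothetical.

```python
import re

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "who"}

# Crude suffix stripping used as a stand-in; this is NOT the full Porter algorithm.
SUFFIXES = ("ing", "ers", "er", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(query):
    text = query.lower()                      # convert to lower case
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # drop punctuation (an added assumption)
    tokens = text.split()                     # split() also strips extra white space
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


keywords = extract_keywords("  Who are the   Developers working in Delhi? ")
```

Mapping "Developers" and "working" to the stems "develop" and "work" is what lets differently inflected queries and profiles match on the same keyword.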
TF-IDF and Cosine Similarity - TF-IDF is a weighting model commonly used for information retrieval
problems. It converts text documents into vector models on the basis of the occurrence of words in the documents,
without considering their exact ordering. For example, suppose there is a dataset of N text documents. For any
document "D" in the dataset, TF and IDF are defined as follows:
Term Frequency (TF) - TF for a term "t" is defined as the count of the term "t" in the document "D".
Inverse Document Frequency (IDF) - IDF for a term "t" is defined as the logarithm of the ratio of the total number of
documents in the corpus to the number of documents containing the term "t".
The TF-IDF formula gives the relative importance of a term in a corpus (list of documents):
$$\text{tf-idf}(t, D) = \mathrm{tf}(t, D) \cdot \log\frac{N}{\mathrm{df}(t)}$$
where tf(t, D) is the count of t in D and df(t) is the number of documents containing t. Each text is then converted
into a TF-IDF vector.
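The paper’s original listing is not available here, so the following is only a minimal sketch of the conversion, written from the TF and IDF definitions above (raw term counts and an unsmoothed logarithm). The function name `tfidf_vectors` and the sample documents are hypothetical; libraries often apply smoothing or normalization that this sketch omits.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """documents: list of token lists. Returns (vocabulary, list of vectors)."""
    n_docs = len(documents)
    vocabulary = sorted({term for doc in documents for term in doc})
    # df[t]: number of documents that contain term t at least once
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)  # raw count of each term in this document
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocabulary])
    return vocabulary, vectors


# Made-up, already preprocessed profile texts.
docs = [
    ["python", "developer", "delhi"],
    ["java", "developer", "pune"],
    ["python", "python", "teacher"],
]
vocab, vecs = tfidf_vectors(docs)
```

With this definition a term that occurs in every document gets weight log(1) = 0, so it contributes nothing to the comparison between documents.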
The cosine of the angle between the vectors is a good indicator of similarity because when the two vectors are
as close as they can be, 0 degrees apart, the cosine function returns its maximum value of 1. It is worth noting that
because we are calculating similarities and not distances, the optimization objective is not to minimize
a cost function, or error, but rather to maximize the similarity function.
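A short sketch of this idea follows: the cosine of the angle between two vectors is computed directly, and the best match for a query is the profile that maximizes it. The helper `most_similar` and the sample vectors are hypothetical, and returning 0.0 for an all-zero vector is an assumption made to avoid division by zero.

```python
import math


def cosine_similarity(u, v):
    """cos(angle) between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_vec, profile_vecs):
    """Return the key of the profile that maximizes similarity to the query."""
    return max(profile_vecs, key=lambda name: cosine_similarity(query_vec, profile_vecs[name]))


# Made-up profile vectors (for example TF-IDF weights over a 3-term vocabulary).
profiles = {"u1": [1.0, 0.0, 2.0], "u2": [0.0, 3.0, 0.0], "u3": [1.0, 1.0, 0.0]}
best = most_similar([1.0, 0.0, 1.0], profiles)
```

Because the measure depends only on the angle, a long profile and a short query can still score close to 1 when they use the same terms in similar proportions.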
Hierarchical Agglomerative clustering is a method of cluster analysis which seeks to build a hierarchy of clusters.
The quality of a pure hierarchical clustering method suffers from its inability to make adjustments once a merge or
split decision has been executed: it can neither undo what was done previously nor swap objects
between clusters. Thus a merge or split decision, if not well chosen at some step, may lead to low-quality
clusters. One promising direction for improving the clustering quality of hierarchical methods is to integrate hierarchical
clustering with other techniques for multiple-phase clustering.
The proposed clustering integrates the output of TF-IDF and hierarchical clustering and produces ensemble
clusters. The results of the individual clustering models are combined, and ensemble clusters are produced by using the
Hungarian algorithm and cumulative voting for each cluster. Formally, the HAC algorithm can be stated as follows:
i) Begin with disjoint clustering, in which each data point forms its own cluster.
ii) Merge the two closest clusters and update the distance matrix D by deleting the rows and columns corresponding
to the merged clusters and adding a row and column for the newly formed cluster.
iii) If all data points are in one cluster then stop; otherwise repeat from step ii.
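The steps above can be sketched as a naive agglomerative loop. This is an illustrative version under stated assumptions, not the authors’ implementation: single linkage and Manhattan distance are chosen arbitrarily, cluster distances are recomputed on demand instead of being stored in a matrix D, and the `num_clusters` parameter (hypothetical) allows stopping before everything is merged into one cluster. The Hungarian-algorithm ensemble step is not covered.

```python
def hac(points, distance, num_clusters=1):
    """Naive agglomerative clustering with single linkage (linkage is an assumption).

    Returns clusters as sorted lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]       # i) disjoint clustering

    def linkage(c1, c2):
        # Single linkage: distance between the two closest members.
        return min(distance(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # ii) drop the two merged clusters and add the newly formed one; the
        #     distances are recomputed on demand instead of stored in a matrix.
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(clusters)                            # iii) stop condition reached


def manhattan(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))


# Made-up 2-D points: two tight pairs and one outlier.
data = [(0, 0), (0, 1), (5, 5), (6, 5), (20, 20)]
two = hac(data, manhattan, num_clusters=2)
```

Once the two tight pairs have been merged with each other, the loop cannot split them again, which is the irreversibility of merge decisions noted earlier.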
The output clusters of users are ranked according to the user’s query. Users meeting the criteria of the query are
ranked higher.
V. SIMILARITY AND EVALUATION MEASURES
1. Manhattan distance: The distance between two points measured as the number of horizontal and vertical steps taken
to go from one point to the other [23]. It is the simple sum of the horizontal and vertical components. The Manhattan
distance of two points p and q is computed as
$$d_M(p, q) = \sum_{i=1}^{n} |p_i - q_i|$$
where p = (p1, p2, ..., pn) and q = (q1, q2, ..., qn) for n variables.
2. Euclidean distance: The square root of the sum of squared differences between the two vectors is defined as the
Euclidean distance [24]. It is the straight-line distance between two points in Euclidean space. The Euclidean distance
of two vectors p and q is defined as
$$d_E(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
3. Cosine dissimilarity: It is used to identify how unrelated two vectors are []. The cosine of
0° is 1, and it is less than 1 for any other angle. The cosine dissimilarity of vectors p and q is calculated as
$$d_C(p, q) = 1 - \frac{\sum_{i=1}^{n} p_i q_i}{\sqrt{\sum_{i=1}^{n} p_i^2}\,\sqrt{\sum_{i=1}^{n} q_i^2}}$$
4. Rand index: It penalizes both false positive and false negative decisions during clustering [26]. Table 1 displays the
evaluation of various measures for the extracted dataset.
Data
Twitter users profile
Bing users profile
Table 1. Evaluation on various measures
Manhattan Euclidean Rand
0.81
0.73
0.36
0.46
0.56
0.23
Cosine
0.15
0.25
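The four measures of this section can be written compactly as below. This is a sketch for illustration on made-up inputs and does not reproduce the values in Table 1. Cosine dissimilarity is taken as one minus the cosine similarity, and the Rand index as the fraction of item pairs on which two clusterings agree; the function names are hypothetical.

```python
import math
from itertools import combinations


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_dissimilarity(p, q):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norms


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which the two clusterings agree.

    A pair agrees when both clusterings put it in the same cluster or both
    separate it; false positives and false negatives both lower the score.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Made-up example: reference clustering versus a clustering with one misplaced item.
score = rand_index([0, 0, 1, 1], [0, 1, 1, 1])
```

In the example, three of the six item pairs are treated the same way by both clusterings, so the Rand index is 0.5.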
VI. CONCLUSION
User personalization and user interaction constitute high-level information that can easily be extracted from the social
network. No single data model exists which is appropriate for all users and problem domains. Thus, considering the
modern nature of communication among users via social networks, there is a need to build a more realistic query processor
that can answer the user’s query in natural language and that goes beyond traditional strategies to accommodate the high
level of user communication.
REFERENCES
[1] Cohen, S., Ebel, L., & Kimelfeld, B. (2013). A Social Network Database that Learns How to Answer Queries. In CIDR.
[2] Yang, S. (2011). Data Modeling and Query Processing for Online Social Networking Services (Doctoral dissertation).
[3] Ronen, R., & Shmueli, O. (2009, March). SoQL: A language for querying and creating data in social networks. In Data Engineering, 2009. ICDE'09. IEEE 25th International Conference on (pp. 1595-1602). IEEE.
[4] San Martín, M., Gutierrez, C., & Wood, P. T. (2011). SNQL: A social networks query and transformation language. cities, 5, r5.
[5] Serrano Suarez, D. F. (2011). SociQL: a query language for the social web.
[6] Holland, P. W., & Leinhardt, S. (1977). Social Networks: A Developing Paradigm.
[7] Zou, L., Chen, L., & Özsu, M. T. (2009). Distance-join: Pattern match query in a large graph database. Proceedings of the VLDB Endowment, 2(1), 886-897.
[8] Williams, D. W., Huan, J., & Wang, W. (2007, April). Graph database indexing using structured graph decomposition. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on (pp. 976-985). IEEE.
[9] Tong, H., Faloutsos, C., Gallagher, B., & Eliassi-Rad, T. (2007, August). Fast best-effort pattern matching in large attributed graphs. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 737-746). ACM.
[10] Mitra, S., Bagchi, A., & Bandyopadhyay, A. K. (2009). Design of a data model for social network applications. Database Technologies: Concepts, Methodologies, Tools, and Applications, 1.
[11] Dries, A., Nijssen, S., & De Raedt, L. (2009, November). A query language for analyzing networks. In Proceedings of the 18th ACM conference on Information and knowledge management (pp. 485-494). ACM.
[12] Staworko, S., & Wieczorek, P. (2012, March). Learning twig and path queries. In Proceedings of the 15th International Conference on Database Theory (pp. 140-154). ACM.
[13] Cohen, S., & Cohen-Tzemach, N. (2013, June). Implementing link-prediction for social networks in a database system. In Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks (pp. 37-42). ACM.
[14] Amer-Yahia, S., Lakshmanan, L., & Yu, C. (2009). Socialscope: Enabling information discovery on social content sites. arXiv preprint arXiv:0909.2058.
[15] Lampe, C., Ellison, N., & Steinfield, C. (2006, November). A Face (book) in the crowd: Social searching vs. social browsing. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (pp. 167-170). ACM.
[16] Amer-Yahia, S., Lakshmanan, L., & Yu, C. (2009). Socialscope: Enabling information discovery on social content sites. arXiv preprint arXiv:0909.2058.
[17] Zhang, J., Wang, Y., & Vassileva, J. (2013). SocConnect: A personalized social network aggregator and recommender. Information Processing & Management, 49(3), 721-737.
[18] Fadrique del Campo, H. (2012). Design and development of a social network aggregator (Doctoral dissertation).
[19] Hootsuite Media Inc. (2015, Sept 1). Retrieved from hootsuite.com: https://hootsuite.com/
[20] Patriquin, A. (2007). Connecting the social graph: member overlap at opensocial and facebook. Compete.com blog.
[21] Cohen, S., Ebel, L., & Kimelfeld, B. (2013). A Social Network Database that Learns How to Answer Queries. In CIDR.
[22] Perez, S. (2009). Who Uses Social Networks and What Are They Like.
[23] T. W. Schoenharl and G. Madey, "Evaluation of Measurement Techniques for the Validation of Agent-based Simulations Against Streaming Data", In International Conference on Computational Science, (2008), Springer Berlin Heidelberg, pp. 6-15.
[24] A. Huang, "Similarity Measures for Text Document Clustering", Proceedings of the sixth New Zealand Computer Science Research Student Conference (NZCSRSC2008), Christchurch, New Zealand, (2008), pp. 49-56.
[25] J. Han, J. Pei, and M. Kamber, "Data Mining: Concepts and Techniques", (2011), Elsevier.
[26] Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association, 66(336), 846-850.