International Journals of Advanced Research in Computer Science and Software Engineering ISSN: 2277-128X (Volume-7, Issue-6) Research Article June 2017 Major Encounters to Search the Social Network Charu Pujara1, Anuradha Pillai2, Dimple Juneja3 Research Scholar, Ymca University of Science and Technology, Haryana, India 2 Department of CE, Ymca University of Science and Technology, Haryana, India 3 Department of MCA, NIT, Kurukshetra, Haryana, India 1 Abstract— With the increase in the usage of Online Social Networks, People tend to manage multiple Social Networks like Facebook, Twitter, Linkedin etc for interacting, networking, promoting business etc. web-based social networking are changing the way data goes inside and between system of people. A user links his multiple accounts information on one common platform to use the services of different networks and thus achieving network independence. Aggregating the user information will provide a viable solution for collecting the information from various social networks to one integrated presentation. In this paper, major issues to search the social network are discussed and a novel approach is proposed to aggregate and extract useful information from the social networks. Keywords— Social Network, TF/IDF, Hierarchical Algometric clustering, Machine Learning I. INTRODUCTION Social Networks like Facebook, Bebo, Delicious etc. are popular networks that provide web services to the users. Users create profiles, publish data by Sharing, scrapping, tweeting etc. interact among others and generate online content on the network. Now days, users communicate, connect and collaborate using services of social network. Numerous researches have been done to study the structure of social networks and its properties. User can represent his query in a formal language and knows the output of the query but will fail to represent the best solution and rank the users. It is beyond the capability of a user or an expert of the domain to syntax the machine learning process and set keywords for the most accurate and precise output. This paper suggests an open vision of challenges faced to design an intelligent social network database system. The system suggests the need of an intelligent machine learning process as an important part of the query processor to supervise and improvise upon the user’s implementations. Social networks include a large amount of dataset with the exponential rise in the number of users. Every Social network has its own protocols and query language to provide the expected results to the user resourcefully and precisely. Facebook has developed FQL (Facebook Query Language) [5] which is a similar language to SQL. Twitter provides user search based on words, particular people, places, and adding further properties related to tweet. LinkedIn provides job and functionalities to search based on people. Each of the specified social networks is restricted to their own network and provides specialized solution to the social network. To process a query on the social network is a major challenge in the industry as well as in academia. There are tools available to analyze the user’s network, closeness, clique and other properties of the network. Social Network can be seen as graphs and specialized databases are designed for querying graph based structure. It is evident from the research that there is a need to provide an intelligent Social network database which has the ability to query the graph based database and analyze the properties of nodes [21]. Alex Patriquin of Compete.com reported on the overlapping of users having accounts at various online social network services [20]. A 2009 study of 11,000 users reported that the majority of MySpace, LinkedIn, and Twitter users also have Facebook accounts [22]. Site LinkedIn Facebook Friendster Bebo Orkut Plaxo Ning MySpace Hi5 LinkedIn 100 2 6 1 8 54 19 0 1 Facebook 42 100 23 25 26 48 35 20 24 Friendster 8 2 100 2 4 8 6 1 4 Bebo 4 4 5 100 3 5 6 3 7 Orkut 3 1 1 0 100 4 2 0 2 Plaxo 3 9 0 0 1 100 2 0 0 Ning 8 1 2 1 2 14 100 0 0 MySpace 32 64 49 65 29 34 44 100 69 Hi5 2 2 4 3 7 2 1 1 100 This paper is divided into six sections: Section 1 gives an introduction; section 2 discusses the Literature review. Section 3 throws light on major challenges in designing an intelligent social network database. Section 4 proposed the novel approach and section 5 concludes the paper. © www.ijarcsse.com, All Rights Reserved Page | 504 Pujara et al., International Journals of Advanced Research in Computer Science and Software Engineering ISSN: 2277-128X (Volume-7, Issue-6) II. RELATED WORK A social network can be represented as a collection of nodes and edges where each user represent a node and an edge between two nodes represents a relation like friends or followers depending upon the protocol of social network or group of people. The nodes can have various attributes like user name, display name, location etc. Edges are hyper-edges to easily represent groups of people and the interactions between them [4]. In a weighted graph, the system may generate the weight as per the interaction of the user to the group or another user. The edges can also be linked with the public and private attributes of the user and its corresponding value [5]. Queries refer to the structure of graph and query processor refers to the engine which will extract the result for the database as per the desired interest of the user with the defined keywords of the network hiding the information of the underlying data storage [6]. Numerous attempts have been made to model the database, each of which has its own underlying theory related rules, principles, terms and level of development. However, according to study, only few have a potential to stand by the requirements of online social networks database [3]. A relational data model is not suitable for representing a relation that exists between users of the online social network. A graph is a noble approach to represent the interrelationships over a set of data entities and can be taken as a conceptual tool to represent network related data. Graph theory plays an important role in many areas like databases, computer science and other disciplines. Generally, graph models are motivated by real-life applications where information about data inter-connectivity is more important as the data. Therefore, despite the wealth of social network structures and analysis, there is still a need for new designs, methods and specifically data management systems. Graph data models and related query technologies [1] offer notable advantages to discover relationships in huge data sets and can be the base for many new functions which can be used for the future online social network efficiently. Dries et al.[11] associates the analyzing , clustering and querying proficiency of social network. SoQL [3], SociQL[4] is grounded on relational tables majorly focusing on centrality measure of social network. SNQL [5] is founded on GraphLog considering querying and manipulation of data. SocialScope[16] ranks the output of the query on the communal associations. III. MAJOR CHALLENGES The high level of personal information of a user and the inter-communication between the users can be extracted from the social network. Structuring graphs as a theoretical graph data model defines a useful database. Graphs are the foundation in mathematics and computer science. Graph theory is supported by a huge amount of formal study and analysis. The system can use effective graph related algorithms which are specially designed to utilize the discussed graph data structures. Graphs can represent online social network services easily and the graph databases are sufficient to meet the present and the future needs for social web [2]. A database system that can answer the query written in a natural language for a social network is extremely challenging. The database systems should be capable of managing, processing, and analyzing complex, heterogeneous, temporal and voluminous graph-structured data. A more flexible and dynamic system is required which provides the applications and user to augment novel facts and associations as they want. Designing of a new system that can handle the voluminous data and manage the heterogeneous resources effectively in online social network requires reconsidering about all aspects of a DBMS, including data modeling, storage management, indexing, and query processing [2]. Some of the main issues that can serve the answer of the user’s query specified in natural language are listed below: 1. Extracting and mapping of the entities from the query is a major challenge and assigning semantic information of the entities extracted from the user’s query to the attributes of the social network is an another extreme of the river. 2. There is a need of an optimal operational leaning process and accomplishing it effectively in a query processor. 3. Leveraging the results of one query to another to speed up the learning for a different query is also an interesting and important problem. 4. A high correlation mechanism is required to build the gap between user’s interest of querying the system and the results that the user is getting from the system. 5 There is a need to develop a better privacy mechanism that can restrict the usage of valuable information about the user by others. 6. Due to the enormous size of social network and the perpetual growth of the embryonic network, there is a need of an operative mechanism to demonstrate the closeness of two nodes, measuring centrality etc. 7. To provide an optimal, flexible dynamic and minimal time for executing a query is another important task. 8. The hypothetical questions, management of time and sources within a network, considering collaboration into the query language and updating the coordination of the networks is another hurdle to cross over. 9. Displaying effective results of different statistics over the network is another challenge to face. There are social network aggregators available that integrates the services provided by the multiple online social networks like Hoot suite, flock etc. There are various solutions available to integrate the social network but no one has tried to integrate the information available within multiple social networks. None of the aggregator has mined the multiple social networks and extracted some useful information after collecting data from different Social Networks in a natural language. A Comparative study of various services of social network aggregators is explored and presented in table 2. Flock [18] can get updates from friends, status updates and photos submitted at multiple social networks whereas SocConnect users can creates a personalized social and semantic contexts for their social data. Users can combine and cluster friends and rate network [17]. Hootsuite [19] aggregates organizations and businesses to © www.ijarcsse.com, All Rights Reserved Page | 505 Pujara et al., International Journals of Advanced Research in Computer Science and Software Engineering ISSN: 2277-128X (Volume-7, Issue-6) collaboratively execute promotions across multiple social networks and XeeMe [18] Organizes Social presence, discovers new network and people. It organizes the entire social presence of the user, determine new networks and people and develop their presence and influence. But none of the aggregator has mined the multiple social networks and extracted some useful information after collecting data from different Social Networks. All the social network aggregators discussed can integrate Facebook, LinkedIn and Twitter accounts but none of the network can handle the query search written in a natural language. IV. PROPOSED WORK Figure 1 Architecture of Social Search Engine To start with, there are numerous different conceivable valuable implementations for each capacity. Expecting that there are various ways to implement, which ought to be decided to find the solution to the specific question. The database is created and aggregated based on the information available from multiple social networks. The data should be extracted based on the requirement from the user end. The similar data are clustered using Machine learning algorithm. The clustered data are ranked based on the data extracted with the help of Machine learning Algorithm. Natural language techniques can be applied to extract the semantic meaning of the query to cognize the need of the user and then the query is sacked to the database engine. The extracted output will be displayed to the user as per user’s preference. The complete architecture is as shown in figure 1. Content generated by the user, their interaction among the users and relationships are the point of the information to be analyzed and extracted as per user’s criteria. The clustering will group the useful value from the ocean of data to identify and discover new learning to understand more aspects of the user like interest in a particular group, same age group connected people etc. To select the best choice of user’s objective to query the system and provide the results is accomplished by ranking algorithm. Utilizing machine learning, the databases framework can endeavor to enhance the performance of the search results comes about, when the question is resected. For search queries that runs frequently, and for which to a great degree effective outcome positioning is essential, a learning procedure is very valuable. In this way, in a nutshell, our vision is of natural language query processor including services of social networks that can likewise figure out how to answer questions. In the proposed work, 5000 profile information about the user is extracted from Twitter and Bing Search that is considered as the input data. The input query is processed in order to extract the keywords for getting the relevant results from the database. Characters are converted to lower case, Stop words are removed, Extra white spaces are stripped and stemming is done using Porter’s suffix-stripping algorithm in order to map the entities into a unique form. User Profiles are aggregated from multiple social networks into one unique profile. TF - IDF and Cosine Similarity - TF-IDF is a weighted model commonly used for information retrieval problems. It aims to convert the text documents into vector models on the basis of occurrence of words in the documents without taking considering the exact ordering. For Example - let say there is a dataset of N text documents, In any document ―D‖, TF and IDF will be defined as Document (D) – dataset that has n number of documents Term Frequency (TF) - TF for a term ―t‖ is defined as the count of a term ―t‖ in a document ―D‖ Inverse Document Frequency (IDF) - IDF for a term is defined as logarithm of ratio of total documents available in the corpus and number of documents containing the term T. TF-IDF formula gives the relative importance of a term in a corpus (list of documents), given by the following formula below. Following is the code to convert a text into tf-idf vectors: © www.ijarcsse.com, All Rights Reserved Page | 506 Pujara et al., International Journals of Advanced Research in Computer Science and Software Engineering ISSN: 2277-128X (Volume-7, Issue-6) The cosine of the angle between the vectors ends up being a good indicator of similarity because at the closest the two vectors could be, 0 degrees apart, the cosine function returns its maximum value of 1. It's worth noting that because we are calculating similarities and not distances, the optimization objective in this function is not to minimize the cost function, or error, but rather to maximize the similarity function. Hierarchical Algometric clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. The quality of a pure hierarchical clustering method suffers from its inability to perform adjustment, once a merge or split decision has been executed. Then it will neither undo what was done previously, nor perform object swapping between clusters. Thus merge or split decision, if not well chosen at some step, may lead to somewhat low quality Clusters. One promising direction for improving the clustering quality of hierarchical methods is to integrate hierarchical clustering with other techniques for multiple phase clustering. The proposed clustering integrates the output of TF –IDF and Hierarchical clustering and produces ensemble clusters. The results of each clustering models are combined and ensemble clusters are produced by using Hungarian algorithm and cumulative voting for each cluster. Formally the HAC algorithm can be stated as follows: i) Begin disjoint clustering. ii) Update the distance matrix D by deleting the rows and columns corresponding to the clusters and addition of new rows and columns to the newly formed cluster. iii) If all data points are in one cluster then stop or else repeat the steps from ii. Output cluster of users are ranked as per the user’s query. User’s meeting the criteria as per the query will be ranked higher. V. SIMILARITY AND EVALUATION MEASURES 1. Manhattan distance: The distance between two points on the number of horizontal/vertical that takes from one point to another [23]. It is the simple sum of horizontal and vertical entities. The Manhattan distance of two points p and q is computed as, Where: p=p1, p2, ----, pn p=p1, p2, ----, pn for n varibles. 2. Euclidean distance: The square root of the sum of squared difference between the two vectors is defined as the Euclidean distance [24]. It is the straight line distance between two points in the Euclidean space. The Euclidean distance of two vector p and q is defined as, 3. Cosine Dissimilarity: It is used to identify how two vectors aren’t related to each other [].The cosine of 0° is 1, and it is less than 1 for any other angle. The cosine dissimilarity of vector p and q is calculated as:- 4. Rand index: It penalizes both false positive and false negative decisions during clustering [26]. Table 1 displays the evaluation of various measures for the extracted dataset. Data Twitter users profile Bing users profile Table 1. Evaluation on various measures Manhattan Euclidean Rand 0.81 0.73 0.36 0.46 0.56 0.23 Cosine 0.15 0.25 VI. CONCLUSION User Personalization and user interaction is a high level of information that can easily be extracted from the social network. No single model of data exists which is appropriate for all the users and problem domain. Thus, considering the modern nature of communication of users via social network, there is a need to build a more realistic query processor that can answer the user’s query in a natural language and overcomes the traditional strategies to adopt the high level of user’s communication. REFERENCES [1] Cohen, S., Ebel, L., & Kimelfeld, B. (2013). A Social Network Database that Learns How to Answer Queries. In CIDR. [2] YANG, S. (2011). Data Modeling and Query Processing for Online Social Networking Services (Doctoral dissertation). [3] Ronen, R., & Shmueli, O. (2009, March). SoQL: A language for querying and creating data in social networks. In Data Engineering, 2009. ICDE'09. IEEE 25th International Conference on (pp. 1595-1602). IEEE. © www.ijarcsse.com, All Rights Reserved Page | 507 [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] Pujara et al., International Journals of Advanced Research in Computer Science and Software Engineering ISSN: 2277-128X (Volume-7, Issue-6) San Martın, M., Gutierrez, C., & Wood, P. T. (2011). SNQL: A social networks query and transformation language. cities, 5, r5. Serrano Suarez, D. F. (2011). SociQL: a query language for the social web. Holland, P. W., & Leinhardt, S. (1977). Social Networks: A Developing Paradigm. Zou, L., Chen, L., & Özsu, M. T. (2009). Distance-join: Pattern match query in a large graph database. Proceedings of the VLDB Endowment, 2(1), 886-897. Williams, D. W., Huan, J., & Wang, W. (2007, April). Graph database indexing using structured graph decomposition. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on (pp. 976-985). IEEE. Tong, H., Faloutsos, C., Gallagher, B., & Eliassi-Rad, T. (2007, August). Fast best-effort pattern matching in large attributed graphs. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 737-746). ACM. Mitra, S., Bagchi, A., & Bandyopadhyay, A. K. (2009). Design of a data model for social network applications. Database Technologies: Concepts, Methodologies, Tools, and Applications: Concepts, Methodologies, Tools, and Applications, 1. Dries, A., Nijssen, S., & De Raedt, L. (2009, November). A query language for analyzing networks. In Proceedings of the 18th ACM conference on Information and knowledge management (pp. 485-494). ACM. Staworko, S., & Wieczorek, P. (2012, March). Learning twig and path queries. In Proceedings of the 15th International Conference on Database Theory (pp. 140-154). ACM. Cohen, S., & Cohen-Tzemach, N. (2013, June). Implementing link-prediction for social networks in a database system. In Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks (pp. 37-42). ACM. Amer-Yahia, S., Lakshmanan, L., & Yu, C. (2009). Socialscope: Enabling information discovery on social content sites. arXiv preprint arXiv:0909.2058. Lampe, C., Ellison, N., & Steinfield, C. (2006, November). A Face (book) in the crowd: Social searching vs. social browsing. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (pp. 167-170). ACM. Amer-Yahia, S., Lakshmanan, L., & Yu, C. (2009). Socialscope: Enabling information discovery on social content sites. arXiv preprint arXiv:0909.2058. Zhang, J., Wang, Y., & Vassileva, J. (2013). SocConnect: A personalized social network aggregator and recommender. Information Processing & Management, 49(3), 721-737. Fadrique del Campo, H. (2012). Design and development of a social network aggregator (Doctoral dissertation). Hootsuite Media Inc. (2015, Sept 1). Retrieved from hootsuite.com: https://hootsuite.com/ Patriquin, A. (2007). Connecting the social graph: member overlap at opensocial and facebook. Compete. com blog. Cohen, S., Ebel, L., & Kimelfeld, B. (2013). A Social Network Database that Learns How to Answer Queries. In CIDR. Perez, S. (2009). Who Uses Social Networks and What Are They Like. T. W. Schoenharl and G. Madey, ―Evaluation of Measurement Techniques for the Validation of Agent-based Simulations Against Streaming Data‖, In International Conference on Computational Science, (2008), Springer Berlin Heidelberg, pp. 6-15. A. Huang, 2008, ―Similarity Measures for Text Document Clustering‖, Proceedings of the sixth New Zealand Computer Science Research Student Conference (NZCSRSC2008), Christchurch, New Zealand, (2008), pp. 4956 J. Han, J. Pei, and M. Kamber, ―Data Mining: Concepts and Techniques‖, (2011), Elsevier Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association, 66(336), 846-850. © www.ijarcsse.com, All Rights Reserved Page | 508
© Copyright 2026 Paperzz