A SURVEY PAPER ON ESTIMATING THE ONLINE SOCIAL NETWORKING SITE

Deepti Bhagwani*, Setu Kumar Chaturvedi
*[email protected]
Department of Computer Science Engineering, Technocrat Institute of Technology Bhopal
Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, M.P., India

Abstract: A social network is a set of people, organizations or other social entities connected by a set of social relationships such as friendship, co-working or information exchange. Estimating the server load of online social networking sites is one of the most challenging research topics in network management, particularly because the web servers of these sites are located in various countries. This paper provides the necessary background on online social networking sites and on the techniques that are applied to social networking sites to maintain them.

Keywords: Online Social Media, Online Social Networking, Human Factor, Information System.

Introduction

With the advancement of the Internet age, online social networks have grown rapidly. The Internet has become a means of interacting with people for business communication as well as personal contact, and social networks use it to build strong bonds between individuals. Nowadays various resources are available through which interested individuals can become part of the online social networking community. Online social networks (LSNW 2010), (FS 2013) provide a powerful reflection of the structure and dynamics of 21st-century society and of the interaction of the Internet generation with both technology and other people. Indeed, the intense growth of social multimedia and user-generated content is revolutionizing all phases of the content value chain, including production, processing, distribution and consumption. It has also brought to the multimedia sector a previously underestimated and now critical aspect of science and technology: online social interaction and networking.
The significance of this rapidly evolving research field is clearly evidenced by the many associated emerging technologies and applications, including online content sharing services, communities, multimedia communication over the Internet, online social multimedia search, interactive services, health care, entertainment, and security applications. This has generated a new research area called Online Social Multimedia Computing, in which well-known and established computing and multimedia networking technologies are brought together with emerging social media research.

Online social network (OSN) Internet services are changing the way we communicate with others, entertain ourselves and live. Social networking is one of the primary reasons that many people have become avid Internet users, including people who, until the emergence of social networks, found little of interest on the web. It is a strong indicator of what is actually happening online. The Web 2.0 era has passed, leaving great power in the hands of end users. Nowadays, users both produce and consume significant quantities of multimedia content. Moreover, these behaviors, combined with online social networking, have formed a new Internet era in which sharing multimedia content through social networking sites (SNSs) is a daily practice. More than 200 SNSs of worldwide impact are known today, and this number is growing rapidly. Many of the top web sites are either pure SNSs or offer some social networking capabilities (Rupam Some 2013). Besides the well-known "first tier" social networks, with hundreds of millions of users spread throughout the world, there are also many small social networking sites that are equally popular within the more limited geographical scope of their membership, which may be a city, country or continent.

ONLINE SOCIAL NETWORKING SITES

Social networking sites now reach 82 percent of the world's online population, which represents 1.2 billion users around the world.
The adoption of social networking sites has largely mirrored the global Internet adoption curve and grown in proportion to it, showing that as soon as people got connected, they began connecting with one another. An even more striking feature of social networking's emergence is the amount of time people now spend on it. As a percentage of the time people spend online, social networking activity has more than tripled in the last few years. In October 2011, social networking ranked as the most popular content category in worldwide engagement, accounting for 19 percent of total time spent online. Nearly 1 in every 5 minutes spent online is now spent on social networking sites, a stark contrast with March 2007, when the category accounted for only 6 percent of time spent online. Time spent on social networking sites increased over this period mainly by taking share from web-based email and instant messengers, reflecting its emergence as the primary communication channel for users. Clearly, social networking has evolved over the years to become an integral part of the global online experience, in various ways both mirroring and augmenting the offline social experience (SW 2011).

Fig 1: The Rise of the Global Social Networking Audience. Source: comScore Media Metrix Worldwide, March 2007 – October 2011

Fig 2: Time Spent Online on Key Internet Categories. Source: comScore Media Metrix Worldwide, March 2007 – October

A social network is a set of people, organizations or other social entities connected by a set of social relationships such as friendship, working together or exchange of information. Social network analysis focuses on the pattern of relationships among people, organizations and social entities. This section provides an overview of different social networking sites (Rupam Some 2013).

Flickr

Flickr is a photo-sharing site based on a social network.
Flickr contains over 1.8 million users and 22 million links. Flickr (M. Molloy and B. Reed 1995) is an image and video hosting website, web services suite, and OSN. It provides both private and public storage of images. A user uploading an image can set privacy controls that determine who can view it. A photo can be marked as either public or private. Private images are visible by default only to the uploader, but they can also be marked as visible to friends and/or family. Privacy can also be determined by adding photographs from a user's photo stream to a "group pool": if the group is private, all members of that group can see the photo; if the group is public, the photograph becomes public as well. Flickr also provides a "contact list", which can be used to control image access for a specific set of users in a way similar to the social tier tools of other OSNs (Alan Mislove, Massimiliano Marcon and Krishna P. Gummadi 2007).

Facebook

Facebook is the world's largest social network, with over 350 million active users, half of whom visit the site once per day (FS 2013). It provides a platform on which users who share a common interest, idea, task or goal can interact and develop or maintain personal relationships. It also lets users invite friends and guests to join their events. It shares professional data as well, which can be beneficial for business purposes. Games are also available that can be played individually or in groups across the world. Facebook provides a bulletin board on which users can sell products to and buy products from each other, and companies use it as a means of advertising their products. Facebook launched an API for its platform in 2007, providing a framework for software developers to create applications that interact with core Facebook features. However, the API places several restrictions on access to an individual's full social graph.
Orkut

Orkut is a social networking site run by Google. It is a "pure" social network: the sole purpose of the site is social networking, and no content is shared. Its purpose is to provide an online meeting place where people can socialize, make new connections and find others who share their interests. Features include messaging, text chat, video chat, and the ability to personalize the view of the site using a wide range of colors and themes. Anyone aged 18 or above can join. Orkut is available in 48 languages and is especially popular in Brazil and India. In 2010, Orkut had more than 100 million users worldwide. Brazil accounted for the most visitors with 48%, while 39.2% were from India and 2.2% were from the United States. The site was earlier popular in Iran, but the Iranian government now blocks access to it, claiming it is a threat to national security and Islamic values. The governments of the United Arab Emirates and Saudi Arabia have also blocked access to the site (Alan Mislove, Massimiliano Marcon and Krishna P. Gummadi 2007) (Yong-Yeol Ahn, Seungyeop Han, Haewoon Kwak, Young-Ho Eom, Sue Moon, Hawoong Jeong 2007).

Twitter

Twitter is an OSN and microblogging service that enables users to send and read short 140-character text messages, called "tweets". Registered users can read and post tweets, but unregistered users can only read them. The service quickly gained worldwide popularity, with 300 million registered users in 2012, who posted 340 million tweets per day. The service also handled 1.6 billion search queries per day (LSNW 2010).

LinkedIn

LinkedIn is a business-oriented social networking site, which was founded in December 2002 and launched in May 2003. It has become the fastest means of connecting with professional contacts and sharing knowledge about areas of professional interest, which helps in the exchange of human resources.
Many companies also use this network to hire the professionals they require in various segments. Users can stay in touch with thousands of professionals by linking with one another. As of October 2009, LinkedIn had more than 50 million registered users in more than 200 countries and territories worldwide. LinkedIn allows users to opt out of displaying their network. Compared with other OSN providers, LinkedIn's business model is unique: it controls what a viewer may see based on whether she or he has a paid account (LSNW 2010).

YouTube

YouTube is a popular video-sharing site that includes a social network. The YouTube data reported by Mislove et al. was obtained on January 15th, 2007 and consists of over 1.1 million users and 4.9 million links. Like Flickr, YouTube exports an API, and it allows links to be queried only in the forward direction. Unfortunately, YouTube's user identifiers do not follow a standard format (Alan Mislove, Massimiliano Marcon and Krishna P. Gummadi 2007).

LiveJournal

LiveJournal is a popular blogging site whose users form a social network. It contains over 5.2 million users and 72 million links (Alan Mislove, Massimiliano Marcon and Krishna P. Gummadi 2007).

Cyworld

Cyworld is the largest and oldest online social networking service in South Korea. It began operation in September 2001, and its growth has been explosive ever since. Cyworld's 15 million registered users, as of November 2006, are an impressive number considering South Korea's total population of 48 million. Like any SNS, Cyworld lets users establish, maintain and dissolve a friend relationship (called ilchon) online (Yong-Yeol Ahn, Seungyeop Han, Haewoon Kwak, Young-Ho Eom, Sue Moon, Hawoong Jeong 2007).

MySpace

MySpace is the largest social networking service in the world, with more than 190 million users.
It began its service in July 2003, and the number of users grew explosively. According to Alexa.com, it is the world's 5th most popular website (4th among English websites) (Yong-Yeol Ahn, Seungyeop Han, Haewoon Kwak, Young-Ho Eom, Sue Moon, Hawoong Jeong 2007).

Materials and Methods

Network Size Estimation

This section presents an estimator for the graph size (number of nodes). The estimator uses observations of node pairs that are "far away" from each other in the random walk (S. J. Hardiman, P. Richmond, and S. Hutzler 2009). This assumption is needed to ensure that both nodes in a pair are (approximately) uncorrelated, each drawn from the stationary distribution. Specifically, the estimator examines node pairs whose index distance is at least a threshold $m$. Formally,

$$I = \{(k, l) \mid m \le |k - l| \wedge 1 \le k, l \le r\}$$

The estimator counts weighted neighbor collisions. A neighbor collision is a pair of indices $(k, l)$ such that $v_{x_k}$ and $v_{x_l}$ share a common neighbor. Formally, let $A_i$ be the set of vertices adjacent to $v_i$. Thus, $A_i \cap A_j$ is the set of nodes neighboring both $v_i$ and $v_j$. Given a random walk $(x_1, x_2, \ldots, x_r)$, we define a new variable $\phi_{k,l} = |A_{x_k} \cap A_{x_l}|$. Note that if $(k, l) \in I$, then [equation missing in source]. To see why, consider the following combinatorial proof. For a node $v_k$, the number of connected triplets $(v_i, v_k, v_j)$ with no restrictions on $i$ and $j$ is $d_k^2$, where $d_k$ is the degree of $v_k$. Thus, the total number of connected triplets is

$$\sum_{k} d_k^2$$

Alternatively, for nodes $v_i$ and $v_j$, the number of connected triplets $(v_i, v_k, v_j)$ is $|A_i \cap A_j|$. Thus, the total number of connected triplets can also be expressed by

$$\sum_{i} \sum_{j} |A_i \cap A_j|$$

Crawling large graphs

Crawling large, complex graphs presents unique challenges. This section describes the general approach, following (Alan Mislove, Massimiliano Marcon and Krishna P. Gummadi 2007).

Crawling the entire connected component

The primary challenge in crawling large graphs is covering the entire connected component. At each step, one can generally only obtain the set of links into or out of a specified node. In the case of online social networks, crawling the graph efficiently is important since the graphs are large and highly dynamic. Common algorithms for crawling graphs include breadth-first search (BFS) and depth-first search. Often, crawling an entire connected component is not feasible, and one must resort to using samples of the graph. Crawling only a subset of a graph by ending a BFS early (called the snowball method) is known to produce a biased sample of nodes. In particular, partial BFS crawls are likely to overestimate node degree and underestimate the level of symmetry (L. Becchetti, C. Castillo, D. Donato, and A. Fazzone 2006). In social network graphs, collecting samples via the snowball method has been shown to underestimate the power-law coefficient, but to more closely match other metrics, including the overall clustering coefficient. Some previous studies of social networks have used small graph samples.

Using only forward links

Crawling directed graphs, as opposed to undirected graphs, presents additional challenges. In particular, many graphs can only be crawled by following links in the forward direction (i.e., one cannot easily determine the set of nodes that point into a given node). Using only forward links does not necessarily crawl an entire weakly connected component (WCC); instead, it explores the connected component reachable from the set of seed users.
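The difference between a forward-only crawl and a crawl that can also follow reverse links can be sketched in a few lines of Python. This is only an illustrative sketch: the toy graph, the `bfs_crawl` function and the two neighbor callbacks are assumptions made for this example and do not come from the cited crawls.

```python
from collections import deque


def bfs_crawl(seeds, neighbors):
    """Breadth-first crawl: return every node reachable from the seeds
    when links are discovered through the neighbors(node) callback."""
    visited = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return visited


# Hypothetical toy graph given as forward (outgoing) links only.
FORWARD = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"], "e": ["d"]}

# Reverse index; a real crawler has this only if the site exposes incoming links.
REVERSE = {}
for src, dsts in FORWARD.items():
    for dst in dsts:
        REVERSE.setdefault(dst, []).append(src)


def forward_only(node):
    """Neighbors visible when only outgoing links can be queried."""
    return FORWARD.get(node, [])


def both_directions(node):
    """Neighbors visible when outgoing and incoming links can be queried."""
    return FORWARD.get(node, []) + REVERSE.get(node, [])


forward_reach = bfs_crawl(["a"], forward_only)
full_reach = bfs_crawl(["a"], both_directions)
```

Starting from the seed `a`, the forward-only crawl never discovers `d` or `e`, because they only point into the visited region, whereas the crawl that uses both directions covers the whole weakly connected component of the toy graph.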
The forward-link limitation is typical of studies that crawl online networks, including measurement studies of the Web (S. H. Lee, P.-J. Kim, and H. Jeong 2006).

Fig 3: An example of a directed graph crawl. The users reached by following only forward links are shown in the shaded cloud, and those reached using both forward and reverse links are shown in the dashed cloud. Using both forward and reverse links allows the entire WCC to be crawled, while using only forward links yields a subset of the WCC.

Size Estimation of Facebook

Katzir et al. used two crawls of Facebook. The first crawl consisted of 984,830 uniformly sampled users collected during April 2009. The second crawl was performed during October 2010 and consisted of 988,116 users (M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou 2010). This crawl performed a simple random walk on the Facebook graph and therefore selected users with probability proportional to their degree (Liran Katzir, Edo Liberty, Oren Somekh and Ioana A. Cosma 2012).

Subnetwork size estimation

Since the actual size of Facebook is not known (other than from Facebook's own reports), they first estimate the size of a subgraph whose size is known. They selected a random subset of 1,000,000 Facebook users and tried to estimate the size of this sub-population using the first algorithm. This is done for two reasons: first, to test the subgraph size estimation algorithm; second, to make sure that Facebook's network topology and statistics are suitable for their estimators. They present an error curve, a confidence interval curve, and a comparison curve. These results corroborate that their subgraph size estimators behave almost identically to the complete graph estimators, as expected, since the analysis of the two is essentially identical. A more important finding is that the network topology and node degree distribution of Facebook are indeed suitable for the estimators to perform well.
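The idea of validating a size estimator on a population of known size can be illustrated with a short Python sketch. It draws uniform samples with replacement and estimates the population size from the number of colliding pairs. This is a simplified birthday-paradox estimator for the uniform-sampling case only, not the weighted random-walk estimator of the cited work; the function names and the toy population size are assumptions made for this example.

```python
import random
from collections import Counter


def count_collisions(samples):
    """Count unordered pairs of sample positions that hold the same item."""
    counts = Counter(samples)
    return sum(c * (c - 1) // 2 for c in counts.values())


def estimate_size_uniform(samples):
    """Estimate the population size as (number of pairs) / (number of collisions).

    Assumes the samples were drawn uniformly at random with replacement.
    """
    r = len(samples)
    collisions = count_collisions(samples)
    if collisions == 0:
        raise ValueError("no collisions observed; draw more samples")
    return r * (r - 1) / (2 * collisions)


def demo(true_size=100_000, num_samples=20_000, seed=0):
    """Sample from a population of known size and return the estimate."""
    rng = random.Random(seed)
    samples = [rng.randrange(true_size) for _ in range(num_samples)]
    return estimate_size_uniform(samples)
```

Calling `demo()` compares the estimate against a known true size, which mirrors the purpose of the subnetwork test described above. The estimate becomes noisier as the number of observed collisions falls, so the sample count has to grow roughly with the square root of the population size.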
Estimating the size of Facebook

They now estimate the size of the entire Facebook network. Presenting accuracy plots in this case is not possible, since the true size of Facebook is not known. The uniform Facebook sample collected during April 2009 contains 2053 collisions and 2052 non-unique elements. Substituting these into Equations (1) and (2) yields estimates of 237,197,785 and 236,984,623 users respectively. That same month, Facebook (FS 2013) reported having "more than 200 million active users", and "more than 250 million active users" three months later. The crawl performed during October 2010 contained 4099 collisions and 4064 non-unique users, taking 50 random walk steps between samples. This gives estimates of 475,566,857 and 475,864,724 respectively (FS 2013). At the same time, Facebook reported having "more than 500 million active users". This is summarized in Table 1.

 | April 2009 | October 2010
Sampling distribution | Uniform | Degree
Number of samples | 0.98 · 10^6 | 1 · 10^6
Number of collisions | 2053 | 4099
Number of non-unique | 2052 | 4064
Collision estimator | 237 · 10^6 | 475 · 10^6
Non-unique estimator | 236 · 10^6 | 475 · 10^6
Facebook report | 200 – 250 · 10^6 | 500 · 10^6

Table 1: Crawl details and consequent size estimates of the entire Facebook network for April 2009 and October 2010.

CLUSTERING COEFFICIENT ESTIMATION

There are two types of clustering coefficient estimators: network average and global (Stephen J. Hardiman, Liran Katzir 2013). The main observations used in both are as follows. Given a random walk $(x_1, x_2, \ldots, x_r)$, we define a new variable $\phi_k = A_{x_{k-1}, x_{k+1}}$ for every $2 \le k \le r - 1$. For any function $f(x_k)$ the following holds: [equation missing in source]. The first equality holds due to the law of total expectation. The second equality holds because there are $d_i^2$ equal-probability combinations of $(x_{k-1}, v_i, x_{k+1})$, out of which only $2 l_i$ form a triangle $(v_j, v_i, v_k)$ or a reverse triangle $(v_k, v_i, v_j)$, where $l_i$ is the number of triangles that include $v_i$.
Notice that in a triangle or a reverse triangle $v_j$ is connected to $v_k$ ($A_{j,k} = 1$). The third equality holds due to algebraic manipulation.

Result and Discussion

OSN | Number of Users
Flickr | 1.8 million
Facebook | 350 million
Orkut | 100 million
Twitter | 300 million
LinkedIn | 50 million
YouTube | 1.1 million
LiveJournal | 5.2 million
Cyworld | 15 million
MySpace | 190 million

Table 2: Comparative analysis of various OSNs

Approach | Concept | Performance
Network Size Estimation | Based on random walk | Provides good accuracy in many cases
Crawling large graphs | Based on breadth-first search (BFS) and depth-first search | Provides good performance
Size Estimation of Facebook | Based on subnetwork size estimation | Consistently provides more accurate estimates while using a smaller number of samples
Clustering Coefficient Estimation | Based on network average and global clustering coefficient estimators | The algorithm is strictly more accurate

Table 3: Comparative analysis of various Estimation Techniques

Conclusion

This paper provides an up-to-date evaluation of online social networking sites and of estimation techniques for social networks. The literature has been reviewed with respect to different aspects of the estimation of social networking sites. The survey of recent work in social network analysis shows that there are various open research directions in the estimation of social networking sites.

Acknowledgements

This study is part of dissertation work on the estimation of online social networking sites using clustering techniques. It is self-funded and supported by the Department of Computer Science Engineering, TIT Bhopal, M.P., India.

References

A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener (2000) "Graph Structure in the Web: Experiments and Models". In Proceedings of the 9th International World Wide Web Conference (WWW'00), Amsterdam.

Alan Mislove, Massimiliano Marcon and Krishna P. Gummadi (2007) "Measurement and Analysis of Online Social Networks", IMC'07, San Diego, California, USA.

Liran Katzir, Edo Liberty, Oren Somekh and Ioana A. Cosma (2012) "Estimating Sizes of Social Networks via Biased Sampling", Microsoft Innovations Lab, Israel.

L. Becchetti, C. Castillo, D. Donato, and A. Fazzone (2006) "A Comparison of Sampling Techniques for Web Graph Characterization". In Proceedings of the Workshop on Link Analysis (LinkKDD'06), Philadelphia, PA.

M. Molloy and B. Reed (1995) "A critical point for random graphs with a given degree sequence", Random Structures and Algorithms, 6(2-3), 99: 161-180.

M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou (2010) "Walking in Facebook: A case study of unbiased sampling of OSNs". In Proc. of IEEE INFOCOM '10, San Diego, CA.

Stephen J. Hardiman, Liran Katzir (2013) "Estimating Clustering Coefficients and Size of Social Networks via Random Walk", International World Wide Web Conference Committee (IW3C2), Rio de Janeiro, Brazil.

S. H. Lee, P.-J. Kim, and H. Jeong (2006) "Statistical properties of sampled networks", Physical Review E, 73.

S. J. Hardiman, P. Richmond, and S. Hutzler (2009) "Calculating statistics of complex networks through random walks with an application to the on-line social network", European Physical Journal B, 71(4):611–622.

Rupam Some (2013) "A Survey on Social Network Analysis and its Future Trends", International Journal of Advanced Research in Computer and Communication Engineering 2(6).

Yong-Yeol Ahn, Seungyeop Han, Haewoon Kwak, Young-Ho Eom, Sue Moon, Hawoong Jeong (2007) "Analysis of Topological Characteristics of Huge Online Social Networking Services", International Conference on World Wide Web (WWW'07), pp 835-844.
Facebook Statistics [FS] (2013) http://www.facebook.com/press/info.php?statistics

It's a Social World: Top 10 Need-to-Knows About Social Networking and Where It's Headed [SW] (2011) Available from http://www.comscore.com

List of social networking websites [LSNW] (2010) Available from http://en.wikipedia.org/wiki/list_of_social_networking_websites