Characterizing the YouTube video-sharing community∗
Rodrygo L. T. Santos, Bruno P. S. Rocha,
Cristiano G. Rezende, Antonio A. F. Loureiro
Department of Computer Science
Federal University of Minas Gerais
Belo Horizonte, MG 31270-901 Brazil
{rodrygo,bpontes,rezende,loureiro}@dcc.ufmg.br
ABSTRACT
The YouTube video-sharing community is a recent and successful phenomenon that provides an expressive
representation of a social network. Despite its accelerated growth, a deep study of YouTube’s topology
has not yet been made available. For this work, we have collected a representative sample of YouTube
using our Crawlanga tool and analyzed both its structural properties and the social relationships among
users, among videos, and between users and videos. We analyze properties such as the profile of users and
the popularity of videos in order to highlight the impact of social relationships on a content-sharing
network.
Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Data Mining; J.4 [Computer Applications]: Social and
behavioral sciences
General Terms
Human factors, Measurement
Keywords
Virtual communities, network sampling, network analysis
1. INTRODUCTION
The last decade has witnessed the emergence of several popularity phenomena through the word-of-mouth and
self-publishing made feasible by the World Wide Web. This is true for people, the content they produce,
and the vehicles that distribute their production. Some of these phenomena have declined or have been
replaced as rapidly as they rose, while others have retained a steady pace of growth.
TIME’s Invention of the Year for 2006 [4], the YouTube 1 video-sharing website is one of the most recent
and astonishing examples of such a Web phenomenon. Founded in February 2005, YouTube was officially
launched in December of the same year and has not stopped growing since then. By July 2006, the site
reported serving 100 million videos per day, with a daily upload of more than 65,000 videos and nearly
20 million unique visitors per month, a 29% share of the US multimedia entertainment market and 60% of
all videos watched online [12]. Its storage demands were estimated at around 45 terabytes, with bandwidth
expenses of several million dollars per month [3]. Within one year of its launch, YouTube was purchased
by Google for US$1.65 billion in stock.
∗ Data set will be made available in the camera-ready version.
1 http://www.youtube.com
YouTube’s success can be seen as an example of the “wisdom of crowds” [14]: the site exerts no control
over its users’ freedom to publish 2, so that users not only share their videos with a few friends but
also participate in a huge decentralized community by creating and consuming terabytes of video content,
ranging from home-made stand-up performances to eyewitness footage of news events as they occur anywhere
in the world.
Despite its enormous popularity and the sums of money involved, it is rather surprising that (at least to
our knowledge) no study has been carried out to unveil the virtual community behind YouTube.
In this paper, we present an analysis of the YouTube network, based on a sample that we collected using a
crawler tool. Our analysis focuses on users and videos, and on the attributes and relationships between
them. We observe attributes such as the number of video views, user subscriptions, users’ favorite lists,
comments, and others. We also model the collected data as several networks, each providing a specific
view: for instance, a friendship network between users, and a network of videos connected by edges that
represent membership in the same user’s favorite list.
This paper is organized as follows. In Section 2 we present work on similar networks and virtual
communities. We present some background on the YouTube video-sharing community in Section 3. Sections 4
and 5 detail the crawling process and tool and the data sample we used, respectively. Our analysis of
attributes and relationships is discussed in Section 6. Finally, we present our conclusions in Section 7.
2 According to the site policy, copyrighted or inappropriate content is reviewed after being flagged by
the community.
2. RELATED WORK
The analysis of structural properties of large networks has received much attention in recent years.
Typically, studies include network properties such as degree distribution, diameter, clustering
coefficient, betweenness centrality, network resilience, mixing patterns, degree correlations, community
structure, network navigation, etc. In this section, we briefly outline some publications on the analysis
of large-scale virtual communities, organized as social networks and information networks [11].
Ahn et al. [1] compare structural properties of sampled friendship networks from two social networking
services (SNSs), namely MySpace 3 and Orkut 4, and the entire topology of the Cyworld 5 SNS. They uncover
a two-period scaling behavior in Cyworld’s degree distribution, with the exponent of each period
corresponding to the exponent of the degree distribution of MySpace and Orkut, respectively. Also, they
show how Cyworld’s testimonial network (a subset of its friendship network) presents a degree correlation
similar to that of real-life social networks. Friendship network properties are also studied by Kumar et
al. [6]. They present measurements on two Yahoo! SNSs: Flickr 6, a one-million-node photo-sharing
community, and Yahoo! 360 7, a five-million-node social networking website. They present a model of
network growth by classifying users in these networks as (1) passive, loner members; (2) inviters, who
bring offline friends to form isolated communities; or (3) linkers, who play the role of bridging a large
fraction of the entire network during the network evolution.
Liben-Nowell et al. [8] study the formation of friendship links
in the LiveJournal 8 blogging community. They show that,
among the nearly 500,000 LiveJournal users with mappable
geographic locations (at the level of towns and cities), the
probability of two people being friends is inversely proportional to the number of people geographically close to them.
Also, they find that this property influences the formation
of two thirds of the friendship links among these users and
prove analytically that short paths can be discovered in every network in which it is present. Link formation is also
investigated by Backstrom et al. [2]. They study the influence of network structural properties on the establishment
of community membership links in two large sources of data:
the LiveJournal social networking and blogging service, with
several million members and explicitly defined membership
links, and DBLP 9, a publication database with several hundred thousand authors, in which conferences are regarded as proxies
for communities. They show that the tendency of individuals to join a community is influenced by both the number of
friends they have within the community and how connected
these friends are to one another.
Some works on information network analysis have also employed data sets from virtual communities.
Newman [10] compares structural properties of the co-authorship networks in publication databases from
different areas, including biomedical research, physics, and computer science. He presents results on the
mean and distribution of co-authorship degrees and clustering coefficients for these networks and shows
the presence of the small-world effect in all of them. Kumar et al. [5] characterize the profile of more
than one million LiveJournal users with regard to three main dimensions: age, geography, and interests.
They show how over 70% of friendship links among these users can be explained by combining these three
dimensions. They also investigate the cultural aspect of highly dynamic, local, informal community
formation in the blogspace, through the establishment of short-lived reading, posting, and listing
relationships among small groups of users.
3 http://www.myspace.com
4 http://www.orkut.com
5 http://www.cyworld.com
6 http://www.flickr.com
7 http://360.yahoo.com
8 http://www.livejournal.com
9 http://www.informatik.uni-trier.de/~ley/db/
3. THE YOUTUBE VIDEO-SHARING COMMUNITY
The YouTube video-sharing community can be seen as a
heterogeneous graph with basically two 10 types of node:
user and video.
Users can upload, view, and share video clips. Videos can
be rated, and the average rating and the number of times
a video has been watched are both published. Unregistered
users can watch most videos on the site; registered users
have the ability to upload an unlimited number of videos.
Related videos, determined by the title and tags, appear to
the right of the video. In the site’s second year new functions
were added, providing the ability to post video ‘responses’
and subscribe to content feeds for a particular user or users.
YouTube had (and still has) a lot of traffic coming to the
site to view videos, but far fewer users actually creating and
posting content [15].
Among all the potential relationships present in the YouTube
community, we consider the following in this paper:
• user-user friendship: two users mutually regard each
other as a friend;
• user-user subscription: a user subscribes to video feeds
from another user;
• user-video favoring: a user adds a video to his/her list
of favorites;
• video-video relatedness: a video is regarded as related to
another one by YouTube’s search engine.
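As an illustration only, the node and relationship types listed above can be captured in a small typed edge structure. The names below (`Edge`, `HeteroGraph`, the `kind` labels and the `user:`/`video:` identifier prefixes) are hypothetical and are not taken from our tool; storing friendship in both directions simply follows its definition as a mutual relationship.

```python
from collections import defaultdict
from dataclasses import dataclass

# Relationship kinds considered in this paper; the string labels are illustrative.
SYMMETRIC = {"friendship"}                               # mutual by definition
DIRECTED = {"subscription", "favoring", "relatedness"}   # source -> target


@dataclass(frozen=True)
class Edge:
    kind: str
    source: str   # e.g. "user:alice" or "video:v1" (hypothetical identifiers)
    target: str


class HeteroGraph:
    """Heterogeneous graph with user and video nodes and typed edges."""

    def __init__(self):
        self.edges = defaultdict(set)   # kind -> set of Edge

    def add(self, kind, source, target):
        if kind not in SYMMETRIC | DIRECTED:
            raise ValueError(f"unknown relationship: {kind}")
        self.edges[kind].add(Edge(kind, source, target))
        if kind in SYMMETRIC:
            # Friendship is mutual, so store the reverse direction as well.
            self.edges[kind].add(Edge(kind, target, source))

    def neighbors(self, kind, node):
        """Targets reachable from `node` through edges of the given kind."""
        return {e.target for e in self.edges[kind] if e.source == node}
```

Keeping one edge set per relationship makes it straightforward to extract a single view of the community, such as the friendship network alone.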
4. CRAWLING YOUTUBE
Due to the amount of data required to analyze YouTube, an automated tool such as a web crawler is
necessary to collect it. A web crawler needs to visit the web pages of videos and user profiles. It must
be able to follow links representing relationships, such as user friendship or commenting, and store the
information on visited nodes and followed edges in a format that can be further analyzed. Since a large
amount of data is needed, the tool must be efficient and scalable.
10 YouTube also features collective entities, namely groups
and contests, but they will not be considered in this work.
In order to crawl YouTube, we have used our own tool. This tool is not only a crawler, but also an
extractor, which generates a graph representation of the network. It is model-driven, in the sense that
it reads a network model file containing HTML patterns of the network to be crawled. By creating a
network model for YouTube and setting up a crawling infrastructure, we were able to collect large
portions of YouTube. Our crawler and extractor is presented in detail in [13].
Our tool uses the snowball sampling method [7], which can be run with a single seed node or with multiple
seeds. For this work, we used the single-seed approach. In single-seed snowball sampling, we first choose
a single node and pick all the nodes directly linked to it. Then all the nodes connected to those picked
in the last step are selected, and this process continues until the desired number of nodes is sampled.
To control the number of nodes in the sampled network, the necessary number of nodes is randomly chosen
from the last layer. This is similar to a breadth-first crawling process [9].
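The procedure described above can be sketched as follows. This is a minimal illustration and not the code of our tool: it assumes the network is already available as an in-memory adjacency dictionary (`neighbors`), whereas a real crawler would fetch each node's links from web pages. The function name, its parameters and the fixed random seed are assumptions made for the example.

```python
import random


def snowball_sample(neighbors, seed, max_nodes, rng=None):
    """Single-seed snowball sampling; returns the sampled nodes grouped by layer.

    `neighbors` maps a node to an iterable of adjacent nodes.
    """
    rng = rng or random.Random(0)
    layers = [[seed]]
    seen = {seed}
    while len(seen) < max_nodes:
        # Collect every not-yet-seen node linked to the previous layer.
        frontier, in_frontier = [], set()
        for node in layers[-1]:
            for nxt in neighbors.get(node, ()):
                if nxt not in seen and nxt not in in_frontier:
                    frontier.append(nxt)
                    in_frontier.add(nxt)
        if not frontier:
            break  # nothing new is reachable from the last layer
        room = max_nodes - len(seen)
        if len(frontier) > room:
            # Last layer: keep a random subset to reach the target size.
            frontier = rng.sample(frontier, room)
        seen.update(frontier)
        layers.append(frontier)
    return layers
```

Returning the layers, rather than a flat node set, keeps the hop distance of every node from the seed, which is convenient when the sample is later described layer by layer.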
We performed more than one crawl, using different network model files. The purpose was to collect
different views of the network, considering specific relationships in each crawl. Our crawling
infrastructure consisted of a single Pentium 4 3.2 GHz server with 2 GB of RAM and 6 client machines
with similar processors but 1 GB of RAM (our tool uses a client-server model).
5. DATA SAMPLE
An important issue in any analysis of a collected network is the validation of the gathered sample. The
YouTube network is composed of millions of nodes, and the task of collecting all of them is extremely
hard. Therefore, only a part of the network is actually collected. For this reason, it is fundamental
that the crawled fraction represents the behavior of the whole network.
There are several studies about sampling methods which guarantee that a small collected fraction of the
network represents its entire behavior. The snowball sampling method [7] is a well-known method that
reliably collects a part of a network that reflects the behavior of the whole network. The method starts
with a single seed node and follows the relationships to discover new nodes in a breadth-first search
fashion. Even though some studies mention the snowball method with multiple seeds, in this work we used
the single-seed version since it is more widely used and acknowledged.
For the crawling process we used the notions of nodes and their relationships. A node is a user or a
video of the YouTube network, and the relationships are friendship, favoring, subscription and
publication, on the users’ side, and relatedness and ownership, on the videos’ side.
The gathered nodes can be grouped in layers, where nodes belong to the same layer if they are at the same
distance (in number of hops in the crawling process) from the seed. By this definition, in the zero layer
we have the seed (which was a video), in the first layer 37 nodes (1 user and 36 videos), in the second
3,546 nodes (36 users and 3,510 videos), in the third 98,428 nodes (1,799 users and 96,629 videos) and in
the last 236,003 nodes (10,996 users and 225,007 videos). As expected, the number of nodes in each layer
grows exponentially.
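As a small worked example, the layer sizes reported above can be tabulated and compared from one layer to the next. The counts are transcribed from the preceding paragraph, in the order in which the layers are listed; the variable names are illustrative.

```python
# (users, videos) per layer, in the order listed above; layer 0 is the seed video.
layers = [(0, 1), (1, 36), (36, 3510), (1799, 96629), (10996, 225007)]

# Total number of nodes in each layer.
totals = [users + videos for users, videos in layers]

# Growth factor of each layer relative to the previous one.
growth = [round(b / a, 1) for a, b in zip(totals, totals[1:])]

print(totals)   # [1, 37, 3546, 98428, 236003]
print(growth)   # [37.0, 95.8, 27.8, 2.4]
```

The much smaller factor for the final layer may reflect the random selection applied to the last layer in order to control the sample size, as described in Section 4.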
Our sample has a total of 338,001 collected nodes (12,832 users and 325,179 videos) and 12,131,796
indexed nodes (625,383 users and 11,506,413 videos). This means that, through the more than 300 thousand
nodes collected, more than 12 million other nodes were found via the modeled relationships.
6. ANALYSIS OF YOUTUBE
The crawling process resulted in a dump file containing a graph representation of the more than three
hundred thousand nodes collected. From this data, several kinds of information can be extracted, and our
objective is to analyze the impact of real-world relations in a technological environment.
The data can be split into two kinds: attributes and edges. Even though our collected sample is just a
fraction of the entire network, attributes reflect properties of the whole network, since they are
derived from data provided by the YouTube database. In contrast, the edges compose a network containing
only the collected nodes. However, as the crawling process followed the snowball method, these partial
networks reflect properties of the whole network.
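The distinction between attributes and edges can be made concrete with a short sketch. It assumes, purely for illustration, that edges are stored as pairs of node identifiers and that the set of collected nodes is known; the function names are ours and do not describe the dump file format.

```python
def induced_edges(edges, collected):
    """Keep only edges whose two endpoints were both collected."""
    collected = set(collected)
    return [(u, v) for (u, v) in edges if u in collected and v in collected]


def degree_counts(edges):
    """Undirected degree of every node that appears in `edges`."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree
```

A degree computed this way counts only neighbors inside the sample, so it can be lower than the corresponding attribute published by YouTube for the same node.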
6.1 Attributes of Nodes
As described earlier, the two types of nodes considered are users and videos, each of them containing
attributes useful for our analysis. Some attributes provide information about network properties that
are related to human interaction, such as channel views, number of subscribers and number of videos
watched, for users, and number of times favorited, number of views and number of comments, for videos.
The distributions of these attributes, as well as the degree distributions of the networks formed, were
all plotted on log-log scale graphs in order to identify the presence (or absence) of power-law
distributions. Several works have reported power-law degree distributions and how they are related to
some real-world properties. In a power-law distribution, a few values have high frequency while many
values have low frequency, yet the latter still make up a substantial part of the distribution (the
heavy-tail phenomenon). When observed over social relationships, these distributions often imply a
“preferential attachment” scenario, where nodes in the network tend to attach to certain more “popular”
nodes.
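A minimal sketch of how the data for such log-log plots can be prepared, assuming the attribute values are available as a list of positive integers. The function names are ours for illustration; `X=k` and `X>=k` correspond to the two curves plotted in the figures. The least-squares slope on the log-log points is only a rough indicator of a power-law exponent, not a rigorous fit.

```python
import math
from collections import Counter


def frequency(values):
    """Number of observations with X = k, for each observed k."""
    return dict(sorted(Counter(values).items()))


def cumulative(values):
    """Number of observations with X >= k, for each observed k."""
    freq = frequency(values)
    result, running = {}, 0
    for k in sorted(freq, reverse=True):
        running += freq[k]
        result[k] = running
    return dict(sorted(result.items()))


def loglog_slope(dist):
    """Least-squares slope of log(count) against log(k), using k > 0 only.

    Needs at least two distinct values of k.
    """
    pts = [(math.log(k), math.log(c)) for k, c in dist.items() if k > 0 and c > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx
```

For example, a toy sample in which the frequency halves every time k doubles lies on a straight line of slope -1 in log-log scale, which is the signature looked for in the plots.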
Figures 1, 2 and 3 show general statistics and the distributions of attributes of users and videos,
respectively. In the general statistics we can observe that the distributions of user ages and video
durations can be modeled as normal distributions. In the distribution of user nationalities, the U.S. is
by far the most frequent, while there is a heavy tail composed mostly of European countries. The
distribution of video categories is more balanced, with entertainment, comedy and music as the most
popular, which is coherent with the distribution of user ages (the majority of users are young people
between 17 and 26 years old). The distributions of the attributes of users and videos follow power laws
and are discussed in the following paragraphs.
[Figure 1: Statistics. Panels: (a) Users age; (b) Users Nationality; (c) Videos Categories; (d) Videos Duration.]
[Figure 2: User Network Attributes Distribution. Log-log plots of Frequency (curves X=k and X>=k). Panels: (a) Number of Videos Watched; (b) Channel Views; (c) Number of Subscribers.]
Videos watched
The number of videos watched by a user indicates how much he/she uses the YouTube service. As can be seen
in Figure 2(a), most users have watched a small number of videos. However, there are a few users who use
the YouTube service intensively, characterizing the distribution of videos watched as a power law. This
attribute accounts only for views by logged-in users (many viewers do not even have a user account).
Number of subscribers
Users’ reputation is strongly connected to the videos they publish. A metric that can quantify this
reputation is the number of subscribers a user has. When someone subscribes to a user, he/she asks to be
notified every time a new video is published by this user. The distribution of this attribute is shown
in Figure 2(c).
Channel views
Channel views are also connected to users’ reputation, but this connection is weaker than that of the
number of subscribers. This is because a user can be popular for a period of time and accumulate a large
number of channel views but later lose his/her reputation while still retaining a high value of channel
views. Figure 2(b) shows the distribution of the users’ channel views.
[Figure 3: Videos Network Attributes Distribution. Log-log plots of Frequency (curves X=k and X>=k). Panels: (a) Number of Comments; (b) Number of Views; (c) Number of Times Favorited.]
Videos views
One important characteristic of a video is its popularity. Video views indicate the all-time popularity
of a video, since they do not take into account when the views took place. As shown in Figure 3(b), the
distribution is “normal-like” for videos with up to around 50 views. Beyond that, the distribution
behaves like a heavy-tailed power law.
Users comments
The number of user comments on a video is related to how controversial it is. The more controversial a
video is, the more users will post comments and discuss the video’s content. The distribution of the
number of user comments can be seen in Figure 3(a).
Number of times favorited
A stronger metric of popularity (compared to video views) is the number of users that included the video
in their favorites list. This attribute is more suitable because it reflects the current status of the
video and not a past popularity. Furthermore, adding a video to a favorites list not only tells us that
a user watched the video but also that he/she enjoyed it. Figure 3(c) shows the distribution of this
attribute on the YouTube network.
6.2 Relationships
It is important to analyze the impact of human interaction in a technological environment. In the YouTube
community there are two major ways in which users relate to each other: through friendship and through
subscription. Both relationships were extracted from the collected data, and the resulting networks were
analyzed. These networks were studied through the analysis of the degree distribution, number of nodes,
clustering coefficient (number of triangles in the first neighborhood), longest shortest-path (L1) and
average shortest-path (L2).
We also identify two relationships between videos. The first is relatedness, which relates similar videos
through the use of tags and keywords. Although this relationship is generated by a machine (YouTube
generates related lists automatically), it is influenced by social relations, since a video can have a
variable number of related videos, depending on its associated keywords. For instance, a video tagged as
a soccer video, a very popular category, is likely to have many related videos. The second relationship
between videos is given by favorite lists. Since users can add videos to their favorite lists, we can
form a network of videos and connect every two videos that are present on the same favorite list. This
allows us to identify clusters of videos and detect relatedness in a different fashion than the first
relation.
Figures 4 and 5 show the degree distributions of the user and video relationships, respectively. The
following paragraphs further detail these relationships.
[Figure 4: Degree Distribution of User Networks. Log-log plots of Frequency (curves X=k and X>=k). Panels: (a) Friendship, x-axis Number of Friends; (b) In-degree in Subscription, x-axis Number of Users.]
[Figure 5: Degree Distribution of Videos Network. Log-log plots of Frequency (curves X=k and X>=k). Panels: (a) Relatedness, x-axis Number of Related Videos; (b) In-degree in Favoring, x-axis Number of Times Favorited.]
Table 1: Networks Properties
Network       #Nodes  CC        L1  L2
Friendship    9,963   0.264221  10  2.76779
Subscription  8,575   0.176046  10  3.03550
Friendship
Through data collected from user nodes, it was possible to create a graph where vertices are users and an
edge is created between two users when they are friends of each other (in the YouTube context).
Therefore, this network represents how users are related to one another, and the degree distribution
characterizes the popularity of users in the YouTube community.
Figure 4(a) shows a scatter plot of the degree distribution of the friendship network. The non-cumulative
distribution (X=k) behaves somewhat erratically because some nodes have odd degrees even though the
friendship relationship in YouTube is reciprocal (degrees should all be even because they are the sum of
in- and out-degrees). The odd degrees happen because user accounts can be suspended. When the crawler
tries to collect a suspended user, it gathers only the user identification and stores it in the dump;
hence, the suspended user’s friendship list is not collected, which results in an odd degree for the
friend user. However, the cumulative distribution diminishes the impact of these users and behaves like a
power-law distribution.
In Table 1, some of the friendship network properties are listed. A network that has a high clustering
coefficient (CC) and a small diameter is called a Small-World network. This kind of network seems to
emerge from many different human interactions and is well known to be easily navigable and to have dense
local clusters. Small-World networks merge two desirable properties of two famous types of graphs: the
small diameter of random graphs and the high clustering coefficient of regular graphs (lattices).
Subscription
The subscription network was built upon the data collected from user nodes. From each user we gathered
the list of users whose newly added videos are notified to the collected user. This is an important
network since it describes how users are interested in the content published by other users. Nodes with
high degree in this network are authorities whose published videos will be watched by a large audience.
The in-degree distribution of this network represents the distribution of the number of subscribers among
YouTube users. This distribution is shown in Figure 4(b), and it behaves like a power-law distribution.
The difference between this power law and the one found in Figure 2(c) is due to the fact that both
pieces of information are incomplete. Although the attributes relate to the whole network, our sample
does not contain all the nodes needed to plot the real distribution. As the built networks consider only
the edges between collected nodes, they are a fraction of the whole YouTube community.
Table 1 shows some properties of the subscription network. As can be seen, it has a high clustering
coefficient as well as a small diameter, which are properties of Small-World networks.
Relatedness
YouTube provides a way of finding videos related to each other. This relationship defines a relatedness
network where an edge exists between two videos if they are related by this engine. This data was
collected from links in the main page of each video.
Although the search engine used to find related videos makes use of tags previously specified by human
users, the mechanism that defines this relationship is not based on human interaction. Therefore, the
edges are defined based on the recall of an algorithm.
The degree distribution of the relatedness network can be seen in Figure 5(a). Probably because of the
algorithmic source of the relationships, the distribution does not behave properly as a power law; it
does so only in some parts of the distribution.
Favoring
Another kind of relationship present in the YouTube network is the one formed when a user adds a video to
his/her favorites list. This list is extracted from the user profile and is composed of a list of video
identifiers. Through this relationship we can build a bipartite graph, where every edge connects a user
and a video. One interesting network that can be derived from the favoring network is the one whose nodes
are videos, with an edge between two videos if they appear in the favorites list of the same user. A
comparison between this co-favoring video network and the relatedness network could be used to analyze
the effectiveness of the YouTube related search engine.
Unlike in the other networks, the most relevant distribution in this network is the distribution of the
in-degrees. This distribution is plotted in Figure 5(b) and evidences a power-law distribution. This
power law differs from the one found by the analysis of the attribute “number of times favorited”. This
is due to the same reasons for which the two subscriber distributions differed from each other (as
mentioned before). However, the slopes of the two distributions are close to one another.
7. CONCLUSIONS
In this work we presented a characterization of the YouTube video-sharing virtual community. We collected
a sample of this network using our own crawler and performed different analyses over this data.
By analyzing attributes and relationships we could see how this technological network has a distribution
of content extremely influenced by social relationships. Views of videos, relations among users, and
other properties have statistical distributions that follow power-law functions, showing evidence of
Small-World models and preferential attachment scenarios.
As we do not have knowledge of any other work on the YouTube network, we present a first step towards
characterizing this important virtual community. Our results confirm that YouTube is, like social
networks such as Orkut and MySpace, a technological network whose topology and connections are heavily
influenced by human social behavior.
8. REFERENCES
[1] Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong.
Analysis of topological characteristics of huge online
social networking services. In WWW ’07: Proceedings
of the 16th international conference on World Wide
Web, pages 835–844, Banff, Alberta, Canada, 2007.
ACM Press.
[2] L. Backstrom, D. Huttenlocher, J. Kleinberg, and
X. Lan. Group formation in large social networks:
membership, growth, and evolution. In KDD ’06:
Proceedings of the 12th ACM SIGKDD international
conference on Knowledge discovery and data mining,
pages 44–54, Philadelphia, PA, USA, 2006. ACM
Press.
[3] L. Gomes. Will all of us get our 15 minutes on a
YouTube video? The Wall Street Journal, August
2006. Available: http://online.wsj.com/public/article/
SB115689298168048904f92aczYTlCtKrTSiZ8vumR3eZCI 20070830.html.
[4] L. Grossman. TIME Best Inventions 2006. TIME,
November 2006. Available: http://www.time.com/
time/2006/techguide/bestinventions.
[5] R. Kumar, J. Novak, P. Raghavan, and A. Tomkins.
Structure and evolution of blogspace. Communications
of the ACM, 47(12):35–39, 2004.
[6] R. Kumar, J. Novak, and A. Tomkins. Structure and
evolution of online social networks. In KDD ’06:
Proceedings of the 12th ACM SIGKDD international
conference on Knowledge discovery and data mining,
pages 611–617, Philadelphia, PA, USA, 2006. ACM
Press.
[7] S. H. Lee, P.-J. Kim, and H. Jeong. Statistical
properties of sampled networks. Physical Review E,
73:016102, 2006.
[8] D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan,
and A. Tomkins. Geographic routing in social
networks. Proceedings of the National Academy of
Sciences, 102(33):11623–11628, 2005.
[9] M. Najork and J. L. Wiener. Breadth-first crawling
yields high-quality pages. In WWW ’01: Proceedings
of the 10th international conference on World Wide
Web, pages 114–118, Hong Kong, Hong Kong, 2001.
ACM Press.
[10] M. E. J. Newman. The structure of scientific
collaboration networks. Proceedings of the National
Academy of Sciences, 98:404, 2001.
[11] M. E. J. Newman. The structure and function of
complex networks, 2003.
[12] Reuters. YouTube serves up 100 million videos a day
online. USA TODAY, July 2006. Available:
http://www.usatoday.com/tech/news/
2006-07-16-youtube-views x.htm.
[13] B. P. S. Rocha, R. L. T. Santos, C. G. Rezende,
A. A. F. Loureiro, and V. A. F. Almeida.
Model-driven crawling and extraction of web-based
virtual communities. Submitted, May 2007.
[14] J. Surowiecki. The Wisdom of Crowds: Why the Many
Are Smarter Than the Few and How Collective
Wisdom Shapes Business, Economies, Societies and
Nations. Doubleday, May 2004.
[15] Wikipedia. YouTube, May 2007. Available:
http://en.wikipedia.org/wiki/YouTube.