A Test of Media Capture by Using Machine Learning Techniques

Evidence from Italian Television News 2010-2014
Andrea De Angelis∗
Alessandro Vecchiato†
September 9, 2016
VERY PRELIMINARY. PLEASE DO NOT CIRCULATE.
Abstract
A central question in political communication concerns the existence and extent of media bias and how it can affect political stability and electoral outcomes. We address this question using novel methodologies based on machine learning techniques and an original dataset collecting the entire corpus of national TV news outlets (including the Rai, Mediaset, and “LA7” TV networks) in Italy from 2010 to 2014. Textual models can perform linguistic and substantive analysis of large corpora of text by exploiting variation in language use across and within authors, documents and time. In this paper we first estimate ideology scores for each TV outlet and analyze their change over the period under study. Second, we exploit the discontinuity in public TV ownership in our data to determine the existence of media bias in the news outlets as a way to make public TV a more “favorable” environment for the incumbent right-wing government. Finally, we identify key news topics and track their salience over time and across networks. This methodology allows us to test whether political leaning in the news is due to strategic issue selection in news coverage or to differing frames in the communication of news stories.
∗ Andrea De Angelis, European University Institute, Dpt. of Political and Social Sciences, via dei Roccettini 9 I50014, San Domenico di Fiesole (Italy); [email protected];
† New York University, Wilf Family Dept. of Politics, 19 W 4th Street, Room 302, 10012 New York, NY; [email protected];
1 Introduction
Several papers find a remarkable correlation between the set up of the media landscape and
political outcomes in the US and around the world (see e.g. Djankov et al. (2003); McMillan
and Zoido (2004); Reinikka and Svensson (2005); Gentzkow and Shapiro (2010); Durante and
Knight (2012); DellaVigna et al. (2016)).
To understand the complex interaction between media outlets and political actors, both economists and political scientists have examined the specific factors that shape incentives to manipulate the quality and content of the news that reaches voters: media owners can exploit political connections to reach funding resources outside the advertisement market, while politicians benefit from a favorable media arena that reduces voters’ monitoring ability and therefore their own accountability¹ (Besley and Prat, 2006; Prat and Strömberg, 2013).
One crucial problem is how the ownership structure affects news selection and framing to favor a specific political party (Strömberg, 2015). This paper follows the literature on media and explores the role of news and issue framing as an alternative measure of bias. Research in economics suggests that media outlets show a strong liberal leaning. Groseclose and Milyo (2005) report that in the universe of US news media, only Fox News and the Washington Times received scores to the right of the center. Bias can be driven by owners’ preferences or by audience demand². Media outlets can deliberately modify their language by including political slant in order to attract readers with similar political views (Gentzkow and Shapiro, 2010). Alternatively, they may follow the ideological sensibility of their ownership or of a political party they are trying to support. In this last case, bias emerges as a result of political capture of the media. We focus on this case and provide evidence of media capture by exploiting quasi-exogenous variation in media ownership.
To investigate this relationship we face several challenges. First, the ideological position of a media outlet is very difficult to estimate. A number of papers have resorted to various techniques that detect ideological correlates in newspaper language usage. Groseclose and Milyo (2005) consider a group of 200 prominent think tanks and policy groups and count the times a particular member of Congress cited one of them. They perform the same procedure for a number of newspapers and other media outlets and assign an ideological score to each of them on the basis of the frequency with which each think tank was mentioned. This procedure allows them to link the ideological bias of media outlets to that of other political actors and so derive an ideological measure of the media. A more recent paper by Gentzkow and Shapiro (2010) focuses in a similar fashion on media slant. To assign a particular ideological leaning to specific language, they refer to the Congressional Record and identify the sets of phrases that are used much more frequently by one party than by the other. They then calculate the number of times each outlet resorts to language that may sway voters to the left or the right of the political spectrum and assign to each outlet the corresponding ideological score. However, while these measures may be able to capture newspapers’ ideological leaning, they focus on one specific way in which a media outlet may try to influence reporting.
¹ For a comprehensive review of the results on media and politics read Strömberg (2015).
² See below for more extensive results on theoretical models of bias.
We overcome this difficulty by adopting a new unsupervised machine learning technique first introduced by Slapin and Proksch (2008). WORDFISH is a scaling algorithm that estimates policy positions based on word frequencies in texts. Following the naïve Bayes assumption prevalent in the text analysis literature (Eyheramendy et al., 2003), this algorithm represents a text as a vector of word counts. Individual words are assumed to be distributed at random, and word frequencies to be generated by a Poisson process. The procedure treats each piece of text as the expression of a separate ideological position and, based on word frequencies, estimates the relative weight of each word in discriminating among ideological positions, together with the ideology score of each document. All parameters are estimated simultaneously for the entire corpus of text. The model can be expressed as follows:
$$y_{ijt} \sim \mathrm{Poisson}(\lambda_{ijt}) \tag{1}$$
$$\lambda_{ijt} = \exp(\alpha_{it} + \psi_j + \beta_j \cdot \omega_{it}) \tag{2}$$
where $i$ indexes documents, $j$ the tokens (i.e. word stems when only unigrams are used, or n-grams), and $t$ time. The only Poisson parameter, $\lambda_{ijt}$, is modelled as a function of four latent components: $\alpha_{it}$ are document-specific fixed effects, $\psi_j$ are word-specific fixed effects (capturing the relative frequency of the words), $\beta_j$ are word-specific discrimination weights, capturing the ability of words to discriminate between ideological positions, and finally $\omega_{it}$ is the latent position of the document. This approach allows us to estimate relative scores not only across political actors, based on relative word frequencies, but also, by assuming independence across texts, over time. The omega scores thus represent measures of ideology within the spectrum of the available texts and are allowed to change over time.
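As a minimal sketch of how equations (1) and (2) translate into a likelihood, the following Python fragment evaluates the Poisson log-likelihood of a small count table for given parameter values. It is an illustration only, not the estimation routine used in the paper: the function names and the toy numbers are assumptions, and the time index is dropped by treating every document as its own row.

```python
import math

def wordfish_rate(alpha_i, psi_j, beta_j, omega_i):
    """Poisson rate of equation (2) for one document i and one token j."""
    return math.exp(alpha_i + psi_j + beta_j * omega_i)

def poisson_log_pmf(count, rate):
    """Log-probability of observing `count` under a Poisson(rate) model."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)

def log_likelihood(counts, alpha, psi, beta, omega):
    """Sum of Poisson log-probabilities over all (document, token) cells.

    counts[i][j] is the count of token j in document i; alpha and omega
    have one entry per document, psi and beta one entry per token.
    """
    total = 0.0
    for i, row in enumerate(counts):
        for j, y in enumerate(row):
            rate = wordfish_rate(alpha[i], psi[j], beta[j], omega[i])
            total += poisson_log_pmf(y, rate)
    return total

# Toy illustration (made-up numbers): two documents, two tokens.
toy_counts = [[5, 1], [1, 5]]
toy_alpha = [0.0, 0.0]
toy_psi = [1.0, 1.0]
toy_beta = [1.0, -1.0]  # token 0 signals one pole, token 1 the other
ll_separated = log_likelihood(toy_counts, toy_alpha, toy_psi, toy_beta, [0.8, -0.8])
ll_pooled = log_likelihood(toy_counts, toy_alpha, toy_psi, toy_beta, [0.0, 0.0])
```

In this toy case the likelihood is higher when the two documents are placed on opposite sides of the latent dimension than when both are placed at zero, which is the kind of signal the scaling procedure exploits.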
A second challenge comes from the intrinsically unobservable nature of media capture practices. Politicians may influence the media through bribes, legal favoritism or the appointment of sympathetic managers. McMillan and Zoido (2004), for instance, use a secret police account of government bribes to investigate corruption in Peru. They find that the media, and especially TV channels, received the largest share of bribes during the Fujimori regime. Tella and Franceschelli (2011) use government advertising practices as a proxy for favoritism. They find that newspapers with government advertising are less likely to talk about government corruption, with a one standard deviation increase in government advertising being associated with a decrease in coverage of corruption scandals of 0.23 of a front page per month. Nevertheless, these measures do not allow for a systematic study of capture, given their rarity or their underestimation of the extent of corruption (advertising may be only one of the strategies the government uses to favor specific actors).
Our setup provides a number of solutions to these challenges. The Italian media landscape has been widely criticized for its general lack of impartiality, to the point of blatant political bias. The public ownership of three media channels, Rai 1, Rai 2 and Rai 3, gives the government the prerogative of nominating their CEOs and newscast editors. Grasso (2004) describes in detail the historical process that led to the politicization of Italian State television. In 1975, the government reformed³ the national television system, shifting the prerogative of nominating the TV managers from the government to the parliament. This led to a practice called ‘lottizzazione’ (lotting) that resulted in the consistent nomination as directors and editors of figures who were politically affiliated with specific national parties. Thus, each channel had a well-defined (though not formally stated) political affiliation: Rai1 typically supported the incumbent government (Democrazia Cristiana), Rai2 supported the Socialist Party, and Rai3 the Communist Party (or the far left). After 1993 this practice evolved toward a stronger focus on television newscasts. After each election, with a new parliament and government in place, the editors of the national channels (particularly Rai1 and Rai2) were typically replaced with journalists or political figures closer to the new establishment.
³ Legge n. 103 del 14 aprile 1975 in matters of national television broadcasting.
Figure 1: Patrick Chiappori on Berlusconi Resignation. Source: New York Times
This problem was exacerbated by the candidacy of Silvio Berlusconi, a media and real estate tycoon, who entered the political arena in the aftermath of the Mani Pulite scandal in 1993. When in power, Berlusconi never completely divested his ownership of three TV channels, Canale 5, Italia 1 and Rete 4, officially controlling, either directly or indirectly, 86% of the media market in Italy. DellaVigna et al. (2016) study the Italian case from the perspective of the advertising market. Given that in Italy government officials are not required to divest business holdings, they are able to test whether, during the years of Berlusconi’s governments, advertising was strategically directed toward his networks as a form of political support. They find that during Berlusconi’s political tenure the profits of Mediaset (his TV company) increased by one billion euros.
Our setup exploits a similar empirical strategy to detect government influence on television. In the aftermath of the financial crisis of 2008, Berlusconi’s government faced intense pressure from the international markets to reduce the growing national debt. His failure to handle the situation and to reduce the national bond spread over the Euro zone led to his resignation on November 12, 2011. We exploit this quasi-exogenous variation in government power, due to his rapid decline, to detect changes in media ideological leaning as a consequence of capture. By comparing the ideological scores of TV news before and after his resignation, and each time a network director was replaced, we are able to provide significant evidence of capture in the Italian media.
Another advantage of our setup comes from the use of a novel dataset of TV news transcriptions collected by the Italian service Teche Rai. The data cover the universe of TV news in Italy in the period 2010-2014. That is, we are able to derive an ideological score for each national TV network news service⁴ that aired each day in the specified time frame. The extent of these data, matched with our methodology, gives us numerous advantages with respect to previous research. First, we do not have to resort to methods that hand-code network ideology and are thus sensitive to researcher discretion. Second, we do not have to select specific words as representative of ideological leaning from which to estimate media ideology. Instead, we simultaneously estimate the ideological scores of words, and we are able to select ex post the most ideologically charged expressions. Finally, we are able to track ideological movements over time and with the highest degree of granularity⁵.
This paper provides a number of new results. We find that Italian newscasts have significantly different political leanings, which map broadly onto the Italian political spectrum: Berlusconi’s TV networks map toward the right, while historically leftist channels map to the left. This evidence is consistent with previous results by Gentzkow and Shapiro (2010) and Groseclose and Milyo (2005), who report significant political leaning for American newspapers. Differently from previous research, our work shows that this differentiation is due not to viewers’ demand but to political capture. We support this conclusion with two different empirical strategies. First, we provide evidence of strategic reporting by the TV networks. More pro-government networks tend to substitute economic issues, which were particularly problematic during the financial crisis, with crime and entertainment reporting. We thus find differences across networks not in issue framing but only in which issues receive more emphasis. Second, we identify specific structural breaks in news reporting and match them with the political agenda. We find that, consistently with the change in government in 2011, the average score of the TV newscasts shifted significantly to the left of the spectrum. We take these results as robust evidence of capture in the Italian television system.
This paper is structured as follows. Section 2 describes the data and the methodology in
detail. Section 3 provides the baseline results on ideology for both media and MPs.
Section 4 shows the ideological change around the temporal discontinuity in 2011 and runs
a battery of tests to confirm the graphical result. Section 5 concludes.
2 Data
2.1 Teche Rai News Database
The main source of data for this study is the Teche Rai Service, the historical archive of Italian radio and television broadcasts implemented by the State broadcasting company RAI. The data were produced from the audio-video files with an Automatic Newscast Transcript System (ANTS), a platform specifically targeted at news programs. The resulting transcription quality is about 90% correct recognition. Also, since the text is synchronized with the multimedia signal, given a word the researcher can have immediate access to the segment where it is pronounced. In addition, a validator performs a segmentation of the signal based on the speech footprint of the speaker. The resulting transcripts are good enough to be used for text-based search and information discovery by, for instance, full-text search engines and artificial intelligence techniques. The project has now completed the transcription of the corpus of newscasts from 2010 to 2014.
⁴ Italy has seven main national TV channels, three publicly owned (Rai1, Rai2, and Rai3) and four private (Canale5, Rete4, Italia1 and La7). The media landscape has not changed since then.
⁵ As we later explain, we proceed by grouping TV transcriptions by week.
These data represent a unique, verbatim source for text analysis, with the most detailed level of granularity. In its original form, a text document is a segment on a particular topic within a newscast. Each of the seven Italian broadcasting networks typically runs three daily newscasts of about 30 minutes (morning, noon and evening service). The total amount of data collected in this database therefore amounts to almost 20,000 hours of broadcasting divided into 319,895 segments.
The dataset presents a number of convenient features. In addition to the transcriptions of the newscast services, it contains metadata that organize them and ease the analysis, such as the network, the starting and ending time of each segment, and the subject⁶.
In order to make sure that our ideological scores come exclusively from journalistic reporting, we delete from the dataset all transcriptions of speech by politicians. That is, all politicians’ interviews, remarks and otherwise recorded speech are excluded from our analysis. We keep instead any journalistic commentary on those parts. Overall, all the text we analyze consists of chronicles and opinions from journalists and reporters. In this sense, our study assumes a hypothetical perfect pluralism as for the presence of politicians on TV, which is the main criterion adopted by the Italian Communication Authority (AGCOM)⁷.
2.2 Legislative Speeches
We preliminarily test our methodology with an application to unambiguously ideological actors: the Italian MPs. For the analysis of the legislative speeches we exploit an original dataset developed by the authors. We scraped from the official website of the Italian Chamber of Deputies⁸ the entire corpus of the debates of the XVI legislature (corresponding to the years 2008-2013) recorded in the Italian lower chamber (Camera dei Deputati). Yet, we limit our focus to the time frame in which Berlusconi’s cabinet was in office (2008-2011). In fact, the end of the Berlusconi government led to the formation of the Monti cabinet on 16 November 2011. The latter period was characterized by highly exceptional political circumstances. The technocratic government’s political backing changed twice during its term.⁹ This implies that whenever the MPs addressed the government in their speeches, the statistical algorithm could not possibly distinguish between references to the earlier Berlusconi government and references to the subsequent Monti one. Switching opposition and majority this frequently would have significantly increased the noise in our estimates and compromised the validation strategy we are using.
⁶ By subject we refer to the general journalistic categorization into politics, chronicles, economy, arts and sports.
⁷ More detailed information regarding the monitoring of political pluralism, the aims of the AGCOM authority, and Law n. 249/1997 (the “par condicio” law) regulating the issue of pluralism in the media is available (in Italian) from the following AGCOM link: https://www.agcom.it/ldisciplina-della-par-condicio
⁸ The website is at the following link: http://leg16.camera.it/207. The web scraping was performed with the Python package Beautiful Soup.
⁹ Monti was initially supported by both of the two main Italian parties (the Democratic Party and Berlusconi’s People’s Freedom Party), which means that two previously opposing forces started a radically new political phase in which they supported the same government.
Our corpus includes all the official sessions of the legislature, including all the secondary discussions that concerned legislative activities (decree presentations, discussion of amendments, final debates and voting statements). We excluded the work of the various committees, Q&A sessions with members of the government, and parliamentary investigations. Overall, we count the transcripts of 739 parliamentary sessions, corresponding to 18,356 single interventions. All the interventions of the President of the Chamber (On. Gianfranco Fini) were removed from the corpus. For each transcript, we harvested the presenter’s name, the date and session of the intervention, and the party affiliation of the presenter. In this way we were able to group the legislative speeches by MP, leading to a final dataset of 518 documents (not every MP made at least one intervention; see Appendix A for a summary list of MPs by number of interventions).
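The grouping of interventions into one document per MP described above can be sketched as follows. This is a hedged illustration rather than the authors' scraping code: the record layout (the keys 'speaker', 'party' and 'text'), the function name and the toy records are all hypothetical.

```python
from collections import defaultdict

def group_by_speaker(interventions, excluded_speakers=()):
    """Merge interventions into one document per speaker.

    `interventions` is an iterable of dicts with the hypothetical keys
    'speaker', 'party' and 'text'. Speakers listed in `excluded_speakers`
    (for example the presiding officer) are dropped.
    """
    excluded = set(excluded_speakers)
    texts = defaultdict(list)
    parties = {}
    for item in interventions:
        speaker = item["speaker"]
        if speaker in excluded:
            continue
        texts[speaker].append(item["text"])
        parties[speaker] = item["party"]
    return {
        speaker: {"party": parties[speaker], "text": " ".join(parts)}
        for speaker, parts in texts.items()
    }

# Toy records with placeholder names; not drawn from the real corpus.
records = [
    {"speaker": "MP A", "party": "X", "text": "primo intervento"},
    {"speaker": "Chair", "party": "-", "text": "la seduta è aperta"},
    {"speaker": "MP A", "party": "X", "text": "secondo intervento"},
    {"speaker": "MP B", "party": "Y", "text": "dichiarazione di voto"},
]
documents = group_by_speaker(records, excluded_speakers={"Chair"})
```

Each resulting entry is one document for the scaling step, with the party label kept alongside so that scores can later be summarized by parliamentary group.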
3 Analysis and preliminary results
This section introduces the main findings of our analysis. It is organized in three subsections. First, we carry out a preliminary test of ideological scaling of non-policy text. This is motivated by the fact that the WORDFISH algorithm is typically applied to manifesto documents containing explicit and (questionably) exhaustive, or at least systematic, text regarding the main policy positions of political parties and movements. Our application deviates from these applications, as our text corpus does not contain policy positions.
Subsection 3.1 shows that it is indeed possible to detect latent political positions even if the political text does not directly involve policy positions. Next, in subsection 3.2 we apply the text scaling algorithm to the corpus of TV news transcripts and report details on the estimation process as well as on the main results. We anticipate that the estimates reveal systematic differences between the main Italian TV channels that are compatible with our prior knowledge of the ideological leanings in the Italian media system. Finally, subsection 3.3 investigates the two potential mechanisms of transmission of the political signal: strategic issue selection works through systematic differences in the amount of time devoted to distinct issues by the TV outlets; issue framing operates instead through the usage of systematically different words to present the same issue. Our findings seem compatible with the issue selection mechanism, while no supportive evidence is found for the framing mechanism.
3.1 A preliminary test of text scaling using parliamentary debates, Italy 2008-2011
We present a preliminary analysis of the parliamentary debates of the Italian Chamber of Deputies. This scaling exercise has two purposes. First, it lets the reader become familiar with WORDFISH and the process of estimating latent scores. Second, it shows that it is possible to extract face-valid latent positions from non-policy text.
The “parliamentary-text test” is a particularly harsh one for text scaling. In fact, parliamentary debates typically involve lengthy procedural discussions; they refer to legislative measures rather than explicitly to political issues and positions; and the language spoken in parliament does not privilege clarity and simplicity, but rather inevitably involves articulated considerations, references to previous debates or events, ambiguous statements, and a generally allegorical and sophisticated style of communication¹⁰. Thus, our expectation is that if we observe systematic differences among parliamentary groups based on the language that individual MPs used during the debates, then it will be possible to extract political positions from other sources of non-policy text as well.
Figure 2: Box plot of political groups based on scaling of MP interventions, Italy 2008-2011
We pretreat the corpus of legislative speeches with a procedure that will be detailed in subsection 3.2. This pretreatment involves the removal of all punctuation characters and numbers, the reduction of words to stemmed tokens¹¹, the computation of a list of unigrams (e.g. ‘govern’, ‘berluscon’) and bigrams (e.g. ‘govern_berluscon’), and the removal of Italian stopwords¹². Once all these text preparation steps are completed, we convert the documents of MPs’ speeches into a document-feature matrix with the identified tokens in the rows and the documents in the columns. The entries of the document-feature matrix are thus word frequencies (i.e. absolute word counts for each document).
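To make the layout of the document-feature matrix concrete, here is a small sketch that counts unigrams and underscore-joined bigrams for already stemmed documents, with tokens in rows and documents in columns. It is an illustrative stand-in under stated assumptions, not the tm-based code used in the paper; the function names and the toy stems are hypothetical.

```python
from collections import Counter

def ngram_tokens(stems):
    """Return the unigrams plus the underscore-joined bigrams of a stem list."""
    bigrams = [f"{a}_{b}" for a, b in zip(stems, stems[1:])]
    return list(stems) + bigrams

def document_feature_matrix(docs):
    """Build a tokens-by-documents count matrix.

    `docs` maps a document name to its list of stems. Returns the sorted
    token list, the document names, and a nested list `matrix` where
    matrix[r][c] counts token r in document c.
    """
    names = list(docs)
    counts = [Counter(ngram_tokens(docs[name])) for name in names]
    tokens = sorted(set().union(*counts)) if counts else []
    matrix = [[c[token] for c in counts] for token in tokens]
    return tokens, names, matrix

# Toy stemmed documents (made up for illustration).
toy_docs = {
    "mp_1": ["govern", "berluscon", "govern"],
    "mp_2": ["govern", "riform"],
}
tokens, names, matrix = document_feature_matrix(toy_docs)
```

A dense nested list is used only to keep the example short; a corpus of realistic size would call for a sparse representation.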
The results of our scaling exercise are presented by political group in the lower chamber. The box plot in Figure 2 represents the document-specific (i.e. MP-specific) omega scores that were computed with WORDFISH on the corpus of interventions at the Chamber of Deputies.
Results range from an average estimated position of $\omega_{\mathrm{IdV}} = -0.782$ for the group of Italy of Values (IdV) to $\omega_{\mathrm{LN}} = 0.324$ for the group of the Northern League (LN). The value for the People’s Freedom Party is very close to that of LN, at $\omega_{\mathrm{PdL}} = 0.318$. This makes sense considering that the two parties were government partners. All the opposition political groups display negative average omega values, as could reasonably be expected. Also, the ranking of the opposition groups follows our prior expectation. The most left-wing political group appears to be Italy of Values (with the aforementioned value of −0.782). The IdV used to be a centrist, anti-corruption movement. However, after the 2009 European Parliament elections, the party took a populist turn and strengthened its relationships with the parties of the radical left¹³. The party also led to the election of a left-wing mayor in Naples (Luigi De Magistris) in alliance with the Federation of the Left, and against the Democratic Party. Considering the historic absence of a Communist or Socialist political group in the XVI legislature, we can reasonably think of the IdV as the most left-wing political group, or at least the one most clearly opposed to Berlusconi and his government.
¹⁰ The language employed by Italian politicians is known in the press jargon as “politichese”.
¹¹ Stemming is the process through which every word in the text (e.g. conjugated verbs, plural nouns, derived forms) is reduced to its root form. A related process is lemmatization, which reduces words to their morphological root. For instance, stemming the vector of Italian words {‘governo’, ‘governare’, ‘governativo’} we obtain the stemmed vector {‘govern’, ‘govern’, ‘govern’}.
¹² Stopwords are words that serve to build the syntactic structure of the text. They are omitted because they are unrelated to its content.
Between the two poles of the IdV and the groups supporting the government, the algorithm ranks, in order, Future and Freedom ($\omega_{\mathrm{FLI}} = -0.259$), the Democratic Party ($\omega_{\mathrm{PD}} = -0.233$), the mixed group of non-inscrits ($\omega_{\mathrm{Misto}} = -0.230$), and the centrist Union of Center ($\omega_{\mathrm{UdC}} = -0.085$). Future and Freedom was a liberal-conservative group, so a negative score may seem unreasonable. Yet, we believe that a negative score is indeed meaningful. First, the high dispersion of MPs’ scores suggests substantial uncertainty around the position of FLI. Second, this is the party of President Gianfranco Fini’s followers. FLI was created after the split from the People’s Freedom Party that followed recurring criticism of Berlusconi’s government. Famously, Fini defended more progressive stances on social issues such as immigration (defending the right of resident immigrants to vote in local elections¹⁴) and demanded a stronger role for his faction within the PdL. It is thus likely that the latent scores computed from parliamentary debates capture the degree of opposition towards the government better than an overall ideology score.
We read the results of this preliminary test as evidence that WORDFISH is able to estimate meaningful latent scores from non-programmatic political texts such as the transcripts of parliamentary debates. In subsection 3.2 we apply the scaling algorithm to our corpus of TV news transcriptions.
3.2 Scaling the fourth estate: extracting latent positions from TV news, Italy 2008-2014
The Italian TV news corpus consists of a total of 319,895 recorded news stories¹⁵ broadcast by the main Italian TV news programs: TG1, TG2, TG3, TG4, TG5, Studio Aperto and TG7. In this subsection we offer a detailed account of the data preparation and estimation process and discuss the main findings. In subsection 3.3 we investigate the mechanism behind the TV programs’ political leaning.
¹³ The more centrist faction of IdV split from the party in November 2009.
¹⁴ “Fini: ‘Sì al voto agli immigrati’ ”, Il Sole 24 Ore, 04 September 2008.
¹⁵ The initial total number of news stories is 412,039, but we remove a number of news stories for which the date is not available.
Data preparation
The analytical task requires a number of preliminary steps. First, we subset the news stories to include only politically relevant topics. To this end we exclude from the corpus the following news topics: sport (we thus also exclude football news¹⁶), music, shows, culture and science news, weather and natural disasters, and reported cases of death or illness of important people. This leaves us with 276,266 valid news transcripts to consider in the analysis. Second, we undertake a number of operations that are required for the estimation. In fact, text analysis with WORDFISH, as with other “bag of words” approaches, involves converting the corpus of raw documents into a document-feature matrix of discrete occurrences of all the tokens in all the documents.
We take extreme care in the creation of the document-feature matrix, because it represents the core of the estimation process. For all these pre-estimation operations we rely on the tm R package¹⁷. The sequence of steps that leads to the creation of the document-feature matrix is the following:
1. We merge all the news stories’ text by week for each TV channel. This effectively shifts the unit of analysis from the single news story to the [TV channel × week] level. An alternative aggregation scheme could have been the single TV news edition (three per day) or the single day (merging the three editions). Yet, the weekly solution is in our opinion more efficient, in that it does not imply a dramatic loss of information while substantially shrinking the number of parameters to be estimated. This, in turn, considerably reduces the estimation time.
2. The Italian language includes apostrophes, so conventional punctuation removal tools¹⁸ would strip off the apostrophe and join the characters preceding it to the following word¹⁹. We thus preliminarily apply a regular expression that substitutes every punctuation character not with an empty string but with a single space character.
3. We strip all extra spaces from the text documents.
4. We transform all text into lower case characters.
5. We remove stopwords from the text²⁰.
6. We remove numbers from the text.
7. We stem the documents using Porter’s stemming algorithm.
8. We create the document-feature matrix including all the unigrams and bigrams in the text corpus.
9. We remove the top 5% and the bottom 5% of tokens by frequency²¹. This shrinks the number of tokens from 29,040 to 26,172.
¹⁶ Although one may argue that the coverage of Berlusconi’s football team, A.C. Milan, may not be considered politically irrelevant.
¹⁷ The quanteda package would have been a valid alternative. We opt for tm because it allows tokenization to follow stopword removal, which leads to faster preprocessing, while quanteda requires tokenization before the text can be preprocessed.
¹⁸ Such as the removePunctuation function from the tm R package that we use.
¹⁹ This works on English text corpora because apostrophes indicate possession and thus always follow the word, as in “the professor’s hat”. The punctuation removal would result in “professors hat”, and once the words are stemmed the additional “s” character would be removed as well. In the Italian language apostrophes are very often associated with the elision of the article when the following word starts with a vowel. Thus, the punctuation removal of “l’amica” [the female friend] and “un’amica” [a female friend] would wrongly result in two different tokens, “lamica” and “unamica”, and this would be unaffected by the subsequent stemming process.
²⁰ We have created a custom Italian stopwords list composed of the union of: 1) the snowball stopwords list that is typically included in the most common R packages (such as tm and quanteda); 2) all the words included in the Ranks NL stopwords list; 3) the following list of 34 additional stopwords chosen after visual inspection: {l, poi, far, quest, qual, tant, quel, dic, so, quell, avev, piu, fa, vorrebb, gia, puo, s, sar, d, nun, ce, n, foss, x, b, va, ogni, vuol, andar, propr, fatt, vann, www, fonte}.
The final document-feature matrix thus has dimensions [26,172 × 881], with 26,172 distinct tokens (unigrams and bigrams) and 881 documents (one for each TV channel per week); matrix entries are the absolute frequencies of the tokens’ occurrences. The TV news reports range from the week starting on 26 July 2010 to the week starting on 30 September 2014.
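The preprocessing steps listed above can be illustrated with a minimal Python sketch. It is not the tm-based R pipeline used in the paper: the function names, the toy stopword set and the two toy documents are assumptions made for illustration, and stemming (step 7) is left out because the Python standard library has no Italian stemmer. The sketch shows why punctuation is replaced with a space rather than deleted, and how unigram and bigram counts fill a document-feature matrix.

```python
import re
from collections import Counter

# Toy stopword set (an assumption); the paper uses a much larger custom list.
STOPWORDS = {"l", "un", "e", "di", "ha"}


def preprocess(text, stopwords=STOPWORDS):
    """Clean one document and return its unigram tokens (steps 2 to 6)."""
    # Step 2: replace punctuation with a space, so that "l'amica" becomes
    # "l amica" instead of the spurious token "lamica".
    text = re.sub(r"[^\w\s]", " ", text)
    # Steps 3-4: collapse extra whitespace and lower-case the text.
    text = re.sub(r"\s+", " ", text).strip().lower()
    # Step 5: remove stopwords.
    tokens = [tok for tok in text.split() if tok not in stopwords]
    # Step 6: remove digits, then drop tokens that became empty.
    # Step 7 (stemming) is intentionally omitted in this sketch.
    tokens = [re.sub(r"\d+", "", tok) for tok in tokens]
    return [tok for tok in tokens if tok]


def features(tokens):
    """Step 8: unigrams plus bigrams, the latter joined by an underscore."""
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def document_feature_matrix(docs):
    """Return (vocabulary, rows); rows[i][j] counts feature j in document i.

    Note: documents are rows here, whereas the paper reports the matrix
    as features by documents.
    """
    counts = [Counter(features(preprocess(doc))) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[f] for f in vocab] for c in counts]


# Two toy documents (assumptions, not taken from the corpus).
docs = ["L'amica di Anna ha 2 gatti.", "Un'amica e l'amica."]
vocab, rows = document_feature_matrix(docs)
print(vocab)
print(rows)
```

Because the elided articles "l" and "un" survive as separate tokens, the stopword filter can remove them, and both documents end up sharing the token "amica".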
Results
We run WORDFISH’s iterative EM algorithm until it reaches the tolerance threshold22 of 1e-7. The estimation results are reported in Figure 3. Every dot represents a weekly TV program’s omega score. The horizontal axis shows the date of the specific transcript, and the vertical axis shows the estimated omega score for each [TV program × week] document. We apply a LOESS smoother together with a 99% confidence band to emphasize the trend of the estimates for each TV program.
The plot points to three important results. First, we can identify differing central tendencies and trends in the series of omega scores, given that the confidence intervals mostly do not overlap. This means that the scores do signal the presence of systematic differences in word usage. Second, we observe a quite consistent and face-valid ranking of the omega scores, which ranges from Studio Aperto and Rete 4 at one pole to TG3 and TG7 at the opposite side of the identified latent space. This links the previous point to the political world: even if the latent scores based on the reported news are not direct estimates of the ideological slant of TV channels (because our corpus does not include programmatic policy statements), the fact that the three Mediaset-owned (i.e. Berlusconi-owned) news programs appear at the positive-omega pole, while the independent TG7 and the historically left-leaning TG3 appear on the negative-omega side, provides a strong signal that our TV omega scores are highly correlated with the underlying political slant. Indeed, the Italian Authority for Communications (AGCOM), using very different criteria23, in October 2010 warned TG4, Studio Aperto, and TG1 for excessive political imbalance and disproportionate visibility of right-wing political leaders24. While we will deal with TG4 and Studio Aperto thoroughly in the following sections, the case of TG1 will be analyzed in Subsection 3.4. Finally, we notice a trend of growing differentiation over the years. We argue that this growing differentiation can be traced to the fact that “hard news” programs devoted more coverage to the economic crisis, as will be shown in Subsection 3.3.
Figure 3: Wordfish scores of Italian TV news programs, 2010-2014
21 This is a standard procedure in quantitative text analysis that aims at cutting non-informative long tails in the distribution of words.
22 To scale up WORDFISH to estimate latent positions in a large data setting we run the model on the EUI HPC cluster. We run the model using the wordfish function in R. The quanteda’s textmodel function could have been a viable alternative. The total estimation time is 3.93 hours.
23 Their judgement was only based on the direct presence of politicians on the various news programs. The reader should thus notice that since all politicians’ interviews have been removed, our corpus would represent a case of ‘perfect balance’ in the information adopting the standards of AGCOM.
24 ‘Telegiornali e pluralismo, ecco i dati’, Corriere della Sera, 21 October 2010 [link here]; ‘Dall’AGCOM arriva diffida al TG1: “Forte squilibrio a favore del governo”. Richiamo al TG4 e a Studio Aperto’, Corriere della Sera, 21 October 2010 [link here]
The representation of the TV-specific trends in the omega scores seems to indicate the presence of a differentiated media content supply in Italy. TV news outlets in Italy appear to use systematically different words, which leads us to think that TV channels in fact discuss different topics, or different aspects of the same topics. We provide more detail on these two potential explanations of the divergence in WORDFISH scores in the next Subsection (3.3). To better understand the content of the longitudinal shifts that we observe for all the news programs, it is useful to inspect the word-specific parameters (i.e. the βs) associated with the two poles of the latent space identified in the estimation process. Table 1 lists the 25 words most tightly linked to the pole of positive latent scores and the 25 words most closely connected with the opposite pole of negative omega values. The full distribution of words can be visually inspected in Appendix B.
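To make the role of the word parameters concrete, the following Python sketch writes down the Poisson rate of the WORDFISH model of Slapin and Proksch (2008), exp(α + ψ + β·ω), and ranks tokens by β. The (β, ψ) pairs are copied from Table 1; the document effect α, the two document positions ω and the function names are illustrative assumptions, not estimates from this paper.

```python
import math


def expected_count(alpha, psi, beta, omega):
    """Poisson rate exp(alpha + psi + beta * omega) for one document-token pair."""
    return math.exp(alpha + psi + beta * omega)


def top_tokens(params, k, positive=True):
    """Return the k tokens with the largest (or most negative) beta."""
    ranked = sorted(params, key=lambda tok: params[tok][0], reverse=positive)
    return ranked[:k]


# (beta, psi) pairs copied from Table 1.
params = {
    "stagnazion": (-1.52, -2.41),
    "ricandidatur": (-1.51, -2.42),
    "mort_yar": (0.62, -2.10),
    "benven_stud": (1.55, -3.52),
}

# Hypothetical document effect and document positions (assumptions).
alpha = 5.0
for omega in (-1.0, 1.0):
    for token, (beta, psi) in params.items():
        rate = expected_count(alpha, psi, beta, omega)
        print(f"omega={omega:+.1f}  {token:<13} expected count = {rate:.2f}")

print(top_tokens(params, 2, positive=True))
print(top_tokens(params, 2, positive=False))
```

A token with a positive β is expected more often in documents with a positive ω, and a token with a negative β more often in documents with a negative ω, which is why the tokens with extreme β values characterize the two poles of the latent space.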
We notice that words associated with positive and negative beta scores refer to very different kinds of news content. Positive scores are associated with crime stories (francesc_mort, mort_yar) or accident reports (foll_veloc), while negative scores are associated with hard news regarding political (e.g. ricandidatur, impegn_europe) and economic (e.g. stagnazion, miliard_men) issues. This result may lead to the conclusion that more right-wing networks (that is, those with the same leaning as the government) supply softer and less politically charged topics. We can conjecture that this could serve to downplay the saliency of thornier or more difficult issues. The next Subsection 3.3 will also address this point.
Table 1: Top 25 words associated with negative (left) and positive (right) beta scores
Tokens              b       psi      Tokens              b       psi
esperient_govern    -1.50   -2.11    benven_stud         1.55    -3.52
ex_vertic           -1.50   -2.46    angel_machiavell    1.18    -3.11
econom_grill        -1.50   -2.05    gabriell_simon      1.02    -2.12
crosett             -1.50   -2.27    ser_grad            1.02    -3.16
merc_unic           -1.50   -2.73    machiavell          1.01    -2.60
interess_deb        -1.51   -2.65    rem_croc            0.98    -3.01
vot_test            -1.51   -2.63    ser_retequattr      0.94    -2.85
risors_destin       -1.51   -2.80    francesc_mort       0.92    -3.21
resping_mittent     -1.51   -2.42    yar_stat            0.83    -2.32
intes_riform        -1.51   -2.42    apert_sent          0.83    -3.04
ricandidatur        -1.51   -2.42    luc_pesant          0.79    -2.53
posit_arriv         -1.52   -2.77    carmin_martin       0.79    -2.94
impegn_europe       -1.52   -2.68    andiam_rom          0.79    -3.03
segretar_carrocc    -1.52   -2.49    lecces              0.76    -2.40
tropp_alti          -1.52   -2.42    prim_scompars       0.74    -2.79
confin_turc         -1.52   -2.44    ser_stud            0.74    -2.70
commission_barros   -1.52   -2.42    massim_canin        0.72    -2.57
intes_part          -1.52   -2.34    massimil_dio        0.67    -2.56
acquist_ben         -1.52   -2.13    mort_yar            0.62    -2.10
miliard_men         -1.52   -2.32    giorn_stud          0.61    -2.83
azion_maggior       -1.52   -2.57    luis_ross           0.58    -2.20
stagnazion          -1.52   -2.41    feder_gatt          0.58    -2.28
ex_capogrupp        -1.53   -2.39    bepp_gandolf        0.58    -2.34
govern_grec         -1.53   -2.33    foll_veloc          0.55    -2.87
men_previst         -1.53   -2.74    marc_graz           0.53    -2.56
3.3 Strategic issue selection or framing: assessing the mechanism of media political signal’s transmission
The previous section showed the presence of consistent and systematic differences in the language used by the Italian news programs and connected these differences to their political coordinates. Yet it remains unclear whether the political leaning of the news is associated with differences in content (i.e. what is being told to the audience) or whether the scores are driven by differences in news framing (i.e. how the news is presented).
To address this point, we run WORDFISH on two different subsets of the original TV corpus. In both cases, we focus on the period in which the Berlusconi IV cabinet was in office (from the summer of 2010 until October 2011). In the first subset we only consider economic news, while in the second we only consider reports related to law and order (including a set of crime news regarding immigration, murders, terrorism, and other generic news stories unrelated to politics and the economy). The idea at the center of this design is that if we are still able to identify systematic differences among news programs when the same issue is covered, then this would imply that the political signal is being channelled through the news frames rather than through issue saliency.
3.3.1 The economy in the news
In order to investigate issue framing we thus focus on two central subjects and exploit text analysis to detect the use of different language, and especially of qualifiers. The results of this second analysis are reported in Figure 4.
We notice a remarkably different pattern in the scaling of economic news with respect to the case of the entire corpus. We can immediately see that the variation in the omega scores does not occur between programs, but rather over time. In fact, we observe a radical movement toward more negative omega values for all the news programs considered, and this corresponds to the worsening of the financial crisis in the second half of 2011.
Figure 4: Wordfish scores of Italian news programs (economic news only), 2010-2011
Similarly to what we did before, to understand the substantive content of the latent “economic news” scores, Appendix D reports the list of words associated with the highest and
lowest β discrimination values. Upon inspection, we notice that the “left” pole of negative
values and the “right” one of positive scores respectively deal with problems and issues in
the financial crisis, and with more standard and traditional industrial policy issues.
Having found no systematic differences in the economic news’ frames of reference, we turn our attention towards the coverage of economic news. The idea is that Mediaset news programs may strategically downplay the importance of economic issues, given the bad news and the fact that Berlusconi is the incumbent in our time frame. If this is observed, then we may conclude that it is not how the news is reported, but rather what kind of news is covered, that explains the political signals.
This view is indeed corroborated by the evidence provided in Figure 5. This graph represents the cumulative airtime devoted to economic issues by network. We observe that economic issues are more intensely covered on the news programs previously shown to be associated with “left-leaning” omega scores in the full model. By contrast, Mediaset channels appear to have silenced economic issues during the Berlusconi government, as they consistently underreport the financial crisis compared to other TV channels.
Figure 5: Cumulative airtime (in days) devoted to economic news
This provides suggestive evidence of strategic selection of news topics by political affiliation, where networks ideologically closer to the right-wing government were strategically attempting to decrease the saliency of ‘uncomfortable’ issues related to the financial crisis.
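The cumulative airtime measure plotted in Figure 5 can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the record layout (network, week, seconds), the function name and the toy records are all assumptions and do not come from the paper's data.

```python
from collections import defaultdict
from itertools import accumulate

SECONDS_PER_DAY = 24 * 60 * 60


def cumulative_airtime_days(segments):
    """Map each network to its running total of airtime, in days, by week.

    `segments` is an iterable of (network, week, seconds) records, where
    `week` is an ISO date string so that sorting is chronological.
    """
    weekly = defaultdict(lambda: defaultdict(float))
    for network, week, seconds in segments:
        weekly[network][week] += seconds
    result = {}
    for network, by_week in weekly.items():
        weeks = sorted(by_week)
        totals = accumulate(by_week[w] / SECONDS_PER_DAY for w in weeks)
        result[network] = list(zip(weeks, totals))
    return result


# Hypothetical records (made-up durations, for illustration only).
segments = [
    ("TG3", "2011-07-04", 43200),
    ("TG3", "2011-07-04", 43200),
    ("TG3", "2011-07-11", 43200),
    ("Studio Aperto", "2011-07-04", 21600),
]
result = cumulative_airtime_days(segments)
for network, series in result.items():
    print(network, series)
```

Comparing the resulting curves across networks for a given topic is what allows differences in coverage to be read separately from differences in language.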
3.3.2 Crime stories in the news
If the conjecture corroborated by the findings in the previous subsection is valid, then we
should also observe the same “strategic issue selection” mechanism at work for political issues that are owned by the Italian right (i.e. favourable). Thus, in this section we provide
similar evidence for crime news. If our argument is correct, we expect Mediaset channels to
privilege news stories associated with the traditionally ‘right-wing’ issue of law and order:
crimes, murders, immigration, and terrorism news reports.
First, we again investigate whether the TV frames of reference for such crime stories are similar or diversified. Figure 6 shows that, for crime news too, there are no detectable framing effects.
Then, we again turn our attention to the cumulative airtime, this time referring to crime news. It is striking that the patterns of coverage presented in Figure 7 are the mirror image of those previously presented for the coverage of the economy. We find this evidence consistent with our previous argument. While there are no significant differences in the way (how) different newscasts report either economic news (see Appendix C and D for evidence on omega scores and representative words) or crime news and current affairs (refer to Appendix E and F), we find a complementary pattern with respect to coverage.
Figure 6: Wordfish scores of Italian news programs (crime news and current affairs only), 2010-2011
Figure 7: Cumulative airtime (in days) devoted to crime news and current affairs
To recap, in this section we have investigated two potential mechanisms of transmission of the political signal through the news. Strategic issue selection involves downplaying the issues that are more politically uncomfortable for the referent political actors and simultaneously overemphasizing those that could potentially strengthen the electoral support of the political referent group. We show this empirically, and our results point to the validity of Agenda Setting theories and the relevance of issue saliency for media and political actors. By contrast, we find no empirical traces of a systematic framing power of the media, as we find that, for a given topic, the language used is essentially undifferentiated. In the final empirical subsection 3.4 we leverage the longitudinal variation of our omega scores to strengthen the internal validity of our estimates.
3.4 Structural breaks
In this subsection we investigate the WORDFISH scores from a time-series perspective, aiming at identifying meaningful structural changes in the omega scores that are related to real-world events. We have already related variation in the language used by the news programs to report crime and economic news, but this largely relied on anecdotal evidence. In this subsection, we more systematically link structural changes in the omega time series to documented real-world events.
Again, we consider the cases of economic and current affairs news. Finally, we shift our focus from all the TV channels to the major (and first) Italian public TV channel: RAI1. Our structural break analysis shows that a break in the language adopted by the TG1 news program occurred when Director Augusto Minzolini, notoriously a Berlusconi supporter25, was led to abandon the direction of TG1. We use the changepoint R package to automatically identify the optimal positioning and number of break points in the series.
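As a rough illustration of what change-point detection does, the Python sketch below locates a single break in the mean of a series by choosing the split that minimizes the within-segment sum of squared deviations. It is a simplified stand-in and not the procedure of the changepoint R package, which also selects the number of breaks through a penalized criterion; the function names, the min_size parameter and the toy series are assumptions.

```python
def sse(values):
    """Sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def single_mean_break(series, min_size=2):
    """Return (index, cost) for the best single change in mean.

    The two segments are series[:index] and series[index:], and cost is
    their total within-segment squared error.
    """
    if len(series) < 2 * min_size:
        raise ValueError("series too short for the requested segment size")
    best_index, best_cost = None, float("inf")
    for i in range(min_size, len(series) - min_size + 1):
        cost = sse(series[:i]) + sse(series[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index, best_cost


# Toy weekly omega scores (hypothetical values) with a level shift
# after the fifth observation.
omega = [0.52, 0.48, 0.50, 0.51, 0.49, -0.31, -0.29, -0.30, -0.32, -0.28]
index, cost = single_mean_break(omega)
print(index, round(cost, 4))
```

In practice a break is only accepted if the reduction in squared error relative to the unsplit series, sse(omega) - cost, is large enough to justify the extra parameter, which is the role a penalty term plays.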
3.4.1 Structural breaks in the economic news
Once we run the algorithm of structural change detection on the series of omega scores identified for the subset of economic news, we identify one break, occurring in the week starting
on the 4th of July 2011. Figure 8 graphically represents this break.
Indeed, on 9 July 2011 the financial crisis hit Italy, generating a sudden increase in the ratio between German 10-year Bund and Italian 10-year BTP yields26. This corroborates our previous understanding of the economic news latent scores as ranging between ‘physiological’ economic times, when news reports center on industrial policy, and the economic crisis, when financial issues become prevalent in the media. Figure 9 further reinforces this finding by showing the front-page headlines of the main Italian newspaper, the Corriere della Sera, before and after the identified structural break.
Figure 8: Structural breaks in the omega series (all TVs, economic news, 2010-2011)
Figure 9: Cover page of Corriere della Sera across the structural break in economic news’ scores
(a) CdS before the structural break (08-07-2011)
(b) CdS after the structural break (11-07-2011)
25 The nomination of Augusto Minzolini was not supported by the center-left members of the RAI board, who left the board room at the moment of the vote and later called a press conference describing the appointment as “unacceptable” [supporting information]. The Italian Communication authority in April 2010 also warned Augusto Minzolini for excessive imbalance and lack of pluralism [supporting information]. Finally, the AUDITEL data documented a decrease in the audience share of TG1. See also footnote 24.
26 ‘US hedge funds bet against Italian bonds’, Financial Times, 10 July 2011 [link here].
3.4.2 Structural breaks in the crime and current affairs news
The same exercise is now repeated for the subset of omega scores produced for all the TV news programs with respect to crime news and current affairs. Once we run the break point analysis, we are able to identify two breakpoints, in the weeks starting respectively on 7 February 2011 and on 4 April of the same year. Figure 10 presents these two structural breaks (all the TV news programs are considered for the period in which the Berlusconi government is incumbent).
We find that the two breaks delimit the peak of the Egyptian Revolution of 2011, as Hosni Mubarak resigned on 11 February 2011. Figure 11 provides further evidence for this claim, presenting the headlines of Corriere della Sera on the corresponding dates.
Figure 10: Structural breaks in the omega series (all TVs, crime news, 2010-2011)
Figure 11: Cover page of Corriere della Sera across the structural break in current affairs news’ scores
(a) CdS before the structural break (10-02-2011)
(b) CdS after the structural break (11-02-2011)
3.4.3 The case of TG1: structural breaks and director’s turnover
Having become more confident about the ability of WORDFISH to identify real changes in the underlying political and economic conditions, we run a final test to check whether we are also able to identify a “structural change” in the direction of the news programs. In particular, we have recalled the role played by Augusto Minzolini as a government supporter at the time he was directing the main Italian news program, TG1.
In this subsection we thus run the structural break detection algorithm on the time series of omega scores computed for TG1 in the main model, which considered all the news covered. As shown in Figure 12, we identify one structural break in the series, in the week starting on 17 January 2011.
Figure 12: Structural breaks in the omega series (TG1 only, 2010-2011)
Once we superimpose on the plot the dates on which the three TG1 directors considered in the timespan of the analysis took charge, the association between the structural break in the overall omega scores of the news program and the resignation of Augusto Minzolini becomes evident. We show this in the dedicated Figure 13. The timing of the events lends support to the ability of the algorithm to detect changes in the political leaning of a news outlet. As already pointed out in Section 3.2, in October 2010 TG1 received a warning from the Communication Authority for strong imbalance in the airtime presence of politicians, favouring the right-wing government. In December 2010, Minzolini is dismissed from the direction of TG1 and shortly after a new director (Alberto Maccari) is appointed. The red vertical line in Figure 13 signals the official start of the Maccari direction, but the moment Minzolini was dismissed shortly precedes the structural change we identify.
Figure 13: Structural breaks in the omega series (TG1 only, 2010-2011)
4 Conclusions
This paper provides a number of new results. The content of Italian television news is consistently differentiated across networks. Historically more left-leaning networks tend to focus on serious news content, such as the state of the economy and political affairs. On the other hand, networks historically associated with the conservative parties give more emphasis to chronicles, gossip and crime news. We therefore find robust evidence of strategic issue selection throughout our dataset, but scant evidence of differential issue framing.
Our results are consistent with previous work on newspaper ideology. Our study differs in that, given our setting, we are able to identify media bias as a consequence of media capture. These results raise the welfare concerns that are typically associated with media bias. While a degree of differentiation across networks in news content is desirable, as it may be a mechanism to ensure pluralism, in this case it may be problematic because deviations in the ideological score are driven not by viewers’ demand but by political capture. As a result, we might anticipate a number of negative political outcomes, such as growing ideological segmentation and increasing polarization in the audience.
References
Baron, David P. 2006. “Persistent media bias.” Journal of Public Economics 90(1):1–36.
Besley, Timothy and Andrea Prat. 2006. “Handcuffs for the Grabbing Hand? Media Capture
and Government Accountability.” American Economic Review 96(3):720–736.
URL: https://www.aeaweb.org/articles?id=10.1257/aer.96.3.720
DellaVigna, Stefano, Ruben Durante, Brian Knight and Eliana La Ferrara. 2016. “Market-Based
Lobbying: Evidence from Advertising Spending in Italy.” American Economic Journal: Applied
Economics 8(1):224–256.
URL: http://pubs.aeaweb.org/doi/10.1257/app.20150042
Djankov, Simeon, Caralee McLiesh, Tatiana Nenova and Andrei Shleifer. 2003. “Who Owns the
Media.” Journal of Law and Economics XLVI(October):341–381.
URL: http://scholar.harvard.edu/files/shleifer/files/media.pdf
Duggan, J. and C. Martinelli. 2011. “A spatial theory of media slant and voter choice.” Review
of Economic Studies 78(2):640–666.
Durante, Ruben and Brian Knight. 2012. “Partisan Control, Media Bias, and Viewer Responses:
Evidence From Berlusconi’s Italy.” Journal of the European Economic Association 10(3):451–481.
URL: http://doi.wiley.com/10.1111/j.1542-4774.2011.01060.x
Eyheramendy, Susana, David D. Lewis and David Madigan. 2003. “On
the Naive Bayes Model for Text Categorization.”.
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.1365
Gentzkow, Matthew and Jesse M. Shapiro. 2010. “What Drives Media Slant? Evidence From
U.S. Daily Newspapers.” Econometrica 78(1):35–71.
URL: http://doi.wiley.com/10.3982/ECTA7195
Grasso, Aldo. 2004. Storia della televisione italiana. Milano, Italia: Garzanti.
Groseclose, T. and J. Milyo. 2005. “A Measure of Media Bias.” The Quarterly Journal of Economics
120(4):1191–1237.
URL: http://qje.oxfordjournals.org/cgi/doi/10.1162/003355305775097542
McMillan, John and Pablo Zoido. 2004. “How to Subvert Democracy: Montesinos in Peru.”
Journal of Economic Perspectives 18(4):69–92.
URL: https://www.aeaweb.org/articles?id=10.1257/0895330042632690
Mullainathan, Sendhil and Andrei Shleifer. 2005. “The Market for News.” American Economic
Review 95(4):1031–1053.
URL: http://pubs.aeaweb.org/doi/abs/10.1257/0002828054825619
Prat, Andrea. 2015. “Media Capture and Media Power.” Handbook of Media Economics Vol.
1B(2002):669–686.
Prat, Andrea and David Stromberg. 2013. “The Political Economy of Mass Media.” Advances in
Economics and Econometrics Tenth World Congress, Volume 2: Applied Economics pp. 135–187.
Reinikka, Ritva and Jakob Svensson. 2005. “Fighting Corruption to Improve Schooling: Evidence from a Newspaper Campaign in Uganda.” Journal of the European Economic Association
3(2-3):259–267.
URL: http://doi.wiley.com/10.1162/jeea.2005.3.2-3.259
Slapin, Jonathan B. and Sven-Oliver Proksch. 2008. “A Scaling Model for Estimating Time-Series Party Positions from Texts.” American Journal of Political Science 52(3):705–722.
Strömberg, David. 2015. “Media and Politics.” Annual Review of Economics 7(1):173–205.
Tella, Rafael Di and Ignacio Franceschelli. 2011. “Government Advertising and Media Coverage of Corruption Scandals.” American Economic Journal: Applied Economics 3(4):119–151.
URL: https://www.aeaweb.org/articles?id=10.1257/app.3.4.119
Zaller, John. 1999. A Theory of Media Politics. University of Chicago Press.
URL: http://www.sscnet.ucla.edu/polisci/faculty/zaller/media politics book .pdf
A Appendix - List of top 25 and bottom 5 MPs included in the scaling of parliamentary speeches by number of interventions
Rank  MP name                     Political group                     Number of interventions
1     Roberto Giachetti           Partito Democratico                 340
2     Antonio Borghesi            Italia Dei Valori                   277
3     Fabio Evangelisti           Italia Dei Valori                   261
4     Erminio Angelo Quartiani    Partito Democratico                 225
5     Mario Tassone               Unione Di Centro Per Il Terzo Polo  215
6     Simone Baldelli             Popolo Della Liberta’               207
7     Angelo Compagnon            Unione Di Centro Per Il Terzo Polo  170
8     Federico Palomba            Italia Dei Valori                   152
9     Francesco Barbato           Italia Dei Valori                   150
10    Renato Cambursano           Misto                               148
11    Arturo Iannaccone           Misto                               141
12    Furio Colombo               Partito Democratico                 132
13    Pierluigi Mantini           Unione Di Centro Per Il Terzo Polo  128
14    Marco Giovanni Reguzzoni    Lega Nord Padania                   126
15    Massimo Polledri            Lega Nord Padania                   123
16    Pier Ferdinando Casini      Unione Di Centro Per Il Terzo Polo  122
17    Sergio Michele Piffari      Misto                               121
18    Massimo Donadi              Misto                               120
19    Carlo Monai                 Italia Dei Valori                   119
20    Ivano Strizzolo             Partito Democratico                 119
21    Donatella Ferranti          Partito Democratico                 113
22    David Favia                 Misto                               111
23    Amedeo Ciccanti             Unione Di Centro Per Il Terzo Polo  110
24    Rita Bernardini             Partito Democratico                 108
25    Teresio Delfino             Unione Di Centro Per Il Terzo Polo  108
...
514   Michela Vittoria Brambilla  Popolo Della Liberta’               1
515   Piero Testoni               Popolo Della Liberta’               1
516   Pietro Lunardi              Popolo Della Liberta’               1
517   Sandro Oliveri              Misto                               1
518   Vincenzo Barba              Popolo Della Liberta’               1
B Appendix - Eiffel plot of the distribution of tokens for the Italian news programs (2010-2014)
The following figure represents the words’ discrimination parameters computed by the WORDFISH estimation performed on the entire corpus of the Italian TV news programs. The horizontal axis reports the β parameters, which estimate the weight of each word in discriminating between the news programs’ positions, while the ψ scores, reported on the vertical axis, are the words’ fixed effects, capturing the frequency of the words (with less frequent words typically having greater discrimination weight).
Figure 14: Word discrimination parameters for the corpus of Italian news programs
C Appendix - Eiffel plot of the distribution of tokens for the subset of economic news (2010-2011)
The following figure represents the words’ discrimination parameters computed by the WORDFISH estimation performed on the subset of economic news reported while the Berlusconi government was in charge. Interpretation is equivalent to that of the Eiffel plot in Appendix B.
Figure 15: Word discrimination parameters for economic news on the Italian news programs
D Appendix - List of top 25 words associated with economic news’ poles of the latent space
Table 3: Top 25 words associated with negative (left) and positive (right) beta scores
Tokens              b       psi      Tokens                b       psi
bors_stat           -1.50   -4.83    quattord_gennai       5.45    -8.52
tagl_bilanc         -1.50   -4.65    mirafior_referendum   5.39    -9.05
capovolg            -1.50   -4.73    referendum_accord     5.19    -8.72
ital_aggiung        -1.50   -4.83    regol_rappresent      4.92    -9.14
rating_unit         -1.50   -5.19    fatt_accord           4.81    -8.72
mil_tutt            -1.50   -4.49    accord_mirafior       4.80    -7.61
tempest_perfett     -1.50   -5.34    referendum_futur      4.79    -8.47
arriv_guadagn       -1.50   -4.19    fiom_cobas            4.75    -8.64
cattedr             -1.51   -5.35    accord_separ          4.74    -7.88
chiud_perd          -1.51   -4.94    cgil_fiom             4.71    -8.19
arriv_miliard       -1.51   -5.06    conten_accord         4.60    -8.74
aggiorn_colleg      -1.51   -4.43    nuov_fiat             4.53    -7.45
de_bortol           -1.51   -5.06    esclusion_fiom        4.51    -8.63
voragin             -1.51   -4.00    gennai_referendum     4.39    -8.33
tedesc_super        -1.51   -5.06    accord_import         4.28    -8.05
pers_valor          -1.51   -5.06    fia                   4.20    -7.14
tremont_paregg      -1.51   -5.35    fim_cisl              4.08    -7.94
percentual_iva      -1.51   -4.84    debutt_bors           4.05    -6.96
moment_econom       -1.51   -4.84    alluvion_venet        4.02    -8.02
rest_ben            -1.51   -4.94    fiat_mirafior         3.94    -5.76
europ_scont         -1.51   -5.35    ivec                  3.83    -7.39
fin_prim            -1.51   -4.58    alto_dicembr          3.78    -7.22
part_union          -1.51   -4.10    ventott_gennai        3.77    -6.87
temporal            -1.51   -4.58    videogam              3.73    -6.76
andat_bors          -1.51   -5.07    fiom_accord           3.70    -7.64
E Appendix - Eiffel plot of the distribution of tokens for the subset of crime news (2010-2011)
The following figure represents the words’ discrimination parameters computed by the WORDFISH estimation performed on the subset of crime news reported while the Berlusconi government was in charge. Interpretation is equivalent to that of the Eiffel plot in Appendix B.
Figure 16: Word discrimination parameters for crime news on the Italian news programs
F Appendix - List of top 25 words associated with the crime news’ poles of the latent space
Table 4: Top 25 words associated with negative (left) and positive (right) beta scores
Tokens              b       psi      Tokens              b       psi
trascin_div         -2.00   -5.67    grad_accogl         6.58    -14.69
famos_german        -2.00   -5.96    profug_dov          6.22    -13.33
fium_giorn          -2.01   -5.97    emergent_accogl     6.19    -13.56
lasc_entrar         -2.01   -5.68    accogl_cinquant     5.88    -12.46
anni_esatt          -2.01   -5.56    situazion_igien     5.86    -12.88
purific             -2.01   -5.46    dunqu_orma          5.80    -12.83
mai_screz           -2.01   -5.82    trasfer_emigr       5.43    -11.88
iniz_vit            -2.01   -6.16    marin_sammarc       5.42    -11.94
immens_dolor        -2.01   -5.37    destin_migrant      5.34    -12.02
propr_cont          -2.02   -5.98    arriv_region        5.33    -11.45
vib_valenz          -2.02   -5.07    marcator            5.24    -11.43
cos_vuot            -2.03   -5.83    equipagg_asso       5.06    -10.84
dov_spegn           -2.03   -5.48    termin_consigl      5.05    -11.50
disag_viaggiator    -2.03   -5.48    govern_sicil        5.02    -9.55
pegn_rivendic       -2.03   -5.70    tripol_equipagg     4.96    -10.90
rivendic_mulin      -2.03   -5.70    attracc_nav         4.95    -10.67
doman_sempr         -2.03   -5.70    maron_region        4.94    -10.57
salv_gett           -2.03   -5.84    vers_piattaform     4.92    -10.17
motovedett_riusc    -2.03   -5.48    mezz_sbarc          4.90    -11.01
temperatur_picc     -2.03   -5.48    immigr_dic          4.87    -10.95
gest_protest        -2.03   -5.30    quarant_immigr      4.82    -10.83
parol_parol         -2.03   -5.02    isol_sempr          4.77    -10.46
scors_infatt        -2.03   -5.71    don_dio             4.74    -10.29
impiant_riscald     -2.04   -5.59    maron_frattin       4.70    -9.47
propr_stess         -2.04   -5.85    central_sci         4.69    -10.00