
The Public and Legislative Impact of Hyper-Concentrated Topic News


Karthik Sheshadri and Munindar P. Singh
Department of Computer Science, North Carolina State University

kshesha@ncsu.edu, mpsingh@ncsu.edu

News has been shown to influence public perception, affect technology development, and
increase public expression. We demonstrate that framing, a subjective aspect of news, appears
to influence both significant public perception changes and federal legislation. We posit,
counterintuitively, that domain news volume and mean article similarity increase and decrease
together. We show that specific features of news, such as publishing volume, appear to
influence sustained public attention, as measured by annual Google Trend data, and federal
legislation. We observe that public attention changes are often foreshadowed by periods of
high news volume and similarity between articles, which we call hyper-concentrated news
periods. Finally, we contribute the measures of framing density and framing polarity, which
we demonstrate to exhibit distinctive patterns during periods in which changes in framing
occur, while remaining otherwise stable. We note, however, that our analysis does not rule
out reverse causality and other unobservable confounding factors.

Introduction and Contributions

The effect of news on public behavior has been the subject of considerable scientific interest.
Prior work has established that news framing influences public perception (1, 2), affects tech-
nology development (3, 4), and contributes to setting agendas (5). Most recently, publishing

from small news outlets has been shown to increase short-term public involvement in specific
domains (6).

Our work enhances understanding by explicitly modeling the Granger causal (G-causal) (7)

link between specific news characteristics, public opinion, and federal legislation. We note
that Granger causality captures directionality in correlation between time series but does not
correspond to “true” causality. In this work, we restrict ourselves to Granger causality and

indicate every use of the term with the qualifier “Granger” or “G.” Firstly, we demonstrate a
predictive relationship between news characteristics and federal legislation, independently of
changes in public opinion.

Secondly, we show that public and legislative reaction to news follows a punctuated equilib-
rium model (8). The punctuated equilibrium model, adopted from evolutionary biology, posits
long periods of equilibrium during which there is little change, punctuated with short dura-

tions of macromutation. Similarly, we observe that the public and the federal legislature react
substantially at discrete intervals (analogous to macromutation in the above model), rather than
uniformly and gradually. We identify a defining characteristic of news periods that elicit such

substantial reactions, namely, that they have high news-volume occurring simultaneously with
high similarity between articles. We term such periods hyper-concentrated news periods. We
note that King et al.’s (6) approach artificially created such news conditions for short time peri-

ods and reader subsets.

Thirdly, news reporting in general introduces subjective biases, referred to as framing. We adopt
Entman’s formulation (9) in this paper. Whereas news publishing is ordinarily event driven,

we demonstrate that hyper-concentrated news periods, combining high article volume and sim-
ilarity, can occur spontaneously, without event-based drivers, as a Granger causal effect of news
framing (see Fig. 1 for a compelling example). We find that hyper-concentrated periods brought

about by framing are equally influential in predicting public approval (defined as the fraction of
the public that approves of a particular position) and legislation. This finding demonstrates that
the framing of news is as influential as the events and facts reported on in the news.

Additionally, we demonstrate that news publishing volume within a specific domain is a reli-
able long-term predictor of public attention (the number of people who demonstrated interest
in a domain by conducting an Internet search), measured annually using Google Trend data

(Fig. 2).

The Granger causal flow we discovered is depicted in Fig. 3. We confirmed each link using a
directional Granger causality test, which evaluates the influence of a G-causal time series on

a G-caused one. All tests were conducted at the α = 0.05 significance level. Our choice of
Granger causality over a structural model was deliberate, since we wished to infer rather than
assume structure and direction. We note that prior research (10) agrees with this choice.
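
To make the directional test concrete, the following minimal sketch computes the F-statistic of a Granger causality test by comparing a restricted autoregression of the G-caused series with an unrestricted one that also includes the lagged G-causal series. The lag order of one, the helper names (`solve`, `ols_rss`, `granger_f_lag1`), and the toy series are illustrative assumptions, not the configuration behind the reported results.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit of targets on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f_lag1(cause, effect):
    """F-statistic for the hypothesis that `cause` G-causes `effect` (one lag)."""
    targets = effect[1:]
    # Restricted model: intercept and the effect's own lag.
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    # Unrestricted model: additionally the lagged candidate cause.
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)


# Toy annual series (hypothetical): the effect tracks the cause with a one-year delay.
cause_series = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11]
effect_series = [0.0, 1.1, 2.9, 2.2, 4.8, 4.1, 7.2, 5.9, 9.1, 7.8]
f_forward = granger_f_lag1(cause_series, effect_series)
```

The resulting statistic would then be compared against the critical value of the F distribution with (1, n − 3) degrees of freedom at α = 0.05, where n is the number of usable observations.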

Hyper-Concentrated News Periods

Our observations stem from a remarkable pattern that holds reliably over the set of domains and

articles we examined (we highlight several highly compelling cases in the text and supplement,
and present a full list in Table S2).

We posit the idea of a hyper-concentrated period of domain news as one that is characterized by

high article volume occurring simultaneously with high mean similarity between articles. We

study public reaction to news and find that Granger causally significant changes in public atten-
tion and approval generally occur during such periods. Fig. 4 illustrates a hyper-concentrated

news period for the Surveillance domain.

We define the mean similarity of a domain corpus of size n as the average cosine similarity (11) in
paragraph vector (12) space between all $\binom{n}{2}$ pairs of articles in the corpus. In each domain, the


number of news articles published in a certain year varies directly with mean corpus similarity
(we found that each G-caused the other). This finding is surprising, since one would expect a
larger volume of articles to discuss a larger variety of subjects. Instead, we found that domain

news publishing volume is mainly event driven, and events increase not only the mean similarity
of the corpus but also its volume. For example, the number of Surveillance articles increased
by 220% in 2013, with 93% of the total (67 of 72) being primarily about Snowden. Although it

is well known that news is event driven, the discovery of a Granger causal relationship between
article volume and mean corpus similarity is a novel finding of our work.

Related Work

Our approach is similar in spirit to King et al. (6) in that both their work and this paper examine
the effect of news on public attention. However, our work yields several novel results. We posit

the hyper-concentrated news period, and show that hyper-concentrated news is a Granger causal
precursor to legislation. Our analysis applies to a larger population than the outlets used by King
et al., since our data sources (see section on Data Sources) (13, 14) enjoy wide readership. We

measure public perception annually rather than over a period of weeks, as King et al. do. We
distinguish between fact-based reporting and framing, and demonstrate that framing in itself is
a Granger causal predictor of public attention and legislation.

In addition, we note that King et al. (6) artificially created localized short duration hyper-

concentrated news periods in their work. The fact that these short duration hyper-concentrated
news periods did not G-cause legislation motivates the question of how long a hyper-concentrated

news period must last in order to have such an influence. In our data, we found G-causality be-
ginning from a hyper-concentrated news duration of eleven months, and consequently employ
this threshold uniformly throughout the paper.

Our conception of a hyper-concentrated news period is consistent with the idea of punctuated
equilibria of media attention introduced by Baumgartner and Jones (8), and the notion of an
availability cascade posited by Kahneman (15). Finally, we note that Jacoby (16) observes

correlation between news coverage and legislation in a particular domain (Bankruptcy). We


present several novel results that build on this body of work. We show that periods of macro-
mutation (8) between punctuated equilibria, and availability cascades, may be brought about by

news framing, without event-based drivers. Whereas earlier work, such as by Baumgartner et
al. and Edwards and Wood (10), discusses Granger causal effects between media coverage and
Congress, we demonstrate that punctuated equilibria extend to sustained public attention and

legislative reaction. Further, existing literature does not explicitly model the Granger causal
link between punctuated equilibria and legislation, but restricts itself to measuring reactions by
Congress and the President (10). In contrast, we establish Granger causality between hyper-

concentrated news periods and federal legislation.

Existing literature investigates various aspects of framing, and the terms “frame” and “fram-
ing” are consequently used to refer to various levels of analysis. For instance, Benford and

Snow (17) identify three core framing tasks: diagnostic, prognostic, and motivational framing.
Further, collective action frames have been defined corresponding to the generation of interpre-
tive frames that differ from and challenge existing ones. Existing work further studies “injustice

frames” (17) as a particular subset of collective action frames that call attention to the victims of

a given perceived injustice, and amplify their perceived suffering. Frame amplification (beyond
injustice frames) and extension in particular have also been studied (17). The term “framing” is

sometimes used to refer to tactics (17) that invoke human mental processes that lead members
of the public to selectively focus on certain problems rather than on others. We acknowledge
that our measures of framing do not probe such fine-grained processes, and our use of the word

“frame” does not refer to such analyses.

Materials and Methods

This section describes novel algorithms and methods introduced in our work. The supplement
provides additional details.

Dataset Collection

We describe our data sources, method for domain dataset generation, and dataset quality evalu-
ation below.

Data Sources

We used publicly accessible Application Programming Interfaces (APIs), specifically those of

The New York Times (18) and The Guardian (19), to create our news datasets. In addition to
the large volume of relevant news made available by these two publications (see Section S1),
our choice is motivated by their well-documented influence on public attitudes and perception

(2,20–23). We note that The New York Times has previously been shown to influence legislation
(24), making it an ideal choice for our study.

Domain Dataset Generation

We define domain news as referring to all news publishing primarily related to a specific do-
main (see Section S1 and Table S1). We considered an article as belonging to a domain if and

only if the article could not have been published with the domain component removed. An
article is called a positive or negative for a specific domain according to whether it belongs or
does not belong to the domain. As an example, consider the article James Clapper on Don-

ald Trump, Edward Snowden, torture and “the knowability of truth”, which is a true negative
from the Surveillance domain. We code it as negative because whereas the article is relevant to
Surveillance (in that it mentions Edward Snowden), it addresses several other subjects as well.

We therefore decided that the article would qualify as news with the Surveillance component
removed.

For each domain, we extract news data during the time period b (denoting the beginning) to e

(denoting the end), of our period of interest. We are interested in creating domain datasets for
two tasks: (i) detecting framing changes and (ii) examining news preceding federal legislation.
For the first task, our period of interest is ten years before and after an independently docu-

mented ground truth framing change positive (see Section S3.1). For federal legislation, our
period of interest is ten years before a law was enacted.

To create our datasets for each domain, we began with a simple term search for the domain of

interest, for instance, “surveillance.” We then extract a random subset of the articles returned by
this search (we used a cardinality of 100 for the results presented in this paper), and manually
code these into domain positives and negatives. Using the positives and negatives from our

manual coding, we train an initial binary Random Forest (25) classifier for each domain.

We define our period-specific universal set $U_{b,e}$ as the set of news articles retrieved via either
API and published during the time period b to e.

Using the Random Forest classifier we trained in the previous step, we extract $U^{+}_{b,e}$, the set
of domain positives in this period. We also extract $U^{-}_{b,e}$, the set of hard domain negatives in this
period. Hard negatives (26) are negatives that lie within a threshold distance from the classifier’s
decision boundary. We follow the procedure detailed in (26) to extract hard negatives for each
domain.

We iterate this training procedure until the cardinality of the returned positive set $U^{+}_{b,e}$ does not
change. This final positive set forms our domain dataset. We find that this iterative training

approach increases dataset quality over a single-shot classifier.
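
The iteration described above can be sketched as a fixed-point loop. This is a minimal sketch under stated assumptions: a toy vocabulary-overlap scorer stands in for the Random Forest classifier, and the names `iterate_to_fixed_point` and `fit_vocabulary_scorer`, the 0.5 decision boundary, the `margin` parameter, and the toy articles are all hypothetical.

```python
def fit_vocabulary_scorer(pos, neg):
    """Toy stand-in for a classifier: the score is the share of an article's
    words that occur in positive training articles but not in negative ones."""
    neg_words = {w for a in neg for w in a.split()}
    vocab = {w for a in pos for w in a.split()} - neg_words

    def score(article):
        words = set(article.split())
        return len(words & vocab) / len(words)

    return score


def iterate_to_fixed_point(universe, seed_pos, seed_neg, fit, margin=0.1, max_rounds=20):
    """Retrain until the number of predicted positives stops changing."""
    pos, neg = set(seed_pos), set(seed_neg)
    previous_size = None
    predicted_pos = set()
    for _ in range(max_rounds):
        score = fit(pos, neg)
        scores = {a: score(a) for a in universe}
        predicted_pos = {a for a, s in scores.items() if s >= 0.5}
        # Hard negatives: negatives within `margin` of the decision boundary.
        hard_neg = {a for a, s in scores.items() if 0.5 - margin <= s < 0.5}
        if len(predicted_pos) == previous_size:
            break
        previous_size = len(predicted_pos)
        pos |= predicted_pos
        neg |= hard_neg
    return predicted_pos


# Hypothetical universal set for one period, reduced to three-word "articles".
universe = [
    "nsa surveillance program",
    "nsa surveillance wiretap",
    "wiretap surveillance court",
    "court football match",
    "football match result",
]
domain_dataset = iterate_to_fixed_point(
    universe,
    seed_pos=["nsa surveillance program"],
    seed_neg=["football match result"],
    fit=fit_vocabulary_scorer,
)
```

In this toy run the positive set grows over successive rounds and then stabilizes, which is the stopping condition described in the text; any classifier exposing a comparable score could be passed as `fit`.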

In addition, we point out that this supervised classification procedure enhances performance
over the keyword based search approach, e.g., as used by King et al. (6), since, as shown by previous
work (27, 28), ground truth positives in a particular domain do not necessarily contain high en-

tropy keywords. Further, Murukannaiah et al. (28) show that the use of supervised classification
uniformly enhances recall over the use of simple keyword based approaches for domains with
sufficient seed data, generally obtainable using our iterative procedure.

Dataset Quality

Our dataset generation process resulted in a dataset with a median per-domain precision (the
fraction of articles in a domain dataset that are relevant to that domain, see Section S1) of 0.93
and a median per-domain interrater agreement index (Kappa) (29) of 0.87.

To gain confidence that the articles obtained using the approach described above are relevant
to each domain, we manually reviewed samples from each domain dataset. In particular, to
estimate the precision we coded random samples of 200 each from The New York Times and

The Guardian from each domain, for a total of 200 × 10 = 2,000 samples. Table S1 shows
interrater agreement, measured using Cohen’s Kappa (29), for each of our domains.

We did not directly measure recall. However, since news publications have a strong incentive
to broadly cover events, and The New York Times and The Guardian have the largest and fifth
largest circulations in America and the world, respectively (13, 14), we assume that sufficiently

many relevant articles are included in our corpus.

Discriminative Keywords

We are interested in identifying and summarizing those aspects of a domain’s current framing
that distinguish it from the domain’s framing at a previous time period. To this end, we adopt
the idea of an entropic formulation of discriminative keywords, as proposed by Sheshadri et

al. (27).

Below, a corpus T is a set of news articles. Specifically, given two disjoint sets of news articles
$T_1$ and $T_2$, we identify a set of k n-grams that yield the largest Information Gain (IG) (30) in
the combined corpus $T = T_1 \cup T_2$. Let A be an article in corpus T. Let $x_i$ represent any of the
possible m n-grams in T. Let $S(x_i, T) = \{A \in T \mid x_i \in A\}$ be the set of articles in corpus T in
which the n-gram $x_i$ appears. We use a $|T| \times m$ term frequency (TF) matrix representing the
corpus to calculate H, the information entropy of T.

$$IG(T, x_i) = H(T) - \frac{|S(x_i, T)|}{|T|}\, H(S(x_i, T)) \tag{1}$$

Following Entman’s (9) formulation, this approach weights n-grams that are specific to a par-
ticular corpus more highly than n-grams that are common to both corpora. A quick intuition for

the approach is obtained by considering that the unigram “Snowden” has a high utility in distin-

guishing Surveillance articles published after January 1st 2014 from those before then, but the
unigram “surveillance” is common to articles from both periods and therefore does not.

Since keywords from a particular news corpus distinguish it from others, they may be said to
represent the “concentration” of news in that corpus.
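
As a worked illustration of Eq. 1, the sketch below scores n-grams on a toy corpus. It assumes that H is the Shannon entropy of the corpus membership labels (earlier versus later corpus) and that each article is represented as a set of n-grams; the function names and toy articles are hypothetical. The sketch follows Eq. 1 as written and does not add a term for articles that lack the n-gram.

```python
import math


def label_entropy(labels):
    """Shannon entropy (in bits) of a list of corpus-membership labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        h -= p * math.log2(p)
    return h


def information_gain(articles, labels, ngram):
    """Eq. 1: IG(T, x) = H(T) - |S(x, T)| / |T| * H(S(x, T))."""
    subset = [lab for art, lab in zip(articles, labels) if ngram in art]
    return label_entropy(labels) - len(subset) / len(labels) * label_entropy(subset)


# Toy combined corpus: label 1 marks the earlier corpus, label 2 the later one.
articles = [
    {"surveillance", "court"},
    {"surveillance", "nsa"},
    {"surveillance", "snowden"},
    {"surveillance", "snowden", "nsa"},
]
labels = [1, 1, 2, 2]

ig_snowden = information_gain(articles, labels, "snowden")
ig_surveillance = information_gain(articles, labels, "surveillance")
```

Under these assumptions "snowden", which appears only in the later corpus, receives the maximum score, whereas "surveillance", which appears in every article, receives none, matching the intuition given above.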

Corpus and Document Similarity

We estimate the mean similarity of a corpus of documents as the mean of its pairwise document
similarities, using all $\binom{n}{2}$ combinations from the corpus. In order to estimate similarity between


two documents, we adopt doc2vec (12), a well-known tool that generates a vector representation
(called a “paragraph vector”) of a document. Specifically, we use a standard doc2vec model

(31), pretrained on the entire Wikipedia corpus to compute a vector for each document in our
corpus. We define the pairwise similarity of two documents as the cosine similarity of their
respective document vectors (32).
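
The computation of mean corpus similarity can be sketched as follows. The sketch assumes that document vectors have already been produced (for example, by a doc2vec model) and uses short hypothetical vectors in their place; the function names are likewise illustrative.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_similarity(vectors):
    """Mean cosine similarity over all unordered pairs of document vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical two-dimensional "paragraph vectors" for three documents.
toy_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_mean = mean_similarity(toy_vectors)
```

Because every unordered pair is visited once, the cost grows quadratically with corpus size, which is manageable for annual domain corpora of the sizes discussed here.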

Predicting Likelihood of Legislation

We conduct experiments that predict the likelihood of legislative activity given an input news
pattern, using supervised classification with a Naïve Bayes classifier.

We predict an annual binary legislative label. For a given news domain in a given year, the label

represents a prediction about whether there is likely to be significant legislative activity based
on the historical news pattern in that domain. We use publishing volume and mean similarity
as our predictive features. Since the ranges of possible values these features take vary quite

dramatically across domains (Figs. 1, 2, 4, 5), the raw values of the features do not capture
the presence or absence of a hyper-concentrated news period and are therefore not predictive.
Instead, we first normalize the annual value of each feature by dividing it by the maximum value

that the feature takes on within the domain under consideration. Then, since changes in value
of these features across successive years are a more promising basis than raw feature values for

evaluating the presence of a hyper-concentrated news period, we compute the annual difference
of each feature, and use the absolute values of the differences in our classifier. As an intuitive
illustration, consider that a yearly news volume of 80 articles or a mean similarity of 0.2 is not

predictive in itself (given the variation in these values across domains), but an increased volume
of 80 articles from last year or an increased similarity of 20% may be predictive.
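
The feature construction described above amounts to two steps, sketched below: normalize each annual value by the maximum within the domain, then take absolute year-over-year differences. The function name and the toy volumes are hypothetical, and the sketch assumes the maximum is positive.

```python
def change_features(annual_values):
    """Normalize by the within-domain maximum, then take absolute annual differences."""
    peak = max(annual_values)
    normalized = [v / peak for v in annual_values]
    return [abs(later - earlier) for earlier, later in zip(normalized, normalized[1:])]


# Hypothetical annual article counts for one domain over four years.
toy_volume = [20, 40, 80, 60]
toy_changes = change_features(toy_volume)
```

The same function applies unchanged to annual mean similarity, so that both features are expressed on a common scale across domains.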

Our training data thus consists of temporally separated pairs of values for each feature. Posi-

tives in our classifier are observations from years in which legislation was enacted, and simi-
larly, negatives are observations from years in which legislation was not enacted. For every pair
of observations (for every feature), we construct discrete probability distributions and normal-

ize them over each feature, to arrive at a pair of distributions that capture the change pattern
exhibited by the feature over years with and without legislation.

We then use these distributions to construct a joint change distribution of the two observa-

tions. This enables us to arrive at the conditional probability using a simple Bayesian formula-
tion:

Let $(x_1, x_2)$ be a feature vector pair. Then:

$$P_f(x_2 \mid x_1) = \frac{P_f(x_1 \mid x_2)}{\sum_{x_2} P_f(x_1 \mid x_2)} \tag{2}$$

We adopt the Naïve Bayes assumption to arrive at a single estimate from all the features:

$$P(x_2 \mid x_1) = \prod_{f} P_f(x_2 \mid x_1) \tag{3}$$

We use a simple binary threshold to evaluate a binary legislative or not legislative label, $P(x_2 \mid x_1) < t$.

For our experiments, we adopt the Leave One Out approach, employing data from all domains
but one for training and using the remaining domain as our test data.
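
The evaluation protocol can be sketched as a generator of train and test splits in which each domain is held out in turn. The function name, the row layout of (feature pair, label), and the toy values are hypothetical and serve only to illustrate the splitting.

```python
def leave_one_domain_out(data_by_domain):
    """Yield (held_out_domain, training_rows, test_rows) for every domain."""
    for held_out in data_by_domain:
        train = [row
                 for domain, rows in data_by_domain.items()
                 if domain != held_out
                 for row in rows]
        yield held_out, train, data_by_domain[held_out]


# Hypothetical rows: ((volume change, similarity change), legislative label).
toy_data = {
    "Surveillance": [((0.5, 0.1), 1)],
    "Drones": [((0.1, 0.0), 0)],
    "Obesity": [((0.3, 0.2), 1)],
}
splits = list(leave_one_domain_out(toy_data))
```

Each split keeps the held-out domain entirely out of training, so a prediction for that domain relies only on change patterns learned from the other domains.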

Framing Density

We contribute the notion of framing density, measured by entropic news keywords. We use
entropy between pairs of temporally disparate news corpora (as described earlier) to rank in-

dividual n-grams for their effectiveness in distinguishing the later corpus from the earlier one.
Entropic keywords therefore represent the “concentration” of a news domain at a given time.
We define the annual framing density of a given domain as the number of keywords required

to attain 99.7% of dataset entropy, corresponding to three standard deviations of a normal dis-
tribution (even though the present distribution is not necessarily normal, our intention is to
capture the bulk of the probability mass), between the present annual corpus and the preceding

one.
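
A minimal sketch of the framing density computation follows. It assumes that each keyword has already been assigned a non-negative score representing its share of dataset entropy; the function name and the toy scores are hypothetical.

```python
def framing_density(keyword_scores, coverage=0.997):
    """Smallest number of top-ranked keywords whose scores reach `coverage` of the total."""
    total = sum(keyword_scores)
    running = 0.0
    for count, score in enumerate(sorted(keyword_scores, reverse=True), start=1):
        running += score
        if running >= coverage * total:
            return count
    return len(keyword_scores)


# Hypothetical score profiles: one dominant keyword versus a spread-out profile.
concentrated = framing_density([998, 1, 1])
diffuse = framing_density([5, 3, 2])
```

A low value indicates that a handful of keywords account for nearly all of the difference between the two annual corpora, whereas a high value indicates that the difference is spread across many keywords.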

Framing Polarity

We define the framing polarity of a news corpus as the average sentiment polarity (33) of ad-
jectives and adverbs within it. Since adjectives and adverbs cannot be used to state underlying
facts or events, they represent artifacts of how an event is framed. In order to measure framing

polarity of a corpus, we first use the Stanford CoreNLP parser (34) to identify all adjectives and
adverbs in it. We then retrieve the sentiment polarity of each adjective and adverb from (33),
and finally average them to arrive at our framing polarity. We calculate annual framing polarity

within each domain, by constructing annual corpora from the full domain corpus.
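
The polarity computation can be sketched as follows, assuming that tokens have already been part-of-speech tagged with Penn Treebank tags (adjectives begin with JJ, adverbs with RB) and that a sentiment lexicon maps words to polarity scores. The function name, the tiny lexicon, and the decision to skip words missing from the lexicon are assumptions made for illustration.

```python
def framing_polarity(tagged_tokens, lexicon):
    """Average lexicon polarity over adjectives (JJ*) and adverbs (RB*)."""
    scores = [lexicon[word.lower()]
              for word, tag in tagged_tokens
              if tag[:2] in ("JJ", "RB") and word.lower() in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical tagged sentence and polarity lexicon.
toy_tokens = [("The", "DT"), ("controversial", "JJ"), ("program", "NN"),
              ("quietly", "RB"), ("expanded", "VBD")]
toy_lexicon = {"controversial": -0.5, "quietly": 0.25}
toy_polarity = framing_polarity(toy_tokens, toy_lexicon)
```

Nouns and verbs are ignored by construction, so the resulting value reflects how events are described rather than which events are reported.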

Framing Change Detection

We demonstrate that our measures of framing polarity and framing density detect framing changes.
We show that polarity of framing and framing density change dramatically during ground truth

framing changes, while remaining largely stable in their absence.

Experiments and Discussion

We summarize our findings in the subsection below. Next, we describe comparisons with polit-
ical framing. Finally, we discuss validation of our hypothesis using the comparative method in
succeeding subsections.

Main Findings

To establish the Granger causal effect of hyper-concentrated news on legislation, we considered

all federal legislation enacted from 1991 to 2016. Our choice was motivated by the
fact that our APIs provide access to data beginning in 1991. Approximately eleven percent
of federal legislation in this period was Granger-caused by hyper-concentrated news periods.

Whereas we do not claim hyper-concentrated news periods to be a necessary condition for


legislation, we conclude that the probability of legislation being Granger caused by a hyper-
concentrated period is statistically significant.

We illustrate our approach and results in Fig. 5, using a compelling example from the domain
of Child Privacy. We use the abbreviation “HC Period” to refer to hyper-concentrated news
periods in this and other figures. The primary laws governing children’s privacy protection in

the United States are COPPA (35) and FERPA (36). COPPA was introduced in April 1999,
and went through a series of amendments from 1999 through 2002, and again from 2009 to

2015. FERPA was enacted in 1974. Due to the unavailability of children’s privacy news articles
before 1974 (a keyword search via The New York Times API returns zero articles), we restrict

our analysis to COPPA. The Granger causal variables of interest in Fig. 5 are annual news vol-
ume (dotted blue), and mean pairwise article similarity (dashed red). We represent the presence
or absence of COPPA legislative activity by a binary time series depicted with solid brown in

Fig. 5. Our Granger causality tests are therefore conducted between pairs of independent and
(hypothesized) dependent time series, such as between news volume (dotted blue) and COPPA
legislation (solid brown) in Fig. 5. We observe that the number of news articles published on

the topic more than doubled between 1991 and 1999, coupled with a simultaneous increase in
mean article similarity. Following this hyper-concentrated period, COPPA legislation was pro-
mulgated through the period ending in 2002. Another hyper-concentrated period occurs before

the revival of interest in COPPA, as seen in the large number of amendments in 2009.

We tested the Granger causal flow depicted in Fig. 3 over the set of domains obtained as de-
scribed in the previous paragraph (as an example, using news volume and legislation as our

time series), yielding a median F-statistic and critical value of 5.63 and 4.45, respectively (see
Table S2 for a full list). Our results motivate the predictive utility of news as a Granger causal
set of independent variables that influence legislation. To further explicate this connection, we

conducted a set of supervised learning experiments to see if legislation in a given domain may
be predicted using training data only from other domains. Using the Granger causal variables
described above, we constructed joint probability distributions that capture the change pattern

exhibited by each feature over both legislative years and years with no legislation. We then
used a Leave One Out evaluation approach to predict legislation in each domain using every
other domain as training data (described in the Materials and Methods Section). We obtained

a median area under the Receiver-Operating Characteristic (25) curve of 87.5% over our set of
domains.

Google Trends (37) estimates public interest in a topic by measuring related searches
worldwide over chosen time periods. Since 89% of US (38) and 82% of UK residents (39) use
the Internet and 74% of Internet users use Google as their primary search engine (40), we posit that
Google Trends is a representative measure of public attention. We conducted Granger causality
tests for our domains between article volume and Google Trend volume, yielding a median F-

statistic and critical value of 5.72 and 4.39, respectively. We illustrate this finding using the
patterns depicted in Fig. 2 and Fig. 5.

The LGBT rights domain, depicted in Fig. 1, illustrates the Granger causal influence of framing

on public opinion. Note that the negativity of framing drops in the year 2004–2005, which coin-
cides with a measured framing change from a ground truth survey (41). The figure demonstrates
an inverse relationship between framing negativity and public approval, as framing changed

from one emphasizing morality to the current-day focus on equal rights. We note that following
this trend, major LGBT legislation legalizing same-sex marriage in fifty states was promulgated
in 2016.

This result is noteworthy in that it is the polarity of news framing in the area, rather than spe-
cific news events, that Granger causes legislation at the 0.05 significance level. This finding is
reinforced by the fact that event-based drivers cannot influence framing polarity, since only ad-

jectives and adverbs, taken here to be artifacts of how a domain is framed, contribute to framing
polarity.

To gain confidence in our findings, we address an alternative hypothesis of note, namely, that

political framing Granger causally influences news framing, and not vice versa. We do not in
general deny that such a Granger causal direction may exist—indeed such an effect has been
demonstrated in prior work using news data collected from print newspapers (42). However,

we show that this effect is not Granger causally significant for our data over the domains we

examine.

In order to do so, we downloaded the Republican and Democratic Party Platforms from 1984

to 2016, and used a simple term search procedure to identify paragraphs relevant to a particular
domain. We then identified framing keywords and framing polarity of the returned paragraphs.
A representative example for the LGBT domain is shown in Fig. 1. G-causality for the (Political
Framing, News Framing), (Political Framing, Public Approval), and (Political Framing,
Legislation) tuples was insignificant for this example, in stark contrast to the (News Framing,
Public Approval) and (News Framing, Legislation) tuples, consistent with our hypothesis. We refer

the reader to Table S2 for a full list comparing the Granger causal effect of news with the effect
of political framing on legislation, for the domains we consider. We describe full details of this
study in the subsection below.

Fig. 4 depicts framing density versus time for the Surveillance domain, around the period of the
Snowden revelations. Note that in 2013, a single n-gram (Snowden) suffices to represent 99.7%
of dataset entropy. Further, fig. S7 depicts framing density for the four domains (Smoking,

Surveillance, Obesity, and LGBT Rights) that indicate framing change versus random news,
demonstrating that our measure of framing density distinctively picks up ground truth framing
changes. We found Granger causation between framing density, public attention, and legisla-

tion.

It is worthwhile to point out that the Snowden revelations, which we use in Fig. 4 to depict
framing density, were an event-based driver of news, and not in themselves a framing change.

However, a 2016 Pew Research survey (43) demonstrates that following the Snowden revela-
tions, news coverage of Surveillance changed from a narrative focusing on national security to
one focusing on individual liberty and personal privacy. We further note that event-based news

drivers ordinarily cause framing changes (8).

Further, we point out that whereas the event of the Snowden revelations took place in late 2013,
the legislative response (The Freedom Act) was enacted two years later, in 2015. We show

that polarity of negative framing in Surveillance increased following the Snowden revelations
(Fig. 4), and remained high until 2015, corresponding exactly with our hyper-concentrated pe-
riod, after which legislation was promulgated and framing polarity increased.

Because news events cannot affect framing polarity (which depends purely on adjectives and
adverbs), and because both framing polarity and framing density show distinctive patterns during
framing changes (figs. S6 and S7), we conclude that framing Granger causes legislation.

Hyper-Concentrated News versus Political Framing as a Granger Cause of Legislation

This section details the full results of our Granger causality study. We begin with the complete

list of federal legislation promulgated from 1991 to 2016. For each law, we compiled a news
dataset according to the procedure detailed in the Domain Dataset Generation section.

From this list, we manually identified all domains which featured a hyper-concentrated news

period. Table S2 depicts this list. We found ten such cases, corresponding to approximately 11%
of all federal legislation in the period we consider. We then conducted Granger causality tests
between news volume and similarity in these periods, and the corresponding federal legislation.

We find a Granger causally significant result in each case. Our threshold for significance is
α = 0.05. For each domain, Table S2 lists the smallest significance level at which we obtain a
G-causally significant result.

We address the alternative hypothesis that political framing G-causes legislation. In order to
do so, we downloaded the Democratic and Republican party platforms from 1984 to 2016, and

measured political interest in the relevant domain as the number of paragraphs devoted to the
domain in an annual platform (we did not normalize to document length since there was no

sizeable variation in length). We then conducted G-causality tests with federal legislation in
the same domains. For nine out of ten cases, we discovered that political framing did not G-
cause legislation at the α = 0.05 level. Interestingly, for three of these domains, we obtained

a p = 0.20 for the hypothesis that political framing G-causes legislation. However, we note
that this result is much weaker than the G-causal significance we obtain for hyper-concentrated
news.

Some domains, such as Cyberbullying, Drones, and the HTML data leak, remain unmentioned in
both party platforms throughout the relevant period. Since the political parties do not
mention these domains, we conclude that there was no measurable political framing of them
(Table S2). Therefore, these domains do not affect our hypothesis, given the significant
G-causal measures between hyper-concentrated news characteristics and federal legislation.
In Table S2, an entry in bold indicates a significant Granger causal effect of political framing.

We are unable to highlight all our domains in the main paper or supplementary material due to
space constraints. We encourage the reader to refer to our data and code repository available at
https://github.com/karthiksheshadri/Hyper-Concentrated-News from which all our results may

be reproduced.

Framing Change Detection

Fig. S6 shows framing polarity, and fig. S7 depicts framing density for the framing change
positives, as well as for a random control. To generate the random control, we retrieved a
sample of 991 articles from the NYT API with a null query for each year from 1990 to 2016.

As is evident, framing polarity shifts dramatically around ground truth framing changes, while
remaining close to constant, and in line with the framing polarity of random news, during
periods without a framing change. As an example, whereas the framing polarity of LGBT news
between 1990 and 2000 remains fairly similar to that of random news in that period (fig. S6),
it drops dramatically in 2004, the year in which the framing change occurred.

Similarly, our measure of framing density for the framing change positives, shown in fig. S7,
exhibits an exponential entropy decline, compared with an approximately linear decline in the
case of random news.

This observation corroborates our finding that our measures of framing change detection suc-
cessfully isolate ground truth framing changes.

Comparative Evaluation

Finally, we evaluate the validity of our hypothesis using the comparative method (44). We
conducted tests using both the most similar and most different research designs. Full results
from both designs are presented in Tables S3 and S4 of the supplementary material. We
summarize our research design and findings here.

The most similar paradigm (44) relies on comparing highly similar cases that differ only in the
dependent variable, as well as in a single or only a few independent variables. Given that the
dependent variables differ, the paradigm assumes that the few differing independent variables

must be responsible. To use the most similar paradigm, we take advantage of the fact that
a domain is most similar to itself. To evaluate our hypothesis that hyper-concentrated news
periods Granger cause legislation, we use all our domains, and evaluate Granger causality of

the domain’s news patterns with legislation, both with and without the presence of a hyper-

concentrated period. Table S3 lists the results. In nine out of ten cases, periods that are not
hyper-concentrated within a domain did not Granger cause legislation, whereas in all ten cases,

we find Granger causality between hyper-concentrated periods and legislation.

We also evaluate our hypothesis using the most different research paradigm (44), which relies
on comparing strongly different cases, all of which however have in common the same depen-

dent variable, so that any similarity in the independent variables must explain the common value
of the dependent variable. To estimate the “difference” between our domains, we define
a distance function, namely the Euclidean distance over our news features. We use
the following news features as descriptors of each domain: (i) length of its hyper-concentrated
period, (ii) maximum, minimum, and mean annual article volume (used as three separate features),
(iii) maximum, minimum, and mean framing polarity (used as three separate features), and
(iv) maximum, minimum, and mean framing density (used as three separate features). Note
that we do not normalize the raw values of our features, since they characterize the domain and
we are making inter-domain comparisons. However, we normalize our overall distance to a

scale of zero to one. Our data contains ten domains with hyper-concentrated periods. We compute
all $\binom{10}{2} = 45$ distances, and pick the top ten to represent our most different domain pairs,
shown in Table S4. To confirm that these pairs represent notably differing domains, we use the

Kolmogorov-Smirnov test (45) to test for similarity on the annual article volume feature. None
of our domain pairs achieve significant similarity, confirming that our distance metric yields
meaningful results. However, since in each of these domains, federal legislation was enacted,

and further since each domain contains a hyper-concentrated news period (the only common
independent variable), we conclude that our hypothesis holds under the most different research
paradigm.
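
The pair selection described above can be sketched as follows. The descriptor values are random placeholders rather than our data, the function name `most_different_pairs` is hypothetical, and dividing by the largest distance is only one possible reading of normalizing the distance to a scale of zero to one.

```python
import itertools
import math
import random

def most_different_pairs(features, top_k=10):
    """Return the top_k most distant domain pairs and the total pair count.

    `features` maps a domain name to its list of raw (unnormalized) descriptors.
    Distances are Euclidean and are divided by the largest distance so that
    they fall between zero and one.
    """
    pairs = list(itertools.combinations(sorted(features), 2))
    raw = {pair: math.dist(features[pair[0]], features[pair[1]]) for pair in pairs}
    largest = max(raw.values())
    scaled = {pair: d / largest for pair, d in raw.items()}
    ranked = sorted(scaled.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k], len(pairs)

# Placeholder descriptors: ten hypothetical domains with ten features each
# (period length, plus max/min/mean of volume, polarity, and density)
rng = random.Random(0)
features = {
    f"domain_{i}": [rng.uniform(0, 100) for _ in range(10)] for i in range(10)
}

top_pairs, n_pairs = most_different_pairs(features)
print(n_pairs, len(top_pairs))
```

Because the raw features are not rescaled, descriptors with large magnitudes, such as annual article volume, dominate the distance under this sketch.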

In this context, let us address a concern that our results rely on a particular choice of domains.

Note that we exercised no explicit choice in collecting our original set of domains (we ex-
amined all domains that featured federal legislation in the periods for which the NYT and

Guardian APIs provide data). We then analyzed the ten domains among these that featured a
hyper-concentrated news period. As we show in Table S2, hyper-concentrated news in all of
these domains Granger causes legislation at the α = 0.05 level. Moreover, our comparative
analysis demonstrates through the most similar (Table S3) and most different (Table S4)
paradigms that our hypothesis remains valid despite wide variation in the domains.

Conclusion

Our data supports our conclusion that hyper-concentrated news periods, brought about both by
driver events and framing changes, Granger causally influence public attention and federal
legislation. We acknowledge, however, that our analysis does not disprove reverse causality
and cannot model other unobservable confounding factors.

References

1. A. Gunther, The persuasive press inference effects of mass media on perceived public opin-
ion. Communication Research 25, 486–504 (1998).

2. D. Mutz, J. Soss, Reading public opinion: The influence of news coverage on perceptions

of public sentiment. Public Opinion Quarterly 61, 431–451 (1997).

3. C. Hoadley, H. Xu, J. Lee, M. B. Rosson, Privacy as information access and illusory control:
The case of the Facebook news feed privacy outcry. Electronic Commerce Research and

Applications 9, 50–60 (2010).

4. C. Taylor, After privacy uproar, Quora feeds will no longer show data on what other users
have viewed (2016). https://goo.gl/9wG65R.

5. S. Iyengar, D. Kinder, News that Matters: Television and American Opinion (University of
Chicago Press, 2010).

6. G. King, B. Schneer, A. White, How the news media activate public expression and influ-
ence national agendas. Science 358, 776–780 (2017).

7. C. Granger, Investigating causal relations by econometric models and cross-spectral meth-


ods. Econometrica: Journal of the Econometric Society 37, 424–438 (1969).

8. F. Baumgartner, B. Jones, P. Mortensen, Punctuated equilibrium theory: Explaining stabil-

ity and change in public policymaking. Theories of the Policy Process 8, 59–103 (2014).

9. R. Entman, Framing: Toward clarification of a fractured paradigm. Journal of Communi-


cation 43, 51–58 (1993).

10. G. Edwards, D. Wood, Who influences whom? The President, Congress, and the media.

American Political Science Review 93, 327–344 (1999).

11. P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining (Pearson Education India,
New Delhi, 2006).

12. Q. Le, T. Mikolov, Proceedings of the 31st International Conference on Machine Learning
(International Machine Learning Society, Beijing, 2014), pp. 1188–1196.

13. Wikipedia, The New York Times (2001). https://en.wikipedia.org/wiki/The_New_York_Times.

14. Wikipedia, The Guardian (2002). https://en.wikipedia.org/wiki/The_Guardian.

15. D. Kahneman, Thinking, Fast and Slow (Farrar, Straus and Giroux, New York, 2011).

16. M. Jacoby, Negotiating bankruptcy legislation through the news media. Houston Law Re-
view 41, 1092-1144 (2004).

17. R. Benford, D. Snow, Framing processes and social movements: An overview and assess-
ment. Annual Review of Sociology 26, 611–639 (2000).

18. NYT, Developer APIs (2016). http://developer.nytimes.com/.

19. The Guardian, Guardian Open Platform (2016). http://open-platform.theguardian.com/.
Accessed: 2016-3-3.

20. S. Althaus, D. Tewksbury, Agenda setting and the new news patterns of issue importance
among readers of the paper and online versions of the New York Times. Communication

Research 29, 180–207 (2002).

21. D. Drezner, H. Farrell, Web of influence. Foreign Policy 145, 32-41 (2004). http://www.
jstor.org/stable/4152942.

22. G. Golan, Inter-media agenda setting and global news coverage. Journalism Studies 7,
323–333 (2006).

23. S. Kiousis, Explicating media salience: A factor analysis of New York Times issue coverage
during the 2000 U.S. presidential election. Journal of Communication 54, 71–87 (2004).

24. S. Rahbar, “The Evil of the Age”: The influence of the New York Times on anti-abortion
legislation in New York, 1865–1873. Pennsylvania History Review 23, 146–176 (2016).

25. L. Breiman, Random forests. Machine Learning 45, 5–32 (2001). http://dx.doi.org/10.

1023/A:1010933404324.

26. J. Henriques, J. Carreira, R. Caseiro, J. Batista, Proceedings of the IEEE International


Conference on Computer Vision (IEEE Computer Society, Edmonton, Alberta, Canada,

2013), vol. 10, pp. 2760–2767.

27. K. Sheshadri, N. Ajmeri, J. Staddon, Proceedings of the 15th Privacy, Security and Trust

Conference (Calgary, Alberta, Canada, 2017), pp. 159–167.

28. P. Murukannaiah, C. Dabral, K. Sheshadri, E. Sharma, J. Staddon, Proceedings of the Hot


Topics in Science of Security: Symposium and Bootcamp, ACM (Association for Comput-
ing Machinery, Hanover, 2017), pp. 35–44.

29. A. Viera, J. M. Garrett, Understanding inter-observer agreement: The Kappa statistic. Fam-
ily Medicine 37, 360-363 (2005).

30. H. Harris, International Symposium on Artificial Intelligence and Mathematics (Fort Laud-

erdale, 2002).

31. J. H. Lau, Pre-trained doc2vec models (2017). https://tinyurl.com/yddn2pwb.

32. Wikipedia, Cosine similarity (2017). https://en.wikipedia.org/wiki/Cosine_similarity.

33. S. Baccianella, A. Esuli, F. Sebastiani, Proceedings of the 7th ELRA International Confer-

ence on Language Resources and Evaluation (European Language Resources Association,


Valletta, Malta, 2010), pp. 2200–2204.

34. C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, D. McClosky, Association for

Computational Linguistics (ACL) System Demonstrations (2014), pp. 55–60. https://nlp.


stanford.edu/pubs/StanfordCoreNlp2014.pdf.

35. FTC, Children’s Online Privacy Protection Rule (1998). https://

www.ftc.gov/enforcement/rules/rulemaking-regulatory-reform-proceedings/
childrens-online-privacy-protection-rule.

36. US Department of Education, Family Educational Rights and Privacy Act (1974).
https://tinyurl.com/ybohwmfm.

37. P. Trasborg, The Google Trends API (2018). https://www.npmjs.com/package/


google-trends-api.

38. 11% of Americans don’t use the Internet. Who are they? (2018). http://www.pewresearch.
org/fact-tank/2018/03/05/some-americans-dont-use-the-internet-who-are-they/.

39. Internet users in the UK: 2017 (2017). https://tinyurl.com/yadnxcf6.

40. C. Mangles, Search engine statistics (2018). https://www.smartinsights.com/


search-engine-marketing/search-engine-statistics/.

41. S. Engel, Frame spillover: Media framing and public opinion of a multifaceted LGBT rights
agenda. Law and Social Inquiry 38, 403–441 (2013).

42. R. D. Flores, Taking the law into their own hands: Do local anti-immigrant ordinances

increase gun sales? Social Problems 62, 363–390 (2015).

43. Pew, The state of privacy in post-Snowden America (2016). http://www.pewresearch.org/


fact-tank/2016/09/21/the-state-of-privacy-in-america/.

44. A. Lijphart, Comparative politics and the comparative method. American Political Science
Review 65, 682–693 (1971).

45. F. Massey, The Kolmogorov-Smirnov test for goodness of fit. Journal of the American
Statistical Association 46, 68–78 (1951).

46. A. Johnston, M. Warkentin, Fear appeals and information security behaviors: An empirical
study. Management Information Systems Quarterly 34, 549–566 (2010). http://www.jstor.
org/stable/25750691.

47. M. Cummings, R. Proctor, The changing public image of smoking in the United States:
1964–2014. Cancer Epidemiology and Prevention Biomarkers 23, 32–36 (2014).

48. S.-H. Kim, A. Willis, Talking about obesity: News framing of who is responsible for caus-
ing and fixing the problem. Journal of Health Communication 12, 359-376 (2007).

49. S. Preibusch, Privacy behaviors after Snowden. Communications of the ACM 58, 48–55
(2015).

50. J. Vanian, Drone legislation (2015). https://goo.gl/BZp7dJ.

Acknowledgments

This work was completed at the Department of Computer Science at NC State University. The
authors thank Chung-Wei Hang for valuable discussions about the contributions and for his
advice in ensuring the reproducibility of the results. The authors thank Pradeep Murukannaiah

for useful discussions. The authors thank the anonymous reviewers for helpful comments on a
previous version.

Funding: The authors thank the NC State University Laboratory for Analytic Sciences for
partial support.

Author contributions: K.S. prepared the datasets and performed the analysis. K.S. and
M.P.S. designed the evaluation approach and wrote the paper. M.P.S. led the project.

Competing interests: The authors declare that they have no competing interests.

Human or animal subjects: The project involved no human or animal subjects.

Data and materials availability: All data needed to evaluate the conclusions in the paper are
present in the Supplementary Materials. All data and code necessary to reproduce the findings
of the paper may be downloaded from https://github.com/karthiksheshadri/Hyper-Concentrated-News.

Supplementary Materials

Section S1. Dataset Quality

Section S2. Granger Causality

Section S3. Results on Detecting Framing Changes

Section S4. Results on Legislation

Section S5. Results from the Comparative Method

Fig. S6: Framing Polarity: Random versus LGBT, Surveillance, and Smoking news

Fig. S7: Framing Density: Random versus LGBT, Surveillance, and Smoking news

Fig. S8: News volume and similarity as predictors of legislation in Cyberbullying

Fig. S9: News patterns around the 2011 Facebook ID leak, and subsequent legislation

Table S1: Interrater agreement as Cohen’s Kappa

Table S2: Comparing the Granger causal effect of hyper-concentrated news against that of
political framing for legislation in our domains

Table S3: A comparative evaluation of our hypothesis using the most similar research design

Table S4: A comparative evaluation of our hypothesis using the most different research design

References (30–50)

List of Figures

Fig. 1: News framing as a Granger causal precursor to public approval changes and legislation
in the domain LGBT Rights. Public approval increases as negative framing declines. Note
that the sharp decline in framing polarity also detects the documented framing change of 2004 (46).

Fig. 2: News volume and mean article similarity as predictors of public attention in the domain
Drones. Note that public attention (measured by Google Trends) climbs sharply during the

hyper-concentrated (labelled HC in the figure) period and then declines

Fig. 3: We posit the hyper-concentrated period of domain news, characterized by high arti-
cle volume and similarity, which G-causes public attention changes and legislation. Hyper-

concentrated periods arise either due to news events, or independently of events, due to news
framing. We observe and model every link in the figure, except the Events to News link, which
is shown with a dotted arrow

Fig. 4: Framing changes are characterized by low framing density and changes in framing polarity.
The figure shows that news volume, mean article similarity, and framing density in the domain
Surveillance spike during a hyper-concentrated period (labelled HC in the figure), foreshadowing
legislation

Fig. 5: News characteristics and legislation for the domain Child Privacy. Note that during the
periods 1996 to 1999 and 2009 to 2011, news volume and similarity sharply increase together,

foreshadowing COPPA promulgation and legislation, respectively
