kshesha@ncsu.edu, mpsingh@ncsu.edu
News has been shown to influence public perception, affect technology de-
news volume and mean article similarity increase and decrease together. We
show that specific features of news, such as publishing volume, appear to in-
fluence sustained public attention as measured by annual Google Trend data,
and federal legislation. We observe that public attention changes are often
foreshadowed by periods of high news volume and similarity between articles,
which we call hyper-concentrated news periods. Finally, we contribute the mea-
Introduction and Contributions
The effect of news on public behavior has been the subject of considerable scientific interest.
Prior work has established that news framing influences public perception (1, 2), affects tech-
nology development (3, 4), and contributes to setting agendas (5). Most recently, publishing
from small news outlets has been shown to increase short-term public involvement in specific
domains (6).
Our work enhances understanding by explicitly modeling the Granger causal (G-causal) (7)
link between specific news characteristics, public opinion, and federal legislation. We note
that Granger causality captures directionality in correlation between time series but does not
correspond to “true” causality. In this work, we restrict ourselves to Granger causality and
indicate every use of the term with the qualifier “Granger” or “G.” Firstly, we demonstrate a
predictive relationship between news characteristics and federal legislation, independently of
changes in public opinion.
Secondly, we show that public and legislative reaction to news follows a punctuated equilib-
rium model (8). The punctuated equilibrium model, adopted from evolutionary biology, posits
long periods of equilibrium during which there is little change, punctuated with short dura-
tions of macromutation. Similarly, we observe that the public and the federal legislature react
substantially at discrete intervals (analogous to macromutation in the above model), rather than
uniformly and gradually. We identify a defining characteristic of news periods that elicit such
substantial reactions, namely, that they have high news-volume occurring simultaneously with
high similarity between articles. We term such periods hyper-concentrated news periods. We
note that King et al.’s (6) approach artificially created such news conditions for short time peri-
Thirdly, news reporting in general introduces subjective biases, referred to as framing. We adopt
Entman’s (9) formulation in this paper. Whereas news publishing is ordinarily event driven,
we demonstrate that hyper-concentrated news periods, combining high article volume and sim-
ilarity, can occur spontaneously, without event-based drivers, as a Granger causal effect of news
framing (see Fig. 1 for a compelling example). We find that hyper-concentrated periods brought
about by framing are equally influential in predicting public approval (defined as the fraction of
the public that approves of a particular position) and legislation. This finding demonstrates that
the framing of news is as influential as the events and facts reported on in the news.
Additionally, we demonstrate that news publishing volume within a specific domain is a reli-
able long-term predictor of public attention (the number of people who demonstrated interest
in a domain by conducting an Internet search), measured annually using Google Trend data
(Fig. 2).
The Granger causal flow we discovered is depicted in Fig. 3. We confirmed each link using a
directional Granger causality test, which evaluates the influence of a G-causal time series on
a G-caused one. All tests were conducted at the α = 0.05 significance level. Our choice of
Granger causality over a structural model was deliberate, since we wished to infer rather than
assume structure and direction. We note that prior research (10) agrees with this choice.
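As a rough illustration of what a directional Granger causality test computes, the following Python sketch compares a restricted autoregression of the G-caused series against an unrestricted one that also includes lags of the G-causal series. It is a minimal sketch under stated assumptions, not the code used in this work: the function names, the default lag of one year, and the plain least-squares solver are all illustrative choices.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_sum_of_squares(rows, targets):
    """Fit targets on rows by ordinary least squares and return the RSS."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(cause, effect, lag=1):
    """F statistic for the hypothesis that `cause` G-causes `effect`.

    The lag order (default 1) is an assumption of this sketch.
    """
    targets = effect[lag:]
    restricted, unrestricted = [], []
    for t in range(lag, len(effect)):
        own_lags = [effect[t - k] for k in range(1, lag + 1)]
        cause_lags = [cause[t - k] for k in range(1, lag + 1)]
        restricted.append([1.0] + own_lags)                 # past of effect only
        unrestricted.append([1.0] + own_lags + cause_lags)  # plus past of cause
    rss_r = residual_sum_of_squares(restricted, targets)
    rss_u = residual_sum_of_squares(unrestricted, targets)
    dof = len(targets) - (2 * lag + 1)  # observations minus unrestricted parameters
    return ((rss_r - rss_u) / lag) / (rss_u / dof)
```

The returned statistic would then be compared with the critical value of the F distribution at α = 0.05 with (lag, dof) degrees of freedom; running the function in both directions gives the directional comparison described above.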
Our observations stem from a remarkable pattern that holds reliably over the set of domains and
articles we examined (we highlight several highly compelling cases in the text and supplement,
and present a full list in Table S2).
We posit the idea of a hyper-concentrated period of domain news as one that is characterized by
high article volume occurring simultaneously with high mean similarity between articles. We
study public reaction to news and find that Granger causally significant changes in public atten-
tion and approval generally occur during such periods. Fig. 4 illustrates a hyper-concentrated
We define the mean similarity of a domain corpus of size n as the average cosine similarity (11) in
paragraph vector (12) space between all $\binom{n}{2}$ pairs of articles in the corpus. In each domain, the
number of news articles published in a certain year varies directly with mean corpus similarity
(we found that each G-caused the other). This finding is surprising, since one would expect a
larger volume of articles to discuss a larger variety of subjects. Instead, we found that domain
news publishing volume is mainly event driven, and events increase not only the mean similarity
of the corpus but also its volume. For example, the number of Surveillance articles increased
by 220% in 2013, with 93% of the total (67 of 72) being primarily about Snowden. Although it
is well known that news is event driven, the discovery of a Granger causal relationship between
article volume and mean corpus similarity is a novel finding of our work.
Related Work
Our approach is similar in spirit to King et al. (6) in that both their work and this paper examine
the effect of news on public attention. However, our work yields several novel results. We posit
the hyper-concentrated news period, and show that hyper-concentrated news is a Granger causal
precursor to legislation. Our analysis applies to a larger population than the outlets used by King
et al., since our data sources (see section on Data Sources) (13, 14) enjoy wide readership. We
measure public perception annually rather than over a period of weeks, as King et al. do. We
distinguish between fact-based reporting and framing, and demonstrate that framing in itself is
a Granger causal predictor of public attention and legislation.
In addition, we note that King et al. (6) artificially created localized short duration hyper-concentrated
news periods in their work. The fact that these short duration hyper-concentrated
news periods did not G-cause legislation motivates the question of how long a hyper-concentrated
news period must last in order to have such an influence. In our data, we found G-causality be-
ginning from a hyper-concentrated news duration of eleven months, and consequently employ
this threshold uniformly throughout the paper.
Our conception of a hyper-concentrated news period is consistent with the idea of punctuated
equilibria of media attention introduced by Baumgartner and Jones (8), and the notion of an
availability cascade posited by Kahneman (15). Finally, we note that Jacoby (16) observes
news framing, without event-based drivers. Whereas earlier work, such as by Baumgartner et
al. and Edwards and Wood (10), discusses Granger causal effects between media coverage and
Congress, we demonstrate that punctuated equilibria extend to sustained public attention and
legislative reaction. Further, existing literature does not explicitly model the Granger causal
link between punctuated equilibria and legislation, but restricts itself to measuring reactions by
Congress and the President (10). In contrast, we establish Granger causality between hyper-
Existing literature investigates various aspects of framing, and the terms “frame” and “fram-
ing” are consequently used to refer to various levels of analysis. For instance, Benford and
Snow (17) identify three core framing tasks: diagnostic, prognostic, and motivational framing.
Further, collective action frames have been defined corresponding to the generation of interpre-
tive frames that differ from and challenge existing ones. Existing work further studies “injustice
frames” (17) as a particular subset of collective action frames that call attention to the victims of
a given perceived injustice, and amplify their perceived suffering. Frame amplification (beyond
injustice frames) and extension in particular have also been studied (17). The term “framing” is
sometimes used to refer to tactics (17) that invoke human mental processes that lead members
of the public to selectively focus on certain problems rather than on others. We acknowledge
that our measures of framing do not probe such fine-grained processes, and our use of the word
This section describes novel algorithms and methods introduced in our work. The supplement
provides additional details.
Dataset Collection
We describe our data sources, method for domain dataset generation, and dataset quality evalu-
ation below.
Data Sources
We use two news sources, The New York Times (18) and The Guardian (19), to create our news datasets. In addition to
the large volume of relevant news made available by these two publications (see Section S1),
our choice is motivated by their well-documented influence on public attitudes and perception
(2,20–23). We note that The New York Times has previously been shown to influence legislation
(24), making it an ideal choice for our study.
Domain Dataset Generation
We define domain news as referring to all news publishing primarily related to a specific do-
main (see Section S1 and Table S1). We considered an article as belonging to a domain if and
only if the article could not have been published with the domain component removed. An
article is called a positive or negative for a specific domain according to whether it belongs or
does not belong to the domain. As an example, consider the article James Clapper on Don-
ald Trump, Edward Snowden, torture and “the knowability of truth”, which is a true negative
from the Surveillance domain. We code it as negative because whereas the article is relevant to
Surveillance (in that it mentions Edward Snowden), it addresses several other subjects as well.
We therefore decided that the article would qualify as news with the Surveillance component
removed.
For each domain, we extract news data during the time period b (denoting the beginning) to e
(denoting the end) of our period of interest. We are interested in creating domain datasets for
two tasks: (i) detecting framing changes and (ii) examining news preceding federal legislation.
For the first task, our period of interest is ten years before and after an independently docu-
mented ground truth framing change positive (see Section S3.1). For federal legislation, our
period of interest is ten years before a law was enacted.
To create our datasets for each domain, we begin with a simple term search for the domain of
interest, for instance, “surveillance.” We then extract a random subset of the articles returned by
this search (we used a cardinality of 100 for the results presented in this paper), and manually
code these into domain positives and negatives. Using the positives and negatives from our
manual coding, we train an initial binary Random Forest (25) classifier for each domain.
We define our period-specific universal set $U_{b,e}$ as the set of news articles retrieved via either
API and published during the time period b to e.
Using the Random Forest classifier we trained in the previous step, we extract $U_{b,e}^{+}$, the set
of domain positives in this period. We also extract $U_{b,e}^{-}$, the set of hard domain negatives in this
period. Hard negatives (26) are negatives that lie within a threshold distance from the classifier’s
decision boundary. We follow the procedure detailed in (26) to extract hard negatives for each
domain.
We iterate this training procedure until the cardinality of the returned positive set $U_{b,e}^{+}$ does not
change. This final positive set forms our domain dataset. We find that this iterative training
In addition, we point out that this supervised classification procedure enhances performance
over the keyword based search approach, e.g., as used by King et al. (6), since, as shown by previous
work (27, 28), ground truth positives in a particular domain do not necessarily contain high en-
tropy keywords. Further, Murukannaiah et al. (28) show that the use of supervised classification
uniformly enhances recall over the use of simple keyword based approaches for domains with
sufficient seed data, generally obtainable using our iterative procedure.
Dataset Quality
Our dataset generation process resulted in a dataset with a median per domain precision (the
fraction of articles in a domain dataset that are relevant to that domain, see Section S1) of 0.93
and a median per domain interrater agreement index (Kappa) (29) of 0.87.
To gain confidence that the articles obtained using the approach described above are relevant
to each domain, we manually reviewed samples from each domain dataset. In particular, to
estimate the precision we coded random samples of 200 each from The New York Times and
The Guardian from each domain, for a total of 200 × 10 = 2,000 samples. Table S1 shows
interrater agreement using Cohen’s Kappa (29) of our domains.
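For readers who want to reproduce the two quality measures on their own coded samples, the sketch below computes precision and Cohen's Kappa from two raters' binary codes. It is an illustrative Python sketch only; the function names and the toy labels are hypothetical and are not taken from our data.

```python
def precision(codes):
    """Fraction of sampled articles coded as relevant (1) to the domain."""
    return sum(codes) / len(codes)


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    # Chance agreement: probability that both raters pick the same label
    # if each labels independently according to their own label frequencies.
    expected = sum((rater_a.count(lab) / n) * (rater_b.count(lab) / n)
                   for lab in labels)
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten sampled articles (1 = domain positive).
rater_a = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
rater_b = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
kappa = cohens_kappa(rater_a, rater_b)
```

The sketch does not handle the degenerate case in which both raters use a single identical label throughout, where chance agreement equals one and Kappa is undefined.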
We did not directly measure recall. However, since news publications have a strong incentive
to broadly cover events, and The New York Times and The Guardian have the largest and fifth
largest circulations in America and the world, respectively (13, 14), we assume that sufficiently
Discriminative Keywords
We are interested in identifying and summarizing those aspects of a domain’s current framing
that distinguish it from the domain’s framing at a previous time period. To this end, we adopt
the idea of an entropic formulation of discriminative keywords, as proposed by Sheshadri et
al. (27).
Below, a corpus T is a set of news articles. Specifically, given two disjoint sets of news articles
$T_1$ and $T_2$, we identify a set of k n-grams that yield the largest Information Gain (IG) (30) in
the combined corpus $T = T_1 \cup T_2$. Let A be an article in corpus T. Let $x_i$ represent any of the
possible m n-grams in T. Let $S(x_i, T) = \{A \in T \mid x_i \in A\}$ be the set of articles in corpus T in
which the n-gram $x_i$ appears. We use a $|T| \times m$ term frequency (TF) matrix representing the

$$IG(T, x_i) = H(T) - \frac{|S(x_i, T)|}{|T|}\, H(S(x_i, T)) \tag{1}$$
Following Entman’s (9) formulation, this approach weights n-grams that are specific to a par-
ticular corpus more highly than n-grams that are common to both corpora. A quick intuition for
the approach is obtained by considering that the unigram “Snowden” has a high utility in distinguishing
Surveillance articles published after January 1st 2014 from those before then, but the
unigram “surveillance” is common to articles from both periods and therefore does not.
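The intuition can be made concrete with a small Python sketch of Eq. 1. It rests on assumptions that the text does not spell out: H is taken to be the Shannon entropy of corpus membership (earlier versus later corpus), each article is represented as a set of n-grams, and all names and toy articles are hypothetical.

```python
import math


def label_entropy(labels):
    """Shannon entropy (in bits) of a list of corpus-membership labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    entropy = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        entropy -= p * math.log2(p)
    return entropy


def information_gain(articles, labels, ngram):
    """Eq. 1: H(T) minus the weighted entropy of the articles containing ngram."""
    containing = [lab for art, lab in zip(articles, labels) if ngram in art]
    weight = len(containing) / len(articles)
    return label_entropy(labels) - weight * label_entropy(containing)


def top_keywords(articles, labels, k):
    """The k n-grams with the largest information gain (ties broken by name)."""
    vocabulary = set().union(*articles)
    ranked = sorted(vocabulary,
                    key=lambda g: (-information_gain(articles, labels, g), g))
    return ranked[:k]


# Toy combined corpus: two articles from each period.
articles = [{"surveillance", "nsa"}, {"surveillance", "wiretap"},
            {"surveillance", "snowden"}, {"surveillance", "snowden", "nsa"}]
labels = ["before", "before", "after", "after"]
```

In this toy corpus "snowden" appears only in the later period and scores the maximum, whereas "surveillance" appears in every article and scores zero. Under Eq. 1 as read here, any n-gram confined to a single corpus attains the maximum score regardless of how often it occurs; the sketch makes no attempt to adjust for that.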
Since keywords from a particular news corpus distinguish it from others, they may be said to
represent the “concentration” of news in that corpus.
We estimate the mean similarity of a corpus of documents as the mean of its pairwise document
similarities, using all $\binom{n}{2}$ combinations from the corpus. In order to estimate similarity between
two documents, we adopt doc2vec (12), a well-known tool that generates a vector representation
(called a “paragraph vector”) of a document. Specifically, we use a standard doc2vec model
(31), pretrained on the entire Wikipedia corpus to compute a vector for each document in our
corpus. We define the pairwise similarity of two documents as the cosine similarity of their
respective document vectors (32).
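The computation itself is simple once document vectors are available. The following Python sketch assumes the vectors have already been produced by a paragraph vector model and are supplied as plain lists of floats; the function names are illustrative.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(vectors):
    """Average cosine similarity over all unordered pairs of documents."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)
```

Because the number of pairs grows quadratically with corpus size, a large annual corpus may call for a vectorized implementation; the sketch favors clarity over speed and requires at least two non-zero vectors.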
We conduct experiments that predict the likelihood of legislative activity given an input news
pattern, using supervised classification with a Naïve Bayes classifier.
We predict an annual binary legislative label. For a given news domain in a given year, the label
represents a prediction about whether there is likely to be significant legislative activity based
on the historical news pattern in that domain. We use publishing volume and mean similarity
as our predictive features. Since the ranges of possible values these features take vary quite
dramatically across domains (Figs. 1, 2, 4, 5), the raw values of the features do not capture
the presence or absence of a hyper-concentrated news period and are therefore not predictive.
Instead, we first normalize the annual value of each feature by dividing it by the maximum value
that the feature takes on within the domain under consideration. Then, since changes in value
of these features across successive years are a more promising basis than raw feature values for
evaluating the presence of a hyper-concentrated news period, we compute the annual difference
of each feature, and use the absolute values of the differences in our classifier. As an intuitive
illustration, consider that a yearly news volume of 80 articles or a mean similarity of 0.2 is not
predictive in itself (given the variation in these values across domains), but an increased volume
of 80 articles from last year or an increased similarity of 20% may be predictive.
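The two preprocessing steps, normalizing by the within-domain maximum and then taking absolute year-over-year differences, can be sketched in Python as follows. The function name and the toy series are hypothetical.

```python
def change_features(annual_values):
    """Normalize a feature by its within-domain maximum, then return the
    absolute differences between successive years."""
    peak = max(annual_values)  # assumed to be positive
    normalized = [value / peak for value in annual_values]
    return [abs(later - earlier)
            for earlier, later in zip(normalized, normalized[1:])]


# Hypothetical annual article counts for one domain.
volume_changes = change_features([20, 40, 80, 60])
```

For the toy series the normalized values are 0.25, 0.5, 1.0, and 0.75, so the resulting change features are 0.25, 0.5, and 0.25. The same function applies unchanged to the mean similarity feature.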
Our training data thus consists of temporally separated pairs of values for each feature. Posi-
tives in our classifier are observations from years in which legislation was enacted, and simi-
larly, negatives are observations from years in which legislation was not enacted. For every pair
of observations (for every feature), we construct discrete probability distributions and normal-
ize them over each feature, to arrive at a pair of distributions that capture the change pattern
exhibited by the feature over years with and without legislation.
We then use these distributions to construct a joint change distribution of the two observa-
tions. This enables us to arrive at the conditional probability using a simple Bayesian formula-
tion:
$$P_f(x_2 \mid x_1) = \frac{P_f(x_1 \mid x_2)}{\sum_{x_2} P_f(x_1 \mid x_2)} \tag{2}$$
We adopt the Naïve Bayes assumption to arrive at a single estimate from all the features:

$$P(x_2 \mid x_1) = \prod_f P_f(x_2 \mid x_1) \tag{3}$$
We use a simple binary threshold to evaluate a binary legislative or not legislative label, $P(x_2 \mid x_1) < t$.
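Eqs. 2 and 3 can be illustrated with a short Python sketch. It adopts one possible reading of the notation, namely that $x_1$ is the observed (discretized) change in a feature and $x_2$ is the legislative label; the bin names, label names, and probability tables below are hypothetical and serve only to show the arithmetic.

```python
def feature_posterior(likelihoods, observed_bin):
    """Eq. 2: turn per-label likelihoods P_f(x1 | x2) into P_f(x2 | x1)
    by normalizing over the labels."""
    scores = {label: dist.get(observed_bin, 0.0)
              for label, dist in likelihoods.items()}
    total = sum(scores.values())  # assumed non-zero for an observed bin
    return {label: score / total for label, score in scores.items()}


def combined_score(per_feature_likelihoods, observed_bins, label):
    """Eq. 3: product over features of the per-feature posteriors."""
    score = 1.0
    for feature, likelihoods in per_feature_likelihoods.items():
        score *= feature_posterior(likelihoods, observed_bins[feature])[label]
    return score


# Hypothetical change distributions for the two features.
tables = {
    "volume": {"legislation": {"large": 0.8, "small": 0.2},
               "none": {"large": 0.2, "small": 0.8}},
    "similarity": {"legislation": {"large": 0.6, "small": 0.4},
                   "none": {"large": 0.3, "small": 0.7}},
}
observed = {"volume": "large", "similarity": "large"}
score = combined_score(tables, observed, "legislation")
```

With these tables the per-feature posteriors for legislation are 0.8 and 0.6/0.9, giving a combined score of roughly 0.53, which would then be compared against the threshold t.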
For our experiments, we adopt the Leave One Out approach, employing data from all domains
but one for training and using the remaining domain as our test data.
Framing Density
We contribute the notion of framing density, measured by entropic news keywords. We use
entropy between pairs of temporally disparate news corpora (as described earlier) to rank in-
dividual n-grams for their effectiveness in distinguishing the later corpus from the earlier one.
Entropic keywords therefore represent the “concentration” of a news domain at a given time.
We define the annual framing density of a given domain as the number of keywords required
to attain 99.7% of dataset entropy, corresponding to three standard deviations of a normal dis-
tribution (even though the present distribution is not necessarily normal, our intention is to
capture the bulk of the probability mass), between the present annual corpus and the preceding
one.
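A minimal Python sketch of this definition follows. It assumes that each keyword carries a non-negative entropy score (for instance, its information gain between the two annual corpora) and that "dataset entropy" is the sum of those scores; both assumptions, and the names used, are illustrative.

```python
def framing_density(keyword_scores, coverage=0.997):
    """Number of top-ranked keywords needed to reach `coverage` of the
    total score mass."""
    ranked = sorted(keyword_scores.values(), reverse=True)
    target = coverage * sum(ranked)
    running = 0.0
    for count, score in enumerate(ranked, start=1):
        running += score
        if running >= target:
            return count
    return len(ranked)


# Hypothetical scores: one dominant keyword versus a diffuse set.
concentrated = framing_density({"snowden": 0.998, "nsa": 0.001, "leak": 0.001})
diffuse = framing_density({"a": 0.5, "b": 0.3, "c": 0.2})
```

A small density therefore indicates that a handful of keywords account for nearly all of the difference between the two corpora, while a large density indicates diffuse coverage.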
Framing Polarity
We define the framing polarity of a news corpus as the average sentiment polarity (33) of ad-
jectives and adverbs within it. Since adjectives and adverbs cannot be used to state underlying
facts or events, they represent artifacts of how an event is framed. In order to measure framing
polarity of a corpus, we first use the Stanford CoreNLP parser (34) to identify all adjectives and
adverbs in it. We then retrieve the sentiment polarity of each adjective and adverb from (33),
and finally average them to arrive at our framing polarity. We calculate annual framing polarity
within each domain, by constructing annual corpora from the full domain corpus.
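The averaging step can be sketched in Python as below. The sketch assumes that tokens arrive already tagged with Penn Treebank part-of-speech tags (adjectives begin with JJ, adverbs with RB), that polarities come from a word-to-score dictionary, and that words missing from the dictionary are skipped; the lexicon values shown are invented for illustration.

```python
def framing_polarity(tagged_tokens, lexicon):
    """Mean sentiment polarity of the adjectives and adverbs in a corpus.

    tagged_tokens: iterable of (word, part_of_speech_tag) pairs.
    lexicon: dict mapping lowercase words to polarity scores.
    """
    scores = [lexicon[word.lower()]
              for word, tag in tagged_tokens
              if tag.startswith(("JJ", "RB")) and word.lower() in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical tagged sentence and polarity lexicon.
tokens = [("The", "DT"), ("controversial", "JJ"), ("program", "NN"),
          ("quietly", "RB"), ("expanded", "VBD"), ("illegal", "JJ")]
lexicon = {"controversial": -0.5, "quietly": 0.25,
           "illegal": -0.75, "program": 0.9}
polarity = framing_polarity(tokens, lexicon)
```

The noun "program" is ignored even though it has a lexicon entry, so only the three adjectives and adverbs contribute, giving a mean of -1/3.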
Framing Change Detection
We demonstrate that our measures of framing polarity and concentration detect framing changes.
We show that polarity of framing and framing density change dramatically during ground truth
We summarize our findings in the subsection below. Next, we describe comparisons with polit-
ical framing. Finally, we discuss validation of our hypothesis using the comparative method in
succeeding subsections.
Main Findings
We examine all federal legislation enacted from 1991 to 2016. Our choice was motivated by the
fact that our APIs provide access to data beginning in 1991. Approximately eleven percent
of federal legislation in this period was Granger-caused by hyper-concentrated news periods.
We illustrate our approach and results in Fig. 5, using a compelling example from the domain
of Child Privacy. We use the abbreviation “HC Period” to refer to hyper-concentrated news
periods in this and other figures. The primary laws governing children’s privacy protection in
the United States are COPPA (35) and FERPA (36). COPPA was introduced in April 1999,
and went through a series of amendments from 1999 through 2002, and again from 2009 to
2015. FERPA was enacted in 1974. Due to the unavailability of children’s privacy news articles
before 1974 (a keyword search via The New York Times API returns zero articles), we restrict
our analysis to COPPA. The Granger causal variables of interest in Fig. 5 are annual news vol-
ume (dotted blue), and mean pairwise article similarity (dashed red). We represent the presence
or absence of COPPA legislative activity by a binary time series depicted with solid brown in
Fig. 5. Our Granger causality tests are therefore conducted between pairs of independent and
(hypothesized) dependent time series, such as between news volume (dotted blue) and COPPA
legislation (solid brown) in Fig. 5. We observe that the number of news articles published on
the topic more than doubled between 1991 and 1999, coupled with a simultaneous increase in
mean article similarity. Following this hyper-concentrated period, COPPA legislation was pro-
mulgated through the period ending in 2002. Another hyper-concentrated period occurs before
the revival of interest in COPPA, as seen in the large number of amendments in 2009.
We tested the Granger causal flow depicted in Fig. 3 over the set of domains obtained as de-
scribed in the previous paragraph (as an example, using news volume and legislation as our
time series), yielding a median F-statistic and critical value of 5.63 and 4.45, respectively (see
Table S2 for a full list). Our results motivate the predictive utility of news as a Granger causal
set of independent variables that influence legislation. To further explicate this connection, we
conducted a set of supervised learning experiments to see if legislation in a given domain may
be predicted using training data only from other domains. Using the Granger causal variables
described above, we constructed joint probability distributions that capture the change pattern
exhibited by each feature over both legislative years and years with no legislation. We then
used a Leave One Out evaluation approach to predict legislation in each domain using every
other domain as training data (described in the Materials and Methods Section). We obtained
a median area under the Receiver-Operating Characteristic (25) curve of 87.5% over our set of
domains.
Google Trends (37) estimate public interest in a topic of interest by measuring related searches
worldwide over chosen time periods. Since 89% of US (38) and 82% of UK residents (39) use
the Internet and 74% of Internet users use Google as their primary search engine (40), we posit that
Google Trends are a representative measure of public attention. We conducted Granger causality
tests for our domains between article volume and Google Trend volume, yielding a median F-statistic
and critical value of 5.72 and 4.39, respectively. We illustrate this finding using the
patterns depicted in Fig. 2 and Fig. 5.
The LGBT rights domain, depicted in Fig. 1, illustrates the Granger causal influence of framing
on public opinion. Note that the negativity of framing drops in the years 2004–2005, which coincides
with a measured framing change from a ground truth survey (41). The figure demonstrates
an inverse relationship between framing negativity and public approval, as framing changed
from one emphasizing morality to the current-day focus on equal rights. We note that following
this trend, major LGBT legislation legalizing same-sex marriage in fifty states was promulgated
in 2016.
This result is noteworthy in that it is the polarity of news framing in the area, rather than spe-
cific news events, that Granger causes legislation at the 0.05 significance level. This finding is
reinforced by the fact that event-based drivers cannot influence framing polarity, since only ad-
jectives and adverbs, taken here to be artifacts of how a domain is framed, contribute to framing
polarity.
To gain confidence in our findings, we address an alternative hypothesis of note, namely, that
political framing Granger causally influences news framing, and not vice versa. We do not in
general deny that such a Granger causal direction may exist—indeed such an effect has been
demonstrated in prior work using news data collected from print newspapers (42). However,
we show that this effect is not Granger causally significant for our data over the domains we
examine.
In order to do so, we downloaded the Republican and Democratic Party Platforms from 1984
to 2016, and used a simple term search procedure to identify paragraphs relevant to a particular
domain. We then identified framing keywords and framing polarity of the returned paragraphs.
A representative example for the LGBT domain is shown in Fig. 1. G-causality for the (Political
Framing, News Framing), (Political Framing, Public Approval), and (Political Framing,
Legislation) tuples was insignificant for this example, in stark contrast to the (News Framing,
Public Approval) and (News Framing, Legislation) tuples, consistent with our hypothesis. We refer
the reader to Table S2 for a full list comparing the Granger causal effect of news with the effect
of political framing on legislation, for the domains we consider. We describe full details of this
study in the subsection below.
Fig. 4 depicts framing density versus time for the Surveillance domain, around the period of the
Snowden revelations. Note that in 2013, a single n-gram (Snowden) suffices to represent 99.7%
of dataset entropy. Further, fig. S7 depicts framing density for the four domains (Smoking,
Surveillance, Obesity, and LGBT Rights) that indicate framing change versus random news,
demonstrating that our measure of framing density distinctively picks up ground truth framing
changes. We found Granger causation between framing density, public attention, and legisla-
tion.
It is worthwhile to point out that the Snowden revelations, which we use in Fig. 4 to depict
framing density, were an event-based driver of news, and not in themselves a framing change.
However, a 2016 Pew Research survey (43) demonstrates that following the Snowden revela-
tions, news coverage of Surveillance changed from a narrative focusing on national security to
one focusing on individual liberty and personal privacy. We further note that event-based news
Further, we point out that whereas the event of the Snowden revelations took place in late 2013,
the legislative response (The Freedom Act) was enacted two years later, in 2015. We show
that polarity of negative framing in Surveillance increased following the Snowden revelations
(Fig. 4), and remained high until 2015, corresponding exactly with our hyper-concentrated pe-
riod, after which legislation was promulgated and framing polarity increased.
Because news events cannot affect framing polarity (which depends purely on
adjectives and adverbs), and because both framing polarity and framing density show distinctive
patterns during framing changes (figs. S6 and S7), we conclude that framing Granger
causes legislation.
This section details the full results of our Granger causality study. We begin with the complete
list of federal legislation promulgated from 1991 to 2016. For each law, we compiled a news
dataset according to the procedure detailed in the Domain Dataset Generation section.
From this list, we manually identified all domains which featured a hyper-concentrated news
period. Table S2 depicts this list. We found ten such cases, corresponding to approximately 11%
of all federal legislation in the period we consider. We then conducted Granger causality tests
between news volume and similarity in these periods, and the corresponding federal legislation.
We find a Granger causally significant result in each case. Our threshold for significance is
α = 0.05. For each domain, Table S2 lists the smallest significance level at which we obtain a
G-causally significant result.
We address the alternative hypothesis that political framing G-causes legislation. In order to
do so, we downloaded the Democratic and Republican party platforms from 1984 to 2016, and
measured political interest in the relevant domain as the number of paragraphs devoted to the
domain in an annual platform (we did not normalize to document length since there was no
sizeable variation in length). We then conducted G-causality tests with federal legislation in
the same domains. For nine out of ten cases, we discovered that political framing did not G-
cause legislation at the α = 0.05 level. Interestingly, for three of these domains, we obtained
a p = 0.20 for the hypothesis that political framing G-causes legislation. However, we note
that this result is much weaker than the G-causal significance we obtain for hyper-concentrated
news.
Some domains remain unmentioned through the relevant period in both party platforms, such as
Cyberbullying, Drones, and the HTML data leak. For such domains, since the political parties
do not mention the domain, we conclude that there was no measurable political framing of these
domains (Table S2). Therefore, these domains do not affect our hypothesis, given significant
G-causal measures between hyper-concentrated news characteristics and federal legislation. An
entry in bold indicates a significant Granger causal effect of political framing.
We are unable to highlight all our domains in the main paper or supplementary material due to
space constraints. We encourage the reader to refer to our data and code repository available at
https://github.com/karthiksheshadri/Hyper-Concentrated-News from which all our results may
be reproduced.
Fig. S6 shows framing polarity, and fig. S7 depicts framing density for the framing change
positives, as well as for a random control. To generate the random control, we retrieved a
sample of 991 articles from the NYT API with a null query, for each year from 1990 to
2016.
As is evident, framing polarity around ground truth framing changes undergoes dramatic changes,
while remaining close to constant and in tune with the framing polarity of random news dur-
ing non framing change periods. As an example, consider that whereas the framing polarity
of LGBT news between 1990 and 2000 remains fairly similar to that of random news in that
period (fig. S6), it drops dramatically in 2004, corresponding to the year in which the framing
change occurred.
Similarly, our measure of framing density for the framing change positives shown in fig. S7,
depicts an exponential entropy decline as compared to an approximately linear decline in the
This observation corroborates our finding that our measures of framing change detection suc-
cessfully isolate ground truth framing changes.
Comparative Evaluation
Finally, we evaluate the validity of our hypothesis using the comparative method (44). We con-
ducted tests using both the most similar and most different research designs. Full results from
both designs are presented in tables included in the supplementary material. We summarize our
research design and findings here.
The most similar paradigm (44) relies on comparing highly similar cases that differ only in the
dependent variable and in one or a few independent variables. Given that the
dependent variables differ, the paradigm assumes that the few differing independent variables
must be responsible. To use the most similar paradigm, we take advantage of the fact that
a domain is most similar to itself. To evaluate our hypothesis that hyper-concentrated news
periods Granger cause legislation, we use all our domains, and evaluate Granger causality of
the domain’s news patterns with legislation, both with and without the presence of a hyper-
concentrated period. Table S3 lists the results. In nine out of ten cases, periods that are not
hyper-concentrated within a domain did not Granger cause legislation, whereas in all ten cases,
We also evaluate our hypothesis using the most different research paradigm (44), which relies
on comparing strongly different cases, all of which however have in common the same depen-
dent variable, so that any similarity in the independent variables must explain the common value
of the dependent variable. To estimate the “difference” between our domains, we define
a custom distance function, the Euclidean distance over our news features. We use
the following news features as descriptors of each domain: (i) length of its hyper-concentrated
period, (ii) maximum, minimum, and mean annual article volume (used as three separate features),
(iii) maximum, minimum, and mean framing polarity (used as three separate features), and
(iv) maximum, minimum, and mean framing density (used as three separate features). Note
that we do not normalize the raw values of our features, since they characterize the domain and
we are making inter-domain comparisons. However, we normalize our overall distance to a
scale of zero to one. Our data contains ten domains with hyper-concentrated periods. We
compute all $\binom{10}{2} = 45$ pairwise distances, and pick the top ten to represent our
most different domain pairs, shown in Table S4. To confirm that these pairs represent notably
differing domains, we use the
Kolmogorov-Smirnov test (45) to test for similarity on the annual article volume feature. None
of our domain pairs achieves significant similarity, confirming that our distance metric yields
meaningful results. Because federal legislation was enacted in each of these domains,
and because each domain contains a hyper-concentrated news period (the only common
independent variable), we conclude that our hypothesis holds under the most different research
paradigm.
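The pair-selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for our results: the function name rank_domain_pairs is hypothetical, the feature values are randomly generated placeholders rather than our data, and rescaling by the largest observed distance is only one possible way to normalize distances to a scale of zero to one.

```python
import random
from itertools import combinations
from math import dist


def rank_domain_pairs(features):
    """Rank all domain pairs from most to least different.

    `features` maps a domain name to its vector of raw (unnormalized)
    news features. Returns a list of ((domain_a, domain_b), distance)
    tuples, with distances rescaled to lie between zero and one.
    """
    names = sorted(features)
    # Euclidean distance between the raw feature vectors of each pair.
    raw = {
        (a, b): dist(features[a], features[b])
        for a, b in combinations(names, 2)
    }
    # Assumed normalization: divide by the largest observed distance.
    largest = max(raw.values()) or 1.0
    scaled = {pair: d / largest for pair, d in raw.items()}
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)


# Placeholder data: ten hypothetical domains with ten features each
# (period length plus max/min/mean of volume, polarity, and density).
random.seed(0)
domains = {
    f"domain_{i}": [random.uniform(0, 100) for _ in range(10)]
    for i in range(10)
}

ranked = rank_domain_pairs(domains)
top_ten = ranked[:10]  # the ten most different domain pairs
```

With ten domains, ranked contains all 45 pairs, and top_ten plays the role of the most different domain pairs used in the comparison.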
In this context, let us address a concern that our results rely on a particular choice of domains.
Note that we exercised no explicit choice in collecting our original set of domains (we ex-
amined all domains that featured federal legislation in the periods for which the NYT and
Guardian APIs provide data). We then analyzed those ten of these domains that featured a
hyper-concentrated news period. As we show in Table S2, all of these domains Granger cause
legislation at the α = 0.05 level. However, our comparative analysis demonstrates through the
most similar (Table S3) and most different (Table S4) paradigms that our hypothesis remains
valid despite wide variation in the domains.
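To make the notion of a Granger causal test concrete, the following Python sketch computes the standard F statistic that compares a regression of a target series on its own lags against a regression that also includes lags of a candidate predictor. It is a hedged illustration only: the function names, the lag order of one, and the synthetic series are assumptions introduced here, and the sketch stops at the F statistic rather than computing a p-value at the α = 0.05 level.

```python
import numpy as np


def lagged(series, lags):
    """Return columns series[t-1], ..., series[t-lags] for t = lags..n-1."""
    n = len(series)
    return np.column_stack(
        [series[lags - k : n - k] for k in range(1, lags + 1)]
    )


def granger_f(x, y, lags=1):
    """F statistic for the hypothesis that x Granger causes y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[lags:]
    intercept = np.ones((len(target), 1))
    # Restricted model: y explained by its own lags only.
    restricted = np.hstack([intercept, lagged(y, lags)])
    # Unrestricted model: lags of x are added as extra regressors.
    unrestricted = np.hstack([restricted, lagged(x, lags)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic example (not our data): "legislation" follows "news"
# with a one-step delay plus a small amount of noise.
rng = np.random.default_rng(0)
news = rng.normal(size=200)
legislation = np.zeros(200)
legislation[1:] = 0.8 * news[:-1] + 0.1 * rng.normal(size=199)

f_forward = granger_f(news, legislation)   # news -> legislation
f_reverse = granger_f(legislation, news)   # legislation -> news
```

In this synthetic setting f_forward is far larger than f_reverse, which is the asymmetry a Granger test is designed to detect. Repeating the test on windows with and without a hyper-concentrated period mirrors the most similar comparison described above.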
Conclusion
Our data supports our conclusion that hyper-concentrated news periods, brought about
both by driver events and framing changes, Granger causally influence public attention and
federal legislation. We acknowledge, however, that our analysis does not rule out reverse
causality and cannot model unobservable confounding factors.
References
1. A. Gunther, The persuasive press inference effects of mass media on perceived public opin-
ion. Communication Research 25, 486–504 (1998).
2. D. Mutz, J. Soss, Reading public opinion: The influence of news coverage on perceptions
3. C. Hoadley, H. Xu, J. Lee, M. B. Rosson, Privacy as information access and illusory control:
The case of the Facebook news feed privacy outcry. Electronic Commerce Research and
4. C. Taylor, After privacy uproar, Quora feeds will no longer show data on what other users
have viewed (2016). https://goo.gl/9wG65R.
5. S. Iyengar, D. Kinder, News that Matters: Television and American Opinion (University of
Chicago Press, 2010).
6. G. King, B. Schneer, A. White, How the news media activate public expression and influ-
ence national agendas. Science 358, 776–780 (2017).
ity and change in public policymaking. Theories of the Policy Process 8, 59–103 (2014).
10. G. Edwards, D. Wood, Who influences whom? the President, Congress, and the media.
11. P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining (Pearson Education India,
New Delhi, 2006).
12. Q. Le, T. Mikolov, Proceedings of the 31st International Conference on Machine Learning
(International Machine Learning Society, Beijing, 2014), pp. 1188–1196.
13. Wikipedia, The New York Times (2001). https://en.wikipedia.org/wiki/The_New_York_Times.
15. D. Kahneman, Thinking, Fast and Slow (Farrar, Straus and Giroux, New York, 2011).
16. M. Jacoby, Negotiating bankruptcy legislation through the news media. Houston Law Re-
view 41, 1092-1144 (2004).
17. R. Benford, D. Snow, Framing processes and social movements: An overview and assess-
ment. Annual Review of Sociology 26, 611–639 (2000).
Accessed: 2016-3-3.
20. S. Althaus, D. Tewksbury, Agenda setting and the new news patterns of issue importance
among readers of the paper and online versions of the New York Times. Communication
21. D. Drezner, H. Farrell, Web of influence. Foreign Policy 145, 32–41 (2004).
http://www.jstor.org/stable/4152942.
22. G. Golan, Inter-media agenda setting and global news coverage. Journalism Studies 7,
323–333 (2006).
23. S. Kiousis, Explicating media salience: A factor analysis of New York Times issue coverage
during the 2000 U.S. presidential election. Journal of Communication 54, 71–87 (2004).
24. S. Rahbar, “The Evil of the Age”: The influence of the New York Times on anti-abortion
legislation in New York, 1865–1873. Pennsylvania History Review 23, 146–176 (2016).
25. L. Breiman, Random forests. Machine Learning 45, 5–32 (2001).
http://dx.doi.org/10.1023/A:1010933404324.
2013), vol. 10, pp. 2760–2767.
27. K. Sheshadri, N. Ajmeri, J. Staddon, Proceedings of the 15th Privacy, Security and Trust
29. A. Viera, J. M. Garrett, Understanding inter-observer agreement: The Kappa statistic. Fam-
ily Medicine 37, 360-363 (2005).
30. H. Harris, International Symposium on Artificial Intelligence and Mathematics (Fort Laud-
erdale, 2002).
33. S. Baccianella, A. Esuli, F. Sebastiani, Proceedings of the 7th ELRA International Confer-
www.ftc.gov/enforcement/rules/rulemaking-regulatory-reform-proceedings/
childrens-online-privacy-protection-rule.
36. US Department of Education, Family Educational Rights and Privacy Act,
https://tinyurl.com/ybohwmfm (1974).
38. 11% of Americans don’t use the Internet. Who are they? (2018).
http://www.pewresearch.org/fact-tank/2018/03/05/some-americans-dont-use-the-internet-who-are-they/.
41. S. Engel, Frame spillover: Media framing and public opinion of a multifaceted LGBT rights
agenda. Law and Social Inquiry 38, 403–441 (2013).
42. R. D. Flores, Taking the law into their own hands: Do local anti-immigrant ordinances
44. A. Lijphart, Comparative politics and the comparative method. American Political Science
45. F. Massey, The Kolmogorov-Smirnov test for goodness of fit. Journal of the American
Statistical Association 46, 68–78 (1951).
46. A. Johnston, M. Warkentin, Fear appeals and information security behaviors: An empirical
study. Management Information Systems Quarterly 34, 549–566 (2010).
http://www.jstor.org/stable/25750691.
47. M. Cummings, R. Proctor, The changing public image of smoking in the United States:
1964–2014. Cancer Epidemiology and Prevention Biomarkers 23, 32–36 (2014).
48. S.-H. Kim, A. Willis, Talking about obesity: News framing of who is responsible for caus-
ing and fixing the problem. Journal of Health Communication 12, 359-376 (2007).
49. S. Preibusch, Privacy behaviors after Snowden. Communications of the ACM 58, 48–55
(2015).
Acknowledgments
This work was completed at the Department of Computer Science at NC State University. The
authors thank Chung-Wei Hang for valuable discussions about the contributions and for his
advice in ensuring the reproducibility of the results. The authors thank Pradeep Murukannaiah
for useful discussions. The authors thank the anonymous reviewers for helpful comments on a
previous version.
Funding: The authors thank the NC State University Laboratory for Analytic Sciences for
partial support.
Author contributions: K.S. prepared the datasets and performed the analysis. K.S. and
M.P.S. designed the evaluation approach and wrote the paper. M.P.S. led the project.
Competing interests: The authors declare that they have no competing interests.
Human or animal subjects: The project involved no human or animal subjects.
Data and materials availability: All data needed to evaluate the conclusions in the paper are
present in the Supplementary Materials. All data and code necessary to reproduce the findings
of the paper may be downloaded from https://github.com/karthiksheshadri/Hyper-Concentrated-News.
Supplementary Materials
Fig. S6: Framing Polarity: Random versus LGBT, Surveillance, and Smoking news
Fig. S7: Framing Density: Random versus LGBT, Surveillance, and Smoking news
Fig. S9: News patterns around the 2011 Facebook ID leak, and subsequent legislation
Table S2: Comparing the Granger causal effect of hyper-concentrated news against that of
Table S3: A comparative evaluation of our hypothesis using the most similar research design
Table S4: A comparative evaluation of our hypothesis using the most different research design
References (30–50)
List of Figures
Fig. 1: News framing as a Granger causal precursor to public approval changes and legislation
in the domain LGBT Rights. Public approval increases as negative framing declines. Note
that the sharp decline in framing polarity also detects the documented framing change of 2004.
(46)
Fig. 2: News volume and mean article similarity as predictors of public attention in the domain
Drones. Note that public attention (measured by Google Trends) climbs sharply during the
Fig. 3: We posit the hyper-concentrated period of domain news, characterized by high arti-
cle volume and similarity, which G-causes public attention changes and legislation. Hyper-
concentrated periods arise either due to news events, or independently of events, due to news
framing. We observe and model every link in the figure, except the Events to News link, which
is shown with a dotted arrow.
Fig. 4: Framing changes are characterized by low framing density and changes in framing
polarity. The figure shows that news volume, mean article similarity, and framing density in
the domain Surveillance spike during a hyper-concentrated (labelled HC in the figure) period,
foreshadowing legislation.
Fig. 5: News characteristics and legislation for the domain Child Privacy. Note that during the
periods 1996 to 1999 and 2009 to 2011, news volume and similarity sharply increase together,