
IMPACT OF MISCOMPUTED STANDARD ERRORS AND FLAWED DATA RELEASE

POLICIES ON THE AMERICAN COMMUNITY SURVEY


TO: ROBERT GROVES, DIRECTOR
FROM: ANDREW A. BEVERIDGE, PROFESSOR OF SOCIOLOGY, QUEENS COLLEGE AND
GRADUATE CENTER CUNY AND CONSULTANT TO THE NEW YORK TIMES FOR CENSUS
AND OTHER DEMOGRAPHIC ANALYSES
*

SUBJECT: IMPACT OF MISCOMPUTED STANDARD ERRORS AND FLAWED DATA RELEASE
POLICIES ON THE AMERICAN COMMUNITY SURVEY

DATE: 1/24/2011
CC: RODERICK LITTLE, ASSOCIATE DIRECTOR; CATHY MCCULLY, CHIEF, U.S. CENSUS
BUREAU REDISTRICTING DATA OFFICE
This memo is to alert the Bureau to two problems with the American Community Survey as released that
degrade the utility of the data through confusion and filtering (not releasing all data available for a specific
geography). The resultant errors could reflect poorly on the Bureau's ability to conduct the American
Community Survey and could potentially undercut support for that effort. The problems are as follows:
1. The Standard Errors, and the Margins of Error (MOE) based upon them, reported for data
in the American Community Survey are miscomputed. For situations where the usual
approximation used to compute standard errors is inappropriate (generally where a
given cell in a table makes up only a small part or a very large portion of the total), it is used
anyway. The results include reports of negative numbers of people for many
cells within the MOE in the standard reports.
2. The rules for not releasing data about specific fields or variables in the one-year and three-
year files (so-called "filtering") in a given table seriously undercut the value of the one-year
and three-year ACS for many communities and areas throughout the United States by
depriving them of valuable data. (The filtering is also based in part on the miscomputed
MOEs.)
This memo discusses the issues underlying each problem. Attached to this memo are a number of items to
support and illustrate my points.
A. MISCOMPUTED STANDARD ERRORS
Background
The standard errors and MOEs reported with the American Community Survey are miscomputed
in certain situations and lead to erroneous results. When very small or very large proportions of a given
table are in a specific category, the standard errors are computed improperly. Indeed, there are a massive
number of instances where the Margins of Error (MOEs) include zero or negative counts of persons with
given characteristics. This happens even though individuals were found in the sample with specific

*
I write as a friend of the Census Bureau. Since 1993 I have served as consultant to the New York Times with regard to
the Census and other demographic matters. I use Census and ACS data in my research and have testified using Census
data numerous times in court.

characteristics (e.g., non-Hispanic Asian). Obviously, if one finds any individuals of a given category in a
given area, the MOE should never be negative or include zero for that category. If it does, then the method by
which it was computed should be considered incorrect. For instance, in the case of non-Hispanic Asians in
counties, in some 742 counties where the point estimate of the number of non-Hispanic Asians is greater
than zero, the published MOE implies that the actual number in those counties could be negative, and in 40
additional cases the MOE includes zero. Similarly, for the 419 counties where the point estimate is zero, the
published MOE implies that there could be a negative number of non-Hispanic Asians in the county. (See
Appendix 1 for these and other results. Here I have only shared results from Table B03002. Many other
tables exhibit similar issues.) This problem permeates the data released from the 2005-2009 ACS, and has
contributed to filtering issues in the three-year and one-year files, which will be discussed later in this memo.
Issues with the ACS Approach to Computing Standard Error
This problem appears to be directly related to the method by which Standard Errors and MOEs are
computed for the American Community Survey, especially for situations where a given data field constitutes a
very large or very small proportion of a specific table. Chapter 12 of the ACS Design and Methodology describes in
some detail the methods used to compute standard error in the ACS (see Appendix 2). It appears that the
problems with using the normal approximation or Wald approximation in situations with large or small
proportions are ignored. This is especially surprising given the fact that the problems of using a normal
approximation to the binomial distribution under these conditions are well known. I have attached an article
that surveys some of these issues. (See Appendix 3: Lawrence D. Brown, T. Tony Cai and Anirban
DasGupta, "Interval Estimation for a Binomial Proportion," Statistical Science 16:2 (2001): pp. 101-133.) The
article also includes responses from several researchers who have tackled this problem. Indeed, even
Wikipedia has references to this problem in its entry regarding the Binomial Proportion. Much of the
literature makes the point that even in conditions where the issue is not a large or small proportion, the
normal approximation may result in inaccuracies. Some standard statistical packages (e.g., SAS, SUDAAN)
have implemented several of the suggested remedies. Indeed, some software programs even warn users when
the Wald or normal approximation should not be used.
It is true that on page 12-6 of ACS Design and Methodology the following statement seems to indicate that the
Census Bureau recognizes the problem (similar statements are in other documents regarding the ACS):
Users are cautioned to consider logical boundaries when creating confidence bounds from
the margins of error. For example, a small population estimate may have a calculated lower
bound less than zero. A negative number of people does not make sense, so the lower
bound should be set to zero instead. Likewise, bounds for percents should not go below
zero percent or above 100 percent.
In fact, this means that it was obvious to the Bureau that the method for computing standard errors and
MOEs resulted in errors in certain situations, but nothing, so far, has been done to remedy the problem.
Furthermore, in American FactFinder, MOEs that imply negative numbers abound, even where respondents
with the given characteristic were found in the area. (See Appendix 4 for an example.) It is equally problematic
to have the MOE include zero when the sample found individuals or households with a given characteristic.
I should note that exact confidence intervals for binomial proportions, and appropriate approximations under
these conditions, are asymmetric and never include zero or become negative. Therefore, the ACS's articulated
(but not implemented) MOE remedy of setting the lower bound to zero would not be a satisfactory solution.
Rather, the method of computation must be changed to reflect the accepted statistical literature and the
statistical practice of major statistical packages, and to guard against producing data that are illogical and thus
likely to be misunderstood and criticized.
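To make the contrast concrete, the short sketch below compares a Wald (normal-approximation) 90 percent
interval with an exact Clopper-Pearson interval for a small proportion. The sample size and count are
hypothetical and chosen only to show how the Wald lower bound can fall below zero while the exact interval
cannot; this is an illustration in Python, not the Bureau's production code.

    # Illustrative sketch: Wald vs. exact (Clopper-Pearson) 90 percent intervals
    # for a small proportion. Inputs are hypothetical.
    from scipy.stats import norm, beta

    n = 150              # hypothetical sample size for a small county
    successes = 2        # hypothetical number of sampled persons in the category
    p_hat = successes / n

    z = norm.ppf(0.95)   # 90 percent two-sided interval, as used for ACS MOEs
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    wald = (p_hat - z * se, p_hat + z * se)              # lower limit can be negative

    # Exact limits from beta quantiles: asymmetric, never negative.
    lower = beta.ppf(0.05, successes, n - successes + 1) if successes > 0 else 0.0
    upper = beta.ppf(0.95, successes + 1, n - successes)

    print(f"Wald 90% interval:            ({wald[0]:.4f}, {wald[1]:.4f})")
    print(f"Clopper-Pearson 90% interval: ({lower:.4f}, {upper:.4f})")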

Consequences of the Miscomputation
There are serious consequences stemming from this miscomputation:
1. The usefulness, accuracy and reliability of ACS data may be thrown into question. With the
advent of the ACS 2005-2009 data, many of the former uses of the Census long-form will now
depend upon the ACS five year file, while others could be based upon the one-year and three-year
files. One of these uses is in the contentious process of redistricting Congress, state legislatures and
local legislative bodies. In any area where the Voting Rights Act provisions are used to assess
potential districting plans, Citizen Voting Age Population (CVAP) numbers need to be computed
for various cognizable groups, including Hispanics, Asian and Pacific Islanders, Non-Hispanic White
and Non-Hispanic Black, etc. Since citizenship data is not collected in the decennial census, such
data, which used to come from the long form, must now be produced from the 2005-2009 ACS. In
2000 the Census Bureau created a special tabulation (STP76) based upon the long form to report
CVAP by group for the voting age population to help in the computation of the so-called effective
majority. I understand that the Department of Justice has requested a similar file based upon the
2005 to 2009 ACS. Unless the standard error for the ACS is correctly computed, I envision that
during the very contentious and litigious process of redistricting, someone could allege that the ACS
is completely without value for estimating CVAP due to its flawed standard errors and MOEs, and
therefore is not useful for redistricting. I am sure that somewhere a redistricting expert would testify
to that position if the current standard error and MOE computation procedure were maintained.
This could lead to a serious public relations and political problem, not only for the ACS, but for
census data generally.
When a redistricting expert suggested this exact scenario to me, I became convinced that I
should formally bring these issues to the Bureau's attention. Given the fraught nature of the
politics over the ACS and census data generally, I think it is especially important that the
massive number of MOEs that include negative or zero counts of people be addressed
immediately by choosing an appropriate method to compute MOEs and implementing it in
the circumstances discussed above.
I should note that in addition to redistricting, the ACS data will be used for the following:
1. Equal employment monitoring using an Equal Employment Opportunity
Commission File, which includes education and occupation data, so it must be
based upon the ACS.
2. Housing availability and affordability studies using a file created for the Housing and
Urban Development (HUD) called the Comprehensive Housing Affordability
Strategy (CHAS) file. This file also requires ACS data since it includes several
housing characteristics and cost items.
3. Language assistance assessments for voting by the Department of Justice, which is
another data file that can only be created from the ACS because it is based upon the
report of the language spoken at home.
Because the ACS provides richer data on a wider variety of topics than the decennial census, there
are a multitude of other uses for the ACS in planning transportation, school enrollment, districting
and more where the MOEs would be a significant issue. In short, the miscomputed MOEs are likely
to cause local and state government officials, private researchers, and courts of law to use the ACS
less effectively and less extensively, or to stop using it altogether.


B. FLAWED RELEASE POLICIES FOR DATA TABLES IN THE ONE-YEAR AND THREE-
YEAR ACS FILES
Background
Having one-year and three-year data available makes the ACS potentially much more valuable than the infrequent
Census long form. However, much of that added value has been undercut by the miscomputation of MOEs
coupled with the disclosure rules for data in the one-year and three-year ACS files. This has meant that
important data for many communities and areas have been "filtered," that is, not released. The general rule,
of course, is that data are released for areas with a population of 65,000 or greater for the one-year file and for
areas with a population of 20,000 or greater for the three-year files. However, the implementation of the so-
called "filtering" rule, used to prevent unreliable statistical data from being released, has had the practical
effect of blocking the release of much important data.
As Chapter 13 of the ACS Design and Methodology states:
The main data release rule for the ACS tables works as follows. Every detailed table consists
of a series of estimates. Each estimate is subject to sampling variability that can be
summarized by its standard error. If more than half of the estimates in the table are not
statistically different from 0 (at a 90 percent confidence level), then the table fails. Dividing
the standard error by the estimate yields the coefficient of variation (CV) for each estimate.
(If the estimate is 0, a CV of 100 percent is assigned.) To implement this requirement for
each table at a given geographic area, CVs are calculated for each table's estimates, and the
median CV value is determined. If the median CV value for the table is less than or equal to
61 percent, the table passes for that geographic area and is published; if it is greater than 61
percent, the table fails and is not published. (Chapter 13 of the ACS Design and Methodology,
p. 13-7, see Appendix 5.)
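To illustrate how the quoted rule behaves in practice, here is a minimal sketch of the median-CV test in
Python. The estimates and standard errors are invented, and the helper name is mine, not the Bureau's; it
simply shows how a table dominated by zero cells fails even when its large cells are precisely estimated.

    # Sketch of the release ("filtering") rule quoted above.
    import statistics

    def table_passes(estimates, standard_errors, cv_cutoff=0.61):
        """Return True if the table would be published for this geography."""
        cvs = []
        for est, se in zip(estimates, standard_errors):
            if est == 0:
                cvs.append(1.00)           # a CV of 100 percent is assigned
            else:
                cvs.append(se / est)       # coefficient of variation
        return statistics.median(cvs) <= cv_cutoff

    # Example: many zero cells push the median CV to 1.00, so the table fails.
    ests = [5000, 4800, 120, 0, 0, 0, 0, 0, 0, 0, 0]
    ses  = [ 150,  160,  60, 70, 70, 70, 70, 70, 70, 70, 70]
    print(table_passes(ests, ses))         # False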
Negative Consequences of the Filtering Rules
These rules on their face are flawed, since they make the release of data about non-Hispanic blacks or non-
Hispanic whites contingent on the presence of members of other groups living in the same area, such as
Hispanic Native Americans or Alaskan Natives, Hispanic Asians, or Hispanic Hawaiian Natives or other
Pacific Islanders. Therefore, for areas populated by just a few groups, the Bureau will not release the data
about any group for that area. Thus, many cities, counties and other areas will not have important data about
the population composition reported because they do not have enough different population groups living
within the specified area. Furthermore, using the CV based upon the miscomputed standard error (most
especially in cases where few or zero individuals in the area are in a given category) means that the
likelihood of reporting a high CV is increased, and even more areas will not have some data released. (As
noted in the section discussing the miscomputed standard errors, the computation of the standard error and
thus the computation of CV is flawed.)
To demonstrate the impact of this rule, I assessed the filtering for the variable B03002 (Hispanic status by
race) in the 2009 one-year release, using the Public Use Micro-data Areas (PUMAs). PUMAs are areas that
were designed to be used with the Public Use Micro-Data files and are required to include at least 100,000
people. Due to the Bureau's release rules, the only geographic levels released for the one-year and three-year
files that provide complete coverage of the United States are the nation, the states, and PUMAs. Using
PUMAs, it is easy to understand the impact of the filtering rules, since every PUMA received data in 2009.
For the Race and Hispanic status table (B03002), the table elements include total population; total non-
Hispanic population; non-Hispanic population for the following groups: white; black; native American or
Alaskan Native; Asian; native Hawaiian or other Pacific Islander; some other race; two or more races; two
races including some other race; two races excluding some other race, and three or more races; and total
Hispanic population and then population for each of the same groups listed above but for those that are
Hispanic, such as Hispanic Asian, Hispanic native Hawaiian and other Pacific Islander, as well as Hispanic

White and Hispanic Black. The table contains some 21 items, of which 3 are subtotals of other items.
(See Appendix 2 for an example.)
Plainly, many of the cells in this table are likely to be zero or near zero for many geographies within the
United States. For that reason, it is not surprising that 1,274 of 2,101 PUMAs (well more than half) were
filtered for the 2009 one-year file. Those not filtered are likely to be urban areas with a substantial degree
of population diversity, while those filtered are often the opposite. The map presented on the final page of
this memo shows graphically which PUMAs had table B03002 filtered. This particular example shows where
the proportion of the population that was non-Hispanic white was not revealed in this table.
Looking at the map, the problem becomes clear. Vast portions of the United States are filtered, as
represented by the green cross-hatch pattern. Parts of North Dakota and Maine, for example, do not have a
report on the number of non-Hispanic whites. In a similar vein, there is at least one PUMA in New York
City that is so concentrated in terms of non-Hispanic African American population that B03002 is filtered.
Conclusion and Recommendations
Based upon this analysis, I would make three recommendations to the Bureau:
1) That a User Note immediately be drafted indicating that there is a problem with the MOEs (and the
standard errors which are used to compute them) in the ACS for certain situations, and that the
Bureau will recompute them for the ACS 2005-2009 file on an expedited basis. We are basically one
month from the start of redistricting, with Virginia and New Jersey required to redraw their lines by
early summer. Both states have substantial numbers of foreign-born individuals, many of whom are
Hispanic, so the CVAP-by-group calculations are very important and dependent on clear ACS data.
2) That the MOEs for the 2005-2009 ACS (as well as all of the one-year and three-year files already issued)
be recomputed using a method that takes into account the issues surrounding the binomial
proportion, and that the ACS 2005-2009 data be re-released with these new MOE files. This should
be done almost immediately for the more recent files and as soon as possible for the others.
3) That the Bureau adopt a new release policy based not upon whole tables (the specifications of which
are in any event arbitrary), but rather based upon specific variables or table cells. In this way, the
release of important data would not be subject to miscomputed standard errors, or to the size or
estimated variability of other variables or cells. It also may make sense to dispense with filtering
altogether. This policy should then be applied to previously released data, and the data should be re-
released.
The problems identified in this memo are causing serious issues regarding the ACS's use in a wide variety of
settings, and could seriously threaten the viability of this irreplaceable and vital data source if they became
widely known. As a Census and ACS user, I truly hope that steps can be taken to remedy these problems
swiftly.

MAP OF THE LOWER 48 STATES SHOWING NON-RELEASE (FILTERING) OF DATA FOR RACE AND HISPANIC
STATUS BY PUMAS. GREEN CROSS-HATCHES INDICATE NON-RELEASE

Andrew A. Beveridge <andrew.beveridge@qc.cuny.edu> Tue, Feb 8, 2011 at 8:44 AM
To: Roderick.Little@census.gov, David Rindskopf <drindskopf@gc.cuny.edu>
Cc: robert.m.groves@census.gov, Catherine.Clark.McCully@census.gov, "sharon.m.stern" <Sharon.M.Stern@census.gov>
Bcc: j.trent.alexander@census.gov, Matthew Ericson <matte@nytimes.com>
Dear Rod:
It was good to talk to you yesterday, along with David Rindskopf,
about the issues raised in my memo regarding MOE and filtering in
the ACS data.
We understand that recomputing and re-releasing the confidence
intervals would be a very time-consuming process. Nonetheless, I do
believe that some sort of statement regarding the MOEs is necessary,
particularly around negative and zero values for groups that were
found in the sample, as well as the intervals for situations where
no subjects were found.
I attach to this memo an example table from the CVAP data released on
Friday. This is for a block group in Brooklyn, and as you can see the
point estimates for seven of the 11 groups are zero, and the MOEs are
123 in every case. These groups include several that have very low
frequencies in Brooklyn (e.g., American Indian or Alaskan Native). In
the litigation context, it would be very easy to make such numbers
look absurd. I still remain concerned that this could easily harm the
Bureau's credibility, which would be very damaging given the context
of support for Bureau activities.
I also agree that doing some research into what would be an
appropriate way to view and compute Confidence Intervals for the ACS,
possibly including reference to the Decennial Census results and also
contemplating the importance of various spatial distributions would
ultimately be very helpful. At the same time, it is nonetheless the
case that at least two of the major statistical package vendors have
implemented procedures that take into account the issue of high or
low proportions in estimating confidence intervals from survey data.
I attach the write-up on confidence intervals for SAS's SURVEYFREQ,
one of a set of procedures that SAS has developed to handle the
analysis of data drawn from complex samples. They have the following
option for the computation of confidence limits: "PROC SURVEYFREQ
also provides the PSMALL option, which uses the alternative confidence
limit type for extreme (small or large) proportions and uses the Wald
confidence limits for all other proportions (not extreme)." (See
Attachment)
At the same time, later yesterday afternoon I was speaking with a
very well-known lawyer who works on redistricting issues. He had
looked at the release of the CVAP data at the block-group level and
basically concluded that it was effectively worthless. Part of this,
I think, is due to the fact that data such as those in the
attachment do not look reliable on their face.
If nothing is done, it will be very difficult to defend these data in
court or use them to help in drawing districts. I remain convinced
that there also could be serious collateral damage to the Census
Bureau in general. I for one hope that this does not occur. I look
forward to your response to the issues I raised.
Very truly yours,
Andy
--
Andrew A. Beveridge
Prof of Sociology Queens College and Grad Ctr CUNY
Chair Queens College Sociology Dept
Office: 718-997-2848
Email: andrew.beveridge@qc.cuny.edu
252A Powdermaker Hall
65-30 Kissena Blvd
Flushing, NY 11367-1597
www.socialexplorer.com
Attachment_2_8_2011.doc
196K
Attachment to EMAIL to Rod Little 2/8/2011
Block Group 2, Census Tract 505, Kings County, New York

LNTITLE LNNUMBER CVAP_EST CVAP_MOE
Total 1 625 241
Not Hispanic or Latino 2 195 124
American Indian or Alaska Native Alone 3 0 123
Asian Alone 4 35 55
Black or African American Alone 5 45 51
Native Hawaiian or Other Pacific Islander Alone 6 0 123
White Alone 7 120 80
American Indian or Alaska Native and White 8 0 123
Asian and White 9 0 123
Black or African American and White 10 0 123
American Indian or Alaska Native and Black or African American 11 0 123
Remainder of Two or More Race Responses 12 0 123
Hispanic or Latino 13 430 197
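For reference, a published 90 percent ACS MOE corresponds to roughly 1.645 standard errors, so the
zero-estimate rows above imply symmetric intervals of (-123, +123). A minimal sketch of that
back-calculation follows; the helper is my own illustration, not a Bureau formula.

    # Sketch: convert a published 90 percent MOE back to a standard error and a
    # symmetric (Wald-style) interval, using the zero-estimate rows above.
    Z_90 = 1.645                       # ACS margins of error are published at 90 percent

    def wald_interval(estimate, moe):
        se = moe / Z_90
        return estimate - moe, estimate + moe, se

    low, high, se = wald_interval(0, 123)
    print(f"SE = {se:.1f}; 90% interval = ({low}, {high})")   # (-123, 123): includes negative counts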



The SURVEYFREQ Procedure
Confidence Limits for Proportions
http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_surveyfreq_a0000000221.htm
If you specify the CL option in the TABLES statement, PROC SURVEYFREQ computes
confidence limits for the proportions in the frequency and crosstabulation tables.
By default, PROC SURVEYFREQ computes Wald ("linear") confidence limits if you do not
specify an alternative confidence limit type with the TYPE= option. In addition to Wald
confidence limits, the following types of design-based confidence limits are available for
proportions: modified Clopper-Pearson (exact), modified Wilson (score), and logit confidence
limits.
PROC SURVEYFREQ also provides the PSMALL option, which uses the alternative confidence
limit type for extreme (small or large) proportions and uses the Wald confidence limits for all
other proportions (not extreme). For the default PSMALL= value of 0.25, the procedure
computes Wald confidence limits for proportions between 0.25 and 0.75 and computes the
alternative confidence limit type for proportions that are outside of this range. See Curtin et al.
(2006).
See Korn and Graubard (1999), Korn and Graubard (1998), Curtin et al. (2006), and Sukasih and
Jang (2005) for details about confidence limits for proportions based on complex survey data,
including comparisons of their performance. See also Brown, Cai, and DasGupta (2001), Agresti
and Coull (1998) and the other references cited in the following sections for information about
binomial confidence limits.
For each table request, PROC SURVEYFREQ produces a nondisplayed ODS table, "Table
Summary," which contains the number of observations, strata, and clusters that are included in
the analysis of the requested table. When you request confidence limits, the "Table Summary"
data set also contains the degrees of freedom df and the value of the t percentile that is used to compute the
confidence limits. See Example 84.3 for more information about this output data set.
Wald Confidence Limits
PROC SURVEYFREQ computes standard Wald ("linear") confidence limits for proportions by
default. These confidence limits use the variance estimates that are based on the sample design.
For the proportion in a table cell, the Wald confidence limits are computed as

    p̂ ± t(α/2, df) × SE(p̂)

where p̂ is the estimate of the proportion in the table cell, SE(p̂) is the standard error of the
estimate, and t(α/2, df) is the 100(1 − α/2)th percentile of the t distribution with df degrees of
freedom, calculated as described in the section Degrees of Freedom. The confidence level is
determined by the value of the ALPHA= option, which by default equals 0.05 and produces 95%
confidence limits.
The confidence limits for row proportions and column proportions are computed similarly to the
confidence limits for table cell proportions.
Modified Confidence Limits
PROC SURVEYFREQ uses the modification described in Korn and Graubard (1998) to compute
design-based Clopper-Pearson (exact) and Wilson (score) confidence limits. This modification
substitutes the degrees-of-freedom adjusted effective sample size for the original sample size in
the confidence limit computations.
The effective sample size n_e is computed as

    n_e = n / d

where n is the original sample size (unweighted frequency) that corresponds to the total domain
of the proportion estimate, and d is the design effect.
If the proportion is computed for a table cell of a two-way table, then the domain is the two-way
table, and the sample size is the frequency of the two-way table. If the proportion is a row
proportion, which is based on a two-way table row, then the domain is the row, and the sample
size is the frequency of the row.
The design effect for an estimate is the ratio of the actual variance (estimated based on the
sample design) to the variance of a simple random sample with the same number of observations.
See the section Design Effect for details about how PROC SURVEYFREQ computes the design
effect.
If you do not specify the ADJUST=NO option, the procedure applies a degrees-of-freedom
adjustment to the effective sample size to compute the modified sample size. If you specify
ADJUST=NO, the procedure does not apply the adjustment and uses the effective sample size n_e
in the confidence limit computations.

The modified sample size n* is computed by applying a degrees-of-freedom adjustment to the
effective sample size as

    n* = n_e × ( z(α/2) / t(α/2, df) )²

where df is the degrees of freedom and t(α/2, df) is the 100(1 − α/2)th percentile of the t distribution
with df degrees of freedom. The section Degrees of Freedom describes the computation of the
degrees of freedom df, which is based on the variance estimation method and the sample design.
The confidence level is determined by the value of the ALPHA= option, which by default
equals 0.05 and produces 95% confidence limits.

The design effect is usually greater than 1 for complex survey designs, and in that case the
effective sample size is less than the actual sample size. If the adjusted effective sample size n* is
greater than the actual sample size n, then the procedure truncates the value of n* to n, as
recommended by Korn and Graubard (1998). If you specify the TRUNCATE=NO option, the
procedure does not truncate the value of n*.
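As a rough sketch of the adjustment described above, one might compute the effective and modified sample
sizes as follows. The (z/t)² adjustment factor is an assumption in the spirit of Korn and Graubard (1998) and
may differ in detail from the SAS implementation; the inputs are illustrative.

    # Sketch (illustrative inputs): effective sample size and a degrees-of-freedom
    # adjustment. The (z/t)^2 factor is an assumption, not verified SAS code.
    from scipy.stats import norm, t

    def modified_sample_size(n, design_effect, df, alpha=0.05, truncate=True):
        n_eff = n / design_effect                                   # effective sample size
        n_star = n_eff * (norm.ppf(1 - alpha / 2) / t.ppf(1 - alpha / 2, df)) ** 2
        return min(n_star, n) if truncate else n_star               # truncate at actual n

    print(modified_sample_size(n=200, design_effect=2.5, df=9))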
Modified Clopper-Pearson Confidence Limits
Clopper-Pearson (exact) confidence limits for the binomial proportion are constructed by
inverting the equal-tailed test based on the binomial distribution. This method is attributed to
Clopper and Pearson (1934). See Leemis and Trivedi (1996) for a derivation of the F distribution
expression for the confidence limits.
PROC SURVEYFREQ computes modified Clopper-Pearson confidence limits according to the
approach of Korn and Graubard (1998). The degrees-of-freedom adjusted effective sample size n*
is substituted for the sample size in the Clopper-Pearson computation, and the adjusted
effective sample size times the proportion estimate, n* × p̂, is substituted for the number of positive
responses. (Or if you specify the ADJUST=NO option, the procedure uses the unadjusted
effective sample size n_e instead of n*.)
The modified Clopper-Pearson confidence limits for a proportion (the lower limit p_L and the upper
limit p_U) are computed, with x = n* × p̂, as

    p_L = [ 1 + (n* − x + 1) / ( x × F(α/2; 2x, 2(n* − x + 1)) ) ]^(−1)

    p_U = [ 1 + (n* − x) / ( (x + 1) × F(1 − α/2; 2(x + 1), 2(n* − x)) ) ]^(−1)

where F(c; d1, d2) is the 100c-th percentile of the F distribution with d1 and d2 degrees of freedom,
n* is the adjusted effective sample size, and p̂ is the proportion estimate.
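A small sketch of these exact limits, using beta quantiles (mathematically equivalent to the F-based
expression above); the inputs are made up and this is a Python illustration, not the SAS code itself.

    # Sketch: Clopper-Pearson (exact) limits with an adjusted effective sample size
    # n_star and non-integer "successes" x = n_star * p_hat, via beta quantiles.
    from scipy.stats import beta

    def modified_clopper_pearson(p_hat, n_star, alpha=0.05):
        x = n_star * p_hat
        lower = 0.0 if x <= 0 else beta.ppf(alpha / 2, x, n_star - x + 1)
        upper = 1.0 if x >= n_star else beta.ppf(1 - alpha / 2, x + 1, n_star - x)
        return lower, upper      # never negative, never above 1, asymmetric around p_hat

    print(modified_clopper_pearson(p_hat=0.02, n_star=120.0))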
Modified Wilson Confidence Limits
Wilson confidence limits for the binomial proportion are also known as score confidence limits
and are attributed to Wilson (1927). The confidence limits are based on inverting the normal test
that uses the null proportion in the variance (the score test). See Newcombe (1998) and Korn and
Graubard (1999) for details.
PROC SURVEYFREQ computes modified Wilson confidence limits by substituting the degrees-
of-freedom adjusted effective sample size for the original sample size in the standard Wilson
computation. (Or if you specify the ADJUST=NO option, the procedure substitutes the
unadjusted effective sample size n_e.)
The modified Wilson confidence limits for a proportion are computed as

    ( p̂ + k²/(2n*) ± k × sqrt( p̂(1 − p̂)/n* + k²/(4n*²) ) ) / ( 1 + k²/n* )

where n* is the adjusted effective sample size, p̂ is the estimate of the proportion, and k is the
critical value. With the degrees-of-freedom adjusted effective sample size n*, the computation uses
k = t(α/2, df); with the unadjusted effective sample size, which you request with the ADJUST=NO
option, the computation uses k = z(α/2). See Curtin et al. (2006) for details.
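The Wilson (score) form never yields a negative lower limit, which is the property at issue in the memo. The
sketch below is an illustration; the choice of critical value and the inputs are assumptions, not a transcription
of the SAS routine.

    # Sketch of Wilson (score) limits with an effective sample size in place of n.
    from math import sqrt
    from scipy.stats import norm

    def wilson_limits(p_hat, n_eff, alpha=0.05, crit=None):
        k = norm.ppf(1 - alpha / 2) if crit is None else crit   # e.g., pass a t percentile
        denom = 1 + k ** 2 / n_eff
        center = (p_hat + k ** 2 / (2 * n_eff)) / denom
        half = k * sqrt(p_hat * (1 - p_hat) / n_eff + k ** 2 / (4 * n_eff ** 2)) / denom
        return center - half, center + half                     # lower limit is >= 0

    print(wilson_limits(p_hat=0.02, n_eff=120.0))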
Logit Confidence Limits
If you specify the TYPE=LOGIT option, PROC SURVEYFREQ computes logit confidence
limits for proportions. See Agresti (2002) and Korn and Graubard (1998) for more information.
Logit confidence limits for proportions are based on the logit transformation y = log( p̂ / (1 − p̂) ).
The logit confidence limits p_L and p_U are computed as

    p_L = exp(y_L) / ( 1 + exp(y_L) )
    p_U = exp(y_U) / ( 1 + exp(y_U) )

where

    y_L = y − t(α/2, df) × SE(p̂) / ( p̂ (1 − p̂) )
    y_U = y + t(α/2, df) × SE(p̂) / ( p̂ (1 − p̂) )

and where p̂ is the estimate of the proportion, SE(p̂) is the standard error of the estimate, and
t(α/2, df) is the 100(1 − α/2)th percentile of the t distribution with df degrees of freedom. The degrees of
freedom are calculated as described in the section Degrees of Freedom. The confidence level is
determined by the value of the ALPHA= option, which by default equals 0.05 and produces 95%
confidence limits.
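A brief sketch of the logit-transform limits just described (delta-method form; the values are illustrative, not
SAS output, and the critical value would in practice be a t percentile):

    # Sketch: logit ("log-odds") confidence limits from an estimate and its standard error.
    from math import exp, log

    def logit_limits(p_hat, se, crit=1.645):
        y = log(p_hat / (1 - p_hat))
        half = crit * se / (p_hat * (1 - p_hat))     # delta-method SE on the logit scale
        lo, hi = y - half, y + half
        return exp(lo) / (1 + exp(lo)), exp(hi) / (1 + exp(hi))   # back-transform; stays in (0, 1)

    print(logit_limits(p_hat=0.02, se=0.01))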

roderick.little@census.gov <roderick.little@census.gov> Tue, Feb 8, 2011 at 9:22 AM
To: "Andrew A. Beveridge" <andrew.beveridge@qc.cuny.edu>
Cc: andy@socialexplorer.com, Catherine.Clark.McCully@census.gov, David Rindskopf <drindskopf@gc.cuny.edu>, robert.m.groves@census.gov,
"sharon.m.stern" <Sharon.M.Stern@census.gov>
Andy and David, thanks for this. The SAS appendix looks interesting and seems to be a good start on the problem. I think I have a better idea of the main
concerns from our conversation, and will convey this to the ACS team as we consider next steps. Best, Rod
From: "Andrew A. Beveridge" <andrew.beveridge@qc.cuny.edu>
To: Roderick.Little@census.gov, David Rindskopf <drindskopf@gc.cuny.edu>
Cc: robert.m.groves@census.gov, Catherine.Clark.McCully@census.gov, "sharon.m.stern" <Sharon.M.Stern@census.gov>
Date: 02/08/2011 08:45 AM
Subject: Confidence Limits for Small and Large Proportion in ACS Data
Sent by: andy@socialexplorer.com
[Quoted text hidden]
[attachment "Attachment_2_8_2011.doc" deleted by Roderick Little/DIR/HQ/BOC]
13.8 IMPORTANT NOTES ON MULTIYEAR ESTIMATES
While the types of data products for the multiyear estimates are almost entirely identical to those
used for the 1-year estimates, there are several distinctive features of the multiyear estimates that
data users must bear in mind.
First, the geographic boundaries that are used for multiyear estimates are always the boundary as
of January 1 of the final year of the period. Therefore, if a geographic area has gained or lost territory
during the multiyear period, this practice can have a bearing on the user's interpretation of
the estimates for that geographic area.
Secondly, for multiyear period estimates based on monetary characteristics (for example, median
earnings), inflation factors are applied to the data to create estimates that reflect the dollar values
in the final year of the multiyear period.
Finally, although the Census Bureau tries to minimize the changes to the ACS questionnaire, these
changes will occur from time to time. Changes to a question can result in the inability to build
certain estimates for a multiyear period containing the year in which the question was changed. In
addition, if a new question is introduced during the multiyear period, it may be impossible to
make estimates of characteristics related to the new question for the multiyear period.
13.9 CUSTOM DATA PRODUCTS
The Census Bureau offers a wide variety of general-purpose data products from the ACS designed
to meet the needs of the majority of data users. They contain predefined sets of data for standard
census geographic areas. For users whose data needs are not met by the general-purpose
products, the Census Bureau offers customized special tabulations on a cost-reimbursable basis
through the ACS custom tabulation program. Custom tabulations are created by tabulating data
from ACS edited and weighted data files. These projects vary in size, complexity, and cost,
depending on the needs of the sponsoring client.
Each custom tabulation request is reviewed in advance by the DRB to ensure that confidentiality is
protected. The requestor may be required to modify the original request to meet disclosure
avoidance requirements. For more detailed information on the ACS Custom Tabulations program, go to
<http://www.census.gov/acs/www/Products/spec_tabs/index.htm>.
Appendix 1. Tabulation of MOEs, Including Those That Include Zero or Negative
Cases, for Table B03002 (Race by Hispanic Status)

Each line of table B03002 was tabulated across the 3,221 county-level geographies,
comparing the published MOE with the point estimate (Neg MOE vs. EST). Cell entries
are counts of counties; each row sums to 3,221.

Columns: (a) MOE Not Reported; (b) MOE Equals Estimate; (c) MOE Greater than
Estimate; (d) MOE Less than Estimate; (e) Est Zero, MOE Positive.

Table B03002 line                                                          (a)    (b)    (c)    (d)    (e)
B03002_1   Total population                                              3,098      0      0    123      0
B03002_2   Not Hispanic or Latino                                        2,503      0      2    715      1
B03002_3   Not Hispanic or Latino: White alone                               0      0      4  3,215      2
B03002_4   Not Hispanic or Latino: Black or African American alone           0     24    429  2,473    295
B03002_5   Not Hispanic or Latino: American Indian and Alaska Native
           alone                                                              0     35    665  2,133    388
B03002_6   Not Hispanic or Latino: Asian alone                                0     40    742  2,020    419
B03002_7   Not Hispanic or Latino: Native Hawaiian and Other Pacific
           Islander alone                                                     0     21    893    419  1,888
B03002_8   Not Hispanic or Latino: Some other race alone                      0     28  1,082    817  1,294
B03002_9   Not Hispanic or Latino: Two or more races                          0     34    339  2,706    142
B03002_10  Not Hispanic or Latino: Two or more races: Two races
           including Some other race                                          0     25  1,019    505  1,672
B03002_11  Not Hispanic or Latino: Two or more races: Two races
           excluding Some other race, and three or more races                 0     32    354  2,684    151
B03002_12  Hispanic or Latino                                             2,503     12    187    476     43
B03002_13  Hispanic or Latino: White alone                                    0     21    342  2,757    101
B03002_14  Hispanic or Latino: Black or African American alone                0     23    883    788  1,527
B03002_15  Hispanic or Latino: American Indian and Alaska Native alone        0     34  1,062    646  1,479
B03002_16  Hispanic or Latino: Asian alone                                    0     11    557    274  2,379
B03002_17  Hispanic or Latino: Native Hawaiian and Other Pacific
           Islander alone                                                     0      6    363     62  2,790
B03002_18  Hispanic or Latino: Some other race alone                          0     24    581  2,320    296
B03002_19  Hispanic or Latino: Two or more races                              0     40    886  1,573    722
B03002_20  Hispanic or Latino: Two or more races: Two races including
           Some other race                                                    0     36    975  1,263    947
B03002_21  Hispanic or Latino: Two or more races: Two races excluding
           Some other race, and three or more races                           0     32  1,023    906  1,260
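The classification used in this appendix can be reproduced directly from published estimates and MOEs. The
sketch below is my own illustration of that comparison in Python; the category names follow the table above,
and the None convention for suppressed (*****) MOEs is an assumption about how a downloaded file might
be stored.

    # Sketch of the Appendix 1 classification: compare each county's point estimate
    # for a B03002 line with its published 90 percent MOE.
    def classify(estimate, moe):
        if moe is None:
            return "MOE Not Reported"        # controlled estimates carry ***** MOEs
        if estimate == 0:
            return "Est Zero, MOE Positive"  # implied interval is (-MOE, +MOE) around zero
        if moe > estimate:
            return "MOE Greater than Estimate"   # implied lower bound is negative
        if moe == estimate:
            return "MOE Equals Estimate"         # implied lower bound is exactly zero
        return "MOE Less than Estimate"

    print(classify(estimate=150, moe=210))       # -> MOE Greater than Estimate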








Appendix 2.
ACS Design and Methodology (Ch. 12, Revised 12/2010), U.S. Census Bureau

Chapter 12.
Variance Estimation

12.1 OVERVIEW

Sampling error is the uncertainty associated with an estimate that is based on data gathered from
a sample of the population rather than the full population. Note that sample-based estimates will
vary depending on the particular sample selected from the population. Measures of the magnitude
of sampling error, such as the variance and the standard error (the square root of the variance),
reflect the variation in the estimates over all possible samples that could have been selected from
the population using the same sampling methodology.

The American Community Survey (ACS) is committed to providing its users with measures of
sampling error along with each published estimate. To accomplish this, all published ACS
estimates are accompanied either by 90 percent margins of error or confidence intervals, both
based on ACS direct variance estimates. Due to the complexity of the sampling design and the
weighting adjustments performed on the ACS sample, unbiased design-based variance estimators
do not exist. As a consequence, the direct variance estimates are computed using a replication
method that repeats the estimation procedures independently several times. The variance of the
full sample is then estimated by using the variability across the resulting replicate estimates.
Although the variance estimates calculated using this procedure are not completely unbiased, the
current method produces variances that are accurate enough for analysis of the ACS data.

For Public Use Microdata Sample (PUMS) data users, replicate weights are provided to approximate
standard errors for the PUMS-tabulated estimates. Design factors are also provided with the PUMS
data, so PUMS data users can compute standard errors of their statistics using either the
replication method or the design factor method.

12.2 VARIANCE ESTIMATION FOR ACS HOUSING UNIT AND PERSON ESTIMATES

Unbiased estimates of variances for ACS estimates do not exist because of the systematic sample
design, as well as the ratio adjustments used in estimation. As an alternative, ACS implements a
replication method for variance estimation. An advantage of this method is that the variance
estimates can be computed without consideration of the form of the statistics or the complexity of
the sampling or weighting procedures, such as those being used by the ACS.

The ACS employs the Successive Differences Replication (SDR) method (Wolter, 1984; Fay & Train,
1995; Judkins, 1990) to produce variance estimates. It has been the method used to calculate ACS
estimates of variances since the start of the survey. The SDR was designed to be used with
systematic samples for which the sort order of the sample is informative, as in the case of the
ACS's geographic sort. Applications of this method were developed to produce estimates of
variances for the Current Population Survey (U.S. Census Bureau, 2006) and Census 2000 Long
Form estimates (Gbur & Fairchild, 2002).

In the SDR method, the first step in creating a variance estimate is constructing the replicate
factors. Replicate base weights are then calculated by multiplying the base weight for each
housing unit (HU) by the factors. The weighting process then is rerun, using each set of replicate
base weights in turn, to create final replicate weights. Replicate estimates are created by using the
same estimation method as the original estimate, but applying each set of replicate weights
instead of the original weights. Finally, the replicate and original estimates are used to compute
the variance estimate based on the variability between the replicate estimates and the full sample
estimate.

The following steps produce the ACS direct variance estimates:
1. Compute replicate factors.
2. Compute replicate weights.
3. Compute variance estimates.
Replicate Factors

Computation of replicate factors begins with the selection of a Hadamard matrix of order R (a
multiple of 4), where R is the number of replicates. A Hadamard matrix H is a k-by-k matrix with
all entries either 1 or -1, such that H'H = kI (that is, the columns are orthogonal). For ACS, the
number of replicates is 80 (R = 80). Each of the 80 columns represents one replicate.

Next, a pair of rows in the Hadamard matrix is assigned to each record (HU or group quarters (GQ)
person). An algorithm is used to assign two rows of an 80 x 80 Hadamard matrix to each HU. The
ACS uses a repeating sequence of 780 pairs of rows in the Hadamard matrix to assign rows to
each record, in sort order (Navarro, 2001a). The assignment of Hadamard matrix rows repeats
every 780 records until all records receive a pair of rows from the Hadamard matrix. The first row
of the matrix, in which every cell is always equal to one, is not used.

The replicate factor for each record then is determined from these two rows of the 80 x 80
Hadamard matrix. For record i (i = 1, ..., n, where n is the sample size) and replicate r (r = 1, ..., 80),
the replicate factor is computed as:

    f(i,r) = 1 + 2^(-3/2) a(R1i,r) - 2^(-3/2) a(R2i,r)

where R1i and R2i are respectively the first and second row of the Hadamard matrix assigned to
the i-th HU, and a(R1i,r) and a(R2i,r) are respectively the matrix elements (either 1 or -1) from the
Hadamard matrix in rows R1i and R2i and column r. Note that the formula for f(i,r) yields replicate
factors that can take one of three approximate values: 1.7, 1.0, or 0.3. That is:

    If a(R1i,r) = +1 and a(R2i,r) = +1, the replicate factor is 1.
    If a(R1i,r) = -1 and a(R2i,r) = -1, the replicate factor is 1.
    If a(R1i,r) = +1 and a(R2i,r) = -1, the replicate factor is approximately 1.7.
    If a(R1i,r) = -1 and a(R2i,r) = +1, the replicate factor is approximately 0.3.

The expectation is that 50 percent of the replicate factors will be 1, and the other 50 percent will be
evenly split between 1.7 and 0.3 (Gunlicks, 1996).
The following example demonstrates the computation of replicate factors for a sample of size
five, using a Hadamard matrix of order four. Table 12.1 presents an example of a two-row
assignment developed from this matrix, and the values of the replicate factors for each sample unit.

Table 12.1 Example of Two-Row Assignment, Hadamard Matrix Elements, and Replicate Factors

Case  Rows          Hadamard matrix elements (a(R1i,r), a(R2i,r))          Approximate replicate
#(i)  (R1i, R2i)    r = 1      r = 2      r = 3      r = 4                 factors f(i,1) ... f(i,4)
1     (2, 3)        (-1, +1)   (+1, +1)   (-1, -1)   (+1, -1)              0.3    1      1      1.7
2     (3, 4)        (+1, -1)   (+1, +1)   (-1, +1)   (-1, -1)              1.7    1      0.3    1
3     (4, 2)        (-1, -1)   (+1, +1)   (+1, +1)   (-1, +1)              1      1      1      0.3
4     (2, 3)        (-1, +1)   (+1, +1)   (-1, -1)   (+1, -1)              0.3    1      1      1.7
5     (3, 4)        (+1, -1)   (+1, +1)   (-1, +1)   (-1, -1)              1.7    1      0.3    1

Note that row 1 is not used. For the third case (i = 3), rows four and two of the Hadamard matrix
are used to calculate the replicate factors. For the second replicate (r = 2), the replicate factor is
computed using the values in the second column of rows four (+1) and two (+1) as follows:

    f(3,2) = 1 + 2^(-3/2)(+1) - 2^(-3/2)(+1) = 1
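As a sketch of the construction just described, the snippet below builds SDR replicate factors from two
assigned Hadamard-matrix rows. The small order-4 matrix and the row assignments are my own illustrative
choices (they are not the rows used in Table 12.1), and the code is only a Python illustration of the formula.

    # Sketch: successive differences replication (SDR) factors from assigned row pairs
    # of a Hadamard matrix. Matrix and assignments here are illustrative only.
    import numpy as np

    H = np.array([[ 1,  1,  1,  1],
                  [ 1, -1,  1, -1],
                  [ 1,  1, -1, -1],
                  [ 1, -1, -1,  1]])                   # order-4 Hadamard matrix (row 0 all ones)

    row_pairs = [(1, 2), (2, 3), (3, 1), (1, 2), (2, 3)]   # (R1_i, R2_i) for five sample cases

    def replicate_factors(hadamard, pairs):
        c = 2 ** -1.5                                  # 2^(-3/2); gives factors near 1.7, 1.0, 0.3
        return np.array([1 + c * hadamard[r1] - c * hadamard[r2] for r1, r2 in pairs])

    print(np.round(replicate_factors(H, row_pairs), 1))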

Replicate Weights

Replicate weights are produced in a way similar to that used to produce full sample final weights.
All of the weighting adjustment processes performed on the full sample final survey weights (such
as applying noninterview adjustments and population controls) also are carried out for each
replicate weight. However, collapsing patterns are retained from the full sample weighting and are
not determined again for each set of replicate weights.

Before applying the weighting steps explained in Chapter 11, the replicate base weight (RBW) for
replicate r is computed by multiplying the full sample base weight (BW; see Chapter 11 for the
computation of this weight) by the replicate factor f(i,r); that is, RBW(i,r) = BW(i) x f(i,r), where
RBW(i,r) is the replicate base weight for the i-th HU and the r-th replicate (r = 1, ..., 80).

One can elaborate on the previous example of the replicate construction using five cases and four
replicates: Suppose the full sample BW values are given under the second column of the following
table (Table 12.2). Then, the replicate base weight values are given in columns 7-10.

Table 12.2 Example of Computation of Replicate Base Weight Factor (RBW)

Case #(i)   BW(i)    Approximate replicate factors        Replicate base weights
                     f(i,1)  f(i,2)  f(i,3)  f(i,4)       RBW(i,1)  RBW(i,2)  RBW(i,3)  RBW(i,4)
1           100      0.3     1       1       1.7          29        100       100       171
2           120      1.7     1       0.3     1            205       120       35        120
3           80       1       1       1       0.3          80        80        80        23
4           120      0.3     1       1       1.7          35        120       120       205
5           110      1.7     1       0.3     1            188       110       32        110

The rest of the weighting process (Chapter 11) then is applied to each replicate weight RBW(i,r),
starting from the adjustment for CAPI subsampling and proceeding to the population control
adjustment, or raking. Basically, the weighting adjustment process is repeated independently 80
times, and RBW(i,r) is used in place of BW(i) (as in Chapter 11). By the end of this process, 80 final
replicate weights for each HU and person record are produced.
Variance Estimates

Given the replicate weights, the computation of variance for any ACS estimate is straightforward.
Suppose that θ is an ACS estimate of any type of statistic, such as a mean, total, or proportion. Let
θ0 denote the estimate computed based on the full sample weight, and θ1, θ2, ..., θ80 denote the
estimates computed based on the replicate weights. The variance of θ0, v(θ0), is estimated as the
sum of squared differences between each replicate estimate θr (r = 1, ..., 80) and the full sample
estimate θ0. The formula is as follows (see footnote 1):

    v(θ0) = (4/80) × sum over r = 1, ..., 80 of (θr − θ0)²

This equation holds for count estimates as well as any other types of estimates, including
percents, ratios, and medians.
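As an illustration of this formula, the short sketch below computes an SDR variance and standard error from
80 replicate estimates; the replicate values are simulated stand-ins, not ACS data.

    # Sketch: the replicate-variance computation described above, with the SDR
    # multiplier 4/80 for the ACS's 80 replicates.
    import numpy as np

    def sdr_variance(full_sample_estimate, replicate_estimates):
        reps = np.asarray(replicate_estimates, dtype=float)
        return (4.0 / len(reps)) * np.sum((reps - full_sample_estimate) ** 2)

    rng = np.random.default_rng(0)
    theta_hat = 1000.0
    theta_reps = theta_hat + rng.normal(scale=40.0, size=80)   # 80 replicate estimates
    se = sdr_variance(theta_hat, theta_reps) ** 0.5
    print(se, 1.645 * se)          # standard error and the 90 percent margin of error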

There are certain cases, however, where this formula does not apply. The first and most important
cases are estimates that are controlled to population totals and have their standard errors set to
zero. These are estimates that are forced to equal intercensal estimates during the weighting
process's raking step (for example, total population and collapsed age, sex, and Hispanic origin
estimates for weighting areas). Although race is included in the raking procedure, race group
estimates are not controlled; the categories used in the weighting process (see Chapter 11) do not
match the published tabulation groups because of multiple race responses and the Some Other
Race category. Information on the final collapsing of the person post-stratification cells is passed
from the weighting to the variance estimation process in order to identify estimates that are
controlled. This identification is done independently for all weighting areas and then is applied to
the geographic areas used for tabulation. Standard errors for those estimates are set to zero, and
published margins of error are set to ***** (with an appropriate accompanying footnote).
Another special case deals with zero-estimated counts of people, households, or HUs. A direct
application of the replicate variance formula leads to a zero standard error for a zero-estimated
count. However, there may be people, households, or HUs with that characteristic in that area that
were not selected to be in the ACS sample, but a different sample might have selected them, so a
zero standard error is not appropriate. For these cases, the following model-based estimation of
standard error was implemented.

For ACS data in a census year, the ACS zero-estimated counts (for characteristics included in the
100 percent census (short form) count) can be checked against the corresponding census
estimates. At least 90 percent of the census counts for the ACS zero-estimated counts should be
within a 90 percent confidence interval based on our modeled standard error (see footnote 2). Let
the variance of the estimate be modeled as some multiple (K) of the average final weight (for a
state or the nation). That is:

    variance = K × (average final weight)

Then, set the 90 percent upper bound for the zero estimate equal to the Census count:

    Census count = 1.645 × sqrt( K × (average final weight) )

Solving for K yields:

    K = (Census count / 1.645)² / (average final weight)

K was computed for all ACS zero-estimated counts from 2000 which matched to Census 2000
100 percent counts, and then the 90th percentile of those Ks was determined. Based on the
Census 2000 data, we use a value for K of 400 (Navarro, 2001b). As this modeling method
requires census counts, the 400 value can next be updated using the 2010 Census and 2010 ACS
data.

For publication, the standard error (SE) of the zero count estimate is computed as:

    SE(0) = sqrt( 400 × (average final weight) )
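Under the model as reconstructed above (variance equal to K times the average final weight, with K = 400),
the published standard error for a zero-estimated count could be sketched as follows; the average weight
used here is an invented illustrative value.

    # Sketch of the model-based SE for a zero-estimated count, assuming
    # variance = K * (average final weight) with K = 400 as described above.
    from math import sqrt

    def zero_count_se(average_final_weight, K=400):
        return sqrt(K * average_final_weight)

    se = zero_count_se(average_final_weight=35.0)     # illustrative average weight
    print(round(se, 1), round(1.645 * se, 1))         # SE and the 90 percent margin of error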

Footnote 1: A general replication-based variance formula can be expressed as

    v(θ0) = sum over r of c_r × (θr − θ0)²

where c_r is the multiplier related to the r-th replicate determined by the replication method. For the SDR
method, the value of c_r is 4/R, where R is the number of replicates (Fay & Train, 1995).

Footnote 2: This modeling was done only once, in 2001, prior to the publication of the 2000 ACS data.


The average weights (the maximum of the average housing unit and average person final weights)
are calculated at the state and national level for each ACS single-year or multiyear data release.
Estimates for geographic areas within a state use that state's average weight, and estimates for
geographic areas that cross state boundaries use the national average weight.

Finally, a similar method is used to produce an approximate standard error for both ACS zero and
100 percent estimates. We do not produce approximate standard errors for other zero estimates,
such as ratios or medians.

Variance Estimation for Multiyear ACS Estimates: Finite Population Correction Factor

Through the 2008 and 2006-2008 data products, the same variance estimation methodology
described above was implemented for both 1-year and 3-year estimates. No changes to the
methodology were necessary due to using multiple years of sample data. However, beginning with
the 2007-2009 and 2005-2009 data products, the ACS incorporated a finite population correction
(FPC) factor into the 3-year and 5-year variance estimation procedures.

The Census 2000 long form, as noted above, used the same SDR variance estimation methodology
as the ACS currently does. The long form methodology also included an FPC factor in its
calculation. One-year ACS samples are not large enough for an FPC to have much impact on
variances. However, with 5-year ACS estimates, up to 50 percent of housing units in certain
blocks may have been in sample over the 5-year period. Applying an FPC factor to multiyear ACS
replicate estimates will enable a more accurate estimate of the variance, particularly for small
areas. It was decided to apply the FPC adjustment to 3-year and 5-year ACS products, but not to
1-year products.

The ACS FPC factor is applied in the creation of the replicate factors, where the FPC factor depends
on the sampling rate n/N; generically, n is the unweighted sample size, and N is the unweighted
universe size. The ACS uses two separate FPC factors: one for HUs responding by mail or
telephone, and a second for HUs responding via personal visit follow-up.

The FPC is typically applied as a multiplicative factor outside the variance formula. However,
under certain simplifying assumptions, the variance using the replicate factors after applying the
FPC factor is equal to the original variance multiplied by the FPC factor. This method allows a
direct application of the FPC to each housing unit's or person's set of replicate weights, and a
seamless incorporation into the ACS's current variance production methodology, rather than
having to keep track of multiplicative factors when tabulating across areas of different sampling
rates.

The adjusted replicate factors are used to create replicate base weights, and ultimately final
replicate weights. It is expected that the improvement in the variance estimate will carry through
the weighting, and will be seen when the final weights are used.

The ACS FPC factor could be applied at any geographic level. Since the ACS sampling rates are
determined at the small area level (mainly census tracts and governmental units), a low level of
geography was desirable. At higher levels, the high sampling rates in specific blocks would likely
be masked by the lower rates in surrounding blocks. For that reason, the factors are applied at the
census tract level.

Group quarters persons do not have an FPC factor applied to their replicate factors.

12.3 MARGIN OF ERROR AND CONFIDENCE INTERVAL

Once the standard errors have been computed, margins of error and/or confidence bounds are
produced for each estimate. These are the measures of overall sampling error presented along
wi t h each publ i shed ACS est i mat e. Al l publ i shed ACS margi ns of error and t he l ower and upper

12- 6 Vari ance Est i mat i on (Ch. 12 Revi sed 12/ 2010) ACS Desi gn and Met hodol ogy
U.S. Census Bureau
bounds of confi dence i nt erval s present ed in t he ACS dat a product s are based on a 90 percent
conf i dence l evel, whi ch i s the Census Bureau s st andard (U.S. Census Bureau, 2010b). A margi n of
error cont ains t wo component s: t he st andard error of the est i mat e, and a mul t i pli cat i on f act or
based on a chosen conf i dence l evel. For the 90 percent conf i dence l evel, t he val ue of t he
mul t i pl i cat i on f act or used by t he ACS i s 1.645. The margi n of error of an est i mat e can be
comput ed as:

where SE( ) i s t he st andard error of t he est i mat e . Gi ven t hi s margin of error, t he 90 percent
conf i dence i nt erval can be comput ed as:

That i s, t he l ower bound of t he confi dence int erval i s [ margi n of error ( ) ], and t he upper
bound of the conf i dence i nterval i s [ + margin of error ( ) ]. Roughl y speaki ng, t hi s int erval i s a
range t hat wi l l cont ai n the f ull popul at i on val ue of t he est i mat ed charact eri st i c wi t h a known
probabi l i t y.
Users are caut i oned t o consi der l ogi cal boundari es when creat ing conf i dence bounds f rom t he
margi ns of error. For exampl e, a smal l popul at i on est imat e may have a cal cul at ed l ower bound
l ess t han zero. A negat i ve number of peopl e does not make sense, so t he l ower bound shoul d be
set t o zero inst ead. Li kewi se, bounds f or percent s shoul d not go bel ow zero percent or above 100
percent . For other charact eri st i cs, l i ke i ncome, negat i ve val ues may be l egi t i mat e.
Gi ven the confi dence bounds, a margi n of error can be comput ed as t he di f ference bet ween an
est i mat e and i t s upper or l ower confi dence bounds:

Usi ng t he margin of error (as publ i shed or cal cul at ed f rom t he bounds), t he st andard error i s
obt ai ned as f ol l ows:

For ranking t abl es and compari son prof i les, t he ACS provi des an indi cat or as t o whet her two
est i mat es, Est
1
and Est
2
If Z < 1.645 or Z > 1.645, t he di ff erence bet ween the est i mat es i s si gnif i cant at the 90 percent
l evel. Det ermi nat i ons of st at i st i cal si gnif i cance are made usi ng unrounded values of t he st andard
errors, so users may not be abl e t o achi eve the same resul t using the st andard errors deri ved f rom
t he rounded est i mat es and margi ns of error as publ i shed. Onl y pai rwi se t est s are used t o
det ermi ne si gni fi cance i n the ranking t abl es; no mul t i ple compari son met hods are used.
, are st at i st i cal l y si gni fi cant l y di ff erent at t he 90 percent conf i dence l evel.
That det ermi nat i on i s made by i ni t i all y cal cul at i ng:

12.4 VARIANCE ESTIMATION FOR THE PUMS
The Census Bureau cannot possi bl y predi ct al l combi nat i ons of est i mat es and geography t hat may
be of int erest t o dat a users. Dat a users can downl oad PUMS f i les and t abul at e t he dat a t o creat e
est i mat es of t hei r own choosi ng. Because the ACS PUMS cont ai ns onl y a subset of the ful l ACS
sampl e, est i mat es f rom t he ACS PUMS f i l e wi ll of ten be di f f erent f rom the publi shed ACS est i mat es
t hat are based on t he f ul l ACS sampl e.
Users of t he ACS PUMS fi l es can comput e t he est i mat ed vari ances of thei r st at i st i cs usi ng one of
t wo opt i ons: (1) t he repl i cat i on method using repl i cat e wei ght s rel eased wi t h the PUMS dat a, and
(2) t he desi gn f act or method.

ACS Desi gn and Met hodol ogy (Ch. 12 Revi sed 12/ 2010) Vari ance Est i mat i on 12- 7
U.S. Census Bureau
PUMS Replicate Variances
For t he repl i cat e met hod, direct vari ance est i mat es based on t he SDR f ormul a as descri bed i n
Sect i on 12.2 above can be i mpl ement ed. Users can si mpl y t abul at e 80 repl i cat e est i mat es i n
addi t i on t o t heir desi red esti mat e by usi ng t he provi ded 80 repl i cat e wei ght s, and then appl y t he
vari ance f ormul a:


PUMS Design Factor Variances
Si mi l ar t o met hods used t o cal cul at e st andard errors f or PUMS dat a f rom Census 2000, t he ACS
PUMS provi des t abl es of desi gn f act ors f or vari ous t opics such as age f or persons or t enure f or
HUs. For exampl e, the 2009 ACS PUMS desi gn f act ors are publ i shed at nat i onal and st at e l evel s
(U.S. Census Bureau, 2010a), and were cal cul at ed usi ng 2009 ACS dat a. PUMS desi gn f act ors are
updat ed peri odi cal l y, but not necessari l y on an annual basi s. The desi gn f act or approach was
devel oped based on a model t hat uses a st andard error f rom a si mpl e random sampl e as t he base,
and t hen i nf l at es i t t o account f or an increase i n t he vari ance caused by t he compl ex sampl e
desi gn. St andard errors f or al most al l count s and proport i ons of persons, households, and HUs
are approxi mat ed usi ng desi gn f act ors. For 1- year ACS PUMS f i l es begi nni ng wi th 2005, use:

f or a t ot al , and

f or a percent, where:
= t he est i mat e of t ot al or a count .
= t he est i mat e of a percent .
DF = t he appropri at e desi gn fact or based on t he t opi c of t he est i mat e.
N = t he t ot al f or the geographi c area of int erest (i f the est i mat e i s of HUs, the number
of HUs i s used; i f t he est i mat e i s of f amil i es or househol ds, t he number of
househol ds i s used; otherwise t he number of persons i s used as N).
B = t he denominat or (base) of t he percent .
The val ue 99 i n t he f ormul a i s t he val ue of the 1- year PUMS FPC f act or, whi ch i s comput ed as (100
) / , where (gi ven as a percent ) i s t he sampl i ng rat e f or the PUMS dat a. Si nce t he PUMS i s
approxi mat el y a 1 percent sampl e of HUs, (100 ) / = (100 1)/ 1 = 99.
For 3- year PUMS f i l es beginni ng wi th 20052007, t he 3 years wort h of dat a represent
approxi mat el y a 3 percent sampl e of HUs. Hence, t he 3- year PUMS FPC f act or i s (100 ) / =
(100 3) / 3 = 97 / 3. To cal cul at e st andard errors from 3- year PUMS dat a, subst i t ut e 97 / 3 f or
99 i n t he above f ormul as.
Si mi l arl y, 5- year PUMS f i les, begi nni ng wi t h 2005- 2009, represent approxi mat el y a 5 percent
sampl e of HUs. So, t he 5- year PUMS FPC i s 95 / 5 = 19, whi ch can be subst i t ut ed for 99 i n t he
above f ormul as.
The desi gn f act or (DF) i s def i ned as t he rat i o of t he st andard error of an est i mat ed paramet er
(comput ed under t he repl i cat i on met hod descri bed in Sect i on 12.2) t o t he st andard error based on
a si mpl e random sampl e of t he same si ze. The DF ref l ect s t he ef fect of the act ual sampl e desi gn
and est i mat i on procedures used f or the ACS. The DF f or each t opi c was comput ed by model i ng

12- 8 Vari ance Est i mat i on (Ch. 12 Revi sed 12/ 2010) ACS Desi gn and Met hodol ogy
U.S. Census Bureau
t he rel at i onshi p bet ween t he st andard error under t he repl i cat i on met hod (RSE) wi t h t he st andard
error based on a si mple random sampl e (SRSSE); t hat i s, RSE = DF SRSSE, where the SRSSE i s
comput ed as f ol l ows:

The val ue 39 i n t he f ormul a above i s t he FPC f act or based on an approxi mat e sampl i ng f ract i on of
2.5 percent i n the ACS; t hat i s, (100 2.5) / 2.5 = 97.5 / 2.5 = 39.
The val ue of DF i s obt ai ned by f i t t i ng t he no- i ntercept regressi on model RSE = DF SRSSE usi ng
st andard errors (RSE, SRSSE) f or vari ous publ i shed t abl e est i mat es at t he nat i onal and st at e l evel s.
The val ues of DFs by t opi c can be obt ai ned from t he PUMS Accuracy of t he Dat a st at ement t hat
i s publ i shed wi t h each PUMS f i l e. For exampl e, 2009 1- year PUMS DFs can be f ound i n U,S,
Census Bureau (2010a).. The document at i on al so provides exampl es on how t o use t he design
f act ors t o comput e st andard errors f or t he est i mat es of t ot al s, means, medi ans, proport i ons or
percent ages, rat i os, sums, and di f f erences.
The t opi cs f or t he 2009 PUMS desi gn f act ors are, f or the most part , t he same ones t hat were
avai l abl e f or t he Census 2000 PUMS. We recommend t o users t hat , i n usi ng t he desi gn f act or
approach, i f t he est i mat e i s a combi nat i on of t wo or more charact eri st i cs, t he l argest DF f or thi s
combi nat i on of charact eri st ics i s used. The onl y except ions t o t hi s are i t ems crossed wi t h race or
Hi spani c ori gin; f or t hese i tems, t he l argest DF i s used, af t er removi ng t he race or Hi spani c ori gin
DFs f rom consi derat i on.
12.5 REFERENCES
Fay, R., & Trai n, G. (1995). Aspect s of Survey and Model Based Post censal Est i mat i on of Income
and Povert y Charact eri st i cs f or St at es and Counti es. Joint Statistical Meetings: Proceedings of the
Section on Government Statistics (pp. 154- 159). Al exandri a, VA: Ameri can St at i st i cal Associ at i on:
ht t p:/ / www.census.gov/ di d/ www/ sai pe/ publ i cat i ons/ f i l es/ FayTrai n95.pdf
Gbur, P., & Fai rchi l d, L. (2002). Overvi ew of t he U.S. Census 2000 Long Form Di rect Vari ance
Est i mat i on. Joint Statistical Meetings: Proceedings of the Section on Survey Research Methods (pp.
1139- 1144). Al exandri a, VA: Ameri can St at i st i cal Associ at i on.
Gunl i cks, C. (1996). 1990 Replicate Variance System (VAR90-20). Washi ngt on, DC: U.S. Census
Bureau.
Judki ns, D. R. (1990). Fay's Met hod f or Vari ance Est i mat i on. Journal of Official Statistics , 6 (3),
223- 239.
Navarro, A. (2001a). 2000 American Community Survey Comparison County Replicate Factors.
Ameri can Communi t y Survey Vari ance Memorandum Seri es #ACS- V- 01. Washi ngt on, DC: U.S.
Census Bureau.
Navarro, A. (2001b). Estimating Standard Errors of Zero Estimates. Washi ngt on, DC: U.S. Census
Bureau.
U.S. Census Bureau. (2006). Current Population Survey: Technical Paper 66Design and
Methodology. Ret ri eved from U.S. Census Bureau: ht t p:/ / www.census.gov/ prod/ 2006pubs/ t p-
66.pdf
U.S. Census Bureau. (2010a). PUMS Accuracy of the Data (2009). Ret ri eved f rom U.S. Census
Bureau:
ht t p:/ / www.census.gov/ acs/ www/ Downl oads/ dat a_document at i on/ pums/ Accuracy/ 2009Accurac
yPUMS.pdf
U.S. Census Bureau. (2010b). Statistical Quality Standard E2: Reporting Results. Washi ngt on, DC:
U.S. Census Bureau: ht t p:/ / www.census.gov/ qual i t y/ st andards/ st andarde2.ht ml

ACS Desi gn and Met hodol ogy (Ch. 12 Revi sed 12/ 2010) Vari ance Est i mat i on 12- 9
U.S. Census Bureau
Wol t er, K. M. (1984). An Invest i gat i on of Some Est i mat ors of Vari ance f or Syst emat i c Sampl i ng.
Journal of the American Statistical Association , 79, 781- 790.









Appendix 3.
Statistical Science
2001, Vol. 16, No. 2, 101133
Interval Estimation for
a Binomial Proportion
Lawrence D. Brown, T. Tony Cai and Anirban DasGupta
Abstract. We revisit the problem of interval estimation of a binomial
proportion. The erratic behavior of the coverage probability of the stan-
dard Wald condence interval has previously been remarked on in the
literature (Blyth and Still, Agresti and Coull, Santner and others). We
begin by showing that the chaotic coverage properties of the Wald inter-
val are far more persistent than is appreciated. Furthermore, common
textbook prescriptions regarding its safety are misleading and defective
in several respects and cannot be trusted.
This leads us to consideration of alternative intervals. A number of
natural alternatives are presented, each with its motivation and con-
text. Each interval is examined for its coverage probability and its length.
Based on this analysis, we recommend the Wilson interval or the equal-
tailed Jeffreys prior interval for small n and the interval suggested in
Agresti and Coull for larger n. We also provide an additional frequentist
justication for use of the Jeffreys interval.
Key words and phrases: Bayes, binomial distribution, condence
intervals, coverage probability, Edgeworth expansion, expected length,
Jeffreys prior, normal approximation, posterior.
1. INTRODUCTION
This article revisits one of the most basic and
methodologically important problems in statisti-
cal practice, namely, interval estimation of the
probability of success in a binomial distribu-
tion. There is a textbook condence interval for
this problem that has acquired nearly universal
acceptance in practice. The interval, of course, is
r z
2
n
12
( r(1 r))
12
, where r = .n is
the sample proportion of successes, and z
2
is the
100(1 2)th percentile of the standard normal
distribution. The interval is easy to present and
motivate and easy to compute. With the exceptions
Lawrence D. Brown is Professor of Statistics, The
Wharton School, University of Pennsylvania, 3000
Steinberg Hall-Dietrich Hall, 3620 Locust Walk,
Philadelphia, Pennsylvania 19104-6302. T. Tony Cai
is Assistant Professor of Statistics, The Wharton
School, University of Pennsylvania, 3000 Steinberg
Hall-Dietrich Hall, 3620 Locust Walk, Philadelphia,
Pennsylvania 19104-6302. Anirban DasGupta is
Professor, Department of Statistics, Purdue Uni-
versity, 1399 Mathematical Science Bldg., West
Lafayette, Indiana 47907-1399
of the I test, linear regression, and ANOVA, its
popularity in everyday practical statistics is virtu-
ally unmatched. The standard interval is known as
the Wald interval as it comes from the Wald large
sample test for the binomial case.
So at rst glance, one may think that the problem
is too simple and has a clear and present solution.
In fact, the problem is a difcult one, with unantic-
ipated complexities. It is widely recognized that the
actual coverage probability of the standard inter-
val is poor for r near 0 or 1. Even at the level of
introductory statistics texts, the standard interval
is often presented with the caveat that it should be
used only when n min(r, 1r) is at least 5 (or 10).
Examination of the popular texts reveals that the
qualications with which the standard interval is
presented are varied, but they all reect the concern
about poor coverage when r is near the boundaries.
In a series of interesting recent articles, it has
also been pointed out that the coverage proper-
ties of the standard interval can be erratically
poor even if r is not near the boundaries; see, for
instance, Vollset (1993), Santner (1998), Agresti and
Coull (1998), and Newcombe (1998). Slightly older
literature includes Ghosh (1979), Cressie (1980)
and Blyth and Still (1983). Agresti and Coull (1998)
101
102 L. D. BROWN, T. T. CAI AND A. DASGUPTA
particularly consider the nominal 95% case and
show the erratic and poor behavior of the stan-
dard intervals coverage probability for small n
even when r is not near the boundaries. See their
Figure 4 for the cases n = 5 and 10.
We will show in this article that the eccentric
behavior of the standard intervals coverage prob-
ability is far deeper than has been explained or is
appreciated by statisticians at large. We will show
that the popular prescriptions the standard inter-
val comes with are defective in several respects and
are not to be trusted. In addition, we will moti-
vate, present and analyze several alternatives to the
standard interval for a general condence level. We
will ultimately make recommendations about choos-
ing a specic interval for practical use, separately
for different intervals of values of n. It will be seen
that for small n (40 or less), our recommendation
differs from the recommendation Agresti and Coull
(1998) made for the nominal 95% case. To facili-
tate greater appreciation of the seriousness of the
problem, we have kept the technical content of this
article at a minimal level. The companion article,
Brown, Cai and DasGupta (1999), presents the asso-
ciated theoretical calculations on Edgeworth expan-
sions of the various intervals coverage probabili-
ties and asymptotic expansions for their expected
lengths.
In Section 2, we rst present a series of exam-
ples on the degree of severity of the chaotic behav-
ior of the standard intervals coverage probability.
The chaotic behavior does not go away even when
n is quite large and r is not near the boundaries.
For instance, when n is 100, the actual coverage
probability of the nominal 95% standard interval
is 0.952 if r is 0.106, but only 0.911 if r is 0.107.
The behavior of the coverage probability can be even
more erratic as a function of n. If the true r is 0.5,
the actual coverage of the nominal 95% interval is
0.953 at the rather small sample size n = 17, but
falls to 0.919 at the much larger sample size n = 40.
This eccentric behavior can get downright
extreme in certain practically important prob-
lems. For instance, consider defective proportions in
industrial quality control problems. There it would
be quite common to have a true r that is small. If
the true r is 0.005, then the coverage probability
of the nominal 95% interval increases monotoni-
cally in n all the way up to n = 591 to the level
0.945, only to drop down to 0.792 if n is 592. This
unlucky spell continues for a while, and then the
coverage bounces back to 0.948 when n is 953, but
dramatically falls to 0.852 when n is 954. Subse-
quent unlucky spells start off at n = 1279, 1583 and
on and on. It should be widely known that the cov-
erage of the standard interval can be signicantly
lower at quite large sample sizes, and this happens
in an unpredictable and rather random way.
Continuing, also in Section 2 we list a set of com-
mon prescriptions that standard texts present while
discussing the standard interval. We show what
the deciencies are in some of these prescriptions.
Proposition 1 and the subsequent Table 3 illustrate
the defects of these common prescriptions.
In Sections 3 and 4, we present our alterna-
tive intervals. For the purpose of a sharper focus
we present these alternative intervals in two cat-
egories. First we present in Section 3 a selected
set of three intervals that clearly stand out in
our subsequent analysis; we present them as our
recommended intervals. Separately, we present
several other intervals in Section 4 that arise as
clear candidates for consideration as a part of a
comprehensive examination, but do not stand out
in the actual analysis.
The short list of recommended intervals contains
the score interval, an interval recently suggested
in Agresti and Coull (1998), and the equal tailed
interval resulting from the natural noninforma-
tive Jeffreys prior for a binomial proportion. The
score interval for the binomial case seems to
have been introduced in Wilson (1927); so we call
it the Wilson interval. Agresti and Coull (1998)
suggested, for the special nominal 95% case, the
interval rz
0.025
n
12
( r(1 r))
12
, where n = n4
and r = (. 2)(n 4); this is an adjusted Wald
interval that formally adds two successes and
two failures to the observed counts and then uses
the standard method. Our second interval is the
appropriate version of this interval for a general
condence level; we call it the AgrestiCoull inter-
val. By a slight abuse of terminology, we call our
third interval, namely the equal-tailed interval
corresponding to the Jeffreys prior, the Jeffreys
interval.
In Section 3, we also present our ndings on the
performances of our recommended intervals. As
always, two key considerations are their coverage
properties and parsimony as measured by expected
length. Simplicity of presentation is also sometimes
an issue, for example, in the context of classroom
presentation at an elementary level. On considera-
tion of these factors, we came to the conclusion that
for small n (40 or less), we recommend that either
the Wilson or the Jeffreys prior interval should
be used. They are very similar, and either may be
used depending on taste. The Wilson interval has a
closed-form formula. The Jeffreys interval does not.
One can expect that there would be resistance to
using the Jeffreys interval solely due to this rea-
son. We therefore provide a table simply listing the
INTERVAL ESTIMATION FOR BINOMIAL PROPORTION 103
limits of the Jeffreys interval for n up to 30 and
in addition also give closed form and very accurate
approximations to the limits. These approximations
do not need any additional software.
For larger n (n > 40), the Wilson, the Jeffreys
and the AgrestiCoull interval are all very simi-
lar, and so for such n, due to its simplest form,
we come to the conclusion that the AgrestiCoull
interval should be recommended. Even for smaller
sample sizes, the AgrestiCoull interval is strongly
preferable to the standard one and so might be the
choice where simplicity is a paramount objective.
The additional intervals we considered are two
slight modications of the Wilson and the Jeffreys
intervals, the ClopperPearson exact interval,
the arcsine interval, the logit interval, the actual
Jeffreys HPD interval and the likelihood ratio
interval. The modied versions of the Wilson and
the Jeffreys intervals correct disturbing downward
spikes in the coverages of the original intervals very
close to the two boundaries. The other alternative
intervals have earned some prominence in the liter-
ature for one reason or another. We had to apply a
certain amount of discretion in choosing these addi-
tional intervals as part of our investigation. Since
we wish to direct the main part of our conversation
to the three recommended intervals, only a brief
summary of the performances of these additional
intervals is presented along with the introduction
of each interval. As part of these quick summaries,
we indicate why we decided against including them
among the recommended intervals.
We strongly recommend that introductory texts
in statistics present one or more of these recom-
mended alternative intervals, in preference to the
standard one. The slight sacrice in simplicity
would be more than worthwhile. The conclusions
we make are given additional theoretical support
by the results in Brown, Cai and DasGupta (1999).
Analogous results for other one parameter discrete
families are presented in Brown, Cai and DasGupta
(2000).
2. THE STANDARD INTERVAL
When constructing a condence interval we usu-
ally wish the actual coverage probability to be close
to the nominal condence level. Because of the dis-
crete nature of the binomial distribution we cannot
always achieve the exact nominal condence level
unless a randomized procedure is used. Thus our
objective is to construct nonrandomized condence
intervals for r such that the coverage probability
T
r
(r C1) 1 where is some prespecied
value between 0 and 1. We will use the notation
C(r, n) = T
r
(r C1), 0 - r - 1, for the coverage
probability.
A standard condence interval for r based on nor-
mal approximation has gained universal recommen-
dation in the introductory statistics textbooks and
in statistical practice. The interval is known to guar-
antee that for any xed r (0, 1), C(r, n) 1
as n .
Let (z) and 4(z) be the standard normal density
and distribution functions, respectively. Throughout
the paper we denote z
2
= 4
1
(1 2), r =
.n and q = 1 r. The standard normal approxi-
mation condence interval C1
:
is given by
C1
:
= r n
12
( r q)
12
. (1)
This interval is obtained by inverting the accep-
tance region of the well known Wald large-sample
normal test for a general problem:
[(

) :c(

)[ , (2)
where is a generic parameter,

is the maximum
likelihood estimate of and :c(

) is the estimated
standard error of

. In the binomial case, we have
= r,

= .n and :c(

) = ( r q)
12
n
12
.
The standard interval is easy to calculate and
is heuristically appealing. In introductory statis-
tics texts and courses, the condence interval C1
:
is usually presented along with some heuristic jus-
tication based on the central limit theorem. Most
students and users no doubt believe that the larger
the number n, the better the normal approximation,
and thus the closer the actual coverage would be to
the nominal level 1. Further, they would believe
that the coverage probabilities of this method are
close to the nominal value, except possibly when n
is small or r is near 0 or 1. We will show how
completely both of these beliefs are false. Let us
take a close look at how the standard interval C1
:
really performs.
2.1 Lucky n, Lucky p
An interesting phenomenon for the standard
interval is that the actual coverage probability
of the condence interval contains nonnegligible
oscillation as both r and n vary. There exist some
lucky pairs (r, n) such that the actual coverage
probability C(r, n) is very close to or larger than
the nominal level. On the other hand, there also
exist unlucky pairs (r, n) such that the corre-
sponding C(r, n) is much smaller than the nominal
level. The phenomenon of oscillation is both in n,
for xed r, and in r, for xed n. Furthermore, dras-
tic changes in coverage occur in nearby r for xed
n and in nearby n for xed r. Let us look at ve
simple but instructive examples.
104 L. D. BROWN, T. T. CAI AND A. DASGUPTA
Fig. 1. Standard interval; oscillation phenomenon for xed r = 0.2 and variable n = 25 to 100.
The probabilities reported in the following plots
and tables, as well as those appearing later in
this paper, are the result of direct probability
calculations produced in S-PLUS. In all cases
their numerical accuracy considerably exceeds the
number of signicant gures reported and/or the
accuracy visually obtainable from the plots. (Plots
for variable r are the probabilities for a ne grid
of values of r, e.g., 2000 equally spaced values of r
for the plots in Figure 5.)
Example 1. Figure 1 plots the coverage prob-
ability of the nominal 95% standard interval for
r = 0.2. The number of trials n varies from 25 to
100. It is clear from the plot that the oscillation is
signicant and the coverage probability does not
steadily get closer to the nominal condence level
as n increases. For instance, C(0.2, 30) = 0.946 and
C(0.2, 98) = 0.928. So, as hard as it is to believe,
the coverage probability is signicantly closer to
0.95 when n = 30 than when n = 98. We see that
the true coverage probability behaves contrary to
conventional wisdom in a very signicant way.
Example 2. Now consider the case of r = 0.5.
Since r = 0.5, conventional wisdom might suggest
to an unsuspecting user that all will be well if n is
about 20. We evaluate the exact coverage probabil-
ity of the 95% standard interval for 10 n 50.
In Table 1, we list the values of lucky n [dened
as C(r, n) 0.95] and the values of unlucky n
[dened for specicity as C(r, n) 0.92]. The con-
clusions presented in Table 2 are surprising. We
Table 1
Standard interval; lucky n and unlucky n for 10 n 50 and r = 0.5
Lucky n 17 20 25 30 35 37 42 44 49
C(0.5, n) 0.951 0.959 0.957 .957 0.959 0.953 0.956 0.951 0.956
Unlucky n 10 12 13 15 18 23 28 33 40
C(0.5, n) 0.891 0.854 0.908 0.882 0.904 0.907 0.913 0.920 0.919
note that when n = 17 the coverage probability
is 0.951, but the coverage probability equals 0.904
when n = 18. Indeed, the unlucky values of n arise
suddenly. Although r is 0.5, the coverage is still
only 0.919 at n = 40. This illustrates the inconsis-
tency, unpredictability and poor performance of the
standard interval.
Example 3. Now let us move r really close to
the boundary, say r = 0.005. We mention in the
introduction that such r are relevant in certain
practical applications. Since r is so small, now one
may fully expect that the coverage probability of
the standard interval is poor. Figure 2 and Table
2.2 show that there are still surprises and indeed
we now begin to see a whole new kind of erratic
behavior. The oscillation of the coverage probabil-
ity does not show until rather large n. Indeed, the
coverage probability makes a slow ascent all the
way until n = 591, and then dramatically drops to
0.792 when n = 592. Figure 2 shows that thereafter
the oscillation manifests in full force, in contrast
to Examples 1 and 2, where the oscillation started
early on. Subsequent unlucky values of n again
arise in the same unpredictable way, as one can see
from Table 2.2.
2.2 Inadequate Coverage
The results in Examples 1 to 3 already show that
the standard interval can have coverage noticeably
smaller than its nominal value even for values of n
and of nr(1 r) that are not small. This subsec-
INTERVAL ESTIMATION FOR BINOMIAL PROPORTION 105
Table 2
Standard interval; late arrival of unlucky n for small r
Unlucky n 592 954 1279 1583 1876
C(0.005, n) 0.792 0.852 0.875 0.889 0.898
tion contains two more examples that display fur-
ther instances of the inadequacy of the standard
interval.
Example 4. Figure 3 plots the coverage probabil-
ity of the nominal 95% standard interval with xed
n = 100 and variable r. It can be seen from Fig-
ure 3 that in spite of the large sample size, signi-
cant change in coverage probability occurs in nearby
r. The magnitude of oscillation increases signi-
cantly as r moves toward 0 or 1. Except for values
of r quite near r = 0.5, the general trend of this
plot is noticeably below the nominal coverage value
of 0.95.
Example 5. Figure 4 shows the coverage proba-
bility of the nominal 99% standard interval with n =
20 and variable r from 0 to 1. Besides the oscilla-
tion phenomenon similar to Figure 3, a striking fact
in this case is that the coverage never reaches the
nominal level. The coverage probability is always
smaller than 0.99, and in fact on the average the
coverage is only 0.883. Our evaluations show that
for all n 45, the coverage of the 99% standard
interval is strictly smaller than the nominal level
for all 0 - r - 1.
It is evident from the preceding presentation
that the actual coverage probability of the standard
interval can differ signicantly from the nominal
condence level for moderate and even large sam-
ple sizes. We will later demonstrate that there are
other condence intervals that perform much better
Fig. 2. Standard interval; oscillation in coverage for small r.
in this regard. See Figure 5 for such a comparison.
The error in coverage comes from two sources: dis-
creteness and skewness in the underlying binomial
distribution. For a two-sided interval, the rounding
error due to discreteness is dominant, and the error
due to skewness is somewhat secondary, but still
important for even moderately large n. (See Brown,
Cai and DasGupta, 1999, for more details.) Note
that the situation is different for one-sided inter-
vals. There, the error caused by the skewness can
be larger than the rounding error. See Hall (1982)
for a detailed discussion on one-sided condence
intervals.
The oscillation in the coverage probability is
caused by the discreteness of the binomial dis-
tribution, more precisely, the lattice structure of
the binomial distribution. The noticeable oscil-
lations are unavoidable for any nonrandomized
procedure, although some of the competing proce-
dures in Section 3 can be seen to have somewhat
smaller oscillations than the standard procedure.
See the text of Casella and Berger (1990) for intro-
ductory discussion of the oscillation in such a
context.
The erratic and unsatisfactory coverage prop-
erties of the standard interval have often been
remarked on, but curiously still do not seem to
be widely appreciated among statisticians. See, for
example, Ghosh (1979), Blyth and Still (1983) and
Agresti and Coull (1998). Blyth and Still (1983) also
show that the continuity-corrected version still has
the same disadvantages.
2.3 Textbook Qualications
The normal approximation used to justify the
standard condence interval for r can be signi-
cantly in error. The error is most evident when the
true r is close to 0 or 1. See Lehmann (1999). In
fact, it is easy to show that, for any xed n, the
106 L. D. BROWN, T. T. CAI AND A. DASGUPTA
Fig. 3. Standard interval; oscillation phenomenon for xed n = 100 and variable r.
condence coefcient C(r, n) 0 as r 0 or 1.
Therefore, most major problems arise as regards
coverage probability when r is near the boundaries.
Poor coverage probabilities for r near 0 or 1 are
widely remarked on, and generally, in the popu-
lar texts, a brief sentence is added qualifying when
to use the standard condence interval for r. It
is interesting to see what these qualications are.
A sample of 11 popular texts gives the following
qualications:
The condence interval may be used if:
1. nr, n(1 r) are 5 (or 10);
2. nr(1 r) 5 (or 10);
3. n r, n(1 r) are 5 (or 10);
4. r 3

r(1 r)n does not contain 0 or 1;


5. n quite large;
6. n 50 unless r is very small.
It seems clear that the authors are attempting to
say that the standard interval may be used if the
central limit approximation is accurate. These pre-
scriptions are defective in several respects. In the
estimation problem, (1) and (2) are not veriable.
Even when these conditions are satised, we see,
for instance, from Table 1 in the previous section,
that there is no guarantee that the true coverage
probability is close to the nominal condence level.
Fig. 4. Coverage of the nominal 99% standard interval for xed n = 20 and variable r.
For example, when n = 40 and r = 0.5, one has
nr = n(1 r) = 20 and nr(1 r) = 10, so clearly
either of the conditions (1) and (2) is satised. How-
ever, from Table 1, the true coverage probability in
this case equals 0.919 which is certainly unsatisfac-
tory for a condence interval at nominal level 0.95.
The qualication (5) is useless and (6) is patently
misleading; (3) and (4) are certainly veriable, but
they are also useless because in the context of fre-
quentist coverage probabilities, a data-based pre-
scription does not have a meaning. The point is that
the standard interval clearly has serious problems
and the inuential texts caution the readers about
that. However, the caution does not appear to serve
its purpose, for a variety of reasons.
Here is a result that shows that sometimes the
qualications are not correct even in the limit as
n .
Proposition 1. Let > 0. For the standard con-
dence interval,
lim
n
inf
r: nr, n(1r)
C(r, n) (3)
T(o

- Poisson() b

),
INTERVAL ESTIMATION FOR BINOMIAL PROPORTION 107
Fig. 5. Coverage probability for n = 50.
Table 3
Standard interval; bound (3) on limiting minimum coverage
when nr, n(1 r)
5 7 10
lim
n
inf
r: nr, n(1r)
C(r, n) 0.875 0.913 0.926
where o

and b

are the integer parts of


(
2
2

2
4)2,
where the sign goes with o

and the sign with b

.
The proposition follows from the fact that the
sequence of Bin(n, n) distributions converges
weakly to the Poisson() distribution and so the
limit of the inmum is at most the Poisson proba-
bility in the proposition by an easy calculation.
Let us use Proposition 1 to investigate the validity
of qualications (1) and (2) in the list above. The
nominal condence level in Table 3 below is 0.95.
Table 4
Values of
r
for the modied lower bound for the Wilson interval
1 x = 1 x = 2 x = 3
0.90 0.105 0.532 1.102
0.95 0.051 0.355 0.818
0.99 0.010 0.149 0.436
It is clear that qualication (1) does not work at
all and (2) is marginal. There are similar problems
with qualications (3) and (4).
3. RECOMMENDED ALTERNATIVE INTERVALS
From the evidence gathered in Section 2, it seems
clear that the standard interval is just too risky.
This brings us to the consideration of alternative
intervals. We now analyze several such alternatives,
each with its motivation. A few other intervals are
also mentioned for their theoretical importance.
Among these intervals we feel three stand out in
their comparative performance. These are labeled
separately as the recommended intervals.
3.1 Recommended Intervals
3.1.1 The Wilson interval. An alternative to the
standard interval is the condence interval based
on inverting the test in equation (2) that uses the
null standard error (rq)
12
n
12
instead of the esti-
mated standard error ( r q)
12
n
12
. This condence
interval has the form
C1
W
=
.
2
2
n
2

n
12
n
2
( r q
2
(4n))
12
. (4)
This interval was apparently introduced by Wilson
(1927) and we will call this interval the Wilson
interval.
The Wilson interval has theoretical appeal. The
interval is the inversion of the CLT approximation
108 L. D. BROWN, T. T. CAI AND A. DASGUPTA
to the family of equal tail tests of J
0
: r = r
0
.
Hence, one accepts J
0
based on the CLT approx-
imation if and only if r
0
is in this interval. As
Wilson showed, the argument involves the solution
of a quadratic equation; or see Tamhane and Dunlop
(2000, Exercise 9.39).
3.1.2 The AgrestiCoull interval. The standard
interval C1
:
is simple and easy to remember. For
the purposes of classroom presentation and use in
texts, it may be nice to have an alternative that has
the familiar form r z

r(1 r)n, with a better


and new choice of r rather than r = .n. This can
be accomplished by using the center of the Wilson
region in place of r. Denote

. = .
2
2 and
n = n
2
. Let r =

. n and q = 1 r. Dene the
condence interval C1
PC
for r by
C1
PC
= r ( r q)
12
n
12
. (5)
Both the AgrestiCoull and the Wilson interval are
centered on the same value, r. It is easy to check
that the AgrestiCoull intervals are never shorter
than the Wilson intervals. For the case when =
0.05, if we use the value 2 instead of 1.96 for ,
this interval is the add 2 successes and 2 failures
interval in Agresti and Coull (1998). For this rea-
son, we call it the AgrestiCoull interval. To the
best of our knowledge, Samuels and Witmer (1999)
is the rst introductory statistics textbook that rec-
ommends the use of this interval. See Figure 5 for
the coverage of this interval. See also Figure 6 for
its average coverage probability.
3.1.3 Jeffreys interval. Beta distributions are the
standard conjugate priors for binomial distributions
and it is quite common to use beta priors for infer-
ence on r (see Berger, 1985).
Suppose . Bin(n, r) and suppose r has a prior
distribution Beta(o
1
, o
2
); then the posterior distri-
bution of r is Beta(. o
1
, n . o
2
). Thus a
100(1 )% equal-tailed Bayesian interval is given
by
|1(2; .o
1
, n .o
2
),
1(1 2; .o
1
, n .o
2
)|,
where 1(; n
1
, n
2
) denotes the quantile of a
Beta(n
1
, n
2
) distribution.
The well-known Jeffreys prior and the uniform
prior are each a beta distribution. The noninforma-
tive Jeffreys prior is of particular interest to us.
Historically, Bayes procedures under noninforma-
tive priors have a track record of good frequentist
properties; see Wasserman (1991). In this problem
the Jeffreys prior is Beta(12, 12) which has the
density function
](r) =
1
r
12
(1 r)
12
.
The 100(1)% equal-tailed Jeffreys prior interval
is dened as
C1
J
= |1
J
(r), U
J
(r)|, (6)
where 1
J
(0) = 0, U
J
(n) = 1 and otherwise
1
J
(r) = 1(2; .12, n .12), (7)
U
J
(r) = 1(1 2; .12, n .12). (8)
The interval is formed by taking the central 1
posterior probability interval. This leaves 2 poste-
rior probability in each omitted tail. The exception
is for r = 0(n) where the lower (upper) limits are
modied to avoid the undesirable result that the
coverage probability C(r, n) 0 as r 0 or 1.
The actual endpoints of the interval need to be
numerically computed. This is very easy to do using
softwares such as Minitab, S-PLUS or Mathematica.
In Table 5 we have provided the limits for the case
of the Jeffreys prior for 7 n 30.
The endpoints of the Jeffreys prior interval are
the 2 and 12 quantiles of the Beta(r12, n
r 12) distribution. The psychological resistance
among some to using the interval is because of the
inability to compute the endpoints at ease without
software.
We provide two avenues to resolving this problem.
One is Table 5 at the end of the paper. The second
is a computable approximation to the limits of the
Jeffreys prior interval, one that is computable with
just a normal table. This approximation is obtained
after some algebra from the general approximation
to a Beta quantile given in page 945 in Abramowitz
and Stegun (1970).
The lower limit of the 100(1 )% Jeffreys prior
interval is approximately
r 12
n 1 (n r 12)(c
2
1)
, (9)
where
=

4 r qn (
2
3)(6n
2
)
4 r q

(12 r)( r q(
2
2) 1n)
6n( r q)
2
.
The upper limit may be approximated by the same
expression with replaced by in . The simple
approximation given above is remarkably accurate.
Berry (1996, page 222) suggests using a simpler nor-
mal approximation, but this will not be sufciently
accurate unless n r(1 r) is rather large.
INTERVAL ESTIMATION FOR BINOMIAL PROPORTION 109
Table 5
95% Limits of the Jeffreys prior interval
x n=7 n=8 n=9 n=10 n=11 n=12
0 0 0.292 0 0.262 0 0.238 0 0.217 0 0.200 0 0.185
1 0.016 0.501 0.014 0.454 0.012 0.414 0.011 0.381 0.010 0.353 0.009 0.328
2 0.065 0.648 0.056 0.592 0.049 0.544 0.044 0.503 0.040 0.467 0.036 0.436
3 0.139 0.766 0.119 0.705 0.104 0.652 0.093 0.606 0.084 0.565 0.076 0.529
4 0.234 0.861 0.199 0.801 0.173 0.746 0.153 0.696 0.137 0.652 0.124 0.612
5 0.254 0.827 0.224 0.776 0.200 0.730 0.180 0.688
6 0.270 0.800 0.243 0.757
x n=13 n=14 n=15 n=16 n=17 n=18
0 0 0.173 0 0.162 0 0.152 0 0.143 0 0.136 0 0.129
1 0.008 0.307 0.008 0.288 0.007 0.272 0.007 0.257 0.006 0.244 0.006 0.232
2 0.033 0.409 0.031 0.385 0.029 0.363 0.027 0.344 0.025 0.327 0.024 0.311
3 0.070 0.497 0.064 0.469 0.060 0.444 0.056 0.421 0.052 0.400 0.049 0.381
4 0.114 0.577 0.105 0.545 0.097 0.517 0.091 0.491 0.085 0.467 0.080 0.446
5 0.165 0.650 0.152 0.616 0.140 0.584 0.131 0.556 0.122 0.530 0.115 0.506
6 0.221 0.717 0.203 0.681 0.188 0.647 0.174 0.617 0.163 0.589 0.153 0.563
7 0.283 0.779 0.259 0.741 0.239 0.706 0.222 0.674 0.207 0.644 0.194 0.617
8 0.294 0.761 0.272 0.728 0.254 0.697 0.237 0.668
9 0.303 0.746 0.284 0.716
x n=19 n=20 n=21 n=22 n=23 n=24
0 0 0.122 0 0.117 0 0.112 0 0.107 0 0.102 0 0.098
1 0.006 0.221 0.005 0.211 0.005 0.202 0.005 0.193 0.005 0.186 0.004 0.179
2 0.022 0.297 0.021 0.284 0.020 0.272 0.019 0.261 0.018 0.251 0.018 0.241
3 0.047 0.364 0.044 0.349 0.042 0.334 0.040 0.321 0.038 0.309 0.036 0.297
4 0.076 0.426 0.072 0.408 0.068 0.392 0.065 0.376 0.062 0.362 0.059 0.349
5 0.108 0.484 0.102 0.464 0.097 0.446 0.092 0.429 0.088 0.413 0.084 0.398
6 0.144 0.539 0.136 0.517 0.129 0.497 0.123 0.478 0.117 0.461 0.112 0.444
7 0.182 0.591 0.172 0.568 0.163 0.546 0.155 0.526 0.148 0.507 0.141 0.489
8 0.223 0.641 0.211 0.616 0.199 0.593 0.189 0.571 0.180 0.551 0.172 0.532
9 0.266 0.688 0.251 0.662 0.237 0.638 0.225 0.615 0.214 0.594 0.204 0.574
10 0.312 0.734 0.293 0.707 0.277 0.681 0.263 0.657 0.250 0.635 0.238 0.614
11 0.319 0.723 0.302 0.698 0.287 0.675 0.273 0.653
12 0.325 0.713 0.310 0.690
x n=25 n=26 n=27 n=28 n=29 n=30
0 0 0.095 0 0.091 0 0.088 0 0.085 0 0.082 0 0.080
1 0.004 0.172 0.004 0.166 0.004 0.160 0.004 0.155 0.004 0.150 0.004 0.145
2 0.017 0.233 0.016 0.225 0.016 0.217 0.015 0.210 0.015 0.203 0.014 0.197
3 0.035 0.287 0.034 0.277 0.032 0.268 0.031 0.259 0.030 0.251 0.029 0.243
4 0.056 0.337 0.054 0.325 0.052 0.315 0.050 0.305 0.048 0.295 0.047 0.286
5 0.081 0.384 0.077 0.371 0.074 0.359 0.072 0.348 0.069 0.337 0.067 0.327
6 0.107 0.429 0.102 0.415 0.098 0.402 0.095 0.389 0.091 0.378 0.088 0.367
7 0.135 0.473 0.129 0.457 0.124 0.443 0.119 0.429 0.115 0.416 0.111 0.404
8 0.164 0.515 0.158 0.498 0.151 0.482 0.145 0.468 0.140 0.454 0.135 0.441
9 0.195 0.555 0.187 0.537 0.180 0.521 0.172 0.505 0.166 0.490 0.160 0.476
10 0.228 0.594 0.218 0.576 0.209 0.558 0.201 0.542 0.193 0.526 0.186 0.511
11 0.261 0.632 0.250 0.613 0.239 0.594 0.230 0.577 0.221 0.560 0.213 0.545
12 0.295 0.669 0.282 0.649 0.271 0.630 0.260 0.611 0.250 0.594 0.240 0.578
13 0.331 0.705 0.316 0.684 0.303 0.664 0.291 0.645 0.279 0.627 0.269 0.610
14 0.336 0.697 0.322 0.678 0.310 0.659 0.298 0.641
15 0.341 0.690 0.328 0.672
110 L. D. BROWN, T. T. CAI AND A. DASGUPTA
Fig. 6. Comparison of the average coverage probabilities. From top to bottom: the AgrestiCoull interval C1
PC
, the Wilson interval C1
W
,
the Jeffreys prior interval C1
J
and the standard interval C1
s
. The nominal condence level is 0.95.
In Figure 5 we plot the coverage probability of the
standard interval, the Wilson interval, the Agresti
Coull interval and the Jeffreys interval for n = 50
and = 0.05.
3.2 Coverage Probability
In this and the next subsections, we compare the
performance of the standard interval and the three
recommended intervals in terms of their coverage
probability and length.
Coverage of the Wilson interval uctuates accept-
ably near 1 , except for r very near 0 or 1. It
might be helpful to consult Figure 5 again. It can
be shown that, when 1 = 0.95,
lim
n
inf
1
C

n
, n

= 0.92,
lim
n
inf
5
C

n
, n

= 0.936
and
lim
n
inf
10
C

n
, n

= 0.938
for the Wilson interval. In comparison, these three
values for the standard interval are 0.860, 0.870,
and 0.905, respectively, obviously considerably
smaller.
The modication C1
JW
presented in Section
4.1.1 removes the rst few deep downward spikes
of the coverage function for C1
W
. The resulting cov-
erage function is overall somewhat conservative for
r very near 0 or 1. Both C1
W
and C1
JW
have the
same coverage functions away from 0 or 1.
The AgrestiCoull interval has good minimum
coverage probability. The coverage probability of
the interval is quite conservative for r very close
to 0 or 1. In comparison to the Wilson interval it
is more conservative, especially for small n. This
is not surprising because, as we have noted, C1
PC
always contains C1
W
as a proper subinterval.
The coverage of the Jeffreys interval is quali-
tatively similar to that of C1
W
over most of the
parameter space |0, 1|. In addition, as we will see
in Section 4.3, C1
J
has an appealing connection to
the mid-T corrected version of the ClopperPearson
exact intervals. These are very similar to C1
J
,
over most of the range, and have similar appealing
properties. C1
J
is a serious and credible candidate
for practical use. The coverage has an unfortunate
fairly deep spike near r = 0 and, symmetrically,
another near r = 1. However, the simple modica-
tion of C1
J
presented in Section 4.1.2 removes these
two deep downward spikes. The modied Jeffreys
interval C1
JJ
performs well.
Let us also evaluate the intervals in terms of their
average coverage probability, the average being over
r. Figure 6 demonstrates the striking difference in
the average coverage probability among four inter-
vals: the AgrestiCoull interval, the Wilson interval
the Jeffreys prior interval and the standard inter-
val. The standard interval performs poorly. The
interval C1
PC
is slightly conservative in terms of
average coverage probability. Both the Wilson inter-
val and the Jeffreys prior interval have excellent
performance in terms of the average coverage prob-
ability; that of the Jeffreys prior interval is, if
anything, slightly superior. The average coverage
of the Jeffreys interval is really very close to the
nominal level even for quite small n. This is quite
impressive.
Figure 7 displays the mean absolute errors,

1
0
[C(r, n) (1 )[ dr, for n = 10 to 25, and
n = 26 to 40. It is clear from the plots that among
the four intervals, C1
W
, C1
PC
and C1
J
are com-
parable, but the mean absolute errors of C1
:
are
signicantly larger.
3.3 Expected Length
Besides coverage, length is also very important
in evaluation of a condence interval. We compare
INTERVAL ESTIMATION FOR BINOMIAL PROPORTION 111
Fig. 7. The mean absolute errors of the coverage of the standard (solid), the AgrestiCoull (dashed), the Jeffreys () and the Wilson
(dotted) intervals for n = 10 to 25 and n = 26 to 40.
both the expected length and the average expected
length of the intervals. By denition,
Expected length
= T
n, r
(length(C1))
=
n

r=0
(U(r, n) 1(r, n))

n
r

r
r
(1 r)
nr
,
where U and 1 are the upper and lower lim-
its of the condence interval C1, respectively.
The average expected length is just the integral

1
0
T
n, r
(length(C1)) dr.
We plot in Figure 8 the expected lengths of the
four intervals for n = 25 and = 0.05. In this case,
C1
W
is the shortest when 0.210 r 0.790, C1
J
is
the shortest when 0.133 r 0.210 or 0.790 r
0.867, and C1
:
is the shortest when r 0.133 or r
0.867. It is no surprise that the standard interval is
the shortest when r is near the boundaries. C1
:
is
not really in contention as a credible choice for such
values of r because of its poor coverage properties
in that region. Similar qualitative phenomena hold
for other values of n.
Figure 9 shows the average expected lengths of
the four intervals for n = 10 to 25 and n = 26 to
Fig. 8. The expected lengths of the standard (solid), the Wilson (dotted), the AgrestiCoull (dashed) and the Jeffreys () intervals for
n = 25 and = 0.05.
40. Interestingly, the comparison is clear and con-
sistent as n changes. Always, the standard interval
and the Wilson interval C1
W
have almost identical
average expected length; the Jeffreys interval C1
J
is
comparable to the Wilson interval, and in fact C1
J
is slightly more parsimonious. But the difference is
not of practical relevance. However, especially when
n is small, the average expected length of C1
PC
is
noticeably larger than that of C1
J
and C1
W
. In fact,
for n till about 20, the average expected length of
C1
PC
is larger than that of C1
J
by 0.04 to 0.02, and
this difference can be of denite practical relevance.
The difference starts to wear off when n is larger
than 30 or so.
4. OTHER ALTERNATIVE INTERVALS
Several other intervals deserve consideration,
either due to their historical value or their theoret-
ical properties. In the interest of space, we had to
exercise some personal judgment in deciding which
additional intervals should be presented.
4.1 Boundary modication
The coverage probabilities of the Wilson interval
and the Jeffreys interval uctuate acceptably near
112 L. D. BROWN, T. T. CAI AND A. DASGUPTA
Fig. 9. The average expected lengths of the standard (solid), the Wilson (dotted), the AgrestiCoull (dashed) and the Jeffreys ()
intervals for n = 10 to 25 and n = 26 to 40.
1 for r not very close to 0 or 1. Simple modica-
tions can be made to remove a few deep downward
spikes of their coverage near the boundaries; see
Figure 5.
4.1.1 Modied Wilson interval. The lower bound
of the Wilson interval is formed by inverting a CLT
approximation. The coverage has downward spikes
when r is very near 0 or 1. These spikes exist for all
n and . For example, it can be shown that, when
1 = 0.95 and r = 0.1765n,
lim
n
T
r
(r C1
W
) = 0.838
and when 1 = 0.99 and r = 0.1174n,
lim
n
T
r
(r C1
W
) = 0.889. The particular
numerical values (0.1174, 0.1765) are relevant only
to the extent that divided by n, they approximate
the location of these deep downward spikes.
The spikes can be removed by using a one-sided
Poisson approximation for r close to 0 or n. Suppose
we modify the lower bound for r = 1, . . . , r

. For a
xed 1 r r

, the lower bound of C1


W
should be
Fig. 10. Coverage probability for n = 50 and r (0, 0.15). The plots are symmetric about r = 0.5 and the coverage of the modied intervals
(solid line) is the same as that of the corresponding interval without modication (dashed line) for r |0.15, 0.85|.
replaced by a lower bound of
r
n where
r
solves
c

(
0
0!
1
1!
r1
(r1)!) = 1. (10)
A symmetric prescription needs to be followed to
modify the upper bound for r very near n. The value
of r

should be small. Values which work reasonably


well for 1 = 0.95 are
r

= 2 for n - 50 and r

= 3 for 51 n 100.
Using the relationship between the Poisson and

2
distributions,
T(Y r) = T(
2
2(1r)
2)
where Y Poisson(), one can also formally
express
r
in (10) in terms of the
2
quantiles:

r
= (12)
2
2r,
, where
2
2r,
denotes the 100th
percentile of the
2
distribution with 2r degrees of
freedom. Table 4 gives the values of
r
for selected
values of r and .
For example, consider the case 1 = 0.95 and
r = 2. The lower bound of C1
W
is 0.548(n
4). The modied Wilson interval replaces this by a
lower bound of n where = (12)
2
4, 0.05
. Thus,
INTERVAL ESTIMATION FOR BINOMIAL PROPORTION 113
Fig. 11. Coverage probability of other alternative intervals for n = 50.
from a
2
table, for r = 2 the new lower bound is
0.355n.
We denote this modied Wilson interval by
C1
JW
. See Figure 10 for its coverage.
4.1.2 Modied Jeffreys interval. Evidently, C1
J
has an appealing Bayesian interpretation, and,
its coverage properties are appealing again except
for a very narrow downward coverage spike fairly
near 0 and 1 (see Figure 5). The unfortunate down-
ward spikes in the coverage function result because
U
J
(0) is too small and symmetrically 1
J
(n) is too
large. To remedy this, one may revise these two
specic limits as
U
JJ
(0) = r
/
and 1
JJ
(n) = 1 r
/
,
where r
/
satises (1 r
/
)
n
= 2 or equivalently
r
/
= 1 (2)
1n
.
We also made a slight, ad hoc alteration of 1
J
(1)
and set
1
JJ
(1) = 0 and U
JJ
(n 1) = 1.
In all other cases, 1
JJ
= 1
J
and U
JJ
= U
J
.
We denote the modied Jeffreys interval by C1
JJ
.
This modication removes the two steep down-
ward spikes and the performance of the interval is
improved. See Figure 10.
4.2 Other intervals
4.2.1 The Clopper–Pearson interval. The Clopper–Pearson interval is the inversion of the equal-tail binomial test rather than its normal approximation. Some authors refer to this as the "exact" procedure because of its derivation from the binomial distribution. If X = x is observed, then the Clopper–Pearson (1934) interval is defined by CI_CP = [L_CP(x), U_CP(x)], where L_CP(x) and U_CP(x) are, respectively, the solutions in p to the equations

P_p(X ≥ x) = α/2   and   P_p(X ≤ x) = α/2.

It is easy to show that the lower endpoint is the α/2 quantile of a Beta(x, n − x + 1) distribution, and the upper endpoint is the 1 − α/2 quantile of a Beta(x + 1, n − x) distribution. The Clopper–Pearson interval guarantees that the actual coverage probability is always equal to or above the nominal confidence level. However, for any fixed p, the actual coverage probability can be much larger than 1 − α unless n is quite large, and thus the confidence interval is rather inaccurate in this sense. See Figure 11. The Clopper–Pearson interval is wastefully conservative and is not a good choice for practical use, unless strict adherence to the prescription C(p, n) ≥ 1 − α is demanded. Even then, better exact methods are available; see, for instance, Blyth and Still (1983) and Casella (1986).
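Because the endpoints are Beta quantiles, the interval is a one-liner. A hedged sketch in Python with scipy (illustrative; not drawn from any particular package):

    from scipy.stats import beta

    def clopper_pearson(x, n, alpha=0.05):
        """Clopper-Pearson interval via Beta quantiles; exact but conservative."""
        lower = 0.0 if x == 0 else beta.ppf(alpha / 2, x, n - x + 1)
        upper = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 1, n - x)
        return lower, upper

    print(clopper_pearson(2, 50))   # coverage is guaranteed to be >= 95%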
4.2.2 The arcsine interval. Another interval is based on a widely used variance-stabilizing transformation for the binomial distribution [see, e.g., Bickel and Doksum, 1977]: T(p̂) = arcsin(p̂^{1/2}). This variance stabilization is based on the delta method and is, of course, only an asymptotic one. Anscombe (1948) showed that replacing p̂ by p̌ = (X + 3/8)/(n + 3/4) gives better variance stabilization; furthermore,

2n^{1/2}[arcsin(p̌^{1/2}) − arcsin(p^{1/2})] → N(0, 1) as n → ∞.

This leads to an approximate 100(1 − α)% confidence interval for p,

CI_Arc = [sin²(arcsin(p̌^{1/2}) − (1/2) z_{α/2} n^{−1/2}), sin²(arcsin(p̌^{1/2}) + (1/2) z_{α/2} n^{−1/2})].   (11)

See Figure 11 for the coverage probability of this interval for n = 50. This interval performs reasonably well for p not too close to 0 or 1. The coverage has steep downward spikes near the two edges; in fact it is easy to see that the coverage drops to zero when p is sufficiently close to the boundary (see Figure 11). The mean absolute error of the coverage of CI_Arc is significantly larger than those of CI_W, CI_AC and CI_J. We note that our evaluations show that the performance of the arcsine interval with the standard p̂ in place of p̌ in (11) is much worse than that of CI_Arc.
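A small Python sketch of (11) with Anscombe's recentering (illustrative; clipping the transformed endpoints to [0, π/2], so that the limits stay in [0, 1], is an assumption of the sketch rather than something stated in the text):

    import numpy as np
    from scipy.stats import norm

    def arcsine_interval(x, n, alpha=0.05):
        """Arcsine interval (11) using Anscombe's (X + 3/8)/(n + 3/4)."""
        z = norm.ppf(1 - alpha / 2)
        p_check = (x + 3 / 8) / (n + 3 / 4)
        centre = np.arcsin(np.sqrt(p_check))
        half = 0.5 * z / np.sqrt(n)
        lower = np.sin(np.clip(centre - half, 0.0, np.pi / 2)) ** 2
        upper = np.sin(np.clip(centre + half, 0.0, np.pi / 2)) ** 2
        return float(lower), float(upper)

    print(arcsine_interval(2, 50))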
4.2.3 The logit interval. The logit interval is obtained by inverting a Wald-type interval for the log odds λ = log(p/(1 − p)) (see Stone, 1995). The MLE of λ (for 0 < X < n) is

λ̂ = log(p̂/(1 − p̂)) = log(X/(n − X)),

which is the so-called empirical logit transform. The variance of λ̂, by an application of the delta theorem, can be estimated by

V̂ = n/(X(n − X)).

This leads to an approximate 100(1 − α)% confidence interval for λ,

CI(λ) = [λ_l, λ_u] = [λ̂ − z_{α/2} V̂^{1/2}, λ̂ + z_{α/2} V̂^{1/2}].   (12)

The logit interval for p is obtained by inverting the interval (12),

CI_Logit = [e^{λ_l}/(1 + e^{λ_l}), e^{λ_u}/(1 + e^{λ_u})].   (13)

The interval (13) has been suggested, for example, in Stone (1995, page 667). Figure 11 plots the coverage of the logit interval for n = 50. This interval performs quite well in terms of coverage for p away from 0 or 1. But the interval is unnecessarily long; in fact its expected length is larger than that of the Clopper–Pearson exact interval.
Remark. Anscombe (1956) suggested that λ̌ = log((X + 1/2)/(n − X + 1/2)) is a better estimate of λ; see also Cox and Snell (1989) and Santner and Duffy (1989). The variance of Anscombe's λ̌ may be estimated by

V̌ = (n + 1)(n + 2)/(n(X + 1)(n − X + 1)).

A new logit interval can be constructed using the new estimates λ̌ and V̌. Our evaluations show that the new logit interval is overall shorter than CI_Logit in (13). But the coverage of the new interval is not satisfactory.
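A short Python sketch of the logit interval (13) and of the Anscombe variant in the Remark (illustrative; the error raised at x = 0 or x = n reflects the restriction 0 < X < n noted above):

    import numpy as np
    from scipy.stats import norm

    def logit_interval(x, n, alpha=0.05, anscombe=False):
        """Logit interval (13); anscombe=True uses the Remark's estimates."""
        z = norm.ppf(1 - alpha / 2)
        if anscombe:
            lam_hat = np.log((x + 0.5) / (n - x + 0.5))
            var_hat = (n + 1) * (n + 2) / (n * (x + 1) * (n - x + 1))
        else:
            if x == 0 or x == n:
                raise ValueError("empirical logit is undefined at x = 0 or x = n")
            lam_hat = np.log(x / (n - x))
            var_hat = n / (x * (n - x))
        lam_l = lam_hat - z * np.sqrt(var_hat)
        lam_u = lam_hat + z * np.sqrt(var_hat)
        inv_logit = lambda t: 1.0 / (1.0 + np.exp(-t))
        return inv_logit(lam_l), inv_logit(lam_u)

    print(logit_interval(2, 50))
    print(logit_interval(2, 50, anscombe=True))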
4.2.4 The Bayesian HPD interval. An exact
Bayesian solution would involve using the HPD
intervals instead of our equal-tails proposal. How-
ever, HPD intervals are much harder to compute
and do not do as well in terms of coverage proba-
bility. See Figure 11 and compare to the Jeffreys
equal-tailed interval in Figure 5.
4.2.5 The likelihood ratio interval. Along with the Wald and the Rao score intervals, the likelihood ratio method is one of the most used methods for construction of confidence intervals. It is constructed by inversion of the likelihood ratio test, which accepts the null hypothesis H_0: p = p_0 if −2 log(Λ_n) is at most the 100(1 − α)th percentile of the χ² distribution with one degree of freedom, where Λ_n is the likelihood ratio

Λ_n = L(p_0)/sup_p L(p) = p_0^X (1 − p_0)^{n−X} / [(X/n)^X (1 − X/n)^{n−X}],

L being the likelihood function. See Rao (1973). Brown, Cai and DasGupta (1999) show by analytical calculations that this interval has nice properties. However, it is slightly harder to compute. For the purpose of the present article, which we view as primarily directed toward practice, we do not further analyze the likelihood ratio interval.
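One simple way to compute the likelihood ratio interval is to invert the test numerically over a fine grid of p_0 values. The sketch below is illustrative only, and the grid resolution is an arbitrary choice; it is not the algorithm used for the paper's figures.

    import numpy as np
    from scipy.stats import chi2

    def lr_interval(x, n, alpha=0.05, grid=200001):
        """Accept H0: p = p0 when -2 log(Lambda_n) <= upper-alpha chi-square(1)
        point; the accepted p0 form the confidence interval."""
        crit = chi2.ppf(1 - alpha, 1)
        p0 = np.linspace(1e-9, 1 - 1e-9, grid)
        loglik = x * np.log(p0) + (n - x) * np.log(1 - p0)
        sup = loglik.max()                       # attained near the MLE x/n
        accepted = p0[2 * (sup - loglik) <= crit]
        return accepted[0], accepted[-1]

    print(lr_interval(2, 50))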
4.3 Connections between Jeffreys Intervals
and Mid-P Intervals
The equal-tailed Jeffreys prior interval has some interesting connections to the Clopper–Pearson interval. As we mentioned earlier, the Clopper–Pearson interval CI_CP can be written as

CI_CP = [B(α/2; x, n − x + 1), B(1 − α/2; x + 1, n − x)].

It therefore follows immediately that CI_J is always contained in CI_CP. Thus CI_J corrects the conservativeness of CI_CP.

It turns out that the Jeffreys prior interval, although Bayesianly constructed, has a clear and convincing frequentist motivation. It is thus no surprise that it does well from a frequentist perspective. As we now explain, the Jeffreys prior interval CI_J can be regarded as a continuity-corrected version of the Clopper–Pearson interval CI_CP.

The interval CI_CP inverts the inequality P_p(X ≤ x) ≤ α/2 to obtain the upper limit and similarly for the lower limit. Thus, for fixed x, the upper limit of the interval for p, U_CP(x), satisfies

P_{U_CP(x)}(X ≤ x) ≤ α/2,   (14)

and symmetrically for the lower limit.

This interval is very conservative; undesirably so for most practical purposes. A familiar proposal to eliminate this over-conservativeness is to instead invert

P_p(X ≤ L(p) − 1) + (1/2) P_p(X = L(p)) = α/2.   (15)

This amounts to solving

(1/2){P_{U_mid-P(x)}(X ≤ x − 1) + P_{U_mid-P(x)}(X ≤ x)} = α/2,   (16)

which is the same as

U_mid-P(x) = (1/2) B(1 − α/2; x, n − x + 1) + (1/2) B(1 − α/2; x + 1, n − x),   (17)

and symmetrically for the lower endpoint. These are the mid-P Clopper–Pearson intervals. They are known to have good coverage and length performance. U_mid-P given in (17) is a weighted average of two incomplete Beta functions. The incomplete Beta function of interest, B(1 − α/2; x, n − x + 1), is continuous and monotone in x if we formally treat x as a continuous argument. Hence the average of the two functions defining U_mid-P is approximately the same as the value at the halfway point, x + 1/2. Thus

U_mid-P(x) ≈ B(1 − α/2; x + 1/2, n − x + 1/2) = U_J(x),

exactly the upper limit for the equal-tailed Jeffreys interval. Similarly, the corresponding approximate lower endpoint is the Jeffreys lower limit.

Another frequentist way to interpret the Jeffreys prior interval is to say that U_J(x) is the upper limit for the Clopper–Pearson rule with x + 1/2 successes and L_J(x) is the lower limit for the Clopper–Pearson rule with x − 1/2 successes. Strawderman and Wells (1998) contains a valuable discussion of mid-P intervals and suggests some variations based on asymptotic expansions.
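The approximation at the heart of this argument is easy to check numerically. The following illustrative Python/scipy sketch compares the mid-P upper limit (17) with the Jeffreys upper limit for a few values of x:

    from scipy.stats import beta

    def midp_upper(x, n, alpha=0.05):
        """Mid-P Clopper-Pearson upper limit (17), for 1 <= x <= n - 1."""
        return 0.5 * beta.ppf(1 - alpha / 2, x, n - x + 1) + \
               0.5 * beta.ppf(1 - alpha / 2, x + 1, n - x)

    def jeffreys_upper(x, n, alpha=0.05):
        return beta.ppf(1 - alpha / 2, x + 0.5, n - x + 0.5)

    for x in (1, 5, 10):
        print(x, round(midp_upper(x, 50), 4), round(jeffreys_upper(x, 50), 4))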
5. CONCLUDING REMARKS
Interval estimation of a binomial proportion is a very basic problem in practical statistics. The standard Wald interval is in nearly universal use. We first show that the performance of this standard interval is persistently chaotic and unacceptably poor. Indeed its coverage properties defy conventional wisdom. The performance is so erratic and the qualifications given in the influential texts are so defective that the standard interval should not be used. We provide a fairly comprehensive evaluation of many natural alternative intervals. Based on this analysis, we recommend the Wilson or the equal-tailed Jeffreys prior interval for small n (n ≤ 40). These two intervals are comparable in both absolute error and length for n ≤ 40, and we believe that either could be used, depending on taste.

For larger n, the Wilson, the Jeffreys and the Agresti–Coull intervals are all comparable, and the Agresti–Coull interval is the simplest to present. It is generally true in statistical practice that only those methods that are easy to describe, remember and compute are widely used. Keeping this in mind, we recommend the Agresti–Coull interval for practical use when n ≥ 40. Even for small sample sizes, the easy-to-present Agresti–Coull interval is much preferable to the standard one.

We would be satisfied if this article contributes to a greater appreciation of the severe flaws of the popular standard interval and an agreement that it deserves not to be used at all. We also hope that the recommendations for alternative intervals will provide some closure as to what may be used in preference to the standard method.

Finally, we note that the specific choices of the values of n, p and α in the examples and figures are artifacts. The theoretical results in Brown, Cai and DasGupta (1999) show that qualitatively similar phenomena as regarding coverage and length hold for general n and p and common values of the coverage. (Those results there are asymptotic as n → ∞, but they are also sufficiently accurate for realistically moderate n.)
APPENDIX
Table A.1
95% limits (lower, upper) of the modified Jeffreys prior interval
x n=7 n=8 n=9 n=10 n=11 n=12
0 0 0.410 0 0.369 0 0.336 0 0.308 0 0.285 0 0.265
1 0 0.501 0 0.454 0 0.414 0 0.381 0 0.353 0 0.328
2 0.065 0.648 0.056 0.592 0.049 0.544 0.044 0.503 0.040 0.467 0.036 0.436
3 0.139 0.766 0.119 0.705 0.104 0.652 0.093 0.606 0.084 0.565 0.076 0.529
4 0.234 0.861 0.199 0.801 0.173 0.746 0.153 0.696 0.137 0.652 0.124 0.612
5 0.254 0.827 0.224 0.776 0.200 0.730 0.180 0.688
6 0.270 0.800 0.243 0.757
x n=13 n=14 n=15 n=16 n=17 n=18
0 0 0.247 0 0.232 0 0.218 0 0.206 0 0.195 0 0.185
1 0 0.307 0 0.288 0 0.272 0 0.257 0 0.244 0 0.232
2 0.033 0.409 0.031 0.385 0.029 0.363 0.027 0.344 0.025 0.327 0.024 0.311
3 0.070 0.497 0.064 0.469 0.060 0.444 0.056 0.421 0.052 0.400 0.049 0.381
4 0.114 0.577 0.105 0.545 0.097 0.517 0.091 0.491 0.085 0.467 0.080 0.446
5 0.165 0.650 0.152 0.616 0.140 0.584 0.131 0.556 0.122 0.530 0.115 0.506
6 0.221 0.717 0.203 0.681 0.188 0.647 0.174 0.617 0.163 0.589 0.153 0.563
7 0.283 0.779 0.259 0.741 0.239 0.706 0.222 0.674 0.207 0.644 0.194 0.617
8 0.294 0.761 0.272 0.728 0.254 0.697 0.237 0.668
9 0.303 0.746 0.284 0.716
x n=19 n=20 n=21 n=22 n=23 n=24
0 0 0.176 0 0.168 0 0.161 0 0.154 0 0.148 0 0.142
1 0 0.221 0 0.211 0 0.202 0 0.193 0 0.186 0 0.179
2 0.022 0.297 0.021 0.284 0.020 0.272 0.019 0.261 0.018 0.251 0.018 0.241
3 0.047 0.364 0.044 0.349 0.042 0.334 0.040 0.321 0.038 0.309 0.036 0.297
4 0.076 0.426 0.072 0.408 0.068 0.392 0.065 0.376 0.062 0.362 0.059 0.349
5 0.108 0.484 0.102 0.464 0.097 0.446 0.092 0.429 0.088 0.413 0.084 0.398
6 0.144 0.539 0.136 0.517 0.129 0.497 0.123 0.478 0.117 0.461 0.112 0.444
7 0.182 0.591 0.172 0.568 0.163 0.546 0.155 0.526 0.148 0.507 0.141 0.489
8 0.223 0.641 0.211 0.616 0.199 0.593 0.189 0.571 0.180 0.551 0.172 0.532
9 0.266 0.688 0.251 0.662 0.237 0.638 0.225 0.615 0.214 0.594 0.204 0.574
10 0.312 0.734 0.293 0.707 0.277 0.681 0.263 0.657 0.250 0.635 0.238 0.614
11 0.319 0.723 0.302 0.698 0.287 0.675 0.273 0.653
12 0.325 0.713 0.310 0.690
x n=25 n=26 n=27 n=28 n=29 n=30
0 0 0.137 0 0.132 0 0.128 0 0.123 0 0.119 0 0.116
1 0 0.172 0 0.166 0 0.160 0 0.155 0 0.150 0 0.145
2 0.017 0.233 0.016 0.225 0.016 0.217 0.015 0.210 0.015 0.203 0.014 0.197
3 0.035 0.287 0.034 0.277 0.032 0.268 0.031 0.259 0.030 0.251 0.029 0.243
4 0.056 0.337 0.054 0.325 0.052 0.315 0.050 0.305 0.048 0.295 0.047 0.286
5 0.081 0.384 0.077 0.371 0.074 0.359 0.072 0.348 0.069 0.337 0.067 0.327
6 0.107 0.429 0.102 0.415 0.098 0.402 0.095 0.389 0.091 0.378 0.088 0.367
7 0.135 0.473 0.129 0.457 0.124 0.443 0.119 0.429 0.115 0.416 0.111 0.404
8 0.164 0.515 0.158 0.498 0.151 0.482 0.145 0.468 0.140 0.454 0.135 0.441
9 0.195 0.555 0.187 0.537 0.180 0.521 0.172 0.505 0.166 0.490 0.160 0.476
10 0.228 0.594 0.218 0.576 0.209 0.558 0.201 0.542 0.193 0.526 0.186 0.511
11 0.261 0.632 0.250 0.613 0.239 0.594 0.230 0.577 0.221 0.560 0.213 0.545
12 0.295 0.669 0.282 0.649 0.271 0.630 0.260 0.611 0.250 0.594 0.240 0.578
13 0.331 0.705 0.316 0.684 0.303 0.664 0.291 0.645 0.279 0.627 0.269 0.610
14 0.336 0.697 0.322 0.678 0.310 0.659 0.298 0.641
15 0.341 0.690 0.328 0.672
ACKNOWLEDGMENTS
We thank Xuefeng Li for performing some helpful
computations and Jim Berger, David Moore, Steve
Samuels, Bill Studden and Ron Thisted for use-
ful conversations. We also thank the Editors and
two anonymous referees for their thorough and con-
structive comments. Supported by grants from the
National Science Foundation and the National Secu-
rity Agency.
REFERENCES
Abramowitz, M. and Stegun, I. A. (1970). Handbook of Mathe-
matical Functions. Dover, New York.
Agresti, A. and Coull, B. A. (1998). Approximate is better than
exact for interval estimation of binomial proportions. Amer.
Statist. 52 119126.
Anscombe, F. J. (1948). The transformation of Poisson, binomial
and negative binomial data. Biometrika 35 246254.
Anscombe, F. J. (1956). On estimating binomial response rela-
tions. Biometrika 43 461464.
Berger, J. O. (1985). Statistical Decision Theory and Bayesian
Analysis, 2nd ed. Springer, New York.
Berry, D. A. (1996). Statistics: A Bayesian Perspective.
Wadsworth, Belmont, CA.
Bickel, P. and Doksum, K. (1977). Mathematical Statistics.
Prentice-Hall, Englewood Cliffs, NJ.
Blyth, C. R. and Still, H. A. (1983). Binomial condence inter-
vals. J. Amer. Statist. Assoc. 78 108116.
Brown, L. D., Cai, T. and DasGupta, A. (1999). Condence inter-
vals for a binomial proportion and asymptotic expansions.
Ann. Statist to appear.
Brown, L. D., Cai, T. and DasGupta, A. (2000). Interval estima-
tion in discrete exponential family. Technical report, Dept.
Statistics. Univ. Pennsylvania.
Casella, G. (1986). Rening binomial condence intervals
Canad. J. Statist. 14 113129.
Casella, G. and Berger, R. L. (1990). Statistical Inference.
Wadsworth & Brooks/Cole, Belmont, CA.
Clopper, C. J. and Pearson, E. S. (1934). The use of condence
or ducial limits illustrated in the case of the binomial.
Biometrika 26 404413.
Cox, D. R. and Snell, E. J. (1989). Analysis of Binary Data, 2nd
ed. Chapman and Hall, London.
Cressie, N. (1980). A nely tuned continuity correction. Ann.
Inst. Statist. Math. 30 435442.
Ghosh, B. K. (1979). A comparison of some approximate con-
dence intervals for the binomial parameter J. Amer. Statist.
Assoc. 74 894900.
Hall, P. (1982). Improving the normal approximation when
constructing one-sided condence intervals for binomial or
Poisson parameters. Biometrika 69 647652.
Lehmann, E. L. (1999). Elements of Large-Sample Theory.
Springer, New York.
Newcombe, R. G. (1998). Two-sided condence intervals for the
single proportion; comparison of several methods. Statistics
in Medicine 17 857872.
Rao, C. R. (1973). Linear Statistical Inference and Its Applica-
tions. Wiley, New York.
Samuels, M. L. and Witmer, J. W. (1999). Statistics for
the Life Sciences, 2nd ed. Prentice Hall, Englewood
Cliffs, NJ.
Santner, T. J. (1998). A note on teaching binomial condence
intervals. Teaching Statistics 20 2023.
Santner, T. J. and Duffy, D. E. (1989). The Statistical Analysis
of Discrete Data. Springer, Berlin.
Stone, C. J. (1995). A Course in Probability and Statistics.
Duxbury, Belmont, CA.
Strawderman, R. L. and Wells, M. T. (1998). Approximately
exact inference for the common odds ratio in several 2 2
tables (with discussion). J. Amer. Statist. Assoc. 93 1294
1320.
Tamhane, A. C. and Dunlop, D. D. (2000). Statistics and Data
Analysis from Elementary to Intermediate. Prentice Hall,
Englewood Cliffs, NJ.
Vollset, S. E. (1993). Condence intervals for a binomial pro-
portion. Statistics in Medicine 12 809824.
Wasserman, L. (1991). An inferential interpretation of default
priors. Technical report, Carnegie-Mellon Univ.
Wilson, E. B. (1927). Probable inference, the law of succes-
sion, and statistical inference. J. Amer. Statist. Assoc. 22
209212.
Comment
Alan Agresti and Brent A. Coull
Alan Agresti is Distinguished Professor, Department of Statistics, University of Florida, Gainesville, Florida 32611-8545 (e-mail: aa@stat.ufl.edu). Brent A. Coull is Assistant Professor, Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115 (e-mail: bcoull@hsph.harvard.edu).

In this very interesting article, Professors Brown, Cai and DasGupta (BCD) have shown that discreteness can cause havoc for much larger sample sizes than one would expect. The popular (Wald) confidence interval for a binomial parameter p has been known for some time to behave poorly, but readers will surely be surprised that this can happen for such large n values.
Interval estimation of a binomial parameter is deceptively simple, as there are not even any nuisance parameters. The gold standard would seem to be a method such as the Clopper–Pearson, based on inverting an exact test using the binomial distribution rather than an approximate test using the normal. Because of discreteness, however, this method is too conservative. A more practical, nearly gold standard for this and other discrete problems seems to be based on inverting a two-sided test using the exact distribution but with the mid-P value. Similarly, with large-sample methods it is better not to use a continuity correction, as otherwise it approximates exact inference based on an ordinary P-value, resulting in conservative behavior. Interestingly, BCD note that the Jeffreys interval (CI_J) approximates the mid-P value correction of the Clopper–Pearson interval. See Gart (1966) for related remarks about the use of 1/2 additions to numbers of successes and failures before using frequentist methods.

Fig. 1. A comparison of mean expected lengths for the nominal 95% Jeffreys (J), Wilson (W), Modified Jeffreys (M-J), Modified Wilson (M-W), and Agresti–Coull (AC) intervals for n = 5, 6, 7, 8, 9.
1. METHODS FOR ELEMENTARY STATISTICS COURSES

It's unfortunate that the Wald interval for p is so seriously deficient, because in addition to being the simplest interval it is the obvious one to teach in elementary statistics courses. By contrast, the Wilson interval (CI_W) performs surprisingly well even for small n. Since it is too complex for many such courses, however, our motivation for the Agresti–Coull interval (CI_AC) was to provide a simple approximation for CI_W. Formula (4) in BCD shows that the midpoint p̃ for CI_W is a weighted average of p̂ and 1/2 that equals the sample proportion after adding z²_{α/2} pseudo observations, half of each type; the square of the coefficient of z_{α/2} is the same weighted average of the variance of a sample proportion when p = p̂ and when p = 1/2, using ñ = n + z²_{α/2} in place of n. The CI_AC uses the CI_W midpoint, but its squared coefficient of z_{α/2} is the variance p̃q̃/ñ at the weighted average p̃ rather than the weighted average of the variances. The resulting interval p̃ ± z_{α/2}(p̃q̃/ñ)^{1/2} is wider than CI_W (by Jensen's inequality), in particular being conservative for p near 0 and 1 where CI_W can suffer poor coverage probabilities.
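As an illustration, the interval can be computed as follows in Python with scipy (a sketch only; the truncation of the limits to [0, 1] and the function names are conveniences of the sketch, not part of the definition above). The "add two successes and two failures" version mentioned later in this comment is included as a second function.

    from scipy.stats import norm

    def agresti_coull(x, n, alpha=0.05):
        """p-tilde +/- z * sqrt(p-tilde * q-tilde / n-tilde), truncated to [0, 1]."""
        z = norm.ppf(1 - alpha / 2)
        n_tilde = n + z**2
        p_tilde = (x + z**2 / 2) / n_tilde
        half = z * (p_tilde * (1 - p_tilde) / n_tilde) ** 0.5
        return max(0.0, p_tilde - half), min(1.0, p_tilde + half)

    def add_two_successes_two_failures(x, n):
        """95% version with 4 pseudo observations, since z_{0.025}^2 is about 4."""
        p = (x + 2) / (n + 4)
        half = 1.96 * (p * (1 - p) / (n + 4)) ** 0.5
        return max(0.0, p - half), min(1.0, p + half)

    print(agresti_coull(2, 50))
    print(add_two_successes_two_failures(2, 50))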
Regarding textbook qualifications on sample size for using the Wald interval, skewness considerations and the Edgeworth expansion suggest that guidelines for n should depend on p through (1 − 2p)²/[p(1 − p)]. See, for instance, Boos and Hughes-Oliver (2000). But this does not account for the effects of discreteness, and as BCD point out, guidelines in terms of p are not verifiable. For elementary course teaching there is no obvious alternative (such as t methods) for smaller n, so we think it is sensible to teach a single method that behaves reasonably well for all n, as do the Wilson, Jeffreys and Agresti–Coull intervals.
2. IMPROVED PERFORMANCE WITH BOUNDARY MODIFICATIONS

BCD showed that one can improve the behavior of the Wilson and Jeffreys intervals for p near 0 and 1 by modifying the endpoints for CI_W when x = 1, 2, n − 2, n − 1 (and x = 3 and n − 3 for n > 50) and for CI_J when x = 0, 1, n − 1, n. Once one permits the modification of methods near the sample space boundary, other methods may perform decently besides the three recommended in this article.

For instance, Newcombe (1998) showed that when 0 < x < n the Wilson interval CI_W and the Wald logit interval have the same midpoint on the logit scale. In fact, Newcombe has shown (personal communication, 1999) that the logit interval necessarily contains CI_W. The logit interval is the uninformative one [0, 1] when x = 0 or x = n, but substituting the Clopper–Pearson limits in those cases yields coverage probability functions that resemble those for CI_W and CI_AC, although considerably more conservative for small n. Rubin and Schenker (1987) recommended the logit interval after 1/2 additions to numbers of successes and failures, motivating it as a normal approximation to the posterior distribution of the logit parameter after using the Jeffreys prior. However, this modification has coverage probabilities that are unacceptably small for p near 0 and 1 (see Vollset, 1993). Presumably some other boundary modification will result in a happy medium. In a letter to the editor about Agresti and Coull (1998), Rindskopf (2000) argued in favor of the logit interval partly because of its connection with logit modeling. We have not used this method for teaching in elementary courses, since logit intervals do not extend to intervals for the difference of proportions and (like CI_W and CI_J) they are rather complex for that level.

Fig. 2. A comparison of expected lengths for the nominal 95% Jeffreys (J), Wilson (W), Modified Jeffreys (M-J), Modified Wilson (M-W), and Agresti–Coull (AC) intervals for n = 5.
For practical use and for teaching in more advanced courses, some statisticians may prefer the likelihood ratio interval, since conceptually it is simple and the method also applies in a general model-building framework. An advantage compared to the Wald approach is its invariance to the choice of scale, resulting, for instance, both from the original scale and the logit. BCD do not say much about this interval, since it is harder to compute. However, it is easy to obtain with standard statistical software (e.g., in SAS, using the LRCI option in PROC GENMOD for a model containing only an intercept term and assuming a binomial response with logit or identity link function). Graphs in Vollset (1993) suggest that the boundary-modified likelihood ratio interval also behaves reasonably well, although conservative for p near 0 and 1.
For elementary course teaching, a disadvantage of all such intervals using boundary modifications is that making exceptions from a general, simple recipe distracts students from the simple concept of taking the estimate plus and minus a normal score multiple of a standard error. (Of course, this concept is not sufficient for serious statistical work, but some oversimplification and compromise is necessary at that level.) Even with CI_AC, instructors may find it preferable to give a recipe with the same number of added pseudo observations for all α, instead of z²_{α/2}. Reasonably good performance seems to result, especially for small α, from the value 4 ≈ z²_{0.025} used in the 95% CI_AC interval (i.e., the "add two successes and two failures" interval). Agresti and Caffo (2000) discussed this and showed that adding four pseudo observations also dramatically improves the Wald two-sample interval for comparing proportions, although again at the cost of rather severe conservativeness when both parameters are near 0 or near 1.
3. ALTERNATIVE WIDTH COMPARISON
In comparing the expected lengths of the three recommended intervals, BCD note that the comparison is clear and consistent as n changes, with the average expected length being noticeably larger for CI_AC than CI_J and CI_W. Thus, in their concluding remarks, they recommend CI_J and CI_W for small n. However, since BCD recommend modifying CI_J and CI_W to eliminate severe downward spikes of coverage probabilities, we believe that a more fair comparison of expected lengths uses the modified versions CI_M-J and CI_M-W. We checked this but must admit that figures analogous to the BCD Figures 8 and 9 show that CI_M-J and CI_M-W maintain their expected length advantage over CI_AC, although it is reduced somewhat.

However, when n decreases below 10, the results change, with CI_M-J having greater expected width than CI_AC and CI_M-W. Our Figure 1 extends the BCD Figure 9 to values of n < 10, showing how the comparison differs between the ordinary intervals and the modified ones. Our Figure 2 has the format of the BCD Figure 8, but for n = 5 instead of 25. Admittedly, n = 5 is a rather extreme case, one for which the Jeffreys interval is modified unless x = 2 or 3 and the Wilson interval is modified unless x = 0 or 5, and for it CI_AC has coverage probabilities that can dip below 0.90. Thus, overall, the BCD recommendations about choice of method seem reasonable to us. Our own preference is to use the Wilson interval for statistical practice and CI_AC for teaching in elementary statistics courses.
4. EXTENSIONS
Other than near-boundary modifications, another type of fine-tuning that may help is to invert a test permitting unequal tail probabilities. This occurs naturally in exact inference that inverts a single two-tailed test, which can perform better than inverting two separate one-tailed tests (e.g., Sterne, 1954; Blyth and Still, 1983).

Finally, we are curious about the implications of the BCD results in a more general setting. How much does their message about the effects of discreteness and basing interval estimation on the Jeffreys prior or the score test rather than the Wald test extend to parameters in other discrete distributions and to two-sample comparisons? We have seen that interval estimation of the Poisson parameter benefits from inverting the score test rather than the Wald test on the count scale (Agresti and Coull, 1998).

One would not think there could be anything new to say about the Wald confidence interval for a proportion, an inferential method that must be one of the most frequently used since Laplace (1812, page 283). Likewise, the confidence interval for a proportion based on the Jeffreys prior has received attention in various forms for some time. For instance, R. A. Fisher (1956, pages 63–70) showed the similarity of a Bayesian analysis with Jeffreys prior to his fiducial approach, in a discussion that was generally critical of the confidence interval method but grudgingly admitted of limits obtained by a test inversion such as the Clopper–Pearson method that, "though they fall short in logical content of the limits found by the fiducial argument, and with which they have often been confused, they do fulfil some of the desiderata of statistical inferences." Congratulations to the authors for brilliantly casting new light on the performance of these old and established methods.
Comment
George Casella
1. INTRODUCTION
George Casella is Arun Varma Commemorative Term Professor and Chair, Department of Statistics, University of Florida, Gainesville, Florida 32611-8545 (e-mail: casella@stat.ufl.edu).

Professors Brown, Cai and DasGupta (BCD) are to be congratulated for their clear and imaginative look at a seemingly timeless problem. The chaotic behavior of coverage probabilities of discrete confidence sets has always been an annoyance, resulting in intervals whose coverage probability can be vastly different from their nominal confidence level. What we now see is that for the Wald interval, an approximate interval, the chaotic behavior is relentless, as this interval will not maintain 1 − α coverage for any value of n. Although fixes relying on ad hoc rules abound, they do not solve this fundamental defect of the Wald interval and, surprisingly, the usual safety net of asymptotics is also shown not to exist. So, as the song goes, "Bye-bye, so long, farewell" to the Wald interval.

Now that the Wald interval is out, what is in? There are probably two answers here, depending on whether one is in the classroom or the consulting room.
Fig. 1. Coverage probabilities of the Blyth–Still interval (upper) and Agresti–Coull interval (lower) for n = 100 and 1 − α = 0.95.
2. WHEN YOU SAY 95%...

In the classroom it is (still) valuable to have a formula for a confidence interval, and I typically present the Wilson/score interval, starting from the test statistic formulation. Although this doesn't have the pleasing "p̂ ± something" form, most students can understand the logic of test inversion. Moreover, the fact that the interval does not have a symmetric form is a valuable lesson in itself; the statistical world is not always symmetric.

However, one thing still bothers me about this interval. It is clearly not a 1 − α interval; that is, it does not maintain its nominal coverage probability. This is a defect, and one that should not be compromised. I am uncomfortable in presenting a confidence interval that does not maintain its stated confidence; when you say 95% you should mean 95%!
But the fix here is rather simple: apply the continuity correction to the score interval (a technique that seems to be out of favor for reasons I do not understand). The continuity correction is easy to justify in the classroom using pictures of the normal density overlaid on the binomial mass function, and the resulting interval will now maintain its nominal level. (This last statement is not based on analytic proof, but on numerical studies.) Anyone reading Blyth (1986) cannot help being convinced that this is an excellent approximation, coming at only a slightly increased effort.
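One way to obtain the continuity-corrected score interval is to invert the corrected test numerically: accept p_0 whenever |p̂ − p_0| − 1/(2n) ≤ z_{α/2}√(p_0(1 − p_0)/n). The sketch below is illustrative only (it is not Blyth's closed-form expression, and the grid resolution is an arbitrary choice of the sketch).

    import numpy as np
    from scipy.stats import norm

    def cc_score_interval(x, n, alpha=0.05, grid=200001):
        """Continuity-corrected score interval by direct numerical inversion."""
        z = norm.ppf(1 - alpha / 2)
        phat = x / n
        p0 = np.linspace(0.0, 1.0, grid)
        lhs = np.abs(phat - p0) - 1.0 / (2 * n)
        rhs = z * np.sqrt(p0 * (1 - p0) / n)
        accepted = p0[lhs <= rhs]
        return accepted[0], accepted[-1]

    print(cc_score_interval(2, 50))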
One other point that Blyth makes, which BCD do not mention, is that it is easy to get exact confidence limits at the endpoints. That is, for x = 0 the lower bound is 0, and for x = 1 the lower bound is 1 − (1 − α)^{1/n} [the solution to P_p(X = 0) = 1 − α].
3. USE YOUR TOOLS
The essential message that I take away from the work of BCD is that an approximate/formula-based approach to constructing a binomial confidence interval is bound to have essential flaws. However, this is a situation where brute force computing will do the trick. The construction of a 1 − α binomial confidence interval is a discrete optimization problem that is easily programmed. So why not use the tools that we have available? If the problem will yield to brute force computation, then we should use that solution.

Blyth and Still (1983) showed how to compute exact intervals through numerical inversion of tests, and Casella (1986) showed how to compute exact intervals by refining conservative intervals. So for any value of n and α, we can compute an exact, shortest 1 − α confidence interval that will not display any of the pathological behavior illustrated by BCD. As an example, Figure 1 shows the Agresti–Coull interval along with the Blyth–Still interval for n = 100 and 1 − α = 0.95. While the Agresti–Coull interval fails to maintain 0.95 coverage in the middle p region, the Blyth–Still interval always maintains 0.95 coverage. What is more surprising, however, is that the Blyth–Still interval displays much less variation in its coverage probability, especially near the endpoints. Thus, the simplistic numerical algorithm produces an excellent interval, one that both maintains its guaranteed coverage and reduces oscillation in the coverage probabilities.
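The "brute force" philosophy is easy to act on: the exact coverage probability of any interval system is a finite sum over the binomial probability mass function. An illustrative Python/scipy sketch (the Wald interval used in the example is only a placeholder, and the function names are conveniences of the sketch):

    from scipy.stats import binom, norm

    def exact_coverage(interval_fn, n, p, alpha=0.05):
        """Exact coverage probability of an interval system at a given p."""
        cover = 0.0
        for x in range(n + 1):
            lo, hi = interval_fn(x, n, alpha)
            if lo <= p <= hi:
                cover += binom.pmf(x, n, p)
        return cover

    def wald(x, n, alpha=0.05):
        z = norm.ppf(1 - alpha / 2)
        phat = x / n
        half = z * (phat * (1 - phat) / n) ** 0.5
        return phat - half, phat + half

    print(exact_coverage(wald, 100, 0.3))   # compare with the nominal 0.95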
ACKNOWLEDGMENT
Supported by NSF Grant DMS-99-71586.
Comment
Chris Corcoran and Cyrus Mehta
Chris Corcoran is Assistant Professor, Department of Mathematics and Statistics, Utah State University, 3900 Old Main Hill, Logan, Utah 84322-3900 (e-mail: corcoran@math.usu.edu). Cyrus Mehta is Professor, Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115 and is with Cytel Software Corporation, 675 Massachusetts Avenue, Cambridge, Massachusetts 02139.

We thank the authors for a very accessible and thorough discussion of this practical problem. With the availability of modern computational tools, we have an unprecedented opportunity to carefully evaluate standard statistical procedures in this manner. The results of such work are invaluable to teachers and practitioners of statistics everywhere. We particularly appreciate the attention paid by the authors to the generally oversimplified and inadequate recommendations made by statistical texts regarding when to use normal approximations in analyzing binary data. As their work has plainly shown, even in the simple case of a single binomial proportion, the discreteness of the data makes the use of some asymptotic procedures tenuous, even when the underlying probability lies away from the boundary or when the sample size is relatively large.
The authors have evaluated various confidence intervals with respect to their coverage properties and average lengths. Implicit in their evaluation is the premise that overcoverage is just as bad as undercoverage. We disagree with the authors on this fundamental issue. If, because of the discreteness of the test statistic, the desired confidence level cannot be attained, one would ordinarily prefer overcoverage to undercoverage. Wouldn't you prefer to hire a fortune teller whose track record exceeds expectations to one whose track record is unable to live up to its claim of accuracy? With the exception of the Clopper–Pearson interval, none of the intervals discussed by the authors lives up to its claim of 95% accuracy throughout the range of p. Yet the authors dismiss this interval on the grounds that it is "wastefully conservative." Perhaps so, but they do not address the issue of how the wastefulness is manifested.

What penalty do we incur for furnishing confidence intervals that are more truthful than was required of them? Presumably we pay for the conservatism by an increase in the length of the confidence interval. We thought it would be a useful exercise to actually investigate the magnitude of this penalty for two confidence interval procedures that are guaranteed to provide the desired coverage but are not as conservative as Clopper–Pearson. Figure 1 displays the true coverage probabilities for the nominal 95% Blyth–Still–Casella (see Blyth and Still, 1983; Casella, 1984) confidence interval (BSC interval) and the 95% confidence interval obtained by inverting the exact likelihood ratio test (LR interval; the inversion follows that shown by Aitkin, Anderson, Francis and Hinde, 1989, pages 112–118).

Fig. 1. Actual coverage probabilities for BSC and LR intervals as a function of p (n = 50). Compare to authors' Figures 5, 10 and 11.

There is no value of p for which the coverage of the BSC and LR intervals falls below 95%. Their coverage probabilities are, however, much closer to 95% than would be obtained by the Clopper–Pearson procedure, as is evident from the authors' Figure 11. Thus one could say that these two intervals are uniformly better than the Clopper–Pearson interval.

We next investigate the penalty to be paid for the guaranteed coverage in terms of increased length of the BSC and LR intervals relative to the Wilson, Agresti–Coull, or Jeffreys intervals recommended by the authors. This is shown by Figure 2.
In fact the BSC and LR intervals are actually shorter than Agresti–Coull for p < 0.2 or p > 0.8, and shorter than the Wilson interval for p < 0.1 and p > 0.9. The only interval that is uniformly shorter than BSC and LR is the Jeffreys interval. Most of the time the difference in lengths is negligible, and in the worst case (at p = 0.5) the Jeffreys interval is only shorter by 0.025 units. Of the three asymptotic methods recommended by the authors, the Jeffreys interval yields the lowest average probability of coverage, with significantly greater potential relative undercoverage in the (0.05, 0.20) and (0.80, 0.95) regions of the parameter space. Considering this, one must question the rationale for preferring Jeffreys to either BSC or LR.

The authors argue for simplicity and ease of computation. This argument is valid for the teaching of statistics, where the instructor must balance simplicity with accuracy. As the authors point out, it is customary to teach the standard interval in introductory courses because the formula is straightforward and the central limit theorem provides a good heuristic for motivating the normal approximation. However, the evidence shows that the standard method is woefully inadequate. Teaching statistical novices about a Clopper–Pearson type interval is conceptually difficult, particularly because exact intervals are impossible to compute by hand. As the Agresti–Coull interval preserves the confidence level most successfully among the three recommended alternative intervals, we believe that this feature, when coupled with its straightforward computation (particularly when α = 0.05), makes this approach ideal for the classroom.
Simplicity and ease of computation have no role to play in statistical practice. With the advent of powerful microcomputers, researchers no longer resort to hand calculations when analyzing data. While the need for simplicity applies to the classroom, in applications we primarily desire reliable, accurate solutions, as there is no significant difference in the computational overhead required by the authors' recommended intervals when compared to the BSC and LR methods. From this perspective, the BSC and LR intervals have a substantial advantage relative to the various asymptotic intervals presented by the authors. They guarantee coverage at a relatively low cost in increased length. In fact, the BSC interval is already implemented in StatXact (1998) and is therefore readily accessible to practitioners.
Fig. 2. Expected lengths of BSC and LR intervals as a function of p compared, respectively, to Wilson, Agresti–Coull and Jeffreys intervals (n = 25). Compare to authors' Figure 8.
Comment
Malay Ghosh
Malay Ghosh is Distinguished Professor, Department of Statistics, University of Florida, Gainesville, Florida 32611-8545 (e-mail: ghoshm@stat.ufl.edu).

This is indeed a very valuable article which brings out very clearly some of the inherent difficulties associated with confidence intervals for parameters of interest in discrete distributions. Professors Brown, Cai and Dasgupta (henceforth BCD) are to be complimented for their comprehensive and thought-provoking discussion about the chaotic behavior of the Wald interval for the binomial proportion and an appraisal of some of the alternatives that have been proposed.

My remarks will primarily be confined to the discussion of Bayesian methods introduced in this paper. BCD have demonstrated very clearly that the modified Jeffreys equal-tailed interval works well in this problem and recommend it as a possible contender to the Wilson interval for n ≤ 40.

There is a deep-rooted optimality associated with Jeffreys prior as the unique first-order probability matching prior for a real-valued parameter of interest with no nuisance parameter. Roughly speaking, a probability matching prior for a real-valued parameter is one for which the coverage probability of a one-sided Bayesian credible interval is asymptotically equal to its frequentist counterpart. Before giving a formal definition of such priors, we provide an intuitive explanation of why Jeffreys prior
is a matching prior. To this end, we begin with the fact that if X_1, ..., X_n are iid N(θ, 1), then X̄_n = Σ_{i=1}^n X_i/n is the MLE of θ. With the uniform prior π(θ) ∝ c (a constant), the posterior of θ is N(X̄_n, 1/n). Accordingly, writing z_α for the upper 100α% point of the N(0, 1) distribution,

P(θ ≤ X̄_n + z_α n^{−1/2} | X̄_n) = 1 − α = P(θ ≤ X̄_n + z_α n^{−1/2} | θ),

and this is an example of perfect matching. Now if θ̂_n is the MLE of θ, under suitable regularity conditions, θ̂_n | θ is asymptotically (as n → ∞) N(θ, 1/(n I(θ))), where I(θ) is the Fisher information number. With the transformation g(θ) = ∫^θ I^{1/2}(t) dt, by the delta method, g(θ̂_n) is asymptotically N(g(θ), 1/n). Now, intuitively one expects the uniform prior π(g(θ)) ∝ c as the asymptotic matching prior for g(θ). Transforming back to the original parameter θ, Jeffreys prior is a probability matching prior for θ. Of course, this requires an invariance of probability matching priors, a fact which is rigorously established in Datta and Ghosh (1996). Thus a uniform prior for arcsin(θ^{1/2}), where θ is the binomial proportion, leads to Jeffreys Beta(1/2, 1/2) prior for θ. When θ is the Poisson parameter, the uniform prior for θ^{1/2} leads to Jeffreys prior θ^{−1/2} for θ.
In a more formal set-up, let X_1, ..., X_n be iid conditional on some real-valued θ. Let θ^π_{1−α}(X_1, ..., X_n) denote a posterior (1 − α)th quantile for θ under the prior π. Then π is said to be a first-order probability matching prior if

P(θ ≤ θ^π_{1−α}(X_1, ..., X_n) | θ) = 1 − α + o(n^{−1/2}).   (1)

This definition is due to Welch and Peers (1963), who showed by solving a differential equation that Jeffreys prior is the unique first-order probability matching prior in this case. Strictly speaking, Welch and Peers proved this result only for continuous distributions. Ghosh (1994) pointed out a suitable modification of criterion (1) which would lead to the same conclusion for discrete distributions. Also, for small and moderate samples, due to discreteness, one needs some modifications of Jeffreys interval as done so successfully by BCD.
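The matching property can be checked directly in the binomial case: the frequentist coverage of the one-sided 95% upper Jeffreys credible bound should be close to 0.95 away from the boundary. The following Python/scipy sketch is illustrative only and is not part of the formal argument above.

    from scipy.stats import beta, binom

    def jeffreys_one_sided_coverage(n, p, level=0.95):
        """Frequentist coverage of the upper Jeffreys credible bound
        B(level; X + 1/2, n - X + 1/2)."""
        cover = 0.0
        for x in range(n + 1):
            upper = beta.ppf(level, x + 0.5, n - x + 0.5)
            if p <= upper:
                cover += binom.pmf(x, n, p)
        return cover

    for n in (25, 50, 100):
        print(n, round(jeffreys_one_sided_coverage(n, 0.3), 4))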
This idea of probability matching can be extended even in the presence of nuisance parameters. Suppose that θ = (θ_1, ..., θ_r)^T, where θ_1 is the parameter of interest, while (θ_2, ..., θ_r)^T is the nuisance parameter. Writing I(θ) = ((I_{jk})) as the Fisher information matrix, if θ_1 is orthogonal to (θ_2, ..., θ_r)^T in the sense of Cox and Reid (1987), that is, I_{1k} = 0 for all k = 2, ..., r, then, extending the previous intuitive argument, π(θ) ∝ I^{1/2}_{11}(θ) is a probability matching prior. Indeed, this prior belongs to the general class of first-order probability matching priors

π(θ) ∝ I^{1/2}_{11}(θ) h(θ_2, ..., θ_r)

as derived in Tibshirani (1989). Here h(·) is an arbitrary function differentiable in its arguments.
In general, matching priors have a long success story in providing frequentist confidence intervals, especially in complex problems, for example, the Behrens–Fisher or the common mean estimation problems where frequentist methods run into difficulty. Though asymptotic, the matching property seems to hold for small and moderate sample sizes as well for many important statistical problems. One such example is Garvan and Ghosh (1997) where such priors were found for general dispersion models as given in Jorgensen (1997). It may be worthwhile developing these priors in the presence of nuisance parameters for other discrete cases as well, for example when the parameter of interest is the difference of two binomial proportions, or the log-odds ratio in a 2 × 2 contingency table.
Having argued so strongly in favor of matching priors, I wonder, though, whether there is any special need for such priors in this particular problem of binomial proportions. It appears that any Beta(a, a) prior will do well in this case. As noted in this paper, by shrinking the MLE X/n toward the prior mean 1/2, one achieves a better centering for the construction of confidence intervals. The two diametrically opposite priors Beta(2, 2) (symmetric concave with maximum at 1/2, which provides the Agresti–Coull interval) and Jeffreys prior Beta(1/2, 1/2) (symmetric convex with minimum at 1/2) seem to be equally good for recentering. Indeed, I wonder whether any Beta(a, b) prior which shrinks the MLE toward the prior mean a/(a + b) becomes appropriate for recentering.

The problem of construction of confidence intervals for binomial proportions occurs in first courses in statistics as well as in day-to-day consulting. While I am strongly in favor of replacing Wald intervals by the new ones for the latter, I am not quite sure how easy it will be to motivate these new intervals for the former. The notion of shrinking can be explained adequately only to a few strong students in introductory statistics courses. One possible solution for the classroom may be to bring in the notion of continuity correction and somewhat heuristically ask students to work with (X + 1/2, n − X + 1/2) instead of (X, n − X). In this way, one centers around (X + 1/2)/(n + 1), a la Jeffreys prior.
Comment
Thomas J. Santner
Thomas J. Santner is Professor, Ohio State University, 404 Cockins Hall, 1958 Neil Avenue, Columbus, Ohio 43210 (e-mail: tjs@stat.ohio-state.edu).

I thank the authors for their detailed look at a well-studied problem. For the Wald binomial p interval, there has not been an appreciation of the long persistence (in n) of p locations having substantially deficient achieved coverage compared with the nominal coverage. Figure 1 is indeed a picture that says a thousand words. Similarly, the asymptotic lower limit in Theorem 1 for the minimum coverage of the Wald interval is an extremely useful analytic tool to explain this phenomenon, although other authors have given fixed p approximations of the coverage probability of the Wald interval (e.g., Theorem 1 of Ghosh, 1979).

My first set of comments concerns the specific binomial problem that the authors address and then the implications of their work for other important discrete data confidence interval problems.

The results in Ghosh (1979) complement the calculations of Brown, Cai and DasGupta (BCD) by pointing out that the Wald interval is too long in addition to being centered at the wrong value (the MLE as opposed to a Bayesian point estimate such as is used by the Agresti–Coull interval). His Table 3 lists the probability that the Wald interval is longer than the Wilson interval for a central set of p values (from 0.20 to 0.80) and a range of sample sizes n from 20 to 200. Perhaps surprisingly, in view of its inferior coverage characteristics, the Wald interval tends to be longer than the Wilson interval with very high probability. Hence the Wald interval is both too long and centered at the wrong place. This is a dramatic effect of the skewness that BCD mention.
When discussing any system of intervals, one is concerned with the consistency of the answers given by the interval across multiple uses by a single researcher or by groups of users. Formally, this is the reason why various symmetry properties are required of confidence intervals. For example, in the present case, requiring that the p interval (L(x), U(x)) satisfy the symmetry property

(L(x), U(x)) = (1 − U(n − x), 1 − L(n − x))   (1)

for x ∈ {0, ..., n} shows that investigators who reverse their definitions of success and failure will be consistent in their assessment of the likely values for p. Symmetry (1) is the minimal requirement of a binomial confidence interval. The Wilson and equal-tailed Jeffreys intervals advocated by BCD satisfy the symmetry property (1) and have coverage that is centered (when coverage is plotted versus true p) about the nominal value. They are also straightforward to motivate, even for elementary students, and simple to compute for the outcome of interest.
However, regarding p confidence intervals as the inversion of a family of acceptance regions corresponding to size α tests of H_0: p = p_0 versus H_A: p ≠ p_0 for 0 < p_0 < 1 has some substantial advantages. Indeed, Brown et al. mention this inversion technique when they remark on the desirable properties of intervals formed by inverting likelihood ratio test acceptance regions of H_0 versus H_A. In the binomial case, the acceptance region of any reasonable test of H_0: p = p_0 is of the form {L_{p_0}, ..., U_{p_0}}. These acceptance regions invert to intervals if and only if L_{p_0} and U_{p_0} are nondecreasing in p_0 (otherwise the inverted p confidence set can be a union of intervals). Of course, there are many families of size α tests that meet this nondecreasing criterion for inversion, including the very conservative test used by Clopper and Pearson (1934). For the binomial problem, Blyth and Still (1983) constructed a set of confidence intervals by selecting among size α acceptance regions those that possessed additional symmetry properties and were "small" (leading to short confidence intervals). For example, they desired that the interval should move to the right as x increases when n is fixed and should move to the left as n increases when x is fixed. They also asked that their system of intervals increase monotonically in the coverage probability for fixed x and n in the sense that the higher nominal coverage interval contain the lower nominal coverage interval.

In addition to being less intuitive to unsophisticated statistical consumers, systems of confidence intervals formed by inversion of acceptance regions also have two other handicaps that have hindered their rise in popularity. First, they typically require that the confidence interval (essentially) be constructed for all possible outcomes, rather than merely the response of interest. Second, their rather brute force character means that a specialized computer program must be written to produce the acceptance sets and their inversion (the intervals).
Fig. 1. Coverage of nominal 95% symmetric Duffy–Santner p intervals for n = 20 (bottom panel) and n = 50 (top panel).
However, the benefits of having reasonably short and suitably symmetric confidence intervals are sufficient that such intervals have been constructed for several frequently occurring problems of biostatistics. For example, Jennison and Turnbull (1983) and Duffy and Santner (1987) present acceptance set inversion confidence intervals (both with available FORTRAN programs to implement their methods) for a binomial p based on data from a multistage clinical trial; Coe and Tamhane (1989) describe a more sophisticated set of repeated confidence intervals for p_1 − p_2 also based on multistage clinical trial data (and give a SAS macro to produce the intervals). Yamagami and Santner (1990) present an acceptance set-inversion confidence interval and FORTRAN program for p_1 − p_2 in the two-sample binomial problem. There are other examples.

To contrast with the intervals whose coverages are displayed in BCD's Figure 5 for n = 20 and n = 50, I formed the multistage intervals of Duffy and Santner that strictly attain the nominal confidence level for all p. The computation was done naively in the sense that the multistage FORTRAN program by Duffy that implements this method was applied using one stage with stopping boundaries arbitrarily set at (a, b) = (0, 1) in the notation of Duffy and Santner, and a small adjustment was made to insure symmetry property (1). (The nonsymmetrical multiple stage stopping boundaries that produce the data considered in Duffy and Santner do not impose symmetry.) The coverages of these systems are shown in Figure 1. To give an idea of computing time, the n = 50 intervals required less than two seconds to compute on my 400 MHz PC. To further facilitate comparison with the intervals whose coverage is displayed in Figure 5 of BCD, I computed the Duffy and Santner intervals for a slightly lower level of coverage, 93.5%, so that the average coverage was about the desired 95% nominal level; the coverage of this system is displayed in Figure 2 on the same vertical scale and compares favorably. It is possible to call the FORTRAN program that makes these intervals within SPLUS, which makes for convenient data analysis.
I wish to mention that there are a number of other small sample interval estimation problems of continuing interest to biostatisticians that may well have very reasonable small sample solutions based on analogs of the methods that BCD recommend. Most of these would be extremely difficult to handle by the more brute force method of inverting acceptance sets. The first of these is the problem of computing simultaneous confidence intervals for p_0 − p_i, 1 ≤ i ≤ T, that arises in comparing a control binomial distribution with T treatment ones. The second concerns forming simultaneous confidence intervals for p_i − p_j, the cell probabilities of a multinomial distribution. In particular, the equal-tailed Jeffreys prior approach recommended by the authors has strong appeal for both of these problems.

Fig. 2. Coverage of nominal 93.5% symmetric Duffy–Santner p intervals for n = 50.
Finally, I note that the Wilson intervals seem
to have received some recommendation as the
method of choice in other elementary texts. In his
introductory texts, Larson (1974) introduces the
Wilson interval as the method of choice although
he makes the vague, and indeed false, statement, as
BCD show, that the user can use the Wald interval if
n is large enough. One reviewer of Santner (1998),
an article that showed the coverage virtues of the
Wilson interval compared with Wald-like intervals
advocated by another author in the magazine Teach-
ing Statistics (written for high school teachers) com-
mented that the Wilson method was the standard
method taught in the U.K.
Rejoinder
Lawrence D. Brown, T. Tony Cai and Anirban DasGupta
We deeply appreciate the many thoughtful and constructive remarks and suggestions made by the discussants of this paper. The discussion suggests that we were able to make a convincing case that the often-used Wald interval is far more problematic than previously believed. We are happy to see a consensus that the Wald interval deserves to be discarded, as we have recommended. It is not surprising to us to see disagreement over the specific alternative(s) to be recommended in place of this interval. We hope the continuing debate will add to a greater understanding of the problem, and we welcome the chance to contribute to this debate.
A. It seems that the primary source of disagreement is based on differences in interpretation of the coverage goals for confidence intervals. We will begin by presenting our point of view on this fundamental issue.
We will then turn to a number of other issues, as summarized in the following list:
B. Simplicity is important.
C. Expected length is also important.
D. Santner's proposal.
E. Should a continuity correction be used?
F. The Wald interval also performs poorly in other problems.
G. The two-sample binomial problem.
H. Probability-matching procedures.
I. Results from asymptotic theory.
A. Professors Casella, Corcoran and Mehta come out in favor of making coverage errors always fall only on the conservative side. This is a traditional point of view. However, we have taken a different perspective in our treatment. It seems more consistent with contemporary statistical practice to expect that a 95% confidence interval should cover the true value approximately 95% of the time. The approximation should be built on sound, relevant statistical calculations, and it should be as accurate as the situation allows.

We note in this regard that most statistical models are only felt to be approximately valid as representations of the true situation. Hence the resulting coverage properties from those models are at best only approximately accurate. Furthermore, a broad range of modern procedures is supported only by asymptotic or Monte-Carlo calculations, and so again coverage can at best only be approximately the nominal value. As statisticians we do the best within these constraints to produce procedures whose coverage comes close to the nominal value. In these contexts when we claim 95% coverage we clearly intend to convey that the coverage is close to 95%, rather than to guarantee it is at least 95%.

We grant that the binomial model has a somewhat special character relative to this general discussion. There are practical contexts where one can feel confident this model holds with very high precision. Furthermore, asymptotics are not required in order to construct practical procedures or evaluate their properties, although asymptotic calculations can be useful in both regards. But the discreteness of the problem introduces a related barrier to the construction of satisfactory procedures. This forces one to again decide whether 95% should mean "approximately 95%," as it does in most other contemporary applications, or "at least 95%," as can be obtained with the Blyth–Still procedure or the Clopper–Pearson procedure. An obvious price of the latter approach is in its decreased precision, as measured by the increased expected length of the intervals.
B. All the discussants agree that elementary
motivation and simplicity of computation are impor-
tant attributes in the classroom context. We of
course agree. If these considerations are paramount
then the Agresti-Coull procedure is ideal. If the
need for simplicity can be relaxed even a little, then
we prefer the Wilson procedure: it is only slightly
harder to compute, its coverage is clearly closer to
the nominal value across a wider range of values of
p, and it can be easier to motivate since its deriva-
tion is totally consistent with Neyman-Pearson the-
ory. Other procedures such as Jeffreys or the mid-P
Clopper-Pearson interval become plausible competi-
tors whenever computer software can be substituted
for the possibility of hand derivation and computa-
tion.
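
For concreteness, the two intervals favored here can be computed as in the following minimal
Python sketch. This is an illustration added to this appendix, not code from the article; the
function names are ours, and only the Python standard library is assumed.

    import math
    from statistics import NormalDist

    def wilson_interval(x, n, alpha=0.05):
        """Wilson (score) interval for a binomial proportion from x successes in n trials."""
        z = NormalDist().inv_cdf(1 - alpha / 2)
        p_hat = x / n
        center = (x + z**2 / 2) / (n + z**2)
        half = (z / (n + z**2)) * math.sqrt(p_hat * (1 - p_hat) * n + z**2 / 4)
        return center - half, center + half

    def agresti_coull_interval(x, n, alpha=0.05):
        """Agresti-Coull interval: a Wald-type interval after adding z^2/2 successes
        and z^2/2 failures (roughly two of each when alpha = 0.05)."""
        z = NormalDist().inv_cdf(1 - alpha / 2)
        n_tilde = n + z**2
        p_tilde = (x + z**2 / 2) / n_tilde
        half = z * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
        return p_tilde - half, p_tilde + half

    if __name__ == "__main__":
        print(wilson_interval(2, 25))        # e.g. 2 successes out of 25 trials
        print(agresti_coull_interval(2, 25))

Both intervals are centered at the same point; the Agresti-Coull interval is never shorter than
the Wilson interval.
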
Corcoran and Mehta take a rather extreme posi-
tion when they write, "Simplicity and ease of com-
putation have no role to play in statistical practice"
[italics ours]. We agree that the ability to perform
computations by hand should be of little, if any, rel-
evance in practice. But conceptual simplicity, parsi-
mony and consistency with general theory remain
important secondary conditions to choose among
procedures with acceptable coverage and precision.
These considerations will reappear in our discus-
sion of Santner's Blyth-Still proposal. They also
leave us feeling somewhat ambivalent about the
boundary-modified procedures we have presented in
our Section 4.1. Agresti and Coull correctly imply
that other boundary corrections could have been
tried and that our choice is thus somewhat ad hoc.
(The correction to Wilson can perhaps be defended
on the principle of substituting a Poisson approx-
imation for a Gaussian one where the former is
clearly more accurate; but we see no such funda-
mental motivation for our correction to the Jeffreys
interval.)
C. Several discussants commented on the pre-
cision of various proposals in terms of expected
length of the resulting intervals. We strongly con-
cur that precision is the important balancing crite-
rion vis-à-vis coverage. We wish only to note that
there exist other measures of precision than inter-
val expected length. In particular, one may investi-
gate the probability of covering wrong values. In a
charming identity worth noting, Pratt (1961) shows
the connection of this approach to that of expected
length. Calculations on coverage of wrong values of
p in the binomial case will be presented in Das-
Gupta (2001). This article also discusses a number
of additional issues and presents further analytical
calculations, including a Pearson tilting similar to
the chi-square tilts advised in Hall (1983).
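
As a reminder of the identity in question (our sketch of Pratt's 1961 result, stated for an
interval $[L(X), U(X)]$ for $p$):

$$E_p\bigl[U(X) - L(X)\bigr] \;=\; \int_0^1 P_p\bigl(L(X) \le p' \le U(X)\bigr)\,dp',$$

so expected length is exactly the accumulated probability of covering values $p'$, almost all of
which are false.
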
Corcoran and Mehta's Figure 2 compares average
length of three of our proposals with Blyth-Still and
with their likelihood ratio procedure. We note first
that their LB procedure is not the same as ours.
Theirs is based on numerically computed exact per-
centiles of the fixed sample likelihood ratio statistic.
We suspect this is roughly equivalent to adjustment
of the chi-squared percentile by a Bartlett correc-
tion. Ours is based on the traditional asymptotic
chi-squared formula for the distribution of the like-
lihood ratio statistic. Consequently, their procedure
has conservative coverage, whereas ours has cov-
erage fluctuating around the nominal value. They
assert that the difference in expected length is neg-
ligible. How much difference qualifies as negligible
is an arguable, subjective evaluation. But we note
that in their plot their intervals can be on aver-
age about 8% or 10% longer than Jeffreys or Wilson
intervals, respectively. This seems to us a nonneg-
ligible difference. Actually, we suspect their prefer-
ence for their LR and BSC intervals rests primarily
on their overriding preference for conservativity in
coverage whereas, as we have discussed above, our
intervals are designed to attain approximately the
desired nominal value.
D. Santner proposes an interesting variant of the
original Blyth-Still proposal. As we understand it,
he suggests producing nominal $100(1-\alpha)\%$ intervals by
constructing $100(1-\gamma)\%$ Blyth-Still intervals, with $\gamma$
chosen so that the average coverage of the result-
ing intervals is approximately the nominal value,
$100(1-\alpha)\%$. The coverage plot for this procedure compares
well with that for Wilson or Jeffreys in our Figure 5.
Perhaps the expected interval length for this proce-
dure also compares well, although Santner does not
say so. However, we still do not favor his proposal.
It is conceptually more complicated and requires a
specially designed computer program, particularly if
one wishes to compute $\gamma$ with any degree of accu-
racy. It thus fails with respect to the criterion of sci-
entific parsimony in relation to other proposals that
appear to have at least competitive performance
characteristics.
E. Casella suggests the possibility of perform-
ing a continuity correction on the score statistic
prior to constructing a confidence interval. We do
not agree with this proposal from any perspec-
tive. These continuity-corrected Wilson intervals
have extremely conservative coverage properties,
though they may not in principle be guaranteed to
be everywhere conservative. But even if one's goal,
unlike ours, is to produce conservative intervals,
these intervals will be very inefficient at their nor-
mal level relative to Blyth-Still or even Clopper-
Pearson. In Figure 1 below, we plot the coverage
of the Wilson interval with and without a conti-
nuity correction for n = 25 and α = 0.05, and
the corresponding expected lengths. It seems
clear that the loss in precision more than neutral-
izes the improvements in coverage and that the
nominal coverage of 95% is misleading from any
perspective.
Fig. 1. Comparison of the coverage probabilities and expected lengths of the Wilson (dotted) and continuity-corrected Wilson (solid)
intervals for n = 25 and α = 0.05.
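
For readers who wish to check the comparison in Figure 1 numerically, the exact coverage of the
two intervals can be computed as in the illustrative Python sketch below. This is not the
authors' code; it uses one common form of the continuity correction (subtracting 1/(2n) from
the absolute deviation in the score inequality) and computes coverage only, not expected length.

    import math
    from statistics import NormalDist

    def binom_pmf(x, n, p):
        return math.comb(n, x) * p**x * (1 - p) ** (n - x)

    def covered(x, n, p, z, cc=False):
        """Is p inside the (optionally continuity-corrected) score interval built from x successes?"""
        dev = abs(x / n - p)
        if cc:
            dev = max(dev - 1 / (2 * n), 0.0)   # one common form of the correction
        return dev**2 <= z**2 * p * (1 - p) / n

    def coverage(n, p, alpha=0.05, cc=False):
        z = NormalDist().inv_cdf(1 - alpha / 2)
        return sum(binom_pmf(x, n, p) for x in range(n + 1) if covered(x, n, p, z, cc))

    if __name__ == "__main__":
        n = 25
        for p in (0.1, 0.3, 0.5):
            print(p, round(coverage(n, p), 4), round(coverage(n, p, cc=True), 4))
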
Fig. 2. Comparison of the systematic coverage biases. The y-axis is $n\,S_n(p)$. From top to bottom: the systematic coverage biases of the
Agresti-Coull, Wilson, Jeffreys, likelihood ratio and Wald intervals, with n = 50 and α = 0.05.
F. Agresti and Coull ask if the dismal perfor-
mance of the Wald interval manifests itself in
other problems, including nondiscrete cases. Indeed
it does. In other lattice cases such as the Poisson
and negative binomial, both the considerable neg-
ative coverage bias and inefficiency in length per-
sist. These features also show up in some continu-
ous exponential family cases. See Brown, Cai and
DasGupta (2000b) for details.
In the three important discrete cases, the bino-
mial, Poisson and negative binomial, there is in fact
some conformity in regard to which methods work
well in general. Both the likelihood ratio interval
(using the asymptotic chi-squared limits) and the
equal-tailed Jeffreys interval perform admirably in
all of these problems with regard to coverage and
expected length. Perhaps there is an underlying the-
oretical reason for the parallel behavior of these
two intervals constructed from very different foun-
dational principles, and this seems worth further
study.
G. Some discussants very logically inquire about
the situation in the two-sample binomial situation.
Curiously, in a way, the Wald interval in the two-
sample case for the difference of proportions is less
problematic than in the one-sample case. It can
nevertheless be somewhat improved. Agresti and
Caffo (2000) present a proposal for this problem,
and Brown and Li (2001) discuss some others.
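
The Agresti and Caffo (2000) proposal is easy to state: add one success and one failure to each
sample and then apply the usual Wald formula to the adjusted proportions. A minimal illustrative
Python sketch (ours, not code from that paper) follows.

    import math
    from statistics import NormalDist

    def agresti_caffo_interval(x1, n1, x2, n2, alpha=0.05):
        """Approximate interval for p1 - p2 after adding one success and one failure to each sample."""
        z = NormalDist().inv_cdf(1 - alpha / 2)
        p1 = (x1 + 1) / (n1 + 2)
        p2 = (x2 + 1) / (n2 + 2)
        se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
        diff = p1 - p2
        return diff - z * se, diff + z * se

    if __name__ == "__main__":
        print(agresti_caffo_interval(3, 20, 9, 25))   # hypothetical counts, for illustration
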
H. The discussion by Ghosh raises several inter-
esting issues. The definition of first-order proba-
bility matching extends in the obvious way to any
set of upper confidence limits, not just those cor-
responding to Bayesian intervals. There is also an
obvious extension to lower confidence limits. This
probability matching is a one-sided criterion. Thus
a family of two-sided intervals $[L_n, U_n]$ will be first-
order probability matching if
$$P_p(p \le L_n) = \alpha/2 + o(n^{-1/2}) = P_p(p \ge U_n).$$
As Ghosh notes, this definition cannot usefully
be literally applied to the binomial problem here,
because the asymptotic expansions always have a
discrete oscillation term that is $O(n^{-1/2})$. However,
one can correct the definition.
One way to do so involves writing asymptotic
expressions for the probabilities of interest that can
be divided into a smooth part, S, and an oscil-
lating part, Osc, that averages to $O(n^{-3/2})$ with
respect to any smooth density supported within (0,
1). Readers could consult BCD (2000a) for more
details about such expansions. Thus, in much gen-
erality one could write
$$P_p(p \le L_n) = \alpha/2 + S_{L_n}(p) + \mathrm{Osc}_{L_n}(p) + O(n^{-1}), \qquad (1)$$
where $S_{L_n}(p) = O(n^{-1/2})$ and $\mathrm{Osc}_{L_n}(p)$ has the
property informally described above. We would then
say that the procedure is first-order probability
matching if $S_{L_n}(p) = o(n^{-1/2})$, with an analogous
expression for the upper limit, $U_n$.
In this sense the equal-tail Jeffreys procedure
is probability matching. We believe that the mid-
P Clopper-Pearson intervals also have this asymp-
totic property. But several of the other proposals,
including the Wald, the Wilson and the likelihood
ratio intervals, are not first-order probability match-
ing. See Cai (2001) for exact and asymptotic calcula-
tions on one-sided confidence intervals and hypoth-
esis testing in the discrete distributions.
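
The one-sided behavior described here can be checked directly. The illustrative Python sketch
below (ours, and assuming SciPy is available for the beta quantile function) computes the
probabilities that the equal-tailed Jeffreys interval misses the true p on each side; first-order
probability matching suggests each should be close to alpha/2. The usual endpoint adjustments at
x = 0 and x = n are omitted for brevity.

    import math
    from scipy.stats import beta

    def binom_pmf(x, n, p):
        return math.comb(n, x) * p**x * (1 - p) ** (n - x)

    def jeffreys_interval(x, n, alpha=0.05):
        """Equal-tailed Jeffreys interval: quantiles of the Beta(x + 1/2, n - x + 1/2) posterior."""
        a, b = x + 0.5, n - x + 0.5
        return beta.ppf(alpha / 2, a, b), beta.ppf(1 - alpha / 2, a, b)

    def one_sided_errors(n, p, alpha=0.05):
        """Probability the interval lies entirely above p, and entirely below p."""
        miss_low = miss_high = 0.0
        for x in range(n + 1):
            lo, hi = jeffreys_interval(x, n, alpha)
            w = binom_pmf(x, n, p)
            if p < lo:
                miss_low += w
            elif p > hi:
                miss_high += w
        return miss_low, miss_high

    if __name__ == "__main__":
        print(one_sided_errors(50, 0.3))   # each component should be roughly 0.025
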
The failure of this one-sided, first-order property,
however, has no obvious bearing on the coverage
properties of the two-sided procedures considered
in the paper. That is because, for any of our proce-
dures,
$$S_{L_n}(p) + S_{U_n}(p) = 0 + O(n^{-1}), \qquad (2)$$
even when the individual terms on the left are only
$O(n^{-1/2})$. All the procedures thus make compensat-
ing one-sided errors, to $O(n^{-1})$, even when they are
not accurate to this degree as one-sided procedures.
This situation raises the question as to whether
it is desirable to add as a secondary criterion for
two-sided procedures that they also provide accu-
rate one-sided statements, at least to the probabil-
ity matching O(n
12
). While Ghosh argues strongly
for the probability matching property, his argument
does not seem to take into account the cancellation
inherent in (2). We have heard some others argue in
favor of such a requirement and some argue against
it. We do not wish to take a strong position on
this issue now. Perhaps it depends somewhat on the
practical context: if in that context the confidence
bounds may be interpreted and used in a one-sided
fashion as well as the two-sided one, then perhaps
probability matching is called for.
I. Ghosh's comments are a reminder that asymp-
totic theory is useful for this problem, even though
exact calculations here are entirely feasible and con-
venient. But, as Ghosh notes, asymptotic expres-
sions can be startlingly accurate for moderate
sample sizes. Asymptotics can thus provide valid
insights that are not easily drawn from a series of
exact calculations. For example, the two-sided inter-
vals also obey an expression analogous to (1),
$$P_p(L_n \le p \le U_n) = 1 - \alpha + S_n(p) + \mathrm{Osc}_n(p) + O(n^{-3/2}). \qquad (3)$$
The term $S_n(p)$ is $O(n^{-1})$ and provides a useful
expression for the smooth center of the oscillatory
coverage plot. (See Theorem 6 of BCD (2000a) for
a precise justification.) The following plot for $n = 50$
compares $S_n(p)$ for five confidence procedures.
It shows how the Wilson, Jeffreys and chi-
squared likelihood ratio procedures all have cover-
age that well approximates the nominal value, with
Wilson being slightly more conservative than the
other two.
As we see it our article articulated three primary
goals: to demonstrate unambiguously that the Wald
interval performs extremely poorly; to point out that
none of the common prescriptions on when the inter-
val is satisfactory are correct; and to put forward
some recommendations on what is to be used in its
place. On the basis of the discussion we feel gratified
that we have satisfactorily met the first two of these
goals. As Professor Casella notes, the debate about
alternatives in this timeless problem will linger on,
as it should. We thank the discussants again for a
lucid and engaging discussion of a number of rel-
evant issues. We are grateful for the opportunity
to have learned so much from these distinguished
colleagues.
ADDITIONAL REFERENCES
Agresti, A. and Caffo, B. (2000). Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. Amer. Statist. 54. To appear.
Aitkin, M., Anderson, D., Francis, B. and Hinde, J. (1989). Statistical Modelling in GLIM. Oxford Univ. Press.
Boos, D. D. and Hughes-Oliver, J. M. (2000). How large does n have to be for Z and t intervals? Amer. Statist. 54 121-128.
Brown, L. D., Cai, T. and DasGupta, A. (2000a). Confidence intervals for a binomial proportion and asymptotic expansions. Ann. Statist. To appear.
Brown, L. D., Cai, T. and DasGupta, A. (2000b). Interval estimation in exponential families. Technical report, Dept. Statistics, Univ. Pennsylvania.
Brown, L. D. and Li, X. (2001). Confidence intervals for the difference of two binomial proportions. Unpublished manuscript.
Cai, T. (2001). One-sided confidence intervals and hypothesis testing in discrete distributions. Preprint.
Coe, P. R. and Tamhane, A. C. (1993). Exact repeated confidence intervals for Bernoulli parameters in a group sequential clinical trial. Controlled Clinical Trials 14 19-29.
Cox, D. R. and Reid, N. (1987). Orthogonal parameters and approximate conditional inference (with discussion). J. Roy. Statist. Soc. Ser. B 49 113-147.
DasGupta, A. (2001). Some further results in the binomial interval estimation problem. Preprint.
Datta, G. S. and Ghosh, M. (1996). On the invariance of noninformative priors. Ann. Statist. 24 141-159.
Duffy, D. and Santner, T. J. (1987). Confidence intervals for a binomial parameter based on multistage tests. Biometrics 43 81-94.
Fisher, R. A. (1956). Statistical Methods for Scientific Inference. Oliver and Boyd, Edinburgh.
Gart, J. J. (1966). Alternative analyses of contingency tables. J. Roy. Statist. Soc. Ser. B 28 164-179.
Garvan, C. W. and Ghosh, M. (1997). Noninformative priors for dispersion models. Biometrika 84 976-982.
Ghosh, J. K. (1994). Higher Order Asymptotics. IMS, Hayward, CA.
Hall, P. (1983). Chi-squared approximations to the distribution of a sum of independent random variables. Ann. Statist. 11 1028-1036.
Jennison, C. and Turnbull, B. W. (1983). Confidence intervals for a binomial parameter following a multistage test with application to MIL-STD 105D and medical trials. Technometrics 25 49-58.
Jorgensen, B. (1997). The Theory of Dispersion Models. CRC Chapman and Hall, London.
Laplace, P. S. (1812). Théorie Analytique des Probabilités. Courcier, Paris.
Larson, H. J. (1974). Introduction to Probability Theory and Statistical Inference, 2nd ed. Wiley, New York.
Pratt, J. W. (1961). Length of confidence intervals. J. Amer. Statist. Assoc. 56 549-567.
Rindskopf, D. (2000). Letter to the editor. Amer. Statist. 54 88.
Rubin, D. B. and Schenker, N. (1987). Logit-based interval estimation for binomial data using the Jeffreys prior. Sociological Methodology 17 131-144.
Sterne, T. E. (1954). Some remarks on confidence or fiducial limits. Biometrika 41 275-278.
Tibshirani, R. (1989). Noninformative priors for one parameter of many. Biometrika 76 604-608.
Welch, B. L. and Peers, H. W. (1963). On formulae for confidence points based on integrals of weighted likelihoods. J. Roy. Statist. Soc. Ser. B 25 318-329.
Yamagami, S. and Santner, T. J. (1993). Invariant small sample confidence intervals for the difference of two success probabilities. Comm. Statist. Simul. Comput. 22 33-59.








Appendix 4.
B03002. HISPANIC OR LATINO ORIGIN BY RACE - Universe: TOTAL POPULATION
Data Set: 2005-2009 American Community Survey 5-Year Estimates
Survey: American Community Survey
NOTE. Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population
Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of
housing units for states and counties.
For information on confidentiality protection, sampling error, nonsampling error, and definitions, see Survey Methodology.
Autauga County, Alabama
Estimate Margin of Error
Total: 49,584 *****
Not Hispanic or Latino: 48,572 *****
White alone 38,636 +/-43
Black or African American alone 8,827 +/-83
American Indian and Alaska Native alone 183 +/-89
Asian alone 309 +/-94
Native Hawaiian and Other Pacific Islander alone 0 +/-119
Some other race alone 56 +/-45
Two or more races: 561 +/-157
Two races including Some other race 0 +/-119
Two races excluding Some other race, and three or more races 561 +/-157
Hispanic or Latino: 1,012 *****
White alone 579 +/-115
Black or African American alone 27 +/-30
American Indian and Alaska Native alone 10 +/-17
Asian alone 0 +/-119
Native Hawaiian and Other Pacific Islander alone 0 +/-119
Some other race alone 280 +/-109
Two or more races: 116 +/-87
Two races including Some other race 56 +/-58
Two races excluding Some other race, and three or more races 60 +/-68
Source: U.S. Census Bureau, 2005-2009 American Community Survey
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling
variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error
can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the
estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the
ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of
nonsampling error is not represented in these tables.
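
The arithmetic behind these bounds is worth making explicit. The illustrative Python sketch below
(ours, not Census Bureau code) recovers the approximate standard error from a published 90 percent
margin of error using the 1.645 factor and forms the implied confidence bounds, using two rows
from the Autauga County table above; a lower bound below zero is exactly the kind of impossible
value discussed in the body of this memo.

    Z90 = 1.645  # factor relating a standard error to a 90 percent margin of error

    def standard_error_from_moe(moe, z=Z90):
        """Approximate standard error implied by a published 90 percent MOE."""
        return moe / z

    def confidence_bounds(estimate, moe):
        """Lower and upper 90 percent confidence bounds implied by estimate +/- MOE."""
        return estimate - moe, estimate + moe

    if __name__ == "__main__":
        # American Indian and Alaska Native alone: 183 +/- 89
        print(round(standard_error_from_moe(89), 1))   # about 54.1
        print(confidence_bounds(183, 89))              # (94, 272)
        # Native Hawaiian and Other Pacific Islander alone: 0 +/- 119
        print(confidence_bounds(0, 119))               # (-119, 119)
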
While the 2005-2009 American Community Survey (ACS) data generally reflect the November 2008 Office of Management and Budget
(OMB) definitions of metropolitan and micropolitan statistical areas, in certain instances the names, codes, and boundaries of the principal
cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.
Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census
2000 data. Boundaries for urban areas have not been updated since Census 2000. As a result, data for urban and rural areas from the
ACS do not necessarily reflect the results of ongoing urbanization.
Explanation of Symbols:
1. An '**' entry in the margin of error column indicates that either no sample observations or too few sample observations were available to
compute a standard error and thus the margin of error. A statistical test is not appropriate.
2. An '-' entry in the estimate column indicates that either no sample observations or too few sample observations were available to
compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or
upper interval of an open-ended distribution.
3. An '-' following a median estimate means the median falls in the lowest interval of an open-ended distribution.
4. An '+' following a median estimate means the median falls in the upper interval of an open-ended distribution.
5. An '***' entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended
distribution. A statistical test is not appropriate.
6. An '*****' entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not
appropriate.
Standard Error/Variance documentation for this dataset:
Accuracy of the Data













Appendix 5











Chapter 13.
Preparation and Review of Data Products
13.1 OVERVIEW
This chapter discusses the data products derived from the American Community Survey (ACS).
ACS data products include the tables, reports, and files that contain estimates of population and
housing characteristics. These products cover geographic areas within the United States and
Puerto Rico. Tools such as the Public Use Microdata Sample (PUMS) files, which enable data users
to create their own estimates, also are data products.
ACS data products will continue to meet the traditional needs of those who used the decennial
census long-form sample estimates. However, as described in Chapter 14, Section 3, the ACS will
provide more current data products than those available from the census long form, an especially
important advantage toward the end of a decade.
Most surveys of the population provide sufficient samples to support the release of data products
only for the nation, the states, and, possibly, a few substate areas. Because the ACS is a very large
survey that collects data continuously in every county, products can be released for many types of
geographic areas, including many smaller geographic areas such as counties, townships, and cen-
sus tracts. For this reason, geography is an important topic for all ACS data products.
The first step in the preparation of a data product is defining the topics and characteristics it will
cover. Once the initial characteristics are determined, they must be reviewed by the Census
Bureau Disclosure Review Board (DRB) to ensure that individual responses will be kept confiden-
tial. Based on this review, the specifications of the products may be revised. The DRB also may
require that the microdata files be altered in certain ways, and may restrict the population size of
the geographic areas for which these estimates are published. These activities are collectively
referred to as disclosure avoidance.
The actual processing of the data products cannot begin until all response records for a given year
or years are edited and imputed in the data preparation and processing phases, the final weights
are determined, and disclosure avoidance techniques are applied. Using the weights, the sample
data are tabulated for a wide variety of characteristics according to the predetermined content.
These tabulations are done for the geographic areas that have a sample size sufficient to support
statistically reliable estimates, with the exception of 5-year period estimates, which will be avail-
able for small geographic areas down to the census tract and block group levels. The PUMS data
files are created by different processes because the data are a subset of the full sample data.
After the estimates are produced and verified for correctness, Census Bureau subject matter ana-
lysts review them. When the estimates have passed the final review, they are released to the pub-
lic. A similar process of review and public release is followed for PUMS data.
While the 2005 ACS sample was limited to the housing unit (HU) population for the United States
and Puerto Rico, starting in sample year 2006, the ACS was expanded to include the group quar-
ters (GQ) population. Therefore, the ACS sample is representative of the entire resident population
in the United States and Puerto Rico. In 2007, 1-year period estimates for the total population and
subgroups of the total population in both the United States and Puerto Rico were released for
sample year 2006. Similarly, in 2008, 1-year period estimates were released for sample year 2007.
In 2008, the Census Bureau will, for the first time, release products based on 3 years of ACS
sample, 2005 through 2007. In 2010, the Census Bureau plans to release the first products based
on 5 years of consecutive ACS samples, 2005 through 2009. Since several years of samples form
the basis of these multiyear products, reliable estimates can be released for much smaller geo-
graphic areas than is possible for products based on single-year data.
In addition to data products regularly released to the public, other data products may be
requested by government agencies, private organizations and businesses, or individuals. To
accommodate such requests, the Census Bureau operates a custom tabulations program for the
ACS on a fee basis. These tabulation requests are reviewed by the DRB to assure protection of
confidentiality before release.
Chapter 14 describes the dissemination of the data products discussed in this chapter, including
display of products on the Census Bureau's Web site and topics related to data file formatting.
13.2 GEOGRAPHY
The Census Bureau strives to provide products for the geographic areas that are most useful to
users of those data. For example, ACS data products are already disseminated for many of the
nation's legal and administrative entities, including states, American Indian and Alaska Native
(AIAN) areas, counties, minor civil divisions (MCDs), incorporated places, congressional districts,
as well as data for a variety of other geographic entities. In cooperation with state and local agen-
cies, the Census Bureau identifies and delineates geographic entities referred to as statistical
areas. These include regions, divisions, urban areas (UAs), census county divisions (CCDs), cen-
sus designated places (CDPs), census tracts, and block groups. Data users then can select the geo-
graphic entity or set of entities that most closely represent their geographic areas of interest and
needs.
Geographic summary level is a term used by the Census Bureau to designate the different geo-
graphic levels or types of geographic areas for which data are summarized. Examples include the
entities described above, such as states, counties, and places (the Census Bureau's term for enti-
ties such as cities and towns, including unincorporated areas). Information on the types of
geographic areas for which the Census Bureau publishes data is available at
<http://www.census.gov/geo/www/garm.html>.
Single-year period estimates of ACS data are published annually for recognized legal, administra-
tive, or statistical areas with populations of 65,000 or more (based on the latest Census Bureau
population estimates). Three-year period estimates based on 3 successive years of ACS samples
are published for areas of 20,000 or more. If a geographic area met the 1-year or 3-year threshold
for a previous period but dropped below it for the current period, it will continue to be published
as long as the population does not drop more than 5 percent below the threshold. Plans are to
publish 5-year period estimates based on 5 successive years of ACS samples starting in 2010 with
the 2005-2009 data. Multiyear period estimates based on 5 successive years of ACS samples will
be published for all legal, administrative, and statistical areas down to the block-group level,
regardless of population size. However, there are rules from the Census Bureau's DRB that must be
applied.
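
As a rough illustration of these thresholds, the sketch below encodes the rule as described
(this is our simplification, not the Bureau's implementation, which relies on its official
population estimates):

    def publishable(current_pop, previous_pop, threshold):
        """An area qualifies if it meets the threshold now, or if it met the threshold
        for a previous period and has not dropped more than 5 percent below it."""
        if current_pop >= threshold:
            return True
        return previous_pop >= threshold and current_pop >= 0.95 * threshold

    if __name__ == "__main__":
        print(publishable(66000, 64000, 65000))   # True: meets the 1-year threshold
        print(publishable(63000, 66000, 65000))   # True: within 5 percent of the threshold
        print(publishable(60000, 66000, 65000))   # False: dropped too far below it
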
The Puerto Rico Community Survey (PRCS) also provides estimates for legal, administrative, and
statistical areas in Puerto Rico. The same rules as described above for the 1-year, 3-year, and
5-year period estimates for the U.S. resident population apply to the PRCS as well.
The ACS publishes annual estimates for hundreds of substate areas, many of which will undergo
boundary changes due to annexations, detachments, or mergers with other areas.
1
Each year, the
Census Bureau's Geography Division, working with state and local governments, updates its files
to reflect these boundary changes. Minor corrections to the location of boundaries also can occur
as a result of the Census Bureau's ongoing Master Address File (MAF)/Topologically Integrated
Geographic Encoding and Referencing (TIGER) Enhancement Project. The ACS estimates must
1
The Census Bureau conducts the Boundary and Annexation Survey (BAS) each year. This survey collects infor-
mation on a voluntary basis from local governments and federally recognized American Indian areas. The
information collected includes the correct legal place names, type of government, legal actions that resulted in
boundary changes, and up-to-date boundary information. The BAS uses a fixed reference date of January 1 of
the BAS year. In years ending in 8, 9, and 0, all incorporated places, all minor civil divisions, and all federally
recognized tribal governments are included in the survey. In other years, only governments at or above vari-
ous population thresholds are contacted. More detailed information on the BAS can be found at
<http://www.census.gov/geo/www/bas/bashome.html>.
reflect these legal boundary changes, so all estimates are based on Geography Division files that
show the geographic boundaries as they existed on January 1 of the sample year or, in the case of
multiyear data products, at the beginning of the final year of data collection.
13.3 DEFINING THE DATA PRODUCTS
For the 1999 through 2002 sample years, the ACS detailed tables were designed to be compa-
rable with Census 2000 Summary File 3 to allow comparisons between data from Census 2000
and the ACS. However, when Census 2000 data users indicated certain changes they wanted in
many tables, ACS managers saw the years 2003 and 2004 as opportunities to define ACS prod-
ucts based on users' advice.
Once a preliminary version of the revised suite of products had been developed, the Census
Bureau asked for feedback on the planned changes from data users (including other federal agen-
cies) via a Federal Register Notice (Fed. Reg. #3510-07-P). The notice requested comments on cur-
rent and proposed new products, particularly on the basic concept of the product and its useful-
ness to the data users. Data users provided a wide variety of comments, leading to modifications
of planned products.
ACS managers determined the exact form of the new products in time for their use in 2005 for the
ACS data release of sample year 2004. This schedule allowed users sufficient time to become
familiar with the new products and to provide comments well in advance of the data release for
the 2005 sample.
Similarly, a Federal Register Notice issued in August 2007 shared with the public plans for the
data release schedule and products that would be available beginning in 2008. This notice was
the first that described products for multiyear estimates. Improvements will continue when multi-
year period estimates are available.
13.4 DESCRIPTION OF AGGREGATED DATA PRODUCTS
ACS data products can be divided into two broad categories: aggregated data products, and the
PUMS, which is described in Section 13.5 (Public Use Microdata Sample).
Data for the ACS are collected from a sample of housing units (HUs), as well as the GQ population,
and are used to produce estimates of the actual figures that would have been obtained by inter-
viewing the entire population. The aggregated data products contain the estimates from the sur-
vey responses. Each estimate is created using the sample weights from respondent records that
meet certain criteria. For example, the 2007 ACS estimate of people under the age of 18 in
Chicago is calculated by adding the weights from all respondent records from interviews com-
pleted in 2007 in Chicago with residents under 18 years old.
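
In code, that tabulation is just a filtered sum of weights. The Python sketch below is purely
illustrative; the record layout and field names are hypothetical and do not reflect the ACS
internal files.

    def weighted_estimate(records, meets_criteria):
        """Sum the final weights of the respondent records that satisfy the criteria."""
        return sum(r["weight"] for r in records if meets_criteria(r))

    if __name__ == "__main__":
        # Hypothetical person records with final weights.
        sample = [
            {"weight": 41.0, "age": 12, "place": "Chicago"},
            {"weight": 38.5, "age": 47, "place": "Chicago"},
            {"weight": 52.3, "age": 9,  "place": "Chicago"},
            {"weight": 40.0, "age": 15, "place": "Aurora"},
        ]

        def under_18_chicago(r):
            return r["age"] < 18 and r["place"] == "Chicago"

        print(weighted_estimate(sample, under_18_chicago))   # 93.3
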
This section provides a description of each aggregated product. Each product described is avail-
able as single-year period estimates; unless otherwise indicated, they will be available as 3-year
estimates and are planned for the 5-year estimates. Chapter 14 provides more detail on the actual
appearance and content of each product.
These data products contain all estimates planned for release each year, including those from mul-
tiple years of data, such as the 2005-2007 products. Data release rules will prevent certain
single- and 3-year period estimates from being released if they do not meet ACS requirements for
statistical reliability.
Detailed Tables
The detailed tables provide basic distributions of characteristics. They are the foundation upon
which other data products are built. These tables display estimates and the associated lower and
upper bounds of the 90 percent confidence interval. They include demographic, social, economic,
and housing characteristics, and provide 1-, 3-, or 5-year period estimates for the nation and the
states, as well as for counties, towns, and other small geographic entities, such as census tracts
and block groups.
The Census Bureau's goal is to maintain a high degree of comparability between ACS detailed
tables and Census 2000 sample-based data products. In addition, characteristics not measured in
the Census 2000 tables will be included in the new ACS base tables. The 2007 detailed table prod-
ucts include almost 600 tables that cover a wide variety of characteristics, and another
380 race and Hispanic-origin iterations that cover 40 key characteristics. In addition to the tables
on characteristics, approximately 80 tables summarize allocation rates from the data edits for
many of the characteristics. These provide measures of data quality by showing the extent to
which responses to various questionnaire items were complete. Altogether, over 1,300 separate
detailed tables are provided.
Data Profiles
Data profiles are high-level reports containing estimates for demographic, social, economic, and
housing characteristics. For a given geographic area, the data profiles include distributions for
such characteristics as sex, age, type of household, race and Hispanic origin, school enrollment,
educational attainment, disability status, veteran status, language spoken at home, ancestry,
income, poverty, physical housing characteristics, occupancy and owner/renter status, and hous-
ing value. The data profiles include a 90 percent margin of error for each estimate. Beginning with
the 2007 ACS, a comparison profile that compares the 2007 sample year's estimates with those of
the 2006 ACS also will be published. These profile reports include the results of a statistical sig-
nificance test for each previous year's estimate, compared to the current year. This test result indi-
cates whether the previous year's estimate is significantly different (at a 90 percent confidence
level) from that of the current year.
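
The test behind these comparisons can be sketched as follows, treating the two years' estimates
as independent and working directly from their published 90 percent margins of error
(illustrative Python, not the Bureau's implementation; the earlier-year figures in the example
are hypothetical):

    import math

    def significantly_different(est1, moe1, est2, moe2):
        """True if the two estimates differ at the 90 percent confidence level, assuming
        independence, since 1.645 * sqrt(se1**2 + se2**2) equals sqrt(moe1**2 + moe2**2)
        when each MOE is 1.645 times its standard error."""
        return abs(est1 - est2) > math.sqrt(moe1**2 + moe2**2)

    if __name__ == "__main__":
        print(significantly_different(8827, 83, 8500, 120))   # True  (hypothetical prior year)
        print(significantly_different(309, 94, 280, 90))      # False (hypothetical prior year)
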
Narrative Profiles
Narrative profiles cover the current sample year only. These are easy-to-read, computer-produced
profiles that describe main topics from the data profiles for the general-purpose user. These are
the only ACS products with no standard errors accompanying the estimates.
Subject Tables
These tables are similar to the Census 2000 quick tables, and like them, are derived from detailed
tables. Both quick tables and subject tables are predefined, covering frequently requested infor-
mation on a single topic for a single geographic area. However, subject tables contain more detail
than the Census 2000 quick tables or the ACS data profiles. In general, a subject table contains
distributions for a few key universes, such as the race groups and people in various age groups,
which are relevant to the topic of the table. The estimates for these universes are displayed as
whole numbers. The distribution that follows is displayed in percentages. For example, subject
table S1501 on educational attainment provides the estimates for two different age groups, 18 to
24 years old and 25 years and older, as whole numbers. For each age group, these estimates are
followed by the percentages of people in different educational attainment categories (high school
graduate, college undergraduate degree, etc.). Subject tables also contain other measures, such as
medians, and they include the imputation rates for relevant characteristics. More than 40 topic-
specific subject tables are released each year.
Ranking Products
Ranking products contain ranked results of many important measures across states. They are pro-
duced as 1-year products only, based on the current sample year. The ranked results among the
states for each measure are displayed in three ways: charts, tables, and tabular displays that
allow for testing statistical significance.
The rankings show approximately 80 selected measures. The data used in ranking products are
pulled directly from a detailed table or a data profile for each state.
Geographic Comparison Tables (GCTs)
GCTs contain the same measures that appear in the ranking products. They are produced as both
1-year and multiyear products. GCTs are produced for states as well as for substate entities, such
as congressional districts. The results among the geographic entities for each measure are dis-
played as tables and thematic maps (see next).
Thematic Maps
Thematic maps are similar to ranking tables. They show mapped values for geographic areas at a
given geographic summary level. They have the added advantage of visually displaying the geo-
graphic variation of key characteristics (referred to as themes). An example of a thematic map
would be a map showing the percentage of a population 65 years and older by state.
Selected Population Profiles (SPPs)
SPPs provide certain characteristics from the data profiles for a specific race or ethnic group (e.g.,
Alaska Natives) or some other selected population group (e.g., people aged 60 years and older).
SPPs are provided every year for many of the Census 2000 Summary File 4 iteration groups. SPPs
were introduced on a limited basis in the fall of 2005, using the 2004 sample. In 2008 (sample
year 2007), this product was significantly expanded. The earlier SPP requirement was that a sub-
state geographic area must have a population of at least 1,000,000 people. This threshold was
reduced to 500,000, and congressional districts were added to the list of geographic types that
can receive SPPs. Another change to SPPs in 2008 is the addition of many country-of-birth groups.
Groups too small to warrant an SPP for a geographic area based on 1 year of sample data may
appear in an SPP based on the 3- or 5-year accumulations of sample data. More details on these
profiles can be found in Hillmer (2005), which includes a list of selected race, Hispanic origin, and
ancestry populations.
13.5 PUBLIC USE MICRODATA SAMPLE
Microdata are the individual records that contain information collected about each person and HU.
PUMS files are extracts from the confidential microdata that avoid disclosure of information about
households or individuals. These extracts cover all of the same characteristics contained in the
full microdata sample files. Chapter 14 provides information on data and file organization for the
PUMS.
The only geography other than state shown on a PUMS file is the Public Use Microdata Area
(PUMA). PUMAs are special nonoverlapping areas that partition a state, each containing a popula-
tion of about 100,000. State governments drew the PUMA boundaries at the time of Census 2000.
They were used for the Census 2000 sample PUMS files and are known as the 5 percent PUMAs.
(For more information on these geographic areas, go to <http://www.census.gov/prod/cen2000
/doc/pums.pdf>.)
The Census Bureau has released a 1-year PUMS file from the ACS since the survey's inception. In
addition to the 1-year ACS PUMS file, the Census Bureau plans to create multiyear PUMS files from
the ACS sample, starting with the 2005-2007 3-year PUMS file. The multiyear PUMS files combine
annual PUMS files to create larger samples in each PUMA, covering a longer period of time. This
will allow users to create estimates that are more statistically reliable.
13.6 GENERATION OF DATA PRODUCTS
Following conversations with users of census data, the subject matter analysts in the Census
Bureau's Housing and Household Economic Statistics Division and Population Division specify the
organization of the ACS data products. These specifications include the logic used to calculate
every estimate in each data product and the exact textual description associated with each esti-
mate. Starting with the 2006 ACS data release, only limited changes to these specifications have
occurred. Changes to the data product specifications must preserve the ability to compare esti-
mates from one year to another and must be operationally feasible. Changes must be made no
later than late winter of each year to ensure that the revised specifications are finalized by the
spring of that year and ready for the data releases beginning in the late summer of the year.
After the edited data with the final weights are available (see Chapters 10 and 11), generation of
the data products begins with the creation of the detailed tables data products with the 1-year
period estimates. The programming teams of the American Community Survey Office (ACSO) gen-
erate these estimates. Another staff within ACSO verifies that the estimates comply with the speci-
fications from subject matter analysts. Both the generation and the verification activities are auto-
mated.
The 1-year data products are released on a phased schedule starting in the summer. Currently, the
Census Bureau plans to release the multiyear data products late each year, after the release of the
1-year products.
One distinguishing feature of the ACS data products system is that standard errors are calculated
for all estimates and are released alongside them in the tables. Subject matter analysts also use the
standard errors in their internal reviews of estimates.
Disclosure Avoidance
Once plans are finalized for the ACS data products, the DRB reviews them to assure that confiden-
tiality of respondents has been protected.
Title 13 of the United States Code (U.S.C.) is the basis for the Census Bureau's policies on disclo-
sure avoidance. Title 13 says, "Neither the Secretary, nor any other officer or employee of the
Department of Commerce may make any publication whereby the data furnished by any particular
establishment or individual under this title can be identified . . ." The DRB reviews all data prod-
ucts planned for public release to ensure adherence to Title 13 requirements, and may insist on
applying disclosure avoidance rules that could result in the suppression of certain measures for
small geographic areas. (More information about the DRB and its policies can be found at
<http://www.factfinder.census.gov/jsp/saff/SAFFInfo.jsp?_pageId=su5_confidentiality>.)
To satisfy Title 13 U.S.C., the Census Bureau uses several statistical methodologies during tabula-
tion and data review to ensure that individually identifiable data will not be released.
Swapping. The main procedure used for protecting Census 2000 tabulations was data swap-
ping. It was applied to both short-form (100 percent) and long-form (sample) data indepen-
dently. Currently, it also is used to protect ACS tabulations. In each case, a small percentage of
household records is swapped. Pairs of households in different geographic regions are
swapped. The selection process for deciding which households should be swapped is highly
targeted to affect the records with the most disclosure risk. Pairs of households that are
swapped match on a minimal set of demographic variables. All data products (tables and
microdata) are created from the swapped data files.
For PUMS data the following techniques are employed in addition to swapping:
• Top-coding is a method of disclosure avoidance in which all cases in or above a certain per-
centage of the distribution are placed into a single category (a simplified sketch appears below,
after this list).
• Geographic population thresholds prohibit the disclosure of data for individuals or HUs for
geographic units with population counts below a specified level.
• Age perturbation (modifying the age of household members) is required for large households
containing 10 people or more due to concerns about confidentiality.
• Detail for categorical variables is collapsed if the number of occurrences in each category
does not meet a specified national minimum threshold.
For more information on disclosure avoidance techniques, see Section 5, "Current disclosure
avoidance practices," at <http://www.census.gov/srd/papers/pdf/rrs2005-06.pdf>.
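
A much-simplified sketch of the top-coding idea follows (illustrative only; the actual cutoffs
and replacement values are set variable by variable under DRB rules):

    def top_code(values, percentile=0.95):
        """Replace every value at or above the given percentile of the observed
        distribution with the mean of those top-coded cases."""
        ordered = sorted(values)
        cutoff = ordered[int(percentile * (len(ordered) - 1))]
        top = [v for v in values if v >= cutoff]
        fill = sum(top) / len(top)
        return [fill if v >= cutoff else v for v in values]

    if __name__ == "__main__":
        incomes = [28000, 30000, 35000, 41000, 52000, 60000, 75000, 98000, 450000, 1200000]
        print(top_code(incomes, percentile=0.9))   # the two largest values are replaced
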
The DRB also may determine that certain tables are so detailed that other restrictions are required
to ensure that there is sufficient sample to avoid revealing information on individual respondents.
In such instances, a restriction may be placed on the size of the geographic area for which the
table can be published. Current DRB rules require that detailed tables containing more than 100
detailed cells may not be released below the census tract level.
The data products released in the summer of 2006 for the 2005 sample covered the HU popula-
tion of the United States and Puerto Rico only. In January 2006, data collection began for the
population living in GQ facilities. Thus, the data products released in summer 2007 (and each year
thereafter) covered the entire resident population of the United States and Puerto Rico. Most esti-
mates for person characteristics covered in the data products were affected by this expansion. For
the most part, the actual characteristics remained the same, and only the description of the popu-
lation group changed from HU to resident population.
Data Release Rules
Even with the population size thresholds described earlier, in certain geographic areas some very
detailed tables might include estimates with unacceptable reliability. Data release rules, based on
the statistical reliability of the survey estimates, were first applied in the 2005 ACS. These release
rules apply only to the 1- and 3-year data products.
The main data release rule for the ACS tables works as follows. Every detailed table consists of a
series of estimates. Each estimate is subject to sampling variability that can be summarized by its
standard error. If more than half of the estimates in the table are not statistically different from 0
(at a 90 percent confidence level), then the table fails. Dividing the standard error by the estimate
yields the coefficient of variation (CV) for each estimate. (If the estimate is 0, a CV of 100 percent
is assigned.) To implement this requirement for each table at a given geographic area, CVs are cal-
culated for each tables estimates, and the median CV value is determined. If the median CV value
for the table is less than or equal to 61 percent, the table passes for that geographic area and is
published; if it is greater than 61 percent, the table fails and is not published.
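
In code, the main release rule reduces to a median test on the coefficients of variation. The
sketch below is illustrative only, not the Bureau's production code; the example estimates and
margins of error are taken loosely from rows of the Autauga County table in Appendix 4, with
standard errors recovered from the 90 percent MOEs.

    from statistics import median

    Z90 = 1.645  # 90 percent MOE = 1.645 * standard error

    def passes_release_rule(estimates, standard_errors, cv_cutoff=0.61):
        """Pass the table if the median coefficient of variation (CV) is at most 61
        percent, assigning a CV of 100 percent whenever an estimate is zero."""
        cvs = [1.0 if est == 0 else se / est
               for est, se in zip(estimates, standard_errors)]
        return median(cvs) <= cv_cutoff

    if __name__ == "__main__":
        estimates = [561, 116, 56, 60, 0]
        std_errors = [moe / Z90 for moe in (157, 87, 58, 68, 119)]
        print(passes_release_rule(estimates, std_errors))   # False: median CV is about 63 percent
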
Whenever a table fails, a simpler table that collapses some of the detailed lines together can be
substituted for the original. If the simpler table passes, it is released. If it fails, none of the esti-
mates for that table and geographic area are released. These release rules are applied to single-
and multiyear period estimates based on 3 years of sample data. Current plans are not to apply
data release rules to the estimates based on 5 years of sample data.
13.7 DATA REVIEW AND ACCEPTANCE
After the editing, imputation, data products generation, disclosure avoidance, and application of
the release rules have been completed, subject matter analysts perform a final review of the ACS
data and estimates before release. This final data review and acceptance process helps to ensure
that there are no missing values, obvious errors, or other data anomalies.
Each year, the ACS staff and subject matter analysts generate, review, and provide clearance of all
ACS estimates. At a minimum, the analysts subject their data to a specific multistep review pro-
cess before they are cleared and released to the public. Because of the short time available to
review such a large amount of data, an automated review tool (ART) has been developed to facili-
tate the process.
ART is a computer application that enables subject matter analysts to detect statistically signifi-
cant differences in estimates from one year to the next using several statistical tests. The initial
version of ART was used to review 2003 and 2004 data. It featured predesigned reports as well as
ad hoc, user-defined queries for hundreds of estimates and for 350 geographic areas. An ART
workgroup defined a new version of ART to address several issues that emerged. The improved
version has been used by the analysts since June 2005; it is designed to work on much larger data
sets, offers a wider range of capabilities, and responds faster to user commands. A team of
programmers, analysts, and statisticians then developed an automated tool to assist analysts in
their review of the multiyear estimates. This tool was used in 2008 for the review of the
2005-2007 estimates.
The ACSO staff, together with the subject matter analysts, also have developed two other auto-
mated tools to facilitate documentation and clearance for required data review process steps: the
edit management and messaging application (EMMA), and the PUMS management and messaging
application (PMMA). Both are used to track the progress of analysts' review activities and both
enable analysts and managers to see the current status of files under review and determine which
review steps can be initiated.
13.8 IMPORTANT NOTES ON MULTIYEAR ESTIMATES
While the types of data products for the multiyear estimates are almost entirely identical to those
used for the 1-year estimates, there are several distinctive features of the multiyear estimates that
data users must bear in mind.
First, the geographic boundaries that are used for multiyear estimates are always the boundaries as
of January 1 of the final year of the period. Therefore, if a geographic area has gained or lost terri-
tory during the multiyear period, this practice can have a bearing on the user's interpretation of
the estimates for that geographic area.
Secondly, for multiyear period estimates based on monetary characteristics (for example, median
earnings), inflation factors are applied to the data to create estimates that reflect the dollar values
in the final year of the multiyear period.
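
A simplified sketch of that adjustment follows (illustrative only; the ACS applies official
inflation factors, and the index values below are merely indicative):

    def inflate_to_final_year(values_by_year, index_by_year, final_year):
        """Express each year's dollar amounts in final-year dollars using a ratio of price indexes."""
        final_index = index_by_year[final_year]
        return {year: value * final_index / index_by_year[year]
                for year, value in values_by_year.items()}

    if __name__ == "__main__":
        earnings = {2005: 30000, 2006: 31000, 2007: 32000}   # hypothetical median earnings
        cpi = {2005: 195.3, 2006: 201.6, 2007: 207.3}        # indicative index values
        print(inflate_to_final_year(earnings, cpi, 2007))
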
Finally, although the Census Bureau tries to minimize the changes to the ACS questionnaire, these
changes will occur from time to time. Changes to a question can result in the inability to build cer-
tain estimates for a multiyear period containing the year in which the question was changed. In
addition, if a new question is introduced during the multiyear period, it may be impossible to
make estimates of characteristics related to the new question for the multiyear period.
13.9 CUSTOM DATA PRODUCTS
The Census Bureau offers a wide variety of general-purpose data products from the ACS designed
to meet the needs of the majority of data users. They contain predefined sets of data for standard
census geographic areas. For users whose data needs are not met by the general-purpose prod-
ucts, the Census Bureau offers customized special tabulations on a cost-reimbursable basis
through the ACS custom tabulation program. Custom tabulations are created by tabulating data
from ACS edited and weighted data files. These projects vary in size, complexity, and cost,
depending on the needs of the sponsoring client.
Each custom tabulation request is reviewed in advance by the DRB to ensure that confidentiality is
protected. The requestor may be required to modify the original request to meet disclosure avoid-
ance requirements. For more detailed information on the ACS Custom Tabulations program, go to
<http://www.census.gov/acs/www/Products/spec_tabs/index.htm>.
