AMERICAN SOCIOLOGICAL REVIEW
IS IT OUTLIER DELETION OR IS IT SAMPLE
TRUNCATION? NOTES ON SCIENCE AND
SEXUALITY*
(Reply to Kahn and Udry, ASR, October, 1986)
GUILLERMINA JASSO
University of Minnesota
numerical results which appear, variously, "intuitively appealing" or
"appealingly counterintuitive" or, conversely, may, like Kahn and Udry, be
"troubled by . . . substantive findings." Relatively objective criteria such as
those embodied in the classical properties-of-estimators literature operate to
guard against too heavy a reliance on subjective judgments.
Questions and Definitions
COMMENTS
exactly linearly dependent regressors produces at
most two estimates jointly containing the effects of
the three elements of the trio; hence, such
estimates may be interpreted as indicating the
corresponding effects only if zero-restriction
assumptions are imposed, that is, only if it is
reasonable to argue that one of the three elements
in the trio has no effect on the dependent variable.
Thus, due to the constraints currently imposed by
the phenomena, combined with the associated
constraints of ordinary language and of statistics,
interpretation of the results of demographic studies
before Jasso (1985) is not always unambiguous.
Jasso's procedure explicitly eschews zero restrictions;
the estimates are obtained by (i) specifying a
fixed-effects model so as to separate the time-varying
from the time-invariant (including cohort)
factors, and (ii) using nonlinear transformations in
order to break the exact linear relations induced by
the fixed-effects model in the time-varying factors
measured along the time dimension. Therefore,
Kahn and Udry's claim that Jasso's results
"contradict all previous research" would appear to
be somewhat exaggerated.
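
The exact linear dependence described above, and the way a nonlinear transformation breaks it, can be illustrated with a minimal sketch. The trio used here (age, period, birth cohort, with age = period - birth year), the birth years, and the survey years are assumptions chosen for illustration; the sketch does not reproduce Jasso's fixed-effects specification. It only checks the rank of two small design matrices.

```python
import math

def matrix_rank(rows, tol=1e-9):
    """Rank of a small matrix via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        if rank == len(m):
            break
        # Choose the row at or below `rank` with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # this column is a linear combination of earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, len(m[0])):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

# Hypothetical birth years and survey years (not taken from the article).
births = [1930, 1940, 1945, 1950]
waves = [1970, 1975]

linear_design = []       # columns: constant, age, period, cohort
transformed_design = []  # columns: constant, log(age), period, cohort
for birth in births:
    for wave in waves:
        age = wave - birth
        period, cohort = wave - 1970, birth - 1940  # centered to keep numbers small
        linear_design.append([1, age, period, cohort])
        transformed_design.append([1, math.log(age), period, cohort])

rank_linear = matrix_rank(linear_design)            # one column is redundant
rank_transformed = matrix_rank(transformed_design)  # all four columns are usable
print(rank_linear, rank_transformed)
```

With age entered linearly, the four columns span only three dimensions, so at most two of the three effects can be estimated alongside the constant. Replacing age by its logarithm (one possible nonlinear transformation, assumed here) restores full column rank.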
Further, it is noteworthy that the prevailing
conjecture in the fields of medicine and allied
clinical specialties, as among poets, novelists, and
confessors (and observe that clinicians, poets,
novelists, and confessors attend to the behavior of
particular individuals over long periods of time), is
exactly consistent with Jasso's estimates.1 In brief,
the effects of age (pure age, not the passage of
time) on sexual responsiveness are believed to be
nonmonotonic (initially positive, subsequently
negative) for both males and females; the curves
differ, however, in both the shape and the timing
of the maximum, the latter believed to occur rather
early in the male (before age twenty) and rather
late in the female (possibly as late as age forty).
Discussion and some graphic quantification are
provided in Kinsey et al. (1948, 1953), Boalt
(1978), Gebhard (1978), and Kaplan (1986). Note
that, even if this view is correct, it is not
inconsistent with a process in which female
physical decline, though similar to the male's, is
masked by diminution of sexual inhibitions
(Gebhard 1978) and/or increased self-awareness
(Kaplan 1986) and contraceptive competence; nor
is it inconsistent with different processes in
different societies and/or historical periods.
With respect to the effects of marital duration,
Jasso's results are indeed negative and strongly
statistically significant and thus, contrary to Kahn
and Udry's reading, do not contradict the previous
work.

1 Serious issues would be raised for the sociology of
science were it the case that individuals who choose
medicine as a vocation and individuals who choose
demography as a vocation differ systematically in their
basic views of human nature.
ESTIMATION PROCEDURE I: OUTLIERS
Outliers and Influentials: Diagnostics and
Treatment
What is an outlier? How can one tell whether an
outlier (or, for that matter, any other observation)
is influential? And what does one do about it? The
problem arises because ordinary least squares,
being a solution which minimizes the sum of
squared residuals, thereby allows observations
with large residuals to be relatively more influential
in the estimate of the slope. It is important to
note that an outlier, which Beckman and Cook
(1983, p. 121) and Barnett (1983, p. 150) describe
as an observation "that appears surprising or
discrepant to the investigator,"2 is not necessarily
influential and, as well, that inliers may be
influential. Contrary to Kahn and Udry, the degree
of influence depends on the residual and not on the
value of a single measured variable. Detecting the
presence of influentials is not as easy as noticing
outliers, in part because "influence" has meaning
only relative to a model, in part because detection
is an iterative process, each perturbation potentially
generating a new set of influentials, and in
part because the available formal diagnostics,
numerical (of which Hocking (1983) lists nine) as
well as graphical, are sensitive to a variety of
assumptions (e.g., about the number of outliers
present in the data), appear to target different
aspects of the data, and are subject to "masking"
and "swamping" processes (whereby some
noninfluentials are falsely identified as influentials
and some influentials are falsely absolved). (See
Belsley et al. 1980; Cook and Weisberg 1982;
Beckman and Cook 1983; Barnett 1983; Hocking
1983; Weisberg 1983.)
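
The distinction between an outlier and an influential observation can be made concrete with a small sketch. It assumes a simple one-regressor model and uses Cook's distance as one common numerical diagnostic; the article does not say which diagnostics were applied, and the data below are invented. Two data sets share the same extreme value of x: in the first the extreme case lies on the line fitted to the other cases, in the second it lies far off that line.

```python
def ols_diagnostics(x, y):
    """Simple-regression slope, leverages, and Cook's distances."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # estimated coefficients: intercept and slope
    s2 = sum(e * e for e in resid) / (n - p)
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance: squared residual scaled by p*s2, times h / (1 - h)^2.
    cooks = [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]
    return slope, leverage, cooks

# Invented data: ten cases scattered around y = 1 + 2x.
base_x = [float(v) for v in range(10)]
noise = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, -0.5, 0.5, -0.5, 0.5]
base_y = [1 + 2 * xi + ni for xi, ni in zip(base_x, noise)]

# Case A: an extreme value (x = 30) that agrees with the model (y = 61).
slope_a, lev_a, cooks_a = ols_diagnostics(base_x + [30.0], base_y + [61.0])

# Case B: the same extreme x, but a response far from the model (y = 20).
slope_b, lev_b, cooks_b = ols_diagnostics(base_x + [30.0], base_y + [20.0])

print(round(slope_a, 3), round(slope_b, 3))
print(cooks_a[-1], cooks_b[-1])
```

In case A the extreme observation would strike an investigator as an outlier on x, yet deleting it leaves the slope unchanged and its Cook's distance is essentially zero. In case B the same x value carries a large residual relative to the model, and the slope moves substantially.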
Even if diagnostics unambiguously indicate the
presence of influentials, the appropriate course of
action is far from clear; alternatives include
retention, downweighting, and, as Barnett (1983,
p. 150) puts it, "the extreme resort of rejection."
Given that, as Beckman and Cook (1983, p. 145)
point out, "Little information is available on the
performance of the usual least squares estimators
in combination with rejection via formal tests,"
statisticians' misgivings about deletion (e.g.,
Freedman et al. (1978); Beckman and Cook
(1983); Prescott (1983); Hocking (1983); Zellner
(1984)) would appear reasonable.3 Indeed, Belsley
et al. (1980, p. 16) caution: "It should be obvious
that an influential point is legitimately deleted
altogether only if, once identified, it can be shown
to be uncorrectably in error. Often no action is
warranted, and when it is, the appropriate action is
usually more subtle than simple deletion."

2 The subjectivity of outlier identification and rejection
is highlighted in Collett and Lewis (1976), who
document both between- and within-individual variation
in judgments concerning outliers and investigate the
determinants of the decision that a particular observation
is an outlier.
3 See also Bollen and Jackman (1985, p. 538).
Outlier Deletion: Induced Sample Truncation?
Consider now the Kahn and Udry procedure. They
begin by asserting certitude that the four rightmost
values of CF are the result of keypunching errors
and, based on the results from two diagnostics,
proceed to obtain two sets of estimates, labeled (3)
and (4) in their table, in which they eliminate from
the sample, first, those four cases, and, second, an
additional four observations.4 To learn what effect,
if any, these two procedures have on the properties
of the obtained estimates, we write the estimating
equation in two equivalent ways:
(of which valuable exposition is found in Maddala
(1983)), arises as follows: When the sample
contains only cases in which the dependent
variable is confined to a specified segment of its
range, there arises a correlation between the error
term and the regressors. In the presence of such
correlation, OLS estimates are biased. Therefore,
if the outlying observations are correct, then all of
Kahn and Udry's specifications yield biased
estimates of the underlying parameters.
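
The mechanism just described can be seen in a stylized simulation. Everything in it is an assumption made for illustration: the one-regressor model, the true slope of 1, the error standard deviation, the cutoff, and above all the severity of the truncation, which is far heavier than the deletion of a handful of cases. The sketch only shows the direction of the effect when correct observations are dropped because the dependent variable exceeds a ceiling.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

def covariance(a, b):
    """Sample covariance (divisor n) between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / n

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_SLOPE = 1.0         # assumed population slope
CUTOFF = 8.0             # assumed ceiling on the dependent variable
N = 20000

x = [rng.uniform(0, 10) for _ in range(N)]
u = [rng.gauss(0, 2) for _ in range(N)]          # error term, independent of x
y = [TRUE_SLOPE * xi + ui for xi, ui in zip(x, u)]

# Truncate on the dependent variable: keep only cases with y at or below the cutoff.
kept = [(xi, ui, yi) for xi, ui, yi in zip(x, u, y) if yi <= CUTOFF]
x_t = [k[0] for k in kept]
u_t = [k[1] for k in kept]
y_t = [k[2] for k in kept]

slope_full = ols_slope(x, y)
slope_trunc = ols_slope(x_t, y_t)
cov_xu_full = covariance(x, u)
cov_xu_trunc = covariance(x_t, u_t)

print(round(slope_full, 3), round(slope_trunc, 3))
print(round(cov_xu_full, 3), round(cov_xu_trunc, 3))
```

In the full sample the error is uncorrelated with the regressor and the slope is recovered. In the truncated sample, cases with large x survive only if their errors are negative, so the error and the regressor become negatively correlated and the slope estimate is pulled toward zero.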
Outlier Retention: Errors in Variables?
1953). Fourth, it is widely believed that there are
also quite large cultural/regional differences (Ford
and Beach 1951);7 a probability sample from a
nation of immigrants will thus exhibit this
additional source of variation. Fifth, a figure of
"88" represents a mean weekly figure of 22,
which, in turn, represents three daily plus a weekly
lagniappe, well within the bounds of the Aranda
regimen and perhaps not warranting explosion
from the overworld.8
ESTIMATION PROCEDURE II: SAMPLE
PARTITION
Kahn and Udry propose that the determination of
coital frequency differs systematically between two
subsets of the population, those couples married
less than two years and those couples married more
than two years. Space does not permit analysis of
their rationale: of why marital duration (rather
than, say, an exogenous variable or several
variables) was selected as the criterion variable for
the existence of different parameter regimes or of
why they assumed discontinuous regimes or of
why they fixed the number of regimes at two or of
why they pre-selected the cutoff point or of why
that pre-selected point was set at two (the statistical
issues and formal tests associated with such
decisions are reviewed in Judge et al.'s (1985)
discussion of varying parameter models). Instead I
show that their rationale would lead to equations
different from the ones they estimate, that, in fact,
their specifications (5) and (6) embody a rather
different view, one for which they provide no
rationale.
To see this, note that Kahn and Udry's view that
the process of CF determination differs between
couples married less than two years and couples
married more than two years would lead to
partitioning the sample by marital duration; one
subset would contain only observations in which
marital duration is less than two, the other subset
only observations in which marital duration
exceeds two. But what they do is very different.
Presumably due to unwillingness to relinquish the
differential-intercept model (for good reason, in
my view), they partition the sample not by marital
duration but rather by date of marriage. Both
subsets contain observations of marital duration
larger than two; they differ only with respect to
date of marriage. Thus, the specifications they
estimate are based on the implicit argument that CF
determination differs systematically between two
groups of couples, those who married before 1968
and those who married after 1968. I will not
presume to make such an argument. However,
observe that (ignoring truncation bias) interpretation
of the estimates obtained from Kahn and
Udry's specifications (5) and (6), given that no
attempt was made to learn the regime-defining
variables and cutoff points empirically, hinges on
the reasonableness of the assumption that couples
who chose to marry before 1968 differ in
fundamental ways from couples who chose to
marry after 1968.

7 For example, the normal regimen of the Keraki is
once a week, while that of the younger Aranda is three to
five times nightly (Ford and Beach 1951).
8 This regimen sets the stage for Graham Greene's
classic seduction in his short story, "Chagrin in Three
Parts."
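
The difference between partitioning by marital duration and partitioning by date of marriage can be sketched with a toy two-wave panel. The survey years (1970 and 1975) and the marriage years are assumptions made for illustration; only the 1968 cutoff and the two-year duration threshold come from the discussion above.

```python
WAVES = (1970, 1975)    # hypothetical survey years, about five years apart
CUTOFF_YEAR = 1968      # marriage-date cutoff named in the text
DURATION_CUTOFF = 2     # marital-duration threshold, in years

# Hypothetical couples, identified by year of marriage.
marriage_years = [1960, 1965, 1967, 1969]

by_date = {"before 1968": [], "after 1968": []}
by_duration = {"under 2": [], "over 2": []}

for married in marriage_years:
    date_group = "before 1968" if married < CUTOFF_YEAR else "after 1968"
    for wave in WAVES:
        duration = wave - married
        # Partition by date of marriage: both of a couple's observations stay together.
        by_date[date_group].append(duration)
        # Partition by marital duration: a couple's two observations may be split.
        duration_group = "under 2" if duration < DURATION_CUTOFF else "over 2"
        by_duration[duration_group].append(duration)

print(by_date)
print(by_duration)
```

Under these assumptions the couple married in 1969 contributes a duration of one year at the first wave and six years at the second, so the "after 1968" subset contains an observation with marital duration well above two. The two subsets defined by date of marriage therefore do not correspond to the two duration regimes.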
ESTIMATION PROCEDURE III: THE PERIOD
EFFECT
There is a straightforward way to check whether
the period effects reported in Jasso (1985) embody
a month effect: in equation (1.a) replace the
decimal-year date-of-observation variable with an
integer-year variable; or in equation (1.b) replace
the (variable) time between interviews by the
(constant) number of years between waves.9 Early
versions of Jasso (1985) had integer-year in the
specifications; its effect in the equations corresponding
to the six published specifications ranges from
-0.23 to -0.22 in the equations with the single
passage-of-time factor and from -0.72 to -0.70
in the equations with the decomposition, compared
to estimates of -0.24 to -0.22 and -0.75
to -0.72, respectively, obtained with the more
refined coding. This almost perfect stability
indicates that Jasso's estimated period effects are
not capturing a month effect but rather the
contemporaneous influences associated with each
of the two survey periods approximately five years
apart.
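
The relation between the integer-year coefficient and the intercept of a first-differences equation can be written out as a short derivation. The notation is an assumption introduced for illustration (the article's own equations are not reproduced here): $y_{it}$ is the outcome for couple $i$ at wave $t$, $\alpha_i$ a couple-specific fixed effect, $T_t$ the integer year of wave $t$, $x_{it}$ the remaining time-varying regressors, and $k$ the constant number of years between waves.

```latex
\begin{align*}
y_{it} &= \alpha_i + \delta\, T_t + x_{it}'\beta + \varepsilon_{it},
          \qquad t = 1, 2, \\
y_{i2} - y_{i1} &= \delta\,(T_2 - T_1) + (x_{i2} - x_{i1})'\beta
          + (\varepsilon_{i2} - \varepsilon_{i1}) \\
       &= \delta k + \Delta x_i'\beta + \Delta\varepsilon_i,
          \qquad k = T_2 - T_1 .
\end{align*}
```

Because $k$ is the same for every couple, $\delta k$ plays the role of the intercept in the differenced equation. As a purely illustrative calculation, taking $k = 5$ (the waves are described only as approximately five years apart), integer-year coefficients of $-0.23$ to $-0.22$ would correspond to intercepts of about $-1.15$ to $-1.10$.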
9 An equivalent procedure is to estimate the first-differences
equation omitting the period term and without
suppressing the constant. The estimated intercept then
represents the time trend; it is equal to the product of the
coefficient of the integer-year variable and a factor equal
to the number of years between waves.

REFERENCES
Barnett, Vic. 1983. "Discussion of Beckman and
Cook." Technometrics 25:150-52.
Beckman, R.J., and R.D. Cook. 1983. "Outlier..........s."
Technometrics 25:119-49.
Belsley, David A., Edwin Kuh, and Roy E. Welsch.
1980. Regression Diagnostics: Identifying Influential
Data and Sources of Collinearity. New York: Wiley.
Blalock, Hubert M., Jr. 1966. "The Identification
Problem and Theory Building: the Case of Status
Inconsistency." American Sociological Review 31:52-61.
Boalt, Gunnar H.R. 1978. "Family and Marriage." Pp.
155-61 in The New Encyclopedia Britannica,
Macropaedia 7. Fifteenth edition. Chicago: Britannica.
Bollen, Kenneth A., and Robert W. Jackman. 1985.
"Regression Diagnostics: An Expository Treatment of
Outliers and Influential Cases." Sociological Methods
and Research 13:510-42.
Collett, D., and T. Lewis. 1976. "The Subjective Nature
of Outlier Rejection Procedures." Applied Statistics
25:228-37.
Cook, R. Dennis, and Sanford Weisberg. 1982.
Residuals and Influence in Regression. New York:
Chapman and Hall.
Duncan, Otis Dudley. 1966. "Methodological Issues in
the Analysis of Social Mobility." Pp. 90-95 in Social
Structure and Mobility in Economic Development,
edited by Neil J. Smelser and Seymour M. Lipset.
Chicago: Aldine.
Ford, Clellan S., and Frank A. Beach. 1951. Patterns of
Sexual Behavior. New York: Harper.
Freedman, David, Robert Pisani, and Roger Purves.
1978. Statistics. New York: Norton.
Gebhard, Paul H. 1978. "Sexual Behavior, Human."
Pp. 593-601 in The New Encyclopedia Britannica,
Macropaedia 16. Fifteenth edition. Chicago: Britannica.
Greene, Graham. 1978. "Chagrin in Three Parts." Pp.
45-52 in May We Borrow Your Husband? and Other
Comedies of the Sexual Life. New York: Penguin. First
published in 1967.
Hocking, Ronald R. 1983. "Developments in Linear
Regression Methodology: 1959-1982." Technometrics
25:219-30.
Jasso, Guillermina. 1985. "Marital Coital Frequency and
the Passage of Time: Estimating the Separate Effects
of Spouses' Ages and Marital Duration, Birth and
Marriage Cohorts, and Period Influences." American
Sociological Review 50:224-41.
Judge, George G., W.E. Griffiths, R. Carter Hill,
Helmut Lutkepohl, and Tsoung-Chao Lee. 1985. The
Theory and Practice of Econometrics. Second Edition.
New York: Wiley.
Kahn, Joan R., and J. Richard Udry. 1986. "Marital
Coital Frequency: Unnoticed Outliers and Unspecified
Interactions Lead to Erroneous Conclusions (Comment
on Jasso, ASR, April 1985)." American Sociological
Review 51:000-000.
Kaplan, Helen S. 1986. "Sexual Relationships in Middle
Age: Comparative Physiologic Changes in Women and
Men." The Journal of Clinical Practice in Sexuality
2:21-28.
Kinsey, Alfred C., Wardell B. Pomeroy, and Clyde E.
Martin. 1948. Sexual Behavior in the Human Male.
Philadelphia: Saunders.
Kinsey, Alfred C., Wardell B. Pomeroy, Clyde E.
Martin, and Paul H. Gebhard. 1953. Sexual Behavior
in the Human Female. Philadelphia: Saunders.
Kmenta, Jan. 1971. Elements of Econometrics. New
York: Macmillan.
Maddala, G. S. 1983. Limited-Dependent and Qualitative
Variables in Econometrics. Cambridge: Cambridge
University Press.
Prescott, Phillip. 1983. "Discussion." Technometrics
25:156-57.
Trussell, James, and Charles F. Westoff. 1980.
"Contraceptive Practice and Trends in Coital Frequency."
Family Planning Perspectives 12:246-49.
Weisberg, Sanford. 1983. "Some Principles for Regression
Diagnostics and Influence Analysis (Discussion of
Hocking)." Technometrics 25:240-44.
Zellner, Arnold. 1984. Basic Issues in Econometrics.
Chicago: University of Chicago Press.