
Sociological Methods & Research http://smr.sagepub.com/

Constrained Estimators and Age-Period-Cohort Models


Robert M. O'Brien Sociological Methods & Research 2011 40: 419 originally published online 29 July 2011 DOI: 10.1177/0049124111415367 The online version of this article can be found at: http://smr.sagepub.com/content/40/3/419

Published by:
http://www.sagepublications.com


Version of Record - Aug 15, 2011; Proof - Jul 29, 2011

Downloaded from smr.sagepub.com at William Paterson Univ of NJ on November 4, 2011

Article

Constrained Estimators and Age-Period-Cohort Models


Robert M. O'Brien

Sociological Methods & Research 40(3) 419-452 © The Author(s) 2011 Reprints and permission: sagepub.com/journalsPermissions.nav DOI: 10.1177/0049124111415367 http://smr.sagepub.com

Abstract
If a researcher wants to estimate the individual age, period, and cohort coefficients in an age-period-cohort (APC) model, the method of choice is constrained regression, which includes the intrinsic estimator (IE) recently introduced by Yang and colleagues. To better understand these constrained models, the author shows algebraically how each constraint is associated with a specific generalized inverse, which in turn is associated with a particular solution vector that (when the model is just identified under the constraint) produces the least squares solution to the APC model. The author then discusses the geometry of constrained estimators in terms of solutions being orthogonal to constraints, solutions to various constraints all lying on a single line in multidimensional space, the distance on that line between various solutions, and the crucial role of the null vector. This provides insight into what characteristics all constrained estimators share and what is unique about the IE. The first part of the article focuses on constrained estimators in general (including the IE), and the latter part compares and contrasts the properties of traditionally constrained APC estimators and the IE. The author concludes with some cautions and suggestions for researchers using and interpreting constrained estimators.

Keywords
age-period-cohort models, constrained model estimation, intrinsic estimator

University of Oregon, Eugene, OR, USA

Corresponding Author: Robert M. O'Brien, 720 PLC, University of Oregon, Eugene, OR 97403 Email: bobrien@uoregon.edu


Constrained estimators are the most popular approach to estimating the individual coefficients in age-period-cohort (APC) accounting models.¹ Conventionally, a single constraint is placed on these models to make them just identified. Unfortunately, since each of the constrained models fits the data equally well, the validity of the chosen constraint cannot be judged from the model fit. Instead, in the conventionally constrained APC models, the researcher must decide on the basis of theory or past research whether the constraint used to identify the model is appropriate. Setting the constraint in a defensible manner, however, is the Achilles heel of this approach, and the constraint chosen can greatly affect the estimated coefficients.

Almost 30 years ago, Kupper et al. (1983) discussed a constrained estimator for age, period, and cohort effects that they called the principal components estimator. This estimator produces coefficients identical to those of the recently introduced intrinsic estimator (IE), which is also a constrained estimator. Kupper et al. (1983) note, as proved by Yang, Fu, and Land (2004) for the IE, that the principal components estimator has minimum variance. Beyond this, however, Kupper et al. (1983) did not appear to believe it had any special usefulness in the analysis of APC data, pointing out that using this estimator could lead to more bias (in the sense of differences between the expected value of the estimates and the true underlying generating parameters) than the use of some other constraint. Because of its minimum variance property and other properties, Yang and associates recommend using the IE in the analysis of APC data.

Yang et al. (2004) correctly note the important contributions of Fu (Fu 2000; Fu, Hall, and Rohan 2004; Knight and Fu 2000) in the development of the IE and in investigating its properties. It is Fu who has most extensively published work on the IE. Yang et al. (2004) introduced the IE to sociologists in Sociological Methodology, and it was further described in an article in the American Journal of Sociology (Yang et al. 2008). Yang and associates (2008:1699) view the IE as a general-purpose method of APC analysis with potentially wide applicability in the social sciences. They review several properties of the IE (many of which, as I discuss in the following, are shared with other constrained estimators).

Given the widespread use of constrained APC models and the introduction of a new and arguably improved constraint into the sociological literature, this article explicates and clarifies the basic properties of constrained linear models as they are used in the APC context. This should help researchers in several ways: (1) understanding the process of choosing an appropriate constraint, (2) comprehending the relationship of the results based on using different constraints, (3) untangling which characteristics are unique to the IE and which it shares with other constrained estimators, and (4) facilitating a comparison of the IE to other constrained estimators.

I begin with a brief discussion of the general APC accounting model and the identification problem and then move to the use of generalized inverses, which implement constrained estimation. These inverses provide a set of solutions even though the unconstrained model is not identified. I then show how these constraints can be implemented in different ways. Next I turn to the geometry of constrained estimation with the aim of developing a common framework for evaluating different constrained estimators. I then address, among other issues, bias in constrained estimators, the relationship of the constraint to the results it produces, the relationship of the IE to other constrained estimators, and some unique features produced by the constraint used in the IE approach. That discussion is followed by a simulation designed to show some of these properties and to demonstrate how a series of different constraints works in practice. I conclude with an evaluation of the potential advantages of the IE relative to other constrained estimators and one potentially helpful situation in which researchers might gain confidence from using different constrained estimators.

In the next two sections I briefly describe the accounting (multiple classification) model utilized in this literature and the identification problem that analysts using this model face.

Multiple Classification Analysis for APC Models


Age groups, periods, and cohorts are often coded with dummy variables by APC analysts. In this article I use a similar coding scheme: effect coding. I employ this coding because it is used in the IE approach (Kupper et al. 1983, 1985; Yang et al. 2004, 2008). This coding (like dummy variable coding) does not constrain the functional form of the relationship between the dependent variable and the age groups, periods, and cohorts. I use the multiple classification model represented in equation (1) throughout this article,²

\[ Y_{ij} = \mu + \alpha_i + \pi_j + \chi_{a-i+j} + \epsilon_{ij}, \tag{1} \]

where $Y_{ij}$ is the age-period-specific value of the dependent variable, $\mu$ represents the intercept, $\alpha_i$ represents the effect of the $i$th age group, $\pi_j$ denotes the effect of the $j$th period, $\chi_{a-i+j}$ represents the effect of the $(a-i+j)$th cohort (where $a$ equals the number of age groups), and $\epsilon_{ij}$ represents the error term [$E(\epsilon_{ij}) = 0$]. I use age as the row variable and period as the column variable, so that cohorts lie on the diagonals (see Table 1 for an example age-period data matrix).

Table 1. Age-Period Table With Data Generated as Described in the Text

         Period 1   Period 2   Period 3   Period 4   Period 5
Age 1      10.5       11.0       11.5       14.0       14.5
Age 2      10.0       10.5       11.0       13.5       14.0
Age 3       9.0        9.0        9.5       12.0       12.5
Age 4       8.0        8.0        8.0       10.5       11.0
Age 5       7.0        7.0        7.0        9.0        9.5

Rank Deficient Matrix


The identification problem in APC analysis arises because of the linear dependency between the age, period, and cohort variables. If we know a person's age group and the period, we can determine their birth cohort (C = P - A); if we know their age group and cohort, we can determine the period (P = C + A); and if we know their birth cohort and the period, we can determine their age (A = P - C). This is the linear dependency that prevents a unique solution to the regression of the dependent variable on the effect coded (or dummy coded) variables for age groups, periods, and cohorts (even when we have one reference category for each set of effect coded [dummy coded] variables). In matrix notation, we can represent the multiple classification equation (1) as:

\[ Y = Xb + \epsilon, \tag{2} \]

where $Y$ is an $ap \times 1$ vector of the age-period-specific rates (or counts, or some other form of dependent variable). The X-matrix is an $ap \times [2(a+p)-3]$ matrix containing ones in the first column and effect coded variables in the remaining columns. The order of the columns can be schematized as $[1, (a-1), (p-1), (a+p-2)]$, where $a-1$ represents the number of age effect coded variables, $p-1$ represents the number of period effect coded variables, and $a+p-2$ represents the number of cohort effect coded variables. Each of these effect coded variables has $ap$ entries in its column (zeros and ones, and minus one in the rows corresponding to the reference category) and is coded to correspond to the cell in which the age-period-specific value of $Y$ resides.³ The last categories of age, period, and cohort serve as the reference categories in our representation.

When a regular inverse exists, solving an equation of the form of (2) for a unique set of least squares regression coefficients is trivial:

\[ \hat{b} = (X'X)^{-1} X'Y, \tag{3} \]

where $X'$ is the transpose of $X$ and the superscripted $-1$ indicates the inverse. The problem with this solution in the APC context occurs because of the linear dependency in the X-matrix, which means that the standard inverse of $X'X$ does not exist. If we label the number of columns in $X$ as $m$ [$= 2(a+p)-3$], only $m-1$ of these columns are linearly independent. This linear dependency means that there is a set of coefficients that, when multiplied by the columns of $X$ and summed, produces a column vector of zeros (with the restriction that not all of these multiplying coefficients are zero). This vector of coefficients is called the null vector. In general, one and only one such vector of coefficients exists for the APC model (this vector is unique up to multiplication by a scalar). In the language of linear algebra, we say that there is a nontrivial solution to the $a \times p$ homogeneous equations, and we can write $Xv = 0$, where $X$ is the $ap \times [2(a+p)-3]$ design matrix and $v$ is a $[2(a+p)-3] \times 1$ vector (the null vector). This vector is said to be in the null space of $X$. The existence of only one such vector indicates that the rank of $X$ is just one less than full column rank and that a single linear constraint should allow for a solution to the identification problem.
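The rank deficiency can be checked numerically. The following Python sketch builds an effect-coded design matrix for a five-by-five age-period table, computes its rank exactly, and confirms that one candidate vector satisfies $Xv = 0$. The function names (`effect_code`, `apc_design`, `rank`), the row ordering, and the integer closed form used for `v` are illustrative assumptions rather than anything given in the article, and the closed form is written for $a = p = 5$ only.

```python
from fractions import Fraction

def effect_code(level, n_levels):
    """Effect-code one factor; the last level is the reference (coded -1)."""
    if level == n_levels:
        return [-1] * (n_levels - 1)
    return [1 if level == k else 0 for k in range(1, n_levels)]

def apc_design(a, p):
    """One row per age-period cell; columns are [1, age, period, cohort]."""
    rows = []
    for j in range(1, p + 1):        # period (table column)
        for i in range(1, a + 1):    # age group (table row)
            k = a - i + j            # cohort index on the diagonal
            rows.append([1] + effect_code(i, a) + effect_code(j, p)
                        + effect_code(k, a + p - 1))
    return rows

def rank(matrix):
    """Matrix rank by exact Gaussian elimination on Fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

a, p = 5, 5
X = apc_design(a, p)
n_cols = len(X[0])                   # m = 2(a + p) - 3

# Candidate null vector before normalization (assumed closed form, a = p = 5).
v = ([0] + [i - 3 for i in range(1, a)] + [3 - j for j in range(1, p)]
     + [k - 5 for k in range(1, a + p - 1)])
Xv = [sum(x * w for x, w in zip(row, v)) for row in X]
length = sum(w * w for w in v) ** 0.5

print("columns:", n_cols, "rank:", rank(X))
print("Xv is all zeros:", all(e == 0 for e in Xv))
print("first normalized entries:", [round(w / length, 3) for w in v[:3]])
```

Dividing `v` by its length gives a unit-length version of the same vector; any nonzero multiple is equally a null vector.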

Constraints and Generalized Inverses


Even with a rank deficient matrix, it is still possible to find a least squares solution to (2) by using a generalized inverse (Searle 1971). Typically, the generalized inverse is denoted with a superscripted minus rather than a superscripted minus 1. I have added the subscript c to denote that generalized inverses differ depending upon the constraint with which they are associated: $(X'X)^{-}_{c}$. For the same reason I have subscripted the solution vector for $b$ with c. That is, a particular solution is associated with the particular generalized inverse used to solve the equation, and that inverse is associated with a particular constraint on the solution:

\[ \hat{b}_c = (X'X)^{-}_{c} X'Y. \tag{4} \]

This allows us to compute solutions in the constrained APC model, and each solution is determined by the constraint. The solution produced is a least squares solution; that is, each solution ($\hat{b}_c$) associated with a particular constraint produces the same predicted values of $Y$, and these values minimize the sum of the squared residuals. Thus, $\sum (Y - \hat{Y})^2$ is the same no matter which generalized inverse is used to solve the equation.⁴ It is well known in APC modeling that the fit cannot be used to differentiate the validity of different constraints (assuming the use of just one constraint so that the model is just identified).

The way that constrained estimation is typically employed in APC analysis is to place a single constraint on $X$ so that the model is just identified. I will use the symbol $G_c$ to represent the generalized inverse associated with a particular constraint, rather than the more awkward notation $(X'X)^{-}_{c}$. With the generalized inverse associated with a particular constraint in hand, we multiply $X'Y$ in (3) by the generalized inverse to obtain a solution: $\hat{b}_c = G_c X'Y$. This solution is determined by the constraint. Mazumdar, Li, and Bryce (1980) explicate this correspondence between a constraint, its generalized inverse, and the solution. They show how to derive a specific generalized inverse from the constraint.

When a constraint is used in the APC context, the constraint conventionally involves setting two of the coefficients associated with age, period, or cohort to be equal. The choice of the constraint then determines a generalized inverse used to solve for the age group, period, and cohort coefficients. The choice of a constraint is crucial, since it determines the set of estimated values ($\hat{b}_c$) and whether those estimated values are unbiased.
The assumption is that $c'b = 0$, where $c$ is the $m \times 1$ vector for the constraint and $b$ is the $m \times 1$ vector of population effect coefficients that generated the data.⁵ If $c'b \neq 0$, the estimate under that constraint will be biased in that its expected value will not equal the value of the parameters that generated the data (for a further explication of the bias associated with violating this assumption, see Kupper et al. 1983). This is true for the conventionally used constraints and for the constraint associated with the IE. The constraint associated with the IE is that the solution must be perpendicular to the null vector (the vector that, when pre-multiplied by $X$, produces a column vector of zeros). Specifically, if $v$ represents the null vector, the assumption is that $v'b = 0$. Since $v$ is a specific constraint, we can write $c'b = 0$, where it will be clear from the context whether $c$ is the null vector or a different constraint. To the extent that this assumption is not true, the estimates associated with this constraint will be biased in the sense that they are biased estimates of the parameters that generated the age-period-specific rates. As I discuss in the following, when Yang et al. (2004, 2008) rightly assert that their estimate is unbiased, it is an unbiased estimate of the least squares solution in the subspace that is perpendicular to the null vector. That does not mean that the IE is an unbiased estimate of the parameters that generated the age-period-specific rates, the generating parameters.

To make the discussion more concrete, Figure 1 presents three constraints. These constraints are based on a five-by-five age by period matrix, where the columns of the X matrix contain one intercept, four age group effect coded variables, four period effect coded variables, and eight cohort effect coded variables. The first two constraints are conventional constraints that might be used in APC analysis. As previously argued in the literature (Kupper et al. 1985; Mason et al. 1973; Smith 2004), the constraint should be based on theory and/or empirical evidence that the constraint applies to the population under consideration. The third constraint is the one associated with the IE. The first two constraints fix two of the effect coefficients to be equal to each other, and the third uses the null vector of $X$ as the constraint. The assumption for unbiased estimates of the generating parameters is that the dot product of the constraint vector and the vector of the parameter values associated with the process that generated the Y-values is equal to zero (the reference categories are omitted). In the first two cases this will be true, respectively, if the population parameter values for age1 and age2 are equal to each other or if the population parameter value for cohort7 equals the value for cohort8. The assumption when using the constraint associated with the null vector is more complicated (though the IE is easily solved for using an add-on program for STATA; cited in Yang et al. 2008); specifically, for this five-by-five age-period matrix, the assumption is that $0.000\,b_{\text{int}} + (-0.267\,b_{\text{age1}}) + (-0.134\,b_{\text{age2}}) + \dots + (0.401\,b_{\text{coh8}}) = 0$.
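The bias condition can be made explicit with a short derivation. This is a sketch in the notation of this section, not a formula quoted from Kupper et al. (1983). It assumes $E(\epsilon) = 0$, that $v$ spans the null space of $X$, and that $c'v \neq 0$ so that the constraint identifies the model; the scalar $t$ is introduced here only for illustration.

```latex
\begin{align*}
E(\hat{b}_c) &= G_c X'X\, b = b + t\,v
  && \text{(every solution of the normal equations has the form } b + t v\text{)} \\
c'E(\hat{b}_c) &= 0 \;\Longrightarrow\; t = -\frac{c'b}{c'v}
  && \text{(the solution is orthogonal to its constraint)} \\
E(\hat{b}_c) - b &= -\frac{c'b}{c'v}\, v
\end{align*}
```

Under these assumptions the bias is zero exactly when $c'b = 0$, and otherwise it is a scalar multiple of the null vector. The same expression covers the IE by setting $c = v$.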
We can write the assumptions associated with each of these restrictions as $c'_{(\text{age1}=\text{age2})}\,b = 0$, $c'_{(\text{coh7}=\text{coh8})}\,b = 0$, and $c'_{(\text{nullvector})}\,b = 0$. Each of these places a linear constraint on the solution. The question is which one of these constraints is best with respect to obtaining a substantively correct result (assuming that the researcher wants to determine the effects of age, period, and cohort coefficients in generating the outcomes: e.g., suicide rates, homicide rates, or marriage rates). Mazumdar et al. (1980) implement the constraints in the following manner: (1) compute $X'X$; (2) replace the last row of $X'X$ with the constraint that we chose to use, for example, $c'_{(\text{age1}=\text{age2})}$ or $c'_{(\text{nullvector})}$; (3) compute the inverse of this new matrix; and (4) replace the last column of this inverse with zeros. The result is the generalized inverse ($G_c$) associated with the particular constraint. Using this process, we can find a set of least squares solutions under a particular constraint: $\hat{b}_c = G_c X'Y$.⁶ When the null vector is substituted for the aforementioned last row of $X'X$, the generalized inverse that results is one that yields a solution for $\hat{b}_c$ that is the same as if we had used the Moore-Penrose generalized inverse: the inverse used in the IE approach (Mazumdar et al. 1980; Yang et al. 2008). This result occurs because the Moore-Penrose generalized inverse institutes this constraint, although the Moore-Penrose generalized inverse has some other valuable properties for those using matrix algebra.

             age1=age2   coh7=coh8   null vector
intercept       0.000       0.000        0.000
age1            1.000       0.000       -0.267
age2           -1.000       0.000       -0.134
age3            0.000       0.000        0.000
age4            0.000       0.000        0.134
period1         0.000       0.000        0.267
period2         0.000       0.000        0.134
period3         0.000       0.000        0.000
period4         0.000       0.000       -0.134
cohort1         0.000       0.000       -0.535
cohort2         0.000       0.000       -0.401
cohort3         0.000       0.000       -0.267
cohort4         0.000       0.000       -0.134
cohort5         0.000       0.000        0.000
cohort6         0.000       0.000        0.134
cohort7         0.000       1.000        0.267
cohort8         0.000      -1.000        0.401

Figure 1. Three different constraints on the X-matrix

Simple Implementations of Constrained Regression


Researchers wanting to implement the first two constraints in Figure 1 may choose to do so using a simple recoding of their data. To implement the first constraint, based on age group 1 and age group 2 having the same effects, one can simply create a new variable (newage1_2) that is the sum of age1 and age2 and use it in the analysis rather than age1 and age2. This is equivalent to creating a single column in X that is equal to the age1 column plus the age2 column and using this column in place of the columns for age1 and age2 in the regression analysis. The coefficient associated with newage1_2 is the coefficient for both age1 and age2. To implement the second constraint, one can add the variables for cohort7 and cohort8 to create the new variable (newcohort7_8) and use it in the analysis instead of cohort7 and cohort8. It is a straightforward extension to implement a constraint in which age1 = 0.5 × age2. One can construct this constraint by adding 0.5 × age1 to age2 (newage1_2 = age2 + 0.5 × age1). The new column generated by this transformation contains 0.5, 1, 0, 0, -1.5, 0.5, 1, . . . (where -1.5 is for the reference group, the oldest age group in our example), and this pattern repeats until there are no more rows in this column for this newly created variable. Then we can run a regular regression using newage1_2, age3, . . . , coh7, coh8 as the regressors. The coefficient for newage1_2 is the coefficient for age2, and 0.5 times that coefficient is the coefficient for age1. Another approach uses constrained regression, which employs the constraint age1 = 0.5 × age2. Constrained regression can also be used with the first two examples.

Although almost all researchers will want to use the convenient add-on program for STATA (cited by Yang et al. 2008) to calculate the IE, it is possible to implement the null vector constraint and produce the coefficient estimates from the IE using constrained regression or by transforming the columns of X using the strategy described in the previous paragraph. The null vector in Figure 1 means that 0 = 0.00 × intercept - 0.267 × a1 - 0.134 × a2 + 0.00 × a3 + 0.134 × a4 + . . . + 0.267 × c7 + 0.401 × c8. Adding 0.267 × a1 to both sides of the equation and dividing both sides by 0.267 yields a1 = 0.00 × intercept - 0.50 × a2 + 0.00 × a3 + 0.50 × a4 + . . . + 1.00 × c7 + 1.50 × c8. In this particular representation, we see that a1 is a constrained linear function of the columns of the other independent variables in X. If we implement this constraint (that is, age1 = -0.50 × age2 + 0.50 × age4 + . . . + 1.00 × coh7 + 1.50 × coh8) in a constrained regression program, we obtain the same estimates produced by the STATA-based IE program. A reviewer of this article suggested transforming the columns of independent variables in the following manner (although noting that the procedure is tedious): newage2 = age2 - 0.5 × age1; newage3 = age3; newage4 = age4 + 0.5 × age1; . . . ; newcoh7 = coh7 + 1.0 × age1; newcoh8 = coh8 + 1.5 × age1. Using these values as regressors implements the null vector constraint and produces coefficient estimates equivalent to those from using the IE.
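The column recoding described above can be sketched in a few lines of Python. The example covers only the two age-based constraints and only constructs the new columns; it does not run the regression. The helper `effect_code` and the variable names are hypothetical, and the row order (ages cycling within each period) is an assumption chosen to match the pattern quoted in the text.

```python
def effect_code(level, n_levels):
    """Effect-code one factor; the last level is the reference (coded -1)."""
    if level == n_levels:
        return [-1] * (n_levels - 1)
    return [1 if level == k else 0 for k in range(1, n_levels)]

a, p = 5, 5
# One row per age-period cell, with ages cycling fastest within each period.
age_codes = [effect_code(i, a) for j in range(1, p + 1) for i in range(1, a + 1)]
age1 = [row[0] for row in age_codes]
age2 = [row[1] for row in age_codes]

# Constraint age1 = age2: a single shared column replaces both columns.
newage1_2_equal = [x1 + x2 for x1, x2 in zip(age1, age2)]

# Constraint age1 = 0.5 * age2: weight age1 by 0.5 before adding it to age2.
newage1_2_half = [x2 + 0.5 * x1 for x1, x2 in zip(age1, age2)]

print(newage1_2_equal[:5])   # rows for the five age groups in period 1
print(newage1_2_half[:7])    # pattern 0.5, 1, 0, 0, -1.5, then it repeats
```

In either case the new column replaces age1 and age2 in the regressor list, and the coefficient for age1 is recovered afterward from the constraint.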

The Geometry of Constrained Regression


When $X'X$ is one less than full column rank, as it is in the APC situation, the solution for the $m \times 1$ vector of coefficients $b$ is in an $m$-dimensional solution space. Specifically, it lies in an $(m-1)$-dimensional subspace of that solution space that is orthogonal to its constraint. We know from the previous sections how to estimate the $m$ elements of the $b$ vector by using a generalized inverse associated with a constraint, by using constrained regression, or by algebraically applying the constraint using the columns of $X$. To provide intuition about how constrained regression works geometrically, we use a simple illustration in which $X'X$ is a $3 \times 3$ matrix. Many essential properties of constrained regression can be illuminated using this approach. In the example I use the following specific data for $Xb = Y$:

\[ \begin{bmatrix} 1 & 4 & 2 \\ 1 & 2 & 1 \\ 1 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} 19 \\ 11 \\ 19 \\ 11 \end{bmatrix}, \tag{5} \]

which produces the normal equations for $(X'X)b = X'Y$:

\[ \begin{bmatrix} 4 & 12 & 6 \\ 12 & 40 & 20 \\ 6 & 20 & 10 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} 60 \\ 196 \\ 98 \end{bmatrix}. \tag{6} \]

Here $X'X$ is singular, reflecting the linear dependency in $X$. The second column is twice the third and the null vector is (0, 1, -2),⁷ since 0 times the first column of $X$ plus 1 times the second column of $X$ plus -2 times the third column of $X$ results in a column vector of zeros. Figure 2 has three axes b1, b2, and b3 (the three-dimensional solution space), with the b1 axis representing the intercept. The null vector (0, 1, -2) is represented by the stippled line through the origin of (b1, b2, b3) that lies on the b2-b3 plane (since its b1 value is zero). The slope of the null vector on the b2-b3 plane is -2.0 (if b2 increases by 1.0, b3 decreases by 2.0). The line of solutions, the line on which every constrained solution appears, is three units above the b2-b3 plane. Its slope with respect to the b2-b3 plane is the same as that of the null vector, -2.0. The line of solutions and the null vector are parallel to each other. The line of solutions for our data is always three units above the b2-b3 plane; the value of the intercept remains constant at 3.0 for all solutions using the constrained regression approaches outlined earlier for the data in the example. We can determine the line of solutions in this simple example by using (6), if we bear in mind that the intercept (b1) = 3. If b1 = 3 and we set b3 to zero, b2 must be 4, and if we set b2 to zero, b3 must be 8. If we connect these two points, (3, 4, 0) and (3, 0, 8), we have the line on which all of the constrained solutions in this example must fall. This line is labeled as the line of solutions in Figure 2.

Figure 2. Geometric view of constrained regression in three dimensions (axes b1, b2, b3; null vector of X'X (0, 1, -2); line of solutions bc = GcX'Y)

Another way to determine this line is to find the intersection of the first two planes represented by the (first two) normal equations in (6). The problem is that the remaining normal equation represents a plane on which this line lies. Since this plane does not intersect the line of solutions in a single point (because of the linear dependency), we do not know where on the line of solutions to choose a solution. We can force this plane to intersect the line of solutions at a single point by setting a constraint on the solution. We have seen earlier that we cannot solve this problem (equation 6) using a regular inverse. We can, however, use a generalized inverse to find a solution for the b's for our example data. Different generalized inverses are associated with different constraints and yield different solutions, all lying on the line of solutions that is parallel to the null vector. Using the system
described earlier (Mazumdar et al. 1980) to obtain the generalized inverses for the data in (6) and solving for the $\hat{b}_c$ vectors under different constraints, we can confirm that all of the solutions fall on the line of solutions depicted in Figure 2. We can, of course, obtain the same estimates using constrained regression or transformations of the columns of $X$ to implement the constraint. For example, to obtain the solution for the constraint (0, 1, -1), which implies that b2 = b3, we substitute this constraint vector for the last row of $X'X$; find the inverse of this new matrix; and then substitute a column of zeros for the last column of this inverse matrix. Pre-multiplying $X'Y$ by this generalized inverse yields the solution vector (3.0, 2.67, 2.67)'. Figure 3 depicts this solution geometrically. The constraint passes through the origin (0, 0, 0) and has a slope of -1.0 with the b2-b3 plane. The solution plane must be perpendicular to the constraint (it must have a slope of 1.0 with respect to the b2-b3 plane) and pass through the b1 axis. The plane depicted in Figure 3 has just such a slope with respect to the b2-b3 plane, is perpendicular to the b2-b3 plane, and passes through the origin. Where this solution plane intersects the line of solutions (3.0, 2.67, 2.67) is the solution under this constraint.

If we repeat this process to obtain the Moore-Penrose solution, the difference is that this solution is orthogonal to the null vector.⁸ Therefore, we substitute the null vector for the last row of $X'X$; find the inverse of this new matrix; and then substitute a column of zeros for the last column of this inverse matrix. Pre-multiplying $X'Y$ by this generalized inverse yields the solution vector (3.0, 3.2, 1.6)'. Geometrically, the slope of the null vector (0, 1, -2) to the b2-b3 plane is -2.0, and the slope of the solution plane relative to the b2-b3 plane (which must be orthogonal to this constraint) is 0.5. This solution is depicted in Figure 4 as occurring where the solution plane intersects the line of solutions. This plane is orthogonal to the null vector (its slope with respect to the b2-b3 plane is 0.5), and it intersects the line of solutions at the point (3.0, 3.2, 1.6). It is easy to check that this solution vector is perpendicular to the null vector: namely, (0, 1, -2)(3.0, 3.2, 1.6)' = 0.

With the constraint (0, 1, -0.5) the estimated coefficients are (3.0, 2.0, 4.0), and using the constraint (0, 1, -0.1) yields (3.0, 0.6667, 6.6667). The constraints do not allow just any solution to emerge, but only solutions that are on the single line (the line of solutions) that is parallel to the null vector. Note also that each solution is orthogonal to its constraint. For example, (0, 1, -0.5)(3.0, 2.0, 4.0)' = 0 and (0, 1, -0.1)(3.0, 0.6667, 6.6667)' = 0. If we substitute these solutions ($\hat{b}_c$) into (5), each one produces the same predicted values of $Y$. This is part of what it means to be on this line of solutions.
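The four solutions discussed above can be reproduced with a short Python sketch of the row-replacement procedure attributed to Mazumdar et al. (1980). It uses exact fractions from the standard library so that the orthogonality checks are not blurred by rounding. The function names `inverse` and `constrained_solution` are hypothetical, and the matrix `X` and vector `Y` are the values read from equation (5), which reproduce the normal equations in (6).

```python
from fractions import Fraction as F

def inverse(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

def constrained_solution(XtX, XtY, c):
    """Steps (2) to (4) of the procedure, followed by b_c = G_c X'Y."""
    modified = [list(row) for row in XtX[:-1]] + [list(c)]  # (2) swap in constraint
    G = [row[:-1] + [F(0)] for row in inverse(modified)]    # (3) invert, (4) zero last column
    return [sum(g * y for g, y in zip(row, XtY)) for row in G]

X = [[1, 4, 2], [1, 2, 1], [1, 4, 2], [1, 2, 1]]
Y = [19, 11, 19, 11]
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]  # step (1)
XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(3)]

constraints = {
    "b2 = b3": [0, 1, -1],
    "null vector": [0, 1, -2],
    "b2 = 0.5*b3": [0, 1, F(-1, 2)],
    "b2 = 0.1*b3": [0, 1, F(-1, 10)],
}
solutions = {name: constrained_solution(XtX, XtY, c)
             for name, c in constraints.items()}

for name, b in solutions.items():
    fitted = [sum(x * bi for x, bi in zip(row, b)) for row in X]
    orthogonal = sum(ci * bi for ci, bi in zip(constraints[name], b)) == 0
    print(name, [round(float(bi), 4) for bi in b], fitted == Y, orthogonal)
```

Each printed row shows the solution, whether its fitted values equal `Y`, and whether it is orthogonal to its own constraint. The procedure requires that the modified matrix be invertible, which holds here because none of the four constraints is orthogonal to the null vector.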

Figure 3. Geometric view of constrained regression in three dimensions: b2 = b3 (axes b1, b2, b3; null vector; line of solutions bc = GcX'Y; constraint (0, 1, -1); solution b1 = 3.0, b2 = 2.67, b3 = 2.67)

If we were careful geometers, we could produce the constrained solutions for the data in (6) corresponding to any constraint by first constructing the line of solutions, then the solution plane that is orthogonal to the constraint, and then determining where this plane intersects the line of solutions. We would always find that the intercept is 3.0, since the line of solutions (for these data) is always 3.0 units above the b2-b3 plane.

Figure 4. Geometric view of constrained regression in three dimensions: b3 = 1/2 b2: The Moore-Penrose solution (axes b1, b2, b3; null vector; line of solutions bc = GcX'Y; solution b1 = 3.0, b2 = 3.2, b3 = 1.6)

We can characterize the differences between various constrained solutions by means of the factor kv, where k is a scalar and v is the null vector. For example, one of our solutions is (3.0, 2.6667, 2.6667) and another is (3.0, 0.6667, 6.6667). We can move from the first to the second solution by adding $2 \times (0, -1, 2)$ to the first solution. This manipulation simply moves us along the line of potential solutions, here moving from (3.0, 2.6667, 2.6667) to reach (3.0, 0.6667, 6.6667). This relationship keeps all of the possible solutions on the line of solutions. Note that k depends on how we scale the null vector, since any scalar multiple of $(0, -1, 2)$ is also the null vector.

There are two ways to view this movement. The first is as a movement from one solution to the other parallel to the axes. To move from the point (3.0, 2.6667, 2.6667) to (3.0, 0.6667, 6.6667), move 0.0 units on the b1 axis, $-2$ units parallel to the b2 axis, and up 4 units parallel to the b3 axis, where we intercept the line of solutions at (3.0, 0.6667, 6.6667). The second geometric interpretation is based on the length that must be traveled on the line of solutions from one solution to the other. The length of the null vector is the square root of the sum of its squared components: in our example, $\sqrt{0^2 + (-1)^2 + 2^2} = 2.236$. The distance along the line of solutions between (3.0, 2.6667, 2.6667) and (3.0, 0.6667, 6.6667) is twice this distance: 4.472. The unit of measurement for k is the length of the null vector. We must move 2 units [$2 \times \sqrt{v'v}$] along the line of solutions to reach (3.0, 0.6667, 6.6667) from (3.0, 2.6667, 2.6667). The difference between any two constrained estimates may be written as $b_c - b_{c^*} = kv$, where c and c* are two different constraints.

When we move to a situation in which there are four or more dimensions, the solutions for any one set of data using different constraints still lie on a line that is parallel to the null vector. This (one-dimensional) line is then in an m-dimensional space rather than in a three-dimensional space. The line of solutions is intersected by an $(m - 1)$-dimensional hyperplane that is orthogonal to the constraint. The point where this hyperplane intersects the line of solutions provides the solution associated with the particular constraint. We can reach all of the solutions for the constrained estimators on the line by taking any solution (from one of the generalized inverses) and adding a scalar times the null vector to it. The scalar, k, still tells us how far we must travel on the line of solutions, with the units of measurement for k being the length of the null vector (which now has m elements). Each solution is still perpendicular to its constraint.
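
The two readings of k (steps parallel to the axes versus distance along the line) can be illustrated with a short Python sketch. It is only a check of the arithmetic above, under the assumption that both points lie on the line of solutions; the function name `k_between` is hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def k_between(b_from, b_to, v):
    """Scalar k such that b_to = b_from + k*v (both points on the line)."""
    diff = [t - f for f, t in zip(b_from, b_to)]
    return dot(diff, v) / dot(v, v)

v = (0.0, -1.0, 2.0)
b1 = (3.0, 8 / 3, 8 / 3)     # (3.0, 2.6667, 2.6667)
b2 = (3.0, 2 / 3, 20 / 3)    # (3.0, 0.6667, 6.6667)

k = k_between(b1, b2, v)
distance = abs(k) * math.sqrt(dot(v, v))
print(round(k, 3), round(distance, 3))

# Rescaling the null vector to unit length rescales k but not the distance.
v_unit = tuple(x / math.sqrt(dot(v, v)) for x in v)
k_unit = k_between(b1, b2, v_unit)
print(round(k_unit, 3))
```

With the null vector scaled as $(0, -1, 2)$ the sketch gives k = 2 and a distance of about 4.472; with the unit-length null vector k itself equals that distance, which is the sense in which k is measured in units of the length of the null vector.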

Relevance to APC Models


In the class of APC models that I examine, a single linear constraint is placed on the model that makes the model just identified. In this class of models, the solution is always orthogonal to its constraint. The intercept is always the same no matter which linear constraint is chosen. All of the solutions using a single linear constraint lie on a single line of solutions in multidimensional space. All of these solutions fit the model equally well in terms of predicted values. All of the solutions are related to each other by kv. The solution is determined by the intersection of the line of solutions with an $(m - 1)$-dimensional hyperplane, and the direction of that hyperplane is orthogonal to the constraint.

Before leaving this introduction to the geometry of constrained regression, I should mention a unique property of using the null vector as a constraint. The solution that it produces is closer to the origin than the solution produced by any other constraint. In Figure 4 the solution plane perpendicular to the null vector has a slope of .50 with respect to the b2-b3 plane (a one-unit increase on b2 is associated with a ½-unit increase in b3). The length of the solution vector is the distance from the b1, b2, b3 origin to the point of solution on the line of solutions. For example, the length of the vector from the origin to the solution on the line of solutions using the null vector is $4.669 = \sqrt{3.0^2 + 3.2^2 + 1.6^2}$. Using the constraint $(0, 1, -1)$, the distance to the solution is $4.819 = \sqrt{3.0^2 + 2.667^2 + 2.667^2}$. This is often cited as an advantage of using the Moore-Penrose solution: "If we want to single out one particular member of this solution set of vectors as a representative, we might want to pick the one with the smallest length" (Press et al. 1992:62). We can also see that it is representative in the sense that it is in the middle of a line of solutions that stretches out in either direction. It may be this property that inspired Smith's (2004:116) comment: "There is also a sense in which the IE is an average of CGLIM [Constrained Generalized Linear Model] estimates."
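
The minimum-length property can be verified directly for the three-dimensional example. The Python sketch below is illustrative only: it starts from the solution under the constraint $(0, 1, -1)$, moves along the null vector to the point closest to the origin, and compares lengths. The function names are hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def shortest_on_line(point, v):
    """Point on {point + k*v} closest to the origin; it is orthogonal to v."""
    k = -dot(point, v) / dot(v, v)
    return tuple(p + k * vi for p, vi in zip(point, v))

def length(b):
    """Euclidean length of a vector."""
    return math.sqrt(dot(b, b))

v = (0.0, -1.0, 2.0)
b_equal = (3.0, 8 / 3, 8 / 3)   # solution under the constraint (0, 1, -1)
b_min = shortest_on_line(b_equal, v)

print([round(x, 4) for x in b_min])
print(round(length(b_min), 3), round(length(b_equal), 3))
```

The shortest point coincides with the Moore-Penrose solution (3.0, 3.2, 1.6), with lengths of about 4.669 and 4.819 for the two solutions, matching the values in the text.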

Some Shared Characteristics of Constrained APC Models


In this section I discuss some of the characteristics that are shared by all constrained APC models, including the IE. I also discuss the characteristics of the IE that are different from other constrained models. The aim is to provide researchers with a basis for understanding how constraints are related to solutions, comprehending how the estimated coefficients from the various constrained models are related, and judging how much confidence should be placed in the estimates.

A Solution for All of the Effect Coefficients


Because the APC model without an additional constraint is not of full column rank, it is underidentified and there is no solution that provides a unique vector of age, period, and cohort coefficients that corresponds to the generating parameters for the data. There are instead an infinite number of solutions for the age, period, and cohort coefficients that produce least squares estimates. Any constrained regression approach provides estimates of the unique effect coefficients for each age group, period, and cohort under the particular constraint. They are unbiased estimates in the sense that they are unbiased estimates under a particular constraint. The crucial question, from our perspective, is whether the parameter estimates are unbiased, in the sense that their expected values equal the values of the parameters that generated Y: Does, in fact, $c'b = 0$?

Solution Space Is Perpendicular to the Constraint


Yang et al. (2004:82) note that the parameter space P can be decomposed as $P = N \oplus \Theta$, where $\oplus$ is the direct product of two linear spaces that are perpendicular to each other; N is the one-dimensional null space of X spanned by the vector $\{sB_0\}$, with real number s; and $\Theta$ is the complement subspace orthogonal to N. In our notation $B_0$ is the null vector (v) and s is the scalar by which the null vector may be multiplied, since it is unique up to multiplication by a scalar. In general, I note that the parameter space can be decomposed as $P = B_c \oplus \Theta_c$, where $B_c$ is a linear constraint spanned by the vector $\{sc\}$ with real numbers s and constraint vector c, and $\Theta_c$ is a complement subspace orthogonal to $B_c$. The linear constraint, like the null vector, is unique up to multiplication by a scalar.

The solution vector must be orthogonal to the constraint, since the solution vector is determined by a point on the constrained hyperplane, and the hyperplane is orthogonal to the constraint. Specifically, this point is the intersection of this hyperplane (with its direction constrained to be orthogonal to the constraint) and the line of solutions. As noted earlier, all of the solutions for the different constraints (with a fixed set of data) must fall on a line parallel to the null vector. This restriction on the solutions occurs because of the linear dependency in the original data. For each constrained solution, the constraint determines the direction of the solution plane/hyperplane.

Bias
Arguably the most important consideration in setting a linear restriction is the degree of bias associated with that restriction. The constraint chosen by the researcher determines the amount of bias associated with the particular constrained estimate. Since the effect coefficients are estimated under this assumption, the estimated values will be orthogonal to the constraint: $c'\hat{b}_c = 0$. Therefore, we cannot use this observed orthogonality to judge the validity of the constraint. The crucial question is whether the constraint is orthogonal to b.

A general definition of bias involves the difference between the expected value of an estimator and the parameter that it estimates. In this article, I use bias to refer to the differences between expected values of the estimates and the values of the parameters (b) of the process that generated the outcome values. Specifically, it is the difference between the expected values of the estimates in the vector of constrained estimates, $E(\hat{b}_c)$, and b: bias $= E(\hat{b}_c) - b$. Using this conception of bias, Kupper et al. (1983) note that for APC models $kv = E(\hat{b}_c) - b$, where kv measures the distance between the expected value of the constrained solution and the population parameters that generated the data (b). This is identical to the distance between any two constrained estimates on the line of solutions, except that one of the solutions is the true solution (the one representing the process that generated the outcomes). Kupper et al. (1983) note that this value of k can be computed as $k = -c'b / v'c$ and that $c'b = 0$ when the expected value of the constrained estimate equals the parameter values that generated the outcome: $E(\hat{b}_c) = b$.

A different use of the term bias involves the difference between the expected values of the estimators and the values of the parameters under a particular constraint: $E(\hat{b}_c) = b_c$. This is an important property for any constrained estimator, but the focus of most of the literature has been on bias in terms of the expected value of the estimated parameters and the parameters that generated the data. A careful reading of Yang et al. (2004, 2008) indicates that the IE is an unbiased estimate of the solution to the APC model that lies in the subspace that is orthogonal to the null vector. The IE is an unbiased estimator of the $b_{ie}$ parameter values associated with the null vector constraint: $E(\hat{b}_{ie}) = b_{ie}$. The IE would provide an unbiased estimate of the parameters that generated the age-period-specific rates if and only if $v'b = 0$, just as other constrained estimates would provide unbiased estimates of the parameters that generated the age-period-specific rates if and only if $c'b = 0$.
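
A small numerical illustration may help fix the sign and size of the bias term. The Python sketch below reuses the three-dimensional example and treats (3.0, 2.0, 4.0) as a hypothetical set of generating parameters; that choice is an assumption made only for illustration, since the generating parameters are never observed in practice. It computes $k = -c'b / v'c$ and the implied expected value of the constrained estimate, $b + kv$.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def bias_scalar(c, b_true, v):
    """k in kv = E(b_c) - b, obtained from c'(b + k*v) = 0."""
    return -dot(c, b_true) / dot(v, c)

v = (0.0, -1.0, 2.0)
b_true = (3.0, 2.0, 4.0)      # hypothetical generating parameters

# A constraint that is NOT orthogonal to b_true yields a biased solution.
c_wrong = (0, 1, -1)
k_wrong = bias_scalar(c_wrong, b_true, v)
expected_wrong = tuple(b + k_wrong * vi for b, vi in zip(b_true, v))

# A constraint that IS orthogonal to b_true (c'b = 0) yields k = 0.
c_right = (0, 1, -0.5)
k_right = bias_scalar(c_right, b_true, v)

print(round(k_wrong, 4), [round(x, 4) for x in expected_wrong])
print(abs(k_right) < 1e-12)
```

Under this assumed generating vector, the constraint $(0, 1, -1)$ shifts the expected solution to (3.0, 2.6667, 2.6667), while the constraint $(0, 1, -0.5)$ recovers the generating values because $c'b = 0$.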

Relationship of Constrained Solutions to Each Other


The problem confronting APC analysts who employ constrained regression (whether using traditional constraints or the IE) is that they cannot find the unique solution to the normal equations. This occurs because the X matrix is rank deficient by one. Thus, there are an infinite number of possible solutions; however, not just any solution is possible. All constrained solutions are related to each other in that they lie on a single line of solutions in multidimensional space. In a fundamental sense, the line of solutions is the most tangible thing we know about the solution to this system of equations. Any solution to the normal equations must lie on this line and any of the solutions to the normal equations are ordinary least squares (OLS) solutions. Thus, if our standard assumptions (e.g., those underlying OLS multiple regression) are correct, even though the X matrix is rank deficient by one, we know that the solution corresponding to the parameter values that generated the outcome data must fall on this line.

The difference between any two solution vectors in the class of constrained estimators examined in this article is $kv = (-c_1'\hat{b}_2 / v'c_1)\,v$, where $c_1$ is one constraint and $\hat{b}_2$ is the solution under a different constraint. When we have calculated a single solution, we could determine the difference between that solution and any other solution generated using a different constraint or, of course, determine any other solution. The distance between any two solutions on the line of solutions is $k\sqrt{v'v}$, where $\sqrt{v'v}$ is the length of the null vector. Appropriately, the value of k = 0 when the constraint for a solution and the solution $\hat{b}_2$ are orthogonal: $c_1'\hat{b}_2 = 0$. Thus, kv indicates how much the coefficients based on the two different constraints vary: $\hat{b}_1 - \hat{b}_2 = kv$.

Model Fit
The solutions for the normal equations associated with the APC model all lie on the line of solutions and any solution to these normal equations is a least squares solution. Thus, each of the constrained estimates generates a solution ($b_c$) that produces the same predicted values of the dependent variable. I showed this for the 3 × 3 case using the data in equation (5), and it extends to higher dimensions and accounts for the well-established finding that in APC analysis one cannot distinguish between models on the basis of fit when they are just identified. This is true for all of the constrained estimates that we examine.
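
The reason every point on the line of solutions gives the same fit is that $Xv = 0$, so adding kv to a solution leaves $Xb$ unchanged. The Python sketch below illustrates this with a small design matrix that is hypothetical (it is not the data in equation (5)); its only assumed feature is that the second predictor is twice the third, which makes $(0, -1, 2)$ its null vector.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical design matrix: columns are intercept, x2, x3 with x2 = 2 * x3,
# so X @ (0, -1, 2) = 0 for every row.
X = [(1, 2, 1), (1, 4, 2), (1, 0, 0), (1, 6, 3)]

solutions = {
    "b2 = b3": (3.0, 8 / 3, 8 / 3),
    "Moore-Penrose": (3.0, 3.2, 1.6),
    "constraint (0, 1, -0.5)": (3.0, 2.0, 4.0),
}

def predict(X, b):
    """Predicted values Xb, one per row of X."""
    return [dot(row, b) for row in X]

for name, b in solutions.items():
    print(name, [round(y, 6) for y in predict(X, b)])
```

All three solution vectors return identical predictions for this matrix, so no measure of fit based on the predicted values can choose among them.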

Setting the Constraint and Zero Influence


Many sources (e.g., Kupper et al. 1983; Rodgers 1982; Smith 2004) caution against setting the constraint based on the values of the observed dependent variable. An advantage of the IE approach is that it assures that the researcher does not give in to this temptation: the constraint is based purely on the X-matrix, and in this sense, the analyst does not choose which constraint to use. I suggest two cautions.

First, one could argue that in some situations with good theory and past research it would be better to set the constraint on the basis of theory and previous research than on the basis of the null vector of the X-matrix. Kupper et al. (1983:2803-804) conclude: "Given that the observed $Y_{ij}$'s are of no help in determining c, the only remaining option is to make use of any reasonably reliable a priori, data independent, knowledge about the underlying age, period, and cohort effect parameters under study."

The second caution is that having no choice in the constraint used does not mean that the constraint used is unimportant or has no effect on the outcome measures. Yang et al. (2008:1704) state: "The eigenvector $B_0$ [the null vector] does not depend on the observed rates Y, only on the design matrix X, and thus is completely determined by the numbers of age groups and period groups, regardless of the event rates. In other words, $B_0$ has a specific form that is a function of the design matrix." This is correct, as noted earlier, and a potential advantage of using this constraint. But it is not clear what the following statements mean: "The fact that the fixed vector $B_0$ is independent of the response variable Y suggests that it should not play any role in the estimation of effect coefficients" (Yang et al. 2008:1705) or "Specifically, the IE imposes the constraint that the direction in parameter space defined by the eigenvector $B_0$ in the null space of the design matrix X have zero influence on the parameter vector $b_0$ (i.e., on the specific parameterization of the vector b that is estimated by the IE)" (Yang et al. 2008:1706). If these statements are meant to convey that constraining the solution to be orthogonal to the vector $B_0$ (the null vector) does not affect the estimates, then this is incorrect. The solution vector must be orthogonal to the null vector and therefore is affected by this constraint. If they mean that Y does not affect the constraint that is chosen, then they are correct.

Variance of the Estimators


Yang et al. (2008:1709) note that for a fixed number of time periods of data, the IE is more statistically efficient (has a smaller variance) than any CGLIM estimator that is obtained from a nontrivial equality constraint on the unconstrained regression coefficient estimator. This is an important statistical advantage of using the null vector as a constraint and is in agreement with Kupper et al. (1983:2797), who note that the principal component estimator (a linear transformation of the IE) deals with the exact linear dependency by involving only the non-zero eigenvalues associated with the eigenvectors. Because of this, it should be preferred on variance grounds.

While noting that this is an important statistical property, I am in agreement with Kupper et al. (1983) that bias (and both they and I are using bias in the sense of estimating the generating mechanism) is likely to be the more important factor. Kupper et al. (1983:2797) note that to the extent that $v'b$ departs from zero, the principal component estimator is more biased and could lead to more bias than the use of some other constraint . . . , so that the optimal method for choosing c should probably take into account both bias and variance (i.e., mean square error) considerations. Of course, when the squared multiple correlation coefficient $R^2$ is fairly close to 1, a result which seems to occur not infrequently in practice, the bias becomes the main area of concern.

Most Representative Solutions


The constraint used for the IE (the null vector) results in a vector from the origin to the solution that is the shortest of all the constrained estimators. This property can be used to argue that this estimate is a sort of average of the estimates based on constrained estimation. From the discussion of the geometry of constrained estimation, it is possible to view this solution as the balance point of a teeter-totter, with solutions extending in both directions along the line of solutions from this solution point. In a similar vein we can view the Moore-Penrose solution (the IE solution) as a sort of conventional solution. Given that any of the constrained solutions provide the same fit to the data, is there one solution that could serve as the conventional solution so that different researchers would report the same solution in the absence of evidence of a better solution? Some might say yes and that the solution should be one based on the Moore-Penrose generalized inverse, because of its statistical properties: having the shortest solution vector, being (in a sense) most representative, and its variance characteristics.

Convergence
I showed earlier that the scalar k times the null vector (kv) represents the difference between any two constrained estimators when using the same data set. This holds for the intrinsic estimator, $\hat{b} = b_{ie} + kv$, where any of the constrained estimates, $\hat{b}$, equals the estimates based on the intrinsic estimator, $b_{ie}$, plus k times the null vector. It also holds for the traditional constrained estimators, $\hat{b} = b_c + kv$, where any of the constrained estimators (including the intrinsic estimator) equals any specific constrained estimator, $b_c$, plus k times the null vector. Each of these solution vectors is a biased estimate of the parameter vector that generated the outcome data to the extent that it differs from the parameter values (the age, period, and cohort parameters) that nature used to generate the age-period-specific results.

The null vector is unique up to multiplication by a scalar, and Yang et al. (2008:1709) choose to norm the length of the null vector to 1 in their presentation. They later transform their solution using the orthonormal matrix of all eigenvectors to transform the coefficients of the principal components regression model to regression coefficients of the estimator B [the intrinsic estimator] (Yang et al. 2008:1705). Their transformed parameter values are the same as those that result from our use of constrained regression (or using generalized inverses) to obtain the parameters of the IE. Since they use this normed null vector that has a length of 1, they note that as the number of elements increases (the number of age groups and periods increase) the elements of the null vector become smaller and converge elementwise to zero (Yang et al. 2008:1709). They use the formula $\hat{b} = b_{ie} + k^*v^*$ (in our notation) to represent the relationship of any constrained estimate ($\hat{b}$) to the values estimated by the intrinsic estimator ($b_{ie}$). I have used the asterisk on $v^*$ to emphasize that it is the normed null vector and on $k^*$ to emphasize that this scalar is appropriately scaled, given the normed null vector. I note that I can also write $\hat{b} = b_c + k^*v^*$. Yang et al. (2008) argue that the expected value of any constrained estimator converges in value to the expected value of $b_{ie}$ as the number of periods and/or age groups increases, since the expected value of $v^*$ goes elementwise to zero with such increases.
Given $\hat{b} = b_c + k^*v^*$ and that $b_{ie}$ is a constrained estimator, I could just as easily argue that the expected value of the intrinsic estimator converges toward the expected value of any of the constrained estimators as $v^*$ goes elementwise to zero. I am skeptical of this argument because as the normed v goes elementwise to zero there is no reason to assume that $k^*$ remains constant. I have found in simulations that $k^*$ does not remain fixed as the number of periods increases for the models I have investigated.

On the other hand, convergence can occur for other reasons. It might be the case that there is a zero trend in the period effects in the long term. For example, the unemployment rate across a 60-period span may fluctuate up and down over short periods of time, but with no apparent linear trend in the long term. If we set a zero linear trend (ZLT) constraint for periods over a short time span, we may or may not get an accurate estimate of the data generating age effect coefficients.9 If we set a ZLT constraint for periods covering a longer span of time, say 30 years, we are likely to get a more accurate measure of the generating parameters for age, and a 60-year span will likely provide an even more accurate measure. This is not surprising, because the ZLT constraint for periods is more likely to be correct (or closer to being correct) as the span of periods increases, assuming an up and down pattern of unemployment with very little or no overall trend.10

Investigating the Effects of Different Constraints


It is straightforward to investigate with data what happens to the estimated age, period, and cohort coefficients as the constraints change. I report the results using two data sets (one with 5 periods and one with 20 periods). I use the data from the 5 × 5 age-period matrix in Table 1. I generated the cell entries by setting the earliest cohort value (cohort 1) to 5 and keeping the cohort effect at 5 through the fourth cohort and then increasing the cohort effect by .50 for each cohort through the ninth and final cohort. The period effect is two for the first three periods and four for the next two periods. The age effect is three for the two youngest age groups, two for the next oldest, one for the next oldest, and zero for the oldest.

I begin the simulation by using the data for the 5 × 5 age-period matrix in Table 1. When I extend the analysis to 20 periods, I leave the cohort effects as they are for the 5 × 5 table and continue with no increase or decrease in the cohort effect for the newly added cohorts that correspond to the newly added periods. The period effects are two for the first 3 periods, four for the next 2 periods, two for the next 3 periods, and continue this up-and-down pattern until we reach the 20th period. The age effects remain the same as they are for the 5 × 5 table across all 20 periods. When we increase the number of periods to 20, the number of age groups remains at five and the number of cohorts increases to 24.

Certainly, I do not claim that this data set is somehow representative of all of the possible generating mechanisms. The number of mechanisms is infinite, as is the number of possible constraints. The results point to many important relationships and illustrate many of the points that I discussed previously. I expect that if a constraint is consistent with the way the data were generated, we should obtain results that are consistent with the way the data were generated.
If the constraint is not consistent with the way the data were generated, we will obtain some other set of results. But whatever the results, they will fit the data equally well, they will be orthogonal to the constraint used, they will all lie on a line of solutions, and the intercept will be the same no matter what constraint is used. The quandary in APC analysis is to discover a method that will yield results that are consistent with how nature generated the data.

To compute the constrained estimates, I used constrained regression analysis in STATA. To calculate the IE, I used the add-on program in STATA (cited in Yang et al. 2008). Table 2 presents the results. The first four columns show the results when there are five periods and five age groups (based on the data displayed in Table 1) for four different constraints: age1 = age2, period3 = period4, cohort7 = cohort8, and the constraint associated with the IE. The fifth column contains the null vector for the 5 × 5 age-period matrix. The next four columns show the results for the same four constraints, but for the case where there are five age groups and 20 periods. The data have been coded using effect coding so that the sum of the effect coefficients is zero, and I have used the last category of age groups, periods, and cohorts as the reference category. We can therefore determine the coefficients for these reference categories and have reported this in Table 2.11 The final column of Table 2 contains the null vector for the 5 × 20 age-period matrix.

I first note that the analyses produce results consistent with several principles noted earlier. (1) For the same X and Y data the intercept is always the same no matter which linear constraint is used to identify the models: 10.433 for the data with five age groups and 5 periods and 11.475 for the data with five age groups and 20 periods. (2) Each of the solutions is perpendicular to its constraint. This is easily verified by multiplying the transpose of the constraint vector times the solution vector. Note that the constraint vector ignores the reference categories (since they are dropped from the analysis); thus, we need to multiply only the constraint times the corresponding elements of the solution vector.
For example, setting age1 = age2 corresponds to a constraint vector of $(1, -1, 0, 0, \ldots, 0)$, which when multiplied times the appropriate elements of the solution vector results in zero (here I have placed the intercept as the final term to match the output). In the 5 × 5 case, the age1 and age2 coefficients both equal 1.20 so the dot product [c′(solution vector)] is zero; this is also true in the 5 × 20 case. The solution vector is perpendicular to the constraint vector. Similarly, the transpose of the null vector times the solution for the IE equals zero. For the 5 × 5 case: $-.267 \times \text{age1} - .134 \times \text{age2} + 0.000 \times \text{age3} + \ldots + 0.401 \times \text{cohort8} + 0.000 \times \text{intercept} = 0$. (3) All of the solutions lie on a single line of solutions and thus differ from one another by kv. In the 5 × 5 case, to move from the solution for age1 = age2 to the solution for period3 = period4, k = 14.967; to move from the solution for the age constraint to the solution for the cohort constraint, k = −3.742; to move from the age constraint to the solutions using the IE, k = −.713. To move from the IE constraint to the age constraint, k = .713; from the IE constraint to the period constraint, k = 15.679; and from the IE constraint to the cohort constraint, k = −3.029. (4) Each of the solutions fits the data equally well; this also occurs in unreported analyses when I added random error to the cell entries. As long as the same data are analyzed, the fit does not depend on the linear constraint used.

Table 2. Analysis of the Data Generated in Table 1 for 5 Periods and Extended to 20 Periods as Described in the Text

| Effects | 5 × 5: age1 = age2 | 5 × 5: period3 = period4 | 5 × 5: cohort7 = cohort8 | 5 × 5: intrinsic estimator | 5 × 5: null vector | 5 × 20: age1 = age2 | 5 × 20: period3 = period4 | 5 × 20: cohort7 = cohort8 | 5 × 20: intrinsic estimator | 5 × 20: null vector |
|---|---|---|---|---|---|---|---|---|---|---|
| age1 | 1.200 | -2.800 | 2.200 | 1.390 | -0.267 | 1.200 | -2.800 | 2.200 | 1.336 | -0.050 |
| age2 | 1.200 | -0.800 | 1.700 | 1.295 | -0.134 | 1.200 | -0.800 | 1.700 | 1.268 | -0.025 |
| age3 | 0.200 | 0.200 | 0.200 | 0.200 | 0.000 | 0.200 | 0.200 | 0.200 | 0.200 | 0.000 |
| age4 | -0.800 | 1.200 | -1.300 | -0.895 | 0.134 | -0.800 | 1.200 | -1.300 | -0.868 | 0.025 |
| age5 | -1.800 | 2.200 | -2.800 | -1.990 | | -1.800 | 2.200 | -2.800 | -1.936 | |
| period1 | -0.800 | 3.200 | -1.800 | -0.990 | 0.267 | -0.800 | 18.200 | -5.550 | -1.444 | 0.238 |
| period2 | -0.800 | 1.200 | -1.300 | -0.895 | 0.134 | -0.800 | 16.200 | -5.050 | -1.376 | 0.213 |
| period3 | -0.800 | -0.800 | -0.800 | -0.800 | 0.000 | -0.800 | 14.200 | -4.550 | -1.308 | 0.188 |
| period4 | 1.200 | -0.800 | 1.700 | 1.295 | -0.134 | 1.200 | 14.200 | -2.050 | 0.760 | 0.163 |
| period5 | 1.200 | -2.800 | 2.200 | 1.390 | | 1.200 | 12.200 | -1.550 | 0.827 | 0.138 |
| period6 | | | | | | -0.800 | 8.200 | -3.050 | -1.105 | 0.113 |
| period7 | | | | | | -0.800 | 6.200 | -2.550 | -1.037 | 0.088 |
| period8 | | | | | | -0.800 | 4.200 | -2.050 | -0.969 | 0.063 |
| period9 | | | | | | 1.200 | 4.200 | 0.450 | 1.098 | 0.038 |
| period10 | | | | | | 1.200 | 2.200 | 0.950 | 1.166 | 0.013 |
| period11 | | | | | | -0.800 | -1.800 | -0.550 | -0.766 | -0.013 |
| period12 | | | | | | -0.800 | -3.800 | -0.050 | -0.698 | -0.038 |
| period13 | | | | | | -0.800 | -5.800 | 0.450 | -0.631 | -0.063 |
| period14 | | | | | | 1.200 | -5.800 | 2.950 | 1.437 | -0.088 |
| period15 | | | | | | 1.200 | -7.800 | 3.450 | 1.505 | -0.113 |
| period16 | | | | | | -0.800 | -11.800 | 1.950 | -0.427 | -0.138 |
| period17 | | | | | | -0.800 | -13.800 | 2.450 | -0.360 | -0.163 |
| period18 | | | | | | -0.800 | -15.800 | 2.950 | -0.292 | -0.188 |
| period19 | | | | | | 1.200 | -15.800 | 5.450 | 1.776 | -0.213 |
| period20 | | | | | | 1.200 | -17.800 | 5.950 | 1.844 | |
| cohort1 | -0.833 | -8.833 | 1.167 | -0.452 | -0.535 | -1.875 | -24.875 | 3.875 | -1.096 | -0.288 |
| cohort2 | -0.833 | -6.833 | 0.667 | -0.548 | -0.401 | -1.875 | -22.875 | 3.375 | -1.164 | -0.263 |
| cohort3 | -0.833 | -4.833 | 0.167 | -0.643 | -0.267 | -1.875 | -20.875 | 2.875 | -1.231 | -0.238 |
| cohort4 | -0.833 | -2.833 | -0.333 | -0.738 | -0.134 | -1.875 | -18.875 | 2.375 | -1.299 | -0.213 |
| cohort5 | -0.333 | -0.333 | -0.333 | -0.333 | 0.000 | -1.375 | -16.375 | 2.375 | -0.867 | -0.188 |
| cohort6 | 0.167 | 2.167 | -0.333 | 0.071 | 0.134 | -0.875 | -13.875 | 2.375 | -0.435 | -0.163 |
| cohort7 | 0.667 | 4.667 | -0.333 | 0.476 | 0.267 | -0.375 | -11.375 | 2.375 | -0.002 | -0.138 |
| cohort8 | 1.167 | 7.167 | -0.333 | 0.881 | 0.401 | 0.125 | -8.875 | 2.375 | 0.430 | -0.113 |
| cohort9 | 1.667 | 9.667 | -0.333 | 1.286 | | 0.625 | -6.375 | 2.375 | 0.862 | -0.088 |
| cohort10 | | | | | | 0.625 | -4.375 | 1.875 | 0.794 | -0.063 |
| cohort11 | | | | | | 0.625 | -2.375 | 1.375 | 0.727 | -0.038 |
| cohort12 | | | | | | 0.625 | -0.375 | 0.875 | 0.659 | -0.013 |
| cohort13 | | | | | | 0.625 | 1.625 | 0.375 | 0.591 | 0.013 |
| cohort14 | | | | | | 0.625 | 3.625 | -0.125 | 0.523 | 0.038 |
| cohort15 | | | | | | 0.625 | 5.625 | -0.625 | 0.456 | 0.063 |
| cohort16 | | | | | | 0.625 | 7.625 | -1.125 | 0.388 | 0.088 |
| cohort17 | | | | | | 0.625 | 9.625 | -1.625 | 0.320 | 0.113 |
| cohort18 | | | | | | 0.625 | 11.625 | -2.125 | 0.252 | 0.138 |
| cohort19 | | | | | | 0.625 | 13.625 | -2.625 | 0.185 | 0.163 |
| cohort20 | | | | | | 0.625 | 15.625 | -3.125 | 0.117 | 0.188 |
| cohort21 | | | | | | 0.625 | 17.625 | -3.625 | 0.049 | 0.213 |
| cohort22 | | | | | | 0.625 | 19.625 | -4.125 | -0.019 | 0.238 |
| cohort23 | | | | | | 0.625 | 21.625 | -4.625 | -0.086 | 0.263 |
| cohort24 | | | | | | 0.625 | 23.625 | -5.125 | -0.154 | |
| intercept | 10.433 | 10.433 | 10.433 | 10.433 | 0.000 | 11.475 | 11.475 | 11.475 | 11.475 | 0.000 |

For this particular generating mechanism (age1 = age2) the IE does better than the period and cohort constraints that we used (k = .713 times the null vector for the absolute difference between the IE estimates and the putative data generating parameters); however, the data could have been generated by the coefficients implied by any of the particular constraints used in Table 2. For example, in the case of the period constraint (period3 = period4), age group 1 (the youngest) is two units lower than age 2, which is one unit lower than age 3, which is one unit lower than age 4, which is one unit lower than the oldest group. This does not match the putative data generating mechanism, but I could have used these parameters to generate the data (and nature might have). If this were the generating mechanism for the data in Table 1, then the absolute value of k, which determines the distance between estimates, would have been 15.679 for the distance between the IE estimate and this generating mechanism. In terms of substance, I note that the different constraints produce quite different interpretations concerning the generating parameters. In Table 2, the only constraint to produce the putative data generating mechanism (age1 = age2) is the only constraint (among those used) that is consistent with the generating mechanism. It is consistent because for the data generating mechanism the two youngest age groups have the same effect.
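
The k values reported for Table 2 can be reproduced from the generating mechanism described in the text. The Python sketch below is a reconstruction under stated assumptions, not the STATA analysis used in the article: it assumes effect coding with the last category dropped, a unit-length null vector whose cohort elements increase with the cohort index, and it omits the intercept (whose null-vector element is zero). All function names are hypothetical.

```python
import math

def effect_code(values):
    """Center effects at zero and drop the last (reference) category."""
    mean = sum(values) / len(values)
    return [x - mean for x in values[:-1]]

def generating_b_and_null_vector(n_period):
    """Effect-coded generating coefficients and unit-length null vector."""
    age = [3, 3, 2, 1, 0]
    n_age = len(age)
    n_cohort = n_age + n_period - 1
    period = [(2, 2, 2, 4, 4)[j % 5] for j in range(n_period)]
    cohort = [5 + 0.5 * min(max(k - 4, 0), 5) for k in range(1, n_cohort + 1)]
    b = effect_code(age) + effect_code(period) + effect_code(cohort)
    # Linear dependency: centered age - centered period + centered cohort = 0.
    u = ([i - (n_age + 1) / 2 for i in range(1, n_age)]
         + [(n_period + 1) / 2 - j for j in range(1, n_period)]
         + [k - (n_cohort + 1) / 2 for k in range(1, n_cohort)])
    norm = math.sqrt(sum(x * x for x in u))
    return b, [x / norm for x in u]

def k_to_equality(b, v, p, q):
    """k that moves b along v until elements p and q are equal."""
    return -(b[p] - b[q]) / (v[p] - v[q])

def k_to_ie(b, v):
    """k that moves b along v to the point orthogonal to v (the IE)."""
    return -sum(x * y for x, y in zip(b, v)) / sum(x * x for x in v)

for n_period in (5, 20):
    b, v = generating_b_and_null_vector(n_period)
    first_period = 4                      # four age elements come first
    first_cohort = 4 + (n_period - 1)     # then n_period - 1 period elements
    k_period = k_to_equality(b, v, first_period + 2, first_period + 3)
    k_cohort = k_to_equality(b, v, first_cohort + 6, first_cohort + 7)
    print(n_period, round(k_period, 3), round(k_cohort, 3),
          round(k_to_ie(b, v), 3))
```

For the 5 × 5 design the sketch should return, up to rounding, the values 14.967, −3.742, and −.713 for the moves from the age1 = age2 solution to the period, cohort, and IE solutions. In this sketch the corresponding values for the 5 × 20 design are larger in absolute value, because the same shift in the coefficients requires a larger multiple of the unit-length null vector when that vector has more elements.
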
I use the term "putative" to emphasize that any of the mechanisms implied by any of the solutions could have generated the data. This is the quandary associated with using these constrained regression approaches to APC analysis. If the data were generated by nature the way I generated them, then only the age1 = age2 constraint (of the constraints used) would have shown us how nature generated these data. For this constraint, we see that each cohort is 0.50 greater than the earlier one from cohort5 up to cohort9, that the first three periods are two less than the next two, and that the age effect increases by one from the oldest to the next oldest and again by one to the next oldest and then increases by two for the two youngest age groups. This fits the putative data generating mechanism in both the 5 × 5 and 5 × 20 cases. Comparing the results as we move from the analysis of data for a 5 × 5 age-period matrix to those based on a 5 × 20 age-period matrix does not allow us to examine patterns across the multitude of possible models


that can arise in APC analysis. I can, however, note some interesting patterns, and as we will see, many of these patterns depend on the following relationship: $kv = (-c_1' \hat{b}_2 / v' c_1)\, v$, where $c_1$ is one constraint and $\hat{b}_2$ is the solution under a different constraint. Here $kv$ represents the difference between two solution vectors, one based on $c_1$ as a constraint and the other ($\hat{b}_2$) based on $c_2$.

I will use the phrase "convergence as the number of periods increases" in a specific manner. It is the extent to which the values in a solution vector based on a particular age-period matrix approach some set of fixed values as more periods of data are added. Specifically, for our data I ask whether the age effects in the 5 × 5 age-period matrix (which remain constant across all 20 periods) converge to some other values when there are 20 periods. Do the cohort effects that changed only for cohorts 5 through 9 have estimates that converge to some other value as we add periods? Do the period effect estimates based on the 5 × 5 age-period matrix converge to some new values as we add periods? Is there some sense in which we find that with more periods one of the constraints is more likely to tell us what nature was doing in the period we were studying? This concern assumes that analysts are actually interested in the age effects, period effects, and cohort effects on rates during a particular era. For example, during any specific era, cohort effects or period effects may be highly associated with smoking behavior, yet these very patterns are not likely to be associated with smoking behavior in the same way in future eras (after the original cohorts are replaced).

What sorts of changes in the solutions do we find as we move from 5 to 20 periods? In both the 5 period and 20 period simulations we find that (1) when the constraint is consistent with the generating mechanism, the generating mechanism is estimated correctly.
If I were to add error variance to these simulations, we would find that the estimated parameters remain unbiased estimates of the data generating parameters. (2) For each of the three traditional APC constraints, the estimated age effect coefficients are the same for both the 5 and 20 period cases. This means that they do not converge to some other values. (3) The values of k do not remain constant as we move from the 5 × 5 to the 5 × 20 situation. For example, the value of k needed to transform the effect coefficient estimates for the age1 = age2 constraint to those obtained using the IE constraint is .713 in the 5 × 5 case. In the 5 × 20 case, this value of k is 2.709 (about 3.8 times greater). (4) The age effect estimates using the IE are not the same for the 5 period and the 20 period cases. This occurs because, unlike the traditional constraints, which remain the same in the 5 and 20 period cases, the null vector (and thus the constraint for the IE solution) changes as the number of


periods changes. (5) Typically, I calculate k by taking the age constrained estimated coefficients minus the corresponding IE estimated coefficients and dividing each of the resulting elements by the corresponding null vector element. We can, however, use $k = -c_1' \hat{b}_2 / v' c_1$ to calculate k in each of these cases. For example, if $c_1$ is the constraint vector associated with the age constraint and $\hat{b}_2$ is the solution vector associated with the IE, we will obtain the same k values as we obtained before: .713 and 2.709.

If we had reason to believe that the period effects in the long run had a zero linear trend, then we might well use a ZLT constraint for periods (see note 9). To the extent that this constraint is consistent with the data generating mechanism, it should make the estimated age effects converge on the data generating age effects. For the data in my simulation the data generating period effect is (2,2,2,4,4) for the 5 period case and is (2,2,2,4,4,2,2,2,4,4,2,2,2,4,4,2,2,2,4,4) for the 20 period case. Not surprisingly, the ZLT period constraint is closer to the data generating mechanism in the 20 period case, and this is what we might expect in the long term for many sets of data. With these data the slope of the data generating period effects regressed on time is .60 for the 5 period case and only .04 for the 20 period case. The age effects for the data generating mechanism are the same for the two youngest age groups and then are one lower for each of the succeeding age groups. Using the ZLT constraint for periods in the 5 period case, these age effects are estimated to increase by .60 between age groups 1 and 2 and then to drop by .4 for each of the succeeding age groups. Using the ZLT constraint for periods in the 20 period case, there is an increase of .036 between the first and second age group and then drops of .964 for each of the succeeding age groups. There is clear convergence to the data generating values for age.
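The slopes and age differences quoted in this paragraph can be reproduced with a few lines of arithmetic. This is a minimal sketch under two stated assumptions: time is coded 1, 2, ..., n, and the ZLT estimates of the successive age differences equal the generating differences (0, -1, -1, -1) plus the period slope. The second assumption is a reading that matches the figures reported above, not a formula given in the text; the function name `linear_slope` is illustrative.

```python
import numpy as np

def linear_slope(effects):
    """OLS slope of the effects regressed on time points 1, 2, ..., n."""
    effects = np.asarray(effects, dtype=float)
    t = np.arange(1, effects.size + 1)
    slope, _intercept = np.polyfit(t, effects, deg=1)
    return slope

period_5 = [2, 2, 2, 4, 4]     # data generating period effects, 5 period case
period_20 = period_5 * 4       # the same pattern repeated for 20 periods

slope_5 = linear_slope(period_5)
slope_20 = linear_slope(period_20)
print(round(float(slope_5), 3), round(float(slope_20), 3))   # 0.6 0.036

# Generating differences between successive age groups: equal for the two
# youngest groups, then one lower for each succeeding group
generating_steps = np.array([0.0, -1.0, -1.0, -1.0])

# Assumed relation: each ZLT-estimated step is the generating step plus the slope
steps_5 = generating_steps + slope_5
steps_20 = generating_steps + slope_20
print(np.round(steps_5, 3), np.round(steps_20, 3))
```

As the period slope shrinks with more periods, the estimated age steps move toward the generating steps, which is the convergence described above.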
This convergence, however, depends on choosing a constraint that is consistent with the data generating mechanism. The data could have been generated by the mechanism implied by the constraint p3 = p4 (see Table 2). If that were the case, the assumption that there is no linear trend in periods would be incorrect. The linear trend for that potential generating mechanism is strongly negative. But theory and other data might convince us that the zero linear trend for periods or some other linear trend in periods is much more plausible. Note that we might well make similar arguments for a ZLT constraint for cohorts.
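To make the ZLT constraint concrete, the sketch below builds the weights given in note 9, (n - 1), (n - 2), ..., 1 applied to the first n - 1 effect-coded coefficients, and checks that the weighted sum is zero once the effects have no linear trend. The effect values are hypothetical random numbers and the function names are illustrative. The same construction would apply to cohorts, with n equal to the number of cohorts.

```python
import numpy as np

def zlt_weights(n):
    """Weights (n-1, n-2, ..., 1) applied to the first n-1 effect-coded coefficients."""
    return np.arange(n - 1, 0, -1, dtype=float)

def ols_slope(effects):
    """Slope of the effects regressed on time points 1, 2, ..., n."""
    t = np.arange(1, len(effects) + 1)
    return np.polyfit(t, effects, deg=1)[0]

n = 6                                           # hypothetical number of periods
raw = np.random.default_rng(1).normal(size=n)   # hypothetical period effects
effects = raw - raw.mean()                      # effect coding: effects sum to zero

t = np.arange(1, n + 1)
detrended = effects - ols_slope(effects) * (t - t.mean())   # remove the linear trend

# The constraint uses only the n-1 non-reference coefficients
before = zlt_weights(n) @ effects[:-1]
after = zlt_weights(n) @ detrended[:-1]
print("constraint value with the trend left in:", before)
print("constraint value after detrending:      ", after)
```

With effects that sum to zero, the weighted sum equals minus the slope times the sum of squared deviations of the time points, so it vanishes exactly when the linear trend is zero.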

Conclusion
Constrained estimation is the most common method used by those seeking to estimate each of the age, period, and cohort coefficients in APC models.


Traditionally, equality constraints on two of the effect coefficients have been used, but a different and more complex constraint is used for the IE. The use of this constraint in APC models and some of its advantages were examined by Kupper et al. (1983, 1985). Its development as a general method for estimating the effect coefficients in APC models can be traced to Wenjiang Fu in a series of articles (e.g., Fu 2000; Fu et al. 2004; Fu and Hall 2006; Knight and Fu 2000). This method has been highlighted for sociologists in two major articles by Yang and associates (2004, 2008). This article's goal is to offer a better understanding of the properties of constrained estimators in the APC context and to compare the conventional constrained estimator approach with the IE approach (see note 12). With this in mind, I examined these estimators from several perspectives: (1) algebraically, as constraints directly associated with specific generalized inverses (each constraint has its own associated generalized inverse); (2) geometrically, showing many characteristics shared by all constrained estimators and some that are not shared; and (3) using two small sets of data (one with 5 periods and one with 20) to show how these relationships occur in practice and which patterns can change and which remain constant as the number of periods is increased (at least for the data I used). Because the various traditional constraints, the ZLT constraint discussed in this article, and the constraint imposed by the IE are all linear constraints applied to the X-matrix, it is not surprising that the solutions resulting from these constraints share many characteristics.
These common characteristics include the following: with the constraint in place we can estimate each of the age, period, and cohort effects separately; each set of estimates is subject to bias, when we define bias as the extent to which the parameter values that generated the data are not the same as the expected value of the estimates under the particular constraint used in the estimation; and the solutions are perpendicular to their constraints. When analyzing the same data set: all of the just identified constrained linear solutions lie on a line that is parallel to the null vector; each constrained solution has the same intercept; each constrained solution fits the data equally well; and all of these solutions are related to each other by a scalar times the null vector.

There are some special characteristics that the IE does not share with other constrained estimators. It uses the null vector as a constraint (and this eliminates the temptation to examine the Y variable in order to set the constraint). It is the solution with the shortest length from the constraints' origin; and related to this, the variances of the estimated coefficients based on the IE are smaller than for other constrained solutions. Additionally, in some senses, the IE could serve as the "representative" solution. As I have emphasized, however, arguably the major consideration should be


how well the data generating parameters are estimated, and using the wrong constraints can lead to highly misleading estimates. It is tempting to think that in some situations we can rely on theory and previous literature to make an educated guess about the constraint, and this may be the case in some situations. Perhaps the most compelling suggestion is this from Kupper et al. (1983), who note that there may be situations in which theory and previous literature allow the researcher to contemplate more than one set of constraints. Then one can use these different constraints separately: "If the separate sets of estimates obtained by applying each of these various theoretically-based (a priori) choices for c are in close agreement, then one could justifiably have some confidence in the accuracy of this common set of estimated age, period, and cohort effects" (Kupper et al. 1983:2804). Of course, these constraints should not be obtained by searching the data for constraints that "work." One possibility might be to impose a ZLT for periods and a ZLT for cohorts when there are a large number of periods and cohorts and it is reasonable to assume that there are no particular trends for periods or for cohorts. Then the researcher would obtain estimates under these two different constraints and see if they agree. If we move outside of the tradition of constrained estimators, there is a literature on methods that bypass the identification problem by not attempting to estimate the individual age, period, and cohort coefficients.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes
1. Mason et al. (1973) introduced this technique to sociologists in 1973.
2. The basic model can also be estimated as a Poisson regression or as a logistic regression in a straightforward manner with a bit of notational change and, of course, the appropriate software (see Yang et al. 2008). The substance of the discussion in this article applies to these maximum likelihood (ML) estimation techniques as well as to the ordinary least squares (OLS) approach to estimation.


3. The only difference between dummy variable coding and effect coding is the minus one coding for the reference category rather than a zero.
4. Analogously, if using Poisson or logistic regression, the model fit is the same no matter which generalized inverse is used; that is, -2 log-likelihood is the same no matter which generalized inverse is used (here I assume that we are using just one constraint so that the model is just identified).
5. I emphasize here that b represents the parameter vector associated with the process that generated the data.
6. I arbitrarily chose to place the constraint in the last row and then replace the last column of the resulting inverse with zeros. I could have placed the constraint in the fourth row and after obtaining the inverse replaced the fourth column with zeros (Mazumdar, Li, and Bryce 1980).
7. The null vector of (0, 1, 2) is unique up to multiplication by a scalar.
8. Interestingly, the Mazumdar et al. (1980) system does not yield the Moore-Penrose matrix, but since it implements the same constraint it yields the same set of solutions.
9. To produce a zero linear trend (ZLT) constraint for periods, we can use the following linear constraint: $(n-1) \times \text{period}_1 + (n-2) \times \text{period}_2 + \ldots + 1 \times \text{period}_{n-1} = 0$, where n is the number of periods and the nth period is used as a reference category.
10. This is certainly the case for the situation in which the period effects across time are of the form of a sine wave.
11. With effect coding, the coefficients for age groups, periods, and cohorts each sum to zero, so that it is easy to calculate the effect for the reference category.
12. Yang et al. (2008:1706) certainly recognize that the intrinsic estimator (IE) is a constrained estimator: "Figure 1 also helps to illustrate geometrically that the IE may in fact also be viewed as a constrained estimator."

References
Fu, Wenjiang J. 2000. "Ridge Estimator in Singular Design With Applications to Age-Period-Cohort Analysis of Disease Rates." Communications in Statistics - Theory and Method 29:263-78.
Fu, Wenjiang J., and Peter Hall. 2006. "Asymptotic Properties of Estimators in Age-Period-Cohort Analysis." Statistics and Probability Letters 76:1925-29.
Fu, Wenjiang J., Peter Hall, and Thomas E. Rohan. 2004. "Age-Period-Cohort Analysis: Structure of Estimators, Estimability, Sensitivity and Asymptotics." Technical Report. Michigan State University, Department of Epidemiology.
Knight, Keith, and Wenjiang Fu. 2000. "Asymptotics for Lasso-Type Estimators." The Annals of Statistics 28:1356-378.
Kupper, Lawrence L., Joseph M. Janis, Azza Karmous, and Bernard G. Greenberg. 1985. "Statistical Age-Period-Cohort Analysis: A Review and Critique." Journal of Chronic Disease 38:811-30.


Kupper, Lawrence L., Joseph M. Janis, Ibrahim A. Salama, Carl N. Yoshizawa, and Bernard G. Greenberg. 1983. "Age-Period-Cohort Analysis: An Illustration of the Problems in Assessing Interaction in One Observation Per Cell Data." Communications in Statistics - Theory and Method 12:2779-807.
Mason, Karen Oppenheim, William M. Mason, H. H. Winsborough, and Kenneth W. Poole. 1973. "Some Methodological Issues in Cohort Analysis of Archival Data." American Sociological Review 38:242-58.
Mazumdar, S., C. C. Li, and G. R. Bryce. 1980. "Correspondence Between a Linear Restriction and a Generalized Inverse in Linear Model Analysis." The American Statistician 34:103-05.
Press, William H., Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. 1992. Numerical Recipes in C: The Art of Scientific Computing. New York: Cambridge University Press.
Rodgers, Willard L. 1982. "Reply to Comments by Smith, Mason and Fineberg." American Sociological Review 47:793-96.
Searle, S. R. 1971. Linear Models. New York: John Wiley & Sons.
Smith, Herbert L. 2004. "Response: Cohort Analysis Redux." Pp. 111-19 in Sociological Methodology, edited by R. M. Stolzenberg. Oxford, UK: Basil Blackwell.
Yang, Yang, Wenjiang J. Fu, and Kenneth C. Land. 2004. "A Methodological Comparison of Age-Period-Cohort Models: Intrinsic Estimator and Conventional Generalized Linear Models." Pp. 75-110 in Sociological Methodology, edited by R. M. Stolzenberg. Oxford, UK: Basil Blackwell.
Yang, Yang, Sam Schulhofer-Wohl, Wenjiang J. Fu, and Kenneth C. Land. 2008. "The Intrinsic Estimator for Age-Period-Cohort Analysis: What It Is and How to Use It." American Journal of Sociology 113:1697-736.

Bio
Robert M. O'Brien is a Professor of Sociology at the University of Oregon. He specializes in criminology and quantitative methods and has published extensively in both areas. His recent publications include "Can Cohort Replacement Explain Changes in the Relationship Between Age and Homicide Offending?" (Journal of Quantitative Criminology, with Jean Stockard); "A Mixed Model Estimation of Age, Period, and Cohort Effects" (Sociological Methods and Research, with Ken Hudson and Jean Stockard); "The Age-Period-Cohort Conundrum as Two Fundamental Problems" (Quality and Quantity); and "Still Separate and Unequal? A City Level Analysis of the Black-White Gap in Homicide Arrest Since 1960" (American Sociological Review, with Gary LaFree and Eric Baumer).
