
Taylor Projection: A New Solution Method for Dynamic

General Equilibrium Models


Oren Levintal∗
Interdisciplinary Center (IDC) Herzliya, Israel
oren.levintal@idc.ac.il
May 11, 2018

Abstract
This paper presents a new solution method for dynamic equilibrium models. The solution is approximated by polynomials that zero the residual function and its derivatives at a given point x0. The algorithm is essentially a type of projection but is significantly faster, since the problem is highly sparse and can be easily solved by a Newton solver. The obtained solution is accurate locally in the neighbourhood of x0. Importantly, a local solution can be obtained at any point of the state space. This makes it possible to solve models at points that are further away from the steady state.

Keywords: Taylor projection, DSGE, Taylor series, perturbation, computational methods, inequality dynamics, high-order chain rules, automatic differentiation.

JEL classification: C61, C68, E12, E13, E17.


∗ I am grateful to Jesús Fernández-Villaverde for fruitful discussions and to the Economics Department
at the University of Pennsylvania for their kind hospitality. I also thank three anonymous referees for their
helpful comments and Dirk Krueger, Harold Cole, Guido Menzio, Izhar Neder, Russell Cooper, Kenneth Judd,
Elhanan Helpman, Yona Rubinstein, Nadav Levy, Wouter den Haan, Tony Smith and seminar participants
at the University of Pennsylvania, Tel-Aviv University, Haifa University, IDC, the 69th Econometric Society
European meetings and the 86th annual conference of the Southern Economic Association for their comments
on earlier drafts of the paper. All errors are solely mine.

Electronic copy available at: https://ssrn.com/abstract=2728858


1 Introduction
Dynamic general equilibrium models are growing in size and complexity. New features
that have been introduced into these models generate nonlinearities that cannot be handled
satisfactorily by standard perturbation solutions.2 Examples include models with rare dis-
asters (Fernández-Villaverde and Levintal 2017), models with crisis equilibria that occur far
from the steady state (Brunnermeier and Sannikov 2014, Boissay, Collard and Smets 2016),
models with occasionally binding constraints3 (Guerrieri and Iacoviello 2015) and others.
A common alternative to perturbation is projection (Judd 1992), which is highly flexible
and accurate. Yet, projection suffers from a severe “curse of dimensionality”, which limits
the possibility of studying large-scale models, and in particular estimating them. Recent
developments, surveyed by Maliar and Maliar (2014), have mitigated the problem, but pro-
jection is still computationally costly, as runtime ranges from minutes to hours for medium
to large-scale models (Maliar and Maliar 2015).
This paper presents a new type of projection method that is less sensitive to the curse of
dimensionality. The method is called “Taylor projection”, because it is a hybrid of Taylor-
based approximations and projection techniques. The proposed method differs from standard
projection in the type of information used to approximate the solution. Specifically, standard
projection approximates the solution by polynomials and finds the polynomial coefficients
Θ that zero the residual function R (x, Θ) across a grid of N points x ∈ {x1 , . . . , xN }.4 By
contrast, Taylor projection zeros the residual function R (x, Θ) and all the derivatives of the
residual function with respect to x at a given point x0 up to order k, by solving the following
system for Θ:

(1)   0 = R (x0 , Θ) = ∂R (x, Θ)/∂xj1 |x0 = ∂²R (x, Θ)/∂xj1 ∂xj2 |x0 = · · · = ∂^k R (x, Θ)/∂xj1 · · · ∂xjk |x0 ,
      ∀xj1 , . . . , xjk ∈ x.

When these conditions hold, the k-order Taylor series of R (x, Θ) about x0 is exactly zero.
It follows from the Taylor theorem that R (x, Θ) ≈ 0 in the neighbourhood of x0 , which
implies that the obtained polynomials are an approximate solution to the model in this
neighbourhood.
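As a minimal illustration of system (1), consider the toy functional equation g(x) = 0.5 g(x/2) + x (a made-up example, not from the paper). Its exact solution, g(x) = (4/3)x, is a first-order polynomial, so a first-order Taylor projection recovers it exactly. The sketch zeros the residual and its first derivative at an arbitrary point x0:

```python
import numpy as np

# Toy functional equation (hypothetical, for illustration only):
#   g(x) = 0.5 * g(x / 2) + x,   true solution g(x) = (4/3) x.
# Approximate g by ghat(x) = Theta0 + Theta1 * (x - x0) and impose (1):
# zero R(x) = ghat(x) - 0.5 * ghat(x / 2) - x and dR/dx at x0.
# Both conditions are linear in Theta here:
#   R(x0)   = 0.5 * Theta0 + 0.25 * x0 * Theta1 - x0 = 0
#   R'(x0)  = 0.75 * Theta1 - 1                      = 0
x0 = 2.0
A = np.array([[0.5, 0.25 * x0],
              [0.0, 0.75]])
b = np.array([x0, 1.0])
Theta = np.linalg.solve(A, b)

ghat = lambda x: Theta[0] + Theta[1] * (x - x0)
# Because the true solution is itself a first-order polynomial, the
# Taylor projection solution is exact for every x, not only near x0.
```

Note that the same two conditions could be imposed at any other x0, which is the sense in which the method is local but not tied to the steady state.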
2 For surveys of the existing solution methods see Fernández-Villaverde, Rubio-Ramírez and Schorfheide (2016) and Maliar and Maliar (2014). Earlier surveys can be found in Judd (1998), Marimon and Scott (1999) and Adda and Cooper (2003).
3 The algorithm proposed in this paper is not applicable to models with occasionally binding constraints.
4 If the system is overidentified or an exact solution does not exist, a least-squares algorithm is used (see Judd 1992).



Taylor projection bears similarity to perturbation methods (Judd 1998). Both methods
differentiate the equilibrium conditions and exploit this information to approximate the solu-
tion. The difference between the two methods is that perturbation is applicable only at the
deterministic steady state, whereas Taylor projection can be implemented at any arbitrary
point of the state space. The theoretical background of the two methods is also different. Per-
turbation builds on the implicit function theorem, which does not hold for the case of Taylor
projection. This paper presents an alternative theoretical foundation for Taylor projection.
It shows that the algorithm converges to the true solution in the neighbourhood of x0 (for
any arbitrary x0 ) as the approximation order k increases, under three regularity conditions.
First, the policy functions and the model conditions must be analytic. Second, the dynamics
of the state variables should be sufficiently gradual. Third, the derivatives of the true solution
at x0 should decay sufficiently fast. Under these conditions, the Taylor projection solution
converges to the Taylor series of the true solution about x0 .
Krusell, Kuruşçu and Smith (2002), henceforth KKS, propose an algorithm that is closely related to Taylor projection.5 Specifically, they approximate the solution by a k-order polynomial, evaluate the residual function at the steady state and differentiate it k times. KKS
use this algorithm to solve a deterministic model with quasi-geometric discounting, for which
the standard perturbation method is not applicable.6 The present paper differs from KKS in
two ways. First, KKS implement their algorithm at the steady state, which is in line with
the traditional perturbation approach. Indeed, they view their algorithm as “a variation on
the regular perturbation method” (KKS, p. 59). The present paper makes a more general
statement, backed by a theoretical argument, that Taylor projection can be implemented at
any arbitrary point of the state space. Interestingly, KKS acknowledge in the appendix of
their paper that their algorithm can be applied at points other than the steady state, but
they do not explore this possibility in their paper and do not provide a theory to support
this argument.7

5 I am grateful to Tony Smith for letting me know about this work.
6 For standard models, the KKS algorithm is identical to perturbation. However, in models with quasi-geometric discounting, a k-order system contains the (k+1)-order derivative of the solution. KKS assume that this derivative is zero, which is equivalent to assuming that the solution is a k-order polynomial, as done in Taylor projection. Moreover, KKS solve all the derivatives simultaneously. This is similar to Taylor projection, which solves the polynomial coefficients Θ simultaneously. By comparison, in perturbation the system is solved recursively, which is much easier; see Judd (1998).
7 To my knowledge, the first paper that applied the method at points other than the steady state is Levintal (2013), which is a preliminary version of the present paper. After this paper circulated, Den Haan, Kobielarz and Rendahl (2015) explored a similar approach and compared it to Coeurdacier, Rey and Winant (2011). Fernández-Villaverde and Levintal (2017) and Barro, Fernández-Villaverde, Levintal and Mollerus (2017) employ the algorithm proposed in the present paper to solve models with rare disasters. In particular, Barro, Fernández-Villaverde, Levintal and Mollerus (2017) solve the model at points that are extremely far away from the deterministic steady state.

Second, KKS study a model with only one variable. By doing so, they abstract from
the curse of dimensionality, which is currently the major constraint on the solution and
estimation of dynamic macroeconomic models. Importantly, solving a nonlinear system such
as (1) is extremely sensitive to the curse of dimensionality. The present paper resolves this
problem by developing new computational techniques that mitigate substantially the curse
of dimensionality. Consequently, the Taylor projection algorithm proposed in this paper is
significantly faster than standard projection methods.
The curse of dimensionality shows up in two stages of the algorithm. The first stage is
the computation of the high-order derivatives that constitute the nonlinear system (1). This
stage becomes extremely costly for large models or at high orders, because the complexity
of these derivatives grows very fast with the size of the model and the approximation order.
Note that standard projection methods do not compute high-order derivatives, hence they are
easier to implement in this respect. The second stage is the solution of the nonlinear system
(i.e. finding Θ). In large-scale models the vector Θ may contain thousands of parameters.
Solving these unknown parameters by standard algorithms would be too costly.
The present paper resolves these two computational problems, as explained in more detail
below. Interestingly, once these bottlenecks are removed, Taylor projection turns out to be
significantly faster than the most recently proposed projection algorithms. For instance, a
quadratic solution to a multi-country growth model with 20 state variables is obtained by
Taylor projection in only 1.4 seconds. By comparison, the EDS projection method proposed
by Maliar and Maliar (2015), which is among the most efficient projection methods currently
available, computes a quadratic solution to the same model in 180 seconds. The Smolyak
method of Judd, Maliar, Maliar and Vallero (2014) takes more than 3,000 seconds to yield
a “quadratic”8 solution to the same model. Thus, the computational gains of the proposed
method amount to orders of magnitude.
The main cost of Taylor projection is the computation of the high-order derivatives that
constitute the nonlinear system (1) and the Jacobian of this system (see below the discussion
on the Newton method and the Jacobian). The complexity of these derivatives increases
rapidly with the size of the model and the derivative order. This problem has long been ac-
knowledged in the computer science literature (Griewank and Walther 2008), which developed
efficient algorithms to compute derivatives by applying the chain rule. However, these algo-
rithms, called “automatic differentiation”, are relatively slow and memory intensive, which
precludes the solution of large-scale models.
To address this problem, the paper develops a new differentiation method that builds
on high-order chain rules. This method can be viewed as an extended version of automatic

8 More precisely, a second-level Smolyak approximation.

differentiation, which exploits not only the simple chain rule but also high-order chain rules.
The new differentiation method is faster than standard automatic differentiation in computing
high-order derivatives. Moreover, it is orders of magnitude more efficient in terms of memory
requirements. Thus, the implementation of high-order chain rules makes it possible to solve
large-scale models efficiently using standard computational and memory resources.
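A simple instance of what a high-order chain rule provides is the second-order Faà di Bruno identity for F(x) = f(g(x)): F″(x) = f″(g(x)) g′(x)² + f′(g(x)) g″(x). The sketch below (a textbook illustration only, not the differentiation method developed in the paper) checks it against a finite difference:

```python
import math

# Second-order chain rule for F(x) = f(g(x)):
#   F''(x) = f''(g(x)) * g'(x)**2 + f'(g(x)) * g''(x)
# Illustrated with f(u) = sin(u), g(x) = x**3 + 2x (made-up functions).

def F(x):
    return math.sin(x ** 3 + 2 * x)

x = 0.7
g = x ** 3 + 2 * x
gp = 3 * x ** 2 + 2            # g'(x)
gpp = 6 * x                    # g''(x)
# f'(u) = cos(u), f''(u) = -sin(u):
Fpp_chain = -math.sin(g) * gp ** 2 + math.cos(g) * gpp

# Cross-check against a second-order central finite difference:
h = 1e-5
Fpp_fd = (F(x + h) - 2 * F(x) + F(x - h)) / h ** 2
```

The point of exploiting such identities is that a high-order derivative of a composite function is assembled directly from low-order derivatives of the pieces, rather than by repeated application of the first-order chain rule.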
The second problem addressed in this paper is how to solve the nonlinear system (1).
A similar problem arises in standard projection methods, which also solve a large nonlinear
system. The literature on these methods has resorted to fixed-point iteration algorithms, thus
avoiding the computation of the Jacobian that is required in Newton solvers.9 Maliar and
Maliar (2014, p. 346) note that “Newton methods are fast and efficient in small problems but
become increasingly expensive when the number of unknowns increases. In high-dimensional
applications, we may have thousands of parameters in approximating functions, and the cost
of computing derivatives may be prohibitive.” Indeed, Fernández-Villaverde and Levintal
(2017) document rapidly growing computational costs when the Newton method is used to
compute a collocation solution.
Taylor projection offers a way to get around this problem. The paper shows that the
Jacobian of the nonlinear system (1) is highly sparse, if the approximating polynomials are
power series centered at x0 (the approximation point). Taking advantage of this sparsity
makes it feasible to apply the Newton method in large-scale models. Moreover, it is possible
to exploit certain features of the model dynamics to compute an approximate Jacobian that
is even sparser than the exact Jacobian. This Jacobian could reduce computational costs
dramatically for large models that are solved at high orders. Finally, it is shown that the Ja-
cobian is well-conditioned, because the nonlinear system captures heterogeneous information
on the model solution. These properties facilitate the implementation of the Newton method.
The key advantage of the Newton method is the convergence of the computational al-
gorithm provided that a good initial guess is available.10 By comparison, the fixed-point
iteration algorithm used in grid-based methods (Maliar and Maliar 2014) does not rely on a
convergence result, hence it may not converge even for a good initial guess, as discussed in
Maliar and Maliar (2015). Since a good initial guess is usually available through perturbation,
the convergence property is an important advantage of the Newton method over fixed-point
iteration.
Taylor projection can be useful in several cases. First, it makes it possible to obtain a perturbation
solution at the steady state, when standard perturbation methods are not applicable. This
is the case of quasi-geometric discounting studied by Krusell, Kuruşçu and Smith (2002).
9 Early implementations of projection methods have used the Newton method, e.g. see Judd (1992).
10 A good initial guess is necessary but not sufficient. For necessary and sufficient conditions see Judd (1998, p. 168) and his examples of failure of the Newton method (p. 153).

Second, models with strong volatility may be difficult to solve by standard perturbation.
For instance, Fernández-Villaverde and Levintal (2017) employ the algorithm proposed in
the present paper to solve New Keynesian models with Epstein-Zin preferences and variable
disaster risk with up to 12 state variables. Perturbation methods are not accurate for these
models due to the strong nonlinearity generated by the disaster risk.
The present paper demonstrates the performance of the method in regions that are far
away from the steady state. Solutions of this kind are required in models of inequality dy-
namics. The paper studies the capital dynamics of poor and rich countries. Specifically,
a multi-country growth model is solved and simulated from a point of high cross-country
inequality. At the initial state, the richest country is endowed with the steady-state level of
capital, while the poorest country is endowed with 10% of the steady-state level. Perturba-
tion solutions are extremely inaccurate in this region and often exhibit explosive paths. By
contrast, the proposed method produces accurate results, since the model is solved locally at
several points along the simulation path.
Finally, the choice of a solution method is always a matter of tradeoff between speed
and accuracy. Taylor projection is faster than standard projection methods, but it delivers
only local accuracy. If global accuracy is important, then global methods would be more
appropriate. Similarly, in small models the curse of dimensionality is not critical, hence the
gain in accuracy of global methods may be worth the extra cost. Moreover, Taylor projection
involves high programming costs, which makes it less flexible than standard projection meth-
ods. Fortunately, most economic models fit into a fixed structure, hence these programming
costs are borne only once. To reduce entry costs, the companion MATLAB codes implement
the proposed algorithm on a large class of dynamic models.11 Hopefully, future research will
extend these codes to other classes of models.
The paper proceeds as follows: Section 2 describes the algorithm. Section 3 compares its
performance to the latest projection methods. Section 4 presents the differentiation method.
Section 5 discusses the sparsity of the Jacobian. Numerical stability is discussed in section
6. Section 7 implements the algorithm on a multi-country growth model and simulates
the convergence of the model from high cross-country inequality to full equality. Section 8
concludes. The online appendix provides further details.

2 Taylor Projection
A dynamic general equilibrium model is defined by the following system of functional
equations:
11 The codes are available at https://sites.google.com/site/orenlevintal/research.

(2) Et f (yt+1 , yt , xt+1 , xt ) = 0,
(3) xt+1 = h (xt ) + ηǫt+1 ,
(4) yt = g (xt ),
(5) h (xt ) = [ h̃ (xt ) ; Φ (x2t ) ],    η = [ 0 ; η̃ ].

This notation follows Schmitt-Grohé and Uribe (2004). yt is a vector of ny control variables and xt = (x1t , x2t )ᵀ is a column vector of nx state variables, where x1t denotes the n1x predetermined endogenous variables and x2t denotes the n2x exogenous variables. The functions f : R^{2(ny+nx)} → R^{nf} and Φ : R^{n2x} → R^{n2x} are known, where nf = n1x + ny. The functions g : R^{nx} → R^{ny} and h̃ : R^{nx} → R^{n1x} are unknown (implicit functions). η is a known nx × nǫ matrix, its lower nonzero block η̃ is of size n2x × nǫ, and ǫt+1 is a vector of nǫ zero-mean shocks. Et is the expectation operator conditional on information known in period t. For notational ease, the endogenous and exogenous state variables are stacked together in the vector xt and their expected t + 1 value is given by the function h (xt ). It should be clear that only the first n1x rows of h are unknown.12
Let ĝ (x) and ĥ (x) denote the following approximating functions:

(6) ĝ (x) = Σ_{i=0}^{k} Ĝi (x − x0 )^{⊗i} ,

(7) ĥ (x) = [ Σ_{i=0}^{k} Ĥi (x − x0 )^{⊗i} ; Φ (x2t ) ] ,

where Ĝi denotes a matrix of coefficients of size ny × (nx )^i and Ĥi denotes a matrix of coefficients of size n1x × (nx )^i . The notation ⊗i denotes a “Kronecker power”, that is, x^{⊗i} = x ⊗ · · · ⊗ x (i times), where x^{⊗1} = x is a column vector and x^{⊗0} = 1 is a scalar.
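The Kronecker-power notation can be illustrated with a short sketch (the coefficient matrices below are made up; only the notation is from the text):

```python
import numpy as np

def kron_power(x, i):
    """Kronecker power x^(⊗i) = x ⊗ ... ⊗ x (i times); x^(⊗0) = 1 (a scalar)."""
    out = np.array([1.0])
    for _ in range(i):
        out = np.kron(out, x)
    return out

x = np.array([1.0, 2.0])          # nx = 2
x2 = kron_power(x, 2)             # length nx^2 = 4: [1, 2, 2, 4]
# Note the symmetric duplicate: the cross term x1*x2 appears twice,
# which is why symmetric monomials can be dropped in practice.

# Evaluating ghat(x) = sum_i Ghat_i (x - x0)^(⊗i) with hypothetical
# coefficient matrices Ghat_i of size ny x nx^i (here all ones):
ny, k = 1, 2
x0 = np.zeros(2)
Ghat = [np.ones((ny, 2 ** i)) for i in range(k + 1)]
ghat = sum(Ghat[i] @ kron_power(x - x0, i) for i in range(k + 1))
```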
The approximating functions ĝ (x) and ĥ (x) are composed of k-order power series centered
at x0 .13 These functions approximate g (x) and h (x), respectively. Note that ĥ needs to
approximate only the upper block of h, since the lower block is the known function Φ. For
notational convenience, ĥ is treated as the approximating function of the entire vector h.

12 Note that (2)-(5) do not allow for occasionally binding constraints.
13 It is possible to choose any other basis function, but these power series yield a highly sparse Jacobian (see section 5).

Furthermore, the power series include symmetric monomials that can be dropped, but for
notational ease I proceed with the Kronecker forms (6)-(7).14
Stack the coefficients of the power series into vector Θ, and denote the approximating
functions by ĝ (x, Θ) and ĥ (x, Θ). Substitute in (2)-(5) to form the residual function (time
subscripts are dropped):

(8) R (x, Θ) = E f ( ĝ (ĥ (x, Θ) + ηǫ, Θ) , ĝ (x, Θ) , ĥ (x, Θ) + ηǫ, x ) .

Algorithm. Taylor Projection: Given a finite order k and a point x0 in the state space,
find Θ that satisfies (1) in the least-squares sense, where the residual function R is defined
by (8) and the approximating functions are given by (6)-(7).

The following assumptions are sufficient to prove that the Taylor projection algorithm
converges to a local solution as k goes to infinity:

Assumption 1. The functions f , g and h are analytic.

Assumption 2. The infinite Taylor series of g (x) about x0 , denoted g T (x), satisfies:

(9) g (x) = g T (x) and g (h (x) + ηǫ) = g T (h (x) + ηǫ)


∀x in the neighbourhood of x0 and ∀ǫ ∈ {ǫ : Pr (ǫ) > 0}.

Assumption 3. The k-order derivatives of g (x) at x0 converge sufficiently fast to zero as k → ∞.

Under these regularity assumptions, the Taylor projection algorithm converges to a local
solution, as stated in the following theorem:

Theorem 1. Given the dynamic general equilibrium model (2)-(5), if assumptions 1-3 hold,
then ĝ and ĥ obtained by the Taylor projection algorithm converge to the Taylor series of g
and h about x0 as k → ∞.

Proof. See the online appendix. ∎

Assumption 1 requires that the policy functions and the model conditions be analytic.
This condition is often satisfied, since economic models are usually smooth. A similar condi-
tion holds for perturbation methods and therefore any model that can be solved by pertur-
bation satisfies it. If the policy functions have kinks, then the model must be defined such
14 The MATLAB code excludes symmetric monomials.

that assumption 1 holds. For instance, the kink can be approximated by an analytic penalty
function, as in Dewachter and Wouters (2014). Alternatively, the kink can be modelled as
the intersection point of two analytic functions, as in Christiano and Fisher (2000).15
Assumption 2 is central. It requires that states xt and xt+1 be sufficiently close to each
other. Under this condition, the policy function g is equal to its Taylor series over the
domain {xt , xt+1 }. Namely, xt and xt+1 lie within the convergence domain of the Taylor series
g T (x). This condition has an economic interpretation. It implies that the model dynamics is
sufficiently gradual. This assumption is likely to hold in many economic applications, because
economic dynamics tends to be gradual. Assumption 2 may fail to hold if the model generates
big jumps in the state variables.
Assumption 3 is more technical. It requires that the Taylor series of g about x0 be
approximately a low-order polynomial. Namely, the high-order Taylor coefficients become
negligible sufficiently fast. If this condition does not hold, a change of variables may attain
it. For example, the derivatives of g (x) = eˣ do not converge to zero, but the derivatives of log g (x) or √g (x) do. Note that when the true solution is an exact k-order polynomial, then a k-order Taylor projection yields the exact solution, since (1) holds exactly for that polynomial (see Levintal 2013 for an example).
Taylor projection is a local solution, because it converges to the Taylor series of the true
solution. The sense of locality is somewhat different than the standard meaning of locality.
A local solution near x0 usually means that the solution is exact at x0 and gets less accurate
away from x0 . This is not true here. The solution is not exact at x0 and does not necessarily
deteriorate around x0 . However, if we consider the residual function R (x, Θ), then this
function is locally zero around x0 by the standard meaning of locality. Specifically, under the
Taylor projection solution the residual function is exactly zero at x0 and gets further away
from zero as we move away from x0 .
By comparison, in a global method the residual function is close to zero globally over a
domain. For instance, in collocation the residual function is exactly zero at the Chebyshev
nodes, and gets closer to zero across the entire domain as the number of nodes increases
(Judd 1992). Taylor projection cannot achieve such a global accuracy. At best, the solution
would converge to its Taylor series, so accuracy is always limited to the convergence domain
of the Taylor series. However, since the method can be implemented at any point of the state
space, a globally accurate solution can be obtained by solving the model locally at multiple
points.

15 This introduces computational issues that are beyond the scope of this paper. I leave these issues for future research.

2.1 Taylor projection versus perturbation
Perturbation is a special case of Taylor projection, because it satisfies the same conditions
specified in (1) for an extended version of the model. To see this, let σΣ be the variance-
covariance matrix of ǫt+1 , where σ is a perturbation parameter. Define the extended vector
xt = (x1t , x2t , σ), which includes the endogenous state variables x1t , the exogenous state vari-
ables x2t and the perturbation parameter σ. For this extended model, the residual function
R (x, Θ) is a function of the state variables and the perturbation parameter σ.
It can be easily verified that the Taylor projection conditions (1) are identical to the
perturbation conditions, provided that x0 = (x̄1 , x̄2 , 0), where x̄1 and x̄2 are the steady
state values of the respective state variables.16 Hence, a k-order perturbation solution is a
solution to the Taylor projection algorithm, provided that: (1) x is extended to include the
perturbation parameter; (2) the residual function and its derivatives are evaluated at the
deterministic steady state; and (3) the volatility of the shocks is set to zero (σ = 0). Taylor
projection is more general than perturbation, because: (1) it does not require the derivatives
w.r.t the perturbation parameter to be zero; (2) the residual function and its derivatives can
be evaluated at points other than the steady state; and (3) the volatility of the shocks can
be nonzero, i.e. σ > 0.
This analysis suggests two cases where Taylor projection can potentially be more accurate
than perturbation. The first case is when the model is solved at points that are far away
from the steady state. This case is studied in section 7. The second case is when the
model volatility is strong. In this case, perturbation may fail to deliver accurate results,
because the approximation point assumes zero volatility. By comparison, the approximation
point of Taylor projection assumes the true volatility, so the solution is likely to be more
accurate. Fernández-Villaverde and Levintal (2017) demonstrate this case for models with
rare disasters.
The two methods yield identical results if the model is deterministic and solved at the
steady state. In this case, Taylor projection is equivalent to perturbation, because the model
has no volatility and the approximation point is the steady state for both methods. Similarly,
models with low volatility solved at the steady state, such as the multi-country model studied
in the next section, would yield similar results, because the gain of using the true volatility
is negligible.

16 For instance, under the perturbation solution, the residual function is exactly zero at the steady state with zero volatility, i.e. at the point x0 = (x̄1 , x̄2 , 0). This is implied by the first equality in (1) evaluated at x0 = (x̄1 , x̄2 , 0). Furthermore, under the perturbation solution, the first derivatives of the residual function w.r.t. the state variables and the perturbation parameter must be zero at x0 = (x̄1 , x̄2 , 0). This is implied by the second equality in (1). Similar arguments apply to higher orders.

2.2 Taylor projection versus standard projection
Projection methods are defined by Judd (1992) as solutions that satisfy projection conditions of the form:

∫_{x∈D} R (x, Θ) pi (x) w (x) dx = 0,   ∀i = 1, . . . , N,

where the functions pi (x) for i = 1, . . . , N are the directions of the projection and w (x) is a
weighting function.
Interestingly, Taylor projection can be interpreted as a special case of projection. As shown by Judd (2003), the nonlinear system (1) can be defined as follows (for simplicity, I show the case that x is scalar):

0 = ∫_{x∈D} R (x, Θ) δ (x) dx = ∫_{x∈D} R (x, Θ) δ′ (x) dx = · · · = ∫_{x∈D} R (x, Θ) δ^{[k]} (x) dx,

where δ (x) is the Dirac delta function with a unit mass at x0 , and δ ′ , δ ′′ , . . . , δ [k] are the first,
second to kth derivatives of δ (x).17 In this form, the Taylor projection method fits into the
general definition of projection, where the projection directions are the Dirac delta function
and its derivatives.18
A key advantage of Taylor projection over other projection methods is that a Newton
solver can be applied easily to find the unknown coefficients Θ, for two reasons.
First, the Jacobian of the system is highly sparse and well-conditioned, as discussed in sections
5 and 6. Second, a good initial guess is usually available through perturbation, which is a
special case of Taylor projection. Therefore, it is always possible to solve the model at the
non-stochastic steady state by perturbation (assuming that a perturbation solution exists).
Once the model is solved at that point, the solution can serve as an initial guess for an
adjacent point. Proceeding to further adjacent points in small steps yields a good initial
guess at the point of interest.
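The stepping scheme just described can be sketched as follows. The functional equation below is a made-up toy (not a model from the paper); the point is only the mechanics of re-solving the Taylor projection conditions at a sequence of approximation points, reusing each solution as the initial guess for the next:

```python
import numpy as np
from scipy.optimize import fsolve

# Toy functional equation (hypothetical): g(x) = log(1 + x + g(x / 2)),
# approximated to first order by ghat(z) = t0 + t1 * (z - x0).
# At each approximation point x0 we zero R and dR/dx at x0 (two
# conditions, two unknowns), as in system (1) with k = 1.

def conditions(theta, x0):
    t0, t1 = theta
    ghat = lambda z: t0 + t1 * (z - x0)
    R = lambda x: ghat(x) - np.log(1 + x + ghat(x / 2))
    h = 1e-6
    return [R(x0), (R(x0 + h) - R(x0 - h)) / (2 * h)]

theta = np.array([0.5, 0.5])              # crude guess at the first point
for x0 in np.linspace(0.0, 10.0, 21):     # step x0 toward the target point
    theta = fsolve(conditions, theta, args=(x0,))
```

Each Newton solve starts from the previous local solution, so even at the final point, far from the starting point, the solver has a good initial guess.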

17 Integration by parts implies that ∫ₐᵇ f (x) g′ (x) dx = f (b) g (b) − f (a) g (a) − ∫ₐᵇ f′ (x) g (x) dx. If g (x) is the density function of a Gaussian distribution with mean x0 ∈ (a, b) and standard deviation ǫ, then taking ǫ to zero yields at the limit ∫ₐᵇ f (x) δ′ (x) dx = −∫ₐᵇ f′ (x) δ (x) dx = −f′ (x0 ). Extension to higher orders and to multi-dimensional functions is straightforward, see Hörmander (1990, p. 56).
18 There is nothing special in using the Dirac delta function as a projection direction. This function is used also in collocation, see Judd (1992).

2.3 Implementation issues
Like other solution methods, the Taylor projection algorithm does not guarantee that the
obtained solution is a good approximation of the true solution. It is critically important to
verify by some measure of accuracy that the obtained solution satisfies approximately all the
equilibrium conditions over the domain of interest. Moreover, if the true model has multiple
solutions, finding all the solutions may be difficult. In this case, the Newton solver should be
employed from different starting values.
The algorithm may fail to find a solution in several instances. First, the Newton solver may
fail if the initial guess is not sufficiently good, or if the conditions of the Newton algorithm do
not hold. For instance, if there is a continuum of solutions or if the Jacobian is not invertible
near the solution, the Newton method will fail to converge. In those cases, a different solver
should be used. Second, models with large variations in the state variables may be difficult to
solve by low-order approximations. Recall that the nonlinear system (1) is evaluated at one
point x0 , which is the current state. In this system, the values of the control variables at the
future state are extrapolated by the approximating polynomials. If the future state is close to
the current state, low-order polynomials may extrapolate the solution accurately. However,
if the future state is distant, then higher-order approximations may be required to get a reasonably accurate extrapolation.19 If the distance is even larger, it may violate assumption 2, and the algorithm would always fail. Hence, the distance between the current and future states is an important determinant of the performance of the algorithm.
There are cases where the algorithm delivers an accurate solution when applied at a
certain state, but fails to find a solution at a different state. Models with rare disasters are
a case in point. In these models the economy can be in normal or disaster states, denoted x^N_t and x^D_t, respectively. When the algorithm is applied at a normal state x^N_t, the likelihood of staying in a normal state, i.e. Pr(x_{t+1} = x^N_{t+1} | x_t = x^N_t), is high. Hence, the current and future states are very similar (with high probability). In this case, the algorithm is likely to work smoothly, because the Taylor series at x^N_t approximates well the future state, which is likely to be x^N_{t+1}. The approximation error that occurs at a future disaster state x^D_{t+1} is negligible in expected value, because Pr(x_{t+1} = x^D_{t+1} | x_t = x^N_t) is low.
Conversely, the algorithm may fail when applied at a disaster state. In this case, we are
approximating the Taylor series about the current disaster state xDt . Since the next period
N D
is likely to be a normal state xt+1 , the Taylor series about xt needs to approximate well
the future normal state xN N
t+1 , which is distant. If the approximation error at xt+1 is large, it
remains large also in expected value, because xN t+1 is of high probability, which contaminates

19 Footnote 37 discusses a particular case, where the approximation fails at the first order, but succeeds at higher orders.
the obtained solution. Usually, having an accurate solution at a normal state is satisfactory,
because the most interesting effects of rare disasters take place in normal states through asset
prices and precautionary savings. However, if one is interested in a highly accurate solution
also at the disaster state, then the disaster shock should be modelled as a persistent process,
so that the recovery from a disaster state is more gradual. For further discussion on disaster
models see Fernández-Villaverde and Levintal (2017).

2.4 A simple example


To illustrate the Taylor projection method, consider a simple neoclassical growth model
with a fixed technology. The model conditions are given by:

    f = [ f^1 ]  =  [ K^α + (1 − δ) K − C − K′                ]  = 0,
        [ f^2 ]     [ β (C/C′)^γ (α (K′)^{α−1} + 1 − δ) − 1   ]


where K and C denote capital and consumption in the current period and ′ denotes the next-period value. The budget constraint and the Euler condition are denoted f^1 and f^2, respectively.
For simplicity, consider a first-order approximation of the endogenous variables C and K ′ ,
denoted by the first-order polynomials ĝ and ĥ:

ĝ (K) = G0 + G1 (K − K0 )
ĥ (K) = H0 + H1 (K − K0 ) ,

where G0 , H0 , G1 , H1 are (unknown) scalar coefficients. These coefficients are stacked in


vector Θ:

Θ = (G0 , H0 , G1 , H1 ) .

Substitute C = ĝ (K) and K ′ = ĥ (K) to form the residual function:

 
    R(K) = [ R^1(K) ]  =  [ K^α + (1 − δ) K − ĝ(K) − ĥ(K)                      ]
           [ R^2(K) ]     [ β (ĝ(K) / ĝ(ĥ(K)))^γ (α ĥ(K)^{α−1} + 1 − δ) − 1   ] .

The residual function is a 2 × 1 vector, whose elements are denoted R^1 and R^2.


The nonlinear system is obtained by differentiating the residual function w.r.t. K, evaluating at K0 and equating to zero:

 
    [ R^1(K0)          ]
    [ R^2(K0)          ]
    [ ∂R^1/∂K |_{K0}   ]  =  0.
    [ ∂R^2/∂K |_{K0}   ]

This system has 4 equations and 4 unknowns, which are the polynomial coefficients in Θ.
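This small system can be solved directly with an off-the-shelf Newton-type solver. Below is a minimal Python sketch (the paper's companion code is in MATLAB; the parameter values here are hypothetical and x0 is taken to be the deterministic steady state):

```python
# A minimal sketch of the Taylor projection example above. We solve for the
# four coefficients Theta = (G0, G1, H0, H1) that zero the residual function
# and its first derivative at K0. Parameter values are hypothetical.
import sympy as sp

alpha, beta, delta, gamma = 0.36, 0.99, 0.025, 2.0
K, G0, G1, H0, H1 = sp.symbols('K G0 G1 H0 H1')

# expansion point K0: here, the deterministic steady-state capital stock
K0 = (alpha * beta / (1 - beta * (1 - delta))) ** (1 / (1 - alpha))

ghat = G0 + G1 * (K - K0)          # consumption policy C = ghat(K)
hhat = H0 + H1 * (K - K0)          # capital policy K' = hhat(K)
Cnext = ghat.subs(K, hhat)         # next-period consumption ghat(hhat(K))

R1 = K**alpha + (1 - delta) * K - ghat - hhat                  # budget constraint
R2 = (beta * (ghat / Cnext)**gamma
      * (alpha * hhat**(alpha - 1) + 1 - delta) - 1)           # Euler residual

# nonlinear system: residuals and their K-derivatives, evaluated at K0
system = [e.subs(K, K0) for e in (R1, R2, sp.diff(R1, K), sp.diff(R2, K))]

Css = K0**alpha - delta * K0       # steady-state consumption, used in the guess
sol = sp.nsolve(system, (G0, G1, H0, H1), (Css, 0.03, K0, 0.97))
print(sol)                         # Theta = (G0, G1, H0, H1)
```

Because the system is evaluated at the steady state and the initial guess is close, the Newton iterations of `nsolve` converge in a handful of steps, mirroring the behaviour reported for the full algorithm.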

3 Comparison to other Projection Methods


This section compares the performance of the proposed method to the latest projection
methods studied in Maliar and Maliar (2015) and Judd, Maliar, Maliar and Valero (2014). It
is shown that the proposed method is faster than these two projection methods by orders of
magnitude. The model used as an example is the standard multi-country growth model with
up to 20 state variables. A detailed discussion on the computational aspects of the proposed
algorithm is delayed to the following sections.
Maliar and Maliar (2015) and Judd, Maliar, Maliar and Valero (2014) solve the multi-
country growth model over the ergodic set using two projection methods and test their
performance in terms of speed and accuracy. For comparability with their results, the same
model and accuracy and speed measures are used here. The multi-country growth model
has been used in a number of other studies to compare the performance of various solution
methods, e.g. see Kollmann, Maliar, Malin and Pichler (2011), Maliar, Maliar and Judd
(2011), Maliar, Maliar and Villemot (2013).
It should be made clear that the focus of this section is on computational costs. The multi-
country model is not a challenging model for testing accuracy, because the nonlinearity of the
model is relatively weak. This model can be solved very accurately by simple perturbation
methods (see Table 1). However, for the purpose of comparing computational costs across
different solution methods the model is very useful, because it allows testing computational costs on models of arbitrary size, simply by changing the number of countries. For this
reason, it has been used repeatedly in the previous literature. I follow the same practice here.
Accuracy will be tested later in section 7 in a more challenging setup, where a version
of the multi-country model will be simulated from an initial point of high cross-country
inequality. This requires solving the model far from its steady state, a region where simple perturbation methods deliver inaccurate results and often produce explosive simulations.
Fernández-Villaverde and Levintal (2017) use the proposed method to solve a new Keynesian model with rare disasters, which is another example of a model that cannot be solved accurately by perturbation.
The multi-country growth model consists of a social planner that maximizes an aggregate
welfare function over N countries:

    max_{ {c_t^h, i_t^h}, h=1,...,N, t=0,...,∞ }   E0 Σ_{h=1}^{N} Σ_{t=0}^{∞} β^t u(c_t^h),

subject to the aggregate resource constraint:

    Σ_{h=1}^{N} c_t^h + Σ_{h=1}^{N} i_t^h = Σ_{h=1}^{N} a_t^h A f(k_t^h),

and the evolution law of capital and technology:

    k_{t+1}^h = (1 − δ) k_t^h + i_t^h
    log a_{t+1}^h = ρ log a_t^h + ϵ_{t+1}^h + ϵ_{t+1},    ∀h = 1, . . . , N,

where k_t^h, a_t^h, c_t^h and i_t^h denote the capital, technology, consumption and investment, respectively, of country h in period t, δ denotes the depreciation rate, ρ is the persistence of technology shocks, ϵ_t^h is a country-specific shock and ϵ_t is an aggregate shock, where all shocks are independently normally distributed with zero mean and standard deviation σ. The functions u and f denote the utility function and the production function, respectively, which are assumed to have the standard functional forms:

    u(c) = c^{1−γ} / (1 − γ),    f(k) = k^α.
The number of state variables is 2N, where N is the number of countries. Parameter values
are taken from Maliar and Maliar (2015): γ = 1, α = .36, β = .99, δ = .025, ρ = .95 and
σ = .01.
The first projection method studied is the EDS method described by Maliar and Maliar
(2015), as implemented by their MATLAB codes. The method starts by obtaining a linear
solution through the simulation method (GSSA) of Judd, Maliar and Maliar (2011). Using
the GSSA solution, the model is simulated for T periods, of which M points are chosen to
construct the EDS grid.20 Then, a fixed point iteration algorithm finds the k-order polynomials that solve the model over the grid.

20 Maliar and Maliar (2015) advise to choose M that is "slightly larger" than the number of coefficients in

The second projection method is the Smolyak method
of Judd, Maliar, Maliar and Valero (2014), as implemented by their MATLAB codes. This
method also starts with a cheap GSSA procedure in order to obtain a linear solution, which
is used to construct a Smolyak grid over the ergodic set. Then, a fixed-point iteration algo-
rithm finds the Smolyak polynomials that solve the model over the grid. The third projection
method is the proposed method of Taylor projection. It is implemented at the steady state
using a perturbation solution for the initial guess. For all methods, the expected values of the
Euler conditions are approximated by monomial rules with 2n_ϵ nodes (for n_ϵ shocks), and the
convergence criterion is set at 1e-8, as in Maliar and Maliar (2015).21 For comparability, the
Table reports also the results of perturbation solutions obtained by the algorithm described
in Levintal (2017). The codes are written in MATLAB/MEX and run on a Dell computer
with an Intel(R) Core(TM) i7-5600U Processor and 16GB RAM.
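The 2n_ϵ monomial rule used for the expectations has a simple closed form. A sketch of the standard construction (nodes on the coordinate axes; this is the textbook rule, not taken from the papers' codes):

```python
# Sketch of a 2n monomial integration rule for E[g(eps)], eps ~ N(0, sigma^2 I_n):
# nodes at +/- sqrt(n)*sigma along each coordinate axis, each with weight 1/(2n).
# This standard construction integrates polynomials up to degree three exactly.
import numpy as np

def monomial_rule_2n(n, sigma):
    radius = np.sqrt(n) * sigma
    nodes = np.vstack([radius * np.eye(n), -radius * np.eye(n)])  # 2n x n
    weights = np.full(2 * n, 1.0 / (2 * n))
    return nodes, weights

nodes, weights = monomial_rule_2n(3, sigma=0.01)
print(weights @ nodes)     # recovers E[eps]   = 0
print(weights @ nodes**2)  # recovers E[eps^2] = sigma^2 per coordinate
```

With n_ϵ shocks the expectation in each Euler condition is thus a weighted sum over only 2n_ϵ evaluations, which keeps the cost linear in the number of shocks.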
Table 1 reports runtime and accuracy measures for each solution method for orders 1, 2 and
3. For the Smolyak method, the “order” of the solution refers to the Smolyak approximation
level (see Judd, Maliar, Maliar and Valero 2014). For the other methods, it is the order of
the approximating polynomials, which are complete polynomials. The initial guess is always
a complete polynomial, and its order is also reported. The table reports total runtime (in
seconds), including time required for constructing the grid (when relevant). In addition,
the number of iterations to convergence is reported. For the EDS and Smolyak methods,
number of iterations refers only to the main computational algorithm (excluding the GSSA
initial step). Finally, accuracy measures include maximum and mean absolute unit-free errors
across a simulation of 10,000 periods (presented in log10 units).
(Table 1 about here)
The first model shown in Table 1 is a two-country growth model, which has 4 state
variables. The EDS method solves the model in 23 to 31 seconds. The Smolyak method
takes between 22 and 62 seconds. The EDS and Smolyak methods use a fixed-point iteration
algorithm to solve the polynomial coefficients, starting from an initial linear guess. The
number of iterations required for convergence is roughly three thousand for the EDS method
and four and a half thousand for the Smolyak method. Since these methods do not compute
the Jacobian, the runtime of each iteration is very fast, although the total number of iterations
is large.
Consider now the Taylor projection algorithm. Table 1 presents two different implementa-
tions of the algorithm that are labelled “Exact Jacobian” and “Approximate Jacobian”. The
“Exact Jacobian” algorithm solves the nonlinear system (1) by a standard Newton method

the basis function. Hence, M is set at 20 percent above the number of polynomial coefficients. T is set at
10,000.
21 The preliminary GSSA step in the EDS and Smolyak methods uses a looser criterion of 1e-5.
with an exact analytic Jacobian, starting from a perturbation solution. The results show that
the algorithm converges in only 2-3 iterations, totalling 0.1-0.2 seconds, if the initial guess is
of the same order as the final solution. If the initial guess is a first-order perturbation solu-
tion, it takes 3-4 iterations to converge. Namely, when the initial guess is less accurate, more
iterations are required, as expected. In any case, the number of Newton iterations required
for convergence is small and stable across all specifications.
The accuracy measures of Taylor projection are almost indistinguishable from perturba-
tion of a similar order.22 This is not surprising. As explained in subsection 2.1, if Taylor
projection is implemented at the deterministic steady state (as done here), it differs from
perturbation only if the volatility of the model is significant. If volatility is zero, the two
methods are identical, and if it is small (as in the current multi-growth model) the differences
are small.
Compared to the EDS and Smolyak methods, the accuracy measures of Taylor projection
and perturbation are somewhat lower, but are still very good. The (log10) mean error ranges
from -3.1 to -6.5 for all models. The maximum error is also quite small. For the second and
third-order solutions, the max error ranges from -2.8 to -4.6. The first-order solution exhibits
somewhat larger errors, with the maximum error reaching -1.2 for the two-country model.
Given the high accuracy of perturbation and its fast runtime, it is clear that perturbation is
the preferable solution method for this model, which is consistent with similar findings on the
neoclassical growth model reported in the previous literature (Aruoba, Fernández-Villaverde
and Rubio-Ramı́rez 2006). Therefore, the rest of this section focuses on the computational
costs of the projection methods. Accuracy comparison will be performed later in section 7,
which studies a model that cannot be solved accurately by perturbation.
The differences in computational costs between the projection methods are large. Most
relevant is the performance of these methods in large-scale models, such as the 10-country
model with 20 state variables. A quadratic solution to this model is obtained in 180 seconds by
the EDS method and 3,028 seconds by the Smolyak method. By contrast, Taylor projection
produces a quadratic solution in 3.5 seconds if the initial guess is a second-order perturbation
solution, or in 5.3 seconds if the initial guess is a first-order solution. Similar runtime figures
are reported in section 7 for a model with capital adjustment costs, which is solved far from
the steady state.
Taylor projection is fast because the Jacobian is highly sparse, as discussed in section 5.
Hence, the computational costs of the Jacobian and the Newton step are low. It is possible
to reduce computational costs further by using an approximate Jacobian. The approximate
Jacobian is also an analytic Jacobian, but it is sparser than the exact Jacobian, because

22 The differences between the accuracy measures show up only at the fourth decimal digit.
elements of the Jacobian that are close to zero are assumed to have a zero value and hence
not computed (a detailed discussion is delayed to section 5). The approximate Jacobian is
useful mainly in large models. For instance, the 10-country model is solved quadratically in
5.3 seconds using an exact Jacobian and a linear initial guess. In the case of the approximate
Jacobian, runtime falls to 1.4 seconds. The computational gain is even more pronounced for a
third-order solution, which takes 4,514 seconds with an exact Jacobian, but only 175 seconds
with the approximate Jacobian.
To summarize, the proposed Taylor projection algorithm offers significant computational
gains over the currently available projection methods. It is able to obtain a quadratic solution
to a 10-country model in less than 2 seconds, which is orders of magnitude faster than the grid-
based projection methods considered here. Further evidence on the computational advantage
of Taylor projection over standard projection is documented in Fernández-Villaverde and
Levintal (2017) for a new Keynesian model with Epstein-Zin preferences and variable disaster
risk. They compare Taylor projection to a Smolyak collocation, where both methods employ
a Newton solver to solve the nonlinear system. Interestingly, Taylor projection is found to
be not only faster than the Smolyak algorithm, but also more accurate (for a comparable
solution order).
A final note on convergence is in order. The grid-based methods studied in this section
solve the nonlinear system using a fixed-point iteration algorithm. This solver accelerates
significantly the performance of grid-based methods relative to previous algorithms (Judd,
Maliar, Maliar and Valero 2014). The drawback is that the algorithm has to be designed
carefully in order to obtain convergence. This requires a fair bit of trial and error and yet
convergence is never guaranteed, even if we start near the solution (Maliar and Maliar 2015).
By comparison, Taylor projection employs the Newton method, which under certain regular-
ity conditions yields a quadratic convergence rate, provided that the initial guess is sufficiently
accurate. In the applications studied here and in Fernández-Villaverde and Levintal (2017),
perturbation solutions proved to be satisfactory, and convergence was attained in a few iter-
ations, demonstrating the robustness of the algorithm. Furthermore, since we compute the
Jacobian, we can apply more advanced Newton solvers that are less sensitive to the initial
guess.23

23 The companion MATLAB code is compatible with the MATLAB functions fsolve and lsqnonlin. These functions employ advanced nonlinear solvers and least squares algorithms.
4 Differentiation Costs
The main cost of Taylor projection is the computation of the high-order derivatives of the
residual function R (x, Θ). The nonlinear system (1) is composed of derivatives of R with
respect to x, up to order k. In addition, the Jacobian requires computing the derivatives of the nonlinear system with respect to Θ. Hence, a k-order solution requires differentiating the residual function k + 1 times.
Taking derivatives cannot be done manually in large and complicated economic mod-
els. A similar problem arises in high-order perturbation solutions. Fernández-Villaverde,
Rubio-Ramı́rez and Schorfheide (2016) survey the available solutions, which consist of three
differentiation methods: numerical derivatives, symbolic differentiation and automatic differ-
entiation. This section discusses the applicability of these methods and demonstrates their
performance. The results suggest that the existing differentiation methods are overly restric-
tive and that their memory consumption increases rapidly with the size of the model and
the solution order. This precludes the possibility of solving large models, or even small ones,
at high orders. In addition, the existing methods are relatively slow, particularly for large
models or at high orders.
In order to resolve this problem, the paper presents a new differentiation method, which is
based on high-order chain rules. The new method yields exactly the same derivatives as the
other methods studied in this section, but it is faster and orders of magnitude more memory
efficient. Importantly, it makes it possible to differentiate and solve large-scale economic
models using standard computational and memory resources.
Derivatives are often obtained numerically by differencing.24 This method may be useful
in computing first derivatives. However, it tends to generate large approximation errors at
higher orders (see Griewank and Walther 2008, p. 2). Hence, the main focus of this section
is on differentiation methods that compute exact derivatives up to machine precision. The
appendix discusses the case of numerical derivatives and shows that they are also extremely
slow.

4.1 Symbolic versus automatic derivatives


Exact derivatives can be obtained by symbolic differentiation. However, the complexity
of this method increases extremely rapidly. Table 2 presents in columns (1)-(3), panel I, the
size of codes that compute the symbolic derivatives required for solving the multi-country
model (including the Jacobian).25 As can be seen, the codes are very large and grow rapidly

24 For instance, Krusell, Kuruşçu and Smith (2002) use numerical derivatives.
25 The MATLAB symbolic toolbox is used.
with the size of the model and the solution order. Compiling and evaluating these codes for
large and complicated models becomes infeasible with standard computational resources.
(Table 2 about here)
This problem has long been acknowledged by the computer science literature (Griewank
1989). It has led to the development of new algorithms, called “Automatic Differentiation”
(AD), which are surveyed by Griewank and Walther (2008). AD algorithms apply the chain
rule to simplify the computation of derivatives. Table 2 reports in columns (4)-(6) the memory
consumption of the AD method of Patterson, Weinstein and Rao (2013), which has been
shown to be highly efficient compared to other AD algorithms. Memory consumption is
measured by the size of two files that are generated automatically by the algorithm: MATLAB
codes that compute derivatives (panel I), and data (variables) that store necessary information
on the derivatives (panel II). The required memory in AD is much smaller than for symbolic
derivatives. However, memory consumption still grows rapidly with the size of the model
and the solution order, imposing a binding constraint on the solution of large models at high
orders.
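To fix ideas, the core of AD can be illustrated with a minimal forward-mode implementation based on dual numbers (an illustration only; the benchmarked toolbox of Patterson, Weinstein and Rao is far more sophisticated):

```python
# Minimal forward-mode automatic differentiation with dual numbers.
# Each Dual carries a value and a derivative; arithmetic propagates both
# via the chain rule, so derivatives are exact to machine precision.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (fg)' = f'g + fg'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def deriv(f, x):
    """Derivative of f at x, obtained by seeding the dual part with 1."""
    return f(Dual(x, 1.0)).dot

# d/dx of x*(x+3) at x = 2 is 2x + 3 = 7
print(deriv(lambda x: x * (x + 3), 2.0))
```

Repeating this device to obtain second or third derivatives compounds the bookkeeping, which is one way to see why the memory footprint of AD grows quickly with the derivative order.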
Given the limitations of the existing differentiation methods, this paper proposes a new
differentiation method that is based on high-order chain rules. The proposed method can be
viewed as an extended version of automatic differentiation (AD) that employs not only the
first-order chain rule, but also higher-order rules. The high-order chain rules are described
briefly below. A detailed derivation is relegated to the online appendix. The companion
MATLAB code implements the new differentiation method for the class of models defined by
(2)-(5).

4.2 High-order chain rules


The standard chain rule for vector-valued functions can be expressed in a simple matrix
form. Let f (v) and v (x) denote two differentiable functions, and let f (v (x)) denote the
composition of the two functions. The derivatives of f (v (x)) with respect to x are given by:

(10) fx = fv vx ,

where fv is the Jacobian matrix of f w.r.t v and vx is the Jacobian matrix of v w.r.t x.
This first-order chain rule can be extended to higher orders. Higher-order chain rules are
multi-dimensional tensors, but we can transform them into matrices to yield simple matrix
forms. The second- and third-order chain rules are given by the following matrices:

(11) fxx = fvv vx⊗2 + fv vxx ,
(12) fxxx = fvvv vx⊗3 + fvv (vx ⊗ vxx ) Ω1 + fv vxxx .

In this notation, fxx and fxxx denote the second and third-order derivatives of the composite
function f (v (x)) with respect to x, respectively. These objects are tensors, but here they
are denoted as matrices. The full details of these matrices, as well as the other matrices that
appear in these chain rules, are provided by the online appendix. Levintal (2017) extends
these chain rules to the fifth order and uses them to derive perturbation solutions.
These high-order chain rules reduce the computational costs and memory consumption
of high-order derivatives. First, partial derivatives of f and v are computed only once and
used throughout (e.g. the matrices fv , fvv , . . . , vx , vxx , . . .). Second, the chain rules exploit
permutations (the matrix Ω1 is a sum of permutation matrices). Third, the chain rules can
be computed by operations on sparse matrices or tensors.26 This is important, because many
expressions in these chain rules are numerically zero, e.g. powers of x − x0 when x = x0 .
Symbolic and automatic derivatives cannot exploit this sparsity. Finally, symmetry of mixed
derivatives can be easily exploited. The online appendix provides further details.
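As a sanity check, the second-order rule (11) can be verified numerically on a toy composite function (a hypothetical example, not the paper's code; fxx and fvv are stored as vectorized row matrices):

```python
# Numerical check of the second-order chain rule in vectorized matrix form:
# f_xx = f_vv (v_x kron v_x) + f_v v_xx, for f: R^2 -> R and v: R^2 -> R^2.
import numpy as np
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
v1, v2 = sp.symbols('v1 v2')
X = (x1, x2)
V = (v1, v2)

v = sp.Matrix([x1 * x2, x1 + x2**2])      # inner function v(x)
f = sp.Matrix([v1**2 + v1 * v2])          # outer function f(v)
sub = list(zip(V, v))                     # substitution v -> v(x)
comp = f.subs(sub)                        # composite f(v(x))
pt = {x1: 0.7, x2: -1.3}

# direct second derivatives of the composite, flattened row-wise (a, b)
fxx_direct = np.array([[float(sp.diff(comp[0], a, b).subs(pt))
                        for a in X for b in X]])

# chain-rule pieces: differentiate f w.r.t. v first, then evaluate at v(x(pt))
fv  = np.array([[float(sp.diff(f[0], w).subs(sub).subs(pt)) for w in V]])
fvv = np.array([[float(sp.diff(f[0], w, u).subs(sub).subs(pt))
                 for w in V for u in V]])
vx  = np.array([[float(sp.diff(v[i], a).subs(pt)) for a in X]
                for i in range(2)])
vxx = np.array([[float(sp.diff(v[i], a, b).subs(pt)) for a in X for b in X]
                for i in range(2)])

fxx_chain = fvv @ np.kron(vx, vx) + fv @ vxx
print(np.allclose(fxx_direct, fxx_chain))
```

The Kronecker product np.kron(vx, vx) plays the role of v_x^{⊗2} in (11); the row-major flattening makes the tensor identity a plain matrix identity.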

4.3 High-order chain rules versus automatic derivatives


Table 2 reports in columns (7)-(9) the memory consumption of high-order chain rules
for the multi-country model. Note the significant reduction in memory consumption relative
to the AD method. For instance, a second-order system of a 10-country model consumes
about 845MB of codes and data with AD, but only 0.8MB with high-order chain rules. The
reason for this large difference is that the AD method produces files that code explicitly all the
numerical operations. By contrast, the high-order chain rules produce files that code only the
partial derivatives, i.e. the matrices fv , fvv , . . . , vx , vxx , . . ., while the matrix and Kronecker
products are performed by specialized algorithms. This yields a significant reduction in the
size of the required codes.
The higher efficiency of the proposed method translates into faster codes. Table 2 reports
the runtime of three stages of differentiation. The first stage generates the codes and data; the

26 MATLAB does not support sparse tensors, hence the companion codes use FORTRAN/MEX files to perform operations on sparse tensors. In principle, the computation of the chain rules can be performed exclusively by sparse matrices without using tensors, as done in the perturbation code of Levintal (2017). However, tensors provide more flexibility to the programmer in terms of memory management, which is important particularly for large-scale models.
second stage compiles the codes generated in the first stage;27 and the third stage evaluates
the compiled codes. Although the AD method performs well for a first-order system, the
high-order chain rules outperform in all three stages for second and third-order systems.
For example, the second-order system of a 10-country model is generated 65 times faster,
compiled 72 times faster, and evaluated 18 times faster than the AD method. Importantly,
the differences between the two differentiation methods grow with the size of the model and
the derivative order. Thus, the proposed high-order chain rules are both faster and more
memory efficient than the AD method, particularly for large models and high orders.
In summary, these results demonstrate the sensitivity of Taylor projection to the differ-
entiation method employed. All methods considered in this section produced exactly the
same nonlinear system and Jacobian but varied considerably in memory consumption and
evaluation time. The existing differentiation methods were found to be too slow and memory
constrained. With these methods, Taylor projection could hardly compete against standard
projection methods, which do not need to compute derivatives. However, by employing the
high-order chain rules proposed in this paper, differentiation costs fall significantly, which
makes Taylor projection highly competitive for solving large-scale models.

5 Sparsity of the Jacobian


The most intensive part of the Taylor projection algorithm is the computation of the
Jacobian. A key advantage of Taylor projection compared to grid-based methods is that
the Jacobian is highly sparse. The sparsity is high because the approximating functions are
composed of monomials of x − x0 and the nonlinear system is evaluated at x0 . Therefore, the
monomials of x − x0 are zero, hence coefficients that are associated with those monomials
have no effect on the nonlinear system. By comparison, in grid-based methods the nonlinear
system is evaluated at several points in the state space so monomials of x − x0 are not zero.
To illustrate this, consider a simple model with one state variable x and one endogenous
variable y that satisfy the condition f (x, y) = 0. Suppose that y is approximated by the
quadratic function ĝ(x) = G0 + G1 (x − x0) + G2 (x − x0)^2. A second-order Taylor projection
system consists of the residual function R (x) = f (x, ĝ (x)) and its first and second derivatives
with respect to x. The Jacobian of the second-order system is (see the appendix for details):

27
Since MATLAB compiles codes in the first run, the compilation time is approximated by the runtime
of the first execution of the code. For comparability with the AD method, the time for loading data into
memory is also added to the compilation stage. Hence, the compilation time reported in the table captures
all of the fixed costs of the first run of the code, of which compilation time is a major component.

21
            [ 1   (x − x0)   (x − x0)^2 ]
    (13) A  [ 0      1       2 (x − x0) ] ,
            [ 0      0           2      ]

where A is a lower-triangular matrix.


Importantly, the nonlinear system is evaluated at x0 , hence the powers of x − x0 are zero.
It follows that the Jacobian of the Taylor projection system is:

 
            [ 1  0  0 ]
    (14) A  [ 0  1  0 ] .
            [ 0  0  2 ]

Since A is lower triangular, the Jacobian has at most 6 nonzero elements (out of 9). By
comparison, if the approximating function ĝ were centered at a different point c ≠ x0, then
the Jacobian would have 9 nonzero elements, because the powers of x − c would not be zero
at x0 . Thus, centering the approximating polynomial at x0 yields a sparser Jacobian.
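The effect of the centering can be seen symbolically. The sketch below uses a hypothetical f(x, y) = y − x², which is linear in y, so the counts come out as 3 versus 6 nonzeros rather than the 6-versus-9 bound of the general case above; the centering effect is the same:

```python
# Counting nonzeros in the Jacobian of the simple example, for polynomials
# centered at the evaluation point x0 versus at some other point c.
import sympy as sp

x, x0, c = sp.symbols('x x0 c')
Theta = sp.symbols('G0 G1 G2')

def jacobian_at_x0(center):
    G0, G1, G2 = Theta
    ghat = G0 + G1 * (x - center) + G2 * (x - center)**2
    R = ghat - x**2                      # residual f(x, ghat(x)) with f = y - x^2
    eqs = [R, sp.diff(R, x), sp.diff(R, x, 2)]
    J = sp.Matrix([[sp.diff(e, th) for th in Theta] for e in eqs])
    return J.subs(x, x0)                 # the system is evaluated at x0

nnz = lambda J: sum(1 for entry in J if sp.simplify(entry) != 0)
print(nnz(jacobian_at_x0(x0)), nnz(jacobian_at_x0(c)))  # prints: 3 6
```

When center = x0, every monomial (x − x0)^n with n ≥ 1 vanishes at the evaluation point and the corresponding Jacobian entries drop out; when center = c, they survive as powers of x0 − c.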
Rational expectations models are more complicated than this simple example, because
the policy function g can be evaluated not only at the current state x, but also at the future
state x′ . For example, Euler conditions usually include current and future consumption. In
this case the Jacobian would be less sparse, because the future variables are approximated
by polynomials that are evaluated at the future state, which is different from x0 . To see this,
suppose that the model condition is f (x, y ′) = 0, where ′ denotes the next period and the
law of motion of x is x′ = Φ (x), which is assumed to be known. In this case, the Jacobian is
of the following form (see the appendix):

            [ 1   Φ(x0) − x0   (Φ(x0) − x0)^2             ]
    (15) B  [ 0      Φx        2 (Φ(x0) − x0) Φx          ] ,
            [ 0      Φxx       2Φx^2 + 2 (Φ(x0) − x0) Φxx ]

where B is a lower-triangular matrix. Note that the Jacobian is now less sparse than (14),
because powers of Φ (x0 ) −x0 are not zero (unless Φ is the identity function, which boils down
to the previous example). This example illustrates the important difference between model
conditions that are independent of future control variables (e.g. a budget constraint) and
model conditions that depend on those variables (e.g. an Euler condition). The Jacobian of

the former is sparser than the latter.
The appendix generalizes these results to multivariate models and provides formulas for
the (maximum) number of nonzero elements in the Jacobian. The main finding of these
results is that centering the approximating polynomials at x0 reduces significantly the number
of nonzero elements in the Jacobian and the dependence of the number of nonzeros on the
size of the model. For details, the reader is referred to the appendix.
Table 3 demonstrates the sparsity of the Jacobian for the multi-country growth model.
The table presents the sparsity of two types of systems. The first system centers the approxi-
mating power series at x0 , thereby exploiting the sparsity of the problem. The second system
centers the power series at c ≠ x0. There is a large difference in sparsity between the two systems, and this gap grows as the number of countries (state variables) increases. For instance,
the Jacobian of the second-order system of a 10-country model has 629 thousand nonzero
values if the approximating polynomials are centered at x0 . The number of nonzero values
jumps to 6.5 million if the approximating power series are centered at c ≠ x0. Moreover, the Jacobian of the third-order system with c ≠ x0 is so large that it could not be computed due
to insufficient memory, while the sparse Jacobian could be. Hence, centering the power series
at x0 entails huge computational gains.
(Table 3 about here)
Previous studies that employed similar algorithms did not exploit these computational
gains. For instance, Krusell, Kuruşçu and Smith (2002) solve their algorithm, which is equiv-
alent to Taylor projection at the steady state, using ordinary polynomials for the approxi-
mating function. Ordinary polynomials are power series centered at zero, hence they do not
yield the high sparsity of the Jacobian discussed in this section. Similarly, the preliminary
version of this paper, Levintal (2013), centered the power series at a general point c without
imposing c = x0 . However, centering the power series at x0 is necessary but not sufficient. To
exploit the sparsity of the Jacobian, two conditions have to be met: first, the approximating
power series must be centered at x0 ; and second, the differentiation method that computes
the Jacobian must be able to exploit this sparsity (e.g. finite differencing does not exploit
sparsity, because it produces a dense Jacobian).
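Once the Jacobian is available in sparse form, the Newton step J ΔΘ = −F can be delegated to a sparse linear solver, whose cost scales with the number of nonzeros rather than with the full dimension squared. A sketch (the banded J below is a hypothetical stand-in for a sparse Taylor projection Jacobian, not the paper's actual matrix):

```python
# Sketch of a Newton step J @ dTheta = -F with a sparse Jacobian:
# the sparse LU factorization touches only the nonzero entries.
import numpy as np
import scipy.sparse as sparse
import scipy.sparse.linalg as sla

n = 5000
main = np.full(n, 2.0)
off = np.full(n - 1, -0.5)
# hypothetical tridiagonal Jacobian: ~3n nonzeros instead of n^2 entries
J = sparse.diags([off, main, off], offsets=[-1, 0, 1], format='csc')
F = np.ones(n)

step = sla.spsolve(J, -F)            # Newton step via sparse solve
print(np.allclose(J @ step, -F))
```

A dense solve of the same system would factor all n² entries; the sparse solve is what makes Newton iterations affordable even when Θ contains tens of thousands of coefficients.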

Approximate Jacobian

As explained, Euler conditions generate Jacobians that are not as sparse as other model
conditions (e.g. budget constraint), because Euler conditions depend on future control vari-
ables. However, it is possible to increase the sparsity in this case by approximation. Recall
that assumption 2 implies that xt+1 should be close to xt , hence xt+1 − x0 ≈ 0. This yields
the following approximation:

    ∂^{n+1} ĝ(x, Θ) / (∂x_{j1} · · · ∂x_{jn} ∂Θ) |_{x = x_{t+1}}  ≈  ∂^{n+1} ĝ(x, Θ) / (∂x_{j1} · · · ∂x_{jn} ∂Θ) |_{x = x0} ,    ∀n = 0, 1, . . . , k.

Namely, the partial derivatives of ĝ (x, Θ) w.r.t x and Θ evaluated at the future state are
approximated by their values at the current state, because the monomials are close to zero.
For instance, the Jacobian matrix (15) can be approximated by:

 
            [ 1   0    0     ]
    (16) B  [ 0   Φx   0     ] .
            [ 0   Φxx  2Φx^2 ]

Note that the powers of Φ (x0 ) − x0 that appear in the right matrix of (15) are approximated
by zeros. This yields an approximate Jacobian with at most 6 nonzero elements, compared
to the exact Jacobian (15), which has 9 nonzero elements.
Table 4 presents the sparsity of the approximate Jacobian for the multi-country model,
which is several times higher (fewer nonzero values) than the exact Jacobian. For instance,
the third-order system of a 10-country model has a Jacobian with 33 million nonzero values,
whereas the approximate Jacobian contains only 7 million nonzero values. These differences
grow with the size of the model, so that the approximate Jacobian is particularly useful in
large models.
(Table 4 about here)
The approximate Jacobian used in Table 1 to solve the multi-country model performs
extremely well, suggesting that the approximation errors are sufficiently small. Note that the
approximate Jacobian sometimes requires more Newton iterations to converge than the exact
Jacobian, which is expected. Importantly, the total runtime with the approximate Jacobian is
significantly lower than with the exact Jacobian. For the largest model, the runtime difference
reaches a factor of 25.

Collocation

Maliar and Maliar (2014) and Judd (1992) note that Newton methods are too costly
in the case of large models that are solved by grid-based projection methods. To see this,
suppose that the simple model presented previously is solved by collocation. Namely, the
residual function is evaluated at three points x1 , x2 , x3 and equated to zero. The Jacobian of
the nonlinear system is (see the appendix):

(17) \qquad C \begin{pmatrix} 1 & (x_1 - x_0) & (x_1 - x_0)^2 \\ 1 & (x_2 - x_0) & (x_2 - x_0)^2 \\ 1 & (x_3 - x_0) & (x_3 - x_0)^2 \end{pmatrix} ,

where C is a diagonal matrix. Note that this Jacobian is dense with 9 nonzero elements,
because the monomials are not zero. By comparison, the Jacobian of Taylor projection (14)
had only 6 nonzero elements. Hence, collocation generates a Jacobian that is less sparse than
Taylor projection.
Table 5 illustrates this property for the multi-country model.28 The Jacobian in this
case is almost fully dense. Hence, the computational costs of the Jacobian and the Newton
step in collocation are much larger than in Taylor projection. Fernández-Villaverde and
Levintal (2017) provide further evidence on the computational costs of Taylor projection
versus collocation.
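The density of the collocation Jacobian in (17) is easy to verify numerically; the grid points and the diagonal of C below are illustrative values, not from the paper:

```python
import numpy as np

x0 = 1.0
grid = np.array([0.90, 0.95, 1.10])            # three collocation points (illustrative)
V = np.vander(grid - x0, 3, increasing=True)   # columns: 1, (x_i - x0), (x_i - x0)^2
C = np.diag([2.0, 3.0, 4.0])                   # an arbitrary diagonal matrix C
J_colloc = C @ V
print(np.count_nonzero(J_colloc))              # 9: every entry is nonzero
```

Because no grid point coincides with x0 , none of the monomials vanish and the Jacobian is fully dense, in contrast to the sparse Taylor-projection Jacobian (14).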
(Table 5 about here)

6 Numerical Stability
The Newton method works well if the Jacobian of the nonlinear system is well-conditioned.
In standard projection methods, the residual function is evaluated over a grid of points. Since
these points are fairly similar to each other, the Jacobian tends to be ill-conditioned (Judd
1992). By comparison, in Taylor projection the nonlinear system is comprised of different
derivatives of the residual function. The information captured in these derivatives is more
heterogeneous, so the Jacobian is more likely to be well-conditioned.
To illustrate this point, consider the simple model presented in section 5. The Jacobian of
collocation is given by (17). Note that the second and third columns of the right matrix are
likely to be highly correlated, hence the condition number of the entire Jacobian should be
high. This problem can be resolved by using Chebyshev polynomials and a grid of Chebyshev
nodes (Judd 1992).
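This contrast can be illustrated directly (the grid values below are made up): a monomial basis evaluated on nearby points produces nearly collinear columns, while Chebyshev polynomials evaluated at Chebyshev nodes give orthogonal columns:

```python
import numpy as np

# Monomial basis on three nearby grid points: columns are nearly collinear
x0 = 1.0
grid = np.array([1.00, 1.01, 1.02])
V = np.vander(grid - x0, 3, increasing=True)
print(np.linalg.cond(V))       # very large condition number

# Chebyshev polynomials T0, T1, T2 evaluated at the three Chebyshev nodes
nodes = np.cos((2 * np.arange(1, 4) - 1) * np.pi / 6)
C = np.polynomial.chebyshev.chebvander(nodes, 2)
print(np.linalg.cond(C))       # small: these columns are exactly orthogonal
```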
Consider now the Jacobian of Taylor projection given by (14). In contrast to the Ja-
cobian of collocation, in Taylor projection there is no fundamental reason to expect an
ill-conditioned Jacobian, because none of the matrices in (14) have the problem of highly
correlated columns.29
Table 6 reports the condition number of the Jacobian for the multi-country model. Col-

28. I abstract from the issue of numerical stability, which is discussed in the next section.
29. For similar reasons, the Jacobians (15) and (16) are also likely to be well conditioned.

umn (1) shows that the condition number is relatively small, so that the Newton step can be
computed accurately on a double-precision computer. If we scale the problem, the condition
number decreases, as shown in column (2).30 Scaling becomes important when the problem is
badly scaled. In general, it is recommended to express the model conditions f in a unit-free
form in order to minimize scaling problems.
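The scaling described in footnote 30 can be sketched as follows. The badly scaled Jacobian and the weighting matrices below are fabricated for illustration; in practice the weights would be chosen from the row and column norms of the actual Jacobian:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # a well-conditioned "core" (cond = 3)
Wr = np.diag([1e5, 1e-3])                  # row weights (illustrative)
Wc = np.diag([1e4, 1e-2])                  # column weights (illustrative)
J = Wr @ A @ Wc                            # a badly scaled Jacobian

# Scaled Jacobian J~ = Wr^{-1} J Wc^{-1}, as in footnote 30
J_tilde = np.linalg.inv(Wr) @ J @ np.linalg.inv(Wc)
print(np.linalg.cond(J), np.linalg.cond(J_tilde))  # enormous vs. 3.0
```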
(Table 6 about here)
In summary, the Jacobian of Taylor projection does not have a conditioning problem
(abstracting from scaling issues). Therefore, the approximating polynomials need not be
Chebyshev polynomials, because the orthogonality property is not used.31 This yields an
important computational advantage over collocation. Multidimensional collocation methods
use as basis functions tensor products of Chebyshev polynomials, which are extremely sensitive
to the curse of dimensionality, or Smolyak polynomials, which are less sensitive to the
curse (Judd, Maliar, Maliar and Valero 2014). By comparison, Taylor projection can use
simple complete polynomials, which have a smaller number of terms.32 Consequently, the
size of the Jacobian is smaller, which reduces computational costs.
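The term counts behind this advantage are standard combinatorics: a complete polynomial of degree k in n variables has C(n + k, k) monomials, versus (k + 1)^n for a full tensor product. A quick check:

```python
from math import comb

def n_complete(n, k):
    # Number of monomials in a complete polynomial of degree k in n variables
    return comb(n + k, k)

def n_tensor(n, k):
    # Number of terms in a tensor product of degree-k univariate polynomials
    return (k + 1) ** n

print(n_complete(10, 3), n_tensor(10, 3))  # 286 vs 1048576
```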

7 Inequality Dynamics
This section simulates the dynamics of capital inequality in the multi-country growth
model. The computational challenge lies in the fact that the model has to be solved far
from the steady state, where inequality is pervasive. Specifically, at the initial point the rich
country is endowed with the steady-state capital level while the poor country is endowed with
10 percent of that level. Perturbation methods are inaccurate in this area of the state space,
due to the large distance from the steady state. The advantage of Taylor projection is that
the model can be solved locally at several points along the simulation path, so that accuracy
remains high throughout the simulation.
To simulate the inequality dynamics, I depart from the simple frictionless model studied
previously. In that centralized economy, the social planner can shift capital costlessly between
countries, so that convergence to equality is immediate. To relax this assumption, capital

30. To scale the Jacobian, let Θ denote the vector of the polynomial coefficients, and let T(Θ) = 0 denote the full Taylor projection system (1). The Jacobian of T is denoted J. Define a scaled parameter vector Θ̃ = Wc Θ and a scaled system T̃(Θ̃) = Wr^{-1} T(Wc^{-1} Θ̃), where Wc and Wr are diagonal weighting matrices. The Jacobian of T̃ is J̃ = Wr^{-1} J Wc^{-1}. The weighting matrices Wc and Wr are chosen so as to scale the columns and rows of the Jacobian.
31. Moreover, using Chebyshev polynomials in Taylor projection would not yield the highly sparse Jacobian, which is crucial to mitigate the curse of dimensionality.
32. Collocation methods based on complete polynomials are rarely used, because they are not backed by a well-defined theory; see Krueger and Kubler (2004).

adjustment costs are introduced so that the capital of country h evolves as follows:
" 2 #
iht iht

h φ
kt+1 = (1 − δ) kth + − −δ kth .
kth 2 kth
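As a sanity check on this law of motion, note that when investment exactly covers depreciation (i/k = δ) the adjustment cost vanishes and the capital stock is stationary. A minimal sketch (the values of δ and φ are illustrative, not the paper's calibration):

```python
def next_capital(k, i, delta=0.025, phi=10.0):
    # k_{t+1} = (1 - delta) k + [ i/k - (phi/2)(i/k - delta)^2 ] k
    ik = i / k
    return (1 - delta) * k + (ik - 0.5 * phi * (ik - delta) ** 2) * k

# At i/k = delta the adjustment cost term is zero and capital is stationary
print(next_capital(1.0, 0.025))  # stays at 1.0 (up to rounding)
```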

In the standard business cycle literature, the capital adjustment cost parameter φ is usually
small (Mendoza 1991). However, in the context of the multi-country model, adjustment costs
should be high, otherwise countries converge too quickly. I examine different values of the
capital adjustment cost parameter to see its impact on the model dynamics and the solution
accuracy. For another study of the multi-country model with capital adjustment costs see
Kollmann, Maliar, Malin and Pichler (2011).
The model is solved by Taylor projection of orders 1, 2 and 3. The solution is obtained at
the initial point of the simulation33 and then used to simulate the model for subsequent peri-
ods. The accuracy of the solution is checked at each point during the simulation. Whenever
it violates the required tolerance, a new solution is obtained. Consequently, the simulation
is comprised of different solutions obtained at different approximation points along the sim-
ulation. Under assumptions 1-3, as the approximation order goes to infinity, these solutions
converge to the true solution in the neighbourhood of the approximation points. If the true
solution is unique, the simulated path converges to the true path of the model.
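Schematically, the simulation-with-re-solving procedure looks like the loop below. This is a sketch, not the paper's implementation: solve_at, step and residual are hypothetical callables standing in for the local Taylor-projection solver, the one-period transition, and the unit-free residual check.

```python
def simulate_with_resolve(x_init, T, solve_at, step, residual, tol=1e-3):
    # Simulate for T periods, re-solving the model locally whenever the
    # residual at the current point violates the required tolerance
    sol = solve_at(x_init)                 # local solution centered at x_init
    x, path = x_init, [x_init]
    for _ in range(T):
        x = step(sol, x)                   # advance one period
        if abs(residual(sol, x)) > tol:    # accuracy check at the new point
            sol = solve_at(x)              # obtain a fresh local solution here
        path.append(x)
    return path
```

The accuracy check mirrors the text: errors grow as the simulation drifts away from the approximation point and fall back after each re-solve.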
Since the solution changes across the simulation, it is important to ensure that it remains
sufficiently accurate along the simulated path. I examine three measures of accuracy. The
first measure, reported in Table 7, is the maximum error (unit-free model residuals) across the
simulation. Points at which the model is solved by Taylor projection are excluded, because
the algorithm zeros the model residuals at these points by construction. Note that this
measure is not very informative if the model is solved at many points, because the accuracy
at these (excluded) points is not measured.
The second accuracy measure, reported in Table 7, is the maximum error across a rolling
hypercube. At each point of the simulation, a hypercube of 10-percent radius is constructed,
and the maximum error within the hypercube is reported. This measure is more conservative,
because it encompasses a larger domain of the state space. It is also consistent with the
common practice in the previous literature, e.g. Aruoba, Fernández-Villaverde and Rubio-
Ramı́rez (2006). The only difference is that the hypercube is changing rather than fixed.
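A rolling-hypercube error measure can be sketched as follows; the uniform sampling scheme and the residual callable are assumptions for illustration, not the paper's exact evaluation procedure:

```python
import numpy as np

def max_error_on_hypercube(x, residual, radius=0.10, n_draws=1000, seed=0):
    # Max absolute residual over a hypercube of 10-percent radius around x,
    # approximated by a uniform sample of points inside the hypercube
    rng = np.random.default_rng(seed)
    pts = x * (1.0 + rng.uniform(-radius, radius, size=(n_draws, x.size)))
    return max(abs(residual(p)) for p in pts)
```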
The third accuracy measure, reported in Table 8, compares the simulated paths of the
endogenous variables to a solution obtained by a global method. For this purpose, the model
is solved by a Smolyak collocation method of level 3. The collocation solution is constructed

33. To obtain a good guess for the Newton method, the multi-step procedure described in section 2.2 is employed.

over a grid of Smolyak nodes that covers the simulated path of the model.34
Table 7 presents accuracy measures for the two-country model for three parameterizations
of the adjustment cost parameter: φ = 2, 10, 50. The table reports the results for four
solution methods: 1. the global Smolyak solution; 2. the Taylor projection solution; 3.
perturbation solution with pruning; 4. perturbation without pruning. The global solution
is highly accurate. The (log10) errors across the simulation are smaller than -4.9, -5.2 and
-6.1 for the adjustment cost parameter φ = 2, 10, 50, respectively. Errors across a rolling
hypercube are larger, because the rolling hypercube encompasses a larger domain of the
state space. Perturbation solutions deliver extremely poor approximations. The maximum
errors are very large, and for the high adjustment cost parameter they explode to infinity.
The explosive paths can be ruled out by pruning, but the errors are still very large. For
instance, the (log10) maximum error across a rolling hypercube ranges from −0.9 to −0.6 for
a pruned third-order perturbation solution.35 By comparison, Taylor projection delivers very
accurate results despite the distance from the steady state. The accuracy measure across
a rolling hypercube for a third order solution is around −3.2, which is fairly close to the
global solution. A second-order solution is also highly accurate, while the first-order solution
is somewhat less so. In any case, Taylor projection is about 3-5 orders of magnitude more
accurate than perturbation of similar order.
(Table 7 about here)
Figure 1 shows more clearly the difference between Taylor projection and perturbation
for solution orders k = 1, 2, 3 and adjustment costs φ = 2, 10, 50. The plots show the
maximum error across a rolling hypercube. Perturbation solutions exhibit very high errors
at the beginning of the simulation, when the model is far from the steady state. However,
as the simulation progresses towards the steady state, the errors decrease (except for the
explosive paths). In contrast, the accuracy of Taylor projection is roughly uniform across
the simulation, particularly for the second and third-order solutions. The “zig-zag” pattern
(of the thick black line) reflects points at which new solutions were obtained by Taylor
projection. At these points, errors fall abruptly but then increase gradually as the model
progresses, until a new solution is obtained and the errors fall again. This process ensures
that the errors remain small across the simulation.
(Figure 1 about here)
Table 8 presents the maximum difference in the endogenous variables between the Smolyak
solution and the other solutions across the simulation. All variables are in logs, so that dif-

34. Finding the Smolyak coefficients that satisfy the model conditions across the grid points is sensitive to the initial guess. A good initial guess was generated by solving the model at each of the grid points by Taylor projection.
35. The pruning method of Andreasen, Fernández-Villaverde and Rubio-Ramírez (2017) is employed.

ferences are expressed in log differences. The differences from the Taylor projection solutions
decrease as φ increases. For instance, for φ = 2 the third-order Taylor projection differs from
the Smolyak solution by up to 3.9% across the simulation. For φ = 50 the difference between
the two solutions is smaller than 0.5%. This result reflects the importance of assumption 2,
which requires that the model dynamics be gradual. In the multi-country model, dynamics
is governed by capital adjustment costs. Thus, when adjustment costs are small, the model
dynamics is fast, and the capital of the poor country grows very rapidly. This implies that the
current state (xt ) and the future state (xt+1 ) are relatively distant. In this case, the Taylor
projection solution is less accurate (compared to the global solution). On the other hand,
when adjustment costs are high, the model dynamics is more gradual, so that xt+1 is close
to xt . Consequently, the Taylor projection solution is more accurate. These results indicate
that models with strong variations in the state variables require higher-order solutions in
order to meet the researcher’s accuracy standard. On the other hand, gradual dynamics can
be well-approximated by lower-order solutions.
The model dynamics is presented in greater detail in Figure 2, which plots the dynamics
of capital in the two-country model for selected solutions. Figure 3 plots the same dynamics
for 5 countries and for 10 countries (only for tp2). For better visualization, the simulations
assume zero TFP shocks. In all plots countries converge faster when adjustment costs are
smaller. Figure 2 shows that perturbation solutions (presented here only for the third order)
are significantly different from the Smolyak solutions and may explode without pruning.
Moreover, perturbation solutions are sometimes in the wrong direction. For instance, the
pruned third-order perturbation solution shows a capital increase in the rich country at the
beginning of the simulation, whereas the (more accurate) Smolyak solution shows a capital
decrease.
(Figure 2 about here)
(Figure 3 about here)
Table 9 examines the sensitivity of Taylor projection to the curse of dimensionality. The
table reports the computational costs and accuracy measures for 2, 5 and 10 countries. Run-
time depends on the number of solutions required to perform the simulation, which includes
the preliminary multi-step procedure that is needed to produce a good initial guess. In this
stage, the model is solved at 10 different points along a path from the steady state to the
initial point of the simulation.36 The total number of solutions is reported in the table. For
instance, the 10-country model with high adjustment costs is solved 20 times at second order

36. The choice of 10 steps is conservative. Similar results can be obtained with only 5 steps, thus reducing total runtime. One can optimize the number of steps further by the following procedure: (a) choose a temporary step size; (b) evaluate the residual function at the new point, using the previous solution; (c) if the residual is smaller than a tolerance, increase the step size; otherwise, solve the model at the new point.

(tp2, φ = 50), of which 10 solutions are obtained during the preliminary multi-step procedure.
Each solution takes 4.4 seconds and total runtime is 88 seconds. Note that when the solution
order increases, the model is solved less frequently, because each solution is more accurate.37
The third order solution (obtained with the approximate Jacobian) is somewhat more ac-
curate, but computational costs rise dramatically. Hence, for this model the second-order
solution yields the best accuracy/speed tradeoff.
(Table 9 about here)

8 Conclusion
This paper develops a new solution method for dynamic general equilibrium models,
called “Taylor projection”. The method provides a local solution, but the solution can be
obtained at any arbitrary point. The method is faster by orders of magnitude than the
existing projection methods. The key computational property of the proposed method is the
high sparsity of the Jacobian, which facilitates the implementation of the Newton method in
large models. By comparison, recent implementations of projection methods have resorted to
fixed-point iteration to avoid the costly computation of the Jacobian. The main cost of the
proposed method is the computation of high-order derivatives of the residual function. To
overcome this cost, the paper develops a new differentiation method that computes efficiently
high-order derivatives using high-order chain rules.
Taylor projection can be useful in a variety of models that cannot be solved accurately by
perturbation. This is demonstrated on a model of cross-country inequality dynamics, which
has to be solved far away from the steady state. This type of application arises also in models
with crisis equilibria that occur far from the steady state (e.g. Brunnermeier and Sannikov
2014). Taylor projection is also suitable for solving models with strong volatility, such as
those with rare disasters (Fernández-Villaverde and Levintal 2017). Further applications are
left for future research.

37. Interestingly, the total runtime of a first-order solution can be larger than that of higher-order solutions. This is because a first-order solution is less accurate and therefore has to be recalculated many times along the simulation. For the 10-country model with φ = 10, the first-order solution completely failed, as is evident from the large number of solutions obtained along the simulation and the large errors.

References
[1] Adda, Jérôme and Russell Cooper, Dynamic Economics: Quantitative Methods and Ap-
plications, (Cambridge, Massachusetts: MIT Press, 2003).

[2] Andreasen, Martin M., Jesús Fernández-Villaverde and Juan F. Rubio-Ramı́rez, “The
Pruned State-Space System for Non-Linear DSGE Models: Theory and Empirical Ap-
plications,” Review of Economic Studies, forthcoming, 2017.

[3] Aruoba, Borağan S., Jesús Fernández-Villaverde and Juan F. Rubio-Ramı́rez, “Compar-
ing solution methods for dynamic equilibrium economies,” Journal of Economic Dynam-
ics and Control, 30 (2006), 2477-2508.

[4] Barro, Robert J., Jesús Fernández-Villaverde, Oren Levintal and Andrew Mollerus, “Safe
Assets,” Mimeo, Harvard University, 2017.

[5] Boissay, Frédéric, Fabrice Collard and Frank Smets, “Booms and Banking Crises,” Jour-
nal of Political Economy, 124 (2016), 489-538.

[6] Brunnermeier, Markus K., and Yuliy Sannikov, “A Macroeconomic Model with a Finan-
cial Sector,” American Economic Review, 104 (2014), 379-421.

[7] Christiano, Lawrence J. and Jonas D.M. Fisher, “Algorithms for solving dynamic models
with occasionally binding constraints,” Journal of Economic Dynamics and Control, 24
(2000), 1179-1232.

[8] Coeurdacier, Nicolas, Hélène Rey, and Pablo Winant, “The Risky Steady State,” Amer-
ican Economic Review, 101 (2011), 398-401.

[9] Den Haan, Wouter J., Michal L. Kobielarz and Pontus Rendahl, “Exact Present Solu-
tion with Consistent Future Approximation: A Gridless Algorithm to Solve Stochastic
Dynamic Models,” CEPR Discussion Paper no. 10999, 2015.

[10] Dewachter, Hans and Raf Wouters, “Endogenous risk in a DSGE model with capital-
constrained financial intermediaries,” Journal of Economic Dynamics and Control 43
(2014), 241-268.

[11] Fernández-Villaverde, Jesús, Juan F. Rubio-Ramírez and Frank Schorfheide, “Solution
and Estimation Methods for DSGE Models,” Handbook of Macroeconomics, 2 (2016),
527-724.

[12] Fernández-Villaverde, Jesús, and Oren Levintal, “Solution Methods for Models with
Rare Disasters,” Quantitative Economics, forthcoming, 2017.

[13] Griewank, Andreas, “On Automatic Differentiation,” Mathematical Programming: Recent
Developments and Applications, 6 (1989), 83-107.

[14] Griewank, Andreas and Andrea Walther, Evaluating Derivatives, Principles and Tech-
niques of Algorithmic Differentiation, Second Edition, (Philadelphia: SIAM, 2008)

[15] Guerrieri, Luca and Matteo Iacoviello, “OccBin: A toolkit for solving dynamic models
with occasionally binding constraints easily,” Journal of Monetary Economics, 70 (2015),
22-38.

[16] Hörmander, Lars, The Analysis of Linear Partial Differential Operators I: Distribution
Theory and Fourier Analysis, (Berlin, Heidelberg, New York: Springer-Verlag, 1990).

[17] Judd, Kenneth L., “Projection Methods for Solving Aggregate Growth Models,” Journal
of Economic Theory, 58 (1992), 410-452.

[18] Judd, Kenneth L., Numerical Methods in Economics, (Cambridge, Massachusetts: The
MIT Press, 1998).

[19] Judd, Kenneth L., “Existence, Uniqueness, and Computational Theory for Time Con-
sistent Equilibria: A Hyperbolic Discounting Example,” manuscript, Hoover Institution,
December 2003.

[20] Judd, Kenneth, Lilia Maliar and Serguei Maliar, “Numerically stable and accurate
stochastic simulation approaches for solving dynamic economic models,” Quantitative
Economics 2 (2011), 173-210.

[21] Judd, Kenneth, Lilia Maliar, Serguei Maliar and Rafael Valero, “Smolyak method for
solving dynamic economic models: Lagrange interpolation, anisotropic grid and adaptive
domain,” Journal of Economic Dynamics and Control, 44 (2014), 92-123.

[22] Kollmann, Robert, Serguei Maliar, Benjamin A. Malin and Paul Pichler, “Comparison
of solutions to the multi-country Real Business Cycle model,” Journal of Economic
Dynamics and Control, 35 (2011), 186-202.

[23] Krueger, Dirk and Felix Kubler, “Computing equilibrium in OLG models with stochastic
production”, Journal of Economic Dynamics and Control, 28 (2004), 1411-1436.

[24] Krusell, Per, Burhanettin Kuruşçu and Anthony A. Smith, Jr., “Equilibrium Welfare and
Government Policy with Quasi-geometric Discounting,” Journal of Economic Theory,
105 (2002), 42-72.

[25] Levintal, Oren, “Fifth-Order Perturbation Solution to DSGE Models,” Journal of Eco-
nomic Dynamics and Control, 80 (2017), 1-16.

[26] Levintal, Oren, “Solving DSGE Models without a Grid,” Manuscript, Bar-Ilan Univer-
sity, 2013.

[27] Maliar, Lilia and Serguei Maliar, “Numerical Methods for Large Scale Dynamic Eco-
nomic Models”, Handbook of computational economics, 3 (2014), 325-477.

[28] Maliar, Lilia, and Serguei Maliar, “Merging simulation and projection approaches to
solve high-dimensional problems with an application to a new Keynesian model,” Quan-
titative Economics, 6 (2015), 1-47.

[29] Maliar, Serguei, Lilia Maliar and Kenneth Judd, “Solving the multi-country real business
cycle model using ergodic set methods,” Journal of Economic Dynamics and Control,
35 (2011), 207-228.

[30] Maliar, Lilia, Serguei Maliar and Sébastien Villemot, “Taking Perturbation to the Ac-
curacy Frontier: A Hybrid of Local and Global Solutions,” Computational Economics,
42 (2013), 307-325.

[31] Marimon, Ramon and Andrew Scott, Computational Methods for the Study of Dynamic
Economies, (New York: Oxford University Press, 1999).

[32] Mendoza, Enrique G., “Real Business Cycles in a Small Open Economy,” The American
Economic Review, 81 (1991), 797-818.

[33] Patterson, Michael A., Matthew J. Weinstein, and Anil V. Rao, “An Efficient Overloaded
Method for Computing Derivatives of Mathematical Functions in MATLAB,” ACM
Transactions on Mathematical Software, 39 (2013), 17.

[34] Schmitt-Grohé, Stephanie and Martı́n Uribe, “Solving dynamic general equilibrium mod-
els using a second-order approximation to the policy function,” Journal of Economic
Dynamics and Control, 28 (2004), 755-775.

Table 1: Multi-Country Growth Model: Taylor Projection vs. Standard Projection

                                           2 Countries                5 Countries                10 Countries
Method                     order guess   time    iters  max  mean   time    iters  max  mean   time     iters  max  mean
1. EDS-grid                  1     1     23.3    3,045 -2.9 -4.4     34.9   3,181 -2.4 -4.1      77.9   3,141 -3.0 -4.5
                             2     1     25.2    3,279 -4.0 -5.7     55.4   3,207 -4.0 -5.7     179.3   3,146 -4.2 -5.8
                             3     1     31.3    3,249 -4.5 -6.6    128.7   3,184 -4.7 -6.7   8,330.3   3,121 -5.0 -6.9
2. Smolyak                   1     1     21.5    4,572 -1.9 -3.6     49.0   4,351 -1.8 -3.3     140.4   4,116 -1.9 -3.3
                             2     1     29.7    4,634 -4.2 -5.7    249.1   4,517 -3.9 -5.5   3,028.3   4,299 -3.9 -5.4
                             3     1     61.5    4,698 -5.3 -7.0  3,156.5   4,599 -5.3 -7.4         -       -    -    -
3. Taylor projection
  a. Exact Jacobian          1     1      0.1        3 -1.2 -3.1      0.1       3 -1.5 -3.1       0.1       3 -1.8 -3.2
                             2     2      0.1        2 -3.7 -5.9      0.2       2 -2.8 -4.8       3.5       2 -3.0 -4.8
                             3     3      0.2        2 -3.6 -6.2      3.7       2 -4.1 -6.4   2,285.0       2 -4.6 -6.5
                             2     1      0.1        3 -3.7 -5.9      0.2       3 -2.8 -4.8       5.3       3 -3.0 -4.8
                             3     1      0.4        4 -3.6 -6.2      7.3       4 -4.1 -6.4   4,513.6       4 -4.6 -6.5
  b. Approximate Jacobian    1     1      0.1        3 -1.2 -3.1      0.1       3 -1.5 -3.1       0.1       2 -1.8 -3.2
                             2     2      0.1        3 -3.7 -5.9      0.1       2 -2.8 -4.8       0.7       2 -3.0 -4.8
                             3     3      0.2        2 -3.6 -6.2      0.9       3 -4.1 -6.4     132.1       3 -4.6 -6.5
                             2     1      0.2        4 -3.7 -5.9      0.2       4 -2.8 -4.8       1.4       4 -3.0 -4.8
                             3     1      0.5        5 -3.6 -6.2      1.3       4 -4.1 -6.4     175.1       4 -4.6 -6.5
4. Perturbation              1     -      0.0        - -1.2 -3.1      0.0       - -1.5 -3.1       0.0       - -1.8 -3.2
                             2     -      0.0        - -3.7 -5.9      0.0       - -2.8 -4.8       0.0       - -3.0 -4.8
                             3     -      0.0        - -3.6 -6.2      0.0       - -4.1 -6.4       0.2       - -4.6 -6.5

The table compares runtime and accuracy of solutions of the multi-country model by four methods: 1. EDS-grid, 2. Smolyak, 3. Taylor projection, 4. Perturbation. The solution orders (or Smolyak approximation levels) range from 1 to 3. The initial guess of the EDS-grid and Smolyak methods is linear. The initial guess of Taylor projection is a perturbation solution of order 1, 2 or 3, as listed in column "guess". Runtime is in seconds and includes the time required for constructing a grid. "iters" denotes the number of iterations required for convergence. "max" and "mean" denote the maximum and mean absolute (unit-free) errors (in log10 units) across a simulation of length T = 10,000.
Table 2: Differentiation Costs: Symbolic Differentiation, Automatic Differentiation and High-
Order Chain Rules
            Symbolic                 Automatic               High-Order Chain Rules
 N \ k    1        2        3      1       2        3        1       2       3
         (1)      (2)      (3)    (4)     (5)      (6)      (7)     (8)     (9)
I. Automatically Generated Codes (Kilobytes)
2 334 224,030 - 43 262 1,429 18 23 29
4 6,993 - - 101 1,110 10,144 29 38 47
6 50,983 - - 177 2,926 - 41 53 65
8 228,850 - - 273 6,090 - 52 68 83
10 - - - 388 10,975 - 64 83 101
II. Precomputed Data (Kilobytes)
2 0 0 - 108 1,232 11,924 193 245 373
4 0 - - 365 9,180 438,880 269 372 630
6 0 - - 924 48,306 - 350 490 1,080
8 0 - - 1,558 214,570 - 425 613 2,025
10 - - - 2,792 833,950 - 502 732 3,745
III. Generation time (seconds)
2 2 1,186 - 2.5 7.6 45.9 2.2 2.5 3.2
4 27 - - 3.2 31.8 560.3 3.4 4.2 5.3
6 177 - - 5.9 100.8 - 4.8 6.1 7.8
8 773 - - 8.6 275.2 - 6.7 8.3 11.0
10 - - - 14.1 681.6 - 8.6 10.5 15.4
IV. Compilation time (seconds)
2 1 2,038 - 0.4 0.8 4.0 0.7 0.2 0.4
4 14 - - 0.4 3.5 117.7 0.1 0.2 0.5
6 129 - - 0.7 15.6 - 0.2 0.3 3.5
8 1,939 - - 1.1 56.8 - 0.2 0.5 20.0
10 - - - 1.5 72.5 - 0.3 1.0 82.5
V. Evaluation time (seconds)
2 0.00 2.12 - 0.01 0.02 0.09 0.07 0.07 0.14
4 0.02 - - 0.00 0.08 6.20 0.04 0.07 0.32
6 0.20 - - 0.01 0.53 - 0.04 0.10 3.00
8 1.67 - - 0.02 2.84 - 0.05 0.27 17.93
10 - - - 0.03 11.21 - 0.06 0.63 77.69

The table compares three differentiation methods: Symbolic Differentiation, Automatic Differen-
tiation, and High-Order Chain Rules. The model employed is the multi-country growth model
with N=2,4,6,8,10 countries. The model is differentiated k+1 times, k=1,2,3, to obtain the k-order
nonlinear system and the Jacobian. The table reports the output of the differentiation process,
which includes automatically generated MATLAB codes (panel I) and precomputed data (panel
II). The lower panels report runtime in seconds for generating (panel III), compiling (panel IV)
and evaluating (panel V) the derivatives. Compilation time is approximated by the runtime of the
first execution of the generated MATLAB codes, including loading data into memory.

Table 3: Sparsity of the Jacobian
no. of countries
order Jacobian details 2 5 10
1. Power series centered at x0
dimensions 15 × 15 66 × 66 231 × 231
1st nonzero elements 104 1,206 8,761
nonzeros share 0.462 0.277 0.164

dimensions 45 × 45 396 × 396 2541 × 2541


2nd nonzero elements 756 28,880 628,926
nonzeros share 0.373 0.184 0.097

dimensions 105 × 105 1716 × 1716 19481 × 19481


3rd nonzero elements 3,596 463,818 32,734,006
nonzeros share 0.326 0.158 0.086

2. Power series centered at c ≠ x0


dimensions 15 × 15 66 × 66 231 × 231
1st nonzero elements 216 4,306 53,161
nonzeros share 0.960 0.989 0.996

dimensions 45 × 45 396 × 396 2541 × 2541


2nd nonzero elements 2,011 156,758 6,456,446
nonzeros share 0.993 1.000 1.000

dimensions 105 × 105 1716 × 1716 19481 × 19481


3rd nonzero elements 11,013 2,944,580 insufficient
nonzeros share 0.999 1.000 memory

The table presents the sparsity of the Jacobian of system (1). The model is a multi-country growth
model. In the upper panel, the approximating polynomials are power series centered at x0 . In the
lower panel, the power series are centered at c ≠ x0 . “Insufficient memory” refers to a computer
with 16GB RAM.

Table 4: Sparsity of the Approximate Jacobian
no. of countries
order Jacobian details 2 5 10
dimensions 15 × 15 66 × 66 231 × 231
1st nonzero elements 84 931 6,661
nonzeros share 0.373 0.214 0.125

dimensions 45 × 45 396 × 396 2541 × 2541


2nd nonzero elements 510 14,780 258,776
nonzeros share 0.252 0.094 0.040

dimensions 105 × 105 1716 × 1716 19481 × 19481


3rd nonzero elements 2,070 157,668 7,446,656
nonzeros share 0.188 0.054 0.020

The table presents the sparsity of the approximate Jacobian. The model is similar to that in Table
3.

Table 5: Sparsity of Collocation


no. of countries
order Jacobian details 2 5 10
dimensions 15 × 15 66 × 66 231 × 231
1st nonzero elements 220 4,356 53,361
nonzeros share 0.978 1.000 1.000

dimensions 45 × 45 396 × 396 2541 × 2541


2nd nonzero elements 2,025 156,816 6,456,681
nonzeros share 1.000 1.000 1.000

dimensions 105 × 105 1716 × 1716 19481 × 19481


3rd nonzero elements 11,025 2,944,370 insufficient
nonzeros share 1.000 1.000 memory

The table presents the sparsity of the Jacobian for a collocation method. The model is similar
to that in Table 3. The approximating functions are complete polynomials in the form of power
series.

Table 6: Condition Number of the Jacobian
Unscaled Scaled
Order (1) (2)
2 countries
1 3.5 2.5
2 3.6 3.1
3 4.1 3.6
5 countries
1 3.4 2.5
2 3.7 2.9
3 4.1 3.6

The table presents the log10 of the condition number of the Jacobian of the multi-country growth
model. Column (1) presents the unscaled Jacobian and column (2) presents a scaled Jacobian.

Table 7: Inequality Dynamics - Accuracy Comparison


Simulation Rolling Hypercube
Pruning φ=2 φ = 10 φ = 50 φ=2 φ = 10 φ = 50
Smolyak - -4.9 -5.2 -6.1 -4.1 -3.4 -3.3
tp1 - -4.0 -4.0 -4.0 -2.2 -2.1 -2.6
tp2 - -4.0 -4.1 -4.0 -3.1 -3.1 -3.0
tp3 - -4.0 -4.1 -4.0 -3.3 -3.3 -3.2
pert1 Yes -2.0 -2.0 0.6 -0.9 -0.8 1.5
pert2 Yes -1.2 -0.6 0.4 -0.8 2.0 1.8
pert3 Yes -1.3 -1.3 -0.9 -0.8 -0.9 -0.6
pert1 No -2.0 -2.0 0.6 -0.9 -0.8 1.5
pert2 No -1.2 -0.6 Inf -0.8 2.3 Inf
pert3 No -1.3 -1.3 Inf -0.8 -0.9 Inf

The table presents accuracy measures for the two-country growth model with capital adjustment
costs. The Smolyak solution is of level 3. tp1, tp2, tp3 denote Taylor projection of orders 1, 2, 3,
respectively, and pert1, pert2, pert3 denote perturbation solutions of orders 1, 2, 3, respectively.
The table reports maximum model residuals across a simulation and across a rolling hypercube in
log10 units. Perturbation solutions are reported with and without pruning. The adjustment cost
parameter takes three values: φ = 2, 10, 50. Further details can be found in the text.

Table 8: Inequality Dynamics - Comparison of Simulated Paths


φ tp1 tp2 tp3 pert1 pert2 pert3
(pruned) (pruned)
2 0.253 0.048 0.039 0.344 0.229 0.076
10 0.131 0.025 0.029 0.281 0.303 0.063
50 0.054 0.014 0.005 0.860 0.442 0.432

The table compares the simulated paths of the endogenous variables between the Smolyak solution
and the other solutions. Each column reports the maximal distance (measured by log difference)
between the path simulated by the solution named in the column header and the path simulated
by the Smolyak solution.
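The distance measure in the table is the maximum absolute log difference between two simulated paths. A minimal sketch, with synthetic series standing in for the model's simulated variables:

```python
import numpy as np

# Synthetic stand-ins for one variable simulated under two solutions
# (placeholders, not the model's actual simulated paths).
t = np.arange(200)
path_smolyak = 1.0 + 0.5 * np.exp(-t / 50.0)
path_tp = path_smolyak * (1.0 + 0.01 * np.sin(t / 10.0))  # small deviation

# Maximal distance measured by log difference, as reported in the table.
max_log_dist = np.max(np.abs(np.log(path_tp) - np.log(path_smolyak)))
print(round(max_log_dist, 3))  # → 0.01
```

Log differences are used because they approximate percentage deviations, so the entries in the table are comparable across variables of different scales.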

Table 9: Inequality Dynamics - Accuracy/Speed

          2 countries            5 countries             10 countries
 φ     tp1    tp2    tp3     tp1    tp2    tp3      tp1     tp2     tp3
Total runtime (secs)
 2     5.0    3.1    5.3     4.8    6.1    81.9     6.2     110.2   12,571
10    11.2    7.0    5.3    12.1    5.7    78.9   163.5      99.8   11,089
50     1.9    2.6    4.4     2.9    5.0    71.2     4.7      87.7    7,653
Number of solutions obtained along the simulation
 2      37     19     15      41     19     15       46      20      15
10      48     23     16      49     20     15      501      21      16
50      30     18     15      35     19     15       39      20      16
Runtime per solution (secs)
 2     0.1    0.2    0.4     0.1    0.3    5.5      0.1     5.5     838
10     0.2    0.3    0.3     0.2    0.3    5.3      0.3     4.8     693
50     0.1    0.1    0.3     0.1    0.3    4.7      0.1     4.4     478
Max Error across a Rolling Hypercube (log10)
 2    -2.2   -3.1   -3.3    -2.0   -3.0   -3.3     -1.8    -2.8    -3.1
10    -2.1   -3.1   -3.3    -2.1   -3.1   -3.2     -1.0    -3.0    -3.3
50    -2.6   -3.0   -3.2    -2.3   -3.1   -3.3     -2.1    -3.0    -3.3

The table presents computational costs of the model simulation. Runtime is in seconds. Number of
solutions is the number of solutions obtained along the simulation path (including the preliminary
multi-step procedure that generates the initial guess).
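As a consistency check on the table's accounting, runtime per solution is approximately total runtime divided by the number of solutions; small discrepancies come from rounding in the reported figures. For example, for tp1 with 2 countries and φ = 2:

```python
# Figures taken directly from the table above.
total_runtime = 5.0   # seconds, "Total runtime" panel
n_solutions = 37      # "Number of solutions" panel
per_solution = total_runtime / n_solutions
print(round(per_solution, 1))  # → 0.1, matching the "Runtime per solution" panel
```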

[Figure: 3 × 3 grid of panels, one column per solution order k = 1, 2, 3 and one row per
adjustment cost φ = 2, 10, 50. Each panel plots the maximum log10 error (vertical axis, −6
to 2) against time (horizontal axis, periods 0 to 200) for tp, pert (pruned), and pert
(unpruned).]

Figure 1: Inequality Dynamics - Accuracy Comparison (log10)


The figure compares the accuracy of the simulation for the two-country model. The plots present
the maximum error across a rolling hypercube in log10 units. The horizontal axis denotes time
from period 1 to period 200. The thick black line depicts Taylor projection, the grey line depicts
perturbation with pruning, and the thin black line depicts perturbation without pruning. k = 1, 2, 3
is the solution order, and φ = 2, 10, 50 is the adjustment cost parameter.

[Figure: 3 × 2 grid of panels, one row per φ = 2, 10, 50. Left panels plot tp1, tp2, and tp3
against the Smolyak solution; right panels plot pert3 (pruned) and pert3 (unpruned) against
the Smolyak solution. Vertical axis: capital (0 to 1); horizontal axis: periods 0 to 500.]

Figure 2: Inequality Dynamics - Capital


The plots depict the capital dynamics of the rich and poor countries for selected solutions. The
model is a two-country model.

[Figure: 3 × 2 grid of panels, one row per φ = 2, 10, 50; left column: 5 countries, right
column: 10 countries. Vertical axis: capital (0 to 1); horizontal axis: periods 0 to 500.]

Figure 3: Capital Dynamics - 5 and 10 countries


The plots depict capital dynamics for 5 and 10 countries (left and right panels, respectively). The
solution method is a second-order Taylor projection. At the initial state, log capital is distributed
uniformly on the interval [log 0.1, log 1].
