Ind. Eng. Chem. Res. 2008, 47, 5190–5202

Computer-Aided Solvent Design for Reactions: Maximizing Product Formation


Milica Folic, Claire S. Adjiman,* and Efstratios N. Pistikopoulos
Centre for Process Systems Engineering, Department of Chemical Engineering, Imperial College London,
London SW7 2AZ, United Kingdom

A hybrid experimental/computer-aided methodology for the design of solvents for reactions, recently proposed
by the authors [Folic et al., AIChE J. 2007, 53, 1240–1256], is extended. The methodology is based on the
use of a few reaction rate measurements to build a reaction model, followed by the formulation and solution
of an optimal computer-aided molecular design (CAMD) problem. The treatment of complex reaction systems,
such as competing or consecutive reactions, is considered through the incorporation of a simple reactor model
in the problem formulation. This approach is applied to two model reaction schemes, and it is shown that, in
principle, it is possible to identify solvents that maximize product formation by enhancing the main reaction
and suppressing byproduct formation. Since very few measurements are used to build the reaction model, the
effect of uncertainty is tackled explicitly in a stochastic formulation of the CAMD problem. An approach to
sensitivity analysis for the identification of the key model parameters is discussed. Using this information to
generate scenarios, a stochastic optimization problem (whose objective is to determine the solvents
with the best expected performance) is then solved. The final output consists of a list of candidate solvents
that can be targeted for experimentation. The methodology is demonstrated on a Menschutkin reaction, which
is a representative SN2 reaction. This shows that the uncertainty in the reaction model has little impact on the
types of solvent molecules that have the best performance. Dinitrates are found to be a promising class of
solvents, with regard to maximizing the reaction rate constant.

* Author to whom correspondence should be addressed. Tel: +44 (0)20 7594 6638. E-mail: c.adjiman@imperial.ac.uk.
Current address: R&D-Environmental, Haldor Topsoe A/S, Nymollevej 55, 2800 Lyngby, Denmark.
10.1021/ie0714549 CCC: $40.75 © 2008 American Chemical Society
Published on Web 06/14/2008

1. Introduction

Given the important role played by solvents in chemical processing, significant research effort has been devoted to the development of systematic methods for solvent selection. Computer-aided molecular design (CAMD) has emerged as an attractive route for solvent selection. Several CAMD approaches have been proposed in the last two decades.1 The two most significant classes of techniques are generate-and-test and optimization-based methods. These have been successfully applied to a variety of solvent-based separation problems, allowing a much larger number of solvent molecules to be considered during separation system design than is possible by experimentation alone.

On the other hand, there has been little work on the design of solvents that can enhance reaction rates, despite the significant gains that can be achieved by optimizing the choice of reaction medium.2 Reaction rate constants and the solubility of the substrates, catalysts, and products can vary by several orders of magnitude from solvent to solvent. In complex systems, a judicious choice of solvent has the potential to accrue large benefits; for instance, it may be possible to influence selectivity and thereby minimize byproduct formation. Solvents have been reported to affect the regioselectivity of 1,3-dipolar cycloadditions, particularly between nitrones and substituted alkenes.3 It may also be possible to reduce the number of processing steps required by telescoping them (that is, by carrying out several reactions in a single solvent). This leads to a reduction in costs and, often, emissions. An example of the benefits that can be achieved can be found in the work of Rasmy et al.4 However, systematic solvent design methods for these complex problems are lacking, and industrial practice is currently mostly based on experience and intuition when it comes to solvent choice during the development of new reaction routes.

Recently, Gani et al.5 proposed a method for solvent selection for the promotion of organic reactions that combines knowledge from industrial practice and physical insights. CAMD is used to generate a list of solvent candidates, ranked according to their score. This method has been proven to be very effective for some application studies, but it requires a significant amount of information on both the solvents and the reactions to build a table with solvent scores for each reaction. Although it takes a large number of issues that affect performance into consideration, it is not possible to predict the solvent effect on the reaction rate in a quantitative manner.

Using an empirical model of solvent effect on reaction rate constants, in our previous work,6–8 we have developed a systematic approach to solvent design for reactions, with the objective of maximizing the rate constant for a single reaction. This method is based on targeted experiments, model development, and candidate generation by optimization-based CAMD. The optimization problem is formulated so that a range of design criteria can easily be incorporated within the problem and considered simultaneously during the design phase. This allows the identification of optimal tradeoffs. The model of solvent effects on the reaction kinetics takes on a simple algebraic form and therefore is quick to evaluate. It can identify solvent rankings reliably, but it generally does not provide a high degree of quantitative accuracy.

Stanescu and Achenie9 have proposed a two-step method based on a CAMD step in which candidate solvents are generated based on constraints on their physical properties, followed by density functional theory (DFT) solvation calculations to estimate the reaction rate and product yield in the candidate solvents. They have applied this approach to the Kolbe-Schmitt reaction, and they found that the product yield is largest without solvent. In solvents, the yield is limited by the reversibility of the reaction, especially in solvents with a
high dielectric constant. The use of DFT in this approach can lead to a better accuracy than that which is possible with simpler models,8 but it is very time-consuming.

These different attempts at developing CAMD approaches that are applicable to reactive systems highlight the central role of predictive models that can relate kinetics to solvent structure. These range from the empirical solvatochromic equation of Abraham and co-workers10–12 to ab initio approaches, such as those discussed by Cramer.13 These classes of models pose distinct challenges, in terms of their incorporation within CAMD, because of their varying levels of complexity, computational cost, and reliability.

The objectives of this paper are to extend our earlier methodology8 to more-complex reaction systems and to test this through several examples. In our earlier work, the approach was applied successfully to an SN1 reaction, where it was shown that the uncertainty in the empirical model has little effect on the solution. While the results obtained were promising, further experience is required to test whether the proposed framework is broadly applicable across reaction classes. The empirical model, in which the solvatochromic equation and group contribution techniques are combined, is briefly reviewed in section 2. The formulation is then extended to more-complex reaction networks, with special emphasis on competing (or concurrent) reactions, in which more than one transformation is available for a given reactant, and consecutive reactions in which the product of the first reaction is an intermediate in the formation of the final product, or in which the desired product undergoes further transformation in the second reaction. This is the focus of section 3. The approach is then applied to model systems of competing and consecutive reactions in section 4. A methodology for handling uncertainty within the CAMD formulation is presented in detail in section 5. In section 6, it is applied to a Menschutkin reaction, which is typical of the SN2 class.

2. Reaction Model

At the core of the proposed solvent selection methodology8 is a two-step approach. The first step is concerned with the development of a model of solvent effects on the reaction, and the second step involves the formulation and solution of an optimization-based computer-aided solvent design problem, which can be either deterministic or stochastic. The first step of the methodology is briefly reviewed in this section.

The key issue in the first step (model development) is to identify a relationship that links solvent structure to the reaction rate in a way that quantifies solvent effects on a particular reaction. To do this, we use the multiparameter solvatochromic equation,12 which correlates properties of the solvent such as the empirical solvatochromic parameters (A, B, and S), a polarizability correction term ($\delta$), and the Hildebrand solubility parameter ($\delta_H^2$) with the logarithm of the reaction rate constant:

$$
\log k = \log k_0 + sS + d\delta + aA + bB + \frac{h\,\delta_H^2}{100} \qquad (1)
$$

where log k0, s, d, a, b and h are reaction-specific constants that quantify the dependence of the rate constant on each solvent property. Following the notation of Zissimos et al.,14 A is the hydrogen bond acidity, B is the hydrogen bond basicity, and S is the dipolarity/polarizability. Two types of solvatochromic parameters are reported in the literature for a given compound, depending on how the measurements were obtained: solute and solvent values. In this work, solute values are used, because many more have been tabulated. The work of Li et al.15 has shown that solute values can be used successfully to model the solvent in the SMx solvation model. Our earlier work on the solvolysis reaction has also shown the suitability of this approach.8 In the remainder of this paper, S, A, B, $\delta$, and $\delta_H^2$ are called solvent properties, and log k0, s, d, a, b, and h are called reaction parameters.

An important characteristic of the solvatochromic equation is that the reaction parameters and solvent properties are independent of each other. Thus, for each reaction, there is a fixed set of reaction parameters. Similarly, each solvent has a fixed set of properties, regardless of the reaction considered. The equation has been used previously to quantify solvent effects on the solvolysis of t-butyl halides11,12,16 and on Diels–Alder reactions.17 Both reaction classes have long been a key reference for theories of solvent effects on organic reaction rates.

To obtain the reaction parameters, the model-building step involves gathering experimental data for a small set of predetermined solvents and generating a reaction model through regression. First, a small set of initial solvents for the reaction studied must be assembled. Although the number of solvents can be chosen arbitrarily, we have found that data in eight solvents yields sufficient information to build a suitable model, provided that the solvents are diverse, in terms of their polarity. As a measure of this, we use the $E_T^N$ solvent polarity scale18 and we choose solvents that have $E_T^N$ values distributed over the entire physical range. In addition, it is preferable to choose solvents with different functional groups. Wherever possible, literature data should be used at this stage to minimize experimental costs. In the absence of reliable data, experimental reaction rate constants for the solvents chosen are measured.

To complete the relationship between solvent structure and reaction rate constant, solvent property values predicted from a knowledge of the molecular structure are used in the solvatochromic equation. The polarizability correction term ($\delta$) can be calculated exactly based on molecular structure. The other solvent properties are correlated to their molecular structure using group contribution (GC) methods and correlations that are widely used for property prediction within CAMD-based frameworks. These methods are based on the principles of transferability and additivity, with atom groups such as CH2 and OH used as building blocks. For further details on the specific methods used for solvent property prediction, the reader is referred to Folic et al.8 Forty-three functional groups (42 of which are UNIFAC groups) are available to construct solvent molecules. The use of UNIFAC first-order groups to build solvents makes integration with other solvent design approaches easier.

The kinetic data and predicted solvent properties are used to regress the reaction parameter values (log k0, s, d, a, b, and h) and thereby obtain a model of solvent effects on the chosen reaction. The resulting model can predict the reaction rate constant in solvents other than those tested experimentally, based solely on their molecular structure. In addition to the optimal values of the reaction parameters which are used in the model, the regression provides confidence intervals for the parameters. These make it possible to quantify the uncertainty in the model.

3. Deterministic CAMD Problem Formulation

In the second step of the solvent design methodology, a computer-aided solvent design problem is formulated based on the predictive model of solvent effects. Two formulations can be used. Using the optimal, or nominal, values of the reaction parameters, a deterministic formulation is derived. Alternatively,
the confidence intervals can be used to generate a stochastic formulation that takes the uncertainty into account to arrive at a design. In this section, the deterministic formulation is considered.

In this case, a mixed-integer problem (MIP) is posed, with the objective to identify a solvent in which the reaction rate constant under given conditions, or a function of the reaction rate constant, is maximized. This may be product formation, yield, or another performance-related objective. The problem may be linear (MILP) or nonlinear (MINLP), depending on the objective function, as will be shown in sections 4 and 6.2 that involve these case studies. The generic formulation of this problem is given below:

$$
\begin{aligned}
\max \quad & f(\underline{x}, \log \underline{k}) \\
\text{s.t.} \quad & \underline{h}_1(\underline{x}, \underline{p}, \log \underline{k}) = 0 \\
& \underline{g}_1(\underline{x}, \underline{p}, \log \underline{k}) \le 0 \\
& \underline{e}(\log \underline{k}, \underline{p}) = 0 \\
& \underline{h}_2(\underline{n}, \underline{p}, \underline{y}) = 0 \\
& \underline{g}_2(\underline{n}, \underline{p}, \underline{y}) \le 0 \\
& \underline{h}_3(\underline{n}, \underline{y}) = 0 \\
& \underline{g}_3(\underline{n}, \underline{y}) \le 0 \\
& \underline{x} \in \mathbb{R}^t,\ \underline{p} \in \mathbb{R}^m,\ \underline{n} \in \mathbb{R}^q,\ \log \underline{k} \in \mathbb{R}^r,\ \underline{y} \in \{0, 1\}^u
\end{aligned}
\qquad \text{(P1)}
$$

where f is the objective function; h1 is a set of process model equality constraints; g1 is a set of process inequality constraints; e is a set of solvatochromic equations (one equation for each rate constant); h2 is a set of structure–property equality constraints; g2 is a set of structure–property inequality constraints; h3 is a set of chemical feasibility and complexity equality constraints; g3 is a set of chemical feasibility and complexity inequality constraints; x is a t-dimensional vector of process variables (e.g., concentration); log k is an r-dimensional vector of logarithms of reaction rate constants; p is an m-dimensional vector of continuous variables denoting physical properties; n is a q-dimensional vector of continuous variables denoting the number of groups of each type in the molecule; and y is a set of binary variables (e.g., used to constrain the n variables to integer values).

Structure/property constraints include the property prediction methods mentioned in section 2. Chemical feasibility constraints include the octet rule19 and the modified bonding rule,20 to ensure there are no free attachments and no double bonds between atom groups. These rules ensure that the molecule designed is structurally feasible, but they do not guarantee that the molecule is chemically stable. Instead, stability is assessed as part of the post-processing of results. Molecular complexity constraints include standard constraints on the maximum and minimum number of groups in the molecule, constraints on the maximum number of main and functional groups, as well as constraints that forbid or limit the joint occurrence of some groups. The choice of these constraints is based on the recognition that solvents are usually medium-size molecules, because they must have a useful liquid range. In addition, the group contribution methods used here do not account for proximity effects beyond the makeup of atom groups and are, therefore, most reliable when used for medium-size molecules.

Structure–property and chemical feasibility and complexity constraints for this problem formulation are given in Appendix A. Based on the molecular complexity constraints given and the set of atom groups available, 9759 molecules are feasible. These include many chemical classes, such as alcohols, diols, triols, nitrates, amines, carboxylic acids, ethers, esters, aldehydes and ketones, both aliphatic and aromatic. An even larger space can be explored if some of the restrictions in molecular complexity constraints are lifted. We also include integer cuts (see Appendix A) in the formulation to allow the generation of successive solutions, giving a ranked list of candidate solvents.

The introduction of process constraints and a generic objective function allows the consideration of more-complex reaction schemes, as shown in sections 3.1 and 3.2.

3.1. Competing Reactions. For competing reactions, we consider a model system of a continuous stirred tank reactor (CSTR) with a feed stream that consists of reactant C and a solvent. The following first-order reactions occur:

$$
\mathrm{C} \xrightarrow{k_1} \mathrm{D}
$$

$$
\mathrm{C} \xrightarrow{k_2} \mathrm{E}
$$

where D is the desired product whose concentration should be maximized and E is the side product whose production should be as small as possible, compared to the production of D. The reaction rate constants corresponding to the desired reaction and side reaction are denoted by k1 and k2, respectively. The following objective function can be used in formulation P1:

$$
\max \; C_D - C_E \qquad (2)
$$

where $C_i$ denotes the outlet concentration of species i (in units of mol/m³).

For this case study, the process model (h1) equations in formulation P1 are as follows:

$$
k_1 = 10^{\log k_1} \qquad (3)
$$

$$
k_2 = 10^{\log k_2} \qquad (4)
$$

$$
C_C = \frac{C_{C0}}{1 + k_1 + k_2} \qquad (5)
$$

$$
C_D = k_1 C_C \qquad (6)
$$

$$
C_E = k_2 C_C \qquad (7)
$$

Here, the x (process) variables are given as follows: $C_C$, the outlet concentration of C; $C_D$, the concentration of the desired product D; and $C_E$, the concentration of side-reaction product E. There are two logarithms of the rate constants: log k1 and log k2. The parameter $C_{C0}$ denotes the inlet concentration of reactant C.

The remainder of the formulation is as presented in Appendix A. The resulting problem is an MINLP, which is linear in the binary variables. It can be solved locally in GAMS21 with the outer-approximation algorithm.22

3.2. Consecutive Reactions. Another reaction system commonly encountered in industry is the case of consecutive reactions, where the reaction that yields the desired product is followed by another reaction, which consumes the desired product and, therefore, must be demoted, such as
$$
\mathrm{C} \xrightarrow{k_1} \mathrm{D} \xrightarrow{k_2} \mathrm{E}
$$

where C is the reactant, D the desired product, and E is the undesired byproduct; k1 and k2 are rate constants.

One possible objective is the maximization of the difference between the concentrations of products D ($C_D$) and E ($C_E$). The following set of process model (h1) equations must be added to the problem formulation to consider this objective:

$$
k_1 = 10^{\log k_1} \qquad (8)
$$

$$
k_2 = 10^{\log k_2} \qquad (9)
$$

$$
C_C = \frac{C_{C0}}{1 + k_1} \qquad (10)
$$

$$
C_D = \frac{k_1 C_C}{1 + k_2} \qquad (11)
$$

$$
C_E = \frac{k_1 k_2 C_C}{1 + k_2} \qquad (12)
$$

Here, the x (process) variables are given as follows: $C_C$, the outlet concentration of C; $C_D$, the concentration of the desired product D; and $C_E$, the concentration of side-reaction product E. There are two logarithms of rate constants: log k1 and log k2. Parameter $C_{C0}$ is the inlet concentration of reactant C.

4. Application of Deterministic CAMD to Model Reaction Schemes

The proposed deterministic formulation is applied to the reaction schemes discussed in sections 3.1 and 3.2.

4.1. Competing Reactions. Two model reaction systems are considered to demonstrate the application of the formulation from section 3.1. First, we consider a system of competing reactions where the desired reaction is favored in protic solvents while the side reaction is favored in aprotic solvents (case 1a). We then consider a case where the desired reaction rate is enhanced in polar solvents, while the side reaction is faster in apolar solvents (case 1b). The eight solvents used for regression were chosen based on their hydrogen-bond donor acidity value (A), for case 1a, or based on their polarity ($E_T^N$ scale) for case 1b, so that half of the solvents are protic (or polar) and half of the solvents are aprotic (or apolar), for case 1a (or case 1b). The lists of solvents used in cases 1a and 1b are given in Tables 1 and 2, respectively. Logarithms of reaction rate constants were attributed in such a way that solvents in which a reaction is favored have high values of the rate constants, and solvents that do not affect the reaction favorably are assigned lower values. To be more realistic, the rate constant data for case 1a, shown in Table 1, were not ranked exactly in the same order as the A values. For instance, aniline, which has a lower value of A than acetic acid, has a slightly higher rate constant. The initial concentration of reactant C ($C_{C0}$) is assigned a value of 1.5 mol/m³.

Table 1. List of Solvents Used in Regression for Case 1a and the Chosen Rate Constant Values^a

solvent               A       log k
glycerol              0.96    -0.5
propane-1,2-diol      0.64    -1.0
aniline               0.27    -1.5
acetic acid           0.61    -1.7
acetone               0.00    -4.0
1,2-dichloroethane    0.00    -5.0
benzene               0.00    -6.0
pentane               0.00    -7.0

^a Protic solvents are assigned high experimental rate constants, because they are favored for the desired reaction.

Table 2. List of Solvents Used in Regression for Case 1b and the Chosen Rate Constant Values^a

solvent               E_T^N   log(k)
phenol                0.95    -0.5
glycerol              0.81    -1.0
propane-1,3-diol      0.75    -1.5
N-methylformamide     0.72    -1.7
dioxane               0.16    -4.0
benzene               0.11    -5.0
diethyl ether         0.12    -6.0
pentane               0.01    -7.0

^a Polar solvents are assigned high experimental rate constants, because they are favored for the desired reaction.

The reaction coefficients for the side reactions in both cases were obtained by assigning the reaction rate constants to the same solvents as those used for the main reaction, but in reverse order.

The following equations were obtained for the two competing reactions in case 1a, through two linear regressions:

$$
\log k_1 = -2.26 + 11.00S - 2.13\delta + 24.19A - 1.23B - \frac{9.76\,\delta_H^2}{100} \qquad (13)
$$

$$
\log k_2 = -3.19 - 4.82S - 0.09\delta - 16.52A - 1.85B + \frac{6.05\,\delta_H^2}{100} \qquad (14)
$$

The regression and prediction statistics obtained (based on the eight solvents that were used for regression) are, for eq 13, R² = 0.97, SE = 0.77, and AAPE = 17.50%, and, for eq 14, R² = 0.99, SE = 0.30, and AAPE = 6.40%. (Note: AAPE stands for average absolute percentage error, which is defined as

$$
\mathrm{AAPE}\ (\%) = \frac{1}{N} \sum_{i=1}^{N} \frac{|X_{\mathrm{pred},i} - X_{\mathrm{exp},i}|}{X_{\mathrm{exp},i}} \times 100
$$

where N is the number of compounds in the dataset, $X_{\mathrm{exp},i}$ is the experimental value of the property (log(k)) for compound i and $X_{\mathrm{pred},i}$ is the predicted value of the property for compound i.)

Comparing the coefficients in eqs 13 and 14, one can see that the largest absolute difference is in reaction parameter a, which is multiplied by the hydrogen bond donor acidity A; this is the property that discriminates between protic and aprotic solvents.

Equations 15 and 16 were obtained for case 1b:

$$
\log k_1 = -4.85 + 3.05S + 1.17\delta + 12.57A + 2.18B - \frac{4.26\,\delta_H^2}{100} \qquad (15)
$$

$$
\log k_2 = -4.58 - 0.70S - 1.39\delta - 21.76A - 2.85B + \frac{7.49\,\delta_H^2}{100} \qquad (16)
$$

The regression and prediction statistics obtained (based on the eight solvents used for regression) are, for eq 15, R² = 0.99, SE = 0.40, and AAPE = 11.06%, and for eq 16, R² = 0.96, SE = 0.83, and AAPE = 26.19%.

Equations 13 and 14 were included in the problem formulation (these are the e equations in formulation P1). Solving the resulting MINLP 100 times, with the addition of an integer cut after each solution, resulted in a ranked list of 100 solvents. The highest objective values were obtained for protic solvents
5194 Ind. Eng. Chem. Res., Vol. 47, No. 15, 2008

(alcohols and amines). Aliphatic hydrocarbons and esters had 5.1. Sensitivity Analysis. Global sensitivity analysis is
the lowest objective function values. For example, the reaction performed first by sampling the uncertain parameter space and
giving product E was completely suppressed in 1,2-dihydroxy- solving the deterministic design problem for each combination
2-propene, whereas in ethane, it was more favorable than the of the parameters sampled. A uniform distribution is used to
reaction that gave product D (CE was higher than CD by 0.22 sample the space if only one parameter is varied. When varying
mol/m3). more than one parameter simultaneously, we use an approach
Similarly, for case 1b, eqs 15 and 16 are included in the proposed by Sobol.24,25 The Sobol approach is based on a
problem formulation. The highest values for CD (e.g., for 1,2- quasi-random generation of numbers, which allows a progressive
dihydroxy-2-propene, a value of 1.5) are obtained for alcohols and efficient sampling of multidimensional spaces. The main
(diols and triols), which are polar solvents, and the lowest (close sensitivity metric considered is the number of different optimal
to zero, and lower than the values of CE) are obtained for solvents found by solving the deterministic design problem
aromatic hydrocarbons and halosubstituted hydrocarbons, which repeatedly for different reaction parameter (Ns) values. Given
are apolar solvents. the large size of the design space (nearly 10 000 solvents), if
4.2. Consecutive Reactions. For the consecutive reactions, uncertainty has a significant impact, it is expected that a large
two cases are considered; when the desired reaction is favored number of different solvents will be found. The occurrence
in protic solvents while the consecutive reaction is favored in frequency of each optimal solvent, which is defined as the
aprotic solvents (case 2a) and a case when the desired reaction number of times a given solvent is found divided by the total
is promoted and the consecutive reaction demoted in polar number of samples, also gives an indication of the importance
solvents (case 2b). The reaction models given by eqs 1316 of uncertainty. If the solution space is fragmented, with all
were used to predict the logarithms of reaction rate constants optimal solvents recurring with low frequency, then uncertainty
for the two cases. affects the solution greatly. If, on the other hand, a few solvents
Solving the deterministic optimization problem for case 2a occur more frequently than others, the impact of uncertainty is
not as significant.
100 times, with the addition of an integer cut after each solution,
For a problem including a single reaction, the full uncertainty
yielded a list of 100 molecules. By ranking those molecules
space for the design problem is five-dimensional and consists
according to the performance measure, the difference in
of parameters s, d, a, b, and h. The value of logk0 can be kept
performance between protic and aprotic solvents is highly
constant, because it does not have an impact on the optimal
visible. The highest objective function values belong to alcohols
solvent structure. When there are several reactions, all six
(diols and triols), followed by the N-monosubstituted amides,
reaction parameters must be included in the uncertainty analysis.
whereas the hydrocarbons and thiols (aprotic solvents) are
Therefore, a large number of sample points may be necessary
located at the bottom of the list, with objective function values
to obtain a good estimate of the sensitivity metrics.
very close to zero. In some solvents (for example, 2-methylene-
It is possible to reduce the resulting computational cost by
1,8-octanediol), the reaction that yields product C is entirely
first computing the impact of each parameter individually and
suppressed.
using this information to reduce the dimensionality of the
Similar results are obtained for case 2b; the highest objective problem. This is done through a one-dimensional global
function values (e.g., 1.47 mol/m3 for 1,1,2-butanetriol) were sensitivity analysis, where the value of one parameter is varied
obtained for polar solvents, mainly alcohols (diols and triols), uniformly over its range, while keeping all other parameters
and the lowest objective function values were observed for constant at their nominal values. In this case, the relative
halosubstituted aromatic and aliphatic hydrocarbons (apolar importance of parameter is computed as
solvents). For those solvents, the objective function value is
close to zero and less than the value of CE, which means that Ns,
the reaction that yields E is more favorable than the reaction S ) (17)
95 - L95
U
that yields product D.
where Ns, is the number of different optimal solvents designed
U L
5. CAMD under Uncertainty over the set of sampled values of and 95 and 95 are the
upper and lower limits of the 95% confidence intervals,
One issue that arises in the formulations presented is the fact respectively (for example, see Table 7, presented later in this
that the reaction model is based on kinetic data in a few solvents paper). The parameters that have a larger S value are key
only, so significant uncertainty is likely to be associated with parameters, and the higher-dimensional sensitivity analysis can
the reaction parameters. This is confirmed by the 95% confi- be focused on the reduced space of key parameters. While this
dence interval that is calculated for each of the reaction analysis allows a reduction on complexity by focusing on key
parameters, which are usually very wide. Therefore, it is parameters, it ignores any correlation between parameters. This
instructive to investigate the impact of uncertainty on the can only be captured by performing a multidimensional analysis,
deterministic design. A two-step strategy has been proposed8 as described later in this paper.
to quantify the effect of uncertainty on model reliability and to Strictly speaking, the evaluation of the sensitivity metric Ns
determine the optimal solvent candidate, given this uncertainty. is completed after all of the new designs have been identified.
The first step is determine the key reaction parameters and the This is an infinite-dimensional problem, and an alternative
optimal solvents most frequently identified across the uncer- convergence criterion is considered which is based on setting a
tain parameter space. To ensure that any correlation between limit on the rate at which new designs are being identified. A
the different reaction parameters is captured, a global sensitivity rate of 1 design per 100 sample points is used in this work,
analysis23 is performed. A stochastic (scenario-based) optimiza- because, in our experience, this has proven to be a good
tion problem then is formulated and solved, using representative indication that sufficient solvent diversity has been achieved.
scenarios distributed throughout the uncertainty range/space. The A byproduct of the sensitivity analysis is a virtual solution
choice of these scenarios is based on the results of sensitivity map, that is, a record of which optimal solutions occur over
analysis, as discussed in the remainder of this section. different regions of the multidimensional parameter space. In
Ind. Eng. Chem. Res., Vol. 47, No. 15, 2008 5195
the two-dimensional case, this map can be visualized, as shown
in Figure 1 where eight solvent molecules, represented by eight
different symbols, are shown. The map shows some dominant
designs, which cover relatively large regions of space (e.g., the
triangle), and less-common designs (such as the cross and
rhomboid). In the case where the design problem is linear, it
can be shown that each design occurs over a convex region of
the parameter space or a union of disjoint convex regions.26
The virtual solution map is used to identify a set of representa-
tive scenarios, as discussed in the next section.
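The sampling loop and convergence criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate set `CANDIDATES`, its descriptor values, and the functions `best_design` and `build_solution_map` are hypothetical; the design problem is replaced by a toy linear objective over four candidates; and plain uniform random sampling stands in for the low-discrepancy sampling used in the paper. Sampling proceeds in batches of 100 points and stops once a batch yields fewer than one new design.

```python
import random
from collections import Counter

# Hypothetical candidates: name -> two descriptor values (illustrative only).
CANDIDATES = {
    "A": (1.0, 0.0),
    "B": (0.0, 1.0),
    "C": (0.7, 0.7),
    "D": (-1.0, 0.2),
}


def best_design(p):
    """Toy stand-in for the design problem: maximize a linear objective p . x."""
    def score(name):
        return sum(pi * xi for pi, xi in zip(p, CANDIDATES[name]))
    return max(CANDIDATES, key=score)


def build_solution_map(bounds, batch=100, max_new_per_batch=1,
                       max_batches=50, seed=0):
    """Sample parameter vectors until a batch of `batch` points yields fewer
    than `max_new_per_batch` previously unseen designs.

    Returns the solution map (list of (parameters, design) pairs) and the
    number of sampled points at which each design is optimal.
    """
    rng = random.Random(seed)  # assumption: uniform sampling, not Sobol
    solution_map, counts = [], Counter()
    for _ in range(max_batches):
        new_designs = 0
        for _ in range(batch):
            p = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
            design = best_design(p)
            if design not in counts:
                new_designs += 1
            counts[design] += 1
            solution_map.append((p, design))
        if new_designs < max_new_per_batch:
            break  # rate of new designs fell below 1 per 100 points
    return solution_map, counts
```

Calling `build_solution_map([(-1.0, 1.0), (-1.0, 1.0)])` returns both the map and the frequency of each design; in this toy setting the frequencies play the role of the cluster sizes that distinguish dominant designs from less common ones.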
5.2. Scenario Generation and Problem Formulation. After we have gathered and analyzed the results of the sensitivity analysis, we can proceed to formulate and solve a stochastic optimization problem over the uncertain parameter space. The objective function is the expected performance of the solvent, and it is approximated by taking the average performance over a finite set of scenarios. The problem is formulated for a small number of representative scenarios that are distributed throughout the uncertainty space. For this purpose, the solution map obtained by sensitivity analysis is used. A convex hull is identified for each optimal solvent and one or more scenarios are selected within each hull. A weight factor is assigned to each scenario based on the fraction of the volume of the overall space represented by the scenario. Further details can be found in Folic et al.⁸
When the representative scenarios have been determined, we can formulate and solve the stochastic optimization problem, for the purpose of maximizing the expected value of the performance objective over the entire uncertain parameter space. The problem is written as follows:

\[
\begin{aligned}
\max_{\underline{x},\,\log k,\,\underline{n},\,\underline{y}}\quad & \frac{1}{M}\sum_{i=1}^{M} w_i\, f(\underline{x}, \log k_i) \\
\text{s.t.}\quad & \underline{h}_{1,i}(\underline{x}, \underline{p}_i, \log k_i, \underline{n}, \underline{y}) = 0 \quad (i = 1, \ldots, M) \\
& \underline{g}_{1,i}(\underline{x}, \underline{p}_i, \log k_i, \underline{n}, \underline{y}) \le 0 \quad (i = 1, \ldots, M) \\
& \underline{e}(\log k_i, \underline{p}_i) = 0 \quad (i = 1, \ldots, M) \\
& \underline{h}_{2,i}(\underline{n}, \underline{p}_i, \underline{y}) = 0 \quad (i = 1, \ldots, M) \\
& \underline{g}_{2,i}(\underline{n}, \underline{p}_i, \underline{y}) \le 0 \quad (i = 1, \ldots, M) \\
& \underline{h}_{3}(\underline{n}, \underline{y}) = 0 \\
& \underline{g}_{3}(\underline{n}, \underline{y}) \le 0 \\
& \underline{x} \in \mathbb{R}^{t},\quad \underline{p}_i \in \mathbb{R}^{m \times M},\quad \underline{n} \in \mathbb{R}^{q},\quad \log k_i \in \mathbb{R}^{r \times M},\quad \underline{y} \in \{0, 1\}^{u}
\end{aligned}
\tag{P2}
\]

where wi is the weight factor for scenario i and M represents the number of scenarios. All other symbols are as previously defined, and all the constraints are as given in Appendix A. The subscript i indicates which scenario is being considered, and the weight factors in the formulation correspond to the ratio of the area (volume) of the convex hull of the particular molecular design to the total area (volume) of all convex hulls. The solution obtained via stochastic optimization is more reliable than the one obtained by deterministic optimization in the sense of average performance. We also include integer cuts in the formulation (see Appendix A.6) to allow the generation of successive solutions, giving a ranked list of candidate solvents.

Figure 1. Map of solutions resulting from sensitivity analysis in a two-dimensional space (1 2). Each symbol represents a different molecule.

6. Application of Stochastic CAMD to a Menschutkin Reaction

Both the deterministic and the stochastic problem formulations are applied to a Menschutkin reaction case study. Menschutkin reactions are SN2 reactions between haloalkanes and tertiary amines, yielding quaternary ammonium salts, and they are known to be highly influenced by the change in reaction medium.²⁷ Quaternary ammonium salts are ionic liquids, which means they contain only ionic species, not molecular species. The particular reaction studied here is between tripropylamine ((C3H7)3N) and methyl iodide (CH3I):

(C3H7)3N + CH3I → CH3(C3H7)3N⁺ + I⁻

Reaction rate data were collected from the kinetic study published by Lassau and Jungers.²⁸ In total, data for 59 solvents were gathered. Significant solvent effects were observed, as can be seen in Table 3. A theoretical study of solvent effects on Menschutkin reactions, using ab initio calculations and Monte Carlo calculations, has been performed by Castejon and Wiberg.²⁹ The authors also conducted a study of the reaction of amines with trimethylsulfonium salts, using ab initio methods, which predicted the effect of two solvents (water and DMSO) very well.³⁰
6.1. Reaction Model Building. The set of 59 solvents studied by Lassau and Jungers²⁸ is very diverse. The solvents are presented in a rank-ordered list in Table 3, based on the experimentally measured rate constant values.
Eight solvents were chosen to build the reaction model, based on their diversity in polarity and in functional groups. These are listed in Table 4. The remaining 51 solvents are used for verification of the model (that is, for obtaining the statistics upon extrapolation of the model to the entire set of 59 solvents).
The following solvatochromic equation was obtained through regression performed with the eight solvents from Table 4:

\[
\log k = -5.73 + 3.80\,S - 0.03\,\delta + 3.17\,A + 0.09\,B - 0.44\,\frac{\delta_H^{2}}{100}
\tag{18}
\]

The statistics obtained for this regression were satisfactory. R², which is the square of the Pearson product moment correlation coefficient, is 0.86. The average absolute percentage error (AAPE) calculated after extrapolation of eq 18 to predict logk values for all 59 solvents in the data set is 18.77%.
The rank of each solvent is also reported in Table 3, where 1 denotes the solvent with the largest rate constant, and 59

denotes the solvent with the smallest rate constant. This can be used for the qualitative analysis of the results obtained from the regression. Benzyl cyanide is ranked first by prediction and by experiment. If we consider the first 10 experimentally tested solvents, 8 are among the top 10 solvents given by the model predictions. For the first 20 experimentally tested solvents, 17 are predicted to be within the top 20. Therefore, the model provides a ranking suitable for solvent design.

Table 3. Solvent Number, Solvent Name, Experimental logk Value, Predicted logk Value, and Associated Ranking for the 59 Solvents for the Menschutkin Reaction

solvent number  solvent name                logk(exp)  logk(pred)  predicted ranking
 1              benzyl cyanide              -1.74      -0.94        1
 2              1,1,2,2-tetrachloroethane   -1.84      -1.86        6
 3              N,N-dimethylformamide       -2.00      -2.05        9
 4              nitroethane                 -2.12      -2.63       14
 5              acetophenone                -2.15      -1.96        8
 6              1,2-dichloroethane          -2.20      -3.53       29
 7              benzaldehyde                -2.22      -1.94        7
 8              2,5-hexanedione             -2.23      -1.21        2
 9              phenylpropanone             -2.26      -1.84        5
10              phenyl-4-butanone-2         -2.26      -1.22        3
11              1,1,2-trichloroethane       -2.26      -2.66       15
12              nitropropane                -2.32      -2.61       13
13              propionitrile               -2.33      -2.52       12
14              cyclopentanone              -2.44      -2.79       16
15              butyronitrile               -2.46      -2.49       11
16              1-methylnaphthalene         -2.50      -2.87       18
17              1,4-dichlorobutane          -2.50      -3.53       30
18              cyclohexanone               -2.57      -2.80       17
19              acetone                     -2.60      -3.45       24
20              iodobenzene                 -2.66      -3.08       19
21              1,2-dibromoethane           -2.66      -3.50       28
22              2,4-pentanedione            -2.66      -1.24        4
23              o-dichlorobenzene           -2.78      -3.48       26
24              2-butanone                  -2.79      -3.47       25
25              anisole                     -2.83      -3.43       23
26              bromobenzene                -2.83      -3.16       20
27              1,1-dichloroethane          -2.90      -3.73       34
28              ethanethiol                 -2.91      -4.64       52
29              chlorobenzene               -2.93      -3.74       35
30              benzyl alcohol              -3.01      -2.08       10
31              styrene                     -3.05      -3.97       41
32              ethoxybenzene               -3.08      -3.41       22
33              3-heptanone                 -3.15      -3.60       31
34              dioxane                     -3.21      -3.34       21
35              m-dichlorobenzene           -3.21      -3.48       27
36              allyl chloride              -3.23      -4.36       48
37              tetrahydrofuran             -3.32      -4.07       45
38              bromoethane                 -3.40      -4.61       50
39              ethyl acetate               -3.44      -3.90       38
40              benzene                     -3.52      -3.99       42
41              1-bromobutane               -3.66      -4.64       53
42              1-chlorobutane              -3.66      -4.59       49
43              methanol                    -3.66      -3.91       39
44              toluene                     -3.80      -3.96       40
45              ethanol                     -3.80      -3.78       36
46              1-chlorohexane              -3.89      -4.62       51
47              propanol                    -3.91      -3.70       32
48              ethylbenzene                -3.99      -4.10       46
49              p-xylene                    -4.04      -4.03       43
50              m-xylene                    -4.06      -4.03       44
51              n-butanol                   -4.11      -3.72       33
52              cumene                      -4.12      -4.19       47
53              1,3,5-trimethylbenzene      -4.40      -3.89       37
54              ether                       -4.70      -4.92       56
55              isoprene                    -4.93      -5.18       58
56              dibutyl ether               -5.18      -5.01       57
57              cyclohexene                 -5.48      -4.86       54
58              cyclohexane                 -5.93      -4.90       55
59              hexane                      -6.78      -5.74       59

Table 4. Eight Solvents Used to Build the Reaction Model and Their E_T^N Solvatochromic Polarity Values for the Menschutkin Reaction

solvent                     E_T^N value
benzyl cyanide              0.37
1,1,2,2-tetrachloroethane   0.27
nitropropane                0.37
bromobenzene                0.18
tetrahydrofuran             0.21
benzene                     0.11
ethanol                     0.65
hexane                      0.01

Overall, given the statistics and rankings presented for the regression performed, the solvatochromic equation seems to give good predictions of the effect of solvent on the logarithm of the reaction rate constant for the Menschutkin reaction, although a small set of solvents is used for regression.
6.2. Deterministic Design of Solvent Molecules. Having developed a model of solvent effects for the Menschutkin reaction, we now focus on generating new solvent candidates. Because there is only one reaction, the following simple objective function is used:

\[
\max\ \log k
\tag{19}
\]

where logk is given by eq 18.
Solving the design problem, which involves the constraints given in Appendix A, we obtain a ranked list of candidate solvents that give high values for the reaction rate constant for the Menschutkin reaction. In Table 5, a list of the 10 best ranked solvents and their objective function values is shown. The three best solvents are presented in Figure 2.

Table 5. Ranked List of Top 10 Solvents Obtained in the Deterministic Design Step for the Menschutkin Reaction

solvent rank  solvent name                              objective function
 1            2-methylene-1,8-dinitrooctane             0.94
 2            2-methylene-1,7-dinitroheptane            0.93
 3            2-methylene-1,6-dinitrohexane             0.92
 4            2-methylene-1,5-dinitropentane            0.91
 5            2-methylene-1,4-dinitrobutane             0.88
 6            (3-nitro-2-butenyl)aniline                0.77
 7            2-methylene-3-methyl-1,8-dinitrooctane    0.76
 8            2-methylene-3-methyl-1,7-dinitroheptane   0.75
 9            2-methylene-3-methyl-1,6-dinitrohexane    0.75
10            2-methylene-3-methyl-1,5-dinitropentane   0.74

Nine out of the 10 candidate solvents in Table 5 are unsaturated aliphatic dinitrates, which points to this chemical class as a source of good solvents for the Menschutkin reaction. The unsaturated bond on the C2 carbon in most of the structures reported in Table 5 may readily lead to isomerization into conjugation with the N1 nitrogen, forming a C=CH-NO2 group, which would react with nucleophiles. These compounds are thus not standardly used as solvents, and this explains their absence from the experimental list. As an alternative, saturated aliphatic dinitrates (1,n-dinitroalkanes, where n ranges from 2 to 9) are also identified as good candidates. For instance, 1,9-dinitrononane has an objective function value of 0.64 and is predicted to be the best saturated molecule, followed closely by similar compounds. No dinitro compounds were present in the experimental lists given in Table 3; therefore, their performance must be tested in the laboratory for these results to be confirmed experimentally. Most of the other highly ranked solvent candidates contain at least one nitro group. They include, for instance, benzene derivatives with a nitro group and another functional group such as the amine group. Compounds with a single NO2 group (e.g., nitroethane and nitropropane) are ranked high in the list of experimentally tested solvents shown in Table

3, which indicates that the properties of the nitro group affect this reaction favorably.
The molecules designed should be considered with caution, although they give a good indication of what would be the best type of solvents to target for experimental verification. Because the reaction model is built based on kinetic data in eight solvents only, and then extrapolated in the design step to predict rate constants for nearly 10 000 compounds, high uncertainty in the rate constants can be expected.
The bounds (upper and lower bound) of the 95% confidence intervals for the parameters in eq 18 are shown in Table 6. For example, the most uncertain parameter is a, for which the lower bound is -13.54 and the upper bound is 19.87. This wide range reflects a fairly high uncertainty. Robust molecular design is addressed in the next section through the strategy presented in section 5.

Figure 2. Structures of the three best molecules generated in the design step for the Menschutkin reaction.

Table 6. Lower and Upper Bounds of the 95% Confidence Intervals for the Reaction Parameters for the Menschutkin Reaction in eq 18

             95% confidence interval
parameter    lower bound    upper bound
s             -3.46          11.07
d             -5.07           5.02
a            -13.54          19.87
b            -12.31          12.48
h            -11.34          10.47

6.3. Design of Solvent Molecules under Uncertainty. In this section, the reliability of the reaction model developed for the Menschutkin reaction in section 6.1 is investigated with global sensitivity analysis to determine the key parameters and obtain a map of solutions. Based on these results, a small number of representative scenarios are identified. For those scenarios, we then solve a stochastic optimization problem and compare the resulting robust solutions with those obtained via deterministic optimization in section 6.2.
Sensitivity analyses are performed in one, two and five dimensions. The results of a two-dimensional run and of the five-dimensional run are used to formulate and solve the problem of solvent design under uncertainty.
6.3.1. One-Dimensional Sensitivity Analysis. For the one-dimensional sensitivity analysis, 1000 parameter combinations were sampled. No new designs were generated in the last 500 combinations. The numbers of designs generated from these runs, together with the width of the uncertainty range for each parameter, are shown in Table 7. It can be seen from the S values that the design is almost insensitive to parameter a, more sensitive to parameters b and d, and notably more sensitive to parameters s and h. This is a similar result to that obtained for the solvolysis reaction.⁸ Here, we have a case of a reaction in which the product is an electrolyte, so the reaction involves the formation of ions and, most likely, the solvent dipolarity/polarizability and cohesive energy density contribute the most to the value of logk.

Table 7. Sampled Parameter Range and Number of Distinct Designs from One-Dimensional Sensitivity Runs for All Five Parameters for the Menschutkin Reaction

parameter varied    range    number of designs generated    sensitivity index, S
s                   14.53     9                             0.62
d                   10.09     2                             0.20
a                   33.41     3                             0.09
b                   24.79     5                             0.20
h                   21.81    11                             0.50

Figure 3. Number of sampled points versus the number of generated designs for the two-dimensional uncertainty space (s-h) for the Menschutkin reaction.

Table 8. Results for the Two-Dimensional Sensitivity Analysis Runs for the Menschutkin Reaction

parameters varied    size of parameter space    number of molecules generated
s and d              14.53 × 10.09              13
a and b              33.41 × 24.79               9
s and a              14.53 × 33.41              28
s and b              14.53 × 24.79              27
s and h              14.53 × 21.81              34

6.3.2. Two-Dimensional Design under Uncertainty.
6.3.2.1. Sensitivity Analysis. To illustrate the proposed procedure graphically, two-dimensional sensitivity analyses are performed. The Sobol sequence²⁴,²⁵ is used to sample the uncertainty space, starting with 1024 (2^10) sampled points, and increasing the number in each run by 1024 points. A graph showing the change in the number of designs with the number of sampled points for the two-dimensional case where convergence was achieved with the slowest rate is presented in Figure

Figure 4. Solution map for s vs h, divided into clustered areas of different design solutions. All molecules generated are presented. The largest areas correspond to the following molecules: 1, glycerol; 2, 2-methylene-1,8-dinitrooctane; 3, 6,7-dimethyl-8-nitro-2-nitromethyl-1-octene; and 4, iso-butane.

3. This figure shows that convergence is achieved after 6144 points.
The results of five two-dimensional runs are shown in Table 8. As expected, varying two parameters at a time results in an increase in the number of designs generated. If the two parameters varied describe different interactions in the system, more than twice the number of designs are obtained than when one combines parameters that describe similar types of interactions (s and d or a and b). The largest number of designs is obtained when varying parameters s and h simultaneously. The design is almost insensitive to parameter a, more sensitive to parameters b and d, and notably more sensitive to parameters s and h. This is consistent with the results obtained in the one-dimensional analysis.
6.3.2.2. The Design Problem. Having identified s and h as the key parameters, we focus only on their combination to obtain a robust solution in the two-dimensional uncertainty space, while keeping the other parameters at their nominal values. The map of clustered solutions in the two-dimensional (s and h) space is shown in Figure 4.
Ninety representative scenarios are obtained by dividing each cluster into three subareas. Out of 34 molecular design clusters, 4 molecules appear in too few runs (<4) to allow the calculation of the cluster area and its subsequent splitting into three subareas. Although only 30 molecules are used to generate scenarios, the entire design space of almost 10 000 molecules is considered when solving the design problem. The best solution of the stochastic optimization problem is 2-methylene-1,8-dinitrooctane. This molecule seems to be the most robust and has the highest expected value of the rate constant. It is also the optimal deterministic design. Interestingly, it is not the molecule with the largest cluster (glycerol).
The problem was solved again with just 30 scenarios, one per molecule cluster. Each scenario is assigned a weight factor proportional to the cluster area for that molecule. Comparing the solution list in Table 9 with the one obtained via the deterministic optimization given in Table 5, we can see that the first 10 solutions are the same. The only difference is in the ranking, where we have (3-nitro-2-butenyl)aniline, the only aromatic within the 10 molecules, ranked as the 6th-best solvent in Table 5 and the 10th-best solvent in Table 9. This overlap of the solvent candidates obtained via deterministic and via stochastic optimization indicates the relative insensitivity of the solvents designed to uncertainty.
6.3.3. Five-Dimensional Design under Uncertainty. Following the application of the sensitivity analysis to the full five-dimensional problem, 150 different solvent candidates are obtained after sampling 8192 parameter combinations. At that point, few new molecules are found for new combinations of the parameters (less than 1 molecule per 100 runs). Although several thousand molecular design problems are solved during the five-dimensional analysis, each run requires 0.2 CPU s, on average, on a single CPU (Intel Pentium, 1.73 GHz), so that the overall computational requirement is low.
Out of 150 molecules designed, only 33 occur with enough frequency (more than 6 parameter realizations) to generate the convex hull. The scenario-based optimization problem is formulated based on these 33 scenarios, with the inclusion of integer cuts. Its solution results in a ranked list of candidate solvents. The first five are listed in Table 10. These solutions are not identical to those in Tables 5 and 9, but they share very similar structural and functional characteristics. The molecule that is ranked first belongs to one of the smallest clusters identified by sensitivity analysis, but it was found to be the most robust and gives the highest expected reaction rate constant. The optimal molecule from the deterministic optimization run (2-methylene-1,8-dinitrooctane) is ranked 14th, with an objective function value of -0.122, which is very close to that of 2-methylene-10-nitro-1,1-dichlorodecane, the top-ranked solvent.
Thus, a candidate solvent other than that which was determined in section 6.2 has the best average performance, according to the five-dimensional study. This is due to the uncertainty associated with the reaction parameters. The optimal solution obtained with the deterministic optimization is the 14th most
robust molecule throughout the uncertainty space, but it is close, in regard to functionality, structure, and performance, to the best design.
Given this observation, the model that has been developed for the Menschutkin reaction seems to give sufficiently reliable predictions for design, given the very large search space and the small number of data used to develop it.

Table 9. Ranked List of Top 10 Solvents Obtained via Stochastic Optimization in the Two-Dimensional s-h Uncertainty Space for the Menschutkin Reaction

solvent rank  solvent name                              objective function
 1            2-methylene-1,8-dinitrooctane             0.041
 2            2-methylene-1,7-dinitroheptane            0.041
 3            2-methylene-1,6-dinitrohexane             0.041
 4            2-methylene-1,5-dinitropentane            0.040
 5            2-methylene-1,4-dinitrobutane             0.039
 6            2-methylene-3-methyl-1,5-dinitropentane   0.035
 7            2-methylene-3-methyl-1,8-dinitrooctane    0.035
 8            2-methylene-3-methyl-1,7-dinitroheptane   0.034
 9            2-methylene-3-methyl-1,6-dinitrohexane    0.034
10            (3-nitro-2-butenyl)aniline                0.034

Table 10. Ranked List of the Top Five Solvents Obtained via Stochastic Optimization in Five-Dimensional Uncertainty Space for the Menschutkin Reaction

solvent rank  solvent name                                      objective function
1             2-methylene-10-nitro-1,1-dichlorodecane           -0.117
2             2-methylene-3,4-methyl-1,8-dinitrooctane          -0.118
3             2-methylene-9-nitro-1,1-dichlorononane            -0.119
4             2-methylene-3-methyl-9-nitro-1,1-dichlorononane   -0.119
5             2-methylene-3-methyl-1,8-dinitrooctane            -0.120

7. Conclusions

In this paper, we have extended a systematic approach for solvent design for enhanced reaction rate constants to more-complex reaction schemes, such as those involving competing and consecutive reactions. This has been applied to two model systems. A systematic framework to handle uncertainty has been discussed and applied to the Menschutkin reaction. A key feature of the approach is that kinetic data in a small number of solvents can be used as a basis to explore a very large solvent design space.
For model reaction systems of competing and consecutive reactions, the objective considered was to maximize the concentration of the desired product relative to that of a byproduct. This led to a mixed-integer nonlinear programming (MINLP) problem formulation, which is linear in the binary variables. The case studies showed that the solution of this design problem can lead to solvents that potentially suppress undesired reactions. This is promising for the application of the method to multistep reactions.
Application of the methodology to a Menschutkin reaction representative of the SN2 class has shown that it provides adequate quantitative predictions and a good qualitative assessment of the suitability of a wide range of solvents. The objective considered in the deterministic problem formulation was the logarithm of the reaction rate constant, which led to a mixed-integer linear programming (MILP) problem formulation. By solving this problem, good preliminary candidates that give high values of reaction rate constants were generated. In section 6.3, the impact of the uncertainty associated with the model was handled by searching for robust candidates via stochastic optimization. Despite the simplicity of the model and the significant uncertainty, consistent results were obtained, indicating the robustness of the approach for this case study and pointing to dinitrates as being suitable solvents for this reaction. Although many of the molecules generated are not standardly used as solvents, the prediction of improved performance makes it worthwhile to investigate other relevant properties of these compounds, such as reactivity, environmental impact, and health and safety implications.
One of the advantages of using the solvatochromic equation is that its development requires only a basic understanding of the mechanism of the reaction and does not require postulating the effects of solvent at the electronic level. Therefore, the proposed framework can be deployed early in the investigation of a new reaction. To achieve greater accuracy in the predictions of reaction rate constants, future work will focus on incorporating a quantum mechanical analysis of the reaction and of solvent effects. This will ultimately provide insight into the mechanistic aspects of solvent effects and will thereby result in further opportunities to improve a reaction of interest. It will also require a complete description of the molecular structure of the reactants and transition states. Finally, further extensions to allow a more-detailed representation of the solvent via second- and third-order groups (see, for instance, the work of Constantinou and Gani³¹) will be considered, because this will allow the distinction of isomers.

Appendix A. Constraints Used in Problem Formulations P1 and P2

A.1. Structure-Property Constraints. The hydrogen bond acidity (A) is given by

\[ \sum_{i \in G} n_i A_i + 0.010641 - 0.029 - M y_A \le 0 \tag{20} \]
\[ M(y_A - 1) - \sum_{i \in G} n_i A_i - 0.010641 + 0.029 \le 0 \tag{21} \]
\[ -A + \sum_{i \in G} n_i A_i + 0.010641 + (y_A - 1) \le 0 \tag{22} \]
\[ 0 \le A \le M y_A \tag{23} \]
\[ A - \sum_{i \in G} n_i A_i - 0.010641 \le 0 \tag{24} \]

where M is a large-enough positive number, G represents a set of 43 functional groups for which the contributions are available for all the solvent properties, and yA is a binary variable. We use a value of M = 100.
The hydrogen bond basicity (B) is given by

\[ \sum_{i \in G} n_i B_i + 0.12371 - 0.124 - M y_B \le 0 \tag{25} \]
\[ M(y_B - 1) - \sum_{i \in G} n_i B_i - 0.12371 + 0.124 \le 0 \tag{26} \]
\[ -B + \sum_{i \in G} n_i B_i + 0.12371 + (y_B - 1) \le 0 \tag{27} \]
\[ 0 \le B \le M y_B \tag{28} \]
\[ B - \sum_{i \in G} n_i B_i - 0.12371 \le 0 \tag{29} \]

where M is a large-enough positive number (M = 100, as previously stated) and yB is a binary variable.
The polarizability correction parameter (δ) is given by

\[ \sum_{i \in G_A} n_i \le \Big( \sum_{i \in G_A} n_i \Big)_{\max} y_1 \tag{30} \]
\[ \sum_{i \in G_A} n_i \ge y_1 \tag{31} \]
\[ 2 y_2 \le \sum_{i \in G_H} h_i n_i \tag{32} \]
\[ y_2 \Big( \sum_{i \in G_H} h_i n_i \Big)_{\max} \ge \sum_{i \in G_H} h_i n_i - 0.26 \Big( \sum_{i \in G_H} h_i n_i \Big)_{\max} \tag{33} \]
\[ \delta \ge y_1 \tag{34} \]
\[ \delta \le y_1 + 0.5 y_2 \tag{35} \]
\[ \delta \ge 0.5 y_2 \tag{36} \]

where GA is the set of aromatic groups; hi is the number of halogen atoms in group i, and GH is the set of halogen-containing groups; y1 and y2 are binary variables; the subscript max denotes the maximum possible values of the summations given the bounds on ni and hi for each i.
The dipolarity/polarizability parameter (S) is given by

\[ S = \sum_{i \in G} n_i S_i + 0.326 \tag{37} \]

The Hildebrand solubility parameter (δH) is given by

\[ \delta_H^{2} = 0.239\, \frac{\Delta H_V - RT}{V_m} \tag{38} \]

where ΔHV is the solvent's enthalpy of vaporization at a temperature of 298 K and at a pressure equal to the vapor pressure of the compound at this temperature and Vm is the liquid molar volume of the solvent. This nonlinear expression is linearized³² for the reformulation of structure-property functions:

\[ \Delta H_V[298\ \mathrm{K}] = \sum_{i \in G} n_i\, \Delta H_{V,i} + 6.829 \tag{39} \]

where ΔHV,i is the coefficient for group i, and

\[ V_m = \sum_{i \in G} n_i V_{m,i} + 0.012 \tag{40} \]

where Vm,i is the coefficient for group i.
A.2. Chemical Feasibility Constraints. The type of molecule designed (acyclic, bicyclic, monocyclic) is represented by three binary variables: a value of y5 = 1 gives an acyclic molecule, a value of y6 = 1 gives a bicyclic solvent molecule, and a value of y7 = 1 gives a monocyclic molecule:¹⁹

\[ y_5 + y_6 + y_7 = 1 \tag{41} \]
\[ m - (y_5 - y_6) = 0 \tag{42} \]

where m is a continuous variable that represents the type of molecule.
The octet rule¹⁹ is defined as

\[ \sum_{i \in G} (2 - \nu_i)\, n_i - 2m = 0 \tag{43} \]

where νi is the valency of group i.
In an aromatic molecule, the number of aromatic groups must equal 6 if the molecule is monocyclic or 10 if it is bicyclic:

\[ \sum_{i \in G_A} n_i - 6 y_7 - 10 y_6 = 0 \tag{44} \]

The binary variables yi,k ensure that the values of the continuous variables ni are integers.

\[ \sum_{k=1}^{K} 2^{k-1} y_{i,k} - n_i = 0 \quad \forall i \in G \tag{45} \]

The modified bonding rule²⁰ is

\[ n_j(\nu_j - 1) + 2m - \sum_{i \in G} n_i \le 0 \quad \forall j \in G \tag{46} \]

A.3. Chemical Complexity Constraints. The following constraint limits the complexity and size of molecules:

\[ n_{G,\min} - \sum_{i \in G} n_i \le 0 \tag{47} \]
\[ \sum_{i \in G} n_i \le n_{G,\max} \tag{48} \]
\[ \sum_{i \in G_F} \frac{n_i}{n_i^{U}} \le y_5 + y_7 \tag{49} \]
\[ n_{CH_2=CH} + n_{CH=CH} + n_{CH_2=C} + n_{CH=C} + n_{C=C} \le n_{C=C,\max} \tag{50} \]

where nG,min and nG,max are the minimum and maximum numbers of groups in the molecule, respectively; niU is the maximum number of groups of type i allowed; GF is the set of functional groups; and nC=C,max is the maximum number of carbon-carbon double bonds. Values for these scalars are reported in Table 11.

Table 11. Bounds on the Number of Atom Groups Used in the Problem Formulation

parameter    description                                         value
nG,min       minimum number of groups allowed in a molecule      2
nG,max       maximum number of groups allowed in a molecule      10
nC=C,max     maximum number of carbon-carbon double bonds        1

Further constraints are

\[ \sum_{i \in G_M} n_i \ge 2 y_5 \tag{51} \]
\[ \sum_{i \in G_M} n_i \le 2 y_7 + 10 y_5 \tag{52} \]
\[ n_i \le n_i^{U} \quad \forall i \in G \tag{53} \]

where GM is the set of main groups. Those are the aliphatic groups that contain only C and H atoms, regardless of whether the bond between two carbons is single or double. Values and expressions for niU for each group i are shown in Table 12.
A.3.1. Formation of Mixed Aliphatic and Aromatic Molecules. In a monocyclic aromatic molecule, we allow the occurrence of side chains that consist of, at most, two nonaromatic groups, through the following constraints:

\[ n_{aC} - 0.9 - M y_{aC} \le 0 \tag{54} \]
\[ M(y_{aC} - 1) - n_{aC} + 1 \le 0 \tag{55} \]
\[ n_{aCCH} - 0.9 - M y_{aCCH} \le 0 \tag{56} \]
\[ M(y_{aCCH} - 1) - n_{aCCH} + 1 \le 0 \tag{57} \]
\[ n_{aCCH_2} - 0.9 - M y_{aCCH_2} \le 0 \tag{58} \]
\[ M(y_{aCCH_2} - 1) - n_{aCCH_2} + 1 \le 0 \tag{59} \]
\[ y_7 + y_{aC} - 1 - M y_M \le 0 \tag{60} \]
\[ M(y_M - 1) - y_7 - y_{aC} + 2 \le 0 \tag{61} \]
\[ 2 y_6 + y_M - n_{aC} = 0 \tag{62} \]
\[ y_M + y_{aCCH} + y_{aCCH_2} \le 1 \tag{63} \]

where yaC, yaCCH, yaCCH2, and yM are binary variables. M is a large enough positive number (M = 100, as previously stated).
The composition of the side chains is limited to only certain aliphatic groups. We categorize these groups into non-chain-ending (which are the groups allowed on the first position, directly linked to the aromatic group) and chain-ending (which are the groups allowed either on the first position or on the second position, linked to one of the non-chain-ending groups). Non-chain-ending groups belong to set Nceg, and chain-ending groups belong to set Ceg. Both sets are shown in Table 13. The parameter values chosen in Table 13 provide a balance between the complexity and diversity of the molecules designed and the reliability of the property prediction techniques used for complex molecules. These parameters can easily be modified to increase the design space.

Table 12. Upper Bounds niU on the Occurrence of Each Group

group i     niU
CH3         nG,max y5 + yaCCH + yaCCH2
CH2         nG,max y5
CH          3y5
C           y5
CH2=CH      y5 + yM + yaCCH + yaCCH2
CH=CH       y5 + yM + yaCCH2
CH2=C       y5
CH=C        y5
C=C         y5
aCH         6y7 + 8y6
aC          yM + 2y6
aCCH3       6y7 + 8y6
aCCH2       yaCCH2
aCCH        yaCCH
OH          3
aCOH        6y7 + 8y6
CH3CO       1
CH2CO       1
CHO         1
CH3COO      1
CH2COO      1
CH3O        1
CH2O        1
CH-O        1
CH2NH2      2
CH3NH       1
CH2NH       1
CH3N        1
CH2N        1
aCNH2       6y7 + 8y6
CH2CN       1
COOH        1
CH2Cl       2
CHCl        1
CHCl2       2
aCCl        6y7 + 8y6
CH2NO2      2
CHNO2       1
CH2SH       1
I           2
Br          2
aCF         6y7 + 8y6
CH2S        1

Table 13. List of Groups Allowed in the Side Chains of Monocyclic Aromatic Molecules

set Nceg    set Ceg
CH=CH       CH2=CH
CH2CO       CH3CO
CH2COO      CH3COO
CH2O        CH3O
CH2NH       CHO
CH3N        CH2NH2
CHNO2       COOH
CH2SH       CH2CN
            CH2Cl
            CH2NO2
            I
            Br

The aCCH group leads to the presence of two side chains. One of these chains must be the CH3 group:

\[ y_{aCCH} \le n_{CH_3} \tag{64} \]

The other chain consists of a chain-ending group:

\[ \sum_{i \in Ceg} n_i \le 3 y_5 + y_{aCCH} + y_M + y_{aCCH_2} \tag{65} \]
\[ \sum_{i \in Nceg} n_i \le 3 y_5 + y_M + y_{aCCH_2} \tag{66} \]

A.4. Design Constraints. The solvent obtained should be liquid at room temperature. This observation translates into a constraint on the melting point Tm. An upper bound for Tm of 317 K is used, taking into account the error in the group contribution method used.³³ We define Tm,e as

\[ T_{m,e} = \exp\!\left( \frac{T_m}{T_{m0}} \right) \tag{67} \]

where Tm0 is a reference value for the melting point.³³ Then,

\[ T_{m,e} = \sum_{i \in G} n_i T_{m,i} \tag{68} \]
\[ T_{m,e} \le 8.6 \tag{69} \]

where Tm,i is the contribution of group i, i ∈ G.
A.5. Formulation-Specific Bounds. When the niU values are known, the dimension of the problem becomes dependent on the values of several parameters, such as the minimum number of groups in the solvent (nG,min). The values for these parameters are given in Table 11. The maximum number of groups of each type in the molecule is further limited to 7, by specifying the value of K in eq 45 to be 3.
A.6. Integer Cuts. We include integer cuts in the formulation to allow the generation of successive solutions, giving a ranked list of candidate solvents. For candidate p, we define Zp = {i: yip = 0} and NZp = {i: yip = 1}, and then the constraint is given by

\[ \sum_{i \in NZ_p,\,k} y(i, k) - \sum_{i \in Z_p,\,k} y(i, k) \le |NZ_p| - 1 \tag{70} \]

Acknowledgment

The authors thank P. C. Taylor for valuable discussions. M.F. gratefully acknowledges financial support from the ORS scheme.

Literature Cited

(1) Achenie, L. E. K.; Gani, R.; Venkatasubramanian, V. Computer-Aided Molecular Design: Theory and Practice; Elsevier Science Publishers: Amsterdam, The Netherlands, 2002.
(2) Cox, B. G. Modern Liquid Phase Kinetics; Oxford University Press: Oxford, U.K., 1994.
(3) Cossío, F. P.; Morao, I.; Jiao, H.; Schleyer, P. V. R. In-Plane Aromaticity in 1,3-Dipolar Cycloadditions. Solvent Effects, Selectivity, and Nucleus-Independent Chemical Shifts. J. Am. Chem. Soc. 1999, 121, 6737.
(4) Rasmy, O. M.; Vaid, R. K.; Semo, M. J.; Chelius, E. C.; Robey, R. L.; Alt, C. A.; Rhodes, G. A.; Vicenzi, J. T. Process Development of
(1S,2S,5R,6S)-Spiro[Bicyclo[3.1.0]Hexane-2,5-Dioxo-2,4-Imidazolidine]-6-Carboxylic Acid, (R)-α-Methylbenzenemethanamine Salt (LSN344309). Org. Process Res. Dev. 2006, 10, 28.
(5) Gani, R.; Jiménez-González, C.; Constable, D. J. C. Method for Selection of Solvents for Promotion of Organic Reactions. Comput. Chem. Eng. 2005, 29, 1661.
(6) Folic, M.; Adjiman, C. S.; Pistikopoulos, E. N. The Design of Solvents for Optimal Reaction Rates. In Proceedings of the 14th European Symposium on Computer-Aided Process Engineering (ESCAPE 14); Barbosa, A., Matos, H., Eds.; Computer-Aided Chemical Engineering, Vol. 18; Elsevier B.V. Science Publishers: Amsterdam, The Netherlands, 2004; p 175.
(7) Folic, M.; Adjiman, C. S.; Pistikopoulos, E. N. A Computer-Aided Methodology for Optimal Solvent Design for Reactions with Experimental Verification. In Proceedings of the 15th European Symposium on Computer-Aided Process Engineering (ESCAPE 15); Puigjaner, L., Ed.; Computer-Aided Chemical Engineering, Vol. 20B; Elsevier B.V. Science Publishers: Amsterdam, The Netherlands, 2005; p 1651.
(8) Folic, M.; Adjiman, C. S.; Pistikopoulos, E. N. Design of Solvents for Optimal Reaction Rate Constants. AIChE J. 2007, 53, 1240.
(9) Stanescu, I.; Achenie, L. E. K. A Theoretical Study of Solvent Effects on Kolbe-Schmitt Reaction Kinetics. Chem. Eng. Sci. 2006, 61, 6199.
(10) Abraham, M. H. Substitution at Saturated Carbon. Part XIV. Solvent Effects on the Free Energies of Ions, Ion-Pairs, Non-Electrolytes, and Transition States in Some SN and SE Reactions. J. Chem. Soc., Perkin Trans. 2 1972, 1343.
(11) Abraham, M. H.; Taft, R. W.; Kamlet, M. J. Linear Solvation Energy Relationships. 15. Heterolytic Decomposition of the Tert-Butyl Halides. J. Org. Chem. 1981, 46, 3053.
(12) Abraham, M. H.; Doherty, R. M.; Kamlet, M. J.; Harris, J. M.; Taft, R. W. Linear Solvation Energy Relationships. Part 37. An Analysis of Contributions of Dipolarity-Polarisability, Nucleophilic Assistance, Electrophilic Assistance, and Cavity Terms to Solvent Effects on t-Butyl Halide Solvolysis Rates. J. Chem. Soc., Perkin Trans. 2 1987, 913.
(13) Cramer, C. J. Essentials of Computational Chemistry: Theories and Models; John Wiley and Sons, Ltd.: Chichester, U.K., 2005.
(14) Zissimos, A. M.; Abraham, M. H.; Du, C. M.; Valko, K.; Bevan, C.; Reynolds, D.; Wood, J.; Tam, K. Y. Calculation of Abraham Descriptors from Experimental Data from Seven HPLC Systems; Evaluation of Five Different Methods of Calculation. J. Chem. Soc., Perkin Trans. 2 2002, 2001.
(15) Li, J.; Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Universal reaction field model based on ab initio Hartree-Fock theory. Chem. Phys. Lett. 1998, 288, 293.
(16) Abraham, M. H.; Doherty, R. M.; Kamlet, M. J.; Harris, J. M.; Taft, R. W. Linear Solvation Energy Relationships. Part 38. An Analysis of the Use of Solvent Parameters in Correlation of the Rate Constants, with Special Reference to the Solvolysis of t-Butyl Chloride. J. Chem. Soc., Perkin Trans. 2 1987, 1097.
(17) Cativiela, C.; Garcia, J. I.; Gil, J.; Martinez, R. M.; Mayoral, J. A.; Salvatella, L.; Urieta, J. S.; Mainar, A. M.; Abraham, M. H. Solvent Effects on Diels-Alder Reactions. The Use of Aqueous Mixtures of Fluorinated Alcohols and the Study of Reactions of Acrylonitrile. J. Chem. Soc., Perkin Trans. 2 1997, 653.
(18) Reichardt, C.; Harbusch-Görnert, E. Erweiterung, Korrektur und Neudefinition der ET-Lösungsmittelpolaritätsskala mit Hilfe eines Lipophilen Penta-tert-butyl-substituierten Pyridinium-N-phenolat-betainfarbstoffes. Liebigs Ann. Chem. 1983, 5, 721.
(19) Odele, O.; Macchietto, S. Computer Aided Molecular Design: A Novel Method for Optimal Solvent Selection. Fluid Phase Equilib. 1993, 82, 47.
(20) Buxton, A.; Livingston, A. G.; Pistikopoulos, E. N. Optimal Design of Solvent Blends for Environmental Impact Minimization. AIChE J. 1999, 45, 817.
(21) Brooke, A.; Kendrick, D.; Meeraus, A.; Raman, R. GAMS: A User's Guide; GAMS Development Corporation: 1998.
(22) Viswanathan, J.; Grossmann, I. E. A Combined Penalty Function and Outer Approximation Method for MINLP Optimization. Comput. Chem. Eng. 1990, 14, 769.
(23) Saltelli, A., Chan, K., Scott, E. M., Eds. Sensitivity Analysis; John Wiley & Sons, Ltd.: Chichester, England, 2000.
(24) Sobol, I. M. On the Distribution of Points in a Cube and the Approximate Evaluation of Integrals. Comput. Math. Math. Phys. 1967, 7, 86.
(25) Sobol, I. M. Uniformly Distributed Sequences with Additional Uniformity Properties. USSR Comput. Math. Math. Phys. 1976, 16, 236.
(26) Gal, T. Postoptimal Analyses, Parametric Programming, and Related Topics; de Gruyter: New York, 1995.
(27) Reichardt, C. Solvents and Solvent Effects in Organic Chemistry; Wiley-VCH: Weinheim, Germany, 1988.
(28) Lassau, C.; Jungers, J. L'Influence du Solvant sur la Réaction Chimique. La Quaternation des Amines Tertiaires par l'Iodure de Méthyle. Bull. Soc. Chim. Fr. 1968, 7, 2678.
(29) Castejon, H.; Wiberg, K. B. Solvent Effects on Methyl Transfer Reactions. 1. The Menshutkin Reaction. J. Am. Chem. Soc. 1999, 121, 2139.
(30) Castejon, H.; Wiberg, K. B.; Sklenak, S.; Hinz, W. Solvent Effects on Methyl Transfer Reactions. 2. The Reaction of Amines with Trimethylsulfonium Salts. J. Am. Chem. Soc. 2001, 123, 6092.
(31) Constantinou, L.; Gani, R. New Group Contribution Method for Estimating Properties of Pure Compounds. AIChE J. 1994, 40, 1697.
(32) Maranas, C. D. Optimal Computer Aided Molecular Design: A Polymer Design Case Study. Ind. Eng. Chem. Res. 1996, 35, 3403.
(33) Marrero, J.; Gani, R. Group-Contribution Based Estimation of Pure Component Properties. Fluid Phase Equilib. 2001, 183-184, 183.

Received for review October 27, 2007
Revised manuscript received January 21, 2008
Accepted January 31, 2008

IE0714549
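
Illustrative note on Appendix A.6. The integer cut of eq 70 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' GAMS implementation: the function names, the dictionary layout keyed by (group, k), and the example groups are hypothetical, and the binary expansion in `group_count` is only an assumed reading of eq 45 (which is not reproduced here), chosen because K = 3 binary digits give a maximum count of 7.

```python
from typing import Dict, Tuple

# Key = (group name i, binary digit index k). Naming is illustrative only.
Key = Tuple[str, int]


def group_count(y: Dict[Key, int], group: str, K: int = 3) -> int:
    """Assumed binary expansion: n_i = sum_k 2^k * y(i, k) for k = 0..K-1.

    With K = 3 the largest representable count is 1 + 2 + 4 = 7.
    """
    return sum((2 ** k) * y.get((group, k), 0) for k in range(K))


def integer_cut(solution: Dict[Key, int]) -> Tuple[Dict[Key, int], int]:
    """Build the cut excluding a previously found binary solution.

    Variables equal to 1 in the solution (the set NZ_p) get coefficient +1,
    variables equal to 0 (the set Z_p) get coefficient -1, and the
    right-hand side is |NZ_p| - 1.
    """
    coeffs = {key: (1 if value == 1 else -1) for key, value in solution.items()}
    rhs = sum(1 for value in solution.values() if value == 1) - 1
    return coeffs, rhs


def satisfies_cut(coeffs: Dict[Key, int], rhs: int, candidate: Dict[Key, int]) -> bool:
    """Check whether a candidate binary assignment satisfies the cut."""
    lhs = sum(c * candidate.get(key, 0) for key, c in coeffs.items())
    return lhs <= rhs


if __name__ == "__main__":
    # Hypothetical previous solution: one CH3 group and two CH2NO2 groups.
    previous = {
        ("CH3", 0): 1, ("CH3", 1): 0, ("CH3", 2): 0,
        ("CH2NO2", 0): 0, ("CH2NO2", 1): 1, ("CH2NO2", 2): 0,
    }
    coeffs, rhs = integer_cut(previous)
    print("previous solution still feasible:", satisfies_cut(coeffs, rhs, previous))
```

Adding one such cut per solution already found forces the next solve to return a different binary assignment, which is how a ranked list of candidate solvents can be generated.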
