MMP Manuscript A

Using Genetic Programming for an Advanced Performance Assessment of
Industrially Relevant Heterogeneous Catalysts
L.A. Baumes1*, A. Blansché2, P. Serna1, A. Tchougang2, N. Lachiche2, P. Collet2, A. Corma1

1 Instituto de Tecnología Química, UPV, Av. Naranjos s/n, E-46022 Valencia, Spain
2 Université Louis Pasteur, LSIIT, FDBT, Pôle API, F-67400 Illkirch, France
* Corresponding author(s)
Abstract
Besides the ease and speed brought by automated synthesis stations and reactor technologies in materials
science, adapted informatics tools must be further developed in order to handle the increase in throughput
and data volume without slowing down the whole process. This paper reports the use of genetic
programming (GP) in heterogeneous catalysis. Although GP has received only little attention in this
domain, it is shown how such an approach can be turned into a distinctive and powerful tool for solid
optimization, discovery, and monitoring. Jointly with neural networks, the GP paradigm is employed to
accurately and automatically estimate the whole “conversion versus time” curve in the epoxidation of
large olefins using the titanosilicates Ti-MCM-41 and Ti-ITQ-2 as catalysts. In contrast to previous
studies in combinatorial materials science and high-throughput screening, it was possible to estimate
the entire evolution of the catalytic reaction for unsynthesized catalysts. Consequently, the evaluation
of the performance of virtual solids is not reduced to a single point (e.g. the conversion level at one
given reaction time or the initial reaction rate). The methodology is thoroughly detailed, with emphasis
on the comparison between a newly proposed crossover operator, Context Aware Crossover (CAX), and the
traditional one.
Keywords: High-throughput, Data Mining, Genetic Programming, Materials Science, Heterogeneous Catalysis
1. Introduction
The availability of long-chain linear olefins from Fischer-Tropsch units opens new possibilities to
obtain long-chain aliphatic epoxides that can be functionalised for application in lubricants,
plasticizers, chemicals and fine chemicals production. Among the different catalytic systems for the
epoxidation of double bonds, micro- and mesoporous titanosilicates1,2,3 have been shown to be more
efficient catalysts than other metal-based materials.4,5,6 Considering this, and the fact that
extra-large pores or high external surface areas are required to avoid diffusional restrictions
when reacting large olefins, the structured mesoporous material MCM-41,7,8,9 and the delaminated
zeolite ITQ-2,10,11,12,13,14 were selected in this paper as silica supports for grafting active Ti
species (see Figure 1).
Figure 1. Synthesis of the catalysts. Right - First, one of the two supports is selected. Then a given
amount of titanium is grafted onto the surface. Finally, a given amount of one of the four selected
silylating agents is grafted onto the solid. Left - Example of a catalyst with ITQ-2 as support and SiMe3
as silylating agent.
Moreover, the catalytic activity of such materials can be improved by properly controlling their
surface properties, since the intrinsic hydrophilic nature of these silica supports can contribute
to the deactivation of the Ti sites by water adsorption and to the formation of by-products such as
diols. Therefore, the design of an efficient epoxidation catalyst requires not only the synthesis of
highly active sites, but also a way to prevent their poisoning during the reaction. Tailoring the
hydrophobicity allows an optimum adsorption of the reactants while reducing the adsorption of water
and the opening of the desired product (epoxide) to form diols (see Figure 2), which would lead to
the deactivation of the catalyst.7
Figure 2. Reaction scheme. The starting reactant is on the left, the target product is in the middle, and
the molecule to avoid is on the right hand side.
In the present work, this control has been achieved by anchoring alkyl-silylating agents onto the
catalyst surface (see Figure 1), whose apolar character modifies the final hydrophilicity of the
material. During the silylation process, the amount of grafted molecules and the nature of the
alkyl ligands are key parameters. Four different silylating agents have been selected to test their
ability to protect the Ti active sites from water. Such a procedure introduces numerous variables to
be optimised, requiring considerable experimental effort, which has been reduced by using
high-throughput synthesis and testing apparatus15 (see Figure 3). In our previous work,16 the amount
of grafted Ti, the level of silylation, and the nature of the silylating agent on the two different
supports (MCM-41 and ITQ-2) were studied for the epoxidation of a C10 n-olefin, taking the initial
reaction rate as performance criterion. Unlike most prior studies16,17 in combinatorial materials
science and high-throughput screening applied to heterogeneous catalysis, which restrict the data
analysis to a single standpoint (e.g. the conversion value at one given reaction time, or the initial
reaction rate), we want to extract more information from the previously collected data in order to
automatically compare the behaviour of the materials according to different catalytic criteria.
Figure 3. High-throughput equipment. Left - Automated solid and liquid handling station for catalyst
synthesis. Right - Parallel batch reactors in which catalysts and reactants are mixed and analyzed.
In the absence of complete kinetic studies of the different synthesized catalysts, which could not
be tackled in practice due to the relatively large number of experiments, a new approach is needed.
To this end, a genetic programming18 (GP) technique is employed to discover one analytical function f
describing the general shape of the conversion curves of all the previously tested catalysts ci,
i = 1..C. The GP objective can be formulated as the minimization of the error e taking into account
all the conversion measurements xi,t, t = t1..tT, over the whole dataset. For each candidate function,
its parameters b̃i,k, k = 1..N, are fitted for each solid using the Levenberg-Marquardt method.
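$$\min \sum_{i=1}^{C} e_{c_i}, \quad \text{with } e_{c_i} = \sum_{t=t_1}^{t_T} \left( x_{i,t} - \tilde{x}_{i,t} \right)^2 \ \text{ and } \ \tilde{x}_{i,t} = f\left(\tilde{b}_{i,k}, t\right) \qquad \text{(Equation 1)}$$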
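Equation 1 can be illustrated with a minimal sketch, assuming a candidate model f(b, t) supplied as a Python function and a plain Levenberg-Marquardt loop with additive damping and a forward-difference Jacobian. The authors' actual fitting code and settings are not given; `fit_lm`, `total_error` and the other names are hypothetical, and the example model is the h(t) curve discussed in Section 3.2.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: model(params, t) -> estimated conversion at time t
Model = Callable[[Sequence[float], float], float]


def h_model(p: Sequence[float], t: float) -> float:
    """Example candidate f: the two-parameter curve X = k t^n / (1 + k t^n)."""
    k, n = p
    return k * t**n / (1 + k * t**n)


def sse(model: Model, p: Sequence[float], times, conv) -> float:
    """Per-catalyst error e_ci: sum of squared residuals (inner sum of Equation 1)."""
    try:
        return sum((x - model(p, t)) ** 2 for t, x in zip(times, conv))
    except (ValueError, ZeroDivisionError, OverflowError):
        return float("inf")  # parameters outside the model's valid domain


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for the small normal equations."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lm(model: Model, p0: Sequence[float], times, conv,
           iters: int = 500, lam: float = 1e-3, h: float = 1e-7) -> Tuple[List[float], float]:
    """Levenberg-Marquardt fit of one catalyst's parameters b_i,k."""
    p, n = list(p0), len(p0)
    err = sse(model, p, times, conv)
    for _ in range(iters):
        res = [x - model(p, t) for t, x in zip(times, conv)]
        # Forward-difference Jacobian: jac[i][k] = d f(p, t_i) / d p_k
        jac = []
        for t in times:
            base = model(p, t)
            jac.append([(model([q + h if j == k else q for j, q in enumerate(p)], t) - base) / h
                        for k in range(n)])
        jtj = [[sum(r[a] * r[b] for r in jac) for b in range(n)] for a in range(n)]
        jtr = [sum(r[a] * e for r, e in zip(jac, res)) for a in range(n)]
        # Additive damping lam * I keeps the normal equations positive definite
        damped = [[jtj[a][b] + (lam if a == b else 0.0) for b in range(n)] for a in range(n)]
        step = solve(damped, jtr)
        trial = [q + s for q, s in zip(p, step)]
        trial_err = sse(model, trial, times, conv)
        if trial_err < err:
            p, err, lam = trial, trial_err, max(lam / 10, 1e-12)  # accept: closer to Gauss-Newton
        else:
            lam *= 10  # reject: closer to a short gradient-descent step
            if lam > 1e12:
                break  # no further improvement possible
    return p, err


def total_error(model: Model, p0: Sequence[float], dataset) -> float:
    """Equation 1: sum over catalysts of e_ci, each catalyst being fitted separately."""
    return sum(fit_lm(model, p0, times, conv)[1] for times, conv in dataset)
```

In a GP run, `total_error` would play the role of the fitness of one candidate function; the synthetic check in the tests simply recovers known parameters from noise-free data.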
Once the best function is found, its parameters can be used as outputs of a neural network whose
inputs are the synthesis variables of the catalysts. This yields the parameter values for
unsynthesized solids and, thus, their entire conversion curves. Besides the ease and speed brought by
automated synthesis stations and reactor technologies in materials science, adapted informatics tools
must be further developed in order to handle the increase in throughput and data volume without
slowing down the whole process.19 In Refs. 20, 21 and 22, the authors present a new Genetic
Programming crossover operator called Context Aware Crossover (CAX) that yielded very good results on
several standard benchmarks. Therefore, it was decided to try it out on the real problem of catalyst
performance modelling, which is a form of multi-objective symbolic regression. We report the use of
genetic programming (GP) in heterogeneous catalysis. Although GP has received only little attention
in this domain,23 this paper shows how such an approach can be turned into a distinctive and powerful
tool for solid optimization and discovery. The GP paradigm is employed to accurately and
automatically estimate the whole “conversion versus time” curve in the epoxidation of large olefins
using the titanosilicates Ti-MCM-41 and Ti-ITQ-2 as catalysts. As a result, the evaluation of the
performance of virtual solids is not reduced to a single conversion value or the initial reaction
rate, and the knowledge gained about the response of the catalysts, expressed through a few
parameters capturing the evolution of the reaction over time, can be applied to predict the
behaviour of new (unsynthesized) materials. The methodology is thoroughly detailed, with particular
attention to the GP crossover, comparing the newly proposed CAX operator with the traditional one.
This paper starts with a brief description of the real dataset. Then, the scheme of the employed
methodology is outlined and the paper focuses on the CAX crossover. The results obtained on the
catalyst optimisation problem and on different benchmarks are then presented, comparing the CAX with
the standard GP crossover on the basis of consumed CPU time. Finally, a conclusion ends the paper.
2. Description of the input data and experimental setup

2.1.- Datasets
• Benchmarks
Standard benchmarks have been implemented in order to assess the efficiency of the CAX on a CPU-time
basis, namely the quartic polynomial symbolic regression, the 11-bit multiplexer and the artificial
ant on the Santa-Fe trail (with no ADF, as in Koza's implementation).
• Real application
The dataset obtained from the first step of the study16 is composed of 128 different synthesized
and tested catalysts, i.e. 36 catalysts with SiMe3 as silylating agent and 6 for each of the three
other silylating agents, each time on both supports, plus a selection of 10 new diverse catalysts per
support for verifying the modelling (36×2 + 6×3×2 + 10×2 = 128). Catalyst activity was monitored for
16 hours, giving a series of seven conversion measurements, i.e. the fraction of the initial reactant
transformed over time (see Equation 2). Since the reactions were performed in a closed reactor
(so-called batch mode), the reactant concentration decreases over time, so the “conversion versus
time” curves always have a positive first derivative and a negative second derivative.
$$\%\,\mathrm{Conversion}(t_1) = x_{t_1} = \frac{x(t_0) - x(t_1)}{x(t_0)} \times 100 \qquad \text{(Equation 2)}$$
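Equation 2 and the expected batch-reactor curve shape can be checked with a short sketch. It assumes that x(t) is the amount of reactant at sampling time t; because sampling times are uneven, the sign of the derivatives is approximated by secant slopes between consecutive points. The function names and the sample amounts in the tests are illustrative only.

```python
from typing import List, Sequence


def conversion(amounts: Sequence[float]) -> List[float]:
    """Equation 2: percent conversion at each sampling time, relative to x(t0)."""
    x0 = amounts[0]
    return [(x0 - x) / x0 * 100.0 for x in amounts]


def has_batch_shape(times: Sequence[float], conv: Sequence[float]) -> bool:
    """Check for a positive first and negative second derivative.

    The secant slopes between consecutive points must all be positive
    (increasing curve) and strictly decreasing (concave curve).
    """
    slopes = [(c2 - c1) / (t2 - t1)
              for t1, t2, c1, c2 in zip(times, times[1:], conv, conv[1:])]
    increasing = all(s > 0 for s in slopes)
    concave = all(s2 < s1 for s1, s2 in zip(slopes, slopes[1:]))
    return increasing and concave
```

A check of this kind could flag measurement series that do not follow the expected batch behaviour before any curve fitting is attempted.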
2.2.- Experimental setup
The CAX aims at improving the efficiency of the standard GP crossover by improving the second part of
the operation, i.e. choosing where to graft into parent 1 (P1) a subtree chosen in parent 2 (P2).
Usually, a “modern” GP crossover operator creates one new child from two selected parents (P1 and P2)
by i) randomly selecting a subtree S2 in P2, with a 90% chance of selecting an internal node, ii)
randomly selecting a subtree S1 in P1, rooted at an internal node if S2 is, and iii) creating a child
which is a clone of P1 with subtree S2 in place of S1. With the CAX operator, after selecting S2 in P2,
one tries to find the best place where it could be grafted in P1. All nodes of P1 can potentially
receive the graft, excluding the root of P1 and the nodes at the bottom of P1 that would violate the
depth constraint. All possibilities are explored deterministically by evaluating all the children
resulting from grafting S2 wherever P1 can receive it (gray nodes in Figure 4), and the child with the
best fitness is returned. Even though the exhaustive exploration of all potential crossover points in
P1 is clearly expensive, Majeed and Ryan reported exceptional results, which convinced us to try this
new operator.
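The two crossover choices described above can be sketched as follows, assuming binary arithmetic trees stored as nested tuples, a fitness to be minimised and a maximum tree depth (e.g. 17, Koza's usual limit). This is an illustration of the CAX idea, not the implementation used in this work or by Majeed and Ryan; all names are hypothetical.

```python
import random
from typing import Callable, Iterator, Optional, Tuple, Union

# A tree is a leaf ('x' or a float) or a tuple (op, left, right) for binary operators.
Tree = Union[str, float, tuple]
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def evaluate(tree: Tree, x: float) -> float:
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else float(tree)


def depth(tree: Tree) -> int:
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 1


def node_paths(tree: Tree, path: Tuple[int, ...] = ()) -> Iterator[Tuple[int, ...]]:
    """Yield the path (child indices from the root) of every node, root first."""
    yield path
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from node_paths(tree[i], path + (i,))


def subtree(tree: Tree, path: Tuple[int, ...]) -> Tree:
    for i in path:
        tree = tree[i]
    return tree


def graft(tree: Tree, path: Tuple[int, ...], sub: Tree) -> Tree:
    """Return a copy of tree with the node at path replaced by sub."""
    if not path:
        return sub
    parts = list(tree)
    parts[path[0]] = graft(tree[path[0]], path[1:], sub)
    return tuple(parts)


def pick_s2(p2: Tree, rng: random.Random, p_internal: float = 0.9) -> Tree:
    """Standard choice of S2 in P2, biased 90% towards internal (function) nodes."""
    paths = list(node_paths(p2))
    internal = [p for p in paths if isinstance(subtree(p2, p), tuple)]
    leaves = [p for p in paths if not isinstance(subtree(p2, p), tuple)]
    pool = internal if internal and rng.random() < p_internal else leaves
    return subtree(p2, rng.choice(pool))


def cax(p1: Tree, s2: Tree, fitness: Callable[[Tree], float], max_depth: int) -> Tree:
    """Context-aware crossover: try S2 at every admissible point of P1, keep the best child."""
    best: Optional[Tree] = None
    best_fit = float("inf")
    for path in node_paths(p1):
        if not path:
            continue  # the root of P1 is excluded
        child = graft(p1, path, s2)
        if depth(child) > max_depth:
            continue  # the depth constraint rules out this position
        f = fitness(child)  # one full fitness evaluation per candidate child
        if f < best_fit:
            best, best_fit = child, f
    return best if best is not None else p1  # P1 unchanged if no admissible position
```

The loop in `cax` makes the cost visible: one fitness evaluation per admissible node of P1, instead of a single evaluation for the standard operator.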
Figure 4. Context Aware Crossover (CAX): the shaded nodes in P1 are possible crossover points where
the selected subtree S2 from P2 can be inserted.
In their different papers, Majeed and Ryan suggest first using the standard GP crossover and starting
the CAX only after some time, so curves were plotted for CAX_10 (CAX started after 10% of the run),
CAX_40, CAX_70 and no CAX (cf. Figure 5).
In Ref.20, the population is made of 4,000 individuals for standard GP, whereas the algorithm using
CAX only needed 200. Figure 5-left shows that if the same population size is used for standard GP and
CAX, the generation count practically freezes when the CAX starts, due to the huge number of child
evaluations that this operator requires. Using 200 individuals for CAX therefore appears to favour CAX
rather than standard GP. Thus, it was decided to reduce the population by 95% when CAX starts, so as
to keep a generation count roughly equivalent to that of standard GP, as shown in Figure 5-centre.
Note that in Ref.20, fitness curves are given with respect to the number of generations. However, to
produce one child the CAX needs many more evaluations than a standard crossover. In Ref.21,
performance is given with respect to the number of evaluations, but not all individuals take the same
time to be evaluated. For these reasons, the results will be expressed against computing time, all
four runs being executed in parallel on a four-processor machine exclusively devoted to the runs.
Figure 5. Top - Catalyst optimisation problem. Left - Number of generations for a constant population;
Centre - Reduced population for CAX; Right - Results averaged over 4 runs with a reduced population
size when CAX starts. Each run takes around 13 hours on a 3 GHz PC. Bottom - The implemented population
reduction scheme is fair to the CAX in terms of both evaluations and generations.
All the experiments were done over 50 runs, each run lasting the number of seconds that allows
standard GP to perform the same number of evaluations as in Koza's book. The experiments implement
the simple solution of turning on the CAX once a certain percentage of a run has been completed. In
order to precisely evaluate the effects of CAX, the standard GP population size (4,000) is used at
the beginning of CAX runs until the CAX operator is started, after which the population is reduced to
200 individuals. As a consequence, in this paper, the runs using CAX are identical to the standard GP
run until the CAX operator is started.
3. Results
3.1.- Benchmarks
Koza's quartic polynomial symbolic regression problem (x^4 + x^3 + x^2 + x) is implemented. To obtain
the CAX_10 curve, which takes 1200 seconds (see Figure 6), the algorithm begins with a population of
4,000 individuals for 120 seconds (10% of 1200), after which the CAX is started. At this moment, the
population is reduced to 200 individuals using the following process: the best individual is kept
(elitism), and the other 199 individuals are selected with a tournament of size 40 (1% of the
original population size). Smaller tournament sizes were also tested (an elitist tournament of size
7, and random selection), but tournament-40 yielded the best results. In Figure 6-Top-Left, it can be
observed that all methods perform the same, even when the CAX is started and the population is
reduced from 4,000 to 200. However, the average population fitness curve (Figure 6-Top-Right) clearly
shows that, when the CAX starts, the average population fitness is boosted to values not far from
that of the best individual; apparently, this does not lead to premature convergence, which is an
interesting feature. Unfortunately, the great improvement announced in Ref.21 was not observed.
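The switch to CAX and the population reduction described above can be sketched as follows, assuming that lower fitness is better and that each of the 199 tournaments draws 40 distinct individuals from the full population (whether the same individual may win several tournaments is not stated in the text; here it may). The names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

Individual = TypeVar("Individual")


def cax_start_time(budget_seconds: float, percent: float) -> float:
    """CAX_p: standard crossover for the first p% of the wall-clock budget, CAX afterwards."""
    return budget_seconds * percent / 100.0


def reduce_population(population: Sequence[Individual],
                      fitness: Callable[[Individual], float],
                      new_size: int = 200,
                      tournament_size: int = 40,
                      rng: Optional[random.Random] = None) -> List[Individual]:
    """Shrink the population when CAX starts: elitism plus tournament selection."""
    if rng is None:
        rng = random.Random()
    pool = list(population)
    survivors = [min(pool, key=fitness)]  # elitism: the best individual is always kept
    while len(survivors) < new_size:
        contestants = rng.sample(pool, tournament_size)  # one tournament per free slot
        survivors.append(min(contestants, key=fitness))
    return survivors
```

For the CAX_10 quartic run, `cax_start_time(1200, 10)` gives the 120-second switch point quoted above.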
On the 11-bit multiplexer problem, the effects of CAX are very similar: in Figure 6-Centre-Left,
starting the CAX does not seem to have much effect at all (although CAX_10 seems to have had a small
negative impact on the best individual's performance). On the right, the effect of CAX on the
population average fitness is clearly visible whenever CAX is started. Before CAX starts, the curve
is of course identical to standard GP. Remarkably, though, for CAX_10 the population does not seem to
have prematurely converged, even though the average fitness is very close to the best fitness. In
the end, the best individual value for CAX_10 is the same as for standard GP.
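The fitness cases of the first two benchmarks can be sketched as follows. Koza's quartic regression draws 20 random fitness cases in [-1, 1], scores a program by the sum of absolute errors and counts a hit when the error is below 0.01; the sketch uses evenly spaced cases instead, which is an assumption. The 11-bit multiplexer is scored over all 2^11 = 2048 input combinations. Programs are represented here as plain Python callables for brevity; the names are hypothetical.

```python
import itertools
from typing import Callable, Sequence, Tuple


def quartic(x: float) -> float:
    return x**4 + x**3 + x**2 + x


def regression_fitness(program: Callable[[float], float],
                       cases: Sequence[float]) -> Tuple[float, int]:
    """Raw fitness = sum of absolute errors; a hit is an error below 0.01."""
    errors = [abs(program(x) - quartic(x)) for x in cases]
    return sum(errors), sum(e < 0.01 for e in errors)


def multiplexer11(bits: Sequence[int]) -> int:
    """11-bit multiplexer: address bits a0..a2 select one of the data bits d0..d7."""
    address = bits[0] + 2 * bits[1] + 4 * bits[2]
    return bits[3 + address]


def multiplexer_hits(program: Callable[[Sequence[int]], int]) -> int:
    """Number of the 2048 fitness cases answered correctly."""
    return sum(program(bits) == multiplexer11(bits)
               for bits in itertools.product((0, 1), repeat=11))
```

A constant program scores 1024 hits on the multiplexer, which is why hit counts on this benchmark start around half of the maximum.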
The last benchmark in Ref.21 was the Lawnmower problem.18 However, this problem uses ADFs, which were
not implemented in this work since the original catalysis problem did not need them. To keep a
comparable benchmark, the Artificial Ant on the Santa-Fe trail problem was chosen instead. On this
benchmark, still no improvement of the best fitness can be seen (cf. Figure 6-Bottom-Left), although
this time CAX_10 does not seem to recover and catch up with standard GP. Here again, a spectacular
boost of the population average fitness is observed whenever the CAX starts.
Figure 6. Top - Quartic polynomial symbolic regression. Left - Best individual performance. Right -
Average performance of the population. Middle - 11-bit multiplexer problem. Left - Best performance.
Right - Mean performance. Bottom - Artificial Ant on the Santa-Fe Trail. Left - Number of hits of the
best individual. Right - Number of hits of the average population.
3.2.- Real application

This difficult problem was first tackled with a tailored GP algorithm that did not use the CAX
operator. The adjusted fitness (in the Koza sense) of the best individual measured on the evaluation
set is 0.93, which corresponds to a mean R2 of 0.93, considering all the catalysts and all
measurements. The data had previously been divided into learning, test, and evaluation sets in order
to detect overfitting.
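The two quality measures mentioned above can be written down explicitly. This is a minimal sketch: R2 is computed per catalyst curve and then averaged, and Koza's adjusted fitness is 1/(1 + standardized fitness). How the authors' GP fitness was derived from the curve errors is not stated, so the two quantities are computed separately here; the function names are hypothetical.

```python
from statistics import fmean
from typing import Iterable, Sequence, Tuple


def r_squared(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Coefficient of determination of one fitted conversion curve."""
    mean = fmean(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_r_squared(curves: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average R2 over all catalysts, each given as (observed, fitted)."""
    return fmean(r_squared(obs, fit) for obs, fit in curves)


def adjusted_fitness(standardized: float) -> float:
    """Koza's adjusted fitness: 1 / (1 + standardized fitness), in (0, 1]."""
    return 1.0 / (1.0 + standardized)
```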
For the real application, it appears that the exhaustive search performed by the CAX to find the best
grafting positions does not yield much better results than an ordinary standard crossover using the
same amount of CPU time (see Figure 5). Finally, different functions can be extracted from the best
Pareto front (see Figure 7). For example, the two-parameter function X = h(t) = kt^n/(1+kt^n), found
using the standard GP operator, is selected because it shows the best balance between fitting
accuracy and number of parameters. The three-parameter function X = f(t) = a - bc^t is also selected,
since it minimizes the number of operators while showing approximately the same fitting quality.
[Figure 7 trees in prefix notation: left, (- a (× b (^ c t))); right, (/ (× k (^ t n)) (+ 1 (× k (^ t n))))]
Figure 7. Genetic programming trees: a - bc^t on the left-hand side, and kt^n/(1+kt^n) on the right.
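The complexity objectives behind this choice can be sketched on the two trees of Figure 7. The sketch counts operators and distinct parameters and keeps the non-dominated candidates; the fitting-error objective, which a real multi-objective run would also include, is left out, and the third tree in the tests is a hypothetical dominated example. The tree encoding and names are assumptions.

```python
from typing import Dict, List, Set, Tuple, Union

# Trees as nested tuples (operator, left, right); leaves are symbol strings.
Tree = Union[str, tuple]

# The two trees of Figure 7
F_TREE = ("-", "a", ("*", "b", ("^", "c", "t")))  # a - b*c^t
H_TREE = ("/", ("*", "k", ("^", "t", "n")),
          ("+", "1", ("*", "k", ("^", "t", "n"))))  # k t^n / (1 + k t^n)


def n_operators(tree: Tree) -> int:
    if isinstance(tree, tuple):
        return 1 + n_operators(tree[1]) + n_operators(tree[2])
    return 0


def parameters(tree: Tree, variables: Tuple[str, ...] = ("t",)) -> Set[str]:
    """Distinct symbolic leaves, excluding the time variable and numeric constants."""
    if isinstance(tree, tuple):
        return parameters(tree[1], variables) | parameters(tree[2], variables)
    return set() if tree in variables or not tree.isalpha() else {tree}


def complexity(tree: Tree) -> Tuple[int, int]:
    return n_operators(tree), len(parameters(tree))


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Tuple[float, ...]]) -> List[str]:
    return [name for name, obj in candidates.items()
            if not any(dominates(other, obj) for other in candidates.values())]
```

On these two objectives alone, a - bc^t (3 operators, 3 parameters) and kt^n/(1+kt^n) (6 operators, 2 parameters) are mutually non-dominated, which matches the two selections made in the text.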
4. Using genetic programming results
GP algorithms using the CAX operator and the ordinary standard crossover were evaluated on a real
dataset consisting of kinetic measurements for 128 different catalysts in the epoxidation of
4-decene. Applying GP to extract an analytical expression reproducing the relationship between the
conversion level and the reaction time opens new opportunities for the evaluation of the results,
since all the information obtained during the experimental assays is retained. The loss of
information is avoided through an expression capturing the evolution of conversion with reaction
time for each catalyst, while data storage is also simplified by transforming the collection of
discrete conversion vs. time values into the few parameters of the proposed equation.
The automatic discovery and fitting of analytical expressions to reproduce kinetic experiments is key
to speeding up the data treatment stage, especially when large amounts of information have been
generated using high-throughput technologies. Even when the behaviour of the tested catalysts is to
be evaluated from a single standpoint (initial reaction rate, or conversion level at a specific
reaction time), the experimental results must be normalized to make fair comparisons, since aliquots
for each reaction are hardly ever taken at the same reaction times. In this scenario, having a simple
equation to rapidly estimate initial rates or the conversion at any reaction time (interpolation)
becomes crucial. By simply calculating the derivative of a given analytical function, reaction rates
can be evaluated at any reaction time, including the initial reaction rate r0. For example,
considering h(t) = kt^n/(1+kt^n), the reaction rate is

$$v(t) = \frac{\partial h}{\partial t} = h'(t) = \frac{k n t^{n-1}}{1 + k t^{n}} - \frac{k^{2} n t^{2n-1}}{\left(1 + k t^{n}\right)^{2}}$$

and R2 = 0.98 is found between the estimated r0 and the values previously reported in Ref.16.
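The derivative above can be checked numerically with a short sketch that compares the two-term expression, its equivalent single-fraction form knt^(n-1)/(1+kt^n)^2, and a central finite difference. Because of the t^(n-1) factor, the sketch evaluates the rate at t > 0 only; the parameter values in the tests are illustrative.

```python
def h(t: float, k: float, n: float) -> float:
    """Conversion (as a fraction) given by X = k t^n / (1 + k t^n)."""
    u = k * t**n
    return u / (1 + u)


def rate(t: float, k: float, n: float) -> float:
    """Two-term analytical derivative h'(t), as written in the text (t > 0)."""
    u = k * t**n
    return k * n * t ** (n - 1) / (1 + u) - k**2 * n * t ** (2 * n - 1) / (1 + u) ** 2


def rate_compact(t: float, k: float, n: float) -> float:
    """Equivalent single-fraction form: k n t^(n-1) / (1 + k t^n)^2."""
    return k * n * t ** (n - 1) / (1 + k * t**n) ** 2
```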
Moreover, analytical equations retain most of the information from the kinetic measurements, whose
importance is illustrated in Figure 8. In this figure, three representative “conversion vs. reaction
time” curves are depicted, showing that the ranking of catalysts (A, B, C) depends on the selected
criterion, i.e. C > B > A at t = 1, B > C > A at t = 4, and A > B > C at t = 10, with t in hours. This
is a direct consequence of the fact that the final catalytic response is actually defined by a set of
physicochemical phenomena, such as the type and magnitude of the interactions between reactants and
active sites or the occurrence of deactivation processes.
[Figure 8: conversion (%) versus time (h) for catalysts A, B and C, with comparison times t = 1, 4 and 10 h marked.]
Figure 8. On the importance of the comparison criterion
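The crossing pattern of Figure 8 can be reproduced with the h(t) model. The sketch below uses hypothetical (k, n) values chosen only so that the curves cross as described in the text; they are not fitted to any catalyst in the dataset.

```python
from typing import Dict, List, Tuple


def h(t: float, k: float, n: float) -> float:
    u = k * t**n
    return u / (1 + u)


# Hypothetical (k, n) values chosen only to reproduce the crossing pattern of Figure 8
catalysts: Dict[str, Tuple[float, float]] = {"A": (0.135, 1.4), "B": (0.368, 0.8), "C": (1.0, 0.05)}


def ranking(t: float, cats: Dict[str, Tuple[float, float]]) -> List[str]:
    """Catalyst names sorted by decreasing conversion at time t."""
    return sorted(cats, key=lambda name: h(t, *cats[name]), reverse=True)


for t in (1, 4, 10):
    print(t, ranking(t, catalysts))
```

A catalyst with a high k but a small n starts fast and levels off, while one with a small k and a large n starts slowly but keeps converting, so any single-time criterion can reverse the ranking.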
Therefore, the use of GP in the catalysis field helps to acquire a global understanding of the system
under study, enhancing the quality of the results and the final knowledge gain. For instance, in the
present work we have applied the GP algorithm to infer various analytical expressions able to
reproduce with low errors (global R2 ≈ 0.93) the “conversion vs. reaction time” curves for the
epoxidation of 4-decene using Ti-MCM-41 and Ti-ITQ-2 catalysts. As a consequence, we are now able to
evaluate the catalysts' behaviour from different standpoints, as shown in Figure 9. In this figure,
experimental results (conversion levels) are represented at two (left) or three (right) reaction
times, using filters to identify some characteristics of the related catalysts (type of material,
i.e. MCM-41 or ITQ-2, top; type of silylating agent, SiMe3, SiMe2Bu, SiMe2Ph, or SiMePh2, bottom).
With this approach, new conclusions can be drawn about the catalysts' mode of action, complementing
those previously reported.16 In this regard, ITQ-2 samples providing the same conversion level as
MCM-41 at the initial stages of the reaction (t = 0.2 h) are generally more active materials at
longer reaction times (t = 1 h), as can be inferred from Figure 9 (top, left). A similar analysis can
be carried out in three dimensions by considering the behaviour of the catalysts at 0.2, 1, and 6 h
(Figure 9, top right). Moreover, when the same results are filtered by the type of silylating agent,
some clusters can be observed, indicating, for instance, that SiMe2Bu is highly active at short
reaction times but is overtaken by SiMe3 at 6 h.
Figure 9. Left - 2D plot of percent conversion at t = 1 and t = 0.2. Right - 3D plot of percent
conversion at t = 0.2, t = 1, and t = 6; t in hours. The influence of the support is shown in the top
charts, with ITQ samples as red squares and MCM samples as filled blue circles, while the influence of
the silylating agent is shown at the bottom with blue, red, gray, green, and white circles for SiMe2Ph,
SiMe2Bu, SiMe3, SiMePh2, and no silylating agent, respectively.
Furthermore, the power of GP to provide an analytical expression for the experimental data lies not
only in the ratio of knowledge gain to time savings, but also in the possibility of introducing
diverse mathematical criteria during the search process. For instance, among the large number of
possible equations fitting our kinetic measurements, we have limited the complexity of the solution
(number of operators and number of parameters), leading to simple empirical expressions. The equation
f(t) = a - bc^t was found by minimizing the number of operators while obtaining a satisfactory
correlation (R2 = 0.92). Thanks to this, a new criterion can easily be calculated to rank the
catalysts with regard to the whole “conversion vs. reaction time” data, using the area below the
kinetic curves (integral of the analytical expression between 0 and T = 10 hours), as shown in
Equation 3. Figure 10 shows that this new criterion provides a new point of view on catalyst ranking,
complementary to the previously established one.16
$$\int_{0}^{T} \left(a - b\,c^{t}\right) dt = F_T - F_0, \quad \text{with } F_T = aT - \frac{b\,c^{T}}{\ln(c)} \qquad \text{(Equation 3)}$$
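Equation 3 can be checked against a numerical integral with a short sketch. The values a = b = 80 and c = 0.7 in the tests are hypothetical; the closed form requires c > 0 and c ≠ 1 because of the ln(c) denominator.

```python
import math


def f(t: float, a: float, b: float, c: float) -> float:
    """Three-parameter conversion curve X = a - b c^t."""
    return a - b * c**t


def area_analytical(a: float, b: float, c: float, T: float) -> float:
    """Equation 3: F_T - F_0 with F_t = a t - b c^t / ln(c)."""
    def F(t: float) -> float:
        return a * t - b * c**t / math.log(c)
    return F(T) - F(0.0)


def area_trapezoid(a: float, b: float, c: float, T: float, steps: int = 10_000) -> float:
    """Composite trapezoidal rule, used only to cross-check the closed form."""
    dt = T / steps
    ys = [f(i * dt, a, b, c) for i in range(steps + 1)]
    return dt * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
```

With the closed form, ranking a set of catalysts by the area criterion needs only their fitted (a, b, c) values and no numerical integration.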
[Figure 10 panels: "r0 - MCM41" and "Integral - MCM41" (top); "r0 - ITQ2" and "Integral - ITQ2" (bottom); y-axis from 0 to 0.07.]
Figure 10. The initial reaction rate is shown in black and the area below the reaction curve between 0
and 10 hours in grey. The area has been divided by 100 in order to keep a single y-axis. Results are
given separately for MCM41 (top) and ITQ2 (bottom).
In contrast, the equation h(t) = kt^n/(1+kt^n) was obtained by minimizing the total number of
parameters involved. Although the resulting expression is clearly more complex, which makes its
analytical treatment difficult (in particular the derivation of the primitive for integral
calculation), it is more convenient for correlating the responses (conversion vs. time curves) of the
catalysts with their chemical characteristics using advanced modelling algorithms. In this sense, the
regression between parameter values and synthesis variables has been handled with a neural network.
The synthesis variables are the following: 2 supports for the titanium grafting process {MCM-41,
ITQ-2}, the range [0.1-5] Ti wt% for the titanium grafting, 4 silylating agents to analyse the effect
of the alkyl group size on the catalytic properties {SiMe3, SiMe2Ph, SiMePh2, SiMe2Bu}, and [0.0-1]
and [0.0-0.5] for the silylation degree, i.e. SiR3/(SiO2+TiO2), for MCM-41 and ITQ-2 samples
respectively. The epoxidation of trans-4-decene was chosen as the test reaction to evaluate the
catalytic performance of the synthesized materials. Keeping the number of parameters low makes
overfitting of the neural network easy to handle; the resulting architecture thus shows a very low
level of complexity in both the number of hidden layers and the total number of neurons (Multi-layer
Perceptron 4:4-8-2:2), with four synthesis variables as inputs, and k and n as outputs.
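The data flow from synthesis variables to a full conversion curve can be sketched with a 4-8-2 perceptron, reading 4:4-8-2:2 as four inputs, eight hidden units and two outputs. Several parts are assumptions rather than details from the text: the input scaling in `encode`, the tanh hidden units, and the softplus on the outputs to keep k and n positive. The weights are random, i.e. the network is untrained; training is not shown.

```python
import math
import random
from typing import List, Sequence

SUPPORTS = ["MCM-41", "ITQ-2"]
AGENTS = ["SiMe3", "SiMe2Ph", "SiMePh2", "SiMe2Bu"]


def encode(support: str, ti_wt: float, agent: str, silylation: float) -> List[float]:
    """Hypothetical scaling of the four synthesis variables to roughly [0, 1]."""
    return [float(SUPPORTS.index(support)), ti_wt / 5.0, AGENTS.index(agent) / 3.0, silylation]


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z), used here to keep k and n positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


class MLP:
    """4-8-2 perceptron with tanh hidden units; weights are random, i.e. untrained."""

    def __init__(self, n_in: int = 4, n_hidden: int = 8, n_out: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def predict(self, x: Sequence[float]) -> List[float]:
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = [sum(w * hv for w, hv in zip(row, hidden)) + b
               for row, b in zip(self.w2, self.b2)]
        return [softplus(o) for o in out]  # outputs interpreted as (k, n)


def h(t: float, k: float, n: float) -> float:
    u = k * t**n
    return u / (1 + u)


net = MLP()
k, n = net.predict(encode("ITQ-2", 2.0, "SiMe3", 0.3))
curve = [h(t, k, n) for t in (0.5, 1.0, 2.0, 4.0, 8.0, 16.0)]
```

Predicting only two numbers per catalyst keeps the output layer small; the whole curve of an unsynthesized solid is then recovered by evaluating h(t) at any time of interest.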
Table 1. Neural network statistics

Statistic       Training k   Training n   Selection k   Selection n   Test k      Test n
Data mean       0.178283     0.539448     0.223869      0.574403      0.196315    0.561092
Data S.D.       0.143507     0.162605     0.164714      0.145667      0.159821    0.111543
Error Mean      -0.000971    -0.001282    -0.032269     -0.036759     -0.010183   -0.019647
Error S.D.      0.054646     0.075635     0.080973      0.080328      0.084756    0.075151
Abs. E. Mean    0.041913     0.062049     0.060644      0.069006      0.067030    0.063672
S.D. Ratio      0.380789     0.465148     0.491600      0.551453      0.530319    0.673738
Correlation     0.924835     0.886070     0.872050      0.854034      0.854259    0.804007
Figure 11 shows the estimates of k and n, and the corresponding errors, obtained with the synthesis
variables as inputs (i.e. %Ti, %silylation, silylating agent, and support). Before training the
neural network, the dataset was divided (½ for training, ¼ for selection, and ¼ for testing, i.e.
unseen materials) to prevent overfitting; see Table 1 for statistics.
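The data split and the statistics reported in Table 1 can be sketched as follows. The split is a plain random ½ / ¼ / ¼ partition (the actual assignment of catalysts is not described), and the error is assumed to be predicted minus observed. In Table 1, the S.D. Ratio column equals Error S.D. divided by Data S.D. (e.g. 0.054646 / 0.143507 ≈ 0.3808), which is the definition used below; names are hypothetical.

```python
import random
from statistics import fmean, pstdev
from typing import Dict, List, Sequence, Tuple


def split_half_quarter_quarter(items: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Random 1/2 training, 1/4 selection, 1/4 test partition."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train, n_sel = len(shuffled) // 2, len(shuffled) // 4
    return (shuffled[:n_train], shuffled[n_train:n_train + n_sel],
            shuffled[n_train + n_sel:])


def regression_stats(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Table 1-style statistics for one output (k or n) on one subset."""
    errors = [p - o for o, p in zip(observed, predicted)]  # assumption: predicted - observed
    data_sd, err_sd = pstdev(observed), pstdev(errors)
    mo, mp = fmean(observed), fmean(predicted)
    cov = fmean([(o - mo) * (p - mp) for o, p in zip(observed, predicted)])
    return {
        "Data mean": mo,
        "Data S.D.": data_sd,
        "Error Mean": fmean(errors),
        "Error S.D.": err_sd,
        "Abs. E. Mean": fmean([abs(e) for e in errors]),
        "S.D. Ratio": err_sd / data_sd,  # below 1: the model beats predicting the mean
        "Correlation": cov / (data_sd * pstdev(predicted)),  # Pearson correlation
    }
```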
[Figure 11 plot series: k, n, kpred, npred, diff k and diff n.]
Figure 11. Top - Neural network (Multi-layer Perceptron 4:4-6-2:2) estimation. Bottom - Observed
versus predicted values of k and n (left and right, respectively) for the separate datasets, i.e. T for
training, S for selection, and X for test.
5. Conclusion
The conclusion is not exactly the one that was originally planned. When starting this work, the aim
was to improve the best individual result on the heterogeneous catalyst optimisation problem using
the Context Aware Crossover. Unfortunately, things did not turn out as expected, as it was impossible
to obtain better results with the CAX than with an ordinary crossover operator on this real-world
problem. A careful implementation of the benchmarks seems to show that CAX is not capable of
improving the best fitness value, although CAX seems to be a very good exploitation operator that
boosts the whole population towards much better fitness values while maintaining a good level of
diversity (the best individual fitness keeps rising after the CAX is started). This means that CAX
remains a very interesting crossover method that deserves further careful investigation regarding
diversity preservation. From the chemistry point of view, the application of GP makes it possible to
reproduce the relationship between the conversion level and the reaction time while retaining all the
information, and data storage is also simplified. Moreover, the use of GP allows a more global
understanding to be acquired, enhancing the quality of the results and the final knowledge gain.
Catalyst behaviour can be quickly evaluated from different points of view, allowing new conclusions
to be drawn about the catalysts' mode of action. For the first time in heterogeneous catalysis,
genetic programming has been used for an application of industrial interest. With this study, it has
been shown how such a tool can open new opportunities for data mining and knowledge extraction in
materials science. As an example, the combination with a modelling tool such as a neural network makes
the GP strategy even more promising and relevant.
References
1. A. Corma, M.T. Navarro, J. Perez Pariente, J. Chem. Soc., Chem. Commun. 1994, 147.
2. A. Thangaraj, R. Kumar, P. Ratnasamy, J. Catal. 131 1991 294.
3. W. Fan, P. Wu, S. Namba, T. Tatsumi, Angew. Chem., Int. Ed. 43 2003 236.
4. P. Barret, F. Pautet, M. Dauton, J.F. Sabot, Pharm. Acta Helv., 62 1987 348.
5. N. Fdil, A. Romane, S. Allaoud, A. Karim, Y. Castanet, A. Morteaux, J. Mol. Catal., 108 1996 15.
6. M. Lajunen, A.M.P. Koskinen, Tet. Lett., 35 1994 4461.
7. A. Corma, M. Domine, J.A. Gaona, J.L. Jorda, M.T. Navarro, F. Rey, J. Perez-Pariente, J. Tsuji, B. McCullock, L.T. Nemeth, Chem. Comm., 2211 1998.
8. W. Zhang, M. Froeba, J. Wang, P.T. Tanev, J. Wong, T.J. Pinnavaia, JACS 1996, 118(38), 9164-9171.
9. K.A. Koyano, T. Tatsumi, Microporous Materials 1997, 10(4-6), 259-271.
10. A. Corma, V. Fornes, S.B. Pergher, Th.L.M. Maesen, J.G. Buglass, Nature (London) 1998, 396(6709), 353-356.
11. A. Corma, U. Diaz, V. Fornes, J.L. Jorda, M.E. Domine, F. Rey, Chem. Comm. (Cambridge) 1999, (9), 779-780.
12. A. Corma, U. Diaz, M.E. Domine, V. Fornes, Angewandte Chemie, Int. Ed. 2000, 39(8), 1499-1501.
13. A. Corma, U. Diaz, M.E. Domine, V. Fornes, JACS 2000, 122(12), 2804-2809.
14. P. Wu, D. Nuntasri, J. Ruan, Y. Liu, M. He, W. Fan, O. Terasaki, T. Tatsumi, J. of Physical Chemistry B 2004, 108(50), 19126-19131.
15. (a) Jandeleit, B.; Schaefer, D.J.; Powers, T.S.; Turner, H.W.; Weinberg, W.H., Angew. Chem. Int. Ed. 1999, 38, (17), 2494-2532. (b) Senkan, S.M., Angew. Chem. Int. Ed. 2001, 40, (2), 312-329. (c) Reetz, M.T., Angew. Chem. Int. Ed. 2001, 40, (2), 284-310. (d) Newsam, J.M.; Schuth, F., Biotechnol. Bioeng. 1999, 61, (4), 203-216. (e) Gennari, F.; Seneci, P.; Miertus, S., Catal. Rev.-Sci. Eng. 2000, 42, (3), 385-402.
16. P. Serna, L.A. Baumes, M. Moliner, A. Corma, Journal of Catalysis, 258, 35-34, 2008.
17. (a) M. Holena, M. Baerns, Catal. Today, 2003, 81, 485-494. (b) L.A. Baumes, M. Moliner, A. Corma, QSAR Comb. Sci. Vol. 26, Issue 2, 255-272, 2007. (c) D. Nicolaides, QSAR Comb. Sci. 2005, 24, 15-21. (d) L.A. Baumes, J.M. Serra, P. Serna, A. Corma, J. Comb. Chem. 2006, 8, 583-596. (e) M.M. Gardner, J.N. Cawse, In Experimental Design for Combinatorial and High Throughput Materials Development, Ed. J.M. Cawse. J. Wiley & Sons, Inc. 2003, 129-145. (f) F. Schüth, L.A. Baumes, F. Clerc, D. Demuth, D. Farrusseng, J. Llamas-Galilea, C. Klanner, J. Klein, A. Martinez-Joaristi, J. Procelewska, M. Saupe, S. Schunk, M. Schwickardi, W. Strehlau, T. Zech, Catal. Today. Vol. 117, 2006, 284-290. (g) A. Corma, J.M. Serra, E. Argente, S. Valero, V. Botti, Chem. Phys. Chem., 2002, 3, 939-945. (h) L.A. Baumes, D. Farrusseng, M. Lengliz, C. Mirodatos, QSAR & Comb. Sci. Nov. 2004, vol. 29, Issue 9, 767-778.
18. (a) J.R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Massachusetts, 1992. (b) J.R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Massachusetts, 1994.
19. (a) Baumes, L.A. Combinatorial Stochastic Iterative Algorithms and High-Throughput Approach: from Discovery to Optimisation of Heterogeneous Catalysts (in English). Univ. Claude Bernard Lyon 1, Lyon, France, 2004. (b) Farrusseng, D.; Baumes, L.A.; Mirodatos, C., Data management for combinatorial heterogeneous catalysis: methodology and development of advanced tool. In High-Throughput Analysis: A Tool For Combinatorial Materials Science, Potyrailo, R.A.; Amis, E.J., Eds. Kluwer Academic/Plenum Publishers: 2003; pp 551-579. (c) http://catalyse.univ-lyon1.fr/gre3b4.htm website accessed 20 July 2006. (d) http://www.fist.fr/article259.html website accessed 20 July 2006. (e) Adams, N.; Schubert, U.S., Macromol. Rapid. Commun. 2005, 25, 48-58. (f) Adams, N.; Schubert, U.S., QSAR & Comb. Sci. 2005, 24, 58-65. (g) Ohrenberg, A.; von Torne, C.; Schuppet, A.; Knab, B., QSAR & Comb. Sci. 2005, 24, 29-37. (h) Saupe, M.; Fodisch, R.; Sunderrmann, A.; Schunk, S.A.; Finger, K.E., QSAR & Comb. Sci. 2005, 24, 66-77. (i) Gilardoni, F.; Curcin, V.; Karunanayake, K.; Norgaard, J.; Guo, Y., QSAR & Comb. Sci. 2005, 24, 120-130.
20. H. Majeed and C. Ryan. A less destructive, context-aware crossover operator for GP. In P. Collet et al., editor, Proc. of the 9th European Conf. on Genetic Programming, vol. 3905 of Lecture Notes in Computer Science, 36-48, Budapest, 2006. Springer.
21. H. Majeed and C. Ryan. Using context-aware crossover to improve the performance of GP. In Maarten Keijzer et al., editor, GECCO 2006: Proc. of the 8th annual conf. on Genetic and evolutionary computation, vol. 1, 847-854, Seattle, Washington, USA, 8-12 July 2006. ACM Press.
22. (a) H. Majeed and C. Ryan. Context-aware mutation: a modular, context aware mutation operator for genetic programming. In Dirk Thierens et al., editor, GECCO '07: Proc. of the 9th annual conf. on Genetic and evolutionary computation, vol. 2, 1651-1658, London, 7-11 July 2007. ACM Press. (b) H. Majeed and C. Ryan. On the constructiveness of context-aware crossover. In Dirk Thierens et al., editor, GECCO '07: Proc. of the 9th annual conf. on Genetic and evolutionary computation, vol. 2, 1659-1666, London, 7-11 July 2007. ACM Press.
23. L.A. Baumes, P. Collet. Computational Materials Science. 2008. In Press.
