
Machine Learning B

708.068 08W 1 SSt KU, WS 2008/09

Exercises
Problems marked with * are optional.

1 Genetic Algorithms [5+2* P]


Apply genetic algorithms to maximize the growth rate of the Escherichia coli
bacterium. The growth rate is enhanced by LacZ proteins, which break down the
sugar lactose for use as an energy and carbon source. These LacZ proteins (Z) are
controlled by a gene regulatory network of the lactose (lac) system (see Fig. 1)
consisting of an activator protein X = CRP (signal Sx = cAMP, a molecule pro-
duced within the cell upon glucose starvation) and a protein Y = LacI that is
induced in the presence of Sy = lactose (which is assumed here to be constant, i.e.
1). In the absence of the superior energy source glucose, the lac system utilizes
lactose to enhance the growth rate.
The benefit b in growth rate is proportional to the rate at which LacZ (Z)
breaks down its substrate, lactose, which is approximately proportional to the
number of copies of the protein
The benefit b in growth rate is proportional to the rate at which LacZ (Z)
breaks down its substrate, lactose, which is approximately proportional to the
number of copies of the protein

b(Z) = benefit · Z

where benefit denotes a constant factor. Note that the expression of LacZ (Z) is
only beneficial in the absence of glucose, i.e. when cAMP (= Sx) is produced. On
the other hand, the expression of LacZ consumes energy and reduces the growth
rate approximately by a constant factor

c = cost.

Therefore, only during longer intervals of Z expression do enough proteins
accumulate to compensate for the cost of protein expression. The fitness function
to be maximized is given by

f = b − c.

The dynamics of the proteins Y and Z are modeled by

dY/dt = βIY − αY,    dZ/dt = βIZ − αZ

Figure 1: lac system of the Escherichia coli bacterium.

with constant maximum production rate β = 0.02 and degradation rate α = 0.02.
This results in a maximum expression level Ymax = Zmax = β/α = 1 and a decay
time constant of 1/α = 50. The production rates IY and IZ are given by

IY = Θ(k11 X + k12 Y + k13 Z − 1)    (1)

IZ = Θ(k21 X + k22 Y + k23 Z − 1)    (2)

where Θ denotes the Heaviside step function and the matrix k contains the
activation coefficients. In this example X is activated instantaneously by the
input Sx, i.e. X = Sx. This model is already implemented in MATLAB; only the
activation coefficients k have to be modified for the following analysis.
For this task the input Sx has the shape of a pulse with amplitude 1 and a
duration of either 20 ms or 200 ms, occurring with probability p1 and p2 (p1 + p2 =
1), respectively. The activation coefficients k should be optimized with genetic
algorithms to maximize the growth rate for different settings of benefit, cost, p1,
and p2.
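As an illustration of how the fitness f = b − c could be evaluated for a single
input pulse, the following is a minimal sketch assuming simple Euler integration
with a 1 ms step over a 1000 ms horizon; the function name, the horizon and the
exact accounting of cost during production are assumptions, not the contents of
the provided code:

    % Hedged sketch of the fitness evaluation for one input pulse of
    % duration Tpulse; the accounting in calc_fitness.m may differ.
    function f = fitness_sketch(k, Tpulse, benefit, cost)
        dt = 1; T = 1000;                 % Euler step and horizon in ms (assumed)
        beta = 0.02; alpha = 0.02;        % production and degradation rates
        Y = 0; Z = 0; f = 0;
        for t = 1:T
            Sx = double(t <= Tpulse);     % cAMP pulse of duration Tpulse
            X  = Sx;                      % X follows Sx instantaneously
            IY = double(k(1,1)*X + k(1,2)*Y + k(1,3)*Z >= 1);   % Eq. (1)
            IZ = double(k(2,1)*X + k(2,2)*Y + k(2,3)*Z >= 1);   % Eq. (2)
            % benefit only while glucose is absent (Sx = 1), cost while Z is produced
            f  = f + dt*(benefit*Z*Sx - cost*IZ);
            Y  = Y + dt*(beta*IY - alpha*Y);                    % Euler update of Y
            Z  = Z + dt*(beta*IZ - alpha*Z);                    % Euler update of Z
        end
    end

The fitness for a given setting of p1 and p2 is then the expectation over the two
pulse durations, i.e. p1 · fitness_sketch(k, 20, ...) + p2 · fitness_sketch(k, 200, ...).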

a) Download the example code and the Genetic algorithm toolbox from
http://www.igi.tugraz.at/lehre/MLB/WS08/ga.zip.

b) Complete the code for the evaluation of the fitness value in the file
calc_fitness.m.
c) Optimize k consisting of only positive elements (kij ∈ [0, 10]) for cost = 2,
benefit = 4, p1 = 0.9 and p2 = 0.1, using a population size of 5000 and 500
generations. Describe and explain the mechanisms of the solution. Average
the results over several runs of the genetic algorithm to smooth out outliers.
Hand in a plot illustrating the dynamics of the gene regulatory network
(GRN).

d) Repeat the optimization with parameters cost/benefit ∈ [0.1, 2], p1 ∈ [0, 1].
Describe different solutions (mechanisms of the GRN) found by the genetic
algorithm. Hand in a plot of the gene regulatory network dynamics for each
solution.

e) Hand in a two-dimensional plot with axes cost/benefit and p1 which shows,
for the results obtained in d), which solution is selected in different parameter
ranges.

f) [2* P] Repeat point d) with k consisting of positive and negative elements
(kij ∈ [−10, 10]). For each of the parameter ranges found in e), describe one
of the new solutions that did not emerge for kij > 0.

Present your results in a clear, structured and legible way. Document them in
such a way that anybody can reproduce them effortlessly.

2 Comparison of Learning Algorithms [5 P]


Compare the performance of the three learning algorithms back-propagation, ge-
netic algorithms and simulated annealing when optimizing the weights of a neu-
ral network to predict a nonlinear function of two variables. Use the dataset
dataset.m as the source for training and test sets.

a) Download the example code and the Genetic algorithm toolbox from
http://www.igi.tugraz.at/lehre/MLB/WS08/sa.zip.

b) Modify the code in the file compare.m to train neural networks with the
standard back-propagation algorithm from the MATLAB Neural Networks
Toolbox. Train neural networks consisting of 1, 3, 5, 7, 10 and 15 hidden
units and plot the training/test errors and the run times of the algorithm
as a function of the number of hidden units. Average the results over
several runs for each network size. Apply a standard method of your choice
to compensate for over-fitting (e.g. weight decay or early stopping).
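A minimal sketch using the 2008-era Neural Network Toolbox API (newff,
train, sim); the variable names X, Y, Xval, Yval, Xtest, Ytest, the training
function and the epoch cap are assumptions, not taken from compare.m:

    % Hedged sketch: feed-forward net with nHidden units and early stopping.
    nHidden = 5;
    net = newff(minmax(X), [nHidden 1], {'tansig','purelin'}, 'traingd');
    net.trainParam.epochs = 500;          % cap on training epochs (assumed)
    VV.P = Xval; VV.T = Yval;             % validation set for early stopping
    [net, tr] = train(net, X, Y, [], [], VV);
    Yhat = sim(net, Xtest);               % network predictions on the test set
    testErr = mean((Yhat - Ytest).^2);    % mean squared test error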

c) Repeat the analysis in b) with simulated annealing. Adjust the cooling
schedule to achieve appropriate convergence properties. Set a maximum
number of iterations (e.g. 1000) to avoid excessively long run times.
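As a rough illustration of the mechanism to be tuned, here is a minimal
simulated-annealing loop with a geometric cooling schedule; nnError (the
training error as a function of the weight vector) and all constants are
assumptions:

    % Hedged sketch of simulated annealing over the network weight vector w.
    w = randn(nWeights, 1);                % random initial weights
    T = 1.0; Tmin = 1e-3; coolRate = 0.995;
    E = nnError(w);                        % current training error (assumed)
    for it = 1:1000                        % capped number of iterations
        wNew = w + 0.1*randn(size(w));     % random perturbation of the weights
        ENew = nnError(wNew);
        % always accept improvements; accept uphill moves with Boltzmann probability
        if ENew < E || rand < exp(-(ENew - E)/T)
            w = wNew; E = ENew;
        end
        T = max(Tmin, coolRate*T);         % geometric cooling schedule
    end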
d) Repeat the analysis in b) with genetic algorithms. Adjust the parameters
population size (e.g. 100), number of generations (e.g. 1000) and mutation
operators (as you see fit) to achieve appropriate convergence properties.
e) Compare and interpret the results for all three learning algorithms.
f) Repeat the analysis b) - d) with a modified mean squared error function
where Gaussian noise with mean 0 and standard deviation 0.2 is added to
the target values. Interpret the results for all three algorithms and compare
them to the results obtained in e).
Present your results in a clear, structured and legible way. Document them in
such a way that anybody can reproduce them effortlessly.

3 Distributed computing: WTA [5 P]


Implement a Winner-Take-All mechanism with the artificial organism simulator
(AOS). The task is to find and mark the input cell with the highest chemical
production by placing cells of type Max at its location.
The environment consists of a 30 × 30 grid containing a blast cell B0 and
the 4 input cells X1, X2, Y1, and Y2. Input cells of type Xi and Yi (i = 1, 2)
produce the chemicals x and y, respectively. Each input cell produces chemicals
at a different rate. If a cell of type X (Y) has the maximum chemical production
(the highest concentration of chemical x or y at its location), then output cells of
type Max1 (Max2) should mark its location and no other output cell should be
present in the environment. The type and the location of the remaining output
cells therefore indicate the type and the location of the input cell with the highest
chemical production (the winner of the computation), respectively.
a) Download the simulator from http://www.igi.tugraz.at/lehre/MLB/WS08/simulator.zip
and read the README.txt file. Modify the XML files config/environment_WTA.xml
and config/organism_WTA.xml to program the environment and the cells.
You can use the following competences and reactors for this example:
DivideCompetence, MigrateCompetence, ConstReactor, SourceReactor,
DiffSourceReactor and KillReactor.
You can visualize the results of a simulation with the MATLAB script
visualize (see README.txt).
b) Program the blast cell to divide into two types of output cells Max1 and
Max2 that should search for local maxima of the chemicals x and y, re-
spectively. Make sure that only a certain number of cells of each type is
generated.
c) Implement a mechanism such that only output cells at the location of the
highest chemical concentration (of the chemicals x and y) remain after 100000
simulation time steps.

Present your results in a clear, structured and legible way. Document them in
such a way that anybody can reproduce them effortlessly.

4 Patterning [5+4* P]
Use the artificial organism simulator (AOS) to program a blast cell B0 to grow
cells that form a pattern that resembles a Japanese flag.

a) Download the simulator from http://www.igi.tugraz.at/lehre/MLB/WS08/simulator.zip
and read the README.txt file. Modify the XML files config/environment_patterning.xml
and config/organism_patterning.xml to program the environment and the cells.
You can use the following competences and reactors for this example:
DivideCompetence, MigrateCompetence, ConstReactor, SourceReactor,
DiffSourceReactor and KillReactor.
You can visualize the results of a simulation with the MATLAB script
visualize (see README.txt).

b) Program the blast cell to divide into two types of output cells C1 and C2
that should self-organize to form a pattern that resembles a Japanese flag.
To obtain a smooth pattern the output cells C1 and C2 should produce
chemicals c1 and c2. Do not worry about the specific colors of the flag. You
can of course produce more than two cell types. Run the simulation for
100000 time steps.

c) [2* P] Program the blast cell to divide into three types of output cells C1,
C2 and C3 that should self-organize to form a pattern that resembles a
French flag. You can of course produce more than three cell types.

d) [2* P] Program the blast cell to divide into two types of output cells C1
and C2 that should self-organize to form a pattern that resembles the flag
of Austria or Burundi. You can of course produce more than two cell types.

Present your results in a clear, structured and legible way. Document them in
such a way that anybody can reproduce them effortlessly.
5 Small-world networks [3 P]
Carry out a web search and find three datasets that have small-world properties.
That is,

1. the cluster coefficient should be at least 20% larger than for corresponding
random networks that consist of the same number of elements;

2. the average shortest path length should differ by at most 10% compared to
random networks.

Use MATLAB to calculate the cluster coefficient and the average shortest path
length of the networks. Hints:

a) Use the commands fopen, fclose, textscan, textread and sscanf to
read data files in MATLAB (type help for details).

b) Write a function that takes the connectivity matrix W as input and calcu-
lates the cluster coefficient and the average shortest path length (a sketch
of such a function is given after this list).

c) To calculate the cluster coefficient count only links between existing neigh-
bors. Nodes with only one or zero links are therefore not used for the
calculation. This is a property of undirected graphs. If necessary you can
convert a directed graph into an undirected one.

d) To calculate the average shortest path length ignore unconnected nodes and
average only over shortest path lengths of connected nodes. This is a prop-
erty of directed and undirected graphs.

e) For each dataset that represents a directed graph you can choose whether
you analyze the original directed graph or its undirected version obtained
by adding reciprocal links.

f) You can start your web search at http://www-personal.umich.edu/~mejn/netdata/.
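A possible skeleton for the function of hint b), treating W as the 0/1 adjacency
matrix of an (if necessary symmetrized) undirected graph; the function name and
the exact handling of edge cases are assumptions:

    function [C, L] = smallworld_stats(W)
    % Cluster coefficient C and average shortest path length L of the graph
    % given by the 0/1 adjacency matrix W (symmetrized to be undirected).
    W = double((W + W') > 0);
    W(1:size(W,1)+1:end) = 0;                     % remove self-loops
    n = size(W,1);
    cc = [];
    for i = 1:n
        nb = find(W(i,:));                        % neighbors of node i
        if numel(nb) < 2, continue; end           % skip nodes with < 2 links
        links = sum(sum(W(nb,nb)))/2;             % links among the neighbors
        cc(end+1) = links/(numel(nb)*(numel(nb)-1)/2); %#ok<AGROW>
    end
    C = mean(cc);
    d = [];
    for s = 1:n                                   % BFS from every node
        dist = inf(1,n); dist(s) = 0; queue = s;
        while ~isempty(queue)
            u = queue(1); queue(1) = [];
            for v = find(W(u,:))
                if dist(v) > dist(u) + 1
                    dist(v) = dist(u) + 1;
                    queue(end+1) = v;             %#ok<AGROW>
                end
            end
        end
        reach = dist(isfinite(dist) & dist > 0);  % ignore unconnected nodes
        d = [d reach];                            %#ok<AGROW>
    end
    L = mean(d);
    end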

Present your results in a clear, structured and legible way. Document them in
such a way that anybody can reproduce them effortlessly.

6 Robustness [3 P]
Compare the memory capacity and the robustness properties of five types of
networks consisting of threshold gates with different connectivity graphs for a
binary classification task.
a) Networks:
The five types of networks are: regular 2-dimensional lattice, random net-
works, Barabasi-Albert scale-free networks, Watts-Strogatz small-world net-
works, and Kaiser-Hilgetag small-world networks. Each network consists of
100 threshold gates and 1000 weights, where each weight is drawn randomly
from a Gaussian distribution with mean zero and standard deviation of one.
The threshold of each gate is set to zero.

1. Download the MATLAB code from http://www.igi.tugraz.at/lehre/MLB/WS08/robustness.zip
and modify the MATLAB file robustness.m to generate the missing
networks. Implement the small-world and scale-free networks as
outlined in the exercises.
2. For each network a function should be implemented that takes the in-
put parameters as stated in the corresponding function header (nNodes,
nLinks, α, ...) and outputs a connectivity matrix Wij of size 100 × 100
with elements 0 and 1, where 1 indicates a weight from gate j to gate
i. All networks represent directed graphs and Wij is therefore in general
not symmetric (a sketch of one such generator is given after this list).
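As an illustration only, a possible generator for the Watts-Strogatz networks;
the function signature and the rewiring details are assumptions, and robustness.m
may prescribe a different interface:

    function W = watts_strogatz(nNodes, nLinks, pRewire)
    % Hedged sketch: ring lattice with nLinks directed links in total,
    % each link rewired to a random target with probability pRewire.
    W = zeros(nNodes);
    kHalf = round(nLinks/(2*nNodes));          % neighbors per side on the ring
    for i = 1:nNodes
        for j = 1:kHalf
            W(i, mod(i-1+j, nNodes)+1) = 1;    % forward neighbors
            W(i, mod(i-1-j, nNodes)+1) = 1;    % backward neighbors
        end
    end
    [src, dst] = find(W);
    for e = 1:numel(src)                       % rewire each existing link
        if rand < pRewire
            W(src(e), dst(e)) = 0;
            newDst = ceil(rand*nNodes);        % pick a new random target
            while newDst == src(e) || W(src(e), newDst) == 1
                newDst = ceil(rand*nNodes);
            end
            W(src(e), newDst) = 1;
        end
    end
    end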

b) Memory capacity:
To estimate the memory capacity simulate the networks for 1000 time steps.
During each time step binary input (1-dimensional with randomly chosen
bit values) is injected into the first 20 threshold gates of the network. The
input weights are drawn randomly from a Gaussian distribution with mean
zero and a standard deviation of eight. A linear classifier should be trained
with linear regression on the states of the last 20 gates of the network to
output the input bit that was injected ∆t ∈ {1, 2, 3, ..., 50} ms ago. For each
∆t a different classifier should be trained, thereby estimating the memory
capacity of the linear classifier for each network as a function of ∆t.

1. Perform 100 network simulations for each network type and generate
for each simulation a novel input and a novel network.
2. Train for each simulation a linear classifier for each ∆t and determine
the MAE. Calculate over all 100 simulations the average error and the
standard error of the mean (SEM) to verify the significance of the results
(a sketch of the readout training is given after this list).
3. Generate a plot that shows the average MAE and the SEM as a func-
tion of ∆t for all network types. Note that the output of the linear
classifier is 0 or 1. It is obtained by thresholding the output of the
linear regression model at a value of 0.5. Compare and discuss the
results for all networks.
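A hedged sketch of the readout training for one delay ∆t, assuming states is a
1000 × 100 matrix of recorded gate outputs and u the injected bit sequence (both
names are assumptions):

    % Train a linear readout on the last 20 gates to recover the bit injected
    % dt time steps ago, then threshold at 0.5 and compute the MAE.
    idx    = (dt+1):size(states, 1);                    % usable time steps
    Phi    = [states(idx, 81:100), ones(numel(idx),1)]; % last 20 gates + bias
    target = u(idx - dt)';                              % bit from dt steps ago
    wOut   = Phi \ target;                              % least-squares regression
    yHat   = double(Phi*wOut > 0.5);                    % binary classifier output
    mae    = mean(abs(yHat - target));                  % mean absolute error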
c) Robustness:
To estimate the robustness properties train the networks for ∆t = 2 ms, for
which the performance of the linear classifier is highest. After training prune
all weights of P = 0, 2, 4, 6, ..., 20, 30 and 50% of the network gates (but not
the input weights). Simulate the network again and test the performance
of the linear classifier on the pruned network. Calculate the performance
values, thereby estimating the robustness of the performance of the linear
classifier as a function of the number of gates with pruned weights.
1. Perform 100 simulations for each network type and generate for each
simulation a novel input and a novel network.
2. Train for each simulation a linear classifier for each P and determine
the MAE. Calculate over all 100 simulations the average error and the
standard error of the mean (SEM) to verify the significance of the results.
3. Generate a plot that shows the average MAE and the SEM as a function
of P (the % of pruned gates) for all network types. Compare the
difference in performance for P = 0%, P = 10% and P = 50% for each
network type. Which network has the smallest memory loss?
Present your results in a clear, structured and legible way. Document them in
such a way that anybody can reproduce them effortlessly.

7 RL theory I [3 P]
Prove Corollary 1.1 (p. 7) from the script Theory of Reinforcement Learning
(http://www.igi.tugraz.at/lehre/MLB/WS06/MDP_Theory.pdf):

For every policy π there exists a deterministic policy π′ such that π′ ≥ π. As a
special case: if there exists a stochastic optimal policy π, then there also exists
a deterministic optimal policy π′ such that π′ ≥ π.

8 RL theory II [3 P]
Prove Corollary 1.3 (p. 9) from the script Theory of Reinforcement Learning
(http://www.igi.tugraz.at/lehre/MLB/WS06/MDP_Theory.pdf):

Every policy π for which V π satisfies the Bellman optimality equations

V π (s) = max_{a ∈ A_s} Q π (s, a)    ∀s ∈ S

is optimal.
9 RL game [3* P]
Consider the following game: you have a random number generator that produces
in every round an integer from 1 to 3 with equal probability. You play 3 rounds
and have to decide at which position of a 3-digit number you want to place the
random digit. Your goal is to form the largest possible (decimal) number.
Formulate this game as a Markov decision process and find an optimal policy.
Also analyze the case where the numbers are drawn without replacement, i.e. if
the digit 3 appears in the first round, it cannot appear anymore in the remaining
two rounds.
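For the with-replacement case, a minimal backward-induction sketch (all names
are hypothetical); the state is the set of still-open digit positions and the value
is the expected final number:

    % Backward induction over bitmask states; bit p of mask = 1 means the
    % position with weight weights(p) is still open.
    weights = [100 10 1];                       % hundreds, tens, units
    nd = 3;                                     % digits 1..3, equally likely
    V = zeros(8, 1);                            % V(mask+1): expected value-to-go
    for mask = 1:7                              % masks with at least one open slot
        open = find(bitget(mask, 1:3));
        v = 0;
        for d = 1:nd                            % average over the random digit
            best = -inf;
            for p = open                        % place d in the best open position
                best = max(best, d*weights(p) + V(bitset(mask, p, 0) + 1));
            end
            v = v + best/nd;
        end
        V(mask + 1) = v;
    end
    fprintf('Expected optimal value (with replacement): %.2f\n', V(8));

The argmax positions along this recursion define an optimal policy; the without-
replacement variant additionally needs the set of remaining digits in the state.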

10 On- and off-policy learning [5 P]


Download the Reinforcement Learning (RL) Toolbox
(www.igi.tugraz.at/ril-toolbox/general/overview.html) and the example files
(http://www.igi.tugraz.at/lehre/MLB/WS08/intern/examples.zip).
See ToolboxTutorial.pdf for an RL toolbox tutorial. A similar example can be
found in the folder cliffworld to help you get started. Consider the grid-
world shown in Figure 2. Implement this environment with the RL Toolbox as
an undiscounted (γ = 1), episodic task with a start state at S and a goal state at
G. The actions move the agent up, down, left and right, unless it bumps into a
wall, in which case the position is not changed. The reward is −1 on all normal
transitions, −10 for bumping into a wall, and 0 at the bonus state marked with B.

Figure 2: Gridworld with bonus state.

Use Q-Learning and SARSA without eligibility traces to learn policies for this
task. Use ε-greedy action selection with a constant ε = 0.1. Measure and plot the
online performance of both learning algorithms (i.e. average reward per episode),
and also sketch the policies that the algorithms find. Explain any differences in
the performance of the algorithms. Are the learned policies optimal? Try this
exercise again with ε being gradually reduced after every episode and explain
what you find. Submit your code and the gridworld configuration file.
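Independent of the RL Toolbox implementation, the two algorithms differ only
in the bootstrap target of the tabular update, as the following hedged MATLAB
sketch shows (all names are assumptions, not toolbox API):

    function Q = td_update(Q, s, a, r, sNext, aNext, alpha, gamma, qlearning)
    % One tabular TD update on the Q-table. Q-learning (qlearning = true)
    % bootstraps with the greedy action; SARSA uses the action aNext that
    % the epsilon-greedy policy actually selected.
    if qlearning
        target = r + gamma * max(Q(sNext, :));   % off-policy target
    else
        target = r + gamma * Q(sNext, aNext);    % on-policy target
    end
    Q(s, a) = Q(s, a) + alpha * (target - Q(s, a));
    end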

a) (1 point) For fixed ε = 0.1, how large do you have to set the bonus reward at
B such that both algorithms converge to the same policy? Plot the online
performance for both methods.
b) (1 point) How do the online performance and the final policy change if
you use eligibility traces for both algorithms? Try different values for λ.

11 Function approximation [3* P]


Implement the Mountain Car example from the Sutton and Barto book (Example
8.2, http://www.cs.ualberta.ca/%7Esutton/book/ebook/node89.html). A similar
task (swinging up a pendulum with V-function learning using RBFs) can be
found in the folder mountaincar to help you get started. The mountain car
model is already implemented (see cmountaincarmodel.cpp). This example is
more advanced than the previous one and it might be necessary for you to get
more familiar with the RL toolbox (see Manual.pdf). Learn to reach the goal on
top of the hill with the SARSA(λ) algorithm and linear function approximation.
Use the following learning parameters: λ = 0.9, ε = 0, α = 0.1. Initialize the
action values to zero (optimistic initialization) to ensure exploration. Measure
the steps needed to reach the goal to evaluate the success of your learning
algorithm.
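The core of one SARSA(λ) update with linear function approximation, as a
hedged sketch (the feature vectors and all names are assumptions, not the RL
Toolbox API):

    function [theta, e] = sarsa_lambda_step(theta, e, phi, phiNext, r, alpha, gamma, lambda)
    % One SARSA(lambda) step: phi and phiNext are the feature vectors of the
    % current and next state-action pair, theta the weights, e the traces.
    delta = r + gamma*(theta'*phiNext) - theta'*phi;  % TD error
    e     = gamma*lambda*e + phi;                     % accumulating eligibility trace
    theta = theta + alpha*delta*e;                    % linear parameter update
    end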
a) Use 5 grid-tilings of size 9 × 9 to discretize the state space. Show in a plot
how the number of steps needed to reach the goal evolves during learning.
b) Use RBF function approximation with 30 evenly spaced RBF centers in
each dimension (i.e. 900 centers in total). Set the widths in every dimension
such that one RBF roughly spans 1-2 tiles (a sketch of such a feature
construction is given after this list).
c) Submit the code of your model and the learning algorithms.
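A possible construction of the RBF features for the two-dimensional mountain-
car state (position, velocity); the state ranges follow Sutton and Barto, and the
width heuristic is an assumption:

    % 30 x 30 grid of Gaussian RBF centers over the mountain-car state space.
    nC   = 30;
    posC = linspace(-1.2, 0.5, nC);                 % position range
    velC = linspace(-0.07, 0.07, nC);               % velocity range
    [P, V] = meshgrid(posC, velC);
    centers = [P(:) V(:)];                          % 900 centers in total
    sigma = [posC(2)-posC(1), velC(2)-velC(1)];     % width ~ one grid cell
    s = [-0.5 0.0];                                 % example state (pos, vel)
    d = bsxfun(@rdivide, bsxfun(@minus, centers, s), sigma);
    phi = exp(-0.5 * sum(d.^2, 2));                 % 900 RBF activations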

12 Own ideas for learning algorithms [2* P]


Come up with your own innovative ideas and design or propose a learning algo-
rithm based on principles or methods that you heard about during this course.
Define which type of task should be learned (supervised, unsupervised, prediction,
robustness, etc.), which type of implementation is used (like gene regulatory net-
works or other types of networks to transmit information) and what type of
learning algorithm is applied (heuristics are also fine). Good luck!
