Data Mining
• I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, San Francisco, 2000.
Lecturer:
• Peter Lucas
[Figure: diagram labelled "Data" and "Knowledge/Patterns", with a graph over the nodes E1, I1, C, I2 and E2]
Knowledge/Patterns:
• if I1 and I2 then C1
• if (I3 or I4) and not I2 then C2
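The two example rules can be read as a small rule-based classifier. The sketch below is only an illustration: the function name `classify` and the choice to return `None` when no rule fires are assumptions, not part of the slides.

```python
def classify(i1, i2, i3=False, i4=False):
    """Apply the two example rules to boolean indicators I1..I4.

    Returns "C1", "C2", or None when neither rule fires
    (the None fallback is an assumption of this sketch).
    """
    # Rule 1: if I1 and I2 then C1
    if i1 and i2:
        return "C1"
    # Rule 2: if (I3 or I4) and not I2 then C2
    if (i3 or i4) and not i2:
        return "C2"
    return None


print(classify(True, True))             # rule 1 fires
print(classify(False, False, i3=True))  # rule 2 fires
print(classify(True, False))            # no rule fires
```

Rule 1 requires I2 and rule 2 requires not I2, so the two rules can never fire together and their order does not matter here.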
[Figure: plot of Distance (0 to 150) against minutes (0 to 1400)]
[Figure: plot with axes labelled mm Rain and Sunshine/day]
http://www.cs.waikato.ac.nz/ml/weka
WEKA – Preprocessing
WEKA – Classification by decision tree
WEKA – Visualisation
Datasets:
• training data: used for model building
• test data: used for model evaluation
• preferably disjoint datasets
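A minimal sketch of how a dataset might be divided into disjoint training and test parts. The function name `train_test_split`, the 30% test fraction and the fixed seed are assumptions chosen for illustration; WEKA provides its own facilities for this.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records and cut them into two disjoint lists."""
    shuffled = list(records)               # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test records form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(10))
train, test = train_test_split(records)
print("training data:", train)
print("test data:", test)
```

Because both parts are slices of the same shuffled list, no record ends up in both the training data and the test data.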
• Suppose a process is governed by the (unknown) function f(x) = −x + 4
• Training data:
[Figure: training data, y against x (x from 0 to 3, y from 0 to 6), with the curves −0.97*x + 3.9 and 1.17*x**2 − 4.5*x + 5.2]
• Testing data:
[Figure: testing data on the same axes, with the same two curves]
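The comparison of a linear and a quadratic model on training and testing data can be sketched as follows. Only f(x) = −x + 4 comes from the slide; the sample points, the noise level, the seed and all function names are assumptions, so the fitted coefficients will not match the curves in the figures.

```python
import random


def true_f(x):
    # The "unknown" process from the slide: f(x) = -x + 4.
    return -x + 4.0


def make_data(xs, rng, noise_sd=0.5):
    # Noisy observations of the process (the noise level is an assumption).
    return [(x, true_f(x) + rng.gauss(0.0, noise_sd)) for x in xs]


def fit_poly(data, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] of c0 + c1*x + c2*x**2 + ...
    """
    n = degree + 1
    ata = [[sum(x ** (i + j) for x, _ in data) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in data) for i in range(n)]
    # Solve (A^T A) c = A^T y by Gaussian elimination with partial pivoting.
    m = [row + [b] for row, b in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - s) / m[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def mse(coeffs, data):
    # Mean squared error of the model on a dataset.
    return sum((y - predict(coeffs, x)) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train = make_data([0.25, 1.0, 1.75, 2.5], rng)
test = make_data([0.5, 1.25, 2.0, 2.75], rng)

linear = fit_poly(train, 1)
quadratic = fit_poly(train, 2)
for name, model in [("linear", linear), ("quadratic", quadratic)]:
    print(name, "train MSE:", round(mse(model, train), 3),
          "test MSE:", round(mse(model, test), 3))
```

On the training data the quadratic can never fit worse than the line, since the line is a special case of it. Whether it also does better on the test data depends on the sample, which is why a separate test set is needed to judge the models.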
Properties
• E(X) expresses that the values observed for X are governed by a stochastic, uncertain process
Bias-variance decomposition
• T: training dataset
• Y = f(X) is the predictor of the process
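For reference, the standard bias-variance decomposition of the expected squared error at a fixed input x can be written as below. The notation is an assumption, since the slide gives only the symbols T and Y: here g denotes the true process, observations are y = g(x) + ε with zero-mean noise ε of variance σ², and Ŷ_T(x) is the prediction of the model learned from the training dataset T.

```latex
\mathbb{E}_{T,\varepsilon}\!\left[\bigl(y - \hat{Y}_T(x)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}_T[\hat{Y}_T(x)] - g(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_T\!\left[\bigl(\hat{Y}_T(x) - \mathbb{E}_T[\hat{Y}_T(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

The expectation over T averages over the different training datasets that could have been drawn, so the variance term measures how much the learned predictor changes from one training dataset to another.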
Course Outline
Theory:
• Learning classification rules (supervised)
• Bayesian networks, from simple to complex (partially supervised)
• Clustering (unsupervised)
Practice:
• Data-mining software: WEKA
• BayesBuilder
• Practical assessment
Tutorials:
• Exercises