
CS 3244
Machine Learning

Week 2: The Linear Model, Part I


How can machines learn?
Kan Min-Yen

Background Photo credits: Rafiq Mirza, Luan Ahn, Rosmarie Voegtli @ Flickr

Recap
Learning is used when
1. A pattern exists
2. We cannot pin it down mathematically
3. We have data on it

Focus on supervised learning


- Unknown target function $y = f(\mathbf{x})$
- Data set $(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)$
- Learning algorithm picks $g \approx f$ from a hypothesis set $\mathcal{H}$

Example: PLA

Learning an unknown function?


- Impossible. The function can assume any value outside the data we have.
- We return to this in learning theory.

NUS CS3244: Machine Learning

Three learning problems

Credit Analysis

- Approve or deny? CLASSIFICATION, $y = \pm 1$
- Amount of credit: REGRESSION, $y \in \mathbb{R}$
- Probability of default: LOGISTIC REGRESSION (next week), $y \in [0, 1]$

Linear models are the fundamental models


The linear model is the first model to try

The linear signal

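$s = \mathbf{w}^\top \mathbf{x}$

- $\operatorname{sign}(\mathbf{w}^\top \mathbf{x})$ gives $y = \pm 1$
- $\theta(\mathbf{w}^\top \mathbf{x})$ gives $y \in [0, 1]$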
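A minimal sketch of how one signal feeds the different outputs. The weights and input are made up, and the squashing function is assumed to be the standard logistic function, which the slide itself does not define.

```python
import numpy as np

def signal(w, x):
    return float(np.dot(w, x))                    # s = w^T x

def classify(w, x):
    return 1 if signal(w, x) > 0 else -1          # sign(s); ties go to -1 by assumption

def logistic(w, x):
    return 1.0 / (1.0 + np.exp(-signal(w, x)))    # assumed logistic squashing into [0, 1]

w = np.array([-1.0, 0.5, 2.0])   # made-up weights (w0 is the bias weight)
x = np.array([1.0, 2.0, 0.5])    # x0 = 1, followed by two made-up features
s = signal(w, x)
label = classify(w, x)
prob = logistic(w, x)
```

The same number `s` is reused by every model; only the final step applied to it changes.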


Outline

Three learning problems
>> Input representation
>> Linear classification
Linear regression
Nonlinear transformation
Error Measures
Noisy Targets


A real dataset

Each digit is a 16x16 pixel gray-intensity image.


[-1 -1 -1 -1 -1 -1 -1 -0.63 0.86 -0.17 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -0.99 0.3 1 0.31 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -0.41 1 0.99 -0.57 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -0.68 0.83 1 0.56 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -0.94 0.54 1 0.78 -0.72 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.1 1 0.92 -0.44 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -0.26 0.95 1 -0.16 -1 -1 -1 -0.99 -0.71 -0.83 -1 -1 -1 -1 -1 -0.8
0.91 1 0.3 -0.96 -1 -1 -0.55 0.49 1 0.88 0.09 -1 -1 -1 -1 0.28 1 0.88 -0.8 -1 -0.9 0.14 0.97 1 1 1 0.99 -0.74 -1 -1 -0.95 0.84 1 0.32 -1 -1 0.35 1 0.65 -0.10 -0.18 1 0.98 -0.72 -1 -1 -0.63 1 1
0.07 -0.92 0.11 0.96 0.30 -0.88 -1 -0.07 1 0.64 -0.99 -1 -1 -0.67 1 1 0.75 0.34 1 0.70 -0.94 -1 -1 0.54 1 0.02 -1 -1 -1 -0.90 0.79 1 1 1 1 0.53 0.18 0.81 0.83 0.97 0.86 -0.63 -1 -1 -1 -1 -0.45
0.82 1 1 1 1 1 1 1 1 0.13 -1 -1 -1 -1 -1 -1 -0.48 0.81 1 1 1 1 1 1 0.21 -0.94 -1 -1 -1 -1 -1 -1 -1 -0.97 -0.42 0.30 0.82 1 0.48 -0.47 -0.99 -1 -1 -1 -1]


Input representation
Raw input $\mathbf{x} = (x_0, x_1, x_2, x_3, x_4, \dots, x_{256})$
Linear model: $(w_0, w_1, w_2, \dots, w_{256})$
Too many (257) parameters!

Features: extract useful information, e.g.,
intensity and symmetry: $\mathbf{x} = (x_0, x_1, x_2)$
Linear model: $(w_0, w_1, w_2)$
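A rough sketch of such a feature extractor. The slides name the two features but do not give formulas, so the definitions below (mean gray level, and the negated mean absolute difference between the image and its left-right mirror) are assumptions, as is the convention $x_0 = 1$.

```python
import numpy as np

def extract_features(pixels):
    """Map a flat 256-vector of gray levels in [-1, 1] to (x0, x1, x2)."""
    img = np.asarray(pixels, dtype=float).reshape(16, 16)
    intensity = img.mean()                             # x1: average gray level
    # x2: assumed symmetry score, 0 for a perfectly left-right symmetric image
    symmetry = -np.abs(img - np.fliplr(img)).mean()
    return np.array([1.0, intensity, symmetry])        # x0 = 1 is the bias coordinate

# A centered vertical bar is symmetric; a diagonal stroke is not.
bar = -np.ones((16, 16))
bar[:, 7:9] = 1.0
diag = -np.ones((16, 16))
diag[np.arange(16), np.arange(16)] = 1.0
x_bar = extract_features(bar.ravel())
x_diag = extract_features(diag.ravel())
```

Each 256-pixel image collapses to three numbers, so the linear model needs only three weights.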


Illustration of features
$\mathbf{x} = (x_0, x_1, x_2)$

$x_1$ = intensity

$x_2$ = symmetry

Quick Question: which axis is which?


Iterations of PLA
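One iteration of PLA:

$$\mathbf{w} \leftarrow \mathbf{w} + y\,\mathbf{x}$$

where $(\mathbf{x}, y)$ is a misclassified training point.

At iteration $t = 1, 2, 3, \dots$, pick a misclassified point from
$(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_N, y_N)$
and run a PLA iteration on it.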
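The loop above can be sketched as follows. The rule for choosing among misclassified points (here: the first one found), the zero initial weights, and the toy data are assumptions made for illustration.

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron learning; X has a leading column of ones, y is in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for t in range(max_iters):
        misclassified = np.flatnonzero(np.sign(X @ w) != y)
        if misclassified.size == 0:
            return w, t              # converged: no training errors remain
        n = misclassified[0]         # assumed picking rule: first misclassified point
        w = w + y[n] * X[n]          # the PLA update
    return w, max_iters

# Tiny linearly separable toy set (made up for illustration).
X = np.array([[1.0, 2.0, 1.0], [1.0, 1.0, 3.0], [1.0, -1.0, -2.0], [1.0, -2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, iters = pla(X, y)
```

On separable data the loop stops on its own; `max_iters` only guards against data that is not separable.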

What PLA does


[Figures: (1) evolution of $E_{\text{in}}$ and $E_{\text{out}}$, error vs. iterations; (2) final perceptron boundary, symmetry vs. average intensity]


The pocket algorithm


Run PLA
- At each step keep the best $E_{\text{in}}$ (and $\mathbf{w}$) so far
(it's not rocket science, but it works!)

[Figures: PLA (for comparison) vs. Pocket. For each: error (log scale) vs. iterations, and a plot of symmetry vs. average intensity]
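A minimal sketch of the pocket idea: run ordinary PLA updates, but keep a copy of the best weights seen so far. The random choice of misclassified point, the iteration budget, and the toy data (with one deliberately mislabeled point so that the set is not separable) are assumptions.

```python
import numpy as np

def pocket(X, y, max_iters=200, seed=0):
    """PLA updates, remembering the weights with the lowest E_in seen so far."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), np.mean(np.sign(X @ w) != y)
    for _ in range(max_iters):
        wrong = np.flatnonzero(np.sign(X @ w) != y)
        if wrong.size == 0:
            break
        n = rng.choice(wrong)                 # assumed rule: random misclassified point
        w = w + y[n] * X[n]                   # ordinary PLA step
        err = np.mean(np.sign(X @ w) != y)
        if err < best_err:                    # pocket only strict improvements
            best_w, best_err = w.copy(), err
    return best_w, best_err

# The last point sits between the two positive points but is labeled -1.
X = np.array([[1.0, 2.0, 1.0], [1.0, 1.0, 3.0], [1.0, -1.0, -2.0],
              [1.0, -2.0, -0.5], [1.0, 1.5, 2.0]])
y = np.array([1, 1, -1, -1, -1])
best_w, best_err = pocket(X, y)
```

Plain PLA would keep cycling on this data; the pocketed weights never get worse as iterations continue.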

Outline

Three learning problems
Input representation
Linear classification
>> Linear regression (regression = real-valued output)
Nonlinear transformation
Error Measures
Noisy Targets


Credit Approval
How much credit do we extend this person?

Classification: Approve/Deny
Regression: Credit Line (dollar amount)
Input:

Criterion                     Value
Age                           32 years
Gender                        Male
Salary                        40 K
Debt                          26 K
Years in Job                  1 year
Years at Current Residence    3 years

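Linear regression output: $h(\mathbf{x}) = \sum_{i=0}^{d} w_i x_i = \mathbf{w}^\top \mathbf{x}$

Data set: Credit officers decide on credit lines
(historical data): $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_N, y_N)$
$y_n \in \mathbb{R}$ is the credit line for customer $\mathbf{x}_n$; regression tries to replicate this.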


Artwork credits: Project DebtRelief

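Real-valued error measurement

How well does $h(\mathbf{x}) = \mathbf{w}^\top \mathbf{x}$ approximate $f(\mathbf{x})$?
In linear regression, we use squared error: $(h(\mathbf{x}) - f(\mathbf{x}))^2$ (how bad is bad?)
In-sample error (an average over the data set):

$$E_{\text{in}}(h) = \frac{1}{N} \sum_{n=1}^{N} \left(h(\mathbf{x}_n) - y_n\right)^2$$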
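As a small worked example, the in-sample squared error can be computed directly from its definition. The three data points and both candidate weight vectors are made up.

```python
import numpy as np

def squared_error_in_sample(w, X, y):
    """E_in(w): average over n of (w^T x_n - y_n)^2."""
    residuals = X @ w - y
    return float(np.mean(residuals ** 2))

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # first column is x0 = 1
y = np.array([1.0, 3.0, 5.0])
e_perfect = squared_error_in_sample(np.array([1.0, 2.0]), X, y)  # fits y = 1 + 2x exactly
e_flat = squared_error_in_sample(np.array([3.0, 0.0]), X, y)     # constant guess of 3
```

The constant guess misses by 2, 0 and -2, so its error is $(4 + 0 + 4)/3$.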

Illustration of linear regression


The expression for Ein


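$$E_{\text{in}}(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^{N} \left(\mathbf{w}^\top \mathbf{x}_n - y_n\right)^2 = \frac{1}{N} \lVert X\mathbf{w} - \mathbf{y} \rVert^2$$

where

$$X = \begin{bmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \mathbf{x}_3^\top \\ \vdots \\ \mathbf{x}_N^\top \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{bmatrix}$$

Quick Question:
What are the dimensions of X?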

Minimizing Ein
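$$E_{\text{in}}(\mathbf{w}) = \frac{1}{N} \lVert X\mathbf{w} - \mathbf{y} \rVert^2$$

$$\nabla E_{\text{in}}(\mathbf{w}) = \frac{2}{N} X^\top (X\mathbf{w} - \mathbf{y}) = \mathbf{0}$$

$$X^\top X \mathbf{w} = X^\top \mathbf{y}$$

$$\mathbf{w} = X^\dagger \mathbf{y} \quad \text{where } X^\dagger = (X^\top X)^{-1} X^\top$$

$X^\dagger$ is the pseudo-inverse of $X$.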
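One way to sanity-check this derivation is numerically: compare the gradient formula against finite differences, then confirm that the gradient vanishes at the solution of the normal equations. The random data below is made up purely for the check.

```python
import numpy as np

def e_in(w, X, y):
    return np.mean((X @ w - y) ** 2)

def grad_e_in(w, X, y):
    return 2.0 / len(y) * X.T @ (X @ w - y)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = rng.normal(size=20)
w = rng.normal(size=3)

# Central finite differences along each coordinate direction.
eps = 1e-6
numeric = np.array([(e_in(w + eps * e, X, y) - e_in(w - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
analytic = grad_e_in(w, X, y)

w_lin = np.linalg.solve(X.T @ X, X.T @ y)    # solves X^T X w = X^T y
grad_at_solution = grad_e_in(w_lin, X, y)
```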



The pseudo inverse


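$$X^\dagger = (X^\top X)^{-1} X^\top$$

With $X$ of size $N \times (d+1)$, the product $X^\top X$ is $(d+1) \times (d+1)$ and $X^\dagger$ is $(d+1) \times N$.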
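A quick shape check, assuming a made-up data matrix with $N = 50$ and $d = 3$. It also compares the explicit formula with NumPy's `np.linalg.pinv`, which computes the pseudo-inverse via the SVD and agrees with the formula when $X^\top X$ is invertible.

```python
import numpy as np

N, d = 50, 3
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(N), rng.normal(size=(N, d))])   # N x (d+1)

X_dagger = np.linalg.inv(X.T @ X) @ X.T     # the formula from the slide
X_pinv = np.linalg.pinv(X)                  # SVD-based library routine
left_inverse = X_dagger @ X                 # should be the (d+1) x (d+1) identity
```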


Linear regression algorithm


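1. Construct the matrix $X$ (input data matrix) and the vector $\mathbf{y}$ (target vector) from the data set as follows:

$$X = \begin{bmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \mathbf{x}_3^\top \\ \vdots \\ \mathbf{x}_N^\top \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{bmatrix}$$

2. Compute the pseudo-inverse $X^\dagger = (X^\top X)^{-1} X^\top$

3. Return $\mathbf{w} = X^\dagger \mathbf{y}$

One-step learning!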
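The three steps translate almost line for line into NumPy. This is a sketch under assumptions: the function name and toy data are invented, and `np.linalg.pinv` stands in for the explicit inverse formula.

```python
import numpy as np

def linear_regression(inputs, targets):
    """One-step learning: returns w = pinv(X) @ y."""
    inputs = np.asarray(inputs, dtype=float)
    # Step 1: prepend x0 = 1 to every input to form the data matrix X.
    X = np.column_stack([np.ones(len(inputs)), inputs])
    y = np.asarray(targets, dtype=float)
    # Step 2: pseudo-inverse (pinv also copes with a singular X^T X).
    X_dagger = np.linalg.pinv(X)
    # Step 3: the weight vector.
    return X_dagger @ y

# Made-up data lying exactly on y = 1 + 2*x1 - x2.
inputs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
targets = 1 + 2 * inputs[:, 0] - inputs[:, 1]
w = linear_regression(inputs, targets)
```

Because the toy targets are noiseless, the recovered weights match the generating coefficients.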


Linear regression for classification


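Linear regression learns a real-valued function $y = f(\mathbf{x}) \in \mathbb{R}$
Binary-valued functions are also real-valued! $\pm 1 \in \mathbb{R}$
Use linear regression to get $\mathbf{w}$ where $\mathbf{w}^\top \mathbf{x}_n \approx y_n = \pm 1$
In this case, $\operatorname{sign}(\mathbf{w}^\top \mathbf{x}_n)$ is likely to agree with $y_n = \pm 1$
Good initial weights for classification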
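A small sketch of this idea on made-up data: regress on the $\pm 1$ labels, then threshold the real-valued output with `sign`. The two Gaussian blobs are an assumption chosen so the classes are well separated.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two made-up Gaussian blobs with labels -1 and +1.
neg = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(50, 2))
pos = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2))
X = np.column_stack([np.ones(100), np.vstack([neg, pos])])
y = np.concatenate([-np.ones(50), np.ones(50)])

w_lin = np.linalg.pinv(X) @ y            # regression on the +/-1 targets
predictions = np.sign(X @ w_lin)         # threshold the real-valued output
e_in = np.mean(predictions != y)         # binary in-sample error
```

The resulting `w_lin` could then be handed to PLA or the pocket algorithm as a starting point instead of the zero vector.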


Why linear regression doesn't set good weights for classification

What's wrong with this picture?
Hint: think squared error


Outline

Three learning problems
Input representation
Linear classification
Linear regression
>> Nonlinear transformation
Error Measures
Noisy Targets


Linear models are limited


[Figure: two panels, Data and Hypothesis]


Another example
Credit line is affected by years in current residence, but not in a linear way.

Nonlinear features $[[x_i < 1]]$ and $[[x_i > 5]]$ are better.

$[[x]]$ means return 1 if x is true, 0 otherwise

Can we do that with linear models?
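One way to see that the answer is yes: compute the indicator features first, then run ordinary linear regression on them. The credit-line numbers below are invented (a penalty under 1 year, a bonus over 5 years) purely to illustrate.

```python
import numpy as np

def residence_features(years):
    """Map years in residence to (1, [[years < 1]], [[years > 5]])."""
    years = np.asarray(years, dtype=float)
    return np.column_stack([np.ones_like(years),
                            (years < 1).astype(float),
                            (years > 5).astype(float)])

# Made-up credit lines (in thousands) for seven customers.
years = np.array([0.5, 0.2, 2.0, 3.0, 4.0, 6.0, 10.0])
credit = np.array([8.0, 8.0, 10.0, 10.0, 10.0, 13.0, 13.0])
Z = residence_features(years)
w = np.linalg.pinv(Z) @ credit           # still plain linear regression
fitted = Z @ w
```

The model is nonlinear in `years` but linear in the weights, so the same one-step algorithm applies.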


But linear in what?


Linear regression implements

$$\sum_{i=0}^{d} w_i x_i$$

Linear classification implements

$$\operatorname{sign}\left(\sum_{i=0}^{d} w_i x_i\right)$$

The algorithms work because of linearity in the weights; this says nothing about the observed data $\mathbf{x}$.

Transform the data nonlinearly


$$(x_1, x_2, \dots, x_N) \xrightarrow{\Phi} (x_1^2, x_2^2, \dots, x_N^2)$$

Any $\mathbf{x} \xrightarrow{\Phi} \mathbf{z}$ preserves this linearity!
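A minimal sketch of the effect, assuming a made-up target that labels points inside a circle as $+1$. Squaring each coordinate makes the classes linearly separable in the transformed space, while the learning algorithm (here, linear regression followed by `sign`) stays unchanged.

```python
import numpy as np

def phi(X):
    """Assumed transform: square each coordinate and prepend a bias coordinate."""
    return np.column_stack([np.ones(len(X)), X ** 2])

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where((X ** 2).sum(axis=1) < 0.6, 1.0, -1.0)   # made-up target: inside a circle

Z = phi(X)                                 # nonlinear in x ...
w_tilde = np.linalg.pinv(Z) @ y            # ... but still linear in the weights
e_in = np.mean(np.sign(Z @ w_tilde) != y)

# For comparison: the same algorithm on the raw coordinates.
X_raw = np.column_stack([np.ones(len(X)), X])
w_raw = np.linalg.pinv(X_raw) @ y
e_in_raw = np.mean(np.sign(X_raw @ w_raw) != y)
```

To classify a new point, apply `phi` to it first and then take `np.sign` of its dot product with `w_tilde`.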



What transforms to what


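$\mathbf{x} = (x_1, x_2, \dots, x_d) \xrightarrow{\Phi} \mathbf{z} = (z_0, z_1, \dots, z_{\tilde{d}})$

$\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N \xrightarrow{\Phi} \mathbf{z}_1, \mathbf{z}_2, \dots, \mathbf{z}_N$

$y_1, y_2, \dots, y_N \xrightarrow{\Phi}$ ? (unchanged: $y_1, y_2, \dots, y_N$)

No weights in $\mathcal{X}$; the weights live in $\mathcal{Z}$: $\tilde{\mathbf{w}} = (w_0, w_1, \dots, w_{\tilde{d}})$

$g(\mathbf{x}) = \operatorname{sign}(\tilde{\mathbf{w}}^\top \mathbf{z}) = \operatorname{sign}(\tilde{\mathbf{w}}^\top \Phi(\mathbf{x}))$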

Outline

Three learning problems
Input representation
Linear classification
Linear regression
Nonlinear transformation
>> Error Measures
>> Noisy Targets


Recap: the learning diagram


- UNKNOWN TARGET FUNCTION $f$ (ideal credit approval function)
- DATA: $y_n = f(\mathbf{x}_n)$
- TRAINING EXAMPLES $(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)$ (historical records of credit customers)
- LEARNING ALGORITHM
- HYPOTHESIS SET $\mathcal{H}$ (set of candidate functions)
- FINAL HYPOTHESIS $g \approx f$ (final credit approval function)


Error measures
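What does "$h \approx f$" mean?
Need an error measure $E(h, f)$
This is almost always a pointwise definition: $e(h(\mathbf{x}), f(\mathbf{x}))$
Examples we've seen:
- Squared error: $e(h(\mathbf{x}), f(\mathbf{x})) = (h(\mathbf{x}) - f(\mathbf{x}))^2$
- Binary error: $e(h(\mathbf{x}), f(\mathbf{x})) = [[h(\mathbf{x}) \neq f(\mathbf{x})]]$

Which is for classification?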
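Both pointwise measures are one-liners; the sketch below uses made-up outputs to show each in use.

```python
def squared_error(h_x, f_x):
    """Pointwise squared error e(h(x), f(x)) = (h(x) - f(x))^2."""
    return (h_x - f_x) ** 2

def binary_error(h_x, f_x):
    """Pointwise binary error [[h(x) != f(x)]]: 1 on a mistake, 0 otherwise."""
    return 1 if h_x != f_x else 0

regression_miss = squared_error(2.5, 3.0)    # real-valued outputs
classification_hit = binary_error(+1, +1)    # labels in {-1, +1}
classification_miss = binary_error(+1, -1)
```

Squared error grows with the size of the miss, while binary error only records whether a miss happened.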

From pointwise to overall


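Overall error $E(h, f)$ = average of pointwise errors $e(h(\mathbf{x}), f(\mathbf{x}))$

In-sample error:

$$E_{\text{in}}(h) = \frac{1}{N} \sum_{n=1}^{N} e(h(\mathbf{x}_n), f(\mathbf{x}_n))$$

Why not a sum instead of an average?

Out-of-sample error:

$$E_{\text{out}}(h) = \mathbb{E}_{\mathbf{x}}\left[e(h(\mathbf{x}), f(\mathbf{x}))\right]$$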
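A simulation sketch of the two quantities, using a made-up target and hypothesis on one-dimensional inputs drawn uniformly from $[-1, 1]$. In practice the target is unknown and the out-of-sample error cannot be computed this way; here a large fresh sample stands in for the expectation.

```python
import numpy as np

def f(x):            # made-up target, known here only so that we can simulate
    return np.sign(x)

def h(x):            # made-up hypothesis with a slightly wrong threshold
    return np.sign(x - 0.2)

def average_binary_error(x):
    return float(np.mean(h(x) != f(x)))

rng = np.random.default_rng(4)
x_train = rng.uniform(-1, 1, size=20)           # the N training inputs
x_fresh = rng.uniform(-1, 1, size=200_000)      # fresh inputs from the same distribution

e_in = average_binary_error(x_train)            # average over the sample
e_out_estimate = average_binary_error(x_fresh)  # Monte Carlo stand-in for the expectation
```

The two functions disagree only for inputs between 0 and 0.2, which has probability 0.1 under this input distribution, so the estimate should land near 0.1 while `e_in` fluctuates with the small sample.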


The learning diagram with testing data, pointwise error

- UNKNOWN TARGET FUNCTION $f$ (ideal credit approval function)
- DATA: $y_n = f(\mathbf{x}_n)$
- TRAINING EXAMPLES $(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)$ (historical records of credit customers)
- Pointwise comparison: $g(\mathbf{x}) \approx f(\mathbf{x})$
- LEARNING ALGORITHM
- HYPOTHESIS SET $\mathcal{H}$ (set of candidate functions)
- FINAL HYPOTHESIS $g$ (final credit approval function)


Choosing your error measure


Are you sick?
+1 sick
-1 well

Two types of error:
false accept (false positive) or
false reject (false negative)

How should we penalize each type?


During your last final exam before graduation

Are you sick?
+1 sick
-1 well

False reject: get better on your own, or come back to the clinic later. At least you graduate on time.

False accept: take the exam next year. Possibly pay tuition fees. $$$


During SARS
Are you sick?
+1 sick
-1 well

False reject: highly costly! An epidemic ensues!
False accept: requires the inconvenience of quarantine.
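The two scenarios can be captured by giving each error type its own penalty. The penalty values below are hypothetical, chosen only to contrast the exam setting (false accepts hurt more) with the epidemic setting (false rejects hurt far more).

```python
def weighted_error(h_x, f_x, cost_false_accept, cost_false_reject):
    """Pointwise error with separate penalties; +1 = sick, -1 = well."""
    if h_x == f_x:
        return 0.0
    if h_x == +1:                 # said "sick" for a well person: false accept
        return cost_false_accept
    return cost_false_reject      # said "well" for a sick person: false reject

# Hypothetical penalties, for illustration only.
exam = dict(cost_false_accept=10.0, cost_false_reject=1.0)
epidemic = dict(cost_false_accept=1.0, cost_false_reject=1000.0)

truth = [+1, -1, -1, +1]
preds = [-1, -1, +1, +1]          # one false reject, one false accept
exam_cost = sum(weighted_error(h, f, **exam) for h, f in zip(preds, truth)) / len(truth)
epidemic_cost = sum(weighted_error(h, f, **epidemic) for h, f in zip(preds, truth)) / len(truth)
```

The same predictions score very differently under the two measures, which is why the measure should come from the application.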

How you measure matters


Where possible, we should use error measures that fit the task, as specified by the user.
However, this isn't always possible. Then use:
- Plausible measures: squared error $\equiv$ Gaussian noise
- Convenient measures: closed-form solution, convex optimization


The learning diagram with error measure

- UNKNOWN TARGET FUNCTION $f$ (ideal credit approval function)
- DATA: $y_n = f(\mathbf{x}_n)$
- TRAINING EXAMPLES $(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)$ (historical records of credit customers)
- ERROR MEASURE $e(\cdot)$
- Pointwise comparison: $g(\mathbf{x}) \approx f(\mathbf{x})$
- LEARNING ALGORITHM
- HYPOTHESIS SET $\mathcal{H}$ (set of candidate functions)
- FINAL HYPOTHESIS $g$ (final credit approval function)

Quick Question: where does the error measure go?


Noisy targets
The target function isn't always a function:

Criterion                     Value
Age                           32 years
Gender                        Male
Salary                        40 K
Debt                          26 K
Years in Job                  1 year
Years at Current Residence    3 years

Consider two identical customers for loan approval: they could have two different outcomes!
Why? And how do we characterize these sources of noise?

Is misreporting salary also a cause of noisy targets?

Target distribution
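Instead of saying the target is a function, think of it as a distribution: $P(y \mid \mathbf{x})$
Our data $(\mathbf{x}, y)$ is now generated by the joint distribution: $P(\mathbf{x})\,P(y \mid \mathbf{x})$

We'll revisit the likelihood of the data again later.

Noisy target = deterministic target function $f(\mathbf{x}) = \mathbb{E}[y \mid \mathbf{x}]$ + noise $(y - f(\mathbf{x}))$
A deterministic target is just a special case: $P(y \mid \mathbf{x}) = 0$, except for $y = f(\mathbf{x})$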
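A simulation sketch of a noisy target. The deterministic part, the uniform input distribution, and the Gaussian noise with standard deviation 0.5 are all assumptions made for illustration.

```python
import numpy as np

def f(x):
    """Made-up deterministic part of the target: f(x) = E[y | x]."""
    return 2.0 * x + 1.0

rng = np.random.default_rng(5)
N = 100_000
x = rng.uniform(0, 1, size=N)                  # x ~ P(x), assumed uniform
y = f(x) + rng.normal(0.0, 0.5, size=N)        # y ~ P(y | x): f(x) plus zero-mean noise

# Two identical customers (same x) can still receive different outcomes.
x_same = np.full(2, 0.3)
y_same = f(x_same) + rng.normal(0.0, 0.5, size=2)

# Averaging y over inputs close to 0.3 recovers roughly f(0.3) = 1.6.
near = np.abs(x - 0.3) < 0.01
local_mean = y[near].mean()
```

Setting the noise standard deviation to zero would reduce this to the deterministic special case, where every `y` equals `f(x)`.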

The learning diagram including noisy target

- UNKNOWN TARGET FUNCTION $P(y \mid \mathbf{x})$: $f$, plus noise (noisy credit approval function)
- DATA
- TRAINING EXAMPLES $(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)$ (historical records of credit customers)
- ERROR MEASURE $e(\cdot)$
- Pointwise comparison: $g(\mathbf{x}) \approx f(\mathbf{x})$
- LEARNING ALGORITHM
- HYPOTHESIS SET $\mathcal{H}$ (set of candidate functions)
- FINAL HYPOTHESIS $g$ (final credit approval function)


Summary
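Linear models use the signal: $s = \mathbf{w}^\top \mathbf{x}$
- Classification: $h(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^\top \mathbf{x})$
- Regression: $h(\mathbf{x}) = \mathbf{w}^\top \mathbf{x}$

Linear regression algorithm: $\mathbf{w} = (X^\top X)^{-1} X^\top \mathbf{y}$

Error measures
- Application specific, user should specify
- False accepts and rejects may differ in badness

Noisy targets
- $y = f(\mathbf{x})$ becomes $y \sim P(y \mid \mathbf{x})$

Nonlinear transformation
- $\mathbf{w}^\top \mathbf{x}$ is linear in $\mathbf{w}$
- Any $\mathbf{x} \xrightarrow{\Phi} \mathbf{z}$ preserves this linearity
- E.g., $(x_1, x_2) \xrightarrow{\Phi} (x_1^2, x_2^2)$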
