What Is The Most Natural (Non-Autonomous, E.G. Breathing) Thing Done by Human Beings? How Often Does The Average Human Do It?

Quiz
What is the most natural (non-autonomous, e.g. breathing) thing done by human beings?
How often does the average human do it?
Clustering
With your host, the self-appointed King of Clustering, Kai Larsen
Cluster Analysis
Source: http://www.vias.org/science_cartoons/cluster_analysis.html
http://www.abdn.ac.uk/zoologymuseum/images/kingdoms.jpg
Can we use this information?
Writing Skills
English Majors
Business Majors
Salary
Unsupervised Classification
Training Data
case 1: inputs, ?
case 2: inputs, ?
case 3: inputs, ?
case 4: inputs, ?
case 5: inputs, ?
new case
Training Data
case 1: inputs, cluster 1
case 2: inputs, cluster 3
case 3: inputs, cluster 2
case 4: inputs, cluster 1
case 5: inputs, cluster 2
new case
What and Why?

What:
Classification with an unknown target
Number of classes is unknown
Increase between-class distance, decrease within-class distance
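
The goal stated above can be made concrete with a small sketch. It assumes 2-D cases, Euclidean distance and an already-known grouping; the data and the two helper names are hypothetical, chosen only to illustrate that a good clustering has a small within-cluster distance relative to its between-cluster distance.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D cases, already grouped by cluster number.
cases = {
    1: [(1.0, 1.0), (1.5, 1.2), (0.8, 1.4)],
    2: [(5.0, 5.0), (5.5, 4.6), (4.8, 5.3)],
}

def mean_within_distance(clusters):
    """Average distance between pairs of cases that share a cluster."""
    pairs = [dist(p, q) for pts in clusters.values() for p, q in combinations(pts, 2)]
    return sum(pairs) / len(pairs)

def mean_between_distance(clusters):
    """Average distance between pairs of cases in different clusters."""
    pairs = [
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in clusters[a]
        for q in clusters[b]
    ]
    return sum(pairs) / len(pairs)

within = mean_within_distance(cases)
between = mean_between_distance(cases)
print(f"within={within:.2f} between={between:.2f}")
```

For well-separated groups like these, the between-cluster average comes out several times larger than the within-cluster average.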

Why:
Description
For example, segmenting existing customers into groups and associating a distinct profile with each group could help future marketing strategies.
From the Internet: There are three customer types, each of which needs to be sold to very differently: the Financier, the Techie, and the User.
From Kai: There are two kinds of students: those with BI experience and those without.
Caveat:
There is no guarantee that the resulting clusters will be meaningful or useful. You have to consider them carefully.
Two basic types of cluster analysis
K-means (iterative)
Hierarchical (one-shot)
k-means Clustering
Assignment
Reassignment
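
A minimal sketch of the two alternating steps, assuming Euclidean distance and 2-D cases. It reads "Reassignment" as recomputing each cluster center and then assigning the cases again; the function names, sample points and starting centers are illustrative and not taken from the slides.

```python
from math import dist

def assign(points, centers):
    """Assignment step: label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: dist(p, centers[k])) for p in points]

def update_centers(points, labels, centers):
    """Move each center to the mean of the points currently assigned to it."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:  # an empty cluster keeps its previous center
            new_centers.append(old)
            continue
        new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centers

def k_means(points, centers, max_iter=100):
    """Alternate assignment and center updates until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
        centers = update_centers(points, labels, centers)
    return labels, centers

# Hypothetical data: two loose groups and two starting centers.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3), (5.0, 5.0), (5.3, 4.7), (4.8, 5.2)]
labels, centers = k_means(points, centers=[(0.0, 0.0), (6.0, 6.0)])
print(labels)
print(centers)
```

The result depends on the starting centers, which is one reason k-means is usually run several times with different starts.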
Example K-means Clustering

Andromeda Galaxy
Source: www.freewebs.com/bnip1/andromedakmeans.htm
Euclidean Distance
(U2,V2)
(U1,V1)
L2 = ((U1 - U2)^2 + (V1 - V2)^2)^(1/2)
(generally leads to spherical clusters)
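
As a small worked example of the Euclidean (L2) distance between (U1,V1) and (U2,V2), the function below is a sketch; its name and the sample points are illustrative.

```python
from math import sqrt

def euclidean(p, q):
    """L2 distance between points p = (U1, V1) and q = (U2, V2)."""
    (u1, v1), (u2, v2) = p, q
    return sqrt((u1 - u2) ** 2 + (v1 - v2) ** 2)

print(euclidean((0.0, 0.0), (3.0, 4.0)))  # 5.0
```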
Hierarchical
Create a table with all distances between people or cases
We get the following table of differences:

        Red1    Red2    Red3    Red4
Red1            1.12    .5      2.7
Red2    1.12
Red3    .5                      2.24
Red4    2.7             2.24

Now, starting with the shortest distances between dots, we cluster items.
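
A sketch of how such a distance table can be built. The slides do not give the dots' coordinates, so the ones below are hypothetical, picked so that the Euclidean distances land close to the values shown (1.12, .5, 2.7 and 2.24); the helper name is also illustrative.

```python
from math import dist

# Hypothetical coordinates (not from the slides), chosen so the distances
# come out close to the values in the table.
dots = {"Red1": (0.0, 0.0), "Red2": (0.5, 1.0), "Red3": (0.5, 0.0), "Red4": (2.5, 1.0)}

def distance_table(points):
    """Return {(a, b): distance} for every unordered pair of labels."""
    names = list(points)
    return {
        (a, b): dist(points[a], points[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

table = distance_table(dots)
closest = min(table, key=table.get)  # the pair to merge first
for pair, d in sorted(table.items(), key=lambda item: item[1]):
    print(pair, round(d, 2))
print("merge first:", closest)
```

With these coordinates the closest pair is Red1 and Red3, which matches the first merge in the walkthrough.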
Hierarchical
          Red1/3    Red2    Red4
Red1/3              1.03    2.46
Red2      1.03
Red4      2.46

Merged so far: 1/3
Hierarchical
            Red1/2/3    Red4
Red1/2/3                2.28
Red4        2.28

Merged so far: 1/2/3
Hierarchical
Red1/2/3/4

Merged so far: 1/2/3/4
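
The merge sequence above can be sketched as a loop. The slides do not name the rule used to measure distance between merged groups; this sketch assumes the distance between cluster centroids, and it uses hypothetical coordinates (not from the slides) under which that rule gives 1.03 and 2.46 for the second table. The final merge distance comes out close to, but not exactly, the 2.28 shown.

```python
from math import dist

def centroid(members, points):
    """Mean position of the named points."""
    coords = [points[m] for m in members]
    return tuple(sum(c) / len(coords) for c in zip(*coords))

def agglomerate(points):
    """Repeatedly merge the two clusters whose centroids are closest.

    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = [(name,) for name in points]
    merges = []
    while len(clusters) > 1:
        pairs = [(a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        a, b = min(
            pairs,
            key=lambda ab: dist(centroid(ab[0], points), centroid(ab[1], points)),
        )
        d = dist(centroid(a, points), centroid(b, points))
        merges.append((a, b, round(d, 2)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return merges

# Hypothetical coordinates, assumed for illustration only.
dots = {"Red1": (0.0, 0.0), "Red2": (0.5, 1.0), "Red3": (0.5, 0.0), "Red4": (2.5, 1.0)}
merges = agglomerate(dots)
for a, b, d in merges:
    print(a, "+", b, "at", d)
```

The order of the merges, read bottom-up with their distances as heights, is the information a dendrogram displays.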
Result
Manhattan Distance
(U2,V2)
(U1,V1)
L1 = |U1 - U2| + |V1 - V2|
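
A small worked example of the Manhattan (L1) distance between (U1,V1) and (U2,V2); the function name and sample points are illustrative.

```python
def manhattan(p, q):
    """L1 distance between points p = (U1, V1) and q = (U2, V2)."""
    (u1, v1), (u2, v2) = p, q
    return abs(u1 - u2) + abs(v1 - v2)

print(manhattan((0, 0), (3, 4)))  # 7
```

For the same two points the Euclidean distance is 5, so the two measures can rank pairs of dots differently.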
In teams of two
1. Using Manhattan Distance, create a table with all distances between red dots
2. Create a dendrogram
Life is often messy

Tribe Movement
Tribe Creation
How many clusters?
Flow Clustering Example
Source: http://wiki.na-mic.org/Wiki/index.php/Progress_Report:DTI_Clustering
Ancient Chinese Classification of Animals:

"Animals are divided into:
a) those that belong to the Emperor
b) embalmed ones
c) those that are trained
d) suckling pigs
e) mermaids
f) fabulous ones
g) stray dogs
h) those that are included in this classification
i) those that tremble as if they were mad
j) innumerable ones
k) those drawn with a very fine camel's hair brush
l) others
m) those that have just broken a flower vase
n) those that resemble flies from a distance."
from Other Inquisitions: 1937-1952 by Jorge Luis Borges
For the Marketing Buffs

Market Basket Analysis
(a quick intro)
Association Rules

Transactions:
A B C
A C D
B C D
A D E
B C E

Rule          Support      Confidence
A => D        2/5 (.40)    2/3 (.67)
C => A        2/5 (.40)    2/4 (.50)
A => C        2/5 (.40)    2/3 (.67)
B & C => D    1/5 (.20)    1/3 (.33)

Support: probability that two items co-occur
Support of A => D = (# transactions with both A and D) / (all transactions)

Confidence: conditional probability that a transaction contains D, given that it contains A
Confidence of A => D = (# transactions with both A and D) / (# transactions with A)
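
The support and confidence figures can be checked with a short sketch. It assumes the five transactions on the slide read as A B C, A C D, B C D, A D E and B C E; the function names are illustrative.

```python
transactions = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "D", "E"},
    {"B", "C", "E"},
]

def support(lhs, rhs, baskets):
    """Share of all transactions that contain every item on both sides of the rule."""
    both = set(lhs) | set(rhs)
    return sum(both <= basket for basket in baskets) / len(baskets)

def confidence(lhs, rhs, baskets):
    """Share of the transactions containing lhs that also contain rhs."""
    lhs, rhs = set(lhs), set(rhs)
    with_lhs = [basket for basket in baskets if lhs <= basket]
    return sum(rhs <= basket for basket in with_lhs) / len(with_lhs)

# Each item is one letter, so "BC" stands for the item set {B, C}.
for lhs, rhs in [("A", "D"), ("C", "A"), ("A", "C"), ("BC", "D")]:
    s = support(lhs, rhs, transactions)
    c = confidence(lhs, rhs, transactions)
    print(f"{' & '.join(lhs)} => {rhs}: support {s:.2f}, confidence {c:.2f}")
```

Note that A => C and C => A share the same support but differ in confidence, because the denominators (transactions with A versus transactions with C) differ.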
Size of box = transaction counts
Color of link = confidence level of rule
Thickness of link = confidence
Barbie => Candy
1. Put them closer together in the store.
2. Put them far apart in the store.
3. Package candy bars with the dolls.
4. Package Barbie + candy + poorly selling item.
5. Raise the price on one, lower it on the other.
6. Barbie accessories for proofs of purchase.
7. Do not advertise candy and Barbie together.
8. Offer candies in the shape of a Barbie doll.
Conclusions
Clustering provides another way to understand data
Its results need to jibe with human understanding, unless we use the clusters directly for predictive analysis
Market basket analysis is now an industry standard
Let's Submit to Titanic
Create Kaggle Account

Invite team members
Download train and test files from Kaggle
Save files as .xlsx
Import files into SQL Server
Run prediction with multiple models
Figure out which is best based on cross-validation
Use that model to predict
Upload results
Note: gendermodel.csv has submission format example
You need the same column names and number of rows
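
One way to check the last point before uploading is sketched below. It compares only the header row and the row count of a results file against the example file; the function name, the in-memory sample data and the column names in it are assumptions made for illustration.

```python
import csv
import io

def check_submission(example_file, submission_file):
    """Compare header and row count of a submission against the example file.

    Returns a list of problems; an empty list means the shapes match.
    """
    example = list(csv.reader(example_file))
    submission = list(csv.reader(submission_file))
    problems = []
    if example[0] != submission[0]:
        problems.append(f"columns {submission[0]} should be {example[0]}")
    if len(example) != len(submission):
        problems.append(f"{len(submission) - 1} rows, expected {len(example) - 1}")
    return problems

# Tiny in-memory stand-ins; the column names here are illustrative assumptions.
example = io.StringIO("PassengerId,Survived\n892,0\n893,1\n")
mine = io.StringIO("PassengerId,Survived\n892,1\n893,1\n")
print(check_submission(example, mine))  # [] when header and row count match
```

With real files, the two arguments would be file objects opened with `open(path, newline="")`, one for gendermodel.csv and one for the results file (whose name is up to the team).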
