More Clustering

Session Overview
This lecture covers hierarchical clustering and introduces k-means clustering.

Session Activities
Lecture Videos

Lecture 20: More Clustering (00:49:09)

About this Video

Topics covered: Feature vectors, scaling, k-means clustering.

Resources

Lecture code handout (PDF)
Lecture code (PY)
Lecture slides (PDF)
Lecture data files (ZIP) (This ZIP file contains: 3 .txt files.)

Recitation Videos

Recitation 8: Hierarchical and k-means Clustering (00:50:49)

About this Video

Topics covered: Unsupervised learning, k-means clustering, distance metric, cluster merging, centroid, k-means error, holdout set, k value significance, features of k-means clustering, merits and disadvantages of types of clustering.

Check Yourself
How do we use nominal (non-numeric or noncontinuous) categories as features?
Answer: Convert each possible value to a real number.
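One way to picture this conversion is the minimal sketch below. The helper name `encode_nominal` and the sample data are hypothetical and not taken from the lecture code. It assigns each distinct value its index in sorted order, which is only one possible mapping; note that any such mapping implies distances between categories that may not be meaningful.

```python
def encode_nominal(values):
    """Map each distinct nominal value to a real number.

    Assumption: the number is the value's index in sorted order.
    Returns the encoded list and the mapping that was used.
    """
    codes = {v: float(i) for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


# Hypothetical feature column with three possible categories.
diets = ["herbivore", "carnivore", "omnivore", "carnivore"]
encoded, codes = encode_nominal(diets)
print(encoded)
print(codes)
```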

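Why do we need to use scaling (normalization)?
Answer: To control the relative importance of each feature. Without scaling, a feature with a large numeric range dominates the distance computation regardless of how much it should matter.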
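As a rough illustration of scaling, the sketch below rescales each feature column to the range 0 to 1 (min-max scaling). This is one common choice and not necessarily the method used in the lecture code; the function name, the optional `weight` parameter, and the sample numbers are assumptions made for the example.

```python
def min_max_scale(column, weight=1.0):
    """Rescale a list of numbers to [0, weight].

    Assumption: min-max scaling. A weight above 1.0 makes the feature
    count for more in a distance computation, below 1.0 for less.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [weight * (x - lo) / (hi - lo) for x in column]


# Hypothetical features on very different numeric ranges.
incomes = [20000.0, 50000.0, 80000.0]
ages = [20.0, 35.0, 50.0]

scaled_incomes = min_max_scale(incomes)
scaled_ages = min_max_scale(ages)
print(scaled_incomes, scaled_ages)
```

After scaling, both columns span the same range, so neither one dominates a Euclidean distance simply because of its units.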

How does k-means clustering work?
Answer: k points are chosen, randomly or otherwise, to be the initial centroids, and every point is assigned to its nearest centroid. A new centroid is then computed for each cluster (typically the mean of the points assigned to it), and the two steps repeat until the difference between the current set of clusters and the previous set is insignificant.
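The loop described in this answer can be sketched as follows. This is a minimal illustration, not the lecture's implementation: points are plain tuples, initial centroids are sampled at random from the data, and, as a simplifying assumption, "insignificant difference" is taken to mean that the centroids stop changing at all. All function and variable names are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    """Return (centroids, clusters) after iterating assign/update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k data points as initial centroids
    clusters = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assumed stopping rule: no change
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = k_means(points, k=2)
print(sorted(sorted(c) for c in clusters))
```

Because the result depends on the initial centroids, practical use often reruns the procedure with several seeds and keeps the clustering with the lowest total distance to centroids.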

Problem Sets
Problem Set 9: Schedule Optimization (Due)

At an institute of higher education that shall remain nameless, it used to be the case that a human adviser would help each student formulate a list of subjects that would meet the student's objectives. However, because of financial troubles, the Institute has decided to replace human advisers with software. Given the amount of work a student wants to do, the program returns a list of subjects that maximizes the amount of value. The goal of this problem set is to implement optimization algorithms.
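The task described here has the shape of a 0/1 knapsack problem: each subject has a value and a work cost, and the total work must stay within a limit. The sketch below is not the problem set's solution or its required interface. It assumes work amounts are positive integers and uses a standard dynamic-programming table; the function name and the subject data are hypothetical.

```python
def best_schedule(subjects, max_work):
    """Choose subjects maximizing total value within a work budget.

    subjects: dict mapping name -> (value, work), with work a positive int.
    Returns (total_value, list_of_chosen_names).
    """
    # best[w] holds the best (value, names) achievable with at most w work.
    best = [(0, []) for _ in range(max_work + 1)]
    for name, (value, work) in subjects.items():
        # Go from high to low budgets so each subject is taken at most once.
        for w in range(max_work, work - 1, -1):
            candidate = best[w - work][0] + value
            if candidate > best[w][0]:
                best[w] = (candidate, best[w - work][1] + [name])
    return best[max_work]


# Hypothetical subjects: name -> (value, work).
subjects = {"A": (10, 5), "B": (7, 3), "C": (6, 4), "D": (5, 2)}
total_value, chosen = best_schedule(subjects, max_work=9)
print(total_value, sorted(chosen))
```

A greedy strategy (for example, picking by value-to-work ratio) is faster to write but is not guaranteed to find the best schedule, which is why the sketch uses an exact method.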

Instructions (PDF)
Code Files (ZIP) (This ZIP file contains: 2 .py files and 2 .txt files.)
Solutions (ZIP) (This ZIP file contains: 1 .png file and 4 .py files.)

Problem Set 10 (Assigned)

Problem Set 10 is assigned in this session. The instructions and solutions can be found on the session page where it is due, Lecture 22: Using Graphs to Model Problems, Part 2.
