Chapter 11: Unsupervised Learning

K-Means Algorithm

K-means is an unsupervised learning algorithm for finding structure in an unlabelled data set. It is an iterative algorithm:

1. Pick K, where K denotes the number of clusters into which we want our unlabelled data to be divided

2. Randomly initialize K markers known as cluster centroids

3. Run the K-means algorithm

The K-means algorithm is divided into 2 steps. These 2 steps are repeated iteratively until the clusters converge:

1. Cluster assignment: assign each example to the cluster centroid which is closest to it.

2. Move centroid: for each centroid, compute the mean location of all the examples assigned to it, then move the centroid there. Do this for all centroids.

 

Training set = {x1, x2, x3, …, xm}

Each xi is an n-dimensional vector (it no longer contains the feature x0 which is always 1)

So:

Randomly initialize K centroids (µ1, µ2, µ3, …, µK)

Repeat {

for i = 1 to m:

ci := index (from 1 to K) of the centroid closest to xi (closest in terms of distance: the k that minimizes ||xi – µk||², for k = 1 to K)

for k = 1 to K:

µk := average of all examples xi assigned to cluster k (i.e. those with ci = k)

}

(ci denotes the index of the cluster centroid to which example xi has been assigned)

Each µk is also an n-dimensional vector
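The loop above can be sketched in Python with NumPy (a minimal illustration; the function name, default arguments and convergence check are my own, not from the course):

```python
import numpy as np

def k_means(X, K, n_iters=100, seed=0):
    """Plain K-means. X is (m, n); returns centroids (K, n) and assignments c (m,)."""
    rng = np.random.default_rng(seed)
    # Initialize the K centroids to K distinct randomly chosen training examples
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 1, cluster assignment: ci = index of the centroid closest to xi
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (m, K)
        c = dists.argmin(axis=1)
        # Step 2, move centroid: mean of the examples assigned to each cluster
        # (an empty cluster keeps its old position)
        new_mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # centroids stopped moving: converged
            break
        mu = new_mu
    return mu, c
```

On two well-separated blobs, for example, the two centroids end up at the blob means and every blob shares one assignment.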

 

Optimization Objective

The cost function of K-means (also called the distortion) is a function of c1, …, cm and µ1, …, µK, denoted as follows:

J(c1, …, cm, µ1, …, µK) = (1/m) * Σ (i = 1 to m) ||xi – µci||²

and we want to minimize this cost function, i.e. find the specific values of c1, …, cm and µ1, …, µK which produce the smallest cost.

The cluster assignment step minimizes J with respect to c1, …, cm (holding the centroids fixed), since it chooses the closest centroid for each example.

The move centroid step minimizes J with respect to µ1, …, µK (holding the assignments fixed), since the mean of a cluster's examples is the position that minimizes their summed squared distances.

Together, when K-means converges, we should have minimized the cost function J (at least to a local minimum).
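The distortion translates directly into code (an illustrative helper of my own, assuming `X`, assignments `c` and centroids `mu` are NumPy arrays):

```python
import numpy as np

def distortion(X, c, mu):
    """J(c1..cm, mu1..muK) = (1/m) * sum over i of ||xi - mu_{ci}||^2."""
    # mu[c] picks, for each example, the centroid it is assigned to
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))
```

Running it after each of the two K-means steps is a handy debugging check: J should never increase from one iteration to the next.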

 

Random Initialization 

How do we initialize the K cluster centroids while avoiding the local optimum problem?

The recommended method is as follows:

One condition which should be true is K < m

1. Randomly pick K training examples

2. Set µ1, …, µK to these K examples

We might encounter issues where the cluster centroids get stuck in a local optimum, like in the pic.

[figure ml8: K-means runs converging to different local optima]

In order to avoid the local optimum problem, we can randomly initialize the K cluster centroids M times and run K-means M times, then choose the run whose final µ1, …, µK and c1, …, cm give the lowest cost J.
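This multiple-restart recipe can be sketched as follows (a self-contained NumPy sketch; the function name and defaults are my own, and the inner loop is the same two K-means steps as above):

```python
import numpy as np

def k_means_best_of(X, K, M=50, n_iters=100, seed=0):
    """Run K-means M times from random initializations; keep the lowest-J run."""
    rng = np.random.default_rng(seed)
    best_J, best_mu, best_c = np.inf, None, None
    for _ in range(M):
        # Random initialization: K distinct training examples (requires K < m)
        mu = X[rng.choice(len(X), size=K, replace=False)]
        for _ in range(n_iters):
            # cluster assignment, then move centroid (empty clusters stay put)
            c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
            mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                           for k in range(K)])
        # Final assignments and distortion J for this run
        c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
        J = np.mean(np.sum((X - mu[c]) ** 2, axis=1))
        if J < best_J:
            best_J, best_mu, best_c = J, mu, c
    return best_J, best_mu, best_c
```

scikit-learn's `KMeans` implements the same idea through its `n_init` parameter, keeping the run with the lowest inertia.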

Some intuition on when to use multiple random initializations:

If K is small (2–10), running K-means multiple times will likely help to avoid the problem of local optima.

If K is large (100s to 1000s), running K-means multiple times might not be beneficial, as the first random choice of the K initial centroids is unlikely to run into a local optimum situation.

Why? (Below is what I think; I am not too sure either.)

If K is small, each example does not have many centroids to choose from, so the centroid an example is tied to stays more or less fixed while iterating the 2 steps of K-means.

Imagine the green, blue and red centroids in the pic started out somewhat close to each other, or 2 of them close together and one further away. Then after one iteration of K-means, the 3 centroids will very likely move to local optimum locations like those depicted in the bottom right.

This behaviour means that the movement of the cluster centroids in step 2 is minimal, and since they don't move much, it is more likely for the centroids to get stuck in local optimum locations.

On the contrary, if K is large to begin with, the initial locations of the centroids will be more randomly dispersed. Examples that are relatively close to each other will quite likely choose different centroids. This promotes movement of the centroids during each iteration, and therefore the centroids getting stuck in a local optimum is less likely to happen.

 

 
