Chapter 11: Unsupervised Learning

K-Means Algorithm

K-means is an unsupervised learning algorithm for finding structure in an unlabelled data set. It is an iterative algorithm:

1. Pick K, where K denotes the number of clusters into which we want our unlabelled data to be divided

2. Randomly initialize K markers known as cluster centroids

3. Run the K-means algorithm

The K-means algorithm is divided into 2 steps. These 2 steps are repeated iteratively until the clusters converge:

1. Cluster assignment: assign each example to the cluster centroid which is closest to it.

2. Move centroid: for each centroid, compute the mean location of all the examples assigned to it, then move the centroid there. Do this for all centroids.

 

Training set = {x1, x2, x3, …, xm}

Each xi is an n-dimensional vector (it no longer contains the feature x0 which is always 1)

So:

Randomly initialize K centroids (µ1, µ2, µ3, …, µK)

Repeat {

for i = 1 to m:

ci := index (from 1 to K) of the centroid closest to xi (closest in terms of distance: the k that minimizes ||xi – µk||², for k = 1 to K)

for k = 1 to K:

µk := average of all examples xi assigned to cluster k (i.e. those with ci = k)

}

(ci denotes the index of the cluster centroid to which example xi has been assigned)

Each µk is also an n-dimensional vector
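The loop above can be sketched in Python with NumPy (a minimal illustration; the function name, default arguments and convergence check are my own, not from the course):

```python
import numpy as np

def k_means(X, K, n_iters=100, seed=0):
    """Plain K-means. X is (m, n); returns centroids (K, n) and assignments c (m,)."""
    rng = np.random.default_rng(seed)
    # Initialize the K centroids to K distinct randomly chosen training examples
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 1, cluster assignment: ci = index of the centroid closest to xi
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (m, K)
        c = dists.argmin(axis=1)
        # Step 2, move centroid: mean of the examples assigned to each cluster
        # (an empty cluster keeps its old position)
        new_mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # centroids stopped moving: converged
            break
        mu = new_mu
    return mu, c
```

On two well-separated blobs, for example, the two centroids end up at the blob means and every blob shares one assignment.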

 

Optimization Objective

The cost function of K-means (also called the distortion) is a function of c1, …, cm and µ1, …, µK, denoted as follows:

J(c1, …, cm, µ1, …, µK) = (1/m) * Σ (i = 1 to m) ||xi – µci||²

and we want to minimize this cost function, i.e. find the specific values of c1, …, cm and µ1, …, µK which produce the smallest cost.

The cluster assignment step minimizes J with respect to c1, …, cm (holding the centroids fixed), since it chooses the closest centroid for each example.

The move centroid step minimizes J with respect to µ1, …, µK (holding the assignments fixed), since the mean of a cluster's examples is the position that minimizes their summed squared distances.

Together, when K-means converges, we should have minimized the cost function J (at least to a local minimum).
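The distortion translates directly into code (an illustrative helper of my own, assuming `X`, assignments `c` and centroids `mu` are NumPy arrays):

```python
import numpy as np

def distortion(X, c, mu):
    """J(c1..cm, mu1..muK) = (1/m) * sum over i of ||xi - mu_{ci}||^2."""
    # mu[c] picks, for each example, the centroid it is assigned to
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))
```

Running it after each of the two K-means steps is a handy debugging check: J should never increase from one iteration to the next.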

 

Random Initialization 

How do we initialize the K cluster centroids while avoiding the local optimum problem?

The recommended method is as follows:

One condition which should be true is K < m

1. Randomly pick K training examples

2. Set µ1, …, µK to these K examples

We might encounter issues where the cluster centroids get stuck in a local optimum, like in the pic.

[figure ml8: K-means runs converging to different local optima]

In order to avoid the local optimum problem, we can randomly initialize the K cluster centroids M times and run K-means M times, then choose the run whose final µ1, …, µK and c1, …, cm give the lowest cost J.
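This multiple-restart recipe can be sketched as follows (a self-contained NumPy sketch; the function name and defaults are my own, and the inner loop is the same two K-means steps as above):

```python
import numpy as np

def k_means_best_of(X, K, M=50, n_iters=100, seed=0):
    """Run K-means M times from random initializations; keep the lowest-J run."""
    rng = np.random.default_rng(seed)
    best_J, best_mu, best_c = np.inf, None, None
    for _ in range(M):
        # Random initialization: K distinct training examples (requires K < m)
        mu = X[rng.choice(len(X), size=K, replace=False)]
        for _ in range(n_iters):
            # cluster assignment, then move centroid (empty clusters stay put)
            c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
            mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                           for k in range(K)])
        # Final assignments and distortion J for this run
        c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
        J = np.mean(np.sum((X - mu[c]) ** 2, axis=1))
        if J < best_J:
            best_J, best_mu, best_c = J, mu, c
    return best_J, best_mu, best_c
```

scikit-learn's `KMeans` implements the same idea through its `n_init` parameter, keeping the run with the lowest inertia.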

Some intuition on when to use multiple random initializations:

If K is small (2–10), running K-means multiple times will likely help to avoid the problem of local optima.

If K is large (100s to 1000s), running K-means multiple times might not be beneficial, as the first random choice of the K initial centroids is unlikely to run into a local optimum situation.

Why? (Below is what I think; I am not too sure either.)

If K is small, each example does not have many centroids to choose from, so the centroid an example is tied to stays more or less fixed while iterating the 2 steps of K-means.

Imagine the green, blue and red centroids in the pic started out somewhat close to each other, or 2 of them close together and one further away. Then after one iteration of K-means, the 3 centroids will very likely move to local optimum locations like those depicted in the bottom right.

This behaviour means that the movement of the cluster centroids in step 2 is minimal, and since they don't move much, it is more likely for the centroids to get stuck in local optimum locations.

On the contrary, if K is large to begin with, the initial locations of the centroids will be more randomly dispersed. Examples that are relatively close to each other will quite likely choose different centroids. This promotes movement of the centroids during each iteration, and therefore the centroids getting stuck in a local optimum is less likely to happen.

 

 
