Chapter 1: Introduction

Overview

Machine learning (ML) methods are mainly classified into two classes:

  • Supervised Learning
  • Unsupervised Learning

What is Supervised Learning?

Given a training set of past examples, where each example contains an input value and the correct output value, supervised learning methods try to learn a prediction function that we can use to predict the output value for an input that is not in the training set. Supervised learning is used for regression problems (real-valued outputs) and classification problems (discrete outputs).

Example 1:

Suppose we have past data of house sizes and their prices. The input value is the size and the correct output value is the price. Supervised learning learns from this data, finds correlations or patterns, and produces a prediction function that we can later use to predict the price of a house (a real value) from its size.
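To make the idea concrete, here is a minimal Python sketch of one supervised method, ordinary least-squares linear regression, fitted to the housing example. The function names (`fit_line`, `predict_price`) are hypothetical, and the sizes and prices are made-up toy numbers chosen to lie exactly on a line, not real market data.

```python
# Minimal sketch: learn price ≈ w * size + b by ordinary least squares.
# The training data below is hypothetical toy data, not real housing prices.

def fit_line(xs, ys):
    """Return (w, b) minimizing the sum of squared errors of y ≈ w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

# Hypothetical training set: size (input) and price (correct output)
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

w, b = fit_line(sizes, prices)

def predict_price(size):
    """The learned prediction function, usable on sizes not in the training set."""
    return w * size + b

print(predict_price(90))  # → 270.0
```

Real housing data is noisy, so a fitted line would only approximate the training prices; libraries such as scikit-learn offer the same kind of fit through `LinearRegression`.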

Example 2:

Given past data of tumor diagnoses (whether each tumor was malignant or benign) together with tumor sizes, supervised learning gives us a function that can classify a new tumor as malignant (bad) or benign (good), which are discrete values, based on its size.
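As a sketch of classification, the snippet below learns a single size threshold from labelled examples (a so-called decision stump). This is a deliberately simple technique swapped in for illustration; logistic regression is a more common choice. The names `fit_threshold` and `classify`, and the toy sizes and labels, are all hypothetical.

```python
# Minimal sketch of a supervised classifier: a one-feature "decision stump".
# Sizes and labels are hypothetical toy values, not medical data.

def fit_threshold(sizes, labels):
    """Pick the threshold t that minimizes training errors for the rule
    'predict malignant (1) if size > t, else benign (0)'."""
    best_t, best_errors = None, len(sizes) + 1
    # Candidate thresholds: one value below all sizes, then each observed size.
    candidates = [min(sizes) - 1] + sorted(sizes)
    for t in candidates:
        errors = sum((s > t) != bool(y) for s, y in zip(sizes, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

# Hypothetical training set: tumor size and diagnosis (1 = malignant, 0 = benign)
sizes = [0.05, 0.3, 0.8, 1.5, 2.0, 3.1]
labels = [0, 0, 0, 1, 1, 1]

t = fit_threshold(sizes, labels)

def classify(size):
    """The learned function: maps a size to one of two discrete classes."""
    return "malignant" if size > t else "benign"

print(t, classify(0.5), classify(2.5))  # → 0.8 benign malignant
```

The learned function outputs one of two discrete classes rather than a real number, which is what separates this from the housing-price regression in Example 1.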

What is Unsupervised Learning?

Given a data set, unsupervised learning tries to find structure in it and cluster similar data together. The data set is not labelled or tagged: the machine using unsupervised learning is not given the “correct” answers and has to discover the groups of similar data by itself.
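Clustering can be illustrated with k-means, one common unsupervised algorithm (the text does not name a specific algorithm, so this choice is an assumption). The sketch below runs k-means on a made-up list of one-dimensional values with no labels attached; `kmeans_1d` is a hypothetical helper, not a library function.

```python
# Minimal sketch of unsupervised learning: 1-D k-means clustering (assumes k >= 2).
# Only the values are given; there are no "correct" answers attached.

def kmeans_1d(values, k, iterations=100):
    data = sorted(values)
    # Initialise centroids with k evenly spaced values from the sorted data.
    centroids = [data[i * (len(data) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # assignments are stable, so stop
        centroids = new_centroids
    return centroids, clusters

values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]  # hypothetical, unlabelled data
centroids, clusters = kmeans_1d(values, k=2)
print(clusters)  # → [[1.0, 1.2, 0.8], [5.0, 5.3, 4.9]]
```

The algorithm only ever sees the values themselves; the groups it returns come from their distances to each other, not from any correct answers.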

Example of tumor records (of course, a real data set would have many more rows, and I am not sure whether my interpretation of unsupervised learning is correct):

Size (cm²)    Diagnosis
0.05          Good
2             Bad

Suppose we change the word “Size” to “input”, “Diagnosis” to “output”, “Good” to “1” and “Bad” to “0”. The column names no longer tell us what the values mean, but every row still pairs an input with its correct output, so the data is still labelled and suited to supervised learning. To use unsupervised learning, we would give the machine only the sizes, without any diagnosis; it would then try to find structure in them and cluster similar tumors together.

My own method for distinguishing between supervised and unsupervised learning

If I imagine that I am the machine and I am given a data set whose examples come without the correct answers, then the humans are asking me to use unsupervised learning. If each example is tagged with its correct output, then I should use supervised learning.

Last Notes 

Please leave comments and point out any mistakes, misinterpretations or misunderstandings.

I reserve the right to be wrong 🙂
