Chapter 6: Neural Networks (Representation)

General Idea

If our feature set is large, computing the hypothesis becomes very expensive. If we shrink the feature set, we can no longer learn a non-linear hypothesis. Many machine learning problems involve a large feature set whose examples lie in different regions of the input space, where a linear hypothesis will not work well. So we need a large feature set that we can compute efficiently, and neural networks let us do exactly that in order to derive a non-linear hypothesis.

Growth of the Number of Features

Suppose we have n = 100 features, x_1 … x_100. If we include all polynomial terms up to degree 2 (x_i · x_j with 1 ≤ i ≤ j ≤ 100), we get roughly 5000 features. (This is the number of ways to choose 2 items from 100 with repetition allowed: C(101, 2) = 5050.)

If we include polynomial terms up to degree 3 (x_a · x_i · x_j), we get 171,700 features (choosing 3 from 100 with repetition: C(102, 3)).

The number of features grows very quickly: roughly O(n²) for quadratic terms and O(n³) for cubic terms. Imagine a computer vision problem where each training example is just a 50 pixel × 50 pixel image. That already gives 2500 features (each pixel is one feature), and a degree-2 polynomial hypothesis results in roughly 2500²/2 ≈ 3.12 million features.
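
As a quick sanity check on these counts, here is a minimal sketch (the helper num_degree_d_terms is ours; the formula C(n + d − 1, d) counts the distinct degree-d monomials in n variables):

    from math import comb

    def num_degree_d_terms(n, d):
        # number of distinct degree-d monomials in n variables,
        # i.e. multisets of size d drawn from n items: C(n + d - 1, d)
        return comb(n + d - 1, d)

    print(num_degree_d_terms(100, 2))    # 5050    (roughly 5000)
    print(num_degree_d_terms(100, 3))    # 171700
    print(num_degree_d_terms(2500, 2))   # 3126250 (roughly 3.12 million)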

Neural Network Model

[Figure: neural network with 4 input units (green), 5 hidden units (blue), and 1 output unit (red)]

The green circles denote the input features x_1, x_2, x_3, x_4.

The blue and red circles can be treated as neurons.

More formally, each blue circle denotes an "activation" (a) computed from the green inputs, so we have 5 activations. Reading from top to bottom, they are a_1^{(2)}, a_2^{(2)}, a_3^{(2)}, a_4^{(2)}, a_5^{(2)}.

Note: we also have x_0 and a_0^{(2)}, known as the "bias" units. They always output the value 1, but we leave them out of the picture.

The superscript digit in parentheses denotes the layer number: the green circles form layer 1, the blue circles form layer 2, and so on. The subscript denotes the neuron (activation) number within that layer, so a_4^{(2)} denotes the activation of the fourth neuron in layer 2.

The red circle denotes the output, h_Θ(x).

a_1^{(2)} = g(Θ^{(1)}_{10} x_0 + Θ^{(1)}_{11} x_1 + Θ^{(1)}_{12} x_2 + Θ^{(1)}_{13} x_3 + Θ^{(1)}_{14} x_4)

a_2^{(2)} = g(Θ^{(1)}_{20} x_0 + Θ^{(1)}_{21} x_1 + Θ^{(1)}_{22} x_2 + Θ^{(1)}_{23} x_3 + Θ^{(1)}_{24} x_4)

a_3^{(2)} = g(Θ^{(1)}_{30} x_0 + Θ^{(1)}_{31} x_1 + Θ^{(1)}_{32} x_2 + Θ^{(1)}_{33} x_3 + Θ^{(1)}_{34} x_4)

a_4^{(2)} = g(Θ^{(1)}_{40} x_0 + Θ^{(1)}_{41} x_1 + Θ^{(1)}_{42} x_2 + Θ^{(1)}_{43} x_3 + Θ^{(1)}_{44} x_4)

a_5^{(2)} = g(Θ^{(1)}_{50} x_0 + Θ^{(1)}_{51} x_1 + Θ^{(1)}_{52} x_2 + Θ^{(1)}_{53} x_3 + Θ^{(1)}_{54} x_4)
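
Written out as code, the five sums above look like the following non-vectorized sketch (names are illustrative; Theta1 holds Θ^{(1)}, and x already includes the bias x_0 = 1):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def hidden_activations(Theta1, x):
        # Theta1: (5, 5) weight matrix, x: (5,) input vector [1, x1, x2, x3, x4]
        a2 = np.zeros(5)
        for i in range(5):              # one hidden neuron at a time
            z = 0.0
            for j in range(5):          # weighted sum over x0 .. x4
                z += Theta1[i, j] * x[j]
            a2[i] = sigmoid(z)          # a_{i+1}^{(2)}
        return a2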

Θ^{(1)} is a theta (weight) matrix. The superscript (1) denotes that this matrix is associated with layer 1, and by the same logic Θ^{(2)} denotes the theta matrix for layer 2.

If a layer L has j neurons and the next layer L + 1 has n neurons, then the theta matrix associated with layer L has dimension n × (j + 1); the extra column accounts for the bias unit.

So in order to compute the 5 activations above, Θ^{(1)} must be a 5 × (4 + 1) = 5 × 5 matrix.
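
Applying the same rule to the whole network in the figure, a short illustrative check (layer sizes exclude the bias units):

    layer_sizes = [4, 5, 1]   # input, hidden, output units (no bias)
    for L, (s_j, s_next) in enumerate(zip(layer_sizes, layer_sizes[1:]), start=1):
        print(f"Theta{L}: {s_next} x {s_j + 1}")
    # Theta1: 5 x 5
    # Theta2: 1 x 6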

Θ^{(1)}_{52}

The digit 5 denotes the row, and 2 denotes the 3rd column (remember, columns are indexed from 0). The superscript (1) refers to the first theta matrix. Altogether it is the entry in the 5th row and 3rd column of Θ^{(1)}.

Lastly:

h_Θ(x) = g(Θ^{(2)}_{10} a_0^{(2)} + Θ^{(2)}_{11} a_1^{(2)} + Θ^{(2)}_{12} a_2^{(2)} + Θ^{(2)}_{13} a_3^{(2)} + Θ^{(2)}_{14} a_4^{(2)} + Θ^{(2)}_{15} a_5^{(2)})

g is the sigmoid (logistic) function, g(z) = 1 / (1 + e^{-z}).

Vectorized Implementation

z^{(2)} = Θ^{(1)} x, which is a 5-dimensional vector, and a^{(2)} = g(z^{(2)}) with g applied element-wise. After adding the bias unit a_0^{(2)} = 1, a^{(2)} becomes 6-dimensional.

h_Θ(x) = g(Θ^{(2)} a^{(2)})
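
Putting both layers together, a minimal vectorized sketch of the forward pass (numpy; Theta1 (5 × 5) and Theta2 (1 × 6) are assumed to be already-learned weight matrices):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(Theta1, Theta2, x):
        # x: raw input, shape (4,)
        a1 = np.concatenate(([1.0], x))              # prepend bias x0 = 1 -> (5,)
        z2 = Theta1 @ a1                             # (5,)
        a2 = np.concatenate(([1.0], sigmoid(z2)))    # prepend bias a0^(2) = 1 -> (6,)
        z3 = Theta2 @ a2                             # (1,)
        return sigmoid(z3)                           # h_theta(x)

The two matrix products replace the nested loops from the earlier sketch, which is both shorter and much faster.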

Multiclass Classification

Previously, with 4 classes, we had y ∈ {1, 2, 3, 4}.

Now we represent y as a 4-dimensional vector, where y is one of the following 4 vectors:

[1;0;0;0] , [0;1;0;0] , [0;0;1;0] or [0;0;0;1]

Therefore we want h_Θ(x) ≈ y.
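
A small sketch of this one-hot encoding (the helper one_hot is ours; labels are 1-indexed as in the notes):

    import numpy as np

    def one_hot(y, num_classes=4):
        # map a class label y in {1, ..., num_classes} to a unit vector
        v = np.zeros(num_classes)
        v[y - 1] = 1.0
        return v

    print(one_hot(3))   # [0. 0. 1. 0.]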
