In the last post we saw a neural network with only two layers, an input layer and an output layer, which behaves like logistic regression. In this post we will code a neural network with one more layer, a hidden layer.
You will learn how to:
- Implement a 2-class classification neural network with a single hidden layer
- Use units with a non-linear activation function, such as tanh
- Compute the cross entropy loss
- Implement forward and backward propagation
# Package imports
import numpy as np
import matplotlib.pyplot as plt
from testCases_v2 import *
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets

%matplotlib inline

np.random.seed(1)  # set a seed so that the results are consistent
Dataset
First, let's get the dataset you will work on. The following code loads a "flower" 2-class dataset into the variables X and Y.
In [16]:
X, Y = load_planar_dataset()
Visualize the dataset using matplotlib. The data looks like a "flower" with some red (label y=0) and some blue (y=1) points. Your goal is to build a model to fit this data. In other words, we want the classifier to define regions as either red or blue.
In [17]:
plt.scatter(X[0, :], X[1, :], c=Y[0], s=40, cmap=plt.cm.Spectral)

You have:
- a numpy-array (matrix) X that contains your features (x1, x2)
- a numpy-array (vector) Y that contains your labels (red:0, blue:1).
Let's first get a better sense of what our data is like.
Exercise:
How many training examples do you have? In addition, what is the shape of the variables X and Y?
In [19]:
X.shape,Y.shape
Out[19]:
((2, 400), (1, 400))
In [25]:
X.T.shape,Y.T.shape
Out[25]:
((400, 2), (400, 1))
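To answer the exercise directly: with one example per column, the number of training examples m is the second dimension of X. A minimal sketch, using zero-filled stand-in arrays with the same shapes as the output above (X_demo and Y_demo are assumptions for illustration, not the flower data):

```python
import numpy as np

# Stand-in arrays with the same shapes as the flower dataset (not the real data)
X_demo = np.zeros((2, 400))   # 2 features per example, one example per column
Y_demo = np.zeros((1, 400))   # one label per example

n_features = X_demo.shape[0]  # rows of X are the features (x1, x2)
m = X_demo.shape[1]           # columns of X are the training examples

print("features:", n_features, "training examples:", m)
```

The transposed shapes (400, 2) and (400, 1) are the row-per-example layout that sklearn expects.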
Simple Logistic Regression
In [22]:
clf = sklearn.linear_model.LogisticRegressionCV()
Before building a full neural network, let's first see how logistic regression performs on this problem. You can use sklearn's built-in functions to do that. Run the code below to train a logistic regression classifier on the dataset.
In [27]:
clf.fit(X.T,Y.T)
anaconda3\envs\tf\lib\site-packages\sklearn\utils\validation.py:761: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
anaconda3\envs\tf\lib\site-packages\sklearn\model_selection\_split.py:2053: FutureWarning: You should specify a value for 'cv' instead of relying on the default value. The default value will change from 3 to 5 in version 0.22.
  warnings.warn(CV_WARNING, FutureWarning)
Out[27]:
LogisticRegressionCV(Cs=10, class_weight=None, cv='warn', dual=False,
fit_intercept=True, intercept_scaling=1.0, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=None, refit=True, scoring=None, solver='lbfgs',
tol=0.0001, verbose=0)
You can now plot the decision boundary of this model. Run the code below.
In [29]:
# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, Y[0])
plt.title("Logistic Regression")
# Print accuracy
LR_predictions = clf.predict(X.T)
print ('Accuracy of logistic regression: %d ' % float((np.dot(Y, LR_predictions) + np.dot(1 - Y,1 - LR_predictions)) / float(Y.size) * 100) +
'% ' + "(percentage of correctly labelled datapoints)")
Accuracy of logistic regression: 47 % (percentage of correctly labelled datapoints)

Interpretation: The dataset is not linearly separable, so logistic regression doesn't perform well. Hopefully a neural network will do better. Let's try this now!
The general methodology to build a Neural Network is to:
1. Define the neural network structure ( # of input units, # of hidden units, etc).
2. Initialize the model's parameters
3. Loop:
   - Implement forward propagation
   - Compute loss
   - Implement backward propagation to get the gradients
   - Update parameters (gradient descent)
Neural Network model
Exercise:
Define three variables:
- n_x: the size of the input layer
- n_h: the size of the hidden layer (set this to 4)
- n_y: the size of the output layer
In [34]:
def layer_sizes(X, Y):
    n_x = X.shape[0]  # size of the input layer
    n_h = 4           # size of the hidden layer
    n_y = Y.shape[0]  # size of the output layer
    return (n_x, n_h, n_y)
In [35]:
X_assess, Y_assess = layer_sizes_test_case()
(n_x, n_h, n_y) = layer_sizes(X_assess, Y_assess)
print("The size of the input layer is: n_x = " + str(n_x))
print("The size of the hidden layer is: n_h = " + str(n_h))
print("The size of the output layer is: n_y = " + str(n_y))
The size of the input layer is: n_x = 5
The size of the hidden layer is: n_h = 4
The size of the output layer is: n_y = 2
Initialize the model's parameters
Exercise:
Implement the function initialize_parameters()
Instructions: Make sure your parameters' sizes are right.
Refer to the neural network figure above if needed.
You will initialize the weight matrices with random values.
Use: np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
You will initialize the bias vectors as zeros.
Use: np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.
In [97]:
def initialize_parameters(n_x, n_h, n_y):
    np.random.seed(2)
    W1 = np.random.randn(n_h, n_x) * .01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * .01
    b2 = np.zeros((n_y, 1))
    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
In [48]:
n_x, n_h, n_y = initialize_parameters_test_case()
parameters = initialize_parameters(n_x, n_h, n_y)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
W1 = [[-0.00416758 -0.00056267]
 [-0.02136196  0.01640271]
 [-0.01793436 -0.00841747]
 [ 0.00502881 -0.01245288]]
b1 = [[0.]
 [0.]
 [0.]
 [0.]]
W2 = [[-0.01057952 -0.00909008  0.00551454  0.02292208]]
b2 = [[0.]]
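One way to see why the weights are drawn at random rather than set to zero: with all-zero weights and biases, the tanh hidden layer outputs zeros and both weight gradients come out exactly zero, so gradient descent would never move the weights. The sketch below checks this with the same forward and backward formulas implemented later in this post. The data, labels, and the names X_demo and Y_demo are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((2, 5))     # 2 features, 5 made-up examples
Y_demo = np.array([[0, 1, 0, 1, 1]])     # made-up labels
m = X_demo.shape[1]

# All-zero initialization instead of randn * 0.01 (the case being tested)
W1, b1 = np.zeros((4, 2)), np.zeros((4, 1))
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

# Forward pass: tanh(0) = 0 and sigmoid(0) = 0.5
A1 = np.tanh(W1 @ X_demo + b1)
A2 = sigmoid(W2 @ A1 + b2)

# Backward pass for the two weight matrices
dZ2 = A2 - Y_demo
dW2 = dZ2 @ A1.T / m                     # zero because A1 is zero
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)       # zero because W2 is zero
dW1 = dZ1 @ X_demo.T / m

print("dW1 =", dW1)
print("dW2 =", dW2)
```

Small random values avoid this: each hidden unit starts from a different point and receives a different gradient.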
Exercise:
Implement forward_propagation()
In [58]:
def forward_propagation(X, parameters):
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    b1 = parameters["b1"]
    b2 = parameters["b2"]
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    assert(A2.shape == (1, X.shape[1]))
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    return A2, cache
In [59]:
X_assess, parameters = forward_propagation_test_case()
A2, cache = forward_propagation(X_assess, parameters)
# Note: we use the mean here just to make sure that your output matches ours.
print(np.mean(cache['Z1']), np.mean(cache['A1']), np.mean(cache['Z2']), np.mean(cache['A2']))
0.26281864019752443 0.09199904522700109 -1.3076660128732143 0.21287768171914198
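Written out, forward_propagation() computes the following for the whole batch X at once. The bracketed superscripts that mark the layer are notation added here for readability; they do not appear in the code.

```latex
\begin{aligned}
Z^{[1]} &= W^{[1]} X + b^{[1]} \\
A^{[1]} &= \tanh\left(Z^{[1]}\right) \\
Z^{[2]} &= W^{[2]} A^{[1]} + b^{[2]} \\
A^{[2]} &= \sigma\left(Z^{[2]}\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
\end{aligned}
```

Each column of A2 is the predicted probability that the corresponding example has label 1.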
Exercise:
Implement compute_cost() to compute the value of the cost J.
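The cost meant here is the cross-entropy loss listed in the learning goals, averaged over the m training examples. In the formula below, a^{[2](i)} stands for the i-th entry of A2 and y^{(i)} for the i-th label; this notation is introduced here and is not used in the code.

```latex
J = -\frac{1}{m} \sum_{i=1}^{m}
    \left[ y^{(i)} \log a^{[2](i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{[2](i)}\right) \right]
```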
In [67]:
def compute_cost(A2, Y, parameters):
    m = Y.shape[1]
    cost = (-1/m) * np.sum(np.multiply(Y, np.log(A2)) + np.multiply(1 - Y, np.log(1 - A2)))
    cost = np.squeeze(cost)  # makes sure cost is the dimension we expect.
    assert(isinstance(cost, float))
    return cost
In [68]:
A2, Y_assess, parameters = compute_cost_test_case()
print("cost = " + str(compute_cost(A2, Y_assess, parameters)))
cost = 0.6930587610394646
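A quick sanity check on this value: a network that outputs 0.5 for every example has a cost of ln 2 ≈ 0.6931 regardless of the labels, which is why the cost sits near 0.693 when the weights are still close to zero. A minimal sketch with made-up labels, restating the same formula as a standalone function (the name cross_entropy_cost is hypothetical):

```python
import numpy as np

def cross_entropy_cost(A2, Y):
    """Average cross-entropy between predictions A2 and labels Y, both of shape (1, m)."""
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)

Y_demo = np.array([[0, 1, 1, 0, 1]])      # made-up labels
A2_demo = np.full(Y_demo.shape, 0.5)      # an "undecided" prediction for every example

cost = cross_entropy_cost(A2_demo, Y_demo)
print(cost, np.log(2))
```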
Exercise:
Implement the function backward_propagation()
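The gradients follow from the chain rule applied to the cross-entropy cost with a sigmoid output and tanh hidden units. The formulas below restate what the code that follows computes. The bracketed layer superscripts, the column index (i), and the symbol ⊙ for the elementwise product are notation introduced here; the factor 1 - (A^{[1]})^2 is the derivative of tanh.

```latex
\begin{aligned}
dZ^{[2]} &= A^{[2]} - Y \\
dW^{[2]} &= \frac{1}{m} \, dZ^{[2]} \left(A^{[1]}\right)^{T} \\
db^{[2]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)} \\
dZ^{[1]} &= \left(W^{[2]}\right)^{T} dZ^{[2]} \odot \left(1 - \left(A^{[1]}\right)^{2}\right) \\
dW^{[1]} &= \frac{1}{m} \, dZ^{[1]} X^{T} \\
db^{[1]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}
\end{aligned}
```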
In [75]:
def backward_propagation(parameters, cache, X, Y):
    m = Y.shape[1]
    Z1 = cache["Z1"]
    A1 = cache["A1"]
    Z2 = cache["Z2"]
    A2 = cache["A2"]
    dZ2 = A2 - Y
    dW2 = (1/m) * np.dot(dZ2, A1.T)
    db2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.multiply(np.dot(parameters["W2"].T, dZ2), 1 - np.power(A1, 2))
    dW1 = (1/m) * np.dot(dZ1, X.T)
    db1 = (1/m) * np.sum(dZ1, axis=1, keepdims=True)
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    return grads
In [76]:
parameters, cache, X_assess, Y_assess = backward_propagation_test_case()
grads = backward_propagation(parameters, cache, X_assess, Y_assess)
print ("dW1 = "+ str(grads["dW1"]))
print ("db1 = "+ str(grads["db1"]))
print ("dW2 = "+ str(grads["dW2"]))
print ("db2 = "+ str(grads["db2"]))
dW1 = [[ 0.00301023 -0.00747267]
 [ 0.00257968 -0.00641288]
 [-0.00156892  0.003893  ]
 [-0.00652037  0.01618243]]
db1 = [[ 0.00176201]
 [ 0.00150995]
 [-0.00091736]
 [-0.00381422]]
dW2 = [[ 0.00078841  0.01765429 -0.00084166 -0.01022527]]
db2 = [[-0.16655712]]
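If you do not have the test cases at hand, one way to gain confidence in the backward pass is a numerical gradient check: nudge each parameter entry up and down by a small eps, measure the change in cost, and compare with the analytic gradient. The sketch below is self-contained and uses made-up data. The helper names cost_fn, analytic_grads, and numeric_grads are hypothetical, and analytic_grads restates the formulas of backward_propagation() rather than calling it.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_fn(params, X, Y):
    """Forward pass followed by the average cross-entropy cost."""
    A1 = np.tanh(params["W1"] @ X + params["b1"])
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)

def analytic_grads(params, X, Y):
    """Same gradient formulas as backward_propagation in this post."""
    m = Y.shape[1]
    A1 = np.tanh(params["W1"] @ X + params["b1"])
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    dZ2 = A2 - Y
    dZ1 = (params["W2"].T @ dZ2) * (1 - A1 ** 2)
    return {
        "W1": dZ1 @ X.T / m,
        "b1": np.sum(dZ1, axis=1, keepdims=True) / m,
        "W2": dZ2 @ A1.T / m,
        "b2": np.sum(dZ2, axis=1, keepdims=True) / m,
    }

def numeric_grads(params, X, Y, eps=1e-6):
    """Central finite differences, one parameter entry at a time."""
    grads = {}
    for name, value in params.items():
        grad = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            original = value[idx]
            value[idx] = original + eps
            cost_plus = cost_fn(params, X, Y)
            value[idx] = original - eps
            cost_minus = cost_fn(params, X, Y)
            value[idx] = original  # restore the entry before moving on
            grad[idx] = (cost_plus - cost_minus) / (2 * eps)
        grads[name] = grad
    return grads

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((2, 6))         # made-up inputs
Y_demo = np.array([[0, 1, 1, 0, 1, 0]])      # made-up labels
params = {
    "W1": rng.standard_normal((4, 2)) * 0.5,
    "b1": np.zeros((4, 1)),
    "W2": rng.standard_normal((1, 4)) * 0.5,
    "b2": np.zeros((1, 1)),
}

analytic = analytic_grads(params, X_demo, Y_demo)
numeric = numeric_grads(params, X_demo, Y_demo)
max_diff = max(np.max(np.abs(analytic[k] - numeric[k])) for k in params)
print("largest difference between analytic and numeric gradients:", max_diff)
```

A difference many orders of magnitude below the gradient values themselves suggests the formulas agree; a large difference usually points to a sign error or a missing transpose.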
Exercise:
Implement the update rule.
Use gradient descent.
You have to use (dW1, db1, dW2, db2) in order to update (W1, b1, W2, b2).
In [80]:
def update_parameters(parameters, grads, learning_rate = 1.2):
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    b1 = parameters["b1"]
    b2 = parameters["b2"]
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
In [81]:
parameters, grads = update_parameters_test_case()
parameters = update_parameters(parameters, grads)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
W1 = [[-0.00643025  0.01936718]
 [-0.02410458  0.03978052]
 [-0.01653973 -0.02096177]
 [ 0.01046864 -0.05990141]]
b1 = [[-1.02420756e-06]
 [ 1.27373948e-05]
 [ 8.32996807e-07]
 [-3.20136836e-06]]
W2 = [[-0.01041081 -0.04463285  0.01758031  0.04747113]]
b2 = [[0.00010457]]
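The update applied by update_parameters() is one step of plain gradient descent. In the formula below, α is learning_rate (1.2 by default in the function) and the partial derivative is the matching entry of grads (dW1, db1, dW2, db2). The symbol θ, standing for any one of the four parameters, is notation introduced here.

```latex
\theta := \theta - \alpha \, \frac{\partial J}{\partial \theta},
\qquad \theta \in \left\{ W^{[1]},\ b^{[1]},\ W^{[2]},\ b^{[2]} \right\}
```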
Exercise:
Build your neural network model in nn_model()
In [85]:
def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=True):
    np.random.seed(3)
    # Take n_x and n_y from the data, but keep the n_h passed by the caller
    # (layer_sizes() always returns 4 for the hidden layer)
    n_x, _, n_y = layer_sizes(X, Y)
    parameters = initialize_parameters(n_x, n_h, n_y)
    for i in range(num_iterations):
        # Forward propagation
        A2, cache = forward_propagation(X, parameters)
        # Cost
        cost = compute_cost(A2, Y, parameters)
        # Backward propagation
        grads = backward_propagation(parameters, cache, X, Y)
        # Gradient descent update
        parameters = update_parameters(parameters, grads)
        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print ("Cost after iteration %i: %f" % (i, cost))
    return parameters
In [87]:
X_assess, Y_assess = nn_model_test_case()
parameters = nn_model(X_assess, Y_assess, 4, num_iterations=10000, print_cost=True)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
Cost after iteration 0: 0.692739
Cost after iteration 1000: 0.000218
Cost after iteration 2000: 0.000107
Cost after iteration 3000: 0.000071
Cost after iteration 4000: 0.000053
Cost after iteration 5000: 0.000042
Cost after iteration 6000: 0.000035
Cost after iteration 7000: 0.000030
Cost after iteration 8000: 0.000026
Cost after iteration 9000: 0.000023
W1 = [[-0.65848169  1.21866811]
 [-0.76204273  1.39377573]
 [ 0.5792005  -1.10397703]
 [ 0.76773391 -1.41477129]]
b1 = [[ 0.287592  ]
 [ 0.3511264 ]
 [-0.2431246 ]
 [-0.35772805]]
W2 = [[-2.45566237 -3.27042274  2.00784958  3.36773273]]
b2 = [[0.20459656]]
Exercise:
Use your model to predict by building predict(). Use forward propagation to predict results.
In [90]:
def predict(parameters, X):
    A2, cache = forward_propagation(X, parameters)
    predictions = np.round(A2)
    return predictions
In [91]:
parameters, X_assess = predict_test_case()
predictions = predict(parameters, X_assess)
print("predictions mean = " + str(np.mean(predictions)))
predictions mean = 0.6666666666666666
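Rounding the sigmoid output is the same as thresholding it at 0.5. The sketch below compares the two on a few made-up probabilities (A2_demo is an assumption for illustration). Note that np.round rounds exact halves to the nearest even number, so an output of exactly 0.5 is assigned to class 0, just as with the strict comparison A2 > 0.5.

```python
import numpy as np

# Made-up sigmoid outputs, including the boundary value 0.5
A2_demo = np.array([[0.1, 0.49, 0.5, 0.51, 0.9]])

rounded = np.round(A2_demo)                    # what predict() does
thresholded = (A2_demo > 0.5).astype(float)    # explicit 0.5 threshold

print(rounded)
print(thresholded)
```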
It is time to run the model and see how it performs on the planar dataset. Run the following code to test your model with a single hidden layer.
In [93]:
# Build a model with a n_h-dimensional hidden layer
parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)
# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y[0])
plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.288083
Cost after iteration 2000: 0.254385
Cost after iteration 3000: 0.233864
Cost after iteration 4000: 0.226792
Cost after iteration 5000: 0.222644
Cost after iteration 6000: 0.219731
Cost after iteration 7000: 0.217504
Cost after iteration 8000: 0.219504
Cost after iteration 9000: 0.218571

In [94]:
# Print accuracy
predictions = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) + '%')
Accuracy: 90%
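The accuracy expression above counts the positions where label and prediction are both 1, adds the positions where both are 0, and divides by the number of examples. That is the same as the fraction of positions where prediction equals label. A small sketch with made-up labels and predictions (Y_demo and pred_demo are assumptions for illustration):

```python
import numpy as np

Y_demo = np.array([[1, 0, 1, 1, 0, 0, 1, 0]])      # made-up labels
pred_demo = np.array([[1, 0, 0, 1, 0, 1, 1, 0]])   # made-up predictions, 6 of 8 correct

# Formula used in the post: matching 1s plus matching 0s
matches = np.dot(Y_demo, pred_demo.T) + np.dot(1 - Y_demo, 1 - pred_demo.T)
acc_dot = matches.item() / Y_demo.size * 100

# Equivalent: fraction of positions where the prediction equals the label
acc_mean = np.mean(pred_demo == Y_demo) * 100

print(acc_dot, acc_mean)
```

Keep in mind that the %d format in the original print statement truncates the percentage to an integer, so the printed 90% may hide a fractional part.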
References:
Deep Learning Specialization, Coursera: https://www.coursera.org/
planar_utils.py file code below:
In [104]:
import matplotlib.pyplot as plt
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model

def plot_decision_boundary(model, X, y):
    # Set min and max values and give it some padding
    x_min, x_max = X[0, :].min() - 1, X[0, :].max() + 1
    y_min, y_max = X[1, :].min() - 1, X[1, :].max() + 1
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = model(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.ylabel('x2')
    plt.xlabel('x1')
    plt.scatter(X[0, :], X[1, :], c=y, cmap=plt.cm.Spectral)

def sigmoid(x):
    """
    Compute the sigmoid of x
    Arguments:
    x -- A scalar or numpy array of any size.
    Return:
    s -- sigmoid(x)
    """
    s = 1/(1+np.exp(-x))
    return s

def load_planar_dataset():
    np.random.seed(1)
    m = 400 # number of examples
    N = int(m/2) # number of points per class
    D = 2 # dimensionality
    X = np.zeros((m,D)) # data matrix where each row is a single example
    Y = np.zeros((m,1), dtype='uint8') # labels vector (0 for red, 1 for blue)
    a = 4 # maximum ray of the flower
    for j in range(2):
        ix = range(N*j,N*(j+1))
        t = np.linspace(j*3.12,(j+1)*3.12,N) + np.random.randn(N)*0.2 # theta
        r = a*np.sin(4*t) + np.random.randn(N)*0.2 # radius
        X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
        Y[ix] = j
    X = X.T
    Y = Y.T
    return X, Y

def load_extra_datasets():
    N = 200
    noisy_circles = sklearn.datasets.make_circles(n_samples=N, factor=.5, noise=.3)
    noisy_moons = sklearn.datasets.make_moons(n_samples=N, noise=.2)
    blobs = sklearn.datasets.make_blobs(n_samples=N, random_state=5, n_features=2, centers=6)
    gaussian_quantiles = sklearn.datasets.make_gaussian_quantiles(mean=None, cov=0.5, n_samples=N, n_features=2, n_classes=2, shuffle=True, random_state=None)
    no_structure = np.random.rand(N, 2), np.random.rand(N, 2)
    return noisy_circles, noisy_moons, blobs, gaussian_quantiles, no_structure




