import numpy as np
We will build a two-layer network (one hidden layer) with a configurable number of units in each layer.
Ni = 2   # number of input features
Nh = 2   # number of hidden units
No = 3   # number of output classes
sig = 0.01  # scale of the random weight initialization
W1 = sig * np.random.rand(Ni, Nh)  # input-to-hidden weights
b1 = np.zeros((1, Nh))             # hidden biases start at zero
W2 = sig * np.random.rand(Nh, No)  # hidden-to-output weights
b2 = np.zeros((1, No))             # output biases start at zero
The hidden layer uses a ReLU activation: $h = \max(0,\, xW_1 + b_1)$. The output layer uses a softmax activation: $o_i = \frac{e^{\text{scores}_i}}{\sum_j e^{\text{scores}_j}}$, where $\text{scores} = hW_2 + b_2$.
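As a quick standalone illustration (not part of the network code), the softmax of the scores $(1, 2, 3)$ is roughly $(0.09, 0.24, 0.67)$:
# Softmax of a single row of scores, computed with plain numpy
scores = np.array([[1.0, 2.0, 3.0]])
probs = np.exp(scores)
probs /= np.sum(probs, axis=1, keepdims=True)
print(probs)  # approximately [[0.09 0.24 0.67]]; the row sums to 1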
def forward(x):
    # Hidden layer: affine transform followed by ReLU
    h = np.maximum(0, np.dot(x, W1) + b1)
    # Output layer: affine transform followed by softmax
    score = np.dot(h, W2) + b2
    o = np.exp(score)
    o /= np.sum(o, axis=1, keepdims=True)  # normalize each row into probabilities
    return o
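A minimal sanity check, assuming the initialization cell above has been run: every row of the output should be a probability distribution, i.e. sum to 1. The array x_check is just an arbitrary small batch chosen for this check.
x_check = np.array([[1.0, 3.0], [3.0, -1.0]])
o_check = forward(x_check)
print(o_check)
print(np.sum(o_check, axis=1))  # each entry should be 1.0 (up to floating-point error)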
To train the network we need the gradient of the loss with respect to every parameter, so we compute the partial derivatives with backpropagation.
def backward(x, y):
    #### Forward pass (repeated here so the intermediate values are available)
    h = np.maximum(0, np.dot(x, W1) + b1)
    score = np.dot(h, W2) + b2
    o = np.exp(score)
    o /= np.sum(o, axis=1, keepdims=True)
    #### Backward pass
    n_samples = x.shape[0]
    # Gradient of the mean cross-entropy loss w.r.t. the scores: softmax output minus one-hot labels
    dscore = np.copy(o)
    dscore[range(n_samples), y] -= 1
    dscore /= n_samples
    # Gradients of the output-layer parameters
    dW2 = np.dot(h.T, dscore)
    db2 = np.sum(dscore, axis=0, keepdims=True)
    # Backpropagate into the hidden layer; ReLU blocks the gradient where h <= 0
    dh = np.dot(dscore, W2.T)
    dh[h <= 0] = 0
    # Gradients of the input-layer parameters
    dW1 = np.dot(x.T, dh)
    db1 = np.sum(dh, axis=0, keepdims=True)
    return dW1, db1, dW2, db2
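The compact form of dscore follows from combining the softmax with the cross-entropy loss: for a single example with true class $y$, $\frac{\partial}{\partial \text{scores}_j}\left(-\log o_y\right) = o_j - \mathbb{1}[j = y]$, which is exactly the softmax output with 1 subtracted at the true class; dividing by n_samples then averages the gradient over the batch.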
def loss(x, y):
    # Mean cross-entropy: average negative log-probability of the true class
    p = forward(x)
    logloss = -np.log(p[range(x.shape[0]), y])
    return np.sum(logloss) / x.shape[0]
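An optional sketch for gaining confidence in backward: compare its analytic dW1 against a centered finite-difference estimate of the loss. The helper name numerical_grad_W1 and the step size eps are my own choices, not part of the original code.
def numerical_grad_W1(x, y, eps=1e-5):
    # Centered finite differences over each entry of the global W1
    grad = np.zeros_like(W1)
    for i in range(W1.shape[0]):
        for j in range(W1.shape[1]):
            old = W1[i, j]
            W1[i, j] = old + eps
            plus = loss(x, y)
            W1[i, j] = old - eps
            minus = loss(x, y)
            W1[i, j] = old  # restore the original weight
            grad[i, j] = (plus - minus) / (2 * eps)
    return grad
# dW1 returned by backward(x, y) should match numerical_grad_W1(x, y) to several decimal places.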
Example loss
from sklearn.datasets import make_classification
# A tiny hand-made dataset (overwritten below by a generated one)
x = np.array(((1, 3), (3, -1), (6, 9)))
y = np.array((0, 1, 1))
# Generate 20 samples with 2 informative features and 2 classes
x, y = make_classification(n_samples=20, n_features=2, n_informative=2,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           n_clusters_per_class=1)
print(forward(x))
# With sig = 0.01 the scores are near zero, so this initial loss should be close to ln(3) ≈ 1.10
print('Example loss: {}'.format(loss(x, y)))
epochs = 10000
lr = 1e-2  # learning rate
for i in range(epochs):
    # Compute the gradients and take a gradient-descent step
    dW1, db1, dW2, db2 = backward(x, y)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
    if i % 1000 == 0:
        print("Loss at iteration {}: {}".format(i, loss(x, y)))
print(forward(x))
print(y)
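After training, a quick way to read the output is to take the argmax class for each sample and compare it to the labels. This accuracy check is an addition, not part of the original code.
predictions = np.argmax(forward(x), axis=1)  # predicted class = most probable output
accuracy = np.mean(predictions == y)
print('Training accuracy: {}'.format(accuracy))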