Neural Network
in Python
Divanshu Singh
AIM
To understand NN
To be able to code NN in any language of choice
NOTE : I have changed the code to make it simpler and dependency free. The code in the book uses scipy and matplotlib functions.
Things about NN not found in books
NN are very old, older than the first electronic computer.
The first model for a neural network was built in 1943; the first electronic computer was built in 1946.
Why are they famous ?
Because the Universal Approximation Theorem states that a NN can approximate any continuous function, even functions we cannot define explicitly, like whatever our brain computes.
For 50 years NN were not widely used because our computers were not fast enough.
[Diagram: a network with an input layer, a hidden layer, and an output Y]
NAND
Why NAND ? Because if a NN can learn NAND, it can do anything that any boolean-logic circuit can do. NAND is a universal gate.
X | Y | OUTPUT
--|---|-------
0 | 0 |   1
0 | 1 |   1
1 | 0 |   1
1 | 1 |   0
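The universality claim above can be checked in a few lines. This is a small sketch (the helper names `nand`, `not_`, `and_`, `or_` are mine, not from the slides) showing that NOT, AND and OR can all be wired out of NAND alone:

```python
# NAND is a universal gate: every other boolean gate can be built from it.
def nand(a, b):
    return 0 if (a and b) else 1

def not_(a):        # NOT a  =  a NAND a
    return nand(a, a)

def and_(a, b):     # a AND b  =  NOT (a NAND b)
    return nand(nand(a, b), nand(a, b))

def or_(a, b):      # a OR b  =  (NOT a) NAND (NOT b)
    return nand(nand(a, a), nand(b, b))

# reproduce the NAND truth table from the slide
print([nand(a, b) for a, b in [(0,0), (0,1), (1,0), (1,1)]])  # [1, 1, 1, 0]
```

So a network that can learn NAND can, in principle, be composed into any boolean circuit.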
How many nodes in the middle layer ?
No single answer. Generally, it depends on the amount of data that the NN needs to learn. In our case it needs to learn 4 specific cases, so we need at least 4 nodes in the middle layer.
[Diagram: input layer (X, Y), a 4-node hidden layer, output layer]
Hidden Layer
[Diagram: X feeds the 4 hidden nodes through weights w1..w4; Y feeds them through weights v1..v4]
For the input X=0, Y=1 (a row of the NAND table), the four hidden nodes receive:
0.w1 + 1.v1
0.w2 + 1.v2
0.w3 + 1.v3
0.w4 + 1.v4
These four sums are exactly a matrix multiplication:

          | w1 w2 w3 w4 |
[X Y]  .  | v1 v2 v3 v4 |  =  [0.w1 + 1.v1   0.w2 + 1.v2   0.w3 + 1.v3   0.w4 + 1.v4]

General rule 1 :
A layer (the area b/w two sets of nodes) = a matrix of weights, and passing data through it = a matrix multiplication.

General rule 2 :
Note the size of the matrices:
Matrix 1 (the input [X Y]) is 1 x 2
Matrix 2 (the weights) is 2 x 4
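Both rules can be verified with numpy. A minimal sketch (the weight values here are arbitrary placeholders, not from the slides):

```python
import numpy

row = numpy.array([[0, 1]])               # one input row [X Y]: shape 1 x 2
w = numpy.array([[0.1, 0.2, 0.3, 0.4],    # weights from X: w1..w4
                 [0.5, 0.6, 0.7, 0.8]])   # weights from Y: v1..v4

h = numpy.dot(row, w)                     # [1 x 2] . [2 x 4] -> [1 x 4]
print(h.shape)                            # (1, 4)
print(h)                                  # [[0.5 0.6 0.7 0.8]], i.e. 0*wi + 1*vi
```

With X=0 the w-weights drop out and each hidden node receives its v-weight, exactly as in the diagram.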
# Code

import numpy

# input data
x = numpy.array([[0,0],
                 [0,1],
                 [1,0],
                 [1,1]])

# output data
y = numpy.array([[1],
                 [1],
                 [1],
                 [0]])

# weights without bias
w1 = numpy.random.rand(2, 4)
# weights with bias
w1 = numpy.random.rand(2, 4) + 1
# calculating input to hidden layer
h = numpy.dot(x, w1)

# for each input row, h is a 1 x 4 matrix
# (with all 4 rows of x at once, h is 4 x 4)
Who will decide what goes out of the hidden layer ?
# sigmoid function
def nonlin(x):
    return (1/(1 + numpy.exp(-x)))
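A quick sketch of what `nonlin` does (repeated here so it runs standalone): it squashes any number into the open interval (0, 1), with 0 mapping to exactly 0.5.

```python
import numpy

# sigmoid: squashes any real number into (0, 1)
def nonlin(x):
    return (1/(1 + numpy.exp(-x)))

print(nonlin(0))                                 # 0.5
vals = nonlin(numpy.array([-10.0, 0.0, 10.0]))
print(vals)                                      # all values stay inside (0, 1)
```

This bounded, monotonic output is what lets the hidden layer emit a clean "activation level" instead of an unbounded weighted sum.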
# h = numpy.dot(x, w1)
l2 = nonlin(numpy.dot(x, w1))
Shapes, one input row at a time:
input -> hidden : [1 x 2] . [2 x 4] = [1 x 4]
hidden -> output : [1 x 4] . [4 x 1] = [1 x 1]
# code file nn.py

import numpy

x = numpy.array([[0,0],
                 [0,1],
                 [1,0],
                 [1,1]])

y = numpy.array([[1],
                 [1],
                 [1],
                 [0]])

w1 = numpy.random.rand(2, 4) + 1
w2 = numpy.random.rand(4, 1) + 1

def nonlin(x):
    return (1/(1 + numpy.exp(-x)))

# feed forward
l1 = x
l2 = nonlin(numpy.dot(l1, w1))
l3 = nonlin(numpy.dot(l2, w2))

l3_errors = y - l3

print("Total error :")
print(numpy.mean(numpy.abs(l3_errors)))
Back propagation
Send the error back to everyone (blame everyone).
[Diagram: the output ERROR flows back from L3 through L2 to L1]
Blame everyone as per their weight proportion:
the node behind w1 gets the share  w1 / (w1 + w2 + w3 + w4)  of the error,
the node behind w2 gets  w2 / (w1 + w2 + w3 + w4),  and so on.
[Diagram: the output error split back across L2's nodes in proportion to w1..w4]
M^T : transpose of a matrix.
The back propagation matrix is E . W^T
Remember,
the error matrix E was 4 x 1
W2 was 4 x 1
So W2^T will be 1 x 4, and E . W2^T will be 4 x 4: one row per training case, one column per hidden node.
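The shape bookkeeping above can be checked directly. A sketch with random placeholder values:

```python
import numpy

E = numpy.random.rand(4, 1)    # error matrix: 4 x 1, one error per training case
w2 = numpy.random.rand(4, 1)   # hidden-to-output weights: 4 x 1

blame = E.dot(w2.T)            # [4 x 1] . [1 x 4] -> [4 x 4]
print(w2.T.shape)              # (1, 4)
print(blame.shape)             # (4, 4): each hidden node's share of each case's error
```

Multiplying by the transpose is what sends each error backwards along the same weights it came forward on.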
We now know the following:

x = numpy.array([[0,0],
                 [0,1],
                 [1,0],
                 [1,1]])

w1 = numpy.random.rand(2, 4) + 1
w2 = numpy.random.rand(4, 1) + 1

def nonlin(x, deriv=False):
    if deriv==True:
        return (x * (1 - x))
    return (1/(1 + numpy.exp(-x)))

# feed forward
l1 = x
l2 = nonlin(numpy.dot(l1, w1))
l3 = nonlin(numpy.dot(l2, w2))

# back propagation
l3_errors = y - l3
l3_delta = l3_errors * nonlin(l3, deriv=True)

l2_error = l3_delta.dot(w2.T)
l2_delta = l2_error * nonlin(l2, deriv=True)

# weight updates
w2 += l2.T.dot(l3_delta)
w1 += l1.T.dot(l2_delta)
Accuracy ?
77%
How to improve it ?
How do you teach a child to speak ?
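The answer the slides are driving at is repetition: run feed forward and back propagation over the same four cases, again and again. A minimal sketch of the full loop, with two assumptions not from the slides: a zero-centered weight init (instead of the +1 shift, so the sigmoids do not start saturated) and a 10000-iteration count.

```python
import numpy

numpy.random.seed(1)               # reproducible run

x = numpy.array([[0,0],
                 [0,1],
                 [1,0],
                 [1,1]])
y = numpy.array([[1],
                 [1],
                 [1],
                 [0]])

# zero-centered init, an assumption here in place of the slides' rand + 1
w1 = 2*numpy.random.rand(2, 4) - 1
w2 = 2*numpy.random.rand(4, 1) - 1

def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1/(1 + numpy.exp(-x))

for i in range(10000):             # repetition: the same 4 cases, over and over
    # feed forward
    l1 = x
    l2 = nonlin(numpy.dot(l1, w1))
    l3 = nonlin(numpy.dot(l2, w2))
    # back propagation
    l3_errors = y - l3
    l3_delta = l3_errors * nonlin(l3, deriv=True)
    l2_error = l3_delta.dot(w2.T)
    l2_delta = l2_error * nonlin(l2, deriv=True)
    # weight updates
    w2 += l2.T.dot(l3_delta)
    w1 += l1.T.dot(l2_delta)

print(numpy.mean(numpy.abs(y - l3)))   # mean error after training, close to 0
```

Each pass nudges the weights a little; thousands of passes over the same NAND table is what turns a 77%-accurate network into one that gets all four cases right.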