
Convolutional Neural Networks

Abin Roozgard
Presentation layout

 Introduction

 Drawbacks of previous neural networks

 Convolutional neural networks

 LeNet-5

 Comparison

 Disadvantages

 Application
Introduction
Introduction

 Convolutional neural networks are applied in signal processing and image processing.

 They are an improvement over the multilayer perceptron in performance, accuracy, and some degree of invariance to distortions in the input images.
Drawbacks

Behavior of multilayer neural networks

Structure     | Types of Decision Regions
--------------|------------------------------------------------------
Single-layer  | Half plane bounded by a hyperplane
Two-layer     | Convex open or closed regions
Three-layer   | Arbitrary (complexity limited by the number of nodes)

[Figure panels: each structure illustrated on the exclusive-OR problem, on classes with meshed regions, and on the most general region shapes, with classes A and B]
Multi-layer perceptron and image processing

 One or more hidden layers


 Sigmoid activation functions
Drawbacks of previous neural networks

 The number of trainable parameters becomes extremely large.
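
To see how fast this grows, here is a minimal sketch (the hidden-layer width of 100 is an arbitrary illustrative choice, not a figure from the slides):

```python
# Trainable parameters of one fully connected layer on a 32x32 input image.
inputs = 32 * 32           # 1024 input pixels
hidden = 100               # hypothetical hidden-layer width (illustrative)
weights = inputs * hidden  # 102,400 weights
biases = hidden            # 100 biases
print(weights + biases)    # 102,500 trainable parameters in a single layer
```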

Drawbacks of previous neural networks

 Little or no invariance to shifting, scaling, and other forms of distortion
Drawbacks of previous neural networks

 Little or no invariance to shifting, scaling, and other forms of distortion

[Figure: the same input image, shifted left]
Drawbacks of previous neural networks

 A shift of the input by 2 pixels to the left changes 154 input values:
 77 pixels change from black to white
 77 pixels change from white to black
Drawbacks of previous neural networks

 The same holds for scaling and other forms of distortion.
Drawbacks of previous neural networks

 The topology of the input data is completely ignored

 The network works with raw data

[Figure: samples plotted over axes Feature 1 and Feature 2]
Drawbacks of previous neural networks

For a 32 × 32 input image:

 Black-and-white patterns: 2^(32×32) = 2^1024 possible inputs

 Grayscale patterns: 256^(32×32) = 256^1024 possible inputs
Drawbacks of previous neural networks

[Figure: two distorted renderings of the letter A]
Improvement

 A fully connected network of sufficient size can produce outputs that are invariant with respect to such variations, but at the cost of:

 Training time
 Network size
 Free parameters
Convolutional neural network (CNN)
History

Yann LeCun, Professor of Computer Science
The Courant Institute of Mathematical Sciences
New York University
Room 1220, 715 Broadway, New York, NY 10003, USA.
(212) 998-3283, yann@cs.nyu.edu

 In 1995, Yann LeCun and Yoshua Bengio introduced the concept of convolutional neural networks.
About CNNs

 CNNs were neurobiologically motivated by the findings of locally sensitive and orientation-selective nerve cells in the visual cortex.

 They designed a network structure that implicitly extracts relevant features.

 Convolutional neural networks are a special kind of multi-layer neural network.
About CNNs

 A CNN is a feed-forward network that can extract topological properties from an image.

 Like almost every other neural network, they are trained with a version of the back-propagation algorithm.

 Convolutional neural networks are designed to recognize visual patterns directly from pixel images with minimal preprocessing.

 They can recognize patterns with extreme variability (such as handwritten characters).
Classification

 Classical approach:

Input → pre-processing for feature extraction (f1, f2, … fn) → classification → output

 Convolutional neural network:

Input → feature extraction → shift and distortion invariance → classification → output
CNN topology

 C : feature extraction layer (convolution layer), producing feature maps

 S : shift and distortion invariance layer (subsampling layer)
Feature extraction layer or Convolution layer

 Detects the same feature at different positions in the input image.
Feature extraction

Convolve with the kernel below, then apply a threshold:

-1  0  1
-1  0  1
-1  0  1

The same 3 × 3 weight matrix is applied at every position of the input:

w11 w12 w13
w21 w22 w23
w31 w32 w33
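
A minimal sketch of this convolve-and-threshold step in Python (NumPy assumed; the threshold value and the test image are illustrative choices; as is conventional for CNNs, "convolution" is implemented here as cross-correlation):

```python
import numpy as np

# 3x3 vertical-edge kernel from the slide.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

def convolve_threshold(image, kernel, threshold=1.0):
    """Valid 2-D convolution followed by a binary threshold."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return (out > threshold).astype(float)

# Example: a vertical edge in a 5x5 image fires the feature map.
img = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
print(convolve_threshold(img, kernel))
```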

Feature extraction

 Shared weights: all neurons in a feature map share the same weights (but not the biases).

 In this way all neurons detect the same feature at different positions in the input image.

 This reduces the number of free parameters.

[Figure: inputs → convolution layer (C) → subsampling layer (S)]
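
A back-of-the-envelope sketch of the reduction (the 32 × 32 input and 5 × 5 kernel are illustrative assumptions; per the slide, biases are not shared):

```python
# Free parameters: fully connected layer vs. shared-weight feature map.
in_h = in_w = 32              # input image (assumed size)
k = 5                         # assumed 5x5 kernel
out_h = out_w = in_h - k + 1  # 28x28 feature map ("valid" convolution)

fc = (in_h * in_w) * (out_h * out_w) + out_h * out_w  # every unit fully connected
shared = k * k + out_h * out_w  # one shared kernel + per-neuron biases
print(fc)      # 803,600 free parameters
print(shared)  # 809 free parameters
```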

Feature extraction

 If a neuron in the feature map fires, this corresponds to a match with the template.
Subsampling layer

 The subsampling layers reduce the spatial resolution of each feature map.

 By reducing the spatial resolution of the feature map, a certain degree of shift and distortion invariance is achieved.
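
A minimal sketch of subsampling by a factor of 2 using average pooling (LeNet-style subsampling additionally multiplies by a trainable coefficient and adds a bias, omitted here):

```python
import numpy as np

def subsample(feature_map, factor=2):
    """Average-pool non-overlapping factor x factor blocks."""
    h, w = feature_map.shape
    fm = feature_map[:h - h % factor, :w - w % factor]
    return fm.reshape(h // factor, factor,
                      w // factor, factor).mean(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(subsample(fm))  # 2x2 map; each entry averages one 2x2 block
```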

Subsampling layer

 The subsampling layers reduce the spatial resolution of each feature map.
Subsampling layer

 Weight sharing is also applied in the subsampling layers.

 Subsampling reduces the effect of noise and of shifts or distortions.
Up to now …

LeNet-5
LeNet-5

 Introduced by LeCun.

 Takes a raw image of 32 × 32 pixels as input.
LeNet-5

 C1, C3, C5 : convolutional layers

 5 × 5 convolution kernels

 S2, S4 : subsampling layers

 Subsampling by a factor of 2

 F6 : fully connected layer
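
A sketch of how the spatial size evolves through these layers; the feature-map counts (6, 16, 120) come from the standard LeNet-5 description in the literature, not from this slide:

```python
# LeNet-5 spatial sizes: valid 5x5 convolutions and 2x2 subsampling.
size = 32            # input image: 32x32
size = size - 5 + 1  # C1 (6 maps): 28x28
size = size // 2     # S2 (6 maps): 14x14
size = size - 5 + 1  # C3 (16 maps): 10x10
size = size // 2     # S4 (16 maps): 5x5
size = size - 5 + 1  # C5 (120 maps): 1x1
print(size)          # 1 -> C5 output feeds the fully connected F6
```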

LeNet-5

 All the units of the layers up to F6 have a sigmoidal activation function of the type:

y_j = φ(v_j) = A · tanh(S · v_j)
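
In LeCun's LeNet-5 paper the constants are A = 1.7159 and S = 2/3, chosen so that φ(±1) ≈ ±1; a one-function sketch (constants quoted from the literature, not from this slide):

```python
import numpy as np

def scaled_tanh(v, A=1.7159, S=2/3):
    """Sigmoidal activation used up to F6: A * tanh(S * v)."""
    return A * np.tanh(S * v)

print(scaled_tanh(np.array([-1.0, 0.0, 1.0])))  # approx. [-1, 0, 1]
```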

LeNet-5

 The 10 output units compute the squared Euclidean distance between the vector of F6 outputs (F1, …, F84) and a per-class weight vector:

Y_j = Σ_{i=1}^{84} (F_i − W_ij)²,  j = 0, …, 9

[Figure: F6 units F1 … F84 (with +1 bias inputs) connected to output units Y0 … Y9]
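
A sketch of this output computation (shapes follow the slide: 84 F6 outputs, 10 output units; random values stand in for trained weights):

```python
import numpy as np

def output_layer(F, W):
    """Y_j = sum_i (F_i - W_ij)^2 for each of the 10 classes.

    F: (84,) vector of F6 activations; W: (84, 10) weight matrix.
    The predicted digit is the class with the smallest Y_j.
    """
    return ((F[:, None] - W) ** 2).sum(axis=0)

rng = np.random.default_rng(0)
Y = output_layer(rng.standard_normal(84), rng.standard_normal((84, 10)))
print(Y.argmin())  # index of the best-matching class
```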

LeNet-5

 About 187,000 connections.

 About 14,000 trainable weights.
Comparison
Comparison

 Database: MNIST (60,000 handwritten digits)

 Affine distortions: translation, rotation

 Elastic deformations: corresponding to uncontrolled oscillations of the hand muscles

 MLP (this paper): 800 hidden units
Comparison

 “This paper” refers to reference [3] on the references slide.
Disadvantages

Disadvantages
 From a memory and capacity standpoint, the CNN is not much bigger than a regular two-layer network.

 At runtime the convolution operations are computationally expensive and take up about 67% of the time.

 CNNs are about 3x slower than their fully connected equivalents (size-wise).

Disadvantages
 Convolution operation
 4 nested loops (2 loops over the input image & 2 loops over the kernel); see the sketch after this list
 Small kernel sizes make the inner loops very inefficient, as they branch (jump) frequently

 Cache-unfriendly memory access
 Back-propagation requires both row-wise and column-wise access to the input and kernel images
 2-D images are represented in a row-wise-serialized order
 Column-wise access to the data can result in a high rate of cache misses in the memory subsystem
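
A sketch of the four-nested-loop convolution described above (illustrative Python, not the implementation the slide measured):

```python
import numpy as np

def conv2d_naive(image, kernel):
    """2 loops over output positions + 2 loops over the kernel.

    With a small kernel the inner loops do little work per iteration,
    so loop overhead (frequent jumps) dominates; the forward pass walks
    the row-major image row-wise, but back-propagation also needs
    column-wise access, which is cache-unfriendly.
    """
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):              # output rows
        for j in range(ow):          # output columns
            acc = 0.0
            for m in range(kh):      # kernel rows
                for n in range(kw):  # kernel columns
                    acc += image[i + m, j + n] * kernel[m, n]
            out[i, j] = acc
    return out
```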

Application
Application

Mike O'Neill

 A convolutional neural network achieves 99.26% accuracy on a modified NIST database of handwritten digits.

 MNIST database: consists of 60,000 handwritten digits uniformly distributed over 0–9.
References

[1] Y. LeCun and Y. Bengio, “Convolutional networks for images, speech, and time-series,” in M. A. Arbib, editor, The Handbook of Brain Theory and Neural Networks. MIT Press, 1995.

[2] Fabien Lauer, Ching Y. Suen, and Gérard Bloch, “A trainable feature extractor for handwritten digit recognition,” Elsevier, October 2006.

[3] Patrice Y. Simard, Dave Steinkraus, and John Platt, “Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis,” International Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society, Los Alamitos, pp. 958–962, 2003.
Questions
