
Jimbo Project: Handwriting recognition using an Artificial Neural Network

Claudio Martella p.num. 810807-P112 claudio.martella@acaro.org Martin Chlupac p.num. 810928-P272 mail@fredfred.net 1st June 2005
Abstract
This work deals with the recognition of isolated hand-written characters using an artificial neural network. The characters are written on a regular sheet of paper with a special pen that produces biometric signals, which are then analysed by a computer. In this document we describe our research and the tests performed on several MLP and RBF network architectures. For each of these a solution is found and compared to the current one, which is based on K-means.

Introduction
Neural networks offer high generalisation ability and do not require deep background knowledge or problem formalisation. For these reasons, and considering the high dimensionality of the input space, we tried this approach to determine whether a neural network could do better than a distance-vector-based method. The pen consists of two pairs of mechanical sensors that measure the horizontal and vertical movements of the ballpoint nib, and a pressure sensor placed in the top of the pen. The pen thus produces a total of three signals: two correspond to the horizontal and vertical accelerations of the pen, and the remaining one to the pressure sensor. These signals are processed by a computer system.

Description
The signals produced by the pen are filtered, normalised and saved in text files. Each of these signals is divided into 7 overlapping segments. Each segment is analysed and a vector of eight features is extracted. These feature vectors are then saved to SNNS pattern files in different sets. The following picture shows the unfiltered and filtered data in three dimensions.
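The segmentation step can be sketched as follows. This is a minimal illustration, not the project's actual code: the report does not state the overlap ratio, so a 50% overlap is assumed here, and the function name is hypothetical.

```python
def split_overlapping(signal, n_segments=7):
    """Split a 1-D signal into n_segments overlapping windows.

    Assumes 50% overlap between neighbouring windows: with step s and
    window length 2s, the windows exactly cover the signal when
    (n_segments - 1) * s + 2s == len(signal).
    """
    s = len(signal) // (n_segments + 1)   # step between window starts
    w = 2 * s                             # window length (50% overlap)
    return [signal[i * s : i * s + w] for i in range(n_segments)]
```

For an 80-sample signal this yields seven 20-sample windows starting at 0, 10, ..., 60, each sharing half of its samples with the next.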

The feature vector for one segment is (a, q, Δmax, Δmin, Δavg, Δ1/4, Δ1/2, Δ3/4), where:

a ... approximated line coefficient a,
q ... approximated line coefficient q,
Δmax ... maximum difference between original and approximated value,
Δmin ... minimum difference between original and approximated value,
Δavg ... average difference between original and approximated value,
Δ1/4 ... difference between original and approximated value in 1/4 of segment,
Δ1/2 ... difference between original and approximated value in 1/2 of segment,
Δ3/4 ... difference between original and approximated value in 3/4 of segment.
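The eight features can be computed along these lines. This is a sketch under assumptions: the report only says the line is "approximated", so a least-squares fit is used here, and the function name is our own.

```python
import numpy as np

def segment_features(seg):
    """Compute the 8-feature vector (a, q, dmax, dmin, davg, d1/4, d1/2, d3/4)
    for one segment.

    a, q : coefficients of the fitted line y = a*t + q (least squares
           is an assumption; the report does not name the fitting method).
    d*   : differences between the original samples and the fitted line.
    """
    t = np.arange(len(seg), dtype=float)
    a, q = np.polyfit(t, seg, 1)       # highest degree first: slope, intercept
    d = seg - (a * t + q)              # residuals w.r.t. the fitted line
    n = len(seg)
    return [a, q,
            d.max(), d.min(), d.mean(),
            d[n // 4], d[n // 2], d[3 * n // 4]]
```

On a perfectly linear segment all six residual-based features are (numerically) zero, which makes the sketch easy to sanity-check.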

The experiment involved 20 volunteers who wrote 100 characters each (all ten digits, each repeated 10 times), which resulted in 2000 patterns. We divided the data into two sets of equal size. We noticed two possible methods of separating the data into training and test sets: by dividing the people, taking all the data from 10 of them, or by dividing the examples, taking 5 samples per digit from every person. We decided to test whether the differences between people matter more than the differences between characters written by the same person, and we achieved better results with half of the samples from all people. Every input pattern was composed of 168 real numbers, a sequence of the X, Y and Z values from all the segments. The output pattern was a binary string of zeros, except for the n-th element set to one if the pattern represented the digit n.
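The pattern layout described above (7 segments x 8 features x 3 signals = 168 inputs, plus a one-hot 10-digit target) can be assembled like this. The exact concatenation order of X, Y and Z is an assumption, as is the helper name.

```python
def make_pattern(features_xyz, digit):
    """Build one SNNS-style training pattern.

    features_xyz: dict mapping 'X', 'Y', 'Z' to a list of 7 feature
                  vectors of 8 floats each (order X-then-Y-then-Z is
                  an assumption about the layout).
    digit:        the target digit 0..9.
    Returns the 168-value input vector and the 10-value one-hot target.
    """
    inp = [v for axis in ('X', 'Y', 'Z')
             for seg in features_xyz[axis]
             for v in seg]
    assert len(inp) == 168, "7 segments x 8 features x 3 signals"
    target = [1.0 if i == digit else 0.0 for i in range(10)]
    return inp, target
```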

Network Design
We tried different types of network, each of which was designed to address some of the problems introduced previously. All these networks have 168 input nodes and 10 output nodes, but they differ in the number of hidden layers, hidden nodes, and links between the input and hidden layers.

The first network connected the data of each segment (the 8 values in the feature vector) to one node in the hidden layer (21 nodes). A second hidden layer of 7 nodes represented the 7 segments and provided a correlation between the X, Y and Z signals. This network yielded a successful recognition rate of 70% on the validation set.

The second network tried to correlate data between the different axes, so every node in the hidden layer was connected to three input nodes describing the same feature of the same segment on different axes. For example, one node could be connected to the first feature of X, Y and Z of the first segment. This network had more hidden nodes (56) but no second layer. We did not notice any substantial difference in the results.

The third network tried to correlate the feature values inside the same segment with the same feature from all segments. This was done by dividing the hidden layer into two sets:

feature set: each node in this set is connected to eight input nodes representing all the features of one segment.
segment set: each node in this set is connected to seven input nodes, each of them representing the same feature but in a different segment.
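The sparse input-to-hidden connectivity of the third network can be expressed as a boolean mask for one axis. The segment-major input ordering is an assumption; the report does not specify the exact layout.

```python
import numpy as np

N_SEG, N_FEAT = 7, 8          # 7 segments x 8 features = 56 inputs per axis

def third_net_mask():
    """Boolean input->hidden connectivity for one axis of the third network.

    Input s*N_FEAT + f is assumed to be feature f of segment s.
    Hidden nodes 0..6  ('feature set'): each sees all 8 features of one segment.
    Hidden nodes 7..14 ('segment set'): each sees one feature across all 7 segments.
    """
    mask = np.zeros((N_SEG * N_FEAT, N_SEG + N_FEAT), dtype=bool)
    for s in range(N_SEG):                          # feature-set nodes
        mask[s * N_FEAT:(s + 1) * N_FEAT, s] = True
    for f in range(N_FEAT):                         # segment-set nodes
        mask[f::N_FEAT, N_SEG + f] = True
    return mask
```

Each feature-set column has exactly 8 connections and each segment-set column exactly 7, so every input feeds exactly two hidden nodes.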

This network worked better, allowing us to successfully recognise 77% of the patterns in the validation set.

The fourth network was a hybrid of the best features of the previous nets. Because there was no correlation between the X, Y and Z axes in the third network, we added two more sets of hidden nodes: the first connected the feature sets and the second connected the segment sets. This network allowed us to reach 81% successful recognitions. Analysing the results, we realised that some patterns were systematically unclassified, with all the output values close to zero. This hinted at a possible inability of the network to represent the problem completely, and pushed us towards a specific test, described in the next section, and towards the design of the next network.

The fifth network therefore had an increased number of hidden nodes (20 more), divided into two groups fully connecting all the feature sets and all the segment sets. This new capacity pushed the number of successful recognitions to 85.1% on the validation set; this is the current state-of-the-art network of the Jimbo Project. We also tried an RBF approach with different network designs, but we could not get any better result than 21%, which might be explained by the network initialisation alone: no improvement was reached via learning.

Tests
As a first test we had to see if our network had enough capacity to solve the problem, so we overtrained it until we reached an SSE close to zero.

This was possible after 700 epochs with Backprop with momentum, showing us that we had enough hidden nodes. We would like to emphasise that Standard Backprop and RProp were not able to reach such a result. We then trained our network with different learning rules and parameters, and obtained our best results with Backprop-Momentum, given: η = 0.01, μ = 0.6, c = 0.1, dmax = 0.1, 700 epochs. This is the error graph for our training and validation sets:
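The core of the Backprop-Momentum update rule used above can be sketched as a single weight step. This is an illustration, not the SNNS implementation: the flat-spot term c and the dmax cut-off are omitted, and the function name is our own.

```python
def backprop_momentum_step(w, grad, prev_dw, eta=0.01, mu=0.6):
    """One weight update with momentum:
        dw = -eta * grad + mu * prev_dw
        w  = w + dw
    eta is the learning rate and mu the momentum term, matching the
    values quoted in the text. The SNNS-specific parameters c
    (flat-spot elimination) and dmax are left out of this sketch.
    """
    dw = [-eta * g + mu * p for g, p in zip(grad, prev_dw)]
    w = [wi + d for wi, d in zip(w, dw)]
    return w, dw
```

With a constant gradient, successive steps grow geometrically towards dw = -eta*grad / (1 - mu), which is why momentum speeds up travel along shallow error valleys.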

Our results can be described by the output of one of the scripts used for analysis:

    hora had 12 errors
    krav had 3 errors
    blaz had 7 errors
    chme had 15 errors
    bart had 13 errors
    hyne had 3 errors
    barta had 10 errors
    fris had 8 errors
    cerm had 4 errors
    fise had 19 errors
    kriz had 9 errors
    cimp had 9 errors
    holi had 3 errors
    jako had 9 errors
    krat had 9 errors
    habe had 4 errors
    bern had 4 errors
    chlu had 6 errors
    kost had 1 error

    ---[ Unsuccessful recognition stats ]---
    0 not recognised 18 times
    1 not recognised 4 times
    2 not recognised 11 times
    3 not recognised 8 times
    4 not recognised 13 times
    5 not recognised 5 times
    6 not recognised 33 times
    7 not recognised 19 times
    8 not recognised 20 times
    9 not recognised 18 times

    We had 851 successful recognitions on 1000 patterns (85.1%)
    We had 68 uncertain right and 56 unclassified

The first part shows how many patterns were misclassified for every person, while the second one gives the same information per digit to be classified. At the end we show the rate of successfully classified patterns. To decide whether a pattern was uncertain right (the highest output in the right position) or unclassified (the highest value in the wrong position), we checked if the highest output was lower than 0.5. We also tried to reduce the input space dimensionality by reducing the number of segments from seven to three. This reduction allowed us to test a network with an input layer of 72 nodes, but this network was not able to successfully recognise more than 74% of the validation set, which stopped us from continuing further in this direction.
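Our reading of that decision rule can be written down explicitly. This is a sketch of the evaluation logic as we understand it from the text; the category names and the function itself are illustrative, not the actual analysis script.

```python
def classify(output, target_digit, threshold=0.5):
    """Label one network output vector against the 0.5 threshold:
    - 'correct'         : highest output at the right position, >= threshold
    - 'uncertain right' : highest output at the right position, < threshold
    - 'unclassified'    : highest output at the wrong position, < threshold
    - 'misclassified'   : highest output at the wrong position, >= threshold
    """
    best = max(range(len(output)), key=lambda i: output[i])
    low = output[best] < threshold
    if best == target_digit:
        return 'uncertain right' if low else 'correct'
    return 'unclassified' if low else 'misclassified'
```

Under this rule the systematically unclassified patterns mentioned earlier, with all outputs close to zero, always fall into the 'unclassified' bucket regardless of which output happens to be highest.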

Conclusion

What can be seen from this image, which compares our best recognised volunteer, Kost, on the left with the worst, Fise, on the right, is that the characters do not differ from each other very much; yet the results show that discriminants exist. We might find one in the Z axis, which represents the pressure. This also hints at where to find further improvements in the network and in the feature extraction. Another hint is the number of unclassified patterns: if we could classify even just 80% of them, we would get close to the 93% of successful recognitions achieved by Marek Musil with an ad-hoc K-means based solution. Considering the number of weights, increasing the number of patterns in the training set might help as well (data we did not have access to). Considering that both RBF and K-means use codebook vectors, we would have expected the RBF approach to yield good results; we think its failure is caused by the high dimensionality of the input space.


Acknowledgements
We would like to thank Marek Musil, who provided the data and the filtering engine. We would also like to thank Jimbo Jones for the inspiration.

References
Marek Musil 2004: Diplomova prace: Hybridni metody extrakce priznaku z biometrickych signalu (Master's thesis: Hybrid methods for feature extraction from biometric signals).
Olle Gällmo, Jim Holmström 2005: Handouts to the Artificial Neural Networks course.
Andries P. Engelbrecht 2002: Computational Intelligence: An Introduction.

Appendix
The list of wrong patterns follows:
# <\bart_d0_07> 52, <\bart_d0_08> 53, <\bart_d1_06> 56, <\bart_d3_06> 66, <\bart_d3_07> 67, <\bart_d3_10> 70, <\bart_d4_06> 71, <\bart_d6_10> 85, <\bart_d7_07> 87, <\bart_d7_08> 88, <\bart_d7_09> 89, <\bart_d7_10> 90, <\bart_d8_06> 91
# <\barta_d2_08> 113, <\barta_d3_06> 116, <\barta_d3_08> 118, <\barta_d6_07> 132, <\barta_d6_08> 133, <\barta_d6_09> 134, <\barta_d6_10> 135, <\barta_d7_06> 136, <\barta_d9_07> 147, <\barta_d9_09> 149
# <\bern_d6_09> 184, <\bern_d6_10> 185, <\bern_d7_06> 186, <\bern_d8_08> 193
# <\blaz_d6_08> 233, <\blaz_d7_07> 237, <\blaz_d7_08> 238, <\blaz_d8_06> 241, <\blaz_d8_07> 242, <\blaz_d9_06> 246, <\blaz_d9_08> 248
# <\cimp_d0_07> 252, <\cimp_d2_06> 261, <\cimp_d2_09> 264, <\cimp_d6_08> 283, <\cimp_d6_09> 284, <\cimp_d6_10> 285, <\cimp_d7_06> 286, <\cimp_d8_08> 293, <\cimp_d9_08> 298
# <\cerm_d6_09> 334, <\cerm_d7_06> 336, <\cerm_d8_06> 341, <\cerm_d9_06> 346
# <\fise_d0_07> 352, <\fise_d0_08> 353, <\fise_d1_08> 358, <\fise_d1_10> 360, <\fise_d2_06> 361, <\fise_d2_07> 362, <\fise_d2_08> 363, <\fise_d2_09> 364, <\fise_d2_10> 365, <\fise_d3_06> 366, <\fise_d4_09> 374, <\fise_d6_10> 385, <\fise_d7_06> 386, <\fise_d7_09> 389, <\fise_d8_06> 391, <\fise_d8_08> 393, <\fise_d9_06> 396, <\fise_d9_07> 397, <\fise_d9_08> 398
# <\fris_d0_06> 401, <\fris_d4_10> 425, <\fris_d6_07> 432, <\fris_d6_09> 434, <\fris_d7_08> 438, <\fris_d8_08> 443, <\fris_d8_10> 445, <\fris_d9_10> 450
# <\habe_d0_10> 455, <\habe_d6_08> 483, <\habe_d6_10> 485, <\habe_d8_10> 495
# <\holi_d0_09> 504, <\holi_d5_08> 528, <\holi_d9_08> 548
# <\hora_d0_06> 551, <\hora_d0_08> 553, <\hora_d1_06> 556, <\hora_d4_08> 573, <\hora_d4_09> 574, <\hora_d4_10> 575, <\hora_d6_07> 582, <\hora_d6_08> 583, <\hora_d8_09> 594, <\hora_d8_10> 595, <\hora_d9_08> 598, <\hora_d9_09> 599
# <\hyne_d0_06> 601, <\hyne_d5_08> 628, <\hyne_d6_08> 633
# <\chlu_d0_10> 655, <\chlu_d4_09> 674, <\chlu_d6_06> 681, <\chlu_d7_08> 688, <\chlu_d8_10> 695, <\chlu_d9_06> 696
# <\chme_d0_06> 701, <\chme_d2_09> 714, <\chme_d3_08> 718, <\chme_d3_09> 719, <\chme_d4_07> 722, <\chme_d4_09> 724, <\chme_d4_10> 725, <\chme_d5_06> 726, <\chme_d5_09> 729, <\chme_d6_07> 732, <\chme_d7_07> 737, <\chme_d7_08> 738, <\chme_d7_09> 739, <\chme_d8_09> 744, <\chme_d9_09> 749
# <\jako_d0_09> 754, <\jako_d4_08> 773, <\jako_d5_10> 780, <\jako_d6_07> 782, <\jako_d6_08> 783, <\jako_d6_09> 784, <\jako_d7_06> 786, <\jako_d7_10> 790, <\jako_d8_10> 795
# <\kost_d4_08> 823
# <\krat_d0_07> 852, <\krat_d0_08> 853, <\krat_d0_09> 854, <\krat_d0_10> 855, <\krat_d1_06> 856, <\krat_d3_09> 869, <\krat_d4_10> 875, <\krat_d6_07> 882, <\krat_d6_09> 884
# <\krav_d0_10> 905, <\krav_d4_06> 921, <\krav_d7_08> 938
# <\kriz_d0_06> 951, <\kriz_d2_09> 964, <\kriz_d7_06> 986, <\kriz_d7_10> 990, <\kriz_d8_08> 993, <\kriz_d8_09> 994, <\kriz_d8_10> 995, <\kriz_d9_06> 996, <\kriz_d9_09> 999
