Computer Vision: Principles, Algorithms, Applications, Learning
By E. R. Davies
About this ebook
Computer Vision: Principles, Algorithms, Applications, Learning (previously entitled Computer and Machine Vision) clearly and systematically presents the basic methodology of computer vision, covering the essential elements of the theory while emphasizing algorithmic and practical design constraints. This fully revised fifth edition has brought in more of the concepts and applications of computer vision, making it a very comprehensive and up-to-date text suitable for undergraduate and graduate students, researchers and R&D engineers working in this vibrant subject.
See an interview with the author explaining his approach to teaching and learning computer vision - http://scitechconnect.elsevier.com/computer-vision/
- Three new chapters on Machine Learning emphasise the way the subject has been developing; two cover Basic Classification Concepts and Probabilistic Models, and the third covers the principles of Deep Learning Networks and shows their impact on computer vision, reflected in a new chapter on Face Detection and Recognition.
- A new chapter on Object Segmentation and Shape Models reflects the methodology of machine learning and gives practical demonstrations of its application.
- In-depth discussions have been included on geometric transformations, the EM algorithm, boosting, semantic segmentation, face frontalisation, RNNs and other key topics.
- Examples and applications—including the location of biscuits, foreign bodies, faces, eyes, road lanes, surveillance, vehicles and pedestrians—give the ‘ins and outs’ of developing real-world vision systems, showing the realities of practical implementation.
- Necessary mathematics and essential theory are made approachable by careful explanations and well-illustrated examples.
- The ‘recent developments’ sections included in each chapter aim to bring students and practitioners up to date with this fast-moving subject.
- Tailored programming examples—code, methods, illustrations, tasks, hints and solutions (mainly involving MATLAB and C++).
E. R. Davies
Roy Davies is Emeritus Professor of Machine Vision at Royal Holloway, University of London. He has worked on many aspects of vision, from feature detection to robust, real-time implementations of practical vision tasks. His interests include automated visual inspection, surveillance, vehicle guidance, crime detection and neural networks. He has published more than 200 papers and three books. Machine Vision: Theory, Algorithms, Practicalities (1990) has been widely used internationally for more than 25 years, and is now out in this much enhanced fifth edition. Roy holds a DSc from the University of London, and has been awarded the Distinguished Fellowship of the British Machine Vision Association and the Fellowship of the International Association for Pattern Recognition.
Book preview
Computer Vision - E. R. Davies
Computer Vision
Principles, Algorithms, Applications, Learning
Fifth Edition
E.R. Davies
Royal Holloway, University of London, United Kingdom
Table of Contents
Cover image
Title page
Copyright
Dedication
About the Author
Foreword
Preface to the Fifth Edition
Preface to the First Edition
Acknowledgments
Topics Covered in Application Case Studies
Influences Impinging Upon Integrated Vision System Design
Glossary of Acronyms and Abbreviations
Chapter 1. Vision, the challenge
Abstract
1.1 Introduction—Man and His Senses
1.2 The Nature of Vision
1.3 From Automated Visual Inspection to Surveillance
1.4 What This Book Is About
1.5 The Part Played by Machine Learning
1.6 The Following Chapters
1.7 Bibliographical Notes
Part 1: Low-level vision
Chapter 2. Images and imaging operations
Abstract
2.1 Introduction
2.2 Image Processing Operations
2.3 Convolutions and Point Spread Functions
2.4 Sequential Versus Parallel Operations
2.5 Concluding Remarks
2.6 Bibliographical and Historical Notes
2.7 Problems
Chapter 3. Image filtering and morphology
Abstract
3.1 Introduction
3.2 Noise Suppression by Gaussian Smoothing
3.3 Median Filters
3.4 Mode Filters
3.5 Rank Order Filters
3.6 Sharp–Unsharp Masking
3.7 Shifts Introduced by Median Filters
3.8 Shifts Introduced by Rank Order Filters
3.9 The Role of Filters in Industrial Applications of Vision
3.10 Color in Image Filtering
3.11 Dilation and Erosion in Binary Images
3.12 Mathematical Morphology
3.13 Morphological Grouping
3.14 Morphology in Grayscale Images
3.15 Concluding Remarks
3.16 Bibliographical and Historical Notes
3.17 Problems
Chapter 4. The role of thresholding
Abstract
4.1 Introduction
4.2 Region-Growing Methods
4.3 Thresholding
4.4 Adaptive Thresholding
4.5 More Thoroughgoing Approaches to Threshold Selection
4.6 The Global Valley Approach to Thresholding
4.7 Practical Results Obtained Using the Global Valley Method
4.8 Histogram Concavity Analysis
4.9 Concluding Remarks
4.10 Bibliographical and Historical Notes
4.11 Problems
Chapter 5. Edge detection
Abstract
5.1 Introduction
5.2 Basic Theory of Edge Detection
5.3 The Template Matching Approach
5.4 Theory of 3×3 Template Operators
5.5 The Design of Differential Gradient Operators
5.6 The Concept of a Circular Operator
5.7 Detailed Implementation of Circular Operators
5.8 The Systematic Design of Differential Edge Operators
5.9 Problems With the Above Approach—Some Alternative Schemes
5.10 Hysteresis Thresholding
5.11 The Canny Operator
5.12 The Laplacian Operator
5.13 Concluding Remarks
5.14 Bibliographical and Historical Notes
5.15 Problems
Chapter 6. Corner, interest point, and invariant feature detection
Abstract
6.1 Introduction
6.2 Template Matching
6.3 Second-Order Derivative Schemes
6.4 A Median Filter–based Corner Detector
6.5 The Harris Interest Point Operator
6.6 Corner Orientation
6.7 Local Invariant Feature Detectors and Descriptors
6.8 Concluding Remarks
6.9 Bibliographical and Historical Notes
6.10 Problems
Chapter 7. Texture analysis
Abstract
7.1 Introduction
7.2 Some Basic Approaches to Texture Analysis
7.3 Graylevel Co-occurrence Matrices
7.4 Laws’ Texture Energy Approach
7.5 Ade’s Eigenfilter Approach
7.6 Appraisal of the Laws and Ade Approaches
7.7 Concluding Remarks
7.8 Bibliographical and Historical Notes
Part 2: Intermediate-level vision
Chapter 8. Binary shape analysis
Abstract
8.1 Introduction
8.2 Connectedness in Binary Images
8.3 Object Labeling and Counting
8.4 Size Filtering
8.5 Distance Functions and Their Uses
8.6 Skeletons and Thinning
8.7 Other Measures for Shape Recognition
8.8 Boundary Tracking Procedures
8.9 Concluding Remarks
8.10 Bibliographical and Historical Notes
8.11 Problems
Chapter 9. Boundary pattern analysis
Abstract
9.1 Introduction
9.2 Boundary Tracking Procedures
9.3 Centroidal Profiles
9.4 Problems With the Centroidal Profile Approach
9.5 The (s,ψ) Plot
9.6 Tackling the Problems of Occlusion
9.7 Accuracy of Boundary Length Measures
9.8 Concluding Remarks
9.9 Bibliographical and Historical Notes
9.10 Problems
Chapter 10. Line, circle, and ellipse detection
Abstract
10.1 Introduction
10.2 Application of the Hough Transform to Line Detection
10.3 The Foot-of-Normal Method
10.4 Using RANSAC for Straight Line Detection
10.5 Location of Laparoscopic Tools
10.6 Hough-Based Schemes for Circular Object Detection
10.7 The Problem of Unknown Circle Radius
10.8 Overcoming the Speed Problem
10.9 Ellipse Detection
10.10 Human Iris Location
10.11 Concluding Remarks
10.12 Bibliographical and Historical Notes
10.13 Problems
Chapter 11. The generalized Hough transform
Abstract
11.1 Introduction
11.2 The Generalized Hough Transform
11.3 The Relevance of Spatial Matched Filtering
11.4 Gradient Weighting Versus Uniform Weighting
11.5 Use of the GHT for Ellipse Detection
11.6 Comparing the Various Methods for Ellipse Detection
11.7 A Graph-Theoretic Approach to Object Location
11.8 Possibilities for Saving Computation
11.9 Using the GHT for Feature Collation
11.10 Generalizing the Maximal Clique and Other Approaches
11.11 Search
11.12 Concluding Remarks
11.13 Bibliographical and Historical Notes
11.14 Problems
Chapter 12. Object segmentation and shape models
Abstract
12.1 Introduction
12.2 Active Contours
12.3 Practical Results Obtained Using Active Contours
12.4 The Level-Set Approach to Object Segmentation
12.5 Shape Models
12.6 Concluding Remarks
12.7 Bibliographical and Historical Notes
Part 3: Machine learning and deep learning networks
Chapter 13. Basic classification concepts
Abstract
13.1 Introduction
13.2 The Nearest Neighbor Algorithm
13.3 Bayes’ Decision Theory
13.4 Relation of the Nearest Neighbor and Bayes’ Approaches
13.5 The Optimum Number of Features
13.6 Cost Functions and Error–Reject Tradeoff
13.7 Supervised and Unsupervised Learning
13.8 Cluster Analysis
13.9 The Support Vector Machine
13.10 Artificial Neural Networks
13.11 The Back-Propagation Algorithm
13.12 Multilayer Perceptron Architectures
13.13 Overfitting to the Training Data
13.14 Concluding Remarks
13.15 Bibliographical and Historical Notes
13.16 Problems
Chapter 14. Machine learning: Probabilistic methods
Abstract
14.1 Introduction
14.2 Mixtures of Gaussians and the EM Algorithm
14.3 A More General View of the EM Algorithm
14.4 Some Practical Examples
14.5 Principal Components Analysis
14.6 Multiple Classifiers
14.7 The Boosting Approach
14.8 Modeling AdaBoost
14.9 Loss Functions for Boosting
14.10 The LogitBoost Algorithm
14.11 The Effectiveness of Boosting
14.12 Boosting with Multiple Classes
14.13 The Receiver Operating Characteristic
14.14 Concluding Remarks
14.15 Bibliographical and Historical Notes
14.16 Problems
Chapter 15. Deep-learning networks
Abstract
15.1 Introduction
15.2 Convolutional Neural Networks
15.3 Parameters for Defining CNN Architectures
15.4 LeCun et al.’s LeNet Architecture
15.5 Krizhevsky et al.’s AlexNet Architecture
15.6 Zeiler and Fergus’s Work on CNN Architectures
15.7 Zeiler and Fergus’s Visualization Experiments
15.8 Simonyan and Zisserman’s VGGNet Architecture
15.9 Noh et al.’s DeconvNet Architecture
15.10 Badrinarayanan et al.’s SegNet Architecture
15.11 Recurrent Neural Networks
15.12 Concluding Remarks
15.13 Bibliographical and Historical Notes
Part 4: 3D vision and motion
Chapter 16. The three-dimensional world
Abstract
16.1 Introduction
16.2 Three-Dimensional Vision—The Variety of Methods
16.3 Projection Schemes for Three-Dimensional Vision
16.4 Shape from Shading
16.5 Photometric Stereo
16.6 The Assumption of Surface Smoothness
16.7 Shape from Texture
16.8 Use of Structured Lighting
16.9 Three-Dimensional Object Recognition Schemes
16.10 Horaud’s Junction Orientation Technique
16.11 An Important Paradigm—Location of Industrial Parts
16.12 Concluding Remarks
16.13 Bibliographical and Historical Notes
16.14 Problems
Chapter 17. Tackling the perspective n-point problem
Abstract
17.1 Introduction
17.2 The Phenomenon of Perspective Inversion
17.3 Ambiguity of Pose Under Weak Perspective Projection
17.4 Obtaining Unique Solutions to the Pose Problem
17.5 Concluding Remarks
17.6 Bibliographical and Historical Notes
17.7 Problems
Chapter 18. Invariants and perspective
Abstract
18.1 Introduction
18.2 Cross Ratios: The Ratio of Ratios Concept
18.3 Invariants for Noncollinear Points
18.4 Invariants for Points on Conics
18.5 Differential and Semidifferential Invariants
18.6 Symmetric Cross-Ratio Functions
18.7 Vanishing Point Detection
18.8 More on Vanishing Points
18.9 Apparent Centers of Circles and Ellipses
18.10 Perspective Effects in Art and Photography
18.11 Concluding Remarks
18.12 Bibliographical and Historical Notes
18.13 Problems
Chapter 19. Image transformations and camera calibration
Abstract
19.1 Introduction
19.2 Image Transformations
19.3 Camera Calibration
19.4 Intrinsic and Extrinsic Parameters
19.5 Correcting for Radial Distortions
19.6 Multiple View Vision
19.7 Generalized Epipolar Geometry
19.8 The Essential Matrix
19.9 The Fundamental Matrix
19.10 Properties of the Essential and Fundamental Matrices
19.11 Estimating the Fundamental Matrix
19.12 An Update on the Eight-Point Algorithm
19.13 Image Rectification
19.14 3-D Reconstruction
19.15 Concluding Remarks
19.16 Bibliographical and Historical Notes
19.17 Problems
Chapter 20. Motion
Abstract
20.1 Introduction
20.2 Optical Flow
20.3 Interpretation of Optical Flow Fields
20.4 Using Focus of Expansion to Avoid Collision
20.5 Time-to-Adjacency Analysis
20.6 Basic Difficulties with the Optical Flow Model
20.7 Stereo from Motion
20.8 The Kalman Filter
20.9 Wide Baseline Matching
20.10 Concluding Remarks
20.11 Bibliographical and Historical Notes
20.12 Problem
Part 5: Putting computer vision to work
Chapter 21. Face detection and recognition: The impact of deep learning
Abstract
21.1 Introduction
21.2 A Simple Approach to Face Detection
21.3 Facial Feature Detection
21.4 The Viola–Jones Approach to Rapid Face Detection
21.5 The Eigenface Approach to Face Recognition
21.6 More on the Difficulties of Face Recognition
21.7 Frontalization
21.8 The Sun et al. DeepID Face Representation System
21.9 Fast Face Detection Revisited
21.10 The Face as Part of a 3-D Object
21.11 Concluding Remarks
21.12 Bibliographical and Historical Notes
Chapter 22. Surveillance
Abstract
22.1 Introduction
22.2 Surveillance—The Basic Geometry
22.3 Foreground–Background Separation
22.4 Particle Filters
22.5 Use of Color Histograms for Tracking
22.6 Implementation of Particle Filters
22.7 Chamfer Matching, Tracking, and Occlusion
22.8 Combining Views from Multiple Cameras
22.9 Applications to the Monitoring of Traffic Flow
22.10 License Plate Location
22.11 Occlusion Classification for Tracking
22.12 Distinguishing Pedestrians by Their Gait
22.13 Human Gait Analysis
22.14 Model-based Tracking of Animals
22.15 Concluding Remarks
22.16 Bibliographical and Historical Notes
22.17 Problem
Chapter 23. In-vehicle vision systems
Abstract
23.1 Introduction
23.2 Locating the Roadway
23.3 Location of Road Markings
23.4 Location of Road Signs
23.5 Location of Vehicles
23.6 Information Obtained by Viewing License Plates and Other Structural Features
23.7 Locating Pedestrians
23.8 Guidance and Egomotion
23.9 Vehicle Guidance in Agriculture
23.10 Concluding Remarks
23.11 More Detailed Developments and Bibliographies Relating to Advanced Driver Assistance Systems
23.12 Problem
Chapter 24. Epilogue—Perspectives in vision
Abstract
24.1 Introduction
24.2 Parameters of Importance in Machine Vision
24.3 Tradeoffs
24.4 Moore’s Law in Action
24.5 Hardware, Algorithms, and Processes
24.6 The Importance of Choice of Representation
24.7 Past, Present, and Future
24.8 The Deep Learning Explosion
24.9 Bibliographical and Historical Notes
Appendix A. Robust statistics
A.1 Introduction
A.2 Preliminary Definitions and Analysis
A.3 The M-Estimator (Influence Function) Approach
A.4 The Least Median of Squares Approach to Regression
A.5 Overview of the Robustness Problem
A.6 The RANSAC Approach
A.7 Concluding Remarks
A.8 Bibliographical and Historical Notes
A.9 Problems
Appendix B. The sampling theorem
B.1 The Sampling Theorem
Appendix C. The representation of color
C.1 Introduction
C.2 Details of the HSI Color Representation
C.3 A Typical Example of the Use of Color
C.4 Bibliographical and Historical Notes
Appendix D. Sampling from distributions
D.1 Introduction
D.2 The Box–Muller and Related Methods
D.3 Bibliographical and Historical Notes
References
Index
Copyright
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1800, San Diego, CA 92101-4495, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2018 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
ISBN: 978-0-12-809284-2
For Information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Mara Conner
Acquisition Editor: Tim Pitts
Editorial Project Manager: Charlotte Kent
Production Project Manager: Sruthi Satheesh
Cover Designer: Greg Harris
Typeset by MPS Limited, Chennai, India
Dedication
This book is dedicated to my family.
To my late mother, Mary Davies, to record her never-failing love and devotion.
To my late father, Arthur Granville Davies, who passed on to me his appreciation of the beauties of mathematics and science.
To my wife, Joan, for love, patience, support, and inspiration.
To my children, Elizabeth, Sarah, and Marion, the music in my life.
To my grandchildren, Jasper, Jerome, Eva, and Tara, for constantly reminding me of the carefree joys of youth!
About the Author
Roy Davies is Emeritus Professor of Machine Vision at Royal Holloway, University of London, United Kingdom. He has worked on many aspects of vision, from feature detection and noise suppression to robust pattern matching and real-time implementations of practical vision tasks. His interests include automated visual inspection, surveillance, vehicle guidance, and crime detection. He has published more than 200 papers and three books—Machine Vision: Theory, Algorithms, Practicalities (1990), Electronics, Noise and Signal Recovery (1993), and Image Processing for the Food Industry (2000); the first of these has been widely used internationally for more than 25 years, and is now out in this much enhanced fifth edition. Roy is a fellow of the IoP and the IET, and a senior member of the IEEE. He is on the Editorial Boards of Pattern Recognition Letters, Real-Time Image Processing, Imaging Science, and IET Image Processing. He holds a DSc from the University of London, and was awarded the BMVA Distinguished Fellowship in 2005 and the Fellowship of the International Association for Pattern Recognition in 2008.
Foreword
Mark S. Nixon, University of Southampton, Southampton, United Kingdom
It is an honor to write a foreword for Roy Davies’ new edition of Computer and Machine Vision, now entitled Computer Vision: Principles, Algorithms, Applications, Learning. This is one of the major books in Computer Vision and not just for its longevity, having now reached its Fifth Edition. It is actually a splendid achievement to reach this status and it reflects not only on the tenacity and commitment of its author, but also on the achievements of the book itself.
Computer Vision has shown awesome progress in its short history. This is in part due to technology: computers are much faster and memory is much cheaper than they were in the early days when Roy started his research. There have been many achievements and many developments. All of this can affect the evolution of a textbook. There have been excellent textbooks in the past, which were neither continued nor maintained. That has been avoided here, as the textbook has continued to mature with the field and its many developments.
We can look forward to a future where automated computer vision systems will make our lives easier while enriching them too. There are already many applications of Computer Vision in the food industry, and robotic cars will be with us very soon. Then there are continuing advances in medical image analysis, where Computer Vision techniques can be used to aid diagnosis and therapy by automated means. Even accessing a mobile phone is considerably more convenient when using a fingerprint, and access by face recognition continues to improve. These have all come about due to advances in computers, Computer Vision, and applied artificial intelligence.
Adherents of Computer Vision will know it to be an exciting field indeed. It manages to cover many aspects of technology, from human vision to machine learning, requiring electronic hardware, computer implementations, and a lot of computer software. Roy continues to cover these in excellent detail.
I remember the First Edition when it was first published in 1990 with its unique and pragmatic blend of theory, implementation, and algorithms. I am pleased to see that the Fifth Edition maintains this unique approach, much appreciated by students in previous editions who wanted an accessible introduction to Computer Vision. It has certainly increased in size with age, and that is often the way with books. It is most certainly the way with Computer Vision since many of its researchers continue to improve, refine, and develop new techniques.
A major change here is the inclusion of Deep Learning. Indeed, this has been a major change in the field of Computer Vision and Pattern Recognition. One implication of the increase in computing power and the reduction of memory cost is that techniques can become considerably more complex, and that complexity lends itself to application in the analysis of big data.
One cannot ignore the performance of deep learning and convolutional neural networks: one only has to peruse the programs of top international conferences to perceive their revolutionary effect on research direction. Naturally, it is early days, but it is good to have guidance such as we have here. The nature of performance is always in question in any artificial intelligence system, and part of the way to answer those questions is to consider more deeply the architectures and their basis. That again is the function of a textbook, for it is the distillation of research and practice in a ratiocinated exposition. It is a brave move to include Deep Learning in this edition, but a necessary one.
And what of Roy Davies himself? Following his DPhil in Solid State Physics at Oxford, he later developed a new sensitive method in Nuclear Resonance called Davies-ENDOR (Electron and Nuclear Double Resonance), which avoided the blind spots of its predecessor, Mims-ENDOR.
In 1970 he was appointed as a lecturer at Royal Holloway, and a long series of publications in pattern recognition and its applications led to the award of his Personal Chair, his DSc, and then the Distinguished Fellowship of the British Machine Vision Association (BMVA) in 2005. He has served the BMVA in many ways, latterly editing its Newsletter. Clearly the level of his work and his many contacts and papers have contributed much to the material that is found herein.
I look forward to having this Fifth Edition sitting proudly on my shelf, replacing the Fourth, which will in turn pass to one of my students' shelves. It will not stop there for long, for it is one of the textbooks I often turn to for the information I need. Unlike the snapshots to be found on the Web, in a textbook I find it placed in context and in sequence, and with extension to other material. That is the function of a textbook, and it will be well served by this Fifth Edition.
July 2017
Preface to the Fifth Edition
Roy Davies, Royal Holloway, University of London, United Kingdom
The first edition of this book came out in 1990, and was welcomed by many researchers and practitioners. However, in the subsequent two decades the subject moved on at a rapidly accelerating rate, and many topics that hardly deserved a mention in the first edition had to be solidly incorporated into subsequent editions. For example, it seemed particularly important to bring in significant amounts of new material on feature detection, mathematical morphology, texture analysis, inspection, artificial neural networks, 3D vision, invariance, motion analysis, object tracking, and robust statistics. And in the fourth edition, cognizance had to be taken of the widening range of applications of the subject: in particular, two chapters had to be added on surveillance and in-vehicle vision systems. Since then, the subject has not stood still. In fact, the past four or five years have seen the onset of an explosive growth in research on deep neural networks, and the practical achievements resulting from this have been little short of staggering. It soon became abundantly clear that the fifth edition would have to reflect this radical departure—both in fundamental explanation and in practical coverage. Indeed, it necessitated a new part in the book—Part 3, Machine Learning and Deep Learning Networks—a heading which affirms that the new content reflects not only Deep Learning (a huge enhancement over the older Artificial Neural Networks) but also an approach to pattern recognition that is based on rigorous probabilistic methodology.
All this is not achieved without presentation problems: for probabilistic methodology can only be managed properly within a rather severe mathematical environment. Too little maths, and the subject could be so watered down as to be virtually content-free; too much maths, and many readers might not be able to follow the explanations. Clearly, one should not protect readers from the (mathematical) reality of the situation. Hence, Chapter 14 had to be written in such a way as to demonstrate in full what type of methodology is involved, while providing paths that would take readers past some of the mathematical complexities—at least, on first encounter. Once past the relatively taxing Chapter 14, Chapters 15 and 21 take the reader through two accounts consisting largely of case studies, the former through a crucial development period (2012–2015) for deep learning networks, and the latter through a similar period (2013–2016) during which deep learning was targeted strongly at face detection and recognition, enabling remarkable advances to be made. It should not go unnoticed that these additions have so influenced the content of the book that the title had to be modified to reflect them. Interestingly, the organization of the book was further modified by collecting three applications chapters into the new Part 5, Putting Computer Vision to Work.
It is worth remarking that, at this point in time, computer vision has attained a level of maturity that has made it substantially more rigorous, reliable, generic, and—in the light of the improved hardware facilities now available for its implementation (in particular, extremely powerful GPUs)—capable of real-time performance. This means that workers are more than ever before using it in serious applications, and with fewer practical difficulties. It is intended that this edition of the book will reflect this radically new and exciting state of affairs at a fundamental level.
A typical final-year undergraduate course on vision for Electronic Engineering and Computer Science students might include much of the work of Chapters 1–13 and Chapter 16, plus a selection of sections from other chapters, according to requirements. For MSc or PhD research students, a suitable lecture course might go on to cover Parts 3 or 4 in depth, and several of the chapters in Part 5, with many practical exercises being undertaken on image analysis systems. (The importance of the appendix on robust statistics should not be underestimated once one gets onto serious work, though this will probably be outside the restrictive environment of an undergraduate syllabus.) Here much will depend on the research programme being undertaken by each individual student. At this stage the text may have to be used more as a handbook for research, and indeed, one of the prime aims of the volume is to act as a handbook for the researcher and practitioner in this important area.
As mentioned in the original Preface, this book leans heavily on experience I have gained from working with postgraduate students: in particular, I would like to express my gratitude to Mark Edmonds, Simon Barker, Daniel Celano, Darrel Greenhill, Derek Charles, Mark Sugrue, and Georgios Mastorakis, all of whom have in their own ways helped to shape my view of the subject. In addition, it is a pleasure to recall very many rewarding discussions with my colleagues Barry Cook, Zahid Hussain, Ian Hannah, Dev Patel, David Mason, Mark Bateman, Tieying Lu, Adrian Johnstone, and Piers Plummer, the last two of whom were particularly prolific in generating hardware systems for implementing my research group’s vision algorithms. Next, I would like to record my thanks to my British Machine Vision Association colleagues for many wide-ranging discussions on the nature of the subject: in particular, I am hugely grateful to Majid Mirmehdi, Adrian Clark, Neil Thacker, and Mark Nixon, who, over time, have strongly influenced the development of the book and left a permanent mark on it. Next, I would like to thank the anonymous reviewers for making insightful comments and what have turned out to be extremely valuable suggestions. Finally, I am indebted to Tim Pitts of Elsevier Science for his help and encouragement, without which this fifth edition might never have been completed.
Supporting materials:
Elsevier’s website for the book contains programming and other resources to help readers and students using this text. Please check the publisher’s website for further information: https://www.elsevier.com/books-and-journals/book-companion/9780128092842.
Preface to the First Edition
Over the past 30 years or so, machine vision has evolved into a mature subject embracing many topics and applications: these range from automatic (robot) assembly to automatic vehicle guidance, from automatic interpretation of documents to verification of signatures, and from analysis of remotely sensed images to checking of fingerprints and human blood cells; currently, automated visual inspection is undergoing very substantial growth, necessary improvements in quality, safety, and cost-effectiveness being the stimulating factors. With so much ongoing activity, it has become a difficult business for professionals to keep up with the subject and with relevant methodologies: in particular, it is difficult for them to distinguish accidental developments from genuine advances. It is the purpose of this book to provide background in this area.
The book was shaped over a period of 10–12 years, through material I have given on undergraduate and postgraduate courses at London University, and contributions to various industrial courses and seminars. At the same time, my own investigations coupled with experience gained while supervising PhD and postdoctoral researchers helped to form the state of mind and knowledge that is now set out here. Certainly it is true to say that if I had had this book 8, 6, 4, or even 2 years ago, it would have been of inestimable value to myself for solving practical problems in machine vision. It is therefore my hope that it will now be of use to others in the same way. Of course, it has tended to follow an emphasis that is my own—and in particular one view of one path towards solving automated visual inspection and other problems associated with the application of vision in industry. At the same time, although there is a specialism here, great care has been taken to bring out general principles—including many applying throughout the field of image analysis. The reader will note the universality of topics such as noise suppression, edge detection, principles of illumination, feature recognition, Bayes’ theory, and (nowadays) Hough transforms. However, the generalities lie deeper than this. The book has aimed to make some general observations and messages about the limitations, constraints, and tradeoffs to which vision algorithms are subject. Thus there are themes about the effects of noise, occlusion, distortion, and the need for built-in forms of robustness (as distinct from less successful ad hoc varieties and those added on as an afterthought); there are also themes about accuracy, systematic design, and the matching of algorithms and architectures. Finally, there are the problems of setting up lighting schemes which must be addressed in complete systems, yet which receive scant attention in most books on image processing and analysis. 
These remarks will indicate that the text is intended to be read at various levels—a factor that should make it of more lasting value than might initially be supposed from a quick perusal of the contents.
Of course, writing a text such as this presents a great difficulty in that it is necessary to be highly selective: space simply does not allow everything in a subject of this nature and maturity to be dealt with adequately between two covers. One solution might be to dash rapidly through the whole area mentioning everything that comes to mind, but leaving the reader unable to understand anything in detail or to achieve anything having read the book. However, in a practical subject of this nature this seemed to me a rather worthless extreme. It is just possible that the emphasis has now veered too much in the opposite direction, by coming down to practicalities (detailed algorithms, details of lighting schemes, and so on): individual readers will have to judge this for themselves. On the other hand, an author has to be true to himself and my view is that it is better for a reader or student to have mastered a coherent series of topics than to have a mishmash of information that he is later unable to recall with any accuracy. This, then, is my justification for presenting this particular material in this particular way and for reluctantly omitting from detailed discussion such important topics as texture analysis, relaxation methods, motion, and optical flow.
As for the organization of the material, I have tried to make the early part of the book lead into the subject gently, giving enough detailed algorithms (especially in Chapter 2: Images and imaging operations and Chapter 6: Corner, interest point, and invariant feature detection) to provide a sound feel for the subject—including especially vital, and in their own way quite intricate, topics such as connectedness in binary images. Hence Part I provides the lead-in, although it is not always trivial material and indeed some of the latest research ideas have been brought in (e.g., on thresholding techniques and edge detection). Part II gives much of the meat of the book. Indeed, the (book) literature of the subject currently has a significant gap in the area of intermediate-level vision; while high-level vision (AI) topics have long caught the researcher’s imagination, intermediate-level vision has its own difficulties which are currently being solved with great success (note that the Hough transform, originally developed in 1962, and by many thought to be a very specialist topic of rather esoteric interest, is arguably only now coming into its own). Part II and the early chapters of Part III aim to make this clear, while Part IV gives reasons why this particular transform has become so useful. As a whole, Part III aims to demonstrate some of the practical applications of the basic work covered earlier in the book, and to discuss some of the principles underlying implementation: it is here that chapters on lighting and hardware systems will be found. As there is a limit to what can be covered in the space available, there is a corresponding emphasis on the theory underpinning practicalities. Probably this is a vital feature, since there are many applications of vision both in industry and elsewhere, yet listing them and their intricacies risks dwelling on interminable detail, which some might find insipid; furthermore, detail has a tendency to date rather rapidly. 
Although the book could not cover 3D vision in full (this topic would easily consume a whole volume in its own right), a careful overview of this complex mathematical and highly important subject seemed vital. It is therefore no accident that Chapter 16, The three-dimensional world, is the longest in the book. Finally, Part IV asks questions about the limitations and constraints of vision algorithms and answers them by drawing on information and experience from earlier chapters. It is tempting to call the last chapter the Conclusion. However, in such a dynamic subject area any such temptation has to be resisted, although it has still been possible to draw a good number of lessons on the nature and current state of the subject. Clearly, this chapter presents a personal view but I hope it is one that readers will find interesting and useful.
Acknowledgments
The author would like to credit the following sources for permission to reproduce tables, figures, and extracts of text from earlier publications:
Elsevier
For permission to reprint portions of the following papers from Image and Vision Computing as text in Chapter 5; as Tables 5.1–5.5; and as Figs. 3.31, 5.2:
Davies (1984b, 1987b)
For permission to reprint portions of the following paper from Pattern Recognition as text in Chapter 8; and as Fig. 8.11:
Davies and Plummer (1981)
For permission to reprint portions of the following papers from Pattern Recognition Letters as text in Chapters 3, 5, 10, 11, 13; as Tables 3.2, 10.4, 11.1; and as Figs. 3.6, 3.8, 3.10, 5.1, 5.3, 10.1, 10.10, 10.11, 10.12, 10.13, 11.1, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 11.9, 11.10, 11.11:
Davies (1986, 1987a,c,d, 1988b,c,e, 1989a)
For permission to reprint portions of the following paper from Signal Processing as text in Chapter 3; and as Figs. 3.15, 3.17, 3.18, 3.19, 3.20:
Davies (1989b)
For permission to reprint portions of the following paper from Advances in Imaging and Electron Physics as text in Chapter 3:
Davies (2003c)
For permission to reprint portions of the following article from Encyclopedia of Physical Science and Technology as Figs. 8.9, 8.12, 9.1, 9.4:
Davies, E.R., 1987. Visual inspection, automatic (robotics). In: Meyers, R.A. (Ed.) Encyclopedia of Physical Science and Technology, vol. 14. Academic Press, San Diego, pp. 360–377.
IEEE
For permission to reprint portions of the following paper as text in Chapter 3; and as Figs. 3.4, 3.5, 3.7, 3.11:
Davies (1984a)
IET
For permission to reprint portions of the following papers from the IET Proceedings and Colloquium Digests as text in Chapters 3, 4, 6, 13, 21, 22, 23; as Tables 3.3, 4.2; and as Figs. 3.21, 3.28, 3.29, 4.6, 4.7, 4.8, 4.9, 4.10, 6.5, 6.6, 6.7, 6.8, 6.9, 6.12, 11.20, 14.16, 14.17, 22.16, 22.17, 22.18, 23.1, 23.3, 23.4:
Davies (1988a, 1999c, 2000a, 2005, 2008)
Sugrue and Davies (2007)
Mastorakis and Davies (2011)
Davies et al. (1998)
Davies et al. (2003)
IFS Publications Ltd
For permission to reprint portions of the following paper as text in Chapters 12, 20; and as Figs. 10.7, 10.8:
Davies (1984c)
The Royal Photographic Society
For permission to reprint portions of the following papers (see also the Maney website: www.maney.co.uk/journals/ims) as text in Chapter 3; and as Figs. 3.12, 3.13, 3.22, 3.23, 3.24:
Davies (2000c)
Charles and Davies (2004)
Springer-Verlag
For permission to reprint portions of the following papers as text in Chapter 6; and as Figs. 6.2, 6.4:
Davies (1988d), Figs. 1–3
World Scientific
For permission to reprint portions of the following book as text in Chapters 7, 22, 23; and as Figs. 3.25, 3.26, 3.27, 5.4, 22.20, 23.15, 23.16:
Davies, 2000. Image Processing for the Food Industry. World Scientific, Singapore.
The Committee of the Alvey Vision Club
To acknowledge that extracts of text in Chapter 11 and Figs. 11.12, 11.13, 11.17 were first published in the Proceedings of the 4th Alvey Vision Conference:
Davies, E.R., 1988. An alternative to graph matching for locating objects from their salient features. In: Proceedings of 4th Alvey Vision Conference, Manchester, 31 August–2 September, pp. 281–286.
F.H. Sumner
For permission to reprint portions of the following article from State of the Art Report: Supercomputer Systems Technology as text in Chapter 8; and as Fig. 8.4:
Davies, E.R., 1982. Image processing. In: Sumner, F.H. (Ed.), State of the Art Report: Supercomputer Systems Technology. Pergamon Infotech, Maidenhead, pp. 223–244.
Royal Holloway, University of London
For permission to reprint extracts from the following examination questions, originally written by E.R. Davies:
EL385/97/2; EL333/98/2; EL333/99/2, 3, 5, 6; EL333/01/2, 4–6; PH5330/98/3, 5; PH5330/03/1–5; PH4760/04/1–5.
University of London
For permission to reprint extracts from the following examination questions, originally written by E.R. Davies:
PH385/92/2, 3; PH385/93/1–3; PH385/94/1–4; PH385/95/4; PH385/96/3, 6; PH433/94/3, 5; PH433/96/2, 5.
Collectors of publicly available image databases and utilities
To acknowledge use of the following image databases and utilities for generating a number of images presented in Chapters 15 and 21:
The Cambridge semantic segmentation online demo
The images in Fig. 15.14 were processed using the online demo available from the University of Cambridge, UK (see Badrinarayanan et al., 2015) at
http://mi.eng.cam.ac.uk/projects/segnet/ (website accessed 07.10.16).
The CMU image dataset
The newsradio image used to obtain Fig. 21.6 was taken from Test Set C—collected at CMU by Rowley, H.A., Baluja, S., and Kanade, T.—and is described in their paper:
Rowley, H.A., Baluja, S., Kanade, T., 1998. Neural network-based face detection. IEEE Trans. Pattern Anal. Mach. Intell. 20(1), 23–38.
It may be downloaded from the website:
http://vasc.ri.cmu.edu/idb/html/face/frontal_images/ (website accessed 20.04.17).
The Bush LFW dataset
The images of George W. Bush used in Chapter 21 were taken from the set collected at the University of Massachusetts:
Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E., 2007. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. University of Massachusetts, Amherst, Technical Report 07-49, October.
The database may be downloaded from the website:
http://vis-www.cs.umass.edu/lfw/ (website accessed 20.04.17).
Topics Covered in Application Case Studies
Influences Impinging Upon Integrated Vision System Design
Glossary of Acronyms and Abbreviations
1-D one dimension/one-dimensional
2-D two dimensions/two-dimensional
3-D three dimensions/three-dimensional
AAM active appearance model
ACM Association for Computing Machinery (USA)
ADAS advanced driver assistance system
AFW annotated faces in the wild
AI artificial intelligence
ANN artificial neural network
AP average precision
APF auxiliary particle filter
ASCII American Standard Code for Information Interchange
ASIC application specific integrated circuit
ASM active shape model
ATM automated teller machine
AUC area under curve
AVI audio video interleave
BCVM between-class variance method
BRDF bidirectional reflectance distribution function
BetaSAC beta [distribution] sampling consensus
BMVA British Machine Vision Association
BPTT backpropagation through time
CAD computer-aided design
CAM computer-aided manufacture
CCTV closed-circuit television
CDF cumulative distribution function
CLIP cellular logic image processor
CNN convolutional neural network
CPU central processor unit
CRF conditional random field
DCSM distinct class based splitting measure
DET Beaudet determinant operator
DG differential gradient
DN Dreschler–Nagel corner detector
DNN deconvolution network
DoF degree of freedom
DoG difference of Gaussians
DPM deformable parts models
EM expectation maximization
EURASIP European Association for Signal Processing
f.c. fully connected
FAR frontalization for alignment and recognition
FAST features from accelerated segment test
FCN fully convolutional network
FDDB face detection data set and benchmark
FDR face detection and recognition
FFT fast Fourier transform
FN false negative
fnr false negative rate
FoE focus of expansion
FoV field of view
FP false positive
FPGA field programmable gate array
FPP full perspective projection
fpr false positive rate
GHT generalized Hough transform
GLOH gradient location and orientation histogram
GMM Gaussian mixture model
GPS global positioning system
GPU graphics processing unit
GroupSAC group sampling consensus
GVM global valley method
HOG histogram of orientated gradients
HSI hue, saturation, intensity
HT Hough transform
IBR intensity extrema-based region detector
IDD integrated directional derivative
IEE Institution of Electrical Engineers (UK)
IEEE Institute of Electrical and Electronics Engineers (USA)
IET Institution of Engineering and Technology (UK)
ILSVRC ImageNet large-scale visual recognition object challenge
ILW iterated likelihood weighting
IMPSAC importance sampling consensus
IoP Institute of Physics (UK)
IRLFOD image-restricted, label-free outside data
ISODATA iterative self-organizing data analysis
JPEG/JPG Joint Photographic Experts Group
k-NN k-nearest neighbor
KL Kullback–Leibler
KR Kitchen–Rosenfeld corner detector
LED light emitting diode
LFF local-feature-focus method
LFPW labeled face parts in the wild
LFW labeled faces in the wild
LIDAR light detection and ranging
LMedS least median of squares
LoG Laplacian of Gaussian
LRN local response normalization
LS least squares
LSTM long short-term memory
LUT lookup table
MAP maximum a posteriori
MDL minimum description length
ML machine learning
MLP multi-layer perceptron
MoG mixture of Gaussians
MP microprocessor
MSER maximally stable extremal region
NAPSAC n adjacent points sample consensus
NIR near infra-red
NN nearest neighbor
OCR optical character recognition
OVR one versus the rest
PASCAL Network of Excellence on pattern analysis, statistical modeling and computational learning
PC personal computer
PCA principal components analysis
PE processing element
PnP perspective n-point
PPR probabilistic pattern recognition
PR pattern recognition
PROSAC progressive sample consensus
PSF point spread function
R-CNN regions with CNN features
RAM random access memory
RANSAC random sample consensus
RBF radial basis function [classifier]
RELU rectified linear unit
RGB red, green, blue
RHT randomized Hough transform
RKHS reproducing kernel Hilbert space
RMS root mean square
RNN recurrent neural network
ROC receiver–operator characteristic
RoI region of interest
RPS Royal Photographic Society (UK)
s.d. standard deviation
SFC Facebook social face classification
SFOP scale-invariant feature operator
SIFT scale invariant feature transform
SIMD single instruction stream, multiple data stream
SIR sampling importance resampling
SIS sequential importance sampling
SISD single instruction stream, single data stream
SOC sorting optimization curve
SOM self-organizing map
SPIE Society of Photo-optical Instrumentation Engineers
SPR statistical pattern recognition
STA spatiotemporal attention [neural network]
SURF speeded-up robust features
SUSAN smallest univalue segment assimilating nucleus
SVM support vector machine
TM template matching
TMF truncated median filter
TN true negative
tnr true negative rate
TP true positive
tpr true positive rate
TV television
USEF unit step edge function
VGG Visual Geometry Group (Oxford)
VJ Viola–Jones
VLSI very large scale integration
VMF vector median filter
VOC visual object classes
VP vanishing point
WPP weak perspective projection
YOLO you only look once
YTF YouTube faces
ZH Zuniga–Haralick corner detector
Chapter 1
Vision, the challenge
Abstract
This chapter introduces the subject of computer vision. It shows how recognition may be performed partly by image processing, although abstract pattern recognition methods are usually needed to complete the task. Important in this process is normalization of the image content to reduce variability so that statistical pattern recognizers such as the nearest neighbor algorithm can carry out their task with limited training requirements and low error rates. It extends the discussion by introducing machine learning and the recently prominent deep learning networks. This chapter also discusses the various applications of vision, contrasting automated visual inspection and surveillance.
Keywords
Computer vision; process of recognition; nearest neighbor algorithm; template matching; image preprocessing; need for normalization; machine learning; deep learning networks; automated visual inspection; surveillance
1.1 Introduction—Man and His Senses
Of the five senses—vision, hearing, smell, taste, and touch—vision is undoubtedly the one that man has come to depend upon above all others, and indeed the one that provides most of the data he receives. Not only do the input pathways from the eyes provide megabits of information at each glance but also the data rates for continuous viewing probably exceed 10 Mbps. However, much of this information is redundant and is compressed by the various layers of the visual cortex, so that the higher centers of the brain have to interpret abstractly only a small fraction of the data. Nonetheless, the amount of information the higher centers receive from the eyes must be at least two orders of magnitude greater than all the information they obtain from the other senses.
Another feature of the human visual system is the ease with which interpretation is carried out. We see a scene as it is—trees in a landscape, books on a desk, widgets in a factory. No obvious deductions are needed and no overt effort is required to interpret each scene; in addition, answers are effectively immediate and are normally available within a tenth of a second. Just now and again some doubt arises—e.g., a wire cube might be seen correctly or inside out. This and a host of other optical illusions are well known, although for the most part we can regard them as curiosities—irrelevant freaks of nature. Somewhat surprisingly, illusions are quite important, since they reflect hidden assumptions that the brain is making in its struggle with the huge amounts of complex visual data it is receiving. We have to pass by this story here (although it resurfaces now and again in various parts of this book). However, the important point is that we are for the most part unaware of the complexities of vision. Seeing is not a simple process: it is just that vision has evolved over millions of years, and there was no particular advantage in evolution giving us any indication of the difficulties of the task (if anything, to have done so would have cluttered our minds with irrelevant information and slowed our reaction times).
In the present day and age, man is trying to get machines to do much of his work for him. For simple mechanistic tasks this is not particularly difficult, but for more complex tasks the machine must be given the sense of vision. Efforts have been made to achieve this, sometimes in modest ways, for well over 40 years. At first, schemes were devised for reading, for interpreting chromosome images, and so on; but when such schemes were confronted with rigorous practical tests, the problems often turned out to be more difficult than anticipated. Generally, researchers react to finding that apparent trivia are getting in the way by intensifying their efforts and applying great ingenuity, and this was certainly so with early efforts at vision algorithm design. However, it soon became plain that the task really is a complex one, in which numerous fundamental problems confront the researcher, and the ease with which the eye can interpret scenes turned out to be highly deceptive.
Of course, one of the ways in which the human visual system gains over the machine is that the brain possesses more than 10¹⁰ cells (or neurons), some of which have well over 10,000 contacts (or synapses) with other neurons. If each neuron acts as a type of microprocessor, then we have an immense computer in which all the processing elements can operate concurrently. Taking the largest single man-made computer to contain several hundred million rather modest processing elements, the majority of the visual and mental processing tasks that the eye–brain system can perform in a flash have no chance of being performed by present-day man-made systems. Added to these problems of scale, there is the problem of how to organize such a large processing system and also how to program it. Clearly, the eye–brain system is partly hard-wired by evolution but there is also an interesting capability to program it dynamically by training during active use. This need for a large parallel processing system with the attendant complex control problems shows that computer vision must indeed be one of the most difficult intellectual problems to tackle.
So what are the problems involved in vision that make it apparently so easy for the eye, yet so difficult for the machine? In the next few sections an attempt is made to answer this question.
1.2 The Nature of Vision
1.2.1 The Process of Recognition
This section illustrates the intrinsic difficulties of implementing computer vision, starting with an extremely simple example—that of character recognition. Consider the set of patterns shown in Fig. 1.1A. Each pattern can be considered as a set of 25 bits of information, together with an associated class indicating its interpretation. In each case imagine a computer learning the patterns and their classes by rote. Then any new pattern may be classified (or recognized) by comparing it with this previously learnt training set, and assigning it to the class of the nearest pattern in the training set. Clearly, test pattern (1) (Fig. 1.1B) will be allotted to class U on this basis. Chapter 13, Basic Classification Concepts, shows that this method is a simple form of the nearest neighbor approach to pattern recognition.
Figure 1.1 Some simple 25-bit patterns and their recognition classes used to illustrate some of the basic problems of recognition: (A) training set patterns (for which the known classes are indicated); (B) test patterns.
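The rote-learning scheme just described can be sketched in a few lines of Python. The two 5×5 training patterns below are illustrative stand-ins, not the actual patterns of Fig. 1.1, and Hamming distance is used as the (obvious) dissimilarity measure for bit patterns:

```python
# Nearest neighbor classification of small binary patterns by Hamming
# distance -- a minimal sketch of the rote-learning scheme described
# above. The training patterns are illustrative, not those of Fig. 1.1.

def hamming(a, b):
    """Number of bit positions in which two flattened patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(test, training_set):
    """Assign the class of the nearest training pattern."""
    return min(training_set, key=lambda tc: hamming(test, tc[0]))[1]

# Each pattern is a 25-bit (5x5) image flattened to a tuple.
U = (1,0,0,0,1,
     1,0,0,0,1,
     1,0,0,0,1,
     1,0,0,0,1,
     0,1,1,1,0)
C = (0,1,1,1,0,
     1,0,0,0,0,
     1,0,0,0,0,
     1,0,0,0,0,
     0,1,1,1,0)
training_set = [(U, 'U'), (C, 'C')]

# A noisy U (one bit flipped) is still nearer to U than to C.
noisy_U = list(U)
noisy_U[12] ^= 1
print(classify(tuple(noisy_U), training_set))  # -> U
```

As the text goes on to explain, this works well for mild noise and distortion but fails for displaced or misorientated test patterns, since Hamming distance has no built-in tolerance to shifts.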
The scheme outlined above seems straightforward and is indeed highly effective, even being able to cope with situations where distortions of the test patterns occur or where noise is present: this is illustrated by test patterns (2) and (3). However, this approach is not always foolproof. First, there are situations where distortions or noise is excessive, so errors of interpretation arise. Second, there are situations where patterns are not badly distorted or subject to obvious noise, yet are misinterpreted: this seems much more serious, since it indicates an unexpected limitation of the technique rather than a reasonable result of noise or distortion. In particular, these problems arise where the test pattern is displaced or misorientated relative to the appropriate training set pattern, as with test pattern (6).
As will be seen in Chapter 13, Basic Classification Concepts, there is a powerful principle that indicates why the unlikely limitation given above can arise: it is simply that there are insufficient training set patterns, and that those that are present are insufficiently representative of what will arise in practical situations. Unfortunately, this presents a major difficulty, since providing enough training set patterns incurs a serious storage problem and an even more serious search problem when patterns are tested. Furthermore, it is easy to see that these problems are exacerbated as patterns become larger and more realistic (obviously, the examples of Fig. 1.1 are far from having enough resolution even to display normal type-fonts). In fact, a combinatorial explosion takes place: this is normally taken to mean that one or more parameters produce fast-varying (often exponential) effects, which explode as the parameters increase by modest amounts. Forgetting for the moment that the patterns of Fig. 1.1 have familiar shapes, let us temporarily regard them as random bit patterns. Now the number of bits in these N×N patterns is N², so the number of possible patterns of this size is 2 raised to the power N²: even in a case where N is as low as 20, this gives 2⁴⁰⁰ ≈ 10¹²⁰ patterns, so remembering all these patterns and their interpretations would be impossible on any practical machine, and searching systematically through them would take impracticably long (involving times of the order of the age of the universe). Thus it is not only impracticable to consider such brute force means of solving the recognition problem, but is also effectively impossible theoretically. These considerations show that other means are required to tackle the problem.
1.2.2 Tackling the Recognition Problem
An obvious means of tackling the recognition problem is to standardize the images in some way. Clearly, normalizing the position and orientation of any 2D picture object would help considerably: indeed this would reduce the number of degrees of freedom by three. Methods for achieving this involve centralizing the objects—arranging that their centroids are at the center of the normalized image—and making their major axes (e.g., deduced by moment calculations) vertical or horizontal. Next, we can make use of the order that is known to be present in the image—and here it may be noted that very few patterns of real interest are indistinguishable from random dot patterns. This approach can be taken further: if patterns are to be nonrandom, isolated noise points may be eliminated. Ultimately, all these methods help by making the test pattern closer to a restricted set of training set patterns (although care must also be taken to process the training set patterns initially so that they are representative of the processed test patterns).
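The centroid-and-orientation normalization just described follows directly from moment definitions. The sketch below is one minimal way of realizing it, with a binary object represented simply as a list of (x, y) foreground coordinates; this representation, and the function names, are illustrative choices rather than anything prescribed in the text:

```python
# Position/orientation normalization via image moments -- a sketch.
# A binary object is represented as a list of (x, y) foreground points.
import math

def centroid(points):
    n = len(points)
    return (sum(x for x, y in points) / n, sum(y for x, y in points) / n)

def orientation(points):
    """Angle of the major axis, from second-order central moments."""
    cx, cy = centroid(points)
    mu20 = sum((x - cx) ** 2 for x, y in points)
    mu02 = sum((y - cy) ** 2 for x, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)

def normalize(points):
    """Translate the centroid to the origin and rotate the major axis
    to the horizontal, removing three degrees of freedom."""
    cx, cy = centroid(points)
    t = -orientation(points)
    c, s = math.cos(t), math.sin(t)
    return [((x - cx) * c - (y - cy) * s,
             (x - cx) * s + (y - cy) * c) for x, y in points]

# A bar of points along a 45-degree line normalizes to a horizontal bar.
bar = [(i, i) for i in range(10)]
print(normalize(bar)[:2])
```

Note that, as the text warns, the training set patterns must be passed through the same normalization so that test and training patterns remain comparable.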
It is useful to consider character recognition further. Here we can make additional use of what is known about the structure of characters—namely, that they consist of limbs of roughly constant width. In that case the width carries no useful information, so the patterns can be thinned to stick figures (called skeletons—see Chapter 8: Binary Shape Analysis); then, hopefully, there is an even greater chance that the test patterns will be similar to appropriate training set patterns (Fig. 1.2). This process can be regarded as another instance of reducing the number of degrees of freedom in the image, and hence of helping to minimize the combinatorial explosion—or, from a practical point of view, to minimize the size of the training set necessary for effective recognition.
Figure 1.2 Use of thinning to regularize character shapes. Here character shapes of different limb widths—or even varying limb widths—are reduced to stick figures or skeletons. Thus irrelevant information is removed and at the same time recognition is facilitated.
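Chapter 8 develops thinning properly; as a foretaste, the effect shown in Fig. 1.2 can be obtained with the classical Zhang–Suen algorithm, which is one standard choice among many and is not necessarily the method the later chapter favors:

```python
# Thinning a binary image to a skeleton using the classical Zhang-Suen
# algorithm -- one standard choice; Chapter 8 treats thinning fully.

def thin(img):
    """img: list of lists of 0/1; returns a thinned copy."""
    img = [row[:] for row in img]
    h, w = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if not img[y][x]:
                        continue
                    # Neighbors P2..P9, clockwise from north.
                    p = [img[y-1][x], img[y-1][x+1], img[y][x+1],
                         img[y+1][x+1], img[y+1][x], img[y+1][x-1],
                         img[y][x-1], img[y-1][x-1]]
                    b = sum(p)                         # nonzero neighbors
                    a = sum(p[i] == 0 and p[(i+1) % 8] == 1
                            for i in range(8))         # 0->1 transitions
                    if step == 0:
                        cond = p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0
                    else:
                        cond = p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            for y, x in to_delete:
                img[y][x] = 0
            changed = changed or bool(to_delete)
    return img

# Demonstration: a 3-pixel-thick horizontal bar thins to a stick figure.
bar = [[0] * 12 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 11):
        bar[y][x] = 1
skeleton = thin(bar)
print(sum(map(sum, bar)), "->", sum(map(sum, skeleton)))
```

The transition count `a == 1` is what preserves connectedness: a pixel is only deleted if removing it cannot break the figure apart, which is exactly the subtlety that makes connectedness in binary images an intricate topic.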
Next, consider a rather different way of looking at the problem. Recognition is necessarily a problem of discrimination—i.e., of discriminating between patterns of different classes. However, in practice, considering the natural variation of patterns, including the effects of noise and distortions (or even the effects of breakages or occlusions), there is also a problem of generalizing over patterns of the same class. In practical problems there is a tension between the need to discriminate and the need to generalize. Nor is this a fixed situation. Even for the character recognition task, some classes are so close to others (n’s and h’s will be similar) that less generalization is possible than in other cases. On the other hand, extreme forms of generalization arise when, for example, an A is to be recognized as an A whether it is a capital or small letter, or in italic, bold, suffix, or other form of font—even if it is handwritten. The variability is determined largely by the training set initially provided. What we emphasize here, however, is that generalization is as necessary a prerequisite to successful recognition as is discrimination.
At this point it is worth considering more carefully the means whereby generalization was achieved in the examples cited above. First, objects were positioned and orientated appropriately; second, they were cleaned of noise spots; and third, they were thinned to skeleton figures (although the latter process is relevant only for certain tasks such as character recognition). In the last case, we are generalizing over characters drawn with all possible limb widths, width being an irrelevant degree of freedom for this type of recognition task. Note that we could have generalized the characters further by normalizing their size and saving another degree of freedom. The common feature of all these processes is that they aim to give the characters a high level of standardization against known types of variability before finally attempting to recognize them.
The standardization (or generalization) processes outlined above are all realized by image processing, i.e., the conversion of one image into another by suitable means. The result is a two-stage recognition scheme: first, images are converted into more amenable forms containing the same numbers of bits of data; and second, they are classified with the result that their data content is reduced to very few bits (Fig. 1.3). In fact, recognition is a process of data abstraction, the final data being abstract and totally unlike the original data. Thus we must imagine a letter A starting as an array of perhaps 20×20 bits arranged in the form of an A, and then ending as the 7 bits in an ASCII representation of an A, namely 1000001 (which is essentially a random bit pattern bearing no resemblance to an A).
Figure 1.3 The two-stage recognition paradigm: C, input from camera; G, grab image (digitize and store); P, preprocess; R, recognize (i, image data; a, abstract data). The classical paradigm for object recognition is that of (1) preprocessing (image processing) to suppress noise or other artefacts and to regularize the image data and (2) applying a process of abstract (often statistical) pattern recognition to extract the very few bits required to classify the object.
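The data abstraction quoted above is easy to check for oneself:

```python
# 'A' has ASCII code 65, which is the 7-bit pattern 1000001
code = format(ord('A'), '07b')
print(code)                        # -> 1000001
# recognition abstracts a 20x20 = 400-bit image down to these 7 bits
print(20 * 20, 'bits ->', len(code), 'bits')
```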
The last paragraph reflects to a large extent the history of image analysis. Early on, a good proportion of the image analysis problems being tackled were envisaged as consisting of an image preprocessing
task carried out by image processing techniques, followed by a recognition task undertaken by pure pattern recognition methods (see Chapter 13: Basic Classification Concepts). These two topics—image processing and pattern recognition—consumed much research effort and effectively dominated the subject of image analysis, while intermediate-level
approaches such as the Hough transform were, for a time, slower to develop. One of the aims of this book is to ensure that such intermediate-level processing techniques are given due emphasis, and indeed that the best range of techniques is applied to any computer vision task.
1.2.3 Object Location
The problem that was tackled above—that of character recognition—is a highly constrained one. In a great many practical applications it is necessary to search pictures for objects of various types, rather than just interpreting a small area of a picture.
Search is a task that can involve prodigious amounts of computation and is also subject to a combinatorial explosion. Imagine the task of searching for a letter E in a page of text. An obvious way of achieving this is to move a suitable template
of size n×n over the whole image, of size N×N, and to find where a match occurs (Fig. 1.4). A match can be defined as a position where there is exact agreement between the template and the local portion of the image but, in keeping with the ideas of Section 1.2.1, it will evidently be more relevant to look for a best local match (i.e., a position where the match is locally better than in adjacent regions) and where the match is also good in some more absolute sense, indicating that an E is present.
Figure 1.4 Template matching, the process of moving a suitable template over an image to determine the precise positions at which a match occurs, hence revealing the presence of objects of a particular type.
One of the most natural ways of checking for a match is to measure the Hamming distance between the template and the local n×n region of the image, i.e., to sum the number of differences between corresponding bits. This is essentially the process described in Section 1.2.1. Then places with a low Hamming distance are places where the match is good. These template-matching ideas can be extended to cases where the corresponding bit positions in the template and the image do not just have binary values but may have intensity values over a range 0–255. In that case the sums obtained are no longer Hamming distances but may be generalized to the form:
D = Σ |Iᵢ − Iₜ|  (1.1)

Iₜ being the local template value, Iᵢ being the local image value, and the sum being taken over the area of the template. This makes template matching practicable in many situations: the possibilities are examined in more detail in subsequent chapters.
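As a concrete sketch (the function names here are illustrative, not from the text), the score of Eq. (1.1) and an exhaustive best-match search can be written as:

```python
def match_score(image, template, r0, c0):
    """Eq. (1.1): sum of absolute differences between the template and the
    image region whose top-left corner is (r0, c0). For binary (0/1) data
    this sum is exactly the Hamming distance."""
    n = len(template)
    return sum(abs(image[r0 + i][c0 + j] - template[i][j])
               for i in range(n) for j in range(n))

def best_match(image, template):
    """Exhaustive search: slide the template over all N - n + 1 positions
    in each direction and return the (row, col) giving the lowest score."""
    N, n = len(image), len(template)
    positions = [(r, c) for r in range(N - n + 1) for c in range(N - n + 1)]
    return min(positions, key=lambda p: match_score(image, template, *p))

# plant a 2x2 template in an otherwise empty 5x5 image and find it again
T = [[1, 0],
     [0, 1]]
scene = [[0] * 5 for _ in range(5)]
scene[2][3], scene[3][4] = 1, 1
print(best_match(scene, T))        # -> (2, 3)
```

A low score marks a good match; in practice one looks for a best local minimum rather than demanding an exact (zero-score) agreement.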
We referred above to a combinatorial explosion in this search problem too. The reason this arises is as follows. First, when a 5×5 template is moved over an N×N image in order to look for a match, the number of operations required is of the order of 5²N², totaling some 1.6 million operations for a 256×256 image. The problem is that when larger objects are being sought in an image, the number of operations increases as the square of the size of the object, the total number of operations being N²n² when an n×n template is used. For a 30×30 template and a 256×256 image, the number of operations required rises to ~60 million. Note that, in general, a template will be larger than the object it is used to search for, because some background will have to be included to help demarcate the object.
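The N²n² estimate is easily verified by direct arithmetic:

```python
def ops(N, n):
    # crude operation count for exhaustive template matching:
    # roughly n*n comparisons at each of the N*N template positions
    return N * N * n * n

print(ops(256, 5))     # 1,638,400: of the order of a million
print(ops(256, 30))    # 58,982,400: roughly 60 million
```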
Next, recall that in general, objects may appear in many orientations in an image (E’s on a printed page are exceptional). If we imagine a possible 360 orientations (i.e., one per degree of rotation), then a corresponding number of templates will in principle have to be applied in order to locate the object. This additional degree of freedom pushes the search effort and time to enormous levels, so far away from the possibility of real-time implementation that new approaches must be found for tackling the task. [Real-time
is a commonly used phrase meaning that the information has