Matlab Opencv Visual PSeye2

Technische Universit at Berlin Faculty IV Computer Science and Electrical Engineering Computer Graphics Group http://www.cg.tu-berlin.
de
Combining Diuse Illumination and Frustrated Total Internal Reection for touch detection
by Andreas Holzammer Matrikelnumber: 300708 Berlin, October 22, 2009
Supervisor Uwe Hahne Examiners Prof. Dr. Marc Alexa Prof. Dr.-Ing. Olaf Hellwich
Erkl arung
Die selbstst andige und eigenh andige Anfertigung versichere ich an Eides Statt. Berlin, den 22. Oktober 2009.
Andreas Holzammer
Zusammenfassung Es gibt viele Techniken um mehrere Ber uhrungspunkte auf einem Bildschirm zu erkennen. Die verschiedenen Techniken haben jeweils Vorund Nachteile. Manche Techniken erfordern viel Druck, andere erkennen Ber uhrungen schon wenn der Nutzer noch nicht die Ober ache ber uhrt oder die Anzahl der gleichzeitigen Ber uhrungen sind eingeschr ankt. Viele dieser Techniken werden nur dazu eingesetzt Ber uhrungspunkte zu ande erkennen k onnten. Eine Kombinabestimmen, obwohl manche H tion von diesen Techniken k onnte die Vorteile vereinen und somit die aftigt sich mit der Nachteile kompensieren. Diese Diplomarbeit besch Kombination von zwei optischen Techniken; Frustrated Total Internal Reection und Diused Illumination. Diese Techniken verwenden infrarotes Licht das an Fingerspitzen/H anden reektiert und durch eine Kamera aufgezeichnet wird. Es werden verschiedene Techniken vorgestellt und diskutiert weshalb gerade diese beiden Techniken kombiniert werden sollten. Des Weiteren wird ein Tischaufbau beschrieben, der die zwei Techniken vereint. F ur die Bilddarstellung wird ein Bild von unten auf die Tischplatte projiziert. Im laufe der Diplomarbeit wurde eine Software entwickelt, die es dem Entwickler erm oglicht schnell und unkompliziert verschiedene Techniken zum erkennen von Ber uhrungspunkten zu testen. Diese Software kann Bilder von einer angeschlossenen Kamera aufnehmen, vorverarbeiten, analysieren, die analysierten Daten weiterverarbeiten und das Ergebnis an ein Nutzerprogramm schicken. Auerdem k onnen noch mehr Informationen, als nur Ber uhrungspunkte extrahiert werden wie die Zuordnung zwischen Ber uhrungspunkt und Hand, Hand Orientierung und Abstand zwischen Ber uhrungspunkt und der Ober ache. Abschlieend wird ein Nutzerprogamm pr asentiert, das diese Zusatzinformationen verarbeiten kann.
Abstract There are many dierent approaches for detecting multiple touches on device surfaces, which all have their own advantages and disadvantages. Some of the approaches require a lot of pressure to be activated; others are activated even if the user is only close to the surface or are restricted by the number of touches they can detect simultaneously. Most of the technologies are only used to detect touches, but some of the technologies can be used to detect hands. To use the advantages and overcome the disadvantages of the individual technologies a combination of technologies should be researched. This thesis presents a combination of two optical technologies, which are called Frustrated Total Internal Reection and Diuse Illumination. These technologies work with infrared light reected by ngertips/hands, captured by a camera. Other multi-touch technologies are presented and it is discussed why these two selected technologies should be combined. A tabletop hardware setup is presented, which combines both technologies in one setup. For displaying an image onto the touch surface a projector is used, which projects the image from behind. In the process of this thesis an easy to use software was developed, for rapidly testing various processing steps which are needed for the detection process. With this software images can be captured, preprocessed, analyzed, resulting information post-processed and afterwards send to an application. Additional information can be derived from these technologies, like the aliation between ngers and a hand, hand orientation and depth information of touches. Furthermore an application has been created that uses these additional information. Keywords: Touch detection, Multi-touch, Diuse Illumination, Frustrated Total Internal Reection, Hand detection
Acknowledgments First, I would like to thank my parents for their consideration and support as I prepared this diploma thesis. I would also like to thank Bj orn Breitmeyer for all the support that he has given me over the years in my studies and especially during my thesis preparation. I am grateful to Uwe Hahne for supervising me, for great support and for helpful suggestions. I also want to thank Jonas Pfeil for his ideas and great lab days. I want to thank Bj orn Bollensdorf for assistance regarding the hardware and ideas. I like to thank Matthias Eitz for his great support he gave us and the interest he put into our project. I also want to thank my brother for supporting me so much and getting me out to go biking from time to time. I want to thank Prof. Dr. Marc Alexa for examining this work and Prof. Dr.-Ing. Olaf Hellwich for co-examining this work. I want to express my gratitude to all people that proof read this thesis; Rudolf Jacob, Melanie Ott and many others. I also want to thank all the others whom I missed.
Contents
Contents
1 Introduction 1.1 Motivation . . . 1.2 Goal . . . . . . . 1.3 Design of System 1.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 3 4 4 7 7 8 10 11 12 13 13 14 15 15 15 16 17 18 19 20 22 22 24 24 25 25 26 27 27 27 28 28 28 29 30 30
2 Touch-Sensing Technologies 2.1 Frustrated Total Internal Reection (FTIR) 2.2 Diused Illumination (DI) . . . . . . . . . . 2.3 Diused Surface Illumination (DSI) . . . . . 2.4 Laser Light Plane (LLP) . . . . . . . . . . . 2.5 LED Light Plane (LED-LP) . . . . . . . . . 2.6 Resistance-Based Touch Surfaces . . . . . . 2.7 Capacitance-Based Touch Surfaces . . . . . 2.8 Discussion . . . . . . . . . . . . . . . . . . . 3 Hardware 3.1 Assembly . . . . . . . . . . . . . . . . 3.2 Old Setup . . . . . . . . . . . . . . . . 3.3 Camera . . . . . . . . . . . . . . . . . 3.4 Infrared Bandpass Filter . . . . . . . . 3.5 Lens . . . . . . . . . . . . . . . . . . . 3.6 Projector . . . . . . . . . . . . . . . . 3.7 Infrared Light . . . . . . . . . . . . . . 3.8 Surface Layers . . . . . . . . . . . . . 3.8.1 Compliant Layer . . . . . . . . 3.8.2 Projection Screen . . . . . . . . 3.8.3 Protective Layer . . . . . . . . 3.8.4 Dierent Surface Layer Setups 3.9 Switching Circuit . . . . . . . . . . . . 3.10 Power Supply . . . . . . . . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
4 Algorithms 4.1 Image Preprocessing . . . . . . . . . . . . 4.1.1 Bright Image Removal . . . . . . . 4.1.2 Ambient Light Subtraction . . . . 4.1.3 Background Subtraction . . . . . . 4.1.4 Hotspot Removal . . . . . . . . . . 4.1.5 Image Normalization of DI Images 4.2 Feature Detection . . . . . . . . . . . . . . 4.2.1 Touch Detection . . . . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
Combining DI and FTIR for touch detection
Andreas Holzammer
II 4.2.2 Hand Detection . . . 4.2.3 Fingertip Detection 4.2.4 Hand Orientation . . Post-processing . . . . . . . 4.3.1 Undistortion . . . . 4.3.2 Tracking . . . . . . . 4.3.3 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Contents . . . . . . . . . . . . . . . . . . . . . 31 31 35 36 36 37 38 39 40 41 42 43
4.3
5 Combining Frustrated Total Internal Reection and Diused Illumination 5.1 Images of Frustrated Total Internal Reection and Diuse Illumination 5.2 Processing Pineline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6 DIFTIRTracker 45 6.1 Graphical User Interface . . . . . . . . . . . . . . . . . . . . . . . . . 46 6.2 Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 6.3 Network Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 7 Results 7.1 Proof of Concept . . 7.2 Informal User Study 7.3 Conclusion . . . . . 7.4 Future Work . . . . 49 49 51 54 55 57 57 59 60 60 60 61 62 VII
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
8 Appendix 8.1 Several Spectra of Infrared Bandpass Filters 8.2 Projector list . . . . . . . . . . . . . . . . . 8.3 Software . . . . . . . . . . . . . . . . . . . . 8.3.1 Community Core Vision (CCV) . . . 8.3.2 CG Tracker . . . . . . . . . . . . . . 8.3.3 reacTIVision . . . . . . . . . . . . . 8.3.4 Touchlib . . . . . . . . . . . . . . . . Bibliography
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
October 22, 2009
List of Figures
III
List of Figures
1.1 1.2 1.3 2.1 2.2 2.3 2.4 2.5 2.6 2.7 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10 3.11 3.12 3.13 3.14 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 5.1 5.2 5.3 5.4 5.5 5.6 Popularity of the search terms multi touch . . . . . . . . . . . . . Multi-touch Table of Computer Graphics institute . . . . . . . . . . Parts of the multi-touch table . . . . . . . . . . . . . . . . . . . . . . General FTIR setup . . . . . . . . . . . . . Coupling infrared light into an acrylic plate General DI setup . . . . . . . . . . . . . . . General DSI setup . . . . . . . . . . . . . . Basic Laser Light Plane setup . . . . . . . . Occlusion of ngers . . . . . . . . . . . . . . Basic LED Light Plane setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 3 4 8 8 9 10 11 12 12 15 16 16 18 19 19 20 20 21 22 22 24 25 26 29 30 32 32 34 36 37 37 39 40 40 41 41 42
A basic optical hardware assembly . . . . . . Point Grey Firey MV . . . . . . . . . . . . . Spectrum of the Point Grey Firey MV . . . Infrared bandpass lter from Midwest Optical Calculation of lens distance . . . . . . . . . . Distortion of the lens . . . . . . . . . . . . . . Principle of ultra-short-throw projector . . . Acer S1200 ultra-short-throw projector . . . . Osram SFH 4250 . . . . . . . . . . . . . . . . Etching layout . . . . . . . . . . . . . . . . . Placement of the infrared illuminators . . . . Streaks . . . . . . . . . . . . . . . . . . . . . Surface layers for an FTIR setup . . . . . . . Switching Circuit . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Hotspot . . . . . . . . . . . . . . . . . . . . . . . . . . Illumination of the surface . . . . . . . . . . . . . . . . Convexity Defects of a Hand . . . . . . . . . . . . . . Smoothed and non-smoothed contour . . . . . . . . . Dominant point detection . . . . . . . . . . . . . . . . Orientation angle theta, derived from central moments Example image of checkerboard . . . . . . . . . . . . . States of a touch, derived by tracking touches . . . . . Idea of the thesis . . . . . . . . . . . . . . . Comparison of hand touch with pressure . . Comparison of hand touch with no pressure Comparison of at hand touch . . . . . . . Comparison of touches close together . . . . FTIR, DI pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Andreas Holzammer
IV 5.7 5.8 6.1 6.2 6.3 7.1 7.2 7.3 7.4 7.5 7.6 7.7 8.1 8.2 8.3 8.4 8.5 8.6 8.7
List of Figures FTIR and DI LEDs on vs multiplied . . . . . . . . . . . . . . . . . . FTIR and DI switched on vs multiplied . . . . . . . . . . . . . . . . DIFTIRTracker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parts of the DIFTIRTracker . . . . . . . . . . . . . . . . . . . . . . . Hand TUIO Package . . . . . . . . . . . . . . . . . . . . . . . . . . . Hand menu . . . . . . . . . . . . . . . . . . . . . . Determination if the hand is a right or a left hand Community Earth . . . . . . . . . . . . . . . . . . Gestures used by Community Earth . . . . . . . . Labyrinth application . . . . . . . . . . . . . . . . Pipelines with non combined technologies . . . . . Pipelines with combined technologies . . . . . . . . Spectrum of one overexposed photo negative Spectrum of two overexposed photo negative Spectrum of one oppy disk . . . . . . . . . . Spectrum of two oppy disk . . . . . . . . . . Community Core Vision (CCV) . . . . . . . . CG Tracker . . . . . . . . . . . . . . . . . . . ReacTIVision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 43 45 46 47 49 50 51 52 53 53 54 57 57 58 58 60 61 61
October 22, 2009
List of Tables
List of Tables
3.1 3.2 3.3 3.4 Specication Specication Specication Parallel port of the Point Grey Firey MV . of the lens . . . . . . . . . . . of the Acer S1200 . . . . . . . data pins used for switching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 18 20 26
Andreas Holzammer
Chapter 1 Introduction
There are several ways to interact with the computer. The oldest interaction method is the keyboard and later the mouse made a profound impact on in computer interaction. The Zuse Z3 (1941), the rst computer even had buttons to interact with the computer. Later these buttons formed a keyboard. The mouse was invented in 1963/1964 by a team around Douglas C. Engelbart and William English at the Stanford Research Institute (SRI). This mouse enabled the user to point in a 2D space, which indirectly manipulates the cursor on the computer monitor. These two methods are still widely used at the present time. Almost every computer has a keyboard and a mouse. This adds up to about 60 years of success for the keyboard and 40 years of success for the mouse. Many other interaction methods have been invented but no other technology has had as much success. Touchpads were introduced when notebooks became successful. This pad is placed beside the keyboard and can normally only track one ngertip. It is also small in size and without display technology. Today there are multi-touch touchpads, but with some limitations, such as the size of the pad and number of ngers it can detect. Experience has shown that users would prefere an interaction with the computer in a very simple and natural manner. They normally work with their hands, so a natural interface for the hand is needed. The user desires a visual feedback from the computer and wants to interact with the displayed content. Obviously it would be nice if the user is able to touch the visual feedback to interact with the computer. Touchscreens where invented in the late 1960s. But the rst commercial touchscreen computer, the HP-150, was presented not before 1983. These touchscreens could only detect one touch point. The user has two hands, ten ngers, and wants to use both hands to interact with his tools. For example, if a human wants to cut a tree branch into two parts, the user holds the branch with one hand and the saw with the other. He then cuts the branch into two parts. It is very natural to use two hands to work, although it is not the case for all tasks. Why should the user be restricted to using just one nger to interact with a computer? Humans often work with two hands very productively, but on the computer it has not always been so successful. Users would like to have a user interface which is very intuitive and sensitive enough to that little pressure is needed to interact with the device. The multi-touch technology enables the user to employ both hands and even use the computer with other people at the same time.
1. Introduction
Figure 1.1: Popularity of the search terms multi touch analyzed by Google Trends [29] from the beginning of 2004, peeks are labelled with main events, a) Je Han presented FTIR surface at TED, b) iPhone, c) Microsoft Surface, d) Microsoft Wall, e) iPhone3G, f) Windows 7 announced with multi-touch support
1.1 Motivation
Multi-touch is an interaction technology that allows the user to control the computer with several ngers. These multi-touch devices typically consist of a touch screen (e.g., computer display, table or wall), as well as a computer that is detecting such touches and producing the image. In the last couple of years multi-touch interfaces have become increasingly popular. This popularity can be seen in the search requests on Google for the term multi touch (see Figure 1.1). During the US presidential elections in 2008, Cable News Network (CNN) utilized a multi-touch screen in order to present interactive maps displaying the presidential race results in each state. Even though the multi touch technology was initially introduced in the 1970s, it did not gain popularity until Je Han presented his low-cost multi-touch sensing technology in 2005 [24]. Bill Buxton gave a good overview of the history of the multi-touch technologies on his website [8], where it is very interesting to see, which kind of devices were invented and at what time in the past. Hans, low-cost multitouch sensing technology is based on Total Internal Reection (FTIR) principle, which was rediscovered by him to detect touches. In 2007 Apple introduced their new multi-touch smartphone, called the iPhone, which is using an electrical eect to detect touches, which can be seen as second peak in Figure 1.1. The iPhones multitouch interactions were embraced by the user because of its ease of use. Microsoft then introduced their multi-touch table, called Surface in 2007 [43]. The table uses a dierent optical method than Je Hans, which is called Diused Illumination (DI). This technology can detect objects and interact with them. After that Microsoft invented a multi-touch wall. Then in 2008 Apple promoted its second version of the iPhone, which introduced a faster internet connection as well as global positioning system A-GPS. 2008 Microsoft announced the new Microsoft Windows would support multi-touch.
October 22, 2009
1.2 Goal
Figure 1.2: Multi-touch Table of Computer Graphics institute
User studies have shown that a direct manipulation with one or more ngers can increase performance dramatically in contrast to using a mouse [38]. Hence, if two hands are used instead of just one a higher performance can be achieved as Buxton et al. are stating [9]. Wigdor et al. [55] state that it is very important for user interaction satisfaction on a multi-touch device that there is accurate touch detection. The user becomes frustrated or even loses the sense of control if the system is not responding in the way the user expects it to. This can have several causes like the system not being responsive, hardware failing to detect the input, input delivering the wrong location or input not mapping to the expected function. The existing multi-touch technologies have their own advantages and disadvantages.
1.2 Goal
The goal of this diploma thesis is to enhance the touch detection of the multi-touch table at the Technical University Berlin, which is shown in Figure 1.2. The issue of the old table was the touch sensivity of the panel. Users must push very hard, particularly if the user drags his nger on the surface to interact with the table. Many people are not comfortable with pushing very hard while they are dragging a nger on surface, especially if the surface is very glossy. On the other hand with a dierent technology called diused illumination sensitivity is very high, but it is dicult to sense whether a user is really touching the surface or is just above the surface. The idea is to combine these two technologies to produce sensitive and accurate touch detection. This combination needs to be studied, not only how the touch information is derived, but also how connections between touches and the hand could be established to enhance the human-computer interaction. This information could be used to approximate how many people work on the table.
Andreas Holzammer
1. Introduction
1.3 Design of System
Client Touch Server
Figure 1.3: Parts of the multi-touch table The basic idea of the multi-touch table is to have one device with all the hardware that is required to detect various touches, which is shown in Figure 1.3. The computer underneath the table, which we call this the touch server, does all the touch detection. This touch server is providing all the data that is detected with the tables hardware to a client computer which runs an application. This client computer processes the data which is transfered via network.
1.4 Related Work

Many people are working on interaction models with the computer like the HumanComputer Interaction (HCI) community, which include the Association for Computer Machinery (ACM) Symposium on User Interface Software and Technology (UIST), ACM Conference on Human Factors in Computing Systems (CHI) and HCI International, even a user community has been created which conducts research in this eld, the Natural User Interface Group (NUI Group) [22]. A lot of these people are putting a great deal of eort into building their own multi-touch tables and even researchers contribute to this community. But there are also commercial eorts being made in multi-touch technology. Microsoft developed a multi-touch table and introduced it in May 2007. This table is 56 cm high, 53 cm deep and 107 cm wide. It has a 30-inch display and houses a Windows Vista computer inside. The display has a native resolution of 1024x768 pixels. This table can detect touches and recognize objects. Microsoft Surface also uses an optical method to detect touches and objects, which is called Diused Illumination. It uses an array of ve cameras for imaging. The computer that is integrated in the table is processing the data of these cameras and projects an image with a projector built into the tables surface. Microsoft also supplies a Software Development Kit (SDK) for writing own multi-touch applications. Apple developed the iPhone with a multi-touch interface, using an electronic technology that uses the capacity between the human body and the panel. Such sensors can be built fairly thin, but cannot scale as well, in contrast to the optical methods and are expensive to produce. Firmware and the design of the controller restrict the number of touches it can detect simultaneously. Apple also has a SDK for their iPhone to write own applications. But there are not only commercial products, for example, the UnMousePad [50],
October 22, 2009
1.4 Related Work
which is a exible and inexpensive multi-touch input device. It is just a pad that is pressure sensitive, but the authors say it can also be developed to be transparent as an overlay to displays. It uses a principle called Interpolating Force Sensitive Resistance, which is a electric method of sensing multiple touches on a surface. The authors print two conductive layers with wires. One layer is positioned horizontal and one vertical. Between those layers is a resistive layer. These wires are connected if a user touches the pad, which is measured by a micro controller. These measurement results form an image of pressure upon the surface, which is analyzed to extract touch information. In 2003 Jord` a et al.[34] created a table in which a user can make music with various objects, called rectable. These objects are recognized by the table and the user can interact with a music software to make music. They use the optical method Diused Illumination to detect duciary marker (ducial) on the objects. First they put markers on the ngers to nd ngers with the existing software. Later they included normal touch detection as well as touch interaction. Kaltenbrunner et al., a member of the research group introduced in 2005 a standardized network protocol for touches and objects. In 2008 Izadi et al. from Microsoft research presented in 2008 a new surface technology called SecondLight [32], where two projectors are combined to produce one projection image to the table surface and one to an object above the surface. They used a special acrylic plate which can be switched diused with 60 Hz. A combination of the Frustrated Total Internal Reection eect and the Diused Illumination is used to detect touches and objects at the same time. Weiss et al. introduced in 2009 a multi-touch table which can be used with silicone objects, like buttons, sliders, knobs and keyboards. The labeling of these objects is produced by the projector that is used for image displaying. They use a combination of the Frustrated Total Internal Reection and Diused Illumination for the detection of touches and the silicone objects.
Andreas Holzammer
Chapter 2 Touch-Sensing Technologies

Many multi-touch technologies have been invented. For better understanding why Frustrated Total Internal Reection and Diused Illumination can be combined to enhance touch detection, we need to know how these technologies work and what other technologies could be used in combination. First the optical technologies are described and later on the electric technologies. This is only an incomplete list, because this would go beyond our scope of this thesis.
2.1 Frustrated Total Internal Reection (FTIR)

The Frustrated Total Internal Reection (FTIR) eect was rediscovered by Je Han [24] for multi-touch sensing. Je Hans rediscovery can be seen as the starting point for the optical multi-touch sensing. For the FTIR eect two materials are needed, one that has a higher reection index than the other. Light rays are totally reected at the boundaries at a certain angle. This angle can be calculated by Snells law. The material with the higher refraction index is normally acrylic glass which has a refraction index of approximately 1.5 and the material that has the lower index is normally the air, which has a refraction index of about 1.0. So we can calculate the critical angle as follows: c = arcsin n2 n1 = arcsin 1.00 1.50 41.8
The light rays are injected in the acrylic plate from the edges. If a user touches the acrylic plate, the total internal reection is interrupted at this point and reected straight down, because of the higher reection index of the ngertip. An illustration of this eect can be seen in Figure 2.1. The minimum thickness of the acrylic plate should be 6 mm (depends on the size of the multi-touch surface) to prevent too much bending of the screen. The acrylic plate is normally cut roughly. For an ecient coupling of the light into the plate, the edges of the acrylic plate have to be polished. To enhance the coupling of the light into the edges, the edges can be cut o with an angle of 45 which is shown in Figure 2.2. Infrared light is mostly used for illumination due to the fact that the human eye cannot see this light. An infrared camera is placed beneath the acrylic plate. Common Charge-Coupled Device (CCD) cameras are sensitive in the infrared spectrum, but they normally have an infrared lter in front of the sensor. Color CCD cameras have an additionally bayer lter in front of the sensor. All these lters are disturbing the imaging process of infrared light, so a CCD camera without infrared lter or bayer lter is required.
2. Touch-Sensing Technologies
IR-LED Total Internal Reflaction
Camera
Figure 2.1: General FTIR setup
Figure 2.2: Coupling infrared light into an acrylic plate without angle left and with 45right The resulting images are analyzed by a computer vision program, which detects bright spots, which we call blobs and tracks them. A bae is necessary to hide the light that is leaking from the LEDs that are mounted at the sides. Otherwise infrared light can be reected directly by a hand towards the camera. This Bae should preferable be made of a material that does not reect infrared light. Because ngertips have little rills in the skin the frustration of the internal total reection takes place only at the skin ridges of those rills. This results in very dark blobs. To overcome this issue a layer, which we call the compliant layer, is needed that closes the little air gaps between the rills.
2.2 Diused Illumination (DI)

Diused Illumination (DI) is very similar to FTIR, has also a projector, camera and infrared illuminators. But this time the infrared illuminators illuminate the projection surface from behind. Matsushita et al. used this technology for their HeloWall [42]. With this technology a compliant layer is not required, because the infrared light is directly reected by the hand. The general setup can be seen in Figure 2.3. A diusor is needed, because the hand should only be detected if it is close to the surface, which is done by the diusor in the way that the objects that are close to the diusor are sharp, and if the object is far away the object gets more
October 22, 2009
2.2 Diused Illumination (DI)
IR Illuminator Camera
Figure 2.3: General DI setup and more unsharp and at a certain distance the object cannot be detected anymore. Normally the projection screen that is needed for image displaying is diuse enough to fulll this eect. It is very important to get a unied distribution of infrared light across the surface to get good detection results. If the surface is not evenly illumninated, the object on one spot of the surface is very bright and on other places very dark, which makes the image preprocessing very dicult or even impossible, because of the brightness sampling of the camera. It is very dicult to get an evenly spread illumination, which leads to a variance of sensitivity over the regions of the surface. Hochenbaum and Vallis, who constructed the bricktable [25] say that it is very hard to get a setup that is working with the same sensitivity in all spots. Teichert et al.[33] have researched a method to get a surface of a multi-touch table even illuminated. They used 2520 infrared light-emitting diodes (LEDs), mirrors and local shadowing with a cross illumination technique to get their surface even illuminated. Another approach is to put the illumination in front of the projection screen and track shadows in contrast of the reected light, which was stated by Echtler in 2008 [15]. This can be a good idea, because the sunlight and other light bulbs are emitting infrared light, which we call ambient light. But if there is no ambient light it has to be produced. Therefore, some infrared illuminators have to be placed above the surface. On the other hand, if we are not using shadow tracking; the stronger the external light is, the brighter the background of the captured image gets. It can get so bright that there is no dierence between reected light from the hand and ambient light. Here an acrylic plate is not needed, but the user needs a hard surface which he can touch, to get a haptic feedback. But glass or other transparent material can be
Andreas Holzammer
10
IR-LED
Camera
Figure 2.4: General DSI setup
used for that purpose. The projection screen in this case can be either placed below the acrylic plate or above, but this depends on the touch feeling of the projection screen or the material that is used for the haptic feedback. One major advantage of rear diused illumination is that it can be used to detect objects or even ducial markers.
2.3 Diused Surface Illumination (DSI)

Later on more advanced technologies were introduced. Diused Surface Illumination (DSI) is almost a combination of the FTIR and DI technology. For DSI a special acrylic plate is used, which has small particles inside, which reect the infrared light. Infrared light is coupled in from the edges of that special acrylic plate, like how it is accomplished with the FTIR technology. But here these particles in the acrylic plate are reecting the infrared light to the outside of the acrylic plate. This way the the acrylic plate is shining evenly at almost every spot on the acrylic plate. This setup is shown in Figure 2.4. As mentioned before the even distribution of the infrared light is very important for the detection. With this technology no compliant surface is required and the projection screen can be placed above or below the acrylic plate. The images of the DSI technology is very similar to the DI technology, but the illumination is more even because of the particles inside the acrylic plate.
October 22, 2009
2.4 Laser Light Plane (LLP)
11
Camera
Figure 2.5: Basic Laser Light Plane setup
2.4 Laser Light Plane (LLP)
Laser Light Plane (LLP) is a technology which uses lasers as infrared source. A infrared light plane is produced by lasers with a line generator in front of the lasers. The laser plane should be about 1 mm thick. Normal line generator procures a 120-degree line plane. The laser plane should be just above the surface. A basic setup is shown in Figure 2.5. Due to the fact that lasers are used to produce the infrared plane, some safety issues have to be taken into account. The human eye cannot see the infrared light, but can be hurt by it. The eye has a blinking response for visible light, but with infrared light the eye does not respond and the human does not realize that he is being hurt by the laser. Therefore, only so much lasers and power should be used to cover the surface. This technology works as follows. The infrared light from the lasers is scattered at the ngertip of the user, towards the camera, if the user touches the surface. The user does not really need to touch the surface to be detected, because the light plane is above the surface. But the ngers can occlude the infrared light so ngers hidden behind other ngers can not be detected, as shown in Figure 2.6. To overcome this problem, more lasers are needed. The projection screen can either be placed above or below the acrylic plate.
Andreas Holzammer
12
Figure 2.6: Fingers can occlude each other. The black touch is occluding the gray touch.
Camera
Figure 2.7: Basic LED Light Plane setup
2.5 LED Light Plane (LED-LP)

LED Light Plane (LED-LP) is very similar to Laser Light Plane, but here LEDs are used to produce an infrared light plane. Therefore, LEDs with a very small opening angle are required, which should preferable be placed on all sides of the touch surface. For LED-LP it is very important that the LEDs are covered up with a material that is not reecting infrared light. If the LEDs are not covert the LEDs could illuminate the hands or other objects that are above the surface and the light is scattered back to the camera. This could end up in a very high false touch detection rate. Here again ngers can occlude the infrared light, so that ngers behind other ngers are not illuminated and therefore not detected. A basic setup is shown in Figure 2.7.
October 22, 2009
2.6 Resistance-Based Touch Surfaces
13
2.6 Resistance-Based Touch Surfaces

Another group of multi-touch technologies are the electrical technologies. Resistancebased touch surfaces have two conductive layers, one with horizontal lanes and one with vertical lanes. These two layers are separated with an insulation layer, which is normally formed by tiny silicon dots. Above these layers typically a exible hard-coated layer is placed, which protects the layers beneath. The bottom normally consists of a glass layer to give a base for touching. A controller applies voltage to one of the conductive layers and measures the output of the lanes at the other layer. If a user touches the surface the lanes of the horizontal and vertical layers are connected, and current can ow. The controller changes the voltage layer and the measuring layer to determine the exact position. This method has a very low power consumption. The surface can be used with ngers and a stylus, because it just needs pressure. A big disadvantage of this method is that the touch layer has only a light transmission of about 75%-85% and addtionally screenprotection cannot be applied without interfering with the touch detection. These touch surfaces are used for small devices like the Nintendo DS [10]. More information about resistance-based (multi)-touch displays can be found in [14].
2.7 Capacitance-Based Touch Surfaces

Due to the fact that the human body is an electrical conductor, humans can change the charge of a capacitance system. Capacitive touch surfaces are relatively expensive to produce and the accuracy is rather bad in contrast to the other technologies. Capacitance-based touch surfaces can be divided into two main classes:
Surface Capacitance Projected Capacitance
The surfaces of Surface Capacitance touch panels consists of a thin conductive layer on a glass substrate which is transparent and serves as an electrode of a capacitor. The corners of the conductive layer are connected to a voltage source via a sensitive current measuring system. If a user touches the surface, the charge is transported from the conductive layer to the human body. The drawn current from the corners is measured and a position is estimated. Projected capacitive touch surfaces consist of a capacitive sensorgrid, which is normally between two protective glass layers. The sensorgrid can measure the capacitance forms between the nger and the grid, while the user is touching the surface. The touch position is derived due to the change of electrical properties of the sensorgrid. This method can detect ngertips even if they are not touching the surface, because the electrical properties already change if the nger is close to the surface. This type of panel can be used in rough environments such as public installations because it can be covert with a non-conductive material without interfering the touch detection. Due to the sensor grid, multiple touches can be derived more easily, compared to the surface-capacitance based technology.
Andreas Holzammer
14
One example of capacitive touch surfaces is the DiamondTouch system created by Dietz and Leigh in 2003 [13], which is a multi-user, debris-tolerant, touch-andgesture-activated screen for supporting small group collaborations. They transmit a signal, which depends on the location on the table through antennas, and if the user touches the screen the signal is capacitively coupled to the chair where it is received and analyzed. This leads into the restriction that only four users can be distinguished. Another famous device is the iPhone [28] from Apple, which uses a capacitive touch surface, but not much technical information about it, is known.
2.8 Discussion
After the presentation of the dierent multi-touch technologies it becomes clear that not all technologies can be combined. The combination of electric and optical methods, would be possible, but if the electrical methods reach a certain size, the electric issues are getting huge. Because of the desired size of the touch screen 120 cm x 90 cm, the electric methods where not chosen. The technologies to combine should be real multi-touch technologies; the technologies should allow to detect many touches without restrictions. The infrared light plane technologies are restricted due to the fact that ngers can be occluded by each other. Some of the electrical methods have an issue with multiple touches too. Diused Surface Illumination cannot be combined because it uses a special acrylic plate, which is already a combination of Frustrated Total Internal Reection and Diused Illumination. Both eects always have to be used at the same time; it is not possible to use both eects separately and calculating one result. The advantage of the Frustrated Total Internal Reection technology is that there is a strong contrast between a touch and the background. A pressure approximation can be done with the brightness of a touch, but this can be a disadvantage too, because this technology requires pressure to work and if the user is not applying pressure to the surface it is not detected. The advantage of Diused Illumination is that it is very sensitive to touches, but this again can be a disadvantage too because this can lead into a false detection. The combination of Frustrated Total Internal Reection and Diused Illumination was chosen to combine the advantages of these both technologies and balance the disadvantages. After looking at the technologies, a hardware setup is needed that combines the chosen technologies in one setup.
October 22, 2009
Chapter 3 Hardware
In this chapter the hardware components which are required to build a multitouch table with the Frustrated Total Internal Reection and Diused Illumination technology are discussed. The hardware which we used for the multi-touch table is presented afterwards. The multi-touch table of the Institute Computer Graphics (CGTable) at the Technical University of Berlin [3], was rst built as part of a project in winter of 2007/08. The table had only the FTIR technique to detect touches. During this thesis the table was upgraded with the DI technology; also other hardware problems where resolved by part replacing.
3.1 Assembly
A basic hardware assembly of an optical multi-touch display consists of the following parts: a camera, infrared illuminators, projector and a projection screen, as seen in Figure 3.1.
Projectionscreen
Projector Camera
IR-Illuminator
Figure 3.1: A basic optical hardware assembly
3.2 Old Setup

The old multi-touch table had a normal projector inside, which needed a fairly long projection distance, so two mirrors were required to reach the projection distance needed for the projection size of 60 inches. The camera was placed just beside the projector and was taking the images over the mirrors. 98 Osram SFH 485 infrared
15
16
3. Hardware
Figure 3.2: Point Grey Firey MV, picture taken from the website of Point Grey Research Inc. [49]
Figure 3.3: Spectrum of the Point Grey Firey MV, picture taken from the website of Point Grey Research Inc. [49] light-emitting diodes (LEDs) [48] were used to illuminate the acrylic plate, which illuminated the the acrylic plate fairly poorly, because there where not enough.
3.3 Camera
A CCD camera is needed that has no infrared lter and should be a black-and-white camera. The camera should also have a large sensor, so that a great deal of light can be captured by the sensor. A small imaging sensor can lead a high noise ratio. Due to the fact that we want to take images of infrared light, the camera should have a spectrum that covers the wavelength that we use, which is typically 850 nm. Also, a high frame rate is needed, because we want a fast response time and a good tracking result. One good and cheap camera is the Playstation 3 Eye camera [18]. This camera is a color CCD camera, which can capture 640x480 pixels at a frame rate of 60 fps. The old camera of the CG-Table was a Imaging Source DMK 21BF04 [52], which was replaced because of the size of the imaging sensor. We chose the Firey MV from Point Grey, because it has good specications (see Table 3.1) and a matching spectrum (see Figure 3.3). Also many other projects have used this camera with good results.
October 22, 2009
3.4 Infrared Bandpass Filter
17
The Firey has an external trigger, which is used to synchronize the camera and the infrared LEDs. Most other cameras do not have external triggers like webcams, which are very popular in the community, because they are cheap and very easy to get, but the built-in infrared lter needs to be removed. We need the external trigger, because we want to make dierent images of the Frustrated Total Internal Reection, Diused Illumination eect and images with no infrared light on. We call the images of the eects FTIR image, DI image and reference image. Description Image Sensor Type: Image Sensor Model: Maximum Resolution: Pixel Size: Imaging Area: Digital Interface: Maximum Frame Rates: General Purpose I/O Ports:
Synchronization:
Lens Mount:
1/3 progressive scan CMOS, global shutter Micron MT9V022 752(H) x 480(V) 6.0 m x 6.0 m 4.55 mm x 2.97 mm IEEE 1394a / USB 2.0 63 FPS at 752x480 7-pin JST GPIO connector, 4 pins for trigger and strobe, 1 pin +3.3 V, 1 VEXT pin for external power via external trigger, software trigger, or free-running CS-mount (5 mm C-mount adapter included)
Table 3.1: Specication of the Point Grey Firey MV found on the website of Point Grey Research Inc. [49]
3.4 Infrared Bandpass Filter

An infrared bandpass lter is needed because we only want to sense infrared light. All other light is just disturbing the detection process. There are several materials which can be used as an infrared bandpass lter like exposed negative lm, strong sunglasses, a oppy disk, lter taken from an IR remote and a professional band pass lter. Professional band pass lters are quite expensive and sometimes dicult to get, so many people opt for one of the other lters, but they can have quality issues. Due to the fact that we use 850 nm infrared LEDs we need a lter that has this value as peak. The lter is placed preferable in front of the CCD sensor of the camera. Figure 3.4 shows the transmission curve of the lter we have chosen for our setup. We have chosen this lter, because many people in the community have used this lter before and had good results; also we obtained it in the correct size for our camera and did not have to cut the lter to the right size. Other curves of some of the lters listed above are in the Appendix, in section 8.1.
Andreas Holzammer
18
3. Hardware
Figure 3.4: Infrared bandpass lter from Midwest Optical Systems, chart taken from website of Midwest Optical Systems [53] Description Model: Focal Length: Iris Range: Calculated distance to screen:
Computar T2Z 1816 CS 1.8 - 3.6 mm F1.6 - F16C 55 cm - 115 cm
Table 3.2: Specication of the lens, informations taken from [12]
3.5 Lens
The choosing process of the lens depends on the distance between the camera and the touch surface. Due to the fact that the surface is very big (60 inches) and not too high, a wide lens opening angle is required to capture the full surface. For this purpose we chose a sheye lens, which has a barrel distortion eect. This eect is shown in Figure 3.6. Our surface has a dimension of 120 cm x 90 cm at a hight of 103 cm. Figure 3.5 shows the physical setup. With the following equation the needed distance can be calculated:
Screen x = fSensor Sensor = 4.55 mm x 2.07 mm Screen = 1200 mm x 900 mm
We chose a vari lens, because we wanted the freedom of placing the camera at various positions. The specications of the used lens are shown in Table 3.2. Experiments have shown that the ideal place is in the middle of the acrylic plate.
October 22, 2009
3.6 Projector
19
Figure 3.5: Calculation of lens distance
Figure 3.6: Left is a tele lens with little distortion. Right is the sheye lens with much distortion.
3.6 Projector
As mentioned earlier projectors need a certain distance for a certain projection size. If a normal projector is used, a projection distance of approximately 2.5 m is needed. In this case, mirrors have to extend the distance between the projector and the surface. So if the the projection screen is at a height of 104 cm, 2 mirrors are needed, because with just one mirror the projector would be above the table surface. But with mirrors, ghosting eects appear, which is a replica image appearing fainter with an oset in position to the primary image; an alternative are short-throw projectors. A table of possible short-throw projectors is presented in the Appendix in section 8.2. These projectors need a projection distance from -0.04 m to 2 m for our table. The negative value means that the projector is actually placed 4 cm above the surface. These projectors are projecting at a very high angle, as shown in Figure 3.7. The projector also needs to be mounted at an angle of 90 degrees. Not all projectors can be mounted at 90 degrees, because their heat ventilation needs to be at an upright position. Otherwise the projector is getting very hot and the lifetime of the lamp decreases dramatically. Another problem is the mounting of the projector. Normal ceiling mounts cannot be used for that purpose, because they are not stable enough to hold the projector at 90 degrees. The projectors have normally screw holes for the ceiling mounts, where a board can be mounted which can be mounted to the table. We have chosen the Acer S1200 projector, which is shown in Figure 3.8, because of the brightness, contrast ratio (see Table 3.3) and that the projector projects not
Andreas Holzammer
20
3. Hardware
Figure 3.7: Principle of ultra-short-throw projector from the side. Also, with this projector no mirrors are needed to project 60 inches.
Figure 3.8: Acer S1200 ultra-short-throw projector, picture taken from web page of Acer [1] Description Projection System Native Resolution Brightness Contrast Projection lens Throw Ratio Projection Screen Size Projection Distance Lamp lifetime Distance for 60
DLP 1024 x 768 2500 Lumen 2000:1 F = 2.6, f = 6.97mm 0.60:1 4.15 m@2 m or 2.082 @1 m 0.5 m 3.7 m 4000h(ECO)/3000h(Bright Mode) 0.72 m
Table 3.3: Specication of the Acer S1200, information taken from [1]
3.7 Infrared Light

As stated before the FTIR and the DI eect is using infrared light, which does not interfere with the image projection. For this purpose infrared illuminators are
October 22, 2009
3.7 Infrared Light
21
needed to illuminate the acrylic plate. Infrared illuminators can be self build out of single light-emitting diodes (LEDs), LED emitter or LED ribbon: Single LEDs Can be either normal infrared LEDs or Surface-Mounted Device (SMD) infrared LEDs. The normal infrared LEDs are bigger and easier to solder, but normally they are less powerful than the SMD LEDs. The SMD LEDs need to be soldered to a board, which is not needed for the normal LEDs. With single LEDs the user has the freedom of arranging the LEDs for his own needs. He is not bound to industrial standards, but needs soldering experience and the tools to do so. LED emitter Prefabricated emitters, which are used for a DI setup, because they are normally round and have a large surface. These emitters are normally used as the headlight for a night-vision camera. These emitters have a dense area of infrared LEDs, so a hotspot is produced. This can be eliminated by bouncing the infrared light o the sides and oor of an enclosed box. With emitters the user does not need to solder anything. LED ribbons These are prefabricated LED strips with SMD LEDs on them. They are normally used for an FTIR setup. This is the easiest way to build an FTIR setup, because the LED ribbons have an adhesive side and therefore can be glued to a frame. For the FTIR method a long thin illuminator is required and for DI an even illumination is needed. Due to the fact that we want bright spots at the places where the user touches the surface, a great deal of infrared light is needed. We chose SMD LEDs, because they have a higher total radient ux. Most of the people who build such multi-touch tables use the SFH 485 from Osram [48]. We used the SFH 4250, which is shown in Figure 3.9. We have chosen this LED, because many people have used this LED before and it was recommended by Sch oning et al. [51].
Figure 3.9: The Osram SFH 4250 soldered on a board For the FTIR eect we need to mount the LEDs at the edges of the acrylic plate so the infrared illuminator needs to be long and narrow. For building such a long and thin illuminator a board has been created which ts 24 of these LEDs in groups of 6. A group consists of the LEDs with a resistor in series. 14 of those boards are used to illuminate the acrylic plate. This custom board was self-etched with the pattern shown in Figure 3.10. The area of each pad which is needed for connecting the LEDs with the board should be at least 16 mm2 for absorbing the heat. Due to the fact that we are switching on the LEDs only when it is necessary, the LEDs are not getting too hot and do not need a bigger heat pad.
Andreas Holzammer
22
3. Hardware
Figure 3.10: Etching layout
Figure 3.11: Placement of the infrared illuminators For the DI eect bigger illuminators are needed. The LEDs are mounted on a normal board which is 16 x 10 cm big. On such a board are 24 LEDs placed in four rows, with six LEDs in each row. The placing of these infrared illuminators is not easy, because the acrylic plate is reecting infrared light and these reections interfere with the touch detection. A position needs to be found that almost no direct reection gets to the camera. Positioning of the infrared illuminators on the oor results into such reections. Either the illuminators need to be placed at an angle which moves the illuminators outside of the table or they have to be reected at a wall. Another approach is to place the illuminators just below the acrylic plate nearly vertically, as shown in Figure 3.11
3.8 Surface Layers

The Surface normally consists of dierent layers. For dierent touch-detection technologies it can dier what layers are needed and which are not needed. In the Chapter 2 we already stated which layers are needed for the dierent setups. In this section we discuss what these layers are and which dierent materials can be used for those layers.
3.8.1 Compliant Layer

As stated before, for an FTIR setup we need a compliant layer, which is not needed for a DI, DSI, LLP and LED-LP setup. A material with a dierent refraction index than the air is needed to frustrate the internal reection. Since the ngers have rills, there is a lot of air between the nger and the acrylic plate if the user touches the surface; to ll these air gaps between the nger rills a compliant layer is required. It is possible to build a multi-touch display without a compliant layer, but the user needs very greasy ngertips or a lot of pressure to use the display. Baby oil can be used to substitute the compliant layer, but it needs to be spread out from time to time on the acrylic plate. Many people are not comfortable working with baby oil on their hands and perhaps oil will get on any papers they may have that are close to the display. A material is required which is exible enough for lling the rills of the ngertips or a material that has a similar refraction index to the acrylic plate. It should be either transparent or a good rear projection screen. The compliant layer can stick
October 22, 2009
3.8 Surface Layers
23
to the acrylic plate, but it then needs a refractive index close to the acrylic. If an additional projection foil is needed the compliant layer should not stick to the projection foil. Silicone Silicone can be used as the compliant layer due to the fact that its refractive index is very close to that of the acrylic plate. It can be spread over the acrylic plate. There are several ways to make a smooth layer of silicone on the acrylic plate.
spraying the silicone onto the acrylic plate roll the silicone onto the acrylic plate spread out a thin layer with a rigid bar get a silicone foil
For all the variants described above a low viscosity silicone is needed. Silicone can be thinned with Zylol or Xylol. This makes the silicone more liquid, but because of the mixing process it has little bubbles of air inside. These bubbles could interfere with the FTIR eect, but tests showed that the bubbles are not interfering, actually there is no dierence between the thinned and normal silicone. For spraying and rolling the silicone onto the acrylic plate a few layers are needed, because one layer would be too thin to work. The rolling method produces a textured layer, so if it is rolled on the acrylic plate the surface of the plate has a texture, which interferes the FTIR eect, because the angles of the rays that travel in the acrylic plate are not perfect anymore. And if the angles are not perfect there is no total internal reection. The issue with silicone is that it easily sticks to other materials. If the projection foil sticks to the silicone the total internal reection is frustrated at this point. In our tests the projection foil was sticking only short to the silicon, but this eected streaks. These streaks can be removed with talcum powder, but the talcum powder also interferes with the FTIR eect. The problem with bought silicone foil is that it is mostly powdered with talcum, because they sticks very well to each other. It is very dicult to wash o the talcum powder from the silicone. Latex As compliant layer latex can be used too, because it is exible enough to fulll the ngertip rills. Latex is a natural product, so there is no latex that is totally transparent. The latex has a yellow/brown color. But latex can also be used as the projection screen. The projection performances are not as good as a professional back projection screen, but good enough for normal working conditions. Latex is dedicated to human grease, it needs to be cleaned with silicone oil. Latex also sticks to the human skin, so an additional protective layer is required. This layer should be transparent, so that it is not interfering with the image and it should have a nice touch feeling, because it is the actual touching layer.
Andreas Holzammer
24
3. Hardware
Figure 3.12: Streaks of thinned silicone at the top and unthinned silicone at the bottom
Discussion It is very dicult to get an even thick layer of silicone on an acrylic plate. We have tried to roll the silicone on the acrylic, which produced a structure on the silicone which interfered with the FTIR eect. We also tried to spread out a thin layer with a rigid bar, which had good results in terms of evenness, but the projection foil stuck to the silicone and produced streaks (shown in Figure 3.12), which is disturbing the touch detection process. We have chosen a latex layer, because it has no problems with streaks.
3.8.2 Projection Screen

As projection screen several materials can be used. Sometimes other surface layers can be used as projection screen. One cheap approach was stated by Tinkerman from the NUI Group forum [22]. He rolled silicone to a vellum paper. This method combines the compliant layer and the projection screen. This method produces a textured layer of silicon; this is good because then the silicon is not sticking to the acrylic plate. With this layer good, bright blobs are produced by pushing the surface, but the vellum paper is on top so it can be damaged by the touching ngers. If there are only transparent layers a projection screen is needed. A popular projection screen used by multi-touch display builders is the Grey by Rosco [30], but there are many other back projection screens that are sucient. Peau Productions [46] gives a good overview of projection materials.
3.8.3 Protective Layer

For some setups a protective layer is needed, because the layers beneath are sensitive to scratches or the layer does not have good friction characteristics for touches. For setups that do not need to push something in, a thin acrylic layer can be used. With glossy materials the touch feeling is not very smooth because of the friction on the glossy material. It shows that rough materials have a nice touch feeling; this is because the friction on this material is less. For example, a sandblasted acrylic plate can be used if it does not disturb the image viewing. For setups where the push eect is needed, soft Polyvinyl chloride (PVC) can be used. The soft PVC should be very thin, because it needs to be dented and the thicker the soft PVC is, the harder it is to dent.
October 22, 2009
3.9 Switching Circuit
25
3.8.4 Dierent Surface Layer Setups

Figure 3.13 shows two dierent setups for an FTIR setup. One is done with silicone and one with latex. As stated before a big contact area is required for the frustration of the infrared light, to get a large contact area the air gap between the nger rills is fullled by the compliant layer to get a large contact area.
a b c d
Figure 3.13: Surface layers for an FTIR setup with silicone: projection foil (a), gap (b), silicone (c), acrylic plate (d) and for an FTIR setup with latex: protective foil (a), latex (b), gap (c), acrylic plate (d)
3.9 Switching Circuit

The old table at our Institute had a switching circuit to switch the infrared illumination on and o to subtract the ambient light that is produced by the sun and urban lights, which is described in [3]. The old assembly used the output trigger of the camera to get the clock for switching on and o. Pulse width modulation (PWM) was used to control the light intensity. Four LEDs at the corners (outside of the touch-sensitive area) where used to determine if the main infrared illumination is turned on or o. The old assembly used only the FTIR technique to detect touches. The new setup should have the FTIR, DI and ambient light subtraction, so a new switching schematic was needed. The new circuit should be exible, so we decided to develop a schematic that is controlled by the computer. Now the software can decide which infrared illumination is switched on, while the camera takes an image. For the new setup we needed an interface to the computer. We decided the parallel port is the easiest way to proceed. Almost every computer has a parallel port which is very easy to use and how to set this up is well documented. The serial port has only one data pin and a power level of 24 V, whereas the parallel port has a power level of 5 V. The circuit is using 5 V so the parallel port is ideal to use. We took the rst four data pins of the parallel port. If the operating system is started, the level of the data pins is high. To avoid this we used an inverter. The inverter also increases the high level, because the actual level on the parallel port can be much lower, like in our setup 3.4 V. A topfet is used to switch the infrared illumninators, which need a lot of power. But for the topfet it is not good, to be switched by 3.4V because then more voltage is falling o at the topfed, which makes the topfet pretty hot. Also, the LEDs then do not get the full voltage. The resulting switching schematic is shown in Figure 3.9. The pin selection for the LEDs is shown in Table 3.4.
Andreas Holzammer
26
3. Hardware
Figure 3.14: Switching Circuit. There are many more LEDs involved in that circuit, but are not shown. Pin D0 D1 D2 D3 Purpose FTIR LEDs Reference LEDs External camera trigger DI LEDs
Table 3.4: Parallel port data pins used for switching
3.10 Power Supply

A power supply is needed for the switching schematic and the infrared illuminators. 5 V are needed for the logical circuits and the topfet. To power the infrared LEDs a forward voltage of 12 V is used. There are 6 LEDs in series with a resistor. Each LED has a forward voltage of 1.5 V at a forward current of about 100 mA. This results in a forward voltage of 9V, so the resistor needs to take the rest of the voltage. With Ohms law we calculate the value of the resistor to be 30 . We took a 22 resistor, because we are switching on the LEDs only for a very short time (about 2 ms). This is done to get more illumination power from the LEDs. The pulsing of the power is described by the manufacturer of the LEDs. We took a normal computer power supply for powering the switching schematic and the infrared illuminators.
October 22, 2009
Chapter 4 Algorithms
After looking at the hardware setup, this section describes algorithms, which are needed for image preprocessing, touch detection and tracking of FTIR and DI images. The preprocessing of the images extracts the informations we need for the analyzing process by ltering the captured images. Afterwards a feature detection is carried out to nd touches and other information in the image. Later this information is post-processed to transform the touches to the right place on the screen and to track touches.
4.1 Image Preprocessing

All images need to be preprocessed to subtract physical and hardware related side eects which disturb the analyzing process.
4.1.1 Bright Image Removal

The rey MV captures sometimes an overexposed image at the beginning of the capturing process, which have to be removed for the background subtraction and hotspot removal. One basic approach would be to calculate the histogram of each image and decide if the image is too bright with a static value. But due to the fact that the algorithm should work for various imaging technologies (FTIR, DI, etc.) the brightness of the image cannot be stated as a constant value. To overcome this aw the following adaptive algorithm has been created. A set of images has to be evaluated, because only then can the typical brightness be evaluated. At least 3 images are required if it can be assumed that only one image is overexposed. To nd the overexposed image the histograms of the images are pairwise compared (histogram distance). The algorithm is divided into the following parts:
nd the image that has the smallest histogram distance to all other images collect all images that are within a certain distance to the previous found image
To nd the smallest distance between one image to the others, the accumulated distances of all images are calculated with a given distance metric. An accumulated distance is the sum of the distances of one image to all the others. In this list of distances the smallest value is searched. This corresponding image has the smallest histogram distance to all the others. After this calculation, the distances between the found image and the others are calculated. Is the distance below a certain value these images are not overexposed.
27
28
4. Algorithms
This algorithm is necessary because it is important that in the background subtraction algorithm no overexposed image is involved, otherwise the touch detection is not sensible enough.
4.1.2 Ambient Light Subtraction

As stated before changing light conditions are disturbing the detection as follows. If the background image is statically taken at the beginning and the light conditions change afterwards the background is changing too. If the background changes the background subtraction fails; this can lead into false touch detection, especially if the background gets brighter. One solution to this issue is to use an adaptive background subtraction, which requires greater performance and also decreases the touch sensitivity. Another approach is stated by Alexa et al. [3], to take two images, one with the infrared light turned on (blob image) and one with the infrared illuminators turned o (reference image). The image with the infrared illuminators turned o gives us the ambient light which is shining into the table. These two images are subtracted to remove the ambient light from the blob image. This method is also described for eye detection by Zhu et al. in 2002 [59] to increase the contrast of the eyes to detect them. If these images are taken close together, this methods gives good results. But this means that a second image have to be taken; each time a blob image has to be taken; this results in half of the frame rate of the camera for the whole setup.
4.1.3 Background Subtraction

The background of the image disturbs the detection algorithms, so it needs to be subtracted, especially for the DI technology, because infrared light is partly directly reected by the acrylic plate. Due to the fact that the background is static, because the multi-touch table is not moving, in contrast to the touches that are appearing, disappearing and moving, an image can be taken at the beginning and be subtracted from each following image. This technique is well known and is described in various computer vision books like Computer Vision: A Modern Approach from Forsyth et al. [19]. For this process a few background images are taken, because an overexposed image would be fatal. To reduce the noise of the images, the maximum brightness value of each pixel is used as background image.
4.1.4 Hotspot Removal

As mentioned earlier all external light can lead to a false touch detection. The projector itself is a light source, which also generates infrared light. The projected image is reected by the acrylic plate in the way that the camera captures a little bright spot, if the projector projects bright images. We call this little bright spot a hotspot. This hotspot can be reduced or removed with several techniques. One solution would be the change the camera position. This cannot be done because then the view angle of the camera changes and with it the image content. With the FTIR technique most of the beams that are reected by the nger are going straight down, so if the camera is not exactly in the middle of the acrylic plate, the touches at the sides are getting darker.
October 22, 2009
4.1 Image Preprocessing
29
Another approach is to polarize the projector light and use another polarization lter in front of the camera (lter rotated by 90 degrees). This solution was not used because the polarize lter in front of the projector is darkening the projected image. To suppress the hotspot an infrared lter can be used in front of the projector. This approach reduces the hotspot, but it does not remove it completely. The hotspot can also be removed by software, because it is not moving and therefore always at the same spot. But the problem with this solution is that the table is not sensitive at this spot. At our observe actions we determined that the spot is most of the time smaller than a nger, so if the hotspot image is subtracted from the captured image of the camera the blob of the nger is not lost, but has a hole inside, which is not interfering the detection process.
Figure 4.1: Reection of the beamer produces a hotspot in the picture To remove the hotspot by software, it is necessary to detect where the exact position of the hotspot is, in order to subtract it. A few images are captured when the projector is projecting a black image and no user is touching the surface. More than one image is taken, because the camera and the projector are not synchronized. Later on, images that are not overexposed are selected, which is calculated by the algorithm presented earlier. A resulting image is calculated by taking the maximum color value for each pixel of the imageset. Next a few images are captured when the projector is projecting a white image. Here again a resulting image is created by combining the images. Afterwards, these two resulting images are subtracted from each other. This gives us an image where only the hotspot is showing. This image can be subtracted from each image captured by the camera to remove the hotspot.
4.1.5 Image Normalization of DI Images

As stated before an even illumination for a diused illumination setup is needed for the detection. It is very dicult to get an even illumination with a self-built hardware setup. The infrared LEDs need to be placed at exact positions and a position plan needs to be created which calculates all overlaps, reections, refractions etc. For an FTIR setup the total internal reection spreads the illumination in the acrylic plate more evenly. In an FTIR setup the internal reection is only frustrated if a user actually touches the surface, but for a DI setup the hand only needs to be close to the surface to be detected. Then if the illumination is not even the surface has a varying sensivity at dierent spots on the surface. This can lead into a false detection and can be very disturbing for the user if the computer detects the hand even if he is not touching the surface. To overcome this problem the illumination of the acrylic plate can be measured
Andreas Holzammer
30
4. Algorithms
Figure 4.2: Illumination of the surface. The image has been normalized and colored to show the illumination dierences. The image should be totally red to be even illuminated. and the captured images can be normalized to this illumination. The illumination does not change, because all the parts are mounted to the table and nothing moves. To measure the illumination the same setup is used, as for the detection. An image is needed which shows the maximum brightness in each pixel that can be produced by a hand. To capture such an image, the camera captures images of the surface and these images are combined in the following manner. For a pair of two images the maximum brightness of each pixel is calculated for the resulting image. This result is combined with the next image and so on. On the surface a hand needs to be placed at all locations to get the the maximum brightness for each pixel. It is important to use a hand, because dierent materials have dierent reection properties and even hands have dierent reection properties. The resulting image has to be blurred, because it is not possible to put a hand in all positions. Also, the black frame around the surface is colored white, because these regions are outside of the surface. We call the resulting image illumination image. The image for our table is shown in Figure 4.2. To normalize the captured images, each pixel of the captured image is divided by the corresponding pixel in the illumination image and then multiplied by the maximum brightness value of the captured image (typically 255 for an 8 Bit greyscale image).
4.2 Feature Detection

A human can see various features like touches, hands and ngertips in an image, but the computer needs to analyze the image to derive these information. A touch detection is normally applied to a preprocessed FTIR image. The hand and ngertip-detection algorithms are normally applied to DI images.
4.2.1 Touch Detection

A touch can be described as a bright spot in an FTIR image, which we call a blob. If the pixels at the touch points have a certain intensity and the other pixels are nearly black a static threshold can be applied. After this preprocessing a blob detection can be used to nd the touches. For the DI images it is a little bit more complicated, because the image illustrates the contour of the hand. A touch in a DI image can be seen as a bright spot at the
October 22, 2009
31
ngertips. A ngertip has a certain size, so all bigger spots and smaller spots can be removed with a bandpass lter. This lter extracts the spots that have the certain size. Afterwards a threshold is applied and the blobs are detected.
4.2.2 Hand Detection

A human hand is a fairly complex object with ve ngers and a palm. Each nger has three joints, except for the thumb, which has just two joints. Kang et al. state that the hand has roughly 27 degrees of freedom [36]. With the 27 degrees of freedom the hand can make too many poses to detect all of them. Due to the fact that just one camera is used in the setup not all degrees of freedom can be approximated; with more cameras more perspectives could be used to extract more information. The ve ngers of a hand cannot always be seen clearly on images because the user is not always spreading his hand fully, which leads in two ngers merged together to one or even the whole hand forming a st. The detection of the st is not important, because we only want to detect the touches on the surface. So minimal requirements to dene a contour of a hand would be:
at least one nger maximum of ve ngers has a minimal area
4.2.3 Fingertip Detection

There are several methods to detect ngertips. One method is to t a template or templates of ngertips to windows of the image, which is described by Kim et al. 2008 [37]. Another method is described by Wu et al. in 2000 [56] to t a conic to a possible ngertip. Yang et al. describe in their paper [57] various ways to detect ngertips. Two possible algorithms for nding ngertips are described below. Convexity Defects Based Detection One basic approach is to evaluate the convexity defects of a hand contour. First a convex hull of the Hand contour is calculated. Later on the convexity defects are computed, which are normally between ngers (shown in Figure 4.3). This algorithm can only detect ngertips, when there are at least two ngers. This leads to the algorithm not being able to nd a pointing gesture, where only one nger can be seen. Convexity defects have a starting point, an end point and a point where the depth of the convexity defect has its maximum depth. These three points can describe a triangle. Due to the fact that ngers have a certain size, the maximum depth of the convexity defect has to have a certain size too. Also, the ngers can only be spread a certain amount. So the distance between the start- and the endpoint can have only a certain size. All these attributes are important for detecting ngertips, because the hand has other convexity defects than between the ngers (seen in Figure 4.3) and we have to separate these convexity defects from each other to make the detection stable.
Andreas Holzammer
32
4. Algorithms
Figure 4.3: Convexity Defects of a Hand. Convexity defects are drawn in red and the depth is shown with arrows
Figure 4.4: The left image shows a non-smoothed contour of a hand and right a smoothed contour The contour-nding algorithm can make the contour of the hand very rough, because the edges of the hand can be very noisy. Figure 4.4 shows a non-smoothed contour of a hand. Especially at the arm which is normally fading out because the user is touching the surface from above. One approach is to smooth the image with a smoothing lter, which needs a lot of performance and can merge two ngers which are close together. Due to the fact that we want a real-time application this is not an option. Another approach is to smooth the contour. The points of the contour are sorted clockwise by the contour nding-algorithm of OpenCV. The smoothing algorithm works as follows:
go through the contour clockwise with a step size of s collect n neighbors on the contour around the certain point take the average position of the collected neighbors
October 22, 2009

add this point to the new smoothed contour
33
The algorithm rst picks one point on the contour pi and for this point pi the algorithm collects its neighbors on the contour. To do so, the algorithm collects (n/2) points that are next in the contour and (n/2) points that are before point pi in the contour. These are the neighbors, because the points of the contour are sorted clockwise. Then the arithmetic average is calculated:
n
pj n The resulting point pk is added to the new smoothed contour. After that a new point is picked out of the contour at a step size of s; this is repeated until the algorithm is at the start of the contour again. Due to the fact that the contour is closed, there are previous points for the start and next points for the end of the contour. The parameters are depending on the roughness of the input contour. Experiments showed that s = 4, n = 10 are sucient parameters for out setup. A result is shown in Figure 4.4. Shape-based Detection Malik and Laszlo described in 2004 a shape-based ngertip-detection algorithm [41], where ngertips can be extracted out of the contour itself. As stated before a nger has a certain size and a certain width. A nger can be approximated with a cone, therefore a triangle can be tted on the ngertip of a contour. Again the contour consists of points that are ordered clockwise. We take three points p, a, b on the contour (shown in Figure 4.5), which form a triangle. These points should be at a certain distance. The ngertips point out of the contour. So we need only triangles that are pointing outside of the contour. Two vectors can be established: pa, pb, with which we calculate the 3D crossproduct, where the z component is zero. A direction of the triangle can be determined by the right-hand rule. If the z component of the 3D crossproduct is positive the triangle points to the outside of the contour. We can also determine the angle ab (shown in Figure 4.5) and if that angle is in a certain range, it is possibly a ngertip. Due to a rough contour as described earlier, a lot of false possible ngertips are detected. Hence, here again the contour smoothing is applied to reduce the false positive rate. One side eect is that the algorithm is faster, because the contour has fewer points after the smoothing. But still with the contour smoothing false positives are recognized. Due to the curvature of the ngertip more triangle positions are found for one ngertip. These triangle points have to be grouped and one representative has to be found for that group. The found points of a ngertip are likely close together and ordered clockwise on the contour, so these points are one after another in the list of the possible ngertip list. The algorithm goes through the list and calculates the pairwise distance of the points and pushes the points that are close together onto a stack. If it gets to a point that is not close to the last point pushed to the stack, it picks out the point that is in the middle of the stack and denes this as our representative for the ngertip. After that the algorithm goes on like this, until it is nished with the possible ngertip list. pk =
j =0
Andreas Holzammer
34
4. Algorithms
p ab a b
Figure 4.5: Dominant point detection The possible ngertips can be ltered as follows to reduce the false positive rate. Touches are bright spots on the image. So we have to nd a bright spot close to the ngertip to verify that the chosen point is really a ngertip. For this purpose the image is analyzed. The mean shift algorithm, which was developed by Fukunaga and Hostetler in [20], is used to nd the bright spot. The mean shift algorithm can nd local extrema in density distribution data sets. For our purposes the density dataset consists of the brightness values of the pixels. The algorithm can be easily described as hill climbing on a density histogram. It is robust in a statistical manner; this means that it ignores outliers in the data. A local window is used to ignore points which are far away from the peaks of the data and then the window is moved. The algorithm works as follows. 1. a window size is chosen 2. an initial starting point is chosen 3. calculate the window center of mass 4. move the window to the center of mass 5. go to step 3, until the window is not moving anymore Comaniciu and Meer proved in [11] that the mean shift algorithm is always convergent. The mean shift algorithm is now shifting the window in the direction of the bright spot. The implementation of the mean shift algorithm in OpenCV is slightly dierent; it quits the loop if a certain epsilon is achieved or x iterations have been reached. As window size, a normal size of a nger is chosen. The starting point of the algorithm is the possible ngertip. The algorithm stops as stated above if the window is above the bright spot or x iterations are reached. The algorithm also provides the sum of the brightness values of the pixels which are in the window. If this sum is over a certain value the algorithm has found the bright spot and veried that the possible ngertip has a touch related to it. Discussion The convexity defects algorithm has weaknesses with ngers which are not stretched out. The problem is to determine if the chosen convexity defect is between two ngers and not between a nger and the hand. The diculty of the problem
October 22, 2009
35
depends on how much of the arm can be seen on the captured image. Also, the roughness of the contour disturbs the detection of both algorithms. Both methods are faster than applying a bandpass lter to the captured image and applying a blob detection afterwards. The convexity defects algorithm is not used because of its restrictions (can only detect two or more ngers).
4.2.4 Hand Orientation

Many multi-touch displays are mounted on a table, were the orientation of the display is not clearly dened. So the user can use the table from various sides. The information about hand orientation can help the user program to determine where the user is positioned and could adjust its content to this orientation. The orientation of the hand can be derived mostly due to the arm which can be seen in a DI image. If the DI image does not show the arm, the user most likely holds his arm straight up in this case ngers can be used to approximate the hand orientation. To determine the orientation the centroid is determined. This can be done with the help of the contour moments [27][26]. A contour moment can be calculated by integrating over all pixels of the contour. The (p,q) moment can be calculated as follows:
n
mpq =
i=1
I (x, y )xp y q
Where n is the number of points in the contour and I (x, y ) is the brightness value of the pixel at position (x,y). So the centroid can be calculated as follows: xc = m01 m10 , yc = m00 m00
The orientation of an object can be described as the tilt angle between the x-axes and the major axis of the object (which can be seen in Figure 4.6). This corresponds to the eigenvector with minimal eigenvalue. In this direction the object has its biggest extension. The orientation can be calculated like this: m20 m00 m11 b= m00 m02 c= m00 2b ac+ 4b2 + (a c)2
a=
= arctan
This angle points to the direction of the biggest extension, so if there is an arm in the DI image, it points in this direction. But if there is no arm in the image it points to the longest nger seen in the image, which should be typically the middle nger.
Andreas Holzammer
36
4. Algorithms
Figure 4.6: Orientation angle theta, derived from central moments The orientation vector needs to be corrected due to the fact that it could point to the arm or to a nger. If the vector points to the direction of the arm the vector needs to reversed. To determine if the vector points to the arm, or to a nger a vote is done. If more then half of the nger vectors are pointing in the direction of the orientation vector the orientation is accurate; if not the vector needs to be rotated by 180 degrees. The vectors of the ngers can be derived from the centroid and the points of the ngertips. The orientation vector can be derived from the centroid and the angle . Angles between the orientation vector and the ngertip vectors are calculated with the dot product of two vectors as follows: cos = ab |a| b This calculates the smallest angle between these two vectors a,b. This means the angle is either clockwise or counter clockwise. If the angle exceeds 90 degrees it is pointing in the opposite direction. If more than half of the ngertip vectors do not point in the same direction, the orientation vector needs to be reversed. To reverse the orientation vector 180 degrees are added to the angle.
4.3 Post-processing
The detected features normally needs to be post-processed in the following manner. The objects that are detected in the individual images need to be tracked over the time. Also, the position needs to be transformed from image space to the screen coordinate space.
4.3.1 Undistortion
Due to the fact that we use a sheye lens for the camera, the pictures taken by the camera come with a barrel distortion eect. For correcting this distortion a camera calibration has to be done. One approach would be a linear approximation of this eect, but the eect is not linear, so only very small windows can be used. This means the user needs a lot of calibration points in the calibration process to get a sucient calibration result. To correct this eect suciently the camera calibration was done with the Camera Calibration Toolbox for Matlab [7]. For the process approximately 20 images of
October 22, 2009
4.3 Post-processing
37
Figure 4.7: Example image of checkerboard

"Old" touch "New" touch
Up
Down
Moved
Figure 4.8: States of a touch, derived by tracking touches a planar checkerboard (dierent angles, scales, etc.), which cover about 1/4 of the image, are needed. An example image is shown in Figure 4.7. The Camera Calibration Toolbox for Matlab then calculates the specic parameters of the lens. With this data a distorted image can be undistorted, but to avoid performance issues, only the positions of the touches and hands are undistorted.
4.3.2 Tracking
The detected touches and hands need to be tracked over the time. For multi-touch interaction it is not enough to detect touches in each frame, because we want to know how a touch has moved. The following information is needed if a new touch was introduced, a touch has moved or a touch has left, which is shown in Figure 4.8. For deriving this information the touches of the previous detection round are required, which we call old touches; the current detected touches are called new touches. The new touches need to be assigned to the old touches to derive the information stated above. There are several approaches for tracking positions of objects. One is the stable marriage problem algorithm, which was created by Gale and Shapley in 1962 [21]. Another simple solution is to nd for each new touch the closest old touch. If these two touches are close together it can be stated that these two touches belong together and the touch has actually just moved. We have chosen this approach because it is easy to implement.
Andreas Holzammer
38
4. Algorithms
The algorithm is altered for the matching of FTIR and DI, as follows. The user is mostly pushing with pressure, if he initially touches the screen and later on when he drags the nger the pressure gets lesser. Because of this fact the FTIR technology loses the track of the touch. The DI technology is very sensitive and produces a high false detection rate, which can be very disturbing to the user as Wigdor et al. [55] state. Therefore, only new FTIR touches are kept. If a new DI touch is detected, which cannot be matched with an old touch, it is ignored.
4.3.3 Calibration
To transform the position of touches and hands from image space to the screen space a calibration is needed. The screen coordinates are normalized to a scale between 0 and 1 in width and height, so that various display techniques can be used for the image displaying. For calibration purposes a calibration tool on the client side is used to calculate the transformation. To do so, the user needs to push nine points. These nine points are sucient because the barrel distortion eect is removed beforehand. With these nine points a perspective transformation is calculated and the touch points are transformed with that perspective transformation.
October 22, 2009
Chapter 5 Combining Frustrated Total Internal Reection and Diused Illumination
In this chapter the combination of Frustrated Total Internal Reection and Diused Illumination is discussed. First the resulting images of the individual techniques are discussed and afterwards it is discussed how these images can be combined and how the data of the individual techniques can be combined. Figure 5.1 shows the basic hardware approach of a combination. Each of the technologies has its own weaknesses. The FTIR technology has its strength in touching, but has problems with less pressure. The issue is that if the user rst touches the surface he applies pressure, but if he drags the nger less pressure is applied and then the FTIR eect is not working anymore. The DI technology has the problem that it cannot be determined whether the user is really touching the surface or not, but it is very sensitive. If the surface is not even illuminated the sensitivity is varying depending on the position.
Figure 5.1: A combination of Frustrated Total Internal Reection (red) and Diuse Illumination (blue).
39
40
5. Combining Frustrated Total Internal Reection and Diused Illumination
Figure 5.2: Comparison of hand touch with pressure DI (left) FTIR (right). Colormap (shown at the top) has been changed for the window to show the contrast.
Figure 5.3: Comparison of hand touch with no pressure DI (left) FTIR (right)
5.1 Images of Frustrated Total Internal Reection and Diuse Illumination

The resulting images of the capturing process for an FTIR image are shown in Figure 5.2 and Figure 5.3. The shown images are taken with 1.2 ms capturing time and 12 db gain. These images are also preprocessed in the way that a reference image has been subtracted (which is described in [3]) and the background was removed (described in section 4.1.3). As one can see if no pressure is applied the FTIR image shows almost no touch information (contrast too low between touch and noise). This is because the latex and the protective foil need to be pushed to the acrylic plate in the way that the gap between latex and acrylic plate is lled. The projective foil is 0.2 mm thick and also needs to be dented. If the protective foil is thinner, the better it can be pressed to the latex. Remember that the protective foil was needed because of the friction of the ngertip on the latex and to avoid grease getting onto the latex. Also, the thickness of the compliant layer plays a role on how hard the user has to push the surface to produce the FTIR eect. The information which can be extracted from the images is quite dierent. The
October 22, 2009
5.2 Processing Pineline
41
Figure 5.4: Comparison of at hand touch DI (left) FTIR (right)
Figure 5.5: Comparison of touches close together, DI (left) FTIR (right) FTIR image illustrates the actual touches of the nger, but gives no information about how these touches belong together. The DI image illustrates the contour of the hands and some depth information by the gradients in the image. Depth information also can be approximated by the brightness at a certain spot of the image. But these informations are very rough, because each material has dierent infrared reection characteristics. The human skin reects the infrared light fairly good [58]. If the user wears a long shirt, a sweater or even a watch the image looks very dierent. If the hand is laid at on the surface almost no information can be extracted from the FTIR image, as illustrated in Figure 5.4. Parts of the palm can be seen and a few ngers that are very dark. On the other hand, in the DI image the hand can be seen clearly. When two touches are close together it is dicult to separate them from each other in the DI image in contrast to the FTIR image; there the dierence between these two touches can be clearly seen (Figure 5.5). The human eye can determine these touches, but the computer needs a few steps to detect the touches.
5.2 Processing Pineline

The processing pipelines for FTIR and DI are slightly dierent. For an FTIR setup the captured image is preprocessed by subtracting the ambient light and background,
Andreas Holzammer
42

Ambient Light Subtraction Background Removal Blob Detection Tracking Transformation Application Ambient Light Subtraction Background Remove Fingertip Detection Tracking Transformation Application
Figure 5.6: Normal FTIR Pipeline shown on the left, a normal DI Pipeline is shown on the right
Figure 5.7: left: Hand image with FTIR and DI LEDs on, right: Hand with FTIR and DI image multiplied
which was discussed in section 4.1. After this, bright spots are detected and then the coordinates of these spots are post-processed to track the touches and transformed to match the screen coordinates. The transformation includes the undistortion and calibration. For a DI setup there are several pipeline models possible. The general pipeline order is that the captured image is preprocessed with subtraction of ambient light and background and then ngertips are extracted, as described in section 4.2.3. These coordinates are post-processed like in the FTIR pipeline. For detecting ngertips a bandpass lter can be applied and then bright spots can be detected as ngertips. Another method is to evaluate the hand contour as described in section 4.2.3. Sample pipelines for FTIR and DI are illustrated in Figure 5.6.
5.3 Combination
One way of combining the FITR and the DI eect is described by Weiss et al. in 2008 [54]. They built a table on which the user can put silicone widgets, to work with and get a haptic feedback. They can change the labeling of the widgets by changing the image projection to the silicone widgets, because they call them Silicone ILluminated Active Peripherals (SLAP). They have the FTIR and DI LEDs switched on at same time. Their table cannot switch the individual LEDs on and o at the same time, in contrast to ours that can be switched on and o
October 22, 2009
5.4 Matching
43
Figure 5.8: Left post-processed image where FTIR and DI LEDs are switched on (threshold applied) and right a multiplied FTIR and DI image (additional threshold of 70 applied) individually. If there are individual images, these images can be multiplied, as shown in Figure 5.7, in contrast to turning on FTIR and DI LEDs at the same time. In the multiplied image each pixel is multiplied with the corresponding pixel in the other image. As one can see, the ngertips get very bright, but also noise can be seen where the hand is placed. It can be assumed that in the FTIR image the pixels where the hand is at are not zero. Due to the fact that two 8-bit brightness values are multiplied and the result is written again in an 8-bit brightness value, data is lost. A bigger brightness depth is needed or the result needs to be scaled to t into an 8-bit value. Due to the fact that we process 8 bit greyscale images we want to stick to the 8-bit. Practical tests have shown that a scale factor of 0.2 is sucient for our setup. The resulting images can be seen in Figure 5.8, where the touches can be clearly seen. The issue with this technique is that a time gap between the FTIR and DI image exists. If the user is moving his hands, which is typical for touch displays, the images are not matching exactly. If the speed increases the blobs are getting smaller due to the dierence. This goes so far that at a certain speed no blob can be detected. Therefore, this method is not used, because it is very dicult to extract touch information from these images.
5.4 Matching
Another approach is to match the results of the FTIR and DI eect. As stated above, the FTIR eect can be used to detect the actual touches, but only at a certain amount of pressure. On the other hand a DI image gives the information about the contour of the hand; the ngertips can be extracted and touch points can be approximated with the help of the extracted ngertips as stated in section 4.2.3. The results of the FTIR eect can be seen as stable information, because the user actually needs to touch the surface to produce a touch; therefore this information is preferred. To match both informations, the touches extracted from the FTIR images are taken as base information and tested against the contour extracted touches from the DI image. This is done with the algorithm cvPointPolygonTest of OpenCV 1.1pre [31]. Algorithms to determine if a point is inside a polygon are described in [23]. All the touches of the FTIR image are tested if they are inside a hand contour and are grouped together to a hand. If there are ve touches associated, this hand is complete and does not need to get possible ngertips from the DI image.
Andreas Holzammer
44
If there are more than ve FTIR touches associated with the hand contour the biggest touches are removed from the hand, because if there are more than ve touches in an FTIR image it is possible that the user has put his hand at on the surface and the palm of the hand is producing the extra touches, which are normally bigger than touches of ngers. If there are fewer then ve touches inside a contour, the extracted ngertips of the contour are checked against the FTIR touches. If an extracted ngertip is below a certain distance to an touch, it is assumed that this extracted ngertip and touch are produced by the same nger. Therefore, this is not a new touch which can be associated with the hand and is not added to the hand. If there is no touch close to the ngertip, the ngertip is added to the hand. But it can be that the ngertip-detection algorithm detects a false ngertip and then the hand has more than ve ngers and normally a human does not have more than ve ngers on one hand. The brightness of the ngertip touches are calculated as a spin-o product of the mean shift algorithm described in section 4.2.3. It is more likely that bright ngertips are really a ngertip. Therefore, only so many touches are added to the hand so that it has ve ngers. Due to the fact that pressure is needed to produce bright spots in the captured image, it can be assumed that the user actually touches the surface. The extracted ngertips of the DI image can be extracted even if the user is not touching the surface, so it is not clear if the user is above the the surface or touching it without pressure. So a height of a touch can be approximated as follows:
Touch in FTIR image only, height is 0 Touch is in FTIR image and corresponding ngertip in DI image, height is 0 Fingertip in DI image only, height is 1
A more accurate way would be to take the brightness of the ngers/touches to approximate the height, but for doing so an even illumination of the surface is needed and that the objects are reecting infrared light similar. This is not established, because it needs a lot of calibration work and a good normalization of the surface illumination is needed.
October 22, 2009
Chapter 6 DIFTIRTracker
The software for an optical multi-touch device is the most important part, because this software is analyzing the captured images of the camera to extract the touch information. With the algorithms described in the last chapter an application can be created to determine and track touches. By looking at some other tracking software, it comes clear that we want to develop our own software. In the Appendix is a short overview of tracking software developed by various authors. The DIFTIRTracker was developed to rapidly test all kinds of technologies and to be exible in order to integrate new functionalities. This software should process data from the hardware and control the hardware in real-time. The idea for the software was to split everything into small parts (modules), so that the software can be easily extended and most modules can be reused. All those parts should have the possibility to display what these modules are doing (show what happened to the image, or what was detected). These modules can be connected to each other in various ways; the only limitation is that each input of a module has only one other connected module. There are three categories of modules: input, lter and output modules. Input modules get a datagram from an external source, e.g., a camera, video le, image le. Filter modules take data, process it and send it to the next module. Output modules send, save etc. a datagram. A datagram normally represents an image, but it is not restricted to image datagrams; also touch datagrams and even others can be implemented. Each module is working in its own thread so that the application can take advantage of multi-core systems and is very scalable.
Figure 6.1: DIFTIRTracker when it is started
45
46
6. DIFTIRTracker
Figure 6.2: Parts of the DIFTIRTracker
The DIFTIRTracker is written in C++ and uses the Qt framework [45] from Nokia. This combination is chosen because the application should run in real-time and the software should be platform independent. The only reason why it is not platform independent is the camera module, because the software development kit (SDK) from Point Grey [49] is not platform independent.
6.1 Graphical User Interface

The Graphical User Interface (GUI) consists of the following parts: diagram scene, module selection, properties window, action control, log window and menus as shown in Figure 6.2.
6.2 Pipeline
Due to the fact that every module is running in its own thread all the modules have to be synchronized. This is done by pipelining the modules. Each connection between modules is represented by a ring buer. If a module is done with its processing of the datagram it puts the resulting datagram into the ring buer, wakes the next module in the pipeline. This next module takes the datagram out of the ring buer and processes it and puts it again in the next ring buer and so on. A datagram can only be sent into one direction, but a circle connection can be built for feedback. The modules can wait for datagrams in a ring buer or can determine if there is any datagram in the ring buer.
October 22, 2009
6.3 Network Interface
47
0 Hand
1 Touch
2 Touch
3 Touch
4 Touch
5 Touch
... ...
Figure 6.3: Hand TUIO Package
6.3 Network Interface

The touch server sends the touch/hand information through the network. To serve more applications the TUIO protocol described by Kaltenrunner et al. in [35] is used to transfer the data from the touch server to the client application. The protocol has been implemented in various platforms and programming languages like Java, C++, PureData, Max/MSP, SuperCollider and Flash. The protocol uses User Datagram Protocol (UDP) packets to send the data through the network, which is faster than the Transmission Control Protocol (TCP), but not as reliable. It is very important that the latency is low, because the user wants to interact with the computer and they will think if the reaction of a touch is too slow that the touch was not recognized and then repeat it. The TUIO protocol has two main types of data; the protocol describes touches and objects, but custom proles can be implemented too. There is a reference implementation of the TUIO protocol, which implements the 2D touch and object prole of the protocol. Due to the fact that a hand can also be seen as an object, and objects in the TUIO protocol have a position, symbol ID and a angle, it matches the requirements we need for the hand. Objects can be used to describe a hands position and orientation, which cannot be done with the touch, because it has no angle. But the hand has more information than a position and orientation, it has ngers that belong to the hand. A touch in the TUIO protocol has no symbol ID, which can be set to produce a relation between the hand and the touches, but objects have a symbol ID, so a nger is also described as an object. A package of a hand and ngers is introduced. Due to the fact that a normal human has ve ngers the package needs six objects, a hand object and ve nger objects. Figure 6.3 shows such a package. These packages are followed by each other, so it can be derived which object is a hand and which are the corresponding ngers, by dividing the symbol ID by six, and examining the rest; 0 is the hand and 1 to 5 are touches. To use the multi-touch table with objects and hands together, the starting symbol ID for hands is increased so as to not to interfere with other possible object symbol IDs. A integer with 32 bits is used for the symbol ID; therefore there are plenty of IDs and not all IDs are needed for objects. A number of 1000 objects which can be used on the surface seems to be sucient. All objects with a symbol ID above 1000 belong to a hand, and hand IDs are reused if the hand leaves the surface, so there should be no ID overow.
Andreas Holzammer
Chapter 7 Results
In this chapter, an application is presented that uses the additional information derived from the combination from the FTIR and DI eect, as proof of concept. Afterwards, an informal user study is presented where the results of the technologies which use the FTIR, the DI eect as well as the combination of both are looked at. Finally the conclusion of this thesis and its outlook for the future is given.
7.1 Proof of Concept

A small demonstration application has been created as proof of concept. It uses the information of the hand and the touches at the same time, and the depth information that is provided by the touch server. It is written in C++ and uses the Qt cross-platform development framework [45]. This demonstration application can run on various platforms, including Microsofts Windows [44], Linux and Apples Mac OS X [5]. Bollensdor presents in his diploma thesis [6] a menu where the user can choose menu entries by laying his hands down, lifting his ngers again and the last nger lifted chooses the menu entry. Labels of the menu are not adjusted to the users position. The developed proof of concept is very similar to this, a user can choose colors by laying his hand on the surface. If the user puts his full hand spread out on the surface the colors are shown as texts placed over the ngertips of the hand. The user can choose the color he wants to pick by pressing the specic nger harder. The last pushed nger chooses the color. A chosen color is then shown in another color, so the user knows which color is chosen (here yellow).
Figure 7.1: Demonstration application of a hand menu
49
50
7. Results
d1
d2
Figure 7.2: Determination if the hand is a right or a left hand To make the labels readable to the user they need to be adjusted to the user position. If we only had the information of the touches this would be impossible. Also, the information that these touches are belonging to one hand is needed to provide this kind of menu. The orientation of the hand is used to adjust the label position in the way that the user can read it. It makes no dierence where the user is standing when they want to use the menu. The user can be in front or sideways of the table to use the menu. The label is also tilted by 45 degrees to reduce the overlap of the labels. Due to the fact that the ngers are not sorted by the touch server, the labels of the menu would be unsorted too. The user is very confused if he wants to go into the menu and every time the menu is sorted dierently. So the ngers need to be sorted, that the user has the same menu each time he opens it up, by laying his hand down. To sort the ngertips of a hand, the hand orientation is used to produce a vector from the centroid in the direction of the arm. This can be done with the orientation provided by the touch server by rotating the vector by 180 degrees. Vectors for the ngertips can be created by taking the centroid and the coordinates of each individual ngertip. Afterwards the angles are calculated between the arm vector and the ngertip vectors. Due to the fact that with the help of the dot product a minimal angle between two vectors can be calculated, the ngertips cannot be sorted by this angle. To calculate the clockwise angle, it is checked if the dot product has calculated a clockwise angle or a counterclockwise angle. If it is a counterclockwise angle, the angle is corrected. The result is sorted. Later on the ngertips can be used for the labeling, so that the user always has the same order of the labels in the menu. It can be determined if the hand is a right or a left hand. A left hand has a thumb that is pointing in the right direction as shown in Figure 7.2, and the right hand thumb is pointing to the left. The thumb is also positioned closer to the wrist than the other ngers. So the thumb and the index nger have mostly a higher distance between each other than the pinky and the ring nger. If the ngers are sorted clockwise and the distance between the rst and the second (d1) are greater than between the last and the previous (d2) it is a right hand, and if not it is a left hand. Figure 7.2 shows a left hand, where the two distances are labeled d1 and d2. Due to the fact that it is known which hand it is, the labels can be adjusted to
October 22, 2009
7.2 Informal User Study
51
this information. For a right hand the labels can be tilted to the right, and if it is a left hand the labels can be tilted to the left. It is also possible to show dierent menus for the right and the left hand, for example, with the right hand a color can be chosen and with the left hand a brush size.

We have presented various ways for detecting multiple touches. Users have to operate such multi-touch surfaces, therefore users can state if these technologies t their expectations. A informal user study has been done to determine which technology works best for them. We hypothesized that the combination of the FTIR and DI eect will be preferred by the users, because it uses the advantages of the FTIR eect and the DI eect. The participants were asked to use Community Earth (shown in Figure 7.3), which is a multi-touch implementation of Nasa World Wind [2] written by Ang et al. [4]. Community earth shows a virtual globe where the user can pan, rotate, zoom and tilt the globe. The globe is panned with a touching nger, with two ngers the user can zoom and rotate the globe and it can be tilted if the user uses three ngers, as shown in Figure 7.4.
Figure 7.3: Community Earth Another application has been created to test the blob detection and tracking rate of the implemented technologies. A maze application is found sucient enough for that task, which is written in C++ with the QT framework of Nokia. The maze can be solved with one or two ngers. To solve the maze the user has to push the starting circle (green) and has to drag his nger through the maze to the end point (blue). If the user is hitting the wall with his nger he has not completed the maze and has to start over again. If the touch detection or the tracking fails, the touch is lost and the user has to start over again too. With two ngers the user pushes the start circle with two ngers and drags them to the end point. A restriction with two ngers is the distance between ngers, when the user exceeds the maximum distance he has to start over again. Statistics over the successful and unsuccessful tries are presented in the upper left corner. The maze itself can be
Andreas Holzammer
52
7. Results
drawn with this application too. The maze has been created to be easily solved and only as big as it could be solved without changing the stand position. The maze should be easy to solve, because it is not the task for the users to solve the maze, they should test the touch detection and tracking. The size of the display is so big that the full screen cannot be reached from one standpoint; that is why the maze is not created to ll the full screen. Seven unpaid participants (six male and one female) in the age of 20 to 28 were asked to participate in the informal user study. All users were right handed and their education level varied from high-school student to post-graduate degrees. None of the participants had experience with multi-touch devices. The following ve dierent processing pipelines were tested.
FTIR pipeline DI with hand detection pipeline DI with a bandpass lter pipeline FTIR matched with DI with a bandpass lter pipeline FTIR matched with DI with hand detection pipeline
The pipelines which were used, are shown in Figures 7.6 and 7.7. Ambient light subtraction is not used for pipelines that involve the DI technology, because most of the DI pipelines make heavy use of modules that need a great deal of performance, which makes the latency a little bit better. The participants where asked to test three to four of those processing pipelines with the Community Earth application and with the maze application. In the Community Earth application the users were rst introduced to the application by showing the gestures illustrated by Figure 7.4 and later on they were asked to navigate to their homes. Afterwards, with other pipelines, they should nd other places like New York, Paris etc. In the labyrinth application the users should try to solve the labyrinth with one and two ngers. The participants were observed while doing their tasks and afterwards informally interviewed about their opinion on how well each detection pipeline is working. Almost all participants were comfortable with the multi-touch interface after a short starting phase; only one had problems after a while with the navigation
Pan
Zoom in
Zoom out
Rotate
Tilt
Figure 7.4: Gestures used by Community Earth, red circle indicates starting point
October 22, 2009
53
Figure 7.5: Labyrinth application

FTIR pipeline
Ambient Light Subtraction Background Removal Blob Detection
Tracking
DI, bandpass filter pipeline

Background Remove Image Normalization Fingertip Detection
DI, hand detection pipeline

Background Remove Image Normalization
Bandpass Filter
Blob Detection
Tracking
FTIR
DI
DI
Tracking Transformation
Application
Transformation
Application
Transformation
Application
Figure 7.6: Pipelines with non combined technologies
gestures. Two participants used their middle nger to navigate, which was a problem for the DI technology, because the participants where putting other ngers on top of the middle nger, which changed the characteristic of the nger. In Community Earth a problem for the users was that the zooming point is always the middle of the screen, in contrast to the middle of the ngers, which is what they expected. The zooming in Community Earth is logarithmic, because if the user is far away he gets to the position he wants to go to faster, but many participants had problems with this eect, because it goes to fast. The participants were disorientated if a false touch was detected, because Community Earth can be very sensitive with zooming and moving. The position can jump fairly wide if a false touch is detected. Therefore, the participants were very disturbed by this. Most of the participants preferred the simple FTIR technology, because of its low false detection rate, with no jumps and accidental zooming and tilting. Just one of the participants preferred the DI with bandpass lter pipeline. The matching of the two technologies was for many too sensitive, even with new DI touches dropped (described in section 4.3.2). If the latency was too high, the participant was wondering why he was zooming so far in or rotated too far. But the false detection rate depends on the ambient light conditions that were present at the time the
Andreas Holzammer
54
FTIR matched with DI with a bandpass filter pipeline
Background Remove Blob Detection
7. Results
FTIR matched with DI with hand detection pipeline
Background Remove Blob Detection
FTIR
DI
Background Remove Image Normalization Bandpass Filter

Blob Detection
FTIR
DI
Background Remove Image Normalization Fingertip Detection
Match Blobs and Hand Tracking
Match Blobs
Tracking
Transformation
Application
Transformation
Application
Figure 7.7: Pipelines with combined technologies participants were performing the informal user study. In the labyrinth applications users had problems with the hand detection pipeline, because they did not spread their ngers far enough or were touching from above so that the nger cannot be clearly seen in the DI image. Overall it can be said that the participants preferred technologies which have less false detection and are accepting less detection sensitivity. In contrast to the hypothesize, which stated that the combination of the FTIR and DI eect will match their expectations, the single FTIR technology worked best for them. This had several purposes, like the longer latency of the combined technologies and the higher false detection rate.
7.3 Conclusion
We have presented a several methods to combine the Frustrated Total Internal Reection and Diused Illumination eect. Therefore, we gave an overview of multi-touch technologies and discussed why FTIR and DI should be combined. Next we looked at the hardware setup, which is needed for the combination and discussed why we took these hardware parts. After all the algorithms have been presented, which are needed for the preprocessing, feature detection and the post-processing of the data which is captured by the camera. The following part of the thesis dealt with the combination of the informations gathered by the Frustrated Total Internal Reection eect and the Diused Illumination information. It was discussed whether it is better to combine the information in the pre- or the post-processing step. An application for rapid testing and debugging was developed to detect and track touches for various optical technologies. This easy-to-use application can even be used by people with no programming skills, because the user can visually combine standard modules. To test the developed methods for combing the technologies, a small informal user study has been performed. Seven participants were asked to use the multi-touch table with two applications, Community Earth and a self-built
October 22, 2009
7.4 Future Work
55
maze application. A demonstration application has been created to show how the extra information from the DI technology can be used for user interaction. The user can lay his hand down on the surface to open up a user context menu to choose colors by pushing the specic nger. Experiments have shown that ambient light reduces the contrast of touch information to the background signicantly. Therefore, the touch sensivity is less than without ambient light. It has been shown that not only touch informations can be derived from DI images, also the connection of touches to a hand and the hand orientation can be derived, which can be used to adjust user interfaces in that way that the user can see it the right way (not upside down), wherever he is standing. The informal user study has shown that many people are very disturbed by a false touch detection and accept a less sensitivity in order of higher precision. Therefore, the users of the study preferred the more stable recognition with the FTIR eect.
7.4 Future Work

Suggestions to enhance the touch detection and other combinations of technologies could be done, like a combination of an optical and electrical method. Some of the electrical methods can measure pressure which could also be included in various applications. An infrared camera that can capture heat could be used to overcome the problem with the ambient infrared light from the sun and other sources, which is already described by Sato et al. [47]. The informal user study showed that the user is disturbed by a long latency of touch detection. Therefore, the performance of the pipelines with long latency should be enhanced, by integrating various modules in one module. This can enhance the performance, because the scheduling of the threads requires time and could even be in the wrong order. In addition, the touch server could be replaced by a more potential computer and another camera can be used with a higher frame rate, to enhance the latency. Furthermore, for providing additional informations about touches, like the connection between hands and touches to the application, the TUIO protocol could be extended. A hand type could be introduced, to provide the application with information about the hand, e.g., if it is a left or a right hand and the orientation of the hand. More eort needs to be put into multi-touch applications or even multi-touch operating systems, because the normal applications/operating systems are not sucient enough for multi-touch interaction. The buttons are normally too small to be precisely touched or are even at a false position for touching. Furthermore, the applications and operating systems do not support multi-touch gestures or even the the operation at various positions and rotation. A gesture set should be standardized, so that the users can get used to multi-touch interfaces and do not have to learn specic gestures for each application. Due to the high false detection rate with the DI technologies, more work has to be put into segmentation of hands and background in DI images and shape analysis. Furthermore, other hand-detection algorithms could be reused or created to get better results for the DI technology.
Andreas Holzammer
Chapter 8 Appendix
8.1 Several Spectra of Infrared Bandpass Filters
Here a selection of transmission curves of infrared lters are printed.
Figure 8.1: Spectrum of one overexposed photo negative. Figure taken from [40]
Figure 8.2: Spectrum of two overexposed photo negative. Figure taken from [40]
57
58
8. Appendix
Figure 8.3: Spectrum of one oppy disk. Figure taken from [40]
Figure 8.4: Spectrum of two oppy disks. Figure taken from [40]
October 22, 2009
8.2 Projector list

Projection lens F = 2.6, f = 6.97mm F = 2.59, f = 6.97mm @ 0,73m F = 2.5 F =3.5, f =14.9mm F/2.52 2.79,f=22.55 27.06mm F/2.52 2.79,f=22.55 27.06mm F = 2,6 / f = 8,37 F2.5 Native Resolution 1024 x 768 1024 x 768 1280 x 800 1024 x 768 1024 x 768 1024 x 768 1024 x 768 1024 x 768 1024 x 768 1024 x 768 1024 x 768 1024 x 768 1024 x 768 F=2,5 mm f=10 mm Brightness 2500 Lumen 2500 Lumen 2600 Lumen 1500 Lumen 3500 Lumen 4000 Lumen 4700 Lumen 4300 Lumen 2300 Lumen 3000 Lumen 2500 Lumen 2700 Lumen 2000 Lumen Contrast Ratio 2000:1 2500:1 1600:1 1200:1 3500:1 3000:1 3000:1 3000:1 2000:1 1600:1 400:1 1200:1 1000:1 Throw Ratio 0.60:1 0.60:1 0.69:1 0.60:1 0.063:1 1.66 2.0:1 1.3 1.57:1 1.3 1.57:1 0.56:1 0.68:1 0.28:1 0.039:1 0.9:1
8.2 Projector list

Projection Distance 0.5m 3.7m 0.5m 3.7m 0.9m 1.49m 0.5m 1.25m 0.064m 0.659m 1.2m 10 m 1.2m 10 m 1.4m 2.1m Lamp lifetime 4000h(ECO)/3000h(Bright 4000h(ECO)/3000h(Bright 3000h(ECO)/2000h(Bright 3000h(ECO)/2000h(Bright 4000h(ECO)/3000h(Bright 3000h(ECO)/2000h(Bright 3000h(ECO)/2000h(Bright 3000h(ECO)/2000h(Bright 3000h(ECO)/2000h(Bright 3000h(ECO)/2000h(Bright Mode) Mode) Mode) Mode) Mode) Mode) Mode) Mode) Mode) Mode) 0.5m 5.5m 4000h(ECO)/3000h(Bright Mode) Distance for our table 0.72m 0.72m 0.828m 0.72m 0.1m 1.992m 1.56m 1.56m 0.67m 0.828m 0.42m -0.04m 1.35m
Manufacturer Acer Optoma 3M 3M NEC Optoma (Lense) Optoma (Lense) Optoma (Lense) Toshiba 3M Hitachi Sanyo BenQ
Model S1200 EX525ST SCP740 DMS700 WT610 EP-780 EP-782 EP-776 EX-20 SCP717 CP-A100 PLC-XL51 MP522 ST
Projection System DLP DLP DLP DLP DLP DLP DLP DLP DLP DLP LCD LCD DLP
Manufacturer Acer Optoma 3M 3M NEC Optoma (Lense) Optoma (Lense) Optoma (Lense) Toshiba EX-20 3M Hitachi Sanyo BenQ
Model S1200 EX525ST SCP740 DMS700 WT610 EP-780 EP-782 EP-776 0.6m - 7.m SCP717 CP-A100 PLC-XL51 MP522 ST
Projection Screen Size 4.15@2m or 2.082@1m 1.04m 7.72m 1.27m 2.48m 1.02m 2.54m 1.016m@64mm 2.032m@461mm 1.2m 12m 0.79m 7.97m 0.79m 7.97m 0.5m 1.5m 1.5m 2.5m
Andreas Holzammer
0.69m 7.53m
59
60
8. Appendix
8.3 Software
In this section a few touch and ducial marker tracking applications are presented to give an overview of what has already been done.
8.3.1 Community Core Vision (CCV)

Community Core Vision was created by the NUI Group community [22]. It was previously called tbeta, but before the source code of the software was released they renamed it. It is platform independent. CCV is based on OpenFrameworks [17], so it should be easy to extend. It has support for various cameras and various multi-touch lighting techniques: FTIR, DI, DSI and LLP. Touching data is served to client applications through TUIO/OSC. Figure 8.5 shows a screenshot from Community Core Vision.
Figure 8.5: Community Core Vision (CCV)
8.3.2 CG Tracker
The CG Tracker from the Technical University of Berlin was rst created as part of a project in the winter semester of 2007/08. Afterwards Stefan Elstner wrote a graphical user interface(GUI) as part of his master thesis [16] for the tracker. The tracker was designed for the FTIR technique. It uses its own, newly created network protocol to serve as a multi-touch application. This tracker also tracked patterns of a display, which is placed on top of the table. This software is hard to extend, because it makes heavy use of the Windows application programming interface.
October 22, 2009
8.3 Software
61
Figure 8.6: CG Tracker, which was developed by a project at the Technical University Berlin and Stephan Elstner
8.3.3 reacTIVision
Figure 8.7: ReacTIVision in action. Picture taken from their website [39] ReacTIVision has been developed by Martin Kaltenbrunner and Ross Bencina at the Music Technology Group at the Universitat Pompeu Fabra in Barcelona, Spain. It was developed for the reacTable project, which has already been described in the related work section. It is an open-source, cross-platform computer vision framework to track ducial markers attached to physical objects, as well as touching ngers. It can be used with various cameras. Due to the fact that the software tracks ducial markers, it can only handle optical technologies that support ducial
Andreas Holzammer
62
8. Appendix
markers, like DI and DSI. ReacTIVision implements the TUIO protocol, so all supporting applications can be used.
8.3.4 Touchlib
Touchlib is a library for creating multi-touch interaction surfaces. It is written in C++, works only with Windows, will interact with most types of webcams and has no graphical user interface. The user can build his own touch tracking application fairly easily due to the simple programming interface. Touchlib communicates through the TUIO protocol with various multi-touch applications.
October 22, 2009
VII
Bibliography
[1] Acer. http://www.acer.com, accessed on 09/26/2009 12:00AM. [2] National Aeronautics and Space Administration. World wind. http://worldwind. arc.nasa.gov/java/, accessed on 10/09/2009 3:30PM. [3] Marc Alexa, Bj orn Bollensdor, Ingo Bressler, Stefan Elstner, Uwe Hahne, Nino Kettlitz, Norbert Lindow, Robert Lubkoll, Ronald Richter, Claudia Stripf, Sebastian Szczepanski, Karl Wessel, and Carsten Zander. Touch sensing based on ftir in the presence of ambient light. techreport, Technical University of Berlin, 2008. [4] Linus Ang, Charles Lo Taha Bintahir, and Zhen Zhang Pat King. Community earth. http://nuicode.com/projects/earth, accessed on 10/09/2009 3:30PM. [5] Apple. Mac os x. http://www.apple.com/macosx/, accessed on 09/24/2009 3:15PM. [6] Bj orn Bollensdor. Multitouch navigation and manipulation of 3d objects. Masters thesis, TU-Berlin, 2009. [7] Jean-Yves Bouguet. Camera calibration toolbox for matlab. http://www.vision. caltech.edu/bouguetj/calib_doc/, accessed on 10/01/2009 9:15PM. [8] Bill Buxton. http://www.billbuxton.com/multitouchOverview.html, accessed on 08/06/2009 3:00PM. [9] W. Buxton and B. Myers. A study in two-handed input. SIGCHI Bull., 17(4):321326, 1986. [10] Nintendo Co. Ltd. nintendo ds. 09/22/2009 3:00PM. http://www.nintendo.com/ds, accessed on
[11] D. Comaniciu and P. Meer. Mean shift analysis and applications. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, volume 2, pages 11971203 vol.2, 1999. [12] Computar. http://computarganz.com/, accessed on 09/26/2009 12:00AM. [13] Paul Dietz and Darren Leigh. Diamondtouch: a multi-user touch technology. In UIST 01: Proceedings of the 14th annual ACM symposium on User interface software and technology, pages 219226, New York, NY, USA, 2001. ACM. [14] Rick Downs. Using resistive touch screens for human/machine interface. Technical report, Texas Instruments Incorporated, 2005. [15] Florian Echtler, Manuel Huber, and Gudrun Klinker. Shadow tracking on multi-touch tables. In AVI 08: Proceedings of the working conference on Advanced visual interfaces, pages 388391, New York, NY, USA, 2008. ACM. [16] Stefan Elstner. Combining pen and multi-touch displays for focus+context interaction. Masters thesis, TU-Berlin, 2009. [17] Zach Lieberman et al. openframeworks. http://www.openframeworks.cc, accessed on 09/29/2009 4:30PM. [18] Sony Computer Entertainment Europe. Playstation eye. http://en.playstation.com, accessed on 10/04/2009 10:30AM.
Andreas Holzammer
VIII
Bibliography
[19] David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach. Prentice Hall, us ed edition, August 2002. [20] K. Fukunaga and L. Hostetler. The estimation of the gradient of a density function, with applications in pattern recognition. Information Theory, IEEE Transactions on, 21(1):3240, Jan 1975. [21] D. Gale and L. S. Shapley. College admissions and the stability of marriage. The American Mathematical Monthly, 69(1):915, 1962. [22] NUI GROUP. http://nuigroup.com/, accessed on 09/26/2009 11:00AM. [23] Eric Haines. Point in polygon strategies. pages 2446, 1994. [24] Jeerson Y. Han. Low-cost multi-touch sensing through frustrated total internal reection. In UIST 05: Proceedings of the 18th annual ACM symposium on User interface software and technology, pages 115118, New York, NY, USA, 2005. ACM Press. [25] Jordan Hochenbaum and Owen Vallis. Bricktable. bricktable/, accessed on 10/05/2009 9:45AM. http://flipmu.com/work/
[26] Alexander Hornberg. Handbook of Machine Vision. Wiley-VCH, 2006. [27] Ming-Kuei Hu. Visual pattern recognition by moment invariants. Information Theory, IRE Transactions on, 8(2):179187, February 1962. [28] Apple Inc. iphone. http://www.apple.com/iphone/, accessed on 10/05/2009 3:00PM. [29] Google Inc. Google trends. http://www.google.com/trends, accessed on 09/26/2009 11:00AM. [30] Rosco Laboratories Inc. http://www.rosco.com/us/corporate/index.asp, accessed on 09/27/2009 3:00PM. [31] Intel and Willow Garage. Opencv. http://opencv.willowgarage.com/wiki/, accessed on 09/23/2009 2:00PM. [32] Shahram Izadi, Steve Hodges, Stuart Taylor, Dan Rosenfeld, Nicolas Villar, Alex Butler, and Jonathan Westhues. Going beyond the display: a surface technology with an electronically switchable diuser. In UIST 08: Proceedings of the 21st annual ACM symposium on User interface software and technology, pages 269278, New York, NY, USA, 2008. ACM. [33] Benjamin Walther-Franks Jens Teichert, Marc Herrlich, Lasse Schwarten, Sebastian Feige, Markus Krause, and Rainer Malaka. Advancing large interactive surfaces for use in the real world. Technical report, Digital Media Group, TZI, University of Bremen, 2009. [34] Sergi Jord` a. Interactive music systems for everyone: Exploring visual feedback as a way for creating more intuitive, ecient and learnable instruments. In Proceedings of the Stockholm Music Acoustics Conference (SMAC 03), Stockholm, Sweden, 2003. [35] Martin Kaltenbrunner, Till Bovermann, Ross Bencina, and Enrico Costanza. Tuio - a protocol for table based tangible user interfaces. In Proceedings of the 6th International Workshop on Gesture in Human-Computer Interaction and Simulation (GW 2005), Vannes, France, 2005. [36] Sung K. Kang, Mi Y. Nam, and Phill K. Rhee. Color based hand and nger detection technology for user interaction. Hybrid Information Technology, International Conference on, 0:229236, 2008.
October 22, 2009
Bibliography
IX
[37] Jong-Min Kim and Woong-Ki Lee. Hand shape recognition using ngertips. In FSKD 08: Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery, pages 4448, Washington, DC, USA, 2008. IEEE Computer Society. [38] Kenrick Kin, Maneesh Agrawala, and Tony DeRose. Determining the benets of direct-touch, bimanual, and multinger input on a multitouch workstation. In GI 09: Proceedings of Graphics Interface 2009, pages 119124, Toronto, Ont., Canada, Canada, 2009. Canadian Information Processing Society. [39] R. Bencina M. Kaltenbrunner. reactivision. http://reactivision.sourceforge. net/, accessed on 09/27/2009 6:00PM. [40] Madian. Spectral analysis of ir leds and lters. http://nuigroup.com/forums/ viewthread/6458/, accessed on 10/11/2009 12:30PM. [41] Shahzad Malik and Joe Laszlo. Visual touchpad: a two-handed gestural input device. In ICMI 04: Proceedings of the 6th international conference on Multimodal interfaces, pages 289296, New York, NY, USA, 2004. ACM. [42] Nobuyuki Matsushita and Jun Rekimoto. Holowall: designing a nger, hand, body, and object sensitive wall. In UIST 97: Proceedings of the 10th annual ACM symposium on User interface software and technology, pages 209210, New York, NY, USA, 1997. ACM. [43] Microsoft. Surface. http://www.microsoft.com/surface/, accessed on 09/29/2009 3:00PM. [44] Microsoft. Windows. http://www.microsoft.com/windows/, accessed on 09/24/2009 3:15PM. [45] Nokia. Qt. http://qt.nokia.com/, accessed on 09/24/2009 3:15PM. [46] Nolan. Peau productions. http://peauproductions.com/diff.html, accessed on 10/04/2009 11:30AM. [47] K. Oka, Y. Sato, and H. Koike. Real-time ngertip tracking and gesture recognition. Computer Graphics and Applications, IEEE, 22(6):6471, Nov/Dec 2002. [48] Osram. http://www.osram.com, accessed on 09/26/2009 12:00AM. [49] Inc. Point Grey Research. 11:00AM. http://www.ptgrey.com/, accessed on 09/26/2009
[50] Ilya Rosenberg and Ken Perlin. The unmousepad: an interpolating multi-touch forcesensing input pad. ACM Trans. Graph., 28(3):19, 2009. [51] Johannes Sch oning, Peter Brandl, Florian Daiber, Florian Echtler, Otmar Hilliges, Jonathan Hook, Markus L ochtefeld, Nima Motamedi, Laurence Muller, Patrick Olivier, Tim Roth, and Ulrich von Zadow. Multi-touch surfaces: A technical guide. techreport, Technical University of Munich, 2008. [52] The Imaging Source. Dmk 21bf04. http://www.theimagingsource.com/de_ DE/products/cameras/firewire-ccd-mono/dmk21bf04/, accessed on 10/04/2009 10:30AM. [53] Midwest Optical Systems. Machine vision lter. http://www.midopt.com/, accessed on 09/26/2009 12:00AM. [54] Malte Weiss, Julie Wagner, Yvonne Jansen, Roger Jennings, Ramsin Khoshabeh, James D. Hollan, and Jan Borchers. Slap widgets: Bridging the gap between virtual and physical controls on tabletops. In CHI 09: Proceeding of the twenty-seventh annual SIGCHI conference on Human factors in computing systems, New York, NY, USA, 2009. ACM.
Andreas Holzammer
Bibliography
[55] D. Wigdor, M. Cronin S. Williams, K. White R. Levy, M. Mazeev, and H. Benko. Ripples: Utilizing per-contact visualizations to improve user interaction with touch displays system. In UIST 09: Proceedings of the 22nd annual ACM symposium on User interface software and technology, 2009. and Steven Shafer. Visual [56] Ying Wu, Ying Shan, Zhengyou Zhang, Zhengyou Zhang Y, panel: From an ordinary paper to a wireless and mobile input device, 2000. [57] Duan-Duan Yang, Lian-Wen Jin, and Jun-Xun Yin. An eective robust ngertip detection method for nger writing character recognition system. Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on, 8:49914996 Vol. 8, Aug. 2005. [58] Hongqin Yang, Shusen Xie, Hui Li, and Zukang Lu. Determination of human skin optical properties in vivo from reectance spectroscopic measurements. Chin. Opt. Lett., 5(3):181183, 2007. [59] Zhiwei Zhu, Kikuo Fujimura, and Qiang Ji. Real-time eye detection and tracking under various light conditions. In ETRA 02: Proceedings of the 2002 symposium on Eye tracking research & applications, pages 139144, New York, NY, USA, 2002. ACM.
October 22, 2009

Matlab Opencv Visual PSeye2

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Matlab Opencv Visual PSeye2

Uploaded by

Copyright:

Available Formats

Technische Universit at Berlin Faculty IV Computer Science and Electrical Engineering Computer Graphics Group http://www.cg.tu-berlin.

Combining DI and FTIR for touch detection

October 22, 2009

Combining DI and FTIR for touch detection

Combining DI and FTIR for touch detection

October 22, 2009

Combining DI and FTIR for touch detection

Combining DI and FTIR for touch detection

October 22, 2009

Combining DI and FTIR for touch detection

Figure 1.2: Multi-touch Table of Computer Graphics institute

Combining DI and FTIR for touch detection

1.3 Design of System

Client Touch Server

1.4 Related Work

October 22, 2009

Combining DI and FTIR for touch detection

1.4 Related Work

Combining DI and FTIR for touch detection

Chapter 2 Touch-Sensing Technologies

2.1 Frustrated Total Internal Reection (FTIR)

IR-LED Total Internal Reflaction

Figure 2.1: General FTIR setup

2.2 Diused Illumination (DI)

October 22, 2009

Combining DI and FTIR for touch detection

2.2 Diused Illumination (DI)

Combining DI and FTIR for touch detection

Figure 2.4: General DSI setup

2.3 Diused Surface Illumination (DSI)

October 22, 2009

Combining DI and FTIR for touch detection

2.4 Laser Light Plane (LLP)

Figure 2.5: Basic Laser Light Plane setup

2.4 Laser Light Plane (LLP)

Combining DI and FTIR for touch detection

Figure 2.7: Basic LED Light Plane setup

2.5 LED Light Plane (LED-LP)

October 22, 2009

Combining DI and FTIR for touch detection

2.6 Resistance-Based Touch Surfaces

2.6 Resistance-Based Touch Surfaces

2.7 Capacitance-Based Touch Surfaces

Combining DI and FTIR for touch detection

October 22, 2009

Combining DI and FTIR for touch detection

Figure 3.1: A basic optical hardware assembly

3.2 Old Setup

October 22, 2009

Combining DI and FTIR for touch detection

3.4 Infrared Bandpass Filter

3.4 Infrared Bandpass Filter

Combining DI and FTIR for touch detection

Computar T2Z 1816 CS 1.8 - 3.6 mm F1.6 - F16C 55 cm - 115 cm

Table 3.2: Specication of the lens, informations taken from [12]

October 22, 2009

Combining DI and FTIR for touch detection

Figure 3.5: Calculation of lens distance

Combining DI and FTIR for touch detection

3.7 Infrared Light

October 22, 2009

Combining DI and FTIR for touch detection

3.7 Infrared Light

Combining DI and FTIR for touch detection