Combining Diffuse Illumination and Frustrated Total Internal Reflection for touch detection
by Andreas Holzammer, matriculation number 300708, Berlin, October 22, 2009
Supervisor: Uwe Hahne
Examiners: Prof. Dr. Marc Alexa, Prof. Dr.-Ing. Olaf Hellwich
Erklärung
I affirm in lieu of an oath that I produced this work independently and by my own hand. Berlin, October 22, 2009.
Andreas Holzammer
Zusammenfassung
There are many techniques for detecting multiple touch points on a screen, each with its own advantages and disadvantages. Some techniques require a lot of pressure; others already detect touches before the user has actually touched the surface, or are limited in the number of simultaneous touches. Many of these techniques are used only to determine touch points, although some could also detect hands. A combination of these techniques could unite their advantages and thereby compensate for their disadvantages. This diploma thesis deals with the combination of two optical techniques, Frustrated Total Internal Reflection and Diffused Illumination. These techniques use infrared light that is reflected at fingertips/hands and recorded by a camera. Various techniques are presented, and it is discussed why precisely these two should be combined. Furthermore, a table setup is described that unites the two techniques. For image display, an image is projected onto the tabletop from below. In the course of this diploma thesis, a software was developed that allows the developer to quickly and easily test various techniques for detecting touch points. This software can capture images from a connected camera, preprocess and analyze them, post-process the analyzed data, and send the result to a user application. In addition, more information than just touch points can be extracted, such as the association between touch point and hand, the hand orientation, and the distance between touch point and surface. Finally, a user application is presented that can process this additional information.
Abstract
There are many different approaches for detecting multiple touches on device surfaces, all of which have their own advantages and disadvantages. Some of the approaches require a lot of pressure to be activated; others are activated even if the user is only close to the surface, or are restricted in the number of touches they can detect simultaneously. Most of the technologies are only used to detect touches, but some can also be used to detect hands. To exploit the advantages and overcome the disadvantages of the individual technologies, a combination of technologies should be researched. This thesis presents a combination of two optical technologies, called Frustrated Total Internal Reflection and Diffuse Illumination. These technologies work with infrared light reflected by fingertips/hands and captured by a camera. Other multi-touch technologies are presented, and it is discussed why these two selected technologies should be combined. A tabletop hardware setup is presented, which combines both technologies in one setup. For displaying an image on the touch surface a projector is used, which projects the image from behind. In the process of this thesis an easy-to-use software was developed for rapidly testing the various processing steps needed for the detection process. With this software, images can be captured, preprocessed and analyzed, the resulting information post-processed and afterwards sent to an application. Additional information can be derived from these technologies, such as the affiliation between fingers and a hand, the hand orientation, and depth information of touches. Furthermore, an application has been created that uses this additional information.
Keywords: Touch detection, Multi-touch, Diffuse Illumination, Frustrated Total Internal Reflection, Hand detection
Acknowledgments
First, I would like to thank my parents for their consideration and support as I prepared this diploma thesis. I would also like to thank Björn Breitmeyer for all the support that he has given me over the years in my studies and especially during my thesis preparation. I am grateful to Uwe Hahne for supervising me, for his great support and for helpful suggestions. I also want to thank Jonas Pfeil for his ideas and great lab days. I want to thank Björn Bollensdorf for his assistance with the hardware and for his ideas. I would like to thank Matthias Eitz for the great support he gave us and the interest he took in our project. I also want to thank my brother for supporting me so much and getting me out to go biking from time to time. I want to thank Prof. Dr. Marc Alexa for examining this work and Prof. Dr.-Ing. Olaf Hellwich for co-examining it. I want to express my gratitude to all the people who proofread this thesis: Rudolf Jacob, Melanie Ott and many others. I also want to thank all the others whom I missed.
Contents
1 Introduction
  1.1 Motivation
  1.2 Goal
  1.3 Design of System
  1.4 Related Work

2 Touch-Sensing Technologies
  2.1 Frustrated Total Internal Reflection (FTIR)
  2.2 Diffused Illumination (DI)
  2.3 Diffused Surface Illumination (DSI)
  2.4 Laser Light Plane (LLP)
  2.5 LED Light Plane (LED-LP)
  2.6 Resistance-Based Touch Surfaces
  2.7 Capacitance-Based Touch Surfaces
  2.8 Discussion

3 Hardware
  3.1 Assembly
  3.2 Old Setup
  3.3 Camera
  3.4 Infrared Bandpass Filter
  3.5 Lens
  3.6 Projector
  3.7 Infrared Light
  3.8 Surface Layers
    3.8.1 Compliant Layer
    3.8.2 Projection Screen
    3.8.3 Protective Layer
    3.8.4 Different Surface Layer Setups
  3.9 Switching Circuit
  3.10 Power Supply
4 Algorithms
  4.1 Image Preprocessing
    4.1.1 Bright Image Removal
    4.1.2 Ambient Light Subtraction
    4.1.3 Background Subtraction
    4.1.4 Hotspot Removal
    4.1.5 Image Normalization of DI Images
  4.2 Feature Detection
    4.2.1 Touch Detection
Andreas Holzammer
    4.2.2 Hand Detection
    4.2.3 Fingertip Detection
    4.2.4 Hand Orientation
  4.3 Post-processing
    4.3.1 Undistortion
    4.3.2 Tracking
    4.3.3 Calibration
5 Combining Frustrated Total Internal Reflection and Diffused Illumination
  5.1 Images of Frustrated Total Internal Reflection and Diffuse Illumination
  5.2 Processing Pipeline
  5.3 Combination
  5.4 Matching
6 DIFTIRTracker
  6.1 Graphical User Interface
  6.2 Pipeline
  6.3 Network Interface

7 Results
  7.1 Proof of Concept
  7.2 Informal User Study
  7.3 Conclusion
  7.4 Future Work
8 Appendix
  8.1 Several Spectra of Infrared Bandpass Filters
  8.2 Projector list
  8.3 Software
    8.3.1 Community Core Vision (CCV)
    8.3.2 CG Tracker
    8.3.3 reacTIVision
    8.3.4 Touchlib

Bibliography
List of Figures
1.1 Popularity of the search terms multi touch
1.2 Multi-touch Table of Computer Graphics institute
1.3 Parts of the multi-touch table
2.1 General FTIR setup
2.2 Coupling infrared light into an acrylic plate
2.3 General DI setup
2.4 General DSI setup
2.5 Basic Laser Light Plane setup
2.6 Occlusion of fingers
2.7 Basic LED Light Plane setup
3.1 A basic optical hardware assembly
3.2 Point Grey Firefly MV
3.3 Spectrum of the Point Grey Firefly MV
3.4 Infrared bandpass filter from Midwest Optical Systems
3.5 Calculation of lens distance
3.6 Distortion of the lens
3.7 Principle of ultra-short-throw projector
3.8 Acer S1200 ultra-short-throw projector
3.9 Osram SFH 4250
3.10 Etching layout
3.11 Placement of the infrared illuminators
3.12 Streaks
3.13 Surface layers for an FTIR setup
3.14 Switching Circuit
4.1 Hotspot
4.2 Illumination of the surface
4.3 Convexity Defects of a Hand
4.4 Smoothed and non-smoothed contour
4.5 Dominant point detection
4.6 Orientation angle theta, derived from central moments
4.7 Example image of checkerboard
4.8 States of a touch, derived by tracking touches
5.1 Idea of the thesis
5.2 Comparison of hand touch with pressure
5.3 Comparison of hand touch with no pressure
5.4 Comparison of flat hand touch
5.5 Comparison of touches close together
5.6 FTIR, DI pipeline
5.7 FTIR and DI LEDs on vs multiplied
5.8 FTIR and DI switched on vs multiplied
6.1 DIFTIRTracker
6.2 Parts of the DIFTIRTracker
6.3 Hand TUIO Package
7.1 Hand menu
7.2 Determination if the hand is a right or a left hand
7.3 Community Earth
7.4 Gestures used by Community Earth
7.5 Labyrinth application
7.6 Pipelines with non combined technologies
7.7 Pipelines with combined technologies
8.1 Spectrum of one overexposed photo negative
8.2 Spectrum of two overexposed photo negatives
8.3 Spectrum of one floppy disk
8.4 Spectrum of two floppy disks
8.5 Community Core Vision (CCV)
8.6 CG Tracker
8.7 ReacTIVision
List of Tables
3.1 Specification of the Point Grey Firefly MV
3.2 Specification of the lens
3.3 Specification of the Acer S1200
3.4 Parallel port data pins used for switching
Chapter 1 Introduction
There are several ways to interact with a computer. The oldest interaction method is the keyboard; later, the mouse made a profound impact on computer interaction. Even the Zuse Z3 (1941), the first computer, had buttons to interact with it; later such buttons formed a keyboard. The mouse was invented in 1963/1964 by a team around Douglas C. Engelbart and William English at the Stanford Research Institute (SRI). The mouse enabled the user to point in a 2D space, indirectly manipulating the cursor on the computer monitor. These two methods are still widely used at the present time: almost every computer has a keyboard and a mouse. This adds up to about 60 years of success for the keyboard and 40 years for the mouse. Many other interaction methods have been invented, but no other technology has had as much success. Touchpads were introduced when notebooks became successful. A touchpad is placed beside the keyboard and can normally track only one fingertip; it is also small and has no display technology. Today there are multi-touch touchpads, but with limitations such as the size of the pad and the number of fingers they can detect. Experience has shown that users prefer to interact with the computer in a simple and natural manner. They normally work with their hands, so a natural interface for the hand is needed. The user desires visual feedback from the computer and wants to interact with the displayed content. Obviously it would be convenient if the user could touch the visual feedback itself to interact with the computer. Touchscreens were invented in the late 1960s, but the first commercial touchscreen computer, the HP-150, was not presented until 1983. These touchscreens could detect only one touch point. The user, however, has two hands and ten fingers, and wants to use both hands to interact with his tools.
For example, if a human wants to cut a tree branch into two parts, he holds the branch with one hand and the saw with the other, and then cuts the branch. It is very natural to use two hands to work, although not for all tasks. Why should the user be restricted to using just one finger to interact with a computer? Humans often work very productively with two hands, but on the computer this has not always been so successful. Users would like a user interface that is intuitive and sensitive enough that only little pressure is needed to interact with the device. Multi-touch technology enables the user to employ both hands and even to use the computer together with other people at the same time.
Figure 1.1: Popularity of the search terms multi touch analyzed by Google Trends [29] from the beginning of 2004; peaks are labelled with main events: a) Jeff Han presented the FTIR surface at TED, b) iPhone, c) Microsoft Surface, d) Microsoft Wall, e) iPhone 3G, f) Windows 7 announced with multi-touch support
1.1 Motivation
Multi-touch is an interaction technology that allows the user to control the computer with several fingers. Multi-touch devices typically consist of a touch screen (e.g., a computer display, table or wall) as well as a computer that detects the touches and produces the image. In the last couple of years multi-touch interfaces have become increasingly popular. This popularity can be seen in the search requests on Google for the term multi touch (see Figure 1.1). During the US presidential elections in 2008, the Cable News Network (CNN) utilized a multi-touch screen to present interactive maps displaying the presidential race results in each state. Even though multi-touch technology was initially introduced in the 1970s, it did not gain popularity until Jeff Han presented his low-cost multi-touch sensing technology in 2005 [24]. Bill Buxton gives a good overview of the history of multi-touch technologies on his website [8], where it is very interesting to see which kinds of devices were invented at what time. Han's low-cost multi-touch sensing technology is based on the Frustrated Total Internal Reflection (FTIR) principle, which he rediscovered for detecting touches. In 2007 Apple introduced its multi-touch smartphone, called the iPhone, which uses an electrical effect to detect touches; this can be seen as the second peak in Figure 1.1. The iPhone's multi-touch interaction was embraced by users because of its ease of use. Microsoft then introduced its multi-touch table, called Surface, in 2007 [43]. The table uses a different optical method than Jeff Han's, called Diffused Illumination (DI). This technology can detect objects and interact with them. After that, Microsoft presented a multi-touch wall. In 2008 Apple promoted the second version of the iPhone, which introduced a faster internet connection as well as assisted GPS (A-GPS). Also in 2008, Microsoft announced that the new version of Microsoft Windows would support multi-touch.
User studies have shown that direct manipulation with one or more fingers can increase performance dramatically compared to using a mouse [38]. Hence, if two hands are used instead of just one, an even higher performance can be achieved, as Buxton et al. state [9]. Wigdor et al. [55] state that accurate touch detection is very important for user satisfaction on a multi-touch device. The user becomes frustrated or even loses the sense of control if the system does not respond in the way the user expects. This can have several causes, such as the system not being responsive, the hardware failing to detect the input, the input delivering the wrong location, or the input not mapping to the expected function. The existing multi-touch technologies each have their own advantages and disadvantages.
1.2 Goal
The goal of this diploma thesis is to enhance the touch detection of the multi-touch table at the Technical University Berlin, which is shown in Figure 1.2. The issue with the old table was the touch sensitivity of the panel: users had to push very hard, particularly when dragging a finger on the surface to interact with the table. Many people are not comfortable with pushing very hard while dragging a finger across the surface, especially if the surface is very glossy. On the other hand, a different technology called Diffused Illumination is very sensitive, but with it, it is difficult to sense whether a user is really touching the surface or just hovering above it. The idea is to combine these two technologies to produce sensitive and accurate touch detection. This combination needs to be studied, not only regarding how the touch information is derived, but also regarding how connections between touches and the hand can be established to enhance the human-computer interaction. This information could, for example, be used to approximate how many people are working at the table.
Figure 1.3: Parts of the multi-touch table

The basic idea of the multi-touch table is to have one device containing all the hardware required to detect touches, as shown in Figure 1.3. The computer underneath the table, which we call the touch server, does all the touch detection. The touch server provides all the data detected with the table's hardware to a client computer which runs an application. The client computer processes the data, which is transferred via network.
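The split between touch server and client suggests a simple message-passing design. The following Python sketch illustrates the idea with a hypothetical JSON-over-UDP wire format; it is not the protocol actually spoken by the table (a TUIO-style protocol), just a minimal stand-in to show the server-to-client data flow:

```python
import json
import socket

def send_touches(touches, host="127.0.0.1", port=3333):
    """Serialize a list of (id, x, y) touches as JSON and send it via UDP.

    Hypothetical wire format for illustration only; the real system
    uses a TUIO-style protocol. The message bytes are returned so the
    caller can inspect what was sent.
    """
    msg = json.dumps({"touches": [
        {"id": t[0], "x": t[1], "y": t[2]} for t in touches
    ]}).encode("utf-8")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(msg, (host, port))  # fire-and-forget datagram
    sock.close()
    return msg
```

A client would simply bind a UDP socket on the same port, decode each datagram, and hand the touch list to the application layer.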
which is a flexible and inexpensive multi-touch input device. It is a pad that is pressure sensitive, but the authors state it could also be made transparent as an overlay for displays. It uses a principle called Interpolating Force Sensitive Resistance, an electrical method of sensing multiple touches on a surface. The authors print two conductive layers with wires, one laid out horizontally and one vertically, with a resistive layer between them. When a user touches the pad, these wires are connected, which is measured by a microcontroller. The measurement results form an image of the pressure upon the surface, which is analyzed to extract touch information. In 2003 Jordà et al. [34] created a table, called the reacTable, on which a user can make music with various objects. These objects are recognized by the table and the user can interact with music software to make music. They use the optical method Diffused Illumination to detect fiduciary markers (fiducials) on the objects. First they put markers on the fingers to find fingers with the existing software; later they included normal touch detection as well as touch interaction. Kaltenbrunner et al., members of the same research group, introduced a standardized network protocol for touches and objects in 2005. In 2008 Izadi et al. from Microsoft Research presented a new surface technology called SecondLight [32], in which two projectors are combined to produce one projection image on the table surface and one on an object above the surface. They used a special acrylic plate which can be switched to diffuse at 60 Hz. A combination of the Frustrated Total Internal Reflection effect and Diffused Illumination is used to detect touches and objects at the same time. Weiss et al. introduced in 2009 a multi-touch table which can be used with silicone objects such as buttons, sliders, knobs and keyboards. The labeling of these objects is produced by the projector that is used for displaying the image.
They use a combination of Frustrated Total Internal Reflection and Diffused Illumination for the detection of touches and the silicone objects.
The light rays are injected into the acrylic plate from the edges. If a user touches the acrylic plate, the total internal reflection is interrupted at this point and the light is reflected straight down, because of the higher refractive index of the fingertip. An illustration of this effect can be seen in Figure 2.1. The minimum thickness of the acrylic plate should be 6 mm (depending on the size of the multi-touch surface) to prevent too much bending of the screen. The acrylic plate is normally cut roughly; for efficient coupling of the light into the plate, the edges of the acrylic plate have to be polished. To further enhance the coupling of the light into the edges, the edges can be cut off at an angle of 45°, which is shown in Figure 2.2. Infrared light is mostly used for illumination because the human eye cannot see it. An infrared camera is placed beneath the acrylic plate. Common Charge-Coupled Device (CCD) cameras are sensitive in the infrared spectrum, but they normally have an infrared filter in front of the sensor; color CCD cameras additionally have a Bayer filter in front of the sensor. All these filters disturb the imaging of infrared light, so a CCD camera without an infrared or Bayer filter is required.
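Why the injected light stays trapped inside the plate follows from Snell's law. A short derivation (assuming a typical refractive index of about 1.49 for acrylic, which is not stated in the text) gives the critical angle:

```latex
% Snell's law at the acrylic--air boundary:
%   n_1 \sin\theta_1 = n_2 \sin\theta_2
% Total internal reflection occurs for incidence angles above the
% critical angle \theta_c, at which the refracted ray grazes the
% surface (\theta_2 = 90^\circ):
\theta_c = \arcsin\!\left(\frac{n_2}{n_1}\right)
         = \arcsin\!\left(\frac{1.00}{1.49}\right) \approx 42.2^\circ
```

Rays travelling at angles steeper than this bounce between the plate faces indefinitely. Skin in contact with the plate raises the effective outside refractive index, raising the critical angle locally, so light escapes downward exactly at the touch point.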
2. Touch-Sensing Technologies
Figure 2.2: Coupling infrared light into an acrylic plate, without angle (left) and with a 45° cut (right)

The resulting images are analyzed by a computer vision program, which detects bright spots, which we call blobs, and tracks them. A baffle is necessary to hide the light that leaks from the LEDs mounted at the sides; otherwise infrared light can be reflected directly by a hand towards the camera. This baffle should preferably be made of a material that does not reflect infrared light. Because fingertips have little rills in the skin, the frustration of the total internal reflection takes place only at the skin ridges of those rills, which results in very dark blobs. To overcome this issue a layer, which we call the compliant layer, is needed to close the little air gaps between the rills.
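The blob-detection step described above can be sketched as thresholding followed by connected-component labeling. This is an illustrative Python implementation (not the thesis software), operating on a grayscale image given as a 2D list:

```python
from collections import deque

def find_blobs(image, threshold=128, min_size=2):
    """Find bright connected regions (blobs) and return their centroids.

    image: 2D list of grayscale values (0-255).
    Returns a list of (row, col) centroids, one per 4-connected blob
    of at least min_size pixels.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= threshold and not seen[r][c]:
                # Breadth-first search over the 4-connected bright region.
                queue, pixels = deque([(r, c)]), []
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] >= threshold
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(pixels) >= min_size:  # discard single-pixel noise
                    cy = sum(p[0] for p in pixels) / len(pixels)
                    cx = sum(p[1] for p in pixels) / len(pixels)
                    centroids.append((cy, cx))
    return centroids
```

Tracking then amounts to associating the centroids of consecutive frames, e.g. by nearest-neighbor matching.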
Figure 2.3: General DI setup

The farther an object is from the surface, the more unsharp it appears, and at a certain distance the object cannot be detected anymore. Normally the projection screen that is needed for displaying the image is diffuse enough to produce this effect. It is very important to get a uniform distribution of infrared light across the surface to get good detection results. If the surface is not evenly illuminated, an object at one spot of the surface appears very bright and at other spots very dark, which makes the image preprocessing very difficult or even impossible because of the brightness sampling of the camera. It is very difficult to get evenly spread illumination, which leads to a variance of sensitivity across the regions of the surface. Hochenbaum and Vallis, who constructed the Bricktable [25], say that it is very hard to get a setup that works with the same sensitivity at all spots. Teichert et al. [33] have researched a method to illuminate the surface of a multi-touch table evenly: they used 2520 infrared light-emitting diodes (LEDs), mirrors and local shadowing with a cross-illumination technique. Another approach is to put the illumination in front of the projection screen and track shadows instead of the reflected light, as stated by Echtler in 2008 [15]. This can be a good idea, because sunlight and other light sources emit infrared light, which we call ambient light. But if there is no ambient light it has to be produced; for that, some infrared illuminators have to be placed above the surface. On the other hand, if we are not using shadow tracking, the stronger the external light is, the brighter the background of the captured image gets. It can get so bright that there is no difference between the light reflected from the hand and the ambient light. An acrylic plate is not needed here, but the user needs a hard surface which he can touch to get haptic feedback; glass or other transparent materials can be
used for that purpose. The projection screen can in this case be placed either below the acrylic plate or above it, depending on the touch feeling of the projection screen or of the material used for the haptic feedback. One major advantage of rear Diffused Illumination is that it can be used to detect objects or even fiducial markers.
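A standard remedy for the uneven-illumination problem described above (a sketch, not necessarily the calibration used in the thesis) is to capture a reference image of the empty, lit surface once and divide each captured frame by it, so that a reflecting finger yields the same relative brightness everywhere:

```python
def normalize_frame(frame, reference, eps=1e-6):
    """Divide a captured frame by a reference image of the lit but
    untouched surface, rescaling the result to the 0-255 range.

    frame, reference: 2D lists of grayscale values (0-255).
    Reference pixels near zero are clamped via eps to avoid
    division by zero; output is clipped at 255.
    """
    out = []
    for frow, rrow in zip(frame, reference):
        out.append([
            min(255.0, 255.0 * f / max(r, eps))
            for f, r in zip(frow, rrow)
        ])
    return out
```

After this step, a single global threshold suffices for blob detection even if the raw illumination varies across the surface.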
Laser Light Plane (LLP) is a technology that uses lasers as the infrared source. An infrared light plane is produced by lasers with a line generator in front of each laser. The laser plane should be about 1 mm thick; a normal line generator produces a 120-degree line plane. The laser plane should lie just above the surface. A basic setup is shown in Figure 2.5. Because lasers are used to produce the infrared plane, some safety issues have to be taken into account. The human eye cannot see infrared light, but can be hurt by it: the eye has a blink reflex for visible light, but with infrared light the eye does not respond, and the human does not realize that he is being hurt by the laser. Therefore, only as many lasers and as much power as needed to cover the surface should be used. The technology works as follows: if the user touches the surface, the infrared light from the lasers is scattered at the fingertip towards the camera. The user does not really need to touch the surface to be detected, because the light plane is above the surface. But fingers can occlude the infrared light, so fingers hidden behind other fingers cannot be detected, as shown in Figure 2.6. To overcome this problem, more lasers are needed. The projection screen can be placed either above or below the acrylic plate.
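The occlusion problem can be illustrated geometrically: a fingertip is shadowed when another fingertip lies on the ray from the laser to it. A toy Python check, with fingers modeled as circles and all coordinates hypothetical:

```python
import math

def is_occluded(laser, target, blockers, finger_radius=8.0):
    """Return True if the straight ray from laser to target passes
    through any blocker circle, i.e. the target lies in its shadow.

    laser, target and blockers are (x, y) points in the same units
    as finger_radius (here: arbitrary illustrative units).
    """
    lx, ly = laser
    tx, ty = target
    dx, dy = tx - lx, ty - ly
    length_sq = dx * dx + dy * dy
    for bx, by in blockers:
        # Project the blocker center onto the laser->target ray.
        t = ((bx - lx) * dx + (by - ly) * dy) / length_sq
        if 0.0 < t < 1.0:  # blocker sits between laser and target
            # Perpendicular distance from blocker center to the ray.
            px, py = lx + t * dx, ly + t * dy
            if math.hypot(bx - px, by - py) < finger_radius:
                return True
    return False
```

Adding a second laser in another corner shrinks the shadowed region, which is why using more lasers mitigates the problem.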
Figure 2.6: Fingers can occlude each other. The black touch is occluding the gray touch.
The surface of a surface-capacitance touch panel consists of a thin transparent conductive layer on a glass substrate, which serves as an electrode of a capacitor. The corners of the conductive layer are connected to a voltage source via a sensitive current-measuring system. If a user touches the surface, charge is transported from the conductive layer to the human body. The current drawn from the corners is measured and a position is estimated. Projected-capacitance touch surfaces consist of a capacitive sensor grid, normally placed between two protective glass layers. The sensor grid measures the capacitance formed between the finger and the grid while the user touches the surface. The touch position is derived from the change of the electrical properties of the sensor grid. This method can detect fingertips even if they are not touching the surface, because the electrical properties already change when the finger is close to the surface. This type of panel can be used in rough environments such as public installations, because it can be covered with a non-conductive material without interfering with the touch detection. Due to the sensor grid, multiple touches can be derived more easily compared to the surface-capacitance technology.
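How a position might be derived from such a sensor grid can be sketched as peak finding plus signal-weighted interpolation over the measured capacitance changes. This is an illustrative toy model only; real touch controllers are considerably more elaborate:

```python
def touch_position(deltas):
    """Estimate a touch position from a 2D grid of capacitance changes.

    deltas: 2D list of non-negative capacitance deltas, one per grid cell.
    Finds the cell with the strongest response and takes the
    signal-weighted centroid of its 3x3 neighborhood for sub-cell
    precision. Returns (row, col) as floats in grid coordinates.
    """
    rows, cols = len(deltas), len(deltas[0])
    # Locate the cell with the strongest response.
    pr, pc = max(((r, c) for r in range(rows) for c in range(cols)),
                 key=lambda rc: deltas[rc[0]][rc[1]])
    total = wy = wx = 0.0
    for r in range(max(0, pr - 1), min(rows, pr + 2)):
        for c in range(max(0, pc - 1), min(cols, pc + 2)):
            w = deltas[r][c]
            total += w
            wy += w * r
            wx += w * c
    return (wy / total, wx / total)
```

The weighted centroid is what allows capacitive panels to report positions finer than the physical electrode pitch.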
One example of capacitive touch surfaces is the DiamondTouch system created by Dietz and Leigh in 2003 [13], a multi-user, debris-tolerant, touch- and gesture-activated screen for supporting small group collaboration. A signal that depends on the location on the table is transmitted through antennas; if the user touches the screen, the signal is capacitively coupled into the chair, where it is received and analyzed. This leads to the restriction that only four users can be distinguished. Another famous device is the iPhone [28] from Apple, which uses a capacitive touch surface, but not much technical information about it is known.
2.8 Discussion
After the presentation of the different multi-touch technologies it becomes clear that not all technologies can be combined. A combination of electric and optical methods would be possible, but once the electrical methods reach a certain size, the electrical issues become severe. Because of the desired size of the touch screen, 120 cm x 90 cm, the electric methods were not chosen. The technologies to combine should be real multi-touch technologies; they should allow many touches to be detected without restrictions. The infrared light plane technologies are restricted by the fact that fingers can occlude each other. Some of the electrical methods have an issue with multiple touches too. Diffused Surface Illumination cannot be combined because it uses a special acrylic plate, which is already a combination of Frustrated Total Internal Reflection and Diffused Illumination. Both effects always have to be used at the same time; it is not possible to use the effects separately and calculate one result. The advantage of the Frustrated Total Internal Reflection technology is that there is a strong contrast between a touch and the background. A pressure approximation can be done with the brightness of a touch, but this can be a disadvantage too, because this technology requires pressure to work; if the user does not apply pressure to the surface, the touch is not detected. The advantage of Diffused Illumination is that it is very sensitive to touches, but this again can be a disadvantage, because it can lead to false detections. The combination of Frustrated Total Internal Reflection and Diffused Illumination was chosen to combine the advantages of both technologies and balance out the disadvantages. After looking at the technologies, a hardware setup is needed that combines the chosen technologies in one setup.
Chapter 3 Hardware
In this chapter the hardware components which are required to build a multi-touch table with the Frustrated Total Internal Reflection and Diffused Illumination technology are discussed. The hardware which we used for the multi-touch table is presented afterwards. The multi-touch table of the Institute of Computer Graphics (CGTable) at the Technical University of Berlin [3] was first built as part of a project in the winter of 2007/08. The table had only the FTIR technique to detect touches. During this thesis the table was upgraded with the DI technology; other hardware problems were also resolved by replacing parts.
3.1 Assembly
A basic hardware assembly of an optical multi-touch display consists of the following parts: a camera, infrared illuminators, a projector and a projection screen, as seen in Figure 3.1.
Figure 3.1: Basic hardware assembly of an optical multi-touch display: projection screen, projector, camera and IR illuminator
Figure 3.2: Point Grey Firefly MV, picture taken from the website of Point Grey Research Inc. [49]
Figure 3.3: Spectrum of the Point Grey Firefly MV, picture taken from the website of Point Grey Research Inc. [49]

light-emitting diodes (LEDs) [48] were used to illuminate the acrylic plate, which illuminated the acrylic plate fairly poorly, because there were not enough of them.
3.3 Camera
A CCD camera is needed that has no infrared filter; it should be a black-and-white camera. The camera should also have a large sensor, so that a great deal of light can be captured. A small imaging sensor can lead to a poor signal-to-noise ratio. Due to the fact that we want to take images of infrared light, the camera's spectral sensitivity should cover the wavelength that we use, which is typically 850 nm. Also, a high frame rate is needed, because we want a fast response time and a good tracking result. One good and cheap camera is the Playstation 3 Eye camera [18]. This camera is a color CCD camera which can capture 640x480 pixels at a frame rate of 60 fps. The old camera of the CG-Table was an Imaging Source DMK 21BF04 [52], which was replaced because of the size of its imaging sensor. We chose the Firefly MV from Point Grey, because it has good specifications (see Table 3.1) and a matching spectrum (see Figure 3.3). Also, many other projects have used this camera with good results.
The Firefly has an external trigger, which is used to synchronize the camera and the infrared LEDs. Most other cameras do not have external triggers, for example webcams, which are very popular in the community because they are cheap and very easy to get, but their built-in infrared filter needs to be removed. We need the external trigger because we want to take different images of the Frustrated Total Internal Reflection effect, of the Diffused Illumination effect, and images with no infrared light on. We call these images the FTIR image, DI image and reference image.

Image Sensor Type: 1/3" progressive scan CMOS, global shutter
Image Sensor Model: Micron MT9V022
Maximum Resolution: 752(H) x 480(V)
Pixel Size: 6.0 µm x 6.0 µm
Imaging Area: 4.55 mm x 2.97 mm
Digital Interface: IEEE 1394a / USB 2.0
Maximum Frame Rates: 63 FPS at 752x480
General Purpose I/O Ports: 7-pin JST GPIO connector, 4 pins for trigger and strobe, 1 pin +3.3 V, 1 VEXT pin for external power
Synchronization: via external trigger, software trigger, or free-running
Lens Mount: CS-mount (5 mm C-mount adapter included)

Table 3.1: Specification of the Point Grey Firefly MV, found on the website of Point Grey Research Inc. [49]
Figure 3.4: Infrared bandpass filter from Midwest Optical Systems, chart taken from the website of Midwest Optical Systems [53]

Table 3.2 lists the specification of the lens (model, focal length, iris range, and calculated distance to the screen).
3.5 Lens
The choice of lens depends on the distance between the camera and the touch surface. Due to the fact that the surface is very big (60 inches) and not mounted very high, a wide opening angle of the lens is required to capture the full surface. For this purpose we chose a fisheye lens, which has a barrel distortion effect. This effect is shown in Figure 3.6. Our surface has a dimension of 120 cm x 90 cm at a height of 103 cm. Figure 3.5 shows the physical setup. With the following equation the needed distance can be calculated:
x = f * Screen / Sensor, with Sensor = 4.55 mm x 2.97 mm and Screen = 1200 mm x 900 mm
We chose a varifocal lens, because we wanted the freedom of placing the camera at various positions. The specifications of the lens used are shown in Table 3.2. Experiments have shown that the ideal place is in the middle of the acrylic plate.
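The distance formula above can be evaluated numerically. A minimal sketch follows; the focal length value used below is an assumption for illustration only, not the actual value from Table 3.2:

```python
# Sketch: required camera distance from the pinhole model x = f * Screen / Sensor.
def required_distance_mm(f_mm, screen_mm, sensor_mm):
    return f_mm * screen_mm / sensor_mm

f = 3.0  # assumed focal length in mm (illustration only)
d_w = required_distance_mm(f, 1200.0, 4.55)  # horizontal axis
d_h = required_distance_mm(f, 900.0, 2.97)   # vertical axis
d = max(d_w, d_h)  # the larger distance is needed to fit the whole screen
```

Both axes are checked because the sensor and screen aspect ratios differ slightly; the larger of the two distances guarantees the full surface is visible.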
Figure 3.6: Left: a telephoto lens with little distortion. Right: the fisheye lens with strong distortion.
3.6 Projector
As mentioned earlier, projectors need a certain distance for a certain projection size. If a normal projector is used, a projection distance of approximately 2.5 m is needed. In this case, mirrors have to extend the distance between the projector and the surface. So if the projection screen is at a height of 104 cm, two mirrors are needed, because with just one mirror the projector would be above the table surface. But with mirrors, ghosting effects appear, i.e. a replica image appearing fainter with an offset in position to the primary image; an alternative are short-throw projectors. A table of possible short-throw projectors is presented in the Appendix in section 8.2. These projectors need a projection distance from -0.04 m to 2 m for our table. The negative value means that the projector is actually placed 4 cm above the surface. These projectors project at a very high angle, as shown in Figure 3.7. The projector also needs to be mounted at an angle of 90 degrees. Not all projectors can be mounted at 90 degrees, because their heat ventilation needs to be in an upright position; otherwise the projector gets very hot and the lifetime of the lamp decreases dramatically. Another problem is the mounting of the projector. Normal ceiling mounts cannot be used for that purpose, because they are not stable enough to hold the projector at 90 degrees. The projectors normally have screw holes for ceiling mounts, to which a board can be mounted, which in turn can be mounted to the table. We have chosen the Acer S1200 projector, which is shown in Figure 3.8, because of its brightness and contrast ratio (see Table 3.3) and that the projector projects not
Figure 3.7: Principle of an ultra-short-throw projector, seen from the side

Also, with this projector no mirrors are needed to project 60 inches.
Figure 3.8: Acer S1200 ultra-short-throw projector, picture taken from the web page of Acer [1]

Projection System: DLP
Native Resolution: 1024 x 768
Brightness: 2500 lumen
Contrast: 2000:1
Projection lens: F = 2.6, f = 6.97 mm
Throw Ratio: 0.60:1
Projection Screen Size: 4.15 m @ 2 m or 2.08 m @ 1 m
Projection Distance: 0.5 m - 3.7 m
Lamp lifetime: 4000 h (ECO) / 3000 h (Bright Mode)
Distance for 60": 0.72 m

Table 3.3: Specification of the Acer S1200, information taken from [1]
needed to illuminate the acrylic plate. Infrared illuminators can be self-built out of single light-emitting diodes (LEDs), LED emitters or LED ribbons:

Single LEDs: These can be either normal infrared LEDs or Surface-Mounted Device (SMD) infrared LEDs. The normal infrared LEDs are bigger and easier to solder, but normally they are less powerful than the SMD LEDs. The SMD LEDs need to be soldered to a board, which is not needed for the normal LEDs. With single LEDs the user has the freedom of arranging the LEDs for his own needs. He is not bound to industrial standards, but needs soldering experience and the tools to do so.

LED emitters: Prefabricated emitters, which are used for a DI setup because they are normally round and have a large surface. These emitters are normally used as the headlight for a night-vision camera. They have a dense area of infrared LEDs, so a hotspot is produced. This can be eliminated by bouncing the infrared light off the sides and floor of an enclosed box. With emitters the user does not need to solder anything.

LED ribbons: These are prefabricated LED strips with SMD LEDs on them. They are normally used for an FTIR setup. This is the easiest way to build an FTIR setup, because the LED ribbons have an adhesive side and therefore can be glued to a frame.

For the FTIR method a long, thin illuminator is required, and for DI an even illumination is needed. Due to the fact that we want bright spots at the places where the user touches the surface, a great deal of infrared light is needed. We chose SMD LEDs, because they have a higher total radiant flux. Most of the people who build such multi-touch tables use the SFH 485 from Osram [48]. We used the SFH 4250, which is shown in Figure 3.9. We have chosen this LED because many people have used it before and it was recommended by Schöning et al. [51].
Figure 3.9: The Osram SFH 4250 soldered on a board

For the FTIR effect we need to mount the LEDs at the edges of the acrylic plate, so the infrared illuminator needs to be long and narrow. For building such a long and thin illuminator a board has been created which fits 24 of these LEDs in groups of 6. A group consists of the LEDs with a resistor in series. 14 of those boards are used to illuminate the acrylic plate. This custom board was self-etched with the pattern shown in Figure 3.10. The area of each pad which is needed for connecting the LEDs to the board should be at least 16 mm2 to absorb the heat. Due to the fact that we switch on the LEDs only when it is necessary, the LEDs do not get too hot and do not need a bigger heat pad.
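Sizing the series resistor for such a group follows directly from Ohm's law. A minimal sketch, assuming illustrative values for the supply voltage, forward voltage and operating current (the actual board values are not stated here):

```python
# Sketch: series-resistor sizing for a string of infrared LEDs (Ohm's law).
# Supply voltage, forward voltage and current are assumptions for illustration.
def series_resistor_ohms(v_supply, v_forward, n_leds, i_amps):
    v_drop = v_supply - n_leds * v_forward  # voltage left over for the resistor
    if v_drop <= 0:
        raise ValueError("supply too low for this many LEDs in series")
    return v_drop / i_amps

# One group of 6 LEDs, as on the custom boards; assumed 12 V supply,
# 1.5 V forward voltage and 100 mA drive current.
r = series_resistor_ohms(12.0, 1.5, 6, 0.1)
```

The number of LEDs per group is bounded by the supply voltage: the sum of the forward voltages must stay below it, or no current flows.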
Figure 3.11: Placement of the infrared illuminators

For the DI effect bigger illuminators are needed. The LEDs are mounted on a normal board which is 16 x 10 cm big. On such a board 24 LEDs are placed in four rows, with six LEDs in each row. The placement of these infrared illuminators is not easy, because the acrylic plate reflects infrared light and these reflections interfere with the touch detection. A position needs to be found such that almost no direct reflection reaches the camera. Positioning the infrared illuminators on the floor results in such reflections. Either the illuminators need to be placed at an angle, which moves the illuminators outside of the table, or their light has to be reflected off a wall. Another approach is to place the illuminators just below the acrylic plate, nearly vertically, as shown in Figure 3.11.
to the acrylic plate, but it then needs a refractive index close to that of the acrylic. If an additional projection foil is needed, the compliant layer should not stick to the projection foil.

Silicone: Silicone can be used as the compliant layer due to the fact that its refractive index is very close to that of the acrylic plate. It can be spread over the acrylic plate. There are several ways to make a smooth layer of silicone on the acrylic plate:
- spraying the silicone onto the acrylic plate
- rolling the silicone onto the acrylic plate
- spreading out a thin layer with a rigid bar
- buying a prefabricated silicone foil
For all the variants described above a low-viscosity silicone is needed. Silicone can be thinned with xylol. This makes the silicone more liquid, but because of the mixing process it contains little bubbles of air. These bubbles could interfere with the FTIR effect, but tests showed that they do not; in fact there is no difference between the thinned and the normal silicone. For spraying and rolling the silicone onto the acrylic plate a few layers are needed, because one layer would be too thin to work. The rolling method produces a textured layer, so if it is rolled onto the acrylic plate the surface of the plate has a texture, which interferes with the FTIR effect, because the angles of the rays that travel in the acrylic plate are not perfect anymore. And if the angles are not perfect there is no total internal reflection. The issue with silicone is that it easily sticks to other materials. If the projection foil sticks to the silicone, the total internal reflection is frustrated at this point. In our tests the projection foil stuck to the silicone only briefly, but this caused streaks. These streaks can be removed with talcum powder, but the talcum powder also interferes with the FTIR effect. The problem with bought silicone foil is that it is mostly powdered with talcum, because the foils otherwise stick very well to each other. It is very difficult to wash the talcum powder off the silicone.

Latex: Latex can be used as the compliant layer too, because it is flexible enough to fill the ridges of the fingertip. Latex is a natural product, so there is no latex that is totally transparent; it has a yellow/brown color. But latex can also be used as the projection screen. The projection performance is not as good as a professional back-projection screen, but good enough for normal working conditions. Latex is sensitive to skin grease, so it needs to be cleaned with silicone oil. Latex also sticks to the human skin, so an additional protective layer is required.
This layer should be transparent, so that it does not interfere with the image, and it should have a nice touch feeling, because it is the actual touching layer.
Figure 3.12: Streaks of thinned silicone at the top and unthinned silicone at the bottom
Discussion: It is very difficult to get an evenly thick layer of silicone on an acrylic plate. We tried to roll the silicone onto the acrylic, which produced a structure on the silicone that interfered with the FTIR effect. We also tried to spread out a thin layer with a rigid bar, which had good results in terms of evenness, but the projection foil stuck to the silicone and produced streaks (shown in Figure 3.12), which disturb the touch detection process. We have chosen a latex layer, because it has no problems with streaks.
Figure 3.13: Surface layers for an FTIR setup with silicone: projection foil (a), gap (b), silicone (c), acrylic plate (d); and for an FTIR setup with latex: protective foil (a), latex (b), gap (c), acrylic plate (d)
Figure 3.14: Switching circuit. There are many more LEDs involved in the circuit, but they are not shown.

Pin   Purpose
D0    FTIR LEDs
D1    Reference LEDs
D2    External camera trigger
D3    DI LEDs
Chapter 4 Algorithms
After looking at the hardware setup, this chapter describes the algorithms which are needed for image preprocessing, touch detection and tracking in FTIR and DI images. The preprocessing extracts the information we need for the analysis by filtering the captured images. Afterwards a feature detection is carried out to find touches and other information in the image. Later this information is post-processed to transform the touches to the right place on the screen and to track touches.
To find the image with the smallest distance to all the others, the accumulated distances of all images are calculated with a given distance metric. An accumulated distance is the sum of the distances of one image to all the others. In this list of accumulated distances the smallest value is searched; the corresponding image has the smallest histogram distance to all the others. After this calculation, the distances between the found image and the others are calculated. If such a distance is below a certain value, the corresponding image is not overexposed.
This algorithm is necessary because it is important that no overexposed image is involved in the background subtraction algorithm; otherwise the touch detection is not sensitive enough.
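The selection step above can be sketched in a few lines. Histograms are plain lists here, and the L1 metric is an assumption; the text only requires "a given distance metric":

```python
# Sketch: pick the image whose accumulated histogram distance to all others
# is smallest, then keep only images within a threshold of that reference.
def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))

def select_not_overexposed(histograms, threshold, dist=l1_distance):
    # accumulated distance of each image to all the others
    acc = [sum(dist(h, other) for other in histograms) for h in histograms]
    ref = acc.index(min(acc))  # image closest to all the others
    return [i for i, h in enumerate(histograms)
            if dist(histograms[ref], h) <= threshold]
```

An image far from the reference (e.g. an all-bright overexposed frame) exceeds the threshold and is dropped before background subtraction.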
Another approach is to polarize the projector light and use another polarization filter in front of the camera (rotated by 90 degrees). This solution was not used because the polarization filter in front of the projector darkens the projected image. To suppress the hotspot, an infrared filter can be used in front of the projector. This approach reduces the hotspot, but it does not remove it completely. The hotspot can also be removed by software, because it is not moving and therefore always at the same spot. The problem with this solution is that the table is not sensitive at this spot. In our observations we determined that the spot is most of the time smaller than a finger, so if the hotspot image is subtracted from the captured camera image, the blob of the finger is not lost but has a hole inside, which does not interfere with the detection process.
Figure 4.1: Reflection of the beamer produces a hotspot in the picture

To remove the hotspot by software, it is necessary to detect the exact position of the hotspot in order to subtract it. A few images are captured while the projector is projecting a black image and no user is touching the surface. More than one image is taken because the camera and the projector are not synchronized. Then, images that are not overexposed are selected with the algorithm presented earlier. A resulting image is calculated by taking the maximum color value for each pixel of the image set. Next, a few images are captured while the projector is projecting a white image. Here again a resulting image is created by combining the images. Afterwards, these two resulting images are subtracted from each other. This gives us an image in which only the hotspot is showing. This image can be subtracted from each image captured by the camera to remove the hotspot.
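The combine-and-subtract procedure can be sketched directly. Images are flat lists of 8-bit grey values here, an assumption for brevity:

```python
# Sketch: combine frames per projector state by per-pixel maximum, subtract
# the "black" result from the "white" result to isolate the hotspot, then
# subtract that hotspot image from live frames.
def max_combine(images):
    return [max(px) for px in zip(*images)]

def hotspot_image(black_frames, white_frames):
    black = max_combine(black_frames)   # projector showing black
    white = max_combine(white_frames)   # projector showing white
    return [max(0, w - b) for w, b in zip(white, black)]

def remove_hotspot(frame, hotspot):
    # clamp at 0 so the subtraction cannot produce negative grey values
    return [max(0, p - h) for p, h in zip(frame, hotspot)]
```

The per-pixel maximum stands in for the "resulting image" combination described in the text; clamping keeps the output in the valid grey range.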
Figure 4.2: Illumination of the surface. The image has been normalized and colored to show the illumination differences. The image would be totally red if the illumination were even.

and the captured images can be normalized to this illumination. The illumination does not change, because all the parts are mounted to the table and nothing moves. To measure the illumination the same setup is used as for the detection. An image is needed which shows, for each pixel, the maximum brightness that can be produced by a hand. To capture such an image, the camera captures images of the surface and these images are combined in the following manner. For a pair of two images the maximum brightness of each pixel is taken for the resulting image. This result is combined with the next image, and so on. A hand needs to be placed at all locations on the surface to get the maximum brightness for each pixel. It is important to use a hand, because different materials have different reflection properties, and even hands differ in their reflection properties. The resulting image has to be blurred, because it is not possible to put a hand in all positions. Also, the black frame around the surface is colored white, because these regions are outside of the surface. We call the resulting image the illumination image. The image for our table is shown in Figure 4.2. To normalize the captured images, each pixel of the captured image is divided by the corresponding pixel in the illumination image and then multiplied by the maximum brightness value (typically 255 for an 8-bit greyscale image).
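The normalization step is a per-pixel division and rescale. A minimal sketch, with images as flat lists of grey values (an assumption for brevity):

```python
# Sketch: divide each captured pixel by the corresponding pixel of the
# illumination image and rescale to the full 8-bit range.
def normalize(captured, illumination, max_value=255):
    out = []
    for c, i in zip(captured, illumination):
        v = 0 if i == 0 else c * max_value // i  # guard against division by zero
        out.append(min(max_value, v))            # clamp to the valid grey range
    return out
```

A pixel that reaches its illumination-image maximum maps to 255; dimly lit regions are boosted accordingly, which is what makes the sensitivity uniform over the surface.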
fingertips. A fingertip has a certain size, so all bigger and smaller spots can be removed with a bandpass filter. This filter extracts the spots that have the expected size. Afterwards a threshold is applied and the blobs are detected.
Figure 4.3: Convexity defects of a hand. Convexity defects are drawn in red and the depth is shown with arrows.
Figure 4.4: The left image shows a non-smoothed contour of a hand, the right a smoothed contour

The contour-finding algorithm can make the contour of the hand very rough, because the edges of the hand can be very noisy. Figure 4.4 shows a non-smoothed contour of a hand, especially at the arm, which normally fades out because the user is touching the surface from above. One approach is to smooth the image with a smoothing filter, which costs a lot of performance and can merge two fingers which are close together. Due to the fact that we want a real-time application, this is not an option. Another approach is to smooth the contour. The points of the contour are sorted clockwise by the contour-finding algorithm of OpenCV. The smoothing algorithm works as follows:
- go through the contour clockwise with a step size of s
- collect n neighbors on the contour around the current point
- take the average position of the collected neighbors
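The steps above can be sketched as follows. Points are (x, y) tuples; the code averages the visited point together with its n neighbours, relying on modular indexing because the contour is closed:

```python
# Sketch of the contour smoothing: walk the closed contour with step size s
# and replace each visited point by the average of itself and its n
# surrounding neighbours (n/2 before, n/2 after).
def smooth_contour(points, s=4, n=10):
    m = len(points)
    smoothed = []
    for i in range(0, m, s):
        xs = ys = 0.0
        for j in range(i - n // 2, i + n // 2 + 1):
            x, y = points[j % m]  # wrap around: the contour is closed
            xs += x
            ys += y
        count = 2 * (n // 2) + 1
        smoothed.append((xs / count, ys / count))
    return smoothed
```

The defaults s = 4, n = 10 follow the parameters reported in the text; a useful side effect is that the smoothed contour has only about one quarter of the original points.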
The algorithm first picks one point pi on the contour, and for this point pi it collects its neighbors on the contour. To do so, the algorithm collects n/2 points that come after pi in the contour and n/2 points that come before it. These are the neighbors, because the points of the contour are sorted clockwise. Then the arithmetic average is calculated:

p_k = (1/n) * sum_{j=0}^{n} p_j

The resulting point pk is added to the new smoothed contour. After that a new point is picked from the contour at a step size of s; this is repeated until the algorithm is at the start of the contour again. Due to the fact that the contour is closed, there are previous points for the start and next points for the end of the contour. The parameters depend on the roughness of the input contour. Experiments showed that s = 4, n = 10 are sufficient parameters for our setup. A result is shown in Figure 4.4.

Shape-based Detection: Malik and Laszlo described in 2004 a shape-based fingertip-detection algorithm [41], where fingertips can be extracted from the contour itself. As stated before, a finger has a certain size and a certain width. A finger can be approximated with a cone, therefore a triangle can be fitted to the fingertip of a contour. Again the contour consists of points that are ordered clockwise. We take three points p, a, b on the contour (shown in Figure 4.5), which form a triangle. These points should be at a certain distance from each other. The fingertips point out of the contour, so we need only triangles that point outside of the contour. Two vectors can be established, pa and pb, with which we calculate the 3D cross product (where the z component of the input vectors is zero). The direction of the triangle can be determined by the right-hand rule: if the z component of the cross product is positive, the triangle points to the outside of the contour. We can also determine the angle between pa and pb (shown in Figure 4.5), and if that angle is in a certain range, it is possibly a fingertip. Due to a rough contour, as described earlier, a lot of false possible fingertips are detected. Hence, here again the contour smoothing is applied to reduce the false positive rate. One side effect is that the algorithm gets faster, because the contour has fewer points after the smoothing. But even with the contour smoothing, false positives are recognized.

Due to the curvature of the fingertip, several triangle positions are found for one fingertip. These triangle points have to be grouped and one representative has to be found for each group. The found points of a fingertip are likely close together and ordered clockwise on the contour, so these points come one after another in the list of possible fingertips. The algorithm goes through the list, calculates the pairwise distance of the points, and pushes the points that are close together onto a stack. If it gets to a point that is not close to the last point pushed onto the stack, it picks the point in the middle of the stack and defines this as the representative for the fingertip. After that the algorithm goes on like this until it is finished with the list of possible fingertips.
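The triangle test of the shape-based detection can be sketched as follows. The angle bounds are assumptions for illustration; the sign convention follows the right-hand-rule description in the text (positive z means the triangle points outside the clockwise contour):

```python
# Sketch: fingertip triangle test. p, a, b are contour points; the z component
# of the cross product of pa and pb decides the triangle orientation, and the
# angle at p must fall within an assumed fingertip range.
import math

def is_fingertip_candidate(p, a, b, min_deg=5.0, max_deg=60.0):
    pa = (a[0] - p[0], a[1] - p[1])
    pb = (b[0] - p[0], b[1] - p[1])
    cross_z = pa[0] * pb[1] - pa[1] * pb[0]
    if cross_z <= 0:
        return False  # triangle does not point outside the contour
    dot = pa[0] * pb[0] + pa[1] * pb[1]
    angle = math.degrees(math.acos(dot / (math.hypot(*pa) * math.hypot(*pb))))
    return min_deg <= angle <= max_deg
```

Swapping a and b flips the sign of the cross product, which is how the test distinguishes bumps that point out of the contour (fingertips) from notches that point into it.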
Figure 4.5: Dominant point detection

The possible fingertips can be filtered as follows to reduce the false positive rate. Touches are bright spots in the image, so we have to find a bright spot close to the fingertip to verify that the chosen point really is a fingertip. For this purpose the image is analyzed. The mean shift algorithm, which was developed by Fukunaga and Hostetler [20], is used to find the bright spot. The mean shift algorithm can find local extrema in density distributions. For our purposes the density data set consists of the brightness values of the pixels. The algorithm can be described as hill climbing on a density histogram. It is robust in a statistical manner; this means that it ignores outliers in the data. A local window is used to ignore points which are far away from the peaks of the data, and then the window is moved. The algorithm works as follows:

1. a window size is chosen
2. an initial starting point is chosen
3. calculate the center of mass within the window
4. move the window to the center of mass
5. go to step 3 until the window is not moving anymore

Comaniciu and Meer proved in [11] that the mean shift algorithm always converges. The mean shift algorithm thus shifts the window in the direction of the bright spot. The implementation of the mean shift algorithm in OpenCV is slightly different; it quits the loop if a certain epsilon is achieved or x iterations have been reached. As window size, the normal size of a finger is chosen. The starting point of the algorithm is the possible fingertip. The algorithm stops, as stated above, if the window is above the bright spot or x iterations are reached. The algorithm also provides the sum of the brightness values of the pixels which are in the window. If this sum is over a certain value, the algorithm has found the bright spot and verified that the possible fingertip has a touch related to it.

Discussion: The convexity defects algorithm has weaknesses with fingers which are not stretched out.
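The window update can be sketched in pure Python on a list-of-rows grey image. The window half-width and the OpenCV-style iteration cap are assumptions:

```python
# Sketch: mean shift on image brightness. Move a square window to the
# brightness-weighted centre of mass until it stops moving or the iteration
# cap is reached; return the final position and the brightness sum inside
# the window (used to verify the fingertip).
def mean_shift(image, start, win=3, max_iter=20):
    h, w = len(image), len(image[0])
    cx, cy = start
    mass = 0
    for _ in range(max_iter):
        x0, x1 = max(0, cx - win), min(w, cx + win + 1)
        y0, y1 = max(0, cy - win), min(h, cy + win + 1)
        mass = mx = my = 0
        for y in range(y0, y1):
            for x in range(x0, x1):
                v = image[y][x]
                mass += v
                mx += v * x
                my += v * y
        if mass == 0:
            break  # no brightness in the window at all
        nx, ny = round(mx / mass), round(my / mass)
        if (nx, ny) == (cx, cy):
            break  # window stopped moving: converged
        cx, cy = nx, ny
    return cx, cy, mass
```

Starting from the triangle candidate, the window climbs toward the nearest bright spot; comparing the returned brightness sum against a threshold gives the verification step described above.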
The problem is to determine whether the chosen convexity defect is between two fingers and not between a finger and the hand. The difficulty of the problem
depends on how much of the arm can be seen in the captured image. Also, the roughness of the contour disturbs the detection of both algorithms. Both methods are faster than applying a bandpass filter to the captured image and a blob detection afterwards. The convexity defects algorithm is not used because of its restrictions (it can only detect two or more fingers).
The moments m_pq are defined as:

m_pq = sum_{i=1}^{n} I(x_i, y_i) * x_i^p * y_i^q

where n is the number of points in the contour and I(x, y) is the brightness value of the pixel at position (x, y). The centroid can then be calculated as follows:

x_c = m_10 / m_00,  y_c = m_01 / m_00

The orientation of an object can be described as the tilt angle between the x-axis and the major axis of the object (which can be seen in Figure 4.6). This corresponds to the eigenvector with the largest eigenvalue; in this direction the object has its biggest extension. The orientation can be calculated with:

a = m_20 / m_00,  b = m_11 / m_00,  c = m_02 / m_00

theta = arctan( 2b / (a - c + sqrt(4b^2 + (a - c)^2)) )
This angle points in the direction of the biggest extension, so if there is an arm in the DI image, it points in this direction. But if there is no arm in the image, it points to the longest finger seen in the image, which is typically the middle finger.
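The centroid and orientation computation can be sketched as follows. The input is a list of (x, y, intensity) triples, an assumption for brevity, and the second-order moments are taken about the centroid, matching the "central moments" of Figure 4.6:

```python
# Sketch: centroid and orientation from image moments.
import math

def orientation(pixels):
    m00 = sum(i for _, _, i in pixels)
    m10 = sum(i * x for x, _, i in pixels)
    m01 = sum(i * y for _, y, i in pixels)
    xc, yc = m10 / m00, m01 / m00
    # second-order moments about the centroid
    a = sum(i * (x - xc) ** 2 for x, _, i in pixels) / m00
    b = sum(i * (x - xc) * (y - yc) for x, y, i in pixels) / m00
    c = sum(i * (y - yc) ** 2 for _, y, i in pixels) / m00
    # tilt angle of the major axis against the x-axis
    theta = math.atan2(2 * b, a - c + math.sqrt(4 * b * b + (a - c) ** 2))
    return (xc, yc), theta
```

For a horizontally elongated blob the mixed moment b vanishes and a > c, giving theta = 0, i.e. a major axis along the x-axis, as expected.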
Figure 4.6: Orientation angle theta, derived from central moments

The orientation vector needs to be corrected, due to the fact that it could point to the arm or to a finger. If the vector points in the direction of the arm, the vector needs to be reversed. To determine whether the vector points to the arm or to a finger, a vote is done: if more than half of the finger vectors point in the direction of the orientation vector, the orientation is accurate; if not, the vector needs to be rotated by 180 degrees. The vectors of the fingers can be derived from the centroid and the points of the fingertips. The orientation vector can be derived from the centroid and the angle theta. Angles between the orientation vector and the fingertip vectors are calculated with the dot product of the two vectors as follows:

cos(alpha) = (a . b) / (|a| |b|)

This gives the smallest angle between the two vectors a and b, regardless of whether it is measured clockwise or counter-clockwise. If the angle exceeds 90 degrees, the fingertip vector points in the opposite direction. If more than half of the fingertip vectors do not point in the same direction, the orientation vector needs to be reversed; to reverse it, 180 degrees are added to the angle.
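The vote can be sketched directly. An angle over 90 degrees between two vectors is equivalent to a negative dot product, so no explicit arccos is needed:

```python
# Sketch: orientation vote. If more than half of the fingertip vectors point
# away from the orientation vector, flip the orientation by 180 degrees.
import math

def corrected_orientation(theta, centroid, fingertips):
    ox, oy = math.cos(theta), math.sin(theta)  # orientation vector
    against = 0
    for fx, fy in fingertips:
        vx, vy = fx - centroid[0], fy - centroid[1]  # fingertip vector
        if vx * ox + vy * oy < 0:  # angle exceeds 90 degrees
            against += 1
    if against > len(fingertips) / 2:
        theta += math.pi  # reverse the orientation vector
    return theta
```

Testing the sign of the dot product instead of computing the angle itself is a small shortcut; it decides exactly the "over 90 degrees" condition from the text.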
4.3 Post-processing
The detected features normally need to be post-processed in the following manner: the objects that are detected in the individual images need to be tracked over time, and the positions need to be transformed from image space to screen coordinate space.
4.3.1 Undistortion
Due to the fact that we use a fisheye lens, the pictures taken by the camera exhibit a barrel distortion effect. To correct this distortion a camera calibration has to be done. One approach would be a linear approximation of this effect, but the effect is not linear, so only very small windows can be used. This means the user needs a lot of calibration points in the calibration process to get a sufficient calibration result. To correct this effect sufficiently, the camera calibration was done with the Camera Calibration Toolbox for Matlab [7]. For the process, approximately 20 images of
a planar checkerboard (different angles, scales, etc.), which cover about 1/4 of the image, are needed. An example image is shown in Figure 4.7. The Camera Calibration Toolbox for Matlab then calculates the specific parameters of the lens. With this data a distorted image can be undistorted, but to avoid performance issues, only the positions of the touches and hands are undistorted.

Figure 4.8: States of a touch (Up, Down, Moved), derived by tracking touches
4.3.2 Tracking
The detected touches and hands need to be tracked over time. For multi-touch interaction it is not enough to detect touches in each frame, because we want to know how a touch has moved. We need to know whether a new touch was introduced, a touch has moved, or a touch has left, as shown in Figure 4.8. To derive this information, the touches of the previous detection round are required, which we call old touches; the currently detected touches are called new touches. The new touches need to be assigned to the old touches to derive the information stated above. There are several approaches for tracking positions of objects. One is the stable marriage algorithm, which was created by Gale and Shapley in 1962 [21]. Another simple solution is to find, for each new touch, the closest old touch. If these two touches are close together, it can be stated that they belong together and the touch has just moved. We have chosen this approach because it is easy to implement.
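The closest-old-touch matching can be sketched as follows. The distance threshold is an assumption; the event names mirror the Up/Down/Moved states of Figure 4.8:

```python
# Sketch: nearest-neighbour touch matching. Each new touch is matched to the
# closest unmatched old touch within a threshold ("moved"); otherwise it is a
# new touch ("down"). Old touches left unmatched have lifted ("up").
def match_touches(old, new, max_dist=30.0):
    events, unmatched_old = [], set(range(len(old)))
    for nx, ny in new:
        best, best_d = None, max_dist
        for i in unmatched_old:
            ox, oy = old[i]
            d = ((nx - ox) ** 2 + (ny - oy) ** 2) ** 0.5
            if d < best_d:
                best, best_d = i, d
        if best is None:
            events.append(("down", (nx, ny)))
        else:
            unmatched_old.discard(best)
            events.append(("moved", (nx, ny)))
    for i in unmatched_old:
        events.append(("up", old[i]))
    return events
```

Unlike the stable-marriage formulation, this greedy scheme can mismatch touches that cross paths between frames; for touch tracking at camera frame rates the displacements are small enough that it works in practice, which is why the simpler approach was chosen.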
Andreas Holzammer
The algorithm is altered for the matching of FTIR and DI as follows. The user mostly applies pressure when he initially touches the screen; later, when he drags the finger, the pressure decreases. Because of this, the FTIR technology loses track of the touch. The DI technology, in turn, is very sensitive and produces a high false-detection rate, which can be very disturbing to the user, as Wigdor et al. [55] state. Therefore, only new FTIR touches are kept: if a new DI touch is detected that cannot be matched with an old touch, it is ignored.
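The acceptance rule above can be condensed into a small predicate; the names `Source`, `Candidate`, and `keepNewTouch` are hypothetical:

```cpp
#include <cassert>

enum class Source { FTIR, DI };

struct Candidate {
    Source source;
    bool matchedToOldTouch;  // result of the nearest-neighbour matching
};

// New FTIR touches are always kept (FTIR implies real pressure), while a
// DI touch is only kept if it continues an existing track; an unmatched
// new DI touch is dropped as a likely false detection.
bool keepNewTouch(const Candidate& c) {
    if (c.source == Source::FTIR) return true;
    return c.matchedToOldTouch;
}
```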
4.3.3 Calibration
To transform the positions of touches and hands from image space to screen space, a calibration is needed. The screen coordinates are normalized to the range 0 to 1 in width and height, so that various display techniques can be used for displaying the image. For calibration purposes a calibration tool on the client side is used to compute the transformation. To do so, the user pushes nine points. Nine points are sufficient because the barrel distortion is removed beforehand. From these nine points a perspective transformation is calculated, and the touch points are transformed with it.
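Once the 3x3 matrix of the perspective transformation has been estimated from the nine calibration points (for instance with a least-squares fit), applying it to a touch position is a matrix multiplication followed by a homogeneous divide. A sketch, with the matrix assumed to be precomputed; the names are illustrative:

```cpp
#include <cmath>
#include <cassert>

struct Vec2 { double x, y; };

// Apply a 3x3 perspective transformation h (row-major) to a point p.
// The homogeneous divide by W maps the point into screen coordinates.
Vec2 applyPerspective(const double h[9], Vec2 p) {
    double X = h[0] * p.x + h[1] * p.y + h[2];
    double Y = h[3] * p.x + h[4] * p.y + h[5];
    double W = h[6] * p.x + h[7] * p.y + h[8];
    return { X / W, Y / W };
}
```

Because the output is normalized to 0..1, the same matrix serves any projector or display resolution.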
In this chapter the combination of Frustrated Total Internal Reflection and Diffused Illumination is discussed. First the resulting images of the individual techniques are examined; afterwards it is discussed how these images, and the data of the individual techniques, can be combined. Figure 5.1 shows the basic hardware approach of a combination. Each of the technologies has its own weaknesses. The FTIR technology has its strength in detecting actual touches, but has problems with low pressure: when the user first touches the surface he applies pressure, but when he drags the finger, less pressure is applied and the FTIR effect no longer works. The DI technology is very sensitive, but it cannot determine whether the user is really touching the surface or not. Moreover, if the surface is not evenly illuminated, the sensitivity varies depending on the position.
Figure 5.1: A combination of Frustrated Total Internal Reflection (red) and Diffuse Illumination (blue).
Figure 5.2: Comparison of a hand touch with pressure, DI (left) and FTIR (right). The colormap (shown at the top) has been changed for this window to show the contrast.
Figure 5.3: Comparison of a hand touch with no pressure, DI (left) and FTIR (right)
Figure 5.5: Comparison of touches close together, DI (left) and FTIR (right)

The FTIR image illustrates the actual touches of the fingers, but gives no information about how these touches belong together. The DI image illustrates the contour of the hands and some depth information through the gradients in the image. Depth information can also be approximated from the brightness at a certain spot of the image, but this information is very rough, because each material has different infrared reflection characteristics. Human skin reflects infrared light fairly well [58]; if the user wears a long-sleeved shirt, a sweater, or even a watch, the image looks very different. If the hand is laid flat on the surface, almost no information can be extracted from the FTIR image, as illustrated in Figure 5.4: parts of the palm can be seen and a few fingers that are very dark. In the DI image, on the other hand, the hand can be seen clearly. When two touches are close together, it is difficult to separate them from each other in the DI image, in contrast to the FTIR image, where the difference between the two touches can be clearly seen (Figure 5.5). The human eye can distinguish these touches easily, but the computer needs a few steps to detect them.
Figure 5.6: A normal FTIR pipeline is shown on the left, a normal DI pipeline on the right
Figure 5.7: Left: hand image with FTIR and DI LEDs on; right: hand with FTIR and DI images multiplied
which was discussed in section 4.1. After this, bright spots are detected, and the coordinates of these spots are post-processed to track the touches and transformed to match the screen coordinates; the transformation includes the undistortion and the calibration. For a DI setup several pipeline models are possible. The general order is that the captured image is preprocessed by subtracting ambient light and background, and then fingertips are extracted, as described in section 4.2.3. These coordinates are post-processed as in the FTIR pipeline. For detecting fingertips, a bandpass filter can be applied and bright spots can then be detected as fingertips; another method is to evaluate the hand contour, as described in section 4.2.3. Sample pipelines for FTIR and DI are illustrated in Figure 5.6.
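The bright-spot detection step can be sketched as a threshold followed by a connected-component search. This is an illustrative stand-in for the blob detection described in chapter 4, not the thesis code; the names are assumptions:

```cpp
#include <vector>
#include <queue>
#include <cassert>

struct Blob { float cx, cy; int area; };

// Threshold a greyscale image and collect 4-connected components of
// bright pixels as blobs, reporting each blob's centroid and area.
std::vector<Blob> detectBlobs(const std::vector<unsigned char>& img,
                              int width, int height, unsigned char threshold) {
    std::vector<Blob> blobs;
    std::vector<bool> visited(img.size(), false);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            int idx = y * width + x;
            if (visited[idx] || img[idx] < threshold) continue;
            // flood-fill one connected component of bright pixels
            long sumX = 0, sumY = 0;
            int area = 0;
            std::queue<int> q;
            q.push(idx);
            visited[idx] = true;
            while (!q.empty()) {
                int i = q.front(); q.pop();
                int px = i % width, py = i / width;
                sumX += px; sumY += py; ++area;
                const int nx[4] = { px - 1, px + 1, px, px };
                const int ny[4] = { py, py, py - 1, py + 1 };
                for (int k = 0; k < 4; ++k) {
                    if (nx[k] < 0 || nx[k] >= width ||
                        ny[k] < 0 || ny[k] >= height) continue;
                    int ni = ny[k] * width + nx[k];
                    if (!visited[ni] && img[ni] >= threshold) {
                        visited[ni] = true;
                        q.push(ni);
                    }
                }
            }
            blobs.push_back({ (float)sumX / area, (float)sumY / area, area });
        }
    return blobs;
}
```

In an FTIR pipeline the threshold can be high, since touches produce clearly bright spots; a DI pipeline would apply the bandpass filter first.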
5.3 Combination
One way of combining the FTIR and the DI effect is described by Weiss et al. [54]. They built a table on which the user can put silicone widgets to work with and to get haptic feedback. The labeling of the widgets can be changed by changing the image projected onto them, which is why they are called Silicone ILluminated Active Peripherals (SLAP). They have the FTIR and DI LEDs switched on at the same time; their table cannot switch the individual LED sets on and off, in contrast to ours, where they can be switched on and off
Figure 5.8: Left: post-processed image with FTIR and DI LEDs switched on (threshold applied); right: a multiplied FTIR and DI image (additional threshold of 70 applied)

individually. If individual images are available, they can be multiplied, as shown in Figure 5.7, instead of turning on the FTIR and DI LEDs at the same time. In the multiplied image each pixel is multiplied with the corresponding pixel of the other image. As one can see, the fingertips get very bright, but noise can also be seen where the hand is placed; it can be assumed that in the FTIR image the pixels under the hand are not zero. Because two 8-bit brightness values are multiplied and the result is written back into an 8-bit brightness value, data is lost: either a bigger brightness depth is needed or the result has to be scaled to fit into an 8-bit value. Since we process 8-bit greyscale images, we stick to 8 bits. Practical tests have shown that a scale factor of 0.2 is sufficient for our setup. The resulting images can be seen in Figure 5.8, where the touches are clearly visible. The issue with this technique is that a time gap exists between the FTIR and the DI image. If the user is moving his hands, which is typical for touch displays, the images do not match exactly; as the speed increases, the blobs get smaller due to this difference, until at a certain speed no blob can be detected at all. Therefore, this method is not used, because it is very difficult to extract touch information from such images.
5.4 Matching
Another approach is to match the results of the FTIR and the DI effect. As stated above, the FTIR effect can be used to detect the actual touches, but only above a certain amount of pressure. A DI image, on the other hand, gives information about the contour of the hand; the fingertips can be extracted, and touch points can be approximated with the help of the extracted fingertips, as stated in section 4.2.3. The results of the FTIR effect can be seen as stable information, because the user actually needs to touch the surface to produce a touch; therefore this information is preferred. To match both sources of information, the touches extracted from the FTIR images are taken as base information and tested against the contour-extracted touches from the DI image. This is done with the cvPointPolygonTest function of OpenCV 1.1pre [31]; algorithms to determine whether a point lies inside a polygon are described in [23]. All touches of the FTIR image are tested for lying inside a hand contour and are grouped together into a hand. If five touches are associated with a hand, it is complete and does not need any candidate fingertips from the DI image.
If more than five FTIR touches are associated with the hand contour, the biggest touches are removed from the hand: with more than five touches in an FTIR image it is likely that the user has put his hand flat on the surface and the palm is producing the extra touches, which are normally bigger than finger touches. If there are fewer than five touches inside a contour, the extracted fingertips of the contour are checked against the FTIR touches. If an extracted fingertip is below a certain distance to a touch, it is assumed that this fingertip and touch are produced by the same finger; therefore it is not a new touch and is not added to the hand. If there is no touch close to the fingertip, the fingertip is added to the hand. However, the fingertip-detection algorithm may detect a false fingertip, giving the hand more than five fingers, and normally a human does not have more than five fingers on one hand. The brightness of the fingertip touches is calculated as a by-product of the mean-shift algorithm described in section 4.2.3, and a bright fingertip is more likely to really be a fingertip. Therefore, only as many of the brightest touches are added as leave the hand with at most five fingers. Because pressure is needed to produce bright spots in the captured image, it can be assumed that the user actually touches the surface. The fingertips of the DI image, however, can be extracted even if the user is not touching the surface, so it is not clear whether the finger is above the surface or touching it without pressure. The height of a touch can therefore be approximated as follows:
- Touch in the FTIR image only: height is 0
- Touch in the FTIR image and corresponding fingertip in the DI image: height is 0
- Fingertip in the DI image only: height is 1
A more accurate way would be to use the brightness of the fingers/touches to approximate the height, but this requires an even illumination of the surface and objects that reflect infrared light similarly. This was not implemented, because it would require a lot of calibration work and a good normalization of the surface illumination.
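The three height rules can be written as one small function. The names are illustrative, and the -1 case for "no detection at all" is an addition not present in the rules above:

```cpp
#include <cassert>

// Approximate touch height from the two detection sources:
// 0 = touching the surface, 1 = hovering above it, -1 = not detected.
int approximateHeight(bool inFtir, bool inDi) {
    if (inFtir) return 0;  // FTIR requires actual contact with the surface
    if (inDi)   return 1;  // DI fingertip only: finger hovers above it
    return -1;             // neither source saw the finger (added case)
}
```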
Chapter 6 DIFTIRTracker
The software is the most important part of an optical multi-touch device, because it analyzes the images captured by the camera to extract the touch information. With the algorithms described in the last chapter an application can be created to determine and track touches. After looking at some other tracking software, it became clear that we wanted to develop our own; the Appendix gives a short overview of tracking software developed by various authors. The DIFTIRTracker was developed to rapidly test all kinds of technologies and to be flexible enough to integrate new functionality. The software should process data from the hardware and control the hardware in real time. The idea was to split everything into small parts (modules), so that the software can be easily extended and most modules can be reused. Each of these modules should be able to display what it is doing (show what happened to the image, or what was detected). The modules can be connected to each other in various ways; the only limitation is that each input of a module has only one other module connected to it. There are three categories of modules: input, filter, and output modules. Input modules get a datagram from an external source, e.g., a camera, a video file, or an image file. Filter modules take data, process it, and send it to the next module. Output modules send, save, etc. a datagram. A datagram normally represents an image, but it is not restricted to images; touch datagrams and others can be implemented as well. Each module works in its own thread, so that the application can take advantage of multi-core systems and scales well.
The DIFTIRTracker is written in C++ and uses the Qt framework [45] from Nokia. This combination was chosen because the application should run in real time and be platform independent. The only part that is not platform independent is the camera module, because the software development kit (SDK) from Point Grey [49] is not platform independent.
6.2 Pipeline
Because every module runs in its own thread, all modules have to be synchronized. This is done by pipelining the modules. Each connection between modules is represented by a ring buffer. When a module is done processing a datagram, it puts the resulting datagram into the ring buffer and wakes the next module in the pipeline. This next module takes the datagram out of the ring buffer, processes it, puts the result into the next ring buffer, and so on. A datagram can only be sent in one direction, but a circular connection can be built for feedback. The modules can wait for datagrams in a ring buffer or check whether any datagram is present.
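A minimal sketch of such a ring buffer between two modules, using a mutex and condition variables; the datagram is reduced to an `int` for brevity and the class name is illustrative, not the DIFTIRTracker implementation:

```cpp
#include <condition_variable>
#include <mutex>
#include <vector>
#include <cassert>

// Fixed-size ring buffer connecting two pipeline modules: the producer
// puts a datagram in and wakes the consumer; the consumer blocks until
// a datagram is available.
class RingBuffer {
public:
    explicit RingBuffer(size_t capacity)
        : buf_(capacity), head_(0), tail_(0), count_(0) {}

    void put(int datagram) {                  // called by the producing module
        std::unique_lock<std::mutex> lock(m_);
        notFull_.wait(lock, [this] { return count_ < buf_.size(); });
        buf_[head_] = datagram;
        head_ = (head_ + 1) % buf_.size();
        ++count_;
        notEmpty_.notify_one();               // wake the next module
    }

    int take() {                              // called by the consuming module
        std::unique_lock<std::mutex> lock(m_);
        notEmpty_.wait(lock, [this] { return count_ > 0; });
        int d = buf_[tail_];
        tail_ = (tail_ + 1) % buf_.size();
        --count_;
        notFull_.notify_one();
        return d;
    }

    bool hasDatagram() {                      // non-blocking check
        std::unique_lock<std::mutex> lock(m_);
        return count_ > 0;
    }

private:
    std::vector<int> buf_;
    size_t head_, tail_, count_;
    std::mutex m_;
    std::condition_variable notEmpty_, notFull_;
};
```

The blocking `take()` corresponds to a module waiting for datagrams, while `hasDatagram()` corresponds to merely checking the buffer.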
0 Hand
1 Touch
2 Touch
3 Touch
4 Touch
5 Touch
... ...
Chapter 7 Results
In this chapter, an application is presented that uses the additional information derived from the combination of the FTIR and DI effects, as a proof of concept. Afterwards, an informal user study is presented in which the technologies using the FTIR effect, the DI effect, and the combination of both are examined. Finally, the conclusion of this thesis and an outlook for the future are given.
Figure 7.2: Determination of whether the hand is a right or a left hand

To make the labels readable, they need to be adjusted to the user's position. If we only had the information about the touches, this would be impossible; the information that these touches belong to one hand is also needed to provide this kind of menu. The orientation of the hand is used to adjust the label positions so that the user can read them. It makes no difference where the user is standing when using the menu; he can stand in front of the table or sideways to it. The labels are also tilted by 45 degrees to reduce their overlap. Because the fingers are not sorted by the touch server, the labels of the menu would be unsorted too. It is very confusing for the user if the menu is sorted differently every time he opens it, so the fingers need to be sorted so that the user gets the same menu each time he lays his hand down. To sort the fingertips of a hand, the hand orientation is used to produce a vector from the centroid in the direction of the arm; this can be done by rotating the orientation vector provided by the touch server by 180 degrees. Vectors for the fingertips are created from the centroid and the coordinates of each individual fingertip. Afterwards the angles between the arm vector and the fingertip vectors are calculated. Because the dot product only yields the minimal (unsigned) angle between two vectors, the fingertips cannot be sorted by this angle directly. To obtain the clockwise angle, it is checked whether the angle runs clockwise or counterclockwise; if it is counterclockwise, the angle is corrected. The fingertips are then sorted by this angle and can later be used for the labeling, so that the user always gets the same order of labels in the menu. It can also be determined whether the hand is a right or a left hand.
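The clockwise sorting described above can be sketched as follows. The sign convention of the 2D cross product depends on the orientation of the image coordinate system; the comparison below assumes mathematical (y-up) coordinates, and all names are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct V2 { float x, y; };

// The dot product alone yields only the unsigned minimal angle; the sign
// of the 2D cross product tells whether the fingertip lies counter-
// clockwise of the arm vector, in which case the angle is corrected to
// its clockwise equivalent (2*pi - angle).
float clockwiseAngle(V2 arm, V2 finger) {
    float dot = arm.x * finger.x + arm.y * finger.y;
    float cross = arm.x * finger.y - arm.y * finger.x;
    float la = std::sqrt(arm.x * arm.x + arm.y * arm.y);
    float lf = std::sqrt(finger.x * finger.x + finger.y * finger.y);
    float a = std::acos(dot / (la * lf));       // unsigned minimal angle
    if (cross > 0) a = 2.0f * 3.14159265f - a;  // counterclockwise: correct
    return a;
}

// Sort fingertips by their clockwise angle around the hand centroid,
// measured from the arm vector.
void sortFingertips(std::vector<V2>& tips, V2 centroid, V2 arm) {
    std::sort(tips.begin(), tips.end(), [&](V2 a, V2 b) {
        V2 va{a.x - centroid.x, a.y - centroid.y};
        V2 vb{b.x - centroid.x, b.y - centroid.y};
        return clockwiseAngle(arm, va) < clockwiseAngle(arm, vb);
    });
}
```

With this ordering, the same finger always receives the same menu label, regardless of where the hand is placed.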
A left hand has a thumb that points to the right, as shown in Figure 7.2, while the thumb of a right hand points to the left. The thumb is also positioned closer to the wrist than the other fingers, so the distance between thumb and index finger is mostly greater than the distance between pinky and ring finger. If the fingers are sorted clockwise and the distance between the first and the second finger (d1) is greater than the distance between the last finger and its predecessor (d2), it is a right hand; otherwise it is a left hand. Figure 7.2 shows a left hand, with the two distances labeled d1 and d2. Since it is known which hand it is, the labels can be adjusted to
this information. For a right hand the labels can be tilted to the right, and for a left hand to the left. It is also possible to show different menus for the right and the left hand; for example, with the right hand a color can be chosen and with the left hand a brush size.
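The d1/d2 rule can be written as a small function operating on fingertips that are already sorted clockwise; the names are illustrative:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct P2 { float x, y; };

static float dist(P2 a, P2 b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
}

// Compare the gap between the first two fingers (d1, thumb to index)
// with the gap between the last two (d2, ring to pinky); a larger d1
// indicates a right hand under the clockwise ordering described above.
bool isRightHand(const std::vector<P2>& sortedTips) {
    float d1 = dist(sortedTips[0], sortedTips[1]);
    float d2 = dist(sortedTips[sortedTips.size() - 2],
                    sortedTips[sortedTips.size() - 1]);
    return d1 > d2;
}
```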
Figure 7.3: Community Earth

Another application has been created to test the blob detection and tracking rate of the implemented technologies. A maze application was found sufficient for that task; it is written in C++ with the Qt framework from Nokia. The maze can be solved with one or two fingers. To solve it, the user pushes the starting circle (green) and drags his finger through the maze to the end point (blue). If the user hits a wall with his finger, he has not completed the maze and has to start over. If the touch detection or the tracking fails, the touch is lost and the user also has to start over. With two fingers, the user pushes the start circle with two fingers and drags them to the end point; a restriction here is the distance between the fingers: if the user exceeds the maximum distance, he has to start over. Statistics on the successful and unsuccessful tries are presented in the upper left corner. The maze itself can be
drawn with this application too. The maze was designed to be easily solved and only as big as can be handled without changing the standing position. It should be easy, because the users' task is not to solve the maze but to test the touch detection and tracking. The display is so big that the full screen cannot be reached from one standpoint, which is why the maze does not fill the full screen. Seven unpaid participants (six male, one female) between 20 and 28 years old took part in the informal user study. All were right-handed and their education levels ranged from high-school student to post-graduate degree. None of the participants had experience with multi-touch devices. The following five different processing pipelines were tested:
- FTIR pipeline
- DI with hand detection pipeline
- DI with a bandpass filter pipeline
- FTIR matched with DI with a bandpass filter pipeline
- FTIR matched with DI with hand detection pipeline
The pipelines used are shown in Figures 7.6 and 7.7. Ambient light subtraction is not used in the pipelines that involve the DI technology, because most DI pipelines already make heavy use of performance-intensive modules; omitting the subtraction improves the latency a little. The participants were asked to test three to four of these processing pipelines with the Community Earth application and with the maze application. In Community Earth the users were first introduced to the application by showing them the gestures illustrated in Figure 7.4; then they were asked to navigate to their homes. Afterwards, with the other pipelines, they were asked to find other places like New York, Paris, etc. In the labyrinth application the users tried to solve the labyrinth with one and with two fingers. The participants were observed while doing their tasks and afterwards informally interviewed about how well each detection pipeline works in their opinion. Almost all participants were comfortable with the multi-touch interface after a short starting phase; only one had problems after a while with the navigation
Figure 7.4: Gestures used by Community Earth (pan, zoom in, zoom out, rotate, tilt); the red circle indicates the starting point
Figure 7.6: Pipelines of the single technologies (FTIR and the two DI variants), each ending in tracking, transformation, and the application
gestures. Two participants used their middle finger to navigate, which was a problem for the DI technology: they put other fingers on top of the middle finger, which changed the characteristics of the finger. In Community Earth a problem for the users was that the zooming point is always the middle of the screen rather than the middle of the fingers, which is what they expected. The zooming in Community Earth is logarithmic, so that a user who is far away gets to the desired position faster; many participants had problems with this effect because it goes too fast. The participants were disoriented when a false touch was detected, because Community Earth reacts very sensitively with zooming and moving: the position can jump fairly far when a false touch is detected, which disturbed the participants considerably. Most of the participants preferred the simple FTIR technology because of its low false-detection rate, with no jumps and no accidental zooming or tilting. Just one participant preferred the DI with bandpass filter pipeline. The matching of the two technologies was too sensitive for many, even with new DI touches dropped (described in section 4.3.2). When the latency was too high, participants wondered why they had zoomed in so far or rotated too far. The false-detection rate, however, depends on the ambient light conditions that were present at the time the
Figure 7.7: Pipelines with combined technologies (FTIR matched with DI with a bandpass filter, and FTIR matched with DI with hand detection; both remove the background, detect and match blobs, and pass the result through tracking, transformation, and the application)

participants were performing the informal user study. In the labyrinth application users had problems with the hand detection pipeline, because they did not spread their fingers far enough or touched from above, so that the fingers could not be clearly seen in the DI image. Overall it can be said that the participants preferred technologies that produce fewer false detections and accepted a lower detection sensitivity in return. In contrast to the hypothesis that the combination of the FTIR and DI effects would best match their expectations, the single FTIR technology worked best for them. This had several reasons, such as the longer latency of the combined technologies and the higher false-detection rate.
7.3 Conclusion
We have presented several methods to combine the Frustrated Total Internal Reflection and Diffused Illumination effects. To this end, we gave an overview of multi-touch technologies and discussed why FTIR and DI should be combined. Next we looked at the hardware setup needed for the combination and discussed why we chose these hardware parts. After that, the algorithms were presented that are needed for the preprocessing, feature detection, and post-processing of the data captured by the camera. The following part of the thesis dealt with the combination of the information gathered by the Frustrated Total Internal Reflection effect with the Diffused Illumination information, and it was discussed whether it is better to combine the information in the pre- or the post-processing step. An application for rapid testing and debugging was developed to detect and track touches for various optical technologies. This easy-to-use application can even be used by people without programming skills, because the user can visually combine standard modules. To test the developed methods for combining the technologies, a small informal user study was performed. Seven participants were asked to use the multi-touch table with two applications, Community Earth and a self-built
maze application. A demonstration application was created to show how the extra information from the DI technology can be used for user interaction: the user can lay his hand down on the surface to open a context menu and choose colors by pushing with the corresponding finger. Experiments have shown that ambient light significantly reduces the contrast between touch information and background; the touch sensitivity is therefore lower than without ambient light. It has also been shown that not only touch information can be derived from DI images: the assignment of touches to a hand and the hand orientation can be derived as well, which can be used to adjust user interfaces so that the user sees them the right way up (not upside down), wherever he is standing. The informal user study has shown that many people are very disturbed by false touch detections and accept a lower sensitivity in exchange for higher precision; therefore, the users of the study preferred the more stable recognition with the FTIR effect.
Chapter 8 Appendix
8.1 Several Spectra of Infrared Bandpass Filters
Here a selection of transmission curves of infrared filters is printed.
Figure 8.1: Spectrum of one overexposed photo negative. Figure taken from [40]
Figure 8.2: Spectrum of two overexposed photo negatives. Figure taken from [40]
Figure 8.3: Spectrum of one floppy disk. Figure taken from [40]
Figure 8.4: Spectrum of two floppy disks. Figure taken from [40]
Manufacturer     Model      Projection System
Acer             S1200      DLP
Optoma           EX525ST    DLP
3M               SCP740     DLP
3M               DMS700     DLP
NEC              WT610      DLP
Optoma (Lense)   EP-780     DLP
Optoma (Lense)   EP-782     DLP
Optoma (Lense)   EP-776     DLP
Toshiba          EX-20      DLP
3M               SCP717     DLP
Hitachi          CP-A100    LCD
Sanyo            PLC-XL51   LCD
BenQ             MP522 ST   DLP

(The original table additionally lists the projection screen size range of each model.)
8.3 Software
In this section a few touch and fiducial marker tracking applications are presented to give an overview of what has already been done.
8.3.2 CG Tracker
The CG Tracker from the Technical University of Berlin was first created as part of a project in the winter semester of 2007/08. Afterwards Stefan Elstner wrote a graphical user interface (GUI) for the tracker as part of his master's thesis [16]. The tracker was designed for the FTIR technique. It uses its own, newly created network protocol to serve multi-touch applications. This tracker also tracked patterns of a display placed on top of the table. The software is hard to extend, because it makes heavy use of the Windows application programming interface.
Figure 8.6: CG Tracker, which was developed in a project at the Technical University Berlin and by Stefan Elstner
8.3.3 reacTIVision
Figure 8.7: ReacTIVision in action. Picture taken from their website [39]

ReacTIVision has been developed by Martin Kaltenbrunner and Ross Bencina at the Music Technology Group of the Universitat Pompeu Fabra in Barcelona, Spain. It was developed for the reacTable project, which has already been described in the related work section. It is an open-source, cross-platform computer vision framework to track fiducial markers attached to physical objects, as well as touching fingers, and it can be used with various cameras. Because the software tracks fiducial markers, it can only handle optical technologies that support fiducial
markers, like DI and DSI. ReacTIVision implements the TUIO protocol, so all applications that support it can be used.
8.3.4 Touchlib
Touchlib is a library for creating multi-touch interaction surfaces. It is written in C++, works only under Windows, interacts with most types of webcams, and has no graphical user interface. The user can build his own touch-tracking application fairly easily thanks to the simple programming interface. Touchlib communicates with various multi-touch applications through the TUIO protocol.
Bibliography
[1] Acer. http://www.acer.com, accessed on 09/26/2009 12:00AM.
[2] National Aeronautics and Space Administration. World Wind. http://worldwind.arc.nasa.gov/java/, accessed on 10/09/2009 3:30PM.
[3] Marc Alexa, Björn Bollensdorff, Ingo Bressler, Stefan Elstner, Uwe Hahne, Nino Kettlitz, Norbert Lindow, Robert Lubkoll, Ronald Richter, Claudia Stripf, Sebastian Szczepanski, Karl Wessel, and Carsten Zander. Touch sensing based on FTIR in the presence of ambient light. Technical report, Technical University of Berlin, 2008.
[4] Linus Ang, Charles Lo Taha Bintahir, and Zhen Zhang Pat King. Community Earth. http://nuicode.com/projects/earth, accessed on 10/09/2009 3:30PM.
[5] Apple. Mac OS X. http://www.apple.com/macosx/, accessed on 09/24/2009 3:15PM.
[6] Björn Bollensdorff. Multitouch navigation and manipulation of 3D objects. Master's thesis, TU Berlin, 2009.
[7] Jean-Yves Bouguet. Camera Calibration Toolbox for Matlab. http://www.vision.caltech.edu/bouguetj/calib_doc/, accessed on 10/01/2009 9:15PM.
[8] Bill Buxton. http://www.billbuxton.com/multitouchOverview.html, accessed on 08/06/2009 3:00PM.
[9] W. Buxton and B. Myers. A study in two-handed input. SIGCHI Bull., 17(4):321-326, 1986.
[10] Nintendo Co. Ltd. Nintendo DS. http://www.nintendo.com/ds, accessed on 09/22/2009 3:00PM.
[11] D. Comaniciu and P. Meer. Mean shift analysis and applications. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, volume 2, pages 1197-1203, 1999.
[12] Computar. http://computarganz.com/, accessed on 09/26/2009 12:00AM.
[13] Paul Dietz and Darren Leigh. DiamondTouch: a multi-user touch technology. In UIST '01: Proceedings of the 14th annual ACM symposium on User interface software and technology, pages 219-226, New York, NY, USA, 2001. ACM.
[14] Rick Downs. Using resistive touch screens for human/machine interface. Technical report, Texas Instruments Incorporated, 2005.
[15] Florian Echtler, Manuel Huber, and Gudrun Klinker. Shadow tracking on multi-touch tables. In AVI '08: Proceedings of the working conference on Advanced visual interfaces, pages 388-391, New York, NY, USA, 2008. ACM.
[16] Stefan Elstner. Combining pen and multi-touch displays for focus+context interaction. Master's thesis, TU Berlin, 2009.
[17] Zach Lieberman et al. openFrameworks. http://www.openframeworks.cc, accessed on 09/29/2009 4:30PM.
[18] Sony Computer Entertainment Europe. PlayStation Eye. http://en.playstation.com, accessed on 10/04/2009 10:30AM.
[19] David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach. Prentice Hall, US edition, August 2002.
[20] K. Fukunaga and L. Hostetler. The estimation of the gradient of a density function, with applications in pattern recognition. Information Theory, IEEE Transactions on, 21(1):32-40, Jan 1975.
[21] D. Gale and L. S. Shapley. College admissions and the stability of marriage. The American Mathematical Monthly, 69(1):9-15, 1962.
[22] NUI GROUP. http://nuigroup.com/, accessed on 09/26/2009 11:00AM.
[23] Eric Haines. Point in polygon strategies. pages 24-46, 1994.
[24] Jefferson Y. Han. Low-cost multi-touch sensing through frustrated total internal reflection. In UIST '05: Proceedings of the 18th annual ACM symposium on User interface software and technology, pages 115-118, New York, NY, USA, 2005. ACM Press.
[25] Jordan Hochenbaum and Owen Vallis. Bricktable. http://flipmu.com/work/bricktable/, accessed on 10/05/2009 9:45AM.
[26] Alexander Hornberg. Handbook of Machine Vision. Wiley-VCH, 2006.
[27] Ming-Kuei Hu. Visual pattern recognition by moment invariants. Information Theory, IRE Transactions on, 8(2):179-187, February 1962.
[28] Apple Inc. iPhone. http://www.apple.com/iphone/, accessed on 10/05/2009 3:00PM.
[29] Google Inc. Google Trends. http://www.google.com/trends, accessed on 09/26/2009 11:00AM.
[30] Rosco Laboratories Inc. http://www.rosco.com/us/corporate/index.asp, accessed on 09/27/2009 3:00PM.
[31] Intel and Willow Garage. OpenCV. http://opencv.willowgarage.com/wiki/, accessed on 09/23/2009 2:00PM.
[32] Shahram Izadi, Steve Hodges, Stuart Taylor, Dan Rosenfeld, Nicolas Villar, Alex Butler, and Jonathan Westhues. Going beyond the display: a surface technology with an electronically switchable diffuser. In UIST '08: Proceedings of the 21st annual ACM symposium on User interface software and technology, pages 269-278, New York, NY, USA, 2008. ACM.
[33] Benjamin Walther-Franks, Jens Teichert, Marc Herrlich, Lasse Schwarten, Sebastian Feige, Markus Krause, and Rainer Malaka. Advancing large interactive surfaces for use in the real world. Technical report, Digital Media Group, TZI, University of Bremen, 2009.
[34] Sergi Jordà. Interactive music systems for everyone: Exploring visual feedback as a way for creating more intuitive, efficient and learnable instruments. In Proceedings of the Stockholm Music Acoustics Conference (SMAC 03), Stockholm, Sweden, 2003.
[35] Martin Kaltenbrunner, Till Bovermann, Ross Bencina, and Enrico Costanza. TUIO - a protocol for table based tangible user interfaces. In Proceedings of the 6th International Workshop on Gesture in Human-Computer Interaction and Simulation (GW 2005), Vannes, France, 2005.
[36] Sung K. Kang, Mi Y. Nam, and Phill K. Rhee. Color based hand and finger detection technology for user interaction. Hybrid Information Technology, International Conference on, 0:229-236, 2008.
[37] Jong-Min Kim and Woong-Ki Lee. Hand shape recognition using fingertips. In FSKD '08: Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery, pages 44–48, Washington, DC, USA, 2008. IEEE Computer Society.
[38] Kenrick Kin, Maneesh Agrawala, and Tony DeRose. Determining the benefits of direct-touch, bimanual, and multifinger input on a multitouch workstation. In GI '09: Proceedings of Graphics Interface 2009, pages 119–124, Toronto, Ontario, Canada, 2009. Canadian Information Processing Society.
[39] M. Kaltenbrunner and R. Bencina. reacTIVision. http://reactivision.sourceforge.net/, accessed on 09/27/2009 6:00PM.
[40] Madian. Spectral analysis of IR LEDs and filters. http://nuigroup.com/forums/viewthread/6458/, accessed on 10/11/2009 12:30PM.
[41] Shahzad Malik and Joe Laszlo. Visual touchpad: a two-handed gestural input device. In ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces, pages 289–296, New York, NY, USA, 2004. ACM.
[42] Nobuyuki Matsushita and Jun Rekimoto. HoloWall: designing a finger, hand, body, and object sensitive wall. In UIST '97: Proceedings of the 10th annual ACM symposium on User interface software and technology, pages 209–210, New York, NY, USA, 1997. ACM.
[43] Microsoft. Surface. http://www.microsoft.com/surface/, accessed on 09/29/2009 3:00PM.
[44] Microsoft. Windows. http://www.microsoft.com/windows/, accessed on 09/24/2009 3:15PM.
[45] Nokia. Qt. http://qt.nokia.com/, accessed on 09/24/2009 3:15PM.
[46] Nolan. Peau Productions. http://peauproductions.com/diff.html, accessed on 10/04/2009 11:30AM.
[47] K. Oka, Y. Sato, and H. Koike. Real-time fingertip tracking and gesture recognition. Computer Graphics and Applications, IEEE, 22(6):64–71, Nov/Dec 2002.
[48] Osram. http://www.osram.com, accessed on 09/26/2009 12:00AM.
[49] Point Grey Research, Inc. http://www.ptgrey.com/, accessed on 09/26/2009 11:00AM.
[50] Ilya Rosenberg and Ken Perlin. The UnMousePad: an interpolating multi-touch force-sensing input pad. ACM Trans. Graph., 28(3):1–9, 2009.
[51] Johannes Schöning, Peter Brandl, Florian Daiber, Florian Echtler, Otmar Hilliges, Jonathan Hook, Markus Löchtefeld, Nima Motamedi, Laurence Muller, Patrick Olivier, Tim Roth, and Ulrich von Zadow. Multi-touch surfaces: A technical guide. Technical report, Technical University of Munich, 2008.
[52] The Imaging Source. DMK 21BF04. http://www.theimagingsource.com/de_DE/products/cameras/firewire-ccd-mono/dmk21bf04/, accessed on 10/04/2009 10:30AM.
[53] Midwest Optical Systems. Machine vision filters. http://www.midopt.com/, accessed on 09/26/2009 12:00AM.
[54] Malte Weiss, Julie Wagner, Yvonne Jansen, Roger Jennings, Ramsin Khoshabeh, James D. Hollan, and Jan Borchers. SLAP widgets: Bridging the gap between virtual and physical controls on tabletops. In CHI '09: Proceedings of the twenty-seventh annual SIGCHI conference on Human factors in computing systems, New York, NY, USA, 2009. ACM.
[55] D. Wigdor, S. Williams, M. Cronin, R. Levy, K. White, M. Mazeev, and H. Benko. Ripples: Utilizing per-contact visualizations to improve user interaction with touch displays. In UIST '09: Proceedings of the 22nd annual ACM symposium on User interface software and technology, 2009.
[56] Ying Wu, Ying Shan, Zhengyou Zhang, and Steven Shafer. Visual panel: From an ordinary paper to a wireless and mobile input device, 2000.
[57] Duan-Duan Yang, Lian-Wen Jin, and Jun-Xun Yin. An effective robust fingertip detection method for finger writing character recognition system. In Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on, volume 8, pages 4991–4996, August 2005.
[58] Hongqin Yang, Shusen Xie, Hui Li, and Zukang Lu. Determination of human skin optical properties in vivo from reflectance spectroscopic measurements. Chin. Opt. Lett., 5(3):181–183, 2007.
[59] Zhiwei Zhu, Kikuo Fujimura, and Qiang Ji. Real-time eye detection and tracking under various light conditions. In ETRA '02: Proceedings of the 2002 symposium on Eye tracking research & applications, pages 139–144, New York, NY, USA, 2002. ACM.