
Pers Ubiquit Comput (2006) 10: 285-299
DOI 10.1007/s00779-005-0033-8

ORIGINAL ARTICLE

Juha Kela · Panu Korpipää · Jani Mäntyjärvi · Sanna Kallio · Giuseppe Savino · Luca Jozzo · Sergio Di Marca

Accelerometer-based gesture control for a design environment

Received: 18 January 2005 / Accepted: 29 April 2005 / Published online: 23 August 2005
© Springer-Verlag London Limited 2005

Abstract Accelerometer-based gesture control is studied as a supplementary or an alternative interaction modality. Gesture commands freely trainable by the user can be used for controlling external devices with a handheld wireless sensor unit. Two user studies are presented. The first study concerns finding gestures for controlling a design environment (Smart Design Studio), TV, VCR, and lighting. The results indicate that different people usually prefer different gestures for the same task, and hence it should be possible to personalise them. The second user study concerns evaluating the usefulness of the gesture modality compared to other interaction modalities for controlling a design environment. The other modalities were speech, RFID-based physical tangible objects, laser-tracked pen, and PDA stylus. The results suggest that gestures are a natural modality for certain tasks, and can augment other modalities. Gesture commands were found to be natural, especially for commands with spatial association in design environment control.

Keywords Gesture recognition · Gesture control · Multimodal interaction · Mobile device · Accelerometer

1 Introduction
A variety of spontaneous gestures, such as finger, hand, body and head movements, are used to convey information in interactions among people. Gestures can hence

J. Kela (&) · P. Korpipää · J. Mäntyjärvi · S. Kallio
VTT Electronics, P.O. Box 1100, 90571 Oulu, Finland
E-mail: juha.kela@vtt.fi
Tel.: +358-20-722111
Fax: +358-20-7222320

G. Savino · L. Jozzo · S. Di Marca
Italdesign-Giugiaro, Via A. Grandi 25, 10024 Moncalieri, Italy
E-mail: giuseppe.savino@italdesign.it
E-mail: luca.jozzo@italdesign.it
E-mail: sergio.dimarca@italdesign.it

be considered a natural communication channel, which has not yet been fully utilised in human-computer interaction. Most of our interaction with computers to date is carried out with traditional keyboards, mice and remote controls designed mainly for stationary interaction. Mobile devices, such as PDAs, mobile phones and wearable computers, provide new possibilities for interacting with various applications, but also introduce new problems with small displays and miniature input devices. Small wireless gesture input devices containing accelerometers for detecting movements could be integrated into clothing, wristwatches, jewellery or mobile terminals to provide a means for interacting with different kinds of devices and environments. These input devices could be used to control such things as home appliances with simple user-definable hand movements. For example, simple up and down hand movements could be used to operate a garage door, adjust the volume of your stereo equipment or the brightness of your room's lights. As a general additional or alternative interaction modality, the acceleration-based gesture command interface is quite recent, and many research problems still need to be solved. Firstly, it should be examined what kinds of gestures are natural and useful for the multitude of tasks they could be used for, and for what tasks other modalities are more natural. Gestures can also be combined with other modalities. The topic is very wide in scope since there are a large number of possible gestures suitable for certain tasks, as well as many tasks that could potentially be performed using gestures. Secondly, the recognition accuracy for detecting the gesture commands should be high; nearly 100% accuracy is required for user satisfaction, since too many mistakes may cause the users to abandon the method. Thirdly, in personalising the gestures, the control commands need to be trained by the user. If the training is too laborious, it may also cause the users to abandon the interaction method. Therefore, the training process should be as fast and effortless as possible. Finally, practical implementation and experience are important for empirical evaluation of the gesture modality.


Previous work on gesture control can be classified into two categories, camera-based and movement sensor-based. Camera-based recognition is most suitable for stationary applications, and often requires a specific camera set-up and calibration. For example, the camera-based control device called the Gesture Pendant allows its wearer to control elements in the home via palm and finger gestures [1]. The movement sensor-based approach utilises different kinds of sensors, e.g. tilt, acceleration, pressure, conductivity and capacitance, to measure movement. An example of such an implementation is GestureWrist, a wristwatch-type gesture recognition device using both capacitance and acceleration sensors to detect simple hand and finger gestures [2]. Accelerometer-based gesture recognition is used in, for example, a musical performance control and conducting system [3], and a glove-based system for recognition of a subset of German sign language [4]. Tsukada and Yasumura developed a wearable interface called Ubi-Finger, using acceleration, touch and bend sensors to detect a fixed set of hand gestures, and an infrared LED for pointing at a device to be controlled [5]. XWand, a gesture-based interaction device, utilises both sensor-based and camera-based technologies [6]. The creators of XWand present a control device that can detect the orientation of a device using a two-axis accelerometer, a three-axis magnetometer and a one-axis gyroscope, as well as position and pointing direction using two cameras. The system is also equipped with an external microphone for speech recognition. The user can select a known target device from the environment by pointing, and control it with speech and a fixed set of simple gestures. Moreover, gesture recognition has been studied for implicit control of functions in cell phones, for example answering and terminating a call without the user having to explicitly perform the control function [7, 8]. In implicit control, or in other words context-based gesture control, the main problem is that users usually perform the gesture very differently in different cases; e.g. picking up a phone can be done in very many different ways depending on the situation. Explicit control directly presupposes that gestures trained for performing certain functions are always repeated as they were trained. In addition to the camera and movement sensor-based gesture control approaches, 2D patterns, such as characters or strokes drawn on a surface with a mouse or a pen, can be used as an input modality. This category of input has been referred to as both character recognition and gesture recognition in the literature [9]. Some of the current PDAs and mobile devices are equipped with touch-sensitive displays that can be utilised for pen or finger-based 2D stroke gesture control. Using a media player application on a PDA with gestures has been reported to reduce the user workload for the task [10]. Mouse-based 2D strokes have been successfully used as an input modality in an early-1990s UNIX workstation CAD program (Mentor Graphics) and in today's web browsers (e.g. Opera or Mozilla) for page

navigation. The focus of this paper is on 3D movement sensor-based, freely personalisable gesture control. As a sensing device, SoapBox (Sensing, Operating and Activating Peripheral Box) is utilised in this work. It is a sensor device developed for research activities in ubiquitous computing, context awareness, multimodal and remote user interfaces, and low-power radio protocols [11]. It is a light, matchbox-sized device with a processor, a versatile set of sensors, and wireless and wired data communications. Because of its small size, wireless communication and battery-powered operation, SoapBox is easy to install in different places, including moving objects. The basic sensor board of SoapBox includes a three-axis acceleration sensor, an illumination sensor, a magnetic sensor, an optical proximity sensor and an optional temperature sensor. Various statistical and machine learning methods can potentially be utilised in training and recognising gestures. This study applies Hidden Markov Models (HMMs), a well-known method for speech recognition [12]. HMM is a statistical signal modelling technique that can be applied to the analysis of a time series with properties that change over time. HMM is widely used in speech and handwritten character recognition as well as in gesture recognition in video-based and glove-based systems. The main contributions of this article are the following. Two user studies were conducted. The first study examined the suitability of gestures and types of gestures for controlling selected home appliances and an interactive virtual reality design environment, the Smart Design Studio. The results indicate that different people usually use different gestures for performing the same task, and hence these gestures should be easily personalisable. The second user study evaluated the usefulness of gestures compared with other modalities in design environment control. Gesture commands were found to be natural, especially for commands with a spatial association to the design space. Furthermore, the users should need as few training repetitions of a gesture as possible, since repetitions can be a nuisance. A method for reducing the number of required repetitions while maintaining proper recognition accuracy was applied [13]. To gain empirical experience of gesture control, a prototype multimodal design environment was implemented, and a gesture recognition system was integrated as a part of the system. The article is organised as follows. Firstly, gesture interface basic concepts are defined and categorised to clarify the characteristics of the different methods, devices and applications for gesture-based control. To address the wide topic of gesture usability and usefulness, a pre-study on gesture control in chosen application domains is presented in Sect. 3 and its results are discussed in Sect. 4. The applied gesture training and recognition methods are introduced briefly in Sect. 5 and user-dependent recognition accuracy is evaluated in Sect. 6. The gesture recognition system was integrated into the Smart Design Studio prototype, which is described in Sect. 7. The prototype was used for the second user study that aimed at evaluating and comparing the use of different modalities for different


tasks in the design environment. Subsequently, the results of both user studies are discussed. Finally, suggestions for future work are given together with conclusions.

2 Gesture control
Gestures are an alternative or complementary modality for application control with mobile devices. A multimodal interface utilising gesture commands is presented in Sect. 7. Before that, the types of movement sensor-based user interfaces need to be categorised to clarify the differences between the various approaches. Table 1 presents a categorisation based on a few essential properties of movement sensor-based control systems. Based on their operating principle, direct measurement and control systems are not considered to be gesture recognition systems. In this article, gestures are referred to as the user's hand movements, collected by a set of sensors in the device in the user's hand and modelled by mathematical methods in such a way that any performed movement can be trained to be later recognised in real time, based on which a discrete device control command is executed. Gesture commands can be used to control two types of applications: device internal functions, and external devices. In device internal application control, gestures are both recognised and applied inside the device. In external device control, gestures operate separate devices outside the sensing device, and the recognition, based on the collected gesture movement data, is performed either inside or outside the sensing device. If the recognition is performed outside the sensing device, the movement data is sent to an external recognition platform, such as a PC. The recognition results are mapped to commands for the external device(s) and the commands are transmitted to the control target over a wireless or wired communication channel. With regard to sensor-based hand movement interfaces, this paper focuses on gesture command interfaces (the second category in Table 1). Moreover, this article

focuses on the gesture control of external devices. There is a multitude of existing simple measure and control applications that belong to category one in Table 1. The Smart Design Studio prototype presented in Sect. 7 utilises this category of control in addition to category two. Other modalities in the prototype, such as speech or a PDA stylus, can be used as an alternative or complementary interface to gestures, making the system multimodal.
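As a rough illustration of the difference between categories one and two, the following Python sketch contrasts direct measurement (tilt mapped continuously to a value) with a trained gesture command mapped to a discrete action. It is an illustration only: the pitch formula assumes a particular axis convention, and the recogniser and command names are placeholders, not part of the system described in this paper.

    import math

    def tilt_to_value(ax, ay, az, lo=0.0, hi=100.0):
        """Category 1 (measure and control): map the static pitch angle, taken
        from the gravity components of the accelerometer, directly to a
        continuous value such as a volume or zoom level."""
        pitch = math.degrees(math.atan2(-ax, math.sqrt(ay * ay + az * az)))
        return lo + ((pitch + 90.0) / 180.0) * (hi - lo)   # -90..+90 deg -> lo..hi

    def gesture_to_command(samples, recogniser, command_map):
        """Category 2 (gesture command): a trained recogniser turns a recorded
        hand movement into a label, which is mapped to one discrete command."""
        label = recogniser(samples)          # e.g. "up"
        return command_map.get(label)        # e.g. "TV volume up"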

3 User study on gesture types


The range of potential applications for gesture command control is extensive. This study focuses on certain chosen application areas. A pre-study was conducted in order to examine the potential suitability of gestures for the chosen applications. The goal of the user study was to gather information about feasible gestures for the control of home equipment and computer-aided design software. Another goal was to investigate whether there is a universal gesture set, a vocabulary that can be identified for certain applications. The test participants were requested to imagine, create and sketch the gesture(s) they considered most appropriate for the control functions given in the questionnaire form. They were given two weeks to contemplate and try different potential gestures in their homes before returning the questionnaire form. Use of the same gestures for similar functions in different devices, such as VCR on, TV on, etc., was also allowed. Discrete gesture movements could be performed in all three dimensions (up-down, left-right, forward-backward), and it was supported in the questionnaire notation. Although simple tilting and rotation-based interaction could be useful in some application areas, it was not within the focus of this questionnaire. The responses for the two questionnaires were acquired from European ITEA Ambience project partners and their families. The first was carried out in Finnish and the main target group was IT professionals and their families. The task was to describe gestures for controlling

Table 1 Categorisation and properties of movement sensor-based user interfaces

Sensor-based user interface type (movement) | Operating principle | Analogy with speech interface | Personalisation | Complexity and computing load
1. Measure and control | Direct measurement of tilting, rotation or amplitude | e.g. control based on volume level | - | Very low
2. Gesture command | Gesture command recognition | Speech command recognition | Machine learning, freely personalisable | High
3. Continuous gesture command | Continuous gesture recognition | Continuous speech recognition | Machine learning, freely personalisable | Very high


the basic functionality of three different home appliances: TV, VCR and lighting. The functions included such control actions as changing the channel, increasing/decreasing the volume, and switching the device on and off. The second questionnaire was carried out among Ambience project partners and included the same VCR control tasks and five 3D design software control tasks. The total number of tasks was 21 in the first and 26 in the second questionnaire. The questionnaire was initially delivered to 50 selected participants, who were also encouraged to distribute and recommend the test to other friends and colleagues. Both questionnaires had the same open questions at the end of the form to ask for additional information, such as:

Did the respondents find it natural to use gestures to control the given devices?
What other devices could be controlled with gestures?
What remote controllable devices did they own and use?
Give free comments and ideas about the design and applicability of the gesture control device.

4 Questionnaire results and discussion


The total number of responses was 37, of which 73% were males and 27% females, with varying cultural backgrounds. About 78% had a technical education. The average age was around 32, while the age distribution ranged from 21 to 54. The nationalities of the respondents were 57% Finnish, 38% Italian, 3% French and 3% Dutch. Typical frequently used remotely controllable home appliances were TV (97%), stereo/Hi-fi (80%), and VCR (71%). The VCR control gestures were given by 37 participants, TV and lighting control by 23, and 3D design software control by 14. Table 2 summarises the three most popular controlling gestures proposed for the different VCR control tasks. Each gesture command is represented by the gesture trace, the spatial plane, and the percentage of responses suggesting that particular gesture for the given control function. The results indicate that the respondents tended to use spatial two-dimensional gestures in the x-y plane.

Table 2 Most popular gestures for VCR control


Fig. 1 Sample of gestures proposed for the VCR record task

Utilising all three dimensions in one gesture was rare. The third dimension, the depth axis (pointing away from the user), was used for the On-Off gesture pair, which consists of push forward and pull backward movements, and for the Record and Pause functions (Table 2). Another finding was that some of the proposed gestures follow the control logic, so that opposite control actions are represented by opposite-direction gestures:

on - off → push - pull
next - previous → right - left
increase - decrease → up - down

VCR play, TV volume up, lights brighten
VCR stop, TV volume down, lights dim
VCR next channel, TV next channel
VCR previous channel, TV previous channel

These gestures seem intuitive for the given tasks. However, there were control tasks that did not produce such straightforward gestures, such as VCR record, VCR pause and TV mute. This was obvious with operations that have no easily identifiable mental model or universal symbol, e.g. VCR record (Fig. 1). Comments were given that specific critical functions, such as VCR record, should be protected from accidental activation by defining more complex control gestures. Another observation was that the same gestures were proposed for different devices:

VCR on, TV on, lights on, zoom in
VCR off, TV off, lights off, zoom out

This result suggests that the users would like to control different devices using the same basic gestures. Hence, there is a need for a method for selecting the controllable device from the environment prior to making the gesture. Research studies can be found using, e.g., a laser/infrared pointer to pick out the desired control target [14]. Seventy-six percent of the respondents found it natural to use gestures for controlling the given devices, while 8% did not find it natural, and 16% left the question unanswered. Respondents commented that they found the gestures natural for some commands but that gestures should only be used for simple basic tasks. Moreover, according to the comments, the number of gestures should be relatively small and there should be a possibility of training and rehearsing difficult gestures. When asked what devices people would like to control by gestures, the most popular device category was personal devices and home entertainment devices, such as PDAs, mobile terminals, TVs, DVD players and Hi-fi equipment. Another popular category was locking and security applications, including garage doors, alarm systems, and car and home locks. People also proposed some typical PC environment tasks, such as presentation control, Internet browser navigation and tool selection in paint software. There were also a couple of special application areas, such as vacuum cleaners, car gearboxes and microwave ovens. To generalise, the most popular application target for gesture control, according to the responses, seems to be the simple functions of current remotely controllable devices.

5 Gesture training and recognition


According to the study, users prefer intuitive user-definable gestures for gesture-based interaction. This is a challenge for the gesture recognition and training system, since both real-time training and recognition are required. To make the usage of the system comfortable, a low number of repetitions of a gesture is required during the training. On the other hand, a good generalisation performance in recognition must be achieved. Other requirements include: a recogniser must maintain models of several gestures, and when a gesture is performed, training or recognition operations must be short. This section presents the methods used in

Fig. 2 Block diagram of a gesture recognition/training system


real-time gesture recognition in the design environment prototype. In accelerometer-based gesture interaction, sensors produce signal patterns typical for gestures. These signal patterns are used in generating models that allow the recognition of distinct gestures. We have used discrete HMMs in recognising gestures. HMMs are stochastic state machines that can be applied to modelling time series with spatial and temporal variability. HMMs have been utilised in experiments for gesture and speech recognition [8, 15]. Acceleration sensor-based gesture recognition using HMMs has been studied in [4, 8, 16–18]. The recognition system works in two phases: training and recognition. A block diagram is presented in Fig. 2. Common steps for these phases are signal sampling from three accelerometers to 3D sample signals, preprocessing, and vector quantisation of signals. Repeating the same gesture produces a variety of measured signals because the tempo and the scale of the gesture can change. In preprocessing, data from the gestures is first normalised to equal length and amplitude. The data is then submitted to a vector quantiser. The purpose of the vector quantiser is to reduce the dimensionality of the preprocessed data to 1D sequences of discrete symbols that are used as inputs for the HMMs in training and in recognition.

5.1 Preprocessing

The preprocessing stage consists of interpolation or extrapolation and scaling. The gesture data is first linearly interpolated or extrapolated if the data sequence is too short or too long, respectively. The amplitude of the data is scaled using linear min-max scaling. Preprocessing normalises the variation in gesture speed (tempo) and scale, thus improving the recognition of the gesture form and direction.

5.2 Vector quantisation

Vector quantisation is used to convert the preprocessed 3D data into 1D sequences of discrete symbols (prototype vector indices). The collection of the prototype vectors is called a codebook. In our experiments, the size of the codebook was empirically selected to be eight. Vector quantisation is done using the k-means algorithm [19]. The codebook is built using the entire data set available for the experiments. Figure 3 illustrates a 3D acceleration vector (upper diagram) and
Table 3 Gestures used in the recognition accuracy experiments

Fig. 3 Three-dimensional acceleration vector and its symbolic representation for a >-shaped gesture

its corresponding 1D symbol vector (lower diagram) for a >-shaped gesture.

5.3 Training and recognition with HMM

The Hidden Markov Model [12] is a stochastic state machine. HMM classification and training are solved using the Viterbi and Baum-Welch algorithms. The global structure of the HMM recognition system is composed of a parallel connection of each trained HMM [20]. Hence adding a new HMM or deleting an existing one is feasible. In this paper an ergodic, i.e. fully connected, discrete HMM topology was utilised. In the case of gesture recognition from acceleration signals, both ergodic and left-to-right models have been reported as giving similar results [4]. The alphabet size (the codebook size) used here is eight and the number of states in each model is five. Good results have been obtained in earlier studies by using an ergodic model with five states for a set of gestures [4]. It has been reported that the number of states does not have a significant effect on the gesture recognition results [21].
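To make the pipeline concrete, the following Python sketch outlines preprocessing, codebook-based vector quantisation and per-gesture HMM classification as described above. It is an illustration only, not the authors' implementation: the fixed resampling length, the use of scikit-learn's KMeans and hmmlearn's CategoricalHMM, and all function names are assumptions.

    import numpy as np
    from sklearn.cluster import KMeans
    from hmmlearn.hmm import CategoricalHMM   # assumption: any discrete-HMM library would do

    def preprocess(gesture, target_len=40):
        """Normalise tempo and scale: resample each acceleration axis to a fixed
        length and min-max scale the amplitudes (Sect. 5.1)."""
        g = np.asarray(gesture, dtype=float)              # shape (n_samples, 3)
        t_old = np.linspace(0.0, 1.0, len(g))
        t_new = np.linspace(0.0, 1.0, target_len)
        g = np.column_stack([np.interp(t_new, t_old, g[:, a]) for a in range(3)])
        lo, hi = g.min(axis=0), g.max(axis=0)
        return (g - lo) / np.where(hi > lo, hi - lo, 1.0)

    def build_codebook(all_gestures, codebook_size=8):
        """Fit a k-means codebook of eight prototype vectors on all available
        preprocessed 3D samples (Sect. 5.2)."""
        samples = np.vstack([preprocess(g) for g in all_gestures])
        return KMeans(n_clusters=codebook_size, n_init=10).fit(samples)

    def quantise(gesture, codebook):
        """Map a gesture to a 1D sequence of discrete codebook symbols."""
        return codebook.predict(preprocess(gesture))

    def train_models(training_data, codebook, n_states=5):
        """Train one discrete HMM per gesture; training_data maps a gesture name
        to a list of recorded repetitions. In practice the emission matrix should
        cover all eight symbols, even those unseen in the training repetitions."""
        models = {}
        for name, repetitions in training_data.items():
            seqs = [quantise(r, codebook) for r in repetitions]
            X = np.concatenate(seqs).reshape(-1, 1)
            lengths = [len(s) for s in seqs]
            models[name] = CategoricalHMM(n_components=n_states, n_iter=20).fit(X, lengths)
        return models

    def recognise(gesture, codebook, models):
        """Classify a gesture by the model giving the highest log-likelihood."""
        seq = quantise(gesture, codebook).reshape(-1, 1)
        return max(models, key=lambda name: models[name].score(seq))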

6 Gesture recognition experiments and results


Systematic tests are required in order to evaluate the accuracy of the gesture training and recognition system. In this section, the recognition accuracy of the system is evaluated by established pattern recognition methods. First, the collected data is described, the methods and experiments are explained, and finally the results are

Table 4 Recognition rate versus number of training vectors

Training vectors | 1 | 2 | 4 | 6 | 8 | 10 | 12
Recognition rate | 81.2% | 87.2% | 93.0% | 95.3% | 96.1% | 96.6% | 98.9%

discussed. These tests were performed offline before the multimodal design environment user study. The user study was performed using the implemented real-time recognition system integrated with the design environment prototype (Sect. 7). According to the questionnaire results, eight popular gestures, presented in Table 3, were selected as the test data set for the HMM-based gesture recognition accuracy test. Data collection was performed using a wireless sensor device (SoapBox) equipped with accelerometers. For each gesture, 30 distinct 3D acceleration vectors were collected from one person in two separate sessions over 2 days. The gestures were collected sequentially by repeating the first gesture 15 times, taking a short break and then proceeding to the collection of the next gesture. The total test data set consisted of 240 repetitions. The length of the 3D acceleration vectors varied depending on the duration of the gesture. The experiments were divided into two parts. First, the recognition rate for each gesture was calculated by using two repetitions for training and the remaining 28 for recognition. The recognition percentage for each gesture was the result of cross-validation, so that, in the case of two training repetitions, there were 15 training sets and the rest of the data (28 repetitions) was used as the test set 15 times. The procedure was repeated for each gesture and the result was averaged over all eight gestures. This procedure was then repeated with four, six, eight, ten and 12 repetitions to find the number of training repetitions required for reaching a proper accuracy. In the second test, the aim was to reduce the number of required training repetitions in order to make the training process less laborious for the user. Our goal was to find a solution which would keep the number of required training repetitions as low as possible and still provide a recognition rate of over 95%, which should be satisfactory for most applications. The method experimented with was to copy the original gesture data and add noise to the copy [13]. The noise-distorted copy was then used as one actual training repetition. The goal of the experiment was to see whether the number of repetitions could be reduced, and to find the right level of noise to add to the data for optimal results. Two actual gestures were used for training in this test, plus two copy gestures, a total of four. The accuracies in this test are also the result of cross-validation: for each noise parameter value the models were each trained and tested 15 times, the test set consisting of 28 repetitions for each gesture. The result of the first test, the effect of the number of training vectors on the recognition rate, is shown in Table 4. A recognition rate of over 90% was achieved with four training vectors.
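The cross-validation procedure can be read as the following sketch, which reuses the helper functions from the earlier pipeline sketch. The random choice of training repetitions and the seed are assumptions; the paper does not state how the 15 splits were formed.

    import numpy as np

    def recognition_rate(reps_per_gesture, n_train, n_rounds=15, seed=0):
        """Average recognition accuracy for a given number of training repetitions,
        over n_rounds train/test splits and all gestures. The codebook is built
        from the entire data set, as described in Sect. 5.2."""
        rng = np.random.default_rng(seed)
        all_reps = [r for reps in reps_per_gesture.values() for r in reps]
        codebook = build_codebook(all_reps)
        scores = []
        for _ in range(n_rounds):
            train, test = {}, {}
            for name, reps in reps_per_gesture.items():
                idx = rng.permutation(len(reps))
                train[name] = [reps[i] for i in idx[:n_train]]
                test[name] = [reps[i] for i in idx[n_train:]]
            models = train_models(train, codebook)
            hits = sum(recognise(r, codebook, models) == name
                       for name, reps in test.items() for r in reps)
            total = sum(len(reps) for reps in test.values())
            scores.append(hits / total)
        return float(np.mean(scores))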

The result of the second test was that, with two original and two noise-distorted (uniformly distributed noise, SNR 5) training vectors, the recognition accuracy is 96.4%. Compared with the situation in which the HMMs were trained using only the two original training vectors, the gain achieved by adding the noised vectors is over nine percentage points. In fact, the result is close to the recognition accuracy achieved using ten original training vectors. Clearly, a discrete HMM is not able to generalise well from only two training vectors. Adding the noised vectors to the original training set increases the density of the significant features of the gesture in the overall training set. The variation of the gesture is better captured and the new training set is a more representative sample of the vectors describing the gesture. The results show that two original vectors with two noise-distorted vectors can capture the variation of the gesture with the same accuracy as ten original vectors [22]. According to preliminary, as yet unpublished empirical results, walking while making a gesture does not cause a significant decrease in the recognition accuracy, even though the training in this study was performed while stationary.
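A minimal sketch of the noise-augmentation idea is given below. The paper states only that uniformly distributed noise at SNR 5 was added to copies of the recorded gestures; the exact parameterisation (linear power ratio, zero-mean noise) is an assumption here.

    import numpy as np

    def noisy_copy(gesture, snr=5.0, rng=None):
        """Create a synthetic training repetition by adding zero-mean uniform
        noise to a recorded gesture so that signal power / noise power ~= snr."""
        rng = np.random.default_rng() if rng is None else rng
        g = np.asarray(gesture, dtype=float)
        noise_power = np.mean(g ** 2) / snr
        a = np.sqrt(3.0 * noise_power)   # uniform noise on [-a, a] has variance a^2 / 3
        return g + rng.uniform(-a, a, size=g.shape)

    # Two recorded repetitions plus one noisy copy of each give the four
    # training vectors used in the second test:
    # training_set = originals + [noisy_copy(g) for g in originals]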

7 Smart Design Studio


The Smart Design Studio prototype was developed in co-operation with ITEA Ambience project partners. The prototype was realised in the Italdesign-Giugiaro Virtual Reality Centre, which is an interactive work environment supporting different design and engineering activities. The Virtual Reality Centre is equipped with a large wall-sized projection display where engineers and industrial designers present their design sketches and prototypes to their customers. A typical usage scenario for a Virtual Reality Centre presentation starts with preparatory tasks, such as loading virtual models, presentation slides and the meeting agenda; during the actual presentation, the designer or marketing representative constantly changes between different model views and switches between different programs. This requires that either a separate presentation operator changes the programs, models and views according to the presenter's requests, or the presenter himself has to move back and forth between the projection wall and the back of the room to operate the computer. Virtual Reality Centre users had found the existing mouse and keyboard-based interaction impractical and slow to use, so they wanted to develop a new, easy-to-use environment, the Smart Design Studio, with different optional interaction modalities. In addition, the users wanted the separate design and presentation software user interfaces to be integrated into one common user interface application supporting effortless changing of modalities. The main objective was to simplify and accelerate the engineering and design development phases by introducing more devices and different interaction modalities. The users were free to


select the most suitable control modality for the given task according to their personal preference. Due to the limitations in control capabilities, some modalities supported only a limited set of control tasks. The modality and task selection was done in co-operation with Virtual Reality Centre users. For example, RFID-based control was practical in room access control and image presentation tasks but less practical for CAD model editing. The new modalities besides the existing mouse and keyboard were the following:

Speech recognition and speech output for voice command-based navigation
Gesture recognition for controlling the Smart Design Studio with hand movements
RFID tag card-based physical tangible navigation of multimedia material
Laser-tracked pen for direct manipulation of objects on the projection wall
Tablet PC and PDA touch-sensitive display for remote controlling

This paper concerns the experiments on the gesture modality compared with the others. An overview of the Smart Design Studio architecture and modalities is discussed first, followed by a description of the gesture recognition system integrated into the design environment. Finally, the end-user evaluation is presented and the results of the comparison of the modalities are discussed.


8 System architecture and interaction modalities


The Smart Design Studio was realised in the existing Virtual Reality Centre, where the interaction was earlier performed using only a mouse and a keyboard for the input and a projection wall for the output. The basic infrastructure supporting seamless multimodal operation of the different design and presentation software was developed by Italdesign-Giugiaro. Figure 4 presents the conceptual overview of the Smart Design Studio Demonstrator. The Smart Design Studio is implemented using two dedicated servers, Windows and UNIX, connected together with a TCP/IP network. The Windows server is used to run a web server and typical office and presentation tools. Moreover, it has hardware and software interfaces for speech and gesture recognition, RFID tag-based physical tangible objects (PTO) and wireless network connections (WLAN). The UNIX server functions as a platform for the Virtual Room Service Manager (VRSM) server, CAD design and presentation software, the pen tracker interface, and the projection wall controller. The VRSM is used to manage and synchronise interaction events coming from different modalities according to contextual information about the currently active software and control devices. In detail, interaction with the system is permitted through:

Speech input and output. Through this modality, the users can manage complex applications by simply talking with the system using a wireless microphone. Speech recognition ranges from basic commands, such as starting a presentation or selecting from a menu, to much more complex interaction with the system, such as changing the viewpoint of a model. However, the commands related to 3D model editing were not supported. System responses were generated using a text-to-speech engine to provide feedback and help in conflicting situations. The speech recognition solution was developed by Knowledge.

Gesture input allows the association of user-definable gestures with specific corresponding commands (such as move, rotate, zoom in, zoom out), enabling the users to interact with the applications in a simple and natural way. Gesture recognition could be used for selecting items from a menu, changing the point of view, or moving and rotating a model in the virtual space. The gesture control set was limited to simple basic commands and more complex CAD design was not supported. The gesture recognition system was developed by VTT Electronics and is described in more detail in the following section.
Fig. 4 Conceptual overview of Smart Design Studio prototype


Physical tangible object (PTO). RFID-tag technology enables the association of digital information with physical objects. The system allows users of the Smart Design Studio to start applications and select designs by simply placing a physical object on a table. Italdesign-Giugiaro and Philips have implemented a solution to present design sketches and documents to their customers by attaching the related data objects to RFID cards. Using a tagged card it is possible to launch different applications and visualise different data, such as images and office documents. The RFID receiver has two antennae: one for manipulating data content, such as pictures and design models, and the other for application selection, such as text editors or image viewers. The system has been implemented using flat 2D RFID cards carrying the name of the application or a photo of the respective car model. The latter could just as easily be implemented using a small 3D model of the car. Tangible object recognition offers the advantage of instant access to certain functions, eliminating the need to search through different application menus. It is also very flexible and not restricted to the number of buttons that fit on a remote control. The main advantage, however, is that it enables users to interact with the system using one of their most basic and intuitive skills: handling physical objects. The same technology could be used in access control and presentation environment personalisation, so that, for example, when the designer enters the room the lighting level is adjusted and the appropriate presentation data is loaded.

PDA and tablet PC. These devices can wirelessly control the Smart Design Studio via a dedicated and re-configurable browser interface. The navigation is performed using either a touch-sensitive display or a wireless mouse and keyboard. The same interface can also be used with a laptop or workstation PC equipped with a web browser. The control set was limited and supported only 3D navigation and presentation-related commands.

IntelliPen is used as a physical pointer device over the large projection screen. It acts as a mouse equivalent allowing direct interaction with CAD applications on a 1:1 scale, thus giving new opportunities for the stylists and engineers to interact with the system. The IntelliPen can control all functions that are controllable by mouse, making it the most versatile control device in the Smart Design Studio. High-precision tracking is based on two laser scanners, giving accuracy high enough to let the designers work on the screen in the same way that they would with normal input devices. The IntelliPen was developed by Barco for the prototype.

Projection wall provides the visual output for the Smart Design Studio. It has a resolution of 3,200 × 1,120 pixels and the size of the screen is 6.2 × 2.2 m. In the initial system set-up, different input commands are bound to certain functions of the controllable

applications by using a specific communication protocol. The protocol defines the applications and the specific commands that are available. There is also a set of global commands which are available to the user in any context (for example, VRSM_help will activate the context-related help). Speech, gestures and physical tangible objects require the user to explicitly define and personalise the commands and desired control outputs, while the PDA and tablet PC have a special browser interface. The IntelliPen is the only device that does not need any configuration, except for initial calibration, since it works as a mouse equivalent. Because all the information to be presented is handled by the UNIX server, the VRSM is constantly aware of which application is in use and currently visible, thus making it possible to ignore commands that are not reasonable for that context; for example, the gesture command 'Rotate clockwise' is ignored in the slide presentation mode. However, this kind of inconsistency is uncommon in a normal presentation situation, where the show is controlled by one person familiar with the presentation agenda. The system enables true multimodal interaction by providing the possibility of making the same control commands in many different ways. For instance, the 'move down' command can be provided by gesture, voice, IntelliPen or PDA, and the 'load' command can be provided by voice or a physical tangible object.
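The context filtering described above can be pictured with a small sketch; the command names, application names and the send callback are hypothetical and do not reflect the actual VRSM protocol.

    GLOBAL_COMMANDS = {"VRSM_help"}               # available in any context

    SUPPORTED_COMMANDS = {                        # hypothetical per-application command sets
        "slideshow":  {"next slide", "previous slide", "start presentation"},
        "cad_viewer": {"rotate clockwise", "rotate counterclockwise",
                       "zoom in", "zoom out", "move down", "load"},
    }

    def dispatch(command, active_application, send):
        """Forward a command only if it is global or makes sense for the currently
        active application; otherwise ignore it, as VRSM does for e.g. a
        'rotate clockwise' gesture in slide presentation mode."""
        allowed = SUPPORTED_COMMANDS.get(active_application, set())
        if command in GLOBAL_COMMANDS or command in allowed:
            send(active_application, command)
            return True
        return False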

9 Gesture recognition system as part of the Smart Design Studio


VTT Electronics provided the personalisable gesture modality to be used for control and navigation within different Smart Design Studio applications. In the training phase, users were free to associate different activation and navigation functions with their personal gestures. This enabled the users to present a slideshow or select different viewpoints of the CAD software. Gesture input was collected using SoapBox, which uses acceleration sensors to measure both dynamic acceleration (e.g. motion of the box) and static acceleration (e.g. tilt of the box). The acceleration was measured in three dimensions and sampled at a rate of 46 Hz. The measured signal values were wirelessly transmitted from the handheld SoapBox to a receiver SoapBox that was connected to the Windows server with a serial connection. The gesture start and end were marked by pushing the button on the SoapBox at the start of the gesture and releasing it at the end, which then activated either the training or the recognition algorithm. This may introduce some extra artefacts into the actual gesture data, such as short still parts at the start or end of the actual gesture, which could be filtered out to improve the recognition. The ideal solution would be device operation without the use of buttons, which could be technically compared to continuous speech recognition with always-open microphones. In both cases the input data is continuously monitored, and finding the


actual data for recognition from the background noise, i.e. additional user movement in the case of gesture recognition, is difficult and computationally heavy. Moreover, continuous recognition might produce unintentional recognitions, which may disturb the user. All signal processing and pattern recognition (HMM) software ran on the Windows server. Recognition results could be mapped to different control commands, which were transmitted to the VRSM using TCP/IP socket communication. The mapping between gestures and output functions was done in the training phase by naming the gestures using specific command names, e.g. 'Model zoom in', for each gesture. Moreover, the set-up also supported two kinds of continuous control (the measure and control category in Table 1) by utilising the tilting angle of the control device and rotation (bearing) detected by an electronic compass. Three different operation modes (discrete gesture commands, tilting and rotation) could be selected with two buttons on top of the SoapBox. These modes could be utilised for zooming or rotating virtual models. This kind of continuous control was not originally supported in the communication protocol of the VRSM, but during the preliminary user tests it was found to be a very natural way of manipulating certain views of the software and was included in the protocol. Because the electronic compass is very sensitive to nearby metallic and magnetic objects, a separate calibration program was provided to filter out these errors in the signal.
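The button-delimited capture and command forwarding can be summarised in the following sketch, which reuses the recognise() helper from the earlier pipeline sketch. The sample format, host, port and newline-terminated wire format are assumptions made for illustration only.

    import socket

    def run_gesture_loop(sample_stream, codebook, models, gesture_to_command,
                         vrsm_host="localhost", vrsm_port=5000):
        """Accumulate ~46 Hz acceleration samples while the SoapBox button is held,
        recognise the gesture on release, map it to the command name given at
        training time (e.g. 'Model zoom in') and send it to the VRSM over TCP."""
        buffer, pressed = [], False
        with socket.create_connection((vrsm_host, vrsm_port)) as sock:
            for ax, ay, az, button in sample_stream:
                if button:                          # button held: record the gesture
                    pressed = True
                    buffer.append((ax, ay, az))
                elif pressed:                       # button released: recognise and send
                    name = recognise(buffer, codebook, models)
                    command = gesture_to_command.get(name)
                    if command:
                        sock.sendall((command + "\n").encode("utf-8"))
                    buffer, pressed = [], False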


10 Comparison of modalities
The usefulness of the Smart Design Studio prototype was evaluated with user tests performed in the real environment. The test session started with a brief demonstration and an introduction to the new control modalities of the Smart Design Studio. The usage scenario consisted of a typical presentation session simulation. Each user performed the tasks of a typical design session, closely following the designers' typical work methodologies in classic design sessions and meetings. During the test sessions, users were encouraged to give free comments on the operation and functionality of the control devices. After the test, the users were interviewed by the test observers. There was no specific format for the interviews since the questions were asked informally, depending on the users' professional background and their behaviour during the test. However, every user was asked to answer the following open questions:

What was the most feasible control modality for a certain Smart Design Studio application function?
Overall impression of the control modalities (like/dislike)
Did the new Smart Design Studio environment, with new modalities, improve the interaction compared to the previous VRC environment?

In addition, more specific questions were asked to get detailed feedback on the gesture recognition system. The following items were queried from the users:

Should gesture recognition be user-specific or user-independent?
Is the personalisation phase (gesture training) too complicated?
Usefulness of the different interaction modes (gesture, tilting, rotation)
Ergonomics of the device

In order to cover all the Italdesign-Giugiaro Virtual Reality Centre user types, the following user groups were selected:

Young designers and computer-aided styling (CAS) operators from the Styling Department
Senior designers from the Styling Department
Project managers, engineers and digital mock-up (DMU) operators

Fig. 5 a) SoapBox gesture device and b) Gesture-based interaction with the Smart Design Studio

Table 5 Percentage of user types preferring certain control modalities

Users | Speech interaction | Physical tangible objects | Gesture interaction | PDA | IntelliPen
Senior designers | 12 | 15 | 8 | 10 | 55
Young designers | 11 | 15 | 12 | 14 | 48
CAS operators | 11.5 | 14.5 | 16 | 7 | 51
Engineers / DMU operators | 7 | 21 | 23 | 14 | 35

Table 6 Percentage of users preferring certain control modalities for controlling certain applications

Applications | Speech interaction | Physical tangible objects | Gesture interaction | PDA | IntelliPen
Opticore | 17 | 13 | 43 | 22 | 5
Catia V5 | 15 | 23 | 15 | 0 | 47
IcemSurf | 14 | 20 | 13 | 0 | 53
Sketch presentation | 30 | 35 | 0 | 35 | 0
Slideshow | 40 | 40 | 0 | 20 | 0

The users' levels of knowledge of the computer-aided tools varied a great deal; some of them use the computer as their main working instrument while others consider it just a support for their job. The average age of the test subjects was 37 and their educational backgrounds were in economics, telecommunications and electrical engineering, or industrial design. The total number of test subjects was 15 and all were male. Figure 5 shows the SoapBox gesture control device and an industrial designer interacting with CAD presentation software by using gestures.

11 Test results
User interviews and comments revealed successful aspects of the Smart Design Studio prototype, as well as targets for further development. Overall, on the positive side, the following aspects were identified: the new opportunity to interact with the system by using a combination of different modalities for different tasks significantly improves the productivity of the work and the usability of the virtual reality centre environment (i.e. the Smart Design Studio). The new modality options were found natural and intuitive to use. Also the operation speed of the system and the possibility of interacting directly with virtual models on the large screen were appreciated.

Switching between modalities was easy and the system configuration phase did not require too much effort. When comparing modalities between different user groups, the most popular control modality in each group was the IntelliPen, as presented in Table 5. Because the IntelliPen acts like a mouse pointer, it provides a natural and instant way of interacting with the system with little or no training. One reason for the popularity of the IntelliPen was that all the functions could be controlled with it. The other modalities, e.g. physical tangible objects or gesture control, could only be used for controlling a small subset of functions in the system. However, when using the IntelliPen for program or menu navigation, some functions may not be so feasible; for example, the menu bar may open at the other end of the projection wall, requiring extra walking to make a selection. In these cases, the users preferred to use additional modalities that provided direct shortcuts to certain menu functions, such as 'move model left', 'rotate model' or 'activate help'. Table 5 also shows that different user groups performed different tasks in the workspace, which affected the preferred control modalities; the designers and computer-aided styling engineers preferred tools that enabled design tasks, while the engineers and DMU operators liked to use the more presentation-oriented modalities.

Table 7 Gestures used in the Smart Design Studio experiments


11.1 IntelliPen

The IntelliPen was found especially useful in the tasks requiring direct manipulation of virtual models. This is clearly seen in Table 6, where the modalities are evaluated in terms of the application used. Catia V5 and IcemSurf, which are applications used in 3D model design, styling and rendering, require active use of a pointer device. In addition, the users appreciated the possibility of using the IntelliPen in direct interaction with the design model on a 1:1 scale, which is often impossible when working with a mouse-controlled workstation. Despite the fact that the IntelliPen provided the possibility to access every function of the system, it was not preferred in the presentation tasks, since other modalities provided direct shortcuts to certain actions and thus more natural interaction.

11.2 Gesture interaction

The test results show that the users preferred gesture interaction when using Opticore, which is a special software package for 3D model visualisation. The gestures were used in 3D navigation in fly mode, i.e. moving the model in different directions by making simple spatial up, down, left and right gestures in the x-y plane (Table 7). A similar fly mode can also be found in the Catia V5 and IcemSurf applications, and it was commonly used during the tests. During the system integration phase, the functionality of the gesture recogniser was tested with the above-mentioned navigation gestures plus push and pull gestures along the z-axis for zooming (Table 7). It was found that the operation of gesture command-based zooming was too stepwise and that continuous control would have been better. The continuous tilting and rotation features were implemented and integrated into the Smart Design Studio before the final user evaluation. Positive feedback proved that the modification was justified. Especially when using the rotation feature, some users reported that they had the sensation of having the model in their hands. The gestures, tilting and rotation were easy to use, and the users appreciated the possibility of interacting with the model without the spatial constraints of a mouse and keyboard. Gesture recognition accuracy was good in the tests and the system was able to recognise both small and large-scale gestures performed at different speeds. However, there were some problems with the sensitivity of the rotation. The electronic compass inside the sensor box is sensitive to magnetic fields in the environment, which may change when moving around the Smart Design Studio. During the interviews, there were comments that it would be nice if the system included a user-independent pre-trained library of typical navigation gestures enabling instant usage, and a separate training program for personalisation of the library. However, the gesture models trained for one person were not accurate

when used by a person other than the trainer. Most of the recognition errors resulted from varying button usage and sensor box tilting differences between the user who trained the gestures and the test user. Because of the small size of the sensor box, different users easily tilted the box in their hand, especially during the gesture movement. This is not a problem if the sensor box is tilted in the same way it was tilted in the training phase, but if there are differences in tilting between training and recognition, the acceleration signal axes are shifted and the recognition rate decreases. The tests showed that the system is not so sensitive to moderate speed and scale differences in the gesture movement. However, the initial results suggest that the gesture recognition system is able to recognise gestures user-independently if the gesture set is trained by a group of users (five or more) and the tilting angle variation is filtered out. We will report the results for user-independent recognition in a later publication. Moreover, user comments revealed specific gesture interface issues that need to be addressed in the future:

How to undo operations
How to correct recognition errors
How to identify which state the controllable device is in
When controlling different devices with the same gestures, how to select the device under control

Furthermore, it was found that some sort of feedback is essential, especially in the error cases when nothing happens. This is confirmed by the literature [10]. Overall, the user comments stated that the operation speed of the gesture recognition system was fast and that it provided a natural way of interacting with 3D visualisation software. However, concerning ergonomics, because of the small size of the sensor box the buttons were too close to each other for some users and thus a little cumbersome to use. Users proposed that the casing of the sensor box should be redesigned to make it more comfortable, and that there should be a selection switch for the discrete and continuous gesture control modes.

11.3 PDA, physical tangible objects (PTO) and speech interaction

The PDA was found most useful in visualisation (Opticore) and presentation control tasks. Since the HTTP-based control required the implementation of an application-specific interface for each application, complicated design and modelling tasks were left out due to the limited input and output capabilities of the PDA. The tagged cards (PTO) were widely used in the presentation tasks, especially in the slideshow and sketch applications, the latter being a thumbnail-based tool for image viewing. Furthermore, positive feedback was given on the option to launch the different applications and load the design models simply by throwing an appropriate data object (card) onto the (antenna) table. Overall, the


physical aspect of controlling the data and applications was found natural and comfortable. Another modality valued in the slideshow presentation was speech control. Typical speech commands were 'load model <model_no>', 'go to next slide', 'go to previous slide' or 'go to slide <slide_no>'. Similar commands were also used for image navigation inside the sketch presentation application. Both slides and images were easily navigable using speech or tagged cards. Speech input was found practical in some special functions, such as loading models and setting viewpoints of the design and visualisation software. In addition, the user comments revealed that the dialogue of the speech recognition system was practical, providing help and asking questions in conflicting situations; for example, if the user requested 'Please rotate the model', the system would generate the prompt 'Rotate model, left or right?'.

12 Discussion
The use of accelerometer-based gesture recognition as a complementary modality for a multimodal system was analysed with two user studies. The first study examined the suitability and type of gestures for controlling selected home appliances and the Smart Design Studio. During the study it was found that, with a small subset of users, a request to fill in the same questionnaire again produced different gestures for certain tasks after a few days. Finding the suitable gestures for a certain task seems to require iteration. At first, the users may propose complex gestures, which are easy to understand and relate to a function, e.g. an R-shaped gesture for VCR record (Table 2). Later, the users may discover straightforward gestures that demand less user effort than complex ones, e.g. move gestures in the design environment. Moreover, for certain tasks there is no similarity between the proposed gestures (Fig. 1). Overall, the results of the study indicated that people prefer to define personal gestures. The gesture personalisation capability was evaluated with a recognition accuracy test, which validated that gesture training and accurate recognition are feasible. The usefulness of the integrated gesture recognition system was evaluated with real users in the Smart Design Studio with a multimodal interface. The comparison of modalities suggested that the preferred interaction method is task-dependent. However, the preference for modalities varied a lot between test participants. Some users preferred using multiple modalities for the same task, and some preferred using just one. Different modalities had different numbers of possible controllable functions. The IntelliPen had the widest scope of functions. For speech, gestures, stylus and tangible objects, the number of reasonable control functions was lower, excluding e.g. drawing functions. Despite the limited set of functions for these modalities, they were still preferred for some

tasks. All the modalities had the highest preference percentage for at least one task, suggesting that multiple modalities improve the interaction with the system. The results show that users tried to select the most natural control modality available for a given task. Tasks that did not require direct manipulation of screen objects, such as CAD model editing, were typically controlled by modalities that provided command-based discrete control, i.e. speech, gestures, stylus and tangible objects. In addition, these modalities also offered the freedom to move around during presentations and to control the task remotely, even from the other side of the room, while direct manipulation required the user to stand right in front of the screen, often blocking the view of the audience. The test results were based on multiple test sessions of a few hours' duration over a short period of time. The results indicated that interaction can be improved by using different modalities for different tasks. However, user preferences might change during long-term usage, and more extensive tests are required to validate that the results are not based on the novelty effect alone. Concerning gesture interaction, user feedback included positive comments about the fast operational speed of the gesture recognition system and the naturalness of the interaction in the navigation tasks. The users experimented with and tested different gestures for controlling the Smart Design Studio applications. Interestingly, after freely experimenting with gestures in user groups in the test environment, a consensus was found for a common gesture set for 3D navigation. For this set (design model move commands), a clear spatial association exists. The discovery of a common gesture set reflected user feedback, which suggested that the system should include a user-independent pre-trained library of typical navigation gestures enabling instant usage. Nevertheless, a separate training program was still desired for the personalisation (adding and modifying) of the gestures. Discovering a common gesture set differs from the questionnaire findings, where consensus was missing. Finding a common gesture set seems to depend on the task. When the chosen set of gestures is intuitive enough for the given task, as for the movements in 3D navigation, the users can easily adopt the gestures defined by others. For gesture commands having a natural spatial association with the navigation task, a common set agreed by a group of users can be found, but this requires real experimentation with a real system. The contradictory questionnaire results show that it is difficult to imagine beforehand what gesture would be useful and practical for a certain task without interacting with the real application. Gestures are a rather new control method and only a few users have previous experience of how to interact with computer systems using gestures. Hence, concrete hands-on experimentation is required for any task to find the best gestures agreed among a group of users.


13 Conclusions and future work


Accelerometer-based gesture recognition was studied as an emerging interaction modality, providing new possibilities to interact with mobile devices, consumer electronics, etc. A user questionnaire was circulated to examine the suitability and type of gestures for controlling a design environment and selected home appliances. The results indicated that people prefer to define personal gestures, implying that the gestures should be freely trainable. An experiment evaluating gesture training and recognition, based on signals from 3D accelerometers and machine learning methods, was conducted. The results validated that gesture training and accurate recognition are feasible in practice. The usefulness of the developed gesture recognition system was evaluated in a user study with a Smart Design Studio prototype that had a multimodal interface. The results indicated that gesture commands were natural, especially for simple commands with a spatial association. Furthermore, for this type of gesture a common set, agreed upon by a group of users, can be found, but this requires hands-on experimentation with real applications. Test results were based on multiple test sessions over a short period of time; in the future, more extensive sessions are required to obtain more detailed results on the long-term usefulness of the system.

Sensor-based gesture control brings some advantages compared with more traditional modalities. Gestures require no eye focus on the interface, and they are silent. For some tasks, gesture control can be natural and quick. However, many targets remain for future work. The results of user-dependent recognition should be extended to user-independent recognition, as is the goal in speech recognition. The use of buttons should be eliminated, and the tilt of the device should be filtered out. A long-term goal is continuous gesture recognition. In addition, the gesture interface should give feedback to the user by means of vibration or audio. One of the challenges is selecting a controllable device from the environment, in order to enable the use of different devices with the same gestures. Furthermore, in this study mobility refers to the mobility of the device, such as a mobile phone or a separate control device carried by the person. Testing the recognition accuracy while the user is moving requires further work; such tests should cover the most common forms of movement during which gesture control might be used. However, early experiments have shown that the recogniser maintains its performance while the user is walking. An important topic is practical usability; more user studies are needed to empirically evaluate and develop the gesture modality for a variety of interaction tasks in multimodal systems.
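As one possible reading of the "tilt should be filtered out" item above, the short sketch below removes a slowly varying gravity estimate from raw accelerometer samples so that recognition sees mainly the dynamic acceleration of the gesture. This is an assumed, generic technique rather than part of the reported system, and the function name and smoothing factor ALPHA are illustrative choices.

```python
# Hedged sketch (not from the paper): estimate the slowly varying gravity/tilt
# component with a simple exponential low-pass filter and subtract it, keeping
# the dynamic part of the motion for the recogniser. ALPHA is an assumed value.
import numpy as np

ALPHA = 0.1   # low-pass smoothing factor; smaller -> slower gravity estimate

def remove_gravity(samples):
    """Subtract an exponentially smoothed gravity estimate from raw (n, 3) accelerometer data."""
    samples = np.asarray(samples, dtype=float)
    gravity = samples[0].copy()            # initialise with the first sample
    dynamic = np.empty_like(samples)
    for i, s in enumerate(samples):
        gravity = (1.0 - ALPHA) * gravity + ALPHA * s   # track the static (tilt) component
        dynamic[i] = s - gravity                        # keep the gesture's dynamic part
    return dynamic

# Example: dynamic = remove_gravity(raw_samples) before feeding the recogniser.
```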
Acknowledgements We gratefully acknowledge research funding from the National Technology Agency of Finland (Tekes) and the Italian Ministry of Education, University and Research (MIUR). We would also like to thank our partners in the ITEA Ambience project.
