
1 Introduction

1.1 Overview
1.1.1 Motivation
Natural language abounds with descriptions of motion. This is hardly surprising, since our environment teems with slithering, swimming, flying, and cruising creatures that navigate in a world with natural elements that can spin, flow, slide, whirl, etc. Our experience of our own motion, and our perception of motion in the world, have together given human languages substantial means to verbally express many different aspects of movement, including its temporal circumstances, its spatial trajectory, and its manner. In every language on earth, verbalizations of motion can specify changes in the spatial position of an object over time. In addition to when and where the motion takes place, languages also characterize how it takes place: its path, its manner, how it was caused, etc. The path of motion, in particular, involves conceptualizations of the various spatial relationships that an object can have to other objects in the space it moves in. Physicists and philosophers have long theorized about the nature of space and spatial relationships. Newton (1995) believed that space has an existence independent of physical objects, an absolute space that will "remain always similar and immovable" (Newton 1995, Scholium 3). Objects, in his account, occupy places that are part of absolute space, which affords a universal coordinate system in which objects and their relationships can be characterized in terms of Euclidean geometry. This sort of model of space underlies most of the classical, pre-relativistic analyses of motion in physics. The conception of space found in natural languages is quite different. As we shall see, it allows for positioning objects in terms of coordinate systems, but does not have a built-in universal, absolute coordinate system that allows for precise specification of object positions. (Of course, languages can in many cases specify relatively precise positions by importing absolute coordinate systems.)
Typically, a figure object is expressed as being in a particular orientation (left, east, under, etc.) with respect to another reference or ground object and possibly a third object, the viewer (Levinson 2003). A figure object can also be positioned in terms of topological relations (inside, separate from, etc.) along with distance from a ground object.
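Topological relations of this kind are commonly formalized in qualitative spatial reasoning by the eight base relations of the Region Connection Calculus (RCC-8): disconnected (DC), externally connected (EC), partial overlap (PO), equal (EQ), tangential and non-tangential proper part (TPP, NTPP), and their inverses (TPPi, NTPPi). The sketch below is only an illustration under a strong simplifying assumption that the text does not make: every region is a closed, axis-aligned rectangle, so that connection and parthood reduce to comparisons of coordinates. The names Rect and rcc8 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Closed axis-aligned rectangle [x1, x2] x [y1, y2]; assumes x1 < x2 and y1 < y2."""
    x1: float
    y1: float
    x2: float
    y2: float


def _closures_meet(a: Rect, b: Rect) -> bool:
    # The closed rectangles share at least one point, possibly only on their boundaries.
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def _interiors_meet(a: Rect, b: Rect) -> bool:
    # The open interiors overlap, i.e. the shared area is strictly positive.
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2


def _part_of(a: Rect, b: Rect) -> bool:
    # Every point of a is also a point of b.
    return b.x1 <= a.x1 and a.x2 <= b.x2 and b.y1 <= a.y1 and a.y2 <= b.y2


def _interior_part_of(a: Rect, b: Rect) -> bool:
    # a lies inside b without touching b's boundary.
    return b.x1 < a.x1 and a.x2 < b.x2 and b.y1 < a.y1 and a.y2 < b.y2


def rcc8(figure: Rect, ground: Rect) -> str:
    """Return the RCC-8 base relation of the figure region to the ground region."""
    if not _closures_meet(figure, ground):
        return "DC"                      # disconnected ("separate from")
    if not _interiors_meet(figure, ground):
        return "EC"                      # touching at the boundary only
    if figure == ground:
        return "EQ"
    if _part_of(figure, ground):
        return "NTPP" if _interior_part_of(figure, ground) else "TPP"
    if _part_of(ground, figure):
        return "NTPPi" if _interior_part_of(ground, figure) else "TPPi"
    return "PO"                          # the regions partially overlap


room = Rect(0, 0, 10, 10)
print(rcc8(Rect(2, 2, 4, 4), room))      # NTPP: the desk is well inside the room
print(rcc8(Rect(10, 0, 12, 5), room))    # EC: the porch touches the room's wall
```

Under this encoding, inside corresponds roughly to TPP or NTPP and separate from to DC; real figure and ground objects, of course, rarely have such regular shapes.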

Interpreting Motion

When objects are positioned without a reference object, the descriptions can indicate paths in a coordinate system (to the east or seaward). Space in language, at least as revealed by the use of closed-class terms for topology and orientation, seems to be parasitic on objects and the relations between them, and can be broadly described as incorporating a relational view of space.1

This book articulates a new computational linguistics approach to understanding natural language descriptions of motion. Our goals are theoretical as well as pragmatic. From a theoretical standpoint, we aim to provide a semantic theory of motion expressions that can be used for computation. This sort of theory involves mapping motion descriptions in natural language to formal representations that computers can automatically reason with. As we shall see, such reasoning uses qualitative models of space and time, making inferences about changes in the positions of objects over time. From an empirical standpoint, we want our theory to mesh well with natural language data, and so we allow our computational methods to draw on information found in text corpora. The ability to create computer programs that can automatically process large corpora containing descriptions of motion has an important practical consequence: it allows us to map from texts to data representations that can be of immense value in everyday life. For example, a system could take a set of verbal directions for getting to a particular place and automatically transform it into a map with trajectories marked on it. Narratives of journeys taken today and long ago could be parsed into logs that record where, when, and how the various segments of the journey were carried out. Documents involving media such as pictures and videos that have associated linguistic annotations can be analyzed so as to retrieve spatial, temporal, and motion-related information from collections of such media on the Web.
In this chapter, we will first discuss the challenges in linguistic analysis and inference that are faced by such systems. After outlining our technical approach, we highlight two key insights that inform our work. The challenges and our approach give rise to a set of requirements that, in our view, have to be met in order to achieve success; this constitutes a short list of desiderata. Last but not least, all research builds on the labor of others; to situate our work and convince the reader that we have something interesting and plausible to say, we compare and contrast our work with previous research in linguistics on spatial prepositions and motion verbs.

1 The natural language-derived relational view of space that we have sketched is often viewed as being in conformity with Leibniz's philosophy of space. Leibniz denies the reality of an absolute space "out there", arguing that space is a mental construct arising from an ordering of physical objects (like time, which he views as a mental construct arising from an ordering of events). Specifically, an object's physical location is determined by its relation to that of fixed (what we might call ground) objects: "Particularly, that place is that, which is the same in different moments to different existent things, when their relations of co-existence with certain other existentes, which are supposed to continue fixed from one of those moments to the other, agree entirely together. [ . . . ] Lastly, space is that which results from places taken together" (Clarke 1717, p. 199; my elisions indicated by [ . . . ]). Leibniz's places are thus defined in terms of relations between objects, similar to the situation revealed in natural language usage. However, natural language and its analysis have nothing to say about the metaphysical question of whether space exists or is a mental construct.

1.1.2 Challenges
In order to interpret motion expressions in natural language, each sentence first has to be parsed, along with morphological analysis. Once a syntactic structure is arrived at and disambiguated from among alternative parses, the predicates and their semantic arguments have to be identified, with the latter classified in terms of their semantic roles (the agent of the event, the theme, the manner and path of motion, etc.). To carry this out, the system must have knowledge of the morphology and syntax of the language, as well as of the mapping between the semantic arguments of different lexical predicates on the one hand, and on the other, the syntactic constituents (arguments) these predicates can combine with (i.e., subcategorize for) as well as additional phrases (adjuncts) that co-occur with them in the sentence. This sort of information is usually represented in a lexicon for the particular language. In addition, the events must be anchored to the times at which they are purported to occur. For example, in the sentence The Princess of Wales arrived at a Christmas concert last night, the syntactic subject The Princess of Wales has to be identified as the Theme of the predicate arrive, at a Christmas concert as its Goal, and last night as its Time. Further, last night must be pegged to a time on the night before the speech time (which could of course be on the same day as the speech time). Here tense has to be recognized. Some languages (like the Bantu language ChiBemba) have several past and future tenses; some, like Mandarin Chinese, do not have grammatical tense; and still others, like Burmese, distinguish only between ongoing or past events and others.
These apparent linguistic peculiarities (which are in fact entirely normal for the speakers of those languages) have to be taken into account, along with context, to situate the event with respect to the speech time. Events also have to be ordered with respect to each other, which can be non-trivial when events are narrated in an order different from that of their occurrence. The results of these inferences have to be represented in terms of an inventory of temporal relations drawn from some calculus that deals with orderings in time. Time expressions must also be resolved, to calendar times where possible. These inferential tasks can be fairly challenging for computational approaches, because most narratives will not explicitly date each event, and when time and date expressions are used, they may be anaphoric, i.e., relative to times introduced earlier in the discourse (as in arrived on Tuesday). Further, the inventory of temporal relations in the calculus used must be expressive enough to capture the distinctions between temporal relations found in any natural language, and it is also desirable to be able to carry out efficient computations using the calculus. This reflects an important desideratum: the semantic representations need to be expressive enough for natural languages, but must also be amenable to inference methods that can be used in practical systems.

Turning to spatial information, spatial references in the form of place names (toponyms) mentioned in text must be identified and, when geographic in nature, resolved to particular entities such as countries, mountain ranges, cities, etc., and, when construed as points, resolved to geo-coordinates where possible. This resolution process can involve considerable disambiguation, as humans naturally tend to reuse names when naming places as well as other entities. Spatial relationships involving topological, orientation, and distance relations between places must be recognized. This too can be challenging, due in part to the ambiguity of prepositions and adverbials. The unraveling of directions, in particular, can be notoriously difficult, as any driver navigating from others' helpful verbal directions can attest. In addition, some languages have fairly elaborate inventories of closed-class terms for representing spatial relations. For example, Talmy (2000) cites the (now extinct) Californian language Atsugewi, which has a set of suffixes appearing on the verb that mark some 50 distinctions of Ground geometries and the paths that relate to them. "Some dozen of these suffixes represent distinctions covered by the English preposition into, which does not itself reflect such finer subdivisions" (ibid., p. 192). As with time, these spatial relations must be represented in terms of some calculus that characterizes orderings in space. Such a calculus must, of course, also satisfy the desideratum above. The above inferences are just prerequisites for interpreting motion expressions.
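The inventory of temporal relations mentioned above can be illustrated with the thirteen interval relations of Allen's interval algebra (Allen 1984 is among the temporal logics cited later in this chapter). The Python sketch below assumes, purely for illustration, that both intervals have known numeric endpoints; in narrative text such endpoints are usually unknown, which is exactly why reasoning is carried out over the relations themselves. The function name and the timeline values are hypothetical.

```python
def allen_relation(a: tuple[float, float], b: tuple[float, float]) -> str:
    """Return the Allen relation that holds from interval a to interval b.

    Intervals are (start, end) pairs with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if not (a1 < a2 and b1 < b2):
        raise ValueError("each interval needs start < end")
    # Disjoint or abutting intervals.
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    # From here on the intervals share a stretch of time.
    if (a1, a2) == (b1, b2):
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"


# Hours on a hypothetical timeline: the arrival falls within "last night".
last_night = (18.0, 24.0)
arrival = (19.5, 20.0)
print(allen_relation(arrival, last_night))   # during
```

Because the thirteen relations are mutually exclusive and jointly exhaustive, exactly one label is returned for any pair of proper intervals.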
Once the events are anchored to times, and the objects participating in the events are located with respect to other objects in terms of spatial relations, motion events have to be analyzed. In particular, information from the lexicon such as the class of the motion verb must be brought to bear on the analysis; for example, run is a manner-of-motion verb, while arrive is a path verb. This will allow the system to characterize motion events in terms of the event or situation involved in the change of location, the object that is undergoing movement (the figure), the region (or path) traversed through the motion, a distinguished point or region of the path (the ground), the manner in which the change of location is carried out, and the medium through which the motion takes place. Once the motion is grounded in this way by linguistic analysis, qualitative reasoning tools must operate on the underlying representation, allowing inferences to be made. Maps and other visualizations that track the movements of entities may also be generated from the representation.

1.1.3 Approach
These requirements present a set of formidable problems for automatic interpretation of motion expressions in language. However, writing in the second decade of the 21st century, we believe computational approaches have started to address these challenges. The goal of our book is to flesh out a computational approach, addressing for the first time in a systematic manner the integration of the language of motion with qualitative reasoning. This integration, discussed in Chapters 3 and 4, is evaluated in terms of the desideratum above, highlighting gaps and outstanding problems. We also indicate along the way, in Chapter 5, the performance accuracies of practical systems. Our approach integrates the linguistic conceptualizations with the formal methods, mapping one to the other in the context of natural language processing. Our approach is empirical, driven by instances of language use found in text collections (or corpora), especially the newsletters, travel blogs, route directions, etc., found on the Web. In terms of methodology, these corpora are first annotated by humans with features reflecting the kinds of linguistic distinctions and analyses mentioned above. Computers then mine the annotated corpora to learn automatically how to reproduce the annotations, using a variety of machine-learning tools. These annotations are then mapped to the representations used by the formal models, allowing reasoning to be carried out over motion information captured from natural language. Throughout, the goal of satisfying the above desideratum is addressed to the extent possible. The details of this methodology are described in Chapter 5. The automatic systems that result from training on the annotated data offer both a working embodiment of the theory and the modularity that it defines, as well as practical tools that can interpret motion expressions in language and generate visualizations, including maps and sketches.
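The motion-event components listed in Section 1.1.2 can be pictured as a simple record. The Python sketch below is a hypothetical data structure, not the annotation scheme developed in Chapter 5: the field names, the extra time field, and the six-verb lexicon are illustrative assumptions, and reading the Theme and Goal of the Princess of Wales example as figure and ground is likewise a simplification made here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-lexicon: which motion component each verb itself encodes.
VERB_CLASS = {
    "run": "manner", "bike": "manner", "drive": "manner",
    "arrive": "path", "depart": "path", "enter": "path",
}


@dataclass
class MotionEvent:
    """One motion event, split into the components listed in Section 1.1.2."""
    event: str                       # lemma of the motion verb
    figure: str                      # the object undergoing movement
    path: Optional[str] = None       # region traversed, if expressed
    ground: Optional[str] = None     # distinguished point or region of the path
    manner: Optional[str] = None     # how the change of location is carried out
    medium: Optional[str] = None     # what the motion takes place through
    time: Optional[str] = None       # temporal anchor (an addition for this sketch)

    @property
    def verb_class(self) -> str:
        """'manner' or 'path' according to the toy lexicon, else 'unknown'."""
        return VERB_CLASS.get(self.event, "unknown")


arrival = MotionEvent(
    event="arrive",
    figure="The Princess of Wales",
    ground="a Christmas concert",
    time="last night",
)
print(arrival.verb_class)   # path
```

Leaving unexpressed components as None reflects the fact that a single sentence rarely mentions every component of a motion event.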
From a theoretical standpoint, this methodology allows linguistic theories to be tested empirically, both in terms of the breadth of their applicability when faced with actual language use and in terms of the precise linguistic representation that should result for each example. This test also involves measuring the reliability of humans in terms of the annotations that they produce. In practical terms, the approach results in systems with a text-to-sketch capability that can display tracks on a map of where a moving object has been at particular times. For example, given a biker's travel blog as input, a map with tracks could be generated as output. The resulting systems can be evaluated and compared with each other, stimulating in turn the development of new and better methods. In a nutshell, we offer an integrated perspective on how language structures concepts of motion, and how the world shapes the way in which motion is linguistically expressed. The book's approach is two-pronged: analysis of the details of language use in different contexts (based on the exploitation of linguistic corpora), along with theoretical modeling and formal reasoning (based on qualitative representations). While there has been a great deal of linguistics research on the semantics of motion verbs as well as locative constructions, and considerable research on qualitative spatial reasoning, there has been little interdisciplinary effort to connect these two fields in a systematic way. This is the first book, we believe, to analyze concepts of motion in language while integrating these two fundamental points of view. In the rest of this chapter, we outline two key insights that inform our approach. After discussing our desiderata, to further situate our approach, we differentiate our framework from other work in linguistics, and compare our classifications and semantics for motion with other relevant approaches.
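The text does not say how the reliability of human annotators is measured. One widely used option, assumed here purely for illustration, is Cohen's kappa, which corrects the raw agreement between two annotators for the agreement expected by chance given each annotator's label frequencies. The label set (manner versus path), the data, and the function name are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each annotator assigned labels at random with their own frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both used one and the same label throughout, kappa is undefined
    return (observed - expected) / (1.0 - expected)


annotator_a = ["manner", "path", "path", "manner", "path", "manner"]
annotator_b = ["manner", "path", "manner", "manner", "path", "path"]
print(round(cohens_kappa(annotator_a, annotator_b), 2))   # 0.33
```

Here the annotators agree on four of six items (raw agreement of about 0.67), but since chance agreement is 0.5, kappa is only about 0.33, which shows why chance-corrected measures are preferred over raw agreement.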

1.2 Key insights


1.2.1 Spatial abstractions
One of the key insights from prior research has to do with the types of conceptualization needed to understand spatial language, e.g., Miller and Johnson-Laird (1976), Herskovits (1986), Talmy (1983, 2000), among others. For example, research by Talmy (1983, 2000) has characterized various primitive templates or schemas for representing motion. In a description like (1), a complex spatial scene is abstracted as a geometric point (the figure) moving towards another point (the ground) for a bounded temporal extent. Likewise, a moving object may be described as a point moving along a path that is a line (2), or as a line moving coaxially along the linear path (3).

(1) The ball rolled toward the lamp for 10 seconds.
(2) The ball rolled across the railway bed.
(3) The trickle flowed along the ledge.

The idealization is such that the speaker is able to abstract away from irrelevant details such as the length or orientation of the path, representing each spatial scene using a schema, and the hearer in turn is able to recreate the scenes from the schema.2 Talmy points out that these representations do not rely on Euclidean geometry and the properties of metric spaces, emphasizing instead topological relations that remain invariant irrespective of changes in sizes, distances, and shapes of the objects. He also points out that while the expressions for the geometries of figure objects tend to be limited in variety, the geometries of ground objects, by contrast, are less constrained and vary considerably with the language, including bounded planes (e.g., the bike sped across the field/around the track), cylindrical forms (the bike sped through the tunnel), a wide variety of different types of enclosures (I crawled out the window, I ran in the house), etc.

2 The use of such intuitive geometries raises the question of whether the points being idealized are in fact mathematical points. After all, natural language does not typically construe points in space or time as being dimensionless; instead, they are all conceived as having extent.

A related set of findings has to do with the differences across languages in the way one can specify a figure object as being in a particular orientation (left, east, under, etc.) with respect to another reference or ground object and possibly a third object, the viewer. Studies of speakers across a wide variety of languages have revealed a basic inventory of three types of geometric coordinate systems (frames of reference), whose types are unevenly distributed, along with a variety of idiosyncratic instantiations, across languages (Levinson 2003). The human ability to refer to and pick out objects in space relies on these particular frames of reference. These are discussed in more detail in Chapter 3.

While understanding spatial descriptions appears to rely on interpreting such topological and geometrical relationships, it is important to note that it does not require precise geometries. Humans, after all, communicate successfully by and large without specifying the relatively exact (e.g., GPS) positions of objects and their shapes. We are able to describe and understand fairly elaborate motions without needing to drill down into equations that characterize the physical motions signaled by motion verbs. The use of imprecise and often incomplete qualitative geometric descriptions (instead of quantitative ones such as specifying the coordinates and shapes of every object) allows human communication to be highly efficient. Our communication relies on a rich commonsense model of the world that has proved sufficient for humans to survive and evolve until now. This fact has hardly gone unnoticed in artificial intelligence research. Having an artificial agent reason qualitatively allows for reasoning to be more efficient in some situations, since abstracting away from numerical details allows the agent to focus on more compact representations that isolate just the relevant information needed to solve a particular problem.
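The three frames of reference in Levinson's (2003) typology are usually called intrinsic, relative, and absolute. The Python sketch below shows how a single scene receives different qualitative descriptions depending on the frame, under simplifying assumptions that are not in the text: 2D points, x pointing east and y pointing north, four 90-degree sectors for the sides of an object, and eight 45-degree sectors for compass directions. All function names are hypothetical. Each function returns a coarse label rather than coordinates, in the spirit of the qualitative abstraction just described.

```python
import math

Point = tuple[float, float]  # (x, y): x points east, y points north (an assumption)

CARDINALS = ["east", "northeast", "north", "northwest",
             "west", "southwest", "south", "southeast"]


def _minus(p: Point, q: Point) -> Point:
    return (p[0] - q[0], p[1] - q[1])


def _side(axis: Point, offset: Point) -> str:
    """Classify offset into one of four 90-degree sectors around a forward axis."""
    if offset == (0, 0):
        raise ValueError("figure and ground coincide")
    forward = axis[0] * offset[0] + axis[1] * offset[1]   # dot product
    lateral = axis[0] * offset[1] - axis[1] * offset[0]   # cross product: > 0 means to the left
    if abs(lateral) > abs(forward):
        return "left" if lateral > 0 else "right"
    return "forward" if forward > 0 else "backward"


def absolute(figure: Point, ground: Point) -> str:
    """Absolute frame: compass direction of the figure from the ground."""
    dx, dy = _minus(figure, ground)
    if dx == 0 and dy == 0:
        raise ValueError("figure and ground coincide")
    angle = math.degrees(math.atan2(dy, dx))    # 0 = east, 90 = north
    return CARDINALS[round(angle / 45) % 8]


def intrinsic(figure: Point, ground: Point, ground_front: Point) -> str:
    """Intrinsic frame: uses the ground's own front, e.g. the front of a car."""
    side = _side(ground_front, _minus(figure, ground))
    return {"forward": "in front of", "backward": "behind",
            "left": "left of", "right": "right of"}[side]


def relative(figure: Point, ground: Point, viewer: Point) -> str:
    """Relative frame: the viewer's line of sight to the ground supplies the axes.

    As in English, left and right are the viewer's, while front and back are
    reflected: the side of the ground facing the viewer counts as its front.
    """
    side = _side(_minus(ground, viewer), _minus(figure, ground))
    return {"forward": "behind", "backward": "in front of",
            "left": "left of", "right": "right of"}[side]


tree, viewer, ball = (0.0, 10.0), (0.0, 0.0), (-3.0, 10.0)
print(relative(ball, tree, viewer))   # left of
print(absolute(ball, tree))           # west
```

A tree has no intrinsic front, so only the relative and absolute frames apply to it; for an oriented ground such as a car, intrinsic() can be used instead.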
AI approaches to qualitative reasoning have developed a rich set of geometric primitives for representing time, space (including distance, orientation, and topological relations involving notions such as contact and containment), and, together with those, motion. The results of such research have yielded a wide variety of spatial and temporal reasoning logics and tools. Qualitative Spatial Reasoning has been successfully applied to military sketch maps (Forbus et al. 2003), meteorology (Bailey-Kellogg and Zhao 2004), robot navigation (Moratz and Wallgrün 2003), integration of sensor information for environmental monitoring (Jung and Nittel 2008), etc. In contrast, the primitives specified in the linguistic approaches above are not expressive enough for formal computational reasoning. To address this gap, in Chapter 3, we map the geometric and topological primitives and calculi used in qualitative reasoning in a systematic manner to natural language. Our work thus allows for more formal and expressive models to be constructed for linguistic representations. Our innovations are similar in spirit to Miller and Johnson-Laird (1976) and Johnson-Laird (1977), who argued that understanding of language involves translating a sentence into an executable program. We are thus committed to providing computationally expressive ways of representing motion expressed in natural language, in particular subscribing to the idea that understanding motion in language involves assembling and executing programs. However, the programming framework we use, discussed in Chapter 4, involves precise formal logics developed in computer science, rather than Miller and Johnson-Laird's early and somewhat ad hoc procedural semantics.3 In Section 1.4, we compare our approach to the semantics of motion with several other approaches.

1.2.2 Motion semantics: action- versus location-based predicates
Motion verbs, according to Talmy (1985, 1991, 2000), occur in syntactic constructions that express several semantic components: (i) a Figure object that moves with respect to (ii) a Ground object, along a spatial region, called (iii) the Path. There are also two additional components (called co-events, in keeping with his view that they are construable as distinct events): (iv) the Manner of the movement and (v) the Cause that is responsible for the motion. A further distinction that Talmy makes (one that is largely borne out by crosslinguistic research) is that languages have two distinct strategies for expressing concepts of motion. In satellite-framing, commonly used in English and other Germanic languages, as well as Slavic languages (also called manner-type languages), the main verb conflates (i.e., contains a morpheme that encodes) the manner or cause of motion, while path information is expressed in satellites.4 Here a satellite is "any constituent other than a noun-phrase or prepositional-phrase complement that is in a sister relation to the verb root" (Talmy 2000, p. 102), and includes particles, affixes, etc.5 Thus, in (4a), the language represents the motion as an action (sliding, rolling, or bouncing), with slid/rolled/bounced expressing the manner of the motion, and the path being expressed by the satellite down.6 In contrast, in verb-framing, found in Turkish, Romance, Semitic, and other languages (also called path-type languages), the verb conflates the path, whereas the manner is optionally expressed by adjuncts, as in the Spanish (4b).

3 The procedural semantics of Miller and Johnson-Laird (1976) is based on primitive routines such as finding in a search domain an entity referred to by a natural language description, testing whether the particular properties predicated by the description hold of it, and acting so as to make the description true of the entity.
4 Such manner-of-motion verbs are extremely common in English, as attested by the long list of such verbs in the verb classification of Levin (1993).
5 Talmy (1991) characterized satellites in more detail: "The satellite, which can be either a bound affix or a free word, is thus intended to encompass all of the following grammatical forms, which traditionally have been largely treated independently of each other: English verb particles, German separable and inseparable verb prefixes, Latin or Russian verb prefixes, Chinese verb complements, Caddo incorporated nouns and Atsugewi polysynthetic affixes around the verb root" (Talmy 1991, p. 486).
6 Likewise, in the napkin blew off the table, the verb conflates the Cause of the motion, with the path being expressed by the satellite off. In addition to Manner/Cause and Path conflation, Talmy (1985) points out that verbs can also conflate Figure information, as in the Atsugewi verb root -caq-, which means 'for a slimy lumpish object (e.g., a toad, a cow-dropping) to move/be located'.

(4a) The rock slid/rolled/bounced down the hill.
(4b) La botella entró a la cueva (flotando)
     the bottle moved-in to the cave (floating)
     'The bottle floated into the cave.'

Here the language represents the motion as a change of location. Note that there are exceptions; English has Romance-derived verbs like enter, arrive, ascend, etc. that encode path. As Talmy (1985) points out, the (small number of) verbs in English that conflate Path are mostly Romance borrowings. Now, various scholars including Talmy have recognized that the classes in this classification are not quite disjoint. For example, in languages involving serial verb compounds, like Lahu, Thai, and Mandarin Chinese (Slobin 2004), it is unclear which one is the main verb; and in Native American language families such as Hokan and Penutian, path and manner morphemes together form part of a verb complex, with neither one being classifiable as a main verb or satellite (Delancey 1989). Also, in the Australian language Jaminjung, motion is expressed by one of five core verbs combined with preverbs that encode both path and manner, with neither one being of subordinate status (Schultze-Berndt 2000). All such languages have been designated by Slobin (2004) as belonging to a third category instantiating equipollent-framing, where both manner and path are equally salient. In response, Talmy (2009) has accepted that cases of equipollent framing definitely exist. For example, based on a set of linguistic criteria for what constitutes a main verb, he points out that in the case of Mandarin serial verbs, the verb in the first position is clearly the main verb, while the verb in the second position is sometimes viewed as subordinate and sometimes as a main verb; in the latter case, the construction demonstrates equipollent framing. However, such instances, he shows, are relatively rare. Given this qualified but fundamental linguistic distinction,7 the semantic representations for verbs can involve two classes of logical predicates: action-based predicates (e.g., for manner-of-motion verbs found in satellite-framing patterns, like bike, drive, fly, etc.) and location-based predicates (e.g., for path verbs found in verb-framing patterns, such as arrive, depart, etc.). Action-based predicates do not make reference to distinguished locations, but rather to the assignment and reassignment of locations of the object through the action. Since the location-based predicates focus on points on a path, we view them as making reference to a distinguished location, and the location of the moving object is tested to check its relation to this distinguished value. The predicate semantics makes use of Dynamic Interval Temporal Logic (DITL) from Pustejovsky and Moszkowicz (2011), which in turn blends dynamic logic (Harel 1984) with a first-order linear temporal logic (Allen, 1984; Moszkowski, 1986; Manna and Pnueli, 1995; Kröger and Merz, 2008). DITL is a hybrid, first-order dynamic logic where events are modeled as either dynamic processes or static situations. Here event expressions refer to simple or complex programs, and states refer to preconditions or post-conditions of these programs. Assignment-of-location is modeled as an atomic program, and change-of-location is modeled as a compound program, whose relation is determined compositionally by the relations denoted by its atomic parts. This approach to modeling the semantics of motion is discussed in more depth in Chapter 4.

7 For equipollent languages, our semantic representation will thus have to make use of a combination of action- and location-based predicates.

There are obvious subtypes of action-based predicates, due, for example, to the type of vehicle involved in the motion (bike, drive, etc.). Just as important are aspects of manner defined in terms of topological constraints between the objects throughout the motion. Consider a figure object that is moving with respect to a ground object. Here we can consider four subclasses, based on the orientation of the figure with respect to the ground, whether the topological relation is constant throughout the process of motion, whether it involves all of the figure or only a part thereof, and characteristics of the medium in which the figure moves. Similarly, location-based predicates can be differentiated according to how many formal qualitative dimensions are involved in their definitions. For example, the simplest path is merely an implicit line associated with a distinguished end or start point, as in the case of the topological path verbs arrive, exit, take off, etc. This can be further refined to make reference to orientation or direction, as in the orientation path verbs climb and descend; metric information, as in the topometric verbs approach, near, etc.; or a combination of both, as in the topometric orientation expressions just below or just above.
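The contrast between action-based and location-based predicates can be simulated in a few lines. The Python sketch below is not DITL, which is a formal logic; it only mimics the idea described above under stated assumptions: a state maps each object to an atomic place name, assignment-of-location is an atomic program from states to states, and a change of location is a sequence of such programs whose execution yields a trace of states. All names, including the place names, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class State:
    """A static situation: the current location of each object, as a place name."""
    loc: dict[str, str] = field(default_factory=dict)


Program = Callable[[State], State]


def assign(obj: str, place: str) -> Program:
    """Atomic program: assignment-of-location, loc(obj) := place."""
    def run(state: State) -> State:
        return State({**state.loc, obj: place})
    return run


def execute(start: State, programs: list[Program]) -> list[State]:
    """Compound program: run atomic programs in sequence and keep every state."""
    trace = [start]
    for program in programs:
        trace.append(program(trace[-1]))
    return trace


def moves(trace: list[State], obj: str) -> bool:
    """Action-based reading: the location is reassigned; no distinguished place is consulted."""
    return any(s.loc.get(obj) != t.loc.get(obj) for s, t in zip(trace, trace[1:]))


def arrives(trace: list[State], obj: str, goal: str) -> bool:
    """Location-based reading: not at the goal before, at the goal afterwards."""
    return trace[0].loc.get(obj) != goal and trace[-1].loc.get(obj) == goal


def departs(trace: list[State], obj: str, source: str) -> bool:
    """Location-based reading: at the source before, elsewhere afterwards."""
    return trace[0].loc.get(obj) == source and trace[-1].loc.get(obj) != source


# A hypothetical bike trip: two successive reassignments of the biker's location.
trip = execute(State({"biker": "Paris"}),
               [assign("biker", "Dijon"), assign("biker", "Lyon")])
print(moves(trip, "biker"), arrives(trip, "biker", "Lyon"), departs(trip, "biker", "Paris"))
# True True True
```

The action-based predicate inspects only the successive reassignments, while the location-based predicates test the first and last states against a distinguished location, loosely mirroring the pre- and post-conditions mentioned above.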
In this book, we will examine how these categories and subcategories of motion predicates are expressed through qualitative spatial and temporal models. In Section 1.4, we critically assess, in the light of our approach, prior work on the semantics of spatial prepositions, verb classification, and motion verb semantics.

1.3 Desiderata
The challenges we identified earlier can only be met if we constrain our approach to meet some strict requirements. These have to be borne in mind when we assess any technical approach, both ours and that of other researchers. We list these now, and delve into them further throughout this chapter and book.
1. As mentioned earlier, the semantic representations need to be expressive enough for natural languages, but must also be amenable to inference methods that can be used in practical systems.
2. The semantic theory must be denotational, i.e., provide a mapping in terms of a model of things in the world.
3. The semantic analysis must be compositional, i.e., the meaning of sentences must be built up systematically from the meanings of the constituent phrases and in turn the lexical elements in them, in tandem with the syntactic operations that assemble them.
4. The representations used have to support qualitative reasoning.
5. The systems built must be evaluated to be accurate and efficient enough to support practical applications.
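Desiderata 1 and 4 ask for representations that support efficient qualitative inference. As a minimal illustration, assumed here rather than taken from the text, consider the point algebra, in which any two time points stand in some subset of the relations <, =, >. A standard way to draw inferences in such a calculus is path consistency: repeatedly intersect each relation R(i, j) with the composition of R(i, k) and R(k, j) until nothing changes. The Python sketch below implements this for the point algebra; the function names and event labels are hypothetical.

```python
from itertools import product

ALL = frozenset("<=>")
INVERSE = {"<": ">", "=": "=", ">": "<"}

# Composition table: given a r1 b and b r2 c, which relations may hold between a and c.
COMPOSE = {
    ("<", "<"): "<", ("<", "="): "<", ("<", ">"): "<=>",
    ("=", "<"): "<", ("=", "="): "=", ("=", ">"): ">",
    (">", "<"): "<=>", (">", "="): ">", (">", ">"): ">",
}


def inverse(r: frozenset) -> frozenset:
    return frozenset(INVERSE[x] for x in r)


def compose(r1: frozenset, r2: frozenset) -> frozenset:
    """Union of the table entries for every pair of base relations."""
    return frozenset(c for a in r1 for b in r2 for c in COMPOSE[a, b])


def path_consistency(points: list[str], facts: dict[tuple[str, str], str]) -> dict:
    """Refine R(i, j) to R(i, j) & (R(i, k) o R(k, j)) for all i, k, j until a fixpoint."""
    rel = {(i, j): (frozenset("=") if i == j else ALL)
           for i, j in product(points, repeat=2)}
    for (i, j), r in facts.items():
        rel[i, j] &= frozenset(r)
        rel[j, i] = inverse(rel[i, j])
        if not rel[i, j]:
            raise ValueError(f"contradictory facts about {i} and {j}")
    changed = True
    while changed:
        changed = False
        for i, k, j in product(points, repeat=3):
            refined = rel[i, j] & compose(rel[i, k], rel[k, j])
            if refined != rel[i, j]:
                if not refined:
                    raise ValueError(f"inconsistent constraints between {i} and {j}")
                rel[i, j], rel[j, i] = refined, inverse(refined)
                changed = True
    return rel


network = path_consistency(
    ["leave", "cross", "arrive"],
    {("leave", "cross"): "<", ("cross", "arrive"): "<"},
)
print(sorted(network["leave", "arrive"]))   # ['<']
```

Each refinement only shrinks a finite set of relations, so the loop terminates; an empty relation signals that the stated facts cannot all hold at once.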

1.4 Theoretical background


1.4.1 Spatial prepositions 1.4.1.1 Classic studies There has been considerable prior research on motion verbs (e.g. run), spatial prepositions (across), adjectives (narrow), adverbs (far), nouns (lake), proper names (San Francisco), and other locative constructions. We focus here on spatial prepositions and adpositions. Two key issues emerge from the prior research. The rst issue is the nature of the spatial representations involved, and the second issue is what exactly differentiates the different senses to produce polysemy. Underlying them both is a third issue, the characteristics and properties of a theory of meaning. Prepositions are traditionally classied as either directional or locative (Miller and Johnson-Laird 1976; Herskovits 1986; Zwarts and Winter 2000). Directional ones involve a path and/or movement, and include across, around, from, into, onto, and to. Locative prepositions are sub-classied into projective ones, which involve a pointof-view (e.g. above, behind, below, beside, in front of, over, under) and non-projective ones (e.g. at, between, in, inside, on, outside, near). The work of Miller and Johnson-Laird (1976) represents a signicant advance in the modeling of the semantics of spatial prepositions. Consider their analysis of in as in (5): (5a) (5b) (5c) (5d) (5e) a city in Sweden the coffee in the cup the spoon in the cup the scratch in the surface the bone in the leg

In (5a,b), the figure is entirely enclosed within the ground object, whereas in (5c) part of the figure need not be enclosed in the ground. In (5b,c), the ground object is conceptualized as some form of container. In (5d,e), the figure is entirely enclosed in the ground object, with (5d) dealing with two-dimensional (2D) objects and (5e) dealing with three-dimensional (3D) objects. To handle these cases, Miller and

Interpreting Motion

Johnson-Laird develop a semantic theory of parthood and topological relations, i.e. mereotopology. In their account, in has a common meaning in the above uses: the figure has a part that is totally inside the ground object.8 Providing a theory of mereotopology, built, say, on primitive notions of connection and parthood, is essential, we believe, to characterizing spatial relations. Such a theory will be discussed further in Chapter 2 and formalized in Chapter 3. Likewise, consider the uses of on in (6).

(6a) the scratch on the surface
(6b) the picture on the wall
(6c) the lamp on the table
(6d) the house on the river
(6e) the boat on the river

Miller and Johnson-Laird point out that in (6a-c), the relation is between surfaces. In (6b), part of the figure is over a part of the ground (such as a hook), and the latter part supports the rest of the figure. In (6c), if the table is on a rug, which is on the floor, it is fine to say the table is on the floor, because the region of interaction with the floor includes the table legs. But the transitivity is limited: we cannot say in (6c) that the lamp is on the floor. Searching the region of interaction with the floor will not reveal the lamp. Functional notions such as support and regions of interaction (or affordances of objects (Gibson 1977)) are part and parcel of a theory of spatial relations; in this book, though we will take note of their presence, we will not be formally representing functional notions, as they presuppose a great deal of commonsense knowledge that is difficult to acquire and represent in a general way for use in practical systems. Of course, in specific domains, it is possible to enumerate object-specific functional properties (including shape). For example, in their natural-language-driven scene rendering system, Coyne and Sproat (2001) associate 3D regions called spatial tags with objects, so that the object representing daisy has a stem spatial tag and likewise test tube a cup spatial tag. Given the input expression the daisy is in the test tube, the graphical output has the daisy's stem inserted into the test tube's cupped opening. A similar approach could be used to represent the meaning of (5c). However, the daisy is in the scrapbook would presumably require an entirely different spatial tag for daisy, raising the question of how to enumerate domain-independent functional properties for each object. Example (6d) involves a path that is potentially ambiguous between being on the edge of the ground object (the river) and being on the surface of the ground object (where the surface is that part of the object that will reflect light to the eye or that can
8 In their semantic framework, the relations are between percepts of figure and ground, rather than between things in the world.

be explored by touch), with a strong preference for the former (in contrast to (6e)). Based on this and other evidence, Miller and Johnson-Laird argue that on has two spatial meanings: either the figure is part of the region of interaction with the surface of the ground object, with the ground supporting the figure, or else the figure object is construed as being in a path relation with the ground object.

In subsequent research, Herskovits (1986) proposed underlying geometric meanings for spatial prepositions in English involving geometric relations between figure and ground objects; these relations are between objects construed as points, lines, surfaces, volumes, and vectors. The preposition on in (7a), for example, involves concepts of contiguity (the figure is next to and touches the ground object) and (as we have seen) support (the ground object supports the figure). However, in (7b), contrary to Miller and Johnson-Laird, she argues that support is not involved.

(7a) The book on the table.
(7b) The wrinkles on his forehead.

In addition, the objects related by a preposition must be modeled in terms of their geometric properties, expressed as geometric functions that define characteristics of the space occupied by the object. For example, a table is geometrically constrained to be bounded and definite in shape, whereas water is not. Other geometric functions include idealizations (approximations to a point, line, surface, or plane), parts (e.g. edges, bases, surfaces, etc.), axes, volumes, projections, and what she calls good form. For example, in (8a), good form provides the Gestalt closure on the tree such that a bird can be contained in the space occupied by that form, shown in (8b), from Pustejovsky (1989).

(8a) The bird in the tree.
(8b) Included-in (Part (Place (Bird)), Interior (Outline (VisiblePart (Place (Tree))))).

Turning to the issue of polysemy, Herskovits argues that (7a) above expresses an ideal meaning of on, whose sense is shifted in (7b).
Senses can also shift due to a pragmatic degree of tolerance, i.e. to handle fuzzy cases of (7a) where the book is on a tablecloth which is in turn on the table. As a result, while an ideal meaning is semantic, the actual senses in use are produced as pragmatic alterations to the ideal meaning. From the standpoint of a theory of meaning, Herskovits' account rejects the notion of a compositional theory. Further, although there is a sketch of a mereotopology, there is no precise theory of how exactly the pragmatic alterations occur, which limits the account's applicability to computational processes.

1.4.1.2 Cognitive linguistics

Along with Herskovits' work, there has been a great deal of activity in cognitive linguistics on the semantics of spatial prepositions. Here we will consider some of the core work from this area, while deferring a discussion of Jackendoff's contributions to the next section.

One of the fundamental tenets of this rather diverse field is that human concepts are embodied, i.e. "the concepts we have access to and the nature of the reality we think and talk about are a function of our embodiment" (Evans et al. 2007, p. 7). Following Johnson (1987), Lakoff and Johnson (1980), Brugman (1981), Mandler (2004), and Evans et al. (2007), basic topological concepts like contact and inclusion (in the spatial sense of enclosure) are formed through the infant's interaction with objects. In this account, it is the schema of the container which underlies both the enclosure or inclusion sense of in in (9a) and its metaphorical extension in (9b).

(9a) The cat is in the house.
(9b) The cat is in trouble.

The nature of polysemy is a contentious issue in cognitive linguistics. Consider the preposition over, which has been the subject of considerable discussion. The classic account of Lakoff (1987) makes fine-grained sense distinctions for the preposition based on characteristics of the figure and ground object. In (10a), the landmark (i.e. ground object) is an extended object, but not so in (10b) (examples from Tyler and Evans 2001):

(10a) The helicopter hovered over the ocean.
(10b) The hummingbird hovered over the flower.

Likewise, in (11a) there is contact with the wall, whereas there is not in (11b); in (11c), there is covering and occlusion of the ground. These differences would warrant, in the classic account, different senses for over.9

(11a) The boy climbed over the wall.
(11b) The tennis ball flew over the wall.
(11c) Joan nailed a board over the hole in the ceiling.
(11d) The heavy rains caused the river to flow over its banks.

In general, this sort of argument by appeal to arbitrary spatial distinctions proliferates senses in a somewhat unprincipled manner. There is no underlying mereotopological theory, and hence no way of building up spatial concepts from more primitive ones.
Researchers have struggled to constrain the number of senses, using (quite sensibly) dictionaries, lexical resources, and various theoretical criteria. For example, Tyler and Evans (2001) take their cue from Herskovits and propose a proto-sense (or primary sense) of every preposition that they argue is the diachronically earliest sense;10 the proto-sense of over means above except that, unlike above, there is potential contact with the ground. Notably, this sense does not contain path
9 Examples in (11) from Tyler and Evans (2001, pp. 728, 732, 757).
10 Postulating the diachronically earliest sense as more basic in every case does not seem at all correct given modern usage.

information. The above-and-across interpretation in (11a) and (11b), which does include the path, is not a different sense of over, but arises in conjunction with the meaning of the verb and the figure and ground objects. In (11c), however, a non-primary sense of over is differentiated, as it involves the distinct spatial notion of covering. In (11d), the sense is distinguished based on a supposedly distinct spatial notion of excess given by a cognitive scenario of a container overflowing, with the figure rising higher than the top of the ground object.

The Tyler and Evans proposal suffers from the same problems we observed with Herskovits' account. Appealing to potential contact between figure and ground only serves as a way of grouping together disjunctions. Further, (11d) does not seem to warrant a different sense, given the contribution of the verb flow. In addition, consider (12a) and (12b), discussed by Cuyckens (2007).

(12a) The cat jumped over the wall.
(12b) The cat jumped up on the wall.

The only syntactic difference is the preposition, but (12a) results in a different path than (12b): the cat ends up on the wall in the latter, but on the other side of the wall in the former. Thus over must involve a path meaning.

Having said that, the question arises as to the set of spatial properties that should be considered when distinguishing spatial senses of a preposition. Unless these properties are drawn from a structured domain, in particular geometric or topological domains that can be made mathematically precise, pretty much any set of spatial properties that sound relevant might be used, since the theory has no way of evaluating them except by arguments based on linguistic tests. In general, the inability to find reliable criteria to differentiate word senses is also a reflection of the lack of empirical, corpus-based methodology in the cognitive linguistics approach. Corpus-level annotation of word senses is a well-established task in computational linguistics, e.g.
SENSEVAL-1 (Kilgarriff and Palmer 2000). In these annotation efforts, fine-grained lexical resources such as WordNet (Fellbaum 1998), where different senses of words are grouped into synonym classes called synsets (with the classes being linked by conceptual relations such as hypernymy and part-whole relations), have been used as sense inventories for annotating open-class terms in large corpora. Certain senses will of course be more frequent than others, and the more frequent ones may coincide with notions of central or more salient meanings for a given word. (As it happens, WordNet ranks the senses of a word by how often each sense is tagged in its sense-annotated concordance texts, SemCor.) This sort of project also has the practical benefit of dividing the problem of polysemy into those word senses that are easy to agree on and those that aren't, focusing attention on the ones that pose challenges, and perhaps suggesting revisions or limitations to the sense inventory. In SENSEVAL-3 (Mihalcea and Edmonds 2004), annotators agreed with each other almost two-thirds of the time.

Turning to the theory of meaning, cognitive linguistics offers an inherently mentalistic theory of meaning.11 In contrast, denotational theories12 are important for several reasons: (i) Truth and reference are important for successful communication, as work in discourse modeling (e.g. Kamp and Reyle 1993) indicates. (ii) Mentalistic theories tend not to tell us what role the things communicated about play in understanding. As Putnam (1975) points out, a person may not have the conceptual knowledge to tell the difference between a beech and an elm, even though the two terms clearly refer to different things in the world. (iii) Using a logical representation allows for logical inferences to be made, for formal properties of computation to be studied systematically, etc. The latter property is of course of considerable interest to computational approaches.

1.4.1.3 Jackendoff

In our earlier linguistic analyses, we mentioned paths. In addition to Talmy, another cognitive linguist who provides a rich representation for paths is Jackendoff (1983, 1990). In his theory of Lexical Conceptual Structure (LCS), the verbs of location and motion are viewed as fundamentally spatial, with non-spatial senses being an extension of the spatial senses. Jackendoff gives distinguished status to places and paths in LCS. Paths can be bounded, where the ground is the start- or end-point of the path. Another type of path is a direction, as in (13a), where the ground object does not fall on the path, but would if the path were extended some unspecified distance (ibid., p. 165). A third kind is a route, where the ground object is related to some point in the interior of the path, as in (14a). Unlike Herskovits' account, Jackendoff's semantics has an implicit mereotopology and is compositional. He relies on functions to assemble meanings of words together to form meanings of phrases. A place-function (e.g. IN, ON, INSIDE, UNDER, etc.)
takes a Thing and returns a Place, while a path-function (FROM, TO, TOWARD, AWAY-FROM, and VIA) takes either a Thing or a Place and returns a Path. Examples of place- and path-functions are shown in the prepositional phrase meanings in (13b) and (14b).

(13a) [John ran] toward the house.
(13b) [Path TOWARD ([Thing house])]
(14a) [The car passed] through the tunnel.
(14b) [Path VIA ([Place INSIDE ([Thing tunnel])])]
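The typing of Jackendoff's functions can be made concrete with a minimal Python sketch. The classes Thing, Place, and Path mirror the LCS categories, but the helpers place_fn and path_fn, the function inventories, and the bracketed string rendering are our own illustrative assumptions, not part of LCS. The sketch rebuilds (13b) and (14b) and rejects ill-typed combinations such as a place-function applied to a Place; note that the functor names are mere labels, so nothing in it distinguishes IN from ON semantically.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Thing:
    name: str

    def __str__(self) -> str:
        return f"[Thing {self.name}]"


@dataclass(frozen=True)
class Place:
    function: str  # a place-function label, e.g. "IN", "ON", "INSIDE", "UNDER"
    ground: Thing

    def __str__(self) -> str:
        return f"[Place {self.function} ({self.ground})]"


@dataclass(frozen=True)
class Path:
    function: str  # a path-function label, e.g. "TO", "TOWARD", "VIA"
    arg: Union[Thing, Place]

    def __str__(self) -> str:
        return f"[Path {self.function} ({self.arg})]"


# Illustrative inventories taken from the examples in the text.
PLACE_FUNCTIONS = {"IN", "ON", "INSIDE", "UNDER", "AT"}
PATH_FUNCTIONS = {"FROM", "TO", "TOWARD", "AWAY-FROM", "VIA"}


def place_fn(name: str, thing: Thing) -> Place:
    """A place-function maps a Thing to a Place."""
    if name not in PLACE_FUNCTIONS:
        raise ValueError(f"unknown place-function: {name}")
    if not isinstance(thing, Thing):
        raise TypeError("a place-function takes a Thing")
    return Place(name, thing)


def path_fn(name: str, arg: Union[Thing, Place]) -> Path:
    """A path-function maps a Thing or a Place to a Path."""
    if name not in PATH_FUNCTIONS:
        raise ValueError(f"unknown path-function: {name}")
    if not isinstance(arg, (Thing, Place)):
        raise TypeError("a path-function takes a Thing or a Place")
    return Path(name, arg)


toward_house = path_fn("TOWARD", Thing("house"))
through_tunnel = path_fn("VIA", place_fn("INSIDE", Thing("tunnel")))
print(toward_house)    # [Path TOWARD ([Thing house])]
print(through_tunnel)  # [Path VIA ([Place INSIDE ([Thing tunnel])])]
```

Because the checks are purely about categories, the sketch captures the compositional syntax of LCS but, by design, says nothing about what IN or VIA mean spatially.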

11 Mentalistic (or representational) theories of meaning are concerned mainly with understanding the relation between linguistic expressions and things in the speaker's mind, namely, explaining what goes on in people's minds when they use language.
12 Denotational theories of meaning (i.e. as found in model-theoretic semantics) are concerned mainly with the correspondence between expressions and things in the environment, and thus this enterprise aims at a theory of truth and reference. Such theories represent the environment in terms of a formal model for the denotation of expressions.

While the semantics of LCS is obviously compositional, it is not intended to be truth-conditional, and is thus in keeping with the precepts of cognitive semantics. Since it has no basis in logic, Conceptual Structure cannot be used to make logical inferences, and as such cannot account for entailments between sentences.13 Another drawback is that the primitives corresponding to prepositions, such as IN, ON, TOWARD, INSIDE, etc., are not further elaborated to support reasoning; they are functors in a compositional syntax, but are not differentiated from each other in terms of semantics. Finally, unlike, say, the work of Talmy (2000), the geometry used is far too abstract to be relevant to computational modeling of spatial reference and motion.

1.4.1.4 Vector representations

It must be acknowledged that Jackendoff's ontology of paths and places and the differentiation between place- and path-functions constitute one of the more expressive accounts of the semantics of spatial prepositions offered within an entirely compositional semantics. His basic notions of paths have been further elaborated by others, most notably within a denotational semantics by Zwarts (2003). In the latter's work, a spatial preposition denotes a set of paths, where a path is defined as a continuous function from the real interval [0, 1] to points (or regions) in space. The denotation of a prepositional phrase (PP) of the form into the room is a set of paths whose end-point is inside the room. Zwarts associates events with paths via a function that takes an event and returns its path. Accordingly, the denotation of a verb phrase (VP) of the form enter the room is a set of events such that (only) the end-point of the event's path is inside the room. In support of this theory, relations like into, inside, etc. are based on an underlying model of vectors14 (Zwarts and Winter 2000).
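Zwarts's proposal that a PP denotes a set of paths can be sketched in Python by treating a set as its characteristic function. Everything concrete here is an illustrative assumption: points are 2D tuples, the room is an axis-aligned rectangle (boundary included), paths are straight lines, and the hypothetical Event class with its trace attribute merely stands in for Zwarts's function from events to paths. Only the end-point condition on into is modeled; the stricter "(only) the end-point" condition on enter is not.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]
PathFn = Callable[[float], Point]  # a path maps t in [0, 1] to a point


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for a ground object such as a room (assumption)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: Point) -> bool:
        x, y = p
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def straight_path(a: Point, b: Point) -> PathFn:
    """A continuous path: linear interpolation from a (t = 0) to b (t = 1)."""
    def p(t: float) -> Point:
        return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
    return p


def into(ground: Rect) -> Callable[[PathFn], bool]:
    """Characteristic function of the set of paths whose end-point is inside the ground."""
    return lambda path: ground.contains(path(1.0))


@dataclass
class Event:
    """Hypothetical event record; trace plays the role of the event-to-path function."""
    name: str
    trace: PathFn


room = Rect(0.0, 0.0, 4.0, 3.0)
walk_in = Event("walk", straight_path((-2.0, 1.0), (1.0, 1.0)))
walk_by = Event("walk", straight_path((-2.0, 5.0), (6.0, 5.0)))

into_the_room = into(room)
print(into_the_room(walk_in.trace))  # True
print(into_the_room(walk_by.trace))  # False
```

Representing denotations as predicates rather than explicit sets keeps the sketch finite even though the set of paths into a room is infinite.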
In the vector model of Zwarts and Winter (2000), the preposition inside is treated as a function which maps a set of points representing the ground object A to a set of vectors whose start-points are on the boundary of A and whose end-points are internal to A. Since there may be multiple vectors from different points on the boundary to a particular end-point, only the shortest vector is considered. The set of points representing an object is treated as convex,15 in keeping with our use of prepositions like inside to conceptualize even non-convex ground objects as being convex. As Zwarts and Winter point out, the ball is inside the bowl is compatible with a situation where the ball is sitting on the bottom of an open bowl, where the ball actually occupies a space that is disjoint from that of the bowl. The preposition outside is similar, except that the externally closest vectors are involved, i.e. the shortest vectors that start at the boundary of A and end at points

13 However, a truth-conditional semantics for Conceptual Structure has been demonstrated by Zwarts and Verkuyl (1994), who recast it as a many-sorted first-order logic.
14 Other researchers have also explored vectors, including Talmy (2000), Bohnemeyer (2003), O'Keefe (2003), and Carlson et al. (2003). However, they have not concerned themselves with building up a compositional semantics for spatial language based on vectors.
15 A set of points is convex if the line segment joining any pair of points in the set lies entirely in the set.

not belonging to A. As for the preposition on, its meaning is a set of vectors each of whose end-points is outside the set of points corresponding to the ground object, but whose length is less than some small number, so that the distance between figure and ground is near zero.

Although the theory of Zwarts and Winter (2000) does provide an elegant compositional semantics for PPs, including those modified by measure phrases, it can be faulted on several grounds. For one thing, though there are vectors and point sets, there is no explicit mereotopology. The invocation of metric notions of distance to represent topological relations is somewhat counter-intuitive. A related failing is that the theory does not distinguish between in and inside, or between at and on, and the case of (5c) mentioned earlier, where there is a part of the figure that is outside the ground object, is ignored. Finally, carrying out formal reasoning using these vector models is still an open question. In short, the theory does not provide an adequate grounding in a spatial semantics that can be used for reasoning.

1.4.1.5 Assessment

In summary, then, the prior theoretical research, while providing insightful discussions of the semantics of spatial prepositions, has made assumptions (such as those of cognitive linguistics) that are untenable in a computational approach, and has also largely ignored evidence from corpus-based annotation efforts at distinguishing senses in context. While compositional treatments of prepositional meaning have flourished, the question of what underlying spatial primitives to rely on has not thus far been tied to those available in qualitative reasoning systems. In Chapter 3, we explore topological and geometric representations that can be used for expressing prepositional meaning in qualitative reasoning systems.

1.4.2 Motion verbs

1.4.2.1 Langacker

As with spatial prepositions, there has been a fair amount of research on the semantics of motion verbs.
We discussed earlier the influential work of Talmy and Jackendoff. Another key cognitive linguist who has tackled motion is Langacker (1987). It is not possible to do justice to his overall cognitivist philosophy here; instead, let us get down to brass tacks and examine his analyses of motion verbs. Consider the verb enter. Langacker (1987) characterizes it as a dynamic process, whose conceptual semantics involves, in effect, a temporally indexed sequence of relations between the trajector (i.e. moving figure object) and the landmark (i.e. ground object, which may or may not move). The trajector changes from a state of being spatially OUT with respect to the landmark to a state of being IN with respect to the landmark. From his diagrams of image schemas16 (ibid.

16 An image schema is a mental pattern that recurrently provides structured understanding of various experiences, and is available for use in metaphor as a source domain to provide an understanding of yet other experiences (Johnson 1987, pp. 24).

p. 245, figures 7.1 and 7.2), it appears that this change of state occurs over a conceived time interval, where the process involves a sequence of an indefinite number of component states (ibid. p. 244). As for the relations IN and OUT, they are explained informally as follows: "The relation [A IN B], based on immanence, specifies that the cognitive events constituting the conception of A (in a given domain) are included among those comprised by B. The relation of separation, which I will give as [A OUT B], is based on the absence of such inclusion." (ibid. p. 228).

In contrast, the verb arrive, according to Langacker (1987), "presupposes an extended path of motion on the part of its trajectory, but only the final portions of this trajectory, those where the trajector enters the vicinity of its destination and then reaches it, are specifically designated by this verb." (ibid. p. 246).

Langacker's account does clearly capture some of our topological intuitions about enter. However, his presentation relies on diagrams representing image schemas, and there is no formal description of the process of entering. While one can accept the idea of a primitive spatial relation IN standing for inclusion, characterizing it in terms of relationships between cognitive events is somewhat vague. Further, there is no clear distinction between enter and arrive, except by way of various diagrams and the informal definitions above. More specifically, there is no statement that arrive involves the trajector, at the end of the process, being merely AT the landmark, as opposed to being IN the landmark as in the case of enter. This problem is further borne out by his analysis of the verb leave: Langacker (1988, p. 96) indicates that the trajector is at first IN with respect to the landmark, and then overlaps with its boundary (i.e. the trajector is AT the landmark), before being OUT with respect to the landmark. Here too, nothing distinguishes leave from exit.
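One way to see what is missing is to state the contrasts explicitly. The following Python sketch is a hypothetical encoding of our own, not Langacker's formalism and not the semantics developed later in this book: a process is a list of trajector-landmark relations (OUT, AT, IN) sampled over the conceived time interval, runs of identical component states are collapsed, and each verb constrains only the first and last relation. Under these assumptions, arrive ends merely AT the landmark while enter ends IN it.

```python
from enum import Enum
from itertools import groupby
from typing import Dict, List, Set, Tuple


class Rel(Enum):
    OUT = "OUT"  # trajector separated from the landmark
    AT = "AT"    # trajector at (overlapping the boundary of) the landmark
    IN = "IN"    # trajector included in the landmark


# Hypothetical verb constraints: (allowed first relations, required last relation).
VERBS: Dict[str, Tuple[Set[Rel], Rel]] = {
    "enter": ({Rel.OUT}, Rel.IN),
    "arrive": ({Rel.OUT}, Rel.AT),
    "leave": ({Rel.IN}, Rel.OUT),
}


def collapse(states: List[Rel]) -> List[Rel]:
    """Merge runs of identical component states, e.g. OUT OUT AT -> OUT AT."""
    return [state for state, _ in groupby(states)]


def fits(verb: str, states: List[Rel]) -> bool:
    """True if the process changes state and its endpoints satisfy the verb's constraint."""
    seq = collapse(states)
    allowed_first, required_last = VERBS[verb]
    return len(seq) > 1 and seq[0] in allowed_first and seq[-1] == required_last


going_in = [Rel.OUT, Rel.OUT, Rel.AT, Rel.IN]
reaching = [Rel.OUT, Rel.OUT, Rel.AT]
print(fits("enter", going_in), fits("arrive", going_in))  # True False
print(fits("enter", reaching), fits("arrive", reaching))  # False True
print(fits("leave", [Rel.IN, Rel.AT, Rel.OUT]))           # True
```

As written, the sketch would give exit the same constraint as leave, so it mirrors rather than solves the leave/exit problem; its only point is that explicit end-state constraints make the enter/arrive contrast statable.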
Critique aside, it is worth pointing out that Langacker's intuitions reflect a topological view of motion verbs. In Chapter 3, we will formalize notions such as IN in terms of mereotopology, and in Chapter 4, we will provide a formal semantics for verbs like enter and arrive that gives a specific computational interpretation to notions similar to Langacker's.

1.4.2.2 Jackendoff

Let us turn now to the interpretation of motion in Jackendoff's LCS (Jackendoff 1983, 1990). In LCS, verbs of spatial motion, such as bike, are given a common semantic template, which determines their syntactic behavior, shown in (15).

(15) [Event GO+LOC ([Thing]x, [Path]y)]

GO is a semantic primitive of motion, which is a function that takes as inputs a Thing and a Path and returns as output an Event. GO+LOC involves movement specialized to a locative semantic field.17 When the above verb template is combined with a path PP, we get examples like (16).
17 Analogously, verbs of temporal motion, such as delay, use GO+TEMP.

(16a) John biked to the store.
(16b) [Event GO ([Thing John], [Path TO ([Place AT ([Thing store])])])]

A verb like enter is treated as equivalent to go into, and has the more instantiated semantics shown in (17).

(17) [Event GO ([Thing]x, [Path TO ([Place IN ([Thing]y)])])]

Note that LCS, in addition to bearing the disadvantages described in the previous section, also blurs important differences, since all motion verbs are represented by one of GO(Thing, Path); STAY(Thing, Place), as in cling; ORIENT(Thing, Path), as in point; BE(Thing, Place), as in lie; and GO_Ext(Thing, Path), as in reach; along with their specializations to different semantic fields. The inability to distinguish among verb meanings is a serious problem with such highly abstract representations of meaning.

1.4.2.3 WordNet

Given the theories of verb semantics, one would expect that lexical resources would exist that provide a rich semantics for motion verbs. Unfortunately, this is not the case. We mentioned WordNet (Fellbaum 1998) earlier, and its differentiation and ranking of word senses based on corpora. In WordNet, verbs are grouped into a hierarchy, with related verbs differentiated by manner into troponyms. For example, the troponyms of arrive are: land, reach, flood/drive/come in, light, perch, force-land, beach, disembark, debark, set down, touch down, and crash land. However, while WordNet is widely used for its coverage of relations such as synonymy and hypernymy, which is what it was designed for, it is impoverished not only in terms of the syntactic representations for the verbs, but also in terms of the absence of any semantic representation for lexical items. Consequently, researchers have integrated WordNet with other resources that provide the missing information.

1.4.2.4 VerbNet

VerbNet (Kipper et al.
2006) is one such key lexical resource that provides syntactic and semantic information about verbs, which are grouped into classes based on extensions of the well-known classification of Levin (1993). We first discuss the latter's classification, where verbs are grouped into semantic classes based on their participation in common meaning-preserving syntactic constructions involving syntactic arguments, called diathesis alternations. For example, consider the verbs break and cut. As seen in (18) (examples from Kipper-Schuler (2005)), break participates in the transitive (18a), the simple intransitive (18b), and the middle construction (18c), but not the conative alternation (18d).

(18a) John broke the jar.
(18b) The jar broke.
(18c) Jars break easily.
(18d) *John broke at the loaf.

In comparison, cut participates in the transitive, middle, and conative alternations, but not the simple intransitive.

(19a) John cut the bread.
(19b) *The bread cut.
(19c) Bread cuts easily.
(19d) John valiantly cut at the frozen loaf, but his knife was too dull to make a dent in it.

These differences are grounds, in Levin's account, for splitting break verbs (along with similar-behaving verbs such as chip, crack, crash, crush, fracture, rip, shatter, smash, snap, splinter, tear) into a separate class from cut verbs (with fellow-members chip, clip, cut, hack, hew, saw, scrape, scratch, slash, snip). The motion verbs (Levin class 51), for instance, are grouped into nine subclasses. As Kipper-Schuler (ibid.) points out, this method also produces classes whose members are far from synonymous, e.g. the braid class, which counts among its members bob, braid, brush, clip, comb, condition, crimp, crop, curl, etc. Further, the classes are not disjoint, and some verbs are members of multiple classes with conflicting sets of alternations.

VerbNet attempts to fix these and other problems by refining the classes (e.g. as in Dang et al. (1998), grouping together classes which share at least three members), adding new classes, integrating the classes with WordNet, and most importantly, providing semantic templates for each of the classes. For example, consider the semantics for the path verb arrive in VerbNet (version 3.1), as in arrived in the US. The entry specifies that the entity that fills the semantic role of Theme (the subject noun phrase (NP)) moves during the arrival event, and that at the end of the arriving event, the location of the moving object is in the US, i.e. the entity that fills the semantic role of the Oblique object (the PP). Thus, the semantic information for arrive is expressed as:

(20) motion(during(E), Theme) location(end(E), Theme, Oblique)
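The two predicates in (20) can be read as a conjunction of checks over an event. In the Python sketch below, which is illustrative only, an event is assumed to be a time-ordered list of the Theme's locations, and a small hypothetical gazetteer (REGION_OF) decides what counts as being in the US. The function verbnet_arrive checks only what (20) states; arrive_with_start adds a start(E) condition, namely that the Theme is not yet in the Oblique when the event begins.

```python
from typing import Dict, List

# Hypothetical toy gazetteer: the region each place name lies in.
REGION_OF: Dict[str, str] = {
    "Paris": "France",
    "Boston": "US",
    "New York": "US",
    "mid-Atlantic": "ocean",
}


def located_in(place: str, region: str) -> bool:
    """A place is in a region if it is that region or the gazetteer puts it there."""
    return place == region or REGION_OF.get(place) == region


def motion_during(timeline: List[str]) -> bool:
    """motion(during(E), Theme): the Theme's location changes at some point in E."""
    return any(a != b for a, b in zip(timeline, timeline[1:]))


def location_end(timeline: List[str], oblique: str) -> bool:
    """location(end(E), Theme, Oblique): at the end of E the Theme is in the Oblique."""
    return located_in(timeline[-1], oblique)


def verbnet_arrive(timeline: List[str], oblique: str) -> bool:
    """The two predicates of (20), read as a conjunction."""
    return motion_during(timeline) and location_end(timeline, oblique)


def arrive_with_start(timeline: List[str], oblique: str) -> bool:
    """Adds a start(E) condition: the Theme is not already in the Oblique."""
    return verbnet_arrive(timeline, oblique) and not located_in(timeline[0], oblique)


flight = ["Paris", "mid-Atlantic", "New York"]
domestic = ["Boston", "New York"]
print(verbnet_arrive(flight, "US"), arrive_with_start(flight, "US"))      # True True
print(verbnet_arrive(domestic, "US"), arrive_with_start(domestic, "US"))  # True False
```

The domestic trip satisfies (20) even though nothing arrives in the US, which is the kind of gap a start(E) condition closes.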

As we shall see in Chapter 2, arrive is a verb whose meaning involves the figure object traversing a path that goes from its not being located at the ground object to its being at the ground object. Although (20) does not make reference to paths or to start(E), VerbNet appears to capture at least part of the meaning. However, as Zaenen et al. (2008) reveal, while some of the motion verbs in VerbNet (such as carry) have start and/or end point information, others don't, leaving a great deal of incompleteness. They argue that although they were able to get around some of these glitches and extract change-of-location information from VerbNet by a variety of post-processing rules, there is a more fundamental problem with the VerbNet approach: the classification is driven by syntactic considerations separating arguments from adjuncts. As is well-known, there is no one-to-one mapping between syntactic predications and semantic ones. The latter often include

as arguments constituents that are syntactically adjuncts. For lexical resources to be helpful in normalizing textual information, they have to encode the distinction between syntactic and semantic predication and be systematic about the correspondence between the two (ibid., p. 390). Their investigation reveals, unfortunately, that VerbNet lacks such a systematic mapping.18

1.4.2.5 FrameNet

Another well-known lexical resource is FrameNet (Baker et al. 2003), which has been developed based on the underlying theory of Frame Semantics (e.g. Fillmore 1976). It involves specifying each lexical item's syntactic properties in the context of a hierarchy of semantic structures called frames, which represent the experiential knowledge evoked by lexical items. The semantic roles of verbs (called frame elements) are annotated in terms of corpus examples. For example, consider the path verb arrive, for which a FrameNet III example is shown in (21).

(21) [The Princess of Wales THEME] arrived TARGET [smiling and laughing DEPICTIVE] [at a Christmas concert GOAL] [last night TIME].

In FrameNet's view, the lexical entry arrive evokes the frame of arriving, which is a subframe of (i.e. is part of) the traversal frame, which in turn is a subclass of the motion frame and involves the Theme changing location with respect to a Path. In the motion frame, a Theme starting out at a location expressed by the Source role ends up at a Goal location, covering space between the two, expressed by the Path role; or else, the Theme moves in a particular Area or Direction, or its Distance may be expressed.19 Arriving involves a moving object (filling the semantic role of Theme) moving in the direction of a location filling the semantic role of Goal. According to the comments for the arrive lexical entry, the Goal is always implied by the verb, but may or may not be explicit in the text; it indicates where the Theme ends up, or would end up, as a result of the motion. Note that this FrameNet representation is weaker than the one we have been advocating, in that it doesn't commit to the figure object (the Princess of Wales in (21)) being located, at the point of arrival, at the ground object (the site of the Christmas concert). In turn, FrameNet's representation for the preposition at, while it is associated with a Locative_relation frame (a subclass of the Trajector-Landmark frame that is derived from Langacker's account), does not convey any specific semantics for at.

19 In more recent work, Palmer et al. (2009) have tried to address some of these issues. The motion frame is defined as "Some entity (Theme) starts out in one place (Source) and ends up in some other place (Goal), having covered some space between the two (Path)." Additional frames that inherit the motion frame elaborate on this definition. Goal-profiling frames account for verbs such as reach. Source-profiling frames capture verbs from the Leave class. Path-profiling frames are for verbs such as traverse or cross, and, finally, the manner of motion can be elaborated on in additional frames for verbs like run and fly.


Likewise, the verb enter, which is also associated with the arriving frame and illustrated in (22), does not indicate that at the end of the event the figure, we, is inside the ground object, the upper room, thus failing to distinguish enter from arrive (in the latter, the figure is merely at the ground).

(22) [We THEME] entered TARGET [the upper room GOAL] [by a flight of stairs leading from the north side of the yard PATH].
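The enter/arrive problem can be stated as a simple equality test: two verbs are indistinguishable under a lexicon if their entries are identical. The sketch below is our own illustration, not FrameNet's format. It compares frame-only entries with hypothetical entries that also record the RCC-8 relations allowed between figure and ground once the event ends; `AT` is a placeholder label we introduce because the semantics of at is left open here, and all function names are illustrative.

```python
from typing import Callable, Dict, FrozenSet

# Frame-only entries as described in the text: both verbs evoke Arriving
# with the same core roles, so nothing separates them.
FRAMENET_STYLE: Dict[str, dict] = {
    "arrive": {"frame": "Arriving", "roles": ("Theme", "Goal")},
    "enter": {"frame": "Arriving", "roles": ("Theme", "Goal")},
}

# Hypothetical addition: relations allowed between figure and ground at the end
# of the event. "AT" is our placeholder; "at" is not analyzed here.
END_STATE: Dict[str, FrozenSet[str]] = {
    "enter": frozenset({"TPP", "NTPP"}),  # figure ends up inside the ground
    "arrive": frozenset({"AT"}),          # figure ends up merely at the ground
}

def plain(verb: str) -> dict:
    return FRAMENET_STYLE[verb]

def enriched(verb: str) -> dict:
    return {**FRAMENET_STYLE[verb], "end_state": END_STATE[verb]}

def same_meaning(entry: Callable[[str], dict], v1: str, v2: str) -> bool:
    """Two verbs are indistinguishable under a lexicon if their entries are equal."""
    return entry(v1) == entry(v2)
```

Here `same_meaning(plain, "arrive", "enter")` is true, while `same_meaning(enriched, "arrive", "enter")` is false: adding an end-state relation is what separates the two verbs.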

While FrameNet seems to do well with change-of-location motions, the hierarchy can be confusing. Sometimes the motion frame is directly inherited, as in the case of the traversal frame. Conversely, the departing frame uses the motion frame (i.e. it does not necessarily inherit or specialize the semantic roles of the motion frame) and is a subclass of the traversal frame. As another example, the manner verb drive is associated with the frame of operate_vehicle, which has semantic roles that include those illustrated in (23), from FrameNet III.20

(23a) [Jamie Shepherd DRIVER] drove TARGET [the bucketing old vehicle VEHICLE] [out of the estate SOURCE] [towards the main road PATH].
(23b) [The riders DRIVER] drove TARGET [all over the place AREA].
(23c) Dhamma is [the charioteer DRIVER] [that DRIVER] drives TARGET [the chariot VEHICLE] [along the road [to Nirvana GOAL] PATH].

The frame operate_vehicle is a subclass of the Operating_a_system frame, inheriting or specializing all its semantic roles; it also uses the motion frame. However, the combined information does not explicitly indicate that driving a vehicle involves an iterated change of location. In Chapter 2, we will provide such a semantics for manner verbs like drive. All in all, while FrameNet's rich subclassification of motion verbs and its integration of semantics, syntax, and corpus data are both impressive and commendable, FrameNet does not address or explicitly represent the sorts of spatial relationships involved in motion that we have been emphasizing. Further, although it has been used for inferential tasks such as question-answering (Narayanan and Harabagiu 2004), FrameNet's representation, even when mapped to knowledge representation languages such as OWL, is not directly amenable to spatial reasoning. And although FrameNet, VerbNet, and WordNet have been mapped to each other, e.g. (Shi and Mihalcea 2005), such an integrated resource, given the discussion above, also does not address our desiderata.
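The difference between a manner verb like drive, which constrains the course of the motion, and a path verb like arrive, which constrains its outcome, can be sketched on a toy trajectory model. Everything here is an assumption of ours and not the semantics developed in Chapter 2: a trajectory is a list of sampled 2D positions, "iterated change of location" is read as at least two successive steps that each change position, and the hypothetical `ends_at` check stands in for a path verb's end-state condition.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def iterated_change_of_location(trajectory: List[Point]) -> bool:
    """Manner-style check (toy): the position changes at every sampled step,
    and there are at least two such steps."""
    steps = list(zip(trajectory, trajectory[1:]))
    return len(steps) >= 2 and all(p != q for p, q in steps)

def ends_at(trajectory: List[Point], goal: Point) -> bool:
    """Path-style check (toy): only the final position matters."""
    return bool(trajectory) and trajectory[-1] == goal
```

A single jump from `(0, 0)` to `(5, 5)` satisfies `ends_at` for goal `(5, 5)` but not `iterated_change_of_location`, which is the intuition that a resource recording only roles such as Driver and Vehicle does not capture.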

20 As the FrameNet III website indicates, the semantic role AREA is used for expressions which describe a general area in which motion takes place when the motion is understood to be irregular and not to consist of a single linear path. Locative setting adjuncts of motion expressions may also be assigned this frame element.


1.4.2.6 Verb classifications based on qualitative reasoning

Let us now turn to other verb classifications, inspired by work in qualitative spatial reasoning (QSR). One of the most successful models in QSR, which has been used for static spatial relations, is the Region Connection Calculus 8 (RCC-8; Randell et al. 1992), a calculus grounded in mereotopology (to be discussed in Chapter 2). It identifies the following eight jointly exhaustive and pairwise disjoint relations between two regions A and B:

(24) a. Disconnected (DC): A and B do not touch each other.
b. Externally Connected (EC): A and B touch each other at their boundaries but share no interior points.
c. Partial Overlap (PO): A and B overlap, but neither is a part of the other.
d. Equal (EQ): A and B occupy the exact same space.
e. Tangential Proper Part (TPP): A is inside B and touches the boundary of B.
f. Non-tangential Proper Part (NTPP): A is inside B and does not touch the boundary of B.
g. Tangential Proper Part Inverse (TPPi): B is inside A and touches the boundary of A.
h. Non-tangential Proper Part Inverse (NTPPi): B is inside A and does not touch the boundary of A.
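The eight relations in (24) can be made concrete on a deliberately simple model of our own choosing: regions are non-degenerate closed intervals on the real line, so two regions are connected when they share at least one point. The `Interval` type and the `rcc8` function are illustrative assumptions; RCC-8 itself is defined axiomatically over arbitrary regions, and this sketch only shows how the eight cases partition every possible pair.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    lo: float
    hi: float  # assumed lo < hi: a non-degenerate closed interval

def rcc8(a: Interval, b: Interval) -> str:
    """Return the RCC-8 relation holding between regions a and b (1D toy model)."""
    if a.hi < b.lo or b.hi < a.lo:
        return "DC"                      # no shared point at all
    if a.hi == b.lo or b.hi == a.lo:
        return "EC"                      # share only a boundary point
    if a == b:
        return "EQ"
    touches = a.lo == b.lo or a.hi == b.hi
    if b.lo <= a.lo and a.hi <= b.hi:    # a lies inside b
        return "TPP" if touches else "NTPP"
    if a.lo <= b.lo and b.hi <= a.hi:    # b lies inside a
        return "TPPi" if touches else "NTPPi"
    return "PO"                          # overlap, neither part of the other
```

For example, `rcc8(Interval(1, 2), Interval(0, 3))` is `"NTPP"` and `rcc8(Interval(0, 1), Interval(1, 2))` is `"EC"`. Because the checks are ordered and each pair falls through to exactly one return, the labels are jointly exhaustive and pairwise disjoint on this model, mirroring the property claimed for (24).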

As we shall see in Chapters 2 and 3, RCC-8 and other systems like it do an adequate job of representing static information about space. However, they cannot by themselves help us deal with motion, since that task requires a temporal component. Muller (1998) proposes just such a system, one which merges spatial and temporal phenomena with a qualitative theory of motion based on spatiotemporal primitives. This system has at its base a topological system borrowed from Asher and Vieu (1995) that is similar to RCC-8 but adds the concept of open and closed regions, as well as a set of temporal relations that includes a relation of temporal connection along with the standard ordering relations. The result of Muller's system is a set of six motion classes: leave, hit, reach, external, internal, and cross. Asher and Sablayrolles (1995) offer a related account of motion verbs and spatial prepositional phrases in French. They propose ten groups of motion verbs, as follows: s'approcher (to approach), arriver (to arrive), entrer (to enter), se poser (to alight), s'éloigner (to distance oneself from), partir (to leave), sortir (to go out), décoller (to take off), passer (par) (to go through), and dévier (to deviate). This verb classification is more fine-grained than Muller's. Asher and Sablayrolles, however, do not have any groups that match well with Muller's internal and external. In addition, Muller does not include a class for the inverse of hit. The most striking difference between the accounts is that Asher and Sablayrolles include a notion of metric distance that Muller does not. This allows the separation of verbs such as approach and reach. For Muller, approach would have to be a simple external motion, which does not adequately capture the meaning of this verb.
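To make the idea of adding a temporal component concrete, the following minimal sketch classifies a motion from the time-ordered sequence of RCC-8 relations holding between figure and ground. The rules are our own rough simplification and are not Muller's formal definitions, which rest on open and closed regions and a relation of temporal connection; we also simply assume that the figure is moving throughout. The name `classify_motion` is hypothetical.

```python
from typing import List, Optional

# Relations in which the figure (first argument) lies inside the ground.
INSIDE = {"TPP", "NTPP"}

def classify_motion(relations: List[str]) -> Optional[str]:
    """Map a time-ordered list of RCC-8 relations (figure vs. ground) to a
    coarse motion class. Toy rules of our own, not Muller's axioms."""
    if not relations:
        return None
    first, last = relations[0], relations[-1]
    if all(r == "DC" for r in relations):
        return "external"   # moves while staying outside the ground
    if all(r in INSIDE for r in relations):
        return "internal"   # moves while staying inside the ground
    if first == "DC" and last == "DC" and any(r in INSIDE for r in relations):
        return "cross"      # goes in and comes back out
    if first == "DC" and last in INSIDE:
        return "reach"
    if first in INSIDE and last == "DC":
        return "leave"
    if first == "DC" and last == "EC":
        return "hit"        # ends in boundary contact without entering
    return None             # pattern not covered by these toy rules
```

For instance, `["DC", "EC", "PO", "TPP", "NTPP"]` is classified as `"reach"` and its reversal as `"leave"`. Since the relations are purely topological, a sequence that stays `DC` cannot tell approach apart from any other external motion, which is exactly the limitation the metric notion of Asher and Sablayrolles addresses.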

TABLE 1.1. A revised classification of motion verbs

Class | Examples | FrameNet | Muller | Asher and Sablayrolles
MOVE | drive, fly, run | Motion or Self_motion | X | X
MOVE_EXTERNAL | drive around, pass | Traversing | External | X
MOVE_INTERNAL | walk around the room | Motion | Internal | X
LEAVE | desert, leave | Departing | Internal | partir, sortir
REACH | arrive, enter, reach | Arriving | Reach | arriver/entrer
ATTACH | approach | Attaching | X | X
DETACH | disconnect, pull away, take off | X | X | décoller
HIT | hit, land | Impact | Hit | se poser
FOLLOW | chase, follow | Co-Theme | X | X
DEVIATE | flee, run from | Fleeing | X | dévier
STAY | remain, stay | State_continue | X | X


How do the semantic classifications of Muller, Asher, Sablayrolles, and Vieu, among others, relate to those in VerbNet and FrameNet? To answer this, Pustejovsky and Moszkowicz (2008) mapped Asher and Sablayrolles' verbs to VerbNet classes. The mapping revealed that while many of the motion predicates we care about have specific classes in VerbNet, it is not always clear what these classes have in common unless we look to FrameNet to find a higher-level representation. Pustejovsky and Moszkowicz (ibid.) therefore considered a mapping to FrameNet, arriving at a more expressive verb classification. The resulting ten classes are based largely on Muller's classifications with some very slight modifications, detailed in Table 1.1 along with some revisions we have made. Here X means there is no mapping.

1.4.2.7 Compositional semantics, revisited

So far, we have discussed motion verbs and spatial prepositions separately, but of course when they combine in sentences there is the question of specifying the meanings of each constituent and composing them together. Our approach, discussed in Chapter 4, leverages a richer semantics for nouns, prepositions, and motion verbs that allows one to parcel out the meaning contributions of the various constituents appropriately, without promiscuously proliferating preposition senses. For example, in (5b) discussed earlier (the coffee in the cup), cup has a noun sense as an open container made of solid material used for drinking; this comes out of its lexical entry, based on the Generative Lexicon (GL) account of Pustejovsky (1995, 2001). The preposition in has a meaning that involves an underspecified notion of containment, specifically inside a container. Thus, in the cup involves containment


inside a drinking instrument. Coffee has a noun sense of being constituted of liquid material. To glue the two together, to get coffee in the cup, the liquid has to be contained in the container, and for that its convex hull21 is required to be inside the container. This is achieved within a compositional semantics using GL (based on notions of coercion and co-composition), via an axiom of world knowledge. In (5c), spoon is an eating instrument with a handle, constituted of solid material, and to be contained in a container, it is sufficient for a part of it to be inside the container. The details of how this integration is performed compositionally are explored in Pustejovsky (forthcoming).

Likewise, consider the preposition around. In (25a), the walking is outside the pool, whereas in (25b), the swimming is inside the pool.

(25a) He walked around the pool.
(25b) He swam around the pool.

Clearly, it is the verb, rather than the preposition, that differentiates the spatial relationship between figure and ground in each case. Here, around creates a region that is displaced relative to the ground region, without committing to the direction of displacement. It is the medium of the motion (a parameter of verb meaning) that has a contrasting value in this case: swimming involves water as the medium, whereas walking involves a solid surface, setting aside some notable (e.g. mythological) exceptions.

This overview of approaches and resources for the analysis of motion in language establishes that while there have been a variety of linguistic theories and resources that provide a classification of motion verbs, a substantial gap exists in terms of actually representing the spatial semantics of motion in a manner consistent with our desiderata. The fact that even basic sense differences, such as the distinction between the motion verbs enter and arrive, are not adequately explicated by these theories shows that they are not expressive enough for natural language.
We have suggested that our account has an improved modularity that allows verbs, nouns, and prepositions to contribute spatial meaning in such a way that these meanings can be composed together (within a particular GL-derived compositional account) so as to provide fine-grained meaning differences, without proliferating prepositional senses. Finally, we have arrived at a verb classification that builds on and extends earlier ones.
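The coffee/spoon contrast from this section can be illustrated with a small two-dimensional toy, under assumptions that are ours alone: the cup's interior is an axis-aligned rectangle, a figure is a finite set of points, and its material type is passed in directly rather than derived from a GL lexical entry. For a liquid we compute the convex hull (here with Andrew's monotone-chain algorithm) and require every hull vertex to lie in the interior, which suffices because a rectangle is convex; for a solid, one point inside is enough. This shows only the geometric side condition the world-knowledge axiom appeals to, not the GL compositional mechanism itself, and `contained`, `CUP`, and `SPOON` are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_box(p: Point, box: Box) -> bool:
    xmin, ymin, xmax, ymax = box
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

def contained(figure: List[Point], material: str, interior: Box) -> bool:
    """Toy containment: liquids need their whole convex hull inside the
    (convex) interior; solids need only some part inside."""
    if material == "liquid":
        return all(inside_box(v, interior) for v in convex_hull(figure))
    if material == "solid":
        return any(inside_box(p, interior) for p in figure)
    raise ValueError(f"unknown material: {material}")

CUP: Box = (0.0, 0.0, 4.0, 5.0)
SPOON: List[Point] = [(2.0, 1.0), (2.0, 4.0), (2.0, 7.0)]  # handle sticks out
```

With these values, `contained(SPOON, "solid", CUP)` holds because the bowl of the spoon is inside, while treating the same points as a liquid fails because part of the hull lies above the rim.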

1.5 Caveats
An interdisciplinary book like this one is necessarily restricted in scope, and as a result there are several deliberate lacunae. First and foremost, the theory being
21 The convex hull of a region, treated as a set of points S, is the minimal convex set containing S; its boundary is the tightest convex outline enclosing S.


FIGURE 1.1 Acceptability ratings, rotation, and functional information, from Coventry (2003, p. 60)

developed here is essentially a semantic one. As such, questions of pragmatics, which of course are key to the understanding of language in context, are not addressed. We have already observed that the meaning of spatial prepositions, even when putting aside metaphorical uses, can involve functional notions such as support and affordances, i.e. the nature of interactions with the ground object. An especially compelling argument implicating functional notions is found in the experiments of Coventry et al. (2001). They showed subjects pictures of the kind displayed in Figure 1.1 and asked them to rate the acceptability of sentences of the form "the [Figure] is [preposition] the [Ground]", where the prepositions used were over, above, under, and below. For example, a given sentence could be "the umbrella is over the man". Not only were the ratings related to the degree of rotation of the figure from the vertical plane, but ratings for functional scenes (the middle row) were higher than those for controls (top row), which were in turn higher than those for non-functional scenes (bottom row). In addition to Coventry et al. (2001), there has been a substantial number of other psycholinguistic investigations into the acceptability of different spatial terms given geometric and functional relations between figure and ground (e.g. Logan and Sadler 1996; Garrod et al. 1999; Carlson et al. 2003; Coventry 2003), with the latter two developing a psychologically grounded computational model that integrates


these two types of relations. We will not survey these here; suffice it to say that in our framework, as discussed in Chapters 3 and 4, we do not as yet address such functional information or different degrees of centrality in word meaning. Other topics that we leave out include the perceptual accessibility (e.g. visibility and occlusion) of objects to the viewer. Nor do we consider the pragmatic conditions under which particular spatial references take place and succeed (e.g. the speaker's choice of a reference frame and point of view, the details of a spatial description in the presence of particular distractors in the environment, etc.). A good discussion of these and other factors is found in the work of Tenbrink (2007). Finally, a book of this limited length cannot claim to offer a thorough survey of the field; in the course of our exposition, the best we can do is to cite other papers that introduce the reader to the relevant literature.

1.6 Conclusion
Let us first summarize the argument so far. We launched this book with a discussion of the substantial challenges faced by today's text-to-sketch technology in terms of comprehending natural language. We based our approach on two key insights from the previous literature: research on the types of spatial abstractions underlying language use, and the distinction between satellite-framing patterns (used with manner-of-motion verbs like bike, drive, fly, etc.) and verb-framing patterns (used with path verbs such as arrive, depart, etc.). The former provides inspiration for our account of qualitative spatial relations based on a theory of mereotopology, to be explicated in Chapter 3. The latter distinction motivated our differentiating, in our semantic theory, between action-based and path-based predicates, leading to a first-order dynamic logic (discussed in Chapters 2 and 4) where events are modeled as dynamic processes or static situations. For the approach to be of practical use in computational applications, five specific requirements have to be met. When considered in the light of these requirements, the prior theories of spatial prepositions turned out to be rich in fundamental insights, but made assumptions untenable for a computational approach, while also ignoring evidence from corpus-based word-sense disambiguation. While compositional treatments of the semantics of spatial prepositions were available, the question of what underlying spatial primitives to rely on was not tied to those available in qualitative reasoning systems. As for motion verbs, we found a gap in terms of a lack of expressiveness and some specific shortcomings with respect to our desiderata. We indicated how the compositional integration of prepositional, verb, and noun meanings will be handled in our framework. We also proposed what we believe to be a more expressive verb classification than has hitherto been considered. Finally, we listed some of the obvious lacunae in our approach.


In Chapter 2, we will delve more deeply into how motion is expressed in natural languages, introducing a framework that analyzes different parameters of spatial meaning in natural language in terms of successively more expressive representation languages. Following that, in Chapter 3, we will examine spatial and temporal representations and inference methods that have been developed based on qualitative reasoning, applying them to spatial phenomena in language involving topological and orientation relations. Chapter 4 applies the methods discussed in Chapters 2 and 3 to motion, providing a grounding for the semantics of motion expressions in language within a cognitively inspired spatiotemporal model of change. We demonstrate how the two linguistic strategies for encoding motion (that of path constructions and manner-of-motion constructions) can be modeled within an operational (dynamic) interval temporal logic. We also show how prepositional, noun, and verb meanings are integrated together compositionally. Chapter 5 turns to algorithms for linguistic analyses of motion that leverage information from text corpora, delving into the methods and results from corpus annotation. This provides, among other things, for end-to-end systems that allow for automatic text-to-sketch mapping. Finally, in Chapter 6, we summarize our approach and its potential advantages, and discuss more broadly the kinds of new inferential capabilities and applications to which our approach can contribute.
