
Journal of Memory and Language 53 (2005) 292–313

Journal of Memory and Language


www.elsevier.com/locate/jml

Beyond salience: Interpretation of personal and demonstrative pronouns ☆


Sarah Brown-Schmidt a,*, Donna K. Byron b, Michael K. Tanenhaus a
a Department of Brain and Cognitive Sciences, Meliora Hall, University of Rochester, Rochester, NY 14627, USA
b Department of Computer Science and Engineering, Ohio State University, USA

Received 30 August 2004; revision received 16 March 2005
Available online 25 April 2005

Abstract

Three experiments examined the hypothesis that it preferentially refers to the most salient entity in a discourse, whereas that preferentially refers to a conceptual composite. In Experiment 1, eye movements were monitored as participants followed spoken instructions such as, Put the cup on the saucer. Now put it/that... The preferred referent was the theme (cup) for it and the composite (cup on the saucer) for that, with the goal (saucer) rarely chosen. Experiment 2 demonstrated that stressing it reduces the number of theme interpretations. Experiment 3 replicated the main findings from Experiment 1, regardless of whether or not the theme was the backward-looking center. The authors conclude that entities without linguistic antecedents are sometimes preferred over entities with linguistic antecedents and that a single construct such as salience is insufficient to account for differences among referential forms. Candidate reference-specific constructs include the availability of conceptual composites and syntactic role.
© 2005 Elsevier Inc. All rights reserved.
Keywords: Anaphora; Salience; Cognitive status; Demonstrative; Conceptual composite; Eye-tracking; Pronoun

Introduction

A speaker has numerous choices available for referring to an entity that has previously been mentioned or is otherwise salient in the discourse. For example, a glass of whisky can be called it, this, or the drink. A central challenge for models of language processing is to understand what conditions underlie the choices made by speakers and how those choices influence reference resolution for the listener.
☆ This work was supported by NIH Grant HD-27206 to M.K. Tanenhaus.
* Corresponding author. Fax: +1 585 442 9216.
E-mail address: sschmidt@bcs.rochester.edu (S. Brown-Schmidt).

One influential approach within linguistics and computational linguistics relates referential forms directly to the salience of the speaker's representation of the referent (Ariel, 1990; Givón, 1983; Gundel, Hedberg, & Zacharski, 1993). Perhaps the most complete salience-based model is the Givenness Hierarchy proposed by Gundel et al. (1993), as seen in Table 1. The Givenness Hierarchy proposes that entities in a discourse differ in salience (referred to as cognitive status), and can be organized in a subsumption hierarchy rather than into mutually exclusive categories. Each salience category entails all lower categories (categories to the right in the table). Thus, focused items are also activated, familiar, uniquely identifiable, and so forth. Unstressed pronouns (personal pronouns) and zero pronouns may only be used to refer to entities with

0749-596X/$ - see front matter © 2005 Elsevier Inc. All rights reserved. doi:10.1016/j.jml.2005.03.003

S. Brown-Schmidt et al. / Journal of Memory and Language 53 (2005) 292–313

Table 1
The Givenness Hierarchy

Status:  In focus >      Activated >   Familiar >                 Uniquely identifiable >   Referential >                   Type identifiable
Form:    It/them/they    This/that     That (noun)/This (noun)    The (noun)                This (noun) (indefinite use)    A (noun)

focused status. In contrast, demonstrative pronouns and accented personal pronouns may be used to refer to entities with activated status, viz. entities that have been evoked into short-term memory by some trigger in either the discourse or the conversational setting. Gundel et al. (1993) define focus as follows: "the entities in focus at a given point in the discourse will be that partially ordered subset of activated entities that are likely to be continued as topics of subsequent utterances" [p. 279], for example "subjects and direct objects of matrix sentences" [p. 279].

Gundel et al. (1993) combine the salience hierarchy with Gricean-based reasoning. They suggest that informativeness dictates using the appropriate referring expression that is highest on the Givenness Hierarchy. Thus, each form is interpreted as referring to a referent of a particular salience level. For example, use of a demonstrative signals a less salient referent because a more salient referent would have been signaled by a personal pronoun.

The Givenness Hierarchy, and, more generally, the assumption that salience or activation is the unifying dimension that distinguishes among referential forms, is consistent with a range of results in the language processing literature, including those inspired by the influential centering theory (Gordon, Grosz, & Gilliom, 1993; Grosz, Joshi, & Weinstein, 1995; Grosz & Sidner, 1986; van Gompel & Majid, 2004; Walker, Joshi, & Prince, 1998). For example, pronouns and full names are interpreted more quickly when referring to focused and non-focused entities, respectively (Gordon et al., 1993; Hudson, Tanenhaus, & Dell, 1986). Almor (1999) has demonstrated similar results for pronouns and definite noun phrases. The salience-based approach also underlies most computational algorithms for reference resolution, from early work (e.g., Grosz, 1977; Winograd, 1972) up to recent models (Baldwin, 1997; Strube, 1998; Tetreault, 2001).
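The subsumption relation in Table 1 can be made concrete with a short sketch. The Python below is only an illustration and is not part of Gundel et al.'s proposal or of the present study: the names Status, FORM_REQUIRES, licensed_forms, and preferred_form are hypothetical, the form labels abbreviate the entries in Table 1, and the Gricean step is simplified to choosing the most restrictive form that the referent's status licenses.

```python
from enum import IntEnum


class Status(IntEnum):
    """Cognitive statuses; a higher value is further left in Table 1."""
    TYPE_IDENTIFIABLE = 0
    REFERENTIAL = 1
    UNIQUELY_IDENTIFIABLE = 2
    FAMILIAR = 3
    ACTIVATED = 4
    IN_FOCUS = 5


# Minimum status each form requires, read off Table 1 (labels are shorthand).
FORM_REQUIRES = {
    "it": Status.IN_FOCUS,
    "this/that": Status.ACTIVATED,
    "that N / this N": Status.FAMILIAR,
    "the N": Status.UNIQUELY_IDENTIFIABLE,
    "this N (indefinite)": Status.REFERENTIAL,
    "a N": Status.TYPE_IDENTIFIABLE,
}


def licensed_forms(status):
    """Forms usable for a referent with this status.

    Because each status entails all lower ones, a referent licenses every
    form whose required status it meets or exceeds.
    """
    return [form for form, need in FORM_REQUIRES.items() if status >= need]


def preferred_form(status):
    """Simplified Gricean choice: the most restrictive licensed form."""
    return max(licensed_forms(status), key=lambda form: FORM_REQUIRES[form])


if __name__ == "__main__":
    for status in (Status.IN_FOCUS, Status.ACTIVATED):
        print(status.name, licensed_forms(status), preferred_form(status))
```

Under these assumptions an in-focus referent licenses every form, including the demonstrative, whereas an activated but non-focused referent does not license unstressed it. This asymmetry is the one the salience account relies on.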
It is important to note that while most manipulations of salience have focused on linguistic factors, non-linguistic factors are generally understood to also affect referent choice via salience. According to salience approaches, then, linguistic and non-linguistic factors combine to establish the salience of each potential discourse referent, which in turn affects reference choice. However, it remains unclear whether a uniform dimension such as salience, activation, or expectancy (Arnold, Wasow, Losongco, & Ginstrom, 2000) is sufficient to account for how different referring expressions are processed. Clearly, grammatical features associated with pronouns, such as number and gender, influence reference resolution above and beyond salience (Arnold, Eisenband, Brown-Schmidt, & Trueswell, 2000; Sanford & Garrod, 1989). Potentially more problematic for uniform salience-based accounts is the possibility that different referential forms may place different weights on the factors that influence choice of referential form for the speaker and the preferred interpretation for the listener. In such a reference-specific framework, the mapping between referential forms and referents is mediated by multiple factors, with different referential forms being more sensitive to some factors than others. Thus, a complete theory of referring expressions must specify the set of constraints and relative weights for each type of anaphor.

Perhaps the clearest evidence in support of this hypothesis comes from Kaiser (2003). Using corpus analyses, questionnaires and on-line processing studies, Kaiser (2003) and Kaiser and Trueswell (in press) demonstrate that the interpretation of the Finnish pronouns hän (s/he) and tämä (this) cannot be described by a unidimensional representation of salience. Instead, the results indicate that in contexts where the antecedents are full NPs, tämä tends to refer to low-salience referents, and hän to subjects, regardless of word order or salience. In other words, one could hypothesize that in these contexts, hän is more sensitive to information at the syntactic level, and tämä more sensitive to information at the discourse level.

In this article, we explore related hypotheses within the reference-specific framework for the comprehension of the pronouns it and that, which, as we have seen, are adjacent to one another in the Givenness Hierarchy. The first hypothesis we consider is that English demonstrative pronouns preferentially refer to conceptually complex or composite entities. The second is that personal pronouns are sensitive primarily to the salience of the potential referent.
In the next section, we discuss current work on the pronouns it and that, and outline the hypotheses and predictions of the two frameworks regarding the interpretation of it and that. In contrast to the extensive literature on the processing of personal pronouns (see Garnham, 2001), demonstrative pronouns have received relatively little attention in the psycholinguistics and computational linguistics literature, perhaps because they are less common in text compared to personal pronouns. However, recent analyses of conversational speech find that demonstratives occur in much higher proportions in spoken language than in text (Eckert & Strube, 2000), and in some corpora they even appear as frequently as third-person personal pronouns (Byron & Allen, 1998).

Several studies have concluded that demonstratives refer to less-salient entities than personal pronouns. Schuster (1988) presented informants with short discourses (as in example (1)) in which the form of the pronoun alone was varied in the second sentence. Her informants had different preferred interpretations for it and that, as illustrated in the discourse in (1):

(1) a. John thought about becoming a street person.
    b1. It would hurt his mother and it would make his father furious.
    b2. It would hurt his mother and that would make his father furious.

Use of the personal pronoun it in the second conjunct of sentence (1.b1) maintains the reference established by the first it (John's becoming a street person). In contrast, use of the demonstrative pronoun that in the alternative sentence (1.b2) changes the interpretation to something like John's mother being hurt would make his father furious. In a similar study, Borthen, Fretheim, and Gundel (1997) found the same pattern in both English and Norwegian. Consistent with the Givenness Hierarchy, this alternation in meaning can be attributed to the effect of attentional focus: personal pronouns prefer referents that are highly salient, whereas demonstrative pronouns prefer less-salient referents.

A similar distinction between it and that was also found in several studies of naturally occurring discourse. In a corpus of spoken descriptions of apartment layouts, it was used to refer to the room currently being described, which is taken to be the local focus, whereas that was used to refer to portions of the apartment outside the current focus of attention (Linde, 1979). Linde found that it and that were for the most part in complementary distribution. However, a few cases violated this pattern, leading her to conclude that there is some overlap in the conditions for choosing between that and it.
In a set of career counseling interviews, Passonneau (1989, 1993) isolated two factors that characterize the differences between the use of personal and demonstrative pronouns. First, demonstratives were used more than personal pronouns when either the pronoun or its antecedent was not the subject of its clause. Second, when the speaker made reference to an entity described previously by a clausal or sentential argument, the more clause-like the antecedent, the more likely the speaker was to refer to it using a demonstrative pronoun. Passonneau concluded that each of these conditions indicates that demonstratives are used to refer to entities of lower salience than those referred to by personal pronouns. These observations are also generally consistent with Webber's (1991) account of discourse deixis, which characterizes which portions of a text are suitable for reference with a demonstrative pronoun.

However, observations by Channon (1980) suggest that salience might not be the primary factor determining the choice between it and that. Channon (1980) noted that demonstrative pronouns have looser agreement features than personal pronouns. Demonstratives can be used to accomplish identity-of-sense reference (anaphoric mention of an object of the same type as the antecedent, but not the same instance as the antecedent) in addition to identity-of-referent anaphora,1 and their numeric agreement allows non-singular objects to be picked out by a singular demonstrative. He suggested that speakers choose to use a demonstrative when the antecedent is a composite entity with conflicting or unclear semantic features, as in (2):

(2) Patron #1: I'll have the hamburger and fries.
    Patron #2: I'll have that, too.

Here, the composite of hamburger and fries is complex, composed of two individual entities, one of which is singular, and the other plural. Demonstrative pronouns often do not have clearly identifiable linguistic antecedents. As a result, computational algorithms that work well for assigning interpretations to personal pronouns perform poorly with demonstratives (Byron, 2002). The lack of a clear antecedent also complicates the determination of salience for the possible referents when the salience calculation depends on linguistic criteria. For instance, in (2) the hamburger and fries are mentioned in a coordination phrase, so each item on its own should be of roughly equal salience. However, it is unclear how the conjoined referent hamburger + fries should be ranked for salience with respect to the individual entities (see Kaup, Kelter, & Habel, 2002; Sanford & Moxey, 1995; Sanford, Sturt, Moxey, & Morrow, 2004, for discussion and some approaches to handling plural referring expressions with conjoined referents).
In sum, whereas demonstratives have attributes that make them difficult to characterize using a purely salience-based account, the hypothesis that demonstratives prefer complex entities could provide a more accurate characterization of the use of these pronouns. The complex entity hypothesis accounts for Channon's observations as well as those results that have been used to argue that demonstratives are preferentially used to refer to entities that are linguistically less salient than those referred to by personal pronouns.

1 One of the reviewers raised the interesting hypothesis that that may be more acceptable for identity of sense reference compared to it, even when morphosyntactic features of the referent match both pronouns. While this hypothesis warrants further research, the experiments presented here use real world objects, which makes identity of sense interpretations unlikely.

In the following experiments, we examine the interpretation of it and that to evaluate predictions derived from the salience and reference-specific frameworks. The salience hypothesis, as exemplified by the Givenness Hierarchy, predicts that the different referential forms conventionally signal discourse referents that differ in salience. As applied to personal and to demonstrative pronouns, the salience approach predicts that addressees should interpret personal pronouns as referring to the most salient entities in a discourse, and should typically interpret demonstratives as referring to less salient entities. Because the Givenness Hierarchy claims that it is acceptable to refer to a more salient entity using a less specific term, but not vice versa, that could refer to the entity in focus, but unstressed it should never refer to a non-focused entity. However, personal pronouns which are accented or carry extra stress can refer to less salient entities (Akmajian & Jackendoff, 1970; Cahn, 1995; Kameyama, 1999; Lakoff, 1971; Nakatani, 1997). Thus, stressed personal pronouns should behave more like demonstratives.

In contrast, we examine the hypothesis that it prefers to refer to salient entities, and that prefers to refer to conceptually complex entities. The complex entity hypothesis predicts that the demonstrative that is often used to refer to complex or composite entities, such as a hamburger and fries, or a cup on a saucer. While the Givenness Hierarchy may accurately describe overall interpretation patterns, alternations in referring forms are not exclusively a function of the relative accessibility of referents. Instead, the observed accessibility-form mapping is a by-product of the conditions governing use of a particular referential expression, with each referential form placing different weights on the factors that influence the use and interpretation of referential expressions.
For example, full noun phrases may be preferred when the speaker chooses to shift focus from one entity to another, call attention to a particular feature of an established discourse referent, or introduce a new discourse topic, a move that may also introduce new discourse referents (Vonk, Hustinx, & Simons, 1992). Speakers' tendencies to use demonstratives to refer to conceptual composites should create distributional patterns that result in a bias for readers and listeners to interpret demonstratives as referring to complex referents, when they are available. In contrast, personal pronouns like it are primarily sensitive to salience factors and preferentially used to refer to focused entities. Thus, addressees should interpret it as referring to the entity in focus.

To evaluate these hypotheses, we created short discourses in which participants were instructed to move everyday objects, such as cups and saucers, or children's blocks. For example, the first utterance might instruct the participant to Put the cup on the saucer. In an utterance like this, the theme (the cup) is predicted to be more salient than the goal (the saucer) because the cup is introduced first in the sentence and because an item in the grammatical role of direct object is typically considered to be more salient than items in adjunct positions. Using grammatical roles to determine the relative salience of entities in a sentence has a long-standing tradition in computational linguistics (see for instance Grosz et al., 1995; Winograd, 1972) as well as in theoretical accounts of accessibility (for example Ariel, 1990; Gundel et al., 1993). The ordering typically used for English is Subject > Direct Object > Indirect Object > Adjuncts. This instruction also creates a potential composite, the cup on the saucer. Crucially, the linguistic expression that introduces the cup on the saucer is not a linguistic constituent in the utterance, whereas the individual entities the cup and the saucer are. The next instruction contains the personal pronoun it, as in Now put it over by the lamp, or the demonstrative pronoun that, as in Now put that over by the lamp.

In Experiment 1, we manipulated the availability of conceptual composites in two ways. First, we instructed the participant to put the theme object (the cup) on the goal object (the saucer) or next to the goal object. A cup on a saucer intuitively creates a more natural composite object than a cup next to a saucer. Additionally, reading studies suggest that plural pronouns more easily refer to two individuals when they are closer in space (Carreiras, 1997). Second, we used different categories of objects, either wooden blocks or everyday objects such as cups and saucers. Using two types of objects allows us to test interpretation preferences for different types of referents.
Additionally, we expected that two functionally related objects such as a cup on a saucer might form a more natural composite than two stacked blocks, which are less individuated from the other objects in the scene.

In Experiments 2a and 2b, we investigated the role of prosody in the interpretation of it and that, examining the effects of prosodic emphasis when composites are or are not available in the scene. These experiments test the hypothesis that accented pronouns refer to non-focused entities (Cahn, 1995; Kameyama, 1999; Nakatani, 1997). In Experiment 3, we manipulated the coherence of the discourse to test the hypothesis that the preference to interpret it as the most salient entity is stronger when the most salient entity is more strongly in focus.

Finally, these experiments allow us to investigate an assumption that is often implicit in models of reference resolution. Most formal theories of reference assume that all referential expressions, with the exception of certain types of ellipsis and, in some grammatical frameworks, empty categories, are linked to conceptual or discourse entities that are introduced by, or evoked by, linguistic expressions (antecedents) or relevant non-linguistic context (e.g., Heim, 1983; Kamp, 1981; Webber, 1979). However, many computational and psycholinguistic models of reference resolution implicitly or explicitly assume that discourse entities with linguistic antecedents are given initial priority. Experimental investigations often only examine linguistically introduced expressions, and often assume that a referring expression accesses its discourse referent by reactivating the linguistic antecedent with which it co-refers (e.g., Nicol & Swinney, 1989). We should note, though, that computational algorithms for reference resolution initially evaluate a ranked list of entities introduced linguistically as a heuristic because few systems have a way to evaluate conceptual factors. The underlying theories typically assume that some form of cognitive status, such as salience, determines the set and ranking of entities.

The conceptual composites created in the discourses we examine offer an interesting case because the first instruction of each discourse introduces a focused entity (the theme) and a salient, but non-focused entity (the goal), each of which has a linguistic antecedent. In contrast, the conceptual composite does not have a linguistic antecedent. Thus, if entities with linguistic antecedents are evaluated first, then the theme and the goal should be evaluated before a composite. And, if entities with linguistic antecedents are always more salient than entities without linguistic antecedents, then it should prefer goal interpretations over composite interpretations.
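The antecedent-priority assumption described above can be sketched as a simple ranking procedure. The Python below is a hypothetical illustration, not an implementation of any of the cited algorithms: the names ROLE_RANK, Entity, and antecedent_first_ranking are assumptions introduced here, the role ordering is the Subject > Direct Object > Indirect Object > Adjuncts ordering mentioned earlier, and an entity with no linguistic antecedent is marked with role None.

```python
from dataclasses import dataclass
from typing import Optional

# Lower rank = more salient, following the role ordering typically used
# for English: Subject > Direct Object > Indirect Object > Adjuncts.
ROLE_RANK = {"subject": 0, "direct_object": 1, "indirect_object": 2, "adjunct": 3}


@dataclass(frozen=True)
class Entity:
    name: str
    role: Optional[str]  # None means the entity has no linguistic antecedent


def antecedent_first_ranking(entities):
    """Rank candidates with linguistic antecedents first, by grammatical role.

    Entities without a linguistic antecedent (e.g., a conceptual composite)
    are appended afterwards, which encodes the assumption under test.
    """
    with_antecedent = sorted(
        (e for e in entities if e.role is not None),
        key=lambda e: ROLE_RANK[e.role],
    )
    without_antecedent = [e for e in entities if e.role is None]
    return with_antecedent + without_antecedent


# Candidates after the instruction "Put the cup on the saucer."
candidates = [
    Entity("cup on saucer (composite)", None),
    Entity("saucer (goal)", "adjunct"),
    Entity("cup (theme)", "direct_object"),
]

if __name__ == "__main__":
    for position, entity in enumerate(antecedent_first_ranking(candidates), 1):
        print(position, entity.name)
```

Under this assumed procedure the goal outranks the composite, so a listener who rejects the theme should fall back on the goal before considering the composite. The experiments below test whether that ordering holds.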

Experiment 1: Object coherence

This experiment uses the action-based version of the visual-world paradigm (Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995) to compare the predictions that the salience and reference-specific frameworks make for the interpretation of it and that. Both frameworks make the prediction that it preferentially refers to the focused, or most salient, entity. For the demonstrative, the salience-based framework predicts that that should be interpreted as referring to a non-focused entity, whereas the reference-specific framework makes the prediction that the demonstrative that is preferentially interpreted as referring to a complex entity such as a conceptual composite, when one is available (Channon, 1980). We measured eye movements because the participants' off-line choices might not reflect early processing of the pronouns. Initial interpretation preferences should be reflected in eye movements beginning 200 ms after the onset of the pronoun, based on earlier work on the on-line interpretation of pronouns using the visual world eye-tracking technique (Arnold et al., 2000). This version of the visual world paradigm uses real-world referents, a feature that allows us to examine the participants' final interpretation of the pronoun, and to compare the status of linguistically and non-linguistically introduced referents.

Method

Participants

Sixteen participants were recruited from the University of Rochester undergraduate community and paid for their participation. All participants were native speakers of North American English, with normal or corrected-to-normal hearing and vision.

Materials

On each trial, four objects were arranged on a table in front of the participant as illustrated in Fig. 1. Two classes of objects were used: children's blocks and everyday objects. The children's blocks were six small, brightly colored blocks with different shapes (all cuboids) and colors (yellow, red, purple, blue, orange, and green). The everyday objects (see first table of Appendix A) were slightly larger than the blocks, and varied more in size and shape. The four objects, such as a cup and a saucer, and a toy lamp and a table, were selected to form two (separate) coherent wholes when the objects were placed together.

Experimental trials. Example sets of instructions from trials using the children's blocks and the objects are presented in examples 3 and 4, respectively.

Fig. 1. A 2-D representation of the experimental display, for the blocks (A), and objects (B) conditions of Experiment 1.


(3) a. Put the red block next to the blue block.
    b. Now put that on the green block.
    c. Put the green block on the yellow block.
    d. Now put the blue block on the yellow block.
    (Display includes: a red, blue, green, and yellow block.)

(4) a. Put the cup on the saucer.
    b. Now put it over by the lamp.
    c. Put the table next to the lamp.
    d. Now put the cup in front of the table.
    (Display includes: cup, saucer, table, and lamp.)

In both examples, the first sentence (3a and 4a) introduces two discourse entities, one in the role of theme (the object being moved) and one in the role of goal (the destination of the moved object). The theme (the red block in 3a, the cup in 4a) is predicted to be the most salient entity at the onset of Now in the (b) sentences. The (b) sentences contained a pronoun, either it or that. Eye movement data were analyzed from the onset of the pronoun until participants completed an action in response to that command. The third and fourth instructions never contained pronouns, and eye-tracking data associated with these sentences were not analyzed.

Sixty-four instruction sets, each containing four instructions, were used in the study. Thirty-two contained blocks and 32 contained common objects. Half of these trials contained a pronoun in the second instruction. Stimulus sentences were recorded by the first author, speaking as naturally as possible. The natural stress on that was somewhat more pronounced than the stress on it; that was longer than it by an average of 88 ms. Previous observations that prosodic differences in a single word can be foreshadowed in length differences earlier in a phrase (see Arnold, Fagnano, & Tanenhaus, 2003) prompted us to examine length differences in the words preceding the pronoun. Our carrier phrases were Now put it/that next to... Now was significantly shorter for phrases containing that compared to phrases containing it.
While the length of put was equivalent for that and it phrases, the time in between the offset of put and the onset of the pronoun was longer for that phrases, suggesting that the longer duration of that is reflected in the prosody of the preceding words and pauses. We will address these timing issues when we turn to analyses of the eye movement data.

Filler trials. The 32 filler trials and the two filler sentences on each target trial (e.g., 3c and d; 4c and d) never contained pronouns and were designed to equate the probability of manipulating each object or composite, given the previous instruction. The sentences in (5) are an example filler trial.

(5) a. Put the nest next to the hamburger.
    b. Now put the plate under the hamburger.
    c. Put the bird in the nest.
    d. Now put the hamburger and the bird in front of the nest.
    (Display includes: hamburger, plate, bird, and nest.)

Equating the probability of manipulating each object, given the previous instruction, was important in order to decrease the likelihood that participants would interpret a critical pronoun as referring to simply the most frequently moved entity (e.g., the theme, goal, or composite from the preceding trial).

We manipulated the following three factors: pronoun type (it vs. that); object type (everyday objects vs. children's blocks); and whether the prepositional phrase introducing the goal in the first instruction was on top of or next to. In all conditions, the item that is the theme of the (a) sentence is the predicted focus when the pronoun is encountered. We will refer to the manipulation of the preposition as the location manipulation, since the preposition determined the location of the theme with respect to the goal. The object type manipulation was blocked (1st vs. 2nd half of the experiment) and block order was counterbalanced across participants. Pronoun type and preposition type were both manipulated using a modified Latin square design.2 Each participant was presented with a total of 32 target pronoun sentences, four in each of the eight different conditions. The trials using children's blocks were organized into one pseudo-random order. The six different colored blocks were randomly assigned to the sentences for two different lists (plus two corresponding reverse-order lists). The trials using everyday objects were organized into a different pseudo-random order, and the individual items (such as the trial with the cup, saucer, lamp, and table) were rotated through the four conditions, yielding four lists (plus four reverse-lists).

Procedure

After the eye-tracker was calibrated, participants were seated at a table on which the experimenter placed the experimental stimuli, which included everyday objects and children's blocks.
Participants were instructed to follow pre-recorded spoken instructions to manipulate the objects. They were told that the task was fairly easy and that they should do the first thing that came to mind when they heard the instructions. Eye movements were measured using a light-weight, head-mounted ASL
2 An error in list design created an imbalance in one of the blocks lists such that one condition appeared five rather than four times (and another condition occurred three rather than four times). In the objects lists, each item was accidentally rotated through three of the four conditions; however, the rotations were balanced such that each person saw an equal number of items in each condition. Other than these minor deviations, the design followed the Latin squares method.


Fig. 2. Experiment 1 referent selections for blocks (A) and objects (B) conditions split by object location (on top/next to), and pronoun (it/that). Grey portion, theme responses; white, composite; black, goal.

Series 5000 eye-tracker. Software superimposed fixations on a video-record taken from a 60 Hz camera mounted on the headband. The actions and eye movements of the participants were hand-coded by the experimenters using a frame-accurate digital VCR.

Results

Referent choice

On the experimental trials, e.g., Put the cup next to the saucer. Now put it/that..., participants typically moved either the theme (the cup) or the composite (both the cup and saucer) to the specified location. We will refer to the object(s) they moved as the referent of the pronoun. A small number of responses fell into one of three other categories (accounting for 2% of the data): selection of the goal (e.g., the saucer), one of the other two items in the scene (e.g., the lamp or table, in Example 4), or selecting three or more items. Due to the low frequency of occurrence, we will not discuss those responses except to note that the composite, which was not introduced in a linguistic constituent, was preferred to the goal, which did have a linguistically introduced antecedent.

It and that clearly had different preferred referents. Participants tended to interpret it as the theme of the preceding utterance and that as the composite (see Figs. 2A and B). However, the availability of a conceptual composite influenced the interpretation of both it and that. Placing the theme on top of the goal object increased the number of composite interpretations for both it and that. This effect was strongest for the conditions with everyday objects. These observations were confirmed by analyses of variance, using subjects as the random factor. Items analyses for the blocks trials are not informative because the items in the blocks list were very similar to one another, and the six different colored blocks were randomly grouped for each trial.
An ANOVA by subjects, with the proportion of theme responses as the dependent variable and pronoun type (it/that), object location (on top/next to), and object type (objects/blocks) as factors, showed significant main effects of all three factors, as well as a significant object type by object location interaction (see footnote 3). The three-way interaction and the remaining two-way interactions were not significant (all Fs < 1.5). The main effect of pronoun was due to more theme interpretations for it than for that, F(1, 15) = 46.23, p < .0001. The main effect of object location was due to more theme interpretations in the next to condition compared to the on top condition, F(1, 15) = 35.33, p < .0001. The main effect of object type was due to more theme interpretations in the blocks conditions compared to the objects conditions, F(1, 15) = 6.72, p < .05. The significant object type by object location interaction, F(1, 15) = 20.54, p < .001, was due to a greater preference for the composite interpretation in the on top/objects condition compared to the on top/blocks condition, t(15) = 3.69, p < .01, and no effect of object type in the next to condition, t(15) = .17, p = .87.

A separate items analysis for the objects condition yielded a pattern similar to the subjects analysis. Note that a between-items analysis was used due to the problem in list rotation. The effects of both pronoun, F2(1, 39) = 49.07, p < .0001, and location, F2(1, 39) = 97.22, p < .0001, were significant, and they did not interact.

In summary, the preferred interpretation of it was the theme, and the preferred interpretation of that was the composite. However, the interpretation of both pronouns was modulated by factors that affected how easily the theme and goal could be construed as a composite entity. Both pronouns were more likely to be interpreted as the composite when the theme and goal were semantically related objects, and when the theme was placed on top of the goal, making a visually salient grouping. Composite interpretations were strongest in the on top/objects condition, where the semantic and visual grouping cues worked together. In this condition, it and that were interpreted as the composite 40% and 88% of the time, respectively.
Despite being linguistically introduced in the immediately preceding sentence, the goal was not a viable referent for interpretation of either pronoun. The fact that the goal was underneath the theme in the on top conditions may have contributed to this effect; however, in the next to conditions, the goal was physically as available as the theme, and goal interpretations never exceeded 7%. Additionally, recall that participants were equally likely to manipulate the top and bottom objects of a composite in the filler sentences, which should reduce bias against manipulating the bottom object.

Footnote 3: One item from the objects condition was not included in the analyses because it contained the noun grapes, which may not be suitable for reference with the personal pronoun.

Fixations to potential referents

We analyzed the timing with which participants looked at the objects in the display in order to examine the time course with which potential referents were considered. We assume, following other research on anaphora using the visual world paradigm, that looks to an object following an anaphor indicate that the object is being considered as a potential referent (Arnold & Eisenband et al., 2000; Runner, Sussman, & Tanenhaus, 2003). We first conducted an analysis of eye movements beginning at the onset of the pronoun in phrases like 3b and 4b, and continuing until the participant moved and released an object in response to the command. The average onsets of it and that were 316 and 358 ms after the onset of Now, respectively. To account for this difference, we aligned the eye movement data at the onset of the pronoun in each trial separately. The destination word (e.g., lamp in example 4b) began approximately 400 and 500 ms after the onsets of it and that, respectively.

Fixations to an object were defined as a sequence of fixations (at minimum two frames) on a single object, or within 1 degree of visual angle of the edge of the object. Following the first instruction, the theme and goal items were physically very close to one another. Thus, distinguishing looks which are close to both the theme and goal is difficult given the resolution limit of the eye-tracker (.1 degree of visual angle, at best). To resolve this difficulty, we implemented a strict coding scheme.
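The two-frame minimum and the alignment to pronoun onset described above can be illustrated with a small sketch. It assumes, purely for illustration, that the hand-coded record is a list with one object label per video frame (or None when no object is fixated); the function name, the data layout, and the ms_per_frame parameter are hypothetical and are not taken from the study.

```python
from itertools import groupby


def code_fixations(frame_labels, pronoun_onset_frame, ms_per_frame, min_frames=2):
    """Collapse per-frame object labels into fixations aligned to pronoun onset.

    frame_labels: list with one label per video frame (None = no object).
    pronoun_onset_frame: index of the frame at which the pronoun begins.
    ms_per_frame: frame duration in ms (an assumed input, not from the paper).
    Returns a list of (object, start_ms, end_ms), times relative to the pronoun.
    """
    fixations = []
    frame = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        # Keep only runs of at least `min_frames` consecutive frames on one object.
        if label is not None and length >= min_frames:
            start_ms = (frame - pronoun_onset_frame) * ms_per_frame
            end_ms = (frame + length - pronoun_onset_frame) * ms_per_frame
            fixations.append((label, start_ms, end_ms))
        frame += length
    return fixations
```

Aligning each trial to its own pronoun onset, as here, is what lets trials with different onsets of it and that be averaged on a common time axis.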
For fixations near two objects, we counted the fixation as being on the object the fixation crosshair was closer to. While limitations of the experimental display and tracker resolution increase noise in the eye-tracking data, the coding scheme ensures that this noise equally affects the two critical objects.

If a participant considers the theme to be the referent, we expect increased looks to the theme beginning approximately 200-400 ms following pronoun onset and decreased looks to the goal. In contrast, if a participant only looks to the goal, this would suggest that the participant was considering a goal interpretation. However, it is less obvious what the pattern should be when the participant considers the composite. In this situation, the participant might fixate the goal object or alternate looking between the theme and goal. Thus, while we expect more variability when participants are considering the composite, when comparing goal vs. composite interpretations, we expect more goal looks when the participant interprets the pronoun as the goal, and when comparing theme vs. composite interpretations, we expect more theme looks when the participant interprets the pronoun as the theme.

First, we present analyses of the proportion of eye fixations to the theme and goal across the three conditions that we manipulated: type of object, pronoun, and location. Recall, however, that these analyses are confounded by the fact that participants made different numbers of theme and composite choices across conditions. In order to partially account for these differences in final interpretation, we next present action-contingent analyses (Runner et al., 2003) in which we separately examine trials for which the participants adopted theme and composite interpretations.

Upon the onset of the pronoun in critical phrases, fixations were primarily limited to the theme and goal, with a later rise in fixations to the destination location and few looks to the other object in the scene. To simplify our analyses, we focus on the proportion of fixations to the theme, goal, and destination. Additionally, in order to simplify presentation of the eye fixation data, in the eye fixation graphs (but not the analyses), we collapse the data from the two different object types (blocks and objects) together.

Figs. 3A and B show the proportion of fixations to the theme, goal, and destination for the next to/it and next to/that conditions. As with the overall pattern of fixations, initial eye movements were primarily to the theme and goal, followed by a later rise in looks to the destination location. Eye movements associated with the pronoun that elicited substantially more looks to the goal object. In the next to/it condition, looks to the goal fall below 20% at about 1100 ms; in contrast, in the next to/that condition, looks to the goal remain high, reflecting the larger number of composite interpretations in this condition. In the on top/it condition, there was an early preference for looks to the theme (Fig.
3C). In contrast, in the on top/that condition (Fig. 3D), we initially see an almost equal proportion of looks to theme and goal, followed by sustained looks to the goal which do not drop off as they do in the it condition.

The fixation data were analyzed in three 600 ms epochs, beginning 200 ms after the pronoun onset, to allow time for the programming of eye movements (Matin, Shao, & Boff, 1993). The first two epochs capture the earliest effects we would expect to see based on participants hearing the pronoun. The onset of the destination information (e.g., lamp in example 4b) begins shortly before the beginning of Epoch 2, so we expect to see destination-related effects primarily in Epoch 3. Following Arnold et al. (2003), we analyzed the eye-tracking data in terms of theme advantage, in order to evaluate the relative preference for the theme over the


Fig. 3. Experiment 1. Relative proportion of fixations to theme, goal, and destination for next to/it (A), next to/that (B), on top/it (C), and on top/that (D), for both blocks and objects.

goal. The theme advantage was calculated for each segment and condition separately by subtracting the proportion of fixations to the goal from the proportion of fixations to the theme. Separate (planned) ANOVAs by subjects at each epoch were conducted to examine the time course of eye movement effects. Theme advantage was used as the dependent variable, and object type (blocks and objects), location (next to/on top), and pronoun (it/that) were used as factors.

At the first and second epochs, we observed only a main effect for pronoun, epoch 1: F1(1, 15) = 7.91, p < .05, epoch 2: F1(1, 15) = 17.81, p < .001, due to a higher theme advantage score (more theme looks) for it compared to that. Thus the pattern of eye movements reflects the participants' choices, beginning shortly after the pronoun. The ANOVA at epoch 3 revealed a main effect of pronoun type, F1(1, 15) = 21.44, p < .001, due to a larger theme advantage for it, as well as a marginal effect of object type, F1(1, 15) = 3.71, p = .07. This was due to a larger theme advantage for everyday objects compared to blocks. At the third epoch, we also observed a significant object type by location interaction, F1(1, 15) = 15.17, p < .01, which was due to a larger theme advantage for next to compared to on top in the objects condition, F1(1, 15) = 7.42, p < .05, and no difference between next to and on top in the blocks condition, F1(1, 15) = .35, p = .56.

The items analysis by epoch for the objects condition showed a similar pattern; while no effects were significant at the first epoch, the pronoun effect was significant at epoch 2, F2(1, 39) = 8.23, p < .01. At epoch 3, both the pronoun, F2(1, 39) = 5.61, p < .05, and location effects, F2(1, 39) = 5.61, p < .05, were significant. The interaction did not approach significance.

Taken together, the pattern of looks to potential referents supports two conclusions. First, the strongest and most consistent effect we observed is a higher proportion of theme fixations for it compared to that; the pronoun effect was significant at all three epochs and for both blocks and objects. Second, the interpretation of both pronouns is affected by the availability of a composite, indicated by significant decreases in the theme advantage when the objects were on top rather than next to each other, as well as an increase in composite selections for the on top condition.

The theme preference for it is consistent with the hypothesis that it is primarily used to refer to the most salient entity. However, the increase in off-line composite interpretations for it when a composite was available, and the fact that increasing the availability of the composite affected the on-line processing of it, together suggest that non-linguistically mentioned entities can sometimes be more salient than the linguistic focus. Additionally, that showed a composite bias that was also modified by the availability of the composite. In fact, in cases where the composite was available, we observed an increase in composite selections for both it and that. This finding is inconsistent with a purely salience-based account, which would predict that that should be preferentially interpreted as the less salient entity.
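The theme-advantage measure used in these analyses (proportion of fixations to the theme minus proportion of fixations to the goal, within each of three 600 ms epochs starting 200 ms after pronoun onset) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes the data are a list of (time, label) samples already aligned to pronoun onset, and it computes proportions over samples within an epoch, which is one possible reading of "proportion of fixations". The names are hypothetical.

```python
# Epoch boundaries in ms relative to pronoun onset: three 600 ms windows
# beginning 200 ms after onset, as described in the text.
EPOCHS_MS = [(200, 800), (800, 1400), (1400, 2000)]


def theme_advantage(samples, epochs=EPOCHS_MS):
    """Return one theme-advantage score per epoch (None if the epoch is empty).

    samples: list of (time_ms, label) pairs, with labels such as
    "theme", "goal", or "destination" (assumed coding, for illustration).
    """
    scores = []
    for start, end in epochs:
        labels = [label for t, label in samples if start <= t < end]
        if not labels:
            scores.append(None)
            continue
        p_theme = labels.count("theme") / len(labels)
        p_goal = labels.count("goal") / len(labels)
        # Positive values mean more looks to the theme than to the goal.
        scores.append(p_theme - p_goal)
    return scores
```

A score near zero is what one would expect when looks are split between theme and goal, as in the composite pattern, whereas a clearly positive score corresponds to a theme preference.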


The eye movement analyses that we have presented thus far combine trials in which participants chose the composite with trials in which participants chose the theme. This is standard practice in studies of reference resolution during reading, in which there is no clear measure of the participants' final interpretation. However, this is potentially problematic when comparing anaphors that have different preferred interpretations, or when an anaphor has several potential referents and the preferred referent differs across conditions. The task we used, however, allows us to bypass this problem by separating trials based on the participants' final interpretation of the pronoun. Thus, we conducted action-contingent analyses in which we examined eye movements based on the final referent choice.
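An action-contingent split of this kind amounts to grouping trials by the referent the participant finally moved before summarising looks. The sketch below illustrates the idea under assumed data structures: each trial is a dictionary with hypothetical keys "condition", "choice", and "looks" (a mapping from time in ms to the fixated object). None of these names come from the study.

```python
from collections import defaultdict


def split_by_choice(trials):
    """Group trials by (condition, final referent choice)."""
    groups = defaultdict(list)
    for trial in trials:
        groups[(trial["condition"], trial["choice"])].append(trial)
    return dict(groups)


def proportion_looking(trials, target, time_ms):
    """Proportion of the given trials in which `target` is fixated at time_ms."""
    hits = sum(1 for trial in trials if trial["looks"].get(time_ms) == target)
    return hits / len(trials)
```

For example, proportion_looking(groups[("on top/it", "composite")], "theme", 400) would give the share of composite-choice trials with a look to the theme 400 ms after pronoun onset, which can then be compared with the same quantity for theme-choice trials.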

If the eye movement patterns we observed are due to early, response-independent processing driven solely by the anaphor, we might expect to see a signature pattern of early eye movements for each anaphor, regardless of the final interpretation. Alternatively, the earliest eye movements might reflect the final interpretation.

Action-contingent analyses. Due to problems of small sample sizes, we collapsed across the object-type variable and will focus primarily on descriptive analyses. First, we present the data from trials in the on top/that condition in which participants chose the composite (Fig. 4A), because this condition strongly facilitated composite interpretations. Fig. 4B presents fixations for trials in the next to/it condition in which participants

Fig. 4. Experiment 1. Relative proportion of fixations to theme, goal, and destination for on top/that (chose composite), and next to/it (chose theme), respectively, for blocks and objects together.

Fig. 5. Experiment 1. Relative proportion of fixations to theme, goal, and destination for both blocks and objects. (A and B) on top/it condition where participants chose the composite, and the theme (respectively). (C and D) next to/that condition where participants chose the composite, and theme (respectively).


chose the theme, because this is the condition that best supported the preferred interpretation of it. The pattern of fixations in Fig. 4A shows strong, long-lasting competition between theme and goal, reflecting the fact that the participants formed composite interpretations on these trials. In contrast, the fixation pattern in Fig. 4B shows an immediate theme bias and a quick drop in the fixations to the goal.

Next, consider the processing of it in the on top condition when participants selected either the composite or the theme (Figs. 5A and B, respectively). The fixation pattern when participants chose the theme shows the same pattern as the next to/it condition (Fig. 4B). In addition, when participants selected the composite, there was still an initial theme bias, with the composite pattern emerging later. These results suggest that listeners initially considered the theme interpretation for it on all trials, later rejecting the theme interpretation on some proportion of trials when there was also a composite available.

Figs. 5C and D show the eye movements associated with the next to/that condition on trials where participants selected the composite and the theme (respectively). When participants selected the composite (Fig. 5C), we see the fixation pattern associated with composites, with looks equally distributed between the theme and goal. In contrast, when participants chose the theme (Fig. 5D), we see a substantial theme advantage, comparable to the pattern seen for interpretations of it in the next to condition (Fig. 4B). In both the next to/that and on top/it conditions, there are two clear patterns of interpretation of the pronoun (as the theme or composite), which are reflected not only in the behavioral data, but also in the early on-line data. When it and that were assigned theme interpretations, the pattern of early eye movements was very similar, suggesting that similar referents were considered.
However, for composite interpretations there was a suggestion of an initial theme bias for it but not for that.

Experiment 1: Conclusions

We replicated previous findings by Linde (1979), Passonneau (1989, 1993), Schuster (1988), and Borthen et al. (1997) that it and that differ in their preferred interpretation, with it preferentially referring to the more focused entity. However, we found that it was not only sensitive to linguistic focus; when a composite was available in the scene (in the objects, on top condition) we observed frequent composite interpretations for it. The fact that the action-contingent analyses showed a theme bias for it regardless of the participants' final choice indicates that it is highly sensitive to the linguistically determined focus even in these cases. However, on the assumption that it preferentially refers to the most salient entity, increases in composite interpretations in the objects/on top condition demonstrate that an entity without a linguistic antecedent can compete with the most salient entity that has a linguistic antecedent. These results are problematic for models that assume that entities with linguistic antecedents have privileged status during initial reference resolution.

The results for that are problematic for models in which reference resolution is based purely on salience. According to a salience account, conditions that increase it-interpretations should decrease that-interpretations because that does not prefer the most salient alternative, whereas it does. However, in the on top conditions where it was more likely to be interpreted as the composite, and thus would sometimes be the most salient alternative, there was an increase in composite interpretations for that. This is consistent with the prediction that the demonstrative pronoun would be interpreted as the composite if one was available, regardless of its salience.

In the next to conditions, where a composite was not as available, there was no increase in goal fixations or goal interpretations for that. Rather, fixations to the theme increased, along with an increase in theme interpretations (compare the fixation patterns for the next to/that and the on top/that conditions). This result is not entirely consistent with the hypothesis that that prefers a less-salient entity, which would have predicted a preference for goal referents over themes. Instead, we suggest an alternative explanation: when a complex entity is not available, the demonstrative that is interpreted as referring to a task-relevant entity. Here, the theme may be more relevant to the task than the other objects because it was just moved, and the listener could plausibly assume that it should be moved again, despite the fact that filler trials were designed to decrease the likelihood that participants expected a particular kind of continuation. The hypothesis that task-relevant objects are preferred for that when no composite is available will require further exploration.
The eye-tracking results for it and that show that initial preferences for the two pronouns diverge starting as early as 200 ms following the onset of the pronoun, with a larger theme bias for it. This pattern, in conjunction with the effects of object location and object type seen in epoch 3, indicates that the eye movement data are highly consistent with overall off-line selections. Moreover, the fact that the eye movement data do not indicate a divergence between the overall off-line response preferences and on-line interpretations suggests that, in the following three experiments, we can further explore participants' interpretation preferences by examining referent selection alone.

Experiments 2a and 2b: Prosodic effects

In Experiment 1, we demonstrated that the interpretation preferences for it and that are clearly different, with it preferentially interpreted as the theme and that as the composite, and that the proportion of composite interpretations was modulated by the availability of a composite. However, an important difference between it and that is that that tends to receive more prosodic stress than it. In the stimuli used in Experiment 1, that was longer than it by an average of 88 ms. The preferred referent of a stressed pronoun is thought to be less salient than that of its unstressed counterpart, suggesting that stress may be a component of the differences between the pronouns observed in Experiment 1. In Experiment 2a, we manipulate the availability of a composite and also vary the stress on it and that.

Comparing stressed and unstressed variants of that and it is important for at least three reasons. First, we need to establish that the difference in preferred interpretation for it and that is not simply due to the fact that the pronouns differ in stress. Second, on the assumption that stressed pronouns refer to less salient entities, comparing stressed and unstressed versions of it and that should add to our understanding of the relative salience of the discourse referents. Finally, although there have been a number of proposals about the interpretation of stressed pronouns, experimental research on the issue has been equivocal.

Various proposals regarding the use of stress (or pitch accenting) for personal pronouns claim that a stressed pronoun is used when the intended referent is not the focused entity. The classic example (due to Lakoff, 1971) in (6) illustrates the alternation in meaning between unstressed and stressed versions of the subject pronoun he and object pronoun him. The example in (7) shows a similar alternation (original example due to Akmajian & Jackendoff, 1970), but does not contain the second pronoun.

(6) a. John called Jim a republican, then he insulted him.
    b. John called Jim a republican, then HE insulted HIM.
(7) a. Lolita slapped Doloris and then she hit Humbert.
    b. Lolita slapped Doloris and then SHE hit Humbert.

Adding stress to the first pronoun in (6) shifts the interpretation of he from the focus (John) to the second-mentioned (thus less accessible) entity, Jim. Notice that the interpretation of the object pronoun also seems to change (see Smyth, 1994, for a discussion of accented object pronouns). Example 7 is more similar to our stimuli in that the sentence only contains a single pronoun. Here, the unaccented she in 7a is typically interpreted as the focus, Lolita, while the accented she in 7b shifts the interpretation to Doloris. Accenting a pronoun is typically thought to shift the interpretation to a less-salient entity (see Cahn, 1995; Nakatani, 1997), though examples of stressed pronouns in cases where only a single referent is available have been used to suggest that stressed pronouns indicate rhetorical contrast instead (de Hoop, 2004). A different approach by Kameyama (1999) proposes that there is a complementary relationship between stressed and unstressed pronouns in the local domain. When two or more possible referents are in that domain, the unstressed pronoun refers to the most salient entity and the stressed pronoun refers to the least salient entity.

Previous studies examining the effects of prosodic stress on pronoun interpretation have failed to find systematic effects when prosody is taken as an isolated cue. Wolters and Byron (2000) analyzed the use of both accented demonstratives and personal pronouns in task-related discourse. They did not find any prosodic features that could be used reliably to determine attributes of the pronoun's referent, such as the distance from the antecedent to the pronoun or the syntactic properties of the antecedent. In another corpus study, stress on subject pronouns with gender was found to be a signal of other discourse properties, such as an implied contrast, rather than a clue to the pronoun's referent (Wolters & Beaver, 2001). In an on-line interpretation study, Venditti, Stone, Nanda, and Tepper (2002) and Venditti, Trueswell, Stone, and Nautiyal (2003) found that a stressed subject pronoun causes participants to look at both potential referents until additional material in the sentence helps disambiguate the pronoun's meaning. These authors therefore concluded that listeners are unable to use prosodic accent on its own as a disambiguating clue. We are not aware of any on-line studies comparing stressed and unstressed that.

Experiments 2a and 2b investigated how adding stress to the pronouns it and that affected their interpretation.
The participants manipulated everyday objects that were not easily viewed as composites (e.g., a frog and candle); we will return to this design change in the discussion. We tested participants using both stressed versions of the pronouns and unstressed versions like those in Experiment 1.

Experiment 2a evaluated three alternative hypotheses. The first is that stressing the personal pronoun shifts the interpretation away from the focus. In order to evaluate this hypothesis, we compare the interpretation of stressed and unstressed versions of it, with a prediction of fewer theme selections for the stressed pronoun. The second hypothesis is that stressing a personal pronoun results in a preference for the least salient entity, i.e., the goal. The third hypothesis is that the primary difference between unstressed it and that is stress. In order to evaluate this hypothesis, we compare unstressed that with both stressed and unstressed versions of it, with a prediction that unstressed that should be similar to stressed it but different from unstressed it.


Experiment 2b is a follow-up study in which the materials were recorded differently. The sentences for Experiment 2a were recorded with natural prosody, which resulted in prosodic differences between the stressed and unstressed sentences that emerged before the onset of the pronoun. Thus, we cannot be sure whether the effects of stress on it are due to the extra stress on it, or to a change in the entire prosodic contour of the phrase. Experiment 2b used an identical design to Experiment 2a, except for the following changes: (1) We carefully spliced the stressed and unstressed pronouns into the carrier phrases to allow only the pronoun to change between conditions (on average). (2) The word Now was removed from the instructions because the length of Now was strongly affected by the stress on the pronoun. These materials carefully controlled the prosody, but sounded much less natural.

Method

The methodology used in Experiments 2a and 2b is generally the same as in Experiment 1, so we note only the differences in the technique. Sixteen participants from the University of Rochester undergraduate community participated in Experiment 2a, and a separate set of 16 participated in Experiment 2b. Each experiment lasted approximately 90 min.

Materials

An example instruction set is given in (8). All trials used the everyday objects for both experiments. Additionally, the objects were not presented in functionally related pairs as they were in Experiment 1. A full list of materials is included in Appendix A.

(8) a. Put the balloon next to the road.
    b. (Now) put that on the clip.
    c. Put the hamburger on the balloon.
    d. (Now) put the road behind the hamburger.
(Display includes: a balloon, road, clip and hamburger. Now was not included in the 2b stimuli.)

In addition to the pronoun (it/that) and preposition (on top/next to) manipulations, we manipulated whether the pronoun received extra stress. Each participant saw 32 target items, four in each of eight conditions.
The target sentences were of the same structure as those in Experiment 1, and were interspersed with 32 fillers of the same structure, but which did not contain pronouns. The 32 target items were rotated through the eight conditions in eight lists (plus 8 reverse-order lists) using a modified Latin squares design. The items in each list were randomly ordered, except that adjacent trials were never in the same condition.
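The counterbalancing described here can be sketched with a plain Latin-square rotation. The paper reports a modified Latin squares design whose details are not given, so the rotation rule below (item index plus list index, modulo the number of conditions) is an assumption, as are the function names; fillers and the reverse-order lists are left out. The ordering step uses simple rejection sampling to satisfy the constraint that adjacent trials never share a condition.

```python
import random

# The eight conditions: pronoun x location x stress (labels are illustrative).
CONDITIONS = [(pronoun, location, stress)
              for pronoun in ("it", "that")
              for location in ("on top", "next to")
              for stress in ("stressed", "unstressed")]


def rotate_lists(n_items=32, conditions=CONDITIONS):
    """Build one list per condition shift; each list pairs items with conditions."""
    n = len(conditions)
    return [[(item, conditions[(item + shift) % n]) for item in range(n_items)]
            for shift in range(n)]


def no_adjacent_repeats(trials):
    """True if no two neighbouring trials are in the same condition."""
    return all(a[1] != b[1] for a, b in zip(trials, trials[1:]))


def order_trials(trials, rng):
    """Shuffle until no two adjacent trials share a condition."""
    trials = list(trials)
    while True:
        rng.shuffle(trials)
        if no_adjacent_repeats(trials):
            return trials
```

With 32 items and eight conditions, each list contains four items per condition and each item appears in every condition exactly once across the eight lists, for example via order_trials(rotate_lists()[0], random.Random(0)).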

Details of the sentence recordings. Stimulus sentences for Experiments 2a and 2b were pre-recorded by the same speaker as in Experiment 1. Each critical sentence (b) was separately recorded in four different ways: with it or that, and with normal stress or extra stress. The stressed pronouns were the most acoustically prominent word in the sentence, and were audibly different from the unstressed versions of the pronouns. We confirmed the stress manipulation by measuring the duration of the stressed and unstressed pronouns, as duration is an acoustic correlate of stress (see discussion in Hirschberg, 1993). We also measured the pitch excursion (maximum pitch) of each pronoun, and performed a ToBI analysis on a subset of the pronouns from each experiment.

For Experiment 2a, the mean durations of the stressed and unstressed versions of it were 281 ms (standard error = 8.6) and 70 ms (3.9), respectively, and the mean durations of the stressed and unstressed versions of that were 290 ms (7.1) and 143 ms (6.3), respectively. An ANOVA showed a main effect of stress, F(1, 31) = 600.2, p < .0001, and a main effect for type of pronoun, F(1, 31) = 64.0, p < .0001. A significant stress by pronoun type interaction, F(1, 31) = 31.71, p < .0001, was due to shorter durations for it than that in the unstressed conditions, paired t(31) = 12.07, p < .0001, and no difference between the pronouns in the stressed condition, t(31) = 1.0, p = .32. Additionally, the unstressed pronouns were similar in length to those used in Experiment 1. The unstressed tokens of it did not differ from those in the Experiment 1 objects condition, paired t(31) = .5, p = .62. The unstressed tokens of that in Experiment 1 were on average 159 ms (6.3), only slightly longer than the tokens in Experiment 2a, t(31) = 1.91, p = .06. The mean pitch excursions for pronouns in Experiments 2a and 2b are presented in Fig. 6.
An ANOVA for pitch excursion with stress (stressed, unstressed) and pronoun (it, that) as factors revealed only a main effect of stress, F(1, 29) = 48.82, p < .0001,

Fig. 6. Mean pitch excursion for pronouns in Experiments 2a and 2b.


due to larger pitch excursions for the stressed pronouns. The duration of words leading up to the pronoun was also analyzed. Now was longer in the unstressed conditions, for both it, 108.8 ms (4.1) stressed vs. 162.9 ms (2.9) unstressed, and that, 106.6 ms (2.4) stressed vs. 133.4 ms (3.4) unstressed. The stressed versions had longer pauses after the pronoun: pause after it = 22 ms (6.6) stressed vs. 0 ms unstressed, and that = 74 ms (2.8) stressed vs. 52 ms (2.9) unstressed.

A ToBI analysis of a randomly selected subset of 40 pronouns from Experiment 2a (10 of each of the four types) confirmed these basic differences between the unstressed and stressed pronouns. Each of the 10 unstressed tokens of it was unaccented, and each token of stressed it (n = 10) and stressed that (n = 10) received an L + H* accent (Pierrehumbert & Hirschberg, 1990). Six of the unstressed tokens of that were unaccented, while four tokens received an H* accent, due to a slight pause after the pronoun. The findings for unaccented that are consistent with the duration analyses, and confirm that unaccented that receives more prosodic weight than it. Additionally, the clear difference in accent type between the unstressed and stressed conditions (unaccented/H* vs. L + H*) indicates that our manipulation of stress was successful.

For each item in Experiment 2b, one of the two recordings for it, and one of the two recordings for that, was used as a carrier phrase (carrier phrases were equally distributed across the stressed and unstressed conditions). The pronoun was extracted from the second recording and spliced into this carrier phrase. A third sentence for each of the two pronoun conditions was recorded using the same stress as the carrier phrase. The pronoun in these third sentences was spliced into the carrier phrase, to ensure that all pronouns shared the feature of being spliced. Co-articulation with the pronoun made splicing between the pronoun conditions impossible.
All splicing was done using SoundEdit software and minimized the presence of odd noises (such as clicks) which can sometimes occur during cross-splicing. While the cross-splicing was done carefully, the sentences were not as natural sounding as those in Experiment 2a because the prosody of the carrier phrase was (by design) not always consistent with the prosody on the pronoun. Despite this inconsistency, we felt it was important to investigate whether the effects we observed in Experiment 2a would persist when only the stress on the pronoun changed (on average) between conditions. An analysis of the duration of Put and the space before the pronoun indicated our efforts at splicing were successful. Except for the expected differences between the pronouns, there were no differences between the stressed and unstressed conditions in the length of Put, stressed vs. unstressed for it = 142.0 ms (5.8), 147.0 ms (3.4), and stressed vs. unstressed for that = 96.0 ms (3.1), 95.4 ms (3.3). An ANOVA for the length of Put

with pronoun type (it/that), stress (stressed/unstressed) as factors, found no effect for stress, F (1, 31) = .80, p = .38, and no interaction with pronoun type, F (1, 31) = 1.2, p = .29. There were no differences between the stressed and unstressed versions of either pronoun in the length of the pause after the pronoun, stressed vs. unstressed for it = 0 ms (0), 0 ms (0), and stressed vs. unstressed for that = 29.0 ms (5.1), 30.5 ms (5.4). Finally, an ANOVA for pitch excursion with stress (stressed and unstressed), and pronoun (it, that) as factors revealed only a main effect of stress, F (1, 31) = 25.20, p < .0001, due to a larger pitch excursion for the stressed pronouns. A ToBI analysis of a subset of 40 pronouns indicated that the unstressed tokens of it were all unaccented (n = 10), and the stressed tokens of both it (n = 10) and that (n = 10) all received the L + H* accent. The analysis of the unstressed tokens of that indicated that 2/10 were unaccented, and the remaining eight received the H* accent. The larger proportion of H* accents for unstressed that in Experiment 2b is likely due to differences in phrasal prosody in Experiment 2b. In order to be able to splice the pronouns, the Now was removed from the critical instructions, and the speaker had to pause slightly after the unaccented pronoun.

Experiments 2a and 2b: Results

We analyzed the referent selection data in the same manner as Experiment 1, in order to facilitate comparisons between the experiments. We focus on comparing the interpretation of the stressed and unstressed pronouns, to assess whether they are similarly affected by the presence of prosodic stress. The results from Experiments 2a and 2b were similar. The primary difference between the two experiments was a lower baseline of theme responses in Experiment 2b.

Referent choice

When participants heard instructions such as Put the balloon next to the road. Now put it/that..., they typically moved either the theme (the balloon) or the composite (both the balloon and road) to the specified location. On a small number of trials, participants selected the goal (e.g., the road), however this represented only 4% of the data. Due to the low frequency of occurrence, we will not discuss those responses further except to note that they are inconsistent with Kameyama's (1999) complementarity hypothesis, which predicts that stressed it should prefer goal interpretations. Replicating the patterns seen in Experiment 1, the unstressed pronouns had different interpretation preferences, with it interpreted as the theme, and that as the composite, and the interpretation of both pronouns was affected by the location manipulation. In addition, the stressed form of it decreased the proportion of theme


interpretations, whereas the stressed form of that had little effect on interpretation preferences. Figs. 7A and B show the proportion of trials in which participants selected the theme, goal or composite for Experiment 2a. Figs. 8A and B show the proportion of trials in which participants selected the theme, goal or composite for Experiment 2b.

To add to the power of our analysis and compare the results in Experiments 2a and 2b, we analyzed the data from the two experiments together. An ANOVA for the proportion of theme responses with pronoun (it/that), object location (on top/next to), and stress (stressed/unstressed), as well as the between-subjects experiment factor (Experiment 2a/2b) revealed a main effect of pronoun type, due to significantly more theme responses for it, F1 (1, 30) = 42.74, p < .0001, F2 (1, 62) = 248.06, p < .0001. The main effect of location was significant, F1 (1, 30) = 17.24, p < .001, F2 (1, 62) = 14.39, p < .001, due to more theme responses in the next to condition. The main effect of stress was due to significantly more theme responses for the unstressed pronouns, F1 (1, 30) = 7.29, p < .05, F2 (1, 62) = 7.0, p < .05. The main effect of Experiment was marginal in the subjects analysis, and significant in the items analysis, F1 (1, 30) = 4.14, p = .05, F2 (1, 62) = 141.5, p < .0001, and was due to fewer theme responses in Experiment 2b. The pronoun by stress interaction was significant, F1 (1, 30) = 9.68, p < .01, F2 (1, 62) = 9.39, p < .01, and was due to a significant decrease in theme responses for stressed compared to unstressed versions of it, F1 (1, 30) = 9.88, p < .01,

F2 (1, 62) = 12.50, p < .001, and no effect of stress for that, F1 (1, 30) = 1.0, p = .34, F2 (1, 62) = .55, p = .46. We observed an interaction between pronoun and experiment that was significant only in the items analysis, F1 (1, 30) = 1.73, p = .2, F2 (1, 62) = 15.02, p < .001. This interaction is likely due to the fact that the theme bias for it was slightly larger in Experiment 2a, F1 (1, 15) = 52.31, p < .0001, F2 (1, 31) = 121.98, p < .0001, than Experiment 2b, F1 (1, 15) = 9.66, p < .01, F2 (1, 31) = 104.63, p < .0001. We also observed a significant interaction between location and experiment, F1 (1, 30) = 5.56, p < .05, F2 (1, 62) = 5.28, p < .05. This interaction was a result of the fact that the location effect was not significant for Experiment 2b, F1 (1, 15) = 1.92, p = .19, F2 (1, 31) = .9, p = .35, whereas for Experiment 2a we observed significantly more theme responses in the next to than the on top condition, F1 (1, 15) = 18.26, p < .001, F2 (1, 31) = 12.82, p < .01. A closer look at the response pattern for Experiment 2b suggests a hint of a location effect for the goal responses, with more goal selections in the next to condition than the on top condition, however the small number of trials and numerous missing cells makes a statistical comparison impossible.

In summary, we replicated the patterns seen in Experiment 1 for the unstressed pronouns; basic differences in the interpretation of it and that were modulated by factors that increased the availability of the composite. The data from the stressed pronouns adds to our understanding of these interpretation preferences by showing that adding stress to it decreases theme interpretations, but stress has no effect on the demonstrative that. Finally, although the object pairs we used in this experiment did not form natural groups as they did in Experiment 1, we observed an almost identical pattern of interpretation for the unstressed pronouns as we did in the Experiment 1 objects conditions, and more composite interpretations than in the blocks condition. One reason why participants may have easily interpreted the mismatching objects as a composite is that they may have invented idiosyncratic groupings for the object pairs. The Experiments 2a and 2b object pairs have more features than those in the blocks condition, thus composites may be more likely with the objects. Example 9 shows one of the object pairs in Experiment 2a, which included a frog and a doll's leg.

(9) a. Put the leg on top of the frog.
    b. Now put that next to the boat.

This particular trial elicited 100% composite responses when the pronoun was that (regardless of the object location manipulation), perhaps because participants associated frogs and legs (as in frog-leg soup) or possibly because (in the on top condition), the leg balanced nicely on top of the frog. Unlike the blocks of Experiment 1, objects with multiple salient features may be more able to form a conceptual composite, prompting composite interpretations of that.

Fig. 7. Experiment 2a: proportion of theme, goal, and composite selections for stressed (str) and unstressed (unst) it and that in the next to (A) and on top (B) conditions. Grey portion, theme responses; white, composite; black, goal.

Fig. 8. Experiment 2b: proportion of theme, goal, and composite selections for stressed (str) and unstressed (unst) it and that, in the next to (A) and on top (B) conditions. Grey portion, theme responses; white, composite; black, goal.

Experiments 2a and 2b: Conclusions

In summary, we replicated the off-line findings of Experiment 1 for an overall theme bias for it, and a pragmatic effect for both pronouns, due to increasing the availability of the composite. Additionally, we found that stressing it decreased the proportion of theme interpretations, due to an increase in the proportion of both composite and goal interpretations.
This result supports the first hypothesis, that stressing the personal pronoun shifts the interpretation away from the focus, and is consistent with previous claims that stressed pronouns refer to non-focused or less salient entities (Akmajian & Jackendoff, 1970; Cahn, 1995; Lakoff, 1971; Nakatani, 1997). However, the dominant interpretation of stressed it was still the theme, a finding that is somewhat inconsistent with the second hypothesis, that stressing a personal pronoun results in a preference for the least salient entity (e.g., the goal). While the basic theme preference casts doubt on Kameyama's (1999) complementary preference hypothesis, the small increase in goal responses for stressed it is consistent with her basic claim. The results for the demonstrative showed a different pattern, with no difference between the stressed and unstressed versions of that, and more composite responses compared to stressed it. These results are inconsistent with the third hypothesis that the primary difference between unstressed it and that is stress. The findings for that instead suggest that the demonstrative may be insensitive to added stress, perhaps because it already receives more prosodic weight than the personal pronoun, as evidenced by the H* accents on some of the unstressed tokens of that. Systematically manipulating the type of accent (unaccented, H*, L + H*) could provide insights into the role, if any, of stress on the interpretation of that.

The fact that we observed effects of stress in Experiment 2b that were comparable to those in Experiment 2a suggests that the effect of prosody for it in Experiment 2a was not primarily due to differences in the intonation of the carrier phrase, but specifically due to the change in the stress on the pronoun. The consistent lack of an effect of stress for that is likely due to the fact that that normally receives more stress than it, and that stress on the demonstrative pronoun is not interpreted contrastively as it is for the personal pronoun.

Taken together, the results for unstressed and stressed versions of it, combined with the results for both versions of that, show a basic difference between the pronouns, as well as a stress effect for it, with unstressed it having the largest number of theme interpretations, that (stressed and unstressed) having the fewest theme interpretations, and stressed it somewhere in-between. The fact that responses to stressed it are in-between those for unstressed it and both versions of that, even when the composite is most competitive with the theme (on top conditions), supports three conclusions: (a) the personal pronoun prefers the most salient entity; (b) stressing it weakens this preference; and (c) the primary difference between the pronouns is not due to stress. Rather, the demonstrative that preferentially refers to composite or task-relevant entities.
Finally, in comparing the results of Experiments 2a and 2b, the numerically lower rate of theme responses in Experiment 2b is likely due to the fact that the sentence prosody was odd, a result of the cross-splicing manipulation. Additionally, removing Now from the critical pronoun sentences in Experiment 2b may have decreased the continuity of the instructions. When Now is used as a discourse marker (as it is in these contexts), it can mark a change of focus in the discourse (Grosz & Sidner, 1986), or a further development of the previous topic (Reichman, 1985). Removing Now may have decreased the likelihood that the second instruction was interpreted as a continuation of the first, thus mitigating the relative salience of the theme established in the first instruction, and resulting in a decrease in overall theme selections.
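The F1 (by-subjects) and F2 (by-items) statistics reported for Experiments 2a and 2b both start from the proportion of theme responses in each design cell, aggregated once over subjects and once over items. The following is a minimal sketch of that aggregation step only, not the authors' analysis code. The function name, the dictionary keys, and the toy trials are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def theme_proportions(trials, unit):
    """Proportion of theme responses per (unit, pronoun, stress) cell.

    unit is "subject" for a by-subjects (F1-style) aggregation
    or "item" for a by-items (F2-style) aggregation.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [theme responses, trials]
    for trial in trials:
        cell = (trial[unit], trial["pronoun"], trial["stress"])
        counts[cell][0] += trial["response"] == "theme"
        counts[cell][1] += 1
    return {cell: theme / total for cell, (theme, total) in counts.items()}


# Invented trials for illustration only; these are not the experimental data.
toy_trials = [
    {"subject": 1, "item": "A", "pronoun": "it", "stress": "unstressed", "response": "theme"},
    {"subject": 1, "item": "B", "pronoun": "it", "stress": "unstressed", "response": "composite"},
    {"subject": 2, "item": "A", "pronoun": "it", "stress": "unstressed", "response": "theme"},
    {"subject": 2, "item": "B", "pronoun": "it", "stress": "stressed", "response": "goal"},
]

by_subject = theme_proportions(toy_trials, "subject")
by_item = theme_proportions(toy_trials, "item")
```

The two resulting tables of cell proportions would then be passed to separate repeated-measures ANOVAs, one treating subjects and one treating items as the random factor.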

Experiment 3: Discourse coherence

In each of the preceding experiments, the theme (e.g., the cup in example 4a) has the highest salience according


to grammatical role ranking when this first sentence concludes. However, the influential centering framework (Grosz et al., 1995) asserts that a within-sentence ranking of salience is not as reliable a predictor of pronoun interpretation preferences as a ranking that takes into account cross-sentence coherence. Centering theory predicts that, in order to maximize discourse coherence, pronouns preferentially select for referents that are minimally oblique in the previous sentence and that maintain topicality across sequences of sentences. A typical obliqueness ranking is Subject < Direct Object < Indirect Object < Adjuncts. Centering defines the backward-looking center as the least oblique element of the previous sentence which is repeated in the current sentence. Crucially, according to this definition, the first sentence of a discourse has no backward-looking center, and therefore the preferred pronoun interpretation in the second sentence of a discourse is only weakly established. Thus, it is not until the third sentence of a discourse that both of the centering rules come into play for pronoun interpretation preferences.

Recall the materials used in Experiment 1, which are repeated in (10). Following 10a, the highest ranked entity, the cheese, should be the preferred referent for a subject pronoun in the following sentence, 10b. However, the fact that there is no preceding linguistic context before 10a, causes predictions regarding interpretation of a pronoun in 10b to be less clear. The possibility that the theme was not adequately focused at the onset of the pronoun may account for the large (up to 40%) number of trials in which our participants interpreted the personal pronoun it as referring to a relatively less salient entity such as the composite. In Experiment 3, we used a two-sentence sequence to establish the theme as the most salient entity according to both grammatical role ranking and Cb-continuity.
If it was frequently interpreted as the composite in Experiments 1 and 2 because the theme was not the backward-looking center, we should observe more theme interpretations for it when we have two context sentences, rather than the single context sentence as in Experiments 1 and 2b.

Method

To directly compare the results of Experiment 3 with those of Experiment 1, we returned to using object pairs that were conceptually related (see object list in Appendix A). In addition to manipulating the pronoun (it/that), and the preposition (next to/on top), we manipulated whether there was a single context sentence before the critical pronoun sentence (as in Experiments 1 and 2b), or two context sentences (see examples 10 and 11). The one-sentence condition (10) differs from the two-sentence condition (11), in that (11) more clearly establishes the b-sentence's theme as

the focus when the sentence containing the critical pronoun begins:

(10) a. Put the cheese next to the cracker.
     b. Now put it/that in front of the candle.
     c. Put the candleholder behind the candle.
     d. Now put the candleholder next to the candle.

(11) a. Put the candle and the cheese to the right of the candleholder.
     b. Now put the cheese next to the cracker.
     c. Now put it/that in front of the candle.
     d. Now put the cheese and the candle in front of the candleholder.

In sentence 10b, when the participant encounters the pronoun, two of the four objects have been mentioned in the discourse. The cheese is of higher salience than the cracker, because it has been mentioned first, however it is not the backward-looking center of sentence 10a, because there was no prior sentence. In contrast, in 11c, when the participant hears the pronoun, the cheese is the most salient entity in the discourse because it was mentioned before the cracker in the previous sentence. The cheese is also the backward-looking center of 11b, because it is the highest ranked (and only) discourse entity in 11b that was also mentioned in 11a. The fact that the cheese is more clearly established as the focus in (11) predicts a stronger theme preference for it in 11c, than in 10b, on some accounts.

The filler trials did not contain pronouns and were constructed in the same way as Experiments 1 and 2b. Equating the probability of moving each of the objects on filler trials makes the sequences of instructions less predictable. However, an effect of discourse coherence should be observable when evaluating the effect of the number of context sentences on the interpretation of the pronoun.

Results

The results of Experiment 3 are comparable to the off-line results in the Experiment 1 objects condition. We observed no effect of the focusing manipulation. Thus, the fact that the theme was not the backward-looking center in Experiments 1 and 2b is not responsible for the high proportion of composite responses.
Instead, it appears to be sensitive to the availability of the composite. Figs. 9A and B show the proportion of theme, goal and composite responses in each of the eight conditions. As in Experiments 1 and 2b, the location of the theme and goal objects at the beginning of the critical instruction clearly affected the degree to which participants interpreted the pronoun as the composite, with far more composite interpretations in the on top conditions for both pronouns.
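The backward-looking center reasoning applied to examples (10) and (11) in the Method can be made concrete with a short sketch. It implements only the definition quoted earlier (the least oblique element of the previous sentence that is repeated in the current sentence) and is not a full centering algorithm. The role labels assigned to each noun phrase, including treating the locative phrases as adjuncts, are assumptions made for illustration, as are all function and variable names.

```python
# Obliqueness ranking from the text: a lower number means less oblique.
OBLIQUENESS = {"subject": 0, "direct_object": 1, "indirect_object": 2, "adjunct": 3}


def backward_looking_center(previous, current):
    """Return the least oblique entity of `previous` that recurs in `current`.

    Each utterance is a list of (entity, grammatical_role) pairs.
    Returns None when there is no previous utterance or nothing is repeated.
    """
    if not previous:
        return None
    mentioned_now = {entity for entity, _ in current}
    repeated = [
        (OBLIQUENESS[role], entity)
        for entity, role in previous
        if entity in mentioned_now
    ]
    return min(repeated)[1] if repeated else None


# Hypothetical role coding of examples (10a), (11a), and (11b).
s10a = [("cheese", "direct_object"), ("cracker", "adjunct")]
s11a = [("candle", "direct_object"), ("cheese", "direct_object"),
        ("candleholder", "adjunct")]
s11b = [("cheese", "direct_object"), ("cracker", "adjunct")]

cb_10a = backward_looking_center([], s10a)    # discourse-initial sentence
cb_11b = backward_looking_center(s11a, s11b)  # the cheese carries over
```

Under this coding, the discourse-initial sentence 10a has no backward-looking center, whereas 11b has the cheese as its center, which is the contrast the focus manipulation relies on.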


Fig. 9. Experiment 3. Proportion of theme, goal, and composite selections for trials with two context sentences before the pronoun (A), and trials with one context sentence before the pronoun (B), split by pronoun (it/that) and location (next to/on top). Grey portion, theme responses; white, composite; black, goal.

As before, we analyze the proportion of theme responses. Less than 4% of the responses were goal selections, thus higher proportions indicate fewer composite responses. An ANOVA for the proportion of theme responses with pronoun (it/that), object location (on top/next to), and focus (2 sentence/3 sentence) as factors, revealed a main effect of pronoun, which was due to more theme responses in the it condition, F1 (1, 15) = 47.81, p < .0001, F2 (1, 31) = 324.0, p < .0001. The main effect of object location was due to more theme responses in the next to condition, F1 (1, 15) = 23.97, p < .001. There was no effect of focus, F1 (1, 15) = .002, p = .97, F2 (1, 31) = .00, p = .99. The pronoun factor interacted with object location, F1 (1, 15) = 4.93, p < .05, F2 (1, 31) = 23.25, p < .0001. Separate ANOVAs indicated that this interaction was due to a larger location effect for that than for it. The effect for it was small, 98% (SD = 2%) theme responses for next to vs. 88% (4.8%) for on top, F1 (1, 15) = 7.8, p < .05, F2 (1, 31) = 9.2, p < .01, whereas the effect for that was much larger, 52% (5%) theme responses for next to vs. 23% (5%) theme responses for on top, F1 (1, 15) = 15.76, p < .01, F2 (1, 31) = 38.76, p < .0001. The location by focus interaction was not significant for it in either the subjects or items analysis, F1 (1, 15) = .72, p = .41, F2 (1, 31) = .81, p = .37, nor was it significant for that, F1 (1, 15) = 2.89, p = .11, F2 (1, 31) = 1.09, p = .30.

Experiment 3: Conclusions

In summary, the results of Experiment 3 clearly replicate the results seen in Experiments 1 and 2b for the 2-sentence condition. We replicated the theme bias for it as well as the main effect of location, suggesting that both pronouns are affected by increasing the availability of the composite.
Additionally, we observed a significant pronoun by location interaction in this condition, due to a slightly weaker location effect for it than that, indicating that that was particularly sensitive to the availability of a composite in the scene. Crucially, the number of context sentences did not affect interpretation. Thus the frequent interpretation of it as the composite is not due to failure to establish the theme as the backward-looking center in Experiments 1 and 2b.
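The pronoun by location interaction can be read directly off the theme-response percentages reported in the Experiment 3 results. The short sketch below only reproduces that arithmetic from the four reported cell means; it is not a statistical test, and the variable and function names are illustrative.

```python
# Theme-response percentages reported in the Experiment 3 results.
theme_pct = {
    ("it", "next to"): 98,
    ("it", "on top"): 88,
    ("that", "next to"): 52,
    ("that", "on top"): 23,
}


def location_effect(pronoun):
    """Drop in theme responses (percentage points) from next to to on top."""
    return theme_pct[(pronoun, "next to")] - theme_pct[(pronoun, "on top")]


effect_it = location_effect("it")
effect_that = location_effect("that")
difference = effect_that - effect_it  # how much larger the effect is for that
```

With these means the location effect is 10 percentage points for it and 29 for that, so the effect for that is 19 points larger, which is the pattern the interaction reflects.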

General conclusions

In this series of experiments we investigated the on- and off-line interpretation of the personal pronoun it and the demonstrative pronoun that. In Experiment 1, we found a difference between the pronouns in both the eye movement and the referent selection data, with it preferentially referring to the theme, and that referring to the composite. Preferences for both pronouns were modulated by the availability of the composite, as manipulated by the location of the two most recently mentioned objects (e.g., on or next to). On the assumption that it refers to the most salient entity in a discourse, the increase in composite interpretations for it in some conditions indicates that a referent without a linguistic antecedent can compete with the most salient linguistically introduced referent. Consistent with this conclusion is the fact that composite interpretations were overwhelmingly preferred to goal interpretations in every condition. These findings are consistent with salience-based approaches, so long as linguistically introduced antecedents are not given priority over non-linguistically introduced antecedents. However, given a salience-based approach, if the large proportion of composite selections for it in the on top/objects condition means that the composite can sometimes be more salient than the theme, then decreased composite interpretations would be predicted for that in these cases. However, that is not what we found. Instead, the conditions that increased composite interpretations for it also increased composite interpretations for that, suggesting that a purely salience-based approach may be incomplete.

The results for Experiments 2a and 2b replicate the basic findings of Experiment 1 with object pairs that do not form natural composites. The main finding of these experiments is that adding stress to it weakens


the theme preference, but adding stress to that has no effect. Finally, the results of Experiment 3 replicate the off-line results of Experiment 1, and suggest that it is sometimes interpreted as the composite even when the theme is both the discourse focus as well as the most salient item of the prior sentence. Taken together, our findings for that are most consistent with the complex-entity hypothesis, and our findings for it are most consistent with the salience hypothesis, so long as non-linguistically introduced referents can be as salient as referents which were linguistically introduced.

These results support five conclusions. First, while the relationship between different referring forms and their typical referents is clearly affected by salience, as described in Gundel et al. (1993), Ariel (1990), and Givón (1983), understanding the preference for different anaphoric forms must also take into account how each form weighs the multiple factors that can influence reference resolution. These factors cannot easily be subsumed under a uniform dimension, such as salience, even when augmented by Gricean inference. Cross-linguistic support for a reference-specific framework comes from work by Kaiser (2003) and Kaiser and Trueswell (in press), which identified two factors, grammatical role and salience, that differentially influence the interpretation of demonstrative and personal pronouns in Finnish. Second, the interpretation of both personal and demonstrative pronouns is strongly influenced by non-linguistic factors such as expectations about the task, goals and the visual environment. This finding is consistent with a growing body of research demonstrating social, task-based, and contextual influences on language understanding (Bangerter, 2004; Beun & Cremers, 1998; Chambers, Tanenhaus, Eberhard, Filip, & Carlson, 2002; Chambers, Tanenhaus, & Magnuson, 2004; Grodner & Sedivy, in press; Metzing & Brennan, 2003).
Therefore, models of reference interpretation and computational algorithms for reference resolution must be expanded to account for these factors. Third, entities without linguistic antecedents can be more salient than entities that have linguistic antecedents. This finding is consistent with the observation that many computational algorithms perform poorly when assigning reference for demonstratives because demonstratives do not typically have a linguistically introduced antecedent (Byron, 2002). The fact that a referent which does not have an antecedent which is a linguistic constituent is sometimes preferred over one that does, highlights the need to understand: (a) how to identify referents which are non-linguistically introduced into discourses, and (b) how to quantify the salience of these referents, in comparison to those which are introduced linguistically. Fourth, stressing a personal pronoun shifts interpretation away from the focused or most salient entity. This finding provides experimental support for claims made by Akmajian and Jackendoff (1970), Cahn (1995), Kameyama (1999),

Lakoff (1971), and Nakatani (1997). However, the related finding that the dominant interpretation of stressed it is still the focused entity is not consistent with the original claims of Kameyama (1999) that the preferred referent of a stressed pronoun is the least salient referent in the local domain. We suspect that given the right circumstances, adding stress to it might eliminate the bias to refer to the focused entity, however more work is needed to identify and understand these factors. Finally, and most generally, in future research, it will be important to augment salience-based models of reference resolution to take into account the different constraints associated with each referential form. Models of reference resolution will be increasingly successful at characterizing the interpretation of different kinds of expressions in a variety of settings, including face-to-face conversation, narratives and task-based dialog when we can: (a) identify the reference-specific factors that guide the use and understanding of each type of referential form and (b) characterize and quantify the influence of non-linguistic and contextual factors on reference use and understanding, especially for referents that do not have linguistic antecedents.

Appendix A
Experiment 1: Object pairs 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. soap; soap dish lamp; table cup; saucer car; toy road anchor; boat egg; small nest buttery; ower bird; large nest slice of cheese; cracker dragony; lotus plant hamburger; plate candle; candleholder bow; present bee; bunch of grapes ower pot; pot holder picture; picture frame

Experiment 2a: Object pairs 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. anchor; corkscrew balloon; road bird; zip-lock bag bow; spoon bowl; can bug; sponge buttery; frame car; block clip; hamburger cracker; boat crayon; pillow cup; present (continued on next page)

S. Brown-Schmidt et al. / Journal of Memory and Language 53 (2005) 292313 Appendix A (continued) 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. doll; plate dragony; board duck; can opener egg; legal pad ower; hot pad hat; bear lamp; cd leg; frog lightbulb; candle lotus plant; candleholder nest; soap nut; picture plant; rug post-it pad; table rock; notebook shovel; cd case spider; saucer teabag; jar truck; sock whisk; soap dish Appendix A (continued) 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. buttery; pink ower candle; candleholder car; toy road cheese; cracker cup; saucer dragony; lotus plant egg; small nest hamburger; plate lamp; table picture; picture frame plant; rug owerpot; pot holder soap; soapdish

311

References
Akmajian, A., & Jackendo, R. (1970). Coreferentiality and stress. Linguistic Inquiry, 1, 124126. Almor, A. (1999). Noun-phrase anaphora and focus: The informational load hypothesis. Psychological Review, 106, 748765. Ariel, M. (1990). Accessing noun-phrase antecedents. London: Routledge. Arnold, J. E., Eisenband, J., Brown-Schmidt, S., & Trueswell, J. C. (2000). The rapid use of gender information: Evidence of the time course of pronoun resolution from eyetracking. Cognition, 76, B13B26. Arnold, J. E., Fagnano, M., & Tanenhaus, M. K. (2003). Disuencies signal theee, um, new information. Journal of Psycholinguistic Research, 32, 2536. Arnold, J. E., Wasow, T., Losongco, T., & Ginstrom, R. (2000). Heaviness vs. Newness: The eects of structural complexity and discourse status on constituent ordering. Language, 76, 2855. Baldwin, B. (1997). Cogniac: High precision coreference with limited knowledge and linguistic resources. In Operational factors in practical, robust anaphora resolution for unrestricted texts (ACL-97 workshop) (pp. 3845). Bangerter, A. (2004). Using pointing and describing to achieve joint focus of attention in dialogue. Psychological Science, 15, 415419. Beun, R.-J., & Cremers, A. H. M. (1998). Object reference in a shared domain of conversation. Pragmatics & Cognition, 6, 121151. Borthen, K., Fretheim, T., & Gundel, J. K. (1997). What brings a higher-order entity into focus of attention? Sentential pronouns in English and Norwegian. In R. Mitkov & B. Boguraev (Eds.), Operational factors in practical, robust anaphora resolution for unrestricted texts (pp. 8893). Association for computational linguistics. Byron, D. K. (2002). Resolving pronominal reference to abstract entities. In Proceedings of the 40th annual meeting of the association for computational linguistics (ACL-02) (pp. 8087). Association for Computational Linguistics. Byron, D. K., & Allen, J. F. (1998). Resolving demonstrative pronouns in the TRAINS93 corpus. 
In New approaches to discourse anaphora: proceedings of the second colloquium on (continued on next page)

Experiment 2b: Object pairs 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. anchor; can basket; board bird; can of tuna bow; cup buttery; soap can opener; picture car; hot pad cd; eraser chocolate; boat clip; jar corkscrew; red block cracker; bowl doll; pot dragony; frame grapes; candle hamburger; frog hat; present lamp; little pink plate leg; big plate lightbulb; cd case lotus plant; table nest; candleholder notebook; jar of coee ower; zip-lock bag post-it pad; soap dish scissors; bear shovel; pitcher sock; saucer sponge; toy road teabag; potholder tongs; little legal pad vase; deck of cards

Experiment 3: Object pairs 1. anchor; boat 2. bird; large nest 3. bow; present

312

discourse anaphora and anaphor resolution (DAARC2) (pp. 68–81).
Cahn, J. (1995). The effect of pitch accenting on pronoun referent resolution. In Proceedings of the 33rd conference on association for computational linguistics (ACL-95) (pp. 290–293). Association for Computational Linguistics.
Carreiras, M. (1997). Plural pronouns and the representation of their antecedents. European Journal of Cognitive Psychology, 9, 53–87.
Chambers, C. G., Tanenhaus, M. K., Eberhard, K. M., Filip, H., & Carlson, G. N. (2002). Circumscribing referential domains during real-time language comprehension. Journal of Memory and Language, 47(1), 30–49.
Chambers, C. G., Tanenhaus, M. K., & Magnuson, J. S. (2004). Actions and affordances in syntactic ambiguity resolution. Journal of Experimental Psychology: Learning, Memory and Cognition, 30(3), 687–696.
Channon, R. (1980). Anaphoric that: A friend in need. In J. Kreiman & A. Ojeda (Eds.), Papers from the parasession on pronouns and anaphora (pp. 98–109). Chicago: Chicago Linguistic Society.
de Hoop, H. (2004). On the interpretation of stressed pronouns. In R. Blutner & H. Zeevat (Eds.), Optimality theory and pragmatic. Houndmills, Basingstoke, Hampshire: Palgrave Macmillan.
Eckert, M., & Strube, M. (2000). Dialogue acts, synchronizing units and anaphora resolution. Journal of Semantics, 17, 51–89.
Garnham, A. (2001). Mental models and the interpretation of anaphora. Philadelphia: Psychology Press Ltd.
Givon, T. (1983). Topic continuity in discourse: A quantitative cross-language study. Amsterdam: John Benjamins.
Gordon, P. C., Grosz, B. J., & Gilliom, L. A. (1993). Pronouns, names and the centering of attention in discourse. Cognitive Science, 17, 311–347.
Grodner, D., & Sedivy, J. C. (in press). The effect of speaker-specific information on pragmatic inferences. In E. Gibson & N. Pearlmutter (Eds.), The processing and acquisition of reference. Cambridge, MA: MIT Press.
Grosz, B. J. (1977). The representation and use of focus in a system for understanding dialogs. In Proceedings of the fifth international joint conference on artificial intelligence (IFCAI-77) (pp. 67–76). Cambridge, MA.
Grosz, B. J., Joshi, A. K., & Weinstein, S. (1995). Centering: A framework for modelling the local coherence of discourse. Computational Linguistics, 21, 203–226.
Grosz, B. J., & Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3), 175–204.
Gundel, J. K., Hedberg, N., & Zacharski, R. (1993). Cognitive status and the form of referring expressions in discourse. Language, 69, 274–307.
Heim, I. (1983). File change semantics and the familiarity theory of definiteness. In R. Bauerle, C. Schwarze, & A. v. Stechow (Eds.), Meaning, use, and the interpretation of language. Berlin: de Gruyter.
Hirschberg, J. (1993). Pitch accent in context: Predicting intonational prominence from text. Artificial Intelligence, 63, 305–340.
Hudson, S. B., Tanenhaus, M. K., & Dell, G. S. (1986). The effect of the Discourse Center on the local coherence of a discourse. In Proceedings, eighth annual conference of the cognitive society (pp. 96–101). Lawrence Erlbaum.
Kaiser, E. (2003). The quest for a referent: A crosslinguistic look at reference resolution. Unpublished Doctoral Dissertation, University of Pennsylvania, Philadelphia.
Kaiser, E., & Trueswell, J. C. (in press). Investigating the interpretation of pronouns and demonstratives in Finnish: Going beyond salience. In E. Gibson & N. Pearlmutter (Eds.), The processing and acquisition of reference. Cambridge, MA: MIT Press.
Kameyama, M. (1999). Stressed and unstressed pronouns: Complementary preferences. In P. Bosch & R. van der Sandt (Eds.), Focus: Linguistic, cognitive and computational perspectives. Cambridge: Cambridge University Press.
Kamp, H. (1981). A theory of truth and semantic representation. In J. Groenendijk, T. Janssen, & M. Stokhof (Eds.), Formal methods in the study of language (pp. 277–322). Amsterdam: Mathematical Center Tract 135.
Kaup, B., Kelter, S., & Habel, C. (2002). Representing referents of plural expressions and resolving plural anaphors. Language and Cognitive Processes, 17, 405–450.
Lakoff, G. (1971). Presupposition and relative well-formedness. In D. D. Steinberg & L. A. Jakobovits (Eds.), Semantics: An interdisciplinary reader in philosophy, linguistics, and psychology (pp. 329–340). Cambridge: Cambridge University Press.
Linde, C. (1979). Focus of attention and the choice of pronouns in discourse. In T. Givon (Ed.), Syntax and semantics 12: Discourse and syntax. New York: Academic Press.
Matin, E., Shao, K. C., & Boff, K. R. (1993). Saccadic overhead: Information-processing time with and without saccades. Perception & Psychophysics, 53, 372–380.
Metzing, C., & Brennan, S. E. (2003). When conceptual pacts are broken: Partner-specific effects on the comprehension of referring expressions. Journal of Memory and Language, 49(2), 201–213.
Nakatani, C. (1997). The computational processing of intonational prominence: A functional prosody perspective. Harvard University, Unpublished Doctoral Dissertation, Cambridge, MA.
Nicol, J., & Swinney, D. A. (1989). The role of structure in coreference assignment during sentence comprehension. Journal of Psycholinguistic Research, 18(Special Issue: Sentence Processing), 5–19.
Passonneau, R. J. (1989). Getting at discourse referents. In Proceedings of the 27th annual meeting of the association for computational linguistics (ACL-89) (pp. 51–59). Association for Computational Linguistics.
Passonneau, R. J. (1993). Getting and keeping the center of attention. In M. Bates & R. Weischedel (Eds.), Challenges in natural language processing (pp. 179–226). Cambridge: Cambridge University Press.
Pierrehumbert, J., & Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In P. R. Cohen, J. Morgan, & M. E. Pollack (Eds.), Intentions in communication (pp. 271–311). Cambridge, MA: MIT Press.
Reichman, L. (1985). Getting computers to talk like you and me: Discourse context, focus and semantics. Cambridge, MA: MIT Press.
Runner, J. T., Sussman, R. S., & Tanenhaus, M. K. (2003). Assignment of reference to reflexives and pronouns in picture noun phrases: Evidence from eye movements. Cognition, 89, B1–B13.
Sanford, A. J., & Garrod, S. C. (1989). What, when, and how? Questions of immediacy in anaphoric reference resolution. Language and Cognitive Processes, 4, 235–263.
Sanford, A. J., & Moxey, L. M. (1995). Notes on plural reference and the scenario-mapping principle in comprehension. In C. H. G. Rickheit (Ed.), Focus and cohesion in discourse. Berlin: de Gruyter.
Sanford, A. J., Sturt, P., Moxey, L., & Morrow, L. (2004). Production and comprehension measures in assessing plural object formation. In M. Carreiras & J. C. Clifton (Eds.), On-line sentence processing: ERPS, eye movements and beyond (pp. 151–168). Psychology Press.
Schuster, E. (1988). Anaphoric reference to events and actions: Evidence from naturally occurring data (Technical Report No. MS-CIS-88-13). University of Pennsylvania LINC LAB, Philadelphia.
Smyth, R. (1994). Grammatical determinants of ambiguous pronoun resolution. Journal of Psycholinguistic Research, 23, 197–229.
Strube, M. (1998). Never look back: An alternative to centering. In Proceedings of the 36th annual meeting of the association for computational linguistics (ACL 98) (pp. 1251–1257). Association for Computational Linguistics.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632–1634.
Tetreault, J. R. (2001). A corpus-based evaluation of centering and pronoun resolution. Computational Linguistics, 27, 507–520.
van Gompel, R. P. G., & Majid, A. (2004). Antecedent frequency effects during the processing of pronouns. Cognition, 90, 255–264.
Venditti, J. J., Stone, M., Nanda, P., & Tepper, P. (2002). Discourse constraints on the interpretation of nuclear-accented pronouns. In Proceedings of the 2002 international conference on speech prosody. Aix-en-Provence, France.
Venditti, J. J., Trueswell, J. T., Stone, M., & Nautiyal, K. (2003, March). On-line accented pronoun interpretation in discourse context. Paper presented at the 16th CUNY Sentence Processing Conference, Cambridge, MA.
Vonk, W., Hustinx, L. G., & Simons, W. H. (1992). The use of referential expressions in structuring discourse. Language and Cognitive Processes, 7, 301–333.
Walker, M. A., Joshi, A. K., & Prince, E. F. (Eds.). (1998). Centering theory in discourse. Oxford: Clarendon Press.
Webber, B. L. (1979). A formal approach to discourse anaphora. New York: Garland.
Webber, B. L. (1991). Structure and ostension in the interpretation of discourse deixis. Language and Cognitive Processes, 6(2), 107–135.
Winograd, T. (1972). Understanding natural language. New York: Academic Press.
Wolters, M., & Beaver, D. I. (2001). What does he mean? Paper presented at the Proceedings of the Twenty-Third Annual Meeting of the Cognitive Science Society.
Wolters, M., & Byron, D. (2000). Prosody and the resolution of pronominal anaphora. In Proceedings of the 18th international conference on computational linguistics (COLING) (pp. 287–292).
