
Aiming for a Rashomon Storyteller

Alejandro Ramírez, Josep Blat

Interactive Technology Group, Department of Technology, Pompeu Fabra University


Barcelona, Spain
{alejandro.ramirez, josep.blat}@upf.edu
http://www.tecn.upf.es/gti/

Abstract. The relevance of the visual components of narrative is unarguable in
conventional media such as cinema or video, yet only a small number of
research projects have taken a cinematic approach to Interactive Storytelling.
While some systems use cinematic ontologies to support the story being told,
no real attempt has been made to devise a system that is not only capable of
selecting appropriate shots to support a given piece of a story, but can actually
develop a story primarily through camera positions and shots. This paper
presents an approach to building a storytelling test bed capable of generating
story pieces mainly through camera positions and cinematic rules, and briefly
describes its purposes and research opportunities within a Rashomon context:
truth within a story is relative and subject to different points of view.

1 Introduction

The impressive evolution of computer games and game engines provides researchers
with a rich, visual, real-time environment in which to narrate stories. The potential
for storytelling, immersion and narrative within Virtual Environments (VE's)
developed with current game engines is ever closer to cinema and film in terms of
visual capabilities and appearance. However, only a few research efforts approach
storytelling by using cinematic expertise and cinematic rules to support the visual
component of interactive stories (e.g. providing visual designs that change according
to plot or interaction), with elements such as automatic camera placement or the
selection of the best possible shot under constraints.
Currently, most of the efforts related to the visual elements also include the
analysis of the narrative components (e.g. structure, plot, characters), and hence the
story generated, or narrated, is supported by different autonomous processes in both
the visual and the narrative parts. To date, the story is supported by its visual
presentation (e.g. aiming to emphasize elements, choosing the correct shot), but no
research has addressed generating a story mainly through the presentation of camera
shots. This latter approach develops a basic idea: the story can be understood as the
mere result of the images presented, a fact that gives the camera a very large
authorial responsibility. We understand that cinematic rules applied to a storytelling
system allow the visual presentation to narrate a story entirely through camera
positions and shots. We claim that, under these principles, we can generate and
narrate a story merely through the correct real-time selection of camera shots, plus
basic information about the scene and story. We propose using the elements within a
storytelling system (e.g. actors and their variables, the scenario and its constraints,
basic narrative elements) and presenting them in a correct sequence that results in a
story, without the need for a large or complex authorial component: the camera can
actually take on an important authorial role.
The apparent elimination of the authorial element in the conventional sense (the
authorial component that is the base of authored narratives) does not aim to
minimize the author's role, but to present a principle in terms of the different possible
points of view that can prevail when no real author exists. Under these conditions, all
versions of a story are the result of some interpretation of a very basic plot and some
story elements. This principle is named Rashomon, after the film of the same name.
In this paper, we present research efforts related to a specific cinematic approach in
interactive stories, and discuss the approach taken towards accomplishing a
Rashomon storyteller. It should be noted that this paper represents a work in progress.

2 Related Work

Limited but important research on interactive cinematography and cinematic rules
applied to narrative has already been carried out, and it has been both an influence on
and an inspiration for our ideas. Christianson et al. [1] developed a system that
respects cinematic constraints and plans camera sequences, but it uses an off-line
approach to generate events that best show a pre-existing animation trace, and was
not designed to work with real-time environments. Bares and Lester address
real-time constraints and the user's cinematic preferences [2], but do not take full
advantage of cinematic expertise as a body of work. The virtual cinematographer of
He et al. [3] described a scene hierarchy that provides cinematic rules and techniques
to guide an authored interactive narrative. They developed several film hierarchy
models that included shot types and film patterns, while leaving aside the narrative in
terms of the connection between the plot and the resulting visual presentation,
focusing only on camera shots and their definition. Their work has inspired most
hierarchy-based developments, including our research.
Additional research has also addressed cinematic issues. Drucker treated shot
selection as a series of constrained optimization problems, and developed a useful
framework for high-level control of virtual cameras [4]. His work aimed to aid visual
tasks (e.g. VE navigation), but was not truly oriented towards a cinematic experience.
Tomlinson changed lights and camera according to the characters' emotions [5], and
was one of the first to pursue visual changes in presentation according to interaction.
His goals, however, were mainly oriented towards the presentation of emotions, with
no real cinematic aspirations.
In terms of plot and story, different projects have been developed in both emergent
and authored narratives. Emergent narrative projects, where the story emerges from
interaction with no scripted narrative, provide a test bed where believable characters
are fully responsible for the story being told: the types of stories that can emerge are
limited by the specific design of the characters that appear. The Oz project [6] is a
good example. Authored narrative projects, guided by an authorial objective, allow a
major premise to be explored through interaction and the story. The first hypertextual
adventure games, or AI-oriented approaches such as the project of Mateas and Stern
[7, 8], provide good examples of this kind of narrative.
Regarding cinematic techniques, our project relates to Christianson et al. [1] and
He et al. [3] in its use of a film-based set of hierarchies and scene models to guide a
narrative, and we follow a hybrid path in the narrative chosen: we aim for an
emergent narrative that makes use of basic story elements, with a minimum set of
authorial clues, and argue that the visual presentation must consider a basic authorial
objective, character and relationship attributes, and a dramatic component to be
accomplished.

3 A Cinematic Approach

If we understand a movie as a series of scenes, we can split these scenes into
shots (also called takes) and hence describe every single piece of a film from these
elements. Scenes are pieces of information that represent continuous space or time,
with unity in the sense of the information shown, while takes are uninterrupted
segments of action, with unity in the continuous image seen from the camera. Both
scenes and takes can be described by common 'templates' of camera positions and/or
movements, called idioms. Film idioms are the basic parts of a cinematic language
ontology, and are the result of many years of experience of cinematographers, editors
and directors.
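As an illustration only (the paper does not prescribe a data format), an idiom can be sketched as an ordered template of shots; all class and field names below are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Shot:
    framing: str           # e.g. "two-shot", "over-the-shoulder", "close-up"
    subject: str           # which actor or object is framed
    cut: str = "standard"  # transition into this shot

@dataclass(frozen=True)
class Idiom:
    name: str
    shots: tuple  # the ordered 'template' of Shot entries

# A classic two-person dialogue idiom: an establishing two-shot,
# then alternating over-the-shoulder reverse shots.
dialogue = Idiom("dialogue-2", (
    Shot("two-shot", "A+B"),
    Shot("over-the-shoulder", "A"),
    Shot("over-the-shoulder", "B"),
))
```

The key property is that the idiom encodes accumulated cinematographic convention as reusable data, independent of any particular scene.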
The visualization of a story comprises the aspects related to its graphical
representation and the way its creators aim to compose it: telling a plot, but also
playing with emotions and, overall, producing a complete visual experience [9]. The
visualization process in a cinematic context usually covers several decisions related
to the camera (position, movement, angle, focus), lighting (colour, intensity,
direction, shadows) and movement within the plot (characters, positions, blockings)
[9, 10].
Cinema has created a full structure for narrating stories, and most people are
absolutely comfortable with this approach: spectators, in general, perceive and
interpret when viewing and making sense of a film, and cinematic elements such as
close-ups or editing conventions directly affect the spectator and their experience
[11]. Film language is clearly understood, the result of a learning process, and is very
familiar to most users. In some cases the language is so precise and so well defined
that many different events can be narrated in the same way, hence the resulting film
idioms that constitute the film language ontology (shot types, shot patterns).
Idioms, as the basis of a system, make use of cinematic notions in the sense of
classical filmmaking. However, since VE's and storytelling imply many more
possibilities and more freedom than standard directing (e.g. by allowing complex
camera positioning and movements), idioms within a system require some degree of
constraint to handle issues that standard filmmaking would control by itself, such as
the placement of the actors, the limitations on their movement, prior knowledge of
what is going to happen next, an existing script, etc. For now, film idioms are
appropriate for interactive fiction, and even if the field evolves to generate its own
idioms, a cinematic approach is a first step and what we believe is a very useful
starting point.
In general terms, a cinematic approach contrasts with the initial tendency of most
animations in the first Interactive Storytelling (IS) systems, where the visual setup
was portrayed either from a specific character's point of view or from a small group
of pre-defined viewpoints. By lacking simple cinematic elements, such as camera
placement, we believe these interactive applications fail to exploit the visual
capabilities and knowledge developed by cinematographers for over a century,
missing a very important body of expertise with which users are already familiar.

4 A Rashomon Point of View

Akira Kurosawa's film Rashomon, a 1950 black-and-white movie, is considered a
classic in many senses. Its structure caused such an effect that a narrative principle
now bears its name. The murder story told in the film admits many interpretations,
and is surprisingly both simple and complex due to the different versions told about
the murder and its reasons. Every character, from the murderer to the murdered,
describes the story in a different manner and, in each of the four versions presented,
the details in the script [12] allow completely different conclusions. The principle
claims the following: truth is relative and hard to find, since every point of view may
be valid.
Taking some of this principle and translating it to an IS context, we propose the
idea that, mainly through the selection of different camera shots, we can generate a
story based merely on film language rules.
We claim that film idioms and a minimal set of authorial traces allow the
generation of a story, giving the camera an important authorial role. In this sense, the
Rashomon principle has probably never been applied more literally: the different
points of view will be those of the camera, and hence we can generate different
versions just by changing some basic cinematic elements within the story. In general,
IS projects present a story that is narrated with the aid of images; we take this a step
further and propose that images themselves can actually become the story, and not
merely support it.

5 Our Storyteller

Our Rashomon storyteller project uses the Unreal Tournament (UT) game engine to
provide a rich environment in which to develop and narrate stories. The project is
currently in its early development stage, and consists of a 3D VE (which in cinematic
terms becomes the set), character agents (our actors, with variables and specific
definitions related to the story), a basic idea for the narrative (in a similar sense to a
first draft of any screenplay) and a set of cinematic idioms (the film language
ontology that holds the cinematic expertise). All development is done using the
engine's scripting language, UnrealScript. Further AI development will use other
languages (e.g. C++).
5.1 Project Objectives

Our main objective is to realize our Rashomon storytelling proposal: simply by
using cinematic rules and some basic plot elements, we claim it is possible to
generate different, alternative story pieces by granting a large degree of authorial
leadership to the camera. We believe that a hybrid approach between emergent and
authored narrative can be developed from camera shots alone, plus basic blocks that
define a minimal set of cinematic elements within a VE:

1. Film idioms (shots, rules, sequences, etc.)
2. Character descriptors (personality, variables, etc.)
3. VE restrictions (location, size, etc.)
4. Narrative guidelines (basic plot, mood, tone, etc.)
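In a hypothetical minimal form (not the authors' actual data format), these four blocks could be declared as plain data; all names and values below are illustrative, chosen only to mirror the list above:

```python
# 1. Film idioms: named templates of shots
film_idioms = {
    "dialogue-2": ["two-shot", "over-the-shoulder A", "over-the-shoulder B"],
    "suspense":   ["medium", "close-up", "extreme close-up"],
}

# 2. Character descriptors: personality and state variables
character_descriptors = {
    "char1": {"personality": "vengeful", "mood": "tense"},
    "char2": {"personality": "calm", "mood": "unaware"},
}

# 3. VE restrictions: geometric and location constraints
ve_restrictions = {"location": "house interior", "size_m": (8, 6)}

# 4. Narrative guidelines: basic plot, mood and tone
narrative_guidelines = {"basic_plot": "revenge for a murdered brother",
                        "mood": "suspense", "tone": "dark"}
```

The point of the sketch is how small the authorial input can be: everything else would be derived by the camera's selection logic.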

From these elements, we aim to generate coherent narratives that allow the camera a
large degree of authorial freedom, while following specific rules that determine its
behaviour according to a cinematic ontology. Each generated story must be an
alternative interpretation of a given series of events.

5.2 A Rashomon Context

To briefly describe a Rashomon context, we present a very simple example of the
type of interactions and characteristics involved. For this example, we describe the
following hypothetical storytelling scenario:
1. Two different characters (completely separated from each other, fixed position, no
movement, each possessing certain characteristics and descriptors)
2. A given number of objects (probably related to the characters, and possessing
descriptors as well, e.g. a gun on a table, or a book, or a clock)
3. A given VE (any set, with descriptors and geometric restrictions, such as walls or
corners, e.g. a house)
4. A basic plot idea (an initial story idea, stating some basic issues that can lead to
actions, and the overall mood of the story, such as 'suspense'; for our example we
define "One character aims to shoot another in order to take revenge for the
murder of his brother" as the basic plot)

In most conventional IS systems, providing only the previous elements would not
yield a story. The lack of movement clearly implies some limitations, while the lack
of a complete plot seems to involve an emergent narrative condition. This emergent
circumstance is restricted by the lack of movement and the apparent lack of interest
that a fixed environment may provide.
When trying to narrate the setup visually, e.g. using a conventional long shot to
show both characters, a standard camera view would not provide any useful
information about the story. Our proposal claims that a good cinematic storyteller
should find ways to produce story pieces from these elements. Truth is relative, and
hence, through make-believe shots, we can suggest some basic story elements to an
audience. The following list provides a possible combination of shots to use.
1. Establishing shot: exterior of the house
2. {Standard cut} Medium shot of character 1 only
3. {Standard cut} Short panning on character 1
4. {Standard cut} Medium shot: a table with a gun on top of it
5. {Standard cut} Medium shot of character 2
6. {Fast cut} Close-up of character 1
7. {Fast cut} Zoom to extreme close-up of character 2, looking from below
8. {Fast cut} Zoom to extreme close-up of the gun, looking from above
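The eight shots above can be encoded directly as data, which makes the pacing explicit. The (cut, framing, subject) encoding below is ours, for illustration only:

```python
# The eight-shot sequence encoded as (cut, framing, subject) triples.
sequence = [
    ("establishing", "exterior", "house"),
    ("standard", "medium", "character 1"),
    ("standard", "short pan", "character 1"),
    ("standard", "medium", "gun on table"),
    ("standard", "medium", "character 2"),
    ("fast", "close-up", "character 1"),
    ("fast", "zoom to extreme close-up, low angle", "character 2"),
    ("fast", "zoom to extreme close-up, high angle", "gun"),
]

# The pace quickens toward the end: the run of fast cuts and close-ups
# is what carries the tension, even though nothing in the VE moves.
fast_cuts = sum(1 for cut, _, _ in sequence if cut == "fast")
```

In this form a system can reason about pace (the ratio of fast cuts) and subject emphasis without any reference to the characters' actual behaviour.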

Starting from the idea that film language has taught us many different ways to
interpret a sequence, we ran a small test with some researchers and people from our
group. The test simply asked them to describe the 'image' and 'story' the previous
list implied, and the kind of information it provided. Our results show that most users
who observed such a sequence affirm that it states aspects such as the following:
'that a character aims to point the gun (or even shoot) at the other character', or
'aims to grab it', or 'that someone wants to kill another one', and 'that they're inside
a house, where everything is happening'.
Figure 1 represents a sample storyboard with the possible interpretation of the
combination described. The image provides no information on the speed of the cuts or
the direction of the camera, and is used only as a visual aid.

Fig. 1. Sample storyboard shot representation.

The initial scenario, however, described the 'real' situation within the fixed VE as
probably the one shown in Figure 2. The initial conditions specified separated,
stationary characters, so the interpretation of truth is due solely to the film language
we apply to the sequence of images shown. Through the sequence we can begin the
generation of a storyline, and well-known formulas for cinematic mood and pace
provide the basic elements of a story, even though the characters are not even
moving.
Film language provides the basic elements, and a correct decision on the shots
evokes what can generally be understood as a tense event (e.g. due to the pace of the
cuts and the close-ups), the idea that at least one of the characters is looking towards
the gun, and so on. The initial plot about a 'murder' is actually being developed, and
the sequence provides information we use to begin believing the story in that manner.
Different interpretations of the basic plot can induce different stories (e.g. we can
simply swap the characters in the list to obtain a different 'murderer' and
'murdered'), and the relativity of truth within a narrative appears through different
versions and multiple possibilities for handling visual information. Each version can
be understood as an alternative interpretation of a given series of events, and it is the
shot sequence that accomplishes this Rashomon context.

Fig. 2. Sample VE conditions described initially.

5.3 Important Considerations

Many different issues must be acknowledged in order to provide a correct overview
of the potential of our storyteller idea and of the information provided by a real list
of shots similar to the one described above. As mentioned earlier, we aim to generate
different versions, each being an alternative interpretation of a given series of events;
this must not be understood as the real-time generation of multiple stories.
In terms of graphical limitations, we can observe that the storyboard implies the
need to control the depth of field of the scene (a feature that does not natively exist in
UT) in order to remove all details that could reveal the initial positions in the room.
Gestures and facial expressions are also clearly needed to fully support mood or
tone. Issues such as the analysis of UT 'tricks' to allow depth of field and limited
facial expression capabilities are being studied and tested. Regarding the story, it is
obvious that a gun has intrinsic uses, and that not all elements provide the same story
capabilities (e.g., with a book or a fish tank the same shot sequence would not be
very informative, as our users mentioned), hence the need for a detailed descriptor of
the actions and story-related parameters an object may have in a storyline. The full
list of considerations is far larger, but our development process includes the analysis
of such needs.

5.4 System Overview

A basic description of our system's architecture presents three main components:

1. Autonomous Agents Control System (ACS)
2. Autonomous Virtual Camera Control System (VCCS)
3. Virtual Environment (VE)

The VE processes all the visual information shown, receives commands and
queries from both the VCCS and the ACS, and provides feedback and detailed
information about its elements, such as states, positions, events or geometric
information. Both the ACS and the VCCS are our own development, while the UT
engine manages the general VE issues, the rendering and the visual output.
Our VCCS implements an autonomous camera that uses cinematic idioms for
real-time shot decisions and movements. In line with the characteristics of the VE,
the VCCS determines the appropriate position of the camera according to a set of
rules. At execution time, the ACS permanently sends the VCCS all relevant
information and descriptions related to the characters. From this information, and
depending on the current state of the camera, the VCCS specifies an adequate shot,
which is responsible for the resulting visual output. The cinematic expertise in the
system resides in two main components: camera shots and idioms. While shots relate
to specific camera positions, framings and movements (e.g. crane, close-up,
two-shot), idioms describe combinations of different shots that generate specific
sequences.
In the VCCS, a Directing Camera (DC) component bears the highest
responsibility for the actions taken. The DC is a permanent object during execution,
and the main decision-maker regarding the cinematic element to be shown.
According to the elements in the VE, this component determines the cinematic
technique to be used, the specific shot to show, etc. Every idiom the DC uses for a
specific scene makes use of different, specific camera positions and movements.
Communication between the characters or elements within the VE and the DC is
permanent, so the DC can determine the next steps at all times. A hierarchical
structure of relevant idioms determines whether the DC needs to change idiom, and
which idiom becomes the most convenient to use. The criteria that determine the
selection of a specific idiom allow different cinematic styles, one of the research
areas that can be exploited. Idioms, the codified cinematic techniques of our
ontology, are currently hierarchical finite-state machines (FSMs) that deal with
specific sequences resulting from the combination of different shots. Every idiom is
responsible for determining whether its states can still provide an adequate shot if
conditions change, in order to give control back to the DC, which can then choose
another idiom. Figure 3 provides a basic layout and overview of the control and
information flows described.
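A minimal sketch of this control flow (ours, in Python rather than UnrealScript; all class and predicate names are hypothetical): each idiom is a small FSM over shots that hands control back when it no longer suits the scene, and the DC falls through a preference-ordered hierarchy of idioms:

```python
class Idiom:
    """An idiom as a small finite-state machine: each state is a shot,
    and the idiom yields control when it no longer suits the scene."""
    def __init__(self, name, shots, suits):
        self.name = name
        self.shots = shots    # ordered shot states
        self.suits = suits    # predicate: scene -> bool
        self.state = 0

    def step(self, scene):
        """Return the next shot, or None to hand control back to the DC."""
        if not self.suits(scene):
            return None
        shot = self.shots[self.state]
        self.state = (self.state + 1) % len(self.shots)
        return shot


class DirectingCamera:
    """Picks the most convenient idiom for the current scene state,
    falling through a preference-ordered hierarchy of idioms."""
    def __init__(self, idioms):
        self.idioms = idioms
        self.active = None

    def next_shot(self, scene):
        if self.active is not None:
            shot = self.active.step(scene)
            if shot is not None:
                return shot
        for idiom in self.idioms:  # fall through the hierarchy
            if idiom.suits(scene):
                self.active = idiom
                return idiom.step(scene)
        return "long-shot"         # safe default framing


dialogue = Idiom("dialogue-2",
                 ["over-the-shoulder A", "over-the-shoulder B"],
                 suits=lambda scene: scene["actors"] == 2)
monologue = Idiom("monologue",
                  ["medium A", "close-up A"],
                  suits=lambda scene: scene["actors"] == 1)
dc = DirectingCamera([dialogue, monologue])
```

When the scene drops from two visible actors to one, the dialogue idiom's predicate fails, control returns to the DC, and the monologue idiom takes over, which is the hand-back behaviour described above.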

Fig. 3. Basic system description with information and control flows.

5.5 Initial Prototype and Sample Shots

We are currently developing different general idioms, and have focused on the
definition of the camera modules that control the shots. At this early stage of
development, the most important issues relate to the correct definition of the
variables to be used, in order to give the camera enough feedback to generate a story.
The complete description of the actors, however, includes several parameters (e.g.
related to personality, mood, internal beliefs) that will be used in the near future. For
our initial purposes, we use fixed elements and characters in the scenario as a starting
point, allowing the camera greater freedom with limited constraints and avoiding
issues such as character blocking, which will be considered in later stages. Figure 4
shows some of the camera modules we already have. We have implemented a small
set of shots to allow a minimal cinematic narration (e.g. crane, over-the-shoulder,
side, back, dolly, tracking, panning, tilting, follow, sweep). All shot sizes for specific
framing (e.g. long, two, medium, full, extreme close-up) are also included. Our shots
have so far proven successful, and we are in the process of defining and correcting
our set of idioms. Our initial tests show occasional cinematic errors (such as crossing
the 180º line in some tracking shots when we allow characters to move freely), but
the overall definitions seem correct.

Fig. 4. Sample camera shots: (a) Crane, (b) Side, (c) Over-the-shoulder.

6 Conclusions and Future Work

We believe that aiming for improved story visualization in interactive stories is as
important to the overall experience as it is in film, and that it can act as a trigger for
improved experiences within IS, in addition to the many functions it has in
conventional media. Upon its completion, the implementation of the storyteller will
provide a test bed where we can experiment with different narrative elements and
our Rashomon proposal. Current shots, idioms and the overall visualization in terms
of a cinematic approach seem quite promising.
The complete narrative part of our project is still under development in terms of the
technique to be used, but we believe that Mateas and Stern [8] and Young [13]
provide the correct starting point if we translate their narrative elements to our
cinematic needs: cinematic elements will change dynamically, and planning
techniques must allow the choice of the next actions. In our case, we will use
planning techniques that support the choice of narrative actions together with the
cinematic decisions taken. This will be mandatory in order to generate the story once
it becomes much more complex (e.g., random movement of the characters, character
interaction, and the camera blocking that results from their movement). The
definition of the narrative descriptors (e.g. the plot) is also a very important part of
our goals. Metadata descriptors will be studied to describe the final input and output
of the complete storyteller.
The relationship between control, story generation and narrative is still under study
[14, 15], but a first approach suggests the use of a Heuristic Search Planner [16] to
manage our cinematic and cinematic-oriented decisions. Future steps include
growing our ontology of idioms, and implementing a planner to support the
cinematic decision-making processes once story elements begin to appear.
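As a rough, illustrative sketch of the cited idea (not Bonet and Geffner's implementation [16]), a heuristic best-first search over abstract story/camera states could look like the following; the toy "tension level" domain is entirely ours:

```python
import heapq

def plan(start, goal, successors, heuristic):
    """Best-first search guided by a heuristic, in the spirit of heuristic
    search planning. States and actions are abstract (hashable) values."""
    frontier = [(heuristic(start, goal), start, [])]
    seen = set()
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in seen:
            continue
        seen.add(state)
        for action, result in successors(state):
            heapq.heappush(frontier,
                           (heuristic(result, goal), result, path + [action]))
    return None

# Toy domain: states are tension levels 0..3; a fast cut raises tension.
def successors(tension):
    return [("fast-cut", min(tension + 1, 3)), ("hold", tension)]

steps = plan(0, 3, successors, heuristic=lambda s, g: g - s)
# steps == ["fast-cut", "fast-cut", "fast-cut"]
```

A real planner for this system would search over combined narrative and cinematic actions, with the heuristic estimating distance to the next required story beat.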

7 Acknowledgments

This research is partially funded by the TIC2001-2416-C03-03 project of the Spanish
Ministry of Science and Technology (MCYT).

References

1. Christianson, D.B., et al. Declarative Camera Control for Automatic Cinematography,
Proceedings of the 13th AAAI National Conference on AI, 1996.
2. Bares, W. and J. Lester. Cinematographic User Models for Automated Real-Time Camera
Control in Dynamic 3D Environments, Proceedings of the 6th Intl. Conference on User
Models, 1997.
3. He, L.-W., et al. The Virtual Cinematographer: A Paradigm for Automatic Real-Time
Camera Control and Directing, Proceedings of the 23rd Annual Conference on Computer
Graphics and Interactive Techniques, 1996.
4. Drucker, S.M., et al. Intelligent Camera Control in a Virtual Environment, Proceedings of
the Graphics Interface Conference, 1994.
5. Tomlinson, B., et al. Expressive Autonomous Cinematography for Interactive Virtual
Environments, Proceedings of the 4th Intl. Conference on Autonomous Agents, 2000.
6. Mateas, M. An Oz-Centric Review of Interactive Drama and Believable Agents, School of
Computer Science, Carnegie Mellon University, 1997.
7. Mateas, M. and A. Stern. Towards Integrating Plot and Character for Interactive Drama,
Proceedings of the AAAI Fall Symposium, 2000.
8. Mateas, M. and A. Stern. A Behavior Language for Story-Based Believable Agents,
Working Notes of the AAAI Spring Symposium on AI and Interactive Entertainment, 2002.
9. Mascelli, J.V., The Five C's of Cinematography: Motion Picture Filming Techniques,
Silman-James Press, 1998.
10. Arijon, D., Grammar of the Film Language, Silman-James Press, 1991.
11. Persson, P., Understanding Cinema: Constructivism and Spectator Psychology,
Department of Cinema Studies, Stockholm University, 2000.
12. Kurosawa, A., D. Richie (ed.), Rashomon, Rutgers, 1987.
13. Young, R.M. An Overview of the Mimesis Architecture: Integrating Intelligent Narrative
Control into an Existing Gaming Environment, Working Notes of the AAAI Spring
Symposium on AI and Interactive Entertainment, 2001.
14. Cavazza, M., et al. Emergent Situations in Interactive Storytelling, Proceedings of the
ACM Symposium on Applied Computing, 2002.
15. Charles, F., et al. Planning Formalisms and Authoring in Interactive Storytelling,
Proceedings of the 1st Intl. Conference on Technologies for Interactive Digital Storytelling
and Entertainment, 2003.
16. Bonet, B. and H. Geffner. Planning as Heuristic Search: New Results, Proceedings of the
European Conference on Planning, 1999.
