1 Introduction
The impressive evolution of computer games and game engines provides researchers
with a rich, visual, real-time environment in which to narrate stories. The potential
for storytelling, immersion and narrative within Virtual Environments (VEs) developed
with current game engines is increasingly close to that of cinema in terms of visual
capabilities and appearance. However, only a few research efforts approach storytelling
by using cinematic expertise and cinematic rules to support the visual component
of interactive stories (e.g. providing visual designs that change according to plot or
interaction), with elements such as automatic camera placement or the selection of the
best possible shot under given constraints.
Currently, most of the efforts related to the visual elements also include the
analysis of the narrative components (e.g. structure, plot, characters), and hence the
story generated, or narrated, is supported by different autonomous processes in both
the visual and the narrative parts. As of now, the story is supported by its visual
presentation (e.g. aiming to emphasize elements, choosing the correct shot), but no
research has been directed towards generating a story mainly through the
presentation of camera shots. This latter approach develops a basic idea: the story
can be understood as the mere result of the images presented, which gives the
camera a very large authorial responsibility. We understand that cinematic rules
applied to a storytelling system allow the visual presentation to narrate a story entirely
through camera positions and shots. We claim that, under these principles, we can
generate and narrate a story merely through the correct real-time selection of camera
shots and basic information about the scene and story. We propose using the elements
within a storytelling system (e.g. actors and their variables, scenario and its
constraints, basic narrative elements) and presenting them in a correct sequence that
results in a story, without the need for a large or complex authorial component: the
camera can actually take on an important authorial task.
The apparent elimination of the authorial element in the conventional sense (or the
authorial component that becomes the base of authored narratives) does not aim to
minimize the author's role, but to present a principle concerning the different possible
points of view that can prevail when no real author exists. Under these conditions, all
the versions of a story are the result of some interpretation of a very basic plot and
some story elements. This principle is named Rashomon, after the film of the same name.
In this paper, we present research efforts related to a specific cinematic approach to
interactive stories, and discuss the approach taken towards accomplishing a
Rashomon storyteller. It should be noted that this paper represents a work in progress.
2 Related Work
3 A Cinematic Approach
If we understand a movie as a series of scenes, we can then split these scenes into
shots (also called takes) and hence describe every single piece of a film from these
elements. A scene is a piece of information that represents continuous space or time,
with unity in the sense of the information shown, while a take is an
uninterrupted segment of action, with unity in terms of the continuous image seen
from the camera. Both scenes and takes can be described by different common
‘templates’ of camera positions and/or movement, called idioms. Film idioms are the
basic parts of a cinematic language ontology, and are the result of many years of
experience from cinematographers, editors and directors.
The visualization of a story involves the aspects related to its graphical
representation and the way the creators aim to compose it, telling a plot but also
playing with emotions and, overall, producing a complete visual experience [9]. The
visualization process in a cinematic context usually covers several decisions related to
camera (position, movement, angle, focus), lighting (colour, intensity, direction,
shadows) and movement within the plot (characters, positions, blockings) [9, 10].
Cinema has created a full structure to narrate stories, and most people
are entirely comfortable with this approach: spectators, in general, perceive and
interpret when viewing and making sense of a film, and cinematic elements such as
close-ups or editing conventions directly affect spectators and their experience [11].
Film language is clearly understood, is the result of a learning process, and is very
familiar to most users. In some cases, the language is so precise and so well defined
that many different events can be narrated in the same way, which gives rise to the
film idioms that make up the film language ontology (shot types, shot patterns).
Idioms, as the basis of a system, make use of cinematic notions in the sense of
classical filmmaking. However, VEs and storytelling offer many more
possibilities and much more freedom than standard directing (e.g. by allowing complex
camera positioning and movements), so idioms within a system require some degree of
constraint to handle the issues that standard filmmaking would control by itself, such
as the placement of the actors, the limitations in their movement, the previous
knowledge of what is going to happen next, an existing script, etc. As of now, film
idioms are appropriate for interactive fiction, and even if the field
evolves to generate its own idioms, a cinematic approach is a first step and
what we believe to be a very useful starting point.
In general terms, a cinematic approach contrasts with the initial tendency of most
animations in the first Interactive Storytelling (IS) systems, where the visual setup
was portrayed either from a specific character’s point of view or from a small group
of pre-defined viewpoints. By lacking simple cinematic elements, such as camera
placement, we believe these interactive applications fail to value the visual
capabilities and knowledge developed by cinematographers for over a century. This
means missing out on very important expertise with which users are already familiar.
Akira Kurosawa's film Rashomon, a 1950 black and white movie, is considered a
classic in many senses. Its structure had such an impact that a narrative principle
now bears its name. The murder story told in the film admits many interpretations, and
is surprisingly both simple and complex due to the different versions told about the
murder and its motives. Every character, from the murderer to the murdered,
describes the story in a different manner and, in each of the four versions presented,
the details in the script [12] allow completely different conclusions. The principle
claims the following: truth is hard to find and relative, since every different point of
view may be valid.
Taking part of this principle and translating it to an IS context, we propose the
idea that, mainly through the selection of different camera shots, we can generate a
story based merely on film language rules.
We claim that film idioms and a minimal set of authorial traces make it
possible to generate a story, giving the camera an important authorial role. In
this sense, the Rashomon principle has probably never had a more literal approach: the
different points of view will be those of the camera, and hence we can allow the
generation of different versions by just changing some basic cinematic elements
within the story. In general, IS projects present a story that is narrated with the
aid of images, but we take this idea further and propose that images
themselves can actually become the story, and not merely support it.
5 Our Storyteller
Our Rashomon storyteller project uses the Unreal Tournament (UT) game engine to
provide a rich environment in which to develop and narrate the stories. The project is currently in
its early development stage, and consists of a 3D VE (which in cinematic terms
becomes the set), character agents (our actors, with variables and specific definitions
related to the story), a basic idea for the narrative (in a similar sense as a first draft of
any screenplay) and a set of cinematic idioms (the film language ontology that keeps
the cinematic expertise). All the development is done using the engine’s scripting
language, UnrealScript. Further AI development will use other languages (e.g. C++).
5.1 Project Objectives
From the initial idea that film language has taught us many different ways to
interpret a sequence, we ran a small test with some researchers and other members of
our group. The test consisted of asking participants to describe the ‘image’ and
‘story’ that the previous list implied, and the kind of information it provided. Our
results show that most of the users who observed such a sequence would say it conveys
some of the following: ‘that a character aims to point the gun (or even shoot) at the
other character’, or that the character ‘aims to grab it’, or ‘that someone wants to
kill another one’, and ‘that they’re inside a house, where everything is happening’.
Figure 1 represents a sample storyboard with the possible interpretation of the
combination described. The image provides no information on the speed of the cuts or
the direction of the camera, and is used only as a visual aid.
The initial scenario, however, described the ‘real’ situation within the fixed VE as
probably the one shown in Figure 2. The initial conditions specified separated,
stationary characters, and hence the interpretation of truth is due only to the film
language we apply to the sequence of images shown. From the sequence alone we can begin
the generation of a storyline, and well-known formulas for cinematic mood and pace
provide the basic elements of a story, even when the characters are not moving at all.
Film language provides the basic elements, and a correct choice of shots
evokes what is generally understood as a tense event (e.g. due
to the pace of the cuts and the close-ups), the idea that at least one of the characters
is looking towards the gun, and so on. The initial plot about a ‘murder’ is actually being
developed, and the sequence provides information that leads us to believe the story
in that manner. Different interpretations of the basic plot can induce different stories
(e.g. we can simply swap the characters in the list to obtain a different ‘murderer’ and
‘murdered’), and the relativity of truth within a narrative appears through different
versions and multiple possibilities for handling visual information. Each version can be
understood as an alternative interpretation of a given series of events, and it is the
shot sequence that accomplishes this Rashomon context.
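The character swap mentioned above can be illustrated with a minimal sketch. It is written in Python for readability and is not part of the project's UnrealScript code; the Shot type, the swap_roles helper and the sample sequence are all hypothetical, since the shot list used in the test is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """Hypothetical shot description: a framing size and the subject framed."""
    size: str
    subject: str


def swap_roles(sequence, a, b):
    """Return a new shot sequence in which subjects a and b are exchanged."""
    mapping = {a: b, b: a}
    return [Shot(s.size, mapping.get(s.subject, s.subject)) for s in sequence]


# Illustrative sequence only (an assumption, not the list used in the paper).
version_1 = [
    Shot("long", "house"),
    Shot("close-up", "character_A"),
    Shot("close-up", "gun"),
    Shot("close-up", "character_B"),
    Shot("extreme close-up", "character_A"),
]

# Same set, same shot sizes, same cuts; only the framed characters change.
version_2 = swap_roles(version_1, "character_A", "character_B")
```

Under these assumptions the scene itself is untouched: both versions describe the same stationary characters, and only the order in which they are framed suggests who the ‘murderer’ is.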
The VE processes all the visual information shown, receives commands and
queries from both the VCCS and the ACS, and provides feedback and detailed
information about its elements, such as states, positions, events or geometric
information. Both the ACS and the VCCS are our own developments, while the UT engine
manages the general VE issues, the rendering and the visual output.
Our VCCS implements an autonomous camera that makes use of cinematic idioms
for real-time shot decisions and movements. In line with the characteristics of the VE,
the VCCS determines the appropriate position of the camera according to some rules.
At run time, the ACS continuously sends information to the VCCS,
providing all relevant information and descriptions related to characters. From this
information, and depending on the current state of the camera, the VS specifies an
adequate shot, which is responsible for the resulting visual output.
The cinematic expertise that
exists in the system is found in two main components: camera shots and idioms.
While shots relate to specific camera positions, framing and movements (e.g. crane,
close-up, two shot), idioms describe the combination of different shots to generate
specific sequences.
In the VCCS, a Directing Camera (DC) component holds ultimate
responsibility for the actions taken. The DC is a persistent object during execution,
and the main decision maker in terms of the cinematic element to be shown. According to
the elements in the VE, this component determines the cinematic technique to be
used, the specific shot to show, etc. Every idiom the DC uses for a specific scene
makes use of different and specific camera positions and movements. Communication
between the characters or the elements within the VE and the DC is continuous, and
hence the DC can determine the next steps at all times. A hierarchical structure of
relevant idioms determines whether the DC needs to change idiom, and which
idiom is the most convenient to use. The criteria that determine the
selection of a specific idiom allow different cinematic styles, one of the research
areas that can be exploited.
Idioms, the codified cinematic techniques of our ontology, are
currently hierarchical finite state machines (FSMs) that deal with specific sequences
as the result of the combination of different shots. Every idiom is responsible for
determining whether its states can still provide an adequate shot when conditions
change; if they cannot, it gives control back to the DC, which can then choose to use
another idiom. Figure 3
provides a basic layout and overview of the control and information flows described.
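The control flow between the DC and its idioms can be sketched as follows. This is a minimal illustration in Python, not the project's UnrealScript implementation: the class and method names, the event names, and the use of a priority-ordered list in place of the full idiom hierarchy are all assumptions made for the example.

```python
class Idiom:
    """An idiom as a small finite state machine whose states are shot names."""

    def __init__(self, name, transitions, start, applies):
        self.name = name
        self.transitions = transitions  # state -> {event: next state}
        self.start = start
        self.state = start
        self.applies = applies          # predicate over a scene description

    def reset(self):
        self.state = self.start

    def step(self, event):
        """Advance on an event; return the new shot, or None to give up control."""
        nxt = self.transitions.get(self.state, {}).get(event)
        if nxt is None:
            return None
        self.state = nxt
        return nxt


class DirectingCamera:
    """Chooses among idioms (assumed here to be ordered by priority)."""

    def __init__(self, idioms):
        self.idioms = idioms
        self.current = None

    def select(self, scene):
        """Return the first idiom whose precondition holds for the scene."""
        for idiom in self.idioms:
            if idiom.applies(scene):
                idiom.reset()
                return idiom
        return None

    def next_shot(self, scene, event):
        """Ask the current idiom for a shot; switch idiom if it cannot provide one."""
        if self.current is not None and self.current.applies(scene):
            shot = self.current.step(event)
            if shot is not None:
                return shot
        # The idiom handed control back (or none was active): pick again.
        self.current = self.select(scene)
        return self.current.start if self.current else None


# Hypothetical idioms and events, for illustration only.
dialogue = Idiom(
    "two_character_dialogue",
    {
        "two shot": {"a_speaks": "over-the-shoulder A", "b_speaks": "over-the-shoulder B"},
        "over-the-shoulder A": {"b_speaks": "over-the-shoulder B"},
        "over-the-shoulder B": {"a_speaks": "over-the-shoulder A"},
    },
    start="two shot",
    applies=lambda scene: len(scene["characters"]) == 2,
)
establishing = Idiom("establishing", {"long": {}}, start="long", applies=lambda scene: True)

dc = DirectingCamera([dialogue, establishing])
scene = {"characters": ["A", "B"]}
# The first event only triggers idiom selection; "gunshot" is unknown to the
# dialogue idiom, so it returns control and the DC selects an idiom again.
shots = [dc.next_shot(scene, e) for e in ["start", "a_speaks", "b_speaks", "gunshot"]]
```

In this sketch a different ordering of the idiom list, or different applies predicates, would yield a different cinematic style from the same scene information, which is one way to read the selection criteria described above.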
We are currently developing different general idioms, and have focused
on the main definition of different camera modules to control the shots. In this early
development stage, the most important issues relate to the correct definition of the
variables to be used, in order to give the camera enough feedback that it can
generate a story. The complete description of the actors, however, includes several
parameters (e.g. related to personality, mood, internal beliefs) that will be used in the
near future. For our initial purposes, we are using fixed elements and characters in the
scenario as a starting point, to allow the camera greater freedom with limited
constraints, avoiding issues such as character blocking, which will be considered in
later stages. Figure 4 shows some of the camera modules we already have. We have
implemented a small set of shots to allow a minimal cinematic narration (e.g. crane,
over-the-shoulder, side, back, dolly, tracking, panning, tilting, follow, sweep). All
shot sizes for specific framing (e.g. long, two, medium, full, extreme close-up) are
also included. Our shots have so far proven successful, and we are in the process of
defining and correcting our set of idioms. Our initial tests show some occasional
cinematic errors (such as crossing the 180° line in some tracking shots when we allow
characters to move freely), but the overall definitions seem correct.
Fig. 4. Sample camera shots: (a) Crane, (b) Side, (c) Over-the-shoulder.
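The 180° line error mentioned above can be detected with a simple geometric test. The following Python sketch is an illustration only and does not describe the project's actual check: it assumes that the line of action runs through two characters, that positions are projected onto the ground plane as (x, y) pairs, and that the function names are hypothetical.

```python
def side_of_line(a, b, cam):
    """Return -1, 0 or +1 depending on which side of the line a-b the camera is.

    Uses the sign of the 2D cross product (b - a) x (cam - a).
    """
    cross = (b[0] - a[0]) * (cam[1] - a[1]) - (b[1] - a[1]) * (cam[0] - a[0])
    return (cross > 0) - (cross < 0)


def crosses_line(a, b, cam_prev, cam_next):
    """True if the camera moves from one side of the line of action to the other."""
    s0 = side_of_line(a, b, cam_prev)
    s1 = side_of_line(a, b, cam_next)
    return s0 != 0 and s1 != 0 and s0 != s1


# Two characters on the x axis; the camera starts on the negative-y side.
actor_a, actor_b = (0.0, 0.0), (10.0, 0.0)
same_side = crosses_line(actor_a, actor_b, (5.0, -3.0), (7.0, -1.0))
other_side = crosses_line(actor_a, actor_b, (5.0, -3.0), (5.0, 2.0))
```

When characters move freely, the line of action moves with them, so under these assumptions the check would need to use the characters' positions at the moment of each cut rather than their initial ones.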
7 Acknowledgments
References