
First Vision: Music for 96 Mini-Loudspeakers

John D. Moeller
Texas A&M University
jdmoeller@tamu.edu

ABSTRACT

This paper describes a 96-channel acousmatic work for a mobile audience, along with a unique, portable, 96-channel playback system. The spatialization methods used in the composition and their resultant effects are also discussed.

1. INTRODUCTION

First Vision (2016) is a 96-channel acousmatic work for a mobile audience. It is intended to be presented outdoors. The composition is a sonic depiction of the theophany of Joseph Smith, Jr. (1805–1844). The piece is one hour in duration.

The work is listened to via ninety-six mini-loudspeakers spread out in a field. Each speaker rests on the ground and points upwards. The speakers are laid out in a grid formation: eight columns by twelve rows. The size of the area is approximately 21 m by 34 m. The audience members are able to move freely amongst the speakers during a performance of the piece (Figure 1); they are also given a listening guide that provides information about each section of the piece.

This paper will describe the compositional methodology of the work, the rendering and mixing of the sound files, the playback system designed and created for the work, and the performance venue. There are also concluding remarks that discuss the results of the spatialization approach used in the piece.

Figure 1. The loudspeaker layout (not to scale).

Copyright: © 2017 John D. Moeller. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

2. COMPOSITIONAL METHODOLOGY

Various aspects of the composition of the piece took place simultaneously: the generation of the form and structure, the creation of a text description of each section of the piece, and the selection and recording of the various sound sources of the work. The design of the multi-channel playback system also took place during this initial period. It will be discussed later in the section entitled "MULTI-CHANNEL PLAYBACK SYSTEM."

2.1 Form and Structure

The process of composing the work began with an examination of the selected text,¹ including the historical context as well as additional accounts of the events depicted [1-4]. The text was divided into major sections and subdivisions according to the sequence of events in the narrative, conforming to the acts and scenes of an archetypal dramatic structure.

This process ultimately yielded three main divisions and twenty-six subsections (including an introduction), which became the three movements and twenty-six sections of the work. The movements are divided as follows: Movement I has ten sections and ends with Smith asking himself religious questions, Movement II also has ten sections and ends with the near destruction of Smith, and Movement III has six sections and includes the theophany. The structure of the movements, and subsequently the climaxes of the music, intentionally model Aristotle's three-part dramatic structure as described in his Poetics [5].

With the number of sections and the rise and fall in action delineated, the next step was to determine the duration of each section. The durations of the twenty-six sections were derived by dividing and sub-dividing the total duration of the piece by near approximations of the golden ratio. Additional divisions using the same method were used to deal with structural features at a sub-section level; they delineate fade-ins and fade-outs within sections (Figure 2).

Aristotle's three-act structure and the golden ratio have both been employed from antiquity to the present as organizing principles in aesthetic works. Both approaches were used in First Vision to increase the probability that the form and structure would be coherent.

¹ JS-H 1:5-20, Pearl of Great Price.
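The golden-ratio subdivision described above can be sketched in a few lines. The paper does not give the exact splitting order, so this Python fragment (the function name `golden_split` and the choice of splitting depth are ours) only illustrates the idea of recursively dividing the 60-minute span at near-golden proportions:

```python
# Illustrative sketch: recursively split a duration at the golden ratio.
# The actual ordering of the paper's twenty-six sections is not specified;
# this only demonstrates the dividing-and-sub-dividing principle.

PHI = (1 + 5 ** 0.5) / 2  # golden ratio, ~1.618

def golden_split(duration, depth):
    """Recursively divide `duration` at the golden ratio.

    Returns a list of segment lengths whose sum equals `duration`.
    """
    if depth == 0:
        return [duration]
    longer = duration / PHI       # ~61.8% of the span
    shorter = duration - longer   # ~38.2% of the span
    return golden_split(longer, depth - 1) + golden_split(shorter, depth - 1)

# Three levels of subdivision over a 60-minute piece -> 8 segments.
segments = golden_split(60.0, 3)
```

Fade-in/fade-out points within a section could then be placed by applying one further `golden_split` to that section's own duration.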
[Figure 2: timeline graphic, 0–60 minutes.]
Figure 2. The total duration was divided and sub-divided according to the golden ratio. Dashed lines indicate sub-divisions within sections.

Key elements in the narrative were intentionally associated with the largest divisions of the total duration. They were connected with movement beginnings and musical climaxes, such as at 14:00, 23:00, 37:00, 46:00, and 51:00 (Figure 3).

[Figure 3: Movements I–III and sections 1–26 on a 0–60 minute timeline.]
Figure 3. The resulting structure, including dashed lines showing structural elements within sections.

2.2 Descriptions of Each Section

Determining the sonic approach to each section began at the same time as delineating the structure and form. Algorithmic processes were used for many compositional parameters, including attack time, amplitude, duration, spatial position, frequency selection, timbre selection, and time-varying effects such as amplitude modulation and delay (for a variable comb filter). Pitch was organized throughout the piece using a just-intonation system based on the harmonic series. Sounds were spatialized by routing them to discrete channels. Additional spatial effects were embedded in the sounds themselves through signal processing (i.e., Doppler shift, comb filtering, and reverberation).²

² In some instances, Paul Berg's software, AC Toolbox, was used to algorithmically generate Csound scores, or to rapidly prototype algorithmic ideas. In other cases, algorithmic processes were coded directly into the Csound files [6, 7].

Eventually a refined compositional plan was created that addressed every section of the piece. This plan was used for rendering the piece and for the creation of the listening guide. As stated in the INTRODUCTION, the listening guide was provided for each audience member during performances of the piece. The guide is a section-by-section booklet that includes the section number and section start time (in minutes), constituent sound types, accompanying text, compositional methodology, and spatial layout (Figure 4).

2.3 Sound Sources

The sounds of the composition originate from twenty-seven sources: a cello, multiple types of flutes, a human voice, synthesized noise, and several kinds of percussion instruments (Table 1). In many cases, several sound types were recorded from a single sound source. In every case, several recordings of each sound type were made. The final product was a library of several iterations each of fifty-one total sound types, resulting in several hundred samples.³

³ Sound eXchange (SoX) was used to carry out various processes on sound samples in batches, such as editing, filtering, and tuning [8].

Concert flute | Bass drum
Alto flute | Bongo drum
Bass flute | Timbales
Recorder | Triangle
Quena | Cymbal
Glockenspiel | Claves
Xylophone | Castanets
Vibraphone | Guiro
Marimba | Wood block
Timpani | Tam-tam
Antique cymbals | Voice
Crotales | Cello
Gong | Pink noise
Snare drum |

Table 1. The sound sources used in the piece. Many of the sound sources provided multiple sound types.

3. RENDERING AND MIXING THE PIECE

The composition was rendered and mixed using the audio programming language Csound. Each section was rendered independently and then mixed later. This section describes these two processes.

Every section had one or more Csound files to process the sound samples for the section according to the compositional plan. The output of each Csound file was rendered to an interleaved, multi-channel audio file. As an example, the piece begins with each of the ninety-six loudspeakers emitting a unique succession of bowed string tones. Every channel outputs an independent series of tones, yet all the channels are interconnected according to a specific harmonic plan. Thus, the code in this example yielded a 96-channel, interleaved sound file. As another example, pitched wood percussion sounds are scattered throughout thirty-two channels beginning in the second section of the piece. Once again, each channel is unique, but all channels are interconnected according to various compositional parameters. In this case, a 32-channel, interleaved sound file was rendered. Ultimately, forty-three such multi-channel files of varying channel counts were rendered for the twenty-six sections of the piece.
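Since every render is an interleaved multi-channel file, a brief sketch of what interleaving means may be useful: samples are stored frame by frame, one sample per channel per frame. The helpers below are illustrative only (Csound writes such files itself, and dedicated tools perform the splitting), with hypothetical names:

```python
# Illustrative sketch of sample interleaving in a multi-channel audio
# stream: an interleaved stream stores frame 1 (one sample for each
# channel in order), then frame 2, and so on.

def interleave(channels):
    """Merge per-channel sample lists into one frame-major stream."""
    return [sample for frame in zip(*channels) for sample in frame]

def deinterleave(stream, nchnls):
    """Split an interleaved stream back into per-channel lists."""
    return [stream[ch::nchnls] for ch in range(nchnls)]

# Two channels, three frames each:
left = [0.1, 0.2, 0.3]
right = [-0.1, -0.2, -0.3]
stream = interleave([left, right])  # [0.1, -0.1, 0.2, -0.2, 0.3, -0.3]
```

The same frame-major layout holds whether the file carries 2 channels or 96; only `nchnls` changes.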
Section 14 (29:00)

Whispered text

"At length I came to the conclusion that I must either remain in darkness and confusion, or else I must do as James directs, that is, ask of God. I at length came to the determination to ask of God, concluding that if he gave wisdom to them that lacked wisdom, and would give liberally, and not upbraid, I might venture."

Joseph's sound begins once again. It fades in over a period of forty-five seconds. It emerges in twenty-four speakers in the center of the field. The sounds were generated as described previously, using the same fundamental pitch (i.e., B2). However, the lower and upper bounds of the random-walk process, described in Mvt. I, begin at 2 and 6 and increase over time according to the shape of an exponential curve until they reach 28 and 32 by the end of Mvt. II.

Snippets from 0.7 to 1.0 seconds in length of whispering appear in each loudspeaker. A Doppler shift and rapid rise and fall in amplitude are applied to each snippet. The base pitch of each selection varies within −200 to +1100 cents from the original pitch. The temporal position of each of the sounds in each channel is according to a uniform distribution. At the end of this section, all the sections that are related to whispering come to a close.

Section 15 (32:00)

Bass flute, alto flute, concert flute, quena, and recorder

"So, in accordance with this, my determination to ask of God, I retired to the woods to make the attempt. It was on the morning of a beautiful, clear day, early in the spring of eighteen hundred and twenty. It was the first time in my life that I had made such an attempt, for amidst all my anxieties I had never as yet made the attempt to pray vocally. After I had retired to the place where I had previously designed to go, having looked around me, and finding myself alone, ... I kneeled down and began to offer up the desires of my heart to God. ..."

Section 16 (35:00)

Glockenspiel

Figure 4. A selection from the listening guide provided for the audience members.
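The random-walk process with exponentially widening bounds described in the Section 14 guide entry can be sketched as follows. The actual implementation lived in the Csound/AC Toolbox code; the function names, step size, and starting value here are assumptions made only to illustrate the shape of the process:

```python
import random

# Hypothetical sketch of a bounded random walk (e.g., over harmonic
# numbers) whose lower/upper bounds start at 2 and 6 and grow along
# exponential curves toward 28 and 32.

def bounds(t):
    """Exponentially interpolated bounds for t in [0, 1]."""
    lo = 2 * (28 / 2) ** t   # 2 -> 28
    hi = 6 * (32 / 6) ** t   # 6 -> 32
    return lo, hi

def random_walk(steps, seed=0):
    """Walk in unit steps, clamped to the moving bounds at each step."""
    rng = random.Random(seed)
    value = 4.0  # assumed starting point inside the initial bounds
    path = []
    for i in range(steps):
        lo, hi = bounds(i / (steps - 1))
        value += rng.choice([-1, 1])      # one step up or down
        value = min(max(value, lo), hi)   # clamp to the moving bounds
        path.append(value)
    return path
```

By the final step the walk is confined to roughly [28, 32], matching the bounds reached by the end of Mvt. II.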
A Csound file was then created for mixing the aforementioned multi-channel files. An instrument was created for every one of the forty-three renders, using the diskin2 opcode for each (using the array-output mode, which does not appear to have an upper channel limit) [9]. Each channel of every multi-channel render was assigned to its specific output speaker location. The global start time, duration, amplitude, and amplitude envelope of each multi-channel render were also set. For example, section one's file was a 96-channel sound file and thus was assigned accordingly throughout all ninety-six speaker positions. A global start time, amplitude level, fade-in, and fade-out were designated for it. Section two has a 32-channel component, which was assigned to speaker positions thirty-three to sixty-four. Once again, a global start time and amplitude envelope were assigned to it. Similar processes were carried out for all the multi-channel renders, with occasional tweaks to individual channels as necessary (as it was also possible to adjust the individual channels within each multi-channel file). Ultimately, a 96-channel, 96 kHz, 24-bit, hour-long audio file was generated.

Two of the challenges during both rendering and mixing were monitoring the high-channel-count audio files and managing the multi-gigabyte file sizes. The monitoring issue came about because a 96-channel playback system was not readily available for instant review of the compositional process. This point was addressed by creating stereo versions of the sections (either all or in part). This approach provided sufficient information to make useful compositional choices. Additionally, the Csound console messages provided helpful amplitude information. As for file size, the resulting multi-channel audio files were rendered using the Core Audio Format (.caf). It was necessary to use the Core Audio Format because it does not have the 4 GB file-size limitation that audio file formats such as the Waveform Audio File Format (.wav) or the Audio Interchange File Format (.aif) have [10, 11]. Most of the renders had file sizes larger than 4 GB, and the final mixed multi-channel file exceeded it by a large margin (i.e., it had a 99.53 GB file size).

The fully mixed 96-channel sound file was split into its ninety-six constituent individual sound files.⁴ The playback units used for the composition could play back PCM audio files in the Waveform Audio File Format up to a sampling rate of 48 kHz. Thus, the file format and sample rate were converted accordingly.⁵ The individual sound files were loaded onto their corresponding playback units (this procedure and the units themselves are described in greater detail in the next section).

⁴ Scott Wilson's application De-Interleaver was used to de-interleave the multi-channel sound files [12].

⁵ The file format and sampling rate were converted using the built-in Mac OS command-line utility afconvert [13].

4. MULTI-CHANNEL PLAYBACK SYSTEM

As indicated in the section "COMPOSITIONAL METHODOLOGY," the design of the playback system began during the initial phases of the creation of the work. After objective and subjective testing of various hardware components, it was determined to construct each playback unit with a 40 mm, 2.7 W, battery-powered speaker (JBL Micro II) and a 7-inch, 4 GB, Android OS tablet (Datawind UbiSlate 7Ci) resting on a 12-inch plastic tray (Yoshi EMI-420W) (Figure 5).

Figure 5. One of the ninety-six units, each consisting of a tablet, a mini-loudspeaker, and a plastic tray. The circular label on the upper left-hand side of the tablet shows the number of the tablet, which is described below.

Several methods for time-aligning the playback of the units were proposed and explored. The various solutions provided trade-offs between portability, reliability, robustness, battery life, and accuracy. Ultimately, the method outlined in the following two paragraphs was used.

A one-time set-up process was required to prepare the system. First, the entire piece was rendered and mixed (resulting in ninety-six unique, full-length sound files as described in the previous section). Next, the tablets were connected to a host computer using a multi-port USB hub and each sound file was loaded onto a correspondingly numbered tablet. The command-line tool Android Debug Bridge [14] was used to communicate between the host computer and the tablets. Finally, the alarm clock on each tablet was programmed to use the loaded sound file as its sound source.

For each performance, a series of shell scripts was run on the host computer that carried out various tasks on all of the tablets (once again, using the Android Debug Bridge). Each tablet's time was synchronized to the host computer's time,⁶ the alarm clock on every tablet was programmed to go off at show time, stats were generated about all the tablets to review for error-checking purposes, and every tablet was turned off.

⁶ The host computer derived its time from the network via NTP time servers.

Next, the tablets were disconnected from the host computer and all the units were taken to the performance area. The units were set up according to the designed layout and turned on (Figure 1 and Figure 6). Then, at the set time, each of the tablets began playing its respective sound file, thus commencing a performance of the piece.

96 95 94 93 92 91 90 89
81 82 83 84 85 86 87 88
80 79 78 77 76 75 74 73
65 66 67 68 69 70 71 72
64 63 62 61 60 59 58 57
49 50 51 52 53 54 55 56
48 47 46 45 44 43 42 41
33 34 35 36 37 38 39 40
32 31 30 29 28 27 26 25
17 18 19 20 21 22 23 24
16 15 14 13 12 11 10 9
1 2 3 4 5 6 7 8

Figure 6. The physical layout of the tablets. The numbering begins at the bottom of the field with number one and snakes back and forth, up to number ninety-six at the top. Each tablet is labeled with a number as shown in Figure 5, which corresponds to its sound-file number and its position on the field.

5. PERFORMANCE LOCATION

An outdoor performance venue was an integral part of the piece. It was desired to create as much of a free-field condition as possible, so that reflections in the performance venue did not cause excessive reverberation or resonance. This approach ensured that listeners could aurally differentiate between the speaker units without exertion and that certain areas of the frequency bandwidth were not unduly reinforced.

To date, performances of First Vision have taken place in the early morning during the summer on a level, open (i.e., sans obstructions), accessible venue sufficiently far from the sounds of modern machinery.⁷ Performing it in the early morning avoided the heat of the day and provided the additional element of a sunrise that coincided with the climax of the piece (which multiple audience members responded favorably to in their feedback).⁸

⁷ The premiere performance took place from 6:00 a.m. to 7:00 a.m. on the East Lawn at Texas A&M University in College Station, Texas on 4 July 2016. The second performance took place at the same hour and location on 9 July 2016.

⁸ Audience responses to the author from video interviews after the 4 July 2016 concert, and email surveys after the 9 July 2016 concert.
6. RESULTS AND CONCLUSION

As indicated in the section "COMPOSITIONAL METHODOLOGY," the fundamental approach to sound spatialization in this work involved routing each sound of the composition to an individual loudspeaker at any given time. A key advantage of this type of literal sound-source positioning is precise localization across all listening positions. A challenge with this kind of sound positioning is to provide sufficient variation of distances and angles of the sound sources in relation to the listener. This is where the high-density loudspeaker array [15, 16] comes into play. The ninety-six discrete channels of First Vision provide noteworthy spatial resolution and variation in terms of distance and angle.

During the process of composition it was possible to confidently make sound spatialization choices that express the meaning of the piece. That is because it was known that the locations of the sound sources would be consistently perceived by the audience members (as described in the previous paragraph). Thus, in First Vision, the sonic representations of the elements of Smith's account are composed in such a way as to enable them to interact both musically and spatially in the service of the portrayal of his narrative.

Many spatial works have a "sweet spot" (i.e., a preferred listening zone) due to virtual sound-source positioning methods. In First Vision, however, all the listening locations are valid because of the literal sound-source positioning method used. We can liken this to sculpture in relief compared to sculpture in the round. Both types of sculpture provide spatial and visual information, but there are differences. For example, a sculptural relief, by design, is best viewed from a limited number of angles. A free-standing sculpture, on the other hand, can be viewed from any angle. Similarly, in First Vision, listeners are not restricted to a specific, optimal listening area.

Routing each sound to a single loudspeaker channel at a time does not preclude the ability to create sounds that appear to take up more space than a single point source. In fact, by placing sounds that are similar (albeit de-correlated) in proximity, they perceptually merge to create a larger, interconnected sound mass. The resulting percept maintains a well-defined boundary, while each voice remains discernible. By juxtaposing or layering multiple such sound masses, different effects are perceived depending on the nature of the constituent sound types. For example, the sound masses may complement or clash with each other. Edgard Varèse's statement comes to mind:

"When new instruments will allow me to write music as I conceive it, taking the place of the linear counterpoint, the movement of sound masses, of shifting planes, will be clearly perceived. When these sound-masses collide the phenomena of penetration or repulsion will seem to occur" [17].

Another consequential aspect of the spatialization of the piece is the fact that a listener cannot apprehend the entire composition from a single listening position. In other words, there are sounds and groups of sounds that are audible in one area that cannot be heard in other areas, and vice versa. Hearkening back to the earlier sculpture analogy, we can compare this to a sculpture that is apprehended all at once versus a sculpture (or perhaps a garden of sculptures) that cannot be perceived all at one time due to size, but that can be navigated throughout. Likewise, in First Vision, one listener can hear entirely different sounds from another listener depending on where they are situated on the field at any given time.

In summary, the primary spatialization technique used in First Vision proved to be effective for precise sound-source localization, ample spatial variation, articulating the meaning of the work using space, eradicating a "sweet spot," creating large sound masses, and expanding the performance area beyond a single, audible region. Indeed, the sounds of First Vision envelop the audience. Large, dynamic sound masses appear and fill the area. Sound events from proximate speakers are clearly audible even while they contribute to larger sound masses. Sound-source location contributes to the meaning of the work. Sounds that are distinctly discernible to some listeners are inaudible to other listeners across the field. Each listener hears a unique set of textures, timbres, harmonies, and rhythms, depending on their position in space.⁹

⁹ Visit http://www.moellerstudios.org/portfolio/first-vision/ for audio and video selections of First Vision.

Acknowledgments

First Vision was funded by an Arts Research Enhancement Grant from the Academy for the Visual and Performing Arts at Texas A&M University.

7. REFERENCES

[1] "First Vision Accounts." Gospel Topics. Available: https://www.lds.org/topics/first-vision-accounts. Accessed January 4, 2017.

[2] "History, circa June 1839–circa 1841 [Draft 2]." The Joseph Smith Papers. Available: http://www.josephsmithpapers.org/paper-summary/history-circa-june-1839-circa-1841-draft-2/2. Accessed January 4, 2017.

[3] "Church History Maps." Study Helps. Available: https://www.lds.org/scriptures/history-maps. Accessed January 4, 2017.

[4] "Timeline of Events." The Joseph Smith Papers. Available: http://www.josephsmithpapers.org/reference/events. Accessed January 4, 2017.

[5] Aristotle, "Poetics," in Aristotle's Theory of Poetry and Fine Art: With a Critical Text and Translation of The Poetics, S. H. Butcher. New York: Dover Publications, 1951, p. 31.
[6] P. Berg, AC Toolbox 4.5.7. The Hague, Netherlands:
Institute of Sonology, 2014.
[7] B. Vercoe et al., Csound 6.04. Cambridge, Mass.:
MIT Media Lab., 2014.
[8] L. Norskog and C. Bagwell, Sound eXchange (SoX)
14.4.2. 2015.
[9] I. Varga, "Diskin2," The Canonical Csound Reference Manual, version 6.06. Available: http://www.csounds.com/manual/html/diskin2.html. Accessed January 5, 2017.
[10] "CAF File Overview," Apple Core Audio Format Specification 1.0. Available: https://developer.apple.com/library/content/documentation/MusicAudio/Reference/CAFSpec/CAF_overview/CAF_overview.html. Accessed January 5, 2017.
[11] J. Moeller, V. Lazzarini, and R. Dobson, comments on "Using the Array Output Version of Diskin2 With Raw (Headerless) Audio Files," Csound General online forum, comments posted August 29-31, 2015. Available: http://csound.1045644.n5.nabble.com/Using-the-Array-Output-Version-of-Diskin2-With-Raw-Headerless-Audio-Files-td5743341.html. Accessed January 6, 2017.
[12] S. Wilson, De-Interleaver 1.2.0. 2007.
[13] Afconvert 2.0. Cupertino, Calif.: Apple Inc., 2013.
[14] Android Debug Bridge (ADB) 1.0.32. Mountain
View, Calif.: Google Inc., 2015.
[15] E. Lyon, "Music Composition for HDLAs (High Density Loudspeaker Arrays)," lecture at CCRMA Colloquium, Stanford University, Stanford, Calif., May 18, 2016. Available: https://www.youtube.com/watch?v=9xujQrLO0gk and https://ccrma.stanford.edu/events/eric-lyon-music-composition-hdlas-high-density-loudspeaker-arrays. Accessed January 11, 2017.
[16] E. Lyon, "The Future of Spatial Computer Music," in Proc. Int. Computer Music Conf./Sound and Music Computing Conf., Athens, Greece, September 14-20, 2014, pp. 850-854.
[17] E. Varèse and C. Wen-chung, "The Liberation of Sound," Perspectives of New Music, vol. 5, no. 1, pp. 11-19, 1966.
