John D. Moeller
Texas A&M University
jdmoeller@tamu.edu
1 JS-H 1:5-20, Pearl of Great Price.
instruments (Table 1). In many cases, several sound types
were recorded from a single sound source. In every case,
several recordings of each sound type were made. The
final product was a library of several iterations each of
fifty-one total sound types, resulting in several hundred
samples.3
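The footnoted batch processing of the sample library can be sketched as command-line construction. SoX's real "pitch" effect (measured in cents) is used below, but the file names, directory layout, and tuning amounts are illustrative assumptions, not the actual batch settings used for the piece.

```python
from pathlib import Path

def sox_tune_cmd(infile: str, outfile: str, cents: int) -> list[str]:
    # SoX's "pitch" effect shifts pitch by a number of cents.
    return ["sox", infile, outfile, "pitch", str(cents)]

def batch_tune(sample_dir: str, out_dir: str, cents: int) -> list[list[str]]:
    # Build one command per .wav sample in the (hypothetical) directory.
    return [sox_tune_cmd(str(p), str(Path(out_dir) / p.name), cents)
            for p in sorted(Path(sample_dir).glob("*.wav"))]

cmd = sox_tune_cmd("flute_01.wav", "tuned/flute_01.wav", -30)
```

Editing and filtering passes would be built the same way, swapping in other SoX effects.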
[Figure 2 shows a timeline from 0 to 60 minutes with its divisions and sub-divisions.]

Figure 2. The total duration was divided and sub-divided according to the golden ratio. Dashed lines indicate sub-divisions within sections.

Key elements in the narrative were intentionally associated with the largest divisions of the total duration. They were connected with movement beginnings and musical climaxes, such as at 14:00, 23:00, 37:00, 46:00, and 51:00 (Figure 3).

[Figure 3 shows the hour-long timeline divided into Movements I-III and Sections 1-26.]

Figure 3. The resulting structure, including dashed lines showing structural elements within sections.

Concert flute      Bass drum
Alto flute         Bongo drum
Bass flute         Timbales
Recorder           Triangle
Quena              Cymbal
Glockenspiel       Claves
Xylophone          Castanets
Vibraphone         Guiro
Marimba            Wood block
Timpani            Tam-tam
Antique cymbals    Voice
Crotales           Cello
Gong               Pink noise
Snare drum

Table 1. The sound sources used in the piece. Many of the sound sources provided multiple sound types.
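The golden-ratio division shown in Figure 2 can be checked numerically. The recursive helper below is a sketch of one way to generate such split points; the recursion depth and the choice to always split at 1/φ from the left are assumptions, not the paper's exact procedure. Splitting the hour once lands near 37:00, and the complementary split near 23:00, matching two of the climax points.

```python
PHI = (1 + 5 ** 0.5) / 2  # golden ratio, ~1.618

def golden_splits(start: float, end: float, depth: int) -> list[float]:
    """Recursively split [start, end] at the golden ratio, collecting split points."""
    if depth == 0:
        return []
    split = start + (end - start) / PHI
    return (golden_splits(start, split, depth - 1)
            + [split]
            + golden_splits(split, end, depth - 1))

print(round(60 / PHI, 2))       # major division of 60 minutes
print(round(60 - 60 / PHI, 2))  # complementary division
```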
2.2 Descriptions of Each Section

Determining the sonic approach to each section began at the same time as delineating the structure and form. Algorithmic processes were used for many compositional parameters, including attack time, amplitude, duration, spatial position, frequency selection, timbre selection, and time-varying effects such as amplitude modulation and delay (for a variable comb filter). Pitch was organized throughout the piece using a just-intonation system based on the harmonic series. Sounds were spatialized by routing them to discrete channels. Additional spatial effects were embedded in the sounds themselves through signal processing (i.e., Doppler shift, comb filtering, and reverberation).2

Eventually a refined compositional plan was created that addressed every section of the piece. This plan was used for rendering the piece and for the creation of the listening guide. As stated in the INTRODUCTION, the listening guide was provided for each audience member during performances of the piece. The guide is a section-by-section booklet that includes the section number and section start time (in minutes), constituent sound types, accompanying text, compositional methodology, and spatial layout (Figure 4).

3. RENDERING AND MIXING THE PIECE

The composition was rendered and mixed using the audio programming language Csound. Each section was rendered independently and then mixed later. This section describes these two processes.

Every section had one or more Csound files to process the sound samples for the section according to the compositional plan. The output of each Csound file was rendered to an interleaved, multi-channel audio file. As an example, the piece begins with each of the ninety-six loudspeakers emitting a unique succession of bowed string tones. Every channel outputs an independent series of tones, yet all the channels are interconnected according to a specific harmonic plan. Thus, the code in this example yielded a 96-channel, interleaved sound file. As another example, pitched wood percussion sounds are scattered throughout thirty-two channels beginning in the second section of the piece. Once again, each channel is unique, but all channels are interconnected according to various compositional parameters. In this case, a 32-channel, interleaved sound file was rendered. Ultimately, forty-three such multi-channel files of varying channel counts were rendered for the twenty-six sections of the piece.

2 In some instances, Paul Berg's software, AC Toolbox, was used to algorithmically generate Csound scores, or to rapidly prototype algorithmic ideas. In other cases, algorithmic processes were coded directly into the Csound files [6, 7].

3 Sound eXchange (SoX) was used to carry out various processes on sound samples in batches, such as editing, filtering, and tuning [8].
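Section 2.2 states that pitch was organized with a just-intonation system based on the harmonic series. A minimal sketch of such a system follows; the number of partials, the octave-folding step, and the use of a B2 fundamental (the pitch named later in the listening-guide excerpt) are illustrative choices, not the piece's documented tuning procedure.

```python
def harmonic_frequencies(fundamental_hz: float, partials: int) -> list[float]:
    """Frequencies of the first n partials of the harmonic series."""
    return [fundamental_hz * n for n in range(1, partials + 1)]

def octave_reduce(freq: float, low: float) -> float:
    """Fold a frequency down into the octave starting at `low` (a common
    just-intonation convenience; the paper does not describe this step)."""
    while freq >= 2 * low:
        freq /= 2
    return freq

b2 = 123.47  # approximate frequency of B2, in Hz
series = harmonic_frequencies(b2, 8)
```

Because every partial is an integer multiple of the fundamental, all intervals drawn from such a series are just (small-integer) ratios.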
Section 14 (29:00)

Whispered text

At length I came to the conclusion that I must either remain in darkness and confusion, or else I must do as James directs, that is, ask of God. I at length came to the determination to ask of God, concluding that if he gave wisdom to them that lacked wisdom, and would give liberally, and not upbraid, I might venture.

Joseph's sound begins once again. It fades in over a period of forty-five seconds. It emerges in twenty-four speakers in the center of the field. The sounds were generated as described previously, using the same fundamental pitch (i.e., B2). However, the lower and upper bounds of the random-walk process, described in Mvt. I, begin at 2 and 6 and increase over time according to the shape of an exponential curve until they reach 28 and 32 by the end of Mvt. II.

Section 15 (32:00)

Bass flute, alto flute, concert flute, quena, and recorder

Figure 4. A selection from the listening guide provided for the audience members.
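The random-walk process with exponentially widening bounds, as described in the listening-guide excerpt, can be sketched as follows. Only the bound endpoints (2, 6) and (28, 32) come from the text; the curve constant, step size, and starting value are assumptions for illustration.

```python
import math
import random

def bound(t: float, start: float, end: float, curve: float = 3.0) -> float:
    """Interpolate from `start` to `end` along an exponential-shaped curve,
    with t in [0, 1]. The curve constant is an illustrative assumption."""
    shape = (math.exp(curve * t) - 1) / (math.exp(curve) - 1)
    return start + (end - start) * shape

def bounded_walk(steps: int, seed: int = 0) -> list[float]:
    """Random walk whose lower/upper bounds widen from (2, 6) to (28, 32),
    echoing the process described in the listening-guide excerpt."""
    rng = random.Random(seed)
    value = 4.0
    out = []
    for i in range(steps):
        t = i / max(steps - 1, 1)
        lo, hi = bound(t, 2, 28), bound(t, 6, 32)
        value = min(max(value + rng.uniform(-1, 1), lo), hi)
        out.append(value)
    return out
```

Early in the process the walk is confined to a narrow band; by the end it roams a band of the same width but much higher, which matches the described widening over Mvt. II.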
A Csound file was then created for mixing the aforementioned multi-channel files. An instrument was created for every one of the forty-three renders, using the diskin2 opcode for each (using the array-output mode, which does not appear to have an upper channel limit) [9]. Each channel of every multi-channel render was assigned to its specific output speaker location. The global start time, duration, amplitude, and amplitude envelope of each multi-channel render was also appointed. For example, section one's file was a 96-channel sound file and thus was assigned accordingly throughout all ninety-six speaker positions. A global start time, amplitude level, fade-in, and fade-out was designated for it. Section two has a 32-channel component that was assigned to speaker positions thirty-three to sixty-four. Once again, a global start time and amplitude envelope was assigned to it. Similar processes were carried out for all the multi-channel renders, with occasional tweaks to individual channels as necessary (as it was also possible to adjust the individual channels within each multi-channel file). Ultimately, a 96-channel, 96kHz, 24-bit, hour-long audio file was generated.

Two of the challenges during both rendering and mixing were monitoring the high-channel-count audio files and managing the multi-gigabyte file sizes. The monitoring issue came about because a 96-channel playback system was not readily available for instant review of the compositional process. This point was addressed by creating stereo versions of the sections (either in whole or in part). This approach provided sufficient information to make useful compositional choices. Additionally, the Csound console messages provided helpful amplitude information. As far as file size was concerned, the resulting multi-channel audio files were rendered using the Core Audio Format (.caf). It was necessary to use the Core Audio Format because it does not have the 4GB file size limitation that audio file formats such as the Waveform Audio File Format (.wav) or the Audio Interchange File Format (.aif) have [10, 11]. Most of the renders had file sizes larger than 4GB, and the final mixed multi-channel file certainly exceeded it by a large margin (i.e., it had a 99.53 GB file size).

The fully mixed 96-channel sound file was split into its ninety-six constituent individual sound files.4 The playback units used for the composition could play back PCM audio files in the Waveform Audio File Format up to a sampling rate of 48kHz.

4 Scott Wilson's application De-Interleaver was used to de-interleave the multi-channel sound files [12].
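The 99.53 GB figure follows directly from the render parameters, assuming decimal gigabytes:

```python
channels = 96
sample_rate = 96_000     # samples per second
bytes_per_sample = 3     # 24-bit PCM
duration_s = 60 * 60     # one hour

size_bytes = channels * sample_rate * bytes_per_sample * duration_s
print(size_bytes / 1e9)  # well beyond the 4 GB WAV/AIFF ceiling
```

At roughly 99.5 GB, the mixed file exceeds the 4 GB WAV/AIFF limit by a factor of about twenty-five, which is why the Core Audio Format was required.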
The file format and sample rate were therefore converted accordingly.5 The individual sound files were loaded onto their corresponding playback units (this procedure and the units themselves are described in greater detail in the next section).

4. MULTI-CHANNEL PLAYBACK SYSTEM

As indicated in the section "Compositional Methodology," the design of the playback system began during the initial phases of the creation of the work. After objective and subjective testing of various hardware components, it was determined to construct each playback unit with a 40mm, 2.7W, battery-powered speaker (JBL Micro II) and a 7-inch, 4GB, Android OS tablet (Datawind UbiSlate 7Ci) resting on a 12-inch plastic tray (Yoshi EMI-420W) (Figure 5).

[Figure 5 shows a photograph of one playback unit.]

Figure 5. One of the ninety-six units, each consisting of a tablet, a mini-loudspeaker, and a plastic tray. The circular label on the upper left-hand side of the tablet shows the number of the tablet, which is described below.

Several methods for time-aligning the playback of the units were proposed and explored. The various solutions provided trade-offs between portability, reliability, robustness, battery life, and accuracy. Ultimately, the method outlined in the following two paragraphs was used.

A one-time set-up process was required to prepare the system. First, the entire piece was rendered and mixed (resulting in ninety-six unique, full-length sound files as described in the previous section). Next, the tablets were connected to a host computer using a multi-port USB hub and each sound file was loaded onto a correspondingly numbered tablet. The command-line tool Android Debug Bridge [14] was used to communicate between the host computer and the tablets. Finally, the alarm clock on each tablet was programmed to use the loaded sound file as its sound source.

For each performance, a series of shell scripts was run on the host computer that carried out various tasks on all of the tablets (once again, using the Android Debug Bridge). Each tablet's time was synchronized to the host computer's time,6 the alarm clock on every tablet was programmed to go off at show time, stats were generated about all the tablets to review for error-checking purposes, and every tablet was turned off.

Next, the tablets were disconnected from the host computer and all the units were taken to the performance area. The units were set up according to the designed layout and turned on (Figure 1 and Figure 6). Then, at the set time, each of the tablets began playing its respective sound file, thus commencing a performance of the piece.

96 95 94 93 92 91 90 89
81 82 83 84 85 86 87 88
80 79 78 77 76 75 74 73
65 66 67 68 69 70 71 72
64 63 62 61 60 59 58 57
49 50 51 52 53 54 55 56
48 47 46 45 44 43 42 41
33 34 35 36 37 38 39 40
32 31 30 29 28 27 26 25
17 18 19 20 21 22 23 24
16 15 14 13 12 11 10 9
 1  2  3  4  5  6  7  8

Figure 6. The physical layout of the tablets. The numbering begins at the bottom of the field with number one and snakes back and forth, up to number ninety-six at the top. Each tablet is labeled with a number as shown in Figure 5, which corresponds to its sound-file number and its position on the field.

5. PERFORMANCE LOCATION

An outdoor performance venue was an integral part of the piece. The aim was to create as much of a free-field condition as possible, so that reflections in the performance venue did not cause excessive reverberation or resonance. This approach ensured that listeners could aurally differentiate between the speaker units without exertion, and that certain areas of the frequency bandwidth were not unduly reinforced.

To date, performances of First Vision have taken place in the early morning during the summer on a level, open (i.e., sans obstructions), accessible venue, which was sufficiently far from the sounds of modern machinery.7 Performing it in the early morning avoided the heat of the day and provided the additional element of a sunrise that coincided with the climax of the piece (which multiple audience members responded favorably to in their feedback).8

5 The file format and sampling rate were converted using the built-in Mac OS command-line utility afconvert [13].

6 The host computer derived its time from the network via NTP time servers.

7 The premiere performance took place from 6:00 a.m. to 7:00 a.m. on the East Lawn at Texas A&M University in College Station, Texas on 4 July 2016. The second performance took place at the same hour and location on 9 July 2016.

8 Audience responses to the author from video interviews after the 4 July 2016 concert, and email surveys after the 9 July 2016 concert.
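The snake-style numbering in Figure 6 maps cleanly to grid coordinates. The function below is a sketch of that mapping; the (row, col) convention, with row 0 at the bottom of the field and col 0 at the left, is my own choice.

```python
def tablet_position(n: int, cols: int = 8) -> tuple[int, int]:
    """Map a tablet number (1-96) to its (row, col) on the field. Row 0 is
    the bottom row; numbering snakes back and forth as in Figure 6."""
    row, offset = divmod(n - 1, cols)
    col = offset if row % 2 == 0 else cols - 1 - offset
    return row, col
```

For example, tablet 8 ends the bottom row at the right edge, and tablet 9 sits directly above it because the numbering reverses direction on each row.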
6. RESULTS AND CONCLUSION

As indicated in the section "Compositional Methodology," the fundamental approach to sound spatialization in this work involved routing each sound of the composition to an individual loudspeaker at any given time. A key advantage to this type of literal sound-source positioning is precise localization across all listening positions. A challenge with this kind of sound positioning is to provide sufficient variation of distances and angles of the sound sources in relation to the listener. This is where the high-density loudspeaker array [15, 16] comes into play. The ninety-six discrete channels of First Vision provide noteworthy spatial resolution and variation in terms of distance and angle.

During the process of composition it was possible to confidently make sound spatialization choices that express the meaning of the piece. That is because it was known that the locations of the sound sources would be consistently perceived by the audience members (as described in the previous paragraph). Thus, in First Vision, the sonic representations of the elements of Smith's account are composed in such a way as to enable them to interact both musically and spatially in the service of the portrayal of his narrative.

Many spatial works have a "sweet spot" (i.e., a preferred listening zone) due to virtual sound-source positioning methods. In First Vision, however, all the listening locations are valid because of the literal sound-source positioning method used. We can liken this to sculpture in relief compared to sculpture in the round. Both types of sculpture provide spatial and visual information, but there are differences. For example, a sculptural relief, by design, is best viewed from a limited number of angles. A free-standing sculpture, on the other hand, is viewed from any angle. Similarly, in First Vision, listeners are not restricted to a specific, optimal listening area.

Routing each sound to a single loudspeaker channel at a time does not preclude the ability to create sounds that appear to take up more space than a single point-source. In fact, by placing sounds that are similar (albeit de-correlated) in proximity, they perceptually merge to create a larger, interconnected sound mass. The resulting percept maintains a well-defined boundary, while each voice remains discernible. By juxtaposing or layering multiple such sound masses, different effects are perceived depending on the nature of the constituent sound types. For example, the sound masses may complement or clash with each other. Edgard Varèse's statement comes to mind:

"When new instruments will allow me to write music as I conceive it, taking the place of the linear counterpoint, the movement of sound masses, of shifting planes, will be clearly perceived. When these sound-masses collide the phenomena of penetration or repulsion will seem to occur" [17].

Another consequential aspect of the spatialization of the piece is the fact that a listener cannot apprehend the entire composition from a single listening position. In other words, there are sounds and groups of sounds that are audible in one area that cannot be heard in other areas, and vice versa. Hearkening back to the earlier sculpture analogy, we can compare this to a sculpture that is apprehended all at once versus a sculpture (or perhaps a garden of sculptures) that cannot be perceived all at one time due to size, but that can be navigated throughout. Likewise, in First Vision, one listener can hear entirely different sounds from another listener depending on where they are situated on the field at any given time.

In summary, the primary spatialization technique used in First Vision proved to be effective for precise sound-source localization, ample spatial variation, articulating the meaning of the work using space, eradicating a "sweet spot," creating large sound masses, and expanding the performance area beyond a single, audible region. Indeed, the sounds of First Vision envelop the audience. Large, dynamic sound masses appear and fill the area. Sound events from proximate speakers are clearly audible even while they contribute to larger sound masses. Sound-source location contributes to the meaning of the work. Sounds that are distinctly discernible to some listeners are inaudible to other listeners across the field. Each listener hears a unique set of textures, timbres, harmonies, and rhythms, depending on their position in space.9

Acknowledgments

First Vision was funded by an Arts Research Enhancement Grant from the Academy for the Visual and Performing Arts at Texas A&M University.

9 Visit http://www.moellerstudios.org/portfolio/first-vision/ for audio and video selections of First Vision.

7. REFERENCES

[1] "First Vision Accounts," Gospel Topics. Available: https://www.lds.org/topics/first-vision-accounts. Accessed January 4, 2017.

[2] "History, circa June 1839-circa 1841 [Draft 2]," The Joseph Smith Papers. Available: http://www.josephsmithpapers.org/paper-summary/history-circa-june-1839-circa-1841-draft-2/2. Accessed January 4, 2017.

[3] "Church History Maps," Study Helps. Available: https://www.lds.org/scriptures/history-maps. Accessed January 4, 2017.

[4] "Timeline of Events," The Joseph Smith Papers. Available: http://www.josephsmithpapers.org/reference/events. Accessed January 4, 2017.

[5] Aristotle, "Poetics," in Aristotle's Theory of Poetry and Fine Art: With a Critical Text and Translation of The Poetics, S. H. Butcher. New York: Dover Publications, 1951, p. 31.
[6] P. Berg, AC Toolbox 4.5.7. The Hague, Netherlands: Institute of Sonology, 2014.

[7] B. Vercoe et al., Csound 6.04. Cambridge, Mass.: MIT Media Lab, 2014.

[8] L. Norskog and C. Bagwell, Sound eXchange (SoX) 14.4.2. 2015.

[9] I. Varga, "Diskin2," The Canonical Csound Reference Manual, version 6.06. Available: http://www.csounds.com/manual/html/diskin2.html. Accessed January 5, 2017.

[10] "CAF File Overview," Apple Core Audio Format Specification 1.0. Available: https://developer.apple.com/library/content/documentation/MusicAudio/Reference/CAFSpec/CAF_overview/CAF_overview.html. Accessed January 5, 2017.

[11] J. Moeller, V. Lazzarini, and R. Dobson, "Comments on Using the Array Output Version of Diskin2 With Raw (Headerless) Audio Files," Csound General online forum, comments posted August 29-31, 2015. Available: http://csound.1045644.n5.nabble.com/Using-the-Array-Output-Version-of-Diskin2-With-Raw-Headerless-Audio-Files-td5743341.html. Accessed January 6, 2017.

[12] S. Wilson, De-Interleaver 1.2.0. 2007.

[13] Afconvert 2.0. Cupertino, Calif.: Apple Inc., 2013.

[14] Android Debug Bridge (ADB) 1.0.32. Mountain View, Calif.: Google Inc., 2015.

[15] E. Lyon, "Music Composition for HDLAs (High Density Loudspeaker Arrays)," lecture at CCRMA Colloquium, Stanford University, Stanford, Calif., May 18, 2016. Available: https://www.youtube.com/watch?v=9xujQrLO0gk and https://ccrma.stanford.edu/events/eric-lyon-music-composition-hdlas-high-density-loudspeaker-arrays. Accessed January 11, 2017.

[16] E. Lyon, "The Future of Spatial Computer Music," in Proc. Int. Computer Music Conf./Sound and Music Computing Conf., Athens, Greece, September 14-20, 2014, pp. 850-854.

[17] E. Varèse and C. Wen-chung, "The Liberation of Sound," Perspectives of New Music, vol. 5, no. 1, pp. 11-19, 1966.