SGN-14006 / A.K.
Hearing
Sources:
  Rossing. (1990). The science of sound. Chapters 5-7.
  Karjalainen. (1999). Kommunikaatioakustiikka.
  Moore. (1997). An introduction to the psychology of hearing.
Contents:
1. Introduction
2. Ear physiology
3. Masking
4. Sound pressure level
5. Loudness
6. Pitch
7. Spatial hearing

1 Introduction
! Auditory system can be divided into two parts
  Peripheral auditory system (outer, middle, and inner ear)
  Auditory nervous system (in the brain)
! Ear physiology studies the peripheral system
! Psychoacoustics studies the entire sensation: relationships between sound stimuli and the subjective sensation
! Dynamic range of hearing is wide
  the sound pressure ratio of a barely audible to a very loud sound is $1:10^{5}$ (powers $1:10^{10}$, 100 dB)
! Frequency range of hearing varies a lot between individuals
  only a few can hear from 20 Hz to 20 kHz
  sensitivity to low sounds (< 100 Hz) is not very good
  sensitivity to high sounds (> 12 kHz) decreases with age
! Selectivity of hearing
  listener can pick out an instrument from an orchestra
  listener can follow a speaker at a cocktail party
  one can sleep in background noise but still wake up to an abnormal sound

! Perception involves information processing in the brain
  Information about the brain is limited
! Psychoacoustics studies the relationships between sound stimuli and the resulting sensations
  Attempt to model the process of perception
  For example, trying to predict the perceived loudness / pitch / timbre from the acoustic properties of the sound signal
! In a psychoacoustic listening test
  Test subject listens to sounds
  Questions are asked, or the subject is asked to describe her sensations
! The human ear consists of three main parts:
  (1) outer ear, (2) middle ear, (3) inner ear

! Outer ear consists of:
  pinna: gathers sound; direction-dependent response
  auditory canal (ear canal): conveys sound to middle ear
! Figure [Chittka05], label: nerve signal to brain
! Middle ear contains
  Eardrum that transforms sound waves into mechanical vibration
  Tiny auditory bones: hammer (resting against the eardrum, see figure), anvil and stirrup
! The bones transmit eardrum vibrations to the oval window of the inner ear
! Acoustic reflex: when sound pressure level exceeds ~80 dB, eardrum tension increases and the stirrup is pulled away from the oval window
  Protects the inner ear from damage

! The inner ear contains the cochlea: a fluid-filled organ where vibrations are converted into nerve impulses to the brain.
! Cochlea = Greek: snail shell.
! Spiral tube: When stretched out, approximately 30 millimeters long.
! Vibrations on the cochlea's oval window cause hydraulic pressure waves inside the cochlea
! Inside the cochlea there is the basilar membrane
! On the basilar membrane there is the organ of Corti with nerve cells that are sensitive to vibration
! Nerve cells transform movement information into neural impulses in the auditory nerve
! Figure: cochlea stretched out for illustration purposes
  Basilar membrane divides the fluid of the cochlea into separate tunnels

! Different frequencies produce highest amplitude at different sites
! Preliminary frequency analysis happens on the basilar membrane
  Travelling waves:
  When hydraulic pressure waves travel along the cochlea, they move the basilar membrane
! Distributed along the basilar membrane are sensory hair cells that transform membrane movement into neural impulses
! When a hair cell bends, it generates neural impulses
  Impulse rate depends on vibration amplitude and frequency
! Each nerve cell has a characteristic frequency to which it is most responsive (Figure: tuning curves of 6 different cells)

! Masking describes the situation where a weaker but otherwise clearly audible signal (maskee, test tone) becomes inaudible in the presence of a louder signal (masker)
! Masking depends on both the spectral structure of the sounds and their variation over time
! Model of the frequency analysis in the auditory system
  subdivision of the frequency axis into critical bands
  frequency components within the same critical band mask each other easily
  Bark scale: frequency scale that is derived by mapping frequencies to critical band numbers
! Narrowband noise masks a tone (sinusoid) more easily than a tone masks noise
! Masked threshold refers to the raised threshold of audibility caused by the masker
  sounds with a level below the masked threshold are inaudible
  masked threshold in quiet = threshold of hearing in quiet

! Figure: masked thresholds [Herre95]
  masker: narrowband noise around 250 Hz, 1 kHz, 4 kHz
  spreading function: the effect of masking extends to the spectral vicinity of the masker (spreads more towards high frequencies)
! Additivity of masking: joint masked threshold is approximately (but slightly more than) the sum of the components
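The mapping from frequency to critical band number that defines the Bark scale can be illustrated numerically. The slides do not give a formula, so the sketch below assumes the Zwicker and Terhardt (1980) approximation, which is one commonly quoted choice; the function name `hz_to_bark` is made up for this example.

```python
import math

def hz_to_bark(f_hz):
    """Approximate critical-band rate (in Bark) for a frequency in Hz.

    Assumption: Zwicker & Terhardt (1980) approximation, not taken from the slides.
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Centre frequencies of the narrowband maskers mentioned for the masked-threshold figure
for f in (250, 1000, 4000):
    print(f"{f:5d} Hz -> {hz_to_bark(f):5.2f} Bark")

# A 100 Hz step covers far more of the Bark axis at low than at high frequencies
low_step = hz_to_bark(200) - hz_to_bark(100)
high_step = hz_to_bark(4100) - hz_to_bark(4000)
print(f"100 Hz step near 150 Hz: {low_step:.2f} Bark, near 4 kHz: {high_step:.2f} Bark")
```

Under this approximation the critical bands are narrow (in Hz) at low frequencies and wide at high frequencies, which is why equal steps in Hz do not correspond to equal steps in Bark.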
! Forward masking
  masking effect extends to times after the masker is switched off
! Backward masking
  masking extends to times before the masker has been switched on
! Forward/backward masking does not extend far in time
  => simultaneous masking is the more important phenomenon
  (figure labels: backward masking, forward masking)

! A single tone is played, followed by the same tone and a higher frequency tone. HF tone is reduced in intensity first by 12 dB, then by steps of 5 dB. Sequence repeats twice: second time the frequency separation between the tones is increased.
! Attempt to mask higher frequencies
! Attempt to mask lower frequencies (not masked as easily)
! Idea: hide a message in the audio data, keeping the message inaudible yet decodable
! Example
  Here robustness to environmental noise was important

! Sound signal $s(t)$ at time $t$ represents pressure deviation from normal atmospheric pressure
! Sound pressure $p_{\mathrm{RMS}} = \sqrt{E\{s(t)^2\}}$ is the (linear) RMS level of the signal
  $E\{\cdot\}$ denotes expectation (RMS = root mean square)
! Due to the wide dynamic range, the decibel scale is convenient
  $p_{\mathrm{dB}} = 20 \log_{10}(p_{\mathrm{RMS}} / p_0) = L_p$
  where $p_0$ is a reference pressure
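The RMS and decibel definitions above can be tried out on a synthetic signal. This is a minimal sketch: the slides leave the reference pressure unspecified, so the conventional value of 20 micropascals for sound in air is assumed here, the expectation is replaced by a time average over the samples, and the helper names `rms` and `spl_db` are illustrative.

```python
import math

P0 = 20e-6  # reference pressure in Pa (assumption: conventional value for air)

def rms(samples):
    """Root-mean-square value, using a time average in place of the expectation."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def spl_db(p_rms, p0=P0):
    """Sound pressure level in dB relative to p0."""
    return 20.0 * math.log10(p_rms / p0)

# One second of a 1 kHz sinusoid with 1 Pa peak amplitude, sampled at 48 kHz
fs = 48000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
p = rms(tone)
print(f"RMS pressure: {p:.4f} Pa, level: {spl_db(p):.2f} dB")

# A pressure 10^5 times the reference corresponds to a level of 100 dB
print(f"Level at 1e5 * p0: {spl_db(1e5 * P0):.1f} dB")
```

The sinusoid has an RMS value of its peak divided by the square root of two, which with this reference gives a level of about 91 dB; the second print reproduces the 100 dB figure that corresponds to a pressure ratio of 1:10^5.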
! Two sources with 80 dB sound pressure level
  Source signals uncorrelated: together produce 83 dB level
  Sources correlate perfectly (same sound): results in 86 dB level
! Doubling the sound amplitude increases the sound pressure level by 6 dB
  Because: $L_p = 20 \log_{10}(2p/p_0) = 20 \log_{10}(p/p_0) + 20 \log_{10} 2 \approx 20 \log_{10}(p/p_0) + 6$ [dB]
  Equivalent to adding another identical source next to the first one
! Intuitively: if the two sources do not correlate, the components of the two audio signals may amplify or cancel out each other, depending on their relative phases, and hence the level will be only 83 dB

! Loudness describes the subjective level of sound
  Perception of loudness is a relatively complex but consistent phenomenon and one of the central parts of psychoacoustics
! The loudness of a sound can be compared to a standardized reference tone, for example a 1000 Hz sinusoidal tone
  Loudness level (phon) is defined to be the sound pressure level (dB) of a 1000 Hz sinusoid that has the same subjective loudness as the target sound
  For example, if the heard sound is perceived as equally loud as a 40 dB 1 kHz sinusoid, the loudness level is 40 phons
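The 83 dB and 86 dB figures quoted for two 80 dB sources can be checked with a few lines of arithmetic. The sketch below assumes the usual idealizations: for uncorrelated signals the cross terms average to zero, so the mean-square pressures (powers) add, while for identical in-phase signals the pressures themselves add. The function names are illustrative.

```python
import math

def combine_uncorrelated(levels_db):
    """Total level when the sources are uncorrelated: powers add."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)

def combine_identical(levels_db):
    """Total level when the sources emit the same in-phase signal: pressures add."""
    total_pressure = sum(10.0 ** (level / 20.0) for level in levels_db)
    return 20.0 * math.log10(total_pressure)

two_sources = [80.0, 80.0]
uncorrelated_total = combine_uncorrelated(two_sources)
identical_total = combine_identical(two_sources)
print(f"Uncorrelated: {uncorrelated_total:.1f} dB")
print(f"Identical:    {identical_total:.1f} dB")
```

Doubling the power adds about 3 dB and doubling the pressure adds about 6 dB, which matches the two cases listed for the 80 dB sources.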
! For a sinusoidal tone
  Fundamental frequency = sinusoidal frequency
  Pitch ≈ sinusoidal frequency
! Harmonic sound
  Trumpet sound:
  * Fundamental frequency F = 262 Hz
  * Period 1/F = 3.8 ms

! Two competing theories have been proposed to explain pitch perception
  Place theory: Peak activity along the basilar membrane determines pitch (fails to explain missing fundamental)
  Periodicity theory: Pitch depends on rate, not place, of response. Neurons fire in sync with signals
! The real mechanism is a combination of the above
  Sound is subdivided into subbands (critical bands)
  Periodicity of the amplitude envelope (see lowest panel) is analyzed within bands
  Results are combined across bands
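The periodicity idea can be illustrated with a toy example: a signal that contains only harmonics 2F, 3F and 4F of F = 262 Hz still repeats every 1/F, so a periodicity analysis recovers the missing fundamental. The sketch below uses plain autocorrelation of the full-band signal as a stand-in for periodicity analysis. It is not the band-wise envelope analysis outlined above and not a model of the auditory system; the sample rate, search range and function names are assumptions made for illustration.

```python
import math

FS = 16000   # sample rate in Hz (assumed)
F0 = 262.0   # fundamental frequency of the trumpet example
N = 1600     # number of samples (0.1 s)

# Harmonics 2, 3 and 4 only: the fundamental itself is absent from the signal
signal = [sum(math.sin(2 * math.pi * h * F0 * n / FS) for h in (2, 3, 4))
          for n in range(N)]

def autocorr(x, lag):
    """Unnormalized autocorrelation of x at the given lag (in samples)."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

def estimate_period(x, fs, f_min=100.0, f_max=500.0):
    """Return the period (s) whose lag maximizes the autocorrelation in the search range."""
    lags = range(int(fs / f_max), int(fs / f_min) + 1)
    best_lag = max(lags, key=lambda lag: autocorr(x, lag))
    return best_lag / fs

period = estimate_period(signal, FS)
print(f"Estimated period: {period * 1000:.2f} ms, i.e. about {1 / period:.0f} Hz")
```

The estimate lands close to the 3.8 ms period of the 262 Hz fundamental even though no energy is present at that frequency, which is the observation the place theory alone fails to explain.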
(Figure: horizontal axis frequency / Bark)
(Table: physical parameters: pressure, frequency, spectrum, duration, envelope)
! The most important auditory cues for localizing sound sources in space are
  1. Interaural time difference
  2. Interaural intensity difference
  3. Direction-dependent filtering of the sound spectrum by head and pinnae
! Terms
  Monaural: with one ear
  Binaural: with two ears
  Interaural: between the ears (interaural time difference etc.)
  Lateralization: localizing a source in the horizontal plane

! Directional hearing works to some extent even with one ear
! Head and pinna form a direction-dependent filter
  Direction-dependent changes in the spectrum of the sound arriving in the ear can be described with HRTFs
  HRTF = head-related transfer function
! HRTFs are crucial for localizing sources in the median plane (vertical localization)
! HRTFs can be measured by recording
  Sound emitted by a source
  Sounds arriving to the auditory canal or eardrum (transfer function of the auditory canal does not vary along with direction)
! In practice
  left: microphone in the ear of a test subject, OR
  right: head and torso simulator

! Experimenting with sinusoidal tones helps to understand the localization of more complex sounds
! Angle-of-arrival perception for sinusoids below 750 Hz is based mainly on interaural time difference
! Interaural time difference is useful only up to 750 Hz
  Above that, the time difference is ambiguous, since there are several wavelengths within the time difference
  Moving the head (or source movement) helps: the time difference can then be used up to 1500 Hz
! At higher frequencies (> 750 Hz) the auditory system utilizes interaural intensity difference
  Head causes an acoustic shadow (sound level is lower behind the head)
  Works especially at high frequencies

! Complex sounds refer to sounds that
  involve a number of different frequency components and
  vary over time
! Localizing sound sources is typically a result of combining all the above-described mechanisms
  1. Interaural time difference (most important)
  2. Interaural intensity difference
  3. HRTFs
! Wideband noise: directional hearing works well
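The 750 Hz limit for the interaural time difference can be motivated with a rough calculation: the phase comparison between the ears stops being unambiguous once the time difference exceeds half a period of the tone. The sketch below assumes a simple straight-path model in which the time difference equals d * sin(azimuth) / c, with an effective ear-to-ear path length d = 0.23 m and speed of sound c = 343 m/s. The model, both constants and the function names are illustrative assumptions, not values from the slides.

```python
import math

C = 343.0   # speed of sound in m/s (assumed)
D = 0.23    # effective path-length difference between the ears in m (assumed)

def itd_seconds(azimuth_deg, d=D, c=C):
    """Interaural time difference for a distant source under a straight-path model."""
    return d * math.sin(math.radians(azimuth_deg)) / c

def max_unambiguous_freq(itd):
    """Frequency at which half a period equals the given (nonzero) time difference."""
    return 1.0 / (2.0 * itd)

# A source directly to one side (90 degrees) gives the largest time difference
itd_max = itd_seconds(90.0)
f_limit = max_unambiguous_freq(itd_max)
print(f"Largest time difference: {itd_max * 1e3:.2f} ms")
print(f"Half-period limit: {f_limit:.0f} Hz")
```

With these assumed values the limit comes out in the neighbourhood of 750 Hz; a different head size or a more detailed model of the path around the head would shift the number somewhat.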