

1 Introduction

Sources:
- Rossing. (1990). The science of sound. Chapters 5-7.
- Karjalainen. (1999). Kommunikaatioakustiikka.
- Moore. (1997). An introduction to the psychology of hearing.

Contents:
1. Introduction
2. Ear physiology
3. Masking
4. Sound pressure level
5. Loudness
6. Pitch
7. Spatial hearing

Hearing

- The auditory system can be divided into two parts:
  - the peripheral auditory system (outer, middle, and inner ear)
  - the auditory nervous system (in the brain)
- Ear physiology studies the peripheral system
- Psychoacoustics studies the entire sensation: the relationships between sound stimuli and the subjective sensation


1.1 Auditory system

- Dynamic range of hearing is wide
  - the ratio of a very loud to a barely audible sound pressure is 1:10^5 (in power 1:10^10, i.e. 100 dB, since 20 log10(10^5) = 100 dB)
- Frequency range of hearing varies a lot between individuals
  - only a few can hear from 20 Hz to 20 kHz
  - sensitivity to low sounds (< 100 Hz) is not very good
  - sensitivity to high sounds (> 12 kHz) decreases with age
- Selectivity of hearing
  - a listener can pick out an instrument from among an orchestra
  - a listener can follow a speaker at a cocktail party
  - one can sleep in background noise but still wake up to an abnormal sound

1.2 Psychoacoustics

- Perception involves information processing in the brain
  - information about the brain is limited
- Psychoacoustics studies the relationships between sound stimuli and the resulting sensations
  - attempts to model the process of perception
  - for example, trying to predict the perceived loudness / pitch / timbre from the acoustic properties of the sound signal
- In a psychoacoustic listening test
  - a test subject listens to sounds
  - questions are asked, or the subject is asked to describe her sensations

2 Ear physiology

- The human ear consists of three main parts:
  (1) outer ear, (2) middle ear, (3) inner ear
- Figure: anatomy of the ear, with the nerve signal to the brain [Chittka05]

2.1 Outer ear

- The outer ear consists of:
  - pinna - gathers sound; direction-dependent response
  - auditory canal (ear canal) - conveys sound to the middle ear


2.2 Middle ear

- The middle ear contains
  - the eardrum, which transforms sound waves into mechanical vibration
  - the tiny auditory bones: hammer (resting against the eardrum, see figure), anvil, and stirrup
- The bones transmit the eardrum vibrations to the oval window of the inner ear
- Acoustic reflex: when the sound pressure level exceeds ~80 dB, eardrum tension increases and the stirrup is pulled away from the oval window
  - protects the inner ear from damage

2.3 Inner ear, cochlea

- The inner ear contains the cochlea: a fluid-filled organ where vibrations are converted into nerve impulses to the brain
- Cochlea = Greek: snail shell
- Spiral tube: when stretched out, approximately 30 millimeters long
- Vibrations of the cochlea's oval window cause hydraulic pressure waves inside the cochlea
- Inside the cochlea there is the basilar membrane
- On the basilar membrane there is the organ of Corti, with nerve cells that are sensitive to vibration
- The nerve cells transform the movement into neural impulses in the auditory nerve

2.4 Basilar membrane

- Figure: cochlea stretched out for illustration purposes
  - the basilar membrane divides the fluid of the cochlea into separate tunnels
- When hydraulic pressure waves travel along the cochlea, they move the basilar membrane

Basilar membrane (continued)

- Different frequencies produce the highest amplitude at different sites of the membrane
- Preliminary frequency analysis happens on the basilar membrane
- Travelling waves: figure shows the best frequency (Hz) at each position along the membrane


2.5 Sensory hair cells

- Distributed along the basilar membrane are sensory hair cells that transform membrane movement into neural impulses
- When a hair cell bends, it generates neural impulses
  - the impulse rate depends on the vibration amplitude and frequency
- Each nerve cell has a characteristic frequency to which it is most responsive (figure: tuning curves of 6 different cells)

3 Masking

- Masking describes the situation where a weaker but clearly audible signal (maskee, test tone) becomes inaudible in the presence of a louder signal (masker)
- Masking depends on both the spectral structure of the sounds and their variation over time

3.1 Masking in frequency domain

- Model of the frequency analysis in the auditory system
  - subdivision of the frequency axis into critical bands
  - frequency components within the same critical band mask each other easily
  - Bark scale: a frequency scale derived by mapping frequencies to critical-band numbers (a common conversion formula is sketched below)
- Narrowband noise masks a tone (sinusoid) more easily than a tone masks noise
- Masked threshold refers to the raised threshold of audibility caused by the masker
  - sounds with a level below the masked threshold are inaudible
  - masked threshold in quiet = threshold of hearing in quiet

Masking in frequency domain (continued)

- Figure: masked thresholds [Herre95]
  - masker: narrowband noise around 250 Hz, 1 kHz, 4 kHz
  - spreading function: the effect of masking extends to the spectral vicinity of the masker (it spreads more towards high frequencies)
- Additivity of masking: the joint masked threshold is approximately (but slightly more than) the sum of the components
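The slides only name the Bark scale; as a concrete illustration, here is a minimal sketch using the widely used Zwicker-Terhardt style analytic approximation for converting frequency to critical-band rate (the exact formula is an assumption, not taken from the slides):

```python
import numpy as np

def hz_to_bark(f_hz):
    """Critical-band rate (Bark) for a frequency in Hz.

    Zwicker & Terhardt style analytic approximation; used here only for
    illustration, since the slides just name the Bark scale.
    """
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

# Frequencies roughly one critical band apart differ by about 1 Bark:
print(hz_to_bark([100.0, 500.0, 1000.0, 4000.0, 16000.0]))
```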


3.2 Masking in time domain

- Forward masking
  - the masking effect extends to times after the masker is switched off
- Backward masking
  - the masking effect extends to times before the masker is switched on
- Forward/backward masking does not extend far in time
  ⇒ simultaneous masking is the more important phenomenon
- Figure: backward and forward masking regions around the masker

Masking: Examples

- A single tone is played, followed by the same tone and a higher-frequency tone; the HF tone is reduced in intensity first by 12 dB, then in steps of 5 dB
- The sequence repeats twice: the second time, the frequency separation between the tones is increased
- Attempt to mask higher frequencies
- Attempt to mask lower frequencies (not masked as easily)

Application to audio steganography

- Idea: hide a message in the audio data, keeping the message inaudible yet decodable
- Example
  - here, robustness to environmental noise was important

4 Sound pressure level

- A sound signal s(t) represents the pressure deviation at time t from the normal atmospheric pressure
- The sound pressure p_RMS = sqrt(E{s(t)^2}) is the (linear) RMS level of the signal
  - E{.} denotes expectation (RMS = root mean square)
- Due to the wide dynamic range, the decibel scale is convenient:
  Lp = p_dB = 20 log10(p_RMS / p0),
  where p0 is a reference pressure
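A minimal sketch of these definitions (the test signal and variable names are made up for illustration; the 20 µPa reference pressure is the one introduced in the next section):

```python
import numpy as np

P0 = 20e-6  # reference pressure p0 = 20 micropascals (see Section 4.1)

def spl_db(s):
    """Sound pressure level Lp = 20*log10(p_RMS / p0) for a pressure signal s(t) in pascals."""
    p_rms = np.sqrt(np.mean(s ** 2))  # sample estimate of sqrt(E{s(t)^2})
    return 20.0 * np.log10(p_rms / P0)

# Example: a 1 kHz sine with 1 Pa amplitude -> p_RMS = 1/sqrt(2) Pa, about 91 dB SPL
fs = 48000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 1000 * t)
print(f"{spl_db(s):.1f} dB SPL")
```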


4.1 Threshold of hearing and dB scale

- Threshold of hearing
  - the weakest audible sound pressure at 1 kHz is 20 µPa, which has been chosen as the reference level p0 of the dB scale
  - Lp = 20 log10(p/p0) = 10 log10(p^2/p0^2)
- Threshold of pain
  - the loudest sound that the auditory system can meaningfully deal with
  - about 130 dB @ 1 kHz

4.2 Multiple sources

- Two sound sources: s(t) = s1(t) + s2(t)
- RMS pressure of the sum signal:
  p_RMS = sqrt(E{s(t)^2}) = sqrt(E{s1(t)^2 + 2 s1(t) s2(t) + s2(t)^2})
- If the signals are uncorrelated, E{s1(t) s2(t)} = 0 and the formula simplifies to
  p_RMS = sqrt(p1^2 + p2^2)
- If p1 = p2, the sound pressure level of the sum signal is 3 dB higher than that of p1 alone (why? see the numerical check below)
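A quick numerical check of the 3 dB result, and of the 80 dB + 80 dB figures discussed on the next slide. The signal frequency and the 90-degree phase offset are arbitrary choices that merely make the two tones uncorrelated:

```python
import numpy as np

P0 = 20e-6
fs = 48000
t = np.arange(fs) / fs

def spl_db(s):
    return 20.0 * np.log10(np.sqrt(np.mean(s ** 2)) / P0)

# Two sources, each at 80 dB SPL: RMS pressure p1 = p2 = p0 * 10**(80/20)
p_rms = P0 * 10 ** (80 / 20)
amp = p_rms * np.sqrt(2)                                    # peak amplitude of a sine with that RMS
s1 = amp * np.sin(2 * np.pi * 500 * t)                      # source 1
s2_same = s1.copy()                                         # perfectly correlated copy of source 1
s2_uncorr = amp * np.sin(2 * np.pi * 500 * t + np.pi / 2)   # 90 degree shift -> uncorrelated with s1

print(f"one source:           {spl_db(s1):.1f} dB")              # ~80 dB
print(f"uncorrelated sources: {spl_db(s1 + s2_uncorr):.1f} dB")  # ~83 dB (powers add)
print(f"identical sources:    {spl_db(s1 + s2_same):.1f} dB")    # ~86 dB (amplitude doubles -> +6 dB)
```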

Multiple sources (continued)

- Two sources, each with an 80 dB sound pressure level
  - source signals uncorrelated: together they produce an 83 dB level
  - sources correlate perfectly (same sound): the result is an 86 dB level
- Doubling the sound amplitude increases the sound pressure level by 6 dB
  - because Lp = 20 log10(2p/p0) = 20 log10(p/p0) + 6 [dB]
  - equivalent to adding another identical source next to the first one
- Intuitively: if the two sources do not correlate, the components of the two signals may amplify or cancel each other, depending on their relative phases, and hence the level rises only to 83 dB

5 Loudness

- Loudness describes the subjective level of sound
  - perception of loudness is a relatively complex but consistent phenomenon, and
  - one of the central parts of psychoacoustics
- The loudness of a sound can be compared to a standardized reference tone, for example a 1000 Hz sinusoidal tone
  - the loudness level (phon) is defined as the sound pressure level (dB) of a 1000 Hz sinusoid that has the same subjective loudness as the target sound
  - for example, if the heard sound is perceived as equally loud as a 40 dB, 1 kHz sinusoid, its loudness level is 40 phons


5.1 Equal-loudness curves

- Figure: equal-loudness curves; loudness level (phons) as a function of frequency (Hz) and sound pressure level (dB)

5.3 Critical bands

- When listening to two sinusoids with nearby frequencies and increasing their frequency difference, the perceived loudness increases once the frequency difference exceeds the critical bandwidth
  - figure: 1 kHz @ 60 dB; the critical bandwidth at 1 kHz is 160 Hz
- The ear analyzes sound at critical-band resolution; each critical band contributes to the overall loudness level (a rough band-power sketch follows below)
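As a rough illustration of analyzing a signal at critical-band resolution (not the actual loudness model of the next slide), the sketch below groups FFT power into 1-Bark-wide bands, reusing the approximate Bark conversion from Section 3.1. Everything here is a simplification for illustration:

```python
import numpy as np

def hz_to_bark(f):
    # Zwicker & Terhardt style approximation (an assumption; the slides only name the scale)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def critical_band_powers(x, fs):
    """Group FFT power into ~25 one-Bark-wide bands, a crude stand-in for
    the ear's critical-band resolution."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    band_idx = np.floor(hz_to_bark(freqs)).astype(int)
    powers = np.zeros(band_idx.max() + 1)
    np.add.at(powers, band_idx, spec)      # sum the power of all bins falling in each band
    return powers

fs = 48000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 4000 * t)
print(critical_band_powers(x, fs).round(1))   # energy concentrates around bands ~8 and ~17
```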

5.4 Loudness of a complex sound

- The loudness of a complex sound is calculated by using so-called loudness density as an intermediate quantity
- The loudness density at each critical band is (roughly) proportional to the log-power of the signal in that band (weighted according to the sensitivity of hearing and spread slightly by convolving over frequency)
- The overall loudness is obtained by summing up the loudness density values from each critical band
- Figure: integration of loudness for a sinusoidal tone and for wideband noise; loudness density (sones / Bark) as a function of frequency (Bark)

6 Pitch

- Pitch
  - a subjective attribute of sounds that enables us to arrange them on a frequency-related scale ranging from low to high
  - a sound has a certain pitch if human listeners can consistently match the frequency of a sinusoidal tone to the pitch of the sound
- Fundamental frequency vs. pitch
  - fundamental frequency is a physical attribute
  - pitch is a perceptual attribute
  - both are measured in hertz (Hz)
  - in practice, perceived pitch ≈ fundamental frequency
- "Perfect pitch" or "absolute pitch" - the ability to recognize the pitch of a musical note without any reference
  - only a minority of the population can do that


6.1 Harmonic sound

- For a sinusoidal tone
  - fundamental frequency = sinusoidal frequency
  - pitch ≈ sinusoidal frequency
- Harmonic sound
  - figure: trumpet sound with fundamental frequency F = 262 Hz and period 1/F = 3.8 ms

6.2 Pitch perception

- Pitch perception has been explained with two competing theories
  - place theory: peak activity along the basilar membrane determines the pitch (fails to explain the missing fundamental)
  - periodicity theory: pitch depends on the rate, not the place, of the response; neurons fire in sync with the signal
- The real mechanism is a combination of the above
  - the sound is subdivided into subbands (critical bands)
  - the periodicity of the amplitude envelope (see the lowest panel of the figure) is analyzed within bands
  - the results are combined across bands (a toy version of this procedure is sketched below)
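A toy sketch of that subband-periodicity idea, assuming scipy is available. The band edges, filter order, and the half-wave rectification (standing in for envelope/firing-rate extraction) are all simplifying assumptions for illustration, not the model from the slides:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def pitch_by_subband_periodicity(x, fs, f_lo=100.0, f_hi=4000.0, n_bands=8,
                                 fmin=80.0, fmax=500.0):
    """Split the signal into bands, half-wave rectify each band (a crude stand-in
    for hair-cell firing), autocorrelate, sum across bands, and pick the best
    common period within a plausible pitch range."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)       # log-spaced bands standing in for critical bands
    summary = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        fir = np.maximum(band, 0.0)                      # half-wave rectification
        fir -= fir.mean()
        summary += np.correlate(fir, fir, mode="full")[len(fir) - 1:]
    lo_lag, hi_lag = int(fs / fmax), int(fs / fmin)      # candidate pitch periods in samples
    best = lo_lag + np.argmax(summary[lo_lag:hi_lag])
    return fs / best

# Harmonic tone with F0 = 262 Hz (the trumpet example), with a deliberately weak fundamental
fs = 16000
t = np.arange(int(0.2 * fs)) / fs
x = sum(a * np.sin(2 * np.pi * k * 262 * t) for k, a in [(1, 0.1), (2, 1.0), (3, 0.8), (4, 0.6)])
print(f"estimated pitch: {pitch_by_subband_periodicity(x, fs):.1f} Hz")   # close to 262 Hz
```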

6.3 Perceptually-motivated frequency scales

- Figure: position along the basilar membrane (mm), frequency (kHz), frequency (mel), and frequency (Bark) plotted against each other

Subjective attributes of sound

- Sounds are typically described using four main attributes: loudness, pitch, timbre, and duration
- Table: dependence of the subjective attributes (loudness, pitch, timbre, duration) on the physical parameters (pressure, frequency, spectrum, duration, envelope); the markers in the original table distinguish strong, moderate, and weak dependency


7 Spatial hearing

- The most important auditory cues for localizing sound sources in space are
  1. interaural time difference
  2. interaural intensity difference
  3. direction-dependent filtering of the sound spectrum by the head and pinnae
- Terms
  - monaural: with one ear
  - binaural: with two ears
  - interaural: between the ears (interaural time difference etc.)
  - lateralization: localizing a source in the horizontal plane

7.1 Monaural source localization

- Directional hearing works to some extent even with one ear
- The head and pinna form a direction-dependent filter
  - direction-dependent changes in the spectrum of the sound arriving at the ear can be described with HRTFs
  - HRTF = head-related transfer function
- HRTFs are crucial for localizing sources in the median plane (vertical localization)

Monaural source localization (continued)

- HRTFs can be measured by recording
  - the sound emitted by a source, and
  - the sound arriving at the auditory canal or eardrum (the transfer function of the auditory canal does not vary with direction)
- In practice: a microphone in the ear of a test subject (left in figure), or a head-and-torso simulator (right in figure)

7.2 Localizing a sinusoid

- Experimenting with sinusoidal tones helps to understand the localization of more complex sounds
- Angle-of-arrival perception for sinusoids below 750 Hz is based mainly on the interaural time difference (a small sketch follows below)
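A minimal sketch of that idea: estimate the interaural time difference by cross-correlation and map it to an azimuth with a crude far-field model, itd = d*sin(azimuth)/c. The ear spacing, speed of sound, and the model itself (which ignores head diffraction, i.e. the HRTF effects above) are simplifying assumptions:

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
EAR_DISTANCE = 0.18      # m, rough interaural spacing (an assumption; real heads also diffract sound)

def estimate_itd(left, right, fs):
    """ITD in seconds, positive when the sound reaches the left ear first."""
    xcorr = np.correlate(right, left, mode="full")   # lag by which the right-ear signal trails the left
    lag = np.argmax(xcorr) - (len(left) - 1)
    return lag / fs

def itd_to_azimuth(itd):
    """Crude far-field model itd = d*sin(azimuth)/c; ignores head diffraction."""
    s = np.clip(itd * SPEED_OF_SOUND / EAR_DISTANCE, -1.0, 1.0)
    return np.degrees(np.arcsin(s))

# A 400 Hz tone from roughly 30 degrees to the left, simulated by delaying the right-ear signal.
# (For tones well above ~750 Hz the correlation peaks repeat within the possible ITD range,
# which is exactly the ambiguity discussed on the next slide.)
fs = 48000
t = np.arange(fs) / fs
true_itd = EAR_DISTANCE * np.sin(np.radians(30)) / SPEED_OF_SOUND
left = np.sin(2 * np.pi * 400 * t)
right = np.sin(2 * np.pi * 400 * (t - true_itd))
print(f"estimated azimuth: {itd_to_azimuth(estimate_itd(left, right, fs)):.1f} degrees")
```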


Localizing a sinusoid (continued)

- The interaural time difference is useful only up to about 750 Hz
  - above that, the time difference becomes ambiguous, since several wavelengths fit within the time difference
  - moving the head (or movement of the source) helps: this works up to about 1500 Hz
- At higher frequencies (> 750 Hz) the auditory system utilizes the interaural intensity difference
  - the head casts an acoustic shadow (the sound level is lower behind the head)
  - works especially at high frequencies

7.3 Localizing complex sounds

- Complex sounds refer to sounds that
  - involve a number of different frequency components, and
  - vary over time
- Localizing sound sources is typically the result of combining all the mechanisms described above
  1. interaural time difference (most important)
  2. interaural intensity difference
  3. HRTFs
- Wideband noise: directional hearing works well

7.4 Lateralization in headphone listening

- When listening with headphones, the sounds are often localized inside the head, on the axis between the ears
  - the sound does not seem to come from outside the head because the diffraction caused by the pinnae and head is missing
  - if the sounds are carefully processed with HRTFs, they move outside the head (see the rendering sketch below)
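A minimal sketch of such HRTF processing: convolve a mono signal with a left/right pair of head-related impulse responses. The HRIRs below are made-up placeholders (a plain delay-and-attenuation pair), which only lateralizes the sound; actual measured HRIRs, as described in Section 7.1, would be needed for convincing out-of-head localization:

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair and return a stereo array."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    stereo = np.stack([left, right], axis=1)
    return stereo / np.max(np.abs(stereo))            # normalize to avoid clipping

# Placeholder "HRIRs": the right ear gets a delayed, attenuated copy of the sound.
fs = 48000
hrir_l = np.zeros(64); hrir_l[0] = 1.0
hrir_r = np.zeros(64); hrir_r[30] = 0.5               # ~0.6 ms later and 6 dB quieter
mono = np.random.randn(fs)                            # one second of noise as a test signal
out = render_binaural(mono, hrir_l, hrir_r)
print(out.shape)                                      # (len(mono) + 63, 2)
```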
