Development of Cantonese Speech Intelligibility Index

Development of the Cantonese speech intelligibility index^(a)
Lena L. N. Wong,^(b) Amy H. S. Ho,^(c) and Elizabeth W. W. Chua^(d)

Division of Speech & Hearing Sciences, University of Hong Kong, Hong Kong, China
Sigfrid D. Soli
House Ear Institute, Los Angeles, California 90057
Received 4 April 2006; revised 8 December 2006; accepted 11 December 2006

A Speech Intelligibility Index (SII) for the sentences in the Cantonese version of the Hearing In Noise Test (CHINT) was derived using conventional procedures described previously in studies such as Studebaker and Sherbecoe [J. Speech Hear. Res. 34, 427–438 (1991)]. Two studies were conducted to determine the signal-to-noise ratios and the high- and low-pass filtering conditions that should be used, and to measure speech intelligibility in these conditions. Normal-hearing subjects listened to the sentences presented in speech-spectrum shaped noise. Compared to English speech assessment materials such as the English Hearing In Noise Test [Nilsson et al., J. Acoust. Soc. Am. 95, 1085–1099 (1994)], the frequency-importance function of the CHINT suggests that low-frequency information is more important for Cantonese speech understanding. The difference in frequency-importance weights between Cantonese and English was attributed to the redundancy of the test material, the tonal nature of the Cantonese language, or a combination of these factors. © 2007 Acoustical Society of America. [DOI: 10.1121/1.2431338]

PACS number(s): 43.71.Gv [ARB] Pages: 2350–2361
I. INTRODUCTION
A. Background
The Articulation Index (AI), or by its revised appellation the Speech Intelligibility Index (SII), is a quantitative measure that accounts for the contribution of audible speech cues in given frequency bands to speech intelligibility (Amlani et al., 2002). It is a useful tool for estimating speech understanding under specified listening conditions. The AI has been suggested for clinical applications such as prediction of speech recognition performance with various configurations of hearing loss (Macrae and Brigden, 1973; Pavlovic, 1984; Kamm et al., 1985; Killion and Christensen, 1998), estimation of unaided and aided speech intelligibility to determine the potential benefits of hearing aids (Mueller and Killion, 1990; Killion and Christensen, 1998; Stelmachowicz et al., 2002), and prescription of hearing aid gain (Rankovic, 1991). Amendments to the original AI calculations were made for over a decade before the ANSI S3.5-1969 standard was adopted. Since then, efforts have been made to simplify the calculation of the AI for clinical applications (e.g., Pavlovic, 1984; Eisenberg et al., 1998). The term Speech Intelligibility Index (SII) was later adopted in the ANSI S3.5-1997 standard, which accounts for spread of masking and level distortion effects (Amlani et al., 2002). To establish the SII, it is necessary to gain a thorough understanding of the frequency-importance function (FIF) of a specific test material, because the relative importance of various frequency bands to speech intelligibility is a key component of the basic SII equation:

$$\mathrm{SII} = \sum_{i=1}^{n} I_i A_i , \qquad (1)$$

where I_i is the importance of frequency band i, expressed as a weighting factor from 0.0 to 1.0, and A_i is the audibility function, representing the amount of speech energy available in the ith frequency band that contributes to overall intelligibility (French and Steinberg, 1947; Amlani et al., 2002). It is assumed that the speech signals in the adjoining frequency bands that make up the audible spectrum contribute independently to the articulation score, so that speech intelligibility is an additive measure of the weighted importance contributed by different frequency regions (Rankovic, 1995). The dynamic range (DR) of the long-term average speech spectrum (LTASS), which Byrne et al. (1994) found to be similar across languages, affects the calculation of the audibility function. Conventionally, the effective DR is assumed to be 30 dB in all frequency bands for English materials (e.g., ANSI S3.5-1969; Studebaker and Sherbecoe, 1991; Eisenberg et al., 1998). Studebaker et al. (1999) argued that there is credible evidence for a larger value, and they showed experimentally that a DR of 40 dB yielded better prediction of speech recognition with the NU-6 word lists (Tillman and Carhart, 1966) for normal-hearing and hearing-impaired participants under different listening conditions. The SII can be used to predict speech intelligibility via a transfer function S such as the one derived by Fletcher and Galt (1950):

$$S = \left(1 - 10^{-AP/Q}\right)^{N}, \qquad (2)$$

where S is the intelligibility score (expressed as a proportion correct), A is the SII value, P is a proficiency factor that accounts for the talker's and listeners' competence and for practice effects, and Q and N are fitting constants that depend on the characteristics of the speech stimuli (Fletcher and Galt, 1950). More specifically, Q is a correction factor that compensates for changes in proficiency with the test stimuli in an experiment, and N represents the number of independent sounds in a test item, or a constant that controls the shape of the function S (Studebaker and Sherbecoe, 1991, pp. 431 and 433).

TABLE I. Crossover frequencies of various speech materials.
Study                              Speech stimulus          Crossover frequency
Studebaker et al. (1987)           Continuous discourse     1189 Hz
Studebaker and Sherbecoe (1991)    W-22                     1314 Hz
Eisenberg et al. (1998)            HINT sentences           1550 Hz
Sherbecoe and Studebaker (2002)    Connected Speech Test    1599 Hz
ANSI S3.5-1969                     Nonsense syllables       1660 Hz
French and Steinberg (1947)        Nonsense syllables       About 1900 Hz

^(a) Portions of this work were presented in a paper, "Cantonese Speech Intelligibility Index," Proceedings of the International Congress of Audiology, Phoenix, Arizona, September 2004.
^(b) Electronic mail: LLNWONG@hku.hk
^(c) Currently associated with St. Teresa's Hospital Hearing and Speech Centre, Hong Kong.
^(d) Currently associated with Starkey HK Hearing and Speech Centre Ltd.

2350 J. Acoust. Soc. Am. 121 (4), April 2007; 0001-4966/2007/121(4)/2350/12/$23.00; © 2007 Acoustical Society of America
B. SII for specific speech materials
Studebaker and Sherbecoe (1993) reported that FIFs vary with the speech stimuli, so that for the same SII, predicted speech intelligibility varies with the speech materials. The original AI calculation was based on CVC nonsense syllables (French and Steinberg, 1947). Other types of speech test materials have been used in subsequent research. These include the Central Institute for the Deaf (CID) W-22 word lists (Studebaker and Sherbecoe, 1991), the NU-6 word test (Studebaker et al., 1993), the Hearing In Noise Test (HINT) sentence materials (Eisenberg et al., 1998), the Consonant-vowel Nucleus-Consonant (CNC) monosyllabic word test (Henry et al., 1998), and Connected Speech Test (CST) passages (Sherbecoe and Studebaker, 2002). DePaolis et al. (1996) found statistically different one-third octave band FIFs for PB-50 monosyllabic words, the SPIN test, and continuous discourse. Crossover frequencies, i.e., the frequency that divides a speech spectrum into two equally important parts, ranged from 1189 to 1900 Hz across materials (see Table I). With the exception of the W-22 word lists, crossover frequencies shift to lower values as the redundancy of the speech materials increases (Studebaker et al., 1987; Studebaker and Sherbecoe, 1991): continuous discourse has the lowest values and nonsense syllables the highest. The crossover frequency may also differ across languages. For example, while French and English did not show much difference in crossover frequency (about 1500 Hz), Finnish disyllabic words had a significantly lower crossover frequency of about 1000 Hz (Studebaker and Sherbecoe, 1993). The FIF or crossover frequency has never been established for a tonal language such as Cantonese.
C. Cantonese
Cantonese is a tonal language spoken by more than 16 million people worldwide. It is a regional dialect of South-Eastern China (Ramsey, 1987) and one of the main dialects in China (Li, 1989). It is commonly spoken among Chinese immigrants in North America, South Asia, Australia, and Great Britain (Lau and So, 1988; Matthews and Yip, 1994). Among Chinese dialects, its influence is second only to that of Mandarin (Matthews and Yip, 1994). Cantonese morphemes are monosyllabic, and monosyllables are combined to form polysyllabic words. Cantonese syllables take the form of an optional initial consonant, a mandatory vowel, and an optional final consonant, or (C)V(C). Cantonese has the same long-term average speech spectrum (LTASS) as many other languages, including English (Byrne et al., 1994), but Cantonese phonology is very different from English phonology (So and Dodd, 1995). For example, Cantonese speakers must distinguish aspirated from unaspirated consonants rather than voiced from voiceless ones. Cantonese has fewer consonants and more vowels than English, and tones carry lexical meaning (Dodd and So, 1994). There are nine lexical tones (Browning, 1974; Fok Chan, 1974; Dodd and So, 1994), as listed in Table II. Browning (1974) suggested that the three entering tones (high, mid, and low) of Cantonese are not contrastive, as their registers are comparable to tones 1 (high level), 3 (mid level), and 6 (low level). Pitch variations due to changes in fundamental frequency (F0) provide the main cues for tone perception (Fok Chan, 1974; Gandour, 1981; Cheung, 1992). Cheung (1992) found that tones are more resistant to the masking effect of noise than consonants. Thus, low-frequency information may carry more weight for Cantonese speech understanding than for English. Indeed, compared to English speakers with the same amount of hearing loss, Cantonese speakers with good low-frequency hearing report less difficulty in speech understanding, despite a significant loss at higher frequencies (Doyle and Wong, 1996; Doyle et al., 2002; Wong et al., 2004).

TABLE II. Description and examples of each Cantonese tone.
Number  Classification   Example       Transcription
1       High level       Poem          si1
2       High rising      History       si2
3       Mid level        Examination   si3
4       Low falling      Time          si4
5       Low rising       Market        si5
6       Low level        Matter        si6
7       High entering    Color         sIk7
8       Mid entering     Kiss          sIk8
9       Low entering     Eat           sIk9
D. Aim of the study
This study aimed to derive a Speech Intelligibility Index for Cantonese (SIIC) using the materials from the Cantonese version of the Hearing In Noise Test (CHINT; Wong and Soli, 2005). The CHINT is the only standardized Cantonese sentence test of speech reception. Deriving an SII based on the CHINT should result in a better understanding of Cantonese speech perception. In particular, cochlear implant coding strategies are based on work to optimize speech understanding in native English speakers, but Cantonese-speaking implant users have difficulty recognizing tones (Ciocca et al., 2002; Wong and Wong, 2004). It is hoped that knowledge of the Cantonese SII will lead to cochlear implant strategies that better preserve tonal information. How hearing aids should best be prescribed for Cantonese speakers also requires a thorough understanding of how audibility at various frequencies contributes to intelligibility. Procedures described by Studebaker and Sherbecoe (1991) were used as a basis for deriving the Cantonese SII. Because Cantonese is a tonal language, the crossover frequency for a given type of Cantonese material was expected to be lower than that of its English equivalent, and the FIF was expected to differ from those of English or French materials. As the effective DR for the CHINT has not been determined, it was based on the work of Byrne and colleagues (1994); that is, the DR of the CHINT was assumed to be 30 dB, but a 40-dB DR was also evaluated.
II. METHOD
A. Participants
Six normal-hearing native Cantonese speakers participated in the pilot study. Seventy-eight other young normal-hearing native Cantonese speakers (34 male, 44 female) participated in the actual study. Because participants were recruited in Hong Kong, where some individuals are exposed to two dialects (e.g., Cantonese and Mandarin) from birth, first language was difficult to determine. Therefore, participants who spoke Cantonese as their primary language were recruited. None of the participants spoke Cantonese with a dialectal accent. The mean age of participants in the actual experiment was 23 years for males (s.d. 4.5) and 22 years for females (s.d. 2.5), with a range of 18 to 34 years. All participants had bilateral hearing thresholds of 20 dB HL or better at the octave frequencies from 250 to 8000 Hz. In the actual experiment, the participants' pure-tone average at 500, 1000, and 2000 Hz was 9.9 dB HL (s.d. 3.8) in the right ear and 6.9 dB HL (s.d. 4.3) in the left ear. None of the participants reported a history of noise exposure or middle ear pathology, and all had normal middle ear function, confirmed by tympanometry prior to the experiment. All participants were paid to take part in the study.
B. Materials
Sentences from the CHINT (Wong and Soli, 2005) were used in the present study because it is the only well-standardized material for assessing Cantonese speech intelligibility. The CHINT comprises 24 sets of 10 sentences each, with the sentences in each set balanced for level of difficulty and phonemic characteristics. Each Cantonese HINT sentence has 10 syllables, represented by 10 Chinese characters; by contrast, English HINT sentences contain four to seven syllables (Nilsson et al., 1994). The CHINT can be used to assess speech intelligibility in quiet and in noise, with noise simulated to originate from 0°, 90°, or 270° azimuth. In this study, both speech and noise were presented from 0° azimuth. The noise was matched to the long-term average speech spectrum of the talker.
C. Equipment
The CHINT sentences and the speech-spectrum shaped noise were presented via the Hearing In Noise Test (HINT) program, version 5.0.3, using a SoundBlaster sound card. The speech signal and the noise were mixed before being delivered to a Tucker-Davis Technologies (TDT) System 3 digital filter. The filter was controlled by a computer program, Realtime Processor Visual Design Studio (RPvds) version 4.0, and provided a rejection slope of 96 dB/octave at the desired cutoff frequencies. The filtered signals were routed to a GSI 16 audiometer and presented diotically to the participants through TDH-50P headphones. The headphone output was calibrated to 65 dB(A) in a 6-cc coupler using the speech-spectrum shaped noise low-pass filtered at 12 000 Hz.
D. Procedures
For the wide-band condition, the noise was fixed at 65 dB(A), and the level of the speech signal was varied according to the desired signal-to-noise ratio (SNR) in each test condition. Prior to testing, participants listened to two practice lists, one presented in quiet and another in noise, to familiarize them with the stimuli and test procedures. Reception thresholds for sentences (RTSs) were obtained adaptively (Nilsson et al., 1994) in quiet, with the test stimuli low-pass (LP) filtered at 12 000 Hz. Individual RTSs served as reference levels for obtaining speech intelligibility scores in the filtering conditions.
1. Pilot study for selecting filtering conditions
A pilot study was conducted to determine the SNRs and the cutoff frequencies that should be used in the actual experiment. Six participants took part in the pilot study. RTSs were obtained in noise and served as the reference level for determining the speech level at which percent correct intelligibility was measured in the various filtering/SNR conditions. Participants were instructed to repeat as much of each sentence as possible. Following the HINT protocol, only small variations in response that did not change the meaning of a sentence were accepted (e.g., "mommy" instead of "mama"). Percent correct intelligibility was then obtained in the various filtering/SNR conditions. As there were 10 sentences in each list, each sentence repeated correctly contributed 10% to the score. To determine the filtering conditions to be used in the actual study, presentation at +7 dB SNR with LP and high-pass (HP) filters set to 500, 800, 1100, 1400, and 1700 Hz was arbitrarily chosen. To select the SNR conditions for the actual experiment, eight SNR conditions, i.e., −5, −4, −3, −1, 0, +1, +3, and +5 dB SNR with reference to individual RTSs, were used with the LP filter cutoff set to 12 000 Hz. A total of 13 conditions were evaluated. Each participant took about one hour to complete the testing in the pilot study. After the pilot study, the SNR and filtering conditions that would contribute important information to the study were selected. In addition, a few other SNR and filtering conditions were added to yield more detailed information. The criteria used to select these conditions are discussed in the Results section.

TABLE III. Mean percent speech recognition scores in the filtering/SNR conditions used in the actual experiment. SNRs are in dB with reference to individual RTSs. Dashes (–) mark conditions that were not evaluated in the study.
                    SNR (dB re: individual RTS)
Cutoff (Hz)       −4      −2      −1       0      +2      +4      +6      +8
Low-pass
  500              –       –       –       –       –     0.6     1.3     1.9
  650              –       –       –       –       –    10.0    18.0    37.5
  800              –       –       –       –     7.3    23.1    28.7    50.0
  1100             –       –       –     0.6     9.3    18.8    46.9    64.7
  1400             –     0.6     4.4     5.0    15.0    38.1    57.5    67.5
  1700           1.3     6.7     7.3    13.3    43.1    56.9    81.3    82.0
  3500           3.8    12.7    23.1    42.5    50.6    84.4    91.3    95.0
  5000          11.1    28.9    32.2    60.0    74.4    88.9    93.3    98.3
  6500           2.2    31.1    44.4    56.7    74.4    91.1    94.4    96.7
  8000          11.1    30.0    48.9    58.9    74.4    77.8    97.8   100.0
  12000         15.6    29.4    43.8    56.3    74.4    85.6    96.3    96.3
High-pass
  200            7.5    26.9    41.9    54.4    88.7    84.4    92.5    96.9
  500            8.7    21.9    34.4    39.4    68.1    75.6    89.4    90.0
  800            0.6     1.3     7.5    12.7    24.4    41.9    56.9    65.0
  1100           0.6     3.1     6.3     8.7    19.4    29.4    55.6    60.0
  1400             –     2.5     3.1     4.0    10.0    18.8    35.6    40.6
  1700             –       –     0.6     3.8     2.5     3.1     6.7    12.5
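Because every SNR in the pilot study and in Table III is expressed relative to each listener's own RTS while the noise stays at 65 dB(A), the speech level actually presented differs from listener to listener. The short sketch below spells out that arithmetic, together with the 10%-per-sentence scoring rule. It assumes the RTS is recorded as an SNR in dB (a listener at the reported mean RTS would have −3.5 dB); the function names are hypothetical and are not part of the HINT software.

```python
NOISE_LEVEL_DBA = 65.0    # noise level fixed in the wide-band condition
SENTENCES_PER_LIST = 10   # each CHINT list contains 10 sentences


def speech_level_dba(rts_snr_db: float, relative_snr_db: float) -> float:
    """Speech presentation level for one listener in one condition.

    rts_snr_db      -- the listener's RTS, expressed as an SNR in dB (assumption)
    relative_snr_db -- test SNR with reference to that RTS, e.g. -4 ... +8 dB
    """
    absolute_snr_db = rts_snr_db + relative_snr_db
    return NOISE_LEVEL_DBA + absolute_snr_db


def amplitude_factor(level_change_db: float) -> float:
    """Linear amplitude scale factor for a level change in dB."""
    return 10.0 ** (level_change_db / 20.0)


def percent_correct(sentences_correct: int) -> float:
    """Whole-sentence scoring: each correctly repeated sentence adds 10%."""
    return 100.0 * sentences_correct / SENTENCES_PER_LIST


# Listener with an RTS of -3.5 dB SNR, tested at +8 dB re: RTS
print(speech_level_dba(-3.5, 8.0))  # 69.5
print(percent_correct(7))           # 70.0
```

Keeping the noise fixed and moving only the speech means that a listener with a better (more negative) RTS hears every condition at a lower absolute SNR, which is why a mean-RTS correction is needed later when absolute SNRs enter the SII calculation.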
2. Study
A practice list was administered to obtain the RTS in noise and to familiarize participants with the test stimuli and procedures. The RTS in noise determined the speech level at which percent correct intelligibility was measured. Because only 23 sentence lists remained after individual RTSs were obtained, each participant was evaluated in 22 or 23 randomly assigned conditions. This process took about one hour to complete. Mean intelligibility in each test condition was based on 16 sets of data. Based on the results of the pilot study, intelligibility was measured in a total of 115 filtering/SNR conditions (see Table III).
E. Data analysis
1. Pilot study to select filtering/SNR conditions
Percent intelligibility was compared across the filtering/SNR conditions. Conditions were selected or added to ensure that as wide a range of scores as possible, from 0% to 100%, could be obtained. Among conditions that yielded very similar results, only one was selected.
2. Determination of performance-intensity function
The performance-intensity (PI) function is defined as the change in intelligibility per dB change in SNR. The PI function was used to confirm that intelligibility grows linearly with SNR, with a slope of about 10% per dB (Wong and Soli, 2005). In the present study, the PI function was estimated from the data in the 12 000 Hz LP filtering condition at the various SNRs. Only intelligibility scores from 20% to 80% were used to estimate the PI function, because scores plateau beyond this range and do not allow accurate measurement.
3. Determination of crossover frequencies
The crossover frequency is defined as the frequency that divides the frequency range into two regions, each accounting for 50% of the information. To obtain the crossover frequency at each SNR, intelligibility at each LP and HP cutoff frequency was plotted. Only the scores on the linear portion of the growth functions were used to obtain two regression equations, one for the LP and one for the HP conditions, because ceiling and floor effects might have affected scores beyond this range. The intersection of the LP and HP regression lines, obtained by solving the two equations, represents the crossover frequency for the CHINT materials at that SNR. The same procedure was applied at each SNR.
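The intersection step can be sketched as two ordinary least-squares lines, one through the LP scores and one through the HP scores, solved for the cutoff at which they meet. Two assumptions are built in: the regression uses cutoff frequency in Hz on a linear axis, and the caller chooses which points to include, because the text does not list which points formed the "linear portion" at each SNR. The helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def crossover_frequency(lp_cutoffs, lp_scores, hp_cutoffs, hp_scores):
    """Cutoff (Hz) where the LP and HP regression lines intersect.

    Solves a_lp + b_lp*f = a_hp + b_hp*f for f.
    """
    a_lp, b_lp = fit_line(lp_cutoffs, lp_scores)
    a_hp, b_hp = fit_line(hp_cutoffs, hp_scores)
    return (a_hp - a_lp) / (b_lp - b_hp)


# +4 dB (re: RTS) scores from Table III, using every cutoff from 500 to 1700 Hz
lp_f = [500, 650, 800, 1100, 1400, 1700]
lp_s = [0.6, 10.0, 23.1, 18.8, 38.1, 56.9]
hp_f = [500, 800, 1100, 1400, 1700]
hp_s = [75.6, 41.9, 29.4, 18.8, 3.1]
print(round(crossover_frequency(lp_f, lp_s, hp_f, hp_s)))
```

Fed every +4 dB cell between 500 and 1700 Hz, this returns roughly 1160 Hz, somewhat above the value the study reports for that SNR, which shows how sensitive the estimate is to the choice of points on the linear portion.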
FIG. 1. The curve bisection procedure used to derive the RTF. Panel A shows the value for 0.50 SII, panel B the value for 0.25 SII, and panel C the values used to derive 0.75 SII. Results from high-pass filtering conditions are represented by lines whose upper ends start from the left side of the graph; those from low-pass filtering conditions start from the right side.
4. Derivation of the relative transfer function
The relative transfer function (RTF) assumes that the maximum SII is equal to one. That is, the unfiltered condition with the highest score is assigned an SII of 1.00, and the other conditions have SIIs relative to that value (Studebaker et al., 1987; Studebaker and Sherbecoe, 1991). The curve bisection procedure described by Studebaker and Sherbecoe (1991, pp. 431–432) was used to derive the RTF. Briefly, percent correct scores for the LP and HP filtering conditions at the highest SNR (i.e., +8 dB SNR with reference to individual RTSs) were first plotted as a function of filter cutoff frequency (see Fig. 1). The percent correct intelligibility corresponding to 0.5 SII was obtained from these two curves: their intersection represents 0.5 SII, because half of the total auditory area is available to the listener above this point and the other half below it. The total area for this SNR is assumed to have an SII of 1.00 (Studebaker and Sherbecoe, 1991, p. 430). The procedure is shown in panel A of Fig. 1. The score at 0.50 SII was then used to determine the next point on the transfer function, i.e., the score corresponding to 0.25 SII. Because no HP or LP curve terminated at the score corresponding to 0.50 SII, data from the curves for 0 and +2 dB SNR were interpolated. The intersection of the two interpolated curves yielded the score for 0.25 SII. This procedure is illustrated in panel B of Fig. 1. Scores for SII values above 0.50 were obtained by identifying points on the curves that complemented those below 0.5 SII (Studebaker and Sherbecoe, 1991, p. 431). The 0.75 SII point was produced by extending a horizontal line from the score for 0.25 SII until it intersected the HP and LP curves for the +8 dB SNR (re: individual RTSs) condition. These HP and LP curves and the horizontal line are shown in panel C of Fig. 1. Two vertical lines were then drawn from these two intersection points to the upper ends of the LP and HP curves for the +8 dB SNR condition. The values at the top intersections of these lines with the HP and LP curves, indicated by the two circles in panel C of Fig. 1, were then averaged to yield a final score for 0.75 SII. These bisection steps were repeated until a number of SII values with corresponding percent correct intelligibility scores had been obtained. The SPSS 11.0 program was used to fit the percent correct intelligibility scores and the corresponding SIIs derived from these procedures with several equations, including Eq. (2). The best-fit SII relative transfer function (RTF) and its fitting constants were then estimated.
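Eq. (2), and its rearrangement for A that the FIF derivation relies on, are simple enough to evaluate directly. The sketch below plugs in the best-fit RTF constants reported in the Results (Q = 0.3638, N = 12.2491, with P = 1) and treats S as a proportion from 0 to 1 rather than a percentage, which is an assumption about how the fit was parameterized. The SPSS curve fit itself is not reproduced.

```python
import math


def transfer(sii: float, q: float, n: float, p: float = 1.0) -> float:
    """Eq. (2): proportion correct S = (1 - 10**(-A*P/Q))**N."""
    return (1.0 - 10.0 ** (-sii * p / q)) ** n


def inverse_transfer(score: float, q: float, n: float, p: float = 1.0) -> float:
    """Rearranged Eq. (2): A = -(Q/P) * log10(1 - S**(1/N)); needs 0 < S < 1."""
    return -(q / p) * math.log10(1.0 - score ** (1.0 / n))


Q_RTF, N_RTF = 0.3638, 12.2491  # best-fit RTF constants reported for the CHINT

for a in (0.25, 0.5, 0.75):
    print(a, round(100 * transfer(a, Q_RTF, N_RTF), 1))
```

With these constants the function predicts roughly 6% correct at 0.25 SII and roughly 59% at 0.50 SII, close to the "about 7%" and 58% read off the bisection curves. The inverse is undefined at S = 1, so perfect scores have to be excluded or capped before conversion.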
5. Derivation of the frequency-importance function
The RTF was then used to derive the frequency-importance function (FIF), i.e., the relative importance of the speech information contained in each frequency region defined by the area between filter cutoff frequencies (Henry et al., 1998; Studebaker and Sherbecoe, 1991). The procedures described in Studebaker and Sherbecoe (1991, pp. 430–433) and Henry et al. (1998, p. 83) were followed. First, all HP and LP mean scores at each SNR condition were converted to SIIs using Eq. (3), which is a rearrangement of Eq. (2):

$$A = -\frac{Q}{P}\,\log_{10}\!\left(1 - S^{1/N}\right). \qquad (3)$$

The mean scores and their corresponding SIIs were substituted into Eq. (3) to obtain the fitting constants Q and N using SPSS 11.0; P was assumed to be 1.000. The HP and LP SII data were combined and averaged using the procedures set out in Studebaker and Sherbecoe (1991, pp. 430–432) to generate an average cumulative SII curve as a function of filter cutoff frequency. Briefly, the mean SII across all SNRs was calculated for each filter cutoff frequency, these SII values were plotted against the cutoff frequencies, and SPSS 11.0 was used to identify the best-fit curve relating the two. Because this curve represented the cumulative band importance over the full frequency range (200 to 12 000 Hz), the contribution of each one-third octave band to the FIF was obtained by dividing the full range into the appropriate bands and subtracting the cumulative SII at the center frequency of the lower band from that of the higher band. Finally, the relative FIF was rescaled to an SII scale of 0 to 1 by dividing every band SII by the sum of the individual band SIIs.
6. Derivation of the absolute transfer function
Once the FIF is determined, the slope of the RTF can be adjusted so that the best SII predicted by that function equals its true absolute value (Studebaker and Sherbecoe, 1991). Because the curve bisection procedure used to derive the RTF assumes that the best score obtained corresponds to a perfect SII (1.0), the slope of the RTF must be adjusted to obtain an absolute transfer function (ATF), which reflects the true relationship between SII and the test scores. The procedures described in Studebaker and Sherbecoe (1991, p. 433) were followed to derive the ATF. Equation (1) was used to identify the SIIs for the mean percent correct score in each test condition. As the SNRs were referenced to individual RTSs, a correction factor equal to the mean RTS (−3.5 dB) was added to each condition before the calculations. Using an iterative method of audibility index determination (Studebaker and Sherbecoe, 1991), the SII for each listening condition was calculated using Eq. (4):

$$\mathrm{SII} = \sum_{i=1}^{n} \frac{\mathrm{SNR}_{\mathrm{adjusted}} + K}{\mathrm{DR}}\,\mathrm{FIF}_i , \qquad (4)$$

where SNR_adjusted is the SNR for each test condition adjusted by the mean RTS (−3.5 dB), K is the assumed level of the speech maxima above the LTASS, DR is the assumed dynamic range of speech, FIF_i is the FIF of frequency band i, and n is the total number of bands used in the calculation. First, mean scores between 5% and 95% were plotted against their SII values, using Eq. (2) as the fitting model; it also gave the best fit among the models tried (e.g., linear regression). As the value of K was unknown for the CHINT material, it was varied in 1-dB steps from 10 to 21 dB, and DR was set at 30 or 40 dB, to identify the combination of K and DR that yielded the smallest mean square error. These K and DR values were based on the ANSI S3.5-1969 standard, in which a 30-dB DR represents the range from +12 dB to −18 dB relative to the LTASS. This range was later modified to ±15 dB in ANSI S3.5-1997. In this study, the exploration of K was extended to 21 dB to include these values. While the typical Cantonese speech DR was assumed to be 30 dB (Byrne et al., 1994), a DR of 40 dB, as suggested by Studebaker et al. (1999), was also evaluated for any improvement in the accuracy of intelligibility prediction.
III. RESULTS
A. Pilot study to select filtering/SNR test conditions
Because speech intelligibility dropped dramatically when the cutoff frequency of the LP filter was reduced from 800 to 500 Hz, a 650-Hz LP filtering condition was added in the actual experiment. Because the 1700-Hz LP filtering condition failed to yield a high score (i.e., scores were lower than 90% correct), LP filtering conditions with cutoff frequencies of 3500, 5000, 6300, and 8000 Hz were added. A 200-Hz HP filtering condition was also added because the 500-Hz HP filtering condition failed to yield a high score. Thus, a total of 17 filtering conditions were used in the actual experiment: 11 LP conditions with cutoff frequencies of 500, 650, 800, 1100, 1400, 1700, 3500, 5000, 6300, 8000, and 12 000 Hz, and six HP conditions with cutoffs at 200, 500, 800, 1100, 1400, and 1700 Hz.

As there was no substantial difference in scores between the −5 and −4 dB SNR conditions (i.e., 5% versus 8.3%), the −5 dB condition was not used in the actual experiment. Because the +7 dB SNR condition with the LP filter cutoff at 1700 Hz yielded a score of only 80%, a +8 dB SNR condition was added in the actual experiment in an attempt to obtain higher scores. In addition, a preliminary SII was estimated using the curve bisection procedure described above. This suggested that testing in 2-dB SNR steps was adequate for generating results for SII calculations, except that the −1 dB SNR condition should be retained, because it yielded approximately 0.50 SII in the pilot study and would facilitate derivation of the SII. Thus, eight SNR conditions (−4, −2, −1, 0, +2, +4, +6, and +8 dB) were adopted in the actual study. As speech stimuli in some of the filtering/SNR conditions (e.g., LP filtering cutoff at 1400 Hz or below at −4 dB SNR) were consistently unintelligible, these conditions were excluded from further testing. Together, 115 filtering/SNR conditions (see Table III), rather than the full set of 136 (8 SNRs × 17 filtering conditions), were used in the actual study.

FIG. 2. Mean percent speech intelligibility plotted as a function of cutoff frequency at various SNRs. Results from high-pass filtering conditions are represented by lines whose upper ends start from the left side of the graph; those from low-pass filtering conditions start from the right side.

B. Results in various filtering/SNR conditions
The mean percent correct score in each filtering/SNR condition is reported in Table III and Fig. 2. These results show an improvement in intelligibility as the cutoff frequency of LP filtering was increased to about 3500 Hz and as the cutoff frequency of HP filtering was reduced to about 800 Hz. The scores also covered a wide range of performance.

C. Reception threshold of sentences and performance-intensity function
The mean RTS was −3.5 dB (s.d. 1.16). The PI function is shown in Fig. 3. In the full-band condition, sentence intelligibility of 29.4% and 74.4% corresponded to −2 dB and +2 dB SNR (with reference to individual RTSs), respectively, and grew at a rate of 11.1% per dB SNR.

FIG. 3. PI function plotted as mean percent intelligibility at various SNRs (re: individual RTSs). The bars represent 1 standard error from the mean.
FIG. 4. Crossover frequency, shown as the intersection of the regression lines fitted to mean percent intelligibility at cutoff frequencies from 500 to 1700 Hz. Results from the +4 dB SNR (re: individual RTSs) conditions are used.
FIG. 5. Best-fit relative transfer function (RTF) and the 13 intelligibility scores (%) plotted as a function of SII values.
Using this PI function, the SNR for 50% correct performance is estimated at −0.3 dB, which is 3.2 dB above the mean RTS.
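The slope and 50% point above can be checked against the full-band (12 000 Hz LP) row of Table III. The sketch assumes an ordinary least-squares line through the scores between 20% and 80%, the range given for the PI analysis; with that rule only the −2, −1, 0, and +2 dB cells qualify. The helper name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Full-band (LP 12 000 Hz) scores from Table III, SNR in dB re: individual RTSs
snr = [-4, -2, -1, 0, 2, 4, 6, 8]
full_band = [15.6, 29.4, 43.8, 56.3, 74.4, 85.6, 96.3, 96.3]

# Keep only scores on the linear part of the PI function (20% to 80%)
points = [(x, y) for x, y in zip(snr, full_band) if 20.0 <= y <= 80.0]
xs, ys = zip(*points)

intercept, slope = fit_line(xs, ys)
snr_50 = (50.0 - intercept) / slope  # SNR (re: RTS) at which the line crosses 50%
print(round(slope, 1), round(snr_50, 1))  # 11.1 -0.3
```

The fitted slope of about 11.1% per dB matches the reported growth rate, and the 50% crossing falls about 0.3 dB below the individual-RTS reference on the relative-SNR axis.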
D. Crossover frequency
Linear regressions fitted to the data yielded crossover frequencies of 1069 Hz at −2 dB SNR, 1097 Hz at −1 dB SNR, 1110 Hz at 0 dB SNR, 1045 Hz at +2 dB SNR, 1130 Hz at +4 dB SNR, 1025 Hz at +6 dB SNR, and 1050 Hz at +8 dB SNR. The crossover frequency at −4 dB SNR (re: individual RTSs) was not calculated because intelligibility was very low across all filtering conditions and performance was probably affected by floor effects. The geometric average of these crossover frequencies is 1075 Hz. Mean percent performance at +4 dB SNR (re: individual RTSs) for the LP and HP filtering conditions is presented in Fig. 4.
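The 1075-Hz summary is a geometric rather than an arithmetic mean, which suits frequencies compared on a logarithmic (octave) scale. A minimal check with the seven crossover values listed above:

```python
import math

# SNR re: individual RTSs (dB) -> crossover frequency (Hz), as reported above
crossovers_hz = {-2: 1069, -1: 1097, 0: 1110, 2: 1045, 4: 1130, 6: 1025, 8: 1050}


def geometric_mean(values):
    """Exponential of the mean natural log of the values."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


gm = geometric_mean(crossovers_hz.values())
print(round(gm))  # 1075
```

Because the seven values span less than 10%, the arithmetic mean (about 1075.1 Hz) is practically the same here; the distinction matters more when crossover frequencies differ by octaves across materials.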
E. Relative transfer function
In panel A of Fig. 1, the LP and HP curves for the +8 dB SNR (re: individual RTSs) condition are plotted. The intersection of the two curves, marked by a circle, corresponds to 0.5 SII. The corresponding percent correct intelligibility (58%) served as the starting point for plotting the next pair of LP and HP curves. As none of the SNRs produced a 58% correct score, the next pair of LP and HP curves was estimated by interpolating between the two curves (0 and +2 dB SNR re: individual RTSs) that yielded scores closest to 58% in the unfiltered condition, as shown in panel B. The point where these two curves intersected was 0.25 SII; the corresponding score was about 7%. The 0.75 SII point was estimated by drawing a horizontal line through the 0.25 SII point until it intersected the LP and HP filtered curves in the best SNR condition. Two vertical lines were then drawn from these intersections, one to the upper end of the HP filtered curve and the other to the upper end of the LP filtered curve. The circles in panel C indicate the values used to derive 0.75 SII; these values were averaged. Ten other points were estimated in a similar manner, yielding a total of 13 SII values with corresponding percent correct intelligibility, as plotted in Fig. 5. With the proficiency factor P assumed to be 1.000, Eq. (2) yielded the best-fit SII relative transfer function (RTF) of all the fitting functions evaluated. The fitting constants Q and N were 0.3638 and 12.2491, respectively, and an R² of 0.9894 indicated that the model fit the data well. The RTF, plotted as sentence recognition score against SII for the CHINT in the wide-band condition, is also shown in Fig. 5.
F. Derivation of the frequency-importance function (FIF)
To derive the FIF, Eq. (2) was transformed to Eq. (3). The adjusted Q value was 0.3647, the value of N was 12.1488, and the R² value was 0.9996. Again, P was assumed to be 1.000. Values for the FIF in one-third octave bands are summarized in Table IV and Fig. 6. The FIF peaks at 1600 Hz, the frequency region of greatest importance for CHINT sentence recognition. Cumulative values of the CHINT FIF are plotted in Fig. 7, together with those of similar English materials. Because the FIFs for ANSI S3.5-1997 and Pavlovic (1984) were derived from the same data, the ANSI S3.5-1997 cumulative FIF is not plotted in Fig. 7. Frequency regions below 557 Hz and above 2331 Hz each accounted for 25% of the importance weight. The midpoint of the FIF is at 1183 Hz.
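Equations (2) and (3) are not reproduced in this section. A minimal sketch, assuming they follow the standard Studebaker and Sherbecoe (1991) transfer-function form, S = (1 − 10^(−A·P/Q))^N, where S is proportion correct and A is the SII, and its algebraic inverse A = −(Q/P)·log10(1 − S^(1/N)); the function names are illustrative only, and the constants are those reported for the CHINT.

```python
import math


def transfer(sii, q, n, p=1.0):
    """Assumed form of Eq. (2): proportion correct S = (1 - 10**(-sii * p / q)) ** n."""
    return (1.0 - 10.0 ** (-sii * p / q)) ** n


def inverse_transfer(score, q, n, p=1.0):
    """Assumed form of Eq. (3): SII implied by a proportion-correct score."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must lie strictly between 0 and 1")
    return -(q / p) * math.log10(1.0 - score ** (1.0 / n))


# Constants reported for the CHINT, with the proficiency factor P fixed at 1.000
RTF = {"q": 0.3638, "n": 12.2491}   # best-fit relative transfer function
FIF = {"q": 0.3647, "n": 12.1488}   # adjusted constants used to derive the FIF

for pct in (7, 58):
    print(f"{pct}% correct -> SII {inverse_transfer(pct / 100, **FIF):.2f}")
```

With the adjusted constants, 7% and 58% correct map back to SII values of about 0.26 and 0.50, close to the 0.25 and 0.5 points identified from the LP/HP curve intersections.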
TABLE IV. Frequency-importance function in one-third octave bands. The weights are expressed as percentages (%).
1/3-octave band (Hz)   Center frequency (Hz)   Weight (%)
0–180                  160                     5.1
180–224                200                     2.2
224–280                250                     2.7
280–355                315                     3.6
355–450                400                     4.3
450–560                500                     4.8
560–710                630                     6.1
710–900                800                     7.0
900–1120               1000                    7.3
1120–1400              1250                    8.1
1400–1800              1600                    9.6
1800–2240              2000                    8.4
2240–2800              2500                    8.2
2800–3550              3150                    7.8
3550–4500              4000                    6.2
4500–5600              5000                    4.2
5600–7100              6300                    2.9
7100–9000              8000                    1.5
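The weights in Table IV sum to 100%, and accumulating them band by band gives the kind of cumulative FIF plotted in Fig. 7. A minimal sketch that tabulates the running total at each band's upper edge; the tuple layout and the `cumulative_fif` helper are illustrative choices, not part of the original analysis.

```python
# Table IV rows: (lower edge Hz, upper edge Hz, center frequency Hz, weight %)
TABLE_IV = [
    (0, 180, 160, 5.1), (180, 224, 200, 2.2), (224, 280, 250, 2.7),
    (280, 355, 315, 3.6), (355, 450, 400, 4.3), (450, 560, 500, 4.8),
    (560, 710, 630, 6.1), (710, 900, 800, 7.0), (900, 1120, 1000, 7.3),
    (1120, 1400, 1250, 8.1), (1400, 1800, 1600, 9.6), (1800, 2240, 2000, 8.4),
    (2240, 2800, 2500, 8.2), (2800, 3550, 3150, 7.8), (3550, 4500, 4000, 6.2),
    (4500, 5600, 5000, 4.2), (5600, 7100, 6300, 2.9), (7100, 9000, 8000, 1.5),
]


def cumulative_fif(rows):
    """Running total of importance weight (%) at each band's upper edge."""
    running, out = 0.0, []
    for _lower, upper, _center, weight in rows:
        running += weight
        out.append((upper, round(running, 1)))
    return out


total = round(sum(row[3] for row in TABLE_IV), 1)
peak_center = max(TABLE_IV, key=lambda row: row[3])[2]
print(total, peak_center)  # 100.0 1600
for upper, cum in cumulative_fif(TABLE_IV):
    print(f"below {upper:>4} Hz: {cum:5.1f}%")
```

The running totals reach 22.7% below 560 Hz and 43.1% below 1120 Hz, and the heaviest band is the one centered at 1600 Hz, the peak described in the text.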
FIG. 6. Frequency-importance function (FIF) of the CHINT.
G. Derivation of the ATF
FIG. 8. Best-fit absolute transfer function (ATF) and actual mean scores (%) plotted as a function of SII.
The slope of the RTF was adjusted to reflect its absolute value; the resulting ATF is shown in Fig. 8. The iterative process of varying K and DR showed that the smallest rms error was obtained with K set at 11.8 dB and DR set at 30 dB, and these values provided the best fit of the data to the ATF. For predicting SII from intelligibility scores, the corresponding Q and N values were 0.1894 and 12.1771, with an R² value of 0.8926. For predicting intelligibility scores from SII values, Q was 0.1844 and N was 12.5769, with an R² value of 0.9499. These R² values indicate that the model still provided a good fit to the data. Applying Eq. (3) to the mean scores yielded the SII values for all filtering/SNR test conditions; these values are plotted in Fig. 8.
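K and DR enter the analysis through a band-audibility function. A minimal sketch, assuming the conventional articulation-index form in which each band's audibility is (SNR + K)/DR clipped to the range 0 to 1, and the SII is the sum of the Table IV importance weights times band audibility. Because the noise was spectrally matched to the speech, the sketch further assumes that every band shares the overall SNR and that LP filtering simply removes whole bands above the cutoff; `band_audibility` and `sii` are hypothetical helper names, not part of ANSI S3.5 or of the study's software.

```python
# Table IV importance weights (%), keyed by each band's upper edge in Hz
BAND_WEIGHTS = [
    (180, 5.1), (224, 2.2), (280, 2.7), (355, 3.6), (450, 4.3), (560, 4.8),
    (710, 6.1), (900, 7.0), (1120, 7.3), (1400, 8.1), (1800, 9.6), (2240, 8.4),
    (2800, 8.2), (3550, 7.8), (4500, 6.2), (5600, 4.2), (7100, 2.9), (9000, 1.5),
]


def band_audibility(snr_db, k_db=11.8, dr_db=30.0):
    """Assumed audibility rule: (SNR + K) / DR, clipped to the range [0, 1]."""
    return min(1.0, max(0.0, (snr_db + k_db) / dr_db))


def sii(snr_db, lp_cutoff_hz=None, k_db=11.8, dr_db=30.0):
    """Importance-weighted audibility summed over bands.

    Assumes every band shares the overall SNR (speech-shaped noise) and that an
    ideal LP filter removes every band whose upper edge exceeds the cutoff.
    """
    audibility = band_audibility(snr_db, k_db, dr_db)
    total = 0.0
    for upper_edge_hz, weight_pct in BAND_WEIGHTS:
        if lp_cutoff_hz is not None and upper_edge_hz > lp_cutoff_hz:
            continue
        total += (weight_pct / 100.0) * audibility
    return total


print(round(sii(0.0), 3))                       # 0.393
print(round(sii(30.0, lp_cutoff_hz=1120), 3))   # 0.431
```

Under these simplifications a wideband signal at 0 dB SNR yields an SII of 11.8/30, and a fully audible signal low-pass filtered at 1120 Hz retains exactly the cumulative weight below that edge.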
IV. DISCUSSION
A. Reception threshold of sentences and performance-intensity function
The mean RTS in noise found in this study is within the 95% confidence interval for normal-hearing listeners reported by Wong and Soli (2005). The slope of the PI function found in this study also agrees with the slope of 9.7% per dB found previously (Wong and Soli, 2005). These findings suggest that the CHINT is a consistent measure of speech intelligibility in noise. Although Studebaker et al. (1987) and Sherbecoe and Studebaker (2002) suggested that a steeper PI function is expected when speech and noise spectra are matched, the slope of the PI function obtained in this study is not as steep as might have been expected from earlier work that used talker-spectrum-matched maskers. In fact, the PI function is consistent with those reported for the CST by Sherbecoe and Studebaker (2002) and for the English HINT by Eisenberg et al. (1998), and shallower than those found by Plomp and Mimpen (1979), Hagerman (1982), and Studebaker et al. (1987). The CHINT materials were designed to yield a PI function slope of about 10% per dB so that they are suitable for the HINT adaptive procedure (Wong and Soli, 2005; Nilsson et al., 1994). Any influence of speech clarity or of spectral matching between speech and noise would therefore have been accounted for by this predetermined test-development criterion. Because test stimulus levels are specified in the same way, the PI functions of the CHINT and the English HINT can be compared directly, and they are similar (Sherbecoe and Studebaker, 2002).
B. The CHINT transfer function
FIG. 7. Comparison of cumulative FIFs derived from the CHINT and other similar materials.
J. Acoust. Soc. Am., Vol. 121, No. 4, April 2007
As suggested by Sherbecoe and Studebaker (2002), comparing transfer functions (TFs) across studies is difficult because absolute TFs often are not reported. Where absolute TFs have been reported, testing might not have been conducted using noise matched to the speech spectrum to control for the filtering effects of hearing thresholds. Furthermore, a priori assumptions about the size of speech peaks have been made when relative TFs were converted to absolute TFs. Nonetheless, like the TFs of many English materials, the CHINT TF shows a monotonic relationship between SII values and speech recognition scores. The slope of the CHINT TF and its Q and N values are similar to those reported by Eisenberg et al. (1998) for the English HINT and by Sherbecoe and Studebaker (2002) for the CST (see Table V). However, the N value for the CHINT is much smaller than the mean number of phonemes in CHINT sentences (26.3), whereas the mean number of phonemes per English HINT sentence (16.8) closely matches the N value for that test. It therefore seems that Cantonese phonemes are perceived not as separate units but in chunks. This speculation
TABLE V. Comparison of transfer functions TFs and frequency-importance functions FIFs for various speech materials. Authors Current study DePaolis et al. 1996 Material CHINT sentences PB-50 words SPIN sentences Continuous discourse HINT sentences ANSI S3.5 standard CNC monosyllables CST passages CID W-22 words NU-6 words Continuous discourse Q 0.1844 0.641 0.329 0.353 0.235 0.247 0.474 0.227 0.283 0.404 N 12.58 2.436 4.481 8.943 15.13 16.90 2.518 10.26 4.057 3.334 TF slopea 11.0 4.0 8.0 7.0 10.0 10.0 10.6 10.2 6.4 18.7 FIF shape, peaks bimodal, below 200 Hz and around 800 1600 Hz unimodal, around 2000 Hz unimodal, around 2000 Hz unimodal, around 2000 Hz unimodal, 2000 Hz using average unimodal, 2000 Hz bimodal, 500 and 1600 bimodal, 400 and 2000 bimodal, 500 and 2000 bimodal, 500 and 2500
Eisenberg et al. 1998 Henry et al. 1998 Sherbecoe and Studebaker 2002 Studebaker and Sherbecoe 1991 Studebaker et al. 1993 Studebaker et al. 1987
a
Hz Hz Hz Hz
TF slopes are in percent per 0.0333 SII and are based on observed or estimated scores between 20 and 80%. Numbers are either reported in the relevant studies or estimated by the authors according to reported TFs denotes approximation . TF slope data are not available in Henry et al. 1998 .
requires further research to verify. However, an example may help illustrate this phenomenon. The CHINT sentence / t ai6 k1 siG4 j 0 t6 h 0 i2 k * G1 si1 k G2 tin6 wa2/ means "my big brother is on the phone all day long at work." The Chinese word / k 1/ means "brother" and limits the word before it to those related to order of birth. The word / j 0 t6/ means "day" and limits the word before it to mean the day before or after, or all day. The word / h 0 i2/ means "in" and refers specifically to a physical location. When followed by the character / k * G1/ ("work"), the next word must be / s i1/, which together with / h 0 i2/ and / k * G1/ means "at one's workplace." The words / k G2/ and / w a2/ both mean "speaking," and when they are spoken in sequence, the only words that can fit between them are / t in6/ ("electric"), / t a i6/ ("big"), or / s iu3/ ("laugh"); the resulting three-syllable sequences mean "talking on the phone," "lying," or "joking," respectively. Therefore, individual Chinese speech sounds do not seem to be independent of each other, perhaps because Chinese polysyllabic words are made up of semantically meaningful monosyllabic parts. Chinese polysyllabic words seem to have greater semantic and syntactic constraints than their English counterparts. This redundancy made shorter Chinese sentences inappropriate for adaptive testing, so adverbial phrases were added to shorter sentences to derive the CHINT sentences and make them less redundant (Wong and Soli, 2005). An SII value of 0.5 or higher would yield close-to-maximum intelligibility with the CHINT sentences (97.6% at an SII of 0.5). This is consistent with findings for materials that are more redundant in content (e.g., the CST) than single words (e.g., the NU-6). At the same SII, 89.3% intelligibility is expected with the English HINT. As greater constraint on the speech material (e.g., grammatical structure and context) and greater redundancy yield higher percent intelligibility for a given AI (ANSI S3.5-1969, p. 21; Studebaker et al., 1987), we can conclude that the Cantonese materials are more redundant than the English HINT and than materials that employ single-word stimuli, such as the NU-6 (Studebaker et al., 1993). In summary, the CHINT sentences have fewer independent sounds than the number of phonemes in the sentences would suggest, and the CHINT material is more redundant in context than similar materials such as the English HINT or single-word materials.
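The 97.6% and 89.3% figures follow directly from the absolute transfer-function constants. A minimal sketch, assuming Eq. (2) has the standard Studebaker and Sherbecoe (1991) form S = (1 − 10^(−A·P/Q))^N with P = 1, using the CHINT ATF constants for predicting scores from SII (Q = 0.1844, N = 12.5769) and the English HINT constants listed in Table V (Q = 0.235, N = 15.13).

```python
def transfer(sii, q, n, p=1.0):
    """Assumed form of Eq. (2): proportion correct S = (1 - 10**(-sii * p / q)) ** n."""
    return (1.0 - 10.0 ** (-sii * p / q)) ** n


CHINT_ATF = {"q": 0.1844, "n": 12.5769}   # CHINT absolute transfer function
ENGLISH_HINT = {"q": 0.235, "n": 15.13}   # Eisenberg et al. (1998), as listed in Table V

for name, params in (("CHINT", CHINT_ATF), ("English HINT", ENGLISH_HINT)):
    print(f"{name}: {100 * transfer(0.5, **params):.1f}% at SII = 0.5")
# CHINT: 97.6% at SII = 0.5
# English HINT: 89.3% at SII = 0.5
```

The smaller Q of the CHINT makes its function rise more quickly with SII, which is the quantitative sense in which the Cantonese material behaves as the more redundant of the two.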
C. Crossover frequency and frequency-importance function
As the crossover frequency decreases, the relative importance of low-frequency information increases. Since the crossover frequency is lower for Cantonese than for all the English speech materials (see Table I), we conclude that low frequencies carry more speech information in Cantonese than in English. The CHINT FIF (Fig. 7) also shows that, compared with similar English materials, the lowest 1/3-octave band (0–180 Hz) carries more weight for speech understanding. As a result, the whole FIF is shifted down in frequency; 75% of CHINT information is located below 2331 Hz. Figure 7 also shows that the shape of the CHINT cumulative FIF resembles those for average speech derived by Pavlovic (1987) and in ANSI S3.5-1997, except that frequencies below 400 Hz are slightly more heavily weighted and frequencies above 4000 Hz exhibit reduced importance compared with equivalent English materials. Several factors might contribute to differences in crossover frequency and FIF across materials. First, redundancy of the materials may be a factor (Studebaker et al., 1987; Studebaker and Sherbecoe, 1991). As discussed above, the CHINT appears to carry much redundant information. In fact, the crossover frequency of the CHINT material resembles that reported for continuous discourse (1189 Hz) by Studebaker et al. (1987). This contrasts with the crossover frequencies reported for the W-22 (1314 Hz; Studebaker and Sherbecoe, 1991), the English HINT (1550 Hz; Eisenberg et al., 1998), and nonsense syllables (1980 Hz; French and Steinberg, 1947). Second, the shape of the FIF may differ depending on the bandwidth of the filters used to derive the function (DePaolis et al., 1996). Third, the rate, clarity, and peak spectrum of the speech materials may affect the FIF, so that for a given material, different talkers may yield different FIFs (Sherbecoe and Studebaker, 2002). This, however, is unlikely to have affected the shape of the CHINT FIF because spectrally matched noise was used (Studebaker et al., 1994). The crossover frequency obtained in this study was slightly lower than the midpoint of the FIF (1183 Hz), and the crossover frequencies did not vary systematically with SNR (Studebaker et al., 1993; Sherbecoe and Studebaker, 2002). Thus, the contribution of talker characteristics was small. The shift in importance weight toward lower frequencies is probably due to a fourth factor, the tonal nature of Cantonese. Findings from research on tone recognition support this interpretation (e.g., Fok Chan, 1974).
1. The role of fundamental frequency in Cantonese speech perception
Fundamental frequency (F0) contains information on pitch level and contour. F0 ranges from 80 to 210 Hz for males and from 190 to 305 Hz for females (Baken, 1987; Evans et al., 2006). F0 plays a crucial role in identifying the meaning of Cantonese words with identical phonemes (Fok Chan, 1974; Gandour, 1981, 1983; Lee et al., 2002). While some studies found pitch contour and direction to be more important than pitch height (Fok Chan, 1974; Gandour, 1981; Cheung, 1992; Whalen and Xu, 1992), others found height to be the more important factor (Vance, 1976; Tse, 1977; Gandour, 1983; Lui, 2000). The CHINT FIF showed that, while the 1/3-octave band from 180 to 224 Hz contributed only minimally to intelligibility, frequencies below 180 Hz, where the fundamental frequency of male speakers lies (the CHINT was recorded using a male voice), carried more weight. Ng (1981) also found that good Cantonese word discrimination can be achieved even when the signal has been LP filtered at 250 Hz. The contribution of tonal information is exemplified by the ability of children with moderate to profound hearing loss to acquire correct tone production, which Dodd and So (1994) attributed to better hearing at low frequencies.
2. Findings from other tonal language literature
Research on Mandarin, another Chinese dialect, also suggests that low frequencies play an important role in speech and tone recognition. Tone recognition could be preserved at a high level (94.6% correct) even with speech LP filtered at 300 Hz (Liang, 1963). Similarly, Fu et al. (1998) found that tone recognition was preserved for Mandarin speech LP filtered at 500 Hz. In another study, about 80% of Mandarin tones were correctly identified when speech was LP filtered at 750 Hz (Zhang et al., 1981). The cues for tone recognition in Mandarin and Cantonese, however, are slightly different. The primary cues for Cantonese tones are pitch contour and level (Fok Chan, 1974). Fundamental frequency is the most important cue for tone recognition in both dialects; in Mandarin, temporal cues (e.g., duration and amplitude envelopes) support sentence recognition when spectral information is absent but are less crucial when more spectral information is available (Lin, 1988; Fu et al., 1998; Whalen and Xu, 1992; Wei et al., 2004). Similarly, Fu and Zeng (2000) found that tone duration and amplitude contours help in the identification of tone 3 in Mandarin, amplitude cues contribute to the discrimination of tone 4, and periodicity cues aid recognition of all five tones. When fundamental frequencies are absent, resolved and unresolved harmonics contribute to tone recognition (Stagray et al., 1992). These results suggest that low-frequency information is important for tone recognition, which, in turn, aids sentence recognition. Overall, findings from this study suggest that low-frequency information is more important for speech understanding in Cantonese than in English. These results are consistent with findings from tone recognition experiments (e.g., Fok Chan, 1974).
V. SUMMARY AND CONCLUSION
To summarize, an SII for the CHINT material was established in this study. While the Q and N values were similar to those of English sentence materials, the N value was smaller than the average number of phonemes in each sentence. The slope of the ATF, the N value, the crossover frequency, and the FIF of the CHINT suggest that low frequencies are more important for Cantonese speech recognition than for English. Whether the redundancy of the CHINT material, the tonal nature of the language, or both affected this result remains uncertain. One way to separate these effects is to repeat the experiment using recordings of a female talker, whose fundamental frequency is higher. If similar results are obtained, the shift in importance weight toward low frequencies is probably related to the redundancy of the speech materials. These results also suggest that it is important to establish separate FIFs and SIIs for different languages. The FIF obtained in this study may have important implications for how hearing aids and/or cochlear implants should be fitted to Cantonese speakers. The roles of low- and high-frequency information in speech intelligibility, assessed using other Cantonese speech materials and materials in other tonal languages, remain to be established.
ACKNOWLEDGMENT
The authors are grateful to Carol Cheung, Kammy Yeung, and Benny Zee for their assistance in data collection and analysis. Our gratitude also goes to all participants in the study, as well as to Phonak Hearing Center Hong Kong Ltd. and the University of Hong Kong Standard Chartered Community Foundation Hearing Center for their assistance in participant recruitment. This study was supported by a Research Grants Council CERG grant (HKU 7165/01H), Hong Kong, China.
Amlani, A. M., Punch, J. L., and Ching, T. Y. C. (2002). "Methods and applications of the audibility index in hearing aid selection and fitting," Trends Amplif. 6, 81–129.
ANSI (1969). S3.5, American National Standard Methods for the Calculation of the Articulation Index (Acoustical Society of America, New York).
ANSI (1997). S3.5, American National Standard Methods for Calculation of the Speech Intelligibility Index (Acoustical Society of America, New York).
Baken, R. J. (1987). Clinical Measurement of Speech and Voice (Taylor and Francis, London).
Browning, L. K. (1974). "The Cantonese dialect with special reference to contrasts with Mandarin as an approach to determining dialect relatedness," Ph.D. dissertation, Georgetown University.
Byrne, D., Dillon, H., Tran, K., Arlinger, S., Wibraham, K., Cox, R., Hagerman, B., Hetu, R., Kei, J., Lui, C., Kiessling, J., Kotby, M. N., Nasser, N. H. A., El Kholy, W. A. H., Nakanishi, Y., Oyer, H., Powell, R., Stephens, D., Meredith, R., Sirimanna, T., Tavartkiladze, G., Fronlenkov, G. I., Westerman, S., and Ludvigsen, C. (1994). "An international comparison of long-term average speech spectra," J. Acoust. Soc. Am. 96, 2108–2120.
Cheung, P. P. (1992). "Tonal confusions in Cantonese at different signal-to-noise ratios," B.Sc. dissertation, University of Hong Kong.
Ciocca, V., Francis, A. L., Aisha, R., and Wong, L. (2002). "The perception of Cantonese lexical tones by early-deafened cochlear implantees," J. Acoust. Soc. Am. 111, 2250–2256.
DePaolis, R. A., Janota, C. P., and Frank, T. (1996). "Frequency importance functions for words, sentences, and continuous discourse," J. Speech Hear. Res. 39, 714–723.
Dodd, B. J., and So, L. K. H. (1994). "The phonological abilities of Cantonese-speaking children with hearing loss," J. Speech Hear. Res. 37, 671–779.
Doyle, J., and Wong, L. L. (1996). "Mismatch between aspects of hearing impairment and hearing disability/handicap in adult/elderly Cantonese speakers: some hypotheses concerning cultural and linguistic influences," J. Am. Acad. Audiol. 7, 442–446.
Doyle, J., Schaefer, C., Dacakis, G., and Wong, L. L. N. (2002). "Hearing levels and hearing handicap in Cantonese speaking Australian," Asia-Pac. J. Speech Lang. Hear. 7, 92–100.
Eisenberg, L. S., Dirks, D. D., Takayanagi, S., and Martinez, A. S. (1998). "Subjective judgments of clarity and intelligibility for filtered stimuli with equivalent speech intelligibility index predictions," J. Speech Lang. Hear. Res. 41, 327–339.
Evans, S., Neave, N., and Wakelin, D. (2006). "Relationships between vocal characteristics and body size and shape in human males: An evolutionary explanation for a deep male voice," Biol. Psychol. 72(2), 160–163.
Fletcher, H., and Galt, R. H. (1950). "The perception of speech and its relation to telephony," J. Acoust. Soc. Am. 22, 89–151.
Fok Chan, Y. Y. (1974). A Perceptual Study of Tones in Cantonese (University of Hong Kong, Hong Kong).
French, N. R., and Steinberg, J. C. (1947). "Factors governing the intelligibility of speech sounds," J. Acoust. Soc. Am. 19, 90–119.
Fu, Q. J., and Zeng, F. G. (2000). "Identification of temporal envelope cues in Chinese tone recognition," Asia Pacific J. Speech Lang. Hear. 5, 45–57.
Fu, Q. J., Zeng, F. G., Shannon, R. V., and Soli, S. D. (1998). "Importance of tonal envelope cues in Chinese speech recognition," J. Acoust. Soc. Am. 104, 505–510.
Gandour, J. (1981). "Perceptual dimensions of tones: evidence in Cantonese," J. Chin. Linguist. 9, 20–36.
Gandour, J. (1983). "Tone perception in Far Eastern languages," J. Phonetics 11, 149–175.
Hagerman, B. (1982). "Sentences for testing speech intelligibility in noise," Scand. Audiol. 11, 79–87.
Henry, B. A., McDermott, H. J., McKay, C. M., James, C. J., and Clark, G. M. (1998). "A frequency importance function for a new monosyllabic word test," Aust. J. Audiol. 20, 79–86.
Kamm, C. A., Dirks, D. D., and Bell, T. S. (1985). "Speech recognition and the articulation index for normal and hearing-impaired listeners," J. Acoust. Soc. Am. 77, 281–288.
Killion, M. C., and Christensen, L. A. (1998). "The case of the missing dots: AI and SNR loss," Hear. J. 51, 32–47.
Lau, C. C., and So, K. W. (1988). "Material for Cantonese speech audiometry constructed by appropriate phonetic principles," Br. J. Audiol. 22, 297–304.
Lee, K. Y. S., Chiu, S. N., and van Hasselt, C. A. (2002). "Tone perception ability of Cantonese-speaking children," Lang. Speech 45, 387–406.
Li, R. (1989). "The classification of the Chinese dialects," FangYan 4, 241–259.
Liang, Z. A. (1963). "The auditory perception of Mandarin tones," Acta Physiol. Sinica 26, 85–91.
Lin, M. C. (1988). "The acoustic characteristics and perceptual cues of tones in standard Chinese," Chin. Ling. 204, 182–193.
Lui, J. (2000). "Cantonese tones perception in children," Unpublished B.Sc. dissertation, University of Hong Kong.
Macrae, J. H., and Brigden, D. N. (1973). "Auditory threshold impairment and everyday speech reception," Audiology 12, 272–290.
Matthews, S., and Yip, V. (1994). Cantonese: A Comprehensive Grammar (Routledge, London).
Mueller, H. G., and Killion, M. C. (1990). "An easy method for calculating the articulation index," Hear. J. 43, 14–17.
Ng, Y. H. (1981). "The effects of filtering on the intelligibility of Cantonese," M.Ed. dissertation, University of Manchester.
Nilsson, M., Soli, S. D., and Sullivan, J. A. (1994). "Development of the Hearing In Noise Test for the measurement of speech reception thresholds in quiet and in noise," J. Acoust. Soc. Am. 95, 1085–1099.
Pavlovic, C. V. (1984). "Use of the articulation index for assessing residual auditory function in listeners with sensorineural hearing impairment," J. Acoust. Soc. Am. 75, 1253–1258.
Pavlovic, C. V. (1987). "Derivation of primary parameters and procedures for use in speech intelligibility predictions," J. Acoust. Soc. Am. 82, 413–422.
Plomp, R., and Mimpen, A. M. (1979). "Improving the reliability of testing the speech reception threshold for sentences," Audiology 18, 43–52.
Ramsey, S. R. (1987). The Languages of China (Princeton University Press, Princeton).
Rankovic, C. M. (1991). "An application of the articulation index to hearing aid fitting," J. Speech Hear. Res. 34, 391–402.
Rankovic, C. M. (1995). "Prediction of articulation scores," J. Acoust. Soc. Am. 97, 3358.
Sherbecoe, R. L., and Studebaker, G. A. (2002). "Audibility-index functions for the Connected Speech Test," Ear Hear. 23, 385–398.
So, L. K. H., and Dodd, B. J. (1995). "The acquisition of phonology by Cantonese-speaking children," J. Child Lang. 22, 473–493.
Stagray, J. R., Downs, D., and Sommers, R. K. (1992). "Contributions of the fundamental, resolved harmonics, and unresolved harmonics in tone-phoneme identification," J. Speech Hear. Res. 35, 1406–1409.
Stelmachowicz, P., Lewis, D., and Creutz, T. (2002). Situational Hearing-Aid Response Profile (SHARP, version 6.0) User's Manual (Boys Town National Research Hospital, Omaha).
Studebaker, G. A., and Sherbecoe, R. L. (1991). "Frequency-importance and transfer functions for recorded CID W-22 word lists," J. Speech Hear. Res. 34, 427–438.
Studebaker, G. A., and Sherbecoe, R. L. (1993). "Frequency-importance functions for speech recognition," in Acoustical Factors Affecting Hearing Aid Performance, edited by G. A. Studebaker and I. Hochberg (Allyn and Bacon, Boston), pp. 185–204.
Studebaker, G. A., Pavlovic, C. V., and Sherbecoe, R. L. (1987). "A frequency importance function for continuous discourse," J. Acoust. Soc. Am. 81, 1130–1138.
Studebaker, G. A., Sherbecoe, R. L., and Gilmore, C. (1993). "Frequency-importance and transfer functions for the Auditec of St. Louis recordings of the NU-6 word test," J. Speech Hear. Res. 36, 799–807.
Studebaker, G. A., Sherbecoe, R. L., McDaniel, D. M., and Gwaltney, C. A. (1999). "Monosyllabic word recognition at higher-than-normal speech and noise levels," J. Acoust. Soc. Am. 105, 2431–2444.
Studebaker, G. A., Taylor, R., and Sherbecoe, R. L. (1994). "The effect of noise spectrum on speech recognition performance-intensity functions," J. Speech Hear. Res. 37, 439–448.
Tillman, T. W., and Carhart, R. (1966). "An expanded test for speech discrimination utilizing CNC monosyllabic words: Northwestern University auditory test no. 6," Technical Report No. SAM-TR-66-55 (USAF School of Aerospace Medicine, Brooks Air Force Base, San Antonio, TX).
Tse, J. K. P. (1977). "Tone acquisition in Cantonese: a longitudinal case study," J. Child Lang. 5, 191–204.
Vance, T. J. (1976). "An experimental investigation of tone and intonation in Cantonese," Phonetica 33, 368–392.
Wei, C. G., Cao, K., and Zeng, F. G. (2004). "Mandarin tone recognition in cochlear-implant subjects," Hear. Res. 197, 87–95.
Whalen, D. H., and Xu, Y. (1992). "Information for Mandarin tones in the amplitude contour and in brief segments," Phonetica 49, 25–47.
Wong, L., Hickson, L., and McPherson, B. (2004). "Hearing aid expectations among Chinese first-time users: Relationships to post-fitting satisfaction," Aust. New Zeal. J. Audiol. 26, 53–69.
Wong, L. L. N., and Soli, S. D. (2005). "Development of the Cantonese Hearing in Noise Test (CHINT)," Ear Hear. 26(3), 276–289.
Wong, A. O., and Wong, L. L. (2004). "Tone perception of Cantonese-speaking prelingually hearing-impaired children with cochlear implants," Otolaryngol.-Head Neck Surg. 130, 751–758.
Zhang, J. L., Qi, S. Q., Song, M. Z., and Liu, Q. X. (1981). "On the important role of Chinese tones in speech intelligibility," Acta Acust. Beijing 4, 23724.