1. Analysis of acoustic elements and syntax in communication sounds emitted by mustached bats

The Journal of the Acoustical Society of America, Volume 96, Issue 3, 1994, Pages 1229-1254
Jagmeet S. Kanwal, Sumiko Matsumura, Kevin Ohlemiller, Nobuo Suga
Abstract:
Mustached bats, Pteronotus parnellii parnellii, spend most of their lives in the dark and use their auditory system for acoustic communication as well as echolocation. The sound spectrograms of their communication sounds or ‘‘calls’’ revealed that this species produces a rich variety of calls. These calls consist of one or more of the 33 different types of discrete sounds or ‘‘syllables’’ that are emitted singly and/or in combination. These syllables can be further classified as 19 simple syllables, 14 composites, and three subsyllables. Simple syllables consist of characteristic geometric patterns of CF (constant frequency), FM (frequency modulation), and NB (noise burst) sounds that are defined quantitatively using statistical criteria. Composites consist of simple syllables or subsyllables conjoined without any silent interval. Most syllable types exhibit a large intrinsic variation in their physical structure compared to the stereotypic echolocation pulses. Syllable domains are defined on the basis of multiple parameters, although these can be collapsed onto three dimensions that capture 99% of the measured variation among different types of syllables. Temporal analysis of multisyllabic constructs reveals several syntactical rules for syllable transitions.
ISSN: 0001-4966
DOI: 10.1121/1.410273
Publisher: Acoustical Society of America
Year: 1994
Data source: AIP

2. Marine mammal call discrimination using artificial neural networks

The Journal of the Acoustical Society of America, Volume 96, Issue 3, 1994, Pages 1255-1262
John R. Potter, David K. Mellinger, Christopher W. Clark
Abstract:
Recent work has applied a linear spectrogram correlator filter (SCF) to detect bowhead whale (Balaena mysticetus) song notes, outperforming both a time‐series‐matched filter and a hidden Markov model. The method relies on an empirical weighting matrix. An artificial neural net (ANN) may be better yet, since it offers two advantages: (i) the equivalent weighting matrix is determined by training and can converge to a more nearly optimal solution, and (ii) an ANN is a nonlinear estimator and can embody more sophisticated responses. A three‐layer feed‐forward ANN is ideally suited to this application and has been implemented on 1475 sounds, of which 54% were used for training and 46% kept as ‘‘unseen’’ test data. The trained ANN error rate was 1.5%, a twofold improvement over previous methods. It is shown that ANN hidden neurons can be interrogated to reveal the operating paradigm developed during training. The function of each of these neurons can be determined in terms of spectrographic features of the training calls. Furthermore, the operating paradigm can be controlled and training time reduced by assigning specific recognition tasks to hidden neurons prior to training, rather than initiating training with randomized weights. The ANN is compared to the SCF and the role of the ‘‘hidden’’ neurons and equivalent weighting matrices are discussed.
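The three‐layer feed‐forward classifier the abstract describes can be sketched as follows. This is a minimal illustrative toy in NumPy, not the authors' implementation: the network size, learning rate, and the synthetic "spectrogram" vectors (standing in for the 1475 real sounds) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, n_hidden=4, lr=1.0, epochs=2000):
    """Train a one-hidden-layer binary classifier by gradient descent."""
    n_in = X.shape[1]
    W1 = rng.normal(0, 0.5, (n_in, n_hidden))   # input -> hidden weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (n_hidden, 1))      # hidden -> output weights
    b2 = np.zeros(1)
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)                # hidden-layer activations
        p = sigmoid(h @ W2 + b2).ravel()        # output probability
        err = p - y                             # cross-entropy gradient at output
        gW2 = h.T @ err[:, None] / len(y)
        gb2 = err.mean(keepdims=True)
        dh = err[:, None] @ W2.T * h * (1 - h)  # backpropagated hidden gradient
        gW1 = X.T @ dh / len(y)
        gb1 = dh.mean(axis=0)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel() > 0.5).astype(int)

# Two synthetic classes of 8-bin "spectra": song notes with energy in
# low bins, background noise with energy in high bins.
X = np.vstack([rng.normal(0, 0.1, (50, 8)) + np.array([1, 1, 1, 1, 0, 0, 0, 0]),
               rng.normal(0, 0.1, (50, 8)) + np.array([0, 0, 0, 0, 1, 1, 1, 1])])
y = np.concatenate([np.ones(50), np.zeros(50)])

params = train(X, y)
acc = (predict(params, X) == y).mean()
```

After training, the hidden weight matrices `W1` and `W2` play the role of the "equivalent weighting matrix" the abstract contrasts with the SCF, and can be inspected column by column.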
ISSN: 0001-4966
DOI: 10.1121/1.410274
Publisher: Acoustical Society of America
Year: 1994
Data source: AIP

3. Perceptual compensation for speaker differences and for spectral‐envelope distortion

The Journal of the Acoustical Society of America, Volume 96, Issue 3, 1994, Pages 1263-1282
Anthony J. Watkins, Simon J. Makin
Abstract:
This study asks whether perceptual mechanisms that compensate for the spectral‐envelope distortion of transmission channels also contribute to compensation for speaker differences. Subjects identified test words that were played after a carrier sentence. In some conditions the carriers were synthesized with F1 in low‐ and high‐frequency ranges and in others they were distorted by filters whose frequency response is the spectral envelope of one vowel minus the spectral envelope of another. The filter /■/ minus /ε/ and its inverse were used. Test words were drawn from an /■tch/ to /εtch/ continuum. Carriers filtered by /■/ minus /ε/ and its inverse give a phoneme boundary difference, indicating compensation for spectral envelope distortion. A phoneme boundary difference also occurs between carriers with F1 in low and high ranges, indicating compensation for speaker differences. Neither of these effects is reduced by playing the carrier backwards, even though a measurement of the perceived naturalness of carriers is sharply reduced by this manipulation. Analysis of carriers synthesized with low and high F1 showed that they have different long‐term spectra, and subsequent experiments used time‐stationary filters to alter this characteristic. The results showed that the long‐term spectra of the carriers govern their influence on the identity of subsequent test sounds. However, measurements of perceptual confusions among the carriers and of perceived talker‐differences between carriers revealed that other, time‐varying factors are more important for voice identification.
ISSN: 0001-4966
DOI: 10.1121/1.410275
Publisher: Acoustical Society of America
Year: 1994
Data source: AIP

4. A comparison of content‐masking procedures for obtaining judgments of discrete affective states

The Journal of the Acoustical Society of America, Volume 96, Issue 3, 1994, Pages 1283-1290
Margaret Friend, M. Jeffrey Farrar
Abstract:
The purpose of this article is to investigate observers’ use of acoustic cues to arrive at judgments of the speaker’s affective state and to address current methodological limitations. Ninety‐nine female undergraduates rated the level of excitement, happiness, and anger of speech stimuli under three content‐masking procedures: low‐pass filtering, random splicing, and reiterant speech. Each procedure preserves some forms of acoustic information while disrupting or degrading others. As predicted, the content‐masking procedures generated bias in observers’ affective ratings. Results are discussed in terms of the efficacy of the content‐masking procedures and implications for the study of acoustic cues to speaker affect.
ISSN: 0001-4966
DOI: 10.1121/1.410276
Publisher: Acoustical Society of America
Year: 1994
Data source: AIP

5. The multidimensional nature of pathologic vocal quality

The Journal of the Acoustical Society of America, Volume 96, Issue 3, 1994, Pages 1291-1302
Jody Kreiman, Bruce R. Gerratt, Gerald S. Berke
Abstract:
Although the terms ‘‘breathy’’ and ‘‘rough’’ are frequently applied to pathological voices, widely accepted definitions are not available and the relationship between these qualities is not understood. To investigate these matters, expert listeners judged the dissimilarity of pathological voices with respect to breathiness and roughness. A second group of listeners rated the voices on unidimensional scales for the same qualities. Multidimensional scaling analyses suggested that breathiness and roughness are related, multidimensional constructs. Unidimensional ratings of both breathiness and roughness were necessary to describe patterns of similarity with respect to either quality. Listeners differed in the relative importance given to different aspects of voice quality, particularly when judging roughness. The presence of roughness in a voice did not appear to influence raters’ judgments of breathiness; however, judgments of roughness were heavily influenced by the degree of breathiness, the particular nature of the influence varying from listener to listener. Differences in how listeners focus their attention on the different aspects of multidimensional perceptual qualities apparently are a significant source of interrater unreliability (noise) in voice quality ratings.
ISSN: 0001-4966
DOI: 10.1121/1.410277
Publisher: Acoustical Society of America
Year: 1994
Data source: AIP

6. The role of short‐term and long‐term auditory storage in processing spectral relations for adult and child speech

The Journal of the Acoustical Society of America, Volume 96, Issue 3, 1994, Pages 1303-1313
Ralph N. Ohde, Anne H. Perry
Abstract:
The processes involved in the perception of spectral change between the nasal murmur and the vocalic transition for speakers of different ages were assessed before and after disruption of the variation in spectra between these elements. Three children, aged 3, 5, and 7, and an adult female and male produced consonant–vowel (CV) syllables consisting of either [m] or [n] followed by [i] or [u]. In one condition (spectrally noncontiguous), the acoustic information surrounding the region of spectral change was digitally removed and in another condition (spectrally contiguous) this portion of the signal was retained. In both of these conditions, intervals of silence ranging from 0 to 2000 ms were inserted between 50‐ms segments of murmur and vocalic transition. These gap duration conditions were then presented to adult listeners for the identification of the nasal. Across speakers, the results for the spectrally contiguous condition support a primary mechanism in the perception of spectral relations that is mediated by processes within short‐term auditory memory, but the results for the spectrally noncontiguous condition revealed little consistent support for either short‐term or long‐term memory processes.
ISSN: 0001-4966
DOI: 10.1121/1.410278
Publisher: Acoustical Society of America
Year: 1994
Data source: AIP

7. Stimulus variability and spoken word recognition. I. Effects of variability in speaking rate and overall amplitude

The Journal of the Acoustical Society of America, Volume 96, Issue 3, 1994, Pages 1314-1324
Mitchell S. Sommers, Lynne C. Nygaard, David B. Pisoni
Abstract:
The present experiments investigated how several different sources of stimulus variability within speech signals affect spoken‐word recognition. The effects of varying talker characteristics, speaking rate, and overall amplitude on identification performance were assessed by comparing spoken‐word recognition scores for contexts with and without variability along a specified stimulus dimension. Identification scores for word lists produced by single talkers were significantly better than for the identical items produced in multiple‐talker contexts. Similarly, recognition scores for words produced at a single speaking rate were significantly better than for the corresponding mixed‐rate condition. Simultaneous variations in both speaking rate and talker characteristics produced greater reductions in perceptual identification scores than variability along either dimension alone. In contrast, variability in the overall amplitude of test items over a 30‐dB range did not significantly alter spoken‐word recognition scores. The results provide evidence for one or more resource‐demanding normalization processes which function to maintain perceptual constancy by compensating for acoustic–phonetic variability in speech signals that can affect phonetic identification.
ISSN: 0001-4966
DOI: 10.1121/1.411453
Publisher: Acoustical Society of America
Year: 1994
Data source: AIP

8. Effects of temporal smearing on temporal resolution, frequency selectivity, and speech intelligibility

The Journal of the Acoustical Society of America, Volume 96, Issue 3, 1994, Pages 1325-1340
Zezhang Hou, Chaslav V. Pavlovic
Abstract:
Envelopes of speech were smeared in 23 parallel frequency channels. The smeared speech was presented to normal‐hearing listeners, and the effects of different smearing magnitudes on speech intelligibility were measured by obtaining speech recognition scores. It was demonstrated theoretically and experimentally that the system consisting of the computer smearing and the auditory system had reduced temporal resolution but nearly normal frequency resolution. Speech intelligibility of the processed vowel–consonant nonsense syllables was tested for low‐ and high‐pass filter conditions. The overall speech recognition scores as well as the recognition scores of the consonants grouped according to articulatory features were analyzed. The results indicated that smearing with a narrow temporal window did not degrade speech. The larger equivalent rectangular durations (ERDs) of the resultant temporal window (RTW) of the combined system (temporal smearing plus auditory system) produced a small but significant reduction in speech intelligibility for the low‐pass filter condition. Scores for RTWs ≳ 16 ms were significantly different from the score for the 7.7‐ms RTW for the high‐pass filter condition, but this effect was small and did not differ across articulatory features.
ISSN: 0001-4966
DOI: 10.1121/1.410279
Publisher: Acoustical Society of America
Year: 1994
Data source: AIP

9. Viseme classifications of Dutch consonants and vowels

The Journal of the Acoustical Society of America, Volume 96, Issue 3, 1994, Pages 1341-1355
Nic van Son, Tirtsa M. I. Huiskamp, Arjan J. Bosman, Guido F. Smoorenburg
Abstract:
Videotaped lists of meaningless Dutch syllables were presented in quiet to four subject groups, differing with respect to their knowledge of and experience with lipreading (lipreading expertise). Syllables consisted of all Dutch consonants within three vowel contexts, and of all Dutch vowels within four consonant contexts. Three speakers pronounced all syllable lists. The aim of the research was (1) to establish viseme classifications of Dutch vowels and consonants; (2) to interpret the visual‐perceptual dimensions underlying this classification and relate them to acoustic‐phonetic parameters; (3) to establish the effect of lipreading expertise on the classification of visually similar phonemes (visemes). In general, viseme classification proved very constant with different subject groups: Lipreading expertise is not related to viseme recognition. Important visual features in consonant lipreading are lip articulation, degree of oral cavity opening, and place of articulation, leading to the following viseme classification: /p,b,m/, /f,v,υ/, /s,z,■/, and /t,d,n,j,l,k,x,r,■,h/. In the acoustic domain, these features may be related to spectral differences. Vowel features in lipreading are lip rounding, degree of lip opening, and vowel duration, yielding the following visemes: /i,■,e,ε,εi,a,■/, /u,y,œ,■/, /o/,o/, and /au,œy/. In the acoustic domain, lip rounding may roughly be related to the second formant, lip opening to the first formant.
ISSN: 0001-4966
DOI: 10.1121/1.411324
Publisher: Acoustical Society of America
Year: 1994
Data source: AIP

10. Determination of sagittal tongue shape from the positions of points on the tongue surface

The Journal of the Acoustical Society of America, Volume 96, Issue 3, 1994, Pages 1356-1366
Tokihiko Kaburagi, Masaaki Honda
Abstract:
This paper describes a method for determining the shape of the midsagittal tongue contour from the positions of points on the tongue surface. The positions of the points and the tongue shape were measured simultaneously by using an alternating magnetic field device and an ultrasonic B‐mode scanner for continuous speech utterances. A comparison between the magnetic and the ultrasonic data revealed that the average measurement difference between the two types of data was 1.16 mm. The shape of the tongue contour was then represented by multivariable linear regression of the magnetically determined positions. The results of the regression analysis showed that the tongue contour was estimated, from four positions on the tongue, with an average estimation error of 1.24 mm. This estimation error could be reduced to 0.84 mm when there was no measurement error between the magnetic and the ultrasonic data, and it was further reduced to 0.43 mm when the receiver coils of the magnetic device were positioned optimally on the tongue. It was also shown that the number of data frames for calculating the regression coefficients could be reduced, while maintaining the estimation accuracy, by appropriately selecting data frames. Finally, the tongue shape was estimated successfully for several phonemes from the magnetically determined positions, thus demonstrating the usefulness of this method for observing the articulatory configuration of the tongue.
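The regression step the abstract describes can be sketched as follows. This is a hypothetical toy in NumPy: the synthetic contours, the latent shape parameters, and the four "receiver coil" sample positions are all assumptions standing in for the magnetic and ultrasonic measurements, not the authors' data or code.

```python
import numpy as np

rng = np.random.default_rng(1)

n_frames, n_contour = 200, 20        # training frames, contour samples
t = np.linspace(0, np.pi, n_contour)

# Synthetic "tongue contours": smooth curves driven by two latent shape
# parameters per frame, plus a little measurement noise.
a = rng.normal(1.0, 0.2, (n_frames, 1))
b = rng.normal(0.0, 0.3, (n_frames, 1))
contours = (a * np.sin(t) + b * np.cos(2 * t)
            + rng.normal(0, 0.01, (n_frames, n_contour)))

# Four "receiver coil" positions = four samples along each contour.
coil_idx = [2, 7, 12, 17]
coils = contours[:, coil_idx]

# Multivariable linear regression: predict every contour point from the
# four coil positions (with an intercept term), via least squares.
X = np.hstack([coils, np.ones((n_frames, 1))])
W, *_ = np.linalg.lstsq(X, contours, rcond=None)

# Average absolute estimation error over the training frames, analogous
# to the millimeter errors reported in the abstract.
err = np.abs(X @ W - contours).mean()
```

Because the contour here is (by construction) a linear function of two latent parameters, four well-spread samples overdetermine it and the residual is close to the injected noise level; choosing `coil_idx` poorly degrades the fit, mirroring the paper's point about optimal coil placement.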
ISSN: 0001-4966
DOI: 10.1121/1.410280
Publisher: Acoustical Society of America
Year: 1994
Data source: AIP