|
51. |
Amplitude and period discrimination of haptic stimuli |
|
The Journal of the Acoustical Society of America,
Volume 104,
Issue 1,
1998,
Page 453-463
Martha A. Rinker,
James C. Craig,
Lynne E. Bernstein,
Abstract:
As part of a project to examine the ability of the hand to receive speech information, the present study examined subjects’ ability to discriminate finger movements along the dimensions of amplitude and period (movement duration). The movements consisted of single-cycle, sinewave movements and single-cycle, cosine movements presented to the index finger. Difference thresholds were collected using an adaptive, two-interval, temporal forced-choice procedure. Amplitudes from 6 to 19 mm were examined, and the difference thresholds ranged from 10% to 18%. The thresholds were unaffected by the period of the movement. Periods from 3000 to 111 ms (0.33–9 Hz) were examined, and thresholds ranged from 6% to 16%. The thresholds were unaffected by the amplitude of the movement. Further measurements in which period was varied in the amplitude discrimination task and amplitude was varied in the period discrimination task indicated that subjects were not using peak velocity as the basis for discrimination. These measurements were collected using a display specifically designed for the examination of haptic stimulation and capable of presenting controlled patterns of movement and vibration to the fingers.
ISSN:0001-4966
DOI:10.1121/1.423249
Publisher: Acoustical Society of America
Year: 1998
Data source: AIP
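The period range and the peak-velocity question in the abstract above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the movement is an ideal sinusoid x(t) = A sin(2πt/T) and treats A as the peak displacement, which the abstract does not specify. The function names are hypothetical.

```python
import math

def frequency_hz(period_ms):
    """Convert a movement period in milliseconds to a frequency in hertz."""
    return 1000.0 / period_ms

def peak_velocity_mm_per_s(amplitude_mm, period_ms):
    """Peak velocity of x(t) = A*sin(2*pi*t/T), i.e. 2*pi*A/T.

    Assumption: amplitude_mm is the peak displacement A of an ideal sinusoid.
    """
    period_s = period_ms / 1000.0
    return 2.0 * math.pi * amplitude_mm / period_s

# The period limits quoted in the abstract, expressed as frequencies.
slowest = frequency_hz(3000)   # about 0.33 Hz
fastest = frequency_hz(111)    # about 9 Hz

# Doubling amplitude and period together leaves peak velocity unchanged,
# which is why varying the irrelevant dimension can rule out a velocity cue.
v_a = peak_velocity_mm_per_s(6.0, 1000.0)
v_b = peak_velocity_mm_per_s(12.0, 2000.0)
```

Under these assumptions, peak velocity depends only on the ratio A/T, so a listener relying on it would be misled whenever amplitude and period change together.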
|
52. |
Phonation onset: Vocal fold modeling and high-speed glottography |
|
The Journal of the Acoustical Society of America,
Volume 104,
Issue 1,
1998,
Page 464-470
Patrick Mergell,
Hanspeter Herzel,
Thomas Wittenberg,
Monika Tigges,
Ulrich Eysholdt,
Abstract:
Phonation onset is discussed in the framework of dynamical systems as a Hopf bifurcation, i.e., as a transition from damped to sustained vocal fold oscillations due to changes of parameters defining the underlying laryngeal configuration (e.g., adduction, subglottal pressure, muscular activity). An analytic envelope curve of the oscillation onset is deduced by analyzing the Hopf bifurcation in mathematical models of the vocal folds. It is governed by a single time constant which can be identified with the physiological parameter phonation onset time. This parameter reflects the laryngeal state prior to phonation and can be used as a quantitative classification criterion in order to assess the phonation onset in clinical diagnosis. The extraction of the phonation onset time from simulated time series using a simplified two-mass model and from digital high-speed videos is described in detail. It shows a good agreement between theory and measurement.
ISSN:0001-4966
DOI:10.1121/1.423250
Publisher: Acoustical Society of America
Year: 1998
Data source: AIP
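For orientation, the textbook normal form of a supercritical Hopf bifurcation yields an amplitude envelope of the kind the abstract above describes. This is a generic sketch, not the paper's own expression: the symbols r (oscillation amplitude), μ (growth rate), a (saturation coefficient), and r_0 (initial amplitude) are assumptions introduced here.

```latex
\dot{r} = \mu r - a r^{3}, \qquad
r(t) = \frac{r_\infty}{\sqrt{1 + \left(\dfrac{r_\infty^{2}}{r_0^{2}} - 1\right) e^{-2\mu t}}}, \qquad
r_\infty = \sqrt{\mu / a}
```

In this form the whole onset curve is set by the single rate μ, so 1/μ plays the role of a time constant; how the paper maps such a constant onto the phonation onset time is not stated in the abstract.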
|
53. |
Vocal tract area functions for an adult female speaker based on volumetric imaging |
|
The Journal of the Acoustical Society of America,
Volume 104,
Issue 1,
1998,
Page 471-487
Brad H. Story,
Ingo R. Titze,
Eric A. Hoffman,
Abstract:
Magnetic resonance imaging (MRI) was used to acquire vocal tract shapes of ten vowels /i, ɪ, ɛ, æ, ʌ, ɑ, ɔ, o, ʊ, u/ and two liquid approximants /ɝ, l/ for a 27-year-old adult female. These images were complemented with additional images acquired with electron beam computed tomography (CT) of /i/ and /ɑ/. Each 3-D shape was condensed into a set of cross-sectional areas of oblique sections perpendicular to the centerline of the vocal tract’s long axis, resulting in an “area function.” Formant frequencies computed for each area function showed reasonable similarity to those determined from the natural (recorded) speech of the imaged subject, but differences suggest that some of the imaged vocal tract shapes were articulated differently during imaging than during recording of natural speech, and also that imaging procedures may have compromised some accuracy for a few shapes. The formant calculations also confirmed the significant effect that the piriform sinus can have on lowering the formant frequencies. A comparison is made between area functions derived using both MRI and CT methods for the vowels /i/ and /ɑ/. Additionally, the area functions reported in this study are compared with those from two previous studies and demonstrate general similarities in shape but also obvious differences that can be attributed to anatomical differences of the imaged subjects and to differences in imaging techniques and image processing methods.
ISSN:0001-4966
DOI:10.1121/1.423298
Publisher: Acoustical Society of America
Year: 1998
Data source: AIP
|
54. |
Dynamic specification of coarticulated German vowels: Perceptual and acoustical studies |
|
The Journal of the Acoustical Society of America,
Volume 104,
Issue 1,
1998,
Page 488-504
Winifred Strange,
Ocke-Schwen Bohn,
Abstract:
To examine the generality of Strange’s Dynamic Specification Theory of vowel perception, two perceptual experiments investigated whether dynamic (time-varying) acoustic information about vowel gestures was critical for identification of coarticulated vowels in German, a language without diphthongization. The perception by native North German (NG) speakers of electronically modified /dVt/ syllables produced in carrier sentences was assessed using the “silent-center” paradigm. The relative efficacy of static target information, dynamic spectral information (defined over syllable onsets and offsets together), and intrinsic vowel length was investigated in listening conditions in which the centers (silent-center conditions) or the onsets and offsets (vowel-center conditions) of the syllables were silenced. Listeners correctly identified most vowels in silent-center syllables and in vowel-center stimuli when both conditions included information about intrinsic vowel length. When duration information was removed, errors increased significantly, but performance was relatively better for silent-center syllables than for vowel-center stimuli. Acoustical analyses of the effects of coarticulation on target formant frequencies, vocalic duration, and dynamic spectro-temporal patterns in the stimulus materials were performed to elucidate the nature of the dynamic spectral information. In comparison with vowels produced in citation form /hVt/ syllables by the same speaker, the coarticulated /dVt/ utterances showed considerable “target undershoot” of formant frequencies and reduced duration differences between tense and lax vowel pairs. This suggests that both static spectral cues and relative duration information for NG vowels may not remain perceptually distinctive in continuous speech. Analysis of formant movement within syllable nuclei corroborated descriptions of German vowels as monophthongal. 
However, an analysis of first formant temporal trajectories revealed distinct patterns for tense and lax vowels that could be used by listeners to disambiguate coarticulated NG vowels.
ISSN:0001-4966
DOI:10.1121/1.423299
Publisher: Acoustical Society of America
Year: 1998
Data source: AIP
|
55. |
Importance of tonal envelope cues in Chinese speech recognition |
|
The Journal of the Acoustical Society of America,
Volume 104,
Issue 1,
1998,
Page 505-510
Qian-Jie Fu,
Fan-Gang Zeng,
Robert V. Shannon,
Sigfrid D. Soli,
Abstract:
Recent studies have shown that temporal waveform envelope cues can provide significant information for English speech recognition. This study investigated the use of temporal envelope cues in a tonal language: Mandarin Chinese. In this study, the speech was divided into several frequency analysis bands; the amplitude envelope was extracted from each band by half-wave rectification and low-pass filtering and was used to modulate a noise of the same bandwidth as the analysis band. These manipulations preserved temporal and amplitude cues in each frequency band, but removed the spectral detail within each band. Chinese vowels, consonants, tones and sentences were identified by 12 native Chinese-speaking listeners with 1, 2, 3, and 4 noise bands. The results showed that the recognition score of vowels, consonants, and sentences increased monotonically with the number of bands, a pattern similar to that observed in English speech recognition. In contrast, tones were consistently recognized at about 80% correct level, independent of the number of bands. This high level of tone recognition produced a significant difference in the open-set sentence recognition between Chinese (11.0%) and English (2.9%) for the one-band condition where no spectral information was available. The data also revealed that, with primarily temporal cues, the falling–rising tone (tone 3) and the falling tone (tone 4) were more easily recognized than the flat tone (tone 1) and the rising tone (tone 2). This differential pattern in tone recognition resulted in a similar pattern in word recognition: words having either tone 3 or 4 were more likely to be recognized while words having tone 1 and 2 were not. The quantitative role of tones in Chinese speech recognition was further explored using a power-function model and found to play a significant role in relating phoneme recognition to sentence recognition.
ISSN:0001-4966
DOI:10.1121/1.423251
Publisher: Acoustical Society of America
Year: 1998
Data source: AIP
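The envelope-extraction step described in the abstract above (half-wave rectification, low-pass filtering, then modulating a noise carrier) might be sketched for a single band as follows. This is a minimal illustration under stated assumptions, not the authors' processing chain: the one-pole filter, the 50 Hz cutoff, the 8 kHz sampling rate, and all function names are hypothetical, and the band-splitting and band-limiting of the noise carrier are omitted.

```python
import math
import random

def half_wave_rectify(samples):
    """Keep positive samples, set negative samples to zero."""
    return [s if s > 0.0 else 0.0 for s in samples]

def one_pole_lowpass(samples, fs, cutoff_hz):
    """Simple one-pole low-pass filter (an assumed stand-in for the study's filter)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out, y = [], 0.0
    for s in samples:
        y += alpha * (s - y)
        out.append(y)
    return out

def band_envelope(band_signal, fs, cutoff_hz=50.0):
    """Amplitude envelope of one analysis band: rectify, then smooth."""
    return one_pole_lowpass(half_wave_rectify(band_signal), fs, cutoff_hz)

def modulate_noise(envelope, noise):
    """Impose the envelope on a noise carrier, sample by sample."""
    return [e * n for e, n in zip(envelope, noise)]

# Demo on a synthetic 500 Hz tone standing in for one band of speech.
fs = 8000  # assumed sampling rate in Hz
band = [math.sin(2.0 * math.pi * 500.0 * n / fs) for n in range(fs)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(fs)]  # broadband here, not band-limited
env = band_envelope(band, fs)
vocoded = modulate_noise(env, noise)
```

A multi-band version would repeat this per analysis band, with a carrier restricted to that band, and sum the results; the number of bands (1 to 4 in the study) then controls how much spectral detail survives.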
|
56. |
Exploration of the perceptual magnet effect using the mismatch negativity auditory evoked potential |
|
The Journal of the Acoustical Society of America,
Volume 104,
Issue 1,
1998,
Page 511-517
Anu Sharma,
Michael F. Dorman,
Abstract:
The goals of this study were (i) to assess the replicability of the “perceptual magnet effect” [, J. Acoust. Soc. Am. 97(1), 553–561 (1995)] and (ii) to investigate neurophysiologic processes underlying the perceptual magnet effect by using the mismatch negativity (MMN) auditory evoked potential. A stimulus continuum from /i/ to /e/ was synthesized by varying F1 and F2 in equal mel steps. Ten adult subjects identified and rated the goodness of the stimuli. Results revealed that the prototype was the stimulus with the lowest F1 and highest F2 values and the nonprototype stimulus was close to the category boundary. Subjects discriminated stimulus pairs differing in equal mel steps. The results indicated that discrimination accuracy was not significantly different in the prototype and the nonprototype condition. That is, no perceptual magnet effect was observed. The MMN evoked potential (a preattentive, neurophysiologic index of auditory discrimination) revealed that despite equal mel differences between the stimulus pairs the MMN was largest for the prototype pair (i.e., the pair that had the lowest F1 and highest F2 values). Therefore the MMN appears to be sensitive to within-category acoustic differences. Taken together, the behavioral and electrophysiologic results indicate that discrimination of stimulus pairs near a prototype is based on the auditory structure of the stimulus pairs.
ISSN:0001-4966
DOI:10.1121/1.423252
Publisher: Acoustical Society of America
Year: 1998
Data source: AIP
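Spacing a formant continuum in equal mel steps, as in the abstract above, could look like the sketch below. It assumes the common conversion m = 2595 log10(1 + f/700); the abstract does not say which mel formula was used, and the endpoint frequencies in the demo are hypothetical values, not the study's stimuli.

```python
import math

def hz_to_mel(f_hz):
    """Hz to mel, using one widely used formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def equal_mel_steps(f_start_hz, f_end_hz, n_steps):
    """Frequencies from f_start_hz to f_end_hz, equally spaced on the mel scale."""
    m0, m1 = hz_to_mel(f_start_hz), hz_to_mel(f_end_hz)
    return [mel_to_hz(m0 + (m1 - m0) * k / n_steps) for k in range(n_steps + 1)]

# Hypothetical F1 continuum endpoints, chosen only for illustration.
f1_continuum = equal_mel_steps(250.0, 400.0, 4)
```

Because the mel scale is compressive, equal mel steps correspond to progressively larger steps in hertz toward the high end of the continuum.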
|
57. |
The perception of speech gestures |
|
The Journal of the Acoustical Society of America,
Volume 104,
Issue 1,
1998,
Page 518-529
Aimée M. Surprenant,
Louis Goldstein,
Abstract:
Two experiments examined the effects of temporal overlap of speech gestures on the perception of stop consonant clusters. Sequences of stop consonant gestures that exhibit temporal overlap extreme enough to potentially eliminate the acoustic evidence of (at least) one of the consonants were obtained from x-ray microbeam data. Subjects were given a consonant monitoring task using stimuli containing stop sequences as well as those containing single stops. Results showed that (1) the initial consonant in the stop sequences was detected significantly less often than in the single stops; (2) bilabial gestures were considerably more effective at obscuring a preceding alveolar than the reverse; and (3) the detection rate correlated with an index of overlap between lip and tongue tip gestures. Experiment 2 employed stimuli that were truncated during the closure for the critical stop or stop sequence, so as to eliminate any information occurring in the acoustic signal at the stop release. This experiment showed that removing release information decreased detectability of the consonants generally. However, consistent with the observed gestural patterns, removing the release did not decrease detection of the alveolar stop when it was the first consonant of a sequence, indicating that there was no information about the alveolar stop present in acoustic realization of the second stop release. These experiments show that certain gestural patterns actually produced by English speakers may not be completely recoverable by listeners, and further, that it is possible to relate recoverability to particular metric properties of the gestural pattern.
ISSN:0001-4966
DOI:10.1121/1.423253
Publisher: Acoustical Society of America
Year: 1998
Data source: AIP
|
58. |
Audiovisual gating and the time course of speech perception |
|
The Journal of the Acoustical Society of America,
Volume 104,
Issue 1,
1998,
Page 530-539
K. G. Munhall,
Y. Tohkura,
Abstract:
The time course of audiovisual information in speech perception was examined using a gating paradigm. VCVs that evoked the McGurk effect were gated visually and auditorily. The visual gating yielded a McGurk effect that increased in strength as a linear function of amount of visual stimulus presented. The acoustic gating revealed a more nonlinear function in which the VC information was considerably weaker than the CV portion of the VCV. The results suggest that the flow of cross-modal information is quite complex during audiovisual speech perception.
ISSN:0001-4966
DOI:10.1121/1.423300
Publisher: Acoustical Society of America
Year: 1998
Data source: AIP
|
59. |
Acceptability for temporal modification of single vowel segments in isolated words |
|
The Journal of the Acoustical Society of America,
Volume 104,
Issue 1,
1998,
Page 540-549
Hiroaki Kato,
Minoru Tsuzaki,
Yoshinori Sagisaka,
Abstract:
Few perceptual studies of the temporal aspects of speech have investigated the influence of changes in segmental durations in terms of acceptability. Aiming to contribute to the assessment of rules for assigning segmental durations in speech synthesis, the current study measured the perceptual acceptability of changes in the segmental duration of vowels as a function of the segment attributes or context, such as base duration, temporal position in a word, vowel quality, and voicing of the following segment. Seven listeners estimated the acceptability of word stimuli in which one of the vowels was subjected to a temporal modification from −50 ms (for shortening) to +50 ms (for lengthening) in 5-ms steps. The temporal modification was applied to vowel segments in 70 word contexts; their durations ranged from 35 to 145 ms, the mora position in the word was first or third, the vowel quality was /ɑ/ or /i/, and the following segment was a voiced or an unvoiced consonant. The experimental results showed that the listeners’ acceptable range of durational modification was narrower for vowels in the first moraic position in the word than for those in the third moraic position. The acceptable range was also narrower for the vowel /ɑ/ than for the vowel /i/, and similarly narrower for vowels followed by unvoiced consonants than for those followed by voiced consonants. The vowel that fell into the least vulnerable class (the third /i/, followed by a voiced consonant) required 140% of the modification of that which fell into the most vulnerable class (the first /ɑ/, followed by an unvoiced consonant) to yield the same acceptability decrement. In contrast, the effect of the original vowel duration on the acceptability of temporal modifications was not significant despite its wide variation (35–145 ms).
ISSN:0001-4966
DOI:10.1121/1.423301
Publisher: Acoustical Society of America
Year: 1998
Data source: AIP
|
60. |
Characterizing the clarinet tone: Measurements of Lyapunov exponents, correlation dimension, and unsteadiness |
|
The Journal of the Acoustical Society of America,
Volume 104,
Issue 1,
1998,
Page 550-561
Teresa D. Wilson,
Douglas H. Keefe,
Abstract:
The clarinet tone is produced by a self-sustained oscillation involving nonlinearity between the flow through the reed and the mechanical response of the reed, and acoustic coupling via the air column response. Regimes of oscillation include periodic, biperiodic, and other quasi-periodic signals, yet even a nominally periodic tone has small, but perceptually and musically important, deviations that are obscured in a conventional power spectrum. Such deviations may be due to the nonlinear dynamics underlying sound production or to perturbations in the performer’s control of the instrument via changes in lip embouchure, blowing pressure, and vocal tract configuration. Techniques based upon experimental nonlinear dynamics and short-time signal processing are applied to the acoustic signal measured within the clarinet mouthpiece to assess the role of these additional deviations on sound production. These include the Lyapunov exponent, correlation dimension, and a normalized period-synchronous energy variance, termed unsteadiness. Normal tones and multiphonics are indistinguishable with respect to their Lyapunov exponents. The largest exponent is small and positive, indicating a small amount of information loss each cycle. The information in clarinet tones diminishes at rates ranging from 10 to 60 bits/s. Unsteadiness accounts for the variations in correlation dimension for normal tones but not for multiphonics. These measures may be useful in the study of more subtle aspects of tone production in wind instruments.
ISSN:0001-4966
DOI:10.1121/1.423254
Publisher: Acoustical Society of America
Year: 1998
Data source: AIP
|
|