Science: Studies of vowels in TIMIT database
Vowels in the TIMIT database were studied to test predictions of
acoustic theory and to investigate the identifiable characteristics
of vowels.
Vowels in the TIMIT database were analyzed with a formant tracker,
and F1 frequency and amplitude tracks were computed.
Acoustic theory predicts that F1 frequency should be lowered by
constriction in the front half of the vocal tract. Therefore, F1
should show a peak in vowels between orally closed consonants.
Experiments confirm the prediction. Most vowels (94% for amplitude,
89% for frequency) show a peak somewhere in the vowel, including
essentially all vowels between orally closed consonants. Exceptions
are explainable by transcription error or formant tracker failure.
F1 amplitude is a more reliable indicator than F1 frequency.
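
The peak test described above can be sketched as follows. This is a
minimal illustration, not the procedure used in the study: the function
name, the frame values, and the rule that a "peak" means a maximum
strictly inside the vowel are all assumptions.

```python
def interior_peak_index(track):
    """Return the index of the maximum if it lies strictly inside the
    track (not on the first or last frame), else None.

    `track` is a hypothetical list of F1 amplitude or frequency values,
    one per analysis frame, covering a single vowel.
    """
    if len(track) < 3:
        return None  # too short to have an interior frame
    peak = max(range(len(track)), key=lambda i: track[i])
    if 0 < peak < len(track) - 1:
        return peak
    return None


# Made-up F1 amplitude values (dB) for one vowel, for illustration only.
f1_amp = [52.0, 58.5, 61.0, 60.2, 57.1, 50.3]
print(interior_peak_index(f1_amp))
```

A track that rises or falls monotonically through the vowel returns
None, which corresponds to the "no peak" cases counted as exceptions.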
Acoustic theory predicts that F1 frequency and amplitude should peak
together if the source is stable. Experiments confirm the
prediction. The locations of the amplitude and frequency peaks are
highly correlated.
Both peaks tend to occur early in the vowel, and frequency peaks tend
to occur earlier than amplitude peaks. Exceptions are explainable by
transcription error or formant tracker failure.
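
One way to quantify the two observations above (correlated peaks, and
frequency peaks leading amplitude peaks) is sketched below. The peak
positions are invented, and expressing them as a fraction of vowel
duration is an assumption rather than the measure used in the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_lead(freq_peaks, amp_peaks):
    """Average amount by which the frequency peak precedes the amplitude
    peak; positive values mean the frequency peak comes first."""
    return sum(a - f for f, a in zip(freq_peaks, amp_peaks)) / len(freq_peaks)


# Hypothetical peak positions, as a fraction of vowel duration (0 = onset).
freq_peak_pos = [0.20, 0.30, 0.25, 0.40]
amp_peak_pos = [0.30, 0.35, 0.30, 0.50]
print(pearson(freq_peak_pos, amp_peak_pos))
print(mean_lead(freq_peak_pos, amp_peak_pos))
```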
If the F1 peak is a better location for judging vowel quality than the
midpoint, LAFF labeling policy may need to be changed, since current
labels are placed at the midpoint. A vowel classifier (K nearest
neighbors) was tested on vowel spectra from the F1 peaks and the
midpoints. Performance was not substantially different between the
two types of spectra, although a statistical test showed that they
carry significantly different information.
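
For readers unfamiliar with the method, a K-nearest-neighbor classifier
can be sketched as below. The feature vectors, labels, Euclidean
distance, and choice of k are assumptions for illustration; the
features and settings actually used in the study are not specified
here.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training items.

    `train` is a list of (feature_vector, label) pairs. Distance is
    Euclidean, which is an assumption made for this sketch.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of (feature_vector, label) test pairs classified correctly."""
    hits = sum(knn_classify(train, vec, k) == label for vec, label in test)
    return hits / len(test)


# Toy two-dimensional "spectra" with made-up vowel labels.
train = [([0.0, 0.0], "iy"), ([0.0, 1.0], "iy"),
         ([5.0, 5.0], "aa"), ([5.0, 6.0], "aa")]
print(knn_classify(train, [0.2, 0.1], k=3))
```

Running `accuracy` once with spectra taken at the F1 peaks and once
with spectra taken at the midpoints gives the kind of comparison
described above.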
If a fixed energy band can provide performance comparable to the F1
energy, a formant tracker can be avoided. A fixed band (300 to 900
Hz) was found to be at least as good for peak picking as the F1
energy, and the performance was not sensitive to the exact values of
band edges and transition widths.
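
The fixed-band energy can be sketched as below. This version sums
power over DFT bins between hard 300 Hz and 900 Hz edges, which is a
simplification: the study refers to transition widths, so its band was
presumably shaped rather than abrupt. The frame length and test tones
are assumptions; 16 kHz is the TIMIT sampling rate.

```python
import cmath
import math


def band_energy(frame, fs, lo=300.0, hi=900.0):
    """Sum of squared DFT magnitudes for bins whose frequency lies in
    [lo, hi] Hz. Uses a direct (slow) DFT so no external library is
    needed; hard band edges are an assumption of this sketch."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if lo <= freq <= hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(frame))
            energy += abs(coeff) ** 2
    return energy


def sine(freq, fs, n):
    """Generate n samples of a unit-amplitude sine at `freq` Hz."""
    return [math.sin(2 * math.pi * freq * t / fs) for t in range(n)]


fs = 16000   # TIMIT sampling rate in Hz
n = 160      # 10 ms frame, a hypothetical choice
in_band = band_energy(sine(500.0, fs, n), fs)
out_of_band = band_energy(sine(2000.0, fs, n), fs)
print(in_band > out_of_band)
```

Computing this energy frame by frame across a vowel gives a track that
can be peak-picked in place of the F1 amplitude track, with no formant
tracker involved.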
Conclusions
Overall, experimental results agree with theoretical predictions.
The F1 peak is a good indicator of vowel presence, especially the
amplitude peak. The precise location of the vowel landmark is not
critical. A simple fixed-band energy works as well as the F1 energy
obtained from a formant tracker.