Science: Studies of vowels in TIMIT database

Vowels in the TIMIT database were studied to test predictions of acoustic theory and to investigate the characteristics by which vowels can be identified. The vowels were analyzed with a formant tracker, and F1 frequency and amplitude tracks were computed.

Experiment 1: Presence of F1 peak in vowels

Acoustic theory predicts that F1 frequency is lowered by a constriction in the front half of the vocal tract. Because F1 falls to a minimum at an oral closure, it should rise after the release of one orally closed consonant and fall again toward the closure of the next, so F1 should show a peak in vowels between orally closed consonants. The experiments confirm this prediction. Most vowels show an F1 peak somewhere in the vowel (94% by amplitude, 89% by frequency), including essentially all vowels between orally closed consonants. The exceptions can be explained by transcription errors or formant tracker failures. F1 amplitude is a more reliable indicator of the peak than F1 frequency.
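A peak count of this kind can be sketched as follows. This is a minimal illustration, not the study's procedure: it assumes each vowel is represented as a list of per-frame F1 values (amplitude in dB or frequency in Hz), and it treats a "peak" as a maximum that lies strictly inside the vowel and exceeds both endpoint values. The names `has_interior_peak`, `peak_rate`, and the `min_rise` threshold are hypothetical.

```python
from typing import Sequence


def has_interior_peak(track: Sequence[float], min_rise: float = 0.0) -> bool:
    """Return True if the track maximum lies strictly inside the vowel and
    exceeds both endpoint values by more than min_rise (hypothetical criterion)."""
    if len(track) < 3:
        return False  # too short to have an interior maximum
    # max() returns the first index holding the maximum value
    i_max = max(range(len(track)), key=lambda i: track[i])
    if i_max == 0 or i_max == len(track) - 1:
        return False  # monotonic or edge-peaked track
    peak = track[i_max]
    return peak - track[0] > min_rise and peak - track[-1] > min_rise


def peak_rate(tracks: Sequence[Sequence[float]], min_rise: float = 0.0) -> float:
    """Fraction of vowel tracks (e.g. F1 amplitude in dB) that contain an interior peak."""
    if not tracks:
        return 0.0
    return sum(has_interior_peak(t, min_rise) for t in tracks) / len(tracks)


# Usage with made-up F1 amplitude tracks (dB), one list per vowel
tracks = [[40, 52, 60, 55, 45], [40, 45, 50, 55, 60]]
print(peak_rate(tracks))  # 0.5
```

Running `peak_rate` once on amplitude tracks and once on frequency tracks gives the two percentages the experiment compares; a nonzero `min_rise` would make the criterion less sensitive to small tracker jitter.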

Experiment 2: Coincidence of amplitude and frequency peaks in vowels

Acoustic theory predicts that F1 frequency and amplitude should peak together if the source is stable. The experiments confirm this prediction: the locations of the amplitude and frequency peaks are highly correlated. Both peaks tend to occur early in the vowel, and frequency peaks tend to occur earlier than amplitude peaks. The exceptions can be explained by transcription errors or formant tracker failures.
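One way to quantify "coincidence" is sketched below, under stated assumptions: each vowel's amplitude and frequency tracks have the same number of frames, peak location is expressed as a fraction of vowel duration, and agreement is measured with a Pearson correlation plus the mean frequency-minus-amplitude offset. The report does not say which correlation measure it used; these function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def relative_peak_position(track: Sequence[float]) -> float:
    """Frame index of the track maximum as a fraction of vowel duration
    (0.0 = first frame, 1.0 = last frame)."""
    if len(track) < 2:
        raise ValueError("track needs at least two frames")
    i_max = max(range(len(track)), key=lambda i: track[i])
    return i_max / (len(track) - 1)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def compare_peaks(amp_tracks: Sequence[Sequence[float]],
                  freq_tracks: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (correlation of peak positions, mean frequency-minus-amplitude offset).
    A negative mean offset means frequency peaks tend to come first."""
    amp_pos = [relative_peak_position(t) for t in amp_tracks]
    freq_pos = [relative_peak_position(t) for t in freq_tracks]
    r = pearson(amp_pos, freq_pos)
    mean_offset = sum(f - a for a, f in zip(amp_pos, freq_pos)) / len(amp_pos)
    return r, mean_offset


# Usage with made-up tracks: each frequency peak sits one frame before its amplitude peak
amp = [[1, 3, 2, 1, 0], [0, 1, 2, 3, 1], [0, 2, 1, 0, 0]]
freq = [[3, 2, 1, 0, 0], [0, 1, 3, 2, 1], [2, 1, 0, 0, 0]]
r, offset = compare_peaks(amp, freq)
```

A high `r` together with a negative `offset` is the pattern the experiment reports: peaks that move together, with frequency leading amplitude.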

Experiment 3: Vowel quality at F1 peak vs. midpoint

If the F1 peak were a better location than the midpoint for judging vowel quality, the LAFF labeling policy might need to change, since labels are currently placed at the vowel midpoint. A k-nearest-neighbor (KNN) vowel classifier was tested on spectra taken at the F1 peaks and at the midpoints. Classification performance did not differ substantially between the two sets of spectra, although a statistical test showed that they carry significantly different information.
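A basic KNN classifier of the kind named above can be sketched as follows. This is an illustration only: it assumes each spectrum is a fixed-length vector of numbers (for example, log magnitudes in a set of frequency bands), uses Euclidean distance and majority voting, and picks `k = 3` arbitrarily; the study's feature representation, distance measure, and `k` are not stated. `knn_classify` and `accuracy` are hypothetical names.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Spectrum = Sequence[float]


def knn_classify(train: Sequence[Tuple[Spectrum, str]], query: Spectrum, k: int = 3) -> str:
    """Label a query spectrum by majority vote among its k nearest
    training spectra under Euclidean distance."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Counter keeps insertion order, so ties go to the label of the closer neighbor
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train: Sequence[Tuple[Spectrum, str]],
             test: Sequence[Tuple[Spectrum, str]], k: int = 3) -> float:
    """Fraction of test spectra whose predicted label matches the true label."""
    correct = sum(knn_classify(train, x, k) == y for x, y in test)
    return correct / len(test)


# Usage with made-up 2-dimensional "spectra" for two vowel classes
train = [([1.0, 0.0], "iy"), ([1.1, 0.1], "iy"), ([0.9, -0.1], "iy"),
         ([0.0, 1.0], "aa"), ([0.1, 1.1], "aa"), ([-0.1, 0.9], "aa")]
print(knn_classify(train, [1.0, 0.05]))  # iy
```

To mirror the experiment, `accuracy` would be computed twice on the same vowel tokens, once with spectra sampled at the F1 peak and once with spectra sampled at the midpoint, and the two scores compared.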

Experiment 4: Fixed energy band vs. formant tracking

If energy in a fixed frequency band performs as well as F1 energy, a formant tracker can be avoided. A fixed band (300 to 900 Hz) was found to be at least as good as F1 energy for peak picking, and performance was not sensitive to the exact band edges or transition widths.
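A fixed-band energy track can be sketched with only the standard library. This sketch assumes TIMIT's 16 kHz sampling rate, a 25 ms Hamming-windowed frame (400 samples), and a 10 ms hop (160 samples). It also uses a rectangular band with hard edges at 300 and 900 Hz, whereas the study varied transition widths; the frame sizes and function names are hypothetical. Only the DFT bins inside the band are computed, which keeps the direct DFT cheap.

```python
import cmath
import math
from typing import List, Sequence


def band_energy_db(frame: Sequence[float], fs: float = 16000.0,
                   lo: float = 300.0, hi: float = 900.0) -> float:
    """Energy (dB) of a Hamming-windowed frame within [lo, hi] Hz,
    from a direct DFT of only the in-band bins (hard band edges assumed)."""
    n = len(frame)
    if n < 2:
        raise ValueError("frame needs at least two samples")
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(frame, window)]
    energy = 0.0
    for k in range(n // 2 + 1):
        f = k * fs / n  # center frequency of DFT bin k
        if lo <= f <= hi:
            X = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            energy += abs(X) ** 2
    return 10 * math.log10(energy + 1e-12)  # small floor avoids log10(0)


def band_energy_track(signal: Sequence[float], fs: float = 16000.0,
                      frame_len: int = 400, hop: int = 160,
                      lo: float = 300.0, hi: float = 900.0) -> List[float]:
    """Band energy (dB) for each full frame of the signal."""
    return [band_energy_db(signal[s:s + frame_len], fs, lo, hi)
            for s in range(0, len(signal) - frame_len + 1, hop)]


# Usage: a 500 Hz tone falls inside the band, a 2000 Hz tone does not
tone500 = [math.sin(2 * math.pi * 500 * i / 16000) for i in range(400)]
tone2k = [math.sin(2 * math.pi * 2000 * i / 16000) for i in range(400)]
print(band_energy_db(tone500) > band_energy_db(tone2k))  # True
```

The resulting track can be fed to the same peak picker that would otherwise run on F1 energy from a formant tracker; changing `lo` and `hi` is one way to probe how sensitive peak picking is to the band edges.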

Conclusions

Overall, the experimental results agree with the theoretical predictions. The F1 peak, especially in amplitude, is a good indicator of vowel presence. The precise location of the vowel landmark is not critical for vowel classification. Energy in a simple fixed band works as well as a formant tracker.