Bayesian Auditory Scene Analysis


Perceptual filling-in

When sources produce sounds that overlap in time and frequency, sufficiently intense sounds can obscure the presence of less intense ones. That is, if a less intense sound is added to a sufficiently intense sound, the less intense sound will not be heard, a phenomenon termed 'masking'. In such cases, adding the less intense sound does not alter the peripheral auditory representation to a detectable extent. Nevertheless, context can modulate the perceptual interpretation, as the following illusions show.
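A back-of-the-envelope way to see why an added sound can leave the peripheral representation essentially unchanged: if both sounds fall in the same auditory filter and add in power, the band level barely moves when the weaker sound is well below the stronger one. This is a minimal sketch, assuming simple power (incoherent) addition within a single filter; the 1 dB detection threshold is a hypothetical, illustrative value and not a parameter of the model on this page.

```python
import math


def band_level_change_db(masker_db: float, target_db: float) -> float:
    """Increase in band level (dB) when a target is added to a masker.

    Assumes both sounds fall in the same auditory filter and add in power
    (incoherent addition), which is a simplification.
    """
    masker_power = 10 ** (masker_db / 10)
    target_power = 10 ** (target_db / 10)
    return 10 * math.log10((masker_power + target_power) / masker_power)


def is_masked(masker_db: float, target_db: float, threshold_db: float = 1.0) -> bool:
    """Treat the target as masked if the band-level change stays below a
    hypothetical detection threshold (default 1 dB, illustrative only)."""
    return band_level_change_db(masker_db, target_db) < threshold_db


if __name__ == "__main__":
    for target in (40, 50, 60, 70):
        change = band_level_change_db(70, target)
        print(f"target {target} dB under a 70 dB masker: "
              f"+{change:.3f} dB, masked={is_masked(70, target)}")
```

A target 20 dB below the masker raises the band level by only about 0.04 dB, whereas two equally intense sounds raise it by about 3 dB.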

Continuity of a tone through masking noise
Homophonic continuity
Spectral Completion

Continuity of a tone through masking noise

A noise flanked by tones could equally well consist of two short tones adjacent to the noise or a single longer tone overlapping the noise. Listeners perceive the former scene if the noise is quiet and the latter if the noise is loud enough to have masked the tone. In the next illusion, the amplitude of the noise decreases from one presentation to the next:

Tone continuity illusion - decreasing tone volume

Most listeners hear a single longer tone on the first and second presentations. The model also strongly prefers this scene.

Panels: posterior samples of scenes, mixture of sampled sources
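The logic of this illusion can be phrased as a question: could the noise have masked a tone that continued through it? One rough check compares the tone's level with the noise power falling inside one auditory filter centred on the tone. The sketch below assumes white noise, a rectangular filter one ERB wide (Glasberg and Moore's formula), and hypothetical amplitudes, tone frequency and sample rate. It is a heuristic masking check in the spirit of Warren et al. (1972), not the Bayesian model used on this page.

```python
import math


def erb_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000 + 1)


def tone_level_db(amplitude: float) -> float:
    """Power of a sinusoid with peak amplitude `amplitude`, in dB re 1."""
    return 10 * math.log10(amplitude ** 2 / 2)


def in_band_noise_level_db(noise_rms: float, centre_hz: float, sample_rate: float) -> float:
    """Power of white noise falling inside one ERB around `centre_hz`, in dB re 1.

    White noise spreads its power uniformly from 0 Hz to Nyquist, so the
    in-band share is ERB / (sample_rate / 2).
    """
    band_fraction = erb_hz(centre_hz) / (sample_rate / 2)
    return 10 * math.log10(noise_rms ** 2 * band_fraction)


def continuity_plausible(tone_amp: float, noise_rms: float,
                         tone_hz: float = 1000.0, sample_rate: float = 16000.0) -> bool:
    """True if the noise near the tone frequency is intense enough to have
    masked a tone continuing through it (hypothetical decision rule)."""
    return in_band_noise_level_db(noise_rms, tone_hz, sample_rate) >= tone_level_db(tone_amp)


if __name__ == "__main__":
    tone_amp = 0.1  # hypothetical tone amplitude
    for noise_rms in (1.0, 0.5, 0.25, 0.1):  # noise level decreasing across presentations
        print(noise_rms, continuity_plausible(tone_amp, noise_rms))
```

As the noise level drops, the rule flips from "a continuing tone is possible" to "the tone must have stopped", loosely mirroring the change in what listeners report across presentations.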

Warren, R. M., Obusek, C. J., & Ackroff, J. M. (1972). Auditory induction: Perceptual synthesis of absent sounds. Science, 176, 1149-1151.

Homophonic continuity

A similar effect occurs even when the masker is produced simply by increasing the amplitude of the masked sound (hence 'homophonic': the masker and the masked sound are the same sound). In the illusion below, the amplitude of an initially quiet noise is increased suddenly, with a 1 ms ramp.

Listeners tend to hear a long quiet noise with a louder, short noise superimposed on it. In contrast, the following sound, whose amplitude ramps up gradually, is usually heard as a single entity:

Homophonic continuity illusion - gradual amplitude increase, gammatonegram

In all of its posterior samples, the model hears two sources in the first sound and a single source in the second, although the frequency spectra of the inferred sounds do not exactly match the observation.

Panels: observation, posterior samples of scenes, mixture of sampled sources (if nSources > 1)
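The two stimuli in this section can be sketched as one continuous noise multiplied by two different amplitude envelopes: a 1 ms ramp for the sudden increase and a much slower ramp for the gradual one. Because the loud portion is the same noise made louder, the construction is homophonic. The levels, durations, sample rate and the 500 ms length of the gradual ramp are hypothetical choices for illustration; only the 1 ms ramp comes from the description above.

```python
import random


def ramp_envelope(quiet: float, loud: float, pre_s: float, ramp_s: float,
                  post_s: float, sample_rate: int = 16000) -> list[float]:
    """Piecewise-linear amplitude envelope: quiet, then a linear ramp, then loud."""
    n_pre = round(pre_s * sample_rate)
    n_ramp = max(1, round(ramp_s * sample_rate))
    n_post = round(post_s * sample_rate)
    ramp = [quiet + (loud - quiet) * (i + 1) / n_ramp for i in range(n_ramp)]
    return [quiet] * n_pre + ramp + [loud] * n_post


def homophonic_stimulus(envelope: list[float], seed: int = 0) -> list[float]:
    """Multiply one continuous Gaussian noise by the envelope, so the loud
    portion is the same noise made louder."""
    rng = random.Random(seed)
    return [a * rng.gauss(0.0, 1.0) for a in envelope]


def max_step(envelope: list[float]) -> float:
    """Largest sample-to-sample amplitude change: a crude measure of onset abruptness."""
    return max(abs(b - a) for a, b in zip(envelope, envelope[1:]))


if __name__ == "__main__":
    sudden = ramp_envelope(0.1, 0.4, pre_s=0.5, ramp_s=0.001, post_s=0.5)
    gradual = ramp_envelope(0.1, 0.4, pre_s=0.5, ramp_s=0.5, post_s=0.0)
    print("max step, sudden:", max_step(sudden))
    print("max step, gradual:", max_step(gradual))
    waveform = homophonic_stimulus(sudden)
```

Both envelopes reach the same final level; what differs is how abruptly they get there, which is the cue that separates "a second sound started" from "one sound got louder".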

Warren, Obusek & Ackroff (1972).

Spectral Completion

Note: listening with headphones is likely to work better than listening over computer speakers.

Analogous phenomena occur across the frequency spectrum, dubbed "spectral completion". The basic paradigm of McDermott and Oxenham (2008) for investigating spectral completion is shown below. Listeners heard a long masker noise that overlapped a brief target noise halfway through its duration. The spectrum of the target was ambiguous because the masker could plausibly have masked the middle band of the target's spectrum. Listeners were asked to adjust the middle band of a comparison noise until it perceptually matched the target.
Spectral completion paradigm
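Why the target's middle band is ambiguous can be seen with band levels alone. The sketch below assumes three bands, a masker whose energy is confined to the middle band, power addition within each band, and hypothetical levels in dB; these are illustrative choices, not the stimulus parameters of McDermott and Oxenham (2008).

```python
import math


def power_sum_db(*levels_db: float) -> float:
    """Level (dB) of incoherently added components."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


def mixture_spectrum(target: dict[str, float], masker: dict[str, float]) -> dict[str, float]:
    """Band-by-band mixture level; a band absent from a source contributes nothing."""
    bands = target.keys() | masker.keys()
    return {
        band: power_sum_db(*[src[band] for src in (target, masker) if band in src])
        for band in bands
    }


if __name__ == "__main__":
    masker = {"mid": 70.0}  # hypothetical: masker energy only in the middle band
    for target_mid in (30.0, 40.0, 50.0, 60.0):
        target = {"low": 50.0, "mid": target_mid, "high": 50.0}
        mix = mixture_spectrum(target, masker)
        print(target_mid, round(mix["mid"], 3))
```

Any target middle-band level well below the masker yields a mixture middle band within a fraction of a dB of the masker alone, so the observation barely constrains it, while the low and high bands of the target are observed directly.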

To simulate this basic experiment here, we borrow the demonstration from the paper's accompanying website. The mixture stimulus is presented, followed by the comparison, ten times in a row. On each repetition, the level of the comparison's middle frequency band (the adjustable band) is increased. Listeners typically judge the comparison to match the target around the eighth repetition.

In the first few repetitions, the comparison stimulus sounds too dull or low; in the last few, it sounds too tinny or thin.
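The demonstration's procedure can be sketched as a fixed ascending series of comparison levels, each judged against the perceived middle-band level of the target. The start level, the equal 5 dB steps, the 2.5 dB tolerance and the perceived level of 46 dB are all hypothetical values chosen for illustration; the page does not give the demo's actual levels.

```python
def comparison_levels(start_db: float, step_db: float, n_reps: int = 10) -> list[float]:
    """Middle-band level of the comparison on each repetition (equal dB steps, assumed)."""
    return [start_db + step_db * i for i in range(n_reps)]


def judge(level_db: float, perceived_mid_db: float, tolerance_db: float = 2.5) -> str:
    """Describe a comparison relative to the target's perceived middle band."""
    if level_db < perceived_mid_db - tolerance_db:
        return "too dull"
    if level_db > perceived_mid_db + tolerance_db:
        return "too tinny"
    return "match"


def best_match_repetition(levels_db: list[float], perceived_mid_db: float) -> int:
    """1-based repetition whose comparison level is closest to the perceived level."""
    errors = [abs(level - perceived_mid_db) for level in levels_db]
    return errors.index(min(errors)) + 1


if __name__ == "__main__":
    levels = comparison_levels(start_db=20.0, step_db=5.0)  # hypothetical series
    perceived = 46.0  # hypothetical perceived middle-band level of the target
    for rep, level in enumerate(levels, start=1):
        print(rep, level, judge(level, perceived))
    print("best match on repetition", best_match_repetition(levels, perceived))
```

Where the match falls in the series depends only on where the perceived middle band sits relative to the comparison levels.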

Below are the spectral completion stimuli included in our Cognitive Science submission. Note that our figures report the mean spectrum level across posterior samples. A single posterior sample may not perceptually match the target, even if the expected spectrum level matches the mean of human reports. How posterior samples should relate to the percept remains an open and interesting question.

Figure 5B
Panels: schematic, observation, posterior samples of target only

Figure 5C
Panels: schematic, observation, posterior samples of target
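The distinction between the posterior mean and individual posterior samples can be made concrete with a toy example. The samples below are invented for illustration (a bimodal posterior over three band levels) and are not output of the model: their mean sits halfway between the two modes, so it could agree with an average report even though no single sample has that spectrum.

```python
from statistics import mean


def mean_spectrum(samples: list[list[float]]) -> list[float]:
    """Band-by-band mean spectrum level (dB) across posterior samples."""
    return [mean(band) for band in zip(*samples)]


def max_deviation_db(sample: list[float], reference: list[float]) -> float:
    """Largest per-band distance (dB) between one sample and a reference spectrum."""
    return max(abs(s - r) for s, r in zip(sample, reference))


if __name__ == "__main__":
    # Invented samples of a target's [low, mid, high] band levels (dB):
    # half fill the middle band in, half leave it much quieter.
    samples = [[50.0, 50.0, 50.0]] * 5 + [[50.0, 30.0, 50.0]] * 5
    average = mean_spectrum(samples)
    print("posterior mean:", average)
    print("closest sample is",
          min(max_deviation_db(s, average) for s in samples), "dB away in its worst band")
```

Here the mean middle band is 40 dB, yet every sample is 10 dB away from it in that band.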

McDermott, J. H., & Oxenham, A. J. (2008). Spectral completion of partially masked sounds. Proceedings of the National Academy of Sciences, 105(15), 5939-5944.