Bayesian Auditory Scene Analysis


Perceptual filling-in

When sources produce sounds that overlap in time and frequency, sufficiently intense sounds can obscure the presence of less intense sounds. That is, if a less intense sound is added to a sufficiently intense sound, the less intense sound will not be heard -- a phenomenon termed 'masking'. In such cases, the addition of the less intense sound does not alter the peripheral auditory representation to a detectable extent. However, the perceptual interpretation can nevertheless be modulated by context, as explored in these next illusions.
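As a rough illustration of why the added sound changes so little, the sketch below computes how much the overall level rises when a weaker, uncorrelated sound is added to a more intense one. It assumes simple power summation and ignores frequency-specific effects of the auditory periphery; the function name is illustrative only.

```python
import math

def level_increase_db(level_difference_db):
    """Rise in overall level (dB) when an uncorrelated sound that is
    `level_difference_db` below a more intense sound is added to it.
    Assumes the powers of the two sounds simply add."""
    power_ratio = 10 ** (-level_difference_db / 10)  # weaker power / stronger power
    return 10 * math.log10(1 + power_ratio)

for diff in (0, 10, 20, 30):
    print(f"{diff:2d} dB below: mixture level rises by {level_increase_db(diff):.3f} dB")
```

Under this assumption, a sound 20 dB below the masker raises the overall level by only about 0.04 dB, whereas two equally intense sounds raise it by about 3 dB.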

Continuity of a tone through masking noise

A noise flanked by tones could equally well consist of two short tones adjacent to the noise, or a single longer tone overlapping the noise. Listeners will perceive the former scene if the noise is quiet, and they will hear the latter scene if the noise is sufficiently loud to have masked the tone. In the next illusion, the amplitude of the noise is decreased:
Tone continuity illusion - decreasing tone volume
Most listeners will hear a single longer tone on the first and second presentations. The model also strongly prefers this scene.
[Figure: four posterior samples of scenes for the tone continuity illusion. Each sample shows stream 0 and stream 1, alongside the mixture of the sampled sources.]
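The ambiguity described above can be sketched numerically by asking how much the level of the noise segment would change if the tone continued underneath it. This is a minimal sketch, not the stimulus code used for the demo: the sample rate, durations, tone frequency, and amplitudes are all assumed values.

```python
import math
import random

SR = 16000        # sample rate in Hz (assumed)
SEG_S = 0.3       # duration of the noise segment in seconds (assumed)
TONE_HZ = 1000    # tone frequency in Hz (assumed)
TONE_AMP = 0.1    # tone amplitude (assumed)

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def middle_segment_change_db(noise_amp, seed=0):
    """Level change (dB) of the noise segment if the tone continues through it."""
    rng = random.Random(seed)
    n = int(round(SR * SEG_S))
    noise = [noise_amp * rng.gauss(0.0, 1.0) for _ in range(n)]
    # Tone samples that would fall inside the noise segment if the tone were continuous.
    tone = [TONE_AMP * math.sin(2 * math.pi * TONE_HZ * (n + i) / SR) for i in range(n)]
    noise_plus_tone = [a + b for a, b in zip(noise, tone)]
    return 20 * math.log10(rms(noise_plus_tone) / rms(noise))

for noise_amp in (1.0, 0.3, 0.1, 0.03):
    print(f"noise amplitude {noise_amp}: change of {middle_segment_change_db(noise_amp):.2f} dB")
```

With these assumed values, a loud noise segment is nearly unchanged by a continuing tone, so both scenes are consistent with the sound. A quiet noise segment would be several dB more intense if the tone were present, which favors the scene with two short tones.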
Warren, R. M., Obusek, C. J., & Ackroff, J. M. (1972). Auditory Induction: Perceptual Synthesis of Absent Sounds. Science, 176, 1149-1151.

Homophonic continuity

A similar effect occurs even when the masker sound is produced simply by increasing the amplitude of the masked sound (hence, homophonic). In the illusion below, the amplitude of the initial quiet noise is increased suddenly (1 ms ramp):
Homophonic continuity illusion - sudden amplitude increase, gammatonegram
Listeners tend to hear a long quiet noise with a louder short noise imposed on top. In contrast, the following sound that ramps gradually in amplitude is usually heard as a single entity:
Homophonic continuity illusion - gradual amplitude increase, gammatonegram
The model exclusively hears two sources in the first sound and a single source in the second sound, although the frequency spectra of the inferred sounds do not exactly match the observation.
[Figure: observation and three posterior samples of scenes for each stimulus. For the sudden amplitude increase, each sample contains two streams (stream 0 and stream 1) and is shown with the mixture of the sampled sources. For the gradual amplitude increase, each sample contains a single stream (stream 0). Mixtures are shown only if nSources > 1.]
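The two homophonic stimuli differ only in how quickly the amplitude changes, which a short sketch can make concrete. Only the 1 ms ramp comes from the description above. The gradual ramp duration, the levels, the segment durations, and the return to the quiet level are assumptions made for illustration.

```python
import random

def envelope(ramp_s, sr=16000, quiet=0.1, loud=1.0, quiet_s=0.5, loud_s=0.2):
    """Amplitude envelope: quiet, linear ramp up, loud, linear ramp down, quiet."""
    n_quiet = round(quiet_s * sr)
    n_ramp = max(1, round(ramp_s * sr))
    n_loud = round(loud_s * sr)
    up = [quiet + (loud - quiet) * (i + 1) / n_ramp for i in range(n_ramp)]
    down = up[::-1]
    return [quiet] * n_quiet + up + [loud] * n_loud + down + [quiet] * n_quiet

def homophonic_noise(ramp_s, seed=0):
    """Gaussian noise shaped by the envelope above."""
    rng = random.Random(seed)
    return [e * rng.gauss(0.0, 1.0) for e in envelope(ramp_s)]

def max_step(env):
    """Largest sample-to-sample change in the envelope."""
    return max(abs(b - a) for a, b in zip(env, env[1:]))

sudden = envelope(0.001)   # 1 ms ramp, as in the first illusion
gradual = envelope(0.2)    # 200 ms ramp (assumed value for the gradual case)
print(max_step(sudden), max_step(gradual))
```

In this sketch the sudden envelope changes 200 times faster per sample than the gradual one, which is the only cue separating the two stimuli.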
Warren, Obusek & Ackroff (1972).

Spectral Completion

Note: listening with headphones is likely to work better than listening over computer speakers.

Analogous phenomena occur across the frequency spectrum, an effect dubbed "spectral completion". The basic paradigm that McDermott and Oxenham (2008) used to investigate spectral completion is shown below. Listeners heard a long masker noise, which overlapped with a brief target noise halfway through its duration. The spectrum of the target was ambiguous because the middle band of its spectrum could plausibly be masked by the masker. Listeners were asked to adjust the middle band of a comparison noise until it perceptually matched the target.
Spectral completion paradigm
To simulate this basic experiment here, we borrow the demonstration from the paper's associated site. The mixture stimulus is presented, followed by the comparison, ten times. In each iteration, the level of the middle frequency band of the comparison (the adjustable band) is increased. Listeners typically judge the comparison to be a good match to the target around the eighth repetition.
Basic spectral completion demo
In the first few iterations, the comparison stimulus sounds too dull or low, and in the very last iterations it sounds too tinny or thin.

Below are the spectral completion stimuli included in our Cognitive Science submission. Note that our figures reported the mean spectrum level across posterior samples. Any single posterior sample may not perceptually match the target, even if the expectation of the spectrum level matches the mean of human reports. The question of how posterior samples should relate to the percept is open and interesting.
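The distinction between the posterior mean and any single sample can be sketched with a few numbers. All values below are hypothetical and chosen only to illustrate the point. They are not taken from the model or from the human data.

```python
from statistics import mean

# Hypothetical middle-band spectrum levels (dB) of the target in five posterior samples.
sample_levels_db = [12.0, 31.0, 14.0, 29.0, 14.0]

# Hypothetical mean level that listeners set for the comparison's middle band.
human_match_db = 20.0

posterior_mean_db = mean(sample_levels_db)
closest_sample_gap_db = min(abs(level - human_match_db) for level in sample_levels_db)

print(f"posterior mean: {posterior_mean_db} dB")
print(f"closest single sample is {closest_sample_gap_db} dB from the human match")
```

In this made-up case the posterior mean equals the human match exactly, yet every individual sample is at least 6 dB away from it.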
Figure 5B
Each condition is shown as a schematic, the observation, and posterior samples of the target only.
Bi: 5 posterior samples
Bii: 4 posterior samples
Biii: 4 posterior samples
Biv: 5 posterior samples
Bv: 4 posterior samples
Bvi: 4 posterior samples

Figure 5C
Each condition is shown as a schematic, the observation, and posterior samples of the target.
Ci: 4 posterior samples
Cii: 4 posterior samples
Ciii: 3 posterior samples
Civ: 4 posterior samples
Cv: 5 posterior samples
Cvi: 4 posterior samples
McDermott, J. H., & Oxenham, A. J. (2008). Spectral completion of partially masked sounds. Proceedings of the National Academy of Sciences, 105(15), 5939-5944.