Perception models based on different kinds of acoustic data were compared with respect to their capacity to predict perceptual confusions between the Swedish stops [b, d, ɖ, g] in systematically varied vowel contexts. Fragments of VC:V utterances read by a male speaker were presented to listeners. The resulting confusions were especially numerous for short stimulus segments following the stop release, and they formed a regular pattern that depended mainly on the acute/grave dimension of the following vowel. The acoustic distances were calculated from: (1) filter band spectra; (2) F2 and F3 at the CV boundary and in the middle of the following vowel; (3) the duration of the burst (transient plus noise section). Both the spectrum‐based and the formant‐based models provided measures of acoustic distance (dissimilarity) that revealed regular patterns. However, the predictive capacity of both models improved when the time‐varying properties of the stimuli were included in the distance measures. The highest correlation between predicted and observed percent confusions, r = 0.85, was obtained with the formant‐based model in combination with burst length data. The asymmetries in the listeners’ confusions were also shown to be predictable, given acoustic data on the following vowel.
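As a rough illustration of how a formant-based distance with burst-length data can be turned into confusion predictions and scored against listener data, the following is a minimal sketch under stated assumptions. The abstract does not specify the distance metric, the weighting of burst duration, or the mapping from distance to predicted confusion; the Euclidean distance over F2 and F3, the additive burst term, the exponential distance-to-confusion mapping, and all names (`formant_distance`, `combined_distance`, `predicted_confusion`, the dictionary keys) are hypothetical choices made for illustration only.

```python
import math

# Hypothetical stimulus descriptor keys (assumptions, not the study's notation):
#   F2_cv, F3_cv   : formant frequencies (Hz) at the CV boundary
#   F2_mid, F3_mid : formant frequencies (Hz) in the middle of the following vowel
#   burst_ms       : burst duration (transient plus noise section), in ms


def formant_distance(a, b, use_time_course=True):
    """Euclidean distance over F2 and F3 (assumed metric).

    With use_time_course=True, vowel-midpoint values are included as a crude
    way of capturing the time-varying properties of the stimuli.
    """
    keys = ["F2_cv", "F3_cv"]
    if use_time_course:
        keys += ["F2_mid", "F3_mid"]
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def combined_distance(a, b, burst_weight=1.0):
    """Formant distance plus a weighted burst-duration difference (hypothetical weighting)."""
    return formant_distance(a, b) + burst_weight * abs(a["burst_ms"] - b["burst_ms"])


def predicted_confusion(distance, scale=500.0):
    """Map a distance to a predicted percent confusion.

    Assumed monotonically decreasing mapping: identical stimuli give 100 %,
    and larger acoustic distances give fewer predicted confusions.
    """
    return 100.0 * math.exp(-distance / scale)


def pearson_r(xs, ys):
    """Pearson correlation between predicted and observed percent confusions."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

In use, one would compute `combined_distance` for every stimulus/response pair, convert each value with `predicted_confusion`, and pass the predictions together with the observed confusion percentages to `pearson_r`. Because the mapping is decreasing, small acoustic distances yield high predicted confusion, so a good model gives a positive correlation. The `use_time_course` flag allows a direct comparison of static (CV-boundary only) and time-varying distance measures.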