Closure durations were measured for the eighteen possible voiced-voiceless and voiceless-voiced combinations of /p/, /t/, /k/ and /b/, /d/, /g/, produced medially in isolated nonsense disyllables by three English-speaking males. Preliminary results show that closure durations depend on both the voicing and the place of articulation of the first and second members of such clusters. Mean closure durations were significantly longer (by approximately 20 msec) for voiceless-voiced than for voiced-voiceless clusters. Clusters with first-member velars were longer than those with first-member labials and alveolars, while clusters with second-member velars were shorter than those with other second members. Sequences of stops requiring the greatest changes in point of articulation (labial → velar, or velar → labial) were longer than sequences requiring smaller changes in the same direction (labial → alveolar, alveolar → velar, or velar → alveolar, alveolar → labial). Closure durations generally varied inversely with the durations of the surrounding vowels. Clusters were longest, for example, in the frame /pI__It/, somewhat shorter in /pI__at/, shorter still in /pa__It/, and shortest in /pa__at/. But durations for like clusters were no shorter in the frame /pI__Id/ than in /pI__It/. These latter findings suggest, first, that closure durations for medial stop clusters are governed extrinsically by a tendency toward isochrony. Second, whatever principles lead toward isochrony appear to operate before the rule that lengthens final-syllable vowels before voiced stops in English.
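
The stimulus inventory described above can be enumerated mechanically. The following is only an illustrative sketch, not part of the reported procedure: the names `VOICELESS`, `VOICED`, `PLACE_INDEX`, `enumerate_clusters`, and `place_shift` are hypothetical, and the front-to-back numbering of places (labial 0, alveolar 1, velar 2) is an assumption adopted here to express the size and direction of the change in point of articulation.

```python
from itertools import product

# Stops used in the study, keyed to place of articulation.
VOICELESS = {"p": "labial", "t": "alveolar", "k": "velar"}
VOICED = {"b": "labial", "d": "alveolar", "g": "velar"}

# Assumed front-to-back ordering of places (illustrative only).
PLACE_INDEX = {"labial": 0, "alveolar": 1, "velar": 2}


def enumerate_clusters():
    """Return every two-stop cluster that mixes one voiceless and one voiced stop."""
    clusters = []
    for voiceless, voiced in product(VOICELESS, VOICED):
        clusters.append((voiceless, voiced, "voiceless-voiced"))
        clusters.append((voiced, voiceless, "voiced-voiceless"))
    return clusters


def place_shift(first, second):
    """Signed change in place from first to second stop.

    Positive values move backward in the mouth (toward velar), negative
    values move forward (toward labial), and 0 means the two stops share
    a place of articulation.
    """
    places = {**VOICELESS, **VOICED}
    return PLACE_INDEX[places[second]] - PLACE_INDEX[places[first]]


clusters = enumerate_clusters()

# Clusters with the largest change in place (labial <-> velar).
largest_shift = [(c1, c2) for c1, c2, _ in clusters if abs(place_shift(c1, c2)) == 2]
```

Under these assumptions, `clusters` holds the eighteen combinations, and `largest_shift` picks out the four labial-velar and velar-labial sequences that the abstract reports as longest within their direction of movement.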