Intelligibility testing is often the only correct way to evaluate a speech system. Two factors have not been standardized: the talkers' voice quality and the listeners' ability to understand degraded speech. Because these factors differ significantly between test groups, the resulting intelligibility scores should not be treated as absolute values. I propose to coordinate the development of tape-recorded word lists to be used as a reference standard. A group of “representative” talkers, rather than a single talker, would read each word list to balance voice quality. A large, representative sample of the population would evaluate these tapes under various degraded speech conditions to obtain the “reference” scores. Details such as the word lists, method of presentation, types of degradation, and selection of the listener population would be addressed by a varied group of interested and qualified investigators.