Published on October 4, 2007
The Wheat and Chaff of Speech Recognition
  Jonathan Bloom, Ph.D.

Agenda
  When to use speech
  How to spec
  How to test

Definitions
  Dictation (PC mostly)
  Command and Control (PC, Phone, PDA, cell, cars)
  Multimodal (PC, cars)
    asynchronous
    synchronous
  TTS - “Text to Speech”

GUI: Warts and All

SUI: Warts and All

SUI: Warts in Places You Didn’t Check
  Taxes computer memory – requires tradeoffs
    Speaker dependence
    Vocabulary size
  Taxes human memory
    Remember command wording
    Remember how to speak

Speech for the Right Reasons
  Hands busy or disabled?
  Eyes busy?
  Repetitive task?
  Small form factor?
  Noisy environment?

Good examples
  CAD on desktop
    Hands available for mouse and keyboard tasks
  Dictation on desktop
    For RSI
    For loosely formatted text
  Navigation destination entry in car
  Auto attendant on phone

Bad Examples
  Email by phone
  Speech to replace long touch-tone menus
  Browse web by voice
  This could get me fired.

How to write a spec (speech only)
  Start with sample interactions
    not comprehensive
    represent main legs
    validate feel
    audio version as well

How to write a spec (speech only)
  ETC...

Directing Voice Talent
  Huge part of usability
  Common problems
    No pause before commands
    Too ‘friendly’ or too ‘cold’
    Not a conversation
  Engages caller and makes app more understandable

Usability Testing Speech Apps
  More is the same than different, except…
    No real-time discussion
    Leading wording becomes major concern
    Need to capture two audio channels (THAT-1)
      2 audio and 2 video for multimodal

Changing Field
  SpeakFreely®, SayAnything®, How May I Help You
  Synchronous multimodal
  NLU?

Thank you.