Evaluation
of multimodal speech as a human-computer interface
Investigators:
Azra Ali, Michael Ingleby, Phil Marsden
DESCRIPTION:
The aim of the thesis is to develop models for evaluating speech communication, and in particular the McGurk effect, providing data for a better understanding
of the phenomenon. The research will focus on multimedia presentations in which
aligned auditory and visual channels can improve speech reliability but misalignment
can create curious perceptual effects (e.g. McGurk and MacDonald, 1976). The
investigation will examine cognitive models that provide an in-depth understanding
of audiovisual speech communication. It thus aims to provide insight into which
speech sounds are most vulnerable to misalignment of the audio and visual channels,
by studying the McGurk effect in syllables, isolated words, and parts of words
presented in a sentence context, in the hope of improving the reliability and
design of audiovisual speech as an interface in multimodal applications. Such
studies are scientifically important because the interface involves the overall
human cognitive and performance system.
The cognitive models of speech communication to be investigated are of current technological interest. Talking heads, real and virtual, are increasingly used as a key component of the human-computer interface, because this bimodal form of communication promises greater reliability and usability than a single mode. The experimental side of the investigation will develop further modalities beyond the bimodal to characterise human performance in human-computer interaction using a speech channel, ultimately bringing multimodal applications much closer to the human.