Evaluation
of multimodal speech as a human-computer interface
Investigators:
Azra Ali, Michael Ingleby, Phil Marsden
DESCRIPTION:
The aim of the thesis is to develop models for evaluating speech communication, and in particular the McGurk effect, providing data for a better understanding
of the phenomenon. The research will focus on multimedia presentations in which
aligned auditory and visual channels can improve speech reliability but misalignment
can create curious perceptual effects (e.g. McGurk and MacDonald, 1976). The
investigation will examine cognitive models that provide an in-depth understanding
of audiovisual speech communication. It thus aims to provide insight into which
speech sounds are most vulnerable to misalignment of the audio and visual channels,
by studying the McGurk effect in syllables, isolated words, and parts of words
presented in a sentence context, in the hope of improving the reliability and
design of audiovisual speech as an interface in multimodal applications. Such
studies are scientifically important because the interface involves the overall
human cognitive and performance system.
The cognitive models of speech communication to be investigated are of current technological interest. Talking heads, real and virtual, are increasingly used as a key component of the human-computer interface, because this bimodal form of communication promises greater reliability and usability than a single mode. The experimental side of the investigation will develop further modalities beyond the bimodal to characterise human performance in human-computer interaction using a speech channel, ultimately bringing multimodal applications much closer to the human.