Assistance System for Intelligent Buildings Controlled by Voice and Natural Speech
Coordinated by: Horia Cucu
The main goal of this project is to create a Natural-language, Voice-controlled Assistive System for Intelligent Buildings. The prototype resulting from this project will be the proof-of-concept starting point for the implementation of voice-enabled assistive systems in homes, schools, hospitals, etc.
As opposed to the standard smart room interface based on buttons or software on static and mobile devices, the bidirectional voice-based interaction between a user and the smart room system to be developed in this project brings a significant improvement in quality of life, in many respects. First, it is much more comfortable and natural for users to speak with and hear the system, as speech is the most natural means of human communication. Hardware-based interaction through static devices (e.g. wall switches) and mobile devices (e.g. specialized software on smartphones and tablets) will not be replaced, but will become the secondary means of control, needed to predefine complex scenarios. For elderly and disabled people, a voice interface may be the only way to interact with such a system, making it invaluable compared to traditional interaction methods. Furthermore, in sanitary environments such as hospitals, a voice interface brings significant health advantages, as users do not have to physically touch devices. Finally, a voice synthesis system is very important in case of emergencies. A simple alarm does not inform the users of the cause of the alarm (fire somewhere in the vicinity, water flooding due to a broken pipe, a gas leak, etc.) or of the required precautions, whereas a voice synthesis system could provide short, concise information on what happened, together with precise safety instructions, which may make a big difference in disaster scenarios.
To deploy a highly scalable voice-enabled smart room, three main technical and scientific directions must be pursued: multilingual automatic speech recognition (ASR) in a smart room scenario, multilingual text-to-speech (TTS) synthesis, and scalable, flexible hardware-software system integration. These three challenges will be distributed among and approached by the three partners of the consortium: (1) University Politehnica of Bucharest, through the Speech and Dialogue research laboratory, with extensive expertise in spoken language technology, (2) the Research Institute for Artificial Intelligence “Mihai Draganescu”, with extensive expertise in TTS and natural language processing, and (3) iWave Solutions, a Romanian company that has been deploying IT&C hardware and software solutions since 2002.
From the ASR point of view, several scientific bottlenecks were identified and will be addressed in this project: (1) robustness against noise, (2) distant speech recognition and (3) the accuracy of keyword spotting. An additional challenge for voice-controlled smart rooms is (4) language dependence: today's speech recognition systems support only one language, and moving to another language requires at least new acoustic, phonetic and language resources, which are expensive to obtain. Hence, such a solution does not scale to other languages.
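To make the keyword (command) spotting bottleneck concrete, the sketch below shows one minimal way such a component could sit on top of ASR output: scanning a recognized word sequence for predefined multi-word commands and accepting a match only when every word clears a confidence threshold. The command set, the confidence representation and the threshold are illustrative assumptions, not part of the project's actual design.

```python
# Illustrative keyword (command) spotting over ASR output.
# The command inventory and the 0.8 threshold are assumptions for the
# example only; a real system would use the project's own command set
# and confidence measures.

COMMANDS = {"lights on", "lights off", "open blinds"}

def spot_commands(words, confidences, threshold=0.8):
    """Scan a recognized word sequence for known multi-word commands.

    words:       list of recognized words (lowercase)
    confidences: per-word ASR confidence scores in [0, 1]
    Returns the commands in which every word meets the threshold.
    """
    found = []
    max_len = max(len(c.split()) for c in COMMANDS)
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            span = words[i:i + n]
            if len(span) < n:          # ran past the end of the utterance
                break
            phrase = " ".join(span)
            if phrase in COMMANDS and min(confidences[i:i + n]) >= threshold:
                found.append(phrase)
    return found

# Example: a noisy transcript in which one command is confidently recognized
words = "please turn the lights on now".split()
conf = [0.9, 0.7, 0.6, 0.95, 0.92, 0.88]
print(spot_commands(words, conf))  # ['lights on']
```

A noise-robust system would of course work on acoustic features rather than a final transcript; the point here is only the shape of the decision: keywords are accepted or rejected per occurrence, so spotting accuracy trades off missed commands against false triggers.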
Multilingual speech synthesis presents scientific bottlenecks of its own. The following were identified and will be addressed in this project: (1) the synthesized voices must be natural, intelligible and pleasant (prosody performance will be addressed), (2) the system should be easily adaptable to other languages (multilingualism will be addressed), and (3) speech resources for low-resourced languages (such as Romanian) have to be developed.
The envisaged end products of this project are (1) a voice-controlled smart room prototype, (2) a functional model of multilingual ASR for spoken term (command) detection, and (3) a functional model of multilingual TTS.