Speech assistance for citizen services

Digital voice assistants make everyday life easier and continue to grow in popularity. A voice interaction capability can also enable more modern and convenient communication between citizens and public authorities (specifically, their online services), while also contributing to improved accessibility. In this application domain, however, special requirements apply with respect to data protection and data sovereignty, which rule out the use of the relevant cloud services offered by the large, data-driven internet companies. Fraunhofer FOKUS and Fraunhofer IDMT therefore developed a custom, hybrid AI approach for an independently deployable voice assistance solution. With this approach, citizen services can easily be extended with a voice assistant, e.g., to enable filling out and submitting applications in a completely voice-based manner – beneficial for everyone, but especially for people with motor or visual impairments.

With smart voice assistants like Alexa, Siri, or Google Assistant, it is more convenient than ever to quickly check the weather forecast, play music, use mobility services, or even place online orders. Why shouldn't such a popular and innovative mode of interaction also make citizens' communication with public administrations more attractive, more convenient, and more accessible?
 

Speech assistants with data sovereignty

A tempting idea – but one with considerable implementation challenges: by its very nature, interaction with public authorities often involves sensitive personal data. For this reason, relying on the sophisticated voice assistance services of U.S. providers is not an option. In the interest of data sovereignty, the project "Speech Assistance for Citizen Services – S4CS" therefore combines existing Fraunhofer research results and powerful open-source software components, based on a customized hybrid AI approach, into a voice assistance solution tailored to the needs of German public administrations. This approach allows existing online services to be easily extended with conversational interfaces, enabling a chat interface (and integration with existing chatbots) as well as spoken-language interfaces. In particular, application processes and the central task of filling out online forms are supported in a dedicated manner.

An important element of the design is a minimally invasive integration approach, by which the basic implementation of the existing online service can remain functionally unchanged. The existing core functionality and logic of the online service are thus neither affected nor restricted by the integration. Only limited extensions in a narrow part of the code base of the existing online service are necessary to voice-enable a process. Customized interaction details for the new interaction channel “speech” (such as system utterances and the user utterances to react to) are specified along the existing workflow of the traditional screen interaction.

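As a purely illustrative sketch of what such a specification could look like – the type and field names below are hypothetical, not the actual S4CS API – the speech-channel interaction details might be declared alongside the existing form workflow roughly as follows:

```go
package main

import "fmt"

// SpeechStep attaches speech-channel interaction details to one field of an
// existing online form. Hypothetical type for illustration only: the existing
// form logic is referenced by field ID, not re-implemented.
type SpeechStep struct {
	FieldID string   // identifier of the form field in the existing online service
	Prompt  string   // system utterance that asks for this field
	Expects []string // example user utterances the dialogue should react to
}

// A declarative overlay that follows the existing application workflow.
var formOverlay = []SpeechStep{
	{
		FieldID: "applicant_name",
		Prompt:  "Please tell me your full name.",
		Expects: []string{"my name is ...", "I am ..."},
	},
	{
		FieldID: "number_of_children",
		Prompt:  "For how many children are you applying?",
		Expects: []string{"for two children", "just one"},
	},
}

func main() {
	for _, step := range formOverlay {
		fmt.Printf("field %q -> prompt %q\n", step.FieldID, step.Prompt)
	}
}
```

Because such an overlay only references existing field identifiers, the underlying service can remain functionally unchanged – the essence of the minimally invasive approach described above.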
 

Using hybrid AI to incorporate existing knowledge

The core challenges of the project arise from the application domain: public administration imposes strict requirements regarding data privacy and data sovereignty. These not only preclude building on the cloud-based voice assistant offerings; they also result in a very specific and limited data situation – in particular, hardly any real-life data may be collected from the ongoing operation of online services. This makes it very difficult to employ modern, statistical AI methods for the core logic. For these reasons, a hybrid AI approach adapted to this specific situation was developed and implemented:

  • An automatic speech recognition (ASR) subsystem developed by Fraunhofer IDMT uses modern deep learning algorithms and domain-adapted language models, combined with innovative noise reduction methods adapted to the project's requirements.
  • The dialogue control subsystem implemented by Fraunhofer FOKUS uses the Constraint Handling Rules (CHR) formalism. GoCHR, an efficient open-source implementation of this theoretically well-founded and field-tested rule-based AI approach, is a research result of Fraunhofer FOKUS and was adapted and extended to meet the project's requirements (a toy illustration of this rule style follows the list).
  • The speech synthesis (text-to-speech, TTS) component again builds on state-of-the-art deep learning methods. It was developed by Fraunhofer FOKUS on the basis of open-source tools, with project-specific adaptations and optimizations as well as specially optimized training data.

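To give a flavor of this rule-based style of dialogue control, the following minimal sketch mimics two CHR-like rules for form filling in plain Go (an illustration under assumed names only; GoCHR's actual rule syntax and API differ):

```go
package main

import "fmt"

// A toy store of dialogue facts for form filling. This mimics the flavor of
// CHR rules such as "required(F) \ unfilled(F) <=> ask(F)" in plain Go; it is
// an illustration only and does not use GoCHR's actual API.
type store struct {
	required []string          // fields the application form demands
	filled   map[string]string // fields already provided by the user
}

// nextAction applies two hand-written rules in a fixed order:
//   rule "ask":     a required but still unfilled field triggers ask(field)
//   rule "confirm": once every required field is filled, move to confirmation
func nextAction(s store) string {
	for _, f := range s.required {
		if _, ok := s.filled[f]; !ok {
			return "ask(" + f + ")" // rule "ask" fires
		}
	}
	return "confirm()" // rule "confirm" fires
}

func main() {
	s := store{
		required: []string{"name", "date_of_birth"},
		filled:   map[string]string{"name": "Erika Mustermann"},
	}
	fmt.Println(nextAction(s)) // -> ask(date_of_birth)

	s.filled["date_of_birth"] = "1990-01-01"
	fmt.Println(nextAction(s)) // -> confirm()
}
```

Each system decision then corresponds to the firing of one explicitly named rule over the current set of facts, which is what makes the behavior of the core logic inspectable and traceable, as discussed below.
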
This hybrid solution approach allows existing, language-specific general corpora to be used for training the deep learning models applied in the speech interfaces, with only selective domain-specific extensions.
 

AI is fully subject to human control and monitoring

The rule-based dialogue control at the core of the system, on the other hand, requires no training data, is fully subject to human control and monitoring, and offers unrestricted traceability. For this core logic, which is responsible for all essential system decisions, the specific risks often discussed in the context of AI applications (including those arising from biases in training data) therefore do not apply.

 

Follow-up projects

In the meantime, three substantial follow-up projects have emerged from this project: a contract research project for the Free and Hanseatic City of Hamburg, in which the online service "Kinderleicht zum Kindergeld" is being extended with speech assistance functionality, is currently entering the pilot phase. In addition, the project HYKIST (Hybrid AI Language Technology in Emergency Medicine), funded by the German Federal Ministry of Health (BMG), and the EU Horizon 2020 project ACROSS (Towards user journeys for the delivery of cross-border services ensuring data sovereignty) were launched in 2020 and 2021, respectively.

The S4CS project as well as its follow-up projects have proven the viability of the chosen approach. Current research and development work focuses primarily on two aspects: first, support for multilingual operation, and second, an approach for leveraging modern, statistical NLP methods – in particular large, pre-trained deep learning models – for dialogue control, while maintaining compliance with the special requirements, restrictions, and data situation imposed by the application domain.

Project info

This project is funded by the Fraunhofer CCIT Technology Hub Machine Learning.