With smart voice assistants like Alexa, Siri, or Google Assistant, it is more convenient than ever to quickly check the weather forecast, choose music to play, use mobility services, or even place online orders. Why shouldn't such a popular and innovative way of interaction also make citizens' communication with public administrations more attractive, more convenient, and more accessible?
A tempting idea – but one with considerable challenges in implementation: interaction with public authorities by its very nature often involves sensitive personal data. For this reason, relying on the sophisticated voice assistance services of U.S. providers is not an option. In the interest of data sovereignty, the project "Speech Assistance for Citizen Services – S4CS" therefore combines existing Fraunhofer research results and powerful open source software components, based on a customized hybrid AI approach, to create a voice assistance solution tailored to the needs of German public administrations. This approach allows existing online services to be easily extended with conversational interfaces, enabling a chat interface (and integration with existing chatbots) as well as spoken-language interfaces. In particular, it provides dedicated support for application processes and for the task central to them, filling in online forms.
An important element of the design is a minimally invasive integration approach in which the basic implementation of the existing online service remains functionally unchanged. The existing core functionality and logic of the online service are thus neither affected nor restricted by the integration. Only limited extensions in a narrow part of the existing online service's code base are necessary to voice-enable the process. Customized interaction details for the new interaction channel "speech" (such as system utterances and which user utterances to react to) are specified along the existing workflow of the traditional screen interaction.
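The minimally invasive idea can be illustrated with a small Python sketch. It is not the S4CS implementation: the names `FormField`, `SpeechAnnotation`, `FORM`, `SPEECH`, and `prompts_in_workflow_order` are hypothetical and chosen only for illustration. The sketch assumes that an existing service already defines its form fields and validation, and that speech-specific details are attached next to those definitions without modifying them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FormField:
    """A form field as an existing online service might define it (hypothetical)."""
    name: str
    label: str
    validate: Callable[[str], bool]


@dataclass
class SpeechAnnotation:
    """Speech-channel details attached alongside an existing field (hypothetical)."""
    prompt: str    # system utterance that asks for the value
    reprompt: str  # system utterance used when the answer was not usable


# Existing form definition: left unchanged by the speech integration.
FORM = [
    FormField("last_name", "Last name", lambda v: v.strip() != ""),
    FormField("birth_year", "Year of birth", lambda v: v.isdigit() and len(v) == 4),
]

# The only addition: speech details keyed by the existing field names.
SPEECH = {
    "last_name": SpeechAnnotation(
        "What is your last name?",
        "Sorry, I did not catch your last name.",
    ),
    "birth_year": SpeechAnnotation(
        "In which year were you born?",
        "Please say the year as four digits.",
    ),
}


def prompts_in_workflow_order(form, speech):
    """Return the spoken prompts in the order of the existing screen workflow."""
    return [speech[f.name].prompt for f in form if f.name in speech]
```

In this sketch the order of the spoken prompts is derived from the existing form, and the validation rules of the screen interaction are reused as they are, so the speech channel adds data but no new core logic.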
The core challenges of the project arise from the application domain: public administration is subject to rigid requirements regarding data privacy and data sovereignty. These requirements not only preclude building on cloud-based voice assistant offerings. They also result in a very specific and limited data situation; in particular, hardly any real-life data may be collected from the ongoing operation of online services. This makes it very difficult to employ modern, statistical AI methods for the core logic. For these reasons, a hybrid AI approach adapted to this specific situation was developed and implemented, combining deep learning models in the language interfaces with rule-based dialogue control at the core of the system.
This hybrid approach allows existing, language-specific general corpora to be used for training the deep learning models applied in the language interfaces, with only selective domain-specific extensions.
The rule-based dialogue control at the core of the system, on the other hand, requires no training data, is fully subject to human control and monitoring, and offers unrestricted traceability. For this core logic (which is responsible for all essential system decisions), the specific risks often discussed in the context of AI applications (including those arising from training data biases) therefore do not apply.
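The division of labour described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: `interpret` is a hypothetical stand-in for the trained language interface (here a simple regular-expression stub rather than a deep learning model), and `DialogueController` with its rule labels `R1` to `R3` is invented for this example. The point is that every system decision is taken by an explicit rule and recorded in a trace.

```python
import re
from dataclasses import dataclass, field


def interpret(utterance: str) -> dict:
    """Stand-in for the statistical language interface (hypothetical stub).

    A real system would use a trained model here; this stub only
    recognises a cancel request or a four-digit year.
    """
    text = utterance.lower()
    if re.search(r"\b(stop|cancel)\b", text):
        return {"intent": "cancel"}
    match = re.search(r"\b(\d{4})\b", text)
    if match:
        return {"intent": "provide_year", "value": match.group(1)}
    return {"intent": "unknown"}


@dataclass
class DialogueController:
    """Rule-based core: no training data, every decision is logged."""
    slots: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)

    def step(self, utterance: str) -> str:
        result = interpret(utterance)
        intent = result["intent"]
        # Explicit, human-readable rules decide what the system does.
        if intent == "cancel":
            rule, reply = "R1-cancel", "The application has been cancelled."
        elif intent == "provide_year":
            self.slots["birth_year"] = result["value"]
            rule, reply = "R2-fill-year", f"I noted {result['value']} as the year of birth."
        else:
            rule, reply = "R3-fallback", "Sorry, I did not understand. In which year were you born?"
        # The trace records which rule fired for which input.
        self.trace.append((utterance, intent, rule))
        return reply
```

Because the rules are written out explicitly, the `trace` list shows for each user utterance which intent was recognised and which rule produced the system's reaction, which is one simple way to obtain the traceability described above.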
Three substantial follow-up projects have since emerged from this project. A contract research project for the Free and Hanseatic City of Hamburg, in which the online service "Kinderleicht zum Kindergeld" is extended with speech assistance functionality, is currently entering the pilot phase. In addition, the BMG-funded project HYKIST (Hybrid AI Language Technology in Emergency Medicine) and the European Union's Horizon 2020 project ACROSS (Towards user journeys for the delivery of cross-border services ensuring data sovereignty) were launched in 2020 and 2021, respectively.
The S4CS project and its follow-up projects have proven the viability of the chosen approach. Current research and development work focuses primarily on two aspects: first, support for multilingual operation, and second, an approach for leveraging modern, statistical NLP methods, in particular large, pre-trained deep learning models, for dialogue control, while continuing to comply with the special requirements, restrictions and data situation imposed by the application domain.