about
DeepKnowledge will extend the state of the art in natural language processing (NLP) and multilingual knowledge enabling technologies in seven interrelated areas of high potential impact. The main research objective of DeepKnowledge is to advance the state of the art towards natural language understanding (NLU) by (i) generating and exploiting new language models for the official languages of Spain plus English, taking into account a multitask and multimodal objective during pre-training; (ii) exploring novel ways, such as prompting, of exploiting these language models to improve NLP results in zero-shot and few-shot settings (with little or no training data for the target language or task at hand); (iii) addressing language understanding tasks through text generation; (iv) leveraging pre-trained language models and knowledge bases; (v) developing new benchmarks and datasets for evaluating and assessing our progress towards NLU; (vi) applying the newly developed techniques to improve the state of the art in language understanding, especially in settings with little or no training data; and (vii) developing a number of advanced content-based domain applications for the main official languages of Spain (Spanish, Catalan, Basque and Galician) and English, in multiple sectors and domains (such as eLearning, eHealth and eHumanities).
The research groups involved in the project have a strong track record of publishing at national and international levels, and they will continue to disseminate results (both research-related and application-related) throughout the duration of the project. This will include publishing articles in top-ranking journals and conference proceedings, as well as presenting the project results at scientific events, fairs, hackathons, workshops and conferences.
DeepKnowledge will incorporate the latest insights in deep learning technology, such as large pre-trained language models, transfer learning, few-shot and zero-shot capabilities, multimodal and multi-task processing, and prompting. It will leverage deep learning techniques, large pre-trained language models, and carefully designed datasets and knowledge bases to advance the state of the art towards natural language understanding for English, Spanish, Catalan, Basque and Galician in several domains and digital sectors. DeepKnowledge has the potential to help de-fragment and impact NLP technology for these languages, domains and sectors, thereby providing easy access to such technology. For instance, DeepKnowledge will contribute to information extraction from, and enrichment of, Electronic Health Records. DeepKnowledge will also investigate new text generation approaches for applications such as argument generation, text simplification and abstractive summarization. Additionally, DeepKnowledge will apply the new language models in novel ways to tasks and applications such as disinformation detection, question answering and eLearning.
The impact of the project on the academic and industrial communities will be amplified by the resulting technology and linguistic resources: the knowledge bases produced will not only be very useful to researchers in Artificial Intelligence and NLP, but will also make it possible for industry to develop interfaces and information access applications that are currently infeasible. The new software produced will be distributed under open source licenses, enabling universal access to new cutting-edge NLP technology. The feasibility of this socio-economic impact is supported by the socio-economic and scientific impact that the linguistic tools and resources already contributed by the DeepKnowledge partners have had at both national and international levels. Examples of this impact include thousands of downloads of the Multilingual Central Repository (MCR), of linguistic processors such as IXA pipes, and of our language models uploaded to the Hugging Face repository.