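PROSA-MED: PROcesamiento Semántico textual Avanzado para la detección de diagnósticos, procedimientos, otros conceptos y sus relaciones en informes MEDicos (Advanced semantic text processing for the detection of diagnoses, procedures, other concepts and their relations in medical reports).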

The health sector plays a key role in our society, not only because of its impact on the welfare state but also because of its multidisciplinary reach. The number of medical documents generated by healthcare centers (hospitals and primary care) grows constantly; the development of automatic tools for textual analysis may therefore be a decisive advance for health systems.

Language technologies provide tools for textual analysis that can greatly help medical personnel and consequently increase their productivity. The consortium of research groups from different universities and health institutions presented in this project is convinced that a major step forward can be made in this field. Our objective is to propose solutions for the automatic processing of medical records, which currently entails a substantial cost in person-time and money. We present three main use cases: 1) automatic diagnosis of medical records, 2) detection of adverse drug effects and 3) detection of relationships among concepts that will allow the discovery of new medical knowledge. The types of relations identified in the 2nd use case will be relevant to improving the 1st use case, and will also be used in the 3rd use case to establish patterns regarding a patient's medical history.

This project will develop a set of processors for the automatic analysis of medical texts, designed for robustness, high precision and coverage. The project will provide medical personnel with a wide and flexible set of tools and linguistic, semantic and terminological resources that will be applied to different types of medical texts to support the following tasks:

- Morphological, syntactic and semantic analysis adapted to texts in the medical domain, improving the state of the art in this area, with special emphasis on entity recognition.

- Assignment of diagnostic codes to medical records following the ICD-10 classification.

- Detection of relationships between concepts, in order to advance the discovery of evidence not explicitly coded in texts.

The project will make use of both supervised and unsupervised techniques. The approach is suitable for different languages, including Spanish, an ambitious objective given its relevance in the health systems of multiple countries. Moreover, we will also tackle languages with different characteristics and levels of development in the medical domain: Catalan and Basque. The work developed in this project will benefit public and private companies, as it will produce software that will be available to SMEs and other companies interested in developing products for the health sector. The participating entities represent 3 different health systems (Madrid, Catalonia and the Basque Country), but the system will be easily portable to the entire Spanish health system. Given the experience of the research groups taking part in this project, we expect it to have an important scientific impact in the form of papers published in high-impact venues, such as journals and conferences, generating new knowledge that will advance several scientific areas.