Summary of the Coordinated Project

Clinical decision support systems are conceived as the combination of individual patient data with clinical guidelines. The growing adoption of Electronic Health Records (EHR) by healthcare systems produces massive collections of healthcare data that practitioners, with their limited capacity to handle large amounts of information, are unable to process. This, together with the increase in machine processing capabilities, leads to a scenario where automatic analysis of Electronic Health Records becomes essential to ascertain patterns, prevent errors, improve quality, reduce costs and save time for health services. This proposal addresses two main challenges: the development of technologies to support clinical diagnosis and prevention, and the development of technologies to support the management of medical services. The proposed technology goes beyond detecting medical entities (drugs, symptoms, diseases, diagnostics, body parts, etc.) and moves towards extracting relevant patterns and relations among these entities (e.g. was a disease an adverse effect of a prescribed drug?), identifying and grouping patients with a particular condition, selecting the appropriate encoding for each medical entity and record, etc. Additionally, the extracted information will be structured in such a way that the evolution of patients can be mined for clinical condition similarities or phenotyping (patterns), leading to new medical knowledge and supporting a variety of purposes and data needs for research. In western societies, health systems are highly heterogeneous along several dimensions: different types of health centers, medical specialties, medical activities (e.g. primary care vs surgery), and health institutions using different protocols in multilingual countries such as Spain. We aim to show that Natural Language Processing can provide solutions that handle a variety of these scenarios.
With all this in mind, the project will focus on developing key tools that advance the technology behind healthcare decision-making support systems. For the first objective, developing text-based technology to support diagnosis and prevention, the project intends to carry out Medical Information Extraction (recognition of medical entities and detection of temporal and domain relationships), document classification based on diagnoses (e.g. ICD-10), and the construction of semantic networks from the extracted information, to be used for discovering prevention and diagnosis patterns. The second objective, as mentioned above, consists of developing text-based technology to support the management of healthcare institutions, including the classification of medical documents according to medical coding systems. The techniques we plan to explore and develop include anonymization, the detection of negated and speculative facts, the disambiguation of acronyms (which are extremely common in medical reports), the identification of temporal expressions that make it possible to build timelines of events related to a patient, and the analysis of emotional states as a source of additional information about the patient. We will also work on improving information representation models for the biomedical domain (for example, by combining embeddings, graphs, association rules, etc.), as well as on enriching medical ontologies with terminological variants of concepts and supporting approximate search over them, which will in turn benefit the remaining objectives.