SYSTEM AND A METHOD FOR ASSESSING PATIENT RISK USING OPEN DATA AND CLINICIAN INPUT

20170277855 · 2017-09-28

Assignee

FUJITSU LIMITED (Kawasaki-shi Kanagawa, JP)

Inventors

Cpc classification

International classification

Abstract

A system for assessing patient risk using open data and input of knowledge data, the system including a healthcare knowledge data input to receive open data and a knowledge input to accept input of knowledge data relating to risk; a healthcare risk engine to provide a healthcare risk knowledge graph from the open data and knowledge data by using input of risk-related terms to retrieve documents from the open data and by extracting the healthcare risk knowledge graph as entities from the documents corresponding to risk-related terms, as well as links between the entities. A patient risk graph prediction module predicts risks for a patient by combining information in a Patient Clinical Object (PCO) with entities in the healthcare risk knowledge graph to produce a patient risk graph.

Claims

1. A system for assessing patient risk using open data and input of knowledge data, the system comprising: at least one processor to cause the system to implement: a healthcare knowledge data input to receive open data and a knowledge input to accept input of knowledge data in form of risk-related terms relating to risk; a healthcare risk engine to provide a healthcare risk knowledge graph from the open data and knowledge data by using the risk-related terms to retrieve documents from the open data and by extracting the healthcare risk knowledge graph as entities from the documents corresponding to the risk-related terms and links between the entities; and a patient risk graph prediction module to predict risks for a patient by combining information in a Patient Clinical Object (PCO) with the entities in the healthcare risk knowledge graph to produce a patient risk graph.

2. A system according to claim 1, wherein the PCO is an aggregation of clinical entities that encapsulates items of information about the patient, and wherein the PCO contains historical clinical information as a graph centered on the patient, with information about the patient linked to the patient by categories including any one or combination of categories of diagnosis, symptom, treatment, hospital visit and prescription.

3. A system according to claim 1, wherein the risk-related terms include terms related to risks in form of potential diseases, terms related to risk factors that increase the likelihood of disease and terms related to treatments of a medical condition.

4. A system according to claim 1, wherein the healthcare risk engine comprises: a risk related terms collector to accept input of the risk-related terms; a medical entity reconciliator, to standardise and expand the risk-related terms to include synonyms and equivalent terms using a standardised vocabulary of terms; a topic detector and tagger, to retrieve a set of documents linked to the expanded terms from a medical document database; a named entity recognition, resolution and disambiguation (NERD) module to extract entities from the documents, each document with a score and each aligned to the standardised vocabulary; and a relation extractor to score relations between the entities based on co-occurrence of two entities in the retrieved documents; wherein the healthcare risks extraction system is arranged to generate a risk knowledge graph storing the entities and the scored relations of the entities.

5. A system according to claim 1, wherein the patient risk graph prediction module predicts risks for a patient by matching an item of information from the PCO with a corresponding entity in the healthcare risk knowledge graph, and extracting nodes around the corresponding entity to form the patient risk graph.

6. A system according to claim 1, wherein the patient risk graph prediction module is a meta-predictor which is arranged to use the PCO and healthcare risk knowledge graph in two or more of the following predictors: a diagnosis-based predictor to provide a patient risk subgraph based on previous diagnoses, a drug-based predictor to provide a patient risk subgraph based on previous drugs taken by the patient, a symptom-based predictor to provide a patient risk subgraph based on previous symptoms of the patient, and a treatment-based predictor to provide a patient risk subgraph based on treatments the patient is receiving; and wherein the patient risk graph prediction module includes a meta predictor to make predictions by organizing and processing the patient risk subgraphs produced by the predictors into a patient risk graph.

7. A system according to claim 6, wherein each patient risk subgraph constructed by a predictor includes any entities in the healthcare risk knowledge graph which match the item of information, one or more entities which neighbour the matching entities and the entities on a shortest path between the matching entities.

8. A system according to claim 7, wherein each predictor is given a weighting based on an accuracy of performance measure, and wherein the weighting is used to determine a maximum number of entities in the subgraph which is retained in the patient risk graph.

9. A system according to claim 1, wherein each entity in the healthcare risk knowledge graph and the patient risk graph includes a score to reflect how closely the entity corresponds to the risk-related term.

10. A system according to claim 9, wherein the meta predictor checks the entities included by the predictors and selects entities which are present in a highest number of predictors and/or have a highest cumulative score as the patient risk graph.

11. A system according to claim 1, further comprising a translation module to accept a risk-related term in one language and translate the risk-related term into an equivalent in a language of the standardised vocabulary.

12. A computer-implemented method for assessing patient risk using open data and input of knowledge data, the method comprising: by at least one processor to implement: receiving open data and knowledge data in form of risk-related terms relating to risk; providing a healthcare risk knowledge graph from the open data and knowledge data by using the risk-related terms to retrieve documents from the open data and by extracting the healthcare risk knowledge graph as entities from the documents corresponding to the risk-related terms and links between the entities; and predicting risks for a patient by combining information in a Patient Clinical Object (PCO) with the entities in the healthcare risk knowledge graph to produce a patient risk graph.

13. A non-transitory computer-readable storage medium storing a computer program which when executed on a computer carries out a method for assessing patient risk using open data and input of knowledge data, the method comprising: receiving open data and knowledge data in form of risk-related terms relating to risk; providing a healthcare risk knowledge graph from the open data and knowledge data by using the risk-related terms to retrieve documents from the open data and by extracting the healthcare risk knowledge graph as entities from the documents corresponding to the risk-related terms and links between the entities; and predicting risks for a patient by combining information in a Patient Clinical Object (PCO) with the entities in the healthcare risk knowledge graph to produce a patient risk graph.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0040] Preferred features of the present invention will now be described, purely by way of example, with reference to the accompanying drawings, in which:

[0041] FIG. 1 is a block diagram of components in an embodiment of the invention;

[0042] FIG. 2 is a flow chart of a method in a general embodiment;

[0043] FIG. 3 is a flowchart of a method in a healthcare risks engine;

[0044] FIG. 4 is a block diagram of a system for providing a health risk knowledge graph;

[0045] FIG. 5 is a block diagram illustrating production of a PCO;

[0046] FIG. 6 is a block diagram illustrating modules in a predictor;

[0047] FIG. 7 is an illustration of an excerpt from a health risk knowledge graph;

[0048] FIG. 8 is an illustration of a patient risk graph; and

[0049] FIG. 9 is a diagram of suitable hardware for implementation of invention embodiments.

DETAILED DESCRIPTION

[0050] Embodiments of the invention may aim:

[0051] to create a network of Health Risks, represented as a Knowledge Graph, extracted from the literature and public data sources together with the clinicians' expertise on risk assessment; and

[0052] to develop a mechanism that identifies the associated risks for a particular patient given his/her clinical history information, such as diagnoses, drugs and symptoms, and to enrich the patient's clinical history with his/her risk information.

[0053] Precision medicine is a medical model that proposes the customisation of healthcare, tailored to the individual patient/subject. This is an emerging approach for disease diagnosis, treatment and prevention that takes into account individual variability in genes, physiology, anatomy, environment and lifestyle. In this context, invention embodiments account for the individual variability of patients by including health risks, along with their associated treatments, diagnoses and drugs.

[0054] The following definitions are used in this document:

[0055] Health risk (or simply risk): a disease or condition precursor associated with a higher than average morbidity or mortality rate. Disease precursors include demographic variables, certain individual behaviours, familial and individual histories, and certain physiological changes.

[0056] Health risk factor: a condition, behaviour, or other factor that increases risk, e.g., depression is a risk factor in suicide.

[0057] Medical treatment: the management and care of a patient, including, for example in the mental health area, nursing, psychological intervention and specialist mental health rehabilitation. This term may also include “alternative” medical treatments and medication, which may be prescribed if so wished, for example homeopathic, hypnosis or acupuncture treatment.

[0058] Diagnosis: the process of determining by examination the nature and circumstance of a disease or condition from its signs and symptoms.

[0059] Drugs: medicaments that treat or prevent or alleviate the symptoms of a disease.

[0060] As far as the inventors are aware, there is no standard resource for dealing with health risks; there are only ad-hoc resources, such as plain lists or matrices, within medical institutions and for specific areas.

[0061] In summary:

[0062] there are no standards for representing health risks in the way that there are standards for diseases, e.g. ICD-9; there are only plain lists of risks, and they are specific to a particular medical institution or area;

[0063] there is a lack of methods and tools that exploit information about healthcare risks and their association with patient clinical data;

[0064] there are no approaches that estimate the potential risks of a patient after applying a particular treatment.

[0065] Invention embodiments create a network of medical risks for a given patient and can also evaluate the potential impact of a treatment for such a patient.

[0066] A detailed embodiment might consist of three main modules:

[0067] a module for the creation of a Health Risks Knowledge Base/Graph based on information extracted from the literature and public data sources together with the clinicians' expertise on risk assessment;

[0068] a module that predicts, for a given patient, the subgraph of health risks that affect that patient.

[0069] The system according to one invention embodiment includes a Health Risks Knowledge Graph builder module 10, which creates a knowledge graph 80 based on information obtained from the literature and available standards; and a patient risk graph prediction module 110, which predicts the risks for a given patient.

[0070] It is worth mentioning that the solution in this embodiment also relies on a “Patient Clinical Object” (PCO) 90, which is defined as a semantically rich aggregation of clinical entities that encapsulates information about a given patient. This PCO contains historical clinical information, such as information about the patient and his/her clinical data, diagnoses and drugs.

[0071] FIG. 1 shows the main components of the system 1 for health risk assessment. There is a healthcare knowledge data input to receive open data and an expert knowledge input to accept input from one or more clinicians relating to risk. The clinician knowledge is input by people directly, and can be grouped into risks, risk factors and treatments. The open data may be from healthcare data sources such as PUBMED and SNOMED, as explained in more detail below.

[0072] A healthcare risk engine is used to provide a healthcare risk knowledge graph from the open data and clinician input. This uses the human knowledge of one or more clinicians, who input risk-related terms to retrieve relevant documents from the open data. The healthcare risk knowledge graph is extracted as entities (standardised terms, but also including the clinicians' original terms) from the documents corresponding to the clinicians' terms, as well as the links between the entities.

[0073] A patient risk graph prediction module predicts risks for a specific patient by combining information in a Patient Clinical Object, PCO, with entities in the healthcare risk knowledge graph to produce a patient risk graph.

[0074] FIG. 2 illustrates a corresponding method, including receiving, S10, open data and clinician knowledge relating to risk; providing, S20, a healthcare risk knowledge graph from the open data and clinician input by using clinician input of risk-related terms to retrieve relevant documents from the open data and by extracting the healthcare risk knowledge graph as entities from the documents corresponding to the clinician's terms, as well as the links between the entities; and predicting, S30, risks for a specific patient by combining information in a Patient Clinical Object, PCO, with entities in the healthcare risk knowledge graph to produce a patient risk graph.

[0075] The specific modules of the system are described in more detail below.

[0076] Health Risk Engine 10

[0077] This module captures evidence from data derived from the literature and public data sources, such as PUBMED (PUBMED is a service of the US National Library of Medicine (NLM) and provides free access to the NLM database of nursing, veterinary, healthcare, medical and scientific articles) and SNOMED (Systematized Nomenclature of Medicine).

[0078] One underlying concept is that the data used covers a wide range of different risks and risk factors: invention embodiments are not limited to a certain area of medicine. For example SNOMED CT (clinical terms) is a standardised multilingual vocabulary which is generally applicable across medical and health care areas. PUBMED is also as wide-ranging as the US NLM and thus generally applicable.

[0079] The process carried out by the module is shown in FIG. 3. It starts with an initial seed of keywords provided by the clinicians. With this seed, data collection, cleaning and pre-processing are performed; this task consists of searching for risk-related information in available open datasets. Once the initial set of risk-related terms is available, the engine filters those terms according to a particular topic. The next task consists of performing text mining over the initial set of extracted terms to obtain the risks and risk factors along with the relationships among them. Finally, the system allows clinicians to carry out a manual curation task to find possible mistakes and inconsistencies in the graph.

[0080] FIG. 4 shows one example of a specific healthcare risk engine which accepts inputs from clinicians and is connected to open data in the form of a standardised vocabulary of terms and a library of documents from the healthcare domain, such as nursing, veterinary, healthcare, medical and scientific articles. Individual modules are explained further below.

[0081] A risk related terms collector 20 accepts input of seed terms from a clinician (or from a group of clinicians). These clinicians' terms include terms related to risks in the form of potential diseases or conditions, terms related to risk factors that increase the likelihood of disease, and terms related to treatments of a medical condition.

[0082] For data collection, cleaning and pre-processing, a medical entity reconciliator 30 can be used to standardise and expand the clinicians' terms to include synonyms and equivalent terms using a standardised vocabulary of terms. For example the SNOMED ontologies may be used, as explained in more detail later.
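As a rough illustration of the reconciliation step, the sketch below expands clinician seed terms using a tiny in-memory vocabulary. The vocabulary, its concept IDs (`C1`, `C2`) and the function name `reconcile` are hypothetical stand-ins for a standardised terminology such as SNOMED CT; no real terminology service or API is called. Terms with no match are kept as they are, since the clinicians' original terms are also retained as entities.

```python
# Hypothetical miniature vocabulary standing in for a standardised terminology
# such as SNOMED CT: concept ID -> preferred term and known synonyms.
VOCABULARY = {
    "C1": {"preferred": "depressive disorder", "synonyms": {"depression", "depressive illness"}},
    "C2": {"preferred": "anxiety", "synonyms": {"anxiety state", "anxiousness"}},
}


def reconcile(seed_terms):
    """Map clinician seed terms to concepts and expand each to all equivalent terms."""
    expanded = {}
    for term in seed_terms:
        key = term.strip().lower()
        for concept_id, entry in VOCABULARY.items():
            names = {entry["preferred"], *entry["synonyms"]}
            if key in names:
                expanded[concept_id] = sorted(names)
                break
        else:
            # No standard concept found: keep the clinician's original term unexpanded.
            expanded[key] = [key]
    return expanded


print(reconcile(["Depression", "insomnia"]))
# {'C1': ['depression', 'depressive disorder', 'depressive illness'], 'insomnia': ['insomnia']}
```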

[0083] A topic detector 40 is used to filter resources by retrieving a set of documents linked to the expanded terms from a searchable medical document database (such as PUBMED). Essentially, this component compares the documents' contents (for example their abstracts) with the standardised terms and selects the documents which include exactly those terms or close matches to those terms. It also tags the documents with their main topic(s).
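A minimal sketch of this filtering and tagging, assuming each document is represented by its abstract text. A substring test covers documents containing exactly the expanded terms, and `difflib.get_close_matches` over word windows of the same length stands in for close matches. The 0.85 cutoff, the function name `detect_and_tag`, and tagging each document with the terms it matched (as a proxy for its main topics) are all assumptions rather than the embodiment's stated method.

```python
import difflib


def detect_and_tag(documents, expanded_terms, cutoff=0.85):
    """Keep documents that mention an expanded term exactly or as a close match.

    documents: mapping of document ID -> abstract text.
    expanded_terms: lower-case terms produced by the reconciliation step.
    cutoff: assumed similarity threshold (0..1) for a close match.
    Returns document ID -> list of matched terms, used here as topic tags.
    """
    tagged = {}
    for doc_id, abstract in documents.items():
        text = abstract.lower()
        words = text.split()
        matched = []
        for term in expanded_terms:
            n = len(term.split())
            # Word windows with the same length as the term, for fuzzy comparison.
            windows = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
            if term in text or difflib.get_close_matches(term, windows, n=1, cutoff=cutoff):
                matched.append(term)
        if matched:
            tagged[doc_id] = matched
    return tagged


docs = {
    "d1": "Anxiety is common in patients with depression.",
    "d2": "Symptoms of depresion were recorded.",  # misspelling: a close match
    "d3": "Fracture healing in older adults.",
}
print(detect_and_tag(docs, ["depression", "anxiety"]))
# {'d1': ['depression', 'anxiety'], 'd2': ['depression']}
```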

[0084] A named entity recognition, resolution and disambiguation (NERD) module 50 extracts entities from the set of documents, each with a score and each aligned to the standardised vocabulary. That is, the entity may be taken from the SNOMED vocabulary, for example, but is matched to the document content.
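One simple way to picture recognition and resolution is dictionary (gazetteer) matching against surface forms from the standardised vocabulary, using longest-match-first as a crude disambiguation rule. The lexicon, its concept IDs and the function `recognise_entities` are hypothetical, and a real NERD module may use contextual disambiguation instead. Entity scoring is not shown here.

```python
import re

# Hypothetical lexicon: surface form -> concept ID in the standardised vocabulary.
LEXICON = {
    "panic disorder": "C3",
    "panic": "C4",
    "depressive disorder": "C1",
    "depression": "C1",
}


def recognise_entities(text):
    """Return the concept IDs mentioned in text, resolving overlaps longest-first."""
    remaining = text.lower()
    found = set()
    for surface in sorted(LEXICON, key=len, reverse=True):
        pattern = r"\b" + re.escape(surface) + r"\b"
        if re.search(pattern, remaining):
            found.add(LEXICON[surface])
            # Blank out the matched span so shorter forms (e.g. "panic") cannot
            # re-match inside a longer one (e.g. "panic disorder").
            remaining = re.sub(pattern, " ", remaining)
    return found


print(sorted(recognise_entities("History of panic disorder and depression.")))  # ['C1', 'C3']
```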

[0085] A relation extractor 60 scores relations between the entities based on the co-occurrence of two entities in documents in the retrieved set of documents. For example, this can use known co-occurrence metrics.
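The text leaves the co-occurrence metric open. The sketch below uses the Jaccard coefficient (documents containing both entities divided by documents containing either) purely as one familiar example; pointwise mutual information or raw co-occurrence counts would be equally plausible choices. The input is assumed to be the per-document entity sets produced by the NERD step, and `score_relations` is an illustrative name.

```python
from itertools import combinations


def score_relations(doc_entities):
    """Score entity pairs by the Jaccard coefficient of the documents they appear in.

    doc_entities: mapping of document ID -> set of entities found in that document.
    Returns {(a, b): score} with a < b, for pairs co-occurring in at least one document.
    """
    postings = {}  # entity -> set of document IDs mentioning it
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            postings.setdefault(entity, set()).add(doc_id)
    scores = {}
    for a, b in combinations(sorted(postings), 2):
        both = postings[a] & postings[b]
        if both:
            scores[(a, b)] = len(both) / len(postings[a] | postings[b])
    return scores


docs = {
    "d1": {"anxiety", "depression"},
    "d2": {"depression", "sertraline"},
    "d3": {"anxiety", "depression"},
}
print(score_relations(docs))  # anxiety/depression scores 2/3, depression/sertraline 1/3
```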

[0086] The healthcare risks extraction system is arranged to generate a risk knowledge graph 80 storing the entities and their scored relations. The graph is generated by the parts explained above. The graph can then be displayed to the user (who might for instance be another clinician). For example, the user might enter a term, such as a risk, risk factor or treatment, and receive a subgraph of the linked terms and the strength of each link, based on the knowledge implicitly stored in the PUBMED library. However, in invention embodiments the healthcare risk knowledge graph is combined with a PCO to provide individual risk information.
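To illustrate storing scored relations and answering the kind of query described in this paragraph (enter a term, receive its linked terms and link strengths), the sketch keeps the graph as a plain in-memory adjacency map rather than a graph database. The function names and the `min_score` filter are assumptions. The 0.7 Anxiety/Depression link follows the FIG. 7 example; the Sertraline score is made up for illustration.

```python
from collections import defaultdict


def build_risk_graph(relation_scores):
    """Store scored relations as an undirected adjacency map: entity -> {neighbour: score}."""
    graph = defaultdict(dict)
    for (a, b), score in relation_scores.items():
        graph[a][b] = score
        graph[b][a] = score
    return graph


def linked_terms(graph, term, min_score=0.0):
    """Return (neighbour, link strength) pairs for term, strongest link first."""
    neighbours = graph.get(term, {})
    pairs = [(n, s) for n, s in neighbours.items() if s >= min_score]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# 0.7 follows the FIG. 7 example; 0.4 is a made-up value for illustration.
graph = build_risk_graph({("Anxiety", "Depression"): 0.7, ("Depression", "Sertraline"): 0.4})
print(linked_terms(graph, "Depression"))  # [('Anxiety', 0.7), ('Sertraline', 0.4)]
```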

[0087] PCO 90

[0088] An example of a PCO in construction is shown in FIG. 5. The PCO is, for example, provided from historical clinical data as a graph centred on the patient, with information about the patient linked to the patient by categories, such as any of diagnosis, symptom, treatment, hospital visit and prescription. The historical clinical data may be provided, for instance, from hospital records or health authority records. A PCO enricher may be provided in the data and knowledge acquisition system to compare the PCO with a biomedical knowledge graph, to equate PCO parts with standard vocabulary and to annotate entities in the PCO with corresponding concepts/information from the biomedical knowledge graph.
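A PCO of this shape can be sketched as a star graph: one patient node, with each history item attached by an edge labelled with its category. The class below is a hypothetical minimal representation without enrichment or vocabulary alignment. It uses the category names listed above, and the example items are the heart palpitations and panic mentioned for FIG. 8.

```python
from dataclasses import dataclass, field

CATEGORIES = ("diagnosis", "symptom", "treatment", "hospital visit", "prescription")


@dataclass
class PatientClinicalObject:
    """Hypothetical minimal PCO: the patient is the centre node and each
    (category, item) pair is a labelled edge to an item of clinical history."""

    patient_id: str
    edges: list = field(default_factory=list)

    def add(self, category, item):
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.edges.append((category, item))

    def items(self, category=None):
        """Return history items, optionally restricted to one category."""
        return [item for cat, item in self.edges if category is None or cat == category]


pco = PatientClinicalObject("patient-1")
pco.add("symptom", "heart palpitations")
pco.add("symptom", "panic")
print(pco.items("symptom"))  # ['heart palpitations', 'panic']
```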

[0089] Patient Risk Graph Prediction 110

[0090] The patient risk prediction module can be a meta-predictor, also known as hybrid/combined predictor, that makes predictions by organizing and processing the predictions produced by several predictors. The individual predictors may take the information for the relevant features from the Patient Clinical Object, and the Healthcare Risks Knowledge graph.

[0091] The individual predictors are:

[0092] Predictor based on previous diagnoses. In this case the prediction is made by checking and reviewing the previous diagnoses of the patient.

[0093] Predictor based on the drugs the patient was taking.

[0094] Predictor based on the symptoms of the patient.

[0095] Predictor based on the treatments the patient is receiving.

[0096] FIG. 6 depicts each one of the predictors and their relation with the Patient Clinical Object, Healthcare Risks Knowledge graph and the output, which is the patient risk sub graph prediction. This prediction indicates which risks are valid for a particular patient. Nodes of the subgraph can include risks, risk factors and treatments.

[0097] Each predictor uses items of information (terms) and potentially surrounding terms (this is related to the context) found in the PCO and searches for matching entities (also terms, which form nodes in the graph) in the healthcare risks knowledge graph. It then expands the subgraph around the corresponding terms to include further entities. For example, if there are several corresponding terms, the subgraph may be expanded to include all the nodes in the shortest path between the corresponding terms and the 1, 2 or 3 adjacent terms to each corresponding term.
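The expansion rule can be sketched with breadth-first search on an unweighted adjacency map. The sketch keeps the matched entities, everything within `hops` links of them (standing in for the 1, 2 or 3 adjacent terms), and every node on a shortest path between each pair of matches. Exact string matching between PCO items and graph nodes is an assumption made to keep the sketch short, and the function names and toy graph are illustrative.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, start, goal):
    """Breadth-first search on an unweighted adjacency map; returns a node list or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def neighbourhood(graph, node, hops):
    """All nodes within `hops` links of node, including node itself."""
    seen = {node}
    frontier = {node}
    for _ in range(hops):
        frontier = {n for f in frontier for n in graph.get(f, ()) if n not in seen}
        seen |= frontier
    return seen


def expand_subgraph(graph, pco_terms, hops=1):
    """Node set of one predictor's patient risk subgraph."""
    matched = [t for t in pco_terms if t in graph]
    nodes = set()
    for term in matched:
        nodes |= neighbourhood(graph, term, hops)
    for a, b in combinations(matched, 2):
        path = shortest_path(graph, a, b)
        if path:
            nodes.update(path)
    return nodes


# Toy graph; the edges are illustrative only.
graph = {
    "panic": {"anxiety"},
    "heart palpitations": {"anxiety"},
    "anxiety": {"panic", "heart palpitations", "depression"},
    "depression": {"anxiety", "sertraline"},
    "sertraline": {"depression"},
}
print(sorted(expand_subgraph(graph, ["panic", "heart palpitations"], hops=1)))
# ['anxiety', 'heart palpitations', 'panic']
```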

[0098] Once the outputs of the individual predictors are available, a meta predictor component combines them in order to offer better predictive performance and to determine which terms (nodes) to retain in the patient subgraph. To weight this combination, the component assigns a weight to each of the predictors, for example using the following equation:

$$R_j = W_d P_d + W_{dr} P_{dr} + W_s P_s + W_t P_t$$

[0099] where:

[0100] $R_j$ is the predicted patient risk (sub)graph;

[0101] $W_d$ is the weight assigned to the predictor based on previous diagnoses;

[0102] $P_d$ is the prediction based on previous diagnoses;

[0103] $W_{dr}$ is the weight assigned to the predictor based on drugs the patient was taking;

[0104] $P_{dr}$ is the prediction based on drugs the patient was taking;

[0105] $W_s$ is the weight assigned to the predictor based on symptoms of the patient;

[0106] $P_s$ is the prediction based on symptoms of the patient;

[0107] $W_t$ is the weight assigned to the predictor based on treatments of the patient;

[0108] $P_t$ is the prediction based on treatments of the patient.

[0109] The component takes a sample from the population of patients and creates a training dataset. The goal of the component is to build an algorithm that automatically applies the predictors and makes a best guess at, or estimate of, the patient risk subgraph.

[0110] A detailed example of meta-prediction follows, using the equation as set out previously.

[0111] Each predictor outputs a set of risks, risk factors and treatments (entities), each ranked based on the score of that entity in the healthcare risk knowledge graph.

[0112] The weight for each predictor represents how accurate its risk assessment is. Each weight then determines the number of entities considered for that predictor. The meta predictor outputs the entities that are repeated across the individual predictors, i.e. the intersection of their outputs.

[0113] For example, and using numbers for the entities, the predictor based on previous diagnoses may have the following output:

TABLE-US-00001

Entity (D)    Score
300.00        0.7
290.0         0.5
300.01        0.5
290.01        0.4

[0114] A weight of 2 means that only the first two risks (300.00 and 290.0) are considered for that predictor.

[0115] Let us suppose we have the following example:

$$R_j = W_d P_d + W_{dr} P_{dr} + W_s P_s + W_t P_t$$

[0116] And replacing the results of the predictors:

[0117] Next, the meta predictor checks which entities are present in all the individual predictors and selects those with a high score (above a threshold), in terms of the largest cumulative score and/or the largest number of times they appear. These nodes are included in the patient risk graph.
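The worked example can be reproduced in a short sketch. Each weight is read as the number of top-ranked entities kept from that predictor, and the sketch then retains the entities proposed by enough predictors. The diagnosis output is taken from the table above. The drug predictor's output, the `min_votes` and `min_cumulative` thresholds, and the tie-breaking order are assumptions for illustration only. Setting `min_votes` to the number of predictors corresponds to keeping only entities present in all of them.

```python
from collections import Counter, defaultdict


def meta_predict(predictions, weights, min_votes=2, min_cumulative=0.0):
    """Combine ranked predictor outputs into the node list of the patient risk graph.

    predictions: predictor name -> list of (entity, score) pairs.
    weights: predictor name -> integer weight, read as "keep the top-k entities".
    min_votes, min_cumulative: assumed thresholds on how many predictors proposed
        an entity and on its summed score.
    """
    votes = Counter()
    cumulative = defaultdict(float)
    for name, ranked in predictions.items():
        # Highest scores first; Python's sort is stable, so ties keep their listed order.
        top_k = sorted(ranked, key=lambda pair: pair[1], reverse=True)[: weights.get(name, 0)]
        for entity, score in top_k:
            votes[entity] += 1
            cumulative[entity] += score
    selected = [e for e in votes if votes[e] >= min_votes and cumulative[e] >= min_cumulative]
    # Rank survivors by number of predictors, then by cumulative score.
    return sorted(selected, key=lambda e: (votes[e], cumulative[e]), reverse=True)


predictions = {
    "diagnoses": [("300.00", 0.7), ("290.0", 0.5), ("300.01", 0.5), ("290.01", 0.4)],
    # Hypothetical output of the drug-based predictor, for illustration only.
    "drugs": [("300.00", 0.6), ("300.01", 0.5)],
}
print(meta_predict(predictions, {"diagnoses": 2, "drugs": 2}))  # ['300.00']
```

With a weight of 2, entity 300.01 is cut from the diagnosis predictor's list, so only 300.00 is proposed by both predictors; raising that weight to 3 would also retain 300.01.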

[0118] To calculate the weights, the meta predictor is trained in advance on a predefined set of training examples, which then enable it to reach an accurate diagnosis when given new patient data.
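The text does not say how accuracy is turned into a weight. One possibility, consistent with reading the weight as a top-k cut-off, is to measure each predictor's precision on the training patients and scale it to an integer between 1 and `max_k`. Both the precision measure and the scaling rule are assumptions, and `fit_weights` is a hypothetical name.

```python
def fit_weights(predictor_outputs, true_risks, max_k=5):
    """Turn each predictor's training precision into an integer top-k weight.

    predictor_outputs: predictor name -> list (one entry per training patient)
        of ranked entity lists.
    true_risks: list (one entry per training patient) of sets of confirmed risks.
    Accuracy is taken as precision of the top max_k entities (an assumption) and
    scaled to a weight between 1 and max_k (also an assumption).
    """
    weights = {}
    for name, per_patient in predictor_outputs.items():
        hits = total = 0
        for ranked, truth in zip(per_patient, true_risks):
            top = ranked[:max_k]
            hits += sum(1 for entity in top if entity in truth)
            total += len(top)
        precision = hits / total if total else 0.0
        weights[name] = max(1, round(precision * max_k))
    return weights


# Toy training data with placeholder entity names.
outputs = {"diagnoses": [["a", "b"], ["c", "d"]], "drugs": [["x", "y"], ["z", "w"]]}
truth = [{"a", "b"}, {"c"}]
print(fit_weights(outputs, truth, max_k=4))  # {'diagnoses': 3, 'drugs': 1}
```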

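[0119] The subgraph can be used in many flexible ways and contains significantly more information than a simple list of risks, because it also retains the links between risks, risk factors and treatments and the strength of each link.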

[0120] FIG. 7 shows an excerpt of a risk knowledge graph showing how biomedical related information is encoded in a graph. The graph database contains information about diagnoses (or risks), drugs, treatments, and symptoms, and the relations between them. For example the weights on the links between the terms show how strongly they are related using the co-occurrence metrics mentioned previously.

[0121] In this particular example, the graph identifies the relation between Anxiety and Depression as comorbidity with a score of 0.7, and the relation between Depression and Sertraline as treatment, because sertraline is in some cases prescribed for depression.

[0122] Each entity has a score (e.g. Anxiety: 0.9) showing its similarity to the retrieved set of documents taken as a whole. A score of 1 indicates an identical term in all the relevant documents.
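One reading of this score is the average, over the retrieved documents, of how closely the term matches its best-matching phrase in each document. Under that reading, the score is 1 exactly when the identical term appears in every document. The similarity function (`difflib.SequenceMatcher.ratio`) and the word-window matching are assumptions, not the embodiment's stated measure.

```python
import re
from difflib import SequenceMatcher


def entity_score(term, documents):
    """Mean over documents of the best similarity between term and any word window
    of the same length; equals 1.0 only if the identical term occurs in every document."""
    term_words = re.findall(r"\w+", term.lower())
    if not term_words or not documents:
        return 0.0
    target = " ".join(term_words)
    n = len(term_words)
    total = 0.0
    for text in documents:
        words = re.findall(r"\w+", text.lower())
        windows = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
        total += max((SequenceMatcher(None, target, w).ratio() for w in windows), default=0.0)
    return total / len(documents)


print(entity_score("Anxiety", ["Anxiety and depression.", "Managing anxiety at work."]))  # 1.0
```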

[0123] The labels are available due to previous annotation of SNOMED by the clinician with the risks, risk factors and treatments, using the terms collector or another module. For example, a link between two risks is labelled with “co-morbidity”, a link between a risk and a risk factor is labelled with “risk factor” and a link between a treatment and a risk or risk factor is labelled “treatment”.
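The labelling rules above depend only on the types of the two linked entities, so they can be written as a small lookup. The entity types assigned to the FIG. 7 terms (treating Anxiety and Depression as risks) and the function name `link_label` are illustrative assumptions. Combinations that the text does not cover return `None`.

```python
# Illustrative entity types for the FIG. 7 excerpt.
ENTITY_TYPES = {"Anxiety": "risk", "Depression": "risk", "Sertraline": "treatment"}


def link_label(a, b, types=ENTITY_TYPES):
    """Label a link from the types of its two end entities."""
    kinds = {types[a], types[b]}
    if kinds == {"risk"}:
        return "co-morbidity"
    if kinds == {"risk", "risk factor"}:
        return "risk factor"
    if kinds in ({"treatment", "risk"}, {"treatment", "risk factor"}):
        return "treatment"
    return None  # combination not covered by the labelling rules


print(link_label("Anxiety", "Depression"))  # co-morbidity
print(link_label("Depression", "Sertraline"))  # treatment
```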

[0124] FIG. 8 shows a patient risk graph, which has combined information about heart palpitations and panic in the PCO with the part of the risk knowledge graph shown in FIG. 7.

[0125] Here, the PCO information has been combined with the general derived risk information to give a picture of risks for the individual, which can be, for example, displayed to the user with a GUI.

[0126] FIG. 9 is a block diagram of a computing device, such as a data storage server, which embodies the present invention, and which may be used to implement a method of an embodiment. The computing device comprises a computer processing unit (CPU) 993, memory, such as Random Access Memory (RAM) 995, and storage, such as a hard disk, 996. Optionally, the computing device also includes a network interface 999 for communication with other such computing devices of embodiments. For example, an embodiment may be composed of a network of such computing devices. Optionally, the computing device also includes Read Only Memory 994, one or more input mechanisms such as keyboard and mouse 998, and a display unit such as one or more monitors 997. The components are connectable to one another via a bus 992.

[0127] The CPU 993 is configured to control the computing device and execute processing operations. The RAM 995 stores data being read and written by the CPU 993. The storage unit 996 may be, for example, a non-volatile storage unit, and is configured to store data.

[0128] The display unit 997 displays a representation of data stored by the computing device and displays a cursor and dialog boxes and screens enabling interaction between a user and the programs and data stored on the computing device. The input mechanisms 998 enable a user to input data and instructions to the computing device.

[0129] The network interface (network I/F) 999 is connected to a network, such as the Internet, and is connectable to other such computing devices via the network. The network I/F 999 controls data input/output from/to other apparatus via the network.

[0130] Other peripheral devices such as a microphone, speakers, printer, power supply unit, fan, case, scanner, trackerball, etc. may be included in the computing device.

[0131] Methods embodying the present invention may be carried out on a computing device such as that illustrated in FIG. 9. Such a computing device need not have every component illustrated in FIG. 9, and may be composed of a subset of those components. A method embodying the present invention may be carried out by a single computing device in communication with one or more data storage servers via a network. The computing device may be a data storage server itself, storing at least a portion of the data graph. A method embodying the present invention may be carried out by a plurality of computing devices operating in cooperation with one another. One or more of the plurality of computing devices may be a data storage server storing at least a portion of the data graph.

[0132] Although a few embodiments have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.

Cpc classification

G16Z99/00

G16H70/00

G16H50/30

International classification

G06F19/00