Method of Automatically Matching Procedure Definitions in Different Radiology Information Systems
20220238239 · 2022-07-28
CPC classification
G16H10/00 (PHYSICS)
G16H15/00 (PHYSICS)
G06F18/213 (PHYSICS)
Abstract
A computer-implemented method which, given a set of procedure definitions in a first radiology information system, generates the best match for a procedure definition defined in a second system on the basis of a multidimensional vector representation of the procedure definitions and a matching algorithm based on vector cosine similarity.
Claims
1. A computer-implemented method of matching a procedure definition formulated in a first Radiology Information System (client RIS) to a procedure definition in a catalog of procedure definitions defined in a second Radiology Information System (vendor RIS) by generating a set of procedure definitions defined in said second RIS as a set of multidimensional vectors, each dimension of such a vector representing a token in said procedure definition, a token corresponding with a word of a vocabulary of relevant words for said procedure definition, representing a procedure definition of said first RIS to be matched by a multidimensional vector, each dimension of said vector representing a token in said procedure definition, a token corresponding with a word of a vocabulary of relevant words for said procedure definitions, and applying a matching algorithm to the vectors so as to generate a matching result.
2. The method according to claim 1, wherein said matching algorithm is based on vector cosine similarity.
3. The method according to claim 2, wherein a weight is given to at least one of said tokens.
4. The method according to claim 3, wherein the weight is given to a token that represents one of a modality, laterality, contrast modifier, and/or number of views.
5. The method according to claim 1, wherein a weight is given to at least one of said tokens.
6. The method according to claim 5, wherein the weight is given to a token that represents one of a modality, laterality, contrast modifier, and/or number of views.
Description
DETAILED DESCRIPTION OF THE INVENTION
[0020] The invention provides a computer-implemented method which, given a set of (internal) procedure definitions in a first radiology information system, generates the best match for a procedure definition defined in a second system.
[0021] In essence, the method finds the documents in a catalog of a given radiology information system that are most similar to an input document generated in another radiology information system.
[0022] The high-level workflow of the algorithm is as follows:
[0023] Given a first procedure definition e.g. in a first radiology information system of a hospital or department, the algorithm returns the best matching procedure definition from a catalog of procedure definitions as defined in a second radiology information system.
[0024] The match is defined as a score from 0 to 1, with 1 being a perfect match.
[0025] The matching score is computed as the cosine of the angle between two vectors, one vector representing the first procedure definition, e.g. in a client system, and the other representing a procedure definition from a catalog of definitions generated in a second radiology information system.
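For reference, the standard cosine similarity between two such vectors can be written as below. The symbols a (the procedure definition to be matched), b (a catalog entry), and n (the number of dimensions) are notation introduced here for illustration only. With non-negative coefficients, the value lies between 0 and 1, which is consistent with the score range stated above.

```latex
\[
\operatorname{score}(\mathbf{a}, \mathbf{b})
  = \cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}
\]
```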
[0026] To compute the vector representation, first each procedure definition is converted to a set of tokens.
[0027] Preferably the following steps are implemented:
[0028] (i) Extract relevant fragments of text from various sources such as the name, code, modality, and body part of the procedure definition;
[0029] (ii) Convert to lower case;
[0030] (iii) Apply string substitutions to standardize the text, e.g., to map synonyms, fix typos, replace special characters, etc.;
[0031] (iv) Split the text into tokens based on a set of delimiters including <space> and a set of configurable characters, e.g. /, -, etc.;
[0032] (v) Stemming and lemmatization;
[0033] (vi) Clean and simplify tokens, e.g., by removing non-alphanumeric characters, removing vowels in large words, etc.; and/or
[0034] (vii) Remove duplicate tokens.
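The tokenization steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `tokenize`, the field names, the entries in `SUBSTITUTIONS`, and the delimiter set are all hypothetical. Stemming, lemmatization, and vowel removal (parts of steps v and vi) are omitted, and only ASCII letters and digits are kept.

```python
import re

# Hypothetical substitution table; real entries would be configured per site.
SUBSTITUTIONS = {"w/o": "without", "cat scan": "ct"}

# Assumed delimiter set: space plus the configurable characters "/" and "-".
DELIMITERS = r"[ /\-]+"


def tokenize(fields):
    """Turn the text fields of one procedure definition into unique tokens."""
    # (i) Gather the relevant text fragments (name, code, modality, ...).
    text = " ".join(value for value in fields.values() if value)
    # (ii) Convert to lower case.
    text = text.lower()
    # (iii) Apply string substitutions before splitting, so that entries
    # containing a delimiter (such as "w/o") are still recognized.
    for old, new in SUBSTITUTIONS.items():
        text = text.replace(old, new)
    # (iv) Split on the delimiter set.
    raw_tokens = re.split(DELIMITERS, text)
    # (vi) Clean tokens by dropping non-alphanumeric characters.
    cleaned = (re.sub(r"[^a-z0-9]", "", token) for token in raw_tokens)
    # (vii) Remove duplicates (and empty strings) while preserving order.
    seen = set()
    tokens = []
    for token in cleaned:
        if token and token not in seen:
            seen.add(token)
            tokens.append(token)
    return tokens


example_tokens = tokenize(
    {"name": "CT Head w/o Contrast", "modality": "CT", "body_part": "Head"}
)
```

Here `example_tokens` holds the tokens `ct`, `head`, `without`, and `contrast`; the repeated modality and body part collapse into the tokens already extracted from the name.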
[0035] Extraction of relevant fragments and splitting into tokens are mandatory steps; the other steps belong to preferred embodiments.
[0036] All tokens from all first procedure definitions are gathered into a vocabulary. This vocabulary represents a multi-dimensional space where each token represents one dimension. Thus by looking up the index in the vocabulary, a dimension can be assigned to each token.
[0037] According to this invention, at least one token is also assigned a weight. By default, every token has the same weight of 1. Certain tokens may receive a different value when they are recognized as a special concept, such as modality, laterality, contrast modifier or number of views. This allows the host to give more or less weight to specific concepts, e.g. making a modality much more important by increasing its weight, or reducing the relevance of the number of views. The weight of a token can also be modified depending on the source that it was extracted from, e.g. a modality extracted from the procedure definition name vs. the modality from its metadata.
[0038] In a specific embodiment, a weight is set to a value greater than 1 for a token that represents one of a modality, laterality, contrast modifier or number of views.
[0039] A weight may also be smaller than 1 for tokens that are less important in the matching process.
[0040] In a specific embodiment, weights can also be calculated by means of training data so that the algorithm does not need manually determined substitution values.
[0041] Given its set of tokens, a procedure definition can now be written as a vector where each token represents a dimension and the coefficient for that dimension is the token's weight. Note that due to the size of the vocabulary, these vectors are very sparse as most of the coefficients are 0.
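The vocabulary lookup and the weighted, sparse vector described above might be sketched as follows. The names `build_vocabulary` and `to_sparse_vector` and the weight table are hypothetical. Storing only non-zero coefficients in a dictionary is one possible way to exploit sparsity, and dropping tokens that are absent from the vocabulary is an assumption made here, not something the text prescribes.

```python
def build_vocabulary(token_lists):
    """Assign each distinct token a dimension index in order of first appearance."""
    vocabulary = {}
    for tokens in token_lists:
        for token in tokens:
            # setdefault leaves the index of an already known token unchanged.
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def to_sparse_vector(tokens, vocabulary, weights, default_weight=1.0):
    """Map tokens to {dimension index: weight}; zero coefficients are not stored.

    Assumption: tokens missing from the vocabulary are ignored.
    """
    return {
        vocabulary[token]: weights.get(token, default_weight)
        for token in tokens
        if token in vocabulary
    }


# Hypothetical weights: modality tokens count twice as much as other tokens.
weights = {"ct": 2.0, "mr": 2.0}
vocabulary = build_vocabulary([["ct", "brain"], ["mr", "head"]])
vector = to_sparse_vector(["ct", "head", "tilted"], vocabulary, weights)
```

With these inputs, `vector` has a coefficient of 2.0 in the `ct` dimension and 1.0 in the `head` dimension; every other coefficient is implicitly 0.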
EXAMPLE 1
[0042] Below is the vector representation for a catalog of two procedure definitions defined in a first radiology information system, i.e. CT brain and MR head, with tokens ‘ct’, ‘brain’, ‘mr’ and ‘head’, wherein the modality tokens (‘ct’ and ‘mr’) are considered twice as important as other tokens:
[0043] Vocabulary is ct, brain, mr, head
[0044] First (in a first system) procedure definition CT brain is represented by the vector (2,1,0,0)
[0045] First (in a first system) procedure definition MR head is represented by the vector (0,0,2,1)
[0046] Second (in a second system) procedure definition CT head tilted is represented by the vector (2,0,0,1)
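As a worked illustration, the cosine scores for the three vectors of Example 1 can be computed with the short sketch below. The function name `cosine` and the variable names are chosen here for illustration. Returning 0 for an all-zero vector is an assumption, since the text does not cover that case.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors given as sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Assumption: an empty vector matches nothing.
    return dot / (norm_a * norm_b)


# Vectors from Example 1, over the vocabulary (ct, brain, mr, head).
ct_brain = (2, 1, 0, 0)
mr_head = (0, 0, 2, 1)
ct_head_tilted = (2, 0, 0, 1)

# Dot product 4, both norms sqrt(5): score is approximately 4 / 5 = 0.8.
score_ct_brain = cosine(ct_head_tilted, ct_brain)
# Dot product 1, both norms sqrt(5): score is approximately 1 / 5 = 0.2.
score_mr_head = cosine(ct_head_tilted, mr_head)
```

Under these numbers, CT brain (about 0.8) would be returned as the best match for CT head tilted, ahead of MR head (about 0.2), because the doubly weighted modality token dominates the dot product.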
[0047] A matching algorithm is then applied to match a procedure definition in one radiology information system with a procedure definition out of the set of procedure definitions generated by the second system.
[0048] One example of such a matching algorithm is an algorithm based on vector cosine similarity.
[0049] The algorithm can be requested to return the top results for the best matches, not just the single best match. In case there are multiple results with the same score, it will return all results with the same score.
[0050] So, for example, suppose that for a catalog of 5 procedure definitions in one radiology information system and one procedure definition to be matched from a different radiology information system, the matching scores are 90%, 80%, 70%, 70% and 50%. When requesting the best result, the algorithm will return the internal procedure definition for which the matching score is 90%. When requesting the 2 best results, it will return 2 results, those with scores of 90% and 80%. When requesting the 3 best results, it will return 4 results, those with scores of 90%, 80%, 70% and 70%, because the third and fourth results have the same score.
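The tie-handling behaviour in this example can be sketched as follows. The function name `top_matches` and the procedure labels A to E are hypothetical. Ties are detected by exact equality of scores, which is an assumption; an implementation working with floating-point scores might prefer a small tolerance.

```python
def top_matches(scored, k):
    """Return the k best (name, score) pairs, plus any ties with the k-th score."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if k <= 0 or not ranked:
        return []
    if k >= len(ranked):
        return ranked
    # Everything scoring at least as high as the k-th result is kept,
    # so results tied with the k-th score are included as well.
    cutoff = ranked[k - 1][1]
    return [item for item in ranked if item[1] >= cutoff]


# Scores from the example above; the labels A to E are placeholders.
scores = [("A", 0.9), ("B", 0.8), ("C", 0.7), ("D", 0.7), ("E", 0.5)]

best = top_matches(scores, 1)
best_two = top_matches(scores, 2)
best_three = top_matches(scores, 3)
```

Requesting three results yields four entries here, because C and D share the third-best score of 0.7.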
[0051] Having described in detail preferred embodiments of the current invention, it will now be apparent to those skilled in the art that numerous modifications can be made therein without departing from the scope of the invention as defined in the appended claims.