METHOD AND SYSTEM FOR RETRIEVAL OF FINDINGS FROM REPORT DOCUMENTS

20170228455 · 2017-08-10

    Abstract

    A system and method for fast and accurate retrieval of findings from a large collection of report documents (the corpus), such as medical record documents. The system maintains a dynamic list of the characteristics of no-finding statements, called no-finding descriptors, each identified by a tag. When a new document enters the corpus, its sentences are searched, and each sentence whose content is similar to one of the descriptors is tagged. To conduct a search, the user enters a word or phrase that expresses the subject of the search. The corpus is searched for this subject, producing a list of all sentences that contain it: the initial result list. The initial result list includes both finding and no-finding results. The final result list is obtained by removing from the initial result list all occurrences of the tagged no-finding sentences.

    Claims

    1. A method for performing search to retrieve findings from report documents stored in a corpus, the method is comprised of the following steps: a. receiving a query regarding the search subject from the user; b. preparing relevant No-Finding Tag List from the tag descriptors in Tag Definition Table; c. tagging all documents in the corpus with the relevant No-Finding Tags, generating Removed Result List; d. preparing a Result List containing all documents related to the search subject; e. removing from the Result List all documents in the Removed Result List; and f. presenting the Result List and the Removed Result List to the user.

    2. The method according to claim 1 wherein the corpus contains medical reports.

    3. The method according to claim 1 wherein the relevant no-finding tags are derived by comparing the match level between the query phrases to the tag descriptors.

    4. The method according to claim 1 wherein the Removed Result List is derived by comparing the match level between the Tag Descriptors and the sentences in the documents.

    5. The method according to claim 1 wherein the user can modify the Tag Definition Table.

    6. A method for performing search to retrieve findings from report documents stored in a corpus, the method is comprised of the following steps: a. tagging all documents in the corpus according to the definition in the No-Finding Tag-Definition Table; b. receiving a query regarding the search subject from the user; c. searching all documents in the corpus for finding matching; d. repeating, for each document in the corpus: i. searching the document for matching with the received query; ii. if matching was found, then if the matched sentence is tagged, add the document to the Removed Result List, else add the document to the Result List; and e. presenting the Result List and the Removed Result List to the user.

    7. The method according to claim 6 wherein the corpus contains medical reports.

    8. The method according to claim 6 wherein the relevant no-finding tags are derived by comparing the match level between the query phrases to the tag descriptors.

    9. The method according to claim 6 wherein the user can modify the Tag Definition Table.

    10. A system comprising one or more computers configured to perform operations for retrieving findings from report documents stored in a corpus, operations comprising: a. receiving a query regarding the search subject from the user; b. preparing relevant No-Finding Tag List from the tag descriptors in Tag Definition Table; c. tagging all documents in the corpus with the relevant No-Finding Tags, generating Removed Result List; d. preparing a Result List containing all documents related to the search subject; e. removing from the Result List all documents in the Removed Result List; and f. presenting the Result List and the Removed Result List to the user.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0018] FIG. 1 presents a flow chart of the search preparation process.

    [0019] FIG. 2 presents a flow chart of an embodiment of the processing of the query.

    [0020] FIG. 2A presents a flow chart of another embodiment of the processing of the query.

    [0021] FIG. 3 presents a flow chart of the search refinement phase.

    DETAILED DESCRIPTION

    [0022] The invention will be described more fully hereinafter, with reference to the accompanying drawings, in which a preferred embodiment of the invention is shown. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiment set forth herein; rather this embodiment is provided so that the disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

    [0023] Before describing the processing that each word goes through, it is important to explain the corpus of the system. The corpus is a database that stores information on every document, sentence, and word that has ever entered the system; these documents constitute the search domain. Among the information on each sentence and word, the corpus keeps a list of all words and their locations within each document, as well as the number of the sentence in which each word is located; these are referred to as the search indexes. It also contains a phonetic representation of each word, as well as statistical information on the word. In addition, it contains a dictionary of semantic synonyms of each word, including cross-language synonyms. As a result, a search phrase entered by the user is transformed into multiple phrases that express the original query but are not identical to it.
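
    The search indexes and the synonym-based query expansion described above can be illustrated with a minimal sketch. It is not taken from the application: the function names, the naive sentence splitter, and the one-entry synonym dictionary are assumptions made for illustration, and the phonetic and statistical data kept by the corpus are omitted.

```python
import re
from collections import defaultdict


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lower-case alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def build_index(documents):
    """Map each word to a list of (doc_id, sentence_no, word_pos) postings."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for sentence_no, sentence in enumerate(split_sentences(text)):
            for word_pos, word in enumerate(tokenize(sentence)):
                index[word].append((doc_id, sentence_no, word_pos))
    return index


def expand_query(phrase, synonyms):
    """Return the phrase plus variants with one word swapped for a synonym."""
    words = tokenize(phrase)
    variants = {" ".join(words)}
    for i, word in enumerate(words):
        for alt in synonyms.get(word, ()):
            variants.add(" ".join(words[:i] + [alt] + words[i + 1:]))
    return sorted(variants)


# Hypothetical sample data, not from the application.
documents = {"12345": "Lungs free of active disease. Mild pulmonary edema is seen."}
synonyms = {"pulmonary": ["lung"]}

index = build_index(documents)
variants = expand_query("pulmonary edema", synonyms)
print(index["edema"])  # postings for the word "edema"
print(variants)        # the original phrase and its synonym variants
```

    Sentence numbers and word positions are zero-based in this sketch; the application does not state which numbering the system uses.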

    [0024] No-finding queries work the same way, but the tag is of type “no finding”, which the system uses as a signal to filter the result out when the user requests only positive findings.

    [0025] The system maintains a table of No-Finding Tag definitions. An example of such a table is shown in Table 1. Each line in the table defines a TAG and its descriptors. A descriptor is a phrase, and a TAG may have more than one descriptor. When a sentence in a document contains information similar to that of a TAG's descriptor, the TAG is added to the No-Finding Tagged Document-Sentence Table, an example of which is shown in Table 2.

    TABLE 1: No-Finding Tag Definition Table

    NO-FINDING TAG            NO-FINDING TAG'S DESCRIPTOR
    Lung <NF>                 Lungs free of active disease
                              no hyper metabolic foci suggestive of metastatic lesions in lungs
    Spleen <NF>               No significant abnormality of the spleen
                              There is uniform activity in the spleen
    [?] adrenals <NF>         No findings suggestive of metastatic disease activity in adrenals
                              Low temperature
    General no finding <NF>   There are no additional significant findings
                              normal

    [?] indicates data missing or illegible when filed

    TABLE 2: No-Finding Tagged Document-Sentence Table

    DOCUMENT ID.   TAG            SENTENCE ID.   MATCHING SCORE
    12345          spleen <NF>    5              0.8
                   lung <NF>      18             0.92

    [0026] The flow chart of one embodiment of the preparation process is shown in FIG. 1. Each new medical document that is part of the search domain goes through the preparation process. After the new document is read in step 110, it goes through the loop of steps 112 to 120, in which each sentence of the document is compared to a TAG's descriptor; when a match is found, the TAG is added to the No-Finding Tagged Document-Sentence Table. In step 112, a new descriptor is fetched from the No-Finding Tag Definition Table 150. If all descriptors have been processed, as tested in step 114, the processing of the document terminates. Otherwise, the newly fetched descriptor is compared to the sentences in the document and a matching score is calculated (step 116). If the matching score, as tested in step 118, is greater than a predefined threshold, then step 120 is executed, in which the Tag of the matching descriptor is added to the No-Finding Tagged Document-Sentence Table 160.
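
    The preparation loop of FIG. 1 can be sketched as follows. The application does not specify how the matching score is computed, so this sketch assumes a simple word-overlap measure (the fraction of descriptor words that appear in the sentence) and a threshold of 0.8; the function names and sample sentences are likewise hypothetical.

```python
import re

# Hypothetical excerpt of the No-Finding Tag Definition Table (150).
TAG_DEFINITIONS = {
    "Lung <NF>": ["Lungs free of active disease"],
    "Spleen <NF>": ["No significant abnormality of the spleen"],
}


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matching_score(descriptor, sentence):
    """Fraction of descriptor words found in the sentence (assumed metric)."""
    descriptor_words = tokenize(descriptor)
    if not descriptor_words:
        return 0.0
    return len(descriptor_words & tokenize(sentence)) / len(descriptor_words)


def tag_document(doc_id, sentences, tag_definitions, threshold=0.8):
    """Return rows for the No-Finding Tagged Document-Sentence Table (160)."""
    rows = []
    for tag, descriptors in tag_definitions.items():   # steps 112-114: next descriptor
        for descriptor in descriptors:
            for sentence_id, sentence in enumerate(sentences):
                score = matching_score(descriptor, sentence)  # step 116
                if score > threshold:                         # step 118
                    # step 120: record the tag for this sentence
                    rows.append((doc_id, tag, sentence_id, round(score, 2)))
    return rows


sentences = [
    "The lungs are free of active disease.",
    "There is a 2 cm nodule in the spleen.",
]
rows = tag_document("12345", sentences, TAG_DEFINITIONS)
print(rows)
```

    With this overlap measure the first sentence is tagged and the second is not, because the second shares only two common words with the spleen descriptor. A production system would likely need a measure that also uses the synonym dictionary; that is outside this sketch.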

    [0027] One embodiment of the processing of a query is shown in FIG. 2. The user enters a search request, which is a word or phrase expressing a finding. For example, the user wants to find all cases with pulmonary edema. Using ontology 250, a comprehensive set of search expressions is generated. A document from the corpus is retrieved in step 214 and is searched in step 216 for phrases matching the set of search expressions. A score is assigned to every match. Only matches whose score is higher than a predetermined threshold are considered phrase detections. If the desired phrase was detected, as tested in step 218, the system checks in step 220 whether the retrieved sentence in the document was tagged with a “no finding” tag. If the sentence was not tagged with a “no finding” tag, it is assumed that it does not express a no-finding, and the document is added, in step 222, to the Result List 270. If the retrieved sentence was tagged with a “no finding” tag, then the sentence is added, in step 224, to the Removed Result List 280.

    [0028] If no sentence in the document both contains the search phrase and is not tagged with a “no finding” tag, then the document is removed from the list.
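
    A minimal sketch of the FIG. 2 flow is given below. It simplifies the scored phrase matching to case-insensitive substring containment, which is an assumption; the data layout (a dictionary of sentence lists and a set of tagged sentence keys) and all names are hypothetical.

```python
def process_query(search_expressions, corpus, tagged_sentences):
    """Split matches into a Result List (270) and a Removed Result List (280).

    corpus: {doc_id: [sentence, ...]}
    tagged_sentences: set of (doc_id, sentence_id) carrying a "no finding" tag
    """
    result_list = []
    removed_result_list = []
    expressions = [expr.lower() for expr in search_expressions]
    for doc_id, sentences in corpus.items():                 # step 214
        has_finding = False
        for sentence_id, sentence in enumerate(sentences):   # step 216
            text = sentence.lower()
            # Step 218, simplified: substring containment stands in for
            # a thresholded match score (assumption).
            if not any(expr in text for expr in expressions):
                continue
            if (doc_id, sentence_id) in tagged_sentences:    # step 220
                removed_result_list.append((doc_id, sentence_id))  # step 224
            else:
                has_finding = True
        # A document is kept only if some matching sentence is untagged.
        if has_finding:
            result_list.append(doc_id)                       # step 222
    return result_list, removed_result_list


# Hypothetical sample data.
corpus = {
    "A": ["Mild pulmonary edema is present.", "Spleen is normal."],
    "B": ["No evidence of pulmonary edema."],
}
tagged = {("B", 0)}
result, removed = process_query(["pulmonary edema", "lung edema"], corpus, tagged)
print(result, removed)
```

    Document B matches the query only through a tagged sentence, so it appears in the Removed Result List and not in the Result List.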

    [0029] Another implementation of the processing of the query is shown in FIG. 2A. The user enters the search request in order to find documents that contain a finding. For example, the user wants to find cases with findings in the lungs, so he enters the word “lung”. The system receives the user request and, using ontology 250, prepares the Search List (step 230). Then, in step 232, the system extracts the relevant No-Finding Tags from the No-Finding Tag Definition Table 150 and saves them in a list 250 called the Relevant No-Finding Tag List. The system proceeds to step 234, in which the corpus is searched for all sentences that contain the search phrases, and in step 236 the initial Result List 270 is generated.

    [0030] Each result in the Result List goes through a loop consisting of steps 238 to 248. In step 238, a result is fetched from the Result List. In step 240, the Tags of the sentences from the Result List 270 are compared to the Tags in the Relevant No-Finding Tag List 250. If a match is found (step 242), then step 244 is executed, in which the sentence with the matched Tag is removed from the Result List 270, followed by step 246, in which the Removed Result List 280 is updated. If, in step 242, the Tag of a sentence from the Result List 270 is not found in the Relevant No-Finding Tag List 250, the next sentence is processed. In step 248, if not all sentences in the Result List 270 have been processed, the next sentence is fetched and processed.
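
    The FIG. 2A variant can be sketched in the same spirit. How relevance between the query and the tag descriptors is measured is not detailed in the description, so the sketch assumes a tag is relevant when any search phrase occurs in its name or descriptors; the names and sample data are hypothetical.

```python
def relevant_tags(search_list, tag_definitions):
    """Step 232: tags whose name or descriptors mention any search phrase."""
    relevant = set()
    for tag, descriptors in tag_definitions.items():
        haystack = " ".join([tag, *descriptors]).lower()
        if any(phrase.lower() in haystack for phrase in search_list):
            relevant.add(tag)
    return relevant


def filter_results(search_list, sentences, sentence_tags, tag_definitions):
    """Return (Result List, Removed Result List) as lists of sentence keys.

    sentences: {(doc_id, sentence_id): text}
    sentence_tags: {(doc_id, sentence_id): set of tags}
    """
    relevant = relevant_tags(search_list, tag_definitions)
    # Steps 234-236: initial Result List of sentences containing a search phrase.
    initial = [
        key for key, text in sentences.items()
        if any(phrase.lower() in text.lower() for phrase in search_list)
    ]
    result_list, removed_result_list = [], []
    for key in initial:                                # steps 238 and 248
        if sentence_tags.get(key, set()) & relevant:   # steps 240-242
            removed_result_list.append(key)            # steps 244-246
        else:
            result_list.append(key)
    return result_list, removed_result_list


# Hypothetical sample data.
tag_definitions = {
    "Lung <NF>": ["Lungs free of active disease"],
    "Spleen <NF>": ["No significant abnormality of the spleen"],
}
sentences = {
    ("12345", 18): "Lungs free of active disease.",
    ("67890", 3): "Nodule in the right lung.",
}
sentence_tags = {("12345", 18): {"Lung <NF>"}}
result, removed = filter_results(["lung"], sentences, sentence_tags, tag_definitions)
print(result, removed)
```

    The sketch builds the two lists in a single pass instead of deleting entries from the initial list in place; the outcome is the same, and it avoids mutating a list while iterating over it.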

    [0031] The user can see both the Result List and the Removed Result List, as shown in FIG. 3. If he finds that the Result List contains a document that should not have been included, it means that the document was not removed. The user can update the Tag Definition Table so that this document will be tagged with a No-Finding Tag and will be removed from the Result List automatically next time. The user request is received in step 310. If the request is to view the Result List, as tested in step 312, then the Result List is shown (step 314). The user can view the content of any document in the Result List. If the user finds that a document in the Result List is not relevant (step 316), he can, if he so wishes, update the Result List and/or update the Tag Definition Table (step 318). The document can be removed from the Result List. The Tag Definition Table can be updated by the addition of a new Tag Descriptor.

    [0032] If the user requests to view the Removed Result List (step 320), the list is presented to the user, who can open and view any document in the list (step 322). If the user finds that a document is relevant (step 324), i.e., it should have been included in the Result List, he can add the document to the Result List and remove it from the Removed Result List (step 326). The user can also update the No-Finding Tag Definition Table by deleting the Tag Descriptor that erroneously caused the tagging of the document.
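
    The refinement operations of FIG. 3 amount to small updates of the two lists and of the Tag Definition Table. The sketch below assumes the table is held as a dictionary from tag to a list of descriptors; the function names and sample values are hypothetical.

```python
def add_descriptor(tag_definitions, tag, descriptor):
    """Step 318: add a new Tag Descriptor, creating the tag if needed."""
    descriptors = tag_definitions.setdefault(tag, [])
    if descriptor not in descriptors:
        descriptors.append(descriptor)


def delete_descriptor(tag_definitions, tag, descriptor):
    """Delete a descriptor that caused an erroneous tagging."""
    descriptors = tag_definitions.get(tag, [])
    if descriptor in descriptors:
        descriptors.remove(descriptor)


def remove_document(doc_id, result_list, removed_result_list):
    """Step 318: move a document judged not relevant out of the Result List."""
    if doc_id in result_list:
        result_list.remove(doc_id)
        if doc_id not in removed_result_list:
            removed_result_list.append(doc_id)


def restore_document(doc_id, result_list, removed_result_list):
    """Step 326: move a relevant document back into the Result List."""
    if doc_id in removed_result_list:
        removed_result_list.remove(doc_id)
        if doc_id not in result_list:
            result_list.append(doc_id)


# Hypothetical sample data.
tag_definitions = {"Lung <NF>": ["Lungs free of active disease"]}
result_list, removed_result_list = ["A"], ["B"]

add_descriptor(tag_definitions, "Lung <NF>", "Lungs are clear")
restore_document("B", result_list, removed_result_list)
remove_document("A", result_list, removed_result_list)
print(tag_definitions, result_list, removed_result_list)
```

    Whether moving a document out of the Result List also places it in the Removed Result List is an assumption of this sketch, as is the point at which previously tagged documents would be re-tagged after the table changes.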