METHOD AND APPARATUS FOR SEGMENTING A MEDICAL TEXT REPORT INTO SECTIONS
20220398374 · 2022-12-15
Inventors
- Shaika Chowdhury (Chicago, IL, US)
- Halid Yerebakan (Carmel, IN, US)
- Yoshihisa Shinagawa (Downingtown, PA, US)
CPC classification
G16H50/70
PHYSICS
G16H15/00
PHYSICS
International classification
Abstract
A framework for segmenting a medical text report into sections is disclosed. For each sentence of the report, a first sentence representation is determined by inputting a word-level context representation for each sentence sequentially into a neural network. A second sentence representation is determined by inputting an aggregated representation for each sentence sequentially into another neural network. For each sentence, a third sentence representation is determined based on a combination of the first and second sentence representations, and a section classification for the sentence is determined by inputting the third sentence representation into a section classifier. Each sentence is assigned the section classification determined for the sentence.
Claims
1. A computer implemented method for segmenting a medical text report into sections, the medical text report comprising a plurality of sentences, the method comprising: (a) for each sentence, obtaining word embeddings of a plurality of words of the sentence; (b) for each sentence, determining a first sentence representation for the sentence by: for each sentence, inputting the word embeddings of the plurality of words of the sentence sequentially into a first neural network to generate a word-level context representation for the sentence; and inputting the word-level context representation for each sentence sequentially into a second neural network to generate the first sentence representation for each sentence; (c) for each sentence, determining a second sentence representation for the sentence by: for each sentence, applying an aggregating operation to the word embeddings for the sentence to generate an aggregated representation for the sentence; and inputting the aggregated representation for each sentence sequentially into a third neural network to generate the second sentence representation for each sentence; (d) for each sentence, determining a third sentence representation based on a combination of the first sentence representation and the second sentence representation for the sentence; (e) for each sentence, determining a section classification for the sentence by inputting the third sentence representation for the sentence into a section classifier; and (f) for each sentence, assigning the sentence the section classification determined for the sentence.
2. The computer implemented method according to claim 1, wherein the method further comprises: (g) generating output data in which text of the medical text report is segmented into sections, each section being associated with a particular section classification and including those sentences of the medical text report to which the particular section classification has been assigned.
3. The computer implemented method according to claim 2, wherein the method further comprises: (h) storing the output data in a structured storage such that each section is stored in association with a respective associated section classification.
4. The computer implemented method according to claim 1, wherein generating the word-level context representation for each sentence comprises, for each word of the sentence: determining a score indicating a relevance of the word to a context of the sentence; and weighting a contribution associated with the word to the word-level context representation using the score determined for the word.
5. The computer implemented method according to claim 4, wherein, for each word of the sentence, the contribution associated with the word to the word-level context representation comprises a hidden state, associated with the word, of a recurrent unit of the first neural network.
6. The computer implemented method according to claim 5, wherein the context of the sentence is represented by an aggregation of hidden states associated with all of the words of the sentence.
7. The computer implemented method according to claim 1, wherein the third sentence representation is determined based on a concatenation of the first sentence representation and the second sentence representation.
8. The computer implemented method according to claim 1, wherein applying the aggregating operation comprises taking a mean of the word embeddings of the plurality of words of the sentence.
9. The computer implemented method according to claim 1, wherein one or more of the first neural network, the second neural network and the third neural network comprise a bidirectional Recurrent Neural Network.
10. The computer implemented method according to claim 1, wherein one or more of the first neural network, the second neural network and the third neural network comprise one or more Gated Recurrent Units.
11. The computer implemented method according to claim 1, further comprising: providing training data, the training data comprising a plurality of medical text reports, each medical text report comprising a plurality of sentences, the training data further comprising a ground truth section classification for each sentence indicating a particular section of the medical text report to which the sentence belongs; and training the first neural network, the second neural network, the third neural network and the section classifier based on the training data so as to minimize a loss function between the section classifications determined for the sentences by the section classifier and corresponding ground truth section classifications for the sentences.
12. A computer-implemented method of training a neural network for segmenting a medical text report into sections, the medical text report comprising a plurality of sentences, the method comprising: providing the neural network, the neural network including (a) a first sentence representation module comprising: a first neural network configured to generate, for each sentence, a word-level context representation for the sentence based on sequential input of word embeddings of a plurality of words of the sentence; a second neural network configured to generate, for each sentence, a first sentence representation for the sentence based on sequential input of the word-level context representation for the sentence, (b) a second sentence representation module comprising: a third neural network configured to generate, for each sentence, a second sentence representation based on sequential input of an aggregated representation for the sentence, the aggregated representation having been generated by applying an aggregating operation to the word embeddings of the plurality of words of the sentence, and (c) a section classifier configured to, for each sentence, determine a section classification for the sentence based on input of a third sentence representation for the sentence, the third sentence representation being a combination of the generated first sentence representation and the generated second sentence representation for the sentence; providing training data, the training data comprising a plurality of medical text reports, each medical text report comprising a plurality of sentences, the training data further comprising a ground truth section classification for each sentence indicating a particular section of the medical text report to which the sentence belongs; and training the neural network based on the training data so as to minimize a loss function between the section classifications determined for the sentences by the section classifier and corresponding ground truth section classifications for the sentences.
13. The computer-implemented method according to claim 12, wherein the loss function comprises a cross entropy between the section classifications determined for the sentences by the section classifier and the corresponding ground truth section classifications for the sentences.
14. The computer-implemented method according to claim 12, wherein one or more of the first neural network, the second neural network and the third neural network comprise a bidirectional Recurrent Neural Network.
15. An apparatus, comprising: a non-transitory memory device for storing computer readable program code; and a processor in communication with the non-transitory memory device, the processor being operative with the computer readable program code to perform a method for segmenting a medical text report into sections, the medical text report comprising a plurality of sentences, the method comprising: (a) for each sentence, obtaining word embeddings of a plurality of words of the sentence; (b) for each sentence, determining a first sentence representation for the sentence by: for each sentence, inputting the word embeddings of the plurality of words of the sentence sequentially into a first neural network to generate a word-level context representation for the sentence; and inputting the word-level context representation for each sentence sequentially into a second neural network to generate the first sentence representation for each sentence; (c) for each sentence, determining a second sentence representation for the sentence by: for each sentence, applying an aggregating operation to the word embeddings for the sentence to generate an aggregated representation for the sentence; and inputting the aggregated representation for each sentence sequentially into a third neural network to generate the second sentence representation for each sentence; (d) for each sentence, determining a third sentence representation based on a combination of the first sentence representation and the second sentence representation for the sentence; (e) for each sentence, determining a section classification for the sentence by inputting the third sentence representation for the sentence into a section classifier; and (f) for each sentence, assigning the sentence the section classification determined for the sentence.
16. The apparatus according to claim 15, wherein generating the word-level context representation for each sentence comprises, for each word of the sentence: determining a score indicating a relevance of the word to a context of the sentence; and weighting a contribution associated with the word to the word-level context representation using the score determined for the word.
17. The apparatus according to claim 16, wherein, for each word of the sentence, the contribution associated with the word to the word-level context representation comprises a hidden state, associated with the word, of a recurrent unit of the first neural network.
18. The apparatus according to claim 17, wherein the context of the sentence is represented by an aggregation of hidden states associated with all of the words of the sentence.
19. The apparatus according to claim 15, wherein the third sentence representation is determined based on a concatenation of the first sentence representation and the second sentence representation.
20. The apparatus according to claim 15, wherein applying the aggregating operation comprises taking a mean of the word embeddings of the plurality of words of the sentence.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0018] Referring to
[0019] Returning to
[0020] (a) in step 102, for each sentence s.sub.i, obtaining a word embedding w.sub.it for each of a plurality of words of the sentence;
[0021] (b) in step 104, for each sentence s.sub.i, determining a first sentence representation c.sub.senti for the sentence by: [0022] (i) for each sentence, inputting the word embeddings w.sub.it for each of the plurality of words of the sentence sequentially into a first trained neural network to generate a word-level context representation c.sub.wordi for the sentence; and [0023] (ii) inputting the word-level context representation c.sub.wordi for each sentence sequentially into a second trained neural network thereby to generate the first sentence representation c.sub.senti for each sentence;
[0024] (c) in step 106, for each sentence s.sub.i, determining a second sentence representation g.sub.i for the sentence by: [0025] (i) for each sentence, applying an aggregating operation to the word embeddings for each of the plurality of words of the sentence to generate an aggregated representation p.sub.i for the sentence; and [0026] (ii) inputting the aggregated representation p.sub.i for each sentence sequentially into a third trained neural network thereby to generate the second sentence representation g.sub.i for each sentence;
[0027] (d) in step 108, for each sentence s.sub.i, determining a third sentence representation u.sub.i based on a combination of the first sentence representation c.sub.senti and the second sentence representation g.sub.i for the sentence;
[0028] (e) in step 110, for each sentence s.sub.i, determining a section classification k.sub.i for the sentence by inputting the third sentence representation u.sub.i for the sentence into a trained section classifier; and
[0029] (f) in step 112, for each sentence s.sub.i, assigning the sentence the section classification k.sub.i determined for the sentence.
[0030] By this method, each sentence s.sub.i may be accurately and reliably assigned a section classification (e.g., ‘Description’, ‘Clinical History’, or ‘Findings’) and hence accurate and reliable section segmentation of medical text reports may be provided for.
[0031] Specifically, the text within a medical text report is sequential in nature. As such, the section to which a sentence would ideally be assigned may be influenced by the sequential context of the sentence (i.e., the sentences that precede and/or are subsequent to the sentence). In order to implement this, according to features (b) and (c), representations for each sentence (based on word embeddings) are passed sequentially into trained neural networks, which may be for example Recurrent Neural Networks, RNNs, to generate sentence representations (i.e., feature vectors) that encode the sequential context of each sentence.
[0032] However, the inventors have identified that a particularly accurate/reliable section classification can be obtained by encoding the sequential context of sentences through combining two different but complementary branches or approaches to generating sentence representations: As per feature (b), a first, ‘local’, branch takes a hierarchical approach by first determining a word-level context representation for the sentence (by passing word embeddings sequentially into a first neural network, e.g., a first RNN) and then determining a first sentence representation for the sentence encoding the sentence-level sequential context (by passing the word-level context representations sequentially through a second trained neural network, e.g., a second RNN). This helps capture the fine-grained nuances between local sentences, such as between sentences within a section. On the other hand, as per feature (c), a second, ‘global’, branch takes an aggregated representation for the sentence (e.g., an average of the word embeddings for the sentence) and uses this to determine a second sentence representation encoding sentence-level sequential context (by passing the aggregated representations sequentially through a trained third neural network, e.g., a third RNN). This helps capture more coarse-grained context changes between sentences, such as between sentences in different sections.
[0033] Combining the sentence representations from both branches (e.g., by concatenating them) as per feature (d) and inputting this combination into a trained section classifier to determine a section classification for each sentence as per feature (e) leads to more accurate and/or reliable assignment of section classifications to each sentence as per feature (f), for example as compared to using either branch alone. Accordingly, accurate and/or reliable section segmentation is provided for.
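The interplay of the two branches described above can be sketched in plain Python. The function below is only a structural sketch: the three recurrent networks, the aggregating operation, and the classifier are passed in as ordinary functions, and all names are illustrative assumptions rather than the patented implementation.

```python
# Structural sketch of the two-branch flow of features (b)-(f).
# word_rnn, local_rnn, global_rnn, aggregate and classifier are
# placeholder callables standing in for trained components.

def segment(report_embeds, word_rnn, local_rnn, global_rnn, aggregate, classifier):
    """report_embeds: one list of word-embedding vectors per sentence."""
    # Feature (b), 'local' branch: a word-level context per sentence,
    # then the sequence of contexts through a sentence-level RNN.
    c_word = [word_rnn(sent) for sent in report_embeds]
    c_sent = local_rnn(c_word)
    # Feature (c), 'global' branch: aggregated (e.g., mean-pooled)
    # sentence vectors through a third RNN.
    p = [aggregate(sent) for sent in report_embeds]
    g = global_rnn(p)
    # Features (d)-(f): concatenate the two branch outputs for each
    # sentence and classify the concatenation.
    return [classifier(cs + gi) for cs, gi in zip(c_sent, g)]
```

The concatenation `cs + gi` corresponds to the combination of feature (d); either branch output could be used alone, but the text above explains why the combination is preferred.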
[0034] In some examples, the method may comprise generating output data in which the text of the medical text report 220 is segmented into sections, each section being associated with a particular section classification k.sub.i and including those sentences of the medical text report 220 to which the particular section classification k.sub.i has been assigned.
[0035] An example of generated output data is illustrated in
[0036] In some examples, the method may comprise storing the output data 330 in a structured storage such that each section is stored in association with a respective associated section classification. For example, the data of the table 330 of
[0037] Storing the output data in this way may provide for the efficient retrieval of information contained within the medical text reports. For example, where only a particular section of the report is needed, the particular section can be efficiently and precisely queried and extracted from the database, for example as compared to extracting the entire medical text report. For example, the structured storage may be accessed and interrogated by online medical query platforms and/or other information retrieval systems to return information to a user that is more relevant and/or precise with respect to a search query. The efficient extraction of information may, in turn, allow for the efficient running of downstream computer-implemented tasks, such as document summarization and comparison, and information extraction.
[0038] Another example of output data that may be generated is illustrated in
[0039] In some examples, the method may comprise displaying the output data 330, 440 on a display, such as a computer monitor (not shown). This may allow a user to readily discern the different sections of the medical text report, and hence may allow for a more efficient interaction of the user with the information contained in the report.
[0040] Example details of the steps of the method described above with reference to
[0042] As is known, the input to a RNN is sequential and the state or output of the RNN resulting from a certain input is dependent on or influenced by the state or output resulting from a previous input. In
[0043] As mentioned, the method comprises, in step 102, for each sentence S.sub.i, obtaining a word embedding w.sub.it for each of a plurality of words of the sentence. As illustrated in
[0044] As is known, a word embedding is a vector representing the meaning or semantics of a word in a multidimensional space. Libraries of pre-trained word embeddings exist. In some examples, the word embedding for a given word may be obtained by looking up the word embedding for the given word in such a library. In some examples, word embeddings may be obtained using a pre-trained model. For example, each word of the medical text report may be passed through a WordPiece tokenizer to return a set of tokens, each token representing a word, and the set of tokens may be passed through a pre-trained model to generate the word embedding for each word. For example, a BERT (Bidirectional Encoder Representations from Transformers) model may be used, whereby each sentence is passed through a BERT WordPiece tokenizer, and each resulting token passed through a pre-trained BERT model to obtain therefrom the embedding w.sub.it for each word. This may allow the similarities and regularities between words to be accurately captured. It will be appreciated that in some examples, the word embeddings may be obtained in other ways.
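As a minimal illustration of obtaining embeddings in step 102, a table lookup can be sketched as follows. The vocabulary and the two-dimensional vectors below are invented for the example; a production system would instead use pre-trained embeddings, e.g., from a BERT model as described above.

```python
# Hypothetical pre-trained embedding table (invented values).
EMBEDDINGS = {
    "no": [0.1, 0.9],
    "acute": [0.7, 0.2],
    "findings": [0.4, 0.4],
}
UNK = [0.0, 0.0]  # fallback vector for out-of-vocabulary tokens

def word_embedding(token):
    """Return the embedding w_it for one token."""
    return EMBEDDINGS.get(token.lower(), UNK)

def sentence_embeddings(sentence):
    """Embed every word of a sentence (step 102)."""
    return [word_embedding(tok) for tok in sentence.split()]
```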
[0045] As mentioned, the method comprises, in step 104, for each sentence, determining a first sentence representation c.sub.senti for the sentence.
[0046] Specifically, as a first part of step 104, for each sentence S.sub.i, the word embeddings w.sub.it for each of the plurality of words of the sentence are input sequentially into the trained first RNN 556 (recurrent neural network) to generate a word-level context representation c.sub.wordi for the sentence.
[0047] In some examples, the first RNN 556 may be a bi-directional recurrent neural network. That is, the first RNN may incorporate, for each word, both past and future context into the hidden state h.sub.it calculated for the word (i.e., context of the words preceding the word as well as context of the words subsequent to the word in the sentence). This may allow for both the preceding and subsequent context of each word to be incorporated into the word-level context representation c.sub.wordi for the sentence, which may in turn improve the accuracy with which the word-level context of the sentences can be represented.
[0048] In some examples, the first RNN 556 may comprise one or more Gated Recurrent Units (GRUs) R. This may allow for the hidden states h.sub.it for each word to be calculated with high performance and computational efficiency, for example as compared to a vanilla RNN or a Long Short-Term Memory (LSTM) unit. In the case that the first RNN 556 is a bidirectional RNN, the first RNN may comprise at least two GRUs R, one operating in a different sequential direction to the other.
[0049] For example, for each sentence s.sub.i, for the word having word embedding w.sub.it, the hidden state h.sub.it in a GRU R may be computed with an update gate and a reset gate using the following equations:
z.sub.it=σ(W.sub.zw.sub.it+V.sub.zh.sub.it-1+b.sub.z) (1)
r.sub.it=σ(W.sub.rw.sub.it+V.sub.rh.sub.it-1+b.sub.r) (2)
ĥ.sub.it=tan h(W.sub.hw.sub.it+V.sub.h(r.sub.it⊙h.sub.it-1)+b.sub.h) (3)
h.sub.it=(1−z.sub.it)⊙ĥ.sub.it+z.sub.it⊙h.sub.it-1 (4)
where ⊙ denotes the element-wise product of two vectors, σ is a sigmoid function, W and V are parameter matrices, b is a parameter vector, h.sub.it is the hidden state, i.e., the output vector, ĥ.sub.it is the candidate activation vector, z.sub.it is the update gate vector, and r.sub.it is the reset gate vector. In the case of the bidirectional GRUs (BiGRU), one GRU calculates a forward hidden state and another calculates the backward hidden state, and these may be concatenated to represent the hidden state h.sub.it of the word. Accordingly, the BiGRU may encode the word sequence for each sentence s.sub.i={w.sub.it, t=1:M} into the hidden state sequence h.sub.i={h.sub.it, t=1:M}.
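Equations (1)-(4) can be traced in a few lines of Python. As a simplification, the parameter "matrices" below are scalars applied per dimension; real GRU parameters are full matrices that mix dimensions, so this is an illustrative sketch rather than a faithful GRU implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(w, h_prev, Wz, Vz, bz, Wr, Vr, br, Wh, Vh, bh):
    """One GRU step following equations (1)-(4), element-wise with
    scalar 'matrices' (a simplification for readability)."""
    h = []
    for wt, hp in zip(w, h_prev):
        z = sigmoid(Wz * wt + Vz * hp + bz)                # eq. (1): update gate
        r = sigmoid(Wr * wt + Vr * hp + br)                # eq. (2): reset gate
        h_cand = math.tanh(Wh * wt + Vh * (r * hp) + bh)   # eq. (3): candidate activation
        h.append((1.0 - z) * h_cand + z * hp)              # eq. (4): new hidden state
    return h
```

With all parameters zero, the gates evaluate to 0.5 and the candidate to tanh(0) = 0, so the hidden state stays at zero, which is a quick sanity check of the equations.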
[0050] In some examples, for each sentence, the hidden states h.sub.it output from the first RNN 556 for each word of the sentence s.sub.i may be summed to obtain the word-level sentence representation c.sub.wordi for the sentence.
[0051] However, in some examples, as illustrated in
[0052] For example, applying the attention mechanism may comprise, for each word of the sentence: determining a score a.sub.it indicating a relevance of the word w.sub.it to a context z.sub.i of the sentence s.sub.i; and weighting a contribution h.sub.it associated with the word w.sub.it to the word-level context representation c.sub.wordi using the score a.sub.it determined for the word. For example, the contribution h.sub.it associated with the word w.sub.it to the word-level context representation c.sub.wordi may comprise the hidden state h.sub.it, associated with the word, of a recurrent unit R of the first RNN 556. The word-level context representation c.sub.wordi for the sentence may comprise a weighted sum of the hidden states h.sub.it associated with the words of the sentence s.sub.i, each hidden state h.sub.it being weighted by the score a.sub.it determined for the associated word. The context of the sentence z.sub.i may be represented by an aggregation of the hidden states h.sub.it of all of the words of the sentence. For example, the aggregation of the hidden states may be a concatenation of the hidden states h.sub.it for all of the words of the sentence.
[0053] The score a.sub.it indicating the relevance of the word to the context z.sub.i of the sentence may be determined based on the output of an activation function (e.g., tan h) applied between the hidden state h.sub.it associated with the word and the aggregation z.sub.i of the hidden states. For example, the score a.sub.it between each hidden state h.sub.it and the context vector z.sub.i for the sentence s.sub.i may be computed using the following equations:
e.sub.it=v.sub.a.sup.T tan h(W.sub.1h.sub.it+W.sub.2z.sub.i)  (5)
a.sub.it=exp(e.sub.it)/Σ.sub.t′=1.sup.M exp(e.sub.it′)  (6)
where v.sub.a, W.sub.1, and W.sub.2 are learned weight matrices. As mentioned, the context vector z.sub.i for the sentence s.sub.i may be the concatenation of the hidden states h.sub.it of all of the words of the sentence. A higher value of the relevance score a.sub.it indicates a higher salience of the information carried by the word with respect to the overall sentence context z.sub.i.
[0054] The hidden states h.sub.it for a sentence may then be transformed into a word-level context representation c.sub.wordi for the sentence s.sub.i by weighing each hidden state h.sub.it with its score a.sub.it, for example using the following equation:
c.sub.wordi=Σ.sub.t=1.sup.Ma.sub.ith.sub.it (7)
This may be repeated for each sentence to obtain the word-level context representation c.sub.wordi for each sentence.
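The weighted sum of equation (7) can be sketched as follows. The tanh-based scoring of equation (5) that produces the raw relevance scores is abstracted away here, and only the normalization and weighting are shown; all names are illustrative.

```python
import math

def softmax(scores):
    """Normalize raw scores into weights that sum to one (eq. (6))."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def word_level_context(hidden_states, raw_scores):
    """Eq. (7): c_word_i = sum_t a_it * h_it, with weights a_it obtained
    by normalizing per-word relevance scores."""
    a = softmax(raw_scores)
    dim = len(hidden_states[0])
    return [sum(a[t] * hidden_states[t][d] for t in range(len(hidden_states)))
            for d in range(dim)]
```

With equal raw scores the result reduces to a plain average of the hidden states; a dominant score makes the representation follow that word's hidden state, which is the salience behavior described above.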
[0055] In the second part of step 104, the word-level context representation c.sub.wordi for each sentence is input sequentially into the second trained RNN 558 thereby to generate the first sentence representation c.sub.senti for each sentence s.sub.i.
[0056] In some examples, the second RNN 558 may be a bi-directional RNN. In some examples, the second RNN 558 may comprise one or more Gated Recurrent Units R. For example, the second RNN 558 may operate similarly to the first RNN 556 described above, but with the word-level context representation c.sub.wordi for each sentence being sequentially input into the second RNN 558 instead of the word embedding w.sub.it for each word of a sentence as per the first RNN 556. For example, the second RNN 558 may operate using equations (1)-(4) listed above, except with the word embedding w.sub.it for each word of a sentence being replaced with the word-level context representation c.sub.wordi for each sentence.
[0057] The second part of step 104 captures into the first sentence representation c.sub.senti for each sentence the semantically relevant context from the surrounding sentences. The first sentence representation c.sub.senti for each sentence, having been based on the word-level context representation c.sub.wordi for each sentence, encodes the fine-grained topical semantics of the sentence among the other sentences, and can help indicate more nuanced and fine-grained relationships between sentences, such as between sentences within a section.
[0058] As mentioned, the method comprises, in step 106, for each sentence s.sub.i, determining a second sentence representation g.sub.i for the sentence.
[0059] A first part of the step 106 comprises, for each sentence s.sub.i, applying an aggregating operation P to the word embeddings w.sub.it for the sentence to generate an aggregated representation p.sub.i for the sentence. For example, applying the aggregating operation P may comprise taking the mean of the word embeddings of the words of the sentence. In this case, the aggregated representation p.sub.i for the sentence may be the mean of the word embeddings w.sub.it of the words of the sentence. The mean operation may be particularly effective in that it is computationally simple but allows all of the words of the sentence to contribute to the aggregated representation p.sub.i, thereby effectively capturing the overall or global context of the sentence. In some examples, the mean may be calculated by applying a mean pooling operation. It will be appreciated that in some examples, other aggregating operations may be used. For example, the aggregating operation may be a pooling operation, applied to the word embeddings w.sub.it for the sentence, to generate a pooled representation p.sub.i for the sentence. For example, the pooling operation may be max-pooling (e.g., where the maximum value of the word embedding w.sub.it, or of sub-regions of the word embedding w.sub.it, is taken as a representative value of the word embedding), or for example min-pooling (e.g., where the minimum value of the word embedding w.sub.it, or of sub-regions of the word embedding w.sub.it, is taken as a representative value of the word embedding).
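The aggregating operations discussed above (mean, max-pooling, and min-pooling) can be sketched over plain lists of word-embedding vectors; each maps a variable number of word vectors to one fixed-size sentence vector p_i.

```python
def mean_pool(embeds):
    """Mean of the word embeddings: every word contributes to p_i."""
    n = len(embeds)
    return [sum(w[d] for w in embeds) / n for d in range(len(embeds[0]))]

def max_pool(embeds):
    """Per-dimension maximum as the representative value."""
    return [max(w[d] for w in embeds) for d in range(len(embeds[0]))]

def min_pool(embeds):
    """Per-dimension minimum as the representative value."""
    return [min(w[d] for w in embeds) for d in range(len(embeds[0]))]
```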
[0060] A second part of step 106 comprises inputting the aggregated representation p.sub.i for each sentence sequentially into the third trained RNN 590 thereby to generate the second sentence representation g.sub.i for each sentence. For example, the third RNN 590 may be a bidirectional RNN, i.e., whereby the context of both preceding sentences and subsequent sentences is encoded into the second sentence representation g.sub.i for the sentence. In some examples, similarly to the RNNs described above, the third RNN may comprise GRUs R. The second sentence representation g.sub.i for each sentence, having been based on the aggregated or global representation p.sub.i for each sentence, encodes the coarse-grained topical semantics of the sentence among the other sentences, and can help indicate more coarse-grained context changes between sentences, such as between sentences in different sections.
[0061] As mentioned, in step 108, the method comprises, for each sentence, determining a third sentence representation u.sub.i based on a combination of the first sentence representation c.sub.senti and the second sentence representation g.sub.i for the sentence. For example, this may be performed by the combiner 591 module of the neural network of
[0062] As mentioned, the method comprises, in step 110, for each sentence, determining a section classification k.sub.i for the sentence by inputting the third sentence representation u.sub.i for the sentence into a trained section classifier. For example, the classifier 592 may be trained to determine the section classification for a sentence based on an input third sentence representation for the sentence. For example, the classifier may comprise a fully connected softmax layer S. This may give, for each of a plurality of pre-defined section classifications, the probability that the input sentence s.sub.i belongs to that section classification. For example, the softmax layer may output the probability that a given sentence belongs to the section ‘Description’, ‘Clinical History’ or ‘Findings’, although it will be appreciated that the trained section classifier may be configured to output probabilities for any number of pre-defined section classifications, in dependence on the section classifications on which it has been trained. In some examples, the section classifier may determine, as the section classification for an input sentence, the section classification associated with the highest probability output by the softmax layer.
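A minimal sketch of the softmax-based section classifier follows. The fully connected layer that maps the third sentence representation u_i to per-section logits is assumed to run upstream, and the three section labels are the examples named in the text.

```python
import math

# Example section labels from the text; a deployed classifier may be
# trained on any set of pre-defined section classifications.
SECTIONS = ["Description", "Clinical History", "Findings"]

def classify_sentence(logits):
    """Apply softmax to the per-section logits and return the most
    probable section label together with all probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return SECTIONS[best], probs
```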
[0063] As mentioned, the method comprises, in step 112, for each sentence, assigning the sentence the section classification determined for the sentence. For example, if the section classification k.sub.1 for the sentence s.sub.1 is determined as ‘Description’, this section classification is assigned to the sentence. For example, a tag may be assigned to data representing the sentence to indicate the associated classification. In some examples, output data may be generated and stored, for example as described above with reference to
[0064] The section classification for each sentence being based on the combination u.sub.i of both the more fine-grained first sentence representation c.sub.senti and the more coarse-grained second sentence representation g.sub.i allows for more accurate and/or reliable assignment of the section classification, for example as compared to using either one of the sentence representations alone. Accordingly, accurate and/or reliable section segmentation may be provided for. It is noted that an example demonstration of this is described below with reference to
[0065] Referring to
[0066] The neural network comprises a first sentence representation module 554 comprising: (i) a first neural network 556 configured to generate, for each sentence s.sub.i, a word-level context representation c.sub.wordi for the sentence based on sequential input of word embeddings w.sub.it for each of a plurality of words of the sentence. For example, the first neural network 556 may be an RNN, and in some examples may be the same as or similar to the first RNN 556 described above with reference to
[0067] The neural network comprises a second sentence representation module 552 comprising: a third neural network 590 configured to generate, for each sentence, a second sentence representation g.sub.i based on sequential input of an aggregated representation p.sub.i for the sentence, the aggregated representation having been generated by applying an aggregating operation P to the word embeddings w.sub.it of each of the plurality of words of the sentence. For example, the third neural network 590 may be an RNN, and in some examples may be the same as or similar to the third RNN 590 described above with reference to
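As a minimal sketch of the aggregating operation P, the following assumes mean pooling over the word embeddings of a sentence; mean pooling is one plausible choice and is assumed here for illustration only:

```python
def mean_pool(word_embeddings):
    """Aggregate a sentence's word embeddings w_it into a single vector p_i
    by element-wise averaging (mean pooling, assumed as the operation P)."""
    n = len(word_embeddings)
    dim = len(word_embeddings[0])
    return [sum(w[d] for w in word_embeddings) / n for d in range(dim)]

# Two 2-dimensional word embeddings are averaged into one sentence vector
p = mean_pool([[1.0, 2.0], [3.0, 4.0]])  # -> [2.0, 3.0]
```

The resulting p.sub.i for each sentence would then be input sequentially into the third neural network 590 to produce g.sub.i.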
[0068] The neural network comprises a section classifier 592 configured to, for each sentence, determine a section classification k.sub.i for the sentence based on input of a third sentence representation u.sub.i for the sentence, the third sentence representation u.sub.i being a combination of the generated first sentence representation c.sub.senti and the generated second sentence representation g.sub.i for the sentence. For example, the section classifier 592 may be the same as or similar to that described above with reference to
[0069] The method comprises, in step 604, providing training data. The training data comprises a plurality of medical text reports, each medical text report comprising a plurality of sentences s.sub.i, each sentence comprising a plurality of words, the training data further comprising a ground truth section classification y.sub.i for each sentence indicating the particular section of the medical text report to which the sentence belongs. For example, the sentences of the medical text reports of the training data may have been annotated, for example by an expert or automatically, to indicate the section classification to which the sentence belongs or should belong.
[0070] The method comprises, in step 606, training the neural network based on the training data. The neural network is trained to minimize a loss function between the section classifications k.sub.i determined for the sentences by the section classifier 592 and the corresponding ground truth section classifications y.sub.i for the sentences. For example, the loss function may comprise the cross entropy between the section classifications k.sub.i determined for the sentences by the section classifier 592 and the corresponding ground truth section classifications y.sub.i for the sentences. For example, the loss function L may be calculated according to the following equation:
L=−Σ.sub.r=1.sup.R Σ.sub.i=1.sup.N y.sub.i.sup.r log(k.sub.i.sup.r)  (8)
where R is the total number of medical text reports in the training data set, and N is the total number of sentences in each text report r of the training data set.
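The cross-entropy loss over the training reports can be sketched directly from this definition. In the sketch below, y_true holds one-hot ground-truth vectors y.sub.i.sup.r and y_pred holds the predicted probability distributions k.sub.i.sup.r; the conventional negative sign for cross entropy is included so that minimizing the loss rewards high predicted probability on the ground-truth class:

```python
import math

def cross_entropy_loss(y_true, y_pred):
    """Cross entropy summed over R reports and their N sentences:
    L = -sum_r sum_i y_i^r . log(k_i^r), with y_i^r one-hot and
    k_i^r a probability distribution over section classifications."""
    loss = 0.0
    for y_report, k_report in zip(y_true, y_pred):   # over reports r
        for y_i, k_i in zip(y_report, k_report):     # over sentences i
            loss -= sum(y * math.log(k) for y, k in zip(y_i, k_i))
    return loss

# One report with one sentence: ground truth is class 2, predicted prob 0.8,
# so the loss is -log(0.8)
loss = cross_entropy_loss([[[0, 1, 0]]], [[[0.1, 0.8, 0.1]]])
```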
[0071] Training the first RNN 556, second RNN 558, third RNN 590 and classifier 592 in this way allows both the feature construction (i.e., the generation of the sentence representations) and the model training to be undertaken together automatically without human interaction. This allows the features to be learned through guidance from the model optimization. This reduces the need to hand-craft features, as in known section segmentation approaches, and hence reduces the manual labor associated with training the model, as well as providing for better generalization on unseen layouts.
[0072] In the example described above with reference to
[0073] A demonstration of the effectiveness of the method according to examples disclosed herein in correctly assigning section classifications to sentences in medical text reports is provided. The demonstration is provided for illustrative purposes. Specifically, a study was performed to assess the effectiveness of the method as compared to other models. For the purposes of this illustrative study, the parameters for the training of the neural network disclosed herein were as follows: a learning rate of 0.001 for 100 epochs, a batch size of 28, a GRU hidden state dimension of 100, an attention mechanism dimension of 10, and a BERT word embedding dimension of 768, with all other weights initialized using a Glorot uniform initializer.
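For illustration, these hyperparameters could be collected into a configuration such as the following; the key names are illustrative assumptions, while the values are those stated in the study:

```python
# Training configuration mirroring the hyperparameters reported in the study
CONFIG = {
    "learning_rate": 0.001,
    "epochs": 100,
    "batch_size": 28,
    "gru_hidden_dim": 100,
    "attention_dim": 10,
    "bert_embedding_dim": 768,
    "weight_init": "glorot_uniform",  # Glorot uniform for all other weights
}
```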
[0074] This study was performed for four different data sets: MtSamples (MT), which consists of transcribed medical reports downloaded from mtsamples.com; NationalRad (NR), which consists of transcribed radiology reports downloaded from nationalrad.com/radiology/reports; JH, consisting of sample reports provided from a hospital; and NLP, consisting of sample reports provided by another hospital. The data sizes of these data sets are outlined below in Table 1, both in terms of number of reports and number of sentences. The data set ALL includes all of the MT, NR, JH and NLP data sets added together.
TABLE 1
            MT     NR     JH     NLP    ALL
#Reports    267    51     124    81     523
#Sentences  6745   1579   3894   1165   13383
[0075] For the purposes of the illustrative study, each data set was split into 80% training, 10% validation, and 10% testing.
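The 80/10/10 partitioning can be sketched as follows; the function name is illustrative, and shuffling (which would typically precede such a split) is omitted for brevity:

```python
def split_dataset(reports, train_frac=0.8, val_frac=0.1):
    """Split a list of reports into train/validation/test partitions
    (80/10/10 by default, as in the study)."""
    n = len(reports)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = reports[:n_train]
    val = reports[n_train:n_train + n_val]
    test = reports[n_train + n_val:]
    return train, val, test

# 100 reports -> 80 train, 10 validation, 10 test
train, val, test = split_dataset(list(range(100)))
```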
[0076] The performance of each model is indicated by a weighted average accuracy, precision, recall, and F-score of whether the section classification of each sentence in the report is predicted correctly (as determined by the ground truth).
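One plausible reading of the "weighted average" metrics, sketched below, is a per-class score weighted by each class's support (number of ground-truth sentences in that class); the function name is illustrative:

```python
from collections import Counter

def weighted_f_score(y_true, y_pred):
    """Support-weighted average of per-class F-scores over sentence-level
    section predictions (one plausible reading of the reported metric)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        pred_pos = sum(1 for p in y_pred if p == cls)
        precision = tp / pred_pos if pred_pos else 0.0
        recall = tp / n_cls
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n_cls / total) * f  # weight each class by its support
    return score
```

Weighted precision, recall, and accuracy would be computed analogously, weighting each class's score by its share of the ground-truth sentences.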
[0077] The other models to which the performance of the presently disclosed method was compared included a Naïve Bayes (NB) model, a Support Vector Machine (SVM) model, a Maximum Entropy (ME) model, a Random Forest (RF) model, a convolutional neural network (CNN) model, a Multi-Layer Perceptron (MLP) model, a Cross-Segment Bidirectional Encoder Representations from Transformers (CS-BERT) model, a Bi-directional Long Short Term Memory (Bi-LSTM) model, and a Stacked GRU (St-GRU) model. The performance of the NB, SVM, ME, and RF models as compared to the presently disclosed method (MedTextSeg) for the ALL dataset is shown below in Table 2.
TABLE 2
            NB      SVM     ME      RF      MedTextSeg
Accuracy    65.75   70.44   36.57   71.12   90.24
Precision   67.72   71.11   13.37   71.67   94.79
Recall      65.75   70.45   36.57   71.11   90.24
F-score     63.77   69.48   19.58   70.56   90.81
The performance of the CNN, MLP, CS-BERT, Bi-BERT, and St-GRU models as compared to the presently disclosed model (MedTextSeg) for the MT, NR, JH, NLP, and ALL data sets is shown below in Table 3.
TABLE 3
                 CNN     MLP     CS-BERT  Bi-BERT  St-GRU  MedTextSeg
MT   Accuracy    68.67   72.80   75.56    87.73    78.37   90.79
     Precision   78.93   84.96   76.08    89.90    86.79   95.87
     Recall      68.67   72.80   75.56    87.73    78.37   90.79
     F-Score     70.82   75.52   74.89    87.21    80.05   91.72
NR   Accuracy    72.16   73.47   90.62    93.19    77.99   97.64
     Precision   72.19   74.63   89.06    96.06    75.63   96.67
     Recall      72.16   73.47   90.62    93.19    77.97   97.64
     F-Score     71.24   73.55   89.33    93.84    76.36   97.08
JH   Accuracy    56.72   73.22   79.48    84.92    82.11   88.66
     Precision   59.09   77.37   85.79    83.82    88.47   91.54
     Recall      56.72   73.22   79.49    84.92    82.77   88.66
     F-Score     61.24   74.44   79.81    83.47    84.07   88.92
NLP  Accuracy    70.98   74.14   77.33    79.46    78.30   83.30
     Precision   81.74   91.99   80.41    92.21    87.04   94.38
     Recall      70.98   83.87   74.14    79.46    78.30   83.30
     F-Score     72.77   81.21   78.71    78.57    80.01   84.91
ALL  Accuracy    73.74   77.35   82.84    85.19    80.00   90.24
     Precision   82.77   83.69   83.76    86.18    88.66   94.79
     Recall      73.74   77.34   82.84    85.19    80.00   90.24
     F-Score     75.54   78.33   82.44    84.41    81.76   90.81
[0078] As can be seen, the presently disclosed method MedTextSeg is able to outperform all of the comparative models for all of the data sets for each of the Accuracy, Precision, Recall and F-Score metrics. For example, there are percentage improvements of 5.93% accuracy, 9.99% precision, 5.93% recall and 7.58% F-score of the MedTextSeg method/model over Bi-BERT on the ALL data set. Although CS-BERT, Bi-BERT and St-GRU are sequential models, they only model the local context of each sentence. In contrast, as discussed above, the presently disclosed method/model is able to also capture the overall topical information within a section by using the second, ‘global’ encoding module or branch 590 and accordingly can perform better.
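The quoted percentage improvements are relative to the comparative model's score; the arithmetic can be reproduced directly from the ALL-dataset values reported above for MedTextSeg and Bi-BERT:

```python
def relative_improvement(new, old):
    """Percentage improvement of `new` over `old`."""
    return (new - old) / old * 100

# MedTextSeg vs Bi-BERT on the ALL dataset
acc_gain = relative_improvement(90.24, 85.19)   # accuracy: ~5.93%
prec_gain = relative_improvement(94.79, 86.18)  # precision: ~9.99%
f_gain = relative_improvement(90.81, 84.41)     # F-score: ~7.58%
```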
[0079] An ablation study was also undertaken to illustrate the effect of removing either the first module 554 (comprising the first RNN 556 and the second RNN 558) or the second module 552 (comprising the third RNN 590) on performance. The results are shown below in Table 4 for the ALL dataset, where MedTextSeg indicates the results where the disclosed method/model is used, HEM indicates the results where only the first module 554 is used, and GEM indicates the results where only the second module 552 is used.
TABLE 4
            MedTextSeg  HEM     GEM
Accuracy    90.24       83.07   89.72
Precision   94.79       90.66   92.45
Recall      90.24       83.01   89.72
F-Score     90.81       84.27   89.48
[0080] As can be seen, removing either module damages the segmentation performance in comparison to that of the complete model. In particular, the F-score drops by 7.76% with HEM and 1.49% with GEM. This serves to illustrate that using the third sentence representation u.sub.i including both the first sentence representation c.sub.senti encoding a more local context and the second sentence representation g.sub.i encoding a more global context of the sentence, as per the method disclosed herein, allows for improved performance, and hence more accurate and/or reliable section segmentation.
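As with the earlier comparison, these drops are relative percentages and can be reproduced from the F-scores in Table 4:

```python
def f_score_drop(full, ablated):
    """Relative F-score drop of the ablated model vs the complete model."""
    return (full - ablated) / ablated * 100

hem_drop = f_score_drop(90.81, 84.27)  # ~7.76%: only the hierarchical module (HEM) kept
gem_drop = f_score_drop(90.81, 89.48)  # ~1.49%: only the global module (GEM) kept
```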
[0081] A qualitative evaluation of the learned features was also performed. For this illustrative study, the output sentence representations from the last layer (i.e., the layer before the softmax layer) in the respective models were obtained. In the case of the MedTextSeg model of the present disclosure, this corresponds to the third sentence representation u.sub.i output from the combiner 591. This was compared to the output sentence representations from the MLP model. Specifically, for each sentence of the test split of the data set, the output sentence representation was projected to a two-dimensional space using Principal Component Analysis (PCA), and then t-Distributed Stochastic Neighbor Embedding (t-SNE) was applied to group the sentences based on the section that they belong to according to the ground truth classifications. The results are shown in
[0082] For the purposes of illustration, ovals have been drawn to indicate certain groupings of the symbols that are apparent from inspecting the graphs. Referring to
[0083] Referring to
[0084] For example, the input interface 886 may receive a medical text report (or text thereof, or segmented sentences thereof, or word embeddings of a plurality of words of each of the sentences thereof), the processor 882 may implement the method described above with reference to
[0085] As another example, alternatively or additionally, the input interface 886 may receive a training data set as per any one of the examples described above, the processor 882 may implement training of a neural network for example as described above with reference to
[0086] The apparatus 880 may be implemented as a processing system and/or a computer. It will be appreciated that the methods according to any one of the examples described above with reference to
[0087] The above examples are to be understood as illustrative examples of the invention. It is to be understood that any feature described in relation to any one example may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the examples, or any combination of any other of the examples. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.