Learning quality estimation device, method, and program

Abstract

This disclosure relates to a device, a method, and a program capable of removing erroneous data from learning data used for machine learning used in natural language processing, for example. The method includes storing a forward direction learned model of a discrete series converter. The model is trained based on a plurality of pairs of discrete series of texts. Each pair comprises a first discrete series indicates an input of discrete series. A second discrete series indicates an output of discrete series. The first discrete series and the second discrete series are correctly associated. The method further includes converting the first discrete series to the second discrete series, and generating a quality score using the forward direction learned model, using a second learning pair of discrete series texts including an error in relationship.

Claims

1. A computer-implemented method for evaluating aspects of discrete series of texts, the method comprising: receiving a set of pairs of discrete series of texts for learning, wherein each pair comprises a first discrete series of input data and a second discrete series of output data, and wherein the first discrete series of input data and the second discrete series of output data are based on a correct relationship without noise; generating, based on training, a forward learning model as training of the forward learning model using the set of pairs of discrete series of tests for learning as first training data, wherein the forward learning model, after being trained, converts the first discrete model of input data into the second discrete series of output data according to the correct relationship; receiving an erroneous pair of discrete series of texts of a plurality of erroneous pairs of discrete series of texts for learning, wherein the erroneous pair of discrete series of texts comprises a third discrete series of input data and a fourth discrete series of output data, and wherein the third discrete series of input data and the fourth discrete series of output data have a likelihood of being based on an erroneous relationship with noise; determining, using the generated forward learning model, a quality-score for each erroneous pair of discrete series of texts of the plurality of erroneous pairs of discrete series of texts according to the likelihood of being based an erroneous relationship with noise; ranking said each erroneous pair of discrete series of texts according to the quality-score; generating, based on the plurality of erroneous pairs of discrete series of texts, second training data by removing at least the ranked said each erroneous pair of discrete series of texts for learning according to the quality-score; and updating the forward learning model as iterative training of the forward learning model using second training data.

2. The computer-implemented method of claim 1, the method further comprising: converting, using a neural network, one or more discrete symbols in the third discrete series of input data of the erroneous pair of discrete series of texts into one or more fixed-length vectors; generating a fixed-length vector series based on encoding the converted one or more fixed-length vectors of the one or more discrete symbols in the third discrete series of input data; determining, based on the generated fixed-length vector series, a hidden vector; generating, based on the third discrete series of input data using the hidden vector, a fifth discrete series of output data; determining, using a negative log-likelihood of the generated fifth discrete series of output data associated with the generated fixed-length vector series, the quality-score for the fourth discrete series of output data; and providing the quality-score for excluding, based on the quality-score, the erroneous pair of discrete series of texts for training the forward learning model.

3. The computer-implemented method of claim 1, the method further comprises: determining a plurality of quality-scores for the plurality of the erroneous pairs; and selecting, based on a descending order of the determined plurality of quality-scores, a predetermined number of the erroneous pairs of discrete series of texts from the plurality of the erroneous pairs of discrete series of texts.

4. The computer-implemented method of claim 1, the method further comprising: generating, based on training, a reverse learning model for converting the second discrete model of input data into the first discrete series of output data; and determining, using both of the generated forward learning model and the generated reverse learning model, the quality-score for the erroneous pair of discrete series of texts.

5. The computer-implemented method of claim 1, wherein the first discrete series of input data is based on a first language, wherein the second discrete series of output data is based on a second language, and the first language and the second language are distinct.

6. The computer-implemented method of claim 1, wherein the first discrete series of input data is a set of texts in a document, and wherein the second discrete series of output data is a summary of the document.

7. The computer-implemented method of claim 2, wherein the quality-score for the fourth discrete series of output data relates to a negative log-likelihood per discrete symbol, and wherein the negative log-likelihood per discrete symbol is based on dividing the negative log-likelihood of the generated fifth discrete series of output data by a number of discrete symbols included in the generated fifth discrete series of output data.

8. A system for evaluating aspects of discrete series of texts, the system comprises: a processor; and a memory storing computer-executable instructions that when executed by the processor cause the system to: receive a set of pairs of discrete series of texts for learning, wherein each pair comprises a first discrete series of input data and a second discrete series of output data, and wherein the first discrete series of input data and the second discrete series of output data are based on a correct relationship without noise; generate, based on training, a forward learning model as training of the forward learning model using the set of pairs of discrete series of tests for learning as first training data, wherein the forward learning model, after being trained, converts the first discrete model of input data into the second discrete series of output data according to the correct relationship; receive an erroneous pair of discrete series of texts of a plurality of erroneous pairs of discrete series of texts for learning, wherein the erroneous pair of discrete series of texts comprises a third discrete series of input data and a fourth discrete series of output data, and wherein the third discrete series of input data and the fourth discrete series of output data have a likelihood of being based on an erroneous relationship with noise; determine, using the generated forward learning model, a quality-score for each erroneous pair of discrete series of texts of the plurality of erroneous pairs of discrete series of texts according to the likelihood of being based an erroneous relationship with noise; rank said each erroneous pair of discrete series of texts according to the quality-score; generate, based on the plurality of erroneous pairs of discrete series of texts, second training data by removing at least the ranked said each erroneous pair of discrete series of texts for learning according to the quality-score; and update the forward learning model as iterative training of the forward learning model using second training data.

9. The system of claim 8, the computer-executable instructions when executed further causing the system to: convert, using a neural network, one or more discrete symbols in the third discrete series of input data of the erroneous pair of discrete series of texts into one or more fixed-length vectors; generate a fixed-length vector series based on encoding the converted one or more fixed-length vectors of the one or more discrete symbols in the third discrete series of input data; determine, based on the generated fixed-length vector series, a hidden vector; generate, based on the third discrete series of input data using the hidden vector, a fifth discrete series of output data; determine, using a negative log-likelihood of the generated fifth discrete series of output data associated with the generated fixed-length vector series, the quality-score for the fourth discrete series of output data; and provide the quality-score for excluding, based on the quality-score, the erroneous pair of discrete series of texts for training the forward learning model.

10. The system of claim 8, the computer-executable instructions when executed further causing the system to: determine a plurality of quality-scores for the plurality of erroneous pairs; and select, based on a descending order of the determined plurality of quality-scores, a predetermined number of the erroneous pairs of discrete series of texts from the plurality of erroneous pairs of discrete series of texts.

11. The system of claim 8, the computer-executable instructions when executed further causing the system to: generate, based on training, a reverse learning model for converting the second discrete model of input data into the first discrete series of output data; and determine, using both of the generated forward learning model and the generated reverse learning model, the quality-score for the erroneous pair of discrete series of texts.

12. The system of claim 8, wherein the first discrete series of input data is based on a first language, wherein the second discrete series of output data is based on a second language, and the first language and the second language are distinct.

13. The system of claim 8, wherein the first discrete series of input data is a set of texts in a document, and wherein the second discrete series of output data is a summary of the document.

14. The system of claim 9, wherein the quality-score for the fourth discrete series of output data relates to a negative log-likelihood per discrete symbol, and wherein the negative log-likelihood per discrete symbol is based on dividing the negative log-likelihood of the generated fifth discrete series of output data by a number of discrete symbols included in the generated fifth discrete series of output data.

15. A computer-readable non-transitory recording medium storing computer-executable instructions that when executed by a processor cause a computer system to: receive a set of pairs of discrete series of texts for learning, wherein each pair comprises a first discrete series of input data and a second discrete series of output data, and wherein the first discrete series of input data and the second discrete series of output data are based on a correct relationship without noise; generate, based on training, a forward learning model as training of the forward learning model using the set of pairs of discrete series of tests for learning as first training data, wherein the forward learning model, after being trained, converts the first discrete model of input data into the second discrete series of output data according to the correct relationship; receive an erroneous pair of discrete series of texts of a plurality of erroneous pairs of discrete series of texts for learning, wherein the erroneous pair of discrete series of texts comprises a third discrete series of input data and a fourth discrete series of output data, and wherein the third discrete series of input data and the fourth discrete series of output data have a likelihood of being based on an erroneous relationship with noise; determine, using the generated forward learning model, a quality-score for each erroneous pair of discrete series of texts of the plurality of erroneous pairs of discrete series of texts according to the likelihood of being based an erroneous relationship with noise; rank said each erroneous pair of discrete series of texts according to the quality-score; generate, based on the plurality of erroneous pairs of discrete series of texts, second training data by removing at least the ranked said each erroneous pair of discrete series of texts for learning according to the quality-score; and update the forward learning model as iterative training of the forward learning model using second training data.

16. The computer-readable non-transitory recording medium of claim 15, the computer-executable instructions when executed further causing the system to: convert, using a neural network, one or more discrete symbols in the third discrete series of input data of the erroneous pair of discrete series of texts into one or more fixed-length vectors; generate a fixed-length vector series based on encoding the converted one or more fixed-length vectors of the one or more discrete symbols in the third discrete series of input data; determine, based on the generated fixed-length vector series, a hidden vector; generate, based on the third discrete series of input data using the hidden vector, a fifth discrete series of output data; determine, using a negative log-likelihood of the generated fifth discrete series of output data associated with the generated fixed-length vector series, the quality-score for the fourth discrete series of output data; and provide the quality-score for excluding, based on the quality-score, the erroneous pair of discrete series of texts for training the forward learning model.

17. The computer-readable non-transitory recording medium of claim 15, the computer-executable instructions when executed further causing the system to: determine a plurality of quality-scores for the plurality of erroneous pairs; and select, based on a descending order of the determined plurality of quality-scores, a predetermined number of the erroneous pairs of discrete series of texts from the plurality of erroneous pairs of discrete series of texts.

18. The computer-readable non-transitory recording medium of claim 15, the computer-executable instructions when executed further causing the system to: generate, based on training, a reverse learning model for converting the second discrete model of input data into the first discrete series of output data; and determine, using both of the generated forward learning model and the generated reverse learning model, the quality-score for the erroneous pair of discrete series of texts.

19. The computer-readable non-transitory recording medium of claim 15, wherein the first discrete series of input data is based on a first language, wherein the second discrete series of output data is based on a second language, and the first language and the second language are distinct.

20. The computer-readable non-transitory recording medium of claim 16, wherein the quality-score for the fourth discrete series of output data relates to a negative log-likelihood per discrete symbol, and wherein the negative log-likelihood per discrete symbol is based on dividing the negative log-likelihood of the generated fifth discrete series of output data by a number of discrete symbols included in the generated fifth discrete series of output data.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) FIG. 1 is a block diagram illustrating an example of a functional configuration of a learning quality estimation device according to a first embodiment.

(2) FIG. 2 is a block diagram illustrating an example of the configuration of a quality-score calculating unit according to the embodiment.

(3) FIG. 3 is a flowchart illustrating an example of a flow of processing of a learning quality estimation program according to a first embodiment.

(4) FIG. 4 is a block diagram illustrating an example of a functional configuration of a learning quality estimation device according to a second embodiment.

(5) FIG. 5 is a diagram illustrating various example of conversion cases in natural language processing.

(6) FIG. 6 is a diagram illustrating an example of encoding by an encoder and decoding by a decoder.

(7) FIG. 7 is a diagram illustrating an example of an attention-based discrete series-to-discrete series converter.

DESCRIPTION OF EMBODIMENTS

(8) An example of a mode for carrying out the present invention is explained in detail below with reference to the drawings.

First Embodiment

(9) In the following explanation in this embodiment, a quality score is given to data for translation learning and selected using the attention-based discrete series-to-discrete series converter described in NPL 2 described above. For the discrete series-to-discrete series converter, various methods (for example, see NPL 3) are conceivable other than the above. In this embodiment, any converter may be used.

(10) [NPL 3] Dzmitry Bandanau, Kyunghyun Cho and Yoshua Bengio, Neural Machine Translation by Jointly Learning to Align and Translate. Proceedings of the 3rd International Conference on Learning Representations, 2015.

(11) Note that, in this embodiment, an input sentence of a translations source language and an output sentence of a translation target language are applied as an example of an input and an output of a discrete series as illustrated in FIG. 5. In this case, a word included in the input sentence and a word included in the output sentence are applied to discrete symbols. Note that an input sentence before document summarization and an output sentence after the document summarization or an input sentence before document proofreading and an output document after the document proofreading may be applied as the input and the output of the discrete series.

(12) First, a definition of a case is explained. A group of sentences of a pair for translation learning including an input sentence and an output sentence not including an error (noise) is represented as (CF, CE) and a group of sentences of a pair for translation learning including an input sentence and an output sentence likely to include an error is represented as (NF, NE). In this embodiment, a learned model of a discrete series-to-discrete series converter is constructed based on the pair for translation learning not including an error. A quality score is given to the pair for translation learning likely to include an error using the constructed learned model of the discrete series-to-discrete series converter. A procedure of processing for giving a quality score according to this embodiment is as explained below.

(13) (Input)

(14) An input of a pair for translation learning likely to include an error is received.

(15) (Preprocessing)

(16) Each of an input sentence and an output sentence of the pair for translation learning is divided into word series using an existing word divider.

(17) (Quality Score Calculation)

(18) A quality score of the output sentence is calculated using a learned model of a discrete series-to-discrete series converter obtained by learning beforehand.

(19) (Repetition)

(20) The procedure described above is repeated for all pairs for translation learning to obtain quality scores for all the pairs for translation learning.

(21) (Selection)

(22) The pairs for translation learning are rearranged based on the obtained quality scores of the pairs for translation learning to select a predetermined number of pairs for translation learning in descending order of the quality scores.

(23) Next, an overview of learning processing performed using the discrete series-to-discrete series converter described in NPL 2 is explained.

(24) Here, the discrete series-to-discrete series converter is learned based on the pair for translation learning not including an error as explained above.

(25) Note that an i-th input sentence is represented as f.sub.iCF and an i-th output sentence is represented as e.sub.iCE. Types of input words are limited to frequently appearing V.sub.f words and types of output words are limited to frequently appearing V.sub.e words. Words not included in the frequently appearing V.sub.f words and the frequently appearing V.sub.e words may be replaced with dedicated tokens as unknown words.

(26) The discrete series-to-discrete series converter includes an encoder and a decoder as illustrated in FIG. 7. The encoder converts words included in the input sentence f.sub.i into fixed-length vectors e.sub.s. Thereafter, the encoder encodes the converted fixed-length vectors e.sub.s of the words respectively into fixed-length vector series h.sub.s.

(27) The decoder calculates a hidden layer (a hidden vector) h.sub.t based on the fixed-length vector series h.sub.s finally obtained by the encoder and decodes the output sentence e.sub.i based on the fixed-length vector series h.sub.s and the hidden layer h.sub.t. At that time, the decoder calculates, according to Expressions (1) to (4) described below, weight a.sub.t and a weighted sum c.sub.t to the fixed-length vector series h.sub.s output by the encoder at time steps. Subsequently, the decoder calculates
{tilde over (h)}.sub.t[Formula 1]

(28) from c.sub.t and h.sub.t according to Expression (5). Finally, the decoder predicts a probability distribution of output words using Expression (6) and uses the probability distribution for decoding. Note that these expressions are described in NPL 2.

(29) $\begin{matrix} [Formula 2] \\ \begin{matrix} a_{t} = align (h_{t}, {\overline{h}}_{s}) \\ = \frac{\exp (score (h_{t}, {\overline{h}}_{s}))}{{.Math.}_{s^{}} \exp (score (h_{t}, {\overline{h}}_{s^{}}))} \end{matrix} & \begin{matrix} (1) \\ (2) \end{matrix} \\ score (h_{t}, h_{s}) = h_{t}^{T} W_{a} {\overline{h}}_{s} & (3) \\ c_{t} = {.Math.}_{j = 1}^{.Math. f_{i} .Math.} a_{tj} h_{sj} & (4) \\ {\overline{h}}_{t} = \tanh (W_{c} [c_{t}; h_{t}]) & (5) \\ p (e_{i, t} | e_{i, < t}, f_{i}) = softmax (W_{s} {\tilde{h}}_{t}) & (6) \end{matrix}$

(30) Here,
h.sub.s[Formula 3]

(31) represents the fixed-length vector series output by the encoder at the time steps. The decoder performs weighting on outputs of the encoder at the time steps based on the vector series.
h.sub.t.sup.T[Formula 4]

(32) represents transposition of the hidden layer h.sub.t.
W.sub.a[Formula 5]

(33) represents a model parameter.
{tilde over (h)}.sub.t[Formula 6]

(34) represents a fixed-length vector series output by the decoder at time step t.
W.sub.s[Formula 7]

(35) represents a model parameter.
W.sub.c[Formula 8]

(36) represents a model parameter.
e.sub.i,t[Formula 9]

(37) represents a t-th word of the output sentence e.sub.i.

(38) Various configurations are conceivable on the inside of the encoder. The encoder is constructed by a long and short term storage memory (LSTM) herein. However, the encoder may have other configurations such as a recurrent neural network (RNN) and a gated recurrent unit (GRU). A parameter .sub.fe in the encoder and the decoder is determined using the pair for translation learning (CF, CE) not including an error as explained above. The parameter .sub.fe is fixed after learning. This parameter .sub.fe determines accuracy of encoding.

(39) A parameter learning method may be basically the same as the conventional series structure-to-series structure converter. An input sentence of a pair for learning is input to the discrete series-to-discrete series converter and an output sentence is obtained based on parameters of the encoder and the decoder. At that time, if the same output sentence as an output sentence (correct answer data) of the pair for learning is obtained, it is considered that the present parameters are successfully adjusted. On the other hand, if the output sentence is not the same as the correct answer data, processing for adjusting the parameters in a direction in which a correct answer is obtained is performed. Finally, a parameter search is performed in the direction in which the correct answer can be obtained in all pairs for learning to adjust the parameters.

(40) Next, the configuration of a learning quality estimation device according to the first embodiment is explained with reference to FIG. 1

(41) FIG. 1 is a block diagram illustrating an example of a functional configuration of a learning quality estimation device 90 according to the first embodiment.

(42) As illustrated in FIG. 1, the learning quality estimation device 90 according to this embodiment includes, in terms of functions, a computing unit 10, an input unit 20, and an output unit 30.

(43) The learning quality estimation device 90 according to this embodiment is electrically configured as a computer including a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), and an HDD (Hard Disk Drive). A learning quality estimation program according to this embodiment is stored in the ROM. Note that the learning quality estimation program may be stored in the HDD.

(44) The learning quality estimation program may be installed in, for example, the learning quality estimation device 90 in advance. This learning quality estimation program may be realized by being stored in a nonvolatile storage medium or distributed via a network and installed in the learning quality estimation device 90 as appropriate. Note that examples of the nonvolatile storage medium include a CD-ROM (Compact Disc Read Only Memory), a magneto-optical disk, a DVD-ROM (Digital Versatile Disc Read Only Memory), a flash memory, and a memory card.

(45) The CPU functions as the computing unit 10, the input unit 20, and the output unit 30 by reading and executing the learning quality estimation program stored in the ROM.

(46) The computing unit 10 according to this embodiment is configured by a dividing unit 12, a quality-score calculating unit 14, a storing unit 16, and a selecting unit 18.

(47) In the storing unit 16, a forward direction learned model of a discrete series-to-discrete series converter (hereinafter simply referred to as discrete series converter), which converts an input sentence of a discrete series into an output sentence, learned in advance based on a plurality of first pairs for learning in which the input sentence and the output sentence of the discrete series are in a correct correspondence relation. That is, the forward direction learned model of the discrete series converter is generated based on the parameter .sub.fe indicated by a fixed-length vector obtained by machine learning of the discrete series converter in advance with the plurality of first pairs for learning as an input. This parameter .sub.fe is used for conversion of a discrete series from the input sentence to the output sentence by the discrete series converter. Note that the input sentence is an example of a first discrete series and the output sentence is an example of a second discrete series.

(48) The input unit 20 according to this embodiment receives an input of a group of sentences formed by a plurality of second pairs for learning including input sentences and output sentences likely to include an error in a correspondence relation.

(49) The dividing unit 12 according to this embodiment divides each of the input sentences and the output sentences included in the pairs of the plurality of second pairs for learning, the input of which is received by the input unit 20, into word series using an existing word divider.

(50) The quality-score calculating unit 14 according to this embodiment calculates, with, as an input, the plurality of second pairs of learning divided into the word series by the dividing unit 12, quality scores for the pairs of the plurality of second pairs for learning using the forward direction learned model stored in the storing unit 16.

(51) The selecting unit 18 according to this embodiment selects, out of the plurality of second pairs for learning, a predetermined number of second pairs for learning in descending order of the quality scores calculated by the quality-score calculating unit 14. A predetermined number n (n is an integer equal to or larger than 1) can be set as appropriate by a user.

(52) The output unit 30 according to this embodiment outputs the n second pairs for learning selected by the selecting unit 18. As an output destination of the output unit 30, a display unit such as a liquid crystal display (LCD) or an organic EL (Electro Luminescence) display is applied.

(53) Next, a specific configuration of the quality-score calculating unit 14 according to this embodiment is explained with reference to FIG. 2.

(54) FIG. 2 is a block diagram illustrating an example of the configuration of the quality-score calculating unit 14 according to this embodiment.

(55) As illustrated in FIG. 2, the quality-score calculating unit 14 according to this embodiment includes an encoder 14A, a decoder 14B, and a likelihood calculating unit 14C. Note that the encoder 14A and the decoder 14B configure the discrete series converter.

(56) The encoder 14A according to this embodiment converts words of an input sentence included in a second pair for learning into fixed-length vectors using the forward direction learned model stored in the storing unit 16 and encodes the converted fixed-length vectors of the words to obtain each of fixed-length vector series.

(57) The decoder 14B according to this embodiment calculates a hidden vector based on the fixed-length vector series obtained by the encoder 14A and obtains an output sentence with respect to the input sentence based on each of the fixed-length vector series, the hidden vector, and weight to each of the fixed-length vector series.

(58) The likelihood calculating unit 14C according to this embodiment calculates, concerning the output sentence included in the second pair for learning, from the input sentence included in the second pair for learning, with, as an input, the fixed length vector series obtained by the encoder 14A, based on the fixed-length vector series obtained by the decoder 14B, a negative log likelihood at the time when the output sentence is obtained from the decoder 14B. The likelihood calculating unit 14C outputs, as a quality score, a negative log likelihood per word obtained by dividing the negative log likelihood of the output sentence included in the second pair for learning by the number of words of the output sentence.

(59) That is, when the input sentence of the second pair for learning is represented as f.sub.iNF, the output sentence is represented as e.sub.iNE, the fixed-length vector series is represented as
{tilde over (h)}.sub.t,[Formula 10]

(60) a fixed-length vector series obtained by encoding the input sentence f.sub.i with the encoder is represented as h.sub.s, a parameter used for conversion of a discrete series from the input sentence into the output sentence is represented as .sub.fe, and a conditional probability is represented as p, with respect to e.sub.i, based on the fixed-length vector series
{tilde over (h)}.sub.t[Formula 11]
obtained by the decoder 14B from the fixed-length vector series h.sub.s,

(61) a negative log likelihood J at the time when the output sentence e.sub.i is obtained from the decoder 14B is calculated according to Expression (7) and Expression (8) described below.

(62) $\begin{matrix} [Formula 12] \\ \begin{matrix} J = - \log p (e_{i} | f_{i}) \\ - \log p (e_{i} | {\tilde{h}}_{t};_{fe}) \end{matrix} & \begin{matrix} (7) \\ (8) \end{matrix} \end{matrix}$

(63) Further, when the number of words of the output sentence e.sub.i is represented as |e.sub.i|, the negative log likelihood J obtained as described above is divided by the number of words |e.sub.i| of the output sentence e.sub.i. By performing standardization using the number of words in this way, a difference between a long sentence and a short sentence can be reflected on a quality score. In this case, a log likelihood J.sub.w per word is calculated by Expression (9) described below. Note that the conditional probability p of Expression (8) can be calculated by calculating output probabilities of words constituting e.sub.i based on, for example, the probability distribution of Expression (6) and calculating a product of the output probabilities.

(64) $\begin{matrix} [Formula 13] \\ J_{w} = \frac{J}{.Math. e_{i} .Math.} & (9) \end{matrix}$

(65) The log likelihood J.sub.w per word obtained as explained above is output as a quality score of the second pair for learning. The quality score is calculated for all pairs included in the group of sentences (NF, NE) including the plurality of second pairs for learning.

(66) Further, rearrangement of the plurality of second pairs for learning is performed based on quality scores obtained for all of the plurality of second pairs for learning. High score n (n1) second pairs for learning after the rearrangement are used for machine learning.

(67) Note that, in the above explanation, as an example of the discrete series converter, the attention-based discrete series-to-discrete series converter illustrated in FIG. 7 is applied. However, a general discrete series-to-discrete series converter illustrated in FIG. 6 may be applied. In this case, the encoder 14A converts the words of the input sentence included in the second pair for learning into fixed-length vectors using the forward direction learned model stored in the storing unit 16 and encodes the converted fixed-length vectors of the words to obtain a fixed-length vector series. The decoder 14B calculates a hidden vector based on the fixed-length vector series obtained by the encoder 14A and obtains an output sentence with respect to the input sentence based on the calculated hidden vector.

(68) Next, action of the learning quality estimation device 10 according to the first embodiment is explained with reference to FIG. 3. Note that FIG. 3 is a flowchart illustrating an example of a flow of processing of the learning quality estimation program according to the first embodiment.

(69) In step 100 in FIG. 3, the input unit 20 receives an input of a second pair for learning including an input sentence and an output sentence likely to include an error.

(70) In step 102, the dividing unit 12 divides each of the input sentence and the output sentence included in the second pair for learning, the input of which is received in step 100, into word series.

(71) In step 104, the quality-score calculating unit 14 calculates, concerning the second pair for learning divided into the word series in step 102, using the forward direction learned model stored in the storing unit 16, a likelihood of the output sentence included in the second pair for learning and calculates a quality score from the calculated likelihood.

(72) In step 106, the quality-score calculating unit 14 determines whether the quality score calculated in step 104 is obtained for all of second pairs for learning. When it is determined that the quality score is obtained (in the case of affirmative determination), the processing shifts to step 108. When it is determined that the quality score is not obtained (in the case of negative determination), the processing returns to step 100 and is repeated.

(73) In step 108, the selecting unit 18 selects n second pairs for learning in descending order of the quality scores calculated by the processing. The series of processing by this learning quality estimation program is ended.

(74) In this way, according to this embodiment, it is possible to remove a wrong pair for learning by calculating, using a learned model of the discrete series converter, a quality score for a pair for learning likely to include an error. Consequently, it is possible to reduce an adverse effect on machine learning due to the wrong pair for learning, improve accuracy of the machine learning, and further reduce a time required for the machine learning.

Second Embodiment

(75) In the first embodiment, the parameter .sub.fe in the forward direction from the input sentence to the output sentence of the discrete series converter is learned and used using the group of sentences (CF, CE) of the first pair for learning not including an error. In this embodiment, a parameter .sub.ef in the backward direction from the output sentence to the input sentence of the discrete series converter is also learned and used. Consequently, it is possible to achieve further improvement. In this case, for example, a sum of quality scores J.sub.w obtained in both the directions can be calculated and uses for selection.

(76) FIG. 4 is a block diagram illustrating an example of a functional configuration of a learning quality estimation device 92 according to the second embodiment.

(77) As illustrated in FIG. 4, the learning quality estimation device 92 according to this embodiment includes, in terms of functions, the computing unit 10, the input unit 20, and the output unit 30. Note that components having the same functions as the functions of the learning quality estimation device 90 according to the first embodiment are denoted by the same reference numerals and signs and repeated explanation of the components is omitted.

(78) In a storing unit 16A, as in the first embodiment, a forward direction learned model of a discrete series converter, which converts an input sentence of a discrete series into an output sentence, learned in advance based on a plurality of first pairs for learning is stored. In a storing unit 16B, a backward direction learned model of a discrete series converter, which converts an input sentence of a discrete series into an output sentence, learned in advance based on the plurality of first pairs for learning is stored. This backward direction learned model of the discrete series converter is generated based on parameters used for conversion of the discrete series from the output sentence into the input sentence by the discrete series converter. Note that the storing unit 16A and the storing unit 16B may be configured as one storing unit.

(79) The quality-score calculating unit 14 according to this embodiment calculates, concerning a second pair for learning, a quality score using each of the forward direction learned model and the backward direction learned model. Specifically, the quality-score calculating unit 14 calculates, with a plurality of second pairs for learning as an input, using the forward direction learned model stored in the storing unit 16A, quality stores of output sentences included in pairs of the plurality of second pairs for learning and further calculates, using the backward direction learned model stored in the storing unit 16B, quality scores of input sentences included in pairs of the plurality of second pairs.

(80) Here, like the quality-score calculating unit 14 illustrated in FIG. 2, the quality-score calculating unit 14 according to this embodiment includes the encoder 14A, the decoder 14B, and the likelihood calculating unit 14C.

(81) The encoder 14A according to this embodiment converts words of an input sentence included in a second pair for learning into fixed-length vectors using the forward direction learned model stored in the storing unit 16A and encodes the converted fixed-length vectors of the words to obtain each of fixed-length vector series. The encoder 14A converts words of an output sentence included in the second pair for learning into fixed-length vectors using the backward direction learned model stored in the storing unit 16B and encodes the converted fixed-length vectors of the words to obtain each of fixed-length vector series.

(82) The decoder 14B according to this embodiment calculates a hidden vector based on the fixed-length vector series obtained by the encoder 14A and obtains an output sentence with respect to the input sentence based on each of the fixed-length vector series, the hidden vector, and weight to each of the fixed-length vector series. The decoder 14B calculates a hidden vector based on the fixed-length vector series obtained by the encoder 14A and obtains an input sentence with respect to the output sentence based on each of the fixed-length vector series, the hidden vector, and weight to each of the fixed-length vector series.

(83) The likelihood calculating unit 14C according to this embodiment calculates, concerning the output sentence included in the second pair for learning, from the input sentence included in the second pair for learning, with, as an input, the fixed length vector series obtained by the encoder 14A, based on the fixed-length vector series obtained by the decoder 14B, a negative log likelihood at the time when the output sentence is obtained from the decoder 14B. Specifically, the likelihood calculating unit 14C calculates the negative log likelihood using Expression (7) and Expression (8) described above. The likelihood calculating unit 14C outputs, as a quality score in the forward direction, a negative log likelihood per word obtained by dividing the negative log likelihood of the output sentence included in the second pair for learning by the number of words of the output sentence. The likelihood calculating unit 14C calculates, concerning the input sentence included in the second pair for learning, from the output sentence included in the second pair for learning, with, as an input, the fixed length vector series obtained by the encoder 14A, based on the fixed-length vector series obtained by the decoder 14B, a negative log likelihood at the time when the input sentence is obtained from the decoder 14B. In this case, the relation between the input sentence and the output sentence only has to be reversed in Expression (7) and Expression (8) described above. The likelihood calculating unit 14C outputs, as a quality score in the backward direction, a negative log likelihood per word obtained by dividing the negative log likelihood of the input sentence included in the second pair for learning by the number of words of the input sentence.

(84) The selecting unit 18 according to this embodiment selects, out of the plurality of second pairs for learning, a predetermined number of second pairs for learning in descending order of the quality scores calculated by the quality-score calculating unit 14. Note that, in this embodiment, a sum of the quality score in the forward direction obtained when the forward direction learned model is used and the quality score in the backward direction obtained when the backward direction learned model is used is applied as the quality score.

(85) In this way, according to this embodiment, using the learned models concerning both the directions of the forward direction from the input sentence to the output sentence and the backward direction from the output sentence to the input sentence, the quality scores are calculated from both the directions concerning the pair for learning likely to include an error. Consequently, it is possible to more accurately remove a wrong pair for learning.

(86) As explained above, in general, it is premised that wrong data is not mixed in learning data of a discrete series-to-discrete series converter used in natural language processing. However, in reality, in many cases, wrong data is mixed and sometimes adversely affects machine learning.

(87) On the other hand, the quality of the learning data can be estimated by using the embodiments explained above. The wrong data adversely affecting the machine learning can be removed by selecting the learning data based on the quality. Therefore, it is possible to improve learning accuracy of the machine learning by applying the embodiments.

(88) The learning quality estimation devices are illustrated and explained as the embodiments above. The embodiments may be a form of a program for causing a computer to function as the units included in the learning quality estimation device. The embodiments may be a form of a computer-readable storage medium storing the program.

(89) Besides, the configurations of the learning quality estimation devices explained in the embodiments are examples and may be changed according to a situation in a range not departing from the gist.

(90) The flows of the programs explained in the embodiments are also examples. Unnecessary steps may be deleted, new steps may be added, and the processing order may be changed in a range not departing from the gist.

(91) In the explanation in the embodiments, by executing the programs, the processing according to the embodiments is realized by the software configuration using the computer. However, the embodiments are not limited to this. The embodiments may be realized by, for example, a hardware configuration or a combination of the hardware configuration and the software configuration.

REFERENCE SIGNS LIST

(92) 10 Computing unit 12 Dividing unit 14 Quality-score calculating unit 14A Encoder 14B Decoder 14C Likelihood calculating unit 16, 16A, 16B Storing unit 18 Selecting unit 20 Input unit 30 Output unit 90, 92 Learning quality estimation device

Learning quality estimation device, method, and program

Assignee

Inventors

Cpc classification

Classification Explorer

G06F40/45

PHYSICS

Classification Explorer

G06N20/00

PHYSICS

Classification Explorer

G05B2219/32193

PHYSICS

Classification Explorer

G06N3/0455

PHYSICS

Classification Explorer

G06N3/08

PHYSICS

Classification Explorer

G06F40/40

PHYSICS

Classification Explorer

G05B2219/32188

PHYSICS

Classification Explorer

G05B2219/32194

PHYSICS

Classification Explorer

G06N3/0499

PHYSICS

Classification Explorer

G06F40/20

PHYSICS

Classification Explorer

G05B2219/32195

PHYSICS

Classification Explorer

G06F16/3337

PHYSICS

Classification Explorer

G06N3/04

PHYSICS

Classification Explorer

G05B2219/32018

PHYSICS

Classification Explorer

G06F40/30

PHYSICS

International classification

Classification Explorer

G06F16/33

PHYSICS

Classification Explorer

G06F16/3332

PHYSICS

Classification Explorer

G06F40/20

PHYSICS

Classification Explorer

G06F40/30

PHYSICS

Classification Explorer

G06F40/40

PHYSICS

Classification Explorer

G06N20/00

PHYSICS

Classification Explorer

G06N3/0499

PHYSICS

Classification Explorer

G06N3/08

PHYSICS

Abstract

Claims

Description