Confusion network distributed representation generation apparatus, confusion network classification apparatus, confusion network distributed representation generation method, confusion network classification method and program
11556783 · 2023-01-17
CPC classification: G10L15/10 (PHYSICS)
Abstract
There is provided a technique for transforming a confusion network into a representation that can be used as an input for machine learning. The apparatus includes a confusion network distributed representation sequence generating part that generates a confusion network distributed representation sequence, which is a vector sequence, from an arc word set sequence and an arc weight set sequence constituting the confusion network. The confusion network distributed representation sequence generating part comprises: an arc word distributed representation set sequence transforming part that obtains an arc word distributed representation set by transforming each arc word included in an arc word set to a word distributed representation, and thereby generates an arc word distributed representation set sequence; and an arc word distributed representation set weighting/integrating part that generates the confusion network distributed representation sequence from the arc word distributed representation set sequence and the arc weight set sequence.
Claims
1. A confusion network distributed representation generation apparatus comprising: processing circuitry that, when T is assumed to be an integer equal to or larger than 1, and W.sub.t=(w.sub.t1, w.sub.t2, . . . , w.sub.tN_t) (1≤t≤T) and C.sub.t=(c.sub.t1, c.sub.t2, . . . , c.sub.tN_t) (1≤t≤T) are assumed to be a t-th arc word set constituting a confusion network (wherein w.sub.tn (1≤n≤N.sub.t; N.sub.t is an integer equal to or larger than 1) denotes an arc word included in the arc word set W.sub.t) and a t-th arc weight set constituting the confusion network (wherein c.sub.tn (1≤n≤N.sub.t) denotes an arc weight corresponding to the arc word w.sub.tn), respectively, generates a confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T, which is a vector sequence, from an arc word set sequence W.sub.1, W.sub.2, . . . , W.sub.T and an arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T constituting the confusion network; wherein the processing circuitry is configured to: by transforming the arc word w.sub.tn included in the arc word set W.sub.t to a word distributed representation ω.sub.tn, obtain an arc word distributed representation set Ω.sub.t=(ω.sub.t1, ω.sub.t2, . . . , ω.sub.tN_t) and generate an arc word distributed representation set sequence Ω.sub.1, Ω.sub.2, . . . , Ω.sub.T; and generate the confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T from the arc word distributed representation set sequence Ω.sub.1, Ω.sub.2, . . . , Ω.sub.T and the arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T.
2. The confusion network distributed representation generation apparatus according to claim 1, wherein the processing circuitry is configured to generate, from a word sequence w.sub.1, w.sub.2, . . . , w.sub.T, the arc word set sequence W.sub.1, W.sub.2, . . . , W.sub.T and the arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T constituting the confusion network by the following formula:
W.sub.t=(w.sub.t)(1≤t≤T)
C.sub.t=(1)(1≤t≤T) [Formula 10]
3. A confusion network classification apparatus comprising: processing circuitry that, when T is assumed to be an integer equal to or larger than 1, and W.sub.t=(w.sub.t1, w.sub.t2, . . . , w.sub.tN_t) (1≤t≤T) and C.sub.t=(c.sub.t1, c.sub.t2, . . . , c.sub.tN_t) (1≤t≤T) are assumed to be a t-th arc word set constituting a confusion network (wherein w.sub.tn (1≤n≤N.sub.t; N.sub.t is an integer equal to or larger than 1) denotes an arc word included in the arc word set W.sub.t) and a t-th arc weight set constituting the confusion network (wherein c.sub.tn (1≤n≤N.sub.t) denotes an arc weight corresponding to the arc word w.sub.tn), respectively, generates a confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T, which is a vector sequence, from an arc word set sequence W.sub.1, W.sub.2, . . . , W.sub.T and an arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T constituting the confusion network; and estimates a class label showing a class of the confusion network, from the confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T; wherein the processing circuitry is configured to: by transforming the arc word w.sub.tn included in the arc word set W.sub.t to a word distributed representation ω.sub.tn, obtain an arc word distributed representation set Ω.sub.t=(ω.sub.t1, ω.sub.t2, . . . , ω.sub.tN_t) and generate an arc word distributed representation set sequence Ω.sub.1, Ω.sub.2, . . . , Ω.sub.T; and generate the confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T from the arc word distributed representation set sequence Ω.sub.1, Ω.sub.2, . . . , Ω.sub.T and the arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T.
4. The confusion network classification apparatus according to claim 3, wherein the processing circuitry is configured to implement two neural networks to respectively generate the confusion network distributed representation sequence and estimate the class label; and parameters of the neural network that generates the confusion network distributed representation sequence and parameters of the neural network that estimates the class label are learned with the two neural networks as one neural network obtained by combining the two neural networks.
5. A confusion network distributed representation generation method comprising a confusion network distributed representation sequence generating step of, when T is assumed to be an integer equal to or larger than 1, and W.sub.t=(w.sub.t1, w.sub.t2, . . . , w.sub.tN_t) (1≤t≤T) and C.sub.t=(c.sub.t1, c.sub.t2, . . . , c.sub.tN_t) (1≤t≤T) are assumed to be a t-th arc word set constituting a confusion network (wherein w.sub.tn (1≤n≤N.sub.t; N.sub.t is an integer equal to or larger than 1) denotes an arc word included in the arc word set W.sub.t) and a t-th arc weight set constituting the confusion network (wherein c.sub.tn (1≤n≤N.sub.t) denotes an arc weight corresponding to the arc word w.sub.tn), respectively, a confusion network distributed representation generation apparatus generating a confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T, which is a vector sequence, from an arc word set sequence W.sub.1, W.sub.2, . . . , W.sub.T and an arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T constituting the confusion network; wherein the confusion network distributed representation sequence generating step comprises: an arc word distributed representation set sequence transforming step of, by transforming the arc word w.sub.tn included in the arc word set W.sub.t to a word distributed representation ω.sub.tn, obtaining an arc word distributed representation set Ω.sub.t=(ω.sub.t1, ω.sub.t2, . . . , ω.sub.tN_t) and generating an arc word distributed representation set sequence Ω.sub.1, Ω.sub.2, . . . , Ω.sub.T; and an arc word distributed representation set weighting/integrating step of generating the confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T from the arc word distributed representation set sequence Ω.sub.1, Ω.sub.2, . . . , Ω.sub.T and the arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T.
6. A confusion network classification method comprising a confusion network distributed representation sequence generating step of, when T is assumed to be an integer equal to or larger than 1, and W.sub.t=(w.sub.t1, w.sub.t2, . . . , w.sub.tN_t) (1≤t≤T) and C.sub.t=(c.sub.t1, c.sub.t2, . . . , c.sub.tN_t) (1≤t≤T) are assumed to be a t-th arc word set constituting a confusion network (wherein w.sub.tn (1≤n≤N.sub.t; N.sub.t is an integer equal to or larger than 1) denotes an arc word included in the arc word set W.sub.t) and a t-th arc weight set constituting the confusion network (wherein c.sub.tn (1≤n≤N.sub.t) denotes an arc weight corresponding to the arc word w.sub.tn), respectively, a confusion network classification apparatus generating a confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T, which is a vector sequence, from an arc word set sequence W.sub.1, W.sub.2, . . . , W.sub.T and an arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T constituting the confusion network; and a class label estimating step of the confusion network classification apparatus estimating a class label showing a class of the confusion network, from the confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T; wherein the confusion network distributed representation sequence generating step comprises: an arc word distributed representation set sequence transforming step of, by transforming the arc word w.sub.tn included in the arc word set W.sub.t to a word distributed representation ω.sub.tn, obtaining an arc word distributed representation set Ω.sub.t=(ω.sub.t1, ω.sub.t2, . . . , ω.sub.tN_t) and generating an arc word distributed representation set sequence Ω.sub.1, Ω.sub.2, . . . , Ω.sub.T; and an arc word distributed representation set weighting/integrating step of generating the confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T from the arc word distributed representation set sequence Ω.sub.1, Ω.sub.2, . . . , Ω.sub.T and the arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T.
7. A computer program product comprising a non-transitory computer-readable medium storing a program that, when executed by processing circuitry of a computer, causes the computer to function as the confusion network distributed representation generation apparatus according to claim 1 or 2.
8. A computer program product comprising a non-transitory computer-readable medium storing a program that, when executed by processing circuitry of a computer, causes the computer to function as the confusion network classification apparatus according to claim 3 or 4.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE EMBODIMENTS
(10) Embodiments of the present invention will be described below in detail. Note that the same number will be given to components having the same function, and duplicate description will be omitted.
(11) Prior to description of the embodiments, a notation method in this specification will be described.
(12) Here, _(underscore) indicates a subscript. For example, x.sup.y_z indicates that y.sub.z is a superscript of x, and x.sub.y_z indicates that y.sub.z is a subscript of x.
(13) Next, a confusion network will be described. A confusion network is a structure that efficiently represents the hypothesis space at the time of speech recognition and is represented as a graph composed of nodes and arcs.
(14) Each arc of a confusion network obtained at the time of speech recognition corresponds to a word (hereinafter referred to as an arc word), and each word has a probability of being correct (hereinafter referred to as an arc weight).
(15) An important point is that, by locating each word whose start node (corresponding to start time) and end node (corresponding to end time) are the same as an arc between those same nodes (hereinafter, a set of such words will be referred to as an arc word set), it is possible to represent the hypothesis space at the time of speech recognition as a confusion network, which is a pair of an arc word set sequence and an arc weight set sequence (a sequence of arc weight sets corresponding to the arc word sets).
(16) For details of the confusion network, see Reference non-patent literature 1.
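As an illustration, the following minimal Python sketch shows this pair-of-sequences view of a confusion network; the words and weights are hypothetical, and the patent prescribes no particular data format.

```python
# A toy confusion network as a pair of sequences: at each position t,
# an arc word set W_t and a matching arc weight set C_t whose arc
# weights (probabilities of being correct) sum to 1.
confusion_network = {
    "arc_word_sets": [           # W_1, W_2, W_3
        ["i"],                   # W_1: one confident hypothesis
        ["want", "won't"],       # W_2: two competing hypotheses
        ["tea", "t"],            # W_3
    ],
    "arc_weight_sets": [         # C_1, C_2, C_3
        [1.0],
        [0.8, 0.2],
        [0.7, 0.3],
    ],
}

# Both sequences share the same length T, and each weight set is
# aligned element-wise with its arc word set.
word_sets = confusion_network["arc_word_sets"]
weight_sets = confusion_network["arc_weight_sets"]
assert len(word_sets) == len(weight_sets)
for W_t, C_t in zip(word_sets, weight_sets):
    assert len(W_t) == len(C_t) and abs(sum(C_t) - 1.0) < 1e-9
```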
First Embodiment
(17) A confusion network classification apparatus 100 will be described below. The confusion network classification apparatus 100 includes a confusion network distributed representation sequence generating part 110, a class label estimating part 120 and a recording part 190.
(18) The confusion network classification apparatus 100 takes a confusion network as an input. The confusion network is represented by two sequences, an arc word set sequence and an arc weight set sequence.
(19) The operation of the confusion network classification apparatus 100 will be described below.
(20) [Confusion Network Distributed Representation Sequence Generating Part 110]
(21) Input: a confusion network (an arc word set sequence and an arc weight set sequence)
(22) Output: a confusion network distributed representation sequence
(23) The confusion network distributed representation sequence generating part 110 generates a confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T from an arc word set sequence W.sub.1, W.sub.2, . . . , W.sub.T and an arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T (S110). Here, T is an integer equal to or larger than 1. Note that the arc word set sequence and the arc weight set sequence both have the same length T.
(24) The t-th arc word set W.sub.t (1≤t≤T) constituting the confusion network is represented by:
W.sub.t=(w.sub.t1,w.sub.t2, . . . ,w.sub.tN_t) [Formula 1]
(25) Here, w.sub.tn indicates an arc word of the n-th kind included in the arc word set W.sub.t (1≤n≤N.sub.t; N.sub.t is an integer equal to or larger than 1); and N.sub.t is the number of kinds of arc words included in the arc word set W.sub.t, which is a value that differs according to t.
(26) Similarly, the t-th arc weight set C.sub.t (1≤t≤T) constituting the confusion network is represented by:
C.sub.t=(c.sub.t1,c.sub.t2, . . . ,c.sub.tN_t) [Formula 2]
(27) Here, c.sub.tn indicates an arc weight of the n-th kind included in the arc weight set C.sub.t (1≤n≤N.sub.t). Note that the following formula is satisfied.
(28) Σ.sub.n=1.sup.N_t c.sub.tn=1 [Formula 3]
(29) Further, it is assumed that the arc weight c.sub.tn and the arc word w.sub.tn correspond to each other. That is, the arc weight c.sub.tn indicates a probability of the arc word w.sub.tn being correct.
(30) The confusion network distributed representation sequence generating part 110 will be described below in detail. The confusion network distributed representation sequence generating part 110 includes an arc word distributed representation set sequence transforming part 112 and an arc word distributed representation set weighting/integrating part 114.
(31) [Arc Word Distributed Representation Set Sequence Transforming Part 112]
(32) Input: an arc word set sequence
(33) Output: an arc word distributed representation set sequence
(34) By transforming the arc word w.sub.tn included in the arc word set W.sub.t to a word distributed representation ω.sub.tn, the arc word distributed representation set sequence transforming part 112 obtains an arc word distributed representation set Ω.sub.t=(ω.sub.t1, ω.sub.t2, . . . , ω.sub.tN_t) and generates an arc word distributed representation set sequence Ω.sub.1, Ω.sub.2, . . . , Ω.sub.T (S112). For each arc word w.sub.tn (1≤n≤N.sub.t) of the t-th arc word set W.sub.t (1≤t≤T), the arc word w.sub.tn, which is a symbol, is transformed to the word distributed representation ω.sub.tn, which is a vector, by the following formula:
ω.sub.tn=EMBEDDING(w.sub.tn) [Formula 5]
(35) Here, EMBEDDING(•) is a function of transforming a word to a word vector of a predetermined dimensionality; EMBEDDING(•) is, for example, a linear transformation function. Note that EMBEDDING(•) is not limited to linear transformation but may be any function that performs a similar transformation. For example, a transformation matrix for transforming a word to a word distributed representation can be used. The transformation matrix is a dictionary (a code book table) in which a corresponding vector is prepared for each word, and the number of dimensions and the values of the vector are decided when the dictionary is generated. Other examples include functions that calculate a word vector or a concept vector used in natural language processing. The word vector is a vector obtained by utilizing the co-occurrence frequency of words, and the concept vector is a vector obtained by compressing a word vector.
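For illustration, the transformation-matrix (dictionary) realization of EMBEDDING(•) can be sketched in Python as follows; the vocabulary, dimensionality and matrix values are hypothetical, and in practice the matrix would be learned or otherwise prepared when the dictionary is generated.

```python
import numpy as np

# EMBEDDING(.) as a table lookup: one row of the transformation
# matrix per vocabulary word (values here are random placeholders).
rng = np.random.default_rng(0)
vocab = {"i": 0, "want": 1, "won't": 2, "tea": 3, "t": 4}
embed_dim = 8
embedding_matrix = rng.normal(size=(len(vocab), embed_dim))

def embedding(word):
    """Transform an arc word (a symbol) to its word distributed
    representation (a vector): omega_tn = EMBEDDING(w_tn)."""
    return embedding_matrix[vocab[word]]

# Transforming the arc word set W_t = ("want", "won't") element-wise
# yields the arc word distributed representation set Omega_t.
Omega_t = [embedding(w) for w in ("want", "won't")]
```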
(36) [Arc Word Distributed Representation Set Weighting/Integrating Part 114]
(37) Input: an arc word distributed representation set sequence and an arc weight set sequence
(38) Output: a confusion network distributed representation sequence
(39) The arc word distributed representation set weighting/integrating part 114 generates the confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T from the arc word distributed representation set sequence Ω.sub.1, Ω.sub.2, . . . , Ω.sub.T and the arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T (S114). For each t (1≤t≤T), the arc word distributed representation set weighting/integrating part 114 generates the confusion network distributed representation U.sub.t from the arc word distributed representation set Ω.sub.t and the arc weight set C.sub.t. Specifically, by integrating the word distributed representations ω.sub.tn (1≤n≤N.sub.t) by weighting, the confusion network distributed representation U.sub.t is generated by the following formula:
(40) U.sub.t=Σ.sub.n=1.sup.N_t c.sub.tnω.sub.tn [Formula 6]
(41) By performing this integration processing for all of t (1≤t≤T), the confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T is obtained.
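A minimal Python sketch of this integration processing, using hypothetical toy vectors and weights, is as follows.

```python
import numpy as np

def integrate(Omega_t, C_t):
    """Weight each word distributed representation omega_tn by its arc
    weight c_tn and sum, yielding the confusion network distributed
    representation U_t."""
    return sum(c * omega for c, omega in zip(C_t, Omega_t))

# Toy data: two 4-dimensional word distributed representations.
Omega_t = [np.array([1.0, 0.0, 2.0, 0.0]),
           np.array([0.0, 1.0, 0.0, 2.0])]
C_t = [0.8, 0.2]
U_t = integrate(Omega_t, C_t)   # -> array([0.8, 0.2, 1.6, 0.4])

# Applying integrate() for every t (1 <= t <= T) gives the confusion
# network distributed representation sequence U_1, U_2, ..., U_T.
```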
(42) Next, the class label estimating part 120 estimates a class label showing a class of the confusion network from the confusion network distributed representation sequence generated at S110 (S120). The class label estimating part 120 will be described below in detail.
(43) [Class Label Estimating Part 120]
(44) Input: a confusion network distributed representation sequence
(45) Output: a class label
(46) The class label estimating part 120 estimates a class label L from the confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T (S120). By executing such processing as below for the confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T, the class label L is estimated.
h=NN(U.sub.1,U.sub.2, . . . ,U.sub.T)
P=DISCRIMINATE(h) [Formula 7]
(47) Here, h denotes a fixed-length real-valued vector obtained by transforming the confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T and indicates a feature value of the confusion network distributed representation sequence U.sub.1, U.sub.2, . . . , U.sub.T. This real-valued vector h is used as a feature at the time of estimating a class label. Note that it is assumed that a dimensionality of the real-valued vector h is determined in advance. Further, P denotes probability distribution indicating a posterior probability corresponding to each class to be a classification destination.
(48) Here, NN(•) is a function of transforming a real-valued vector sequence to a fixed-length real-valued vector. As NN(•), for example, an RNN or a CNN can be used. Note that NN(•) is not limited to an RNN or a CNN but may be any function that performs a similar transformation. NN(•) may also be realized using machine learning other than neural networks such as RNNs and CNNs, for example, an SVM (Support Vector Machine) or Random Forest.
(49) Here, DISCRIMINATE(•) is a function for calculating a posterior probability corresponding to each class from a fixed-length vector. As DISCRIMINATE(•), for example, a softmax function can be used.
(50) The class label L to be an output is a label corresponding to a class with the highest probability in probability distribution P.
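The following Python sketch illustrates this processing under simplifying assumptions: NN(•) is reduced to mean pooling followed by a linear layer (an RNN or CNN would typically be used, as noted above), DISCRIMINATE(•) is a softmax, and all sizes and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden, n_classes = 4, 6, 3            # hypothetical sizes
W_h = rng.normal(size=(dim, hidden))        # parameters of NN(.)
W_o = rng.normal(size=(hidden, n_classes))  # parameters of DISCRIMINATE(.)

def nn(U_seq):
    """Transform the vector sequence U_1..U_T to a fixed-length
    real-valued vector h (the feature of the sequence)."""
    return np.tanh(np.mean(U_seq, axis=0) @ W_h)

def discriminate(h):
    """Softmax: posterior probability P for each class."""
    z = h @ W_o
    e = np.exp(z - z.max())
    return e / e.sum()

U_seq = rng.normal(size=(5, dim))           # a toy sequence U_1..U_5
P = discriminate(nn(U_seq))
L = int(np.argmax(P))                       # label of the most probable class
```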
(51) Note that it is possible to, by omitting the class label estimating part 120 from the confusion network classification apparatus 100, configure a confusion network distributed representation generation apparatus 101 (not shown). That is, the confusion network distributed representation generation apparatus 101 includes the confusion network distributed representation sequence generating part 110 and the recording part 190, and generates a confusion network distributed representation sequence with a confusion network as an input.
(52) (Method for Configuring Confusion Network Distributed Representation Sequence Generating Part 110 and Class Label Estimating Part 120)
(53) As stated before, NN(•), which is a part of the function of the class label estimating part 120, can be configured as a neural network, and the class label estimating part 120 itself can be configured as a neural network. That is, the class label estimating part 120 can be configured as a neural network (for example, like RNN and CNN) that outputs a class label with a confusion network distributed representation sequence as an input.
(54) Similarly, the confusion network distributed representation sequence generating part 110 can also be configured as a neural network (for example, like RNN and CNN) that outputs a confusion network distributed representation sequence with a confusion network as an input.
(55) Furthermore, by combining the confusion network distributed representation sequence generating part 110 and the class label estimating part 120 so that an output of the neural network constituting the confusion network distributed representation sequence generating part 110 becomes an input of the neural network constituting the class label estimating part 120, one neural network having both of the function of the confusion network distributed representation sequence generating part 110 and the function of the class label estimating part 120 can be configured. In this case, it becomes possible to learn parameters of the neural network constituting the confusion network distributed representation sequence generating part 110 and parameters of the neural network constituting the class label estimating part 120 at the same time, and learning is performed in a form of the parameters of the two neural networks being optimized as a whole.
(56) Of course, learning may be performed so that the parameters of the neural network constituting the confusion network distributed representation sequence generating part 110 and the parameters of the neural network constituting the class label estimating part 120 are independently optimized by individually learning the parameters of the neural network constituting the confusion network distributed representation sequence generating part 110 and the parameters of the neural network constituting the class label estimating part 120.
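A minimal sketch of the combined configuration is shown below in PyTorch; the framework, layer choices and sizes are assumptions for illustration only. One backward pass through the combined network updates the parameters of both the generating part and the estimating part at the same time.

```python
import torch
import torch.nn as nn

class CNEncoder(nn.Module):
    """Confusion network distributed representation sequence generating
    part: embedding lookup followed by arc-weighted summation."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)      # EMBEDDING(.)

    def forward(self, word_ids, weights):
        # word_ids, weights: (T, N) tensors padded to a common N;
        # a padded arc gets weight 0 and contributes nothing.
        omega = self.embed(word_ids)                    # (T, N, dim)
        return (weights.unsqueeze(-1) * omega).sum(1)   # U_1..U_T: (T, dim)

class CNClassifier(nn.Module):
    """Class label estimating part stacked on the encoder."""
    def __init__(self, vocab_size, dim, hidden, n_classes):
        super().__init__()
        self.encoder = CNEncoder(vocab_size, dim)
        self.rnn = nn.GRU(dim, hidden, batch_first=True)  # NN(.)
        self.out = nn.Linear(hidden, n_classes)           # pre-softmax logits

    def forward(self, word_ids, weights):
        U = self.encoder(word_ids, weights)             # (T, dim)
        _, h = self.rnn(U.unsqueeze(0))                 # h: (1, 1, hidden)
        return self.out(h.squeeze(0))                   # (1, n_classes)

# Joint learning: a single loss optimizes both parts as one network.
model = CNClassifier(vocab_size=100, dim=16, hidden=32, n_classes=3)
opt = torch.optim.Adam(model.parameters())
word_ids = torch.randint(0, 100, (5, 4))                # toy T=5, N=4
weights = torch.softmax(torch.randn(5, 4), dim=1)       # each row sums to 1
label = torch.tensor([1])
loss = nn.CrossEntropyLoss()(model(word_ids, weights), label)
opt.zero_grad(); loss.backward(); opt.step()
```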
(57) Note that the confusion network distributed representation sequence generating part 110 or the class label estimating part 120 is not necessarily required to be configured as a neural network but may be configured by other machine learning. For example, the class label estimating part 120 may be configured with SVM or Random Forest. Furthermore, the confusion network distributed representation sequence generating part 110 or the class label estimating part 120 may be configured by a method other than machine learning.
(58) According to the present invention, by representing a confusion network as a confusion network distributed representation sequence, which is a vector sequence, it becomes possible to use the confusion network as an input for machine learning.
(59) Further, since it becomes possible to configure a class classifier using a confusion network distributed representation sequence, it becomes possible to configure a class classifier with better performance, for example, in comparison with a class classifier configured using only one word string such as a speech recognition result. This is because a confusion network includes various information about the hypothesis space of speech recognition, including the speech recognition result. That is, since a confusion network includes a plurality of candidates for a speech recognition result and information about the probability of each candidate being correct, it is possible to learn a class classifier based on whether each candidate is a recognition error or not (or how likely each candidate is to be correct), and, as a result, the performance of the learned class classifier is increased.
Application Examples
(60) The description so far has assumed learning a class classifier using a confusion network obtained in the process of speech recognition and configuring a confusion network classification apparatus using the class classifier. More generally, if a confusion network is generated in a first-stage estimation process, as in the combination of a speech recognizer corresponding to a first-stage estimator and a class classifier corresponding to a second-stage estimator, the second-stage estimator can be learned in a similar framework. As the combination of the first-stage estimator and the second-stage estimator, for example, a combination of a text basic analyzer and a class classifier, of a speech recognizer and a text searcher, or of the text basic analyzer and the text searcher is also possible.
Second Embodiment
(61) Though a confusion network is an input in the confusion network distributed representation generation apparatus 101 described in the first embodiment, text which is a word sequence may be an input. Therefore, here, a confusion network distributed representation generation apparatus 200 will be described that generates a confusion network distributed representation sequence with text which is a word sequence as an input.
(62) The confusion network distributed representation generation apparatus 200 will be described below. The confusion network distributed representation generation apparatus 200 includes a text transforming part 210 and the confusion network distributed representation sequence generating part 110.
(63) The confusion network distributed representation generation apparatus 200 takes text (a word sequence) as an input.
(64) The operation of the confusion network distributed representation generation apparatus 200 will be described below.
(65) [Text Transforming Part 210]
(66) Input: text (a word sequence)
(67) Output: a confusion network (an arc word set sequence and an arc weight set sequence)
(68) The text transforming part 210 generates, from a word sequence w.sub.1, w.sub.2, . . . , w.sub.T, an arc word set sequence W.sub.1, W.sub.2, . . . , W.sub.T and an arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T constituting a confusion network that represents the word sequence (S210). Here, the word sequence w.sub.1, w.sub.2, . . . , w.sub.T can be obtained, for example, by dividing input text using morphological analysis. An arc word set W.sub.t is generated by the following formula:
W.sub.t=(w.sub.t) [Formula 8]
(69) That is, the arc word set W.sub.t (1≤t≤T) is a set that includes one word w.sub.t as an arc word. Further, an arc weight set C.sub.t is generated by the following formula:
C.sub.t=(1) [Formula 9]
(70) That is, the arc weight set C.sub.t (1≤t≤T) is a set that includes only an arc weight 1 corresponding to the word w.sub.t. Thereby, the arc word set sequence W.sub.1, W.sub.2, . . . , W.sub.T and the arc weight set sequence C.sub.1, C.sub.2, . . . , C.sub.T corresponding to the word sequence w.sub.1, w.sub.2, . . . , w.sub.T are obtained.
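A minimal Python sketch of this transformation, assuming the input text has already been divided into a word sequence (for example, by morphological analysis):

```python
def text_to_confusion_network(words):
    """Represent a word sequence as a (trivial) confusion network:
    W_t = (w_t) per Formula 8 and C_t = (1) per Formula 9."""
    arc_word_sets = [[w] for w in words]
    arc_weight_sets = [[1.0] for _ in words]
    return arc_word_sets, arc_weight_sets

W_seq, C_seq = text_to_confusion_network(["good", "morning"])
# W_seq == [["good"], ["morning"]]; C_seq == [[1.0], [1.0]]
```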
(71) Next, the confusion network distributed representation sequence generating part 110 generates a confusion network distributed representation sequence, which is a vector sequence, from the arc word set sequence and the arc weight set sequence constituting the confusion network (S110).
(72) According to the present invention, it becomes possible to generate a confusion network distributed representation sequence from text. Here, the text to be an input is not limited to a speech recognition result. Therefore, it becomes possible to learn a class classifier using all of a confusion network corresponding to a speech recognition result that includes a speech recognition error, a confusion network corresponding to a speech recognition result that does not include a speech recognition error and a confusion network generated from general text. Further, it becomes possible to learn a class classifier using not only learning data for a class classifier that classifies a speech recognition result but also learning data created for text classification in natural language processing.
Third Embodiment
(73) In a third embodiment, description will be made on a method for calculating similarity between confusion networks using a confusion network distributed representation sequence.
(74) A confusion network similarity calculation apparatus 300 will be described below. The confusion network similarity calculation apparatus 300 includes the confusion network distributed representation sequence generating part 110 and a similarity calculating part 310.
(75) The confusion network similarity calculation apparatus 300 takes two confusion networks (a first confusion network and a second confusion network) as inputs. Each confusion network to be an input may be, for example, one obtained at the time of speech recognition or one obtained by transforming text by processing similar to the processing of the text transforming part 210.
(76) The operation of the confusion network similarity calculation apparatus 300 will be described below. First, the confusion network distributed representation sequence generating part 110 generates a first confusion network distributed representation sequence from the first confusion network and a second confusion network distributed representation sequence from the second confusion network (S110).
(77) The similarity calculating part 310 calculates similarity between the first confusion network and the second confusion network from the first confusion network distributed representation sequence and the second confusion network distributed representation sequence (S310). Since both of the first confusion network distributed representation sequence and the second confusion network distributed representation sequence are vector sequences, for example, a vector sum of the first confusion network distributed representation sequence and a vector sum of the second confusion network distributed representation sequence are calculated, and similarity between the vectors is calculated. Further, for example, cosine similarity can be used as vector similarity.
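A minimal Python sketch of this calculation, using the vector-sum and cosine-similarity choices described above (both are examples rather than requirements):

```python
import numpy as np

def cn_similarity(U_seq_1, U_seq_2):
    """Cosine similarity between the vector sums of two confusion
    network distributed representation sequences."""
    v1 = np.sum(U_seq_1, axis=0)
    v2 = np.sum(U_seq_2, axis=0)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# The two sequences may have different lengths T; only the vector
# dimensionality must match.
rng = np.random.default_rng(0)
s = cn_similarity(rng.normal(size=(4, 8)), rng.normal(size=(6, 8)))
```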
(78) According to the present invention, it becomes possible to indicate a degree of similarity between two confusion networks as a numerical value.
(79) By incorporating such a configuration into a speech search system or a spoken dialogue system and using it in a process for searching a text database with speech as an input, it is possible to reduce the influence of an input speech recognition error on the search of the text database. Specifically, similarity between a first confusion network corresponding to the input speech and a second confusion network corresponding to text that is a search result is calculated. If the similarity is smaller than a predetermined value, input of speech is prompted again on the assumption that the input speech recognition result has an error.
(80) Further, by applying the framework of the confusion network distributed representation and the calculation of similarity between confusion networks described above to the process for searching a text database with speech as an input, it is possible to reduce the influence of an input speech recognition error on the search of the text database. Specifically, each piece of text in the text database is transformed to a second confusion network distributed representation sequence in advance; similarity is calculated between a first confusion network distributed representation sequence corresponding to the input speech and the second confusion network distributed representation sequence corresponding to each piece of text in the database; and the piece of text with the highest similarity is returned as the search result.
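A minimal Python sketch of this search procedure, assuming the database entries have been transformed to distributed representation sequences in advance; the threshold value is hypothetical.

```python
import numpy as np

def cosine(v1, v2):
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def search(query_U_seq, db, threshold=0.5):
    """Return the database text most similar to the input speech, or
    None (prompting re-input) when even the best match falls below
    the predetermined threshold. db is a list of (text, U_seq) pairs."""
    q = np.sum(query_U_seq, axis=0)               # vector sum of the query
    best_text, best_score = max(
        ((text, cosine(q, np.sum(U_seq, axis=0))) for text, U_seq in db),
        key=lambda pair: pair[1],
    )
    return best_text if best_score >= threshold else None
```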
(81) <Supplementary Notes>
(82) For example, as a single hardware entity, an apparatus of the present invention has an inputting part to which a keyboard and the like can be connected, an outputting part to which a liquid crystal display and the like can be connected, a communicating part to which a communication device (for example, a communication cable) capable of communicating with the outside of the hardware entity can be connected, a CPU (Central Processing Unit; a cache memory and a register may be provided), a RAM and a ROM which are memories, an external storage device which is a hard disk, and a bus connecting these inputting part, outputting part, communicating part, CPU, RAM, ROM and external storage device so that data can be exchanged thereamong. Further, the hardware entity may be provided with a device (a drive) or the like capable of reading from/writing to a recording medium such as a CD-ROM as necessary. As a physical entity provided with such hardware resources, there is a general-purpose computer or the like.
(83) In the external storage device of the hardware entity, programs required to realize the functions described above and data required for processing by the programs are stored (Other than the external storage device, the program may be stored, for example, in the ROM which is a read-only storage device). Data and the like obtained by the processing by these programs are appropriately stored in the RAM or the external storage device.
(84) In the hardware entity, each program stored in the external storage device (or the ROM or the like) and data required for processing by the program are read into the memory as necessary, and interpretation, execution and processing are appropriately performed by the CPU. As a result, the CPU realizes a predetermined function (each of the above components represented as a . . . part, a . . . unit or the like).
(85) The present invention is not limited to the above embodiments but can be appropriately changed within a range not departing from the spirit of the present invention. Further, the processes described in the above embodiments are not only executed in time series in the order of description. The processes may be executed in parallel or individually according to processing capability of an apparatus that executes the processes or as necessary.
(86) As already stated, in the case of realizing the processing functions in the hardware entity (the apparatus of the present invention) described in the above embodiments by a computer, processing content of the functions that the hardware entity should have is written by a program. Then, by executing the program on the computer, the processing functions of the hardware entity are realized on the computer.
(87) The program in which the processing content is written can be recorded in a computer-readable recording medium. The computer-readable recording medium may be any medium, for example, a magnetic recording device, an optical disk, a magneto-optical recording medium or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable)/RW (ReWritable) or the like can be used as the optical disk; an MO (Magneto-Optical disc) or the like can be used as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like can be used as the semiconductor memory.
(88) Distribution of this program is performed, for example, by performing sales, transfer, lending or the like of a portable recording medium such as a DVD or a CD-ROM in which the program is recorded. Furthermore, a configuration is also possible in which the program is stored in a storage device of a server computer and is caused to be distributed by being transferred from the server computer to other computers via a network.
(89) For example, a computer that executes such a program first stores the program recorded in a portable recording medium or transferred from a server computer into its own storage device once. Then, at the time of executing processing, the computer reads the program stored in its own storage device and executes the processing according to the read program. Further, as another execution form of the program, a computer may directly read the program from a portable recording medium and execute processing according to the program. Furthermore, each time a program is transferred to the computer from a server computer, the computer may sequentially execute processing according to the received program. Further, a configuration is also possible in which the processes described above are executed by a so-called ASP (Application Service Provider) type service in which, without transferring the program to the computer from the server computer, the processing functions are realized only by an instruction to execute the program and acquisition of a result. Note that it is assumed that the program in this form includes information that is provided for processing by an electronic calculator and is equivalent to a program (data or the like that is not a direct command to a computer but has a nature of specifying processing of the computer).
(90) Further, though it is assumed in this form that the hardware entity is configured by causing a predetermined program to be executed on a computer, at least a part of the processing content may be realized as hardware.
(91) The above description of the embodiments of the present invention is presented for the purpose of illustration and description. It is not intended that the description is comprehensive, and it is not intended to limit the invention to the disclosed strict form, either. Modifications and variations are possible from the teachings stated above. The embodiments are selected and expressed in order to provide the best illustration of the principle of the present invention and in order that one skilled in the art can use the present invention in various embodiments or by adding various modifications to adapt the present invention to contemplated actual use. All of such modifications and variations are within the scope of the present invention specified by the accompanying claims interpreted according to a range that is fairly, legitimately and justly given.