System for generating topic inference information of lyrics
11544601 · 2023-01-03
CPC classification
G06N7/01
Abstract
A system for generating topic inference information of lyrics that can provide information more useful for topic interpretation of lyrics. A device for learning topic numbers performs, for a predetermined number of times, an operation of updating and learning topic numbers, in which an operation of updating topic numbers is performed on all of a plurality of lyrics data of each of a plurality of artists. The operation of updating topic numbers updates the topic number assigned to a given lyrics data of a given artist using a random number generator having a deviation of appearance probability corresponding to a probability distribution over topic numbers. An outputting device outputs the topic numbers of the plurality of lyrics data for each of the plurality of artists, and a probability distribution over words for each of the topic numbers.
Claims
1. A system for generating topic inference information of lyrics for obtaining reliable information for inferring a topic that is a subject, a main point or a theme of lyrics as determined by lyric contents, the system comprising: a means for obtaining lyrics data, operable to obtain a plurality of lyrics data each including a song name and the lyrics of each of a plurality of artists; a means for generating a given number of one or more topic numbers k (1≤k≤K) where k is a number in a range of 1 to K (a positive integer); an analysis means for extracting a plurality of words by performing morpheme analysis of a plurality of lyrics in the plurality of the lyrics data, using a morpheme analysis engine; a means for learning the one or more topic numbers, operable to first assign a topic number to each of the plurality lyrics data of each of the plurality of artists in a random or arbitrary manner; then to calculate a probability p that the topic number of a given lyrics data S.sub.ar is k, based on a number R.sub.ak of lyrics data other than the lyrics data S.sub.ar of a given artist a, to which the topic number k is assigned and a number N.sub.kv of times that the topic number k is assigned to a word v in the plurality of lyrics data of the plurality of artists except the given lyrics data S.sub.ar; to generate a probability distribution over the one or more topic numbers of the given lyrics data S.sub.ar, based on the calculated probability p; next to perform an operation of updating the one or more topic numbers to update the topic number assigned to the given lyrics data S.sub.ar of the given artist a using a random number generator having a deviation of appearance probability corresponding to the probability distribution over the one or more topic numbers; and to perform an operation of updating and learning the one or more topic numbers for a predetermined number of times, the operation of updating and learning the one or more topic numbers performing the 
operation of updating the one or more topic numbers on all of the plurality of lyrics data of each of the plurality of artists; a means for learning values of one or more switch variables, operable to assign values of the one or more switch variables to the plurality of words included in the plurality of lyrics data of each of the plurality of artists in a random or arbitrary manner; then to generate a probability distribution λ.sub.a over values of the one or more switch variables by calculating a probability whether the value of a switch variable x assigned to a given word v.sub.arj is a topic word or a background word, based on the values of the one or more switch variables assigned to the plurality of words in the plurality of lyrics data of the given artist a; next to perform an operation of updating values of the one or more switch variables to update the value of the switch variable assigned to the given word v.sub.arj using a random number generator having a deviation of appearance probability corresponding to the probability distribution over values of the one or more switch variables; and to perform an operation of updating and learning values of the one or more switch variables for a predetermined number of times, the operation of updating and learning values of the one or more switch variables performing the operation of updating values of the one or more switch variables on all of the plurality of words included in the plurality of lyrics data of each of the plurality of artists; and an outputting means operable to identify the one or more topic numbers of each of the plurality of lyrics data and the probability distributions over words for each of the one or more topic numbers, based on learning results obtained from the means for learning the one or more topic numbers and learning results obtained by the means for learning values of the one or more switch variables; wherein the means for generating the given number of the one or more topic numbers, 
the analysis means, the means for learning the one or more topic numbers, the means for learning values of the one or more switch variables and the outputting means are implemented on a computer by a computer program installed in the computer.
2. The system for generating topic inference information of lyrics according to claim 1, wherein: in the means for learning the one or more topic numbers, it is assumed that the one or more topic numbers assigned to all of the plurality of lyrics but the topic number assigned to the given lyrics data of the given artist are correct when generating the probability distribution over the one or more topic numbers.
3. The system for generating topic inference information of lyrics according to claim 1, wherein: in the means for learning values of the one or more switch variables, it is assumed that values of the one or more switch variables assigned to all of words but the value of the switch variable x assigned to a given word in the plurality of words of the given lyrics data of the given artist are correct when performing the operation of updating values of the one or more switch variables.
4. The system for generating topic inference information of lyrics according to claim 1, wherein the means for learning the one or more topic numbers: calculates a first probability p.sub.1 that the topic number of the given lyrics data S.sub.ar is k, based on the number R.sub.ak of lyrics data other than the given lyrics data S.sub.ar of the given artist a when generating the probability distribution over the one or more topic numbers; calculates a second probability p.sub.2 that the topic number of the given lyrics data S.sub.ar is k, based on the number N.sub.kv of times that the topic number k is assigned to the word v in the plurality of lyrics data of the plurality of artists other than the given lyrics data S.sub.ar; calculates the probability p that the topic number of the given lyrics data S.sub.ar is k, from the first probability p.sub.1 and the second probability p.sub.2; and determines the probability distribution over the one or more topic numbers of the given lyrics data S.sub.ar by performing the above-identified calculations on all of the one or more topic numbers and normalizing probabilities that the topic number of the given lyrics data S.sub.ar is any one of 1 (one) to K such that normalized probabilities sum up to 1 (one).
5. The system for generating topic inference information of lyrics according to claim 1, wherein: the outputting means is configured to output a probability distribution over words for each topic number, based on the number N.sub.kv of times that the topic number k is assigned to a given word v, wherein the outputting means are implemented on the computer by the computer program installed in the computer.
6. The system for generating topic inference information of lyrics according to claim 5, wherein: in the outputting means, an occurrence probability θ.sub.kv of the word v to which the topic number k is assigned is calculated as follows:
θ.sub.kv=(N.sub.kv+β)/(N.sub.k+β|V|) where N.sub.kv denotes a number of times that a topic number k is assigned to a given word v, N.sub.k denotes a number of all of words to which the topic number k is assigned, β denotes a smoothing parameter, and |V| denotes a number of kinds of words.
7. The system for generating topic inference information of lyrics according to claim 4, wherein the means for learning values of the one or more switch variables: calculates a third probability p.sub.3 that the value of the switch variable of the word v.sub.arj is 0 (zero), based on a number N.sub.a0 of words to which a value of 0 (zero) is assigned as the value of the switch variable in all of lyrics data of all of songs of the given artist a; calculates a fourth probability p.sub.4 that the value of the switch variable of the word v.sub.arj is 0 (zero), based on a number Nz.sub.arv.sub.arj of times that 0 (zero) is assigned to the value of the switch variable of the word v.sub.arj in all of songs of all of artists to which the same topic number Z.sub.ar as the lyrics including the word v.sub.arj is assigned; calculates a fifth probability p.sub.5 that the value of the switch variable is 0 (zero) from the third probability p.sub.3 and the fourth probability p.sub.4; calculates a sixth probability p.sub.6 that the value of the switch variable of the word v.sub.arj is 1 (one), based on a number N.sub.a1 of times that 1 (one) is assigned as the value of the switch variable in the plurality of lyrics data of the given artist; calculates a seventh probability p.sub.7 that the value of the switch variable of the word v.sub.arj is 1 (one), based on a number N.sub.1varj of times that 1 (one) is assigned as the value of the switch variable of the word v.sub.arj in the plurality of lyrics data of the plurality of artists; calculates an eighth probability p.sub.8 that the value of the switch variable is 1 (one) from the sixth probability p.sub.6 and the seventh probability p.sub.7; and normalizes the probabilities from the fifth probability p.sub.5 and the eighth probability p.sub.8 such that a sum of the probability that the value of the switch variable of the word v.sub.arj is 0 (zero) and the probability that the value of the switch variable of the word v.sub.arj is 1 (one) is 1 (one) to obtain a probability distribution over values of the one or more switch variables.
8. The system for generating topic inference information of lyrics according to claim 1, wherein: the topic number for each of the plurality of lyrics data in the outputting means is a topic number that is last assigned to each of the plurality of lyrics data after the operation of updating and learning the one or more topic numbers is performed for a predetermined number of times in the means for learning the one or more topic numbers.
9. The system for generating topic inference information of lyrics according to claim 1, further comprising: a first means for generating a first word probability distribution, operable to generate a probability distribution over words included in lyrics data of a new song s of an artist that has not been used in learning; a second means for generating one or more second word probability distributions over words included respectively in lyrics data of the plurality of songs of the plurality of artists; a means for computing similarities operable to respectively obtain similarities between the first word probability distribution over words included in the lyrics data of the new song s as calculated by the first means for generating the first word probability distribution and the one or more second word probability distributions over words respectively included in the lyrics data of the plurality of songs as calculated by the second means for generating the one or more second word probability distributions; a means for generating a weight distribution by adding the similarities of the lyrics data of the plurality of songs corresponding to the lyrics data of the plurality of songs to the one or more topic numbers as a weight; and a means for determining a topic number, operable to determine a topic number having a largest weight as a topic number of the lyrics data of the new song s.
10. The system for generating topic inference information of lyrics according to claim 9, further comprising: a third means for generating a third word probability distribution, operable to generate the third word probability distribution over words included in the lyrics data of all of songs of the artist that have not been used in learning and for which an occurrence probability of background words are to be calculated; a fourth means for generating one or more fourth word probability distributions, operable to generate probability distributions over words included in the lyrics data of all of the songs of each of the artists; a fifth means for generating one or more fifth word probability distributions, operable to generate probability distributions over background words included in the lyrics data of all of songs for each of the artists; a means for computing similarities, operable to obtain similarities respectively between the third word probability distribution over words included in the lyrics data of the new song s as calculated by the third means for generating the third word probability distribution and the one or more fourth word probability distributions over words included in the lyrics data of the plurality of songs as calculated by the fourth means for generating the one or more fourth word probability distributions; and a means for generating an occurrence probability distribution over background words, operable to multiply the respective probability distributions over background words included in the lyrics data of all of the songs of each of the artists as calculated by the fifth means for generating the one or more fifth word probability distributions, by the similarities for each of the artists as computed by the means for computing similarities to obtain probability distributions, and normalizing the obtained probability distributions such that the weights sum up to 1 (one) for the respective artists, and then determining a resulting 
probability distribution as an occurrence probability distribution over background words.
11. A system for generating topic inference information of lyrics for obtaining reliable information for inferring a topic that is a subject, a main point or a theme of lyrics as determined by lyric contents, the system comprising: a means for obtaining lyrics data, operable to obtain a plurality of lyrics data each including a song name and the lyrics of each of a plurality of artists; a means for generating a given number of one or more topic numbers k (1≤k≤K) where k is a number in a range of 1 to K (a positive integer); an analysis means for extracting a plurality of words by performing morpheme analysis of a plurality of lyrics in the plurality of the lyrics data, using a morpheme analysis engine; a means for learning the one or more topic numbers, operable to first assign a topic number to each of the plurality lyrics data of each of the plurality of artists in a random or arbitrary manner; then to calculate a probability p that the topic number of a given lyrics data S.sub.ar is k, based on a number R.sub.ak of lyrics data other than the lyrics data S.sub.ar of a given artist a, to which the topic number k is assigned and a number N.sub.kv of times that the topic number k is assigned to a word v in the plurality of lyrics data of the plurality of artists except the given lyrics data S.sub.ar; to generate a probability distribution over the one or more topic numbers of the given lyrics data S.sub.ar, based on the calculated probability p; next to perform an operation of updating the one or more topic numbers to update the topic number assigned to the given lyrics data S.sub.ar of the given artist a using a random number generator having a deviation of appearance probability corresponding to the probability distribution over the one or more topic numbers; and to perform an operation of updating and learning the one or more topic numbers for a predetermined number of times, the operation of updating and learning the one or more topic numbers performing the 
operation of updating the one or more topic numbers on all of the plurality of lyrics data of each of the plurality of artists; and an outputting means operable to identify the one or more topic numbers of each of the plurality of lyrics data and the probability distributions over words for each of the one or more topic numbers, based on learning results obtained from the means for learning the one or more topic numbers; wherein the means for generating the given number of the one or more topic numbers, the analysis means, the means for learning the one or more topic numbers, and the outputting means are implemented on a computer by a computer program installed in the computer.
12. The system for generating topic inference information of lyrics according to claim 11, wherein: in the means for learning the one or more topic numbers, it is assumed that the one or more topic numbers assigned to all of the plurality of lyrics but the topic number assigned to the given lyrics data of the given artist are correct when generating the probability distribution over the one or more topic numbers.
13. The system for generating topic inference information of lyrics according to claim 11, wherein the means for learning the one or more topic numbers: calculates a first probability p.sub.1 that the topic number of the given lyrics data S.sub.ar is k, based on the number R.sub.ak of lyrics data other than the given lyrics data S.sub.ar of the given artist a when generating the probability distribution over the one or more topic numbers; calculates a second probability p.sub.2 that the topic number of the given lyrics data S.sub.ar is k, based on the number N.sub.kv of times that the topic number k is assigned to the word v in the plurality of lyrics data of the plurality of artists other than the given lyrics data S.sub.ar; calculates the probability p that the topic number of the given lyrics data S.sub.ar is k, from the first probability p.sub.1 and the second probability p.sub.2; and determines the probability distribution over the one or more topic numbers of the given lyrics data S.sub.ar by performing the above-identified calculations on all of the one or more topic numbers and normalizing probabilities that the topic number of the given lyrics data S.sub.ar is any one of 1 (one) to K such that normalized probabilities sum up to 1 (one).
14. The system for generating topic inference information of lyrics according to claim 13, wherein: the outputting means is configured to output a probability distribution over words for each topic number, based on the number N.sub.kv of times that the topic number k is assigned to a given word v, wherein the outputting means are implemented on the computer by the computer program installed in the computer.
15. The system for generating topic inference information of lyrics according to claim 14, wherein: in the outputting means, an occurrence probability θ.sub.kv of the word v to which the topic number k is assigned is calculated as follows:
θ.sub.kv=(N.sub.kv+β)/(N.sub.k+β|V|) where N.sub.kv denotes a number of times that a topic number k is assigned to a given word v, N.sub.k denotes a number of all of words to which the topic number k is assigned, β denotes a smoothing parameter, and |V| denotes a number of kinds of words.
16. The system for generating topic inference information of lyrics according to claim 11, further comprising: a first means for generating a first word probability distribution, operable to generate a probability distribution over words included in lyrics data of a new song s of an artist that has not been used in learning; a second means for generating one or more second word probability distributions over words included respectively in lyrics data of the plurality of songs of the plurality of artists; a means for computing similarities, operable to obtain similarities respectively between the probability distribution over words included in the lyrics data of the new song s as calculated by the first means for generating the first word probability distribution and the probability distributions over words included in the lyrics data of the plurality of songs as calculated by the second means for generating the one or more second word probability distributions; a means for generating a weight distribution by adding the similarities of the lyrics data of the plurality of songs corresponding to the lyrics data of the plurality of songs to the one or more topic numbers as a weight; and a means for determining a topic number, operable to determine a topic number having a largest weight as a topic number of the lyrics data of the new song s.
Description
DESCRIPTION OF EMBODIMENTS
(23) Now, embodiments of the present invention will be described below in detail with reference to accompanying drawings.
First Embodiment
(24)
(25) As illustrated in
(26) In the present invention, structural elements as illustrated as a block in
(27) Now, the theories used in implementing the first embodiment on hardware such as a computer will be described using mathematical equations and expressions. The model is defined as follows. Here, the number of topics given as an input is K, the collection of artists in the collection of lyrics data is A, and the collection of nouns or given parts of speech is V. Each topic k (1≤k≤K) has a word probability distribution φ.sub.k=(φ.sub.k1, φ.sub.k2, . . . , φ.sub.k|V|), in which the occurrence probability φ.sub.kv of a word v∈V satisfies φ.sub.kv≥0 and the following equation.
$$\sum_{v=1}^{|V|} \phi_{kv} = 1$$
(28) An artist a∈A has a topic probability distribution θ.sub.a=(θ.sub.a1, θ.sub.a2, . . . , θ.sub.aK), in which the occurrence probability θ.sub.ak of a topic k satisfies θ.sub.ak≥0 and the following equation.
$$\sum_{k=1}^{K} \theta_{ak} = 1$$
(29) The artist a∈A also has a probability distribution λ.sub.a=(λ.sub.a0, λ.sub.a1) for choosing a value of the switch variable. λ.sub.a0 is the probability that the value of the switch variable is 0 (zero), which indicates that a word is chosen from the topics. λ.sub.a1 is the probability that the value of the switch variable is 1 (one), which indicates that a word is chosen from the background words. λ.sub.a0≥0, λ.sub.a1≥0, and λ.sub.a0+λ.sub.a1=1 are satisfied. The background words have a word probability distribution ψ=(ψ.sub.1, ψ.sub.2, . . . , ψ.sub.|V|), in which the occurrence probability ψ.sub.v of a word v∈V satisfies ψ.sub.v≥0 and the following equation.
$$\sum_{v=1}^{|V|} \psi_{v} = 1$$
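The generative reading of the model above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names (`sample_dirichlet`, `sample_index`, `generate_corpus`), the symmetric Dirichlet draws, the default hyperparameter values, and the 0-based topic indices are not taken from the specification. Each artist draws a topic distribution and a switch distribution, each song draws one topic number, and each word is drawn either from the topic's word distribution (switch 0) or from the background word distribution (switch 1).

```python
import random

def sample_dirichlet(alpha, size, rng):
    """Draw one point from a symmetric Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(probs, rng):
    """Return index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(num_artists, songs_per_artist, words_per_song, K, vocab,
                    alpha=0.5, beta=0.5, gamma=0.5, rho=1.0, seed=0):
    """Generate synthetic lyrics; topics are indexed 0..K-1 here (1..K in the text)."""
    rng = random.Random(seed)
    V = len(vocab)
    phi = [sample_dirichlet(beta, V, rng) for _ in range(K)]  # word distribution per topic
    psi = sample_dirichlet(gamma, V, rng)                     # background word distribution
    corpus = {}
    for a in range(num_artists):
        theta_a = sample_dirichlet(alpha, K, rng)   # topic distribution of artist a
        lambda_a = sample_dirichlet(rho, 2, rng)    # (P(switch=0), P(switch=1)) of artist a
        songs = []
        for _ in range(songs_per_artist):
            z = sample_index(theta_a, rng)          # one topic number per song
            words = []
            for _ in range(words_per_song):
                x = sample_index(lambda_a, rng)     # 0: topic word, 1: background word
                dist = phi[z] if x == 0 else psi
                words.append(vocab[sample_index(dist, rng)])
            songs.append({"topic": z, "words": words})
        corpus[a] = songs
    return corpus
```

For example, `generate_corpus(2, 3, 5, K=4, vocab=["natsu", "umi", "ai", "yoru"])` returns two artists with three five-word songs each, every song carrying a single topic number.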
(30) The system of the present invention automatically generates information useful for determining or inferring the topics of lyrics, based on the model illustrated in
(31) To implement this using a computer, the total number of lyrics data of an artist a is defined as R.sub.a, and the rth (1≤r≤R.sub.a) lyrics data as S.sub.ar. A collection D.sub.a of lyrics of the artist a is then represented by the following equation.
$$D_a = \{S_{ar}\}_{r=1}^{R_a}$$
(32) Further, a collection D of lyrics of all of the artists is represented by D={D.sub.a}.sub.a∈A.
(33) The means 7 for generating topic numbers generates a topic number k of 1 to K (a positive integer) as illustrated in step ST2 of
(34) As illustrated in step ST3 of
(35) As illustrated in step ST4 of
(36)
(37)
(38) Next, in step ST409, the topic number of the song, namely, the given lyrics data S.sub.ar, is updated. Specifically, the topic number assigned to the given lyrics data S.sub.ar of the given artist a is updated using a random number generator having a deviation of appearance probability corresponding to the probability distribution over topic numbers. The operation of updating topic numbers (steps ST403, ST409) is performed on all of a plurality of lyrics data of each of a plurality of artists (steps ST404, ST411). Then, the operation of updating and learning topic numbers (steps ST403 to ST411) is performed for a predetermined number of times [in an example of
(39) As illustrated in step ST5 of
(40) Specifically, a last updated value is determined as the topic number assigned to lyrics data in step ST409 of
θ.sub.kv=(N.sub.kv+β)/(N.sub.k+β|V|)
(41) Here, N.sub.kv denotes the number of times that the topic number k is assigned to the given word v, N.sub.k denotes the number of all the words to which the topic number k is assigned, β denotes a smoothing parameter for word occurrence counts, and |V| denotes the number of kinds of words.
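The smoothed estimate θ.sub.kv=(N.sub.kv+β)/(N.sub.k+β|V|) can be illustrated with a short Python sketch. The function name `topic_word_distribution` and the dictionary-based count representation are assumptions made for illustration only.

```python
def topic_word_distribution(n_kv, vocab, beta):
    """Return {word: theta_kv} for one topic k.

    n_kv  : dict mapping word v to N_kv (times topic k is assigned to v)
    vocab : list of all distinct words, so len(vocab) is |V|
    beta  : smoothing parameter (pseudo-count added to every word)
    """
    n_k = sum(n_kv.get(v, 0) for v in vocab)   # N_k: all words assigned to topic k
    denom = n_k + beta * len(vocab)
    return {v: (n_kv.get(v, 0) + beta) / denom for v in vocab}
```

With counts `{"natsu": 3, "umi": 1}`, a four-word vocabulary, and β=0.5, the word "natsu" receives (3+0.5)/(4+2), and the probabilities over the vocabulary sum to 1 because β is added once per word in both numerator and denominator.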
(42) (Equation-Based Updating of Topic Numbers)
(43) Updating of topic numbers as mentioned above will be theoretically described below. First, it is assumed that θ.sub.a, φ.sub.k, ψ, and λ.sub.a have Dirichlet distributions with parameters α, β, γ, and ρ, respectively, as prior distributions. The topic number of the song S.sub.ar of the artist a is defined as z.sub.ar, and the value of the switch variable of the jth word in the lyrics S.sub.ar of the artist a is defined as x.sub.arj. With the collection D of lyrics as defined above, the collection Z of topic numbers is represented by the following equation.
$$Z = \{\{z_{ar}\}_{r=1}^{R_a}\}_{a \in A}$$
(44) A collection X of switch variables is represented by the following equation.
$$X = \{\{\{x_{arj}\}_{j=1}^{V_{ar}}\}_{r=1}^{R_a}\}_{a \in A}$$
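(45) The joint distribution is represented by the following equation.
$$P(D, Z, X \mid \alpha, \beta, \gamma, \rho) = \int\!\!\int\!\!\int\!\!\int P(D, Z, X \mid \Theta, \Phi, \psi, \Lambda)\, P(\Theta \mid \alpha)\, P(\Phi \mid \beta)\, P(\psi \mid \gamma)\, P(\Lambda \mid \rho)\, d\Theta\, d\Phi\, d\psi\, d\Lambda \qquad (1)$$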
(46) Here, Θ, Φ, and Λ are defined as follows.
$$\Theta = \{\theta_a\}_{a \in A}, \quad \Phi = \{\phi_k\}_{k=1}^{K}, \quad \Lambda = \{\lambda_a\}_{a \in A}$$
(47) P(D, Z, X|α,β, γ, ρ) represents a probability that the following combination occurs: words (D) of all of the lyrics, all of the topic numbers (Z), and assignment of all of the switch variables (X) when topic number assignment for all of the songs of all of the artists and values of switch variables assignment for all of the words of all of the songs of all of the artists are determined. Equation (1) is calculated by integrating out these parameters as follows.
(48)
(49) Here, N.sub.a0 and N.sub.a1 respectively denote the number of words for which the value of the switch variable is 0 (zero) and the number of words for which the value of the switch variable is 1 (one) in the lyrics of the artist a, and N.sub.a=N.sub.a0+N.sub.a1. N.sub.1v denotes the number of times that the value of the switch variable of the word v is 1 (one), and N.sub.1=Σ.sub.v∈V N.sub.1v. N.sub.kv denotes the number of times that the topic number k is assigned to the word v under the condition of a switch variable of 0 (zero), and N.sub.k=Σ.sub.v∈V N.sub.kv. R.sub.ak denotes the number of lyrics to which the topic number k is assigned in the lyrics of the artist a, so that the following equation holds.
$$R_a = \sum_{k=1}^{K} R_{ak}$$
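The count statistics defined above (R.sub.ak, N.sub.a0, N.sub.a1, N.sub.kv, N.sub.1v) can be accumulated in one pass over the current assignments. The following Python sketch assumes a hypothetical nested layout (`corpus[a][r]` is a word list, `topics[a][r]` a topic number, `switches[a][r][j]` a switch value); neither the layout nor the function name `build_counts` comes from the specification.

```python
from collections import Counter, defaultdict

def build_counts(corpus, topics, switches):
    """Accumulate the count tables used by the sampler.

    corpus[a][r]      : list of words of the r-th lyrics of artist a
    topics[a][r]      : topic number z_ar assigned to that lyrics
    switches[a][r][j] : switch value x_arj (0 = topic word, 1 = background word)
    """
    R_ak = defaultdict(Counter)   # R_ak[a][k]: lyrics of artist a with topic k
    N_ax = defaultdict(Counter)   # N_ax[a][0] = N_a0, N_ax[a][1] = N_a1
    N_kv = defaultdict(Counter)   # N_kv[k][v]: counted only where the switch is 0
    N_1v = Counter()              # N_1v[v]: times word v has switch value 1
    for a, songs in corpus.items():
        for r, words in enumerate(songs):
            k = topics[a][r]
            R_ak[a][k] += 1
            for j, v in enumerate(words):
                x = switches[a][r][j]
                N_ax[a][x] += 1
                if x == 0:
                    N_kv[k][v] += 1
                else:
                    N_1v[v] += 1
    return R_ak, N_ax, N_kv, N_1v
```

Summing `R_ak[a]` over all topic numbers gives the number of lyrics of artist a, which mirrors the relation R.sub.a=Σ.sub.k R.sub.ak.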
(50) The term of the following expression (3) in equation (2) denotes the probability of observing a given assignment of topic numbers to all of the lyrics.
(51)
(52) The term of the following expression (4) in equation (2) denotes the probability of observing a given assignment of values of switch variables to all of the words in all of the lyrics.
(53)
(54) The term of the following expression (5) in equation (2) denotes the probability of observing all of the words in all of the lyrics, given the assignment of topic numbers to all of the lyrics and the assignment of values of switch variables to all of the words in all of the lyrics.
(55)
(56) The probability that z.sub.ar=k, where z.sub.ar is the topic number of the song S.sub.ar of the artist a, is represented by equation (6).
(57)
(58) In the above equation, the subscript \ar denotes a value computed with the rth lyrics of the artist a excluded. N.sub.ar denotes the number of words in the rth lyrics of the artist a, and N.sub.arv denotes the number of times that the word v appears in the rth lyrics of the artist a. The term of expression (7) in equation (6) denotes how many times the topic number k is assigned to the lyrics of the artist a other than the rth lyrics. In other words, the more often the topic number k is assigned to the other songs of the artist a, the higher the probability that the topic number assigned to the rth lyrics of the artist a is k.
(59)
(60) The term of expression (8) in equation (6) denotes how often the topic number k is assigned, in the songs other than the rth song of the artist a, to the words appearing in the rth lyrics of the artist a. For example, if the word “Natsu (summer)” is present in the rth song of the artist a, it is taken into consideration how many times the topic number k is assigned to the word “Natsu (summer)” in all of the songs but the rth song of the artist a. Here, when the topic number of a song is k, the topic number k is considered to be assigned to all of the words in the lyrics of that song. Namely, the more often the topic number k is assigned elsewhere to the words in the lyrics of the rth song of the artist a, the higher the probability that the topic number of the rth song of the artist a is k.
(61)
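Equation (6) and expressions (7) and (8) are not reproduced in this text. As a hedged reconstruction only, assuming the usual collapsed Gibbs form for a Dirichlet-multinomial mixture (which is consistent with the symbols R.sub.ak, N.sub.kv, N.sub.ar, and N.sub.arv defined above, but is not confirmed by the surviving text), the update probability would take a form such as:

```latex
P(z_{ar}=k \mid D, X, Z_{\setminus ar}, \alpha, \beta) \;\propto\;
\underbrace{\frac{R_{ak\setminus ar}+\alpha}{R_a - 1 + \alpha K}}_{\text{artist term, cf. expression (7)}}
\times
\underbrace{\frac{\Gamma\!\left(N_{k\setminus ar}+\beta|V|\right)}
                 {\Gamma\!\left(N_{k\setminus ar}+N_{ar}+\beta|V|\right)}
\prod_{v\in V}
\frac{\Gamma\!\left(N_{kv\setminus ar}+N_{arv}+\beta\right)}
     {\Gamma\!\left(N_{kv\setminus ar}+\beta\right)}}_{\text{word term, cf. expression (8)}}
```

The first factor grows with the number of the artist's other lyrics carrying topic number k, and the second factor grows when the words of the rth lyrics have often been assigned topic number k elsewhere, which matches the two qualitative statements above.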
(62) Updating of the topic number is performed so as to increase the value of equation (2). In parallel with updating of the topic numbers for each of the lyrics, a probability distribution over words for each topic number is also updated.
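One update of a single topic number can be sketched in Python as follows. This is an illustrative sketch, not the claimed implementation: the collapsed Dirichlet-multinomial form of the weights, the log-space evaluation with `math.lgamma`, the function names, and the 0-based topic indices are all assumptions. As in the first embodiment, every word is treated as a topic word (switch variable 0).

```python
import math
import random
from collections import Counter

def topic_log_weights(song_words, R_ak_excl, N_kv_excl, K, vocab_size, alpha, beta):
    """Unnormalized log-probability that the song's topic is k, for k = 0..K-1.

    R_ak_excl[k] : number of the artist's OTHER lyrics with topic k
    N_kv_excl[k] : Counter of word counts for topic k over all OTHER lyrics
    """
    n_arv = Counter(song_words)          # N_arv: occurrences of each word in this song
    n_ar = len(song_words)               # N_ar: number of words in this song
    logw = []
    for k in range(K):
        n_k = sum(N_kv_excl[k].values())
        lw = math.log(R_ak_excl[k] + alpha)   # artist term (shared denominator dropped)
        lw += math.lgamma(n_k + beta * vocab_size)
        lw -= math.lgamma(n_k + n_ar + beta * vocab_size)
        for v, c in n_arv.items():            # word term
            lw += math.lgamma(N_kv_excl[k][v] + c + beta)
            lw -= math.lgamma(N_kv_excl[k][v] + beta)
        logw.append(lw)
    return logw

def normalize(logw):
    """Turn log-weights into probabilities that sum to 1 (max subtracted for stability)."""
    m = max(logw)
    w = [math.exp(x - m) for x in logw]
    s = sum(w)
    return [x / s for x in w]

def resample_topic(song_words, R_ak_excl, N_kv_excl, K, vocab_size, alpha, beta, rng):
    """Draw a new topic number for one song; returns (topic, distribution)."""
    probs = normalize(topic_log_weights(song_words, R_ak_excl, N_kv_excl,
                                        K, vocab_size, alpha, beta))
    k = rng.choices(range(K), weights=probs, k=1)[0]
    return k, probs
```

A full learning pass would call `resample_topic` for every lyrics data of every artist, updating the count tables after each draw, and repeat that pass for the predetermined number of times.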
(63) The switch variable as described above is theoretically a switch variable output by a means 115 for learning values of switch variables in a second embodiment as described later. In the first embodiment, the switch variable is assumed to be 0 (zero), and the value of a switch variable is not updated. Accordingly, background words are not taken into consideration.
Second Embodiment
(64)
(65) The second embodiment is different from the first embodiment as illustrated in
(66) In the second embodiment, as illustrated in
(67) In steps S14 to S18 of
(68) The means 115 for learning values of switch variables performs an operation of updating and learning of values of switch variables [steps ST1409 to ST1415 of
(69) As illustrated in
(70)
(71) (Equation-Based Updating of Switch Variables)
(72) Updating of switch variables as mentioned above will be theoretically described below. First, the value of the switch variable for the jth word in the lyrics data S.sub.ar of the artist a is defined as x.sub.arj. The probability that x.sub.arj=0 is represented by the following equation.
(73)
(74) In the above equation, the subscript \arj denotes a value computed with the jth word of the rth lyrics of the artist a excluded. The term of expression (10) in equation (9) denotes how readily the artist a generates words from the topics. The larger the value is, the higher the probability that the value of the switch variable of the jth word in the rth lyrics of the artist a is 0 (zero).
(75)
(76) The term of expression (11) in equation (9) denotes how readily the jth word in the rth lyrics of the artist a occurs at the topic number z.sub.ar. The larger the value is, the higher the probability that the value of the switch variable of the jth word in the rth lyrics of the artist a is 0 (zero). For example, when the jth word of the rth lyrics of the artist a is “Natsu (summer)”, it is taken into consideration how often 0 (zero) is assigned to the word “Natsu (summer)” as the value of the switch variable in all of the words of all of the songs of all of the artists to which the topic number z.sub.ar has been assigned.
(77)
(78) Likewise, the probability that x.sub.arj=1 is represented as follows.
(79)
(80) The term of expression (13) in equation (12) denotes how readily the artist a generates a word from the background words. The larger the value is, the higher the probability that the value of the switch variable of the jth word in the rth lyrics of the artist a is 1 (one) will be.
(81)
(82) The term of expression (14) in equation (12) denotes how readily the jth word in the rth lyrics of the artist a occurs as a background word. The larger the value is, the higher the probability that the value of the switch variable of the jth word in the rth lyrics of the artist a is 1 (one) will be.
(83)
(84) Updating of the values of switch variables of the words as illustrated in step ST1413 of
(85) Specifically, an operation of updating the value of the switch variable assigned to a given word is performed on all of a plurality of words included in a plurality of lyrics of each of a plurality of artists, using a random number generator having a deviation of appearance probability corresponding to the probability distribution over values of the switch variables (steps ST1412 to ST1416). The “random number generator” used herein is conceptually described as follows. In the second embodiment, assume an imaginary two-faced die whose faces correspond to the two values of the switch variable, each face having an area proportional to its appearance probability. When the imaginary die is rolled, the number assigned to the face that appears is defined as the updated value of the switch variable.
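The imaginary two-faced die described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the function names `switch_distribution` and `roll_switch`, and the parameter names `weight_topic` and `weight_background` (unnormalized weights for the switch values 0 and 1), are hypothetical and do not appear in the specification.

```python
import random


def switch_distribution(weight_topic, weight_background):
    """Normalize two unnormalized weights into probabilities for x=0 and x=1.

    weight_topic      : hypothetical unnormalized weight for switch value 0
    weight_background : hypothetical unnormalized weight for switch value 1
    """
    total = weight_topic + weight_background
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return weight_topic / total, weight_background / total


def roll_switch(weight_topic, weight_background, rng=random):
    """Roll the two-faced die: each face appears with its normalized probability."""
    p_zero, _ = switch_distribution(weight_topic, weight_background)
    # rng.random() is uniform on [0, 1), so the result is 0 with probability p_zero.
    return 0 if rng.random() < p_zero else 1
```

Calling `roll_switch(3.0, 1.0)` would return 0 about three times out of four, which corresponds to a face whose area is three times that of the other face.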
(86) As illustrated in step ST5 of
(87) The topic number of each of a plurality of lyrics output by the outputting means 113 is a topic number last assigned to the lyrics data of each of the artists [the topic number last updated in step ST1409 of
(88) Likewise, the word probability distribution for each topic number output by the outputting means 113 is the last stored word probability distribution for each topic number after performing the operation of updating and learning topic numbers for a predetermined number of times [In
(89) θ.sub.kv=(N.sub.kv+β)/(N.sub.k+β|V|)
(90) Where N.sub.kv denotes the number of times that the topic number k is assigned to the given word v, N.sub.k denotes the number of all of the words to which the topic number k is assigned, β denotes a smoothing parameter, and |V| denotes the number of kinds of words. Here, the smoothing parameter refers to the number of times of pseudo-occurrence of each word at each topic number. The number of kinds of words refers to the number of unique words included in the lyrics in the lyrics database illustrated in
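The smoothed occurrence probability built from these symbols, θ.sub.kv=(N.sub.kv+β)/(N.sub.k+β|V|), can be sketched in Python as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `word_distribution`, the dictionary-based count representation, and the default value of `beta` are assumptions, and every counted word is assumed to belong to the vocabulary.

```python
def word_distribution(topic_word_counts, vocabulary, beta=0.01):
    """Return theta_kv for every word v in the vocabulary, for one topic number k.

    topic_word_counts : dict mapping word v -> N_kv (times topic k is assigned to v)
    vocabulary        : list of unique words; its length plays the role of |V|
    beta              : smoothing parameter (pseudo-occurrence count per word)
    """
    n_k = sum(topic_word_counts.values())      # N_k: all words assigned topic k
    denominator = n_k + beta * len(vocabulary)  # N_k + beta * |V|
    return {
        v: (topic_word_counts.get(v, 0) + beta) / denominator
        for v in vocabulary
    }
```

Because β is added once for each of the |V| words, the resulting probabilities sum to 1 (one), and a word that never occurred under the topic number still receives a small nonzero probability.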
(91) (Effect Obtainable from Second Embodiment)
(92) According to the second embodiment, once an arbitrary number of topics is determined, the topic number of each of a plurality of lyrics data can be identified as the topic number last updated by the means for learning topic numbers. Further, an occurrence probability of words for each topic number is generated, based on the values of switch variables as last updated by the means for learning values of switch variables. Once the topic number of each of a plurality of lyrics data and the occurrence probability of words for each topic number have been determined, the words having a high occurrence probability can be known for each topic number, thereby eliminating the need to manually specify a collection of words related to the topics and a collection of words unrelated to the topics. Further, once a plurality of words having a high occurrence probability have been identified, reliable information for determining the topics can be obtained from these words, making it possible to grasp the likely meanings of the topics of the lyrics of each song.
(93) [System for Topic Inference Information of Lyrics that have not been Used in Learning]
(94) When obtaining a topic number of a lyrics data of a new song of a given artist that has not been used in learning, the system may be configured as illustrated in
(95) (System for Generating Occurrence Probability for Background Word)
(96)
(97) In the present embodiment, the system comprises a third means 27, a fourth means 29, and a fifth means 31 for generating a word probability distribution; a means 33 for computing similarities; and a means 35 for generating an occurrence probability distribution. The third means 27 for generating a word probability distribution generates a probability distribution over words included in the lyrics data of all of the songs of the artist that have not been used in learning and for which an occurrence probability of background words is to be calculated (step ST302); the fourth means 29 for generating a word probability distribution generates probability distributions over words in the lyrics data of all of the songs of each of the artists that have been used in learning (step ST306); and the fifth means 31 for generating a word probability distribution generates a probability distribution over background words included in the lyrics data of all of the songs of each of the artists that have been used in learning (step ST306). In the present embodiment, a word distribution over background words is determined for each artist, rather than a common word probability distribution over background words for all of the artists as illustrated in
(98) [Example Results]
(99)
(100) [Method and Computer Program]
(101) The present invention may be implemented as a method for generating topic inference information of lyrics and a computer program therefor as follows.
(102) The method comprises:
(103) (1) A step of obtaining a plurality of lyrics data each including a song name and lyrics for each of a plurality of artists;
(104) a step of generating a given number of topic numbers k of 1 to K (1≤k≤K);
(105) an analysis step of analyzing the plurality of lyrics in the plurality of lyrics data by means of morpheme analysis to extract a plurality of words;
(106) a step of learning topic numbers by first assigning the topic number k to the plurality of lyrics data for each of the plurality of artists in a random or arbitrary manner, then calculating a probability p that the topic number of a given lyrics data S.sub.ar is k, based on a number R.sub.ak of lyrics data other than a lyrics data S.sub.ar for a given artist a, to which the topic number k is assigned and a number N.sub.kv of times that the topic number k is assigned to the word v in the plurality of lyrics data of the plurality of artists except the given lyrics data S.sub.ar, calculating a probability distribution over topic numbers of the given lyrics data S.sub.ar, based on the calculated probability p, next performing an operation of updating topic numbers to update the topic number assigned to the given lyrics data S.sub.ar of the given artist a using a random number generator having a deviation of appearance probability corresponding to the probability distribution over topic numbers, and performing an operation of updating and learning topic numbers on all of the plurality of lyrics data of each of the plurality of artists for a predetermined number of times; and
(107) an outputting step of identifying the topic numbers of each of the plurality of lyrics data and the probability distributions over words for each of the topic numbers, based on learning results obtained in the step of learning topic numbers.
(108) (2) The method for generating topic inference information of lyrics as described in (1) further comprises:
(109) a step of learning values of switch variables, wherein a value of the switch variable is assigned to each of the plurality of words included in the plurality of lyrics data of each of the plurality of artists in a random or arbitrary manner; then a probability distribution A.sub.a over values of switch variables is generated by calculating a probability whether the value of the switch variable x assigned to the given word v.sub.arj is a topic word or a background word, based on values of switch variables assigned to the plurality of words in the plurality of lyrics data of the given artist a; next an operation of updating switch variables is performed to update the value of the switch variable assigned to the given word using a random number generator having a deviation of appearance probability corresponding to the probability distribution over values of the switch variables; and the operation of updating and learning values of switch variables, which performs the operation of updating values of switch variables on all of the plurality of words included in the plurality of lyrics data of each of the plurality of artists, is performed for a predetermined number of times.
(110) (3) The method for generating topic inference information of lyrics as described in (1), wherein:
(111) in the step of learning topic numbers, it is assumed that topic numbers assigned to all of the plurality of lyrics but the topic number assigned to the given lyrics data of the given artist are correct when generating the probability distribution over topic numbers.
(112) (4) The method for generating topic inference information of lyrics as described in (2), wherein:
(113) in the step of learning values of switch variables, it is assumed that values of switch variables assigned to all of words but the value of the switch variable x assigned to the given word in the plurality of words of the given lyrics data of the given artist are correct when performing the operation of updating switch variables.
(114) (5) The method for generating topic inference information of lyrics as described in (1), wherein the step of learning topic numbers:
(115) calculates a first probability p.sub.1 that the topic number of the given lyrics data S.sub.ar is k, based on the number R.sub.ak of lyrics data other than the given lyrics data S.sub.ar of the given artist a when generating a probability distribution over topic numbers;
(116) calculates a second probability p.sub.2 that the topic number of the given lyrics data S.sub.ar is k, based on the number N.sub.kv of times that the topic number k is assigned to the word v in the plurality of lyrics data of the plurality of artists other than the given lyrics data S.sub.ar;
(117) calculates the probability p that the topic number of the given lyrics data S.sub.ar is k, from the first probability p.sub.1 and the second probability p.sub.2; and
(118) determines a probability distribution over topic numbers of the given lyrics data S.sub.ar by performing the above-identified calculations on all of the topic numbers and normalizing probabilities that the topic number of the given lyrics data S.sub.ar is any one of 1 to K such that normalized probabilities sum up to 1 (one).
(119) (6) The method for generating topic inference information of lyrics as described in (1), wherein the outputting step is configured to output a probability distribution over words for each topic number, based on the number N.sub.kv of times that the topic number k is assigned to a given word v as used in the step of calculating the second probability p.sub.2.
(7) The method for generating topic inference information of lyrics as described in (6), wherein:
(120) in the outputting step, an occurrence probability θ.sub.kv of a word v to which the topic number k is assigned is calculated as follows:
θ.sub.kv=(N.sub.kv+β)/(N.sub.k+β|V|)
(121) where N.sub.kv denotes a number of times that a topic number k is assigned to a given word v, N.sub.k denotes a number of all of words to which the topic number k is assigned, β denotes a smoothing parameter, and |V| denotes a number of kinds of words.
(122) (8) The method for generating topic inference information of lyrics as described in (2), wherein the step of learning values of switch variables:
(123) calculates a third probability p.sub.3 that the value of switch variable of the word v.sub.arj is 0 (zero), based on a number N.sub.a0 of words to which a value of 0 (zero) is assigned as the value of the switch variable in all of lyric data of all of songs of the given artist a;
(124) calculates a fourth probability p.sub.4 that the value of the switch variable of the word v.sub.arj is 0 (zero), based on a number Nz.sub.arv.sub.arj of times that 0 (zero) is assigned to the value of the switch variable of the word v.sub.arj in all of songs of all of artists to which the same topic number z.sub.ar as the lyrics including the word v.sub.arj is assigned;
(125) calculates a fifth probability p.sub.5 that the value of the switch variable is 0 (zero) from the third probability p.sub.3 and the fourth probability p.sub.4;
(126) calculates a sixth probability p.sub.6 that the value of the switch variable of the word v.sub.arj is 1 (one), based on a number N.sub.a1 of times that 1 (one) is assigned as the value of the switch variable in the plurality of lyrics data of the given artist;
(127) calculates a seventh probability p.sub.7 that the value of the switch variable of the word v.sub.arj is 1 (one), based on a number N.sub.1varj of times that 1 (one) is assigned as the value of the switch variable of the word v.sub.arj in the plurality of lyrics data of the plurality of artists;
(128) calculates an eighth probability p.sub.8 that the value of switch variable is 1 (one) from the sixth probability p.sub.6 and the seventh probability p.sub.7; and
(129) normalizes the fifth probability p.sub.5 and the eighth probability p.sub.8 such that a sum of the probability that the value of the switch variable of the word v.sub.arj is 0 (zero) and the probability that the value of the switch variable of the word v.sub.arj is 1 (one) is 1 (one), to obtain a probability distribution over values of switch variables.
(130) (9) The method for generating topic inference information of lyrics as described in (1), wherein:
(131) the topic number of each of the plurality of lyrics data identified in the outputting step is a topic number that is last assigned to each of the plurality of lyrics data after the operation of updating and learning topic numbers is performed for a predetermined number of times in the step of learning topic numbers.
(132) (10) The method for generating topic inference information of lyrics as described in (1) or (2), further comprises:
(133) a first step of generating a word probability distribution over words included in lyrics data of a new song s of an artist that has not been used in learning;
(134) a second step of generating word probability distributions over words included respectively in lyrics data of the plurality of songs of the plurality of artists;
(135) a step of computing similarities, respectively obtaining similarities between the probability distribution over words included in the lyrics data of the new song s as calculated by the first step of generating a word probability distribution and the probability distributions over words included in the lyrics data of the plurality of songs as calculated by the second step of generating word probability distributions;
(136) a step of generating a weight distribution by adding, for each of the lyrics data of the plurality of songs, the similarity of that lyrics data as a weight to the topic number assigned to that lyrics data; and
(137) a step of determining a topic number, determining a topic number having a largest weight as the topic number of the lyrics data of the new song s.
(138) (11) The method for generating topic inference information of lyrics as described in (10), further comprises:
(139) a third step of generating a word probability distribution over words included in the lyrics data of all of the songs of the artist that have not been used in learning and for which an occurrence probability of background words is to be calculated;
(140) a fourth step of generating word probability distributions over words in the lyrics data of all of the songs of each of the artists;
(141) a fifth step of generating a probability distribution over background words included in the lyrics data of all of songs of each of the artists;
(142) a step of computing similarities, respectively obtaining similarities between the probability distribution over words included in the lyrics data of the new song s as calculated by the third step of generating a word probability distribution and the probability distributions over words included in the lyrics data of the plurality of songs as calculated by the fourth step of generating word probability distributions; and
(143) a step of generating an occurrence probability distribution over background words, multiplying the respective probability distributions over the background words included in the lyrics data of all of the songs of each of the artists as calculated by the fifth step of generating a word probability distribution by the similarities of each of the artists as computed by the step of computing similarities to obtain probability distributions, and normalizing the obtained probability distributions such that the weights for each of the artists sum up to 1 (one), and then determining a resulting probability distribution as the occurrence probability distribution over background words.
(144) (12) A computer program for implementing the steps of the method for generating topic inference information of lyrics as described in any one of (1) to (11) using a computer.
(145) (13) The computer program for generating topic inference information of lyrics as described in (12) is recorded in a computer-readable medium.
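The step of learning topic numbers in items (1) and (5) can be illustrated with the following Python sketch. It is a sketch under explicit assumptions rather than the claimed computation: the items state only that p.sub.1 depends on R.sub.ak and p.sub.2 on N.sub.kv, so the Dirichlet-multinomial form used here, the smoothing parameters `alpha` and `beta`, the function names, and the 0-based topic indexing (index 0 stands for topic number 1) are all assumed for illustration.

```python
import math
import random
from collections import Counter


def topic_distribution(words, R_a, N_kv, N_k, vocab_size, alpha=0.1, beta=0.01):
    """Return normalized probabilities over topic indices for one lyrics data.

    words      : list of words of the given lyrics data S_ar
    R_a[k]     : number of the artist's other lyrics data assigned topic k
    N_kv[k]    : Counter of word counts under topic k, excluding S_ar
    N_k[k]     : total number of words under topic k, excluding S_ar
    vocab_size : number of kinds of words, |V|
    """
    counts = Counter(words)
    n = len(words)
    log_p = []
    for k in range(len(R_a)):
        # Assumed first factor (p1): how often the artist uses topic k.
        lp = math.log(R_a[k] + alpha)
        # Assumed second factor (p2): how well the words fit topic k,
        # computed in log space with a Dirichlet-multinomial form.
        lp += math.lgamma(N_k[k] + beta * vocab_size)
        lp -= math.lgamma(N_k[k] + n + beta * vocab_size)
        for v, c in counts.items():
            lp += math.lgamma(N_kv[k][v] + c + beta)
            lp -= math.lgamma(N_kv[k][v] + beta)
        log_p.append(lp)
    # Normalize so that the probabilities over all topics sum to 1.
    peak = max(log_p)
    unnormalized = [math.exp(x - peak) for x in log_p]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def sample_topic(probs, rng=random):
    """Draw a topic index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In a full learning loop, the counts for the given lyrics data would be removed before calling `topic_distribution` and added back under the sampled topic afterwards, and this update would be repeated over all lyrics data of all artists for the predetermined number of times.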
INDUSTRIAL APPLICABILITY
(146) According to the present invention, once an arbitrary number of topics is determined, the topic number of each of a plurality of lyrics data is identified as the topic number last updated by the means for learning topic numbers. Once the topic number of each of the lyrics data is known, a word probability distribution can be obtained for each topic number. This eliminates the need to manually specify a collection of words related to the topics and a collection of unrelated words. Further, once a plurality of words having a high occurrence probability have been identified, reliable information for determining the topics can be obtained from these words, thereby yielding the likely meaning of the topic of each of the lyrics.
DESCRIPTION OF REFERENCE SIGNS
(147) 1, 101 System for generating topic inference information of lyrics
3, 103 Lyrics database
5, 105 Means for obtaining topic numbers
7, 107 Means for generating topic numbers
9, 109 Means for learning topic numbers
11, 111 Analysis means
13, 113 Outputting means
115 Means for learning values of switch variables