Audio processing method and audio processing device for expanding or compressing audio signals
10891966 · 2021-01-12
Abstract
An audio processing device includes a feature extraction unit and a signal generating unit. The feature extraction unit is configured to extract a feature quantity of a first audio signal for each of a plurality of periods. The signal generating unit is configured to generate a second audio signal by time axis expanding/compressing either a section of the first audio signal in which the feature quantity is steadily maintained for a period of time, or a section of the first audio signal in which a fluctuation of the feature quantity is repeated, and excluding from the time axis expanding/compressing a section of the first audio signal in which a fluctuation of the feature quantity is not similar to that of other sections of the first audio signal.
Claims
1. An audio processing method comprising: extracting a feature quantity of a first audio signal for each of a plurality of first periods; calculating a similarity index of the feature quantity between each of the plurality of first periods; executing a time correspondence process for making each one of the plurality of first periods substantially equal to a corresponding one of a plurality of second periods within a target period after expansion/compression of the first audio signal, in accordance with the similarity index and a transition cost for transitioning between each of the plurality of first periods, in the time correspondence process, a minimum value of an allocation cost immediately preceding one of the plurality of second periods being sequentially calculated as a basic cost for each of the plurality of second periods, and each of the plurality of first periods being made substantially equal to the corresponding one of the plurality of second periods so as to minimize the allocation cost in accordance with the basic cost of the immediately preceding one of the plurality of second periods, the similarity index, and the transition cost; and generating a second audio signal over the target period from a result obtained by making each one of the plurality of first periods substantially equal to the corresponding one of the plurality of second periods.
2. The audio processing method according to claim 1, wherein in the time correspondence process, the transition cost between two first periods from among the plurality of first periods is set to a first value when a time difference between the two first periods is below a threshold value and is set to a second value that is greater than the first value when the time difference exceeds the threshold value.
3. The audio processing method according to claim 1, wherein in the time correspondence process, the basic cost is set for each of the plurality of second periods such that each of the plurality of first periods within a prescribed range is made substantially equal to the corresponding one of the plurality of second periods based on a provisional relationship between each of the plurality of first periods and each of the plurality of second periods.
4. The audio processing method according to claim 3, wherein the provisional relationship is a linear relationship.
5. The audio processing method according to claim 3, wherein the provisional relationship is a curvilinear relationship.
6. The audio processing method according to claim 1, wherein in the time correspondence process, the basic cost is set such that one of the plurality of first periods corresponding to a sound generation point of the first audio signal, and one of the plurality of second periods corresponding to the sound generation point based on a provisional relationship between each of the plurality of first periods and each of the plurality of second periods, correspond to each other.
7. The audio processing method according to claim 6, wherein the provisional relationship is a linear relationship.
8. The audio processing method according to claim 6, wherein the provisional relationship is a curvilinear relationship.
9. The audio processing method according to claim 1, wherein in the time correspondence process, the transition cost to be applied to the time correspondence process is specified from a transition matrix whose elements are transition costs that correspond to combinations of the plurality of first periods.
10. The audio processing method according to claim 1, wherein in the time correspondence process, the transition cost to be applied to the time correspondence process is specified from a transition vector that corresponds to one column of a transition matrix whose elements are transition costs that correspond to combinations of each of the plurality of first periods.
11. An audio processing device comprising: an electronic controller having a feature extraction unit, an index calculation unit, an analysis processing unit and a signal generating unit, the feature extraction unit being configured to extract a feature quantity of a first audio signal for each of a plurality of first periods; the index calculation unit being configured to calculate a similarity index of the feature quantity between each of the plurality of first periods; the analysis processing unit being configured to make each of the plurality of first periods substantially equal to a corresponding one of a plurality of second periods within a target period after expansion/compression of the first audio signal in accordance with the similarity index and a transition cost for transitioning between each of the plurality of first periods, the analysis processing unit being configured to sequentially calculate a minimum value of an allocation cost immediately preceding one of the plurality of second periods as a basic cost for each of the plurality of second periods, and configured to make each of the plurality of first periods substantially equal to the corresponding one of the plurality of second periods so as to minimize the allocation cost in accordance with the basic cost of the immediately preceding one of the plurality of second periods, the similarity index, and the transition cost; and the signal generating unit being configured to generate a second audio signal over the target period from a result obtained upon the analysis processing unit making each of the plurality of first periods substantially equal to the corresponding one of the plurality of second periods.
12. The audio processing device according to claim 11, wherein the analysis processing unit is configured to set the basic cost for each of the plurality of second periods such that each of the plurality of first periods within a prescribed range is made substantially equal to the corresponding one of the plurality of second periods based on a provisional relationship between each of the plurality of first periods and each of the plurality of second periods.
13. The audio processing device according to claim 12, wherein the provisional relationship is a linear relationship.
14. The audio processing device according to claim 12, wherein the provisional relationship is a curvilinear relationship.
15. The audio processing device according to claim 11, wherein the analysis processing unit is configured to set the basic cost such that one of the plurality of first periods corresponding to a sound generation point of the first audio signal, and one of the plurality of second periods corresponding to the sound generation point based on a provisional relationship between each of the plurality of first periods and each of the plurality of second periods, correspond to each other.
16. The audio processing device according to claim 15, wherein the provisional relationship is a linear relationship.
17. The audio processing device according to claim 15, wherein the provisional relationship is a curvilinear relationship.
Description
DETAILED DESCRIPTION OF THE EMBODIMENTS
First Embodiment
(11) Selected embodiments will now be explained with reference to the drawings. It will be apparent to those skilled in the audio processing field from this disclosure that the following descriptions of the embodiments are provided for illustration only and not for the purpose of limiting the invention as defined by the appended claims and their equivalents.
(12)
(13) A program that is executed by the electronic controller 12 and various data that are used by the electronic controller 12 are stored in the storage device 14. The storage device 14 is any computer storage device or any computer readable medium with the sole exception of a transitory, propagating signal. The storage device 14 can include nonvolatile memory and volatile memory. For example, the storage device 14 can include a ROM (Read Only Memory) device, a RAM (Random Access Memory) device, a hard disk, a flash drive, etc. Thus, any known storage medium, such as a magnetic storage medium or a semiconductor storage medium, or a combination of a plurality of types of storage media can be freely employed as the storage device 14. An audio signal x.sub.A (example of a first audio signal) that represents various sounds such as musical sounds, voice, and the like is stored in the storage device 14 of the first embodiment. It is also possible, for example, to supply an audio signal x.sub.A to the audio processing device 100 from a reproduction device that reproduces the audio signal x.sub.A that is stored in a storage medium, such as an optical disc.
(14) The electronic controller 12 is formed of one or more semiconductor chips that are mounted on a printed circuit board. The term electronic controller as used herein refers to hardware that executes software programs. The electronic controller 12 includes a processing circuit such as a CPU (Central Processing Unit) having at least one processor that comprehensively controls each element of the audio processing device 100. As is illustrated in
(15) The input device 16 is a user operable input device that receives instructions from a user. For example, a plurality of operators or a touch panel can be suitably used as the input device 16. By appropriately operating the input device 16, the user can arbitrarily set the expansion/compression ratio α. The expansion/compression ratio α is a time ratio of the audio signal x.sub.B relative to the audio signal x.sub.A. That is, as illustrated in
(16) As illustrated in
(17) The feature extraction unit 22 extracts a feature quantity F relating to the acoustic characteristics of the audio signal x.sub.A. As illustrated in
(18) The index calculation unit 24 calculates similarity indices R.sub.n,m of the feature quantities F between each of the K periods U.sub.A of the audio signal x.sub.A. The index calculation unit 24 of the first embodiment generates a similarity matrix MR such as that illustrated in
(19) The analysis processing unit 26 makes one of the K periods U.sub.A of the audio signal x.sub.A correspond to each of a plurality (Q) of periods U.sub.B within a target period of
(20) The signal generating unit 28 generates an audio signal x.sub.B over the target period from the result (indices Z.sub.1 to Z.sub.Q) of the analysis processing unit 26 making the period U.sub.A correspond to each of the Q periods U.sub.B. Briefly, the audio signal x.sub.B over the target period is generated by arranging the period U.sub.A specified by one arbitrary index Z.sub.q from among the K periods U.sub.A of the audio signal x.sub.A over the Q periods U.sub.B.
(21) Specifically, the signal generating unit 28 generates the complex spectra X.sub.B1 to X.sub.BQ of the audio signal x.sub.B for each period U.sub.B from the complex spectra X.sub.A1 to X.sub.AK of each period U.sub.A of the audio signal x.sub.A, converts each of the plurality of complex spectra X.sub.B1 to X.sub.BQ into the time domain by an inverse Fourier transform and then interconnects them, thereby generating an audio signal x.sub.B. The complex spectrum X.sub.Bq of the audio signal x.sub.B in one arbitrary period U.sub.B, for example, can be expressed by the following formula (1).
Formula 1
$$X_{Bq} = |X_{AZ_q}| \angle \left(\arg X_{B(q-1)} + \delta_q\right)$$
$$X_{B1} = X_{AZ_1}$$
$$\delta_q = \arg\left(X_{AZ_q}\right) - \arg\left(X_{A(Z_q-1)}\right) \qquad (1)$$
(22) That is, the complex spectrum X.sub.Bq of the qth period U.sub.B of the audio signal x.sub.B is made up of the amplitude spectrum |X.sub.AZq| of the period U.sub.A of the audio signal x.sub.A specified by the index Z.sub.q and the phase spectrum obtained by adding the phase difference δ.sub.q to the phase angle arg X.sub.Bq-1 of the immediately preceding ((q-1)th) period U.sub.B. The phase difference δ.sub.q is the difference between the phase angle arg(X.sub.AZq) for the period U.sub.A of the audio signal x.sub.A specified by the index Z.sub.q and the phase angle arg(X.sub.AZq-1) of the period U.sub.A immediately preceding it. That is, the signal generating unit 28 of the first embodiment generates the complex spectrum X.sub.Bq of the audio signal x.sub.B by using a phase vocoder technique. However, the method for generating an audio signal x.sub.B corresponding to the processing result by the analysis processing unit 26 is not limited to the example described above. For example, it is also possible to generate an audio signal x.sub.B by using an audio processing technique such as PSOLA (Pitch Synchronous Overlap and Add), or the like.
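The phase propagation described for formula (1) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `synthesize_spectra`, the 0-based indexing, and the handling of an index that points at the very first source period (where no preceding period exists, so a zero phase advance is assumed) are all hypothetical choices made for the example.

```python
import cmath


def synthesize_spectra(XA, Z):
    """Build output spectra X_B from source spectra X_A and indices Z.

    XA : list of K spectra, each a list of complex bins (0-based periods).
    Z  : list of Q source-period indices, one per output period.
    """
    XB = [list(XA[Z[0]])]                      # X_B1 = X_A,Z1
    for q in range(1, len(Z)):
        z = Z[q]
        # Assumption: when z is the first source period there is no
        # predecessor, so the phase advance is taken to be zero.
        prev_src = XA[z - 1] if z > 0 else XA[z]
        frame = []
        for k, value in enumerate(XA[z]):
            # Phase advance between consecutive source periods.
            delta = cmath.phase(value) - cmath.phase(prev_src[k])
            # Accumulate it on the phase of the previous output period.
            phase = cmath.phase(XB[q - 1][k]) + delta
            # Amplitude comes from the selected source period.
            frame.append(cmath.rect(abs(value), phase))
        XB.append(frame)
    return XB
```

When the indices simply run 0, 1, 2, ... the output reproduces the source spectra up to rounding, which is a convenient sanity check; repeated indices keep the amplitude of the repeated period while the phase keeps advancing.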
(23) The specific operation of the analysis processing unit 26 will now be described.
(24) The analysis processing unit 26 calculates a basic cost C.sub.n,q for each period U.sub.A of the audio signal x.sub.A for each of the Q periods U.sub.B within the target period (S31). The basic cost C.sub.n,q is calculated for each combination of each of the K periods U.sub.A and each of the Q periods U.sub.B. As illustrated in
(25)
(26) As can be understood from formula (2), the allocation cost Γ.sub.q-1,n,m that is used for calculating the basic cost C.sub.n,q that corresponds to the qth period U.sub.B and the nth period U.sub.A is the sum of the basic cost C.sub.m,q-1 of the immediately preceding period U.sub.B, the similarity index R.sub.n-1,m, and the transition cost T.sub.n,m. The similarity index R.sub.n-1,m is the distance of the feature quantity F between the (n-1)th period U.sub.A of the audio signal x.sub.A and an arbitrary (mth) period U.sub.A of the audio signal x.sub.A. Therefore, the allocation cost Γ.sub.q-1,n,m becomes a smaller numerical value and becomes more likely to be selected as the basic cost C.sub.n,q, as the feature quantities F become more similar between the (n-1)th period U.sub.A and the mth period U.sub.A of the audio signal x.sub.A.
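Written out from the description above, the recurrence for the basic cost presumably takes the following form. This is a reconstruction from the prose rather than a copy of the original formula; the symbol Γ for the allocation cost and the range of the minimization over m are assumptions.

$$
C_{n,q} = \min_{1 \le m \le K} \Gamma_{q-1,n,m}, \qquad
\Gamma_{q-1,n,m} = C_{m,q-1} + R_{n-1,m} + T_{n,m}
$$

Read this way, simply advancing from period m = n-1 costs nothing beyond the transition cost (R.sub.n-1,n-1 is a distance of zero), while repeating period n costs the distance between periods n-1 and n, which is small only where the feature quantity is steady.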
(27) The transition cost T.sub.n,m is the cost when transitioning from the nth period U.sub.A to an arbitrary (mth) period U.sub.A of the audio signal x.sub.A. Specifically, as shown in
(28) If there is a jump in the audio signal x.sub.B to a period U.sub.A (the mth) that is separated from the nth period U.sub.A of the audio signal x.sub.A on the time axis, then the reproduced audio signal x.sub.B sounds unnatural. Therefore, the analysis processing unit 26 sets the transition cost T.sub.n,m for a transition from the nth period U.sub.A to a period U.sub.A that is before time t.sub.1, which is earlier than the nth period U.sub.A by a threshold τ.sub.1 (n-τ.sub.1>m), to a numerical value λ.sub.H. Similarly, the analysis processing unit 26 sets the transition cost T.sub.n,m for a transition from the nth period U.sub.A to a period U.sub.A that is after time t.sub.2, which is later than the nth period U.sub.A by a threshold τ.sub.2 (n+τ.sub.2<m), to the numerical value λ.sub.H. The numerical value λ.sub.H is a sufficiently large numerical value (for example, λ.sub.H=∞). Therefore, the allocation cost Γ.sub.q-1,n,m that corresponds to a transition from the nth period U.sub.A to a period before time t.sub.1, or the allocation cost Γ.sub.q-1,n,m that corresponds to a transition from the nth period to a period after time t.sub.2, is not selected as the basic cost C.sub.n,q. On the other hand, the transition cost T.sub.n,m for a transition from the nth period U.sub.A to a period between time t.sub.1, which is earlier than the nth period U.sub.A by the threshold τ.sub.1, and time t.sub.2, which is later than the nth period U.sub.A by the threshold τ.sub.2 (n-τ.sub.1≤m≤n+τ.sub.2), is set to a numerical value λ.sub.L. The numerical value λ.sub.L is a numerical value that is sufficiently less than the numerical value λ.sub.H (for example, zero). That is, a transition within a prescribed range with respect to the nth period U.sub.A is permitted. The setting of the transition cost T.sub.n,m illustrated above can be expressed by the following formula (3).
(29)
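The banded transition cost described above can be tabulated as a matrix. The sketch below is illustrative only: `transition_matrix`, `back`, `ahead`, `low`, and `high` are hypothetical names for the two thresholds and the two cost values, and periods are numbered from 0 for convenience.

```python
import math


def transition_matrix(K, back, ahead, low=0.0, high=math.inf):
    """Return a K x K table T with T[n][m] = low inside the band
    n - back <= m <= n + ahead and high everywhere else."""
    return [
        [low if n - back <= m <= n + ahead else high for m in range(K)]
        for n in range(K)
    ]


# Example: 5 periods, up to 1 period back and 2 periods ahead are permitted.
MT = transition_matrix(5, back=1, ahead=2)
```

Because each entry depends only on the difference m - n, a single row or column carries the same information as the full table, which is one way to read the transition vector mentioned in claim 10.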
(30) In addition to the calculation of the basic cost C.sub.n,q illustrated above, the analysis processing unit 26 of the first embodiment calculates a candidate index I.sub.n,q by using the following recurrence formula (4) (S32).
(31)
(32) That is, the analysis processing unit 26 calculates a variable m that minimizes the allocation cost Γ.sub.q-1,n,m as a candidate index I.sub.n,q of the qth period U.sub.B. Specifically, a variable m that corresponds to the minimum value of K allocation costs Γ.sub.q-1,n,1 to Γ.sub.q-1,n,K, calculated for the immediately preceding ((q-1)th) period U.sub.B and corresponding to different periods U.sub.A, is adopted as the candidate index I.sub.n,q of the period U.sub.B.
(33) Then, as is expressed by the following formula (5), the analysis processing unit 26 sets an index Z.sub.Q at the end (Qth) of the target period to the number K of the period U.sub.A that is positioned at the end of the audio signal x.sub.A, and, by tracking back the candidate indices I.sub.n,q (backtracking) toward the front of the time axis therefrom, sets an index Z.sub.q for each of the Q periods U.sub.B within the target period (S33).
(34)
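Steps S31 to S33 together amount to a dynamic-programming search followed by backtracking. The following Python sketch shows one way this could look under stated assumptions; it is not the patented implementation. The function name `time_correspondence`, the parameters `back` and `ahead`, the 0-based indexing, the initial condition (the output starts on the first source period), and the rule that the first source period cannot be re-entered (it has no predecessor n-1) are all assumptions introduced for the example.

```python
import math

INF = math.inf


def time_correspondence(R, Q, back, ahead, low=0.0, high=INF):
    """Assign one of K source periods (0-based) to each of Q output periods.

    R     : K x K matrix of feature distances between source periods.
    back  : how far before n the previously used period m may lie.
    ahead : how far after n the previously used period m may lie.
    """
    K = len(R)
    C = [[INF] * K for _ in range(Q)]   # C[q][n]: basic cost
    I = [[-1] * K for _ in range(Q)]    # I[q][n]: candidate index (best m)
    C[0][0] = 0.0                       # assumption: start on the first period

    for q in range(1, Q):
        for n in range(1, K):           # assumption: n = 0 has no predecessor
            for m in range(K):
                t = low if n - back <= m <= n + ahead else high
                cost = C[q - 1][m] + R[n - 1][m] + t   # allocation cost
                if cost < C[q][n]:
                    C[q][n], I[q][n] = cost, m

    if C[Q - 1][K - 1] == INF:
        raise ValueError("no admissible path ends on the last source period")

    Z = [0] * Q
    Z[Q - 1] = K - 1                    # the last output period uses the last source period
    for q in range(Q - 2, -1, -1):      # backtrack through the candidate indices
        Z[q] = I[q + 1][Z[q + 1]]
    return Z


# Toy signal: a transient, a steady section of three periods, a transient.
features = [0.0, 5.0, 5.0, 5.0, 9.0]
R = [[abs(a - b) for b in features] for a in features]
Z = time_correspondence(R, Q=7, back=2, ahead=2)
```

In this toy case the five source periods are stretched to seven output periods, and the extra periods are drawn from the steady section while the two transient periods at the ends are each used once. Several zero-cost paths exist here, so the exact sequence depends on tie-breaking.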
(35)
(36) When the time axis expansion/compression process is started, the feature extraction unit 22 extracts a feature quantity F for each period U.sub.A of the audio signal x.sub.A stored in the storage device 14 (S1). The index calculation unit 24 calculates similarity indices R.sub.n,m of the feature quantities F extracted by the feature extraction unit 22 between each of the K periods U.sub.A of the audio signal x.sub.A (S2).
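Step S2 can be illustrated with a small distance-matrix routine. The source states only that the similarity index is a distance between feature quantities; the use of the Euclidean distance, the function name `similarity_matrix`, and the representation of each feature quantity as a list of numbers are assumptions made for this sketch.

```python
import math


def similarity_matrix(features):
    """Return the K x K matrix R where R[n][m] is the Euclidean distance
    between the feature vectors of periods n and m (smaller = more similar)."""
    return [[math.dist(fn, fm) for fm in features] for fn in features]


# Three periods with 2-dimensional features; the last two are identical.
R = similarity_matrix([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
```

The matrix is symmetric with a zero diagonal, and identical periods (such as successive frames of a sustained note) give entries of zero.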
(37) The analysis processing unit 26 makes the period U.sub.A correspond to each of the Q periods U.sub.B within the target period by using the time correspondence process S3 (S31-S33) described above with reference to
(38)
(39) In addition, because the allocation cost Γ.sub.q-1,n,m of the first embodiment is calculated according to the transition cost T.sub.n,m from the nth period U.sub.A to the mth period U.sub.A, a transition between two periods U.sub.A that widely diverge from each other on the time axis is restricted. From the above point of view as well, it is possible to realize the above-described effect of being able to expand/compress the audio signal x.sub.A while maintaining auditory naturalness. In the first embodiment in particular, the transition cost T.sub.n,m is set to the numerical value λ.sub.L (example of a first value) when the time difference between the nth period U.sub.A and the mth period U.sub.A is below a threshold value (n-τ.sub.1≤m≤n+τ.sub.2), and the transition cost T.sub.n,m is set to the numerical value λ.sub.H (example of a second value) when the time difference exceeds the threshold value (n-τ.sub.1>m, n+τ.sub.2<m). That is, the transition between two periods U.sub.A of the audio signal x.sub.A is constrained within a prescribed range. Therefore, the above-described effect of being able to expand/compress audio signals while maintaining auditory naturalness is particularly pronounced.
Second Embodiment
(40) The second embodiment of the present invention will now be described. In each of the embodiments illustrated below, elements that have the same actions or functions as in the first embodiment have been assigned the same reference symbols as those used to describe the first embodiment, and detailed descriptions thereof have been appropriately omitted.
(41) In the second embodiment, as well as in the third embodiment, which is described below, a provisional correspondence relationship (hereinafter referred to as "provisional relationship") is set between each of the periods U.sub.A of the audio signal x.sub.A and each of the periods U.sub.B of the audio signal x.sub.B, and an index Z.sub.q is set for each of the periods U.sub.B within the target period so as to not excessively deviate from the provisional relationship. As illustrated in
(42)
(43) As can be understood from formula (6), under the provisional relationship, the Kth period U.sub.A of the audio signal x.sub.A corresponds to the Qth period U.sub.B (q=Q=αK) (A.sub.Q=K). It can also be said that the provisional relationship of the second embodiment is a correspondence relationship between each period U.sub.A and each period U.sub.B when the audio signal x.sub.A is uniformly expanded/compressed over all the sections to generate the audio signal x.sub.B.
(44) In the second embodiment, the basic cost C.sub.n,q is set such that the relationship between each period U.sub.A and each period U.sub.B specified by the index Z.sub.q does not deviate widely from the provisional relationship of formula (6). Specifically, the analysis processing unit 26 sets the basic cost C.sub.n,q by means of the following formula (7).
Formula 7
$$C_{n,q} = \lambda_H \quad \text{if } |A_q - n| > \tau_{TH} \qquad (7)$$
(45) As can be understood from formula (7), of K basic costs C.sub.1,q to C.sub.K,q that are calculated for the qth period U.sub.B, a basic cost C.sub.n,q that is outside of a prescribed range (hereinafter referred to as "allowable range") that corresponds to the period U.sub.B on the basis of the provisional relationship of formula (6) is set to the numerical value λ.sub.H. As is illustrated in
(46) As can be understood from the description above, in the second embodiment, the basic cost C.sub.n,q is set such that a period U.sub.A within an allowable range defined by the provisional relationship of formula (6) corresponds to the qth period U.sub.B. Thus, it is possible to generate the audio signal x.sub.B within a range that does not deviate widely from the provisional relationship between each period U.sub.A and each period U.sub.B.
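The allowable-range constraint of the second embodiment can be sketched as a mask applied to the K basic costs of one output period. Formula (6) is not reproduced in this text, so the linear form used in `provisional_index` (A_q = qK/Q, rounded to the nearest period and clamped to at least 1) is an assumption chosen only so that A_Q = K; `apply_allowable_range` and `width` are likewise hypothetical names.

```python
import math


def provisional_index(q, K, Q):
    """Assumed linear provisional relationship: 1-based source period A_q
    for the 1-based output period q, chosen so that A_Q == K."""
    return max(1, round(q * K / Q))


def apply_allowable_range(costs_q, q, K, Q, width, high=math.inf):
    """Return a copy of the K basic costs for output period q in which every
    source period n with |A_q - n| > width is set to the large value."""
    a_q = provisional_index(q, K, Q)
    return [
        c if abs(a_q - n) <= width else high
        for n, c in enumerate(costs_q, start=1)
    ]


# Example: K = 4 source periods stretched to Q = 8 output periods.
masked = apply_allowable_range([1.0, 2.0, 3.0, 4.0], q=4, K=4, Q=8, width=1)
```

Costs inside the allowable range are left as computed by the recurrence; only those outside it are overwritten, which keeps the resulting indices near the uniform stretch.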
Third Embodiment
(47)
(48) Specifically, the analysis processing unit 26 sets the basic cost C.sub.n,q as in formula (8) below with respect to a period U.sub.B corresponding to a sound generation point t.sub.A of the audio signal x.sub.A under the provisional relationship (that is, the period U.sub.B in which A.sub.q=t.sub.A).
(49)
(50) As can be understood from formula (8) and formula (10), of K basic costs C.sub.1,q to C.sub.K,q that are calculated for the qth period U.sub.B corresponding to the sound generation point t.sub.A under the provisional relationship, a basic cost C.sub.n,q of the one period U.sub.A in which the sound generation point t.sub.A exists (n=A.sub.q) is set to the numerical value λ.sub.L. On the other hand, the basic cost C.sub.n,q of a period U.sub.A in which the sound generation point t.sub.A does not exist (n≠A.sub.q) is set to a numerical value λ.sub.H, which sufficiently exceeds the numerical value λ.sub.L. The numerical value λ.sub.L is, for example, set to zero (λ.sub.L=0), and the numerical value λ.sub.H is, for example, set to infinity (λ.sub.H=∞).
(51) According to the configuration above, with respect to a period U.sub.B corresponding to the sound generation point t.sub.A under the provisional relationship, only the number n of the period U.sub.A that corresponds to said sound generation point t.sub.A from among the K periods U.sub.A is employed as the index Z.sub.q. Therefore, the time ratio between the sound generation points t.sub.A in the audio signal x.sub.A is also maintained in the audio signal x.sub.B. That is, according to the third embodiment, there is the benefit that it is possible to generate an audibly natural audio signal x.sub.B in which the rhythm of the generated sound remains equal to that of the audio signal x.sub.A. It is also possible to apply the configuration of the second embodiment to the third embodiment.
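The anchoring of sound generation points can be sketched as a one-hot cost vector for the affected output period. This is only an illustration of the rule described above: `onset_costs` is a hypothetical name, periods are numbered from 1 as in the text, and how the output period matching a sound generation point is located is left outside the sketch.

```python
import math


def onset_costs(K, a_q, low=0.0, high=math.inf):
    """Basic costs for an output period whose provisional source period a_q
    (1-based) contains a sound generation point: only n == a_q stays cheap."""
    return [low if n == a_q else high for n in range(1, K + 1)]


# Example: 4 source periods, the sound generation point lies in period 3.
costs = onset_costs(4, 3)
```

With the large value set to infinity, every path through that output period must use the anchored source period, so backtracking necessarily assigns it as the index there.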
Modifications
(52) Each of the embodiments exemplified above may be variously modified. Specific modified embodiments are illustrated below. Two or more embodiments arbitrarily selected from the following examples can be appropriately combined as long as they are not mutually contradictory.
(53) (1) In each of the above-described embodiments, the analysis processing unit 26 sets the transition cost T.sub.n,m with reference to the transition matrix MT illustrated in
(54) (2) In each of the above-described embodiments, all of the sections of the audio signal x.sub.A are expanded/compressed with a common expansion/compression ratio α; however, it is also possible to change the expansion/compression ratio α in real time at an arbitrary point in time of the audio signal x.sub.B. For example, a configuration is assumed in which the target period is divided into a plurality of unit sections on a time axis, and the time axis expansion/compression process of
(55) (3) In each of the above-described embodiments, a linear relationship is exemplified (formula (6)) as the provisional relationship between each period U.sub.A of the audio signal x.sub.A and each period U.sub.B of the audio signal x.sub.B; however, the provisional relationship is not limited to the example described above. For example, it is also possible to employ a curvilinear relationship (for example, A.sub.q=βq.sup.2) as the provisional relationship between each period U.sub.A and each period U.sub.B (where β is a prescribed positive number).
(56) (4) It is also possible to realize the audio processing device 100 with a server device that communicates with terminal devices (for example, mobile phones and smartphones) via a communication network such as a mobile communication network or the Internet. Specifically, the audio processing device 100 generates an audio signal x.sub.B by means of the time axis expansion/compression process illustrated in
(57) (5) The audio processing device 100 illustrated in each of the above-described embodiments is realized by cooperation between the electronic controller 12 and a program, as is illustrated in each of the above-described embodiments. A program according to a preferred aspect of the present invention causes a computer to function as a feature extraction unit 22 for extracting a feature quantity F of an audio signal x.sub.A for each of a plurality of periods U.sub.A; as an index calculation unit 24 for calculating a similarity index R.sub.n,m of the feature quantity F between each of the periods U.sub.A; as an analysis processing unit 26 for making one of the plurality of periods U.sub.A correspond to each of a plurality of periods U.sub.B within a target period such that an allocation cost Γ.sub.q-1,n,m corresponding to the similarity index R.sub.n,m between each period U.sub.A and a transition cost T.sub.n,m for transitioning between each period U.sub.A is minimized; and as a signal generating unit 28 for generating an audio signal x.sub.B over the target period from the result obtained when the analysis processing unit 26 causes the period U.sub.A to correspond to each of the plurality of periods U.sub.B.
(58) The program exemplified above can be stored on a computer-readable storage medium and installed in a computer. The storage medium is, for example, a non-transitory storage medium, a good example of which is an optical storage medium, such as a CD-ROM (optical disc), but may include well-known arbitrary storage medium formats, such as semiconductor storage media and magnetic storage media. Non-transitory storage media include any storage medium that excludes transitory propagating signals and does not exclude volatile storage media. Furthermore, it is also possible to deliver the program to a computer in the form of distribution via a communication network.
(59) (6) For example, the following configurations may be understood from the embodiments exemplified above.
Aspect 1
(60) An audio processing method according to a preferred aspect (Aspect 1) of the present invention comprises extracting a feature quantity of a first audio signal for each of a plurality of periods; and generating a second audio signal by time axis expanding/compressing either a section of the first audio signal in which the feature quantity is steadily maintained for a period of time, or a section of the first audio signal in which a fluctuation of the feature quantity is repeated, and excluding from the time axis expanding/compressing a section in which a fluctuation of the feature quantity is not similar to that of other sections. Thus, for example, compared with a configuration in which the first audio signal is uniformly expanded/compressed over all the sections including both a steady section in which the feature quantity is steadily maintained and a transient section in which the feature quantity fluctuates unsteadily, it is possible to expand/compress the audio signal while maintaining auditory naturalness.
Aspect 2
(61) An audio processing method according to a preferred aspect (Aspect 2) of the present invention comprises extracting a feature quantity of a first audio signal for each of a plurality of first periods; calculating a similarity index of the feature quantity between each of the plurality of first periods; executing a time correspondence process for making one of the plurality of first periods correspond to each of a plurality of second periods within a target period after expansion/compression of the first audio signal in accordance with the similarity index and a transition cost for transitioning between each of the plurality of first periods; and generating a second audio signal over the target period from a result obtained by making the plurality of first periods correspond to the plurality of second periods. In the aspect described above, a first period is made to correspond to each second period within the target period such that the allocation cost corresponding to the similarity index between each first period is minimized. That is, a section of the first audio signal in which the feature quantity is steadily maintained on the time axis or a section in which a fluctuation of the feature quantity is repeated (for example, one cycle of vibrato) is expanded/compressed on the time axis, and sections in which a fluctuation of the feature quantity does not resemble that of other sections (for example, a transient section in which the feature quantity fluctuates unsteadily, such as a glissando) are excluded as an object of expansion/compression. Thus, for example, compared to a configuration in which the first audio signal is uniformly expanded/compressed over all the sections including both a steady section in which the feature quantity is steadily maintained and a transient section in which the feature quantity fluctuates unsteadily, it is possible to expand/compress the audio signal while maintaining auditory naturalness.
In addition, a first period is made to correspond to each second period within the target period in accordance with the transition cost for transitioning between each of the first periods. Therefore, transitions between first periods that are widely divergent on the time axis are restricted. From the above point of view as well, it is possible to realize the above-described effect of being able to expand/compress the audio signal while maintaining auditory naturalness.
Aspect 3
(62) In a preferred example (Aspect 3) of Aspect 2, in the time correspondence process, one of the plurality of first periods is made to correspond to each of the plurality of second periods within the target period after expansion/compression of the first audio signal, such that an allocation cost corresponding to the similarity index and to the transition cost for transitioning between each of the plurality of first periods is reduced. In the aspect described above, a first period is made to correspond to each second period within the target period such that the allocation cost is reduced. Therefore, transitions between first periods that are widely divergent on the time axis are restricted.
Aspect 4
(63) In a preferred example (Aspect 4) of Aspect 3, in the time correspondence process, one of the plurality of first periods is made to correspond to each of the plurality of second periods within the target period after expansion/compression of the first audio signal, such that the allocation cost is minimized. In the aspect described above, a first period is made to correspond to each second period within the target period such that the allocation cost is minimized. Therefore, the effect of restricting transitions between first periods that are excessively divergent on the time axis is especially pronounced.
Aspect 5
(64) In a preferred example (Aspect 5) of any one of Aspects 2 to 4, in the time correspondence process, the transition cost between two first periods from among the plurality of first periods is set to a first value when a time difference between the two first periods is below a threshold value and is set to a second value that is greater than the first value when the time difference exceeds the threshold value. In the aspect described above, because the transition cost is set to the first value when the time difference between two first periods is below the threshold value, and is set to the second value that is greater than the first value when the time difference exceeds the threshold value, it is possible to constrain the transition between two first periods to within a prescribed range. Therefore, the above-described effect of being able to expand/compress audio signals while maintaining auditory naturalness is especially pronounced.
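The two-valued transition cost of Aspect 5 can be sketched as follows. The sketch assumes that the time difference is measured as the difference between period indices, that a difference exactly equal to the threshold value is treated as being within the threshold, and that the default first and second values (0.0 and 1e6) are arbitrary. The function name `transition_cost` and these choices are assumptions for illustration only.

```python
def transition_cost(i, j, threshold, first_value=0.0, second_value=1e6):
    """Cost of transitioning between first periods i and j.

    Returns first_value when the index difference is within the
    threshold, and the larger second_value otherwise.
    Assumption: a difference equal to the threshold counts as within it.
    """
    if second_value <= first_value:
        raise ValueError("second_value must be greater than first_value")
    return first_value if abs(i - j) <= threshold else second_value
```

With a large second value, a transition between widely separated first periods is in effect never selected by a cost-minimizing process, which is one way of constraining transitions to within a prescribed range.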
Aspect 6
(65) In a preferred example (Aspect 6) of any one of Aspects 2 to 5, in the time correspondence process, a minimum value of an allocation cost immediately preceding one of the plurality of second periods is sequentially calculated as a basic cost for each of the plurality of second periods, and one of the plurality of first periods is made to correspond to each of the plurality of second periods so as to minimize the allocation cost in accordance with the basic cost of the immediately preceding one of the plurality of second periods, the similarity index, and the transition cost.
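The sequential calculation of Aspect 6 resembles a dynamic programming search, and can be sketched as follows. This is a minimal sketch under several assumptions that are not stated in the present description: the similarity index is a dissimilarity table `dissim` (smaller means more similar); the local cost of following first period j with first period i is the dissimilarity between i and the period j + 1 that naturally follows j; the first second period is tied to the first first period and the last second period to the last first period; and the final first period, having no successor, is not used as a predecessor. The names `time_correspondence`, `dissim` and `trans_cost` are hypothetical.

```python
def time_correspondence(dissim, n_second, trans_cost):
    """Assign one first period to each of n_second second periods.

    dissim[a][b]: assumed dissimilarity between first periods a and b.
    trans_cost(j, i): assumed cost of moving from first period j to i.
    Returns path, where path[t] is the first period for second period t.
    """
    n_first = len(dissim)
    if n_first < 2 or n_second < 2:
        raise ValueError("need at least two first and two second periods")
    inf = float("inf")
    # basic[j]: minimum allocation cost of having reached first period j
    # at the immediately preceding second period.
    basic = [0.0] + [inf] * (n_first - 1)
    back = []  # back[t - 1][i]: best predecessor of i at second period t
    for _ in range(1, n_second):
        cost = [inf] * n_first
        prev = [0] * n_first
        for i in range(n_first):
            for j in range(n_first - 1):
                # Cheap when i resembles the natural successor of j.
                c = basic[j] + dissim[j + 1][i] + trans_cost(j, i)
                if c < cost[i]:
                    cost[i], prev[i] = c, j
        back.append(prev)
        basic = cost
    # Trace the minimum-cost allocation back from the last first period.
    i = n_first - 1
    path = [i]
    for prev in reversed(back):
        i = prev[i]
        path.append(i)
    path.reverse()
    return path


# Four first periods whose second and third features are identical (steady).
feats = [0.0, 1.0, 1.0, 5.0]
table = [[abs(a - b) for b in feats] for a in feats]
stretched = time_correspondence(table, 6, lambda j, i: 0.0)
```

In this usage example, with the transition cost set to zero throughout, the minimum-cost allocation repeats a period of the steady section to fill the six second periods and passes through the remaining periods once.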
Aspect 7
(66) In a preferred example (Aspect 7) of Aspect 6, in the time correspondence process, the basic cost is set for each of the plurality of second periods such that one of the plurality of first periods within a prescribed range corresponds to one of the plurality of second periods, based on a provisional relationship between each of the plurality of first periods and each of the plurality of second periods. In the aspect described above, the basic cost is set such that, for each of the plurality of second periods, a first period within a prescribed range that corresponds to that second period is made to correspond to that second period, on the basis of a provisional relationship between each first period and each second period. Thus, it is possible to generate a second audio signal within a range that does not deviate widely from the provisional relationship between each first period and each second period.
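One way to set the basic cost so that only first periods within a prescribed range can be selected is sketched below. The sketch assumes that the provisional relationship has already yielded, for the second period in question, a provisional first period index, and that the prescribed range is a fixed number of periods on either side of that index. The function name `restrict_basic_cost` and the parameter `half_width` are assumptions for illustration only.

```python
def restrict_basic_cost(basic, provisional_index, half_width):
    """Return a copy of the basic costs in which every first period
    farther than half_width from provisional_index is given an
    infinite cost, so that a cost-minimizing search cannot select it.
    """
    inf = float("inf")
    return [c if abs(i - provisional_index) <= half_width else inf
            for i, c in enumerate(basic)]
```

For example, with five first periods, a provisional index of 2 and a half width of 1, only first periods 1 to 3 keep their finite basic costs.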
Aspect 8
(67) In a preferred example (Aspect 8) of Aspect 6 or 7, in the time correspondence process, the basic cost is set such that one of the plurality of first periods corresponding to a sound generation point of the first audio signal and one of the plurality of second periods corresponding to the sound generation point based on a provisional relationship between each of the plurality of first periods and each of the plurality of second periods correspond to each other. In the aspect described above, the basic cost is set such that a first period corresponding to a sound generation point of the first audio signal and a second period corresponding to the sound generation point on the basis of a provisional relationship between each first period and each second period correspond to each other. That is, a second audio signal that reflects the time ratio between each sound generation point in the first audio signal (for example, a second audio signal in which the time ratio between each sound generation point is kept the same as in the first audio signal) is generated. Therefore, there is the benefit that it is possible to generate an audibly natural second audio signal in which the rhythm of the sound remains equal to that of the first audio signal.
Aspect 9
(68) In a preferred example (Aspect 9) of Aspect 7 or 8, the provisional relationship is a linear relationship. In the aspect described above, there is the benefit that the provisional relationship is simplified.
Aspect 10
(69) In a preferred example (Aspect 10) of Aspect 7 or 8, the provisional relationship is a curvilinear relationship. In the aspect described above, it is possible to make the first period and the second period correspond to each other by means of various types of relationships that are not limited to a linear relationship.
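The linear and curvilinear provisional relationships can be illustrated with a single mapping from a second period index to a provisional first period index. The sketch below assumes that both sequences are indexed from zero, that their end points coincide, and that the curvilinear case is a power curve controlled by a hypothetical parameter `gamma` (with `gamma` equal to 1.0 giving the linear case). Neither the function name `provisional_first_period` nor the power curve is taken from the present description.

```python
def provisional_first_period(t, n_first, n_second, gamma=1.0):
    """Map second period index t to a provisional first period index.

    gamma == 1.0 gives a linear relationship; any other positive gamma
    gives a curvilinear (power curve) relationship. Both map the first
    and last second periods to the first and last first periods.
    """
    if n_second < 2 or gamma <= 0:
        raise ValueError("need n_second >= 2 and gamma > 0")
    x = t / (n_second - 1)  # normalized position in the target period
    return round((x ** gamma) * (n_first - 1))
```

With five first periods and nine second periods, the middle second period (index 4) maps to first period 2 under the linear relationship and to first period 1 when `gamma` is 2.0, which shows how a curvilinear relationship shifts the provisional allocation.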
Aspect 11
(70) In a preferred example (Aspect 11) of any one of Aspects 2 to 10, in the time correspondence process, the transition cost to be applied to the time correspondence process is specified from a transition matrix whose elements are transition costs that correspond to combinations of the plurality of first periods.
Aspect 12
(71) In a preferred example (Aspect 12) of any one of Aspects 2 to 10, in the time correspondence process, a transition cost to be applied to the time correspondence process is specified from a transition vector that corresponds to one column of a transition matrix whose elements are transition costs that correspond to combinations of each of the plurality of first periods. In the aspect described above, because the transition cost is specified from a transition vector that corresponds to one column of a transition matrix, it is not necessary to store an entire transition matrix. Therefore, there is the benefit that the storage capacity required for the time correspondence process can be reduced.
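The storage saving of Aspect 12 can be illustrated as follows. The sketch assumes that the transition cost depends only on the absolute difference between the indices of two first periods, so that every column of the transition matrix is a shifted copy of the first column and that column alone suffices. This structure, the two-valued cost used to fill the vector, and the names `build_transition_vector` and `cost_from_vector` are assumptions for illustration only.

```python
def build_transition_vector(n_first, threshold,
                            first_value=0.0, second_value=1e6):
    """Return one column of the assumed transition matrix: element d is
    the cost of a transition across an index difference of d."""
    return [first_value if d <= threshold else second_value
            for d in range(n_first)]


def cost_from_vector(vec, i, j):
    """Look up the cost of transitioning between first periods i and j
    using only the stored vector."""
    return vec[abs(i - j)]


# Five first periods: 5 stored values instead of a 5 x 5 matrix.
vec = build_transition_vector(5, threshold=1)
full = [[cost_from_vector(vec, i, j) for j in range(5)] for i in range(5)]
```

Here `full` is reconstructed only to show that the vector carries the same information as the matrix under the stated assumption; an actual process would look up `cost_from_vector` directly and never store the matrix.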
Aspect 13
(72) An audio processing device according to a preferred aspect (Aspect 13) of the present invention comprises an electronic controller having a feature extraction unit and a signal generating unit. The feature extraction unit is configured to extract a feature quantity of a first audio signal for each of a plurality of periods. The signal generating unit is configured to generate a second audio signal by expanding/compressing on a time axis either a section of the first audio signal in which the feature quantity is steadily maintained for a period of time, or a section of the first audio signal in which a fluctuation of the feature quantity is repeated, and excluding from the expanding/compressing a section of the first audio signal in which a fluctuation of the feature quantity is not similar to that of other sections of the first audio signal. According to the configuration described above, for example, compared to a configuration in which the first audio signal is uniformly expanded/compressed over all the sections including both a steady section in which the feature quantity is steadily maintained and a transient section in which the feature quantity fluctuates unsteadily, it is possible to expand/compress the audio signal while maintaining auditory naturalness.
Aspect 14
(73) An audio processing device according to a preferred aspect (Aspect 14) of the present invention comprises an electronic controller having a feature extraction unit, an index calculation unit, an analysis processing unit and a signal generating unit. The feature extraction unit is configured to extract a feature quantity of a first audio signal for each of a plurality of first periods. The index calculation unit is configured to calculate a similarity index of the feature quantity between each of the plurality of first periods. The analysis processing unit is configured to make the plurality of first periods correspond to a plurality of second periods within a target period after expansion/compression of the first audio signal in accordance with the similarity index and a transition cost for transitioning between each of the plurality of first periods. The signal generating unit is configured to generate a second audio signal over the target period from a result obtained upon the analysis processing unit making the plurality of first periods correspond to the plurality of second periods. In the aspect described above, a first period is made to correspond to each second period within the target period such that the allocation cost corresponding to the similarity index between each first period is minimized. That is, a section of the first audio signal in which the feature quantity is steadily maintained on the time axis and a section in which the fluctuation of the feature quantity is repeated are expanded/compressed on the time axis, and sections in which a fluctuation of the feature quantity does not resemble that of other sections are excluded from expansion/compression.
Thus, for example, compared to a configuration in which the first audio signal is evenly expanded/compressed over all the sections including both a steady section in which a feature quantity is steadily maintained and a transient section in which the feature quantity fluctuates unsteadily, it is possible to expand/compress the audio signal while maintaining auditory naturalness. In addition, a first period is made to correspond to each second period within the target period in relation to the transition cost for transitioning between each of the first periods. Therefore, transitions between first periods that are excessively divergent on the time axis are restricted. Consequently, it is possible to realize the above-described effect of being able to expand/compress the audio signal while maintaining auditory naturalness.