AUDIO SIGNAL PROCESSING DEVICE, AUDIO SIGNAL PROCESSING METHOD, AND AUDIO SIGNAL PROCESSING PROGRAM

Abstract

An audio signal processing device comprises a discontinuity detector configured to determine an occurrence of a discontinuity from a sudden increase of an amplitude of decoded audio obtained by decoding the first audio packet which is received correctly after an occurrence of a packet loss, and a discontinuity corrector for correcting the discontinuity of the decoded audio.

Claims

1. An audio signal processing method executed by an audio signal processing device, comprising: determining an occurrence of a discontinuity occurring with a sudden increase of an amplitude of a decoded audio obtained by decoding an audio packet received correctly after an occurrence of a packet loss, and correcting the discontinuity of the decoded audio, wherein correcting the discontinuity of the decoded audio comprises causing distances between ISF/LSF parameters corresponding to a frame in which a packet loss has occurred to be equal.

2. The audio signal processing method according to claim 1, wherein determining an occurrence of a discontinuity comprises: a decoding step of decoding side information about a discontinuity of a decoded audio obtained by decoding an audio packet, the side information being transmitted from an encoder, and a determining step of determining the discontinuity of the decoded audio using the side information decoded in the decoding step.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0091] FIG. 1 is a configuration diagram of the audio decoder.

[0092] FIG. 2 is a processing flow of the audio decoder.

[0093] FIG. 3 is a functional configuration diagram of the audio code decoder.

[0094] FIG. 4 is a functional configuration diagram of the LP coefficient calculator.

[0095] FIG. 5 is a processing flow of calculating the LP coefficients.

[0096] FIG. 6 is a functional configuration diagram of the concealment signal generator.

[0097] FIG. 7 is a configuration diagram of the audio decoder of Patent Literature 2.

[0098] FIG. 8 is a functional configuration diagram of the concealment signal generator of Patent Literature 2.

[0099] FIG. 9 is a functional configuration diagram of the audio code decoder in a first embodiment.

[0100] FIG. 10 is a processing flow of the LP coefficient calculator in the first embodiment.

[0101] FIG. 11 is a functional configuration diagram of the audio code decoder in a modification example of the first embodiment.

[0102] FIG. 12 is a processing flow of a second stability processor in modification example 1 of the first embodiment.

[0103] FIG. 13 is a functional configuration diagram of the audio code decoder in a second embodiment.

[0104] FIG. 14 is a functional configuration diagram of the LP coefficient calculator in the second embodiment.

[0105] FIG. 15 is a processing flow of calculation of the LP coefficients in the second embodiment.

[0106] FIG. 16 is a configuration diagram of an audio encoder in the fourth embodiment.

[0107] FIG. 17 is a processing flow of the audio encoder in the fourth embodiment.

[0108] FIG. 18 is a configuration diagram of an LP analyzer/encoder in the fourth embodiment.

[0109] FIG. 19 is a processing flow of the LP analyzer/encoder in the fourth embodiment.

[0110] FIG. 20 is a functional configuration diagram of the audio code decoder in the fourth embodiment.

[0111] FIG. 21 is a processing flow of the LP coefficient calculator in the fourth embodiment.

[0112] FIG. 22 is a configuration diagram of the LP analyzer/encoder in the fifth embodiment.

[0113] FIG. 23 is a processing flow of the LP analyzer/encoder in the fifth embodiment.

[0114] FIG. 24 is a functional configuration diagram of the audio code decoder in the fifth embodiment.

[0115] FIG. 25 is a processing flow of the LP coefficient calculator in the fifth embodiment.

[0116] FIG. 26 is a configuration diagram of the audio decoder in the seventh embodiment.

[0117] FIG. 27 is a processing flow of the audio decoder in the seventh embodiment.

[0118] FIG. 28 is a functional configuration diagram of the audio code decoder in the seventh embodiment.

[0119] FIG. 29 is a processing flow of calculation of the LP coefficients in the seventh embodiment.

[0120] FIG. 30 is a drawing showing a hardware configuration example of a computer.

[0121] FIG. 31 is an appearance diagram of the computer.

[0122] FIG. 32(a), (b), (c), and (d) are drawings showing various examples of audio signal processing programs.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0123] Preferred embodiments of an audio signal processing device, an audio signal processing method, and an audio signal processing program according to the present invention will be described below in detail with reference to the drawings. The same elements will be denoted by the same reference signs in the description of the drawings to avoid duplicate descriptions.

First Embodiment

[0124] The audio signal processing device in the first embodiment has the same configuration as the aforementioned audio decoder 1 shown in FIG. 1 and has a novel feature in the audio code decoder, and thus the audio code decoder will be described below.

[0125] FIG. 9 is a diagram showing a functional configuration of an audio code decoder 12A in the first embodiment, and FIG. 10 shows a flowchart of the LP coefficient calculation process. The audio code decoder 12A shown in FIG. 9 is configured by adding a discontinuity detector 129 to the aforementioned configuration of FIG. 3. Since the present embodiment differs from the conventional technology only in the LP coefficient calculation process, the operations of respective parts associated with the LP coefficient calculation process will be described below.

[0126] The discontinuity detector 129 refers to a fixed codebook gain $g_c^{0}$ acquired by decoding and a fixed codebook gain $g_c^{-1}$ included in the internal states, and compares the change of the gain with a threshold in accordance with the following equation (step S11 in FIG. 10).

$\log(g_c^{0}) - \log(g_c^{-1}) > \mathrm{Thres}$ [Mathematical Equation 43]

[0127] When the gain change exceeds the threshold, the detector detects an occurrence of a discontinuity (also referred to hereinafter simply as “detects a discontinuity”) and outputs a control signal indicating a detection result of a discontinuity occurrence to the stability processor 121.

[0128] The following equation may be used for the comparison between the gain change and the threshold.

$g_c^{0} - g_c^{-1} > \mathrm{Thres}$ [Mathematical Equation 43]

[0129] Furthermore, the comparison between the gain change and the threshold may be made by the following equation, where $g_c^{(c)}$ represents the maximum among the fixed codebook gains of the first to fourth subframes included in the current frame and $g_c^{(p)}$ represents the minimum among the fixed codebook gains included in the internal states.

$\log(g_c^{(c)}) - \log(g_c^{(p)}) > \mathrm{Thres}$ [Mathematical Equation 44]

[0130] The following equation can also be used.

$g_c^{(c)} - g_c^{(p)} > \mathrm{Thres}$ [Mathematical Equation 45]

[0131] In the above example of the first embodiment, the discontinuity detection is conducted using the fixed codebook gain $g_c^{-1}$ of the fourth subframe of the immediately preceding frame (lost frame) and the fixed codebook gain $g_c^{0}$ of the first subframe of the current frame. However, the comparison between the gain change and the threshold may instead be made using averages calculated from the fixed codebook gains included in the internal states and from the fixed codebook gains included in the current frame.
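As an illustrative sketch only, the threshold comparisons of paragraphs [0126] to [0130] could be written as follows. The function names, the use of the natural logarithm, and the small floor `eps` that guards against a zero gain are assumptions made for this example; the description does not specify them.

```python
import math


def detect_discontinuity(gain_curr, gain_prev, thres, log_domain=True):
    """Return True when the fixed codebook gain rises by more than thres.

    gain_curr: gain decoded for the current frame.
    gain_prev: gain kept in the internal states (lost frame).
    """
    if log_domain:
        # eps avoids log(0); it is an assumption, not part of the description.
        eps = 1e-12
        change = math.log(max(gain_curr, eps)) - math.log(max(gain_prev, eps))
    else:
        change = gain_curr - gain_prev
    return change > thres


def detect_from_subframes(curr_gains, prev_gains, thres, log_domain=True):
    """Variant using the maximum current-frame gain and the minimum stored gain."""
    return detect_discontinuity(max(curr_gains), min(prev_gains), thres, log_domain)
```

A suitable threshold depends on the codec and on the domain (logarithmic or linear) in which the comparison is made, so any value passed as `thres` here is hypothetical.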

[0132] The ISF decoder 120 performs the same operation as in the conventional technology (step S12 in FIG. 10).

[0133] The stability processor 121 corrects the ISF parameters by the following process when the discontinuity detector 129 detects a discontinuity (step S13 in FIG. 10).

[0134] First, the stability processor 121 subjects the ISF parameters

$\dot{\omega}_i^{-1}$ [Mathematical Equation 46]

stored in the internal state buffer 14 to a process of expanding the distance between each two adjacent elements to become $M_{-1}$ times wider than the ordinary distance. Placing a distance wider than the ordinary distance has the effect of suppressing excessive peaks and dips in the spectrum envelope. Here, min_dist represents the minimum ISF distance, and isf_min represents the minimum of ISF necessary for securing the distance of min_dist. isf_min is successively updated by adding the distance of min_dist to the value of the neighboring ISF. On the other hand, isf_max is the maximum of ISF necessary for securing the distance of min_dist. isf_max is successively updated by subtracting the distance of min_dist from the value of the neighboring ISF.

isf_min = min_dist = 50 · $M_{-1}$
for i = 0 to 14
    if $\dot{\omega}_i^{-1}$ < isf_min then $\dot{\omega}_i^{-1}$ = isf_min
    isf_min = $\dot{\omega}_i^{-1}$ + min_dist
isf_max = 6400 − min_dist
if $\dot{\omega}_{14}^{-1}$ > isf_max
    for i = 14 down to 1
        if $\dot{\omega}_i^{-1}$ > isf_max then $\dot{\omega}_i^{-1}$ = isf_max
        isf_max = $\dot{\omega}_i^{-1}$ − min_dist
[Mathematical Equation 47]

[0135] Next, the stability processor 121 subjects the ISF parameters of the current frame to a process of expanding the distance between each two adjacent elements to become $M_0$ times wider than the ordinary distance. $1 < M_0 < M_{-1}$ is assumed herein, but it is also possible to set one of $M_{-1}$ and $M_0$ to 1 and the other to a value larger than 1.

isf_min = min_dist = 50 · $M_0$
for i = 0 to 14
    if $\dot{\omega}_i^{0}$ < isf_min then $\dot{\omega}_i^{0}$ = isf_min
    isf_min = $\dot{\omega}_i^{0}$ + min_dist
isf_max = 6400 − min_dist
if $\dot{\omega}_{14}^{0}$ > isf_max
    for i = 14 down to 1
        if $\dot{\omega}_i^{0}$ > isf_max then $\dot{\omega}_i^{0}$ = isf_max
        isf_max = $\dot{\omega}_i^{0}$ − min_dist
[Mathematical Equation 48]

[0136] Furthermore, the stability processor 121 performs the following process in the same manner as carried out in the ordinary decoding process, when the discontinuity detector detects no discontinuity.

isf_min = min_dist = 50
for i = 0 to 14
    if $\dot{\omega}_i^{0}$ < isf_min then $\dot{\omega}_i^{0}$ = isf_min
    isf_min = $\dot{\omega}_i^{0}$ + min_dist
isf_max = 6400 − min_dist
if $\dot{\omega}_{14}^{0}$ > isf_max
    for i = 14 down to 1
        if $\dot{\omega}_i^{0}$ > isf_max then $\dot{\omega}_i^{0}$ = isf_max
        isf_max = $\dot{\omega}_i^{0}$ − min_dist
[Mathematical Equation 49]

[0137] The minimum distance placed between elements when a discontinuity is detected may be varied depending upon the frequency of ISF. The minimum distance placed between elements when a discontinuity is detected needs only to be different from the minimum distance placed between elements in the ordinary decoding process.
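The distance-securing process described above can be sketched as follows, assuming the ISF values are held in a Python list in Hz. The function name `enforce_min_distance`, its parameters, and the reading of the loop nesting (each bound is updated once per loop iteration) are assumptions made for illustration. A detected discontinuity is modeled simply by passing `50 * m` as the minimum distance, where `m` is the expansion factor chosen for the stored frame or for the current frame.

```python
def enforce_min_distance(isf, min_dist=50.0, isf_top=6400.0):
    """Return a copy of isf in which adjacent elements are at least min_dist apart.

    isf:      ISF values in ascending order (15 elements in the description).
    min_dist: minimum distance; 50 in ordinary decoding, 50 * m on a discontinuity.
    isf_top:  upper frequency limit (6400 in the description).
    """
    out = list(isf)
    n = len(out)

    # Forward pass: push each element up to the running lower bound.
    isf_min = min_dist
    for i in range(n):
        if out[i] < isf_min:
            out[i] = isf_min
        isf_min = out[i] + min_dist

    # Backward pass: only needed when the last element overshoots the top.
    isf_max = isf_top - min_dist
    if out[n - 1] > isf_max:
        for i in range(n - 1, 0, -1):  # i = n-1 down to 1
            if out[i] > isf_max:
                out[i] = isf_max
            isf_max = out[i] - min_dist
    return out
```

For example, `enforce_min_distance(isf_prev, 50.0 * m_prev)` would correspond to the process applied to the stored parameters, and `enforce_min_distance(isf_curr, 50.0)` to the ordinary decoding process; `isf_prev`, `isf_curr`, and `m_prev` are hypothetical names.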

[0138] The ISF-ISP converter 122A in the LP coefficient calculator 122 converts the ISF parameters

$\dot{\omega}_i,\ \dot{\omega}_i^{-1}$ [Mathematical Equation 50]

into the ISP parameters

$\dot{q}_i,\ \dot{q}_i^{-1}$ [Mathematical Equation 51]

respectively, in accordance with the following equation (step S14 in FIG. 10). Here, C is a constant determined in advance.

$\dot{q}_i = \cos(C \cdot \dot{\omega}_i)$ [Mathematical Equation 52]

[0139] The ISP interpolator 122B calculates the ISP parameters for the respective subframes from the past ISP parameters

$\dot{q}_i^{-1}$ [Mathematical Equation 53]

and the foregoing ISP parameters

$\dot{q}_i$ [Mathematical Equation 54]

in accordance with the following equation (step S15 in FIG. 10). Other coefficients may be used for the interpolation.

$q_i^{(1)} = 0.75 \cdot \dot{q}_i^{-1} + 0.25 \cdot \dot{q}_i$

$q_i^{(2)} = 0.5 \cdot \dot{q}_i^{-1} + 0.5 \cdot \dot{q}_i$

$q_i^{(3)} = 0.25 \cdot \dot{q}_i^{-1} + 0.75 \cdot \dot{q}_i$

$q_i^{(4)} = \dot{q}_i$ [Mathematical Equation 55]
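A minimal sketch of the ISF-to-ISP conversion and the subframe interpolation described above is given below. The function names and the list-based representation are assumptions for illustration. The constant `c` is passed in by the caller because the description only states that C is determined in advance.

```python
import math

# Interpolation weights (past frame, current frame) for subframes 1 to 4,
# taken from the interpolation equations above.
INTERP_WEIGHTS = [(0.75, 0.25), (0.5, 0.5), (0.25, 0.75), (0.0, 1.0)]


def isf_to_isp(isf, c):
    """Convert ISF parameters to ISP parameters: q_i = cos(c * omega_i)."""
    return [math.cos(c * w) for w in isf]


def interpolate_isp(isp_prev, isp_curr, weights=INTERP_WEIGHTS):
    """Return one ISP vector per subframe, blended from past and current ISPs."""
    return [
        [w_prev * p + w_curr * q for p, q in zip(isp_prev, isp_curr)]
        for w_prev, w_curr in weights
    ]
```

Other weights may be supplied through the `weights` argument, in line with the remark that other coefficients may be used for the interpolation.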

[0140] The ISP-LPC converter 122C converts the ISP parameters for the respective subframes into the LP coefficients

$\dot{a}_i^{j}\ (0 < i \le P,\ 0 \le j < 4)$ [Mathematical Equation 56]

(step S16 in FIG. 10). Here, the number of subframes included in a look-ahead signal was assumed to be 4, but the number of subframes may differ depending upon the design principle. A specific conversion procedure to be used can be the processing procedure described in Non Patent Literature 1.

[0141] Furthermore, the ISF-ISP converter 122A updates the ISF parameters stored in the internal state buffer 14

$\dot{\omega}_i^{-1}$ [Mathematical Equation 57]

in accordance with the following equation.

$\dot{\omega}_i^{-1} = \dot{\omega}_i^{0}$ [Mathematical Equation 58]

At this time, even when a discontinuity is detected, the ISF-ISP converter 122A may carry out the below procedure to update the ISF parameters

$\dot{\omega}_i^{-1}$ [Mathematical Equation 59]

stored in the internal state buffer, using the calculation result of the ISF parameters.

isf_min = min_dist = 50
for i = 0 to 14
    if $\dot{\omega}_i^{0}$ < isf_min then $\dot{\omega}_i^{0}$ = isf_min
    isf_min = $\dot{\omega}_i^{0}$ + min_dist
isf_max = 6400 − min_dist
if $\dot{\omega}_{14}^{0}$ > isf_max
    for i = 14 down to 1
        if $\dot{\omega}_i^{0}$ > isf_max then $\dot{\omega}_i^{0}$ = isf_max
        isf_max = $\dot{\omega}_i^{0}$ − min_dist
[Mathematical Equation 60]

[0142] As in the above first embodiment, a discontinuity of decoded audio can be determined with the quantized codebook gains used in the calculation of the excitation signal and the ISF/LSF parameters (e.g., the distance between elements of the ISF/LSF parameters given for ensuring stability of the synthesis filter) can be corrected according to a result of the determination for a discontinuity. This reduces the discontinuity of audio which can occur upon recovery from a packet loss at the audio start point, and thereby improves the subjective quality.

Modification Example of First Embodiment

[0143] FIG. 11 is a diagram showing a functional configuration of an audio code decoder 12S according to a modification example of the first embodiment. Since it differs from the configuration of the conventional technology shown in FIG. 3 only in the discontinuity detector 129 and the second stability processor 121S, the operations of these will be described. The second stability processor 121S has a gain adjustor 121X and a gain multiplier 121Y, and a processing flow of the second stability processor 121S is shown in FIG. 12.

[0144] The discontinuity detector 129 refers to the fixed codebook gain $g_c^{0}$ obtained by decoding and the fixed codebook gain $g_c^{-1}$ included in the internal states and compares the gain change with a threshold, in the same manner as performed by the discontinuity detector 129 in the first embodiment. Then, the discontinuity detector 129 sends, to the gain adjustor 121X, a control signal including information about whether the gain change exceeds the threshold.

[0145] The gain adjustor 121X reads from the control signal the information about whether the gain change exceeds the threshold, and, when the gain change exceeds the threshold, it outputs a predetermined gain $g_{\mathrm{on}}$ to the gain multiplier 121Y. On the other hand, when the gain change does not exceed the threshold, the gain adjustor 121X outputs a predetermined gain $g_{\mathrm{off}}$ to the gain multiplier 121Y. This operation of the gain adjustor 121X corresponds to step S18 in FIG. 12.

[0146] The gain multiplier 121Y multiplies the synthesized signal output from the synthesis filter 128 by the foregoing gain $g_{\mathrm{on}}$ or gain $g_{\mathrm{off}}$ (step S19 in FIG. 12) and outputs the resultant decoded signal.
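The cooperation of the gain adjustor 121X and the gain multiplier 121Y can be sketched as follows. The function names are hypothetical, and the two predetermined gains are left as parameters because the description does not give their values.

```python
def select_gain(discontinuity_detected, g_on, g_off):
    """Gain adjustor: choose the predetermined gain from the control signal."""
    return g_on if discontinuity_detected else g_off


def apply_gain(synthesized, gain):
    """Gain multiplier: scale every sample of the synthesized signal."""
    return [gain * sample for sample in synthesized]
```

For instance, `apply_gain(frame, select_gain(flag, 0.5, 1.0))` would halve the amplitude only in frames flagged as discontinuous; the values 0.5 and 1.0 are illustrative assumptions.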

[0147] Here, the audio code decoder may be configured such that the LP coefficient calculator 122 outputs the LP coefficients or the ISF parameters to feed them to the second stability processor 121S (as indicated by a dotted line from the LP coefficient calculator 122 to the gain adjustor 121X in FIG. 11). In this case, the gains to be multiplied are determined using the LP coefficients or the ISF parameters calculated by the LP coefficient calculator 122.

[0148] By adding the second stability processor 121S to the audio code decoder 12S and adjusting the gain, depending upon whether the gain change exceeds the threshold as described in the above modification example, an appropriate decoded signal can be obtained.

[0149] The second stability processor 121S may be configured to multiply the excitation signal by the foregoing calculated gain and output the result to the synthesis filter 128.

Second Embodiment

[0150] An audio signal processing device according to the second embodiment has the same configuration as that of the aforementioned audio decoder 1 in FIG. 1 and has a novel feature in an audio code decoder, and thus the audio code decoder will be described below. FIG. 13 shows an exemplary functional configuration of the audio code decoder 12B, FIG. 14 shows an exemplary functional configuration associated with the calculation process of the LP coefficients, and FIG. 15 shows a flow of the calculation process of the LP coefficients. The audio code decoder 12B in FIG. 13 is configured by adding the discontinuity detector 129 to the aforementioned configuration shown in FIG. 3.

[0151] The ISF decoder 120 calculates the ISF parameters in the same manner as performed in the conventional technology (step S21 in FIG. 15).

[0152] The stability processor 121 performs the process of placing a distance of not less than 50 Hz between elements of the ISF parameters

$\dot{\omega}_i$ [Mathematical Equation 61]

in order to secure the stability of the filter in the same manner as performed in the conventional technology (step S22 in FIG. 15).

[0153] The ISF-ISP converter 122A converts the ISF parameters output by the stability processor 121 into the ISP parameters in the same manner as performed in the first embodiment (step S23 in FIG. 15).

[0154] The ISP interpolator 122B, in the same manner as performed in the first embodiment (step S24 in FIG. 15), calculates the ISP parameters for the respective subframes from the past ISP parameters

$\dot{q}_i^{-1}$ [Mathematical Equation 62]

and the ISP parameters

$\dot{q}_i$ [Mathematical Equation 63]

obtained by the conversion by the ISF-ISP converter 122A.

[0155] The ISP-LPC converter 122C, in the same manner as performed in the first embodiment (step S25 in FIG. 15), converts the ISP parameters for the respective subframes into the LP coefficients

$\dot{a}_i^{j}\ (0 < i \le P,\ 0 \le j < 4)$ [Mathematical Equation 64]

Here, the number of subframes included in the look-ahead signal is assumed to be 4, but the number of subframes may differ depending upon the design principle.

[0156] The internal state buffer 14 updates the ISF parameters stored in the past with the new ISF parameters.

[0157] The discontinuity detector 129 reads the LP coefficients of the fourth subframe in the lost packet frame from the internal state buffer 14 and calculates the power of the impulse response of the LP coefficients of the fourth subframe in the lost packet frame. The LP coefficients of the fourth subframe in the lost packet frame to be used can be the coefficients output by the LP coefficient interpolator 130 included in the concealment signal generator 13 shown in FIG. 6 and accumulated in the internal state buffer 14 upon the packet loss.

$E_{-1} = 10 \log\left(\sum_{n=0}^{L'-1} h_{-1}^{2}(n)\right)$

$h_{-1}(n) = \delta(n) - \sum_{i=1}^{P} \dot{a}_i^{(-1)} \cdot h_{-1}(n-i)$ [Mathematical Equation 65]

[0158] Then, the discontinuity detector 129 detects a discontinuity, for example, by the below equation (step S26 in FIG. 15).

$E_0 - E_{-1} > \mathrm{Thres}$ [Mathematical Equation 66]
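The power comparison above can be sketched as follows. Two points are assumptions of this example rather than statements of the description: the logarithm is taken as base 10, and the power for the current frame is computed from the current-frame LP coefficients in the same way as the power for the lost frame. The function names are hypothetical.

```python
import math


def impulse_response(lp, length):
    """Impulse response h(n) = delta(n) - sum_{i=1..P} a_i * h(n - i)."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for i, a in enumerate(lp, start=1):
            if n - i >= 0:
                acc -= a * h[n - i]
        h.append(acc)
    return h


def impulse_response_power(lp, length):
    """Power of the first `length` impulse response samples, as 10*log10(energy)."""
    energy = sum(x * x for x in impulse_response(lp, length))
    return 10.0 * math.log10(energy)


def power_jump_detected(lp_curr, lp_prev, length, thres):
    """True when the current-frame power exceeds the lost-frame power by thres."""
    diff = impulse_response_power(lp_curr, length) - impulse_response_power(lp_prev, length)
    return diff > thres
```

Here `lp` holds the coefficients a_1 to a_P and `length` plays the role of L′. Because h(0) is always 1, the energy is at least 1 and the logarithm is well defined.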

[0159] When the change in power does not exceed the threshold (NO in step S27 of FIG. 15), the discontinuity detector 129 does not detect an occurrence of a discontinuity, and the ISP-LPC converter 122C outputs the LP coefficients and ends the processing. On the other hand, when the change in power exceeds the threshold (YES in step S27 of FIG. 15), the discontinuity detector 129 detects an occurrence of a discontinuity and sends a control signal indicating the detection result to the stability processor 121. When receiving the control signal, the stability processor 121 corrects the ISF parameters in the same manner as performed in the first embodiment (step S28 in FIG. 15). The subsequent operations of the ISF-ISP converter 122A, ISP interpolator 122B, and ISP-LPC converter 122C (steps S29, S2A, and S2B in FIG. 15) are the same as above.

[0160] As discussed in the above second embodiment, a discontinuity of decoded audio can be determined by the power of the excitation signal, and the discontinuous audio is reduced to improve the subjective quality in the same manner as performed in the first embodiment.

Third Embodiment

[0161] Upon a detection of discontinuity, the ISF parameters may be corrected by another method. The third embodiment differs from the first embodiment only in the stability processor 121, and thus only the operation of the stability processor 121 will be described.

[0162] When the discontinuity detector 129 detects a discontinuity, the stability processor 121 performs the following process to correct the ISF parameters.

[0163] With respect to the ISF parameters stored in the internal state buffer 14,

$\dot{\omega}_i^{-1}$ [Mathematical Equation 67]

the stability processor 121 replaces the ISF parameters up to a low-order P′ dimension (0<P′≤P) in accordance with the below equation. Here, the following definition is adopted.

$\delta^{-1} = \dot{\omega}_{P'-1}^{-1} / P'$ [Mathematical Equation 68]

$\dot{\omega}_i^{-1} = \dot{\omega}_{i-1}^{-1} + \delta^{-1}$

$\dot{\omega}_0^{-1} = \delta^{-1} \quad (0 \le i \le P')$ [Mathematical Equation 69]

[0164] The stability processor 121 may overwrite the ISF parameters of the low-order P′ dimensions with P′-dimension vectors obtained in advance by learning as follows.

$\dot{\omega}_i^{-1} = \omega_i^{0} \quad (0 \le i \le P')$ [Mathematical Equation 70]

[0165] Next, as to the ISF parameters of the current frame, the stability processor 121 may, as performed in the first embodiment, perform the process of expanding the distance between elements to become M.sub.0 times wider than the ordinary distance or may determine them in accordance with the below equation. Here, the following definition is adopted.

$\delta^{0} = \dot{\omega}_{P'-1}^{0} / P'$ [Mathematical Equation 71]

$\dot{\omega}_i^{0} = \dot{\omega}_{i-1}^{0} + \delta^{0}$

$\dot{\omega}_0^{0} = \delta^{0}$ [Mathematical Equation 72]

[0166] The stability processor 121 may overwrite them with P′-dimensional vectors learned in advance.

$\dot{\omega}_i^{0} = \omega_i^{0}$ [Mathematical Equation 73]

[0167] Furthermore, the foregoing P′-dimensional vectors may be learned in the decoding process or may be defined, for example, as follows.

$\omega_i^{0} = (1-\lambda)\,\omega_i^{-1} + \lambda\,\dot{\omega}_i^{-1}$ [Mathematical Equation 74]

In a frame at the start of decoding, however, $\omega_i^{-1}$ may be defined as a predetermined P′-dimensional vector $\omega_i^{\mathrm{init}}$.

[0168] The internal state buffer 14 updates the ISF parameters stored in the past with the new ISF parameters.

[0169] As discussed in the above third embodiment, the distance obtained by equally dividing the ISF/LSF parameters into those of a predetermined dimension can be used as the distance between elements of the ISF/LSF parameters given for ensuring the stability of the synthesis filter, whereby the discontinuous audio is reduced to improve the subjective quality as performed in the first and second embodiments.
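The equal-spacing replacement of the third embodiment can be sketched as follows. The function name and the choice to rewrite exactly the first P′ elements (indices 0 to P′−1) are assumptions of this example.

```python
def equalize_low_order(isf, p_prime):
    """Replace the first p_prime ISF values with equally spaced values.

    The spacing delta is isf[p_prime - 1] / p_prime, so the element at
    index p_prime - 1 keeps its value and the lower ones are spread evenly.
    """
    if not 0 < p_prime <= len(isf):
        raise ValueError("p_prime must satisfy 0 < p_prime <= len(isf)")
    out = list(isf)
    delta = out[p_prime - 1] / p_prime
    out[0] = delta
    for i in range(1, p_prime):
        out[i] = out[i - 1] + delta
    return out
```

For example, `equalize_low_order([100.0, 150.0, 900.0, 2000.0], 3)` spreads the first three elements at a spacing of 300 and leaves the fourth untouched.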

Fourth Embodiment

[0170] A fourth embodiment will be described in which the encoding side detects an occurrence of a discontinuity and transmits a discontinuity determination code (indicative of a detection result) as included in audio codes to the decoding side and also in which the decoding side determines the operation of the stability process, based on the discontinuity determination code included in the audio codes.

[0171] (Regarding Encoding Side)

[0172] FIG. 16 shows an exemplary functional configuration of the encoder 2, and FIG. 17 is a flowchart showing the processes performed in the encoder 2. As shown in FIG. 16, the encoder 2 has an LP analyzer/encoder 21, a residual encoder 22, and a code multiplexer 23.

[0173] An exemplary functional configuration of the LP analyzer/encoder 21 among them is shown in FIG. 18, and a flowchart showing the processes performed in the LP analyzer/encoder 21 is shown in FIG. 19. As shown in FIG. 18, the LP analyzer/encoder 21 has an LP analyzer 210, an LP-ISF converter 211, an ISF encoder 212, a discontinuity determiner 213, an ISF concealer 214, an ISF-LP converter 215, and an ISF buffer 216.

[0174] In the LP analyzer/encoder 21, the LP analyzer 210 performs a linear prediction analysis on an input signal to obtain linear prediction coefficients (step T41 in FIG. 17 and step U41 in FIG. 19). For the calculation of linear prediction coefficients, an autocorrelation function is first calculated from the audio signal, and then the Levinson-Durbin algorithm or the like can be applied.

[0175] The LP-ISF converter 211 converts the calculated linear prediction coefficients into the ISF parameters in the same manner as performed in the first embodiment (steps T42, U42). The conversion from linear prediction coefficients into ISF parameters may be implemented by use of the method described in the Non Patent Literature.

[0176] The ISF encoder 212 encodes the ISF parameters using a predetermined method to calculate ISF codes (steps T43, U43) and outputs quantized ISF parameters obtained in the process of encoding to the discontinuity determiner 213, the ISF concealer 214, and the ISF-LP converter 215 (step U47). Here, the quantized ISF parameters are equal to the ISF parameters obtained by an inverse quantization of the ISF codes. A method of encoding may be vector-encoding, or encoding by a vector quantization or the like of error vectors from ISFs of the immediately preceding frame and mean vectors determined in advance by learning.

[0177] The discontinuity determiner 213 encodes a discontinuity determination flag stored in an internal buffer (not shown) built in the discontinuity determiner 213 and outputs a resultant discontinuity determination code (step U47). In addition, the discontinuity determiner 213 uses concealment ISF parameters

$\tilde{\omega}_i$ [Mathematical Equation 75]

read from the ISF buffer 216 and the quantized ISF parameters

$\dot{\omega}_i$ [Mathematical Equation 76]

to make a determination on a discontinuity in accordance with the below equation (steps T44, U46). Here, $\mathrm{Thres}_{\omega}$ represents a threshold determined in advance, and P′ is an integer satisfying 0<P′≤P.

$\sum_{i=0}^{P'-1} (\dot{\omega}_i - \tilde{\omega}_i)^{2} > \mathrm{Thres}_{\omega}$ [Mathematical Equation 77]

[0178] The example is described above in which the discontinuity determination is made using the Euclidean distances between the ISF parameters. However, the discontinuity determination may be made by other methods.
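As a rough sketch, the encoder-side determination above compares the squared distance between the quantized ISF parameters and the concealment ISF parameters with a threshold. The function name and argument names below are hypothetical.

```python
def encoder_discontinuity_flag(quantized_isf, concealed_isf, p_prime, thres):
    """Return True when the low-order squared ISF distance exceeds thres.

    Only the first p_prime elements (0 < p_prime <= P) enter the sum.
    """
    dist = sum(
        (q - c) ** 2
        for q, c in zip(quantized_isf[:p_prime], concealed_isf[:p_prime])
    )
    return dist > thres
```

The returned flag corresponds to the discontinuity determination flag that is held in the determiner and later encoded as the discontinuity determination code.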

[0179] The ISF concealer 214 calculates the concealment ISF parameters from the quantized ISF parameters by the same process as performed by the decoder-side ISF concealer and outputs the resultant concealment ISF parameters to the ISF buffer 216 (steps U44, U45). The operation of the ISF concealment process may be performed by any method as long as it is the same process as that of the decoder-side packet loss concealer.

[0180] The ISF-LP converter 215 calculates quantized linear prediction coefficients by converting the foregoing quantized ISF parameters and outputs the resultant quantized linear prediction coefficients to the residual encoder 22 (step T45). A method used for converting the ISF parameters into the quantized linear prediction coefficients may be the method described in the Non Patent Literature.

[0181] The residual encoder 22 filters the audio signal by use of the quantized linear prediction coefficients to calculate residual signals (step T46).

[0182] Next, the residual encoder 22 encodes the residual signals by encoding means using CELP or TCX (Transform Coded Excitation) or by encoding means switchably using CELP and TCX and outputs resultant residual codes (step T47). Since the operation of the residual encoder 22 is less relevant to the present invention, description thereof is omitted herein.

[0183] The code multiplexer 23 assembles the ISF codes, the discontinuity determination code and the residual codes in a predetermined order and outputs resultant audio codes (step T48).

[0184] (Regarding Decoding Side)

[0185] An audio signal processing device according to the fourth embodiment has the same configuration as that of the aforementioned audio decoder 1 in FIG. 1 and has a novel feature in the audio code decoder, and thus the audio code decoder will be described below. FIG. 20 shows an exemplary functional configuration of an audio code decoder 12D, and FIG. 21 is a flowchart showing the process of calculating the LP coefficients. The audio code decoder 12D shown in FIG. 20 is configured by adding the discontinuity detector 129 to the aforementioned configuration shown in FIG. 3.

[0186] The ISF decoder 120 decodes the ISF codes and outputs the resultant ISF parameters to the stability processor 121 and the internal state buffer 14 (step S41 in FIG. 21).

[0187] The discontinuity detector 129 decodes the discontinuity determination code and outputs a resultant discontinuity detection result to the stability processor 121 (step S42 in FIG. 21).

[0188] The stability processor 121 performs the stability process according to the discontinuity detection result (step S43 in FIG. 21). The processing procedure of the stability processor to be used can be the same method as executed in the first embodiment and the third embodiment.

[0189] The stability processor 121 may perform the stability process as described below, on the basis of other parameters included in the audio codes, in addition to the discontinuity detection result acquired from the discontinuity determination code. For example, the stability processor 121 may be configured to perform the stability process in such a manner that an ISF stability stab is calculated in accordance with the below equation and that when the ISF stability exceeds a threshold, even if the discontinuity determination code shows a detection of a discontinuity, the process is performed as if no discontinuity is detected. Here, C is a constant determined in advance.

$\mathrm{stab} = 1.25 - \frac{1}{C} \sum_{i=0}^{P'-1} (\dot{\omega}_i^{0} - \dot{\omega}_i^{-1})^{2}$ [Mathematical Equation 78]
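The override described in paragraph [0189] can be sketched as follows. The function names are hypothetical, and both the constant `c` and the stability threshold are left to the caller because the description only states that they are determined in advance.

```python
def isf_stability(isf_curr, isf_prev, p_prime, c):
    """ISF stability: 1.25 minus the low-order squared ISF distance divided by c."""
    dist = sum(
        (a - b) ** 2 for a, b in zip(isf_curr[:p_prime], isf_prev[:p_prime])
    )
    return 1.25 - dist / c


def effective_discontinuity(flag_from_code, isf_curr, isf_prev, p_prime, c, stab_thres):
    """Treat the frame as continuous when the ISFs are stable enough.

    flag_from_code is the detection result decoded from the
    discontinuity determination code.
    """
    if flag_from_code and isf_stability(isf_curr, isf_prev, p_prime, c) > stab_thres:
        return False
    return flag_from_code
```

With this arrangement the transmitted flag is honored only when the ISF parameters have actually moved far from those of the preceding frame.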

[0190] The ISF-ISP converter 122A in the LP coefficient calculator 122 converts the ISF parameters into the ISP parameters by the same processing procedure as performed in the first embodiment (step S44 in FIG. 21).

[0191] The ISP interpolator 122B calculates the ISP parameters for the respective subframes by the same processing procedure as performed in the first embodiment (step S45 in FIG. 21).

[0192] The ISP-LPC converter 122C converts the ISP parameters calculated for the respective subframes into the LPC parameters by the same processing procedure as performed in the first embodiment (step S46 in FIG. 21).

[0193] In the fourth embodiment as described above, the encoding side performs the discontinuity determination (the discontinuity determination using the Euclidean distances between concealment ISF parameters and quantized ISF parameters, as an example), encodes auxiliary information about a result of the determination, and outputs the encoded information to the decoding side, and the decoding side determines a discontinuity using the auxiliary information obtained by decoding. In this manner, the appropriate processing can be executed according to the discontinuity determination result made by the encoding side while the encoding side and the decoding side work in concert with each other.

Fifth Embodiment

[0194] (Regarding Encoding Side)

[0195] The functional configuration of the encoder is the same as that of the fourth embodiment shown in FIG. 16, and the processing flow of the encoder is the same as the processing flow of the fourth embodiment shown in FIG. 17. The following describes the LP analyzer/encoder according to the fifth embodiment, which differs from that of the fourth embodiment.

[0196] FIG. 22 shows an exemplary functional configuration of the LP analyzer/encoder, and FIG. 23 shows a flow of the processes performed by the LP analyzer/encoder. As shown in FIG. 22, the LP analyzer/encoder 21S has the LP analyzer 210, the LP-ISF converter 211, the ISF encoder 212, the discontinuity determiner 213, the ISF concealer 214, the ISF-LP converter 215, and the ISF buffer 216.

[0197] In this LP analyzer/encoder 21S, the LP analyzer 210 performs the linear prediction analysis on the input signal by the same process as performed in the fourth embodiment to obtain the linear prediction coefficients (step U51 in FIG. 23).

[0198] The LP-ISF converter 211 converts the calculated linear prediction coefficients into the ISF parameters by the same process as performed in the fourth embodiment (step U52 in FIG. 23). The method described in the Non Patent Literature may be used for the conversion from the linear prediction coefficients into the ISF parameters.

[0199] The ISF encoder 212 reads the discontinuity determination flag stored in the internal buffer (not shown) of the discontinuity determiner 213 (step U53 in FIG. 23).

[0200] When the discontinuity determination flag indicates a detection of a discontinuity, the ISF encoder 212 calculates the ISF codes by vector quantization of the ISF residual parameters $r_i$ calculated by the equation below (step U54 in FIG. 23). Here, $\omega_i$ denotes the ISF parameters calculated by the LP-ISF converter 211, and $\mathrm{mean}_i$ denotes the mean vectors obtained in advance by learning.

$$r_i = \omega_i - \mathrm{mean}_i$$ [Mathematical Equation 79]

[0201] Next, the ISF encoder 212 uses the quantized ISF residual parameters

$$\hat{r}_i$$ [Mathematical Equation 80]

obtained by quantization of the ISF residual parameters $r_i$ to update the ISF residual parameter buffer in accordance with the following equation (step U55 in FIG. 23).

$$\dot{r}_i^{-1} = \hat{r}_i$$ [Mathematical Equation 81]

[0202] When the discontinuity determination flag does not indicate a detection of a discontinuity, the ISF encoder 212 calculates the ISF codes by vector quantization of the ISF residual parameters $r_i$ calculated by the equation below (step U54 in FIG. 23). Here, the ISF residual parameters obtained by decoding in the immediately preceding frame are denoted as follows.

$$\dot{r}_i^{-1}$$ [Mathematical Equation 82]

$$r_i = \omega_i - \mathrm{mean}_i - \tfrac{1}{3}\,\dot{r}_i^{-1}$$ [Mathematical Equation 83]

[0203] Next, the ISF encoder 212 uses the quantized ISF residual parameters

$$\hat{r}_i$$ [Mathematical Equation 84]

obtained by quantization of the ISF residual parameters $r_i$ to update the ISF residual parameter buffer in accordance with the following equation (step U55 in FIG. 23).

$$\dot{r}_i^{-1} = \hat{r}_i$$ [Mathematical Equation 85]

[0204] By the above procedure, the ISF encoder 212 calculates the ISF codes and outputs quantized ISF parameters obtained in the process of encoding to the discontinuity determiner 213, the ISF concealer 214, and the ISF-LP converter 215.
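The encoder-side residual calculation and buffer update can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, and `toy_quantize` is a stand-in (uniform rounding) for the vector quantizer, whose codebook is not specified here.

```python
from typing import List, Sequence, Tuple


def isf_residual(isf: Sequence[float], mean: Sequence[float],
                 prev_quantized: Sequence[float],
                 discontinuity: bool) -> List[float]:
    """Residual handed to the quantizer (Equations 79 and 83)."""
    if discontinuity:
        # Equation 79: no dependence on the preceding frame.
        return [w - m for w, m in zip(isf, mean)]
    # Equation 83: subtract one third of the preceding quantized residual.
    return [w - m - p / 3.0 for w, m, p in zip(isf, mean, prev_quantized)]


def toy_quantize(residual: Sequence[float], step: float = 1.0) -> List[float]:
    """Placeholder for vector quantization: uniform rounding (an assumption)."""
    return [step * round(r / step) for r in residual]


def encode_frame(isf: Sequence[float], mean: Sequence[float],
                 buffer: Sequence[float],
                 discontinuity: bool) -> Tuple[List[float], List[float]]:
    """Return the quantized residual and the updated residual buffer."""
    r_hat = toy_quantize(isf_residual(isf, mean, buffer, discontinuity))
    # Equations 81 and 85: the buffer always holds the latest quantized residual.
    return r_hat, list(r_hat)


r_hat, buf = encode_frame([130.0, 260.0], [100.0, 200.0], [30.0, 60.0], False)
print(r_hat, buf)
```

Note that the buffer is updated in the same way in both branches; only the residual calculation depends on the discontinuity determination flag.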

[0205] The ISF concealer 214 calculates the concealment ISF parameters from the quantized ISF parameters by the same process as that of the decoder-side ISF concealer, as in the fourth embodiment, and outputs them to the ISF buffer 216 (steps U56, U58 in FIG. 23). The ISF concealment process may be performed by any method as long as it is the same process as that of the decoder-side packet loss concealer.

[0206] The discontinuity determiner 213 performs a determination of a discontinuity by the same process as performed in the fourth embodiment and stores a determination result in the internal buffer (not shown) of the discontinuity determiner 213 (step U57 in FIG. 23).

[0207] The ISF-LP converter 215 converts the quantized ISF parameters, in the same manner as performed in the fourth embodiment, to calculate the quantized linear prediction coefficients and outputs them to the residual encoder 22 (FIG. 16) (step U58 in FIG. 23).

[0208] (Regarding Decoding Side)

[0209] An audio signal processing device according to the fifth embodiment has the same configuration as that of the aforementioned audio decoder 1 in FIG. 1 and has a novel feature in the audio code decoder, and thus the audio code decoder will be described below. FIG. 24 shows an exemplary functional configuration of the audio code decoder 12E, and FIG. 25 shows a flow of the process of calculating the LP coefficients. The audio code decoder 12E shown in FIG. 24 is configured by adding the discontinuity detector 129 to the aforementioned configuration shown in FIG. 3.

[0210] The discontinuity detector 129 decodes the discontinuity determination code and outputs the resultant discontinuity determination flag to the ISF decoder 120 (step S51 in FIG. 25).

[0211] The ISF decoder 120 calculates the ISF parameters as follows, depending upon the value of the discontinuity determination flag, and outputs the ISF parameters to the stability processor 121 and the internal state buffer 14 (step S52 in FIG. 25).

[0212] When the discontinuity determination flag indicates a detection of a discontinuity, the ISF decoder 120 uses the quantized ISF residual parameters

$$\dot{r}_i$$ [Mathematical Equation 86]

obtained by decoding the ISF codes, and the mean vectors $\mathrm{mean}_i$ obtained in advance by learning, to obtain the quantized ISF parameters

$$\dot{\omega}_i$$ [Mathematical Equation 87]

in accordance with the following equation.

$$\dot{\omega}_i = \mathrm{mean}_i + \dot{r}_i$$ [Mathematical Equation 88]

[0213] Next, the ISF decoder 120 updates the ISF residual parameters stored in the internal state buffer 14 in accordance with the following equation.

$$\dot{r}_i^{-1} = \dot{r}_i$$ [Mathematical Equation 89]

[0214] When the discontinuity determination flag does not indicate a detection of a discontinuity, the ISF decoder 120 reads, from the internal state buffer 14, the ISF residual parameters

$$\dot{r}_i^{-1}$$ [Mathematical Equation 90]

obtained by decoding the immediately preceding frame and uses the resultant ISF residual parameters

$$\dot{r}_i^{-1},$$ [Mathematical Equation 91]

the mean vectors $\mathrm{mean}_i$ obtained in advance by learning, and the quantized ISF residual parameters

$$\dot{r}_i$$ [Mathematical Equation 92]

obtained by decoding the ISF codes to calculate the quantized ISF parameters

$$\dot{\omega}_i$$ [Mathematical Equation 93]

in accordance with the following equation.

$$\dot{\omega}_i = \mathrm{mean}_i + \dot{r}_i + \tfrac{1}{3}\,\dot{r}_i^{-1}$$ [Mathematical Equation 94]

[0215] Next, the ISF decoder 120 updates the ISF residual parameters stored in the internal state buffer 14 in accordance with the following equation.

$$\dot{r}_i^{-1} = \dot{r}_i$$ [Mathematical Equation 95]
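The decoder-side reconstruction of the quantized ISF parameters can be sketched in Python as follows. The function and argument names are hypothetical, and the numeric values are made up for illustration; the sketch only mirrors the equations for the two values of the discontinuity determination flag and the subsequent buffer update.

```python
from typing import List, Sequence, Tuple


def decode_isf(r_hat: Sequence[float], mean: Sequence[float],
               prev_residual: Sequence[float],
               discontinuity: bool) -> Tuple[List[float], List[float]]:
    """Return the quantized ISF parameters and the updated residual buffer."""
    if discontinuity:
        # Equation 88: ignore the residual of the preceding frame.
        isf = [m + r for m, r in zip(mean, r_hat)]
    else:
        # Equation 94: add one third of the preceding frame's residual.
        isf = [m + r + p / 3.0 for m, r, p in zip(mean, r_hat, prev_residual)]
    # Equations 89 and 95: store the current residual for the next frame.
    return isf, list(r_hat)


isf, buf = decode_isf([20.0, 40.0], [100.0, 200.0], [30.0, 60.0], False)
print(isf, buf)
```

Because the discontinuity branch does not read the buffered residual, a residual left in the buffer by a concealed frame cannot propagate into the first correctly received frame.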

[0216] The stability processor 121 performs the same process as is performed in the first embodiment when no discontinuity is detected (step S53 in FIG. 25).

[0217] The ISF-ISP converter 122A in the LP coefficient calculator 122 converts the ISF parameters into the ISP parameters by the same processing procedure as described in the first embodiment (step S54 in FIG. 25).

[0218] The ISP interpolator 122B calculates the ISP parameters for the respective subframes by the same processing procedure as performed in the first embodiment (step S55 in FIG. 25).

[0219] The ISP-LPC converter 122C, by the same processing procedure as performed in the first embodiment (step S56 in FIG. 25), converts the ISP parameters calculated for the respective subframes into the LPC parameters.

[0220] In the fifth embodiment as described above, the encoding side is configured as follows: When the discontinuity determination flag does not indicate a detection of a discontinuity, the vector quantization of the ISF residual parameters is carried out using the ISF residual parameters obtained by decoding of the immediately preceding frame. On the other hand, when the discontinuity determination flag indicates a detection of a discontinuity, the encoder avoids using the ISF residual parameters obtained by decoding of the immediately preceding frame. Similarly, the decoding side is configured as follows: When the discontinuity determination flag does not indicate a detection of a discontinuity, the quantized ISF parameters are calculated using the ISF residual parameters obtained by decoding of the immediately preceding frame. On the other hand, when the discontinuity determination flag indicates a detection of discontinuity, the decoder avoids using the ISF residual parameters obtained by decoding of the immediately preceding frame. In this manner, the appropriate processing according to a discontinuity determination result can be executed while the encoding side and the decoding side work in concert with each other.

Sixth Embodiment

[0221] The above first to fifth embodiments may be applied in combination. For example, as described in the fourth embodiment, the decoding side decodes the discontinuity determination code included in the audio codes from the encoding side to detect a discontinuity. When a discontinuity is detected, the decoding side may carry out the subsequent operation as follows.

[0222] For the ISF parameters

$$\dot{\omega}_i^{-1}$$ [Mathematical Equation 96]

stored in the internal state buffer, the ISF parameters in the low-order dimensions up to dimension P′ (0<P′≤P) are replaced in accordance with the following equation, as described in the third embodiment.

$$\dot{\omega}_i^{-1} = \omega_i^{0} \quad (0 \le i < P')$$ [Mathematical Equation 97]

[0223] On the other hand, the ISF parameters of the current frame are calculated in accordance with the following equation as described in the fifth embodiment.

$$\dot{\omega}_i = \mathrm{mean}_i + \dot{r}_i$$ [Mathematical Equation 98]

[0224] Thereafter, using the ISF parameters obtained as described above, the LP coefficients are obtained by the processes of the ISF-ISP converter 122A, the ISP interpolator 122B, and the ISP-LPC converter 122C as performed in the first embodiment.

[0225] It is also effective to adopt optional combinations of the first to fifth embodiments as described above.

Seventh Embodiment

[0226] The decoding operation according to the above first to sixth embodiments and their modifications may take into account how the frame is lost (e.g., whether a single frame is lost or consecutive frames are lost). In the seventh embodiment, it suffices that the discontinuity detection is made using, for example, the result of decoding the discontinuity determination code included in the audio codes; the method of detection is not limited to the above.

[0227] An audio signal processing device according to the seventh embodiment has the same configuration as that of the aforementioned audio decoder 1 in FIG. 1 and has a novel feature in the audio code decoder, and thus the audio code decoder will be described below.

[0228] FIG. 26 shows an exemplary configuration of the audio decoder 1S according to the seventh embodiment, and FIG. 27 shows a flowchart of the processes performed in the audio decoder. As shown in FIG. 26, in addition to the aforementioned audio code decoder 12G, the concealment signal generator 13 and the internal state buffer 14, the audio decoder 1S has a reception state determiner 16 that determines packet reception states in some past frames and stores a packet loss history.

[0229] The reception state determiner 16 determines a packet reception state and updates the packet loss history information, based on a determination result (step S50 in FIG. 27).

[0230] When a packet loss is detected (NO in step S100), the reception state determiner 16 outputs a packet loss detection result of the pertinent frame to the concealment signal generator 13, and the concealment signal generator 13 generates the concealment signal as described above and updates the internal states (steps S300, S400). The concealment signal generator 13 may also utilize the packet loss history information for interpolation of parameters or the like.

[0231] On the other hand, when no packet loss is detected (YES in step S100), the reception state determiner 16 outputs the packet loss history information, including the packet loss detection result of the pertinent frame, and the audio codes included in the received packet to the audio code decoder 12G, and the audio code decoder 12G decodes the audio codes as described before and updates the internal states (steps S200, S400).

[0232] Thereafter, the processes of steps S50 to S400 are repeated until the communication ends (or until step S500 results in a determination of YES).
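The overall loop of steps S50 to S500 can be sketched in Python as follows. This is a rough sketch under stated assumptions: `run_decoder`, the `decode` and `conceal` callbacks, the use of `None` to represent a lost packet, and the history length are all hypothetical choices, and the internal state update of step S400 is assumed to happen inside the callbacks.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional


def run_decoder(packets: Iterable[Optional[bytes]],
                decode: Callable[..., object],
                conceal: Callable[..., object],
                history_len: int = 4) -> List[object]:
    """Process packets until the stream ends; None stands for a lost packet."""
    history: Deque[bool] = deque(maxlen=history_len)  # True means "lost"
    output: List[object] = []
    for packet in packets:  # the loop ends when the communication ends (S500)
        lost = packet is None
        history.append(lost)  # S50: update the packet loss history
        if lost:  # NO in S100
            output.append(conceal(list(history)))  # S300: concealment signal
        else:  # YES in S100
            output.append(decode(packet, list(history)))  # S200: decode
        # S400: internal state updates are assumed to occur in the callbacks.
    return output


frames = run_decoder([b"a", None, b"b"],
                     decode=lambda pkt, hist: "decoded",
                     conceal=lambda hist: "concealed")
print(frames)
```

Passing the history to both callbacks reflects the statement that the concealment signal generator may also use the packet loss history information.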

[0233] FIG. 28 shows an exemplary functional configuration of the audio code decoder 12G, and FIG. 29 shows a flowchart of the process of calculating the LP coefficients. An example using the packet loss history information only in the LP coefficient calculator 122 will be described below, but the audio code decoder may be configured to use the packet loss history information in other constituent elements as well.

[0234] Since the audio code decoder 12G has the same configuration as described in the first embodiment, except for the configuration associated with the calculation process of LP coefficients, the below will describe the configuration and its operation associated with the calculation process of LP coefficients.

[0235] The ISF decoder 120 decodes the ISF codes in the same manner as performed in the first embodiment and outputs the ISF parameters to the stability processor 121 (step S71 in FIG. 29).

[0236] The discontinuity detector 129 refers to the packet loss history information to determine the reception state (step S72). The discontinuity detector 129 may be designed, for example, as follows: it stores a specific reception pattern, for example, a packet loss three frames prior, a normal reception two frames prior, and a packet loss one frame prior. When the stored reception pattern is recognized, it sets a reception state flag to off; otherwise, it sets the reception state flag to on.
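The pattern check can be sketched in Python as follows. The function name, the representation of the history as a list of booleans (oldest first, True meaning a lost packet), and the handling of a history shorter than the pattern are assumptions made for illustration.

```python
from typing import Sequence


def reception_state_flag(prior_losses: Sequence[bool],
                         pattern: Sequence[bool] = (True, False, True)) -> bool:
    """Return False (off) when the most recent prior frames match the pattern.

    prior_losses lists the frames before the current one, oldest first.
    The default pattern is: lost three frames prior, received two frames
    prior, lost one frame prior.
    """
    recent = tuple(prior_losses[-len(pattern):])
    # A history shorter than the pattern cannot match, so the flag stays on.
    return recent != tuple(pattern)


print(reception_state_flag([False, True, False, True]))
print(reception_state_flag([True, True, False]))
```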

[0237] Furthermore, the discontinuity detector 129 detects a discontinuity in the same manner as described in one of the first to sixth embodiments.

[0238] Then, the stability processor 121 performs the stability process according to the reception state flag and a result of the discontinuity detection, for example, as described below (step S73).

[0239] When the reception state flag is off, the stability processor 121 performs the same process as performed when a discontinuity is not detected, regardless of a result of the discontinuity detection.

[0240] On the other hand, when the reception state flag is on and the result of the discontinuity detection indicates that a discontinuity is not detected, the stability processor 121 performs the same process as performed when a discontinuity is not detected.

[0241] Furthermore, when the reception state flag is on and the result of the discontinuity detection indicates that a discontinuity is detected, the stability processor 121 performs the same process as performed when a discontinuity is detected.
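The three cases above reduce to a single logical condition, as the following Python sketch shows. The function name is hypothetical, and the sketch only selects which branch of the stability process is taken; it does not implement the stability process itself.

```python
def use_discontinuity_branch(reception_state_on: bool,
                             discontinuity_detected: bool) -> bool:
    """Select the 'discontinuity detected' branch of the stability process.

    Flag off: always the 'not detected' branch, whatever the detection result.
    Flag on: follow the discontinuity detection result.
    """
    return reception_state_on and discontinuity_detected


# Enumerate all four combinations of the two inputs.
for flag_on in (False, True):
    for detected in (False, True):
        print(flag_on, detected, use_discontinuity_branch(flag_on, detected))
```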

[0242] Thereafter, the operations (steps S74 to S76) of the ISF-ISP converter 122A, the ISP interpolator 122B, and the ISP-LPC converter 122C in the LP coefficient calculator 122 are performed in the same manners as performed in the first embodiment.

[0243] In the seventh embodiment as described above, the stability process is carried out depending upon a result of the discontinuity detection and the state of the reception state flag, whereby more accurate processing can be executed while it is considered how the frame is lost (e.g., whether a single frame is lost or consecutive frames are lost).

[0244] [Regarding Audio Signal Processing Programs]

[0245] The below will describe audio signal processing programs that program a computer to operate as an audio signal processing device according to the present invention.

[0246] FIG. 32 is a drawing showing various exemplary configurations of the audio signal processing programs. FIG. 30 shows an exemplary hardware configuration of the computer, and FIG. 31 shows a schematic view of the computer. Audio signal processing programs P1-P4 (which will be referred to hereinafter generally as “audio signal processing program P”) shown in FIG. 32(a) to (d), respectively, can program the computer C10 shown in FIGS. 30 and 31 to operate as an audio signal processing device. It should be noted that the audio signal processing program P described in the present specification can be implemented not only on the computer shown in FIGS. 30 and 31 but also on any information processing device such as a cell phone, a personal digital assistant, or a portable personal computer.

[0247] The audio signal processing program P can be provided in a form stored in a recording medium M. Examples of the recording medium M include a flexible disc, a CD-ROM, a DVD, a ROM, and a semiconductor memory.

[0248] As shown in FIG. 30, the computer C10 has a reading device C12 such as a flexible disc drive unit, a CD-ROM drive unit, or a DVD drive unit; a working memory (RAM) C14; a memory C16 for storing the program stored in the recording medium M; a display C18; a mouse C20 and a keyboard C22 as input devices; a communication device C24 for executing transmission/reception of data or the like; and a central processing unit (CPU) C26 for controlling execution of the program.

[0249] When the recording medium M is put into the reading device C12, the computer C10 becomes accessible to the audio signal processing program P stored in the recording medium M through the reading device C12 and becomes able to operate as an audio signal processing device programmed by the audio signal processing program P.

[0250] The audio signal processing program P may also be provided as a computer data signal W superimposed on a carrier wave and transmitted through a network, as shown in FIG. 31. In this case, the computer C10 stores the audio signal processing program P received by the communication device C24 into the memory C16 and can then execute the audio signal processing program P.

[0251] The audio signal processing program P can be configured by adopting the various configurations shown in FIG. 32(a) to (d). For example, the audio signal processing program P1 shown in FIG. 32(a) has a discontinuity detection module P11 and a discontinuity correction module P12. The audio signal processing program P2 shown in FIG. 32(b) has an ISF/LSF quantization module P21, an ISF/LSF concealment module P22, a discontinuity detection module P23, and an auxiliary information encoding module P24. The audio signal processing program P3 shown in FIG. 32(c) has a discontinuity detection module P31, an auxiliary information encoding module P32, and an ISF/LSF quantization module P33. The audio signal processing program P4 shown in FIG. 32(d) has an auxiliary information decoding module P41, a discontinuity correction module P42, and an ISF/LSF decoding module P43.

[0252] By implementing the various embodiments described above, discontinuous audio that can occur on recovery from a packet loss at the audio start point is reduced, and the subjective quality can thereby be improved.

[0253] The stability processor, which is the first feature of the invention, is configured so that when a discontinuity is detected in the first packet which is received correctly after a packet loss occurs, for example, a distance between elements of the ISF parameters is set wider than normal, whereby it can prevent the gain of the LP coefficients from becoming too large. Since it can prevent both the gain of the LP coefficient and the power of the excitation signal from increasing, a discontinuity of the synthesized signal is reduced, whereby a degradation of the subjective quality can be suppressed. Furthermore, the stability processor may reduce a discontinuity of the synthesized signal by multiplying the synthesized signal by the gain calculated by using the LP coefficients or the like.

[0254] The discontinuity detector, which is the second feature of the invention, monitors the gain of the excitation signal included in the first packet which is received correctly after a packet loss occurs, and determines that a discontinuity has occurred for a packet in which the gain of the excitation signal has increased by more than a certain level.

CPC classification: G10L19/09, G10L2019/0011, G10L19/135, G10L19/07, G10L19/005 (Physics); H04L1/00 (Electricity)

International classification: G10L19/09, G10L19/005, G10L19/135 (Physics); H04L1/00 (Electricity)
