Optimized partial mixing of audio streams encoded by sub-band encoding

Abstract

The invention relates to a method for combining a plurality of audio streams encoded by frequency sub-band encoding, comprising the following steps: decoding (E301) a portion of the encoded streams over at least one frequency sub-band; combining (E302) the streams thus decoded to form a mixed stream; selecting (E304), from among the plurality of encoded audio streams, at least one encoded replication stream, over at least one frequency sub-band that is different from that of the decoding step. The method is such that the selection of the at least one encoded replication stream is carried out according to a criterion which takes into consideration the presence of a predetermined frequency band in the encoded stream (E303). The invention also relates to a device which implements the described method and can be integrated into a conference bridge, a communication terminal or a communication gateway.

Claims

1. A method for combining a plurality of audio signal streams coded according to a frequency sub-band coding, wherein the method comprises the following acts performed by a combining device: receiving a plurality of audio signal streams coded according to a frequency sub-band coding; decoding of a plurality of the received audio signal streams on at least one common frequency sub-band; adding the plurality of decoded audio signal streams to form a mixed signal stream in the at least one common frequency sub-band; selecting, from among the plurality of received coded audio signal streams, of at least one replication coded signal stream, on at least one frequency sub-band different from that of the decoding act, according to a criterion taking into account the presence of a predetermined frequency band in the coded signal stream; transmitting of the mixed signal stream and the at least one replication coded signal stream.

2. The method as claimed in claim 1, further comprising an act of preselecting the coded audio signal streams according to a predetermined criterion.

3. The method as claimed in claim 1, wherein, in the case where several coded signal streams are selected in the selecting act, an additional selecting of replication coded signal stream is performed on a criterion of precedence of selection of the signal streams.

4. The method as claimed in claim 1, further comprising an act of re-encoding the mixed signal stream and an act of combining with the replication signal stream obtained before transmitting.

5. The method as claimed in claim 1, wherein the decoding act is performed on low-frequency sub-bands and the predetermined frequency band of the selecting criterion is a frequency band above said low-frequency sub-bands.

6. The method as claimed in claim 1, further comprising a prior act of classifying the coded audio signal streams and in that the replication coded signal stream selected is the first signal stream in this order of classification in which the predetermined frequency band has been detected.

7. The method as claimed in claim 1, wherein the presence of a predetermined frequency band in a coded signal stream is effected by a comparison of energy, in the various frequency sub-bands, of the decoded audio signal streams.

8. The method as claimed in claim 1, wherein the presence of a predetermined frequency band in a coded signal stream is effected according to the following acts: determining by frequency sub-band of a predetermined set of sub-bands, of a signal estimated on the basis of the coded signal stream; determining by frequency sub-band of the predetermined set of sub-bands, of non-coded parameters representative of the audio content, on the basis of the corresponding estimated signal; calculating of at least one local criterion on the basis of the parameters determined; deciding as regards the presence of a predetermined frequency band in at least one sub-band of the audio content as a function of the at least one local criterion calculated.

9. A device for combining a plurality of audio signal streams coded according to a frequency sub-band coding, wherein the device comprises: an input module receiving a plurality of audio signal streams coded according to a frequency sub-band coding; a module for decoding a plurality of the received audio signal streams on at least one common frequency sub-band; a module for adding the plurality of decoded audio signal streams to form a mixed signal stream in the at least one common frequency sub-band; a module for selecting, from among the plurality of received coded audio signal streams, at least one replication coded signal stream, on at least one frequency sub-band different from that of the decoding act, according to a criterion taking into account the presence of a predetermined frequency band in the coded signal stream; an output module transmitting the mixed signal stream and the at least one replication coded signal stream.

10. A conference bridge comprising a device for combining a plurality of audio signal streams coded according to a frequency sub-band coding, wherein the device comprises: an input module receiving a plurality of audio signal streams coded according to a frequency sub-band coding; a module for decoding a plurality of the received audio signal streams on at least one common frequency sub-band; a module for adding the plurality of decoded audio signal streams to form a mixed signal stream in the at least one common frequency sub-band; a module for selecting, from among the plurality of received audio signal streams, at least one replication coded signal stream, on at least one frequency sub-band different from that of the decoding act, according to a criterion taking into account the presence of a predetermined frequency band in the coded signal stream; an output module transmitting the mixed signal stream and the at least one replication coded signal stream.

11. A communication device comprising a device for combining a plurality of audio signal streams coded according to a frequency sub-band coding, wherein the device comprises: an input module receiving a plurality of audio signal streams coded according to a frequency sub-band coding; a module for decoding a plurality of the received audio signal streams coded on at least one common frequency sub-band; a module for adding the plurality of decoded audio signal streams thus decoded to form a mixed signal stream in the at least one common frequency sub-band; a module for selecting, from among the plurality of received audio signal streams, at least one replication coded signal stream, on at least one frequency sub-band different from that of the decoding act, according to a criterion taking into account the presence of a predetermined frequency band in the coded signal stream; an output module transmitting the mixed signal stream and the at least one replication coded signal stream.

12. The communication device as claimed in claim 11, wherein the communication device is a communication gateway.

13. A non-transitory computer-readable medium, on which is stored a computer program comprising code instructions for execution of steps of a method for combining a plurality of audio signal streams coded according to a frequency sub-band coding, when these instructions are executed by a processor, wherein the method comprises the following steps performed by the processor as configured by the instructions: receiving a plurality of audio signal streams coded according to a frequency sub-band coding; decoding of a plurality of the received audio signal streams coded on at least one common frequency sub-band; adding of the plurality of decoded audio signal streams thus decoded to form a mixed signal stream in the at least one common frequency sub-band; selecting, from among the plurality of coded audio signal streams, of at least one replication coded signal stream, on at least one frequency sub-band different from that of the decoding act, according to a criterion taking into account the presence of a predetermined frequency band in the coded signal stream; transmitting of the mixed signal stream and the at least one replication coded signal stream.

14. The communication device as claimed in claim 11, wherein the communication device is a communication terminal.

15. The method as claimed in claim 1, further comprising an act of performing a first preselection of the coded audio signal streams according to a predetermined criterion and a second preselection to restrict a number of pathways taken into account for selecting.

Description

(1) Other characteristics and advantages of the invention will be more clearly apparent on reading the following description, given solely by way of nonlimiting example and with reference to the appended drawings, in which:

(2) FIG. 1a, described previously, illustrates the operating principle of a replicating bridge according to the prior art;

(3) FIG. 1b, described previously, illustrates the operating principle of a mixing bridge according to the prior art;

(4) FIG. 2, described previously, illustrates the operating principle of the partial mixing according to the prior art, applied to the coding of G.711.1 type;

(5) FIG. 3 illustrates the main steps of the combining method according to an embodiment of the invention;

(6) FIG. 4 illustrates a coder of G.722 type delivering streams able to be combined according to the method of the invention;

(7) FIG. 5a illustrates the steps of a particular embodiment for coded streams of G.722 type and implemented in a centralized bridge;

(8) FIG. 5b illustrates the steps, implemented in a terminal, of the particular embodiment for coded streams of G.722 type;

(9) FIG. 6 illustrates a coder of G.711.1 type delivering streams able to be combined according to the method of the invention;

(10) FIG. 7 illustrates the steps, implemented in a centralized bridge, of a particular embodiment for coded streams of G.711.1 type;

(11) FIGS. 8a and 8b illustrate hardware representations of combining devices according to embodiments of the invention; and

(12) FIG. 9 illustrates the steps implemented in an embodiment for the step of detecting a predetermined frequency band according to the invention.

(13) FIG. 3 illustrates the main steps of an embodiment of the combining method according to the invention. On the basis of a plurality (N) of coded streams (Be.sub.j) coded according to a frequency sub-band coding scheme, the method comprises a step E301 of decoding a part of the received coded streams on at least one frequency sub-band. Thus, on the basis of the bitstreams Be.sub.j.sup.l of at least one frequency sub-band, for example the low-frequency sub-band, the reconstructed signals s.sub.j.sup.l of the low-frequency sub-band are obtained on completion of this decoding.

(14) In step E302, a mixing of these streams is performed on this at least one frequency sub-band. The decoded streams are therefore added together to form a mixed signal S.sub.i.sup.l=Σ.sub.j s.sub.j.sup.l (with 0≤j<N, and in the case of the centralized bridge j≠i). In an optional step E305a, the mixed signal S.sub.i.sup.l is coded to obtain a stream Bs.sub.i.sup.l.

(15) On the basis of the coded streams received, a step E304 of selecting at least one replication coded stream is performed. This selection is performed on at least one frequency sub-band different from that (those) used for the decoding step. To implement this selection according to the invention, a step E303 is implemented to detect the presence of a predetermined frequency band in the coded stream. For example, the presence of a content in the high-frequency band conditions the selection of the coded stream which contains it. The selected stream Be.sub.k.sup.h then constitutes a replication stream Bs.sub.i.sup.h to be combined in the optional step of combining E306a the bitstreams with the coded mixed stream Bs.sub.i.sup.l obtained in step E305a: (Bs.sub.i.sup.l,Bs.sub.i.sup.h). As a variant or supplement, the replication stream Bs.sub.i.sup.h is decoded in the optional step E305b to obtain a decoded signal S.sub.i.sup.h to be combined in the optional step of combining E306b with the mixed signal S.sub.i.sup.l (obtained in step E302): (S.sub.i.sup.l,S.sub.i.sup.h).
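The flow of FIG. 3 can be sketched in a few lines of Python. This is only an illustrative sketch: the `CodedStream` container, the `decode_low` decoder and the `has_band` detector are hypothetical placeholders supplied by the caller, and the rule "take the first stream in which the band is detected" is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class CodedStream:
    """Hypothetical container for one coded input pathway j."""
    low: bytes   # low sub-band bitstream Be_j^l
    high: bytes  # high sub-band bitstream Be_j^h


def combine(
    streams: Sequence[CodedStream],
    i: int,
    decode_low: Callable[[bytes], List[float]],
    has_band: Callable[[CodedStream], bool],
) -> Tuple[List[float], Optional[bytes]]:
    """Partial mixing for recipient i: mix the low band, replicate one high band."""
    # E301: decode only the low sub-band, skipping the recipient's own stream.
    decoded = [decode_low(s.low) for j, s in enumerate(streams) if j != i]
    # E302: add the decoded signals sample by sample to form S_i^l.
    mixed = [sum(samples) for samples in zip(*decoded)]
    # E303/E304: replicate the high band of the first stream in which the
    # predetermined band is detected (assumed tie-breaking rule).
    replica = None
    for j, s in enumerate(streams):
        if j != i and has_band(s):
            replica = s.high
            break
    return mixed, replica


# Toy usage: "decoding" just turns bytes into numbers.
streams = [
    CodedStream(bytes([1, 2]), b""),
    CodedStream(bytes([3, 4]), b"\x01"),
    CodedStream(bytes([5, 6]), b"\x02"),
]
mixed, replica = combine(streams, 0, lambda b: [float(x) for x in b],
                         lambda s: len(s.high) > 0)
```

The high sub-band bitstream is returned untouched, which reflects the point of the method: only the mixed band is decoded and re-encoded.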

(16) For the sake of conciseness, the case where the predetermined frequency band to be detected in a coded stream is a high-frequency band is described subsequently. It is obvious to the person skilled in the art to adapt this detection to other types of frequency band, for example to a low-frequency band or else to a frequency band of a predefined span of values.

(17) Thus, a first embodiment is now described for audio streams which have been coded according to a coding scheme of standardized ITU-T G.722 type.

(18) FIG. 4 illustrates this mode of coding. It is also described in the document cited previously: Rec. ITU-T G.722, 7 kHz audio-coding within 64 kbit/s, November 1988.

(19) The G.722 coder codes the input signal (x(n)) sampled at 16 kHz as two sub-bands sampled at 8 kHz. The division into sub-bands is done by a quadrature mirror filter (QMF) by the module 401. On the basis of two input samples the QMF filter gives as output a low band (0-4000 Hz) sample x.sub.L(n) and a high band (4000-8000 Hz) sample x.sub.H(n). The signals of the 2 sub-bands are coded independently by ADPCM (Adaptive Differential Pulse-Code Modulation) coders 402 and 403.

(20) The indices of the two quantized prediction errors I.sub.H(n) and I.sub.L(n) are thus transmitted in the bitstream I(n) after multiplexing at 404. The G.722 coder has three bitrates: 64, 56 and 48 kbit/s. Each sample of the low sub-band is coded on 6 bits at the highest bitrate (48 kbit/s for the low sub-band), on 5 bits at the intermediate bitrate (40 kbit/s for the low sub-band), and on 4 bits at the lowest bitrate (32 kbit/s for the low sub-band). At the highest bitrate, the coded stream of the low sub-band consists of the core layer with 4 bits per sample and of two enhancement layers with 1 bit per sample each. The high sub-band is always coded on 2 bits per sample (16 kbit/s), independently of the bitrate.
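The bitrate figures above follow directly from the 8 kHz sub-band sampling rate and the number of bits per sample. The short Python sketch below reproduces the arithmetic; the function and constant names are illustrative only.

```python
SUBBAND_RATE_HZ = 8000  # each of the two G.722 sub-bands is sampled at 8 kHz
HIGH_BITS = 2           # the high sub-band always uses 2 bits per sample


def g722_bitrates(low_bits: int) -> tuple:
    """Return (low, high, total) bitrates in kbit/s for a given low-band bit depth."""
    low = low_bits * SUBBAND_RATE_HZ // 1000
    high = HIGH_BITS * SUBBAND_RATE_HZ // 1000
    return low, high, low + high


for low_bits in (6, 5, 4):
    print(low_bits, g722_bitrates(low_bits))
# prints:
# 6 (48, 16, 64)
# 5 (40, 16, 56)
# 4 (32, 16, 48)
```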

(21) A first exemplary embodiment is now illustrated in FIG. 5a which represents the steps of the method according to the invention, implemented in a partial mixing device with centralized architecture receiving streams coded by the 64 kbit/s ITU-T G.722 coder. As mentioned previously, this coder is a sub-band coder, the signals of the two (M=2) sub-bands being coded by ADPCM technology.

(22) On the basis of N hierarchical bitstreams (also called input pathways hereinafter), coded in this embodiment by G.722 at 64 kbit/s, an optional step E501 of preselecting N′ streams is implemented.

(23) This preselection step makes it possible to select, from among the various input pathways, those which comply with one or more of the selection criteria described previously for the prior art schemes. For example, on the basis of the voice activity detection, the FCFS (First Come First Served) criterion is used to select the streams. Or else, on the basis of the measurement of the power of the signal or of its energy, the LT (Loudest Talker) criterion is used to select the streams with the highest intensity.
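As an illustration of the LT (Loudest Talker) criterion, the following Python sketch keeps the pathways with the highest measured energy. The per-pathway energy values and the number of pathways to keep are assumed to be available from elsewhere; both names are hypothetical.

```python
from typing import List, Sequence


def preselect_loudest(energies: Sequence[float], n_keep: int) -> List[int]:
    """Return the indices of the n_keep pathways with the highest energy.

    energies[j] is a hypothetical energy (or power) measure for pathway j.
    """
    ranked = sorted(range(len(energies)), key=lambda j: energies[j], reverse=True)
    return sorted(ranked[:n_keep])


# Four input pathways, keep the two loudest.
selected = preselect_loudest([0.1, 5.0, 2.0, 0.0], 2)
```

Only the returned pathways then go through the decoding and mixing steps, which is where the complexity reduction comes from.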

(24) Thus, a part of the coded streams received by the combining device or mixing bridge is taken into account to implement the combining method. This therefore reduces the complexity of implementation of the steps of the method since the number of pathways to be combined is restricted. This preselection step is optional and the decoding step E502 can then apply to the set N of coded input audio streams.

(25) Subsequently, for the sake of clarity, we will use the notation N′ (with N′≤N) whether or not this optional step is implemented and we will denote by V′ the set of indices of these pathways.

(26) Step E502 of decoding the N′ streams in the low sub-band is thereafter implemented. As a variant, which is particularly advantageous if step E502 is not very complex, the preselection step E501 can be performed after this step E502 of decoding all the low sub-band input streams.

(27) Or else, as a supplement, a second preselection step can be performed after this decoding step so as to optionally further restrict the number of pathways taken into account in the selection of a high sub-band stream to be replicated (steps E505 to E507) and/or in the low sub-band mixing (step E503).

(28) For these N′ coded audio streams, for each pathway j (j∈V′) we denote by: Be.sub.j.sup.l the incoming low sub-band bitstream (composed of the core layer and of two enhancement layers); Be.sub.j.sup.h the incoming high sub-band bitstream.

(29) In the decoding step E502, the reconstructed signal s.sub.j.sup.l of the low sub-band is obtained by decoding the stream Be.sub.j.sup.l.

(30) In step E503, a procedure for mixing the bitstreams thus decoded is performed by addition of the signals thus reconstructed of the low sub-band: S.sub.i.sup.l=Σ s.sub.j.sup.l with j∈V′, j≠i. Note that if i∈V′, S.sub.i.sup.l is the sum of N′−1 signals, otherwise S.sub.i.sup.l is the sum of N′ signals.
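A minimal Python sketch of this mixing step, assuming the decoded low sub-band signals are held in a dictionary keyed by pathway index (a hypothetical data layout), shows how the recipient's own signal is left out of the sum:

```python
from typing import Dict, List


def mix_low_band(decoded: Dict[int, List[float]], i: int) -> List[float]:
    """Sum the decoded low sub-band signals of all selected pathways except i.

    decoded maps each selected pathway index j to its reconstructed signal.
    """
    contributors = [signal for j, signal in decoded.items() if j != i]
    if not contributors:
        return []
    return [sum(samples) for samples in zip(*contributors)]


decoded = {0: [1.0, 1.0], 2: [2.0, 2.0], 3: [3.0, 3.0]}
mix_for_0 = mix_low_band(decoded, 0)  # terminal 0 is selected: two signals summed
mix_for_1 = mix_low_band(decoded, 1)  # terminal 1 is not selected: three signals summed
```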

(31) The low sub-band output bitstream (Bs.sub.i.sup.l) intended to be transmitted to a terminal Ti (0≤i<N) is then obtained by coding in step E504, by the low sub-band encoder of G.722 (ADPCM on 6 bits), of this sum signal S.sub.i.sup.l.

(32) On the basis of the set N′ of input pathways, a step of detecting a predetermined frequency band E506 is performed. In this embodiment, the predetermined frequency band is the high-frequency band. This makes it possible to determine the presence of an HD content in the coded stream. Thus, an analysis of the audio content of the input pathways is performed.

(33) Various modes of detection of the presence of the high-frequency band are possible. For example, the scheme for detecting an HD content in a stream j can use a comparison of the energy of the reconstructed signal of the high sub-band, s.sub.j.sup.h, with that of the reconstructed signal of the low sub-band s.sub.j.sup.l. This embodiment requires a decoding of the audio stream to be analyzed in the high sub-band, in addition to the decoding of the low sub-band.
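The energy comparison can be sketched as follows in Python. The text does not specify how the two energies are compared; the ratio test and the threshold value used here are assumptions chosen purely for illustration.

```python
from typing import Sequence


def energy(signal: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(x * x for x in signal)


def has_hd_content(low: Sequence[float], high: Sequence[float],
                   ratio_threshold: float = 0.01) -> bool:
    """Declare HD content when high-band energy is significant relative to low-band energy.

    ratio_threshold is a hypothetical tuning value, not taken from the source.
    """
    e_low = energy(low)
    e_high = energy(high)
    if e_low == 0.0:
        return e_high > 0.0
    return e_high / e_low > ratio_threshold


wideband = has_hd_content([1.0, 1.0], [0.5, 0.5])
narrowband = has_hd_content([1.0, 1.0], [0.0, 0.0])
```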

(34) As an alternative, to avoid the decoding of the signals of the high sub-band, a low algorithmic cost detection method can be implemented. This method is described subsequently with reference to FIG. 9.

(35) In step E507, a selection of at least one coded stream k having HD content is performed. In the case where several coded streams comprise HD content, an additional selection, not represented in FIG. 5a, can be implemented. This additional selection may for example be based on a criterion of precedence of selection of the coded audio stream. Thus, the most recently replicated stream is chosen. Of course, other criteria are possible; for example, according to the energies of the low sub-band signals obtained in step E502.

(36) The selection of the high sub-band of the coded stream k comprising HD content is thus performed in step E507 and constitutes the output audio stream Bs.sub.i.sup.h=Be.sub.k.sup.h. This high sub-band bitstream (Bs.sub.i.sup.h) is replicated in step E508 so as to be transmitted to a terminal Ti with i≠k at the same time as the low sub-band coded mixed stream (Bs.sub.i.sup.l).

(37) In the case where several replication streams have been selected in step E507, these streams are replicated and combined with the low sub-band mixed stream.

(38) In another variant embodiment, a step of classifying the input pathways is performed at E505, before the step of detecting the frequency band. The classification may for example be made from the most recently replicated pathway to the least recently replicated pathway or as a function of the energies of the low sub-band signals obtained in step E502. This step E505 can of course use another criterion for ranking the input pathways. For example, the order according to the replication sequencing can be weighted by the criterion used in step E501 or else according to the energies of the decoded signals of the low sub-band.

(39) The analysis done in step E506 is then carried out on the streams of the input pathways ranked in the order determined in the classification step E505. As soon as an HD stream has been detected, the analysis stops.

(40) Step E505 is optional and can be performed either on the N input pathways, or on the N′ input pathways after application of the preselection step E501.

(41) In the case where the preselection step E501 is performed and in the case where none of the preselected streams contains HD content detected in step E506, then the detection is done on the input streams not yet analyzed to find the existence of at least one stream which comprises the predetermined frequency band. If one exists, it is then the latter which is selected in step E507.
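The ordered analysis with early stop, followed by the fallback to the streams not yet analyzed, can be sketched in Python as below. The ranking `order`, the preselected index set and the `detect` callback are hypothetical inputs standing in for steps E505, E501 and E506 respectively.

```python
from typing import Callable, Iterable, List, Optional, Set


def select_replication(order: Iterable[int], preselected: Set[int],
                       detect: Callable[[int], bool]) -> Optional[int]:
    """Return the first pathway, in ranking order, in which the band is detected.

    Preselected pathways are analyzed first; the remaining ones are examined
    only if none of the preselected pathways contains the band.
    """
    ranked: List[int] = list(order)
    first_pass = [j for j in ranked if j in preselected]
    second_pass = [j for j in ranked if j not in preselected]
    for j in first_pass + second_pass:
        if detect(j):
            return j  # the analysis stops as soon as one HD stream is found
    return None


analyzed: List[int] = []


def detect(j: int) -> bool:
    analyzed.append(j)       # record which pathways were actually analyzed
    return j in {2}          # in this toy case only pathway 2 carries HD content


chosen = select_replication([3, 1, 0, 2], {0, 1}, detect)
```

In this toy run the preselected pathways 1 and 0 are analyzed first, then the fallback examines 3 and 2.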

(42) Advantageously, a pooling of the steps can be implemented. For example, the detection step such as described subsequently with reference to FIG. 9 uses a voice activity detection parameter which can also be used for the preselection step E501. It will then be understood that steps E501 and E506 may be combined and that part at least of their calculations and parameters can be pooled. Likewise when step E506 provides information about the reliability of the detection, this information is advantageously used by step E505 of classifying the input pathways.

(43) In a particular embodiment, the terminal whose stream is replicated (here k) does not receive any high sub-band stream, since the high sub-band stream selected in step E507 is the one originating from this terminal. For this terminal, in a variant embodiment, a step of selecting a second HD stream to be replicated k′ can be performed for this output. We then have: Bs.sub.k.sup.h=Be.sub.k′.sup.h, k′≠k.

(44) The embodiment described with reference to FIG. 5b describes the implementation of the invention in a terminal with multi-party communication, with meshed architecture or with centralized architecture using a replicating bridge.

(45) In this embodiment, steps E501, E502, E503, E505, E506, E507 and E508 are the same as those described with reference to FIG. 5a.

(46) Here, it is a terminal i which receives N input pathways (N hierarchical bitstreams coded by G.722 at 64 kbit/s).

(47) As previously, we use the notation N′ (with N′≤N) whether or not the optional step E501 is implemented and we denote by V′ the set of indices of these input pathways.

(48) In this embodiment, the method uses in step E506 the technique described subsequently with reference to FIG. 9, to perform the detection of an HD content on an input pathway j. There is therefore no reconstruction of the signal in the high sub-band. In a particular embodiment, the parameters determined on the basis of the estimation of the signal according to this detection technique are also used in certain steps of the method of this embodiment, and especially the step of decoding the selected stream and also the streams in the low sub-bands. Indeed, these parameters then no longer have to be decoded, thus decreasing the complexity of the decoding steps.

(49) Thus, an analysis of the audio content of a subset of N′ input bitstreams to detect an HD content is performed in step E506, in the case where the preselection step E501 is implemented. A pathway k is selected at E507 from among the pathways and the bitstream of the high sub-band Be.sub.k.sup.h of this pathway is replicated, in step E508, as bitstream for the high sub-band Bs.sub.i.sup.h for terminal i: Bs.sub.i.sup.h=Be.sub.k.sup.h.

(50) Moreover, in step E502, the N′ low sub-band signals s.sub.j.sup.l are obtained by decoding of the low sub-band bitstreams Be.sub.j.sup.l, j∈V′.

(51) In this embodiment, in step E503, the low sub-band signal S.sub.i.sup.l is obtained by addition of the N′ reconstructed signals of the low sub-band: S.sub.i.sup.l=Σ s.sub.j.sup.l, j∈V′. In contradistinction to FIG. 5a, S.sub.i.sup.l here is always the sum of N′ signals; indeed, the terminal does not receive its own stream.

(52) In step E511, the high sub-band signal S.sub.i.sup.h is obtained by decoding by the high sub-band G.722 decoder of the high sub-band bitstream Bs.sub.i.sup.h obtained in step E508 by replication of the stream Be.sub.k.sup.h selected in step E507.

(53) Finally, the wide-band reconstructed signal is obtained in E510 by G.722 synthesis QMF filtering of the two signals, low sub-band S.sub.i.sup.l and high sub-band S.sub.i.sup.h.

(54) In these two embodiments, the preselection step E501 makes it possible to reduce the number of streams to be taken into account for the analysis to be performed at E506 but also for the decoding of the low sub-band bitstreams of step E502 and for the mixing of step E503. This makes it possible therefore to globally reduce the complexity of the combining method. As in the previous case, the preselection can be performed as a variant or supplement after the decoding step.

(55) Thus, in this embodiment, a bitstream of the high sub-band of a single input pathway is selected so as to be decoded by the high sub-band decoder of G.722 (ADPCM decoder at 2 bits per sample), while the bitstreams of the two enhancement layers of the low sub-band of the input pathways are decoded with the stream of the core layer to obtain the decoded signals of the low sub-band, which are added together.

(56) A possible technique for detecting a predetermined frequency band in an audio stream coded according to the G.722 coding is now described with reference to FIG. 9. A step E901 determines initially, per frequency sub-band of a predetermined set of frequency sub-bands, a signal estimated on the basis of the bitstream. Accordingly, steps of obtaining an adaptation parameter associated with the quantization index for a current sample n and of calculating a signal estimated for the current sample on the basis of this determined adaptation parameter, of the signal estimated for the previous sample and of a predefined forgetting factor, are implemented. An exemplary embodiment of such a technique for determining an estimated signal is described in French patent application FR 11 52596.

(57) This estimated signal is representative of the audio content which has been coded. The predetermined set of sub-bands, that is to say the sub-bands considered when estimating these representative signals as well as their number M, may be predefined or may evolve in the course of time.

(58) Subsequently, this signal estimated for a sub-band m (0≤m<M) will be denoted:

(59) {tilde over (s)}.sub.m(n), n=0, . . . , N.sub.m−1, N.sub.m being the number of samples in a sub-band m.

(60) A step E902 of determining non-coded parameters representative of the audio content is thereafter implemented. These parameters p(m) are determined per frequency sub-band of the predetermined set of sub-bands, on the basis of the signal estimated in the corresponding sub-bands.

(61) Several types of parameters can be calculated. A few examples thereof are given hereinafter.

(62) For a sub-band m, a parameter can be determined for example on the basis of a norm of the estimated signal (or a power of this norm). Such parameters are given hereinbelow for a given band m (0≤m<M):

(63) $L_{\infty}(m) = \max_{n = 0, \ldots, N_{m} - 1} \left| \tilde{s}_{m}(n) \right|;$ $L_{1}(m) = \sum_{n = 0}^{N_{m} - 1} \left| \tilde{s}_{m}(n) \right|;$ $L_{2}(m) = \sum_{n = 0}^{N_{m} - 1} \tilde{s}_{m}(n)^{2}$
Normalized versions can also be used, such as:

(64) $L'_{1}(m) = \frac{1}{N_{m}} \sum_{n = 0}^{N_{m} - 1} \left| \tilde{s}_{m}(n) \right|;$ $L'_{2}(m) = \frac{1}{N_{m}} \sum_{n = 0}^{N_{m} - 1} \tilde{s}_{m}(n)^{2}$
It is also possible to use other types of parameters such as a ratio: for example, the ratio between the minimum and the maximum of the estimated signal, in absolute values or otherwise:

(65) $\rho_{\min\max}(m) = \frac{\min_{n = 0, \ldots, N_{m} - 1} \left| \tilde{s}_{m}(n) \right|}{\max_{n = 0, \ldots, N_{m} - 1} \left| \tilde{s}_{m}(n) \right|};$ $\rho'_{\min\max}(m) = \frac{\min_{n = 0, \ldots, N_{m} - 1} \tilde{s}_{m}(n)}{\max_{n = 0, \ldots, N_{m} - 1} \tilde{s}_{m}(n)}.$
Obviously, the inverse of this ratio can also be considered.
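The parameters listed above (maximum absolute value, sum of absolute values, sum of squares, their normalized versions and the min/max ratio in absolute values) can be computed for one sub-band as in the following Python sketch. The dictionary keys are illustrative names, and returning 0.0 for an all-zero signal is an assumption made to avoid a division by zero.

```python
from typing import Dict, Sequence


def band_parameters(s: Sequence[float]) -> Dict[str, float]:
    """Compute example parameters p(m) from a non-empty estimated signal of one sub-band."""
    n = len(s)
    mags = [abs(x) for x in s]
    peak = max(mags)
    l1 = sum(mags)
    l2 = sum(x * x for x in s)
    return {
        "L_inf": peak,                 # maximum absolute value
        "L1": l1,                      # sum of absolute values
        "L2": l2,                      # sum of squares
        "L1_norm": l1 / n,             # L1 divided by the number of samples
        "L2_norm": l2 / n,             # L2 divided by the number of samples
        "minmax_ratio": min(mags) / peak if peak > 0 else 0.0,
    }


params = band_parameters([1.0, -2.0, 4.0, -1.0])
```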

(66) In an exemplary embodiment, one and the same parameter is calculated for various sub-bands. However, a parameter can be calculated only on a more restricted number (optionally restricted to a single sub-band) of sub-bands.

(67) On the basis of at least one of these parameters, step E903 is implemented to calculate at least one local criterion.

(68) This local criterion can be calculated on the basis of parameters of a single sub-band or of parameters calculated on more than one sub-band. To distinguish these two categories of criterion we name them according to the number of sub-bands taken into account during the calculation, mono-band criterion and multi-band criterion.

(69) For each category, a few examples of criteria are detailed hereinafter.

(70) A mono-band criterion uses a distance between a parameter p(m) of a sub-band m and a threshold thresh.sub.mo(m). This threshold may or may not be adaptive and may optionally depend on the sub-band considered. We then denote by d(m) the mono-band criterion such that:
d(m)=dist(p(m),thresh.sub.mo(m))

(71) Advantageously, this distance is the simple difference between the parameter p(m) and this threshold:
d(m)=dist(p(m),thresh.sub.mo(m))=p(m)−thresh.sub.mo(m)
For example, these mono-band criteria can be defined by the equations hereinbelow, on the sub-bands m and m′ (0≤m, m′<M):
crit0.sub.mo(m)=dist(L.sub.∞(m),thresh0.sub.mo(m)), crit1.sub.mo(m′)=dist(L.sub.1(m′),thresh1.sub.mo(m′)),
where thresh0.sub.mo(m) and thresh1.sub.mo(m′) are thresholds, adaptive or non-adaptive, that can depend on the sub-band considered.

(72) It would be possible, for example, to adapt the threshold on the band m as a function of the band m′, or as a function of a previous block of samples.

(73) A multi-band criterion compares parameters calculated on at least two sub-bands, for example a parameter p(m) of a sub-band m and a parameter p(m′) of a sub-band m′.

(74) Here again, as in the case of a mono-band criterion, a threshold thresh.sub.M(m,m′), adaptive or non-adaptive and optionally dependent on the sub-bands considered, can be used.

(75) For example, these multi-band criteria can be defined by the equations hereinbelow, on the sub-bands m and m′ (0≤m, m′<M):
crit0.sub.M(m,m′)=dist.sub.th(dist.sub.p(ρ.sub.min max(m),ρ.sub.min max(m′)),thresh0.sub.M(m,m′)),
crit1.sub.M(m,m′)=dist.sub.th(dist.sub.p(L.sub.1(m),L.sub.1(m′)),thresh1.sub.M(m,m′))

(76) Advantageously, a distance dist.sub.th is a simple difference between a threshold and a distance dist.sub.p between parameters of at least two sub-bands.

(77) The distance dist.sub.p between parameters of at least two sub-bands can use ratios between parameters. For example, in the case of a distance between parameters of two sub-bands:

(78) $\mathrm{dist}_{p}(L'_{1}(m), L'_{1}(m')) = L'_{1}(m) / L'_{1}(m')$ or $\mathrm{dist}'_{p}(L'_{1}(m), L_{\infty}(m')) = L'_{1}(m) / L_{\infty}(m')$
It is also noted that one and the same set of parameters can be used to calculate several criteria both in the case of a mono-band criterion and of a multi-band criterion.
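The two categories of local criterion can be illustrated with the Python sketch below. It uses the simple-difference form for the mono-band case and a ratio between parameters for the multi-band case. The sign convention for the threshold difference (parameter distance minus threshold) and the small `eps` guard against division by zero are assumptions, since the text leaves them open.

```python
def mono_band_criterion(p_m: float, thresh_m: float) -> float:
    """Mono-band criterion: difference between a parameter p(m) and its threshold."""
    return p_m - thresh_m


def multi_band_criterion(p_m: float, p_m2: float, thresh: float,
                         eps: float = 1e-12) -> float:
    """Multi-band criterion: ratio of two sub-band parameters, compared to a threshold.

    p_m and p_m2 are parameters of two different sub-bands; eps is a hypothetical
    guard so that an all-zero second sub-band does not cause a division by zero.
    """
    ratio = p_m / max(p_m2, eps)   # distance between parameters, here a ratio
    return ratio - thresh          # distance to the threshold, here a difference


mono = mono_band_criterion(3.0, 1.0)
multi = multi_band_criterion(1.0, 4.0, 0.125)
```

The same set of parameters can feed both functions, in line with the remark above that one set of parameters may serve several criteria.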

(79) On the basis of at least one local criterion such as defined, step E904 is implemented. In this step, a local decision (instantaneous, denoted dec.sub.inst.sup.cur) is taken by detecting whether the coded audio content comprises frequencies in at least one sub-band.

(80) In a particular embodiment, in the case of detection of a frequency band termed the high-frequency band (i.e. frequencies above a threshold frequency F.sub.th), it is decided whether the audio content comprises frequencies in sub-bands m such that m.sub.th≤m, where m.sub.th is the index of the sub-band including the frequency F.sub.th. At least one of these sub-bands m is taken into consideration in the decision step.

(81) In the particular example of the G.722 fixed HD voice coder with two sub-bands, when it is sought to detect whether the coded content is actually wide-band (WB), it is detected whether there is any relevant content in the second sub-band (high sub-band) so as to take a narrow-band (NB) or wide-band (WB) decision.

(82) In the case where the predetermined frequency band is not the high-frequency band, the decision is of course adapted and the sub-bands considered may be those which are below a threshold frequency to detect a low-frequency band or else those which are defined by frequencies bracketing this predetermined frequency band.

(83) To take this decision, at least one local criterion is useful. As a variant, several criteria may be used alone or jointly.

(84) The decision may be soft or hard. A hard decision consists in comparing at least one criterion with a threshold and in taking a binary decision or one with predefined states about the presence of the frequency band in the sub-band.

(85) A soft decision consists in using the value of the criterion to define, according to an interval of predefined values, a higher or lower probability of presence of the frequency band in the sub-band considered.
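The difference between the two kinds of decision can be sketched as follows. This is a minimal Python illustration; the function names, the default threshold and the linear mapping of the criterion onto a probability between two bounds are assumptions chosen for the example.

```python
def hard_decision(criterion: float, threshold: float = 0.0) -> bool:
    """Binary decision: the band is declared present if the criterion
    exceeds the threshold (assumed orientation of the comparison)."""
    return criterion > threshold


def soft_decision(criterion: float, low: float, high: float) -> float:
    """Map the criterion onto a presence probability in [0, 1].

    Values at or below `low` give 0, values at or above `high` give 1,
    and values in between are interpolated linearly (an assumption).
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    p = (criterion - low) / (high - low)
    return min(1.0, max(0.0, p))
```

A soft value of this kind can be passed on to a later smoothing stage, whereas a hard decision discards the margin by which the threshold was crossed.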

(86) In a particular embodiment, a step of detecting the type of content, for example a voice content, is firstly carried out so as to perform the local detection only on the relevant frames, that is to say those comprising this type of content.

(87) To detect this type of content, in an advantageous manner, the parameters determined at E902 on the signals representative of the sub-band signals are used.

(88) In a variant embodiment, to increase the reliability of detection, the final decision, denoted dec.sup.cur, for a current block of samples depends not only on the instantaneous local detection but also on the past detections. On the basis of soft or hard local decisions per block, a global decision is taken on a number of K blocks preceding the current block. This number of K blocks is adjustable as a function of the desired trade-off between the reliability and the speed of the decision.

(89) For example, the local detections can be smoothed over several blocks by an optionally sliding window. The dependency of the current decision on the past detections can also be a function of the reliability of the local decision. For example, if the local decision is estimated to be safe, the dependency of the current decision in relation to the past decisions may be minimized or indeed even canceled.
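One possible way of smoothing hard local decisions over a sliding window is sketched below in Python. The class name, the majority vote and the `reliable` flag that bypasses the history are assumptions made for illustration; the description leaves the smoothing rule open.

```python
from collections import deque


class SmoothedDetector:
    """Combine the current local decision with the K preceding ones."""

    def __init__(self, k: int) -> None:
        # Window holds the K past blocks plus the current block.
        self.history = deque(maxlen=k + 1)

    def update(self, dec_inst: bool, reliable: bool = False) -> bool:
        """Return the final decision dec_cur for the current block."""
        self.history.append(dec_inst)
        if reliable:
            # A local decision judged safe cancels the dependency on the past.
            return dec_inst
        votes = sum(self.history)
        return 2 * votes > len(self.history)  # strict majority vote


detector = SmoothedDetector(k=2)
decisions = [detector.update(d) for d in [True, True, False, True]]
```

A larger K makes the decision more stable but slower to react, which is the trade-off mentioned above.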

(90) Several embodiments are possible for the detection method such as described, both in the choice of the parameters, of the criteria, of the manner of optionally combining several criteria and in the use of soft or hard decisions, locally or globally. It is possible thereby to optimize the trade-off between the complexity and the reliability of detection, as well as the reactivity of the detection.

(91) Another exemplary embodiment of the combining method according to the invention is now described. This embodiment describes the implementation of the invention in a partial mixing device comprising a bridge combining streams coded by the ITU-T G.711.1 coder at 96 kbit/s. This type of coder, illustrated in FIG. 6, is a sub-band coder: the low sub-band is coded hierarchically at 80 kbit/s (10 bits per sample), with a core coding at 64 kbit/s (8 bits per sample) and an enhancement layer at 16 kbit/s (i.e. 2 bits per sample on average), and the high sub-band is coded at 16 kbit/s (2 bits per sample on average). It is also described in the above-mentioned document: Rec. ITU-T G.711.1, Wideband embedded extension for G.711 pulse code modulation, 2008.

(92) The G.711.1 coder operates on audio signals sampled at 16 kHz on blocks or frames of 5 ms (i.e. 80 samples at 16 kHz). The input signal x(n), optionally after a preprocessing by the module 601, is divided into 2 sub-bands [0, 4 kHz] and [4, 8 kHz] by QMF filters represented at 602. On the basis of two input samples the QMF filter gives as output a low sub-band (0-4000 Hz) sample x.sub.L(n) and a high sub-band (4000-8000 Hz) sample x.sub.H(n). The bitrate of 64 kbit/s (Layer 0 compatible with G.711) corresponds to the quantization of the [0, 4 kHz] sub-band by the PCM (Pulse Code Modulation) technique equivalent to G.711, with shaping of the quantization noise. The following two layers (Layers 1 and 2) code respectively the low sub-band [0, 4 kHz] by a PCM coding enhancement technique, and the high sub-band [4, 8 kHz] by an MDCT (Modified Discrete Cosine Transform) transform coding, each with a bitrate of 16 kbit/s (80 bits per frame). When the decoder receives these enhancement layers, it can enhance the quality of the decoded signal.
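The bitrates quoted above follow directly from the frame structure, as the short Python check below shows. The variable and function names are chosen for this illustration; the numerical values are those given in the text.

```python
FRAME_MS = 5             # frame duration in milliseconds
SAMPLES_PER_FRAME = 80   # input samples per frame at 16 kHz
# The QMF analysis yields one low-band sample per two input samples.
LOW_BAND_SAMPLES = SAMPLES_PER_FRAME // 2


def kbps(bits_per_frame: int) -> float:
    """Bits per millisecond, which is numerically equal to kbit/s."""
    return bits_per_frame / FRAME_MS


layer_bits = {
    "layer0_core": LOW_BAND_SAMPLES * 8,  # 8-bit PCM on 40 low-band samples
    "layer1_low_enh": 80,                 # low sub-band enhancement layer
    "layer2_high": 80,                    # high sub-band MDCT layer
}
rates = {name: kbps(bits) for name, bits in layer_bits.items()}
total = sum(rates.values())
```

The three layers therefore add up to the 96 kbit/s mode considered in this embodiment.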

(93) The core coding of the low sub-band signal is performed by the module 603a, included in the low sub-band coding module 603, according to the PCM technique equivalent to G.711, with shaping of the quantization noise. We briefly recall hereinafter the PCM coding used in G.711.

(94) The G.711 coder is based on a logarithmic compression on 8 bits at the sampling frequency of 8 kHz, to give a bitrate of 64 kbit/s. The G.711 PCM coding compresses the signals filtered in the [300-3400 Hz] band by a logarithmic curve, which makes it possible to obtain a nearly constant signal-to-noise ratio for a wide dynamic range of signals. The quantization interval varies with the amplitude of the sample to be coded: when the level of the input signal is low, the quantization interval is small; when the level of the input signal is high, the quantization interval is large. Two logarithmic PCM compression laws are used: the μ-law (used in North America and in Japan) and the A-law (used in Europe and in the rest of the world). The G.711 A-law and the G.711 μ-law encode the input samples on 8 bits. In practice, to facilitate implementation of the G.711 coder, the logarithmic PCM compression has been approximated by a curve in segments. During this compression, the low-order bits of the mantissa are lost.
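The effect of a logarithmic compression law can be illustrated with the continuous μ-law characteristic, sketched below in Python for samples normalized to [-1, 1]. This is the textbook continuous curve with μ = 255, not the segmented approximation actually used by G.711; the function names are chosen for this example.

```python
import math

MU = 255.0  # usual value of the mu-law parameter


def mu_law_compress(x: float) -> float:
    """Continuous mu-law compression of a sample in [-1, 1]."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("sample must lie in [-1, 1]")
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

A low-level sample such as 0.01 is mapped to roughly 0.23 of full scale, so a uniform quantizer applied after compression spends proportionally more levels on low amplitudes, which corresponds to the small quantization interval at low signal levels described above.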

(95) In the A-law, the 8 bits are distributed in the following manner:

(96) 1 sign bit,

(97) 3 bits to indicate the segment,

(98) 4 bits to indicate the placement in the segment.
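The 1 + 3 + 4 bit distribution listed above can be illustrated by splitting an 8-bit code word into its three fields. In this Python sketch the ordering of the fields (sign in the most significant bit, then segment, then placement) is an assumption, and any bit inversion applied to the transmitted octet is deliberately ignored; the type and function names are hypothetical.

```python
from typing import NamedTuple


class ALawFields(NamedTuple):
    sign: int      # 1 bit
    segment: int   # 3 bits, index of the segment of the piecewise curve
    position: int  # 4 bits, placement inside the segment


def unpack_alaw_fields(code: int) -> ALawFields:
    """Split an 8-bit code word into sign, segment and position fields."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code must fit in 8 bits")
    return ALawFields(sign=(code >> 7) & 0x1,
                      segment=(code >> 4) & 0x7,
                      position=code & 0xF)


fields = unpack_alaw_fields(0b10110101)
```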

(99) The coding (performed by the module 603b of FIG. 6) of the enhancement layer (Layer 1) of the low sub-band makes it possible to reduce the quantization error for the core layer (Layer 0) based on G.711 by adding extra bits to the samples coded in G.711 (Enh.LB). This technique, which makes it possible to obtain an increase in the SNR (Signal-to-Noise Ratio) of 6 dB for each bit added per sample, consists in saving and transmitting in an enhancement bitstream the high-order bits from among the bits lost during the initial PCM coding.

(100) The recovery and the transmission of bits not transmitted in the mantissa of the PCM core coding enhances the quality of the coding of the low sub-band. Indeed, in case of reception of this enhancement layer, the decoder can decode the mantissa with greater precision. In G.711.1, the number of additional bits for the mantissa depends on the amplitude of the samples: indeed, rather than allocating the same number of bits to enhance the precision of the mantissa coding of the samples, the 80 bits available in layer 1 of G.711.1 to enhance the precision of the mantissa coding of the 40 samples are allocated dynamically: more bits being allotted to the samples with a high exponent. Thus, while the bits budget of the enhancement layer is 2 bits per sample on average (16 kbit/s), with this adaptive allocation, the number of bits allocated to a sample varies according to its exponent value from 0 to 3 bits.

(101) For the high sub-band, a Modified Discrete Cosine Transform (MDCT) is firstly performed by the module 604, on blocks of the signal of the high band of 10 ms with an overlap of 5 ms. Next, the 40 MDCT coefficients, S.sub.HB(k), are coded by the module 605 by a vector quantization with interleaved conjugate structure; these coefficients are weighted and then normalized (by the square root of their energy). They are then distributed into 6 sub-vectors of dimension 6; the 4 coefficients representing the highest frequencies are not coded. These six sub-vectors are quantized independently on 12 bits by a set of two dictionaries with conjugate structure, C.sub.H0w and C.sub.H1w. Finally, a global gain per frame is calculated on the basis of the decoded sub-vectors and of the normalization factor, this gain being quantized on 8 bits by a scalar quantizer of μ-law PCM type.

(102) At the decoder, the set of 36 MDCT coefficients is reconstructed on the basis of the six decoded sub-vectors with inverse interleaving, and the 4 coefficients representing the highest non-coded frequencies are simply set to zero and then the decoded signal of the high band is generated by inverse MDCT transform.

(103) The various coding layers (I.sub.B0(n), I.sub.B1(n), I.sub.B2(n)) are multiplexed at 606 to give the coded signal I(n).

(104) In the embodiment described with reference to FIG. 7, which represents the steps of the method according to the invention implemented in a partial mixing device with centralized architecture receiving streams coded by the ITU-T G.711.1 coder at 96 kbit/s, the bitstreams of the two enhancement layers of an input pathway are replicated, and the mixing is limited to the core layer.

(105) Thus, the bridge receives N input pathways (N hierarchical bitstreams coded by G.711.1 at 96 kbit/s). For each input pathway (0 ≤ j < N) we denote by: Be.sub.j.sup.0 the incoming bitstream of the low sub-band core layer; Be.sub.j.sup.1 the incoming bitstream of the enhancement layer of the low sub-band; Be.sub.j.sup.2 the high sub-band incoming bitstream; s.sub.j.sup.0 the core layer (low sub-band) reconstructed signal obtained by decoding the stream Be.sub.j.sup.0; s.sub.j.sup.1 the low sub-band reconstructed signal obtained by decoding the streams Be.sub.j.sup.0 and Be.sub.j.sup.1; s.sub.j.sup.2 the high sub-band reconstructed signal obtained by decoding the stream Be.sub.j.sup.2.
For each output pathway (0 ≤ i < N) we also denote by: Bs.sub.i.sup.0 the outgoing bitstream of the core layer of the low sub-band; Bs.sub.i.sup.1 the outgoing bitstream of the enhancement layer of the low sub-band; Bs.sub.i.sup.2 the outgoing bitstream of the enhancement layer of the high sub-band.
On the basis of these N sub-band-coded streams, an optional preselection step E701 is implemented.

(106) Just as for the embodiments described with reference to FIGS. 5a and 5b, this preselection step makes it possible to select, from among the various input pathways, those which comply with one or more of the selection criteria described previously for the prior art schemes. For example, the selection of the streams can be performed on the basis of the voice activity detection by the FCFS (First Come First Served) criterion or on the basis of the measurement of the power of the signal or of its energy by the LT (Loudest Talker) criterion.

(107) Thus, a part (N′ with N′ < N) of the coded streams received by the combining device or mixing bridge is taken into account to implement the combining method. This therefore reduces the complexity of implementation of the steps of the method since the number of pathways to be mixed is restricted.

(108) As previously, we use the notation N′ (with N′ ≤ N) whether or not the optional step E701 is implemented, and we denote by V the set of indices of these input pathways. Likewise, the preselection can be performed as a variant or supplement after the decoding step.

(109) Step E702 of decoding the N′ streams of the core layer of the low sub-band is thereafter implemented. Thus, the core layers of the low sub-bands Be.sub.j.sup.0 are decoded to obtain the reconstructed signals s.sub.j.sup.0.

(110) In step E703, a procedure for mixing the bitstreams thus decoded is performed by addition of the signals thus reconstructed of the low sub-band (core layer): S.sub.i.sup.0=Σ s.sub.j.sup.0 with j∈V, j≠i. Note that if i∈V, S.sub.i.sup.0 is the sum of N′−1 signals, otherwise S.sub.i.sup.0 is the sum of N′ signals.
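The mixing of step E703 can be sketched as a sample-by-sample sum that leaves out the listener's own pathway. In this Python illustration the function name and the representation of the decoded signals as a dictionary indexed by the pathway indices of V are assumptions, and all blocks are assumed to have the same length.

```python
from typing import Dict, List


def mix_core_layer(decoded: Dict[int, List[float]],
                   output_index: int) -> List[float]:
    """Sum the decoded core-layer signals s_j^0 for j in V, j != i.

    `decoded` maps each selected input pathway j (the set V) to its
    reconstructed block; `output_index` is the output pathway i.
    """
    contributors = [s for j, s in decoded.items() if j != output_index]
    if not contributors:
        return []
    length = len(contributors[0])
    return [sum(s[n] for s in contributors) for n in range(length)]


decoded = {0: [1.0, 2.0], 1: [10.0, 20.0], 2: [100.0, 200.0]}
mix_for_1 = mix_core_layer(decoded, output_index=1)  # pathway 1 is in V
mix_for_5 = mix_core_layer(decoded, output_index=5)  # pathway 5 is not in V
```

For an output pathway that belongs to V the sum has one term fewer than for a pathway outside V, as noted above.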

(111) The low sub-band core layer output bitstream (Bs.sub.i.sup.0) intended to be transmitted to a terminal Ti (0 ≤ i < N) is then obtained by coding this sum signal S.sub.i.sup.0 in step E704, by the core encoder of the low sub-band of G.711.1 (PCM on 8 bits with shaping of the coding noise).

(112) On the basis of the N′ input pathways, a step E705 of selecting at least one bitstream (Be.sub.k.sup.1) of the enhancement layer of the low sub-band (layer 1) of an input pathway k to be replicated is performed. The criterion (crit.1) used for this selection can be a criterion as mentioned in the prior art schemes, for example, the FCFS (First Come First Served) criterion or else the LT (Loudest Talker) criterion. The choice of the criterion can depend on that employed in the preselection step if the latter has been implemented. On the basis of this selection, at least one bitstream of the enhancement layer of the low sub-band (Bs.sub.i.sup.1) to be replicated in step E706 is obtained so as to transmit it to the terminal Ti: Bs.sub.i.sup.1=Be.sub.k.sup.1 (i≠k).

(113) On the basis again of the N′ coded streams, a step E708 of detecting a predetermined frequency band, in the high sub-band, is performed. In this embodiment, the predetermined frequency band is the high-frequency band. This makes it possible to determine the presence of an HD content in the coded stream. Thus, an analysis of the audio content of the input pathways is performed.

(114) Various modes of detection of the presence of the high-frequency band are possible. For example, the scheme for detecting an HD content in a stream j can use a comparison of the energy of the reconstructed signal of the high sub-band, s.sub.j.sup.2, with that of the reconstructed signal of the low sub-band, s.sub.j.sup.1. This embodiment requires a decoding of the audio stream to be analyzed in the high sub-band, in addition to the decoding of the core low sub-band 0 and of the enhancement low sub-band 1.
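An energy comparison of this kind can be sketched as follows in Python. The function names and the ratio threshold of 0.01 are hypothetical values chosen for illustration; the description does not specify how the two energies are compared.

```python
from typing import Sequence


def energy(signal: Sequence[float]) -> float:
    """Sum of squared samples of one decoded block."""
    return sum(x * x for x in signal)


def has_hd_content(low_band: Sequence[float],
                   high_band: Sequence[float],
                   ratio_threshold: float = 0.01) -> bool:
    """Declare HD content when the high sub-band energy is not negligible
    compared with the low sub-band energy (assumed rule)."""
    e_low = energy(low_band)
    e_high = energy(high_band)
    if e_low == 0.0:
        return e_high > 0.0
    return e_high / e_low > ratio_threshold
```

Here `low_band` and `high_band` stand for the decoded blocks of the low and high sub-bands of one input pathway j.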

(115) At least one bitstream (Be.sub.k.sup.2) of the high sub-band (layer 2) of an input pathway k to be replicated for the enhancement layer of the high sub-band to be transmitted to the terminal Ti is selected at E709 after analysis of the content of the input pathways to detect whether there is any HD content. If the pathway k contains the predetermined frequency band, we then have Bs.sub.i.sup.2=Be.sub.k.sup.2 (i≠k).

(116) In the case where several coded streams comprise HD content, an additional selection, not represented in FIG. 7, can be implemented. This additional selection may for example be based on a criterion of precedence of selection of the coded audio stream. Thus, the most recently replicated stream is chosen, thereby affording continuity and a gentle transition for the switching of the replicated stream. Alternatively, if the pathway k selected in step E705 to replicate the enhancement layer of the low sub-band actually contains the predetermined frequency band (HD content), it is the bitstream Be.sub.k.sup.2 which can be selected to be replicated: Bs.sub.i.sup.2=Be.sub.k.sup.2 (i≠k).

(117) The selection of the high sub-band of the coded stream k comprising HD content is thus performed in step E709 and constitutes the output audio stream Bs.sub.i.sup.2=Be.sub.k.sup.2. This high sub-band bitstream (Bs.sub.i.sup.2) is obtained by replication in step E710 so as to be transmitted to a terminal Ti with i≠k at the same time as the two streams of the low sub-band, the stream Bs.sub.i.sup.1 obtained by replication and the stream Bs.sub.i.sup.0 obtained by coding of the mixed signal.

(118) In the case where several replication streams have been selected in step E709 and/or in step E705, these streams are replicated and combined with the mixed stream of the core layer of the low sub-band.

(119) In another variant embodiment, a step of classifying the input pathways is performed at E707, before the step of detecting the frequency band. The classification may for example be done from the most recently replicated pathway to the least recently replicated pathway.

(120) The analysis done in step E708 is then effected on the streams of the input pathways ranked in the order from the pathway whose high sub-band bitstream has been most recently replicated to the pathway whose high sub-band bitstream has been least recently replicated. As soon as an HD stream has been detected, the analysis stops.
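The ranked analysis with early termination can be sketched as follows in Python. The function names, the `last_replicated` bookkeeping (a time stamp or frame counter per pathway) and the `detect_hd` callback standing for the detection of step E708 are assumptions of this illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional


def rank_by_recency(pathways: Iterable[int],
                    last_replicated: Dict[int, int]) -> List[int]:
    """Order pathways from most to least recently replicated.

    Pathways never replicated so far are placed last.
    """
    return sorted(pathways,
                  key=lambda k: last_replicated.get(k, -1),
                  reverse=True)


def select_hd_pathway(ranked: Iterable[int],
                      detect_hd: Callable[[int], bool]) -> Optional[int]:
    """Analyze pathways in ranked order and stop at the first HD stream."""
    for k in ranked:
        if detect_hd(k):
            return k
    return None


last_replicated = {0: 5, 1: 9, 2: 7}
ranked = rank_by_recency([0, 1, 2], last_replicated)
analyzed: List[int] = []


def detect_hd(k: int) -> bool:
    analyzed.append(k)   # record which pathways were actually analyzed
    return k in {0, 2}   # toy detection result


selected = select_hd_pathway(ranked, detect_hd)
```

In this toy run pathway 0 is never analyzed, because the analysis stops as soon as pathway 2 is found to carry HD content.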

(121) This step E707 can of course use another criterion for ranking the input pathways, as in the case of the embodiment with the G.722 coder described with reference to FIGS. 5a and 5b.

(122) Step E707 is optional and may or may not be implemented as a supplement to the preselection step E701.

(123) In the case where the preselection step E701 is performed and in the case where none of the preselected streams contains HD content detected in step E708, then the detection is done on the input streams not yet analyzed to find the existence of at least one stream which comprises the predetermined frequency band. If one exists, it is then the latter which is selected in step E709.
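The fallback described in this paragraph can be sketched as a two-pass search, first over the preselected streams and then over the remaining input streams. The function name and the `detect_hd` callback in this Python illustration are hypothetical.

```python
from typing import Callable, Optional, Sequence


def select_with_fallback(preselected: Sequence[int],
                         all_pathways: Sequence[int],
                         detect_hd: Callable[[int], bool]) -> Optional[int]:
    """Look for HD content among the preselected pathways first, then
    among the input pathways that have not yet been analyzed."""
    for k in preselected:
        if detect_hd(k):
            return k
    already_analyzed = set(preselected)
    for k in all_pathways:
        if k not in already_analyzed and detect_hd(k):
            return k
    return None  # no input stream contains the predetermined band
```

Each pathway is analyzed at most once, so the extra cost of the second pass is only incurred when the preselected streams contain no HD content.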

(124) Advantageously, a pooling of the steps can be implemented for the detection of HD content in the input pathways. Likewise, according to the detection scheme used, parameters which have been determined can be reused to decode the frequency sub-band of the selected coded audio stream. These parameters then no longer have to be decoded, thus making it possible to reduce the complexity of decoding this stream.

(125) In a variant embodiment, the selection of at least one bitstream k to be replicated for layer 1 is not done according to the criteria as described previously. The bitstream of the low sub-band 1 to be replicated may, in this case, be that corresponding to the stream k selected in step E709 for the high sub-band.

(126) In this case, the bitstreams Be.sub.k.sup.1 and Be.sub.k.sup.2 are replicated.

(127) In a particular embodiment, the terminal whose stream is replicated (here for example k) receives neither high sub-band streams nor streams of enhancement layers, since these selected streams originate from this terminal. For this terminal, in a variant embodiment, a step of selecting a second HD stream k1 to be replicated can be performed for the enhancement layers of this output: Bs.sub.k.sup.1=Be.sub.k1.sup.1 and Bs.sub.k.sup.2=Be.sub.k1.sup.2, k1≠k.
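The choice of the replicated source for each output pathway can be summarized by the following Python sketch. The function name and the use of `None` to represent an output that receives no enhancement or high sub-band stream are assumptions of this illustration.

```python
from typing import Optional


def replication_source(output_index: int,
                       primary: int,
                       secondary: Optional[int] = None) -> Optional[int]:
    """Return the input pathway whose layers are replicated for output i.

    Every output other than `primary` (pathway k) receives the layers of k.
    The terminal k itself receives the layers of a second HD pathway k1 if
    one has been selected, and nothing otherwise.
    """
    if output_index != primary:
        return primary
    return secondary
```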

(128) Although the invention is described in the case of the partial mixing of streams coded by wide-band coders with a conventional mixing of at least the core layer of the narrow band, it will be understood that the invention applies also to the partial mixing of streams coded by coders operating on other bandwidths (medium band, super-wide-band, HiFi band, etc.) with a conventional mixing of at least one low sub-band and the replication of the streams coding the sub-bands above the mixed sub-bands. For example in the case of a coder of super-HD type (with four sub-bands coded by ADPCM technology), the application of the invention may for example consist in performing a direct recombination of the signals of the two low sub-bands (corresponding to the wide-band [0-8 kHz]) and switching the selected streams of two high sub-bands (corresponding to the audio band [8-16 kHz]), the selection of these streams being made according to the method of the invention. Another exemplary application of the invention to this super-HD coder consists in mixing the signals of the lowest sub-band (corresponding to the narrow band [0-4 kHz]) and switching the streams, selected according to the invention, of three high sub-bands (corresponding to the audio band [4-16 kHz]).

(129) Likewise the decomposition into frequency sub-bands might not be performed by a filter bank. Thus in the case of the IETF coder described in RFC6716, the signal to be coded by the linear prediction coder is obtained by a resampling of the signal to be coded (for example to obtain a signal sampled at 16 kHz on the basis of a signal sampled at 48 kHz).

(130) In this case the invention decodes the part of the bitstreams coding the wide-band, mixes the wide-band decoded signals and selects an input pathway for which the super-HD (frequency above the wide-band) coded part of the stream is replicated.

(131) FIGS. 8a and 8b represent combining devices 800a and 800b in exemplary embodiments of the invention. These devices implement the combining method as described with reference to FIG. 3 by the main steps E301 to E304.

(132) The device 800a of FIG. 8a may be more particularly associated with a centralized bridge such as a conference bridge in a communication system comprising a plurality of terminals.

(133) For its part, the device 800b of FIG. 8b may be more particularly associated with a terminal or communication gateway.

(134) Hardware-wise, these devices 800a and 800b comprise a processor 830 cooperating with a memory block BM comprising a storage and/or work memory MEM.

(135) The processor drives processing modules able to implement the method according to the invention. Thus, these devices comprise a module 801 for decoding a part of the streams coded on at least one frequency sub-band, a module 802 for adding the streams thus decoded to form a mixed stream, a module 803 for detecting presence of a predetermined frequency band in a stream, a module 804 for selecting on the basis of the detection module, from among the plurality of coded audio streams, at least one replication coded stream, on at least one frequency sub-band different from that of the decoding step hereinabove.

(136) The memory block can advantageously comprise a computer program (prog.) comprising code instructions for the implementation of the steps of the combining method within the meaning of the invention, when these instructions are executed by the processor PROC and especially the steps of decoding a part of the streams coded on at least one frequency sub-band, of adding the streams thus decoded to form a mixed stream, of selecting, from among the plurality of coded audio streams, at least one replication coded stream, on at least one frequency sub-band different from that of the decoding step, the selection of the at least one replication coded stream being performed according to a criterion taking into account the presence of a predetermined frequency band in the coded stream.

(137) Typically, the description of FIG. 3 reuses the steps of an algorithm of such a computer program. The computer program can also be stored on a memory medium readable by a reader of the device or downloadable into the memory space of the latter.

(138) The memory MEM records, in a general manner, all the data necessary for the implementation of the combining method.

(139) The device 800a of FIG. 8a furthermore comprises a coding module 807 able to implement the coding step E305a of FIG. 3. This coding module codes the mixed stream obtained by the mixing module 802 before it is combined by the combining module 808a with the replication stream selected by the module 804. The module 808a is able to implement the combining step E306a of FIG. 3.

(140) The streams Bs.sub.i resulting from the combining are transmitted to the various terminals of the communication system via an output module 806a. This device 800a also comprises an input module 805a able to receive a plurality of coded audio streams N*Be.sub.i originating for example from the various terminals of the communication system, these coded audio streams having been coded by a frequency sub-band coder.

(141) The device 800b of FIG. 8b comprises a decoding module 809 able to implement the decoding step E305b of FIG. 3. This decoding module decodes the replication stream selected by the module 804 before it is combined by the combining module 808b with the mixed stream obtained by the mixing module 802. The module 808b is able to implement the combining step E306b of FIG. 3.

(142) The stream S.sub.Mi resulting from the combining is transmitted to the restitution system of the device or terminal via an output module 806b. This device 800b also comprises an input module 805b able to receive a plurality of coded audio streams N*Be.sub.i originating for example from various communication channels, these coded audio streams having been coded by a frequency sub-band coder.
