Optimized mixing of audio streams encoded by sub-band encoding
10242683 · 2019-03-26
Assignee
Inventors
Cpc classification
H04M3/568
ELECTRICITY
G10L19/008
PHYSICS
G10L19/167
PHYSICS
G10L19/173
PHYSICS
International classification
G10L25/93
PHYSICS
H04M3/56
ELECTRICITY
G10L19/24
PHYSICS
Abstract
The invention relates to a method for mixing a plurality of audio streams coded according to a frequency sub-band coding, comprising the steps for decoding (E201) a part of the coded streams over at least a first frequency sub-band, for summing (E202) the streams thus decoded so as to form at least a first mixed stream. The method is such that it comprises the steps for detection (E203), over at least a second frequency sub-band different from the at least first sub-band, of the presence of a predetermined frequency band in the plurality of coded audio streams and for summing (E205) the decoded audio streams (E204) for which the presence of the predetermined frequency band has been detected, over said at least a second sub-band, so as to form at least a second mixed stream. The invention also relates to a mixing device implementing the method described and may be integrated into a conference bridge, a communications terminal or a communications gateway.
Claims
1. A method for mixing a plurality of coded audio streams according to a coding by frequency sub-bands, comprising: receiving a plurality of audio streams coded according to a frequency sub-band coding; decoding of a part of the received audio streams over at least a first frequency sub-band; summing of the streams thus decoded so as to form at least a first mixed stream; detecting, over at least a second frequency sub-band different from the at least first sub-band, the presence of a predetermined frequency band within the plurality of coded audio streams; decoding audio streams for which the presence of the predetermined frequency band has been detected; summing of the decoded audio streams for which the presence of the predetermined frequency band has been detected, over said at least second sub-band, so as to form at least a second mixed stream; and transmitting the second mixed stream.
2. The method as claimed in claim 1, further comprising pre-selecting the coded audio streams according to a predetermined criterion, prior to detecting the presence of a predetermined frequency band within the plurality of coded audio streams.
3. The method as claimed in claim 1, further comprising re-coding the first mixed stream and the second mixed streams.
4. The method as claimed in claim 1, wherein decoding over at least a first frequency band is carried out over low frequency sub-bands and the predetermined frequency band detected over at least a second frequency sub-band is a frequency band higher than said low frequency sub-bands.
5. The method as claimed in claim 1, wherein detecting the presence of a predetermined frequency band within a coded stream is done by a comparison of energy, within the various frequency sub-bands, of the decoded audio streams.
6. The method as claimed in claim 1, wherein detecting the presence of a predetermined frequency band within a coded stream is done according to the following steps: determination, by frequency sub-band from a predetermined set of sub-bands, of an estimated signal based on the coded stream; determination, by frequency sub-band from the predetermined set of sub-bands, of uncoded parameters representative of the audio content, based on the corresponding estimated signal; calculation of at least one local criterion using the determined parameters; decision with respect to the presence of a predetermined frequency band within at least one sub-band of the audio content as a function of the at least one local criterion calculated.
7. The method as claimed in claim 6, wherein at least a part of the determined parameters, representative of the audio content, is saved in memory for a later use during the decoding of the audio streams to be mixed.
8. The method as claimed in claim 1, further comprising: several steps for detecting predetermined frequency bands within coded audio streams, the detection of a first predetermined frequency band within a first sub-band allowing a first set of coded audio streams to be obtained, the detection of a second predetermined frequency band within a second sub-band allowing a second set of coded audio streams to be obtained included in the first set; and steps for summation of decoded audio streams for each of the sets of coded audio streams obtained.
9. A device for mixing a plurality of coded audio streams according to a frequency sub-band coding, comprising a non-transitory computer-readable medium comprising instructions stored thereon, and a processor configured by the instructions to control: an input module configured to receive a plurality of audio streams coded according to a frequency sub-band coding; a decoding module for decoding a part of the received audio streams over at least a first frequency sub-band; a summing module for summing the streams thus decoded so as to form at least a first mixed stream; a detecting module for detecting, over at least a second frequency sub-band different from the at least first sub-band, the presence of a predetermined frequency band within the plurality of coded audio streams; the decoding module for decoding audio streams for which the presence of the predetermined frequency band has been detected; the summing module for summing the decoded audio streams for which the presence of the predetermined frequency band has been detected, over said at least a second frequency sub-band, so as to form at least a second mixed stream; and an output module configured to transmit the second mixed stream.
10. A conference bridge comprising a mixing device for mixing a plurality of coded audio streams according to a frequency sub-band coding, comprising a non-transitory computer-readable medium comprising instructions stored thereon, and a processor configured by the instructions to control: an input module configured to receive a plurality of audio streams coded according to a frequency sub-band coding; a decoding module for decoding a part of the received audio streams over at least a first frequency sub-band; a summing module for summing the streams thus decoded so as to form at least a first mixed stream; a detecting module for detecting, over at least a second frequency sub-band different from the at least first sub-band, of the presence of a predetermined frequency band within the plurality of coded audio streams; the decoding module for decoding audio streams for which the presence of the predetermined frequency band has been detected; the summing module for summing the decoded audio streams for which the presence of the predetermined frequency band has been detected, over said at least a second frequency sub-band, so as to form at least a second mixed stream; and an output module configured to transmit the second mixed stream.
11. A communications device comprising a mixing device for mixing a plurality of coded audio streams according to a frequency sub-band coding, comprising a non-transitory computer-readable medium comprising instructions stored thereon, and a processor configured by the instructions to control: an input module configured to receive a plurality of audio streams coded according to a frequency sub-band coding; a decoding module for decoding a part of the received audio streams over at least a first frequency sub-band; a summing module for summing the streams thus decoded so as to form at least a first mixed stream; a detecting module for detecting, over at least a second frequency sub-band different from the at least first sub-band, the presence of a predetermined frequency band within the plurality of coded audio streams; the decoding module for decoding audio streams for which the presence of the predetermined frequency band has been detected; the summing module for summing the decoded audio streams for which the presence of the predetermined frequency band has been detected, over said at least a second frequency sub-band, so as to form at least a second mixed stream; and an output module configured to transmit the second mixed stream.
12. The communications device as claimed in claim 11, wherein the communications device is a communications gateway.
13. The communications device as claimed in claim 11, wherein the communications device is a communications terminal.
14. A non-transitory computer-readable storage media that can be read by a processor, on which a computer program is stored comprising code instructions for execution of steps of a method of mixing a plurality of coded audio streams according to a coding by frequency sub-bands, the method comprising the following steps performed by the processor as configured by the instructions: receiving a plurality of audio streams coded according to a frequency sub-band coding; decoding of a part of the audio streams received over at least a first frequency sub-band; summing of the streams thus decoded so as to form at least a first mixed stream; detecting, over at least a second frequency sub-band different from the at least first sub-band, the presence of a predetermined frequency band within the plurality of coded audio streams; the decoding module for decoding audio streams for which the presence of the predetermined frequency band has been detected; summing of the decoded audio streams for which the presence of the predetermined frequency band has been detected, over said at least a second frequency sub-band, so as to form at least a second mixed stream; and transmitting the second mixed stream.
Description
(1) Other features and advantages of the invention will become more clearly apparent upon reading the following description, presented solely by way of non-limiting example, and with reference to the appended drawings, in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13) At the step E202, these streams are mixed over this at least one frequency band. The decoded streams are therefore added together so as to form a first mixed stream $S_i^l = \sum_j s_j^l$ (with $0 \le j < N$ and, in the case of the centralized bridge, $j \ne i$). In an optional step E206a, the mixed signal $S_i^l$ is coded in order to obtain a stream $Bs_i^l$.
(14) Starting from the received coded streams, a step E203 is implemented for detecting the presence of a predetermined frequency band in the coded streams. The detection of a frequency band may be carried out in various ways. Exemplary embodiments will be described hereinbelow. At the output of this detection step, a set $H_1$ of streams containing the predetermined frequency band is obtained. This set comprises a number $N_1$ of streams with $N_1 \le N$.
(15) For the sake of concision, in the following part, the case where the predetermined frequency band to be detected within a coded stream is a high frequency band is described. It will be obvious for those skilled in the art how to adapt this detection to other types of frequency band, for example to a low frequency band or else to a frequency band with a predefined range of values.
(16) These $N_1$ streams, for which the presence of the predetermined frequency band has been detected, are decoded at the step E204. Thus, starting from the $N_1$ binary streams $Be_j^h$ of a frequency sub-band, for example the higher frequency sub-band, the reconstructed signals $s_j^h$ of the high frequency sub-band are obtained at the output of this decoding, with $j \in H_1$.
(17) At the step E205, these streams are mixed over this frequency band. The decoded streams are therefore added together so as to form a second mixed stream $S_i^h = \sum_j s_j^h$ (with $j \in H_1$ and, in the case of the centralized bridge, $j \ne i$). In an optional step E207a, the mixed signal $S_i^h$ is coded in order to obtain a stream $Bs_i^h$. At the optional step E208a for combining the binary streams, the second coded stream $Bs_i^h$ is combined with the first coded mixed stream $Bs_i^l$ obtained at the step E206a: $(Bs_i^l, Bs_i^h)$. The stream thus combined constitutes the stream to be transmitted to the terminal $i$.
(18) As a variant, in an optional step E208b, the second mixed stream of the high sub-band $S_i^h$ obtained at the step E205 is combined with the first mixed stream of the low sub-band $S_i^l$ obtained at the step E202, in order to form the stream to be reproduced.
(19) This method is notably applied in a centralized bridge which receives N streams from various terminals and transmits the mixed stream to each of the terminals i after re-encoding.
(20) It is also applicable to a terminal that receives N streams from other terminals and mixes these N received streams according to the method for reproduction at the terminal.
(21) A first embodiment is now described for audio streams that have been coded according to the standardized ITU-T G.722 coding method.
(22)
(23) The G.722 coder codes the input signal $x(n)$, sampled at 16 kHz, into two sub-bands sampled at 8 kHz. The division into sub-bands is carried out by a quadrature mirror filter (QMF) via the module 301. Starting from two input samples, the QMF filter yields one sample $x_L(n)$ of the low band (0-4000 Hz) and one sample $x_H(n)$ of the high band (4000-8000 Hz) at the output. The signals of the two sub-bands are coded independently by ADPCM (Adaptive Differential Pulse-Code Modulation) coders 302 and 303.
(24) The indices of the two quantized prediction errors $I_H(n)$ and $I_L(n)$ are thus transmitted within the binary stream $I(n)$ after multiplexing in 304. The G.722 coder has three data rates: 64, 56 and 48 kbit/s. Each sample of the low sub-band is coded over 6 bits at the highest data rate (48 kbit/s for the low sub-band), over 5 bits at the intermediate data rate (40 kbit/s for the low sub-band), and over 4 bits at the lowest data rate (32 kbit/s for the low sub-band). At the highest data rate, the coded stream of the low sub-band is composed of the core layer at 4 bits per sample and of two improvement layers at 1 bit per sample each. The high sub-band is always coded over 2 bits per sample (16 kbit/s), independently of the data rate.
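The relationship between bits per sample and data rate stated above can be checked with a short calculation. The following Python sketch is illustrative only; the function name `g722_rates_kbps` is an assumption introduced here and is not part of the G.722 standard or of the described method.

```python
SUBBAND_RATE_HZ = 8000  # each of the two sub-bands is sampled at 8 kHz


def g722_rates_kbps(low_bits: int, high_bits: int = 2) -> tuple[int, int, int]:
    """Return (low sub-band, high sub-band, total) data rates in kbit/s."""
    low = low_bits * SUBBAND_RATE_HZ // 1000
    high = high_bits * SUBBAND_RATE_HZ // 1000
    return low, high, low + high


# 6, 5 and 4 bits per low sub-band sample give the three G.722 modes.
for bits in (6, 5, 4):
    print(bits, g722_rates_kbps(bits))
```

With the high sub-band fixed at 2 bits per sample, the totals are 64, 56 and 48 kbit/s, which matches the three data rates of the coder.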
(25) A first exemplary embodiment is now illustrated in
(26) Starting from N hierarchical binary streams (also hereinafter referred to as input channels), coded in this embodiment by the G.722 at 64 kbit/s, an optional step E401 for pre-selection of $N'$ streams is implemented.
(27) This pre-selection step allows the selection from amongst the various input channels of those which meet one or more of the selection criteria previously described for the methods of the prior art. For example, based on the detection of voice activity, the FCFS criterion (for First Come First Serve) is used to select the streams. Or else, based on the measurement of the power of the signal or of its energy, the LT criterion (for Loudest Talker) is used to select the streams with the highest intensity.
(28) Thus, a part ($N'$ with $N' < N$) of the coded streams received by the mixing device or mixing bridge is taken into account for implementing the method of mixing. This therefore reduces the complexity of implementation of the steps of the method, since the number of channels to be mixed is limited.
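As an illustration of the pre-selection by the LT (Loudest Talker) criterion, the following Python sketch keeps the channels with the highest measured energy. It is a minimal sketch under assumptions: the function name `preselect_loudest` and the dictionary of per-channel energies are hypothetical, and the way the energy is measured is left outside the example.

```python
def preselect_loudest(energies: dict[int, float], n_keep: int) -> list[int]:
    """Return the indices of the n_keep channels with the highest energy."""
    # Rank channel indices by decreasing energy (hypothetical LT measure).
    ranked = sorted(energies, key=lambda j: energies[j], reverse=True)
    # Keep the loudest ones and return them in index order.
    return sorted(ranked[:n_keep])


# Four input channels, two retained for mixing.
selected = preselect_loudest({0: 0.1, 1: 2.0, 2: 0.5, 3: 1.5}, n_keep=2)
print(selected)
```

Only the retained channels are then passed to the decoding, detection and mixing steps.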
(29) This pre-selection step is optional and the decoding step E402 can then be applied to all N of the input coded audio streams.
(30) In the following, for the sake of clarity, the notation $N'$ (with $N' \le N$) will be used whether this optional step is implemented or not and the set of the indices of these channels will be denoted $V$.
(31) The step E402 for decoding the $N'$ streams in the low sub-band is subsequently implemented. As a variant, which is particularly advantageous if the step E402 is not very complex, the pre-selection step E401 may be carried out after this decoding step E402 for all the low sub-band input streams.
(32) Or again, as a complement, a second pre-selection step may be carried out after this decoding step in order, where required, to further limit the number of channels to be taken into account in the processing of the high sub-band streams to be detected, mixed and re-encoded (steps E405 to E408) and/or in the processing of the low sub-band streams to be mixed and re-encoded (steps E403 to E404).
(33) For these $N'$ coded audio streams, for each channel $j$ ($j \in V$), the following notations are used: $Be_j^l$, the incoming low sub-band binary stream (composed of the core layer and of two improvement layers); $Be_j^h$, the incoming high sub-band binary stream.
(34) At the decoding step E402, the reconstructed signal $s_j^l$ of the low sub-band is obtained by decoding the stream $Be_j^l$.
(35) At the step E403, the signals thus decoded are mixed by summing $N'-1$ reconstructed signals of the low sub-band: $S_i^l = \sum_j s_j^l$ with $j \in V$, $j \ne i$, for a transmission of the stream to the terminal $i$. It should be noted that, if $i \in V$, $S_i^l$ is the sum of $N'-1$ signals; otherwise $S_i^l$ is the sum of $N'$ signals.
(36) The low sub-band output binary stream ($Bs_i^l$) intended to be transmitted to a terminal $T_i$ ($0 \le i < N$) is then obtained by coding this sum signal $S_i^l$ at the step E404 with the low sub-band encoder of the G.722 (ADPCM over 6 bits).
(37) Starting from the set of $N'$ input channels, a step E405 for detection of a predetermined frequency band is carried out. In this embodiment, the predetermined frequency band is the high frequency band. This allows the presence of HD content in a coded stream to be determined. Thus, an analysis of the audio content of the input channels is carried out.
(38) Various modes for detection of the presence of the high frequency band are possible. For example, the method for detection of HD content in a stream $j$ can use a comparison of the energy of the reconstructed signal of the high sub-band, $s_j^h$, with that of the reconstructed signal of the low sub-band, $s_j^l$. This embodiment requires a decoding of the audio stream to be analyzed in the high sub-band, in addition to the decoding of the low sub-band.
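The energy comparison described above can be sketched as follows in Python. This is only one possible reading: the function name `has_high_band`, the use of an energy ratio and the threshold value 0.01 are assumptions made for illustration, since the text does not specify how the two energies are compared.

```python
def has_high_band(s_low: list[float], s_high: list[float],
                  ratio_threshold: float = 0.01) -> bool:
    """Decide whether a stream carries HD content from decoded sub-band frames."""
    e_low = sum(x * x for x in s_low)    # energy of the low sub-band frame
    e_high = sum(x * x for x in s_high)  # energy of the high sub-band frame
    if e_low == 0.0:
        # No low-band energy: any high-band energy counts as content.
        return e_high > 0.0
    # Assumed rule: HD content if the high band is not negligible
    # relative to the low band.
    return e_high / e_low > ratio_threshold


print(has_high_band([1.0, -1.0], [0.5, 0.5]))
print(has_high_band([1.0, -1.0], [0.0, 0.0]))
```

Note that this variant needs both sub-bands to be decoded for every analyzed stream, which is the cost the low-complexity detection method aims to avoid.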
(39) As an alternative, in order to avoid decoding the signals of the high sub-band, a detection method with a lower algorithmic cost may be implemented. This method is described hereinbelow with reference to
(40) At the output of the step E405, a set $H_1$ of streams for which the presence of the predetermined frequency band has been detected is obtained. In this embodiment, these streams are those that have the HD content. The number of streams in the set $H_1$ is $N_1$, with $N_1 \le N'$.
(41) At the step E406, the audio streams $Be_j^h$ (with $j \in H_1$) of the set $H_1$ are decoded in order to obtain the $N_1$ reconstructed signals of the high sub-band $s_j^h$.
(42) At the step E407, the decoded streams of the set $H_1$ are mixed for a transmission to the terminal $i$.
(43) If $i \in H_1$, then the mixing takes place by summing $N_1 - 1$ reconstructed signals of the high sub-band: $S_i^h = \sum_j s_j^h$ with $j \in H_1 \setminus \{i\}$.
(44) In the opposite case ($i \notin H_1$), the mixing is carried out by summing the $N_1$ reconstructed signals of the high sub-band: $S_i^h = \sum_j s_j^h$ with $j \in H_1$.
(45) At the step E408, the high sub-band output binary stream ($Bs_i^h$) intended to be transmitted to the terminal $T_i$ ($0 \le i < N$) is then obtained by coding the mixed signal $S_i^h$ with the high sub-band encoder of the G.722 (ADPCM over 2 bits).
(46) Depending on the number of signals to be considered after the steps E401 (optional) and E405, it is sometimes more advantageous to start by performing the summation of the N signals (S.sup.l=.sub.jv, s.sub.j.sup.l; or S.sup.h=.sub.jH.sub.
(47) In the following, it will be understood that the term summation or summing of $N'-1$ signals can refer to the subtraction of one signal from the sum of $N'$ signals.
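The high sub-band mixing, including the variant in which the sum of all signals is formed once and the signal of the destination terminal is then subtracted, can be sketched in Python as follows. The function name `mix_high_band` and the representation of decoded frames as lists of floats keyed by channel index are assumptions made for this illustration.

```python
def mix_high_band(decoded_high: dict[int, list[float]],
                  outputs: list[int],
                  frame_len: int) -> dict[int, list[float]]:
    """Mix the decoded high sub-band frames of the set H1 for each output i."""
    # Sum of all detected streams, computed once.
    total = [0.0] * frame_len
    for frame in decoded_high.values():
        total = [t + x for t, x in zip(total, frame)]

    mixes = {}
    for i in outputs:
        if i in decoded_high:
            # i belongs to H1: subtract its own signal from the total.
            mixes[i] = [t - x for t, x in zip(total, decoded_high[i])]
        else:
            # i is outside H1: every such output shares the same mix.
            mixes[i] = total
    return mixes


h1_frames = {0: [1.0, 2.0], 2: [10.0, 20.0]}  # channels 0 and 2 have HD content
print(mix_high_band(h1_frames, outputs=[0, 1, 2], frame_len=2))
```

Because all outputs outside the detected set receive the same mixed signal, that signal only needs to be re-coded once.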
(48) Thus, taking into account the presence or absence of high-frequency content in the streams to be combined allows the complexity of the decoding E406 and mixing E407 steps to be reduced. Indeed, at the step E406, only the streams having HD content are decoded, hence the number of ADPCM decodings is reduced from $N'$ to $N_1$. Similarly, at the step E407, there are not $2N'$ (or $N'(N'-1)$) calculations but $2N_1$ (or $N_1(N_1-1)$) calculations.
(49) Moreover, since the signals $S_i^h$ are the same for all the outputs $i$ with $i \notin H_1$, the number of re-codings at the step E408 can be reduced from $N'+1$ to $N_1+1$. However, the complexity of the detection of HD content in the input channels at the step E405 needs to be factored in.
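To make these counts concrete, the following Python sketch tabulates the high sub-band workload for a given number of streams entering the steps E406 to E408. It is a sketch under assumptions: the function name `high_band_workload` is hypothetical, the mixing count uses the sum-then-subtract variant, and the cost of the detection step itself is not included.

```python
def high_band_workload(n_streams: int) -> dict[str, int]:
    """Count high sub-band operations when n_streams streams are processed."""
    if n_streams < 0:
        raise ValueError("n_streams must be non-negative")
    return {
        "decodings": n_streams,           # one ADPCM decoding per stream
        "mix_operations": 2 * n_streams,  # one global sum, then one subtraction each
        "recodings": n_streams + 1,       # one per contributor, plus one shared mix
    }


# Example: 8 pre-selected streams, of which only 2 carry HD content.
print(high_band_workload(8))  # without detection
print(high_band_workload(2))  # with detection
```

In this example the number of high sub-band decodings falls from 8 to 2 and the number of re-codings from 9 to 3.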
(50) A method of low complexity for detection of a frequency band within an audio content may be implemented within the framework of this invention. This method is now described with reference to
(51) A step E501 first determines, by frequency sub-band from a predetermined set of frequency sub-bands, an estimated signal based on the binary stream. For this purpose, steps are implemented for obtaining an adaptation parameter associated with the quantization index for a current sample n and for calculating an estimated signal for the current sample using this adaptation parameter, the signal estimated for the preceding sample and a predefined forgetting factor. One exemplary embodiment of such a technique for determination of an estimated signal is described in the French patent application FR 11 52596.
(52) This estimated signal is representative of the audio content that has been coded. The predetermined set of sub-bands, in other words the sub-bands considered for estimating these representative signals together with their number M, may be predefined or may vary over time.
(53) In the following, this estimated signal for a sub-band $k$ ($0 \le k < M$) will be denoted $\tilde{s}_k(n)$, $n = 0, \dots, N_k - 1$, $N_k$ being the number of samples in a sub-band $k$.
(54) A step E502 for determination of uncoded parameters representative of the audio content is subsequently implemented. These parameters $p(k)$ are determined by frequency sub-band from the predetermined set of sub-bands, using the estimated signal in the corresponding sub-bands.
(55) Several types of parameters may be calculated. A few examples of these are presented hereinafter.
(56) For a sub-band $k$, a parameter may be determined for example from a norm of the estimated signal (or a power of this norm). Such parameters are given hereinbelow for a given band $k$ ($0 \le k < M$):
(57)
(58) Normalized versions may also be used, such as:
(59)
Other types of parameters may also be used, such as a ratio: for example, the ratio between the minimum and the maximum of the estimated signal, in absolute values or otherwise:
(60)
Of course, the inverse of this ratio may also be considered.
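The kinds of parameters listed above (norms, normalized norms, and a minimum-to-maximum ratio) can be sketched in Python as follows. The exact formulas are not reproduced in this text, so the definitions below are standard choices adopted as assumptions, and the function name `subband_parameters` and the dictionary keys are hypothetical.

```python
def subband_parameters(s_est: list[float]) -> dict[str, float]:
    """Compute example parameters p(k) from the estimated signal of one sub-band."""
    if not s_est:
        raise ValueError("the estimated signal must contain at least one sample")
    abs_vals = [abs(x) for x in s_est]
    n = len(s_est)
    l1 = sum(abs_vals)                 # L1 norm
    l2_sq = sum(x * x for x in s_est)  # squared L2 norm (a power of a norm)
    linf = max(abs_vals)               # L-infinity norm
    # Ratio between minimum and maximum, taken here in absolute values.
    ratio = min(abs_vals) / linf if linf > 0.0 else 0.0
    return {
        "L1": l1,
        "L2_squared": l2_sq,
        "Linf": linf,
        "L1_normalized": l1 / n,       # normalized by the number of samples
        "min_max_ratio": ratio,
    }


print(subband_parameters([1.0, -2.0, 4.0]))
```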
(61) In one exemplary embodiment, the same parameter is calculated for various sub-bands. However, a parameter might only be calculated over a more restricted number of sub-bands (which could be limited to a single sub-band).
(62) Using at least one of these parameters, the step E503 is implemented for calculating at least one local criterion.
(63) This local criterion may be calculated using parameters of a single sub-band or parameters calculated over more than one sub-band. In order to distinguish these two categories of criterion, they are named according to the number of sub-bands taken into account in the calculation: mono-band criterion and multi-band criterion.
(64) For each category, a few examples of criteria are detailed hereinafter.
(65) A mono-band criterion uses a distance between a parameter $p(k)$ of a sub-band $k$ and a threshold $\mathrm{thresh}_m(k)$. This threshold may be adaptive or otherwise and may potentially depend on the sub-band in question. The mono-band criterion is then denoted $d(k)$ such that:
$d(k) = \mathrm{dist}(p(k), \mathrm{thresh}_m(k))$
(66) Advantageously, this distance is the simple difference between the parameter $p(k)$ and this threshold:
$d(k) = \mathrm{dist}(p(k), \mathrm{thresh}_m(k)) = p(k) - \mathrm{thresh}_m(k)$
For example, these mono-band criteria may be defined by the equations hereinbelow, over the sub-bands $k$ and $k'$ ($0 \le k, k' < M$):
$\mathrm{crit0}_m(k) = \mathrm{dist}(L_\infty(k), \mathrm{thresh0}_m(k)), \quad \mathrm{crit1}_m(k') = \mathrm{dist}(L_1(k'), \mathrm{thresh1}_m(k'))$,
where $\mathrm{thresh0}_m(k)$ and $\mathrm{thresh1}_m(k')$ are thresholds, adaptive or otherwise, that may depend on the sub-band in question.
(67) The threshold on the band i could, for example, be adapted as a function of the band j; or as a function of a preceding block of samples.
(68) A multi-band criterion compares parameters calculated over at least two sub-bands, for example a parameter $p(k)$ of a sub-band $k$ and a parameter $q(k')$ of a sub-band $k'$.
(69) Here again, as in the case of a mono-band criterion, a threshold $\mathrm{thresh}_M(k, k')$, adaptive or otherwise and potentially dependent on the sub-bands in question, may be used.
(70) For example, these multi-band criteria may be defined by the equations hereinbelow, over the sub-bands $k$ and $k'$ ($0 \le k, k' < M$):
$\mathrm{crit0}_M(k, k') = \mathrm{dist}_{th}(\mathrm{dist}_p(\rho_{\min\max}(k), \rho_{\min\max}(k')), \mathrm{thresh0}_M(k, k'))$,
$\mathrm{crit1}_M(k, k') = \mathrm{dist}_{th}(\mathrm{dist}_p(L_1(k), L_1(k')), \mathrm{thresh1}_M(k, k'))$
(71) Advantageously, a distance $\mathrm{dist}_{th}$ is a simple difference between a threshold and a distance $\mathrm{dist}_p$ between parameters of at least two sub-bands.
(72) The distance $\mathrm{dist}_p$ between parameters of at least two sub-bands may use ratios between parameters. For example, in the case of a distance between parameters of two sub-bands:
$\mathrm{dist}_p(L_1(k), L_1(k')) = L_1(k) / L_1(k')$ or $\mathrm{dist}_p(L_1(k), L_\infty(k')) = L_1(k) / L_\infty(k')$
It is also noted that the same set of parameters may be used for calculating several criteria both in the case of a mono-band criterion and of a multi-band criterion.
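The two categories of local criterion can be sketched in Python as follows. This is an illustration under assumptions: the function names are hypothetical, the distance to the threshold is taken as a simple difference, the distance between parameters is taken as a ratio, and the sign convention (value minus threshold) is chosen here because the text does not fix it for the multi-band case.

```python
def mono_band_criterion(p_k: float, thresh: float) -> float:
    """d(k) = p(k) - thresh_m(k): simple difference to the threshold."""
    return p_k - thresh


def multi_band_criterion(p_k: float, q_k2: float, thresh: float) -> float:
    """Compare the ratio of two sub-band parameters with a threshold."""
    if q_k2 == 0.0:
        # Assumed convention: an empty reference band gives an infinite ratio.
        return float("inf")
    dist_p = p_k / q_k2     # distance between parameters, taken as a ratio
    return dist_p - thresh  # assumed sign convention for dist_th


# Example: L1 of the high sub-band (2.0) against L1 of the low sub-band (8.0).
print(mono_band_criterion(5.0, 3.0))
print(multi_band_criterion(2.0, 8.0, 0.5))
```

The same set of parameters can feed both functions, which reflects the remark that one set of parameters may serve several criteria.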
(73) Based on at least one local criterion such as defined, the step E504 is implemented. At this step, a local decision (instantaneous, denoted $dec_{inst}^{cur}$) is taken by detecting whether the coded audio content comprises frequencies within at least one sub-band.
(74) In one particular embodiment, in the case of detection of a frequency band referred to as high frequency band (i.e. frequencies higher than a frequency threshold $F_{th}$), it is decided whether the audio content comprises frequencies in the sub-bands $k$ such that $i_{th} \le k$, where $i_{th}$ is the index of the sub-band including the frequency $F_{th}$. At least one of these sub-bands $k$ is taken into consideration at the decision step.
(75) In the particular example of the G.722 fixed HD voice coder with two sub-bands, when trying to detect whether the coded content really is wideband (WB), it is detected whether there is relevant content in the second sub-band (high sub-band) in order to take a "narrow band" (NB) or "wide band" (WB) decision.
(76) In the case where the predetermined frequency band is not the high frequency band, the decision is of course adapted and the sub-bands considered can be those that are lower than a frequency threshold for detecting a low-frequency band or else those that are defined by frequencies either side of and including this predetermined frequency band.
(77) In order to take this decision, at least one local criterion is used. As a variant, several criteria may be used alone or jointly.
(78) The decision may be flexible or hard. A hard decision consists in comparing at least one criterion with a threshold and in taking a binary decision or with predefined states on the presence of the frequency band within the sub-band.
(79) A flexible decision consists in using the value of the criterion in order to define, according to a predefined interval of values, a higher or lower probability for the presence of the frequency band within the sub-band in question.
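The difference between a hard and a flexible decision can be sketched in Python as follows. The function names, the default threshold and the linear mapping of the criterion onto a probability-like score are assumptions made for illustration; the text only requires that the flexible decision map the criterion value, over a predefined interval, to a higher or lower probability.

```python
def hard_decision(criterion: float, threshold: float = 0.0) -> bool:
    """Binary decision: the frequency band is present if the criterion exceeds the threshold."""
    return criterion > threshold


def flexible_decision(criterion: float, low: float, high: float) -> float:
    """Map the criterion onto [0, 1] over the predefined interval [low, high]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    score = (criterion - low) / (high - low)  # assumed linear mapping
    return min(1.0, max(0.0, score))          # clamp outside the interval


print(hard_decision(0.2), hard_decision(-0.1))
print(flexible_decision(0.5, low=0.0, high=2.0))
```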
(80) In one particular embodiment, a step for detection of the type of content, for example a vocal content, is first of all carried out in order to only carry out the local detection on the relevant frames, in other words comprising this type of content.
(81) In order to detect this type of content, advantageously, the parameters determined in E502 on the signals representative of the signals in sub-bands are used.
(82) In one variant embodiment, in order to increase the reliability of the detection, the final decision for a current block of samples, denoted $dec^{cur}$, depends not only on the local instantaneous detection but also on the past detections. Using flexible or hard local decisions by block, a global decision is taken on a number of K blocks preceding the current block. This number of K blocks is adjustable depending on the desired trade-off between the reliability and the speed of the decision.
(83) For example, the local detections can be smoothed over several blocks by a window which could be sliding. The dependence of the current decision on the past detections may also be dependent on the reliability of the local decision. For example, if the local decision is estimated to be robust, the dependence of the current decision with respect to past decisions may be minimized or even eliminated.
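A sliding-window smoothing of the local decisions can be sketched in Python as follows. The class name `SmoothedDetector`, the averaging rule and the 0.5 vote threshold are assumptions for illustration; the local score is taken in the interval [0, 1], so that both hard decisions (0 or 1) and flexible ones can be fed in.

```python
from collections import deque


class SmoothedDetector:
    """Smooth local detections over the current block and the K preceding ones."""

    def __init__(self, k_blocks: int, vote_threshold: float = 0.5) -> None:
        # The window holds the current block plus K past blocks.
        self.history = deque(maxlen=k_blocks + 1)
        self.vote_threshold = vote_threshold

    def update(self, local_score: float, robust: bool = False) -> bool:
        """Return the final decision for the current block."""
        self.history.append(local_score)
        if robust:
            # A local decision judged reliable is used without the past.
            return local_score >= self.vote_threshold
        mean = sum(self.history) / len(self.history)
        return mean >= self.vote_threshold


detector = SmoothedDetector(k_blocks=2)
print([detector.update(s) for s in (1.0, 0.0, 0.0, 1.0)])
```

A larger K makes the decision more stable but slower to follow a change in the content.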
(84) Several embodiments are possible for the detection method such as described, in the choice of the parameters, of the criteria, of the way in which several criteria are potentially combined, and in the use of a flexible or hard decision, locally or globally. It is thus possible to optimize the trade-off between the complexity and the reliability of the detection, together with the speed of the detection.
(85) As has been mentioned, this method of detection with a low algorithmic cost of the audio band for a content coded by the G.722 also carries out, in one preferred embodiment, a detection of voice activity. This information is then advantageously used at the step E401 in
(86) Thus, another advantage of this detection technique is that the majority of the calculations necessary for the decoding have already been carried out for the detection. Thus, depending on the trade-off between storage memory and calculation complexity, the signals used for the detection of HD content (step E405) may be saved in memory in order to be used to reduce the complexity of the steps for decoding the signals of the low (step E402) and high (step E406) sub-bands.
(87) The method of mixing according to the invention is applicable to the combination of streams coded by coders operating over various bandwidths (medium band, super-wide band, HiFi band, etc.). For example, in the case of a super-HD coder (with four sub-bands coded by ADPCM technology), as described for example in the document by A. Charbonnier and J. P. Petit, entitled "Sub-band ADPCM coding for high quality audio signals", in ICASSP 1988, pp. 2540-2543, the application of the invention may consist in carrying out a direct recombination of the signals of the two low sub-bands (corresponding to the wide band [0-8 kHz]) and in recombining the signals of the two high sub-bands (corresponding to the audio band [8-16 kHz]) selected after detection of super-HD content. Another example of application of the invention to this super-HD coder consists in combining the signals of the lowest sub-band (corresponding to the narrow band [0-4 kHz]), in recombining the signals of the second sub-band (corresponding to the audio band [4-8 kHz]) selected after detection of HD content, and in recombining the signals of the two high sub-bands (corresponding to the audio band [8-16 kHz]) selected after detection of super-HD content.
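The second example above, with successive detections producing nested sets of streams, can be sketched in Python as follows. The function name `contributor_sets` and the boolean detection flags are hypothetical; the sketch assumes, in line with the nested sets described in the claims, that super-HD content is only searched for among the streams already found to have HD content.

```python
def contributor_sets(selected: list[int],
                     has_hd: dict[int, bool],
                     has_super_hd: dict[int, bool]) -> dict[int, set[int]]:
    """Return, for each of the four sub-bands, the set of streams to decode and sum."""
    h1 = {j for j in selected if has_hd[j]}   # HD content detected (4-8 kHz)
    h2 = {j for j in h1 if has_super_hd[j]}   # super-HD content, included in h1
    return {
        0: set(selected),  # lowest sub-band: all considered streams
        1: h1,             # second sub-band: streams with HD content
        2: h2,             # two high sub-bands: streams with super-HD content
        3: h2,
    }


sets = contributor_sets(
    [0, 1, 2],
    has_hd={0: True, 1: False, 2: True},
    has_super_hd={0: False, 1: True, 2: True},
)
print(sets)
```

In this example only stream 2 is decoded in the two highest sub-bands, while all three streams contribute to the lowest one.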
(88)
(89) In order to limit the complexity, in this embodiment, the technique described with reference to
(90) Using N hierarchical binary streams or input channels, coded in this embodiment by ADPCM-4SB technology, an optional pre-selection step E601 is implemented.
(91) This pre-selection step allows the selection, from amongst the various input channels, of those that meet one or more of the selection criteria described previously for the methods of the prior art. For example, using the detection of voice activity, the FCFS (First Come First Serve) criterion is used for selecting streams. Alternatively, based on a measurement of the power or the energy of the signal, the LT (Loudest Talker) criterion is used for selecting the streams with the highest intensity.
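By way of illustration only, the LT criterion may be sketched as follows. The function name `preselect_loudest`, the parameter `max_channels` and the energy values are assumptions introduced for this example and do not appear in the description; the energy measurement itself is taken as given.

```python
from typing import Dict, List


def preselect_loudest(energies: Dict[int, float], max_channels: int) -> List[int]:
    """Return the indices of the `max_channels` channels with the highest energy."""
    # Rank the channel indices from the loudest to the quietest.
    ranked = sorted(energies, key=lambda j: energies[j], reverse=True)
    # Keep the loudest ones and return them in increasing index order.
    return sorted(ranked[:max_channels])


# Hypothetical per-channel energies measured on the current frame.
energies = {0: 0.02, 1: 1.50, 2: 0.40, 3: 0.90}
V = preselect_loudest(energies, max_channels=2)
```

In this sketch, only the channels retained in `V` would be passed on to the subsequent decoding and mixing steps.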
(92) Thus, only a part ($N'$, with $N' \le N$) of the coded streams received by the mixing device or mixing bridge is taken into account for implementing the method of mixing. This reduces the complexity of implementing the steps of the method, since the number of channels to be mixed is limited.
(93) This pre-selection step is optional, and the decoding step E602 may also be applied to all $N$ of the input coded audio streams. $V$ denotes the set of the input channels being considered, consisting of either the $N'$ preselected input channels if the optional pre-selection step is implemented, or of the $N$ input channels otherwise.
(94) As previously, the notation $N'$ (with $N' \le N$) is used whether the optional step E601 is implemented or not. Similarly, the pre-selection may be applied, as a variant or as a complement, after the step for decoding the low sub-band.
(95) The step E602 for decoding the $N'$ streams in the low sub-band is subsequently implemented.
(96) For the set $V$, for each input channel $j$ ($j \in V$), the following notations are used:
$Be_j^f$, $f = 0, \ldots, 3$: the incoming binary stream of the sub-band $f$ (corresponding to the audio band $[4f, 4(f+1)]$ kHz);
$s_j^f$: the reconstructed signal of the sub-band $f$ obtained by decoding the stream $Be_j^f$.
Also, for each output channel $i$ ($0 \le i < N$), $Bs_i^f$ denotes the outgoing binary stream for the sub-band $f$, $f = 0, \ldots, 3$.
(97) At the decoding step E602, the reconstructed signal $s_j^0$ of the lowest sub-band (corresponding to the narrow band [0-4 kHz]) is obtained by decoding the stream $Be_j^0$ ($j \in V$).
(98) At the step E603, the binary streams thus decoded are mixed by summing $N'-1$ reconstructed signals of the low sub-band, $S_i^0 = \sum_{j \in V,\, j \neq i} s_j^0$, for a transmission of the stream to the terminal $i$.
(99) If the pre-selection step E601 is carried out and if $i \in V$, then the mixing takes place by addition of $N'-1$ reconstructed signals of the sub-band 0: $S_i^0 = \sum_{j \in V \setminus \{i\}} s_j^0$.
(100) In the opposite case ($i \notin V$), the mixing is carried out by summation of the $N'$ reconstructed signals of the sub-band 0: $S_i^0 = \sum_{j \in V} s_j^0$.
(101) The low sub-band output binary stream ($Bs_i^0$) intended to be transmitted to a terminal Ti ($0 \le i < N$) is then obtained by coding this sum signal $S_i^0$ at the step E604, using the ADPCM coder.
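The summation of the step E603 may be sketched as follows, purely as an illustration. The function name `mix_low_band` and the sample values are hypothetical; the ADPCM decoding and re-coding are assumed to be performed elsewhere, and frames are represented as plain lists of samples.

```python
from typing import Dict, List, Sequence


def mix_low_band(decoded: Dict[int, Sequence[float]], output: int) -> List[float]:
    """Sum the decoded sub-band 0 frames of all channels in V except `output`."""
    frame_len = len(next(iter(decoded.values())))
    mixed = [0.0] * frame_len
    for j, frame in decoded.items():
        if j == output:
            continue  # a terminal does not receive its own signal
        for n, sample in enumerate(frame):
            mixed[n] += sample
    return mixed


# Hypothetical decoded frames for a set V = {0, 1, 2} of considered channels.
decoded_V = {0: [1.0, 2.0], 1: [10.0, 20.0], 2: [100.0, 200.0]}
S_1 = mix_low_band(decoded_V, output=1)  # terminal 1 belongs to V
S_3 = mix_low_band(decoded_V, output=3)  # terminal 3 does not belong to V
```

For a terminal belonging to the set considered, its own signal is left out of the sum; for any other terminal, all the signals of the set are added together.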
(102) Using the N input channels, a step E605 for detection of a first predetermined frequency band BF1 is carried out. In this embodiment, the first predetermined frequency band is the frequency sub-band [4-8 kHz]. Thus, an analysis of the audio content of the set V of the input channels is carried out. The method described with reference to
(103) Thus, a sub-set $H_1$ of $N_1$ input channels with HD content is selected at the output of the step E605. The set $H_1$ is included in the set $V$ ($H_1 \subseteq V$) of the input channels being considered (in other words, in the set of the $N'$ preselected input channels if the pre-selection step E601 is carried out; otherwise in the set of the $N$ input channels).
(104) It goes without saying that other modes of detection of the presence of the band BF1 are possible.
(105) At the decoding step E606, the $N_1$ reconstructed signals $s_j^1$ of the high sub-band, or sub-band 1, are obtained by decoding the sub-band 1 binary streams $Be_j^1$, $j \in H_1$.
(106) At the step E607, a mixing of the decoded streams of the set $H_1$ is carried out for a transmission to the terminal $i$.
(107) If $i \in H_1$, then the mixing is done by summing $N_1 - 1$ reconstructed signals of the sub-band 1: $S_i^1 = \sum_{j \in H_1 \setminus \{i\}} s_j^1$.
(108) In the opposite case ($i \notin H_1$), the mixing is carried out by summing the $N_1$ reconstructed signals of the sub-band 1: $S_i^1 = \sum_{j \in H_1} s_j^1$.
(109) At the step E608, the output binary stream of the sub-band 1 ($Bs_i^1$), intended to be transmitted to the terminal Ti, is then obtained by coding the mixed signal $S_i^1$ using the high sub-band ADPCM encoder.
(110) Starting from the set $H_1$ determined at the step E605, a step E609 of detection of a second predetermined frequency band BF2 is carried out. The second frequency band BF2 is, in this exemplary embodiment, the sub-band [8-12 kHz]. The detection method in
(111) At the decoding step E610, the $N_2$ reconstructed signals $s_j^2$ of the sub-band 2 are obtained by decoding the sub-band 2 binary streams $Be_j^2$, $j \in H_2$.
(112) At the step E611, a mixing of the decoded streams of the set $H_2$ is carried out for a transmission to the terminal $i$.
(113) If $i \in H_2$, then the mixing is done by summing $N_2 - 1$ reconstructed signals of the sub-band 2: $S_i^2 = \sum_{j \in H_2 \setminus \{i\}} s_j^2$.
(114) In the opposite case ($i \notin H_2$), the mixing is carried out by summing the $N_2$ reconstructed signals of the sub-band 2: $S_i^2 = \sum_{j \in H_2} s_j^2$.
(115) At the step E612, the output binary stream of the sub-band 2 ($Bs_i^2$), intended to be transmitted to the terminal Ti, is then obtained by coding the mixed signal $S_i^2$ using the ADPCM coder of the sub-band 2.
(116) Using the set $H_2$ determined at the step E609, a step E613 for detection of a third predetermined frequency band BF3 is carried out. The third frequency band BF3 is, in this exemplary embodiment, the sub-band [12-16 kHz]. The detection method in
(117) At the decoding step E614, the $N_3$ reconstructed signals $s_j^3$ of the sub-band 3 are obtained by decoding the binary streams of the sub-band 3: $Be_j^3$, $j \in H_3$.
(118) At the step E615, a mixing of the decoded streams of the set $H_3$ is carried out for a transmission to the terminal $i$.
(119) If $i \in H_3$, then the mixing is done by summing $N_3 - 1$ reconstructed signals of the sub-band 3: $S_i^3 = \sum_{j \in H_3 \setminus \{i\}} s_j^3$.
(120) In the opposite case ($i \notin H_3$), the mixing is carried out by summing the $N_3$ reconstructed signals of the sub-band 3: $S_i^3 = \sum_{j \in H_3} s_j^3$.
(121) At the step E616, the output binary stream of the sub-band 3 ($Bs_i^3$), intended to be transmitted to the terminal Ti, is then obtained by coding the mixed signal $S_i^3$ using the ADPCM coder of the sub-band 3.
(122) Several coded mixed streams are thus obtained ($Bs_i^f$, $f = 0, \ldots, 3$), one for each of the four sub-bands $f$. These mixed streams are transmitted to a terminal Ti ($0 \le i < N$). A step for combining these mixed streams may be carried out prior to transmission.
(123) Thus, taking into account the presence or absence of a content in the high frequency sub-bands (sub-bands 1, 2, 3) in the streams to be combined allows the complexity of the steps for decoding E606, E610 and E614, for mixing E607, E611 and E615 and for coding E608, E612 and E616 to be reduced.
(124) Indeed, at the steps E606, E610 and E614, with $f = 1$, 2 or 3, only the streams of the sub-sets $H_f$ are decoded, hence the number of ADPCM decodings is reduced from $N$ to $N_f$. Similarly, at the steps E607, E611 and E615, there are no longer $2N$ (or $N(N-1)$) calculations but $2N_f$ (or $N_f^2$). Furthermore, since the signals $S_i^f$ are the same for all the outputs $i$ such that $i \notin H_f$, the number of re-codings at the steps E608, E612 and E616 can be reduced from $N$ to $N_f + 1$.
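The counts given above for one high sub-band may be tabulated with a short sketch. The function name `high_band_costs` and the figures in the example are hypothetical; the number of re-codings is capped at the number of outputs for the case where every channel is selected, which is an assumption added for the example.

```python
def high_band_costs(n_outputs: int, n_selected: int) -> dict:
    """Count decodings and re-codings for one high sub-band.

    n_outputs  -- number N of output channels
    n_selected -- number N_f of channels in which the band was detected
    """
    if not 0 <= n_selected <= n_outputs:
        raise ValueError("n_selected must lie between 0 and n_outputs")
    return {
        "decodings_without_detection": n_outputs,
        "decodings_with_detection": n_selected,
        "recodings_without_detection": n_outputs,
        # One distinct mix per selected channel, plus one mix shared by all
        # the outputs that are not selected (capped at the number of outputs).
        "recodings_with_detection": min(n_outputs, n_selected + 1),
    }


# Hypothetical conference: 8 terminals, 2 of which carry content in the band.
costs = high_band_costs(n_outputs=8, n_selected=2)
```

With these example figures, the sub-band requires 2 decodings and 3 re-codings instead of 8 of each.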
(125) It should also be noted that carrying out the detection of a frequency band BFf only within the sub-set of the input channels selected for the lower frequency band BF(f−1) also reduces the complexity of the steps for detection of the various frequency bands (E609 and E613).
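This nesting of the detection steps may be illustrated by the following sketch. The detector `has_band`, the table `top_band` and the function name `cascade_detection` are hypothetical stand-ins for the actual detection method, which is not reproduced here.

```python
from typing import Callable, Iterable, List


def cascade_detection(
    channels: Iterable[int],
    has_band: Callable[[int, int], bool],
    n_bands: int = 3,
) -> List[List[int]]:
    """Return [H_1, ..., H_n], each H_f being searched only within H_(f-1)."""
    selected = []
    candidates = list(channels)  # the set V for the first band
    for f in range(1, n_bands + 1):
        candidates = [j for j in candidates if has_band(j, f)]
        selected.append(candidates)
    return selected


# Hypothetical table: highest sub-band carrying content in each channel.
top_band = {0: 0, 1: 1, 2: 3, 3: 2}
calls = []


def has_band(j: int, f: int) -> bool:
    calls.append((j, f))  # record each detector invocation
    return top_band[j] >= f


H1, H2, H3 = cascade_detection([0, 1, 2, 3], has_band)
```

In this example, the detector is invoked 9 times, whereas testing every channel for every band would take 12 invocations.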
(126) Moreover, as previously mentioned, certain calculations necessary for the decoding may already have been performed at the detection step and are thus re-usable for the decoding if the input is selected. This further reduces the computational complexity of the method.
(127) Another exemplary embodiment of the method of mixing according to the invention is now described. This embodiment describes the implementation of the invention in a mixing device comprising a bridge combining streams coded by the ITU-T G.711.1 coder at 96 kbit/s.
(128) This type of coder, illustrated in
(129) The G.711.1 coder operates on audio signals sampled at 16 kHz over blocks or frames of 5 ms (i.e. 80 samples at 16 kHz). The input signal $x(n)$ is divided into two sub-bands, [0, 4 kHz] and [4, 8 kHz], by the QMF filters shown at 702, potentially after a pre-processing at 701 (for example, in order to eliminate the DC component by high-pass filtering). For every two input samples, the QMF filter produces at its output one sample $x_L(n)$ of the low sub-band (0-4000 Hz) and one sample $x_H(n)$ of the high sub-band (4000-8000 Hz). The data rate of 64 kbit/s (Layer 0, compatible with G.711) corresponds to the quantization of the sub-band [0, 4 kHz] by the PCM (Pulse-Code Modulation) technique equivalent to G.711, with a shaping of the quantization noise. The next two layers (Layers 1 and 2) respectively code the low sub-band [0, 4 kHz] by a PCM coding improvement technique, and the high sub-band [4, 8 kHz] by an MDCT (Modified Discrete Cosine Transform) coding, each with a data rate of 16 kbit/s (80 bits per frame). When the decoder receives these improvement layers, it can improve the quality of the decoded signal.
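The frame sizes and data rates quoted above can be checked with simple arithmetic, sketched below. The constant and function names are chosen for this example only; the figure of 8 bits per low sub-band sample for Layer 0 is that of the G.711 PCM coding.

```python
FRAME_MS = 5             # frame duration in milliseconds
SAMPLE_RATE_HZ = 16000   # input sampling frequency

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000   # input samples per frame
low_band_samples = samples_per_frame // 2                # one sample out of two after the QMF split


def kbit_per_s(bits_per_frame: int) -> float:
    """Convert a number of bits per 5 ms frame into a data rate in kbit/s."""
    return bits_per_frame / FRAME_MS  # bits per millisecond equals kbit/s


layer0_bits = 8 * low_band_samples   # 8-bit PCM on each low sub-band sample
layer1_bits = 80                     # low sub-band improvement layer
layer2_bits = 80                     # high sub-band MDCT layer

rates = [kbit_per_s(b) for b in (layer0_bits, layer1_bits, layer2_bits)]
total_rate = sum(rates)
```

The three layers thus add up to the 96 kbit/s mode considered in this embodiment.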
(130) The core coding of the low sub-band signal is carried out by the module 703a according to the PCM technique equivalent to G.711, with a shaping of the quantization noise. The PCM coding used in G.711 is briefly recalled hereinafter.
(131) The G.711 coder is based on a logarithmic compression over 8 bits at the sampling frequency of 8 kHz, yielding a data rate of 64 kbit/s. The G.711 PCM coding applies a compression of the signals filtered in the band [300-3400 Hz] by a logarithmic curve, which allows an approximately constant signal-to-noise ratio to be obtained over a wide dynamic range of signals. The quantization resolution varies with the amplitude of the sample to be coded: when the level of the input signal is low, the quantization step is small; when the level of the input signal is high, the quantization step is large. Two logarithmic PCM compression laws are used: the μ-law (used in North America and in Japan) and the A-law (used in Europe and in the rest of the world). The G.711 A-law and the G.711 μ-law encode the input samples over 8 bits. In practice, in order to facilitate the implementation of the G.711 coder, the logarithmic PCM compression has been approximated by a segmented curve. During this compression, the least-significant bits of the mantissa are lost.
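The effect of a logarithmic law on the quantization step may be illustrated with the continuous μ-law curve. This sketch uses the textbook formula with μ = 255 on samples normalized to [-1, 1]; it is not the segmented approximation actually specified for G.711, and the function names are chosen for this example.

```python
import math

MU = 255.0  # compression parameter of the continuous mu-law curve


def mu_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto the logarithmic scale."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y: float) -> float:
    """Inverse mapping, from the logarithmic scale back to amplitude."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


LEVELS = 128  # 7 magnitude bits, the eighth bit carrying the sign

# Amplitude gap between two adjacent codes, near zero and near full scale.
step_low = mu_expand(1 / LEVELS) - mu_expand(0.0)
step_high = mu_expand(1.0) - mu_expand((LEVELS - 1) / LEVELS)
```

A uniform grid on the compressed scale therefore gives a fine step for low-level samples and a coarse step for high-level samples, which is the behaviour described above.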
(132) In the A-law, the 8 bits are laid out in the following fashion:
(133) 1 sign bit;
3 bits indicating the segment;
4 bits indicating the location in the segment.
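The layout listed above may be illustrated by splitting an 8-bit code into its three fields. This sketch assumes that the sign occupies the most significant bit, followed by the segment and then the location; it ignores any bit inversion that may be applied to the code for transmission. The function name is hypothetical.

```python
from typing import Tuple


def split_alaw_fields(code: int) -> Tuple[int, int, int]:
    """Split an 8-bit code into (sign, segment, location in the segment)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code must fit in 8 bits")
    sign = (code >> 7) & 0b1          # 1 sign bit
    segment = (code >> 4) & 0b111     # 3 bits indicating the segment
    location = code & 0b1111          # 4 bits indicating the location
    return sign, segment, location


fields = split_alaw_fields(0b10110101)
```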
(134) The coding of the improvement layer (Layer 1) of the low sub-band (carried out by the module 703b in
(135) The recovery and the transmission of the bits not transmitted in the mantissa of the PCM core coding improve the quality of the coding of the low sub-band. Indeed, when this improvement layer is received, the decoder can decode the mantissa with a higher precision. In G.711.1, the number of additional bits for the mantissa depends on the amplitude of the samples: rather than allocating the same number of bits to every sample, the 80 bits available in Layer 1 of G.711.1 for improving the precision of the mantissa coding of the 40 samples are allocated dynamically, more bits being assigned to the samples with a high exponent. Thus, whereas the bit budget of the improvement layer is 2 bits per sample on average (16 kbit/s), with this adaptive allocation the number of bits allocated to a sample varies from 0 to 3 bits according to its exponent value.
(136) For the high sub-band, a modified discrete cosine transform (MDCT) is first carried out by the module 704, over blocks of the high-band signal of 10 ms with an overlap of 5 ms. Then the MDCT coefficients, $S_{HB}(k)$, are coded by the module 705 using a vector quantization with an interleaved conjugate structure, the coefficients being weighted and then normalized (by the square root of their energy). These coefficients are then distributed into 6 sub-vectors of 6 dimensions; the 4 coefficients representing the highest frequencies are not coded. These six sub-vectors are quantized independently over 12 bits by a set of two dictionaries with a conjugate structure, $C_{H0w}$ and $C_{H1w}$. Lastly, one overall gain per frame is calculated using the decoded sub-vectors and the normalization factor, this gain being quantized over 8 bits by a scalar quantizer of the μ-law PCM type.
(137) The various coding layers (with the indices $I_{B0}(n)$, $I_{B1}(n)$, $I_{B2}(n)$) are multiplexed at 706 so as to yield the coded signal $I(n)$.
(138) In the decoder, the set of the 36 MDCT coefficients is reconstructed from the six decoded sub-vectors by inverse interleaving, and the 4 uncoded coefficients representing the highest frequencies are simply set to zero; the signal of the decoded high band is then generated by inverse MDCT transform.
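The bit budget of the high sub-band layer and the reconstruction of the coefficient vector in the decoder may be sketched as follows. The interleaving pattern used here (sub-vector v holding the coefficients of index 6i + v) is an assumption made for the example and is not taken from the description; the function name and the sample values are likewise hypothetical.

```python
from typing import List, Sequence

N_SUBVECTORS = 6      # sub-vectors per frame
SUBVECTOR_DIM = 6     # coefficients per sub-vector
N_COEFFS = 40         # MDCT coefficients per 5 ms frame of the high sub-band

# 6 sub-vectors of 12 bits each, plus the 8-bit gain.
layer2_bits = N_SUBVECTORS * 12 + 8


def rebuild_coefficients(subvectors: Sequence[Sequence[float]]) -> List[float]:
    """De-interleave six 6-dimensional sub-vectors into 40 coefficients."""
    coeffs = [0.0] * N_COEFFS  # the 4 highest coefficients stay at zero
    for v, sub in enumerate(subvectors):
        for i, value in enumerate(sub):
            coeffs[SUBVECTOR_DIM * i + v] = value  # assumed interleaving
    return coeffs


# Hypothetical decoded sub-vectors: entry i of sub-vector v holds 10*v + i.
subvectors = [[float(10 * v + i) for i in range(SUBVECTOR_DIM)]
              for v in range(N_SUBVECTORS)]
coeffs = rebuild_coefficients(subvectors)
```

The 72 bits of the sub-vectors and the 8 bits of the gain account for the 80 bits per frame of this layer.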
(139) In the two preceding embodiments, a detection of high frequency content with a low algorithmic cost is used, and the signals estimated during this detection are re-used in order to reduce the complexity of the decoding of the signals of the sub-bands selected for the recombination. In this third embodiment, it is shown that, even when the method of detection is a conventional method, the invention allows the complexity of the recombination of streams to be reduced. For this purpose, the application of the invention to the ITU-T G.711.1 coder such as described with reference to
(140) In this embodiment, the method of detection of an HD content in an input stream uses a comparison of the energy of the decoded signal of the high sub-band with that of the decoded signal of the low sub-band.
(141)
For each output channel $i$ ($0 \le i < N$), the following notations are used: $Bs_i^l$, the binary stream of the outgoing low sub-band; and $Bs_i^h$, the binary stream of the outgoing high sub-band.
Thus, starting from the $N$ coded streams received by the mixing device, a step E801 for decoding the binary streams of the low sub-band $Be_j^l$, $0 \le j < N$, is carried out so as to obtain $N$ signals $s_j^l$ of the low sub-band.
(142) Similarly, at the step E805, $N$ signals of the high sub-band $s_j^h$ are obtained by decoding the binary streams of the high sub-band $Be_j^h$, $0 \le j < N$.
(143) So as to carry out a detection of a predetermined frequency band in the audio content of the signals thus decoded, a first step E802 for calculating the energies $E_j^l$ ($0 \le j < N$) of the decoded low sub-band signals is carried out.
(144) A step E806 for calculating the energies $E_j^h$ ($0 \le j < N$) of the decoded high sub-band signals is also implemented.
(145) The step E807 performs a calculation of the differences between the energies of the two sub-bands in the logarithmic domain (dB), or of their ratios in the linear domain, for $0 \le j < N$.
(146) This comparison between the energies of the two sub-bands allows the presence of a predetermined frequency band to be detected in the content, for example a high frequency band.
(147) Thus, at the step E807, a set $H$ of the input channels having HD content is determined. $N^h$ denotes the cardinality of the set $H$.
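A minimal sketch of this energy comparison is given below. The threshold of -30 dB, the guard value `EPS` and the function names are assumptions introduced for the example; the description does not specify the decision rule applied to the energy difference.

```python
import math
from typing import Dict, List, Sequence

EPS = 1e-12  # guard against a zero energy in the ratio (assumed value)


def energy(signal: Sequence[float]) -> float:
    """Energy of one decoded sub-band frame."""
    return sum(x * x for x in signal)


def detect_hd_channels(
    low: Dict[int, Sequence[float]],
    high: Dict[int, Sequence[float]],
    threshold_db: float = -30.0,  # hypothetical decision threshold
) -> List[int]:
    """Return the channels whose high/low energy difference exceeds the threshold."""
    selected = []
    for j in sorted(low):
        diff_db = 10.0 * math.log10((energy(high[j]) + EPS) / (energy(low[j]) + EPS))
        if diff_db > threshold_db:
            selected.append(j)
    return selected


# Hypothetical decoded frames: channel 1 has almost no high-band content.
low = {0: [1.0] * 80, 1: [1.0] * 80}
high = {0: [0.5] * 80, 1: [0.001] * 80}
H = detect_hd_channels(low, high)
```

Here the difference is about -6 dB for the channel 0 and about -60 dB for the channel 1, so that only the channel 0 is retained in the set.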
(148) At the step E808, a mixing of the decoded streams of the set $H$ is carried out for a transmission to the terminal $i$.
(149) If $i \in H$, then the mixing is done by summing $N^h - 1$ reconstructed signals of the high sub-band: $S_i^h = \sum_{j \in H \setminus \{i\}} s_j^h$.
(150) In the opposite case ($i \notin H$), the mixing is carried out by summing the $N^h$ reconstructed signals of the high sub-band: $S_i^h = \sum_{j \in H} s_j^h$.
(151) At the step E809, the high sub-band output binary stream ($Bs_i^h$), intended to be transmitted to the terminal Ti, is then obtained by coding this sum signal $S_i^h$ using the high sub-band encoder of G.711.1.
(152) Similarly, at the step E803, the summing of $N - 1$ reconstructed signals of the low sub-band is carried out: $S_i^l = \sum_{0 \le j < N,\, j \neq i} s_j^l$.
(153) At the step E804, the low sub-band output binary stream $Bs_i^l$, intended to be transmitted to the terminal Ti, is then obtained by coding this sum signal $S_i^l$ using the low sub-band encoder of G.711.1.
(154) A step for combining these two mixed streams may be carried out prior to transmission.
(155) With respect to a direct recombination in the domain of the decoded signals of the sub-bands, by taking into account the presence or absence of a high frequency content in the streams to be combined, the invention allows the complexity of the steps E808 and E809 to be reduced.
(156) Indeed, at the step E808, there are only $N^h + 1$ sum signals to be calculated, the signal $S^h$ being common to the outputs $i$ such that $i \notin H$. Furthermore, the signal $S^h$ only comprises $N^h$ signals, and the $N^h$ sum signals $S_i^h$ of the outputs $i$ such that $i \in H$ only comprise $N^h - 1$ signals.
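One possible way of obtaining these $N^h + 1$ sum signals is sketched below: the common signal is computed once, and the mix of each selected output is derived from it by subtracting that output's own signal. This subtraction is a variant chosen for the example rather than a procedure stated in the description; the function name and sample values are hypothetical.

```python
from typing import Dict, List, Sequence


def mix_high_band(
    selected: Dict[int, Sequence[float]], n_outputs: int
) -> Dict[int, List[float]]:
    """Build the high sub-band mix of every output from the signals of the set H."""
    frame_len = len(next(iter(selected.values())))
    common = [0.0] * frame_len
    for frame in selected.values():
        for n, sample in enumerate(frame):
            common[n] += sample  # sum of the signals of the set H

    outputs = {}
    for i in range(n_outputs):
        if i in selected:
            # Output belonging to H: remove its own signal from the common sum.
            outputs[i] = [c - own for c, own in zip(common, selected[i])]
        else:
            # Output outside H: the common signal is shared, not recomputed.
            outputs[i] = common
    return outputs


# Hypothetical case: 4 terminals, channels 0 and 2 carrying high-band content.
mixes = mix_high_band({0: [1.0, 2.0], 2: [10.0, 20.0]}, n_outputs=4)
distinct = {id(m) for m in mixes.values()}
```

In this example, 3 distinct sum signals are produced for 4 outputs, the outputs 1 and 3 sharing the same signal.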
(157) Similarly, at the step E809, the number of re-codings may be reduced. In order to reduce the complexity even more, as in the prior art, the MDCT transforms needed for the re-coding of the combined signals of the high sub-band (step E809) can be eliminated by storing in memory, at the step E805, the high sub-band signals in the MDCT domain and by carrying out the summations of the step E808 in the MDCT domain.
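The summation can be moved into the MDCT domain because the transform is linear. The relation below uses a generic textbook form of the MDCT over a block of $2M$ samples with an analysis window $w(n)$; the exact window and scaling of G.711.1 may differ, and the symbols $X_j(k)$, $M$ and $w(n)$ are introduced for this illustration only.

```latex
X_j(k) = \sum_{n=0}^{2M-1} w(n)\, s_j^h(n)
         \cos\!\left[\frac{\pi}{M}\left(n + \frac{1}{2} + \frac{M}{2}\right)\left(k + \frac{1}{2}\right)\right],
         \qquad 0 \le k < M

\sum_{j \in H} X_j(k)
  = \sum_{n=0}^{2M-1} w(n) \left( \sum_{j \in H} s_j^h(n) \right)
    \cos\!\left[\frac{\pi}{M}\left(n + \frac{1}{2} + \frac{M}{2}\right)\left(k + \frac{1}{2}\right)\right]
```

The sum of the stored coefficients of the selected channels is therefore equal to the transform of the sum signal, so that no further forward transform is needed before the quantization.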
(158) Thus, although the invention has been illustrated in embodiments in mixing bridges, it will be understood that it can be implemented in any device having to combine streams from sub-band coders. For example, the invention may advantageously be used in a terminal in a multi-party communication using a mesh architecture or a centralized architecture using a replicating bridge, in order to reduce the number of decodings and summations.
(159)
(160) The device 900a in
(161) In terms of hardware, these devices comprise a processor 930 cooperating with a memory block BM comprising a storage and/or working memory MEM.
(162) The processor controls processing modules capable of implementing the method according to the invention. Thus, these devices comprise a module 902 for decoding a part of the coded streams over at least a first frequency sub-band, and a module 903 for summing the streams thus decoded so as to form a first mixed stream. They also comprise a module 901 for detection, over at least a second frequency sub-band different from the at least first sub-band, of the presence of a predetermined frequency band within the plurality of coded audio streams. The module 902 also decodes the coded audio streams for which the presence of the predetermined frequency band has been detected, over said at least a second sub-band, and the mixing module 903 also adds together these decoded audio streams so as to form at least a second mixed stream.
(163) The memory block may advantageously comprise a computer program (prog.) comprising code instructions for the implementation of the steps of the method of mixing in the sense of the invention, when these instructions are executed by the processor PROC and notably the steps for decoding a part of the coded streams over at least a first frequency sub-band, for summing the streams thus decoded so as to form at least a first mixed stream, for detecting, over at least a second frequency sub-band different from the at least first sub-band, the presence of a predetermined frequency band within the plurality of coded audio streams and for summing the decoded audio streams for which the presence of the predetermined frequency band has been detected, over said at least a second sub-band so as to form at least a second mixed stream.
(164) Typically, the description in
(165) Generally speaking, the memory MEM stores all the data necessary for the implementation of the method of mixing.
(166) The device 900a in
(167) This device 900a also comprises an input module 905a designed to receive a plurality of coded audio streams N*$Be_j$ originating, for example, from the various terminals of the communications system, these streams having been coded by a coder using frequency sub-band coding.
(168) The device 900b in
(169) This device 900b also comprises an output module 906b designed to transmit the stream $S_{Mi}$, resulting from the combination of the mixed streams by the module 903, to the reproduction system of the device or of the terminal.