Abstract
In some embodiments, a pitch filter for filtering a preliminary audio signal generated from an audio bitstream is disclosed. The pitch filter has an operating mode selected from either (i) an active mode, in which the preliminary audio signal is filtered using filtering information to obtain a filtered audio signal, or (ii) an inactive mode, in which the pitch filter is disabled. The preliminary audio signal is generated in an audio encoder or audio decoder having a coding mode selected from at least two distinct coding modes, and the pitch filter is capable of being selectively operated in either the active mode or the inactive mode while operating in the coding mode based on control information.
Claims
1. A decoder system for decoding a bit stream signal as an audio time signal, including: a decoding section for decoding a bit stream signal as a preliminary audio time signal; and an interharmonic noise attenuation post filter for filtering the preliminary audio time signal to obtain an audio time signal, wherein the interharmonic noise attenuation is controlled by a post filter gain, comprising a control section adapted to selectively disable the post filter, by setting the post filter gain to zero, responsive only to post-filtering information encoded in the bit stream signal, the post-filtering information being indicative of an encoder-side decision whether to disable post filtering, wherein the preliminary audio time signal is output as the audio time signal.
2. The decoder system of claim 1, wherein the post-filtering is adapted to attenuate only such spectral components which are located below a predetermined cut-off frequency.
3. The decoder system of claim 1, wherein the decoding section further comprises: a code-excited linear prediction, CELP, decoding module; and a transform-coded excitation, TCX, decoding module for decoding a bit stream signal as an audio time signal, the control section being adapted to operate the decoder system in at least the following modes: a) the TCX module is enabled and the post filter is disabled; b) the CELP module and the post filter are enabled; and c) the CELP module is enabled and the post filter is disabled, wherein the preliminary audio time signal and the audio time signal coincide.
4. The decoder system of claim 1, said decoding section including a speech decoding module, wherein a pitch frequency estimated by a long-term prediction section in the encoder is encoded in the bit stream signal, and wherein the post filter is adapted to attenuate spectral components located between harmonics of the pitch frequency.
5. The decoder system of claim 1, wherein the bit stream signal contains a representation of a pitch frequency and the post filter is adapted to attenuate spectral components located between harmonics of the pitch frequency.
6. The decoder system of claim 4, wherein the post filter is adapted to attenuate only such spectral components which are located below a predetermined cut-off frequency, and wherein the decoding section further comprises an Advanced Audio Coding, AAC, decoding module for decoding a bit stream signal as an audio time signal, the control section being adapted to operate the decoder system also in the following mode: d) the AAC module is enabled and the post filter is disabled.
7. The decoder system of claim 1, wherein the bit stream signal is segmented into time frames and the control section is adapted to disable an entire time frame or a sequence of entire time frames.
8. The decoder system of claim 7, wherein the control section is further adapted to receive, for each time frame in a Moving Pictures Experts Group, MPEG, bit stream, a data field associated with this time frame and is operable, responsive to the value of the data field, to disable the post filter.
9. The decoder system of claim 1, wherein the control section is adapted to decrease and/or increase the gain of the post filter gradually.
10. A method of decoding a bit stream signal as an audio time signal, including the steps of: decoding a bit stream signal as a preliminary audio time signal; and post-filtering the preliminary audio time signal by attenuating interharmonic noise, thereby obtaining an audio time signal, wherein the interharmonic noise attenuation is controlled by a post filter gain, wherein the post-filtering step is selectively omitted, by setting the post filter gain to zero, responsive only to post-filtering information encoded in the bit stream signal, the post-filtering information being indicative of an encoder-side decision whether to disable post filtering.
11. A non-transitory computer readable storage medium containing a program of instructions which, when executed by one or more processors, cause the one or more processors to perform the method of claim 10.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the present invention will now be described with reference to the accompanying drawings, in which:
(2) FIG. 1 is a block diagram showing a conventional decoder with post filter;
(3) FIG. 2 is a schematic block diagram of a conventional decoder operable in AAC, ACELP and TCX mode and including a post filter permanently connected downstream of the ACELP module;
(4) FIG. 3 is a block diagram illustrating the structure of a post filter;
(5) FIGS. 4 and 5 are block diagrams of two decoders according to the invention;
(6) FIGS. 6 and 7 are block diagrams illustrating differences between a conventional decoder (FIG. 6) and a decoder (FIG. 7) according to the invention;
(7) FIG. 8 is a block diagram of an encoder according to the invention;
(8) FIGS. 9 and 10 are block diagrams illustrating differences between a conventional decoder (FIG. 9) and a decoder (FIG. 10) according to the invention; and
(9) FIG. 11 is a block diagram of an autonomous post filter which can be selectively activated and deactivated.
DETAILED DESCRIPTION OF EMBODIMENTS
(10) FIG. 4 is a schematic drawing of a decoder system 400 according to an embodiment of the invention, having as its input a bit stream signal and as its output an audio signal. As in the conventional decoder shown in FIG. 1, a post filter 440 is arranged downstream of a decoding module 410, but it can be switched into or out of the decoding path by operating a switch 442. The post filter is enabled in the switch position shown in the figure. It would be disabled if the switch were set in the opposite position, whereby the signal from the decoding module 410 would instead be conducted over the bypass line 444. As an inventive contribution, the switch 442 is controllable by post filtering information contained in the bit stream signal, so that post filtering may be applied and removed irrespective of the current status of the decoding module 410. Because a post filter 440 operates at some delay (for example, the post filter shown in FIG. 3 will introduce a delay amounting to at least the pitch period T), a compensation delay module 443 is arranged on the bypass line 444 to maintain the modules in a synchronized condition at switching. The delay module 443 delays the signal by the same period as the post filter 440 would, but does not otherwise process the signal. To minimize the change-over time, the compensation delay module 443 receives the same signal as the post filter 440 at all times. In an alternative embodiment where the post filter 440 is replaced by a zero-delay post filter (e.g., a causal filter, such as a filter with two taps, independent of future signal values), the compensation delay module 443 can be omitted.
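The switching arrangement of FIG. 4 can be illustrated by the following minimal sketch, which is not part of the disclosed embodiments. The class names `DelayLine`, `ToyPostFilter` and `SwitchedPostFilter`, the helper `run`, and the per-sample enable flag are hypothetical; the toy filter merely stands in for post filter 440 by imposing a fixed latency and a gain of 0.5. The sketch feeds both branches at all times, so that the output stays time-aligned when the switch changes position.

```python
from collections import deque


class DelayLine:
    """Pure delay of `delay` samples; the signal is not otherwise processed."""

    def __init__(self, delay):
        self.buf = deque([0.0] * delay)

    def process(self, x):
        self.buf.append(x)
        return self.buf.popleft()


class ToyPostFilter:
    """Hypothetical stand-in for post filter 440: latency `delay`, then gain 0.5."""

    def __init__(self, delay):
        self.line = DelayLine(delay)

    def process(self, x):
        return 0.5 * self.line.process(x)


class SwitchedPostFilter:
    """Post filter with a bypass line carrying a matching compensation delay."""

    def __init__(self, post_filter, delay):
        self.post_filter = post_filter
        self.bypass = DelayLine(delay)  # plays the role of module 443

    def process(self, x, enabled):
        # Both branches receive every sample, so either output is ready
        # immediately when the switch position changes.
        filtered = self.post_filter.process(x)
        delayed = self.bypass.process(x)
        return filtered if enabled else delayed


def run(samples, flags, delay=2):
    stage = SwitchedPostFilter(ToyPostFilter(delay), delay)
    return [stage.process(x, f) for x, f in zip(samples, flags)]
```

With a delay of zero the bypass line degenerates to a direct connection, which corresponds to the alternative embodiment in which the compensation delay module is omitted.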
(11) FIG. 5 illustrates a further development according to the teachings of the invention of the triple-mode decoder system 500 of FIG. 2. An ACELP decoding module 511 is arranged in parallel with a TCX decoding module 512 and an AAC decoding module 513. In series with the ACELP decoding module 511 is arranged a post filter 540 for attenuating noise, particularly noise located between harmonics of a pitch frequency directly or indirectly derivable from the bit stream signal for which the decoder system 500 is adapted. The bit stream signal also encodes post filtering information governing the positions of an upper switch 541 operable to switch the post filter 540 out of the processing path and replace it with a compensation delay 543 as in FIG. 4. A lower switch 542 is used for switching between different decoding modes. With this structure, the position of the upper switch 541 is immaterial when one of the TCX or AAC modules 512, 513 is used; hence, the post filtering information does not necessarily indicate this position except in the ACELP mode. Whatever decoding mode is currently used, the signal is supplied from the downstream connection point of the lower switch 542 to a spectral band replication (SBR) module 550, which outputs an audio signal. The skilled person will realize that the drawing is of a conceptual nature, as is clear notably from the switches which are shown schematically as separate physical entities with movable contacting means. In a possible realistic implementation of the decoder system, the switches as well as the other modules will be embodied by computer-readable instructions.
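Since the switches may be embodied by instructions, the path selection of FIG. 5 can be sketched as below. This is only an illustration under stated assumptions: the function name `select_path`, the mode strings and the Boolean `post_filter_flag` are hypothetical and do not reflect any actual bit stream syntax.

```python
def select_path(coding_mode, post_filter_flag=None):
    """Return (decoding module, stage following it) for one frame.

    ASSUMPTION: `post_filter_flag` is a hypothetical Boolean decoded from the
    post filtering information; it is only consulted in the ACELP mode.
    """
    if coding_mode == "ACELP":
        if post_filter_flag is None:
            raise ValueError("ACELP frames need post filtering information")
        # Upper switch: post filter, or the compensation delay that replaces it.
        stage = "post_filter" if post_filter_flag else "compensation_delay"
        return ("ACELP", stage)
    if coding_mode in ("TCX", "AAC"):
        # The upper switch position is immaterial in these modes.
        return (coding_mode, None)
    raise ValueError("unknown coding mode: %r" % (coding_mode,))
```

In every case the selected signal would then be passed on to the SBR stage.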
(12) FIGS. 6 and 7 are also block diagrams of two triple-mode decoder systems operable in an ACELP, TCX or frequency-domain decoding mode. With reference to the latter figure, which shows an embodiment of the invention, a bit stream signal is supplied to an input point 701, which is in turn permanently connected via respective branches to the three decoding modules 711, 712, 713. The input point 701 also has a connecting branch 702 (not present in the conventional decoding system of FIG. 6) to a pitch enhancement module 740, which acts as a post filter of the general type described above. As is common practice in the art, a first transition windowing module 703 is arranged downstream of the ACELP and TCX modules 711, 712, to carry out transitions between the decoding modules. A second transition module 704 is arranged downstream of the frequency-domain decoding module 713 and the first transition windowing module 703, to carry out transitions between the two super-modes. Further, an SBR module 750 is provided immediately upstream of the output point 705. Clearly, the bit stream signal is supplied directly (or after demultiplexing, as appropriate) to all three decoding modules 711, 712, 713 and to the pitch enhancement module 740. Information contained in the bit stream controls which decoding module is to be active. According to the invention, however, the pitch enhancement module 740 performs an analogous self-actuation: responsive to post filtering information in the bit stream, it may act as a post filter or simply as a pass-through. This may for instance be realized through the provision of a control section (not shown) in the pitch enhancement module 740, by means of which the post filtering action can be turned on or off. The pitch enhancement module 740 is always in its pass-through mode when the decoder system operates in the frequency-domain or TCX decoding mode, in which case, strictly speaking, no post filtering information is necessary.
It is understood that modules not forming part of the inventive contribution and whose presence is obvious to the skilled person, e.g., a demultiplexer, have been omitted from FIG. 7 and other similar drawings to increase clarity.
(13) As a variation, the decoder system of FIG. 7 may be equipped with a control module (not shown) for deciding whether post filtering is to be applied using an analysis-by-synthesis approach. Such a control module is communicatively connected to the pitch enhancement module 740 and to the ACELP module 711, from which it extracts an intermediate decoded signal s.sub.i_DEC(n) representing an intermediate stage in the decoding process, preferably one corresponding to the excitation of the signal. The control module has the necessary information to simulate the action of the pitch enhancement module 740, as defined by the transfer functions P.sub.LT(z) and H.sub.LP(z) (cf. Background section and FIG. 3), or equivalently their filter impulse responses p.sub.LT(n) and h.sub.LP(n). As follows from the discussion in the Background section, the component to be subtracted at post filtering can be estimated by an approximate difference signal s.sub.AD(n) which is proportional to [(s.sub.i_DEC*p.sub.LT)*h.sub.LP](n), where * denotes discrete convolution. This is an approximation of the true difference between the original audio signal and the post-filtered decoded signal, namely
s.sub.ORIG(n)−s.sub.E(n)=s.sub.ORIG(n)−(s.sub.DEC(n)−α[s.sub.DEC*p.sub.LT*h.sub.LP](n)),
where α is the post filter gain. By studying the total energy, low-band energy, tonality, actual magnitude spectrum or past magnitude spectra of this signal, as disclosed in the Summary section and the claims, the control module may find a basis for the decision whether to activate or deactivate the pitch enhancement module 740.
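The computation of the approximate difference signal and of its total energy may be sketched as follows. The sketch rests on assumptions not taken from this section: the long-term filter is given taps of -0.5, 1 and -0.5 at lags -T, 0 and +T (stored causally, i.e. shifted by T samples), the low-pass response `h_lp` is an arbitrary short FIR, and all function names are hypothetical. The actual filters are those defined in the Background section.

```python
def convolve(x, h):
    """Full discrete convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def long_term_impulse_response(T):
    # ASSUMPTION: taps -0.5, 1, -0.5 at lags -T, 0, +T, shifted by T to be causal.
    return [-0.5] + [0.0] * (T - 1) + [1.0] + [0.0] * (T - 1) + [-0.5]


def approximate_difference(s_i_dec, T, h_lp, alpha=1.0):
    """alpha * [(s_i_DEC * p_LT) * h_LP](n), including the filter transients."""
    long_term = convolve(s_i_dec, long_term_impulse_response(T))
    return [alpha * v for v in convolve(long_term, h_lp)]


def energy(x):
    return sum(v * v for v in x)
```

For a signal that is exactly periodic with period T, the steady-state part of this signal vanishes, which indicates that the post filter would remove nothing there; a large energy suggests instead that the post filter would remove a substantial component.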
(14) FIG. 8 shows an encoder system 800 according to an embodiment of the invention. The encoder system 800 is adapted to process digital audio signals, which are generally obtained by capturing a sound wave by a microphone and transducing the wave into an analog electric signal. The electric signal is then sampled into a digital signal susceptible to be provided, in a suitable format, to the encoder system 800. The system generally consists of an encoding module 810, a decision module 820 and a multiplexer 830. By virtue of switches 814, 815 (symbolically represented), the encoding module 810 is operable in either a CELP, a TCX or an AAC mode, by selectively activating modules 811, 812, 813. The decision module 820 applies one or more predefined criteria to decide whether to disable post filtering during decoding of a bit stream signal produced by the encoder system 800 to encode an audio signal. For this purpose, the decision module 820 may examine the audio signal directly or may receive data from the encoding module 810 via a connection line 816. A signal indicative of the decision taken by the decision module 820 is provided, together with the encoded audio signal from the encoding module 810, to a multiplexer 830, which concatenates the signals into a bit stream constituting the output of the encoder system 800.
(15) Preferably, the decision module 820 bases its decision on an approximate difference signal computed from an intermediate decoded signal s.sub.i_DEC, which can be extracted from the encoding module 810. The intermediate decoded signal represents an intermediate stage in the decoding process, as discussed in preceding paragraphs, but may be extracted from a corresponding stage of the encoding process. However, in the encoder system 800 the original audio signal s.sub.ORIG is available so that, advantageously, the approximate difference signal is formed as:
s.sub.ORIG(n)−(s.sub.i_DEC(n)−α[(s.sub.i_DEC*p.sub.LT)*h.sub.LP](n)).
The approximation resides in the fact that the intermediate decoded signal is used in lieu of the final decoded signal. This enables an appraisal of the nature of the component that a post filter would remove at decoding, and by applying one of the criteria discussed in the Summary section, the decision module 820 will be able to take a decision whether to disable post filtering.
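The encoder-side difference signal may be evaluated as in the following sketch, given here under stated assumptions rather than as the disclosed implementation. Because the encoder has the whole frame at hand, the filters are applied without delay (samples outside the frame are taken as zero). The long-term taps of -0.5, 1, -0.5 at lags -T, 0, +T, the symmetric odd-length low-pass `h_lp` and all function names are hypothetical.

```python
def _at(x, i):
    """Sample i of x, or zero outside the frame."""
    return x[i] if 0 <= i < len(x) else 0.0


def long_term_component(s, T):
    # ASSUMPTION: s(n) - 0.5*s(n-T) - 0.5*s(n+T) as the long-term filter.
    return [_at(s, n) - 0.5 * _at(s, n - T) - 0.5 * _at(s, n + T)
            for n in range(len(s))]


def zero_phase_fir(x, h):
    """Apply a symmetric odd-length FIR centred on each sample (no delay)."""
    half = len(h) // 2
    return [sum(hk * _at(x, n + half - k) for k, hk in enumerate(h))
            for n in range(len(x))]


def encoder_difference(s_orig, s_i_dec, T, h_lp, alpha):
    """s_ORIG(n) - (s_i_DEC(n) - alpha * [(s_i_DEC * p_LT) * h_LP](n))."""
    removed = zero_phase_fir(long_term_component(s_i_dec, T), h_lp)
    return [o - (d - alpha * r) for o, d, r in zip(s_orig, s_i_dec, removed)]
```

Setting `alpha` to zero yields the plain coding error s.sub.ORIG(n)−s.sub.i_DEC(n), so the effect of post filtering can be appraised by comparing the two results.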
(16) As a variation on this, the decision module 820 may use the original signal in place of an intermediate decoded signal, so that the approximate difference signal will be [(s.sub.ORIG*p.sub.LT)*h.sub.LP](n). This is likely to be a less faithful approximation but on the other hand makes the presence of a connection line 816 between the decision module 820 and the encoding module 810 optional.
(17) In such other variations of this embodiment where the decision module 820 studies the audio signal directly, one or more of the following criteria may be applied: Does the audio signal contain both a component with dominant fundamental frequency and a component located below the fundamental frequency? (The fundamental frequency may be supplied as a by-product of the encoding module 810.) Does the audio signal contain both a component with dominant fundamental frequency and a component located between the harmonics of the fundamental frequency? Does the audio signal contain significant signal energy below the fundamental frequency? Is post-filtered decoding (likely to be) preferable to unfiltered decoding with respect to rate-distortion optimality?
(18) In all the described variations of the encoder structure shown in FIG. 8, that is, irrespective of the basis of the detection criterion, the decision module 820 may be enabled to decide on a gradual onset or gradual removal of post filtering, so as to achieve smooth transitions. The gradual onset and removal may be controlled by adjusting the post filter gain.
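A gradual removal or onset may be sketched as a gain trajectory, as below. The linear shape of the ramp, its length in frames and the function names are assumptions made for illustration only; the description requires merely that the gain be adjusted gradually.

```python
def gain_ramp(start, target, num_frames):
    """Gains for the next `num_frames` frames, ending exactly at `target`.

    ASSUMPTION: a linear ramp; any monotonic trajectory would serve.
    """
    return [start + (target - start) * (k + 1) / num_frames
            for k in range(num_frames)]


def apply_post_filter_gain(decoded, removed, gain):
    """Post-filtered frame: decoded signal minus gain times the removed component."""
    return [d - gain * r for d, r in zip(decoded, removed)]
```

For example, `gain_ramp(0.5, 0.0, 4)` steps the gain down over four frames, after which `apply_post_filter_gain` returns the decoded frame unchanged.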
(19) FIG. 9 shows a conventional decoder operable in a frequency-domain decoding mode and a CELP decoding mode depending on the bit stream signal supplied to the decoder. Post filtering is applied whenever the CELP decoding mode is selected. An improvement of this decoder is illustrated in FIG. 10, which shows a decoder 1000 according to an embodiment of the invention. This decoder is operable not only in a frequency-domain-based decoding mode, wherein the frequency-domain decoding module 1013 is active, and a filtered CELP decoding mode, wherein the CELP decoding module 1011 and the post filter 1040 are active, but also in an unfiltered CELP mode, in which the CELP module 1011 supplies its signal to a compensation delay module 1043 via a bypass line 1044. A switch 1042 controls which decoding mode is currently used responsive to post filtering information contained in the bit stream signal provided to the decoder 1000. In this decoder and that of FIG. 9, the last processing step is effected by an SBR module 1050, from which the final audio signal is output.
(20) FIG. 11 shows a post filter 1100 suitable to be arranged downstream of a decoder 1199. The filter 1100 includes a post filtering module 1140, which is enabled or disabled by a control module (not shown), notably a binary or non-binary gain controller, in response to a post filtering signal received from a decision module 1120 within the post filter 1100. The decision module performs one or more tests on the signal obtained from the decoder to arrive at a decision whether the post filtering module 1140 is to be active or inactive. The decision may be taken along the lines of the functionality of the decision module 820 in FIG. 8, which uses the original signal and/or an intermediate decoded signal to predict the action of the post filter. The decision of the decision module 1120 may also be based on information similar to that used by the decision modules in those embodiments where an intermediate decoded signal is formed. As one example, the decision module 1120 may estimate a pitch frequency (unless this is readily extractable from the bit stream signal) and compute the energy content in the signal below the pitch frequency and between its harmonics. If this energy content is significant, it probably represents a relevant signal component rather than noise, which motivates a decision to disable the post filtering module 1140.
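The energy test mentioned as an example may be sketched as follows, assuming that the pitch frequency `f0` is already known. The plain DFT, the tolerance `tol_hz` around each harmonic, the threshold of 0.1 and the function names are all hypothetical choices; the description does not specify how "significant" energy is to be measured.

```python
import cmath
import math


def dft_power(x):
    """Power of DFT bins 0..N/2 of a real frame (direct O(N^2) evaluation)."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N))) ** 2
            for k in range(N // 2 + 1)]


def interharmonic_energy_ratio(x, fs, f0, tol_hz):
    """Share of non-DC energy lying below f0 or between its harmonics."""
    power = dft_power(x)
    N = len(x)
    total = sum(power[1:])
    inter = 0.0
    for k in range(1, len(power)):
        f = k * fs / N
        nearest_harmonic = max(1, round(f / f0)) * f0
        if abs(f - nearest_harmonic) > tol_hz:
            inter += power[k]
    return inter / total if total > 0.0 else 0.0


def should_disable_post_filter(x, fs, f0, tol_hz=20.0, threshold=0.1):
    # ASSUMPTION: energy is "significant" when its share exceeds `threshold`.
    return interharmonic_energy_ratio(x, fs, f0, tol_hz) > threshold
```

A frame containing only harmonics of the pitch frequency gives a ratio near zero, so post filtering stays enabled; adding a strong component below the pitch frequency raises the ratio and leads to a decision to disable the post filtering module.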
(21) A 6-person listening test has been carried out, during which music samples encoded and decoded according to the invention were compared with reference samples containing the same music coded while applying post filtering in the conventional fashion but maintaining all other parameters unchanged. The results confirm a perceived quality improvement.
(22) Further embodiments of the present invention will become apparent to a person skilled in the art after reading the description above. Even though the present description and drawings disclose embodiments and examples, the invention is not restricted to these specific examples. Numerous modifications and variations can be made without departing from the scope of the present invention, which is defined by the accompanying claims.
(23) The systems and methods disclosed hereinabove may be implemented as software, firmware, hardware or a combination thereof. Certain components or all components may be implemented as software executed by a digital signal processor or microprocessor, or be implemented as hardware or as an application-specific integrated circuit. Such software may be distributed on computer readable media, which may comprise computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to a person skilled in the art, computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Further, it is well known to the skilled person that communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.