Binaural Dialogue Enhancement
20220060838 · 2022-02-24
Assignee
- Dolby Laboratories Licensing Corporation (San Francisco, CA)
- Dolby International AB (Amsterdam Zuidoost, NL)
Inventors
- Leif Jonas Samuelsson (Sundbyberg, SE)
- Dirk Jeroen Breebaart (Ultimo, AU)
- David Matthew Cooper (Carlton, AU)
- Jeroen Koppens (Nederweert, NL)
CPC classification
- H04S3/00
- H04S1/002
- H04S2420/01
- H04S2420/03
- H04R5/04
- H04S3/02
- H04S3/008
International classification
- H04R5/04
- H04S3/00
Abstract
Methods for dialogue enhancing audio content are described, comprising: providing a first audio signal presentation of the audio components; providing a second audio signal presentation; receiving a set of dialogue estimation parameters configured to enable estimation of dialogue components from the first audio signal presentation; applying the set of dialogue estimation parameters to the first audio signal presentation to form a dialogue presentation of the dialogue components; and combining the dialogue presentation with the second audio signal presentation to form a dialogue enhanced audio signal presentation for reproduction on a second audio reproduction system, wherein at least one of the first and second audio signal presentations is a binaural audio signal presentation.
Claims
1. A method of processing immersive audio content, comprising: receiving a first audio signal presentation of the immersive audio content, the first audio signal presentation configured to reproduce on a first audio reproduction system; receiving a second audio signal presentation of the immersive audio content, the second audio signal presentation configured to reproduce on a second audio reproduction system; receiving a set of dialogue estimation parameters configured to enable estimation of dialogue components from the first audio signal presentation; forming a dialogue presentation of the dialogue components by applying the set of dialogue estimation parameters to the first audio signal presentation; and combining the dialogue presentation with the second audio signal presentation to form a dialogue enhanced audio signal presentation for reproduction on the second audio reproduction system, wherein at least one of the first or second audio signal presentation is a binaural audio signal presentation.
2. The method of claim 1, wherein the immersive audio content includes one or more spatial audio components.
3. The method of claim 1, wherein both said first and second audio signal presentations are binaural audio signal presentations.
4. The method of claim 1, wherein only one of said first and second audio signal presentations is a binaural audio signal presentation.
5. A system comprising: one or more processors; and a non-transitory computer readable medium storing instructions that, upon execution by the one or more processors, cause the one or more processors to perform operations of dialogue enhancing immersive audio content, the operations comprising: receiving a first audio signal presentation of the immersive audio content, the first audio signal presentation configured to reproduce on a first audio reproduction system; receiving a second audio signal presentation of the immersive audio content, the second audio signal presentation configured to reproduce on a second audio reproduction system; receiving a set of dialogue estimation parameters configured to enable estimation of dialogue components from the first audio signal presentation; forming a dialogue presentation of the dialogue components by applying the set of dialogue estimation parameters to the first audio signal presentation; and combining the dialogue presentation with the second audio signal presentation to form a dialogue enhanced audio signal presentation for reproduction on the second audio reproduction system, wherein at least one of the first or second audio signal presentation is a binaural audio signal presentation.
6. A non-transitory computer readable medium storing instructions that, upon execution by one or more processors, cause the one or more processors to perform operations of dialogue enhancing immersive audio content, the operations comprising: receiving a first audio signal presentation of the immersive audio content, the first audio signal presentation configured to reproduce on a first audio reproduction system; receiving a second audio signal presentation of the immersive audio content, the second audio signal presentation configured to reproduce on a second audio reproduction system; receiving a set of dialogue estimation parameters configured to enable estimation of dialogue components from the first audio signal presentation; forming a dialogue presentation of the dialogue components by applying the set of dialogue estimation parameters to the first audio signal presentation; and combining the dialogue presentation with the second audio signal presentation to form a dialogue enhanced audio signal presentation for reproduction on the second audio reproduction system, wherein at least one of the first or second audio signal presentation is a binaural audio signal presentation.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0029] Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings.
DETAILED DESCRIPTION
[0043] Systems and methods disclosed in the following may be implemented as software, firmware, hardware or a combination thereof. In a hardware implementation, the division of tasks referred to as “stages” in the below description does not necessarily correspond to the division into physical units; to the contrary, one physical component may have multiple functionalities, and one task may be carried out by several physical components in cooperation. Certain components or all components may be implemented as software executed by a digital signal processor or microprocessor, or be implemented as hardware or as an application-specific integrated circuit. Such software may be distributed on computer readable media, which may comprise computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to a person skilled in the art, the term computer storage media includes both volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Further, it is well known to the skilled person that communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
[0044] Various ways to implement embodiments of the invention will be discussed below with reference to the accompanying drawings.
[0045] In the presented embodiments the input signals are preferably analyzed in time/frequency tiles, for example by means of a filter bank such as a quadrature mirror filter (QMF) bank, a discrete Fourier transform (DFT), a discrete cosine transform (DCT), or any other means to split input signals into a variety of frequency bands. The result of such a transform is that an input signal $x_i[n]$, for input with index i and discrete-time index n, is represented by sub-band signals $x_i[b,k]$ for time slot (or frame) k and sub-band b. Consider for example the estimation of the binaural dialogue presentation from a stereo presentation. Let $x_j[b,k],\ j=1,2$ denote the sub-band signals of the left and right stereo channels, and $\hat{d}_i[b,k],\ i=1,2$ denote the sub-band signals of the estimated left and right binaural dialogue signals. The dialogue estimate may be computed as

$$\hat{d}_i[b,k]=\sum_{j=1}^{2}\sum_{m} w_{ijm}^{B_p,K}\,x_j[b,k-m],\qquad b\in B_p,\ k\in K,$$

with $B_p$, $K$ sets of frequency (b) and time (k) indices corresponding to a desired time/frequency tile, p the parameter band index, m a convolution tap index, and $w_{ijm}^{B_p,K}$ the dialogue estimation parameters for that tile.
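To make the estimator concrete, the following Python sketch applies one tile's dialogue estimation parameters to stereo sub-band signals. It is a minimal illustration under assumed array shapes and names (x, w, estimate_dialogue are not from the source), not the patent's implementation:

```python
import numpy as np

def estimate_dialogue(x, w):
    """Apply d_hat[i,b,k] = sum_j sum_m w[i,j,m] * x[j,b,k-m] within one tile.

    x: sub-band inputs, shape (J, bands, slots); w: parameters, shape (I, J, M).
    """
    I, J, M = w.shape
    d_hat = np.zeros((I,) + x.shape[1:], dtype=x.dtype)
    for m in range(M):
        # Delay every input channel by m time slots (zeros before the tile).
        x_delayed = np.zeros_like(x)
        x_delayed[:, :, m:] = x[:, :, : x.shape[2] - m]
        # Mix the delayed channels through the tap-m parameter matrix.
        d_hat += np.einsum('ij,jbk->ibk', w[:, :, m], x_delayed)
    return d_hat

# Example: J=2 stereo channels, 8 bands, 16 slots, M=2 convolution taps.
rng = np.random.default_rng(0)
d_hat = estimate_dialogue(rng.standard_normal((2, 8, 16)),
                          0.1 * rng.standard_normal((2, 2, 2)))
```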
[0046] The dialogue parameters w may be computed in the encoder, and encoded using techniques disclosed in U.S. Provisional Patent Application Ser. No. 62/209,735, filed Aug. 25, 2015, hereby incorporated by reference. The parameters w are then transmitted in the bitstream and decoded by a decoder prior to application using the above equation. Due to the linear nature of the estimate, the encoder computation can be implemented using minimum mean squared error (MMSE) methods in cases where the target signal (the clean dialogue or an estimate of the clean dialogue) is available.
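Because the estimator is linear, the MMSE solution reduces to a regularized least-squares problem per tile. The following is a minimal sketch under the same assumed shapes, restricted to a single tap (m = 0) for brevity; the regularizer eps is an assumption, not from the source:

```python
import numpy as np

def mmse_parameters(X, D, eps=1e-9):
    """Solve W = argmin ||X W - D||^2, i.e. W = (X^H X + eps*I)^(-1) X^H D.

    X: vectorized tile inputs (samples x J); D: target dialogue (samples x I).
    """
    J = X.shape[1]
    gram = X.conj().T @ X + eps * np.eye(J)  # regularized Gram matrix
    return np.linalg.solve(gram, X.conj().T @ D)

# Example: recover a known 2x2 mixing matrix from noisy observations.
rng = np.random.default_rng(1)
X = rng.standard_normal((128, 2))
W_true = np.array([[0.8, 0.1], [0.2, 0.7]])
D = X @ W_true + 0.01 * rng.standard_normal((128, 2))
W = mmse_parameters(X, D)  # approximately W_true
```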
[0047] The choice of P (the number of parameter bands p), and the choice of the number of time slots in K, is a trade-off between quality and bit rate. Furthermore, the parameters w can be constrained in order to lower the bit rate (at the cost of lower quality), e.g., by restricting the form of $w_{ijm}^{B_p,K}$.
[0048] In general it is proposed to use estimators of the form

$$\hat{y}_i[b,k]=\sum_{j=1}^{J}\sum_{m} w_{ijm}^{B_p,K}\,x_j[b,k-m],\qquad i=1,\ldots,I,$$

where at least one of $\hat{y}$ and $x$ is a binaural signal, i.e., I=2 or J=2 or I=J=2. For notational convenience we will in the following often omit the time/frequency tile indexing $B_p$, $K$ as well as the i, j, m indexing when referring to different parameter sets used to estimate dialogue.
[0049] The above estimator can conveniently be expressed in matrix notation as (omitting the time/frequency tile indexing for ease of notation)

$$\hat{Y}=\sum_{m} X_m W_m,$$

where $X_m=[x_1(m)\ \ldots\ x_J(m)]$ and $\hat{Y}=[\hat{y}_1\ \ldots\ \hat{y}_I]$ contain vectorized versions of $x_j[b,k-m]$ and $\hat{y}_i[b,k]$, respectively, in their columns, and $W_m$ is a parameter matrix with J rows and I columns. The above form of the estimator may be used when performing only dialogue extraction, or when performing only a presentation transform, as well as in the case where both extraction and presentation transform are done using a single set of parameters, as is detailed in embodiments below.
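In this matrix form, applying the parameters is just a sum of matrix products. A minimal sketch (array shapes and names are assumptions):

```python
import numpy as np

def apply_estimator(X_taps, W_taps):
    """Y_hat = sum_m X_m @ W_m, with X_m of shape (N, J) and W_m of shape (J, I)."""
    return sum(X @ W for X, W in zip(X_taps, W_taps))

# Example with two taps, N=128 vectorized samples, J=2 inputs, I=2 outputs.
rng = np.random.default_rng(2)
X_taps = [rng.standard_normal((128, 2)) for _ in range(2)]
W_taps = [0.1 * rng.standard_normal((2, 2)) for _ in range(2)]
Y_hat = apply_estimator(X_taps, W_taps)  # shape (128, 2)
```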
[0050] With reference to
[0051] According to the present invention, at least one of the presentations is a binaural presentation (echoic or anechoic). As will be further discussed in the following, the first and second presentations may be different, and the dialogue presentation may or may not correspond to the second presentation. For example, the first audio signal presentation may be intended for playback on a first audio reproduction system, e.g. a set of loudspeakers, while the second audio signal presentation may be intended for playback on a second audio reproduction system, e.g. headphones.
Single Presentation
[0052] In the decoder embodiment in
[0053] In the embodiment in
Two Presentations
[0054] In the decoder embodiment in
[0055] As indicated in
[0056] In
[0057] Further, it is noted that the dialogue extraction can be one dimensional, such that the extracted dialogue is a mono representation. The transform parameters D2 are then positional metadata, and the presentation transform comprises rendering the mono dialogue using HRTFs, HRIRs or BRIRs corresponding to the position. Alternatively, if the desired rendered dialogue presentation is intended for loudspeaker playback, the mono dialogue could be rendered using loudspeaker rendering techniques such as amplitude panning or vector-based amplitude panning (VBAP).
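For the loudspeaker case, a simple constant-power pan law illustrates the kind of amplitude panning referred to above (VBAP generalizes this to arbitrary loudspeaker layouts). The following is a minimal two-speaker sketch; the mapping from positional metadata to the pan value is an assumption:

```python
import numpy as np

def pan_mono_dialogue(d_mono, pan):
    """Constant-power stereo pan of a mono dialogue signal.

    pan in [-1, 1]: -1 = full left, +1 = full right.
    """
    theta = (pan + 1.0) * np.pi / 4.0               # map pan to [0, pi/2]
    g_left, g_right = np.cos(theta), np.sin(theta)  # g_left^2 + g_right^2 = 1
    return np.stack([g_left * d_mono, g_right * d_mono])

left_right = pan_mono_dialogue(np.ones(4), pan=-0.5)  # biased toward the left
```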
Simulcast Implementation
[0059] As illustrated in
[0060] In the embodiment in
[0062] In the embodiment in
[0064] It is noted that the set of parameters w(D1) may be identical to the dialogue enhancement parameters used to provide dialogue enhancement of the stereo signal in a simulcast implementation. This alternative is illustrated in
[0066] In one embodiment, the aforementioned dedicated presentation transform w(D2) in
[0068] It is noted that combining signals with different presentations, e.g., adding a stereo dialogue signal to a binaural signal (which contains non-enhanced binaural dialogue components), naturally leads to spatial imaging artifacts, since the non-enhanced binaural dialogue components are perceived as spatially different from a stereo presentation of the same components.
[0069] It is further noted that combining signals with different presentations can lead to constructive summing of dialogue components in certain frequency bands, and destructive summing in other frequency bands. The reason is that binaural processing introduces interaural time differences, ITDs (i.e., phase differences), so signals are summed that are in phase in certain frequency bands and out of phase in others, leading to coloring artifacts in the dialogue components (moreover, the coloring can differ between the left and right ear). In one embodiment, phase differences above the phase/magnitude cut-off frequency are avoided in the binaural processing so as to reduce this type of artifact.
[0070] As a final note on the case of combining signals with different presentations, it is acknowledged that, in general, binaural processing can reduce the intelligibility of dialogue. In cases where the goal of dialogue enhancement is to maximize intelligibility, it may therefore be advantageous to extract and level modify (e.g., boost) a dialogue signal that is non-binaural. To elaborate, even if the final presentation intended for playback is binaural, it may in such a case be advantageous to extract and level modify (e.g., boost) a stereo dialogue signal and combine that with the binaural presentation, trading off the coloring and spatial imaging artifacts described above for increased intelligibility.
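A minimal sketch of that intelligibility-oriented variant, assuming the binaural mix already contains the dialogue at its original level so that only the excess (G - 1) portion of the stereo dialogue estimate is added (signal names and the default gain are illustrative, not from the source):

```python
import numpy as np

def enhance_with_stereo_dialogue(binaural, dialogue_stereo, gain_db=6.0):
    """Boost dialogue by adding (G - 1) times a stereo dialogue estimate
    to the binaural presentation, trading spatial/coloring artifacts for
    intelligibility (assumes an accurate dialogue estimate)."""
    G = 10.0 ** (gain_db / 20.0)
    return binaural + (G - 1.0) * dialogue_stereo
```

If the estimate were perfect and the presentations matched, the dialogue level in the output would be scaled exactly by G; with a binaural mix the combination is only approximate, which is the trade-off described above.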
[0071] In the embodiment in
[0073] In some applications, it may be desirable to apply different processing depending on the desired value of the dialogue level modification factor G. In one embodiment, for example, appropriate processing is selected based on a determination of whether the factor G is greater or smaller than a given threshold. Of course, there may also be more than one threshold, and more than one alternative processing: for example, a first processing when G<th1, a second processing when th1<=G<th2, and a third processing when G>=th2, where th1 and th2 are two given threshold values.
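A minimal sketch of such threshold-based selection with two thresholds (the threshold values and path labels are placeholders, not from the source):

```python
def select_processing(G, th1=1.0, th2=4.0):
    """Pick one of three processing alternatives from the desired
    dialogue level modification factor G."""
    if G < th1:
        return "first processing"   # G < th1
    elif G < th2:
        return "second processing"  # th1 <= G < th2
    else:
        return "third processing"   # G >= th2
```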
[0074] In a specific example, illustrated in
[0075] When the switch is in position A, the circuit is configured to combine the estimated stereo dialogue from matrix transform 86 with the stereo signal z, and then to perform the matrix transform 73 on the combined signal to generate a reconstructed anechoic binaural signal. The output from the feedback delay network 75 is then combined with this signal in 78. It is noted that this processing essentially corresponds to
[0076] When the switch is in position B, the circuit is configured to apply the transform parameters w(D2) to the stereo dialogue from matrix transform 86 in order to provide a binaural dialogue estimate. This estimate is then added to the anechoic binaural signal from transform 73 and to the output from the feedback delay network 75. It is noted that this processing essentially corresponds to
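The two switch positions can be summarized as combine-then-transform versus transform-then-combine. The following is a minimal sketch with placeholder callables for the matrix transforms and the feedback delay network output; all names are illustrative, and details of blocks 73, 75, 78 and 86 are omitted:

```python
def render_position_a(z, d_stereo, to_binaural, fdn_out):
    # Position A: combine the estimated stereo dialogue with the stereo
    # signal z, transform the combined signal to anechoic binaural, then
    # add the feedback delay network (late reverberation) output.
    return to_binaural(z + d_stereo) + fdn_out

def render_position_b(z, d_stereo, to_binaural, dialogue_to_binaural, fdn_out):
    # Position B: transform the stereo dialogue separately to binaural
    # (parameters w(D2)), then add it to the anechoic binaural signal
    # and the feedback delay network output.
    return to_binaural(z) + dialogue_to_binaural(d_stereo) + fdn_out
```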
[0077] The skilled person will realize that many other alternatives exist for the processing in positions A and B, respectively. For example, the processing when the switch is in position B could instead correspond to that in
Interpretation
[0078] Reference throughout this specification to “one embodiment”, “some embodiments” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment”, “in some embodiments” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may be. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.
[0079] As used herein, unless otherwise specified, the use of the ordinal adjectives “first”, “second”, “third”, etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
[0080] In the claims below and the description herein, any one of the terms comprising, comprised of or which comprises is an open term that means including at least the elements/features that follow, but not excluding others. Thus, the term comprising, when used in the claims, should not be interpreted as being limitative to the means or elements or steps listed thereafter. For example, the scope of the expression a device comprising A and B should not be limited to devices consisting only of elements A and B. Any one of the terms including or which includes or that includes as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others. Thus, including is synonymous with and means comprising.
[0081] As used herein, the term “exemplary” is used in the sense of providing examples, as opposed to indicating quality. That is, an “exemplary embodiment” is an embodiment provided as an example, as opposed to necessarily being an embodiment of exemplary quality.
[0082] It should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.
[0083] Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
[0084] Furthermore, some of the embodiments are described herein as a method or combination of elements of a method that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor with the necessary instructions for carrying out such a method or element of a method forms a means for carrying out the method or element of a method. Furthermore, an element described herein of an apparatus embodiment is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.
[0085] In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
[0086] Similarly, it is to be noticed that the term coupled, when used in the claims, should not be interpreted as being limited to direct connections only. The terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. Thus, the scope of the expression a device A coupled to a device B should not be limited to devices or systems wherein an output of device A is directly connected to an input of device B. It means that there exists a path between an output of A and an input of B which may be a path including other devices or means. “Coupled” may mean that two or more elements are either in direct physical or electrical contact, or that two or more elements are not in direct contact with each other but yet still co-operate or interact with each other.
[0087] Thus, while specific embodiments of the invention have been described, those skilled in the art will recognize that other and further modifications may be made thereto without departing from the spirit of the invention, and it is intended to claim all such changes and modifications as falling within the scope of the invention. For example, any formulas given above are merely representative of procedures that may be used. Functionality may be added to or deleted from the block diagrams, and operations may be interchanged among functional blocks. Steps may be added to or deleted from methods described, within the scope of the present invention.