APPARATUS AND METHOD FOR WEIGHTING STEREO AUDIO SIGNALS
20190306650 · 2019-10-03
Inventors
Cpc classification
H04S1/002 (ELECTRICITY)
H04S7/302 (ELECTRICITY)
H04R5/04 (ELECTRICITY)
H04R2499/11 (ELECTRICITY)
G10H2210/305 (PHYSICS)
G10H2210/301 (PHYSICS)
H04S3/008 (ELECTRICITY)
H04S2400/01 (ELECTRICITY)
International classification
H04S7/00 (ELECTRICITY)
H04R5/04 (ELECTRICITY)
H04S3/00 (ELECTRICITY)
Abstract
A signal generator has a filter bank that provides weighted versions of audio signals to speakers. The weights were derived by identifying a first constraint that limits a weight that can be applied to an audio signal to be provided to a first speaker. A characteristic of a second speaker that affects how a user will perceive audio signals output by that speaker relative to audio signals output by the first speaker was also determined. A second constraint was determined based on the determined characteristic and the first constraint. The weights were then determined so as to minimize a difference between an actual balance of each signal that is expected to be heard by a user and a target balance. The signal generator can achieve sweet spot correction and sound stage widening simultaneously. It also achieves a balanced sound stage, particularly when the speakers are asymmetric.
Claims
1. A signal generator comprising: an input configured to receive at least two audio signals; and one or more filters configured to apply weights to the at least two audio signals to generate weighted audio signals and to provide the weighted audio signals to at least two speakers; wherein the weights applied by the one or more filters to the audio signals are derived by: identifying a first constraint that limits a weight that can be applied to an audio signal to be provided to a first speaker; determining a characteristic of a second speaker that affects how a user would perceive audio signals output by the second speaker relative to audio signals output by the first speaker; determining a second constraint based on the characteristic of the second speaker and the first constraint; and determining the weights so as to minimize a difference between an actual balance of each signal that is expected to be heard by the user when the weighted audio signals are output by the first and second speakers and a target balance, wherein the weights to be applied to audio signals to be provided to the first speaker are based on the first constraint, and the weights applied to audio signals to be provided to the second speaker are based on the second constraint.
2. The signal generator according to claim 1, wherein the weights applied by the one or more filters are derived by: determining an attenuation factor for stereo balancing based on the characteristic of the second speaker; and determining the first constraint based on the attenuation factor.
3. The signal generator according to claim 1, wherein the first and second speakers are different distances away from the user, and wherein the weights applied by the one or more filters are derived by determining the characteristic of the second speaker to be a relative distance of the second speaker from the user compared with the first speaker from the user.
4. The signal generator according to claim 3, wherein the weights applied by the one or more filters are derived by determining the relative distance to be:
5. The signal generator according to claim 1, wherein the first and second speakers have different frequency responses, and wherein the weights applied by the one or more filters are derived by determining the characteristic of the second speaker to be a relative frequency response of the second speaker compared with the first speaker.
6. The signal generator according to claim 5, wherein the weights applied by the one or more filters are derived by determining the relative frequency response to be:
7. The signal generator according to claim 1, wherein the weights applied by the one or more filters are derived by determining the first constraint to be a maximum gain associated with the at least two speakers.
8. The signal generator according to claim 7, wherein the at least two speakers are located in a car, and wherein the first constraint is a maximum gain associated with the most distant speaker to the user of the at least two speakers.
9. The signal generator according to claim 1, wherein the weights applied by the one or more filters are derived by determining the weights such that a sum of the squares of the weights to be applied to the audio signals to be provided to one speaker of the at least two speakers does not exceed a constraint for the one speaker.
10. The signal generator according to claim 1, wherein the weights applied by the one or more filters are derived by determining the target balance based on a physical arrangement of the at least two speakers relative to the user.
11. The signal generator according to claim 1, wherein the weights applied by the one or more filters are derived by determining the target balance so as to simulate speakers that are symmetrically arranged with respect to the user.
12. The signal generator according to claim 1, wherein the weights applied by the one or more filters are derived by determining the target balance so as to simulate speakers that are further apart than the at least two speakers.
13. A method comprising: receiving at least two audio signals, identifying a first constraint that limits a weight that can be applied to an audio signal to be provided to a first speaker; determining a characteristic of a second speaker that affects how a user would perceive audio signals output by the second speaker relative to audio signals output by the first speaker; determining a second constraint based on the characteristic of the second speaker and the first constraint; determining weights to apply to the at least two audio signals to generate weighted audio signals so as to minimize a difference between an actual balance of each signal that is expected to be heard by the user when the weighted audio signals are output by the first and second speakers and a target balance, wherein the weights applied to audio signals to be provided to the first speaker are based on the first constraint, and the weights applied to audio signals to be provided to the second speaker are based on the second constraint; applying the weights to the audio signals to generate the weighted audio signals; and providing the weighted audio signals to at least two speakers including the first speaker and the second speaker.
14. A non-transitory machine readable storage medium having stored thereon processor executable instructions for controlling a computer to carry out the following operations: receiving at least two audio signals; identifying a first constraint that limits a weight that can be applied to an audio signal to be provided to a first speaker; determining a characteristic of a second speaker that affects how a user would perceive audio signals output by the second speaker relative to audio signals output by the first speaker; determining a second constraint based on the characteristic of the second speaker and the first constraint; determining weights to apply to the at least two audio signals to generate weighted audio signals so as to minimize a difference between an actual balance of each signal that is expected to be heard by the user when the weighted audio signals are output by the first and second speakers and a target balance, wherein the weights applied to audio signals to be provided to the first speaker are based on the first constraint, and the weights applied to audio signals to be provided to the second speaker are based on the second constraint; applying the weights to the audio signals to generate the weighted audio signals; and providing the weighted audio signals to at least two speakers including the first speaker and the second speaker.
15. The method according to claim 13, further comprising: determining an attenuation factor for stereo balancing based on the characteristic of the second speaker; and determining the first constraint based on the attenuation factor.
16. The method according to claim 13, wherein the first and second speakers are different distances away from the user, the method further comprising: determining the characteristic of the second speaker to be a relative distance of the second speaker from the user compared with the first speaker from the user.
17. The method according to claim 13, wherein the first and second speakers have different frequency responses, the method further comprising: determining the characteristic of the second speaker to be a relative frequency response of the second speaker compared with the first speaker.
18. The machine readable storage medium according to claim 14, the operations further comprising: determining an attenuation factor for stereo balancing based on the characteristic of the second speaker; and determining the first constraint based on the attenuation factor.
19. The machine readable storage medium according to claim 14, wherein the first and second speakers are different distances away from the user, the operations further comprising: determining the characteristic of the second speaker to be a relative distance of the second speaker from the user compared with the first speaker from the user.
20. The machine readable storage medium according to claim 14, wherein the first and second speakers have different frequency responses, the operations further comprising: determining the characteristic of the second speaker to be a relative frequency response of the second speaker compared with the first speaker.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0022] The present disclosure will now be described by way of example with reference to the accompanying drawings. In the drawings:
DETAILED DESCRIPTION OF EMBODIMENTS
[0030] An example of a signal generator is shown in
[0031] The precalculated weights are preferably derived using a multi-constraint optimisation technique that is described in more detail below. This technique is adapted to derive weights that can achieve sound stage balancing for asymmetric speaker arrangements. A speaker arrangement might be asymmetric because one speaker is further from the user than another speaker (e.g. in a car). A speaker arrangement might also be asymmetric because one speaker has a different impulse response from another speaker (e.g. in a smartphone scenario). The signal generator 100 is configured to achieve sound stage widening and sweet spot correction simultaneously.
[0032] In some embodiments, the signal generator may include a data store 105 for storing a plurality of different sets of filter weights. Each filter set might be applicable to a different scenario. The filter bank may be configured to use a set of filter weights in dependence on user input and/or internally or externally generated observations that suggest a particular scenario is applicable. For example, where the signal generator is providing audio signals to a stereo system in a car, the user might usually want to optimise the sound stage for the driver, but the sound stage could also be optimised for one of the passengers. This might be an option that a user could select via a user interface associated with the car stereo system. In another example, the appropriate weights to achieve sound stage optimisation might depend on how a mobile device such as a smart phone is being used. For example, different weights might be appropriate if the device's sensors indicate that it is positioned horizontally on a flat surface than if sensor outputs indicate that the device is positioned vertically and possibly near the user's face.
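As an illustration of the scenario-dependent selection described in [0032], the sketch below picks one stored weight set from a dictionary standing in for the data store 105. The names `DeviceState`, `select_scenario` and `select_weights`, and the scenario keys, are hypothetical; a real device could base the choice on richer user-interface and sensor inputs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# A weight set holds one 2x2 (complex) weight matrix per frequency bin.
WeightSet = List[List[List[complex]]]


@dataclass
class DeviceState:
    """Hypothetical observations that hint at the current playback scenario."""
    user_choice: Optional[str] = None  # e.g. "driver" or "passenger" chosen in a car UI
    lying_flat: bool = False           # e.g. inferred from an orientation sensor


def select_scenario(state: DeviceState) -> str:
    # An explicit user selection takes priority over sensor-based inference.
    if state.user_choice is not None:
        return state.user_choice
    return "flat" if state.lying_flat else "upright"


def select_weights(store: Dict[str, WeightSet], state: DeviceState, default: str) -> WeightSet:
    # Fall back to a default set if nothing is stored for the inferred scenario.
    return store.get(select_scenario(state), store[default])
```

Keeping the selection separate from the weight computation means the sets can be precalculated offline and swapped at run time without re-running the optimisation.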
[0033] In many implementations the signal generator is likely to form part of a larger device. That device could be, for example, a mobile phone, smart phone, tablet, laptop, stereo system or any generic user equipment, particularly user equipment with audio playback capability.
[0034] The structures shown in
[0035] One common example of an asymmetric speaker arrangement occurs in cars. This is a scenario in which sound stage widening can be particularly beneficial.
[0036] An example of a system structure for determining filter weights that can be used to address the type of unbalanced speaker arrangement illustrated in
[0037] The system structure has, as its inputs 301, the original left and right stereo sound signals. These are the audio signals to be output by the loudspeakers. The system structure is described below with specific reference to an example that involves two audio signals, one for a left-hand speaker and one for a right-hand speaker, but the techniques described below can readily be extended to more than two audio channels.
[0038] Functional blocks 302 to 305 are largely configured to mimic what happens as the input audio signals 301 are output by a loudspeaker and travel through the air to be heard by a listener. Very low and high frequencies are expected to be bypassed, which is represented in the system structure of
[0039] The frequency-dependent transfer functions $h_{ml}(k)$ for sound propagation from the loudspeakers to a listener's ears are determined by the positions of the loudspeakers and the positions of the listener's ears. This is illustrated in
$h_{11}(k)$, $h_{12}(k)$, $h_{21}(k)$ and $h_{22}(k)$ can be determined using the spherical head model, based on the respective loudspeaker and listener positions.
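The spherical head model itself is not reproduced here. As a simplified stand-in, the following sketch computes each $h_{ml}(k)$ with a free-field point-source model (1/r spreading plus a propagation delay), which ignores head shadowing and diffraction. The speed of sound, the 2-D coordinates and the function names are assumptions for illustration only.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def free_field_tf(src, ear, freq_hz):
    """Point-source free-field transfer function from src to ear (2-D positions in metres).

    Simplified stand-in for the spherical head model: 1/r amplitude spreading and a
    phase term for the propagation delay; head shadowing and diffraction are ignored.
    """
    r = math.dist(src, ear)
    return cmath.exp(-2j * math.pi * freq_hz * r / SPEED_OF_SOUND) / r


def plant_matrix(speakers, ears, freq_hz):
    # H[m][l] is the transfer function from loudspeaker l to ear m, i.e. h_ml(k).
    return [[free_field_tf(spk, ear, freq_hz) for spk in speakers] for ear in ears]


# Example: ears 17.5 cm apart, loudspeakers placed asymmetrically (car-like layout).
ears = [(-0.0875, 0.0), (0.0875, 0.0)]
speakers = [(-0.5, 1.0), (0.8, 1.0)]
H = plant_matrix(speakers, ears, 1000.0)
```

A spherical head model would replace `free_field_tf` with a frequency-dependent head-related response, but the resulting 2×2 matrix per bin would be used in the same way.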
[0040] In the system of
[0041] For each frequency bin k, it is possible to formulate an optimization with two (and possibly more than two) constraints. This formulation starts by denoting a loudspeaker weights matrix of dimension 2×2:
$$W(k) = \begin{bmatrix} w_{11}(k) & w_{12}(k) \\ w_{21}(k) & w_{22}(k) \end{bmatrix}$$
[0042] The diagonal elements of W(k) represent the ipsilateral filter gains for the left stereo channel and for the right stereo channel. The off-diagonal elements represent the contralateral filter gains for the two channels. The gains are specific to frequency bins, so the matrix is in the frequency domain.
[0043] The short-time Fourier transform (STFT) coefficients for the stereo sound signals can be denoted $s_n(k)$ ($n \in \{1, 2\}$), where n is the channel index. The STFT coefficients can be computed by dividing the audio signal into short segments of equal length and then computing an FFT separately on each short segment. The STFT coefficients thus have an amplitude and a time extension. The left channel has n = 1 and the right channel has n = 2. The playback signal which drives the l-th speaker can therefore be written as:
where $l \in \{1, 2\}$. This represents an audio signal that is bandpass filtered into separate frequency bins, with each frequency bin being separately weighted before playback.
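A minimal sketch of the two steps described in [0043], assuming non-overlapping rectangular frames and a naive DFT in place of an FFT: each channel is split into equal-length segments and transformed, and each frequency bin of the left and right channels is then mixed through its own 2×2 weight matrix to give the drive signal for each loudspeaker. The function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) DFT of one frame; an FFT gives the same result faster."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]


def stft(signal, frame_len):
    """Split the signal into non-overlapping frames of equal length and transform each one.

    Windowing and frame overlap are omitted for brevity.
    """
    n_frames = len(signal) // frame_len
    return [dft(signal[i * frame_len:(i + 1) * frame_len]) for i in range(n_frames)]


def weight_frame(weights, s1, s2):
    """Apply per-bin 2x2 weights to one STFT frame of the left (s1) and right (s2) channels.

    weights[k][l][n] is w_ln(k). Returns the per-bin drive signals for loudspeakers 1 and 2,
    each being sum_n w_ln(k) * s_n(k).
    """
    out1, out2 = [], []
    for k, w in enumerate(weights):
        out1.append(w[0][0] * s1[k] + w[0][1] * s2[k])
        out2.append(w[1][0] * s1[k] + w[1][1] * s2[k])
    return out1, out2
```

For example, using `[[0, 1], [1, 0]]` in every bin simply swaps the channels, while a diagonal matrix applies only ipsilateral gains.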
[0044] Referring to the physical arrangement of the two speakers relative to the user that is illustrated in
where $m \in \{1, 2\}$.
[0045] The weights applied to the audio signals before playback thus combine with the transfer functions determined using the spherical head model to form response coefficients $b_{mn}(k)$:
$$b_{mn}(k) = \sum_{l=1}^{2} h_{ml}(k)\,w_{ln}(k)$$
[0046] The response coefficients transform the left and right channel signals $s_1(k)$ and $s_2(k)$ into the signals $y_m(k)$ ($m \in \{1, 2\}$) that are perceived by the listener. The weights $w_{ln}(k)$ can, in principle, be freely chosen. The transfer functions $h_{ml}(k)$ are fixed by the geometry of the system.
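The relationship between weights, transfer functions and response coefficients is a 2×2 matrix product per frequency bin (written $B(k) = H(k)W(k)$ here, with $H(k) = [h_{ml}(k)]$), and the ear signals follow by applying $B(k)$ to the channel signals. A minimal sketch with hypothetical helper names:

```python
def matmul2(a, b):
    """Product of two 2x2 (complex) matrices stored as nested lists."""
    return [[sum(a[i][j] * b[j][c] for j in range(2)) for c in range(2)] for i in range(2)]


def response_coefficients(h, w):
    # b_mn(k) = sum_l h_ml(k) * w_ln(k), i.e. B(k) = H(k) W(k) for one frequency bin.
    return matmul2(h, w)


def ear_signals(b, s):
    # y_m(k) = sum_n b_mn(k) * s_n(k): what each ear receives in this bin.
    return [b[m][0] * s[0] + b[m][1] * s[1] for m in range(2)]
```

With identity weights, $B(k)$ equals $H(k)$, so a left-only input reaches the right ear through the crosstalk term $h_{21}(k)$; choosing non-trivial weights is what lets the system reshape that crosstalk.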
[0047] The aim is to choose weights $w_{ln}(k)$ for the actual setup such that the resulting response coefficients $b_{mn}(k)$ are identical, or at least close, to the response coefficients of a desired virtual setup:
$$b_{mn}(k) \approx \hat{b}_{mn}(k)$$
[0048] The 2×2 matrix $\hat{b}(k) = [\hat{b}_{mn}(k)]$ associated with the virtual setup represents a desired frequency response observed at the listener's ears. The target matrix $\hat{b}(k)$ is preferably selected such that the resulting filters show minimal pre-echoes, which leads to good-quality playback and better perception of sound stage widening.
[0049] The desired virtual setup is an imaginary setup in which the two loudspeakers are positioned more favourably than in the actual setup, in terms of both sound stage widening and good playback quality. An example of a desired virtual set-up is shown in
[0050] For car scenarios, in which two loudspeakers are usually asymmetrically positioned with respect to the driver, it is often desirable to physically widen at least one of the speakers. Referring to the physical arrangement of the two speakers relative to the user that is illustrated in
[0051] For smart phone scenarios, the two loudspeakers are usually symmetrically positioned with respect to the user. In this scenario the first and second columns of the $\hat{b}(k)$ matrix may represent the frequency responses of a symmetrical pair of left and right virtual speakers, with those virtual sources having a wider spatial interval than the physical speakers. The asymmetry in the smart phone scenario is linked to the frequency responses of the speakers rather than their physical arrangement. The two physical speakers are likely to have different frequency responses.
[0052] Returning to the system structure of
[0053] One option would be for the system to determine the filter weights directly as soon as the plant matrix and the set of desirable response coefficients have been determined (e.g. by means of equation (6)). This is not optimal, however, as it does not account for one or more constraints that are inherent in the physical speaker arrangement and that can affect how the user will perceive the audio signals output by the different speakers. In particular, there may be physical constraints that limit a weight that can be applied to audio signals before they are supplied to a physical loudspeaker. One such constraint is associated with the upper gain limit for a particular loudspeaker. This constraint may be denoted N.
[0054] In the system structure of
$\|w_{(1,:)}(k)\|^2 \le N_1$, that is, $\sum_{n=1}^{2} |w_{1,n}(k)|^2 \le N_1$, and
$\|w_{(2,:)}(k)\|^2 \le N_2$, that is, $\sum_{n=1}^{2} |w_{2,n}(k)|^2 \le N_2 \qquad (7)$
[0055] That is, the sum of the squared magnitudes of the weights for each speaker must not exceed the constraint for that speaker.
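The per-speaker limits of equation (7) amount to a bound on the squared norm of each row of $W(k)$. A small check, assuming weights stored as nested lists indexed `[l][n]` (speaker, channel); the function names and the tolerance are hypothetical:

```python
def row_power(w, l):
    # Sum of squared magnitudes of the weights that feed loudspeaker l (row l of W(k)).
    return sum(abs(x) ** 2 for x in w[l])


def satisfies_constraints(w, limits, tol=1e-12):
    """True if every loudspeaker row of W(k) respects sum_n |w_ln(k)|^2 <= N_l."""
    return all(row_power(w, l) <= limits[l] + tol for l in range(len(w)))
```

Because the constraint acts on whole rows, a large contralateral weight for one speaker leaves less room for its ipsilateral weight, and vice versa.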
[0056] The constraint derivation unit may determine that one of the constraints is set by a maximum gain associated with both speakers. This sets an upper limit on the filter gain for either speaker. For example, if the two loudspeakers have different gain limits, the upper limit for the speaker pair may be the lower of those gain limits. The upper limit might also be affected by the loudspeakers' respective positions with respect to the user and/or their respective frequency responses. For example, if the two loudspeakers are asymmetrically positioned with respect to the user, the upper limit may be determined by the loudspeaker that is further away. This is expected to apply particularly where the audio signals are provided to speakers in a car. For mobile devices, either speaker can usually provide the upper gain limit. This is described in more detail below with respect to the scenario illustrated in
[0057] The constraint derivation unit 307 may be configured to use a preset upper gain limit (6 dB might be a suitable example) and assign this to whichever speaker the upper limit is considered more appropriate for. For example, in
[0058] Often, the same constraint will not be applicable to all speakers. This can be because of inherent differences between the speakers themselves and/or because of differences in the way those speakers are physically arranged with respect to the user. The constraint derivation unit 307 is preferably configured to address this by determining a characteristic of one speaker that affects how the user will perceive audio signals output by that speaker relative to audio signals output by another speaker (step S604). The aim is to create a balanced sound stage, in which the user perceives the stereo signals as being output equally by the virtual speakers.
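The preset limit of [0057] could be turned into a value of N along the following lines. This sketch assumes that N bounds a sum of squared magnitudes (a power quantity), so that 6 dB corresponds to a ratio of about 3.98, and it follows the car heuristic of [0056] by assigning the limit to the loudspeaker furthest from the listener. Both the conversion convention and the function names are assumptions.

```python
def db_to_power_ratio(gain_db):
    # Assumes N bounds a power-like quantity (sum of |w|^2), hence 10*log10 scaling.
    return 10.0 ** (gain_db / 10.0)


def assign_upper_limit(distances, gain_db=6.0):
    """Return (index of the speaker receiving the preset limit, limit as a power ratio).

    Car heuristic: the preset limit goes to the loudspeaker furthest from the listener.
    """
    far = max(range(len(distances)), key=lambda i: distances[i])
    return far, db_to_power_ratio(gain_db)
```

For a mobile device, where either speaker may carry the limit, the index would instead be chosen by whatever rule the device applies.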
[0059] In one example, the constraint derivation unit 307 is configured to quantify this characteristic of the other loudspeaker by determining an attenuation factor for stereo balancing. The attenuation factor is denoted $\alpha(k)$, and the constraint for the other speaker can be determined as:
$$N_1 = \alpha(k)\,N_2 \qquad (8)$$
[0060] For a typical car scenario, the constraint derivation unit 307 may assume that the speakers are essentially the same (so they have the same frequency response and the same gain limit), meaning that the characteristic that determines how the user will perceive audio signals depends on the relative distances between each respective speaker and the user. In this scenario, $\alpha(k)$ can be derived using distance-based amplitude panning (DBAP):
[0061] In
[0062] For a typical smartphone scenario, the constraint derivation unit 307 may assume that the speakers are the same distance from the user but have different frequency responses. In this scenario, $\alpha(k)$ can be derived from the measured impulse responses of the left and right speaker/receiver:
where $t_l(k)$ and $t_r(k)$ are the frequency responses of the left-hand and right-hand speakers at frequency k, respectively.
[0063] The constraint derivation unit may be provided with the appropriate frequency responses 309. Frequency responses of virtual sources can be determined, for example, from the online CIPIC HRTF database available from the University of California, Davis.
[0064] Having determined the characteristic of the second speaker that will affect how the user perceives audio signals output by that speaker compared with audio signals output by the first speaker, the constraint derivation unit is able to determine the constraint for the second speaker from the constraint for the first speaker and the determined characteristic, e.g. by applying equation (8) (step S605).
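Equation (8) scales the first constraint by the attenuation factor. Because the DBAP and frequency-response expressions for that factor are not reproduced here, the sketch below substitutes a hypothetical inverse-square-law factor for the car case: under 1/r spreading, the nearer speaker is heard louder by a factor $d_{far}/d_{near}$ in amplitude, so its power budget is scaled by $(d_{near}/d_{far})^2$. This substitute and the function names are assumptions, not the DBAP derivation referred to above.

```python
def inverse_square_alpha(d_near, d_far):
    """Hypothetical stand-in for the attenuation factor in the car scenario.

    Under 1/r free-field spreading the nearer speaker arrives d_far/d_near times
    louder in amplitude, so its power limit is scaled by (d_near/d_far)^2.
    """
    return (d_near / d_far) ** 2


def second_constraint(first_limit, alpha):
    # Equation (8): the other speaker's limit is the first limit scaled by the attenuation factor.
    return alpha * first_limit
```

For example, a nearer speaker at 1 m and a farther speaker at 2 m give a factor of 0.25, so a first limit of 4.0 yields a second limit of 1.0.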
[0065] In the system structure of
subject to:
$\|w_{(1,:)}(k)\|^2 \le N_1$, that is, $\sum_{n=1}^{2} |w_{1,n}(k)|^2 \le N_1$, and
$\|w_{(2,:)}(k)\|^2 \le N_2$, that is, $\sum_{n=1}^{2} |w_{2,n}(k)|^2 \le N_2$
where $H(k)W(k)$ represents the actual balance of each audio signal that is expected to be heard by the user and $\hat{b}(k)$ represents the target balance. $N_1$ and $N_2$ limit the total squared magnitude of the complex weights applied for each speaker.
[0066] As described above, the target balance may aim to simulate a symmetric speaker arrangement, i.e. a physical speaker arrangement in which the speakers are symmetrically arranged with respect to the user (which is achieved by representing the user via a user head model around which the simulated speakers are symmetrically arranged) and/or a speaker arrangement in which both speakers show the same frequency response. The target balance may also aim to simulate speakers that are further apart than the speakers are in reality.
[0067] The optimisation unit 308 is thus capable of generating weights that accurately render the desired virtual source while also satisfying the attenuation constraints of the left channel speaker compared with the right channel speaker. If the optimisation unit solves the constrained minimisation set out above, it will find the globally optimal solution in the MMSE (minimum mean square error) sense, which minimizes the reproduction error relative to the desired virtual source responses in the complex frequency domain while remaining constrained by the specified filter gain attenuation.
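One way to solve the per-bin problem numerically is projected gradient descent, sketched below under the assumption that the difference between $H(k)W(k)$ and $\hat{b}(k)$ is measured with the squared Frobenius norm. The constraint set is a product of balls (one per loudspeaker row of $W(k)$), so the projection simply rescales any row whose power exceeds its limit, and a step of $1/\lambda_{\max}(H^H H)$ ensures convergence for this convex problem. The solver choice, iteration count and names are assumptions; a dedicated convex (QCQP or SOCP) solver would be an equally valid choice.

```python
import math


def conj_t(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def mm(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][j] * b[j][c] for j in range(2)) for c in range(2)] for i in range(2)]


def lambda_max(a):
    # Largest eigenvalue of a 2x2 Hermitian matrix [[p, q], [conj(q), d]].
    p, q, d = a[0][0].real, a[0][1], a[1][1].real
    return (p + d) / 2 + math.sqrt(((p - d) / 2) ** 2 + abs(q) ** 2)


def project_rows(w, limits):
    # Euclidean projection onto {W : sum_n |w_ln|^2 <= N_l for every l}: shrink rows that are too long.
    out = []
    for row, n_l in zip(w, limits):
        power = sum(abs(x) ** 2 for x in row)
        scale = math.sqrt(n_l / power) if power > n_l else 1.0
        out.append([scale * x for x in row])
    return out


def solve_bin(h, b_hat, limits, iters=500):
    """Projected gradient for min ||H W - b_hat||_F^2 subject to per-row power limits."""
    hh = conj_t(h)
    step = 1.0 / lambda_max(mm(hh, h))  # 1 / Lipschitz constant of the (scaled) gradient
    w = [[0j, 0j], [0j, 0j]]
    for _ in range(iters):
        r = mm(h, w)
        resid = [[r[i][j] - b_hat[i][j] for j in range(2)] for i in range(2)]
        g = mm(hh, resid)  # gradient direction H^H (H W - b_hat)
        w = project_rows([[w[i][j] - step * g[i][j] for j in range(2)] for i in range(2)], limits)
    return w
```

When the limits are loose, the result approaches the unconstrained solution $H(k)^{-1}\hat{b}(k)$; when a limit is active, the corresponding row is held on its power boundary and the remaining error is spread so as to minimise the overall mismatch.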
[0068] The system structure shown in
[0069] The structures shown in
[0071] The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present disclosure may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the disclosure.