Methods, apparatus and systems for asymmetric speaker processing
10659880 · 2020-05-19
Assignee
- Dolby Laboratories Licensing Corporation (San Francisco, CA)
- Dolby International Ab (Amsterdam Zuidoost, NL)
Inventors
- Dirk Jeroen Breebaart (Ultimo, AU)
- Mark David DE BURGH (Mount Colah, AU)
- Nicholas Luke Appleton (Bellevue Hill, AU)
- Heiko Purnhagen (Sundbyberg, DE)
- Mark William Gerrard (Balmain, AU)
- David Matthew Cooper (Carlton, AU)
CPC classification
H04S2400/03
ELECTRICITY
H04R5/04
ELECTRICITY
H04R2499/11
ELECTRICITY
H04S7/30
ELECTRICITY
H04R2205/022
ELECTRICITY
H04R2420/01
ELECTRICITY
H04S3/008
ELECTRICITY
H04S2400/01
ELECTRICITY
H04M1/03
ELECTRICITY
H04S1/002
ELECTRICITY
International classification
H04R5/04
ELECTRICITY
H04M1/03
ELECTRICITY
H04S7/00
ELECTRICITY
Abstract
A method of processing audio data for replay on a mobile device with a first speaker and a second speaker, wherein the audio data comprises a respective audio signal for each of the first and second speakers, includes: determining a device orientation of the mobile device; if the determined device orientation is vertical orientation, applying a first processing mode to the audio signals for the first and second speakers; and if the determined device orientation is horizontal orientation, applying a second processing mode to the audio signals for the first and second speakers. Applying the first processing mode involves: determining respective mono audio signals in at least two frequency bands based on the audio signals for the first and second speakers; in a first one of the at least two frequency bands, routing a larger portion of the respective mono audio signal to one of the first and second speakers; and in a second one of the at least two frequency bands, routing a larger portion of the respective mono audio signal to the other one of the first and second speakers. Applying the second processing mode involves applying cross-talk cancellation to the audio signals for the first and second speakers.
Claims
1. A method of processing audio data for replay on a mobile device with a first speaker and a second speaker, wherein the audio data comprises a respective audio signal for each of the first and second speakers, the method comprising: determining a device orientation of the mobile device; if the determined device orientation is vertical orientation, applying a first processing mode to the audio signals for the first and second speakers; and if the determined device orientation is horizontal orientation, applying a second processing mode to the audio signals for the first and second speakers, wherein applying the first processing mode involves: determining respective mono audio signals in at least two frequency bands based on the audio signals for the first and second speakers; in a first one of the at least two frequency bands, routing a larger portion of the respective mono audio signal to one of the first and second speakers; and in a second one of the at least two frequency bands, routing a larger portion of the respective mono audio signal to the other one of the first and second speakers; and wherein applying the second processing mode involves applying cross-talk cancellation to the audio signals for the first and second speakers.
2. The method according to claim 1, wherein the second processing mode further involves applying a multi-band dynamic range compressor, peak limiter, RMS limiter, or signal limiter to the audio signals after cross-talk cancellation.
3. The method according to claim 2, wherein applying the multi-band dynamic range compressor, peak limiter, RMS limiter, or signal limiter to the audio signals after cross-talk cancellation involves applying gains that are coupled between respective audio signals after cross-talk cancellation, at least over a range of frequencies.
4. The method according to claim 1, wherein the second processing mode involves bypassing cross-talk cancellation for low frequencies.
5. The method according to claim 4, wherein bypassing cross-talk cancellation for low frequencies involves: determining a mono audio signal in a low frequency band based on the audio signals for the first and second speakers; and routing the mono audio signal in the low frequency band to a main speaker among the first and second speakers.
6. The method according to claim 1, wherein the second processing mode involves: applying a first correction filter to that audio signal after cross-talk cancellation that is routed to the one of the first and second speakers; and applying a second correction filter to that audio signal after cross-talk cancellation that is routed to the other one of the first and second speakers, wherein the first correction filter is different from the second correction filter.
7. The method according to claim 1, wherein the second processing mode involves: extracting a center channel from the audio signals for the first and second speakers; and bypassing cross-talk cancellation for the extracted center channel.
8. The method according to claim 1, wherein in the first processing mode, determining the respective mono audio signals in the at least two frequency bands involves: downmixing the audio signals for the first and second speakers to a mono audio signal and splitting the mono audio signal into at least two frequency bands; or splitting each audio signal into at least two frequency bands and, in each frequency band, downmixing the respective audio signals to a respective mono audio signal.
9. The method according to claim 1, wherein the first processing mode involves: applying a first correction filter to that part of the mono audio signal in the first one of the at least two frequency bands that is routed to the one of the first and second speakers; and applying a second correction filter to that part of the mono audio signal in the second one of the at least two frequency bands that is routed to the other one of the first and second speakers, wherein the first correction filter is different from the second correction filter.
10. The method according to claim 9, wherein the first processing mode involves applying a multi-band dynamic range compressor, peak limiter, RMS limiter, or signal limiter to the audio signals after filtering by the first and second correction filters.
11. The method according to claim 1, wherein in the first processing mode, the first one of the at least two frequency bands is a low frequency band and the mono audio signal in the low frequency band is routed only to the one of the first and second speakers.
12. The method according to claim 11, wherein the one of the first and second speakers is a main speaker of the mobile device.
13. The method according to claim 1, wherein in the first processing mode, the second one of the at least two frequency bands is a high frequency band, and wherein the mono audio signal in the high frequency band is routed only to the other one of the first and second speakers.
14. The method according to claim 13, wherein the other one of the first and second speakers is an ear speaker of the mobile device.
15. The method according to claim 1, further comprising: for at least one of the first and second speakers, applying a speaker correction filter to the respective audio signal that is routed to that speaker, wherein the speaker correction filter has a phase component intended to match the phase response of that speaker to the phase response of the other one of the first and second speakers.
16. The method according to claim 1, further comprising: obtaining sensor data from one or more sensors of the mobile device; and determining the device orientation based on the sensor data.
17. The method according to claim 1, further comprising: obtaining a user input; determining the device orientation based on the user input.
18. The method according to claim 1, wherein the mobile device is a mobile phone, the first speaker is a main speaker of the mobile phone, and the second speaker is an ear speaker of the mobile phone.
19. A non-transitory computer-readable storage medium storing a computer program including instructions that causes a processor that carries out the instructions to perform the method according to claim 1.
20. A mobile device comprising: a first speaker and a second speaker; and a processor coupled to a memory storing instructions for the processor, wherein the processor is adapted to perform a method of processing audio data for replay on the mobile device with the first speaker and the second speaker, wherein the audio data comprises a respective audio signal for each of the first and second speakers, the method comprising: determining a device orientation of the mobile device; if the determined device orientation is vertical orientation, applying a first processing mode to the audio signals for the first and second speakers; and if the determined device orientation is horizontal orientation, applying a second processing mode to the audio signals for the first and second speakers, wherein the applying the first processing mode involves: determining respective mono audio signals in at least two frequency bands based on the audio signals for the first and second speakers; in a first one of the at least two frequency bands, routing a larger portion of the respective mono audio signal to one of the first and second speakers; and in a second one of the at least two frequency bands, routing a larger portion of the respective mono audio signal to the other one of the first and second speakers; and wherein applying the second processing mode involves applying cross-talk cancellation to the audio signals for the first and second speakers.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Example embodiments of the disclosure are explained below with reference to the accompanying drawings, wherein like reference numbers indicate like or similar elements, and wherein
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
DETAILED DESCRIPTION
(15) As indicated above, identical or like reference numbers in the disclosure indicate identical or like elements, and repeated description thereof may be omitted for reasons of conciseness.
(16) Broadly speaking, the present disclosure relates to customized device audio processing (virtualization, speaker correction) of a device that has two or more speakers, where the audio processing: 1) employs a different audio processing algorithm topology depending on the device's use case, and/or 2) employs magnitude compensation for at least one of the speakers depending on the device's use case, and/or 3) includes a phase compensation for at least one of the speakers, and/or 4) is dependent on the device's orientation, location, or environment, including any changes therein over time.
(17)
(18) An outline of a method 1200 according to embodiments of the disclosure will now be described with reference to
(19) At step S1210, a device orientation of the mobile device is determined. For example, the mobile device (e.g., device 100) may obtain (e.g., receive) device sensor data 101b and/or user data 101a. The sensor data may be received from one or more sensors of the device 100. The device sensor data 101b and/or user data 101a may be processed and/or analyzed to determine orientation, position and/or environment use case data 105. In particular, the device orientation may be determined based on sensor data. The data 105 may be provided to (1) a topology selector 103 and (2) a correction filter selector 108.
(20) The orientation, location and/or environment data 105 of the device may be detected automatically based on data from the device's accelerometer, gyroscope, compass, GPS sensor, light sensor, microphone, or any other sensor available to the device.
(21) Accordingly, the one or more sensors of the device 100 may include any, some, or all of an accelerometer, a gyroscope, a compass, a GPS sensor, a light sensor, and/or a microphone.
(22) The orientation, location and/or environment data 105 of the device 100 may also be determined at block 104 from direct user input, such as, for example, voice prompts, keyboard input or any other method through which direct user input is collected. In particular, the device orientation may be determined based on user input. The user of the device may provide such input to direct or modify the device processing, signal a specific use case, or request a specific mode of device processing.
(23) The orientation, location, environment or use-case data may be used by the topology selector 103 to selectively switch between two or more available device processing algorithm topologies (e.g., device processing topology A 106 or device processing topology B 107, or first and second processing modes). The data may further be used by a correction filter selector to modify or select correction filter data in one or more device processing topologies.
(24) In some embodiments, the method 1200 decides on using either a first processing mode or a second processing mode for applying to the audio signals for the first and second speakers, depending on the determined device orientation. The most relevant device orientations are horizontal orientation and vertical orientation. Horizontal orientation may also be referred to as landscape orientation or landscape mode. Vertical orientation may also be referred to as portrait orientation or portrait mode. Accordingly, the method may further include mapping a device orientation to either horizontal orientation or vertical orientation, depending on one or more angles defining the device orientation. For example, if the device orientation is closer to horizontal orientation than to vertical orientation, the device orientation may be treated as horizontal orientation. Otherwise, the device orientation may be treated as vertical orientation. For example, horizontal orientation may be defined as that orientation in which the two speakers of the mobile device are approximately located at a same height. On the other hand, vertical orientation may be defined as that orientation in which the first and second speakers are located at substantially different heights.
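By way of illustration only, the mapping of a measured orientation to horizontal or vertical orientation could be sketched as follows. The sketch assumes that an accelerometer reports the gravity vector in device coordinates and that the axis connecting the two speakers is known. The function name `classify_orientation`, the default speaker axis, and the 45 degree threshold are assumptions made for this example and are not taken from the disclosure.

```python
import math


def classify_orientation(gravity, speaker_axis=(0.0, 1.0, 0.0), threshold_deg=45.0):
    """Map a gravity vector to 'horizontal' or 'vertical'.

    gravity: (x, y, z) accelerometer reading in device coordinates (any unit).
    speaker_axis: assumed unit vector pointing from one speaker to the other.
    threshold_deg: assumed decision angle between the two orientations.
    """
    gx, gy, gz = gravity
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    if norm == 0.0:
        raise ValueError("gravity vector must be non-zero")
    ax, ay, az = speaker_axis
    # Normalized component of gravity along the speaker axis, clamped to [-1, 1].
    along = (gx * ax + gy * ay + gz * az) / norm
    along = max(-1.0, min(1.0, along))
    # Elevation of the speaker axis above the horizontal plane.
    # 0 degrees: both speakers at the same height; 90 degrees: one above the other.
    elevation_deg = math.degrees(math.asin(abs(along)))
    return "horizontal" if elevation_deg < threshold_deg else "vertical"


if __name__ == "__main__":
    print(classify_orientation((0.0, -9.81, 0.0)))  # gravity along the speaker axis
    print(classify_orientation((9.81, 0.0, 0.0)))   # gravity across the speaker axis
```

With this definition, a device lying flat on a table is treated as horizontal, since both speakers are then at approximately the same height.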
(25) At step S1220 of method 1200, if the determined device orientation is vertical orientation, a first processing mode is applied to the audio signals for the first and second speakers.
(26) On the other hand, if the determined device orientation is horizontal orientation, a second processing mode is applied to the audio signals for the first and second speakers at step S1230.
(27) The method 1200 may further comprise receiving the audio data, for example from a bitstream.
(28) In general, applying the first processing mode involves determining respective mono audio signals in at least two frequency bands based on the audio signals for the first and second speakers. In a first one of the at least two frequency bands, a larger portion of the respective mono audio signal is routed (e.g., sent) to one of the first and second speakers. In a second one of the at least two frequency bands, a larger portion of the respective mono audio signal is routed to the other one of the first and second speakers. Applying the second processing mode involves applying cross-talk cancellation to the audio signals for the first and second speakers. Examples of the first and second processing modes are described below.
(29)
(30) This mode (or the first processing mode in general) is engaged automatically whenever the device is placed or held vertically (e.g., in portrait mode). For example, the ear speaker may be above the main speaker. In such a case, there is no basis for spatial imaging as both speakers are positioned vertically relative to one another. Such orientation can be determined based on an analysis of accelerometer data showing a gravity component that is substantially downwards oriented, for example.
(31) A typical asymmetric speaker layout comprising an ear speaker and main speaker will exhibit different frequency responses across the two drivers. In particular, the main speaker is typically more efficient and capable of reproducing low-frequency content, while the opposite can be true for high-frequency content. In order to produce a maximum loudness while minimizing the amount of electric and/or digital power required, it is beneficial to split the signal(s) to be reproduced by the two drivers into at least two frequency bands. A low-frequency band is reproduced by the main speaker, while the high-frequency band is reproduced by the ear speaker. Hybrid approaches may be feasible as well, involving multiple frequency bands that are steered to just one or both speakers. Besides the application of such band-split filters, speaker correction can be applied simultaneously by superimposing a correction filter on the band-split filter(s).
(32) An overview 200 of this particular speaker correction mode is shown in
(33) In general, in the first processing mode respective mono audio signals are determined in at least two frequency bands based on the audio signals for the first and second speakers (e.g., based on a stereo signal). In a first one of the at least two frequency bands, a larger portion of the respective mono audio signal (possibly all of the respective mono audio signal) is routed to one of the first and second speakers. The first one of the at least two frequency bands may be a low frequency band. The one of the first and second speakers may be the main speaker of the mobile device (e.g., for a mobile phone with a main speaker and an ear speaker). In a second one of the at least two frequency bands, a larger portion of the respective mono audio signal (possibly all of the respective mono audio signal) is routed to the other one of the first and second speakers. The second one of the at least two frequency bands may be a high frequency band. The other one of the first and second speakers may be an ear speaker of the mobile device (e.g., for a mobile phone with a main speaker and an ear speaker). For example, the mono audio signal in the low frequency band may be routed only to the one of the first and second speakers (e.g., the main speaker) and the mono audio signal in the high frequency band may be routed only to the other one of the first and second speakers (e.g., the ear speaker).
(34) The mono audio signals in the at least two frequency bands can be obtained in different manners. For example, the audio signals for the first and second speakers (e.g., the stereo audio signal) can be first downmixed to a mono audio signal, which is then split into the at least two frequency bands. Alternatively, each audio signal can be first split into the at least two frequency bands, and the split audio signals in each frequency band can then be downmixed to a mono audio signal for that frequency band. In both cases, the splitting may be effected by a combination of a high-pass filter and a low-pass filter (e.g., in parallel), and optionally, one or more bandpass filters.
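As a minimal sketch of the two alternatives described above, the following example obtains the mono audio signals in a low and a high frequency band either by downmixing first or by splitting first. The first-order low-pass filter, the complementary high-pass (input minus low-pass), the equal-weight downmix, and all function names are assumptions chosen for brevity. For linear time-invariant filters and a fixed linear downmix, both orders yield the same band signals.

```python
import numpy as np


def one_pole_lowpass(x, alpha=0.1):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y = np.empty(len(x))
    state = 0.0
    for n, sample in enumerate(x):
        state += alpha * (sample - state)
        y[n] = state
    return y


def split_bands(x, alpha=0.1):
    """Return (low, high), where high is the complement so that low + high == x."""
    low = one_pole_lowpass(x, alpha)
    return low, x - low


def downmix(left, right):
    """Equal-weight mono downmix (an assumed downmix rule)."""
    return 0.5 * (left + right)


def mono_bands_downmix_first(left, right, alpha=0.1):
    """Alternative 1: downmix to mono, then split into bands."""
    return split_bands(downmix(left, right), alpha)


def mono_bands_split_first(left, right, alpha=0.1):
    """Alternative 2: split each signal into bands, then downmix per band."""
    left_low, left_high = split_bands(left, alpha)
    right_low, right_high = split_bands(right, alpha)
    return downmix(left_low, right_low), downmix(left_high, right_high)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left, right = rng.standard_normal(256), rng.standard_normal(256)
    low_a, high_a = mono_bands_downmix_first(left, right)
    low_b, high_b = mono_bands_split_first(left, right)
    print(np.allclose(low_a, low_b), np.allclose(high_a, high_b))
```

Downmixing first requires only one set of band-split filters, which may be preferable where computational cost matters.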
(35) In some embodiments, speaker-specific correction filters may be applied to respective parts of the mono audio signals in the at least two frequency bands. Herein, speaker-specific means that the correction filters are different for the first and second speakers. For example, a first correction filter can be applied to that part of the mono audio signal in the first one of the at least two frequency bands that is routed to the one of the first and second speakers. For example, the first correction filter can be applied to (that part of) the mono audio signal in the low frequency band that is applied to the main speaker. Likewise, a second correction filter can be applied to that part of the mono audio signal in the second one of the at least two frequency bands that is routed to the other one of the first and second speakers. For example, the second correction filter can be applied to (that part of) the mono audio signal in the high frequency band that is applied to the ear speaker.
(36) The first correction filter may be specific to the one of the first and second speakers (e.g., the main speaker). Likewise, the second correction filter may be specific to the other one of the first and second speakers (e.g., the ear speaker). Accordingly, if not all of the mono signal in the first (e.g., low) frequency band is routed to the one of the first and second speakers (e.g., the main speaker), the second correction filter may also be applied to that (presumably small) part of the mono audio signal in the first (e.g., low) frequency band that is routed to the other one of the first and second speakers (e.g., the ear speaker). Likewise, if not all of the mono signal in the second (e.g., high) frequency band is routed to the other one of the first and second speakers (e.g., the ear speaker), the first correction filter may also be applied to that (presumably small) part of the mono audio signal in the second (e.g., high) frequency band that is routed to the one of the first and second speakers (e.g., the main speaker).
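The routing and speaker-specific correction described above can be sketched as follows, purely as an example. The band signals are assumed to be available as arrays, the correction filters are assumed to be FIR filters given by their coefficients, and the names `route_first_mode`, `low_to_main` and `high_to_ear` are hypothetical. Each correction filter is applied to everything that is routed to its speaker, including any small remainder from the other band.

```python
import numpy as np


def route_first_mode(low, high, fir_main, fir_ear, low_to_main=1.0, high_to_ear=1.0):
    """Route band-limited mono signals to the main and ear speakers.

    low, high: mono audio signals in the low and high frequency bands.
    fir_main, fir_ear: assumed FIR correction filters, one per speaker.
    low_to_main: portion of the low band sent to the main speaker (> 0.5).
    high_to_ear: portion of the high band sent to the ear speaker (> 0.5).
    """
    for portion in (low_to_main, high_to_ear):
        if not 0.5 < portion <= 1.0:
            raise ValueError("the larger portion must be in (0.5, 1.0]")
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    # The larger portion of each band goes to its preferred speaker,
    # the remainder (possibly zero) goes to the other speaker.
    main_feed = low_to_main * low + (1.0 - high_to_ear) * high
    ear_feed = (1.0 - low_to_main) * low + high_to_ear * high
    # Speaker-specific correction, truncated to the input length.
    main_out = np.convolve(main_feed, fir_main)[: len(main_feed)]
    ear_out = np.convolve(ear_feed, fir_ear)[: len(ear_feed)]
    return main_out, ear_out


if __name__ == "__main__":
    low = np.array([1.0, 2.0, 3.0])
    high = np.array([0.5, -0.5, 0.25])
    print(route_first_mode(low, high, [1.0], [1.0], low_to_main=0.8))
```

With both portions set to 1.0, the low band is routed only to the main speaker and the high band only to the ear speaker.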
(37) After filtering the audio signals by the aforementioned correction filters, the first processing mode can further involve applying one of a multi-band DRC, a peak limiter, an RMS limiter, or a signal limiter to the audio signals that are eventually routed to the first and second speakers. These compressors/limiters can be examples of limiters 260, 270 in
(38) An example of a combination of high-pass/low-pass filters (to achieve band-splitting) and two subsequent correction filters is shown in
(39)
(40) Thus, as is shown in the example of
(41) For asymmetric speaker configurations, in which one of the first and second speakers has inferior power handling capabilities and/or inferior capabilities to play back low frequency content compared to the other one of the first and second speakers, it may be advantageous to bypass cross-talk cancellation for low frequencies. Thereby, overall loudness can be improved. Bypassing cross-talk cancellation for low frequencies may proceed as follows. A mono audio signal is determined in a low frequency band based on the audio signals for the first and second speakers (e.g., based on a stereo signal). Determining this mono audio signal may involve low pass filtering the audio signals for the first and second speakers and subsequently downmixing the low pass filtered audio signals to obtain the mono audio signal. The order of low-pass filtering and downmixing may be reversed in some embodiments. The determined mono audio signal in the low frequency band is then routed to (only) a main speaker among the first and second speakers. On the other hand, cross-talk cancellation is applied to the high pass filtered versions of the audio signals for the first and second speakers. An output of the cross-talk cancellation for the high frequency band is then routed to the first and second speakers.
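A minimal sketch of bypassing cross-talk cancellation for low frequencies is given below. It assumes a first-order low-pass filter with a complementary high-pass, an equal-weight downmix, and a cross-talk canceller supplied as a callable that returns the signals for the main and ear speakers. The names `second_mode_bass_bypass` and `canceller`, and the assignment of the first canceller output to the main speaker, are assumptions made for this example.

```python
import numpy as np


def one_pole_lowpass(x, alpha):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y = np.empty(len(x))
    state = 0.0
    for n, sample in enumerate(x):
        state += alpha * (sample - state)
        y[n] = state
    return y


def second_mode_bass_bypass(left, right, canceller, alpha=0.05):
    """Apply cross-talk cancellation to the high band only.

    canceller: callable taking (left_high, right_high) and returning
    (signal_for_main_speaker, signal_for_ear_speaker).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Low band: low-pass filter each signal, then downmix to mono.
    left_low = one_pole_lowpass(left, alpha)
    right_low = one_pole_lowpass(right, alpha)
    mono_low = 0.5 * (left_low + right_low)
    # High band: complementary high-pass, then cross-talk cancellation.
    to_main, to_ear = canceller(left - left_low, right - right_low)
    # The low band is routed only to the main speaker.
    return to_main + mono_low, to_ear


if __name__ == "__main__":
    # Toy canceller used only to exercise the signal flow.
    toy = lambda l, r: (l - 0.5 * r, r - 0.5 * l)
    tone = np.sin(2 * np.pi * 0.01 * np.arange(512))
    main_out, ear_out = second_mode_bass_bypass(tone, -tone, toy)
    print(main_out.shape, ear_out.shape)
```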
(42) Bypassing cross-talk cancellation for low frequencies can be advantageously performed in conjunction with the processing that may be performed by the bass management module(s), as schematically illustrated in
(43) Moreover, in some embodiments cross-talk cancellation can also be bypassed for a center channel that is extracted from the audio signals for the first and second speakers (e.g., from a stereo signal). Again, this may contribute to improving overall loudness. Subsequent to cross-talk cancellation in
(44) Optionally, the second processing mode can further involve applying respective correction filters to the audio signals after cross-talk cancellation (and optionally, bass management) that are routed to the first and second speakers, respectively. That is, a first correction filter may be applied to that audio signal, after cross-talk cancellation (and optionally, bass management), that is eventually routed to the one of the first and second speakers, whereas a second correction filter may be applied to that audio signal, after cross-talk cancellation (and optionally, bass management), that is eventually routed to the other one of the first and second speakers. The first and second correction filters may be specific to their respective speakers and may be different from each other in general.
(45) That is, in the example of
(46) That is, the second processing mode can involve applying, for at least one of the first and second speakers, a speaker correction filter to the respective audio signal that is eventually routed to that speaker. Therein, the speaker correction filter preferably has a phase component that is chosen/set to (substantially) match the phase response of that speaker to the phase response of the other one of the first and second speakers. In some cases, speaker correction filters can be applied to both audio signals (i.e., the audio signal that is eventually routed to the first speaker and the audio signal that is eventually routed to the second speaker). In this case, the phase components of both speaker correction filters are chosen/set so that the phase responses of the two speakers (substantially) match. In other words, to allow for a faithful, well-balanced stereo image emitted from the ear and main speakers, the two correction filters (1) and (2) are configured such that the resulting response of loudspeaker plus correction filter is sufficiently similar. In other words, the speaker correction filters aim not only at improving overall timbre, but also at matching the two effective responses in magnitude and phase.
(47) In the second processing mode, the correction filters are coupled to the drivers/transducers of their respective speakers, i.e., the correction filters can be specific to their respective speakers. In this configuration, it is understood that, dependent on the specific horizontal orientation of the device (e.g., normal landscape mode or upside-down landscape mode, which can be obtained by a rotation of the device by 180 degrees) the audio channels need to be interchanged to ensure that the left channel is perceived as coming from the left and the right channel is perceived as coming from the right. In other words, the audio channels may have to be flipped in the 180 degree rotation case of landscape mode.
(48) Regardless of the flipping of the audio channels, the correction filters are not flipped and remain coupled to their respective speakers.
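The distinction between flipping the audio channels and keeping the correction filters coupled to their speakers can be illustrated by the following sketch. It assumes FIR correction filters and a flag indicating on which side the main speaker is located in the current landscape orientation. The function and parameter names are hypothetical.

```python
import numpy as np


def apply_fir(x, fir):
    """Filter x with FIR coefficients, truncated to the input length."""
    return np.convolve(x, fir)[: len(x)]


def feed_speakers_landscape(left, right, fir_main, fir_ear, main_is_on_left):
    """Assign channels to speakers; correction filters stay with their speakers.

    main_is_on_left: True in one landscape orientation, False after the
    device is rotated by 180 degrees (assumed to be known from sensor data).
    """
    if main_is_on_left:
        main_feed, ear_feed = left, right
    else:
        # Channels are flipped, filters are not.
        main_feed, ear_feed = right, left
    return apply_fir(main_feed, fir_main), apply_fir(ear_feed, fir_ear)


if __name__ == "__main__":
    left, right = np.array([1.0, 1.0]), np.array([3.0, 3.0])
    print(feed_speakers_landscape(left, right, [2.0], [0.5], main_is_on_left=True))
    print(feed_speakers_landscape(left, right, [2.0], [0.5], main_is_on_left=False))
```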
(49) It is understood that the first processing mode according to embodiments of this disclosure can also involve applying speaker correction filter(s).
(50) To allow for accurate phase matching of the two speakers, the two correction filters may differ in their phase response to correct any phase offsets. An exemplary phase offset to be applied to the main speaker to align its effective phase response to the ear speaker is shown as curve 510 in
(51) Applying the difference to the main speaker only is an option. Another option would be to apply the inverse of this phase difference to the ear speaker to align the ear speaker to the main speaker.
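The two options can be sketched in the frequency domain as follows, assuming that the complex frequency responses of both speakers have been measured on a common frequency grid. The function name `phase_match_filter` and the synthetic responses in the usage part are assumptions for illustration and do not represent measured data.

```python
import numpy as np


def phase_match_filter(resp_target, resp_other):
    """Unit-magnitude response that aligns the phase of resp_other to resp_target."""
    phase_diff = np.angle(resp_target) - np.angle(resp_other)
    return np.exp(1j * phase_diff)


if __name__ == "__main__":
    freqs = np.linspace(100.0, 8000.0, 64)
    # Synthetic responses: different gains and different pure delays.
    resp_main = 1.0 * np.exp(-2j * np.pi * freqs * 3e-4)
    resp_ear = 0.5 * np.exp(-2j * np.pi * freqs * 1e-4)

    correction = phase_match_filter(resp_ear, resp_main)
    # Option 1: apply the phase difference to the main speaker only.
    aligned_main = resp_main * correction
    # Option 2: apply the inverse phase difference to the ear speaker only.
    aligned_ear = resp_ear * np.conj(correction)

    print(np.allclose(np.angle(aligned_main * np.conj(resp_ear)), 0.0, atol=1e-9))
    print(np.allclose(np.angle(aligned_ear * np.conj(resp_main)), 0.0, atol=1e-9))
```

The filter in this sketch changes phase only. Matching the magnitude responses would require an additional magnitude correction.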
(52)
(53) After applying cross-talk cancellation to the audio signals for the first and second speakers (and optionally, after applying respective correction filters), the second processing mode can further involve applying one of a multi-band DRC, a peak limiter, an RMS limiter, or a signal limiter to the respective audio signals that are eventually routed to the first and second speakers. The multi-band DRC, the peak limiter, the RMS limiter, or the signal limiter can be specific to the respective speaker. Thereby, it can be ensured that the audio signals are kept in the linear range of their respective speaker. In the example of
(54)
(55) It is understood that cross-talk cancellation and bass management can also be combined in a different manner. For example, cross-talk cancellation and bass management can be performed in an intertwined manner.
(56) As discussed in the context of the processing topology A for portrait mode (as an example of the first processing mode) with respect to
(57) In one example, the present invention is directed to asymmetric cross-talk cancellation. On handheld devices (e.g., mobile phones and tablets) the speakers are close together even in landscape mode. When the device is positioned in a landscape orientation, the addition of crosstalk cancellation (often with coupled virtualization) can greatly improve the perceived width and immersiveness of the output sound. See, e.g., WIPO Publication No. WO 2018/132417. The crosstalk canceller can be composed of two filters, one on each channel (ipsilateral paths), and another two filters for the two interchannel (contra-lateral) paths. In the case of speakers with a large difference in power handling capabilities, effective crosstalk cancellation is limited by the power handling of the weaker speaker. This can result in very poor loudness levels, particularly at low frequencies, where much of the available speaker power is spent on signals that cancel one another. For this situation, band splitting filters where low frequencies bypass the canceller and are sent to the more capable speaker can give big gains in loudness.
(58) Center extraction techniques that bypass the canceller for the center channel have been developed to improve dialog clarity. Such systems are of even greater value in this asymmetric speaker situation, as they improve loudness by not losing energy to the crosstalk cancellation for center-panned content. Additionally, in this topology the asymmetric correction filters can be incorporated into the crosstalk canceller for computational efficiency.
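For illustration, a canceller with two ipsilateral and two contra-lateral filters can be obtained per frequency bin by inverting a 2x2 matrix of acoustic transfer functions from the speakers to the listener's ears. The sketch below uses a regularized inverse, which is a common textbook design and is an assumption here, not necessarily the design of this disclosure. The simple path model (unit ipsilateral paths, attenuated and delayed contra-lateral paths) and all names are likewise assumptions.

```python
import numpy as np


def acoustic_matrix(h_ll, h_lr, h_rl, h_rr):
    """Stack per-bin transfer functions into shape (bins, 2, 2).

    Rows index ears, columns index speakers: h_lr is the path from the
    right speaker to the left ear (a contra-lateral path).
    """
    top = np.stack([h_ll, h_lr], axis=-1)
    bottom = np.stack([h_rl, h_rr], axis=-1)
    return np.stack([top, bottom], axis=-2)


def crosstalk_canceller(H, reg=0.0):
    """Per-bin regularized inverse C = (H^H H + reg * I)^-1 H^H.

    The diagonal of C holds the two ipsilateral filters and the off-diagonal
    holds the two contra-lateral filters. With reg = 0, C is the exact inverse.
    A larger reg limits the canceller gain at the cost of less cancellation.
    """
    H_herm = np.conj(np.swapaxes(H, -1, -2))
    return np.linalg.solve(H_herm @ H + reg * np.eye(2), H_herm)


if __name__ == "__main__":
    freqs = np.linspace(200.0, 8000.0, 32)
    contra = 0.7 * np.exp(-2j * np.pi * freqs * 1e-4)  # assumed path model
    ipsi = np.ones_like(contra)
    H = acoustic_matrix(ipsi, contra, contra, ipsi)
    C = crosstalk_canceller(H)
    print(np.allclose(C @ H, np.eye(2)))
```

In an asymmetric configuration the four entries of H generally differ, so the four canceller filters differ as well.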
(59) There are some hardware topologies where the robustness of the soundstage can be improved by not only asymmetric phase cancellation of the ipsilateral paths but also asymmetric phase cancellation of the contra-lateral paths, due to the asymmetric properties of the directivity pattern of the speakers.
(60)
(61) In one example, the main speakers of a device may be mounted at the bottom of the device in a direction perpendicular to the top-down direction. The main speakers will have a frequency response that depends significantly on the environment of the device and its orientation. For example, if a device is on a flat, hard surface such as a table or desk, the frequency response of a loudspeaker is significantly enhanced compared to the response of the same loudspeaker when the device is hand held. To ensure appropriate spatial imaging using the ear and main speaker, and a consistent timbre between the two speakers, the speaker correction filters need to be modified appropriately depending on the device's orientation, position and environment use case.
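A correction filter selector of this kind could, as one possibility, be sketched as a lookup keyed by the detected use case. The use-case labels, the three-band layout, and the gain values in the table below are placeholders invented for this example. They are not measurements and are not taken from the disclosure.

```python
import numpy as np

# Placeholder per-band correction gains in dB for the main speaker.
# The values are illustrative only, not measured data.
MAIN_SPEAKER_CORRECTION_DB = {
    "on_surface": np.array([-4.0, -2.0, 0.0]),
    "hand_held": np.array([0.0, 0.0, 0.0]),
}


def select_main_speaker_correction(use_case, table=MAIN_SPEAKER_CORRECTION_DB,
                                   default="hand_held"):
    """Return linear per-band gains for the detected use case.

    Unknown use cases fall back to the assumed default entry.
    """
    gains_db = table.get(use_case, table[default])
    return 10.0 ** (gains_db / 20.0)


if __name__ == "__main__":
    print(select_main_speaker_correction("on_surface"))
    print(select_main_speaker_correction("hand_held"))
```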
(62) An example of a main-speaker correction filter for two different use cases is shown in
(63) The detection of the environment and its implications for (changing the) correction filters could come from a wide variety of sensors that are typically available on mobile devices, including but not limited to:
(64) Camera (front or rear);
(65) Microphones;
(66) Accelerometer or gyroscope;
(67) Any other device sensor.
(68) To determine the effect of the environment on acoustical performance (including aspects such as timbre and spatial imaging on asymmetric speaker configurations), one or more microphones that are available on a portable/mobile device may be used. The goal of this method is to use the one or more microphones to capture the audio that is reproduced by the device itself, analyze the captured audio to determine environment properties and current acoustic performance, and, if necessary, adjust the audio playback and/or device processing to optimize timbre, loudness, and/or spatial imaging.
(69)
(70) Examples of relevant acoustical/environmental properties that can adjust playback include: The absence or presence of any object potentially interfering with the reproduction of loudspeaker playback such as a hand of a user, a mobile device stand, furniture (table, desk, etc.), a mobile device cover, and alike. The absence or presence of distortion or other indicators of limited acoustical performance of one or more loudspeakers.
(71) Various aspects and implementations of the present invention may be appreciated from the following enumerated example embodiments (EEEs).
(72) EEE1 relates to an audio processing method, comprising: receiving media input audio data and sensor data; determining device orientation, position, environment or use-case data based on received device sensor data; and generating media output audio data for loudspeaker playback based on the determined device orientation, position, environment, or use-case data.
(73) EEE2 relates to the method of EEE1, wherein the device orientation data indicates whether the device is in a vertical orientation or horizontal orientation.
(74) EEE3 relates to the method of EEE1 or EEE2, wherein the device orientation, position, environment, or use-case data indicate whether the device is positioned on a surface or is hand-held.
(75) EEE4 relates to the method of any of EEE1 to EEE3, wherein the device processing uses a different processing topology based on the determined device orientation, position, environment, or use-case data.
(76) EEE5 relates to the method of any of EEE1 to EEE4, in which the device processing includes at least one speaker correction filter, said correction filter being dependent on the determined device orientation, position, environment, or use-case data.
(77) EEE6 relates to the method of any of EEE1 to EEE5, wherein the device processing includes at least one speaker correction filter, said speaker correction filter having a phase component intended to match the phase response to another speaker available in a device.
(78) EEE7 relates to the method of any of EEE1 to EEE6, wherein the device processing includes at least one band-split filter to send low-frequency signal content to only one of the speakers.
(79) EEE8 relates to the method of any of EEE1 to EEE7, wherein the processing is configured to switch to horizontal orientation, wherein the processing is based on information from a cross-talk canceller.
(80) EEE9 relates to the method of EEE8, wherein the crosstalk canceller is bypassed for low frequencies to improve loudness.
(81) EEE10 relates to the method of EEE8 or EEE9, wherein a center channel is extracted that bypasses the cross-talk canceller to improve the loudness of asymmetric speakers.
(82) EEE11 relates to the method of any one of EEE1 to EEE10, wherein the acoustic performance and/or the effect of the environment on said acoustic performance is assessed using one or more microphones, and wherein said output audio data is processed in response to said acoustic performance.
(83) EEE12 relates to the method of EEE1, wherein the device sensor data is received from at least one device sensor.
(84) EEE13 relates to the method of EEE1, wherein the device sensor data is based on user input.
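The orientation-dependent switching of EEE2, EEE4, EEE7 and EEE8 can be sketched as follows. This is a simplified illustration under stated assumptions, not the processing of the disclosure: the band split uses a first-order low-pass filter and its complement, the low band is assumed to go to the main speaker and the high band to the ear speaker, and the cross-talk canceller assumes a delay-free, frequency-independent acoustic model in which each ear also receives the opposite speaker scaled by a hypothetical factor `g`. All function names and default values are assumptions.

```python
def one_pole_lowpass(signal, alpha):
    """First-order low-pass; `alpha` in (0, 1] sets the cutoff (assumed value)."""
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def vertical_mode(left, right, alpha=0.1, low_share=1.0):
    """Band-split mono routing: `low_share` of the low band goes to the main
    speaker and the same share of the high band goes to the ear speaker."""
    mono = [0.5 * (l + r) for l, r in zip(left, right)]
    low = one_pole_lowpass(mono, alpha)
    high = [m - lo for m, lo in zip(mono, low)]  # complementary high band
    main = [low_share * lo + (1.0 - low_share) * hi for lo, hi in zip(low, high)]
    ear = [(1.0 - low_share) * lo + low_share * hi for lo, hi in zip(low, high)]
    return main, ear


def horizontal_mode(left, right, g=0.3):
    """Cross-talk cancellation for the assumed model ear = own + g * other:
    pre-invert [[1, g], [g, 1]] so each ear receives only its own channel."""
    det = 1.0 - g * g
    out_l = [(l - g * r) / det for l, r in zip(left, right)]
    out_r = [(r - g * l) / det for l, r in zip(left, right)]
    return out_l, out_r


def process(left, right, orientation):
    """Select the processing topology from the determined device orientation."""
    if orientation == "vertical":
        return vertical_mode(left, right)
    if orientation == "horizontal":
        return horizontal_mode(left, right)
    raise ValueError("unknown orientation: " + repr(orientation))
```

With `low_share=1.0` the two speaker feeds sum back to the mono signal, so the split itself does not change the total content; a practical canceller would be frequency-dependent and would account for the propagation delay between the speakers and the ears.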
(85) Various aspects and implementations of dynamic equalization for cross-talk cancellation as described in WIPO Publication No. WO 2018/132417 may be appreciated from the following enumerated example embodiments (EEEs), which are not claims.
(86) EEE1: A method of decoding a playback stream presentation from a data stream, the method comprising: a. decoding a first playback stream presentation, the first playback stream presentation configured for reproduction on a first audio reproduction system; b. decoding transform parameters suitable for transforming an intermediate playback stream into a second playback stream presentation, the second playback stream presentation configured for reproduction on headphones, wherein the intermediate playback stream presentation is at least one of the first playback stream presentation, a downmix of the first playback stream presentation, or an upmix of the first playback stream presentation; c. applying the transform parameters to the intermediate playback stream presentation to obtain the second playback stream presentation; d. processing the second playback stream presentation by a cross-talk cancellation algorithm to obtain a cross-talk-cancelled signal; e. processing the cross-talk-cancelled signal by a dynamic equalization or gain stage in which an amount of equalization or gain is dependent on a level of the first playback stream presentation or the second playback stream presentation, to produce a modified version of the cross-talk-cancelled signal; and f. outputting the modified version of the cross-talk-cancelled signal.
(87) EEE2: The method of EEE1, wherein the cross-talk cancellation algorithm is based, at least in part, on loudspeaker data.
(88) EEE3: The method of EEE2, wherein the loudspeaker data comprise loudspeaker position data.
(89) EEE4: The method of any one of EEE1-EEE3, wherein the amount of dynamic equalization or gain is based, at least in part, on acoustic environment data.
(90) EEE5: The method of EEE4, wherein the acoustic environment data include data that are representative of the direct-to-reverberant ratio at the intended listening position.
(91) EEE6: The method of EEE4 or EEE5, wherein the dynamic equalization or gain is frequency-dependent.
(92) EEE7: The method of any one of EEE4-EEE6, wherein the acoustic environment data are frequency-dependent.
(93) EEE8: The method of any one of EEE1-EEE7, further comprising playing back the modified version of the cross-talk-cancelled signal on headphones.
(94) EEE9: A method for virtually rendering channel-based or object-based audio, the method comprising: a. receiving one or more input audio signals and data corresponding to an intended position of at least one of the input audio signals; b. generating a binaural signal pair for each input signal of the one or more input signals, the binaural signal pair being based on an intended position of the input signal; c. applying a cross-talk cancellation process to the binaural signal pair to obtain a cross-talk cancelled signal pair; d. measuring a level of the cross-talk cancelled signal pair; e. measuring a level of the input audio signals; f. applying a dynamic equalization or gain to the cross-talk cancelled signal pair in response to a measured level of the cross-talk cancelled signal pair and a measured level of the input audio, to produce a modified version of the cross-talk-cancelled signal; and g. outputting the modified version of the cross-talk-cancelled signal.
(95) EEE10: The method of EEE9, wherein the dynamic equalization or gain is based, at least in part, on a function of time or frequency.
(96) EEE11: The method of EEE9 or EEE10, wherein level estimates are based, at least in part, on summing the levels across channels or objects.
(97) EEE12: The method of EEE11, wherein levels are based at least in part, on one or more of energy, power, loudness or amplitude.
(98) EEE13: The method of any one of EEE9-EEE12, wherein at least part of the processing is implemented in a transform or filterbank domain.
(99) EEE14: The method of any one of EEE9-EEE13, wherein the cross-talk cancellation algorithm is based, at least in part, on loudspeaker data.
(100) EEE15: The method of any one of EEE9-EEE14, wherein the loudspeaker data comprise loudspeaker position data.
(101) EEE16: The method of any one of EEE9-EEE15, wherein the amount of dynamic equalization or gain is based, at least in part, on acoustic environment data.
(102) EEE17: The method of EEE16, wherein the acoustic environment data include data that are representative of the direct-to-reverberant ratio at the intended listening position.
(103) EEE18: The method of EEE16 or EEE17, wherein the dynamic equalization or gain is frequency-dependent.
(104) EEE19: The method of EEE18, wherein the acoustic environment data is frequency-dependent.
(105) EEE20: The method of any one of EEE9-EEE19, further comprising summing the binaural signal pairs together to produce a summed binaural signal pair, wherein the cross-talk cancellation process is applied to the summed binaural signal pair.
(106) EEE21: A non-transitory medium having software stored thereon, the software including instructions for performing a method of decoding a playback stream presentation from a data stream, the method comprising:
(107) decoding a first playback stream presentation, the first playback stream presentation configured for reproduction on a first audio reproduction system;
(108) decoding transform parameters suitable for transforming an intermediate playback stream into a second playback stream presentation, the second playback stream presentation configured for reproduction on headphones, wherein the intermediate playback stream presentation is at least one of the first playback stream presentation, a downmix of the first playback stream presentation, or an upmix of the first playback stream presentation;
(109) applying the transform parameters to the intermediate playback stream presentation to obtain the second playback stream presentation;
(110) processing the second playback stream presentation by a cross-talk cancellation algorithm to obtain a cross-talk-cancelled signal;
(111) processing the cross-talk-cancelled signal by a dynamic equalization or gain stage in which an amount of equalization or gain is dependent on a level of the first playback stream presentation or the second playback stream presentation, to produce a modified version of the cross-talk-cancelled signal; and
(112) outputting the modified version of the cross-talk-cancelled signal.
(113) EEE22: The non-transitory medium of EEE21, wherein the cross-talk cancellation algorithm is based, at least in part, on loudspeaker data.
(114) EEE23: The non-transitory medium of EEE22, wherein the loudspeaker data comprise loudspeaker position data.
(115) EEE24: The non-transitory medium of any one of EEE21-EEE23, wherein the amount of dynamic equalization or gain is based, at least in part, on acoustic environment data.
(116) EEE25: The non-transitory medium of EEE24, wherein the acoustic environment data includes data that is representative of the direct-to-reverberant ratio at the intended listening position.
(117) EEE26: The non-transitory medium of EEE24 or EEE25, wherein the dynamic equalization or gain is frequency-dependent.
(118) EEE27: The non-transitory medium of any one of EEE24-EEE26, wherein the acoustic environment data is frequency-dependent.
(119) EEE28: The non-transitory medium of any one of EEE21-EEE27, further comprising playing back the modified version of the cross-talk-cancelled signal on headphones.
(120) EEE29: A non-transitory medium having software stored thereon, the software including instructions for performing a method of virtually rendering channel-based or object-based audio, the method comprising:
(121) receiving one or more input audio signals and data corresponding to an intended position of at least one of the input audio signals;
(122) generating a binaural signal pair for each input signal of the one or more input signals, the binaural signal pair being based on an intended position of the input signal;
(123) applying a cross-talk cancellation process to the binaural signal pair to obtain a cross-talk cancelled signal pair;
(124) measuring a level of the cross-talk cancelled signal pair;
(125) measuring a level of the input audio signals;
(126) applying a dynamic equalization or gain to the cross-talk cancelled signal pair in response to a measured level of the cross-talk cancelled signal pair and a measured level of the input audio, to produce a modified version of the cross-talk-cancelled signal; and
(127) outputting the modified version of the cross-talk-cancelled signal.
(128) EEE30: The non-transitory medium of EEE29, wherein the dynamic equalization or gain is based, at least in part, on a function of time or frequency.
(129) EEE31: The non-transitory medium of EEE29 or EEE30, wherein level estimates are based, at least in part, on summing the levels across channels or objects.
(130) EEE32: The non-transitory medium of EEE31, wherein levels are based at least in part, on one or more of energy, power, loudness or amplitude.
(131) EEE33: The non-transitory medium of any one of EEE29-EEE32, wherein at least part of the processing is implemented in a transform or filterbank domain.
(132) EEE34: The non-transitory medium of any one of EEE29-EEE33, wherein the cross-talk cancellation algorithm is based, at least in part, on loudspeaker data.
(133) EEE35: The non-transitory medium of any one of EEE29-EEE34, wherein the loudspeaker data comprise loudspeaker position data.
(134) EEE36: The non-transitory medium of any one of EEE32-EEE35, wherein the amount of dynamic equalization or gain is based, at least in part, on acoustic environment data.
(135) EEE37: The non-transitory medium of EEE36, wherein the acoustic environment data includes data that is representative of the direct-to-reverberant ratio at the intended listening position.
(136) EEE38: The non-transitory medium of EEE36 or EEE37, wherein the dynamic equalization or gain is frequency-dependent.
(137) EEE39: The non-transitory medium of EEE38, wherein the acoustic environment data is frequency-dependent.
(138) EEE40: The non-transitory medium of any one of EEE29-EEE39, further comprising summing the binaural signal pairs together to produce a summed binaural signal pair, wherein the cross-talk cancellation process is applied to the summed binaural signal pair.
(139) EEE41: An apparatus, comprising: an interface system; and a control system configured for:
(140) decoding a first playback stream presentation received via the interface system, the first playback stream presentation configured for reproduction on a first audio reproduction system;
(141) decoding transform parameters received via the interface system, the transform parameters suitable for transforming an intermediate playback stream into a second playback stream presentation, the second playback stream presentation configured for reproduction on headphones, wherein the intermediate playback stream presentation is at least one of the first playback stream presentation, a downmix of the first playback stream presentation, or an upmix of the first playback stream presentation;
(142) applying the transform parameters to the intermediate playback stream presentation to obtain the second playback stream presentation;
(143) processing the second playback stream presentation by a cross-talk cancellation algorithm to obtain a cross-talk-cancelled signal;
(144) processing the cross-talk-cancelled signal by a dynamic equalization or gain stage in which an amount of equalization or gain is dependent on a level of the first playback stream presentation or the second playback stream presentation, to produce a modified version of the cross-talk-cancelled signal; and
(145) outputting, via the interface system, a modified version of the cross-talk-cancelled signal.
(146) EEE42: The apparatus of EEE41, wherein the cross-talk cancellation algorithm is based, at least in part, on loudspeaker data.
(147) EEE43: The apparatus of EEE42, wherein the loudspeaker data comprise loudspeaker position data.
(148) EEE44: The apparatus of any one of EEE41-EEE43, wherein the amount of dynamic equalization or gain is based, at least in part, on acoustic environment data.
(149) EEE45: The apparatus of EEE44, wherein the acoustic environment data includes data that is representative of the direct-to-reverberant ratio at the intended listening position.
(150) EEE46: The apparatus of EEE44 or EEE45, wherein the dynamic equalization or gain is frequency-dependent.
(151) EEE47: The apparatus of any one of EEE44-EEE46, wherein the acoustic environment data is frequency-dependent.
(152) EEE48: The apparatus of any one of EEE41-EEE47, further comprising headphones, wherein the control system is further configured for playing back the modified version of the cross-talk-cancelled signal on the headphones.
(153) EEE49: An apparatus, comprising: an interface system; and a control system configured for:
(154) receiving one or more input audio signals and data corresponding to an intended position of at least one of the input audio signals;
(155) generating a binaural signal pair for each input signal of the one or more input signals, the binaural signal pair being based on an intended position of the input signal;
(156) applying a cross-talk cancellation process to the binaural signal pair to obtain a cross-talk cancelled signal pair;
(157) measuring a level of the cross-talk cancelled signal pair;
(158) measuring a level of the input audio signals;
(159) applying a dynamic equalization or gain to the cross-talk cancelled signal pair in response to a measured level of the cross-talk cancelled signal pair and a measured level of the input audio, to produce a modified version of the cross-talk-cancelled signal; and outputting, via the interface system, a modified version of the cross-talk-cancelled signal.
(160) EEE50: The apparatus of EEE49, wherein the dynamic equalization or gain is based, at least in part, on a function of time or frequency.
(161) EEE51: The apparatus of EEE49 or EEE50, wherein level estimates are based, at least in part, on summing the levels across channels or objects.
(162) EEE52: The apparatus of EEE51, wherein levels are based at least in part, on one or more of energy, power, loudness or amplitude.
(163) EEE53: The apparatus of any one of EEE49-EEE52, wherein at least part of the processing is implemented in a transform or filterbank domain.
(164) EEE54: The apparatus of any one of EEE49-EEE53, wherein the cross-talk cancellation algorithm is based, at least in part, on loudspeaker data.
(165) EEE55: The apparatus of any one of EEE49-EEE54, wherein the loudspeaker data comprise loudspeaker position data.
(166) EEE56: The apparatus of any one of EEE52-EEE55, wherein the amount of dynamic equalization or gain is based, at least in part, on acoustic environment data.
(167) EEE57: The apparatus of EEE56, wherein the acoustic environment data includes data that is representative of the direct-to-reverberant ratio at the intended listening position.
(168) EEE58: The apparatus of EEE56 or EEE57, wherein the dynamic equalization or gain is frequency-dependent.
(169) EEE59: The apparatus of EEE58, wherein the acoustic environment data is frequency-dependent.
(170) EEE60: The apparatus of any one of EEE49-EEE59, wherein the control system is further configured for summing the binaural signal pairs together to produce a summed binaural signal pair, wherein the cross-talk cancellation process is applied to the summed binaural signal pair.
(171) EEE61: An apparatus, comprising:
(172) means for receiving a first playback stream presentation and transform parameters;
(173) means for:
(174) decoding the first playback stream presentation, the first playback stream presentation being configured for reproduction on a first audio reproduction system;
(175) decoding the transform parameters, the transform parameters being suitable for transforming an intermediate playback stream into a second playback stream presentation, the second playback stream presentation configured for reproduction on headphones, wherein the intermediate playback stream presentation is at least one of the first playback stream presentation, a downmix of the first playback stream presentation, or an upmix of the first playback stream presentation;
(176) applying the transform parameters to the intermediate playback stream presentation to obtain the second playback stream presentation;
(177) processing the second playback stream presentation by a cross-talk cancellation algorithm to obtain a cross-talk-cancelled signal; and
(178) processing the cross-talk-cancelled signal by a dynamic equalization or gain stage in which an amount of equalization or gain is dependent on a level of the first playback stream presentation or the second playback stream presentation, to produce a modified version of the cross-talk-cancelled signal; and
(179) means for outputting the modified version of the cross-talk-cancelled signal.
(180) EEE62: The apparatus of EEE61, wherein the cross-talk cancellation algorithm is based, at least in part, on loudspeaker data.
(181) EEE63: The apparatus of EEE62, wherein the loudspeaker data comprise loudspeaker position data.
(182) EEE64: The apparatus of any one of EEE61-EEE63, wherein the amount of dynamic equalization or gain is based, at least in part, on acoustic environment data.
(183) EEE65: An apparatus, comprising:
(184) means for receiving a plurality of input audio signals and data corresponding to an intended position of at least some of the input audio signals;
(185) means for:
(186) generating a binaural signal pair for each input signal of the plurality of input signals, the binaural signal pair being based on an intended position of the input signal;
(187) applying a cross-talk cancellation process to the binaural signal pair to obtain a cross-talk cancelled signal pair;
(188) measuring a level of the cross-talk cancelled signal pair;
(189) measuring a level of the input audio signals; and
(190) applying a dynamic equalization or gain to the cross-talk cancelled signal pair in response to a measured level of the cross-talk cancelled signal pair and a measured level of the input audio, to produce a modified version of the cross-talk-cancelled signal; and
(191) means for outputting the modified version of the cross-talk-cancelled signal.
(192) EEE66: The apparatus of EEE65, wherein the dynamic equalization or gain is based, at least in part, on a function of time or frequency.
(193) EEE67: The apparatus of EEE65 or EEE66, wherein level estimates are based, at least in part, on summing the levels across channels or objects.
(194) EEE68: The apparatus of EEE67, wherein levels are based at least in part, on one or more of energy, power, loudness or amplitude.
(195) EEE69: The apparatus of any one of EEE65-EEE68, wherein the cross-talk cancellation algorithm is based, at least in part, on loudspeaker data.
(196) EEE70: The apparatus of any one of EEE65-EEE69, further comprising means for summing the binaural signal pairs together to produce a summed binaural signal pair, wherein the cross-talk cancellation process is applied to the summed binaural signal pair.
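The level-measurement and gain steps recited in EEE9, EEE29, EEE49 and EEE65 can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of WO 2018/132417: it assumes a single broadband gain per frame, chosen so that the energy of the cross-talk cancelled pair matches the energy of the input audio summed across channels or objects, and clamped to a hypothetical range. The function names and the `min_db` and `max_db` limits are introduced only for illustration.

```python
import math


def summed_energy(channels):
    """Energy summed across all channels (or objects) of one frame."""
    return sum(x * x for ch in channels for x in ch)


def dynamic_gain(input_channels, xtc_pair, min_db=-12.0, max_db=12.0, eps=1e-12):
    """Linear gain that moves the cancelled pair toward the input level."""
    e_in = summed_energy(input_channels)
    e_out = summed_energy(xtc_pair)
    gain_db = 10.0 * math.log10((e_in + eps) / (e_out + eps))
    gain_db = max(min_db, min(max_db, gain_db))  # assumed safety limits
    return 10.0 ** (gain_db / 20.0)


def apply_dynamic_gain(input_channels, xtc_pair):
    """Return the modified version of the cross-talk cancelled signal pair."""
    g = dynamic_gain(input_channels, xtc_pair)
    return [[g * x for x in ch] for ch in xtc_pair]
```

A frequency-dependent variant, as in EEE18, would apply the same comparison per band of a transform or filterbank representation instead of once per frame.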