Method for Audio Processing
20230134271 · 2023-05-04
Assignee
Inventors
CPC classification
H04S2420/01
ELECTRICITY
H04R5/04
ELECTRICITY
H04S7/30
ELECTRICITY
H04S5/00
ELECTRICITY
H04S3/008
ELECTRICITY
H04S2400/01
ELECTRICITY
International classification
H04S5/00
ELECTRICITY
H04R5/04
ELECTRICITY
Abstract
A method for audio processing, the method comprising: determining at least one input audio object that includes an input audio object signal and an input audio object location, wherein the input audio object location includes a distance and a direction relative to a listener location; depending on the distance, applying a delay, a gain, and/or a spectral modification to the input audio object signal to produce a first dry signal; depending on the direction, panning the first dry signal to the locations of a plurality of speakers around the listener location to produce a second dry signal; depending on one or more predetermined room characteristics, generating an artificial reverberation signal from the input audio object signal; mixing the second dry signal and the artificial reverberation signal to produce a multichannel audio signal; and outputting each channel of the multichannel audio signal by one of the plurality of speakers.
Claims
1. A method for audio processing, the method comprising: determining at least one input audio object that includes an input audio object signal and an input audio object location, wherein the input audio object location includes a distance and a direction relative to a listener location; depending on the distance, applying at least one of a delay, a gain, and a spectral modification to the input audio object signal to produce a first dry signal; depending on the direction, panning the first dry signal to locations of a plurality of speakers around the listener location to produce a second dry signal; depending on one or more predetermined room characteristics, generating an artificial reverberation signal from the input audio object signal; mixing the second dry signal and the artificial reverberation signal to produce a multichannel audio signal; and outputting each channel of the multichannel audio signal by one of the plurality of speakers.
2. The method of claim 1, further comprising applying a common spectral modification to adapt the input audio object signal to a frequency range generable by all speakers.
3. The method of claim 2, wherein the common spectral modification comprises a band-pass filter.
4. The method of claim 1, further comprising applying at least one of a spectral speaker adaptation and a time-dependent gain on a signal of at least one channel, and outputting the at least one channel by at least a height speaker comprised in the plurality of speakers.
5. The method of claim 1, further comprising: determining a sub-range of a spectral range of the input audio object signal; outputting, by one or more main speakers that are closer to a listener position than remaining speakers, a main playback signal including frequency components of the input audio object signal that correspond to the sub-range; and discarding the frequency components of the second dry signal that correspond to the sub-range.
6. The method of claim 5, wherein the sub-range comprises a part of the spectral range of the input audio object signal below a predetermined cutoff frequency.
7. The method of claim 5, wherein determining a cutoff frequency comprises: determining the spectral range of the input audio object signal, and calculating the cutoff frequency as an absolute cutoff frequency of a predetermined relative cutoff frequency relative to the spectral range.
8. The method of claim 5, wherein the main speakers are comprised in or attached to a headrest of a seat in proximity to the listener position.
9. The method of claim 5, further comprising outputting, by the main speakers, a mix, in particular a sum, of the main playback signal and the multichannel audio signal.
10. The method of claim 5, further comprising transforming the multichannel audio signal to be output by the main speakers by a head-related transfer function of a virtual source location at a greater distance to the listener position than a position of the main speakers.
11. The method of claim 5, further comprising transforming, by cross-talk cancellation, the multichannel audio signal to be output by the main speakers into a binaural main playback signal, wherein outputting the main playback signal comprises outputting the binaural main playback signal by at least two main speakers comprised in the plurality of speakers.
12. The method of claim 1, further comprising panning the artificial reverberation signal to the locations of the plurality of speakers.
13. An apparatus for generating the multichannel audio signal based on the method of claim 1.
14. A method for audio processing, the method comprising: receiving a plurality of input audio objects, and processing each of the plurality of input audio objects, generating an artificial reverberation signal by: generating an adjusted signal for each input audio object by modifying a gain for an input audio object signal depending on a corresponding distance; determining a sum of the adjusted signals; and processing the sum by a single-channel reverberation generator to generate the artificial reverberation signal.
15. The method of claim 14, wherein the input audio object indicates one or more of: a navigation prompt, a distance between a vehicle and an object outside the vehicle, a warning related to a blind spot around the vehicle, a warning of a risk of collision of the vehicle with an object outside the vehicle, and/or a status indication of a device attached to or comprised in the vehicle.
16. A method for audio processing, the method comprising: determining at least one input audio object that includes an input audio object signal and an input audio object location, wherein the input audio object location includes a distance and a direction relative to a listener location; depending on the distance, applying at least one of a delay, a gain, and a spectral modification to the input audio object signal to produce a first dry signal; depending on the direction, panning the first dry signal to locations of a plurality of speakers to produce a second dry signal; depending on one or more predetermined room characteristics, generating an artificial reverberation signal from the input audio object signal; mixing the second dry signal and the artificial reverberation signal to produce a multichannel audio signal; and outputting each channel of the multichannel audio signal by one of the plurality of speakers.
17. The method of claim 16, further comprising applying a common spectral modification to adapt the input audio object signal to a frequency range generable by all speakers.
18. The method of claim 17, wherein the common spectral modification comprises a band-pass filter.
19. The method of claim 16, further comprising applying at least one of a spectral speaker adaptation and a time-dependent gain on a signal of at least one channel, and outputting the at least one channel by at least a height speaker comprised in the plurality of speakers.
20. The method of claim 16, further comprising: determining a sub-range of a spectral range of the input audio object signal; outputting, by one or more main speakers that are closer to a listener position than remaining speakers, a main playback signal including frequency components of the input audio object signal that correspond to the sub-range; and discarding the frequency components of the second dry signal that correspond to the sub-range.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0063] The features, objects, and advantages of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference numerals refer to similar elements.
DETAILED DESCRIPTION
[0071] The signal is then split and processed, on the one hand, by one or more dry signal operations 108 and panning 116, and on the other hand, by generating an artificial reverberation signal 124.
[0072] The dry signal processing steps are described with respect to
[0073] In parallel to this, the input audio object signal is transformed into an artificial reverberation signal, 110, based on predetermined room characteristics. For example, as a room characteristic, a reverberation time constant may be provided. The artificial reverberation signal is then generated to decay in time such that the signal decays to, for example, 1/e of its initial amplitude, according to the reverberation time constant. If, for example, the method 100 is to be used to generate spatialized sound in a vehicle, then the reverberation parameters may be adapted to the vehicle interior. Alternatively, more sophisticated room characteristics may be provided, including a plurality of decay times. Transforming into an artificial reverberation signal may comprise the use of a feedback delay network (FDN) 112, as opposed to, for example, a convolutional reverberation generator. Implementing the generation of artificial reverberation by the FDN 112 allows flexibly adjusting the reverberation for different room sizes and types. Furthermore, the FDN 112 uses processing power efficiently and allows implementing non-static behavior. The reverberation is preferably applied once on the input audio object signal and then mixed equally into the channels at the output as set out below, i.e., the reverberation signal is preferably a single-channel signal. In an optional step 113, the single-channel signal can be panned over some or all of the speakers, which can make the rendering more realistic. All features related to the dry signal panning are applicable to panning the reverb signal. Alternatively, this step is omitted and panning is applied only to the dry signal, in order to reduce the computing workload.
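As an illustration of the FDN-based reverberation generation, the following Python sketch implements a minimal four-line feedback delay network with an orthogonal Hadamard feedback matrix. The delay lengths, sample rate, and 60 dB decay model are illustrative assumptions, not values taken from this disclosure:

```python
def fdn_reverb(x, delays=(149, 211, 263, 293), rt60=0.5, sr=16000):
    """Minimal 4-line feedback delay network (FDN) reverb sketch.

    Delay lengths, sample rate and rt60 are illustrative assumptions.
    Each line's feedback gain is set so the loop decays by 60 dB over
    rt60 seconds; a 4x4 Hadamard matrix (scaled by 1/2, hence
    orthogonal) mixes the line outputs back into the line inputs.
    """
    n = len(delays)
    # per-line feedback gain: 60 dB decay over rt60 seconds
    g = [10.0 ** (-3.0 * d / (rt60 * sr)) for d in delays]
    # orthogonal 4x4 Hadamard feedback matrix
    H = [[0.5, 0.5, 0.5, 0.5],
         [0.5, -0.5, 0.5, -0.5],
         [0.5, 0.5, -0.5, -0.5],
         [0.5, -0.5, -0.5, 0.5]]
    lines = [[0.0] * d for d in delays]  # circular delay-line buffers
    ptr = [0] * n
    out = []
    for s in x:
        reads = [lines[i][ptr[i]] for i in range(n)]
        out.append(sum(reads))  # single-channel reverb output
        mixed = [sum(H[i][j] * reads[j] for j in range(n)) for i in range(n)]
        for i in range(n):
            lines[i][ptr[i]] = s + g[i] * mixed[i]
            ptr[i] = (ptr[i] + 1) % delays[i]
    return out
```

Because the feedback matrix is orthogonal and every gain is below unity, the loop is stable and the tail decays smoothly, which is the efficiency advantage of the FDN over convolving with a long room impulse response.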
[0074] To produce a multichannel audio signal, the second dry signal and the artificial reverberation signal are mixed, 114, so that the multichannel audio signal is a combination of both. For example, a sum of both signals may be produced. Also, more complicated combinations are possible. For example, a weighted sum or a non-linear function that takes the second dry signal and the artificial reverberation signal as an input may be utilized.
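The simplest of the combinations mentioned above, a per-channel weighted sum of the second dry signal and the single-channel reverberation signal, can be sketched as follows; the weight values are illustrative assumptions:

```python
def mix(dry_channels, reverb, w_dry=1.0, w_rev=0.3):
    """Mix the second dry signal (one sample list per speaker channel)
    with a single-channel artificial reverberation signal by a
    weighted sum. The weights are illustrative assumptions; the text
    also allows non-linear combinations."""
    return [[w_dry * d + w_rev * r for d, r in zip(ch, reverb)]
            for ch in dry_channels]
```

For example, `mix([[1.0, 0.0], [0.0, 2.0]], [1.0, 1.0])` adds a scaled copy of the reverb tail to both channels, yielding `[[1.3, 0.3], [0.3, 2.3]]`.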
[0075] Outputting, 116, the multichannel audio signal via the speakers then generates an acoustic output signal that creates the impression to a listener at the listener position that the signal is coming from the input audio object location.
[0076] Determining the second dry signal and the artificial reverberation signal separately and in parallel allows generating a realistic representation of a far signal, while at the same time reducing the number of computational steps. In particular, the relative differences in delay and gain are produced by applying the corresponding transformations to the dry signal, thereby limiting the complexity of the method 100.
[0078] In optional steps 204 and 206, the signal is split, 204, into two frequency components. The frequency components are preferably complementary, i.e., each frequency component covers its own spectral range, and the spectral ranges together cover the entire spectral range of the input audio object signal. In a further exemplary embodiment, splitting the signal comprises determining a cutoff frequency and splitting the signal into a low-frequency component covering all frequencies below the cutoff frequency, and a high-frequency component covering the remainder of the spectrum.
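The complementary split can be sketched as follows. The one-pole low-pass is an illustrative assumption (the text only requires that the two components together cover the full spectrum); forming the high band as the residual guarantees that the two bands sum back to the input exactly:

```python
import math

def split_at_cutoff(x, fc, sr=16000):
    """Split a mono signal into complementary low/high bands (sketch).

    A one-pole low-pass at cutoff fc produces the low band; the high
    band is the residual x - low, so low + high reconstructs the
    input sample-exactly. Filter choice and sample rate are
    illustrative assumptions.
    """
    a = math.exp(-2.0 * math.pi * fc / sr)  # one-pole coefficient
    low, high, state = [], [], 0.0
    for s in x:
        state = (1.0 - a) * s + a * state   # low-pass state update
        low.append(state)
        high.append(s - state)              # complementary residual
    return low, high
```

The low band would then feed the main playback path and the high band the dry signal path, matching the split described in the following paragraph.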
[0079] Preferably, the low-frequency component is processed as a main audio playback signal, and the high-frequency component is processed as a dry signal. This entails that the high-frequency components are used for giving a directional cue to the listener. By contrast, the low-frequency components are represented in the main playback signal played by the main speakers, which are closer to the listener position. The gain is adjusted so that the full sound signal arrives at the listener position. For example, a user sitting in a chair at the listener position will hear essentially the full sound signal, with both high-frequency and low-frequency components, and will perceive the directional cues from the high-frequency component. By contrast, at any other position, the volume of the low-frequency component is lower, and anyone situated at these positions is prevented from hearing the entire signal. Thereby, users in the surroundings, such as passengers in a vehicle, are less disturbed by the acoustic signals, and a certain privacy of the signal is obtained. Use of the high-frequency component allows using smaller speakers for the spatial cues.
[0080] Alternatively, the input audio object signal (after optional common spectral modification) is copied to create two replicas, and the above splitting process is replaced by applying high-pass, low-pass, or band-pass filters after finishing the other processing steps.
[0081] The main audio playback signal may optionally be further processed by applying, 224, a head-related transfer function (HRTF). The HRTF, a technique of binaural rendering, transforms the spectrum of the signal such that the signal appears to come from a virtual source that is further away from the listener position than the main speaker position. This reduces the impression of the main signal coming from a source that is close to the ears. The HRTF may be a personalized HRTF. In this case, a user at the listener position is identified and a personalized HRTF is selected. Alternatively, a generic HRTF may be used to simplify the processing. In case two or more main speakers are used, a plurality of main audio playback channels is generated, each of which is related to a main speaker. The HRTF is then generated for each main speaker.
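Applying an HRTF ultimately amounts to convolving the signal with a pair of head-related impulse responses (HRIRs), one per ear, taken from a measured personalized set or a generic set. The sketch below shows only that convolution step; the HRIR data itself is assumed to be provided, and any short kernels used for testing are purely illustrative, not real HRTF measurements:

```python
def apply_hrtf(signal, hrir_left, hrir_right):
    """Render a mono signal binaurally by convolving it with a pair
    of head-related impulse responses (HRIRs). The HRIRs must come
    from a measured or generic HRTF set; this sketch only performs
    the convolution."""
    def conv(x, h):
        # direct-form convolution; output length len(x) + len(h) - 1
        y = [0.0] * (len(x) + len(h) - 1)
        for i, xi in enumerate(x):
            for j, hj in enumerate(h):
                y[i + j] += xi * hj
        return y
    return conv(signal, hrir_left), conv(signal, hrir_right)
```

Selecting HRIRs for a virtual source farther away than the physical main speaker position, as the paragraph describes, would then be a matter of choosing the appropriate entry from the HRTF set before convolving.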
[0082] If two or more main speakers are used, it is preferable to apply, 226, cross-talk cancellation. This includes processing each main audio playback channel such that the component reaching the more distant ear is less perceivable. In combination with the application of the HRTF, this allows the use of main speakers that are close to the listener position, so that the main signal is at a high volume at the listener position and at a lower volume elsewhere, and at the same time has a spectrum similar to that of a signal coming from further away.
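For a symmetric pair of main speakers, cross-talk cancellation can be sketched as inverting an assumed 2x2 head transfer matrix in which the path to the far ear is a delayed, attenuated copy of the near-ear path. The delay and attenuation values below are hypothetical, not from the disclosure:

```python
def crosstalk_cancel(left, right, d=8, a=0.7):
    """Cross-talk cancellation sketch for a symmetric speaker pair.

    Assumed model: the far-ear path equals the near-ear path delayed
    by d samples and attenuated by a (hypothetical values). The
    recursion below inverts that 2x2 mixing exactly, so that
    earL[t] = outL[t] + a*outR[t-d] reproduces left[t].
    """
    n = len(left)
    outL, outR = [0.0] * n, [0.0] * n
    for t in range(n):
        # subtract the predicted cross-talk from the opposite channel
        xL = outR[t - d] if t >= d else 0.0
        xR = outL[t - d] if t >= d else 0.0
        outL[t] = left[t] - a * xL
        outR[t] = right[t] - a * xR
    return outL, outR
```

Feeding the processed channels back through the assumed head transfer model reconstructs the original binaural signals at the ears, which is what suppresses the component reaching the more distant ear.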
[0083] It should be noted that steps 224 and 226 are optional. In a simplified embodiment, no main audio signal may be created, and no main speakers may be used. Rather, first dry signal processing and panning are applied to an unfiltered signal.
[0084] The single-channel modifications 208 comprise one or more of a delay 210, a gain 212, and a spectral modification 214. Applying, 210, a distance-dependent delay on the input audio object signal allows adjusting the relative timing of reverberation and dry signals to the delay observed in a simulated room having the predetermined room characteristics. There, under otherwise equal parameters, the delay of the dry signal is larger at a larger distance. The gain, 212, simulates the lower volume of the sound due to the increased distance, for example, by a power law. The distance-dependent spectral modification 214 accounts for attenuation of sound in air and preferably comprises a low-pass filter that simulates the absorption of sound waves in air, which is stronger for high frequencies.
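The distance-dependent single-channel modifications 208 can be sketched as follows. The inverse-distance gain law and the cutoff model of the air-absorption low-pass are illustrative assumptions; the disclosure only requires that delay, gain, and spectral modification depend on the distance:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def distance_modify(x, distance_m, sr=16000):
    """Apply distance-dependent delay, gain and air-absorption
    low-pass to a mono signal (sketch; gain law and cutoff model
    are assumptions, not taken from the text)."""
    # delay 210: propagation time converted to whole samples
    delay = int(round(distance_m / SPEED_OF_SOUND * sr))
    # gain 212: simple inverse-distance power law
    gain = 1.0 / max(distance_m, 1.0)
    # spectral modification 214: lower cutoff at larger distance
    fc = max(2000.0, 16000.0 / max(distance_m, 1.0))  # assumed model
    a = math.exp(-2.0 * math.pi * fc / sr)
    out, state = [0.0] * delay, 0.0
    for s in x:
        state = (1.0 - a) * (gain * s) + a * state  # one-pole low-pass
        out.append(state)
    return out
```

With these assumptions, a source at 10 m arrives later, quieter, and duller than the same source at 1 m, mirroring the three effects the paragraph describes.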
[0085] Panning, 216, the first dry signal to the speaker locations generates a multichannel signal, wherein one channel is generated for each speaker, and for each channel, the amplitude is set such that the apparent source of the sound is at a speaker or between two speakers. For example, if the input audio object location, seen from the listener location, is situated between two speakers, the multichannel audio signal is non-zero for these two speakers, and the relative volumes of these speakers are determined using the tangent law. This approach may further be modified by applying a multichannel gain control, i.e., multiplying the signals at each of the channels with a predefined factor. This factor can take into account specifics of the individual speaker, and of the arrangement of the speakers and other objects in the room.
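For a source between two speakers placed symmetrically at ±spread around the listener's forward direction, the tangent law yields the following power-normalized gain pair; the sign convention (positive angles toward the left speaker) is an assumption:

```python
import math

def tangent_pan(theta_deg, spread_deg=45.0):
    """Tangent-law gains for a source at theta_deg between two
    speakers at +/- spread_deg (sketch). Positive theta is taken
    toward the left speaker (assumed convention); the returned
    gains are power-normalized (gl**2 + gr**2 == 1)."""
    # tangent law: (gl - gr) / (gl + gr) = tan(theta) / tan(spread)
    r = math.tan(math.radians(theta_deg)) / math.tan(math.radians(spread_deg))
    gl, gr = 1.0 + r, 1.0 - r
    norm = math.hypot(gl, gr)
    return gl / norm, gr / norm
```

A centered source (theta = 0) gives equal gains of 1/sqrt(2) on both speakers, while a source exactly at a speaker position routes the full signal to that speaker alone.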
[0086] The optional path from block 216 to block 224 relates to the optional feature that the main speakers are used both for main playback and for playback of the directional cues. In this case, the main speakers are each accorded a channel in the multichannel output, and the main speakers are each configured to output an overlay, e.g., a sum, of the main and directional cue signals. For example, their low-frequency output may comprise the main signal, and their high-frequency output may comprise a part of the directional cues.
[0087] Optionally, the speakers may comprise height speakers. For example, the height speakers may comprise speakers that are installed above the height of the listener position, so as to be above a listener's head. For example, in a vehicle, the height speakers may be located above the side windows. The signal may be spectrally adapted, 218, so that it contains high frequencies. The signal may also be subject to a time-dependent gain, in particular an increasing gain, such as a fading-in effect. These steps make it less obvious to a listener that the speakers are above head height.
[0088] In order to account for specifics of the room, the gain of each speaker may optionally be adapted, 220. For example, objects in front of a speaker, such as seats, attenuate the sound generated by the speaker. In this case, the volume of that speaker should be set higher than that of the other speakers. This optional adaptation may comprise applying predetermined values but may also change as room characteristics change. For example, in a vehicle, the gain may be modified in response to a passenger being detected as sitting on a passenger seat, a seat position being changed, or a window being opened. In these cases, speakers for which a relatively minor part of the acoustic output reaches the listener position are subjected to an increased gain.
[0089] The signal is then sent to step 114, where it is mixed with the artificial reverberation signal.
[0091] The input audio object 300 comprises information on the type of audio that is to be played (input audio object signal 302), which may comprise any kind of audio signal, such as a warning sound, a voice, or music. It can be received in any format, but preferably the signal is included in a digital audio file or digital audio stream. The input audio object 300 further comprises an input audio object location 304, defined as a distance 306 and a direction 308 relative to the listener location. Execution of the method thereby permits rendering and playing the input audio object signal 302 such that a listener, located at the listener position, hears the sound as if it were coming from the input audio object location 304. For example, if the input audio object 300 is to comprise an indication of a malfunctioning component, then a stored input audio object signal 302 comprises a warning tone, and the direction 308 and distance 306 are given relative to the expected position of the head of a driver sitting in the driver's seat. Alternatively, when received from a collision warning system, the warning tone, direction 308, and distance 306 may represent the level of danger, direction, and distance associated with an obstacle outside the vehicle. For example, a warning system may detect another vehicle on the road and generate a warning signal whose frequency depends on the relative velocity or type of vehicle, and the direction 308 and distance 306 of the audio object location represent the actual direction and distance of the object.
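A minimal data structure mirroring the input audio object 300 might look as follows in Python; the field names are illustrative assumptions, and the direction is reduced to a single in-plane angle for simplicity:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InputAudioObject:
    """Container mirroring input audio object 300: a signal (302)
    plus a location (304) given as distance (306) and direction
    (308) relative to the listener. Field names are illustrative."""
    signal: List[float]   # input audio object signal 302 (mono samples)
    distance_m: float     # distance 306 to the listener location
    direction_deg: float  # direction 308, as a single in-plane angle
```

A warning-tone object at two meters, thirty degrees to the side, would then be constructed as `InputAudioObject(signal=tone, distance_m=2.0, direction_deg=30.0)` and handed to the rendering chain.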
[0092] The spectral range 310 of the input audio object signal covers all frequencies from the lowest to the highest frequency. It may be split into different components. In particular, a sub-range 312 may be defined, in order to use the main audio object signal at this sub-range, preferably after applying HRTF 224 and Cross-talk cancellation 226, as a main signal. A remaining part of the spectrum may be then used as a dry signal. In order to determine the sub-range 312, a cutoff frequency 314 may be determined, such that the sub-range covers the frequencies below the cutoff frequency 314.
[0093] The generation of the reverb signal is steered by one or more room characteristics 316, such as the reverberation time, the time and level of the early reflections, or the level of the reverberation.
[0094] The input audio object signal or the part of its spectrum not comprised in the sub-range 312 is processed by single-channel modifications 208 to generate the first dry signal 318, which is in turn processed by panning, 216, to generate the second dry signal 320. The reverberation signal 322 is generated based on the room characteristics 316 and mixed together with the second dry signal 320 to obtain the multichannel audio signal 324.
[0097] The speakers 412 may be located substantially in a plane. In this case, the apparent source is confined to the plane, and the direction comprised in the input audio object can then be specified as a single parameter, for example, an angle 514. Alternatively, the speakers may be located three-dimensionally around the listener position 512, and the direction can then be specified by two parameters, for example, azimuth and elevation angles.
[0098] In this embodiment, the speakers 412 comprise a pair of main speakers 502 in a headrest 504 of a seat (not shown), configured to output the multichannel audio signal 324, thereby creating the impression that the main audio playback comes from virtual positions 506. The speakers 412 further comprise a plurality of cue speakers 510. In an illustrative example, in a vehicle, the cue speakers may be installed at the height of the listener's (driver's) ears, e.g., in the front dashboard and front A pillars. However, other positions, such as the B pillars, vehicle roof, and doors, are also possible.
[0099] Additional height speakers 508 above the side windows generate sound coming from the sides. A height speaker is a device or arrangement of devices that sends sound waves toward the listener position from a point above the listener position. The height speaker may comprise a single speaker positioned higher than the listener, or a system comprising a speaker and a reflecting wall that redirects a sound wave to generate the appearance of the sound coming from above. The time-dependent gain may comprise a fading-in effect, where the gain of a signal is increased over time. This reduces the impression by the listener that the sound is coming from above and creates the impression of sound coming from a position at substantially the same height as the listener, although no speaker is in that position. A sound source location can thus be placed above a position that is obstructed or otherwise unavailable for placing a speaker, and the sound nonetheless appears to come from that position. In an illustrative example, in a vehicle, most speakers may be installed at the height of the listener's (driver's) ears, for example, in the A pillars, B pillars, and headrests.
[0101] The input equalizer 608 is configured to apply the common spectral modification 104 to adapt the input audio object signal to a frequency range generable by all speakers. The input equalizer may implement a band-pass filter.
[0102] The signal is then fed into a dry signal processor 610, a main signal processor 628, and a reverb signal processor 632.
[0103] The dry signal processor 610 comprises a distance equalizer 612 configured to apply a spectral modification that emulates sound absorption in air. The front speaker channel processor 614, main speaker channel processor 616, and height speaker channel processor 618 process each replica of the spectrally modified signal and are each configured to pan the corresponding signal over the speakers, to apply gain corrections, and to apply delays. The parameters of these processes may be different for the front, main, and height speakers. The signals for the main speakers, which are close to the listener position, are further processed by the HRTF and cross-talk cancelation 620, in order to create the impression of a signal originating from a more distant source. The three signals are then sent into high pass filters 622, 624, 626, so that the high-frequency directional cues are output by this part of the system.
[0104] The main signal processor 628 comprises a low pass filter 630 to create a main signal to be output by the main speakers. In other embodiments, the main signal processor may also comprise head-related transfer function and cross-talk cancelation sections, to create the impression that the main signal is coming from a more distant source.
[0105] The reverb signal processor 632 comprises a reverb generator 634, for example a feedback delay network, to generate a reverb signal based on its input. The reverb signal is then processed by additional reverb signal panning 636, to create the impression that the reverb is originated at the virtual source location. In different embodiments, additional optional steps may comprise application of spectral modifications to better simulate absorption of the reverb in air.
[0106] The signal combiner 638 mixes and sends the signals to the appropriate speakers 640. For example, the main speakers may receive a weighted sum of the dry signals treated by the main speaker channel processing 616, the main signal filtered by the low-pass filter 630, and the reverb signal. The height speakers may receive a weighted sum of the dry signals treated by the height speaker channel processing 618 and the reverb signal. The other speakers are, in this embodiment, front speakers. They may receive a weighted sum of the dry signals treated by the front speaker channel processing 614 and the reverb signal.
REFERENCE SIGNS
[0107] 100 Method for audio processing
[0108] 102-116 Steps of method 100
[0109] 200 Method for dry signal and main audio signal processing
[0110] 202-228 Steps of method 200
[0111] 300 Input audio object
[0112] 302 Input audio object signal
[0113] 304 Input audio object location
[0114] 306 Distance to a listener location
[0115] 308 Direction relative to a listener location
[0116] 310 Spectral range
[0117] 312 Sub-range of the main playback signal
[0118] 314 Cutoff frequency
[0119] 315 Main playback signal
[0120] 316 Room characteristics
[0121] 318 First dry signal
[0122] 320 Second dry signal
[0123] 322 Artificial reverberation signal
[0124] 324 Multichannel audio signal
[0125] 400 System
[0126] 402 Control section
[0127] 404 Input equalizer
[0128] 406 Dry signal processor
[0129] 408 Reverb generator
[0130] 410 Signal combiner
[0131] 412 Speakers
[0132] 500 Virtual source
[0133] 502 Main speakers
[0134] 504 Headrest
[0135] 506 Virtual source for main signal
[0136] 508 Height speakers
[0137] 510 Directional cue speakers
[0138] 512 Listener position
[0139] 514 Angle
[0140] 600 System
[0141] 602 Control section
[0142] 604 Distance control
[0143] 606 Direction control
[0144] 608 Input equalizer
[0145] 610 Dry signal processor
[0146] 612 Distance equalizer
[0147] 614 Front speaker channel processing
[0148] 616 Main speaker channel processing
[0149] 618 Height speaker channel processing
[0150] 620 Head-related transfer function and cross-talk cancelation
[0151] 622 High pass filter for front speakers
[0152] 624 High pass filter for main speakers
[0153] 626 High pass filter for height speakers
[0154] 628 Main signal processor
[0155] 630 Low pass filter
[0156] 632 Reverb signal processor
[0157] 634 Reverb generator
[0158] 636 Reverb signal panning
[0159] 638 Signal combiner
[0160] 640 Speakers