Method for playing back the sound of a digital audio signal
09609454 · 2017-03-28
CPC classification
H04S2400/15 (ELECTRICITY)
H04R5/04 (ELECTRICITY)
H04S2400/11 (ELECTRICITY)
H04R2205/021 (ELECTRICITY)
H04S5/00 (ELECTRICITY)
H04S2400/05 (ELECTRICITY)
H04S3/008 (ELECTRICITY)
H04S2400/07 (ELECTRICITY)
International classification
H04S7/00 (ELECTRICITY)
H04S5/00 (ELECTRICITY)
H04R5/04 (ELECTRICITY)
Abstract
A method for playing back the sound of a digital audio signal comprising an oversampling step consisting of producing, from a signal sampled at a frequency F, a signal sampled at a frequency NF, where N corresponds to an integer greater than 1, then of applying convolution processing to a first digital file sampled at a frequency NF corresponding to the acquisition of the soundscape of a reference sound space, a second digital file sampled at a frequency NF corresponding to the acquisition of the noise footprint of a piece of reference playback equipment, a third digital file sampled at a frequency NF corresponding to the acquisition of the noise footprint of an equalizer and a fourth file corresponding to said oversampled audio file, the resulting digital packets then undergoing digital conversion processing at a sampling frequency F/M corresponding to the working frequency of the listening equipment.
Claims
1. A method for playing back the sound of a digital audio signal, comprising: executing an oversampling step, which consists in producing, from a signal sampled at a frequency F, a signal sampled at a frequency NF, where N corresponds to an integer greater than 1, then in applying convolution processing to a first digital file sampled at a frequency NF corresponding to the acquisition of the soundscape of a reference sound space, a second digital file sampled at a frequency NF corresponding to the acquisition of the noise footprint of a piece of reference playback equipment, a third digital file sampled at a frequency NF corresponding to the acquisition of the noise footprint of an equalizer and a fourth file corresponding to said oversampled audio file, with the resulting digital packets then undergoing digital conversion processing at a sampling frequency F/M corresponding to the working frequency of the listening equipment.
2. The method for playing back the sound of a digital audio signal according to claim 1, further comprising an additional step of recomputing the file corresponding to said noise footprint of the reference sound space, so as to change the balance between the space channels of said noise footprint.
Description
BRIEF DESCRIPTION OF THE DRAWING
(1) The invention will be better understood upon reading the following description, which refers to the appended drawings corresponding to non-limiting embodiments.
DETAILED DESCRIPTION
(3) The processing method according to the invention consists in producing different acoustic footprints of a sound source in order to convolve these various noise footprints.
(4) Convolution is a known technique in which the user captures, and then reproduces, the acoustic behavior of a location or a device. For example, convolution reverbs make it possible to use the acoustics of many real places, such as famous concert halls: these previously sampled acoustics can be reused at will within the program.
(5) In sound for picture, the first application envisaged for this capability was capturing the acoustics of film sets, in order to match the direct sound with the sounds added in post-production (post-synchronization, sound effects).
(6) The principle is to sample the acoustics of the sets where scenes of the film were shot, so that these acoustics can easily be applied to elements recorded afterwards and those elements blend seamlessly with the direct sound recordings.
(7) The impulse response of a piece of equipment or a room, which constitutes its noise footprint, is measured by deconvolution. The system is excited with a known signal, denoted f(t), chosen such that applying a transform to it (the deconvolution function) yields the Dirac function.
(8) The deconvolution function is so chosen that, for the excitation signal f(t) and any function h(t):
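$$G[f(t)] = \delta(t)$$

$$G[f(t) * h(t)] = G[h(t)] * f(t) = G[f(t)] * h(t) = h(t)$$

where * denotes convolution; the last equality follows from the first line, since \delta(t) * h(t) = h(t).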
(9) With this deconvolution function, the impulse response of a system can be obtained from its response to an excitation signal other than a Dirac pulse.
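The patent does not say how G is implemented. One common concrete choice, assumed here and not taken from the source, is regularized division in the frequency domain: G[r] = IFFT(R · conj(F) / (|F|² + eps)). The sketch below excites a toy system with white noise, convolves, and recovers h(t); `deconvolve_fft`, `eps` and the toy response are illustrative assumptions.

```python
import numpy as np

def deconvolve_fft(response, excitation, ir_length, eps=1e-12):
    """Estimate h from r = f * h by regularized spectral division (assumed choice of G)."""
    n = len(response)                        # full linear-convolution length
    R = np.fft.rfft(response, n)
    F = np.fft.rfft(excitation, n)           # zero-padded to the same length
    # R * conj(F) / (|F|^2 + eps) approximates R / F without dividing by ~0
    H = R * np.conj(F) / (np.abs(F) ** 2 + eps)
    return np.fft.irfft(H, n)[:ir_length]

rng = np.random.default_rng(0)
f = rng.standard_normal(4096)                          # white-noise excitation f(t)
h_true = np.array([1.0, 0.0, -0.5, 0.25, 0.0, 0.1])    # toy impulse response h(t)
r = np.convolve(f, h_true)                             # measured response f * h
h_est = deconvolve_fft(r, f, len(h_true))              # recovers h_true
```

The FFT length equals the full linear-convolution length, so the circular convolution implied by the FFT coincides with the linear one and the division is exact up to rounding.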
(10) To the ear, the signals used to capture impulse responses sound like Gaussian or white noise. The excitation sequences are generated by a deterministic algorithm, are periodic (with periods on the order of a few seconds to a few tens of seconds for this application), and form a pseudo-random signal.
(11) Such sequences are produced by linear feedback shift registers (LFSR). The order of such a register is its number of stages, and a maximal-length register cycles, over one period, through every possible nonzero binary state of that order: 2^n − 1 states for order n, i.e. 15 states for a fourth-order register (the all-zero state is excluded because it would lock the register). Persons skilled in the art know such sequences as MLS, for Maximum Length Sequence: the longest sequence of binary states the register can produce before any state repeats.
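As an illustration of the LFSR described above, the sketch below implements a fourth-order Galois LFSR with the primitive feedback polynomial x^4 + x^3 + 1 (toggle mask `0b1100`). The polynomial, the Galois form and the helper name `galois_lfsr_mls` are choices made for this example; the patent does not specify a register.

```python
def galois_lfsr_mls(order, toggle_mask, seed=1):
    """Return one period (2**order - 1 bits) of a Galois LFSR output
    and the list of register states visited during that period."""
    state = seed
    bits, states = [], []
    for _ in range(2 ** order - 1):
        states.append(state)
        lsb = state & 1              # output bit is the lowest stage
        bits.append(lsb)
        state >>= 1                  # shift the register
        if lsb:
            state ^= toggle_mask     # feed back into the tapped stages
    return bits, states

# Fourth-order register, feedback polynomial x^4 + x^3 + 1 (primitive)
bits, states = galois_lfsr_mls(order=4, toggle_mask=0b1100)
print(bits)              # [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0]
print(len(set(states)))  # 15: every nonzero 4-bit state appears exactly once
```

One period contains 2^(n−1) = 8 ones and 7 zeros, the balance property characteristic of an MLS.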
(12) The MLS owed its initial popularity to the simplicity of its deconvolution.
(13) Indeed, an MLS signal can be deconvolved with the Hadamard transform, which simplifies the calculations and can be computed with few resources.
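The fast Hadamard transform is an efficient way of computing the circular cross-correlation between the measured response and the MLS. For readability, the sketch below computes that same cross-correlation with the FFT instead (a swapped-in technique, not the Hadamard transform itself). It relies on the MLS autocorrelation property: with bits mapped to ±1, the circular autocorrelation is L at lag 0 and −1 elsewhere. The 15-sample sequence, the bit-to-sign mapping and the toy response are assumptions; a real measurement uses a much higher order.

```python
import numpy as np

def mls(order, toggle_mask, seed=1):
    """One period of a maximum length sequence from a Galois LFSR, mapped to +/-1."""
    state, bits = seed, []
    for _ in range(2 ** order - 1):
        lsb = state & 1              # output bit
        bits.append(lsb)
        state >>= 1                  # shift the register
        if lsb:
            state ^= toggle_mask     # feedback into the tapped stages
    return 1.0 - 2.0 * np.array(bits)   # bit 0 -> +1, bit 1 -> -1

def circular_xcorr(y, x):
    """c[k] = sum_n y[n] * x[(n - k) mod L], computed with the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(x))))

x = mls(order=4, toggle_mask=0b1100)     # 15-sample MLS, polynomial x^4 + x^3 + 1
L = len(x)
h_true = np.array([0.8, -0.3, 0.2, 0.05, 0.0, -0.1])   # toy impulse response
h_pad = np.zeros(L)
h_pad[: len(h_true)] = h_true

# Steady-state response to a periodically repeated MLS = circular convolution
y = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h_pad)))

# Autocorrelation is L at lag 0 and -1 elsewhere, so c[k] = (L + 1) h[k] - sum(h).
# Summing c over k gives sum(h), which removes that offset exactly.
c = circular_xcorr(y, x)
h_est = (c + c.sum()) / (L + 1)
```

In practice the offset term sum(h)/(L + 1) is often simply neglected for long sequences; it is corrected here so that the toy example recovers h exactly.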
(14) Another excitation signal is the so-called logarithmic or exponential sweep, which, as the name suggests, is a sine sweep whose frequency increases exponentially with time. The sweep therefore moves faster through the high frequencies than through the low frequencies, so its spectrum is that of pink noise (less energy is delivered at high frequencies because less time is spent there).
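A minimal sketch of such a sweep, assuming the widely used phase law φ(t) = 2π f1 T / ln(f2/f1) · (e^{t ln(f2/f1)/T} − 1), whose instantaneous frequency is f1 · e^{t ln(f2/f1)/T}. The band (20 Hz to 20 kHz), duration and sampling rate are illustrative values, not taken from the patent.

```python
import numpy as np

def exp_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz over `duration` seconds at rate fs."""
    t = np.arange(int(round(duration * fs))) / fs
    k = np.log(f2 / f1)
    phase = 2 * np.pi * f1 * duration / k * (np.exp(t * k / duration) - 1)
    inst_freq = f1 * np.exp(t * k / duration)   # d(phase)/dt divided by 2*pi
    return np.sin(phase), inst_freq

s, f_inst = exp_sweep(20.0, 20000.0, duration=2.0, fs=48000)

# The sweep spends the same time in every octave, hence its pink spectrum
seconds_per_octave = 2.0 * np.log(2) / np.log(20000.0 / 20.0)   # about 0.2 s
```

Because the time per octave is constant while each octave spans twice the bandwidth of the one below, the energy per hertz halves with each octave, which is the pink-noise slope mentioned above.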
(15) The measurements can be deconvolved in two ways. The first moves to the frequency domain to perform the computation before returning to the time domain. The second consists in linearly (non-periodically) convolving the recorded signal with the time-reversed excitation signal:
$$h(t) = r(t) * s(T - t)$$
(16) where r(t) is the recorded signal, s(t) is the excitation sweep and T is the sweep duration.
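A minimal sketch of this second method, convolving the recording with the time-reversed sweep exactly as the formula states. The system under test (a 40-sample delay plus one echo), the sweep parameters and the variable names are hypothetical. Plain time reversal leaves the pink weighting of the sweep in the result; a known refinement (Farina's inverse filter) also applies an amplitude envelope to the reversed sweep to compensate for it, which this sketch does not do.

```python
import numpy as np

fs, T, f1, f2 = 8000, 1.0, 50.0, 3000.0
t = np.arange(int(T * fs)) / fs
k = np.log(f2 / f1)
s = np.sin(2 * np.pi * f1 * T / k * (np.exp(t * k / T) - 1))   # excitation sweep s(t)

# Hypothetical system under test: direct path delayed by 40 samples plus one echo
h_true = np.zeros(200)
h_true[40] = 0.5
h_true[150] = 0.2
r = np.convolve(s, h_true)              # recorded signal r(t)

# h(t) = r(t) * s(T - t): linear (non-periodic) convolution with the time-reversed sweep
h_est = np.convolve(r, s[::-1])
lag0 = len(s) - 1                       # output index corresponding to zero delay
direct_delay = int(np.argmax(np.abs(h_est))) - lag0
```

The output peaks at `lag0 + 40`, recovering the delay of the direct path; the echo appears as a smaller peak 110 samples later.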
(17) This procedure offers two advantages: the non-linear distortions of the system are rejected and do not disturb the measurement of its linear impulse response; and the method tolerates slight desynchronization between playback and recording, so the sweep can be played from one device and recorded by another without the two machines being synchronized by a common clock.
(18) In the present invention, three noise footprints, or impulse responses, are captured: a noise footprint of a listening means, for example a headset; a noise footprint of an equalizer; and a noise footprint of a reference sound space.
(19) Each of these impulse responses is captured from a reference signal at a high sampling rate, above the nominal sampling frequency of the playback equipment.
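The abstract and claim 1 bring the audio signal from frequency F to NF so that it matches the footprints captured at the higher rate. A minimal sketch of integer-factor oversampling, assuming band-limited interpolation by zero-padding the spectrum (a swapped-in technique that is exact only for periodic signals; real-time converters typically use polyphase FIR filters). F = 48 kHz, N = 4 and the helper name `oversample_fft` are illustrative.

```python
import numpy as np

def oversample_fft(x, n_factor):
    """Band-limited oversampling by an integer factor via spectral zero-padding."""
    L = len(x)
    X = np.fft.rfft(x)
    Y = np.zeros(n_factor * L // 2 + 1, dtype=complex)
    Y[: len(X)] = X
    if L % 2 == 0:
        Y[L // 2] *= 0.5   # split the old Nyquist bin between +/- frequencies
    # irfft divides by n_factor * L while rfft did not scale, hence * n_factor
    return np.fft.irfft(Y, n_factor * L) * n_factor

F, N, L = 48000, 4, 480
n = np.arange(L)
x = np.sin(2 * np.pi * 7 * n / L)          # 700 Hz tone at F, whole number of cycles
y = oversample_fft(x, N)                   # same tone sampled at N * F = 192 kHz
reference = np.sin(2 * np.pi * 7 * np.arange(N * L) / (N * L))
```

The zero-padded spectrum adds no content above the original Nyquist frequency F/2, so the oversampled signal carries exactly the original band, now on a grid N times finer.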
(20) For example, the room footprint 3 is acquired from white noise over a duration greater than 500 milliseconds, preferably between one and two seconds, producing a 6 MB file per loudspeaker. The impulse response file is then losslessly compressed (for example with ZIP) and encrypted.
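A minimal sketch of the lossless ZIP step using Python's standard `zipfile` module, assuming the impulse response is stored as raw 32-bit floats. The file name and the decaying toy response are hypothetical, and since the patent does not name an encryption scheme, encryption is not shown.

```python
import io
import zipfile
from array import array

def zip_impulse_response(samples, name="room_ir_left.f32"):
    """Pack samples as 32-bit floats and store them in a deflated ZIP archive."""
    raw = array("f", samples).tobytes()
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, raw)
    return buf.getvalue(), raw

def unzip_impulse_response(blob, name="room_ir_left.f32"):
    """Read the archived file back into an array of 32-bit floats."""
    restored = array("f")
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        restored.frombytes(zf.read(name))
    return restored

# Hypothetical decaying impulse response (not measured data)
ir = [0.9 ** k for k in range(2000)]
blob, raw = zip_impulse_response(ir)
restored = unzip_impulse_response(blob)   # bit-identical to the stored floats
```

Deflate is lossless, so the restored bytes match the stored 32-bit samples exactly; the only precision loss is the initial conversion to 32-bit floats.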
(21) The footprint of the headphones 1 (or of a set of loudspeakers) is acquired in the same way, with a white or pink noise signal lasting about 200 milliseconds, preferably between 100 and 500 milliseconds.
(22) The footprint of the equalizer 2 is acquired in the same way, with a white or pink noise signal lasting about 200 milliseconds, preferably between 100 and 500 milliseconds, for each equalizer setting.
(23) The three impulse response files 1 to 3 and the digital file of the audio signal 4 undergo convolution processing 5 based on the fast Fourier transform (FFT).
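A minimal sketch of FFT-based convolution of the four files, using random toy signals in place of real footprints. Because convolution is associative, the three footprints can be combined once into a single response and then applied to the audio; whether the patent orders the operations this way is not stated, so this is an assumption. `fft_convolve` is a hypothetical helper, and a streaming player would normally process the audio in blocks (for example with overlap-add) rather than in one FFT.

```python
import numpy as np

def fft_convolve(a, b):
    """Linear convolution of two real signals via zero-padded real FFTs."""
    n = len(a) + len(b) - 1
    size = 1 << (n - 1).bit_length()      # next power of two >= n
    out = np.fft.irfft(np.fft.rfft(a, size) * np.fft.rfft(b, size), size)
    return out[:n]

rng = np.random.default_rng(1)
headset_ir = rng.standard_normal(64) * np.exp(-np.arange(64) / 8)     # file 1 (toy)
eq_ir = rng.standard_normal(32) * np.exp(-np.arange(32) / 4)          # file 2 (toy)
room_ir = rng.standard_normal(512) * np.exp(-np.arange(512) / 100)    # file 3 (toy)
audio = rng.standard_normal(4000)                                      # file 4 (toy)

# Convolution is associative: combine the three footprints once, then filter the audio
combined_ir = fft_convolve(fft_convolve(headset_ir, eq_ir), room_ir)
output = fft_convolve(audio, combined_ir)
```

Combining the footprints first means the audio passes through a single FFT convolution, and the combined response only needs recomputing when a footprint changes (for example a new equalizer setting).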
(24) To reduce computing time, a step 6 is executed that dynamically recalculates the left and right footprints according to the characteristics of the playback equipment and, where appropriate, the listener's sensory characteristics. For instance, an adjustment control can be provided to change the virtual spatial position. Changing this setting triggers the computation, by morphing, of a new pair of noise footprints from the originally supplied footprints: a central virtual loudspeaker and two footprints, for the right and left loudspeakers, are taken into account, and the left/right footprints are recomputed in real time to move the sound spot.
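The patent does not specify the morphing rule. As a hypothetical stand-in, the sketch below linearly crossfades between a stored central footprint and the left or right footprint according to a position parameter in [−1, 1]; `morph_footprint`, `position` and the toy footprints are illustrative names and values. A plain time-domain crossfade can cause comb filtering when the footprints differ mainly by delay, so this only illustrates the idea of deriving new footprints from the stored ones.

```python
import numpy as np

def morph_footprint(left_ir, center_ir, right_ir, position):
    """Hypothetical morphing rule: linear crossfade between stored footprints.
    position = -1 (left) ... 0 (center) ... +1 (right), clipped to that range."""
    p = float(np.clip(position, -1.0, 1.0))
    if p < 0:
        return (1 + p) * center_ir + (-p) * left_ir
    return (1 - p) * center_ir + p * right_ir

# Toy footprints: unit impulses at different delays
left = np.array([1.0, 0.0, 0.0, 0.0])
center = np.array([0.0, 1.0, 0.0, 0.0])
right = np.array([0.0, 0.0, 1.0, 0.0])
halfway_right = morph_footprint(left, center, right, 0.5)   # [0, 0.5, 0.5, 0]
```

Recomputing a footprint this way costs only a weighted sum per sample, which is compatible with the real-time update described above.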
(25) This function can be controlled by a gyroscopic sensor to move the sound spot dynamically according to the user's movements.
(26) This makes it possible to center the voice relative to the listener's head in real time.