Method and system for processing an audio signal including ambisonic encoding

11432092 · 2022-08-30

Abstract

A method for processing a sound signal including synchronously acquiring an input sound signal S.sub.input by means of at least two omnidirectional microphones, and encoding the input sound signal S.sub.input in a sound data D format of the ambisonics type of order R, R being a natural number greater than or equal to one, the encoding step including a directivity optimisation sub-step carried out by means of filters of the Finite Impulse Response (FIR) filter type. Each of the signals acquired by the microphones is filtered during the directivity optimisation sub-step by a FIR filter, then subtracted from an unfiltered version of each of the other signals in order to obtain N enhanced signals. The present invention also relates to a system for processing the sound signal.

Claims

1. A method for processing a sound signal, the method comprising: synchronously acquiring an input sound signal by each of N omnidirectional microphones, N being a natural number greater than or equal to two; encoding said input sound signal in a sound data format of the ambisonics type of order R, R being a natural number greater than or equal to one, said encoding step comprising a directivity optimisation sub-step carried out by means of filters of the Finite Impulse Response (FIR) filter type, and said encoding step comprising a sub-step of creating an output sound signal in the ambisonics format from N enhanced signals derived from the directivity optimisation sub-step; rendering the output sound signal by means of a digital processing of said sound data; and during the directivity optimisation sub-step, it is subtracted from each of the N input sound signals acquired by the microphones the input sound signals acquired by the N−1 other microphones, each input sound signal acquired by the N−1 other microphones being filtered by a respective one of the FIR filters, in order to obtain the N enhanced signals, wherein the FIR filter applied during the directivity optimisation sub-step to each acquired signal is equal to the ratio of the Z-transform of the impulse response of the microphone associated with the signal object of the subtraction over the Z-transform of the impulse response of the microphone associated with the signal to be filtered then subtracted, for an angle of incidence associated with a direction to be deleted.

2. The method according to claim 1, wherein the N omnidirectional microphones are integrated into a device.

3. The method according to claim 2, wherein the device is a smartphone and wherein the method implements two microphones, each placed on one lateral edge of said smartphone.

4. The method according to claim 1, wherein the microphones are disposed in a circle on a plane, spaced apart by an angle equal to 360°/N.

5. The method according to claim 4, wherein the method implements four microphones spaced apart by an angle of 90° to the horizontal.

6. The method according to claim 1, wherein at least one Infinite Impulse Response (IIR) filter is applied to each of the enhanced signals during the directivity optimisation sub-step in order to correct the artefacts produced by the filtering operations using FIR filters.

7. The method according to claim 6, wherein the at least one IIR filter is a “peak” type filter, of which a central frequency, a quality factor and a gain in decibels can be configured to compensate for the artefacts.

8. The method according to claim 1, wherein the order R of the ambisonics type format is equal to one.

9. The method according to claim 1, wherein the creation of the output signal in the ambisonics format is carried out by algebraic operations performed on the enhanced signals derived from the directivity optimisation sub-step in order to create the different channels of said ambisonics format.

10. A system for processing a sound signal, the system comprising: acquiring, in a synchronous manner, an input sound signal by each of N microphones, N being a natural number greater than or equal to two; encoding said input sound signal in a sound data format of the ambisonics type of order R, R being a natural number greater than or equal to one; and rendering an output sound signal by means of a digital processing of said sound data; wherein said system for processing the sound signal includes means comprising Finite Impulse Response (FIR) filters for filtering each of the N input sound signals acquired by the microphones and subtracting from each of the N input sound signals acquired by the microphones the input sound signals acquired by the N−1 other microphones, each input sound signal acquired by the N−1 other microphones being filtered by a respective one of the FIR filters, in order to obtain N enhanced signals, wherein the FIR filter applied during the directivity optimisation sub-step to each acquired signal is equal to the ratio of the Z-transform of the impulse response of the microphone associated with the signal object of the subtraction over the Z-transform of the impulse response of the microphone associated with the signal to be filtered then subtracted, for an angle of incidence associated with a direction to be deleted.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The disclosure will be better understood from the following description and the accompanying figures. These are intended for purposes of illustration only and are not intended to limit the scope of the disclosure.

(2) FIG. 1 shows the different steps of the method according to the disclosure.

(3) FIG. 2 shows a smartphone equipped with two microphones acquiring an acoustic wave.

(4) FIG. 3 shows a block diagram of the sub-steps of optimising the directivity of the microphones and of creating the ambisonics format.

(5) FIG. 4 shows a block diagram for determining Infinite Impulse Response filters used during the directivity optimisation sub-step.

(6) FIG. 5 shows a device including two pairs of microphones, the two directions defined by the two pairs of microphones being orthogonal.

(7) FIG. 6 shows a block diagram for the optimisation of the Left channel in the aspect of the disclosure shown in FIG. 5 comprising four microphones.

(8) FIG. 7 shows a block diagram for the creation of the ambisonics format in the aspect of the disclosure shown in FIG. 5.

(9) FIG. 8 shows two pairs of microphones acquiring an acoustic wave, the two directions defined by the two pairs of microphones forming an angle of strictly less than 90°.

DETAILED DESCRIPTION

(10) With reference to FIG. 1, the present disclosure relates to a method 100 for processing a sound signal, comprising the following steps of: synchronously acquiring 110 an input sound signal S.sub.input by means of N microphones, N being a natural number greater than or equal to two; encoding 120 said input sound signal S.sub.input in a sound data D format of the ambisonics type of order R, R being a natural number greater than or equal to one; rendering 130 an output sound signal S.sub.output by means of digital processing of said sound data D.

(11) In the aspect of the disclosure described hereafter, the acquisition 110 is carried out with a number N of microphones equal to two, and the order R is equal to 1 (the ambisonics format is thus referred to as “B-format”). The channels of the B-format will be denoted in the description below by (W; X; Y; Z) according to usual practice, these channels respectively representing: the omnidirectional sound component (W); the Front-Back sound component (X); the Left-Right sound component (Y); the Up-Down sound component (Z).

(12) Acquisition 110 consists of a recording of the sound signal S.sub.input. With reference to FIG. 2, two omnidirectional microphones M.sub.1, M.sub.2, disposed at the periphery of a device 1, acquire an acoustic wave 2 of incidence θ relative to a straight line passing through the said microphones.

(13) In the shown aspect of the disclosure, the device 1 is a smartphone.

(14) The two microphones M.sub.1; M.sub.2 are considered herein to be disposed along the Y dimension. The reasonings that follow could be conducted in an equivalent manner while considering the two microphones to be disposed along the X dimension (Front-Back) or along the Z dimension (Up-Down), the disclosure not being limited by this choice.

(15) At the end of the acquisition step 110, two sampled digital signals are obtained. y.sub.g is used to denote the signal associated with the “Left channel” and recorded by the microphone M.sub.1 and y.sub.d is used to denote the signal associated with the “Right channel” and recorded by the microphone M.sub.2, said signals y.sub.g, y.sub.d constituting the input signal S.sub.input.

(16) $$S_{\text{input}} = \begin{pmatrix} y_g \\ y_d \end{pmatrix}$$

(17) As shown in FIG. 2, the microphone M.sub.1 first acquires the acoustic wave 2 originating from the left. The microphone M.sub.2 acquires it with a delay relative to the microphone M.sub.1. The delay is in particular the result of: a distance d between the two microphones; the presence of an obstacle, in this case the device 1, causing in particular reflection and diffraction phenomena.
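
The contribution of the distance d to this delay can be illustrated with a minimal sketch, assuming a far-field plane wave in free field, so that the reflection and diffraction caused by the device 1 are deliberately ignored. The speed of sound, the spacing and the sampling rate below are hypothetical values chosen for illustration and do not come from the disclosure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed value)

def free_field_delay(d_m, theta_deg, c=SPEED_OF_SOUND):
    """Arrival-time difference (s) between M1 and M2 for a far-field plane wave.

    theta_deg is measured from the straight line through the microphones,
    with 0 degrees meaning the wave arrives from the M1 (left) side.
    A positive value means M2 receives the wave after M1.
    """
    return d_m * math.cos(math.radians(theta_deg)) / c

def delay_in_samples(d_m, theta_deg, fs_hz):
    """Same delay expressed in (fractional) samples at sampling rate fs_hz."""
    return free_field_delay(d_m, theta_deg) * fs_hz

d = 0.15      # hypothetical microphone spacing in metres
fs = 48_000   # hypothetical sampling rate in Hz
delays = {theta: delay_in_samples(d, theta, fs) for theta in (0, 90, 180)}
```

In this simplified model the delay is largest for a wave travelling along the line of the microphones and vanishes for a wave arriving broadside. The frequency-dependent part of the delay described in the following paragraphs is not captured here.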

(18) When the acoustic wave 2 has a plurality of frequencies, the delay with which the microphone M.sub.2 acquires said acoustic wave depends on the frequency, in particular as a result of the presence of the device 1 between the microphones causing a diffraction phenomenon.

(19) Similarly, each frequency of the acoustic wave is attenuated in a different manner, as a result of the presence of the device 1 on the one hand, and on the other hand as a function of the directivity properties of the microphones M.sub.1, M.sub.2 dependent on the frequency.

(20) Moreover, since the microphones are both omnidirectional, they both reproduce the entire sound space.

(21) The aim thereafter is to differentiate the microphones M.sub.1 and M.sub.2 by virtually modifying their directivity through processing of the recorded digital signals, so that the modified signals can be combined to create the ambisonics format.

(22) FIG. 3 shows the processing operations applied to the digital signals obtained during the acquisition step 110, within the scope of the encoding step 120 of the method according to the disclosure.

(23) In a directivity optimisation sub-step 121, a filter F.sub.21(Z) is applied to the signal y.sub.g of the “Left channel”. The filtered signal is then subtracted from the signal y.sub.d of the “Right channel” by means of a subtractor.

(24) According to the disclosure, the filter F.sub.21(Z) is of the Finite Impulse Response (FIR) filter type. Such a FIR filter allows each of the frequencies to be handled independently, by modifying the amplitude and the phase of the input signal over each of the frequencies, and thus allows the effects resulting from the presence of the device 1 between the microphones to be compensated.

(25) By denoting as H.sub.1(Z, θ) and H.sub.2(Z, θ) the respective Z-transforms of the impulse responses of the microphones M.sub.1 and M.sub.2 when integrated into the device 1, in the direction of incidence given by the angle of incidence θ, the filter F.sub.21(Z) is determined by the relation:

(26) $$F_{21}(Z) = \frac{H_2(Z, \theta = 0^\circ)}{H_1(Z, \theta = 0^\circ)}$$

(27) The choice of a zero angle of incidence θ when determining the filter F.sub.21(Z) allows the sound component originating from the left to be isolated. Thus, after subtracting the signals, an enhanced signal y.sub.d* associated with the “Right channel”, from which the sound component originating from the left has been substantially deleted, is obtained.

(28) The directivity of the microphone M.sub.2 is thus virtually modified so as to essentially acquire the sounds originating from the right.
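
The filtering and subtraction described for the Right channel can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the impulse responses `h1_left` and `h2_left` are hypothetical toy values, and the ratio of Z-transforms is approximated by a frequency-domain division followed by truncation to `n_taps` coefficients, with a small regularisation term `eps` that is an assumption added to avoid division by zero.

```python
import numpy as np

def design_cancel_fir(h_keep, h_other, n_taps=256, eps=1e-12):
    """FIR approximation of H_keep(Z) / H_other(Z) for one angle of incidence.

    h_keep  : impulse response of the microphone whose signal is kept
    h_other : impulse response of the microphone whose signal is filtered
    """
    H_keep = np.fft.rfft(h_keep, n_taps)
    H_other = np.fft.rfft(h_other, n_taps)
    # Regularised division (eps is an assumption, not part of the source).
    ratio = H_keep * np.conj(H_other) / (np.abs(H_other) ** 2 + eps)
    return np.fft.irfft(ratio, n_taps)

def subtract_filtered(y_keep, y_other, fir):
    """Return y_keep minus the FIR-filtered y_other, same length as y_keep."""
    filtered = np.convolve(y_other, fir)[: len(y_keep)]
    return y_keep - filtered

# Hypothetical impulse responses for a wave arriving from the left (theta = 0).
h1_left = np.array([1.0, 0.5])             # microphone M1
h2_left = np.array([0.0, 0.0, 0.8, 0.3])   # microphone M2: delayed, attenuated

f21 = design_cancel_fir(h2_left, h1_left)  # F21 = H2 / H1 at theta = 0

rng = np.random.default_rng(0)
source = rng.standard_normal(1000)          # a source located on the left
y_g = np.convolve(source, h1_left)[:1000]   # Left channel, recorded by M1
y_d = np.convolve(source, h2_left)[:1000]   # Right channel, recorded by M2

# Enhanced Right channel: the left-originating component should vanish.
y_d_enhanced = subtract_filtered(y_d, y_g, f21)
residual = np.max(np.abs(y_d_enhanced))
```

With these toy responses the residual is negligible, because the only source lies exactly in the direction used to design the filter. With measured responses and sources in other directions the cancellation would only be approximate.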

(29) The same operation is carried out in a similar manner for the Left channel. Similarly, a filter F.sub.12(Z) is applied to the signal y.sub.d of the Right channel. The filtered signal is then subtracted from the signal y.sub.g of the “Left channel” by means of a subtractor. The filter F.sub.12(Z) is a FIR filter defined by the relation:

(30) $$F_{12}(Z) = \frac{H_1(Z, \theta = 180^\circ)}{H_2(Z, \theta = 180^\circ)}$$

(31) The choice of an angle of incidence θ equal to 180° when determining the filter F.sub.12(Z) allows the sound component originating from the right to be isolated. Thus, after subtracting the signals, an enhanced signal y.sub.g* associated with the “Left channel”, from which the sound component originating from the right has been substantially deleted, is obtained.

(32) The directivity of the microphone M.sub.1 is thus virtually modified so as to essentially acquire the sounds originating from the left.

(33) In practice, the filters F.sub.21(Z) and F.sub.12(Z) have properties of high-pass filters and their application produces artefacts. In particular, the frequency spectrum of the enhanced signals y.sub.g*, y.sub.d* is attenuated in the low frequencies and altered in the high frequencies.

(34) In order to correct these defects, at least one filter G.sub.1(Z), G.sub.2(Z) of the Infinite Impulse Response (IIR) filter type is applied to the enhanced signals y.sub.g* and y.sub.d* respectively.

(35) In order to determine the at least one filter G.sub.1(Z), G.sub.2(Z) to be applied, a white noise B is filtered by the previously determined filters F.sub.21(Z), F.sub.12(Z), as shown in FIG. 4. The filtered signals are then subtracted from the original white noise B. Comparing the profiles P, P′ of the output signals with the white noise B allows the one or more filters G.sub.1(Z), G.sub.2(Z) to be determined, which are applied to correct the alterations of the frequency spectrum resulting from the processing of the signals during the sub-step 121.
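
The white-noise comparison of FIG. 4 can be sketched as below. This is one possible way to obtain a profile, under stated assumptions: the spectra are estimated by averaging power spectra over non-overlapping segments, and the FIR filter `fir_delay` is a hypothetical pure 4-sample delay standing in for a measured F.sub.21(Z) or F.sub.12(Z). The function name and its parameters are illustrative only.

```python
import numpy as np

def magnitude_profile_db(fir, n_fft=256, n_segments=200, seed=0):
    """Estimate, in dB per frequency bin, how 'B minus filtered B' departs
    from the white noise B itself."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_fft * n_segments)          # white noise B
    output = noise - np.convolve(noise, fir)[: len(noise)]   # B - F * B
    # Average the power spectra over non-overlapping segments.
    spec_in = np.abs(np.fft.rfft(noise.reshape(n_segments, n_fft), axis=1)) ** 2
    spec_out = np.abs(np.fft.rfft(output.reshape(n_segments, n_fft), axis=1)) ** 2
    ratio = spec_out.mean(axis=0) / spec_in.mean(axis=0)
    return 10.0 * np.log10(ratio + 1e-20)

# Hypothetical FIR filter: a pure 4-sample delay (free-field toy model).
fir_delay = np.zeros(5)
fir_delay[4] = 1.0

profile_db = magnitude_profile_db(fir_delay)
# Gain, per frequency bin, that a correcting stage would need to flatten the profile.
correction_db = -profile_db
```

For this toy filter the profile shows a strong attenuation near zero frequency and a boost of about 6 dB around one eighth of the sampling rate, which is the kind of low-frequency loss and high-frequency alteration the correcting filters are meant to compensate.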

(36) In one aspect of the disclosure, the IIR filters are “peak” type filters, of which a central frequency fc, a quality factor Q and a gain G.sub.dB in decibels can be configured to correct the artefacts. Thus, an attenuated frequency could be corrected by a positive gain, an accentuated frequency could be corrected by a negative gain.
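
A "peak" filter with these three parameters can be sketched as a second-order IIR section. The disclosure does not give coefficient formulas, so the sketch below assumes the widely used peaking-equaliser biquad from the Audio EQ Cookbook. The function names, the sampling rate and the example settings are hypothetical.

```python
import numpy as np

def peak_filter_coeffs(fc_hz, q, gain_db, fs_hz):
    """Biquad 'peak' coefficients (b, a), normalised so that a[0] == 1.

    Assumed design: Audio EQ Cookbook peaking equaliser.
    """
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * fc_hz / fs_hz
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * amp, -2.0 * np.cos(w0), 1.0 - alpha * amp])
    a = np.array([1.0 + alpha / amp, -2.0 * np.cos(w0), 1.0 - alpha / amp])
    return b / a[0], a / a[0]

def response_db(b, a, f_hz, fs_hz):
    """Magnitude response of the biquad, in dB, at a single frequency."""
    z_inv = np.exp(-2j * np.pi * f_hz / fs_hz)
    powers = z_inv ** np.arange(3)
    return 20.0 * np.log10(np.abs(np.dot(b, powers) / np.dot(a, powers)))

def apply_biquad(b, a, x):
    """Filter x with the direct-form I difference equation of the biquad."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = b[0] * x[n]
        if n >= 1:
            y[n] += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            y[n] += b[2] * x[n - 2] - a[2] * y[n - 2]
    return y

# Hypothetical settings: lift an attenuated band around 200 Hz by 6 dB,
# and cut an accentuated band around 6 kHz by 6 dB.
fs = 48_000
b_boost, a_boost = peak_filter_coeffs(fc_hz=200.0, q=0.7, gain_db=6.0, fs_hz=fs)
b_cut, a_cut = peak_filter_coeffs(fc_hz=6000.0, q=2.0, gain_db=-6.0, fs_hz=fs)
```

With this design the gain at the central frequency equals the configured gain in decibels and the gain at zero frequency stays at 0 dB, so several such sections can be cascaded to correct separate bands.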

(37) Thus, after filtering by the at least one IIR filter G.sub.1(Z), G.sub.2(Z), a corrected signal Y.sub.G is obtained, representative of the sounds originating from the left and a corrected signal Y.sub.D is obtained, representative of the sounds originating from the right.

(38) Thereafter, with reference to FIG. 3, the output in ambisonics format is created 122.

(39) In order to obtain the omnidirectional component W of the sound signal, the corrected signals Y.sub.D, Y.sub.G are added and the result is normalised by multiplying by a gain K.sub.W equal to 0.5:

(40) $$W = \frac{Y_G + Y_D}{2}$$

(41) On the basis of the convention according to which the Y component is positive if the sound essentially originates from the left, the Left-Right sound component is obtained by subtracting the corrected signal Y.sub.D associated with the “Right channel” from the corrected signal Y.sub.G associated with the “Left channel”. The result is normalised by multiplying by a factor K.sub.Y equal to 0.5:

(42) $$Y = \frac{Y_G - Y_D}{2}$$

(43) Given that no information is known on the Front-Back and Up-Down components, the X and Z components are set to zero.

(44) At the end of the encoding step 120, data D in B-format is obtained (in the present aspect of the disclosure, the signals W and Y, the other signals X and Z being set to zero):

(45) $$D = \begin{pmatrix} \dfrac{Y_G + Y_D}{2} \\ 0 \\ \dfrac{Y_G - Y_D}{2} \\ 0 \end{pmatrix}$$

(46) The corrected signals Y.sub.G, Y.sub.D of the Left and Right channels respectively can be reproduced by adding and subtracting the signals W and Y:

(47) $$\begin{pmatrix} Y_G \\ Y_D \end{pmatrix} = \begin{pmatrix} W + Y \\ W - Y \end{pmatrix}$$
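
The creation of the B-format data from the two corrected signals, and the recovery of the Left and Right signals from W and Y, can be sketched as follows. The function names and the choice of storing the channels as rows in the order (W, X, Y, Z) are illustrative assumptions.

```python
import numpy as np

def encode_b_format_two_mics(y_left, y_right):
    """Return B-format data as rows (W, X, Y, Z) from corrected Y_G and Y_D."""
    y_left = np.asarray(y_left, dtype=float)
    y_right = np.asarray(y_right, dtype=float)
    w = 0.5 * (y_left + y_right)   # omnidirectional component, gain K_W = 0.5
    y = 0.5 * (y_left - y_right)   # Left-Right component, factor K_Y = 0.5
    zeros = np.zeros_like(w)       # X and Z are set to zero with two microphones
    return np.stack([w, zeros, y, zeros])

def decode_left_right(d):
    """Recover (Y_G, Y_D) from B-format rows (W, X, Y, Z)."""
    w, _, y, _ = d
    return w + y, w - y

# Toy corrected signals, for illustration only.
data = encode_b_format_two_mics([1.0, 2.0, 3.0], [1.0, 0.0, -1.0])
left, right = decode_left_right(data)
```

Because W + Y equals Y.sub.G and W − Y equals Y.sub.D, the round trip returns the corrected signals unchanged.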

(48) The rendering step 130 consists of rendering the sound signal, thanks to a transformation of the data in ambisonics format into binaural channels.

(49) In one method of implementing the disclosure, the data D in ambisonics format is transformed into data in binaural format.

(50) The disclosure is not limited to the aspect of the disclosure described hereinabove. In particular, the number of microphones used can be greater than two.

(51) In one alternative aspect of the method 100 according to the disclosure, four omnidirectional microphones M.sub.1, M.sub.2, M.sub.3, M.sub.4, disposed at the periphery of a device 1, acquire an acoustic wave 2 of incidence θ relative to a straight line passing through the microphones M.sub.1 and M.sub.2, as shown in FIG. 5.

(52) The two microphones M.sub.1; M.sub.2 are considered herein to be disposed along the Y dimension and the two microphones M.sub.3, M.sub.4 are considered herein to be disposed along the X dimension. The four microphones are disposed in a circle, shown by dash-dot lines in FIG. 5.

(53) At the end of the acquisition step 110, four sampled digital signals are obtained. The following denotations are applied: y.sub.g denotes the signal associated with the “Left channel” and recorded by the microphone M.sub.1; y.sub.d denotes the signal associated with the “Right channel” and recorded by the microphone M.sub.2; x.sub.av denotes the signal associated with the “Front channel” and recorded by the microphone M.sub.3; x.sub.ar denotes the signal associated with the “Back channel” and recorded by the microphone M.sub.4;
the said signals y.sub.g, y.sub.d, x.sub.av, x.sub.ar constituting the input signal S.sub.input:

(54) $$S_{\text{input}} = \begin{pmatrix} y_g \\ y_d \\ x_{av} \\ x_{ar} \end{pmatrix}$$

(55) With reference to FIG. 6, the directivity optimisation sub-step 121 is shown for this aspect of the disclosure. For clarity purposes, only the processing of the signal y.sub.g associated with the Left channel is shown.

(56) In this aspect of the disclosure, the enhanced signal y.sub.g* is obtained by subtracting the signals y.sub.d, x.sub.av and x.sub.ar respectively filtered by FIR filters F.sub.12(Z), F.sub.13(Z) and F.sub.14(Z) from the signal y.sub.g acquired by the microphone M.sub.1, which filters are defined by:

(57) $$F_{12}(Z) = \frac{H_1(Z, \theta = 180^\circ)}{H_2(Z, \theta = 180^\circ)} \qquad F_{13}(Z) = \frac{H_1(Z, \theta = 90^\circ)}{H_3(Z, \theta = 90^\circ)} \qquad F_{14}(Z) = \frac{H_1(Z, \theta = 270^\circ)}{H_4(Z, \theta = 270^\circ)}$$
where H.sub.1(Z, θ), H.sub.2(Z, θ), H.sub.3(Z, θ), H.sub.4(Z, θ) denote the respective Z-transforms of the impulse responses of the microphones M.sub.1, M.sub.2, M.sub.3, M.sub.4 when integrated into the device 1, for an angle of incidence θ.

(58) The choice of the angles of incidence 180°, 90°, 270° when determining the filters allows the sound components respectively originating from the right, from the front and from the back to be isolated.

(59) Thus, after subtracting the signals, an enhanced signal y.sub.g* associated with the “Left channel” is obtained, from which the sound components originating from the right, from the front and from the back have been substantially deleted.

(60) A filter G.sub.3(Z) of the IIR type is then applied to correct the artefacts generated by the filtering operations using FIR filters.

(61) At the end of this step, the corrected signal Y.sub.G is obtained.

(62) Similar processing operations can be applied to the signals of the Right, Front and Back channels, in order to respectively obtain the corrected signals Y.sub.D, X.sub.AV, X.sub.AR.
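
The subtraction applied to every channel can be sketched generically for N microphones. This is a minimal illustration under assumptions: the dictionary layout, the channel names and the single-tap filters are hypothetical placeholders, whereas in practice each FIR filter would be derived from the measured responses for the direction to be deleted, and the IIR correction would follow.

```python
import numpy as np

def enhance_all(signals, firs):
    """Subtract from each signal the FIR-filtered signals of all other channels.

    signals : dict mapping channel name -> 1-D array (all of equal length)
    firs    : dict mapping (kept, other) -> FIR taps applied to 'other'
              before it is subtracted from 'kept'
    """
    enhanced = {}
    for kept, y_kept in signals.items():
        out = np.array(y_kept, dtype=float)
        for other, y_other in signals.items():
            if other == kept:
                continue
            out -= np.convolve(y_other, firs[(kept, other)])[: len(out)]
        enhanced[kept] = out
    return enhanced

# Toy signals and placeholder single-tap filters, for illustration only.
signals = {
    "left": np.array([1.0, 2.0, 3.0]),
    "right": np.array([1.0, 1.0, 1.0]),
    "front": np.array([0.0, 1.0, 0.0]),
    "back": np.array([2.0, 0.0, 0.0]),
}
firs = {(k, o): np.array([0.5]) for k in signals for o in signals if k != o}
enhanced = enhance_all(signals, firs)
```

For four microphones this applies twelve filters in total, three per enhanced channel.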

(63) FIG. 7 describes the sub-step 122 of creating the ambisonics format in the aspect of the disclosure using four microphones described hereinabove.

(64) In order to obtain the omnidirectional component W of the sound signal, the corrected signals Y.sub.D, Y.sub.G, X.sub.AV, X.sub.AR are added and the result is normalised by multiplying by a gain K.sub.W equal to one quarter:

(65) $$W = \frac{Y_G + Y_D + X_{AV} + X_{AR}}{4}$$

(66) On the basis of the convention according to which the Y component is positive if the sound essentially originates from the left, the Left-Right sound component is obtained by subtracting the corrected signal Y.sub.D associated with the “Right channel” from the corrected signal Y.sub.G associated with the “Left channel”. The result is normalised by multiplying by the factor K.sub.Y equal to one half:

(67) $$Y = \frac{Y_G - Y_D}{2}$$

(68) On the basis of the convention according to which the X component is positive if the sound essentially originates from the front, the Front-Back sound component is obtained by subtracting the corrected signal X.sub.AR associated with the Back channel from the corrected signal X.sub.AV associated with the Front channel. The result is normalised by multiplying by the factor K.sub.X equal to one half:

(69) $$X = \frac{X_{AV} - X_{AR}}{2}$$
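
The three algebraic operations of the four-microphone aspect can be gathered in one short sketch. The function name and the row order (W, X, Y, Z) are illustrative assumptions. Setting Z to zero is also an assumption here, made because this aspect has no microphones along the Up-Down dimension.

```python
import numpy as np

def encode_b_format_four_mics(y_left, y_right, x_front, x_back):
    """Return B-format rows (W, X, Y, Z) from Y_G, Y_D, X_AV and X_AR."""
    y_left = np.asarray(y_left, dtype=float)
    y_right = np.asarray(y_right, dtype=float)
    x_front = np.asarray(x_front, dtype=float)
    x_back = np.asarray(x_back, dtype=float)
    w = 0.25 * (y_left + y_right + x_front + x_back)  # gain K_W = 1/4
    x = 0.5 * (x_front - x_back)                      # factor K_X = 1/2
    y = 0.5 * (y_left - y_right)                      # factor K_Y = 1/2
    z = np.zeros_like(w)                              # assumed: no Up-Down data
    return np.stack([w, x, y, z])

# Toy case: a sound present only in the corrected Left signal.
s = np.array([2.0, 4.0])
silence = np.zeros(2)
data4 = encode_b_format_four_mics(s, silence, silence, silence)
```

In this toy case the Y component is positive and the X component is zero, in line with the sign conventions stated above.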

(70) In one alternative aspect, the disclosure includes six microphones in order to integrate the Z component of the ambisonics format.

(71) In alternative aspects of the disclosure, the order R of the ambisonics format is greater than or equal to 2, and the number of microphones is adapted so as to integrate all of the components of the ambisonics format. For example, for an order R equal to two, eighteen microphones are implemented in order to form the nine components of the corresponding ambisonic format.

(72) The FIR filters applied to the signals acquired are adapted accordingly, in particular the angle of incidence θ considered for each filter is adapted so as to remove, from each of the signals, the sound components originating from unwanted directions in space.

(73) For example, with reference to FIG. 8, an angle φ between a direction Y through which the microphones M.sub.1 and M.sub.2 pass and a direction X′ through which the microphones M.sub.3 and M.sub.4 pass is strictly less than 90°.

(74) In this aspect of the disclosure, the filter applied to the signal recorded by M.sub.3 and subtracted from the signal acquired by M.sub.1 is given by:

(75) $$F_{13}(Z) = \frac{H_1(Z, \theta = \varphi)}{H_3(Z, \theta = \varphi)}$$

(76) In this manner, after subtracting the filtered signal from the signal acquired by M.sub.1, an enhanced signal is obtained from which the sound component in the X′ direction has been deleted.

(77) Thus, an ambisonics format of an order greater than or equal to two can be created by adding, for example, microphones in the directions such that φ=45°, φ=90° or φ=135°.

(78) The present disclosure further relates to a sound signal processing system, comprising means for: acquiring, in a synchronous manner, an input sound signal S.sub.input by means of N microphones, N being a natural number greater than or equal to two; encoding the said input sound signal S.sub.input in a sound data D format of the ambisonics type of order R, R being a natural number greater than or equal to one, said means being implemented using filters of the FIR type and using IIR filters of the “peak” type; rendering an output sound signal S.sub.output by means of a digital processing of said sound data D.

(79) This sound signal processing system comprises at least one computation unit and one memory unit.

(80) The above description of the disclosure is provided for the purposes of illustration only. It does not limit the scope of the disclosure.