Audio signal processing method

09838823 · 2017-12-05

Abstract

Disclosed is an audio signal processing method. The audio signal processing method according to the present invention comprises the steps of: receiving a bit-stream including at least one of a channel signal and an object signal; receiving a user's environment information; decoding at least one of the channel signal and the object signal on the basis of the received bit-stream; generating the user's reproducing channel information on the basis of the user's received environment information; and generating a reproducing signal through a flexible renderer on the basis of at least one of the channel signal and the object signal and the user's reproducing channel information.

Claims

1. An audio signal processing method performed by an audio signal processing device, comprising: receiving a bit-stream including at least one of a channel signal and an object signal; receiving user environment information; decoding at least one of the channel signal and the object signal based on the received bit-stream; generating a reproduction signal through a flexible renderer based on the user environment information and at least one of the channel signal and the object signal; determining gain and delay in consideration of information on at least one of a speaker's position and a user's position; and applying the gain and delay to the reproduction signal, wherein the generating the reproduction signal generates a first reproduction signal in which the decoded channel signal and the decoded object signal are combined, using information about a user reproduction channel derived based on the user environment information, and wherein the generating the reproduction signal comprises: selecting three (3) channel signals that are adjacent to the object signal using position information of the object signal when the information about the user reproduction channel derived based on the user environment information corresponds to 22.2 channels; multiplying the object signal by a gain value; and combining the multiplied result with at least one of the selected channel signals.

2. The audio signal processing method of claim 1, further comprising: determining whether the user environment information corresponds to a range designated by a standard specification, wherein the generating the reproduction signal is performed by mapping at least one of the channel signal and the object signal to an available channel signal according to the user environment information when the user environment information does not correspond to the range designated by the standard specification.

3. The audio signal processing method of claim 1, wherein generating the reproduction signal generates a second reproduction signal in which the decoded channel signal and the decoded object signal are included, using information about a user reproduction channel derived based on the user environment information.

4. The audio signal processing method of claim 1, further comprising: generating information about a user reproduction channel, wherein the generating information about the user reproduction channel comprises distinguishing an object included in a space range, in which the object is reproducible based on a changed speaker position, from an object that is not included in the space range, in which the object is reproducible.

5. The audio signal processing method of claim 1, wherein selecting the channel signal comprises: selecting three (3) or fewer channel signals that are adjacent to the object signal when the information about the user reproduction channel derived based on the user environment information does not correspond to 22.2 channels; and multiplying the object signal by a gain value that is calculated using sound attenuation information according to a distance, and combining a result with the selected channel signal.

6. The audio signal processing method of claim 1, wherein: receiving the bit-stream comprises receiving a bit-stream further including object end information; and decoding at least one of the channel signal and the object signal comprises decoding the object signal and the object end information, using the received bit-stream and received user environment information, decoding further comprises: generating a decoding object list using the received bit-stream and the received user environment information; generating an updated decoding object list using the decoded object end information and the generated decoding object list; and transmitting the decoded object signal and the updated decoding object list to the flexible renderer.

7. The audio signal processing method of claim 6, wherein generating the updated decoding object list is configured to remove a corresponding item of an object that includes the object end information from the decoding object list that is generated from object information of a previous frame, and add a new object.

8. The audio signal processing method of claim 7, wherein generating the updated decoding object list comprises: storing a frequency of use of a past object; and substituting the past object with a new object using the stored frequency of use.

9. The audio signal processing method of claim 7, wherein generating the updated decoding object list comprises: storing a usage time of a past object; and substituting the past object with a new object using the stored usage time.

10. The audio signal processing method of claim 6, wherein the object end information is implemented by adding one or more bits of different additional information to an object sound source header according to a reproduction environment.

11. The audio signal processing method of claim 6, wherein the object end information is capable of reducing traffic.

Description

DESCRIPTION OF DRAWINGS

(1) FIG. 1 is a flowchart of an audio signal processing method according to the present invention;

(2) FIG. 2 is a view describing the format of an object group bit-stream according to the present invention;

(3) FIG. 3 is a view describing the process in which, in an object group, the number of objects to be decoded is selectively determined using user environment information;

(4) FIG. 4 is a view describing an embodiment of an object signal rendering method when the position of a user reproduction channel falls outside of the range designated by a standard specification;

(5) FIG. 5 is a view describing an embodiment in which an object signal according to the position of a user reproduction channel is decoded;

(6) FIG. 6 is a view for explaining the problem caused when a decoding object list is updated without transmission of an END flag, and for explaining the case in which empty space is present in the decoding object list;

(7) FIG. 7 is a view for explaining the problem caused when a decoding object list is updated without transmission of an END flag, and for explaining the case in which no empty space is present in the decoding object list;

(8) FIG. 8 is a view illustrating the structure of an object decoder including an END flag;

(9) FIG. 9 is a view describing the concept of a rendering method (VBAP) using multiple speakers; and

(10) FIG. 10 is a view describing an embodiment of an audio signal processing method according to the present invention.

BEST MODE

(11) The present invention is described in detail below with reference to the accompanying drawings. Repeated descriptions, as well as descriptions of known functions and configurations which have been deemed to make the gist of the present invention unnecessarily obscure, will be omitted below.

(12) The embodiments described in this specification are provided to allow those skilled in the art to comprehend the present invention more clearly. The present invention is not limited to the embodiments described in this specification, and the scope of the present invention should be construed as including the various equivalents and modifications that could replace the embodiments and configurations at the time at which the present application was filed. The terms in this specification and the accompanying drawings are used for ease of description, and the shapes and sizes of the elements shown in the drawings may be exaggerated. The present invention is not limited to the terms used in this specification or the accompanying drawings.

(13) In the following description, when the functions of conventional elements and the detailed description of elements related with the present invention may make the gist of the present invention unclear, a detailed description of those elements will be omitted.

(14) In the present invention, the following terms may be construed based on the following criteria, and terms which are not used herein may also be construed based on the following criteria. The term “coding” may be construed as encoding or decoding, and the term “information” includes values, parameters, coefficients, elements, etc., and the meanings thereof may be differently construed according to the circumstances, and the present invention is not limited thereto.

(15) Hereinafter, referring to the accompanying drawings, an audio signal processing method according to the present invention is described.

(16) FIG. 1 is a flowchart of an audio signal processing method according to the present invention.

(17) Described with reference to FIG. 1, the audio signal processing method according to the present invention includes: receiving a bit-stream including at least one of a channel signal and an object signal (S100), receiving user environment information (S110), decoding at least one of the channel signal and the object signal, based on the received bit-stream (S120), generating user reproduction channel information using the received user environment information (S130), and generating a reproduction signal through a flexible renderer, based on the user reproduction channel information and at least one of the channel signal and the object signal (S140).
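The flow of steps S100 through S140 can be sketched as follows; all function names, stub bodies, and data shapes here are illustrative assumptions, not taken from the specification:

```python
# Minimal sketch of the S100-S140 pipeline. The stubs below are
# hypothetical placeholders for the decoder and renderer stages.

def decode(bitstream):
    """S120: decode channel and/or object signals from the received bit-stream."""
    return bitstream.get("channels", []), bitstream.get("objects", [])

def derive_reproduction_channels(user_env):
    """S130: derive user reproduction channel information from environment info."""
    return user_env.get("speaker_layout", "5.1")

def flexible_render(channels, objects, layout):
    """S140: combine decoded signals into a reproduction signal for the layout."""
    return {"layout": layout, "num_inputs": len(channels) + len(objects)}

def process_audio(bitstream, user_env):
    ch, obj = decode(bitstream)                      # S100 + S120
    layout = derive_reproduction_channels(user_env)  # S110 + S130
    return flexible_render(ch, obj, layout)          # S140
```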

(18) Hereinafter, the audio signal processing method according to the present invention is described in more detail.

(19) FIG. 2 is a view describing the format of an object group bit-stream.

(20) Referring to FIG. 2, multiple object signals are grouped based on an audio feature, and the group generates a bit-stream 210.

The bit-stream of the object group is composed of a bit-stream of a signal DA, in which all objects are included, and individual object bit-streams. Each individual object bit-stream is generated from the difference between the DA signal and the signal of the corresponding object. Therefore, an object signal is acquired by adding the decoded DA signal to the signal obtained by decoding the corresponding individual object bit-stream.
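The group format described above can be illustrated with a small numeric sketch. It assumes lossless coding, so that encoding and decoding are identities (real codecs are lossy); the function names are hypothetical:

```python
# Sketch of the object-group format: a DA stream plus per-object
# residual streams (object minus DA). Lossless coding is assumed
# purely for illustration.

def encode_group(objects, da):
    """Produce the DA stream and one residual stream per object."""
    residuals = [[o - d for o, d in zip(obj, da)] for obj in objects]
    return da, residuals

def decode_object(da, residual):
    """Reconstruct an object: decoded DA plus decoded residual."""
    return [d + r for d, r in zip(da, residual)]

da = [1.0, 2.0, 3.0]
objects = [[1.5, 1.0, 3.5], [0.5, 2.5, 2.0]]
da_enc, residuals = encode_group(objects, da)
restored = [decode_object(da_enc, r) for r in residuals]
```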

(22) FIG. 3 is a view describing the process whereby, in an object group, the number of objects to be decoded is selectively determined using user environment information.

(23) As many object bit-streams as the number selected according to the input user environment information are decoded. If the number of user reproduction channels within the area formed by the position information of the received object group bit-stream is as high as that proposed by a standard specification, all of the objects (N objects) in the group are decoded. Otherwise, a signal (DA) in which all the objects are added, along with some object signals (K object signals), is decoded.

(24) The present invention is characterized in that the number of objects to be decoded is determined by the resolution of the user reproduction channel in the user environment information. Also, when the resolution of the user reproduction channel is low, a representative object in the group is used instead of decoding each of the objects. An embodiment for generating a signal that adds all the objects included in a group is as follows.
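The selective decoding rule above can be sketched as follows; the threshold comparison and return fields are illustrative assumptions:

```python
# Illustrative decision of which streams in an object group to decode:
# enough reproduction channels -> decode all N objects; otherwise
# decode the DA (sum) signal plus K selected objects.

def streams_to_decode(num_user_channels, standard_channels, n_objects, k):
    if num_user_channels >= standard_channels:
        return {"da": False, "objects": n_objects}  # decode every object
    return {"da": True, "objects": k}               # decode DA plus K objects
```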

(25) Attenuation over the distance between the representative object and each of the other objects in a group is computed according to Stokes' law, and the attenuated signals are added. If the first object is D1, the other objects are D2, D3, . . . , Dk, and a is a sound attenuation constant based on frequency and spatial density, the signal DA in which the objects of the group are added is given by the following Equation 1.
DA = D1 + D2·exp(−a·d1) + D3·exp(−a·d2) + . . . + Dk·exp(−a·d(k−1))   [Equation 1]

(26) In Equation 1 above, d1, d2, . . . , d(k−1) denote the distances between the first object and each of the other objects.

(27) The first object is determined to be either the object whose physical position is closest to the position of a speaker that is always present regardless of the resolution of the user reproduction channel, or the object that has the highest loudness level at that speaker.
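Equation 1 can be evaluated with a short sketch; the signal values and the attenuation constant below are illustrative:

```python
import math

def representative_sum(signals, distances, a):
    """Equation 1: DA = D1 + sum over i of Di * exp(-a * d(i-1)).

    signals:   list of per-object sample lists, signals[0] is D1
    distances: distance of each remaining object from the first object
    a:         sound attenuation constant (frequency/spatial-density based)
    """
    da = list(signals[0])
    for sig, d in zip(signals[1:], distances):
        w = math.exp(-a * d)  # exponential attenuation with distance
        da = [s0 + w * s for s0, s in zip(da, sig)]
    return da
```

With a = 0 every weight is 1, so DA reduces to the plain sum of all object signals, which is a quick sanity check on the implementation.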

(28) Also, when the resolution of a user reproduction channel is low, whether an object in a group is decoded is determined as follows: the object is decoded when its perceived loudness at the position of the closest reproduction channel is higher than a certain level. As a simpler alternative, an object may be decoded when the distance between the object and the position of a reproduction channel is smaller than a certain value.

(29) FIG. 4 is a view describing an embodiment of an object signal rendering method when the position of a user reproduction channel falls outside of the range designated by a standard specification.

(30) Specifically, referring to FIG. 4, it is confirmed that some object signals may not be rendered at desired positions when the position of a user reproduction channel falls outside of the range designated by a standard specification.

(31) In this case, unless the positions of speakers have changed, two object signals may generate sound staging at the given positions using three speakers by a VBAP technique. However, because of the change in the position of the reproduction channel, there is an object signal that is not included in a channel reproduction space range 410, which is the space range in which an object signal may be reproduced by VBAP.

(32) FIG. 5 is a view describing an embodiment in which an object signal according to the position of a reproduction channel is decoded. In other words, described is an object signal decoding method performed when the position of a user reproduction channel falls outside of the range designated by a standard specification, as illustrated in FIG. 4.

(33) In this case, an object decoder 530 may include an individual object decoder, a parametric object decoder, and the like. As a typical example of the parametric object decoder, there is Spatial Audio Object Coding (SAOC).

(34) Whether the position of a reproduction channel in user environment information corresponds to the range of a standard specification is checked, and if the position falls within the range, an object signal that has been decoded by an existing method is transmitted to a flexible renderer. However, if the position of the reproduction channel is very different from the standard specification, the channel signal to which the decoded object signal is added is transmitted to the flexible renderer, to obtain a reproduction channel.

(35) In a detailed embodiment according to the present invention, the step of determining whether the user environment information corresponds to the range designated by a standard specification includes determining whether it corresponds to a number of channels according to the standard specification (e.g., 22.2, 10.1, 7.1, or 5.1 channels). The step also includes rendering the decoded object. In this case, if the user environment information corresponds to a standard number of channels, the decoded object is rendered based on the corresponding standard channels; if not, the decoded object is rendered based on the next highest number of channels among the standard channel configurations. The step further includes transmitting the object, which has been rendered according to the standard channels, to a 3DA flexible renderer.
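The layout-selection step can be sketched as follows. The channel counts (24, 11, 8, and 6 for 22.2, 10.1, 7.1, and 5.1) and the reading of "next highest number of channels" as the smallest standard count above the user's count are assumptions:

```python
# Illustrative selection of a standard layout for object rendering.
STANDARD_LAYOUTS = {"22.2": 24, "10.1": 11, "7.1": 8, "5.1": 6}

def rendering_layout(user_channels):
    # Exact match with a standard configuration: render on it directly.
    exact = [name for name, count in STANDARD_LAYOUTS.items()
             if count == user_channels]
    if exact:
        return exact[0]
    # Otherwise pick the smallest standard count above the user's count.
    higher = [(count, name) for name, count in STANDARD_LAYOUTS.items()
              if count > user_channels]
    if higher:
        return min(higher)[1]
    return "22.2"  # more channels than any standard config: use the largest
```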

(36) In this case, because the object signal that is input to the 3DA flexible renderer corresponds to the standard channels, the 3DA flexible renderer is implemented by performing flexible rendering according to the position of a user, without rendering of the object.

(37) This implementation method has the effect of resolving the mismatch between the spatial precision of object rendering and that of channel rendering.

(38) An audio signal processing method according to the present invention discloses a technique for processing the audio signal of an object signal when the position of a user reproduction channel falls outside of the range designated by a standard specification.

(39) Specifically, after channel decoding and object decoding are performed using the received bit-stream and user environment information, when a change occurs in the position of a user reproduction channel, whether there is an object signal that may not generate sound staging in a desired position using a flexible rendering technique is checked. If such an object signal exists, the object signal is mapped to a channel signal and transmitted to a flexible renderer, and if not, the object signal is directly transmitted to the flexible renderer.

(40) Also, when an object signal is rendered in 3-dimensional space through a VBAP technique, there are an object signal Obj2, which falls within a channel reproduction space range 410, and an object signal Obj1, which falls outside of the channel reproduction space range 410, wherein the channel reproduction space range is a space range in which an object may be reproduced according to the changed position of a speaker, as in the embodiment of FIG. 4.

(41) Also, when the object signal is mapped to a channel signal, the closest channel signals are searched for using the position information of the object signal, the object signal is multiplied by an appropriate gain value, and the result is added to those channel signals.

(42) In this case, if the received user reproduction channel corresponds to 22.2 channels, the 3 closest channel signals are searched for, the object signal is multiplied by a VBAP gain value, and the result is added to the channel signal. If the user reproduction channel does not correspond to 22.2 channels, the 3 or fewer closest channels are searched for, the object signal is multiplied by a sound attenuation constant, which is based on frequency and spatial density, and by a gain value that decays exponentially with the distance between the object and the channel position, and the result is added to the channel signal.
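The channel-mapping rule for the non-22.2 case can be sketched as follows; this shows only the exponential distance gain and omits the VBAP gain computation of the 22.2 case. Positions, the constant a, and function names are illustrative:

```python
import math

def map_object_to_channels(obj_pos, obj_sig, channel_positions, a=0.5):
    """Map an object to its (up to) 3 nearest channels.

    Each selected channel receives the object signal scaled by
    exp(-a * distance), so the gain decays exponentially with distance.
    """
    ranked = sorted(
        (math.dist(obj_pos, pos), idx)
        for idx, pos in enumerate(channel_positions)
    )
    contributions = {}
    for d, idx in ranked[:3]:          # 3 or fewer closest channels
        gain = math.exp(-a * d)
        contributions[idx] = [gain * s for s in obj_sig]
    return contributions
```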

(43) FIG. 6 is a view for explaining the problem caused when a decoding object list is updated without transmission of an END flag, and for explaining the case in which empty space is present in the decoding object list. FIG. 7 is a view for explaining the problem caused when a decoding object list is updated without transmission of an END flag, and for explaining the case in which no empty space is present in the decoding object list.

(44) Described with reference to FIG. 6, empty spaces are present from the k-th position of a decoding object list. When a new object signal is added to the list, the decoding object list is updated by putting the object signal in the k-th space. However, if the decoding object list is filled up as illustrated in FIG. 7, when a new object is added to the list, the object substitutes for an arbitrary object in the list.

(45) Because the object being used is randomly substituted, the previous object signal cannot be used. This problem occurs whenever a new object is added.

(46) FIG. 8 is a view illustrating the structure of an object decoder including an END flag.

(47) Described with reference to FIG. 8, an object bit-stream is decoded to object signals through an object decoder 530. An END flag is checked in the decoded object information, and a result is transmitted to an object information update unit 820. The object information update unit 820 receives the past object information and the current object information, and updates the data in a decoding object list.

(48) An audio signal processing method according to the present invention is characterized in that an emptied decoding object list may be reused by transmitting an END flag.

(49) The object information update unit 820 removes an unused object from the decoding object list, and increases the number of decodable objects on the receiver side, which has been determined by user environment information.

(50) Also, by storing the frequency of use of the past object or the time of use of the past object, when there is no empty space in the decoding object list, the object having the lowest frequency of use or the earliest used object may be substituted with a new object.
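The list-update behavior with an END flag and least-frequently-used substitution can be sketched as follows; the class shape and field names are hypothetical:

```python
# Sketch of a decoding object list (FIGS. 6-8): an END flag frees a slot,
# a new object fills empty space, and when the list is full the object
# with the lowest frequency of use is substituted.

class DecodingObjectList:
    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = {}  # object id -> frequency of use

    def update(self, obj_id, end_flag):
        if end_flag:                          # object ended: free its slot
            self.slots.pop(obj_id, None)
        elif obj_id in self.slots:            # existing object: count a use
            self.slots[obj_id] += 1
        elif len(self.slots) < self.capacity: # empty space available
            self.slots[obj_id] = 1
        else:                                 # full: evict least-used object
            victim = min(self.slots, key=self.slots.get)
            del self.slots[victim]
            self.slots[obj_id] = 1
```

The same structure could track a last-used timestamp instead of a use count to implement the earliest-used substitution the text also mentions.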

(51) Also, the END flag check unit 810 checks whether the set END flag is valid by checking a single bit of information corresponding to the END flag. As another operation method, it is possible to verify whether the set END flag is valid according to a value obtained by dividing the length of a bit-stream of the object by 2. These methods may reduce the amount of information that is used to transmit the END flag.

(52) Hereinafter, referring to the drawing, an embodiment of an audio signal processing method according to the present invention is described.

(53) FIG. 10 is a view describing an embodiment of an audio signal processing method according to the present invention.

(54) Referring to FIG. 10, an object position calibration unit 1030 updates the position information of an object sound source for lip synchronization, using the previously measured positions of a screen and a user. The initial calibration unit 1010 and the user position calibration unit 1020 serve to directly determine constant values for a flexible rendering matrix, whereas the object position calibration unit 1030 calibrates the object sound source position information, which is used, together with the object sound source signal, as an input of the existing flexible rendering matrix.

(55) If the rendering of the transmitted object or channel signal is a relative rendering value based on a screen arranged to have a specific size at a specific position, then, when changed screen position information is received according to the present invention, the position of the object or channel to be rendered may be changed using the relative value between the changed screen position information and the initial screen information.

(56) To update object sound source information by the proposed method, depth information of an object that maintains a distance from a screen (or becomes far from or close to the screen) should be determined when content is generated, and should be included in the object position information.

(57) The depth information of an object may also be obtained using existing object sound source information and screen position information. The object position calibration unit 1030 updates the object sound source information by calculating the position angle of the object based on a user in consideration of both the depth information of the decoded object and the distance between the user and the screen. The updated object position information and the rendering matrix update information, which is calculated by the initial calibration unit 1010 and user position calibration unit 1020, are transmitted to the flexible rendering stage, and are used to generate a final speaker channel signal.
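One possible reading of the angle calculation is sketched below: the object's azimuth relative to the user is recomputed from its lateral screen offset, its depth relative to the screen plane, and the user-to-screen distance. This geometry is an assumption, not stated in the specification:

```python
import math

def object_azimuth_deg(lateral_offset_m, depth_m, user_to_screen_m):
    """Recompute an object's azimuth (degrees) relative to the user.

    lateral_offset_m: object's sideways offset from the screen center
    depth_m:          object depth; positive places it behind the screen plane
    user_to_screen_m: measured distance from the user to the screen
    """
    return math.degrees(math.atan2(lateral_offset_m,
                                   user_to_screen_m + depth_m))
```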

(58) Consequently, the proposed invention relates to a rendering technique for assigning an object sound source to each speaker output. In other words, gain and delay values for calibrating the localization of the object sound source are determined by receiving object header (position) information, including the time/spatial position information of the object, position information that represents the mismatch between a screen and a speaker, and the position/rotation information of a user's head.

(61) The audio signal processing method according to the present invention may be implemented as a program that can be executed by various computer means. In this case, the program may be recorded on a computer-readable storage medium. Also, multimedia data having a data structure according to the present invention may be recorded on the computer-readable storage medium.

(62) The computer-readable storage medium may include all types of storage media to record data readable by a computer system. Examples of the computer-readable storage medium include the following: ROM, RAM, CD-ROM, magnetic tapes, floppy disks, optical data storage, and the like. Also, the computer-readable storage medium may be implemented in the form of carrier waves (for example, transmission over the Internet). Also, the bit-stream generated by the above-described encoding method may be recorded on the computer-readable storage medium, or may be transmitted using a wired/wireless communication network.

(63) Meanwhile, the present invention is not limited to the above-described embodiments, and may be changed and modified without departing from the gist of the present invention, and it should be understood that the technical spirit of such changes and modifications also belong to the scope of the accompanying claims.

(64) The embodiment of the present invention is provided for allowing those skilled in the art to more clearly comprehend the present invention. Therefore, the shape and size of the elements shown in the drawings may be exaggeratedly drawn for clear description.

(65) It will be understood that, although the terms “first,” “second,” “A,” “B,” “(a),” “(b),” etc., may be used to describe components of the present invention, these terms are only used to distinguish one component from another component. Thus, the nature, sequence, or order of the components is not limited by these terms.