RENDERING ENCODED 6DOF AUDIO BITSTREAM AND LATE UPDATES
20230171557 · 2023-06-01
CPC classification
H04S2400/15 (ELECTRICITY)
H04S7/302 (ELECTRICITY)
H04S2400/11 (ELECTRICITY)
H04N21/4728 (ELECTRICITY)
H04N21/8106 (ELECTRICITY)
G10L19/167 (PHYSICS)
H04N21/4394 (ELECTRICITY)
H04N21/4318 (ELECTRICITY)
Abstract
Examples of the disclosure relate to apparatus, methods and computer programs for enabling audio content rendering. An example apparatus comprises means for receiving a bitstream which comprises audio content; means for receiving dynamic content independent from the bitstream; means for receiving at least one instruction for the dynamic content from at least one of: the received bitstream or the received dynamic content; and means for rendering audio with a renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction. In an embodiment, the means for receiving the at least one instruction comprises means for determining the presence of at least one instruction for the dynamic content in the bitstream. When the bitstream does not comprise the at least one instruction for the received dynamic content, the apparatus comprises means for rendering audio with a renderer based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content. When the bitstream comprises the at least one instruction for the received dynamic content, the apparatus comprises means for rendering the audio with the renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction. In a further embodiment, the apparatus further comprises means for determining the positions of audio elements in the audio scene and of audio elements in the dynamic content. When the audio elements in the audio scene and the audio elements in the dynamic content are in a same acoustic environment, the apparatus comprises means for rendering audio with the renderer based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content.
When the audio elements in the audio scene and the audio elements in the dynamic content are not in the same acoustic environment, the apparatus comprises means for rendering the audio with the renderer based upon both the audio content of the bitstream and the received dynamic content. In an embodiment, the apparatus further comprises means for determining an anchor object in an audio scene; means for determining at least one instruction for dynamic content relative to the anchor object; and means for transmitting the audio scene in a bitstream, where the bitstream comprises the at least one instruction.
Claims
1-21. (canceled)
22. A method comprising: determining an audio content; determining dynamic content; and determining at least one instruction for the dynamic content.
23. The method as claimed in claim 22, further comprising rendering audio with a renderer based upon the audio content, the dynamic content, and the at least one instruction.
24. The method as claimed in claim 23, further comprising receiving at least one of: a bitstream, wherein the bitstream comprises the audio content; and the dynamic content independent from the bitstream.
25. The method as claimed in claim 24, wherein determining the at least one instruction for the dynamic content comprises receiving the at least one instruction from at least one of: the received bitstream; and the dynamic content.
26. The method as claimed in claim 22, wherein the dynamic content is at least one of: received at a renderer interface or as an MPEG-H Audio Stream packet; and arriving with a timestamp to enable association of the dynamic content with a playback timeline, or one or more bitstream content time segments.
27. The method as claimed in claim 24, further comprising at least one of: determining information regarding at least one anchor object in the dynamic content; associating the at least one anchor object in the dynamic content with at least one anchor object in the bitstream; and modifying a position of an audio element in the dynamic content whose position is defined relative to the at least one anchor object in the bitstream.
28. The method as claimed in claim 24, further comprising determining a spatial audio flag value in the dynamic content, and selecting to: when the spatial audio flag value is false, render dynamic content communication audio without any further acoustic modelling, or when the spatial audio flag value is true, render dynamic content communication audio with acoustic modelling according to the information in the bitstream.
29. The method as claimed in claim 24, further comprising determining position of an audio element in the audio content of the bitstream and an audio element in the dynamic content, and selecting to: when the audio element in the audio content and the audio element in the dynamic content are in a same acoustic environment, render audio based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content, or when the audio element in the audio content and the audio element in the dynamic content are not in the same acoustic environment, render audio based upon both the audio content of the bitstream and the received dynamic content.
30. The method as claimed in claim 24, further comprising determining position of an audio element in the audio content of the bitstream and an audio element in the dynamic content, and selecting to: modify a position of the audio element in the dynamic content by moving the audio element outside of an acoustic environment, or modify a position of the audio element in the dynamic content by moving the audio element together as a constellation.
31. The method as claimed in claim 24, further comprising determining presence of at least one instruction for the dynamic content, and selecting to: when the bitstream does not comprise the at least one instruction for the dynamic content, render audio based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content, or when the bitstream comprises the at least one instruction for the received dynamic content, render audio based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction.
32. The method as claimed in claim 24, further comprising determining position of audio elements in the audio content and audio elements in the dynamic content, and selecting to: when the audio elements in the audio content and the audio elements in the dynamic content are in a same acoustic environment, render audio based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content, or when the audio elements in the audio content and the audio elements in the dynamic content are not in the same acoustic environment, render audio based upon both the audio content of the bitstream and the received dynamic content.
33. The method as claimed in claim 22, wherein at least one of: determining the audio content comprises receiving audio content; and determining the dynamic content comprises receiving dynamic content.
34. The method as claimed in claim 22, further comprising: determining an anchor object in the audio content, wherein the audio content comprises an audio scene; determining the at least one instruction for the dynamic content relative to the anchor object; and transmitting the audio scene in a bitstream, where the bitstream comprises the at least one instruction.
35. An apparatus comprising: at least one processor; and at least one non-transitory memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: determine an audio content; determine dynamic content; and determine at least one instruction for the dynamic content.
36. The apparatus as claimed in claim 35, wherein the apparatus is further caused to receive at least one of: a bitstream, wherein the bitstream comprises the audio content; and the dynamic content independent from the bitstream.
37. The apparatus as claimed in claim 36, wherein, to determine the at least one instruction for the dynamic content, the apparatus is caused to receive the at least one instruction from at least one of: the received bitstream; and the dynamic content.
38. The apparatus as claimed in claim 35, wherein the dynamic content is at least one of: received at a renderer interface or as an MPEG-H Audio Stream packet; and arriving with a timestamp to enable association of the dynamic content with a playback timeline, or one or more bitstream content time segments.
39. The apparatus as claimed in claim 36, wherein the apparatus is further caused to at least one of: determine information regarding at least one anchor object in the dynamic content; associate the at least one anchor object in the dynamic content with at least one anchor object in the bitstream; and modify a position of an audio element in the dynamic content whose position is defined relative to the at least one anchor object in the bitstream.
40. The apparatus as claimed in claim 35, wherein the apparatus is further caused to at least one of: determine the audio content based on received audio content; and determine the dynamic content based on received dynamic content.
41. The apparatus as claimed in claim 35, wherein the apparatus is further caused to: determine an anchor object in the audio content, wherein the audio content comprises an audio scene; determine the at least one instruction for the dynamic content relative to the anchor object; and transmit the audio scene in a bitstream, where the bitstream comprises the at least one instruction.
Description
SUMMARY OF THE FIGURES
[0006] For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
[0019] EXAMPLES OF 6DOF RENDERING ADAPTATION
[0020] 1. One example may comprise re-alignment of the positions of audio elements, performed according to an approach specified by the content creator, when a dynamic audio scene change contains positions that differ from those in the bitstream content. For example, if a group of audio elements has been assigned (by the content creator, for example) to always belong to a common acoustic environment, the adaptation for a dynamic update may ensure that this condition is upheld. An acoustic environment is a space with certain acoustic characteristics and is defined in the MPEG-I encoder input format. In practice, the condition means that the audio elements must be located in a certain space as a whole group, so that no element of the group lies outside that space (e.g., with elements ending up in two different rooms separated by a wall).
[0021] 2. Another example may be based on the content creator intent in the bitstream, where the renderer controls the application of acoustic modelling for dynamic content comprising communication audio from a remote user in social Virtual Reality (VR). This is also applicable to social AR. In an example, the content creator may indicate in the bitstream that communication audio needs to be reverberated. If the communication audio is dry, the renderer may apply acoustic modelling to the signal according to the bitstream indication; if the communication audio is already reverberated (e.g., IVAS audio), no acoustic modelling is applied.
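The two adaptation examples above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MPEG-I renderer's logic: acoustic environments are modelled as axis-aligned boxes, and the names AcousticEnvironment, environment_of, group_in_common_environment and needs_acoustic_modelling are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class AcousticEnvironment:
    """Hypothetical stand-in for an EIF acoustic environment: an axis-aligned box."""
    env_id: str
    min_corner: Vec3
    max_corner: Vec3

    def contains(self, p: Vec3) -> bool:
        return all(lo <= c <= hi for c, lo, hi in zip(p, self.min_corner, self.max_corner))


def environment_of(p: Vec3, envs: Sequence[AcousticEnvironment]) -> Optional[str]:
    """Return the id of the first environment containing p, or None if it is outside all."""
    for env in envs:
        if env.contains(p):
            return env.env_id
    return None


def group_in_common_environment(positions: Dict[str, Vec3],
                                envs: Sequence[AcousticEnvironment]) -> bool:
    """Example 1: True only if every element of the group lies in one and the same environment."""
    env_ids = {environment_of(p, envs) for p in positions.values()}
    return len(env_ids) == 1 and None not in env_ids


def needs_acoustic_modelling(bitstream_requests_reverb: bool, already_reverberated: bool) -> bool:
    """Example 2: model acoustics only if the bitstream asks for it and the signal is dry."""
    return bitstream_requests_reverb and not already_reverberated
```

If group_in_common_environment returns False after a dynamic update, the renderer would re-align the group, for example by moving it as a whole into a single environment.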
[0022] As an example, this may be achieved by:
[0023] adding association and modification metadata to the MPEG-I audio bitstream. The association and modification metadata may be defined as a new entity, an “anchor-object”, in the encoder input format (EIF). The EIF may be incorporated in the bitstream by the MPEG-I audio encoder.
[0024] adding a new interface to the MPEG-I Audio renderer to ingest the dynamic content (i.e. audio data or information available ONLY during playback) for a new late binding adoption module in the MPEG-I Audio renderer. The information adoption module may perform adoption of dynamic content information as indicated by the anchor-object entity in the bitstream content. The dynamic content information may comprise instructions for determining rendering parameters.
[0025] In Augmented Reality (AR), for example, the positions of audio elements that relate to real-world or real-time features or objects become known only during content consumption. Rendering audio elements whose positions are not known during content creation (i.e. during the encoding or creation of the MPEG-I Audio bitstream, for example) is a challenge for the acoustic modelling of the audio source. This problem is not limited to audio content; it applies to all modalities that are relative to real-world features or objects (such as visual content, for example). For MPEG-I Audio Renderer implementations, however, support for such elements is specifically a required feature if they are to be useful in the AR domain. The coordinates, extent, etc. of the real-world objects which correspond to the MPEG-I audio elements may be known only at the time of content consumption or playback. Because this information is based upon real-world objects at render time, it may arrive just in time, for example from the sensors of the AR consumption device (e.g., acoustic environment information such as room geometry, materials, etc.); such information is also referred to herein as “dynamic content”. Features as described herein may be used to handle this real-world, real-time scenario from an audio rendering perspective. This relates to dynamic scene updates and AR evaluation, which is one of the two main categories agreed to be evaluated for the MPEG-I 6DoF Audio call for proposals.
[0026] In addition, there is currently no method available to render dynamic content in an acoustic scene which contains encoded content (with entirely known rendering properties such as position, orientation, acoustic properties, etc.). Consequently, rendering dynamic content that arrives at the renderer just in time during content consumption or playback, without the processing by an encoder that would determine appropriate rendering parameters, may lead to a poor match between the rendering of the dynamic content and that of the bitstream content. This would lead to poor subjective quality and adversely impact the user experience.
[0027] Features as described herein may be used to address MPEG-I requirements related to dynamic scene updates and Social VR (w18158, MPEG-I Audio Architecture and Requirements). For example:
[0028] Social VR
[0029] A specification may support rendering of speech and audio from other users in a virtual environment. The speech and audio may be immersive.
[0030] a. The specification may support low-latency conversation between users within a given virtual environment.
[0031] b. The specification may support low-latency conversation between a user within the given virtual environment and a user outside the given virtual environment.
[0032] c. The specification may enable synchronization of audio and video of users and the scene.
[0033] d. The specification may support metadata specifying restrictions and recommendations for rendering of speech/audio from the other users (e.g. on placement and sound level).
[0034] Implementation of the features described herein will now be described with reference to two embodiments: the first enables AR content consumption, and the second enables Social VR content consumption.
[0036] As illustrated with
[0037] Features may comprise AR sensing, as illustrated by 210. This may provide input to the association and modification block 208. In the renderer 206, output from the association and modification block 208 may be provided to the auralization 212. At least two pipelines may be provided: the dynamic rendering pipeline 602 and the bitstream rendering pipeline 600.
[0038] The anchor object description facilitates association of the dynamic content information with the audio entities, and their parameters, in the bitstream. The content consumption application may identify AR-enabled content by the presence of an indication in the received audio content. The indication of an AR-capable audio bitstream may be implemented as a file type in the header of the MPEG-H file format.
[0039] The current MPEG-H bitstream carries information in the sample table box to indicate whether it is a single file with an audio track consisting of a single-stream or multiple-stream MPEG-H bitstream (e.g., for single file playback), or a single- or multiple-stream streaming MPEG-H bitstream which can change its configuration at any sample (e.g., useful for streaming over DASH, MMT, etc.). Similarly, to indicate the presence of 6DoF VR-only content, a new MPEG-H bitstream containing 6DoF VR content may be labelled ‘mi6v’.
[0040] Box Types: ‘mi6v’, ‘mi6a’
[0041] Container: Sample Table Box (‘stbl’)
[0042] Mandatory: No
[0043] Quantity: One or more sample entries may be present
[0044] For 6DoF streaming or broadcast environments (based on MPEG-DASH or MPEG-H MMT, for example), the MPEG-H 3D Audio configuration may include 6DoF-metadata-capable packets, which may change at arbitrary positions of the stream and not necessarily only on fragment boundaries. To enable this use case, a new MHASampleEntry may be defined to indicate 6DoF rendering related metadata for MPEG-H 3D Audio files.
[0045] If the bitstream content is also enabled for use in AR, the sample entry may be ‘mi6a’, indicating an MPEG-H audio bitstream suitable for 6DoF rendering as well as AR consumption.
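A minimal sketch of how a player might detect the proposed sample entry types. It assumes the ISO base media file format layout, in which sample entries sit in a ‘stsd’ box inside the ‘stbl’ box: each box starts with a 32-bit big-endian size and a four-character type, and ‘stsd’ adds a version/flags word and a 32-bit entry count. ‘mi6v’ and ‘mi6a’ are the codes proposed above; the CAPABILITIES table and the function names are hypothetical.

```python
import struct
from typing import Dict, List

# Capabilities implied by the proposed sample entry types; unknown types add nothing.
CAPABILITIES: Dict[str, Dict[str, bool]] = {
    "mi6v": {"6dof": True, "ar": False},  # 6DoF VR-only content
    "mi6a": {"6dof": True, "ar": True},   # 6DoF rendering as well as AR consumption
}


def sample_entry_types(stsd_box: bytes) -> List[str]:
    """List the four-character codes of the sample entries in an ISOBMFF 'stsd' box.

    Simplification: 64-bit ('largesize') and size-0 boxes are not handled.
    """
    _size, box_type = struct.unpack(">I4s", stsd_box[:8])
    if box_type != b"stsd":
        raise ValueError("not an 'stsd' box")
    # 'stsd' is a FullBox: 1-byte version and 3-byte flags, then a 32-bit entry count.
    (entry_count,) = struct.unpack(">I", stsd_box[12:16])
    offset, types = 16, []
    for _ in range(entry_count):
        entry_size, entry_type = struct.unpack(">I4s", stsd_box[offset:offset + 8])
        types.append(entry_type.decode("ascii"))
        offset += entry_size
    return types


def capabilities(stsd_box: bytes) -> Dict[str, bool]:
    """Merge the capabilities signalled by all sample entries in the box."""
    caps = {"6dof": False, "ar": False}
    for entry_type in sample_entry_types(stsd_box):
        for key, value in CAPABILITIES.get(entry_type, {}).items():
            caps[key] = caps[key] or value
    return caps
```

A content consumption application could use such a check to decide whether to enable the AR ingestion path before starting playback.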
[0046] Another component for adding AR support may be a new interface in the MPEG-I 6DoF Audio renderer that ingests dynamic content comprising scene information obtained from the sensing apparatus 210 shown in
[0047] The dynamic content may be ingested, and the necessary rendering adaptation may be performed for the parameters defined in the bitstream content, such as per the content creator instructions in the bitstream 204 shown in
[0048] AR AnchorObjects
[0049] In one example embodiment, the positions of a set of AudioElements defined in the bitstream may be known only at rendering time. The bitstream may contain an AudioScene with at least the following information:
[0050] audio signals for AudioElements in the scene;
[0051] an AnchorObject with rendering instructions (see below);
[0052] AudioElement positions, which are defined relative to the position of the AnchorObject (the position of the AnchorObject may not be known at this point);
[0053] the position and dimensions of an AudioEnvironment (a room, for example). In some cases, the AudioEnvironment may not be in the bitstream, but input as a dynamic update.
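The scene structure described in [0049] to [0053] can be sketched as a small data model in which element offsets are carried in the bitstream and resolved once a dynamic update supplies the anchor position. This sketch assumes the offsets are pure translations from the anchor (anchor orientation is ignored), and the class and method names are hypothetical rather than EIF syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AnchorObject:
    anchor_id: str
    position: Optional[Vec3] = None  # unknown at encoding time; set by a dynamic update


@dataclass
class AudioScene:
    anchor: AnchorObject
    # AudioElement id -> offset relative to the anchor, as carried in the bitstream
    relative_positions: Dict[str, Vec3] = field(default_factory=dict)

    def apply_anchor_update(self, anchor_id: str, position: Vec3) -> None:
        """Associate a dynamic update with the bitstream anchor and store its position."""
        if anchor_id != self.anchor.anchor_id:
            raise KeyError(f"unknown anchor {anchor_id!r}")
        self.anchor.position = position

    def absolute_positions(self) -> Dict[str, Vec3]:
        """Resolve element positions; empty until the anchor position is known."""
        if self.anchor.position is None:
            return {}
        ax, ay, az = self.anchor.position
        return {eid: (ax + dx, ay + dy, az + dz)
                for eid, (dx, dy, dz) in self.relative_positions.items()}
```

Keeping the offsets separate from the resolved positions means a later anchor update only has to replace one vector.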
[0054] The rendering instructions in the AnchorObject may contain the following (as shown in
[0058] Example XML description of dynamic update adaptation information in the EIF is shown in
[0059] During rendering, the renderer may receive dynamic updates via a dynamic ingestion interface or as a new type of MPEG-H Audio Stream (MHAS) packet. The updates may include the position of the anchor object and/or the positions of surfaces (walls, floor, ceiling, etc.) in the current user environment. Thus, at this point the renderer may have 1) an audio scene in the bitstream, 2) rendering instructions for dynamic updates, also in the bitstream, and 3) a dynamic update at rendering time. Based on these, the renderer 206 shown in
[0085] The additions to the steps in the flowchart apply to all of the flowcharts included subsequently.
[0086] The AudioElements related to the anchor object may also be a multi-channel ObjectSource, which is handled by taking into account the CommonAcousticEnvironment and Deformable content creator instructions for rendering adaptation. Thus, if the multi-channel object does not fit in the single AcousticEnvironment, it may be shifted. However, if a flag such as “deformable==1” is set, the object may instead be compressed so that the entire object fits in the single AcousticEnvironment.
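A minimal sketch of the shift-or-compress behaviour, assuming the AcousticEnvironment is an axis-aligned box and the multi-channel ObjectSource is represented by its channel positions. The function name fit_object and the per-axis strategy (smallest translation that brings the object inside; compression about the object's centre along any axis where it is too wide and deformable) are illustrative assumptions, not the specified algorithm.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def fit_object(channels: List[Vec3], env_min: Vec3, env_max: Vec3,
               deformable: bool) -> List[Vec3]:
    """Shift (or, if deformable, compress) channel positions so they fit inside a box."""
    result = [list(p) for p in channels]
    for axis in range(3):
        lo = min(p[axis] for p in channels)
        hi = max(p[axis] for p in channels)
        elo, ehi = env_min[axis], env_max[axis]
        width, env_width = hi - lo, ehi - elo
        if width > env_width and deformable:
            # Compress the layout about its own centre so it spans exactly the environment.
            scale = env_width / width
            centre, env_centre = (lo + hi) / 2, (elo + ehi) / 2
            for p in result:
                p[axis] = env_centre + (p[axis] - centre) * scale
            continue
        if width > env_width:
            # Too wide and not deformable: centre it, accepting overhang on both sides.
            shift = (elo + ehi) / 2 - (lo + hi) / 2
        elif lo < elo:
            shift = elo - lo  # smallest move that brings the lower edge inside
        elif hi > ehi:
            shift = ehi - hi  # smallest move that brings the upper edge inside
        else:
            shift = 0.0
        for p in result:
            p[axis] += shift
    return [tuple(p) for p in result]
```

Translating rather than compressing preserves the channel spacing chosen by the content creator, which is why compression is reserved for the deformable case.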
[0087] In another example embodiment, the Update message as defined in EIF may be extended to allow updates via dynamic content in addition to the currently specified Updates. The currently specified updates may be triggered by a predetermined timestamp, by a condition (e.g., a location-based trigger) or by explicit user interaction (e.g., turning on the radio). An EIF Update may be similar to that described in clause 2.2 of MPEG-I 6DoF Audio Encoder Input Format, ISO/IEC JTC 1/SC 29/WG 11, N18979, Jan. 17, 2020, which describes Scene Updates: the declaration part in a scene.xml file may be followed by any number of <Update> nodes, which have the following syntax:
TABLE-US-00002
<Update>
Declares one or more changes to the audio scene. The update is performed when the specified time is reached, when the condition changes its state to the logical value expressed by fireOn, or when the update is triggered by its ID or index by an external entity. The fireOn parameter determines whether the update fires when the condition changes from false to true (fireOn = “true”) or from true to false (fireOn = “false”). This is helpful for if-else type conditional updates. An <Update> node has one or more <Modify> child nodes.
Child node | Count | Description
<Modify> | >= 1 | Modifications (see below)
Attribute | Type | Flags | Default | Description
id | ID | R | | Identifier
index | Integer | O | none | Index identifying the update (globally unique)
time | Value | O | none | Time when update is performed (seconds). Note: must be less than or equal to the duration attribute of the AudioScene.
condition | Condition ID | O | none | Condition
fireOn | Boolean | O | true | Update fires when this state is reached
delay | Float >= 0 | O | 0 | Postpone the update (seconds)
TABLE-US-00003
<Modify>
Declares a modification of modifiable parameters of a single entity. The target entity is selected by the id attribute. The following attributes must be attributes of the corresponding entity; the attribute values are assigned to the entity's property values. When the target entity also has attributes ‘transition’ or ‘duration’ (see below), these can be modified by specifying them twice in the modification: the first occurrence controls the modification parameter, while the second marks the destination value of the entity's property.
Example: <Modify id="src1" position="1 2 3" orientation="-20 5 0"/> sets the attributes position and orientation for the entity with ID src1.
Attribute | Type | Flags | Default | Description
id | ID | R | | Target entity to be modified
transition | Transition | O | continuous | Transition of values (see 4.13)
duration | Float >= 0 | O | 0 | Period for adapting from the current values to the new values (seconds)
* | * | * | * | Attribute of the target entity
[0088] Note that not every attribute can be changed. Only entities whose type specification allows for modification (labelled ‘M’) can be modified.
[0089] The following updates synchronously move three ObjectSources of a vehicle in motion along a trajectory.
TABLE-US-00004
<Update time="0.2">
  <Modify id="engine" position="2.2 1.7 -1.25" />
  <Modify id="tire1" position="2.2 1.7 0.75" />
  <Modify id="tire2" position="2.2 1.7 -0.95" />
</Update>
<Update time="0.4">
  <Modify id="engine" position="2.4 1.7 -1.20" />
  <Modify id="tire1" position="2.4 1.7 0.70" />
  <Modify id="tire2" position="2.4 1.7 -0.95" />
</Update>
...
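Timed updates like these can be parsed and applied with the Python standard library, as sketched below. The sketch handles only time-triggered updates with immediate position changes (it ignores transition and duration), assumes the <Update> nodes are wrapped in an <AudioScene> root element, and uses hypothetical function names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
TimedUpdate = Tuple[float, Dict[str, Vec3]]


def parse_timed_updates(xml_text: str) -> List[TimedUpdate]:
    """Collect (time, {entity id: new position}) from timed <Update> nodes, sorted by time."""
    root = ET.fromstring(xml_text)
    updates: List[TimedUpdate] = []
    for update in root.iter("Update"):
        if "time" not in update.attrib:
            continue  # condition- or externally triggered updates are not handled here
        moves: Dict[str, Vec3] = {}
        for modify in update.findall("Modify"):
            if "position" in modify.attrib:
                x, y, z = (float(v) for v in modify.attrib["position"].split())
                moves[modify.attrib["id"]] = (x, y, z)
        updates.append((float(update.attrib["time"]), moves))
    return sorted(updates, key=lambda u: u[0])


def positions_at(t: float, initial: Dict[str, Vec3],
                 updates: List[TimedUpdate]) -> Dict[str, Vec3]:
    """Positions at scene time t, applying each reached update immediately (no transitions)."""
    state = dict(initial)
    for time, moves in updates:  # sorted by time, so later updates overwrite earlier ones
        if time <= t:
            state.update(moves)
    return state
```

For the vehicle example, a query at t = 0.3 s places all three sources at their t = 0.2 positions, so they move synchronously as the text describes.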
[0090] The following example turns on the sources of a car when the listener gets close.
TABLE-US-00005
<Box id="geo:region1" position="5 0 -5" size="10 2 10" />
<ListenerProximityCondition id="cond:listenerNearCar" region="geo:region1" />
<!-- Turn on the engine sound 100ms after the listener entered the region. Smoothly activate the source within 50ms. -->
<Update condition="cond:listenerNearCar" delay="0.1">
  <Modify id="engine" transition="continuous" duration="0.05" active="true" />
</Update>
<!-- Turn on the other sources 100ms later from the engine -->
<Update condition="cond:listenerNearCar" delay="0.2">
  <Modify id="radio" transition="continuous" duration="0.2" active="true" />
  <Modify id="exhaust" transition="continuous" duration="0.1" active="true" />
</Update>
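The conditional example can be modelled as an edge-triggered update with a delay, following the fireOn and delay semantics in the <Update> table. This is a sketch under assumptions: the <Box> position is taken to be the box centre, the condition is assumed false before the first observation, and BoxRegion and ConditionalUpdate are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class BoxRegion:
    """Axis-aligned region; `position` is assumed to be the box centre."""
    position: Vec3
    size: Vec3

    def contains(self, p: Vec3) -> bool:
        return all(abs(c - centre) <= s / 2 for c, centre, s in zip(p, self.position, self.size))


@dataclass
class ConditionalUpdate:
    """Fires when its condition changes to `fire_on`, postponed by `delay` seconds."""
    delay: float = 0.0
    fire_on: bool = True
    _last_state: bool = field(default=False, init=False)
    _pending: List[float] = field(default_factory=list, init=False)

    def observe(self, now: float, state: bool) -> None:
        """Feed the current condition value; schedule a firing on the selected edge only."""
        if state != self._last_state and state == self.fire_on:
            self._pending.append(now + self.delay)
        self._last_state = state

    def due(self, now: float) -> int:
        """Return how many scheduled firings have become due, removing them from the queue."""
        ready = [t for t in self._pending if t <= now]
        self._pending = [t for t in self._pending if t > now]
        return len(ready)
```

Triggering on the edge rather than on the level means that a listener who stays inside the region does not re-trigger the engine update on every rendering tick.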
[0091] The scene loops with a period equal to the scene duration specified by the duration attribute of the AudioScene. Timed updates are triggered in every loop of the scene.
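Because the scene loops and timed updates are re-triggered in every loop, a renderer has to map each update's scene-relative time onto the playback timeline. A minimal sketch, assuming playback time is measured in seconds from the start of the first loop and that each call covers the interval since the previous rendering tick; fired_updates is a hypothetical helper.

```python
import math
from typing import List, Tuple


def fired_updates(update_times: List[Tuple[str, float]], duration: float,
                  t_prev: float, t_now: float) -> List[Tuple[str, float]]:
    """Timed updates whose looped trigger time falls in the interval (t_prev, t_now].

    Each update id is returned with its absolute firing time on the playback timeline.
    """
    fired = []
    for loop in range(math.floor(t_prev / duration), math.floor(t_now / duration) + 1):
        for uid, t in update_times:
            absolute = loop * duration + t
            if t_prev < absolute <= t_now:
                fired.append((uid, absolute))
    return sorted(fired, key=lambda f: f[1])
```

Using a half-open interval per tick ensures that an update is fired exactly once per loop even when a tick lands exactly on its trigger time.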
[0092] The proposed update in EIF may be as follows:
TABLE-US-00006
<Update api="<api id>">
  <Modify id=(int)(AnchorObject.ref_id) transition="immediate" position="<from API>" orientation="0,0,0" <timestamp> />
</Update>
[0093] The above will result in a message analogous to the following at the API:
[0094] {anchorObject.ref_id, X1, Y1, Z1, timestamp}
[0095] In the above, the timestamp can also be a sequence number to enable temporal association with the bitstream content.
[0096] For example, the renderer loop will apply the dynamic content to the correct temporal segment of the bitstream content. The timestamp is thus used to associate the update message with the appropriate playback timeline.
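The API message in [0094] and the temporal association in [0096] can be sketched as follows. The field order follows the message shown above; treating the timestamp as seconds on the playback timeline, and the list of segment start times used for the lookup, are assumptions (the text notes that a sequence number could be used instead). AnchorUpdate, parse_message and segment_index are hypothetical names.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AnchorUpdate:
    """Hypothetical in-memory form of {anchorObject.ref_id, X1, Y1, Z1, timestamp}."""
    ref_id: int
    x: float
    y: float
    z: float
    timestamp: float  # assumed: seconds on the playback timeline


def parse_message(fields: List[str]) -> AnchorUpdate:
    """Build an update from the five message fields, in the order shown in [0094]."""
    ref_id, x, y, z, ts = fields
    return AnchorUpdate(int(ref_id), float(x), float(y), float(z), float(ts))


def segment_index(segment_starts: List[float], timestamp: float) -> int:
    """Index of the bitstream time segment containing `timestamp` (starts sorted ascending)."""
    if not segment_starts or timestamp < segment_starts[0]:
        raise ValueError("timestamp precedes the first segment")
    return bisect.bisect_right(segment_starts, timestamp) - 1
```

The renderer loop could then hold each update until playback reaches the returned segment, rather than applying it as soon as it arrives.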
[0097] Dynamic Content for Social AR/VR
[0098] Referring also to
[0099] Social VR is another requirement for the MPEG-I Audio standard that may utilize dynamic content updates. An example schematic is presented in
[0100] Example XML description of dynamic update adaptation information in the EIF is shown in
[0111] MPEG Audio is in the process of standardizing a 6DoF Audio codec. Currently there is no support for:
[0112] AR scenarios
[0113] Social VR
[0114] These two are important requirements according to the MPEG-I 6DoF Audio Architecture and Requirements [w18158]. They are currently unsupported because there is no mechanism to incorporate information which is not available during content creation, for example:
[0115] the position of a real-world object, or the scene orientation, which may change during content consumption;
[0116] the position of a social VR remote participant, which may change during the consumption of 6DoF audio content.
[0117] All of the agreed scenes consist of content that is known entirely beforehand and is not expected to differ from the created content. In other words, there are no unknown parameters during the consumption or playback of 6DoF audio content. Audio scene information such as the audio element positions, orientations, etc. is all known beforehand in the encoder input format (EIF) which is used by an MPEG-I audio encoder.
[0118] Referring also to
[0119] Features as described herein may be provided with an example method comprising receiving a bitstream which comprises recorded audio content and at least one instruction for management or handling of dynamic content; receiving dynamic content separate from the bitstream, where the dynamic content comprises dynamic audio content; and rendering audio with a renderer based upon the recorded audio content of the bitstream, the received dynamic content, and the at least one instruction in the bitstream for management or handling of the dynamic content.
[0120] In some examples, the dynamic content may not contain any audio content: it can consist only of scene description changes or rendering parameter changes, without any audio data.
[0121] Although “recorded” is mentioned above, it should be noted that features as described herein may be used in real time, and the audio content can be transmitted (just like audio communication). The received instruction in the bitstream for the dynamic content may be received separately from the received dynamic content. The rendered audio may comprise the received audio content (as discussed above) and the dynamic content based on the received instruction(s). The management may comprise using or handling the dynamic content and the audio content (from the bitstream) together. The indication in the bitstream may be that a certain part of the audio scene may be rendered with the dynamic content. If a position update from the dynamic content would place an element in a different acoustic environment, the renderer may modify the rendering such that the audio rendering remains in the same acoustic environment while adapting to the new information.
[0122] Examples of what the dynamic content may comprise include (but are not limited to):
[0123] positions of audio elements to be rendered, which may be filtered, unfiltered, etc. (not necessarily with the same position filtering process that was applied to the bitstream content);
[0124] modified or new acoustic elements for acoustic modelling;
[0125] audio data (e.g., for social VR communication audio);
[0126] spatial extent and/or orientation of audio sources in the scene.
[0127] The received audio content in the bitstream may comprise, for example (but is not limited to):
[0128] audio data;
[0129] a scene description of the audio scene, which comprises:
[0130] acoustic environment information such as reflecting surfaces,
[0131] acoustic properties such as RT60, direct-to-reverberant ratio, etc.,
[0132] content creator intent,
[0133] EIF.
[0134] Regarding the ‘audio data’ that may appear in both the dynamic content and the audio content of the bitstream: the audio data in the bitstream content may be MPEG-H encoded audio data, for example, whereas the audio data in the dynamic content may be low-latency encoded content (such as AMR, EVS, IVAS, etc.).
[0135] An example embodiment may be provided with a method comprising: receiving a bitstream which comprises recorded audio content and at least one instruction for management of dynamic content; receiving dynamic content separately from, and independently of, the bitstream, where the dynamic content comprises dynamic audio content; and rendering audio with a renderer based upon the recorded audio content of the bitstream, the received dynamic content, and the at least one instruction in the bitstream for management of the dynamic content.
[0136] The received bitstream may comprise an audio scene. The received dynamic content may be received at a renderer interface or as a MPEG-H Audio Stream packet. The dynamic content update may arrive with a timestamp to enable association of the update with the playback timeline, or one or more bitstream content time segments. The method may further comprise determining information regarding at least one anchor object in the dynamic content. The method may further comprise associating the at least one anchor object in the dynamic content with at least one anchor object in the bitstream. The method may further comprise modifying a position of an audio element in the dynamic content whose position is defined relative to the at least one anchor object in the bitstream. The method may further comprise determining a spatial audio flag value in the dynamic content, and selecting to: when the spatial audio flag value is false, rendered dynamic content communication audio without any further acoustic modelling, or when the spatial audio flag value is true, render dynamic content communication audio with acoustic modelling according to the information in the bitstream. The method may further comprise determining position of an audio element in an audio scene of the bitstream and an audio element in the dynamic content, and selecting to: when the audio element in the audio scene and the audio element in the dynamic content are in a same acoustic environment, render audio with a renderer based upon the recorded audio content of the bitstream without adapting the recorded audio based upon the received dynamic content, or when the audio element in the audio scene and the audio element in the dynamic content are not in the same acoustic environment, render the audio with the renderer based upon both the recorded audio content of the bitstream and the received dynamic content. 
The method may further comprise determining position of an audio element in an audio scene of the bitstream and an audio element in the dynamic content, and selecting to: modify a position of the audio element in the dynamic content by moving the audio element outside of an acoustic environment, or modify a position of the audio element in the dynamic content by moving the audio element together with other audio elements of the dynamic content as a constellation.
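The two position modifications above, moving an element outside an acoustic environment and moving elements together as a constellation, can be sketched as follows. The sketch assumes, purely for illustration, that an acoustic environment is approximated by an axis-aligned box and that an element is moved across the nearest face plus a small margin; the `Box` type, `move_outside`, `move_constellation` and the `margin` parameter are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box:
    # Hypothetical axis-aligned approximation of an acoustic environment.
    lo: Vec3
    hi: Vec3

    def contains(self, p: Vec3) -> bool:
        return all(l <= c <= h for l, c, h in zip(self.lo, p, self.hi))


def move_outside(p: Vec3, env: Box, margin: float = 0.1) -> Vec3:
    """If p lies inside env, push it just past the nearest face by `margin`;
    positions already outside are returned unchanged."""
    if not env.contains(p):
        return p
    best_axis, best_target, best_dist = 0, p[0], float("inf")
    for axis in range(3):
        for target in (env.lo[axis] - margin, env.hi[axis] + margin):
            dist = abs(target - p[axis])
            if dist < best_dist:
                best_axis, best_target, best_dist = axis, target, dist
    moved = list(p)
    moved[best_axis] = best_target
    return tuple(moved)


def move_constellation(points: List[Vec3], delta: Vec3) -> List[Vec3]:
    """Translate all elements by the same delta so their relative layout is kept."""
    return [tuple(c + d for c, d in zip(p, delta)) for p in points]
```

Pushing across the nearest face keeps the displacement of a single element small, while translating a whole constellation by one shared offset preserves the spacing between, for example, several remote participants.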
[0137] An example embodiment may be provided with an apparatus comprising: at least one processor; and at least one non-transitory memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: receive a bitstream which comprises audio content and at least one instruction for dynamic content; receive dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; and render audio with a renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction in the bitstream.
[0138] An example embodiment may be provided with a non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine for performing operations, the operations comprising: receiving a bitstream which comprises audio content and at least one instruction for dynamic content; receiving dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; and rendering audio with a renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction in the bitstream.
[0139] An example embodiment may be provided with an apparatus comprising: means for receiving a bitstream which comprises audio content and at least one instruction for dynamic content; means for receiving dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; and means for rendering audio with a renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction in the bitstream.
[0140] An example embodiment may be provided with an apparatus comprising: circuitry configured to receive a bitstream which comprises audio content and at least one instruction for dynamic content; circuitry configured to receive dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; and circuitry configured to render audio with a renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction in the bitstream.
[0141] An example embodiment may be provided with a method comprising: receiving a bitstream which comprises recorded audio content; receiving dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; and determining presence of at least one instruction for management of dynamic content in the bitstream, and selecting to: when the bitstream does not comprise the at least one instruction for the received dynamic content, render audio with a renderer based upon the recorded audio content of the bitstream without adapting the recorded audio based upon the received dynamic content, or when the bitstream comprises the at least one instruction for the received dynamic content, render the audio with the renderer based upon the recorded audio content of the bitstream, the received dynamic content, and the at least one instruction.
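The presence check above reduces to a simple branch in the renderer's planning step. The following Python sketch assumes, hypothetically, that instructions are carried in the bitstream as a mapping keyed by a dynamic content identifier and that a render plan is just a list of source names plus the applicable instruction; `Bitstream`, `DynamicContent`, `RenderPlan` and `plan_render` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Bitstream:
    recorded_sources: List[str]
    # Hypothetical: instructions for dynamic content, keyed by a content identifier.
    dynamic_instructions: Dict[str, dict] = field(default_factory=dict)


@dataclass
class DynamicContent:
    content_id: str
    sources: List[str]


@dataclass
class RenderPlan:
    sources: List[str]
    instruction: Optional[dict]


def plan_render(bitstream: Bitstream, dynamic: DynamicContent) -> RenderPlan:
    instruction = bitstream.dynamic_instructions.get(dynamic.content_id)
    if instruction is None:
        # No instruction for this dynamic content: render the recorded audio unchanged.
        return RenderPlan(list(bitstream.recorded_sources), None)
    # Instruction present: render the recorded audio together with the dynamic content.
    return RenderPlan(bitstream.recorded_sources + dynamic.sources, instruction)
```

Keying the lookup by content identifier means that dynamic content the content creator did not anticipate is simply ignored rather than rendered with arbitrary parameters.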
[0142] An example embodiment may be provided with an apparatus comprising: at least one processor; and at least one non-transitory memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: receive a bitstream which comprises audio content; receive dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; and determine the presence of at least one instruction for dynamic content in the bitstream, and select to: when the bitstream does not comprise the at least one instruction for the received dynamic content, render audio with a renderer based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content, or when the bitstream comprises the at least one instruction for the received dynamic content, render the audio with the renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction.
[0143] An example embodiment may be provided with an apparatus comprising a non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine for performing operations, the operations comprising: receiving a bitstream which comprises audio content; receiving dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; and determining presence of at least one instruction for dynamic content in the bitstream, and selecting to: when the bitstream does not comprise the at least one instruction for the received dynamic content, render audio with a renderer based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content, or when the bitstream comprises the at least one instruction for the received dynamic content, render the audio with the renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction.
[0144] An example embodiment may be provided with an apparatus comprising: means for receiving a bitstream which comprises audio content; means for receiving dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; and means for determining presence of at least one instruction for dynamic content in the bitstream, and selecting to: when the bitstream does not comprise the at least one instruction for the received dynamic content, render audio with a renderer based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content, or when the bitstream comprises the at least one instruction for the received dynamic content, render the audio with the renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction.
[0145] An example embodiment may be provided with an apparatus comprising: circuitry configured to receive a bitstream which comprises audio content; circuitry configured to receive dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; and circuitry configured to determine the presence of at least one instruction for dynamic content in the bitstream, and select to: when the bitstream does not comprise the at least one instruction for the received dynamic content, render audio with a renderer based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content, or when the bitstream comprises the at least one instruction for the received dynamic content, render the audio with the renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction.
[0146] An example embodiment may be provided with a method comprising: receiving a bitstream which comprises an audio scene with recorded audio content; receiving dynamic content separate from the bitstream, where the dynamic content comprises dynamic audio content; and determining position of audio elements in the audio scene and audio elements in the dynamic content, and selecting to: when the audio elements in the audio scene and the audio elements in the dynamic content are in a same acoustic environment, render audio with a renderer based upon the recorded audio content of the bitstream without adapting the recorded audio based upon the received dynamic content, or when the audio elements in the audio scene and the audio elements in the dynamic content are not in the same acoustic environment, render the audio with the renderer based upon both the recorded audio content of the bitstream and the received dynamic content.
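The acoustic environment test above can be sketched in Python under two loudly stated assumptions: each acoustic environment is approximated by a named axis-aligned box, and "in a same acoustic environment" is read as "at least one scene element and one dynamic element fall inside the same box", with positions outside every box ignored. `Box`, `environment_of` and `adapt_to_dynamic_content` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box:
    # Hypothetical axis-aligned approximation of an acoustic environment.
    lo: Vec3
    hi: Vec3

    def contains(self, p: Vec3) -> bool:
        return all(l <= c <= h for l, c, h in zip(self.lo, p, self.hi))


def environment_of(p: Vec3, environments: Dict[str, Box]) -> Optional[str]:
    """Return the name of the first environment containing p, or None."""
    for name, box in environments.items():
        if box.contains(p):
            return name
    return None


def adapt_to_dynamic_content(scene_positions: Iterable[Vec3],
                             dynamic_positions: Iterable[Vec3],
                             environments: Dict[str, Box]) -> bool:
    """False when scene and dynamic elements share an acoustic environment
    (render the recorded content without adaptation); True otherwise."""
    scene_envs = {environment_of(p, environments) for p in scene_positions}
    dynamic_envs = {environment_of(p, environments) for p in dynamic_positions}
    shared = (scene_envs & dynamic_envs) - {None}
    return not shared
```

A production renderer would use the scene's actual acoustic environment geometry rather than boxes; the set intersection only shows how the selection between the two rendering branches could be made.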
[0147] An example embodiment may be provided with an apparatus comprising: at least one processor; and at least one non-transitory memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: receive a bitstream which comprises an audio scene with audio content; receive dynamic content separate from the bitstream, where the dynamic content comprises dynamic audio content; and determine position of audio elements in the audio scene and audio elements in the dynamic content, and select to: when the audio elements in the audio scene and the audio elements in the dynamic content are in a same acoustic environment, render audio with a renderer based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content, or when the audio elements in the audio scene and the audio elements in the dynamic content are not in the same acoustic environment, render the audio with the renderer based upon both the audio content of the bitstream and the received dynamic content.
[0148] An example embodiment may be provided with an apparatus comprising a non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine for performing operations, the operations comprising: receiving a bitstream which comprises an audio scene with audio content; receiving dynamic content separate from the bitstream, where the dynamic content comprises dynamic audio content; and determining position of audio elements in the audio scene and audio elements in the dynamic content, and selecting to: when the audio elements in the audio scene and the audio elements in the dynamic content are in a same acoustic environment, render audio with a renderer based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content, or when the audio elements in the audio scene and the audio elements in the dynamic content are not in the same acoustic environment, render the audio with the renderer based upon both the audio content of the bitstream and the received dynamic content.
[0149] An example embodiment may be provided with an apparatus comprising: means for receiving a bitstream which comprises an audio scene with audio content; means for receiving dynamic content separate from the bitstream, where the dynamic content comprises dynamic audio content; and means for determining position of audio elements in the audio scene and audio elements in the dynamic content, and selecting to: when the audio elements in the audio scene and the audio elements in the dynamic content are in a same acoustic environment, render audio with a renderer based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content, or when the audio elements in the audio scene and the audio elements in the dynamic content are not in the same acoustic environment, render the audio with the renderer based upon both the audio content of the bitstream and the received dynamic content.
[0150] An example embodiment may be provided with an apparatus comprising: circuitry configured to receive a bitstream which comprises an audio scene with audio content; circuitry configured to receive dynamic content separate from the bitstream, where the dynamic content comprises dynamic audio content; and circuitry configured to determine position of audio elements in the audio scene and audio elements in the dynamic content, and select to: when the audio elements in the audio scene and the audio elements in the dynamic content are in a same acoustic environment, render audio with a renderer based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content, or when the audio elements in the audio scene and the audio elements in the dynamic content are not in the same acoustic environment, render the audio with the renderer based upon both the audio content of the bitstream and the received dynamic content.
[0151] An example embodiment may be provided with a method comprising: determining an anchor object in an audio scene; determining at least one instruction for management of dynamic content relative to the anchor object; and transmitting the audio scene in a bitstream, where the bitstream comprises the at least one instruction.
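On the encoder side, the three steps above (determine an anchor object, determine instructions relative to it, transmit both in the bitstream) could look roughly like the Python sketch below. JSON is used only as a readable stand-in for the real bitstream syntax, which is not specified here, and the anchor selection policy (closest candidate to the listener's start position) as well as the instruction fields `max_elements` and `acoustic_modelling` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AnchorObject:
    anchor_id: str
    position: Vec3


@dataclass
class DynamicContentInstruction:
    # Hypothetical instruction fields, defined relative to an anchor object.
    anchor_id: str
    max_elements: int
    acoustic_modelling: bool


def choose_anchor(candidates: List[AnchorObject], listener_start: Vec3) -> AnchorObject:
    # Hypothetical policy: pick the candidate closest to the listener's start position.
    return min(candidates,
               key=lambda a: sum((p - q) ** 2 for p, q in zip(a.position, listener_start)))


def encode_scene(scene_elements: List[str], anchor: AnchorObject,
                 instructions: List[DynamicContentInstruction]) -> bytes:
    """Serialize the scene, its anchor and the instructions into one payload."""
    payload = {
        "scene": scene_elements,
        "anchors": [asdict(anchor)],
        "dynamic_content_instructions": [asdict(i) for i in instructions],
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode_instructions(data: bytes) -> List[DynamicContentInstruction]:
    """Receiver-side counterpart: recover the instructions from the payload."""
    payload = json.loads(data.decode("utf-8"))
    return [DynamicContentInstruction(**d) for d in payload["dynamic_content_instructions"]]
```

Because each instruction refers to an anchor by identifier, the receiver can later place dynamic content relative to that anchor without the encoder knowing in advance what the dynamic content will be.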
[0152] An example embodiment may be provided with an apparatus comprising: at least one processor; and at least one non-transitory memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: determine an anchor object in an audio scene; determine at least one instruction for dynamic content relative to the anchor object; and transmit the audio scene in a bitstream, where the bitstream comprises the at least one instruction.
[0153] An example embodiment may be provided with an apparatus comprising: a non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine for performing operations, the operations comprising: determining an anchor object in an audio scene; determining at least one instruction for dynamic content relative to the anchor object; and transmitting the audio scene in a bitstream, where the bitstream comprises the at least one instruction.
[0154] An example embodiment may be provided with an apparatus comprising: means for determining an anchor object in an audio scene; means for determining at least one instruction for dynamic content relative to the anchor object; and means for transmitting the audio scene in a bitstream, where the bitstream comprises the at least one instruction.
[0155] An example embodiment may be provided with an apparatus comprising: circuitry configured to determine an anchor object in an audio scene; circuitry configured to determine at least one instruction for dynamic content relative to the anchor object; and circuitry configured to transmit the audio scene in a bitstream, where the bitstream comprises the at least one instruction.
[0156] In one example embodiment, one or more of the instructions may be received in the dynamic content. The dynamic content information may comprise instructions for determination of rendering parameters, and the one or more instructions may arrive with the dynamic content. This is a valid alternative method for implementing, for example, social VR. One or more instructions could be received via the bitstream, and one or more other instructions, or some other parts, could be included in the dynamic content.
[0157] An example embodiment may be provided with a method comprising: receiving a bitstream which comprises audio content; receiving dynamic content independent from the bitstream; receiving at least one instruction for the dynamic content from at least one of: the received bitstream or the received dynamic content; and rendering audio with a renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction.
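When instructions may arrive via the bitstream, via the dynamic content, or partly via both, the renderer needs some rule for combining them. The Python sketch below assumes, purely as an illustration, that instructions are flat key/value dictionaries and that the bitstream (content creator) value wins whenever both sources define the same key; neither the dictionary representation nor this precedence rule is stated in the description above, and `merge_instructions` is a hypothetical name.

```python
from typing import Optional


def merge_instructions(from_bitstream: Optional[dict],
                       from_dynamic: Optional[dict]) -> Optional[dict]:
    """Combine instructions received via the bitstream and via the dynamic content.
    Assumption: on conflicting keys the bitstream value takes precedence."""
    if from_bitstream is None and from_dynamic is None:
        return None  # no instruction from either source
    merged = dict(from_dynamic or {})
    merged.update(from_bitstream or {})
    return merged
```

Letting the dynamic content fill in parameters that the bitstream leaves unset, while the bitstream overrides on conflict, is one way to keep the content creator's intent authoritative; the opposite precedence would equally fit a social VR service that controls its own participants.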
[0158] An example embodiment may be provided with an apparatus comprising: at least one processor; and at least one non-transitory memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: receive a bitstream which comprises audio content and at least one instruction for dynamic content; receive dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; receive at least one instruction for the dynamic content from at least one of: the received bitstream or the received dynamic content; and render audio with a renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction in the bitstream.
[0159] An example embodiment may be provided with an apparatus comprising: a non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine for performing operations, the operations comprising: receiving a bitstream which comprises audio content and at least one instruction for dynamic content; receiving dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; receiving at least one instruction for the dynamic content from at least one of: the received bitstream or the received dynamic content; and rendering audio with a renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction in the bitstream.
[0160] An example embodiment may be provided with an apparatus comprising: means for receiving a bitstream which comprises audio content and at least one instruction for dynamic content; means for receiving dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; means for receiving at least one instruction for the dynamic content from at least one of: the received bitstream or the received dynamic content; and means for rendering audio with a renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction in the bitstream.
[0161] An example embodiment may be provided with an apparatus comprising: circuitry configured to receive a bitstream which comprises audio content and at least one instruction for dynamic content; circuitry configured to receive dynamic content independent from the bitstream, where the dynamic content comprises dynamic audio content; circuitry configured to receive at least one instruction for the dynamic content from at least one of: the received bitstream or the received dynamic content; and circuitry configured to render audio with a renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction in the bitstream.
[0162] It should be understood that the foregoing description is only illustrative. Various alternatives and modifications can be devised by those skilled in the art. For example, features recited in the various dependent claims could be combined with each other in any suitable combination(s). In addition, features from different embodiments described above could be selectively combined into a new embodiment. Accordingly, the description is intended to embrace all such alternatives, modifications and variances which fall within the scope of the appended claims.