Near-field rendering of immersive audio content in portable computers and devices
11528554 · 2022-12-13
Cpc classification
H04S7/00
ELECTRICITY
Abstract
Embodiments are directed to a speaker system that produces a near-field sound pattern for rendering immersive audio content in a portable device. An array of drivers projects sound upward from a top surface of the portable device to form upward-firing speakers, and a set of speakers projects sound downward from a bottom surface of the portable device to form downward-firing speakers. A decoder/renderer component receives immersive audio content, decodes height audio signals from the content, and sends direct audio signals to the downward-firing speakers. A crossover performs a high-pass filter function that passes the high-frequency components of the decoded height audio signals to the upward-firing speakers and sends the low-frequency components of the decoded height audio signals to the downward-firing speakers.
Claims
1. A speaker system for a portable computing device comprising: an array of drivers projecting sound upwards from a top surface of the portable computing device to form upward-firing speakers for playback of height audio signals; a set of speakers mounted on a bottom underside surface of the portable computing device and including at least two stereo speakers projecting sound downwards from a bottom surface of the portable computing device to form downward-firing speakers for playback of direct audio signals; a decoder/renderer component within an operating system of the portable computing device receiving immersive audio content comprising channel-based audio and object-based audio including sound objects having height components, decoding the height audio signals from the content into high-frequency height signals and low-frequency height signals to form decoded audio to be sent appropriately to the downward-firing speakers and the upward-firing speakers, wherein the downward-firing and upward-firing speakers are selected for and positioned in the portable computing device in a defined speaker configuration tailored specifically for a product model of the portable computing device; an audio subsystem driver comprising a software processing block implemented as a Stream Effect Audio Processing Object (SFX APO) accessed by one or more software libraries implementing audio post-processing software utilizing the defined speaker configuration, and having a speaker virtualizer, a content processing block, and a device processing block, wherein the speaker virtualizer virtualizes the decoded audio to transmit respective audio signals through the device processing block to correct speakers of the upward-firing speakers and downward-firing speakers; a crossover of the content processing block having a threshold frequency separating the high-frequency height signals from low-frequency height signals, performing a high-pass filter function to pass the high frequency height signals to the upward-firing speakers and the low frequency height signals and the direct audio signals to the downward-firing speakers; and the audio subsystem driver using the defined speaker configuration and filter function with the audio post-processing software to simulate a sound field above and around the portable computing device in a near-field distance on the order of two feet around the portable computing device for a user positioned proximate the portable computing device for playback of the immersive audio content, wherein the upward-firing speakers and the audio subsystem driver are configured to form the sound field above the portable computing device from only upward projected sound signals transmitted from the upward-firing speakers and without utilizing sound reflection from a ceiling or upper surface of a listening environment.
2. The speaker system of claim 1 wherein the sound from the upward firing speakers is projected in a sound pattern directed 90 degrees up from the top surface of the portable computing device when a lid of the portable computing device is opened 90 degrees or more from the top surface, and wherein the upward firing speakers are placed in a position of the upper surface proximate a display screen of the portable computing device when the display screen is deployed for viewing by the user to improve reproduction of height cues in the height audio signals.
3. The speaker system of claim 1 wherein the array of drivers comprises one of: a pair of stereo drivers or a set of four equidistantly spaced drivers, and wherein the set of downward-firing speakers comprises a low frequency effect (LFE) driver, and further wherein the array of drivers and set of speakers is configured in a 2.1.2 format with the 0.1 component for the LFE driver and the 0.2 designation for two height channel speakers.
4. The speaker system of claim 1 wherein each driver of the array of drivers comprises a transducer of approximately 15 mm to 20 mm in diameter and 4 mm to 6 mm thickness placed into an enclosure of approximately 3 cc to 4 cc in volume.
5. The speaker system of claim 1 wherein the threshold frequency is 2 kHz.
6. The speaker system of claim 1 wherein the portable computing device is a device selected from the group consisting of: laptop computer, tablet computer, game console, smart phone, and portable audio playback device.
7. The system of claim 1, further comprising a device post-processing block including the device processing block and integrated as an Endpoint Effect Audio Processing Object (EFX APO) to regulate the upward-firing speakers and downward-firing speakers through at least one of audio equalization, filtering, and high pass/low pass functions.
8. The system of claim 7 wherein the immersive audio content comprises audio content encoded in Dolby Digital Plus/Joint Object Coding (DD+/JOC) format.
9. A method of creating a near-field sound environment for playback of immersive audio content through a portable computing device, comprising: providing an array of drivers projecting sound upwards from a top surface of the portable computing device to form upward-firing speakers for playback of height audio signals, and a set of speakers mounted on a bottom underside surface of the portable computing device and including at least two stereo speakers projecting sound downwards from a bottom surface of the portable computing device to form downward-firing speakers for playback of direct audio signals; receiving immersive audio content within a decoder/renderer component within an operating system of the portable computing device, the immersive audio content comprising channel-based audio and object-based audio including sound objects having height components; decoding the received immersive audio content to separate direct audio from height audio to generate appropriate direct and height speaker feeds, and further decoding the height audio into high-frequency height signals and low-frequency height signals to form decoded audio to be sent appropriately to the downward-firing speakers and the upward-firing speakers, wherein the downward-firing and upward-firing speakers are selected for and positioned in the portable computing device in a defined speaker configuration tailored specifically for a product model of the portable computing device; virtualizing, in an audio subsystem driver having a speaker virtualizer comprising a software processing block implemented as a Stream Effect Audio Processing Object (SFX APO) accessed by one or more software libraries implementing audio post-processing software utilizing the defined speaker configuration, a content processing block, and a device processing block, the decoded immersive audio content to transmit respective audio signals through the device processing block to correct speakers of the upward-firing speakers and downward-firing speakers; transmitting the direct audio to direct speakers of the portable computing device through the direct speaker feeds; high-pass filtering, in a crossover of the content processing block having a threshold frequency separating the high-frequency height signals from low-frequency height signals, the height audio to pass the high-frequency height signals to the height speakers of the portable computing device and to pass the direct audio signals and the low-frequency height signals to the downward-firing speakers; and using the defined speaker configuration and filter function with the audio post-processing software to simulate a sound field above and around the portable computing device in a near-field distance on the order of two feet around the portable computing device for a user positioned proximate the portable computing device for playback of the immersive audio content, wherein the upward-firing speakers and the audio subsystem driver are configured to form the sound field above the portable computing device from only upward projected sound signals transmitted from the upward-firing speakers and without utilizing sound reflection from a ceiling or upper surface of a listening environment.
10. The method of claim 9 wherein the threshold frequency is 2 kHz, and wherein the upward firing speakers are placed in a position of the upper surface proximate a display screen of the portable computing device when the display screen is deployed for viewing by the user to improve reproduction of height cues in the height audio signals, and further wherein the sound from the upward firing speakers is projected in a sound pattern directed 90 degrees up from the top surface of the portable computing device when a lid of the portable computing device is opened 90 degrees or more from the top surface.
11. The method of claim 9 wherein the direct speaker feeds comprise left, right, and low frequency effects (LFE) channel feeds, and the height speaker feeds comprise right and left height channels, wherein each height channel drives at least one or a pair of individual upward-firing drivers of a speaker array, and further wherein the array of drivers and set of speakers is configured in a 2.1.2 format with the 0.1 component for the LFE driver and the 0.2 designation for two height channel speakers.
12. The method of claim 9 further comprising providing a device post-processing block including the device processing block and integrated as an Endpoint Effect Audio Processing Object (EFX APO) to regulate the upward-firing speakers and downward-firing speakers through at least one of equalization, filtering, and shaping of the immersive audio content.
13. The method of claim 12 wherein the portable computing device is a device selected from the group consisting of: laptop computer, tablet computer, game console, smart phone, and portable audio playback device, and wherein the immersive audio content comprises audio content encoded in Dolby Digital Plus/Joint Object Coding (DD+/JOC) format.
14. The method of claim 9 further comprising: detecting the presence of one or more external speakers for playback of the height audio; and transmitting the height speaker feeds to the detected external speakers.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) In the following drawings like reference numbers are used to refer to like elements. Although the following figures depict various examples, the one or more implementations are not limited to the examples depicted in the figures.
DETAILED DESCRIPTION
(12) Systems and methods are described for speakers in a portable device, such as a laptop computer or tablet, that create a near-field audio experience for playback of immersive audio content without requiring sound reflection or special speaker placement. Aspects of the one or more embodiments described herein may be implemented in or used in conjunction with an audio or audio-visual (AV) system that processes source audio information in a mixing, rendering and playback system that includes one or more computers or processing devices executing software instructions.
(13) Any of the described embodiments may be used alone or together with one another in any combination. Although various embodiments may have been motivated by various deficiencies with the prior art, which may be discussed or alluded to in one or more places in the specification, the embodiments do not necessarily address any of these deficiencies. In other words, different embodiments may address different deficiencies that may be discussed in the specification. Some embodiments may only partially address some deficiencies or just one deficiency that may be discussed in the specification, and some embodiments may not address any of these deficiencies.
(14) For purposes of the present description, the following terms have the associated meanings: the term “channel” means an audio signal plus metadata in which the position is coded as a channel identifier, e.g., left-front or right-top surround; “channel-based audio” is audio formatted for playback through a pre-defined set of speaker zones with associated nominal locations, e.g., 5.1, 7.1, and so on (i.e., a collection of channels as just defined); the term “object” means one or more audio channels with a parametric source description, such as apparent source position (e.g., 3D coordinates), apparent source width, etc.; “object-based audio” means a collection of objects as just defined; “immersive audio” (alternatively “spatial audio”) means channel-based and object-based audio signals plus metadata that renders the audio signals based on the playback environment, using an audio stream plus metadata in which the position is coded as a 3D position in space; and “listening environment” means any open, partially enclosed, or fully enclosed area, such as a room, that can be used for playback of audio content alone or with video or other content. The term “driver” means a single electroacoustic transducer that produces sound in response to an electrical audio input signal. A driver may be implemented in any appropriate type, geometry and size, and may include horns, cones, ribbon transducers, and the like. The term “speaker” means one or more drivers in a unitary enclosure, and the terms “cabinet” or “housing” mean the unitary enclosure that encloses one or more drivers. The terms “driver” and “speaker” may be used interchangeably when referring to a single-driver speaker. The terms “speaker feed” or “speaker feeds” may mean an audio signal sent from an audio renderer to a speaker for sound playback through one or more drivers.
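The definitions above can be made concrete as a small data model. The sketch below is a hypothetical Python illustration, not a format used by any codec: the class names, the height channel identifiers (Ltf, Rtf, Ltm, Rtm, Ltr, Rtr), and the convention that a positive z coordinate places an object above the listener are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical identifiers for overhead bed channels (top-front, top-middle, top-rear).
HEIGHT_CHANNEL_IDS = {"Ltf", "Rtf", "Ltm", "Rtm", "Ltr", "Rtr"}


@dataclass
class Channel:
    """A channel: samples whose position is implied by a channel identifier."""
    channel_id: str
    samples: List[float]


@dataclass
class AudioObject:
    """An object: samples plus a parametric source description."""
    samples: List[float]
    position: Tuple[float, float, float]  # apparent source position (x, y, z)
    width: float = 0.0                    # apparent source width


@dataclass
class ImmersiveProgram:
    """Immersive audio: a channel-based bed plus object-based audio."""
    bed: List[Channel] = field(default_factory=list)
    objects: List[AudioObject] = field(default_factory=list)

    def has_height_content(self) -> bool:
        # Assumption: z > 0 means the object sits above the listener's ear plane.
        if any(obj.position[2] > 0.0 for obj in self.objects):
            return True
        # Otherwise look for overhead channels in the channel-based bed.
        return any(ch.channel_id in HEIGHT_CHANNEL_IDS for ch in self.bed)
```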
(15) Embodiments are directed to a sound rendering system that is configured to work with a sound format and processing system that may be referred to as an “immersive audio system” or “spatial audio system,” which is based on an audio format and rendering technology that allows enhanced audience immersion, greater artistic control, and system flexibility and scalability. An overall adaptive audio system generally comprises an audio encoding, distribution, and decoding system configured to generate one or more bitstreams containing both conventional channel-based audio and object-based audio. Such a combined approach provides greater coding efficiency and rendering flexibility compared to either channel-based or object-based approaches taken separately. An example of an immersive audio system that may be used in conjunction with present embodiments is described in U.S. Provisional Patent Application 61/636,429, filed on Apr. 20, 2012 and entitled “System and Method for Adaptive Audio Signal Generation, Coding and Rendering.”
(16) In general, audio objects can be considered as groups of sound elements that may be perceived to emanate from a particular physical location or locations in the listening environment. Such objects can be static (stationary) or dynamic (moving). Audio objects are controlled by metadata that defines the position of the sound at a given point in time, along with other functions. When objects are played back, they are rendered according to the positional metadata using the speakers that are present, rather than necessarily being output to a predefined channel. In an immersive audio decoder, the channels are sent directly to their associated speakers or down-mixed to an existing speaker set, and audio objects are rendered by the decoder in a flexible manner. The parametric source description associated with each object, such as a positional trajectory in 3D space, is taken as an input along with the number and position of speakers connected to the decoder. The renderer utilizes certain algorithms to distribute the audio associated with each object across the attached set of speakers. The authored spatial intent of each object is thus optimally presented over the specific speaker configuration that is present in the listening environment.
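The renderer's distribution algorithm is not specified here. As a stand-in, the following sketch uses plain pairwise constant-power amplitude panning in the horizontal plane, which shows how positional metadata combined with the known speaker layout yields one gain per speaker. It is not the Dolby Atmos renderer; the Speaker class, the azimuth convention, and the example layout are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Speaker:
    name: str
    azimuth_deg: float  # assumed convention: 0 = front, positive = toward the left


def pan_object(object_azimuth_deg: float, speakers: List[Speaker]) -> Dict[str, float]:
    """Constant-power pan of one object between the two speakers that bracket it.

    Returns a gain per speaker; objects outside the layout snap to the nearest edge speaker.
    """
    ordered = sorted(speakers, key=lambda s: s.azimuth_deg)
    gains = {s.name: 0.0 for s in speakers}
    if object_azimuth_deg <= ordered[0].azimuth_deg:
        gains[ordered[0].name] = 1.0
        return gains
    if object_azimuth_deg >= ordered[-1].azimuth_deg:
        gains[ordered[-1].name] = 1.0
        return gains
    for lo, hi in zip(ordered, ordered[1:]):
        if hi.azimuth_deg == lo.azimuth_deg:
            continue  # skip coincident speakers to avoid dividing by zero
        if lo.azimuth_deg <= object_azimuth_deg <= hi.azimuth_deg:
            # Map the fractional position onto a quarter circle so that
            # gain_lo**2 + gain_hi**2 == 1 (constant power).
            frac = (object_azimuth_deg - lo.azimuth_deg) / (hi.azimuth_deg - lo.azimuth_deg)
            theta = frac * math.pi / 2.0
            gains[lo.name] = math.cos(theta)
            gains[hi.name] = math.sin(theta)
            break
    return gains
```

For a left/centre/right layout at +30, 0 and -30 degrees, an object at +15 degrees receives equal gains of about 0.707 on the centre and left speakers and zero on the right speaker.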
(17) Portable Computer Speaker System
(18) As described above, accurate playback of immersive content in portable devices such as laptop/notebook computers is not presently possible because of speaker placement and audio processing constraints. Embodiments of a portable device speaker system overcome this problem by configuring speakers to fire directly upward at a substantially 90-degree angle from the surface of the table (referred to as upward-firing speakers), thus creating a sound field that can reproduce a height effect similar to that produced by direct or reflected speakers (e.g., as in Dolby Atmos Home Theater systems) for the listener in a near-field environment around the portable computer itself. The system includes a specific immersive audio processor and software library that apply post-processing technology to correctly filter the height information, sending only the high-frequency content in the height-related channels to the upward-firing speakers (such as by using a standard high-pass filter) and the rest of the content to the downward-firing speakers. Because the upward-firing speakers need to reproduce only high frequencies, they can be small enough to fit within the laptop form factor.
(19) For purposes of illustration and explanation, embodiments are primarily described and shown with respect to a laptop or notebook computer. It should be noted, however, that the speaker system described herein can be applied to many different types of portable devices of various form factors, including but not limited to: smartphones, portable game consoles, handheld computing devices, tablets, and so on. Thus, for brevity, embodiments may be described with respect to a portable device embodied in a two-piece (lid plus body) portable computer, but embodiments are not so limited.
(20) In an embodiment, an array of two or more height channel speakers is positioned on an upper surface of a laptop computer or tablet device to project sound upward relative to a user, while the non-height or standard speakers may be located on other surfaces of the device, typically the bottom surface of the computer. As shown in
(22) The underside speakers 203 and 204 represent the direct playback channels for surround-sound or immersive audio content, and the LFE speaker 206 represents the standard surround LFE channel, while the upward-firing speakers 105 and 106 represent the height channels. For purposes of description, this portable device speaker system can be described using the same notation as Dolby Atmos or similar home theater systems, in which a speaker configuration is written as X.Y.Z (e.g., 5.1.4 or 7.1.2), where X denotes the number of direct channel speakers, Y denotes the number of LFE or subwoofer speakers, and Z denotes the number of height speakers. For the embodiment of
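The X.Y.Z notation is simple enough to parse mechanically. This hypothetical helper (the names SpeakerLayout and parse_layout are illustrative, not from any library) reads a layout label and, by assumption, treats a two-part label such as "5.1" as having no height speakers.

```python
from typing import NamedTuple


class SpeakerLayout(NamedTuple):
    direct: int  # X: direct (ear-level) channel speakers
    lfe: int     # Y: LFE / subwoofer speakers
    height: int  # Z: height speakers


def parse_layout(label: str) -> SpeakerLayout:
    """Parse X.Y.Z notation such as "2.1.2"; "X.Y" is read as having no height speakers."""
    parts = label.strip().split(".")
    if len(parts) == 2:
        parts.append("0")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not an X.Y.Z speaker layout: {label!r}")
    return SpeakerLayout(*(int(p) for p in parts))
```

For the laptop described here, parse_layout("2.1.2") yields two downward-firing direct speakers, one LFE speaker, and two upward-firing height speakers, five speakers in total.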
(23) Any practical number of speakers may be provided for each component of the immersive audio to be rendered, though the numbers are typically low for small-scale portable devices. For example, there is typically just one LFE speaker, while two to four direct-channel speakers may be provided in the underside of the device. Similarly, the array of upward-firing speakers may be a pair of speakers as shown in
(24) For the example embodiments of
(25) The upward-firing speaker array is intended to play Dolby Atmos or other immersive audio content on PC laptop form factors and other portable devices as close as possible to the content creator's original intent, by using the upward-firing speakers and special post-processing software to create a sound field that simulates the height information above and around the laptop. Accordingly, embodiments of the system integrate both a hardware component, in the form of specially designed speakers integrated in the PC laptop housing, and a software component, in the form of a new immersive audio processor and software/firmware library that recreates the height content optimized for these speakers.
(26) With respect to the hardware aspect, the upward-firing speaker array comprises two or more speakers located on the upper surface of the device body. These are generally small-diameter speakers fitted inside specially designed enclosures that are integrated into the audio subsystem of the PC laptop or device. In an embodiment, each speaker features a 15 mm to 20 mm diameter transducer with a maximum thickness of 4 mm to 6 mm so that it fits into the laptop body. Other sizes and dimensions may also be used depending on the size and shape of the device, but for a standard 12-inch to 15-inch laptop computer, the above dimensions are generally preferable, though embodiments are not so limited.
(27) The transducers are generally chosen to have good SPL (sound pressure level) and performance from approximately 2 kHz to 20 kHz. In an embodiment, the speaker enclosure is designed with a volume of about 3 cc to 4 cc. The speakers should be integrated on the rim above the keyboard area of the laptop housing and spaced as far apart from each other as possible, such as on either side of the body as shown in
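The dimensional and frequency guidance in the two preceding paragraphs can be expressed as a design checklist. The sketch below is hypothetical: the DriverSpec fields and the decision to treat the quoted approximate ranges as hard limits are assumptions, and the 2 kHz default reflects the example crossover frequency given later in the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DriverSpec:
    diameter_mm: float
    thickness_mm: float
    enclosure_cc: float
    usable_low_hz: float   # lowest frequency with adequate SPL
    usable_high_hz: float  # highest frequency with adequate SPL


# Ranges quoted in the description; treating "approximately" as hard limits is an assumption.
REFERENCE_RANGES = {
    "diameter_mm": (15.0, 20.0),
    "thickness_mm": (4.0, 6.0),
    "enclosure_cc": (3.0, 4.0),
}


def check_upward_driver(spec: DriverSpec, crossover_hz: float = 2000.0) -> List[str]:
    """Return a list of deviations from the reference upward-firing driver spec."""
    problems = []
    for name, (lo, hi) in REFERENCE_RANGES.items():
        value = getattr(spec, name)
        if not lo <= value <= hi:
            problems.append(f"{name}={value} outside {lo}-{hi}")
    # The driver must cover the band it will be fed: from the crossover frequency up to 20 kHz.
    if spec.usable_low_hz > crossover_hz:
        problems.append(f"usable range starts above the {crossover_hz:.0f} Hz crossover")
    if spec.usable_high_hz < 20000.0:
        problems.append("usable range ends below 20 kHz")
    return problems
```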
(28) With regard to the software aspect, certain additional program components may be provided for use with existing immersive audio content processors, such as the Dolby Atmos system. For example, software components may include programs, plug-ins, or libraries that are built on top of existing Dolby Atmos technologies to optimize the audio content for playback on the exact audio hardware built into the specific PC laptop.
(30) In an example embodiment, the immersive audio content comprises Dolby Atmos content encoded in Dolby Digital Plus/Joint Object Coding format (referred to as DD+/JOC or generically as “immersive audio content”) that is transmitted to the laptop either over an IP network (as in streaming content) or via Blu-ray playback. Embodiments are not so limited, however, and other standards and transmission formats are also possible. For the example embodiment shown, the DD+/JOC content is decoded and rendered in a standard fashion (e.g., in 7.1.4 or 5.1.2 channel Atmos format) by a decoder block 404 that is integrated as a Media Foundation Transform provided by Microsoft on all Windows 10 OS installations. A special immersive audio content post-processing block is then implemented as a Stream Effect Audio Processing Object (referred to as SFX APO) as part of the audio subsystem driver 407.
(31) In an embodiment, the audio subsystem driver 407 comprises discrete software components including a speaker virtualizer 410, a content processing block 412, and a device processing block 414. The speaker virtualizer 410 takes the immersive audio content in the appropriate format (e.g., Atmos 5.1.2) from the renderer 406 and outputs it as channel output for the upward-firing, downward-firing, and LFE speakers of the portable device, such as in 2.1.2 format as shown in
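The internal processing of the speaker virtualizer 410 is not disclosed. To illustrate only the change in channel count, from a 5.1.2 bed to the device's 2.1.2 feeds, the following sketch applies a static fold-down matrix. The channel labels, the -3 dB centre and surround coefficients (a common downmix convention), and the output names Lh/Rh are assumptions; a real speaker virtualizer would typically apply spatial processing rather than a fixed matrix.

```python
import math
from typing import Dict

# Assumed channel labels for a 5.1.2 bed: L, R, C, LFE, Ls, Rs, Ltm, Rtm.
MINUS_3DB = 1.0 / math.sqrt(2.0)

# Hypothetical static fold-down from 5.1.2 to the device's 2.1.2 feeds:
# centre and surrounds are mixed into the downward-firing L/R at -3 dB,
# LFE passes through, and the two top-middle channels become the height feeds.
FOLD_5_1_2_TO_2_1_2: Dict[str, Dict[str, float]] = {
    "L": {"L": 1.0, "C": MINUS_3DB, "Ls": MINUS_3DB},
    "R": {"R": 1.0, "C": MINUS_3DB, "Rs": MINUS_3DB},
    "LFE": {"LFE": 1.0},
    "Lh": {"Ltm": 1.0},
    "Rh": {"Rtm": 1.0},
}


def fold_frame(frame: Dict[str, float]) -> Dict[str, float]:
    """Map one multichannel sample frame (channel -> sample) to the 2.1.2 feeds."""
    return {
        out: sum(gain * frame.get(src, 0.0) for src, gain in mix.items())
        for out, mix in FOLD_5_1_2_TO_2_1_2.items()
    }
```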
(32) The content processing block 412 then performs certain processing steps, including a crossover high-pass filter operation on the height channels (denoted as the “0.2” in the 2.1.2 system above) to extract all high-frequency content above a specified cutoff frequency from the height channels and physically route it to the upward-firing speakers in the system, which in the 2.1.2 system case are the two upward-firing drivers 105 and 106. The low-frequency content remaining in the height channels below the cutoff frequency is then sent equally to the downward-firing drivers (in the 2.1.2 system case, the two downward-firing transducers). Thus, for a 2.1.2 system, the remaining low-frequency left height channel content is sent to the single left downward-firing driver, or distributed equally among any number of left downward-firing drivers, and likewise for the right height channel content.
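The routing rule in this paragraph can be sketched per side (left or right). To keep the example short, the band split is a first-order complementary filter (a one-pole low-pass, with the high band taken as the input minus the low band), so the upward and downward outputs always sum back to the original height signal; this is an assumption standing in for the actual crossover. Dividing the low band by the number of downward-firing drivers on that side is likewise one reading of "equally". The function names are hypothetical.

```python
import math
from typing import List, Tuple


def split_bands(x: List[float], cutoff_hz: float, fs: float) -> Tuple[List[float], List[float]]:
    """Complementary split: one-pole low-pass, high band = input - low band."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    low, high, state = [], [], 0.0
    for s in x:
        state += a * (s - state)
        low.append(state)
        high.append(s - state)
    return low, high


def route_height_side(height: List[float], n_down: int, cutoff_hz: float = 2000.0,
                      fs: float = 48000.0) -> Tuple[List[float], List[List[float]]]:
    """Route one side's height channel.

    Returns the upward-firing feed (high band) and one low-band share per
    downward-firing driver on the same side; each share is 1/n_down of the low band.
    """
    if n_down < 1:
        raise ValueError("need at least one downward-firing driver")
    low, high = split_bands(height, cutoff_hz, fs)
    shares = [[v / n_down for v in low] for _ in range(n_down)]
    return high, shares
```

In the 2.1.2 case, n_down is 1 and the single share is mixed into the downward-firing driver on the same side together with that side's direct channel.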
(33) The content processing block 412 thus includes a crossover process or sub-component. The cutoff frequency of this crossover defines the high-pass/low-pass filter frequency that determines whether height channel content is sent to the upward-firing or the downward-firing drivers. Using well-known crossover techniques, this cutoff frequency may be set to any appropriate frequency, typically in the range of 1 kHz to 5 kHz, as determined by the actual performance and physical characteristics of the upward-firing drivers relative to the downward-firing drivers. In an example embodiment, the cutoff frequency for a laptop computer with upward-firing drivers configured to the specifications mentioned above is 2 kHz.
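One common way to realize such a crossover is a pair of second-order Butterworth biquads at the cutoff frequency. The sketch below uses the bilinear-transform coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook; the choice of filter type and order is an assumption, since the description does not specify them, and the function names are illustrative.

```python
import cmath
import math
from typing import List, Tuple

Coeffs = Tuple[Tuple[float, ...], Tuple[float, float, float]]


def butterworth_biquad(kind: str, cutoff_hz: float, fs: float) -> Coeffs:
    """Second-order Butterworth low- or high-pass (Audio EQ Cookbook formulas, Q = 1/sqrt(2))."""
    w0 = 2.0 * math.pi * cutoff_hz / fs
    alpha = math.sin(w0) / (2.0 * (1.0 / math.sqrt(2.0)))
    cw = math.cos(w0)
    if kind == "lowpass":
        b = ((1.0 - cw) / 2.0, 1.0 - cw, (1.0 - cw) / 2.0)
    elif kind == "highpass":
        b = ((1.0 + cw) / 2.0, -(1.0 + cw), (1.0 + cw) / 2.0)
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    a0 = 1.0 + alpha
    # Normalize so that a[0] == 1.
    b_norm = tuple(c / a0 for c in b)
    a_norm = (1.0, -2.0 * cw / a0, (1.0 - alpha) / a0)
    return b_norm, a_norm


def magnitude_db(b, a, freq_hz: float, fs: float) -> float:
    """Evaluate |H(e^jw)| in dB at one frequency."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


def process(b, a, x: List[float]) -> List[float]:
    """Filter a signal with a Direct Form I biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b[0] * s + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, s
        y2, y1 = y1, y
        out.append(y)
    return out
```

Both filters are 3 dB down at the cutoff. A common alternative is a fourth-order Linkwitz-Riley crossover (two cascaded Butterworth sections per band), whose low and high outputs are 6 dB down at the cutoff and sum to a flat magnitude response.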
(34) A primary component of the software stack is the crossover filter step that distributes the height channel content in the original immersive audio (DD+/JOC) file among the upward and downward-firing transducers, with respect to their directions and performance capabilities. This process simulates a sound field above and around the PC laptop in the near-field for a user sitting at a normal distance and posture from the laptop. In typical usage, the near-field distance is an area within two feet of the laptop computer body.
(35) For the embodiment of
(40) Embodiments have been described in relation to drivers that are internal to the portable device, through either drivers that are native to the device from initial manufacture or added to the device as part of an audio subsystem (hardware) upgrade to add upward-firing driver capability to the device. In an alternative embodiment, the portable device and audio subsystem (software stack) can be used in conjunction with external speakers that are close coupled to the device and that may be used to provide upward-firing capability. Such external speakers may be embodied in the form of small or miniature speaker units that plug directly or through a short cable into a speaker port of the device and/or a miniature soundbar that is directly or closely coupled to the device.
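Claim 14 and the preceding paragraph describe detecting external speakers and sending the height feeds to them. The following hypothetical sketch shows one possible selection policy; the OutputDevice type and the preference for external upward-firing speakers over the internal array are assumptions, and real device enumeration would come from the operating system's audio APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OutputDevice:
    name: str
    external: bool       # e.g. plug-in mini speakers or a close-coupled soundbar
    upward_firing: bool  # able to reproduce the height feeds


def select_height_outputs(devices: List[OutputDevice]) -> List[OutputDevice]:
    """Prefer detected external upward-firing speakers; otherwise fall back to the internal array."""
    external = [d for d in devices if d.external and d.upward_firing]
    if external:
        return external
    return [d for d in devices if not d.external and d.upward_firing]
```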
(42) In an embodiment, renderer/decoder
(43) Embodiments are directed to a novel audio subsystem that integrates upward-firing speakers and audio post-processing technologies to allow portable devices to render and play immersive audio content, such as Dolby Atmos content (encoded in DD+/JOC format), and to simulate the height content in the near field for the listener. The embodiments described herein allow portable computers and audio playback devices to render newer audio formats, such as the object-based Dolby Atmos system. Such systems traditionally may introduce additional speakers, such as height speakers or reflected sound speakers, that provide immersive sound by projecting sound based on height cues in the audio program. In contrast, the internal device speakers described herein provide a near-field audio experience that allows these portable devices to recreate at least some of the height cues that are rendered in much larger immersive audio environments.
(44) One or more of the components, blocks, processes or other functional components may be implemented through a computer program that controls execution of a processor-based computing device of the system. It should also be noted that the various functions disclosed herein may be described using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, physical (non-transitory), non-volatile storage media in various forms, such as optical, magnetic or semiconductor storage media.
(45) Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” and “hereunder” and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
(46) While one or more implementations have been described by way of example and in terms of the specific embodiments, it is to be understood that one or more implementations are not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.