SYSTEMS AND METHODS FOR ADJUSTING DIRECTIONAL AUDIO IN A 360 VIDEO
20170339507 · 2017-11-23
Inventors
CPC classification
H04S 2400/15 · H04S 2400/11 · H04N 13/189 · H04S 3/002 · H04S 2400/01 (Section H: Electricity)
International classification
Abstract
In a computing device for adjusting audio output during playback of 360 video, a 360 video bitstream is received, and the 360 video bitstream is separated into video content and audio content. The audio content corresponding to a plurality of audio sources is decoded, wherein a number of audio sources is represented by N. The video content is displayed and the audio content is output through a plurality of output devices, wherein a number of output devices is represented by M. In response to detecting a change in a viewing angle for the video content, a determination is made, for each of the plurality of output devices, of a distribution ratio for each of the plurality of audio sources based on the viewing angle such that N×M distribution ratios are determined; and the audio content is output through each of the plurality of output devices based on the determined N×M distribution ratios.
Claims
1. A method implemented in a computing device for adjusting audio output during playback of 360 video, comprising: receiving a 360 video bitstream; separating the 360 video bitstream into video content and audio content; decoding the audio content corresponding to a plurality of audio sources, wherein a number of audio sources is represented by N; displaying the video content and outputting the audio content through a plurality of output devices, wherein a number of output devices is represented by M; in response to detecting a change in a viewing angle for the video content: determining, for each of the plurality of output devices, a distribution ratio for each of the plurality of audio sources based on the viewing angle such that N×M distribution ratios are determined; and outputting the audio content through each of the plurality of output devices based on the determined N×M distribution ratios.
2. The method of claim 1, wherein detecting the change in the viewing angle for the video content comprises detecting input from at least one of: a mouse, a touchscreen, a virtual-reality headset, and an accelerometer.
3. The method of claim 1, wherein outputting the audio content through each of the plurality of output devices based on the determined N×M distribution ratios comprises: generating, for each of the plurality of output devices, a magnitude for outputting audio content corresponding to each of the plurality of audio sources based on the N×M distribution ratios such that N×M magnitudes are generated; and outputting the audio content corresponding to each of the plurality of audio sources through each of the plurality of output devices based on the N×M magnitudes.
4. The method of claim 1, wherein M is equal to 2, and wherein the output devices comprise a left channel output device and a right channel output device.
5. The method of claim 4, wherein the distribution ratios for the N audio sources for the left channel output device are determined according to:
6. The method of claim 4, wherein the N×M magnitudes are generated according to:
7. The method of claim 1, wherein M is greater than 2, and wherein the output devices comprise multiple channels.
8. A system, comprising: a memory storing instructions; and a processor coupled to the memory and configured by the instructions to at least: receive a 360 video bitstream; separate the 360 video bitstream into video content and audio content; decode the audio content corresponding to a plurality of audio sources, wherein a number of audio sources is represented by N; display the video content and output the audio content through a plurality of output devices, wherein a number of output devices is represented by M; in response to detecting a change in a viewing angle for the video content: determine, for each of the plurality of output devices, a distribution ratio for each of the plurality of audio sources based on the viewing angle such that N×M distribution ratios are determined; and output the audio content through each of the plurality of output devices based on the determined N×M distribution ratios.
9. The system of claim 8, wherein detecting the change in the viewing angle for the video content comprises detecting input from at least one of: a mouse, a touchscreen, a virtual-reality headset, and an accelerometer.
10. The system of claim 8, wherein outputting the audio content through each of the plurality of output devices based on the determined N×M distribution ratios comprises: generating, for each of the plurality of output devices, a magnitude for outputting audio content corresponding to each of the plurality of audio sources based on the N×M distribution ratios such that N×M magnitudes are generated; and outputting the audio content corresponding to each of the plurality of audio sources through each of the plurality of output devices based on the N×M magnitudes.
11. The system of claim 8, wherein M is equal to 2, and wherein the output devices comprise a left channel output device and a right channel output device.
12. The system of claim 11, wherein the distribution ratios for the N audio sources for the left channel output device are determined according to:
13. The system of claim 11, wherein the N×M magnitudes are generated according to:
14. The system of claim 8, wherein M is greater than 2 and wherein the output devices comprise multiple channels.
15. A non-transitory computer-readable storage medium storing instructions to be implemented by a computing device having a processor, wherein the instructions, when executed by the processor, cause the computing device to at least: receive a 360 video bitstream; separate the 360 video bitstream into video content and audio content; decode the audio content corresponding to a plurality of audio sources, wherein a number of audio sources is represented by N; display the video content and output the audio content through a plurality of output devices, wherein a number of output devices is represented by M; in response to detecting a change in a viewing angle for the video content: determine, for each of the plurality of output devices, a distribution ratio for each of the plurality of audio sources based on the viewing angle such that N×M distribution ratios are determined; and output the audio content through each of the plurality of output devices based on the determined N×M distribution ratios.
16. The non-transitory computer-readable storage medium of claim 15, wherein detecting the change in the viewing angle for the video content comprises detecting input from at least one of: a mouse, a touchscreen, a virtual-reality headset, and an accelerometer.
17. The non-transitory computer-readable storage medium of claim 15, wherein outputting the audio content through each of the plurality of output devices based on the determined N×M distribution ratios comprises: generating, for each of the plurality of output devices, a magnitude for outputting audio content corresponding to each of the plurality of audio sources based on the N×M distribution ratios such that N×M magnitudes are generated; and outputting the audio content corresponding to each of the plurality of audio sources through each of the plurality of output devices based on the N×M magnitudes.
18. The non-transitory computer-readable storage medium of claim 15, wherein M is equal to 2, and wherein the output devices comprise a left channel output device and a right channel output device.
19. The non-transitory computer-readable storage medium of claim 18, wherein the distribution ratios for the N audio sources for the left channel output device are determined according to:
20. The non-transitory computer-readable storage medium of claim 18, wherein the N×M magnitudes are generated according to:
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] Various aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
DETAILED DESCRIPTION
[0017] An increasing number of digital capture devices are capable of recording 360 degree video (hereinafter “360 video”), which offers viewers a fully immersive experience. The creation of 360 video generally involves capturing a full 360 degree view using multiple cameras, stitching the captured views together, and encoding the video. An individual viewing a 360 video can experience audio from multiple directions due to placement of various audio capture devices during capturing of 360 video, as shown in FIG. 4. Various embodiments achieve an improved audio experience over conventional systems by adjusting the perceived direction of audio according to the user's viewing angle during playback of 360 video, thereby providing the user with a more realistic experience. In this regard, various embodiments provide an improvement over systems that output the same audio content regardless of whether the viewing angle changes.
[0018] As shown in
[0019] It should be emphasized that the present invention does not limit how the microphones are connected to the camera. Each audio source (AS) provides a separate sound signal based on the audio content captured by a corresponding microphone. For example, AS1 produces a sound signal based on the audio captured by Mic1. The microphone configuration utilized while capturing 360 video can be designed to accommodate different camera designs. For example, the microphone can be coupled via a cable or coupled wirelessly to the camera via Bluetooth®. In some configurations, a microphone array can be attached directly below or above the video camera to capture audio from different directions. The microphones can be evenly spaced around the camera or randomly placed.
[0020] A system for implementing the audio adjustment techniques disclosed herein is now described, followed by a discussion of the operation of the components within the system.
[0021] For some embodiments, the computing device 102 may be equipped with a plurality of cameras (not shown) where the cameras are utilized to directly capture digital media content comprising 360 degree views. In accordance with such embodiments, the computing device 102 further comprises a stitching module (not shown) configured to process the captured views and generate a 360 degree video. Alternatively, the computing device 102 can obtain 360 video from other digital recording devices coupled to the computing device 102 through a network interface 104. The network interface 104 in the computing device 102 may also access one or more content sharing websites 124 hosted on a server via the network 120 to retrieve digital media content.
[0022] As one of ordinary skill will appreciate, the digital media content may be encoded in any of a number of formats including, but not limited to, Motion Picture Experts Group (MPEG)-1, MPEG-2, MPEG-4, H.264, Third Generation Partnership Project (3GPP), 3GPP-2, Standard-Definition Video (SD-Video), High-Definition Video (HD-Video), Digital Versatile Disc (DVD) multimedia, Video Compact Disc (VCD) multimedia, High-Definition Digital Versatile Disc (HD-DVD) multimedia, Digital Television Video/High-definition Digital Television (DTV/HDTV) multimedia, Audio Video Interleave (AVI), Digital Video (DV), QuickTime (QT) file, Windows Media Video (WMV), Advanced System Format (ASF), Real Media (RM), Flash Media (FLV), or any number of other digital formats.
[0023] The computing device 102 includes a splitter 106 for receiving a 360 video file and separating the 360 video file into video and audio content. The splitter 106 routes the video content to a video decoder 108 and the audio content to an audio decoder 110 for decoding the video and audio data inside the file, respectively. The video decoder 108 is coupled to a display 116 and the audio decoder 110 is coupled to an audio output adjuster 112. As described in more detail below, the audio output adjuster 112 is configured to determine a ratio for distributing audio content from each of the audio sources (AS1, AS2, . . . ASN) (
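The splitter-and-decoder flow described in paragraph [0023] can be sketched as follows (a minimal Python illustration; the function names and data shapes are assumptions for clarity, not from the disclosure):

```python
# Illustrative sketch: separate a 360 video "bitstream" into video and
# audio content, then decode the audio into its N constituent sources.
# A real implementation would operate on encoded media, not on dicts.

def split_bitstream(bitstream):
    """Separate the bitstream into (video content, audio content)."""
    return bitstream["video"], bitstream["audio"]

def decode_audio(audio_content):
    """Decode the audio content into its N audio sources (AS1..ASN)."""
    return audio_content["sources"]

bitstream = {"video": "frames", "audio": {"sources": ["AS1", "AS2"]}}
video, audio = split_bitstream(bitstream)
sources = decode_audio(audio)  # N = 2 audio sources
```

In the disclosure, the video path then feeds the display 116 while the decoded sources feed the audio output adjuster 112.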
[0024] For embodiments where the audio output device 118 in
[0026] The processing device 202 may include any custom made or commercially available processor, a central processing unit (CPU) or an auxiliary processor among several processors associated with the computing device 102, a semiconductor-based microprocessor (in the form of a microchip), a macroprocessor, one or more application-specific integrated circuits (ASICs), a plurality of suitably configured digital logic gates, and other well-known electrical configurations comprising discrete elements, both individually and in various combinations, to coordinate the overall operation of the computing system.
[0027] The memory 214 can include any one of a combination of volatile memory elements (e.g., random-access memory (RAM, such as DRAM, SRAM, etc.)) and nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM, etc.). The memory 214 typically comprises a native operating system 216, one or more native applications, emulation systems, or emulated applications for any of a variety of operating systems and/or emulated hardware platforms, emulated operating systems, etc. For example, the applications may include application-specific software which may comprise some or all of the components of the computing device 102 depicted in
[0028] Input/output interfaces 204 provide any number of interfaces for the input and output of data. For example, where the computing device 102 comprises a personal computer, these components may interface with one or more user input/output interfaces, which may comprise a keyboard or a mouse, as shown in
[0029] In the context of this disclosure, a non-transitory computer-readable medium stores programs for use by or in connection with an instruction execution system, apparatus, or device. More specific examples of a computer-readable medium may include by way of example and without limitation: a portable computer diskette, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM, EEPROM, or Flash memory), and a portable compact disc read-only memory (CDROM) (optical).
[0030] Reference is made to
[0031] Although the flowchart of
[0032] To begin, in block 310, the computing device 102 receives 360 video to be viewed by a user and splits the 360 video into video and audio content. In block 320, the audio decoder 110 decodes the encoded audio content and extracts the number of audio sources (AS1 to ASN) encoded in the audio portion of the 360 video, where N represents the total number of audio sources. As shown earlier in
[0033] Next, in block 330, the computing device 102 monitors for a change in the viewing angle specified by the user as the user views the 360 video. A change in the viewing angle triggers calculation of the ratios for distributing audio content from each of the N audio sources to the channels of the audio output device, and the audio output is adjusted on the fly. For implementations where the audio output device comprises headphones, the headphones comprise a right channel and a left channel such that the number of audio output devices is two (M=2).
[0034] The computing device 102 determines the ratio for distributing audio content originating from each of the N audio sources between the M=2 audio output devices—specifically, between the left and right channels of the headphones (block 340). Thus, the ratio is calculated for each of the N audio sources, thereby yielding N ratio values for each of the M=2 audio output devices for a total of N×M ratio values. Based on the determined ratio, in block 350, the computing device 102 adjusts the corresponding magnitude or volume of the audio content for the left and right channels for each of the N audio sources and outputs the audio content accordingly to the left and right channels. Thereafter the process in
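Blocks 340 and 350 can be illustrated with a short sketch (Python; the ratio values below are placeholders, since the disclosure defines its own ratio functions of the viewing angle):

```python
# For M = 2 output devices (left/right headphone channels), weight each
# of the N source samples by its (f_L, f_R) distribution ratios and sum
# per channel, yielding the channel output magnitudes.

def mix_channels(samples, ratios):
    """samples: N per-source sample values.
    ratios: N pairs (f_L, f_R), each pair summing to 1.
    Returns the (left, right) channel output."""
    left = sum(f_l * s for (f_l, _), s in zip(ratios, samples))
    right = sum(f_r * s for (_, f_r), s in zip(ratios, samples))
    return left, right

# Two sources: AS1 split evenly between channels, AS2 entirely left.
left, right = mix_channels([1.0, 2.0], [(0.5, 0.5), (1.0, 0.0)])
```

With N sources and M channels, evaluating all the weights requires exactly the N×M distribution ratios described in block 340.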
[0035] Additional details are now provided for calculation of distribution ratios by the audio output adjuster 112 (
[0036] With regard to the distribution ratios, assume that the viewing angle is θ. Based on this, the left channel angle is θL=270+θ, and the right channel angle is θR=90+θ, where the respective magnitudes of each audio source (AS1 to ASN) for the left and right channels are calculated according to the following equations:
In the equations above, fLi(θ) represents the ratio for distributing the audio content from the i-th audio source (ASi) out of N audio sources to the left channel speaker based on a viewing angle of θ degrees. Similarly, fRi(θ) represents the ratio for distributing the audio content from audio source ASi to the right channel speaker based on a viewing angle of θ degrees, where the ratios for each audio source sum to one: fLi(θ)+fRi(θ)=1. Thus, CHl represents the magnitude/volume of all audio signals from the N audio sources (AS1 . . . ASN) output to the left channel (CHl=fL1(θ)×AS1+ . . . +fLN(θ)×ASN), while CHr represents the magnitude/volume of all audio signals from the N audio sources output to the right channel (CHr=fR1(θ)×AS1+ . . . +fRN(θ)×ASN), where each audio signal is weighted by its corresponding distribution ratio (fLi(θ), fRi(θ)). Thus, an improved audio experience is achieved by adjusting the perceived direction of audio according to the user's viewing angle during playback of 360 video, thereby providing the user with a more realistic experience.
[0037] To further illustrate calculation of the distribution ratios disclosed above, reference is made to
Thus, if the viewing angle θ=0, then CHl=AS2 and CHr=AS1, whereas if the viewing angle θ=180, then CHl=AS1 and CHr=AS2. If the viewing angle θ=90, then CHl=½×AS1+½×AS2 and CHr=½×AS1+½×AS2. That is, for this particular example, the two audio sources (AS1, AS2) contribute equally when the viewing angle θ=90.
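One ratio function consistent with the tabulated two-source values above is a linear crossfade over θ in [0, 180] (an illustrative assumption; the disclosure's actual ratio equations are not reproduced here):

```python
# Linear crossfade: AS1's left-channel ratio grows linearly with theta
# and AS2's mirrors it, so f_Li + f_Ri = 1 for each source.

def two_source_ratios(theta):
    """Return ((f_L1, f_R1), (f_L2, f_R2)) for theta in [0, 180] degrees."""
    f_l1 = theta / 180.0
    return (f_l1, 1.0 - f_l1), (1.0 - f_l1, f_l1)

# theta = 0:   ratios (0,1), (1,0)  ->  CHl = AS2,  CHr = AS1
# theta = 90:  all ratios 0.5       ->  both sources contribute equally
# theta = 180: ratios (1,0), (0,1)  ->  CHl = AS1,  CHr = AS2
```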
[0038] With reference to
Thus, if the viewing angle θ=0, then:
If the viewing angle θ=180, then:
If the viewing angle θ=90, then:
CHl=AS1+¼×AS2+¼×AS3
CHr=¾×AS2+¾×AS3
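The θ=90 mix above can be written as an N×2 ratio table whose rows obey fLi(θ)+fRi(θ)=1 (a transcription of the example in illustrative Python):

```python
# Rows are AS1..AS3; columns are (f_L, f_R) at theta = 90, taken from
# CHl = AS1 + 1/4 AS2 + 1/4 AS3 and CHr = 3/4 AS2 + 3/4 AS3.
ratios_theta_90 = [(1.0, 0.0), (0.25, 0.75), (0.25, 0.75)]
row_sums = [f_l + f_r for f_l, f_r in ratios_theta_90]  # each row sums to 1
```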
[0039] With reference to
Thus, if viewing angle θ=0, then:
CHl=½×AS1+½×AS3+AS4
CHr=½×AS1+AS2+½×AS3
If the viewing angle θ=180, then:
CHl=½×AS1+AS2+½×AS3
CHr=½×AS1+½×AS3+AS4
If the viewing angle θ=90, then:
CHl=AS1+½×AS2+½×AS4
CHr=½×AS2+AS3+½×AS4
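The four-source mixes above can likewise be transcribed as N×2 ratio tables keyed by viewing angle; note that the θ=0 and θ=180 tables swap the left and right columns, and every row sums to 1 (illustrative Python):

```python
# Rows are AS1..AS4; columns are (f_L, f_R), taken from the CHl/CHr
# expressions given in the text for each viewing angle.
ratio_tables = {
      0: [(0.5, 0.5), (0.0, 1.0), (0.5, 0.5), (1.0, 0.0)],
    180: [(0.5, 0.5), (1.0, 0.0), (0.5, 0.5), (0.0, 1.0)],
     90: [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0), (0.5, 0.5)],
}
rows_ok = all(f_l + f_r == 1.0
              for table in ratio_tables.values() for f_l, f_r in table)
swapped = [(r, l) for l, r in ratio_tables[0]] == ratio_tables[180]
```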
[0040] Note that while the audio output device (
If θ=0, then CH1=AS1, CH2=AS2, CH3=AS3.
If θ=120, then CH1=AS2, CH2=AS3, CH3=AS1.
If θ=240, then CH1=AS3, CH2=AS1, CH3=AS2.
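For θ values that are multiples of 120, the three-channel behavior above amounts to rotating the sources among the channels (an illustrative sketch; intermediate angles such as θ=30 would instead split each source fractionally between adjacent channels):

```python
def three_channel_mix(sources, theta):
    """sources: [AS1, AS2, AS3]; theta: a multiple of 120, in degrees.
    Channel k outputs the source shifted by theta/120 positions."""
    shift = (theta // 120) % 3
    return [sources[(k + shift) % 3] for k in range(3)]
```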
If θ=30, then:
[0041] It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.