Patent classifications
H04N21/8106
Apparatus and method for screen related audio object remapping
An apparatus for generating loudspeaker signals includes an object renderer and an object metadata processor. The object metadata processor is configured to receive metadata comprising a first position of an audio object and an indication of whether the audio object is screen-related, to calculate a second position of the audio object depending on the first position and on the size of a screen if the audio object is indicated in the metadata as being screen-related, to feed the first position of the audio object as position information into the object renderer if the audio object is indicated as being not screen-related, and to feed the second position of the audio object as position information into the object renderer if the audio object is indicated as being screen-related. The object renderer is configured to receive the audio object and to generate the loudspeaker signals depending on the audio object and on the position information.
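The remapping described above can be sketched as a piecewise-linear rescaling of an object's azimuth from a nominal reference screen to the reproduction screen. This is an illustrative simplification; the reference half-width of 29° and the compression of the off-screen range are assumptions, not figures taken from the patent:

```python
def remap_azimuth(azimuth, ref_half_width=29.0, actual_half_width=58.0):
    """Rescale an object's azimuth (degrees, positive = left) from a nominal
    reference screen to the actual screen. Illustrative sketch only."""
    if abs(azimuth) <= ref_half_width:
        # On-screen objects: stretch linearly with the screen width.
        return azimuth * actual_half_width / ref_half_width
    # Off-screen objects: compress the remaining angular range proportionally
    # so that +/-180 degrees still maps to +/-180 degrees.
    sign = 1.0 if azimuth >= 0 else -1.0
    outside = abs(azimuth) - ref_half_width
    remaining_ref = 180.0 - ref_half_width
    remaining_act = 180.0 - actual_half_width
    return sign * (actual_half_width + outside * remaining_act / remaining_ref)
```

With these assumed constants, an object at the reference screen edge (29°) lands at the actual screen edge (58°), while an object directly behind the listener (180°) stays at 180°.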
DATA PROCESSING METHOD AND APPARATUS, DEVICE, AND READABLE STORAGE MEDIUM
A data processing method includes acquiring video frame data including one or more video frames and audio data of a video, and determining position attribute information of a target object in the acquired one or more video frames, the target object being associated with the audio data. The method also includes acquiring a channel encoding parameter associated with the position attribute information, and performing azimuth enhancement processing on the audio data according to the channel encoding parameter to obtain enhanced audio data. Apparatus and non-transitory computer-readable storage medium counterpart embodiments are also contemplated.
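One plausible reading of "azimuth enhancement according to a channel encoding parameter" is panning the audio toward the target object's on-screen position. The constant-power stereo panner below is a hypothetical stand-in; the patent does not disclose this particular formula:

```python
import math

def stereo_pan_gains(x_norm):
    """Constant-power left/right gains for an object whose horizontal position
    in the video frame is x_norm in [0, 1] (0 = left edge, 1 = right edge).
    Hypothetical stand-in for the patent's channel encoding parameter."""
    theta = x_norm * math.pi / 2.0
    return math.cos(theta), math.sin(theta)

def enhance(samples, x_norm):
    """Apply the position-derived gains to mono samples, yielding (L, R) pairs."""
    left, right = stereo_pan_gains(x_norm)
    return [(s * left, s * right) for s in samples]
```

Constant-power panning keeps the total energy (L² + R² = 1) fixed as the object moves across the frame, which avoids audible loudness dips at intermediate positions.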
ACOUSTIC DEVICE, DISPLAY CONTROL METHOD, AND DISPLAY CONTROL PROGRAM
An acoustic device includes: a first display unit; a second display unit; an operation unit configured to receive a user's operation; a judging unit configured to judge a type of the operation performed on the operation unit; and a display controller configured to, in response to the type of the operation judged by the judging unit, change the display contents of the first display unit to display contents corresponding to the type of the operation, and to display on the second display unit at least a part of the display contents previously displayed on the first display unit.
Automated audio mapping using an artificial neural network
According to one implementation, an automated audio mapping system includes a computing platform having a hardware processor and a system memory storing an audio mapping software code including an artificial neural network (ANN) trained to identify multiple different audio content types. The hardware processor is configured to execute the audio mapping software code to receive content including multiple audio tracks, and to identify, without using the ANN, a first music track and a second music track among the multiple audio tracks. The hardware processor is further configured to execute the audio mapping software code to identify, using the ANN, the audio content type of each of the multiple audio tracks except the first music track and the second music track, and to output a mapped content file including the multiple audio tracks, each assigned to a respective predetermined audio channel based on its identified audio content type.
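The track-to-channel assignment can be sketched as a lookup keyed by content type, with the trained ANN replaced by a caller-supplied classifier. The channel names, the content-type labels, and the name-prefix rule for spotting the music tracks are all illustrative assumptions; the patent identifies the music tracks by some unspecified non-ANN method:

```python
# Assumed mapping from content type to a predetermined channel (illustrative).
CHANNEL_MAP = {"dialogue": "center", "music": "left/right", "effects": "surround"}

def map_tracks(tracks, classify):
    """tracks: list of (name, audio_data) pairs. classify: a callable standing
    in for the trained ANN; it is skipped for tracks already known to be music
    (here detected by a naive name prefix, purely for the sketch)."""
    mapped = {}
    for name, data in tracks:
        content_type = "music" if name.startswith("music") else classify(data)
        mapped[name] = CHANNEL_MAP.get(content_type, "unassigned")
    return mapped
```

A usage sketch: with a stub classifier that always answers "dialogue", `map_tracks([("music_l", b""), ("vo", b"")], lambda d: "dialogue")` assigns `music_l` to the left/right pair and `vo` to center.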
VIDEO DUBBING METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM
The present disclosure provides a video dubbing method, an apparatus, a device, and a storage medium. The method includes: in response to an audio-recording start trigger operation at a first time point of a target video, playing the target video along its timeline, starting from the video picture corresponding to the first time point, while receiving audio data along the same timeline; and in response to an audio-recording end trigger operation at a second time point, generating an audio recording file. The audio recording file has a linkage relationship with the timeline of a video clip whose starting frame is the video picture corresponding to the first time point and whose ending frame is the video picture corresponding to the second time point.
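The linkage relationship can be pictured as a small record binding the recording file to the clip's start and end points on the video timeline. The field names and the seconds-based representation are assumptions made for the sketch, not structures disclosed in the patent:

```python
from dataclasses import dataclass

@dataclass
class DubbingLink:
    """Ties an audio recording file to a clip of the target video
    (illustrative field names, not from the patent)."""
    audio_path: str
    start_s: float  # first time point: timestamp of the clip's starting frame
    end_s: float    # second time point: timestamp of the clip's ending frame

    def duration(self):
        """Length of the dubbed clip in seconds."""
        return self.end_s - self.start_s
```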
METHOD AND APPARATUS FOR VIDEO PROCESSING
The present disclosure provides a method and apparatus for video processing. The method displays a playback interface of a first video, the playback interface including a subtitle widget used to display a subtitle of the first video; displays, in response to a first touch operation by a user on the subtitle widget during playback of the first video, a setting pop-up window used by the user to set the language of subtitles for the first video and/or to set the subtitle widget to be hidden; and processes the first video in response to a second touch operation by the user on the setting pop-up window.
Methods and systems for supplementing media assets during fast-access playback operations
Methods and systems are disclosed herein for a media guidance application that enhances the viewer experience by providing supplemental content related to a media asset during a fast-access playback operation. For example, in response to a user input during a fast-forward or rewind operation, the media guidance application may generate for display supplemental content related to the progression point of the media asset at which the user input was received while the fast-forward or rewind operation continues.
Method of improving synchronization of the playback of audio data between a plurality of audio sub-systems
A method of synchronizing playback of audio data across a plurality of sub-systems, each sub-system including a master device and at least one slave device connected to the master device via Bluetooth. The method includes collecting the respective internal latency data of the sub-systems; determining, based on the internal latency data of the plurality of sub-systems, the respective delays to be applied by the sub-systems between reception of the audio data and its playback by the slave devices of each sub-system; and, in each sub-system, applying the corresponding delay for playback of the audio data.
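A common alignment rule for such delays is to pad every sub-system up to the slowest one, so all pipelines finish at the same instant. The formula below (delay = worst latency minus own latency) is one standard choice and an assumption; the patent may compute its delays differently:

```python
def compute_delays(latencies_ms):
    """Given each sub-system's internal latency in milliseconds, return the
    extra delay each should apply between receiving audio and playing it,
    so that all sub-systems play in sync (assumed alignment rule)."""
    worst = max(latencies_ms.values())
    return {name: worst - lat for name, lat in latencies_ms.items()}
```

For example, with sub-system A at 80 ms and B at 120 ms of internal latency, A waits an extra 40 ms and B none, so both emit audio 120 ms after reception.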
DEVICE AND METHOD FOR GENERATING SPEECH VIDEO
A speech video generation device according to an embodiment includes: a first encoder that receives an input of a first person background image of a predetermined person partially hidden by a first mask, and extracts a first image feature vector from the first person background image; a second encoder that receives an input of a second person background image of the person partially hidden by a second mask, and extracts a second image feature vector from the second person background image; a third encoder that receives an input of a speech audio signal of the person, and extracts a voice feature vector from the speech audio signal; a combining unit that generates a combined vector of the first image feature vector, the second image feature vector, and the voice feature vector; and a decoder that reconstructs a speech video of the person using the combined vector as an input.
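The combining unit's operation is not specified in the abstract; the simplest realization is concatenating the three feature vectors before handing them to the decoder. The sketch below assumes concatenation purely for illustration:

```python
def combine(img_feat1, img_feat2, voice_feat):
    """Combining unit sketched as plain concatenation of the two image
    feature vectors and the voice feature vector (an assumption; the
    patent does not disclose the combining operation)."""
    return list(img_feat1) + list(img_feat2) + list(voice_feat)
```

The decoder would then consume a single vector whose length is the sum of the three feature dimensions.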
DISTRIBUTED NETWORK RECORDING SYSTEM WITH TRUE AUDIO TO VIDEO FRAME SYNCHRONIZATION
A remote voice recording is synchronized to video using a cloud-based virtual recording studio within a web browser to record and review audio while viewing the associated video playback and script. All assets are accessed through or streamed within the browser application, thereby eliminating the need for the participants to install any applications or store content locally for later transmission. Recording controls, playback/record status, and audio timeline and script edits are synchronized across participants and may be controlled for all participants remotely by a sound engineer so that each participant sees and hears the section of the program being recorded and edited at the same time. High-resolution audio files for dubbing video are created and time synchronized to the relevant video frames.