G10L25/57

System for deliverables versioning in audio mastering

Some implementations of the disclosure relate to using a model trained on mixing console data of sound mixes to automate the process of sound mix creation. In one implementation, a non-transitory computer-readable medium has executable instructions stored thereon that, when executed by a processor, cause the processor to perform operations comprising: obtaining a first version of a sound mix; extracting first audio features from the first version of the sound mix; obtaining mixing metadata; automatically calculating, with a trained model, using at least the mixing metadata and the first audio features, mixing console features; and deriving a second version of the sound mix using at least the mixing console features calculated by the trained model.
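The pipeline in this abstract can be sketched end to end. This is a hypothetical illustration only: the feature extractor, the stand-in linear "trained model", and all names (`extract_audio_features`, `trained_model`, `derive_second_version`, `target_loudness`, `fader_gain`) are assumptions, not the patent's actual components, which would be learned from real mixing console data.

```python
def extract_audio_features(samples):
    """Toy feature extractor: mean level and peak of the mix (assumption)."""
    mean_level = sum(abs(s) for s in samples) / len(samples)
    peak = max(abs(s) for s in samples)
    return [mean_level, peak]

def trained_model(audio_features, mixing_metadata):
    """Stand-in for a learned model mapping audio features plus mixing
    metadata to mixing console features (here, a single fader gain)."""
    target_loudness = mixing_metadata["target_loudness"]
    mean_level, peak = audio_features
    # Push the mix toward the target loudness without clipping the peak.
    gain = min(target_loudness / max(mean_level, 1e-9),
               1.0 / max(peak, 1e-9))
    return {"fader_gain": gain}

def derive_second_version(samples, console_features):
    """Apply the calculated console features to derive the second version."""
    g = console_features["fader_gain"]
    return [s * g for s in samples]

first_mix = [0.1, -0.4, 0.25, -0.2]          # first version of the sound mix
features = extract_audio_features(first_mix)  # first audio features
console = trained_model(features, {"target_loudness": 0.5})
second_mix = derive_second_version(first_mix, console)
```

The structure mirrors the claim's four operations: obtain a mix, extract features, calculate console features with a model, derive a second version.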

Method and apparatus for locating video playing node, device and storage medium

The disclosure provides a method for locating a video playing node, and relates to the fields of big data and video processing. The method includes: selecting a target video from a plurality of videos; and sending the target video, a plurality of subtitle text segments of the target video and start time information of each of the plurality of subtitle text segments to a client, to cause the client to display the plurality of subtitle text segments, and determine, in response to a trigger operation on any subtitle text segment of the plurality of subtitle text segments, a start playing node of the target video based on the start time information of that subtitle text segment. The disclosure further provides an apparatus for locating a video playing node, an electronic device and a storage medium.
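The client-side step reduces to a direct lookup: each subtitle text segment is delivered with its start time information, so a trigger operation on any segment resolves immediately to a start playing node. A minimal sketch, with all names illustrative:

```python
def locate_playing_node(segments, clicked_index):
    """Return the start playing node (seconds) for the tapped segment."""
    return segments[clicked_index]["start_time"]

# Subtitle text segments with start time information, as sent to the client.
segments = [
    {"text": "Welcome back.", "start_time": 0.0},
    {"text": "Today we cover big data.", "start_time": 4.2},
    {"text": "Let's begin with storage.", "start_time": 9.8},
]

node = locate_playing_node(segments, 2)  # user taps the third segment
```

Because the start times travel with the segments, no round trip to the server is needed to seek the player.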

Target character video clip playing method, system and apparatus, and storage medium
11580742 · 2023-02-14

Provided are a target character video clip playing method, system and apparatus, and a storage medium. The method comprises: using image recognition technology to perform target character recognition on an entire video, locating a plurality of video clips containing target characters, and obtaining a first playing time period set corresponding to the video clips; obtaining, according to audio clips corresponding to each character marked within the entire video, a second playing time period set corresponding to the audio clips of the various characters; merging the time periods included in the playing time period sets to obtain a combined playing time period set for the target characters; and playing the video of the target characters according to the ordering of the playing timelines within the combined playing time period set.
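The merging step reads like classic interval merging: the periods found by image recognition and the periods from the marked audio clips are pooled, sorted, and overlapping periods are combined into one set. This is a sketch under that reading, not the patent's exact procedure; the example periods are invented.

```python
def merge_periods(*period_sets):
    """Merge overlapping (start, end) time periods from several sets."""
    periods = sorted(p for s in period_sets for p in s)
    merged = []
    for start, end in periods:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous period: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

video_periods = [(10.0, 25.0), (40.0, 55.0)]  # from image recognition
audio_periods = [(20.0, 30.0), (60.0, 70.0)]  # from marked audio clips
total = merge_periods(video_periods, audio_periods)
# total → [(10.0, 30.0), (40.0, 55.0), (60.0, 70.0)]
```

Playback then simply walks the merged set in timeline order, seeking to each period's start.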

METHOD AND APPARATUS FOR PROCESSING VIRTUAL VIDEO LIVESTREAMING, STORAGE MEDIUM AND ELECTRONIC DEVICE
20230039789 · 2023-02-09

A method includes: receiving text data and motion data of a virtual object, the motion data including a motion identifier of a specified motion and a start position identifier indicating the position in the text data at which the specified motion starts; generating audio data and expression data of the virtual object according to the text data, and generating facial images of the virtual object according to the expression data; generating a background image sequence containing the specified motion according to the start position identifier and the motion identifier, the background image sequence including at least one background image; performing image fusion processing on the facial images and the at least one background image to obtain one or more live video frames; and synthesizing the live video frames with the audio data into a live video stream in real time.
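One concrete sub-problem above is mapping the start position identifier (a character offset in the text data) to the video frame at which the background image sequence should begin the specified motion. A hedged sketch, assuming a uniform per-character audio timing, which real text-to-speech systems do not guarantee; the frame rate and all names are illustrative:

```python
FPS = 25  # assumed video frame rate

def motion_start_frame(text, start_pos, total_audio_seconds, fps=FPS):
    """Map a character offset in the text to the frame index at which the
    specified motion starts, assuming evenly spaced character timing."""
    seconds_per_char = total_audio_seconds / len(text)
    return int(start_pos * seconds_per_char * fps)

text = "Hello everyone, welcome to the stream"
# Motion data says the specified motion starts at character offset 16
# ("welcome"); the generated audio for this text runs 3.7 seconds.
frame = motion_start_frame(text, start_pos=16, total_audio_seconds=3.7)
```

A production system would instead take per-phoneme timestamps from the speech synthesizer, but the identifier-to-frame mapping has this shape.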

Systems and Methods for Assisted Translation and Lip Matching for Voice Dubbing
20230039248 · 2023-02-09

Systems and methods are provided for generating candidate translations for use in creating synthetic or human-acted voice dubbings, aiding human translators in generating translations that match the corresponding video, automatically grading how well a candidate translation matches the corresponding video, suggesting modifications to the speed and/or timing of the translated text to improve the grading of a candidate translation, and suggesting modifications to the voice dubbing and/or video to improve the grading of a candidate translation. In that regard, the present technology may be used to fully automate the process of generating lip-matched translations and associated voice dubbings, or as an aid for human-in-the-loop processes that may reduce or eliminate the time and effort required from translators, adapters, voice actors, and/or audio editors to generate voice dubbings.
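One simple signal behind "automatically grading how well a candidate translation matches the corresponding video" is whether the translation's estimated spoken duration fits the original line's on-screen duration. The syllable heuristic and scoring below are assumptions for illustration, not the patent's actual grading method:

```python
VOWELS = set("aeiouy")

def estimate_syllables(text):
    """Crude syllable count: runs of vowel letters per word (assumption)."""
    count = 0
    for word in text.lower().split():
        prev_vowel = False
        for ch in word:
            is_vowel = ch in VOWELS
            if is_vowel and not prev_vowel:
                count += 1
            prev_vowel = is_vowel
    return max(count, 1)

def grade_translation(candidate, segment_seconds, syllables_per_second=4.0):
    """Score in [0, 1]: 1.0 means the estimated spoken duration exactly
    fits the video segment; the score falls off with the mismatch."""
    est = estimate_syllables(candidate) / syllables_per_second
    return max(0.0, 1.0 - abs(est - segment_seconds) / segment_seconds)

score = grade_translation("See you tomorrow at noon", segment_seconds=1.5)
```

A fuller grader would also compare lip-closure visemes against the video, but even this duration check can rank candidate translations and drive the suggested speed/timing modifications.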

ADJUSTING AUDIO AND NON-AUDIO FEATURES BASED ON NOISE METRICS AND SPEECH INTELLIGIBILITY METRICS

Some implementations involve determining a noise metric and/or a speech intelligibility metric and determining a compensation process corresponding to the noise metric and/or the speech intelligibility metric. The compensation process may involve altering a processing of audio data and/or applying a non-audio-based compensation method. In some examples, altering the processing of the audio data does not involve applying a broadband gain increase to the audio signals. Some examples involve applying the compensation process in an audio environment. Other examples involve determining compensation metadata corresponding to the compensation process and transmitting an encoded content stream that includes encoded compensation metadata, encoded video data and encoded audio data from a first device to one or more other devices.
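The selection logic described above can be sketched as a small decision function. The thresholds, process names, and EQ band are invented for illustration; the one point taken from the abstract is that the audio path need not apply a broadband gain increase, so the sketch uses a dialogue-band boost and a non-audio-based method (captions) instead:

```python
def choose_compensation(noise_metric, intelligibility_metric):
    """Return compensation metadata for the measured audio environment.
    Metrics are assumed normalized to [0, 1]; thresholds are hypothetical."""
    if intelligibility_metric >= 0.8:
        return {"process": "none"}
    if noise_metric < 0.5:
        # Mild noise: shape the dialogue band rather than raise overall gain.
        return {"process": "dialogue_eq_boost", "band_hz": (1000, 4000)}
    # Severe noise: also apply a non-audio-based compensation method.
    return {"process": "dialogue_eq_boost_plus_captions",
            "band_hz": (1000, 4000), "captions": True}

meta = choose_compensation(noise_metric=0.7, intelligibility_metric=0.4)
```

In the encoder-side variant the abstract mentions, `meta` would be encoded as compensation metadata and transmitted alongside the encoded audio and video for the receiving devices to apply.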
