H04N21/234336

METHOD FOR JUST-IN-TIME TRANSCODING OF BYTERANGE-ADDRESSABLE PARTS
20230011518 · 2023-01-12 ·

A method including: ingesting a video segment and a set of video features of the video segment; estimating a part size distribution for the video segment based on the set of video features and a first rendition of the video segment; calculating a maximum expected part size based on a threshold percentile in the part size distribution; at a first time, transmitting, to a video player, a manifest file indicating a set of byterange-addressable parts of the video segment in the first rendition, each byterange-addressable part characterized by the maximum expected part size; at a second time, receiving a playback request for a first byterange-addressable part; transcoding the first byterange-addressable part; in response to the maximum expected part size exceeding a size of the first byterange-addressable part in the first rendition, appending padding data to the first byterange-addressable part; and transmitting the first byterange-addressable part to the video player.
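The two computational steps of this abstract — picking a maximum expected part size at a threshold percentile of the part-size distribution, and padding any smaller transcoded part up to that size — can be sketched as follows. This is a minimal illustration, not the patented implementation; the function names, the nearest-rank percentile rule, and the zero-byte filler are all assumptions.

```python
import math

def max_expected_part_size(part_sizes, percentile=99.0):
    """Pick the part size at a threshold percentile of the observed
    (or estimated) part-size distribution, using the nearest-rank rule."""
    ordered = sorted(part_sizes)
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    return ordered[rank - 1]

def pad_part(part: bytes, target_size: int, filler: bytes = b"\x00") -> bytes:
    """Append padding so the part fills the byte range advertised
    in the manifest."""
    if len(part) >= target_size:
        return part
    return part + filler * (target_size - len(part))

sizes = [1000, 1200, 900, 1500, 1100]
target = max_expected_part_size(sizes, percentile=80.0)  # -> 1200
padded = pad_part(b"x" * 900, target)                    # 900 bytes + 300 padding
```

Because every part is advertised at the same maximum size, the player can issue byterange requests before the part is actually transcoded, which is what makes the just-in-time scheme work.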

SYSTEM FOR PROVIDING CUSTOMIZED VIDEO PRODUCING SERVICE USING CLOUD-BASED VOICE COMBINING
20220415362 · 2022-12-29 ·

A system for providing a customized video producing service using cloud-based voice combining of the present invention comprises: a user terminal that receives a user's utterance as voice data and uploads it, selects any one category among at least one type of category to select content including an image or a video, selects a subtitle or background music, and plays a customized video including the content, the uploaded voice data, and the subtitle or background music; and a customized video production service providing server including: a database unit that classifies and stores text, images, videos, and background music by the at least one type of category; an upload unit that receives the voice data corresponding to the user's utterance uploaded from the user terminal; a conversion unit that converts the uploaded voice data into text data using STT (Speech to Text) and stores the converted text data; a provision unit that provides an image or video previously mapped to and stored in the selected category to the user terminal when any one category among the at least one type of category is selected at the user terminal; and a creation unit that creates the customized video including the content, the uploaded voice data, and the subtitle or background music upon receiving subtitle data or a background-music selection from the user terminal.
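The server side of this system — a category-keyed content store (database unit), a provision step, and a creation step that assembles content, voice, and subtitle/music into one customized video — can be sketched as a toy class. All names here are hypothetical, and the STT conversion is represented by already-transcribed text rather than a real speech-to-text call.

```python
class CustomVideoService:
    """Toy sketch of the providing server: a category-keyed content
    store plus assembly of a customized-video description."""

    def __init__(self):
        self.store = {}  # database unit: category -> list of content items

    def add_content(self, category, item):
        self.store.setdefault(category, []).append(item)

    def provide(self, category):
        # provision unit: content previously mapped to the selected category
        return self.store.get(category, [])

    def create_video(self, content, voice_text, subtitle=None, music=None):
        # creation unit: combine content, uploaded voice, and subtitle/music
        return {"content": content, "voice": voice_text,
                "subtitle": subtitle, "music": music}

svc = CustomVideoService()
svc.add_content("travel", "beach.mp4")
video = svc.create_video(svc.provide("travel")[0], "hello", subtitle="Hi")
```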

METHODS AND APPARATUS TO DETERMINE THE SPEED-UP OF MEDIA PROGRAMS USING SPEECH RECOGNITION
20220417588 · 2022-12-29 ·

Methods, apparatus, systems and articles of manufacture are disclosed to determine the speed-up of media programs using speech recognition. An example apparatus disclosed herein is to perform speech recognition on a first audio clip collected by a media meter to recognize a first text string associated with the first audio clip, compare the first text string to a plurality of reference text strings associated with a corresponding plurality of reference audio clips to identify a matched one of the reference text strings, and estimate a presentation rate of the first audio clip based on a first time associated with the first audio clip and a second time associated with a first one of the reference audio clips corresponding to the matched one of the reference text strings.
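The core estimate described here — match the recognized text against reference text strings, then compare durations to infer the speed-up — reduces to a ratio of the reference clip's time to the metered clip's time. A minimal sketch, assuming exact string equality stands in for the real (likely fuzzy) text matching:

```python
def estimate_presentation_rate(recognized_text, clip_duration_s, references):
    """references: list of (reference_text, reference_duration_s).
    Find the matching reference and estimate the presentation rate
    as reference duration / observed duration."""
    for ref_text, ref_duration in references:
        if recognized_text == ref_text:  # exact match for the sketch
            return ref_duration / clip_duration_s
    return None  # no reference matched

refs = [("breaking news tonight", 10.0), ("sports update", 6.0)]
rate = estimate_presentation_rate("breaking news tonight", 8.0, refs)
# a 10 s reference heard in 8 s implies a 1.25x speed-up
```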

Caption modification and augmentation systems and methods for use by hearing assisted user

A system and method for facilitating communication between an assisted user (AU) and a hearing user (HU) includes receiving an HU voice signal as the AU and HU participate in a call using AU and HU communication devices, transcribing HU voice signal segments into verbatim caption segments, processing each verbatim caption segment to identify an intended communication (IC) intended by the HU upon uttering an associated one of the HU voice signal segments, for at least a portion of the HU voice signal segments (i) using an associated IC to generate an enhanced caption different than the associated verbatim caption, (ii) for each of a first subset of the HU voice signal segments, presenting the verbatim captions via the AU communication device display for consumption, and (iii) for each of a second subset of the HU voice signal segments, presenting enhanced captions via the AU communication device display for consumption.
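The per-segment decision in this abstract — present the enhanced caption for one subset of voice-signal segments and the verbatim caption for another — can be sketched as a simple selection rule. The function name and the boolean subset flag are illustrative assumptions, not the patented logic:

```python
def caption_for_segment(verbatim, enhanced, prefer_enhanced):
    """Present the enhanced caption when one was generated and the
    segment falls in the 'enhanced' subset; otherwise fall back to
    the verbatim transcription."""
    if prefer_enhanced and enhanced is not None:
        return enhanced
    return verbatim

segments = [
    # (verbatim caption, IC-derived enhanced caption, in enhanced subset?)
    ("I'll be there at, um, like three", "I'll arrive at 3 PM", True),
    ("Okay", None, True),  # no enhancement generated for this segment
]
shown = [caption_for_segment(v, e, p) for v, e, p in segments]
```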

Rendering stream controller
11523151 · 2022-12-06 ·

A system is provided for a rendering stream distributor controller for use with a plurality of content sources, an HTML code repository, a plurality of video rendering engines and a distribution network. The rendering stream distributor controller includes an outbound IP address inventory system, a video rendering engine and network elements inventory system and a rendering stream controller. The rendering stream controller is able to provide a stream instruction, based on one of a plurality of outbound IP addresses, one of a plurality of HTML content identification data, and one of a plurality of sets of HTML code, so as to instruct one of the plurality of video rendering engines to output an MPEG transport stream.
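The stream instruction described above pairs three inputs: an outbound IP address from an inventory, an HTML content identifier, and the HTML code to render. A minimal sketch, assuming round-robin rotation through the IP inventory (the rotation policy and all names are hypothetical):

```python
from dataclasses import dataclass
from itertools import cycle

@dataclass
class StreamInstruction:
    outbound_ip: str
    content_id: str
    html_code: str

class RenderingStreamController:
    """Sketch: rotate through an outbound-IP inventory and pair each
    instruction with the HTML code a rendering engine should turn
    into an MPEG transport stream."""

    def __init__(self, ip_inventory, html_repo):
        self._ips = cycle(ip_inventory)   # outbound IP address inventory
        self._repo = html_repo            # content_id -> HTML code

    def instruct(self, content_id):
        return StreamInstruction(next(self._ips), content_id,
                                 self._repo[content_id])

ctrl = RenderingStreamController(["10.0.0.1", "10.0.0.2"],
                                 {"promo": "<html>promo</html>"})
ins = ctrl.instruct("promo")
```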

Model-based dubbing to translate spoken audio in a video

Model-based dubbing techniques are implemented to generate a translated version of a source video. Spoken audio portions of a source video may be extracted and semantic graphs generated that represent the spoken audio portions. The semantic graphs may be used to produce translations of the spoken portions. A machine learning model may be implemented to generate replacement audio for the spoken portions using the translation of the spoken portion. A machine learning model may be implemented to generate modifications to facial image data for a speaker of the replacement audio.
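The dubbing pipeline — spoken segment, semantic representation, translation, replacement audio — can be outlined as a skeleton where the machine-learning models are injected as callables. This is purely structural; the dictionaries, callables, and field names are stand-ins, since the abstract does not specify any concrete interface:

```python
def dub_segment(audio_segment, translate, synthesize):
    """Pipeline skeleton: spoken segment -> translation -> replacement
    audio. `translate` and `synthesize` stand in for the two
    machine-learning models the abstract describes."""
    translated = translate(audio_segment["text"])
    replacement = synthesize(translated)
    return {"start": audio_segment["start"],
            "audio": replacement,
            "text": translated}

seg = {"start": 0.0, "text": "hola mundo"}
out = dub_segment(
    seg,
    translate=lambda t: {"hola mundo": "hello world"}[t],  # toy translator
    synthesize=lambda t: f"<audio:{t}>",                   # toy synthesizer
)
```

In the described system a third model would also modify the speaker's facial image data to match the replacement audio; that step is omitted here.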

VIDEO CAPTION GENERATING APPARATUS AND METHOD
20220375221 · 2022-11-24 ·

The disclosure relates to a video caption generating apparatus and method for generating a natural-language sentence explaining an input video. The disclosure includes an embedding unit that performs video embedding and category-information embedding, a stack embedding encoder block unit that selects features using the embedded video vector and category vector, a video-category attention unit that receives the result of the stack embedding encoder, generates a similarity matrix and a feature matrix for the video and category information, and provides a final encoding result, and a decoder module that generates a sentence using the final encoding result.
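The similarity matrix produced by the video-category attention unit is, in attention mechanisms generally, a matrix of pairwise scores between the two sets of vectors. A minimal dot-product sketch (the actual scoring function in the disclosure may differ):

```python
def similarity_matrix(video_vecs, category_vecs):
    """Dot-product similarity between each video feature vector and
    each category vector - a stand-in for the attention unit's
    similarity matrix."""
    return [[sum(v * c for v, c in zip(vrow, crow))
             for crow in category_vecs]
            for vrow in video_vecs]

# 2 video feature vectors x 2 category vectors -> 2x2 similarity matrix
S = similarity_matrix([[1.0, 0.0], [0.0, 2.0]],
                      [[1.0, 1.0], [0.0, 1.0]])
```

Each row scores one video feature against every category vector; the attention unit would then combine these scores with the feature matrix to produce the final encoding passed to the decoder.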

Real time popularity based audible content acquisition

A personalized news service provides personalized news programs for its users by generating personalized combinations of audible versions of news stories derived from text-based versions of the news stories. The audible versions may be generated from the text-based version by a text-to-speech system, or may be generated by recording a person reading the text-based version aloud. To acquire recordings, the personalized news service can make a determination that a particular news story has a threshold extent of popularity. The news service can then transmit a request to a remote recording station for a recording of a verbal reading of the particular news story. The news service can then receive the requested recording from the remote recording station.
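The popularity gate described above — request a human recording only once a story clears a popularity threshold — is a simple filter. A sketch, assuming view counts as the popularity measure (the abstract does not specify the metric):

```python
def stories_needing_recordings(view_counts, threshold):
    """Return ids of stories whose popularity meets the threshold,
    i.e. the ones worth requesting a human voice recording for;
    the rest would fall back to text-to-speech."""
    return [sid for sid, views in view_counts.items() if views >= threshold]

counts = {"s1": 5000, "s2": 120, "s3": 9000}
to_record = stories_needing_recordings(counts, threshold=1000)
```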

Video processing for enabling sports highlights generation
11594028 · 2023-02-28 ·

One or more highlights of a video stream may be identified. The highlights may be segments of a video stream, such as a broadcast of a sporting event, that are of particular interest to one or more users. According to one method, at least a portion of the video stream may be stored. The portion of the video stream may be compared with templates of a template database to identify the one or more highlights. Each highlight may be a subset of the video stream that is deemed likely to match the one or more templates. The highlights, an identifier that identifies each of the highlights within the video stream, and/or metadata pertaining particularly to the one or more highlights may be stored to facilitate playback of the highlights for the users.
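The template-matching step — compare stored portions of the stream against a template database and keep windows deemed likely to match — can be sketched over symbolic per-frame labels. The exact-match similarity score and all names are illustrative assumptions; a real system would match audio/visual features, not strings:

```python
def find_highlights(stream_features, templates, threshold=0.8):
    """Slide each template over the stream's per-frame features and
    report (start_index, template_name) wherever the similarity
    clears the threshold."""
    hits = []
    for name, tpl in templates.items():
        n = len(tpl)
        for start in range(len(stream_features) - n + 1):
            window = stream_features[start:start + n]
            # similarity: fraction of positions that match exactly
            score = sum(a == b for a, b in zip(window, tpl)) / n
            if score >= threshold:
                hits.append((start, name))
    return hits

stream = ["idle", "kick", "goal", "cheer", "idle"]
templates = {"goal_highlight": ["kick", "goal", "cheer"]}
hits = find_highlights(stream, templates, threshold=1.0)
```

The start index in each hit is the identifier that locates the highlight within the stream; metadata could be attached per hit to facilitate playback.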

POINT CLOUD DATA ENCODING METHOD, POINT CLOUD DATA DECODING METHOD, POINT CLOUD DATA PROCESSING METHOD, APPARATUSES, ELECTRONIC DEVICE, COMPUTER PROGRAM PRODUCT, AND COMPUTER-READABLE STORAGE MEDIUM

A point cloud data encoding method is provided, which includes: acquiring initial point cloud data in a point cloud data processing environment; determining a space grid structure corresponding to the initial point cloud data; determining a filling order of different point cloud points in the initial point cloud data in the space grid structure; determining, based on the filling order of different point cloud points in the initial point cloud data in the space grid structure, residual information matched with the initial point cloud data; and encoding, according to the residual information, the initial point cloud data to obtain target point cloud data. A point cloud data decoding method, a point cloud data processing method, apparatuses, an electronic device, a computer program product, and a computer-readable storage medium are also provided.
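The encoding steps above — impose a filling order on the points via a space grid, then derive residual information from that order — can be illustrated with a row-major grid order and per-coordinate deltas between consecutive points. The row-major order and delta residuals are simplifying assumptions; the actual method may use a different traversal and residual definition:

```python
def grid_index(point, grid_size):
    """Row-major filling order of a quantized point in a cubic grid."""
    x, y, z = point
    return (z * grid_size + y) * grid_size + x

def encode_residuals(points, grid_size=16):
    """Visit points in the grid filling order and store each point as
    a per-coordinate delta (residual) from the previous one."""
    ordered = sorted(points, key=lambda p: grid_index(p, grid_size))
    prev = (0, 0, 0)
    residuals = []
    for p in ordered:
        residuals.append(tuple(a - b for a, b in zip(p, prev)))
        prev = p
    return residuals

def decode_residuals(residuals):
    """Invert the encoding: accumulate deltas back into points."""
    prev = (0, 0, 0)
    points = []
    for r in residuals:
        prev = tuple(a + b for a, b in zip(prev, r))
        points.append(prev)
    return points

pts = [(3, 1, 0), (1, 1, 0), (2, 0, 1)]
res = encode_residuals(pts)
```

Because nearby points in the filling order tend to be spatially close, the residuals are small and compress well, which is the usual motivation for this kind of scheme.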