Patent classifications
H04N19/177
Optimized multipass encoding
An original input video file is encoded using a machine learning approach. The encoder performs a detailed video analysis and selects encoding parameters using a machine learning algorithm that improves over time. The encoding process uses a multi-pass approach. During a first pass, the entire video file is scanned to extract video property information that does not require in-depth analysis. The extracted data is then fed into an encoding engine, which uses artificial intelligence to produce optimized encoder settings. The video file is split into a set of time-based chunks and, in a second pass, the encoding parameters for each chunk are set and distributed to encoding nodes for parallel processing. These encoder instances probe-encode each chunk to determine its level of complexity and to derive chunk-specific encoding parameters. After the second pass completes, the results of both passes are merged to give the encoder the information needed to achieve the best possible result.
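The two-pass chunked workflow can be sketched roughly as below. This is a minimal illustration, not the patented implementation: the frame representation, the complexity metric, and the AI engine's settings are all stand-in assumptions, and real encoding nodes would run in parallel.

```python
from dataclasses import dataclass

@dataclass
class ChunkResult:
    chunk_id: int
    complexity: float  # estimated by the probe encode
    params: dict       # chunk-specific encoder settings

def first_pass(video_frames):
    """Cheap whole-file scan: collect properties needing no deep analysis."""
    return {
        "frame_count": len(video_frames),
        "mean_luma": sum(video_frames) / len(video_frames),  # stand-in property
    }

def probe_encode(chunk, global_settings):
    """Probe-encode one chunk to estimate complexity and derive parameters."""
    complexity = max(chunk) - min(chunk)  # stand-in complexity metric
    params = dict(global_settings, crf=20 + complexity // 10)
    return complexity, params

def multipass_encode(video_frames, chunk_size=4):
    properties = first_pass(video_frames)
    # Stand-in for the AI engine's optimized global settings.
    global_settings = {"preset": "slow", "crf": 20}
    chunks = [video_frames[i:i + chunk_size]
              for i in range(0, len(video_frames), chunk_size)]
    # Second pass: each chunk would be dispatched to a parallel encoding node.
    results = []
    for chunk_id, chunk in enumerate(chunks):
        complexity, params = probe_encode(chunk, global_settings)
        results.append(ChunkResult(chunk_id, complexity, params))
    # Merge pass-1 properties with per-chunk pass-2 results.
    return {"properties": properties, "chunks": results}
```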
Data-driven event detection for compressed video
A system can obtain a labelled data set, including historic video data and labelled events. The system can divide the labelled data set into historic training/testing data sets. The system can determine, using the historic training data set, a plurality of different parameter configurations to be used by a video encoder to encode a video that includes a plurality of video frames. Each parameter configuration can include a group of pictures (“GOP”) size and a scenecut threshold. The system can calculate an accuracy of event detection (“ACC”) and a filtering rate (“FR”) for each parameter configuration. The system can calculate, for each parameter configuration of the plurality of different parameter configurations, a harmonic mean between the ACC and the FR. The system can then select a best parameter configuration of the plurality of different parameter configurations based upon the parameter configuration that has the highest harmonic mean.
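The selection step is an F1-style criterion: the harmonic mean rewards configurations that balance detection accuracy against filtering rate. A minimal sketch, with hypothetical configuration records (the field names `gop`, `scenecut`, `acc`, `fr` are illustrative, not the patent's):

```python
def harmonic_mean(acc, fr):
    """Harmonic mean of accuracy (ACC) and filtering rate (FR)."""
    if acc + fr == 0:
        return 0.0
    return 2 * acc * fr / (acc + fr)

def select_best_configuration(configs):
    """Pick the (GOP size, scenecut threshold) configuration whose measured
    ACC and FR have the highest harmonic mean."""
    return max(configs, key=lambda c: harmonic_mean(c["acc"], c["fr"]))
```

For example, a configuration with ACC 0.8 and FR 0.7 (harmonic mean about 0.747) beats one with ACC 0.9 and FR 0.5 (about 0.643), reflecting the preference for balanced performance.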
Methods, systems, and media for generating an immersive light field video with a layered mesh representation
Mechanisms for generating compressed images are provided. More particularly, methods, systems, and media for capturing, reconstructing, compressing, and rendering view-dependent immersive light field video with a layered mesh representation are provided.
VIDEO DECODING METHOD AND DEVICE ENABLING IMPROVED USER INTERACTION WITH VIDEO CONTENT
A method of managing the flow of data through a video decoder is described. The method includes receiving a stream of video data including compressed video frames organized in groups of pictures (GOPs). A GOP typically includes one intra-frame coded image and a plurality of inter-frame coded images. Data from received GOPs is entered into a pre-decode cache module as uniquely identified GOP data blocks containing uniquely identified compressed video frames. Based on the current playback status, blocks are selected from the cache and appended to a decode queue of GOP data blocks to be delivered as input to a video decoder (106). Output data from the decoder (106) is delivered as decoded video frames to a post-decode cache module (303). Also described are a video decoder and a software program product.
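The cache-and-queue arrangement can be sketched as follows. All class and field names here are hypothetical, and the "decoder" is a trivial stand-in; the point is the flow: pre-decode cache, playback-driven selection, decode queue, post-decode cache.

```python
from collections import deque

class PreDecodeCache:
    """Holds uniquely identified GOP data blocks before decoding."""
    def __init__(self):
        self.blocks = {}  # gop_id -> list of compressed frames

    def enter(self, gop_id, frames):
        self.blocks[gop_id] = frames

    def select(self, playback_status):
        """Pick the GOP block needed next for the current playback position."""
        gop_id = playback_status["next_gop"]
        return gop_id, self.blocks.get(gop_id)

class DecoderPipeline:
    def __init__(self):
        self.pre_cache = PreDecodeCache()
        self.decode_queue = deque()  # GOP blocks awaiting the decoder
        self.post_cache = {}         # gop_id -> decoded frames

    def schedule(self, playback_status):
        """Append the selected block to the decode queue."""
        gop_id, frames = self.pre_cache.select(playback_status)
        if frames is not None:
            self.decode_queue.append((gop_id, frames))

    def run_decoder(self):
        """Stand-in for the video decoder: 'decodes' each queued block and
        places the output frames in the post-decode cache."""
        while self.decode_queue:
            gop_id, frames = self.decode_queue.popleft()
            self.post_cache[gop_id] = [f.upper() for f in frames]
```

Keeping decoded GOPs in a post-decode cache is what enables responsive interaction such as frame-accurate seeking or reverse playback without re-decoding.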
TECHNIQUES FOR CONSTRAINT FLAG SIGNALING FOR RANGE EXTENSION WITH RESIDUAL RICE CODING EXTENSION
Aspects of the disclosure provide methods and apparatuses for video data processing. In some examples, an apparatus for video data processing includes processing circuitry. For example, the processing circuitry determines a first syntax element for coding control in a first scope of coded video data in a bitstream. The first syntax element is associated with a second coding tool that is alternative to a first coding tool for Rice parameter derivation in a residual coding. In response to the first syntax element being a first value indicative of disabling of the second coding tool in the first scope, the processing circuitry decodes the first scope of coded video data that includes one or more second scopes of coded video data without invoking the second coding tool.
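The gating logic amounts to a scope-level constraint check: if the syntax element signals that the alternative Rice-derivation tool is disabled for the outer scope, every contained scope is decoded without invoking it. A minimal sketch, assuming an illustrative flag value and stand-in decode functions (none of these names come from the specification):

```python
# Assumed value signalling "alternative tool disabled in this scope".
ALT_RICE_DISABLED = 1

def decode_scope(first_syntax_element, inner_scopes):
    """Decode an outer scope of coded video data. The constraint applies to
    every inner scope it contains."""
    use_alt_tool = first_syntax_element != ALT_RICE_DISABLED
    return [decode_residuals(scope, use_alt_tool) for scope in inner_scopes]

def decode_residuals(scope, use_alt_tool):
    """Stand-in residual decode: records which Rice derivation was used."""
    method = "alt-rice" if use_alt_tool else "default-rice"
    return {"scope": scope, "rice_derivation": method}
```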
METHOD FOR DECODING IMMERSIVE VIDEO AND METHOD FOR ENCODING IMMERSIVE VIDEO
A method of encoding an immersive video according to the present disclosure includes determining whether an input image is a first type, converting the input image into the first type when the input image is a second type different from the first type, encoding a converted image, and generating metadata for the encoded image.
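The encode path reduces to a type check, an optional conversion, encoding, and metadata generation. A minimal sketch, assuming for illustration that the two types are projection formats (the names `perspective` and `equirectangular` are hypothetical; the abstract does not specify what the types are):

```python
FIRST_TYPE = "perspective"
SECOND_TYPE = "equirectangular"  # assumed example of a second type

def convert_to_first_type(image):
    """Convert a second-type input image into the first type."""
    return dict(image, projection=FIRST_TYPE, converted=True)

def encode_immersive(image):
    """Determine the input type, convert if needed, encode, and generate
    metadata describing the encoded image."""
    if image["projection"] != FIRST_TYPE:
        image = convert_to_first_type(image)
    bitstream = f"enc({image['projection']})"  # stand-in encoder
    metadata = {
        "projection": FIRST_TYPE,
        "was_converted": image.get("converted", False),
    }
    return bitstream, metadata
```

Recording whether conversion happened in the metadata lets a decoder invert the conversion when reconstructing the original second-type image.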
Systems and methods for multi-video stream transmission
The present disclosure relates to systems and methods for a multi-video stream transmission to a client terminal. The systems and methods may include obtaining a multi-video stream including a plurality of video streams, each video stream including multiple key frames characterized by a frame rate and a key frame interval. The systems and methods may include determining a delay time of an initial key frame for each video stream based on a plurality of frame rates and a plurality of key frame intervals of the plurality of video streams. The systems and methods may further include processing the plurality of video streams to determine a desired sending time of the initial key frame in the corresponding video streams based on the delay time of the initial key frame in each video stream. The systems and methods may further include transmitting the plurality of processed video streams to the client terminal.
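The delay computation serves to stagger the streams' initial key frames so their (large) key frames do not all arrive at once. The abstract does not give the formula, so the sketch below uses an assumed policy of spacing the initial key frames evenly across the shortest key-frame period among the streams; all names and the spacing rule are illustrative.

```python
def key_frame_period(frame_rate, key_frame_interval):
    """Seconds between key frames, with the interval given in frames."""
    return key_frame_interval / frame_rate

def initial_key_frame_delays(streams):
    """streams: list of (frame_rate, key_frame_interval) tuples.
    Returns an assumed delay (seconds) for each stream's initial key frame:
    evenly spaced across the shortest key-frame period."""
    shortest = min(key_frame_period(fr, kfi) for fr, kfi in streams)
    n = len(streams)
    return [i * shortest / n for i in range(n)]

def desired_sending_times(start_time, streams):
    """Desired sending time of each stream's initial key frame."""
    return [start_time + d for d in initial_key_frame_delays(streams)]
```

For two 2-second-period streams, this offsets the second stream's initial key frame by 1 second, so subsequent key frames of the two streams interleave rather than coincide.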