H04N19/87

Multi-person pose recognition method and apparatus, electronic device, and storage medium

In a multi-person pose recognition method, a to-be-recognized image is obtained, and a circuitous pyramid network is constructed. The circuitous pyramid network includes parallel phases; each phase includes downsampling network layers, upsampling network layers, and a first residual connection layer connecting the downsampling and upsampling network layers. The phases are interconnected by a second residual connection layer. The circuitous pyramid network is traversed by extracting a feature map for each phase, and the feature map of the last phase is taken as the feature map of the to-be-recognized image. Multi-person pose recognition is then performed on the to-be-recognized image according to that feature map to obtain a pose recognition result for the to-be-recognized image.
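The phase traversal described above can be sketched in plain NumPy. The pooling, upsampling, and `phase`/`circuitous_pyramid` helpers below are illustrative stand-ins for the trained network layers (a minimal sketch, not the patented architecture); all names are hypothetical:

```python
import numpy as np

def downsample(x):
    """2x average pooling -- a stand-in for a downsampling network layer."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """2x nearest-neighbour upsampling -- a stand-in for an upsampling layer."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def phase(x, skip=None):
    """One pyramid phase: downsample, upsample, then apply the first
    residual connection (within the phase). `skip` carries the second
    residual connection arriving from the previous phase."""
    out = upsample(downsample(x)) + x  # first residual connection layer
    if skip is not None:
        out = out + skip               # second residual connection layer
    return out

def circuitous_pyramid(image, num_phases=3):
    """Traverse the phases; the last phase's output is taken as the
    feature map of the to-be-recognized image."""
    feat, skip = image, None
    for _ in range(num_phases):
        feat = phase(feat, skip)
        skip = feat
    return feat
```

The feature map keeps the input's spatial size because each phase upsamples back to the resolution it downsampled from before adding the residuals.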

Spatiotemporal prediction for bidirectionally predictive (B) pictures and motion vector prediction for multi-picture reference motion compensation

Several improvements for use with Bidirectionally Predictive (B) pictures within a video sequence are provided. In certain improvements, Direct Mode encoding and/or Motion Vector Prediction are enhanced using spatial prediction techniques. In other improvements, Motion Vector Prediction incorporates temporal distance and subblock information for more accurate prediction. These and other improvements presented herein significantly improve the performance of any applicable video coding system/logic.
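A minimal sketch of the two prediction ideas, assuming a component-wise median spatial predictor and temporal-distance scaling of a co-located motion vector (both standard techniques in B-picture coding; the helper names are hypothetical):

```python
def median_mv_predictor(mv_left, mv_top, mv_topright):
    """Spatial prediction: component-wise median of three spatially
    neighbouring motion vectors, each given as an (x, y) tuple."""
    def median3(a, b, c):
        return sorted((a, b, c))[1]
    return (median3(mv_left[0], mv_top[0], mv_topright[0]),
            median3(mv_left[1], mv_top[1], mv_topright[1]))

def temporal_scale(mv, td_ref, td_cur):
    """Temporal prediction: scale a co-located motion vector by the
    ratio of temporal distances (current picture vs. reference),
    as in temporal Direct Mode with multi-picture reference."""
    return (mv[0] * td_cur // td_ref, mv[1] * td_cur // td_ref)
```

Spatial prediction avoids the division and the dependence on stored reference-picture distances; temporal scaling becomes more accurate when the true temporal distances are carried along, which is the point of including that information.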

SELECTIVELY IDENTIFYING DATA BASED ON MOTION DATA FROM A DIGITAL VIDEO TO PROVIDE AS INPUT TO AN IMAGE PROCESSING MODEL

The present disclosure relates to systems, methods, and computer-readable media for selectively identifying pixel data to provide as an input to an image processing model based on motion data associated with the content of a digital video. For example, systems disclosed herein include receiving a compressed digital video and decompressing the compressed digital video to generate a decompressed digital video. The systems disclosed herein further include extracting or otherwise identifying motion data while decompressing the compressed digital video. The systems disclosed herein also include analyzing the motion data to determine a subset of pixel data from the decompressed digital video to provide as input to an image processing model trained to generate an output based on input pixel data.
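The selection step can be sketched with NumPy, assuming block-level motion vectors recovered during decompression and a simple magnitude threshold (the threshold value and helper names are illustrative assumptions, not the disclosed criteria):

```python
import numpy as np

def select_active_blocks(motion_vectors, magnitude_threshold=1.0):
    """Analyze motion data: given per-block motion vectors (H x W x 2)
    identified while decompressing the video, return a boolean mask of
    blocks whose motion magnitude exceeds the threshold."""
    magnitudes = np.linalg.norm(motion_vectors, axis=-1)
    return magnitudes > magnitude_threshold

def gather_pixels(frame, mask, block=16):
    """Expand the block-level mask to pixel resolution and extract only
    the selected subset of pixel data from the decompressed frame --
    the subset provided as input to the image processing model."""
    pixel_mask = np.kron(mask, np.ones((block, block), dtype=bool)).astype(bool)
    return frame[pixel_mask]
```

Because the motion data is a byproduct of decoding, this filtering adds almost no cost while shrinking the model's input to the regions where content is actually changing.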

Data-driven event detection for compressed video

A system can obtain a labelled data set, including historic video data and labelled events. The system can divide the labelled data set into historic training and testing data sets. The system can determine, using the historic training data set, a plurality of different parameter configurations to be used by a video encoder to encode a video that includes a plurality of video frames. Each parameter configuration can include a group of pictures (“GOP”) size and a scenecut threshold. The system can calculate an accuracy of event detection (“ACC”) and a filtering rate (“FR”) for each parameter configuration. The system can calculate, for each parameter configuration of the plurality of different parameter configurations, a harmonic mean between the ACC and the FR. The system can then select, from the plurality of different parameter configurations, the configuration that has the highest harmonic mean.
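The selection step reduces to maximizing the harmonic mean of ACC and FR over the candidate configurations. A minimal sketch (the dictionary keys are hypothetical field names for illustration):

```python
def harmonic_mean(acc, fr):
    """Harmonic mean of event-detection accuracy (ACC) and filtering
    rate (FR); defined as zero when both terms are zero."""
    return 2 * acc * fr / (acc + fr) if acc + fr else 0.0

def select_best_configuration(configs):
    """configs: list of dicts with keys 'gop_size', 'scenecut_threshold',
    'acc', and 'fr'. Returns the parameter configuration whose ACC/FR
    harmonic mean is highest."""
    return max(configs, key=lambda c: harmonic_mean(c["acc"], c["fr"]))
```

The harmonic mean penalizes configurations that score well on only one metric, so a setting that filters aggressively but misses events (or detects everything but filters nothing) cannot win.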

SIGNAL RESHAPING FOR HIGH DYNAMIC RANGE SIGNALS

In a method to improve backwards compatibility when decoding high-dynamic-range images coded in a wide color gamut (WCG) space which may not be compatible with legacy color spaces, hue and/or saturation values of images in an image database are computed for both a legacy color space (say, YCbCr-gamma) and a preferred WCG color space (say, IPT-PQ). Based on a cost function, a reshaped color space is computed so that the distance between the hue values in the legacy color space and rotated hue values in the preferred color space is minimized. HDR images are coded in the reshaped color space. Legacy devices can still decode standard dynamic range images assuming they are coded in the legacy color space, while updated devices can use color reshaping information to decode HDR images in the preferred color space at full dynamic range.

SHOT-CHANGE DETECTION USING CONTAINER LEVEL INFORMATION
20230060780 · 2023-03-02

The disclosed computer-implemented method may include, for a current frame of a sequence of video frames, determining a frame type label of the current frame. The method may include, in response to determining that the current frame is labeled as an intra frame (I-frame), decoding the current frame and comparing the decoded frame to historical I-frame data. The method may also include, in response to the comparison satisfying a shot-change threshold, flagging the current frame as a shot-change frame, and in response to flagging the current frame as the shot-change frame, storing the current frame for a subsequent shot-change detection. The method may further include updating, based on flagged shot-change frames, shot boundaries for the sequence of video frames. Various other methods, systems, and computer-readable media are also disclosed.
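The I-frame comparison loop can be sketched as follows, using an L1 histogram distance as a stand-in for the disclosed comparison against historical I-frame data (the distance measure, threshold, and helper names are illustrative assumptions):

```python
import numpy as np

def histogram_distance(frame_a, frame_b, bins=32):
    """L1 distance between normalized intensity histograms of two
    decoded frames."""
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    return float(np.abs(ha / ha.sum() - hb / hb.sum()).sum())

def detect_shot_changes(frames, frame_types, threshold=0.5):
    """Walk the sequence using container-level frame type labels: only
    frames labelled 'I' are decoded and compared. When the comparison
    satisfies the threshold, the frame is flagged as a shot-change
    frame and stored for subsequent shot-change detection."""
    shot_frames, stored_i = [], None
    for idx, (frame, ftype) in enumerate(zip(frames, frame_types)):
        if ftype != "I":
            continue  # frame type comes from the container, no decode needed
        if stored_i is None or histogram_distance(frame, stored_i) > threshold:
            if stored_i is not None:
                shot_frames.append(idx)  # flag as shot-change frame
            stored_i = frame             # store for subsequent detection
    return shot_frames
```

Skipping non-I frames is what makes the method cheap: the frame type labels are available at the container level, so only a small fraction of the sequence is ever fully decoded.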