Patent classifications
H04N19/23
USING REINFORCEMENT LEARNING AND PERSONALIZED RECOMMENDATIONS TO GENERATE A VIDEO STREAM HAVING A PREDICTED, PERSONALIZED, AND ENHANCED-QUALITY FIELD-OF-VIEW
Embodiments of the invention are directed to a computer-implemented method that includes using a reinforcement learning (RL) system to generate a first set of displayed region candidates based on inputs received from online users while watching video. A recommendation system is used to rank the first set of displayed region candidates based on inputs received from a local user watching video. The recommendation system is further used to select a first highest ranked one of the first set of displayed region candidates. Based on the first highest ranked one of the first set of displayed region candidates, a first section of a first raw video frame is fetched that matches the first highest ranked one of the first set of displayed region candidates, wherein the first section of the first raw video frame includes a first predicted display region of the video frame.
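The rank-select-fetch pipeline in this abstract can be sketched in a few lines. This is a minimal illustration, assuming a simple feature-dot-product recommender over the RL-proposed candidates; the names `rank_candidates` and `fetch_region`, the feature keys, and the region layout are all hypothetical, not from the patent.

```python
def rank_candidates(candidates, user_prefs):
    """Rank candidate display regions by how well their features match
    the local user's preference weights (higher score first)."""
    def score(c):
        return sum(user_prefs.get(k, 0.0) * v for k, v in c["features"].items())
    return sorted(candidates, key=score, reverse=True)

def fetch_region(frame, region):
    """Crop the predicted display region (x, y, w, h) out of a raw
    frame represented as a 2D list of pixel values."""
    x, y, w, h = region
    return [row[x:x + w] for row in frame[y:y + h]]

# Two candidate regions proposed by the RL stage, ranked for a local
# user whose inputs suggest a strong preference for faces.
candidates = [
    {"region": (0, 0, 2, 2), "features": {"faces": 0.1, "motion": 0.9}},
    {"region": (2, 0, 2, 2), "features": {"faces": 0.8, "motion": 0.2}},
]
best = rank_candidates(candidates, {"faces": 1.0})[0]
frame = [[c + 10 * r for c in range(4)] for r in range(4)]
crop = fetch_region(frame, best["region"])  # the predicted display region
```

The crop would then be the section of the raw frame served to the local user in place of the full field of view.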
METHOD AND SYSTEM FOR ENCODING, DECODING AND PLAYBACK OF VIDEO CONTENT IN CLIENT-SERVER ARCHITECTURE
One or more methods and systems are provided for encoding, decoding and playback of video content in a client-server architecture. The invention proposes a video encoding and decoding method that includes identifying activities in the video content, identifying the corresponding APIs with related parameters for each activity, and storing those APIs along with the base frame and object frame in a database. In this invention, animation API functions are created for unknown/random activities. Playback involves decoding the data, which is a set of instructions to play the animation with the given objects and base frames, and animating the object frame over the base frame using said API functions.
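The playback step described above — decoding stored instructions and replaying them through animation API functions — can be sketched as follows. The `(api_name, params)` instruction format, the `"move"` API, and the scene-as-positions representation are illustrative assumptions, not the patent's actual API set.

```python
def api_move(scene, name, dx, dy):
    """Animate an object frame over the base frame by translating it."""
    x, y = scene[name]
    scene[name] = (x + dx, y + dy)

# Registry of animation API functions, keyed by the name stored
# alongside the base frame and object frames in the database.
ANIMATION_APIS = {"move": api_move}

def playback(base_scene, instructions):
    """Decode stored instructions and replay them over the base scene."""
    scene = dict(base_scene)  # object positions from the base frame
    for api_name, params in instructions:
        ANIMATION_APIS[api_name](scene, *params)
    return scene

# Replaying two stored instructions over a base frame with one object.
final = playback({"ball": (0, 0)},
                 [("move", ("ball", 2, 3)), ("move", ("ball", 1, 0))])
```

The design point is that only the instruction list travels over the wire; the pixel data (base frame and object frames) is sent once and reanimated client-side.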
Gaze-driven recording of video
Systems and methods for gaze-driven recording of video are described. Some implementations may include accessing gaze data captured using one or more gaze-tracking sensors; applying a temporal filter to the gaze data to obtain a smoothed gaze estimate; determining a region of interest based on the smoothed gaze estimate, wherein the region of interest identifies a subset of a field of view; accessing a frame of video; recording a portion of the frame associated with the region of interest as an enhanced frame of video, wherein the portion of the frame corresponds to a smaller field of view than the frame; and storing, transmitting, or displaying the enhanced frame of video.
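The two core steps of this abstract — temporally filtering gaze samples and deriving a clamped region of interest — can be sketched as below. An exponential moving average stands in for the unspecified temporal filter, and the function names are assumptions.

```python
def smooth_gaze(samples, alpha=0.3):
    """Exponential moving average over (x, y) gaze samples; a stand-in
    for the patent's unspecified temporal filter."""
    sx, sy = samples[0]
    smoothed = [(sx, sy)]
    for x, y in samples[1:]:
        sx = alpha * x + (1 - alpha) * sx
        sy = alpha * y + (1 - alpha) * sy
        smoothed.append((sx, sy))
    return smoothed

def region_of_interest(gaze, frame_size, roi_size):
    """Center an ROI (a smaller field of view) on the smoothed gaze
    estimate, clamped so it stays inside the frame."""
    (gx, gy), (fw, fh), (rw, rh) = gaze, frame_size, roi_size
    x = min(max(int(gx - rw / 2), 0), fw - rw)
    y = min(max(int(gy - rh / 2), 0), fh - rh)
    return x, y, rw, rh

# Smooth noisy gaze samples, then pick a 640x360 ROI in a 1920x1080 frame.
estimate = smooth_gaze([(0.0, 0.0), (10.0, 0.0)], alpha=0.5)[-1]
roi = region_of_interest((1920.0, 1080.0), (1920, 1080), (640, 360))
```

The recorded portion of each frame would then be the ROI crop, which covers a smaller field of view than the source frame.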
Method and apparatus for supporting augmented and/or virtual reality playback using tracked objects
Methods for capturing and generating information about objects in a 3D environment that can be used to support augmented reality or virtual reality playback operations in a data-efficient manner are described. In various embodiments, one or more frames including foreground objects are generated and transmitted with corresponding information that can be used to determine where the foreground objects are to be positioned relative to a background for one or more frame times. Data efficiency is achieved by specifying different locations for a foreground object for different frame times, avoiding in some embodiments the need to transmit an image and depth information defining the shape of the foreground object for each frame time. The frames can be encoded using a video encoder even though some of the information communicated is not pixel values but alpha blending values, object position information, mesh distortion information, etc.
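The data-efficiency idea — send a foreground object's image once, then send only its position per frame time — can be illustrated with a toy compositor. The `render_at` name, the positions-by-frame-time dictionary, and the 2D-list frames are hypothetical simplifications (no depth, alpha, or mesh data).

```python
def render_at(background, fg_image, positions, t):
    """Composite fg_image (a 2D list, transmitted once) onto a copy of
    the background at the position recorded for frame time t."""
    x, y = positions[t]
    frame = [row[:] for row in background]
    for r, row in enumerate(fg_image):
        for c, px in enumerate(row):
            frame[y + r][x + c] = px
    return frame

background = [[0] * 4 for _ in range(4)]
fg = [[9, 9], [9, 9]]               # foreground image, sent only once
positions = {0: (0, 0), 1: (2, 2)}  # per-frame-time placement info
frame1 = render_at(background, fg, positions, 1)
```

Per additional frame time, only a coordinate pair is transmitted rather than the full foreground image, which is where the savings come from.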
FLEXIBLE REFERENCE PICTURE MANAGEMENT FOR VIDEO ENCODING AND DECODING
Innovations in flexible reference picture management are described. For example, a video encoder and video decoder use a global reference picture set (“GRPS”) of reference pictures that remain in memory, and hence are available for use in video encoding/decoding, longer than conventional reference pictures. In particular, reference pictures of the GRPS remain available across random access boundaries. Or, as another example, a video encoder and video decoder clip a reference picture so that useful regions of the reference picture are retained in memory, while unhelpful or redundant regions of the reference picture are discarded. Reference picture clipping can reduce the amount of memory needed to store reference pictures or improve the utilization of available memory by providing better options for motion compensation. Or, as still another example, a video encoder and video decoder filter a reference picture to remove random noise (e.g., capture noise due to camera imperfections during capture).
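The reference picture clipping idea can be sketched as below: only the useful region is kept in memory, together with its offset, so motion compensation can still address samples in full-picture coordinates. The class and method names are illustrative, not from any codec specification.

```python
class ClippedReference:
    """A reference picture clipped to a useful region (x, y, w, h),
    with the offset retained so samples can still be fetched using
    full-picture coordinates."""

    def __init__(self, picture, x, y, w, h):
        self.offset = (x, y)
        # Keep only the useful region; the rest is discarded from memory.
        self.region = [row[x:x + w] for row in picture[y:y + h]]

    def sample(self, px, py):
        """Fetch a sample at full-picture coordinates (px, py)."""
        ox, oy = self.offset
        return self.region[py - oy][px - ox]

# Retain only the bottom-right quadrant of an 8x8 reference picture.
picture = [[c + 10 * r for c in range(8)] for r in range(8)]
ref = ClippedReference(picture, 4, 4, 4, 4)
retained_samples = sum(len(row) for row in ref.region)  # 16 of 64
```

Here memory for the reference drops to a quarter of the full picture while motion compensation into the retained region keeps working unchanged.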
METHOD AND PROGRAM FOR PRODUCING MULTI REACTIVE VIDEO, AND GENERATE META DATA TO MAKE MULTI REACTIVE VIDEO, AND ANALYZE INTO INTERACTION DATA TO UNDERSTAND HUMAN ACT
Disclosed is a multi-reactive video generating method and program that performs various conditional playbacks depending on a user's manipulation, based on a video database (e.g., a basic video) in which a general video or a plurality of image frames is stored. According to an embodiment of the inventive concept, various actions (i.e., reactions) may be applied as the multi-reactive video generation file is played with a general video or a combination of a plurality of image frames.
VIDEO ENCODING THROUGH NON-SALIENCY COMPRESSION FOR LIVE STREAMING OF HIGH DEFINITION VIDEOS IN LOW-BANDWIDTH TRANSMISSION
A computer-implemented method of encoding video streams for low-bandwidth transmission includes identifying salient data and non-salient data in a high-resolution video stream. The salient data and the non-salient data are segmented. The non-salient data is compressed to a lower resolution. The salient data and the compressed non-salient data are transmitted in a low-bandwidth transmission.
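The segment-and-compress step can be sketched as follows, assuming 2x2 average pooling as the "lower resolution" for the non-salient data; the function names and payload layout are assumptions, not the patent's actual format.

```python
def downsample_2x(block):
    """Average-pool a 2D list of pixel values by a factor of 2 in each
    dimension (the lower-resolution non-salient representation)."""
    return [[(block[y][x] + block[y][x + 1] +
              block[y + 1][x] + block[y + 1][x + 1]) / 4
             for x in range(0, len(block[0]), 2)]
            for y in range(0, len(block), 2)]

def encode_frame(frame, salient_box):
    """Keep the salient crop at full resolution; compress the rest."""
    x, y, w, h = salient_box
    salient = [row[x:x + w] for row in frame[y:y + h]]
    return {"box": salient_box,
            "salient": salient,                    # full resolution
            "non_salient": downsample_2x(frame)}   # quarter the samples

# A flat 4x4 frame whose top-left 2x2 region is marked salient.
frame = [[8] * 4 for _ in range(4)]
payload = encode_frame(frame, (0, 0, 2, 2))
```

The receiver would upscale the non-salient layer and paste the full-resolution salient crop back at `box`, so bandwidth scales with the salient area rather than the whole high-resolution frame.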