Patent classifications
H04N19/25
Transferring data from autonomous vehicles
A system includes at least one imaging sensor and a processor. The processor is configured to acquire, using the imaging sensor, detected data describing an environment of an autonomous vehicle. The processor is further configured to derive reference data, which describe the environment, from a predefined map, to compute difference data representing a difference between the detected data and the reference data, and to transfer the difference data. Other embodiments are also described.
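As a minimal illustrative sketch (not the patented implementation), the idea of transferring only the difference between sensed data and map-derived reference data could look like the following, assuming hypothetical occupancy grids and a hypothetical `compute_difference_data` helper:

```python
import numpy as np

def compute_difference_data(detected: np.ndarray, reference: np.ndarray, threshold: float = 0.1):
    """Keep only the cells where the sensed environment deviates from the predefined map.

    detected  : grid derived from the imaging sensor
    reference : grid derived from the predefined map
    """
    delta = detected - reference
    mask = np.abs(delta) > threshold
    indices = np.argwhere(mask)          # locations of the changed cells
    values = delta[mask]                 # magnitude of each change
    # Transferring (index, value) pairs instead of the full grid keeps the payload small.
    return list(zip(map(tuple, indices), values.tolist()))
```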
Temporal alignment of MPEG and GLTF media
An apparatus includes at least one processor; and at least one memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: provide an animation timing extension; wherein the animation timing extension links a graphics library transmission format animation to timed metadata and a metadata track of the timed metadata; wherein the metadata track of the timed metadata is listed with an object associated with moving picture media; and align at least one timeline of the moving picture media with at least one timeline of the graphics library transmission format animation; wherein a sample of the metadata track is used to manipulate an animation event.
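A rough sketch of the alignment idea, assuming a hypothetical `MetadataSample` structure for samples of the timed-metadata track and a hypothetical `due_events` helper (none of these names come from the patent):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MetadataSample:
    presentation_time: float   # position on the moving-picture (e.g. MPEG) timeline, seconds
    animation: int             # index of the glTF animation the sample manipulates
    event: str                 # animation event, e.g. "play", "pause", "seek"
    animation_time: float      # target position on the glTF animation timeline, seconds

def due_events(samples: List[MetadataSample], last_time: float, media_time: float):
    """Return the samples whose presentation time falls inside the interval just played,
    so their events can be applied to the linked glTF animation."""
    return [s for s in samples if last_time < s.presentation_time <= media_time]
```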
DATA STREAMING PROTOCOLS IN EDGE COMPUTING
Systems and methods are provided for reducing stream data according to a data streaming protocol in a multi-access edge computing environment. In particular, an IoT device, such as a video image sensing device, may capture stream data and generate inference data by applying a machine-learning model trained to infer data based on the captured stream data. The inference data represents the captured stream data in a reduced data size based on performing data analytics on the captured data. The IoT device formats the inference data according to the data streaming protocol. In contrast to video data compression, the data streaming protocol includes instructions for transmitting the reduced volume of inference data through a data analytics pipeline.
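A minimal sketch of the reduction step, assuming a hypothetical model callable and a hypothetical JSON message envelope standing in for the data streaming protocol:

```python
import json
import time

def reduce_and_format(frame, model, device_id: str) -> bytes:
    """Run the trained model on a captured frame and wrap only the inference
    results in a (hypothetical) streaming-protocol message."""
    detections = model(frame)                 # e.g. [{"label": "car", "score": 0.91}, ...]
    message = {
        "device": device_id,
        "timestamp": time.time(),
        "payload_type": "inference",          # the analytics pipeline routes on this field
        "detections": detections,             # far smaller than the raw captured frame
    }
    return json.dumps(message).encode("utf-8")
```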
METHOD AND APPARATUS FOR AR REMOTE RENDERING PROCESSES
A method for an augmented reality (AR) remote rendering process is performed by a remote rendering device. The method includes: performing 3D scene compositing based on live geometry information and anchor information; receiving, from an AR device, pose information; rendering a 2D frame of the composited 3D scene based on the pose information; performing 2D frame processing on the rendered 2D frame; creating first metadata associated with the 2D frame processing, second metadata associated with the rendered 2D frame, and third metadata associated with AR frame compositing of the rendered 2D frame; and transmitting, to the AR device, the rendered 2D frame, the first metadata, the second metadata, and the third metadata.
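One iteration of such a server-side loop might be sketched as below; the `scene` and `connection` interfaces and the metadata field names are assumptions for illustration only:

```python
def remote_render_step(scene, connection, geometry, anchors):
    """One pass of the remote-rendering loop described above (assumed interfaces)."""
    scene.composite(geometry, anchors)            # 3D scene compositing from live geometry + anchors
    pose = connection.receive_pose()              # pose information sent by the AR device
    frame = scene.render_2d(pose)                 # render a 2D view of the composited scene
    processed = scene.process_2d(frame)           # 2D frame processing (e.g. cropping, encoding)
    metadata = {
        "processing": {"codec": "h264"},                   # first metadata: 2D frame processing
        "render": {"pose": pose, "fov_margin_deg": 10},    # second metadata: rendered 2D frame
        "compositing": {"reprojection": "late-stage"},     # third metadata: AR frame compositing
    }
    connection.send(processed, metadata)
```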
Moving image analysis apparatus, system, and method
A moving image analysis apparatus includes at least one of a processor and circuitry configured to perform operations including: acquiring first data and second data used in compressing and encoding a moving image, for a first frame and a second frame of the moving image, respectively; detecting first feature data indicating a first feature of the moving image on the basis of the first frame and the first data, and detecting second feature data indicating a second feature of the moving image on the basis of the second frame and the second data; and detecting an object included in the first frame on the basis of the first feature data and the second feature data.
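A rough sketch of combining frame pixels with encoder-side data (e.g. motion vectors) into per-frame features and detecting an object from both; the helper names and the scikit-learn-style classifier are assumptions, not the patented method:

```python
import numpy as np

def frame_features(frame: np.ndarray, coding_data: np.ndarray) -> np.ndarray:
    """Combine simple pixel statistics with data produced during encoding
    (for example, motion vectors) into one feature vector."""
    return np.concatenate([frame.mean(axis=(0, 1)).ravel(), coding_data.ravel()])

def detect_object(first_feat: np.ndarray, second_feat: np.ndarray, classifier):
    """Detect an object in the first frame using feature data from both frames."""
    joint = np.concatenate([first_feat, second_feat])
    return classifier.predict(joint[None, :])   # assumed scikit-learn-style classifier
```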
METHOD, AN APPARATUS AND A COMPUTER PROGRAM PRODUCT FOR VIDEO ENCODING AND VIDEO DECODING
The embodiments relate to a method comprising: establishing a three-dimensional conversational interaction with one or more receivers; generating a pointcloud relating to a user and capturing audio from one or more audio sources; generating a conversational scene description comprising at least a first dynamic object describing a virtual space for the three-dimensional conversational interaction, wherein the first dynamic object refers to one or more objects specific to the three-dimensional conversational interaction, said one or more objects comprising at least data relating to the transformable pointcloud, audio obtained from said one or more audio sources, and input obtained from one or more connected devices controlling at least the pointcloud, wherein said objects are linked to each other for seamless manipulation; incorporating the conversational scene description into metadata; and transmitting the metadata with the respective audio in real time to said one or more receivers.
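As a hedged illustration only, a conversational scene description linking the point cloud, audio sources, and controlling devices could be assembled as a metadata object like the following; the field names and the `build_scene_description` helper are hypothetical:

```python
import json
import time

def build_scene_description(pointcloud_uri, audio_sources, controllers):
    """Assemble a (hypothetical) conversational scene description as metadata."""
    return {
        "virtual_space": {
            "objects": [
                {"type": "transformable_pointcloud", "uri": pointcloud_uri},
                *[{"type": "audio", "source": s} for s in audio_sources],
                *[{"type": "control_input", "device": c} for c in controllers],
            ],
            # Objects are linked so that controller input manipulates the point cloud.
            "links": [{"from": "control_input", "to": "transformable_pointcloud"}],
        },
        "timestamp": time.time(),
    }

metadata = json.dumps(build_scene_description("user0.ply", ["mic0"], ["hand-tracker"]))
```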
Method and System for Encoding a 3D Scene
A computer-implemented method for encoding a scene volume includes: (a) identifying features of a scene volume that are within a camera perspective range with respect to a default camera perspective; (b) converting the identified features into rendered features; and (c) sorting the rendered features into a plurality of scene layers, each including corresponding depth, color, and transparency maps for the respective rendered features. Further, (a), (b), and (c) may be repeated, operating on temporally ordered scene volumes, to produce and output a sequence encoding a video. Corresponding systems and non-transitory computer-readable media are disclosed for encoding a 3D scene and for decoding an encoded 3D scene. Efficient compression, transmission, and playback of video describing a 3D scene can be enabled, including for virtual reality displays that update based on a viewer's changing perspective for variable-perspective playback.
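Steps (a)-(c) might be sketched as follows; the `scene_volume`, `default_camera`, and rendered-feature attributes are assumed interfaces used purely for illustration:

```python
def encode_scene_volume(scene_volume, default_camera, perspective_range, layer_depths):
    """Illustrative pass over steps (a)-(c), with assumed helper objects."""
    # (a) identify features visible within the perspective range around the default camera
    features = [f for f in scene_volume.features
                if default_camera.within_range(f, perspective_range)]
    # (b) convert the identified features into rendered features
    rendered = [default_camera.render(f) for f in features]
    # (c) sort rendered features into scene layers keyed by depth
    layers = [{"depth_map": [], "color_map": [], "transparency_map": []} for _ in layer_depths]
    for r in rendered:
        i = min(range(len(layer_depths)), key=lambda k: abs(layer_depths[k] - r.depth))
        layers[i]["depth_map"].append(r.depth)
        layers[i]["color_map"].append(r.color)
        layers[i]["transparency_map"].append(r.alpha)
    return layers
```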
HAPTIC ATLAS CODING AND DECODING FORMAT
Methods and devices for encoding and decoding a data stream representative of a 3D volumetric scene comprising haptic features associated with objects of the 3D scene are disclosed. At the encoding side, haptic features are associated with objects of the scene, for instance as haptic maps. Haptic components are stored in points of the 3D scene in the same way color components may be. These components are projected onto patch pictures which are packed into atlas images. At the decoding side, haptic components are un-projected onto reconstructed points, as color components may be, according to the depth component of the pixels of the decoded atlases.
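A minimal sketch of treating per-point haptic components like color attributes in an atlas; the array layout, the patch-pixel list, and the `unproject` callback are assumptions for illustration:

```python
import numpy as np

def pack_haptic_patch(atlas: np.ndarray, patch_px, haptic_values, origin):
    """Write per-point haptic components (e.g. stiffness) into a patch region of the
    atlas image, exactly as a color attribute would be written."""
    for (u, v), h in zip(patch_px, haptic_values):
        atlas[origin[1] + v, origin[0] + u] = h
    return atlas

def unpack_haptic_point(haptic_atlas, depth_atlas, u, v, unproject):
    """Decoder side: recover a 3D point from the depth component and attach
    the haptic component stored at the same pixel."""
    point = unproject(u, v, depth_atlas[v, u])
    return point, haptic_atlas[v, u]
```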
Layered scene decomposition codec system and methods
A system and methods are provided for a CODEC driving a real-time light field display for multi-dimensional video streaming, interactive gaming, and other light field display applications, applying a layered scene decomposition strategy. Multi-dimensional scene data is divided into a plurality of data layers of increasing depth as the distance between a given layer and the plane of the display increases. Data layers are sampled using a plenoptic sampling scheme and rendered using hybrid rendering, such as perspective and oblique rendering, to encode light fields corresponding to each data layer. The resulting compressed (layered) core representation of the multi-dimensional scene data is produced at predictable rates, then reconstructed and merged at the light field display in real time by applying view synthesis protocols, including edge-adaptive interpolation, to reconstruct pixel arrays in stages (e.g. columns then rows) from reference elemental images.
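The layering idea of depth slices that grow thicker with distance from the display plane could be sketched as below; the geometric growth factor and helper names are illustrative assumptions, not the patented scheme:

```python
def layer_boundaries(near: float, far: float, n_layers: int, growth: float = 2.0):
    """Depth boundaries for layers whose thickness increases with distance
    from the display plane (geometric growth assumed for illustration)."""
    widths = [growth ** i for i in range(n_layers)]
    scale = (far - near) / sum(widths)
    bounds, d = [], near
    for w in widths:
        bounds.append((d, d + w * scale))
        d += w * scale
    return bounds

def assign_layer(depth: float, bounds):
    """Assign a scene sample to a data layer by its depth."""
    for i, (lo, hi) in enumerate(bounds):
        if lo <= depth <= hi:
            return i
    return len(bounds) - 1
```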