G06V30/194

Learning apparatus, learning method, and non-transitory computer readable storage medium
11521110 · 2022-12-06

According to one aspect of an embodiment, a learning apparatus includes a generating unit that generates a model. The model includes an encoder that encodes input information. The model includes a vector generating unit that generates a vector by applying a predetermined matrix to the information encoded by the encoder. The model includes a decoder that generates information corresponding to the input information from the vector. The learning apparatus includes a learning unit that, when predetermined input information is input to the model, learns the model such that the model outputs output information corresponding to the input information and the predetermined matrix serves as a dictionary matrix of the input information.
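
The encoder, matrix, and decoder pipeline described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the linear layers, the ReLU activation, and in particular the L1 sparsity penalty, which is one common way to encourage a matrix to behave like a dictionary. The abstract does not say which training objective is actually used.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def forward(x, enc_w, dictionary, dec_w):
    """Encode x, apply the matrix to the code, then decode the result."""
    code = [max(0.0, c) for c in matvec(enc_w, x)]  # hypothetical ReLU encoder
    vector = matvec(dictionary, code)               # matrix applied to encoded info
    output = matvec(dec_w, vector)                  # hypothetical linear decoder
    return code, vector, output

def loss(x, output, code, l1_weight=0.1):
    """Reconstruction error plus an assumed L1 sparsity penalty on the code."""
    recon = sum((a - b) ** 2 for a, b in zip(x, output))
    sparsity = sum(abs(c) for c in code)
    return recon + l1_weight * sparsity

identity = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 2.0]
code, vector, output = forward(x, identity, identity, identity)
total = loss(x, output, code)
```

With identity weights the reconstruction term is zero, so the loss reduces to the sparsity term alone.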

Method and system for improved object marking in sensor data
11521375 · 2022-12-06

A method and a system for improved object marking in sensor data, as the result of which an at least partially automated annotation of objects or object classes in a recorded data set is possible. The method provides that a scene is detected in a first state by at least one sensor. An association of a first object marking with at least one object contained in the scene in a first data set containing the scene in the first state then takes place. The similar or matching scene is subsequently detected in a second state that is different from the first state by the at least one sensor, and an at least partial acceptance of the first object marking, contained in the first data set, for the object recognized in the second state of the scene as a second object marking in a second data set takes place.

METHOD FOR DEPTH ESTIMATION FOR A VARIABLE FOCUS CAMERA

The disclosure relates to a method including: capturing a sequence of images of a scene with a camera at different focus positions according to a predetermined focus schedule that specifies a chronological sequence of focus positions of the camera, extracting image features of captured images, after having extracted and stored image features from said captured images, processing a captured image whose image features have not yet been extracted, said processing comprising extracting image features from the currently processed image and storing the extracted image features, said processing further comprising aligning image features stored from the previously captured images with the image features of the currently processed image, and generating a multi-dimensional tensor representing the image features of the processed images aligned to the image features of the currently processed image, and generating a two-dimensional depth map using the focus positions in the predetermined focus schedule and the generated multi-dimensional tensor.
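
As a rough illustration of how focus positions can be turned into a depth map, the sketch below uses classical depth from focus rather than the learned, aligned image features the abstract describes. The local-contrast sharpness measure, the function names, and the toy focal stack are all assumptions, and the frames are assumed to be already aligned.

```python
def sharpness(img):
    """Per-pixel local contrast: sum of absolute differences to 4-neighbors."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    out[y][x] += abs(img[y][x] - img[ny][nx])
    return out

def depth_from_focus(stack, focus_positions):
    """Assign each pixel the focus position whose image is sharpest there."""
    maps = [sharpness(img) for img in stack]
    h, w = len(stack[0]), len(stack[0][0])
    depth = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = max(range(len(stack)), key=lambda k: maps[k][y][x])
            depth[y][x] = focus_positions[best]
    return depth

# Toy focal stack: the first image is sharp on the left, the second on the right.
stack = [
    [[0.0, 1.0, 0.5, 0.5]],
    [[0.5, 0.5, 0.0, 1.0]],
]
focus_schedule = [0.3, 1.2]  # hypothetical focus positions, one per image
depth = depth_from_focus(stack, focus_schedule)
```

The per-pixel argmax over the stack plays the role that the multi-dimensional tensor and the depth-map generation step play in the described method, in a much cruder form.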

MODEL COMPRESSION USING CYCLE GENERATIVE ADVERSARIAL NETWORK KNOWLEDGE DISTILLATION
20220383044 · 2022-12-01

Systems and processes for prediction using generative adversarial network and distillation technology are provided. For example, an input is received at a first portion of a language model. A first output distribution is obtained, based on the input, from the first portion of the language model. Using a first training model, the language model is adjusted based on the first output distribution. The first output distribution is received at a second portion of the language model. A first representation of the input is obtained, based on the first output distribution, from the second portion of the language model. The language model is adjusted, using a second training model, based on the first representation of the input. Using the adjusted language model, an output is provided based on a received user input.

Method and apparatus for building image model

A method and apparatus for building an image model, where the apparatus generates a target image model that includes layers duplicated from layers of a reference image model and an additional layer, and trains the additional layer.
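
A minimal sketch of this idea follows: copy the reference layers, append a new layer, and update only the new layer during training. The `Layer` class, the layer names, and the decision to freeze the duplicated layers are assumptions for illustration; the abstract states only that the additional layer is trained.

```python
import copy

class Layer:
    """Hypothetical layer: a name, a weight list, and a trainable flag."""
    def __init__(self, name, weights, trainable=True):
        self.name = name
        self.weights = weights
        self.trainable = trainable

def build_target_model(reference, extra):
    """Duplicate the reference layers (frozen here) and append one new layer."""
    target = [copy.deepcopy(layer) for layer in reference]
    for layer in target:
        layer.trainable = False
    target.append(extra)
    return target

def sgd_step(model, grads, lr=0.1):
    """Apply one gradient step to trainable layers only."""
    for layer in model:
        if layer.trainable:
            g = grads[layer.name]
            layer.weights = [w - lr * gi for w, gi in zip(layer.weights, g)]

reference = [Layer("conv1", [0.5, -0.2]), Layer("conv2", [0.1, 0.3])]
target = build_target_model(reference, Layer("added", [0.0, 0.0]))
sgd_step(target, {"conv1": [1.0, 1.0], "conv2": [1.0, 1.0], "added": [1.0, -1.0]})
```

After the step, only the weights of the `added` layer have changed; the duplicated layers and the reference model are untouched.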

Model-based image labeling and/or segmentation
11514693 · 2022-11-29

In some embodiments, reduction of computational resource usage related to image labeling and/or segmentation may be facilitated. In some embodiments, a collection of images may be used to train one or more prediction models. Based on a presentation of an image on a user interface, an indication of a target quantity of superpixels for the image may be obtained. The image may be provided to a first prediction model to cause the prediction model to predict a quantity of superpixels for the image. The target quantity of superpixels may be provided to the first model to update the first model's configurations based on (i) the predicted quantity and (ii) the target quantity. A set of superpixels may be generated for the image based on the target quantity, and segmentation information related to the superpixels set may be provided to a second prediction model to update the second model's configurations.

Deep learning based on image encoding and decoding
11593632 · 2023-02-28

A deep learning based compression (DLBC) system trains multiple models that, when deployed, generate a compressed binary encoding of an input image that achieves a reconstruction quality and a target compression ratio. The applied models effectively identify structures of an input image, quantize the input image to a target bit precision, and compress the binary code of the input image via adaptive arithmetic coding to a target codelength. During training, the DLBC system reconstructs the input image from the compressed binary encoding and determines the loss in quality from the encoding process. Thus, the models can be continually trained to, when applied to an input image, minimize the loss in reconstruction quality that arises due to the encoding process while also achieving the target compression ratio.
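
The trade-off between bit precision and reconstruction quality can be illustrated with a small sketch. It uses plain uniform scalar quantization as a stand-in for the learned models and omits the arithmetic coding stage entirely; the function names and the sample pixel values are assumptions.

```python
def quantize(values, bits):
    """Map floats in [0, 1] to integer codes at the given bit precision."""
    levels = (1 << bits) - 1
    return [round(min(max(v, 0.0), 1.0) * levels) for v in values]

def dequantize(codes, bits):
    """Map integer codes back to floats in [0, 1]."""
    levels = (1 << bits) - 1
    return [c / levels for c in codes]

def mse(a, b):
    """Mean squared error, used here as the reconstruction loss."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

pixels = [0.05, 0.33, 0.71, 0.98]  # hypothetical normalized pixel values
loss_2bit = mse(pixels, dequantize(quantize(pixels, 2), 2))
loss_6bit = mse(pixels, dequantize(quantize(pixels, 6), 6))
```

Here `loss_6bit` comes out smaller than `loss_2bit`: fewer bits give a shorter code but a larger reconstruction loss, which is the balance the training described above aims to optimize.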

Systems and methods for generating a video summary

Systems and methods of generating video summaries are presented herein. Information defining a video may be obtained. The video may include a set of frame images. Parameter values for parameters of individual frame images of the video may be determined. Interest weights for the frame images may be determined. An interest curve for the video that characterizes the video by interest weights as a function of progress through the set of frame images may be generated. One or more curve attributes of the interest curve may be identified and one or more interest curve values of the interest curve that correspond to individual curve attributes may be determined. Interest curve values of the interest curve may be compared to threshold curve values. A subset of frame images of the video to include within a video summary of the video may be identified based on the comparison.
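
The comparison of interest curve values against a threshold can be sketched as follows. The moving-average smoothing, the single fixed threshold, and the sample weights are assumptions for illustration; the abstract does not specify how the curve is built or how curve attributes are chosen.

```python
from typing import List

def interest_curve(weights: List[float], window: int = 3) -> List[float]:
    """Smooth per-frame interest weights with a centered moving average."""
    half = window // 2
    curve = []
    for i in range(len(weights)):
        lo = max(0, i - half)
        hi = min(len(weights), i + half + 1)
        curve.append(sum(weights[lo:hi]) / (hi - lo))
    return curve

def select_frames(curve: List[float], threshold: float) -> List[int]:
    """Return indices of frames whose curve value meets the threshold."""
    return [i for i, v in enumerate(curve) if v >= threshold]

weights = [0.1, 0.2, 0.9, 0.8, 0.1, 0.0]  # hypothetical per-frame interest weights
curve = interest_curve(weights)
summary = select_frames(curve, threshold=0.5)  # frames 2 and 3
```

Smoothing before thresholding keeps a single noisy frame from entering the summary on its own; a real system would likely also use curve attributes such as peaks or slopes.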

Learning apparatus, operation program of learning apparatus, and operation method of learning apparatus
11594056 · 2023-02-28

A learning apparatus learns a machine learning model for performing semantic segmentation of determining a plurality of classes in an input image in units of pixels by extracting, for each layer, features which are included in the input image and have different frequency bands of spatial frequencies. A learning data analysis unit analyzes the frequency bands included in an annotation image of learning data. A learning method determination unit determines a learning method using the learning data based on an analysis result of the frequency bands by the learning data analysis unit. A learning unit learns the machine learning model via the determined learning method using the learning data.