G06V10/803

Multi-modal machine learning architectures integrating language models and computer vision systems
11803710 · 2023-10-31

Improved multi-modal machine learning networks integrate computer vision systems with language models. In certain embodiments, a computer vision system analyzes at least one image to generate a computer vision output. The language model generates an output based, at least in part, on a consideration of the computer vision output. The outputs of the language model can be generated by jointly considering textual information learned by the language model and visual content extracted by the computer vision system, thereby significantly improving the accuracy, breadth, and comprehensiveness of the outputs.
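As a rough illustration, the pipeline can be sketched as a computer vision stage whose output conditions the language model's input. Both `detect_objects` and `language_model` below are hypothetical stand-ins for real components, not any system named in the filing:

```python
# Minimal sketch of the multi-modal pipeline: CV output is injected
# into the language model's prompt so the LM output jointly considers
# textual and visual content. Both components are toy stand-ins.

def detect_objects(image):
    # Stand-in CV system: returns labels it "sees" in the image.
    return ["dog", "frisbee", "park"]

def language_model(prompt):
    # Stand-in language model: echoes the context it was conditioned on.
    return f"Answer conditioned on: {prompt}"

def multimodal_answer(image, question):
    # Jointly consider visual content (CV output) and textual input.
    visual_context = ", ".join(detect_objects(image))
    prompt = f"Image shows: {visual_context}. Question: {question}"
    return language_model(prompt)
```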

Data fusion system for a vehicle equipped with unsynchronized perception sensors

A sensor data fusion system for a vehicle with multiple sensors includes a first-sensor, a second-sensor, and a controller-circuit. The first-sensor is configured to output a first-frame of data and a subsequent-frame of data indicative of objects present in a first-field-of-view. The first-frame is characterized by a first-time-stamp, and the subsequent-frame of data is characterized by a subsequent-time-stamp different from the first-time-stamp. The second-sensor is configured to output a second-frame of data indicative of objects present in a second-field-of-view that overlaps the first-field-of-view. The second-frame is characterized by a second-time-stamp temporally located between the first-time-stamp and the subsequent-time-stamp. The controller-circuit is configured to synthesize an interpolated-frame from the first-frame and the subsequent-frame. The interpolated-frame is characterized by an interpolated-time-stamp that corresponds to the second-time-stamp. The controller-circuit fuses the interpolated-frame with the second-frame to provide a fused-frame of data characterized by the interpolated-time-stamp, and operates the vehicle in accordance with the fused-frame.
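The interpolate-then-fuse step can be sketched as follows. The linear interpolation of object positions and the averaging fusion rule are assumptions; the abstract does not fix either method:

```python
def interpolate_frame(frame_a, t_a, frame_b, t_b, t_target):
    """Synthesize an interpolated-frame whose time-stamp matches t_target.

    Frames are dicts mapping object-id -> (x, y) position. Linear
    interpolation is an assumed method, chosen here for illustration."""
    w = (t_target - t_a) / (t_b - t_a)
    interp = {}
    for obj_id in frame_a.keys() & frame_b.keys():
        xa, ya = frame_a[obj_id]
        xb, yb = frame_b[obj_id]
        interp[obj_id] = (xa + w * (xb - xa), ya + w * (yb - ya))
    return interp

def fuse(frame_1, frame_2):
    # Naive fusion rule: average positions of objects seen by both
    # sensors; keep objects seen by only one sensor as-is.
    fused = dict(frame_1)
    for obj_id, (x2, y2) in frame_2.items():
        if obj_id in fused:
            x1, y1 = fused[obj_id]
            fused[obj_id] = ((x1 + x2) / 2, (y1 + y2) / 2)
        else:
            fused[obj_id] = (x2, y2)
    return fused
```

With a first-frame at t=0 and a subsequent-frame at t=1, interpolating to the second sensor's t=0.5 places a tracked object halfway between its two observed positions before fusion.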

Computational model for analyzing images of a biological specimen

A method of analyzing images of a biological specimen using a computational model is described, the method including processing a cell image of the biological specimen and a phase contrast image of the biological specimen using the computational model to generate output data. The cell image is a composite of a first brightfield image of the biological specimen at a first focal plane and a second brightfield image of the biological specimen at a second focal plane. The method also includes performing a comparison of the output data and reference data and refining the computational model based on the comparison. The method also includes thereafter processing additional image pairs with the computational model to further refine it based on comparisons of additional output data generated by the model to additional reference data.
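A toy sketch of the compositing and refinement loop, with a single scalar weight standing in for the computational model and pixel-wise averaging as an assumed compositing rule (both are simplifications for illustration only):

```python
def composite_cell_image(brightfield_1, brightfield_2):
    # Composite of two brightfield images taken at different focal
    # planes; pixel-wise averaging is an assumed compositing rule.
    return [(a + b) / 2 for a, b in zip(brightfield_1, brightfield_2)]

def refine(model_weight, cell_image, phase_contrast, reference):
    # One refinement step: generate output data from the cell image and
    # phase contrast image, compare against reference data, and nudge
    # the "model" (a single scalar here, purely illustrative).
    output = [model_weight * (c + p) for c, p in zip(cell_image, phase_contrast)]
    error = sum(o - r for o, r in zip(output, reference)) / len(reference)
    return model_weight - 0.1 * error
```

Repeating `refine` over additional image pairs mirrors the continued-refinement step described above.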

SURROGATE MODEL FOR TIME-SERIES MODEL INTERPRETATION
20230342671 · 2023-10-26

Provided are a system and method that build a composite time-series machine learning model including a core model and a debrief model that combines the core model with a surrogate model. In one example, the method may include executing a plurality of models on test data and determining accuracy values and interpretability toughness values for the plurality of models, selecting a most accurate model as the core model based on the accuracy values and selecting a most interpretable model as the surrogate model from among the remaining models based on the interpretability toughness values, building a composite model comprising the core model, the surrogate model, and instructions for generating a debrief model for debriefing the core model based on a combination of the core model and the surrogate model, and storing the composite model in memory.
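The core/surrogate selection step can be sketched directly from the description. The model names and score values below are illustrative, not from the filing:

```python
def build_composite(models, accuracy, interpretability):
    """Select the most accurate model as the core model, then the most
    interpretable of the remaining models as the surrogate model.

    models: list of model names; accuracy and interpretability map each
    name to a score (higher is better)."""
    core = max(models, key=lambda m: accuracy[m])
    remaining = [m for m in models if m != core]
    surrogate = max(remaining, key=lambda m: interpretability[m])
    return {"core": core, "surrogate": surrogate}
```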

Image fusion for autonomous vehicle operation
11823460 · 2023-11-21

Devices, systems, and methods for fusing scenes from real-time image feeds from on-vehicle cameras in autonomous vehicles, reducing the redundancy of the information processed to enable real-time autonomous operation, are described. One example of a method for improving perception in an autonomous vehicle includes receiving a plurality of cropped images, wherein each of the plurality of cropped images comprises one or more bounding boxes that correspond to one or more objects in a corresponding cropped image; identifying, based on metadata in the plurality of cropped images, a first bounding box in a first cropped image and a second bounding box in a second cropped image, wherein the first and second bounding boxes correspond to a common object; and fusing the metadata corresponding to the common object from the first cropped image and the second cropped image to generate an output result for the common object.
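The matching-and-fusing step could be sketched with an intersection-over-union test (an assumed matching criterion; the abstract only says common objects are identified from metadata) and averaging as the fusion rule:

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) in a shared coordinate frame.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def fuse_common_object(box_a, box_b, threshold=0.5):
    # If the two boxes overlap enough, treat them as the same object
    # and fuse their metadata (here just the boxes) by averaging.
    if iou(box_a, box_b) < threshold:
        return None
    return tuple((a + b) / 2 for a, b in zip(box_a, box_b))
```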

SYSTEMS AND METHODS FOR AUTOMATED DATA PROCESSING USING MACHINE LEARNING FOR VEHICLE LOSS DETECTION

A data processing method comprising: inputting a tiled image of a vehicle, including four different angle views of the vehicle combined into a single image, to a first machine learning model (e.g., a CNN) trained on historical image data to predict a first likelihood of total loss of the vehicle; inputting a multi-fusion of images into a second set of machine learning models, the multi-fusion of images including a set of separate and distinct images for each of the views input separately into the second set of machine learning models, and extracting features to predict a second likelihood of total loss; inputting tabular data relating to the vehicle into a third machine learning model to predict a third likelihood of total loss; and aggregating the first, second, and third likelihoods to determine the overall likelihood of total loss of the vehicle.
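The final aggregation step might look like the following; the equal-weight average is an assumption, since the filing only states that the three likelihoods are aggregated:

```python
def overall_total_loss(p_tiled, p_multiview, p_tabular,
                       weights=(1 / 3, 1 / 3, 1 / 3)):
    """Aggregate the per-model total-loss likelihoods from the tiled-image
    model, the multi-view model set, and the tabular model.

    A weighted average is an assumed aggregation rule."""
    probs = (p_tiled, p_multiview, p_tabular)
    return sum(w * p for w, p in zip(weights, probs))
```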

ADJUSTING MENTAL STATE TO IMPROVE TASK PERFORMANCE
20230252315 · 2023-08-10

A method of adjusting mental state includes acquiring video data of an individual, extracting image data and audio data from the video data, extracting semantic text data from the audio data, identifying a first set of features, predicting a baseline mental state, identifying a target mental state, and simulating a predicted path from the baseline mental state to the target mental state. The baseline mental state is predicted based on the first set of features. The predicted path is simulated using a multidimensional mental state model, a plurality of actions, and a first computer-implemented machine learning model. The predicted path comprises one or more actions of the plurality of actions and corresponding changes to at least one of first and second dimensions of the multidimensional mental state model. An indication of the one or more actions of the predicted path is output to the individual.
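The path simulation can be sketched as a greedy search over a two-dimensional mental state model. The (valence, arousal) dimensions, the action set, and the action effects below are illustrative assumptions, not the patent's model:

```python
def plan_path(baseline, target, actions, max_steps=10):
    """Greedy sketch of simulating a predicted path from a baseline
    mental state to a target mental state: at each step, take the action
    whose predicted effect moves the state closest to the target.

    States and effects are (valence, arousal) pairs; `actions` maps an
    action name to its assumed effect on the state."""
    def dist(s):
        return ((s[0] - target[0]) ** 2 + (s[1] - target[1]) ** 2) ** 0.5

    state, path = baseline, []
    for _ in range(max_steps):
        name, effect = min(
            actions.items(),
            key=lambda kv: dist((state[0] + kv[1][0], state[1] + kv[1][1])))
        nxt = (state[0] + effect[0], state[1] + effect[1])
        if dist(nxt) >= dist(state):
            break  # no action improves the state further
        path.append(name)
        state = nxt
    return path, state
```

The returned `path` corresponds to the indicated actions that would be output to the individual.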

Three-dimensional object reconstruction method and apparatus

A three-dimensional object reconstruction method, applied to a terminal device or a server, is provided. The method includes obtaining a plurality of video frames of an object; determining three-dimensional location information of key points of the object in the plurality of video frames and physical meaning information of the key points, the physical meaning information indicating respective positions of the object; determining a correspondence between the key points having the same physical meaning information in the plurality of video frames; and generating a three-dimensional object according to the correspondence and the three-dimensional location information of the key points.
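The correspondence step, grouping key points across video frames by their physical-meaning labels, can be sketched as follows (the label names are illustrative):

```python
def keypoint_correspondence(frames):
    """Group 3-D key points across video frames by physical-meaning
    label (e.g. "nose", "left_knee"). Each frame is a dict mapping a
    label to an (x, y, z) location; the result maps each label to its
    (frame_index, location) observations across frames."""
    correspondence = {}
    for idx, frame in enumerate(frames):
        for label, xyz in frame.items():
            correspondence.setdefault(label, []).append((idx, xyz))
    return correspondence
```

Each entry then ties together the observations of one physical point, which a reconstruction step can combine with the 3-D location information to generate the object.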

System and method to fuse multiple sources of optical data to generate a high-resolution, frequent and cloud-/gap-free surface reflectance product

Aspects of the subject disclosure may include, for example, performing, by a processing system, image fusion using two or more groups of images to generate predicted images, wherein each group of the two or more groups has one of a different resolution, a different frequency temporal pattern or a combination thereof than another of the two or more groups. Gap filling can be performed by the processing system to correct images of the two or more groups. Additional embodiments are disclosed.
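The gap-filling step can be sketched as pixel-wise substitution from a temporally adjacent reference image, one plausible (assumed) correction rule for cloud- or gap-affected pixels:

```python
def gap_fill(target, reference):
    """Fill cloud/gap pixels (marked None) in a surface reflectance
    image using the corresponding pixels of a reference image from
    another group (e.g. a lower-resolution but more frequent source).
    Direct substitution is an assumed filling rule."""
    return [r if t is None else t for t, r in zip(target, reference)]
```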

Utilizing computer vision and machine learning models for determining utilization metrics for a space

In some implementations, a device may receive image data identifying images of a space with racks and objects stored on the racks. The device may receive location data identifying location coordinates associated with the images. The device may process the image data and the location data to generate a merged point cloud identifying the racks and the objects in the space. The device may process the image data to generate mask data identifying at least a first mask for the racks and a second mask for the objects. The device may process the location data, the merged point cloud, and the mask data to generate a semantic point cloud identifying the racks and the objects in the space. The device may process the semantic point cloud, with a computer vision model, to calculate utilization metrics for the space.
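One plausible (assumed) reading of the utilization metric is the ratio of object-labeled points to rack-labeled points in the semantic point cloud:

```python
def utilization_metric(semantic_points):
    """Sketch of a space-utilization metric over a semantic point cloud.

    semantic_points: list of (x, y, z, label) tuples with label "rack"
    or "object". Using point counts as a proxy for occupied capacity is
    an assumption for illustration."""
    racks = sum(1 for *_, label in semantic_points if label == "rack")
    objects = sum(1 for *_, label in semantic_points if label == "object")
    return objects / racks if racks else 0.0
```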