G06V10/803

Three-dimensional object detection

Generally, the disclosed systems and methods implement improved detection of objects in three-dimensional (3D) space. More particularly, an improved 3D object detection system can exploit continuous fusion of multiple sensors and/or integrated geographic prior map data to enhance the effectiveness and robustness of object detection in applications such as autonomous driving. In some implementations, geographic prior data (e.g., geometric ground and/or semantic road features) can be exploited to enhance three-dimensional object detection for autonomous vehicle applications. In some implementations, object detection systems and methods can be improved based on dynamic utilization of multiple sensor modalities. More particularly, an improved 3D object detection system can exploit both LIDAR systems and cameras to perform highly accurate localization of objects in three-dimensional space relative to an autonomous vehicle. For example, multi-sensor fusion can be implemented via continuous convolutions to fuse image data samples and LIDAR feature maps at different levels of resolution.
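The fusion idea above can be sketched numerically: for each cell of a LIDAR bird's-eye-view (BEV) grid, gather nearby camera feature samples in 3D and combine them with their geometric offsets. Everything below is an illustrative assumption (shapes, the k-nearest-neighbor gather, the single random linear map standing in for a learned MLP), not the patented architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: 50 image feature samples, each with a 3D location
# and a 16-dim feature vector extracted by a camera backbone.
img_xyz  = rng.uniform(0, 10, size=(50, 3))
img_feat = rng.normal(size=(50, 16))

# Target BEV grid: 8x8 cells, each with a 3D center on the ground plane.
xs, ys = np.meshgrid(np.arange(8) + 0.5, np.arange(8) + 0.5)
bev_xyz = np.stack([xs.ravel() * 1.25, ys.ravel() * 1.25,
                    np.zeros(64)], axis=1)                    # (64, 3)

K = 3  # image samples fused per BEV cell

# For every BEV cell, find its K nearest image samples in 3D.
d2  = ((bev_xyz[:, None, :] - img_xyz[None, :, :]) ** 2).sum(-1)  # (64, 50)
nbr = np.argsort(d2, axis=1)[:, :K]                               # (64, K)

# Continuous-fusion step: concatenate neighbor features with their
# continuous geometric offsets, then project through a (random,
# untrained) linear map followed by a ReLU, and sum over neighbors.
offsets = img_xyz[nbr] - bev_xyz[:, None, :]            # (64, K, 3)
stacked = np.concatenate([img_feat[nbr], offsets], -1)  # (64, K, 19)
W = rng.normal(size=(19, 16)) / np.sqrt(19)
fused = np.maximum(stacked @ W, 0).sum(axis=1)          # (64, 16) BEV features

print(fused.shape)  # (64, 16)
```

The offsets are what make the operation "continuous": the image samples do not lie on the BEV grid, so the fusion weight depends on each sample's real-valued displacement from the cell center.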

Continuous Convolution and Fusion in Neural Networks

Systems and methods are provided for machine-learned models including convolutional neural networks that generate predictions using continuous convolution techniques. For example, the systems and methods of the present disclosure can be included in or otherwise leveraged by an autonomous vehicle. In one example, a computing system can perform, with a machine-learned convolutional neural network, one or more convolutions over input data using a continuous filter relative to a support domain associated with the input data, and receive a prediction from the machine-learned convolutional neural network. In some examples, the machine-learned convolutional neural network includes at least one continuous convolution layer configured to perform convolutions over input data with a parametric continuous kernel.
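A parametric continuous kernel can be pictured as a small learned function g(Δx) of the real-valued offset between a point and its supports, rather than a fixed grid of weights. The toy below assumes numpy only; the point counts, dimensions, and the random two-layer map standing in for a trained kernel MLP are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

N, F_in, F_out = 30, 8, 4
pts   = rng.uniform(size=(N, 2))      # support domain: 2D point locations
feats = rng.normal(size=(N, F_in))    # input features at those points

# g: R^2 -> R^(F_in*F_out), the parametric continuous kernel
# (random weights stand in for learned parameters).
W1 = rng.normal(size=(2, 32))
W2 = rng.normal(size=(32, F_in * F_out))
def g(delta):
    return (np.maximum(delta @ W1, 0) @ W2).reshape(-1, F_in, F_out)

K = 5
d2  = ((pts[:, None] - pts[None]) ** 2).sum(-1)
nbr = np.argsort(d2, axis=1)[:, :K]   # each point's K nearest supports

out = np.empty((N, F_out))
for i in range(N):
    delta = pts[nbr[i]] - pts[i]      # continuous offsets, shape (K, 2)
    Wk = g(delta)                     # per-offset kernels, (K, F_in, F_out)
    out[i] = np.einsum('kf,kfo->o', feats[nbr[i]], Wk)

print(out.shape)  # (30, 4)
```

Because the kernel is evaluated at arbitrary offsets, the same layer applies to irregular data such as point clouds, where no fixed-grid convolution is defined.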

INTERSECTION DETECTION AND CLASSIFICATION IN AUTONOMOUS MACHINE APPLICATIONS

In various examples, live perception from sensors of a vehicle may be leveraged to detect and classify intersections in the vehicle's environment in real time or near real time. For example, a deep neural network (DNN) may be trained to compute various outputs, such as bounding box coordinates for intersections, intersection coverage maps corresponding to the bounding boxes, intersection attributes, distances to intersections, and/or distance coverage maps associated with the intersections. The outputs may be decoded and/or post-processed to determine final locations of, distances to, and/or attributes of the detected intersections.
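A hypothetical decode step for outputs of this shape might look as follows: threshold the coverage map to find cells voting for an intersection, then aggregate the per-cell box and distance regressions over those cells. The grid size, threshold, and encoding are assumptions for illustration, not the method of the application.

```python
import numpy as np

H = W = 6
coverage = np.zeros((H, W))
coverage[2:4, 2:5] = 0.9                 # cells the DNN marks as intersection
boxes = np.zeros((H, W, 4))
boxes[2:4, 2:5] = [1.0, 2.0, 5.0, 4.0]   # per-cell regressed box (x1, y1, x2, y2)
dists = np.zeros((H, W))
dists[2:4, 2:5] = 12.5                   # per-cell regressed distance (m)

mask = coverage > 0.5                    # post-process: threshold coverage
final_box  = boxes[mask].mean(axis=0)    # aggregate box votes into one box
final_dist = float(dists[mask].mean())   # aggregate distance estimate

print(final_box, final_dist)  # [1. 2. 5. 4.] 12.5
```

Averaging over all covered cells makes the decoded box robust to a few noisy per-cell regressions; a coverage-weighted mean would be a natural variant.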

GENERATING SYNTHETIC GROUND-LEVEL DATA BASED ON GENERATOR CONDITIONED FOR PARTICULAR AGRICULTURAL AREA
20230169764 · 2023-06-01

Implementations are described herein for conditioning a generator machine learning model to generate synthetic ground-level data that is biased towards a given agricultural area based on high-elevation images. In various implementations, a plurality of ground-level images may be accessed that depict crops within a specific agricultural area. A first set of high-elevation image(s) may also be accessed that depict the specific agricultural area. The ground-level images and the first set of high-elevation image(s) may be used to condition an air-to-ground generator machine learning model to generate synthetic ground-level data from high-elevation imagery depicting the specific agricultural area. A second set of high-elevation image(s) that depict a specific sub-region of the specific agricultural area may then be accessed and processed using the air-to-ground generator machine learning model to generate synthetic ground-level data that infers one or more conditions of the specific sub-region of the specific agricultural area.
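The two-phase workflow above can be sketched with stand-in functions: condition a model on paired ground-level and high-elevation imagery for the area, then apply it to a new sub-region tile. The function names are hypothetical and the "generator" is a trivial least-squares map so the example runs; a real implementation would train a generative model.

```python
import numpy as np

rng = np.random.default_rng(2)

def condition_generator(ground_imgs, high_elev_imgs):
    """Phase 1: 'condition' a generator on paired imagery for the area.
    Returns a toy weight matrix mapping aerial pixels to ground pixels."""
    A = high_elev_imgs.reshape(len(high_elev_imgs), -1)
    G = ground_imgs.reshape(len(ground_imgs), -1)
    W, *_ = np.linalg.lstsq(A, G, rcond=None)  # least-squares stand-in
    return W

def generate_ground_level(model_W, sub_region_img):
    """Phase 2: synthesize ground-level data for an unseen sub-region."""
    return (sub_region_img.reshape(1, -1) @ model_W).reshape(4, 4)

ground = rng.uniform(size=(20, 4, 4))   # ground-level images of the area
aerial = rng.uniform(size=(20, 4, 4))   # matching high-elevation tiles
W = condition_generator(ground, aerial)

new_tile  = rng.uniform(size=(4, 4))    # high-elevation tile of a sub-region
synthetic = generate_ground_level(W, new_tile)
print(synthetic.shape)  # (4, 4)
```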

Selective amplification of speaker of interest
11496842 · 2022-11-08

A system may include a camera configured to capture images from an environment of a user and a microphone configured to capture sounds from the environment of the user. The system may also include a processor programmed to: receive the images; identify a representation of a first individual and a representation of a second individual in the images; receive, from the microphone, a first audio signal associated with a voice of the first individual and a second audio signal associated with a voice of the second individual; detect an amplification criterion indicative of a voice amplification priority between the first individual and the second individual; selectively amplify the first audio signal relative to the second audio signal when the amplification criterion indicates that the first individual has voice amplification priority over the second individual; and cause transmission of the selectively amplified first audio signal to a hearing interface device.
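The amplification step itself reduces to a gain applied to one of two separated signals. A minimal sketch, assuming two mono signals and a boolean priority decision made upstream (e.g., from look direction in the images); the gain value is arbitrary.

```python
import numpy as np

def selectively_amplify(sig_a, sig_b, a_has_priority, gain=4.0):
    """Boost the prioritized speaker and mix in the other at unit gain."""
    if a_has_priority:
        return gain * sig_a + sig_b
    return sig_a + gain * sig_b

t = np.linspace(0, 1, 8000)
voice_a = 0.1 * np.sin(2 * np.pi * 220 * t)   # first individual
voice_b = 0.1 * np.sin(2 * np.pi * 330 * t)   # second individual

# First individual has voice amplification priority.
mixed = selectively_amplify(voice_a, voice_b, a_has_priority=True)
print(mixed.shape)  # (8000,)
```

The transmitted mix is dominated by the prioritized speaker's 220 Hz component while the other voice remains audible at unit gain.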

Information processing device, information processing method, program, recording medium, and camera system
11494942 · 2022-11-08

An information processing device includes: an acquisition unit that acquires feature information of a target depicted in images; a storage unit that stores registration information containing feature information of registered targets; and a distinction unit that distinguishes, on the basis of a result of comparison between the feature information acquired by the acquisition unit and the feature information contained in the registration information, one registered target of the registered targets, the one registered target corresponding to the target in the images. The registration information contains zip codes of sites relating to the registered targets. The distinction unit compares a zip code of a site relating to the target in the images with the zip codes contained in the registration information, and distinguishes the one registered target corresponding to the target in the images using both the result of the feature information comparison and the result of the zip code comparison.
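A toy version of this matching rule: candidates are scored on feature similarity, and zip-code agreement tips the score toward the geographically consistent entry. The registry entries, the cosine-similarity choice, and the 0.5 bonus are assumptions for illustration.

```python
import numpy as np

registered = [
    {"name": "target_1", "feat": np.array([1.0, 0.0]), "zip": "10115"},
    {"name": "target_2", "feat": np.array([0.9, 0.1]), "zip": "80331"},
]

query_feat, query_zip = np.array([0.95, 0.05]), "80331"

def match(query_feat, query_zip, registry):
    """Return the registered target best matching the query."""
    best, best_score = None, -1.0
    for entry in registry:
        # Cosine similarity of feature vectors.
        sim = float(query_feat @ entry["feat"] /
                    (np.linalg.norm(query_feat) * np.linalg.norm(entry["feat"])))
        if entry["zip"] == query_zip:
            sim += 0.5  # zip-code agreement breaks near-ties
        if sim > best_score:
            best, best_score = entry["name"], sim
    return best

print(match(query_feat, query_zip, registered))  # target_2
```

On features alone the two candidates are nearly indistinguishable (both cosine scores exceed 0.998); the zip code resolves the ambiguity.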

Aligning and blending image data from multiple image sensors

Described are systems and methods for generating high dynamic range (“HDR”) images based on image data obtained from different image sensors for use in detecting events and monitoring inventory within a materials handling facility. The different image sensors may be aligned and calibrated and the image data from the sensors may be generated at approximately the same time but at different exposures. The image data may then be preprocessed, matched, aligned, and blended to produce an HDR image that does not include overexposed regions or underexposed regions.
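The blending step can be sketched as a per-pixel weighted average in which each pre-aligned frame contributes most where it is well exposed. The two tiny grayscale frames and the Gaussian weighting function (peaking at mid-gray 0.5) are illustrative assumptions, not the facility's actual pipeline.

```python
import numpy as np

short_exp = np.array([[0.02, 0.10], [0.45, 0.60]])  # darker frame, in [0, 1]
long_exp  = np.array([[0.30, 0.55], [0.98, 1.00]])  # brighter frame, in [0, 1]

def well_exposedness(img):
    """Gaussian weight favoring mid-gray pixels over clipped ones."""
    return np.exp(-((img - 0.5) ** 2) / (2 * 0.2 ** 2))

w_s, w_l = well_exposedness(short_exp), well_exposedness(long_exp)
hdr = (w_s * short_exp + w_l * long_exp) / (w_s + w_l)

# The saturated long-exposure pixel (1.00) is pulled toward the
# short-exposure value, so the blend avoids a blown-out region.
print(hdr)
```

A production pipeline would perform this blend per channel after the preprocessing, matching, and alignment steps described above, often on a multi-scale pyramid to hide seams.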

Apparatus and methodology of road condition classification using sensor data

Methods and systems are provided for controlling a vehicle action based on a condition of a road on which a vehicle is travelling, including: obtaining first sensor data as to a surface of the road from one or more first sensors onboard the vehicle; obtaining second sensor data from one or more second sensors onboard the vehicle as to a measured parameter pertaining to operation of the vehicle or conditions pertaining thereto; generating a plurality of road surface channel images from the first sensor data, wherein each road surface channel image captures one of a plurality of facets of properties of the first sensor data; classifying, via a processor using a neural network model, the condition of the road on which the vehicle is travelling, based on the measured parameter and the plurality of road surface channel images; and controlling a vehicle action based on the classification of the condition of the road.
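The "road surface channel images" can be pictured as several derived views of one raw surface scan, each capturing a different facet of its properties. Below, the three channels (height, gradient magnitude, local variance) and the small numpy grid are illustrative choices, not the claimed facets.

```python
import numpy as np

rng = np.random.default_rng(3)
surface = rng.normal(0.0, 0.01, size=(16, 16))  # fake surface height scan
surface[8:, :] += 0.05                          # a step change, e.g. an ice patch

# Channel 2: gradient magnitude highlights the boundary of the patch.
gy, gx = np.gradient(surface)
grad_mag = np.hypot(gx, gy)

# Channel 3: 3x3 local variance via a crude sliding-window box filter.
pad = np.pad(surface, 1, mode="edge")
win = np.stack([pad[i:i + 16, j:j + 16] for i in range(3) for j in range(3)])
local_var = win.var(axis=0)

# Stacked channels form the image input to the classification network,
# alongside the separately measured vehicle-operation parameters.
channels = np.stack([surface, grad_mag, local_var])
print(channels.shape)  # (3, 16, 16)
```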

Object detection method, electronic apparatus and object detection system
11263785 · 2022-03-01

An object detection method, an electronic apparatus and an object detection system are provided. The method is adapted to the electronic apparatus and includes the following steps. A first image is obtained. A geometric transformation operation is performed on the first image to obtain at least one second image. The first image and the at least one second image are combined to generate a combination image. The combination image including the first image and the at least one second image is inputted into a trained deep learning model to detect a target object.
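The steps above can be sketched in a few lines, assuming numpy: the second image is a horizontal flip of the first (one possible geometric transformation), and the two are tiled into a single combination image so one detector pass sees both views. The "model" here is a placeholder threshold detector, not a trained deep learning model.

```python
import numpy as np

first = np.zeros((4, 4))
first[1, 3] = 1.0                   # bright target at the right edge
second = first[:, ::-1]             # geometric transformation: horizontal flip

combo = np.concatenate([first, second], axis=1)  # (4, 8) combination image

def toy_detector(img, thresh=0.5):
    """Stand-in for the trained deep learning model: returns hit coords."""
    return list(zip(*np.nonzero(img > thresh)))

hits = toy_detector(combo)
print(hits)  # [(1, 3), (1, 4)] -- the target is found in both views
```

Detections from the transformed half can be mapped back through the inverse transform and merged with those from the original, which is one way such a combination image could improve recall on edge-located objects.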

METHOD AND SYSTEM FOR OBJECT CENTRIC STEREO IN AUTONOMOUS DRIVING VEHICLES

The present teaching relates to a method, system, medium, and implementation of processing image data in an autonomous driving vehicle. Sensor data acquired by one or more types of sensors deployed on the vehicle are continuously received. The sensor data provide different information about the surroundings of the vehicle. Based on a first data set acquired by a first sensor of a first type of the one or more types of sensors at a specific time, an object is detected, where the first data set provides a first type of information about the surroundings of the vehicle. Depth information of the object is then estimated via object-centric stereo at the object level, based on the detected object as well as a second data set acquired by a second sensor of the first type of the one or more types of sensors at the specific time. The second data set provides the first type of information about the surroundings of the vehicle from a different perspective than the first data set.
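Object-centric stereo can be illustrated with a rectified pair: match the detected object's patch along the epipolar row of the second view, then convert the object-level disparity to depth via Z = f * B / d. The focal length, baseline, and one-row "images" below are illustrative numbers, not the disclosed system.

```python
import numpy as np

f, B = 700.0, 0.54                        # focal length (px), baseline (m)

left  = np.zeros((1, 40)); left[0, 22:25]  = 1.0  # detected object, left view
right = np.zeros((1, 40)); right[0, 15:18] = 1.0  # same object, right view

# Object-level matching: slide the detected patch along the epipolar row
# and pick the position with minimal absolute-difference cost.
patch = left[0, 22:25]
costs = [np.abs(right[0, x:x + 3] - patch).sum() for x in range(38)]
x_right = int(np.argmin(costs))           # best match along the row

disparity = 22 - x_right                  # object-level disparity (px)
depth = f * B / disparity                 # triangulated depth (m)
print(disparity, round(depth, 2))         # 7 54.0
```

Matching whole object patches rather than individual pixels is what makes the estimate "object centric": one disparity (and hence one depth) is produced per detected object.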