G06T2207/30261

Geometry-aware instance segmentation in stereo image capture processes
11120280 · 2021-09-14

A system detects multiple instances of an object in a digital image by receiving a two-dimensional (2D) image that includes a plurality of instances of an object in an environment. For example, the system may receive the 2D image from a camera or other sensing modality of an autonomous vehicle (AV). The system uses a first object detection network to generate a plurality of predicted object instances in the image. The system then receives a data set that comprises depth information corresponding to the plurality of instances of the object in the environment. The data set may be received, for example, from a stereo camera of an AV, and the depth information may be in the form of a disparity map. The system may use the depth information to identify an individual instance from the plurality of predicted object instances in the image.
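
As an illustration of how a disparity map carries depth information, the sketch below applies the standard rectified-stereo relation depth = focal length × baseline / disparity and summarizes one predicted instance by its median depth. The function names, the camera parameters, and the use of a median are assumptions for illustration; the abstract does not specify how depth is used to separate instances.

```python
import statistics


def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Rectified-stereo relation: depth = f * B / d (None for invalid disparity)."""
    if disparity_px <= 0:
        return None  # zero or negative disparity carries no usable depth
    return focal_length_px * baseline_m / disparity_px


def instance_depth(disparities_px, focal_length_px, baseline_m):
    """Median depth over the pixels of one predicted instance (hypothetical helper)."""
    depths = []
    for d in disparities_px:
        depth = disparity_to_depth(d, focal_length_px, baseline_m)
        if depth is not None:
            depths.append(depth)
    return statistics.median(depths) if depths else None
```

Two predicted instances whose median depths differ clearly could then be treated as separate objects, although the threshold for "clearly" would depend on the stereo rig and scene.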

Vehicle trajectory prediction model with semantic map and LSTM

A system and method for predicting the near-term trajectory of a moving obstacle sensed by an autonomous driving vehicle (ADV) is disclosed. The method applies neural networks such as an LSTM model to learn dynamic features of the moving obstacle's motion based on its past trajectory up to its current position and a CNN model to learn the semantic map features of the driving environment in a portion of an image map. From the learned dynamic features of the moving obstacle and the learned semantic map features of the environment, the method applies a neural network to iteratively predict the moving obstacle's positions for successive time points of a prediction interval. To predict the moving obstacle's position at the next time point from the currently predicted position, the method may update the learned dynamic features of the moving obstacle based on its past trajectory up to the currently predicted position.

ADAPTIVE MULTIPLE REGION OF INTEREST CAMERA PERCEPTION
20210192231 · 2021-06-24

Autonomous driving systems described herein provide an efficient way to manage camera-based perception by considering the characteristics of captured images. In one example, a camera sensor may capture an image and a processor may determine a first region of interest (ROI) within the image and a second ROI within the image. The processor may generate a first image of the first ROI and a second image of the second ROI. The processor may transmit a control signal based on one or more objects detected in the first ROI and/or one or more objects detected in the second ROI to cause the vehicle to perform an autonomous driving operation.
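
The step of generating one image per region of interest can be pictured with the minimal sketch below. It assumes an image stored as a list of pixel rows and an ROI given as (left, top, right, bottom) with exclusive right and bottom edges; both conventions and the function name are hypothetical and not taken from the abstract.

```python
def crop_roi(image, roi):
    """Return a new image containing only the pixels inside the ROI.

    image: list of rows, each row a list of pixel values.
    roi:   (left, top, right, bottom), right and bottom exclusive.
    """
    left, top, right, bottom = roi
    return [row[left:right] for row in image[top:bottom]]


def split_into_rois(image, first_roi, second_roi):
    """Produce the two per-ROI images that a detector would then process."""
    return crop_roi(image, first_roi), crop_roi(image, second_roi)
```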

CROSS-DOMAIN IMAGE COMPARISON METHOD AND SYSTEM

A cross-domain image comparison method and a cross-domain image comparison system are provided. The cross-domain image comparison method includes the following steps. Two cross-domain videos, generated by different types of devices, are obtained. A plurality of semantic segmentation areas are obtained from one frame of each of the videos. A region of interest pair (ROI pair) is obtained according to moving paths of the semantic segmentation areas in the videos. Two bounding boxes and two central points of the ROI pair are obtained. A similarity between the frames is obtained according to the bounding boxes and the central points.
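
One way to turn two bounding boxes and two central points into a single similarity score is sketched below. The specific choice here, an equal-weight blend of box intersection-over-union and a normalized center distance, is an assumption for illustration; the abstract does not state which formula is used.

```python
import math


def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def box_center(box):
    """Central point of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def roi_similarity(box_a, box_b, frame_diagonal):
    """Blend box overlap and center proximity into a score in [0, 1].

    frame_diagonal normalizes the center distance; the 0.5/0.5 weights
    are an arbitrary illustrative choice.
    """
    distance = math.dist(box_center(box_a), box_center(box_b))
    center_score = max(0.0, 1.0 - distance / frame_diagonal)
    return 0.5 * box_iou(box_a, box_b) + 0.5 * center_score
```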

Object detection based on three-dimensional distance measurement sensor point cloud data

Distance measurements are received from one or more distance measurement sensors, which may be coupled to a vehicle. A three-dimensional (3D) point cloud is generated based on the distance measurements. In some cases, 3D point clouds corresponding to distance measurements from different distance measurement sensors may be combined into one 3D point cloud. A voxelized model is generated based on the 3D point cloud. An object may be detected within the voxelized model, and in some cases may be classified by object type. If the distance measurement sensors are coupled to a vehicle, the vehicle may avoid the detected object.
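
The combine-then-voxelize steps can be sketched as follows, assuming each point cloud is a list of (x, y, z) tuples already expressed in a common coordinate frame. The voxel size, the minimum point count, and the function names are hypothetical, and the detection and classification stages are not shown.

```python
import math
from collections import defaultdict


def combine_clouds(*clouds):
    """Merge point clouds from several sensors into one list of points."""
    merged = []
    for cloud in clouds:
        merged.extend(cloud)
    return merged


def voxelize(points, voxel_size):
    """Group points by the integer index of the cubic voxel containing them."""
    grid = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        grid[key].append((x, y, z))
    return dict(grid)


def occupied_voxels(grid, min_points=2):
    """Voxels with enough points to be considered occupied (illustrative rule)."""
    return {key for key, pts in grid.items() if len(pts) >= min_points}
```

Using `math.floor` rather than integer truncation keeps voxel indices consistent for negative coordinates.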

Objective-based control of an autonomous unmanned aerial vehicle

A technique is described for controlling an autonomous vehicle such as an unmanned aerial vehicle (UAV) using objective-based inputs. In an embodiment, the underlying functionality of an autonomous navigation system is exposed via an application programming interface (API). In such an embodiment, the UAV can be controlled through specifying a behavioral objective, for example, using a call to the API to set parameters for the behavioral objective. The autonomous navigation system can then incorporate perception inputs such as sensor data from sensors mounted to the UAV and the set parameters using a multi-objective motion planning process to generate a proposed trajectory that most closely satisfies the behavioral objective in view of certain constraints. In some embodiments, developers can utilize the API to build customized applications for utilizing the UAV to capture images. Such applications, also referred to as “skills,” can be developed, shared, and executed to control the behavior of an autonomous UAV and to aid in overall system improvement.

LIDAR POINT SELECTION USING IMAGE SEGMENTATION
20210287387 · 2021-09-16

The subject disclosure relates to techniques for selecting points of an image for processing with LiDAR data. A process of the disclosed technology can include steps for receiving an image comprising a first image object and a second image object, processing the image to place a bounding box around the first image object and the second image object, and processing an image area within the bounding box to identify a first image mask corresponding with a first pixel region of the first image object and a second image mask corresponding with a second pixel region of the second image object. Systems and machine-readable media are also provided.
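
To show how an image mask could drive the selection of LiDAR points, the sketch below keeps only the points whose image projection lands on a masked pixel. It assumes the LiDAR points have already been projected into pixel coordinates as (u, v, range) tuples and that the mask is a 2D list of booleans indexed as mask[row][column]; these conventions and the function name are illustrative, not taken from the disclosure.

```python
import math


def select_points_in_mask(projected_points, mask):
    """Keep projected LiDAR points that fall on a True pixel of the mask.

    projected_points: iterable of (u, v, range_m), u = column, v = row.
    mask:             2D list of booleans indexed as mask[row][column].
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    selected = []
    for u, v, range_m in projected_points:
        col, row = math.floor(u), math.floor(v)
        inside = 0 <= row < height and 0 <= col < width
        if inside and mask[row][col]:
            selected.append((u, v, range_m))
    return selected
```

Running the same function once per object mask would yield a separate point set for each image object, even when both objects share one bounding box.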

SYSTEM AND METHOD FOR DETECTING OBJECTS

An object detection system may include an imager configured to generate image data indicative of an environment in which a machine is present, and a sensor configured to generate sensor data indicative of the environment in which the machine is present. The object detection system may further include an object detection controller including one or more object detection processors configured to receive an image signal indicative of the image data, identify an object associated with the image data, and determine a first location of the object relative to the position of the imager. The one or more object detection processors may also be configured to receive a sensor signal indicative of the sensor data, and determine, based at least in part on the sensor signal, the presence or absence of the object at the first location.

LOCALIZATION USING SEMANTICALLY SEGMENTED IMAGES

Techniques are discussed for determining a location of a vehicle in an environment using a feature corresponding to a portion of an image representing an object in the environment which is associated with a frequently occurring object classification. For example, an image may be received and semantically segmented to associate pixels of the image with a label representing an object of an object type (e.g., extracting only those portions of the image which represent lane boundary markings). Features may then be extracted, or otherwise determined, which are limited to those portions of the image. In some examples, map data indicating a previously mapped location of a corresponding portion of the object may be used to determine a difference. The difference (or sum of differences for multiple observations) is then used to localize the vehicle with respect to the map.

Systems and Methods for Object Detection and Motion Prediction by Fusing Multiple Sensor Sweeps into a Range View Representation

Systems and methods for detecting objects and predicting their motion are provided. In particular, a computing system can obtain a plurality of sensor sweeps. The computing system can determine movement data associated with movement of the autonomous vehicle. For each sensor sweep, the computing system can generate an image associated with the sensor sweep. The computing system can extract, using the respective image as input to one or more machine-learned models, feature data from the respective image. The computing system can transform the feature data into a coordinate frame associated with a next time step. The computing system can generate a fused image. The computing system can generate a final fused image. The computing system can predict, based, at least in part, on the final fused representation of the plurality of sensor sweeps, movement associated with the feature data at one or more time steps in the future.
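
The idea of moving data into the coordinate frame of the next time step can be illustrated with a planar rigid transform driven by the vehicle's own motion. This is a simplified sketch: it assumes 2D points with x forward and y to the left, and ego motion given as a translation and yaw change expressed in the earlier frame. The abstract concerns range-view feature data, which this example does not model, and the function name is hypothetical.

```python
import math


def to_next_frame(point_xy, ego_dx, ego_dy, ego_dyaw):
    """Re-express a point from the vehicle frame at time t in the frame at t+1.

    ego_dx, ego_dy: vehicle translation between t and t+1, in the frame at t.
    ego_dyaw:       vehicle yaw change in radians (counterclockwise positive).
    """
    # Shift so the new vehicle position becomes the origin.
    x = point_xy[0] - ego_dx
    y = point_xy[1] - ego_dy
    # Rotate by the inverse of the yaw change to align with the new heading.
    c, s = math.cos(ego_dyaw), math.sin(ego_dyaw)
    return (c * x + s * y, -s * x + c * y)
```

Applying this per sweep, from oldest to newest, would bring earlier observations into the latest frame before they are fused.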