Patent classifications
G06T2207/30261
Vehicle external environment recognition apparatus
A vehicle external environment recognition apparatus to be applied to a vehicle includes one or more processors and one or more memories configured to be coupled to the one or more processors. The one or more processors are configured to: calculate three-dimensional positions of respective blocks in a captured image; group the blocks to put any two or more of the blocks that have the three-dimensional positions differing from each other within a predetermined range in a group and thereby determine three-dimensional objects; identify each of a preceding vehicle of the vehicle and a sidewall on the basis of the determined three-dimensional objects; and track the preceding vehicle. The one or more processors are configured to determine, upon tracking the preceding vehicle, whether the preceding vehicle to track is to be hidden by the sidewall on the basis of a border line between a blind region and a viewable region.
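The grouping step in this abstract can be illustrated with a minimal sketch. It assumes that "differing from each other within a predetermined range" means a Euclidean distance no larger than a hypothetical `max_gap`, and that groups are merged transitively (single-link). The abstract confirms neither detail, and the function name `group_blocks` is illustrative only.

```python
from itertools import combinations


def group_blocks(positions, max_gap):
    """Group 3D block positions that lie within max_gap of each other.

    Assumption: Euclidean distance with transitive (single-link) merging.
    Returns a sorted list of groups, each a list of block indices.
    """
    parent = list(range(len(positions)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(positions)), 2):
        dist_sq = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
        if dist_sq <= max_gap ** 2:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(positions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group would correspond to one candidate three-dimensional object. The pairwise loop is quadratic in the number of blocks, which is acceptable for a sketch but not for a full-resolution image.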
DRIVABLE AREA DETECTION METHOD, COMPUTER DEVICE, STORAGE MEDIUM, AND VEHICLE
The disclosure provides a drivable area detection method, a computer device, a storage medium, and a vehicle. The method includes: obtaining three-dimensional point clouds of a driving environment of a vehicle; estimating a ground height of the current environment based on the three-dimensional point clouds of the current environment by using a ground height estimation model based on a convolutional neural network; determining, based on the ground height, non-ground point clouds not belonging to the ground in the three-dimensional point clouds; performing obstacle detection on the non-ground point clouds to obtain one or more obstacles; and determining a drivable area in the driving environment of the vehicle based on a position of an obstacle.
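The steps after ground-height estimation can be sketched as follows. This is not the patented method: the convolutional ground-height model is replaced by a hypothetical callable `ground_height(x, y)`, obstacle detection is reduced to marking grid cells that contain non-ground points, and the `margin` and `cell_size` parameters are assumptions.

```python
def non_ground_points(points, ground_height, margin=0.2):
    """Keep points more than `margin` above the estimated ground.

    `ground_height` is a stand-in for the learned ground-height model.
    """
    return [(x, y, z) for (x, y, z) in points if z - ground_height(x, y) > margin]


def occupied_cells(obstacle_points, cell_size=1.0):
    """Map obstacle points to the set of (column, row) grid cells they fall in."""
    return {(int(x // cell_size), int(y // cell_size)) for (x, y, _z) in obstacle_points}


def drivable_cells(candidate_cells, occupied):
    """Return the candidate cells that contain no obstacle."""
    return [cell for cell in candidate_cells if cell not in occupied]
```

With a flat ground model such as `lambda x, y: 0.0`, a point one unit above the ground marks its cell as occupied, while points within the margin are treated as ground.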
Moving robot with improved identification accuracy of carpet
There is provided a moving robot including a light projector, an image sensor and a processing unit. The light projector projects a vertical light segment and a horizontal light segment toward a moving direction. The image sensor captures, toward the moving direction, an image frame containing a first light segment image associated with the vertical light segment and a second light segment image associated with the horizontal light segment. The processing unit recognizes a plush carpet in the moving direction when a vibration intensity of the second light segment image is higher than a predetermined threshold, and an obstacle height calculated according to the first light segment image is larger than a height threshold.
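The recognition rule stated above is a conjunction of two threshold tests, sketched below. How the vibration intensity is measured is not given in the abstract. Here it is assumed, purely for illustration, to be the standard deviation of the row positions of the horizontal light segment image; all names and thresholds are hypothetical.

```python
from statistics import pstdev


def vibration_intensity(segment_rows):
    """Assumed measure: spread of the row positions along the horizontal segment."""
    return pstdev(segment_rows)


def is_plush_carpet(segment_rows, obstacle_height, vibration_threshold, height_threshold):
    """Report a plush carpet only when both conditions from the abstract hold."""
    return (
        vibration_intensity(segment_rows) > vibration_threshold
        and obstacle_height > height_threshold
    )
```

Requiring both conditions means a smooth raised obstacle (low vibration) or a jittery but flat surface (low height) is not classified as plush carpet.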
Track confidence model
Techniques for determining an output from a plurality of sensor modalities are discussed herein. Features from a radar sensor, a lidar sensor, and an image sensor may be input into respective models to determine respective intermediate outputs associated with a track associated with an object and associated confidence levels. The intermediate outputs from a radar model, a lidar model, and a vision model may be input into a fused model to determine a fused confidence level and fused output associated with the track. The fused confidence level and the individual confidence levels are compared to a threshold to generate the track to transmit to a planning system or prediction system of an autonomous vehicle. Additionally, a vehicle controller can control the autonomous vehicle based on the track and/or on the confidence level(s).
OBJECT TRACKING AND TIME-TO-COLLISION ESTIMATION FOR AUTONOMOUS SYSTEMS AND APPLICATIONS
In various examples, systems and methods for tracking objects and determining time-to-collision values associated with the objects are described. For instance, the systems and methods may use feature points associated with an object depicted in a first image and feature points associated with a second image to determine a scalar change associated with the object. The systems and methods may then use the scalar change to determine a translation associated with the object. Using the scalar change and the translation, the systems and methods may determine that the object is also depicted in the second image. The systems and methods may further use the scalar change and a temporal baseline to determine a time-to-collision associated with the object. After performing the determinations, the systems and methods may output data representing at least an identifier for the object, a location of the object, and/or the time-to-collision.
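One common way to relate a scale change to time-to-collision is sketched below. It reads the "scalar change" as the ratio s of the object's apparent size in the second image to that in the first, estimates s as the median ratio of distances between matched feature-point pairs, and assumes a constant closing speed, which gives TTC = Δt / (s − 1) measured from the second image. These are assumptions for illustration; the abstract does not specify the formula or the function names used here.

```python
from itertools import combinations
from math import dist, inf
from statistics import median


def scale_change(points_first, points_second):
    """Median ratio of pairwise distances between matched 2D feature points.

    points_first[i] and points_second[i] are assumed to be the same feature.
    Needs at least two distinct points; otherwise median() raises an error.
    """
    ratios = []
    for i, j in combinations(range(len(points_first)), 2):
        d_first = dist(points_first[i], points_first[j])
        if d_first > 0:
            ratios.append(dist(points_second[i], points_second[j]) / d_first)
    return median(ratios)


def time_to_collision(scale, temporal_baseline):
    """TTC under constant closing speed; infinite if the object is not growing."""
    if scale <= 1.0:
        return inf
    return temporal_baseline / (scale - 1.0)
```

For example, an object whose apparent size grows by 10% over a 0.1 s baseline yields a time-to-collision of about 1 s under these assumptions.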
System and method for training an adapter network to improve transferability to real-world datasets
Systems and methods for training an adapter network that adapts a model pre-trained on synthetic images to real-world data are disclosed herein. A system may include a processor and a memory in communication with the processor and having machine-readable instructions that cause the processor to output, using a neural network, a predicted scene that includes a three-dimensional bounding box having pose information of an object, generate a rendered map of the object that includes a rendered shape of the object and a rendered surface normal of the object, and train the adapter network, which adapts the predicted scene to adjust for a deformation of the input image by comparing the rendered map to the output map acting as a ground truth.
Method and apparatus for recognizing object
A method and apparatus for recognizing an object are provided, including extracting a feature from an input image and generating a feature map in a neural network. In parallel with the generating of the feature map, a region of interest (ROI) corresponding to an object of interest is extracted from the input image, and a number of object candidate regions used to detect the object of interest is determined based on a size of the ROI. The object of interest is recognized from the ROI based on the number of object candidate regions in the neural network.
Methods for determining three-dimensional (3D) plane information, methods for displaying augmented reality display information and corresponding devices
The present invention provides a method for determining a plane, a method for displaying Augmented Reality (AR) display information and corresponding devices. The method comprises the steps of: performing region segmentation and depth estimation on multimedia information; determining, according to the result of region segmentation and the result of depth estimation, 3D plane information of the multimedia information; and displaying AR display information according to the 3D plane information corresponding to the multimedia information. With the method for determining a plane, the method for displaying AR display information and the corresponding devices provided by the present invention, virtual display information can be added into a 3D plane, the realism of the AR display effect can be improved, and the user experience can be improved.
Occlusion aware planning and control
Techniques are discussed for controlling a vehicle, such as an autonomous vehicle, based on occluded areas in an environment. An occluded area can represent areas where sensors of the vehicle are unable to sense portions of the environment due to obstruction by another object. An occlusion grid representing the occluded area can be stored as map data or can be dynamically generated. An occlusion grid can include occlusion fields, which represent discrete two- or three-dimensional areas of driveable environment. An occlusion field can indicate an occlusion state and an occupancy state, determined using LIDAR data and/or image data captured by the vehicle. An occupancy state of an occlusion field can be determined by ray casting LIDAR data or by projecting an occlusion field into segmented image data. The vehicle can be controlled to traverse the environment when a sufficient portion of the occlusion grid is visible and unoccupied.
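The final traversal condition can be sketched with a simple data structure. The abstract says only that a "sufficient portion" of the grid must be visible and unoccupied; the fraction-based test, the `min_clear_fraction` parameter, and the class and function names below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class OcclusionField:
    """One discrete area of the occlusion grid with its two states."""
    occluded: bool  # True if the sensors cannot see this area
    occupied: bool  # True if an object is detected in this area


def can_traverse(fields, min_clear_fraction):
    """Allow traversal when enough fields are both visible and unoccupied.

    Assumption: "sufficient portion" is modelled as a minimum fraction.
    An empty grid is conservatively treated as not traversable.
    """
    if not fields:
        return False
    clear = sum(1 for f in fields if not f.occluded and not f.occupied)
    return clear / len(fields) >= min_clear_fraction
```

In practice, the two flags would be filled from ray casting LIDAR data or from projecting each field into segmented image data, as the abstract describes; that step is outside this sketch.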
TRACKING USERS ACROSS IMAGE FRAMES USING FINGERPRINTS OBTAINED FROM IMAGE ANALYSIS
Systems and methods are disclosed herein for tracking a vulnerable road user (VRU) regardless of occlusion. In an embodiment, the system captures a series of images including the VRU, and inputs each of the images into a detection model. The system receives a bounding box for each of the series of images of the VRU as output from the detection model. The system inputs each bounding box into a multi-task model, and receives as output from the multi-task model an embedding for each bounding box. The system determines, using the embeddings for each bounding box across the series of images, an indication of which of the embeddings correspond to the VRU.
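The last step, deciding which embeddings correspond to the VRU, can be illustrated with a similarity test. The abstract does not say how embeddings are compared. The sketch below assumes cosine similarity against a reference embedding of the VRU and a hypothetical `min_similarity` threshold.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_embeddings(reference, embeddings, min_similarity=0.8):
    """Indices of embeddings assumed to belong to the same VRU as `reference`."""
    return [
        index
        for index, embedding in enumerate(embeddings)
        if cosine_similarity(reference, embedding) >= min_similarity
    ]
```

Because the comparison uses appearance embeddings rather than box positions, a VRU that reappears after an occlusion can still be matched to its earlier detections, provided its embedding stays close to the reference.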