DUAL SENSING METHOD OF OBJECT AND COMPUTING APPARATUS FOR OBJECT SENSING
20230375694 · 2023-11-23
Cpc classification
G01S7/2955
PHYSICS
International classification
G01S13/86
PHYSICS
G01S13/72
PHYSICS
Abstract
A dual sensing method of an object and a computing apparatus for object sensing are provided. In the method, a first clustering is performed on radar information including a plurality of sensing points and is for determining a first part of the sensing points to be an object. A second clustering is performed on a result of the first clustering and is for determining that the sensing points determined to be the object in the result of the first clustering are located in a region of a first density. A result of the second clustering is taken as a region of interest. According to the region of interest, object detection and/or object tracking is performed on combined information formed by combining the radar information and an image, whose respective detection region and photographing region are overlapped.
Claims
1. A dual sensing method of an object, comprising: performing a first clustering on radar information, wherein the radar information comprises a plurality of sensing points, and the first clustering is for determining a first part of the sensing points to be an object; performing a second clustering on a result of the first clustering, wherein the second clustering is for determining that in response to the first part of the sensing points being the object in the result of the first clustering being located in a region of a first density; taking a result of the second clustering as a region of interest; and performing at least one of object detection and object tracking on combined information according to the region of interest, wherein the combined information is formed by combining the radar information and an image, and a detection region of the radar information is overlapped with a photographing region of the image.
2. The method of claim 1, wherein a point number of the second clustering is less than a point number of the first clustering, and a range of the second clustering is smaller than a range of the first clustering; or, a point number of the second clustering is greater than a point number of the first clustering, and a range of the second clustering is larger than a range of the first clustering.
3. The method of claim 1, wherein taking the result of the second clustering as the region of interest comprises: taking a center of the sensing points in the region of the first density as a center of the region of interest.
4. The method of claim 1, wherein a result of the second clustering is a region that does not have the first density, and the dual sensing method further comprises: using a fixed region as the region of interest, wherein the fixed region is predefined.
5. The method of claim 1, further comprising: converting coordinates of the sensing points in the radar information into a plurality of image coordinates for generating converted radar information, wherein the converted radar information comprises the radar information on the image coordinates; and combining the converted radar information and the image for generating the combined information.
6. The method of claim 5, wherein converting the coordinates of the sensing points in the radar information into the image coordinates comprises: setting a coordinate relationship between a radar world coordinate and a camera world coordinate according to a relative position of a radar and an image capturing device, wherein the radar is for obtaining the radar information, and the image capturing device is for obtaining the image; and determining the image coordinates according to the coordinate relationship, wherein the sensing points in the radar information are sequentially converted from a radar coordinate, the radar world coordinate, the camera world coordinate, and a camera coordinate into one of the image coordinates.
7. The method of claim 5, wherein a first sensing type in the radar information comprises at least one of a relative distance, a relative velocity, and an intensity, a second sensing type in the image comprises a plurality of color types defined by a color space, and combining the converted radar information and the image comprises: combining the converted radar information and the image into a sensing image of a plurality of channels according to the first sensing type and the second sensing type, wherein the channels respectively correspond to the color types and at least one of the relative distance, the relative velocity, and the intensity.
8. The method of claim 7, wherein performing at least one of the object detection and the object tracking on the combined information according to the region of interest comprises: inputting the sensing image of the channels into a detection model for the object detection to output a prediction result of the detection model; and performing the object tracking according to the prediction result.
9. The method of claim 1, wherein performing the at least one of the object detection and the object tracking on the combined information according to the region of interest comprises: determining an overlapped region of a first framed region in a prediction result of the object detection and a second framed region of the object tracking, wherein the overlapped region is an intersection of the first framed region and the second framed region; and determining whether the object in the first framed region and the second framed region is the same according to a ratio of the overlapped region to a total region, wherein the total region is a union of the first framed region and the second framed region.
10. The method of claim 1, wherein performing the at least one of the object detection and the object tracking on the combined information according to the region of interest comprises: detecting whether there is the object in a fixed region, wherein the fixed region is predefined, and a second part of the fixed region is not overlapped with the result of the second clustering; and updating the region of interest according to a detection result of the fixed region.
11. A computing apparatus for object sensing, comprising: a memory, for storing a program code; and a processor, coupled to the memory, and for loading and executing the program code to: perform a first clustering on radar information, wherein the radar information comprises a plurality of sensing points, and the first clustering is for determining a first part of the sensing points to be an object; perform a second clustering on a result of the first clustering, wherein the second clustering is for determining that in response to the first part of the sensing points being the object in the result of the first clustering being located in a region of a first density; take a result of the second clustering as a region of interest; and perform at least one of object detection and object tracking on combined information according to the region of interest, wherein the combined information is formed by combining the radar information and an image, and a detection region of the radar information is overlapped with a photographing region of the image.
12. The computing apparatus of claim 11, wherein a point number of the second clustering is less than a point number of the first clustering, and a range of the second clustering is smaller than a range of the first clustering; or, a point number of the second clustering is greater than a point number of the first clustering, and a range of the second clustering is larger than a range of the first clustering.
13. The computing apparatus of claim 11, wherein the processor further: takes a center of sensing points in the region of the first density as a center of the region of interest.
14. The computing apparatus of claim 11, wherein a result of the second clustering is a region that does not have the first density, and the processor further: uses a fixed region as the region of interest, wherein the fixed region is predefined.
15. The computing apparatus of claim 11, wherein the processor further: converts coordinates of the sensing points in the radar information into a plurality of image coordinates for generating converted radar information, wherein the converted radar information comprises the radar information on the image coordinates; and combines the converted radar information and the image for generating the combined information.
16. The computing apparatus of claim 15, wherein the processor further: sets a coordinate relationship between a radar world coordinate and a camera world coordinate according to a relative position of a radar and an image capturing device, wherein the radar obtains the radar information, and the image capturing device obtains the image; and determines the image coordinates according to the coordinate relationship, wherein the sensing points in the radar information are sequentially converted from a radar coordinate, the radar world coordinate, the camera world coordinate, and a camera coordinate into one of the image coordinates.
17. The computing apparatus of claim 15, wherein a first sensing type in the radar information comprises at least one of a relative distance, a relative velocity, and an intensity, a second sensing type in the image comprises a plurality of color types defined by a color space, and the processor further: combines the converted radar information and the image into a sensing image of a plurality of channels according to the first sensing type and the second sensing type, wherein the channels respectively correspond to the color types and at least one of the relative distance, the relative velocity, and the intensity.
18. The computing apparatus of claim 17, wherein the processor further: inputs the sensing image of the channels into a detection model for the object detection to output a prediction result of the detection model; and performs the object tracking according to the prediction result.
19. The computing apparatus of claim 11, wherein the processor further: determines an overlapped region of a first framed region in a prediction result of the object detection and a second framed region of the object tracking, wherein the overlapped region is an intersection of the first framed region and the second framed region; and determines whether the object in the first framed region and the second framed region is the same according to a ratio of the overlapped region to a total region, wherein the total region is a union of the first framed region and the second framed region.
20. The computing apparatus of claim 11, wherein the processor further: detects whether there is the object in a fixed region, wherein the fixed region is predefined, and a second part of the fixed region is not overlapped with the result of the second clustering; and updates the region of interest according to a detection result of the fixed region.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
[0010]
[0011]
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
DESCRIPTION OF THE EMBODIMENTS
[0029]
[0030] The radar 20 is a device that transmits radio waves, light, or sound waves into a space and detects echoes reflected by objects in the space. In one embodiment, radar information such as a relative position, a relative velocity, a direction, and/or an intensity may be determined according to the echoes.
[0031] The image capturing device 30 may be a camera, a video camera, a monitor, a smart phone, or a road side unit (RSU) with an image capturing function, and accordingly captures images within a specified field of view.
[0032] The computing apparatus 100 may be a smart phone, a tablet computer, a server, a cloud host, or a computer host. The computing apparatus 100 includes (but is not limited to) a memory 110, a communication transceiver 130, and a processor 150.
[0033] The memory 110 may be any type of fixed or movable random access memory (RAM), read only memory (ROM), flash memory, conventional hard disk drive (HDD), solid-state drive (SSD) or similar components. In an embodiment, the memory 110 is configured to store program codes, software modules, configurations, data (for example, images, radar information, sensing results, etc.) or files, and embodiments thereof will be described in detail later.
[0034] The communication transceiver 130 may be a transceiver supporting fourth generation (4G) or other generation mobile communication, Wi-Fi, Bluetooth, infrared, radio frequency identification (RFID), Ethernet, optical fiber network, etc., a serial communication interface (such as RS-232), or a universal serial bus (USB), Thunderbolt, or other communication transmission interface. In an embodiment of the invention, the communication transceiver 130 is configured to transmit data to or receive data from other electronic devices (for example, the radar 20, the image capturing device 30).
[0035] The processor 150 is coupled to the memory 110 and the communication transceiver 130. The processor 150 may be a central processing unit (CPU), a graphics processing unit (GPU), or other programmable general-purpose or special-purpose microprocessors, a digital signal processor (DSP), a programmable controller, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a neural network accelerator or other similar elements or a combination of the above elements. In one embodiment, the processor 150 is configured to execute all of or a part of tasks of the computing apparatus 100, and may load and execute each program code, software module, file, and data stored in the memory 110. In some embodiments, the functions of the processor 150 may be implemented by software or chips.
[0036] In some embodiments, either of the radar 20 and the image capturing device 30 may be integrated with the computing apparatus 100 to form an independent device.
[0037] Hereinafter, the method described in the embodiment of the invention will be described with reference to various devices, components, and modules in the sensing system 1. Each process of the method may be adjusted according to an actual implementation situation, which is not limited by the invention.
[0038]
[0039] For example,
[0040] The first clustering is used to determine a first part of the sensing points to be one or a plurality of objects. The first part may be all of or a part of the sensing points in the radar information. Namely, the first clustering determines whether a part of or all of the sensing points correspond to an object. The clustering method (also known as a grouping method) used in the first clustering may be a K-means algorithm, a Gaussian mixture model (GMM), a mean-shift algorithm, a hierarchical clustering method, a spectral clustering algorithm, a density-based spatial clustering of applications with noise (DBSCAN) algorithm or other clustering/grouping algorithms.
[0041] Taking DBSCAN as an example, a distance parameter and a point number parameter (also referred to as the minimum number of points in a group) may be set. Then, each sensing point is taken as a center point and the distance parameter is taken as a radius to form a circle. If the number of the sensing points in the circle is greater than the point number parameter, the sensing point serving as the center point of the circle is taken as a core point, and the other sensing points in the circle are marked and connected. If the number of the sensing points in the circle is not greater than the point number parameter, the sensing point serving as the center point of the circle is not a core point, and the other sensing points in the circle are not connected. Then, the connected sensing points are assigned to a same group, while outlier points (i.e., the unconnected sensing points) may be assigned to different groups.
[0042] It should be noted that, in some applications, compared with the K-means algorithm, DBSCAN is more suitable for applications in the field of automation. However, the embodiment of the invention does not limit the type of the clustering method. Ultimately, those sensing points that form a group may be regarded as an object, while the sensing points that are not assigned to any group are not regarded as an object.
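The DBSCAN procedure described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `dbscan`, the parameter names `eps` (distance parameter) and `min_pts` (point number parameter), the label -1 for outliers, and the choice to count the center point itself among the points in its circle are all assumptions made for this example.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each 2-D point with a group index, or -1 for an outlier.

    A point is treated as a core point when the number of points inside its
    circle of radius eps (the point itself included, by assumption) is
    greater than min_pts, as in the description above.
    """
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)      # None means "not visited yet"
    group = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) <= min_pts:      # not a core point
            labels[i] = -1
            continue
        group += 1                     # start a new group from this core point
        labels[i] = group
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:        # earlier outlier reached from a core point
                labels[j] = group
            if labels[j] is not None:  # already assigned to a group
                continue
            labels[j] = group
            nbrs = neighbors(j)
            if len(nbrs) > min_pts:    # j is also a core point: keep connecting
                queue.extend(nbrs)
    return labels

# Hypothetical sensing points: two compact groups and one isolated point.
sensing_points = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5),
                  (10, 10), (10.4, 10), (10, 10.4), (10.4, 10.4),
                  (50, 50)]
labels = dbscan(sensing_points, eps=1.0, min_pts=2)
```

Points that share a non-negative label would be regarded as one object, and points labeled -1 would not be regarded as an object.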
[0043] For example,
[0044] Referring to
[0045] In an embodiment, if the DBSCAN algorithm is used, a point number (for example, the aforementioned point number parameter) of the second clustering is less than a point number of the first clustering, and a range of the second clustering (for example, the aforementioned distance parameter) is smaller than a range of the first clustering. In another embodiment, the point number (for example, the aforementioned point number parameter) of the second clustering may instead be greater than the point number of the first clustering, and the range of the second clustering (for example, the aforementioned distance parameter) is larger than the range of the first clustering.
[0046] For example,
[0047] It should be noted that there may be other changes in the parameters used for clustering. In addition, the results of the first clustering and the second clustering may be converted into image coordinates, i.e., the sensing points in the group may be converted from radar coordinates to image coordinates. The coordinate conversion will be described in detail in subsequent embodiments.
[0048] Referring to
[0049] On the other hand, if the result of the second clustering is a region that does not have the first density or does not form a group, the processor 150 takes one or a plurality of fixed regions as (fixed) regions of interest. These fixed regions are predefined, for example, a region derived from previous detection or tracking results, or any designated region.
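The choice between a region of interest derived from the second clustering and a predefined fixed region can be sketched as follows. The function name `region_of_interest`, the box format `(x_min, y_min, x_max, y_max)`, the use of the arithmetic mean as the "center" of the sensing points, and the `half_size` extent are assumptions made for this example and are not specified in the description.

```python
def region_of_interest(dense_points, fixed_region, half_size=(50, 50)):
    """Return a region of interest as (x_min, y_min, x_max, y_max).

    dense_points: image coordinates of the sensing points kept by the
        second clustering (empty when no region of the first density exists).
    fixed_region: predefined fallback box in the same format.
    half_size: hypothetical half width and half height of the dynamic region.
    """
    if not dense_points:
        # No dense group was formed: fall back to the predefined fixed region.
        return fixed_region
    # Use the mean position of the dense points as the center of the region.
    cx = sum(p[0] for p in dense_points) / len(dense_points)
    cy = sum(p[1] for p in dense_points) / len(dense_points)
    hw, hh = half_size
    return (cx - hw, cy - hh, cx + hw, cy + hh)

# Hypothetical inputs.
dynamic_roi = region_of_interest([(100, 200), (120, 220)], (0, 0, 10, 10))
fallback_roi = region_of_interest([], (0, 0, 10, 10))
```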
[0050] For example,
[0051] Referring to
[0052] In order to combine characteristics of radar sensing and image sensing, two types of information need to be integrated/combined first.
[0053] First, the processor 150 may obtain a setting posture of the radar 20. Such posture may be defined by rotation angles about three mutually perpendicular axes (for example, pitch, yaw, and horizontal (or roll) angles).
[0054]
[0055] To be specific, regarding the conversion from the radar world coordinate $O_{rw}$-$x_{rw}y_{rw}z_{rw}$ to the camera world coordinate $O_{cw}$-$x_{cw}y_{cw}z_{cw}$, since the initial radar information obtained by the radar 20 is only two-dimensional information, only the two-dimensional relative distance between the radar 20 and the object may be obtained. In order to obtain the radar world coordinates, the processor 150 may determine a yaw angle of the radar 20 and a height difference between the radar 20 and the object.
[0056]
$y_{r\_new} = \sqrt{y_r^{2} - \mathrm{Height}_{radar\_object}^{2}}$ (1)
[0057]
$x_{rw} = x_r \cos\beta + y_{r\_new} \sin\beta$ (2)
$y_{rw} = -x_r \sin\beta + y_{r\_new} \cos\beta$ (3)
[0058] In an embodiment, the processor 150 may set a coordinate relationship between the radar world coordinates and the camera world coordinates according to a relative position of the radar 20 and the image capturing device 30. Such coordinate relationship may be derived from equations (4), (5):
$x_{cw} = x_{rw} - L_x$ (4)
$y_{cw} = y_{rw} - L_y$ (5)
[0059] $L_x$ and $L_y$ are the horizontal and vertical distances (i.e., the relative position) between the radar 20 and the image capturing device 30.
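Equations (1) to (5) can be chained into a single conversion from radar coordinates to camera world coordinates, as in the following sketch. The function and argument names are hypothetical, the yaw angle β is assumed to be given in radians, and the sketch assumes that the measured distance is not smaller than the height difference (otherwise the square root in equation (1) is undefined and `math.sqrt` raises an error).

```python
import math

def radar_to_camera_world(x_r, y_r, height_diff, beta, l_x, l_y):
    """Convert a radar-coordinate point to camera world coordinates.

    height_diff: height difference between the radar and the object.
    beta: yaw angle of the radar in radians (assumed unit).
    l_x, l_y: horizontal and vertical distances between radar and camera.
    """
    y_r_new = math.sqrt(y_r ** 2 - height_diff ** 2)            # equation (1)
    x_rw = x_r * math.cos(beta) + y_r_new * math.sin(beta)      # equation (2)
    y_rw = -x_r * math.sin(beta) + y_r_new * math.cos(beta)     # equation (3)
    return x_rw - l_x, y_rw - l_y                               # equations (4), (5)

# Hypothetical values: no yaw, radar offset from the camera by (1, 2).
x_cw, y_cw = radar_to_camera_world(3.0, 5.0, 3.0, 0.0, 1.0, 2.0)
```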
[0060] Then, the processor 150 may determine the image coordinates corresponding to each sensing point of the radar information according to the coordinate relationship. Any sensing point in the radar information is sequentially converted from the radar coordinates, the radar world coordinates, the camera world coordinates and the camera coordinates into image coordinates. To be specific, the conversion from radar coordinates and radar world coordinates to camera world coordinates is as described above, and detail thereof is not repeated. Then, the conversion from camera world coordinates to camera coordinates may be obtained from equation (6):
[0061] θ is a pitch angle, and H is a height of the image capturing device 30 relative to the ground.
[0062] Then, considering a yaw angle β,
$x_{c\_new} = x_c \cos\beta + z_c \sin\beta$ (7)
$y_{c\_new} = y_c$ (8)
$z_{c\_new} = -x_c \sin\beta + z_c \cos\beta$ (9)
[0063] It should be noted that if there is no yaw angle β, the camera coordinate conversion may be ignored.
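The yaw correction of equations (7) to (9) can be sketched as a rotation about the camera's y axis. This sketch assumes that the cosine term of equation (9) applies to $z_c$, which is what a rotation that leaves $y_c$ unchanged requires; the function name `apply_yaw` is hypothetical and β is assumed to be in radians.

```python
import math

def apply_yaw(x_c, y_c, z_c, beta):
    """Rotate camera coordinates by the yaw angle beta (radians) about the y axis."""
    x_new = x_c * math.cos(beta) + z_c * math.sin(beta)     # equation (7)
    y_new = y_c                                             # equation (8)
    z_new = -x_c * math.sin(beta) + z_c * math.cos(beta)    # equation (9), z_c assumed
    return x_new, y_new, z_new

# With beta = 0 the point is returned unchanged, matching the note that the
# conversion may be ignored when there is no yaw angle.
unchanged = apply_yaw(1.0, 2.0, 3.0, 0.0)
rotated = apply_yaw(1.0, 2.0, 3.0, math.pi / 2)
```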
[0064] Then, the conversion from camera coordinates to image coordinates may be obtained from the following equations (10) and (11):
[0065] It should be noted that the invention is not limited to the positional relationship between the radar 20 and the image capturing device 30 as shown in
[0066] In an embodiment, neither the radar 20 nor the image capturing device 30 senses forward. For example, the horizontal angle is not zero. Therefore, coordinate axis conversion may be performed. For example,
[0067] α is an included angle between the new coordinate axis and the original coordinate axis.
[0068] After the radar information is converted to the image coordinates, the processor 150 may combine the converted radar information (including the radar information on the image coordinates) and the image to generate combined information.
[0069] It should be noted that a feature map used by a deep learning model may distinguish the channels according to the color types. Taking RGB as an example, an image IM includes three channels. In order to jointly input the radar information and the image to the deep learning model, the converted radar information may be differentiated into multiple channels according to the sensing types. For example, the converted radar information is a radar image RIM after the radar information is mapped to the image coordinates. The radar image RIM includes three channels of a relative distance, a relative velocity and an intensity. In step S520, the processor 150 concatenates the image IM and the radar image RIM to form a multi-channel array MCA.
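The concatenation of the image IM and the radar image RIM into a multi-channel array can be sketched as follows. For simplicity this example represents each image as nested lists of per-pixel channel tuples; the function name `concat_channels` and this data layout are assumptions, and a practical implementation would more likely use an array library.

```python
def concat_channels(image, radar_image):
    """Concatenate two H x W images channel-wise.

    image: rows of per-pixel color tuples, e.g. (R, G, B).
    radar_image: rows of per-pixel radar tuples, e.g.
        (relative distance, relative velocity, intensity).
    Returns rows of per-pixel tuples holding all channels.
    """
    same_shape = len(image) == len(radar_image) and all(
        len(a) == len(b) for a, b in zip(image, radar_image))
    if not same_shape:
        raise ValueError("image and radar image must have the same height and width")
    return [[tuple(p) + tuple(r) for p, r in zip(row_i, row_r)]
            for row_i, row_r in zip(image, radar_image)]

# Hypothetical 1 x 2 image with one radar sensing point mapped to the first pixel.
im = [[(255, 0, 0), (0, 255, 0)]]
rim = [[(10, 20, 30), (0, 0, 0)]]
mca = concat_channels(im, rim)
```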
[0070] It should be noted that, in other embodiments, the radar image RIM may take only any two or any one of the three channels to be combined with the image IM. In addition, assuming that a pixel value of the radar image RIM is limited to 0-255, an upper limit value of each sensing type may be set. For example, the maximum relative distance is 90 meters, the maximum relative velocity is 33.3 meters per second, and the maximum intensity is 100 dBW. The conversion of the relative distance, the relative velocity and the intensity in the radar image RIM is as follows:
$D = d \times 2.83$ (13)
$V = |v| \times 7.65$ (14)
$I = \left(10 \log_{10}\left(10^{SNR \times 0.01} \times (P_{Noise} \times 0.1)\right)\right) \times 2.55$ (15)
[0071] d is an original relative distance, D is a new relative distance, v is an original relative velocity, V is a new relative velocity, i is an original intensity, and I is a new intensity. Moreover, SNR is a signal-to-noise ratio, and $P_{Noise}$ is a noise power. If a new value (for example, the new relative distance, the new relative velocity, or the new intensity) exceeds 255, the new value is set to 255. After the new value is determined, the radar image RIM and the image IM may be combined into the multi-channel array MCA. In addition, according to an actual requirement, a size of each channel in the multi-channel array MCA may be adjusted to be consistent.
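The scaling of equations (13) and (14), together with the cap at 255, can be sketched as follows. The factors 2.83 and 7.65 correspond approximately to 255/90 and 255/33.3, i.e., the example upper limits given above. The function name `scale_radar_values` is hypothetical, and the intensity conversion of equation (15) is left out of this sketch.

```python
def scale_radar_values(d, v):
    """Map a relative distance (m) and relative velocity (m/s) to pixel values.

    Values that would exceed 255 are set to 255.
    """
    new_d = min(d * 2.83, 255.0)        # equation (13), capped at 255
    new_v = min(abs(v) * 7.65, 255.0)   # equation (14), capped at 255
    return new_d, new_v

# Hypothetical sensing point at the example upper limits, and one beyond them.
at_limit = scale_radar_values(90.0, -33.3)
beyond_limit = scale_radar_values(100.0, 40.0)
```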
[0072] Then, object detection and object tracking are described.
[0073] Regarding object detection (step S1520), the processor 150 may input the sensing image of the channels (for example, the multi-channel array MCA of
[0074] The region of interest may be used for object detection. As shown in
[0075] Regarding (multi) object tracking (step S1530), a main function of the object tracking is to track a same object framed in preceding and subsequent image frames. There are also many algorithms for object tracking, for example, Kalman filter, optical flow, SORT (simple online and realtime tracking) or deep SORT, joint detection and embedding (JDE), etc.
[0076] The processor 150 may perform object tracking according to the prediction result of the object detection. For example, the prediction result of the object detection may be used as an input for the object tracking.
[0077] In an embodiment, the prediction result may be preprocessed. The processor 150 may determine an overlapped region of a first framed region (one or a plurality of objects are framed) in the prediction result of the object detection and a second framed region (one or a plurality of objects are framed) of the object tracking. The overlapped region is an intersection of the first framed region and the second framed region. For example,
[0078] The processor 150 may determine whether the objects in the first framed region and the second framed region are the same according to a ratio of the overlapped region to a total region. This total region is a union of the first framed region and the second framed region. The intersection over union (IoU) is the ratio of the overlapped region to the total region, namely, a result of dividing an area of the overlapped region by an area of the total region.
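The ratio of the overlapped region to the total region can be computed as in the following sketch. The box format `(x_min, y_min, x_max, y_max)`, the function names, and the threshold of 0.5 used to decide that two framed regions contain the same object are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Ratio of the intersection area to the union area of two boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapped region (zero when the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def same_object(detected_box, tracked_box, threshold=0.5):
    """Treat the two framed regions as one object when the ratio reaches the threshold."""
    return iou(detected_box, tracked_box) >= threshold

# Hypothetical framed regions from detection and tracking.
ratio = iou((0, 0, 2, 2), (1, 1, 3, 3))
```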
[0079] Taking
[0080] In an embodiment, the processor 150 may detect whether there is an object in one or a plurality of fixed regions (step S1540). The fixed region is predefined. A second part of the one or plurality of fixed regions is not overlapped with the region of the first density or the region forming the group in the result of the second clustering. The processor 150 may update the region of interest according to a detection result of the fixed region. If an object is detected in the fixed region, the processor 150 may update the region of interest. For example, the region of interest is formed based on a position of the object in the image. If no object is detected in the fixed region, the processor 150 may disable/ignore/not update the region of interest.
[0081] The processor 150 may associate the prediction result and/or the tracking result with the radar information (step S1550). Due to heterogeneity of the radar information and the image, the sensing points detected by the radar 20 may be mapped to the framed region of the object detection and/or the object tracking through data association. In an embodiment, the processor 150 may use a combinatorial optimization algorithm to pair one or more sensing points of the radar information with one or more target objects (framed) in the sensing image. The combinatorial optimization algorithm is, for example, a Hungarian algorithm, a K-M algorithm (Kuhn-Munkres algorithm) or a dual method. Taking the Hungarian algorithm as an example, a Euclidean distance or a Mahalanobis distance may be applied to the data association. Since the radar information and the image use different dimensions, the Mahalanobis distance is more suitable for the association of heterogeneous data, but the invention is not limited thereto.
[0082] Finally, a result may be output (step S1560), for example, a prediction result, a tracking result and/or associated data.
[0083] In summary, in the dual sensing method and the computing apparatus for object sensing according to the embodiments of the invention, the radar information is clustered twice to determine the dynamic region of interest, and object detection and/or object tracking is performed on the combined information of the radar and the image capturing device according to the dynamic region of interest. Accordingly, identification accuracy of object detection and/or object tracking is improved through heterogeneous data combination. In addition, the embodiments of the invention may be further applied to intelligent transportation applications such as object movement trajectory analysis, traffic flow analysis, and visual blind spot approaching vehicle warning, etc.
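The data association step can be illustrated with the following sketch. It is not the Hungarian algorithm: for brevity it finds the minimum-cost pairing by exhaustive search over permutations, which gives the same optimum for small inputs but scales poorly. The cost is a Mahalanobis distance under an assumed diagonal covariance, and the function names, the `variances` argument, and the assumption that there are no more sensing points than framed objects are all hypothetical.

```python
import itertools
import math

def mahalanobis_diag(p, q, variances):
    """Mahalanobis distance between p and q for a diagonal covariance matrix."""
    return math.sqrt(sum((a - b) ** 2 / s for a, b, s in zip(p, q, variances)))

def associate(radar_points, box_centers, variances):
    """Pair each radar point with a distinct framed object at minimum total cost.

    Assumes len(radar_points) <= len(box_centers).
    Returns a list of (radar_index, box_index) pairs.
    """
    best_cost, best_pairs = math.inf, []
    for perm in itertools.permutations(range(len(box_centers)), len(radar_points)):
        cost = sum(mahalanobis_diag(radar_points[i], box_centers[j], variances)
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_pairs

# Hypothetical sensing points (already in image coordinates) and box centers.
pairs = associate([(100, 100), (300, 200)], [(310, 190), (95, 105)], (25.0, 25.0))
```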
[0084] It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed embodiments without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the invention covers modifications and variations provided they fall within the scope of the following claims and their equivalents.