Systems and methods for characterizing object pose detection and measurement systems
11580667 · 2023-02-14
Assignee
Inventors
- Agastya Kalra (Nepean, CA)
- Achuta Kadambi (Los Altos Hills, CA, US)
- Kartik Venkataraman (San Jose, CA)
CPC classification
G06T19/20
G06V20/52
International classification
Abstract
A method for characterizing a pose estimation system includes: receiving, from a pose estimation system, first poses of an arrangement of objects in a first scene; receiving, from the pose estimation system, second poses of the arrangement of objects in a second scene, the second scene being a rigid transformation of the arrangement of objects of the first scene with respect to the pose estimation system; computing a coarse scene transformation between the first scene and the second scene; matching corresponding poses between the first poses and the second poses; computing a refined scene transformation between the first scene and the second scene based on the coarse scene transformation, the first poses, and the second poses; transforming the first poses based on the refined scene transformation to compute transformed first poses; and computing an average rotation error and an average translation error of the pose estimation system based on differences between the transformed first poses and the second poses.
Claims
1. A method for characterizing a pose estimation system, comprising: receiving, from a pose estimation system configured to estimate poses of objects with respect to a reference coordinate system, by a characterization system comprising a processor and a memory, a first plurality of poses of an arrangement of objects in a first scene; receiving, from the pose estimation system, by the characterization system, a second plurality of poses of the arrangement of objects in a second scene, the second scene being a rigid transformation of the arrangement of objects of the first scene with respect to the pose estimation system; computing, by the characterization system, a coarse scene transformation between the first scene and the second scene; matching, by the characterization system, corresponding poses between the first plurality of poses and the second plurality of poses; computing, by the characterization system, a refined scene transformation between the first scene and the second scene based on the coarse scene transformation, the first plurality of poses, and the second plurality of poses; transforming, by the characterization system, the first plurality of poses, received from the pose estimation system, based on the refined scene transformation to compute a plurality of transformed first poses; and computing an average rotation error and an average translation error of the pose estimation system based on differences between the transformed first poses and the second plurality of poses received from the pose estimation system.
2. The method of claim 1, wherein the rigid transformation of the arrangement of objects with respect to the pose estimation system comprises: a rotation of the arrangement of objects.
3. The method of claim 1, wherein the arrangement of objects is on a support platform, and wherein the characterization system is configured to control the support platform to rigidly transform the arrangement of objects with respect to the pose estimation system.
4. The method of claim 1, wherein a fiducial, adjacent the arrangement of objects, is imaged in the first scene, rigidly transformed with the arrangement of objects, and imaged in the second scene, and wherein the coarse scene transformation between the first scene and the second scene is computed based on computing a first pose of the fiducial imaged in the first scene and a second pose of the fiducial imaged in the second scene.
5. The method of claim 1, wherein the matching the corresponding poses between the first plurality of poses and the second plurality of poses is performed by: transforming the first plurality of poses in accordance with the coarse scene transformation to compute a plurality of coarsely transformed first poses; and for each coarsely transformed first pose of the plurality of coarsely transformed first poses: identifying a second pose of the second poses closest to the coarsely transformed first pose; and determining that the coarsely transformed first pose and the second pose closest to the coarsely transformed first pose match when a distance between the coarsely transformed first pose and the second pose closest to the coarsely transformed first pose is less than a false-positive threshold distance.
6. The method of claim 1, wherein the matching the corresponding poses between the first plurality of poses and the second plurality of poses is performed by: transforming the first plurality of poses in accordance with the coarse scene transformation to compute a plurality of coarsely transformed first poses; and for each coarsely transformed first pose of the plurality of coarsely transformed first poses: identifying a second pose of the second poses closest to the coarsely transformed first pose; identifying a type of an object corresponding to the coarsely transformed first pose and the second pose; positioning a first 3-D model of the type of the object at the coarsely transformed first pose; positioning a second 3-D model of the type of the object at the second pose; and determining that the coarsely transformed first pose and the second pose closest to the coarsely transformed first pose match when an intersection between the positioned first 3-D model and the positioned second 3-D model satisfies a false-positive threshold intersection.
7. The method of claim 1, wherein the computing the refined scene transformation comprises: initializing a current scene transformation based on the coarse scene transformation; computing a plurality of first poses as transformed by the current scene transformation; and updating the current scene transformation in accordance with reducing a cost function computed based on differences between the second poses and the first poses as transformed by the current scene transformation.
8. The method of claim 1, wherein the average rotation error is computed based on a sum of the rotation errors between the differences between rotational components of the transformed first poses and the second plurality of poses, and wherein the average translation error is computed based on a sum of the translation errors between the differences between translation components of the transformed first poses and the second plurality of poses.
9. The method of claim 8, wherein the average rotation error R_err is computed in accordance with:
10. A system for characterizing a pose estimation system, comprising: a processor; and a memory storing instructions that, when executed by the processor, cause the processor to: receive, from a pose estimation system configured to estimate poses of objects with respect to a reference coordinate system, a first plurality of poses of an arrangement of objects in a first scene; receive, from the pose estimation system, a second plurality of poses of the arrangement of objects in a second scene, the second scene being a rigid transformation of the arrangement of objects of the first scene with respect to the pose estimation system; compute a coarse scene transformation between the first scene and the second scene; match corresponding poses between the first plurality of poses and the second plurality of poses; compute a refined scene transformation between the first scene and the second scene based on the coarse scene transformation, the first plurality of poses, and the second plurality of poses; transform the first plurality of poses, received from the pose estimation system, based on the refined scene transformation to compute a plurality of transformed first poses; and compute an average rotation error and an average translation error of the pose estimation system based on differences between the transformed first poses and the second plurality of poses received from the pose estimation system.
11. The system of claim 10, wherein the rigid transformation of the arrangement of objects with respect to the pose estimation system comprises a rotation of the arrangement of objects.
12. The system of claim 10, further comprising a support platform, and wherein the memory further stores instructions that, when executed by the processor, cause the processor to control the support platform to rigidly transform the arrangement of objects with respect to the pose estimation system from the first scene to the second scene.
13. The system of claim 10, wherein a fiducial, adjacent the arrangement of objects, is imaged in the first scene, rigidly transformed with the arrangement of objects, and imaged in the second scene, and wherein the coarse scene transformation between the first scene and the second scene is computed based on computing a first pose of the fiducial imaged in the first scene and a second pose of the fiducial imaged in the second scene.
14. The system of claim 10, wherein the memory further stores instructions that, when executed by the processor, cause the processor to match the corresponding poses between the first plurality of poses and the second plurality of poses by: transforming the first plurality of poses in accordance with the coarse scene transformation to compute a plurality of transformed first poses; and for each transformed first pose of the plurality of transformed first poses: identifying a second pose of the second poses closest to the transformed first pose; and determining that the transformed first pose and the second pose closest to the transformed first pose match when a distance between the transformed first pose and the second pose closest to the transformed first pose is less than a false-positive threshold distance.
15. The system of claim 10, wherein the memory further stores instructions that, when executed by the processor, cause the processor to match the corresponding poses between the first plurality of poses and the second plurality of poses by: transforming the first plurality of poses in accordance with the coarse scene transformation to compute a plurality of transformed first poses; and for each transformed first pose of the plurality of transformed first poses: identifying a second pose of the second poses closest to the transformed first pose; identifying a type of an object corresponding to the transformed first pose and the second pose; positioning a first 3-D model of the type of the object at the transformed first pose; positioning a second 3-D model of the type of the object at the second pose; and determining that the transformed first pose and the second pose closest to the transformed first pose match when an intersection between the positioned first 3-D model and the positioned second 3-D model satisfies a false-positive threshold intersection.
16. The system of claim 10, wherein the memory further stores instructions that, when executed by the processor, cause the processor to compute the refined scene transformation by: initializing a current scene transformation based on the coarse scene transformation; computing a plurality of first poses as transformed by the current scene transformation; and updating the current scene transformation in accordance with reducing a cost function computed based on differences between the second poses and the first poses as transformed by the current scene transformation.
17. The system of claim 10, wherein the memory further stores instructions that, when executed by the processor, cause the processor to: compute the average rotation error based on a sum of the rotation errors between the differences between rotational components of the transformed first poses and the second plurality of poses, and compute the average translation error based on a sum of the translation errors between the differences between translation components of the transformed first poses and the second plurality of poses.
18. The system of claim 17, wherein the average rotation error R_err is computed in accordance with:
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The accompanying drawings, together with the specification, illustrate exemplary embodiments of the present invention, and, together with the description, serve to explain the principles of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
(8) In the following detailed description, only certain exemplary embodiments of the present invention are shown and described, by way of illustration. As those skilled in the art would recognize, the invention may be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein.
(9) Pose estimation generally refers to a computer vision technique for estimating or predicting the location and orientation of objects. Some forms of pose estimation refer to detecting the physical pose of a human figure, such as the position and orientation of a person's head, arms, legs, and joints. Pose estimation may also refer more generally to the position and orientation of various animate or inanimate physical objects in a scene. For example, autonomously navigating robots may maintain information regarding the physical poses of objects around them in order to avoid collisions and to predict trajectories of other moving objects. As another example, in the case of robotics for use in manufacturing, pose estimation may be used by robots to detect the position and orientation of components, such that a robot arm can approach the component from the correct angle to obtain a proper grip on the part for assembly with other components of a manufactured product (e.g., gripping the head of a screw and threading the screw into a hole, whereas gripping a screw by the tip would make it difficult to insert into a hole).
(10) There are a variety of techniques for performing pose estimation, including three-dimensional (3D) scanners that capture depth information regarding a scene. For example, pose estimation may be performed by capturing images using stereo vision systems (e.g., based on depth from stereo), which may be active (with an active light emitter, which may emit a pattern of light or structured light). As another example, time of flight sensing may be used to measure the depth of surfaces in a scene based on the time between the emission of light and the detection of its reflection. Computer vision techniques such as instance segmentation using a convolutional neural network may also be used to separate individual objects from one another, and further computer vision analysis may be performed to determine the poses of the objects with respect to one another. These various pose estimation techniques may exhibit different tradeoffs regarding, for example, accuracy, precision, latency, power consumption, and the like.
(11) Some applications of pose estimation may require higher precision than others, and therefore different approaches to pose estimation may be better suited for different tasks, based on the design constraints of those tasks.
(12) Generally, characterizing the error rate of a system involves computing the difference between the outputs of the system and a known true value or actual value (“ground truth”), and aggregating the differences, such as by computing a mean absolute error (MAE), a mean squared error (MSE), or a root mean square error (RMSE).
(13) However, it is often difficult to obtain a ground truth set of poses for characterizing a pose estimation system, at least because there are few techniques for measuring the poses of objects. This is for three main reasons. First, methods for accurately estimating the pose are generally limited to capturing very high resolution point clouds and then applying some version of an iterative closest point algorithm to align the point clouds. These methods are costly and do not guarantee the accuracy required to obtain a high quality ground truth. Second, a pose must always be expressed with respect to a specific coordinate space, and to compare two poses, they must be in the same coordinate space. Obtaining the transformation between coordinate spaces in an error-free way is non-trivial. For example, if the transform between coordinate spaces is accurate only to 100 microns, and the application specifications require accuracy to 40 microns, then the estimated transform cannot be used to make measurements at the higher precision of 40 microns. Third, certain objects, such as small objects and transparent objects (e.g., made of glass or transparent plastic), are optically challenging to image, and comparative 3-D scanning or sensing systems are not capable of obtaining high resolution, dense point clouds of these types of objects.
(14) As such, aspects of embodiments of the present disclosure are directed to systems and methods for characterizing a pose estimation system, such as characterizing the rotational error and the translational error in the poses computed by the pose estimation system at high precision. For example, some embodiments of the present disclosure are capable of characterizing pose errors in pose estimation systems at a resolution of 30 microns (30 micrometers) and 0.3 degrees. Comparative systems operating in similar conditions are generally limited to accuracies of 300 microns or more.
(16) In some embodiments, a fiducial 30 (or marker), such as a ChArUco board (e.g., a checkerboard pattern of alternating black and white squares with ArUco fiducial markers in each of the white squares, where ArUco markers are described, for example, in Garrido-Jurado, Sergio, et al. “Automatic generation and detection of highly reliable fiducial markers under occlusion.” Pattern Recognition 47.6 (2014): 2280-2292.), is placed adjacent to the arrangement 20 of objects 22. The arrangement 20 of objects 22 and the fiducial 30 may be placed on a movable support platform 40 such as a rotatable turntable.
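As an illustration of how such a fiducial can be localized in practice, the following is a minimal sketch using OpenCV's ArUco/ChArUco support. It assumes opencv-contrib-python with the legacy cv2.aruco API (the function names changed in OpenCV 4.7+), a hypothetical 7×5 board geometry, and previously calibrated camera intrinsics (camera_matrix, dist_coeffs); it is not the specific implementation of the present disclosure.

```python
# Sketch: estimating the pose of a ChArUco fiducial with OpenCV's aruco module.
# Assumes opencv-contrib-python with the legacy cv2.aruco API (pre-4.7 names)
# and camera_matrix / dist_coeffs from a prior camera calibration.
import cv2
import numpy as np

def estimate_charuco_pose(image_bgr, camera_matrix, dist_coeffs):
    """Return (rvec, tvec) of the ChArUco board in camera coordinates, or None."""
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_5X5_100)
    # Board geometry (7x5 squares, 4 cm squares, 3 cm markers) is illustrative only.
    board = cv2.aruco.CharucoBoard_create(7, 5, 0.04, 0.03, dictionary)

    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
    if ids is None or len(ids) == 0:
        return None

    # Interpolate chessboard corners from the detected ArUco markers.
    n, ch_corners, ch_ids = cv2.aruco.interpolateCornersCharuco(corners, ids, gray, board)
    if n < 4:
        return None

    ok, rvec, tvec = cv2.aruco.estimatePoseCharucoBoard(
        ch_corners, ch_ids, board, camera_matrix, dist_coeffs, None, None)
    return (rvec, tvec) if ok else None
```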
(17) The support platform 40 is configured to perform a physical rigid transformation of the arrangement 20 of objects 22 together with the fiducial 30 with respect to the pose estimator 10, while keeping the relative positions and orientations of the objects 22 with respect to one another and the fiducial 30 substantially fixed. For example, in the case of the use of a turntable as the movable support platform 40, the rigid transformation may be a rotation (as indicated by the arrows) around a vertical axis (e.g., an axis aligned with gravity).
(18) A characterization system 100 according to various embodiments of the present disclosure is configured to characterize the performance of the pose estimator 10, such as predicting or calculating the average pose error (e.g., rotation error and translation error) in the estimated poses of the objects 22 computed by the pose estimator 10.
(19) In more detail, the pose estimator 10 is configured to estimate the poses of objects detected within its field of view 12.
(20) In particular, a “pose” refers to the position and orientation of an object with respect to a reference coordinate system. For example, a reference coordinate system may be defined with the pose estimation system 10 at the origin, where the direction along the optical axis of the pose estimation system 10 (e.g., a direction through the center of its field of view 12) is defined as the z-axis of the coordinate system, and the x and y axes are defined to be perpendicular to one another and perpendicular to the z-axis. (Embodiments of the present disclosure are not limited to this particular coordinate system, and a person having ordinary skill in the art would understand that poses may be transformed between different coordinate systems.)
(21) Each object 22 may also be associated with a corresponding coordinate system of its own, which is defined with respect to its particular shape. For example, a rectangular prism with sides of different lengths may have a canonical coordinate system defined where the x-axis is parallel to its shortest direction, the z-axis is parallel to its longest direction, the y-axis is orthogonal to the x-axis and z-axis, and the origin is located at the centroid of the object 22.
(22) Generally, in a three-dimensional coordinate system, objects 22 have six degrees of freedom—rotation around three axes (e.g., rotation around x-, y-, and z-axes) and translation along the three axes (e.g., translation along x-, y-, and z-axes). For the sake of clarity, symmetries of the objects 22 will not be discussed in detail herein, but may be addressed, for example, by identifying multiple possible poses with respect to different symmetries (e.g., in the case of selecting the positive versus negative directions of the z-axis of a right rectangular prism), or by ignoring some rotational components of the pose (e.g., a right cylinder is rotationally symmetric around its axis).
(23) In some embodiments, it is assumed that a three-dimensional (3-D) model or computer aided design (CAD) model representing a canonical or ideal version of each type of object 22 in the arrangement of objects 20 is available. For example, in some embodiments of the present disclosure, the objects 22 are individual instances of manufactured components that have a substantially uniform appearance from one component to the next. Examples of such manufactured components include screws, bolts, nuts, connectors, and springs, as well as specialty parts such as electronic circuit components (e.g., packaged integrated circuits, light emitting diodes, switches, resistors, and the like), laboratory supplies (e.g., test tubes, PCR tubes, bottles, caps, lids, pipette tips, sample plates, and the like), and manufactured parts (e.g., handles, switch caps, light bulbs, and the like). Accordingly, in these circumstances, a CAD model defining the ideal or canonical shape of any particular object 22 in the arrangement 20 may be used to define a coordinate system for the object (e.g., the coordinate system used in the representation of the CAD model).
(24) Based on a reference coordinate system (or camera space, e.g., defined with respect to the pose estimation system) and an object coordinate system (or object space, e.g., defined with respect to one of the objects), the pose of the object may be considered to be a rigid transform (rotation and translation) from object space to camera space. The pose of object 1 in camera space 1 may be denoted as P_C1O1 and represented as a 4×4 matrix:
(25)
P = [ R  T ]
    [ 0  1 ]
where the rotation submatrix R:
(26)
R = [ R_xx  R_xy  R_xz ]
    [ R_yx  R_yy  R_yz ]
    [ R_zx  R_zy  R_zz ]
represents rotations along the three axes from object space to camera space, and the translation submatrix T:
(27)
T = [ T_x  T_y  T_z ]^T
represents translations along the three axes from object space to camera space.
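This 4×4 homogeneous representation maps directly to code. The following is a minimal sketch (the helper names are hypothetical), assuming numpy and the column-vector convention in which a pose maps object coordinates to camera coordinates as x_camera = P · x_object:

```python
# Sketch: representing a pose as a 4x4 homogeneous rigid transform (numpy).
# Convention assumed here: x_camera = P @ [x, y, z, 1]^T for object-space points.
import numpy as np

def make_pose(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Build a 4x4 pose from a 3x3 rotation matrix and a length-3 translation."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose

def apply_pose(pose: np.ndarray, points_object: np.ndarray) -> np.ndarray:
    """Map an Nx3 array of object-space points into camera space."""
    homogeneous = np.hstack([points_object, np.ones((len(points_object), 1))])
    return (pose @ homogeneous.T).T[:, :3]
```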
(28) If two objects—Object A and Object B—are in the same camera C coordinate frame, then the notation P_CA is used to indicate the pose of Object A with respect to camera C and P_CB is used to indicate the pose of Object B with respect to camera C. For the sake of convenience, it is assumed herein that the poses of objects are represented based on the reference coordinate system, so the poses of objects A and B with respect to camera space C may be denoted P_A and P_B, respectively.
(29) If Object A and Object B are actually the same object, but are observed during different pose estimation measurements, and a residual pose P_err or P_AB (P_AB = P_err) is used to indicate a transform from pose P_A to pose P_B, then the following relationship should hold:
P_A P_err = P_B (1)
and therefore
P_err = P_A^−1 P_B (2)
(30) Ideally, assuming the object has not moved (e.g., translated or rotated) with respect to the pose estimator 10 between the measurements of pose estimates P_A and P_B, then P_A and P_B should both be the same, and P_err should be the identity matrix (e.g., indicating no error between the poses):
(31)
P_err = [ 1 0 0 0 ]
        [ 0 1 0 0 ]
        [ 0 0 1 0 ]
        [ 0 0 0 1 ]
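A minimal sketch of the residual-pose computation of equation (2), assuming 4×4 numpy pose matrices expressed in the same camera frame:

```python
# Sketch: residual pose between two estimates of the same (unmoved) object.
# P_a, P_b are 4x4 poses in the same camera frame; if the estimates agree,
# the residual is (numerically close to) the 4x4 identity.
import numpy as np

def residual_pose(p_a: np.ndarray, p_b: np.ndarray) -> np.ndarray:
    """P_err such that P_a @ P_err = P_b (equation (2) above)."""
    return np.linalg.inv(p_a) @ p_b

def is_identity(p_err: np.ndarray, atol: float = 1e-6) -> bool:
    """True when the residual pose indicates (numerically) no error."""
    return np.allclose(p_err, np.eye(4), atol=atol)
```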
(32) Similarly, the above would hold if the object underwent a known rigid transformation T and pose P_B represented the estimated pose P_B′ after transforming the estimated pose back to the original scene (P_B = P_B′T) or, alternatively, if pose P_A represented the estimated pose after applying transformation T to the estimated pose to transform it to the new scene (P_A = P_A′T).
(33) Differences between the actual measured value P_err, as computed based on the estimates computed by the pose estimator 10, and the identity matrix may be considered to be errors:
R_err = ∥R(P_err)∥ (3)
T_err = ∥T(P_err)∥ (4)
where R_err is the rotation error and T_err is the translation error. The function R( ) converts P_err into an axis-angle representation where the magnitude is the rotation difference, and the function T( ) extracts the translation component of the pose matrix.
(34) The angle of the axis-angle representation of a rotation matrix R is given by:
(35)
θ = arccos((Tr(R) − 1) / 2)
where Tr( ) denotes the matrix trace (the sum of the diagonal elements of the matrix), and θ represents the angle of rotation.
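A minimal sketch of the R( ) and T( ) operations of equations (3) and (4), assuming a 4×4 numpy residual pose; the arccos argument is clipped as a guard against floating-point rounding:

```python
# Sketch: rotation error (axis-angle magnitude) and translation error of a
# 4x4 residual pose, following equations (3) and (4).
import numpy as np

def rotation_error_rad(p_err: np.ndarray) -> float:
    """Angle of the residual rotation, from theta = arccos((Tr(R) - 1) / 2)."""
    r = p_err[:3, :3]
    cos_theta = (np.trace(r) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def translation_error(p_err: np.ndarray) -> float:
    """Euclidean norm of the residual translation component."""
    return float(np.linalg.norm(p_err[:3, 3]))
```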
(36) Accordingly, some aspects of embodiments of the present disclosure relate to applying the above pose comparison framework for characterizing a pose estimation system 10.
(38) In operation 210, the characterization system 100 receives a first plurality of poses of the objects 22 in the arrangement 20 of a first scene S_1, where the first plurality of poses are computed by the pose estimation system 10 based on imaging the first scene S_1.
(39) Accordingly, the pose estimator 10 estimates a first plurality of poses of various ones of the objects 22 in the arrangement 20 of the first scene S_1. For example, the plurality of poses may be represented as a collection (e.g., an array) of matrices representing the rotation and translation of the individual objects from their canonical object spaces to camera space. The poses may also include information regarding the classifications or types of the objects 22.
(40) In operation 220, the arrangement 20 of objects 22 is rigidly transformed to form a second scene S_2 based on the first scene S_1. In more detail, applying a rigid transformation, with respect to the pose estimator 10, to the arrangement 20 as a whole maintains the physical relationships of the objects 22 with respect to one another (e.g., without changing the physical distances between the objects 22 or the orientations of the objects with respect to one another), but changes the physical relationship between the arrangement 20 and the pose estimator 10.
(41) For example, the support platform 40 (e.g., a turntable) may rotate the arrangement 20 of objects 22, together with the fiducial 30, about a vertical axis with respect to the pose estimator 10 to form the second scene S_2.
(42) (In some circumstances, it may be functionally equivalent to form the second scene S.sub.2 by rotating and/or translating the pose estimation system 10 in a manner that maintains the arrangement 20 of objects 22 in the field of view 12 of the pose estimation system 10.)
(45) In operation 230, the characterization system 100 receives a second plurality of poses of the objects 22 in the arrangement 20 of a second scene S_2, where the second plurality of poses of the objects 22 in the second scene S_2 are computed by the same pose estimation system 10 that computed the first plurality of poses. The second plurality of poses may be denoted as {Q_{S2,j}}.
(46) Given the first plurality of estimated poses {P_{S1,i}} of the first scene S_1 and the second plurality of estimated poses {Q_{S2,j}} of the second scene S_2, the characterization system 100 characterizes the accuracy of the pose estimation system 10 by aligning the two scenes and comparing corresponding poses, as described below with respect to operations 240 through 270.
(47) In operation 240, the characterization system 100 computes a coarse scene transformation T_coarse between the first scene S_1 and the second scene S_2. In some embodiments of the present disclosure, a distinctive marker or fiducial 30 is included with the arrangement 20 of objects 22 and appears in both the first scene S_1 and the second scene S_2, where the fiducial 30 is rigidly transformed together with the arrangement 20 of objects 22, such that the physical relationship between the fiducial 30 and the objects 22 is maintained through the transformation, thereby enabling the fiducial 30 to provide a reference for computing the coarse scene transformation T_coarse. In such embodiments, the characterization system 100 computes a pose T_S1 of the fiducial 30 as imaged in the first scene S_1 and a pose T_S2 of the fiducial 30 as imaged in the second scene S_2, and computes the coarse scene transformation from these two fiducial poses:
T_coarse = T_S1^−1 T_S2
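A minimal sketch of computing the coarse scene transformation from the two fiducial poses, assuming 4×4 fiducial-to-camera poses under the column-vector convention used in these sketches; under that convention the camera-space scene motion composes on the left of a pose, so the composition order may be written differently under other notational conventions.

```python
# Sketch: coarse scene transformation from the fiducial's pose in each scene.
# t_s1, t_s2 are 4x4 fiducial-to-camera poses (column-vector convention); the
# camera-space motion M satisfying M @ t_s1 = t_s2 maps scene-1 geometry to scene 2.
import numpy as np

def coarse_scene_transform(t_s1: np.ndarray, t_s2: np.ndarray) -> np.ndarray:
    return t_s2 @ np.linalg.inv(t_s1)

# Usage (illustrative): t_coarse = coarse_scene_transform(t_fiducial_scene1, t_fiducial_scene2)
```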
(48) In some embodiments of the present disclosure, other types of fiducials 30 are placed in the scene and used to compute the coarse scene transformation, such as a grid of ArUco markers (e.g., without the checkerboard), augmented reality tags (ARTag), AprilTags, one or more rulers, one or more protractors, and the like.
(49) In various other embodiments of the present disclosure, other techniques may be used to compute a coarse scene transformation. For example, in embodiments of the present disclosure where the support platform 40 can be controlled at high precision, the coarse scene transformation may be computed based on the known transformation applied by the support platform 40. As another example, a coarse scene transformation may be computed based on treating the poses as point clouds (e.g., considering the positions only) and registering or aligning the point clouds (e.g., by applying an iterative closest point algorithm). As a further example, the two sets of poses can be matched using a graph matching approach. The pose estimator 10 computes a 3-D connected graph from each component in the set of poses of S_1 to each other component in the set of poses of S_2. Then the pose estimator computes a feature vector for each element in S_1 and each element in S_2 using the relative transformation (R and T) between itself and its closest neighbors (e.g., its five closest neighbors). These relative transformations are then used to compute correspondences between S_1 and S_2 (e.g., finding poses in S_1 and S_2 that have similar relative transformations to their closest neighbors). After finding correspondences between poses in S_1 and poses in S_2, the pose estimator 10 computes one or more 3-D rigid body transform estimations using, for example, random sample consensus (RANSAC), where inliers are defined as correspondences less than a threshold distance apart (e.g., 3 mm). The estimated rigid body transform with the most inliers could be used as T_coarse.
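As a sketch of the point-cloud-based alternative, the following estimates a rigid transform between two sets of corresponding object centers using the Kabsch/SVD method. The correspondence search (graph/feature matching or iterative closest point) is assumed to have been done already, and a RANSAC loop, not shown here, would repeatedly run this estimator on sampled correspondences and keep the hypothesis with the most inliers (e.g., correspondences within 3 mm).

```python
# Sketch: best-fit rigid transform between corresponding 3-D points (Kabsch/SVD).
# src and dst are Nx3 arrays of object centers (pose translations) with src[i]
# corresponding to dst[i]; the result maps src points onto dst points.
import numpy as np

def rigid_transform_from_points(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))         # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst_c - r @ src_c
    out = np.eye(4)
    out[:3, :3], out[:3, 3] = r, t
    return out
```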
(50) In operation 250, the characterization system 100 matches corresponding ones of the first poses in {P_{S1,i}} and the second poses in {Q_{S2,j}}, e.g., determines which first pose and which second pose correspond to the same object 22 in the arrangement 20.
(51) In some embodiments of the present disclosure, the characterization system 100 performs the matching between the first poses in {P_{S1,i}} and the second poses in {Q_{S2,j}} by first transforming the first poses in accordance with the coarse scene transformation T_coarse to compute coarsely transformed first poses
P_{S1,i} T_coarse
in accordance with one embodiment of the present disclosure.
(53) For example, for each coarsely transformed first pose P_{S1,i} T_coarse, the characterization system 100 may identify the closest second pose Q_{S2,j} (e.g., the second pose whose translation component is nearest to that of the coarsely transformed first pose) and greedily match the two poses as corresponding to the same object 22.
(54) In some embodiments of the present disclosure, the characterization system 100 performs the matching between the first poses in {P_{S1,i}} and the second poses in {Q_{S2,j}} based on intersections between 3-D models of the objects: for a coarsely transformed first pose and the closest second pose, a 3-D model of the corresponding type of object 22 is positioned at each of the two poses, and the poses are determined to match when the intersection between the two positioned 3-D models satisfies a false-positive threshold intersection.
(55) In some embodiments of the present disclosure, there may be mismatches in the poses. For example, the pose estimation system 10 may estimate poses for a different number of objects in the first scene S_1 versus the second scene S_2 or estimate poses for different objects (e.g., five objects A, B, C, D, and E in the first scene S_1 and five objects A, B, D, E, and F in the second scene S_2). These differences may be due, for example, to noise or instability in the pose estimation system 10 or asymmetries in the performance of the pose estimation system 10.
(56) In some embodiments of the present disclosure, instead of using a greedy search to perform matching of poses, a false-positive threshold approach is applied to match coarsely transformed first poses P_{S1,i} T_coarse with second poses Q_{S2,j}, in which a coarsely transformed first pose and the closest second pose are determined to match only when the distance between them is less than a false-positive threshold distance, such that poses of objects detected in only one of the two scenes remain unmatched rather than being matched incorrectly.
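A minimal sketch of nearest-neighbor matching with a false-positive distance threshold, assuming 4×4 numpy poses and the column-vector convention of the earlier sketches; the 3 mm default threshold is illustrative.

```python
# Sketch: match coarsely transformed first poses to second poses by nearest
# translation, rejecting pairs farther apart than a false-positive threshold.
import numpy as np

def match_poses(first_poses, second_poses, t_coarse, max_dist=0.003):
    """Return a list of (i, j) index pairs of matched first/second poses."""
    matches = []
    used = set()
    for i, p in enumerate(first_poses):
        p_transformed = t_coarse @ p                 # coarsely align scene 1 to scene 2
        center = p_transformed[:3, 3]
        dists = [np.linalg.norm(q[:3, 3] - center) for q in second_poses]
        j = int(np.argmin(dists))
        if dists[j] < max_dist and j not in used:    # false-positive threshold
            matches.append((i, j))
            used.add(j)
    return matches
```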
(57) After performing the matching, it is assumed that each matched first pose P_{S1,i} and second pose Q_{S2,i} correspond to the same object 22, as observed in the first scene S_1 and the second scene S_2, respectively.
(58) In operation 260, the characterization system 100 computes a refined scene transformation T_{S1S2} between the first scene S_1 and the second scene S_2 based on the coarse scene transformation T_coarse and the matched first poses and second poses. In some embodiments, the refined scene transformation is computed by finding the transformation T that minimizes a cost function over the matched poses:
(59)
T_{S1S2} = argmin_T Σ_i Σ_j ∥ (P_{S1,i} T) x_j − Q_{S2,i} x_j ∥
where x_j is a predefined set of points (e.g., [0,0,1], [0,1,0], and [1,0,0], although embodiments of the present disclosure are not limited thereto). If the points are set to [0,0,0], then this function is equivalent to a 3-D rigid body transform.
(61) In some embodiments of the present disclosure, the refinement process is an iterative operation (such as by applying gradient descent) to update the current rigid transformation T_current until the cost function is minimized (e.g., until a threshold condition has been met, such as reaching a set number of iterations or where the improvement from one iteration to the next is less than a threshold value), at which point the updated value of T_current is output as the refined scene transformation T_{S1S2}.
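A minimal sketch of the refinement step, parameterizing the scene transformation as a rotation vector plus translation and minimizing the point-based cost with scipy.optimize.least_squares, initialized at the coarse transformation. The specific choice of x_j points and the use of a generic least-squares solver (rather than hand-written gradient descent) are assumptions of this sketch, which again uses the column-vector convention.

```python
# Sketch: refine the scene transformation by minimizing the point-based cost
# over matched pose pairs, starting from the coarse scene transformation.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

X_POINTS = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=float)  # the x_j points

def _to_matrix(params):
    """6-vector (rotation vector, translation) -> 4x4 rigid transform."""
    m = np.eye(4)
    m[:3, :3] = Rotation.from_rotvec(params[:3]).as_matrix()
    m[:3, 3] = params[3:]
    return m

def refine_scene_transform(matched_first, matched_second, t_coarse):
    """Least-squares refinement of the scene transformation."""
    def residuals(params):
        t = _to_matrix(params)
        res = []
        for p, q in zip(matched_first, matched_second):
            for x in X_POINTS:
                xh = np.append(x, 1.0)
                # difference between the transformed first pose and the second
                # pose, each applied to the predefined point x_j
                res.extend((t @ p @ xh)[:3] - (q @ xh)[:3])
        return np.asarray(res)

    r0 = Rotation.from_matrix(t_coarse[:3, :3]).as_rotvec()
    params0 = np.concatenate([r0, t_coarse[:3, 3]])
    result = least_squares(residuals, params0)
    return _to_matrix(result.x)
```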
(62) Accordingly, in operation 260, the characterization system 100 computes a refined scene transformation T_{S1S2} between the first scene S_1 and the second scene S_2 based on the coarse scene transformation T_coarse, the first poses, and the second poses.
(64) In operation 270, the characterization system 100 characterizes the pose estimation system 10 based on the refined scene transformation T_{S1S2}, the matched first poses, and the matched second poses. In particular, each matched first pose P_{S1,i} is transformed in accordance with the refined scene transformation to compute a transformed first pose
P_{S1,i} T_{S1S2}
and a residual pose P_err between each transformed first pose and its corresponding second pose Q_{S2,i} is computed as
P_err = (P_{S1,i} T_{S1S2})^−1 Q_{S2,i}
(65) As such, following the approach of equations (3) and (4), the rotation error R_err and translation error T_err characterizing the error of a pose estimation system 10 may be computed as:
(66)
R_err = (1/n) Σ_i ∥ R((P_{S1,i} T_{S1S2})^−1 Q_{S2,i}) ∥
T_err = (1/n) Σ_i ∥ T((P_{S1,i} T_{S1S2})^−1 Q_{S2,i}) ∥
where, as above, the function R( ) converts its argument into an axis-angle representation where the magnitude is the rotation difference, the function T( ) extracts the translation component of the pose matrix from its argument, and n is the number of matched pairs of poses. In particular, R((P_{S1,i} T_{S1S2})^−1 Q_{S2,i}) is the rotational component of the residual pose for the i-th matched pair, and T((P_{S1,i} T_{S1S2})^−1 Q_{S2,i}) is its translational component.
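A minimal sketch of this final characterization step, averaging per-object rotation and translation errors over the matched pairs; it reuses rotation_error_rad and translation_error from the earlier sketch and assumes the column-vector convention used throughout these sketches.

```python
# Sketch: average rotation and translation error over matched pose pairs.
# matched_first / matched_second are aligned lists of 4x4 poses; t_refined is
# the refined scene transformation from the previous sketch.
import numpy as np

def characterize(matched_first, matched_second, t_refined):
    """Return (mean rotation error in degrees, mean translation error)."""
    rot_errs, trans_errs = [], []
    for p, q in zip(matched_first, matched_second):
        p_transformed = t_refined @ p              # first pose mapped into scene 2
        p_err = np.linalg.inv(p_transformed) @ q   # residual pose for this pair
        rot_errs.append(np.degrees(rotation_error_rad(p_err)))
        trans_errs.append(translation_error(p_err))
    return float(np.mean(rot_errs)), float(np.mean(trans_errs))
```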
(68) This procedure can be repeated across multiple pairs of scenes (e.g., multiple different arrangements of different objects, where the arrangements are rigidly transformed to produce pairs of scenes) to compute a variance, maximum, and expected value for the various pose error measurements for a particular pose estimation system. These values then allow the performance of different pose estimation systems to be compared against one another.
(69) In some experiments with approaches in accordance with embodiments of the present disclosure, a pose characterization system was used to accurately predict pose errors made by pose estimators to a precision of less than or equal to 30 microns in translation error T_err and less than or equal to 0.3 degrees in rotation error R_err. This enables the evaluation of whether such pose estimation systems are capable of performing to particular high-precision design constraints, such as a desired precision of less than 200 microns of translation error and less than 1 degree of rotation error at a distance of approximately 1 meter, whereas such high-precision measurements of the error characterization of pose estimation systems may otherwise have been impossible or expensive to implement.
(70) As such, aspects of embodiments of the present disclosure provide systems and methods for characterizing the performance (e.g., accuracy and precision) of pose estimation systems at a high level of precision without relying on an external source of ground truth.
(71) While the present invention has been described in connection with certain exemplary embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims, and equivalents thereof.