Machine vision-based method and system to facilitate the unloading of a pile of cartons in a carton handling system
11557058 · 2023-01-17
Assignee
Inventors
CPC classification
B25J9/1664
PERFORMING OPERATIONS; TRANSPORTING
International classification
Abstract
A machine vision-based method and system to facilitate the unloading of a pile of cartons within a work cell are provided. The method includes the step of providing at least one 3-D or depth sensor having a field of view at the work cell. Each sensor has a set of radiation sensing elements which detect reflected, projected radiation to obtain 3-D sensor data. The 3-D sensor data includes a plurality of pixels. For each possible pixel location and each possible carton orientation, the method includes generating a hypothesis that a carton with a known structure appears at that pixel location with that carton orientation to obtain a plurality of hypotheses. The method further includes ranking the plurality of hypotheses. The step of ranking includes calculating a surprisal for each of the hypotheses to obtain a plurality of surprisals. The step of ranking is based on the surprisals of the hypotheses.
Claims
1. A machine vision-based method to facilitate the unloading of at least one item supported on a pallet, the method comprising the steps of: providing at least one 3-D or depth sensor having a field of view, the at least one sensor having a set of radiation sensing elements which detect reflected, projected radiation to obtain 3-D sensor data, the 3-D sensor data including a plurality of pixels; for each possible pixel location and each possible item orientation, generating a hypothesis that an item with a known structure appears at that pixel location with that orientation to obtain a plurality of hypotheses; and ranking the plurality of hypotheses wherein the step of ranking includes calculating a surprisal for each of the hypotheses to obtain a plurality of surprisals and wherein the step of ranking is based on the surprisals of the hypotheses.
2. The method as claimed in claim 1, further comprising utilizing an algorithm to unload a plurality of items from the pallet.
3. The method as claimed in claim 1, wherein at least one of the hypotheses is based on print on the at least one item.
4. A machine vision-based system to facilitate the unloading of at least one item supported on a pallet, the system comprising: at least one 3-D or depth sensor having a field of view, the at least one sensor having a set of radiation sensing elements configured to detect reflected, projected radiation to obtain 3-D sensor data, the 3-D sensor data including a plurality of pixels; at least one processor configured to process the 3-D sensor data and, for each possible pixel location and each possible item orientation, generate a hypothesis that an item with a known structure appears at that pixel location with that item orientation to obtain a plurality of hypotheses; and the at least one processor configured to rank the plurality of hypotheses wherein ranking includes calculating a surprisal for each of the hypotheses to obtain a plurality of surprisals and wherein the ranking is based on the surprisals of the hypotheses.
5. The system as claimed in claim 4, wherein the at least one processor utilizes an algorithm to unload a plurality of items from the pallet.
6. The system as claimed in claim 4, wherein at least one of the hypotheses is based on print on the at least one item.
Description
DETAILED DESCRIPTION
(17) As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention that may be embodied in various and alternative forms. The figures are not necessarily to scale; some features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention.
(18) Preferably, one or more 3-D or depth sensors 32, each having a field of view, are provided at the work cell.
(19) “Multipoint” refers to a laser projector that projects thousands of individual beams (also known as pencils) onto a scene. Each beam intersects the scene at a point.
(20) “Disparity” refers to the method used to calculate the distance from the sensor to objects in the scene. Specifically, “disparity” refers to the way a laser beam's intersection with a scene shifts when the laser beam projector's distance from the scene changes.
(21) “Depth” refers to the fact that these sensors are able to calculate the X, Y and Z coordinates of the intersection of each laser beam from the laser beam projector with a scene.
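As general background (this is the standard triangulation relation of disparity-based ranging, not language from this patent), the depth recovered from disparity obeys:

```latex
% Standard triangulation relation (assumed background, not patent language):
% Z = depth, f = focal length, b = projector-to-camera baseline, d = disparity
Z = \frac{f \, b}{d}
```

Depth is thus inversely proportional to disparity: the nearer a surface, the more a projected point's image shifts.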
(22) “Passive Depth Sensors” determine the distance to objects in a scene without affecting the scene in any way; they are pure receivers.
(23) “Active Depth Sensors” determine the distance to objects in a scene by projecting energy onto the scene and then analyzing the interactions of the projected energy with the scene. Some active sensors project a structured light pattern onto the scene and analyze how the pattern is deformed; others emit light pulses and analyze how long the pulses take to return; and so on. Active depth sensors are both emitters and receivers.
(24) For clarity, each sensor 32 is preferably based on active monocular, multipoint disparity technology and is referred to as a “multipoint disparity” sensor herein. This terminology, though serviceable, is not standard. A preferred monocular (i.e., a single infrared camera) multipoint disparity sensor is disclosed in U.S. Pat. No. 8,493,496. A binocular multipoint disparity sensor, which uses two infrared cameras to determine depth information from a scene, is also preferred.
(25) Multiple volumetric sensors 32 may be placed in key locations around and above the piles or stacks of cartons 25.
(26) In general, sources of information are unified in the same framework so that they can be compared as commensurate quantities without using special parameters. This approach to image processing proceeds as follows: generate hypotheses; rank how well each hypothesis matches the evidence; then select the ‘best’ hypothesis as the answer. The approach is probabilistic in nature and is shown in the block diagram flow chart of the drawings.
(27) Take boxes or cartons as an example. What are the hypotheses? A box is located at some (h,v) position with some orientation r, so each hypothesis is a triple (h,v,r), and for every possible position and orientation a hypothesis is generated. To rank the hypotheses, one calculates for each hypothesis how improbable it is that the observed configuration arose by chance.
(28) Explanation of Calculation
(29) Focus on intensity in the image and look just at the edge (the strength of the edge). What is the probability that a pixel is that bright? One can focus on the brightest pixels and compute a probability histogram of the pixel intensities.
(30) One asks whether there is a pixel with brightness equal to or greater than this pixel's value, which is described by a cumulative distribution function (CDF) built from the cumulative probability histogram.
(31) Then the probability of observing a pixel with intensity greater than or equal to a given value g is 1-CDF(g).
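The following is a minimal sketch of this computation, assuming an 8-bit grayscale edge image held in a NumPy array; the function name and interface are illustrative, not taken from the patent:

```python
import numpy as np

def tail_probability(edge_image):
    """Per-pixel estimate of P(G >= g): the probability that a pixel drawn
    at random from the image is at least as bright as this one."""
    g = edge_image.astype(np.uint8).ravel()
    # Empirical probability histogram over the 256 intensity levels.
    hist = np.bincount(g, minlength=256) / g.size
    cdf = np.cumsum(hist)                                # CDF(g) = P(G <= g)
    # P(G >= g) = 1 - CDF(g - 1); prepend 0 so the g = 0 case is exact.
    survival = 1.0 - np.concatenate(([0.0], cdf[:-1]))
    return survival[edge_image.astype(np.uint8)]
```

Common (dim) pixels come out with a tail probability near one; rare bright pixels come out near zero.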
(32) Suppose one picks pixels from the image at random.
(33) What is the probability of observing a pixel this bright or brighter? It is given by the cumulative probability formula above.
(34) If one takes the multiplication over the box, one can hypothesize where the box is and at what orientation, doing this first for a single box. For each position and rotation in the image, one can assign a probability of the box being there. It requires a great deal of computation to obtain this number; the signal comes from the sharp edges of the box.
(35) What is the probability of seeing all the pixels in that box together by chance? Assuming independent probabilities, the probability of observing a collection of independent events together is the product of the probabilities of the individual events; Π (Pi) denotes this product. The quantity is the product of many very small numbers, and a very low value means the configuration will not occur by random chance.
(36) In symbols, P(H) = Π p_i, where the product runs over the pixels of the hypothesized box and p_i is the probability of observing pixel i by chance.
(37) One does not carry out all of the multiplications directly because they are numerically unstable. Instead, one calculates the surprisal, the negative log of the probability of the observation. Working with negative logs turns multiplication into addition, so one works in surprisals, which makes the computation more accurate and faster.
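Continuing the sketch above, a per-pixel surprisal map turns the unstable product into a stable sum; the epsilon guard against log(0) is an implementation detail assumed here, not taken from the patent:

```python
def surprisal_map(edge_image, eps=1e-12):
    """Per-pixel surprisal, -log P(G >= g): rare (bright) pixels carry a
    large surprisal, common pixels carry almost none."""
    return -np.log(tail_probability(edge_image) + eps)

def hypothesis_surprisal(smap, boundary_pixels):
    """Score a hypothesis by summing surprisals over the pixels on its
    hypothesized box boundary; the sum of logs replaces the unstable
    product of probabilities."""
    rows, cols = boundary_pixels
    return smap[rows, cols].sum()
```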
(38) One could just as easily have done this same thing working on the intensity image, volume, or any other feature like print, rather than the edge image. The algorithm does not care what features one is looking at.
(39) The distribution of the random variable G is found by observation. The algorithm is good enough that observation of a single image is sufficient, but by continuously updating the CDF as one goes, the performance of the algorithm improves.
(40) Using classical methods, if one were looking at the surface points of a pile of closely packed boxes, each feature would have to be weighed with its own specially tuned parameters.
(41) If one were to look at the corresponding gradient images, the same difficulty arises: classical methods offer no principled way to weigh one feature against another.
(42) Also, consider what happens if one uses grayscale gradients and then wants to add depth gradients as a second source of information. How does one add these quantities together? The classical approach to image processing has no answer for this question; the present approach has a ready answer: multiply the probabilities (add the surprisals). In this approach one could use a whole image, a depth image, a grayscale image, or any other image.
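Under the same assumptions as the earlier sketches, adding a second source of information is then literally addition; gray_edge_image and depth_edge_image are hypothetical inputs computed elsewhere:

```python
# Surprisal maps computed independently from two sources of evidence
# (gray_edge_image and depth_edge_image are assumed computed elsewhere).
gray_s = surprisal_map(gray_edge_image)
depth_s = surprisal_map(depth_edge_image)

# Multiplying the probabilities is the same as adding the surprisals,
# so the two channels combine without any tuning parameters.
combined = gray_s + depth_s
```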
(43) Algorithm 1: Box Likelihood Evaluation. Generate: for each pixel (h,v) and for each box orientation a = 0-359 degrees, generate a hypothesis H(h,v,a) that a box with dimensions L×W appears at that location with that orientation. Rank: compute the surprisal for each hypothesis and rank the hypotheses by surprisal; a bigger surprisal means the configuration is less likely to have arisen by chance, and is therefore a better box candidate. Select: take the best hypothesis (single box pick); a tie is very unlikely.
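One possible coarse-grid sketch of Algorithm 1, built on the helpers above; box_boundary is a hypothetical rasterizer for the box outline, and the stride and angle step are chosen only to keep the illustration tractable, not taken from the patent:

```python
def box_boundary(h, v, a_deg, L, W, shape):
    """Hypothetical helper: pixel coordinates of the boundary of an
    L-by-W box centered at (h, v) and rotated by a_deg degrees."""
    t = np.linspace(0.0, 1.0, 200)
    # Walk the four sides of the rectangle in its local frame.
    xs = np.concatenate([t * L, np.full_like(t, L), (1 - t) * L, np.zeros_like(t)]) - L / 2
    ys = np.concatenate([np.zeros_like(t), t * W, np.full_like(t, W), (1 - t) * W]) - W / 2
    c, s = np.cos(np.radians(a_deg)), np.sin(np.radians(a_deg))
    rows = np.clip(np.round(v + xs * s + ys * c).astype(int), 0, shape[0] - 1)
    cols = np.clip(np.round(h + xs * c - ys * s).astype(int), 0, shape[1] - 1)
    return rows, cols

def best_box(smap, L, W, step=4, angles=range(0, 360, 5)):
    """Generate hypotheses H(h, v, a), rank them by surprisal,
    and select the best one (single box pick)."""
    best = (-np.inf, None)
    for v in range(0, smap.shape[0], step):
        for h in range(0, smap.shape[1], step):
            for a in angles:
                score = hypothesis_surprisal(smap, box_boundary(h, v, a, L, W, smap.shape))
                if score > best[0]:
                    best = (score, (h, v, a))
    return best
```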
(44) Algorithm 2: Multibox Parse
(45) Decanting: in Multibox Parse, one does not perform the select phase, since one does not care about the single best box; one needs to know where all the boxes are. Visually, the surprisal hypotheses are represented in the surprisal matrices.
(46) For Single Box Pick, one simply picks the strongest hypothesis from the ranked list.
(47) For Multibox Parse, one must take the same information and find all the boxes: select a set of hypotheses that is consistent, meaning the boxes are disjoint and do not overlap, and whose sum of surprisals is maximal. This is an NP-hard problem: no polynomial-time solution is known, and a proposed optimum cannot even be verified in polynomial time. One can, however, find an approximate solution in polynomial time.
(48) One solves this with an approximation method such as the simulated annealing algorithm, but multiple methods for approximating the optimal answer will present themselves.
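A minimal simulated-annealing sketch for the multibox parse, assuming each candidate hypothesis carries a precomputed surprisal and that an overlaps(i, j) test is available; all names and the cooling schedule are illustrative, not from the patent:

```python
import math
import random

def multibox_parse(candidates, overlaps, iters=20000, t0=10.0):
    """Approximate the disjoint set of box hypotheses with maximal total
    surprisal. Assumed interface: candidates[i] has a .surprisal
    attribute and overlaps(i, j) tests whether boxes i and j overlap."""
    chosen, score = set(), 0.0
    for k in range(iters):
        temp = t0 * (1.0 - k / iters) + 1e-9           # linear cooling
        i = random.randrange(len(candidates))
        trial = set(chosen)
        if i in trial:
            trial.remove(i)                            # try dropping a box
        else:
            trial = {j for j in trial if not overlaps(i, j)}
            trial.add(i)                               # add i, evict overlaps
        new_score = sum(candidates[j].surprisal for j in trial)
        delta = new_score - score
        # Always accept improvements; accept worse moves with Boltzmann
        # probability so the search can escape local optima.
        if delta >= 0 or random.random() < math.exp(delta / temp):
            chosen, score = trial, new_score
    return chosen
```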
(49) Algorithm 3: Pallet Decomposition
(50) The algorithm for Pallet Decomposition ideally partitions a layer of boxes such that the number of partitions is minimal: one wants to empty the layer of boxes, picking multiple boxes at a time, in the minimum number of picks. A decomposition is a sequence of legal picks that empties a layer; an optimal decomposition is a decomposition with the minimum number of picks. Finding an optimal decomposition for a layer is NP-hard, so an approximation such as a branch-and-bound algorithm is used.
(51) A legal pick does not overlap existing boxes: the pick tool does not partially overlay any box.
(52) An illegal pick is one in which the tool partially overlaps one or more boxes.
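A sketch of the legality test, simplified to axis-aligned rectangles given as (x, y, width, height) tuples; the oriented boxes produced by the multibox parse would require an oriented-rectangle overlap test instead:

```python
def intersects(a, b):
    """Area of overlap between two axis-aligned rectangles (x, y, w, h)."""
    dx = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    dy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(dx, 0) * max(dy, 0)

def contains(tool, box):
    """True if the tool footprint fully covers the box."""
    return (tool[0] <= box[0] and tool[1] <= box[1]
            and tool[0] + tool[2] >= box[0] + box[2]
            and tool[1] + tool[3] >= box[1] + box[3])

def legal_pick(tool, boxes):
    """A pick is legal when every box the tool touches is covered entirely;
    partial overlap of any box makes the pick illegal."""
    return all(contains(tool, b) for b in boxes if intersects(tool, b) > 0)
```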
(53) Algorithm 4: Reading Printed Data. Look at the boxes on the outside of the pallet (especially boxes in corners) that have strong signals in both the edge grayscale gradient and depth gradient surprisal matrices. Boxes on the corners are identifiable from the depth and edge grayscale surprisal matrices alone. Once the corner boxes have been identified using the gradient information, one can ‘look at’ the print on the corner boxes; that is, one can segregate the visual data comprising the image of the print from the visual data generated by the edges of the boxes.
Features of at Least One Embodiment of the Invention
(54) 1) Calculate Maximum Likelihood through surprisal. Use surprisal to unify treatment of all sources of information.
(55) 2) Describe HDLS using multipoint disparity sensors such as Kinect sensors available from Microsoft Corporation. Since grayscale and depth are combined in a common probabilistic framework, it is important to ensure the steadiness of the distributions. One wants isolation from ambient illumination, so a source is found to overwhelm the ambient light. Efficiency is obtained by using the IR sensors twice: once for disparity and once for grayscale. Each sensor is configured to alternate between acquisition of disparity (depth) and grayscale information; thus the same hardware serves two purposes. The disparity sensors operate at the wavelength of a Fabry-Perot IR laser at 830 nm. LED and laser diode sources are commercially available at 850 nm but not at 830 nm, so a special source at 850 nm is used, along with a wide band-pass filter covering both 830 nm (disparity) and 850 nm (HDLS).
(56) 3) No training/eliminate training. The known structure of the boxes is used to eliminate a training session for machine learning (ML). Use of orthogonal projection allows one to treat all boxes the same: length, width, and depth are used for the grayscale and depth information. No matter how far away the boxes are or how they are oriented, with orthogonal projection one knows that it is a box without the need for training.
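A sketch of the idea, continuing the earlier sketches (NumPy as np) and assuming a calibrated point cloud in sensor coordinates; orthographic projection discards the perspective scaling that would otherwise make a box's apparent size depend on its distance:

```python
def orthographic_top_view(points, cell=0.005):
    """Project 3-D points (N x 3, meters) straight down onto the X-Y
    plane, keeping the highest Z per grid cell. Because the projection
    is orthographic, an L x W box occupies the same footprint no matter
    how far it is from the sensor."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    ij -= ij.min(axis=0)                     # shift indices to start at 0
    h = np.full(ij.max(axis=0) + 1, -np.inf)
    np.maximum.at(h, (ij[:, 0], ij[:, 1]), points[:, 2])
    return h
```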
(57) 4) Use the gradient image of the printed box as additional information to improve the likelihood of correctly identifying boxes on the interior of the pallet, which may not have a significant depth gradient because they are packed tightly together.
(58) The system includes vision-guided robots 21 and one or more cameras 32 having a field of view 30. The cameras 32 and the robots 21 may be mounted on support beams of a support frame structure of the system 10 or may rest on a base. One of the cameras 32 may be mounted on one of the robots 21 to move therewith.
(59) The vision-guided robots 21 have the ability to pick up any part within a specified range of allowable cartons using multiple-end-of-arm tooling or grippers. The robots pick up the cartons and orient them at a conveyor or other apparatus. Each robot 21 precisely positions self-supporting cartons on a support or stage.
(60) The robots 21 are preferably six-axis robots. Each robot 21 is vision-guided to identify, pick, orient, and present the cartons so that they are self-supporting on the stage. The grippers 17 accommodate multiple part families.
(61) Benefits of Vision-based Robot Automation include but are not limited to the following:
(62) Smooth motion in high speed applications;
(63) Handles multiple cartons in piles 25 of cartons;
(64) Slim designs to operate in narrow spaces;
(65) Integrated vision; and
(66) Dual end-of-arm tooling or grippers 17 designed to handle multiple carton families.
(67) The system also includes a master control station or system controller.
(68) In some embodiments, multiple cameras such as the cameras 32 can be situated at fixed locations on the frame structure at the station, or may be mounted on the arms of the robots 21. Two cameras 32 may be spaced apart from one another on the frame structure. The cameras 32 are operatively connected to the master controller via their respective image processors. The master controller also controls the robots of the system through their respective robot controllers. Based on the information received from the cameras 32, the master controller then provides control signals to the robot controllers that actuate the robotic arm(s) of the one or more robot(s) 21 used in the method and system.
(69) The master controller can include a processor and a memory on which is recorded instructions or code for communicating with the robot controllers, the vision systems, the robotic system sensor(s), etc. The master controller is configured to execute the instructions from its memory via its processor. For example, the master controller can be a host machine or a distributed system, e.g., a computer such as a digital computer or microcomputer, acting as a control module having a processor and, as the memory, tangible, non-transitory computer-readable memory such as read-only memory (ROM) or flash memory. The master controller can also have random access memory (RAM), electrically-erasable programmable read-only memory (EEPROM), a high-speed clock, analog-to-digital (A/D) and/or digital-to-analog (D/A) circuitry, and any required input/output circuitry and associated devices, as well as any required signal conditioning and/or signal buffering circuitry. Therefore, the master controller can include all software, hardware, memory, algorithms, connections, sensors, etc., necessary to monitor and control the vision subsystem, the robotic subsystem, etc. As such, a control method can be embodied as software or firmware associated with the master controller. It is to be appreciated that the master controller can also include any device capable of analyzing data from various sensors, comparing data, and making the necessary decisions required to control and monitor the vision subsystem, the robotic subsystem, sensors, etc.
(70) An end effector on the robot arm may include a series of grippers supported to pick up the cartons. The robotic arm is then actuated by its controller to pick up the cartons with the particular gripper, positioning the gripper 17 relative to the cartons using the determined location and orientation from the visual position and orientation data of the particular vision subsystem including its camera and image processor.
(71) In general, the method and system of at least one embodiment of the present invention search for objects such as boxes or cartons, which have high variability in shape, size, color, printing, barcodes, etc. There are many differences between objects, even of the same type, and one needs to determine the locations of boxes that may be jammed very close together without much discernible feature. The method combines 2-D and 3-D imaging (grayscale and depth) to individuate the objects. The objects may all “look” the same to a human, but there is high variability between each assembled box or carton.
(72) While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms of the invention. Rather, the words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention. Additionally, the features of various implementing embodiments may be combined to form further embodiments of the invention.