SYSTEM AND METHOD FOR AUTONOMOUS MULTI-BIN PARCEL LOADING SYSTEM
20210253376 · 2021-08-19
Assignee
Inventors
- Aniruddha SINGHAL (Noida, IN)
- HARSHAD KHADILKAR (Thane West, IN)
- Venkat Raju Chintalapalli Patta (Bangalore, IN)
- Deepak RAINA (Noida, IN)
- Venkatesh Srinivas PRASAD (Bangalore, IN)
- Shivam THUKRAL (Noida, IN)
- Rajesh SINHA (Noida, IN)
- Richa VERMA (Noida, IN)
Cpc classification
G06Q10/087
PHYSICS
G06N5/01
PHYSICS
G06Q10/04
PHYSICS
International classification
Abstract
State of the art automated bin packing systems fail to handle dynamic scenarios in which information on dimensions of objects to be loaded is not available in advance. These systems also fail to consider capabilities of robots used for the automated packing of objects/bins. The disclosure herein generally relates to automated bin packing, and, more particularly, to a system and method for autonomous multi-bin parcel loading system. The system handles an online object packing in which information on dimensions of objects to be loaded is not available in advance. The system is also configured to consider capabilities of one or more robots used for loading objects to containers, while generating recommendations for object packing.
Claims
1. A processor implemented method of object packing, comprises: obtaining a current state of a first Long Distance Container (LDC) in which an object is to be packed, via one or more hardware processors; obtaining dimensions of the object to be packed, via the one or more hardware processors; determining a plurality of locations in the first LDC where the object can be placed, for different orientations of the object, via the one or more hardware processors; determining a plurality of future states of the first LDC, for placement of the object in the plurality of the determined locations for the different orientations of the object, via the one or more hardware processors; determining whether any of the plurality of future states is a state that satisfies at least one pre-defined condition with respect to packing of objects, via the one or more hardware processors; determining orientation and location associated with the future state that has been determined as satisfying the at least one pre-defined condition, via the one or more hardware processors; and loading the object to the first LDC, based on the determined orientation and location, via the one or more hardware processors.
2. The method as claimed in claim 1, wherein determining the plurality of locations in the first LDC where the object can be placed comprises determining all the locations in the first LDC where the object can fit in, by considering a) dimension feasibility of the object, b) capability of a robotic arm that is being used for loading the object to the first LDC, and c) stability of a stack of objects in the first LDC, for the different orientations of the object.
3. The method as claimed in claim 1, wherein the step of determining the orientation and location for loading the object is fine-tuned based on a feedback mechanism, comprising: obtaining a feedback data after loading each object to the first LDC based on the determined orientation and location; and fine-tuning the step of making decision with respect to placement of the object in a particular location and the orientation of the object, based on the obtained feedback data.
4. The method as claimed in claim 1, wherein the method comprises recurringly performing the steps of opening a new LDC and determining the orientation and location in the new LDC where the object can be placed, if none of the plurality of future states determined for the first LDC satisfies the pre-defined condition, till a future state satisfying the pre-defined condition is identified.
5. A system for object packing, comprises: one or more hardware processors; a communication interface; and a memory storing a plurality of instructions, wherein the plurality of instructions when executed, cause the one or more hardware processors to: obtain a current state of a first Long Distance Container (LDC) in which an object is to be packed; obtain dimensions of the object to be packed; determine a plurality of locations in the first LDC where the object can be placed, for different orientations of the object; determine a plurality of future states of the first LDC, for placement of the object in the plurality of the determined locations for the different orientations of the object; determine whether any of the plurality of future states is a state that satisfies at least one pre-defined condition with respect to packing of objects; determine orientation and location associated with the future state that has been determined as satisfying the at least one pre-defined condition; and load the object to the first LDC, based on the determined orientation and location.
6. The system as claimed in claim 5, wherein the system determines the plurality of locations in the first LDC where the object can be placed by determining all the locations in the first LDC where the object can fit in, for the different orientations of the object.
7. The system as claimed in claim 5, wherein the system fine-tunes the step of determining the orientation and location for loading the object, based on a feedback mechanism, comprising: obtaining a feedback data after loading each object to the first LDC based on the determined orientation and location; and fine-tuning the step of making decision with respect to placement of the object in a particular location and the orientation of the object, based on the obtained feedback data.
8. The system as claimed in claim 5, wherein the system opens a second LDC and determines the orientation and location in the second LDC where the object can be placed, if none of the plurality of future states determined for the first LDC satisfies the pre-defined condition.
9. A computer program product comprising a non-transitory computer readable medium having a computer readable instructions embodied therein, wherein the computer readable instructions, when executed, cause to perform an object packing by: obtaining a current state of a first Long Distance Container (LDC) in which an object is to be packed, via one or more hardware processors; obtaining dimensions of the object to be packed, via the one or more hardware processors; determining a plurality of locations in the first LDC where the object can be placed, for different orientations of the object, via the one or more hardware processors; determining a plurality of future states of the first LDC, for placement of the object in the plurality of the determined locations for the different orientations of the object, via the one or more hardware processors; determining whether any of the plurality of future states is a state that satisfies at least one pre-defined condition with respect to packing of objects, via the one or more hardware processors; determining orientation and location associated with the future state that has been determined as satisfying the at least one pre-defined condition, via the one or more hardware processors; and loading the object to the first LDC, based on the determined orientation and location, via the one or more hardware processors.
10. The computer program product as claimed in claim 9, wherein determining the plurality of locations in the first LDC where the object can be placed comprises determining all the locations in the first LDC where the object can fit in, by considering a) dimension feasibility of the object, b) capability of a robotic arm that is being used for loading the object to the first LDC, and c) stability of a stack of objects in the first LDC, for the different orientations of the object.
11. The computer program product as claimed in claim 9, wherein the step of determining the orientation and location for loading the object is fine-tuned based on a feedback mechanism, comprising: obtaining a feedback data after loading each object to the first LDC based on the determined orientation and location; and fine-tuning the step of making decision with respect to placement of the object in a particular location and the orientation of the object, based on the obtained feedback data.
12. The computer program product as claimed in claim 9, wherein the bin packing comprises recurringly performing the steps of opening a new LDC and determining the orientation and location in the new LDC where the object can be placed, if none of the plurality of future states determined for the first LDC satisfies the pre-defined condition, till a future state satisfying the pre-defined condition is identified.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles.
[0011]
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
DETAILED DESCRIPTION
[0019] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following claims.
[0020]
[0021] The communication interface(s) 103 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the communication interface(s) 103 can include one or more ports for connecting a number of devices to one another or to another server.
[0022] The memory 101 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more components (not shown) of the system 100 can be stored in the memory 101. The memory 101 is configured to store a plurality of operational instructions (or ‘instructions’) which when executed cause one or more of the hardware processor(s) 102 to perform various actions associated with the process of automated bin packing being performed by the system 100. The system 100 can be implemented in a variety of ways as per requirements. Various steps involved in the process of the automated bin packing being performed by the system 100 of
[0023]
[0024] The conveyor belt 301 is used to transport/carry the cuboidal parcels of arbitrary dimensions during the parcel loading. The cuboidal parcels are alternately referred to as objects or bins in the packing scenario. The plurality of sensors 302 are configured to measure values of various parameters such as but not limited to size of each of the incoming boxes along all three dimensions, and their weights. The robot arm 306 is configured to load the parcel into the container, which involves the robot arm 306 picking up each object and placing inside the container.
[0025] The system 100 is configured to generate suggestions/recommendations in terms of location in the container where each of the objects can be placed, and orientation of each of the objects while placing at the suggested location. The system 100 generates the recommendations for placing each of the objects by executing the sequence of steps in method 200, which are explained below.
[0026] The method 200 used by the system 100 for generating the bin packing recommendations is based on reinforcement learning approach. In an embodiment, the packing recommendations are generated such that location suggestion for each object coincides with corner locations. In another embodiment, the recommendations also depend on constraints imposed by the hardware system i.e. of the robotic arm used for picking and placing the objects in the containers. A few examples of such constraints are, but not limited to, gripping capability, accuracy, and degrees of freedom of robotic arm. Taking the capabilities/constraints of the robotic arm into consideration while generating the recommendations, helps ensure that the robotic arm can execute/perform the picking and placing of the objects as per the recommendations. For example, in a scenario in which the robotic constraints are not considered while generating the recommendations, the robotic arm may fail to execute the picking and placing of the objects as per the recommendations, as the recommendations may require the robotic arm to, for example, tilt the object by certain degrees, which the robotic arm may not be able to do as it does not have capability to perform the tilting of the object, at least in to the extent specified (in terms of degree) in the recommendation.
[0027] At step 202, the system 100 obtains a current state of a container in which the objects are to be packed. In an embodiment, multiple containers may be available at any instance for packing the objects, and one of the available containers is considered for object loading at a time. The container that is initially considered for object loading is referred to as ‘first LDC’ for explanation purpose. At step 204, the system 100 obtains dimensions of an object to be packed. In an embodiment, multiple objects may be loaded on the conveyor belt at once. Out of the multiple objects, the system 100 may be configured to consider objects on a First Come First Serve basis i.e. the object that is on the front line of the conveyor belt at any instance is considered by the system 100 as the object to be packed. The dimensions obtained by the system 100 may contain information on Length, Width, and Breadth (L, B, W) of the object. The system 100 uses one or more of the plurality of sensors 302 to measure and obtain the dimensions of the object.
[0028] Further, at step 206, the system 100 determines all locations in the first LDC where the object can be placed. The system 100 identifies the locations by comparing the obtained dimensions of the object with dimensions of different slots available in the first LDC. Each time an object is placed in one of the available locations/slots in the first LDC, state of the first LDC changes. At step 208, the system 100 determines future states of the first LDC for placement of the object in each of the locations identified at step 206, for different orientations of the object i.e. the future state is identified for each location-orientation pair. Method of generating the state (state representation) by the system 100 is explained in reference to
[0029] As depicted in
[0030] While using the RL mechanism, in order to keep an RL agent independent of the size of the input, container state is encoded into a fixed-size representation x.sup.−. In various embodiments, this representation could be learnt by an autoencoder, or one can directly use pooling functions. During the experiment, three different pooling functions i.e. (i) average pooling, (ii) max pooling, and (iii) the difference between max pooling and min pooling, were considered to reduce any input size to vectors of size 144 each. The vector x.sup.− of size 3*144=432 is expected to indicate to the RL agent, an average height of the underlying receptive fields, as well as its smoothness. Step size of pooling ensures disjoint receptive fields for each element of the vector. In addition, two more input channels also are defined.
[0031] A vector y.sup.− is used to encode height of bordering cells of each of the selected locations in the box, in order to indicate how well the object fits with surrounding cells when placed in the identified location. A size of 144 units was used for this representation, with borders smaller than 144 units (depending on perimeter of the box) padded by trailing zeroes, and borders larger than 144 units populated using constant-skip sampling. A vector z.sup.− was used as a one-hot encoding of the receptive field that each of the selected locations belongs to. A Deep Q network (DQN) was used to process this information, wherein (x.sup.−,y.sup.−,z.sup.−) forms input to the DQN. The DQN processes the collected inputs and determines the future states of the first LDC for the placement of the object, for different orientations of the object.
[0032] At step 210, the system 100 determines if any of the determined future states is a state that satisfies a pre-defined condition with respect to object packing. In an embodiment, the pre-defined condition is in terms of storage space optimization and the condition is specified with an intention of reducing storage space consumption such that maximum objects can be placed in the LDC container. The system 100 determines an expected storage space utilization resulting from the placement of each object in each location, while considering possible characteristics of future objects to be packed. The system 100 may use any suitable technique, such as but not limited to a Q-Learning algorithm, to determine the expected storage space utilization, and/or any other parameter that forms the pre-defined condition. If any of the future states is determined as satisfying the pre-defined condition, at step 212, then at step 214 the system 100 determines orientation of the object and location/slot in the container where the object needs to be placed, based on location and orientation associated with the future state that has been identified as satisfying the pre-defined condition. Further, at step 216, the object is loaded to the first LDC based on the determined orientation and location.
[0033] In another embodiment, if more than one of the plurality of future states satisfy the pre-defined condition, then the system 100 may choose a future state having highest value of compatibility score. In an embodiment, the compatibility score of a future state represents extent to which the future state is matching the pre-defined condition. If none of the future states of the first LDC is satisfying the pre-defined condition, then at step 218 the system 100 selects next LDC and repeats steps from 202 through 216 for the LDC that has been newly selected, till a location and orientation for loading the object is determined. This process is repeated for all the objects.
[0034] As the RL based approach is used by the system 100, feedback for each recommendation generated by the system 100 is obtained, and this feedback is used to fine-tune recommendations that are generated at future instances. The system 100 uses a terminal reward function represented as:
r.sub.t=ρ.sup.N−tζ (2) [0036] where ζ=0.99 is a discount factor for the terminal reward function. The entire episode is inserted into a replay buffer after completion and computation of the rewards for each step. The logic for using this reward definition are: (i) to provide a reward based on the results of the entire sequence of decisions/recommendations, and (ii) to speed up training by avoiding the sparse nature of pure terminal reward approaches.
[0037] The system 100 carries out the training of the RL model with a mean squared error loss with respect to the relation in (3):
Q(s.sub.t, α.sub.t)=(1−γ)r.sub.t+γQ(s.sub.t+1, α.sub.t+1) (3) [0038] Where, left-hand side part of (3) is network output and the right-hand side part is target value, produced by a target network (cloned from the online network after every 10 training runs). During the experiments conducted, the training was carried out after every episode with a batch size of 256 (the approximate number of boxes per episode in the training data).
[0039] The steps in method 200 may be performed in the same order as depicted in
[0040] Experimental Results:
[0041] During the experiment conducted, the system 100 was trained using synthetically generated data sets, containing boxes of randomly generated dimensions, while making sure that the dimensions match up such that each container can be filled completely (100% fill fraction). Each data set consisted of 10 containers worth of boxes (Opt(I)=10), with the number of boxes ranging between 230 and 370 per episode. The order of upcoming boxes is not known to the system 100, apart from n=2 boxes after the current one. The episode terminates when all boxes are packed, or if the algorithm is unable to pack all boxes in a maximum of 16 containers. The terminal reward (2) is computed based on the number of containers required in each episode.
Training:
[0042] During the training, improvement in packing efficiency was verified using a ε-greedy exploration, with ε decreasing linearly from 1 to 0 in 1000 episodes. The algorithm was trained for the last 1000 episodes without exploration, because it was found during the experiments that random decisions early in an episode greatly affect the terminal reward, significantly slowing down the training. The initial packing efficiency of approximately 65% improved steadily to 82% over 1100 episodes, and stayed stable afterwards. The number of bins used decreased from just above 16 to just under 13.
TABLE-US-00001 TABLE 1 Comp. Time per Avg. Best Algorithm ratio c box (sec) pack pack AH 1.58 — — — Floor building 1.52 0.0002 .sup. 81% 05% Column build 1.46 0.0001 .sup. 81% 06% First Fit 1.47 0.0002 81.3% 07% WallE 1.41 0.0106 81.8% 25% System 100 1.29 0.0342 82.8% 57%
[0043] Table 1 compares performance of the method used by system 100 with a few state of the art techniques on various parameters including competitiveness ratio metric (c), time taken per loading decision, average packing efficiency, and fraction of test instances in which a given method returned the best packing efficiency. Advanced Harmonic (AH) is known to have a theoretical upper bound of c=1:58, although this is with unconstrained rotation. Empirical results for robot-stackable algorithms show that the method 200 has best empirical ratio of 1.29, averaging T.sub.used=12.9 bins compared to Opt(I)=10. It also has the highest average packing fraction. While the difference in packing fractions is small, further experiments revealed that this was because there was significant variation among the instances, with some box streams favoring one approach/method over the others. The fact that the method 200 returns the best efficiency in 57% of test cases implied that it retained a significant advantage over other algorithms across a variety of instances.
[0044] Difference between these approaches is more evident in the box whiskers plot depicted in
[0045] Goal of the method 200 includes training an RL that works without retraining on other problem instances, whether generated using a different distribution of box dimensions, or a different scale of inputs. During the experiments, the packing efficiency was plotted (
Results when system 100 was deployed in real-world scenario:
[0046] To monitor and assess performance of the method 200, the system 100 was deployed on a real-time scenario. Most critical challenge in deployment was updating the method 200's belief state according to the containers' actual configuration, which are subject to sensor and actuator noise. Because of this, the objects may not get placed at the exact location dictated by the method 200.
[0047] It was observed that the objects collide inside the container during placement due to these errors. To avoid the collision grid size was increased, leaving a space of around 1-2 cm between boxes. Eliminating this space is possible only when the method 200 is updated using millimeter-level accurate real-time measurements. Another challenge was estimation of box orientation and dimension while picking, as any error at this point is carried forward and causes differences.
[0048] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[0049] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[0050] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.