COMPUTER-IMPLEMENTED METHOD, APPARATUS FOR DATA PROCESSING, AND COMPUTER SYSTEM FOR CONTROLLING A CONTROL DEVICE OF A CONVEYOR SYSTEM

20240140724 · 2024-05-02

    Abstract

    A computer-implemented method, a device for data processing and a computer system for controlling a control device of a conveyor system to achieve an alignment and/or a defined spacing of piece goods, wherein the control of the control device is determined by an agent acting according to Reinforcement Learning methods. An individual, local state vector of predefined dimension that is the same for all the piece goods is created for each of the piece goods and an action vector is selected from an action space according to a strategy that is the same for all piece goods for the current state vector of this piece good. These action vectors are projected onto the conveying elements, wherein conflicts are resolved. After a cycle time has elapsed, state vectors are created again for each piece good and evaluated with rewards and the strategy is adjusted.

    Claims

    1.-15. (canceled)

    16. A computer-implemented method for controlling a control device of a conveyor system for transporting piece goods of at least one type, including mail items and pieces of luggage, wherein the conveyor system has a plurality of conveyor elements aligned along and parallel to a conveying direction, the conveyor elements being driven, under control of the control device, by a respectively assigned drive at an individually adjustable velocity to achieve an alignment and/or a defined spacing of the piece goods, wherein the activation of the control device is determined by at least one agent acting according to, or predetermined by, methods of Reinforcement Learning, which agent, in accordance with a strategy, situationally selects an action from an action space for an initial state in order to reach a subsequent state, wherein the initial state and the subsequent state are mappable with state vectors and the actions are mappable with action vectors, the method comprising the process steps: a) creating an initial image of the conveyor system; b) for each of the piece goods on the image, individually creating a state vector of predetermined dimension, the same for all piece goods of one type, comprising state information of the respective piece good taken from the immediately previously created image; c) for each piece good, individually selecting an action vector of predetermined dimension from an action space according to the strategy, which is the same for all piece goods of one type, for the current state vector of this piece good; d) for each piece good, mapping the action vector onto the conveying elements of this piece good to determine the velocity of these conveying elements, and correspondingly controlling the conveying elements with the control device; e) after a cycle time has elapsed, creating a subsequent image of the conveyor system and performing process step b) to obtain a state vector of the subsequent state for each piece good; f) if the strategy for piece goods of one type is to be trained further during the execution of the method, evaluating the state vector of the subsequent state for each piece good of this type by a method of Reinforcement Learning on the basis of a reward, whereupon the agent trains and thus optimizes the strategy for piece goods of this type by adjusting the action vectors of the action space; and g) for each piece good, carrying out the process steps c)-f) again using the improved or predetermined strategy as long as the piece good concerned is shown on the subsequent image.

    17. The method according to claim 16, further comprising the step of assigning the piece goods on the image to a first and at least one further type depending on properties of the piece goods, and, for each assigned type, providing an agent with a strategy for piece goods of this type.

    18. The method according to claim 17, further comprising the step of determining, for each cycle time and for each piece good, the velocities of those conveying elements on which the piece good rests but onto which no action vector of this piece good has been mapped, and correspondingly individually controlling these conveying elements with the control device, wherein the velocities are determined by interpolation of the velocities of those adjacent conveying elements onto which an action vector of this piece good has been mapped.

    19. The method according to claim 18, further comprising the step of determining, for each cycle time, the velocities of all those conveying elements on which no piece good rests and onto which no action vector of a piece good has been mapped, and correspondingly individually controlling just these conveying elements with the control device, wherein: the velocities are determined by interpolation of the velocities of those adjacent conveying elements onto which an action vector of a piece good has been mapped; and/or the velocities are determined on the basis of velocity parameters of the conveyor system; and/or the velocity of a conveying element onto whose adjacent conveying element the action vector of a piece good has been mapped is selected to match the velocity of that adjacent conveying element; and/or the velocities for some or all of these conveying elements are identical and are determined from the mean value of the velocities of the conveying elements onto which an action vector of a piece good has been mapped.

    20. The method according to claim 19, wherein the state information of a piece good mapped in the state vector comprises position and/or orientation of the piece good.

    21. The method according to claim 20, wherein the state information of a piece good mapped in the state vector or otherwise comprises: overlap of the piece good with those conveyor elements on which the piece good rests; and/or state information of a predetermined number of nearest adjacent piece goods within a predetermined distance, at least comprising their position and/or distance to the piece good of the state vector, wherein, in the case of fewer than the predetermined number of nearest adjacent piece goods, the state vector is assigned default values; and/or velocity and/or size of the piece good; and/or global state information of the conveyor system comprising a number of piece goods on the conveyor system, average velocity of the conveyor system, and prioritization of individual piece goods, for example based on size and/or a sorting criterion.

    22. The method according to claim 21, wherein the action vectors describe only velocities that lie under predetermined points or surface areas of the piece good.

    23. The method according to claim 16, wherein, if the action vectors assigned to two or more piece goods are mapped onto the same conveying element, prioritization and/or weighted averaging of the velocities specified by the action vectors is carried out as a function of the respective overlap of these piece goods with this conveying element and/or of a quality of the state vectors; and/or, if two elements of the action vector of a piece good are mapped onto the same conveying element, this conveying element is controlled with a mean value of these elements or one of the elements is given full or weighted preference.

    24. The method according to claim 16, wherein the image is evaluated with image processing methods and the state vectors are created based on the evaluated image.

    25. The method according to claim 16, wherein a first generation attempt of the state vectors is performed automatically via Deep Reinforcement Learning from the image.

    26. The method according to claim 16, further comprising the step of training the strategy of the agent for piece goods of a kind with a virtual or real conveyor system.

    27. The method according to claim 16, further comprising the step of determining, for each cycle time and for each piece good, the velocities of those conveying elements on which the piece good rests but onto which no action vector of this piece good has been mapped, and correspondingly individually controlling these conveying elements with the control device, wherein the velocities are determined by interpolation of the velocities of those adjacent conveying elements onto which an action vector of this piece good has been mapped.

    28. The method according to claim 16, further comprising the step of determining, for each cycle time, the velocities of all those conveying elements on which no piece good rests and onto which no action vector of a piece good has been mapped, and correspondingly individually controlling just these conveying elements with the control device, wherein: the velocities are determined by interpolation of the velocities of those adjacent conveying elements onto which an action vector of a piece good has been mapped; and/or the velocities are determined on the basis of velocity parameters of the conveyor system; and/or the velocity of a conveying element onto whose adjacent conveying element the action vector of a piece good has been mapped is selected to match the velocity of that adjacent conveying element; and/or the velocities for some or all of these conveying elements are identical and are determined from the mean value of the velocities of the conveying elements onto which an action vector of a piece good has been mapped.

    29. The method according to claim 16, wherein the state information of a piece good mapped in the state vector comprises position and/or orientation of the piece good.

    30. The method according to claim 16, wherein the state information of a piece good mapped in the state vector or otherwise comprises: overlap of the piece good with those conveyor elements on which the piece good rests; and/or state information of a predetermined number of nearest adjacent piece goods within a predetermined distance, at least comprising their position and/or distance to the piece good of the state vector, wherein, in the case of fewer than the predetermined number of nearest adjacent piece goods, the state vector is assigned default values; and/or velocity and/or size of the piece good; and/or global state information of the conveyor system comprising a number of piece goods on the conveyor system, average velocity of the conveyor system, and prioritization of individual piece goods, for example based on size and/or a sorting criterion.

    31. The method according to claim 16, wherein the action vectors describe only velocities that lie under predetermined points or surface areas of the piece good.

    32. A device for data processing for computer-implemented control of the control device of the conveyor system for transporting the piece goods of at least one type, wherein the plurality of conveyor elements are aligned along and parallel to a conveying direction and are driven, under control of the control device, by a respectively assigned drive with individually adjustable velocity in order to achieve an alignment and/or a defined spacing of the piece goods, wherein the control of the control device is determined by at least one agent which acts according to the Reinforcement Learning methods and which, in accordance with a strategy for piece goods of one type, situationally selects an action from the action space for the initial state in order to reach the subsequent state, wherein the initial state and the subsequent state can be mapped with the state vectors and the actions can be mapped with the action vectors, wherein the piece goods on the conveyor system are detected by at least one sensor and the control device comprises a computing unit that is configured to carry out the method according to claim 16.

    33. A conveying system for transporting the piece goods of at least one type, wherein the conveying elements are driven, under the control of the control device, by a respectively associated drive at an individually adjustable velocity in order to achieve an alignment and/or a defined spacing of the piece goods, wherein the control of the control device is determined by the agent acting according to methods of Reinforcement Learning, which agent, in accordance with a strategy which is the same for all piece goods of one type, situationally selects the action from the action space for the initial state in order to reach the subsequent state, wherein the initial state and the subsequent state are represented with the state vectors and the actions are represented with the action vectors, and wherein the conveyor system comprises a device according to claim 32.

    34. A computer program comprising instructions which, when executed by the computing unit connected to the conveyor system according to claim 33, cause the method according to claim 16 to be carried out.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0035] Embodiments of the invention are explained in more detail below, by way of example, with reference to the figures, in which:

    [0036] FIG. 1A shows a top view of a conveyor system;

    [0037] FIGS. 2A-2C show a selection of possible arrangements of the conveying elements;

    [0038] FIG. 3 is a flowchart for determining the action vector;

    [0039] FIG. 4 shows the principle of a Reinforcement Learning system;

    [0040] FIG. 5 shows a piece good with corner points and estimated center of gravity; and

    [0041] FIG. 6 exemplifies certain velocities of the conveying elements on which a piece good rests.

    DETAILED DESCRIPTION

    [0042] FIG. 1 shows a corresponding conveyor system 2 that transports piece goods 4 along a main conveying direction 6 on a conveying line 8, the piece goods resting on conveying means 12; a typical field of application is as a singulator 2 in the postal and logistics sector. The conveying means 12 are arranged parallel to the main conveying direction 6 in segments 10 arranged one behind the other along the main conveying direction 6, aligned along a line. The piece goods 4 are transferred for transport from one segment 10 to the respective following segment 10 and lie on several conveying means 12 at the same time; they can therefore be singulated and/or rotated during their transport by individual control of the conveying means 12 by a control device not shown here, for example by operating the conveying means 12 on which the respective piece good 4 lies at a higher conveying velocity 16 than the adjacent conveying means 12. For this purpose, the control device comprises a computing unit, likewise not shown in the figure. The conveyor system 2 comprises a plurality of sensors 26 arranged above and along the conveyor path and designed as optical detectors, but in principle other types of sensors can also be used as long as the computing unit is able to generate the state vectors of the piece goods 4 on the basis of the sensor input. A single sensor 26 can in principle already be sufficient if the viewing angle is good.

    [0043] The conveyor system 2 is subdivided into segments 18, 20, 22, 24 performing essentially different tasks along the main conveying direction 6. First, on an expansion device 18, an attempt is made to achieve an expansion of the piece goods distribution on the basis of the arrangement of the conveying elements 12. Subsequently, transport along the main conveying direction 6 is performed solely on a transfer conveyor 20. The transfer conveyor 20 comprises two segments 10b, 10c, each of which comprises only a single conveying means 12 spanning the entire width of the conveyor line 8. For a particularly efficient correction of the alignment, the segments 10d-10h, or their conveying means 12, are relatively short in the alignment section 22.

    [0044] For a particularly efficient correction of the distance, the segments 10d-10h, or their conveying means 12, in the distance correction section 24 are longer than those of the alignment section 22. It is possible to divide the sections 22, 24 of the conveyor system 2 into sub-conveyor systems with different strategies (higher reward for good alignment in the alignment section 22 or for well-adjusted distances in the distance correction section 24), so that in each case a strategy optimized or optimizable for the respective section 22, 24 is used. However, this procedure of dividing into different sections 22/24 is mainly suitable for conveyor systems 2 which operate without methods of Reinforcement Learning. According to one embodiment, a reward is also awarded on the basis of a comparison of the state vectors of the initial and subsequent states S.sub.n(t), S.sub.n(t+Δt) in order to achieve an even better and faster optimization of the strategy.

    [0045] The optimal control behavior of the control device of the conveyor system 2 is machine-learned by means of Reinforcement Learning (FIG. 4). Here, an agent interacts with the environment, which can be either a concrete plant such as the conveyor system 2, its simulation/digital twin, or a data-driven learned model (surrogate model) of the plant 2 or of the simulation. The actions used to influence the environment are the velocities v of all conveyor elements 12 (e.g., conveyor belts) and are represented as available motor actions in action vectors a.sub.n(t) with typically lower dimensionality than the number of conveyor elements 12. Observations available to the agent as input data are images of the conveyor system, in particular based on cameras 26 and/or other sensor data, and are represented in state vectors S.sub.n(t). If the state vector S.sub.n(t) of a piece good 4 already has the desired orientation and sufficient distance to its adjacent piece goods 4, the action vector will map a simple onward transport in conveying direction 6. The behavior of the agent is optimized based on a reward signal, which describes the goodness of the current situation. Essentially, the goodness is determined by the position/orientation and the mutual distances of the packages 4. For example, the reward value is high if the packages 4 have a defined target distance from each other and lie at a certain angle on the conveyor system 2 or its conveyor elements 12. Furthermore, power consumption, lifetime consumption, noise emissions, etc. can also be considered in the reward.
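
    Such a reward signal can be sketched as a simple function of spacing and orientation. The exponential form, the weights, the target gap of 0.5 m, and the target angle of 0° are illustrative assumptions, not taken from the disclosure:

```python
import math

def reward(gap: float, angle_deg: float,
           target_gap: float = 0.5, w_gap: float = 1.0, w_angle: float = 1.0) -> float:
    """Illustrative reward: highest when a piece good sits at the target
    distance to its neighbour and at the target orientation (here assumed
    0 degrees to the conveying direction); it decays as either deviation grows."""
    gap_err = abs(gap - target_gap)      # spacing deviation in metres
    angle_err = abs(angle_deg) / 180.0   # orientation deviation, normalised
    return math.exp(-w_gap * gap_err - w_angle * angle_err)
```

    Penalty terms for power consumption or noise emissions, as mentioned above, could be subtracted inside the exponent in the same way.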

    [0046] Since the methods of Reinforcement Learning, in particular those using a neural network or a recurrent neural network, including determination of the system model, are known, a more detailed description is omitted here. The common methods (e.g. NFQ, DQN, Proximal Policy Optimization) can be used in principle for the invention.

    [0047] According to one embodiment, the piece goods 4 on the image are assigned to a first and at least one further type depending on properties of the piece goods 4. A separate agent with a strategy is provided for each assigned piece good type. If only one strategy is used for all piece goods 4, no assignment needs to be performed.

    [0048] The assignment of the piece goods 4 to a type is done depending on the characteristics of the piece goods. The assignment can be made on the basis of the image or can be determined beforehand (e.g. at a sorting station), in which case the individual piece goods must be tracked precisely during the process so that the assignment to a piece good type is not lost. Possible characteristics determining the assignment to a piece good type can be category (parcels, packages, large letters, . . . ), packaging material (cardboard or plastic), weight (as it influences the adhesion to the conveyor elements), size (which determines on how many conveyor elements a piece good rests), . . . The conveyor system determines the type of piece goods 4, e.g. based on the image or based on further sensors, and then assigns a separate strategy to each assigned piece good type, e.g. strategy one for heavy cardboard packages, strategy two for light cardboard packages, strategy three for heavy plastic bag packs, strategy four for light plastic bag packs, as well as any additional strategies for further piece good types.
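
    The four-strategy example above can be sketched as a simple dispatch; the 5 kg threshold separating "heavy" from "light" is an illustrative assumption:

```python
def assign_strategy(material: str, weight_kg: float, heavy_kg: float = 5.0) -> int:
    """Map a piece good to one of the four example strategies from the text:
    1 = heavy cardboard, 2 = light cardboard,
    3 = heavy plastic bag, 4 = light plastic bag."""
    heavy = weight_kg >= heavy_kg
    if material == "cardboard":
        return 1 if heavy else 2
    return 3 if heavy else 4
```

    In operation, the returned strategy index would select which agent's policy is applied to the state vectors of piece goods of that type.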

    [0049] FIGS. 2A-2C show non-exhaustive possible arrangements of the conveying elements 12 of the conveying system 2. In FIG. 2A, all conveying elements 12 are arranged in a net-like matrix. This form is the easiest to describe, and the mapping of an action vector a.sub.n(t) onto the real conveying elements 12 is particularly uncomplicated in this way, always resulting in a comparable effect of the controlled conveying elements 12, unlike with a different arrangement. The conveying elements 12 in FIG. 2B are offset in segments transversely to the conveying direction 6, so that two adjacent conveying elements 12 each end at one conveying element 12. In FIG. 2C, the conveying elements 12 arranged one behind the other along the conveying direction 6 each form continuous conveying tracks, which are offset with respect to one another. The arrangements of FIGS. 2B, 2C can, however, offer advantages in particular for smaller piece goods 4 of a package stream, which otherwise rest on only one conveying element 12.

    [0050] In a conveyor system 2 operated with the method according to the invention, a design with all conveyor elements 12 of equal length, without a division into sections 22/24 (FIG. 2A), is advantageous, since piece goods 4 are then manipulated in the same way over the entire area.

    [0051] FIG. 3 shows a flow chart for the determination of the action vector a(t) according to the invention. Since a belt velocity from a continuous range (e.g. between 0.1 m/s and 3.5 m/s) must be set for each conveyor element 12, the action space is a subset of R.sup.85 for, for example, 85 conveyor elements, which is far above the complexity that can be learned with known methods (e.g. because, in general, the number of required training examples increases exponentially with the dimensionality of the data spaces).

    [0052] Therefore, a state vector s(t) is not created for the entire conveyor system 2; instead, an individual state vector S.sub.1(t), S.sub.n(t) is created for each piece good 4.sub.1, 4.sub.n based on an image of the sensor 26. The state vectors S.sub.1(t), S.sub.n(t) are constructed such that each has the same dimensionality for every piece good 4.sub.1, 4.sub.n. This means that in particular the number of considered adjacent piece goods 4 remains constant, for example by being limited to the nearest two or three piece goods within a predetermined distance. Piece goods 4 further away are irrelevant for the orientation and spacing of this piece good 4 and need not be considered. This constraint gives a state vector S.sub.n(t) of constant size regardless of the actual number of piece goods 4. In case the total number of actually adjacent piece goods 4 is smaller than the number of considered adjacent packages, the corresponding state information of the state vector S.sub.n(t) can be filled with default values. Here, for example, values are suitable which originate from so-called virtual piece goods 4 with sufficient distance and perfect alignment on the belt. The values of the virtual piece goods 4 should be selected in such a way that they have as little influence as possible on the control of the considered piece good 4.sub.n.
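
    The fixed-dimension state vector with "virtual piece good" padding can be sketched as follows. The choice of encoded features (own position and angle, plus distance and angle of the k nearest neighbours) and the padding values are illustrative assumptions:

```python
def state_vector(pos, angle, neighbours, k=2, virtual=(10.0, 0.0)):
    """Build a fixed-dimension state vector for one piece good.

    pos, angle -- own position and orientation of the piece good
    neighbours -- list of (distance, angle) tuples of real neighbours,
                  nearest first
    k          -- fixed number of neighbours encoded in the vector
    virtual    -- padding for missing neighbours: a 'virtual' piece good
                  far away and perfectly aligned, so it barely influences
                  the control of the considered piece good
    """
    vec = [pos, angle]
    padded = list(neighbours[:k]) + [virtual] * max(0, k - len(neighbours))
    for dist, a in padded:
        vec.extend([dist, a])
    return vec
```

    With k = 2 the vector always has six components, whether the piece good has zero, one, or many real neighbours.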

    [0053] In order to suitably reduce the action space, only a subset of conveyor elements 12 is used for each piece good 4.sub.n. This is possible in principle, since from the point of view of an individual piece good 4.sub.n at a time t not all conveying elements 12 are relevant, but only a subset of the conveying elements 12, in particular those on which the piece good 4.sub.n lies. However, depending on the size and orientation of the piece good 4.sub.n and the conveyor elements 12, the number of relevant conveyor elements 12 varies. For machine learning, however, the action vectors a.sub.n(t) must have a constant dimensionality. Thus, the dimension of the action vectors a.sub.n(t) is smaller than the number of conveying elements 12 of the entire conveying system 2, to achieve a reduction of the dimensionality of the overall problem. For this purpose, a suitable abstraction must be found. For example, the action vector a.sub.n(t) per piece good 4.sub.n can be chosen to include only certain conveying elements 12, e.g. those under the corner points v.sub.1, v.sub.2, v.sub.3, v.sub.4 of a piece good 4 as well as under its (estimated) center of gravity V.sub.c (FIG. 5). In FIG. 6, a 5-dimensional action vector a.sub.n(t) would be given by the belt velocities v.sub.21, v.sub.11, v.sub.13, v.sub.23 (2.01, 2.04, 2.04, 0.10) [m/s] under the 4 vertices, as well as by the belt velocity v.sub.s (2.04 m/s) below the center of gravity.
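
    Finding the five conveyor elements addressed by such a 5-dimensional action vector can be sketched as a lookup on a regular net-like matrix of elements (FIG. 2A); the element dimensions of 0.5 m × 0.25 m are illustrative assumptions:

```python
def element_under(point, elem_w=0.25, elem_l=0.5):
    """Return the (row, col) index of the conveyor element under a point,
    assuming a regular net-like matrix of elements with the given
    element length (along the conveying direction) and width."""
    x, y = point
    return int(x // elem_l), int(y // elem_w)

def action_targets(corners, centroid, **kw):
    """Conveyor elements addressed by the 5-dimensional action vector:
    those under the four corner points and under the estimated centroid."""
    return [element_under(p, **kw) for p in corners + [centroid]]
```

    Note that two corner points of a small piece good may land on the same element, the conflict case handled later by averaging or preference.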

    [0054] An alternative representation of the action vector a.sub.n(t) would be the division of the base area of the piece goods 4.sub.n or a circumscribing rectangle into a fixed number of zones, wherein each zone is described by a velocity v.sub.i. Alternatively, the action vector a.sub.n(t) may describe a velocity vector of the piece goods 4.sub.n. The representation of the action vector a.sub.n(t) is in any case independent of the actual conveying elements 12, but determines their control in the further course of the process.

    [0055] Reinforcement Learning methods use a strategy function (policy) that maps a state vector S.sub.n(t) to an action vector a.sub.n(t) of the action space, i.e. the strategy function chooses appropriate belt velocities depending on the respective situation mapped in the state vector S.sub.n(t). The strategy function is usually represented by a machine-learned model (neural network, Gaussian process, random forest, parameterized equations, etc.). Mapping the chosen action vector a.sub.n(t) to the real conveying elements 12 influences the subsequent state S.sub.n(t+Δt) of the piece goods. To train the strategy, a reward is given based on the subsequent state S.sub.n(t+Δt), based on which the agent adjusts the action vectors of the action space and thus improves the strategy. It is possible to additionally award a reward for a comparison of the subsequent state S.sub.n(t+Δt) with the initial state S.sub.n(t) or with states S.sub.n(t−Δt), S.sub.n(t−2Δt), . . . further back in time. The comparison of the subsequent state with the immediately previous state, or with more than just the immediately previous state, and/or the isolated evaluation of the subsequent state S.sub.n(t+Δt), each quantified with rewards, allows the strategy model to be adjusted.
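
    The reward-driven adjustment of the strategy model can be illustrated with a deliberately minimal black-box scheme; real training would use one of the methods named above (NFQ, DQN, Proximal Policy Optimization), so everything here (the perturb-and-keep rule, sigma, trial count) is an assumption for illustration only:

```python
import random

def improve(policy_params, rollout, sigma=0.05, trials=20, seed=0):
    """Minimal black-box policy improvement: perturb the policy parameters
    and keep a perturbation only if the total rollout reward improves.
    rollout(params) -> total reward of one episode under those parameters."""
    rng = random.Random(seed)
    best, best_r = list(policy_params), rollout(policy_params)
    for _ in range(trials):
        cand = [p + rng.gauss(0, sigma) for p in best]
        r = rollout(cand)
        if r > best_r:
            best, best_r = cand, r
    return best, best_r
```

    The returned reward is never worse than the starting reward, mirroring the idea that the strategy model is only adjusted when the reward signal indicates an improvement.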

    [0056] The strategy model is thus improved so that in the future, for the initial state S.sub.n(t), even more suitable action vectors a.sub.n(t) are selected and mapped onto the real conveyor system 2. However, it is also possible to optimize the strategy in advance with a real or virtual conveyor system according to the described procedure and to simply apply this already predetermined strategy to the individual state vectors S.sub.n(t) during the control of the conveyor system 2.

    [0057] Thus, on the one hand, it is possible to optimize the strategy, and thus the selection of the action vectors a.sub.n(t) for each piece good 4, 4.sub.n, during the operation of the plant 2 (i.e., the strategy continues to learn or train during the execution of the process). Alternatively, the strategy can be trained and predetermined in advance using training data (e.g., historical data of the operation of the plant using the standard control), with the same or a comparable plant 2 and different piece good occupancy, or using a simulation of the plant 2. This predetermined strategy can then either be used as an initial strategy that is further trained, and thus optimized, during the execution of the process, or it is simply applied to the states of the piece goods 4.sub.n mapped in the state vectors S.sub.n(t) during the runtime without further optimization; in the latter case the strategy is no longer changed during the runtime.

    [0058] Since the location coordinates of the piece goods 4 and the conveyor elements 12 are known, the states of the piece goods 4 can be mapped from the real world into state vectors S.sub.n(t) of the virtual world. For each piece good 4 individually, an action vector a.sub.n(t) is selected based on its state vector S.sub.n(t) using a strategy in the virtual world. This action vector a.sub.n(t) can in turn be mapped back to the conveying elements 12 of the real conveyor system 2, so that these conveying elements 12 are controlled at the mapped velocities of the action vector a.sub.n(t), whereupon the piece good 4 and the entire conveyor system 2 are transferred to a subsequent state. Each time after a cycle time Δt has elapsed, this process is evaluated on the basis of a reward, which improves the strategy. This process is carried out for each piece good 4 in the area of the image until the piece good 4 has left the area of the image.

    [0059] After each cycle time Δt has elapsed, i.e. essentially at the same time as the velocities v of those conveying elements to which an action vector a.sub.n(t) has been mapped are determined, the velocities of those conveying elements 12 to which no action vector a.sub.n(t) has been mapped are also determined. These conveying elements 12 are then controlled by the control device according to this determination.

    [0060] This concerns the conveying elements 12 on which the piece good 4.sub.n rests but to which no action vector a.sub.n(t) of this piece good 4.sub.n has been mapped. The velocities v of these conveying elements 12 are determined by interpolation, e.g. bilinear interpolation, of the velocities v of those adjacent conveying elements 12 to which an action vector a.sub.n(t) of this piece good 4.sub.n has been mapped.
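
    As a sketch of this interpolation step, the following uses inverse-distance weighting over the elements that did receive an action-vector component, a simpler stand-in for the bilinear interpolation named in the text:

```python
def interp_velocity(x, y, known):
    """Velocity for a covered conveyor element that received no
    action-vector component, interpolated from neighbouring elements
    that did. known: dict mapping (x, y) element centres to velocities."""
    num = den = 0.0
    for (kx, ky), v in known.items():
        d = ((x - kx) ** 2 + (y - ky) ** 2) ** 0.5
        if d == 0.0:
            return v  # exactly on a known element: take its velocity
        w = 1.0 / d   # inverse-distance weight
        num += w * v
        den += w
    return num / den
```

    An element halfway between two known elements thus receives the mean of their velocities, which is the qualitative behaviour bilinear interpolation would also give on a regular grid.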

    [0061] In addition, this concerns those conveying elements on which no piece good 4.sub.n rests and onto which no action vector a.sub.n(t) of a piece good 4.sub.n has been mapped. The velocities v of these conveying elements 12 can be determined according to one of the following approaches, which can also be combined with each other:

    [0062] Via interpolation of the velocities v of those adjacent conveyor elements 12 to which an action vector a.sub.n(t) of a piece good 4.sub.n has been mapped; special boundary conditions can be assumed for edge conveyor elements 12. Via velocity parameters of the conveyor system 2 (standard values from installation or simulation, e.g. the mean value over all conveyor elements addressed by action vectors). By choosing the velocity v of those conveyor elements 12 whose adjacent conveyor element 12 has received a component of the action vector a.sub.n(t) of a piece good 4.sub.n to match the velocity of this adjacent conveyor element 12; potential conflicts may be resolved, for example, by prioritization and/or weighted averaging. By setting identical velocities for some or all of these conveyor elements 12, determined from the average of the velocities of the conveyor elements 12 onto which an action vector a.sub.n(t) of a piece good 4.sub.n has been mapped.
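
    The last of these alternatives, a common default velocity taken as the mean of all mapped velocities, can be sketched as follows (the flat element indexing is an assumption for illustration):

```python
def fallback_velocities(n_elements, mapped):
    """Velocities for all conveyor elements: elements listed in `mapped`
    (index -> velocity from an action vector) keep their mapped value;
    every other element gets the mean of the mapped velocities."""
    default = sum(mapped.values()) / len(mapped)
    return [mapped.get(i, default) for i in range(n_elements)]
```

    The other alternatives (interpolation, plant velocity parameters, copying an adjacent mapped element) would replace the computation of `default` while keeping the same fill-in structure.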

    [0063] An essential advantage of the method according to the invention is that the strategy is trained from the point of view of one piece good 4.sub.n at a time for all future piece goods 4 (and for future states of this same piece good 4.sub.n) and is also used as a common, shared strategy for all piece goods 4. The same strategy model is thus applied to each piece good 4, 4.sub.1, 4.sub.n and calculates an individual, local action vector a.sub.1(t), a.sub.n(t) based on the individual state vector S.sub.1(t), S.sub.n(t) in each case.

    [0064] The action vectors a.sub.1(t), a.sub.n(t) are then mapped to the real conveying elements 12 as a global band matrix (comprising all conveying elements 12). Intermediate conveyor elements 12 are given suitably interpolated values (e.g. via bilinear interpolation). When mapping to the real belt matrix, conflicts may arise, i.e. more than one package 4 addresses the same conveying element 12. These conflicts, several of which are shown in FIG. 7, are resolved by prioritizing and/or weighted averaging depending on the overlap of the piece goods 4 with the conveying element 12 and package state. For example, a package 4 with little overlap receives a small weight of the velocity of its action vector a(t) projected onto the conveying element 12 in the averaging. An appropriate logic can be given via expert knowledge or can be learned by machine. The overlap of each piece good 4 with its conveying elements 12 can be mapped in the state vector S.sub.n(t) or otherwise.
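
    The overlap-weighted conflict resolution described above can be sketched as follows; the data layout (per-element lists of velocity/overlap requests) is an assumption for illustration:

```python
def resolve(requests):
    """Resolve conflicts when several piece goods address the same
    conveyor element: overlap-weighted average of the requested velocities,
    so a package with little overlap contributes little to the result.
    requests: element id -> list of (velocity, overlap) pairs."""
    out = {}
    for elem, reqs in requests.items():
        total = sum(overlap for _, overlap in reqs)
        out[elem] = sum(v * overlap for v, overlap in reqs) / total
    return out
```

    Prioritization, the other resolution named in the text, would amount to keeping only the request with the largest overlap instead of averaging.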

    [0065] Training of the strategy function can be performed using real or simulated data. In particular, the training at the customer's site can be continued in operation, which allows the conveyor system to automatically adapt to changing characteristics of the package flow (size, weight, shape, and material of the packages).

    [0066] According to one embodiment, the state vector S.sub.n(t) of a piece good 4.sub.n may comprise one or more of the following items of information: state information of the respective package 4 (and adjacent packages 4), such as positions, velocities, orientation, . . . ; and global information about the state of the conveyor system 2: number of packages 4, average velocity v, prioritization by the user, . . .

    LIST OF REFERENCE SIGNS

    [0067] 2 Conveyor system [0068] 4 Piece goods [0069] 6 Conveying direction [0070] 8 Conveyor line [0071] 10 Segment [0072] 12 Conveying means [0073] 18 Expansion device [0074] 20 Transfer conveyor [0075] 22 Alignment section [0076] 24 Distance correction section [0077] 26 Sensor [0078] V Velocity [0079] a(t) Action vector [0080] s(t) State vector [0081] Δt Cycle time