SCHEDULING JOBS OF A MANUFACTURING OR LOGISTICS PROCESS
20220404815 · 2022-12-22
CPC classification: G06Q10/06 (PHYSICS), G06Q10/08 (PHYSICS), G06Q10/0631 (PHYSICS)
International classification: G05B19/418 (PHYSICS)
Abstract
A computer-implemented method of scheduling jobs of a manufacturing or logistics process using a priority function. The priority function is evaluated on multiple jobs to be scheduled to obtain multiple respective priority values. The priority function is defined to invoke a kernel function. The kernel function is defined to compare representations of two respective jobs. Evaluating the priority function on a selected job comprises evaluating the kernel function on representations of the selected job and one or more reference jobs. The schedule for the multiple jobs is determined based on the priority values and is then output to enable the multiple jobs to be carried out according to the schedule.
Claims
1. A computer-implemented method of scheduling jobs of a manufacturing or logistics process or for data package transmission in telecommunications, the method comprising the following steps: obtaining data representing a priority function, wherein the priority function is defined to determine a priority value indicating a priority for a job of the manufacturing or logistics process, wherein the priority function is defined to invoke a kernel function, wherein the kernel function is defined to compare representations of two respective jobs, and wherein the priority function includes comparisons by the kernel function of the job with one or more reference jobs; obtaining job status data including representations of multiple jobs to be scheduled; evaluating the priority function on the multiple respective jobs to obtain multiple respective priority values, wherein the evaluating of the priority function on a selected job of the multiple respective jobs includes evaluating the kernel function to compare the selected job to the one or more reference jobs; determining a schedule for the multiple jobs based on the priority values; outputting scheduling data representing the determined schedule to enable the multiple jobs to be carried out according to the schedule.
2. The method of claim 1, wherein the representation of the selected job includes one or more of: a release date of the selected job, a logarithm of the release date, a weight of the selected job, a logarithm of the weight, a processing time of the selected job, a logarithm of the processing time.
3. The method of claim 2, further comprising computing an aggregate of a job attribute over the multiple jobs to be scheduled, and including the aggregate in the representation of the selected job.
4. The method of claim 3, wherein evaluating the kernel function comprises computing a degree of similarity between the multiple jobs to be scheduled and a set of jobs corresponding to the reference job.
5. The method of claim 1, further comprising: obtaining data representing multiple respective priority functions; determining respective schedules according to the respective priority functions; and selecting a schedule among the respective schedules.
6. A computer-implemented method of determining a priority function for scheduling jobs of a manufacturing or logistics process or for data package transmission in telecommunications, wherein the priority function is defined to determine a priority value indicating a priority for a job of the manufacturing or logistics process, and wherein the method comprises the following steps: accessing training data including a set of multiple training jobs, and obtaining from the training data one or more training instances, wherein each training instance includes multiple jobs of the set of the training jobs ordered according to a target schedule, wherein the multiple jobs include an earlier job ordered before a later job; obtaining a kernel function, wherein the kernel function is defined to compare representations of two respective jobs, wherein a class of priority functions is defined in terms of the kernel function, wherein a priority function of the class of priority functions includes comparisons according to the kernel function with one or more reference jobs; determining the priority function by performing an optimization, wherein the optimization is configured to select the priority function from the class of priority functions, and wherein the optimization is configured to penalize the priority function for assigning a higher priority to the later job than to the earlier job; and outputting the determined priority function from the class of priority functions for use in the manufacturing or logistics process.
7. The method of claim 6, further comprising: obtaining the training instance by selecting the multiple jobs from the set of training jobs, and determining the target schedule for the selected multiple jobs.
8. The method of claim 7, wherein the determining of the target schedule includes determining an optimal or near-optimal target schedule.
9. The method of claim 6, wherein the class of priority functions is a class of linear combinations of comparisons with reference jobs, and wherein the optimization penalizes the priority function by optimizing a loss function defined over respective pairs of consecutive earlier and later jobs of the training set.
10. The method of claim 9, wherein the optimizing includes iteratively: selecting the earlier job and the later job from the respective pairs such that the earlier and later job provide a contribution to the loss function; and adapting the current priority function to reduce the contribution to the loss function.
11. The method of claim 6, wherein the method further comprises: determining a position priority function for each given position, wherein the position priority function is configured to separate jobs occurring at the given position in respective training instances at least from jobs occurring after the given position in the respective training instances; and combining respective position priority functions for respective positions into an overall priority function.
12. The method of claim 11, further comprising: determining the position priority function as a classifier defined by a system of respective inequalities, wherein the position priority function is computed based on differences to decision boundaries of the respective inequalities.
13. A scheduling system for scheduling jobs of a manufacturing or logistics process or for data package transmission in telecommunications, the system comprising: a data interface configured to access data representing a priority function, wherein the priority function is defined to determine a priority value indicating a priority for a job of the manufacturing or logistics process, wherein the priority function is defined to invoke a kernel function, wherein the kernel function is defined to compare representations of two respective jobs, and wherein the priority function includes comparisons by the kernel function of the job with one or more reference jobs; an input interface configured to obtain job status data including representations of multiple jobs to be scheduled; an output interface configured to output scheduling data representing a schedule to enable the multiple jobs to be carried out according to the schedule; and a processor subsystem configured to: obtain, via the input interface, the job status data including the representations of the multiple jobs to be scheduled; evaluate the priority function on the multiple respective jobs to obtain multiple respective priority values, wherein the evaluating of the priority function on a selected job of the multiple respective jobs includes evaluating the kernel function on representations of the selected job and the one or more reference jobs; determine a schedule for the multiple jobs based on the priority values; output, via the output interface, the determined schedule to enable the multiple jobs to be carried out according to the schedule.
14. A training system for determining a priority function for scheduling jobs of a manufacturing or logistics process or for data package transmission in telecommunications, wherein the priority function is defined to determine a priority value indicating a priority for a job of the manufacturing or logistics process, the system comprising: a data interface configured to access training data including a set of multiple training jobs; a processor subsystem configured to: obtain from the training data one or more training instances, wherein each training instance comprises multiple jobs of the set of training jobs ordered according to a target schedule, wherein the multiple jobs include an earlier job ordered before a later job; obtain a kernel function, wherein the kernel function is defined to compare representations of two respective jobs, wherein a class of priority functions is defined in terms of the kernel function, wherein a priority function of the class of priority functions includes comparisons according to the kernel function with one or more reference jobs; determine the priority function by performing an optimization, wherein the optimization is configured to select the priority function from the class of priority functions, and wherein the optimization is configured to penalize the priority function for assigning a higher priority to the later job than to the earlier job; and output the determined priority function from the class of priority functions for use in the manufacturing or logistics process.
15. A non-transitory computer-readable medium on which are stored one or more of: instructions for scheduling jobs of a manufacturing or logistics process or for data package transmission in telecommunications which, when executed by a processor system, cause the processor system to perform: obtaining data representing a priority function, wherein the priority function is defined to determine a priority value indicating a priority for a job of the manufacturing or logistics process, wherein the priority function is defined to invoke a kernel function, wherein the kernel function is defined to compare representations of two respective jobs, and wherein the priority function includes comparisons by the kernel function of the job with one or more reference jobs, obtaining job status data including representations of multiple jobs to be scheduled, evaluating the priority function on the multiple respective jobs to obtain multiple respective priority values, wherein the evaluating of the priority function on a selected job of the multiple respective jobs includes evaluating the kernel function to compare the selected job to the one or more reference jobs, determining a schedule for the multiple jobs based on the priority values, outputting scheduling data representing the determined schedule to enable the multiple jobs to be carried out according to the schedule; and/or a priority function for use in scheduling jobs of a manufacturing or logistics process, wherein the priority function is defined to determine a priority value indicating a priority for a job of the manufacturing or logistics process, wherein the priority function is defined to invoke a kernel function, wherein the kernel function is defined to compare representations of two respective jobs, and wherein the priority function includes comparisons by the kernel function of the job with one or more reference jobs.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0038] These and other aspects of the present invention will be apparent from and elucidated further with reference to the example embodiments described by way of example in the following description and with reference to the figures.
[0056] It should be noted that the figures are purely diagrammatic and not drawn to scale. In the figures, elements which correspond to elements already described may have the same reference numerals.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0058] For example, as also illustrated in
[0059] The system 100 may further comprise a processor subsystem 140 which may be configured to, during operation of the system 100, obtain from the training data one or more training instances. A training instance may comprise multiple jobs of the set of training jobs ordered according to a target schedule. The multiple jobs may comprise an earlier job ordered before a later job. The processor subsystem 140 may be further configured to, during operation of the system, obtain a kernel function. The kernel function may be fixed, e.g., hardcoded, or configurable, for example. For example, instructions for evaluating the kernel function may be accessible via data interface 120. The kernel function may be defined to compare representations of two respective jobs. A class of priority functions may be defined in terms of the kernel function. A priority function of the class of priority functions may include comparisons according to the kernel function with one or more reference jobs. The processor subsystem 140 may be further configured to, during operation of the system 100, determine the priority function by performing an optimization. The optimization may be configured to select the priority function from the class of priority functions. The optimization may be configured to penalize the priority function for assigning a higher priority to the later job than to the earlier job.
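For illustration, the optimization described above may be sketched as follows. This is a minimal sketch only: the RBF kernel, the hinge-style margin of 1, and the simple gradient update are assumptions made for the sketch and are not prescribed by the description above. Lower priority values denote higher priority, so the optimization pushes the value of the earlier job below that of the later job.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Compare two job representations (hypothetical RBF kernel choice)."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def train_priority_function(instances, reference_jobs, lr=0.1, epochs=100):
    """Fit weights alpha so that f(x) = sum_i alpha_i * K(x, ref_i)
    assigns lower values (= higher priority) to earlier jobs.

    `instances` is a list of job sequences, each ordered according to
    a target schedule (earlier jobs before later jobs)."""
    alpha = np.zeros(len(reference_jobs))

    def f(x, a):
        return sum(a_i * rbf_kernel(x, r) for a_i, r in zip(a, reference_jobs))

    for _ in range(epochs):
        for seq in instances:
            for earlier, later in zip(seq, seq[1:]):
                # Hinge-style penalty: violated (or barely satisfied)
                # when the later job's value is not clearly above the
                # earlier job's value.
                margin = f(later, alpha) - f(earlier, alpha)
                if margin < 1.0:
                    grad = np.array([rbf_kernel(later, r) - rbf_kernel(earlier, r)
                                     for r in reference_jobs])
                    alpha += lr * grad  # increase f(later) - f(earlier)
    return alpha
```

In this sketch, the reference jobs play the role of the comparisons required by the class of priority functions, and the pairwise hinge term realizes the penalty for ranking the later job before the earlier job.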
[0060] The processor subsystem 140 may be further configured to, during operation of the system 100, output the determined priority function from the class of priority functions. For example, the priority function may be provided, directly or indirectly, to a manufacturing system for use in its scheduling, e.g., as discussed with respect to
[0062] For example, as also illustrated in
[0063] The system 200 may further comprise a processor subsystem 240 which may be configured to, during operation of the system 200, obtain job status data comprising representations of the multiple jobs to be scheduled via an input interface 260, as also described elsewhere. The processor subsystem 240 may be further configured to, during operation of the system 200, evaluate the priority function 040 on the multiple respective jobs to obtain multiple respective priority values. Evaluating the priority function on a selected job of the multiple respective jobs may comprise evaluating the kernel function to compare the selected job to the one or more reference jobs. The processor subsystem 240 may be further configured to, during operation of the system 200, determine a schedule for the multiple jobs based on the priority values. The processor subsystem 240 may be further configured to, during operation of the system 200, output the schedule via an output interface 280, as also discussed elsewhere, to enable the multiple jobs to be carried out according to the schedule.
[0064] It will be appreciated that the same considerations and implementation options apply for the processor subsystem 240 as for the processor subsystem 140 of
[0065] The system 200 may comprise an input interface 260 for obtaining job status data 224. The job status data typically comprises representations of multiple jobs to be scheduled, and/or status information about devices of the process 082, e.g., their current availability or any other information needed to execute scheduling, e.g., to evaluate the constraints of the scheduler. The system 200 may receive combined job status data 224 containing information about jobs from multiple respective devices of the process 082, or the system 200 may receive respective separate job status data 224 about jobs from several respective devices. The input interface 260 may communicate internally with processor subsystem 240 via data communication 223.
[0066] The system 200 may further comprise an output interface 280 for outputting scheduling data 226 representing a schedule to enable the multiple jobs to be carried out in the process 082 according to the schedule. For example, the system 200 may output overall scheduling data 226 relating to jobs of multiple devices to a separate system responsible for letting the devices perform the jobs according to the schedule, or the system 200 may output respective scheduling data 226, e.g., instructions to respective devices of the process 082 to perform respective jobs. The scheduling data 226 may be output in one go as a schedule for all jobs to be scheduled, or can be determined, updated, and/or output progressively. The output interface 280 may communicate internally with processor subsystem 240 via data communication 225.
[0067] In some embodiments, the input interface 260 and/or the output interface 280 are embodied as a communication interface configured for communication 224, 226 with one or more other systems. For example, the other system can be a SCADA system, a PLC system, a distributed control system, a batch automation system, or, more generally, any ISA-95 level 2 system. System 200 itself may be implemented as a manufacturing operations system, e.g., a manufacturing execution system, or, more generally, any ISA-95 level 3 system. However, system 200 can also be implemented in or as a level-2 system such as a SCADA system or PLC system, with data 224, 226 being provided directly to manufacturing devices. Generally, a communication interface 260, 280 may be arranged for direct communication with the other system, e.g., using USB, IEEE 1394, or similar interfaces. The communication interface 260, 280 may also communicate over a computer network, for example, a wireless personal area network, an internet, an intranet, a LAN, a WLAN, etc. For instance, communication interface 260, 280 may comprise a connector, e.g., a wireless connector, an Ethernet connector, a Wi-Fi, 4G or 5G antenna, a ZigBee chip, etc., as appropriate for the computer network. Communication interface 260, 280 may also be an internal communication interface, e.g., a bus, an API, a storage interface, etc.
[0068] In some embodiments, input interface 260 may be implemented as a sensor interface for directly accessing sensor data 224 of one or more sensors of the process 082. For example, this may be the case if system 200 is integrated in a system that is used to directly control one or more devices. Based on the sensor data 224, the status of jobs currently carried out may be determined, based on which it can be determined which jobs are still to be carried out. Other data needed for scheduling, e.g., data needed to evaluate the constraints of the scheduler, may be derived from the sensor data 224 as well. The sensor may, but does not need to, be part of the system 200. The sensor may have any suitable form, such as an image sensor, a lidar sensor, a radar sensor, a pressure sensor, a temperature sensor, etc. The sensor data interface 260 may have any suitable form corresponding in type to the type of sensor, including but not limited to a low-level communication interface, e.g., based on I2C or SPI data communication, or a data storage interface of a type as described above for the data interface 220.
[0069] In some embodiments, the output interface 280 may be implemented as an actuator interface 280 for providing control data 226 to one or more actuators of the process 082. For example, this may be the case if system 200 is integrated in a system that is used to directly control one or more manufacturing devices. The actuators may or may not be part of system 200. For example, an actuator may be an electric, hydraulic, pneumatic, thermal, magnetic and/or mechanical actuator. Specific yet non-limiting examples include electrical motors, electroactive polymers, hydraulic cylinders, piezoelectric actuators, pneumatic actuators, servomechanisms, solenoids, stepper motors, etc.
[0070] Using such control data 226 the devices may be made to carry out the jobs according to the schedule.
[0071] In general, each system described in this specification, including but not limited to the system 100 of
[0073] In particular, in this example, the manufacturing process 082 is controlled and/or monitored by a manufacturing control system 310. For example, system 310 can be a supervisory control and data acquisition (SCADA) system, a programmable logic controller (PLC), a distributed control system (DCS), or a batch automation system (BAS). There can be multiple such manufacturing control systems, e.g., controlling and/or monitoring respective parts of the manufacturing process 082. Manufacturing control systems are also sometimes referred to as level-2 systems, e.g., in the context of the ISA-95 standard.
[0074] The manufacturing control system 310 is configured to monitor and/or control one or more manufacturing devices 311, 312, 313, also referred to as manufacturing machines. As an illustrative example, manufacturing robots are shown; however, other types of manufacturing devices are also possible. Generally, a job may be a task that is assigned to be carried out by a particular unit of the process, e.g., a single manufacturing device. Typically, a unit can also carry out only one job (or a limited number of jobs) at the same time. Thus, the manufacturing process 082 may be scheduled in terms of jobs as atomic tasks carried out by the manufacturing devices 311-313. For example, the number of manufacturing devices for which a schedule is determined as described herein may be at most or at least 10, at most or at least 50, or at most or at least 100. The respective manufacturing devices 311-313 may each be identical, but can also differ from each other.
[0075] As illustrated in the figure, the system 200 may be configured to use the techniques described herein to determine a schedule 320 for scheduling jobs of the manufacturing process 082, e.g., respective jobs may be assigned to be performed by the respective manufacturing devices 311-313, at respective points in time. The figure illustrates the schedule 320 being provided by the system 200 to the manufacturing control system 310 for controlling the manufacturing such that the jobs are carried out by the devices 311-313 according to the schedule, for example, using a computer network 330. Thus, the manufacturing system 080 benefits from the improved schedule determined by the system 200, allowing it to perform manufacturing faster, with increased throughput, and/or with reduced waste of resources.
[0076] The provided techniques are applicable to various types of manufacturing processes 082. The provided techniques are especially suitable to relatively complex manufacturing processes, and processes using relatively expensive equipment. In such cases, it is particularly important to determine an efficient schedule in a computationally efficient way.
[0077] In a preferred embodiment, the manufacturing process 082 is a process for semiconductor production, e.g., in a foundry. For example, the manufacturing process can be a semiconductor device fabrication process used to manufacture a semiconductor device or an integrated circuit comprising multiple such semiconductor devices. The process may be for producing an electronic circuit on a wafer of semiconducting material such as silicon.
[0078] As another example, the manufacturing process 082 can be a process for producing a vehicle or components thereof, etc. The provided techniques are also applicable to other types of systems that apply jobs according to a schedule, such as logistics systems.
[0079] It is noted that, although systems 200, 310, 311-313 comprised in the manufacturing system 080 are illustrated here as separate devices, it is possible to combine several such systems into a single device. For example, system 200 may be combined with system 310 in a single device; systems 310-313 may be combined into a single device; or both. It is also possible to implement the scheduling as described herein not at the level of a manufacturing execution system, but at the level of a manufacturing control system, for example. Many variants and alternatives will be envisioned by the skilled person, in view of the disclosure herein.
[0081] Illustrated in the figure is a set of multiple jobs to be scheduled. Shown for illustrative purposes are jobs J1, 401; J2, 402; and Jk, 403. Interestingly, the scheduling techniques described with respect to this figure can be applied to scheduling problems that comprise a large number of jobs Ji; for example, the number of jobs can be at least 1000, at least 10000, or at least 100000.
[0082] Typically, a job is represented by one or more attributes, such as a processing time (illustrated by the respective beams in the figure), a release date r_1, r_2, . . . , r_k, and a weight w_1, w_2, . . . , w_k indicating the importance of the job. Mathematically, a sequence of k jobs may be described by column vectors D_1, . . . , D_k of attributes. Thus, the set of jobs to be scheduled may be represented as a matrix D whose columns are D_j, where j ranges from 1 to k. For example, with processing times p_j, release dates r_j, and weights w_j, jobs J1, J2, . . . , Jk may be represented as:

    D = ( p_1 p_2 . . . p_k )
        ( r_1 r_2 . . . r_k )
        ( w_1 w_2 . . . w_k )
[0083] The jobs Ji may be scheduled for execution on a number m of devices, e.g., robots. The devices can all be identical, for example. As an example, a scheduling problem may be denoted as Pm | r_j | Σ_j w_j C_j, where there are m parallel identical machines, e.g., processors or robots; job release dates r_j; and weights w_j. Thus, such a job may be described by a column vector D_j = (p_j, r_j, w_j)^T, where p_j is the processing time of job j.
[0084] An objective of the schedule can for example be to minimize, at least approximately, the sum of weighted completion times Σ_j w_j C_j, e.g., to minimize the average weighted completion time.
[0085] In the example of this figure, the jobs are scheduled based on a priority function PF, 410. The priority function may be defined to determine a priority value indicating a priority for a job of the manufacturing or logistics process. Typically, the priority value is real-valued, e.g., positive real-valued. Lower values may be used to denote higher priority, but the converse is also possible. The priority function PF may receive a representation of the job. By applying the priority function PF to the respective jobs Ji, respective priority values PR1, 411, PR2, 412, . . . , PRk, 413, may be obtained.
[0086] A scheduling procedure may be used that schedules the jobs according to their respective priority values PRi. In particular, such a scheduling procedure may be based on sorting the jobs Ji according to their respective priority values PRi. The figure shows a sorting operation SRT, 420, that determines a sorting of the jobs Ji according to their priority values, e.g., from highest to lowest priority. In the example of the figure, the sorting has resulted in job Jk, followed by job J2, followed by job J1. The sorting 430 of the jobs may be regarded as a permutation of the original list 401-403 of jobs. It is not necessary that the sorting 430 is explicitly and completely determined, e.g., it is also possible to repeatedly scan through the list of priority values PRi to find a next remaining job with highest priority.
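For illustration, the evaluation of the priority function PF and the sorting operation SRT may be sketched as follows. This is a minimal sketch in which the WSPT ratio p/w stands in for a generic priority function (lower value denoting higher priority), with hypothetical attribute values for jobs J1, J2, Jk:

```python
# Each job is a tuple (name, processing time p, release date r, weight w).
jobs = [("J1", 4.0, 0.0, 1.0), ("J2", 2.0, 0.0, 3.0), ("Jk", 1.0, 0.0, 2.0)]

def priority(job):
    """Hypothetical priority function: WSPT ratio p/w (lower = higher priority)."""
    _, p, r, w = job
    return p / w

# SRT step: sort the jobs by their priority values, i.e., determine a
# permutation of the original job list. Each priority value is computed once.
ranked = sorted(jobs, key=priority)
print([name for name, *_ in ranked])  # -> ['Jk', 'J2', 'J1']
```

With these example values, the resulting order Jk, J2, J1 matches the ordering shown in the figure.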
[0087] A scheduling procedure LS, 440, may determine a schedule based on the jobs sorted according to their priority values PRi. For example, the scheduling procedure LS may not otherwise use the priority values. An advantage of scheduling based on an ordering of priority values is that the priority values need only be determined once, e.g., the scheduling does not need to apply the priority function again. This means that it is possible to use a relatively complex priority function PF while still keeping the scheduling efficient.
[0088] Preferably, the priority function PF does not take into account constraints imposed by the manufacturing or logistics process at hand, such as availability constraints of the machines, interdependencies between the jobs, etcetera. Such interdependency constraints may for example indicate that one or more jobs need to be processed before one or more other jobs, and/or that one or more jobs cannot be processed at the same time. Such constraints, also called feasibility conditions, are taken into account in the scheduling LS of the jobs based on the computed priority values. This has the advantage that the feasibility conditions can be modified as necessary without affecting the way the priority function PF is computed. This is particularly important in complex industrial processes. For example, during use of the system, the constraints may change, but the same priority function PF and/or priority values PRi may be used to update the schedule. Or, at least, the priority function may be determined, e.g., updated, in such a way that the applicable constraints are accessed only indirectly in that they may be used to determine a target schedule that is used to determine the priority function (for example, using scheduling operation LS), but are not otherwise accessed directly. In this way, the complexity of the constraints may have little to no effect on determination of the priority function, which may thus remain efficient even if the constraints are complicated, and, since it does not need to take the constraints into account, can be trained accurately on relatively little data.
[0089] For example, the procedure LS used to schedule the jobs may apply list scheduling. List scheduling may take as input the ordered set of jobs 430, in other words, a permutation 430 of the set of jobs to be scheduled, and may iteratively select a job from the ordered set of jobs 430 and schedule it as early as possible, taking into account the constraints imposed by the manufacturing or logistics process. For example, given a permutation π(1), . . . , π(n) on the indices {1, 2, . . . , n} of the set of jobs to be scheduled, a list scheduling procedure LS may be implemented by, for t = 1, . . . , n, scheduling job π(t) as early as possible into one of the available time slots on one of the machines, taking into account the release dates, e.g., processing can start no earlier than r_π(t), e.g.:
[0090] Step 1. Initialize the counter of jobs: t:=1.
[0091] Step 2. Pick the job π(t) and assign it to a machine which can start processing job π(t) as early as possible while satisfying all feasibility conditions.
[0092] Step 3. Increase the counter of jobs: t:=t+1.
[0093] Step 4. If t≤n, then go to Step 2, else return the schedule.
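The four steps above may, for example, be implemented as follows. This is a minimal sketch that assumes jobs are given as (name, processing time, release date) tuples, the machines are interchangeable, and the only feasibility condition is the release date:

```python
def list_schedule(ordered_jobs, m):
    """List scheduling: assign each job, in the given priority order,
    to the machine that can start it earliest, respecting release dates.

    `ordered_jobs` is a list of (name, processing_time, release_date)
    tuples, already sorted by priority; `m` is the number of machines."""
    machine_free = [0.0] * m          # time at which each machine becomes idle
    schedule = []                     # entries (job, machine, start, finish)
    for name, p, r in ordered_jobs:   # Steps 1-4: iterate in the given order
        # Pick the machine that can start this job earliest; processing
        # can start no earlier than the release date r.
        i = min(range(m), key=lambda j: max(machine_free[j], r))
        start = max(machine_free[i], r)
        machine_free[i] = start + p
        schedule.append((name, i, start, start + p))
    return schedule
```

For example, scheduling the permutation Jk, J2, J1 (with processing times 1, 2, 4 and release dates 0) on two machines starts Jk and J2 immediately on different machines and starts J1 at time 1 on the machine freed by Jk.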
[0094] Scheduling procedure LS may result in a schedule SCH, 450, in which respective jobs are assigned to particular resources (e.g., machines) for execution at particular times. As also discussed elsewhere, scheduling procedure LS may only output a particular schedule SCH and/or may progressively output parts of the schedule, as needed. In some cases, e.g., for the purpose of determining a priority function, it may be sufficient for scheduling procedure LS to output just the order of the jobs of the schedule SCH, e.g., the list of scheduled jobs sorted by the starting times at which they are scheduled. The mapping from permutations to feasible schedules determined by the scheduling procedure LS may be denoted by L. At least if list scheduling is used, it may be noted that for each scheduling task, it is possible for priority function PF to output priority values that give rise to a permutation π, 430 for which L(π) is an optimal schedule.
[0095] Some conventional possibilities for the priority function PF are now discussed. These priority functions do not use a kernel function. For example, these priority functions may be used in relation to the scheduling problem Pm | r_j | Σ_j w_j C_j:
[0096] earliest release date (ERD): schedule the jobs in nondecreasing order of r_j. This priority function is optimal, e.g., for Pm | r_j, p_j = const, w_j = const | Σ_j w_j C_j.
[0097] shortest processing time (SPT): schedule the jobs in nondecreasing order of p_j. This priority function is optimal for 1 || Σ_j C_j.
[0098] weighted shortest processing time (WSPT): schedule the jobs in nondecreasing order of p_j/w_j. This priority function is optimal for 1 || Σ_j w_j C_j.
[0099] maximum weight (MW): schedule the jobs in nonincreasing order of w_j. This priority function is optimal for 1 | p_j = const | Σ_j w_j C_j.
[0100] These rules can be described as posynomials as follows:
TABLE-US-00001
Rule                                        Description                                           Posynomial φ(p.sub.j, w.sub.j, r.sub.j)
Earliest release date (ERD)                 r.sub.π(j) ≤ r.sub.π(j+1)                             r.sub.j
Shortest processing time (SPT)              p.sub.π(j) ≤ p.sub.π(j+1)                             p.sub.j
Weighted shortest processing time (WSPT)    p.sub.π(j)/w.sub.π(j) ≤ p.sub.π(j+1)/w.sub.π(j+1)     p.sub.j · w.sub.j.sup.−1
Maximum weight (MW)                         w.sub.π(j) ≥ w.sub.π(j+1)                             w.sub.j.sup.−1
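For illustration, the use of such rules in list scheduling can be sketched as follows. This is a minimal sketch, not the patented method; the job fields (`p`, `w`, `r`) and function names are hypothetical.

```python
import heapq

def wspt_key(job):
    # Weighted shortest processing time: nondecreasing p_j / w_j.
    return job["p"] / job["w"]

def list_schedule(jobs, k, key):
    """Greedy list scheduling: sort jobs by the priority key, then repeatedly
    assign the next job to the machine that becomes free first."""
    order = sorted(jobs, key=key)
    machines = [0.0] * k              # next free time per machine
    heapq.heapify(machines)
    schedule = []                     # (job id, start time) pairs
    for job in order:
        free = heapq.heappop(machines)
        start = max(free, job.get("r", 0.0))   # respect release dates, if any
        schedule.append((job["id"], start))
        heapq.heappush(machines, start + job["p"])
    return schedule

jobs = [{"id": 1, "p": 4.0, "w": 1.0},
        {"id": 2, "p": 1.0, "w": 2.0},
        {"id": 3, "p": 2.0, "w": 2.0}]
sched = list_schedule(jobs, k=1, key=wspt_key)
# WSPT ratios are 4.0, 0.5, 1.0, so the jobs run in the order 2, 3, 1.
```

Swapping `wspt_key` for a key returning r.sub.j, p.sub.j, or −w.sub.j yields ERD, SPT, or MW, respectively.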
[0101] Interestingly, it is also possible to use multiple respective priority functions to determine a schedule. For example, the multiple priority functions may comprise one or more priority functions determined as described herein, and/or one or more other priority functions, e.g., standard heuristics. To this end, respective schedules SCH may be determined according to the respective priority functions as described herein; and a schedule may be selected among the respective schedules SCH. For example, an objective or loss function may be computed for the respective schedules, e.g., minimal total weighted completion time or similar. This way, it is possible to use multiple priority rules determined using the techniques described herein, and to use the one that works best in the current setting.
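The rule-selection step can be sketched as follows for the single-machine case: compute Σ.sub.j w.sub.jC.sub.j for the order induced by each candidate rule and keep the best. All names and the toy data are hypothetical.

```python
def total_weighted_completion(order):
    # Single-machine objective sum_j w_j * C_j for a given job order.
    t, total = 0.0, 0.0
    for p, w in order:
        t += p                       # completion time C_j of this job
        total += w * t
    return total

jobs = [(2.0, 3.0), (1.0, 1.0), (4.0, 5.0)]   # (p_j, w_j) pairs
rules = {
    "SPT":  sorted(jobs, key=lambda j: j[0]),
    "MW":   sorted(jobs, key=lambda j: -j[1]),
    "WSPT": sorted(jobs, key=lambda j: j[0] / j[1]),
}
best = min(rules, key=lambda name: total_weighted_completion(rules[name]))
# WSPT is optimal for 1 || sum w_j C_j, so it wins on this instance.
```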
[0102]
[0103] As is conventional, a kernel function KF takes two inputs and essentially compares the two inputs, in the sense of computing an inner product between the two elements in an implicit feature space. Kernel functions are sometimes referred to as positive-definite kernels. Kernel function KF is typically symmetric, e.g., K(x, y)=K(y, x) for all x, y. Kernel function KF is typically also positive semi-definite. Several definitions of positive semi-definiteness are possible. One possible definition is that, for any x.sub.1, . . . , x.sub.n and any real coefficients c.sub.1, . . . , c.sub.n, Σ.sub.iΣ.sub.j c.sub.ic.sub.jK(x.sub.i, x.sub.j)≥0. The kernel function can be positive definite, meaning that Σ.sub.iΣ.sub.j c.sub.ic.sub.jK(x.sub.i, x.sub.j)=0 implies c.sub.1= . . . =c.sub.n=0, but this is not needed.
[0104] The kernel function KF may be defined to compare representations of two respective jobs. That is, the kernel function KF may compute an inner product between the representations of the jobs in an implicit feature space.
[0105] Computing the kernel function KF may comprise determining respective attribute vectors representing the respective jobs, and computing a kernel of the respective attribute vectors. The kernel function applied to the attribute vectors can be a conventional kernel function, e.g., a polynomial kernel, a Gaussian kernel, an RBF kernel, a sigmoid kernel, etc.
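As a sketch of this step, assuming a Gaussian/RBF kernel and a hypothetical three-attribute representation (release date, weight, processing time); the field names are illustrative only:

```python
import math

def job_attributes(job):
    # Hypothetical attribute vector: (release date, weight, processing time).
    return (job["r"], job["w"], job["p"])

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian/RBF kernel over two attribute vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

j1 = {"r": 0.0, "w": 2.0, "p": 3.0}
j2 = {"r": 0.0, "w": 2.0, "p": 3.0}
j3 = {"r": 5.0, "w": 1.0, "p": 8.0}
k_equal = rbf_kernel(job_attributes(j1), job_attributes(j2))  # 1.0 for identical jobs
k_diff = rbf_kernel(job_attributes(j1), job_attributes(j3))   # strictly below 1.0
```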
[0106] Various advantageous options for the attribute vectors are possible. The attribute vector may comprise a release date, a weight, and/or a processing time of the job. Instead or in addition, the attribute vector may contain logarithms of the respective attributes. The use of logarithms in the kernel functions may correspond conceptually to the use of posynomials in priority rules as discussed w.r.t.
[0107] In addition to including attributes of the jobs themselves, the attribute vector may also include one or more attributes obtained by computing an aggregate of a job attribute over the set of jobs to which the job belongs. For example, the attribute vector may include a sum or average of an attribute, e.g., weight, processing time, or release date, over all jobs to be scheduled, or over all jobs excluding the selected job.
[0108] For example, mathematically, the attribute vector may be denoted as τ.sub.D(x)=ξ(x, D), where ξ is a vector function, x is the selected job, and [D] is the set of jobs, e.g., the jobs to be scheduled, e.g., [D]={D.sub.j: j=1, . . . , n(D)}.
[0109] For example, the following choice of ξ includes logarithms of respective attributes, allowing to find a posynomial-type rule over the original space of job attributes (that may optionally be transformed to make them positive):
ξ(x, D)=(logx.sub.1, . . . , logx.sub.m).
[0110] For example, a priority rule based on this kernel function may be defined by Σ.sub.i=1.sup.m α.sub.ilogx.sub.i, corresponding to a rule defined by the posynomial Π.sub.i=1.sup.m x.sub.i.sup.α.sup.i.
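The correspondence between a linear rule over log-attributes and a posynomial rule can be checked numerically; the exponents below are hypothetical and chosen to reproduce the WSPT expression p.sub.j·w.sub.j.sup.−1:

```python
import math

alpha = [1.0, -1.0]      # hypothetical exponents, giving p_j * w_j**-1 (WSPT)
x = [3.0, 2.0]           # job attributes (p_j, w_j)

linear_in_logs = sum(a * math.log(v) for a, v in zip(alpha, x))
posynomial = math.prod(v ** a for a, v in zip(alpha, x))
# exp of the linear rule in log-attributes equals the posynomial rule, so
# sorting jobs by either expression yields the same order.
```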
[0111] As another example, aggregates of job attributes can be included as follows, thus including more information of how each job is related to the context given by the respective problem instance (in this example using a logarithm, but this is not needed):
[0112] In another example, linear rules in the original space of attributes are combined with posynomial-type rules:
ξ(x, D)=(x, y, log x, log y),
[0113] where the log is applied component-wise to the vector.
[0114] Instead of or in addition to computing a kernel over the job attributes, the computation of the kernel function KF can also include the computation of a degree of similarity between the sets of jobs corresponding to the respective inputs. For example, the degree of similarity may be computed using an intersection kernel. For example, the jobs may be represented as regions, e.g., spheres, in Euclidean space, and an intersection between the sets of regions corresponding to the respective sets of jobs may be computed. For example, the following intersection kernel may be used:
[0115] for any nonempty measurable subsets M.sub.1 and M.sub.2 of a Euclidean space, where vol(.Math.) denotes the volume. This is a normalized variant of the conventional intersection kernel.
[0116] The degree of similarity may be computed for volumes corresponding to the respective sets of jobs, from which the selected job may optionally be excluded. Excluding the job itself makes it possible to better distinguish between jobs of the same instance, because it leads to different representations for jobs from the same set of jobs that have different attribute vectors.
[0117] Mathematically, a representation of a job may for example be defined by a function τ as follows:
τ.sub.D(x)=(ξ(x, D), [D]\{x}),
[0118] where ξ is the attribute vector function, and [D] the set of jobs. Thus, the representation may comprise the attribute vector ξ(x,D) and the set of columns of D (possibly, excluding the given vector x), e.g., as a tuple.
[0119] A kernel function over this representation that uses both an attribute vector and a degree of similarity of job sets can for example be computed as:
K((z.sub.1, S.sub.1), (z.sub.2, S.sub.2))=K.sub.0(z.sub.1, z.sub.2)+γK.sub.int(∪.sub.p∈S.sub.1B(p, r), ∪.sub.p∈S.sub.2B(p, r)),
[0120] where K.sub.0 is a positive semidefinite kernel function over the Cartesian product of two copies of the Euclidean space containing the codomain of ξ, γ is a positive parameter, and B(p,r) denotes the ball centred at p with radius r. Both γ and r may be given parameters, e.g., are typically not learned as part of determining the priority function. Typically, K.sub.0 and K.sub.int are positive semidefinite kernels, so K is a positive semidefinite kernel as well.
[0121] The kernel function KF, K, may be computed approximately using, e.g., Monte Carlo methods. Interestingly, the use of Monte Carlo is effective in this setting because the set of attributes of a job is relatively small, for example, at most three or at most six attributes.
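A minimal Monte Carlo sketch of such a set-similarity computation is shown below. Since the exact normalization of the intersection kernel is not spelled out here, a Jaccard-style ratio vol(M.sub.1∩M.sub.2)/vol(M.sub.1∪M.sub.2) is assumed; all names are hypothetical.

```python
import random

def in_union(point, centers, r):
    # Membership test for a union of balls B(p, r).
    return any(sum((a - b) ** 2 for a, b in zip(point, p)) <= r * r
               for p in centers)

def mc_intersection_kernel(s1, s2, r, n=20000, seed=0):
    """Monte Carlo estimate of vol(U1 ∩ U2) / vol(U1 ∪ U2) for unions of
    balls around job attribute vectors (one possible normalization; the
    text only states that a normalized intersection kernel is used)."""
    rng = random.Random(seed)
    dim = len(s1[0])
    lo = [min(p[i] for p in s1 + s2) - r for i in range(dim)]
    hi = [max(p[i] for p in s1 + s2) + r for i in range(dim)]
    inter = union = 0
    for _ in range(n):
        z = [rng.uniform(a, b) for a, b in zip(lo, hi)]
        hit1, hit2 = in_union(z, s1, r), in_union(z, s2, r)
        inter += hit1 and hit2
        union += hit1 or hit2
    return inter / union if union else 0.0

k_same = mc_intersection_kernel([(0.0, 0.0)], [(0.0, 0.0)], r=1.0)  # identical sets
k_far = mc_intersection_kernel([(0.0, 0.0)], [(10.0, 0.0)], r=1.0)  # disjoint balls
```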
[0122] The inclusion of information about other jobs, for example, in the form of attribute aggregates and/or degrees of similarity, provides an advantage over standard priority functions. When using only attributes of the job itself, if job j.sub.1 of instance D.sup.1 and job j.sub.2 of instance D.sup.2 share the same vector of attributes, e.g., if D.sup.1.sub.j.sub.1=D.sup.2.sub.j.sub.2, then the two jobs necessarily receive the same priority, even though their positions in the respective target schedules may differ.
[0123] Interestingly, the inventors envisaged to use the kernel function K, KF, to compute a priority function. Given a representation x of a reference job, the function y↦K(x, y) is a function taking as input a representation of a job, and providing as output the result of comparing the job y to the reference job x according to the kernel function. The inventors envisaged that a priority function PF for a job may be constructed by combining one or more such respective comparisons of the job with respective reference jobs. In the example of this figure, the priority function is constructed as a function from the class of linear combinations of comparisons with respective reference jobs. Shown is a job Jk, 503, for which a priority value PRk, 513, is to be determined. Also shown are a number of reference jobs RJ1, 507, RJ2, 508, . . . , RJI, 509; in this example, three.
[0124] The priority function may be evaluated by performing respective comparisons of the job Jk to the respective reference jobs RJi according to the kernel function, e.g., the kernel function may be applied respectively to the job and the respective reference jobs. As a result, respective similarity values SIM1, 537, SIM2, 538, . . . , SIMI, 539, may be determined, e.g., respective inner products. The respective similarity values may be combined according to a combination operation CMB, 550, that in this example computes a linear combination of the respective similarity values SIMI according to respective coefficients λ1, 547, λ2, 548, . . . , λI, 549, resulting in the priority value PRk.
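This construction can be sketched as follows, assuming a hypothetical RBF kernel and illustrative coefficients and reference-job representations:

```python
import math

def rbf(x, y, gamma=1.0):
    # Kernel comparing two job representations (assumed RBF for this sketch).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def priority(x, reference_jobs, coeffs, kernel=rbf):
    # PR(x) = sum_i lambda_i * K(x_i, x): a linear combination of comparisons
    # of the job x with the reference jobs x_i.
    return sum(lam * kernel(ref, x) for lam, ref in zip(coeffs, reference_jobs))

refs = [(0.0, 1.0), (2.0, 2.0), (5.0, 0.5)]   # representations of RJ1..RJ3
lams = [1.0, -0.5, 2.0]                        # coefficients lambda_1..lambda_3
pr = priority((0.0, 1.0), refs, lams)          # dominated by the match with RJ1
```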
[0125] Mathematically, the above priority function may be written as <h(φ), κ(x)>, where h(φ) is an element of the so-called reproducing kernel Hilbert space (RKHS) of K, and where κ is the feature map of K. In this example, h(φ)=Σ.sub.i λ.sub.iκ(x.sub.i), where λ.sub.i are the coefficients and x.sub.i the representations of the respective reference jobs. It is noted that the feature map h(φ) corresponding to the priority function does not need to be explicitly evaluated; instead, the priority function may be evaluated in terms of invocations K(x.sub.i, x) of the kernel function using the respective representations of the reference jobs. However, for some types of kernel functions, explicitly representing the reference jobs x.sub.i and the job x in feature space may also be possible and can provide an alternative way of evaluating the priority function.
[0126]
[0127] In this example, in order to compute priority value PR, 513′, for a job Jk, 503, first, multiple respective linear combinations h.sub.i.sup.j of comparisons, 506, of the job Jk with reference jobs are computed using the linear priority function LPF, 510, of
[0128] In general, various different non-linear combination operations CMB may be used to increase expressivity of the priority function. In this example, a combination operation CMB of a particular class is used. A procedure for determining priority functions of this class is provided in
[0129] Priority value PR may be determined based on one or more decision boundary differences. A decision boundary difference is computed by computing a difference between a constant δ.sub.i.sup.j and a priority value <h.sub.i.sup.j, κ(x)>.sub.RKHS, PR.sub.i.sup.j. The constant δ.sub.i.sup.j, 561, is defined as part of the priority function. These decision boundary differences may be combined to obtain the priority value PR. For example, based on one or more of the decision boundary differences, one or more minimal decision boundary differences may be computed. A minimal decision boundary difference φ.sub.j(x) may be computed as a minimum of respective decision boundary differences, for example, φ.sub.j(x)=min.sub.i(δ.sub.i.sup.j−<h.sub.i.sup.j, κ(x)>.sub.RKHS).
[0130] These minimal decision boundary differences may be combined to obtain the priority value PR, for example, by computing a minimum of the respective minimal decision boundary differences, weighted by respective weighting factors λ.sub.j, 562, given as part of the priority function, for example, φ(x)=min.sub.j λ.sub.jφ.sub.j(x).
[0131] As also shown elsewhere, by adding a combination operation CMB, a broader and thus more expressive class of priority functions can be used to schedule jobs. In particular, using priority functions as described with respect to this figure, it can be guaranteed for a given set of training instances that the priority function completely reproduces the respective target solutions (at least if all jobs are distinct).
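The min-of-decision-boundary-differences construction can be sketched as follows, assuming the combination φ(x)=min.sub.j λ.sub.jφ.sub.j(x) and, for checkability, a simple linear kernel; the hyperplane data are hypothetical:

```python
def dot(a, b):
    # Linear kernel used for this sketch.
    return sum(u * v for u, v in zip(a, b))

def position_value(x, hyperplanes, kernel):
    # phi_j(x) = min_i (delta_i^j - <h_i^j, kappa(x)>), with each h_i^j given
    # implicitly by kernel coefficients over reference jobs.
    return min(delta - sum(c * kernel(ref, x) for c, ref in coeffs)
               for delta, coeffs in hyperplanes)

def combined_priority(x, position_fns, weights, kernel):
    # Assumed combination phi(x) = min_j lambda_j * phi_j(x).
    return min(w * position_value(x, hp, kernel)
               for w, hp in zip(weights, position_fns))

# One position function with two hyperplanes (delta, [(coefficient, reference)]):
hp1 = [(1.0, [(1.0, (1.0, 0.0))]),
       (2.0, [(1.0, (0.0, 1.0))])]
val = position_value((0.5, 0.5), hp1, dot)        # min(1 - 0.5, 2 - 0.5) = 0.5
comb = combined_priority((0.5, 0.5), [hp1], [2.0], dot)
```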
[0132]
[0133] In this example, priority functions are determined based on a set of multiple training jobs TJS, 600. No target schedule is typically known for the full set of training jobs TJS (or, at least it may not be used according to this example). For example, it may be computationally too expensive to compute an optimal or near-optimal schedule for the full set of jobs, as also discussed below.
[0134] From the set of multiple training jobs, one or more training instances may be determined. A training instance may be determined by first, in a selection operation Sel, selecting a set of jobs TIJ1, 601, TIJ2, 602, from the set of training jobs TJS. For example, the jobs of a training instance TIJi may be selected randomly from the set of training jobs. The respective sets of training instance jobs TIJ1, TIJ2 may be selected to be disjoint, but this is not necessary. For example, a sample size indicating the number of training instances to be generated, and an instance size indicating a number of jobs of a training instance TIJi may be fixed or given as hyperparameters. It is not necessary for all instances to have the same size. The number of jobs of a training instance is typically much smaller than the total number of training jobs TJS, for example, at most 10% or at most 5% of the size.
[0135] Interestingly, the training instances are selected to be small enough that it is feasible to determine a target schedule, as described below. For example, the instance size for a training instance TIJi, or for all training instances, may be at most or at least 25, at most or at least 100, or at most or at least 250 jobs. The sample size may be at most or at least 10, at most or at least 20, or at most or at least 50 training instances, for example. The priority function may be optimized essentially to reproduce the target schedules on the training instances.
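The sampling of training instances can be sketched as follows; the parameter names are hypothetical, and instances are drawn without replacement internally but need not be disjoint from each other:

```python
import random

def sample_instances(jobs, num_instances, instance_size, seed=0):
    """Draw small random sub-instances from a large set of training jobs
    (each instance has no repeated job; different instances may overlap)."""
    rng = random.Random(seed)
    return [rng.sample(jobs, instance_size) for _ in range(num_instances)]

all_jobs = list(range(1000))        # stand-ins for the job records of TJS
instances = sample_instances(all_jobs, num_instances=25, instance_size=20)
```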
[0136] The figure shows a scheduling operation Sched, 620, that may be applied to determine target schedules TIS1, 651, TIS2, 652, for the sets of jobs TIJ1, TIJ2, of the respective training instances. A target schedule TISi may define at least an ordering of the jobs TIJi; it may also specify additional scheduling information, such as starting times, an assignment of jobs to resources, etc.; but this additional scheduling information is not needed to determine the priority function.
[0137] There are several ways to implement scheduling operation Sched. In some embodiments, the target schedule may be determined by a procedure that determines an optimal or near-optimal schedule, using conventional techniques. Interestingly, in many cases, it is not feasible to apply this procedure to the full set of training jobs TJS, for example, because this may be too time-consuming. Since the training instances TIJi are smaller, it is however possible to apply the scheduling operation to them, and thus to use them to determine a priority function as described herein. The priority function may then be used for scheduling of large numbers of jobs, e.g., as described with respect to
[0138] However, it is not needed for scheduling operation Sched to output an optimal or near-optimal schedule. In some embodiments, scheduling operation Sched determines a schedule for a set of training jobs TIJi by performing each of the following zero or more times: [0139] determining the schedule according to a given priority function, e.g., a known priority function or a previously used priority function, e.g., using the techniques of
[0142] Accordingly, one or more schedules may be determined for a set of training jobs TIJi, and target schedule TISi may be selected as a best schedule (according to some metric) among the determined schedules. For different training instances, it may differ which scheduling operation returns the best schedule. The priority function may be determined as a single priority function that attempts to reproduce the respective best target schedules TISi for the respective training instances. Thus, for example, a known set of priority rules may be extended by additional priority rules that effectively combine and generalize existing rules.
[0143] Given the one or more training instances comprising respective sets of multiple training jobs TIJi ordered according to respective target schedules TISi, a priority function finding operation Find, 630, may be performed to determine a priority function PR, 610.
[0144] As also discussed elsewhere, operation Find may select the priority function PR from a class of priority functions determined in terms of a kernel function. The kernel function may be defined to compare representations of two respective jobs, not necessarily of the same problem instance. A priority function of the class of priority functions includes one or more comparisons according to the kernel function with respective reference jobs. The priority function PR may be determined by performing an optimization. The optimization may be configured to select the priority function PR from the class of priority functions. Given an earlier job that is ordered before a later job according to a target schedule TISi, the optimization may be configured to penalize the priority function for assigning a higher priority to the later job than to the earlier job. Several concrete ways of determining such priority functions are described, e.g., with respect to
[0145] As indicated by arrow 631 in the figure, the selecting Sel, scheduling Sched, and priority function determining Find may be repeated multiple times to determine multiple priority functions. For example, jobs may be scheduled using the multiple priority functions, as described, e.g., with respect to
[0146] The multiple priority functions may each use the same kernel function and/or be from the same class of priority functions and/or be determined according to the same procedure Find, but this is not needed.
[0147] An example implementation of the above procedure for determining priority functions is provided below. Here, E is used to denote a set of one or more training instances TIJi, TISi. A training instance is represented in the algorithm below by a matrix D=(D.sub.1, . . . , D.sub.n(D)), where the columns represent training jobs TIJi that are enumerated according to the target solution TISi.
TABLE-US-00002
Algorithm for determining priority functions
Input:
  Training jobs TJS representing a large instance D* of the scheduling problem.
  Algorithm A for determining target schedules TISi, e.g., using a list scheduling routine and/or an objective function in the form of a procedure for computing a value representing optimality of a schedule.
Hyperparameters:
  The number of rules, the sample size, and the instance size for instances in a training instance TIJi (usually much smaller than the number of columns in the original instance).
Output: A set of priority rules.
Step 1. Repeat a given number of times (equal to the number of rules to be generated):
  Using Sel, generate the given number of instances being subsets of D*, each of the given instance size. Let {tilde over (E)} be the obtained set of instances.
  E := ∅. For each {tilde over (D)} ∈ {tilde over (E)}:
    Apply Algorithm A, Sched, to {tilde over (D)} to find a target solution.
    Sort the columns of {tilde over (D)} in the order given by the target solution. Let D be the matrix obtained by sorting. Add D to E.
  Apply an algorithm Find for finding a priority rule to the training set E.
Step 2. Return the collection of rules found at Step 1.
[0148]
[0149] Shown in the figure are a number of training instances, that each comprise multiple jobs according to a target schedule. In particular, shown are jobs J11, 607, and J12, 608, of a first training instance, and a job J21, 609, of a further training instance. Also shown are respective coefficients for the respective jobs: coefficient λ11, 647 for job J11; coefficient λ12, 648 for job J12; and coefficient λ21, 649 for job J21. The training may comprise determining the coefficients for the respective jobs. The priority function may be determined as a linear combination of the respective jobs according to the respective coefficients.
[0150] To penalize the priority function for assigning a higher priority to later jobs than to earlier jobs, a loss function can be optimized that is defined over respective pairs of consecutive earlier and later jobs of the training set. Thus, the loss function may include respective contributions for respective pairs of consecutive jobs. The figure shows a contribution s11, 657, for the first two jobs of the first training instance, for example. Likewise, there may be a contribution for the second and third job of the first training instance, and for consecutive jobs of the other training instance. A contribution s11 may penalize pairs of earlier and later jobs in which the priority function assigns a higher priority to the earlier job than to the later job. A loss function based on such pairwise losses is relatively simple; for example, it need not compare priorities of non-consecutive jobs. Still, it represents optimality, since it encourages all jobs to be ordered correctly.
[0151] Mathematically, the loss function can for example be phrased as the following empirical risk minimization problem (ERMP):
where the optimization is over the set of linear combinations of comparisons of references φ(τ.sub.D(D.sub.j)) to training jobs D.sub.j; mathematically: the set of linear functions over the reproducing kernel Hilbert space of the kernel function K. (Instead of “+1” in the above ERMP, any positive value +ϵ may be used.) If the optimal value of this ERMP is zero and φ is an optimal solution, then the rule φ leads to the target schedule for each instance in the training set E. In this case φ is a solution to the linear system
∀D ∈ E: φ(τ.sub.D(D.sub.j))<φ(τ.sub.D(D.sub.j+1)),j=1, . . . , n(D)−1.
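The pairwise loss underlying the ERMP can be sketched as follows; the margin value and the toy rules are hypothetical:

```python
def empirical_risk(instances, phi, margin=1.0):
    """Pairwise hinge loss over consecutive jobs of each training instance
    (jobs listed in target-schedule order): a pair contributes whenever the
    earlier job's value is not below the later job's by at least the margin."""
    loss = 0.0
    for inst in instances:
        for earlier, later in zip(inst, inst[1:]):
            loss += max(0.0, margin + phi(earlier) - phi(later))
    return loss

inst = [[(1.0,), (3.0,), (5.0,)]]                  # one toy instance
risk_good = empirical_risk(inst, lambda j: j[0])   # gaps of 2 exceed the margin
risk_bad = empirical_risk(inst, lambda j: -j[0])   # every consecutive pair violated
```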
[0152] If the optimal value of the ERMP is nonzero, then there are instances in the training set E that are not ordered by the priority function according to the target schedule. As a consequence, when scheduling is performed based on φ, it is not guaranteed to induce the target schedule. However, also in such cases, an optimal or near-optimal solution φ to the optimization of the loss function may not violate too many inequalities defining job orders related to such instances, and so the objective values of solutions induced by φ can be close to the objective values of the target solutions, thus resulting in a high-quality schedule.
[0153] Various techniques may be used to optimize the loss function described above. Generally, in a training signal determining operation Sig, 650, a training signal γ11, 660, may be determined, based on which the trainable parameters of the optimization, comprising the coefficients λ.sub.ij for the respective reference jobs, may be updated in an updating operation Upd, 670. For example, gradient descent, stochastic gradient, or other conventional optimization techniques can be applied.
[0154] Interestingly, however, the inventors realized that for the specific optimization to be performed, namely the determining of coefficients λij of elements of a reproducing kernel Hilbert space, it is also possible to perform the optimization more efficiently. Namely, the optimization can be regarded as a convex optimization problem in the reproducing kernel Hilbert space. This optimization problem can be tackled by using a local optimization approach. In this approach, an earlier job and a later job, e.g., jobs J11 and J12, may be selected from the pairs of jobs such that they provide a contribution to the loss function. For example, the priority function assigns a higher priority to the later job J12 than to the earlier job J11, or in any case the difference in priority between J11 and J12 is not sufficiently high. The earlier job and later job may be selected such that they provide the largest absolute or relative contribution to the loss function, for example. Thus, the training signal γ11 determined in operation Sig may indicate a pair of jobs contributing to the loss function.
[0155] The updating Upd may then update the priority function to reduce the contribution of this pair to the loss function. In particular, the coefficients λ11, λ12 for the pair of jobs may be adapted to reduce the contribution. With respect to the ERMP formulated mathematically above, the updating may be implemented as a projection onto a convex set in the space comprising the priority function as an element of the reproducing kernel Hilbert space, and respective loss contributions of respective pairs of jobs. The updating may completely eliminate the contribution of the pair to the loss, but this is not needed. The pair may still provide a nonzero contribution to the loss in the eventual optimization result.
[0156] Mathematically, an optimization to determine the priority function by a series of projections onto convex sets may be implemented as follows:
TABLE-US-00003
Algorithm for finding priority rules as linear combinations of comparisons
Input:
  A training set E comprising one or more training instances.
  A mapping τ of jobs to kernel function inputs.
Output: A priority function φ*.
1. Initialize φ by any available method (e.g., with a constant function). Define:
   ∀D ∈ E, j = 1, . . . , n(D) − 1: s.sub.D,j := min{−1, φ(τ.sub.D(D.sub.j)) − φ(τ.sub.D(D.sub.j+1))}.
2. While a stopping condition is not satisfied (e.g., the number of iterations does not exceed a given limit and/or the empirical risk related to the current priority function φ is not zero):
   ∀D ∈ E, j = 1, . . . , n(D) − 1, update φ and s.sub.D,j by the updating operation, Upd.
[0157] In this example, instead of −1, generally, any value ϵ<0 can be used.
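A simplified reading of this projection scheme, in the style of a kernel perceptron with exact projection steps, can be sketched as follows. This is not the literal procedure above; the kernel and data are hypothetical.

```python
def fit_priority_coeffs(instances, kernel, margin=1.0, max_iter=200):
    """Repeatedly pick a violated consecutive pair (a before b in the target
    schedule) and project phi onto the half-space where phi(b) - phi(a) >=
    margin. phi is kept implicitly as kernel coefficients over training jobs."""
    support = []                                   # (coefficient, job) pairs
    phi = lambda x: sum(c * kernel(s, x) for c, s in support)
    for _ in range(max_iter):
        violated = None
        for inst in instances:
            for a, b in zip(inst, inst[1:]):
                gap = margin + phi(a) - phi(b)
                if gap > 0:
                    violated = (a, b, gap)
        if violated is None:
            return support                         # zero empirical risk reached
        a, b, gap = violated
        denom = kernel(a, a) - 2 * kernel(a, b) + kernel(b, b)
        if denom <= 0:
            break                                  # identical representations
        support += [(-gap / denom, a), (gap / denom, b)]   # exact projection step
    return support

# Toy 1-d jobs with a linear kernel; each instance lists jobs in target order.
linear = lambda x, y: x[0] * y[0]
coeffs = fit_priority_coeffs([[(1.0,), (2.0,)], [(0.0,), (3.0,)]], linear)
rule = lambda x: sum(c * linear(s, x) for c, s in coeffs)
```

After fitting, `rule` assigns strictly lower values to jobs that the target schedules place earlier.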
[0158] Interestingly, performing the optimization in terms of projections onto convex sets may be efficient to implement, and may converge to an optimal solution of the ERMP if the optimal value is zero. Otherwise, the returned solution provides a good approximate solution of the ERMP.
[0159] For example, as also discussed elsewhere, the representation
ξ(x, D)=(logx.sub.1, . . . , logx.sub.m)
with γ:=0 may be used to find a posynomial rule over the space of job attributes. In this case, a priority rule found according to the above algorithm may be denoted as Σ.sub.i=1.sup.m α.sub.ilogx.sub.i, corresponding to the rule defined by the posynomial Π.sub.i=1.sup.m x.sub.i.sup.α.sup.i.
[0160]
[0161] Shown in the example are one or more training instances each comprising multiple jobs ordered according to a target schedule. For illustrative purposes, shown is a training instance comprising jobs J11, 701; J12, 711; and J13, 712. A further training instance is also shown, comprising jobs J21, 702; J22, 713; and J23, 714. The respective training instances need not all have the same number of jobs.
[0162] Interestingly, in this figure, the priority function is based on respective position priority functions. A position priority function may be a function configured to separate jobs occurring at a given position in respective training instances from jobs not at this position, or at least from jobs occurring after the given position in the respective training instances. For example, shown in the figure is a first set of jobs FSJ, 700, occurring at the first position in respective training instances, and a second set of jobs SSJ, 710 occurring at other positions in the respective training instances. The position priority function may generally assign a value from a given domain to (most) jobs FSJ occurring at the respective position, and assign a value outside that domain to (most) other jobs SSJ. Such separation makes it possible to combine the respective position priority functions to obtain an overall priority function.
[0163] As shown in the figure, the position priority function may generally be obtained by performing a classifier training Class, 730, using the first set of jobs FSJ as a set of positive examples and the second set of jobs SSJ as a set of negative examples. Various conventional kernel-based classifiers may be used. The exact set of trainable parameters of the classifier training Class depends on how the position priority function is composed from invocations of the kernel function.
[0164] As is also discussed with respect to
[0165] An advantageous way of determining the set of parameters h.sub.i.sup.j, δ.sub.i.sup.j is by determining a binary classifier to distinguish the sets FSJ, SSJ as described with respect to
[0166] Applying the techniques of
[0167] Mathematically, the use of an inequality-based classifier may be described as follows. Let
[0168] A position priority function may be denoted as:
[0169] for each C.sub.j such that φ.sub.j is negative over C.sub.j and positive over all C.sub.i with i≠j. For example, ϕ.sub.j can be a polyhedral function as discussed with respect to
[0170] Respective position priority functions for respective positions may be combined into an overall priority function. Parameters λ.sub.j, 762, for making this combination may be determined in a combining operation Cmb, 750. The parameters can be weighting factors λ.sub.j as discussed with respect to
although other ways of combining the position priority functions are also possible. Generally, the position priority functions separate jobs at a given position from other jobs by assigning them a different subset of values, and the position priority function may use this separation to generally assign higher (or lower) priorities to jobs at earlier positions.
[0171] For example, weighting factors λ.sub.j may be determined as follows. Weighting factor λ.sub.1 for the first position may be selected as an arbitrary positive coefficient. The other coefficients may be defined recursively as λ.sub.j+1=λ.sub.jmin{|φ.sub.j(x)|:x ∈ C.sub.j}/((1+α)max{|φ.sub.j+1(x)|:x ∈ C.sub.j+1}),
[0172] where α>0 is a positive constant. Interestingly, it may be observed that if all jobs have different representations; if the position priority functions perfectly separate the jobs at the given position from other jobs; and if the above weighting factors are used, then a solution for the ERMP is found with a zero value of the objective function. That is, the solution is optimal and the priority function reproduces the target schedule for each training instance. This may be observed from the fact that, for any x.sub.j ∈ C.sub.j and x.sub.j+1 ∈ C.sub.j+1 we have
λ.sub.j|φ.sub.j(x.sub.j)|≥λ.sub.jmin{|φ.sub.j(x)|:x ∈ C.sub.j}=(1+α)λ.sub.j+1max{|φ.sub.j+1(x)|:x ∈ C.sub.j+1}>λ.sub.j+1|φ.sub.j+1 (x.sub.j+1)|,
from where it follows that
φ(x.sub.j)=λ.sub.jφ.sub.j(x.sub.j)<λ.sub.j+1φ.sub.j+1 (x.sub.j+1)=φ(x.sub.j+1)
[0173] because φ.sub.j(x.sub.j)<0 and φ.sub.j+1(x.sub.j+1)<0. Since for any instance D the τ.sub.D(D.sub.i) is contained in C.sub.i for i=1, . . . , n(D), this means that φ induces the sequence 1, . . . , n(D) for every D ∈ E because the above inequalities imply that
∀D ∈ E: φ(D.sub.j)<φ(D.sub.j+1), j=1, . . . , n(D)−1.
[0174] Zero empirical risk may be achieved by multiplying φ by a sufficiently large positive constant.
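The balance condition behind this recursion, λ.sub.jmin{|φ.sub.j(x)|:x ∈ C.sub.j}=(1+α)λ.sub.j+1max{|φ.sub.j+1(x)|:x ∈ C.sub.j+1}, can be turned into a small numeric sketch; the per-position bounds below are hypothetical:

```python
def weighting_factors(min_abs, max_abs, alpha=0.5, lam1=1.0):
    """Compute lam[j] so that lam[j] * min|phi_j| equals (1 + alpha) *
    lam[j+1] * max|phi_{j+1}|; min_abs[j] and max_abs[j] are the
    per-position bounds min{|phi_j|} and max{|phi_j|}."""
    lams = [lam1]
    for j in range(len(min_abs) - 1):
        lams.append(lams[-1] * min_abs[j] / ((1 + alpha) * max_abs[j + 1]))
    return lams

lams = weighting_factors(min_abs=[2.0, 1.0, 1.0], max_abs=[4.0, 4.0, 2.0])
# lam_2 = 1.0 * 2.0 / (1.5 * 4.0) = 1/3; lam_3 = (1/3) * 1.0 / (1.5 * 2.0) = 1/9
```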
[0175]
[0176] These figures relate to a computer-implemented method for training a binary classifier, wherein the classifier comprises a number m of hyperplanes 720 in a Hilbert space H. The method may comprise providing at least one training dataset of at least one pair of a numerical vector e ∈ E⊂D in an instance space D and an associated label. The label may take one of two different label values L0, L1, respectively. Further, the method may comprise mapping the one or more numerical vectors e of the training data set to points of a unit sphere S⊂H in the Hilbert space H according to a predetermined mapping τ∘κ: D→H, where the unit sphere S is defined according to a norm ∥.∥.sub.H (Hilbert norm), which is derived from an inner product < . , . >.sub.H of the Hilbert space H. Note that the symbols τ and κ are used with a different meaning in the description of
[0177] Further, the method may comprise training the classifier based on the numerical vectors e of the training data set. The training may comprise determining the number m of hyperplanes 720, and determining the hyperplanes 720. Further, the training may be directed to ensuring that the numerical vectors e ∈ E.sup.−⊂E of the training data that are mapped to the unit sphere S and have the zero label value L0 each lie on a side (e.g., an orientation side 730) of at least one of the hyperplanes 720.
[0178] As described herein, the proposed classifier can be trained offline. In offline training (i.e., the training data set is completely available at the beginning of training and is then used as a whole), 100% accuracy with respect to the training data set (in-sample accuracy) can be achieved (although this accuracy does not have to be achieved in all cases). The accuracy of the binary classifier outside the training data set (out-of-sample accuracy) can also be improved.
[0182] The proposed computer-implemented method of
[0183] Per hyperplane 720, the side may be an orientation side 730, and the training may further comprise determining the orientation side 730 per hyperplane 720. The hyperplanes 720 may each be defined by a hyperplane orientation function g_i(z) = ⟨h_i, z⟩_H, i=1, . . . , m, depending on a (generic, i.e., arbitrary) point z ∈ H of the Hilbert space H, being equal to a respective threshold value δ_i, i=1, . . . , m. The hyperplane orientation function g_i(z) is in each case the inner product ⟨·,·⟩_H associated with the Hilbert space H of a linear combination h_i = Σ_{e∈E} λ_{i,e}·τ∘κ(e), i=1, . . . , m, parameterized by respective coefficients λ_{i,e}, i=1, . . . , m, e ∈ E, of the numerical vectors e of the training data set mapped to the unit sphere S, and the point z ∈ H of the Hilbert space H. Furthermore, for each hyperplane 720, the orientation side 730 may be defined as an open half-space 740 to a side of the hyperplane 720 in the Hilbert space H, where the side is defined in each case in the direction of a gradient of the hyperplane orientation function g_i(z) = ⟨h_i, z⟩_H, i=1, . . . , m. Each half-space 740 is open in that it does not include the respective hyperplane 720. When, in a finite-dimensional real Hilbert space, the inner product ⟨·,·⟩_H = ⟨·, A·⟩ can be expressed by a Euclidean inner product ⟨·,·⟩ and a linear mapping A: H → H, a gradient 750 can be calculated as A^T·h_i. Furthermore, for each hyperplane 720 a hyperplane inequality g_i(z) ≤ δ_i for a (generic) point z ∈ H of the Hilbert space H can be defined which is violated exactly if the point z ∈ H lies on the orientation side 730 of the hyperplane 720.
[0184] Such a hyperplane inequality (in the edge case, the hyperplane inequality can be an equation) is g_i(z) ≤ δ_i for i=1, . . . , m. Thus, the orientation side 730 per hyperplane 720 can also be defined as the set of all points z ∈ H with g_i(z) > δ_i. Furthermore, per hyperplane 720 an inequality (in the edge case, an equation) f_i(x) = g_i(τ∘κ(x)) ≤ δ_i for a (generic) numerical vector x ∈ D of the instance space D can be defined, which is violated exactly if the vector mapped to the unit sphere S ⊂ H of the Hilbert space H by the given mapping τ∘κ: D → H violates the respective hyperplane inequality. The advantage is that these inequalities can form a classification system. The determination of the hyperplanes 720 and their respective orientation sides 730 (during the training) can be the determination of the coefficients λ_{i,e} per hyperplane 720 and the threshold values δ_i per hyperplane 720. As an example,
[0185] In offline training, the training may further be directed to ensure that none of the numerical vectors e ∈ E^+ ⊂ E of the training data set mapped to the unit sphere S having the first label value L1 lies on an orientation side 730 of the hyperplanes 720. This may prove advantageous in that it may result in a classification rule, and thus a classifier, that has 100% accuracy on the training data set (in-sample accuracy) after offline training. Since the orientation side 730 is defined as an open half-space 740, the numerical vectors e ∈ E^+ ⊂ E of the training data set mapped to the unit sphere S with the first label value L1 may also lie on a hyperplane 720.
[0186] In the offline training, the training may further be directed to ensure that none of the numerical vectors e ∈ E^+ ⊂ E of the training data set mapped to the unit sphere S with the first label value L1 lies outside of a polyhedron, and that all numerical vectors e ∈ E^− ⊂ E of the training data set mapped onto the unit sphere S with the zeroth label value L0 lie outside of the polyhedron, wherein the polyhedron in the Hilbert space H may be defined by the hyperplanes 720. The polyhedron may be defined, for example, by a (finite) intersection of all non-orientation sides of the hyperplanes 720, where the non-orientation side of a hyperplane 720 is defined as the complement in the Hilbert space H of the orientation side 730 of the hyperplane 720 and thus may represent a closed half-space. Particularly illustrative is the case in which the polyhedron is bounded and/or oriented. Such a polyhedron is exemplarily shown in
[0187] After offline training, for a numerical vector to be classified x ∈ D, the first label value L1 may be assigned if all inequalities evaluated on x are fulfilled, and the zeroth label value L0 may be assigned if at least one of the inequalities evaluated on x is violated. Such an assignment may constitute a classification rule. A numerical vector to be classified x ∈ D may in particular be taken from a pair of the training data set; thus, after offline training, the accuracy on the training data set (in-sample accuracy) can be evaluated. Each inequality f_i(x) ≤ δ_i, i=1, . . . , m, states that an instance function f_i(x), depending on an element x ∈ D of the instance space D, in particular the element to be classified, is less than or equal to the respective threshold value δ_i, i=1, . . . , m. For each hyperplane 720, the instance function f_i: D → ℝ, x ↦ f_i(x), is defined as a concatenation of the given mapping τ∘κ: D → H and the hyperplane orientation function g_i: H → ℝ, g_i(z) = ⟨h_i, z⟩_H, i=1, . . . , m, where the element x ∈ D of the instance space D, in particular the element to be classified, is mapped by the given mapping τ∘κ: D → H to the point z = τ∘κ(x) ∈ H of the Hilbert space H.
[0188] Such a classifier has a classification system defined by the m hyperplanes 720 in the Hilbert space H and their respective orientation sides 730 (or, equivalently, by the m inequalities), as well as by the classification rule described herein.
[0189] Mapping the numerical vectors e of the training data set may involve choosing a kernel function K: D×D → ℝ that maps two elements x, y ∈ D of the instance space D to the real numbers and that defines a reproducing kernel Hilbert space RKHS for the kernel function K and a feature map κ: D → RKHS from the instance space D into the reproducing kernel Hilbert space RKHS, wherein the inner product ⟨·,·⟩_RKHS associated with the reproducing kernel Hilbert space RKHS, after retraction to the instance space D, corresponds to the kernel function: K(x,y) = ⟨κ(x), κ(y)⟩_RKHS, ∀x, y ∈ D. The given mapping τ∘κ: D → H from the instance space D into the Hilbert space H can then be a concatenation of the feature map κ: D → RKHS and a sphere mapping τ: RKHS → S ⊂ H from the reproducing kernel Hilbert space RKHS into the unit sphere S of the Hilbert space H.
[0190] The sphere mapping τ: RKHS → S ⊂ H can transform an element h of the reproducing kernel Hilbert space RKHS into the Hilbert space H and normalize it to norm one, ∥τ(h)∥_H = 1; in particular, the sphere mapping may extend the element h of the reproducing kernel Hilbert space RKHS by a coordinate with the value one, (h, 1), with the unit sphere S centered at the origin of the Hilbert space H. In general, the reproducing kernel Hilbert space RKHS and the Hilbert space H can be infinite-dimensional Hilbert spaces. If they are finite-dimensional, the dimension of the Hilbert space H can be greater by one than the dimension of the reproducing kernel Hilbert space RKHS.
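As a concrete illustration of the sphere mapping, the inner product in H after τ∘κ can be written entirely in terms of the base kernel K. The following minimal Python sketch (the linear kernel and the example vectors are illustrative assumptions, not taken from the description) shows that every mapped point has Hilbert norm one:

```python
import math

def K(x, y):
    # Assumed base kernel for illustration: the linear kernel <x, y>.
    return sum(a * b for a, b in zip(x, y))

def sphere_inner(x, y):
    # tau extends kappa(x) by a coordinate with value 1 and normalises, so
    # <tau∘kappa(x), tau∘kappa(y)>_H = (K(x,y)+1) / sqrt((K(x,x)+1)(K(y,y)+1)),
    # and every mapped point lies on the unit sphere S of H.
    return (K(x, y) + 1.0) / math.sqrt((K(x, x) + 1.0) * (K(y, y) + 1.0))

x, y = (3.0, 4.0), (-4.0, 3.0)
print(sphere_inner(x, x))   # 1.0: mapped points have Hilbert norm one
print(sphere_inner(x, y))   # (0+1)/sqrt(26*26) = 1/26 for orthogonal x, y
```

Note that the extra 1-coordinate keeps the inner product strictly positive for a positive-definite base kernel, which is what makes the thresholded inequalities below meaningful even for orthogonal feature vectors.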
[0192] Next, the method steps of offline training are described. The determination of the number m of hyperplanes 720 as well as of the coefficients λ_{i,e} and the threshold values δ_i of the hyperplanes 720 may comprise at least three steps. In a first step, a temporary set E_0 can be defined, composed of all numerical vectors e ∈ E^− ⊂ D of the training data set having the zeroth label value L0, i.e., E_0 := E^−. A counter i for the hyperplanes 720 may be initialized to zero, i.e., i=0 (or with some other initial value for the counter).
[0193] In at least a second step, first an element e_0 ∈ E_0 of the temporary set E_0 can be selected. Furthermore, the coefficient λ_{i,e_0} designated by the counter i and the selected element e_0 can be set to one, and all other coefficients designated by the counter i can be set to zero:
λ_{i,e_0} := 1, λ_{i,e} := 0, ∀e ∈ E_0 ∪ E^+, e ≠ e_0
[0194] Furthermore, the threshold value δ_i designated by the counter i can be defined as half the sum of the instance function value f_i(e_0) at the selected element e_0 and the maximum max{f_i(e_1) : e_1 ∈ E^+} over the instance functions f_i(e_1) of the elements e_1 ∈ E^+ with the first label value L1; thus:
δ_i := 0.5·(f_i(e_0) + max{f_i(e_1) : e_1 ∈ E^+})
[0195] In at least a third step, all elements of the temporary set E_0 that do not satisfy all of the inequalities numbered up to the counter i can be removed. Furthermore, the counter i may be incremented by one, i.e., i := i+1; the method then jumps back to the second step if the temporary set E_0 is not empty, or the number m of hyperplanes 720 is set equal to the counter i, m := i, if the temporary set E_0 is empty.
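The three steps of the offline training just described can be sketched in code. The following Python fragment is a minimal illustration under assumptions not fixed by the text: an RBF base kernel, the coefficient λ_{i,e_0} set to one and all other coefficients to zero in each second step, and training data whose mapped positives and negatives are distinct (otherwise the loop need not terminate). It omits the cut augmentation refinement of [0197]:

```python
import math

def rbf(x, y, gamma=1.0):
    # Hypothetical base kernel K on the instance space (an RBF kernel is an
    # assumption; any positive-definite kernel could be used instead).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sphere_kernel(x, y, K=rbf):
    # Inner product after the mapping tau∘kappa (append a 1-coordinate in H,
    # then normalise): <tau∘kappa(x), tau∘kappa(y)>_H.
    return (K(x, y) + 1.0) / math.sqrt((K(x, x) + 1.0) * (K(y, y) + 1.0))

def train_offline(E_neg, E_pos, K=rbf):
    """Basic three-step offline loop (without cut augmentation): each pass
    selects e0 from the temporary set E0, fixes lambda_{i,e0} = 1 and all
    other coefficients to 0, sets delta_i halfway between f_i(e0) and the
    largest f_i value on the positives, then drops every element of E0
    that already violates some inequality."""
    E0 = list(E_neg)              # first step: temporary set E0 := E^-
    hyperplanes = []              # hyperplane i is stored as (e0, delta_i)
    while E0:
        e0 = E0[0]                # second step: select an element of E0
        f = lambda x: sphere_kernel(e0, x, K)
        delta = 0.5 * (f(e0) + max(f(e1) for e1 in E_pos))
        hyperplanes.append((e0, delta))
        # third step: keep only elements still satisfying all inequalities
        E0 = [x for x in E0
              if all(sphere_kernel(c, x, K) <= d for c, d in hyperplanes)]
    return hyperplanes

def classify(x, hyperplanes, K=rbf):
    # Classification rule: label L1 iff every inequality f_i(x) <= delta_i
    # holds; a single violated inequality yields label L0.
    return all(sphere_kernel(c, x, K) <= d for c, d in hyperplanes)
```

Termination follows because f_i(e_0) = 1 on the unit sphere while the maximum over distinct positives is strictly below 1, so the selected e_0 always violates its own new inequality and leaves E_0.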
[0196] This algorithm for offline training can be advantageous in that it can generate a classifier in a finite number of steps which, in contrast to conventional classifiers in the related art, can have (nearly) 100% accuracy on the training data set (in-sample accuracy). Another advantage of the generated classifier can be seen in the fact that the number of inequalities (of the classification system) of the classifier is not dependent on the (usually very large) number of training pairs. This can in particular simplify an application of the trained classifier.
[0197] To improve the practical behaviour of the algorithm and to reduce the number of inequalities, a so-called cut augmentation procedure can be used. For this purpose, immediately after the (or a) second step of the described procedure for offline training, a further first substep and a further second substep can be performed. In the further first substep of the second step, an auxiliary set
X_0 := {x_0 ∈ E_0 : f_i(x_0) > δ_i}
can be calculated as the set of elements x_0 ∈ E_0 of the temporary set E_0 whose instance function f_i(x_0), denoted by the counter i, is greater than the threshold value δ_i denoted by the counter i. Furthermore, the cardinality of the auxiliary set X_0 can be determined. Furthermore, the coefficients λ_{i,e} denoted by the counter i can be redetermined. For this, an expression F(h_i) can be maximized in an optimization (a maximum margin problem), where the expression F(h_i) can be defined as a quotient with a difference as numerator and the norm ∥h_i∥_H of the linear combination h_i of the numerical vectors e of the training data set mapped to the unit sphere S as denominator, where the difference is the minimum min{⟨h_i, τ∘κ(x_0)⟩_H : x_0 ∈ X_0} minus the maximum max{⟨h_i, τ∘κ(e_1)⟩_H : e_1 ∈ E^+}.
[0198] In a further second substep of the second step, the auxiliary set X_0 can be computed again as in the further first substep. Furthermore, the further first substep may be repeated if the recomputed auxiliary set X_0 has a larger cardinality than the auxiliary set X_0 calculated in the further first substep. Otherwise (i.e., if the recomputed auxiliary set X_0 does not have a larger cardinality than the auxiliary set X_0 calculated in the further first substep), a new threshold value δ_i denoted by the counter i can be computed as half the sum of the minimum over the instance functions f_i(x_0), which depend on the elements x_0 of the auxiliary set X_0, and the maximum over the instance functions f_i(e_1), which depend on the numerical vectors e_1 ∈ E^+ ⊂ D of the training data set having the first label value L1, i.e.:
δ_i := 0.5·(min{f_i(x_0) : x_0 ∈ X_0} + max{f_i(e_1) : e_1 ∈ E^+})
[0199] Then, the third step of the described procedure for offline training can proceed.
[0200] In the optimization (the maximum margin problem), in some examples a respective ϵ-approximate solution for the coefficients λ_{i,e} denoted by the counter i can be calculated, where an ϵ-approximate solution means that the expression F(h_i) evaluated at the ϵ-approximate solution is greater than or equal to the expression F(h_i) evaluated at an optimal solution divided by the sum of one and a non-negative real number ϵ, i.e.:
F(h_i^ϵ) ≥ F(h_i^OPT) / (1+ϵ)
[0201] In some examples, the optimization may be calculated using a two-sided von Neumann algorithm. This may comprise a first, second, third and fourth substep.
[0202] In a first substep, a first vector e_0 ∈ X_0 of the auxiliary set X_0 is chosen (the reuse of the variable name e_0 from the second step is unproblematic insofar as the optimization can have its own namespace) and mapped according to the given mapping τ∘κ: D → H to a first unit vector z_0 := τ∘κ(e_0) in the unit sphere S of the Hilbert space H. Furthermore, a second vector e_1 ∈ E^+ from the set E^+ ⊂ D of numerical vectors of the training data set having the first label value L1 can be selected and mapped according to the given mapping τ∘κ: D → H to a second unit vector z_1 := τ∘κ(e_1) in the unit sphere S of the Hilbert space H. Furthermore, a difference quantity h := z_0 − z_1 can be formed as the first unit vector z_0 minus the second unit vector z_1. Furthermore, a non-negative real number ϵ can be chosen.
[0203] At least a second substep may comprise choosing a further first vector e_0′ ∈ X_0 of the auxiliary set X_0 such that the inner product ⟨·,·⟩_H of the Hilbert space H of the difference quantity h and the image of e_0′ under the given mapping τ∘κ: D → H is minimized over X_0, i.e.:
e_0′ ∈ argmin{⟨h, τ∘κ(x_0)⟩_H : x_0 ∈ X_0}
[0204] Furthermore, a further second vector e_1′ ∈ E^+ of the set E^+ ⊂ D of numerical vectors of the training data set having the first label value L1 may be chosen such that the inner product ⟨·,·⟩_H of the Hilbert space H of the difference quantity h and the image of e_1′ under the given mapping τ∘κ: D → H is maximized over E^+ (also denoted as auxiliary set X_1), i.e.:
e_1′ ∈ argmax{⟨h, τ∘κ(e_1)⟩_H : e_1 ∈ E^+}
[0205] In at least a third substep, a convex quadratic problem with a first real variable α_0 ∈ [0,1] in the unit interval and a second real variable α_1 ∈ [0,1] in the unit interval can be solved. For this purpose, a first auxiliary variable z_0′ in the Hilbert space H can be defined as a first sum
z_0′ = (1−α_0)·z_0 + α_0·τ∘κ(e_0′)
of two terms, where the first term is the first unit vector z_0 multiplied by one minus the first real variable α_0, and the second term is the further first vector e_0′, mapped by τ∘κ: D → H into the unit sphere S of the Hilbert space H, multiplied by the first real variable α_0. Furthermore, a second auxiliary variable z_1′ in the Hilbert space H can be defined as a second sum
z_1′ = (1−α_1)·z_1 + α_1·τ∘κ(e_1′)
of two further terms, where the first of these is the second unit vector z_1 multiplied by one minus the second real variable α_1, and the second is the further second vector e_1′, mapped by τ∘κ: D → H into the unit sphere S of the Hilbert space H, multiplied by the second real variable α_1. Furthermore, the first real variable α_0 and the second real variable α_1 can be determined such that the norm ∥z_0′ − z_1′∥_H of the difference of the first auxiliary variable z_0′ and the second auxiliary variable z_1′ is minimized.
[0206] In at least a fourth substep, the first unit vector z_0 can be replaced by the first auxiliary variable z_0′ and the second unit vector z_1 by the second auxiliary variable z_1′, where z_0′ is computed with the first real variable α_0 determined in the third substep and z_1′ with the second real variable α_1 determined in the third substep. Furthermore, the difference quantity h can be set to the first unit vector z_0 minus the second unit vector z_1, i.e.:
h := z_0 − z_1
[0207] Furthermore, it can be checked whether a further expression is greater than or equal to the inverse of the sum of one and the non-negative real number ϵ, i.e.:
(min{⟨h, τ∘κ(x_0)⟩_H : x_0 ∈ X_0} − max{⟨h, τ∘κ(e_1)⟩_H : e_1 ∈ E^+}) / ∥h∥_H² ≥ 1/(1+ϵ),
where the further expression is defined as a further quotient with a further difference as numerator and the square of the norm ∥h∥_H of the difference quantity h as denominator, the further difference being the minimum over X_0 of the inner products of h with the mapped elements minus the corresponding maximum over E^+. If the check succeeds, the iteration can stop; otherwise, the method can return to the second substep.
[0208] The images of the auxiliary sets X.sub.0, X.sub.1 under the given mapping τºκ:D.fwdarw.H may be linearly separable. This condition can always be guaranteed when called from the main algorithm.
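In finite dimensions, the four substeps can be sketched directly on already-mapped points, which sidesteps the kernel bookkeeping. The following Python sketch is an illustration under stated assumptions: it represents z_0 and z_1 explicitly as vectors, and it approximates the joint solve of the two-variable convex quadratic problem of the third substep by a few rounds of exact coordinate descent, which is a simplification:

```python
import numpy as np

def seg_min(p, a, b):
    # alpha in [0,1] minimising ||(1-alpha)*a + alpha*b - p||.
    d = b - a
    dd = float(d @ d)
    return 0.0 if dd == 0.0 else float(np.clip(((p - a) @ d) / dd, 0.0, 1.0))

def two_sided_von_neumann(X0, X1, eps=1e-6, max_iter=1000):
    """Finite-dimensional sketch of the four substeps: X0 and X1 hold
    already-mapped points (one per row), assumed linearly separable; the
    returned pair approximates the closest points of the convex hulls."""
    X0, X1 = np.asarray(X0, float), np.asarray(X1, float)
    z0, z1 = X0[0].copy(), X1[0].copy()      # substep 1: initial vectors
    for _ in range(max_iter):
        h = z0 - z1                          # difference quantity
        u = X0[int(np.argmin(X0 @ h))]       # substep 2: minimiser over X0
        v = X1[int(np.argmax(X1 @ h))]       #            maximiser over X1
        # stopping test of substep 4: (min - max)/||h||^2 >= 1/(1+eps)
        if (np.min(X0 @ h) - np.max(X1 @ h)) / (h @ h) >= 1.0 / (1.0 + eps):
            break
        a0 = a1 = 0.0                        # substep 3: choose alpha_0, alpha_1
        for _ in range(10):                  # coordinate descent on the 2-var QP
            a0 = seg_min((1 - a1) * z1 + a1 * v, z0, u)
            a1 = seg_min((1 - a0) * z0 + a0 * u, z1, v)
        z0 = (1 - a0) * z0 + a0 * u          # substep 4: replace the iterates
        z1 = (1 - a1) * z1 + a1 * v
    return z0, z1
```

The stopping quotient is at most one because ⟨h, z_0⟩ − ⟨h, z_1⟩ = ∥h∥², so reaching 1/(1+ϵ) certifies an ϵ-approximate margin.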
[0209] A trained binary classifier may assign, as described, to each element x ∈ D of the instance space D, in particular an element to be classified x ∈ D, one of the two different label values L0, L1. Thus, a computer-implemented method for using a binary classifier is provided, wherein the binary classifier has been trained as described.
[0211] The method 800 may comprise, in an operation titled “ACCESS TRAINING DATA”, accessing 810 training data comprising a set of multiple training jobs. The method may further comprise, in an operation titled “OBTAIN TRAINING INSTANCES”, obtaining 820 one or more training instances from the training data. A training instance may comprise multiple jobs of the set of training jobs ordered according to a target schedule. The multiple jobs may comprise an earlier job ordered before a later job.
[0212] The method 800 may comprise, in an operation titled “OBTAIN KERNEL FUNCTION”, obtaining 830 a kernel function. The kernel function may be defined to compare representations of two respective jobs. A class of priority functions may be defined in terms of the kernel function. A priority function of the class of priority functions may include comparison according to the kernel function with one or more reference jobs.
[0213] The method 800 may comprise, in an operation titled “OPTIMIZE PRIORITY FUNCTION”, determining 840 the priority function by performing an optimization. The optimization may be configured to select the priority function from the class of priority functions. The optimization 840 may be configured to penalize the priority function for assigning a higher priority to the later job than to the earlier job.
[0214] The method 800 may comprise, in an operation titled “OUTPUT PRIORITY FUNCTION”, outputting 850 the determined priority function from the class of priority functions for use in the manufacturing or logistics process.
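The operations 810-850 can be sketched as follows. The hinge loss, the gradient-ascent update, and the choice of the training jobs themselves as reference jobs are all illustrative assumptions; the description only requires that the optimization penalize the priority function for assigning a higher priority to the later job than to the earlier job:

```python
import math

def rbf(x, y, gamma=1.0):
    # Hypothetical kernel comparing two job representations (assumption:
    # RBF over numeric job feature vectors).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def train_priority(instances, references, lr=0.1, epochs=200):
    """Sketch of method 800: fit the coefficients of a priority function
    phi(x) = sum_r w_r * K(x, r) over fixed reference jobs by penalizing,
    via a hinge loss, every consecutive pair of a training instance in
    which the later job would receive a priority at least as high as the
    earlier job's."""
    w = [0.0] * len(references)
    def phi(x):
        return sum(wr * rbf(x, r) for wr, r in zip(w, references))
    for _ in range(epochs):
        for jobs in instances:               # jobs ordered by target schedule
            for earlier, later in zip(jobs, jobs[1:]):
                if phi(earlier) - phi(later) < 1.0:   # margin violated
                    for k, r in enumerate(references):
                        w[k] += lr * (rbf(earlier, r) - rbf(later, r))
    return w, phi
```

After training, jobs earlier in the target schedule receive strictly higher priority values, which is exactly the property the optimization 840 rewards.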
[0216] The method 900 may comprise, in an operation titled “OBTAIN PRIORITY FUNCTION”, obtaining 910 data representing a priority function. The priority function may have been previously trained as described herein, e.g., according to method 800 of
[0217] The method 900 may comprise, in an operation titled “OBTAIN JOBS”, obtaining 920 job status data comprising representations of multiple jobs to be scheduled.
[0218] The method 900 may comprise, in an operation titled “EVALUATE PRIORITY FUNCTION”, evaluating 930 the priority function on the multiple respective jobs to obtain multiple respective priority values. The evaluating 930 of the priority function on a selected job of the multiple respective jobs may comprise, in an operation titled “EVALUATE KERNEL FUNCTION”, evaluating 940 the kernel function to compare the selected job to the one or more reference jobs.
[0219] The method 900 may comprise, in an operation titled “DETERMINE SCHEDULE”, determining 950 a schedule for the multiple jobs based on the priority values.
[0220] The method 900 may comprise, in an operation titled “OUTPUT SCHEDULE”, outputting 960 scheduling data representing the schedule to enable the multiple jobs to be carried out according to the schedule. In particular, the method may comprise carrying out the multiple jobs according to the schedule.
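Steps 910 to 960 reduce to evaluating the priority function on each job and sorting. The following Python sketch assumes one concrete, hypothetical form of a trained priority function, namely a weighted sum of kernel comparisons with reference jobs; the two-element job tuples and the linear kernel are illustrative only:

```python
def make_priority(kernel, references, weights, bias=0.0):
    # Assumed form of a trained priority function: a weighted sum of kernel
    # comparisons of the job with fixed reference jobs (plus a bias term).
    def priority(job):
        return bias + sum(w * kernel(job, r)
                          for w, r in zip(weights, references))
    return priority

def schedule(jobs, priority):
    """Sketch of steps 930-960: evaluate the priority function on every job
    to be scheduled and dispatch jobs in order of decreasing priority."""
    return sorted(jobs, key=priority, reverse=True)

# toy usage with a hypothetical linear kernel over (weight, processing-time)
# job representations; the numbers are purely illustrative
dot = lambda a, b: sum(x * y for x, y in zip(a, b))
prio = make_priority(dot, references=[(1.0, -0.5)], weights=[1.0])
print(schedule([(3.0, 1.0), (1.0, 2.0), (2.0, 3.0)], prio))
# → [(3.0, 1.0), (2.0, 3.0), (1.0, 2.0)]
```

With this particular reference job, the learned rule resembles a weighted-shortest-processing-time dispatch: heavier, shorter jobs are scheduled first.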
[0221] It will be appreciated that, in general, the operations of method 800 of
[0222] The method(s) may be implemented on a computer as a computer implemented method, as dedicated hardware, or as a combination of both. As also illustrated in
[0224] This example shows an instance of a scheduling problem Pk | r_j | Σ_j w_j·C_j (identical parallel machines with release dates, minimizing the total weighted completion time) with 50 machines and 1000 jobs.
[0228] Examples, embodiments or optional features, whether indicated as non-limiting or not, are not to be understood as limiting the present invention.
[0229] It should be noted that the above-mentioned embodiments illustrate rather than limit the present invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the present invention. Use of the verb “comprise” and its conjugations does not exclude the presence of elements or stages other than those stated. The article “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. Expressions such as “at least one of” when preceding a list or group of elements represent a selection of all or of any subset of elements from the list or group. For example, the expression, “at least one of A, B, and C” should be understood as including only A, only B, only C, both A and B, both A and C, both B and C, or all of A, B, and C. The present invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device described as including several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are described separately does not indicate that a combination of these measures cannot be used to advantage.