System and method for filtering a data set and/or selecting at least one item of the data set
20230106165 · 2023-04-06
CPC classification
G06F2119/02
PHYSICS
Abstract
A computer-implemented method of generating a visual representation of items comprises: selecting a pair of items m and n, each comprising parameters representing a property; selecting a pair of parameters p and q, where a.sub.mp is the parameter p of item m, b.sub.mq is the parameter q of item m, a.sub.np is the parameter p of item n, and b.sub.nq is the parameter q of item n; calculating a pair of weights w.sub.p and w.sub.q based on a.sub.mp, b.sub.mq, a.sub.np and b.sub.nq; storing the pair of weights if both weights are positive; determining a first vertex item having the greatest value for the parameter p, a second vertex item having the greatest value for the parameter q, and a third vertex item having the greatest value for a parameter r; generating a plurality of points based on the stored and determined values; and displaying a geometric shape comprising the plurality of points.
Claims
1. A computer-implemented method of generating a visual representation of items, the method comprising: (a) selecting, at an operating system, a pair of items m and n from a plurality of items, wherein each item of the plurality of items comprises a plurality of parameters, each parameter of the plurality of parameters representing a property of the plurality of items; (b) for the selected pair of items m and n, selecting, at the operating system, a pair of parameters p and q from the plurality of parameters, wherein a.sub.mp is the parameter p of the item m, b.sub.mq is the parameter q of the item m, a.sub.np is the parameter p of the item n, and b.sub.nq is the parameter q of the item n; (c) calculating, at the operating system, a pair of weights w.sub.p and w.sub.q wherein w.sub.p is calculated based on a.sub.mp, b.sub.mq, a.sub.np and b.sub.nq, and wherein w.sub.q is calculated based on a.sub.mp, b.sub.mq, a.sub.np and b.sub.nq; (d) if w.sub.p>0 and w.sub.q>0, storing, in a memory by the operating system, the pair of weights w.sub.p and w.sub.q and the pair of items m and n; (e) determining, at the operating system, a first vertex item, a second vertex item and a third vertex item, wherein the first vertex item is the item of the plurality of items comprising the greatest value for the parameter p, the second vertex item is the item of the plurality of items comprising the greatest value for the parameter q, and the third vertex item is the item of the plurality of items comprising the greatest value for a parameter r from the plurality of parameters; (f) generating, at the operating system, a plurality of points based on the stored pair of weights w.sub.p and w.sub.q and the pair of items m and n, the first vertex item, the second vertex item and the third vertex item; and (g) displaying, in a display screen by the operating system, the visual representation of the plurality of items by displaying a plot comprising a geometric shape and comprising the plurality of points.
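Step (e) amounts to an argmax over one parameter at a time. The following is a minimal Python sketch, assuming each item is stored as a list of numeric parameters indexed by position and that larger values are better; the function name `vertex_items` and the dictionary layout are hypothetical, and ties are resolved in favour of the first item encountered.

```python
from typing import Dict, List, Sequence


def vertex_items(items: Dict[str, Sequence[float]], p: int, q: int, r: int) -> List[str]:
    """Return the names of the first, second and third vertex items.

    `items` maps a (hypothetical) item name to its parameter vector; p, q and r
    are positional indices into that vector.
    """
    vertices = []
    for index in (p, q, r):
        # max() keeps the first maximal key, so ties follow insertion order.
        vertices.append(max(items, key=lambda name: items[name][index]))
    return vertices


if __name__ == "__main__":
    toy_items = {"A": [0.9, 0.1, 0.2], "B": [0.3, 0.8, 0.1], "C": [0.2, 0.4, 0.7]}
    print(vertex_items(toy_items, 0, 1, 2))  # ['A', 'B', 'C']
```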
2. The computer-implemented method according to claim 1, the method further comprising the steps of: (I) selecting, at the operating system, a set of three items i, j and k from the plurality of items; (II) for the selected set of three items i, j and k, selecting, at the operating system, a set of three parameters p, q and r from the plurality of parameters, wherein a.sub.ip is the parameter p of the item i, b.sub.iq is the parameter q of the item i, c.sub.ir is the parameter r of the item i, a.sub.jp is the parameter p of the item j, b.sub.jq is the parameter q of the item j, c.sub.jr is the parameter r of the item j, a.sub.kp is the parameter p of the item k, b.sub.kq is the parameter q of the item k, and c.sub.kr is the parameter r of the item k; (III) calculating, at the operating system, a set of three weights w.sub.p, w.sub.q and w.sub.r wherein w.sub.p is calculated based on a.sub.ip, b.sub.iq, c.sub.ir, a.sub.jp, b.sub.jq, c.sub.jr, a.sub.kp, b.sub.kq, and c.sub.kr, w.sub.q is calculated based on a.sub.ip, b.sub.iq, c.sub.ir, a.sub.jp, b.sub.jq, c.sub.jr, a.sub.kp, b.sub.kq, and c.sub.kr and w.sub.r is calculated based on a.sub.ip, b.sub.iq, c.sub.ir, a.sub.jp, b.sub.jq, c.sub.jr, a.sub.kp, b.sub.kq, and c.sub.kr; and (IV) if w.sub.p>0, w.sub.q>0 and w.sub.r>0, storing, in the memory by the operating system, the set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k; wherein step (f) further comprises generating the plurality of points based on the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k.
3. The computer-implemented method according to claim 1 further comprising repeating steps (a) through (d) for each pair of items m and n from the plurality of items.
4. The computer-implemented method according to claim 1 further comprising repeating steps (b) through (d) for each pair of parameters p and q from the plurality of parameters.
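Claims 3 and 4 describe exhaustive repetition of steps (a) through (d) over every pair of items and every pair of parameters. A minimal Python sketch of that loop follows, assuming the weight calculation of step (c) is supplied as a callable that returns `None` when no unique pair of weights exists; `collect_pair_weights` and the tuple layout of the stored records are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Step (c): maps (a_mp, b_mq, a_np, b_nq) to (w_p, w_q), or None if undefined.
WeightFn = Callable[[float, float, float, float], Optional[Tuple[float, float]]]
Record = Tuple[str, str, int, int, float, float]


def collect_pair_weights(items: Dict[str, Sequence[float]], weight_fn: WeightFn) -> List[Record]:
    """Repeat steps (a)-(d) for every item pair and every parameter pair."""
    stored: List[Record] = []
    n_params = len(next(iter(items.values())))
    for m, n in combinations(items, 2):  # claim 3: each pair of items
        for p, q in combinations(range(n_params), 2):  # claim 4: each pair of parameters
            weights = weight_fn(items[m][p], items[m][q], items[n][p], items[n][q])
            if weights is None:
                continue
            w_p, w_q = weights
            if w_p > 0 and w_q > 0:  # step (d): keep strictly positive weights only
                stored.append((m, n, p, q, w_p, w_q))
    return stored
```

With three items of three parameters each, the loop visits 3 × 3 = 9 combinations of an item pair and a parameter pair.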
5. The computer-implemented method according to claim 2 further comprising repeating the steps (I) through (IV) for each set of three items i, j and k from the plurality of items.
6. The computer-implemented method according to claim 2 further comprising repeating steps (II) through (IV) for each set of three parameters p, q and r from the plurality of parameters.
7. The computer-implemented method according to claim 1 wherein the pair of weights w.sub.p and w.sub.q in step (c) is calculated such that:
$$w_p a_{mp} + w_q b_{mq} = w_p a_{np} + w_q b_{nq}, \qquad w_p + w_q = 1.$$
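Subtracting the two sides of the first condition in claim 7 gives w_p·d_p + w_q·d_q = 0 with d_p = a_mp − a_np and d_q = b_mq − b_nq; substituting w_q = 1 − w_p yields w_p = d_q / (d_q − d_p) and w_q = −d_p / (d_q − d_p). The sketch below is one possible Python rendering of that derivation together with the positivity test of step (d); it is not taken from the disclosure, and returning `None` when d_q = d_p (no unique solution) is an assumption.

```python
from typing import Optional, Tuple


def pair_weights(a_mp: float, b_mq: float, a_np: float, b_nq: float) -> Optional[Tuple[float, float]]:
    """Solve w_p*a_mp + w_q*b_mq == w_p*a_np + w_q*b_nq with w_p + w_q == 1."""
    d_p = a_mp - a_np
    d_q = b_mq - b_nq
    denom = d_q - d_p
    if denom == 0:
        # Either no solution or infinitely many: no unique pair of weights.
        return None
    return d_q / denom, -d_p / denom


def is_stored(weights: Optional[Tuple[float, float]]) -> bool:
    """Step (d): store only when both weights are strictly positive."""
    return weights is not None and weights[0] > 0 and weights[1] > 0


if __name__ == "__main__":
    w = pair_weights(3.0, 1.0, 1.0, 2.0)  # item m = (3, 1), item n = (1, 2)
    print(w, is_stored(w))  # approximately (0.333, 0.667) True
```

When item m is better than item n in both parameters (for example m = (3, 2) and n = (1, 1)), the solution contains a negative weight and step (d) discards it.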
8. The computer-implemented method according to claim 2 wherein the set of three weights w.sub.p, w.sub.q and w.sub.r in step (III) is calculated such that:
$$\begin{aligned}
w_p a_{ip} + w_q b_{iq} + w_r c_{ir} &= w_p a_{jp} + w_q b_{jq} + w_r c_{jr},\\
w_p a_{ip} + w_q b_{iq} + w_r c_{ir} &= w_p a_{kp} + w_q b_{kq} + w_r c_{kr},\\
w_p + w_q + w_r &= 1.
\end{aligned}$$
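The three conditions of claim 8 form a 3 × 3 linear system in (w_p, w_q, w_r): two rows of parameter differences set to zero and one row of ones set to one. The sketch below solves it with Cramer's rule in plain Python; the solver choice, the function names and the convention of returning `None` for a singular system are assumptions rather than details given in the claims.

```python
from typing import List, Optional, Sequence, Tuple


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triple_weights(item_i: Sequence[float], item_j: Sequence[float],
                   item_k: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Each item is given as (a_p, b_q, c_r), its values for parameters p, q and r."""
    rows = [
        [item_i[t] - item_j[t] for t in range(3)],  # score(i) - score(j) = 0
        [item_i[t] - item_k[t] for t in range(3)],  # score(i) - score(k) = 0
        [1.0, 1.0, 1.0],                            # w_p + w_q + w_r = 1
    ]
    rhs = [0.0, 0.0, 1.0]
    d = det3(rows)
    if d == 0:
        return None  # singular system: no unique set of weights
    weights = []
    for col in range(3):
        # Cramer's rule: replace column `col` by the right-hand side.
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(rows)]
        weights.append(det3(replaced) / d)
    return weights[0], weights[1], weights[2]
```

For items i = (4, 1, 1), j = (1, 3, 1) and k = (2, 2, 2) this yields weights (1/3, 1/2, 1/6), all positive, under which each of the three items scores 2.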
9. The computer-implemented method according to claim 1 wherein generating the plurality of points in step (f) further comprises: calculating, at the operating system, a score value of the stored pair of weight values w.sub.p and w.sub.q and the pair of items m and n, wherein said score value is calculated as a linear combination of w.sub.p and w.sub.q and the plurality of parameters of one of the pair of items m and n; and determining whether to discard the stored pair of weight values w.sub.p and w.sub.q and the pair of items m and n based on the calculated score value; or the computer-implemented method according to claim 2 wherein generating the plurality of points in step (f) further comprises: calculating, at the operating system, a score value of the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k, wherein said score value is calculated as a linear combination of w.sub.p, w.sub.q and w.sub.r and the plurality of parameters of one of the three items i, j and k; and determining whether to discard the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k based on the calculated score value.
10. The computer-implemented method according to claim 9 wherein: determining whether to discard the stored pair of weight values w.sub.p and w.sub.q and the pair of items m and n further comprises: calculating, at the operating system, another score value for each item of the plurality of items as a linear combination of the stored pair of weight values w.sub.p and w.sub.q and the plurality of parameters of one of the pair of items m and n; obtaining, at the operating system, a comparison value based on the score value and the another score value; and determining whether to discard the stored pair of weight values w.sub.p and w.sub.q and the pair of items m and n based on the comparison value; or wherein determining whether to discard the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k further comprises: calculating, at the operating system, another score value for each item of the plurality of items as a linear combination of the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the plurality of parameters of one of the three items i, j and k; obtaining, at the operating system, a comparison value based on the score value and the another score value; and determining whether to discard the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k based on the comparison value.
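Claims 9 and 10 leave open how the comparison value is formed and when a weight set is discarded. One plausible reading, sketched below in Python purely as an assumption, treats the score as the weighted sum of the reference item's parameters and discards the weight set whenever some other item scores strictly higher, so that only weight sets under which the selected items are best among all items survive. The names `weighted_score` and `keep_weight_set` and the tolerance `tol` are hypothetical; the same function covers the two-weight and three-weight cases.

```python
from typing import Dict, Sequence


def weighted_score(weights: Sequence[float], values: Sequence[float]) -> float:
    """Linear combination of weights and parameter values."""
    return sum(w * x for w, x in zip(weights, values))


def keep_weight_set(weights: Sequence[float], param_idx: Sequence[int],
                    reference_item: Sequence[float],
                    items: Dict[str, Sequence[float]], tol: float = 1e-9) -> bool:
    """Return False if any item outscores the reference item under `weights`.

    `param_idx` holds the positions of (p, q) or (p, q, r); `reference_item`
    is one of the items m, n (or i, j, k) that produced the weights.
    """
    score = weighted_score(weights, [reference_item[t] for t in param_idx])
    for values in items.values():
        another = weighted_score(weights, [values[t] for t in param_idx])
        comparison = another - score  # hypothetical comparison value
        if comparison > tol:
            return False
    return True
```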
11. The computer-implemented method according to claim 1 wherein each parameter of the plurality of parameters further comprises a direction value and wherein the method further comprises performing, at the operating system, a transformation of at least one parameter of at least one item of the plurality of items based on the direction of the at least one parameter.
12. The computer-implemented method according to claim 11, wherein performing the transformation comprises performing a sign change operation of the at least one parameter of the at least one item.
13. The computer-implemented method according to claim 11, wherein the direction value of a parameter indicates one of two directions of improvement for said parameter and wherein performing the transformation of the at least one parameter based on the direction comprises performing the transformation if the direction value indicates a determined direction of improvement.
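Claims 11 to 13 describe a sign change applied to a parameter depending on its direction of improvement. Because step (e) looks for the greatest values, one natural reading is to negate every parameter for which smaller values are better. A minimal Python sketch, assuming directions are encoded with the hypothetical labels `"max"` and `"min"`:

```python
from typing import Dict, List, Sequence

VALID_DIRECTIONS = {"max", "min"}  # hypothetical encoding of the direction value


def orient_parameters(items: Dict[str, Sequence[float]],
                      directions: Sequence[str]) -> Dict[str, List[float]]:
    """Negate parameters whose direction of improvement is 'min' (sign change)."""
    unknown = set(directions) - VALID_DIRECTIONS
    if unknown:
        raise ValueError(f"unknown direction values: {sorted(unknown)}")
    return {
        name: [-x if d == "min" else x for x, d in zip(params, directions)]
        for name, params in items.items()
    }
```

After this transformation, "greatest value" means "best value" for every parameter, so the rest of the method needs no special cases.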
14. The computer-implemented method according to claim 1, wherein the plot is a ternary plot and each item of the plurality of items identifies an experimental model.
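On a ternary plot, each stored weight set can be drawn as a point whose barycentric coordinates are the weights, with the three vertex items of step (e) at the corners. The Python sketch below uses the standard barycentric-to-Cartesian mapping onto an equilateral triangle of side 1; the corner placement is an arbitrary assumption, and a stored pair (w_p, w_q) is treated as a point with w_r = 0, which lies on the edge between the first and second vertex.

```python
import math
from typing import Tuple

# Hypothetical corner layout: vertex item for p bottom-left, q bottom-right, r top.
CORNER_P = (0.0, 0.0)
CORNER_Q = (1.0, 0.0)
CORNER_R = (0.5, math.sqrt(3) / 2)


def ternary_xy(w_p: float, w_q: float, w_r: float = 0.0) -> Tuple[float, float]:
    """Map barycentric weights to Cartesian (x, y) inside the triangle."""
    total = w_p + w_q + w_r
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    w_p, w_q, w_r = w_p / total, w_q / total, w_r / total  # normalise to sum 1
    x = w_p * CORNER_P[0] + w_q * CORNER_Q[0] + w_r * CORNER_R[0]
    y = w_p * CORNER_P[1] + w_q * CORNER_Q[1] + w_r * CORNER_R[1]
    return x, y
```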
15. The computer-implemented method according to claim 14, further comprising receiving an input, by the operating system, to select an item in the displayed ternary plot.
16. A computer device for generating a visual representation of items, the computer device comprising a display, one or more processors, memory and one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs comprising instructions for: (a) selecting a pair of items m and n from a plurality of items, wherein each item of the plurality of items comprises a plurality of parameters, each parameter of the plurality of parameters representing a property of the plurality of items; (b) for the selected pair of items m and n, selecting a pair of parameters p and q from the plurality of parameters, wherein a.sub.mp is the parameter p of the item m, b.sub.mq is the parameter q of the item m, a.sub.np is the parameter p of the item n, and b.sub.nq is the parameter q of the item n; (c) calculating a pair of weights w.sub.p and w.sub.q wherein w.sub.p is calculated based on a.sub.mp, b.sub.mq, a.sub.np and b.sub.nq, and wherein w.sub.q is calculated based on a.sub.mp, b.sub.mq, a.sub.np and b.sub.nq; (d) if w.sub.p>0 and w.sub.q>0, storing, in the memory, the pair of weights w.sub.p and w.sub.q and the pair of items m and n; (e) determining a first vertex item, a second vertex item and a third vertex item, wherein the first vertex item is the item of the plurality of items comprising the greatest value for the parameter p, the second vertex item is the item of the plurality of items comprising the greatest value for the parameter q, and the third vertex item is the item of the plurality of items comprising the greatest value for a parameter r from the plurality of parameters; (f) generating a plurality of points based on the stored pair of weights w.sub.p and w.sub.q and the pair of items m and n, the first vertex item, the second vertex item and the third vertex item; and (g) displaying, in the display, the visual representation of the plurality of items by displaying a plot comprising a geometric shape and comprising the plurality of points.
17. The computer device according to claim 16, the one or more programs further comprising instructions for: (I) selecting a set of three items i, j and k from the plurality of items; (II) for the selected set of three items i, j and k, selecting a set of three parameters p, q and r from the plurality of parameters, wherein a.sub.ip is the parameter p of the item i, b.sub.iq is the parameter q of the item i, c.sub.ir is the parameter r of the item i, a.sub.jp is the parameter p of the item j, b.sub.jq is the parameter q of the item j, c.sub.jr is the parameter r of the item j, a.sub.kp is the parameter p of the item k, b.sub.kq is the parameter q of the item k, and c.sub.kr is the parameter r of the item k; (III) calculating a set of three weights w.sub.p, w.sub.q and w.sub.r wherein w.sub.p is calculated based on a.sub.ip, b.sub.iq, c.sub.ir, a.sub.jp, b.sub.jq, c.sub.jr, a.sub.kp, b.sub.kq, and c.sub.kr, w.sub.q is calculated based on a.sub.ip, b.sub.iq, c.sub.ir, a.sub.jp, b.sub.jq, c.sub.jr, a.sub.kp, b.sub.kq, and c.sub.kr and w.sub.r is calculated based on a.sub.ip, b.sub.iq, c.sub.ir, a.sub.jp, b.sub.jq, c.sub.jr, a.sub.kp, b.sub.kq, and c.sub.kr; and (IV) if w.sub.p>0, w.sub.q>0 and w.sub.r>0, storing, in the memory, the set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k; wherein step (f) further comprises generating the plurality of points based on the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k.
18. The computer device according to claim 16, the one or more programs further comprising instructions for generating the plurality of points in (f) by: calculating a score value of the stored pair of weight values w.sub.p and w.sub.q and the pair of items m and n, wherein said score value is calculated as a linear combination of w.sub.p and w.sub.q and the plurality of parameters of one of the pair of items m and n; and determining whether to discard the stored pair of weight values w.sub.p and w.sub.q and the pair of items m and n based on the calculated score value; or the computer device according to claim 17, the one or more programs further comprising instructions for generating the plurality of points in (f) by: calculating a score value of the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k, wherein said score value is calculated as a linear combination of w.sub.p, w.sub.q and w.sub.r and the plurality of parameters of one of the three items i, j and k; and determining whether to discard the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k based on the calculated score value.
19. The computer device according to claim 18, the one or more programs further comprising instructions for determining whether to discard the stored pair of weight values w.sub.p and w.sub.q and the pair of items m and n by: calculating another score value for each item of the plurality of items as a linear combination of the stored pair of weight values w.sub.p and w.sub.q and the plurality of parameters of one of the pair of items m and n; obtaining a comparison value based on the score value and the another score value; and determining whether to discard the stored pair of weight values w.sub.p and w.sub.q and the pair of items m and n based on the comparison value; or the one or more programs further comprising instructions for determining whether to discard the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k by: calculating another score value for each item of the plurality of items as a linear combination of the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the plurality of parameters of one of the three items i, j and k; obtaining a comparison value based on the score value and the another score value; and determining whether to discard the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k based on the comparison value.
20. A non-transitory computer-readable storage medium storing one or more programs configured for execution by a computer system having a display, one or more processors, and memory, the one or more programs comprising instructions for: (a) selecting a pair of items m and n from a plurality of items, wherein each item of the plurality of items comprises a plurality of parameters, each parameter of the plurality of parameters representing a property of the plurality of items; (b) for the selected pair of items m and n, selecting a pair of parameters p and q from the plurality of parameters, wherein a.sub.mp is the parameter p of the item m, b.sub.mq is the parameter q of the item m, a.sub.np is the parameter p of the item n, and b.sub.nq is the parameter q of the item n; (c) calculating a pair of weights w.sub.p and w.sub.q wherein w.sub.p is calculated based on a.sub.mp, b.sub.mq, a.sub.np and b.sub.nq, and wherein w.sub.q is calculated based on a.sub.mp, b.sub.mq, a.sub.np and b.sub.nq; (d) if w.sub.p>0 and w.sub.q>0, storing, in the memory, the pair of weights w.sub.p and w.sub.q and the pair of items m and n; (e) determining a first vertex item, a second vertex item and a third vertex item, wherein the first vertex item is the item of the plurality of items comprising the greatest value for the parameter p, the second vertex item is the item of the plurality of items comprising the greatest value for the parameter q, and the third vertex item is the item of the plurality of items comprising the greatest value for a parameter r from the plurality of parameters; (f) generating a plurality of points based on the stored pair of weights w.sub.p and w.sub.q and the pair of items m and n, the first vertex item, the second vertex item and the third vertex item; and (g) displaying, in the display, the visual representation of the plurality of items by displaying a plot comprising a geometric shape and comprising the plurality of points.
Description
DESCRIPTION OF EMBODIMENTS
[0120] Embodiments of the present disclosure will be described herein below with reference to the accompanying drawings. However, the embodiments of the present disclosure are not limited to the specific embodiments and should be construed as including all modifications, changes, equivalent devices and methods, and/or alternative embodiments of the present disclosure.
[0121] The terms “have,” “may have,” “include,” and “may include” as used herein indicate the presence of corresponding features (for example, elements such as numerical values, functions, operations, or parts), and do not preclude the presence of additional features.
[0122] The terms “A or B,” “at least one of A or/and B,” or “one or more of A or/and B” as used herein include all possible combinations of items enumerated with them. For example, “A or B,” “at least one of A and B,” or “at least one of A or B” means including at least one A, including at least one B, or including both at least one A and at least one B.
[0123] The terms such as “first” and “second” as used herein may modify various elements regardless of an order and/or importance of the corresponding elements, and do not limit the corresponding elements. These terms may be used for the purpose of distinguishing one element from another element. For example, a first element may be referred to as a second element without departing from the scope of the present invention, and similarly, a second element may be referred to as a first element.
[0124] It will be understood that, when an element (for example, a first element) is “(operatively or communicatively) coupled with/to” or “connected to” another element (for example, a second element), the element may be directly coupled with/to the other element, or there may be an intervening element (for example, a third element) between the element and the other element. To the contrary, it will be understood that, when an element (for example, a first element) is “directly coupled with/to” or “directly connected to” another element (for example, a second element), there is no intervening element (for example, a third element) between the element and the other element.
[0125] The expression “configured to (or set to)” as used herein may be used interchangeably with “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to,” or “capable of” according to the context. The term “configured to (set to)” does not necessarily mean “specifically designed to” at a hardware level. Instead, the expression “apparatus configured to . . . ” may mean that the apparatus is “capable of . . . ” along with other devices or parts in a certain context.
[0126] In the specification below, the same reference numbers in the drawings refer to the same elements/components. Design of experiments (DoE) is a statistical approach to planning efficient experiments for product development and process improvement. In both product development and process improvement, it is of interest to know how the values of a given set of parameters (called factors) influence one or more outcomes (called responses). An experiment is usually defined by the number and nature of the parameters or factors to be studied, together with the number and nature of the outcomes or responses to be measured. An experiment is divided into tests, and the total number of tests is another important feature of an experiment. The number of tests is directly related to the cost of an experiment, as an experiment with more tests is more expensive in terms of, for example, time, manpower and materials. The set of all the features describing an experiment, such as the number and nature of the parameters or factors, the number and nature of the outcomes or responses, and/or the number of tests, is called the experimental conditions.
[0127] The detailed description of what values of the parameters or factors are to be used for each one of the tests in an experiment is called an experimental design or experimental plan.
[0128] An experimental design can be characterized using a multiplicity of statistical quality characteristics. These statistical quality characteristics can be numerical (take values equal to a real number) or categorical (take values within a given finite set of possibilities) in nature. The set of all statistical quality characteristics of a certain experimental design is called experimental design characterization.
[0129] Given certain experimental conditions, an experimental design with an appropriate experimental design characterization is chosen from a database of experimental designs. Such a database contains several hundred thousand experimental designs and is stored in a computer database (denoted as the Database system).
[0131] Input unit 102 may comprise any known device to allow an operator to generate data and instructions for CPU 101, like a keyboard, a mouse, one or more touch screens, etc.
[0132] Memory 104 may comprise any suitable known memory devices to store data and computer programs to be run on CPU 101, and may include any known type of volatile and non-volatile memory equipment, RAM and ROM types of memories, etc. A non-transitory computer-readable storage medium may be any kind of memory such as a non-volatile RAM, etc. The computer programs comprise instructions to be loaded by CPU 101.
[0133] Output unit 106 may comprise any suitable output device to output data to a user including a display, etc.
[0134] Communication module 108 is configured to transmit signals to and receive signals from other electronic devices adapted to communicate with the electronic device 100. Any known and suitable transceiver equipment can be used for that purpose, using any known or yet-to-be-developed (standard) communication technique, including 2G, 3G, 4G, 5G, Wi-Fi, Bluetooth, NFC, etc. To that end, communication module 108 is connected to a network 110 and an antenna 112.
[0138] The designs and the design characterizations are contained in a Database Management System (DBMS) 156. The DBMS 156 contains a structured representation of the data and is able to receive queries via a database query module 158. The data in the DBMS may be represented in related tables, where each table contains a multiplicity of records and columns that can have character or numerical types. The server application 130 runs in an electronic server device 124 comprising a computer infrastructure 150 with a certain operating system 152 (e.g., Linux or any other suitable operating system) and a certain filesystem 154 (e.g., ext4). The server application 130 can send queries to the DBMS using a communication interface (e.g., via a socket).
[0139] The process can be described as follows. First, the user interacts with the user interface via the user interface module 146, which comprises the filtering data procedure 148 and the plot visualization procedure 141. The user interface is graphically represented on a display device such as display 106, and the user interacts with it using one or more input devices such as input device 102. The interaction is the user-defined input, which is then sent by the client application 140 to the server application 130 through the network 110 using communication module 151 and communication module 143. The user-defined input is then translated by the server application 130 into a multiplicity of queries, which are submitted to the DBMS 156 to be used by the database query module 158 and the database search module 153. The queries can be, for example, SQL queries where the tables and the parameters involved form part of the query statement in a format that is understood by the DBMS 156. The DBMS processes the queries and performs the necessary operations to collect the query results, which are then sent to the server application. The multiplicity of queries represents a filtering of the design characterizations, and the results are a set of designs and their corresponding design characterizations that fulfill the filtering requirements. The server application may perform some processing of the results using internal local memory and a set of processing commands. The processed results are then sent through the network to the client application, which displays the information to the user.
[0141] After the filtering data is received in step 202 of
[0142] The filtering data is a set of requirements on the experiment conditions and/or the statistical quality characteristics of the experimental designs. The set of requirements may comprise the number of 3-level factors of the experimental design (which, in the experimental design, are set at a low, average or high level), the number of 2-level factors of the experimental design (which, in the experimental design, are set at a low or a high level), the number of design runs of an experimental design, the number of center points in the design (a center point is a test of the experimental design where all 3-level factors are set at their average level), the existence of extreme points in the design (an extreme point is a test of the experimental design where all factors are set at a low or a high level) and/or the number of replicates (a replicate occurs when there exist two or more tests in the design with the same levels for all factors). The statistical characteristics of the experimental designs may be classified into two groups: numerical statistical characteristics and categorical statistical characteristics. A numerical statistical characteristic is represented by a real number, while a categorical statistical characteristic is represented by a set of categories. The requirements on a numerical statistical characteristic may be expressed as a lower bound and/or an upper bound, wherein the lower bound represents a limitation on the minimum value of the real number representing said numerical statistical characteristic and the upper bound represents a limitation on the maximum value of that real number. The requirements on a categorical statistical characteristic may be expressed as a selection of one or more categories of the set of categories representing said categorical statistical characteristic.
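The bound and category requirements described in paragraph [0142] map naturally onto a parameterized SQL `WHERE` clause of the kind the server application might submit to the DBMS. The following Python sketch uses the standard-library `sqlite3` module purely for illustration; the table `designs`, its columns and the toy rows are hypothetical and are not taken from the disclosure. Column names cannot be bound as query parameters, so they are checked against an allow-list.

```python
import sqlite3
from typing import Dict, List, Optional, Sequence, Tuple

ALLOWED_COLUMNS = {"n_runs", "n_3level", "n_2level", "d_efficiency", "model"}
Bounds = Dict[str, Tuple[Optional[float], Optional[float]]]


def build_filter_query(bounds: Bounds, categories: Dict[str, Sequence[str]]) -> Tuple[str, List]:
    """Translate lower/upper bounds and category selections into SQL."""
    clauses: List[str] = []
    params: List = []
    for col, (low, high) in bounds.items():
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column {col!r}")
        if low is not None:
            clauses.append(f"{col} >= ?")
            params.append(low)
        if high is not None:
            clauses.append(f"{col} <= ?")
            params.append(high)
    for col, selected in categories.items():
        if col not in ALLOWED_COLUMNS or not selected:
            raise ValueError(f"invalid category filter on {col!r}")
        clauses.append(f"{col} IN ({', '.join('?' for _ in selected)})")
        params.extend(selected)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT id FROM designs WHERE {where} ORDER BY id", params


def toy_connection() -> sqlite3.Connection:
    """In-memory database with three made-up designs, for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE designs (id INTEGER PRIMARY KEY, n_runs INTEGER, "
                 "n_3level INTEGER, n_2level INTEGER, d_efficiency REAL, model TEXT)")
    conn.executemany("INSERT INTO designs VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 12, 2, 2, 0.85, "main"),
        (2, 16, 2, 2, 0.92, "interaction"),
        (3, 24, 3, 1, 0.78, "second_order"),
    ])
    return conn


if __name__ == "__main__":
    sql, args = build_filter_query({"n_runs": (None, 20), "d_efficiency": (0.8, None)},
                                   {"model": ["main", "interaction"]})
    print([row[0] for row in toy_connection().execute(sql, args)])  # [1, 2]
```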
[0143] The filtering data may include one or more of the following three statistical quality criteria: projection estimation capacity, maximum third- and fourth-order correlation, and the power to detect interaction and quadratic effects. As already said, the filtering data is used to search a database of experimental designs. Any other suitable statistical parameter may be used to filter the database, for instance the average third- and fourth-order correlation.
[0144] The projection estimation capacity criterion is related to the projection estimation capacity and projection information criteria of experimental designs. The statistical characteristics associated with the projection estimation capacity can be numerical and/or categorical. The statistical characteristics of the projection estimation capacity comprise at least one of: a number of projected 3-level factors, which is a numerical statistical characteristic represented by an integer; a number of projected 2-level factors, which is a numerical statistical characteristic represented by an integer; a statistical model, which is a categorical statistical characteristic represented by a limited number of categories; an average D-efficiency, which is a numerical statistical characteristic represented by a real number; an average unscaled prediction variance, which is a numerical statistical characteristic represented by a real number; a G-efficiency, which is a numerical statistical characteristic represented by a real number; and an average A-efficiency, which is a numerical statistical characteristic represented by a real number.
[0146] The different statistical characteristics that can be used to filter the database will be explained now. Each experimental design is stored in the database as a design matrix.
[0148] Each element 406 of the first two columns of the design matrix 400 comprises a −1, a 0 or a 1 representing a 3-level factor. The 3-level factors are numerical factors, that is, factors that take values in a given closed interval. If we denote a factor as X, then the values that X can take within the experiment are included in the interval [a,b]. The three levels, −1, 0 and 1, represent, respectively, the low value, average value and high value within the interval [a,b]. That is, a factor level of −1 is equivalent to X=a, a factor level of 0 is equivalent to X=(a+b)/2 and a factor level of 1 is equivalent to X=b.
[0149] Each element 404 of the last two columns of the design matrix 400 comprises a −1 or a 1 and represents a 2-level factor. A 2-level factor is either a numerical factor that takes values within the closed interval [a,b] or a categorical factor that indicates “Category A” or “Category B”. If we denote a factor as X, then a factor level of −1 indicates a factor value of X=a for a numerical factor or a factor value X=“Category A” for a categorical factor. A factor level of 1 indicates a factor value of X=b for a numerical factor or a factor value X=“Category B” for a categorical factor. The values of the 2-level factor are divided into a first group and a second group such that the first group corresponds to the first category and the second group corresponds to the second category. In the design matrix 400 of
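The coding conventions of paragraphs [0148] and [0149] can be inverted to recover actual factor settings from a design matrix entry. A minimal Python sketch follows; the function names are hypothetical, and for 2-level categorical factors the pair of category labels is passed in explicitly.

```python
from typing import Optional, Sequence, Union


def decode_3level(level: int, a: float, b: float) -> float:
    """Map a coded level -1/0/1 to X = a, (a + b) / 2 or b on the interval [a, b]."""
    mapping = {-1: a, 0: (a + b) / 2, 1: b}
    if level not in mapping:
        raise ValueError(f"3-level factors use -1, 0 or 1, got {level!r}")
    return mapping[level]


def decode_2level(level: int, a: Optional[float] = None, b: Optional[float] = None,
                  categories: Optional[Sequence[str]] = None) -> Union[float, str]:
    """Map -1/1 to a/b for a numerical factor, or to the first/second category."""
    if level not in (-1, 1):
        raise ValueError(f"2-level factors use -1 or 1, got {level!r}")
    if categories is not None:
        return categories[0] if level == -1 else categories[1]
    if a is None or b is None:
        raise ValueError("numerical 2-level factors need both bounds a and b")
    return a if level == -1 else b
```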
[0150] The goal of an experiment is to study the relation of the factors with a variable of interest, also called response. Experimental design is widely used in product development and process optimization across different industries. The statistical model provides a relation between the factors and the response, which is the outcome variable that needs to be studied. The most commonly used statistical models may include main effects, interaction effects, quadratic effects and a constant term. The main effect of a factor is based on the differences between the mean of the response variable for each unit change of the factor. The interaction effect of two factors indicates that the effect of each factor on the response depends on the other factor. The quadratic effect of a factor indicates that there is curvature in the relation between a factor and the variable of interest or response. The quadratic effects exist only for 3-level factors.
[0151] The statistical model may be an intercept and main effects model, an intercept, main effects and interaction effects model, or an intercept, main effects and second-order effects model. The statistical model may also be any other suitable statistical model. The intercept and main effects model comprises the main effects of all the factors of the experimental design and a constant term. The intercept, main effects and interaction effects model comprises the same effects as the intercept and main effects model plus the two-factor interaction effects. The intercept, main effects and second-order effects model comprises the same effects as the intercept, main effects and interaction effects model plus the quadratic effects.
[0152] As already explained, the experimental designs may be selected based on projection capacity parameters. The projection capacity parameters may be related to the projection estimation capacity and the projection information capabilities of the experimental designs. An experimental design may comprise a number m.sub.1 of 3-level factors, a number m.sub.2 of 2-level factors and n experimental tests or runs. The experimental design may be defined by a design matrix D comprising n rows and a number m of columns, wherein m equals (m.sub.1+m.sub.2), such that the matrix D comprises m×n elements and each element of the design matrix D is a real number. Each of the n rows of the design matrix D corresponds to one of the n experimental tests or runs, and each of the m columns of the design matrix D corresponds to one of the m factors, so that each element gives the level of one factor in one experimental test. A design sub-matrix D.sub.p of the design matrix D comprises n rows, wherein n is the number of experimental tests or runs, and, as columns, a subset of the columns of the design matrix D, such that the first q.sub.1 columns of the design sub-matrix D.sub.p are a subset of the m.sub.1 columns of the 3-level factors of the design matrix D and the last q.sub.2 columns of the design sub-matrix D.sub.p are a subset of the m.sub.2 columns of the 2-level factors of the design matrix D, wherein q.sub.1 is an integer equal to or smaller than m.sub.1, q.sub.2 is an integer equal to or smaller than m.sub.2, and q is equal to q.sub.1+q.sub.2.
[0153] As already said, a statistical model α may be a model comprising intercept effects and main effects, or a model comprising intercept effects, main effects and interaction effects or a model comprising intercept effects, main effects and second-order effects.
[0154] For each considered statistical model, a model matrix for an experimental design, denoted as X, can be built based on the design matrix representing the experimental design.
[0155] For example, a model matrix for an intercept, main effects and interaction effects model can be built by taking the design matrix 500, adding an intercept column of ones and adding a number of columns corresponding to the interaction effects. The columns corresponding to the interaction effects can be calculated by element-wise multiplication of the main-effect columns of two factors.
[0156] A model matrix X.sub.pα corresponding to a statistical model α may be constructed in the following ways.
[0157] If the statistical model α is a model α.sub.1 comprising intercept effects and main effects, then a model matrix X.sub.pα1 of the statistical model α.sub.1 comprises a number of columns equal to (q+1), wherein q is the number of columns of the design sub-matrix D.sub.p, and a number of rows equal to n, wherein n is the number of rows of the design sub-matrix D.sub.p. The first column of the model matrix X.sub.pα1 is a column comprising only ones (also called the intercept column), and the second column up to the (q+1)-th column of the model matrix X.sub.pα1 are equal respectively to the first column up to the q-th column of the design sub-matrix D.sub.p.
[0158] An alternative model matrix X′.sub.pα1 corresponding to a design matrix D.sub.p and a statistical model α.sub.1 may also be constructed by permuting the columns of the model matrix X.sub.pα1.
[0159] If the statistical model α is a model α.sub.2 comprising intercept, main effects and interaction effects, then a model matrix X.sub.pα2 of the model α.sub.2 comprises a number of columns equal to ((q.sup.2/2)+(q/2)+1), wherein q is the number of columns of the design sub-matrix D.sub.p, and a number of rows equal to n, wherein n is the number of rows of the design sub-matrix D.sub.p, wherein the first column of the model matrix X.sub.pα2 is a column comprising only ones, the second column up to the (q+1)-th column of the model matrix X.sub.pα2 are equal respectively to the first column up to the q-th column of the design sub-matrix D.sub.p, and the (q+1+i)-th column of the model matrix X.sub.pα2 with i=1, . . . , ((q*(q−1))/2) is determined by the element-wise multiplication of the r-th and the s-th columns of the design sub-matrix D.sub.p, wherein r=floor(−0.5+square root of (0.25+(2*i))) and s=i−((r*(r+1))/2), wherein the floor function is the function that takes as input a real number x and gives as output the greatest integer less than or equal to x.
[0160] Again, an alternative model matrix X′.sub.pα2 corresponding to a design matrix D.sub.p and a statistical model α.sub.2 may also be constructed by permuting the columns of the model matrix X.sub.pα2.
[0161] If the statistical model α is a model α.sub.3 comprising intercept, main effects and second-order effects, wherein second-order effects comprise interaction effects and quadratic effects, then the model matrix X.sub.pα3 of the model α.sub.3 comprises a number of columns equal to (q.sub.1+(q.sup.2/2)+(q/2)+1), wherein q.sub.1 is the number of columns of the design sub-matrix D.sub.p corresponding to 3-level factors and q is the number of columns of the design sub-matrix D.sub.p, and a number of rows equal to n, wherein n is the number of rows of the design sub-matrix D.sub.p. The first column up to the ((q.sup.2/2)+(q/2)+1)-th column of the model matrix X.sub.pα3 are equal respectively to the first column up to the ((q.sup.2/2)+(q/2)+1)-th column of the model matrix X.sub.pα2, and the ((q.sup.2/2)+(q/2)+1+i)-th column of the model matrix X.sub.pα3, with i=1, . . . , q.sub.1, is determined by the element-wise multiplication of the i-th column of the design sub-matrix D.sub.p by itself.
[0162] As already explained, an alternative model matrix X′.sub.pα3 corresponding to a design matrix D.sub.p and a statistical model α.sub.3 may also be constructed by permuting the columns of the model matrix X.sub.pα3.
[0163] A design with design matrix D is said to have a projection estimation capacity equal to one for a specific model α, q.sub.1 3-level factors and q.sub.2 2-level factors if every model matrix X.sub.pα constructed from every distinct combination of q.sub.1 of the 3-level factors and q.sub.2 of the 2-level factors has full rank. A matrix has full rank when its rank equals the lesser of its number of rows and its number of columns.
[0164] A user can set the values of q.sub.1, q.sub.2 and α so that a query is made to the database and only the designs with a projection estimation capacity equal to one for the model α, q.sub.1 3-level factors and q.sub.2 2-level factors are returned.
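The model matrices and the projection-estimation-capacity check of paragraphs [0157] to [0164] can be sketched in pure Python. The sketch enumerates the interaction columns with `itertools.combinations` instead of the floor-based index formula of paragraph [0159]; because a permutation of the columns yields an equally valid model matrix, this ordering does not change the rank. The names `model_matrix`, `rank`, `projection_estimation_capacity_is_one` and the model labels "alpha1" to "alpha3" are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used so that rounding cannot affect the rank test.

```python
from fractions import Fraction
from itertools import combinations


def model_matrix(dp, q1, model):
    """Build a model matrix X_p,alpha from a design sub-matrix dp (list of rows).

    The first q1 columns of dp are 3-level factors. model is "alpha1"
    (intercept + main effects), "alpha2" (+ two-factor interactions) or
    "alpha3" (+ quadratic effects of the 3-level factors).
    """
    q = len(dp[0])
    pairs = list(combinations(range(q), 2))  # interaction column order (illustrative)
    rows = []
    for row in dp:
        x = [1] + list(row)  # intercept column, then main effects
        if model in ("alpha2", "alpha3"):
            x += [row[r] * row[s] for r, s in pairs]
        if model == "alpha3":
            x += [row[i] * row[i] for i in range(q1)]
        rows.append(x)
    return rows


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def projection_estimation_capacity_is_one(d, m1, q1, q2, model):
    """True if every choice of q1 of the m1 3-level columns and q2 of the
    2-level columns of d gives a full-rank model matrix."""
    m = len(d[0])
    for three in combinations(range(m1), q1):
        for two in combinations(range(m1, m), q2):
            cols = list(three) + list(two)
            dp = [[row[c] for c in cols] for row in d]
            x = model_matrix(dp, q1, model)
            if rank(x) != min(len(x), len(x[0])):
                return False
    return True


# 3x3 full factorial in two 3-level factors supports the full quadratic model.
ff = [[a, b] for a in (-1, 0, 1) for b in (-1, 0, 1)]
print(projection_estimation_capacity_is_one(ff, 2, 2, 0, "alpha3"))  # True
# Two identical columns cannot separate their main effects.
bad = [[-1, -1], [0, 0], [1, 1], [1, 1], [-1, -1]]
print(projection_estimation_capacity_is_one(bad, 2, 2, 0, "alpha1"))  # False
```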
[0165] The D-efficiency of an experimental design with n runs for a given statistical model having a model matrix X can be calculated according to the following equation:

$$D_{\text{eff}} = 100\,\frac{\left[\det\left(X^{T}X\right)\right]^{1/p}}{n}$$
[0166] wherein p is the number of effects of the statistical model, i.e., p equals the number of columns of the model matrix X, X.sup.T is the transpose of the matrix X obtained by flipping the matrix X over its diagonal, and det is the determinant.
[0167] The A-efficiency of an experimental design for a given statistical model can be calculated according to the following equation:

$$A_{\text{eff}} = 100\,\frac{p}{n\,\operatorname{trace}\left[\left(X^{T}X\right)^{-1}\right]}$$
[0168] wherein trace is the matrix operation that sums the diagonal elements of a matrix.
[0169] The average unscaled prediction variance of an experimental design for a given statistical model can be calculated according to the following equation:

$$\mathrm{APV} = \frac{1}{\int_{R} dx}\int_{R} f^{T}(x)\left(X^{T}X\right)^{-1} f(x)\,dx$$
[0170] where x is a column vector of factor levels, that is, a column vector with m1+m2 entries, where the first m1 entries lie inside the interval [−1,1] and the last m2 entries correspond to the two-level factors. For a categorical factor, the corresponding entry is either −1 or 1. For a numerical two-level factor, the corresponding entry lies in the interval [−1,1]. The function f(x) takes a vector of factor levels x and expands it into the corresponding model effects, which can include the intercept, main effects, interaction effects and quadratic effects. The experimental region R consists of all possible values that the vector x can take (all combinations of the factor levels).
[0171] The G-efficiency of an experimental design for a given statistical model can be calculated according to the following equation:

$$G_{\text{eff}} = 100\,\frac{p}{n\,\max_{x \in R}\,\mathrm{upv}_{x}}$$

wherein upv.sub.x is the unscaled prediction variance at the point defined by the vector x, which is equal to

$$\mathrm{upv}_{x} = f^{T}(x)\left(X^{T}X\right)^{-1} f(x)$$
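A pure-Python sketch of the four criteria above follows. It assumes the region R is approximated by a finite grid of points supplied by the caller (the equations refer to the continuous region), and the function names are hypothetical. The usage example is a hypothetical 16-run column for one 3-level factor (four runs at −1, four at +1 and eight at 0) under the intercept and main effects model.

```python
def information_matrix(x):
    """Return X^T X for a model matrix X given as a list of rows."""
    p = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[piv][c]) < 1e-12:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return d


def inverse(m):
    """Inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[piv][c]) < 1e-12:
            raise ValueError("X^T X is singular")
        a[c], a[piv] = a[piv], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]


def upv(fx, inv):
    """Unscaled prediction variance f(x)^T (X^T X)^-1 f(x)."""
    k = len(fx)
    return sum(fx[i] * inv[i][j] * fx[j] for i in range(k) for j in range(k))


def design_criteria(x, f, region):
    """D-, A- and G-efficiency (in %) and average unscaled prediction variance.

    x: model matrix (n rows, p columns); f: expands a factor-level vector
    into model terms; region: finite grid standing in for the region R.
    """
    n, p = len(x), len(x[0])
    xtx = information_matrix(x)
    inv = inverse(xtx)
    d_eff = 100 * det(xtx) ** (1 / p) / n
    a_eff = 100 * p / (n * sum(inv[i][i] for i in range(p)))
    variances = [upv(f(pt), inv) for pt in region]
    g_eff = 100 * p / (n * max(variances))
    apv = sum(variances) / len(variances)
    return d_eff, a_eff, g_eff, apv


col = [-1] * 4 + [1] * 4 + [0] * 8
X = [[1, v] for v in col]
grid = [(k / 100,) for k in range(-100, 101)]
d, a, g, apv = design_criteria(X, lambda pt: [1.0, pt[0]], grid)
print(round(d, 2), round(a, 2), round(g, 2))  # 70.71 66.67 66.67
```

Gauss-Jordan elimination is used only to keep the sketch dependency-free; a numerical library would normally supply the determinant and inverse.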
[0172] The power to detect an effect can be calculated, for instance, using the proposed method in Chapter 9 of the book Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Routledge, https://doi.org/10.4324/9780203771587.
[0173] The one or more items may also be selected based on requirements on statistical quality characteristics of the items, such as a correlation parameter of the design matrix D. The correlation parameter of the design matrix D may be based on an absolute correlation between the columns of the design matrix D. The correlation parameter of the design matrix D may comprise a maximum value of a third order correlation parameter of the design matrix D and/or a maximum value of a fourth order correlation parameter of the design matrix D. The maximum third and fourth order correlation parameters are related to the correlation between the effects in the statistical model when a given experimental design is used. There are two numerical characteristics associated with this criterion: the maximum absolute third order correlation between a main effect and a second-order effect, and the maximum absolute fourth order correlation between two second-order effects.
[0174] The third order correlation parameter of the design matrix D may be determined based on the maximum absolute value of the cosine similarity between any of the columns of the design matrix D and a second-order column of the design matrix D, wherein the second-order column of the design matrix D is a column vector obtained by an element-wise multiplication of two different columns of the design matrix D or by an element-wise multiplication of one column of the design matrix D with itself.
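[0175] The cosine similarity of a vector A of dimension n and a vector B of dimension n is calculated as follows:

$$\cos(A,B) = \frac{\sum_{i=1}^{n} A_{i}B_{i}}{\sqrt{\sum_{i=1}^{n} A_{i}^{2}}\,\sqrt{\sum_{i=1}^{n} B_{i}^{2}}}$$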
[0176] The fourth order correlation parameter of the design matrix D may be determined based on the maximum absolute value of the cosine similarity between any two different second-order columns of the design matrix D.
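The third and fourth order correlation parameters can be sketched by following the definitions literally: second-order columns are all element-wise products of two different columns plus each column multiplied by itself, and the cosine similarity is the one given above. The function names are hypothetical. For a 2-level column with entries −1 and 1 the squared column is a constant vector of ones, so an implementation may choose to square only the 3-level columns; the sketch keeps the literal definition.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def second_order_columns(cols):
    """Products of two different columns, then each column times itself."""
    out = [[a * b for a, b in zip(cols[i], cols[j])]
           for i, j in combinations(range(len(cols)), 2)]
    out += [[a * a for a in col] for col in cols]
    return out


def max_third_order_correlation(cols):
    """Max |cos| between any design column and any second-order column."""
    so = second_order_columns(cols)
    return max(abs(cosine_similarity(c, s)) for c in cols for s in so)


def max_fourth_order_correlation(cols):
    """Max |cos| between any two different second-order columns."""
    so = second_order_columns(cols)
    return max(abs(cosine_similarity(u, v)) for u, v in combinations(so, 2))


# 3x3 full factorial in two 3-level factors, stored column-wise.
x1 = [-1, 0, 1] * 3
x2 = [-1] * 3 + [0] * 3 + [1] * 3
print(max_third_order_correlation([x1, x2]))             # 0.0
print(round(max_fourth_order_correlation([x1, x2]), 4))  # 0.6667
```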
[0177] The third and/or fourth order correlation parameters can be used in a query to the database such that the database is searched based on said query. The search based on said query may return only designs whose maximum third and/or fourth order correlation parameter of the design matrix D is less than or equal to the value set by the user. The user may set up the third and/or fourth order correlation parameters by using a user interface displayed in an input unit as it is shown in
[0178] The one or more items may also be selected based on a minimum power to detect an interaction effect or a quadratic effect. An interaction effect column of a design matrix D is defined as a column vector of n real entries that are the result of an element-wise multiplication of two distinct columns in the design matrix D. A quadratic effect column of a design matrix D is defined as a column vector of n real entries that are the result of an element-wise multiplication of a column of the design matrix D by itself.
[0179] The power to detect an interaction or a quadratic effect is calculated using a non-central F-distribution for several signal-to-noise ratios and several significance levels, assuming an underlying model with intercept and main effects. The minimum power to detect an interaction effect is the minimum of the powers to detect each distinct interaction effect column. The minimum power to detect a quadratic effect is the minimum of the powers to detect each distinct quadratic effect column.
[0180] As explained, the power to detect interaction and quadratic effects parameters relate to the power to detect the two types of second-order effects, namely interaction effects and quadratic effects, when using an experimental design. There are two numerical characteristics associated with this criterion: the power to detect an interaction effect and the power to detect a quadratic effect.
[0181] As already explained, the filtering data is received in step 202 of
[0182] The result of the database search can be presented to a user via a user interface, such as a screen display, in the form of, for instance, boxplots, parallel coordinate plots and/or tables.
[0183] The user can select one or several experimental designs based on the displayed information. The unique identifiers of the selected experimental designs can be stored in the memory for further processing. The user can also modify the filtering data. If this is the case, then a new query is sent to the database system. Otherwise, the unique identifiers of all selected experimental designs are saved and the filtering stage ends.
[0184] As explained, the method of
[0185] A user can set the values of the minimum power to detect an interaction effect or a quadratic effect so that a query is made to the database and only designs with a minimum power to detect an interaction effect or a quadratic effect greater than or equal to the set values are returned.
[0186] The database of design characterizations may be stored in a relational database management system (RDBMS), which presents the data to the user as relations in tabular form, wherein each table consists of a set of rows and columns. A column may be called a field, and a row may be called a record. The columns of the table may be of numerical or categorical type. A numerical type may be represented by a data type INTEGER, FLOAT or DECIMAL or other suitable data type. A categorical type may be represented by a data type STRING, ENUM, BOOLEAN or other suitable data type.
[0187] As a non-limiting example, the schema in
[0188] The table design_characterizations 702 may contain 10 columns. Each column of said table 702 contains statistical characteristics of the designs. The content of the columns of the table design_characterizations 702 will be described below following the order in which said columns appear in
[0199] The table projection_properties 704 contains 9 columns. Each column of this table 704 contains statistical characteristics of the designs. The content of the columns of the table projection_properties 704 will be described below following the order in which said columns appear in
[0209] The relation between both tables is represented by the symbol
[0210] This symbol indicates that one or more records in the table projection_properties 704 are related to one record in the design_characterizations table 702.
[0211] Such a relational database as the one presented in the schema in
[0212] Consider the database consisting of the following records shown in the below tables A and B wherein table A shows the design_characterizations and table B shows the projection_properties.
TABLE A example table design_characterizations with data

| id | max_third_order_correlation | max_fourth_order_correlation | min_power_interaction_effects | min_power_quadratic_effects | num_three_level_factors | num_two_level_factors | num_runs | num_center_points | extreme_points |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0.167 | 0.43 | 0.38 | 3 | 0 | 14 | 2 | FALSE |
| 2 | 0 | 0 | 0.45 | 0.45 | 3 | 0 | 16 | 4 | FALSE |

TABLE B example table projection_properties with data

| id | design_characterization_id | num_three_level_factors_projected | num_two_level_factors_projected | model | d_efficiency | a_efficiency | g_efficiency | average_prediction_variance |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | meie | 75.59 | 72.72 | 72.72 | 0.111 |
| 2 | 1 | 1 | 0 | mesoe | 51.92 | 36.73 | 85.71 | 0.156 |
| 3 | 1 | 2 | 0 | meie | 55.27 | 50 | 50 | 0.185 |
| 4 | 1 | 2 | 0 | mesoe | 41.94 | 30.61 | 61.22 | 0.281 |
| 5 | 1 | 3 | 0 | meie | 45.99 | 41.79 | 41.79 | 0.281 |
| 6 | 1 | 3 | 0 | mesoe | 37.7 | 27.87 | 49.69 | 0.44 |
| 7 | 2 | 1 | 0 | meie | 70.71 | 66.67 | 66.67 | 0.104 |
| 8 | 2 | 1 | 0 | mesoe | 50 | 37.5 | 75 | 0.134 |
| 9 | 2 | 2 | 0 | meie | 50 | 44.44 | 44.44 | 0.178 |
| 10 | 2 | 2 | 0 | mesoe | 39.69 | 31.58 | 54.54 | 0.239 |
| 11 | 2 | 3 | 0 | meie | 41.02 | 36.84 | 36.84 | 0.275 |
| 12 | 2 | 3 | 0 | mesoe | 35.36 | 29.41 | 45.45 | 0.36 |
[0213] In order to retrieve the id, the maximum fourth order correlation and the number of runs of the designs with 3 three-level factors and 0 two-level factors, with a number of runs less than or equal to 16 and with a D-efficiency for a model with main effects and second-order effects (coded as “mesoe”) of at least 50, the following query, for instance in the SQL language, may be built:
SELECT t1.id, t1.max_fourth_order_correlation, t1.num_runs FROM design_characterizations t1
INNER JOIN projection_properties t2 ON t1.id=t2.design_characterization_id
WHERE t1.num_three_level_factors=3 AND t1.num_two_level_factors=0 AND t1.num_runs<=16
AND t2.model='mesoe' AND t2.d_efficiency>=50 AND t2.num_three_level_factors_projected=3
The results obtained by performing the above query are shown in table format in Table C below.

TABLE C table with the results of the example query

| id | max_fourth_order_correlation | num_runs |
|---|---|---|
| 1 | 0.167 | 14 |
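A minimal sketch using Python's standard-library `sqlite3` module shows the one-to-many schema of Tables A and B and the join query of paragraph [0213] with bound parameters. Column names are taken from Tables A and B; the SQL column types, the 0/1 encoding of extreme_points, the function name `find_designs` and the parameter values in the usage line are assumptions for illustration, and SQLite merely stands in for whatever RDBMS an implementation would use.

```python
import sqlite3

SCHEMA = """
CREATE TABLE design_characterizations (
    id INTEGER PRIMARY KEY,
    max_third_order_correlation REAL,
    max_fourth_order_correlation REAL,
    min_power_interaction_effects REAL,
    min_power_quadratic_effects REAL,
    num_three_level_factors INTEGER,
    num_two_level_factors INTEGER,
    num_runs INTEGER,
    num_center_points INTEGER,
    extreme_points INTEGER  -- stored as 0/1 (0 = FALSE)
);
CREATE TABLE projection_properties (
    id INTEGER PRIMARY KEY,
    design_characterization_id INTEGER REFERENCES design_characterizations(id),
    num_three_level_factors_projected INTEGER,
    num_two_level_factors_projected INTEGER,
    model TEXT,
    d_efficiency REAL,
    a_efficiency REAL,
    g_efficiency REAL,
    average_prediction_variance REAL
);
"""

DESIGNS = [  # rows of Table A
    (1, 0, 0.167, 0.43, 0.38, 3, 0, 14, 2, 0),
    (2, 0, 0, 0.45, 0.45, 3, 0, 16, 4, 0),
]
PROJECTIONS = [  # rows of Table B
    (1, 1, 1, 0, "meie", 75.59, 72.72, 72.72, 0.111),
    (2, 1, 1, 0, "mesoe", 51.92, 36.73, 85.71, 0.156),
    (3, 1, 2, 0, "meie", 55.27, 50, 50, 0.185),
    (4, 1, 2, 0, "mesoe", 41.94, 30.61, 61.22, 0.281),
    (5, 1, 3, 0, "meie", 45.99, 41.79, 41.79, 0.281),
    (6, 1, 3, 0, "mesoe", 37.7, 27.87, 49.69, 0.44),
    (7, 2, 1, 0, "meie", 70.71, 66.67, 66.67, 0.104),
    (8, 2, 1, 0, "mesoe", 50, 37.5, 75, 0.134),
    (9, 2, 2, 0, "meie", 50, 44.44, 44.44, 0.178),
    (10, 2, 2, 0, "mesoe", 39.69, 31.58, 54.54, 0.239),
    (11, 2, 3, 0, "meie", 41.02, 36.84, 36.84, 0.275),
    (12, 2, 3, 0, "mesoe", 35.36, 29.41, 45.45, 0.36),
]

QUERY = """
SELECT t1.id, t1.max_fourth_order_correlation, t1.num_runs
FROM design_characterizations t1
INNER JOIN projection_properties t2 ON t1.id = t2.design_characterization_id
WHERE t1.num_three_level_factors = ? AND t1.num_two_level_factors = ?
  AND t1.num_runs <= ? AND t2.model = ? AND t2.d_efficiency >= ?
  AND t2.num_three_level_factors_projected = ?
ORDER BY t1.id
"""


def find_designs(conn, n_three, n_two, max_runs, model, min_d_eff, n_three_projected):
    """Run the join query with the filter values bound to its placeholders."""
    params = (n_three, n_two, max_runs, model, min_d_eff, n_three_projected)
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO design_characterizations VALUES (?,?,?,?,?,?,?,?,?,?)", DESIGNS)
conn.executemany("INSERT INTO projection_properties VALUES (?,?,?,?,?,?,?,?,?)", PROJECTIONS)

# Illustrative filter values bound to the placeholders.
print(find_designs(conn, 3, 0, 16, "mesoe", 50, 1))
```

Binding the user's filtering data as parameters, rather than formatting it into the SQL string, keeps the query text fixed and avoids SQL injection.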
[0214] A visual method to select an item among a plurality of items based on ternary plots will be explained now.
[0215]
[0216] In a ternary plot, the proportions of three numerical parameters or attributes sum to a constant number K. The numerical parameters or attributes will be denoted by the letters a, b and c in the rest of the present disclosure. The three proportions or weights of the numerical attributes or parameters will be denoted respectively by the symbols w.sub.a, w.sub.b and w.sub.c. Each of the three weights w.sub.a, w.sub.b and w.sub.c is a non-negative real number, and the sum of the three weights w.sub.a, w.sub.b and w.sub.c is equal to one, i.e., K=1. The set of the three weights (w.sub.a, w.sub.b, w.sub.c) represents a weight point. For example, in
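A weight point can be drawn in the plane by treating (w.sub.a, w.sub.b, w.sub.c) as barycentric coordinates, i.e., as the weighted average of the three vertex positions. The sketch below assumes an equilateral triangle with unit side and a particular assignment of the parameters a, b and c to the vertices; both choices, and the function name, are hypothetical.

```python
from math import sqrt

# Hypothetical vertex positions: equilateral triangle with unit side.
VERTEX_A = (0.0, 0.0)
VERTEX_B = (1.0, 0.0)
VERTEX_C = (0.5, sqrt(3) / 2)


def weight_point_to_xy(wa, wb, wc, tol=1e-9):
    """Map a weight point (wa, wb, wc) to Cartesian plot coordinates."""
    if min(wa, wb, wc) < -tol or abs(wa + wb + wc - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to one")
    x = wa * VERTEX_A[0] + wb * VERTEX_B[0] + wc * VERTEX_C[0]
    y = wa * VERTEX_A[1] + wb * VERTEX_B[1] + wc * VERTEX_C[1]
    return x, y


print(weight_point_to_xy(1, 0, 0))        # (0.0, 0.0): the vertex of parameter a
print(weight_point_to_xy(0.79, 0, 0.21))  # a point on the edge between a and c
```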
[0217]
[0218] In step 904 of
[0219] The step 904 of the method of
[0220] Further examples and details about how to obtain the vertices, edges and internal points of the ternary plot will be provided later on.
[0221] Each of the first, second and third parameters may further comprise a direction value. The method of
[0222] Step 906 of
[0223] Finally, the method of
[0224]
[0225] In step 1002 of
$$w_{a}a_{i} + w_{b}b_{i} = w_{a}a_{j} + w_{b}b_{j}$$
$$w_{a} + w_{b} = 1$$
[0226] wherein a.sub.i is the first parameter of item i, b.sub.i is the second parameter of item i, a.sub.j is the first parameter of item j, and b.sub.j is the second parameter of item j.
[0227] In step 1004, if w.sub.a and w.sub.b are both greater than zero, the method proceeds further to step 1006 wherein w.sub.a and w.sub.b are stored as weights of the items i and j, for instance in the form of pairs {[(w.sub.a, w.sub.b, 0), i], [(w.sub.a, w.sub.b, 0), j]}. If in step 1004, w.sub.a and w.sub.b are not both greater than zero, the method proceeds to step 1008.
[0228] In step 1008 of
$$w_{a}a_{i} + w_{c}c_{i} = w_{a}a_{j} + w_{c}c_{j}$$
$$w_{a} + w_{c} = 1$$
[0229] In step 1010, if w.sub.a and w.sub.c are both greater than zero, the method proceeds further to step 1012 wherein w.sub.a and w.sub.c are stored as weights of the items i and j, for instance in the form of pairs {[(w.sub.a, 0, w.sub.c), i], [(w.sub.a, 0, w.sub.c), j]}. If, in step 1010, w.sub.a and w.sub.c are not both greater than zero, the method proceeds to step 1014.
[0230] In step 1014 of
$$w_{b}b_{i} + w_{c}c_{i} = w_{b}b_{j} + w_{c}c_{j}$$
$$w_{b} + w_{c} = 1$$
[0231] In step 1016, if w.sub.b and w.sub.c are both greater than zero, the method proceeds further to step 1018 wherein w.sub.b and w.sub.c are stored as weights of the items i and j, for instance in the form of pairs {[(0, w.sub.b, w.sub.c), i], [(0, w.sub.b, w.sub.c), j]}. If, in step 1016, w.sub.b and w.sub.c are not both greater than zero, the method ends.
[0232] The method described in
[0233]
[0234] In step 1102 of
$$w_{a}a_{i} + w_{b}b_{i} + w_{c}c_{i} = w_{a}a_{j} + w_{b}b_{j} + w_{c}c_{j}$$
$$w_{a}a_{i} + w_{b}b_{i} + w_{c}c_{i} = w_{a}a_{k} + w_{b}b_{k} + w_{c}c_{k}$$
$$w_{a} + w_{b} + w_{c} = 1$$
[0235] wherein a.sub.i is the first parameter of item i, b.sub.i is the second parameter of item i, c.sub.i is the third parameter of item i, a.sub.j is the first parameter of item j, b.sub.j is the second parameter of item j, c.sub.j is the third parameter of item j, a.sub.k is the first parameter of item k, b.sub.k is the second parameter of item k, and c.sub.k is the third parameter of item k.
[0236] In step 1104, if w.sub.a, w.sub.b and w.sub.c are all greater than zero, the method proceeds further to step 1106 wherein w.sub.a, w.sub.b and w.sub.c are stored as weights of the items i, j and k, for instance in the form of {[(w.sub.a, w.sub.b, w.sub.c), i], [(w.sub.a, w.sub.b, w.sub.c), j], [(w.sub.a, w.sub.b, w.sub.c), k] }. If in step 1104, w.sub.a, w.sub.b and w.sub.c are not all greater than zero, the method ends.
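Steps 1102 to 1106 can be sketched in Python by solving the 3×3 system with Cramer's rule. The names are hypothetical, items are assumed to be (a, b, c) tuples, and the helper that loops over every triple of items is an assumption about how the step is applied to a whole item set.

```python
from itertools import combinations


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def internal_weights(item_i, item_j, item_k, eps=1e-12):
    """Weights (w_a, w_b, w_c) giving items i, j and k equal scores, or None.

    Returns None if the system is singular or any weight is not > 0.
    """
    m = [
        [item_i[t] - item_j[t] for t in range(3)],  # score_i = score_j
        [item_i[t] - item_k[t] for t in range(3)],  # score_i = score_k
        [1.0, 1.0, 1.0],                            # w_a + w_b + w_c = 1
    ]
    rhs = [0.0, 0.0, 1.0]
    d = det3(m)
    if abs(d) < eps:
        return None
    weights = []
    for col in range(3):  # Cramer's rule: replace one column by the right-hand side
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        weights.append(det3(mc) / d)
    if min(weights) <= 0:
        return None
    return tuple(weights)


def internal_points(items):
    """Store [(weights, item)] for every triple of items with a valid solution."""
    stored = []
    for i, j, k in combinations(sorted(items), 3):
        w = internal_weights(items[i], items[j], items[k])
        if w is not None:
            stored += [(w, i), (w, j), (w, k)]
    return stored


print(internal_weights((1, 0, 0), (0, 1, 0), (0, 0, 1)))  # approximately (1/3, 1/3, 1/3)
```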
[0237] The method described in
[0238] Regarding the methods described with respect to
[0239] The method described with respect to
[0240] Furthermore, the methods described with respect to
[0241] The saved weight points and the associated items contain all the information necessary to produce the ternary plot object of the present disclosure. Multi-attribute decision making (MADM) refers to making preference decisions (such as evaluation, prioritization or selection) over the available alternatives, which are characterized by multiple, usually conflicting, attributes. Examples of MADM can be found in “Multiple criteria decision support software”, 2005, by Weistroffer, H. R., Smith, C. H., and Narula, S. C., or in “Multiple Criteria Decision Analysis: State of the Art Surveys”, International Series in Operations Research & Management Science, Figueira, J., Greco, S., and Ehrgott, M., editors, Springer.
[0242] It is important to note that the number of available alternatives is finite.
[0243]
[0244] The vertex points and the points that define the regions in the ternary plot 1200 of
[0245] First it will be explained how to filter a database of experimental designs and then it will be explained how to construct a ternary plot showing experimental designs.
[0246] The allocated budget allows for a number n of experimental tests or runs between 16 and 20. In this case, due to the high cost of each experimental test (in materials, manpower and time), the number of experimental tests should be as low as possible. On the other hand, the experimental designs should have high quality, meaning that strict requirements should be imposed on the statistical quality characteristics of the designs. These requirements consist of thresholds on the following statistical quality characteristics: the fourth order correlation, the power to detect an interaction effect, the power to detect a quadratic effect and the G-efficiency for a model with intercept, main effects, interaction effects and quadratic effects.
[0247] To this end, a pharmaceutical company may use a software program to access a database comprising a catalog of experimental designs and perform a query that filters the database records fulfilling the specified requirements. These requirements may be as follows:
[0248] Number of three-level factors: 3
[0249] Number of experimental tests or runs: between 16 and 20
[0250] Fourth order correlation <=0.55
[0251] Power to detect an interaction effect >=0.5
[0252] Power to detect a quadratic effect >=0.5
[0253] G-efficiency for a model with intercept, main effects, interaction effects and quadratic effects >=40
[0254] These requirements may be provided by a user through, for instance, a User Interface displayed on a display 106 of an electronic device 100.
[0255] The requirements are then sent by the client application 140 of the electronic device 100 via the communication module 108 through the network 110 to the server application 130, via the communication module of another electronic device 124, which communicates with the DBMS 156 to obtain data representing the experimental designs that comply with the requirements. The server application 130 may perform some processing of the data representing the experimental designs and send it through the network 110 to the client application 140, which may display the data representing the experimental designs on the display 106 of the electronic device 100.
[0256] An example of how to display the data representing the experimental designs on the display 106 of the electronic device 100 will now be explained. The results of the query may comprise data representing four experimental designs, each one characterized by the five statistical quality characteristics specified before. The data is displayed in Table 1, wherein the first column shows a design number used to identify each of the experimental designs stored in the DBMS 156.

TABLE 1 results of the filtering of the DBMS

| design number | number of tests | 4th order correlation | power interaction | power quadratic | G-efficiency for a model with intercept, main effects and second order effects |
|---|---|---|---|---|---|
| 60 | 16 | 0 | 0.5223 | 0.5223 | 45.45 |
| 2033 | 17 | 0.514 | 0.8005 | 0.5424 | 40.63 |
| 9226 | 18 | 0.298 | 0.8076 | 0.5424 | 71.11 |
| 9227 | 20 | 0.167 | 0.8184 | 0.6218 | 66.67 |

[0257] The direction of improvement of each of the statistical quality characteristics is as follows:
[0258] Number of experimental tests or runs: lower values are preferred
[0259] 4th order correlation: lower values are preferred
[0260] Power interaction effects: higher values are preferred
[0261] Power quadratic effects: higher values are preferred
[0262] G-efficiency: higher values are preferred
[0263] Given these directions of improvement, the following transformation is applied to some of the columns in Table 1. The transformed columns are those corresponding to statistical quality characteristics for which lower values are preferred, that is, the second and third columns, which correspond respectively to the number of experimental tests or runs and the 4th order correlation. The transformation consists of multiplying all values in those columns by −1. Table 2 shows the data of Table 1 after applying this transformation.

TABLE 2 results after applying the directions of improvement

| design number | number of tests | 4th order correlation | power interaction | power quadratic | G-efficiency for a model with intercept, main effects and second order effects |
|---|---|---|---|---|---|
| 60 | −16 | 0 | 0.5223 | 0.5223 | 45.45 |
| 2033 | −17 | −0.514 | 0.8005 | 0.5424 | 40.63 |
| 9226 | −18 | −0.298 | 0.8076 | 0.5424 | 71.11 |
| 9227 | −20 | −0.167 | 0.8184 | 0.6218 | 66.67 |

[0264] As already explained, a pharmaceutical company may consider that the most important statistical quality characteristics for selecting experimental designs are the number of experimental tests or runs, the 4th order correlation and the power to detect interaction effects. Applied to this example, the method presented now allows an efficient construction of a ternary plot in which a plurality of selected items, here experimental designs, can be compared with each other on the basis of attributes such as the statistical quality characteristics specified above, in order to select among them.
[0265]
[0266] To avoid numerical issues in the calculations that follow, the statistical quality characteristics of interest in Table 2 are normalized so that their values lie between 0 and 1. Table 3 shows the same columns as Table 2 after normalization. In Table 3, the statistical quality characteristics have been renamed as a (the number of experimental tests or runs), b (the 4th order correlation) and c (the power to detect interaction effects).
[0267] From now on, we will refer to the statistical quality characteristics as attributes. The designs of Tables 1 and 2 are called items in Table 3, and designs 60, 2033, 9226 and 9227 of Tables 1 and 2 have been relabeled respectively as item 1, item 2, item 3 and item 4. This relabeling only serves to simplify the explanation of the example. We use the notation a.sub.1 for the value of attribute a for item 1, that is, a.sub.1=1, and so on for the other items and attributes. In other words, each item i has three associated attribute values a.sub.i, b.sub.i and c.sub.i, corresponding respectively to the number of experimental tests or runs, the 4th order correlation and the power to detect interaction effects, wherein i=1, . . . , 4.

TABLE 3 results after applying the directions of improvement and normalization

| item | a | b | c |
|---|---|---|---|
| 1 | 1 | 1 | 0 |
| 2 | 0.75 | 0 | 0.9395 |
| 3 | 0.5 | 0.4202 | 0.9635 |
| 4 | 0 | 0.6751 | 1 |
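The values in Table 3 are consistent with negating the lower-is-better columns and then applying min-max normalization to each column, i.e., (v − min)/(max − min). The sketch below reproduces Table 3 from the corresponding columns of Table 1 under that assumption; the function name and the handling of a constant column are additional assumptions.

```python
def min_max(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # assumption: a constant column maps to 0
    return [(v - lo) / (hi - lo) for v in values]


# Columns of Table 1 for designs 60, 2033, 9226 and 9227 (items 1 to 4).
num_tests = [16, 17, 18, 20]
fourth_order_corr = [0, 0.514, 0.298, 0.167]
power_interaction = [0.5223, 0.8005, 0.8076, 0.8184]

# Lower is better for the first two attributes, so they are negated first.
a = min_max([-v for v in num_tests])
b = min_max([-v for v in fourth_order_corr])
c = min_max(power_interaction)

for item, row in enumerate(zip(a, b, c), start=1):
    print(item, [round(v, 4) for v in row])  # rows of Table 3
```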
[0268] We define the score of an item i for a certain set of attribute weights w.sub.a, w.sub.b and w.sub.c as:

$$\mathrm{score}_{i} = w_{a}a_{i} + w_{b}b_{i} + w_{c}c_{i}$$
[0269] The weights w.sub.a, w.sub.b and w.sub.c correspond to a point in the ternary plot of
[0270] The calculation of the points in the ternary plot is performed by a processing unit such as CPU 102 of the electronic device 100 shown in
[0271] As already said, in the first step, the items corresponding to the vertex points vertex 1, vertex 2 and vertex 3 of the ternary plot 1500 are calculated. For each vertex, CPU 101 determines the item with the highest score when using the weights given by the ternary-plot coordinates of said vertex, in the following way:
[0272] For vertex 1, having coordinates (w.sub.a, w.sub.b, w.sub.c)=(1,0,0) in the ternary plot, only the attribute a (number of experimental tests) of each item is taken into account, and therefore the following scores are calculated from Table 3 for the different items: score.sub.1=1, score.sub.2=0.75, score.sub.3=0.5, score.sub.4=0. As the highest score is score.sub.1=1, the pair of weights and item {[(1,0,0), 1]} is stored in memory 104.
[0273] For vertex 2, with coordinates (0,1,0), only the attribute b of each item is considered, and therefore the following scores are obtained from Table 3 for the different items: score.sub.1=1, score.sub.2=0, score.sub.3=0.4202, score.sub.4=0.6751. As the highest score is score.sub.1=1, the pair of weights and item {[(0,1,0), 1]} is stored in memory 104.
[0274] For vertex 3, with coordinates (0,0,1), only the attribute c of each item is considered, and therefore the following scores are obtained from Table 3 for the different items: score.sub.1=0, score.sub.2=0.9395, score.sub.3=0.9635, score.sub.4=1. As the highest score is score.sub.4=1, the pair of weights and item {[(0,0,1), 4]} is stored in memory 104.
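The vertex step can be sketched as follows: each vertex of the ternary plot is a weight point with a single non-zero weight, and the item kept for it is the one with the highest score. The function names are hypothetical, and the tie-breaking rule (keep the lowest item number) is an assumption, since the example has no ties.

```python
# Normalized attribute values (a, b, c) from Table 3, keyed by item number.
TABLE_3 = {
    1: (1.0, 1.0, 0.0),
    2: (0.75, 0.0, 0.9395),
    3: (0.5, 0.4202, 0.9635),
    4: (0.0, 0.6751, 1.0),
}


def score(weights, attrs):
    """score_i = w_a*a_i + w_b*b_i + w_c*c_i."""
    return sum(w * v for w, v in zip(weights, attrs))


def vertex_items(items):
    """Return [(vertex weights, best item)] for the three vertices."""
    stored = []
    for vertex in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
        # max() returns the first maximal item in sorted order (tie rule assumed).
        best = max(sorted(items), key=lambda i: score(vertex, items[i]))
        stored.append((vertex, best))
    return stored


print(vertex_items(TABLE_3))  # [((1, 0, 0), 1), ((0, 1, 0), 1), ((0, 0, 1), 4)]
```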
[0275] In the second step, the edge points are calculated by CPU 101. To obtain the breaking points on each edge 1502, 1504 and 1506 (the line between two vertices of the ternary plot), CPU 101 iterates through all the pairs of items. For each pair of items, three different systems of linear equations are solved by CPU 101, one for each edge 1502, 1504 and 1506 of the ternary plot 1500. As there are four items, there are six different pairs of distinct items. In what follows, we detail the calculations for all pairs and all edges.
[0276] First we consider the pair 1 comprising items 1 and 2.
[0277] For pair 1, we start by finding the edge points on edge 1502, which connects vertex 1 and vertex 2. CPU 101 solves the following system of two linear equations in two variables to obtain the weights of potential points on edge 1502:

$$w_a a_1 + w_b b_1 = w_a a_2 + w_b b_2$$
$$w_a + w_b = 1.$$
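Because $w_b = 1 - w_a$ on this edge, the system can also be solved in closed form. A short derivation, given for illustration and assuming $a_1 - a_2 \neq b_1 - b_2$:

$$w_a (a_1 - a_2) + (1 - w_a)(b_1 - b_2) = 0 \;\Longrightarrow\; w_a = \frac{b_1 - b_2}{(b_1 - b_2) - (a_1 - a_2)}, \qquad w_b = 1 - w_a.$$

With the Table 3 values for items 1 and 2 ($a_1 - a_2 = 0.25$, $b_1 - b_2 = 1$), this gives $w_a = 1/0.75 \approx 1.33$ and $w_b \approx -0.33$. The same formula applies to edges 1504 and 1506 after renaming the two active attributes.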
[0278] Using the values for the item attributes displayed in Table 3, the above system of equations becomes:
$$w_a + w_b = 0.75\, w_a$$
$$w_a + w_b = 1$$

[0279] The solution to this system is $w_a = 1.33$ and $w_b = -0.33$. As $w_b < 0$, CPU 101 discards this set of weights and no edge point is stored for edge 1502.
[0280] Edge 1504, connecting vertex 1 and vertex 3, is considered next. CPU 101 solves the following system of two linear equations in two variables to obtain the weights of potential points on edge 1504:

$$w_a a_1 + w_c c_1 = w_a a_2 + w_c c_2$$
$$w_a + w_c = 1$$
[0281] Using the values for the item attributes displayed in Table 3, this system of equations becomes:
$$w_a = 0.75\, w_a + 0.9395\, w_c$$
$$w_a + w_c = 1$$

[0282] The solution to this system is $w_a = 0.79$ and $w_c = 0.21$. Because $w_a > 0$ and $w_c > 0$, the following pairs of weights and items are stored in memory 104: {[(0.79, 0, 0.21), 1], [(0.79, 0, 0.21), 2]}.
[0283] Next, edge 1506 between vertex 2 and vertex 3 is considered. The following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge:
$$w_b b_1 + w_c c_1 = w_b b_2 + w_c c_2$$
$$w_b + w_c = 1$$
[0284] Using the values for the item attributes displayed in Table 3, this system of equations becomes:
$$w_b = 0.9395\, w_c$$
$$w_b + w_c = 1$$

[0285] The solution to this system is $w_b = 0.48$ and $w_c = 0.52$. As $w_b > 0$ and $w_c > 0$, the following pairs of weights and items are stored in memory 104: {[(0, 0.48, 0.52), 1], [(0, 0.48, 0.52), 2]}.
[0286] Now a second pair 2 comprising items 1 and 3 is considered.
[0287] For this pair 2, again the edge points on the edge 1502 connecting vertex 1 and vertex 2 are calculated by the CPU 101.
[0288] Using the values for the item attributes displayed in Table 3, the following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge 1502:
$$w_a + w_b = 0.5\, w_a + 0.4202\, w_b$$
$$w_a + w_b = 1$$

[0289] The solution to this system is $w_a = 7.26$ and $w_b = -6.26$. As $w_b < 0$, CPU 101 discards this set of weights and no edge point is stored for edge 1502.
[0290] Next, edge 1504 between vertex 1 and vertex 3 is considered. Using the values for the item attributes displayed in Table 3, the following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge 1504:
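$$w_a = 0.5\, w_a + 0.9635\, w_c$$
$$w_a + w_c = 1$$

[0291] From $0.5\, w_a = 0.9635\, w_c$, the solution to this system is $w_a = 0.66$ and $w_c = 0.34$. Although both weights are positive, item 2 scores higher than items 1 and 3 at this combination of weights (0.81 versus 0.66), so the point is eliminated by the check of edge and internal points described below and is therefore not listed in Table 4.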
[0292] When considering edge 1506 between vertex 2 and vertex 3, the following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge 1506:
$$w_b = 0.4202\, w_b + 0.9635\, w_c$$
$$w_b + w_c = 1$$

[0293] The solution to this system is $w_b = 0.62$ and $w_c = 0.38$. As $w_b > 0$ and $w_c > 0$, the following pairs of weights and items are stored in memory 104: {[(0, 0.62, 0.38), 1], [(0, 0.62, 0.38), 3]}.
[0294] Now a third pair 3 comprising items 1 and 4 is considered.
[0295] For this pair 3, again the edge points on the edge 1502 connecting vertex 1 and vertex 2 are calculated by the CPU 101.
[0296] Using the values for the item attributes displayed in Table 3, the following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge 1502:
$$w_a + w_b = 0.6751\, w_b$$
$$w_a + w_b = 1$$

[0297] The solution to this system is $w_a = -0.48$ and $w_b = 1.48$. As $w_a < 0$, these values are discarded.
[0298] The edge points on the edge 1504 are calculated by the CPU 101 now.
[0299] Using the values for the item attributes displayed in Table 3, the following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge 1504:
$$w_a = w_c$$
$$w_a + w_c = 1$$

[0300] The solution to this system is $w_a = 0.5$ and $w_c = 0.5$. As $w_a > 0$ and $w_c > 0$, the following pairs of weights and items are stored in memory 104: {[(0.5, 0, 0.5), 1], [(0.5, 0, 0.5), 4]}.
[0301] Next, edge 1506 between vertex 2 and vertex 3 is considered. The following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge 1506:
$$w_b = 0.6751\, w_b + w_c$$
$$w_b + w_c = 1$$

[0302] The solution to this system is $w_b = 0.75$ and $w_c = 0.25$. As $w_b > 0$ and $w_c > 0$, the following pairs of weights and items are stored in memory 104: {[(0, 0.75, 0.25), 1], [(0, 0.75, 0.25), 4]}.
[0303] Now a fourth pair 4 comprising items 2 and 3 is considered.
[0304] For this pair 4, again the edge points on the edge 1502 connecting vertex 1 and vertex 2 are calculated by the CPU 101.
[0305] Using the values for the item attributes displayed in Table 3, the following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge 1502:
$$0.75\, w_a = 0.5\, w_a + 0.4202\, w_b$$
$$w_a + w_b = 1$$

[0306] The solution to this system is $w_a = 0.63$ and $w_b = 0.37$. As $w_a > 0$ and $w_b > 0$, the following pairs of weights and items are stored in memory 104: {[(0.63, 0.37, 0), 2], [(0.63, 0.37, 0), 3]}.
[0307] The edge points on the edge 1504 are calculated by the CPU 101 now. Using the values for the item attributes displayed in Table 3, the following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge 1504:
$$0.75\, w_a + 0.9395\, w_c = 0.5\, w_a + 0.9635\, w_c$$
$$w_a + w_c = 1$$

[0308] The solution to this system is $w_a = 0.09$ and $w_c = 0.91$. As $w_a > 0$ and $w_c > 0$, the following pairs of weights and items are stored in memory 104: {[(0.09, 0, 0.91), 2], [(0.09, 0, 0.91), 3]}.
[0309] Next, edge 1506 between vertex 2 and vertex 3 is considered. The following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge 1506:
$$0.9395\, w_c = 0.4202\, w_b + 0.9635\, w_c$$
$$w_b + w_c = 1$$

[0310] The solution to this system is $w_b = -0.06$ and $w_c = 1.06$. As $w_b < 0$, these weights are discarded.
[0311] Now a fifth pair 5 comprising items 2 and 4 is considered.
[0312] For this pair 5, again the edge points on the edge 1502 connecting vertex 1 and vertex 2 are calculated by the CPU 101.
[0313] Using the values for the item attributes displayed in Table 3, the following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge 1502:
$$0.75\, w_a = 0.6751\, w_b$$
$$w_a + w_b = 1$$

[0314] The solution to this system is $w_a = 0.47$ and $w_b = 0.53$. Because $w_a > 0$ and $w_b > 0$, we store the following pairs of weights and items: {[(0.47, 0.53, 0), 2], [(0.47, 0.53, 0), 4]}.
[0315] The edge points on the edge 1504 are calculated by the CPU 101 now. Using the values for the item attributes displayed in Table 3, the following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge 1504:
$$0.75\, w_a + 0.9395\, w_c = w_c$$
$$w_a + w_c = 1$$

[0316] The solution to this system is $w_a = 0.07$ and $w_c = 0.93$. Because $w_a > 0$ and $w_c > 0$, we store the following pairs of weights and items: {[(0.07, 0, 0.93), 2], [(0.07, 0, 0.93), 4]}.
[0317] The edge points on the edge 1506 are calculated by the CPU 101 now. Using the values for the item attributes displayed in Table 3, the following system of linear equations with two variables and two equations is solved by CPU 101 to obtain the weights of potential points in this edge 1506:
$$0.9395\, w_c = 0.6751\, w_b + w_c$$
$$w_b + w_c = 1$$

[0318] The solution to this system is $w_b = -0.1$ and $w_c = 1.1$. Because $w_b < 0$, we discard this point.
[0319] The calculations for the remaining pair of items are summarized more briefly below.
[0320] For a sixth pair 6 comprising items 3 and 4, the process is as follows.
[0321] For the edge 1502 between vertex 1 and vertex 2, the following system of equations is solved:
$$0.5\, w_a + 0.4202\, w_b = 0.6751\, w_b$$
$$w_a + w_b = 1$$

[0322] The solution to this system is $w_a = 0.34$ and $w_b = 0.66$. Because $w_a > 0$ and $w_b > 0$, we store the following pairs of weights and items: {[(0.34, 0.66, 0), 3], [(0.34, 0.66, 0), 4]}.
[0323] For the edge 1504 between vertex 1 and vertex 3, the system of equations is:
$$0.5\, w_a + 0.9635\, w_c = w_c$$
$$w_a + w_c = 1$$

[0324] The solution to this system is $w_a = 0.07$ and $w_c = 0.93$. Because $w_a > 0$ and $w_c > 0$, we store the following pairs of weights and items: {[(0.07, 0, 0.93), 3], [(0.07, 0, 0.93), 4]}.
[0325] For edge 1506 between vertex 2 and vertex 3, the system of equations is:
$$0.4202\, w_b + 0.9635\, w_c = 0.6751\, w_b + w_c$$
$$w_b + w_c = 1$$

[0326] The solution to this system is $w_b = -0.17$ and $w_c = 1.17$. Because $w_b < 0$, we discard this point.
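Because each edge system has only two unknowns, it has a closed-form solution, and the whole edge step can be written as a loop over item pairs and edges. The following Python sketch assumes the same Table 3 values as above; `EDGES` maps the edge reference numerals to the indices of the two non-zero weights, and all names are illustrative. Solved exactly, the pair (1, 3) system on edge 1504 gives the positive weights (0.66, 0, 0.34), so this sketch stores twelve edge points, one more than Table 4; that extra point is removed later by the score check of paragraph [0341], because item 2 scores higher there.

```python
from itertools import combinations

# Assumed Table 3 values: item -> (a, b, c).
ITEMS = {1: (1.0, 1.0, 0.0), 2: (0.75, 0.0, 0.9395),
         3: (0.5, 0.4202, 0.9635), 4: (0.0, 0.6751, 1.0)}
# Edge numeral -> indices (p, q) of the two weights that are non-zero on it.
EDGES = {1502: (0, 1), 1504: (0, 2), 1506: (1, 2)}


def edge_point(x_m, x_n, p, q):
    """Solve w_p*x_mp + w_q*x_mq = w_p*x_np + w_q*x_nq with w_p + w_q = 1.

    Returns the weight triple (w_a, w_b, w_c), the third weight being 0,
    or None when the system has no unique solution.
    """
    d_p = x_m[p] - x_n[p]
    d_q = x_m[q] - x_n[q]
    if abs(d_q - d_p) < 1e-12:
        return None
    w_p = d_q / (d_q - d_p)
    weights = [0.0, 0.0, 0.0]
    weights[p], weights[q] = w_p, 1.0 - w_p
    return tuple(weights)


def edge_points(items):
    """Return (edge, pair, weights) for every strictly positive solution."""
    stored = []
    for m, n in combinations(sorted(items), 2):
        for edge, (p, q) in EDGES.items():
            w = edge_point(items[m], items[n], p, q)
            if w is not None and w[p] > 0 and w[q] > 0:
                stored.append((edge, (m, n), w))
    return stored


for edge, pair, w in edge_points(ITEMS):
    print(edge, pair, tuple(round(x, 2) for x in w))
```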
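[0327] Finally, the possible internal points of the ternary plot are calculated. We consider all distinct subsets of three items out of the four items (designs); there are four such subsets. For each subset, the scores of the three items are set equal to one another while the three weights sum to one.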
[0328] For the first triplet 1, items 1, 2 and 3 are considered and the system of equations is:
$$w_a a_1 + w_b b_1 + w_c c_1 = w_a a_2 + w_b b_2 + w_c c_2$$
$$w_a a_1 + w_b b_1 + w_c c_1 = w_a a_3 + w_b b_3 + w_c c_3$$
$$w_a + w_b + w_c = 1$$
[0329] Using the data from Table 3, this system becomes:
$$w_a + w_b = 0.75\, w_a + 0.9395\, w_c$$
$$w_a + w_b = 0.5\, w_a + 0.4202\, w_b + 0.9635\, w_c$$
$$w_a + w_b + w_c = 1$$

[0330] The solution to this system is $w_a = 0.42$, $w_b = 0.23$ and $w_c = 0.35$. Because all weights are positive, we store the following pairs of weights and items: {[(0.42, 0.23, 0.35), 1], [(0.42, 0.23, 0.35), 2], [(0.42, 0.23, 0.35), 3]}.
[0331] For the second triplet 2 comprising items 1, 2 and 4, the system of equations is:
$$w_a a_1 + w_b b_1 + w_c c_1 = w_a a_2 + w_b b_2 + w_c c_2$$
$$w_a a_1 + w_b b_1 + w_c c_1 = w_a a_4 + w_b b_4 + w_c c_4$$
$$w_a + w_b + w_c = 1$$
[0332] Using the data from Table 3, this system becomes:
$$w_a + w_b = 0.75\, w_a + 0.9395\, w_c$$
$$w_a + w_b = 0.6751\, w_b + w_c$$
$$w_a + w_b + w_c = 1$$

[0333] The solution to this system is $w_a = 0.3$, $w_b = 0.3$ and $w_c = 0.4$. Because all weights are positive, we store the following pairs of weights and items: {[(0.3, 0.3, 0.4), 1], [(0.3, 0.3, 0.4), 2], [(0.3, 0.3, 0.4), 4]}.
[0334] For the third triplet 3 comprising items 1, 3 and 4, the system of equations is:
$$w_a a_1 + w_b b_1 + w_c c_1 = w_a a_3 + w_b b_3 + w_c c_3$$
$$w_a a_1 + w_b b_1 + w_c c_1 = w_a a_4 + w_b b_4 + w_c c_4$$
$$w_a + w_b + w_c = 1$$
[0335] Using the data from Table 3, this system becomes:
$$w_a + w_b = 0.5\, w_a + 0.4202\, w_b + 0.9635\, w_c$$
$$w_a + w_b = 0.6751\, w_b + w_c$$
$$w_a + w_b + w_c = 1$$

[0336] The solution to this system is $w_a = 0.23$, $w_b = 0.41$ and $w_c = 0.36$. Because all weights are positive, we store the following pairs of weights and items: {[(0.23, 0.41, 0.36), 1], [(0.23, 0.41, 0.36), 3], [(0.23, 0.41, 0.36), 4]}.
[0337] For the fourth triplet 4 comprising items 2, 3 and 4, the system of equations is:
$$w_a a_2 + w_b b_2 + w_c c_2 = w_a a_3 + w_b b_3 + w_c c_3$$
$$w_a a_2 + w_b b_2 + w_c c_2 = w_a a_4 + w_b b_4 + w_c c_4$$
$$w_a + w_b + w_c = 1$$
[0338] Using the data from Table 3, this system becomes:
$$0.75\, w_a + 0.9395\, w_c = 0.5\, w_a + 0.4202\, w_b + 0.9635\, w_c$$
$$0.75\, w_a + 0.9395\, w_c = 0.6751\, w_b + w_c$$
$$w_a + w_b + w_c = 1$$

[0339] The solution to this system is $w_a = 0.06$, $w_b = -0.02$ and $w_c = 0.96$. Because $w_b < 0$, we discard this point.
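Each triplet system is a 3x3 linear system, so the internal-point step can be sketched with Cramer's rule in plain Python. This sketch again assumes the Table 3 values inferred from paragraphs [0272] to [0274], and the function names are illustrative. The exact weights differ from the two-decimal values in the text by up to about 0.01 (for example, the triplet 1, 3, 4 gives $w_b \approx 0.404$).

```python
from itertools import combinations

# Assumed Table 3 values: item -> (a, b, c).
ITEMS = {1: (1.0, 1.0, 0.0), 2: (0.75, 0.0, 0.9395),
         3: (0.5, 0.4202, 0.9635), 4: (0.0, 0.6751, 1.0)}


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def internal_point(x_m, x_n, x_o):
    """Weights giving items m, n and o equal scores, with w_a + w_b + w_c = 1.

    Solved with Cramer's rule; returns None if the system is singular.
    """
    A = [
        [x_m[k] - x_n[k] for k in range(3)],  # score_m - score_n = 0
        [x_m[k] - x_o[k] for k in range(3)],  # score_m - score_o = 0
        [1.0, 1.0, 1.0],                      # weights sum to one
    ]
    rhs = [0.0, 0.0, 1.0]
    d = det3(A)
    if abs(d) < 1e-12:
        return None
    weights = []
    for col in range(3):
        # Replace column `col` of A by the right-hand side.
        a_col = [row[:col] + [rhs[i]] + row[col + 1:] for i, row in enumerate(A)]
        weights.append(det3(a_col) / d)
    return tuple(weights)


def internal_points(items):
    """Return (triplet, weights) for every triplet whose weights are all > 0."""
    stored = []
    for trio in combinations(sorted(items), 3):
        w = internal_point(*(items[i] for i in trio))
        if w is not None and all(x > 0 for x in w):
            stored.append((trio, w))
    return stored


for trio, w in internal_points(ITEMS):
    print(trio, tuple(round(x, 3) for x in w))
```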
[0340] Table 4 shows the stored vertex, edge and internal points after these steps.
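TABLE 4: Vertex, edge and internal points obtained after Phase 1

| Point | Type | Items | $w_a$ | $w_b$ | $w_c$ |
|---|---|---|---|---|---|
| 1 | Vertex | 1 | 1 | 0 | 0 |
| 2 | Vertex | 1 | 0 | 1 | 0 |
| 3 | Vertex | 4 | 0 | 0 | 1 |
| 4 | Edge | 1, 2 | 0.79 | 0 | 0.21 |
| 5 | Edge | 1, 2 | 0 | 0.48 | 0.52 |
| 6 | Edge | 1, 3 | 0 | 0.62 | 0.38 |
| 7 | Edge | 1, 4 | 0.5 | 0 | 0.5 |
| 8 | Edge | 1, 4 | 0 | 0.75 | 0.25 |
| 9 | Edge | 2, 3 | 0.63 | 0.37 | 0 |
| 10 | Edge | 2, 3 | 0.09 | 0 | 0.91 |
| 11 | Edge | 2, 4 | 0.47 | 0.53 | 0 |
| 12 | Edge | 2, 4 | 0.07 | 0 | 0.93 |
| 13 | Edge | 3, 4 | 0.34 | 0.66 | 0 |
| 14 | Edge | 3, 4 | 0.07 | 0 | 0.93 |
| 15 | Internal | 1, 2, 3 | 0.42 | 0.23 | 0.35 |
| 16 | Internal | 1, 2, 4 | 0.3 | 0.3 | 0.4 |
| 17 | Internal | 1, 3, 4 | 0.23 | 0.41 | 0.36 |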
[0341] Next, the edge and internal points are checked. In this phase, we iterate through all the edge and internal points (points 4 to 17) and perform the following operations. For each point, which is determined by a set of weights, the scores of all items are calculated. The items stored with the point (two for an edge point and three for an internal point) have the same score value, called SCORE 1. Then the maximum score across all items, called SCORE MAX, is obtained. If SCORE MAX > SCORE 1, the point is discarded: some other item scores higher at that combination of weights, so the point cannot be a vertex of the regions of the stored items.
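[0342] Table 5 shows the calculation of these scores. An asterisk (*) marks the items that are stored with the corresponding point. The last column indicates whether the point is discarded (SCORE MAX > SCORE 1) or kept. Because the weights are rounded to two decimals, the displayed scores of the stored items may differ slightly from one another; with the unrounded weights, no item scores above the stored items at points 4, 8, 10, 14, 15 and 17.

TABLE 5: Processing of the obtained points

| Point | Type | Items | $w_a$ | $w_b$ | $w_c$ | $score_1$ | $score_2$ | $score_3$ | $score_4$ | SCORE 1 | SCORE MAX | Result |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Edge | 1, 2 | 0.79 | 0 | 0.21 | 0.790* | 0.790* | 0.597 | 0.210 | 0.790 | 0.790 | Kept |
| 5 | Edge | 1, 2 | 0 | 0.48 | 0.52 | 0.480* | 0.489* | 0.703 | 0.844 | 0.490 | 0.844 | Discarded |
| 6 | Edge | 1, 3 | 0 | 0.62 | 0.38 | 0.620* | 0.357 | 0.627* | 0.799 | 0.630 | 0.799 | Discarded |
| 7 | Edge | 1, 4 | 0.5 | 0 | 0.5 | 0.500* | 0.845 | 0.732 | 0.500* | 0.500 | 0.845 | Discarded |
| 8 | Edge | 1, 4 | 0 | 0.75 | 0.25 | 0.750* | 0.235 | 0.556 | 0.756* | 0.760 | 0.756 | Kept |
| 9 | Edge | 2, 3 | 0.63 | 0.37 | 0 | 1.000 | 0.473* | 0.470* | 0.250 | 0.470 | 1.000 | Discarded |
| 10 | Edge | 2, 3 | 0.09 | 0 | 0.91 | 0.090 | 0.922* | 0.922* | 0.910 | 0.920 | 0.922 | Kept |
| 11 | Edge | 2, 4 | 0.47 | 0.53 | 0 | 1.000 | 0.353* | 0.458 | 0.358* | 0.350 | 1.000 | Discarded |
| 12 | Edge | 2, 4 | 0.07 | 0 | 0.93 | 0.075 | 0.925* | 0.929 | 0.925* | 0.925 | 0.929 | Discarded |
| 13 | Edge | 3, 4 | 0.34 | 0.66 | 0 | 1.000 | 0.255 | 0.447* | 0.446* | 0.450 | 1.000 | Discarded |
| 14 | Edge | 3, 4 | 0.07 | 0 | 0.93 | 0.070 | 0.926 | 0.931* | 0.930* | 0.930 | 0.931 | Kept |
| 15 | Internal | 1, 2, 3 | 0.42 | 0.23 | 0.35 | 0.650* | 0.644* | 0.644* | 0.505 | 0.640 | 0.650 | Kept |
| 16 | Internal | 1, 2, 4 | 0.3 | 0.3 | 0.4 | 0.600* | 0.601* | 0.661 | 0.603* | 0.600 | 0.661 | Discarded |
| 17 | Internal | 1, 3, 4 | 0.23 | 0.41 | 0.36 | 0.640* | 0.511 | 0.634* | 0.637* | 0.640 | 0.640 | Kept |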
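The check can be sketched in a few lines of Python. This sketch assumes the Table 3 values used above and the rounded weights of Table 4, and it takes SCORE 1 as the highest score among a point's stored items, which absorbs the small differences introduced by rounding the weights; the function and variable names are illustrative only.

```python
# Assumed Table 3 values: item -> (a, b, c).
ITEMS = {1: (1.0, 1.0, 0.0), 2: (0.75, 0.0, 0.9395),
         3: (0.5, 0.4202, 0.9635), 4: (0.0, 0.6751, 1.0)}

# Edge and internal points of Table 4: (point number, stored items, weights).
POINTS = [
    (4, (1, 2), (0.79, 0.0, 0.21)), (5, (1, 2), (0.0, 0.48, 0.52)),
    (6, (1, 3), (0.0, 0.62, 0.38)), (7, (1, 4), (0.5, 0.0, 0.5)),
    (8, (1, 4), (0.0, 0.75, 0.25)), (9, (2, 3), (0.63, 0.37, 0.0)),
    (10, (2, 3), (0.09, 0.0, 0.91)), (11, (2, 4), (0.47, 0.53, 0.0)),
    (12, (2, 4), (0.07, 0.0, 0.93)), (13, (3, 4), (0.34, 0.66, 0.0)),
    (14, (3, 4), (0.07, 0.0, 0.93)), (15, (1, 2, 3), (0.42, 0.23, 0.35)),
    (16, (1, 2, 4), (0.3, 0.3, 0.4)), (17, (1, 3, 4), (0.23, 0.41, 0.36)),
]


def score(weights, attrs):
    """Weighted score w_a*a + w_b*b + w_c*c of one item."""
    return sum(w * x for w, x in zip(weights, attrs))


def check_points(points, items, tol=1e-9):
    """Keep a point unless some item scores above all of its stored items."""
    kept = []
    for number, stored, weights in points:
        scores = {i: score(weights, items[i]) for i in items}
        score_1 = max(scores[i] for i in stored)  # SCORE 1
        score_max = max(scores.values())          # SCORE MAX
        if score_max <= score_1 + tol:
            kept.append(number)
    return kept


print(check_points(POINTS, ITEMS))  # [4, 8, 10, 14, 15, 17]
```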
[0343] From the edge and internal points, points 4, 8, 10, 14, 15 and 17 are selected. The final list of points in Table 6 is displayed in
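TABLE 6: Final list of points

| Point | Type | Items | $w_a$ | $w_b$ | $w_c$ |
|---|---|---|---|---|---|
| A | Vertex | 1 | 1 | 0 | 0 |
| B | Vertex | 1 | 0 | 1 | 0 |
| C | Vertex | 4 | 0 | 0 | 1 |
| D | Edge | 1, 2 | 0.79 | 0 | 0.21 |
| E | Edge | 1, 4 | 0 | 0.75 | 0.25 |
| F | Edge | 2, 3 | 0.09 | 0 | 0.91 |
| G | Edge | 3, 4 | 0.07 | 0 | 0.93 |
| H | Internal | 1, 2, 3 | 0.42 | 0.23 | 0.35 |
| I | Internal | 1, 3, 4 | 0.23 | 0.41 | 0.36 |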
[0344] Each item corresponds to an area in the ternary plot, defined as the convex hull of the weight coordinates of the points associated with that item. The convex hull of a set of two-dimensional points is the smallest convex polygon containing all the points. Before the convex hull is calculated, the weight coordinates $(w_a, w_b, w_c)$ have to be transformed into coordinates in a two-dimensional space whose origin is located at the bottom-left vertex (vertex 1). A set of coordinates is denoted as the tuple $(x, y)$, and its components are calculated from the weight coordinates as follows:

$$x = w_b + \frac{w_c}{2}, \qquad y = \frac{\sqrt{3}}{2}\, w_c$$
The final list of points in two-dimensional coordinates becomes:
TABLE 7: Final list of points in two-dimensional coordinates

| Point | Type | Items | x | y |
|---|---|---|---|---|
| A | Vertex | 1 | 0 | 0 |
| B | Vertex | 1 | 1 | 0 |
| C | Vertex | 4 | 0.5 | 0.866 |
| D | Edge | 1, 2 | 0.105 | 0.182 |
| E | Edge | 1, 4 | 0.875 | 0.217 |
| F | Edge | 2, 3 | 0.455 | 0.788 |
| G | Edge | 3, 4 | 0.465 | 0.805 |
| H | Internal | 1, 2, 3 | 0.405 | 0.303 |
| I | Internal | 1, 3, 4 | 0.59 | 0.312 |
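The weight-to-plane mapping behind Table 7 places vertex 1 at (0, 0), vertex 2 at (1, 0) and vertex 3 at (0.5, √3/2). The Python sketch below reproduces Table 7 from the weights in Table 6; the mapping used in `to_cartesian` is inferred from the coordinates of Tables 6 and 7 rather than quoted from the text, and the names are illustrative.

```python
import math

# Final points of Table 6: label -> (w_a, w_b, w_c).
TABLE_6 = {
    "A": (1.0, 0.0, 0.0), "B": (0.0, 1.0, 0.0), "C": (0.0, 0.0, 1.0),
    "D": (0.79, 0.0, 0.21), "E": (0.0, 0.75, 0.25), "F": (0.09, 0.0, 0.91),
    "G": (0.07, 0.0, 0.93), "H": (0.42, 0.23, 0.35), "I": (0.23, 0.41, 0.36),
}


def to_cartesian(weights):
    """Map ternary weights (w_a, w_b, w_c) to (x, y) in a unit-side triangle.

    Vertex 1 (w_a = 1) -> (0, 0), vertex 2 (w_b = 1) -> (1, 0),
    vertex 3 (w_c = 1) -> (0.5, sqrt(3)/2); w_a is implied by w_b and w_c.
    """
    _, w_b, w_c = weights
    return (w_b + 0.5 * w_c, math.sqrt(3) / 2 * w_c)


for label, w in TABLE_6.items():
    x, y = to_cartesian(w)
    print(label, round(x, 3), round(y, 3))
```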
The two-dimensional points associated with item 1 are (0,0), (1,0), (0.105,0.182), (0.875,0.217), (0.405,0.303) and (0.59,0.312) (points A, B, D, E, H and I in
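The text does not name a convex-hull algorithm. As one standard choice, Andrew's monotone chain returns the hull vertices in counter-clockwise order, which is also the order needed for an area computation. A minimal Python sketch applied to the six points of item 1 (names illustrative):

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise (or collinear) turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


# Points A, B, D, E, H and I of Table 7, all associated with item 1.
ITEM_1_POINTS = [(0, 0), (1, 0), (0.105, 0.182), (0.875, 0.217),
                 (0.405, 0.303), (0.59, 0.312)]
print(convex_hull(ITEM_1_POINTS))
```

For item 1 all six points lie on the hull, so the region of item 1 is a hexagon.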
[0345] The ternary plot shown in
[0346] The first area 1510, at the bottom of the plot, indicates that experimental design 60 performs best in terms of the number of runs (number of tests) and the fourth-order correlation; that is, experimental design 60 has the best value among the four designs for these two statistical quality characteristics. The fourth area 1516 is at the top of the ternary plot, which means that experimental design 9227 is the best in terms of the power to detect quadratic effects. The fact that the fourth area 1516 does not extend towards vertex 1 (number of experimental tests or runs) indicates that experimental design 9227 is substantially larger than experimental design 60. The third area 1514 occupies a central position in the ternary plot. This means that the corresponding design, 9226, performs well overall for all three selected statistical quality characteristics, despite not being the best design for any of them. The ternary plot is a decision-support tool because it allows the alternatives to be quantified. For example, the surface of each area can be calculated using the Shoelace formula. In the example, area 1510 has a surface of 0.228, area 1512 a surface of 0.07, area 1514 a surface of 0.048 and area 1516 a surface of 0.087. Area 1510 has the largest surface, which indicates that experimental design 60 performs best for the widest variety of attribute weights. Besides its large surface, design 60 performs best for two of the three criteria, and its area contains the weight coordinates (⅓,⅓,⅓), which is the center of gravity of the triangle. All these facts indicate that design 60 is the most appropriate for the problem at hand. However, if a design that performs well on all three criteria simultaneously is preferred, design 9226 may be considered: the central location of area 1514 indicates that design 9226 performs well for all three criteria even though it is not the best for any of them.
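The Shoelace formula gives the area of a simple polygon from its vertices listed in boundary order. The Python sketch below computes the four region areas from the Table 7 coordinates; the boundary orders in `REGIONS` are the convex hulls of the points tagged with each item, listed counter-clockwise, and the names are illustrative. With these rounded coordinates the areas come to roughly 0.228, 0.070, 0.048 and 0.087, which together cover the whole triangle (√3/4 ≈ 0.433).

```python
import math

# Table 7 coordinates of the final points.
P = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.5, 0.866), "D": (0.105, 0.182),
     "E": (0.875, 0.217), "F": (0.455, 0.788), "G": (0.465, 0.805),
     "H": (0.405, 0.303), "I": (0.59, 0.312)}

# Region boundary of each item, counter-clockwise (convex hull of the points
# whose item list contains that item).
REGIONS = {1: "ABEIHD", 2: "DHF", 3: "HIGF", 4: "IECG"}


def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are given in boundary order."""
    n = len(vertices)
    twice_area = 0.0
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


areas = {item: shoelace_area([P[c] for c in labels]) for item, labels in REGIONS.items()}
for item, area in areas.items():
    print(item, round(area, 3))
print("total", round(sum(areas.values()), 3), "triangle", round(math.sqrt(3) / 4, 3))
```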
[0347] Several examples of applications of the invention are provided in the following. It is understood that these examples are non-limiting and that other applications of the invention, in which other industrial processes are optimized, are also possible. Furthermore, other parameters may be taken into account for optimizing the different industrial processes according to the invention.
Example 1: Optimization of Potato Chips Ingredients and Production Process Parameters in the Food Industry
[0348] The first example relates to a company that needs to optimize the proportions of ingredients and some other process parameters when producing potato chips. This first example would also be applicable to any other production process in which proportions of ingredients or other production process parameters need to be optimized. To optimize the production of potato chips, an experiment needs to be performed to determine the optimal proportions of ingredients such as rice flour, potato and corn flour, the optimal process temperature, and the optimal cooking time, so as to reduce the fat content of the chips while minimizing the production cost.
[0349] To that end, the company needs to perform an experiment and has several alternative experimental designs to choose from. The experimental designs are described by three numerical attributes or parameters: the first parameter is the number of tests, the second parameter is the amount of information, and the third parameter is the maximum statistical correlation between the different effects. It is understood that any other suitable number of parameters, and any other suitable parameters, could be selected.
[0350] According to the invention, an operating system in a computer performs the following operations. A weight is assigned to each of the attributes or parameters that describe the set of available experimental designs. Each experimental design corresponds to an item, and each attribute of an experimental design is a parameter of the corresponding item. Then, for each pair of items, a set of weights that produces the same score for both items is determined, and the set of weights is stored if its weights are strictly positive. Next, the items that perform best for each of the three parameters are determined. To that end, the operating system determines a first vertex item, a second vertex item and a third vertex item, which are the items comprising the greatest value for the first, second and third parameters, respectively. The method then determines, at the operating system, a plurality of points based on the stored sets of weights and the corresponding pairs of items, the first vertex item, the second vertex item and the third vertex item. The resulting points are displayed as a ternary plot on a display screen by determining a set of regions. The set of regions completely partitions the ternary plot. Each region is a polygon whose vertices are contained in the plurality of points obtained, and each region indicates the one or more items that perform best for any combination of weights contained in that region. The area of each polygon is then calculated as previously described, and the operating system may suggest that the user select one of the items in the region with the largest area.
The operating system then receives an input from the user selecting a region (item) in the displayed plot, and the experimental design associated with that item is performed. In this way, the invention provides items optimized with respect to the set of weights contained in each respective region of the map and receives an input selecting an item in the displayed plot, wherein each item of the plurality of items identifies an experimental design for performing an industrial process; the industrial process is then performed based on the experimental design corresponding to the selected item. That is, the selected item identifies the experimental design that will be used to execute an experiment that optimizes the industrial process for producing potato chips. The selected experimental design balances quality with respect to the attributes or parameters considered, that is, the number of tests, the amount of information and the maximum statistical correlation between the different effects.
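The suggestion step, in which the item with the largest region is proposed to the user, reduces to ranking the items by region area. A minimal Python sketch; the function names are hypothetical, and the input areas are the approximate values obtained from the Table 7 coordinates in the four-design example above.

```python
def rank_items_by_area(areas):
    """Sort items by region area, largest first; ties keep ascending item id."""
    return sorted(areas, key=lambda item: (-areas[item], item))


def suggest_item(areas):
    """Item that could be suggested to the user: the one with the largest region."""
    return rank_items_by_area(areas)[0]


# Approximate region areas of the four-design example (items 1 to 4).
AREAS = {1: 0.228, 2: 0.070, 3: 0.048, 4: 0.087}
print(rank_items_by_area(AREAS))  # [1, 4, 2, 3]
print(suggest_item(AREAS))        # 1
```

The final choice remains with the user, whose input selects the region (item) whose experimental design is then performed.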
[0351] After the selected experiment has been executed, the user may further analyze the experimental data to optimize the ingredients and production process parameters and obtain the best recipe for the industrial process, that is, the specific proportions of ingredients and the values of the process parameters that minimize both the fat content and the production costs. The invention allows a good experimental design to be selected that optimizes the industrial process at minimal cost.
Example 2: Optimization of the Pharmaceutical Excipients Used in Pill Production
[0352] The second example relates, for instance, to a pharmaceutical company that has developed a new active ingredient to successfully treat a certain disease. The challenge for the company is now to find appropriate excipient proportions to define a pill formula that is acceptable in terms of both production costs and stability. The pill should fulfill strict requirements in terms of shelf life, solubility and hardness. This example is not limiting in this regard and could be applied to the production of any other pharmaceutical product in which other parameters of the production process need to be optimized.
[0353] To plan the experiment that optimizes the production of the pill, the company has several alternative experimental designs to choose from. Each experimental design is described by three numerical attributes or parameters: the first parameter is the number of excipient combinations to be tested, the second parameter is the projection capacity, and the third parameter is the average unscaled prediction variance.
[0354] As stated above, according to the invention, an operating system in a computer performs the following operations. A weight is assigned to each of the attributes or parameters that describe the set of available experimental designs. Each experimental design corresponds to an item, and each attribute of an experimental design is a parameter of the corresponding item. Then, for each pair of items, a set of weights that produces the same score for both items is determined, and the set of weights is stored if its weights are strictly positive. Next, the items that perform best for each of the three parameters are determined. To that end, the operating system determines a first vertex item, a second vertex item and a third vertex item, which are the items comprising the greatest value for the first, second and third parameters, respectively. The method then determines, at the operating system, a plurality of points based on the stored sets of weights and the corresponding pairs of items, the first vertex item, the second vertex item and the third vertex item. The resulting points are displayed as a ternary plot on a display screen by determining a set of regions. The set of regions completely partitions the ternary plot. Each region is a polygon whose vertices are contained in the plurality of points obtained, and each region indicates the one or more items that perform best for any combination of weights contained in that region. The area of each polygon is then calculated as previously described, and the operating system may suggest that the user select one of the items in the region with the largest area.
The operating system then receives an input from the user selecting a region (item) in the displayed plot, and the experimental design associated with that item is performed. In this way, the invention provides items optimized with respect to the set of weights contained in each respective region of the map and receives an input selecting an item in the displayed plot, wherein each item of the plurality of items identifies an experimental design for producing the pill; the industrial process is then performed, by producing the pill, based on the experimental design corresponding to the selected item. That is, the selected item identifies the experimental design that will be used to execute an experiment that optimizes the industrial process for producing pills based on the selected parameters.
[0355] The excipient combination of each of the experimental tests is used to produce a pill. On each pill, the outcomes of interest, that is, the shelf life, solubility and hardness, may be measured. The set of all excipient combination values and the corresponding outcome values constitute the experimental data. The experimental data is analyzed, and a model relating the excipient proportions to the outcomes is fitted to the data. The model is then used to optimize the excipient proportions so as to produce a pill that remains within specifications at minimal cost.
Example 3: Optimization of a Polymerization Industrial Process
[0356] The third example relates to a chemical company that produces polypropylene for the car industry and wants to reduce the energy cost of its polymerization process and increase the adhesion of the polypropylene it produces. Good adhesion means that the polypropylene can be easily painted.
[0357] In the polymerization process, five chemicals influence the adhesion, together with the temperature and pressure of the process. Depending on the values of the five chemicals, the temperature and the pressure, the adhesion of the polypropylene and the energy cost of the process vary. The objective of the company is to run a more efficient process (lower energy cost) while ensuring high quality in terms of adhesion. To achieve this goal, the company must determine the best proportions for the five chemicals and the best values for the temperature and pressure of the process. To find these, the company decides to execute an experiment. Because the polymerization process is expensive, the experimental plan must have 20 tests or fewer.
[0358] To plan the experiment, the company gathers 30 experimental designs with a number of runs ranging from 12 to 20. Each experimental design is described by the number of tests (the first parameter) and by two other attributes or parameters: the D-efficiency for a main-effects (ME) model (the second parameter) and the maximum fourth-order correlation (the third parameter).
[0359] According to the invention, an operating system in a computer performs the following operations. A weight is assigned to each of the attributes or parameters that describe the set of available experimental designs. Each experimental design corresponds to an item, and each attribute of an experimental design is a parameter of the corresponding item. Then, for each pair of items, a set of weights that produces the same score for both items is determined, and the set of weights is stored if its weights are strictly positive. Next, the items that perform best for each of the three parameters are determined. To that end, the operating system determines a first vertex item, a second vertex item and a third vertex item, which are the items comprising the greatest value for the first, second and third parameters, respectively. The method then determines, at the operating system, a plurality of points based on the stored sets of weights and the corresponding pairs of items, the first vertex item, the second vertex item and the third vertex item. The resulting points are displayed as a ternary plot on a display screen by determining a set of regions. The set of regions completely partitions the ternary plot. Each region is a polygon whose vertices are contained in the plurality of points obtained, and each region indicates the one or more items that perform best for any combination of weights contained in that region. The area of each polygon is then calculated as previously described, and the operating system may suggest that the user select one of the items in the region with the largest area.
The operating system then receives an input from the user selecting a region (item) in the displayed plot, and the experimental design associated with that item is performed. In this way, the invention provides items optimized with respect to the set of weights contained in each respective region of the map and receives an input selecting an item in the displayed plot, wherein each item of the plurality of items identifies an experimental design for performing an industrial process; the industrial process is then performed based on the experimental design corresponding to the selected item. That is, the selected item identifies the experimental design that will be used to execute an experiment that optimizes the polymerization industrial process. The selected experimental design balances quality with respect to the first, second and third parameters considered.
[0360] After the execution of the selected experiment, the user may analyze the experimental data to further optimize the ingredients and the production process parameters and obtain the best recipe for the industrial process. The invention allows selecting a good experimental design that optimizes the polymerization industrial process at minimal cost.
[0361] The selected item identifies the experimental design that will be used in the polymerization process experiment. Each test of the experiment specifies the values of the chemicals, the temperature, and the pressure to be used at each point in time. After each test, the adhesion of the resulting polypropylene and the energy consumption are measured. Once the experiment has been executed in its entirety, the data is analyzed and optimized to find the best values for the chemicals, the temperature, and the pressure. In this way, the polymerization industrial process is optimized at minimal cost.
Example 4: Optimization of a Cleaning Process in the Semiconductor Industry
[0362] A company in the silicon sector needs to fine-tune its wafer cleaning process so that the level of impurities is kept to a minimum. To clean a wafer, the company can use up to 5 different reagents, and the process can be carried out at a pressure within a range of possible pressures and at a temperature within a range of possible temperatures. To optimize the number of reagents, the pressure, and the temperature, the company decides to plan an experiment. To this end, the company gathers 20 different experimental designs, or items, for improving the wafer cleaning process. The experimental design should run in less than a week and maximize the information obtained about the influence of the 5 reagents, the pressure, and the temperature on the final level of impurities. The amount of information provided by each experimental design is quantified using two parameters, or numerical attributes: the running time and the influence of the reagents as the first parameter, and the influence of the pressure and temperature on the level of impurities as the second parameter. The invention allows a suitable design to be chosen for the experiment, the experiment to be performed, and the data to be analyzed and optimized, thereby providing an improved cleaning process. For that, the invention is applied in the same way as explained for the previous examples.
Clauses
[0363] 1. A method of designing and executing experiments in an industrial process using a method of generating a visual representation of a plurality of items, the method comprising:
[0364] (a) selecting, at an operating system (142), a pair of items m and n from a plurality of items, wherein each item of the plurality of items comprises a plurality of parameters, each parameter of the plurality of parameters representing a property of the plurality of items;
[0365] (b) for the selected pair of items m and n, selecting, at the operating system (142), a pair of parameters p and q from the plurality of parameters, wherein a.sub.mp is the parameter p of the item m, b.sub.mq is the parameter q of the item m, a.sub.np is the parameter p of the item n, and b.sub.nq is the parameter q of the item n;
[0366] (c) calculating (1002), at the operating system (142), a pair of weights w.sub.p and w.sub.q, wherein w.sub.p is calculated based on a.sub.mp, b.sub.mq, a.sub.np and b.sub.nq, and wherein w.sub.q is calculated based on a.sub.mp, b.sub.mq, a.sub.np and b.sub.nq;
[0367] (d) if w.sub.p>0 and w.sub.q>0, storing (1004), in a memory (154) by the operating system (142), the pair of weights w.sub.p and w.sub.q and the pair of items m and n;
[0368] (e) determining (904), at the operating system (142), a first vertex item, a second vertex item and a third vertex item, wherein the first vertex item is the item of the plurality of items comprising the greatest value for the parameter p, the second vertex item is the item of the plurality of items comprising the greatest value for the parameter q, and the third vertex item is the item of the plurality of items comprising the greatest value for a parameter r from the plurality of parameters;
[0369] (f) generating (908), at the operating system (142), a plurality of points based on the stored pair of weights w.sub.p and w.sub.q and the pair of items m and n, the first vertex item, the second vertex item and the third vertex item;
[0370] (g) displaying, in a display screen by the operating system (142), the visual representation of the plurality of items by displaying a plot comprising a geometric shape and comprising the plurality of points;
[0371] (h) receiving an input, by the operating system (142), to select an item in the displayed plot, wherein each item of the plurality of items identifies an experimental design for performing an industrial process; and
[0372] (i) performing the industrial process based on the experimental design corresponding to the selected item.
[0373] 2. The method according to clause 1, the method further comprising the steps of:
[0374] (I) selecting, at the operating system (142), a set of three items i, j and k from the plurality of items;
[0375] (II) for the selected set of three items i, j and k, selecting, at the operating system (142), a set of three parameters p, q and r from the plurality of parameters, wherein a.sub.ip is the parameter p of the item i, b.sub.iq is the parameter q of the item i, c.sub.ir is the parameter r of the item i, a.sub.jp is the parameter p of the item j, b.sub.jq is the parameter q of the item j, c.sub.jr is the parameter r of the item j, a.sub.kp is the parameter p of the item k, b.sub.kq is the parameter q of the item k, and c.sub.kr is the parameter r of the item k;
[0376] (III) calculating (1102), at the operating system (142), a set of three weights w.sub.p, w.sub.q and w.sub.r, wherein w.sub.p is calculated based on a.sub.ip, b.sub.iq, c.sub.ir, a.sub.jp, b.sub.jq, c.sub.jr, a.sub.kp, b.sub.kq, and c.sub.kr, w.sub.q is calculated based on a.sub.ip, b.sub.iq, c.sub.ir, a.sub.jp, b.sub.jq, c.sub.jr, a.sub.kp, b.sub.kq, and c.sub.kr, and w.sub.r is calculated based on a.sub.ip, b.sub.iq, c.sub.ir, a.sub.jp, b.sub.jq, c.sub.jr, a.sub.kp, b.sub.kq, and c.sub.kr; and
[0377] (IV) if w.sub.p>0, w.sub.q>0 and w.sub.r>0, storing (1104), in the memory (154) by the operating system (142), the set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k;
wherein step (f) further comprises generating the plurality of points based on the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k.
[0378] 3. The method according to clause 1, further comprising repeating steps (a) through (d) for each pair of items m and n from the plurality of items.
[0379] 4. The method according to any of clauses 1 and 3, further comprising repeating steps (b) through (d) for each pair of parameters p and q from the plurality of parameters.
[0380] 5. The method according to clause 2, further comprising repeating steps (I) through (IV) for each set of three items i, j and k from the plurality of items.
[0381] 6. The method according to any of clauses 2, 4 and 5, further comprising repeating steps (II) through (IV) for each set of three parameters p, q and r from the plurality of parameters.
[0382] 7. The method according to any of clauses 1, 3 and 4, wherein the pair of weights w.sub.p and w.sub.q in step (c) is calculated such that:
$$w_p a_{mp} + w_q b_{mq} = w_p a_{np} + w_q b_{nq},$$
and
$$w_p + w_q = 1.$$
[0383] 8. The method according to any of clauses 2, 5 and 6, wherein the set of three weights w.sub.p, w.sub.q and w.sub.r in step (III) is calculated such that:
$$w_p a_{ip} + w_q b_{iq} + w_r c_{ir} = w_p a_{jp} + w_q b_{jq} + w_r c_{jr},$$
$$w_p a_{ip} + w_q b_{iq} + w_r c_{ir} = w_p a_{kp} + w_q b_{kq} + w_r c_{kr},$$
and
$$w_p + w_q + w_r = 1.$$
[0384] 9. The method according to any of clauses 1, 3, 4 and 7, wherein generating the plurality of points in step (f) further comprises:
[0385] calculating, at the operating system (142), a score value of the stored pair of weight values w.sub.p and w.sub.q and the pair of items m and n, wherein said score value is calculated as a linear combination of w.sub.p and w.sub.q and the plurality of parameters of one of the pair of items m and n; and
[0386] determining whether to discard the stored pair of weight values w.sub.p and w.sub.q and the pair of items m and n based on the calculated score value;
[0387] or
[0388] the method according to any of clauses 2, 5, 6 and 8, wherein generating the plurality of points in step (f) further comprises:
[0389] calculating, at the operating system (142), a score value of the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k, wherein said score value is calculated as a linear combination of w.sub.p, w.sub.q and w.sub.r and the plurality of parameters of one of the three items i, j and k; and
[0390] determining whether to discard the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k based on the calculated score value.
[0391] 10. The method according to clause 9, wherein:
[0392] determining whether to discard the stored pair of weight values w.sub.p and w.sub.q and the pair of items m and n further comprises:
[0393] calculating, at the operating system (142), another score value for each item of the plurality of items as a linear combination of the stored pair of weight values w.sub.p and w.sub.q and the plurality of parameters of one of the pair of items m and n;
[0394] obtaining, at the operating system (142), a comparison value based on the score value and the other score value; and
[0395] determining whether to discard the stored pair of weight values w.sub.p and w.sub.q and the pair of items m and n based on the comparison value;
[0396] or
[0397] wherein determining whether to discard the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k further comprises:
[0398] calculating, at the operating system (142), another score value for each item of the plurality of items as a linear combination of the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the plurality of parameters of one of the three items i, j and k;
[0399] obtaining, at the operating system (142), a comparison value based on the score value and the other score value; and
[0400] determining whether to discard the stored set of three weight values w.sub.p, w.sub.q and w.sub.r and the set of three items i, j and k based on the comparison value.
[0401] 11. The method according to clause 1, wherein each parameter of the plurality of parameters further comprises a direction value, and wherein the method further comprises performing, at the operating system (142), a transformation of at least one parameter of at least one item of the plurality of items based on the direction value of the at least one parameter.
[0402] 12. The method according to clause 11, wherein performing the transformation comprises performing a sign change operation on at least one parameter of at least one item.
[0403] 13. The method according to clause 11, wherein the direction value of a parameter indicates one of two directions of improvement for said parameter, and wherein performing the transformation of at least one parameter based on the direction value comprises performing the transformation if the direction value indicates a determined direction of improvement.
[0404] 14. The method according to any of the previous clauses, wherein the plot is a ternary plot.
[0405] 15. The method according to any of the previous clauses, wherein the industrial process comprises producing a product.
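The weight calculations of clauses 7 and 8, the positivity tests of steps (d) and (IV), and the discard test of clauses 9 and 10 can be illustrated with a minimal, hypothetical Python sketch; it is not the claimed implementation. It assumes exactly three parameters per item (p, q and r as indices 0, 1 and 2), uses NumPy to solve the 3x3 system of clause 8, and interprets the comparison value of clause 10 as "no item scores strictly higher than the tied items at these weights", which the clauses do not spell out. Treating the triangle corners as the points contributed by the vertex items of step (e) is likewise an assumption. The sign change of clause 12 is included as `orient`, for parameters whose direction of improvement is "lower is better". All function names are illustrative.

```python
from itertools import combinations
from typing import List, Optional, Sequence

import numpy as np

Item = Sequence[float]  # hypothetical item: values of parameters (p, q, r)


def orient(item: Item, lower_is_better: Sequence[bool]) -> List[float]:
    """Sign change of clause 12: negate parameters whose improvement direction is 'lower'."""
    return [-v if low else v for v, low in zip(item, lower_is_better)]


def pair_point(m: Item, n: Item, p: int, q: int) -> Optional[np.ndarray]:
    """Weights giving m and n equal scores with w_p + w_q = 1 (clause 7); other weights are 0."""
    d_p = m[p] - n[p]
    d_q = m[q] - n[q]
    if d_p == d_q:  # the two conditions have no unique solution
        return None
    w = np.zeros(len(m))
    w[p] = d_q / (d_q - d_p)
    w[q] = 1.0 - w[p]
    return w


def triple_point(i: Item, j: Item, k: Item) -> Optional[np.ndarray]:
    """Weights giving i, j and k equal scores with w_p + w_q + w_r = 1 (clause 8)."""
    a = np.array([
        np.subtract(i, j),  # score(i) - score(j) = 0
        np.subtract(i, k),  # score(i) - score(k) = 0
        np.ones(3),         # w_p + w_q + w_r = 1
    ], dtype=float)
    try:
        return np.linalg.solve(a, np.array([0.0, 0.0, 1.0]))
    except np.linalg.LinAlgError:  # singular system: no unique weight triple
        return None


def is_kept(w: np.ndarray, tied: Item, items: Sequence[Item], tol: float = 1e-9) -> bool:
    """Assumed discard test: keep the point only if no item outscores the tied items."""
    score = float(np.dot(w, tied))
    return all(float(np.dot(w, other)) <= score + tol for other in items)


def candidate_points(items: Sequence[Item]) -> List[np.ndarray]:
    """Collect ternary-plot points from the corners, item pairs and item triples."""
    points = list(np.eye(3))  # corners, where the vertex items of step (e) score best
    for m, n in combinations(items, 2):
        for p, q in combinations(range(3), 2):
            w = pair_point(m, n, p, q)
            if w is not None and w[p] > 0 and w[q] > 0 and is_kept(w, m, items):
                points.append(w)
    for i, j, k in combinations(items, 3):
        w = triple_point(i, j, k)
        if w is not None and np.all(w > 0) and is_kept(w, i, items):
            points.append(w)
    return points
```

As a usage check, three items that each excel in a single parameter, (1, 0, 0), (0, 1, 0) and (0, 0, 1), yield seven points: the three corners, the three edge midpoints where two items tie, and the centroid (1/3, 1/3, 1/3) where all three tie.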