OUTPUT DEVICE, DATA STRUCTURE, OUTPUT METHOD, AND OUTPUT PROGRAM

Abstract

An output device 10 is provided with an output unit 11 for outputting, on the basis of job feature information indicating the features of the job of a distributed processing system, estimation model application information that is information in a format suitable for an estimation model that estimates the amount of computer resources required for processing a task constituting the job. The estimation model application information may include word-containing information having binary information that indicates whether or not a character string indicated by the character string information included in the job feature information includes a prescribed word. The estimation model application information may include numerical inversion label information having, as string label information, a value derived by converting, by a prescribed function, the numeric value indicated by the numerical information included in the job feature information.

Claims

1. An output device comprising an output unit which outputs estimation model application information on the basis of job feature information indicating the features of the job of a distributed processing system, estimation model application information that is information in a format suitable for an estimation model that estimates the amount of computer resources required for processing a task constituting the job.

2. The output device according to claim 1, wherein the estimation model application information includes word-containing information having binary information that indicates whether or not a character string indicated by the character string information included in the job feature information includes a prescribed word.

3. The output device according to claim 1, wherein the estimation model application information includes numerical inversion label information having, as string label information, a value derived by converting, by a prescribed function, the numeric value indicated by the numerical information included in the job feature information.

4. The output device according to claim 1, comprising a form conversion unit which outputs the estimation model application information output from the estimation model, in a same format as the job feature information corresponding to the estimation model application information.

5. The output device according to claim 1, comprising a computer resources estimation unit which estimates the amount of computer resources required for processing the task included in the job corresponding to the job feature information in the distributed processing system, by feeding the estimation model application information output from the output unit on the basis of the job feature information into the estimation model.

6.-8. (canceled)

9. An output method comprising outputting estimation model application information on the basis of job feature information indicating the features of the job of a distributed processing system, estimation model application information that is information in a format suitable for an estimation model that estimates the amount of computer resources required for processing a task constituting the job.

10. A non-transitory computer-readable recording medium having recorded therein an output program for causing a computer to execute an output process of outputting estimation model application information on the basis of job feature information indicating the features of the job of a distributed processing system, estimation model application information that is information in a format suitable for an estimation model that estimates the amount of computer resources required for processing a task constituting the job.

11. The output device according to claim 2, wherein the estimation model application information includes numerical inversion label information having, as string label information, a value derived by converting, by a prescribed function, the numeric value indicated by the numerical information included in the job feature information.

12. The output device according to claim 2, comprising a form conversion unit which outputs the estimation model application information output from the estimation model, in a same format as the job feature information corresponding to the estimation model application information.

13. The output device according to claim 3, comprising a form conversion unit which outputs the estimation model application information output from the estimation model, in a same format as the job feature information corresponding to the estimation model application information.

14. The output device according to claim 11, comprising a form conversion unit which outputs the estimation model application information output from the estimation model, in a same format as the job feature information corresponding to the estimation model application information.

15. The output device according to claim 2, comprising a computer resources estimation unit which estimates the amount of computer resources required for processing the task included in the job corresponding to the job feature information in the distributed processing system, by feeding the estimation model application information output from the output unit on the basis of the job feature information into the estimation model.

16. The output device according to claim 3, comprising a computer resources estimation unit which estimates the amount of computer resources required for processing the task included in the job corresponding to the job feature information in the distributed processing system, by feeding the estimation model application information output from the output unit on the basis of the job feature information into the estimation model.

17. The output device according to claim 4, comprising a computer resources estimation unit which estimates the amount of computer resources required for processing the task included in the job corresponding to the job feature information in the distributed processing system, by feeding the estimation model application information output from the output unit on the basis of the job feature information into the estimation model.

18. The output device according to claim 11, comprising a computer resources estimation unit which estimates the amount of computer resources required for processing the task included in the job corresponding to the job feature information in the distributed processing system, by feeding the estimation model application information output from the output unit on the basis of the job feature information into the estimation model.

19. The output device according to claim 12, comprising a computer resources estimation unit which estimates the amount of computer resources required for processing the task included in the job corresponding to the job feature information in the distributed processing system, by feeding the estimation model application information output from the output unit on the basis of the job feature information into the estimation model.

20. The output device according to claim 13, comprising a computer resources estimation unit which estimates the amount of computer resources required for processing the task included in the job corresponding to the job feature information in the distributed processing system, by feeding the estimation model application information output from the output unit on the basis of the job feature information into the estimation model.

21. The output device according to claim 14, comprising a computer resources estimation unit which estimates the amount of computer resources required for processing the task included in the job corresponding to the job feature information in the distributed processing system, by feeding the estimation model application information output from the output unit on the basis of the job feature information into the estimation model.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0035] FIG. 1 is a block diagram depicting an example of the structure of Exemplary Embodiment 1 of a computer resources usage estimation device according to the present invention.

[0036] FIG. 2 is an explanatory diagram depicting an example of estimation model application information output from an input data conversion unit 101.

[0037] FIG. 3 is an explanatory diagram depicting another example of estimation model application information output from the input data conversion unit 101.

[0038] FIG. 4 is a flowchart depicting an operation of a word-containing information generation process by the input data conversion unit 101 in Exemplary Embodiment 1.

[0039] FIG. 5 is an explanatory diagram depicting an example of job feature information input to the input data conversion unit 101.

[0040] FIG. 6 is an explanatory diagram depicting an example of word-containing information output from the input data conversion unit 101.

[0041] FIG. 7 is a flowchart depicting another operation of a word-containing information generation process by the input data conversion unit 101 in Exemplary Embodiment 1.

[0042] FIG. 8 is an explanatory diagram depicting another example of job feature information input to the input data conversion unit 101.

[0043] FIG. 9 is an explanatory diagram depicting another example of word-containing information output from the input data conversion unit 101.

[0044] FIG. 10 is a flowchart depicting an operation of a numerical inversion label information generation process by the input data conversion unit 101 in Exemplary Embodiment 1.

[0045] FIG. 11 is an explanatory diagram depicting another example of job feature information input to the input data conversion unit 101.

[0046] FIG. 12 is an explanatory diagram depicting an example of numerical inversion label information output from the input data conversion unit 101.

[0047] FIG. 13 is a block diagram depicting an example of the structure of Exemplary Embodiment 2 of a computer resources usage estimation device according to the present invention.

[0048] FIG. 14 is a flowchart depicting an operation of a numerical inversion label information generation process by the input data conversion unit 101 in Exemplary Embodiment 2.

[0049] FIG. 15 is an explanatory diagram depicting another example of job feature information input to the input data conversion unit 101.

[0050] FIG. 16 is an explanatory diagram depicting another example of numerical inversion label information output from the input data conversion unit 101.

[0051] FIG. 17 is a flowchart depicting an operation of an estimated memory usage reverse conversion process by an estimate reverse conversion unit 104 in Exemplary Embodiment 2.

[0052] FIG. 18 is an explanatory diagram depicting an example of numerical inversion label information output from an estimation model.

[0053] FIG. 19 is an explanatory diagram depicting an example of estimated memory usage information output from the estimate reverse conversion unit 104.

[0054] FIG. 20 is a block diagram schematically depicting an output device according to the present invention.

[0055] FIG. 21 is a block diagram schematically depicting a data structure according to the present invention.

DESCRIPTION OF EMBODIMENT

Exemplary Embodiment 1

[0056] [Structure]

[0057] The following describes an exemplary embodiment of the present invention with reference to drawings. FIG. 1 is a block diagram depicting an example of the structure of Exemplary Embodiment 1 of a computer resources usage estimation device according to the present invention. A computer resources usage estimation device 100 depicted in FIG. 1 includes an input data conversion unit 101, a computer resources usage estimation model generation unit 102, and a computer resources usage estimation unit 103.

[0058] The computer resources usage estimation device 100 depicted in FIG. 1 is intended for a distributed processing system. The computer resources usage estimation device 100 estimates the amount of computer resources required for processing each task in the distributed processing system, using input data in a data format including word-containing information or string label information.

[0059] The input data conversion unit 101 has a function of converting job feature information included in the input data used for the generation of an estimation model into estimation model application information which is information in a format suitable for the estimation model to be generated, and outputting data including the estimation model application information.

[0060] As depicted in FIG. 1, computer resources usage and processing time are input to the input data conversion unit 101. Meta-information of input data and processing program configuration information are also input to the input data conversion unit 101.

[0061] FIGS. 2 and 3 each depict an example of the estimation model application information output from the input data conversion unit 101. FIG. 2 is an explanatory diagram depicting an example of estimation model application information output from the input data conversion unit 101.

[0062] FIG. 2 depicts word-containing information included in the estimation model application information. The word-containing information depicted in FIG. 2 is made up of a task identifier and a word candidate.

[0063] The task identifier corresponds to an identification symbol of job feature information. The word candidate indicates whether or not a predetermined word is included. In FIG. 2, the word-containing information is represented by binary information for each pair of an identification symbol of job feature information and a word candidate.

[0064] For example, consider the case of indicating that a character string which is job feature information A corresponding to task identifier Task1 includes word α1. To indicate that word α1 is included, binary information True (true) is set in the word candidate “job feature information A includes word α1?” in the word-containing information of Task1. The word-containing information of Task1 indicates that job feature information A includes word α1.

[0065] Consider the case of indicating that a character string which is job feature information B corresponding to task identifier Task2 does not include word βn. To indicate that word βn is not included, binary information False (false) is set in the word candidate “job feature information B includes word βn?” in the word-containing information of Task2. The word-containing information of Task2 indicates that job feature information B does not include word βn.
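The layout of FIG. 2 can be pictured as a table keyed by task identifier, with one binary value per word candidate. The following is a minimal sketch only; the dictionary layout, the helper `includes_word`, and the ASCII stand-ins `alpha1` and `beta_n` for words α1 and βn are assumptions made for illustration and do not appear in the embodiment.

```python
# Hypothetical in-memory form of the word-containing information of FIG. 2.
# Outer key: task identifier. Inner key: word candidate. Value: binary information.
word_containing_info = {
    "Task1": {"job feature information A includes word alpha1?": True},
    "Task2": {"job feature information B includes word beta_n?": False},
}


def includes_word(info, task_id, candidate):
    """Return the binary value stored for one (task identifier, word candidate) pair."""
    return info[task_id][candidate]


# Task1: job feature information A includes word alpha1.
task1_flag = includes_word(
    word_containing_info, "Task1", "job feature information A includes word alpha1?"
)
# Task2: job feature information B does not include word beta_n.
task2_flag = includes_word(
    word_containing_info, "Task2", "job feature information B includes word beta_n?"
)
```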

[0066] FIG. 3 is an explanatory diagram depicting another example of estimation model application information output from the input data conversion unit 101. FIG. 3 depicts numerical inversion label information included in the estimation model application information. The numerical inversion label information depicted in FIG. 3 is made up of a task identifier and label information.

[0067] The task identifier corresponds to an identification symbol of numerical information. The numerical information corresponds to numerical job feature information. In FIG. 3, the numerical inversion label information is represented by character string information for each pair of an identification symbol of numerical information and label information.

[0068] For example, consider the case of indicating that the label information of numerical information A corresponding to task identifier Task1 is 8. To indicate that the label information of numerical information A is 8, character string information “8” is set in the label information “label information of numerical information A” in the numerical inversion label information of Task1. The numerical inversion label information of Task1 indicates that the label information of numerical information A is 8.

[0069] Consider the case of indicating that the label information of numerical information B corresponding to task identifier Task2 is 0. To indicate that the label information of numerical information B is 0, character string information “0” is set in the label information “label information of numerical information B” in the numerical inversion label information of Task2. The numerical inversion label information of Task2 indicates that the label information of numerical information B is 0.
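As a rough illustration of how a numeric value could become string label information, the sketch below applies a conversion function and stores the result as a character string. The conversion actually used is not specified in this passage; the floor of the base-2 logarithm (`log2_bucket`) and the raw values 256 and 1 are assumptions chosen only so that the resulting labels match the “8” and “0” of the examples above.

```python
import math


def to_label(value, convert):
    """Apply a conversion function and keep the result as a character string."""
    return str(convert(value))


def log2_bucket(value):
    # Assumed conversion for illustration only: floor of the base-2 logarithm.
    # The text only says "a prescribed function"; this choice is hypothetical.
    return math.floor(math.log2(value))


# Hypothetical raw numerical information per task identifier.
numerical_info = {"Task1": 256, "Task2": 1}

# Numerical inversion label information: task identifier -> string label.
labels = {task: to_label(v, log2_bucket) for task, v in numerical_info.items()}
```

Storing the label as a string rather than a number lets an estimation model treat it as a category instead of a magnitude, which appears to be the point of the conversion.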

[0070] The computer resources usage estimation model generation unit 102 has a function of receiving the data output from the input data conversion unit 101 and generating the estimation model. As depicted in FIG. 1, the computer resources usage estimation model generation unit 102 outputs the generated estimation model to the computer resources usage estimation unit 103.

[0071] The computer resources usage estimation unit 103 has a function of estimating the computer resources usage of a task whose feature has not been recognized yet, using the received estimation model. The computer resources usage estimation unit 103 may output an estimate of an index relating to process execution, such as processing time, other than the computer resources usage.

[0072] Although the computer resources usage estimation device 100 in this exemplary embodiment estimates computer resources usage, the computer resources usage estimation device 100 may estimate a value other than computer resources usage. For example, the computer resources usage estimation device 100 may estimate task processing time in the distributed processing system. Any value estimated by the computer resources usage estimation device 100 in this exemplary embodiment is expected to have improved estimation accuracy.

[0073] The computer resources usage estimation device 100 in this exemplary embodiment is, for example, realized by a central processing unit (CPU) that executes processes according to a program stored in a storage medium. In other words, the input data conversion unit 101, the computer resources usage estimation model generation unit 102, and the computer resources usage estimation unit 103 are, for example, realized by a CPU that executes processes according to program control.

[0074] Each unit in the computer resources usage estimation device 100 may be realized by a hardware circuit.

[0075] [Operation]

[0076] The following describes the operation of the input data conversion unit 101 in this exemplary embodiment, with reference to FIGS. 4, 7, and 10.

[0077] First, with reference to FIG. 4, the following describes the operation in which the input data conversion unit 101 in this exemplary embodiment generates, on the basis of the job feature information, word-containing information for a job name, which is one type of job feature information. The word-containing information indicates whether or not each word of the word group constituting the job name is included. FIG. 4 is a flowchart depicting an operation of the word-containing information generation process by the input data conversion unit 101 in Exemplary Embodiment 1.

[0078] FIG. 5 is an explanatory diagram depicting an example of job feature information input to the input data conversion unit 101. FIG. 5 depicts part of task-related information which is observed in the processing in the distributed processing system. The job feature information depicted in FIG. 5 is made up of a task number and a job name.

[0079] FIG. 6 is an explanatory diagram depicting an example of word-containing information output from the input data conversion unit 101. FIG. 6 depicts the word-containing information generated by the input data conversion unit 101 on the basis of the job name included in the job feature information depicted in FIG. 5. The following describes the operation of the input data conversion unit 101 generating the word-containing information depicted in FIG. 6 on the basis of the job feature information depicted in FIG. 5, with reference to FIG. 4.

[0080] When the job feature information depicted in FIG. 5 is input, the input data conversion unit 101 forms the word-containing information to be output from the task number and the group of candidate words constituting the job name included in the job feature information (step S101).

[0081] When forming the word-containing information, the input data conversion unit 101 generates each word candidate name by, for example, prefixing the identifier of the generation source information. The input data conversion unit 101 may generate each word candidate name by any other method, as long as the generated name is uniquely identifiable.

[0082] The job name with the task number “1” in the job feature information depicted in FIG. 5 is “Cluster Iterator running iteration 3 over priorPath: kmeans/46/clusters-2”. The job name with the task number “2” is “Cluster Iterator running iteration 5 over priorPath: kmeans/106/clusters-4”. Based on the two input job names, the input data conversion unit 101 forms the word-containing information from the group of candidate words constituting each job name.

[0083] In detail, the input data conversion unit 101 prefixes “Jobname” to each of the words “Cluster”, “Iterator”, “running”, “iteration”, “3”, “over”, “priorPath”, “kmeans”, “46”, and “clusters-2” present in the job name with the task number “1”, to generate the word candidate name.

[0084] The input data conversion unit 101 also prefixes “Jobname” to each of the words “5”, “106”, and “clusters-4” not present in the job name with the task number “1” and only present in the job name with the task number “2”, to generate the word candidate name. The input data conversion unit 101 forms the word-containing information by the word candidate group indicating the generated names.

[0085] The input data conversion unit 101 generates the same number of pieces of word-containing information as the number of input pieces of job feature information. The input data conversion unit 101 sets the task number of the input job feature information as the task number of the generated word-containing information corresponding to the job feature information.

[0086] The input data conversion unit 101 then sets all word candidates in each generated piece of word-containing information to False, as an initialization process (step S102).

[0087] Following this, the input data conversion unit 101 divides the job name of the input job feature information into words (step S104). For example, the job name with the task number “1” is divided into the words “Cluster”, “Iterator”, “running”, “iteration”, “3”, “over”, “priorPath”, “kmeans”, “46”, and “clusters-2”.

[0088] The delimiter or delimiter character when the input data conversion unit 101 divides the job name into words is, for example, set by the user, the system, or the like. Alternatively, the input data conversion unit 101 may hold the delimiter or delimiter character beforehand.

[0089] The input data conversion unit 101 then sets the word candidates in the word-containing information corresponding to the divided words, to True (step S106). The binary information “True” indicates that the set word candidate is included in the job name. The input data conversion unit 101 sets True for the number of divided words (step S107).

[0090] For example, in the case of the job feature information of the task number “1”, True is set in each of the word candidates “Jobname-Cluster”, “Jobname-Iterator”, “Jobname-running”, “Jobname-iteration”, “Jobname-3”, “Jobname-over”, “Jobname-priorPath”, “Jobname-kmeans”, “Jobname-46”, and “Jobname-clusters-2” for which the corresponding words are present. Meanwhile, each of the word candidates “Jobname-5”, “Jobname-106”, and “Jobname-clusters-4”, for which the corresponding words are not present, remains set to False.

[0091] The input data conversion unit 101 may set information other than True in the corresponding word candidate, as long as it is clear that the word candidate is included in the job name. For example, the input data conversion unit 101 may set the numerical value 1 in the corresponding word candidate, instead of True. In the case of setting the numerical value 1, the input data conversion unit 101 sets the numerical value 0 in each word candidate instead of False in the initialization process of step S102.

[0092] As a result of the input data conversion unit 101 setting True for the number of divided words (the determination condition in step S107 is met), the word-containing information corresponding to the input job feature information is generated. The input data conversion unit 101 repeatedly performs the process of steps S103 to S108 for the number of input pieces of job feature information.

[0093] After generating the word-containing information for the number of input pieces of job feature information (the determination condition in step S108 is met), the input data conversion unit 101 ends the generation process.
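The generation process of FIG. 4 can be sketched as follows, using the two job names of FIG. 5. This is a minimal illustration, not the embodiment's implementation: the function name `build_word_containing_info`, the dictionary layout, and the delimiter set (space, colon, and slash) are assumptions. The delimiter set was chosen because it reproduces the word division given in the text, including keeping “clusters-2” as a single word.

```python
import re


def build_word_containing_info(job_names, prefix="Jobname", delimiters=r"[ :/]+"):
    """Build one row of word-containing information per task number.

    job_names: dict mapping task number -> job name (hypothetical layout).
    """
    # Divide each job name into words (step S104). The division is done once,
    # up front, because the candidate list of step S101 needs the same words.
    words_by_task = {
        task: [w for w in re.split(delimiters, name) if w]
        for task, name in job_names.items()
    }

    # Step S101: form the word candidate group, prefixing the source identifier.
    candidates = []
    for words in words_by_task.values():
        for word in words:
            candidate = f"{prefix}-{word}"
            if candidate not in candidates:
                candidates.append(candidate)

    info = {}
    for task, words in words_by_task.items():
        # Step S102: initialize every word candidate to False.
        row = {candidate: False for candidate in candidates}
        # Step S106: set True for each word present in this job name.
        for word in words:
            row[f"{prefix}-{word}"] = True
        info[task] = row
    return info


job_names = {
    1: "Cluster Iterator running iteration 3 over priorPath: kmeans/46/clusters-2",
    2: "Cluster Iterator running iteration 5 over priorPath: kmeans/106/clusters-4",
}
info = build_word_containing_info(job_names)
```

With these inputs, each row holds 13 word candidates: the ten words of the first job name plus “5”, “106”, and “clusters-4” from the second.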

[0094] The following describes the effect that the information obtained by the conversion depicted in FIG. 6 has on the algorithm for estimating the amount of computer resources. By referencing the word-containing information depicted in FIG. 6, the computer resources usage estimation unit 103 can recognize the combination of words constituting the job name.

[0095] By referencing the word-containing information corresponding to the set of tasks that differ in the relationship between task feature information and the amount of computer resources, the computer resources usage estimation unit 103 can classify the tasks included in the task set depending on whether or not a predetermined word set is included.

[0096] For example, the task corresponding to each piece of task feature information depicted in FIG. 5 executes K-Means which is one of the machine learning algorithms. Even if the computer resources usage estimation unit 103 has not recognized beforehand that the task executes K-Means, the computer resources usage estimation unit 103 can recognize the tendency of the implementation of K-Means by extracting the task group corresponding to the word-containing information whose word candidate “Jobname-kmeans” is True in FIG. 6. By estimating the amount of computer resources required for task processing based on the recognition of the tendency of the implementation for each algorithm, the computer resources usage estimation unit 103 can enhance estimation accuracy.
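Extracting the task group whose word candidate “Jobname-kmeans” is True amounts to a simple filter over the word-containing information. The sketch below assumes a dictionary layout (task number to word candidate flags) and a helper name `tasks_with_word`, neither of which comes from the embodiment. Task 3 and the candidate “Jobname-wordcount” are hypothetical, added only to show a task that is filtered out.

```python
def tasks_with_word(info, candidate):
    """Return the task numbers whose flag for the given word candidate is set."""
    return [task for task, row in info.items() if row.get(candidate, False)]


# Hand-written excerpt in the style of FIG. 6.
info = {
    1: {"Jobname-kmeans": True, "Jobname-wordcount": False},
    2: {"Jobname-kmeans": True, "Jobname-wordcount": False},
    3: {"Jobname-kmeans": False, "Jobname-wordcount": True},  # hypothetical task
}

kmeans_tasks = tasks_with_word(info, "Jobname-kmeans")
```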

[0097] Next, with reference to FIG. 7, the following describes the operation in which the input data conversion unit 101 in this exemplary embodiment generates, on the basis of the job feature information, word-containing information for a program class name, which is one type of job feature information. The word-containing information indicates whether or not each word of the word group constituting the class name is included. FIG. 7 is a flowchart depicting another operation of the word-containing information generation process by the input data conversion unit 101 in Exemplary Embodiment 1.

[0098] FIG. 8 is an explanatory diagram depicting another example of job feature information input to the input data conversion unit 101. FIG. 8 depicts part of task-related information which is observed in the processing in the distributed processing system. The job feature information depicted in FIG. 8 is made up of a task number and a program class name.

[0099] FIG. 9 is an explanatory diagram depicting another example of word-containing information output from the input data conversion unit 101. FIG. 9 depicts the word-containing information generated by the input data conversion unit 101 on the basis of the program class name included in the job feature information depicted in FIG. 8. The following describes the operation of the input data conversion unit 101 generating the word-containing information depicted in FIG. 9 on the basis of the job feature information depicted in FIG. 8, with reference to FIG. 7.

[0100] When the job feature information depicted in FIG. 8 is input, the input data conversion unit 101 forms the word-containing information to be output from the task number and the group of candidate words constituting the program class name included in the job feature information (step S111).

[0101] When forming the word-containing information, the input data conversion unit 101 generates each word candidate name by, for example, prefixing the identifier of the generation source information. The input data conversion unit 101 may generate each word candidate name by any other method, as long as the generated name is uniquely identifiable.

[0102] The class name with the task number “1” in the job feature information depicted in FIG. 8 is “org.apache.mahout.clustering.iterator.CIMapper”. The class name with the task number “2” is “org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper”. Based on the two input class names, the input data conversion unit 101 forms the word-containing information from the group of candidate words constituting each class name.

[0103] In detail, the input data conversion unit 101 prefixes “Class” to each of the words “org”, “apache”, “mahout”, “clustering”, “iterator”, and “CIMapper” present in the class name with the task number “1”, to generate the word candidate name.

[0104] The input data conversion unit 101 also prefixes “Class” to each of the words “cf”, “taste”, “hadoop”, “item”, and “ItemIDIndexMapper” not present in the class name with the task number “1” and only present in the class name with the task number “2”, to generate the word candidate name. The input data conversion unit 101 forms the word-containing information by the word candidate group indicating the generated names.

[0105] The input data conversion unit 101 generates the same number of pieces of word-containing information as the number of input pieces of job feature information. The input data conversion unit 101 sets the task number of the input job feature information as the task number of the generated word-containing information corresponding to the job feature information.

[0106] The input data conversion unit 101 then sets all word candidates in each generated piece of word-containing information to False, as an initialization process (step S112).

[0107] Following this, the input data conversion unit 101 divides the program class name of the input job feature information into words (step S114). For example, the class name with the task number “1” is divided into the words “org”, “apache”, “mahout”, “clustering”, “iterator”, and “CIMapper”.

[0108] The delimiter or delimiter character when the input data conversion unit 101 divides the class name into words is, for example, set by the user, the system, or the like. Alternatively, the input data conversion unit 101 may hold the delimiter or delimiter character beforehand.

[0109] The input data conversion unit 101 then sets the word candidates in the word-containing information corresponding to the divided words, to True (step S116). The binary information “True” indicates that the set word candidate is included in the class name. The input data conversion unit 101 sets True for the number of divided words (step S117).

[0110] For example, in the case of the job feature information of the task number “1”, True is set in each of the word candidates “Class-org”, “Class-apache”, “Class-mahout”, “Class-clustering”, “Class-iterator”, and “Class-CIMapper” for which the corresponding words are present. Meanwhile, each of the word candidates “Class-cf”, “Class-taste”, “Class-hadoop”, “Class-item”, and “Class-ItemIDIndexMapper”, for which the corresponding words are not present, remains set to False.

[0111] The input data conversion unit 101 may set information other than True in the corresponding word candidate, as long as it is clear that the word candidate is included in the program class name. For example, the input data conversion unit 101 may set the numerical value 1 in the corresponding word candidate, instead of True. In the case of setting the numerical value 1, the input data conversion unit 101 sets the numerical value 0 in each word candidate instead of False in the initialization process of step S112.

[0112] Once the input data conversion unit 101 has set True for every divided word (the determination condition in step S117 is met), the word-containing information corresponding to the input job feature information is generated. The input data conversion unit 101 repeatedly performs the process of steps S113 to S118 for each input piece of job feature information.

[0113] After generating the word-containing information for the number of input pieces of job feature information (the determination condition in step S118 is met), the input data conversion unit 101 ends the generation process.
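The generation process of steps S112 to S118 can be sketched as follows. This is a minimal illustration rather than the implementation of the input data conversion unit 101: the function name `build_word_containing_info`, the "." delimiter, the way word candidates are collected (every distinct word seen in the input), and the second class name are assumptions made for the example.

```python
import re

def build_word_containing_info(class_names, delimiter="."):
    """Convert {task number: class name} into per-task binary word flags."""
    # Divide each class name into words (step S114).
    words_by_task = {
        task: [w for w in re.split(re.escape(delimiter), name) if w]
        for task, name in class_names.items()
    }
    # Assumption: the word candidates are all distinct words seen in the input.
    candidates = sorted({"Class-" + w for ws in words_by_task.values() for w in ws})

    info = {}
    for task, words in words_by_task.items():
        flags = dict.fromkeys(candidates, False)  # initialization (step S112)
        for w in words:                           # once per divided word (S116, S117)
            flags["Class-" + w] = True
        info[task] = flags
    return info

class_names = {
    1: "org.apache.mahout.clustering.iterator.CIMapper",
    # Assumed second class name, chosen to match the word candidates named in the text.
    2: "org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper",
}
word_info = build_word_containing_info(class_names)
print(word_info[1]["Class-CIMapper"], word_info[1]["Class-cf"])  # True False
```

Replacing `False` and `True` with 0 and 1 in the sketch gives the numerical variant mentioned above.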

[0114] The following describes the effect of the information obtained as a result of the conversion as depicted in FIG. 9, on the algorithm for estimating the amount of computer resources. By referencing the word-containing information depicted in FIG. 9, the computer resources usage estimation unit 103 can recognize the combination of words constituting the program class name.

[0115] By referencing the word-containing information corresponding to the set of tasks that differ in the relationship between task feature information and the amount of computer resources, the computer resources usage estimation unit 103 can classify the tasks included in the task set depending on whether or not a predetermined word set is included.

[0116] For example, the task corresponding to each piece of task feature information depicted in FIG. 8 executes a program implemented by Apache Mahout® which is a framework for executing a machine learning algorithm in Apache Hadoop®. Hence, True is set in the word candidate “Class-mahout” in the word-containing information corresponding to the task that executes the program implemented by Apache Mahout.

[0117] Even if the computer resources usage estimation unit 103 has not recognized beforehand that the task executes the program implemented by Apache Mahout, the computer resources usage estimation unit 103 can recognize the tendency of the implementation of Apache Mahout by extracting the task group corresponding to the word-containing information whose word candidate “Class-mahout” is True in FIG. 9. By estimating the amount of computer resources required for task processing based on the recognition of the tendency of the implementation for each algorithm, the computer resources usage estimation unit 103 can enhance estimation accuracy.
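Extracting the task group whose word candidate is True amounts to a simple filter over the word-containing information. The sketch below assumes the information is held as a dictionary of flags per task number; the helper name `tasks_with_word` and the three sample rows are hypothetical and are not taken from FIG. 9.

```python
def tasks_with_word(word_info, candidate):
    """Return the task numbers whose flag for the given word candidate is True."""
    return sorted(task for task, flags in word_info.items()
                  if flags.get(candidate, False))

# Hypothetical word-containing information for three tasks.
word_info = {
    1: {"Class-mahout": True, "Class-clustering": True},
    2: {"Class-mahout": True, "Class-clustering": False},
    3: {"Class-mahout": False, "Class-clustering": False},
}
mahout_tasks = tasks_with_word(word_info, "Class-mahout")
print(mahout_tasks)  # [1, 2]
```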

[0118] The operation of the input data conversion unit 101 in this exemplary embodiment generating numerical inversion label information on the basis of one type of job feature information that includes an observation value during program execution and an option numerical value designated during program execution is described next, with reference to FIG. 10.

[0119] FIG. 10 is a flowchart depicting an operation of the numerical inversion label information generation process by the input data conversion unit 101 in Exemplary Embodiment 1. The following describes an example where the observation value during program execution is a file read byte count and the option numerical value designated during program execution is a predetermined command line argument value.

[0120] FIG. 11 is an explanatory diagram depicting another example of job feature information input to the input data conversion unit 101. FIG. 11 depicts part of task-related information which is observed in the processing in the distributed processing system. The job feature information depicted in FIG. 11 is made up of a task number, a file read byte count, and option1 which is a command line argument. Here, option1 is one of the parameters given to the algorithm executed by the task indicated by the task number.

[0121] FIG. 12 is an explanatory diagram depicting an example of numerical inversion label information output from the input data conversion unit 101. FIG. 12 depicts the numerical inversion label information generated by the input data conversion unit 101 on the basis of the values of the file read byte count and option1 included in the job feature information depicted in FIG. 11. The following describes the operation of the input data conversion unit 101 generating the numerical inversion label information depicted in FIG. 12 on the basis of the job feature information depicted in FIG. 11, with reference to FIG. 10.

[0122] When the job feature information depicted in FIG. 11 is input, the input data conversion unit 101 forms the output numerical inversion label information by the task number and a label information group (step S121). The respective values obtained by converting the values of the file read byte count and option1 included in the job feature information are set in the label information group. Each value set in label information is handled as an identifier represented by a character string.

[0123] When forming the numerical inversion label information, the input data conversion unit 101 generates each label information name by, for example, prefixing the identifier of the generation source information. The input data conversion unit 101 may generate each label information name by any other method, as long as the generated name is uniquely identifiable.

[0124] The input data conversion unit 101 may set the job feature information whose value has been replaced, as the numerical inversion label information. The numerical inversion label information depicted in FIG. 12 is generated by replacing the value in the job feature information depicted in FIG. 11. In detail, the numerical inversion label information is generated by replacing the values of the file read byte count and option1.

[0125] Following this, the input data conversion unit 101 converts value v included in the job feature information into value v′ using function f (step S124). Function f used when the input data conversion unit 101 converts the value is, for example, set by the user, the system, or the like. Alternatively, the input data conversion unit 101 may hold function f beforehand.

[0126] The input data conversion unit 101 may use any mathematical function as function f. Function f used for the conversion into the values depicted in FIG. 12 is $f(v) = \lfloor \log_{10}(v) \rfloor$.

[0127] The input data conversion unit 101 then sets the label information of the numerical inversion label information corresponding to value v, to converted value v′ (step S125). The input data conversion unit 101 performs the value conversion and the converted value setting for the number of conversion target values included in the job feature information (step S126).

[0128] For example, in the case of the job feature information of the task number “1” depicted in FIG. 11, the file read byte count “301355226” is converted into “8” by function f. Moreover, the option1 (command line argument) “0.01” is converted into “−2” by function f.

[0129] For example, in the case of the numerical inversion label information of the task number “1” depicted in FIG. 12, the file read byte count is set to the character string “8”, and the option1 (command line argument) is set to the character string “−2”.

[0130] As a result of the input data conversion unit 101 performing the value conversion and the converted value setting for the number of conversion target values included in the job feature information (the determination condition in step S126 is met), the numerical inversion label information corresponding to the input job feature information is generated. The input data conversion unit 101 repeatedly performs the process of steps S122 to S127 for the number of input pieces of job feature information.

[0131] After generating the numerical inversion label information for the number of input pieces of job feature information (the determination condition in step S127 is met), the input data conversion unit 101 ends the generation process.
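The conversion of steps S124 to S126 can be sketched as below, assuming each piece of job feature information is a dictionary. The field names `file_read_bytes` and `option1` and both function names are assumptions made for the example; only the function floor(log10(v)) and the sample values come from the text.

```python
import math

def log10_bucket(v):
    """Function f from the text: floor(log10(v)); only defined for v > 0."""
    return math.floor(math.log10(v))

def to_numerical_inversion_labels(job_features, targets, f=log10_bucket):
    """Replace each target value v with the character string of f(v)."""
    labels = []
    for record in job_features:
        converted = dict(record)                  # keep the task number as-is
        for name in targets:                      # once per conversion target (S126)
            converted[name] = str(f(record[name]))  # S124 and S125
        labels.append(converted)
    return labels

job_features = [{"task": 1, "file_read_bytes": 301355226, "option1": 0.01}]
labels = to_numerical_inversion_labels(job_features, ["file_read_bytes", "option1"])
print(labels[0])  # {'task': 1, 'file_read_bytes': '8', 'option1': '-2'}
```

Because function f is passed as a parameter, a user-supplied or system-supplied function can be substituted without changing the loop.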

[0132] The following describes the effect of the information obtained as a result of the conversion as depicted in FIG. 12, on the algorithm for estimating the amount of computer resources. The numerical inversion label information depicted in FIG. 12 includes numerical information as character-string label information.

[0133] Accordingly, in the case of using the numerical inversion label information depicted in FIG. 12, the computer resources usage estimation unit 103 can use a favorable algorithm for which numerical information is not suitable as input data but that has advantages such as highly accurate estimation of the amount of computer resources or easy implementation.

[0134] For example, the naive Bayes algorithm handles input data as discrete values. When handling numerical information which is a continuous quantity, the naive Bayes algorithm interprets all values as discontinuous discrete values.

[0135] Interpreting a continuous quantity as discontinuous discrete values is not an operation the naive Bayes algorithm is designed for. When the information is interpreted in this way, the naive Bayes algorithm is prone to overfitting or the like in the estimation process, which degrades the accuracy of its estimates of the amount of computer resources.

[0136] The numerical inversion label information output from the input data conversion unit 101 in this exemplary embodiment includes the numerical value converted from a continuous quantity to a discrete quantity by function f, as label information. In the case where the numerical inversion label information including the label information is the input data, the computer resources usage estimation unit 103 can use an algorithm, such as the naive Bayes algorithm, that can only handle discrete values. The possibility that the computer resources usage estimation unit 103 can accurately estimate the amount of computer resources required for task processing using the naive Bayes algorithm is thus increased.

[0137] By adjusting function f, the input data conversion unit 101 can convert the distribution of the input data into another distribution. The conversion of the data distribution increases the possibility that the computer resources usage estimation unit 103 can classify data more clearly.

[0138] According to this exemplary embodiment, the amount of computer resources required for task processing in the distributed processing system is estimated accurately. By receiving the information output from the input data conversion unit 101 as input, the computer resources usage estimation model generation unit 102 can easily classify, for each estimation algorithm, the determinant of the format of function for computing the amount of computer resources. The classification of the determinant for each estimation algorithm corresponds to extracting the task group whose word candidate “Jobname-kmeans” is True or extracting the task group whose word candidate “Class-mahout” is True as mentioned above.

[0139] By receiving the classified determinant as input for generating the amount of computer resources estimation algorithm, the computer resources usage estimation model generation unit 102 can generate a function in a format close to the value distribution in task processing. The computer resources usage estimation unit 103 can enhance estimation accuracy by estimating computer resources usage using the function in a format close to the value distribution in task processing which has been generated by the computer resources usage estimation model generation unit 102.

Exemplary Embodiment 2

[0140] [Structure]

[0141] The following describes Exemplary Embodiment 2 of the present invention with reference to drawings. FIG. 13 is a block diagram depicting an example of the structure of Exemplary Embodiment 2 of a computer resources usage estimation device according to the present invention.

[0142] As depicted in FIG. 13, the computer resources usage estimation device 100 in this exemplary embodiment differs from Exemplary Embodiment 1 in that an estimate reverse conversion unit 104 is added.

[0143] The estimate reverse conversion unit 104 has a function of reversely converting the value output from the computer resources usage estimation unit 103, into a computer resources usage estimate. The estimate reverse conversion unit 104 is, for example, realized by a CPU that executes processes according to program control.

[0144] In this exemplary embodiment, the computer resources usage estimation model generation unit 102 receives the data output from the input data conversion unit 101, and generates the estimation model. The computer resources usage estimation unit 103 receives the data output from the input data conversion unit 101, and outputs, in the same format as the received data, the value of computer resources usage of a task whose feature has not been recognized yet.

[0145] The estimate reverse conversion unit 104 converts the value indicating the computer resources usage estimate output from the computer resources usage estimation unit 103 into numerical information indicating the computer resources usage estimate, and outputs the numerical information. The use of the computer resources usage estimation device 100 in this exemplary embodiment enables the user, the distributed processing system scheduler, etc. to estimate the amount of computer resources required for task processing.

[0146] [Operation]

[0147] The following describes the operation of the input data conversion unit 101 and the operation of the estimate reverse conversion unit 104 in this exemplary embodiment, with reference to FIGS. 14 and 17 respectively.

[0148] The operation of the input data conversion unit 101 in this exemplary embodiment generating numerical inversion label information on the basis of one type of job feature information that includes computer resources usage observed during program execution is described first, with reference to FIG. 14. FIG. 14 is a flowchart depicting an operation of the numerical inversion label information generation process by the input data conversion unit 101 in Exemplary Embodiment 2.

[0149] FIG. 15 is an explanatory diagram depicting another example of job feature information input to the input data conversion unit 101. FIG. 15 depicts part of task-related information which is observed in the processing in the distributed processing system. The job feature information depicted in FIG. 15 is made up of a task number and memory usage. In this exemplary embodiment, memory usage is the amount of computer resources to be estimated.

[0150] FIG. 16 is an explanatory diagram depicting another example of numerical inversion label information output from the input data conversion unit 101. FIG. 16 depicts the numerical inversion label information generated by the input data conversion unit 101 on the basis of the memory usage included in the job feature information depicted in FIG. 15. The following describes the operation of the input data conversion unit 101 generating the numerical inversion label information depicted in FIG. 16 on the basis of the job feature information depicted in FIG. 15, with reference to FIG. 14.

[0151] When the job feature information depicted in FIG. 15 is input, the input data conversion unit 101 forms the output numerical inversion label information by the task number and a label information group (step S201). The value obtained by converting the memory usage included in the job feature information is set in the label information group. Each value set in label information is handled as an identifier represented by a character string.

[0152] When forming the numerical inversion label information, the input data conversion unit 101 generates each label information name by, for example, prefixing the identifier of the generation source information. The input data conversion unit 101 may generate each label information name by any other method, as long as the generated name is uniquely identifiable.

[0153] The input data conversion unit 101 may set the job feature information whose value has been replaced, as the numerical inversion label information. The numerical inversion label information depicted in FIG. 16 is generated by replacing the value in the job feature information depicted in FIG. 15. In detail, the numerical inversion label information is generated by replacing the memory usage value.

[0154] Following this, the input data conversion unit 101 converts value v included in the job feature information into value v′ using function f (step S204). Function f used when the input data conversion unit 101 converts the value is, for example, set by the user, the system, or the like. Alternatively, the input data conversion unit 101 may hold function f beforehand.

[0155] The input data conversion unit 101 may use any mathematical function as function f. Function f used for the conversion into the value depicted in FIG. 16 is $f(v) = \lfloor \log_{2}(v) \rfloor$.

[0156] The input data conversion unit 101 then sets the label information of the numerical inversion label information corresponding to value v, to converted value v′ (step S205). The input data conversion unit 101 performs the value conversion and the converted value setting for the number of conversion target values included in the job feature information (step S206).

[0157] For example, in the case of the job feature information of the task number “1” depicted in FIG. 15, the memory usage “1820852224” is converted into “30” by function f. In the case of the numerical inversion label information of the task number “1” depicted in FIG. 16, the memory usage is set to the character string “30”.
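The converted value in this example can be checked by hand. The bounds below follow from the definition of the floor function and are added here for illustration; they are not stated in the source.

```latex
2^{30} = 1073741824 \le 1820852224 < 2147483648 = 2^{31}
\quad\Longrightarrow\quad
\lfloor \log_{2}(1820852224) \rfloor = 30
```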

[0158] As a result of the input data conversion unit 101 performing the value conversion and the converted value setting for the number of conversion target values included in the job feature information (the determination condition in step S206 is met), the numerical inversion label information corresponding to the input job feature information is generated. The input data conversion unit 101 repeatedly performs the process of steps S202 to S207 for the number of input pieces of job feature information.

[0159] After generating the numerical inversion label information for the number of input pieces of job feature information (the determination condition in step S207 is met), the input data conversion unit 101 ends the generation process.

[0160] The input data conversion unit 101 outputs the generated numerical inversion label information to the computer resources usage estimation model generation unit 102 including the machine learning algorithm and the like. The computer resources usage estimation model generation unit 102 generates an estimation model for computing a memory usage estimate, using the received numerical inversion label information.

[0161] The operation of the estimate reverse conversion unit 104 in this exemplary embodiment reversely converting the output value of the estimation algorithm into estimated computer resources usage is described next, with reference to FIG. 17. FIG. 17 is a flowchart depicting an operation of the estimated memory usage reverse conversion process by the estimate reverse conversion unit 104 in Exemplary Embodiment 2. FIG. 17 depicts the operation of the estimate reverse conversion unit 104 reversely converting the output value of the estimation model into a memory usage estimate.

[0162] FIG. 18 is an explanatory diagram depicting an example of numerical inversion label information output from the estimation model. The numerical inversion label information is made up of a task number and memory usage (predicted value). The value set in the memory usage (predicted value) is the estimated memory usage after the conversion by function f.

[0163] As depicted in FIG. 18, the memory usage (predicted value) of the numerical inversion label information of the task number “11” is “27”. In other words, the output value of the estimation model for the task of the task number “11” is “27”. The memory usage (predicted value) of the numerical inversion label information of the task number “12” is “31”. In other words, the output value of the estimation model for the task of the task number “12” is “31”.

[0164] FIG. 19 is an explanatory diagram depicting an example of estimated memory usage information output from the estimate reverse conversion unit 104. FIG. 19 depicts the estimated memory usage information generated by the estimate reverse conversion unit 104 reversely converting the memory usage estimate included in the numerical inversion label information output from the estimation model depicted in FIG. 18. The estimated memory usage information is made up of a task number and memory usage (predicted value). The unit of memory usage (predicted value) is bytes.

[0165] As depicted in FIG. 19, the memory usage (predicted value) of the estimated memory usage information of the task number “11” is “134217728”. In other words, the memory usage estimate for the task of the task number “11” is 134217728 bytes. The memory usage (predicted value) of the estimated memory usage information of the task number “12” is “2147483648”. In other words, the memory usage estimate for the task of the task number “12” is 2147483648 bytes.

[0166] The following describes the operation of the estimate reverse conversion unit 104 generating the estimated memory usage information depicted in FIG. 19 on the basis of the numerical inversion label information depicted in FIG. 18, with reference to FIG. 17.

[0167] The estimate reverse conversion unit 104 feeds output value p′ included in the numerical inversion label information output from the estimation model, to inverse function $f^{-1}$ of function f used in the conversion target value conversion process in step S204 in FIG. 14. In this exemplary embodiment, $f^{-1}(p') = 2^{p'}$. As a result of feeding p′ to $f^{-1}$, the estimate reverse conversion unit 104 obtains estimate p (step S211). The estimate reverse conversion unit 104 generates estimated memory usage information on the basis of the obtained estimate p.

[0168] The estimate reverse conversion unit 104 repeatedly performs the process of step S211 for the number of input pieces of numerical inversion label information. After generating estimated memory usage information for the number of input pieces of numerical inversion label information, the estimate reverse conversion unit 104 ends the process.
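The reverse conversion of step S211 can be sketched as follows, using the predicted labels given for FIG. 18. The function names and the dictionary layout are assumptions made for the example. Because the forward function takes a floor, 2 to the power p′ is the lower edge of the range of values that map to label p′, so the recovered estimate can be smaller than the originally observed value.

```python
import math

def to_label(v):
    """Forward conversion used in this embodiment: floor(log2(v)) as a string."""
    return str(math.floor(math.log2(v)))

def reverse_convert(label):
    """Reverse conversion of step S211: p = 2 ** p', returned in bytes."""
    return 2 ** int(label)

# Output values of the estimation model for tasks 11 and 12 (FIG. 18).
predicted = {11: "27", 12: "31"}
estimates = {task: reverse_convert(p) for task, p in predicted.items()}
print(estimates)  # {11: 134217728, 12: 2147483648}

# Round trip for the observed value of task 1: the label is "30" and the
# recovered value is the lower edge of that range, not the observed value.
print(to_label(1820852224), reverse_convert("30"))  # 30 1073741824
```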

[0169] Thus, the computer resources usage estimation device 100 in this exemplary embodiment can convert the character string included in the numerical inversion label information output from the estimation model, into a computer resources usage estimate which is numerical information. By using the converted estimate, the distributed processing system can process the task faster or more efficiently. The use of the estimate increases the possibility that the amount of computer resources assigned to the process can be kept to the minimum required quantity.

[0170] For example, suppose the user configures all processes in the distributed processing system to use 2 GB of memory each. With this setting, a computer with 4 GB of memory can execute two processes in parallel. In the case where the memory actually used by a process is 1 GB, however, the setting means that 2 GB of memory is unnecessarily assigned on that computer.

[0171] If it can be estimated that the memory required for the process is 1 GB, the user can configure the distributed processing system to assign four processes at once to a computer with 4 GB of memory. By executing the four processes in parallel, the distributed processing system can process the job at double speed, as compared with the aforementioned setting. Moreover, the unnecessary assignment of 2 GB of memory is avoided, which contributes to higher computer resources use efficiency than the aforementioned setting.
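The arithmetic behind this comparison can be written out as a short sketch. It assumes, as the example does, that memory is the only limit on parallelism and that job throughput scales with the number of processes running in parallel; the function name is hypothetical.

```python
def parallel_processes(node_memory_gb, per_process_gb):
    """Number of processes that fit on one computer, limited by memory only."""
    return node_memory_gb // per_process_gb

fixed = parallel_processes(4, 2)      # user-configured 2 GB per process
estimated = parallel_processes(4, 1)  # 1 GB per process, based on the estimate
speedup = estimated / fixed
print(fixed, estimated, speedup)  # 2 4 2.0
```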

[0172] The following describes the effect of the information obtained as a result of the conversion as depicted in FIG. 16, on the algorithm for estimating the amount of computer resources.

[0173] The numerical inversion label information depicted in FIG. 16 includes estimation target numerical information as label information of a character string.

[0174] Accordingly, in the case of using the numerical inversion label information depicted in FIG. 16, the computer resources usage estimation unit 103 can use a favorable algorithm that has difficulty producing numerical information as an estimate but that has advantages such as highly accurate estimation of the amount of computer resources or easy implementation.

[0175] For example, the naive Bayes algorithm handles discrete values as an estimation target. When handling numerical information which is a continuous quantity as an estimation target, the naive Bayes algorithm interprets all values as discontinuous discrete values.

[0176] Interpreting a continuous quantity as discontinuous discrete values is not an operation the naive Bayes algorithm is designed for. When the information is interpreted in this way, the naive Bayes algorithm is prone to overfitting or the like in the estimation process, which degrades the accuracy of its estimate of the amount of computer resources.

[0177] The numerical inversion label information output from the input data conversion unit 101 in this exemplary embodiment includes the numerical value converted from a continuous quantity to a discrete quantity by function f, as the label information. In the case where the numerical inversion label information including the label information is an estimation target, the computer resources usage estimation unit 103 can use an algorithm, such as the naive Bayes algorithm, that can only handle discrete values as estimates. The possibility that the computer resources usage estimation unit 103 can accurately estimate the amount of computer resources required for task processing using the naive Bayes algorithm is thus increased.
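To illustrate why discrete labels suit this kind of algorithm, the following is a minimal categorical naive Bayes written from scratch with add-one smoothing. It is a sketch under stated assumptions and not the estimation model of the embodiment: the training rows, the feature names, and the functions `train` and `predict` are all hypothetical. Both the inputs and the estimation target are discrete values, here word flags and string labels of the kind produced by function f.

```python
import math
from collections import Counter, defaultdict

def train(rows, labels, alpha=1.0):
    """Count label frequencies and per-label feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature) -> Counter of values
    feature_values = defaultdict(set)    # feature -> set of observed values
    for row, label in zip(rows, labels):
        for feat, val in row.items():
            value_counts[(label, feat)][val] += 1
            feature_values[feat].add(val)
    return class_counts, value_counts, feature_values, alpha

def predict(model, row):
    """Return the label with the highest smoothed log posterior."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for feat, val in row.items():
            count = value_counts[(label, feat)][val]
            k = len(feature_values[feat] | {val})  # distinct values, incl. unseen
            score += math.log((count + alpha) / (n + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: discrete inputs and a discrete memory usage label.
rows = [
    {"Class-mahout": True, "file_read_bytes": "8"},
    {"Class-mahout": True, "file_read_bytes": "8"},
    {"Class-mahout": False, "file_read_bytes": "6"},
    {"Class-mahout": False, "file_read_bytes": "6"},
]
labels = ["30", "30", "27", "27"]
model = train(rows, labels)
predicted = predict(model, {"Class-mahout": True, "file_read_bytes": "8"})
print(predicted)  # 30
```

With raw byte counts as the target, almost every training value would be its own class, which is the situation the conversion by function f is meant to avoid.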

[0178] By adjusting function f, the computer resources usage estimation device 100 can obtain an estimate of appropriate resolution. For example, by using a logarithmic function as function f, the computer resources usage estimation device 100 can produce estimates of large values that are unaffected by slight changes in those values. This increases the possibility that the amount of computer resources is estimated to an appropriate degree in conformity with the status of the distributed processing system.
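The insensitivity to slight changes can be seen in a small sketch that uses floor(log2(v)) as function f. The 1% perturbation and the variable names are assumptions chosen for illustration.

```python
import math

def f(v):
    """Logarithmic function f: floor(log2(v)), returned as a string label."""
    return str(math.floor(math.log2(v)))

observed = 1820852224                  # memory usage of task 1 in the text
slightly_larger = int(observed * 1.01)  # hypothetical 1% increase
doubled = observed * 2

# A slight change keeps the same label; only a change of scale moves it.
print(f(observed), f(slightly_larger), f(doubled))  # 30 30 31
```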

[0179] The following describes an overview of the present invention. FIG. 20 is a block diagram schematically depicting an output device according to the present invention. An output device 10 according to the present invention is provided with an output unit 11 (for example, the input data conversion unit 101) for outputting, on the basis of job feature information indicating the features of the job of a distributed processing system, estimation model application information that is information in a format suitable for an estimation model that estimates the amount of computer resources required for processing a task constituting the job.

[0180] With such a structure, the output device can provide information in a format suitable for a model that estimates the amount of computer resources required for task processing in a distributed processing system.

[0181] The estimation model application information may include word-containing information having binary information that indicates whether or not a character string indicated by the character string information included in the job feature information includes a prescribed word.

[0182] With such a structure, the output device can provide information indicating whether or not a job name or a class name includes a prescribed word.

[0183] The estimation model application information may include numerical inversion label information having, as string label information, a value derived by converting, by a prescribed function, the numeric value indicated by the numerical information included in the job feature information.

[0184] With such a structure, the output device can provide information including string label information that can be easily handled by the estimation model.

[0185] The output device 10 may include a form conversion unit (for example, the estimate reverse conversion unit 104) for outputting the estimation model application information output from the estimation model, in the same format as the job feature information corresponding to the estimation model application information.

[0186] With such a structure, the output device can provide information of computer resources usage in a format desired by the user.

[0187] The output device 10 may include a computer resources estimation unit (for example, the computer resources usage estimation unit 103) for estimating the amount of computer resources required for processing the task included in the job corresponding to the job feature information in the distributed processing system, by feeding the estimation model application information output from the output unit 11 on the basis of the job feature information into the estimation model.

[0188] With such a structure, the output device can estimate computer resources usage on the basis of estimation model application information.

[0189] The output device 10 may include a computer resources estimation model generation unit (for example, the computer resources usage estimation model generation unit 102) for generating the estimation model for estimating the amount of computer resources required for processing the task included in the job corresponding to the job feature information in the distributed processing system, using the estimation model application information output from the output unit 11 on the basis of the job feature information.

[0190] With such a structure, the output device can generate a computer resources usage estimation model on the basis of estimation model application information.

[0191] FIG. 21 is a block diagram schematically depicting a data structure according to the present invention. The data structure according to the present invention includes estimation model application information generated on the basis of job feature information indicating the features of the job of a distributed processing system, the estimation model application information being information in a format suitable for an estimation model that estimates the amount of computer resources required for processing a task constituting the job.

[0192] With such a structure, the data structure can provide information in a format suitable for a model that estimates the amount of computer resources required for task processing in a distributed processing system.

[0193] The estimation model application information may include word-containing information having binary information that indicates whether or not a character string indicated by the character string information included in the job feature information includes a prescribed word.

[0194] With such a structure, the data structure can provide information indicating whether or not a job name or a class name includes a prescribed word.

[0195] The estimation model application information may include numerical inversion label information having, as string label information, a value derived by converting, by a prescribed function, the numeric value indicated by the numerical information included in the job feature information.

[0196] With such a structure, the data structure can provide information including string label information that can be easily handled by the estimation model.

[0197] Although the present invention has been described with reference to the above exemplary embodiments and examples, the present invention is not limited to the above exemplary embodiments and examples. Various changes understandable by those skilled in the art can be made to the structures and details of the present invention within the scope of the present invention.

[0198] This application claims priority based on Japanese Patent Application No. 2015-010492 filed on Jan. 22, 2015, the disclosure of which is incorporated herein in its entirety.

REFERENCE SIGNS LIST

[0199] 10 output device

[0200] 11 output unit

[0201] 100 computer resources usage estimation device

[0202] 101 input data conversion unit

[0203] 102 computer resources usage estimation model generation unit

[0204] 103 computer resources usage estimation unit

[0205] 104 estimate reverse conversion unit


CPC classification: G06N7/01; G06F11/3006; G06F11/3447; G06N20/00; G06F30/20; G06F9/50; G06F11/3442; G06F9/5083

International classification: G06F17/50; G06F11/34; G06F9/50
