G06F2209/5019

PREDICTIVE SCALING OF CONTAINER ORCHESTRATION PLATFORMS

Systems, methods, and computer programming products leverage recurrent neural network architectures to proactively predict the workload demand of container orchestration platforms. The platform continuously collects metric data from its clusters and trains multiple parallel neural networks with different architectures to predict future platform workload demands. At periodic intervals, the registered neural networks under consideration for controlling the scaling operations of the platform are compared against one another to identify the neural network demonstrating the highest performance and/or the most accurate workload prediction strategy for scaling the orchestration platform. The selected neural network is enforced as the controller for the platform to implement the workload prediction strategy. The neural network controller enforced by the platform predictively scales up or down the number of pods within nodes of the platform and/or the number of clusters providing computational resources to the platform, in anticipation of future increased or decreased end-user demand.
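
The periodic comparison and controller selection described in this abstract can be illustrated with a minimal sketch. It rests on stated assumptions: three trivial forecasters stand in for the recurrent neural networks, mean absolute error stands in for the unspecified performance measure, and every name here (`select_controller`, `pods_needed`, `capacity_per_pod`) is hypothetical rather than taken from the patent.

```python
import math
from typing import Callable, Dict, Sequence

# A forecaster maps the observed load history to a one-step-ahead prediction.
Forecaster = Callable[[Sequence[float]], float]


def last_value(history: Sequence[float]) -> float:
    """Predict that the next load equals the most recent load."""
    return history[-1]


def moving_average(history: Sequence[float], window: int = 3) -> float:
    """Predict the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def linear_trend(history: Sequence[float]) -> float:
    """Extrapolate the most recent step change."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


def mean_abs_error(model: Forecaster, series: Sequence[float], warmup: int = 3) -> float:
    """Replay the series and score one-step-ahead predictions."""
    errors = [abs(model(series[:t]) - series[t]) for t in range(warmup, len(series))]
    return sum(errors) / len(errors)


def select_controller(models: Dict[str, Forecaster], series: Sequence[float]) -> str:
    """Return the name of the candidate with the lowest replay error."""
    return min(models, key=lambda name: mean_abs_error(models[name], series))


def pods_needed(predicted_load: float, capacity_per_pod: float, min_pods: int = 1) -> int:
    """Translate a predicted load into a pod count (assumed linear capacity)."""
    return max(min_pods, math.ceil(predicted_load / capacity_per_pod))


models = {
    "last_value": last_value,
    "moving_average": moving_average,
    "linear_trend": linear_trend,
}
series = [100, 120, 140, 160, 180, 200]  # illustrative requests per second
controller = select_controller(models, series)
target_pods = pods_needed(models[controller](series), capacity_per_pod=50)
```

On this steadily rising series the trend extrapolator has zero replay error, so it would be enforced as the controller and the pod count would be raised ahead of the predicted load.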

WORKLOAD PERFORMANCE PREDICTION AND REAL-TIME COMPUTE RESOURCE RECOMMENDATION FOR A WORKLOAD USING PLATFORM STATE SAMPLING

Embodiments described herein are generally directed to improving predictions regarding workload performance to facilitate dynamic auto device selection. In an example, based on telemetry samples collected from a computer system in real-time and indicative of a state of the computer system, one or more workload performance prediction models are built or updated for a heterogeneous set of computer resources of the computer system with reference to one or more optimization goals. At a time of execution of a workload, a particular computer resource of the heterogeneous set of computer resources on which to dispatch the workload is dynamically determined by: (i) generating multiple predicted performance scores each corresponding to one of the computer resources based on the state of the computer system and the one or more workload performance prediction models; and (ii) selecting the particular computer resource based on the predicted performance scores.
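
Steps (i) and (ii) can be sketched as follows. This is an illustration under assumptions, not the patented method: each device's prediction model is reduced to a hypothetical linear rule (peak score scaled by idle fraction), and the names `DeviceModel` and `select_device` are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class DeviceModel:
    """Hypothetical per-device performance model."""
    peak_score: float  # assumed score when the device is fully idle

    def predict(self, utilization: float) -> float:
        # Assumption: achievable performance falls linearly with current utilization.
        return self.peak_score * (1.0 - utilization)


def select_device(
    models: Dict[str, DeviceModel], state: Dict[str, float]
) -> Tuple[str, Dict[str, float]]:
    """Score every device against the sampled state, then pick the best one."""
    scores = {name: model.predict(state[name]) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


models = {
    "cpu": DeviceModel(peak_score=100.0),
    "gpu": DeviceModel(peak_score=400.0),
    "npu": DeviceModel(peak_score=250.0),
}
# Telemetry sample: fraction of each device that is currently busy.
state = {"cpu": 0.1, "gpu": 0.9, "npu": 0.2}
best, scores = select_device(models, state)
```

Although the GPU has the highest peak score, the sampled state shows it is nearly saturated, so the dispatch goes to the NPU instead.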

MULTI-DEVICE PROCESSING ACTIVITY ALLOCATION

Allocating processing activities among multiple computing devices can include identifying multiple computing activities of a computer-executable process and, for each computing activity identified, estimating in real time the computing resources needed. The identifying can be in response to detecting a computer-executable instruction executed by one of multiple communicatively coupled computing devices, and the computer-executable instruction can be associated with the computer-executable process. A current condition and configuration of each of the computing devices can be determined in real time. For each computing device, an effect induced by executing one or more of the plurality of activities can be predicted, the predicting being based on each computing device's current condition and configuration and performed by a machine learning model trained using data collected from prior real-time processing of example process activities. Based on the predicting, computing activities can be allocated in real time among the computing devices.

Method for establishing system resource prediction and resource management model through multi-layer correlations

A method for establishing a system resource prediction and resource management model through multi-layer correlations is provided. The method builds an estimation model by analyzing the relationship between a main application workload, resource usage of the main application, and resource usage of sub-application resources, and prepares the specific resources in advance to meet future requirements. This multi-layer analysis, prediction, and management method differs from the prior art, which focuses only on single-level estimation and resource deployment. The present invention can utilize more interactive relationships at different layers to effectively perform predictions, thereby achieving the advantage of reducing hidden resource management costs when operating application services.

METHOD AND SYSTEM FOR OPTIMIZING PARAMETER CONFIGURATION OF DISTRIBUTED COMPUTING JOB
20230042890 · 2023-02-09

The present disclosure relates to a method and system for optimizing a parameter configuration of a distributed computing job. The method includes: obtaining job programs of different distributed computing jobs, and determining a key parameter configuration set; obtaining a cluster status during execution of the distributed computing job, randomly generating a sample data set based on the key parameter configuration set and the cluster status, and establishing a performance prediction model; correcting the performance prediction model by using a multi-objective genetic algorithm and an optimization module configured with an optimal configuration selection strategy; obtaining a job program of a to-be-optimized distributed computing job and a cluster status during execution of the to-be-optimized distributed computing job, and determining a to-be-optimized key parameter configuration item combination; and inputting, to the performance prediction model, the to-be-optimized key parameter configuration item combination and the cluster status during execution of the to-be-optimized distributed computing job, and outputting a key parameter configuration item combination with a shortest execution time. The present disclosure can rapidly and effectively optimize the key parameter configuration.

SYSTEM FOR MONITORING AND OPTIMIZING COMPUTING RESOURCE USAGE OF CLOUD BASED COMPUTING APPLICATION
20230043579 · 2023-02-09

A system for monitoring and optimizing computing resource usage of a computing application may include predicting a first performance metric for job load capacity of the computing application for optimal job concurrency and optimal resource utilization. The system may include generating an alerting threshold based on the first performance metric. The system may further include, in response to a difference between the alerting threshold and a job load of the computing application within an interval exceeding a threshold, predicting a second performance metric for job load capacity of the computing application for optimal job concurrency and optimal resource utilization. The system may further include, in response to a difference between the first performance metric and the second performance metric exceeding a difference threshold, updating the alerting threshold with a job load capacity with the optimal resource utilization rate corresponding to the second performance metric.
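
The two-stage threshold update in this abstract can be sketched as a single function. The sketch assumes details the abstract leaves open: the alerting threshold is taken to be the predicted capacity minus a fixed headroom, and `predict_capacity`, `load_gap_limit`, `capacity_diff_limit`, and `headroom` are hypothetical names.

```python
from typing import Callable


def update_alert_threshold(
    first_capacity: float,
    job_load: float,
    predict_capacity: Callable[[], float],
    load_gap_limit: float,
    capacity_diff_limit: float,
    headroom: float = 10.0,
) -> float:
    """Return the alerting threshold after the two checks described above."""
    # Threshold derived from the first predicted capacity (assumed rule).
    threshold = first_capacity - headroom

    # Check 1: re-predict only when the observed load has drifted far from the threshold.
    if abs(threshold - job_load) <= load_gap_limit:
        return threshold

    second_capacity = predict_capacity()

    # Check 2: adopt the new prediction only when it differs materially from the first.
    if abs(first_capacity - second_capacity) > capacity_diff_limit:
        return second_capacity - headroom
    return threshold


# Load close to the threshold: no re-prediction, threshold stays at 90.
steady = update_alert_threshold(100, 85, lambda: 140, load_gap_limit=10, capacity_diff_limit=20)

# Load far from the threshold and a materially different second prediction: threshold moves to 130.
shifted = update_alert_threshold(100, 50, lambda: 140, load_gap_limit=10, capacity_diff_limit=20)
```

Gating the second prediction behind the first check keeps the (presumably more expensive) predictor from running on every interval.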

Dynamic allocation and re-allocation of learning model computing resources

This disclosure describes techniques for improving allocation of computing resources to computation of machine learning tasks, including on massive computing systems hosting machine learning models. A method includes a computing system, based on a computational metric trend and/or a predicted computational metric of a past task model, allocating a computing resource for computing of a machine learning task by a current task model prior to runtime of the current task model; computing the machine learning task by executing a copy of the current task model; quantifying a computational metric of the copy of the current task model; determining a computational metric trend based on the computational metric; deriving a predicted computational metric of the copy of the current task model based on the computational metric; and, based on the computational metric trend, changing allocation of a computing resource for computing of the machine learning task by the current task model.

Predicting and managing requests for computing resources or other resources

Requests for computing resources and other resources can be predicted and managed. For example, a system can determine a baseline prediction indicating a number of requests for an object over a future time-period. The system can then execute a first model to generate a first set of values based on seasonality in the baseline prediction, a second model to generate a second set of values based on short-term trends in the baseline prediction, and a third model to generate a third set of values based on the baseline prediction. The system can select a most accurate model from among the three models and generate an output prediction by applying the set of values output by the most accurate model to the baseline prediction. Based on the output prediction, the system can cause an adjustment to be made to a provisioning process for the object.
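
A minimal sketch of the select-and-apply step follows. The three value generators are deliberately simplistic placeholders (a fixed alternating pattern, a fixed linear ramp, and the identity), not the seasonality and trend models of the disclosure; the assumption that the values are multiplicative factors, and all function names, are likewise hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# A model turns a baseline prediction into a set of adjustment values (assumed multiplicative).
ValueModel = Callable[[Sequence[float]], List[float]]


def seasonal_values(baseline: Sequence[float]) -> List[float]:
    """Placeholder seasonality: alternate busy and quiet periods."""
    return [1.2 if i % 2 == 0 else 0.8 for i in range(len(baseline))]


def trend_values(baseline: Sequence[float]) -> List[float]:
    """Placeholder short-term trend: demand ramps up 10% per period."""
    return [1.0 + 0.1 * i for i in range(len(baseline))]


def baseline_values(baseline: Sequence[float]) -> List[float]:
    """Leave the baseline unchanged."""
    return [1.0] * len(baseline)


def apply_values(baseline: Sequence[float], values: Sequence[float]) -> List[float]:
    """Apply a model's values to the baseline to get an output prediction."""
    return [b * v for b, v in zip(baseline, values)]


def mean_abs_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def most_accurate(
    models: Dict[str, ValueModel],
    past_baseline: Sequence[float],
    past_actual: Sequence[float],
) -> str:
    """Pick the model whose adjusted baseline best matched past request counts."""
    return min(
        models,
        key=lambda name: mean_abs_error(
            apply_values(past_baseline, models[name](past_baseline)), past_actual
        ),
    )


models = {"seasonal": seasonal_values, "trend": trend_values, "baseline": baseline_values}
chosen = most_accurate(models, past_baseline=[100, 100, 100, 100], past_actual=[120, 80, 120, 80])
future_baseline = [200, 200]
output_prediction = apply_values(future_baseline, models[chosen](future_baseline))
```

The resulting `output_prediction` is what would drive the adjustment to the provisioning process.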

PREVENTION APPARATUS OF USER REQUIREMENT VIOLATION FOR CLOUD SERVICE, PREVENTION METHOD OF USER REQUIREMENT VIOLATION AND PROGRAM THEREOF

A ratio of prediction liable to result in user requirement violation is reduced by adjusting results of resource design even if it is highly likely that the user requirement violation will incur a heavy penalty. There is provided a requirement specifying functional unit (11) that specifies a user requirement for a service of interest, and a resource design unit (12) that predicts, by machine learning, performance achievable at a plurality of resource settings in performing the service of interest and selects a resource setting that satisfies the specified user requirement, based on results of the prediction, wherein the resource design unit (12) generates a P model as a model for use to predict performance, the P model using a P-mode loss function obtained by adding a function to an N model that uses an existing N-mode loss function, the added function taking a finite value when actual performance is lower than predicted performance.
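
The relationship between the N-mode and P-mode loss functions can be sketched as below. The abstract does not give the functional forms, so this example assumes squared error for the N-mode loss and a linear penalty on overestimation for the added function; `penalty_weight` is a hypothetical parameter.

```python
def n_mode_loss(predicted: float, actual: float) -> float:
    """Symmetric loss (assumed here to be squared error)."""
    return (predicted - actual) ** 2


def p_mode_loss(predicted: float, actual: float, penalty_weight: float = 4.0) -> float:
    """N-mode loss plus an extra term that is nonzero only for overestimates.

    Overestimating performance is the case that risks a user requirement
    violation, so it is penalized more heavily than underestimating.
    """
    overestimate = max(0.0, predicted - actual)
    return n_mode_loss(predicted, actual) + penalty_weight * overestimate


# Same absolute error of 2, but the overestimate costs more.
loss_over = p_mode_loss(predicted=10.0, actual=8.0)   # 4 + 4 * 2 = 12
loss_under = p_mode_loss(predicted=8.0, actual=10.0)  # 4 + 0 = 4
```

A model trained against such an asymmetric loss tends toward conservative predictions, which is what lets the resource design unit pick settings less likely to fall short of the requirement.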

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND COMPUTER-READABLE RECORDING MEDIUM STORING INFORMATION PROCESSING PROGRAM
20230010895 · 2023-01-12

An information processing apparatus includes: a memory; and a processor coupled to the memory and configured to: divide a job in units of computing nodes for a plurality of computing nodes; determine execution of scale-out or scale-in on the basis of a load in a case where each of the computing nodes is caused to execute a job obtained by the division; execute, in a case where determining execution of the scale-out, the scale-out according to the division of the job in units of computing nodes; and execute, in a case where determining execution of the scale-in, the scale-in according to the division of the job in units of computing nodes.
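
The scale-out and scale-in determination can be illustrated with a short sketch. It assumes round-robin division of the job, a per-node load expressed as a fraction of node capacity, and fixed `low` and `high` thresholds; none of these specifics, nor the function names, come from the abstract.

```python
from typing import List, Sequence


def divide_job(tasks: Sequence[float], node_count: int) -> List[List[float]]:
    """Divide the job's task costs across nodes (round-robin, as an assumption)."""
    return [list(tasks[i::node_count]) for i in range(node_count)]


def decide_scaling(
    tasks: Sequence[float],
    node_count: int,
    capacity: float,
    low: float = 0.3,
    high: float = 0.8,
) -> str:
    """Decide on scaling from the load each node would carry after division."""
    loads = [sum(chunk) / capacity for chunk in divide_job(tasks, node_count)]
    peak = max(loads)
    if peak > high:
        return "scale-out"
    if peak < low and node_count > 1:
        return "scale-in"
    return "hold"


overloaded = decide_scaling([4, 4, 4, 4, 4, 4], node_count=2, capacity=10)  # peak load 1.2
balanced = decide_scaling([4, 4, 4, 4, 4, 4], node_count=3, capacity=10)    # peak load 0.8
idle = decide_scaling([1, 1, 1, 1, 1, 1], node_count=6, capacity=10)        # peak load 0.1
```

After a scaling decision, calling `divide_job` again with the new node count would yield the division that the scaled cluster executes.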