Patent classifications
G06F2209/501
Dynamic allocation and re-allocation of learning model computing resources
This disclosure describes techniques for improving allocation of computing resources to computation of machine learning tasks, including on massive computing systems hosting machine learning models. A method includes a computing system, based on a computational metric trend and/or a predicted computational metric of a past task model, allocating a computing resource for computing of a machine learning task by a current task model prior to runtime of the current task model; computing the machine learning task by executing a copy of the current task model; quantifying a computational metric of the copy of the current task model; determining a computational metric trend based on the computational metric; deriving a predicted computational metric of the copy of the current task model based on the computational metric; and, based on the computational metric trend, changing allocation of a computing resource for computing of the machine learning task by the current task model.
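The trend-based re-allocation step can be illustrated with a minimal sketch. It assumes the computational metric is sampled at equal intervals, that the trend is the least-squares slope of those samples, and that the prediction is a one-step linear extrapolation. The function names, the threshold, and the one-unit step are hypothetical and are not taken from the disclosure.

```python
from statistics import mean


def metric_trend(samples):
    """Least-squares slope of a metric over equally spaced samples."""
    n = len(samples)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(samples)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(samples))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def predict_next(samples):
    """Extrapolate the fitted line one step past the last sample."""
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    slope = metric_trend(samples)
    intercept = mean(samples) - slope * (n - 1) / 2
    return intercept + slope * n


def reallocate(current_units, samples, threshold=0.05, step=1, min_units=1):
    """Grow or shrink an allocation by one step when the trend is clear.

    Assumption: a rising metric (e.g. memory in use) calls for more
    resource units, a falling metric for fewer.
    """
    slope = metric_trend(samples)
    if slope > threshold:
        return current_units + step
    if slope < -threshold:
        return max(min_units, current_units - step)
    return current_units
```

For example, `reallocate(4, [1, 2, 3, 4])` grows the allocation to 5 units, while a flat series leaves it unchanged.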
PREVENTION APPARATUS OF USER REQUIREMENT VIOLATION FOR CLOUD SERVICE, PREVENTION METHOD OF USER REQUIREMENT VIOLATION AND PROGRAM THEREOF
The proportion of predictions liable to result in a user requirement violation is reduced by adjusting the results of resource design, even when such a violation is highly likely to incur a heavy penalty. The apparatus provides a requirement specifying functional unit (11) that specifies a user requirement for a service of interest, and a resource design unit (12) that predicts, by machine learning, the performance achievable at a plurality of resource settings in performing the service of interest and selects, based on results of the prediction, a resource setting that satisfies the specified user requirement. The resource design unit (12) generates a P model for predicting performance. The P model uses a P-mode loss function obtained by adding a function to an N model that uses an existing N-mode loss function; the added function takes a finite value when actual performance is lower than predicted performance.
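One reading of the P-mode loss is an asymmetric loss that penalizes over-prediction more heavily than under-prediction. The sketch below assumes the N-mode loss is plain squared error and that the added function is a weighted squared shortfall that is zero when actual performance meets or exceeds the prediction. Both choices, and the `penalty_weight` parameter, are illustrative assumptions rather than details from the abstract.

```python
def n_mode_loss(actual, predicted):
    """Assumed N-mode loss: symmetric squared error."""
    return (actual - predicted) ** 2


def p_mode_loss(actual, predicted, penalty_weight=2.0):
    """Assumed P-mode loss: N-mode loss plus an added penalty term.

    The added term is non-zero only when actual performance falls
    short of the prediction, which is the case that risks a user
    requirement violation.
    """
    shortfall = max(0.0, predicted - actual)
    return n_mode_loss(actual, predicted) + penalty_weight * shortfall ** 2
```

With these assumptions, an error of the same size costs more when the model was optimistic (`p_mode_loss(8, 10)`) than when it was pessimistic (`p_mode_loss(10, 8)`), which pushes a trained model toward conservative predictions.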
Feature Resource Self-Tuning and Rebalancing
An apparatus comprises at least one processing device that includes a processor coupled to a memory. The processing device is configured to identify a plurality of resource objects associated with a processing device, to group correlated resource objects according to processing device utilization of the resource objects, to assign a first weight to a first resource object grouping, wherein the first weight is associated with a performance impact of the first resource object grouping on the processing device, and to release at least some of the first resource object grouping to provide additional resources to a second resource object grouping, the additional resources resulting from the releasing, wherein the first resource object grouping is selected for the releasing based on a comparison between the first weight and a second weight associated with the second resource object grouping, wherein the releasing is performed to improve performance of the processing device.
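The weight comparison and release step might look like the following sketch. It assumes that a larger weight means a larger performance impact, so resources are released from the lower-weight grouping to the higher-weight one. The abstract does not state the direction of the comparison, and the `Grouping` type and `rebalance` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Grouping:
    name: str
    weight: float   # assumed: larger weight = larger performance impact
    resources: int  # resource objects currently held by the grouping


def rebalance(first, second, amount):
    """Release up to `amount` resources from `first` to `second`.

    The release happens only when `first` has the lower weight.
    Returns the number of resources actually moved.
    """
    if first.weight >= second.weight:
        return 0
    released = min(amount, first.resources)
    first.resources -= released
    second.resources += released
    return released
```

Calling `rebalance` with the groupings in the opposite order moves nothing, because the higher-weight grouping is never selected for release.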
CPU CLUSTER SHARED RESOURCE MANAGEMENT
Embodiments include an asymmetric multiprocessing (AMP) system having a first central processing unit (CPU) cluster comprising a first core type, and a second CPU cluster comprising a second core type, where the AMP system can update a thread metric for a first thread running on the first CPU cluster based at least on: a past shared resource overloaded metric of the first CPU cluster, and on-core metrics of the first thread. The on-core metrics of the first thread can indicate that the first thread contributes to contention of the same shared resource corresponding to the past shared resource overloaded metric of the first CPU cluster. The AMP system can assign the first thread to a different CPU cluster while other threads of the same thread group remain assigned to the first CPU cluster. The thread metric can include a Matrix Extension (MX) thread flag or a Bus Interface Unit (BIU) thread flag.
Provisioning edge backhauls for dynamic workloads
Network capacity is provisioned in a computing environment comprising a computing service provider and an edge computing network. A cost function is applied to usage data for a number of user endpoints at the edge computing network, a number and type of workloads at the edge computing network, offload capability of the edge computing network, and resource capacities at the edge computing network. An estimated network capacity is determined, where the workloads are dynamic, and the cost function is usable to optimize the network capacity with respect to one or more criteria.
STORAGE ARRAY RESOURCE ALLOCATION BASED ON FEATURE SENSITIVITIES
Aspects of the present disclosure relate to tuning resource allocations based on a storage array feature's impact on the array's global performance. In embodiments, one or more input/output (IO) features used by a storage array to process one or more IO workloads are determined. Additionally, each IO feature's impact for processing the IO workload within a threshold performance requirement can be determined. Further, at least one IO feature's resource allocation can be tuned based on its IO workload processing impact.
Trims for memory performance targets of applications
A memory sub-system can receive a definition of a performance target for each of a number of applications that use the memory sub-system for storage. The memory sub-system can create a plurality of partitions according to the definitions and assign each of the partitions to a block group. The memory sub-system can operate each block group with a trim tailored to the performance target corresponding to that block group and application.
Method, apparatus, client terminal, and server for data processing
Embodiments of the present specification provide a method, an apparatus, a client terminal, and a server for data processing. The method includes: selecting, based on a data attribute of to-be-processed data, a target coordinating server from a plurality of coordinating servers, the plurality of coordinating servers belonging to a plurality of server clusters respectively; and sending a data processing request to the target coordinating server, such that the server cluster to which the target coordinating server belongs processes the data processing request preferentially, the data processing request being directed to the to-be-processed data.
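Attribute-based selection of a coordinating server could be sketched as below. The abstract does not say how the data attribute maps to a server, so this example assumes a stable hash of the attribute taken modulo the number of coordinating servers. The function name and server names are hypothetical.

```python
import hashlib


def select_coordinator(data_attribute: str, coordinators: list[str]) -> str:
    """Map a data attribute to one coordinating server, deterministically.

    Assumption: a SHA-256 digest is used so that every client picks
    the same server for the same attribute, across processes.
    """
    if not coordinators:
        raise ValueError("at least one coordinating server is required")
    digest = hashlib.sha256(data_attribute.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(coordinators)
    return coordinators[index]
```

Because the mapping is deterministic, requests for data with the same attribute always reach the same cluster, which is what allows that cluster to process them preferentially.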
Method, system, and computer program product for dynamically scheduling machine learning inference jobs with different quality of services on a shared infrastructure
A method, system, and computer program product for dynamically scheduling machine learning inference jobs receive or determine a plurality of performance profiles associated with a plurality of system resources, wherein each performance profile is associated with a machine learning model; receive a request for system resources for an inference job associated with the machine learning model; determine a system resource of the plurality of system resources for processing the inference job associated with the machine learning model based on the plurality of performance profiles and a quality of service requirement associated with the inference job; assign the system resource to the inference job for processing the inference job; receive result data associated with processing of the inference job with the system resource; and update, based on the result data, a performance profile of the plurality of performance profiles associated with the system resource and the machine learning model.
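The schedule-then-update loop can be sketched as follows. The example assumes each performance profile is a single expected latency per (resource, model) pair, that the quality of service requirement is a latency ceiling, and that profiles are refreshed with an exponential moving average. None of these specifics come from the abstract, and the function names are hypothetical.

```python
def pick_resource(profiles, model, max_latency_ms):
    """Return the resource with the lowest expected latency that meets QoS.

    `profiles` maps (resource, model) to an expected latency in ms.
    Returns None when no resource satisfies the latency ceiling.
    """
    candidates = [
        (latency, resource)
        for (resource, profiled_model), latency in profiles.items()
        if profiled_model == model and latency <= max_latency_ms
    ]
    if not candidates:
        return None
    return min(candidates)[1]


def update_profile(profiles, resource, model, observed_latency_ms, alpha=0.5):
    """Blend an observed latency into the stored profile (assumed EMA)."""
    key = (resource, model)
    old = profiles.get(key, observed_latency_ms)
    profiles[key] = (1 - alpha) * old + alpha * observed_latency_ms
    return profiles[key]
```

After each job completes, feeding its measured latency to `update_profile` keeps later calls to `pick_resource` aligned with how the resource is actually performing.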
APPARATUS FOR MACHINE LEARNING SERVICE, METHOD FOR MACHINE LEARNING SERVICE AND PROGRAM THEREOF
The aim is to eliminate the resource design process otherwise required of the user of a machine learning service, thereby reducing the time and cost burden on the user.
A machine learning service device includes a requirement specifying functional unit (11) used to specify a task, a model, throughput, and performance that are desired in machine learning; and a resource design unit (12) configured to predict achievable performance at a plurality of resource settings by machine learning using the task, the model, and the throughput specified via the requirement specifying functional unit and select a resource setting that satisfies the specified performance based on results of the prediction.
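The predict-then-select step can be sketched as below. The learned predictor is stood in for by any callable that maps a resource setting to predicted performance, and the sketch assumes the cheapest feasible setting is preferred. The selection rule, the `toy_predict` model, and all names are illustrative assumptions.

```python
def select_setting(settings, predict, required_performance, cost):
    """Pick the cheapest resource setting predicted to meet the requirement.

    `predict` maps a setting to predicted performance (higher is better).
    `cost` maps a setting to its cost. Returns None if nothing qualifies.
    """
    feasible = [s for s in settings if predict(s) >= required_performance]
    if not feasible:
        return None
    return min(feasible, key=cost)


def toy_predict(gpu_count):
    """Hypothetical stand-in for the learned model: sublinear scaling."""
    return 100.0 * gpu_count ** 0.5


# Candidate settings are GPU counts; cost is assumed proportional to count.
choice = select_setting([1, 2, 4, 8], toy_predict, 150.0, cost=lambda g: g)
```

Here `choice` is 4, since one and two GPUs are predicted to fall short of 150 and four GPUs is the cheapest setting that does not.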