G06F2209/504

AUTOSCALING IN AN ELASTIC CLOUD SERVICE

Techniques described herein can optimize usage of computing resources in a data system. Dynamic throttling can be performed locally on a computing resource in the foreground and autoscaling can be performed in a centralized fashion in the background. Dynamic throttling can lower the load without overshooting while minimizing oscillation and reducing the throttle quickly. Autoscaling may involve scaling in or out the number of computing resources in a cluster as well as scaling up or down the type of computing resources to handle different types of situations.

Using constraint programming to set resource allocation limitations for allocating resources to consumers

Resource allocation limitations include resource limits and resource guarantees. A consumer is vulnerable to interruption by other consumers if using more resources than guaranteed. Resources are designated and/or assigned to consumers based on resource limits and resource guarantees. A constraint programming (CP) solver determines resource limits and resource guarantees that minimize vulnerability and/or vulnerability cost based on resource usage data. A CP data model includes limit elements, guarantee elements, and vulnerability elements. The CP data model further includes guarantee-vulnerability constraints, which rely on exceedance distributions generated from resource usage data for the consumers. The CP data model declaratively expresses combinatorial properties of a problem in terms of constraints. CP is a form of declarative programming.
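One plausible reading of an "exceedance distribution" is the empirical fraction of usage samples that exceed each candidate guarantee. The sketch below illustrates that reading only; the function names and the sample-based estimate are assumptions, not the method claimed in the abstract, and no CP solver is involved.

```python
def exceedance_probability(usage_samples, guarantee):
    """Fraction of observed usage samples strictly above the guarantee.

    Assumption: usage_samples is a non-empty list of numeric measurements
    for one consumer (hypothetical input format).
    """
    if not usage_samples:
        raise ValueError("usage_samples must not be empty")
    exceeding = sum(1 for usage in usage_samples if usage > guarantee)
    return exceeding / len(usage_samples)


def exceedance_distribution(usage_samples, candidate_guarantees):
    """Map each candidate guarantee to its empirical exceedance probability."""
    return {
        guarantee: exceedance_probability(usage_samples, guarantee)
        for guarantee in candidate_guarantees
    }
```

Under this reading, a higher guarantee yields a lower exceedance probability and therefore lower vulnerability, which is the trade-off a solver would weigh against the total resources available.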

Resource access based on user access ratings during constrained system performance

An overall access rating for each user in a plurality of users for accessing a computing resource of a set of computing resources is generated. Reduced performance of the computing resource is identified. Access metrics associated with each user in the plurality of users who are accessing the computing resource during the reduced performance of the computing resource are determined. The generated overall access ratings are modified based on the determined access metrics. Access to the computing resource is granted based on a ranking of the modified overall access ratings.

WORKFLOW-BASED SCHEDULING AND BATCHING IN MULTI-TENANT DISTRIBUTED SYSTEMS

Operation requests received from a tenant are added to a tenant-specific queue. A tenant scheduling work item is added to an execution queue that is shared with other tenants. When the tenant scheduling work item is executed, it copies up to a defined number of scheduled operations from the tenant-specific queue to the execution queue. The tenant scheduling work item then re-adds itself to the execution queue. While the operations are executed and before the tenant scheduling work item is executed again, other tenants have an opportunity to queue their own operations. The tenant scheduling work item selects scheduled operations from the tenant-specific queue in the order they were originally requested until one of several conditions is met. Conditions may be based on how many operations are in progress, what kind of operations are in progress, and/or dependencies between operations of different types.
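The queueing pattern in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, operations are moved rather than copied, the only stopping condition is a fixed batch size, and the work item re-adds itself only while its tenant queue is non-empty so that the example terminates.

```python
from collections import deque


class TenantSchedulingWorkItem:
    """Hypothetical work item that feeds one tenant's operations in batches."""

    def __init__(self, tenant, execution_queue, batch_size=2):
        self.tenant = tenant
        self.tenant_queue = deque()          # tenant-specific queue (FIFO)
        self.execution_queue = execution_queue  # shared with other tenants
        self.batch_size = batch_size

    def submit(self, operation):
        self.tenant_queue.append(operation)

    def run(self):
        # Move up to batch_size operations, oldest first, to the shared queue.
        for _ in range(min(self.batch_size, len(self.tenant_queue))):
            self.execution_queue.append(self.tenant_queue.popleft())
        # Re-add this work item behind the batch so other tenants get a turn.
        if self.tenant_queue:
            self.execution_queue.append(self)


def drain(execution_queue):
    """Process the shared queue and return operations in execution order."""
    executed = []
    while execution_queue:
        item = execution_queue.popleft()
        if isinstance(item, TenantSchedulingWorkItem):
            item.run()
        else:
            executed.append(item)
    return executed
```

With tenant A holding four operations and tenant B holding one, a batch size of two lets B's operation run before A's third, which is the fairness property the abstract describes.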

SHARING AND OVERSUBSCRIPTION OF GENERAL-PURPOSE GRAPHICAL PROCESSING UNITS IN DATA CENTERS

A method for managing general-purpose graphical processing units (GPGPUs) in a data center system is described. The method includes receiving, by a proxy agent, a GPGPU request from an application; selecting a GPGPU from a set of GPGPUs for processing a workload of the application based on one or more of available resources of the set of GPGPUs and requirements of the workload as indicated by the GPGPU request; establishing a session between an application agent located on a compute node on which the application is located and the proxy agent, and a second session between the GPGPU and the proxy agent in response to selecting the GPGPU to allow the GPGPU to process the workload, including subsequent GPGPU requests associated with the workload; and collecting a performance profile to describe usage of resources of the GPGPU by the workload.

CLOUD COMPUTING CAPACITY MANAGEMENT SYSTEM USING AUTOMATED FINE-GRAINED ADMISSION CONTROL

A cloud computing capacity management system can include a fine-grained admission control layer, a policy engine, and an enforcement layer. The fine-grained admission control layer can be configured to ingest capacity signals and create a capacity mitigation policy, based at least in part on the capacity signals, to protect available capacity of a cloud computing system for prioritized users. The capacity mitigation policy can be directed to users of the cloud computing system. The policy engine can be configured to control how the capacity mitigation policy is applied to the cloud computing system. The enforcement layer can be configured to handle incoming resource requests and to enforce resource limits based on the capacity mitigation policy as applied by the policy engine.

Replenishment-aware resource usage management

Provided is a system for managing the resource limit associated with a user, where the resource limit indicates the amount of compute resources the user is allowed to use. As the user requests and obtains additional resources from a pool of resources, the user's resource usage is increased to reflect the additional resources being used by the user. As the resources used by the user are released, to ensure that the pool of resources has sufficient capacity to handle additional resource requests, the replenishment status of the pool is further checked, and if the replenishment status satisfies a condition for updating the user's resource usage, the user's resource usage is decreased to reflect the resources that are no longer in use by the user. The released resources are torn down and re-provisioned back into the pool of resources.
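The accounting described above can be sketched as below. The abstract does not state what the replenishment condition is, so this example assumes one: a user's usage is decreased only once the released resources have been re-provisioned back into the pool. All names are hypothetical.

```python
class ReplenishmentAwareLimiter:
    """Hypothetical per-user limiter backed by a shared resource pool."""

    def __init__(self, limit, pool_capacity):
        self.limit = limit              # maximum resources the user may hold
        self.usage = 0                  # resources counted against the user
        self.available = pool_capacity  # resources ready in the pool
        self.pending_release = 0        # released but not yet re-provisioned

    def acquire(self, count):
        # Deny if the request would exceed the limit or the ready pool.
        if self.usage + count > self.limit or count > self.available:
            return False
        self.usage += count
        self.available -= count
        return True

    def release(self, count):
        # Released resources are being torn down; usage is not yet decreased.
        held = self.usage - self.pending_release
        self.pending_release += min(count, held)

    def replenish(self, count):
        # Assumed condition: credit usage back only for re-provisioned resources.
        count = min(count, self.pending_release)
        self.pending_release -= count
        self.available += count
        self.usage -= count
```

Delaying the usage decrease keeps a user from immediately re-requesting capacity that the pool cannot yet serve.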

Autoscaling and throttling in an elastic cloud service

Techniques described herein can optimize usage of computing resources in a data system. Dynamic throttling can be performed locally on a computing resource in the foreground and autoscaling can be performed in a centralized fashion in the background. Dynamic throttling can lower the load without overshooting while minimizing oscillation and reducing the throttle quickly. Autoscaling may involve scaling in or out the number of computing resources in a cluster as well as scaling up or down the type of computing resources to handle different types of situations.

Automatic Identification of Computer Agents for Throttling
20220283874 · 2022-09-08

Computer agents can be throttled individually. In an example, when a computer agent completes a work item, the computer agent reports this to a central component that maintains a vote value for that agent and that increases the respective vote value based on the completed work item. When the central component determines that system performance is sufficiently diminished, the central component can throttle the performance of those computer agents having respective vote values above a predetermined threshold value.
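A minimal sketch of the vote mechanism follows. The class name, the per-item weight, and passing the performance check in as a boolean are illustrative assumptions; the abstract does not specify how diminished performance is detected or how throttling is applied.

```python
from collections import defaultdict


class ThrottleCoordinator:
    """Hypothetical central component that tracks per-agent vote values."""

    def __init__(self, vote_threshold):
        self.vote_threshold = vote_threshold
        self.votes = defaultdict(int)

    def report_completion(self, agent_id, weight=1):
        # Each completed work item increases the reporting agent's vote value.
        self.votes[agent_id] += weight

    def agents_to_throttle(self, performance_diminished):
        # Select agents only when system performance is judged diminished.
        if not performance_diminished:
            return []
        return sorted(
            agent for agent, votes in self.votes.items()
            if votes > self.vote_threshold
        )
```

An agent that has completed many work items accumulates a high vote value and is throttled first, while lightly loaded agents are left alone.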

Constraints on updating or usage of memory system component resource control parameters
11442771 · 2022-09-13

Memory transactions can be tagged with a partition identifier selected depending on which software execution environment caused the memory transaction to be issued. A memory system component can control allocation of resources for handling the memory transaction or manage contention for said resources depending on a selected set of memory system component parameters selected depending on the partition identifier specified by the memory transaction. Programmable constraint storage circuitry stores at least one resource control parameter constraint used to constrain updating or usage of memory system component resource control parameters.