Patent classifications
G06F2209/508
SELECTING A NODE DEDICATED TO TRANSACTIONS OF A PARTICULAR WORK GROUP FOR EXECUTING A TARGET TRANSACTION OF ANOTHER WORK GROUP
A computing network includes nodes of different work groups. Nodes of a work group are dedicated to transactions of the work group. If a node of a first work group is predicted to have an idleness window, a second work group may borrow the node to execute a transaction of the second work group. At least a subset of steps of the transaction may be categorized into a step group. Trees of a transaction may be categorized into one or more tree groups. A node is selected for executing a transaction, if the predicted idleness duration of the node is sufficient relative to the predicted runtime of the transaction, the step group, and/or tree group. A credit system is maintained. A first work group transfers a credit to a second work group when borrowing a node of the second work group for executing a transaction of the first work group.
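The node-borrowing and credit-transfer scheme described above can be sketched in a few lines. This is a minimal illustration, not the claimed method: the names (`Node`, `WorkGroup`, `select_node`) and the single-credit-per-borrow policy are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    work_group: str
    predicted_idle_seconds: float  # predicted idleness window

@dataclass
class WorkGroup:
    name: str
    credits: int = 0

def select_node(nodes, borrower, lender, predicted_runtime):
    """Pick a node of the lender work group whose predicted idleness
    duration covers the transaction's predicted runtime; transfer one
    credit from borrower to lender on success."""
    for node in nodes:
        if (node.work_group == lender.name
                and node.predicted_idle_seconds >= predicted_runtime):
            borrower.credits -= 1
            lender.credits += 1
            return node
    return None  # no lender node has a sufficient idleness window
```

In a fuller version, `predicted_runtime` would come from per-step-group or per-tree-group runtime predictions rather than a single number.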
SELECTING A NODE GROUP OF A WORK GROUP FOR EXECUTING A TARGET TRANSACTION OF ANOTHER WORK GROUP TO OPTIMIZE PARALLEL EXECUTION OF STEPS OF THE TARGET TRANSACTION
A computing network includes nodes of different work groups. Nodes of a work group are dedicated to transactions of the work group. If a node of a first work group is predicted to have an idleness window, a second work group may borrow the node to execute a transaction of the second work group. At least a subset of steps of the transaction may be categorized into a step group. Trees of a transaction may be categorized into one or more tree groups. A node is selected for executing a transaction, if the predicted idleness duration of the node is sufficient relative to the predicted runtime of the transaction, the step group, and/or tree group. A credit system is maintained. A first work group transfers a credit to a second work group when borrowing a node of the second work group for executing a transaction of the first work group.
DISTRIBUTED TASK PROGRESS REPORT
A method for determining a progress of an execution of a task, the method may include accessing only a portion of a shared task status data structure that (a) is associated with the task, wherein the task is executed by a first plurality of compute elements, and (b) comprises multiple hierarchical levels; wherein an entry of a certain hierarchical level represents an aggregate progress associated with multiple entries of another hierarchical level; the certain hierarchical level is higher than the other hierarchical level; and determining the progress of the execution of the task based on a content of the portion.
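A two-level version of such a shared status structure can be sketched as follows; the class name and update protocol are illustrative assumptions. Each higher-level entry aggregates the progress of its child entries, so overall progress can be read from the higher level alone, without scanning every leaf.

```python
class TaskStatus:
    """Hypothetical two-level shared task status structure."""

    def __init__(self, num_groups, steps_per_group):
        # lower hierarchical level: one flag per step, per group
        self.leaf = [[0] * steps_per_group for _ in range(num_groups)]
        # higher hierarchical level: one aggregate entry per group
        self.aggregate = [0] * num_groups
        self.total = num_groups * steps_per_group

    def report_step(self, group, index):
        """Called by a compute element when it completes a step."""
        if self.leaf[group][index] == 0:
            self.leaf[group][index] = 1
            self.aggregate[group] += 1  # keep the higher level in sync

    def progress(self):
        # access only the higher-level portion of the structure
        return sum(self.aggregate) / self.total
```

The key property is that `progress()` touches `num_groups` entries instead of `num_groups * steps_per_group`.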
FLEXIBLE CLUSTER FORMATION AND WORKLOAD SCHEDULING
Techniques are disclosed for the cell/cluster formation of compute nodes and workload and processing resource scheduling. Compute nodes within an environment may be grouped (clustered) together to perform one or more designated workload tasks. The clustered compute nodes may be associated with (or assigned to) a workload cell formed to perform one or more identified task(s).
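The grouping step can be illustrated with a small sketch; `form_cell` and its eligibility predicate are assumptions for the example, not the disclosed technique.

```python
def form_cell(nodes, required, predicate):
    """Cluster together `required` compute nodes satisfying `predicate`
    into a workload cell assigned to one designated task; return None
    if too few eligible nodes exist."""
    eligible = [n for n in nodes if predicate(n)]
    if len(eligible) < required:
        return None
    return eligible[:required]
```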
SYSTEM AND METHOD FOR LEVERAGING DISTRIBUTED REGISTER TECHNOLOGY TO MONITOR, TRACK, AND RECOMMEND UTILIZATION OF RESOURCES
Embodiments of the present invention provide a system for leveraging distributed register technology to securely monitor, track, and recommend utilization of resources. The system is configured for gathering one or more input parameters from one or more entity systems, collecting activity data from one or more third party systems, analyzing the activity data collected from the one or more third party systems, generating one or more recommendations based on the one or more input parameters and the analysis of the activity data, wherein the one or more recommendations are associated with one or more activities, estimating resource usage for the one or more recommendations, and allocating resources to the one or more recommendations.
Infrastructure adaptive consistency level mechanism
A system to facilitate infrastructure management is described. The system includes one or more processors and a non-transitory machine-readable medium storing instructions that, when executed, cause the one or more processors to execute an infrastructure management controller to receive first monitoring data indicating a first infrastructure condition occurring at an on-premise infrastructure controller, determine a first load state of the on-premise infrastructure controller based on the first infrastructure condition, and adjust a consistency level of the on-premise infrastructure controller to a first consistency level based on the first load state.
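A load-to-consistency-level mapping like the one described can be sketched as a simple policy function; the thresholds and level names here are assumptions, not values from the disclosure.

```python
def adjust_consistency(cpu_load):
    """Lower the consistency level as controller load rises, so the
    on-premise controller keeps serving requests under pressure."""
    if cpu_load < 0.50:
        return "strong"    # low load: full consistency
    if cpu_load < 0.85:
        return "bounded"   # moderate load: bounded staleness
    return "eventual"      # high load: eventual consistency
```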
Intelligent compute resource selection for machine learning training jobs
Techniques for intelligent compute resource selection and utilization for machine learning training jobs are described. At least a portion of a machine learning (ML) training job is executed a plurality of times using a plurality of different resource configurations, where each of the plurality of resource configurations includes at least a different type or amount of compute instances. A performance metric is measured for each of the plurality of the executions, and can be used along with a desired performance characteristic to generate a recommended resource configuration for the ML training job. The ML training job is executed using the recommended resource configuration.
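The trial-and-recommend loop can be sketched as below. `run_trial` stands in for launching a portion of the real ML training job; the budget-constrained "fastest within cost" policy is one assumed interpretation of a desired performance characteristic.

```python
def recommend_config(configs, run_trial, max_cost_per_hour):
    """Execute a portion of the training job under each resource
    configuration, measure runtime, and return the fastest
    configuration whose hourly cost stays within budget."""
    results = []
    for cfg in configs:
        seconds = run_trial(cfg)  # measured performance metric
        if cfg["cost_per_hour"] <= max_cost_per_hour:
            results.append((seconds, cfg))
    if not results:
        return None
    return min(results, key=lambda r: r[0])[1]
```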
DYNAMIC RENEWABLE RUNTIME RESOURCE MANAGEMENT
A system and method are provided for dynamic renewable runtime resource management in response to flexible resource allocations by a processor. In embodiments, a method includes: calculating, by a processor of a system, a resource consumption value of a first workload by aggregating allocation values of persistent resources currently allocated to the first workload by the processor; determining, by the processor, that the resource consumption value of the first workload is greater than a predefined resource allocation target for the first workload; and temporarily adjusting, by the processor, a renewable runtime resource target of the first workload from an initial target value to a temporary target value based on the resource consumption value.
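One way the adjustment could work is sketched below; the proportional scale-down policy is an assumption, since the abstract only says the temporary target is "based on" the consumption value.

```python
def adjust_runtime_target(allocations, allocation_target, initial_target):
    """Aggregate persistent-resource allocation values into a
    consumption value; if it exceeds the allocation target, temporarily
    shrink the renewable runtime resource target in proportion."""
    consumption = sum(allocations)
    if consumption <= allocation_target:
        return initial_target  # within budget: keep the initial target
    return initial_target * allocation_target / consumption
```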
COGNITIVE SCHEDULER FOR KUBERNETES
Embodiments are directed to deploying a workload on the best/highest performance node. Nodes configured to accommodate a request for a workload are selected. Information is collected on each of the selected nodes and the workload. Predicted response times expected for the workload running on each of the selected nodes are determined. The workload is deployed on a node of the selected nodes, the node having a corresponding predicted response time for the workload, the workload being deployed on the node based at least in part on the corresponding predicted response time.
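The selection step can be sketched as follows; `predict_response_time` stands in for the prediction model built from the collected node and workload information, and the `free_cpu` accommodation check is an illustrative assumption.

```python
def pick_node(nodes, workload, predict_response_time):
    """Among nodes that can accommodate the workload, deploy to the
    node with the lowest predicted response time."""
    candidates = [n for n in nodes if n["free_cpu"] >= workload["cpu"]]
    if not candidates:
        return None
    return min(candidates,
               key=lambda n: predict_response_time(n, workload))
```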
System and method for automatically scaling a cluster based on metrics being monitored
In accordance with an embodiment, described herein is a system and method for use in a distributed computing environment, for automatically scaling a cluster based on metrics being monitored. A cluster that comprises a plurality of nodes or brokers and supports one or more colocated partitions across the nodes, can be associated with an exporter process and alert manager that monitors metrics associated with the cluster. Various metrics can be associated with user-configured alerts that trigger or otherwise indicate the cluster should be scaled. When a particular alert is raised, a callback handler associated with the cluster, for example an operator, can automatically bring up one or more new nodes, that are added to the cluster, and then reassign a selection of existing colocated partitions to the new nodes/brokers, such that computational load can be distributed within the newly-scaled cluster environment.
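The callback handler's scale-out step can be sketched as below. Round-robin reassignment is one simple rebalancing policy, assumed for illustration; the abstract only requires that a selection of existing partitions be reassigned to the new brokers.

```python
def handle_alert(brokers, partitions, new_brokers):
    """On a raised alert, bring up new brokers, add them to the
    cluster, and reassign partitions round-robin across the enlarged
    broker set so load spreads onto the new nodes."""
    brokers = brokers + new_brokers
    assignment = {}
    for i, partition in enumerate(partitions):
        assignment[partition] = brokers[i % len(brokers)]
    return brokers, assignment
```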