G06F9/5083

Data query method, apparatus and device

A method including obtaining resource overheads according to feature information of a received query request; dynamically adjusting a compute node in a resource pool according to the resource overheads and a compute node resource; and querying, by using the compute node, data corresponding to the query request. Because the compute nodes in the resource pool can be dynamically adjusted, they can process all received query requests, which improves both the processing efficiency and the resource utilization rate of the compute nodes. The compute nodes can then process multiple query requests in parallel more efficiently, raising the utilization of CPU, memory, and network bandwidth resources. This achieves a better balance between overall computing resources and user query load, and improves the user experience.
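A minimal sketch of the adjustment step described above. All function names and the overhead formula are hypothetical, assuming overhead is estimated from query features and the pool is grown until total node capacity covers it:

```python
# Hypothetical sketch: scale a compute-node pool to cover the estimated
# resource overhead of incoming query requests.

def estimate_overhead(query_features):
    """Estimate resource overhead (in abstract units) from query features.
    Assumption: overhead grows with rows scanned and number of joins."""
    return query_features["rows_scanned"] // 1_000_000 + 2 * query_features["joins"]

def adjust_pool(pool_size, overhead, capacity_per_node):
    """Return the pool size needed so total capacity covers the overhead."""
    needed = -(-overhead // capacity_per_node)  # ceiling division
    return max(pool_size, needed)

overhead = estimate_overhead({"rows_scanned": 12_000_000, "joins": 3})
pool = adjust_pool(pool_size=2, overhead=overhead, capacity_per_node=4)
print(pool)  # pool grows from 2 to 5 nodes (overhead 18, capacity 4 per node)
```

In a real system the overhead model would be calibrated from past query statistics rather than a fixed formula.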

TECHNIQUES FOR MODIFYING CLUSTER COMPUTING ENVIRONMENTS

Systems, devices, and methods discussed herein are directed to intelligently adjusting the set of worker nodes within a computing cluster. By way of example, a computing device (or service) may monitor performance metrics of a set of worker nodes of a computing cluster. When a performance metric is detected that is below a performance threshold, the computing device may perform a first adjustment (e.g., an increase or decrease) to the number of nodes in the cluster. Training data may be obtained based at least in part on the first adjustment and utilized with supervised learning techniques to train a machine-learning model to predict future performance changes in the cluster. Subsequent performance metrics and/or cluster metadata may be provided to the machine-learning model to obtain output indicating a predicted performance change. An additional adjustment to the number of worker nodes may be performed based at least in part on the output.
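The two-phase flow above (threshold-triggered first adjustment, then model-predicted adjustments) can be sketched as follows. The threshold value, the training tuples, and the nearest-neighbour stand-in for the trained model are all assumptions for illustration:

```python
# Illustrative sketch: react to a low performance metric with a first
# adjustment, record adjustments as training data, then use a trivial
# model to predict the node-count change for future metrics.

THRESHOLD = 0.7  # assumed performance threshold

def first_adjustment(metric, nodes):
    """Add one worker node when the metric dips below the threshold."""
    return nodes + 1 if metric < THRESHOLD else nodes

# Training data gathered from past adjustments: (metric, delta_nodes).
training = [(0.5, 2), (0.6, 1), (0.8, 0), (0.9, 0)]

def predict_delta(metric):
    """1-nearest-neighbour stand-in for the trained machine-learning model."""
    return min(training, key=lambda ex: abs(ex[0] - metric))[1]

nodes = first_adjustment(0.65, nodes=4)  # threshold crossed: 4 -> 5
nodes += predict_delta(0.52)             # model predicts +2: 5 -> 7
print(nodes)  # 7
```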

Efficient high bandwidth shared memory architectures for parallel machine learning and AI processing of large data sets and streams

The present disclosure relates to systems and methods to implement efficient high-bandwidth shared memory systems particularly suited for parallelizing and operating large scale machine learning and AI computing systems necessary to efficiently process high volume data sets and streams.

LEADER ELECTION IN A DISTRIBUTED SYSTEM BASED ON NODE WEIGHT AND LEADERSHIP PRIORITY BASED ON NETWORK PERFORMANCE

Example implementations relate to consensus protocols in a stretched network. According to an example, network performance and/or network latency are continuously monitored among a cluster of a plurality of nodes in a distributed computer system. A leadership priority for each node is set based at least in part on the monitored network performance or network latency, and each node has a vote weight based at least in part on its leadership priority. Each node's vote is biased by that vote weight. The node whose biased vote total is higher than the maximum possible biased vote total received by any other node in the cluster is selected as the leader node.
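A minimal sketch of the weighted-vote tally, assuming one particular biasing scheme (lower observed latency gives a node a higher priority and hence a heavier vote); the weight function is hypothetical:

```python
# Sketch of latency-weighted leader election: each voter's ballot is
# biased by that voter's vote weight, derived from its network latency.

def vote_weight(latency_ms):
    """Assumed scheme: weight is inversely proportional to latency."""
    return 1.0 / latency_ms

def elect(latencies, votes):
    """latencies: {node: ms}; votes: {voter: candidate}. Returns the leader."""
    tally = {}
    for voter, candidate in votes.items():
        tally[candidate] = tally.get(candidate, 0.0) + vote_weight(latencies[voter])
    return max(tally, key=tally.get)

cluster = {"a": 5.0, "b": 10.0, "c": 20.0}
leader = elect(cluster, {"a": "a", "b": "a", "c": "c"})
print(leader)  # "a": biased total 0.2 + 0.1 = 0.3 beats "c" at 0.05
```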

DATA LOCALITY FOR BIG DATA ON KUBERNETES
20230222004 · 2023-07-13 ·

Controlling data locality in a Kubernetes computing environment by establishing a Kubernetes computing environment including a controller and at least one executor pod for running an application, and receiving a request for a task to be run in the Kubernetes computing environment. The controller dispatches a sidecar to collect resource data from the at least one executor pod as input to a directed acyclic graph (DAG) feature analyzer. The DAG feature analyzer identifies, from the at least one executor pod, the best dynamic resource that is available to execute the task. The executor pod with the best available dynamic resource then executes the task in the Kubernetes computing environment.
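A rough sketch of the selection step, assuming the sidecar reports free CPU and memory per executor pod and "best dynamic resource" means the pod with the most free CPU among those that fit the task's request (the fitness rule and all names are assumptions):

```python
# Hypothetical sketch: pick the executor pod whose free resources best
# cover a task's request, mimicking the sidecar + DAG-analyzer flow.

def collect(pods):
    """Stand-in for the sidecar: returns free {'cpu', 'mem'} per pod."""
    return pods

def best_pod(resources, cpu_req, mem_req):
    """Choose the pod with the most free CPU among those that fit."""
    fitting = {p: r for p, r in resources.items()
               if r["cpu"] >= cpu_req and r["mem"] >= mem_req}
    return max(fitting, key=lambda p: fitting[p]["cpu"]) if fitting else None

pods = collect({"exec-1": {"cpu": 2, "mem": 4}, "exec-2": {"cpu": 6, "mem": 8}})
print(best_pod(pods, cpu_req=3, mem_req=4))  # exec-2
```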

CONSENSUS-BASED DISTRIBUTED SCHEDULER
20230221996 · 2023-07-13 ·

Methods and systems for managing workload performance in distributed systems are disclosed. The distributed system may include any number of data processing systems that may perform workloads. To manage workload performance, the distributed system may include a distributed control plane. The distributed control plane may include any number of data processing systems that both receive and service workload requests. When a workload request is received by one of the data processing systems of the control plane, a consensus-based process for selecting one of the data processing systems to perform the workload may be performed. Consequently, the data processing system that received the workload request may or may not perform the workload, depending on the outcome of the consensus-based process.
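One way the selection could work is sketched below. The abstract does not specify the consensus rule, so this assumes a simple majority vote in which each control-plane node nominates the least-loaded node it observes:

```python
# Assumed consensus rule: each control-plane node nominates the least-
# loaded node in its (possibly stale) view; a majority choice wins.

from collections import Counter

def consensus_select(views):
    """views: list of per-node load maps. Returns the majority nominee."""
    nominations = [min(view, key=view.get) for view in views]
    winner, count = Counter(nominations).most_common(1)[0]
    # Require a strict majority of nominations to reach consensus.
    return winner if count > len(views) // 2 else None

views = [  # each control-plane node's view of workload on n1..n3
    {"n1": 0.9, "n2": 0.2, "n3": 0.5},
    {"n1": 0.8, "n2": 0.3, "n3": 0.4},
    {"n1": 0.9, "n2": 0.6, "n3": 0.3},
]
print(consensus_select(views))  # n2 wins with 2 of 3 nominations
```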

SYSTEM AND METHOD FOR METADATA-INFORMED CONTAINER DEPLOYMENT
20230222008 · 2023-07-13 ·

Methods and systems for managing the performance of workloads in a distributed system are disclosed. The distributed system may include any number of clients and deployments where workloads may be performed. The deployments may include different hardware resources, may have different levels of performance, and/or may have other different characteristics that may impact performance of a given workload using any of the deployments. To service the workloads, container instances may be deployed to various deployments. When deciding where to deploy the container instances, the hardware resources of the deployments and/or resource expectations associated with the container instances may be taken into account. By doing so, container instances may be more likely to be deployed to deployments that meet their resource expectations.
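The matching of container resource expectations against deployment hardware can be sketched as below; the metadata keys and the first-fit placement rule are assumptions, not the patented method:

```python
# Minimal sketch: compare a container's resource expectations against
# deployment hardware metadata before placing it (first-fit, assumed).

def place(container, deployments):
    """Return the first deployment whose hardware meets all expectations."""
    for name, hw in deployments.items():
        if all(hw.get(key, 0) >= need for key, need in container["expects"].items()):
            return name
    return None  # no deployment satisfies the resource expectations

deployments = {
    "edge-1": {"cpu": 4, "mem_gb": 8, "gpus": 0},
    "dc-1":   {"cpu": 32, "mem_gb": 128, "gpus": 2},
}
print(place({"expects": {"cpu": 8, "gpus": 1}}, deployments))  # dc-1
```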

Phantom queue link level load balancing system, method and device
11700204 · 2023-07-11 ·

A data processing system includes a phantom queue for each of a plurality of output ports, each associated with an output link for outputting data. The phantom queues receive/monitor traffic on the respective ports and/or the associated links such that the congestion or traffic volume on the output ports/links can be determined by a congestion mapper coupled with the phantom queues. Based on the determined congestion level on each of the ports/links, the congestion mapper selects one or more non-congested or less-congested ports/links as the destination of one or more packets. A link selection logic element then processes the packets according to the selected path or multi-path, thereby reducing congestion on the system.
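A toy model of the phantom-queue mechanism, assuming each phantom queue is a byte counter that fills with observed traffic and drains at a fixed rate, with the mapper picking the shallowest queue (all details assumed):

```python
# Sketch: phantom queues track per-link occupancy; the congestion mapper
# routes each packet to the least-congested output link.

class PhantomQueues:
    def __init__(self, links):
        self.depth = {link: 0 for link in links}

    def observe(self, link, bytes_seen):
        """Observed traffic adds to the phantom depth of that link."""
        self.depth[link] += bytes_seen

    def drain(self, rate):
        """Each phantom queue drains at the link's service rate."""
        for link in self.depth:
            self.depth[link] = max(0, self.depth[link] - rate)

    def least_congested(self):
        """Congestion-mapper step: pick the shallowest phantom queue."""
        return min(self.depth, key=self.depth.get)

pq = PhantomQueues(["eth0", "eth1", "eth2"])
pq.observe("eth0", 1500)
pq.observe("eth1", 9000)
pq.drain(500)
print(pq.least_congested())  # eth2 (depth 0)
```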

Load balancing of machine learning algorithms

A computer-implemented method of executing a plurality of discrete software modules, each including: a machine learning algorithm as an executable software component configurable to approximate a function relating a domain data set to a range data set; a data store; and a message handler as an executable software component arranged to receive input data and communicate output data for the module. The message handler is adapted to determine domain parameters for the algorithm based on the input data and to generate the output data based on a result generated by the algorithm. Each module has an associated metric of resource utilization by the module. The method includes receiving a request for a machine learning task and selecting a module from the plurality of modules for the task based on the metric associated with the module.
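The selection step can be sketched as follows, assuming each module advertises the tasks it can serve and that "based on the metric" means choosing the capable module with the lowest resource utilization (the data shapes and tie-breaking rule are assumptions):

```python
# Sketch of the module-selection step: route a machine learning task to
# the capable module with the lowest resource-utilization metric.

def select_module(modules, task):
    """modules: list of dicts with 'name', 'tasks', 'utilisation' keys."""
    capable = [m for m in modules if task in m["tasks"]]
    if not capable:
        return None
    return min(capable, key=lambda m: m["utilisation"])["name"]

modules = [
    {"name": "regressor-a", "tasks": {"regression"}, "utilisation": 0.7},
    {"name": "regressor-b", "tasks": {"regression", "ranking"}, "utilisation": 0.4},
]
print(select_module(modules, "regression"))  # regressor-b (lower utilisation)
```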

Cluster resource management in distributed computing systems

Techniques are provided for managing resources among clusters of computing devices in a computing system. Resource reassignment messages are generated to indicate that servers are reassigned in response to resource compute loads exceeding or falling below certain thresholds. Techniques also include establishing communications with the reassigned servers to assign compute loads without physically relocating the servers from one cluster to another.
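A minimal sketch of the threshold-driven reassignment, assuming specific upper/lower load thresholds and a rule that moves one server from an underloaded cluster to an overloaded one per pass; the move is logical (a reassignment message), not a physical relocation:

```python
# Sketch (thresholds assumed): logically reassign servers from under-
# loaded clusters to overloaded ones by emitting reassignment messages.

HIGH, LOW = 0.8, 0.3  # hypothetical load thresholds

def rebalance(clusters):
    """clusters: {name: {'load': float, 'servers': [ids]}}. Mutates the
    mapping and returns the reassignment messages generated."""
    messages = []
    overloaded = [c for c, v in clusters.items() if v["load"] > HIGH]
    underloaded = [c for c, v in clusters.items()
                   if v["load"] < LOW and v["servers"]]
    for dst, src in zip(overloaded, underloaded):
        server = clusters[src]["servers"].pop()
        clusters[dst]["servers"].append(server)
        messages.append(f"reassign {server}: {src} -> {dst}")
    return messages

clusters = {
    "cl-a": {"load": 0.9, "servers": ["s1"]},
    "cl-b": {"load": 0.2, "servers": ["s2", "s3"]},
}
print(rebalance(clusters))  # ['reassign s3: cl-b -> cl-a']
```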