Patent classifications
G06F11/3404
Content-sensitive container scheduling on clusters
Content-sensitive computer applications are run on computer hosts in a computer cluster. A computer application running history including but not limited to computer host scores is recorded for the computer cluster. A request to start a container for a content-sensitive computer application type in the computer cluster is received. In response, available computer hosts in the computer cluster are identified. A specific computer host is selected from among the available computer hosts to run the container for the content-sensitive computer application type.
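The selection step described above can be sketched as ranking the available hosts by their recorded scores for the requested application type. The names below (`schedule_container`, `host_scores`) are illustrative, not from the patent, which does not specify data structures.

```python
# Minimal sketch of score-based host selection for a container request.
# All names and the scoring scheme are illustrative assumptions.

def schedule_container(app_type, available_hosts, host_scores):
    """Pick the available host with the best recorded score for app_type.

    host_scores maps (host, app_type) -> score from the running history;
    hosts with no recorded history default to a score of 0.
    """
    if not available_hosts:
        raise RuntimeError("no available hosts in the cluster")
    return max(available_hosts,
               key=lambda h: host_scores.get((h, app_type), 0))

scores = {("host-a", "video-transcode"): 0.9,
          ("host-b", "video-transcode"): 0.4}
chosen = schedule_container("video-transcode", ["host-a", "host-b"], scores)
```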
Method and apparatus of monitoring interface performance of distributed application, device and storage medium
The present disclosure provides a method and apparatus for monitoring the interface performance of a distributed application, a device, and a storage medium, relating to the field of computer technology, and in particular to the field of cloud platforms. The method includes: upon detecting a caller request to call an interface of the distributed application, obtaining performance data of the interface in responding to the caller request; updating a performance data distribution characteristic of the interface according to that performance data, so as to obtain an updated performance data distribution characteristic; and monitoring the interface performance of the distributed application according to the updated performance data distribution characteristic of the interface.
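The abstract does not say which distribution characteristic is maintained; one plausible choice is a running mean and standard deviation of response latency, updated per request and used to flag outliers. The sketch below uses Welford's online algorithm and is an assumption, not the patented method.

```python
import math

class InterfaceStats:
    """Running mean/variance of interface response times (Welford's algorithm).

    Illustrative only: the patent does not specify the distribution
    characteristic; a running mean and standard deviation is one choice.
    """
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0      # sum of squared deviations from the mean

    def update(self, latency_ms):
        # Incorporate one new observation into the running statistics.
        self.n += 1
        delta = latency_ms - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (latency_ms - self.mean)

    def is_anomalous(self, latency_ms, k=3.0):
        # Flag a latency more than k standard deviations from the mean.
        if self.n < 2:
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(latency_ms - self.mean) > k * std
```

Each caller request would call `update` with the measured latency; the monitor then checks new observations against the updated characteristic.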
CONCURRENT DISTRIBUTED GRAPH PROCESSING SYSTEM WITH SELF-BALANCE
Techniques are provided for dynamically self-balancing communication and computation. In an embodiment, each partition of application data is stored on a respective computer of a cluster. The application is divided into distributed jobs, each of which corresponds to a partition. Each distributed job is hosted on the computer that hosts the corresponding data partition. Each computer divides its distributed job into computation tasks. Each computer has a pool of threads that execute the computation tasks. During execution, one computer receives a data access request from another computer. The data access request is executed by a thread of the pool. Threads of the pool are bimodal and may be repurposed between communication and computation, depending on workload. Each computer individually detects completion of its computation tasks. Each computer informs a central computer that its distributed job has finished. The central computer detects when all distributed jobs of the application have terminated.
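The bimodal-thread idea above can be sketched as a worker that services pending remote data-access requests first and otherwise runs local computation tasks, reporting completion when its task queue drains. Queue names and the completion callback are illustrative; a real implementation would run many such workers per pool.

```python
import queue

# Sketch of one bimodal pool thread: communication work (remote data-access
# requests) preempts computation work; when both queues are empty the
# distributed job's tasks are done and the central computer is informed.

def bimodal_worker(comm_q, comp_q, on_done):
    while True:
        try:
            request = comm_q.get_nowait()
            request()            # communication mode: serve the remote access
            continue
        except queue.Empty:
            pass
        try:
            task = comp_q.get_nowait()
            task()               # computation mode: run a local task
        except queue.Empty:
            on_done()            # all local computation tasks finished
            return
```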
Graph-based data multi-operation system
A graph-based data multi-operation system includes a data multi-operation management subsystem coupled to an application and to accelerator subsystems. The data multi-operation management subsystem receives a data multi-operation graph from the application that identifies first data and defines operations for performance on the first data to transform the first data into second data. The data multi-operation management subsystem assigns each of the operations to at least one of the accelerator subsystems, and configures the accelerator subsystems to perform the operations in a sequence that transforms the first data into the second data. When the data multi-operation management subsystem determines a completion status for the performance of the operations by the accelerator subsystems, it transmits a completion status communication to the application that indicates that completion status.
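The sequencing described above amounts to threading the data through an ordered chain of assigned operations and reporting completion back to the application. The sketch below models each accelerator-assigned operation as a plain function; all names are illustrative.

```python
# Sketch of dispatching a chain of operations to accelerator subsystems.
# `operations` is an ordered list of (accelerator, op) pairs; each op is
# modeled here as a plain function standing in for offloaded work.

def run_multi_operation(first_data, operations, notify_app):
    data = first_data
    for accelerator, op in operations:
        data = op(data)        # each stage runs on its assigned accelerator
    notify_app("complete")     # completion status back to the application
    return data                # the resulting second data
```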
Out-of-order processor that avoids deadlock in processing queues by designating a most favored instruction
An instruction sequencing unit in an out-of-order (OOO) processor includes a Most Favored Instruction (MFI) mechanism that designates an instruction as the MFI. The processing queues in the processor identify when they contain the MFI and assure that the MFI is processed. The MFI remains the MFI until it is completed or flushed, at which time the MFI mechanism selects the next MFI.
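A processing queue that favors the MFI can be sketched as one that issues the designated instruction ahead of older entries whenever it is present, guaranteeing its forward progress. The tag-based model below is an illustrative abstraction, not the hardware design.

```python
from collections import deque

class MFIQueue:
    """Processing queue that identifies and favors a Most Favored Instruction.

    Sketch only: a real issue queue tracks instruction tags in hardware;
    here the MFI is identified by a tag set externally by the sequencing
    unit's MFI mechanism.
    """
    def __init__(self):
        self.q = deque()
        self.mfi_tag = None    # designated by the MFI mechanism

    def enqueue(self, tag):
        self.q.append(tag)

    def dequeue(self):
        # Assure forward progress: issue the MFI ahead of all other entries.
        if self.mfi_tag in self.q:
            self.q.remove(self.mfi_tag)
            return self.mfi_tag
        return self.q.popleft()
```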
Allocating operators of a streaming application to virtual machines based on monitored performance
Performance thresholds are defined for operators in a flow graph for a streaming application. A streams manager deploys the flow graph to one or more virtual machines (VMs). The performance of each portion of the flow graph on each VM is monitored. A VM is selected. When the performance of the portion of the flow graph in the selected VM does not satisfy the defined performance threshold(s), a determination is made regarding whether the portion of the flow graph is underperforming or overperforming. When the portion of the flow graph is underperforming, the portion of the flow graph is split into multiple portions that are implemented on multiple VMs. When the portion of the flow graph is overperforming, a determination is made of whether a neighbor VM is also overperforming. When a neighbor VM is also overperforming, the two VMs may be coalesced into a single VM.
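The split/coalesce decision for each monitored VM can be sketched as a small policy function. The threshold band and metric names are illustrative assumptions; the patent defines thresholds per operator portion rather than as a single number.

```python
# Sketch of the per-VM decision: split an underperforming portion of the
# flow graph across VMs, coalesce two overperforming neighbor VMs, and
# otherwise leave the deployment alone. "perf" is a measured throughput.

def plan_action(perf, low, high, neighbor_perf=None):
    if perf < low:
        return "split"                  # underperforming: spread over VMs
    if perf > high:
        if neighbor_perf is not None and neighbor_perf > high:
            return "coalesce"           # both overperforming: merge VMs
        return "keep"                   # overperforming alone: leave as is
    return "keep"                       # within the threshold band
```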
Adaptive datacenter topology for distributed frameworks job control through network awareness
Systems, methods, and computer program products to perform an operation comprising receiving a priority of a distributed computing job, an intermediate traffic type of the distributed computing job, and a set of candidate compute nodes available to process the distributed computing job, the candidate compute nodes each available to process at least one input split of the distributed computing job, and selecting a mapper node from the candidate compute nodes for one of the input splits, wherein the mapper node is selected based on the priority and the intermediate traffic type of the distributed computing job, and wherein the mapper node is further selected upon determining that the mapper node is not affected by an error and that a resource utilization score for the mapper node does not exceed a utilization threshold.
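The selection logic above can be sketched as a filter-then-rank pass over the candidate nodes: drop nodes with errors or over-threshold utilization, then pick among the rest using the job's priority and traffic type. The scoring rule below is an assumption for illustration; the patent does not give a formula.

```python
# Sketch of mapper selection. Each candidate is a dict with "name",
# "has_error", and "utilization" keys (illustrative schema).

def select_mapper(candidates, priority, traffic_type, util_threshold):
    eligible = [n for n in candidates
                if not n["has_error"] and n["utilization"] <= util_threshold]
    if not eligible:
        return None
    # Assumed rule: jobs with heavy intermediate traffic prefer the least
    # utilized node, weighted more strongly for higher-priority jobs.
    weight = priority if traffic_type == "heavy" else 1
    return min(eligible, key=lambda n: weight * n["utilization"])["name"]
```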
Per-block sort for performance enhancement of parallel processors
A method of enhancing performance of an application executing in a parallel processor and a system for executing the method are disclosed. A block size for input to the application is determined. Input is partitioned into blocks having the block size. Input within each block is sorted. The application is executed with the sorted input.
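The four steps above (determine a block size, partition, sort within each block, execute on the sorted input) can be sketched directly. The block size here is passed in; in practice it would be tuned to the parallel processor's characteristics.

```python
# Sketch of the per-block sort: partition the input into fixed-size blocks
# and sort within each block before handing the data to the application.

def per_block_sort(data, block_size):
    out = []
    for i in range(0, len(data), block_size):
        out.extend(sorted(data[i:i + block_size]))   # sort each block locally
    return out
```

Note that only elements within a block are ordered; blocks are not merged, which is what distinguishes this from a full sort.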
Forming root cause groups of incidents in clustered distributed system through horizontal and vertical aggregation
A system and method for the aggregation and grouping of previously identified, causally related abnormal operating conditions observed in a monitored environment is disclosed. Agents deployed to the monitored environment capture data describing structural aspects of the environment, as well as data describing activities performed on it, such as the execution of distributed transactions. The structural data is aggregated into a topology model that describes individual components of the monitored environment, their communication activities, and their resource dependencies, and that also identifies and groups components serving the same purpose, e.g., processes executing the same code. Activity-related monitoring data is continuously analyzed to identify abnormal operating conditions. Data describing abnormal operating conditions is analyzed in combination with the topology data to identify networks of causally related abnormal operating conditions. Causally related abnormal operating conditions are then grouped using known topological resource and same-purpose dependencies, and the identified groups are analyzed to determine their root cause relevance.