Patent classifications
G06F11/3419
DETECTING ANOMALOUS LATENT COMMUNICATIONS IN AN INTEGRATED CIRCUIT CHIP
A method of detecting anomalous latencies in communications between components on an integrated circuit (IC) chip. The method includes: (i) monitoring communications between a first component of the IC chip and other components of the IC chip, each communication comprising a command sent from the first component to another component, and a response received by the first component from that other component, the monitoring comprising: measuring the number of communications in each of a series of monitored time windows, and measuring the latency of each communication in the series of monitored time windows; (ii) calculating a maximum tolerable latency for each operational time window of the first component from the number of communications in that operational time window, an available stall time of the first component in that operational time window, and a latency penalty factor for that operational time window; and (iii) determining a measured latency to be anomalous if the measured latency is greater than the maximum tolerable latency.
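The abstract says the maximum tolerable latency is calculated from three quantities but does not give the formula. The sketch below fills that gap with an assumption: the window's available stall time is split evenly across its communications, then scaled by the latency penalty factor. The `Window` class and all function names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """One operational time window of the first component (times in clock cycles)."""
    latencies: list[int]     # measured latency of each command/response pair
    available_stall: float   # stall time the component can absorb in this window
    penalty_factor: float    # latency penalty factor for this window


def max_tolerable_latency(w: Window) -> float:
    # ASSUMPTION: the stall budget is shared equally among the window's
    # communications and then scaled by the penalty factor.
    n = len(w.latencies)
    if n == 0:
        return float("inf")  # an empty window has nothing to flag
    return w.penalty_factor * w.available_stall / n


def anomalous_latencies(w: Window) -> list[int]:
    """Return the measured latencies that exceed the maximum tolerable latency."""
    limit = max_tolerable_latency(w)
    return [lat for lat in w.latencies if lat > limit]
```

For example, four communications sharing a stall budget of 100 cycles with a penalty factor of 1.0 give a limit of 25 cycles, so only a 40-cycle response would be flagged.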
Failure Prediction Using Informational Logs and Golden Signals
Embodiments relate to a computer platform that supports processing of informational logs and corresponding performance data to detect and mitigate anomalous behavior. Metrics are extracted from the informational logs and correlated with the performance data, which in an exemplary embodiment are golden signal metrics. A window or block of the logs is classified as a potential candidate for, or indicator of, anomalous behavior, which in an embodiment indicates a potential failure or service outage. A control signal is dynamically issued to an operatively coupled device associated with the window or block of logs. The control signal is configured to selectively control the state of a physical device or of a process controlled by software, with the aim of mitigating or eliminating the effect(s) of the anomalous behavior.
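A minimal sketch of the window classification step, assuming one log-derived metric per window (for example an error count) and one golden signal per window (for example p99 latency). Here "correlated" is approximated by requiring both series to deviate strongly from their own baselines in the same window; the patent does not specify this rule, and `candidate_windows`, `monitor`, and the threshold `k` are hypothetical.

```python
import statistics


def zscores(values):
    """Population z-score of each value; all zeros if the series is flat."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def candidate_windows(log_metric, golden_signal, k=1.5):
    """Indices of windows where the log metric and golden signal are both above k sigma."""
    if len(log_metric) != len(golden_signal):
        raise ValueError("both series must cover the same windows")
    z_log = zscores(log_metric)
    z_sig = zscores(golden_signal)
    return [i for i, (a, b) in enumerate(zip(z_log, z_sig)) if a > k and b > k]


def monitor(log_metric, golden_signal, issue_control_signal, k=1.5):
    # issue_control_signal is a caller-supplied hook standing in for the
    # control path to the operatively coupled device.
    for window in candidate_windows(log_metric, golden_signal, k):
        issue_control_signal(window)
```

Using a per-series z-score keeps the two signals on a common scale even though error counts and latencies have different units.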
Print management device and computer readable medium
A print management device includes: an estimation unit that estimates a completion prediction time of a print process that is planned in advance according to a processing capability of a printing device, the estimation unit estimating, in response to occurrence of an abnormality in the printing device, the completion prediction time based on actual performance information on the processing capability from a start of printing by the printing device to the occurrence of the abnormality and a recovery time determined in advance, the recovery time being a time needed for dealing with the abnormality; and a notification unit that issues a notification of the completion prediction time estimated by the estimation unit.
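The estimate can be illustrated with simple arithmetic, assuming the "actual performance information" is the page rate observed between the start of printing and the abnormality, and that printing resumes at that same rate after the predetermined recovery time. The function name and parameters are hypothetical.

```python
def predict_completion(total_pages, pages_done, start_min, abnormal_min, recovery_min):
    """Predicted completion time (minutes) after an abnormality.

    ASSUMPTION: the remaining pages print at the rate actually achieved
    before the abnormality, once the predetermined recovery time has elapsed.
    """
    elapsed = abnormal_min - start_min
    if elapsed <= 0 or pages_done <= 0:
        raise ValueError("need some printed output before the abnormality to estimate a rate")
    rate = pages_done / elapsed          # actual pages per minute so far
    remaining = total_pages - pages_done
    return abnormal_min + recovery_min + remaining / rate
```

For a 1000-page job that printed 400 pages in its first 20 minutes (20 pages per minute) before a fault with a 15-minute recovery time, the prediction is 20 + 15 + 600 / 20 = 65 minutes after the start.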
System and method for identifying SSDs with lowest tail latencies
A storage device is disclosed. The storage device may include storage to store data and a controller to manage reading data from and writing data to the storage. The controller may also include a receiver to receive a plurality of requests, information determination logic to determine information about the plurality of requests, storage for the information about the plurality of requests, and sharing logic to share the information with a management controller.
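As an illustration of how shared request information could be used to identify the SSD with the lowest tail latency, the sketch below assumes each controller records per-request latencies and reports a nearest-rank percentile, and the management controller picks the device with the smallest value. `RequestStats` and `lowest_tail_latency` are hypothetical names, and p99 is an assumed choice of tail.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of samples, for p in (0, 100]."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


class RequestStats:
    """Per-SSD record of request latencies (microseconds), kept by the device controller."""

    def __init__(self):
        self.latencies_us = []

    def record(self, latency_us):
        self.latencies_us.append(latency_us)

    def share(self, p=99.0):
        # Information shared with the management controller: the tail latency.
        return percentile(self.latencies_us, p)


def lowest_tail_latency(devices, p=99.0):
    """Management-controller side: id of the device with the smallest p-th percentile latency."""
    return min(devices, key=lambda dev_id: devices[dev_id].share(p))
```

Sharing a percentile rather than raw samples keeps the management traffic small, at the cost of fixing the percentile in advance.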
Adaptive, proactive RAID rebuild
A data storage system includes a plurality of storage devices organized as a redundant array of inexpensive disks (RAID) storage array and a RAID controller. The RAID controller monitors the plurality of storage devices in the RAID storage array. The RAID controller also detects that a host read request of a host has a latency exceeding a latency threshold. Based on the monitoring, the RAID controller determines whether a proactive rebuild of the data requested by the host read request, in the absence of a data error, would likely be beneficial to performance. Based on determining that a proactive rebuild of the data requested by the host read request would likely be beneficial to performance, the RAID controller initiates the proactive rebuild of the data and sends the requested data to the host.
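One possible form of the "would likely be beneficial" decision, sketched under stated assumptions: the monitoring yields an average latency per device, a rebuild reads every other device in the stripe in parallel (so it is as slow as the slowest peer), and rebuilding is judged beneficial when that estimate beats the slow device's own average. The heuristic and the function name are hypothetical, not the patented method.

```python
def should_proactively_rebuild(observed_ms, threshold_ms, target, stripe, avg_latency_ms):
    """Decide whether to rebuild the requested data from its stripe peers.

    observed_ms:    latency of the pending host read
    target:         device holding the requested data
    stripe:         all devices in the stripe, including target
    avg_latency_ms: monitored average latency per device
    """
    if observed_ms <= threshold_ms:
        return False  # the read is not slow enough to consider a rebuild
    peers = [d for d in stripe if d != target]
    # Peers are read in parallel, so the rebuild waits for the slowest one.
    rebuild_estimate = max(avg_latency_ms[d] for d in peers)
    return rebuild_estimate < avg_latency_ms[target]
```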
Failure Prediction In Distributed Environments
Embodiments of the invention are directed to systems, methods, and devices for detecting failures in distributed systems. A failure detection platform may identify anomalies in time series data, the time series data corresponding to historical network messages. The anomalies can be labeled and used to train a first predictive model. At least one other model (the second model) may be trained using the time series data, the anomaly labels, and a supervised machine-learning algorithm. A third model can be trained to identify a system failure based at least in part on the outputs provided by the first and second models. The third model, once trained, can be used to predict a future system failure.
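The first stage, turning historical network messages into a time series and labeling anomalies, can be sketched as follows. The abstract does not say how anomalies are identified. This example assumes message counts per fixed-width time bucket and a simple z-score rule, and `message_counts`, `label_anomalies`, and `k` are hypothetical. The resulting labels are the kind of input the supervised second model would be trained on.

```python
import statistics
from collections import Counter


def message_counts(timestamps, bucket_s):
    """Count messages per bucket of bucket_s seconds, starting at the earliest timestamp."""
    if not timestamps:
        return []
    start = min(timestamps)
    counts = Counter(int((t - start) // bucket_s) for t in timestamps)
    return [counts[i] for i in range(max(counts) + 1)]


def label_anomalies(series, k=3.0):
    """Label a point 1 if it lies more than k population std devs from the mean, else 0."""
    mu = statistics.mean(series)
    sigma = statistics.pstdev(series)
    if sigma == 0:
        return [0] * len(series)
    return [1 if abs(x - mu) / sigma > k else 0 for x in series]
```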
INTELLIGENT QUERY PLAN CACHE SIZE MANAGEMENT
A method for intelligent query plan cache size management can be implemented. During execution of a plurality of incoming queries in a database management system, the method can measure actual compilation times of generating query execution plans for the plurality of incoming queries. The database management system can have a query execution plan cache which has a size that can store at least some of the query execution plans. The method can monitor differences between the actual compilation times and ideal compilation times of generating query execution plans for the plurality of incoming queries. The ideal compilation times can be estimated by assuming no query execution plan is evicted from the query execution plan cache. The method can adjust the size of the query execution plan cache based on the monitored differences.
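A minimal sketch of the adjustment loop, assuming the ideal compilation time is the cost of compiling each distinct query exactly once (no plan is ever evicted), and that the cache grows or shrinks by a fixed fraction when the relative overhead crosses hypothetical thresholds. Names, thresholds, step size, and the size unit (number of cached plans) are all assumptions.

```python
def ideal_total_ms(query_keys, compile_cost_ms):
    """Ideal compile time: each distinct plan is compiled once and never evicted."""
    return sum(compile_cost_ms[k] for k in set(query_keys))


def adjust_cache_size(current_size, actual_total_ms, ideal_ms,
                      grow_above=0.10, shrink_below=0.02, step=0.25,
                      min_size=64, max_size=1 << 20):
    """Return a new plan-cache size (in plans) from the actual-vs-ideal compile gap."""
    if ideal_ms <= 0:
        return current_size
    overhead = (actual_total_ms - ideal_ms) / ideal_ms
    if overhead > grow_above:
        new_size = current_size * (1 + step)   # evictions are costing recompiles
    elif overhead < shrink_below:
        new_size = current_size * (1 - step)   # cache is larger than needed
    else:
        new_size = current_size
    return int(min(max(new_size, min_size), max_size))
```

The band between `shrink_below` and `grow_above` keeps the size from oscillating when the overhead hovers near a single threshold.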
Enhanced application performance framework
This document describes a framework for measuring and improving the performance of applications, such as distributed applications and web applications. In one aspect, a method includes performing a test on an application. The test includes executing the application on one or more computers and, while executing the application, simulating a set of workload scenarios for which performance of the application is measured during the test. While performing the test, a set of performance metrics that indicate performance of individual components involved in executing the application during the test is obtained. A knowledge graph is queried using the set of performance metrics. The knowledge graph links the individual components to corresponding performance metrics and defines a set of hotspot conditions that are each based on one or more of the corresponding performance metrics for the individual components. A given hotspot condition is detected based on the set of performance metrics.
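A toy version of the knowledge-graph query, assuming the graph is reduced to a mapping from each component to its hotspot conditions, where each condition is a predicate over that component's metrics (so it can combine several metrics). The component names, metric names, and thresholds below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HotspotCondition:
    name: str
    predicate: Callable[[dict[str, float]], bool]  # evaluated on one component's metrics


# Hypothetical graph: component -> hotspot conditions linked to its metrics.
KNOWLEDGE_GRAPH = {
    "db": [HotspotCondition("slow-queries", lambda m: m.get("p95_query_ms", 0) > 200)],
    "web": [HotspotCondition(
        "cpu-saturation",
        lambda m: m.get("cpu_util", 0) > 0.9 and m.get("queue_len", 0) > 10)],
}


def detect_hotspots(metrics):
    """metrics: {component: {metric_name: value}} collected during the test."""
    found = []
    for component, observed in metrics.items():
        for condition in KNOWLEDGE_GRAPH.get(component, []):
            if condition.predicate(observed):
                found.append(condition.name)
    return found
```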
Methods and systems for power failure resistance for a distributed storage system
A plurality of computing devices are communicatively coupled to each other via a network, and each of the plurality of computing devices is operably coupled to one or more of a plurality of storage devices. One or more of the computing devices and/or the storage devices may be used to rebuild data that may be lost due to a power failure.
System and method of smart framework for troubleshooting performance issues
A system for displaying a performance dashboard comprises an input interface, a processor, and an output interface. The input interface is configured to receive log data. The log data comprises a set of process log entries. The processor is configured to determine one or more daemon response times and to determine dashboard information. The dashboard information is based at least in part on the log data and the one or more daemon response times. The output interface is configured to provide the dashboard information.
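One way the processor could derive daemon response times from process log entries, assuming each entry has already been parsed into a `(timestamp_s, daemon, event, request_id)` tuple with `event` equal to `"request"` or `"response"`. The log format, the daemon names in the test, and the function name are hypothetical.

```python
from collections import defaultdict


def daemon_response_times(entries):
    """Mean response time (seconds) per daemon from parsed log entries."""
    started = {}
    durations = defaultdict(list)
    for ts, daemon, event, request_id in entries:
        key = (daemon, request_id)
        if event == "request":
            started[key] = ts
        elif event == "response" and key in started:
            durations[daemon].append(ts - started.pop(key))
    # Responses without a matching request are ignored.
    return {d: sum(v) / len(v) for d, v in durations.items()}
```

The resulting per-daemon means, together with the raw log data, are the kind of input the dashboard information would be built from.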