Patent classifications
G06F11/2263
Data stream fault detection mechanism in an artificial neural network processor
A novel and useful system and methods provide several functional safety mechanisms for use in an artificial neural network (ANN) processor. The mechanisms can be deployed individually or in combination to provide a desired level of safety in neural networks. Multiple strategies are applied involving redundancy by design, redundancy through spatial mapping, as well as self-tuning procedures that modify static (weights) and monitor dynamic (activations) behavior. The various mechanisms of the present invention address ANN system level safety in situ, as a system level strategy that is tightly coupled with the processor architecture. The NN processor incorporates several functional safety concepts which reduce the risk that a failure occurring during operation goes unnoticed. The mechanisms function to detect and promptly flag and report the occurrence of an error, with some mechanisms capable of correction as well. The safety mechanisms cover data stream fault detection, software defined redundant allocation, cluster interlayer safety, cluster intralayer safety, layer control unit (LCU) instruction addressing, weights storage safety, and neural network intermediate results storage safety.
MAZE-DRIVEN SELF-DIAGNOSTICS USING REINFORCEMENT LEARNING
Systems and methods are provided for automatically troubleshooting a computing application (e.g., a cloud-based computing application). An application domain of the computing application is modeled as a two-dimensional array of cells, a first dimension of the array representing components or microservices of the application domain, and a second dimension of the array representing states of the components or microservices, the array including paths between pairs of cells in the array. A troubleshooting goal is defined as a target state of the application domain, the target state corresponding to a target cell in the array. An initial state of the application domain is also provided, the initial state corresponding to an initial cell in the array. A reinforcement-learning-trained machine-learning algorithm can determine a solution path in the array between the initial cell and the target cell. Divergence between a failure case and a solution path indicates a probable failure cause.
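The maze formulation above can be illustrated with a small sketch. It is not the patented method: it assumes a 3 x 4 grid (rows as components, columns as states), one blocked cell, hypothetical reward values, and it uses deterministic tabular Q-value iteration as a stand-in for whatever reinforcement-learning training the abstract has in mind. All function names are invented for illustration.

```python
# Moves between neighbouring cells: change state (column) or component (row).
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(cell, move, rows, cols, blocked, goal):
    """Apply a move; leaving the grid or hitting a blocked cell keeps the agent in place."""
    nxt = (cell[0] + move[0], cell[1] + move[1])
    inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
    if not inside or nxt in blocked:
        return cell, -1.0                      # assumed penalty for an invalid path
    return nxt, (10.0 if nxt == goal else -0.1)  # assumed goal reward and step cost

def learn_q(rows, cols, blocked, goal, sweeps=50, gamma=0.9):
    """Tabular Q-value iteration over every (cell, move) pair."""
    q = {}
    cells = [(r, c) for r in range(rows) for c in range(cols) if (r, c) not in blocked]
    for _ in range(sweeps):
        for cell in cells:
            if cell == goal:
                continue
            for a, move in enumerate(MOVES):
                nxt, reward = step(cell, move, rows, cols, blocked, goal)
                future = 0.0 if nxt == goal else max(q.get((nxt, i), 0.0) for i in range(4))
                q[(cell, a)] = reward + gamma * future
    return q

def solution_path(q, start, goal, rows, cols, blocked):
    """Follow the greedy policy from the initial cell to the target cell."""
    path = [start]
    while path[-1] != goal and len(path) <= rows * cols:
        cell = path[-1]
        best = max(range(4), key=lambda i: q.get((cell, i), float("-inf")))
        path.append(step(cell, MOVES[best], rows, cols, blocked, goal)[0])
    return path

def first_divergence(failure_case, solution):
    """Return (index, observed cell, expected cell) where the two paths first differ."""
    for i, (seen, expected) in enumerate(zip(failure_case, solution)):
        if seen != expected:
            return i, seen, expected
    return None

rows, cols, blocked = 3, 4, {(1, 1)}
start, goal = (0, 0), (2, 3)
q = learn_q(rows, cols, blocked, goal)
path = solution_path(q, start, goal, rows, cols, blocked)

# Hypothetical failure case: the application falls back to an earlier cell.
failure_case = path[:3] + [path[1]]
divergence = first_divergence(failure_case, path)
```

In this sketch the cell at which `first_divergence` reports a mismatch plays the role of the probable failure cause: it names the component (row) and state (column) where the observed run left the learned solution path.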
Analysis of memory sub-systems based on threshold distributions
Disclosed is a system comprising a memory component having a plurality of memory cells capable of being in a plurality of states, each state of the plurality of states corresponding to a value stored by the memory cell, and a processing device, operatively coupled with the memory component, to perform operations comprising: obtaining, for the plurality of memory cells, a plurality of distributions of threshold voltages, wherein each of the plurality of distributions corresponds to one of the plurality of states, classifying each of the plurality of distributions among one of a plurality of classes, generating a vector comprising a plurality of components, wherein each of the plurality of components represents the class of a respective one of the plurality of distributions, and processing, using a classifier, the generated vector to determine a likelihood that the memory component will fail within a target period of time.
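The pipeline in this abstract (distributions, per-distribution classes, a class vector, then a classifier) can be sketched as follows. Everything specific here is an assumption made for illustration: the classes are based on the spread of each distribution, the thresholds and voltage samples are invented, and a logistic function with hand-picked weights stands in for the trained classifier.

```python
import math
from statistics import pstdev

def classify_distribution(samples, narrow=0.05, wide=0.15):
    """Assign a class by spread: 0 = narrow, 1 = medium, 2 = wide (assumed classes)."""
    spread = pstdev(samples)
    if spread < narrow:
        return 0
    if spread < wide:
        return 1
    return 2

def class_vector(distributions):
    """One component per distribution, i.e. per memory-cell state."""
    return [classify_distribution(samples) for samples in distributions]

def failure_likelihood(vector, weights, bias):
    """Logistic score standing in for the trained classifier (weights are hypothetical)."""
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))

# Invented threshold-voltage samples (volts) for three states.
distributions = [
    [1.00, 1.02, 0.98, 1.01],
    [2.00, 2.10, 1.90, 2.05],
    [3.00, 3.40, 2.60, 3.20],
]
vector = class_vector(distributions)
likelihood = failure_likelihood(vector, weights=[0.5, 0.5, 0.5], bias=-2.0)
```

Reducing each distribution to a class before classification keeps the input to the classifier small and fixed in length, one component per state, regardless of how many cells were sampled.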
METHOD FOR AUTOMATICALLY ANALYZING TRANSACTION LOGS OF A DISTRIBUTED COMPUTING SYSTEM
An aspect of the invention relates to a method for automatically analysing a transaction log of a distributed computing system, the transaction log comprising a plurality of lines, the method comprising the following steps: For each line: cutting the line into words; constructing a comparison vector by comparing the line with the other lines of the same size as the line; constructing a pattern from the comparison vector; creating an event per pattern;
Constructing at least one prediction model by training an artificial neural network on a group of training events, the prediction model being configured to predict the next event in the transaction log;
For at least one event: using the prediction model to predict the event from a group of prediction events; generating, from the prediction model, a causal graph of the event comprising a causal relation for each event of the group of prediction events that satisfies a relevance condition.
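The line-to-event steps can be sketched as below. This is one plausible reading of the abstract, not the claimed procedure: "size" is taken to mean word count, the comparison vector is assumed to hold, per word position, the share of same-size lines with the same word at that position, and words below a hypothetical `min_share` threshold become a `*` wildcard. The neural prediction model and the causal graph are left out.

```python
from collections import defaultdict

def build_events(lines, min_share=0.5):
    """Map each log line to an event id via a pattern built from same-size lines."""
    tokenized = [line.split() for line in lines]      # cut each line into words
    by_size = defaultdict(list)
    for words in tokenized:
        by_size[len(words)].append(words)

    events = {}        # pattern -> event id
    line_events = []   # event id of each line, in log order
    for words in tokenized:
        peers = by_size[len(words)]
        # Comparison vector: share of same-size lines (this line included)
        # that carry the same word at each position.
        vector = [sum(p[i] == w for p in peers) / len(peers)
                  for i, w in enumerate(words)]
        # Pattern: keep widely shared words, replace the rest with a wildcard.
        pattern = " ".join(w if share >= min_share else "*"
                           for w, share in zip(words, vector))
        line_events.append(events.setdefault(pattern, len(events)))
    return events, line_events

log = [
    "user alice logged in",
    "user bob logged in",
    "disk sda1 full",
    "user carol logged in",
]
events, line_events = build_events(log)
```

The resulting sequence of event ids is the kind of input a next-event prediction model could be trained on.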
METHODS AND APPARATUS TO ANALYZE PERFORMANCE OF WATERMARK ENCODING DEVICES
Methods, apparatus, systems, and articles of manufacture are disclosed that analyze performance of manufacturer-independent devices. An example apparatus includes a software development kit (SDK) deployment engine to deploy an SDK to a manufacturer of a device, the SDK to define heartbeat data to be collected from the device and interfacing techniques to transmit the heartbeat data to a measurement entity. In some examples, the apparatus includes a machine learning engine to predict whether the device is associated with one or more failure modes. The example apparatus also includes an alert generator to generate an alert based on a prediction, the alert to indicate at least one of a type of a first one of the failure modes or at least one component of the device to be remedied according to the first one of the one or more failure modes, and to transmit the alert to a management agent.
System and method for advanced detection of potential system impairment
Methods and systems for managing deployments are disclosed. A deployment may include one or more devices. The devices may include hardware and/or software components. The operation of the deployment may depend on the operation of these devices and components. To manage the operation of the deployment, a system may include a deployment manager. The deployment manager may obtain logs for components of the deployment reflecting the historical operation of these components and use the logs to predict the future operation of the deployment. Based on the predictions, the deployment manager may take proactive action to reduce the likelihood of the deployment becoming impaired.
Automated system for intelligent error correction within an electronic blockchain ledger
A system for automated and intelligent error correction within an electronic blockchain ledger is provided. The system may analyze unformatted/unstructured blockchain event logs using machine learning algorithms in order to identify and label the errors within the event logs. Based on the identified errors, the system may use predictive analysis in conjunction with error or rule repositories and/or machine learning to identify potential solutions to the identified errors. Once the potential solutions have been identified, the system may automatically attempt to rectify the blockchain transaction errors using the potential solutions. The system may further comprise trend/correlation analyses and reporting functions regarding various metrics and may output said metrics in various accessible formats.
Using an event graph schema for root cause identification and event classification in system monitoring
An event graph schema for a technology landscape may be determined, where the technology landscape is characterized using scores assigned to performance metrics. The event graph schema may include a plurality of nodes corresponding to the performance metrics and the scores, and directional edges connecting node pairs of the plurality of nodes, with each directional edge having a score-dependent validity criterion defined by scores of a corresponding node pair. Anomalous scores associated with an event within the technology landscape may be used to find anomalous nodes. Valid edges connecting two of the anomalous nodes and satisfying their score-dependent validity criterion may be used to determine at least one path that includes the valid edges and connected anomalous nodes. In this way, it is possible to traverse the at least one path to identify at least one of the connected anomalous nodes as a root cause node of the event.
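A minimal sketch of the graph traversal described above follows. The metric names, scores, anomaly threshold, and validity predicates are all hypothetical, edges are assumed to point from cause to effect, and a root cause is taken to be any connected anomalous node with no valid incoming edge. The abstract does not specify these details.

```python
def find_root_causes(scores, edges, threshold=0.8):
    """Return (root cause nodes, valid edges) for one event.

    scores: metric name -> anomaly score in [0, 1]
    edges:  (source, target, is_valid) where is_valid(src_score, dst_score)
            is the edge's score-dependent validity check
    """
    anomalous = {node for node, score in scores.items() if score >= threshold}
    # Keep only edges that join two anomalous nodes and pass their own check.
    valid = [(src, dst) for src, dst, is_valid in edges
             if src in anomalous and dst in anomalous
             and is_valid(scores[src], scores[dst])]
    connected = {node for edge in valid for node in edge}
    has_cause = {dst for _, dst in valid}
    # Walking any path upstream ends at a node nothing else points to.
    return sorted(connected - has_cause), valid

scores = {"db_latency": 0.95, "api_latency": 0.90, "error_rate": 0.85, "cpu": 0.20}
edges = [
    ("db_latency", "api_latency", lambda s, d: s >= d),
    ("api_latency", "error_rate", lambda s, d: s >= d),
    ("cpu", "api_latency", lambda s, d: True),   # dropped: cpu is not anomalous
]
roots, valid_edges = find_root_causes(scores, edges)
```

Here the path `db_latency -> api_latency -> error_rate` survives filtering, and `db_latency` is reported as the root cause because no valid edge leads into it.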
Synthetic system fault generation
Systems, computer-implemented methods, and computer program products that facilitate synthetic system fault generation are provided. According to an embodiment, a system can comprise a processor that executes the following computer-executable components stored in a non-transitory computer readable medium: a generator component that employs a trained artificial intelligence (AI) model to generate a synthetic system fault, represented as a combination of discrete parameters and continuous parameters that define a system state; and a fault assembler component that analyzes the synthetic system fault and generates textual content corresponding to the synthetic system fault.