Patent classifications
G06F11/0793
Error documentation assistance
An error documentation system including tools to collect and analyze application error data for individual development teams and tools to share documented defects and solutions across development teams during any stage of the development cycle. The system may receive and analyze event logs for error events triggered by applications on end-user devices. The system may automatically generate defect tickets and/or ticket entries for defects identified in event logs. The system may train one or more machine learning (ML) models to correlate input with identified defects from a defects database. In response to identifying correlated identified defects, the system may generate ticket entries indicating the correlated identified defects and associated solutions for the defects. The system may provide an interface for users to query the data stored in the database.
Modifying conditions for memory device error corrections operations
In response to a determination that an error rating condition associated with a memory device is satisfied, a first error correction operation is performed at the memory device to correct one or more first errors associated with a first memory access operation at the memory device. A detection is made that at least one of a state of the memory device has changed from a first state to a second state or a behavior of the memory device has changed from a first behavior level to a second behavior level. The error rating condition is modified in view of the at least one of the second state of the memory device or the second behavior level of the memory device. In response to a determination that the modified error rating condition is satisfied, a second error correction operation is performed at the memory device to correct one or more second errors associated with a second memory access operation performed at the memory device.
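The condition-modification step described above can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the state names ("fresh", "worn"), the threshold values, and every function and class name are hypothetical, and the "error rating condition" is assumed here to be a simple bit-error count threshold.

```python
from dataclasses import dataclass

# Assumed mapping from device state to correction threshold (illustrative values).
THRESHOLD_BY_STATE = {"fresh": 8, "worn": 4}


@dataclass
class MemoryDevice:
    """Hypothetical model of a memory device and its error rating condition."""
    state: str = "fresh"
    error_threshold: int = THRESHOLD_BY_STATE["fresh"]


def condition_satisfied(device: MemoryDevice, bit_errors: int) -> bool:
    """The condition is assumed to hold when errors reach the threshold."""
    return bit_errors >= device.error_threshold


def on_state_change(device: MemoryDevice, new_state: str) -> None:
    """Modify the condition when the device moves to a different state."""
    if new_state != device.state:
        device.state = new_state
        device.error_threshold = THRESHOLD_BY_STATE[new_state]


def read_with_correction(device: MemoryDevice, bit_errors: int) -> str:
    """Run error correction only if the current condition is satisfied."""
    return "corrected" if condition_satisfied(device, bit_errors) else "skipped"
```

With these assumed values, a read with five bit errors is left alone while the device is "fresh" and is corrected once the device has been marked "worn".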
Techniques for command execution using a state machine
Techniques for processing a request may include: providing tasks to a state machine framework, wherein the tasks perform processing of a workflow for servicing the request; generating, by the state machine framework, a state machine for processing the request, wherein the state machine includes states associated with the tasks, wherein generating the state machine may include automatically determining a first state transition of the state machine between a first and a second of the states; receiving the request; and responsive to receiving the request, performing first processing using the state machine to service the request. The framework may automatically generate triggers that drive the state machine to determine subsequent states in accordance with defined state transitions. State machine internal state information may be persistently stored and used in restoring the state machine to one of its states in connection with processing of the command.
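One way to picture a framework of this kind is sketched below. It is a simplified illustration under stated assumptions, not the patented technique: transitions are derived automatically from the order in which tasks are supplied, the terminal state name "done" is invented, persistence is reduced to a JSON string, and the class and method names are hypothetical.

```python
import json


class StateMachine:
    """Hypothetical state machine built from an ordered list of (name, task) pairs."""

    def __init__(self, tasks):
        self.handlers = dict(tasks)
        names = [name for name, _ in tasks]
        # Transitions are determined automatically: each state leads to the
        # next task in order, and the last one leads to the terminal "done".
        self.transitions = dict(zip(names, names[1:] + ["done"]))
        self.state = names[0] if names else "done"

    def snapshot(self) -> str:
        """Serialize internal state so it can be stored persistently."""
        return json.dumps({"state": self.state})

    def restore(self, blob: str) -> None:
        """Return the machine to a previously stored state."""
        self.state = json.loads(blob)["state"]

    def run(self, request: dict) -> dict:
        """Drive the machine from its current state until it reaches "done"."""
        while self.state != "done":
            self.handlers[self.state](request)
            self.state = self.transitions[self.state]
        return request


def validate(request):
    request.setdefault("log", []).append("validate")


def execute(request):
    request.setdefault("log", []).append("execute")
```

Restoring a snapshot taken in the "execute" state and calling `run` again would resume at that task rather than repeating "validate", which is the point of persisting the internal state.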
Apparatus configured to perform a repair operation
An apparatus includes a storage area signal generation circuit configured to generate a storage area signal when performing an internal information storage operation and an external information storage operation; and an information storage circuit configured to receive internal failure information, stored in the apparatus, based on the storage area signal and store the received internal failure information as failure information in a set storage capacity, and store external failure information, applied from outside the apparatus, as the failure information in a variable storage capacity.
Systems and methods to identify production incidents and provide automated preventive and corrective measures
Various methods, apparatuses/systems, and media for identifying production incidents and implementing automated preventive and corrective measures are disclosed. A processor automatically triggers, in response to a generated incident of a job/process/host failure, a self-healing service. The processor identifies the application to which the generated event belongs by accessing a database that stores the application and host details; fetches a functional identification (ID) of the application from the database; identifies the type of job failure or service degradation; automatically executes, by utilizing predefined micro services, the steps required for mitigation; records, in response to the executing, the outcome of the mitigation in the database along with the output at each stage of execution; evaluates the outcome of the mitigation by executing health checks using micro services to determine whether the failed job, process, or host is healthy; and closes the incident based on a healthy determination.
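The sequence of steps in this abstract can be sketched as a single function. The sketch makes several assumptions that are not in the source: the database is a plain dictionary keyed by host, micro services are ordinary Python callables, and all field names (such as `functional_id` and `failure_type`) are hypothetical.

```python
def self_heal(incident, app_db, mitigations, health_check):
    """Illustrative self-healing flow; every name here is an assumption."""
    # Identify the application the incident belongs to and its functional ID.
    app = app_db[incident["host"]]
    functional_id = app["functional_id"]

    # Select and execute the mitigation steps for this type of failure,
    # keeping the output of each stage.
    steps = mitigations[incident["failure_type"]]
    outputs = [step(app) for step in steps]

    # Evaluate the outcome with a health check, then record everything.
    healthy = health_check(app)
    incident["record"] = {
        "functional_id": functional_id,
        "outputs": outputs,
        "healthy": healthy,
    }

    # Close the incident only when the target is determined to be healthy.
    incident["status"] = "closed" if healthy else "open"
    return incident


def restart_job(app):
    """Stand-in for a predefined mitigation micro service."""
    app["running"] = True
    return f"restarted {app['name']}"
```

An incident whose health check fails stays open in this sketch, leaving room for escalation to a human operator.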
Data tape media quality validation and action recommendation
Techniques for generating action recommendations for a data tape system are disclosed. A data tape system generates action recommendations for a data tape based on library-based metadata messages as well as a measured data quality value of the data tape. The system initiates an operation resulting in the data tape interacting with a media drive. A data tape library controller generates one or more metadata messages based on a result of a requested operation. The metadata message may include information regarding the type of error and a default recommended course of action. The system generates the recommended action for the data tape using a trained machine learning model.
Reducing false positive failure events in a health monitoring system using directional graphs
Embodiments are described for reducing panic shutdowns of components in a pipelined data processing system. Components are monitored for health, processing progress, and dependencies during normal system operation. A directed graph is generated showing non-circular dependencies of components in the pipeline. Deadlock of a particular component may or may not signal a panic condition, depending on whether any of the downstream components on which it presently depends are operating properly. The continuously monitored knowledge of proper operation of all downstream components is thus used to intelligently apply or defer panic alerts, keeping the system operating uninterrupted by panic conditions that might soon or eventually be fixed by continued operation of the system pipeline.
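The deferral decision can be illustrated with a small graph traversal. The policy below is an assumption made for the sake of the example, not a statement of the claimed method: a stalled component raises a panic only if it has no downstream dependencies or if some transitive dependency is itself not operating properly. The function names and the dictionary-based graph representation are hypothetical.

```python
def transitive_dependencies(component, depends_on):
    """Collect every component reachable through the dependency graph."""
    seen = set()
    stack = list(depends_on.get(component, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(depends_on.get(node, []))
    return seen


def should_panic(component, depends_on, healthy):
    """Decide whether a stalled component warrants a panic alert (assumed policy)."""
    if healthy[component]:
        return False  # nothing to alert on
    deps = transitive_dependencies(component, depends_on)
    # Defer the alert while every dependency is operating properly, since
    # continued pipeline operation may clear the stall on its own.
    return not deps or any(not healthy[d] for d in deps)
```

For a pipeline where `ingest` depends on `dedupe` and `dedupe` depends on `writer`, a stalled `ingest` is tolerated as long as both downstream components are healthy, and triggers an alert once `writer` also fails.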
Identifying and collecting data from assets of a system under evaluation by a system analysis system
An analysis system determines a system aspect of a system, determines an evaluation perspective for use in performing an asset management evaluation on the system aspect relating to a build of the system, determines an evaluation viewpoint corresponding to discovered information of the system, and selects a plurality of data structures identifying data to be collected based thereupon. The analysis system, based upon the system aspect, the evaluation perspective, the evaluation viewpoint, and the plurality of data structures, determines context data. Based upon the plurality of data structures, the analysis system identifies a plurality of physical assets of the system for collection of data and queries the plurality of physical assets of the system to collect data to populate the plurality of data structures. The analysis system evaluates the data structures using the context data to produce an evaluation of at least some of the plurality of physical assets of the system.
Cross-component health monitoring and improved repair for self-healing platforms
Systems, apparatuses and methods may provide for technology that detects a successful boot of a first firmware component in a computing system, receives a signal from a second firmware component in the computing system, and detects an incompatibility of the first firmware component with respect to the second firmware component based on the signal. In one example, only the first firmware component is repaired in response to the incompatibility.
Communication apparatus, communication method, program, and communication system
A CCI (I3C SDR) processing section determines the status of an index when requested to be accessed by an I3C master for a read operation. An error handling section then controls an I3C slave to detect occurrence of an error based on the status of the index and to ignore all communication until communication is restarted or stopped by the I3C master, the I3C slave being further controlled to send a NACK response when performing acknowledge processing on a signal sent from the I3C master. This technology can be applied to the I3C bus, for example.