Patent classifications
G06F11/0784
Method for deterministically reporting cause and effect in software systems
Negative outcomes experienced by a user in a live software system can be automatically, deterministically, and contemporaneously traced back to the root conditions that caused those outcomes, by generating causal event entries in a database for those root conditions as they occur, assigning unique causal IDs to those causal events, and propagating causal IDs alongside the software system state changes that are known to produce negative outcomes and which are effected by those root conditions. By selectively passing causal IDs based on the input and output values of the operation, subsequent causal events and negative outcomes can be linked accurately to causal IDs of parent events, making it simpler to trace negative outcomes for the user back to their root cause events in a software system.
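The selective propagation described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the in-memory `CAUSAL_EVENTS` store, the `Tagged` wrapper, and the `tagged_and` and `trace_root` helpers are all hypothetical names chosen for illustration. The sketch assumes a Boolean AND operation, where a `False` output is attributed only to an input that was itself `False`.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stand-in for the causal event database.
CAUSAL_EVENTS = {}


def record_causal_event(description: str, parent_id: Optional[str] = None) -> str:
    """Store a causal event and return its newly assigned unique causal ID."""
    causal_id = uuid.uuid4().hex
    CAUSAL_EVENTS[causal_id] = {"description": description, "parent": parent_id}
    return causal_id


@dataclass(frozen=True)
class Tagged:
    """A state value carried alongside the causal ID that explains it, if any."""
    value: bool
    causal_id: Optional[str] = None


def tagged_and(a: Tagged, b: Tagged) -> Tagged:
    """Logical AND that forwards only the causal ID of an input that forced a False output."""
    if a.value and b.value:
        return Tagged(True)  # no negative state, so nothing is propagated
    culprit = a if not a.value else b  # first False input is treated as the cause
    return Tagged(False, culprit.causal_id)


def trace_root(causal_id: Optional[str]) -> list:
    """Walk parent links from a causal ID back to the root event."""
    chain = []
    while causal_id is not None:
        event = CAUSAL_EVENTS[causal_id]
        chain.append(event["description"])
        causal_id = event["parent"]
    return chain


# Usage: a root condition causes a child event, which makes a state value False.
root = record_causal_event("sensor cable unplugged")
child = record_causal_event("sensor read timed out", parent_id=root)
sensor_ok = Tagged(False, child)
door_closed = Tagged(True)
can_start = tagged_and(sensor_ok, door_closed)
history = trace_root(can_start.causal_id)
```

Here `history` lists the child event first and the root condition last, which is the chain a user-facing error report would need.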
Information processing system, information processing method, and non-transitory recording medium
An information processing system, an information processing method, and a non-transitory recording medium. The information processing system receives, from a device, the number of write operations performed on one or more memories included in the device and counter information of the device, determines whether there is a malfunction or a probability of malfunction based on the number of write operations received from the device, and, in response to determining the malfunction or the probability of malfunction, identifies software that causes or is likely to cause the malfunction based on the counter information.
Re-initiation of microservices utilizing context information provided via service calls
An apparatus comprises a processing device configured to identify, at a first microservice, a service call that is to be transmitted to a second microservice, and to modify the service call to include context information, the context information characterizing a current state of execution of one or more tasks by one of the first microservice and the second microservice. The processing device is further configured to provide, from the first microservice to the second microservice, the modified service call including the context information. The context information enables re-initiation of said one of the first microservice and the second microservice to continue execution of the one or more tasks from the current state.
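As a rough sketch of the idea in this abstract, a service call can be modeled as a dictionary to which the caller adds a context field. The `attach_context` and `resume_from` functions, the `STEPS` list, and the field names are assumptions made for illustration and do not come from the patent.

```python
import json

# Hypothetical ordered task steps executed by the receiving microservice.
STEPS = ["validate", "reserve", "charge", "notify"]


def attach_context(service_call: dict, task_id: str, completed_steps: list) -> dict:
    """Return a copy of the service call with execution-state context added."""
    modified = dict(service_call)
    modified["context"] = {
        "task_id": task_id,
        "completed_steps": list(completed_steps),
    }
    return modified


def resume_from(service_call: dict) -> list:
    """Return the steps still to run, so a re-initiated service can continue."""
    done = set(service_call.get("context", {}).get("completed_steps", []))
    return [step for step in STEPS if step not in done]


# Usage: the first microservice modifies the call, then sends it over the wire.
original = {"op": "place_order", "order_id": 17}
modified = attach_context(original, "task-1", ["validate", "reserve"])
received = json.loads(json.dumps(modified))  # simulate serialization in transit
remaining = resume_from(received)
```

Because the context travels with the call itself, a restarted instance needs no shared state store to work out where to continue, at least in this simplified model.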
Virtual Machines Recoverable From Uncorrectable Memory Errors
The disclosed technology provides techniques, systems, and apparatus for containing and recovering from uncorrectable memory errors in a distributed computing environment. An aspect of the disclosed technology includes a hypervisor or virtual machine manager that receives signaling of an uncorrectable memory error detected by a host machine. The virtual machine manager then uses information received via the signaling to identify virtual memory addresses or memory pages associated with the corrupted memory element so as to allow for containment and recovery from the error.
METHOD AND APPARATUS FOR THE ENHANCED DIAGNOSTIC COVERAGE OF A SECONDARY DEVICE OF A REDUNDANT CONTROLLER PAIR
A method for use by a primary device associated with a secondary device of a redundant pair. The primary device issues a synchronization request to its control database, causing the primary device to send a tracked memory file storage of the primary device to the secondary device to update the secondary device control database and, periodically on request of the primary device, to send the cached changes made in the primary device to the secondary device to update the secondary device control database. The secondary device uses the updated control database to identify communications connections and paths to I/O modules and peer devices assigned to the secondary device, performs diagnostic testing of the communications connections and paths identified by the interrogation, and sends diagnostic messages upon detection of faults in the communication connections and paths identified.
Service issue prioritisation based on impact using software telemetry
A system is provided herein that can correlate service issues with system telemetry associated with the software session associated with those service issues. Using a statistical approach, the system can evaluate data across numerous software sessions to rank the importance of the reported service issues. To accomplish the ranking, the system can parse the reports of service issues on a periodic basis, can extract telemetry identifiers (IDs) from the logs, can query the telemetry, may compute the relative importance of detected issues (in the context of calls going on for that day), and then can report this impact back to the service issue database.
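One simple way to picture the ranking step is to score each issue by the share of the day's sessions it touched. The abstract does not specify the statistic, so this measure, the `rank_issues` function, and the sample data are assumptions for illustration only.

```python
from collections import Counter


def rank_issues(reports, sessions_today):
    """Rank issues by the fraction of today's sessions in which they were reported.

    reports: iterable of (issue_name, telemetry_id) pairs extracted from issue logs.
    sessions_today: set of telemetry IDs for all sessions seen that day.
    """
    affected = Counter()
    seen = set()
    for issue, telemetry_id in reports:
        pair = (issue, telemetry_id)
        # Count each session at most once per issue, and only if it ran today.
        if telemetry_id in sessions_today and pair not in seen:
            seen.add(pair)
            affected[issue] += 1
    total = len(sessions_today)
    if total == 0:
        return []
    return [(issue, count / total) for issue, count in affected.most_common()]


# Usage with made-up data: session "s9" is not part of today's telemetry.
reports = [
    ("dropped call", "s1"),
    ("dropped call", "s2"),
    ("echo", "s2"),
    ("dropped call", "s1"),
    ("echo", "s9"),
]
ranking = rank_issues(reports, {"s1", "s2", "s3", "s4"})
```

The resulting scores are what would be written back to the service issue database as the impact of each issue.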
Error handling during asynchronous processing of sequential data blocks
A data analytics system stores a data file that includes an ordered set of data blocks. The data blocks can be parsed out of order. An error management module of the data analytics system detects a parse error occurring during parsing of a data block and generates an error message for the parse error. The error message includes unresolved location information indicating a location of the detected parse error in the data block. The error management module resolves the unresolved location information after determining that one or more additional data blocks preceding the data block in the ordered set have been parsed. The error management module generates resolved location information that indicates a location of the parse error in the data file. The error management module updates the error message with the resolved location information and outputs the updated error message.
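The deferred resolution described here can be sketched as follows, assuming that a location is a line number and that a block's file offset is the total line count of all preceding blocks. The `ErrorTracker` class and its method names are hypothetical.

```python
class ErrorTracker:
    """Holds parse errors until their file-level line number can be computed."""

    def __init__(self):
        self.line_counts = {}  # block index -> number of lines in that block
        self.pending = []      # (block index, line within block, message)
        self.resolved = []     # (line within file, message)

    def block_parsed(self, index, line_count):
        """Record that a block finished parsing, then retry pending errors."""
        self.line_counts[index] = line_count
        self._resolve()

    def report_error(self, index, line_in_block, message):
        """Register an error whose location is known only relative to its block."""
        self.pending.append((index, line_in_block, message))
        self._resolve()

    def _resolve(self):
        still_pending = []
        for index, line_in_block, message in self.pending:
            preceding = range(index)
            if all(i in self.line_counts for i in preceding):
                # All earlier blocks are parsed, so the offset is now known.
                offset = sum(self.line_counts[i] for i in preceding)
                self.resolved.append((offset + line_in_block, message))
            else:
                still_pending.append((index, line_in_block, message))
        self.pending = still_pending


# Usage: block 2 fails first, while blocks 0 and 1 are still being parsed.
tracker = ErrorTracker()
tracker.report_error(2, 3, "unexpected delimiter")
tracker.block_parsed(1, 10)
unresolved_after_one_block = list(tracker.pending)
tracker.block_parsed(0, 5)
```

After the last call, the error moves from `pending` to `resolved` with file line 18, that is, 5 + 10 lines of preceding blocks plus line 3 within block 2.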
Clustering of structured log data by key-values
Clustering structured log data by key-values includes receiving, via a user interface, a request to apply an operator to cluster a set of raw log messages according to values for a set of keys associated with the request. At least a portion of each raw log message comprises structured machine data including a set of key-value pairs. It further includes receiving a raw log message in the set of raw log messages. It further includes determining whether to include the raw log message in a cluster based at least in part on an evaluation of values in the structured machine data of the raw log message for the set of keys associated with the request. The cluster is included in a plurality of clusters. Each cluster in the plurality is associated with a different combination of values for the set of keys associated with the request. It further includes providing, via the user interface, information associated with the cluster.
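A minimal sketch of key-value clustering, assuming the structured portion of each raw log message is a JSON object. The `cluster_by_keys` function and the sample messages are illustrative and not taken from the patent.

```python
import json
from collections import defaultdict


def cluster_by_keys(raw_messages, keys):
    """Group raw log messages by their combination of values for the given keys."""
    clusters = defaultdict(list)
    for raw in raw_messages:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip messages with no parseable structured data
        if not isinstance(record, dict):
            continue
        # Each distinct tuple of values identifies one cluster.
        signature = tuple(record.get(key) for key in keys)
        clusters[signature].append(raw)
    return dict(clusters)


# Usage with made-up log lines.
logs = [
    '{"service": "auth", "level": "error", "msg": "token expired"}',
    '{"service": "auth", "level": "error", "msg": "bad password"}',
    '{"service": "cart", "level": "warn", "msg": "slow query"}',
    'not structured at all',
]
clusters = cluster_by_keys(logs, ["service", "level"])
```

Each cluster key is one combination of values, so the two `auth` errors land together regardless of their differing `msg` fields.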
LOGGING MESSAGES IN A BASEBOARD MANAGEMENT CONTROLLER USING A CO-PROCESSOR
Embodiments of this disclosure are directed towards a method of logging messages in a baseboard management controller (BMC) system. The method includes powering on a processing chip of the BMC system, wherein the processing chip has a main processor and a co-processor that is communicatively coupled to a non-transitory processor-readable memory device and a snooping interface. The method further includes booting up the co-processor, and initiating a storage portion of the non-transitory processor-readable memory device and the snooping interface. The method further includes triggering a boot-up of the main processor, and receiving, via the snooping interface, the messages redirected from a communication interface of the BMC system.
REDUCING OVER-REPORTING OF SERVICEABLE EVENTS
Described are techniques including a computer-implemented method of determining, by a service processor, that a first set of callouts of a first error log matches a previous set of callouts of a previous error log. The method further comprises combining the first error log with the previous error log in a first group in a service processor log of the service processor. The method further comprises transmitting information related to the first group to a management console communicatively coupled to the service processor.
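The grouping step can be sketched as below. The abstract does not define what counts as a match, so this example assumes two callout sets match when they contain the same callouts in any order. The `group_error_logs` function and the log fields are hypothetical.

```python
def group_error_logs(error_logs):
    """Combine error logs whose callout sets are identical into one group each."""
    groups = {}
    for log in error_logs:
        key = frozenset(log["callouts"])  # order-insensitive match (an assumption)
        groups.setdefault(key, []).append(log["id"])
    return list(groups.values())


# Usage with made-up logs: logs 1 and 3 call out the same parts.
logs = [
    {"id": 1, "callouts": ["DIMM-A1", "CPU0"]},
    {"id": 2, "callouts": ["FAN-3"]},
    {"id": 3, "callouts": ["CPU0", "DIMM-A1"]},
]
grouped = group_error_logs(logs)
```

Only one entry per group would then be sent to the management console, which is how repeated reports of the same serviceable event are reduced.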