Patent classifications
G06F11/0793
Method for runtime mitigation of software and firmware code weaknesses
A system for mitigating code weaknesses in target code by adding micro functionality fixes. The system includes a mitigation module installed in a memory chip of a device, and a server for identifying a plurality of code weaknesses in target code installed in a memory chip of the device and sending configuration instructions to the mitigation module. The configuration instructions comprise a plurality of micro functionality fixes and a plurality of code weakness locations, each associated with one of the plurality of code weaknesses and one of the plurality of micro functionality fixes. Execution of the mitigation module by at least one processor of the device installs the plurality of micro functionality fixes at the plurality of code weakness locations.
Data processing platform monitoring
A device may receive error data indicating that an error occurred, the error being associated with a data processing job scheduled to be performed by a data processing platform. The device may identify input data for the data processing job associated with the error and determine that the error is based on the data processing platform not receiving the input data. In addition, the device may determine a location of the input data and determine a measure of priority associated with the data processing job. Based on the location of the input data and the measure of priority, the device may perform an action to correct the error.
Determining and implementing recovery actions for containers to recover the containers from failures
A system may include a registration module to register the system with a server cluster, and a resource collector module operatively connected to the registration module, the resource collector module to identify a list of resources for a container running on the server cluster. The system may also include a resource monitor module operatively connected to the resource collector module, the resource monitor module to receive the list of resources for the container, monitor a resource in the list, and generate an event for the container; and an event manager module operatively connected to the resource monitor module, the event manager module to receive the event and determine a recovery action for the container.
Hyper-parameter space optimization for machine learning data processing pipeline
A data processing pipeline may be generated to include an orchestrator node, a preparator node, and an executor node. The preparator node may generate a training dataset. The executor node may execute machine learning trials by applying, to the training dataset, a machine learning model and/or a different set of trial parameters. The orchestrator node may identify, based on a result of the machine learning trials, a machine learning model for performing a task. Data associated with the execution of the data processing pipeline may be collected for storage in a tracking database. A report including de-normalized and enriched data from the tracking database may be generated. The hyper-parameter space of the machine learning model may be analyzed based on the report. A root cause of at least one fault associated with the execution of the data processing pipeline may be identified based on the analysis.
Telematics device with input/output expansion power fault handling
A telematics device coupled to an input/output expander is provided. The telematics device includes a controller, an input/output expander interface for coupling the telematics device to the controller and a memory. The memory has machine-executable programming instructions which configure the telematics device to power-cycle the input/output expander interface by a plurality of power cycles having progressively increasing power-off durations, in response to detecting a power fault condition on the input/output expander interface.
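The power-cycling behavior described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the geometric growth of the power-off duration, the default values, the decision to stop once the fault clears, and the callback names `set_power`, `fault_present`, and `sleep` are all assumptions introduced for the example.

```python
from typing import Callable, Iterator


def power_off_durations(base_s: float, factor: float, cycles: int) -> Iterator[float]:
    """Yield progressively increasing power-off durations, one per power cycle.

    Geometric growth is an assumption; the abstract only requires that the
    durations increase from one cycle to the next.
    """
    if base_s <= 0 or factor <= 1 or cycles < 1:
        raise ValueError("need base_s > 0, factor > 1 and cycles >= 1")
    duration = base_s
    for _ in range(cycles):
        yield duration
        duration *= factor


def recover_expander(
    set_power: Callable[[bool], None],   # hypothetical: switches interface power
    fault_present: Callable[[], bool],   # hypothetical: reads the fault condition
    sleep: Callable[[float], None],      # e.g. time.sleep
    base_s: float = 1.0,
    factor: float = 2.0,
    cycles: int = 4,
) -> bool:
    """Power-cycle the interface until the fault clears or the cycles run out."""
    for off_time in power_off_durations(base_s, factor, cycles):
        set_power(False)
        sleep(off_time)          # stay powered off for the current duration
        set_power(True)
        if not fault_present():  # assumption: stop early once the fault is gone
            return True
    return False
```

Passing `time.sleep` as `sleep` gives real delays; passing a recording function keeps the sketch testable without waiting.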
Resolving erred IO flows
A method for resolving an erred input/output (IO) flow. The method may include (i) sending, over a path, a remote direct write request associated with a certain address range, wherein the path is formed between a compute node of a storage system and a storage drive of the storage system; (ii) receiving, by the compute node, an error message related to the remote direct write request, wherein the error message does not indicate whether execution of the remote direct write request failed or is only temporarily delayed; (iii) responding, by the compute node, to the error message by (a) refraining from sending one or more IO requests through the path, (b) refraining from sending at least one IO request aimed at the certain address range, and (c) requesting, using a management communication link, forced execution of pending IO requests that are related to the path; and (iv) reusing the path, by the compute node, following an indication that there are no pending IO requests related to the path.
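Steps (iii) and (iv) amount to a small piece of bookkeeping on the compute node, sketched below. The class and method names, the half-open `(start, end)` address ranges, and the choice to unblock the address range at the same time as the path are assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import Callable, Set, Tuple

AddrRange = Tuple[int, int]  # hypothetical representation: (start, end), end exclusive


def overlaps(a: AddrRange, b: AddrRange) -> bool:
    """Return True if two half-open address ranges share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


@dataclass
class ComputeNode:
    blocked_paths: Set[str] = field(default_factory=set)
    blocked_ranges: Set[AddrRange] = field(default_factory=set)

    def on_ambiguous_error(
        self, path: str, addr_range: AddrRange, force_pending: Callable[[str], None]
    ) -> None:
        """Step (iii): stop using the path and the range, then force pending IO."""
        self.blocked_paths.add(path)          # (a) no IO through the path
        self.blocked_ranges.add(addr_range)   # (b) no IO aimed at the range
        force_pending(path)                   # (c) request via the management link

    def on_no_pending(self, path: str, addr_range: AddrRange) -> None:
        """Step (iv): reuse the path once no pending IO requests remain on it."""
        self.blocked_paths.discard(path)
        self.blocked_ranges.discard(addr_range)  # assumption: range is released too

    def may_send(self, path: str, addr_range: AddrRange) -> bool:
        """Check whether an IO request may be sent on this path to this range."""
        if path in self.blocked_paths:
            return False
        return not any(overlaps(addr_range, b) for b in self.blocked_ranges)
```

Here `force_pending` stands in for whatever request the management communication link carries; it is passed in as a callback so the sketch stays independent of any transport.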
AUTOMATED TROUBLESHOOTER
The technology described herein generates automated workflows from troubleshooting guides (TSGs). The automated workflow generation process starts with existing TSGs as the input. A first step in the process may be identifying the computer commands in the TSG. In one aspect, the commands are identified using a sequence-to-sequence model. Once a command is identified, it is associated with an application of origin. In aspects, a second model is used to identify the application associated with the command. The second model may use a metric-based meta-learning approach to associate a command with an application. Once the commands are identified and associated with an application, they may be parsed or extracted using a regular expression, which is a special text string describing a search pattern. The structure of the natural text is then parsed to build an executable decision tree, which is merged with the parsed commands.
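The regular-expression parsing step can be illustrated with a short sketch. It assumes the command has already been identified by the upstream model and that it follows a simple `name --flag value` shape; the patterns, the function name `parse_command`, and the sample command are hypothetical and not taken from the source.

```python
import re
from typing import Dict, Optional

# Hypothetical command shape: a name followed by optional "--flag value" pairs.
COMMAND_RE = re.compile(r"^(?P<name>[\w.-]+)(?P<rest>.*)$")
FLAG_RE = re.compile(r"--(?P<flag>[\w-]+)(?:[ =](?P<value>[^\s-][^\s]*))?")


def parse_command(text: str) -> Optional[Dict[str, object]]:
    """Split an already-identified command string into its name and flags."""
    match = COMMAND_RE.match(text.strip())
    if match is None:
        return None
    flags = {
        m.group("flag"): m.group("value")  # value is None for bare switches
        for m in FLAG_RE.finditer(match.group("rest"))
    }
    return {"name": match.group("name"), "flags": flags}


parsed = parse_command("restart-service --name web01 --force")
```

In this sketch `parsed` holds the command name `restart-service` and a flag mapping in which `name` is `web01` and `force` has no value. A structured result of this kind is what a later stage could attach to a node of the decision tree.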
Secure-Erase Prediction for Data Storage Devices
Systems and methods for predicting whether a nonvolatile memory block is likely capable of being securely erased to be eligible for composing into another composable infrastructure are described. A management module receives a secure-erase command to erase at least one nonvolatile memory block, determines health parameters of the nonvolatile memory block, calculates a failure index based on the health parameters, and, based on the failure index, either securely erases the block of memory or retires the nonvolatile memory block.
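The erase-or-retire decision can be sketched as below. The abstract does not say which health parameters are used or how the failure index is computed, so the two parameters, the equal weighting, and the threshold of 0.8 are purely illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class BlockHealth:
    # All fields are hypothetical examples of "health parameters".
    erase_count: int           # program/erase cycles performed so far
    rated_erase_count: int     # endurance the block is rated for
    bit_error_rate: float      # observed raw bit error rate
    max_bit_error_rate: float  # highest error rate considered tolerable


def failure_index(h: BlockHealth) -> float:
    """Combine the health parameters into a score in [0, 1]; higher is worse."""
    wear = min(h.erase_count / h.rated_erase_count, 1.0)
    errors = min(h.bit_error_rate / h.max_bit_error_rate, 1.0)
    return 0.5 * wear + 0.5 * errors  # assumption: equal weighting


def handle_secure_erase(h: BlockHealth, threshold: float = 0.8) -> str:
    """Return the action for a secure-erase command on this block."""
    return "retire" if failure_index(h) >= threshold else "secure_erase"


healthy = BlockHealth(100, 1000, 1e-6, 1e-3)
worn = BlockHealth(950, 1000, 9e-4, 1e-3)
```

With these sample values, `handle_secure_erase(healthy)` selects the secure erase and `handle_secure_erase(worn)` selects retirement.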
STORAGE SYSTEM
A first storage controller includes a first input and output controller that performs input and output processing on host data, and a first management controller. A second storage controller includes a second input and output controller that performs input and output processing on host data, and a second management controller. The first management controller is configured to verify software to be executed by the first management controller and software to be executed by the first input and output controller. The second management controller is configured to verify software to be executed by the second management controller and software to be executed by the second input and output controller. The first management controller is configured to verify the software to be executed by the second input and output controller in place of the second management controller when a failure of the second management controller is detected.
Generating actionable alert messages for resolving incidents in an information technology environment
Machine data reflecting operation of a monitored system is ingested and made available for search by a data intake and query system (DIQS). A monitoring function may search the data ingested by the DIQS to determine instances of notable events with regard to the monitored system and may further determine a defined invokable action message (IAM) associated with a notable event instance. Processing ensues to send the IAM to a communications device used by support personnel. The IAM includes information about an action invocation message (AIM) suitable to cause the performance of an action that may remedy or improve an operational condition represented by the notable event. Support personnel engage a user interface representation corresponding to the AIM, and the AIM is sent to a remedial node where performance of the action is invoked.