G06F2201/86

System and method for facilitating management of cloud infrastructure by using smart bots

A system and method for facilitating management of cloud infrastructure by using smart bots is disclosed. The method includes obtaining one or more insights associated with one or more user accounts on a cloud infrastructure from one or more cloud infrastructure resources, and determining one or more cloud infrastructure issues associated with the one or more user accounts by validating the obtained insights against a set of predefined rules. The method further includes creating one or more customized bots for the determined cloud infrastructure issues based on one or more user parameters by using a rule-engine-based AI model, and deploying the created customized bots on the one or more cloud infrastructure resources. Further, the method includes managing the cloud infrastructure via the deployed customized bots.
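The rule-validation step can be pictured with a minimal Python sketch. Everything in it is hypothetical: the `Insight` and `Rule` types, the two example rules, and `create_bot`, which replaces the rule-engine-based AI model with a plain lookup of a user-chosen action. It only illustrates validating insights against predefined rules to derive issues, then parameterizing one remediation per issue.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Insight:
    """Hypothetical insight gathered from a cloud infrastructure resource."""
    account: str
    resource: str
    metric: str
    value: float

@dataclass
class Rule:
    """Hypothetical predefined rule: flags an issue when `violated` returns True."""
    name: str
    metric: str
    violated: Callable[[float], bool]

# Example rules (assumed for illustration, not taken from the source).
RULES = [
    Rule("idle-instance", "cpu_percent", lambda v: v < 5.0),
    Rule("public-bucket", "public_objects", lambda v: v > 0),
]

def determine_issues(insights, rules):
    """Validate each insight against every matching rule and collect issues."""
    return [
        (ins.account, ins.resource, rule.name)
        for ins in insights
        for rule in rules
        if rule.metric == ins.metric and rule.violated(ins.value)
    ]

def create_bot(issue, user_params):
    """Stand-in for bot creation: a plain lookup replaces the AI model."""
    account, resource, rule_name = issue
    return {
        "account": account,
        "target": resource,
        "issue": rule_name,
        "action": user_params.get(rule_name, "notify"),  # default if user gave none
    }

insights = [
    Insight("acct-1", "vm-7", "cpu_percent", 2.0),
    Insight("acct-1", "bucket-3", "public_objects", 0),
]
issues = determine_issues(insights, RULES)
bots = [create_bot(i, {"idle-instance": "stop"}) for i in issues]
print(bots)
```

Only the idle VM violates a rule here, so a single "stop" bot is produced; deployment and ongoing management are out of scope for the sketch.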

Systems and methods to identify production incidents and provide automated preventive and corrective measures

Various methods, apparatuses/systems, and media for identifying production incidents and implementing automated preventive and corrective measures are disclosed. A processor automatically triggers a self-healing service in response to an incident generated for a job, process, or host failure. The processor identifies the application to which the generated event belongs by accessing a database that stores application and host details; fetches the functional identification (ID) of the application from the database; identifies the type of job failure or service degradation; automatically executes, by utilizing predefined micro services, the steps required for mitigation; records, in response to the execution, the outcome of the mitigation in the database along with the output of each stage of execution; evaluates the outcome of the mitigation by executing health checks using micro services to determine whether the failed job, process, or host is healthy; and closes the incident upon determining that it is healthy.
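The self-healing flow reads as a fixed pipeline, sketched below in Python under loud assumptions: `HOST_DB` stands in for the application/host database, the two mitigation functions stand in for the predefined micro services, and `health_check` is a placeholder that always reports healthy. None of these names come from the source.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the database of application and host details.
HOST_DB = {"host-42": {"app": "payments", "functional_id": "svc_payments"}}

@dataclass
class Incident:
    host: str
    failure_type: str              # e.g. "job_failure" or "service_degradation"
    status: str = "open"
    log: list = field(default_factory=list)

# Hypothetical mitigation "micro services", keyed by failure type.
def restart_job(app, fid):
    return f"restarted job for {app} as {fid}"

def scale_out(app, fid):
    return f"scaled out {app} as {fid}"

MITIGATIONS = {"job_failure": [restart_job], "service_degradation": [scale_out]}

def health_check(host):
    # Placeholder: a real check would probe the job, process, or host.
    return True

def self_heal(incident, db=HOST_DB):
    details = db[incident.host]                       # identify the application
    app, fid = details["app"], details["functional_id"]
    for step in MITIGATIONS[incident.failure_type]:   # execute mitigation steps
        incident.log.append(step(app, fid))           # record each stage's output
    healthy = health_check(incident.host)             # evaluate the outcome
    incident.log.append(f"healthy={healthy}")
    if healthy:
        incident.status = "closed"                    # close only when healthy
    return incident

inc = self_heal(Incident("host-42", "job_failure"))
print(inc.status, inc.log)
```

Keeping the mitigation steps in a table keyed by failure type mirrors the idea of selecting predefined micro services once the failure type is known.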

EVENT VISUALIZATION FOR ASSET CONDITION MONITORING

Systems and methods for asset management are provided. Event data characterizing events experienced by assets distributed among different sites of a fleet is maintained. The event data includes an asset location within an asset hierarchy of the fleet and an event parameter corresponding to the event. A graphical user interface (GUI) is generated that displays a first window including a hierarchical list of assets organized according to their position within the asset hierarchy. When the GUI receives a selection of a level within the hierarchical list, events associated with the selected level can be identified. Identified events can be classified based upon their event data as a unique event having a single occurrence or a repeat event having multiple occurrences. In response to receipt of the selection, the GUI is updated to display a second window listing single entries for respective unique events and single entries for respective repeat events.
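The unique/repeat classification can be sketched in a few lines of Python. The event tuples and the choice of (asset path, event parameter) as the grouping key are assumptions for illustration; the abstract does not say which fields define a repeat event.

```python
from collections import Counter

# Hypothetical event records: (asset path in the hierarchy, event parameter).
EVENTS = [
    (("FleetA", "Site1", "Pump1"), "high_vibration"),
    (("FleetA", "Site1", "Pump1"), "high_vibration"),
    (("FleetA", "Site1", "Pump2"), "overheat"),
    (("FleetA", "Site2", "Fan1"), "high_vibration"),
]

def events_at_level(events, level):
    """Return events whose asset path lies under the selected hierarchy level."""
    n = len(level)
    return [e for e in events if e[0][:n] == level]

def classify(events):
    """Collapse events into single entries: unique (1 occurrence) or repeat (>1)."""
    counts = Counter(events)  # keeps first-seen order
    return [
        {"asset": "/".join(path), "event": param, "count": c,
         "kind": "repeat" if c > 1 else "unique"}
        for (path, param), c in counts.items()
    ]

rows = classify(events_at_level(EVENTS, ("FleetA", "Site1")))
for r in rows:
    print(r)
```

Selecting `("FleetA", "Site1")` yields one repeat entry for Pump1 (count 2) and one unique entry for Pump2; the Site2 event is filtered out.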

METHOD OF MANAGING DEBUGGING LOG IN STORAGE DEVICE

In a method of managing a debugging log in a storage device, an event trigger signal is generated based on an external power supply voltage and a plurality of configuration control signals. The event trigger signal is activated in response to the issuance of an event of interest, so that the debugging log is generated and stored. The debugging log represents information associated with errors occurring in the storage device. The debugging log is generated based on the event trigger signal and is stored in a nonvolatile memory. The event of interest includes at least one of a power-up event, a reset event, a link-up event, a link-down event, or a power-down event.
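The abstract describes a hardware mechanism; the Python model below only mirrors its control flow. Assumptions, none from the source: the configuration control signals are modeled as a set of enabled events, `VOLTAGE_MIN` is an invented supply threshold, a power-down event is taken to coincide with the supply dropping below that threshold, and a list stands in for the nonvolatile memory.

```python
from enum import Enum, auto

class Event(Enum):
    POWER_UP = auto()
    RESET = auto()
    LINK_UP = auto()
    LINK_DOWN = auto()
    POWER_DOWN = auto()

VOLTAGE_MIN = 2.7  # volts; hypothetical minimum external supply voltage

def event_trigger(supply_voltage, config_enables, event):
    """Activate the trigger only for an enabled event of interest.

    Assumed: POWER_DOWN fires when the supply falls below VOLTAGE_MIN;
    all other events require a valid supply.
    """
    if event not in config_enables:
        return False
    if event is Event.POWER_DOWN:
        return supply_voltage < VOLTAGE_MIN
    return supply_voltage >= VOLTAGE_MIN

class DebugLogger:
    def __init__(self):
        self.nvm = []  # stands in for nonvolatile memory

    def on_event(self, supply_voltage, config_enables, event, error_info):
        if event_trigger(supply_voltage, config_enables, event):
            self.nvm.append({"event": event.name, "errors": error_info})

logger = DebugLogger()
enabled = {Event.POWER_UP, Event.LINK_DOWN, Event.POWER_DOWN}
logger.on_event(3.3, enabled, Event.LINK_DOWN, ["CRC error x3"])
logger.on_event(3.3, enabled, Event.RESET, ["not enabled, so not logged"])
logger.on_event(2.1, enabled, Event.POWER_DOWN, ["pending writes: 4"])
print([entry["event"] for entry in logger.nvm])
```

The RESET call is dropped because it is not in the enabled set, which is how the sketch represents gating by configuration control signals.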

APPLICATION FAILURE TRACKING FEATURES

Examples described herein relate to systems and methods for tracking application failures. For instance, a system can comprise a processing resource and a non-transitory machine-readable medium storing instructions executable by the processing resource to: determine when an application on the system is activated; monitor the application to detect an application failure using a tracking feature; take a snapshot of computing information related to the detected failure; determine the coding language of the application associated with the failure; store the failure, the snapshot of computing information, and the coding language to a memory device; and send the stored failure, snapshot, and coding language to a server.
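A minimal Python sketch of the tracking loop follows, assuming a few things not in the source: the "tracking feature" is a `try`/`except` wrapper, the coding language is guessed from the file extension of the innermost failing frame via a hypothetical `EXT_TO_LANG` table, and plain lists stand in for the memory device and the server.

```python
import os
import platform
import time
import traceback

# Hypothetical mapping from source-file extension to coding language.
EXT_TO_LANG = {".py": "Python", ".js": "JavaScript", ".java": "Java", ".c": "C"}

def coding_language(tb):
    """Guess the language from the file of the innermost failing frame."""
    filename = traceback.extract_tb(tb)[-1].filename
    return EXT_TO_LANG.get(os.path.splitext(filename)[1], "unknown")

def track(app, store, send):
    """Run app; on failure, snapshot context, store the record, and send it."""
    try:
        return app()
    except Exception as exc:
        record = {
            "failure": repr(exc),
            "snapshot": {
                "time": time.time(),
                "platform": platform.platform(),
                "traceback": traceback.format_exc(),
            },
            "language": coding_language(exc.__traceback__),
        }
        store.append(record)  # stands in for the memory device
        send(record)          # stands in for sending to the server
        return None

def buggy_app():
    return 1 / 0

memory, outbox = [], []
track(buggy_app, memory, outbox.append)
print(memory[0]["failure"], memory[0]["language"])
```

When run from a `.py` file the language resolves to "Python"; code executed from a string has no file extension and falls back to "unknown".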

CONSTRAINED CARRIES ON SPECULATIVE COUNTERS

A computer-implemented method for constrained carries on speculative counters includes providing one or more speculative counters, each having an upper portion of most significant bits partially embedded in a random-access memory (RAM) array and a pre-counter portion, external to the RAM array, having a plurality of least significant bits. The one or more speculative counters are configured to count a plurality of events of interest during processor core instruction execution. A carry output from the pre-counter portion to the RAM array is suppressed for the duration of a speculative event period.
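The split between a pre-counter and RAM-resident upper bits can be modeled in software. This Python class is a sketch only: the abstract says the carry is suppressed during the speculative period but not what happens to it afterward, so holding suppressed carries in a `pending_carries` field and releasing them when speculation ends is an assumption.

```python
class SpeculativeCounter:
    """Software model: low bits in a pre-counter, high bits in a simulated RAM word."""

    def __init__(self, pre_bits=4):
        self.pre_bits = pre_bits
        self.pre_mask = (1 << pre_bits) - 1
        self.pre = 0              # pre-counter (least significant bits)
        self.ram_upper = 0        # upper portion held in the RAM array
        self.pending_carries = 0  # carries suppressed during speculation (assumed)
        self.speculative = False

    def count(self, n=1):
        for _ in range(n):
            self.pre = (self.pre + 1) & self.pre_mask
            if self.pre == 0:                 # pre-counter wrapped: carry out
                if self.speculative:
                    self.pending_carries += 1  # suppress the RAM update
                else:
                    self.ram_upper += 1

    def begin_speculation(self):
        self.speculative = True

    def end_speculation(self):
        self.speculative = False
        self.ram_upper += self.pending_carries  # release deferred carries
        self.pending_carries = 0

    @property
    def value(self):
        return ((self.ram_upper + self.pending_carries) << self.pre_bits) | self.pre

c = SpeculativeCounter(pre_bits=4)
c.count(15)
c.begin_speculation()
c.count(3)
upper_during = c.ram_upper  # carry held back from the RAM array
c.end_speculation()
print(upper_during, c.ram_upper, c.value)  # 0 1 18
```

The point of the design is that the RAM array is not written while speculation is in flight; the logical count (18) stays correct throughout.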

APPLICATION TUNING BASED ON PERFORMANCE CHARACTERISTICS
20230004403 · 2023-01-05

According to examples, an apparatus may include a processor and a memory storing machine-readable instructions that, when executed by the processor, may cause the processor to receive information regarding a performance characteristic of an application during predetermined time periods. The processor may calculate a rate of change in the performance characteristic over the predetermined time periods. Based on a determination that the performance characteristic of the application has changed over those periods, the processor may tune the values of a set of parameters for the application based on the calculated rate of change.
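A small Python sketch of rate-of-change tuning, with hypothetical choices throughout: the rate is the relative change between consecutive periods, a 5% `threshold` decides whether the characteristic "has changed", and the tuning policy simply scales every integer parameter by the mean rate. The source does not specify any of these.

```python
def rates_of_change(samples):
    """Relative change between consecutive per-period measurements."""
    return [(b - a) / a for a, b in zip(samples, samples[1:])]

def tune(params, samples, threshold=0.05):
    """Scale parameters by the mean rate of change if the characteristic moved.

    Hypothetical policy: e.g. if latency rises 10% per period, grow each
    parameter by 10%.
    """
    rates = rates_of_change(samples)
    mean_rate = sum(rates) / len(rates)
    if abs(mean_rate) < threshold:  # no meaningful change: leave params alone
        return dict(params)
    return {k: round(v * (1 + mean_rate)) for k, v in params.items()}

params = {"worker_threads": 8, "cache_mb": 512}
print(tune(params, [100, 110, 121]))  # {'worker_threads': 9, 'cache_mb': 563}
print(tune(params, [100, 101, 100]))  # unchanged
```

A real tuner would likely map each parameter to its own response curve; a single shared scale factor just keeps the example short.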

Determining and implementing recovery actions for containers to recover the containers from failures

A system may include a registration module to register the system with a server cluster, and a resource collector module, operatively connected to the registration module, to identify a list of resources for a container running on the server cluster. The system may also include a resource monitor module, operatively connected to the resource collector module, to receive the list of resources for the container, monitor a resource in the list, and generate an event for the container. The system may further include an event manager module, operatively connected to the resource monitor module, to receive the event and determine a recovery action for the container.
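The module chain can be sketched in Python. The cluster is faked as a dictionary of per-container resource health flags, and `EventManager.ACTIONS` is an invented policy table; no real orchestrator API is used or implied.

```python
from dataclasses import dataclass

@dataclass
class ContainerEvent:
    container: str
    resource: str
    detail: str

class RegistrationModule:
    """Records that this system registered with a cluster (hypothetical)."""
    def __init__(self):
        self.registrations = set()

    def register(self, system_id, cluster_name):
        self.registrations.add((system_id, cluster_name))

class ResourceCollector:
    """Identifies the resource list for a container."""
    def __init__(self, cluster):
        self.cluster = cluster  # assumed shape: {container: {resource: healthy?}}

    def resources_for(self, container):
        return list(self.cluster[container])

class ResourceMonitor:
    """Receives the resource list, checks each resource, and emits events."""
    def __init__(self, cluster):
        self.cluster = cluster

    def check(self, container, resources):
        return [ContainerEvent(container, r, "unavailable")
                for r in resources if not self.cluster[container][r]]

class EventManager:
    """Maps an event to a recovery action via an assumed policy table."""
    ACTIONS = {"volume": "remount volume", "network": "reattach network"}

    def recovery_action(self, event):
        return self.ACTIONS.get(event.resource, "restart container")

cluster = {"web-1": {"volume": False, "network": True, "cpu": False}}
registration = RegistrationModule()
registration.register("recovery-agent-1", "cluster-a")
collector = ResourceCollector(cluster)
monitor = ResourceMonitor(cluster)
manager = EventManager()

events = monitor.check("web-1", collector.resources_for("web-1"))
actions = [manager.recovery_action(e) for e in events]
print(actions)  # ['remount volume', 'restart container']
```

Falling back to "restart container" for resources without a specific action is one plausible default, not something the abstract prescribes.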

Hyper-parameter space optimization for machine learning data processing pipeline
11544136 · 2023-01-03

A data processing pipeline may be generated to include an orchestrator node, a preparator node, and an executor node. The preparator node may generate a training dataset. The executor node may execute machine learning trials by applying, to the training dataset, a machine learning model and/or a different set of trial parameters. The orchestrator node may identify, based on a result of the machine learning trials, a machine learning model for performing a task. Data associated with the execution of the data processing pipeline may be collected for storage in a tracking database. A report including de-normalized and enriched data from the tracking database may be generated. The hyper-parameter space of the machine learning model may be analyzed based on the report. A root cause of at least one fault associated with the execution of the data processing pipeline may be identified based on the analysis.
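The orchestrator/preparator/executor split can be illustrated with a toy Python pipeline. Assumptions: the "model" is a one-weight linear fit trained by gradient descent, trials are a grid search over learning rate and epochs, a list of dicts stands in for the tracking database (the de-normalized report step is folded into it), and a fault is an MSE above 1.0 or non-finite. The root-cause rule, flagging a parameter value whose trials all faulted, is a deliberately simple stand-in for the analysis the abstract describes.

```python
import itertools
import math
import random

def preparator(n=50, seed=0):
    """Preparator node: toy training set for y = 3x + noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, 3 * x + rng.gauss(0, 0.05)))
    return data

def executor(data, lr, epochs):
    """Executor node: fit y = w*x by gradient descent; return mean squared error."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return sum((w * x - y) * (w * x - y) for x, y in data) / len(data)

def orchestrator(space, fault_threshold=1.0):
    """Run every trial, log it to the tracking 'database', pick the best model."""
    data = preparator()
    tracking_db = []
    for lr, epochs in itertools.product(space["lr"], space["epochs"]):
        mse = executor(data, lr, epochs)
        fault = not math.isfinite(mse) or mse > fault_threshold
        tracking_db.append({"lr": lr, "epochs": epochs, "mse": mse, "fault": fault})
    ok = [r for r in tracking_db if not r["fault"]]
    best = min(ok, key=lambda r: r["mse"]) if ok else None
    return best, tracking_db

def root_cause(tracking_db, params=("lr", "epochs")):
    """Flag parameter values for which every trial faulted."""
    causes = []
    for p in params:
        for v in sorted({r[p] for r in tracking_db}):
            rows = [r for r in tracking_db if r[p] == v]
            if all(r["fault"] for r in rows):
                causes.append((p, v))
    return causes

best, db = orchestrator({"lr": [0.1, 1.0, 50.0], "epochs": [50, 400]})
print(best)
print(root_cause(db))  # [('lr', 50.0)]
```

A learning rate of 50 makes gradient descent diverge on this data, so every trial using it faults and the analysis points to that hyper-parameter value.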

Enhanced tracking of data flows

Disclosed are various embodiments for tracking the flow of data through a network environment. A monitor can detect that a data transaction event has occurred. Then, the monitor can identify the data involved in the data transaction event. Next, a trace identifier can be assigned to the data involved in the data transaction event. Subsequently, a transaction data subset, representing a subset of that data that is subject to a common data processing event, can be identified. Then, a span identifier can be assigned to the transaction data subset. Next, a correlation identifier can be linked to the combination of the span identifier and the trace identifier. Finally, a transaction event record comprising the span identifier and the transaction data subset can be written to a distributed ledger.
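The identifier bookkeeping can be sketched in Python. Assumptions: a single-process, SHA-256 hash-chained list stands in for the distributed ledger, `uuid4` values serve as trace, span, and correlation identifiers, and `processing_groups` (a hypothetical input) names which fields share a common processing event.

```python
import hashlib
import json
import uuid

GENESIS = "0" * 64

class Ledger:
    """Append-only, hash-chained list standing in for a distributed ledger."""

    def __init__(self):
        self.blocks = []

    def append(self, record):
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        body = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self.blocks.append({"prev": prev, "record": record, "hash": digest})

    def verify(self):
        """Recompute every hash; any edited record breaks the chain."""
        prev = GENESIS
        for b in self.blocks:
            body = json.dumps(b["record"], sort_keys=True)
            expected = hashlib.sha256((prev + body).encode()).hexdigest()
            if b["prev"] != prev or b["hash"] != expected:
                return False
            prev = b["hash"]
        return True

correlations = {}  # correlation id -> (trace id, span id)

def on_transaction(ledger, data, processing_groups):
    trace_id = uuid.uuid4().hex              # one trace per transaction event
    for fields in processing_groups:         # subsets sharing a processing event
        subset = {k: data[k] for k in fields}
        span_id = uuid.uuid4().hex[:16]
        correlations[uuid.uuid4().hex] = (trace_id, span_id)
        ledger.append({"span_id": span_id, "subset": subset})
    return trace_id

ledger = Ledger()
trace = on_transaction(ledger, {"user": "u1", "amount": 25, "card": "****1111"},
                       [("user", "amount"), ("card",)])
ok_before = ledger.verify()
ledger.blocks[0]["record"]["subset"]["amount"] = 9999  # simulate tampering
ok_after = ledger.verify()
print(len(ledger.blocks), ok_before, ok_after)  # 2 True False
```

Each record holds only the span identifier and its data subset, as in the abstract; the trace identifier is reachable through the correlation map.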