G06F11/0712

Insider attack resistant system and method for cloud services integrity checking

An insider attack resistant system for providing cloud services integrity checking is disclosed. In particular, the system utilizes an automated integrity checking script and virtual machines to execute a set of operations associated with a service so as to check the integrity of the service. When executing the set of operations, the system may be granted only the minimum level of access to peripherals required for each operation in the set of operations to be executed. After each operation is executed, the system may log the result of each operation and analyze each result to determine whether a failure exists for any of the operations. If a failure exists, the system may determine that a change in an expected system behavior associated with the service has occurred.
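
As a rough illustration of the checking loop described above, the following Python sketch executes each operation with only its declared minimum peripheral access, logs every result, and flags a change in expected behavior on any failure. All names (Operation, check_service_integrity, the example operations) are hypothetical stand-ins, not the disclosed implementation.

    import logging
    from dataclasses import dataclass
    from typing import Callable

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("integrity-check")

    @dataclass
    class Operation:
        name: str
        required_peripherals: frozenset   # minimum peripheral access needed
        run: Callable[[], bool]           # returns True on success

    def check_service_integrity(operations):
        """Run each operation under least-privilege access, log each
        result, and report whether expected system behavior changed."""
        failure_seen = False
        for op in operations:
            # Grant only the peripherals this single operation requires.
            log.info("running %s with access limited to %s",
                     op.name, sorted(op.required_peripherals))
            ok = op.run()
            log.info("result of %s: %s", op.name, "pass" if ok else "FAIL")
            failure_seen = failure_seen or not ok
        if failure_seen:
            log.warning("change in expected system behavior detected")
        return not failure_seen

    ops = [
        Operation("ping-endpoint", frozenset({"network"}), lambda: True),
        Operation("read-config", frozenset({"disk"}), lambda: False),
    ]
    check_service_integrity(ops)  # logs a FAIL and the behavior-change warning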

UTILIZING AUTOMATIC LABELLING, PRIORITIZING, AND ROOT CAUSE ANALYSIS MACHINE LEARNING MODELS AND DEPENDENCY GRAPHS TO DETERMINE RECOMMENDATIONS FOR SOFTWARE PRODUCTS

A device may receive software data identifying current logs and events associated with software products utilized by an entity and may process the software data, with a machine learning model, to generate error severity scores for the software products. The machine learning model may be trained based on historical software data identifying events and logs associated with software products utilized by the entity and based on a combination of historical health scores, historical sentiment scores, and historical dissimilarity scores for the software products. The device may process the error severity scores, with a prioritization model, to generate prioritized error scores and may process the error severity scores and the prioritized error scores, with a root cause analysis model, to generate root cause data identifying root causes associated with the error severity scores. The device may perform one or more actions based on the root cause data.
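
The three-stage flow can be sketched in miniature as below; the weighted combination standing in for the trained severity model, the sort standing in for the prioritization model, and the dependency-graph walk standing in for the root cause analysis model are all illustrative assumptions.

    def severity_score(health, sentiment, dissimilarity):
        # Stand-in for the trained model: combine the three signals the
        # abstract says the model is trained on (weights assumed).
        return 0.5 * (1.0 - health) + 0.3 * (1.0 - sentiment) + 0.2 * dissimilarity

    def prioritize(scores):
        # Stand-in for the prioritization model: highest severity first.
        return sorted(scores, key=scores.get, reverse=True)

    def root_cause(product, dependency_graph):
        # Stand-in for the root cause model: report upstream dependencies.
        return dependency_graph.get(product, [])

    scores = {
        "billing": severity_score(health=0.4, sentiment=0.5, dissimilarity=0.9),
        "auth": severity_score(health=0.9, sentiment=0.8, dissimilarity=0.1),
    }
    deps = {"billing": ["database", "auth"]}
    for product in prioritize(scores):
        print(product, round(scores[product], 2), "root causes:",
              root_cause(product, deps))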

SYSTEM AND METHOD FOR A DISASTER RECOVERY ENVIRONMENT TIERING COMPONENT MAPPING FOR A PRIMARY SITE
20230229551 · 2023-07-20

A method for managing specialized hardware resources includes obtaining, by a disaster recovery (DR) virtual resource agent, a request for a DR environment for a set of virtual resources in a primary site and, in response to the request: monitoring the primary site to obtain virtual workload information corresponding to the set of virtual resources; performing a workload analysis on the set of virtual resources in the primary site using the virtual workload information to obtain a virtual resource mapping of each virtual resource in the primary site to a tiered component in the DR environment; and initiating a DR environment allocation of DR virtual resources based on the virtual resource mapping.
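
The mapping step can be illustrated with the sketch below; the tier names, thresholds, and workload figures are assumptions, not values from the disclosure.

    def map_to_tier(cpu_util, iops):
        # Bucket a virtual resource into a DR tier by observed workload.
        if cpu_util > 0.75 or iops > 10_000:
            return "tier-1-hot"    # low-RTO, fully provisioned standby
        if cpu_util > 0.25:
            return "tier-2-warm"   # periodically synchronized standby
        return "tier-3-cold"       # backup-and-restore only

    workloads = {"vm-web": (0.85, 12_000), "vm-batch": (0.40, 900),
                 "vm-archive": (0.05, 50)}
    mapping = {vm: map_to_tier(cpu, iops)
               for vm, (cpu, iops) in workloads.items()}
    print(mapping)  # drives the per-resource DR environment allocation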

Determining a future operation failure in a cloud system

Examples described relate to determining a future operation failure in a cloud system. In an example, a historical utilization of resources for performing an operation in a cloud system may be determined. A current utilization of resources in the cloud system may be determined. Based on the historical utilization of resources for performing the operation in the cloud system and the current utilization of resources in the cloud system, a determination may be made whether a future performance of the operation in the cloud system is likely to be a failure. In response to a determination that the future performance of the operation in the cloud system is likely to be a failure, an alert may be generated.
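
A minimal sketch of one plausible prediction rule, assuming failure is forecast when current utilization leaves less free capacity than the operation historically consumed; the capacity figure and threshold logic are assumptions.

    CAPACITY = 100.0  # total resource units in the cloud system (assumed)

    def predict_failure(historical_usage, current_usage):
        headroom = CAPACITY - current_usage
        return historical_usage > headroom

    if predict_failure(historical_usage=30.0, current_usage=85.0):
        print("ALERT: future performance of the operation is likely to fail")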

Fast memory mapped IO support by register switch
11693722 · 2023-07-04

The technology disclosed herein enhances a fault-based communication channel between a virtual machine and a hypervisor. An example method may include: configuring, by a hypervisor, a first memory location to generate one or more faults when accessed by a virtual machine process, wherein the first memory location is mapped to a device and a second memory location is mapped to memory; detecting, by the hypervisor, a fault caused by a first execution of an instruction of the virtual machine process, wherein the instruction comprises a reference to a register comprising the first memory location; responsive to the detecting the fault, the hypervisor performing a computing task for the virtual machine process and updating the register to comprise the second memory location; and initiating, by the hypervisor, a second execution of the instruction of the virtual machine process, wherein the second execution of the instruction accesses the second memory location.
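
The register-switch flow can be modeled schematically as below. This is not hypervisor code; the addresses, the fault set, and the handler are assumptions that mirror the claimed sequence (fault on the device-mapped location, hypervisor performs the task and switches the register, retry hits plain memory).

    MMIO_ADDR, RAM_ADDR = 0xFEE00000, 0x00100000
    faulting = {MMIO_ADDR}      # hypervisor configured this location to fault
    register = MMIO_ADDR        # the guest instruction references this register

    def guest_access(addr):
        if addr in faulting:
            raise RuntimeError("fault")   # trap to the hypervisor
        return "value-from-memory"

    def hypervisor_handle_fault():
        global register
        # Perform the device task for the VM, then switch the register so
        # the retried instruction accesses memory instead of re-faulting.
        print("hypervisor: emulating device access")
        register = RAM_ADDR

    try:
        guest_access(register)        # first execution: faults
    except RuntimeError:
        hypervisor_handle_fault()
    print(guest_access(register))     # second execution: reads memory directly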

Data processing platform monitoring

A device may receive error data indicating that an error occurred, the error being associated with a data processing job scheduled to be performed by a data processing platform. The device may identify input data for the data processing job associated with the error and determine that the error is based on the data processing platform not receiving the input data. In addition, the device may determine a location of the input data and determine a measure of priority associated with the data processing job. Based on the location of the input data and the measure of priority, the device may perform an action to correct the error.
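
An illustrative sketch of the final decision step; the action table and priority thresholds below are assumptions, not the disclosed logic.

    def correct_error(input_location, priority):
        # Choose a corrective action from the input's location and the
        # job's measure of priority (thresholds assumed).
        if priority >= 8:
            return f"reroute job to read input directly from {input_location}"
        if priority >= 4:
            return f"retry transfer of {input_location} and reschedule job"
        return "defer job until the next scheduled window"

    print(correct_error(input_location="s3://bucket/daily.csv", priority=9))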

Determining and implementing recovery actions for containers to recover the containers from failures

A system may include a registration module to register the system with a server cluster and a resource collector module operatively connected to the registration module, the resource collector module to identify a list of resources for a container running on the server cluster. The system may also include a resource monitor module operatively connected to the resource collector module, the resource monitor module to receive the list of resources for the container, monitor a resource in the list of resources for the container, and generate an event for the container, and an event manager module operatively connected to the resource monitor module, the event manager module to receive the event and determine a recovery action for the container.
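
The module chain can be sketched as follows; the resource list, the monitored condition, and the recovery-action table are illustrative stand-ins for whatever the modules actually collect and decide.

    RECOVERY_ACTIONS = {
        "memory-exceeded": "restart container with a higher memory limit",
        "volume-missing": "remount volume and restart container",
    }

    def collect_resources(container):
        # Stand-in for the resource collector module.
        return ["memory", "volume"]

    def monitor(container, resource):
        # Stand-in for the resource monitor module: emit an event on failure.
        return "memory-exceeded" if resource == "memory" else None

    for res in collect_resources("web-1"):
        event = monitor("web-1", res)
        if event:
            # Stand-in for the event manager module.
            print("event:", event, "->", RECOVERY_ACTIONS[event])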

Watchdog circuit, circuit, system-on-chip, method of operating a watchdog circuit, method of operating a circuit, and method of operating a system-on-chip

A watchdog circuit for monitoring a plurality of virtual machines provided by one core of a plurality of cores. The watchdog circuit may include a first memory portion, a second memory portion, and a control logic configured to count a number of pulses; to, when starting the watchdog circuit, store a global watchdog counter value in the first memory portion and store a local watchdog counter value for each virtual machine of the plurality of virtual machines in the second memory portion; to, after a predefined number of pulses, modify the global watchdog counter value and the local watchdog counter values; and, if the global watchdog counter value fulfills a predefined global watchdog reference criterion or any of the local watchdog counter values fulfills a predefined local watchdog reference criterion, to output an error signal.
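
A behavioral sketch of the counting logic, assuming counters are decremented once per predefined number of pulses and the reference criterion is a counter reaching zero; starting values are arbitrary.

    PULSES_PER_TICK = 4

    class Watchdog:
        def __init__(self, global_start, local_starts):
            self.global_ctr = global_start          # first memory portion
            self.local_ctrs = dict(local_starts)    # second memory portion
            self.pulses = 0

        def pulse(self):
            """Count one pulse; return True if the error signal asserts."""
            self.pulses += 1
            if self.pulses % PULSES_PER_TICK == 0:
                self.global_ctr -= 1
                for vm in self.local_ctrs:
                    self.local_ctrs[vm] -= 1
            return (self.global_ctr <= 0
                    or any(v <= 0 for v in self.local_ctrs.values()))

    wd = Watchdog(global_start=3, local_starts={"vm0": 5, "vm1": 2})
    while not wd.pulse():
        pass
    print("error signal after", wd.pulses, "pulses")  # vm1 expires first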

Virtual machine fault tolerance

A system and method for providing fault tolerance in virtualized computer systems use a first guest and a second guest running on virtualization software to produce outputs when a workload is executed on the first and second guests. An output of the second guest is compared with an output of the first guest to determine whether there is an output match. If there is no output match, the first guest is paused and a resynchronization of the second guest is executed to restore a checkpointed state of the first guest on the second guest. After the resynchronization of the second guest, the paused first guest is caused to resume operation.
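
A control-flow sketch of the compare-and-resynchronize step; the checkpoint representation and the function name are hypothetical.

    def compare_and_resync(primary_output, secondary_output, primary_checkpoint):
        """Return the action taken for one step of the workload."""
        if primary_output == secondary_output:
            return "match: both guests continue"
        # No match: pause the primary, restore its checkpointed state on
        # the secondary, then resume the paused primary.
        secondary_state = dict(primary_checkpoint)
        return f"resync: secondary restored to {secondary_state}; primary resumed"

    print(compare_and_resync("A", "B", primary_checkpoint={"pc": 1024}))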

Autonomous Error Correction in a Multi-Application Platform
20230059984 · 2023-02-23

An embodiment may involve, based on a pre-defined trigger associated with a particular application, reading error data from a resource that is used by the particular application, wherein persistent storage contains definitions of a plurality of error scenarios, a plurality of fix scripts, and associations between each of the plurality of error scenarios and one or more of the plurality of fix scripts; applying one or more rules to the error data, wherein the rules involve pattern matching or parsing; based on applying the one or more rules, determining a particular error scenario represented in the error data, wherein the particular error scenario is one of the plurality of error scenarios; determining, based on the associations, a particular fix script associated with the particular error scenario, wherein the particular fix script is one of the plurality of fix scripts; and causing execution of the particular fix script.
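
An illustrative sketch of the trigger-to-fix-script path; the regex rules, scenario names, and script paths below are assumptions standing in for the persisted definitions and associations.

    import re

    SCENARIOS = {                       # rule (regex) -> error scenario
        r"OutOfMemoryError": "heap-exhaustion",
        r"connection refused": "service-down",
    }
    FIX_SCRIPTS = {                     # scenario -> associated fix script
        "heap-exhaustion": "scripts/raise_heap_and_restart.sh",
        "service-down": "scripts/restart_service.sh",
    }

    def handle_trigger(error_data):
        # Apply pattern-matching rules to the error data read on a trigger,
        # then return the fix script associated with the matched scenario.
        for pattern, scenario in SCENARIOS.items():
            if re.search(pattern, error_data):
                return FIX_SCRIPTS[scenario]
        return None   # no known scenario matched

    print(handle_trigger("java.lang.OutOfMemoryError: Java heap space"))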