Patent classifications
G06F11/2028
Parallel processing system runtime state reload
A parallel processing system includes at least three processors operating in parallel, state monitoring circuitry, and state reload circuitry. The state monitoring circuitry couples to the at least three parallel processors and is configured to monitor runtime states of the at least three parallel processors and identify a first processor of the at least three parallel processors having at least one runtime state error. The state reload circuitry couples to the at least three parallel processors and is configured to select a second processor of the at least three parallel processors for state reload, access a runtime state of the second processor, and load the runtime state of the second processor into the first processor. Monitoring and state reload may be performed only on sub-systems of the at least three parallel processors, and during reload the clocks and supply voltages of the processors may be altered.
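The monitor/reload flow described above can be sketched in software, with the caveat that the patent describes hardware circuitry; all names here (Processor, find_faulty, reload_state) are illustrative assumptions.

```python
# Illustrative model of the state monitoring and state reload circuitry.
# A processor's runtime state is modeled as a dict; an error flag marks
# a runtime state error detected by the monitoring circuitry.

class Processor:
    def __init__(self, pid, state):
        self.pid = pid
        self.state = dict(state)   # runtime state of monitored sub-systems
        self.error = False         # set when a runtime state error is found

def find_faulty(processors):
    """State monitoring: identify a first processor with a runtime state error."""
    return next((p for p in processors if p.error), None)

def reload_state(processors):
    """State reload: select a healthy second processor, access its runtime
    state, and load that state into the faulty first processor."""
    faulty = find_faulty(processors)
    if faulty is None:
        return None
    donor = next(p for p in processors if not p.error)
    faulty.state = dict(donor.state)  # load donor's runtime state
    faulty.error = False
    return donor.pid
```

A hardware implementation would additionally gate clocks and adjust supply voltages during the reload, which this sketch omits.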
Using a storage path to facilitate disaster recovery
A method, computer program product, and computing system for using a storage path to facilitate disaster recovery are described. A method may comprise receiving a selection of the storage path to facilitate access to a cloud storage device by a cloud computing client. The method may further comprise configuring the storage path to facilitate access to the cloud storage device by the cloud computing client, wherein the storage path is dedicated to the cloud computing client, and wherein a cloud computing site infrastructure is inaccessible to the cloud computing client via the storage path. The method may also comprise configuring a backup routine to generate a backed-up client resource and transmit the backed-up client resource to the cloud storage device via the storage path. The method may additionally comprise transmitting a list of backed-up client resources stored at the cloud storage device.
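The method above can be sketched as follows; the class and function names (CloudStorageDevice, StoragePath, backup_routine) are hypothetical and stand in for the claimed components.

```python
# Minimal sketch: a dedicated storage path carries client backups to a
# cloud storage device, and a backup routine produces the list of
# backed-up client resources stored at the device.

class CloudStorageDevice:
    def __init__(self):
        self.objects = {}  # backed-up client resources by name

class StoragePath:
    """Dedicated to one client; exposes no cloud site infrastructure."""
    def __init__(self, client_id, device):
        self.client_id = client_id
        self.device = device

    def transmit(self, name, data):
        self.device.objects[name] = data

def backup_routine(path, resources):
    """Generate backed-up client resources, send each via the storage
    path, and return the list of resources stored at the device."""
    for name, data in resources.items():
        path.transmit(name, data)
    return sorted(path.device.objects)
```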
Optimized relocation of data based on data characteristics
A command is transmitted to a storage device to relocate first data that partially fills a first erase block of the storage device and second data that partially fills a second erase block of the storage device to a third erase block of the storage device, wherein the command causes the relocation of the first data and the second data without sending the first data and the second data to a storage controller. An acknowledgement that the first data and the second data have been stored at the third erase block is received from the storage device.
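A toy model of the device-side relocation may help; the class and method names are illustrative assumptions, not the patent's terminology.

```python
# Sketch: valid data from two partially filled erase blocks is merged
# into a third block entirely inside the device, so the data never
# travels back through the storage controller.

class StorageDevice:
    def __init__(self):
        self.erase_blocks = {}  # block number -> list of data items

    def relocate(self, src_a, src_b, dst):
        """Handle the relocation command on the device and return an
        acknowledgement once the data is stored at the third block."""
        merged = (self.erase_blocks.pop(src_a, []) +
                  self.erase_blocks.pop(src_b, []))
        self.erase_blocks[dst] = merged
        return "ack"
```

The design point is data-path economy: the controller issues one command and receives one acknowledgement, while the bulk data movement stays within the device.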
Dynamic node insertion of secondary services for high-availability during main decision failure at runtime
There are provided systems and methods for dynamic node insertion of secondary services for high-availability during main decision failure at runtime. A service provider, such as an electronic transaction processor for digital transactions, may utilize different decision services that implement rules and/or artificial intelligence models for decision-making on data, including data in a production computing environment. A main decision service may normally be used for data processing and decision-making. However, at certain times, the main decision service may fail, for example when a data processing node fails to process data or times out while processing a data processing request during electronic transaction processing. During this runtime, a dynamic injection processor may dynamically inject a node that performs a call to a secondary service to process the data on behalf of the node and/or main decision service so that a response is provided to the data processing request.
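The fallback behavior can be sketched as below; the function names and the use of exceptions to model node failure or timeout are assumptions for illustration.

```python
# Sketch of dynamic node insertion: when the main decision service fails
# or times out, an injected node routes the same request to a secondary
# decision service so the caller still receives a response.

def decide_with_fallback(request, main_service, secondary_service):
    """Try the main decision service; inject the secondary on failure."""
    try:
        return main_service(request)
    except (TimeoutError, RuntimeError):
        # dynamically inserted node: call the secondary service instead
        return secondary_service(request)

def failing_main(request):
    raise TimeoutError("main decision service timed out")

def secondary(request):
    return {"request": request, "decision": "approve", "source": "secondary"}
```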
Reliability Availability Serviceability (RAS) service framework
Examples described herein relate to execution of multiple Reliability Availability Serviceability (RAS) processes on different processors of at least two processors to provide fallback from a first RAS process to a second RAS process executing on a processor of the at least two processors based on failure or timeout of the first RAS process. In some examples, the different processors comprise independently operating processors whereby failure or inoperability of one of the different processors is independent of another of the different processors. In some examples, failure or timeout of the first RAS process comprises failure of the second RAS process to receive an operating status signal from the first RAS process.
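The heartbeat-based detection in the last sentence can be sketched as follows; the function name, timeout policy, and use of timestamps are illustrative assumptions.

```python
# Sketch: the second RAS process declares the first failed when no
# operating status signal (heartbeat) arrives within a timeout, and the
# framework falls back to the second RAS process.

def select_active_ras(last_heartbeat, now, timeout, primary, fallback):
    """Return the RAS process that should be active.

    last_heartbeat: time of the most recent status signal from the
    primary, or None if none was ever received.
    """
    if last_heartbeat is None or now - last_heartbeat > timeout:
        return fallback  # first RAS process failed or timed out
    return primary
```

Because the two RAS processes run on independently operating processors, a failure that takes down the primary's processor does not also take down the process making this decision.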
Locality based quorums
Disclosed are various embodiments for distributing data items within a plurality of nodes. A data item that is subject to a data item update request is updated from a master node to a plurality of slave nodes. The update of the data item is determined to be locality-based durable based at least in part on acknowledgements received from the slave nodes. Upon detection that the master node has failed, a new master candidate is determined via an election among the plurality of slave nodes.
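A small sketch of the quorum check and election follows; the exact durability rule (acknowledgements from at least two distinct localities) and the election criterion (most up-to-date slave) are assumptions for illustration.

```python
# Sketch of locality-based quorums: an update counts as durable once
# slaves in enough distinct localities acknowledge it, and a failed
# master is replaced by election among the slaves.

def locality_durable(acks, node_locality, min_localities=2):
    """acks: ids of acknowledging slaves; node_locality: id -> locality."""
    localities = {node_locality[n] for n in acks}
    return len(localities) >= min_localities

def elect_new_master(candidates, applied_index):
    """Election among slaves: prefer the most up-to-date candidate,
    breaking ties by node id for determinism."""
    return max(candidates, key=lambda n: (applied_index[n], n))
```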
Method and system to implement cluster failure prediction to facilitate split brain resolution
Described is a system, method, and computer program product for performing elections in a database cluster, where system resource statistics information is used to predict a cluster node failure. Resource statistics data is classified and used to identify anomalies. The anomalies can be used to determine the probability of a cluster node failure and to then elect a new master node and/or surviving sub-cluster.
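The prediction step can be sketched as below; the specific anomaly test (a z-score threshold) and the mapping from anomaly rate to failure probability are assumptions standing in for the classification described above.

```python
# Sketch: classify resource statistics samples, flag anomalies, and turn
# the anomaly rate into a rough node-failure probability that can feed
# the election of a new master node or surviving sub-cluster.

def zscore_anomalies(samples, threshold=2.0):
    """Flag samples more than `threshold` standard deviations from the mean."""
    n = len(samples)
    mean = sum(samples) / n
    std = (sum((x - mean) ** 2 for x in samples) / n) ** 0.5
    if std == 0:
        return []
    return [x for x in samples if abs(x - mean) / std > threshold]

def failure_probability(samples):
    """Fraction of anomalous samples, used as a crude failure predictor."""
    return len(zscore_anomalies(samples)) / len(samples)
```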
Targeted repair of hardware components in a computing device
A method for targeted repair of a hardware component in a computing device that is part of a cloud computing system includes monitoring a plurality of hardware components in the computing device. At some point, a defective sub-component within the hardware component of the computing device is identified. In addition to the defective sub-component, the hardware component also includes at least one sub-component that is functioning properly and a spare component that can be used in place of the defective sub-component. The method also includes initiating a targeted repair action while the computing device is connected to the cloud computing system. The targeted repair action prevents the defective sub-component from being used by the computing device without preventing sub-components that are functioning properly from being used by the computing device. The targeted repair action causes the spare component to be used in place of the defective sub-component.
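The repair action can be modeled as below; the class and method names are hypothetical, and the set-based bookkeeping stands in for whatever fencing mechanism the device actually uses.

```python
# Sketch of a targeted repair action: only the defective sub-component
# is prevented from being used, the spare is mapped in its place, and
# healthy sub-components remain in service while the device stays
# connected to the cloud computing system.

class HardwareComponent:
    def __init__(self, sub_components, spare):
        self.active = set(sub_components)  # usable by the computing device
        self.spare = spare                 # held in reserve

    def targeted_repair(self, defective):
        """Fence off the defective sub-component and substitute the spare."""
        if defective not in self.active:
            raise ValueError("sub-component not active")
        self.active.discard(defective)  # prevent use of the defective part
        self.active.add(self.spare)     # spare used in its place
        return self.active
```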
High availability for a relational database management system as a service in a cloud platform
A Relational Database Management System (“RDBMS”) as a service cluster may include a master RDBMS Virtual Machine (“VM”) node associated with an Internet Protocol (“IP”) address and a standby RDBMS VM node associated with an IP address. The RDBMS as a service (e.g., PostgreSQL as a service) may also include n controller VM nodes each associated with an IP address. An internal load balancer may receive requests from cloud applications and include a frontend IP address different than the RDBMS as a service IP addresses and a backend pool including indications of the master RDBMS VM node and the standby RDBMS VM node. A Hyper-Text Transfer Protocol (“HTTP”) custom probe may transmit requests for the health of the master RDBMS VM node and the standby RDBMS VM node via the associated IP addresses, and responses to the requests may be used in connection with a failover operation.
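The probe-driven failover can be sketched as follows; the function name, the probe callable, and the IP addresses are illustrative assumptions rather than the patent's interface.

```python
# Sketch: the load balancer probes the master and standby RDBMS VM
# nodes over their IP addresses and, when the master is unhealthy,
# directs traffic to the standby (the failover operation).

def choose_backend(probe, master_ip, standby_ip):
    """probe(ip) -> True if the RDBMS VM node at ip reports healthy."""
    if probe(master_ip):
        return master_ip
    if probe(standby_ip):
        return standby_ip  # failover: standby serves the backend pool
    raise RuntimeError("no healthy RDBMS VM node")
```

In the described system the probe is an HTTP custom probe; here it is abstracted to any callable returning a health boolean.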
Method and system for generating latency aware workloads using resource devices in a resource device pool
A method for managing data includes obtaining, by a management module, a workload generation request, wherein the workload generation request specifies a plurality of resource devices, identifying available resource devices in a resource device pool based on the plurality of resource devices, performing a latency analysis on the available resource devices to obtain a plurality of resource device combinations and a total latency cost of each resource device combination, and selecting a resource device combination of the plurality of resource device combinations based on the total latency cost of each resource device combination, wherein the resource device combination comprises a second plurality of resource devices and wherein each of the second plurality of resource devices is one of the plurality of resource devices.