Patent classifications
G06F11/0757
Adaptive, proactive RAID rebuild
A data storage system includes a plurality of storage devices organized as a redundant array of inexpensive disks (RAID) storage array and a RAID controller. The RAID controller monitors the plurality of storage devices in the RAID storage array. The RAID controller also detects that a read request from a host has a latency exceeding a latency threshold. Based on the monitoring, the RAID controller determines whether a proactive rebuild of the data requested by the host read request, in the absence of a data error, would likely be beneficial to performance. Based on determining that a proactive rebuild of the requested data would likely be beneficial to performance, the RAID controller initiates the proactive rebuild of the data and sends the requested data to the host.
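As a rough sketch of the decision described above, the Python below times a host read and, when the latency threshold is exceeded, consults monitoring statistics to judge whether a proactive rebuild would likely help. The 50 ms threshold, the retry-count heuristic, and all names are illustrative assumptions, not details taken from the patent.

```python
import time

LATENCY_THRESHOLD_S = 0.050   # hypothetical 50 ms host-read latency threshold


class RaidController:
    """Sketch of the latency-triggered proactive rebuild decision."""

    def __init__(self, device_stats):
        # device_stats: per-device counters gathered by background monitoring,
        # e.g. {"disk0": {"retries": 250}}
        self.device_stats = device_stats

    def rebuild_likely_beneficial(self, device_id):
        # Stand-in heuristic: a device logging many internal retries is likely
        # causing the latency, so rebuilding its data elsewhere should help.
        return self.device_stats[device_id]["retries"] > 100

    def host_read(self, device_id, read_fn):
        start = time.monotonic()
        data = read_fn()   # the normal RAID read path
        if time.monotonic() - start > LATENCY_THRESHOLD_S:
            if self.rebuild_likely_beneficial(device_id):
                print(f"initiating proactive rebuild for {device_id}")
        return data        # the requested data goes to the host either way


def slow_read():
    time.sleep(0.06)       # simulate a degraded device
    return b"payload"


controller = RaidController({"disk0": {"retries": 250}})
controller.host_read("disk0", slow_read)
```

Note the read is never blocked on the rebuild: the rebuild is scheduled in the background and the data is returned to the host immediately, matching the abstract.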
REMOVING DUPLICATE TRANSACTIONS FROM A TRANSACTION EXCHANGE PLATFORM
Aspects described herein may relate to a transaction exchange platform using a streaming data platform (SDP) and microservices to process transactions according to review and approval workflows. The transaction exchange platform may receive transactions from origination sources, which may be added to the SDP as transaction objects. As transactions are received, they may be analyzed to detect duplicate transactions and/or errors. The transaction exchange platform may take steps to remediate transactions that are recognized as duplicates or predicted to generate one or more errors. Similarly, the transaction exchange platform may take steps to remediate transactions that are rejected by a clearinghouse.
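A minimal sketch of stream-side duplicate detection as the abstract describes it, assuming transactions are plain dictionaries and that the listed fields identify a transaction; the field names, the fingerprint scheme, and the publish/remediate hooks are all hypothetical.

```python
import hashlib
import json


def fingerprint(txn: dict) -> str:
    """Hash the fields assumed to identify a transaction."""
    key = json.dumps(
        {k: txn[k] for k in ("source", "account", "amount", "reference")},
        sort_keys=True,
    )
    return hashlib.sha256(key.encode()).hexdigest()


seen: set[str] = set()


def publish(txn):
    print("published to SDP:", txn["reference"])


def remediate(txn):
    print("duplicate remediated:", txn["reference"])


def ingest(txn: dict) -> None:
    """Add a transaction to the SDP unless its fingerprint was already seen."""
    fp = fingerprint(txn)
    if fp in seen:
        remediate(txn)   # e.g. route to a remediation / dead-letter topic
    else:
        seen.add(fp)
        publish(txn)     # append to the streaming data platform as an object


txn = {"source": "orig-a", "account": "42", "amount": 10.0, "reference": "T1"}
ingest(txn)
ingest(dict(txn))        # second copy is detected and remediated
```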
FAULT RECOVERY SYSTEM FOR FUNCTIONAL CIRCUITS
A fault recovery system includes various fault management circuits that form a hierarchical structure. One fault management circuit detects a fault in a functional circuit and executes a recovery operation to recover the functional circuit from the fault. When that fault management circuit fails to recover the functional circuit within a predetermined time duration, a fault management circuit at a higher hierarchical level executes another recovery operation to recover the functional circuit from the fault. The higher-level fault management circuit is, in turn, required to execute its recovery operation within another predetermined time duration to successfully recover the functional circuit from the fault. The fault recovery system thus implements the hierarchical structure of fault management circuits to recover the functional circuit from the fault.
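The escalation logic might look like the following sketch, in which each level runs its own recovery operation under a deadline and hands the fault to its parent on a miss; the level names, deadlines, and recovery operations are invented for illustration.

```python
import time


class FaultManager:
    """One level of the hierarchy: runs its own recovery operation and
    escalates to its parent if the operation misses the level's deadline."""

    def __init__(self, name, deadline_s, recover_fn, parent=None):
        self.name = name
        self.deadline_s = deadline_s
        self.recover_fn = recover_fn   # this level's recovery operation
        self.parent = parent

    def handle_fault(self):
        start = time.monotonic()
        ok = self.recover_fn()
        if ok and time.monotonic() - start <= self.deadline_s:
            print(f"{self.name}: functional circuit recovered in time")
            return True
        if self.parent is not None:
            print(f"{self.name}: deadline missed, escalating to {self.parent.name}")
            return self.parent.handle_fault()
        print(f"{self.name}: top of hierarchy, recovery failed")
        return False


def soft_reset():          # low-level recovery operation (too slow here)
    time.sleep(0.02)
    return True


def full_reinit():         # higher-level, heavier recovery operation
    return True


top = FaultManager("system-level", deadline_s=1.0, recover_fn=full_reinit)
local = FaultManager("block-level", deadline_s=0.01, recover_fn=soft_reset,
                     parent=top)
local.handle_fault()       # soft reset misses its deadline and escalates
```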
SHADOW LIVE MIGRATION OVER A SMART NETWORK INTERFACE CARD
A smart network interface card in an information handling system monitors a local host memory associated with a computer resource for an update to a memory page in the local host memory. After the update to the memory page, the smart network interface card copies the memory page to its own memory. The smart network interface card sets a watchdog timer to detect a failure in the information handling system that hosts the computer resource; if the failure is detected, the smart network interface card migrates the computer resource from its information handling system to another information handling system.
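A toy model of the shadow-copy and watchdog behavior, assuming page updates and heartbeats are delivered to the card as callbacks; the timings and the dictionary-backed "memory" are illustrative only.

```python
import threading


class SmartNic:
    """Sketch: shadow-copy updated host pages; migrate if the host fails."""

    def __init__(self, host_memory):
        self.host_memory = host_memory   # page_id -> bytearray on the host
        self.shadow = {}                 # on-card copy of each page
        self.timer = None

    def on_page_update(self, page_id):
        # Copy the updated memory page into the smart NIC's own memory.
        self.shadow[page_id] = bytes(self.host_memory[page_id])

    def heartbeat(self, timeout_s):
        # Host is alive: restart the watchdog timer.
        if self.timer is not None:
            self.timer.cancel()
        self.timer = threading.Timer(timeout_s, self.migrate)
        self.timer.start()

    def migrate(self):
        # Watchdog expired without a heartbeat: the host has failed, so push
        # the shadowed pages to another information handling system.
        print(f"host failure detected; migrating {len(self.shadow)} page(s)")


host_mem = {0: bytearray(b"vm state")}
nic = SmartNic(host_mem)
nic.on_page_update(0)         # page updated, shadow copy refreshed
nic.heartbeat(timeout_s=0.1)
# with no further heartbeat, migrate() fires after 0.1 s
```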
METHODS AND SYSTEMS THAT AUTOMATICALLY PREDICT DISTRIBUTED-COMPUTER-SYSTEM PERFORMANCE DEGRADATION USING AUTOMATICALLY TRAINED MACHINE-LEARNING COMPONENTS
The current document is directed to methods and systems that automatically generate training data for machine-learning-based components used by a metric-data processing-and-analysis component of a distributed computer system, a subsystem within a distributed computer system, or a standalone metric-data processing-and-analysis system. The training data sets are labeled using categorical KPI values. The machine-learning-based components are applied to metric data both for predicting anomalous operational behaviors and problems within the distributed computer system and for determining the potential causes of those anomalous operational behaviors and problems. Training of machine-learning-based components is carried out concurrently and asynchronously with respect to other metric-data collection, aggregation, processing, storage, and analysis tasks.
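One way to read the labeling step is sketched below: each window of collected metric data is paired with a categorical label derived from its KPI value. The category names and thresholds are hypothetical, since the abstract does not specify them.

```python
def kpi_category(kpi_value: float) -> str:
    """Map a numeric KPI into a categorical training label (thresholds assumed)."""
    if kpi_value < 0.7:
        return "degraded"
    if kpi_value < 0.9:
        return "at-risk"
    return "healthy"


def label_training_windows(metric_windows, kpi_values):
    """Pair each window of collected metrics with its categorical KPI label."""
    return [
        (window, kpi_category(kpi))
        for window, kpi in zip(metric_windows, kpi_values)
    ]


windows = [[0.2, 0.3, 0.9], [0.8, 0.85, 0.95]]   # e.g. CPU-load samples
print(label_training_windows(windows, [0.95, 0.65]))
# [([0.2, 0.3, 0.9], 'healthy'), ([0.8, 0.85, 0.95], 'degraded')]
```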
HARDWARE-ASSISTED PARAVIRTUALIZED HARDWARE WATCHDOG
A hardware-assisted paravirtualized hardware watchdog is described that is used to detect and recover from computer malfunctions. A computing device determines that a hardware-implemented watchdog of the computing device does not comply with predetermined watchdog criteria, where the hardware-implemented watchdog is configured to send a reset signal when a first predetermined amount of time elapses without receipt of a first refresh signal. If the hardware-implemented watchdog does not comply with the predetermined watchdog criteria, a runtime watchdog service is initialized using a second predetermined amount of time. The runtime watchdog service is directed to periodically send the first refresh signal to the hardware-implemented watchdog before the first predetermined amount of time elapses, which would otherwise cause the hardware-implemented watchdog to expire. The hardware-implemented watchdog is thereby directed to send the reset signal only when the second predetermined amount of time elapses without receipt of a second refresh signal.
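A minimal sketch of the bridging behavior, assuming the hardware watchdog is refreshed via a callback: the service keeps the short hardware timeout satisfied only while its own longer (second) timeout is being refreshed. All timeouts and names are illustrative.

```python
import threading
import time


class RuntimeWatchdogService:
    """Sketch: refresh a short-timeout hardware watchdog on behalf of the OS,
    but only while the OS keeps sending its own (second) refresh signal."""

    def __init__(self, hw_timeout_s, sw_timeout_s, refresh_hw):
        self.sw_timeout_s = sw_timeout_s
        self.refresh_hw = refresh_hw            # pokes the hardware watchdog
        self.last_sw_refresh = time.monotonic()
        # Refresh the hardware watchdog well before its timeout can expire.
        ticker = threading.Thread(
            target=self._run, args=(hw_timeout_s / 2,), daemon=True)
        ticker.start()

    def refresh(self):
        """The second refresh signal, sent periodically by OS health code."""
        self.last_sw_refresh = time.monotonic()

    def _run(self, interval_s):
        while time.monotonic() - self.last_sw_refresh < self.sw_timeout_s:
            self.refresh_hw()
            time.sleep(interval_s)
        # The second timeout elapsed without a second refresh signal: stop
        # refreshing, so the hardware watchdog expires and sends the reset.
        print("second timeout elapsed; hardware watchdog will now reset")


svc = RuntimeWatchdogService(
    hw_timeout_s=0.1, sw_timeout_s=0.3,
    refresh_hw=lambda: print("hardware watchdog refreshed"))
time.sleep(0.6)   # no svc.refresh() calls, so refreshes stop after ~0.3 s
```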
METHODS, SYSTEMS, AND COMPUTER READABLE MEDIA FOR HEALTH CHECKING INVOLVING COMMON APPLICATION PROGRAMMING INTERFACE FRAMEWORK
Methods, systems, and computer readable media for health checking involving common application programming interface (API) framework (CAPIF) are disclosed. One example method for health checking using CAPIF comprises: at a CAPIF node including at least one processor: receiving, from a sender, a CAPIF request message including health check information usable in checking the health of an endpoint associated with the sender, wherein the health check information includes a uniform resource identifier (URI) associated with the endpoint and an acceptable response value for indicating that the endpoint is active; and determining, periodically or aperiodically, a health status of the endpoint by performing a health check procedure using the URI and the acceptable response value.
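A health check procedure along these lines could be as simple as the sketch below, which polls the URI and compares the body against the acceptable response value; the field names and endpoint are illustrative, not the 3GPP CAPIF schema.

```python
import urllib.request


def check_endpoint(uri: str, acceptable_response: str,
                   timeout_s: float = 2.0) -> str:
    """Return 'active' if the endpoint answers with the acceptable value."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout_s) as resp:
            body = resp.read().decode().strip()
    except OSError:
        return "inactive"          # unreachable endpoints are inactive
    return "active" if body == acceptable_response else "inactive"


# Health check information as it might arrive in a CAPIF request message
# (field names are assumptions, not the standardized schema):
health_check_info = {
    "uri": "http://api-invoker.example/health",   # hypothetical endpoint
    "acceptable_response": "OK",
}

status = check_endpoint(
    health_check_info["uri"], health_check_info["acceptable_response"])
print("endpoint status:", status)
```

In a real deployment this check would run on a timer (periodically) or on demand (aperiodically), as the abstract allows either.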
BOOT FAILURE PROTECTION ON SMARTNICS AND OTHER COMPUTING DEVICES
Boot failure protection on smartNICs and other computing devices is described. During a power-on stage of a booting process for a computing device, a boot loading environment is directed to install an application programming interface (API) able to be invoked to control operation of a hardware-implemented watchdog. During an operating system loading stage of the booting process, the application programming interface is invoked to enable the hardware-implemented watchdog. During an operating system hand-off stage of the booting process, a last watchdog refresh of the hardware-implemented watchdog is performed, and execution of the boot loading environment is handed off to a kernel boot loader of an operating system. The application programming interface may not be accessible after the hand-off to the kernel boot loader.
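The staged use of such an API might look like this sketch, in which the API refuses calls once it is sealed at hand-off; the class and method names are hypothetical, as the abstract does not name them.

```python
class HardwareWatchdog:
    """Stand-in for the hardware-implemented watchdog."""

    def enable(self):
        print("hardware watchdog enabled")

    def refresh(self):
        print("hardware watchdog refreshed")


class BootWatchdogApi:
    """API installed by the boot loading environment at power-on."""

    def __init__(self, watchdog):
        self._watchdog = watchdog
        self._sealed = False

    def _check(self):
        if self._sealed:
            raise RuntimeError("API not accessible after kernel hand-off")

    def enable(self):
        self._check()
        self._watchdog.enable()

    def refresh(self):
        self._check()
        self._watchdog.refresh()

    def seal(self):
        # Called at the operating system hand-off stage.
        self._sealed = True


api = BootWatchdogApi(HardwareWatchdog())  # power-on stage: install the API
api.enable()      # OS loading stage: enable the watchdog via the API
api.refresh()     # OS hand-off stage: last watchdog refresh
api.seal()        # execution handed off to the kernel boot loader
```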
WATCHDOG MICROSERVICE TO RESOLVE LOCKS WHEN PROCESSING FAILS ON A TRANSACTION EXCHANGE PLATFORM
Aspects described herein may relate to a transaction exchange platform using a streaming data platform (SDP) and microservices to process transactions according to review and approval workflows. The transaction exchange platform may receive transactions from origination sources, which may be added to the SDP as transaction objects. As the transactions are processed, the transactions may require access to a resource (e.g., a key value in a database). A microservice processing the transaction may request, from a locking microservice, a lock for the resource. The locking microservice may query a local cache to determine whether a lock exists for the resource. If the local cache indicates that no lock exists for the resource, the locking microservice may employ a consensus protocol to obtain a lock for the resource from a plurality of clusters. If consensus is reached, a lock for the resource may be granted to the requesting microservice.
CONSENSUS KEY LOCKING WITH FAST LOCAL STORAGE FOR IDEMPOTENT TRANSACTIONS
Aspects described herein may relate to a transaction exchange platform using a streaming data platform (SDP) and microservices to process transactions according to review and approval workflows. The transaction exchange platform may receive transactions from origination sources, which may be added to the SDP as transaction objects. As the transactions are processed, the transactions may require access to a resource (e.g., a key value in a database). A microservice processing the transaction may request, from a locking microservice, a lock for the resource. The locking microservice may query a local cache to determine whether a lock exists for the resource. If the local cache indicates that no lock exists for the resource, the locking microservice may employ a consensus protocol to obtain a lock for the resource from a plurality of clusters. If consensus is reached, a lock for the resource may be granted to the requesting microservice.
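A compact sketch of the fast-path/slow-path split described in these two abstracts, assuming a simple majority vote stands in for the consensus protocol (a production system would use something like Raft or Paxos and release partial reservations on failure); all names are illustrative.

```python
class Cluster:
    """One voting cluster; reserves the key when it votes yes."""

    def __init__(self):
        self.locks = set()

    def vote(self, key):
        if key in self.locks:
            return False
        self.locks.add(key)
        return True


class LockingMicroservice:
    """Sketch: check the local cache first, then fall back to consensus."""

    def __init__(self, clusters):
        self.clusters = clusters     # peers participating in consensus
        self.local_cache = {}        # resource key -> holder (fast path)

    def acquire(self, resource_key, requester):
        if resource_key in self.local_cache:
            return False             # a lock already exists for the resource
        # Slow path: ask every cluster to vote; a majority grants the lock.
        votes = sum(c.vote(resource_key) for c in self.clusters)
        if votes > len(self.clusters) // 2:
            self.local_cache[resource_key] = requester
            return True
        return False


svc = LockingMicroservice([Cluster(), Cluster(), Cluster()])
print(svc.acquire("db:key:42", "txn-micro-1"))   # True: consensus reached
print(svc.acquire("db:key:42", "txn-micro-2"))   # False: local cache hit
```

The local cache keeps repeated lock checks for the same key from paying the consensus round-trip, which is what makes the fast path fast.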