G06F11/24

MECHANISM FOR INTEGRATING I/O HYPERVISOR WITH A COMBINED DPU AND SERVER SOLUTION

A combined data processing unit (DPU) and server solution with DPU operating system (OS) integration is described. A DPU OS is executed on a DPU or other computing device. The DPU OS exercises secure calls that are provided by the DPU's trusted firmware component and that DPU OS components may invoke to abstract DPU vendor-specific and server vendor-specific integration details. An invocation of one of the secure calls, made on the DPU to communicate with its associated server computing device, is identified. When one of the secure calls is invoked, it is translated into a call or request specific to the architecture of the server computing device, and that call is performed, which may include sending a signal to the server computing device in a format the server computing device can interpret.
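The translation step can be pictured as a dispatch table keyed by server architecture. The following is a minimal sketch only: the call names, the architecture labels `vendor_a` and `vendor_b`, and the request strings are all hypothetical, since the abstract names no concrete calls or formats.

```python
from typing import Callable, Dict

# Hypothetical secure-call identifiers; the abstract does not name any.
POWER_CYCLE_SERVER = "power_cycle_server"
QUERY_SERVER_STATE = "query_server_state"

# Hypothetical per-architecture request formats (assumptions for illustration).
TRANSLATIONS: Dict[str, Dict[str, str]] = {
    "vendor_a": {
        POWER_CYCLE_SERVER: "A:RESET",
        QUERY_SERVER_STATE: "A:STATUS?",
    },
    "vendor_b": {
        POWER_CYCLE_SERVER: "B/power/cycle",
        QUERY_SERVER_STATE: "B/power/state",
    },
}


def invoke_secure_call(call: str, arch: str, send: Callable[[str], None]) -> str:
    """Translate a generic secure call into an architecture-specific request
    and pass it to `send`, which stands in for the signal to the server."""
    if arch not in TRANSLATIONS:
        raise ValueError(f"unknown server architecture: {arch}")
    table = TRANSLATIONS[arch]
    if call not in table:
        raise KeyError(f"unsupported secure call: {call}")
    request = table[call]
    send(request)
    return request
```

The caller only ever sees the generic call name, which is the abstraction the abstract describes; supporting another server vendor would mean adding one more table entry.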

Enhanced in-system test coverage based on detecting component degradation

In various examples, permanent faults in hardware component(s) and/or connections to the hardware component(s) of a computing platform may be predicted before they occur using in-system testing. As a result of this prediction, one or more remedial actions may be determined to enhance the safety of the computing platform (e.g., an autonomous vehicle). A degradation rate of a performance characteristic associated with the hardware component may be determined, detected, and/or computed by monitoring values of performance characteristics over time using fault testing.
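One way to picture a degradation rate is as the slope of a least-squares line fitted to the monitored values. The sketch below assumes a linear trend and a characteristic where lower values are worse; neither assumption comes from the abstract, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def degradation_rate(times: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of values over time (units of value per unit time)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) samples of equal length")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    denominator = sum((t - mean_t) ** 2 for t in times)
    if denominator == 0:
        raise ValueError("all samples share the same timestamp")
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return numerator / denominator


def time_to_threshold(
    times: Sequence[float], values: Sequence[float], threshold: float
) -> Optional[float]:
    """Estimated time until the latest value falls to `threshold`.

    Returns 0.0 if the threshold is already reached and None if the
    fitted trend is not decreasing (assumption: lower values are worse).
    """
    latest = values[-1]
    if latest <= threshold:
        return 0.0
    rate = degradation_rate(times, values)
    if rate >= 0:
        return None
    return (threshold - latest) / rate
```

A remedial action could then be scheduled when the estimated time to threshold drops below some safety margin.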

Detecting execution hazards in offloaded operations

Detecting execution hazards in offloaded operations is disclosed. A second offload operation is compared to a first offload operation that precedes the second offload operation. It is determined whether the second offload operation creates an execution hazard on an offload target device based on the comparison of the second offload operation to the first offload operation. If the execution hazard is detected, an error handling operation may be performed. In some examples, the offload operations are processing-in-memory operations.
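The comparison of consecutive offload operations might look like the sketch below. The abstract does not say which fields are compared, so the hazard rule used here (same sequence, different target bank) is an assumption chosen for illustration, as are the `OffloadOp` fields.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class OffloadOp:
    sequence_id: int   # hypothetical: which offload sequence the op belongs to
    target_bank: int   # hypothetical: memory bank the op executes against


def creates_hazard(first: OffloadOp, second: OffloadOp) -> bool:
    """Assumed rule: ops in one sequence must stay on the same bank."""
    return (
        first.sequence_id == second.sequence_id
        and first.target_bank != second.target_bank
    )


def check_stream(
    ops: Sequence[OffloadOp],
    on_hazard: Callable[[OffloadOp, OffloadOp], None],
) -> int:
    """Compare each op to its predecessor; call the error handler on a hazard."""
    hazards = 0
    for first, second in zip(ops, ops[1:]):
        if creates_hazard(first, second):
            on_hazard(first, second)
            hazards += 1
    return hazards
```

The `on_hazard` callback stands in for the error handling operation, which the abstract leaves unspecified.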

Method of operating storage device for improving reliability, storage device performing the same and method of operating storage using the same
11593242 · 2023-02-28

A method of operating a storage device includes sensing a standby current flowing through the storage device, determining based on the sensed standby current and at least one reference value whether a product abnormality has occurred within the storage device, and when it is determined the product abnormality has occurred, performing a step-wise control operation in which two or more control processes associated with an operation of the storage device are sequentially executed.
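The sense, compare, and step-wise control flow can be sketched as follows. This is illustrative only: it assumes an abnormality means the standby current exceeds a single reference value, and that the current is re-sensed after each control process so the sequence can stop once the reading is normal. The abstract states neither detail.

```python
from typing import Callable, List, Sequence, Tuple


def is_abnormal(standby_current: float, reference: float) -> bool:
    """Assumed criterion: standby current above the reference is abnormal."""
    return standby_current > reference


def stepwise_control(
    sense_current: Callable[[], float],
    reference: float,
    processes: Sequence[Tuple[str, Callable[[], None]]],
) -> List[str]:
    """Run control processes in order while the abnormality persists.

    Returns the names of the processes that were executed.
    """
    executed: List[str] = []
    if not is_abnormal(sense_current(), reference):
        return executed
    for name, process in processes:
        process()
        executed.append(name)
        # Assumption: re-sense after each step and stop once back to normal.
        if not is_abnormal(sense_current(), reference):
            break
    return executed
```

Ordering the processes from least to most disruptive would let mild measures be tried before drastic ones, though the abstract does not specify an ordering.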

Leveraging low power states for fault testing of processing cores at runtime

In various examples, one or more components or regions of a processing unit—such as a processing core, and/or component thereof—may be tested for faults during deployment in the field. To perform testing while in deployment, the state of a component subject to test may be retrieved and/or stored during the test to maintain state integrity, the component may be clamped to communicatively isolate the component from other components of the processing unit, a test vector may be applied to the component, and the output of the component may be compared against an expected output to determine if any faults are present. The state of the component may be restored after testing, and the clamp removed, thereby returning the component to its operating state without a perceivable detriment to operation of the processing unit in deployment.
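The save, clamp, test, compare, and restore sequence can be modeled in software as below. This is a toy simulation under stated assumptions: the `Component` class, its `clamped` flag, and the integer state are hypothetical stand-ins for hardware behavior.

```python
from typing import Callable, Iterable, Tuple


class Component:
    """Toy model of a testable unit; `logic` maps an input vector to an output."""

    def __init__(self, logic: Callable[[int], int], state: int = 0) -> None:
        self.logic = logic
        self.state = state
        self.clamped = False

    def apply(self, vector: int) -> int:
        # Applying a test vector overwrites the functional state.
        self.state = vector
        return self.logic(vector)


def run_in_field_test(
    component: Component, vectors: Iterable[Tuple[int, int]]
) -> bool:
    """Return True if every (vector, expected) pair matches, i.e. no fault seen."""
    saved_state = component.state      # store state before testing
    component.clamped = True           # isolate from other components
    try:
        return all(component.apply(v) == expected for v, expected in vectors)
    finally:
        component.state = saved_state  # restore state
        component.clamped = False      # remove the clamp
```

The `finally` block ensures the component returns to its operating state even if the test raises partway through.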

Method and system for intelligent failure diagnosis center for burn-in devices under test

A mechanism is provided for automatically detecting, diagnosing, transporting, and repairing devices that have failed during burn-in testing. Embodiments provide a system that monitors devices undergoing burn-in testing and detects when a device or a component within a device fails the burn-in test. Embodiments can then alert burn-in-rack monitor personnel of the device failure. Embodiments can concurrently determine the nature of the failure by applying a machine learning-based prediction model to log files associated with the failed device. The diagnosis, along with a recommended repair strategy, can be provided to the repair center to accelerate the repair process. In addition, the diagnosis can be used to order parts for the repair from a parts depot. In this manner, embodiments can reduce the time for detection, diagnosis, and repair of the failed device.

LEVERAGING LOW POWER STATES FOR FAULT TESTING OF PROCESSING CORES AT RUNTIME
20230123956 · 2023-04-20

In various examples, one or more components or regions of a processing unit—such as a processing core, and/or component thereof—may be tested for faults during deployment in the field. To perform testing while in deployment, the state of a component subject to test may be retrieved and/or stored during the test to maintain state integrity, the component may be clamped to communicatively isolate the component from other components of the processing unit, a test vector may be applied to the component, and the output of the component may be compared against an expected output to determine if any faults are present. The state of the component may be restored after testing, and the clamp removed, thereby returning the component to its operating state without a perceivable detriment to operation of the processing unit in deployment.

INTRA-CLASS ADAPTATION FAULT DIAGNOSIS METHOD FOR BEARING UNDER VARIABLE WORKING CONDITIONS

The invention relates to a fault diagnosis method for a rolling bearing under variable working conditions. Based on a convolutional neural network, a transfer learning algorithm is combined to handle the problem of the reduced universality of deep learning models. Data acquired under different working conditions is segmented to obtain samples. The samples are preprocessed by using FFT. Low-level features of the samples are extracted by using improved ResNet-50, and a multi-scale feature extractor analyzes the low-level features to obtain high-level features as inputs of a classifier. In a training process, high-level features of training samples and test samples are extracted, and a conditional distribution distance between them is calculated as a part of a target function for backpropagation to implement intra-class adaptation, thereby reducing the impact of domain shift, to enable a deep learning model to better carry out fault diagnosis tasks.
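A conditional distribution distance compares source and target features class by class. The sketch below uses the squared distance between class means (equivalent to maximum mean discrepancy with a linear kernel), averaged over shared classes. The abstract does not name the distance it uses, so this choice is an assumption, as is the use of pseudo-labels for the unlabeled test samples.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def class_mean(
    features: Sequence[Vector], labels: Sequence[int], cls: int
) -> Optional[List[float]]:
    """Mean feature vector of the samples labeled `cls`, or None if there are none."""
    rows = [f for f, y in zip(features, labels) if y == cls]
    if not rows:
        return None
    return [sum(column) / len(rows) for column in zip(*rows)]


def conditional_distance(
    source_features: Sequence[Vector],
    source_labels: Sequence[int],
    target_features: Sequence[Vector],
    target_pseudo_labels: Sequence[int],
) -> float:
    """Average over shared classes of the squared distance between class means."""
    classes = sorted(set(source_labels) & set(target_pseudo_labels))
    if not classes:
        return 0.0
    total = 0.0
    for cls in classes:
        mean_s = class_mean(source_features, source_labels, cls)
        mean_t = class_mean(target_features, target_pseudo_labels, cls)
        total += sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))
    return total / len(classes)
```

In training, a term like this would be added to the classification loss so that backpropagation pulls same-class features from the two working conditions together.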