G06F11/1016

ERROR HANDLING FOR RESILIENT SOFTWARE
20220129345 · 2022-04-28 ·

Error handling for resilient software includes: receiving data indicating a region of resilient memory; detecting an error associated with a region of memory; and preventing raising an exception for the error in response to the region of memory falling within the region of resilient memory by preventing the region of memory as being identified as including the error.

Error correction management for a memory device

Methods, systems, and devices for error correction management are described. A system may include a memory device that supports internal detection and correction of corrupted data, and whether such detection and correction functionality is operating properly may be evaluated. A known error may be included (e.g., intentionally introduced) into either data stored at the memory device or an associated error correction codeword, among other options, and data or other indications subsequently generated by the memory device may be evaluated for correctness in view of the error. Thus, either the memory device or a host device coupled with the memory device, among other devices, may determine whether error detection and correction functionality internal to the memory device is operating properly.

Error containment for enabling local checkpoint and recovery

Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.

MEMORY SYSTEM WITH ERROR DETECTION
20230307079 · 2023-09-28 ·

A memory controller generates error codes associates with write data and a write address and provides the error codes over a dedicated error detection code link to a memory device during a write operation. The memory device performs error detection, and in some cases correction, on the received write data and write address based on the error codes. If no uncorrectable errors are detected, the memory device furthermore stores the error codes in association with the write data. On a read operation, the memory device outputs the error codes over the error detection code link to the memory controller together with the read data. The memory controller performs error detection, and in some cases correction, on the received read data based on the error codes.

Devices and methods for preventing errors and detecting faults within a memory device

A data processing system includes a memory configured to receive memory access requests. Each memory access request having a corresponding access address and having a corresponding parity bit for an address value of the corresponding access address. The corresponding access address is received over a plurality of address lines and the parity bit is received over a parity line. The memory includes a memory array having a plurality of memory cells arranged in rows, each row having a corresponding word line of a plurality of word lines, and a row decoder coupled to the plurality of address lines, the parity line, and the plurality of word lines. The row decoder is configured to selectively activate a selected word line of the plurality of word lines based on the corresponding access address and the corresponding parity bit of a received memory access request. The concept can also be used with parity bits on columns of the memory cells and a column decoder that selects bit lines associated with column address lines.

MEMORY-BASED DISTRIBUTED PROCESSOR ARCHITECTURE
20210365334 · 2021-11-25 · ·

Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.

SYSTEMS AND METHODS FOR DECODING ERROR CORRECTING CODES WITH HISTORICAL DECODING INFORMATION
20220021403 · 2022-01-20 ·

Systems and methods are provided for decoding data read from non-volatile storage devices. A method may comprise receiving a chunk of data read from a physical location of a non-volatile storage device and searching a memory for soft information associated with the physical location using a unique identifier associated with the physical location. The soft information may be generated from one or more previous decoding processes on previous data from the physical location. The method may further comprise retrieving the soft information identified by the unique identifier associated with the physical location from the memory, decoding the chunk of data with the soft information indicating reliability of bits in the chunk of data and updating the soft information with decoding information generated during the decoding.

Techniques to improve error correction using an XOR rebuild scheme of multiple codewords and prevent miscorrection from read reference voltage shifts

Examples include techniques to improve error correction using an exclusive OR (XOR) rebuild scheme that includes two uncorrectable codewords. Examples include generation of soft XOR codewords using bits of correctable codewords to rebuild a codeword read from a memory that has uncorrectable errors and adjust bit reliability information to generate a new codeword having correctable errors. Examples also include techniques to prevent mis-correction due to read reference voltage shifts using non-linear transformations.

HANDLING OPERATION SYSTEM (OS) IN SYSTEM FOR PREDICTING AND MANAGING FAULTY MEMORIES BASED ON PAGE FAULTS

A method of operating a system running a virtual machine that executes an application and an operating system (OS) includes performing first address translation from first virtual addresses to first physical addresses, identifying faulty physical addresses among the first physical addresses, each faulty physical address corresponding to a corresponding first physical address associated with a faulty memory cell, analyzing a row address and a column address of each faulty physical address and specifying a fault type of the faulty physical addresses based on the analyzing of the row address and the column address of each faulty physical address, and performing second address translation from second virtual addresses to second physical addresses based on a faulty address, thereby excluding the faulty address from the second physical addresses.

MEMORY MODULE, ERROR CORRECTION METHOD OF MEMORY CONTROLLER CONTROLLING THE SAME, AND COMPUTING SYSTEM INCLUDING THE SAME

A memory module includes first memory chips, each having a first input/output width, and configured to store data, a second memory chip having a second input/output width and configured to store an error correction code for correcting an error in the data, and a driver circuit configured to receive a clock signal, a command, and an address from a memory controller and to transmit the clock signal, the command, and the address to the first memory chips and the second memory chip. An address depth of each of the first memory chips and an address depth of the second memory chip are different from each other.