Patent classifications
G06F15/7821
COMPUTATIONAL MEMORY
An example device includes a plurality of computational memory banks. Each computational memory bank of the plurality of computational memory banks includes an array of memory units and a plurality of processing elements connected to the array of memory units. The device further includes a plurality of single instruction, multiple data (SIMD) controllers. Each SIMD controller of the plurality of SIMD controllers is contained within at least one computational memory bank of the plurality of computational memory banks. Each SIMD controller is to provide instructions to the at least one computational memory bank.
Processing-in-memory (PIM) systems
A processing-in-memory (PIM) system includes a plurality of PIM devices, a plurality of PIM controllers configured to control respective ones of the plurality of PIM devices, and an interface coupled between a host and the plurality of PIM controllers. The interface transmits first request data to a target PIM controller corresponding to one of the plurality of PIM controllers for execution of a first request output from the host. The interface transmits second request data to all of the plurality of PIM controllers for execution of a second request output from the host.
Methods for performing fused-multiply-add operations on serially allocated data within a processing-in-memory capable memory device, and related memory devices and systems
Methods, apparatuses, and systems for in- or near-memory processing are described. Strings of bits (e.g., vectors) may be fetched and processed in logic of a memory device without involving a separate processing unit. Operations (e.g., arithmetic operations) may be performed on numbers stored in a bit-serial way during a single sequence of clock cycles. Arithmetic may thus be performed in a single pass as numbers are bits of two or more strings of bits are fetched and without intermediate storage of the numbers. Vectors may be fetched (e.g., identified, transmitted, received) from one or more bit lines. Registers of the memory array may be used to write (e.g., store or temporarily store) results or ancillary bits (e.g., carry bits or carry flags) that facilitate arithmetic operations. Circuitry near, adjacent, or under the memory array may employ XOR or AND (or other) logic to fetch, organize, or operate on the data.
Discrete three-dimensional processor
A discrete three-dimensional (3-D) processor comprises first and second dice. The first die comprises 3-D memory (3D-M) arrays, whereas the second die comprises logic circuits and at least an off-die peripheral-circuit component of the 3D-M array(s). Typical off-die peripheral-circuit component could be an address decoder, a sense amplifier, a programming circuit, a read-voltage generator, a write-voltage generator, a data buffer, or a portion thereof.
Host performing an embedding operation and computing system including the same
A computing system capable of reducing data movement during an embedding operation and efficiently processing the embedding operation includes a host and a memory system. The host divides a plurality of feature tables, each including a respective plurality of embedding vectors for a respective plurality of elements, into a first feature table group and a second feature table group; generates a first embedding table configured of the first feature table group; and sends a request for a generation operation of a second embedding table configured of the second feature table group to the memory system. The memory system generates the second embedding table according to the generation operation request provided by the host. The host divides the plurality of feature tables into the first feature table group and the second feature table group based on the number of elements included in each of the plurality of feature tables.
ADJUSTABLE FUNCTION-IN-MEMORY COMPUTATION SYSTEM
A method for in-memory computing. In some embodiments, the method includes: executing, by a first function-in-memory circuit, a first instruction, to produce, as a result, a first value, wherein a first computing task includes a second computing task and a third computing task, the second computing task including the first instruction; storing, by the first function-in-memory circuit, the first value in a first buffer; reading, by a second function-in-memory circuit, the first value from the first buffer; and executing, by a second function-in-memory circuit, a second instruction, the second instruction using the first value as an argument, the third computing task including the second instruction, wherein: the storing, by the first function-in-memory circuit, of the first value in the first buffer includes directly storing the first value in the first buffer.
PROVIDING ATOMICITY FOR COMPLEX OPERATIONS USING NEAR-MEMORY COMPUTING
Providing atomicity for complex operations using near-memory computing is disclosed. In an implementation, a complex atomic operation is decomposed into a set of sequential operations that is stored in a near-memory instruction store. A memory controller receives a request from a host execution engine to issue the complex atomic operation and initiates execution of the stored set of sequential operations on a near-memory compute unit. The complex atomic operation may be a user-defined complex atomic operation.
Processing-in-memory (PIM) device
A processing-in-memory (PIM) device includes a first group of storage regions, a second group of storage regions, and a plurality of multiplication/accumulation (MAC) operators. The MAC operators are configured to communicate with the first and second groups of storage regions through a global data input/output (GIO) line. A first storage region corresponding to a storage region of the first group of storage regions, a second storage region corresponding to a storage region of the second group of storage regions, and a first MAC operator corresponding to a MAC operator of the plurality of MAC operators constitute a MAC unit. The first MAC operator is configured to receive first data and second data from the first and second storage regions, respectively, through the GIO line to perform a MAC arithmetic operation of the first and second data and to output a result of the MAC arithmetic operation.
ENHANCED DIGITAL SIGNAL PROCESSOR (DSP) NAND FLASH
A method and apparatus for systems and methods for digital signal processing (DSP) in a non-volatile memory (NVM) device comprising CMOS coupled to NVM die, of a data storage device. According to certain embodiments, one or more DSP calculations are provided by a controller to the CMOS components of the NVM, that configure one or more memory die to carry out atomic calculations on the data resident on the die. The results of calculations of each die are provided to an output latch for each die, back-propagating data back to the configured calculation portion as needed, otherwise forwarding the results to the controller. The controller aggregates the results of DSP calculations of each die and presents the results to the host system.
PROCESSING-IN-MEMORY (PIM) SYSTEM AND OPERATING METHODS OF THE PIM SYSTEM
A processing-in-memory (PIM) system includes a host configured to generate a first request for a memory access operation and a second request for an arithmetic operation, a PIM controller configured to generate a first command based on the first request or the second request, a high speed interface configured to generate a second command based on the second request, and a PIM device configured to perform the memory access operation in response to the first command from the PIM controller and to perform the arithmetic operation in response to the second command from the high speed interface.