G06F12/0844

APPARATUS, SYSTEMS, AND METHODS FOR PROVIDING COMPUTATIONAL IMAGING PIPELINE

The present application relates generally to a parallel processing device. The parallel processing device can include a plurality of processing elements, a memory subsystem, and an interconnect system. The memory subsystem can include a plurality of memory slices, at least one of which is associated with one of the plurality of processing elements and comprises a plurality of random access memory (RAM) tiles, each tile having individual read and write ports. The interconnect system is configured to couple the plurality of processing elements and the memory subsystem. The interconnect system includes a local interconnect and a global interconnect.

Memory access bounds checking for a programmable atomic operator

Devices and techniques for memory access bounds checking for a programmable atomic operator are described herein. A processor can execute a programmable atomic operator with a base memory address. The processor can obtain a memory interleave size indicator corresponding to the programmable atomic operator and calculate a contiguous memory address range from the base memory address and the memory interleave size. The processor can then detect that a memory request from the programmable atomic operator is outside the contiguous memory address range and deny the memory request when it is outside of the contiguous memory address range and allow the memory request otherwise.
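The bounds check described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the encoding of the interleave size indicator (`64 << indicator` bytes) and the choice to align the range to the interleave block containing the base address are assumptions, and all function names are hypothetical.

```python
from dataclasses import dataclass


def interleave_size(indicator: int) -> int:
    """Decode the interleave size indicator into bytes.

    Assumption: the indicator is a small integer n meaning 64 << n bytes.
    The abstract does not specify the encoding.
    """
    if indicator < 0:
        raise ValueError("indicator must be non-negative")
    return 64 << indicator


@dataclass(frozen=True)
class AddressRange:
    start: int  # inclusive
    end: int    # exclusive

    def contains(self, addr: int, length: int = 1) -> bool:
        # The whole access [addr, addr + length) must fit inside the range.
        return self.start <= addr and addr + length <= self.end


def contiguous_range(base: int, indicator: int) -> AddressRange:
    """Derive the contiguous range from the base address and interleave size.

    Assumption: the range is the interleave-aligned block holding `base`.
    """
    size = interleave_size(indicator)
    start = base & ~(size - 1)
    return AddressRange(start, start + size)


def allow_request(base: int, indicator: int, addr: int, length: int = 1) -> bool:
    """Return True to allow the memory request, False to deny it."""
    return contiguous_range(base, indicator).contains(addr, length)
```

With a base address of `0x1040` and an indicator of 2, the assumed range is `0x1000` up to (but not including) `0x1100`, so an 8-byte request at `0x10F8` is allowed and one at `0x10FC` is denied.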

Storage device performing cache read operation using page buffer and operating method thereof
11734178 · 2023-08-22

A storage device includes: a memory device including a plurality of planes, a plurality of cache buffers, and a plurality of data buffers; and a memory controller that controls the memory device to transmit first data and second data from a first plane and a second plane into a first cache buffer and a second cache buffer, respectively, and controls the first cache buffer and the second cache buffer to transmit the first data and the second data to the memory controller. In response to a read request for third data from a host while the first data is being transmitted from the first cache buffer to the memory controller, the memory controller transmits a cache read command to the memory device such that the memory device reads the third data after the first data is completely transmitted to the memory controller and before the second data is transmitted from the second cache buffer.
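The ordering in this abstract can be illustrated with a small sketch. It models only the sequence of operations, not timing or hardware; the operation names `data_out` and `cache_read` and the function `order_operations` are hypothetical labels chosen for illustration.

```python
from __future__ import annotations


def order_operations(buffered: list, new_read: str, arrives_during: int) -> list:
    """Return the order in which operations run.

    buffered:        names of data already held in cache buffers, in the
                     order they will be sent to the memory controller.
    new_read:        name of the data requested by the host mid-transfer.
    arrives_during:  index of the buffered transfer that is in flight when
                     the host request arrives.
    """
    if not 0 <= arrives_during < len(buffered):
        raise ValueError("arrives_during must index a buffered transfer")

    ops = []
    for i, name in enumerate(buffered):
        ops.append(("data_out", name))
        if i == arrives_during:
            # The cache read is performed only after the in-flight transfer
            # has completed, and before the next buffered transfer starts.
            ops.append(("cache_read", new_read))
    return ops
```

For buffered data `["first", "second"]` and a request for `"third"` arriving during the first transfer, the resulting order is: output first data, read third data, output second data.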

Coupling wide memory interface to wide write back paths

Systems and methods are disclosed for performing wide memory operations for a wide data cache line. In some examples of the disclosed technology, a processor having two or more execution lanes includes a data cache coupled to memory, a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache, and a writeback circuit situated to send a respective word of the concurrently loaded words to a selected execution lane of the processor, either into an operand buffer or bypassing the operand buffer. In some examples, a sharding circuit is provided that allows bitwise, byte-wise, and/or word-wise manipulation of memory operation data. In some examples, wide cache loads allow for concurrent execution of plural execution lanes of the processor.
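A software sketch of the wide load and writeback path may help make the data flow concrete. It is an illustrative model only: the `Lane` class, the little-endian word layout, and the `(lane_index, bypass)` target encoding are assumptions not taken from the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Lane:
    """Hypothetical execution lane with an operand buffer and a bypass path."""
    operand_buffer: list = field(default_factory=list)
    bypassed: Optional[int] = None  # word forwarded directly, skipping the buffer


def wide_load(cache_line: bytes, word_size: int, first_word: int, count: int) -> list:
    """Load `count` consecutive words from one cache line in a single step."""
    if first_word < 0 or count < 0 or (first_word + count) * word_size > len(cache_line):
        raise ValueError("requested words fall outside the cache line")
    words = []
    for i in range(first_word, first_word + count):
        chunk = cache_line[i * word_size:(i + 1) * word_size]
        # Assumption: words are stored little-endian.
        words.append(int.from_bytes(chunk, "little"))
    return words


def write_back(words: list, lanes: list, targets: list) -> None:
    """Send each loaded word to its selected lane.

    targets[i] is (lane_index, bypass): when bypass is True the word skips
    the operand buffer, otherwise it is appended to the buffer.
    """
    for word, (lane_index, bypass) in zip(words, targets):
        lane = lanes[lane_index]
        if bypass:
            lane.bypassed = word
        else:
            lane.operand_buffer.append(word)
```

For example, loading two 4-byte words starting at word 2 of a 32-byte line and routing the first into lane 0's operand buffer and the second through lane 1's bypass path lets both lanes consume data from the same cache access.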

Semiconductor device and continuous reading method

A continuous reading method of a flash memory is provided, including: after outputting data held in a cache memory (C0) of a latch (L1) of a page buffer/sensing circuit, data of the cache memory (C0) of a next page is read from a memory cell array, and the read data of the cache memory (C0) is held in the latch (L1). After outputting data held in the cache memory (C1) of the latch (L1), data of the same next page of the cache memory (C1) is read from the memory cell array, and the read data of the cache memory (C1) is held in the latch (L1).

MEMORY ACCESS BOUNDS CHECKING FOR A PROGRAMMABLE ATOMIC OPERATOR
20220121567 · 2022-04-21

Devices and techniques for memory access bounds checking for a programmable atomic operator are described herein. A processor can execute a programmable atomic operator with a base memory address. The processor can obtain a memory interleave size indicator corresponding to the programmable atomic operator and calculate a contiguous memory address range from the base memory address and the memory interleave size. The processor can then detect that a memory request from the programmable atomic operator is outside the contiguous memory address range and deny the memory request when it is outside of the contiguous memory address range and allow the memory request otherwise.

Techniques for handling cache coherency traffic for contended semaphores

The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.
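The per-core table and its probe response can be sketched as a small state machine. The abstract does not enumerate the states, so the two states below (`CONTENDED` and `HELD_EXCLUSIVE`), the method names, and the `"ACK"`/`"NACK"` strings are hypothetical simplifications.

```python
from enum import Enum, auto


class State(Enum):
    # Assumed states; the abstract mentions "a series of states" without naming them.
    CONTENDED = auto()       # address is known to be a contended lock
    HELD_EXCLUSIVE = auto()  # a spin-loop load obtained the line in an exclusive state


class LockAddressContentionTable:
    """Per-core table of contended lock addresses and their tracking state."""

    def __init__(self):
        self._entries = {}

    def record_contended_lock(self, addr: int) -> None:
        # Start tracking an address; keep the existing state if already tracked.
        self._entries.setdefault(addr, State.CONTENDED)

    def spin_load_completed(self, addr: int, exclusive: bool) -> None:
        # Advance the state only for tracked addresses whose load was exclusive.
        if addr in self._entries and exclusive:
            self._entries[addr] = State.HELD_EXCLUSIVE

    def respond_to_probe(self, addr: int) -> str:
        # Refuse to give up the line while the semaphore is held exclusively.
        if self._entries.get(addr) is State.HELD_EXCLUSIVE:
            return "NACK"
        return "ACK"

    def lock_acquired(self, addr: int) -> None:
        # Assumption: tracking stops once the semaphore has been acquired.
        self._entries.pop(addr, None)
```

In this sketch a core answers probes for a tracked address with `"NACK"` only between the exclusive spin-loop load and the acquisition of the semaphore, which is the window in which keeping the cache line avoids an unnecessary transfer.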
