G06F12/0844

Cache efficient reading of result values in a column store database
11119742 · 2021-09-14 · ·

A system for cache efficient reading of column values in a database is provided. In some aspects, the system performs operations including pre-fetching, asynchronously and in response to a request for data in a column store database system, a plurality of first values associated with the requested data. The request may identify a row of the column store database system associated with the requested data. The plurality of first values may be located in the row. The operations may further include storing the plurality of first values in a cache memory. The operations may further include pre-fetching, asynchronously and based on the plurality of first values, a plurality of second values. The operations may further include storing the plurality of second values in the cache memory. The operations may further include reading, in response to the storing the plurality of second values, the requested data from the cache memory.

BARRIERLESS AND FENCELESS SHARED MEMORY SYNCHRONIZATION
20210286619 · 2021-09-16 ·

When communicating through shared memory, a producer thread generates a value that is written to a location in a shared memory. The value is read from the shared memory by a consumer thread. The challenge is to ensure that the consumer thread reads the location only after the value is written and is thereby synchronized. When a memory location is written by a producer thread, a flag that is simultaneously stored in the memory location along with the value is toggled. The consumer thread tracks information to determine whether the flag stored in the location indicates whether the producer has written the value to the location. The flag is read and written simultaneously with reading and writing the location in memory, thereby eliminating the need for a memory fence. After all of the consumer threads read the value, the location may be reused to write additional value(s) and simultaneously toggle the flag.

Timed data transfer between a host system and a memory sub-system
11113198 · 2021-09-07 · ·

A memory sub-system configured to schedule the transfer of data from a host system for write commands to reduce the amount and time of data being buffered in the memory sub-system. For example, after receiving a plurality of streams of write commands from a host system, the memory sub-system identifies a plurality of media units in the memory sub-system for concurrent execution of a plurality of write commands respectively. In response to the plurality of commands being identified for concurrent execution in the plurality of media units respectively, the memory sub-system initiates communication of the data of the write commands from the host system to a local buffer memory of the memory sub-system. The memory sub-system has capacity to buffer write commands in a queue, for possible out of order execution, but limited capacity for buffering only the data of a portion of the write commands that are about to be executed.

Timed data transfer between a host system and a memory sub-system
11113198 · 2021-09-07 · ·

A memory sub-system configured to schedule the transfer of data from a host system for write commands to reduce the amount and time of data being buffered in the memory sub-system. For example, after receiving a plurality of streams of write commands from a host system, the memory sub-system identifies a plurality of media units in the memory sub-system for concurrent execution of a plurality of write commands respectively. In response to the plurality of commands being identified for concurrent execution in the plurality of media units respectively, the memory sub-system initiates communication of the data of the write commands from the host system to a local buffer memory of the memory sub-system. The memory sub-system has capacity to buffer write commands in a queue, for possible out of order execution, but limited capacity for buffering only the data of a portion of the write commands that are about to be executed.

CACHE SNOOPING MODE EXTENDING COHERENCE PROTECTION FOR CERTAIN REQUESTS
20210182198 · 2021-06-17 ·

A cache memory includes a data array, a directory of contents of the data array that specifies coherence state information, and snoop logic that processes operations snooped from a system fabric by reference to the data array and the directory. The snoop logic, responsive to snooping on the system fabric a request of a first flush/clean memory access operation that specifies a target address, determines whether or not the cache memory has coherence ownership of the target address. Based on determining the cache memory has coherence ownership of the target address, the snoop logic services the request and thereafter enters a referee mode. While in the referee mode, the snoop logic protects a memory block identified by the target address against conflicting memory access requests by the plurality of processor cores until conclusion of a second flush/clean memory access operation that specifies the target address.

ENHANCED FILESYSTEM SUPPORT FOR ZONE NAMESPACE MEMORY
20210157720 · 2021-05-27 ·

A processing device in a memory sub-system identifies a first memory device and a second memory device and configures the second memory device with a zone namespace. The processing device identifies a first portion and a second portion of the first memory device, the first portion storing zone namespace metadata corresponding to the zone namespace on the second memory device. The processing device further exposes the second portion of the first memory device to a host system as a non-zoned addressable memory region.

ENHANCED FILESYSTEM SUPPORT FOR ZONE NAMESPACE MEMORY
20210157720 · 2021-05-27 ·

A processing device in a memory sub-system identifies a first memory device and a second memory device and configures the second memory device with a zone namespace. The processing device identifies a first portion and a second portion of the first memory device, the first portion storing zone namespace metadata corresponding to the zone namespace on the second memory device. The processing device further exposes the second portion of the first memory device to a host system as a non-zoned addressable memory region.

Semiconductor device and memory access setup method

Limitations on memory access decrease the computing capability of related-art semiconductor devices during convolution processing in a convolutional neural network. A semiconductor device according to an aspect of the present invention includes an accelerator section that performs computation on a plurality of intermediate layers included in a convolutional neural network by using a memory having a plurality of banks capable of changing the read/write status on an individual bank basis. The accelerator section includes a network layer control section that controls a memory control section in such a manner as to change the read/write status assigned to the banks storing input data or output data of the intermediate layers in accordance with the transfer amounts and transfer rates of the input data and output data of the intermediate layers included in the convolutional neural network.

SYSTEM AND METHOD OF A HIGHLY CONCURRENT CACHE REPLACEMENT ALGORITHM

Disclosed are a method and system for managing multi-threaded concurrent access to a cache data structure. The cache data structure includes a hash table and three queues. The hash table includes a list of elements for each hash bucket with each hash bucket containing a mutex object and elements in each of the queues containing lock objects. Multiple threads can each lock a different hash bucket to have access to the list, and multiple threads can each lock a different element in the queues. The locks permit highly concurrent access to the cache data structure without conflict. Also, atomic operations are used to obtain pointers to elements in the queues so that a thread can safely advance each pointer. Race conditions that are encountered with locking an element in the queues or entering an element into the hash table are detected, and the operation encountering the race condition is retried.

SYSTEM AND METHOD OF A HIGHLY CONCURRENT CACHE REPLACEMENT ALGORITHM

Disclosed are a method and system for managing multi-threaded concurrent access to a cache data structure. The cache data structure includes a hash table and three queues. The hash table includes a list of elements for each hash bucket with each hash bucket containing a mutex object and elements in each of the queues containing lock objects. Multiple threads can each lock a different hash bucket to have access to the list, and multiple threads can each lock a different element in the queues. The locks permit highly concurrent access to the cache data structure without conflict. Also, atomic operations are used to obtain pointers to elements in the queues so that a thread can safely advance each pointer. Race conditions that are encountered with locking an element in the queues or entering an element into the hash table are detected, and the operation encountering the race condition is retried.