Patent classifications
G06F12/0842
Cache memory architecture and management
Aspects of the present disclosure relate to data cache management. In embodiments, a storage array's memory is provisioned with cache memory, wherein the cache memory includes one or more sets of distinctly sized cache slots. Additionally, a logical storage volume (LSV) is established with at least one logical block address (LBA) group. Further, at least one of the LSV's LBA groups is associated with two or more distinctly sized cache slots based on an input/output (IO) workload received by the storage array.
MANAGING DATA STORED IN A CACHE USING A REINFORCEMENT LEARNING AGENT
Managing data stored in a cache using a reinforcement learning agent may include: determining a set of current state observations with respect to a cache, wherein the set of current state observations is determined based on historical cache accesses to the cache; inputting the set of current state observations into an actor network of a reinforcement learning (RL) agent to obtain an action output by the actor network, wherein the RL agent is configured to manage data stored at the cache; inputting the set of current state observations and the action into a critic network of the RL agent to obtain a score corresponding to the action from the critic network; causing the RL agent to perform the action with respect to managing the data stored at the cache; using the score to update the actor network; and using a reward corresponding to the action to update the critic network.
CACHE COHERENCE VALIDATION USING DELAYED FULFILLMENT OF L2 REQUESTS
Methods and systems for validating cache coherence in a data processing system are described. A processing element may detect a load instruction requesting the processing element to transfer data from a global memory location to a local memory location. The processing element may apply, in response to detecting the load instruction requesting the processing element to transfer data from the global memory location to the local memory location, a delay to the transfer of the data from the global memory location to the local memory location. The processing element may execute the load instruction and transferring the data from the global memory location to the local memory location with the applied delay. The processing element may validate, in response to executing the load instruction and transferring the data with the applied delay, a cache coherence of the data processing system.
APPARATUS AND METHOD FOR HANDLING STASH REQUESTS
An apparatus and method for handling stash requests are described. The apparatus has a processing element with an associated storage structure that is used to store data for access by the processing element, and an interface for coupling the processing element to interconnect circuitry. Stash request handling circuitry is also provided that, in response to a stash request targeting the storage structure being received at the interface from the interconnect circuitry, causes a block of data associated with the stash request to be stored within the storage structure. The stash request identifies a given address that needs translating into a corresponding physical address in memory, and also identifies an address space key. Address translation circuitry is used to convert the given address identified by the stash request into the corresponding physical address by performing an address translation that is dependent on the address space key identified by the stash request. The stash request handling circuitry is then responsive to the corresponding physical address determined by the address translation circuitry to cause the block of data to be stored at a location within the storage structure associated with the physical address.
APPARATUS AND METHOD FOR HANDLING STASH REQUESTS
An apparatus and method for handling stash requests are described. The apparatus has a processing element with an associated storage structure that is used to store data for access by the processing element, and an interface for coupling the processing element to interconnect circuitry. Stash request handling circuitry is also provided that, in response to a stash request targeting the storage structure being received at the interface from the interconnect circuitry, causes a block of data associated with the stash request to be stored within the storage structure. The stash request identifies a given address that needs translating into a corresponding physical address in memory, and also identifies an address space key. Address translation circuitry is used to convert the given address identified by the stash request into the corresponding physical address by performing an address translation that is dependent on the address space key identified by the stash request. The stash request handling circuitry is then responsive to the corresponding physical address determined by the address translation circuitry to cause the block of data to be stored at a location within the storage structure associated with the physical address.
Memory system architecture for multi-threaded processors
Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system includes a system comprising a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines, a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations, and then accumulate the generated results, and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.
Methods and Systems for Memory Bandwidth Control
Resources of an electronic device are partitioned into a plurality of resource portions to be utilized by a plurality of clients. Each resource portion is assigned to a respective client, has a respective partition identifier (ID), and corresponds to a plurality memory bandwidth usage states tracked for a plurality of memory blocks. For each resource portion, each of the memory bandwidth usage states is associated with a respective memory block and indicates at least how much of a memory access bandwidth assigned to the respective partition ID to access the respective memory block is used. A usage level is determined for each resource partition based on the memory bandwidth usage states, and applied to adjust a credit count. When the credit count is adjusted beyond a request issue threshold, a next data access request is issued from a memory access request queue for the respective partition ID.
Instruction Cache for Hardware Multi-Thread Microprocessor
Embodiments are provided for instructions cache system for a hardware multi-thread microprocessor. In some embodiments, a cache controller device includes multiple interfaces connected to a hardware multi-thread microprocessor. A first interface of the multiple interfaces can receive a fetch request from a first execution thread during a first clock cycle. A second interface of the multiple interfaces can receive a fetch request from a second execution thread during a second clock cycle after the first clock cycle. The cache controller device also includes a multiplexer to send first response signals in response to the fetch request from the first execution thread, and also to send second response signals in response to the fetch request from the second execution thread.
APPARATUS AND METHOD WITH CACHE CONTROL
A computing apparatus is provided. The computing apparatus is configured to receive control information from a host device to control a cache area, generate a cache configuration based on the received control information, determine a first cache area and a second cache area in a memory in the computing apparatus based on the generated cache configuration, cache one or more instructions to the first cache area and cache data to the second cache area, and process a thread based on the one or more cached instructions and the cached data.
Allocation of memory resources to SIMD workgroups
A memory subsystem for use with a single-instruction multiple-data (SIMD) processor comprising a plurality of processing units configured for processing one or more workgroups each comprising a plurality of SIMD tasks, the memory subsystem comprising: a shared memory partitioned into a plurality of memory portions for allocation to tasks that are to be processed by the processor; and a resource allocator configured to, in response to receiving a memory resource request for first memory resources in respect of a first-received task of a workgroup, allocate to the workgroup a block of memory portions sufficient in size for each task of the workgroup to receive memory resources in the block equivalent to the first memory resources.