Patent classifications
G06F2212/6046
PARTITIONING TLB OR CACHE ALLOCATION
A request for data from a cache (TLB or data/instruction cache) specifies a partition identifier allocated to a software execution environment associated with the request. Allocation of data to the cache is controlled based on a set of configuration information selected based on the partition identifier specified by the request. For a TLB, this allows different allocation policies to be used for requests associated with different software execution environments. In one example, the cache allocation is controlled based on an allocation threshold specified by the selected set of configuration information, which limits the maximum number of cache entries allowed to be allocated with data associated with the corresponding partition identifier.
Distributed queue pair state on a host channel adapter
A method for managing a distributed cache of a host channel adapter (HCA) that includes receiving a work request including a QP number, determining that a QP state identified by the QP number is not in the distributed cache, retrieving the QP state from main memory, and identifying a first portion and a second portion of the QP state. The method further includes storing the first portion into a first entry of a first sub-cache block associated with the first module, where the first entry is identified by a QP index number, storing the second portion into a second entry of a second sub-cache block associated with the second module, where the second entry is identified by the QP index number; and returning the QP index number of the QP state to the first module and the second module.
Placement policy for memory hierarchies
A placement policy enables the selective storage of cachelines in a multi-level cache hierarchy: Reuse behavior of a cacheline is tracked during execution of an application in both a first level cache memory and a second level cache memory. A cache placement policy for the cacheline is determined based on the tracked reuse behavior.
Methods and apparatus for data request scheduling in performing parallel IO operations
Methods and apparatus for data request scheduling in performing parallel IO operations are disclosed. In one example, IO requests directed to an operating system having an IO scheduling component are processed. There, an IO request directed from an application to the operating system is intercepted. A determination is made whether the IO request is subject to immediate processing using available parallel processing resources. When it is determined that the IO request is subject to immediate processing using the available parallel processing resources, the IO scheduling component of the operating system is bypassed. The IO request is directly and immediately processed and passed back to the application using the available parallel processing resources.
ELECTRONIC SYSTEM WITH MEMORY MANAGEMENT MECHANISM AND METHOD OF OPERATION THEREOF
An electronic system includes: a processor configured to access operation data; a high speed local memory, coupled to the processor, configured to store a limited amount of the operation data; and a memory subsystem, coupled to the high speed local memory, including: a module memory controller configured to access the operational data for the processor and the high speed local memory, a local cache controller, coupled to the module memory controller, including a fast control bus configured to store the operation data, with critical timing in a first tier memory, and a memory tier controller, coupled to the local cache controller, including a reduced performance control bus configured to store the operation data with non-critical timing in a second tier memory.
DATA PROCESSING
A data processing system comprises a master node to initiate data transmissions; one or more slave nodes to receive the data transmissions; and a home node to control coherency amongst data stored by the data processing system; in which at least one data transmission from the master node to one of the one or more slave nodes bypasses the home node.
DYNAMIC CACHE BYPASSING
A processing system fills a memory access request for data from a processor core by bypassing a cache when a write congestion condition is detected, and when transferring the data to the cache would cause eviction of a dirty cache line. The cache is bypassed by transferring the requested data to the processor core or to a different cache. Accordingly, the processing system can temporarily bypass the cache storing the dirty cache line when filling a memory access request, thereby avoiding the eviction and write back to main memory of a dirty cache line when a write congestion condition exists.
MEMORY ALLOCATION IN MULTI-CORE PROCESSORS
A system for allocating memory (e.g., heap) in multi-core processors is provided. In some implementations, the system performs operations comprising receiving, at a shared cache having a plurality of segments, a first data allocation including a plurality of data blocks, and allocating at least a first and second data block from the first allocation. First and second segments in the shared cache can each comprise a plurality of data slots (e.g., of equal length). Allocating the first and second data blocks can include storing the first data block in a data slot of the first segment and storing the second data block in a data slot of the second segment. The plurality of data slots which do not contain data may contain padding, and/or the data slots to which the first and second data blocks are allocated are not adjacent. Related systems, methods, and articles of manufacture are also described.
Dynamic cache sharing based on power state
The present invention discloses a method comprising: sending cache request; monitoring power state; comparing said power state; allocating cache resources; filling cache; updating said power state; repeating said sending, said monitoring, said comparing, said allocating, said filling, and said updating until workload is completed.
PROCESSOR WITH INSTRUCTION CACHE THAT PERFORMS ZERO CLOCK RETIRES
A method of operating a processor including performing successive read cycles from an instruction cache array and a line buffer array including providing sequential memory addresses, detecting a read hit in the line buffer array, and performing a zero clock retire while performing successive read cycles. The zero clock retire includes switching the instruction cache array from a read cycle to a write cycle for one cycle, selecting a line buffer and providing a cache line stored in the selected line buffer to be stored into the instruction cache array at an address stored in the selected line buffer, and bypassing a sequential memory address being provided to the instruction cache array during the zero clock retire. If the bypassed address missed the line buffer array, the bypassed address may be replayed with a slight time penalty, which is outweighed by the time savings of zero clock retires.