Patent classifications
G06F9/3816
Hardware coherent computational expansion memory
Embodiments herein describe transferring ownership of data (e.g., cachelines or blocks of data comprising multiple cachelines) from a host to hardware in an I/O device. In one embodiment, the host and I/O device (e.g., an accelerator) are part of a cache-coherent system where ownership of data can be transferred from a home agent (HA) in the host to a local HA in the I/O device—e.g., a computational slave agent (CSA). That way, a function on the I/O device (e.g., an accelerator function) can request data from the local HA without these requests having to be sent to the host HA. Further, the accelerator function can indicate whether the local HA tracks the data on a cacheline-basis or by a data block (e.g., multiple cachelines). This provides flexibility that can reduce overhead from tracking the data, depending on the function's desired use of the data.
Sequence alignment method of vector processor
A sequence alignment method that may be performed by a vector processor is may include loading a sequence that is an instance of vector data including a plurality of elements, dividing the sequence into two groups, aligning respective elements of the groups to generate a sequence of sorted elements according to a single instruction multiple data mode, and iteratively performing an alignment operation based on a determination that each group in the sequence of sorted elements includes more than one element of the plurality of elements. Each iteration may include dividing each group to form new groups and aligning respective elements of each pair of adjacent new groups to generate a new sequence of sorted elements. The new sequence of a current iteration of the alignment operation may be transmitted as a data output, based on a determination that each new group does not include more than one element.
CACHE LINE INVALIDATION TECHNOLOGIES
Examples described herein relate to a device issuing a single command to request invalidation of multiple cache lines associated with a memory address range in a cache device. In some examples, the cache device is associated with the processor. In some examples, the processor comprises one or more of a central processing unit (CPU), core, or graphics processing unit (GPU).
Method for managing the supply of information, such as instructions, to a microprocessor, and a corresponding system
A processor interacts with a memory set including a cache memory, a first memory storing at least a first piece of information in a first information group, and a second memory storing at least a second piece of information in a second information group. In response to a first cache miss and following a first request from the processor for the first piece of information, the first piece of information obtained from the first memory is supplied to the processor. After a second request from the processor for the second piece of information, the second piece of information obtained from the second memory is supplied to the processor, even if the first information group is currently being transferred from the first memory for loading into the cache memory.
WAY PREDICTOR AND ENABLE LOGIC FOR INSTRUCTION TIGHTLY-COUPLED MEMORY AND INSTRUCTION CACHE
Disclosed herein are systems and method for instruction tightly-coupled memory (iTIM) and instruction cache (iCache) access prediction. A processor may use a predictor to enable access to the iTIM or the iCache and a particular way (a memory structure) based on a location state and program counter value. The predictor may determine whether to stay in an enabled memory structure, move to and enable a different memory structure, or move to and enable both memory structures. Stay and move predictions may be based on whether a memory structure boundary crossing has occurred due to sequential instruction processing, branch or jump instruction processing, branch resolution, and cache miss processing. The program counter and a location state indicator may use feedback and be updated each instruction-fetch cycle to determine which memory structure(s) needs to be enabled for the next instruction fetch.
Partial write management in a multi-tiled compute engine
Embodiments described herein provide a general purpose graphics processor comprising a plurality of tiles, each tile of the plurality of tiles comprising at least one execution unit, a local cache, and a cache control unit, and a high bandwidth memory communicatively coupled to the plurality of tiles, wherein the high bandwidth memory is shared between the plurality of tiles. The cache control unit is to implement a partial write management protocol to receive a partial write operation directed to a cache line in the local cache, the partial write operation comprising write data, write the data associated with the partial write operation to the local cache when the cache line is in a modified state, and forward the write data associated with the partial write operation to the high bandwidth memory when the partial write operation triggers a cache miss or when the cache line is in an exclusive state or a shared state. Other embodiments may be described and claimed.
HARDWARE COHERENT COMPUTATIONAL EXPANSION MEMORY
Embodiments herein describe transferring ownership of data (e.g., cachelines or blocks of data comprising multiple cachelines) from a host to hardware in an I/O device. In one embodiment, the host and I/O device (e.g., an accelerator) are part of a cache-coherent system where ownership of data can be transferred from a home agent (HA) in the host to a local HA in the I/O device—e.g., a computational slave agent (CSA). That way, a function on the I/O device (e.g., an accelerator function) can request data from the local HA without these requests having to be sent to the host HA. Further, the accelerator function can indicate whether the local HA tracks the data on a cacheline-basis or by a data block (e.g., multiple cachelines). This provides flexibility that can reduce overhead from tracking the data, depending on the function's desired use of the data.
Pipelined read-modify-write operations in cache memory
In described examples, a processor system includes a processor core that generates memory write requests, a cache memory, and a memory pipeline of the cache memory. The memory pipeline has a holding buffer, an anchor stage, and an RMW pipeline. The anchor stage determines whether a data payload of a write request corresponds to a partial write. If so, the data payload is written to the holding buffer and conforming data is read from a corresponding cache memory address to merge with the data payload. The RMW pipeline has a merge stage and a syndrome generation stage. The merge stage merges the data payload in the holding buffer with the conforming data to make merged data. The syndrome generation stage generates an ECC syndrome using the merged data. The memory pipeline writes the data payload and ECC syndrome to the cache memory.
Apparatus and method for managing capability metadata
Apparatus comprising cache storage and a method of operating such a cache storage are provided. Data blocks in the cache storage have capability metadata stored in association therewith identifying whether the data block specifies a capability or a data value. At least one type of capability is a bounded pointer. Responsive to a write to a data block in the cache storage a capability metadata modification marker is set in association with the data block, indicative of whether the capability metadata associated with the data block has changed since the data block was stored in the cache storage. This supports the security of the system, such that modification of the use of a data block from a data value to a capability cannot take place unless intended. Efficiencies may also result when capability metadata is stored separately from other data in memory, as fewer accesses to memory can be made.
Hardware secure element, related processing system, integrated circuit, device and method
A hardware secure element includes a processing unit and a receiver circuit configured to receive data comprising a command field and a parameter field adapted to contain a plurality of parameters. The hardware secure element also includes at least one hardware parameter check module configured to receive at an input a parameter to be processed selected from the plurality of parameters, and to process the parameter to be processed to verify whether the parameter has given characteristics. The hardware parameter check module has associated one or more look-up tables configured to receive at an input the command field and a parameter index identifying the parameter to be processed by the hardware parameter check module, and to determine for the command field and the parameter index a configuration data element.