Patent classifications
G06F2212/27
Method, apparatus and system for optimizing cache memory transaction handling in a processor
In one embodiment, a processor includes a caching home agent (CHA) coupled to a core and a cache memory, the CHA including a cache controller having a cache pipeline and a home agent having a home agent pipeline. The CHA may: receive, in the home agent pipeline, information from an external agent responsive to a miss for data in the cache memory; issue a global ordering signal from the home agent pipeline to a requester of the data to inform the requester of receipt of the data; and report issuance of the global ordering signal to the cache pipeline, to prevent the cache pipeline from issuing a duplicate global ordering signal to the requester. Other embodiments are described and claimed.
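A minimal software sketch of the duplicate-suppression idea above: the home agent pipeline issues the global ordering (GO) signal as soon as fill data arrives from the external agent, and records that fact so the cache pipeline does not issue a second GO for the same address. All types and member names here are illustrative assumptions, not the patent's actual interfaces.

    // Sketch: home agent pipeline issues GO and reports it; the cache
    // pipeline suppresses its own GO for addresses already ordered.
    #include <cstdint>
    #include <unordered_set>

    struct Requester {
        void receive_global_ordering(uint64_t /*addr*/) {}  // stub requester
    };

    class CachingHomeAgent {
        std::unordered_set<uint64_t> go_issued_;  // addresses GO'd by the home agent

    public:
        // Home agent pipeline: fill data arrived from an external agent
        // in response to a cache memory miss.
        void home_agent_on_fill(uint64_t addr, Requester& req) {
            req.receive_global_ordering(addr);  // issue GO from the home agent pipeline
            go_issued_.insert(addr);            // report issuance to the cache pipeline
        }

        // Cache pipeline: would normally issue GO on fill completion, but
        // must not duplicate one already issued by the home agent pipeline.
        void cache_pipeline_on_fill(uint64_t addr, Requester& req) {
            if (go_issued_.erase(addr))
                return;                         // GO already sent; suppress duplicate
            req.receive_global_ordering(addr);
        }
    };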
No-locality hint vector memory access processors, methods, systems, and instructions
A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction is to indicate a packed data register of the plurality of packed data registers that is to have source packed memory indices. The source packed memory indices are to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.
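Behaviorally, such an instruction resembles a gather whose element accesses carry a non-temporal hint. The C++ sketch below models that behavior in software; the lane count, the names, and the use of clang's __builtin_nontemporal_load as a stand-in for the hinted access are all assumptions, since the patent describes an ISA-level instruction rather than a library call.

    #include <array>
    #include <cstddef>

    constexpr std::size_t kLanes = 8;             // assumed vector width

    using Indices = std::array<int, kLanes>;      // source packed memory indices
    using Vector  = std::array<float, kLanes>;    // packed data result

    // Gather data elements at base[idx[lane]], passing the no-locality
    // hint through to each element access.
    Vector gather_no_locality(const float* base, const Indices& idx) {
        Vector result{};
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
    #if defined(__clang__)
            // Hinted access: a non-temporal load that avoids cache allocation.
            result[lane] = __builtin_nontemporal_load(&base[idx[lane]]);
    #else
            result[lane] = base[idx[lane]];       // plain load stands in for the hint
    #endif
        }
        return result;
    }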
Point cloud data sorting circuit, method, SoC chip, and computer device
This application provides a point cloud data sorting circuit and method. The point cloud data sorting circuit includes a sorting module, which contains a row caching unit and a row sorting unit. The row caching unit is configured to obtain and cache original point cloud data to be sorted. The original point cloud data is obtained by scanning the field of view of a LiDAR. Between two rows of original point cloud data obtained in one second scanning period lie N rows of other original point cloud data obtained in another second scanning period. The row sorting unit is configured to perform coordinate transformation on the original point cloud data to obtain target point cloud data sorted by pitch angle. Sorting the original point cloud data is thus performed by dedicated hardware, which improves the sorting speed.
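As a rough illustration of the row sorting unit's job, the C++ sketch below buffers interleaved rows and orders them by pitch angle, computed here as atan2(z, sqrt(x^2 + y^2)) from each row's first point. The data layout and the per-row pitch computation are assumptions; the patent implements this step in dedicated hardware.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    struct Point { float x, y, z; };
    using Row = std::vector<Point>;   // one scan row of points

    // Pitch angle of a row, taken from its first point (rows assumed non-empty).
    static float row_pitch(const Row& row) {
        const Point& p = row.front();
        return std::atan2(p.z, std::sqrt(p.x * p.x + p.y * p.y));
    }

    // Row sorting unit: order the cached, interleaved rows by pitch angle.
    std::vector<Row> sort_rows_by_pitch(std::vector<Row> cached_rows) {
        std::sort(cached_rows.begin(), cached_rows.end(),
                  [](const Row& a, const Row& b) { return row_pitch(a) < row_pitch(b); });
        return cached_rows;
    }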
Computing system and method for power-saving compute-in-memory design
A computing system with a power-saving compute-in-memory (CIM) design that minimizes the computation energy of matrix-matrix multiplication is shown. A processor control unit loads A blocks divided from a matrix A_MK from a second-level (L2) memory to a first-level (L1) memory, and loads B blocks divided from a matrix B_KN from the L2 memory to a CIM memory. The A blocks buffered in the L1 memory are programmed to a register file to be entered into the CIM memory. The CIM memory performs multiply-and-accumulate (MAC) calculations on the A blocks and the B blocks to generate C blocks, which form a matrix C_MN (= A_MK B_KN). Based on the sizes of A_MK and B_KN, the A block buffering capability of the L1 memory, and the B block buffering capability of the CIM memory, a reuse scheme is selected to reuse the buffered A blocks and B blocks.
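The data movement reads like a conventional blocked matrix multiply in which B blocks stay resident in the CIM array while A blocks stream past them. A minimal C++ sketch of that loop structure follows; the single block size and the loop order are illustrative, whereas the patent selects the reuse scheme from the actual L1 and CIM buffer capacities.

    #include <algorithm>
    #include <vector>

    using Matrix = std::vector<std::vector<float>>;

    // C = A * B computed block by block: each B block is "loaded" into the
    // CIM array once (outer kb loop) and reused across all A blocks.
    Matrix blocked_matmul(const Matrix& A, const Matrix& B,
                          int M, int K, int N, int block) {
        Matrix C(M, std::vector<float>(N, 0.0f));
        for (int kb = 0; kb < K; kb += block)            // B block resident in CIM
            for (int mb = 0; mb < M; mb += block)        // A blocks streamed from L1
                for (int m = mb; m < std::min(mb + block, M); ++m)
                    for (int k = kb; k < std::min(kb + block, K); ++k)
                        for (int n = 0; n < N; ++n)
                            C[m][n] += A[m][k] * B[k][n];  // MAC in the CIM array
        return C;
    }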
Memory for a neural network processing system
A method may include partitioning a tensor that includes a plurality of data values into a number (K) of subtensors, where each of the K subtensors may include a respective subset of the plurality of data values. The method may also include retrieving one or more first data values of the subset of data values included in a first subtensor of the K subtensors in accordance with an access pattern associated with a neural network processor. The method may include storing the one or more first data values of the subset of data values in one of K segments of cache memory, where each of the K segments may be associated with a respective one of the K subtensors. Further, the method may include processing, using the neural network processor, the one or more first data values of the subset of data values in accordance with the access pattern.
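A small C++ sketch of the partitioning step described above: a flat tensor is split into K contiguous subtensors, and each subtensor's values are staged into the cache segment associated with it. The flat representation, equal-sized chunks, and all names are assumptions; in practice the retrieval order would follow the neural network processor's access pattern.

    #include <cstddef>
    #include <vector>

    // One cache segment per subtensor.
    struct SegmentedCache {
        std::vector<std::vector<float>> segments;
        explicit SegmentedCache(std::size_t k) : segments(k) {}
    };

    // Partition a flat tensor into K contiguous subtensors and stage each
    // subtensor's values into its associated cache segment.
    SegmentedCache stage_subtensors(const std::vector<float>& tensor, std::size_t K) {
        SegmentedCache cache(K);
        const std::size_t chunk = (tensor.size() + K - 1) / K;  // subtensor size
        for (std::size_t i = 0; i < tensor.size(); ++i) {
            const std::size_t segment = i / chunk;  // which subtensor owns this value
            cache.segments[segment].push_back(tensor[i]);
        }
        return cache;  // segments are then consumed by the NN processor
    }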
Descriptor cache eviction for multi-queue direct memory access
Evicting queues from a memory of a direct memory access system includes monitoring a global eviction timer. From a plurality of descriptor lists stored in a plurality of entries of a cache memory, a set of candidate descriptor lists is determined. The set of candidate descriptor lists includes one or more of the plurality of descriptor lists that are in a prefetch-only state. An eviction event can be detected by detecting both a first eviction condition, which includes a state of the global eviction timer, and a second eviction condition. In response to detecting the eviction event, a descriptor list from the set of candidate descriptor lists is selected for eviction. The selected descriptor list can be evicted from the cache memory.
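The two-condition check might look like the C++ sketch below: an eviction event requires both an expired global eviction timer and a second condition (modeled here as a cache-pressure flag), after which a victim is chosen from the prefetch-only candidates. The timer threshold, the states, and the least-recently-used selection policy are illustrative assumptions.

    #include <chrono>
    #include <optional>
    #include <vector>

    enum class ListState { PrefetchOnly, Active };

    struct DescriptorList {
        int queue_id;                                      // owning DMA queue
        ListState state;
        std::chrono::steady_clock::time_point last_use;
    };

    // Returns the queue whose descriptor list should be evicted, if any.
    std::optional<int> select_victim(
        const std::vector<DescriptorList>& cache,
        std::chrono::steady_clock::time_point timer_start,
        std::chrono::milliseconds timer_threshold,
        bool second_condition /* e.g. cache pressure, an assumption */) {
        // First eviction condition: the global eviction timer has expired.
        const bool timer_expired =
            std::chrono::steady_clock::now() - timer_start >= timer_threshold;
        if (!timer_expired || !second_condition)
            return std::nullopt;                           // no eviction event

        // Candidate set: descriptor lists in the prefetch-only state;
        // pick the least recently used candidate for eviction.
        const DescriptorList* victim = nullptr;
        for (const auto& list : cache)
            if (list.state == ListState::PrefetchOnly &&
                (!victim || list.last_use < victim->last_use))
                victim = &list;
        return victim ? std::optional<int>(victim->queue_id) : std::nullopt;
    }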