G06F2212/27

NO-LOCALITY HINT VECTOR MEMORY ACCESS PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS
20230052652 · 2023-02-16 ·

A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction to indicate a packed data register of the plurality of packed data registers that is to have a source packed memory indices. The source packed memory indices to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.

MICROPROCESSOR THAT PREVENTS SAME ADDRESS LOAD-LOAD ORDERING VIOLATIONS USING PHYSICAL ADDRESS PROXIES
20220358038 · 2022-11-10 ·

A L2 set associative cache that is inclusive of an L1 cache. Each entry of a load queue holds a load physical address proxy (PAP) for a load physical memory line address (PMLA) rather than the load PMLA itself. The load PAP comprises the set index and the way that uniquely identifies the L2 entry that holds a memory line specified by the load PMLA. Each load queue entry indicates whether the load instruction has completed execution. The microprocessor removes a memory line at a removal PMLA from an L2 entry and forms a removal PAP as a proxy for the removal PMLA. The removal PAP comprises a set index and a way that uniquely identifies the removed entry. The microprocessor snoops the load queue with the removal PAP to determine whether the removal PAP matches one or more load PAPs in one or more load queue entries associated with one or more load instructions that have completed execution and, if so, signals an abort request.

PHYSICAL ADDRESS PROXY (PAP) RESIDENCY DETERMINATION FOR REDUCTION OF PAP REUSE

A L2 cache is set associative, has N ways, and is inclusive of a virtual L1 cache such that when the virtual address misses in the L1: a portion of the virtual address is translated into a physical memory line address (PMLA), the PMLA is allocated into an L2 entry, and a physical address proxy (PAP) for the PMLA is allocated into an L1 entry. The PAP for the PMLA includes a set and a way that uniquely identify the L2 entry. The L2 receives a physical memory line address for allocation, uses a set index portion of the PMLA, and for each L2 way, forms a PAP corresponding to the way. The L1, for each PAP, generates a corresponding indicator of whether the PAP is L1 resident. The L2 selects, for replacement, a way whose indicator indicates the PAP is not resident in the L1.

Control flow guided lock address prefetch and filtering

A method of prefetching target data includes, in response to detecting a lock-prefixed instruction for execution in a processor, determining a predicted target memory location for the lock-prefixed instruction based on control flow information associating the lock-prefixed instruction with the predicted target memory location. Target data is prefetched from the predicted target memory location to a cache coupled with the processor, and after completion of the prefetching, the lock-prefixed instruction is executed in the processor using the prefetched target data.

Coherent node controller

A cache coherent node controller at least includes one or more network interface controllers, each network interface controller includes at least one network interface, and at least two coherent interfaces each configured for communication with a microprocessor. A computer system includes one or more of nodes wherein each node is connected to at least one network switch, each node at least includes a cache coherent node controller.

No-locality hint vector memory access processors, methods, systems, and instructions
11392500 · 2022-07-19 · ·

A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction to indicate a packed data register of the plurality of packed data registers that is to have a source packed memory indices. The source packed memory indices to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.

Physical address proxy (PAP) residency determination for reduction of PAP reuse

A L2 cache is set associative, has N ways, and is inclusive of a virtual L1 cache such that when the virtual address misses in the L1: a portion of the virtual address is translated into a physical memory line address (PMLA), the PMLA is allocated into an L2 entry, and a physical address proxy (PAP) for the PMLA is allocated into an L1 entry. The PAP for the PMLA includes a set and a way that uniquely identify the L2 entry. The L2 receives a physical memory line address for allocation, uses a set index portion of the PMLA, and for each L2 way, forms a PAP corresponding to the way. The L1, for each PAP, generates a corresponding indicator of whether the PAP is L1 resident. The L2 selects, for replacement, a way whose indicator indicates the PAP is not resident in the L1.

Methods and apparatus to implement multiple inference compute engines

Methods and apparatus to implement multiple inference compute engines are disclosed herein. A disclosed example apparatus includes a first inference compute engine, a second inference compute engine, and an accelerator on coherent fabric to couple the first inference compute engine and the second inference compute engine to a converged coherency fabric of a system-on-chip, the accelerator on coherent fabric to arbitrate requests from the first inference compute engine and the second inference compute engine to utilize a single in-die interconnect port.

NO-LOCALITY HINT VECTOR MEMORY ACCESS PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS
20210141734 · 2021-05-13 ·

A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction to indicate a packed data register of the plurality of packed data registers that is to have a source packed memory indices. The source packed memory indices to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.

COHERENT NODE CONTROLLER
20210064530 · 2021-03-04 ·

A cache coherent node controller at least includes one or more network interface controllers, each network interface controller includes at least one network interface, and at least two coherent interfaces each configured for communication with a microprocessor. A computer system includes one or more of nodes wherein each node is connected to at least one network switch, each node at least includes a cache coherent node controller.