G06F9/3838

INSERTING A PROXY READ INSTRUCTION IN AN INSTRUCTION PIPELINE IN A PROCESSOR
20220365780 · 2022-11-17 ·

Inserting a proxy read instruction in an instruction pipeline in a processor is disclosed. A scheduler circuit is configured to recognize when a produced value generated by execution of a producer instruction in the instruction pipeline will not be available through a data forwarding path to be consumed for processing of a subsequent consumer instruction. In this case, the scheduler circuit is configured to insert a proxy read instruction in the instruction pipeline to cause execution of an operation that regenerates the same produced value as was generated by the previous execution of the producer instruction. The produced value thus remains available in the instruction pipeline, again reachable through a data forwarding path to an earlier pipeline stage, to be consumed by the consumer instruction, which may avoid a pipeline stall.
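As a rough software analogy, the sketch below models a scheduler that tracks how many instructions ago each register was last produced and inserts a proxy read when a consumer falls outside an assumed forwarding window. The `FORWARD_WINDOW` constant, the `Instr` class, and the `proxy_read` opcode are all hypothetical; this illustrates only the decision logic, not the patented circuit.

```python
FORWARD_WINDOW = 3  # stages across which a value can be forwarded (assumed)

class Instr:
    def __init__(self, op, dst=None, srcs=()):
        self.op, self.dst, self.srcs = op, dst, tuple(srcs)

def schedule(program):
    """Insert a proxy read when a consumer is too far behind its producer
    for the forwarding network to supply the value directly."""
    out = []
    last_def = {}  # register -> index in `out` where it was last produced
    for instr in program:
        for src in instr.srcs:
            produced_at = last_def.get(src)
            if produced_at is not None and len(out) - produced_at > FORWARD_WINDOW:
                # The value has drained out of the forwarding network:
                # insert a proxy read that re-materializes it so the
                # consumer can pick it up via forwarding instead of stalling.
                out.append(Instr("proxy_read", dst=src, srcs=(src,)))
                last_def[src] = len(out) - 1
        out.append(instr)
        if instr.dst is not None:
            last_def[instr.dst] = len(out) - 1
    return out
```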

Apparatus and methods for generating random vectors

Aspects of vector operations in a neural network are described herein. The aspects may include a controller unit configured to receive an instruction to generate a random vector that includes one or more elements. The instruction may include a predetermined distribution, a count of the elements, and an address of the random vector. The aspects may further include a computation module configured to generate the one or more elements, wherein the one or more elements are subject to the predetermined distribution.
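A minimal sketch of the instruction's semantics, in Python with NumPy. The instruction fields (distribution, element count, destination address) follow the abstract; the toy memory model and the particular distribution names are assumptions.

```python
import numpy as np

MEMORY = {}  # address -> stored vector (toy memory model, assumed)

def exec_random_vector(distribution, count, address, rng=None):
    """Execute a 'generate random vector' instruction: produce `count`
    elements drawn from `distribution` and store them at `address`."""
    rng = rng or np.random.default_rng()
    if distribution == "uniform":
        vec = rng.uniform(0.0, 1.0, size=count)
    elif distribution == "normal":
        vec = rng.normal(0.0, 1.0, size=count)
    else:
        raise ValueError(f"unsupported distribution: {distribution}")
    MEMORY[address] = vec  # the computation module writes the elements back
    return vec

# e.g. exec_random_vector("normal", 4, 0x1000) stores four N(0, 1) samples
```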

Servicing CPU demand requests with inflight prefetches

Disclosed embodiments provide a technique in which a memory controller determines whether a fetch address is a miss in an L1 cache and, when a miss occurs, allocates a way of the L1 cache, determines whether the allocated way matches a scoreboard entry of pending service requests, and, when such a match is found, determines whether the request address of the matching scoreboard entry matches the fetch address. When the matching scoreboard entry also has a request address matching the fetch address, the scoreboard entry is converted into a demand request.
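The flow might be modeled as below, assuming a scoreboard of pending service requests whose entries record an allocated way, a request address, and a demand/prefetch flag; the entry fields and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ScoreboardEntry:
    way: int
    request_address: int
    is_demand: bool  # False while the entry is an inflight prefetch

def handle_fetch(fetch_addr, l1_hit, allocate_way, scoreboard):
    if l1_hit(fetch_addr):              # L1 hit: no service request needed
        return "hit"
    way = allocate_way(fetch_addr)      # miss: allocate a way for the fill
    for entry in scoreboard:
        # First match on the allocated way, then confirm the addresses agree.
        if entry.way == way and entry.request_address == fetch_addr:
            entry.is_demand = True      # upgrade the inflight prefetch
            return "merged-with-prefetch"
    return "new-demand-request"
```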

PHYSICAL ADDRESS PROXY REUSE MANAGEMENT

Each load/store queue entry holds a load/store physical address proxy (PAP) for use as a proxy for a load/store physical memory line address (PMLA). The load/store PAP comprises a set index and a way that uniquely identify an L2 cache entry holding the memory line at the load/store PMLA; an L1 cache provides the load/store PAP during execution of the load/store instruction. The microprocessor removes a line at a removal PMLA from an L2 entry, forms a removal PAP, comprising a set index and a way, as a proxy for the removal PMLA, snoops the load/store queue with the removal PAP to determine whether the removal PAP is being used as a proxy for the removal PMLA, fills the removed entry with a line at a fill PMLA, and prevents the removal PAP from being used as a proxy for the removal PMLA and the fill PMLA concurrently.
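In software terms, the removal-time snoop might look like the sketch below, assuming a PAP is a (set index, way) pair and each load/store queue entry records the PAP it is using; the data structures are illustrative, not the patent's.

```python
def evict_and_fill(l2, set_index, way, fill_line, load_store_queue):
    removal_pap = (set_index, way)
    # Snoop the load/store queue: is any entry still using this PAP as a
    # proxy for the line being removed?
    in_use = any(entry.pap == removal_pap for entry in load_store_queue)
    if in_use:
        # Hold off (stall, or force the entries to resolve) so the same
        # (set, way) never stands for the old line and the new line at once.
        return False
    l2[set_index][way] = fill_line  # safe: the PAP now names only the fill line
    return True
```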

MICROPROCESSOR THAT PREVENTS SAME ADDRESS LOAD-LOAD ORDERING VIOLATIONS
20220358047 · 2022-11-10 ·

A microprocessor prevents same-address load-load ordering violations. Each load queue entry holds a load physical memory line address (PMLA) and an indication of whether the load instruction has completed execution. The microprocessor fills a line specified by a fill PMLA into a cache entry and snoops the load queue with the fill PMLA, either before the fill or atomically with the fill with respect to the ability of the filled entry to be hit by any load instruction, to determine whether the fill PMLA matches load PMLAs in load queue entries associated with load instructions that have completed execution and whether there are other load instructions in the load queue that have not completed execution. If both conditions are true, the microprocessor flushes at least the load instructions that have not completed execution.
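A behavioral sketch of the fill-time snoop, assuming a load queue whose entries carry a PMLA and a completed flag; the flush policy follows the abstract, while the structure names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class LoadEntry:
    pmla: int
    completed: bool

def snoop_on_fill(fill_pmla, load_queue, flush):
    completed_match = any(e.completed and e.pmla == fill_pmla
                          for e in load_queue)
    incomplete = [e for e in load_queue if not e.completed]
    if completed_match and incomplete:
        # An older load already observed the pre-fill value, so a younger
        # load to the same line could now read newer data, violating
        # load-load ordering. Flush the incomplete loads to re-execute them.
        flush(incomplete)
```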

STORE-TO-LOAD FORWARDING CORRECTNESS CHECKS AT STORE INSTRUCTION COMMIT
20220357955 · 2022-11-10 ·

A microprocessor includes a load queue, a store queue, and a load/store unit that, during execution of a store instruction, records store information to a store queue entry. The store information comprises store address and store size information about the store data to be stored by the store instruction. During execution of a load instruction that is younger in program order than the store instruction, the load/store unit performs forwarding behavior, i.e., forwards or does not forward the store data from the store instruction to the load instruction; records load information, comprising load address and load size information about the load data to be loaded by the load instruction, to a load queue entry; and records the forwarding behavior in the load queue entry. During commit of the store instruction, the load/store unit uses the recorded store information, the recorded load information, and the recorded forwarding behavior to check the correctness of the forwarding behavior.
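The commit-time check might be sketched as follows, assuming a simplified rule in which forwarding is correct exactly when the store's byte range fully covers the load's; the record fields and this coverage rule are illustrative simplifications.

```python
from dataclasses import dataclass

@dataclass
class LoadRecord:
    addr: int
    size: int
    forwarded: bool  # forwarding behavior recorded at load execution

def fully_covers(s_addr, s_size, l_addr, l_size):
    return s_addr <= l_addr and l_addr + l_size <= s_addr + s_size

def check_at_store_commit(store_addr, store_size, younger_loads, recover):
    for load in younger_loads:
        should_forward = fully_covers(store_addr, store_size,
                                      load.addr, load.size)
        if load.forwarded != should_forward:
            # The speculation made at load execution disagrees with the
            # now-final store information: recover, e.g. replay the load.
            recover(load)
```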

COMPLIANCE ENFORCEMENT VIA SERVICE DISCOVERY ANALYTICS

Systems and techniques that facilitate compliance enforcement via service discovery analytics are provided. In various embodiments, a system can comprise a receiver component that can access one or more declarative deployment manifests associated with a computing application. In various instances, the system can comprise a dependency component that can build a dependency topology based on the one or more declarative deployment manifests. In various cases, the dependency topology can indicate dependencies among one or more computing objects that are declared by the one or more declarative deployment manifests. In various aspects, the system can comprise a compliance component that can determine, based on the dependency topology, whether the computing application satisfies one or more compliance standards.
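As a toy illustration, assuming manifests have already been parsed into dictionaries and that dependencies are declared in a hypothetical `depends-on` annotation, the topology build and one example compliance rule might look like:

```python
def build_dependency_topology(manifests):
    """Map each declared object to the objects it depends on."""
    topology = {}
    for doc in manifests:
        name = doc["metadata"]["name"]
        deps = doc["metadata"].get("annotations", {}).get("depends-on", "")
        topology[name] = [d.strip() for d in deps.split(",") if d.strip()]
    return topology

def find_violations(topology, approved):
    """Example rule (assumed): every dependency must name an approved object."""
    return {obj: [d for d in deps if d not in approved]
            for obj, deps in topology.items()
            if any(d not in approved for d in deps)}
```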

Thwarting Store-to-Load Forwarding Side Channel Attacks by Pre-Forwarding Matching of Physical Address Proxies and/or Permission Checking
20220358209 · 2022-11-10 ·

A method and system for mitigating side channel attacks (SCAs) that exploit speculative store-to-load forwarding are described. The method comprises ensuring that the physical load and store addresses match and/or that permissions are present before speculatively forwarding store data to a load. Various improvements maintain a short load-store pipeline, including use of a virtual level-one data cache (DL1), use of an inclusive physical level-two data cache (DL2), storage and lookup of physical data address equivalents in the DL1, and use of a memory dependence predictor (MDP) to speed up or replace store queue camming of load data addresses against store data addresses.
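The pre-forwarding gate reduces, in a software sketch, to two checks before any speculative forward; the `pap` fields and the `permissions_ok` callback are assumed names.

```python
def may_forward(load, store_entry, permissions_ok):
    """Gate speculative store-to-load forwarding on a physical match and
    a permission check, rather than on attacker-influenceable virtual or
    partial-tag matches."""
    if load.pap != store_entry.pap:
        return False          # physical addresses (via proxies) do not match
    if not permissions_ok(load):
        return False          # load lacks read permission: do not leak data
    return True
```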

Data Hazard Generation
20230101206 · 2023-03-30 ·

Implementations are directed to methods, systems, and computer-readable media for generating data hazards during instruction sequence generation. In one aspect, a computer-implemented method includes obtaining data hazard information defining a data hazard to be generated during computer instruction generation, the data hazard specifying a data dependency between a first instruction and a second instruction occurring after the first instruction, and generating, based on the data hazard information and register usage data of a plurality of registers, an instruction for execution in a current processing cycle that satisfies the data dependency specified by the data hazard. The register usage data specifies, for each register of the plurality of registers, whether data was read from or written to the register in a plurality of processing cycles preceding the current processing cycle.
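A minimal sketch of hazard-directed generation, assuming register usage is kept as a per-register map from cycle number to "read"/"write" and showing two common hazard kinds; the hazard names, window size, and instruction tuples are illustrative.

```python
import random

def generate_hazard_instr(hazard, usage, cycle, window=4):
    """Pick registers so the emitted instruction forms the requested
    dependency with an instruction from a recent preceding cycle."""
    recent = range(max(0, cycle - window), cycle)
    if hazard == "RAW":  # read-after-write: consume a recently written register
        candidates = [r for r, u in usage.items()
                      if any(u.get(c) == "write" for c in recent)]
    elif hazard == "WAR":  # write-after-read: overwrite a recently read register
        candidates = [r for r, u in usage.items()
                      if any(u.get(c) == "read" for c in recent)]
    else:
        raise ValueError(f"unsupported hazard: {hazard}")
    if not candidates:
        return None  # no register satisfies the hazard within the window
    reg = random.choice(candidates)
    return ("add", "r_tmp", reg, reg) if hazard == "RAW" else ("mov", reg, "r0")
```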

Processor with Hardware Pipeline
20230094013 · 2023-03-30 ·

A processor includes a blocking circuit between an upstream section and a downstream section of a hardware pipeline, and control circuitry that triggers the upstream section to process an upstream phase of a first task with the blocking circuit in an open state, whereby first data from the upstream phase of the first task passes through from the upstream section to be processed in a downstream phase of the first task. In response to detecting that the upstream section has finished processing the upstream phase of the first task, the control circuitry triggers the upstream section to start processing a second task while the downstream section is still processing the downstream phase of the first task, and switches the blocking circuit to a closed state, blocking second data from the upstream phase of the second task from passing to the downstream section.
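The gate behavior can be modeled as below, assuming the blocked data is simply buffered while the gate is closed; the class and method names are illustrative, not the patent's.

```python
class BlockingGate:
    def __init__(self):
        self.gate_open = True
        self.held = None  # upstream output held back while the gate is closed

    def upstream_output(self, data, downstream_busy):
        """Called when the upstream section finishes an upstream phase."""
        if downstream_busy:
            # Downstream is still on the first task: close the gate and
            # hold the second task's data back.
            self.gate_open = False
            self.held = data
            return None
        return data  # gate open: data flows straight to the downstream phase

    def downstream_finished(self):
        """Downstream done with the first task: reopen and release."""
        self.gate_open = True
        data, self.held = self.held, None
        return data
```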