Patent classifications
G06F9/3826
System and Method for Implementing Strong Load Ordering in a Processor Using a Circular Ordering Ring
A system and corresponding method enforce strong load ordering in a processor. The system comprises an ordering ring that stores entries corresponding to in-flight memory instructions associated with a program order, scanning logic, and recovery logic. The scanning logic scans the ordering ring in response to execution or completion of a given load instruction of the in-flight memory instructions and detects an ordering violation in an event at least one entry of the entries indicates that a younger load instruction has completed and is associated with an invalidated cache line. In response to the ordering violation, the recovery logic allows the given load instruction to complete, flushes the younger load instruction, and restarts execution of the processor after the given load instruction in the program order, causing data returned by the given and younger load instructions to be returned consistent with execution according to the program order to satisfy strong load ordering.
Enforcing a segmentation policy in co-existence with a system firewall
A segmentation firewall executing on a host enforces a segmentation policy. In a co-existence mode, the segmentation firewall operates in co-existence with a system firewall that enforces a security policy. The segmentation firewall is configured to either drop packets that do not match any permissive rule or pass packets that match a permissive rule to the system firewall to enable the system firewall to determine whether to drop or accept the passed packets. To enable efficient operation of the segmentation firewall when operating in co-existence with the system firewall, the segmentation firewall may include a plurality of rule chains and may be configured to exit a chain and bypass remaining rule chains upon an input packet matching a permissive rule of the segmentation policy.
Enforcing a segmentation policy in co-existence with a system firewall
A segmentation firewall executing on a host enforces a segmentation policy. In a co-existence mode, the segmentation firewall operates in co-existence with a system firewall that enforces a security policy. The segmentation firewall is configured to either drop packets that do not match any permissive rule or pass packets that match a permissive rule to the system firewall to enable the system firewall to determine whether to drop or accept the passed packets. To enable efficient operation of the segmentation firewall when operating in co-existence with the system firewall, the segmentation firewall may include a plurality of rule chains and may be configured to exit a chain and bypass remaining rule chains upon an input packet matching a permissive rule of the segmentation policy.
PREDICTING LOAD-BASED CONTROL INDEPENDENT (CI) REGISTER DATA INDEPENDENT (DI) (CIRDI) INSTRUCTIONS AS CI MEMORY DATA DEPENDENT (DD) (CIMDD) INSTRUCTIONS FOR REPLAY IN SPECULATIVE MISPREDICTION RECOVERY IN A PROCESSOR
Predicting load-based control independent (CI), register data independent (DI) (CIRDI) instructions as CI memory data dependent (DD) (CIMDD) instructions for replay in speculative misprediction recovery in a processor. The processor predicts if a source of a load-based CIRDI instruction will be forwarded by a store-based instruction (i.e. “store-forwarded”). If a load-based CIRDI instruction is predicted as store-forwarded, the load-based CIRDI instruction is considered a CIMDD instruction and is replayed in misprediction recovery. If a load-based CIRDI instruction is not predicted as store-forwarded, the processor considers such load-based CIRDI instruction as a pending load-based CIRDI instruction. If this pending load-based CIRDI instruction is determined in execution to be store-forwarded, the instruction pipeline is flushed and the pending load-based CIRDI instruction is also replayed in misprediction recovery. If this pending load-based CIRDI instruction is not determined in execution to be store-forwarded, the pending load-based CIRDI instruction is not replayed in misprediction recovery.
ARITHMETIC LOGIC UNIT LAYOUT FOR A PROCESSOR
A processor has first, second and third ALUs. The first ALU has on a first side an input and an output. The second ALU has a first side facing the first side of the first ALU, an input and an output on the first side of the second ALU and being in a rotated orientation relative to the input and the output of the first side of the first ALU, and an output on a second side of the second ALU. The third ALU has a first side facing the second side of the second ALU, and an input and an output on the first side of the third ALU. The input of the first side of the first ALU is logically directly connected to the output of the first side of the second ALU.
REUSE IN-FLIGHT REGISTER DATA IN A PROCESSOR
Devices and techniques for short-thread rescheduling in a processor are described herein. When an instruction for a thread completes, a result is produced. The condition that the same thread is scheduled in a next execution slot and that the next instruction of the thread will use the result can be detected. In response to this condition, the result can be provided directly to an execution unit for the next instruction.
MICROPROCESSOR CORE WITH A STORE DEPENDENCE PREDICTOR ACCESSED USING A TRANSLATION CONTEXT
In order to mitigate side channel attacks that exploit speculative store-to-load forwarding, a store dependence predictor is used to prevent store-to-load forwarding if the load and store instructions do not have a matching translation context (TC). In one design, a store queue (SQ) stores the TC—a function of the privilege mode (PM), address space identifier (ASID), and/or virtual machine identifier (VMID)—of each store and conditions store-to-load forwarding on matching store and load TCs. In another design, a memory dependence predictor (MDP) disambiguates predictions of store-to-load forwarding based on the load instruction's TC. In each design, the MDP or SQ does not predict or allow store-to-load forwarding for loads whose addresses, but not their TCs, match an MDP entry.
Facilitating data processing using SIMD reduction operations across SIMD lanes
Various embodiments are provided for facilitating data processing by one or more processors in a computing system. An instruction to be executed may be obtained. The instruction is a single instruction multiple data (SIMD) reduction operation of an operand vector with a plurality of vector elements. The SIMD reduction operation may be executed to produce a result vector with a plurality of alternative vector elements. One or more reduction functions may be performed on each of a pair of vector elements from the plurality of vector elements of the operand vector and a result of the one or more reduction functions may be placed in a corresponding vector element of the result vector.
Neural network unit that manages power consumption based on memory accesses per period
An apparatus includes a first memory, processing units that access the first memory, and a counter that, for each period of a sequence of periods, holds an indication of accesses to the first memory during the period; and control logic that, for each period of the sequence of periods, monitors the indication to determine whether it exceeds the threshold and, if so, stalls the processing units from accessing the first memory for a remaining portion of the period.
Arithmetic processing device and arithmetic processing method
An arithmetic processing device includes: a memory; and a processor coupled to the memory and configured to: execute a plurality of data processes each of which is divided into a plurality of pipeline stages in parallel at different timings; measure a processing time of each of the plurality of pipeline stages; and set a priority of the plurality of pipeline stages in a descending order of the measured processing time.