G06F9/30079

GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster. The first processing cluster includes a floating-point unit to perform floating-point operations. The floating-point unit is configured to process an instruction using a bfloat16 (BF16) format, with a multiplier to multiply second and third source operands while an accumulator adds a first source operand to the output from the multiplier.
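
The multiply-accumulate behaviour described in this abstract can be illustrated in software. The sketch below is not the claimed hardware: the function names `to_bf16` and `dot_product_accumulate` are hypothetical, the rounding mode (round-to-nearest-even) is an assumption, and the accumulator is simply a Python float because the abstract does not state the accumulator width. It handles only finite inputs within float32 range.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite value to the nearest bfloat16 (ties to even).

    bfloat16 keeps the upper 16 bits of an IEEE 754 binary32 encoding.
    NaN and overflow handling are deliberately left out of this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias of 0x7FFF plus the lowest kept bit gives round-to-nearest-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot_product_accumulate(src0, src1, src2):
    """Return src0 + sum(src1[i] * src2[i]) with BF16-rounded multiplicands.

    src0 plays the role of the first source operand (the accumulator
    input); src1 and src2 are the second and third source operands.
    """
    total = src0
    for a, b in zip(src1, src2):
        total += to_bf16(a) * to_bf16(b)
    return total


print(to_bf16(3.14159))
print(dot_product_accumulate(1.0, [1.0, 2.0], [3.0, 4.0]))
```

Keeping the multiplicands in BF16 while accumulating at higher precision is one common way to read a "hybrid" format; the abstract itself does not spell out the precise split.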

Programmable computer IO device interface

Methods and apparatuses for a programmable IO device interface are provided. The apparatus may comprise: a first memory unit having a plurality of programs stored thereon, wherein the plurality of programs are associated with a plurality of actions comprising updating a memory-based data structure, inserting a DMA command, or initiating an event; a second memory unit for receiving and storing a table result, wherein the table result is provided by a table engine configured to perform packet match operations on (i) a packet header vector contained in a header portion and (ii) data stored in a programmable match table; and circuitry for executing a program selected from the plurality of programs in response to the table result and an address received by the apparatus, wherein the program is executed until completion and is associated with the programmable match table.
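
As a rough software analogy for the match-then-execute flow in this abstract, the sketch below looks up a header vector in a match table and then runs a program chosen by an address. All names (`table_engine`, `run_stage`, `PROGRAMS`, the field names, and the addresses 0x100 and 0x200) are hypothetical and chosen only for illustration; the abstract does not define any of them.

```python
def table_engine(header_vector, match_table):
    """Match selected header fields against a programmable match table."""
    key = tuple(header_vector[field] for field in match_table["key_fields"])
    return match_table["entries"].get(key)  # None means no match


def update_counter(table_result, state):
    """Example action: update a memory-based data structure."""
    flow = table_result["flow_id"]
    state["counters"][flow] = state["counters"].get(flow, 0) + 1


def insert_dma(table_result, state):
    """Example action: insert a DMA command into a queue."""
    state["dma_queue"].append(("copy", table_result["flow_id"]))


# Hypothetical program store: address -> program.
PROGRAMS = {0x100: update_counter, 0x200: insert_dma}


def run_stage(header_vector, match_table, address, state):
    """Run the addressed program to completion on the table result."""
    table_result = table_engine(header_vector, match_table)
    if table_result is None:
        return False
    PROGRAMS[address](table_result, state)
    return True


match_table = {
    "key_fields": ("dst_ip", "dst_port"),
    "entries": {("10.0.0.1", 80): {"flow_id": 7}},
}
state = {"counters": {}, "dma_queue": []}
header = {"dst_ip": "10.0.0.1", "dst_port": 80, "ttl": 64}
run_stage(header, match_table, 0x100, state)
run_stage(header, match_table, 0x200, state)
print(state)
```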

Controller with caching and non-caching modes

An apparatus includes a CPU core, a first cache subsystem coupled to the CPU core, and a second memory coupled to the first cache subsystem. The first cache subsystem includes a configuration register, a first memory, and a controller. The controller is configured to: receive a request directed to an address in the second memory and, in response to the configuration register having a first value, operate in a non-caching mode. In the non-caching mode, the controller is configured to provide the request to the second memory without caching data returned by the request in the first memory. In response to the configuration register having a second value, the controller is configured to operate in a caching mode. In the caching mode, the controller is configured to provide the request to the second memory and cache data returned by the request in the first memory.
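
The two modes in this abstract can be modelled with a small class. This is a behavioural sketch only: the class name `CacheController`, the register values 0 and 1, and the `backing_reads` counter are hypothetical. Serving a repeated read from the first memory on a hit is also an assumption, since the abstract only describes what happens to returned data.

```python
NON_CACHING = 0  # hypothetical "first value" of the configuration register
CACHING = 1      # hypothetical "second value"


class CacheController:
    def __init__(self, second_memory, mode=CACHING):
        self.config_register = mode
        self.first_memory = {}          # local memory used as the cache
        self.second_memory = second_memory
        self.backing_reads = 0          # requests forwarded to second memory

    def read(self, address):
        caching = self.config_register == CACHING
        # Assumed hit path: reuse data already held in the first memory.
        if caching and address in self.first_memory:
            return self.first_memory[address]
        # Forward the request to the second memory in either mode.
        self.backing_reads += 1
        data = self.second_memory[address]
        if caching:
            self.first_memory[address] = data  # keep the returned data
        return data


controller = CacheController({0x10: "a"}, mode=NON_CACHING)
controller.read(0x10)
controller.read(0x10)
print(controller.backing_reads, controller.first_memory)
```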

Managing processor core synchronization using interrupts
11263043 · 2022-03-01

Interrupt messages are sent from an interrupt controller to respective processor cores and data synchronization is managed among the processor cores. Each processor core includes a pipeline that includes a plurality of stages through which instructions of a program are executed, where stored order information indicates whether a state of the pipeline is in-order or out-of-order; and circuitry for receiving interrupt messages from the interrupt controller and performing an interrupt action in response to a corresponding interrupt message after ensuring that the order information indicates that the state of the pipeline is in-order when each interrupt action is performed. Managing the data synchronization includes generating a first interrupt message at an issuing processor core in response to a synchronization related instruction executed at the issuing processor core; and receiving the first interrupt message at each receiving processor core in a set of one or more receiving processor cores.

METHOD FOR MANAGING SOFTWARE THREADS DEPENDENT ON CONDITION VARIABLES
20170315806 · 2017-11-02

An apparatus includes a buffer, a sequencing circuit, and an execution unit. The buffer may be configured to store a plurality of instructions. Each of the plurality of instructions may be in a first thread. In response to determining that a first instruction of the plurality of instructions depends on the value of a condition variable and that a count value is below a predetermined threshold, the sequencing circuit may be configured to add a wait instruction before the first instruction. The execution unit may be configured to delay execution of the first instruction for an amount of time after executing the wait instruction. The sequencing circuit may be further configured to maintain the plurality of instructions in the buffer after executing the wait instruction, and to decrement the count value in response to determining that the value of the condition variable is updated within the amount of time.
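
The wait-insertion rule in this abstract can be sketched as a pass over a buffered instruction list. The dictionary-based instruction encoding, the `cond_var` key, and the function name `insert_waits` are hypothetical. The sketch covers only the insertion decision, not the timed delay or the count update.

```python
def insert_waits(instructions, count, threshold):
    """Add a wait before each instruction that depends on a condition variable.

    A wait is added only while the count value is below the threshold,
    mirroring the two conditions named in the abstract.
    """
    sequenced = []
    for instruction in instructions:
        depends_on_condition = instruction.get("cond_var") is not None
        if depends_on_condition and count < threshold:
            sequenced.append({"op": "wait"})
        sequenced.append(instruction)
    return sequenced


buffer = [
    {"op": "add"},
    {"op": "load", "cond_var": "ready"},
    {"op": "store"},
]
print([i["op"] for i in insert_waits(buffer, count=1, threshold=4)])
```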

Microprocessor with secure execution mode and store key instructions

A microprocessor conditionally grants a request to switch from a normal execution mode in which encrypted instructions cannot be executed, into a secure execution mode (SEM). Thereafter, the microprocessor executes a plurality of instructions, including a store-key instruction to write a set of one or more cryptographic key values into a secure memory of the microprocessor. After fetching an encrypted program from an instruction cache, the microprocessor decrypts the encrypted program into plaintext instructions using decryption logic within the microprocessor's instruction-processing pipeline.

System, apparatus and method for adaptive interconnect routing

In one embodiment, an apparatus includes an interconnect to couple a plurality of processing circuits. The interconnect may include a pipe stage circuit coupled between a first processing circuit and a second processing circuit. This pipe stage circuit may include: a pipe stage component having a first input to receive a signal via the interconnect and a first output to output the signal; and a selection circuit having a first input to receive the signal from the first output of the pipe stage component and a second input to receive the signal via a bypass path, where the selection circuit is dynamically controllable to output the signal received from the first output of the pipe stage component or the signal received via the bypass path. Other embodiments are described and claimed.
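
A cycle-level toy model may help picture the pipe stage and its bypass path. The class `PipeStageCircuit` and its attribute names are hypothetical. The pipe stage component is modelled as a single register that delays the signal by one clock, which is an assumption about what the component does.

```python
class PipeStageCircuit:
    def __init__(self):
        self.register = None   # pipe stage component (one-cycle delay)
        self.bypass = False    # dynamic control of the selection circuit

    def clock(self, signal):
        """Advance one cycle and return the selection circuit's output."""
        staged = self.register      # first output of the pipe stage component
        self.register = signal      # capture the new input for the next cycle
        # Selection circuit: choose the bypass path or the staged signal.
        return signal if self.bypass else staged


stage = PipeStageCircuit()
print([stage.clock(s) for s in (1, 2, 3)])   # staged path, one cycle late

stage = PipeStageCircuit()
stage.bypass = True
print([stage.clock(s) for s in (1, 2, 3)])   # bypass path, no added latency
```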

Rule-based data stream processing

Systems and methods are provided for rule-based data stream processing by data collection, indexing, and visualization systems. An example method includes: receiving, by a computer system, an input data stream comprising raw machine data; processing the raw machine data by a data processing pipeline that produces transformed machine data, wherein the data processing pipeline comprises an ordered plurality of pipeline stages, wherein a pipeline stage of the ordered plurality of pipeline stages applies a rule of a set of rules to an input of the pipeline stage, wherein the rule specifies an action to be performed on the input of the pipeline stage responsive to evaluating a conditional expression applied to the input of the pipeline stage, wherein the action generates an output of the pipeline stage, and wherein the rule is selected based on a source type associated with the input data stream; and supplying the transformed machine data to a data collection, indexing, and visualization system.
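
The rule-per-stage pipeline described above can be sketched in a few lines. The `Rule` type, the `RULES_BY_SOURCE_TYPE` table, the `syslog` source type, and the two example rules are all hypothetical; they only illustrate the pattern of selecting rules by source type and applying a conditional action at each ordered stage.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    condition: Callable[[str], bool]   # conditional expression on the input
    action: Callable[[str], str]       # produces the stage output


# Hypothetical rule sets, keyed by the source type of the input stream.
RULES_BY_SOURCE_TYPE = {
    "syslog": [
        Rule(lambda e: "password=" in e,
             lambda e: re.sub(r"password=\S+", "password=***", e)),
        Rule(lambda e: e != e.strip(), str.strip),
    ],
}


def run_pipeline(events, source_type):
    """Pass each event through the ordered stages for its source type."""
    rules = RULES_BY_SOURCE_TYPE.get(source_type, [])
    transformed = []
    for event in events:
        for rule in rules:               # one rule per pipeline stage
            if rule.condition(event):
                event = rule.action(event)
        transformed.append(event)
    return transformed


print(run_pipeline(["  login password=hunter2 "], "syslog"))
```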

Computer processor employing instructions with elided nop operations

A computer processor operates on distinct first and second instruction streams that have a predefined timed semantic relationship. At least one of the first and second instruction streams includes variable-length instructions having a header and associated bundle bounded by a head end and a tail end. An alignment hole within the bundle encodes information representing at least one nop operation. The computer processor includes first and second multi-stage instruction processing components configured to process in parallel the first and second instruction streams. At least one of the first and second multi-stage instruction processing components includes an instruction buffer operably coupled to a decode stage. The decode stage is configured to process a variable-length instruction by isolating and interpreting the alignment hole of the variable-length instruction in order to initiate zero or more nop operations that follow the timed semantic relationship between the first and second instruction streams.

Cache coherence shared state suppression

A method includes receiving, by a level two (L2) controller, a first request for a cache line in a shared cache coherence state; mapping, by the L2 controller, the first request to a second request for a cache line in an exclusive cache coherence state; and responding, by the L2 controller, to the second request.
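
The request mapping in this abstract amounts to a small translation step ahead of the normal response path. In the sketch below, the `CoherenceState` enum, the dictionary request format, and the functions `map_request` and `l2_respond` are hypothetical; a real L2 controller would also track directory and snoop state, which is omitted here.

```python
from enum import Enum


class CoherenceState(Enum):
    SHARED = "S"
    EXCLUSIVE = "E"


def map_request(request):
    """Map a shared-state request to an exclusive-state request."""
    if request["state"] is CoherenceState.SHARED:
        return {**request, "state": CoherenceState.EXCLUSIVE}
    return request


def l2_respond(request, lines):
    """Respond to the mapped (second) request rather than the original."""
    mapped = map_request(request)
    return {
        "line": mapped["line"],
        "state": mapped["state"],
        "data": lines[mapped["line"]],
    }


lines = {0x40: b"payload"}
response = l2_respond({"line": 0x40, "state": CoherenceState.SHARED}, lines)
print(response["state"])
```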