G06F9/30079

METHOD AND APPARATUS FOR MINIMALLY INTRUSIVE INSTRUCTION POINTER-AWARE PROCESSING RESOURCE ACTIVITY PROFILING

Systems and methods for minimally intrusive instruction pointer-aware processing resource activity profiling are disclosed. In one embodiment, a graphics processor includes a grouping of processing resources and control logic that is associated with the grouping of processing resources. The control logic is configured to sample a state of at least one processing resource of the grouping of processing resources and to determine activity data from the state with the activity data including at least one of stalls and reason counts for stalling activity, instruction types, pipeline utilization, thread utilization, and shader activity.
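
As an illustration of the kind of activity data described above, the following minimal sketch aggregates sampled states into per-instruction-pointer stall reason counts. The `Sample` record, its field names, and the `"active"` label are assumptions made for this example; the abstract does not specify how the control logic encodes a sampled state.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Sample:
    """One sampled state of a processing resource (hypothetical layout)."""
    instruction_pointer: int
    stall_reason: Optional[str]  # None means the resource was not stalled


def aggregate(samples: Iterable[Sample]) -> Counter:
    """Count samples per (instruction pointer, stall reason) pair."""
    counts: Counter = Counter()
    for sample in samples:
        reason = sample.stall_reason or "active"  # assumed label for non-stalled samples
        key: Tuple[int, str] = (sample.instruction_pointer, reason)
        counts[key] += 1
    return counts
```

Because only periodic samples are aggregated, the profiled code itself does not need to be instrumented, which is one plausible reading of "minimally intrusive".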

NPU implemented for artificial neural networks to process fusion of heterogeneous data received from heterogeneous sensors
11731656 · 2023-08-22

A neural processing unit (NPU) includes a controller including a scheduler, the controller configured to receive from a compiler a machine code of an artificial neural network (ANN) including a fusion ANN, the machine code including data locality information of the fusion ANN, and receive heterogeneous sensor data from a plurality of sensors corresponding to the fusion ANN; at least one processing element configured to perform fusion operations of the fusion ANN including a convolution operation and at least one special function operation; a special function unit (SFU) configured to perform a special function operation of the fusion ANN; and an on-chip memory configured to store operation data of the fusion ANN, wherein the scheduler is configured to control the at least one processing element and the on-chip memory such that all operations of the fusion ANN are processed in a predetermined sequence according to the data locality information.

Fine grained control flow enforcement to mitigate malicious call/jump oriented programming
11327755 · 2022-05-10

In one embodiment, a processor comprises a decoder to decode a first instruction, the first instruction comprising an opcode and at least one parameter, the opcode to identify the first instruction as an instruction associated with an indirect branch, the at least one parameter indicative of whether the indirect branch is allowed; and circuitry to generate an error message based on the at least one parameter.

Processing device with vector transformation execution

An integrated circuit, comprising an instruction pipeline that includes instruction fetch phase circuitry, instruction decode phase circuitry, and instruction execution circuitry. The instruction execution circuitry includes transformation circuitry for receiving an interleaved dual vector operand as an input and for outputting a first natural order vector including a first set of data values from the interleaved dual vector operand and a second natural order vector including a second set of data values from the interleaved dual vector operand.
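
The transformation can be pictured in software as a de-interleave. The sketch below assumes the dual vector operand alternates elements of the two vectors (a0, b0, a1, b1, ...); the abstract does not state the interleaving pattern, so that layout and the function name `deinterleave` are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple


def deinterleave(interleaved: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split [a0, b0, a1, b1, ...] into ([a0, a1, ...], [b0, b1, ...])."""
    if len(interleaved) % 2 != 0:
        raise ValueError("interleaved operand must hold an even number of elements")
    first = list(interleaved[0::2])   # even positions: first natural order vector
    second = list(interleaved[1::2])  # odd positions: second natural order vector
    return first, second
```

For example, `deinterleave([0, 10, 1, 11, 2, 12])` yields the two natural order vectors `[0, 1, 2]` and `[10, 11, 12]`.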

RULE-BASED DATA STREAM PROCESSING

Systems and methods for rule-based data stream processing by data collection, indexing, and visualization systems. An example method includes: receiving, by a computer system, an input data stream comprising raw machine data; processing the raw machine data by a data processing pipeline that produces transformed machine data, wherein the data processing pipeline comprises an ordered plurality of pipeline stages, wherein a pipeline stage of the ordered plurality of pipeline stages applies a rule of a set of rules to an input of the pipeline stage, wherein the rule specifies an action to be performed on the input of the pipeline stage responsive to evaluating a conditional expression applied to the input of the pipeline stage, wherein the action generates an output of the pipeline stage, and wherein the rule is selected based on a source type associated with the input data stream; and supplying the transformed machine data to a data collection, indexing, and visualization system.
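
A minimal sketch of such a pipeline follows, in which each rule acts as one stage and the rule list is selected by source type. The `Rule` class, the `RULES_BY_SOURCE_TYPE` table, the `access_log` source type, and the masking rule are all hypothetical names chosen for this example; none of them come from the abstract.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Rule:
    """A conditional expression plus the action to run when it holds."""
    condition: Callable[[str], bool]
    action: Callable[[str], str]


# Hypothetical rule sets, keyed by the source type of the input stream.
RULES_BY_SOURCE_TYPE: Dict[str, List[Rule]] = {
    "access_log": [
        # Stage 1: mask password values when one is present.
        Rule(lambda e: "password=" in e,
             lambda e: re.sub(r"password=\S+", "password=***", e)),
        # Stage 2: always trim surrounding whitespace.
        Rule(lambda e: True, lambda e: e.strip()),
    ],
}


def run_pipeline(source_type: str, events: Iterable[str]) -> List[str]:
    """Pass each raw event through the ordered stages for its source type."""
    rules = RULES_BY_SOURCE_TYPE.get(source_type, [])
    transformed = []
    for event in events:
        for rule in rules:  # the output of one stage feeds the next
            if rule.condition(event):
                event = rule.action(event)
        transformed.append(event)
    return transformed
```

In this sketch an unknown source type selects an empty rule list, so events pass through unchanged; a real system might instead reject the stream or apply default rules.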

RESCHEDULING A FAILED MEMORY REQUEST IN A PROCESSOR
20220121486 · 2022-04-21

Devices and techniques for rescheduling a failed memory request in a processor are described herein. When a memory request for a thread is denied at a point in the execution pipeline of the processor beyond a thread rescheduling point, the thread can be placed into a memory response path of the processor. An indicator that a register write-back will not occur for the thread can also be provided. Then, the thread can be rescheduled with other threads in the memory response path.

OPERATION ELIMINATION

A data processing apparatus is provided. Rename circuitry performs a register rename stage of a pipeline by storing, in storage circuitry, mappings between registers. Each of the mappings is associated with an elimination field value. Operation elimination circuitry replaces an operation that indicates an action is to be performed on data from a source register and stored in a destination register, with a new mapping in the storage circuitry that references the destination register and has the elimination field value set. Operation circuitry responds to a subsequent operation that accesses the destination register when the elimination field value is set by obtaining contents of the source register, performing the action on the contents to obtain a result, and returning the result.
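
The behavior described above can be modeled with a small software rename table. This is a sketch under stated assumptions: the `RenameTable` and `Mapping` names, the string register names, and the refusal to chain eliminated mappings are illustrative choices, not details taken from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Mapping:
    physical: str                                  # physical register holding the data
    eliminated: bool = False                       # the elimination field value
    action: Optional[Callable[[int], int]] = None  # deferred action, if eliminated


class RenameTable:
    def __init__(self) -> None:
        self.mappings: Dict[str, Mapping] = {}
        self.physical_regs: Dict[str, int] = {}

    def write(self, arch_reg: str, physical: str, value: int) -> None:
        """Ordinary rename: map arch_reg to a physical register with a value."""
        self.physical_regs[physical] = value
        self.mappings[arch_reg] = Mapping(physical)

    def eliminate(self, dst: str, src: str, action: Callable[[int], int]) -> None:
        """Replace 'dst = action(src)' with a mapping instead of executing it."""
        source = self.mappings[src]
        if source.eliminated:
            # Assumption: chained eliminations are out of scope for this sketch.
            raise ValueError("source mapping is itself eliminated")
        self.mappings[dst] = Mapping(source.physical, eliminated=True, action=action)

    def read(self, arch_reg: str) -> int:
        """Access a register, applying the deferred action when the field is set."""
        mapping = self.mappings[arch_reg]
        value = self.physical_regs[mapping.physical]
        return mapping.action(value) if mapping.eliminated else value
```

For instance, eliminating a zero-extension of the low 32 bits leaves the destination mapped to the source's physical register, and the masking is applied only when the destination is later read.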

Merging data for write allocate

A method includes receiving, by a level two (L2) controller, a write request for an address that is not allocated as a cache line in a L2 cache. The write request specifies write data. The method also includes generating, by the L2 controller, a read request for the address; reserving, by the L2 controller, an entry in a register file for read data returned in response to the read request; updating, by the L2 controller, a data field of the entry with the write data; updating, by the L2 controller, an enable field of the entry associated with the write data; and receiving, by the L2 controller, the read data and merging the read data into the data field of the entry.
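
The merge step can be sketched as follows, assuming the enable field holds one flag per byte and that read data fills only the bytes the write did not cover. The `MergeEntry` class, the 8-byte line size, and the per-byte granularity are assumptions for illustration; the abstract does not give the width or encoding of the enable field.

```python
LINE_BYTES = 8  # hypothetical, deliberately tiny line size for the example


class MergeEntry:
    """One reserved register-file entry awaiting read data (illustrative model)."""

    def __init__(self, line_bytes: int = LINE_BYTES) -> None:
        self.data = bytearray(line_bytes)    # data field
        self.enable = [False] * line_bytes   # enable field, one flag per byte (assumed)

    def apply_write(self, offset: int, write_data: bytes) -> None:
        """Place the write data in the entry and mark those bytes as enabled."""
        if offset < 0 or offset + len(write_data) > len(self.data):
            raise ValueError("write does not fit within the line")
        for i, byte in enumerate(write_data):
            self.data[offset + i] = byte
            self.enable[offset + i] = True

    def merge_read(self, read_data: bytes) -> bytes:
        """Merge returned read data, keeping bytes already supplied by the write."""
        if len(read_data) != len(self.data):
            raise ValueError("read data must cover the whole line")
        for i, byte in enumerate(read_data):
            if not self.enable[i]:
                self.data[i] = byte
        return bytes(self.data)
```

With this ordering the newer write data always takes priority over the older contents returned by the read, regardless of when the read completes.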

RISC-V IMPLEMENTED PROCESSOR WITH HARDWARE ACCELERATION SUPPORTING USER DEFINED INSTRUCTION SET AND METHOD THEREOF

The present invention relates to a RISC-V based computation device combined with hardware high-speed computation for supporting a user-defined instruction set, and to a method thereof. A hardware high-speed computation unit that executes a user-defined function is configured through a field programmable gate array (FPGA) in a single chip together with the RISC-V based computation device. General computation and user-defined computation are executed at the instruction level, rather than through a separate bus connection configuration, by a program using a RISC-V based instruction set that includes the user-defined instruction set. The invention provides the flexibility to optionally change the user-defined instruction set and its corresponding function.