G06F9/3858

Suspending branch prediction upon entering transactional execution mode

In a computer supporting Transactional Memory (TM) Transaction Execution (TX), use of speculative branch prediction is programmably suspended during TX, and programmably resumed. The branch prediction suspension may cause the execution of one or more instructions following the branch instruction to stall in the pipeline until branch conditions and branch target addresses are resolved.

EXECUTING SYSTEM CALL VECTORED INSTRUCTIONS IN A MULTI-SLICE PROCESSOR

Executing system call vectored (SCV) instructions in a multi-slice processor including receiving, by an instruction fetch unit, a SCV instruction, wherein the SCV instruction is a system call from an operating system; sending the SCV instruction to a branch issue queue; determining, by the branch issue queue, that the SCV instruction is next-to-complete; issuing the SCV instruction to a branch resolution unit; and executing the SCV instruction by the branch resolution unit.

RISC-V-BASED 3D INTERCONNECTED MULTI-CORE PROCESSOR ARCHITECTURE AND WORKING METHOD THEREOF

An RISC-V-based 3D interconnected multi-core processor architecture and a working method thereof. The RISC-V-based 3D interconnected multi-core processor architecture includes a main control layer, a micro core array layer and an accelerator layer, wherein the main control layer includes a plurality of main cores which are RISC-V instruction set CPU cores, the micro core array layer includes a plurality of micro unit groups including a micro core, a data storage unit, an instruction storage unit and a linking controller, wherein the micro core is an RISC-V instruction set CPU core that executes partial functions of the main core; the accelerator layer is configured to optimize a running speed of space utilization for accelerators meeting specific requirements, wherein some main cores in the main control layer perform data interaction with the accelerator layer, the other main cores interact with the micro core array layer.

HYBRID BLOCK-BASED PROCESSOR AND CUSTOM FUNCTION BLOCKS

Apparatus and methods are disclosed for implementing block-based processors having custom function blocks, including field-programmable gate array (FPGA) implementations. In some examples of the disclosed technology, a dynamically configurable scheduler is configured to issue at least one block-based processor instruction. A custom function block is configured to receive input operands for the instruction and generate ready state data indicating completion of a computation performed for the instruction by the respective custom function block.

EVENT HANDLING IN PIPELINE EXECUTE STAGES
20220365787 · 2022-11-17 ·

A method includes receiving an execute packet that includes a first instruction and a second instruction and executing the first instruction and the second instruction using a pipeline. Executing the first and second instructions includes storing a result of the first instruction in a holding register; determining whether an event that interrupts execution of the execute packet occurs prior to completion of the executing of the second instruction; and based on the event not occurring, committing the result of the first instruction after completion of the executing of the second instruction.

Dynamic sequencing of data partitions for optimizing memory utilization and performance of neural networks

Optimized memory usage and management is crucial to the overall performance of a neural network (NN) or deep neural network (DNN) computing environment. Using various characteristics of the input data dimension, an apportionment sequence is calculated for the input data to be processed by the NN or DNN that optimizes the efficient use of the local and external memory components. The apportionment sequence can describe how to parcel the input data (and its associated processing parameters—e.g., processing weights) into one or more portions as well as how such portions of input data (and its associated processing parameters) are passed between the local memory, external memory, and processing unit components of the NN or DNN. Additionally, the apportionment sequence can include instructions to store generated output data in the local and/or external memory components so as to optimize the efficient use of the local and/or external memory components.

Incorporating a spatial array into one or more programmable processor cores

Functional units disposed in one or more processor cores are communicatively coupled using both a shared bypass network and a switched network. The shared bypass network enables the functional units to be operated conventionally for general processing while the switched network enables specialized processing in which the functional units are configured as a spatial array. In the spatial array configuration, operands produced by one functional unit can only be sent to a subset of functional units to which dependent instructions have been mapped a priori. The functional units may be dynamically reconfigured at runtime to toggle between operating in the general configuration and operating as the spatial array. Information to control the toggling between operating configurations may be provided in instructions received by the functional units.

Dynamic adaptive drain for write combining buffer
11256622 · 2022-02-22 · ·

In one embodiment, a processor includes a write combining buffer that includes a memory having a plurality of entries. The entries may be allocated to committed store operations transmitted by a load/store unit in the processor, and subsequent committed store operations may merge data with previous store memory operations in the buffer if the subsequent committed store operations are to addresses that match addresses of the previous committed store operations within a predefined granularity (e.g. the width of a cache port). The write combining buffer may be configured to retain up to N entries of committed store operations, but may also be configured to write one or more of the entries to the data cache responsive to receiving more than a threshold amount of non-merging committed store operations in the write combining buffer.

Mechanism for saving and retrieving micro-architecture context

Disclosed embodiments relate to processing logic for performing function operations. In one example, and apparatus includes an execution unit within a processor to execute a code block, power management hardware coupled to the execution unit, wherein the power management hardware is to monitor a first execution of the code block, store a micro-architectural context of the processor in a metadata block associated with the code block, the micro-architectural context including performance data resulting from the first execution of the code block, the performance data comprising power and energy usage data, and power management related parameters, read the associated metadata block upon a second execution of the code block, and tune the second execution based on the performance data stored in the associated metadata block to increase efficiency of executing the code block.

Instruction dispatch for superscalar processors

The present disclosure relates to instruction dispatch mechanisms for superscalar processors having a plurality of functional units for executing operations simultaneously. Each particular functional unit of the plurality of functional units may be configured to output a capability vector indicating a set of operations that the particular functional unit is currently available to perform. As instructions are received in an issue queue, the functional unit to execute the instruction is selected by comparing capabilities required by the instruction to the asserted capabilities of each of the functional units. A functional unit may reset or de-assert a particular functionality while performing an operation and then re-assert the capability when the instruction is completed. A result of the operation may be stored in a skid buffer for at least as long as the chain execution time in order to avoid resource hazards are a write port of the vector register file.