G06F9/30149

VARIABLE-LENGTH INSTRUCTION BUFFER MANAGEMENT

A vector processor is disclosed including a variety of variable-length instructions. Computer-implemented methods are disclosed for efficiently carrying out a variety of operations in a time-conscious, memory-efficient, and power-efficient manner. Methods for more efficiently managing a buffer by controlling the threshold based on the length of delay line instructions are disclosed. Methods for disposing multi-type and multi-size operations in hardware are disclosed. Methods for condensing look-up tables are disclosed. Methods for in-line alteration of variables are disclosed.

Method and system for converting instructions

A method for converting instructions is provided. The method is used in a processor and includes: receiving an instruction, wherein the instruction is an unknown instruction; determining whether the received instruction is a new instruction; and converting the received instruction into at least one old instruction when the received instruction is a new instruction.

Computer processor employing instructions with elided nop operations

A computer processor that operates on distinct first and second instruction streams that have a predefined timed semantic relationship. At least one of the first and second instruction streams includes variable-length instructions having a header and associated bundle bounded by a head end and a tail end. An alignment hole within the bundle encodes information representing at least one nop operation. The computer processor includes first and second multi-stage instruction processing components configured to process in parallel the first and second instruction streams. At least one of the first and second multi-stage instruction processing components includes an instruction buffer operably coupled to a decode stage. The decode stage is configured to process a variable-length instruction by isolating and interpreting the alignment hole of the variable length instruction in order to initiate zero or more nop operations that follow the timed semantic relationship between the first and second instruction streams.

Processors operable to allow flexible instruction alignment
09740488 · 2017-08-22 · ·

Methods and apparatus are provided for optimizing a processor core. Common processor subcircuitry is used to perform calculations for various types of instructions, including branch and non-branch instructions. Increasing the commonality of calculations across different instruction types allows branch instructions to jump to byte aligned memory address even if supported instructions are multi-byte or word aligned.

MITIGATION OF BRANCH MISPREDICTION PENALTY IN A HARDWARE MULTI-THREAD MICROPROCESSOR
20220308887 · 2022-09-29 · ·

Embodiments are provided for mitigation of branch misprediction penalty in hardware multi-thread microprocessors. In some embodiments, a hardware multi-thread microprocessor includes first stage circuitry that fetches a pair of consecutive instructions of a program executed in a thread. Such microprocessor also includes second stage circuitry that determines, during a clock cycle, that a first instruction in that pair is a branch instruction. The first stage circuitry fetches, during a second clock cycle after the clock cycle, a pair of branch target instructions of the program using a branch prediction. Such microprocessor further includes third stage circuitry that determines that the branch prediction is a misprediction during the second clock cycle. The first stage circuitry sends the second instruction to the second stage circuitry during a third clock cycle after the second clock cycle. The second stage circuitry decodes the second instruction during the third clock cycle.

Method and apparatus for permuting streamed data elements

A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.

SYSTEMS, METHODS, AND APPARATUS FOR TILE CONFIGURATION

Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile a set of 2-dimensional registers are discussed.

Vector friendly instruction format and execution thereof

A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.

Stream reference register with double vector and dual single vector operating modes
11210097 · 2021-12-28 · ·

A streaming engine employed in a digital signal processor specifies a fixed read only data stream. Once fetched the data stream is stored in two head registers for presentation to functional units in the fixed order. Data use by the functional unit is preferably controlled using the input operand fields of the corresponding instruction. A first read only operand coding supplies data from the first head register. A first read/advance operand coding supplies data from the first head register and also advances the stream to the next sequential data elements. Corresponding second read only operand coding and second read/advance operand coding operate similarly with the second head register. A third read only operand coding supplies double width data from both head registers.

MIXED-ELEMENT-SIZE INSTRUCTION
20210389948 · 2021-12-16 ·

A mixed-element-size instruction is described, which specifies a first operand and a second operand stored in registers. In response to the mixed-element-size instruction, an instruction decoder controls processing circuitry to perform an arithmetic/logical operation on two or more first data elements of the first operand and two or more second data elements of the second operand, where the first data elements have a larger data element size than the second data elements. This is particularly useful for machine learning applications to improve processing throughput and memory bandwidth utilisation.