Patent classifications
G06F9/381
Method and apparatus to sort a vector for a bitonic sorting algorithm
A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.
Branch stall elimination in pipelined microprocessors
A system can include a microprocessor having a prefetch queue including a plurality of slots configured to store program counter values (PCVs) and instructions, a pipeline configured to receive instructions from the prefetch queue, and a select circuit coupled to the prefetch queue. The select circuit may selectively freeze a first slot of the plurality of slots and selectively output a frozen PCV and a frozen instruction from the first slot while frozen. The microprocessor can include write logic coupled to the prefetch queue and a comparator circuit coupled to the prefetch queue and the select circuit. The write logic may load data into unfrozen slots of the prefetch queue. The comparator circuit may compare a target PCV with the frozen PCV to determine a match. The select circuit indicates, to the pipeline, whether the frozen instruction is valid based on the comparing.
Repeat instruction for loading and/or executing code in a claimable repeat cache a specified number of times
A processor is disclosed including: a barrel-threaded execution unit for executing concurrent threads, and a repeat cache shared between the concurrent threads. The processor's instruction set includes a repeat instruction which takes a repeat count operand. When the repeat cache is not claimed and the repeat instruction is executed in a first thread, a portion of code is cached from the first thread into the repeat cache, the state of the repeat cache is changed to record it as claimed, and the cached code is executed a number of times. When the repeat instruction is then executed in a further thread, then the already-cached portion of code is again executed a respective number of times, each time from the repeat cache. For each of the first and further instructions, the repeat count operand in the respective instruction specifies the number of times to execute the cached code.
Method and Apparatus for Dual Issue Multiply Instructions
Various configurations of processors are provided. In a configuration, the processor comprises first and second multiplication unit. Each of these multiplication units includes carry-save adder circuitry with a respective outputs, partial product alignment multiplexing logic coupled to the outputs of the associated carry-save adder circuitry. The processor further comprises communication paths coupled between the outputs of the carry-save adder circuitry of the first multiplication unit and the partial product alignment multiplexing logic of the second multiplication unit. In other configurations, each of the first and second multiplication units may include one or more instances of masking logic, one or more instances of a multiplier array coupled to the associated instance(s) of masking logic, and one or more instances of a multiplexer set coupled to the associated instance(s) of multiplier array(s). Each of multiplexer set instance(s) of a particular multiplication unit is coupled to the carry-save adder circuitry of that multiplication unit.
Method and apparatus for vector sorting using vector permutation logic
A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic comprised in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction and storing the control input vector in a storage location.
TRACKING STREAMING ENGINE VECTOR PREDICATES TO CONTROL PROCESSOR EXECUTION
In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
STREAMING ENGINE WITH EARLY EXIT FROM LOOP LEVELS SUPPORTING EARLY EXIT LOOPS AND IRREGULAR LOOPS
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. Upon a stream break instruction specifying one of the nested loops, the stream engine ends a current iteration of the loop. If the specified loop was not the outermost loop, the streaming engine begins an iteration of a next outer loop. If the specified loop was the outermost nested loop, the streaming engine ends the stream. The streaming engine places a vector of data elements in order in lanes within a stream head register. A stream break instruction is operable upon a vector break.
Detecting a repetitive pattern in an instruction pipeline of a processor to reduce repeated fetching
Exemplary aspects disclosed herein include detecting a repetitive pattern in an instruction pipeline of a processor to reduce repeated fetching. The processor includes a pattern record circuit configured to receive information in a data stream (e.g., instructions or consumed data) in the instruction pipeline. The pattern record circuit includes a first in, first out (FIFO) table circuit that contains an input record column and plurality of additional adjacent record columns. As new data occurs in the data stream, the data record circuit is configured to sequentially record next incoming data from the data stream into a next input entry of an input record column and then shift previously recorded data into adjacent entries of adjacent record columns. The distance between the input record column and the additional record column that has matching data is the distance in the data stream between a reoccurrence of data in the data stream.
Method and Apparatus for Vector Based Matrix Multiplication
A method is provided that includes performing, by a processor in response to a vector matrix multiply instruction, multiplying an m×n matrix (A matrix) and a n×p matrix (B matrix) to generate elements of an m×p matrix (R matrix), and storing the elements of the R matrix in a storage location specified by the vector matrix multiply instruction.
Method and Apparatus for Dual Issue Multiply Instructions
A method is provided that includes performing, by a processor in response to a dual issue multiply instruction, multiplication of operands of the dual issue multiply instruction using multiplication units comprised in a data path of the processor and configured to operate together to determine a product of the operands, and storing, by the processor, the product in a storage location indicated by the dual issue multiply instruction.