G06F9/30072

Method and apparatus to sort a vector for a bitonic sorting algorithm

A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.
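
As a rough software illustration of the abstract above, the sketch below sorts the two halves of a vector in independently chosen orders, which yields the bitonic input that a bitonic merge step expects. The function name, the half-and-half split, and the boolean flags are assumptions made for illustration; the abstract does not specify how the instruction encodes the two portions or their orders.

```python
def vector_sort_halves(lanes, first_descending=False, second_descending=True):
    """Sort the lower and upper halves of `lanes` independently.

    Hypothetical model: the two flags stand in for the sort orders
    that the vector sort instruction would indicate.
    """
    half = len(lanes) // 2
    first = sorted(lanes[:half], reverse=first_descending)
    second = sorted(lanes[half:], reverse=second_descending)
    return first + second


vec = [7, 2, 9, 4, 1, 8, 3, 6]
# Ascending first half, descending second half: a bitonic sequence.
print(vector_sort_halves(vec))  # [2, 4, 7, 9, 8, 6, 3, 1]
```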

STREAMING ENGINE WITH MULTI DIMENSIONAL CIRCULAR ADDRESSING SELECTABLE AT EACH DIMENSION
20230359565 · 2023-11-09 ·

A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.
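
The per-loop choice between linear and circular addressing can be sketched in software as follows. This is a simplified model, not the engine's actual address arithmetic: the parameter names, the single shared `block_size`, and the rule that a circular loop's offset wraps modulo that block size are all assumptions.

```python
from itertools import product


def stream_addresses(base, counts, strides, circular, block_size):
    """Return addresses for nested loops; index 0 is the innermost loop.

    Hypothetical model: each loop contributes index * stride to the
    offset, and a loop marked circular wraps its contribution modulo
    block_size. Loops marked linear do not wrap.
    """
    addrs = []
    # product() varies its last range fastest, so reverse the counts
    # to make loop 0 the fastest-changing (innermost) loop.
    for idx in product(*(range(c) for c in reversed(counts))):
        idx = idx[::-1]
        offset = 0
        for i, stride, circ in zip(idx, strides, circular):
            step = i * stride
            if circ:
                step %= block_size
            offset += step
        addrs.append(base + offset)
    return addrs


# Inner loop: 4 elements, stride 4, circular within an 8-byte block.
# Outer loop: 2 iterations, stride 16, linear.
addrs = stream_addresses(0x100, [4, 2], [4, 16], [True, False], 8)
print([hex(a) for a in addrs])
```

In this example the inner loop revisits the same two addresses because its offset wraps at 8 bytes, while the outer loop advances linearly by 16.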

Method and Apparatus for Dual Issue Multiply Instructions
20230350813 · 2023-11-02 ·

Various configurations of processors are provided. In one configuration, the processor comprises first and second multiplication units. Each of these multiplication units includes carry-save adder circuitry with respective outputs and partial product alignment multiplexing logic coupled to the outputs of the associated carry-save adder circuitry. The processor further comprises communication paths coupled between the outputs of the carry-save adder circuitry of the first multiplication unit and the partial product alignment multiplexing logic of the second multiplication unit. In other configurations, each of the first and second multiplication units may include one or more instances of masking logic, one or more instances of a multiplier array coupled to the associated instance(s) of masking logic, and one or more instances of a multiplexer set coupled to the associated instance(s) of multiplier array(s). Each multiplexer set instance of a particular multiplication unit is coupled to the carry-save adder circuitry of that multiplication unit.

Prediction class determination

There is provided an apparatus, method and medium. The apparatus comprises processing circuitry to perform data processing in response to decoded instructions and prediction circuitry to generate a prediction of a number of iterations of a fetching process. The fetching process is used to control fetching of data or instructions to be used in processing operations that are predicted to be performed by the processing circuitry. The processing circuitry is configured to tolerate performing one or more unnecessary iterations of the fetching process following an over-prediction of the number of iterations and, for at least one prediction, to determine a class of a plurality of prediction classes, each of which corresponds to a range of numbers of iterations. The prediction circuitry is also arranged to signal a predetermined number of iterations associated with the class to the processing circuitry to trigger at least the predetermined number of iterations of the fetching process.
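
One way to picture the class-based signalling described above is the sketch below. The class boundaries and the choice to signal each class's upper bound are assumptions made for illustration; the abstract says only that each class covers a range of iteration counts and has a predetermined number associated with it.

```python
import bisect

# Hypothetical upper bounds of four prediction classes:
# 1-4, 5-16, 17-64, and 65-256 iterations.
CLASS_UPPER_BOUNDS = [4, 16, 64, 256]


def signalled_iterations(predicted):
    """Map a predicted iteration count to its class's predetermined count.

    Assumption: the predetermined count is the class's upper bound,
    so the fetching process may run a few unnecessary iterations,
    which the processing circuitry is said to tolerate.
    """
    idx = bisect.bisect_left(CLASS_UPPER_BOUNDS, predicted)
    idx = min(idx, len(CLASS_UPPER_BOUNDS) - 1)  # clamp to the largest class
    return CLASS_UPPER_BOUNDS[idx]


for n in (3, 5, 64, 1000):
    print(n, "->", signalled_iterations(n))
```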

Method and apparatus for vector sorting using vector permutation logic

A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic comprised in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction and storing the control input vector in a storage location.
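
In software terms, a control input vector for permutation logic resembles an argsort: for each output lane it names the input lane to read. The sketch below shows that idea only. The function names are hypothetical and the actual control encoding used by the processor is not given in the abstract.

```python
def sort_control_vector(lanes, descending=False):
    """Return, for each output lane, the index of the input lane to select."""
    return sorted(range(len(lanes)), key=lambda i: lanes[i], reverse=descending)


def permute(lanes, control):
    """Model of permutation logic: output lane k takes input lane control[k]."""
    return [lanes[i] for i in control]


vec = [30, 10, 40, 20]
ctrl = sort_control_vector(vec)
print(ctrl)                # [1, 3, 0, 2]
print(permute(vec, ctrl))  # [10, 20, 30, 40]
```

Separating the control vector from the permutation means the same control can be reused to reorder a second vector in step with the first, for example when sorting keys alongside their values.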

TRACKING STREAMING ENGINE VECTOR PREDICATES TO CONTROL PROCESSOR EXECUTION
20230084716 · 2023-03-16 ·

In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
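
The loop-exit rule above can be modelled with a small sketch. The vector width, the zero padding, and the helper name are assumptions; a real streaming engine would supply the vector and its predicate in hardware.

```python
def sum_stream(data, lanes=4):
    """Sum a stream by consuming fixed-width vectors with lane predicates.

    Hypothetical model: each iteration reads one vector and a predicate
    marking which lanes hold valid elements. The loop repeats while at
    least one lane is valid and exits when none is.
    """
    total = 0
    pos = 0
    while True:
        chunk = data[pos:pos + lanes]
        predicate = [True] * len(chunk) + [False] * (lanes - len(chunk))
        vector = chunk + [0] * (lanes - len(chunk))  # pad invalid lanes
        if not any(predicate):
            break  # no valid data elements: exit the loop
        total += sum(v for v, p in zip(vector, predicate) if p)
        pos += lanes
    return total


print(sum_stream(list(range(1, 11))))  # 55
```

Because the predicate drives loop exit, the loop body needs no separate trip count: a partially valid final vector is processed, and the next, fully invalid vector ends the loop.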

Method and Apparatus for Vector Based Matrix Multiplication
20220283810 · 2022-09-08 ·

A method is provided that includes performing, by a processor in response to a vector matrix multiply instruction, multiplying an m×n matrix (A matrix) and an n×p matrix (B matrix) to generate elements of an m×p matrix (R matrix), and storing the elements of the R matrix in a storage location specified by the vector matrix multiply instruction.
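
For reference, the arithmetic that such an instruction computes is the ordinary matrix product, where element (i, j) of R is the sum over k of A[i][k] · B[k][j]. The sketch below is a plain scalar version using nested lists. It says nothing about how the processor packs the matrices into vector lanes, which the abstract does not describe.

```python
def vector_matmul(a, b):
    """Multiply an m x n matrix by an n x p matrix, returning an m x p matrix."""
    m, n, p = len(a), len(b), len(b[0])
    if any(len(row) != n for row in a):
        raise ValueError("inner dimensions of A and B must match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
        for i in range(m)
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(vector_matmul(A, B))  # [[19, 22], [43, 50]]
```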

PREDICATION METHODS FOR VECTOR PROCESSORS
20220261245 · 2022-08-18 ·

A method for executing instructions in a processor includes receiving a first instruction, receiving a second instruction, identifying a functional unit specified by an opcode contained in an opcode field of the first instruction, selecting a field of the second instruction that contains predicate information based on the identified functional unit, and executing the first instruction in a conditional manner using the identified functional unit and the predicate information contained in the selected field of the second instruction.

Nested loop control

A method for compiling and executing a nested loop includes initializing a nested loop controller with an outer loop count value and an inner loop count value. The nested loop controller includes a predicate FIFO. The method also includes coalescing the nested loop and, during execution of the coalesced nested loop, causing the nested loop controller to populate the predicate FIFO and executing a get predicate instruction having an offset value, where the get predicate returns a value from the predicate FIFO specified by the offset value. The method further includes predicating an outer loop instruction on the returned value from the predicate FIFO.
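
The coalescing idea can be illustrated with a software model in which a two-level loop runs as one flat loop and the outer-loop work is predicated on a value read from a small FIFO. The class name, the FIFO depth, and the rule that the controller pushes "true" on the last inner iteration are assumptions for illustration only.

```python
from collections import deque


class NestedLoopController:
    """Hypothetical model of a nested loop controller with a predicate FIFO."""

    def __init__(self, outer_count, inner_count, depth=4):
        self.outer_count = outer_count
        self.inner_count = inner_count
        self.inner = 0
        self.fifo = deque(maxlen=depth)  # index 0 holds the newest entry

    def tick(self):
        """Advance one coalesced iteration and push its predicate."""
        last_inner = self.inner == self.inner_count - 1
        self.fifo.appendleft(last_inner)
        self.inner = 0 if last_inner else self.inner + 1

    def get_predicate(self, offset=0):
        """Return the FIFO entry `offset` iterations back from the newest."""
        return self.fifo[offset]


def row_sums(matrix):
    """Sum each row with a single coalesced loop instead of two nested ones."""
    outer, inner = len(matrix), len(matrix[0])
    ctrl = NestedLoopController(outer, inner)
    sums, acc = [], 0
    for t in range(outer * inner):
        ctrl.tick()
        acc += matrix[t // inner][t % inner]  # inner-loop body
        if ctrl.get_predicate(0):             # predicated outer-loop body
            sums.append(acc)
            acc = 0
    return sums


print(row_sums([[1, 2, 3], [4, 5, 6]]))  # [6, 15]
```

A nonzero offset lets an outer-loop instruction that is scheduled a few iterations late still read the predicate that belongs to its own iteration.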

CONFLICT DETECTION METHOD AND SYSTEM FOR INTERNET OF THINGS (IoT) DEVICE SCHEDULING
20220300287 · 2022-09-22 ·

The present disclosure discloses a conflict detection method for Internet of Things (IoT) device scheduling, relating to the technical field of the IoT, and specific steps include: acquiring data of a device model; converting a device scheduling instruction into a conditional instruction according to the data of the device model; determining a scheduling conflict rule according to device scheduling conflicts in historical data; detecting whether the conditional instruction is in a conflict state based on the scheduling conflict rule; if the conditional instruction is in a conflict state, performing a first conflict resolution, or if the conditional instruction is in a non-conflict state or after a conflict is resolved, performing a second detection; converting, in the second detection, the conditional instruction into an SMT formula, inputting the SMT formula into an SMT solver for detection, and determining whether the conditional instruction is in a conflict state; and if the conditional instruction is in a conflict state, performing second conflict resolution, or if the conditional instruction is in a non-conflict state or after a conflict is resolved, executing the conditional instruction. The present disclosure ensures consistency between different services of the IoT, and a rule-based method and an SMT solver-based method are adopted to perform conflict detection.
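
The first, rule-based detection stage might look something like the sketch below. Everything specific here is an assumption: the instruction fields, the numeric condition range, and the single rule that opposing actions on the same device under overlapping conditions conflict. The second stage, which the abstract says encodes the instruction as an SMT formula for a solver, is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalInstruction:
    """Hypothetical conditional instruction: act when low <= reading <= high."""
    device: str
    low: float
    high: float
    action: str


# Hypothetical conflict rule set: pairs of actions that oppose each other.
CONFLICTING_ACTIONS = {frozenset({"on", "off"})}


def in_conflict(a, b):
    """Rule-based check: same device, overlapping conditions, opposing actions."""
    same_device = a.device == b.device
    overlap = a.low <= b.high and b.low <= a.high
    opposing = frozenset({a.action, b.action}) in CONFLICTING_ACTIONS
    return same_device and overlap and opposing


heat_on = ConditionalInstruction("heater", 0, 18, "on")
heat_off = ConditionalInstruction("heater", 16, 30, "off")
print(in_conflict(heat_on, heat_off))  # True: both apply between 16 and 18
```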