G06F9/30021

UNIFIED INTEGER AND FLOATING-POINT COMPARE CIRCUITRY
20170357506 · 2017-12-14

Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.
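
The shared-subtractor idea can be sketched in Python as below. The abstract does not say how padding is done, so this sketch assumes one extra bit per 32-bit operand: zero-extension for unsigned integers, sign-extension for signed integers, and, for binary32 floats, a remap of the sign-magnitude encoding onto a signed integer so that integer order matches numeric order. The names `pad`, `compare`, and `PAD_WIDTH` are hypothetical, and the NaN check in the output logic is likewise an assumption.

```python
import struct

PAD_WIDTH = 33                      # 32-bit operands padded by one bit (assumed)
MASK = (1 << PAD_WIDTH) - 1         # models a fixed-width 33-bit datapath


def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def pad(bits: int, fmt: str) -> int:
    """Hypothetical padding step: widen a 32-bit operand to a 33-bit
    two's-complement value whose integer order matches the format's order."""
    if fmt == "u32":
        value = bits                                        # zero-extend
    elif fmt == "s32":
        value = bits - (1 << 32) if bits >> 31 else bits    # sign-extend
    elif fmt == "f32":
        magnitude = bits & 0x7FFF_FFFF
        # sign-magnitude -> signed integer; +0 and -0 both map to 0
        value = -magnitude if bits >> 31 else magnitude
    else:
        raise ValueError(f"unsupported format {fmt!r}")
    return value & MASK


def is_nan(bits: int) -> bool:
    """Exponent all ones with a nonzero fraction."""
    return (bits & 0x7FFF_FFFF) > 0x7F80_0000


def compare(a_bits: int, b_bits: int, fmt: str) -> str:
    """Compare a and b with one shared integer subtractor.
    Returns 'lt', 'eq', 'gt', or 'unordered' (float NaN)."""
    if fmt == "f32" and (is_nan(a_bits) or is_nan(b_bits)):
        return "unordered"
    # 33-bit integer subtractor computing (second padded value) - (first padded value)
    diff = (pad(b_bits, fmt) - pad(a_bits, fmt)) & MASK
    if diff == 0:
        return "eq"
    # The top bit of the padded difference is set when b - a is negative, i.e. a > b.
    return "gt" if diff >> (PAD_WIDTH - 1) else "lt"
```

With this mapping, -0.0 and +0.0 both pad to zero and compare equal, as IEEE 754 requires, and the extra bit guarantees the difference of any two padded operands fits in the subtractor without overflow.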

Information processing apparatus and method of controlling information processing apparatus
09841973 · 2017-12-12

An information processing apparatus includes a plurality of arithmetic processing devices; a common timer unit configured to measure time in common across the arithmetic processing devices; a plurality of individual timer units, each configured to measure the execution time of a program on a respective arithmetic processing device; a comparing unit configured to compare the program execution time of each arithmetic processing device, as measured by the individual timer units, with the time measured by the common timer unit; and a control unit configured to control processing of the arithmetic processing devices on the basis of the comparison result from the comparing unit.

Generation and use of memory access instruction order encodings

Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
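
One way to picture the store vector and store mask is as bitmasks indexed by a per-instruction ordering ID. In this hedged sketch, `MemOp`, `lsid`, `deps`, and both functions are hypothetical; the assumption that the mask keeps only store bits (so that a dependence on a load never blocks issue) is ours, not the abstract's.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemOp:
    lsid: int        # hypothetical load/store ID: position in the block's memory order
    is_store: bool
    deps: int        # hypothetical bitmask of LSIDs this op must wait for


def build_store_mask(ops: List[MemOp]) -> int:
    """Set bit i when LSID i is a store (assumed to be derived at decode
    time or copied from the block header)."""
    mask = 0
    for op in ops:
        if op.is_store:
            mask |= 1 << op.lsid
    return mask


def next_to_issue(ops: List[MemOp], store_vector: int, store_mask: int) -> Optional[int]:
    """Return the LSID of the lowest-ordered unexecuted op whose masked
    dependencies have all executed, or None if nothing can issue."""
    completed = store_vector & store_mask               # masked store vector
    for op in sorted(ops, key=lambda o: o.lsid):
        if (store_vector >> op.lsid) & 1:
            continue                                    # already executed
        pending = (op.deps & store_mask) & ~completed   # dependencies still outstanding
        if pending == 0:
            return op.lsid
    return None
```

In this model a load that depends on a not-yet-executed store is skipped, and a later independent instruction may issue ahead of it.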

Method and apparatus for vector based finite impulse response (FIR) filtering

A method is provided in which a processor, in response to a vector finite impulse response (VFIR) filter instruction, generates a plurality of filter outputs using a plurality of coefficients and a plurality of sequential data elements, and stores the filter outputs in a storage location specified by the VFIR filter instruction. The coefficients are specified by a coefficient operand of the VFIR filter instruction, and the sequential data elements are specified by a data operand of the instruction.
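
A scalar Python model of what a single VFIR instruction might compute is sketched below, assuming the correlation form y[n] = sum_k c[k] * x[n + k] over one sliding window of the data operand. The function name `vfir`, the `num_outputs` parameter, and the indexing convention are assumptions; the abstract gives no tap count or lane width.

```python
from typing import List, Sequence


def vfir(coeffs: Sequence[int], data: Sequence[int], num_outputs: int) -> List[int]:
    """Produce num_outputs FIR outputs from one coefficient operand and one
    window of sequential data elements: y[n] = sum_k coeffs[k] * data[n + k]."""
    taps = len(coeffs)
    if len(data) < num_outputs + taps - 1:
        raise ValueError("data operand too short for the requested outputs")
    outputs = []
    for n in range(num_outputs):        # each n could map to its own hardware lane
        acc = 0
        for k, c in enumerate(coeffs):  # multiply-accumulate across the taps
            acc += c * data[n + k]
        outputs.append(acc)
    return outputs
```

For example, `vfir([1, 2], [1, 2, 3, 4], 3)` returns `[5, 8, 11]`: three outputs from one instruction's worth of operands.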

Multi-lane solutions for addressing vector elements using vector index registers
11681594 · 2023-06-20

Disclosed herein are vector index registers for storing or loading indexes of true and/or false results of conditional operations using multiple lane processing in vector processors. Each of the vector index registers stores multiple addresses for accessing multiple positions in operand vectors in various types of operations that can leverage multi-lane processing.
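
The true/false index registers can be modeled as two lists of lane positions, filled by a lane-wise conditional and then used to gather elements from an operand vector. This is a sequential sketch of behavior the hardware would perform across lanes in parallel; `compare_to_index_registers` and `gather` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple


def compare_to_index_registers(
    a: Sequence[int], b: Sequence[int], cond: Callable[[int, int], bool]
) -> Tuple[List[int], List[int]]:
    """Evaluate cond lane by lane and record the positions of true and false
    results in two hypothetical vector index registers."""
    true_idx: List[int] = []
    false_idx: List[int] = []
    for lane, (x, y) in enumerate(zip(a, b)):
        (true_idx if cond(x, y) else false_idx).append(lane)
    return true_idx, false_idx


def gather(vec: Sequence[int], index_reg: Sequence[int]) -> List[int]:
    """Access operand-vector positions through an index register."""
    return [vec[i] for i in index_reg]
```

A later operation can then touch only the positions that passed (or failed) the condition, without re-testing each lane.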

Tracking streaming engine vector predicates to control processor execution

In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
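
The predicate-driven loop can be sketched as follows, assuming a hypothetical streaming engine that yields fixed-width vectors plus a per-lane validity predicate, zero-filling and invalidating lanes past the end of the data. Loop control depends only on the predicate, never on an explicit element count; `streaming_engine`, `stream_sum`, and `VLEN` are assumptions for illustration.

```python
from typing import Iterator, List, Tuple

VLEN = 4  # hypothetical vector width in elements


def streaming_engine(data: List[int], vlen: int = VLEN) -> Iterator[Tuple[List[int], List[bool]]]:
    """Hypothetical stream: yields (vector, predicate) pairs. Past the end of
    the data, lanes are zero-filled and marked invalid, so the stream never
    runs dry; the predicate alone signals termination."""
    pos = 0
    while True:
        chunk = data[pos:pos + vlen]
        fill = vlen - len(chunk)
        yield chunk + [0] * fill, [True] * len(chunk) + [False] * fill
        pos += vlen


def stream_sum(data: List[int]) -> Tuple[int, int]:
    """Sum the valid elements; return (total, loop iterations)."""
    stream = streaming_engine(data)
    vec, pred = next(stream)
    total = iterations = 0
    while any(pred):                    # repeat while at least one lane is valid
        total += sum(v for v, valid in zip(vec, pred) if valid)
        iterations += 1
        vec, pred = next(stream)        # advance to the next vector and predicate
    return total, iterations
```

For five elements and a width of four, the loop runs twice: once with a full predicate and once with a single valid lane, then exits on the all-invalid predicate.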

Method and apparatus for permuting streamed data elements

A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.
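
A permute network can be modeled as a lookup in which each output vector lane names its source element. The sketch below assumes a crossbar-style control pattern supplied per instruction; `permute` and `deinterleave_control` are hypothetical, and the de-interleave pattern is only one example of a mapping such a network might apply.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def permute(elements: Sequence[T], control: Sequence[int]) -> List[T]:
    """Route stream element control[i] to vector lane i (crossbar-style
    permute; the control pattern is a hypothetical parameter)."""
    return [elements[src] for src in control]


def deinterleave_control(n: int) -> List[int]:
    """Control pattern that gathers even-indexed elements into the low lanes
    and odd-indexed elements into the high lanes."""
    return list(range(0, n, 2)) + list(range(1, n, 2))
```

For interleaved complex data, this pattern places real parts in the low lanes and imaginary parts in the high lanes, so a vector functional unit can operate on each half directly.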

APPARATUS AND METHOD FOR PERFORMING A SPIN-LOOP JUMP
20170329609 · 2017-11-16

An apparatus and method for performing a spin-loop jump are described. One embodiment of a processor comprises: jump-pause execution logic to execute a jump-pause instruction, the jump-pause instruction to specify a condition and identify a destination instruction; wherein, responsive to the execution of the jump-pause instruction, the jump-pause execution logic is to provide a hint that a loop between the jump-pause instruction and the destination instruction comprises a spin-wait loop and to test the condition, the jump-pause execution logic to delay execution by a specified amount prior to jumping to the destination instruction if the condition is satisfied. A second embodiment of a processor comprises test-subtract execution logic to execute a test-subtract instruction, the test-subtract instruction to decrement a counter value in a second source register, the test-subtract execution logic to further test a monitored value in a first source register or memory and the counter value in the second source register, wherein the test-subtract execution logic is to exit a spin-wait loop if the monitored value indicates an exit condition or if the counter value is equal to zero.

Method and apparatus for implied bit handling in floating point multiplication

A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein the values of the implied bits of the leading-bit-encoded mantissas of the floating point numbers are determined in parallel with multiplication of the encoded mantissas; and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.
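
Why the implied bits can be resolved in parallel follows from expanding the significand product. Writing each significand as h·2^23 + f, where h is the implied bit and f the stored fraction (assuming an IEEE 754 binary32-style layout, which the abstract does not specify), the product is (h_a·2^23 + f_a)(h_b·2^23 + f_b) = f_a·f_b + 2^23·(h_a·f_b + h_b·f_a) + 2^46·h_a·h_b. The f_a·f_b term needs no implied bits, so it can start while h_a and h_b are still being determined. The Python sketch below checks this decomposition; the function names are hypothetical.

```python
from typing import Tuple

FRAC_BITS = 23                      # binary32 stored fraction width
FRAC_MASK = (1 << FRAC_BITS) - 1


def split(bits: int) -> Tuple[int, int]:
    """Return (exponent field, stored fraction) of a binary32 bit pattern."""
    return (bits >> FRAC_BITS) & 0xFF, bits & FRAC_MASK


def implied_bit(exponent_field: int) -> int:
    """Leading significand bit: 0 for zero/subnormal (exponent field 0), else 1.
    Inf/NaN handling is out of scope for this sketch."""
    return 0 if exponent_field == 0 else 1


def significand_product(a_bits: int, b_bits: int) -> int:
    ea, fa = split(a_bits)
    eb, fb = split(b_bits)
    partial = fa * fb                            # needs only the stored fractions
    ha, hb = implied_bit(ea), implied_bit(eb)    # independent of partial: can run in parallel
    # Fold in the implied-bit correction terms once h_a and h_b are known.
    return partial + ((ha * fb + hb * fa) << FRAC_BITS) + ((ha & hb) << (2 * FRAC_BITS))
```

The result equals the product of the full significands, including when one operand is subnormal and its implied bit is 0.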

VECTOR SIMD VLIW DATA PATH ARCHITECTURE

A Very Long Instruction Word (VLIW) digital signal processor particularly adapted for single instruction multiple data (SIMD) operation on various operand widths and data sizes. A vector compare instruction compares first and second operands and stores compare bits. A companion vector conditional instruction performs conditional operations based upon the state of a corresponding predicate data register bit. A predicate unit performs data processing operations on data in at least one predicate data register including unary operations and binary operations. The predicate unit may also transfer data between a general data register file and the predicate data register file.
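
The compare, conditional, and predicate-unit roles can be sketched with per-lane lists standing in for vector and predicate registers. The operation names (`vcmpgt`, `vsel`, `pand`, `pnot`) are hypothetical and are not the processor's actual mnemonics.

```python
from typing import List

Vector = List[int]
Pred = List[int]  # one compare bit per lane


def vcmpgt(a: Vector, b: Vector) -> Pred:
    """Vector compare: write a 1 bit for each lane where a > b."""
    return [int(x > y) for x, y in zip(a, b)]


def vsel(pred: Pred, a: Vector, b: Vector) -> Vector:
    """Vector conditional: take a where the predicate bit is set, else b."""
    return [x if p else y for p, x, y in zip(pred, a, b)]


def pand(p: Pred, q: Pred) -> Pred:
    """Predicate-unit binary operation."""
    return [x & y for x, y in zip(p, q)]


def pnot(p: Pred) -> Pred:
    """Predicate-unit unary operation."""
    return [1 - x for x in p]
```

For example, `vsel(vcmpgt(a, zeros), a, zeros)` clamps the negative lanes of `a` to zero.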