G06F9/30065

ZERO-OVERHEAD LOOP IN AN EMBEDDED DIGITAL SIGNAL PROCESSOR
20170344375 · 2017-11-30 ·

A decoding logic method is arranged to execute a zero-overhead loop in an embedded digital signal processor (DSP). In the method, instruction data is fetched from a memory, and a plurality of instruction tokens, which are derived from the instruction data, are stored in a token buffer. A first portion of one or more instruction tokens from the token buffer are passed to a first decode module, which may be an instruction decode module, and a second portion of the one or more instruction tokens from the token buffer are passed to a second decode module, which may be a loop decode module. The second decode module detects a special loop instruction token, and based on the detection of the special loop instruction token, a loop counter is conditionally tested. Using the first decode module, at least one instruction token of an iterative algorithm is assembled into a single instruction, which is executable in a single execution cycle. Based on the conditional test of the loop counter, the first decode module further assembles a loop branch instruction of the iterative algorithm into the single instruction executable in one execution cycle.

Tracking streaming engine vector predicates to control processor execution

In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.

Critical problem exception handling

Methods, apparatus, computer program products for handling critical problem exceptions during an execution of an application are provided. The method comprises: detecting, by one or more processing units, an occurrence of a certain type of critical problem exception during an execution of an application, the critical problem exception resulting in a termination of the application; instructing, by one or more processing units, to call a Super Handling Routine (SHR) corresponding to the type of the critical problem exception at a pre-configured address based on a pre-determined context registered by the application, the SHR being configured to handle critical problem exceptions; and handing, by one or more processing units, control to the SHR to handle the type of the critical problem exception.

GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.

Method and apparatus for permuting streamed data elements

A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.

APPARATUS AND METHOD FOR PERFORMING A SPIN-LOOP JUMP
20170329609 · 2017-11-16 ·

An apparatus and method for performing a spin-loop jump. One embodiment of a processor comprises: jump-pause execution logic to execute a jump-pause instruction, the jump-pause instruction to specify a condition and identify a destination instruction; wherein responsive to the execution of the jump-pause instruction, the jump-pause execution logic is to provide a hint that a loop between the jump-pause instruction and the destination instruction comprises a spin-wait loop and to test the condition, the jump-pause execution logic to delay execution by a specified amount prior to jumping to the destination instruction if the condition is satisfied. A second embodiment of a processor comprises test-subtract execution logic to execute a test-subtract instruction, the test-subtract instruction to decrement the counter value in a second source register, the test-subtract execution logic to further test the monitored value in a first source register or memory and the counter value in the second source register, wherein the test-subtract execution logic is to exit a spin-wait loop if the monitored value has a value indicating an exit condition or if the counter value is equal to zero.

Compound instruction set architecture for a neural inference chip

A device for controlling neural inference processor cores is provided, including a compound instruction set architecture. The device comprises an instruction memory, which comprises a plurality of instructions for controlling a neural inference processor core. Each of the plurality of instructions comprises a control operation. The device further comprises a program counter. The device further comprises at least one loop counter register. The device is adapted to execute the plurality of instructions. Executing the plurality of instructions comprises: reading an instruction from the instruction memory based on a value of the program counter; updating the at least one loop counter register according to the control operation of the instruction; and updating the program counter according to the control operation of the instruction and a value of the at least one loop counter register.

Error correction in network packets using lookup tables

Error correction in network packets using lookup tables are disclosed herein. The method can include extracting soft information from copies of a network packet, using the soft information to select positions in a payload of the network packet with uncertain values of bits, changing values at the positions to a combination of values to obtain a modified payload of the network packet, and calculating the CRC for the modified payload. When the modified payload matches an error detection code in the network packet, errors in the payload are corrected.

Zero overhead looping by a decoder generating and enqueuing a branch instruction

A method and apparatus for zero overheard loops is provided herein. The method includes the steps of identifying, by a decoder, a loop instruction and identifying, by the decoder, a last instruction in a loop body that corresponds to the loop instruction. The method further includes the steps of generating, by the decoder, a branch instruction that returns execution to a beginning of the loop body, and enqueing, by the decoder, the branch instruction into a branch reservation queue concurrently with an enqueing of the last instruction in a reservation queue.

Systems, methods, and apparatuses for dot product operations

Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions including computing a dot product of signed words and accumulating in a quadword data elements of a matrix pair. Additionally, in some instances, non-accumulating quadword data elements of the matrix pair are set to zero.