G06F9/30021

Signed multiplication using unsigned multiplier with dynamic fine-grained operand isolation

An N×N multiplier may include an N/2×N first multiplier, an N/2×N/2 second multiplier, and an N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If both operands are less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), either the first multiplier or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second and third multipliers are all used to multiply the operands.
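A minimal Python sketch of the operand-isolation logic, assuming unsigned N-bit operands and one possible assignment of partial products to the three multipliers (the abstract does not specify it): the first (N/2×N) multiplier computes a_hi·b, the second computes a_lo·b_hi, and the third computes a_lo·b_lo, so that a·b = (a_hi·b + a_lo·b_hi)·2^(N/2) + a_lo·b_lo. For the mixed case this sketch picks the second and third multipliers rather than the first. The name mul_nxn and the operand swap are illustrative, and signed handling is not modeled.

```python
def mul_nxn(a, b, n=16):
    """Multiply two unsigned n-bit operands using three hypothetical sub-multipliers.

    Returns (product, set of sub-multipliers that had to be enabled).
    """
    half = n // 2
    small = 1 << half              # 2^(N/2): operands below this fit in N/2 bits
    mask = small - 1
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must be unsigned n-bit values")
    if a == 0 or b == 0:
        return 0, set()            # zero operand: all sub-multipliers stay disabled
    if a >= small and b < small:
        a, b = b, a                # route the small operand to the N/2-bit inputs
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    if b < small:                  # both operands < 2^(N/2)
        return a_lo * b_lo, {"third"}
    if a < small:                  # one small, one large: first multiplier idle
        return ((a_lo * b_hi) << half) + a_lo * b_lo, {"second", "third"}
    # both operands >= 2^(N/2): all three partial products are needed
    product = ((a_hi * b + a_lo * b_hi) << half) + a_lo * b_lo
    return product, {"first", "second", "third"}
```

The returned set shows which units would have to switch; in hardware the remaining units could have their inputs held constant.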

METHOD AND APPARATUS FOR VECTOR SORTING USING VECTOR PERMUTATION LOGIC
20230037321 · 2023-02-09

A method for sorting a vector in a processor is provided. In response to a vector sort instruction, the processor generates a control input vector for vector permutation logic included in the processor, based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction, and stores the control input vector in a storage location.
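A Python sketch of one way to build such a control input vector, assuming the permutation logic reads, for each destination lane, the index of the source lane to copy. Each lane's destination is its rank, found by comparing it against every other lane and breaking ties by lane index so equal values keep their order. The names sort_control_vector and permute are hypothetical.

```python
def sort_control_vector(lanes, descending=False):
    """Return ctrl where ctrl[dest] is the source lane that belongs at dest."""
    n = len(lanes)
    ctrl = [0] * n
    for i, x in enumerate(lanes):
        rank = 0
        for j, y in enumerate(lanes):
            before = (y > x) if descending else (y < x)
            if before or (y == x and j < i):   # tie-break keeps the sort stable
                rank += 1
        ctrl[rank] = i
    return ctrl

def permute(lanes, ctrl):
    """Model of the vector permutation logic: gather lanes by control index."""
    return [lanes[k] for k in ctrl]
```

The all-pairs comparison needs no data-dependent loop, which is why rank counting is a common fit for fixed-width vector hardware.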

Vector SIMD VLIW data path architecture

A Very Long Instruction Word (VLIW) digital signal processor particularly adapted for single instruction multiple data (SIMD) operation on various operand widths and data sizes. A vector compare instruction compares first and second operands and stores compare bits. A companion vector conditional instruction performs conditional operations based upon the state of a corresponding predicate data register bit. A predicate unit performs data processing operations on data in at least one predicate data register including unary operations and binary operations. The predicate unit may also transfer data between a general data register file and the predicate data register file.
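A minimal Python model of the compare/conditional pairing, assuming one predicate bit per lane stored as a list of 0/1 values. vcmp_gt, vadd_if, pand and pnot are hypothetical names, not the processor's actual mnemonics.

```python
def vcmp_gt(a, b):
    """Vector compare: one predicate bit per lane (1 where a > b)."""
    return [int(x > y) for x, y in zip(a, b)]

def vadd_if(pred, dst, a, b):
    """Conditional add: lanes with predicate 1 get a + b, other lanes keep dst."""
    return [x + y if p else d for p, d, x, y in zip(pred, dst, a, b)]

def pand(p, q):
    """Binary predicate-unit operation."""
    return [x & y for x, y in zip(p, q)]

def pnot(p):
    """Unary predicate-unit operation."""
    return [x ^ 1 for x in p]
```

Passing pnot(p) to vadd_if covers the "else" lanes without a branch.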

Implementing specialized instructions for accelerating Smith-Waterman sequence alignments

Various techniques for accelerating Smith-Waterman sequence alignments are provided. For example, threads in a thread group use an interleaved cell layout to keep relevant data in registers while computing sub-alignment data for one or more local alignment problems. In another example, specialized instructions reduce the number of cycles required to compute each sub-alignment score. In another example, some threads compute sub-alignment data for a subset of columns of one or more local alignment problems while other threads begin computing sub-alignment data based on partial result data received from the preceding threads. After computing a maximum sub-alignment score, a thread stores the maximum sub-alignment score and its corresponding position in global memory.
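For reference, the recurrence these techniques accelerate can be written as a scalar Python sketch. It assumes a linear gap penalty and illustrative scoring parameters (the abstract gives neither), keeps only the previous row as a stand-in for register-resident data, and returns the maximum sub-alignment score with its position, the pair each thread stores in global memory.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (max local-alignment score, (i, j) cell where it first occurs)."""
    cols = len(b) + 1
    prev = [0] * cols                  # scores of the previous row only
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # local alignment: scores never drop below zero
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best:
                best, best_pos = cur[j], (i, j)
        prev = cur
    return best, best_pos
```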

SORT ACCELERATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS
20180004520 · 2018-01-04

A processor of one aspect includes packed data registers and a decode unit to decode an instruction. The instruction may indicate a first source packed data that includes at least four data elements, a second source packed data that includes at least four data elements, and a destination storage location. An execution unit is coupled with the packed data registers and the decode unit. In response to the instruction, the execution unit stores a result packed data in the destination storage location. The result packed data may include at least four indexes that identify corresponding data element positions in the first and second source packed data. The indexes may be stored in positions in the result packed data that represent a sorted order of the corresponding data elements in the first and second source packed data.
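A Python sketch of the result semantics, assuming index k < len(src1) names src1[k] and larger indexes name src2[k - len(src1)]; the abstract does not give the actual index encoding. sort_indexes and gather are hypothetical helpers.

```python
def sort_indexes(src1, src2):
    """Return element-position indexes of src1 followed by src2, ordered by value."""
    combined = list(src1) + list(src2)
    # sorted() is stable, so equal values keep their original index order
    return sorted(range(len(combined)), key=lambda k: combined[k])

def gather(src1, src2, indexes):
    """Use the index vector to materialize the sorted elements."""
    combined = list(src1) + list(src2)
    return [combined[k] for k in indexes]
```

Returning indexes rather than values lets the same result reorder any payload that travels with the keys.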

PROCESSOR AND CONTROL METHOD OF PROCESSOR

A processor includes: an address generating unit that, when an instruction decoded by a decoding unit is an instruction to execute arithmetic processing in parallel a plurality of times on a plurality of operand sets, each including a plurality of operands that are objects of the arithmetic processing, generates for each time an address set corresponding to each of the operand sets, based on a certain address displacement with respect to the operands included in each operand set; a plurality of instruction queues, each corresponding to one processing unit, that hold the generated address sets for the respective operand sets; and a plurality of processing units that perform the arithmetic processing in parallel on the operand sets obtained from the address sets output by the respective instruction queues.
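A Python sketch of the address generation and queueing described above, assuming each operand set is (source 1, source 2, destination), operand k in iteration t sits at bases[k] + t * displacement, iterations are dealt round-robin to the queues, and the arithmetic is an add. All names are hypothetical.

```python
from collections import deque

def generate_address_sets(bases, displacement, times):
    """One address set per iteration: operand k sits at bases[k] + t * displacement."""
    return [tuple(base + t * displacement for base in bases) for t in range(times)]

def dispatch(address_sets, num_units):
    """Fill one instruction queue per processing unit, round-robin."""
    queues = [deque() for _ in range(num_units)]
    for t, addrs in enumerate(address_sets):
        queues[t % num_units].append(addrs)
    return queues

def run(queues, mem):
    """Each unit drains its own queue; units are independent of each other."""
    for q in queues:
        while q:
            src1, src2, dst = q.popleft()
            mem[dst] = mem[src1] + mem[src2]
```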

ELEMENT ORDERING HANDLING IN A RING BUFFER
20230004346 · 2023-01-05

Data processing apparatuses, methods of data processing, complementary instructions and programs related to ring buffer administration are disclosed. An enqueuing operation performs an atomic compare-and-swap operation to store a first processed data item indication to an enqueuing-target slot in the ring buffer contingent on an in-order marker not being present there and, when successful, determines that a ready-to-dequeue condition is true for the first processed data item indication. A dequeuing operation, when the ready-to-dequeue condition for a dequeuing-target slot is true, comprises writing a null data item to the dequeuing-target slot and, when dequeuing in-order, further comprises, dependent on whether a next contiguous slot has null content, determining a retirement condition and, when the retirement condition is true, performing a retirement process on the next contiguous slot comprising making the next contiguous slot available to a subsequent enqueuing operation. Further subsequent slots may also be retired.
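A heavily simplified, single-consumer Python model of in-order dequeuing from a ring buffer, assuming items carry sequence numbers and an empty slot holds None (the null data item). A lock-guarded helper stands in for the atomic compare-and-swap. It does not model the in-order marker or the multi-slot retirement process from the abstract; ReorderRing and its methods are hypothetical.

```python
import threading

class ReorderRing:
    """Items may arrive out of order but are dequeued in sequence order."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size     # None = null data item (slot is free)
        self.head = 0                  # next sequence number to dequeue
        self._lock = threading.Lock()

    def _cas(self, idx, expected, new):
        # stand-in for a hardware atomic compare-and-swap on one slot
        with self._lock:
            if self.slots[idx] is expected:
                self.slots[idx] = new
                return True
            return False

    def enqueue(self, seq, item):
        if seq - self.head >= self.size:
            return False               # target slot not yet freed; caller retries
        return self._cas(seq % self.size, None, item)

    def dequeue_in_order(self):
        """Dequeue every item that is ready in sequence order."""
        out = []
        while True:
            idx = self.head % self.size
            item = self.slots[idx]
            if item is None:
                return out             # next item in order has not arrived yet
            self.slots[idx] = None     # write the null item: slot becomes reusable
            self.head += 1
            out.append(item)
```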

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND RECORDING MEDIUM
20230237097 · 2023-07-27

An information processing device performs processing using a decision tree that has condition determination nodes and leaf nodes. In the information processing device, an instruction unification means generates a unified instruction by unifying the instructions executed by the condition determination nodes of the decision tree into a form suitable for parallel processing. An acquisition means acquires a plurality of pieces of input data. A condition determination means performs, by the parallel processing, a condition determination on the plurality of pieces of input data for each of the condition determination nodes.
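A Python sketch of the idea, assuming every condition node is unified into the same form, "input[feature] <= threshold", so that one comparison can be applied to all inputs for each node. The node layout (condition nodes as (feature, threshold, left, right), leaves as ("leaf", value)) and the name predict_batch are hypothetical.

```python
def predict_batch(nodes, inputs):
    """Evaluate a decision tree on many inputs; nodes[0] is the root."""
    # Step 1: for each condition node, run the unified test on all inputs at once.
    results = {}
    for idx, node in enumerate(nodes):
        if node[0] != "leaf":
            feature, threshold = node[0], node[1]
            results[idx] = [x[feature] <= threshold for x in inputs]
    # Step 2: route each input through the tree using the precomputed results.
    out = []
    for k in range(len(inputs)):
        n = 0
        while nodes[n][0] != "leaf":
            n = nodes[n][2] if results[n][k] else nodes[n][3]
        out.append(nodes[n][1])
    return out
```

Step 1 is the data-parallel part; it does redundant work for nodes an input never visits, trading extra comparisons for a branch-free inner loop.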

Apparatus and method for ray tracing instruction processing and execution

An apparatus and method to execute ray tracing instructions. For example, one embodiment of an apparatus comprises execution circuitry to execute a dequantize instruction to convert a plurality of quantized data values to a plurality of dequantized data values, the dequantize instruction including a first source operand to identify a plurality of packed quantized data values in a source register and a destination operand to identify a destination register in which to store a plurality of packed dequantized data values, wherein the execution circuitry is to convert each packed quantized data value in the source register to a floating point value, to multiply the floating point value by a first value to generate a first product and to add the first product to a second value to generate a dequantized data value, and to store the dequantized data value in a packed data element location in the destination register.
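A Python sketch of the dequantize semantics, assuming unsigned 8-bit lanes packed into one integer with lane 0 in the least significant bits; the "first value" and "second value" from the abstract appear as scale and offset. The function name is hypothetical.

```python
def dequantize(src_reg, lanes, scale, offset, bits=8):
    """Convert each packed quantized lane to float, then compute q * scale + offset."""
    mask = (1 << bits) - 1
    out = []
    for i in range(lanes):
        q = (src_reg >> (i * bits)) & mask     # extract one unsigned lane
        out.append(float(q) * scale + offset)  # multiply, then add
    return out
```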

Inferring future value for speculative branch resolution

Aspects of the invention include determining a first instruction in a processing pipeline, wherein the first instruction includes a compare instruction; determining a second instruction in the processing pipeline, wherein the second instruction includes a conditional branch instruction that relies on the compare instruction; determining a predicted result of the compare instruction; and completing the conditional branch instruction using the predicted result before the compare instruction executes.
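A toy Python model, assuming a last-outcome table indexed by the compare instruction's address supplies the predicted result (the abstract does not say how the prediction is made). The branch consumes the prediction before the compare executes, and the later comparison decides whether a flush is needed. CompareResultPredictor and run_branch are hypothetical names.

```python
class CompareResultPredictor:
    """Remembers the last outcome of each compare, keyed by its address."""

    def __init__(self):
        self.table = {}

    def predict(self, pc):
        return self.table.get(pc, False)   # default guess: condition false

    def update(self, pc, actual):
        self.table[pc] = actual

def run_branch(pred, pc, a, b):
    """Return (predicted, actual, flush_needed) for a 'branch if a < b'."""
    guessed = pred.predict(pc)    # branch completes with this value first
    actual = a < b                # compare executes later
    pred.update(pc, actual)
    return guessed, actual, guessed != actual
```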