G06F9/30094

USING FUZZY-JBIT LOCATION OF FLOATING-POINT MULTIPLY-ACCUMULATE RESULTS
20210279038 · 2021-09-09

Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating an FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a Jbit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.
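The two possible Jbit positions follow from ordinary significand arithmetic: the product of two normalized significands, each in [1, 2), lies in [1, 4), so its leading 1 sits at one of two adjacent bit positions. The sketch below only illustrates that fact. The 12-bit operand width and the function names `jbit_position` and `exponent_adjust` are assumptions made for the example and are not taken from the abstract.

```python
def jbit_position(sig_a: int, sig_b: int, width: int = 12) -> int:
    """Return the bit index of the leading 1 (the Jbit) of sig_a * sig_b.

    Both inputs are assumed to be normalized `width`-bit significands,
    meaning their top bit (index width - 1) is set. The width is illustrative.
    """
    top = 1 << (width - 1)
    assert sig_a & top and sig_b & top, "inputs must be normalized"
    assert sig_a < (1 << width) and sig_b < (1 << width), "inputs too wide"
    product = sig_a * sig_b
    return product.bit_length() - 1


def exponent_adjust(sig_a: int, sig_b: int, width: int = 12) -> int:
    """Exponent increment (0 or 1) a full normalization of the product would need."""
    low_position = 2 * width - 2  # Jbit position when the product is in [1, 2)
    return jbit_position(sig_a, sig_b, width) - low_position
```

Leaving the product un-normalized and carrying the Jbit position forward as side information is one way to read the "Fuzzy-Jbit" idea; the abstract does not spell out the circuit.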

Addition instructions with independent carry chains

A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.
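A minimal software model of two independent carry chains, assuming each addition instruction also consumes its own flag as carry-in (the abstract only states where the carry output is stored). The names `Flags`, `add_first`, `add_second`, and the 64-bit word size are hypothetical.

```python
MASK64 = (1 << 64) - 1


class Flags:
    """Flags register model with two independent carry flags."""

    def __init__(self):
        self.first = 0
        self.second = 0


def add_first(flags: Flags, a: int, b: int) -> int:
    """Add with carry through the first flag only; the second flag is untouched."""
    total = a + b + flags.first
    flags.first = total >> 64
    return total & MASK64


def add_second(flags: Flags, a: int, b: int) -> int:
    """Add with carry through the second flag only; the first flag is untouched."""
    total = a + b + flags.second
    flags.second = total >> 64
    return total & MASK64


def add_interleaved(x0, y0, x1, y1):
    """Run two multi-word additions (little-endian limbs) in interleaved order."""
    flags = Flags()
    out0, out1 = [], []
    for a0, b0, a1, b1 in zip(x0, y0, x1, y1):
        out0.append(add_first(flags, a0, b0))   # chain 0 uses the first flag
        out1.append(add_second(flags, a1, b1))  # chain 1 uses the second flag
    return out0, out1, flags.first, flags.second
```

Because neither function writes the other's flag, the two chains can be interleaved freely without one addition corrupting the other's carry.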

Register file structures combining vector and scalar data with global and local accesses

The number of registers required is reduced by overlapping scalar and vector registers. This allows increased compiler flexibility when mixing scalar and vector instructions. Local register read ports are reduced by restricting read access. Dedicated predicate registers reduce requirements for general registers and allow reduction of critical timing paths, because the predicate registers can be placed next to the predicate unit.

Data processing apparatus and method for generating a status flag using predicate indicators

Data processing apparatus comprises processing circuitry to selectively apply vector processing operations to one or more data items of one or more data vectors each comprising an ordered plurality of data items at respective vector positions in the data vector, according to the state of respective predicate indicators associated with the vector positions; predicate generation circuitry to apply a processing operation to generate an ordered set of predicate indicators, each associated with a respective one of the vector positions, the ordered set of predicate indicators being associated with an ordered set of active indicators each having an active or an inactive state; and a detector to detect a status flag indicative of whether a predicate indicator at a position, in the ordered set of predicate indicators, corresponding to the position of an outermost active indicator having the active state, has a given state; in which the detector comprises: first and second circuitry to combine the ordered set of predicate indicators and the ordered set of active indicators using first and second respective logical bit-wise combinations to generate first and second ordered sets of intermediate data; and arithmetic circuitry to combine the first and second ordered sets of intermediate data using an arithmetic combination generating a carry bit, the detector generating the status flag in dependence upon the carry bit.
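One way to see how two bit-wise combinations and a single addition can yield such a flag is sketched below. It assumes the "outermost" active indicator is the most significant active bit, and it picks one particular pair of logical combinations; the abstract does not say which combinations the detector uses. The function name `last_active_flag` is hypothetical.

```python
def last_active_flag(predicates: int, active: int, width: int) -> int:
    """Return 1 if the predicate at the highest active position is set, else 0.

    predicates, active: bit masks with one bit per vector position.
    """
    mask = (1 << width) - 1
    predicates &= mask
    active &= mask

    # First combination: active positions whose predicate is set.
    first = predicates & active
    # Second combination: complement of (active positions whose predicate is clear).
    second = (predicates | ~active) & mask

    # The sum overflows `width` bits exactly when `first` is greater than
    # (active & ~predicates). Those two sets are disjoint and together cover
    # all active bits, so the larger one owns the highest active bit.
    total = first + second
    return (total >> width) & 1
```

With no active positions both intermediate sets are empty, the sum is all ones without overflow, and the flag is 0.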

Digit validation check control in instruction execution

Digit validation check control for execution of an instruction. A process obtains an instruction to perform operation(s) using input value(s). The instruction includes a no validation indicator for controlling whether digit validation check control is enabled for execution of the instruction. The process executes the instruction, including determining, based on the no validation indicator, whether digit validation check control is enabled for execution of the instruction, and performing processing based on the determining. Based on the no validation indicator being set to a defined value, digit validation check control is enabled and the processing includes forcing a digit check error indicator output by the executing to indicate no digit check error with respect to the at least one input value.
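A rough sketch of the forcing behavior, assuming packed-decimal input with 4-bit digits where any nibble above 9 is invalid. The function name `digit_check_error` and the boolean form of the indicator are hypothetical; the abstract does not give the encoding.

```python
def digit_check_error(packed: int, num_digits: int, no_validation: bool) -> bool:
    """Return the digit check error indicator for a packed-decimal value.

    When no_validation is set, the indicator is forced to "no error"
    regardless of the digit contents.
    """
    if no_validation:
        return False  # forced: report no digit check error
    for i in range(num_digits):
        nibble = (packed >> (4 * i)) & 0xF
        if nibble > 9:  # 0xA..0xF are not decimal digits
            return True
    return False
```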

System and method for determining bit types for polar encoding and decoding

Embodiments described herein provide a code generation mechanism (FIG. 3, 301) in a Polar encoder (FIG. 2, 204) to determine a bit type (FIG. 3, 312) corresponding to each coded bit in the Polar code before sending the data bits for encoding (FIG. 3, 303). For example, each bit in the Polar code is determined to have a bit type of a frozen bit, a parity bit, an information bit, or a cyclic redundancy check (CRC) bit based at least on the respective reliability index of the bit from a pre-computed reliability index lookup table (FIG. 4A, 411). In this way, the bit type determination can be completed in one loop by iterating over the list of entries in the pre-computed reliability index lookup table.
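The single-loop idea can be sketched as follows. The table layout (bit indices listed from most to least reliable) and the allocation order (information, then CRC, then parity, then frozen) are assumptions chosen for illustration; the abstract only states that the type depends on the reliability index. The function name `assign_bit_types` is hypothetical.

```python
def assign_bit_types(reliability_order, n_info, n_crc, n_parity):
    """Classify every coded bit in one pass over the reliability table.

    reliability_order: bit indices sorted from most to least reliable
                       (assumed layout of the lookup table).
    Returns a list where entry i is the type of coded bit i.
    """
    types = [None] * len(reliability_order)
    for rank, bit_index in enumerate(reliability_order):
        if rank < n_info:
            bit_type = "info"
        elif rank < n_info + n_crc:
            bit_type = "crc"
        elif rank < n_info + n_crc + n_parity:
            bit_type = "parity"
        else:
            bit_type = "frozen"
        types[bit_index] = bit_type
    return types
```

For a toy table such as `[7, 6, 5, 3, 4, 2, 1, 0]` with two information bits, one CRC bit, and one parity bit, bits 7 and 6 become information bits, bit 5 the CRC bit, bit 3 the parity bit, and the rest frozen.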

RECONSTRUCTION OF FLAGS AND DATA FOR IMMEDIATE FOLDING

In one embodiment, a processor includes a fetch logic to fetch instructions, a decode logic to decode the instructions, an execution logic to execute at least some of the instructions, and a reconstruction logic. The decode logic may identify a first instruction having a first immediate value, accumulate the first immediate value with a folded immediate value associated with a first operand of the first instruction, and prevent the first instruction from provision to the execution logic, such that the first instruction is not to be executed within the execution logic. The reconstruction logic may reconstruct one or more flags associated with a result of the first instruction. Other embodiments are described and claimed.
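A toy model of immediate folding with later reconstruction, under several assumptions not stated in the abstract: 32-bit registers, folding applied only to add-immediate instructions, and reconstruction limited to the zero and sign flags (which depend only on the final value). The class `FoldingDecoder` and its methods are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


class FoldingDecoder:
    """Accumulates immediates at decode instead of sending adds to execution."""

    def __init__(self):
        # register name -> folded immediate that has not been applied yet
        self.folded = {}

    def decode_add_imm(self, reg: str, imm: int) -> None:
        """Fold `add reg, imm`: update the folded immediate, execute nothing."""
        self.folded[reg] = (self.folded.get(reg, 0) + imm) & MASK32

    def reconstruct(self, reg: str, regfile: dict):
        """Materialize the register value and the flags the folded adds imply."""
        value = (regfile[reg] + self.folded.pop(reg, 0)) & MASK32
        regfile[reg] = value
        flags = {"zero": value == 0, "sign": bool(value >> 31)}
        return value, flags
```

In this model the register file stays stale until `reconstruct` is called, which stands in for the point where a consumer needs the architectural value or its flags.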

Propagation instruction to generate a set of predicate flags based on previous and current prediction data

Data processing apparatus comprises processing circuitry to selectively apply a vector processing operation to data items at positions within data vectors according to the states of a set of respective predicate flags associated with the positions, the data vectors having a data vector processing order, each data vector comprising a plurality of data items having a data item order, the processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry; wherein the instruction decoder circuitry is responsive to a propagation instruction to control the instruction processing circuitry to derive a set of predicate flags applicable to a current data vector in dependence upon a set of predicate flags applicable to a preceding data vector in the data vector processing order, wherein when one or more last-most predicate flags of the set applicable to the preceding data vector are inactive, all of the derived predicate flags in the set applicable to the current data vector are inactive.
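The stated rule can be modeled in a few lines. The sketch assumes that only the single last flag of the preceding set is examined, and that when it is active the current flags pass through unchanged; the abstract specifies only the inactive case. The function name `propagate` is hypothetical.

```python
def propagate(prev_flags, current_flags):
    """Derive predicate flags for the current vector from the preceding one.

    If the last flag of the preceding vector is inactive, every derived flag
    is inactive. Otherwise the current flags are kept as they are (assumed).
    """
    if not prev_flags or not prev_flags[-1]:
        return [False] * len(current_flags)
    return list(current_flags)
```

This captures the use case of a loop that has already terminated part-way through an earlier vector: once the tail of that vector is inactive, no lane of any later vector should be processed.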

Technique for processing a sequence of atomic add with carry instructions when a data value is not present in a cache
11036500 · 2021-06-15

Processing circuitry performs processing operations specified by program instructions. An instruction decoder decodes an atomic-add-with-carry instruction AADDC to control the processing circuitry to perform an atomic operation of an add of an addend operand value and a data value stored in a memory to generate a result value stored in the memory and a carry value indicative of whether or not the add generated a carry out. The atomic-add-with-carry instructions may be used within systems which accumulate a local sum value prior to a data value being returned into a local cache memory, at which time the local sum value is added to the returned data value. The atomic-add-with-carry instructions may also be used in embodiments comprising a coalescing tree of respective processing apparatus where the carry out values generated from local sums produced at each node are returned early to higher nodes within the hierarchy, thereby releasing them to commence other processing.
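A single-threaded model of the add-with-carry semantics and of local-sum accumulation, assuming a 32-bit data width and a dictionary standing in for memory. Real atomicity is not modeled, and the names `aaddc` and `accumulate_then_merge` are hypothetical.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def aaddc(memory: dict, address: int, addend: int) -> int:
    """Add addend to memory[address], store the wrapped result, return the carry."""
    total = memory[address] + (addend & MASK)
    memory[address] = total & MASK
    return total >> WIDTH


def accumulate_then_merge(memory: dict, address: int, addends) -> int:
    """Sum addends locally while the data is absent, then merge once.

    Returns the total number of carries, which matches what applying
    aaddc to each addend in turn would have produced.
    """
    local_sum = 0
    carries = 0
    for addend in addends:
        local_sum += addend & MASK
        carries += local_sum >> WIDTH  # carry out of the local accumulation
        local_sum &= MASK
    # The data value has now "arrived": fold the local sum in with one add.
    carries += aaddc(memory, address, local_sum)
    return carries
```

The carry count is the same either way because both paths compute the same total sum and count how many times it wraps the fixed width.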

Enabling removal and reconstruction of flag operations in a processor

In one embodiment, a processor includes a fetch logic to fetch instructions, a decode logic to decode the fetched instructions, and an execution logic to execute at least some of the instructions. The decode logic may determine whether a flag portion of a first instruction to be folded is to be performed, and if not, accumulate a first immediate value of the first instruction with a folded immediate value obtained from an entry of an immediate buffer. Other embodiments are described and claimed.