Patent classifications
G06F2207/483
Apparatus and method for vector processing
An apparatus comprises processing circuitry for performing, in response to a vector instruction, a plurality of lanes of processing or respective data elements with at least one operand vector to generate corresponding result data elements of a result vector. The processing circuitry may support performing at least two of the lanes of processing with different rounding modes for generating rounding values for the corresponding result data elements of the result vector. This allows two or more calculations with different rounding modes to be executed in response to a single instruction, to improve performance.
FLOATING POINT ADDITION WITH EARLY SHIFTING
A floating point adder includes leading zero anticipation circuitry 18 to determine a number of leading zeros within a result significand value of a sum of a first floating point operand and a second floating point operand. This number of leading zeros is used to generate a mask which in turn selects input bits from a non-normalized significand produced by adding the first significand value and the second significand value. The non-normalized significand is then normalized at the same time as the output rounding bits used to round the normalized significand value are generated by rounding bit generation circuitry 40.
VARIABLE PRECISION FLOATING-POINT MULTIPLIER
Integrated circuits with specialized processing blocks are provided. The specialized processing blocks may include floating-point multiplier circuits that can be configured to support variable precision. A multiplier circuit may include a first carry-propagate adder (CPA), a second carry-propagate adder (CPA), and an associated rounding circuit. The first CPA may be wide enough to handle the required precision of the mantissa. In a bridged mode, the first CPA may borrow an additional bit from the second CPA while the rounding circuit will monitor the appropriate bits to select the proper multiplier output. A parallel prefix tree operable in a non-bridged mode or the bridged mode may be used to compute multiple multiplier outputs. The multiplier circuit may also include exponent and exception handling circuitry using various masks corresponding to the desired precision width.
Exponent monitoring
A processing apparatus includes floating point arithmetic circuitry coupled to monitoring circuitry. The monitoring circuitry stores exponent limit data indicating at least one of a maximum exponent value and a minimum exponent value processed when performing the floating point arithmetic operations. The monitoring circuitry may be selectively enabled in dependence upon a virtual machine identifier, an application specific identifier or a program counter value range. Exponent limit data may be gathered in respect of different portions of the floating point arithmetic circuitry and/or may be aggregated to form global exponent limit data for the system.
Reproducible stochastic rounding for out of order processors
A method for generating a random number for use in a stochastic rounding operation is provided. The method includes executing an instruction that causes at least two operands to produce an intermediate result and incrementing a state of a random number generator. The method d further includes causing the random number generator to generate a random number in accordance with the state and producing a final result by utilizing the random number to determine a rounding of the intermediate result.
FLOATING POINT UNIT WITH SUPPORT FOR VARIABLE LENGTH NUMBERS
Embodiments of a processor are disclosed for performing arithmetic operations on a machine independent number format. The processor may include a floating point unit, and a number unit. The number format may include a sign/exponent block, a length block, and multiple mantissa digits. The number unit may be configured to perform an operation on two operands by converting the digit format of each mantissa digit of each operand, to perform the operation using the converted mantissa digits, and then to convert each mantissa digit of the result of the operation back into the original digit format.
Apparatus and method for performing conversion operation
An apparatus comprises processing circuitry to perform a conversion operation to convert a floating-point value to a vector comprising a plurality of data elements representing respective bit significance portions of a binary value corresponding to the floating-point value.
Data processing apparatus and method using programmable significance data
An apparatus includes processing circuitry to perform one or more arithmetic operations for generating a result value based on at least one operand. For at least one arithmetic operation, the processing circuitry is responsive to programmable significance data indicative of a target significance for the result value, to generate the result value having the target significance. For example, this allows programmers to set a significance boundary for the arithmetic operation so that it is not necessary for the processing circuitry to calculate bit values having a significance outside the specified boundary, enabling a performance improvement.
Vector operands with component representing different significance portions
A data processing system supports vector operands with components representing different bit significance portions of an integer number. Processing circuitry performs a processing operation specified by a program instruction in dependence upon a number of components comprising the vector as specified by metadata for the vector.
Lane position information for processing of vector
Processing circuitry performs a plurality of lanes of processing on respective data elements of at least one operand vector to generate corresponding result data elements of a result vector. The processing circuitry identifies lane position information for each lane of processing, the lane position information for a given lane identifying a relative position of the corresponding result data element to be generated by the given lane within a corresponding result data value spanning one or more result data elements of the result vector. The processing circuitry is configured to perform each lane of processing in dependence on the lane position information identified for that lane. This enables generation of results which are wider or narrower than the vector size supported in hardware.