G06F7/4991

AN APPARATUS AND METHOD FOR PROCESSING FLOATING POINT VALUES
20180173497 · 2018-06-21 ·

An apparatus and method are provided for processing floating point values using an intermediate representation which has significand, exponent and shadow sections. A less significant portion of the exponent of the floating point value defines a range of positions within the significand section where the representation of the significand is to be held. The exponent section holds a representation of a more significant portion of the exponent indicating a selected window of multiple contiguous windows spanning a value range of a format of the floating point value. A first portion of the significand section corresponds to the selected window and a second portion corresponds to an overlap into a further window which is adjacent to and lower in the value range. The shadow section holds values for populating the second portion of the significand section when the representation of the significand of the floating point value is moved to a higher window which is adjacent to and higher in the value range than the selected window. The shadow section allows the selected window to be shifted, such that the summation of multiple values produces the same result independent of the order in which the values are summed.

OVERFLOW DETECTION FOR SIGN-MAGNITUDE ADDERS
20180165063 · 2018-06-14 ·

A circuit is provided which includes arithmetic computation logic configured to add or subtract operands of variable length to produce a result in a sign-magnitude data format. The circuit also includes an overflow detector to provide an overflow signal indicative of whether the result fits within a specified result length l. The overflow detector operates on the operands prior to the arithmetic computation logic producing the result to determine, independent of the result produced by the arithmetic computation logic, whether the result fits within the specified result length l.

Tininess prediction and handler engine for smooth handling of numeric underflow

Embodiments of the present disclosure include a tininess prediction and handler engine for handling numeric underflow while streamlining the data path for handling normal range cases, thereby avoiding flushes, and reducing the complexity of a scheduler with respect to how dependent operations are handled. A preemptive tiny detection logic section can detect a potential tiny result for the function or operation that is being performed, and can produce a pessimistic tiny indicator. The tininess prediction and handler engine can further include a subnormal post-processing pipe, which can denormalize and round one or more subnormal operations while in a post-processing mode. A schedule modification logic section can reschedule in-flight operations. The schedule modification logic section can issue dependent operations optimistically assuming that a producing operation will not produce a tiny result, and so will not incur extra latency associated with fixing the tiny result in the post-processing pipe.

Apparatus and method for vector processing
09916130 · 2018-03-13 · ·

An apparatus comprises processing circuitry for performing, in response to a vector instruction, a plurality of lanes of processing or respective data elements with at least one operand vector to generate corresponding result data elements of a result vector. The processing circuitry may support performing at least two of the lanes of processing with different rounding modes for generating rounding values for the corresponding result data elements of the result vector. This allows two or more calculations with different rounding modes to be executed in response to a single instruction, to improve performance.

Executing Perform Floating Point Operation Instructions
20180039482 · 2018-02-08 ·

A method and system are disclosed for executing a machine instruction in a central processing unit. The method comprises the steps of obtaining a perform floating-point operation instruction; obtaining a test bit; and determining a value of the test bit. If the test bit has a first value, (a) a specified floating-point operation function is performed, and (b) a condition code is set to a value determined by said specified function. If the test bit has a second value, (c) a check is made to determine if said specified function is valid and installed on the machine, (d) if said specified function is valid and installed on the machine, the condition code is set to one code value, and (e) if said specified function is either not valid or not installed on the machine, the condition code is set to a second code value.

Exponent monitoring

A processing apparatus includes floating point arithmetic circuitry coupled to monitoring circuitry. The monitoring circuitry stores exponent limit data indicating at least one of a maximum exponent value and a minimum exponent value processed when performing the floating point arithmetic operations. The monitoring circuitry may be selectively enabled in dependence upon a virtual machine identifier, an application specific identifier or a program counter value range. Exponent limit data may be gathered in respect of different portions of the floating point arithmetic circuitry and/or may be aggregated to form global exponent limit data for the system.

Data processing system configured for separated computations for positive and negative data
12182533 · 2024-12-31 · ·

A method includes obtaining input data and separating the input data into a first subset of input data and a second subset of input data, the first subset of input data including positive input data and the second subset of input data including negative input data. The method includes performing positive computations on the first subset of input data to determine one or more first results and performing negative computations on the second subset of input data to determine one or more second results. The method includes aggregating the one or more first results and the one or more second results to determine a solution based on the aggregating. The method includes executing an application using a machine learning model or a deep neural network based on the determined solution.

Multiplier circuit

A multiplier circuit includes a first circuit comprising a first transistor, a second transistor, a first capacitor, and a second capacitor. It further includes a second circuit comprising a third transistor, a fourth transistor, a third capacitor, and a fourth capacitor.

Low power hardware architecture for handling accumulation overflows in a convolution operation

In a low power hardware architecture for handling accumulation overflows in a convolver unit, an accumulator of the convolver unit computes a running total by successively summing dot products from a dot product computation module during an accumulation cycle. In response to the running total overflowing the maximum or minimum value of a data storage element, the accumulator transmits an overflow indicator to a controller and sets its output equal to a positive or negative overflow value. In turn, the controller disables the dot product computation module by clock gating, clamping one of its inputs to zero and/or holding its inputs to constant values. At the end of the accumulation cycle, the output of the accumulator is sampled. In response to a clear signal being asserted, the dot product computation module is enabled, and the running total is set to zero for the start of the next accumulation cycle.

Double rounded combined floating-point multiply and add

Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier. Final results are selected respectively using overflow detection.