Patent classifications
G06F2205/00
VARIABLE PRECISION FLOATING-POINT ADDER AND SUBTRACTOR
An integrated circuit may include a floating-point adder that supports variable precision. The floating-point adder may receive first and second inputs to be added, where the first and second inputs each have a mantissa and an exponent. The mantissa and exponent values may be routed to a near path or a far path using a dual path floating-point adder architecture, depending on the difference between the exponents and on whether an addition or subtraction is being performed. The mantissa values may be left justified, while the sticky bits are right justified. The hardware for the largest mantissa can be reused to support calculations on the smaller mantissas with no additional arithmetic structures, only some multiplexing and decoding logic.
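A minimal behavioral sketch of the near/far path split, following textbook dual-path adder design (the abstract does not give the exact decision logic, so the function names, the exponent-difference threshold, and the far-path alignment below are assumptions):

```python
def choose_path(exp_a: int, exp_b: int, effective_sub: bool) -> str:
    # Near path: effective subtraction with exponent difference 0 or 1,
    # where massive cancellation can force a large normalization shift.
    # Far path: everything else; at most a one-bit normalization is needed.
    if effective_sub and abs(exp_a - exp_b) <= 1:
        return "near"
    return "far"

def far_path_align(mant_a: int, exp_a: int, mant_b: int, exp_b: int):
    # Align the smaller operand by right-shifting its mantissa; the bits
    # shifted out are OR-reduced into a sticky bit kept for rounding.
    if exp_a < exp_b:
        mant_a, exp_a, mant_b, exp_b = mant_b, exp_b, mant_a, exp_a
    shift = exp_a - exp_b
    sticky = shift > 0 and (mant_b & ((1 << shift) - 1)) != 0
    return mant_a, mant_b >> shift, exp_a, sticky
```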
FLOATING POINT ADDITION WITH EARLY SHIFTING
A floating point adder includes leading zero anticipation circuitry 18 to determine the number of leading zeros within the result significand of a sum of a first floating point operand and a second floating point operand. This leading zero count is used to generate a mask, which in turn selects input bits from the non-normalized significand produced by adding the first and second significand values. The non-normalized significand is then normalized while, in parallel, rounding bit generation circuitry 40 generates the output rounding bits used to round the normalized significand value.
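A behavioral sketch of the normalization step (a real leading zero anticipator predicts the count in parallel with the addition, possibly off by one; here the count is simply taken from the completed sum, and the significand is assumed to fit in `width` bits):

```python
def normalize(raw_sum: int, width: int):
    # Count leading zeros of the non-normalized significand, then
    # left-shift so its most significant bit is set; the shift count
    # also feeds the exponent adjustment.
    if raw_sum == 0:
        return 0, 0
    lz = width - raw_sum.bit_length()
    return (raw_sum << lz) & ((1 << width) - 1), lz
```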
SYSTEM, METHOD, AND RECORDING MEDIUM FOR MIRRORING MATRICES FOR BATCHED CHOLESKY DECOMPOSITION ON A GRAPHIC PROCESSING UNIT
A batched Cholesky decomposition method, system, and non-transitory computer readable medium for a Graphics Processing Unit (GPU), for a batch including at least a first problem and a second problem, include mirroring a second problem matrix of the second problem onto a first problem matrix of the first problem, combining the first problem matrix and the mirrored second problem matrix into a single problem matrix, and allocating the data read by a thread to the first problem and the second problem, respectively.
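Since Cholesky factorization touches only one triangle of a symmetric matrix, two problems can share one square tile. A sketch of one plausible packing (keeping the two diagonals in side buffers is my assumption; the abstract does not specify the layout or the thread mapping):

```python
import numpy as np

def pack_pair(A: np.ndarray, B: np.ndarray):
    # Problem 1 fills the strictly lower triangle; problem 2 is mirrored
    # (transposed) into the strictly upper triangle, so one read of a row
    # of M can feed threads assigned to either problem.
    n = A.shape[0]
    M = np.zeros_like(A)
    lo = np.tril_indices(n, -1)
    hi = np.triu_indices(n, 1)
    M[lo] = A[lo]
    M[hi] = B.T[hi]
    return M, np.diagonal(A).copy(), np.diagonal(B).copy()
```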
Digital low pass filter
A digital low pass filter for producing an output value given a target value includes a memory which stores a scaling factor, a previous output value, a previous intermediate value, and the target value. The difference between the target value and the previous output value is evaluated and then multiplied by the scaling factor to produce an intermediate value. The previous intermediate value is multiplied by the scaling factor minus unity. The output value is evaluated by summing the previous output value, twice the intermediate value, and the previous intermediate value multiplied by the scaling factor minus unity. The output value is then stored in memory as the previous output value, and the intermediate value as the previous intermediate value. The filter thus provides a second-order response but requires fewer hardware multipliers than the direct form implementation of a second-order filter.
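Read literally, the abstract's update rule is v = k·(x − y_prev) followed by y = y_prev + 2·v + (k − 1)·v_prev, which has unity DC gain (at steady state y = x). A direct behavioral model of that reading:

```python
class DigitalLowPass:
    def __init__(self, k: float):
        self.k = k          # scaling factor
        self.y_prev = 0.0   # previous output value
        self.v_prev = 0.0   # previous intermediate value

    def step(self, target: float) -> float:
        v = self.k * (target - self.y_prev)     # intermediate value
        y = self.y_prev + 2.0 * v + (self.k - 1.0) * self.v_prev
        self.y_prev, self.v_prev = y, v         # write back state
        return y
```

With k = 0.1 and a unit step input, the output rises smoothly toward 1, consistent with the unity DC gain. Each sample needs only two multiplications (by k and by k − 1), which is the claimed saving over a direct-form second-order filter and its four or five coefficient multipliers.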
HARDWARE ACCELERATED MACHINE LEARNING
A machine learning hardware accelerator architecture and associated techniques are disclosed. The architecture features multiple memory banks of very wide SRAM that may be concurrently accessed by a large number of parallel operational units. Each operational unit supports an instruction set specific to machine learning, including optimizations for performing tensor operations and convolutions. Also disclosed are optimized addressing, an optimized shift reader, and variations on a multicast network that permutes and copies data, together with the operational unit features that support those operations.
DATA PROCESSING SYSTEMS
A method of operating a data processing system when determining a b-bit unsigned normalized integer representation U of a number x is disclosed. When the number x has a value between 0 and 1, the method comprises determining the integer part I of (x·2^b), and determining whether to use the integer part I, an incremented version of the integer part I, or a decremented version of the integer part I for the unsigned normalized integer representation U of the number x, based on a comparison that uses the fractional part F of (x·2^b) and the number x.
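The target quantity is U = round(x·(2^b − 1)), and the identity x·(2^b − 1) = I + (F − x), where I and F are the integer and fractional parts of x·2^b, explains the select-between-I, I+1, and I−1 structure. A behavioral sketch (round-half-up is assumed; the abstract does not state the tie-breaking rule):

```python
def float_to_unorm(x: float, b: int) -> int:
    # U = round(x * (2**b - 1)); since x*(2**b - 1) = I + (F - x),
    # the result is I, I + 1, or I - 1 depending on F - x.
    s = x * (1 << b)   # multiplying by 2**b is just an exponent shift
    i = int(s)         # integer part I
    f = s - i          # fractional part F
    d = f - x          # lies in (-1, 1) for x in [0, 1]
    if d >= 0.5:
        return i + 1
    if d < -0.5:
        return i - 1
    return i
```

For example, with b = 8 and x = 1.0: I = 256, F = 0, and F − x = −1 selects the decremented value I − 1 = 255, the correct 8-bit UNORM encoding of 1.0.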
DATA PROCESSING SYSTEMS
A method of operating a data processing system when determining an unsigned normalized integer representation U of a number x is disclosed. When the number x has a value between 0 and 1, it is determined 31 whether the number x is greater than or equal to 0.5. When it is determined that the number x is greater than or equal to 0.5, the bit of the binary representation of the number x that represents the value 0.5 is inverted 32, and the unsigned normalized integer representation U of the number x is determined using the value of the binary representation of the number x having its bit that represents the value 0.5 inverted.
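The abstract pins down only the comparison and the inversion, not how U is then derived from the modified value, so the sketch below shows just the described step on a fixed-point encoding of x (the representation and the function name are assumptions):

```python
def invert_half_bit(x_fixed: int, frac_bits: int) -> int:
    # x is encoded with `frac_bits` fractional bits; the bit worth 0.5 is
    # the top fractional bit. It is set exactly when x >= 0.5, so
    # inverting it here clears it: the modified value equals x - 0.5.
    half = 1 << (frac_bits - 1)
    if x_fixed >= half:      # x >= 0.5
        x_fixed ^= half      # invert the 0.5 bit
    # U is then determined from this modified value per the patent's
    # method, which the abstract does not spell out.
    return x_fixed
```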
ENTROPY SOURCE WITH MAGNETO-RESISTIVE ELEMENT FOR RANDOM NUMBER GENERATOR
An entropy source and a random number (RN) generator are disclosed. In one aspect, a low-energy entropy source includes a magneto-resistive (MR) element and a sensing circuit. A static current is applied to the MR element, whose resistance varies based on the magnetization of the MR element. The sensing circuit senses the resistance of the MR element and provides random values based on the sensed resistance. In another aspect, a RN generator includes an entropy source and a post-processing module. The entropy source includes at least one MR element and provides first random values based on the at least one MR element. The post-processing module receives and processes the first random values (e.g., based on a cryptographic hash function, an error detection code, a stream cipher algorithm, etc.) and provides second random values having improved randomness characteristics.
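A minimal sketch of the post-processing stage using the cryptographic-hash option the abstract lists (SHA-256 is an arbitrary choice here; the patent names no specific hash):

```python
import hashlib

def condition(raw_samples: bytes, n_bytes: int = 32) -> bytes:
    # Compress raw, possibly biased first random values from the MR
    # sensing circuit into second random values with improved randomness
    # characteristics. n_bytes must not exceed the 32-byte digest size.
    return hashlib.sha256(raw_samples).digest()[:n_bytes]
```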