G06F7/556

Floating point unit for exponential function implementation

A computer-implemented method for performing an exponential calculation using only two fully-pipelined instructions in a floating point unit. The method includes computing an intermediate value y′ by multiplying an input operand with a predetermined constant value. The input operand is received in floating point representation. The method further includes computing an exponential result for the input operand by executing a fused instruction. The fused instruction includes converting the intermediate value y′ to an integer representation z represented by v most significant bits (MSB) and w least significant bits (LSB). The fused instruction further includes determining exponent bits of the exponential result based on the v MSB from the integer representation z. The method further includes determining mantissa bits of the exponential result according to a piece-wise linear mapping function using a predetermined number of segments based on the w LSB from the integer representation z.
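The two steps described above can be sketched in software as follows. This is only an illustrative model, not the claimed hardware: the constant 2^w / ln 2, the values W = 10 and SEGMENTS = 8, the chord-based segment table, and the name approx_exp are all assumptions chosen for the example, and exponent bias handling is left to math.ldexp.

```python
import math

W = 10          # assumed number of least significant (fraction) bits, w
SEGMENTS = 8    # assumed number of piecewise-linear segments
SCALE = (1 << W) / math.log(2.0)   # assumed predetermined constant: 2^w / ln 2

# Slope and intercept of the chord of 2**f on each segment of [0, 1).
_TABLE = []
for s in range(SEGMENTS):
    f0, f1 = s / SEGMENTS, (s + 1) / SEGMENTS
    slope = (2.0 ** f1 - 2.0 ** f0) / (f1 - f0)
    _TABLE.append((slope, 2.0 ** f0 - slope * f0))


def approx_exp(x: float) -> float:
    """Approximate e**x with one multiply plus one 'fused' conversion step."""
    y = x * SCALE                        # instruction 1: y' = x * constant
    z = math.floor(y)                    # instruction 2 starts: float -> integer z
    exponent = z >> W                    # MSBs of z give the result exponent
    frac_bits = z & ((1 << W) - 1)       # w LSBs of z give the fraction
    f = frac_bits / (1 << W)             # fraction in [0, 1)
    slope, intercept = _TABLE[(frac_bits * SEGMENTS) >> W]
    mantissa = slope * f + intercept     # piecewise-linear estimate of 2**f
    return math.ldexp(mantissa, exponent)
```

With these assumed parameters the relative error stays well under one percent; a wider w or more segments would tighten it at the cost of a larger table.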

SYSTEM AND METHOD FOR DISTRIBUTED LAPLACE NOISE GENERATION FOR DIFFERENTIAL PRIVACY

A computer-implemented method includes generating shared random bits at two or more nodes in a multi-party computation system; obtaining one or more Gaussian samples at the two or more nodes utilizing the shared random bits; and, at each of the two or more nodes, generating and outputting one or more Laplacian samples using the one or more Gaussian samples.
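One known way to turn Gaussian samples into Laplacian samples is the identity that, for four independent standard normal values g1 to g4, the quantity g1*g2 + g3*g4 follows a Laplace distribution with scale 1. The sketch below uses that identity; whether the patent uses the same construction is not stated in the abstract. The shared seed stands in for the shared random bits, and the function name laplace_samples is hypothetical. A real multi-party protocol would derive the bits securely rather than from a plain seed.

```python
import random


def laplace_samples(shared_seed: int, count: int, scale: float = 1.0) -> list:
    """Derive Laplace(0, scale) samples from Gaussians drawn with shared randomness."""
    rng = random.Random(shared_seed)     # stand-in for the shared random bits
    samples = []
    for _ in range(count):
        # Four independent standard Gaussian samples per Laplacian sample.
        g1, g2, g3, g4 = (rng.gauss(0.0, 1.0) for _ in range(4))
        samples.append(scale * (g1 * g2 + g3 * g4))
    return samples
```

Two nodes calling laplace_samples with the same seed obtain identical noise, which is the property the shared bits are meant to provide in this simplified model.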

Float Division by Constant Integer
20230297338 · 2023-09-21 ·

A binary logic circuit for determining the ratio x/d where x is a variable integer input, the binary logic circuit comprising: a logarithmic tree of modulo units each configured to calculate x[a:b] mod d for respective block positions a and b in x where b > a with the numbering of block positions increasing from the most significant bit of x up to the least significant bit of x, the modulo units being arranged such that a subset of M - 1 modulo units of the logarithmic tree provide x[0:m] mod d for all m ∈ {1, M}, and, on the basis that any given modulo unit introduces a delay of 1: all of the modulo units are arranged in the logarithmic tree within a delay envelope of log₂ M; and more than M - 2^u of the subset of modulo units are arranged at the maximal delay of log₂ M, where 2^u is the power of 2 immediately smaller than M.
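The role of the prefix remainders x[0:m] mod d can be illustrated with a small software model. The sketch below uses a generic log-depth parallel-prefix scan to combine block remainders; it does not reproduce the specific tree arrangement or delay constraints of the claim. The function name divide_by_constant, the block width, and the block count are assumptions for the example.

```python
def divide_by_constant(x: int, d: int, block_bits: int = 4, num_blocks: int = 8):
    """Return (x // d, x % d) for 0 <= x < 2**(block_bits * num_blocks)."""
    mask = (1 << block_bits) - 1
    # Block 0 is the most significant block, matching the numbering in the text.
    blocks = [(x >> (block_bits * (num_blocks - 1 - i))) & mask
              for i in range(num_blocks)]

    # Each entry holds (remainder of a run of blocks mod d, width of the run in bits).
    spans = [(b % d, block_bits) for b in blocks]
    step = 1
    while step < num_blocks:             # about log2(num_blocks) rounds
        nxt = list(spans)
        for i in range(step, num_blocks):
            (hi_rem, hi_w), (lo_rem, lo_w) = spans[i - step], spans[i]
            # One "modulo unit": join a higher run with the adjacent lower run.
            nxt[i] = ((hi_rem * (1 << lo_w) + lo_rem) % d, hi_w + lo_w)
        spans = nxt
        step *= 2
    prefix_rem = [rem for rem, _ in spans]   # prefix_rem[m] = x[0:m] mod d

    # Each quotient block depends only on the previous prefix remainder.
    quotient = 0
    for i, b in enumerate(blocks):
        carry = prefix_rem[i - 1] if i > 0 else 0
        quotient = (quotient << block_bits) | (((carry << block_bits) | b) // d)
    return quotient, prefix_rem[-1]
```

Because every quotient block needs only the remainder of the blocks above it, all blocks can be produced in parallel once the prefix remainders are available, which is why the depth of the remainder tree dominates the delay.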

Deep neural network architecture using piecewise linear approximation

A log circuit for piecewise linear approximation is disclosed. The log circuit identifies an input associated with a logarithm operation to be performed using piecewise linear approximation. The log circuit then identifies a range that the input falls within from various ranges associated with piecewise linear approximation (PLA) equations for the logarithm operation, where the identified range corresponds to one of the PLA equations. The log circuit computes a result of the corresponding PLA equation based on the respective operands of the equation. The log circuit then returns an output associated with the logarithm operation, which is based at least partially on the result of the PLA equation.
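The range lookup followed by a single linear equation can be sketched as below for a base-2 logarithm. The choice of four equal ranges over the mantissa interval [1, 2), the chord-fitted coefficients, and the name pla_log2 are assumptions for illustration; the abstract does not specify the ranges or the coefficients used by the circuit.

```python
import math

RANGES = 4   # assumed number of PLA ranges over the mantissa interval [1, 2)

# One (low, high, slope, intercept) tuple per range, fitted as a chord of log2.
_PLA = []
for k in range(RANGES):
    lo, hi = 1 + k / RANGES, 1 + (k + 1) / RANGES
    slope = (math.log2(hi) - math.log2(lo)) / (hi - lo)
    _PLA.append((lo, hi, slope, math.log2(lo) - slope * lo))


def pla_log2(x: float) -> float:
    """Approximate log2(x) for x > 0 with one linear equation per range."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)          # x = m * 2**e with m in [0.5, 1)
    m, e = m * 2.0, e - 1         # renormalize so that m lies in [1, 2)
    for lo, hi, slope, intercept in _PLA:
        if lo <= m < hi:          # identify the range the input falls within
            return e + (slope * m + intercept)
    raise AssertionError("mantissa outside [1, 2)")
```

With four ranges the absolute error of this sketch is roughly 0.01; doubling the number of ranges cuts it by about a factor of four.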

SYSTEM AND METHOD FOR ACCELERATING TRAINING OF DEEP LEARNING NETWORKS
20230297337 · 2023-09-21 ·

A system and method for accelerating multiply-accumulate (MAC) floating-point units during training of deep learning networks. The method includes: receiving a first input data stream A and a second input data stream B; adding exponents of the first data stream A and the second data stream B in pairs to produce product exponents; determining a maximum exponent using a comparator; determining a number of bits by which each significand in the second data stream has to be shifted prior to accumulation by adding product exponent deltas to the corresponding term in the first data stream and using an adder tree to reduce the operands in the second data stream into a single partial sum; adding the partial sum to a corresponding aligned value using the maximum exponent to determine accumulated values; and outputting the accumulated values.
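The exponent-alignment idea can be modeled in software as follows. This is a simplified sketch, not the claimed datapath: it multiplies full significands rather than processing them term by term, treats the accumulator as one more aligned operand, and truncates shifted-out bits. The names decompose, adder_tree, and mac, and the 24-bit significand width, are assumptions for the example.

```python
import math

SIG_BITS = 24   # assumed significand width


def decompose(x: float):
    """Split x into an integer significand and exponent: x ~= sig * 2**exp."""
    m, e = math.frexp(x)                         # x = m * 2**e, 0.5 <= |m| < 1
    return int(round(m * (1 << SIG_BITS))), e - SIG_BITS


def adder_tree(values):
    """Reduce the operands pairwise, level by level, into a single sum."""
    values = list(values)
    while len(values) > 1:
        if len(values) % 2:
            values.append(0)
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0] if values else 0


def mac(a, b, acc: float = 0.0) -> float:
    terms = []
    for x, y in zip(a, b):
        (sx, ex), (sy, ey) = decompose(x), decompose(y)
        terms.append((sx * sy, ex + ey))         # add exponents in pairs
    terms.append(decompose(acc))                 # accumulator joins the alignment
    live = [e for s, e in terms if s != 0]
    if not live:
        return 0.0
    max_exp = max(live)                          # comparator: maximum exponent
    # Shift each significand right by its exponent delta before accumulation.
    aligned = [s >> (max_exp - e) if s else 0 for s, e in terms]
    return math.ldexp(adder_tree(aligned), max_exp)
```

For example, mac([1.5, 2.0, -0.25], [2.0, 0.5, 4.0]) aligns the three products to the largest product exponent and sums them to 3.0.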
