Patent classifications
G06F7/48
Neural processing element with single instruction multiple data (SIMD) compute lanes
An architecture is disclosed for a neural processing element having single instruction, multiple data (“SIMD”) compute lanes. The neural processing element includes compute lanes having multipliers configured to multiply a binary operand with another binary operand to generate a binary output. The neural processing element also includes a single adder tree for summing the binary outputs of the hardware binary multipliers. The neural processing element also includes a storage element for storing a binary output of the single hardware binary adder tree.
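The lane-and-adder-tree arrangement described above can be modelled in software. The following is a minimal sketch, not the patented design: the function names, the zero padding of odd tree levels, and the running `accumulator` standing in for the storage element are all assumptions made for illustration.

```python
def adder_tree_sum(values):
    """Sum values pairwise, level by level, the way a binary adder tree would."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # assumption: odd levels are padded with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def simd_multiply_accumulate(a_operands, b_operands, accumulator=0):
    """One step: every lane multiplies its operand pair, one tree sums the products."""
    if len(a_operands) != len(b_operands):
        raise ValueError("each lane needs one operand from each input")
    lane_products = [a * b for a, b in zip(a_operands, b_operands)]
    # 'accumulator' is a hypothetical stand-in for the storage element.
    return accumulator + adder_tree_sum(lane_products)
```

Calling `simd_multiply_accumulate([1, 2, 3, 4], [5, 6, 7, 8])` models four lanes feeding a two-level tree.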
Quantum circuit optimization using windowed quantum arithmetic
Methods, systems and apparatus for performing windowed quantum arithmetic. In one aspect, a method for performing a product addition operation includes: determining multiple entries of a lookup table, comprising, for each index in a first set of indices, multiplying the index value by a scalar for the product addition operation; for each index in a second set of indices, determining multiple address values, comprising extracting source register values corresponding to indices between i) the index in the second set of indices, and ii) the index in the second set of indices plus the predetermined window size; and adjusting values of a target quantum register based on the determined multiple entries of the lookup table and the determined multiple address values.
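The windowing idea in this abstract can be illustrated with ordinary integers. The sketch below is a classical simulation only, not a quantum circuit: it precomputes a table of `index * scalar`, reads the source register one window at a time to form an address, and adds the looked-up entry, shifted to the window position, into the target. The function name, parameter names, and default window size are assumptions.

```python
def windowed_product_add(target, source, scalar, source_bits, window=4):
    """Classically compute target + scalar * source using table lookups per window."""
    if not 0 <= source < (1 << source_bits):
        raise ValueError("source does not fit in source_bits")
    # Lookup table: entry j holds j * scalar for every possible window value j.
    table = [index * scalar for index in range(1 << window)]
    mask = (1 << window) - 1
    for start in range(0, source_bits, window):
        # Address = source bits in [start, start + window).
        address = (source >> start) & mask
        # Add the table entry, weighted by the position of the window.
        target += table[address] << start
    return target
```

With a window of `w` bits the loop performs roughly `source_bits / w` table lookups and additions instead of one controlled addition per source bit, which is the trade the windowed approach makes.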
Neural network hardware accelerator architectures and operating method thereof
A memory-centric neural network system and operating method thereof includes: a processing unit; semiconductor memory devices coupled to the processing unit, the semiconductor memory devices containing instructions executed by the processing unit; a weight matrix constructed with rows and columns of memory cells, inputs of the memory cells of a same row being connected to one of axons, outputs of the memory cells of a same column being connected to one of neurons; timestamp registers registering timestamps of the axons and the neurons; and a lookup table containing adjusting values indexed in accordance with the timestamps, wherein the processing unit updates the weight matrix in accordance with the adjusting values.
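One way to picture the timestamp-indexed update is sketched below. The abstract does not say how the lookup table is indexed, so this sketch assumes the index is the clamped difference between a neuron timestamp and an axon timestamp; `update_weights`, `adjust_table`, and `max_delta` are hypothetical names.

```python
def update_weights(weights, axon_timestamps, neuron_timestamps, adjust_table, max_delta):
    """Adjust each weight by a table entry chosen from the row/column timestamps.

    weights[row][col]: rows correspond to axons, columns to neurons.
    adjust_table: 2 * max_delta + 1 entries, for deltas -max_delta..+max_delta.
    """
    if len(adjust_table) != 2 * max_delta + 1:
        raise ValueError("adjust_table must cover every clamped delta")
    for row, axon_time in enumerate(axon_timestamps):
        for col, neuron_time in enumerate(neuron_timestamps):
            # Assumption: the table is indexed by the clamped timestamp difference.
            delta = max(-max_delta, min(max_delta, neuron_time - axon_time))
            weights[row][col] += adjust_table[delta + max_delta]
    return weights
```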
Method and Apparatus for Configuring a Reduced Instruction Set Computer Processor Architecture to Execute a Fully Homomorphic Encryption Algorithm
Systems and methods for configuring a reduced instruction set computer processor architecture to execute fully homomorphic encryption (FHE) logic gates as a streaming topology. The method includes parsing sequential FHE logic gate code, transforming the FHE logic gate code into a set of code modules that each have an input and an output that is a function of the input and which do not pass control to other functions, creating a node wrapper around each code module, and configuring at least one of the primary processing cores to implement the logic element equivalents of each element in a manner which operates in a streaming mode wherein data streams out of corresponding arithmetic logic units into the main memory and other ones of the plurality of arithmetic logic units.
METHOD FOR IMPLEMENTING DOT PRODUCT OPERATION, ELECTRONIC DEVICE AND STORAGE MEDIUM
A method and device relate to the fields of deep learning and artificial intelligence. The method may include: acquiring N operand sets, where N is a positive integer greater than one and the N operand sets are all in a first data input format or all in a second data input format, the first data input format including half-precision floating point data and char data, and the second data input format including signed fixed point data and the char data; determining input data corresponding to each operand, and inputting the input data into a corresponding multiplier to obtain an output result, where different operands correspond to different multipliers respectively; and calculating a sum of the output results of the multipliers by one or more adders to obtain an operation result of the N-dot-product operation.
Floating point to fixed point conversion
A binary logic circuit converts a number in floating point format having an exponent E of ew bits, an exponent bias B given by B=2^(ew−1)−1, and a significand comprising a mantissa M of mw bits into a fixed point format with an integer width of iw bits and a fractional width of fw bits. The circuit includes a shifter operable to receive a significand input comprising a contiguous set of the most significant bits of the significand and configured to left-shift the significand input by a number of bits equal to the value represented by k least significant bits of the exponent to generate a shifter output, wherein min{(ew−1),bitwidth(iw−2−s_y)}≤k≤(ew−1) where s_y=1 for a signed floating point number and s_y=0 for an unsigned floating point number, and a multiplexer coupled to the shifter and configured to: receive an input comprising a contiguous set of bits of the shifter output; and output the input if the most significant bit of the exponent is equal to one.
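For orientation, the quantities named in this abstract (exponent width ew, mantissa width mw, bias 2^(ew−1)−1, fractional width fw) can be exercised with a plain software conversion. This sketch is a reference computation only and does not model the shifter-and-multiplexer circuit; it assumes a sign bit above the exponent, handles normal numbers only, truncates toward zero, and does not check the integer width iw. The function name and defaults (half-precision field widths) are assumptions.

```python
def float_bits_to_fixed(bits, ew=5, mw=10, fw=8):
    """Convert raw floating-point bits to a fixed-point integer scaled by 2**fw."""
    bias = (1 << (ew - 1)) - 1                   # B = 2^(ew-1) - 1
    sign = (bits >> (ew + mw)) & 1
    exponent = (bits >> mw) & ((1 << ew) - 1)
    mantissa = bits & ((1 << mw) - 1)
    if exponent == 0 or exponent == (1 << ew) - 1:
        raise ValueError("zero, subnormals, infinity and NaN are not handled")
    significand = (1 << mw) | mantissa           # restore the implicit leading one
    # value = significand * 2^(exponent - bias - mw); rescale by 2^fw.
    shift = exponent - bias - mw + fw
    magnitude = significand << shift if shift >= 0 else significand >> -shift
    return -magnitude if sign else magnitude
```

For example, the half-precision pattern `0x3E00` (1.5) with `fw=8` yields `384`, which is 1.5 × 256.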
Methods and Apparatus for Quotient Digit Recoding in a High-Performance Arithmetic Unit
A divider includes a digit recoder that recodes upper bits of a partial remainder into sets of lower-radix multiples without carry propagate addition. Elimination of the carry propagate adder makes computation of the quotient carry free and independent of the number of bits computed per cycle, thereby enabling a higher number of bits per cycle, as well as increased clock speeds.
In-memory full adder
A non-destructive memory array implements a full adder. The array includes a column connected by a bit line and a full adder unit. The column stores a first bit in a first row of the bit line, a second bit in a second row of the bit line, and an inverse of a carry-in bit in a third row of the bit line. The full adder unit stores, in the second and third rows of the bit line, a sum bit and a carry out bit output, respectively, of adding the first bit, the second bit and the carry-in bit. The full adder unit does not overwrite any of the bits when a full adder table indicates that the sum bit and the carry out bit are equivalent to the second bit and the carry-in bit.
Signal processing apparatus, method, program, and recording medium
A signal processing apparatus comprises an operation processing part that performs operation processing on data represented in the two's complement representation and a storage processing part that performs storage processing on data represented in a second representation format as a data representation format, and in the second representation format, a data value is identical to one in the two's complement representation when the value is positive or zero, and all the bits lower than the most significant bit that indicates the sign in the two's complement representation are inverted when a data value is negative.
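The second representation format described above can be sketched as a pair of conversions. Non-negative values keep their two's complement bit pattern; for negative values every bit below the sign bit is inverted. The function names and the default 8-bit width are assumptions made for illustration.

```python
def to_second_format(value, width=8):
    """Encode a signed integer: two's complement, then invert the low bits if negative."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    sign_bit = 1 << (width - 1)
    word = value & ((1 << width) - 1)   # two's complement bit pattern
    if word & sign_bit:
        word ^= sign_bit - 1            # invert all bits below the sign bit
    return word


def from_second_format(word, width=8):
    """Decode a word in the second format back to a signed integer."""
    sign_bit = 1 << (width - 1)
    if word & sign_bit:
        word ^= sign_bit - 1            # undo the inversion
        return word - (1 << width)      # reinterpret as a negative two's complement value
    return word
```

Under this encoding, small negative values such as −1 become `0b10000000` rather than `0b11111111`, so values near zero of either sign have mostly zero low-order bits.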