Patent classifications
G06F7/5235
Logarithmic addition-accumulator circuitry, processing pipeline including same, and methods of operation
An integrated circuit including a plurality of logarithmic addition-accumulator circuits, connected in series, to, in operation, perform logarithmic addition and accumulate operations, wherein each logarithmic addition-accumulator circuit includes: (i) a logarithmic addition circuit to add a first input data and a filter weight data, each having the logarithmic data format, and to generate and output first sum data having a logarithmic data format, and (ii) an accumulator, coupled to the logarithmic addition circuit of the associated logarithmic addition-accumulator circuit, to add a second input data and the first sum data output by the associated logarithmic addition circuit to generate first accumulation data. The integrated circuit may further include first data format conversion circuitry, coupled to the output of each logarithmic addition circuit, to convert the data format of the first sum data to a floating point data format wherein the accumulator may be a floating point type.
COMPUTE IN MEMORY THREE-DIMENSIONAL NON-VOLATILE NOR MEMORY FOR NEURAL NETWORKS
A non-volatile memory device for performing compute in memory operations for a neural network uses a three dimensional NOR architecture in which vertical NOR strings are formed of multiple memory cells connected in parallel between a source line and a bit line. Weights of the neural network are encoded as threshold voltages of the memory cells and activations are encoded as word line voltages applied to the memory cells of the NOR strings. The memory cells are operated in the subthreshold region, where the word line voltages are below the threshold voltages. The NOR structure naturally sums the resultant subthreshold currents of the individual memory cells to generate the product of the activations and the weights of the neural network by concurrently applying input voltages to multiple memory cells of a NOR string.
Digital approximate multipliers for machine learning and artificial intelligence applications
Digital approximate multipliers (aMULT) utilizing interpolative apparatuses, circuits, and methods are described in this disclosure. The disclosed aMULT interpolative methods can be arranged or programmed to operate asynchronously and or synchronously. For applications where less precision is acceptable, fewer interpolations can yield less precise multiplication results, while such approximate multiplication can be computed faster and at lower power consumption. Conversely, for applications where higher precision is required, more interpolations can generate more precise multiplication results. As such, by utilizing the disclosed aMULT method, the resolution and precision objectives of an approximate multiplication function can be pre-programmed or adjusted real-time and or on the fly, which enables optimizing for different and flexible power consumption and speed of multiplication, in addition to enabling the optimization of an approximate multiplier's die size and cost in accordance with cost-performance objectives.
PROCESSING WITH COMPACT ARITHMETIC PROCESSING ELEMENT
A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).
Processing with compact arithmetic processing element
A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).
PROCESSING WITH COMPACT ARITHMETIC PROCESSING ELEMENT
A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).
Extended use of logarithm and exponent instructions
Embodiments of the present disclosure are based on a recognition that some processors are configured with instructions to compute logarithms and exponents (i.e. some processors include log and exp circuits). Embodiments of the present disclosure are further based on an insight that the use of the existing log and exp circuits could be extended to compute certain other functions by using the existing log and exp circuits to transform from a Cartesian to a logarithmic domain and vice versa and performing the actual computations of the functions in the logarithmic domain, which may be computationally easier than performing the computations in the Cartesian domain.
Optimization of neural networks using hardware calculation efficiency and adjustment factors
In one embodiment, a method includes receiving a request for an operation to be performed; determining that the operation is associated with a machine-learning algorithm, and in response, route the operation to a computing circuit; performing, at the computing circuit, the operation, including: determining a linear domain product of a first log-domain number and a second log-domain number associated with the operation based on a summation of the first log-domain number and the second log-domain number and output a third log-domain number approximating the linear domain product of the first log-domain number and the second log-domain number; converting the third log-domain number to a first linear-domain number; summing the first linear-domain number and a second linear-domain number associated with the operation, and output a third linear-domain number as the summed result.
Multiplication circuit, system on chip, and electronic device
A multiplication circuit includes an addition subcircuit configured to obtain logarithmic field data a and b that correspond to A and B, and perform an addition operation on a and b to obtain c, where c includes an integral part and a fractional part, an exponentiation operation subcircuit configured to perform an exponentiation operation in which a base is 2 and an exponent is the fractional part of c, to obtain an exponentiation operation result, a shift subcircuit configured to shift the exponentiation operation result based on the integral part of c to obtain a shift result, and an output subcircuit configured to output a product of A and B based on signs of a and b and with reference to the shift result.
Processing with compact arithmetic processing element
Low precision computers can be efficient at finding possible answers to search problems. However, sometimes the task demands finding better answers than a single low precision search. A computer system augments low precision computing with a small amount of high precision computing, to improve search quality with little additional computing.