G06F7/48

Cascaded computing for convolutional neural networks

Techniques are described for efficiently reducing the total amount of computation in convolutional neural networks (CNNs) without affecting the output result or classification accuracy. Computation redundancy in CNNs is reduced by exploiting the structure of the convolution and the subsequent pooling (e.g., sub-sampling) operations. In some implementations, the input features may be divided into a group of precision values and the operation(s) may be cascaded. A maximum may be identified (e.g., with 90% probability) using a small number of bits in the input features, and the full-precision convolution may then be performed only on the input identified as the maximum. Accordingly, the total number of bits used to perform the convolution is reduced without affecting the output features or the final classification accuracy.
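The cascade described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the 8-bit unsigned inputs, the choice of keeping 3 most significant bits, and the flat list representation of patches are all assumptions made for illustration. Stage 1 runs a cheap convolution on truncated inputs to pick the likely maximum of a pooling window, and stage 2 runs the full-precision convolution on that single candidate.

```python
def dot(patch, kernel):
    """Convolution of one flattened patch with one flattened kernel."""
    return sum(p * w for p, w in zip(patch, kernel))


def truncate(patch, keep_bits, total_bits=8):
    """Keep only the `keep_bits` most significant bits of each input value."""
    shift = total_bits - keep_bits
    return [(p >> shift) << shift for p in patch]


def cascaded_max_pool(patches, kernel, keep_bits=3, total_bits=8):
    """Return (index, value) of the pooled maximum using a two-stage cascade.

    Stage 1 (assumed low-precision pass): score every patch in the pooling
    window using truncated inputs and pick the likely maximum.
    Stage 2: compute the full-precision convolution for that patch only.
    The stage 1 choice is only probably correct, as the abstract notes.
    """
    coarse = [dot(truncate(p, keep_bits, total_bits), kernel) for p in patches]
    winner = max(range(len(patches)), key=coarse.__getitem__)
    return winner, dot(patches[winner], kernel)


def reference_max_pool(patches, kernel):
    """Baseline: full-precision convolution on every patch, then max."""
    return max(dot(p, kernel) for p in patches)


if __name__ == "__main__":
    window = [[200, 10, 30, 40], [20, 220, 15, 5],
              [90, 80, 70, 60], [250, 240, 230, 220]]
    weights = [1, 2, -1, 1]
    print(cascaded_max_pool(window, weights), reference_max_pool(window, weights))
```

In this sketch only one of the four patches in the window is ever processed at full precision; when the coarse scores are close, the first stage can select the wrong patch, which is why the abstract speaks of identifying the maximum with a probability.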

QUANTUM CIRCUIT OPTIMIZATION USING WINDOWED QUANTUM ARITHMETIC
20230281497 · 2023-09-07

Methods, systems and apparatus for performing windowed quantum arithmetic. In one aspect, a method for performing a product addition operation includes: determining multiple entries of a lookup table, comprising, for each index in a first set of indices, multiplying the index value by a scalar for the product addition operation; for each index in a second set of indices, determining multiple address values, comprising extracting source register values corresponding to indices between i) the index in the second set of indices, and ii) the index in the second set of indices plus the predetermined window size; and adjusting values of a target quantum register based on the determined multiple entries of the lookup table and the determined multiple address values.
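The table-and-window structure of this product addition can be emulated classically. The Python sketch below is only an illustration of the arithmetic, not a quantum circuit: ordinary integers stand in for the source and target registers, and the function name, the `n_bits` parameter, and the default window size of 4 are assumptions. The table holds `index * scalar` for every possible window value, and each window of source bits serves as the address into that table.

```python
def windowed_product_add(target, source, scalar, n_bits, window=4):
    """Classically emulate target += scalar * source, one window at a time.

    `source` is assumed to fit in `n_bits` bits. A real quantum register
    would also wrap modulo its size; this sketch uses unbounded integers.
    """
    # Lookup table: for each index in the first set, index * scalar.
    table = [index * scalar for index in range(1 << window)]
    mask = (1 << window) - 1

    # Second set of indices: the starting bit position of each window.
    for start in range(0, n_bits, window):
        # Address: source bits from `start` up to `start + window`.
        address = (source >> start) & mask
        # Adjust the target by the looked-up entry, shifted into place.
        target += table[address] << start
    return target


if __name__ == "__main__":
    print(windowed_product_add(5, 0b101101, 7, n_bits=6, window=2))
```

A larger window means fewer additions into the target but a lookup table that grows as two to the power of the window size, which is the trade-off the window size controls.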

Information processing apparatus, arithmetic processing device, and method of controlling information processing apparatus
11756289 · 2023-09-12

An information processing apparatus includes: a first preprocessing arithmetic device configured to execute preprocessing for analog data from a first sensor; and a first post-processing arithmetic device connected to the first preprocessing arithmetic device and configured to execute post-processing for first preprocessed data, wherein the first preprocessing arithmetic device includes a first processor configured to: receive the analog data from the first sensor and convert the analog data into digital data; output feature data on the basis of a result of execution of feature extraction processing for the digital data; and output the feature data, and the first post-processing arithmetic device includes a second processor configured to: input the feature data; store the feature data in a first memory; and store, in the first memory, recognition result data based on a result of execution of recognition processing for the feature data.

High throughput parallel architecture for recursive sinusoid synthesizer

A first multiplier multiplies a first input with a first coefficient and a first adder sums an output of the first multiplier and a second input to generate a first output. A second multiplier multiplies a third input with a second coefficient, a third multiplier multiplies a fourth input with a third coefficient, and a second adder sums outputs of the second and third multipliers to generate a second output. The second and third inputs are derived from the first output and the first and fourth inputs are derived from the second output. The first and second outputs generate digital values for first and second digital sinusoids, respectively.
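One way to read the multiplier and adder wiring in this abstract is as a coupled recursive oscillator. The Python sketch below is a guess at a compatible structure, not the patented parallel architecture: it uses the well-known "magic circle" recurrence, and the coefficient values (`c1 = -eps`, `c2 = eps`, `c3 = 1`), the initial state, and the function name are all assumptions chosen so that the two outputs are sinusoids at angular frequency `omega` radians per sample.

```python
import math


def coupled_sinusoids(omega, n_samples):
    """Generate two sinusoids with a three-multiplier coupled recurrence.

    Assumed mapping onto the abstract's structure:
      first output  = c1 * (previous second output) + (previous first output)
      second output = c2 * (new first output) + c3 * (previous second output)
    """
    eps = 2.0 * math.sin(omega / 2.0)   # assumed tuning coefficient
    c1, c2, c3 = -eps, eps, 1.0
    out1, out2 = 1.0, 0.0               # assumed initial state
    first, second = [], []
    for _ in range(n_samples):
        new1 = c1 * out2 + out1         # first multiplier, first adder
        new2 = c2 * new1 + c3 * out2    # second and third multipliers, second adder
        out1, out2 = new1, new2
        first.append(out1)
        second.append(out2)
    return first, second


if __name__ == "__main__":
    s1, s2 = coupled_sinusoids(0.1, 8)
    print([round(v, 4) for v in s1])
```

With these coefficients each output obeys the second-order recurrence x[n+1] = 2 cos(omega) x[n] - x[n-1], so the amplitude neither grows nor decays in exact arithmetic.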

Error Correction in Computation
20220414185 · 2022-12-29

Introduced here is a technique to detect and/or correct errors in computation. The ability to correct errors in computation can increase the speed of the processor, reduce the power consumption of the processor, and reduce the distance between the transistors within the processor, because the errors thus generated can be detected and corrected. In one embodiment, an error correcting module, running either in software or in hardware, can detect an error in matrix multiplication by calculating an expected sum of all elements in the resulting matrix and an actual sum of all elements in the resulting matrix. When the expected sum and the actual sum differ, the error correcting module detects an error. In another embodiment, in addition to detecting the error, the error correcting module can determine the location and the magnitude of the error, thus correcting the erroneous computation.
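The sum check described in this abstract can be sketched in a few lines of Python. The expected total of C = A·B can be computed without forming C, because it equals the dot product of the column sums of A with the row sums of B. The abstract does not say how the error is located; the row and column checksums used below are an assumption borrowed from standard checksum-based fault tolerance, and all function names are hypothetical. Integer matrices are used so the comparisons are exact.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def expected_total(a, b):
    """Expected sum of all elements of A*B, computed without forming A*B."""
    col_sums_a = [sum(row[k] for row in a) for k in range(len(b))]
    row_sums_b = [sum(row) for row in b]
    return sum(ca * rb for ca, rb in zip(col_sums_a, row_sums_b))


def detect_error(a, b, c):
    """True when the actual sum of C differs from the expected sum."""
    return expected_total(a, b) != sum(map(sum, c))


def locate_and_correct(a, b, c):
    """Assumed scheme: use row and column checksums to fix one bad element.

    Returns (row, col, magnitude) and repairs `c` in place, or None when
    the mismatch pattern is not a single corrupted element.
    """
    row_sums_b = [sum(row) for row in b]
    col_sums_a = [sum(row[k] for row in a) for k in range(len(b))]
    exp_rows = [sum(x * s for x, s in zip(row, row_sums_b)) for row in a]
    exp_cols = [sum(col_sums_a[k] * b[k][j] for k in range(len(b)))
                for j in range(len(b[0]))]
    bad_rows = [i for i, row in enumerate(c) if sum(row) != exp_rows[i]]
    bad_cols = [j for j in range(len(c[0]))
                if sum(row[j] for row in c) != exp_cols[j]]
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        return None
    i, j = bad_rows[0], bad_cols[0]
    magnitude = sum(c[i]) - exp_rows[i]
    c[i][j] -= magnitude
    return i, j, magnitude


if __name__ == "__main__":
    a, b = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
    c = matmul(a, b)
    c[1][0] += 5  # inject a fault
    print(detect_error(a, b, c), locate_and_correct(a, b, c), c)
```

A single total can miss errors that cancel each other, so the per-row and per-column checks in this sketch trade a little extra work for the ability to pinpoint a lone faulty element.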

Iterative Estimation Hardware
20220391205 · 2022-12-08

A function estimation hardware logic unit may be implemented as part of an execution pipeline in a processor. The function estimation hardware logic unit is arranged to calculate, in hardware logic, an improved estimate of a function of an input value, d, where the function is given by

$1 / \sqrt[i]{d}$.

The hardware logic comprises a plurality of multipliers and adders arranged to implement an m-th order polynomial with coefficients that are rational numbers, where m is not equal to two and in various examples m is not equal to a power of two. In various examples i=1, i=2 or i=3. In various examples m=3.

Measurement based uncomputation for quantum circuit optimization
11531923 · 2022-12-20

Methods and apparatus for optimizing a quantum circuit. In one aspect, a method includes identifying one or more sequences of operations in the quantum circuit that un-compute respective qubits on which the quantum circuit operates; generating an adjusted quantum circuit, comprising, for each identified sequence of operations in the quantum circuit, replacing the sequence of operations with an X basis measurement and a classically-controlled phase correction operation, wherein a result of the X basis measurement acts as a control for the classically-controlled phase correction operation; and executing the adjusted quantum circuit.

Semiconductor memory device employing processing in memory (PIM) and method of operating the semiconductor memory device

A semiconductor memory device includes a plurality of memory bank groups configured to be accessed in parallel; an internal memory bus configured to receive external data from outside the plurality of memory bank groups; and a first computation circuit configured to receive internal data from a first memory bank group of the plurality of memory bank groups during each first period of a plurality of first periods, receive the external data through the internal memory bus during each second period of a plurality of second periods, the second period being shorter than the first period, and perform a processing in memory (PIM) arithmetic operation on the internal data and the external data during each second period.