G06F7/48

Semiconductor memory device employing processing in memory (PIM) and method of operating the semiconductor memory device

A semiconductor memory device includes a plurality of memory bank groups configured to be accessed in parallel; an internal memory bus configured to receive external data from outside the plurality of memory bank groups; and a first computation circuit configured to receive internal data from a first memory bank group of the plurality of memory bank groups during each first period of a plurality of first periods, receive the external data through the internal memory bus during each second period of a plurality of second periods, the second period being shorter than the first period, and perform a processing in memory (PIM) arithmetic operation on the internal data and the external data during each second period.
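The timing relationship in this abstract can be illustrated with a minimal software sketch. It assumes that each first period spans a whole number of second periods (the hypothetical `ratio` parameter) and that the PIM arithmetic operation is a multiply-accumulate; the abstract specifies neither, and the function name is illustrative only.

```python
def pim_accumulate(internal_words, external_words, ratio=2):
    """Model a computation circuit that performs one operation per second period.

    internal_words[i] is the bank-group data received during first period i and
    is held for `ratio` consecutive second periods (assumption). A new external
    word arrives over the internal memory bus in every second period.
    """
    if len(external_words) != len(internal_words) * ratio:
        raise ValueError("need exactly `ratio` external words per internal word")
    acc = 0
    for t, external in enumerate(external_words):
        internal = internal_words[t // ratio]  # changes once per first period
        acc += internal * external             # one operation per second period
    return acc


if __name__ == "__main__":
    print(pim_accumulate([1, 2], [1, 2, 3, 4], ratio=2))
```

The point of the sketch is only that the slower internal operand is reused across several faster external operands.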

Method and Apparatus for Configuring a Reduced Instruction Set Computer Processor Architecture to Execute a Fully Homomorphic Encryption Algorithm

Systems and methods for configuring a reduced instruction set computer processor architecture to execute fully homomorphic encryption (FHE) logic gates as a streaming topology. The method includes parsing sequential FHE logic gate code, transforming the FHE logic gate code into a set of code modules that each have an input and an output that is a function of the input and which do not pass control to other functions, creating a node wrapper around each code module, and configuring at least one of the primary processing cores to implement the logic element equivalents of each element in a manner which operates in a streaming mode wherein data streams out of corresponding arithmetic logic units into the main memory and other ones of the plurality of arithmetic logic units.

QUANTUM DIVISION OPERATION METHOD AND APPARATUS WITH PRECISION
20230376276 · 2023-11-23

The disclosure relates to the field of quantum computing, specifically to a method and device for quantum division operation with precision. The method includes: obtaining dividend data and divisor data to be operated on, transforming the dividend data into a first target quantum state, and transforming the divisor data into a second target quantum state; for the first target quantum state and the second target quantum state, iteratively executing quantum state evolution corresponding to a subtraction operation, counting the number of executions of the subtraction operation until the dividend data is reduced to a negative number, and outputting a finally obtained counting result as the integer part of a quotient of dividing the dividend data by the divisor data; for a current first target quantum state and a current second target quantum state, iteratively executing quantum state evolution corresponding to a fractional-part operation of the quotient; and outputting a finally obtained quantum state on a qubit with preset precision bits. The disclosure realizes a basic arithmetic operation that can be used in quantum circuits, and fills a gap in the related art.
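A classical analogue of the two phases described above may help make the arithmetic concrete. This is a sketch only: the disclosure operates on quantum states, whereas the code below uses ordinary integers, and the shift-and-subtract rule used for the fractional bits is an assumption, since the abstract does not spell that step out. The function name and parameters are hypothetical.

```python
def divide_with_precision(dividend, divisor, precision_bits):
    """Return (integer_part, fractional_bits) of dividend / divisor."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("sketch handles dividend >= 0 and divisor > 0 only")

    # Integer part: subtract until the running value goes negative,
    # counting only the subtractions that left it non-negative.
    remainder = dividend
    count = 0
    while True:
        remainder -= divisor
        if remainder < 0:
            remainder += divisor  # undo the subtraction that went negative
            break
        count += 1

    # Fractional part (assumed rule): double the remainder and try to
    # subtract the divisor once per requested bit of precision.
    fractional_bits = []
    for _ in range(precision_bits):
        remainder *= 2
        if remainder >= divisor:
            remainder -= divisor
            fractional_bits.append(1)
        else:
            fractional_bits.append(0)
    return count, fractional_bits


if __name__ == "__main__":
    print(divide_with_precision(7, 2, 3))
```

The fractional bits are ordered from most significant (value 1/2) to least significant.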

Surface code computations using auto-CCZ quantum states
11568298 · 2023-01-31

Methods and apparatus for performing surface code computations using Auto-CCZ states. In one aspect, a method for implementing a delayed choice CZ operation on a first and second data qubit using a quantum computer includes: preparing a first and second routing qubit in a magic state; interacting the first data qubit with the first routing qubit and the second data qubit with the second routing qubit using a first and second CNOT operation, where the first and second data qubits act as controls for the CNOT operations; if a received first classical bit represents an off state: applying a first and second Hadamard gate to the first and second routing qubit; measuring the first and second routing qubit using Z basis measurements to obtain a second and third classical bit; and performing classically controlled fixup operations on the first and second data qubit using the second and third classical bits.

N-POINT COMPLEX FOURIER TRANSFORM STRUCTURE HAVING ONLY 2N REAL MULTIPLIES, AND OTHER MATRIX MULTIPLY OPERATIONS
20220398295 · 2022-12-15

An integrated circuit chip implementing multiplication of an M×N element matrix with an N-element vector to obtain an M-element product by combining the vector with rows of bits of the same significance selected from the matrix one bit-row at a time to form partial products, exploiting the fact that the same potential combinations are needed for all bit-rows and all matrix rows to precompute all of the combinations once and for all, and combining selected partial products for different bit place-significance with a shift-and-add operation only once for each of the M product elements, thereby effectively using only M multiply-equivalent structures. An N-point Complex Fourier Transform can therefore be claimed which only needs 2N real multiplies and the product of an N×N matrix with another N×N matrix requires only N² multiplies.
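The bit-row scheme described above resembles what is commonly called distributed arithmetic, and a small software sketch can show the idea. It assumes non-negative integer matrix entries that fit in a hypothetical `bits` parameter and precomputes all 2^N subset sums of the vector in a table; the function names are illustrative and the abstract does not describe the circuit at this level of detail.

```python
def subset_sums(vector):
    """table[mask] = sum of vector[j] over every bit j set in mask."""
    n = len(vector)
    table = [0] * (1 << n)
    for mask in range(1, 1 << n):
        low = mask & -mask                 # lowest set bit of mask
        j = low.bit_length() - 1           # index of that bit
        table[mask] = table[mask ^ low] + vector[j]
    return table


def bit_row_matvec(matrix, vector, bits=8):
    """Multiply matrix by vector using table lookups and shift-and-add only."""
    table = subset_sums(vector)            # computed once for all rows
    result = []
    for row in matrix:
        acc = 0
        for b in range(bits):
            # The bit-row of significance b selects a subset of vector elements.
            mask = 0
            for j, entry in enumerate(row):
                if (entry >> b) & 1:
                    mask |= 1 << j
            acc += table[mask] << b        # weight the partial product by 2**b
        result.append(acc)
    return result


if __name__ == "__main__":
    print(bit_row_matvec([[1, 2], [3, 4]], [5, 6]))
```

No general multiplication appears in the inner loops; the table grows as 2^N, so the sketch is only practical for small N.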

Neural network hardware accelerator architectures and operating method thereof
11501130 · 2022-11-15

A memory-centric neural network system and operating method thereof includes: a processing unit; semiconductor memory devices coupled to the processing unit, the semiconductor memory devices containing instructions executed by the processing unit; a weight matrix constructed with rows and columns of memory cells, inputs of the memory cells of a same row being connected to one of axons, outputs of the memory cells of a same column being connected to one of neurons; timestamp registers registering timestamps of the axons and the neurons; and a lookup table containing adjusting values indexed in accordance with the timestamps, wherein the processing unit updates the weight matrix in accordance with the adjusting values.
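The timestamp-driven weight update can be sketched in software as follows. The sketch assumes the lookup table is indexed by the difference between a neuron timestamp and an axon timestamp, in the manner of spike-timing-dependent plasticity; the abstract says only that the table is "indexed in accordance with the timestamps", so that indexing rule, the dictionary representation, and all names here are assumptions.

```python
def update_weights(weights, axon_timestamps, neuron_timestamps, adjust_lut):
    """Adjust weights[row][col] in place using a timestamp-indexed lookup table.

    Rows correspond to axons and columns to neurons, as in the abstract.
    adjust_lut maps a timestamp difference to an adjusting value (assumption);
    differences missing from the table leave the weight unchanged.
    """
    for row, t_axon in enumerate(axon_timestamps):
        for col, t_neuron in enumerate(neuron_timestamps):
            delta = t_neuron - t_axon
            weights[row][col] += adjust_lut.get(delta, 0)
    return weights


if __name__ == "__main__":
    w = [[0, 0], [0, 0]]
    print(update_weights(w, [1, 3], [2, 2], {1: 5, -1: -3}))
```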

Oblivious carry runway registers for performing piecewise additions
11475348 · 2022-10-18

Methods and apparatus for piecewise addition into an accumulation register using one or more carry runway registers, where the accumulation register includes a first plurality of qubits with each qubit representing a respective bit of a first binary number and where each carry runway register includes multiple qubits representing a respective binary number. In one aspect, a method includes inserting the one or more carry runway registers into the accumulation register at respective predetermined qubit positions, respectively, of the accumulation register; initializing each qubit of each carry runway register in a plus state; applying one or more subtraction operations to the accumulation register, where each subtraction operation subtracts a state of a respective carry runway register from a corresponding portion of the accumulation register; and adding one or more input binary numbers into the accumulation register using piecewise addition.
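A classical model of a single carry runway may help illustrate why the pieces can be added independently. It is a sketch under stated assumptions: the plus-state initialization is replaced by one fixed classical value (`runway_value`), the accumulator starts at zero, and the widths `n`, `p`, and `r` are hypothetical parameters. None of these specifics come from the abstract.

```python
def piecewise_add(inputs, n=12, p=4, r=4, runway_value=5):
    """Sum inputs modulo 2**n using one carry runway inserted at bit position p.

    The low segment holds the p low accumulator bits plus an r-bit runway
    that absorbs carries. The high piece holds accumulator bits p..n-1.
    """
    low_modulus = 1 << (p + r)
    high_modulus = 1 << (n - p)

    # Initialize the runway, then subtract its value from the high piece so
    # that the represented total is still zero.
    low_segment = runway_value << p
    high_piece = (-runway_value) % high_modulus

    for value in inputs:
        value %= 1 << n
        # The two pieces are updated independently; no carry crosses between them.
        low_segment = (low_segment + (value & ((1 << p) - 1))) % low_modulus
        high_piece = (high_piece + (value >> p)) % high_modulus

    # Fold the runway back in to read out the ordinary binary sum.
    return (low_segment + (high_piece << p)) % (1 << n)


if __name__ == "__main__":
    print(piecewise_add([13, 7, 200]))
```

In this model the result is exact as long as the runway does not overflow; with `p + r < n`, an overflow would lose a carry and give a wrong sum.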