Patent classifications
G06F7/523
PROCESSING-IN-MEMORY (PIM) SYSTEM AND OPERATING METHODS OF THE PIM SYSTEM
A processing-in-memory (PIM) system includes a PIM device and a controller. The PIM device includes a data storage region and an arithmetic circuit for performing an arithmetic operation for data outputted from the data storage region. The controller is configured to control the PIM device. The PIM device is configured to transmit arithmetic quantity data of the arithmetic circuit to the controller in response to a request of the controller.
PROCESSING-IN-MEMORY (PIM) SYSTEM AND OPERATING METHODS OF THE PIM SYSTEM
A processing-in-memory (PIM) system includes a PIM device and a controller. The PIM device includes a data storage region and an arithmetic circuit for performing an arithmetic operation for data outputted from the data storage region. The controller is configured to control the PIM device. The PIM device is configured to transmit arithmetic quantity data of the arithmetic circuit to the controller in response to a request of the controller.
Decoupled Execution Of Workload For Crossbar Arrays
A computing system architecture is presented for decoupling execution of workload by crossbar arrays and similar memory modules. The computing system includes: a data bus; a core controller connected to the data bus; and a plurality of local tiles connected to the data bus. Each local tile in the plurality of local tiles includes a local controller and at least one memory module, where the memory module performs computation using the data stored in memory without reading the data out of the memory.
Decoupled Execution Of Workload For Crossbar Arrays
A computing system architecture is presented for decoupling execution of workload by crossbar arrays and similar memory modules. The computing system includes: a data bus; a core controller connected to the data bus; and a plurality of local tiles connected to the data bus. Each local tile in the plurality of local tiles includes a local controller and at least one memory module, where the memory module performs computation using the data stored in memory without reading the data out of the memory.
SMALL MULTIPLIER AFTER INITIAL APPROXIMATION FOR OPERATIONS WITH INCREASING PRECISION
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
SMALL MULTIPLIER AFTER INITIAL APPROXIMATION FOR OPERATIONS WITH INCREASING PRECISION
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
SYSTOLIC PARALLEL GALOIS HASH COMPUTING DEVICE
A computing device (e.g., an FPGA or integrated circuit) processes an incoming packet comprising data to compute a Galois hash. The computing device includes a plurality of circuits, each circuit providing a respective result used to determine the Galois hash, and each circuit including: a first multiplier configured to receive a portion of the data; a first exclusive-OR gate configured to receive an output of the first multiplier as a first input, and to provide the respective result; and a second multiplier configured to receive an output of the first exclusive-OR gate, wherein the first exclusive-OR gate is further configured to receive an output of the second multiplier as a second input. In one embodiment, the computing device further comprises a second exclusive-OR gate configured to output the Galois hash, wherein each respective result is provided as an input to the second exclusive-OR gate.
SYSTOLIC PARALLEL GALOIS HASH COMPUTING DEVICE
A computing device (e.g., an FPGA or integrated circuit) processes an incoming packet comprising data to compute a Galois hash. The computing device includes a plurality of circuits, each circuit providing a respective result used to determine the Galois hash, and each circuit including: a first multiplier configured to receive a portion of the data; a first exclusive-OR gate configured to receive an output of the first multiplier as a first input, and to provide the respective result; and a second multiplier configured to receive an output of the first exclusive-OR gate, wherein the first exclusive-OR gate is further configured to receive an output of the second multiplier as a second input. In one embodiment, the computing device further comprises a second exclusive-OR gate configured to output the Galois hash, wherein each respective result is provided as an input to the second exclusive-OR gate.
TWO-DIMENSIONAL ARRAY-BASED NEUROMORPHIC PROCESSOR AND IMPLEMENTING METHOD
A 2D array-based neuromorphic processor includes: axon circuits each being configured to receive a first input corresponding to one bit from among bits indicating n-bit activation; first direction lines extending in a first direction from the axon circuits; second direction lines intersecting the first direction lines; synapse circuits disposed at intersections of the first direction lines and the second direction lines, and each being configured to store a second input corresponding to one bit from among bits indicating an m-bit weight and to output operation values of the first input and the second input; and neuron circuits connected to the first or second direction lines, each of the neuron circuits being configured to receive an operation value output from at least one of the synapse circuits, based on time information assigned individually to the synapse circuits, and to perform an arithmetic operation by using the operation values.
TWO-DIMENSIONAL ARRAY-BASED NEUROMORPHIC PROCESSOR AND IMPLEMENTING METHOD
A 2D array-based neuromorphic processor includes: axon circuits each being configured to receive a first input corresponding to one bit from among bits indicating n-bit activation; first direction lines extending in a first direction from the axon circuits; second direction lines intersecting the first direction lines; synapse circuits disposed at intersections of the first direction lines and the second direction lines, and each being configured to store a second input corresponding to one bit from among bits indicating an m-bit weight and to output operation values of the first input and the second input; and neuron circuits connected to the first or second direction lines, each of the neuron circuits being configured to receive an operation value output from at least one of the synapse circuits, based on time information assigned individually to the synapse circuits, and to perform an arithmetic operation by using the operation values.