G06F7/57

Arithmetic processing apparatus, control method of arithmetic processing apparatus, and non-transitory computer-readable storage medium for storing program
11593071 · 2023-02-28 · ·

An arithmetic processing apparatus includes: a plurality of nodes (N nodes) capable of communicating with each other, each of the plurality of nodes including a memory and a processor, the memory being configured to store a value and an operation result, the processor being configured to execute first processing when N is a natural number of 2 or more, n is a natural number of 1 or more, and N≠2.sup.n, wherein the first processing is configured to divide by 2 a value held by a first node, the first node being any of the plurality of nodes and a last node in an order of counting, obtain one or more node pairs by pairing remaining nodes among the plurality of nodes exception for the first node, and calculate repeatedly an average value of values held by each node pair of the one or more node pairs.

Arithmetic device
11593070 · 2023-02-28 · ·

According to one embodiment, an arithmetic device includes an arithmetic circuit. The arithmetic circuit includes a memory part including a plurality of memory regions, and an arithmetic part. One of the memory regions includes a capacitance including a first terminal, and a first electrical circuit electrically connected to the first terminal and configured to output a voltage signal corresponding to a potential of the first terminal.

Arithmetic device
11593070 · 2023-02-28 · ·

According to one embodiment, an arithmetic device includes an arithmetic circuit. The arithmetic circuit includes a memory part including a plurality of memory regions, and an arithmetic part. One of the memory regions includes a capacitance including a first terminal, and a first electrical circuit electrically connected to the first terminal and configured to output a voltage signal corresponding to a potential of the first terminal.

APPLICATION PROGRAMMING INTERFACE TO DECOMPRESS DATA

Apparatuses, systems, and techniques to perform an operation to indicate one or more non-zero values within one or more matrices of data; to perform an API to compress one or more matrices of data; to perform a matrix multiply accumulate (MMA) operation on two or more matrices of data, wherein at least one of the two or more matrices contain compressed data; and/or to perform an API to decompress one or more matrices of data. In at least one embodiment, one or more circuits are configured to receive and compile one or more instructions to perform computational operations for a sparse matrix multiplication.

Parallel processor optimized for machine learning

A parallel processor system for machine learning includes an arithmetic unit (ALU) array including several ALUs and a controller to provide instructions for the ALUs. The system further includes a direct-access memory (DMA) block containing multiple DMA engines to access an external memory to retrieve data. An input-stream buffer decouples the DMA block from the ALU array and provides aligning and reordering of the retrieved data. The DMA engines operate in parallel and include rasterization logic capable of performing a three-dimensional (3-D) rasterization.

Parallel processor optimized for machine learning

A parallel processor system for machine learning includes an arithmetic unit (ALU) array including several ALUs and a controller to provide instructions for the ALUs. The system further includes a direct-access memory (DMA) block containing multiple DMA engines to access an external memory to retrieve data. An input-stream buffer decouples the DMA block from the ALU array and provides aligning and reordering of the retrieved data. The DMA engines operate in parallel and include rasterization logic capable of performing a three-dimensional (3-D) rasterization.

Method, system and device for multi-cycle division operation
11500612 · 2022-11-15 · ·

The present disclosure relates generally to arithmetic units of processors, and may relate more particularly to multi-cycle division operations. Multiple-cycles of a radix-m division operation may be performed to generate one or more signal states representative of a result value based at least in part on a dividend value and a divisor value.

Method, system and device for multi-cycle division operation
11500612 · 2022-11-15 · ·

The present disclosure relates generally to arithmetic units of processors, and may relate more particularly to multi-cycle division operations. Multiple-cycles of a radix-m division operation may be performed to generate one or more signal states representative of a result value based at least in part on a dividend value and a divisor value.

Method and Apparatus for Configuring a Reduced Instruction Set Computer Processor Architecture to Execute a Fully Homomorphic Encryption Algorithm

Systems and methods for configuring a reduced instruction set computer processor architecture to execute fully homomorphic encryption (FHE) logic gates as a streaming topology. The method includes parsing sequential FHE logic gate code, transforming the FHE logic gate code into a set of code modules that each have in input and an output that is a function of the input and which do not pass control to other functions, creating a node wrapper around each code module, configuring at least one of the primary processing cores to implement the logic element equivalents of each element in a manner which operates in a streaming mode wherein data streams out of corresponding arithmetic logic units into the main memory and other ones of the plurality arithmetic logic units.

ARITHMETIC PROCESSING DEVICE AND ARITHMETIC METHOD
20220357925 · 2022-11-10 · ·

An arithmetic processing device includes: a first multiplier circuit configured to calculate a product DY of an approximate value D obtained by approximating a reciprocal 1/Y of a divisor Y; a dividend operation circuit configured to compare a dividend X and the divisor Y, and generate an operation value twice the dividend X or the operation value equal to the dividend X based on a comparison result; a second multiplier circuit configured to calculate a product of the approximate value D and the operation value as an initial value R(0) of a partial remainder R(n); a third multiplier circuit configured to calculate a product DY*q(n) of the product DY and a partial quotient q(n) that is a predetermined number of upper bits of the partial remainder R(n); and a first addition circuit configured to calculate a new partial remainder R(n) by subtracting the product DY*q(n) from the partial remainder R(n).