G06F7/768

FFT ENGINE HAVING COMBINED BIT-REVERSAL AND MEMORY TRANSPOSE OPERATIONS
20200192968 · 2020-06-18 ·

A data processing device includes: 1) Fast Fourier Transform (FFT) logic configured to generate FFT output samples for each of a plurality of digital input signals; 3) a first memory device with a plurality of banks; 4) a second memory device; 5) a bit-reversed address generator and first set of circular shift components configured to shift between the plurality of banks when writing the generated FFT output samples in bit-reversed address order to the first memory device; and 6) a second set of circular shift components configured to shift between the plurality of banks when reading FFT output samples in linear address order from the first memory device for storage in the second memory device, wherein the first and second set of circular shift components together are configured to read FFT output samples in transpose order using combined bit-reversal and memory transpose operations.

AUDIO SIGNAL CIRCUIT WITH IN-PLACE BIT-REVERSAL
20200125334 · 2020-04-23 · ·

An integrated circuit for processing audio signals from a microphone assembly, combinations thereof and methods therefor, including a multi-issue processor configured to execute multiple instructions concurrently and connectable to a memory with a plurality of locations each represented by a corresponding index. Bit-reversal is performed on a sequence of audio data bits stored in memory by concurrently performing a load or store operation related to a first index and determining whether to perform a load operation for a second index.

Tiled Switch Matrix Data Permutation Circuit
20200073638 · 2020-03-05 ·

Embodiments of the present disclosure pertain to switch matrix circuit including a data permutation circuit. In one embodiment, the switch matrix comprises a plurality of adjacent switching blocks configured along a first axis, wherein the plurality of adjacent switching blocks each receive data and switch control settings along a second axis. The switch matrix includes a permutation circuit comprising, in each switching block, a plurality of switching stages spanning a plurality of adjacent switching blocks and at least one switching stage that does not span to adjacent switching blocks. The permutation circuit receives data in a first pattern and outputs the data in a second pattern. The data permutation performed by the switching stages is based on the particular switch control settings received in the adjacent switching blocks along the second axis.

TRANSPOSING IN A MATRIX-VECTOR PROCESSOR

A circuit for transposing a matrix comprising reversal circuitry configured, for each of one or more diagonals of the matrix, to receive elements of the matrix in a first vector and generate a second vector that includes the elements of the matrix in an order that is a reverse of an order of the elements of the matrix in the first vector, and rotation circuitry configured, for each of the one or more diagonals of the matrix, to determine a number of positions by which to rotate the elements of the matrix in the second vector, receive the second vector of elements of the matrix, and generate a third vector that includes the elements of the matrix in the second vector in an order that is a rotation of the elements of the matrix in the second vector by the determined number of positions.

Secure joining system, method, secure computing apparatus and program

A secure joining system is a secure joining system including a plurality of secure computing apparatuses. The plurality of secure computing apparatuses include a first vector joining unit, a first permutation calculation unit, a first vector generation unit, a second vector joining unit, a first permutation application unit, a second vector generation unit, a first inverse permutation application unit, a first vector extraction unit, a second permutation application unit, a third vector generation unit, a second inverse permutation application unit, a second vector extraction unit, a modified second table generation unit, a third permutation application unit, a fourth vector generation unit, a shifting unit, a third inverse permutation application unit, a bit inversion unit, a third vector extraction unit, a modified first table generation unit, a first table joining unit, and a first table formatting unit.

COMPUTE ENGINE WITH TRANSPOSE CIRCUITRY
20240103813 · 2024-03-28 ·

An integrated circuit that combines transpose and compute operations may include a transpose circuit coupled to a set of compute channels. Each compute channel may include multiple arithmetic logic unit (ALU) circuits coupled in series. The transpose circuit is operable to receive an input tensor, transpose the input tensor, and output a transposed tensor to the set of compute channels. The set of compute channels is operable to generate outputs in parallel, with each of the outputs being generated from a corresponding vector of the transposed tensor.

COMPUTATION METHOD AND COMPUTATION APPARATUS WITH INPUT SWAPPING

A computation apparatus and a computation method with input swapping are provided. The computation apparatus includes a non-zero detection circuit, a swapper policy circuit, a swapper matrix circuit, and an adder tree. The non-zero detection circuit is configured to receive input vectors, inspect non-zero operands in the input vectors and generate a non-zero indicative signal indicating the non-zero operands. The swapper policy circuit is configured to receive and interpret the non-zero indicative signal, and generate multiplexer (MUX) selection signals for swapping the non-zero operands according to a set of swapping policies. The swapper matrix circuit is configured to receive the input vectors and the MUX selection signal, and perform swapping on operands in the input vectors according to the MUX selection signal. The adder tree is configured to receive the input vectors with the swapped operands and perform additions on the input vectors to output a computation result.

HARDWARE-BASED GALOIS MULTIPLICATION

A processor includes an instruction fetch unit that fetches instructions to be executed, an architected register file including a plurality of registers for storing source and destination operands, and an execution unit for executing a Galois multiply instruction. The execution unit includes a carryless multiplier configured to multiply operands of the Galois multiply instruction to generate a product. The execution unit further includes a modular reduction circuit configured to receive the product and determine, based on a logical combination of the product and a fixed polynomial, a reduced product having a fewer number of bits than the product. The execution unit is configured to store the reduced product to the architected register file as a result of the Galois multiply instruction.

CONSTANT MODULO VIA RECIRCULANT REDUCTION

Described herein is a generalized optimal reduction scheme for reducing an array modulo a constant. The constant modulo operation calculates a result for array of bits x.sub.i, width n modulo an odd positive integer constant d, (e.g., x[n:0] mod d). Circuitry to perform such operation can be configured to compress the array of bits x.sub.i, width n into an array of bits y.sub.i width m. The techniques described herein enable the design of optimal circuitry via iterative exploration of all potential reduction strategies that are available given the input constraints.

Transposing in a matrix-vector processor

A circuit for transposing a matrix comprising reversal circuitry configured, for each of one or more diagonals of the matrix, to receive elements of the matrix in a first vector and generate a second vector that includes the elements of the matrix in an order that is a reverse of an order of the elements of the matrix in the first vector, and rotation circuitry configured, for each of the one or more diagonals of the matrix, to determine a number of positions by which to rotate the elements of the matrix in the second vector, receive the second vector of elements of the matrix, and generate a third vector that includes the elements of the matrix in the second vector in an order that is a rotation of the elements of the matrix in the second vector by the determined number of positions.