G06F7/768

FFT ENGINE HAVING COMBINED BIT-REVERSAL AND MEMORY TRANSPOSE OPERATIONS
20210279298 · 2021-09-09 ·

A data processing device includes: 1) Fast Fourier Transform (FFT) logic configured to generate FFT output samples for each of a plurality of digital input signals; 3) a first memory device with a plurality of banks; 4) a second memory device; 5) a bit-reversed address generator and first set of circular shift components configured to shift between the plurality of banks when writing the generated FFT output samples in bit-reversed address order to the first memory device; and 6) a second set of circular shift components configured to shift between the plurality of banks when reading FFT output samples in linear address order from the first memory device for storage in the second memory device, wherein the first and second set of circular shift components together are configured to read FFT output samples in transpose order using combined bit-reversal and memory transpose operations.

TRANSPOSING IN A MATRIX-VECTOR PROCESSOR

A circuit for transposing a matrix comprising reversal circuitry configured, for each of one or more diagonals of the matrix, to receive elements of the matrix in a first vector and generate a second vector that includes the elements of the matrix in an order that is a reverse of an order of the elements of the matrix in the first vector, and rotation circuitry configured, for each of the one or more diagonals of the matrix, to determine a number of positions by which to rotate the elements of the matrix in the second vector, receive the second vector of elements of the matrix, and generate a third vector that includes the elements of the matrix in the second vector in an order that is a rotation of the elements of the matrix in the second vector by the determined number of positions.

SECURE AGGREGATE ORDER SYSTEM, SECURE COMPUTATION APPARATUS, SECURE AGGREGATE ORDER METHOD, AND PROGRAM

An aggregate order is efficiently obtained while keeping confidentiality. An inverse permutating part (12) generates a share of a vector representing an inversely permutated cross tabulation by applying inverse permutation to a cross tabulation of a table, the inverse permutation being a permutation which moves elements so that, when the table is grouped based on a key attribute, last elements of each group are sequentially arranged from beginning. A partial summing part (13) computes a prefix sum from the inversely permutated cross tabulation. The order computing part (14) generates a share of a vector representing ascending order within a group from a result of the prefix sum.

DATA REFORMAT OPERATION

Devices, methods, and systems are provided. In one example, a device is described to include circuitry that collects data received from a data source, references a descriptor that describes a data reformat operation to perform on the data received from the data source, reformats the data received from the data source according to the data reformat operation, and provides the reformatted data to the data target via the second device interface.

Transposing in a matrix-vector processor

A circuit for transposing a matrix comprising reversal circuitry configured, for each of one or more diagonals of the matrix, to receive elements of the matrix in a first vector and generate a second vector that includes the elements of the matrix in an order that is a reverse of an order of the elements of the matrix in the first vector, and rotation circuitry configured, for each of the one or more diagonals of the matrix, to determine a number of positions by which to rotate the elements of the matrix in the second vector, receive the second vector of elements of the matrix, and generate a third vector that includes the elements of the matrix in the second vector in an order that is a rotation of the elements of the matrix in the second vector by the determined number of positions.

Audio signal circuit with in-place bit-reversal
10908880 · 2021-02-02 · ·

An integrated circuit for processing audio signals from a microphone assembly, combinations thereof and methods therefor, including a multi-issue processor configured to execute multiple instructions concurrently and connectable to a memory with a plurality of locations each represented by a corresponding index. Bit-reversal is performed on a sequence of audio data bits stored in memory by concurrently performing a load or store operation related to a first index and determining whether to perform a load operation for a second index.

PROCESSOR MEMORY OPTIMIZATION METHOD AND APPARATUS FOR DEEP LEARNING TRAINING TASKS

The present application discloses a processor video memory optimization method and apparatus for deep learning training tasks, and relates to the technical field of artificial intelligence. In the method, by determining an optimal path for transferring a computing result, the computing result of a first computing unit is transferred to a second computing unit by using the optimal path. Thus, occupying the video memory is avoided, and meanwhile, a problem of low utilization rate of the computing unit of a GPU caused by video memory swaps is avoided, so that training speed of most tasks is hardly reduced.

Tiled Switch Matrix Data Permutation Circuit
20200348911 · 2020-11-05 ·

Embodiments of the present disclosure pertain to switch matrix circuit including a data permutation circuit. In one embodiment, the switch matrix comprises a plurality of adjacent switching blocks configured along a first axis, wherein the plurality of adjacent switching blocks each receive data and switch control settings along a second axis. The switch matrix includes a permutation circuit comprising, in each switching block, a plurality of switching stages spanning a plurality of adjacent switching blocks and at least one switching stage that does not span to adjacent switching blocks. The permutation circuit receives data in a first pattern and outputs the data in a second pattern. The data permutation performed by the switching stages is based on the particular switch control settings received in the adjacent switching blocks along the second axis.

Tiled switch matrix data permutation circuit
10754621 · 2020-08-25 · ·

Embodiments of the present disclosure pertain to switch matrix circuit including a data permutation circuit. In one embodiment, the switch matrix comprises a plurality of adjacent switching blocks configured along a first axis, wherein the plurality of adjacent switching blocks each receive data and switch control settings along a second axis. The switch matrix includes a permutation circuit comprising, in each switching block, a plurality of switching stages spanning a plurality of adjacent switching blocks and at least one switching stage that does not span to adjacent switching blocks. The permutation circuit receives data in a first pattern and outputs the data in a second pattern. The data permutation performed by the switching stages is based on the particular switch control settings received in the adjacent switching blocks along the second axis.

MATRIX NORMAL/TRANSPOSE READ AND A RECONFIGURABLE DATA PROCESSOR INCLUDING SAME

A configurable circuit configurable according to the data width of elements of a matrix is described that includes a memory array, logic to write a matrix to the memory array having elements with a data width which can be specified using configuration data, logic for a transpose read of the matrix as-written and logic for normal read of the matrix as-written. The memory array includes first and second read ports operable in parallel. Transpose read logic and normal read logic can be coupled to the first and second read ports, respectively, allowing transpose and normal read of a matrix simultaneously.