G06F17/144

EMBEDDED SYSTEM, COMMUNICATION UNIT AND METHODS FOR IMPLEMENTING A FAST FOURIER TRANSFORM
20180253399 · 2018-09-06 ·

An embedded system is described. The embedded system includes a processing circuit comprising at least one processor configured to support an implementation of a non-power-of-2 fast Fourier transform of length N using a multiplication of at least two smaller FFTs of a respective first length N1 and second length N2, where N1 and N2 are whole numbers; and a memory, operably coupled to the processing circuit and comprising at least input data. The processing circuit is configured to: receive an input data complex number sequence; adapt the input data complex number sequence by inserting at least one zero into every X.sup.th data point that results in an excess number of data points above N, where X=N1, such that the inserted zeroes enables a use of a multiple-of-Q FFT; perform a first decomposed FFT of a respective first length N1 on the adapted input data complex number sequence and produce a first output complex number sequence; restore a number of data points of the first output complex number sequence to N after performing the first decomposed FFT; and perform a second decomposed FFT of a respective second length N2 on the first output complex number sequence that produces a second output complex number sequence.

EMBEDDED SYSTEM, METHOD AND COMMUNICATION UNIT FOR IMPLEMENTING A FAST FOURIER TRANSFORM USING CUSTOMIZED INSTRUCTIONS
20180253400 · 2018-09-06 ·

An embedded system is described. The embedded system includes a processing circuit comprising a processing circuit comprising Q processing units that can be operated in parallel. The processing circuit is configured to support an implementation of a non-power-of-2 fast Fourier transform of length N using a multiplication of at least two smaller FFTs of a respective first length N1 and second length N2, where N1 and N2 are whole numbers. A memory is operably coupled to the processing circuit and includes at least input data. The processing circuit is configured to: employ a customized instruction configured to perform an FFT operation of length less than Q using a first of the at least two smaller FFTs.

MULTI-DIMENSIONAL FFT COMPUTATION PIPELINED HARDWARE ARCHITECTURE USING RADIX-3 AND RADIX-2? BUTTERFLIES

A system includes Radix-2.sup.2 butterfly stages, each including first and second Radix-2.sup.2 butterfly circuits, in which the first Radix-2.sup.2 butterfly circuit of a first Radix-2.sup.2 butterfly stage includes a data input coupled to a system data input, and one of the first Radix-2.sup.2 butterfly circuit and the second Radix-2.sup.2 butterfly circuit of a last Radix-2.sup.2 butterfly stage includes a data output coupled to a system data output. The system further includes a Radix-3 butterfly circuit including a data input coupled to the system data input and a data output selectively couplable to a data input of one of the first or second Radix-2.sup.2 butterfly circuits of a second or later Radix-2.sup.2 butterfly stage based on a particular point transform to be performed by the system. A set of memories are used by either the first Radix-2.sup.2 butterfly stage or the Radix-3 butterfly circuit, depending on the particular point transform.

WINOGRAD ALGORITHM ON A MATRIX PROCESSING ARCHITECTURE
20180189237 · 2018-07-05 · ·

In one embodiment, a matrix operation may be performed, wherein the matrix operation comprises a matrix multiplication operation on a plurality of matrix operands. Matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the plurality of matrix operands. The plurality of matrix operands may be extracted from the matrix data, wherein the plurality of matrix operands comprises a first matrix operand and a second matrix operand. A first transform may be performed on the first matrix operand to obtain a transformed matrix operand, wherein performing matrix multiplication using the transformed matrix operand is faster than performing matrix multiplication using the first matrix operand. Matrix multiplication may be performed on the transformed matrix operand to obtain a partial result. A second transform may be performed on the partial result to obtain a result of the matrix multiplication operation.

CLASSICAL AND QUANTUM COMPUTATIONAL METHOD AND APPARATUS FOR PERFORMING PRIME FACTORIZATION OF AN INTEGER, CLASSICAL AND QUANTUM COMPUTATIONAL METHOD AND APPARATUS FOR INVERTING A LOGIC GATE CIRCUIT
20240378480 · 2024-11-14 ·

A quantum computational method of performing prime factorization of an integer includes determining a logic gate circuit including logic gates, the circuit configured to compute a multiplication function outputting the integer. The method includes determining gate-encoding Hamiltonians (HG) for each logic gate, each HG encodes an input-output relation of a logic gate and is a sum of summand Hamiltonians; providing a quantum system comprising constituents, wherein each summand Hamiltonian of each HG is associated with a constituent of the system; determining a first set of short-range quantum interactions of the constituents based on the logic gates; determining a second set of short-range quantum interactions of the constituents based on the integer; implementing the first and the second set of short-range quantum interactions; measuring at least a portion of the quantum system to obtain a read-out and determining a prime factor of the integer based on the read-out.

Low overhead implementation of Winograd for CNN with 3x3, 1x3 and 3x1 filters on weight station dot-product based CNN accelerators

A system and a method are disclosed for forming an output feature map (OFM). Activation values in an input feature map (IFM) are selected and transformed on-the-fly into the Winograd domain. Elements in a Winograd filter is selected that respectively correspond to the transformed activation values. A transformed activation value is multiplied by a corresponding element of the Winograd filter to form a corresponding product value in the Winograd domain. Activation values are repeatedly selected, transformed and multiplied by a corresponding element in the Winograd filter to form corresponding product values in the Winograd domain until all activation values in the IFM have been transformed and multiplied by the corresponding element. The product values are summed in the Winograd domain to form elements of a feature map in the Winograd domain. The elements of the feature map in the Winograd domain are inverse-Winograd transformed on-the-fly to form the OFM.

RECONFIGURABLE COMPUTE CIRCUITRY TO PERFORM FULLY HOMOMORPHIC ENCRYPTION (FHE) TO MAP UNCONSTRAINED POWERS-OF-2 FHE POLYNOMIALS

A reconfigurable compute circuitry to perform Fully Homomorphic Encryption (FHE) enables a full utilization of compute resources and data movement resources by mapping multiple N*1024 polynomials on to a (M*N)*1024 polynomial. To counteract the shuffling of the coefficients during Number-Theoretic-Transforms (NTT) and inverse-NTT operations, compute elements in the compute circuitry operate in a bypass mode that is enabled by a data movement instruction, to convert from the shuffled form to contiguous form without modifying the values of the coefficients.

ENCRYPTED COMPUTATION COMPRISING A BLIND ROTATION
20250036714 · 2025-01-30 ·

Some embodiments are directed to a cryptographic encrypted computation method (400). The method involves performing a blind rotation of a ciphertext according to a test polynomial. The blind rotation results in an encrypted polynomial product of the test polynomial and a bootstrapping monomial represents the plaintext value as an exponent, modulo a modulus (q) and modulo a quotient polynomial (p(X)). The quotient polynomial p(X) divides an NTT polynomial X.sup.M1 that allows a number-theoretic transform modulo the modulus q, e.g., q is a power of two and p(X)=X.sup.N+X.sup.N/2+1. The blind rotation is made more efficient by using the NTT, while the test polynomial is defined in such a way that the polynomial product is programmed to have desired output values for respective plaintext values as a fixed coefficient.

METHOD AND APPARATUS WITH HOMOMORPHIC ENCRYPTION OPERATION

A method with a number-theoretic transform (NTT) operation includes allocating an element of a matrix to a data lane such that elements in a first column of the matrix corresponding to a polynomial are allocated to the data lane of a first lane group among lane groups, wherein the matrix is a square matrix, and a number of elements comprised in the matrix is N, performing a first NTT operation on a data lane of a fourth root of the N for each of the lane groups, allocating a result of the first NTT operation to the data lane such that the matrix is transposed, based on adjustment of a reading order of a buffer that stores the result of the first NTT operation, and performing a second NTT operation on the data lane of the fourth root of the N for each of the lane groups.

METHOD AND APPARATUS WITH POLYNOMIAL MULTIPLICATION OPERATION

A processor-implemented method with a polynomial multiplication operation includes obtaining an auxiliary modulus corresponding to moduli according to a Chinese remainder theorem (CRT), performing a number theoretic transform (NTT) operation with respect to the auxiliary modulus on a result of a modulo operation of the moduli for each of a first polynomial and a second polynomial, performing an element-wise multiplication operation between an NTT operation result corresponding to the first polynomial and an NTT operation result corresponding to the second polynomial, and transforming a result of the element-wise multiplication operation into a polynomial corresponding to each of the moduli.