METHOD FOR ACCELERATING FAST FOURIER TRANSFORM BASED ON FIELD PROGRAMMABLE GATE ARRAY

20230237121 · 2023-07-27

    Inventors

    Cpc classification

    International classification

    Abstract

    A method for accelerating fast Fourier transform (FFT) based on field programmable gate array is provided. A sequence requiring N-point FFT is decomposed equally into 4 subsequences. The 4 subsequences are processed through 4 parallel FFT intellectual property (IP) cores. Finally, an arithmetic operation is performed on the processed data and twiddle factor data pre-stored in a memory to obtain a result of the N-point FFT of an original sequence. An FFT decomposition module, a twiddle factor storage module, and an operation processing module are provided. Through the processing method, a time delay consumed by an N-point FFT operation can be reduced, and excellent application value can be achieved in a high-speed digital signal processing system.

    Claims

    1. A method for accelerating fast Fourier transform based on field-programmable gate array (FPGA), wherein the FPGA is internally provided with a fast Fourier transform (FFT) decomposition module, a twiddle factor storage module, and an arithmetic operation processing module; and the method comprises: step S1: decomposing, by the FFT decomposition module, an input sequence having a length of N into 4 subsequences having a length of N/4, and synchronously performing N/4-point FFT; step S2: outputting, by the twiddle factor storage module, pre-stored twiddle factor data by determining an external input signal for performing arithmetic operation; and step S3: processing, by the arithmetic operation processing module, data output by the FFT decomposition module and the twiddle factor storage module to obtain a result of N-point FFT of an original sequence; wherein the FFT decomposition module comprises a sequence decomposition module and four parallel N/4-point FFT submodules, wherein the N/4-point FFT submodules are FFT intellectual property (IP) cores provided by the FPGA, with an output mode of sequential output; the twiddle factor storage module comprises two control units respectively marked as ctr1 and ctr2, and three independent single-port block random access memories (BRAMs) respectively configured to store three different types of twiddle factor data; the three independent single-port BRAMs respectively have a depth of N/4 and are respectively marked as BRAM1, BRAM2, and BRAM3; and corresponding twiddle factor data stored in the three independent single-port BRAMs are as follows: WN 1 = e - j 2 π N k k = 0 , 1 .Math. , N 4 - 1 ; ( 1 ) WN 2 = e - j 2 π N k k = N 4 , N 4 + 1 , .Math. , N 2 - 1 ; ( 2 ) and WN 3 = e - j 2 π N / 2 k k = 0 , 1 .Math. N 4 - 1 ; ( 3 ) wherein the step S1 comprises: substep S11: upon determining that a signal fft_valid==1, decomposing the input sequence x(n) having the length of N into the 4 subsequences, x(m), x(m+2), x(m+1), and x(m+3), each having the length of N/4, wherein n=0, 1, . . . , N−1, and m=0, 1, . . . , N/4−1; substep S12: transforming the 4 subsequences x(m), x(m+2), x(m+1), and x(m+3) through four parallel N/4-point FFT IP cores respectively, to obtain 4 output sequences z.sub.1(m), z.sub.2(m), z.sub.3(m), and z.sub.4(m), wherein m=0, 1, . . . , N/4−1; z.sub.1(m) corresponds to an output of x(m), z.sub.2(m) corresponds to an output of x(m+2), z.sub.3(m) corresponds to an output of x(m+1), and z.sub.4(m) corresponds to an output of x(m+3); and substep S13: outputting synchronously a data valid signal data1_valid to the twiddle factor storage module while outputting the sequences z.sub.1(m), z.sub.2(m), z.sub.3(m), and z.sub.4(m).

    2. The method for accelerating the fast Fourier transform based on the FPGA according to claim 1, wherein the step S2 further comprises: substep S21: sequentially storing corresponding twiddle factors into three independent memory cells BRAM1, BRAM2, and BRAM3 in a binary form according to an increasing order of k values; substep S22: when the control unit ctr1 determines that the signal data1_valid==1, generating a read enable signal rd1 and a read address counter addr1 of BRAM3, sequentially acquiring the twiddle factor data of WN3, and transmitting the twiddle factor data of WN3 to the operation processing module; and substep S23: when the control unit ctr2 determines that a signal data2 valid==1, generating a read enable signal rd2 and a read address counter addr2 of BRAM1 and BRAM2, sequentially acquiring the twiddle factor data of WN1 and WN2, and transmitting the twiddle factor data of WN1 and WN2 to the operation processing module.

    3. The method for accelerating the fast Fourier transform based on the FPGA according to claim 1, wherein step S3 further comprises: substep S31: performing a delay beating on the sequences z.sub.2(m) and z.sub.4(m) to wait for the twiddle factor data of WN3; and multiplying the sequences z.sub.2(m) and z.sub.4(m) by the twiddle factor data of WN3 in sequence in a pipeline architecture to obtain sequences z.sub.21(m) and z.sub.41(m) having a length of N/4, wherein m=0, 1, . . . , N/4−1; substep S32: performing the delay beating on the sequences z.sub.1(m) and z.sub.3(m) to wait for z.sub.21(m) and z.sub.41(m); adding z.sub.1(m) and z.sub.21(m) in sequence in the pipeline architecture to obtain a sequence Z.sub.1(m) having a length of N/4, wherein m=0, 1, . . . , N/4−1; performing a subtraction operation on z.sub.1(m) and z.sub.21(m) in sequence in the pipeline architecture, by subtracting z.sub.21(m) from z.sub.1(m) to obtain a sequence Z.sub.2(m) having a length of N/4, wherein m=0, 1, . . . , N/4−1; adding z.sub.3(m) and z.sub.41(m) in sequence in the pipeline architecture to obtain a sequence Z.sub.3(m) having a length of N/4, wherein m=0, 1, . . . , N/4−1; and performing a subtraction operation on z.sub.3(m) and z.sub.41(m) in sequence in the pipeline architecture, by subtracting z.sub.41(m) from z.sub.3(m) to obtain a sequence Z.sub.4(m) having a length of N/4, wherein m=0, 1, . . . , N/4−1; substep S33: outputting synchronously a data valid signal data2_valid to the twiddle factor storage module while obtaining Z.sub.3(m) and Z.sub.4(m); substep S34: performing the delay beating on the sequences Z.sub.3(m) and Z.sub.4(m) to wait for the twiddle factor data of WN1 and the twiddle factor data of WN2; multiplying the sequence Z.sub.3(m) and the twiddle factor data of WN1 in sequence in the pipeline architecture to obtain a sequence Z.sub.31(m) having a length of N/4, wherein m=0, 1, . . . , N/4−1; and multiplying the sequence Z.sub.4(m) and the twiddle factor data of WN2 in sequence in the pipeline architecture to obtain a sequence Z.sub.41(m) having a length of N/4, wherein m=0, 1, . . . , N/4−1; substep S35: performing the delay beating on the sequences Z.sub.1(m) and Z.sub.2(m) to wait for Z.sub.31(m) and Z.sub.41(m); adding the sequences Z.sub.1(m) and Z.sub.31(m) in sequence in the pipeline architecture to obtain a sequence y.sub.1(n) having a length of N/4, wherein n=0, 1, . . . , N/4−1; adding the sequences Z.sub.2(m) and Z.sub.41(m) in sequence in the pipeline architecture to obtain a sequence y.sub.2(n) having a length of N/4, wherein n=0, 1, . . . , N/4−1; performing a subtraction operation on the sequences Z.sub.1(m) and Z.sub.31(m) in sequence in the pipeline architecture, by subtracting Z.sub.31(m) from Z.sub.1(m) to obtain a sequence y.sub.3(n) having a length of N/4, wherein n=0, 1, . . . , N/4−1; and performing a subtraction operation on the sequences Z.sub.2(m) and Z.sub.41(m) in sequence in the pipeline architecture, by subtracting Z.sub.41(m) from Z.sub.2(m) to obtain a sequence y.sub.4(n) having a length of N/4, wherein n=0, 1, . . . , N/4−1; and substep S36: outputting synchronously a data valid signal dout_valid while outputting y.sub.1(m), y.sub.2(m), y.sub.3(m), and y.sub.4(m).

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0028] FIG. 1 is an overall schematic diagram of a method for accelerating fast Fourier transform based on FPGA according to the present disclosure;

    [0029] FIG. 2 is a schematic diagram of an FFT decomposition module according to the present disclosure;

    [0030] FIG. 3 is a schematic diagram of a twiddle factor storage module according to the present disclosure; and

    [0031] FIG. 4 is a schematic diagram of an operation processing module according to the present disclosure.

    [0032] The present disclosure is further described below with reference to the following specific embodiments and the above accompanying drawings.

    DETAILED DESCRIPTION OF THE EMBODIMENTS

    [0033] Specific implementations of the present disclosure are described below in detail with reference to the accompanying drawings:

    [0034] FIG. 1 is an overall schematic diagram of a method for accelerating fast Fourier transform based on FPGA according to the present disclosure. An FFT decomposition module, a twiddle factor storage module, and an operation processing module are provided. FIG. 2 is a schematic diagram of the FFT decomposition module according to the present disclosure. The FFT decomposition module includes a sequence decomposition module and four parallel N/4-point FFT submodules. The N/4-point FFT submodules of the FFT decomposition module are FFT intellectual property (IP) cores provided by the FPGA, and an output mode of the FFT IP cores of the FFT decomposition module is sequential output. FIG. 3 is a schematic diagram of the twiddle factor storage module according to the present disclosure. The twiddle factor storage module includes two control units respectively marked as ctr1 and ctr 2, and three independent single-port block random access memories (BRAMs) respectively configured to store three different types of twiddle factor data; the three independent single-port BRAMs included in the twiddle factor storage module respectively have a depth of N/4 and are respectively marked as BRAM1, BRAM2, and BRAM3. Corresponding twiddle factor data stored in the three independent single-port BRAMs are as follows:

    [00002] WN 1 = e - j 2 π N k k = 0 , 1 .Math. , N 4 - 1 ; ( 1 ) WN 2 = e - j 2 π N k k = N 4 , N 4 + 1 , .Math. , N 2 - 1 ; ( 2 ) and WN 3 = e - j 2 π N / 2 k k = 0 , 1 .Math. N 4 - 1 ; ( 3 ) .

    [0035] The specific implementation of the present disclosure includes steps S1-S3.

    [0036] Step S1: the FFT decomposition module decomposes an input sequence having a length of N into 4 subsequences having a length of N/4, and N/4 point FFT is synchronously performed.

    [0037] Step S2: the twiddle factor storage module outputs pre-stored twiddle factor data by determining an external input signal for performing arithmetic operation; and

    [0038] Step S3: the arithmetic operation processing module processes data output by the FFT decomposition module and the twiddle factor storage module to obtain a result of N-point FFT of an original sequence.

    [0039] FIG. 2 is a schematic diagram of the FFT decomposition module according to the present disclosure. The step S1 further includes substeps S11-S13.

    [0040] Substep S11: upon determining that a signal fft_valid==1, the input sequence x(n) having the length of N is decomposed into the 4 subsequences, x(m), x(m+2), x(m+1), and x(m+3), each having the length of N/4, where n=0, 1, . . . , N−1, and m=0, 1, . . . , N/4−1.

    [0041] Substep S12: the 4 subsequences x(m), x(m+2), x(m+1), and x(m+3) are transformed through four parallel N/4-point FFT IP cores to obtain 4 output sequences z.sub.1(m), z.sub.2(m), z.sub.3(m), and z.sub.4(m), where m=0, 1, . . . , N/4−1; z.sub.1(m) corresponds to an output of x(m), z.sub.2(m) corresponds to an output of x(m+2), z.sub.3(m) corresponds to an output of x(m+1), and z.sub.4(m) corresponds to an output of x(m+3).

    [0042] Substep S13: a data valid signal data1_valid is output synchronously to the twiddle factor storage module while the sequences z.sub.1(m), z.sub.2(m), z.sub.3(m), and z.sub.4(m) are output, which is valid at a high level.

    [0043] FIG. 3 is a schematic diagram of the twiddle factor storage module according to the present disclosure. The step S2 further includes substeps S21-S23.

    [0044] Substep S21: twiddle factors shown in formula (1), formula (2), and formula (3) are sequentially stored into three independent memory cells BRAM1, BRAM2, and BRAM3 in a binary form according to an increasing order of k values.

    [0045] Substep S22: when the control unit ctr1 determines that the signal data1_valid==1, a read enable signal rd1 and a read address counter addr1 of BRAM3 are generated, twiddle factor data of WN3 are acquired sequentially and transmitted to the operation processing module.

    [0046] Substep S23: when the control unit ctr2 determines that a signal data2 valid==1, a read enable signal rd2 and a read address counter addr2 of BRAM1 and BRAM2 are generated, twiddle factor data of WN1 and WN2 are acquired sequentially and transmitted to the operation processing module.

    [0047] FIG. 4 is a schematic diagram of the operation processing module according to the present disclosure. In the step S3, a corresponding result of the N-point FFT is obtained by performing operation processing according to formula (4), formula (5), formula (6), and formula (7), where y.sub.1(m) corresponds to bits 1 to N/4 of the result of the N-point FFT in sequence, y.sub.2(m) corresponds to bits N/4+1 to N/2 of the result of the N-point FFT in sequence, y.sub.3(m) corresponds to bits N/2+1 to 3N/4 of the result of the N-point FFT in sequence, and y.sub.4(m) corresponds to bits 3N/4+1 to N of the result of the N-point FFT in sequence;

    [00003] y 1 ( m ) = z 1 ( m ) + e - j 2 π N / 2 m z 2 ( m ) + e - j 2 π N m ( z 3 ( m ) + e - j 2 π N / 2 m z 4 ( m ) ) m = 0 , 1 , .Math. , N 4 - 1 ; ( 4 ) y 2 ( m ) = z 1 ( m ) - e - j 2 π N / 2 m z 2 ( m ) + e - j 2 π N ( m + N 4 ) ( z 3 ( m ) - e - j 2 π N / 2 m z 4 ( m ) ) m = 0 , 1 , .Math. , N 4 - 1 ; ( 5 ) y 3 ( m ) = z 1 ( m ) + e - j 2 π N / 2 m z 2 ( m ) - e - j 2 π N m ( z 3 ( m ) + e - j 2 π N / 2 m z 4 ( m ) ) m = 0 , 1 , .Math. , N 4 - 1 ; ( 6 ) y 4 ( m ) = z 1 ( m ) - e - j 2 π N / 2 m z 2 ( m ) - e - j 2 π N ( m + N 4 ) ( z 3 ( m ) - e - j 2 π N / 2 m z 4 ( m ) ) m = 0 , 1 , .Math. , N 4 - 1 ; ( 7 )

    [0048] The step S3 further includes substep S31-S36.

    [0049] Substep S31: a delay beating is performed on the sequences z.sub.2(m) and z.sub.4(m) to wait for the twiddle factor data of WN3; and the sequences z.sub.2(m) and z.sub.4(m) are multiplied by the twiddle factor data of WN3 in sequence in a pipeline architecture to obtain sequences z.sub.21(m) and z.sub.41(m) having a length of N/4, where m=0, 1, . . . , N/4−1.

    [0050] Substep S32: the delay beating is performed on the sequences z.sub.1(m) and z.sub.3(m) to wait for z.sub.21(m) and z.sub.41(m); z.sub.1(m) and z.sub.21(m) are added in sequence in the pipeline architecture to obtain a sequence Z.sub.1(m) having a length of N/4, where m=0, 1, . . . , N/4−1; a subtraction operation is performed on z.sub.1(m) and z.sub.21(m) in sequence in the pipeline architecture, i.e., z.sub.21(m) is subtracted from z.sub.1(m) to obtain a sequence Z.sub.2(m) having a length of N/4, where m=0, 1, . . . , N/4−1; z.sub.3(m) and z.sub.41(m) are added in sequence in the pipeline architecture to obtain a sequence Z.sub.3(m) having a length of N/4, where m=0, 1, . . . , N/4−1; and a subtraction operation is performed on z.sub.3(m) and z.sub.41(m) in sequence in the pipeline architecture, i.e., z.sub.41(m) is subtracted from z.sub.3(m) to obtain a sequence Z.sub.4(m) having a length of N/4, where m=0, 1, . . . , N/4−1.

    [0051] Substep S33: a data valid signal data2 valid is output synchronously to the twiddle factor storage module while Z.sub.3(m) and Z.sub.4(m) are obtained, which is valid at a high level.

    [0052] Substep S34: the delay beating is performed on the sequences Z.sub.3(m) and Z.sub.4(m) to wait for the twiddle factor data of WN1 and the twiddle factor data of WN2; the sequence Z.sub.3(m) and the twiddle factor data of WN1 are multiplied in sequence in the pipeline architecture to obtain a sequence Z.sub.31(m) having a length of N/4, where m=0, 1, . . . , N/4−1; and the sequence Z.sub.4(m) and the twiddle factor data of WN2 are multiplied in sequence in the pipeline architecture to obtain a sequence Z.sub.41(m) having a length of N/4, where m=0, 1, . . . , N/4−1.

    [0053] Substep S35: the delay beating is performed on the sequences Z.sub.1(m) and Z.sub.2(m) to wait for Z.sub.31(m) and Z.sub.41(m); the sequences Z.sub.1(m) and Z.sub.31(m) are added in sequence in the pipeline architecture to obtain a sequence y.sub.1(n) having a length of N/4, where n=0, 1, . . . , N/4−1; the sequences Z.sub.2(m) and Z.sub.41(m) are added in sequence in the pipeline architecture to obtain a sequence y.sub.2(n) having a length of N/4, where n=0, 1, . . . , N/4−1; a subtraction operation is performed on the sequences Z.sub.1(m) and Z.sub.31(m) in sequence in the pipeline architecture, i.e., Z.sub.31(m) is subtracted from Z.sub.1(m) to obtain a sequence y.sub.3(n) having a length of N/4, where n=0, 1, . . . , N/4−1; and a subtraction operation is performed on the sequences Z.sub.2(m) and Z.sub.41(m) in sequence in the pipeline architecture, i.e., Z.sub.41(m) is subtracted from Z.sub.2(m) to obtain a sequence y.sub.4(n) having a length of N/4, where n=0, 1, . . . , N/4−1.

    [0054] Substep S36: a data valid signal dout_valid is output synchronously while y.sub.1(m), y.sub.2(m), y.sub.3(m), and y.sub.4(m) are output, which is valid at a high level.

    [0055] The above are the specific implementation steps explained by the inventor in conjunction with the examples, and the present disclosure is applicable to a digital baseband module and a digital image processing module of an FPGA-based communication system. It should be pointed out that those skilled in the art could improve and perfect the method without departing from the present disclosure, but it should be understood that the above examples do not impose restrictions on the protection scope of the present disclosure, and any improvement and perfection based on the present disclosure should fall within the protection scope of the present disclosure.