Patent classifications
G06F2207/3892
PIPELINED CASCADED DIGITAL SIGNAL PROCESSING STRUCTURES AND METHODS
Circuitry operating under a floating-point mode or a fixed-point mode includes a first circuit accepting a first data input and generating a first data output. The first circuit includes a first arithmetic element accepting the first data input, a plurality of pipeline registers disposed in connection with the first arithmetic element, and a cascade register that outputs the first data output. The circuitry further includes a second circuit accepting a second data input and generating a second data output. The second circuit is cascaded to the first circuit such that the first data output is connected to the second data input via the cascade register. The cascade register is selectively bypassed when the first circuit is operated under the fixed-point mode.
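The selective bypass described in this abstract can be illustrated with a toy cycle-level model. This is a minimal sketch under stated assumptions, not the patented circuit: the `Block` class, the `run_chain` helper, the pipeline depth of 2, the placeholder arithmetic operations, and the choice to apply the same mode to both blocks are all hypothetical.

```python
from collections import deque


class Block:
    """Toy model of one block: arithmetic element, pipeline registers, cascade register."""

    def __init__(self, op, pipeline_depth, fixed_point):
        self.op = op                                 # stand-in for the arithmetic element
        self.pipe = deque([None] * pipeline_depth)   # pipeline registers (None = bubble)
        self.cascade_reg = None                      # cascade register at the block output
        self.fixed_point = fixed_point               # mode select (assumed per-block flag)

    def step(self, data_in):
        """Advance one clock cycle and return the value visible on the data output."""
        result = None if data_in is None else self.op(data_in)
        self.pipe.append(result)
        piped = self.pipe.popleft()
        if self.fixed_point:
            # Fixed-point mode: the cascade register is bypassed.
            return piped
        # Floating-point mode: the output passes through the cascade register,
        # which adds one cycle of latency.
        out, self.cascade_reg = self.cascade_reg, piped
        return out


def run_chain(fixed_point, pipeline_depth=2, cycles=10):
    """Feed one value through two cascaded blocks; return (arrival cycle, value)."""
    first = Block(lambda x: x * 2, pipeline_depth, fixed_point)   # hypothetical operation
    second = Block(lambda x: x + 1, pipeline_depth, fixed_point)  # hypothetical operation
    for cycle in range(cycles):
        x = 5 if cycle == 0 else None
        # The first block's data output drives the second block's data input.
        out = second.step(first.step(x))
        if out is not None:
            return cycle, out
    return None
```

In this model the bypass removes one cycle of latency per block, so the two-block chain produces its result two cycles earlier in fixed-point mode than in floating-point mode.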
HARDWARE ACCELERATOR WITH MATRIX BLOCK STREAMING
A hardware accelerator includes tiles arranged in a systolic array. At each of the tiles, the systolic array receives a first input block that includes first input matrix elements of a first input matrix. In each of a plurality of multiplication iterations, at each of the tiles, the systolic array receives a respective second input block. The systolic array computes tile products of the first input matrix elements and second input matrix elements included in the second input blocks. The systolic array adds the tile products to column-wise partial sums and transmits the column-wise partial sums to subsequent tiles along accumulator rings included in array columns of the systolic array. In a subset of the multiplication iterations, the systolic array outputs product block rows of a product matrix. The product block rows each include product matrix blocks computed as rows of the column-wise partial sums.
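The idea of column-wise partial sums circulating on accumulator rings can be sketched in software. The schedule below is an assumption made for illustration and is not taken from the patent: tile (k, j) holds the stationary block `A[k][j]`, the streamed matrix `B` is assumed to be a K x K grid of blocks, each ring carries K partial sums that advance by one tile per iteration, and all product block rows are read out after the final iteration. The names `ring_block_matmul`, `owner`, and `acc` are hypothetical.

```python
def matmul(a, b):
    """Plain dense product of two small blocks stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matadd(a, b):
    """Element-wise sum of two equally sized blocks."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def ring_block_matmul(B, A):
    """Compute the block product C = B * A with rotating column-wise partial sums.

    B: K x K grid of streamed blocks; A: K x N grid of stationary blocks.
    Returns a K x N grid of product blocks.
    """
    K, N = len(A), len(A[0])
    rows, cols = len(B[0][0]), len(A[0][0][0])
    # acc[k][j] is the partial sum currently sitting at tile (k, j).
    acc = [[[[0] * cols for _ in range(rows)] for _ in range(N)] for _ in range(K)]
    # owner[k] is the product block row whose partial sums are at ring position k.
    owner = list(range(K))
    for _ in range(K):  # K multiplication iterations
        for k in range(K):
            i = owner[k]
            for j in range(N):
                # Tile product added to the column-wise partial sum.
                acc[k][j] = matadd(acc[k][j], matmul(B[i][k], A[k][j]))
        # Pass partial sums to the next tile along each column ring.
        acc = [acc[-1]] + acc[:-1]
        owner = [owner[-1]] + owner[:-1]
    # After K iterations each partial sum has visited every tile in its column.
    C = [None] * K
    for k in range(K):
        C[owner[k]] = acc[k]
    return C
```

For example, with 1 x 1 blocks, `ring_block_matmul([[[[1]], [[2]]], [[[3]], [[4]]]], [[[[5]], [[6]]], [[[7]], [[8]]]])` yields the blocks of the ordinary product of [[1, 2], [3, 4]] and [[5, 6], [7, 8]]. Rotating the accumulators rather than the stationary blocks means each tile only ever multiplies against the block it already holds.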