Patent classifications
G06F7/4806
Reconfigurable digital signal processing (DSP) vector engine
Systems and methods described herein may relate to providing dynamically configurable circuitry able to process data associated with a variety of matrix dimensions using one or more complex number operations, one or more real number operations, or both. Configurations may be applied to the configurable circuitry to program the configurable circuitry for a next operation. The configurable circuitry may process data according to a variety of operations based at least in part on operation of a repeated processing element coupled in a compute network of processing elements.
APPARATUS AND METHOD FOR COMPLEX MULTIPLY AND ACCUMULATE
An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiply-accumulate of a first complex number, a second complex number, and a third complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder to decode an instruction to generate the decoded instruction and a first source register, a second source register, and a source and destination register to provide the first complex number, the second complex number, and the third complex number, respectively.
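The two-operation split described in this abstract can be illustrated with a short sketch. The abstract does not say which partial products belong to which operation, so the grouping below (products with the real part of the first operand first, then products with its imaginary part) is an assumption, as are the function name `complex_mac` and the `(re, im)` tuple layout.

```python
def complex_mac(a, b, acc):
    """Return acc + a*b for complex numbers given as (re, im) tuples.

    Hypothetical sketch: the work is split into two operations, each of
    which contributes one term to the real part and one to the imaginary part.
    """
    a_re, a_im = a
    b_re, b_im = b
    acc_re, acc_im = acc

    # First operation (assumed grouping): terms that use the real part of a.
    acc_re += a_re * b_re
    acc_im += a_re * b_im

    # Second operation (assumed grouping): terms that use the imaginary part of a.
    acc_re -= a_im * b_im
    acc_im += a_im * b_re

    return acc_re, acc_im


result = complex_mac((1, 2), (3, 4), (5, 6))
```

Here the third operand plays the role of the source-and-destination register: it supplies the running sum and receives the result.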
APPARATUS AND METHOD FOR COMPLEX MULTIPLICATION
An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.
HYBRID NON-UNIFORM CONVOLUTION TRANSFORM ENGINE FOR DEEP LEARNING APPLICATIONS
A system performs convolution operations based on an analysis of the input size. The input includes data elements and filter weights. The system includes multiple processing elements. Each processing element includes multipliers and adders, with more of the adders than the multipliers. According to at least the analysis result which indicates whether the input size matches a predetermined size, the system is operative to select a first mode or a second mode. In the first mode, a greater number of the adders than the multipliers are enabled for each processing element to multiply transformed input and to perform an inverse transformation. In the second mode, an equal number of the adders and the multipliers are enabled for each processing element to multiply-and-accumulate the input. One or more of the multipliers are shared by the first mode and the second mode.
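As a rough illustration of the mode selection, the sketch below assumes the transform-based first mode is a Winograd-style F(2,3) convolution, which the abstract does not name. The predetermined size (4 inputs, 3 filter weights) and all function names are hypothetical. The transform path uses 4 multiplications where the direct path uses 6, at the cost of extra additions, which is consistent with a processing element that has more adders than multipliers.

```python
def direct_conv(d, g):
    """Second mode: plain multiply-and-accumulate (valid correlation)."""
    taps = len(g)
    return [sum(d[i + k] * g[k] for k in range(taps))
            for i in range(len(d) - taps + 1)]


def winograd_f23(d, g):
    """First mode (assumed): Winograd F(2,3), 4 multiplies for 2 outputs."""
    # Filter transform (could be precomputed once per filter).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by element-wise multiplication.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Inverse transform, additions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


def convolve(d, g):
    """Select a mode from the input size (hypothetical predetermined size)."""
    if len(d) == 4 and len(g) == 3:
        return winograd_f23(d, g)
    return direct_conv(d, g)


outputs = convolve([1, 2, 3, 4], [1, 2, 3])
```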
APPARATUS AND METHOD FOR PERFORMING TRANSFORMS OF PACKED COMPLEX DATA HAVING REAL AND IMAGINARY COMPONENTS
An apparatus and method for performing a transform on complex data. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; a third source register to store a third plurality of packed real and imaginary data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to select real and imaginary data elements in the first and second source registers to multiply based on an immediate of the first instruction, the multiplier circuitry to multiply first packed data elements from the first source register with second packed data elements from the second source register in accordance with the immediate to generate a plurality of real and imaginary products, adder circuitry to select real and imaginary data elements in the third source register based on the immediate, the adder circuitry to add and subtract selected real and imaginary values from the real and imaginary products to generate first real and imaginary results; scaling, rounding, and/or saturation circuitry to scale, round, and/or saturate the first real and imaginary results to generate final real and imaginary data elements; and a destination register to store the final real and imaginary data elements in specified data element positions.
APPARATUS AND METHOD FOR MULTIPLICATION AND ACCUMULATION OF COMPLEX AND REAL PACKED DATA ELEMENTS
An apparatus and method for multiplying packed real and imaginary components of complex numbers. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to select real and imaginary data elements in the first source register and second source register to multiply, the multiplier circuitry to multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and to multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products, adder circuitry to add a first subset of the plurality of imaginary products to generate a first temporary result and to add a second subset of the plurality of imaginary products to generate a second temporary result; negation circuitry to negate the first temporary result to generate a third temporary result and to negate the second temporary result to generate a fourth temporary result; accumulation circuitry to combine the third temporary result with first data from a destination register to generate a first final result and to combine the fourth temporary result with second data from the destination register to generate a second final result and to store the first final result and second final result back in the destination register.
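The multiply, add, negate, and accumulate sequence can be sketched as follows. This is an assumption-laden illustration: it takes each "subset" to be the two imaginary products of one packed complex pair, and the function name `negated_imag_mac` and the list-of-tuples layout are hypothetical.

```python
def negated_imag_mac(src1, src2, dest):
    """Accumulate the negated imaginary part of each product src1[i] * src2[i].

    src1, src2: lists of (re, im) tuples standing in for packed registers.
    dest: list of accumulator values, one per complex pair (assumed layout).
    """
    out = []
    for (a_re, a_im), (b_re, b_im), acc in zip(src1, src2, dest):
        # Multiplier stage: imag x real and real x imag give imaginary products.
        imag_products = (a_im * b_re, a_re * b_im)
        # Adder stage: sum one subset of imaginary products.
        temp = sum(imag_products)
        # Negation stage.
        negated = -temp
        # Accumulation stage: combine with the data already in the destination.
        out.append(acc + negated)
    return out


updated = negated_imag_mac([(1, 2), (3, 4)], [(5, 6), (7, 8)], [100, 200])
```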
Computer-Implemented Method for Determining a Gaussian Integer Congruent to a Given Gaussian Integer Modulo a Gaussian Integer Modulus, Method for Determining a Reduction of a Given Gaussian Integer Modulo a Gaussian Integer Modulus and Cryptographic Method and Error-Correction Method
Various embodiments of the teachings herein include methods for determining a Gaussian integer congruent to a given Gaussian integer modulo a Gaussian integer modulus. The method may include: starting with a Gaussian integer base raised to an integer exponent having a norm smaller than or equal to that of the Gaussian integer modulus and larger than the norm of the difference of the Gaussian integer base raised to the integer exponent and the Gaussian integer modulus; initializing a variable value candidate for the Gaussian integer congruent with the given Gaussian integer; then iteratively decrementing the variable value by a product of the Gaussian integer modulus and a component-wise down rounded quotient of the current value of the variable value candidate and the Gaussian integer base raised to the integer exponent, as long as the quotient is not vanishing; and identifying the resulting variable value candidate as the Gaussian integer congruent.
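The iteration can be sketched in a few lines. This is a minimal illustration under several assumptions: Gaussian integers are `(re, im)` tuples, "down rounded" is read as the floor of each component, the power of the base is passed in already computed as `base_pow`, and an iteration cap is added because the abstract gives no termination bound. All names are hypothetical.

```python
def gmul(a, b):
    """Multiply two Gaussian integers given as (re, im) tuples."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])


def norm(a):
    return a[0] * a[0] + a[1] * a[1]


def floor_quotient(x, b):
    """Component-wise floor of x / b, computed as x * conj(b) / norm(b)."""
    num = gmul(x, (b[0], -b[1]))
    n = norm(b)
    return (num[0] // n, num[1] // n)


def congruent_candidate(z, modulus, base_pow, max_iter=64):
    """Return a Gaussian integer congruent to z modulo `modulus`."""
    diff = (base_pow[0] - modulus[0], base_pow[1] - modulus[1])
    # Preconditions on the norms, as stated in the abstract.
    assert norm(diff) < norm(base_pow) <= norm(modulus)
    x = z
    for _ in range(max_iter):  # cap is an added safeguard, not from the source
        q = floor_quotient(x, base_pow)
        if q == (0, 0):  # quotient vanishes: stop
            return x
        p = gmul(modulus, q)
        x = (x[0] - p[0], x[1] - p[1])
    raise RuntimeError("quotient did not vanish within max_iter iterations")


reduced = congruent_candidate((100, 37), (10, 1), (9, 0))
```

Each step subtracts a multiple of the modulus, so the candidate stays congruent to the input throughout. In this example, 100 + 37i reduces to 2 + 7i modulo 10 + i in two steps.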
METHOD AND SYSTEM FOR PERFORMING ANALOG COMPLEX VECTOR-MATRIX MULTIPLICATION
A hardware device and method for performing a multiply-accumulate operation are described. The device includes input lines, weight cells and output lines. The input lines receive input signals, each of which has a magnitude and a phase and can represent a complex value. The weight cells couple the input lines with the output lines. Each of the weight cells has an electrical admittance corresponding to a weight. The electrical admittance is programmable and capable of being complex valued. The input lines, the weight cells and the output lines form a crossbar array. Each of the output lines provides an output signal. The output signal for an output line is a sum of an input signal for each of the input lines connected to the output line multiplied by the electrical admittance of each of the weight cells connecting the input lines to the output line.
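Numerically, the crossbar computes a complex vector-matrix product. The sketch below models only that arithmetic, not the analog circuit. The function name and the nested-list layout, with `admittances[i][j]` connecting input line `i` to output line `j`, are assumptions.

```python
import cmath


def crossbar_outputs(inputs, admittances):
    """Model each output line as sum_i signal_i * admittances[i][j].

    inputs: list of (magnitude, phase_in_radians) pairs, one per input line.
    admittances: rows indexed by input line, columns by output line.
    """
    # Convert magnitude and phase to a complex value for each input line.
    signals = [cmath.rect(mag, phase) for mag, phase in inputs]
    n_out = len(admittances[0])
    return [
        sum(signals[i] * admittances[i][j] for i in range(len(signals)))
        for j in range(n_out)
    ]


outs = crossbar_outputs(
    [(1.0, 0.0), (2.0, cmath.pi / 2)],
    [[1 + 1j, 2 + 0j],
     [0.5j, 1 - 1j]],
)
```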
Fast Fourier Transforms for Processing-in-Memory
Fast Fourier transforms for processing-in-memory are described. In accordance with the described techniques, a computing device includes a memory, a host processing unit, and a processing-in-memory unit that operates on data of one or more banks of the memory. The host processing unit stores interacting elements of a fast Fourier transform at locations in the one or more banks. The locations are mapped to a lane of the processing-in-memory unit. The host processing unit issues processing-in-memory commands instructing the processing-in-memory unit to load the interacting elements from the locations into the lane of the processing-in-memory unit, and execute an operation on the interacting elements.
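In a radix-2 FFT, the "interacting elements" are naturally read as the pairs combined by each butterfly. The sketch below is a plain software FFT, not the processing-in-memory command flow. It assumes a decimation-in-time layout and uses hypothetical function names, and it only shows which pairs interact at each stage and the operation applied to them.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def butterfly(a, b, w):
    """Operation executed on one pair of interacting elements."""
    t = w * b
    return a + t, a - t


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT for power-of-two lengths."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    bits = n.bit_length() - 1
    data = [x[bit_reverse(i, bits)] for i in range(n)]
    half = 1
    while half < n:
        step = cmath.exp(-1j * cmath.pi / half)  # twiddle increment for this stage
        for start in range(0, n, 2 * half):
            w = 1
            for k in range(half):
                i, j = start + k, start + k + half  # interacting pair
                data[i], data[j] = butterfly(data[i], data[j], w)
                w *= step
        half *= 2
    return data


spectrum = fft_radix2([1, 2, 3, 4])
```

In the arrangement the abstract describes, the host would place each pair `(i, j)` at bank locations mapped to the same lane so that the butterfly can run without moving data between lanes. That placement policy is hardware-specific and is not modeled here.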
Linear approximation of a complex number magnitude
A device includes a comparison circuit and a calculation circuit coupled to the comparison circuit. The comparison circuit is configured to receive a first digital input value (X) and a second digital input value (Y), and provide a first digital output value that indicates one of a first relationship, a second relationship, and a third relationship between X and Y. The calculation circuit is configured to receive X and Y, receive the first digital output value, and provide a second digital output value. The second digital output value is a first linear combination of X and Y responsive to the first digital output value indicating the first relationship, a second linear combination of X and Y responsive to the first digital output value indicating the second relationship, and a third linear combination of X and Y responsive to the first digital output value indicating the third relationship.
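The compare-then-combine structure can be sketched as below. The abstract specifies neither the three relationships nor the coefficients, so the thresholds (a factor of two between X and Y) and the power-of-two-friendly weights are purely illustrative assumptions, as are the function names. Inputs are assumed non-negative, for example absolute values of the real and imaginary parts.

```python
import math


def compare(x, y):
    """Comparison stage: classify the relationship between X and Y (assumed thresholds)."""
    if x >= 2 * y:
        return 0  # X dominates
    if y >= 2 * x:
        return 1  # Y dominates
    return 2      # X and Y are comparable


def approx_magnitude(x, y):
    """Calculation stage: choose one of three linear combinations of X and Y."""
    relation = compare(x, y)
    if relation == 0:
        return x + y / 4
    if relation == 1:
        return y + x / 4
    return 0.75 * (x + y)


estimate = approx_magnitude(3, 4)   # the exact magnitude is 5
exact = math.hypot(3, 4)
```

With these illustrative coefficients the estimate never falls below the true magnitude, and the overestimate is largest, about 6%, when X equals Y.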