Patent classifications
G06F7/505
Embedded Arithmetic Blocks for Structured ASICs
An integrated circuit is provided that includes via-configured structured logic circuitry and an embedded arithmetic block that interfaces with the via-configured structured logic circuitry to perform an arithmetic function. The embedded arithmetic block includes at least one monolithic arithmetic circuit that can perform the arithmetic function more efficiently or taking up less die space than a comparable circuit formed from the via-configured structured logic circuitry.
PARALLEL COMPUTATION OF A LOGIC OPERATION, INCREMENT, AND DECREMENT OF ANY PORTION OF A SUM
One embodiment provides a processor comprising at least one of a first mask to receive a first input operand and a second input operand and to generate a selected portion of an AND of a sum of the first input operand and the second input operand using an AND chain of the first mask in parallel with generation of the sum by an adder; and a second mask to receive the first input operand and the second input operand and to generate the selected portion of an OR of the sum using an OR chain of the second mask in parallel with generation of the sum.
SURFACE CODE COMPUTATIONS USING AUTO-CCZ QUANTUM STATES
Methods and apparatus for performing surface code computations using Auto-CCZ states. In one aspect, a method for implementing a delayed choice CZ operation on a first and second data qubit using a quantum computer includes: preparing a first and second routing qubit in a magic state; interacting the first data qubit with the first routing qubit and the second data qubit with the second routing qubit using a first and second CNOT operation, where the first and second data qubits act as controls for the CNOT operations; if a received first classical bit represents an off state: applying a first and second Hadamard gate to the first and second routing qubit; measuring the first and second routing qubit using Z basis measurements to obtain a second and third classical bit; and performing classically controlled fixup operations on the first and second data qubit using the second and third classical bits.
SURFACE CODE COMPUTATIONS USING AUTO-CCZ QUANTUM STATES
Methods and apparatus for performing surface code computations using Auto-CCZ states. In one aspect, a method for implementing a delayed choice CZ operation on a first and second data qubit using a quantum computer includes: preparing a first and second routing qubit in a magic state; interacting the first data qubit with the first routing qubit and the second data qubit with the second routing qubit using a first and second CNOT operation, where the first and second data qubits act as controls for the CNOT operations; if a received first classical bit represents an off state: applying a first and second Hadamard gate to the first and second routing qubit; measuring the first and second routing qubit using Z basis measurements to obtain a second and third classical bit; and performing classically controlled fixup operations on the first and second data qubit using the second and third classical bits.
MULTIPLE ACCUMULATE BUSSES IN A SYSTOLIC ARRAY
Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
MULTIPLE ACCUMULATE BUSSES IN A SYSTOLIC ARRAY
Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
In-memory bit-serial addition system
An in-memory vector addition method for a dynamic random access memory (DRAM) is disclosed which includes consecutively transposing two numbers across a plurality of rows of the DRAM, each number transposed across a fixed number of rows associated with a corresponding number of bits, assigning a scratch-pad including two consecutive bits for each bit of each number being added, two consecutive bits for carry-in (C.sub.in), and two consecutive bits for carry-out-bar (
Increment/decrement apparatus and method
A method comprises receiving an N-bit unsigned number and a control signal, in response to the control signal indicating an increment operation, increasing the N-bit unsigned number by 1 through an increment/decrement apparatus having (2m+3) levels of 2-input logic gates, wherein m is equal to log.sub.2.sup.(N) and in response to the control signal indicating a decrement operation, decreasing the N-bit unsigned number by 1 through the increment/decrement apparatus.
FIXED-POINT AND FLOATING-POINT ARITHMETIC OPERATOR CIRCUITS IN SPECIALIZED PROCESSING BLOCKS
The present embodiments relate to circuitry that efficiently performs floating-point arithmetic operations and fixed-point arithmetic operations. Such circuitry may be implemented in specialized processing blocks. If desired, the specialized processing blocks may include configurable interconnect circuitry to support a variety of different use modes. For example, the specialized processing block may efficiently perform a fixed-point or floating-point addition operation or a portion thereof, a fixed-point or floating-point multiplication operation or a portion thereof, a fixed-point or floating-point multiply-add operation or a portion thereof, just to name a few. In some embodiments, two or more specialized processing blocks may be arranged in a cascade chain and perform together more complex operations such as a recursive mode dot product of two vectors of floating-point numbers or a Radix-2 Butterfly circuit, just to name a few.
Apparatus and method for vector horizontal add of signed/unsigned words and doublewords
An apparatus and method for performing a packed horizontal addition of words and doublewords. One embodiment of a processor includes a decoder to decode a packed horizontal add instruction which includes an opcode and one or more operands used to identify a plurality of packed words; a source register to store a plurality of packed words; execution circuitry to execute the decoded instruction, and a destination register to store a final result as a packed result word in a designated data element position. The execution circuitry includes operand selection circuitry to identify first and second packed words from the source register in accordance with the operands and opcode; adder circuitry to add the two packed words to generate a temporary sum; a temporary storage of at least 17 bits to store the temporary sum; and saturation circuitry to saturate the temporary sum if necessary to generate the final result.