G06F7/505

Apparatus and method for vector horizontal add of signed/unsigned words and doublewords

An apparatus and method for performing a packed horizontal addition of words and doublewords. One embodiment of a processor includes a decoder to decode a packed horizontal add instruction which includes an opcode and one or more operands used to identify a plurality of packed words; a source register to store a plurality of packed words; execution circuitry to execute the decoded instruction, and a destination register to store a final result as a packed result word in a designated data element position. The execution circuitry includes operand selection circuitry to identify first and second packed words from the source register in accordance with the operands and opcode; adder circuitry to add the two packed words to generate a temporary sum; a temporary storage of at least 17 bits to store the temporary sum; and saturation circuitry to saturate the temporary sum if necessary to generate the final result.

Processing-memory architectures performing atomic read-modify-write operations in deep learning systems

A computational apparatus includes a memory unit and Read-Modify-Write (RMW) logic. The memory unit is configured to hold a data value. The RMW logic, which is coupled to the memory unit, is configured to perform an atomic RMW operation on the data value stored in the memory unit.

Processing-memory architectures performing atomic read-modify-write operations in deep learning systems

A computational apparatus includes a memory unit and Read-Modify-Write (RMW) logic. The memory unit is configured to hold a data value. The RMW logic, which is coupled to the memory unit, is configured to perform an atomic RMW operation on the data value stored in the memory unit.

Lookup table sharing for memory-based computing

Methods and systems for memory-based computing include combining multiple operations into a single lookup table and combining multiple memory-based operation requests into a single read request. Operation result values are read from a multi-operation lookup table that includes result values for a first operation above a diagonal of the lookup table and includes result values for a second operation below the diagonal. Numerical inputs are used as column and row addresses in the lookup table and the requested operation determines which input corresponds to the column address and which input corresponds to the row address. Multiple operations are combined into a single request by combining respective members from each operation into respective inputs an reading an operation result value from a lookup table to produce a combined result output. The combined result output is separated into a plurality of individual result outputs corresponding to the plurality of requests.

APPARATUS AND METHOD FOR VECTOR HORIZONTAL ADD OF SIGNED/UNSIGNED WORDS AND DOUBLEWORDS

An apparatus and method for performing a packed horizontal addition of words and doublewords. One embodiment of a processor includes a decoder to decode a packed horizontal add instruction which includes an opcode and one or more operands used to identify a plurality of packed words; a source register to store a plurality of packed words; execution circuitry to execute the decoded instruction, and a destination register to store a final result as a packed result word in a designated data element position. The execution circuitry includes operand selection circuitry to identify first and second packed words from the source register in accordance with the operands and opcode; adder circuitry to add the two packed words to generate a temporary sum; a temporary storage of at least 17 bits to store the temporary sum; and saturation circuitry to saturate the temporary sum if necessary to generate the final result.

APPARATUS AND METHOD FOR VECTOR HORIZONTAL ADD OF SIGNED/UNSIGNED WORDS AND DOUBLEWORDS

An apparatus and method for performing a packed horizontal addition of words and doublewords. One embodiment of a processor includes a decoder to decode a packed horizontal add instruction which includes an opcode and one or more operands used to identify a plurality of packed words; a source register to store a plurality of packed words; execution circuitry to execute the decoded instruction, and a destination register to store a final result as a packed result word in a designated data element position. The execution circuitry includes operand selection circuitry to identify first and second packed words from the source register in accordance with the operands and opcode; adder circuitry to add the two packed words to generate a temporary sum; a temporary storage of at least 17 bits to store the temporary sum; and saturation circuitry to saturate the temporary sum if necessary to generate the final result.

MEASUREMENT BASED UNCOMPUTATION FOR QUANTUM CIRCUIT OPTIMIZATION
20220237493 · 2022-07-28 ·

Methods and apparatus for optimizing a quantum circuit. In one aspect, a method includes identifying one or more sequences of operations in the quantum circuit that un-compute respective qubits on which the quantum circuit operates; generating an adjusted quantum circuit, comprising, for each identified sequence of operations in the quantum circuit, replacing the sequence of operations with an X basis measurement and a classically-controlled phase correction operation, wherein a result of the X basis measurement acts as a control for the classically-controlled correction phase operation; and executing the adjusted quantum circuit.

MEASUREMENT BASED UNCOMPUTATION FOR QUANTUM CIRCUIT OPTIMIZATION
20220237493 · 2022-07-28 ·

Methods and apparatus for optimizing a quantum circuit. In one aspect, a method includes identifying one or more sequences of operations in the quantum circuit that un-compute respective qubits on which the quantum circuit operates; generating an adjusted quantum circuit, comprising, for each identified sequence of operations in the quantum circuit, replacing the sequence of operations with an X basis measurement and a classically-controlled phase correction operation, wherein a result of the X basis measurement acts as a control for the classically-controlled correction phase operation; and executing the adjusted quantum circuit.

Novel fast adder
20220206748 · 2022-06-30 ·

Disclosed is a novel fast adder, which belongs to the field of computer hardware processor design. By means of the novel fast adder, the number of gate circuit levels of a common adder can be reduced, such that the operating speed of a computer is increased. Two groups of recording modules are used for recording signals, and after the two groups of recording modules complete signal recording, a signal unit of one group of recording modules transfers the recorded signals to a signal-free unit of the other group of recording modules, and simplification of operation data is completed, and then a data addition operation is carried out, such that the operation time is shortened.

Interlayer Exchange Coupled Adder

An adder device for binary magnetic applied fields uses Interlayer Exchange Coupling (IEC) structure where two layers of ferromagnetic material are separated from each other by non-magnetic layers of electrically conductive material of atomic thickness, sufficient to generate anti-magnetic response in a magnetized layer. A set of regions are positioned on a top layer above a continuous bottom layer, and the regions excited with magnetization for A and not A, B and not B, and C and not C to form a sum and an inverse carry output magnetization.