G06F7/57

Systems and methods involving hybrid quantum machines, aspects of quantum information technology and/or other features
11699092 · 2023-07-11 · ·

Systems and methods involving quantum machines, hybrid quantum machines, aspects of quantum information technology and/or other features are disclosed. In one exemplary implementation, a system is provided comprising a quantum register that stores quantum information using qubits, wherein the qubits are configured to store the quantum information using particles or objects arranged in a lattice of quantum gates, a clock that provides a clock cycle to the quantum register, and a qubit-tie computing component coupled to the quantum register, wherein the qubit-tie computing component is configured to shift the quantum information between the qubits, wherein the system stores the qubits in different states using physical qualities, which may define qubits that are configured to be entangled and superposed at a same time. Further, the quantum register may comprise an entanglement component, and/or the qubit-tie computing component may comprise a superposition component.

PARALLEL PROCESSOR OPTIMIZED FOR MACHINE LEARNING
20230008138 · 2023-01-12 ·

A parallel processor system for machine learning includes an arithmetic unit (ALU) array including several ALUs and a controller to provide instructions for the ALUs. The system includes a direct-access memory (DMA) block containing multiple DMA engines to access an external memory to retrieve data. An input-stream buffer decouples the DMA block from the ALU array and provides aligning and reordering of the retrieved data. The DMA engines operate in parallel and include rasterization logic capable of performing a three-dimensional (3-D) rasterization.

PARALLEL PROCESSOR OPTIMIZED FOR MACHINE LEARNING
20230008138 · 2023-01-12 ·

A parallel processor system for machine learning includes an arithmetic unit (ALU) array including several ALUs and a controller to provide instructions for the ALUs. The system includes a direct-access memory (DMA) block containing multiple DMA engines to access an external memory to retrieve data. An input-stream buffer decouples the DMA block from the ALU array and provides aligning and reordering of the retrieved data. The DMA engines operate in parallel and include rasterization logic capable of performing a three-dimensional (3-D) rasterization.

Processing apparatus, method of controlling the same, and non-transitory computer readable storage medium
11550546 · 2023-01-10 · ·

A processing apparatus having a programmable circuit including a plurality of ALUs, comprises a holding unit which holds configuration information for switching the programmable circuit from a first circuit setting to a second circuit setting, and timing information; and an updating unit which updates each ALU so as to switch the programmable circuit from the first circuit setting to the second circuit setting, wherein in switching from the first circuit setting to the second circuit setting after the programmable circuit has executed the first data processing, the updating unit, using the timing information, updates the first ALU at a timing at which last data of the first data processing is output from the first ALU, and updates the second ALU at a timing at which the last data is output from the second ALU.

Processing apparatus, method of controlling the same, and non-transitory computer readable storage medium
11550546 · 2023-01-10 · ·

A processing apparatus having a programmable circuit including a plurality of ALUs, comprises a holding unit which holds configuration information for switching the programmable circuit from a first circuit setting to a second circuit setting, and timing information; and an updating unit which updates each ALU so as to switch the programmable circuit from the first circuit setting to the second circuit setting, wherein in switching from the first circuit setting to the second circuit setting after the programmable circuit has executed the first data processing, the updating unit, using the timing information, updates the first ALU at a timing at which last data of the first data processing is output from the first ALU, and updates the second ALU at a timing at which the last data is output from the second ALU.

Method and apparatus for configuring a reduced instruction set computer processor architecture to execute a fully homomorphic encryption algorithm

Systems and methods for configuring a reduced instruction set computer processor architecture to execute fully homomorphic encryption (FHE) logic gates as a streaming topology. The method includes parsing sequential FHE logic gate code, transforming the FHE logic gate code into a set of code modules that each have in input and an output that is a function of the input and which do not pass control to other functions, creating a node wrapper around each code module, configuring at least one of the primary processing cores to implement the logic element equivalents of each element in a manner which operates in a streaming mode wherein data streams out of corresponding arithmetic logic units into the main memory and other ones of the plurality arithmetic logic units.

LARGE INTEGER MULTIPLICATION ENHANCEMENTS FOR GRAPHICS ENVIRONMENT

An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.

LARGE INTEGER MULTIPLICATION ENHANCEMENTS FOR GRAPHICS ENVIRONMENT

An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.

ROUTING INSTRUCTIONS IN A MICROPROCESSOR

A computer system, processor, programming instructions and/or method for balancing the workload of processing pipelines that includes an execution slice, the execution slice comprising at least two processing pipelines having one or more execution units for processing instructions, wherein at least a first processing pipeline and a second processing pipeline are capable of executing a first instruction type; and an instruction decode unit for decoding instructions to determine which of the first processing pipeline or the second processing pipeline to execute the first instruction type. The processor configured to calculate at least one of a workload group consisting of: the first processing pipeline workload, the second processing pipeline workload, and combinations thereof; and select the first processing pipeline or the second processing pipeline to execute the first instruction type based upon at least one of the workload group.

ROUTING INSTRUCTIONS IN A MICROPROCESSOR

A computer system, processor, programming instructions and/or method for balancing the workload of processing pipelines that includes an execution slice, the execution slice comprising at least two processing pipelines having one or more execution units for processing instructions, wherein at least a first processing pipeline and a second processing pipeline are capable of executing a first instruction type; and an instruction decode unit for decoding instructions to determine which of the first processing pipeline or the second processing pipeline to execute the first instruction type. The processor configured to calculate at least one of a workload group consisting of: the first processing pipeline workload, the second processing pipeline workload, and combinations thereof; and select the first processing pipeline or the second processing pipeline to execute the first instruction type based upon at least one of the workload group.