Patent classifications
G06F9/382
Coprocessor Prefetcher
A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
Flushing a fetch queue using predecode circuitry and prediction information
A data processing apparatus is provided. It includes control flow detection prediction circuitry that performs a presence prediction of whether a block of instructions contains a control flow instruction. A fetch queue stores, in association with prediction information, a queue of indications of the instructions and the prediction information comprises the presence prediction. An instruction cache stores fetched instructions that have been fetched according to the fetch queue. Post-fetch correction circuitry receives the fetched instructions prior to the fetched instructions being received by decode circuitry, the post-fetch correction circuitry includes analysis circuitry that causes the fetch queue to be at least partly flushed in dependence on a type of a given fetched instruction and the prediction information associated with the given fetched instruction.
APPARATUS AND METHOD FOR SCALABLE QUBIT ADDRESSING
An apparatus and method for scalable qubit addressing. For example, one embodiment of a processor comprises: a decoder comprising quantum instruction decode circuitry to decode quantum instructions to generate quantum microoperations (uops) and non-quantum decode circuitry to decode non-quantum instructions to generate non-quantum uops; execution circuitry comprising: an address generation unit (AGU) to generate a system memory address responsive to execution of one or more of the non-quantum uops; and quantum index generation circuitry to generate quantum index values responsive to execution of one or more of the quantum uops, each quantum index value uniquely identifying a quantum bit (qubit) in a quantum processor; wherein to generate a first quantum index value for a first quantum uop, the quantum index generation circuitry is to read the first quantum index value from a first architectural register identified by the first quantum uop.
METHODS AND APPARATUS FOR DECODING PROGRAM INSTRUCTIONS
Aspects of the present disclosure relate to an apparatus comprising fetch circuitry. The fetch circuitry comprises a pointer-based fetch queue for queuing processing instructions retrieved from a storage, and pointer storage for storing a pointer identifying a current fetch queue element. The apparatus comprises decode circuitry having a plurality of decode units, and fetch queue extraction circuitry to, based on the pointer, extract the content of a plurality of elements of the fetch queue; apply combinatorial logic to speculatively produce, from the content of said fetch queue entries, a plurality of speculative potential instructions; and transmit each speculative potential instruction to a corresponding one of said decode units. Each decode unit is configured to decode the corresponding speculative potential instruction. The instruction extraction circuitry is configured to extract a subset of said plurality of speculative potential instructions, and transmit said determined subset to pipeline component circuitry.
Pipeline-based system for configuration checking and reporting associated with an information processing system
Pipeline-based techniques for system configuration management are provided. For example, a method comprises, in a pipeline-based system comprising a set of one or more pipelines, for a given one of the set of one or more pipelines, collecting a set of one or more configuration datasets respectively associated with a set of one or more elements of an information processing system, wherein each of the configuration datasets of the collected set of one or more configuration datasets is specific to the respective element of the information processing system from which it is collected; executing a set of one or more configuration checks on the set of one or more configuration datasets; receiving a set of one or more output results from the executed one or more configuration checks; and generating at least one report from the one or more output results.
GEOMETRY-BASED COMPRESSION FOR QUANTUM COMPUTING DEVICES
A quantum computing device comprises a surface code lattice that includes/logical qubits, where/is a positive integer. The surface code lattice is partitioned into two or more regions based on lattice geometry. A compression engine is coupled to each logical qubit of the/logical qubits. Each compression engine is configured to compress syndrome data generated by the surface code lattice using a geometry-based compression scheme. A decompression engine is coupled to each compression engine. Each decompression engine is configured to receive compressed syndrome data, decompress the received compressed syndrome data, and route the decompressed syndrome data to a decoder block.
Lock free streaming of executable code data
A disassembler receives instructions and disassembles them into a plurality of separate opcodes. The disassembler creates a table identifying boundaries between each opcode. Each opcode is written to memory in an opcode-by-opcode manner by atomically writing standard blocks of memory. Debug break point opcodes are appended to opcode to create a full block of memory when needed. The block of memory may be thirty-two or sixty-four bits long, for example. Long opcodes may overlap two or more memory blocks. Debug break point opcodes may be appended to a second portion of the long opcode to create a full block of memory. A stream fault interceptor identifies when a requested data page is not available and retrieving the data page.
Instructions and logic for vector multiply add with zero skipping
Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
Methods and apparatus for scheduling instructions using pre-decode data
Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.
Processor with a Program Counter Increment Based on Decoding of Predecode Bits
A processor includes: an instruction fetch portion configured to fetch simultaneously a plurality of fixed-length instructions in accordance with a program counter; an instruction predecoder configured to predecode specific fields in a part of the plurality of fixed-length instructions; and a program counter management portion configured to control an increment of the program counter in accordance with a result of the predecoding.