G06F9/3016

Out-of-order block-based processors and instruction schedulers using ready state data indexed by instruction position identifiers

Apparatus and methods are disclosed for implementing block-based processors, including field-programmable gate array (FPGA) implementations. In one example of the disclosed technology, a block-based processor includes an instruction decoder configured to generate decoded ready-state dependencies for a transactional block of instructions, where each of the instructions is associated with a different instruction identifier encoded in the transactional block. The processor further includes an instruction scheduler configured to issue an instruction from a set of instructions of the transactional block. The instruction is issued upon determining that its decoded ready-state dependencies are satisfied. This determination includes accessing storage that holds the decoded ready-state dependencies, indexed by the respective instruction identifier encoded in the transactional block.
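The scheduling idea can be sketched as a ready table whose rows are addressed directly by the instruction identifier, so no associative search is needed to find an instruction's ready state. This is a minimal sketch under stated assumptions: the two operand slots (`left`/`right`), the `ReadyEntry`/`ReadyTable` names, and the lowest-identifier-first issue policy are all hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ReadyEntry:
    # Hypothetical layout: which operands the instruction waits on, and which have arrived.
    needs_left: bool
    needs_right: bool
    left_ready: bool = False
    right_ready: bool = False
    issued: bool = False

    def is_ready(self) -> bool:
        # Ready once every needed operand has arrived and the instruction has not issued yet.
        return (
            not self.issued
            and (self.left_ready or not self.needs_left)
            and (self.right_ready or not self.needs_right)
        )


class ReadyTable:
    """Ready-state storage indexed directly by the instruction identifier (IID)."""

    def __init__(self, decoded: List[Tuple[bool, bool]]):
        # decoded[iid] = (needs_left, needs_right), as produced by a (hypothetical) decoder.
        self.entries = [ReadyEntry(left, right) for left, right in decoded]

    def deliver(self, iid: int, operand: str) -> None:
        # A producer's result arrives at operand slot `operand` of instruction `iid`.
        entry = self.entries[iid]
        if operand == "left":
            entry.left_ready = True
        elif operand == "right":
            entry.right_ready = True
        else:
            raise ValueError(f"unknown operand slot: {operand}")

    def issue_next(self) -> Optional[int]:
        # Assumed policy: issue the lowest-numbered ready instruction.
        for iid, entry in enumerate(self.entries):
            if entry.is_ready():
                entry.issued = True
                return iid
        return None
```

For example, with `ReadyTable([(False, False), (True, False), (True, True)])`, instruction 0 issues immediately, instruction 1 issues only after `deliver(1, "left")`, and instruction 2 waits for both of its operands.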

NESTED QUANTUM ANNEALING CORRECTION

Systems and methods of processing using a quantum processor are described. A method includes obtaining a problem Hamiltonian and defining a nested Hamiltonian with a plurality of logical qubits by embedding a logical $K_N$ representing the problem Hamiltonian into a larger $K_{C \times N}$, where $N$ represents the number of logical qubits and $C$ represents a nesting level that defines the amount of hardware resources used for the nested Hamiltonian. The method also includes encoding the nested Hamiltonian into a plurality of physical qubits of the quantum processor, and performing a quantum annealing process with the quantum processor after the encoding.
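The embedding of $K_N$ into $K_{C \times N}$ can be illustrated for an Ising problem with fields $h_i$ and couplings $J_{ij}$. The sketch below is based on assumptions we have not confirmed against the patent: each logical spin is copied $C$ times, each copy carries field $C h_i$, copies of different logical spins keep coupling $J_{ij}$, and copies of the same logical spin are tied by a uniform ferromagnetic penalty $\gamma$. The functions `nest` and `energy` are hypothetical helpers. Under these assumptions, any configuration in which all copies agree has nested energy $C^2 E - \gamma N C(C-1)/2$, where $E$ is the logical energy.

```python
import itertools

import numpy as np


def nest(h, J, C, gamma):
    """Embed an N-spin Ising problem (h, J) into C*N physical spins.

    Assumptions (hypothetical): copy (i, c) gets field C * h[i]; copies of different
    logical spins i != j are coupled with J[i][j] for all C*C copy pairs (complete
    K_{C x N}); copies of the same logical spin are coupled ferromagnetically (-gamma).
    """
    h = np.asarray(h, dtype=float)
    J = np.asarray(J, dtype=float)
    N = len(h)

    def idx(i, c):
        # Physical index of copy c of logical spin i.
        return i * C + c

    h_nest = np.repeat(C * h, C)
    J_nest = np.zeros((C * N, C * N))
    for i, j in itertools.product(range(N), repeat=2):
        for c, cp in itertools.product(range(C), repeat=2):
            a, b = idx(i, c), idx(j, cp)
            if a == b:
                continue  # no self-coupling
            J_nest[a, b] = -gamma if i == j else J[i, j]
    return h_nest, J_nest


def energy(h, J, s):
    """Ising energy sum_i h_i s_i + sum_{i<j} J_ij s_i s_j (J symmetric, zero diagonal)."""
    s = np.asarray(s, dtype=float)
    return float(h @ s + 0.5 * s @ J @ s)
```

The $C^2$ scaling of the logical energy is the point of the construction: larger $C$ uses more physical qubits to enlarge the energy scale of the encoded problem.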

MIXED INFERENCE USING LOW AND HIGH PRECISION

One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread architecture, the general-purpose graphics compute unit to concurrently execute the first instruction and the second instruction.
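To make "concurrently execute the first instruction and the second instruction" concrete, here is a toy in-order issue model in which a compute unit can start one floating-point and one integer instruction in the same cycle. This is purely illustrative: the `dual_issue` function and its one-per-pipeline rule are hypothetical, and the model ignores data dependencies and SIMT lanes entirely.

```python
from collections import deque
from typing import List, Tuple


def dual_issue(program: List[Tuple[str, str]]) -> List[List[str]]:
    """Return, per cycle, the names of instructions issued.

    program: in-order list of (kind, name) with kind in {"fp", "int"}.
    Hypothetical rule: at most one instruction of each kind issues per cycle,
    and issue stalls at the first instruction whose pipeline is already taken.
    """
    queue = deque(program)
    cycles = []
    while queue:
        issued, used = [], set()
        while queue and queue[0][0] not in used:
            kind, name = queue.popleft()
            used.add(kind)
            issued.append(name)
        cycles.append(issued)
    return cycles
```

For the sequence fp, int, fp, fp, int, this issues `["fma0", "add0"]`, then `["fma1"]`, then `["fma2", "add1"]`: an integer instruction overlaps a floating-point one whenever program order allows.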

CIRCUITRY AND METHODS FOR IMPLEMENTING NON-REDUNDANT METADATA STORAGE ADDRESSED BY BOUNDED CAPABILITIES
20230195614 · 2023-06-22

Systems, methods, and apparatuses for implementing non-redundant metadata storage addressed by bounded capabilities are described. In certain examples, a hardware processor core comprises an execution circuit to generate a first memory access request for a first single object in memory by a first capability and a second memory access request for a second, differently sized single object in the memory by a second capability, wherein a format of each of the first capability and the second capability comprises a single metadata field for access control of a single object in the memory, a bounds field that is to indicate a lower bound and an upper bound of the single object in the memory to which the single metadata field authorizes access, and an address field to indicate an address in the single object that is to be accessed; and a capability management circuit to determine a first location of a corresponding first metadata field in the memory based on the bounds field of the first capability, proceed with the first memory access request in response to a match of metadata in the single metadata field of the first capability against metadata at the corresponding first metadata field in the memory, determine a second location of a corresponding second metadata field in the memory based on the bounds field of the second capability, and proceed with the second memory access request in response to a match of metadata in the single metadata field of the second capability against metadata at the corresponding second metadata field in the memory.
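The check described above can be sketched as follows: the metadata location is derived only from the capability's bounds, so each object has exactly one metadata word regardless of its size, and an access proceeds only if the capability's metadata matches that word. The placement rule (an 8-byte word immediately below the lower bound) and the names `Capability`, `Memory`, `metadata_location`, and `load_byte` are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass

METADATA_SIZE = 8  # hypothetical: one 8-byte metadata word stored just below each object


@dataclass(frozen=True)
class Capability:
    metadata: int  # single metadata field for access control
    lower: int     # lower bound of the object (inclusive)
    upper: int     # upper bound of the object (exclusive)
    address: int   # address within the object to access


class CapabilityError(Exception):
    pass


class Memory:
    def __init__(self, size: int):
        self.data = bytearray(size)

    def read_word(self, addr: int) -> int:
        return int.from_bytes(self.data[addr:addr + 8], "little")

    def write_word(self, addr: int, value: int) -> None:
        self.data[addr:addr + 8] = value.to_bytes(8, "little")


def metadata_location(cap: Capability) -> int:
    # Derived from the bounds field only (assumed placement rule), never from the address.
    return cap.lower - METADATA_SIZE


def load_byte(mem: Memory, cap: Capability) -> int:
    if not (cap.lower <= cap.address < cap.upper):
        raise CapabilityError("address outside capability bounds")
    if mem.read_word(metadata_location(cap)) != cap.metadata:
        raise CapabilityError("capability metadata does not match stored metadata")
    return mem.data[cap.address]
```

A 16-byte object and a 64-byte object each need only one metadata word in this scheme, which is the "non-redundant" property: the tag is not replicated per granule of the object.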

GLOBAL UNIFIED INTERDEPENDENT MULTI-TENANT QUALITY-OF-SERVICE PROCESSOR SCHEME
20230195462 · 2023-06-22

Embodiments of apparatuses, methods, and systems for a hierarchical multi-tenant processor scheme are disclosed. In an embodiment, a processor includes circuitry to execute threads, registers to store first values to define a tenant hierarchy, registers to store second values to specify a location of a thread corresponding to a tenant within the tenant hierarchy, and circuitry to include the second values in a request to access a resource. Use of the resource is to be monitored or controlled based on the location of the tenant within the tenant hierarchy.
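One way to picture hierarchy-aware control is to charge every resource request to the requesting tenant and to each of its ancestors, and to throttle the request if any level would exceed its limit. This is a hypothetical sketch: the parent-pointer representation, the per-tenant `limits`, and the all-ancestors admission rule are assumptions, not the patented register-level scheme.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class TenantHierarchy:
    def __init__(self, parent: Dict[str, Optional[str]], limits: Dict[str, int]):
        self.parent = parent  # tenant -> parent tenant (None for the root)
        self.limits = limits  # tenant -> maximum resource units (hypothetical budget)
        self.usage: Dict[str, int] = defaultdict(int)

    def path(self, tenant: Optional[str]) -> List[str]:
        """Location of a tenant in the hierarchy, listed from the tenant up to the root."""
        out = []
        while tenant is not None:
            out.append(tenant)
            tenant = self.parent[tenant]
        return out

    def request(self, tenant: str, units: int) -> bool:
        # The request carries the tenant's location; every level on the path is checked.
        path = self.path(tenant)
        if any(self.usage[t] + units > self.limits[t] for t in path):
            return False  # throttled at some level of the hierarchy
        for t in path:
            self.usage[t] += units
        return True
```

The design choice illustrated here is interdependence: a sibling tenant can be throttled even below its own limit once the shared parent's budget is exhausted.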

Lock-free streaming of executable code data

A disassembler receives instructions and disassembles them into a plurality of separate opcodes. The disassembler creates a table identifying the boundaries between opcodes. Each opcode is written to memory opcode by opcode, by atomically writing standard-sized blocks of memory. When needed, debug breakpoint opcodes are appended to an opcode to fill out a full block of memory. The block of memory may be, for example, 32 or 64 bits long. Long opcodes may span two or more memory blocks; in that case, debug breakpoint opcodes may be appended to the second portion of the long opcode to fill out a full block. A stream fault interceptor identifies when a requested data page is not available and retrieves the data page.
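The write pattern can be sketched for x86 code, where `0xCC` (`INT3`) is the one-byte breakpoint opcode. Each opcode is published by writing whole 8-byte blocks, with bytes not yet streamed filled with `INT3`, so a concurrent reader sees either complete code or a breakpoint. The block size, the helper names `boundary_table` and `stream_writes`, and the choice to write a long opcode's tail block before its head block (so its first byte becomes visible last) are our assumptions, not details confirmed by the source.

```python
from typing import Iterator, List, Tuple

BLOCK = 8    # assumed 64-bit atomic write size
INT3 = 0xCC  # x86 one-byte debug breakpoint opcode


def boundary_table(opcodes: List[bytes]) -> List[int]:
    """Byte offset at which each opcode starts."""
    offsets, pos = [], 0
    for op in opcodes:
        offsets.append(pos)
        pos += len(op)
    return offsets


def stream_writes(opcodes: List[bytes]) -> Iterator[Tuple[int, bytes]]:
    """Yield (block_offset, block_bytes) atomic writes, one opcode at a time."""
    image = bytearray()
    for op in opcodes:
        start = len(image)
        image.extend(op)
        first = start - start % BLOCK        # block holding the opcode's first byte
        last = -(-len(image) // BLOCK) * BLOCK  # round end up to a block boundary
        # Assumed ordering: write tail blocks first so the opcode appears all at once.
        for off in reversed(range(first, last, BLOCK)):
            chunk = image[off:off + BLOCK]
            chunk += bytes([INT3]) * (BLOCK - len(chunk))  # pad with breakpoints
            yield off, bytes(chunk)
```

With `nop` (`90`), `mov rbp, rsp` (`48 89 E5`), and a 10-byte `movabs rax, imm64` (`48 B8` + 8 bytes), the last opcode spans two blocks, so the block at offset 8 (its second portion plus two `INT3` bytes) is written before the block at offset 0.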

Method for forming constant extensions in the same execute packet in a VLIW processor

In a very long instruction word (VLIW) central processing unit, instructions are grouped into execute packets that execute in parallel. A constant may be specified or extended by bits in a constant extension instruction in the same execute packet. If an instruction includes an indication of constant extension, the decoder employs bits of a constant extension instruction to extend the constant of an immediate field. Two or more constant extension slots are permitted in each execute packet, each extending constants for a different predetermined subset of functional unit instructions. In an alternative embodiment, more than one functional unit may have constants extended from the same constant extension instruction employing the same extended bits. A long extended constant may be formed using the extension bits of two constant extension instructions.
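Decoding an extended constant amounts to concatenating the extension payload bits above the instruction's own immediate field, with a second extension instruction supplying still higher bits for a long constant. The field widths (`IMM_BITS`, `EXT_BITS`), the unit names, and the unit-to-slot mapping below are hypothetical, chosen only to illustrate the scheme.

```python
from dataclasses import dataclass
from typing import Dict, List

IMM_BITS = 5   # hypothetical immediate-field width
EXT_BITS = 27  # hypothetical payload width of one constant-extension instruction

# Hypothetical predetermined subsets: each functional unit maps to one extension slot.
UNIT_SUBSET = {"L1": 0, "S1": 0, "L2": 1, "S2": 1}


@dataclass
class Insn:
    unit: str
    imm: int
    extend: bool = False  # the instruction's "constant extension" indication


def extend_constant(imm: int, ext_payloads: List[int]) -> int:
    """Place each extension payload above the immediate (two payloads form a long constant)."""
    value = imm & ((1 << IMM_BITS) - 1)
    shift = IMM_BITS
    for payload in ext_payloads:
        value |= (payload & ((1 << EXT_BITS) - 1)) << shift
        shift += EXT_BITS
    return value


def decode_packet(insns: List[Insn], ext_slots: Dict[int, List[int]]) -> Dict[str, int]:
    """ext_slots: slot index -> payloads from constant-extension instructions in this packet."""
    out = {}
    for insn in insns:
        if insn.extend:
            out[insn.unit] = extend_constant(insn.imm, ext_slots[UNIT_SUBSET[insn.unit]])
        else:
            out[insn.unit] = insn.imm
    return out
```

In this sketch `L1` and `S1` share slot 0, so both draw on the same extension bits, matching the alternative embodiment in which several functional units reuse one constant extension instruction.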

Generation and use of memory access instruction order encodings

Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
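With bit vectors, the issue check reduces to a few bitwise operations. This sketch assumes a simplified ordering rule that we have not confirmed against the patent: a memory instruction with load/store identifier `lsid` may issue once every store with a smaller identifier has executed. Bit `k` of `store_mask` marks identifier `k` as a store, and bit `k` of `store_vector` marks it as executed; the function names are hypothetical.

```python
def can_issue(lsid: int, store_mask: int, store_vector: int) -> bool:
    """True if no earlier store (lower LSID) is still outstanding."""
    earlier = (1 << lsid) - 1                         # bits for LSIDs 0 .. lsid-1
    outstanding = store_mask & earlier & ~store_vector  # earlier stores not yet executed
    return outstanding == 0


def mark_executed(lsid: int, store_vector: int) -> int:
    """Record that the memory instruction with this LSID has executed."""
    return store_vector | (1 << lsid)
```

For example, with `store_mask = 0b01010` (identifiers 1 and 3 are stores), identifier 2 cannot issue until store 1 is marked executed, and identifier 4 additionally waits for store 3.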

Method and apparatus for vector based finite impulse response (FIR) filtering

A method is provided that includes, in response to a vector finite impulse response (VFIR) filter instruction, generating, by a processor, a plurality of filter outputs using a plurality of coefficients and a plurality of sequential data elements, where the coefficients are specified by a coefficient operand of the VFIR filter instruction and the sequential data elements are specified by a data operand of the VFIR filter instruction, and storing the filter outputs in a storage location specified by the VFIR filter instruction.
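Functionally, one VFIR instruction computes several outputs of the form $y[n] = \sum_k c_k \, x[n+k]$ at once, each output reusing the same coefficients over a window of the sequential data elements. The reference model below assumes this indexing convention (coefficients applied in the same order as the data, without reversal) and a caller-chosen number of outputs; the `vfir` name and its signature are hypothetical.

```python
from typing import List, Sequence


def vfir(coeffs: Sequence[float], data: Sequence[float], num_outputs: int) -> List[float]:
    """Scalar reference model: num_outputs FIR outputs y[n] = sum_k coeffs[k] * data[n + k]."""
    taps = len(coeffs)
    if len(data) < num_outputs + taps - 1:
        raise ValueError("not enough data elements for the requested outputs")
    return [sum(c * data[n + k] for k, c in enumerate(coeffs)) for n in range(num_outputs)]
```

In hardware the outputs would be produced in parallel lanes; the list comprehension only models the arithmetic, for example a two-tap difference filter `[1, -1]` over `[1, 3, 6, 10]` yields `[-2, -3, -4]`.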

PROCESSOR THAT DETECTS MEMORY ALIASING IN HARDWARE AND ASSURES CORRECT OPERATION WHEN MEMORY ALIASING OCCURS
20170351495 · 2017-12-07

Processor hardware detects when memory aliasing occurs and ensures correct operation of the code even in the presence of memory aliasing. Because the hardware can detect and correct for memory aliasing, a compiler can apply optimizations such as register promotion even in regions of the code where memory aliasing may occur. The result is more highly optimized code that therefore runs faster.
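The hazard and its fix can be modeled in a few lines. After a compiler promotes a memory location into a register, a store or load through a different pointer to the same address would otherwise see stale data; here every memory access is compared against the addresses of promoted registers and redirected on a match. The `AliasAwareCore` class and its redirect-on-match policy are hypothetical and only illustrate the idea; the source does not describe the detection mechanism at this level.

```python
from typing import Dict


class AliasAwareCore:
    """Toy model: memory accesses are checked against register-promoted addresses."""

    def __init__(self, memory: Dict[int, int]):
        self.mem = memory
        self.promoted: Dict[str, int] = {}  # register name -> address it caches
        self.regs: Dict[str, int] = {}

    def promote(self, reg: str, addr: int) -> None:
        # Compiler-directed register promotion: the register now holds the value.
        self.promoted[reg] = addr
        self.regs[reg] = self.mem[addr]

    def load(self, addr: int) -> int:
        for reg, a in self.promoted.items():
            if a == addr:  # aliasing detected: the register holds the current value
                return self.regs[reg]
        return self.mem[addr]

    def store(self, addr: int, value: int) -> None:
        self.mem[addr] = value
        for reg, a in self.promoted.items():
            if a == addr:  # aliasing detected: keep the promoted copy coherent
                self.regs[reg] = value

    def writeback(self, reg: str) -> None:
        # End of the promoted region: the register value returns to memory.
        self.mem[self.promoted.pop(reg)] = self.regs[reg]
```

Without the address comparisons, the aliased `load` would return the stale memory value and the aliased `store` would be lost when the register is written back.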