Patent classifications
G06F9/30185
METHOD FOR EXECUTING A MACHINE CODE BY MEANS OF A MICROPROCESSOR
A method for executing a machine code using a microprocessor includes, after an operation of decoding a current loaded instruction, constructing a mask from the signals generated by an instruction decoder in response to decoding of the current loaded instruction by the decoder. The constructed mask varies as a function of the current loaded instruction. Subsequently, before an operation of decoding a next loaded instruction, the next loaded instruction is unmasked using the constructed mask.
SCHEDULING TASKS USING SWAP FLAGS
A method of activating scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state and checking, by an instruction controller, if a swap flag is set in the decoded instruction. If the swap flag in the decoded instruction is set, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state.
Stacked transistors with different gate lengths in different device strata
Disclosed herein are stacked transistors with different gate lengths in different device strata, as well as related methods and devices. In some embodiments, an integrated circuit structure may include stacked strata of transistors, with two different device strata having different gate lengths.
DISPATCH BANDWIDTH OF MEMORY-CENTRIC REQUESTS BY BYPASSING STORAGE ARRAY ADDRESS CHECKING
A technical solution to the technical problem of how to improve dispatch throughput for memory-centric commands bypasses address checking for certain memory-centric commands. Implementations include using an Address Check Bypass (ACB) bit to specify whether address checking should be performed for a memory-centric command. ACB bit values are specified in memory-centric instructions, automatically specified by a process, such as a compiler, or by host hardware, such as dispatch hardware, based upon whether a memory-centric command explicitly references memory. Implementations include bypassing, i.e., not performing, address checking for memory-centric commands that do not access memory and also for memory-centric commands that do access memory, but that have the same physical address as a prior memory-centric command that explicitly accessed memory to ensure that any data in caches was flushed to memory and/or invalidated.
PROCESSOR AND INSTRUCTION SET
A processor includes a register file having a plurality of register file addresses, a processing unit, configured to perform processing in accordance with a configuration defined by information stored in the register file, and an instruction sequencer. The instruction sequencer is configured to control the processing unit by retrieving a sequence of instructions from a memory, in which each instruction includes an opcode, and a subset of the instructions includes a data portion. For each instruction in the sequence of instructions, the instruction sequencer performs an action defined by the opcode. The action for the subset of the opcodes includes writing the data portion to a register file address defined by the opcode. The sequence of instructions includes variable length instructions.
CAPABILITY-GENERATING ADDRESS CALCULATING INSTRUCTION
An apparatus has processing circuitry, an instruction decoder, and capability registers, each capability register to store a capability comprising a pointer and constraint metadata for constraining valid use of the pointer/capability. In response to a capability-generating address calculating instruction specifying an offset value, a reference capability register is selected as one of a program counter capability register and a further capability register. A result capability is generated for which the pointer of the result capability indicates a window address identifying a selected window within an address space, the selected window being offset from a reference window by a number of windows determined based on the offset value of the capability-generating address calculating instruction. The reference window comprises the window comprising an address indicated by the pointer of the reference capability register.
COMPUTE OPTIMIZATIONS FOR LOW PRECISION MACHINE LEARNING OPERATIONS
One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.
INSTRUCTION EXECUTION METHOD AND INSTRUCTION EXECUTION DEVICE
An instruction configuration and execution method includes the following steps. A target instruction is received through an instruction cache. The target instruction is decoded by an instruction translator. It is determined whether the target instruction has the authority to read or write the model specific register in an unprivileged state. It is determined whether the model specific register index of the specific instruction corresponds to a specific model specific register, so as to order the microprocessor to perform an instruction serialization operation.
INSTRUCTION EXECUTION METHOD AND INSTRUCTION EXECUTION DEVICE
An instruction execution method for a microprocessor is provided. The microprocessor includes a model specific register (MSR). And, the instruction execution method includes the following steps. A target instruction is received using an instruction cache. The target instruction is decoded using an instruction translator to determine whether the target instruction is a specific instruction is a specific instruction. When the target instruction is the specific instruction, a model specific register index of the target instruction is obtained to directly read or write the model specific register.
SOFTWARE-DIRECTED DIVERGENT BRANCH TARGET PRIORITIZATION
Instruction set architecture extensions to configure priority ordering of divergent target branch instructions on SIMT computing platforms to enable tools such as compilers (e.g., under influence of execution profilers) or human software developers to configure branch direction prioritization explicitly in code. Extensions for simple (two-way) branch instructions as well as multi-target (more than two branch target instructions) are disclosed.