Patent classifications
G06F9/384
PROCESSORS EMPLOYING MEMORY DATA BYPASSING IN MEMORY DATA DEPENDENT INSTRUCTIONS AS A STORE DATA FORWARDING MECHANISM, AND RELATED METHODS
Processors employing memory bypassing in memory data dependent instructions as a store data forwarding mechanism, and related methods. To reduce stalls of memory data dependent, load-based instructions, a memory data dependency detection circuit is configured to detect a memory hazard between a store-based instruction and a load-based instruction based on their opcodes and designation/source operands. Some store-based and load-based instructions have opcodes identifying these instructions as having respective store and load address operand types that can be compared without resolution of their respective store and load addresses. For these detected types of instructions, the memory data dependency detection circuit is configured to determine if a source operand of a load-based instruction matches a target operand of a store-based instruction to detect a memory hazard earlier in the instruction pipeline. Identifying memory hazards earlier in an instruction pipeline can allow memory dependent instructions to be processed with avoided or reduced stalls.
DECOUPLED ACCESS-EXECUTE PROCESSING
An apparatus comprises first instruction execution circuitry, second instruction execution circuitry, and a decoupled access buffer. Instructions of an ordered sequence of instructions are issued to one of the first and second instruction execution circuitry for execution in dependence on whether the instruction has a first type label or a second type label. An instruction with the first type label is an access-related instruction which determines at least one characteristic of a load operation to retrieve a data value from a memory address. Instruction execution by the first instruction execution circuitry of instructions having the first type label is prioritised over instruction execution by the second instruction execution circuitry of instructions having the second type label. Data values retrieved from memory as a result of execution of the first type instructions are stored in the decoupled access buffer.
OPERATION DEVICE OF CONVOLUTIONAL NEURAL NETWORK, OPERATION METHOD OF CONVOLUTIONAL NEURAL NETWORK AND COMPUTER PROGRAM STORED IN A RECORDING MEDIUM TO EXECUTE THE METHOD THEREOF
A convolutional operation method of generating a feature data matrix corresponding to an output data matrix by performing a general matrix multiplication (GEMM) operation on an input data matrix with a set filter matrix includes updating, by at least one processor, a register mapping table so that first destination register addresses of redundant components indicating data redundant with each other among a plurality of components of the input data matrix correspond to a same second destination register address, and performing, by the at least one processor, a convolutional operation by reusing a register having the same second destination register address with respect to the redundant components, based on the register mapping table.
Method and apparatus for renaming source operands of instructions
A renaming unit configured to rename source operands of instructions in a group. A renaming register maintains architectural to physical register mappings. Architectural to physical register mappings propagate from the renaming register through a chain of update units (U) over bus lines denoted with the architectural registers 0 to L. Update units (U) sequentially, in program order, insert physical register identifiers PR(i) allocated to instructions I(i) with destination operands DOP(i) on bus lines denoted with the destination operands DOP(i). Source operands of an instruction I(i) may be renamed to physical register identifiers after physical register identifiers allocated to instructions older than I(i) are sequentially, in program order, inserted on the bus lines, but before physical register identifiers allocated to I(i) and younger instructions are inserted on the bus lines. A source operand SOP(i) is renamed to a physical register identifier that propagates on a bus line denoted with SOP(i).
EVICTING AND RESTORING INFORMATION USING A SINGLE PORT OF A LOGICAL REGISTER MAPPER AND HISTORY BUFFER IN A MICROPROCESSOR COMPRISING MULTIPLE MAIN REGISTER FILE ENTRIES MAPPED TO ONE ACCUMULATOR REGISTER FILE ENTRY
A computer system, processor, programming instructions and/or method of processing data that includes a main register file having a plurality of entries for storing data; an accumulator register file having a plurality of entries for storing data wherein multiple main register file entries are mapped to one accumulator register file entry in the at least one accumulator register file; a logical register mapper to track and map logical registers to main register file entries, and a history buffer. Processing wide data width instructions includes evicting and restoring information from a single primary entry in the logical register mapper through a single read or write port in the logical register mapper without evicting or restoring the remaining other multiple main register file entries mapped in the accumulator register.
APPARATUS AND METHOD FOR IDENTIFYING AND PRIORITIZING CERTAIN INSTRUCTIONS IN A MICROPROCESSOR INSTRUCTION PIPELINE
A microprocessor improves Memory Level Parallelism (MLP) with minimal added complexity and without requiring segregated storage or management of instructions, by marking memory instructions and related instructions as urgent, and dispatching marked and unmarked instructions into common queuing circuitry for scheduled execution within scheduling circuitry that is configured to prioritize the execution of marked instructions. Instruction marking may be limited to the span of the renaming stage or may be extended to the span of the reorder buffer for additional gains in MLP.
Hierarchical general register file (GRF) for execution block
In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively couple to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.
System and method for instruction unwinding in an out-of-order processor
A system and corresponding method unwind instructions in an out-of-order (OoO) processor. The system comprises a mapper. In response to a restart event causing at least one instruction to be unwound, the mapper restores a present integer mapper state and present floating-point (FP) mapper state, used for mapping instructions, to a former integer mapper state and former FP mapper state, respectively. The mapper stores integer snapshots and FP snapshots of the present integer and FP mapper state, respectively, to expedite restoration to the former integer and FP mapper state, respectively. Access to the FP snapshots is blocked, intermittently, as a function of at least one FP present indicator used by the mapper to record presence of FP registers used as destinations in the instructions. Blocking the access, intermittently, improves power efficiency of the OoO processor.
TIGHTLY-COUPLED SLICE TARGET FILE DATA
A system may determine that two instructions may be combined based on a processing power of the processor and a size of the instructions, fuse the two instructions into a pair, map the two instructions with two register tags, write the two register tags into a mapper, write the fused instruction pair into an issue queue, issue the fused instruction pair to a vector-scalar transformation unit (VSU), and execute the two instructions.
Method and apparatus for executing vector instructions with merging behavior
A processor includes a register file and control logic that detects multiple different sets of sequential zero bits of a register in the register file, wherein each of the multiple different sets has a bit length that corresponds to a partial instruction width and operates at a first partial instruction width or a second partial instruction width with the register file depending on number of sets of zero bits detected in the register. In certain examples, the control logic causes operating at first instruction width that avoids merging of a first bit length of data in the register and operating at the second instruction width that avoids merging of a second bit length of data in the register. In some examples, a register rename map table incudes multiple zero bits that identify the detected multiple different sets of bits of sequential zeros.