IPIQ

G06F9/3824

COMBINING LOADS OR STORES IN COMPUTER PROCESSING

20170249144 · 2017-08-31 ·

Aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in processors. An exemplary method includes detecting a pattern of pipelined instructions to access memory using a first portion of available bus width and, in response to detecting the pattern, combining the pipelined instructions into a single instruction to access the memory using a second portion of the available bus width that is wider than the first portion. Devices including processors using disclosed aspects may execute currently available software in a more efficient manner without the software being modified.

Computer processor employing byte-addressable dedicated memory for operand storage

09747216 · 2017-08-29 ·

Mill Computing, Inc.

A computer processor including a first memory structure that operates over multiple cycles to temporarily store operands referenced by at least one instruction. A plurality of functional units performs operations that produce and access operands stored in the first memory structure. A second memory structure is provided, separate from the first memory structure. The second memory structure is configured as a dedicated memory for storage of operands copied from the first memory structure. The second memory structure is organized with a byte-addressable memory space and each operand stored in the second memory structure is accessed by a given byte address into the byte-addressable memory space.

Area and power efficient mechanism to wakeup store-dependent loads according to store drain merges

11243773 · 2022-02-08 ·

International Business Machines Corporation

A computer system, includes a store queue that holds store entries and a load queue that holds load entries sleeping on a store entry. A processor detects a store drain merge operation call and generates a pair of store tags comprising a first store tag corresponding to a first store entry to be drained and a second store tag corresponding to a second store entry to be drained. The processor determines the pair of store tags an even-type store tag or an odd-type store tag. The processor disables the odd store tag included in the even-type store tag pair when detecting the even-type store tag pair, and wakes up a first load entry dependent on the even store tag and a second load entry dependent on the odd store tag based on the even store tag included in the even-type store tag pair while the odd store tag is disabled.

Apparatus and method to preclude X86 special bus cycle load replays in an out-of-order processor

09740271 · 2017-08-22 ·

Via Alliance Semiconductor Co., Ltd.

An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory, where the specified load instruction comprises a load instruction resulting from execution of an x86 special bus cycle. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has received the operand, and is configured to preclude assertion of any indications that would otherwise result in a replay event.

Data selection for a processor pipeline using multiple supply lines

11429389 · 2022-08-30 ·

Imagination Technologies Limited

A method for a plurality of pipelines, each having a processing element having first and second inputs and first and second lines, wherein at least one of the pipelines includes first and second logic operable to select a respective line so that data is received at the first and second inputs respectively. A first mode is selected and for the at least one pipeline, the first and second lines of that pipeline are selected such that the processing element of that pipeline receives data via the first and second lines of that pipeline, the first line being capable of supplying data that is different to the second line. A second mode is selected and for the at least one pipeline a line of another pipeline is selected, the second line of the at least one pipeline is selected and the same data at the second line is supplied as the first line.

Method and Apparatus for Scheduling of Instructions in a Multi-Strand Out-Of-Order Processor

20170235578 · 2017-08-17 ·

In accordance with embodiments disclosed herein, there are provided methods, systems, and apparatuses for scheduling instructions in a multi-strand out-of-order processor. For example, an apparatus for scheduling instructions in a multi-strand out-of-order processor includes an out-of-order instruction fetch unit to retrieve a plurality of interdependent instructions for execution from a multi-strand representation of a sequential program listing; an instruction scheduling unit to schedule the execution of the plurality of interdependent instructions based at least in part on operand synchronization bits encoded within each of the plurality of interdependent instructions; and a plurality of execution units to execute at least a subset of the plurality of interdependent instructions in parallel.

CACHE SYSTEMS AND CIRCUITS FOR SYNCING CACHES OR CACHE SETS

20220308886 · 2022-09-29 ·

Steven Jeffrey Wallach

A cache system, having a first cache, a second cache, and a logic circuit coupled to control the first cache and the second cache according to an execution type of a processor. When an execution type of a processor is a first type indicating non-speculative execution of instructions and the first cache is configured to service commands from a command bus for accessing a memory system, the logic circuit is configured to copy a portion of content cached in the first cache to the second cache. The cache system can include a configurable data bit. The logic circuit can be coupled to control the caches according to the bit. Alternatively, the caches can include cache sets. The caches can also include registers associated with the cache sets respectively. The logic circuit can be coupled to control the cache sets according to the registers.

HANDLING OVERSIZE STORE TO LOAD FORWARDING IN A PROCESSOR

20220035631 · 2022-02-03 ·

System includes at least one computer processor having a load store execution unit (LSU) for processing load and store instructions, wherein the LSU includes (a) a store queue having a plurality of entries for storing data, each store queue entry having a data field for storing the data, the data field having a width for storing the data; and (b) a gather buffer for holding data, wherein the processor is configured to: process oversize data larger than the width of the data field of the store queue, and process an oversize load instruction for oversize data by executing two passes through the LSU, a first pass through the LSU configured to store a first portion of the oversize data in the gather buffer and a second pass through the LSU configured to merge the first portion of the oversize data with a second portion of the oversize data.

OPERATION OF A MULTI-SLICE PROCESSOR WITH DYNAMIC CANCELING OF PARTIAL LOADS

20170277543 · 2017-09-28 ·

Operation of a multi-slice processor that includes a plurality of execution slices and a plurality of load/store slices, where the multi-slice processor is configured to dynamically cancel partial load operations by, among other steps, receiving a load instruction requesting multiple portions of data; receiving a load instruction requesting multiple portions of data; determining that a load of one portion of the requested multiple portions is unavailable to be issued; and responsive to determining that the load of the one portion of the requested multiple portions is unavailable to be issued, delaying issuance of the load instruction.

EFFICIENT WORK EXECUTION IN A PARALLEL COMPUTING SYSTEM

20170277567 · 2017-09-28 ·

A computing device performs parallel computations using a set of thread processing units and a memory shuffle engine. The memory shuffle engine includes a register array to store an array of data elements retrieved from a memory buffer, and an array of input selectors. According to a first control signal, each input selector transfers at least a first data element from a corresponding subset of the register array, which is coupled to the input selector via input lines, to one or more corresponding thread processing units. According to a second control signal, each input selector transfers at least a second data element from another subset of the register array, which is coupled to another input selector via other input lines, to the one or more corresponding thread processing units.

Patent classifications

G06F9/3824