IPIQ

G06F9/32

Generation and use of memory access instruction order encodings

11681531 · 2023-06-20 ·

Microsoft Technology Licensing, LLC

Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
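The issue check described in this abstract can be pictured with a small software model. The sketch below is illustrative only and is not taken from the patent: it assumes each memory instruction carries a hypothetical sequence identifier `lsid`, that bit `i` of `store_mask` marks identifier `i` as a store, that bit `i` of `store_vector` marks identifier `i` as executed, and that an instruction may issue once every earlier store has executed.

```python
from typing import Optional


def can_issue(lsid: int, store_mask: int, store_vector: int) -> bool:
    """Return True if no store with a lower identifier is still pending.

    Assumption (not from the abstract): ordering is enforced only against
    earlier stores, identified by the bits set in store_mask.
    """
    earlier = (1 << lsid) - 1  # bits for all identifiers below lsid
    pending_stores = store_mask & ~store_vector & earlier
    return pending_stores == 0


def next_to_issue(num_mem_ops: int, store_mask: int, store_vector: int) -> Optional[int]:
    """Pick the lowest-numbered unexecuted memory instruction that may issue."""
    for lsid in range(num_mem_ops):
        executed = (store_vector >> lsid) & 1
        if not executed and can_issue(lsid, store_mask, store_vector):
            return lsid
    return None


# Hypothetical block with four memory instructions; identifiers 0 and 2 are stores.
STORE_MASK = 0b0101
first = next_to_issue(4, STORE_MASK, 0b0000)
blocked = can_issue(3, STORE_MASK, 0b0001)  # store 2 has not executed yet
```

In this model, masking the store vector with the store mask means that loads never hold up later instructions; only the stores named in the mask do.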

Instruction and Logic for Total Store Elimination

20170351516 · 2017-12-07 ·

A processor includes a front end including circuitry to decode instructions from an instruction stream, a data cache unit including circuitry to cache data for the processor, and a binary translator. The binary translator includes circuitry to identify a redundant store in the instruction stream, mark the start and end of a region of the instruction stream with the redundant store, remove the redundant store, and store an amended instruction stream with the redundant store removed.
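As a rough illustration of the transformation this abstract describes, the sketch below removes stores from a toy instruction stream. It rests on assumptions that the abstract does not state: a store is treated as redundant when a later store overwrites the same address with no intervening load of that address, instructions are hypothetical `(opcode, address)` tuples, and the marked region simply spans the first to the last removed store.

```python
from typing import Dict, List, Optional, Set, Tuple

Instr = Tuple[str, str]  # (opcode, address); a hypothetical toy encoding


def eliminate_redundant_stores(
    stream: List[Instr],
) -> Tuple[List[Instr], Optional[Tuple[int, int]]]:
    """Return the amended stream and the (start, end) indices of the region."""
    redundant: Set[int] = set()
    last_unread_store: Dict[str, int] = {}  # address -> index of latest unread store
    for i, (op, addr) in enumerate(stream):
        if op == "store":
            if addr in last_unread_store:
                # The earlier store was never read before being overwritten.
                redundant.add(last_unread_store[addr])
            last_unread_store[addr] = i
        elif op == "load":
            # A load observes the stored value, so that store must be kept.
            last_unread_store.pop(addr, None)
    if not redundant:
        return list(stream), None
    region = (min(redundant), max(redundant))
    amended = [ins for i, ins in enumerate(stream) if i not in redundant]
    return amended, region


stream = [("store", "A"), ("store", "B"), ("store", "A"), ("load", "A"), ("store", "A")]
amended, region = eliminate_redundant_stores(stream)
```

Here only the first store to `A` is dropped: the second store to `A` is read by the load, and the final store is never overwritten.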

Configuration profiles for graphics processing unit

11514551 · 2022-11-29 ·

Intel Corporation

A system may include a graphics processing unit including a command counter. The system may also include a general-purpose processor to: in response to a detection of a timing signal, determine a count value of the command counter included in the graphics processing unit; determine a first threshold range of a plurality of threshold ranges that matches the determined count value of the command counter; select, based on the determined first threshold range, a first configuration profile of a plurality of configuration profiles for the graphics processing unit; and cause the graphics processing unit to use the selected first configuration profile. Other embodiments are described and claimed.
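The selection step in this abstract amounts to a range lookup. The following sketch assumes half-open ranges and uses invented profile names and threshold values purely for illustration; none of them come from the patent.

```python
from typing import Optional, Sequence, Tuple

# (low inclusive, high exclusive, profile name); all values are hypothetical.
ThresholdRange = Tuple[int, int, str]


def select_profile(count: int, ranges: Sequence[ThresholdRange]) -> Optional[str]:
    """Return the profile whose threshold range contains the command count."""
    for low, high, profile in ranges:
        if low <= count < high:
            return profile
    return None  # no range matched the count


RANGES = [
    (0, 100, "low_power"),
    (100, 1000, "balanced"),
    (1000, 1_000_000, "high_throughput"),
]

# Called once per timing signal with the counter value read from the GPU.
chosen = select_profile(250, RANGES)
```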

TECHNIQUES FOR EFFICIENTLY TRANSFERRING DATA TO A PROCESSOR

20230185570 · 2023-06-15 ·

A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

METHOD AND APPARATUS FOR REORDERING IN A NON-UNIFORM COMPUTE DEVICE

20170344367 · 2017-11-30 ·

Arm Limited

A data processing apparatus includes a multi-level memory system, one or more first processing units coupled to the memory system at a first level and one or more second processing units each coupled to the memory system at a second level. A first reorder buffer maintains data order during execution of instructions by the first and second processing units and a second reorder buffer maintains data order during execution of the instructions by an associated second processing unit. An entry in the first reorder buffer is configured, dependent upon an indicator bit, as an entry for a single instruction or a pointer to an entry in the second reorder buffer. An entry in the second reorder buffer includes instruction block start and end addresses and indicators of input and output registers. Instructions are released to a processing unit when all inputs, as indicated by the reorder buffers, are available.

Loop execution control for a multi-threaded, self-scheduling reconfigurable computing fabric using a reenter queue

11675598 · 2023-06-13 ·

Micron Technology, Inc.

Tony M. Brewer

Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array. A representative configurable circuit includes a configurable computation circuit and a configuration memory having a first, instruction memory storing a plurality of data path configuration instructions to configure a data path of the configurable computation circuit; and a second, instruction and instruction index memory storing a plurality of spoke instructions and data path configuration instruction indices for selection of a master synchronous input, a current data path configuration instruction, and a next data path configuration instruction for a next configurable computation circuit.

Loop thread order execution control of a multi-threaded, self-scheduling reconfigurable computing fabric

11675734 · 2023-06-13 ·

Micron Technology, Inc.

Tony M. Brewer

Tracking streaming engine vector predicates to control processor execution

11507520 · 2022-11-22 ·

Texas Instruments Incorporated

In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
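The loop control in this abstract can be modeled in software as follows. This is a simplified sketch, not the hardware mechanism: vectors and predicates are plain Python lists, the per-element work (a sum of valid lanes) is an arbitrary stand-in, and the function name is hypothetical.

```python
from typing import Iterable, Sequence


def sum_valid_elements(
    vectors: Iterable[Sequence[int]],
    predicates: Iterable[Sequence[bool]],
) -> int:
    """Process vectors until one arrives with no valid elements."""
    total = 0
    for data, pred in zip(vectors, predicates):
        if not any(pred):
            break  # no valid lanes in this vector: exit the loop
        # At least one valid lane: process only the lanes marked valid.
        total += sum(x for x, valid in zip(data, pred) if valid)
    return total


vectors = [[1, 2, 3, 4], [5, 6, 0, 0], [0, 0, 0, 0]]
predicates = [
    [True, True, True, True],
    [True, True, False, False],
    [False, False, False, False],
]
result = sum_valid_elements(vectors, predicates)
```

The second vector is only partly valid, so the loop continues and sums its first two lanes; the third vector has no valid lanes and ends the loop.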
