G06F9/32

Tracking streaming engine vector predicates to control processor execution

In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.

GENERATING A MASK VECTOR FOR DETERMINING A PROCESSOR INSTRUCTION ADDRESS USING AN INSTRUCTION TAG IN A MULTI-SLICE PROCESSOR
20170344379 · 2017-11-30 ·

Methods and apparatus for generating a mask vector for determining a processor instruction address using an instruction tag (ITAG) in a multi-slice processor including receiving a first ITAG value and an interrupt ITAG value; generating the mask vector divided into mask sections comprising a plurality of elements with unset flags; for each mask section: if the mask section comprises the first ITAG value, setting a flag of an element in the mask section corresponding to the first ITAG value; if the mask section comprises the interrupt ITAG value, setting a flag of an element in the mask section corresponding to the interrupt ITAG value; setting each flag of each element in the mask vector between the element in the mask vector corresponding to the first ITAG value and the element in the mask vector corresponding to the interrupt ITAG value; and providing the mask vector to an instruction fetch unit.

Replacement policy information for training table used by prefetch circuitry

Prefetch circuitry generates prefetch requests to prefetch information to a cache, based on prediction information trained using a training table comprising training entries. A given training entry associates a program counter indication associated with a trigger training memory access, a region indication indicative of a memory address region comprising a target address specified by the trigger training memory access, corresponding prediction information trained based on subsequent training memory access requests specifying target addresses in the same region as the target address of the trigger training memory access, and first and second replacement policy information. The first replacement policy information is used for replacement of an entry with another entry for the same program counter indication but different region. The second replacement policy information is used for replacement of an entry with another entry for a different program counter indication. This helps to increase prediction performance and reduce power consumption.

Microprocessor using compressed and uncompressed microcode storage

A microprocessor includes compressed and uncompressed microcode memory storages, having N-bit wide and M-bit wide addressable words, respectively, where N<M. The microprocessor also includes a fetch unit, an instruction translator, and an execution stage. When the instruction translator receives an architectural instruction, it writes information identifying source and destination registers specified by the architectural instruction to an indirection register. It also issues one or more fetch addresses to retrieve a sequence of one or more microcode instructions from one of the uncompressed microcode memory storage and the compressed microcode memory storage to implement the architectural instruction. It merges information in the indirection register with the sequence of one or more microcode instructions to generate a sequence of one or more implementing microinstructions.

Microprocessor using compressed and uncompressed microcode storage

A microprocessor includes compressed and uncompressed microcode memory storages, having N-bit wide and M-bit wide addressable words, respectively, where N<M. The microprocessor also includes a fetch unit, an instruction translator, and an execution stage. When the instruction translator receives an architectural instruction, it writes information identifying source and destination registers specified by the architectural instruction to an indirection register. It also issues one or more fetch addresses to retrieve a sequence of one or more microcode instructions from one of the uncompressed microcode memory storage and the compressed microcode memory storage to implement the architectural instruction. It merges information in the indirection register with the sequence of one or more microcode instructions to generate a sequence of one or more implementing microinstructions.

Method and apparatus for permuting streamed data elements

A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.

Method and apparatus for permuting streamed data elements

A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.

AUTOMATED RUNTIME CONFIGURATION FOR DATAFLOWS

Methods, systems and computer program products are provided for automated runtime configuration for dataflows to automatically select or adapt a runtime environment or resources to a dataflow plan prior to execution. Metadata generated for dataflows indicates dataflow information, such as numbers and types of sources, sinks and operations, and the amount of data being consumed, processed and written. Weighted dataflow plans are created from unweighted dataflow plans based on metadata. Weights that indicate operation complexity or resource consumption are generated for data operations. A runtime environment or resources to execute a dataflow plan is/are selected based on the weighted dataflow and/or a maximum flow. Preferences may be provided to influence weighting and runtime selections.

TECHNOLOGY TO LEARN AND OFFLOAD COMMON PATTERNS OF MEMORY ACCESS AND COMPUTATION

An example system includes memory; a central processing unit (CPU) to execute first operations; in-memory execution circuitry in the memory; and detector software to cause offloading of second operations to the in-memory execution circuitry, the in-memory execution circuitry to execute the second operations in parallel with the CPU executing the first operations.

CACHE-BASED COMMUNICATION FOR TRUSTED EXECUTION ENVIRONMENTS
20230168954 · 2023-06-01 ·

A method executes inter-enclave communication via cache memory of a processor. The method includes: instantiating a first enclave such that it is configured to execute a first communication thread, which is configured to read/write data to the cache memory; instantiating a second enclave such that it is configured to execute a second communication thread, which is configured to read/write data to cache memory; executing, by the first enclave, the first communication thread to send message data to the second enclave, executing the first communication thread comprising writing the message data to the cache memory; and executing, by the second enclave, the second communication thread to receive the message data. Executing the second communication thread can include: monitoring the cache memory to determine whether the data message is being sent; and based upon determining the data message is being sent, reading from the cache memory to receive the data message.