G06F9/321

Configuration profiles for graphics processing unit

A system may include a graphics processing unit including a command counter. The system may also include a general-purpose processor to: in response to a detection of a timing signal, determine a count value of the command counter included in the graphics processing unit; determine a first threshold range of a plurality of threshold ranges that matches the determined count value of the command counter; select, based on the determined first threshold range, a first configuration profile of a plurality of configuration profiles for the graphics processing unit; and cause the graphics processing unit to use the selected first configuration profile. Other embodiments are described and claimed.
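A minimal sketch of the selection step the general-purpose processor performs on each timing signal. It assumes the threshold ranges are half-open and non-overlapping. The `ConfigProfile` fields, the range boundaries, the profile names, and the `Gpu` stand-in class are all hypothetical; the abstract does not say what a profile contains or where the thresholds lie.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfigProfile:
    name: str
    core_clock_mhz: int  # hypothetical profile setting


# Hypothetical threshold ranges, each half-open [low, high), mapped to a profile.
THRESHOLD_PROFILES = [
    (0, 1_000, ConfigProfile("low_power", 300)),
    (1_000, 10_000, ConfigProfile("balanced", 800)),
    (10_000, float("inf"), ConfigProfile("performance", 1_500)),
]


class Gpu:
    """Hypothetical stand-in for the GPU: a command counter and an active profile."""

    def __init__(self) -> None:
        self.command_counter = 0
        self.active_profile: Optional[ConfigProfile] = None

    def apply_profile(self, profile: ConfigProfile) -> None:
        self.active_profile = profile


def select_profile(count: int, table=THRESHOLD_PROFILES) -> ConfigProfile:
    """Return the profile whose threshold range contains the count value."""
    for low, high, profile in table:
        if low <= count < high:
            return profile
    raise ValueError(f"no threshold range matches count {count}")


def on_timing_signal(gpu: Gpu) -> ConfigProfile:
    """Handler run by the general-purpose processor when the timing signal fires."""
    count = gpu.command_counter       # read the GPU's command counter
    profile = select_profile(count)   # find the matching threshold range
    gpu.apply_profile(profile)        # have the GPU use the selected profile
    return profile
```

For example, a `Gpu` whose `command_counter` is 4,200 at the timing signal falls in the second range and is switched to the `"balanced"` profile.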

MANAGING AN EFFECTIVE ADDRESS TABLE IN A MULTI-SLICE PROCESSOR

Methods and apparatus for managing an effective address table (EAT) in a multi-slice processor including receiving, from an instruction sequence unit, a next-to-complete instruction tag (ITAG); obtaining, from the EAT, a first ITAG from a tail-plus-one EAT row, wherein the EAT comprises a tail EAT row that precedes the tail-plus-one EAT row; determining, based on a comparison of the next-to-complete ITAG and the first ITAG, that the tail EAT row has completed; and retiring the tail EAT row based on the determination.
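A minimal sketch of the retirement check, assuming each EAT row records only the ITAG of its first instruction and an effective address, and that ITAGs increase monotonically. A hardware EAT would also have to handle ITAG wraparound, which this sketch ignores. The class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class EatRow:
    first_itag: int         # ITAG of the first instruction tracked by this row
    effective_address: int  # effective address recorded for the row


class EffectiveAddressTable:
    def __init__(self) -> None:
        # rows[0] is the tail (oldest) row; rows[1] is the tail-plus-one row.
        self.rows: Deque[EatRow] = deque()

    def allocate(self, first_itag: int, effective_address: int) -> None:
        self.rows.append(EatRow(first_itag, effective_address))

    def retire_tail_if_complete(self, ntc_itag: int) -> Optional[EatRow]:
        """Retire the tail row if the next-to-complete ITAG shows it has completed."""
        if len(self.rows) < 2:
            return None  # no tail-plus-one row to compare against
        first_itag = self.rows[1].first_itag
        # The oldest incomplete instruction is at or past the start of the
        # tail-plus-one row, so every instruction in the tail row has completed.
        if ntc_itag >= first_itag:
            return self.rows.popleft()
        return None
```

The comparison needs only the first ITAG of the tail-plus-one row: any instruction in the tail row is older than that ITAG, so once the next-to-complete pointer reaches it, the whole tail row is done.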

Replacement policy information for training table used by prefetch circuitry

Prefetch circuitry generates prefetch requests to prefetch information into a cache, based on prediction information trained using a training table comprising training entries. A given training entry associates a program counter indication of a trigger training memory access; a region indication identifying the memory address region that contains the target address specified by the trigger training memory access; prediction information trained on subsequent training memory access requests whose target addresses lie in the same region as that of the trigger training memory access; and first and second replacement policy information. The first replacement policy information is used when replacing an entry with another entry for the same program counter indication but a different region. The second replacement policy information is used when replacing an entry with another entry for a different program counter indication. This helps to increase prediction performance and reduce power consumption.
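A minimal sketch of a training table with two separate pieces of replacement information per entry. Several choices here are assumptions, not taken from the abstract: both pieces are modelled as recency timestamps (LRU), the table has a hypothetical `capacity` and per-PC region limit `max_regions_per_pc`, regions are 4 KiB (`region_bits=12`), and "prediction information" is simply the set of offsets seen in the region.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Set


@dataclass
class TrainingEntry:
    pc: int
    region: int
    prediction: Set[int] = field(default_factory=set)  # offsets seen in the region
    rp_same_pc: int = 0   # first replacement info: recency among same-PC entries
    rp_other_pc: int = 0  # second replacement info: recency of this entry's PC


class TrainingTable:
    def __init__(self, capacity: int = 4, max_regions_per_pc: int = 2,
                 region_bits: int = 12) -> None:
        if max_regions_per_pc > capacity:
            raise ValueError("max_regions_per_pc must not exceed capacity")
        self.capacity = capacity
        self.max_regions_per_pc = max_regions_per_pc
        self.region_bits = region_bits
        self.entries: List[TrainingEntry] = []
        self._clock = count(1)

    def _touch(self, entry: TrainingEntry) -> None:
        now = next(self._clock)
        entry.rp_same_pc = now
        for e in self.entries:  # the PC as a whole was just used
            if e.pc == entry.pc:
                e.rp_other_pc = now

    def train(self, pc: int, address: int) -> TrainingEntry:
        region = address >> self.region_bits
        offset = address & ((1 << self.region_bits) - 1)
        for e in self.entries:
            if e.pc == pc and e.region == region:
                e.prediction.add(offset)  # subsequent access in the same region
                self._touch(e)
                return e
        # Trigger access for a new (pc, region) pair: pick a victim if needed.
        same_pc = [e for e in self.entries if e.pc == pc]
        if len(same_pc) >= self.max_regions_per_pc:
            # Same PC, different region: use the first replacement information.
            victim = min(same_pc, key=lambda e: e.rp_same_pc)
            self.entries.remove(victim)
        elif len(self.entries) >= self.capacity:
            # Different PC: use the second replacement information.
            others = [e for e in self.entries if e.pc != pc]
            victim = min(others, key=lambda e: (e.rp_other_pc, e.rp_same_pc))
            self.entries.remove(victim)
        entry = TrainingEntry(pc, region, {offset})
        self.entries.append(entry)
        self._touch(entry)
        return entry
```

Keeping the two policies separate means a PC that streams through many regions only evicts its own older regions, rather than pushing other PCs' entries out of the table.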

TECHNOLOGY TO LEARN AND OFFLOAD COMMON PATTERNS OF MEMORY ACCESS AND COMPUTATION

An example system includes memory; a central processing unit (CPU) to execute first operations; in-memory execution circuitry in the memory; and detector software to cause offloading of second operations to the in-memory execution circuitry, the in-memory execution circuitry to execute the second operations in parallel with the CPU executing the first operations.

CACHE-BASED COMMUNICATION FOR TRUSTED EXECUTION ENVIRONMENTS
20230168954 · 2023-06-01

A method executes inter-enclave communication via the cache memory of a processor. The method includes: instantiating a first enclave configured to execute a first communication thread, which is configured to read/write data to the cache memory; instantiating a second enclave configured to execute a second communication thread, which is configured to read/write data to the cache memory; executing, by the first enclave, the first communication thread to send message data to the second enclave, where executing the first communication thread comprises writing the message data to the cache memory; and executing, by the second enclave, the second communication thread to receive the message data. Executing the second communication thread can include monitoring the cache memory to determine whether the message data is being sent and, upon determining that it is, reading from the cache memory to receive the message data.

Data storage optimization for non-volatile memory

Non-volatile devices may be configured such that a clear operation on a single bit clears an entire block of bits. The representation of particular data structures may be optimized to reduce the number of clear operations required to store the representation in non-volatile memory. A data schema may indicate that a data structure of an application may be optimized for storage in non-volatile memory. A translation layer may convert an application-level representation of a data value associated with the data structure to an optimized storage representation of the data value before storing the optimized storage representation in non-volatile memory.
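A minimal sketch of why the storage representation matters, assuming the whole representation sits in one block, that setting a bit is free, and that any 1-to-0 transition costs one clear of the block. The thermometer (unary) code used as the "optimized" representation is an illustrative choice; the abstract does not name a specific encoding, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence

Bits = List[int]


def needs_block_clear(old: Sequence[int], new: Sequence[int]) -> bool:
    """Setting a bit is cheap; any 1 -> 0 transition forces a clear of the block."""
    return any(o == 1 and n == 0 for o, n in zip(old, new))


def binary_encode(value: int, width: int) -> Bits:
    """Application-level representation: plain binary, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def unary_encode(value: int, width: int) -> Bits:
    """Hypothetical storage representation: thermometer code (value = number of 1s)."""
    if not 0 <= value <= width:
        raise ValueError("value does not fit in the thermometer code")
    return [1] * value + [0] * (width - value)


def unary_decode(bits: Sequence[int]) -> int:
    return sum(bits)


def count_block_clears(values: Sequence[int],
                       encode: Callable[[int, int], Bits], width: int) -> int:
    stored = [0] * width  # assume the block starts out cleared
    clears = 0
    for value in values:
        new = encode(value, width)
        if needs_block_clear(stored, new):
            clears += 1  # clear the whole block, then set the bits of `new`
        stored = new
    return clears
```

Storing a counter that steps from 0 to 7 costs 3 block clears in 3-bit binary (at 1→2, 3→4 and 5→6) but none in a 7-bit thermometer code, trading extra bits for fewer clears.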

COMPUTING DEVICE
20230176865 · 2023-06-08

The present disclosure relates to a computing device. A computing device includes an arithmetic processing circuit configured to execute a program, and a program memory for storing the program. Each instruction in the program is 16 bits long. The program memory has a first memory area and a second memory area associated with higher addresses than the first memory area. The arithmetic processing circuit has a 16-bit program counter that specifies the address to be read, and it reads and executes the instruction at the address given by the upper 15 bits of the program counter in a target memory area, where the target memory area is whichever of the first and second memory areas corresponds to the value of the least significant bit of the program counter.
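A minimal sketch of the address decode. It assumes a least significant bit of 0 selects the first memory area and 1 selects the second (the abstract only says the bit value corresponds to an area), and it treats the upper 15 bits as an instruction index within the selected area. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def decode_pc(pc: int) -> Tuple[int, int]:
    """Split a 16-bit program counter into (memory area, instruction address)."""
    if not 0 <= pc <= 0xFFFF:
        raise ValueError("program counter must be a 16-bit value")
    area = pc & 1      # least significant bit selects the target memory area
    address = pc >> 1  # upper 15 bits address an instruction within that area
    return area, address


def fetch(pc: int, first_area: Sequence[int], second_area: Sequence[int]) -> int:
    """Read the 16-bit instruction the program counter points at."""
    area, address = decode_pc(pc)
    target = second_area if area else first_area
    return target[address] & 0xFFFF
```

Under these assumptions each area offers 2^15 = 32,768 instruction slots, so the 16-bit program counter reaches 65,536 16-bit instructions across the two areas.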

Memory Systems and Memory Control Methods

Memory systems and memory control methods are described. According to one aspect, a memory system includes a plurality of memory cells individually configured to store data, program memory configured to store a plurality of first executable instructions which are ordered according to a first instruction sequence and a plurality of second executable instructions which are ordered according to a second instruction sequence, substitution circuitry configured to replace one of the first executable instructions with a substitute executable instruction, and a control unit configured to execute the first and second executable instructions to control reading and writing of the data with respect to the memory cells, wherein the control unit is configured to execute the first executable instructions according to the first instruction sequence, to execute the substitute executable instruction after the execution of the first executable instructions, and to execute the second executable instructions according to the second instruction sequence as a result of execution of the substitute executable instruction.
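A minimal sketch of the substitution mechanism, modelled as a patch table consulted on every instruction fetch. The opcodes, the choice of a `JUMP` as the substitute instruction, and the class names are hypothetical; READ and WRITE are only recorded in a trace, and their effect on the memory cells is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str       # hypothetical opcodes: "READ", "WRITE", "JUMP", "HALT"
    arg: int = 0  # cell index for READ/WRITE, target address for JUMP


class ControlUnit:
    def __init__(self, program: List[Instr],
                 substitutions: Optional[Dict[int, Instr]] = None) -> None:
        self.program = program
        # Substitution circuitry: address -> instruction that replaces the stored one.
        self.substitutions = substitutions or {}

    def fetch(self, addr: int) -> Instr:
        return self.substitutions.get(addr, self.program[addr])

    def run(self, start: int = 0, max_steps: int = 1_000) -> List[Tuple[int, str]]:
        """Execute from `start`; return the (address, opcode) trace.

        READ/WRITE are only recorded here; their effect on the cells is omitted.
        """
        trace = []
        pc = start
        for _ in range(max_steps):
            instr = self.fetch(pc)
            trace.append((pc, instr.op))
            if instr.op == "HALT":
                return trace
            pc = instr.arg if instr.op == "JUMP" else pc + 1
        raise RuntimeError("step limit exceeded")
```

With a first sequence at addresses 0 to 2 ending in `HALT` and a second sequence at 3 to 5, substituting `Instr("JUMP", 3)` at address 2 makes the control unit run the first sequence, execute the substitute, and continue into the second sequence without changing the stored program.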

Processor with a Program Counter Increment Based on Decoding of Predecode Bits
20170277540 · 2017-09-28

A processor includes: an instruction fetch portion configured to fetch simultaneously a plurality of fixed-length instructions in accordance with a program counter; an instruction predecoder configured to predecode specific fields in a part of the plurality of fixed-length instructions; and a program counter management portion configured to control an increment of the program counter in accordance with a result of the predecoding.
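A minimal sketch of a predecode-driven program counter increment. The abstract does not say which fields are predecoded or how the result changes the increment, so everything here is an assumption: 4-byte instructions fetched four at a time, a hypothetical 6-bit branch opcode in bits [31:26], and a policy of ending the fetch group after the first branch. Computing the branch target is left to a later stage and is not modelled.

```python
from typing import Sequence

INSTR_BYTES = 4       # hypothetical fixed instruction length
FETCH_WIDTH = 4       # hypothetical number of instructions fetched per cycle
BRANCH_OPCODE = 0x3F  # hypothetical 6-bit opcode in bits [31:26]


def predecode_is_branch(word: int) -> bool:
    """Predecode only the opcode field, not the whole instruction."""
    return ((word >> 26) & 0x3F) == BRANCH_OPCODE


def next_pc(pc: int, fetched: Sequence[int]) -> int:
    """Advance the PC past the fetched group, stopping after the first branch."""
    # Only the first FETCH_WIDTH - 1 slots are predecoded: a branch in the last
    # slot ends the group anyway, so it cannot change the increment.
    for slot, word in enumerate(fetched[:FETCH_WIDTH - 1]):
        if predecode_is_branch(word):
            return pc + (slot + 1) * INSTR_BYTES
    return pc + len(fetched) * INSTR_BYTES
```

Predecoding only a subset of the fetched instructions keeps the logic on the fetch path small, which is one plausible motivation for predecoding "a part of" the group.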

DUAL DATA STREAMS SHARING DUAL LEVEL TWO CACHE ACCESS PORTS TO MAXIMIZE BANDWIDTH UTILIZATION
20170249150 · 2017-08-31

A streaming engine employed in a digital data processor specifies fixed first and second read-only data streams. Corresponding stream address generators produce the addresses of the data elements of the two streams. Corresponding stream head registers store the data elements next to be supplied to functional units for use as operands. The two streams share two memory ports. A toggling preference of stream to port ensures fair allocation. The arbiters permit one stream to borrow the other's interface when the other interface is idle. Thus one stream may issue two memory requests, one on each memory port, if the other stream is idle. This spreads the bandwidth demand for each stream across both interfaces, ensuring neither interface becomes a bottleneck.
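A minimal per-cycle arbitration sketch. It assumes the stream-to-port preference toggles every cycle and that a stream may borrow the other stream's preferred port only when that stream issues nothing on it; the abstract does not give the toggling granularity or the arbiter structure, and `arbitrate` is a hypothetical name.

```python
from typing import Dict, Sequence


def arbitrate(cycle: int, pending: Sequence[int]) -> Dict[int, int]:
    """Grant the two memory ports to the two streams for one cycle.

    pending[s] is the number of requests stream s (0 or 1) has ready.
    Returns a mapping {port: stream} for the requests issued this cycle.
    """
    # Toggling preference: stream 0 prefers port 0 on even cycles and port 1
    # on odd cycles; stream 1 always prefers the other port.
    preferred = {0: cycle % 2, 1: 1 - cycle % 2}
    remaining = list(pending)
    grants: Dict[int, int] = {}
    for stream in (0, 1):
        if remaining[stream] > 0:
            grants[preferred[stream]] = stream
            remaining[stream] -= 1
    # Borrowing: if the other stream left its preferred port idle, a stream
    # with more work may issue a second request on that port.
    for stream in (0, 1):
        borrowed = preferred[1 - stream]
        if remaining[stream] > 0 and borrowed not in grants:
            grants[borrowed] = stream
            remaining[stream] -= 1
    return grants
```

When both streams are busy, each gets one port and the assignment swaps every cycle; when one stream is idle, the other issues on both ports, which is the bandwidth-spreading behaviour the abstract describes.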