G06F2212/452

CONTROL STATE PRESERVATION DURING TRANSACTIONAL EXECUTION

A method includes saving a control state for a processor in response to commencing a transactional processing sequence, wherein saving the control state produces a saved control state. The method also includes permitting updates to the control state for the processor while executing the transactional processing sequence. Examples of updates to the control state include key mask changes, primary region table origin changes, primary segment table origin changes, CPU tracing mode changes, and interrupt mode changes. The method also includes restoring the control state for the processor to the saved control state in response to encountering a transactional error during the transactional processing sequence. In some embodiments, saving the control state comprises saving the current control state to memory corresponding to internal registers for an unused thread or another level of virtualization. A corresponding computer system and computer program product are also disclosed herein.

Systems and methods for policy execution processing

A system and method of processing instructions may comprise an application processing domain (APD) and a metadata processing domain (MTD). The APD may comprise an application processor executing instructions and providing related information to the MTD. The MTD may comprise a tag processing unit (TPU) having a cache of policy-based rules enforced by the MTD. The TPU may determine, based on policies being enforced and metadata tags and operands associated with the instructions, that the instructions are allowed to execute (i.e., are valid). The TPU may write, if the instructions are valid, the metadata tags to a queue. The queue may (i) receive operation output information from the application processing domain, (ii) receive, from the TPU, the metadata tags, (iii) output, responsive to receiving the metadata tags, resulting information indicative of the operation output information and the metadata tags; and (iv) permit the resulting information to be written to memory.

ZERO LATENCY PREFETCHING IN CACHES

This invention involves a cache system in a digital data processing apparatus including: a central processing unit core; a level one instruction cache; and a level two cache. The cache lines in the second level cache are twice the size of the cache lines in the first level instruction cache. The central processing unit core requests additional program instructions when needed via a request address. Upon a miss in the level one instruction cache that causes a hit in the upper half of a level two cache line, the level two cache supplies the upper half level cache line to the level one instruction cache. On a following level two cache memory cycle, the level two cache supplies the lower half of the cache line to the level one instruction cache. This cache technique thus prefetches the lower half level two cache line employing fewer resources than an ordinary prefetch.

MEMORY BUILT-IN DEVICE, PROCESSING METHOD, PARAMETER SETTING METHOD, AND IMAGE SENSOR DEVICE
20230236984 · 2023-07-27 · ·

A memory built-in device according to the present disclosure is a memory built-in device including a processor; a memory access controller; and a memory to be accessed in accordance with a process by the memory access controller, wherein the memory access controller is configured to read and write data to be used in an operation of a convolution arithmetic circuit from and to the memory according to designation of a parameter.

Systems and methods for distributed scalable ray processing

Ray tracing systems have computation units (“RACs”) adapted to perform ray tracing operations (e.g. intersection testing). There are multiple RACs. A centralized packet unit controls the allocation and testing of rays by the RACs. This allows RACs to be implemented without Content Addressable Memories (CAMs) which are expensive to implement, but the functionality of CAMs can still be achieved by implemented them in the centralized controller.

Streaming engine with early and late address and loop count registers to track architectural state

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. The streaming engine stores an early address of next to be fetched data elements and a late address of a data element in the stream head register for each of the nested loops. The streaming engine stores an early loop counts of next to be fetched data elements and a late loop counts of a data element in the stream head register for each of the nested loops.

Feature map and weight selection method and accelerating device

The present disclosure provides a processing device including: a coarse-grained pruning unit configured to perform coarse-grained pruning on a weight of a neural network to obtain a pruned weight, an operation unit configured to train the neural network according to the pruned weight. The coarse-grained pruning unit is specifically configured to select M weights from the weights of the neural network through a sliding window, and when the M weights meet a preset condition, all or part of the M weights may be set to 0. The processing device can reduce the memory access while reducing the amount of computation, thereby obtaining an acceleration ratio and reducing energy consumption.

METHOD AND APPARATUS TO SORT A VECTOR FOR A BITONIC SORTING ALGORITHM
20230229448 · 2023-07-20 ·

A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.

Computing device and method

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

ARITHMETIC PROCESSOR AND METHOD FOR OPERATING ARITHMETIC PROCESSOR
20230010353 · 2023-01-12 · ·

An arithmetic processor including a plurality of core groups each including a plurality of cores and a cache unit, a plurality of home agents each including a tag directory and a store command queue and a store command queue. The store command queue enters the received store request to the entry queue in order of reception, the cache unit stores the data of the store request in a data RAM. The store command queue sets a data ownership acquisition flag of the store request to valid when obtaining a data ownership of the store request and issues a top-of-queue notification to the cache control unit when the flag of the top-of-queue entry is valid. In response to the top-of-queue notification, the cache unit update a cache tag to modified state and issue a store request completion notification.