G06F9/3004

Apparatus for Processor with Macro-Instruction and Associated Methods

An apparatus includes an array processor to process array data in response to a set of macro-instructions. A macro-instruction in the set of macro-instructions performs loop operations, array iteration operations, and/or arithmetic logic unit (ALU) operations.
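The abstract does not say how a macro-instruction is encoded. As a rough illustration only, the Python sketch below models a hypothetical `MacroInstruction`. Its `op`, `count`, and `stride` fields are assumptions, and together they fold a loop, an array walk, and an ALU operation into a single instruction that `execute` interprets.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical ALU operations a macro-instruction can select.
ALU_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


@dataclass
class MacroInstruction:
    op: str          # ALU operation applied to each element pair
    count: int       # loop trip count
    stride: int = 1  # step between array elements


def execute(macro: MacroInstruction, a: List[int], b: List[int]) -> List[int]:
    """Interpret one macro-instruction over two source arrays."""
    alu = ALU_OPS[macro.op]
    result = []
    index = 0
    for _ in range(macro.count):                # loop operation
        result.append(alu(a[index], b[index]))  # ALU operation
        index += macro.stride                   # array iteration operation
    return result
```

For example, `execute(MacroInstruction("add", 3), [1, 2, 3], [10, 20, 30])` returns `[11, 22, 33]` from a single instruction rather than a loop of separate load, add, and branch instructions.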

Compression assist instructions
11537399 · 2022-12-27

In an embodiment, a processor supports one or more compression assist instructions that compression software may use to improve the processor's performance during compression and decompression. That is, with the compression assist instructions, the compression/decompression task may be performed more rapidly and with less power than without them. In some cases, the cost of a more effective, more complex compression algorithm may be reduced to the cost of a less effective, less complex one.

ASSOCIATIVELY INDEXED CIRCULAR BUFFER
20220405097 · 2022-12-22

Some embodiments of the present disclosure provide an associatively indexed circular buffer (ACB). The ACB may be viewed as a dynamically allocatable memory structure that offers in-order data access (e.g., first-in, first-out, or FIFO) or random-order data access at a fixed, relatively low latency. The ACB includes a data store built from non-contiguous storage. To manage pushing data to, and popping data from, the data store, the ACB includes a contiguous pointer generator, a content addressable memory (CAM), and a free pool.
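A minimal Python sketch of the ACB idea, assuming a software model rather than hardware. The class name, the `push`/`take`/`pop` methods, and the use of unbounded integers as logical indices are all assumptions (hardware pointers would wrap). A dict stands in for the CAM, a deque for the free pool, and a list for the non-contiguous data store.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional


class AssociativelyIndexedCircularBuffer:
    """Sketch of an ACB: non-contiguous store + pointer generator + CAM + free pool."""

    def __init__(self, capacity: int) -> None:
        self.store: List[Optional[Any]] = [None] * capacity  # non-contiguous data store
        self.free_pool: Deque[int] = deque(range(capacity))  # physical slots available
        self.cam: Dict[int, int] = {}                        # logical index -> physical slot
        self.head = 0  # oldest logical index that may still be live
        self.tail = 0  # contiguous pointer generator: next logical index to hand out

    def push(self, value: Any) -> int:
        if not self.free_pool:
            raise OverflowError("ACB full")
        slot = self.free_pool.popleft()  # any free slot; need not be adjacent to the last
        self.store[slot] = value
        index = self.tail
        self.cam[index] = slot           # associate the logical index with the slot
        self.tail += 1
        return index

    def take(self, index: int) -> Any:
        """Random-order access: look up the CAM, return the value, free the slot."""
        slot = self.cam.pop(index)       # KeyError if the index is not live
        value = self.store[slot]
        self.store[slot] = None
        self.free_pool.append(slot)
        return value

    def pop(self) -> Any:
        """In-order (FIFO) access: take the oldest live logical index."""
        while self.head < self.tail and self.head not in self.cam:
            self.head += 1               # skip entries already taken out of order
        if self.head == self.tail:
            raise IndexError("ACB empty")
        value = self.take(self.head)
        self.head += 1
        return value
```

Because the CAM, not the slot position, defines order, a freed slot can be reused anywhere in the store while FIFO order is still recovered from the contiguous logical indices.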

Display control apparatus, non-transitory recording medium and display controlling method for creating first tag, second tag not overlapping other tags displayed, and indicator correlating second tag with first tag
11531465 · 2022-12-20

A display control apparatus includes a processor and a storage storing instructions that, when executed by the processor, control the processor as follows. The processor determines whether input of one or more first operations, correlated to a position within the screen of a display, is accepted. If it is, the processor specifies a first position corresponding to that position, newly creates a first-kind display area according to the first operations, and displays it at the specified first position. The processor then determines whether input of one or more second operations, correlated to the displayed first-kind display area, is accepted. If it is, the processor specifies a second position within the screen, newly creates a second-kind display area according to the second operations, and displays it at the specified second position so that its correlation with the first-kind display area can be recognized.

Smallest or largest value element determination
11526355 · 2022-12-13

Examples of the present disclosure provide apparatuses and methods for smallest value element or largest value element determination in memory. An example method comprises: storing an elements vector comprising a plurality of elements in a group of memory cells coupled to an access line of an array; performing, using sensing circuitry coupled to the array, a logical operation using a first vector and a second vector as inputs, with a result of the logical operation being stored in the array as a result vector; updating the result vector responsive to performing a plurality of subsequent logical operations using the sensing circuitry; and providing an indication of which of the plurality of elements have one of a smallest value and a largest value.
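One common way to find extreme values using only bulk logical operations is a bit-serial search over bit-slice vectors. The Python sketch below illustrates that technique; it is not necessarily the exact sequence of operations the patent describes. The function names and the unsigned-integer element format are assumptions. Each iteration ANDs the result vector with a bit-slice (or its complement) and keeps the narrowed vector only if some candidate survives.

```python
from typing import List


def bit_slices(elements: List[int], width: int) -> List[int]:
    """Slice b has bit i set iff elements[i] has bit b set (a column-wise layout)."""
    slices = []
    for b in range(width):
        vec = 0
        for i, e in enumerate(elements):
            if (e >> b) & 1:
                vec |= 1 << i
        slices.append(vec)
    return slices


def extreme_mask(elements: List[int], width: int, largest: bool = True) -> int:
    """Return a vector whose set bits mark the largest (or smallest) unsigned elements."""
    all_ones = (1 << len(elements)) - 1
    result = all_ones                    # every element starts as a candidate
    slices = bit_slices(elements, width)
    for b in reversed(range(width)):     # most significant bit first
        s = slices[b] if largest else (~slices[b] & all_ones)
        narrowed = result & s            # logical AND of result vector and bit-slice
        if narrowed:                     # keep only candidates that win on this bit
            result = narrowed
    return result
```

For `[5, 9, 3, 9]` with `width=4`, the largest-value mask is `0b1010` (both 9s) and the smallest-value mask is `0b0100` (the 3).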

Thread associated memory allocation and memory architecture aware allocation
11520633 · 2022-12-06

A method and system for thread-aware, class-aware, and topology-aware memory allocation. Embodiments include a compiler configured to generate compiled code (e.g., for a runtime) that, when executed, allocates memory on a per-class, per-thread basis and is aware of the system topology (e.g., non-uniform memory access (NUMA)). Embodiments can further include an executable configured to allocate, at runtime, a respective memory pool for each instance of a class for each thread. Each memory pool is local to the processor, core, etc., on which the respective thread executes.
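A hypothetical Python sketch of per-class, per-thread pooling. `PerThreadClassPools` keeps a separate free list for each class on each thread via `threading.local`, so an object released on one thread is only reused by that thread. Topology awareness is omitted here; a native runtime might instead back each pool with memory from the thread's NUMA node (for example via libnuma's `numa_alloc_onnode`). The class and method names are assumptions, and pooled classes are assumed to have no-argument constructors.

```python
import threading
from typing import Any, Dict, List


class PerThreadClassPools:
    """Hypothetical allocator keeping one free list per (thread, class) pair."""

    def __init__(self) -> None:
        self._local = threading.local()  # attributes are private to each thread

    def _pools(self) -> Dict[type, List[Any]]:
        pools = getattr(self._local, "pools", None)
        if pools is None:                 # first use on this thread: create its pools
            pools = self._local.pools = {}
        return pools

    def allocate(self, cls: type) -> Any:
        pool = self._pools().setdefault(cls, [])
        return pool.pop() if pool else cls()  # reuse a released object, else construct one

    def release(self, obj: Any) -> None:
        self._pools().setdefault(type(obj), []).append(obj)
```

Keeping pools thread-local avoids locking on the allocation fast path, which is the usual motivation for this kind of design.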

Writing store data of multiple store operations into a cache line in a single cycle

A load-store unit (LSU) of a processor core determines whether or not a second store operation specifies an adjacent update to that specified by a first store operation. The LSU additionally determines whether the total store data length of the first and second store operations exceeds a maximum size. Based on determining the second store operation specifies an adjacent update and the total store data length does not exceed the maximum size, the LSU merges the first and second store operations and writes merged store data into a same write block of a cache. Based on determining that the total store data length exceeds the maximum size, the LSU splits the second store operation into first and second portions, merges the first portion with the first store operation, and writes store data of the partially merged store operation into the write block.
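The merge-or-split decision can be modeled in a few lines of Python. This is a sketch under stated assumptions: `Store` and `merge_stores` are hypothetical names, "adjacent" is taken to mean that the second store begins at the byte immediately after the first, and write-block alignment is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Store:
    addr: int
    data: bytes


def merge_stores(first: Store, second: Store, max_size: int) -> List[Store]:
    """Return the write(s) issued for two consecutive stores (hypothetical model)."""
    if second.addr != first.addr + len(first.data):
        return [first, second]                          # not adjacent: two separate writes
    if len(first.data) + len(second.data) <= max_size:
        return [Store(first.addr, first.data + second.data)]  # fully merged
    room = max(0, max_size - len(first.data))           # bytes of second that still fit
    head, tail = second.data[:room], second.data[room:]
    return [Store(first.addr, first.data + head),       # partially merged write
            Store(second.addr + room, tail)]            # remainder written separately
```

With two 2-byte stores at `0x100` and `0x102`, a `max_size` of 8 yields one 4-byte write, while a `max_size` of 3 yields a 3-byte merged write plus a 1-byte write at `0x103`.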

CRYPTOGRAPHIC COMPUTING USING ENCRYPTED BASE ADDRESSES AND USED IN MULTI-TENANT ENVIRONMENTS

Technologies disclosed herein provide cryptographic computing with cryptographically encoded pointers in multi-tenant environments. An example method comprises executing, by a trusted runtime, first instructions to generate a first address key for a private memory region in memory and to generate a first cryptographically encoded pointer to that private memory region. Generating the first cryptographically encoded pointer includes storing first context information associated with the private memory region in first bits of the pointer and performing a cryptographic algorithm on a slice of a first linear address of the private memory region based, at least in part, on the first address key and a first tweak that includes the first context information. The method further includes permitting a first tenant in the multi-tenant environment to access the first address key and the first cryptographically encoded pointer to the private memory region.
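To make the pointer encoding concrete, the sketch below encrypts a 24-bit slice of a 56-bit linear address with a toy four-round Feistel network keyed by an address key, using 8 bits of context information stored in the pointer's top byte as the tweak. The bit layout, the function names, and the HMAC-SHA256 round function are all assumptions chosen for illustration. This is not a vetted tweakable block cipher and not the patent's algorithm.

```python
import hashlib
import hmac

# Hypothetical 64-bit pointer layout (an assumption, not the patent's):
#   bits 56..63: context information (also used as the tweak)
#   bits 32..55: encrypted slice of the linear address
#   bits  0..31: unencrypted low address bits
CTX_SHIFT = 56
SLICE_SHIFT, SLICE_BITS = 32, 24
HALF = SLICE_BITS // 2
HALF_MASK = (1 << HALF) - 1
SLICE_MASK = (1 << SLICE_BITS) - 1
LOW_MASK = (1 << SLICE_SHIFT) - 1
ROUNDS = 4


def _round_fn(key: bytes, tweak: int, rnd: int, half: int) -> int:
    """Keyed round function: HMAC-SHA256 over (tweak, round, half), truncated to HALF bits."""
    msg = bytes([tweak, rnd]) + half.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big") & HALF_MASK


def _feistel(key: bytes, tweak: int, value: int, decrypt: bool) -> int:
    """Toy tweakable Feistel permutation on a SLICE_BITS-bit value."""
    left, right = value >> HALF, value & HALF_MASK
    if decrypt:
        for r in reversed(range(ROUNDS)):  # undo rounds in reverse order
            left, right = right ^ _round_fn(key, tweak, r, left), left
    else:
        for r in range(ROUNDS):
            left, right = right, left ^ _round_fn(key, tweak, r, right)
    return (left << HALF) | right


def encode_pointer(key: bytes, linear_addr: int, context: int) -> int:
    if linear_addr >> CTX_SHIFT or not 0 <= context <= 0xFF:
        raise ValueError("address must fit in 56 bits and context in 8 bits")
    plain_slice = (linear_addr >> SLICE_SHIFT) & SLICE_MASK
    cipher_slice = _feistel(key, context, plain_slice, decrypt=False)
    return (context << CTX_SHIFT) | (cipher_slice << SLICE_SHIFT) | (linear_addr & LOW_MASK)


def decode_pointer(key: bytes, pointer: int) -> int:
    context = pointer >> CTX_SHIFT         # tweak is recovered from the pointer itself
    cipher_slice = (pointer >> SLICE_SHIFT) & SLICE_MASK
    plain_slice = _feistel(key, context, cipher_slice, decrypt=True)
    return (plain_slice << SLICE_SHIFT) | (pointer & LOW_MASK)
```

Decoding with a different key, or after the context bits have been altered, almost certainly produces a different linear address, so a tenant without the address key cannot recover the region's true address from the pointer.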

Implementation of load acquire/store release instructions using load/store operation with DMB operation

A system and method are provided for simplifying the implementation of the load-acquire and store-release semantics used in reduced instruction set computing (RISC) architectures. Translating these semantics into micro-operations (the low-level instructions used to implement complex machine instructions) avoids having to implement complicated new memory operations. Combining one or more data memory barrier (DMB) operations with ordinary load and store operations provides sufficient ordering, because a data memory barrier ensures that memory accesses before it are performed and completed before those after it are executed.
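A minimal sketch of such a translation, written as a Python function over AArch64-style mnemonics. The mapping shown is the conventional conservative one (a barrier after a load-acquire, a barrier before a store-release) and is an assumption, not necessarily the sequence the patent uses. The tuple instruction format and the `lower` function are hypothetical. A trailing barrier is also placed after the store, because ARMv8 requires a store-release to be ordered before a later load-acquire, which the leading barrier alone does not guarantee.

```python
from typing import List, Tuple

Instr = Tuple[str, ...]  # hypothetical format: (mnemonic, operand, ...)


def lower(instr: Instr) -> List[Instr]:
    """Translate one instruction into micro-operations (hypothetical mapping)."""
    mnemonic = instr[0]
    if mnemonic == "LDAR":  # load-acquire
        _, rt, rn = instr
        # Plain load, then a barrier so later accesses are ordered after it.
        return [("LDR", rt, rn), ("DMB", "ISH")]
    if mnemonic == "STLR":  # store-release
        _, rt, rn = instr
        # Leading barrier: earlier accesses complete before the store.
        # Trailing barrier: a later load-acquire cannot overtake the store.
        return [("DMB", "ISH"), ("STR", rt, rn), ("DMB", "ISH")]
    return [instr]          # everything else passes through unchanged


def lower_program(program: List[Instr]) -> List[Instr]:
    return [uop for instr in program for uop in lower(instr)]
```

Full `DMB ISH` barriers are stronger than acquire/release strictly requires, trading some performance for a simpler implementation with no new memory operations.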

Multiply-accumulation in a data processing apparatus
11513796 · 2022-11-29

A data processing apparatus, a method of operating a data processing apparatus, a non-transitory computer readable storage medium, and an instruction are provided. The instruction specifies a first source register, a second source register, and a set of N accumulation registers. In response to the instruction, control signals are generated that cause processing circuitry to extract N data elements from the content of the first source register, multiply each of the N data elements by the content of the second source register, and apply the result of each multiplication to the content of a respective target register of the set of N accumulation registers. As a result, N multiplications are performed in a manner that effectively provides a multiplier N times the register width, without requiring the register file to be made N times larger.
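A behavioral model of the instruction in Python. It assumes unsigned `elem_bits`-wide elements packed into `src1` from the least significant end and unbounded accumulators (real hardware would fix the accumulator width). The function name and parameters are hypothetical.

```python
from typing import List


def multi_accumulate(src1: int, src2: int, acc: List[int], elem_bits: int) -> None:
    """Hypothetical model of one instruction: N multiplies into N accumulators."""
    mask = (1 << elem_bits) - 1
    for i in range(len(acc)):                       # N = number of accumulation registers
        element = (src1 >> (i * elem_bits)) & mask  # extract the i-th data element
        acc[i] += element * src2                    # multiply by all of src2, accumulate
```

With `src1 = 0x04030201`, `src2 = 10`, and four 8-bit elements, one call turns `[0, 0, 0, 0]` into `[10, 20, 30, 40]`.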