G06F9/30145

Systems and methods for policy execution processing

A system and method of processing instructions may comprise an application processing domain (APD) and a metadata processing domain (MTD). The APD may comprise an application processor executing instructions and providing related information to the MTD. The MTD may comprise a tag processing unit (TPU) having a cache of policy-based rules enforced by the MTD. The TPU may determine, based on policies being enforced and metadata tags and operands associated with the instructions, that the instructions are allowed to execute (i.e., are valid). The TPU may write, if the instructions are valid, the metadata tags to a queue. The queue may (i) receive operation output information from the application processing domain, (ii) receive, from the TPU, the metadata tags, (iii) output, responsive to receiving the metadata tags, resulting information indicative of the operation output information and the metadata tags; and (iv) permit the resulting information to be written to memory.

Look-up table initialize

A digital data processor includes an instruction memory storing instructions specifying a data processing operation and a data operand field, an instruction decoder coupled to the instruction memory for recalling instructions from the instruction memory and determining the operation and the data operand, and an operational unit coupled to a data register file and to an instruction decoder to perform a data processing operation upon an operand corresponding to an instruction decoded by the instruction decoder and storing results of the data processing operation. The operational unit is configured to perform a table write in response to a look up table initialization instruction by duplicating at least one data element from a source data register to create duplicated data elements, and writing the duplicated data elements to a specified location in a specified number of at least one table and a corresponding location in at least one other table.

SYSTEMS, METHODS, AND APPARATUSES FOR TILE LOAD

Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand to memory.

MEMORY DEVICE FOR PROCESSING OPERATION, DATA PROCESSING SYSTEM INCLUDING THE SAME, AND METHOD OF OPERATING THE MEMORY DEVICE
20230236836 · 2023-07-27 ·

A memory device includes a memory having a memory bank, a processor in memory (PIM) circuit, and control logic. The PIM circuit includes instruction memory storing at least one instruction provided from a host. The PIM circuit is configured to process an operation using data provided by the host or data read from the memory bank and to store at least one instruction provided by the host. The control logic is configured to decode a command/address received from the host to generate a decoding result and to perform a control operation so that one of i) a memory operation on the memory bank is performed and ii) the PIM circuit performs a processing operation, based on the decoding result. A counting value of a program counter instructing a position of the instruction memory is controlled in response to the command/address instructing the processing operation be performed.

NEURAL NETWORK ACCELERATOR AND OPERATING METHOD THEREOF

A neural network accelerator includes an operator that calculates a first operation result based on a first tiled input feature map and first tiled filter data, a quantizer that generates a quantization result by quantizing the first operation result based on a second bit width extended compared with a first bit width of the first tiled input feature map, a compressor that generates a partial sum by compressing the quantization result, and a decompressor that generates a second operation result by decompressing the partial sum, the operator calculates a third operation result based on a second tiled input feature map, second tiled filter data, and the second operation result, and an output feature map is generated based on the third operation result.

APPARATUS AND METHOD FOR VECTOR PACKED DUAL COMPLEX-BY-COMPLEX AND DUAL COMPLEX-BY-COMPLEX CONJUGATE MULTIPLICATION

An apparatus and method for multiplying packed real and imaginary components of complex numbers and complex conjugates. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction. The execution circuitry includes multiplier circuitry to multiply select real and imaginary data elements in the first and second source registers to generate a plurality of real and imaginary products; adder circuitry to add/subtract various real and imaginary products, scale the results according to an immediate of the instruction, round the scaled results; and saturation circuitry to saturate the rounded results.

Processor with instruction iteration

A processor includes a plurality of execution units. At least one of the execution units is configured to repeatedly execute a first instruction based on a first field of the first instruction indicating that the first instruction is to be iteratively executed.

Software assisted power management

Embodiments include an apparatus comprising an execution unit coupled to a memory, a microcode controller, and a hardware controller. The microcode controller is to identify a global power and performance hint in an instruction stream that includes first and second instruction phases to be executed in parallel, identify a local hint based on synchronization dependence in the first instruction phase, and use the first local hint to balance power consumption between the execution unit and the memory during parallel executions of the first and second instruction phases. The hardware controller is to use the global hint to determine an appropriate voltage level of a compute voltage and a frequency of a compute clock signal for the execution unit during the parallel executions of the first and second instruction phases. The first local hint includes a processing rate for the first instruction phase or an indication of the processing rate.

Systems, methods, and apparatuses for tile store

Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.

Predicated vector load micro-operation for performing a complete vector load when issued before a predicate operation is available and a predetermined condition is unsatisfied
11714644 · 2023-08-01 · ·

A predicated vector load micro-operation specifies a load target address, a destination vector register for which active vector elements of the destination vector register are to be loaded with data associated with addresses identified based on the load target address, and a predicate operand indicative of whether each vector element of the destination vector register is active or inactive. A predetermined type of predicated vector load micro-operation can be issued to the processing circuitry before the predicate operand is determined to meet an availability condition, and if issued in this way memory access circuitry can determine, based on the load target address, whether the predetermined type of predicated vector load micro-operation satisfies a predetermined condition, and if the predetermined condition is unsatisfied, perform a complete vector load assuming all vector elements of the destination vector register are active vector elements, independent of whether the predicate operand when available identifies any inactive vector element of the destination vector register.