G06F9/30098

Compiler program, compiling method, information processing device

A compiler program causes a computer to execute optimization processing for an optimization target program. The optimization target program includes a loop including a vector store instruction and a vector load instruction for an array variable. The optimization processing includes (1) unrolling the vector store instruction and the vector load instruction in the loop by an unrolling number of times to generate a plurality of unrolled vector store instructions and a plurality of unrolled vector load instructions, and (2) scheduling to move an unrolled vector load instruction among the plurality of unrolled vector load instructions, which is located after a first unrolled vector store instruction that is located at first among the plurality of unrolled vector load instructions, before the first unrolled vector store instruction.

STREAMING ENGINE WITH STREAM METADATA SAVING FOR CONTEXT SWITCHING
20230004391 · 2023-01-05 ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the stream engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state discarding stored stream data. A return from interrupt changes a frozen stream to an active state.

Power control arbitration

A local power control arbiter is provided to interface with a global power control unit of a processing platform having a plurality of processing entities. The local power control arbiter controls a local processing unit of the processing platform. The local power arbiter has an interface to receive from the global power control unit, a local performance limit allocated to the local processing unit depending on a global power control evaluation and processing circuitry to determine any change to one or more processing conditions prevailing in the local processing unit on a timescale shorter than a duration for which the local performance limit is applied to the local processing unit by the global power control unit and to select a performance level for the local processing unit depending on both the local performance limit and the determined change, if any, to the prevailing processing conditions on the local processing unit.

APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR A HARDWARE ASSISTED HETEROGENEOUS INSTRUCTION SET ARCHITECTURE DISPATCHER

Systems, methods, and apparatuses to support instructions for a hardware assisted heterogeneous instruction set architecture dispatcher are described. In one embodiment, a hardware processor includes a plurality of processor cores comprising a first type of processor core that supports a first instruction set architecture and a second type of processor core that supports a second different instruction set architecture, a decoder circuit of a processor core of the plurality of processor cores to decode a single instruction into a decoded single instruction, the single instruction including a field that identifies a requested core type and an opcode that indicates an execution circuit of the processor core is to: read a register to determine a core type of the processor core, cause the processor core to enter a first mode, that only permits execution of the first instruction set architecture by the processor core, when the requested core type and the core type of the processor core are the first type, cause the processor core to enter a second mode, that only permits execution of the second different instruction set architecture by the processor core, when the requested core type and the core type of the processor core are the second type, cause the processor core to enter a third mode, that only permits execution of the first instruction set architecture by the processor core, when the requested core type is the second type and the core type of the processor core is the first type, and cause the processor core to enter a fourth mode, that only permits execution of the second different instruction set architecture by the processor core, when the requested core type is the first type and the core type of the processor core is the second type, and the execution circuit of the processor core to execute the decoded single instruction according to the opcode.

SYSTEM AND METHOD FOR FORMAL FAULT PROPAGATION ANALYSIS
20220414306 · 2022-12-29 ·

A system and method are disclosed for formulating a sequential equivalency problem for fault (non)propagation with minimal circuit logic duplication by leveraging information about the location and nature of a fault. The system and method further apply formal checking to safety diagnoses and efficiently models simple and complex transient faults.

Compression assist instructions
11537399 · 2022-12-27 · ·

In an embodiment, a processor supports one or more compression assist instructions which may be employed in compression software to improve the performance of the processor when performing compression/decompression. That is, the compression/decompression task may be performed more rapidly and consume less power when the compression assist instructions are employed then when they are not. In some cases, the cost of a more effective, more complex compression algorithm may be reduced to the cost of a less effective, less complex compression algorithm.

Mapping convolution to a channel convolution engine

A processor system comprises a first and second group of registers and a hardware channel convolution processor unit. The first group of registers is configured to store data elements of channels of a portion of a convolution data matrix. Each register stores at least one data element from each channel. The second group of registers is configured to store data elements of convolution weight matrices including a separate convolution weight matrix for each channel. Each register stores at least one data element from each convolution weight matrix. The hardware channel convolution processor unit is configured to multiply each data element in the first group of registers with a corresponding data element in the second group of registers and sum together the multiplication results for each specific channel to determine corresponding channel convolution result data elements in a corresponding channel convolution result matrix.

Use of conductive ink segments to establish secure device key

In one aspect, a system component includes a printed circuit (PC) board on which plural conductive ink segments are disposed. The system component also includes a sealed housing that houses the PC board. The plural conductive ink segments define a bit pattern to establish a key.

Reduction mode of planar engine in neural processor

Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured to multiple modes. In a reduction mode, the planar engine circuit may process values arranged in one or more dimensions of input to generate a reduced value. The reduced values across multiple input data may be accumulated. The planar engine circuit may program a filter circuit as a reduction tree to gradually reduce the data into a reduced value. The reduction operation reduces the size of one or more dimensions of a tensor.

System and method for instruction mapping in an out-of-order processor
11531549 · 2022-12-20 · ·

A system and corresponding method map instructions in an out-of-order (OoO) processor. The system comprises a mapper, integer snapshot circuitry, and floating-point (FP) snapshot circuitry. The mapper maps instructions by mapping integer and FP architectural registers (ARs) of the instructions to integer and FP physical registers of the OoO processor, respectively. The mapper records, via at least one present FP indicator, presence of FP ARs used as destinations in the instructions. The mapper copies, periodically, the integer mapper state to the integer snapshot circuitry and copies, intermittently, based on the at least one FP present indicator, the FP mapper state to the FP snapshot circuitry. Copies of the integer and FP mapper state in the integer and FP snapshot circuitry, respectively, improve performance for instruction unwinding caused, for example, by an exception, branch/jump mispredict, etc. By copying the FP mapper state, intermittently, power efficiency of the OoO processor is improved.