G06F9/3853

Instruction packing scheme for VLIW CPU architecture

A processor is provided and includes a core that is configured to perform a decode operation on a multi-instruction packet comprising multiple instructions. The decode operation includes receiving the multi-instruction packet that includes first and second instructions. The first instruction includes a primary portion at a fixed first location and a secondary portion. The second instruction includes a primary portion at a fixed second location between the primary portion of the first instruction and the secondary portion of the first instruction. An operational code portion of the primary portion of each of the first and second instructions is accessed and decoded. An instruction packet including the primary and secondary portions of the first instruction is created, and a second instruction packet including the primary portion of the second instruction is created. The first and second instructions packets are dispatched to respective first and second functional units.

Streaming engine with multi dimensional circular addressing selectable at each dimension
11709779 · 2023-07-25 · ·

A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.

System, apparatus, and method for a transient load instruction within a VLIW operation
11561792 · 2023-01-24 · ·

A transient load instruction for a processor may include a transient or temporary load instruction that is executed in parallel with a plurality of input operands. The temporary load instruction loads a memory value into a temporary location for use within the instruction packet. According to some examples, a VLIW based microprocessor architecture may include a temporary cache for use in writing/reading a temporary memory value during a single VLIW packet cycle. The temporary cache is different from the normal register bank that does not allow writing and then reading the value just written during the same VLIW packet cycle.

Vector computational unit receiving data elements in parallel from a last row of a computational array

A microprocessor system comprises a vector computational unit and a control unit. The vector computational unit includes a plurality of processing elements. The control unit is configured to provide at least a single processor instruction to the vector computational unit. The single processor instruction specifies a plurality of component instructions to be executed by the vector computational unit in response to the single processor instruction and each of the plurality of processing elements of the vector computational unit is configured to process different data elements in parallel with other processing elements in response to the single processor instruction.

INSTRUCTION PACKING SCHEME FOR VLIW CPU ARCHITECTURE

A processor is provided and includes a core that is configured to perform a decode operation on a multi-instruction packet comprising multiple instructions. The decode operation includes receiving the multi-instruction packet that includes first and second instructions. The first instruction includes a primary portion at a fixed first location and a secondary portion. The second instruction includes a primary portion at a fixed second location between the primary portion of the first instruction and the secondary portion of the first instruction. An operational code portion of the primary portion of each of the first and second instructions is accessed and decoded. An instruction packet including the primary and secondary portions of the first instruction is created, and a second instruction packet including the primary portion of the second instruction is created. The first and second instructions packets are dispatched to respective first and second functional units.

STREAMING ENGINE WITH STREAM METADATA SAVING FOR CONTEXT SWITCHING
20230004391 · 2023-01-05 ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the stream engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state discarding stored stream data. A return from interrupt changes a frozen stream to an active state.

Memory-network processor with programmable optimizations

Various embodiments are disclosed of a multiprocessor system with processing elements optimized for high performance and low power dissipation and an associated method of programming the processing elements. Each processing element may comprise a fetch unit and a plurality of address generator units and a plurality of pipelined datapaths. The fetch unit may be configured to receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields. First and second address generator units may generate, based on different fields of the multi-part instruction, addresses from which to retrieve first and second data for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction. The execution units may perform operations using a single pipeline or multiple pipelines based on third and fourth fields of the multi-part instruction.

Apparatus for Processor with Macro-Instruction and Associated Methods

An apparatus includes an array processor to process array data in response to a set of macro-instructions. A macro-instruction in the set of macro-instructions performs loop operations, array iteration operations, and/or arithmetic logic unit (ALU) operations.

Combining load or store instructions

Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.

STREAMING ENGINE WITH FLEXIBLE STREAMING ENGINE TEMPLATE SUPPORTING DIFFERING NUMBER OF NESTED LOOPS WITH CORRESPONDING LOOP COUNTS AND LOOP OFFSETS
20230053842 · 2023-02-23 ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.