INSTRUCTION PACKING SCHEME FOR VLIW CPU ARCHITECTURE
20230221959 · 2023-07-13
Inventors
- Saya Goud Langadi (Bangalore, IN)
- Venkatesh Natarajan (Karnataka, IN)
- Alexander Tessarolo (Lindfield, AU)
Cpc classification
G06F9/30152
PHYSICS
G06F9/4843
PHYSICS
International classification
G06F9/30
PHYSICS
G06F9/38
PHYSICS
Abstract
A processor is provided and includes a core that is configured to perform a decode operation on a multi-instruction packet comprising multiple instructions. The decode operation includes receiving the multi-instruction packet that includes first and second instructions. The first instruction includes a primary portion at a fixed first location and a secondary portion. The second instruction includes a primary portion at a fixed second location between the primary portion of the first instruction and the secondary portion of the first instruction. An operational code portion of the primary portion of each of the first and second instructions is accessed and decoded. An instruction packet including the primary and secondary portions of the first instruction is created, and a second instruction packet including the primary portion of the second instruction is created. The first and second instruction packets are dispatched to respective first and second functional units.
Claims
1. A method comprising: receiving a first packet that includes: a first instruction that includes a first primary opcode portion; and a second instruction that includes: a second primary opcode portion arranged adjacent to the first primary opcode portion in the first packet; and a secondary opcode portion separated from the second primary opcode portion in the first packet; determining a first functional unit associated with the first instruction based on the first primary opcode portion; providing a second packet that includes the first primary opcode portion to the first functional unit; determining a second functional unit associated with the second instruction based on the second primary opcode portion; and providing a third packet that includes the second primary opcode portion and the secondary opcode portion to the second functional unit.
2. The method of claim 1, wherein: the first instruction includes a first size field that specifies a size of the first instruction based on the first size field and the first primary opcode portion; and the second instruction includes a second size field that specifies a size of the second instruction based on the second size field, the second primary opcode portion, and the secondary opcode portion.
3. The method of claim 2, wherein the size of the first instruction is different from the size of the second instruction.
4. The method of claim 2, wherein: the first instruction includes a first link field that specifies whether the first instruction is last in the first packet; the size of the first instruction as specified by the first size field is further based on the first link field; the second instruction includes a second link field that specifies whether the second instruction is last in the first packet; and the size of the second instruction as specified by the second size field is further based on the second link field.
5. The method of claim 1, wherein a size of the first primary opcode portion is equal to a size of the second primary opcode portion.
6. The method of claim 1, wherein the first instruction does not include a secondary opcode portion.
7. The method of claim 1, wherein the determining of the first functional unit and the determining of the second functional unit are performed prior to accessing the secondary opcode portion of the second instruction.
8. The method of claim 1, wherein the providing of the third packet to the second functional unit is performed without decoding the secondary opcode portion of the second instruction.
9. The method of claim 1, wherein the secondary opcode portion is separated from the second primary opcode portion in the first packet by a primary opcode portion of at least one other instruction.
10. A circuit device comprising: a set of processor functional units that includes a first functional unit and a second functional unit; and a decoder coupled to the set of processor functional units and configured to: receive a packet that includes: a first instruction that includes a first primary opcode portion; and a second instruction that includes: a second primary opcode portion arranged adjacent to the first primary opcode portion in the packet; and a secondary opcode portion separated from the second primary opcode portion in the packet; determine, based on the first primary opcode portion, that the first instruction is associated with the first functional unit; provide the first instruction to the first functional unit; determine, based on the second primary opcode portion, that the second instruction is associated with the second functional unit; and provide the second instruction to the second functional unit.
11. The circuit device of claim 10, wherein: the first instruction includes a first size field that specifies a size of the first instruction based on the first size field and the first primary opcode portion; and the second instruction includes a second size field that specifies a size of the second instruction based on the second size field, the second primary opcode portion, and the secondary opcode portion.
12. The circuit device of claim 11, wherein the size of the first instruction is different from the size of the second instruction.
13. The circuit device of claim 11, wherein: the first instruction includes a first link field that specifies whether the first instruction is last in the packet; the size of the first instruction as specified by the first size field is further based on the first link field; the second instruction includes a second link field that specifies whether the second instruction is last in the packet; and the size of the second instruction as specified by the second size field is further based on the second link field.
14. The circuit device of claim 10, wherein a size of the first primary opcode portion is equal to a size of the second primary opcode portion.
15. The circuit device of claim 10, wherein the first instruction does not include a secondary opcode portion.
16. The circuit device of claim 10, wherein the decoder is configured to determine that the first instruction is associated with the first functional unit and to determine that the second instruction is associated with the second functional unit prior to accessing the secondary opcode portion.
17. The circuit device of claim 10, wherein the decoder is configured to provide the second instruction to the second functional unit without decoding the secondary opcode portion of the second instruction.
18. The circuit device of claim 10, wherein the secondary opcode portion is separated from the second primary opcode portion in the packet by a primary opcode portion of at least one other instruction.
19. A circuit device comprising: a set of processor functional units that includes a first functional unit and a second functional unit; and a decoder coupled to the set of processor functional units and configured to: receive a packet that includes: a first portion that includes a set of primary opcode portions of a set of instructions; and a second portion that includes a set of secondary opcode portions of a subset of the set of instructions, wherein the second portion and the first portion are not interleaved; determine, based on the first portion of the packet, that a first instruction of the set of instructions is associated with the first functional unit and that a second instruction of the set of instructions is associated with the second functional unit; and provide the first instruction to the first functional unit and the second instruction to the second functional unit.
20. The circuit device of claim 19, wherein each primary opcode portion of the set of primary opcode portions has the same size.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] In the drawings:
[0006]
[0007]
[0008]
[0009]
[0010]
[0011]
DETAILED DESCRIPTION
[0012]
[0013] Decoder 106 includes decode logic 116 that decodes each instruction packet as described herein. Following the decode, a dispatch unit 118 sends each instruction to its designated functional unit for carrying out the instruction. In one example, processor 100 operates as a very long instruction word (VLIW) processor capable of operating on plural instructions in corresponding functional units simultaneously. Preferably, a compiler organizes instructions in multi-instruction packets that are executed together. Instruction dispatch unit 118 directs each instruction to its target functional unit 108. In an example, instruction dispatch unit 118 may operate on plural instructions in parallel such as having at least a portion of each instruction processed simultaneously. The number of such parallel instructions may be set by the number of instructions within the multi-instruction packet. A packet size calculation unit 120 keeps track of each instruction packet transferred from the instruction buffer 104.
[0014]
[0015]
[0016]
[0017] Referring to
[0018] As stated above, the first 13 bits of the opcode portions 206, 306, 406 of the instructions 200, 300, 400 form the primary portions 208, 308, 408 together with the respective link portions 202, 302, 402 and size portions 204, 304, 404. These first 13 bits are the opcode bits for which reducing the time needed to identify, decode, and forward them for further processing increases CPU speed and performance. That is, the quicker the 13 bits can be identified, decoded, and processed, the faster the CPU can operate. Any remaining bits of the instruction in the secondary portion do not need to be decoded in order to determine where the instruction needs to be routed (e.g., to a particular functional unit 108). Instead, the secondary portion can be provided to the respective functional unit for carrying out the instruction without spending time decoding it. In this manner, the processor 100 avoids decoding the secondary portion when decoding the instructions 200, 300, 400 to determine where to dispatch them.
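By way of illustration only, the routing decision described above may be sketched in Python as follows. The sketch assumes a 16-bit primary portion made up of a 1-bit link field, a 2-bit size field, and the 13-bit primary opcode; the bit positions, the size encoding in SIZE_BITS, and the unit-selection rule in route() are hypothetical and are not taken from the description. Only the primary portion is inspected, and the secondary portion is never touched.

```python
from typing import NamedTuple

# Assumed layout of a 16-bit primary portion (not specified in the text):
# bit 15 = link, bits 14-13 = size code, bits 12-0 = 13-bit primary opcode.
LINK_SHIFT = 15
SIZE_SHIFT = 13
OPCODE_MASK = 0x1FFF  # low 13 bits

# Hypothetical size encoding: size code -> total instruction length in bits.
# Code 3 is treated as reserved and raises KeyError in this sketch.
SIZE_BITS = {0: 16, 1: 32, 2: 48}


class Primary(NamedTuple):
    link: int        # 1 if another instruction follows in the packet
    size_bits: int   # total instruction size: 16, 32, or 48
    opcode: int      # 13-bit primary opcode


def decode_primary(word: int) -> Primary:
    """Split a 16-bit primary portion into link, size, and primary opcode."""
    link = (word >> LINK_SHIFT) & 0x1
    size_code = (word >> SIZE_SHIFT) & 0x3
    return Primary(link, SIZE_BITS[size_code], word & OPCODE_MASK)


def route(primary: Primary, num_units: int = 4) -> int:
    """Select a functional unit index from the primary opcode alone.

    Hypothetical rule: the top two opcode bits pick the unit.
    """
    return (primary.opcode >> 11) % num_units
```

Because route() takes only the decoded primary portion as input, the dispatch target is known before any secondary bits are fetched, which is the property the paragraph above relies on.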
[0019]
[0020]
[0021] Since for the first, 48-bit instruction 502 there exists an additional 32 bits of operational code in the secondary portion 510, the decoder 106 knows that additional opcode bits for instruction 502 are to be found in the multi-instruction packet 500. However, acquisition of this additional opcode is deferred until after all primary segments have been decoded so that processing time can be reduced, thus increasing CPU speed and performance.
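The deferred acquisition described above may be sketched, under stated assumptions, as a two-phase pass over the packet. In this Python sketch the packet is modeled as a list of 16-bit words, the primary portions occupy the leading words, and the secondary portions follow in instruction order; the field positions, the size encoding, and the helper names make_primary and split_packet are hypothetical and are not taken from the description.

```python
from typing import List, Tuple

# Hypothetical size encoding: size code -> total instruction length in bits.
SIZE_BITS = {0: 16, 1: 32, 2: 48}


def make_primary(link: int, size_code: int, opcode: int) -> int:
    """Build a 16-bit primary word (assumed layout: link | size | opcode)."""
    return (link << 15) | (size_code << 13) | (opcode & 0x1FFF)


def split_packet(words: List[int]) -> List[Tuple[int, List[int]]]:
    """Return (primary_word, secondary_words) for each instruction."""
    # Phase 1: read primaries from fixed 16-bit slots until a link bit of 0
    # marks the last instruction. No secondary word is read in this phase.
    primaries = []
    for word in words:
        primaries.append(word)
        if (word >> 15) & 0x1 == 0:
            break

    # Phase 2: only now fetch the secondaries, using each decoded size to
    # know how many 16-bit words belong to each instruction.
    cursor = len(primaries)
    instructions = []
    for word in primaries:
        extra_words = (SIZE_BITS[(word >> 13) & 0x3] - 16) // 16
        instructions.append((word, words[cursor:cursor + extra_words]))
        cursor += extra_words
    return instructions


# Hypothetical packet: a 48-bit, a 32-bit, and a 16-bit instruction.
packet = [
    make_primary(1, 2, 0x001),  # 48-bit instruction, more follow
    make_primary(1, 1, 0x002),  # 32-bit instruction, more follow
    make_primary(0, 0, 0x003),  # 16-bit instruction, last in packet
    0xAAAA, 0xBBBB,             # secondary words of the first instruction
    0xCCCC,                     # secondary word of the second instruction
]
```

With this layout, the first instruction receives two secondary words (32 bits), the second receives one, and the third receives none, and every primary portion is examined before the first secondary word is read.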
[0022] At step 612, technique 600 determines whether additional instructions follow the recently decoded instruction in the current instruction packet based at least in part on the link portion of the recently decoded instruction. In the example illustrated in
[0023] In the example illustrated in
[0024] Since the third instruction 506 follows the second instruction 504, the link portion L1 of the second instruction 504 will indicate that at least one additional instruction follows the second instruction 504. Accordingly, the technique 600 returns (614) again to step 604 to decode the third instruction 506 following the steps outlined above in steps 604-610.
[0025] Following the decoding of the third instruction 506, the determination (at step 612) whether additional instructions follow will indicate that the third instruction 506 was the last instruction. Accordingly, technique 600 moves (616) to step 618 to create a complete instruction packet for each instruction 502, 504, 506. In the case of instructions 502, 504, their primary and secondary portions are joined. That is, based on the decoded size portion 204, 304, 404, technique 600 can determine the length of any secondary portion 310, 410 to be joined with their respective primary portions 308, 408. At step 620, the generated instruction packets are sent to the appropriate functional units 108 by, for example, the dispatch unit 118 (
[0026] Multi-instruction packets such as those described herein include instructions that are to be carried out in parallel or substantially simultaneously. The size of the instruction buffer 104 (
[0027] Additional time savings can be achieved by ordering the instructions in each multi-instruction packet by size. For example, encoding each packet such that the instruction sizes follow a largest-to-smallest order can reduce decoding time. A table of example packet encodings is shown below.
TABLE 1
        I0       I1       I2       I3       I4       I5       I6       I7
P1      48 bits  48 bits  32 bits
P2      48 bits  48 bits  16 bits  16 bits
P3      48 bits  32 bits  32 bits  16 bits
P4      48 bits  32 bits  16 bits  16 bits  16 bits
P5      48 bits  16 bits  16 bits  16 bits  16 bits  16 bits
P6      32 bits  32 bits  32 bits  32 bits
P7      32 bits  32 bits  32 bits  16 bits  16 bits
P8      32 bits  32 bits  16 bits  16 bits  16 bits  16 bits
P9      32 bits  16 bits  16 bits  16 bits  16 bits  16 bits  16 bits
P10     16 bits  16 bits  16 bits  16 bits  16 bits  16 bits  16 bits  16 bits
[0028] Table 1 shows an example of the ordering of instructions (I0 up to I7) in various instruction packets (P1-P10), each occupying all bits of a 128-bit instruction buffer. When the possible instructions (I0-I7) in each packet are ordered largest-to-smallest, CPU decoding time can be reduced once a 16-bit instruction is encountered in a particular packet. For example, in instruction packet P2 of Table 1, a first 16-bit instruction is encountered at instruction I2. Since the instructions are ordered by length from largest to smallest, the decoding technique 600 can know that any subsequent instructions in the same instruction packet are also 16 bits and, therefore, does not need to decode the size portion of any subsequent instruction. Accordingly, for instruction packet P2, the decoding of instruction I3 can skip the decoding of its size. Similarly, instruction packets P4-P5 and P7-P10 can forgo decoding of the size of the instructions after the first 16-bit instruction is identified.
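The saving described above may be illustrated with a short Python sketch that counts how many size portions must be decoded for each packet of Table 1. The function name size_decodes_needed and the dictionary TABLE_1 are hypothetical names introduced for illustration; the instruction sizes are those listed in Table 1.

```python
from typing import Dict, List

# Instruction sizes in bits for packets P1-P10, as listed in Table 1.
TABLE_1: Dict[str, List[int]] = {
    "P1": [48, 48, 32],
    "P2": [48, 48, 16, 16],
    "P3": [48, 32, 32, 16],
    "P4": [48, 32, 16, 16, 16],
    "P5": [48, 16, 16, 16, 16, 16],
    "P6": [32, 32, 32, 32],
    "P7": [32, 32, 32, 16, 16],
    "P8": [32, 32, 16, 16, 16, 16],
    "P9": [32, 16, 16, 16, 16, 16, 16],
    "P10": [16, 16, 16, 16, 16, 16, 16, 16],
}


def size_decodes_needed(sizes: List[int]) -> int:
    """Count size portions decoded in a largest-to-smallest ordered packet.

    Once a 16-bit instruction is seen, every later instruction must also be
    16 bits, so its size portion does not need to be decoded.
    """
    decoded = 0
    for size in sizes:
        decoded += 1
        if size == 16:
            break  # all remaining instructions are known to be 16 bits
    return decoded


for name, sizes in TABLE_1.items():
    skipped = len(sizes) - size_decodes_needed(sizes)
    print(f"{name}: {len(sizes)} instructions, {skipped} size decodes skipped")
```

For packet P2, three size portions are decoded (I0, I1, and I2) and the size decode for I3 is skipped, consistent with the example in the paragraph above.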
[0029] While instruction packets (P1-P10) in Table 1 illustrate using all 128 bits (or all bits of the instruction buffer 104), other instruction packets not using all 128 bits may be transferred from the instruction buffer. For example, an instruction packet may include a 48-bit instruction followed by three 16-bit instructions. The packet encoding scheme above, however, also works for these packets not using all 128 bits. That is, after a 16-bit instruction is found, the decoding of each subsequent instruction can skip decoding the instruction's size.
[0030] The foregoing description of various preferred embodiments of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The example embodiments, as described above, were chosen and described in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto.