PROCESSOR FOR PERFORMING A PREDETERMINED COMPUTATIONAL OPERATION, AND PROCESSING UNIT
20230315341 · 2023-10-05
Inventors
Cpc classification
G06F3/0659
PHYSICS
G06F9/4881
PHYSICS
G06F9/5055
PHYSICS
International classification
Abstract
A processor for performing a predetermined computational operation in which one or multiple data element(s) is/are used to determine a result. The processor includes one or more processor core(s) and at least one buffer memory, is connectable to a main memory and, if the main memory is connected, is designed to access the main memory. Each processor core is designed to execute instructions. The at least one buffer memory includes a calculation circuit which is designed to perform the computational operation in response to an execution signal if the one or the multiple data element(s) is/are stored in the buffer memory, the result being stored in the buffer memory. The processor is designed to perform the computational operation optionally either using one of the processor cores with the aid of the instructions or in the at least one buffer memory using the respective calculation circuit.
Claims
1. A processor configured to perform a predetermined computational operation in which one or more data elements is used to determine a result, the processor comprising: one or more processor cores; at least one buffer memory; wherein the processor is connectable to a main memory, and being configured to access the main memory if the main memory is connected, each processor core being configured to execute instructions; wherein the at least one buffer memory includes a calculation circuit configured to execute the computational operation in response to an execution signal if the one or the multiple data elements is stored in the buffer memory, the result being stored in the at least one buffer memory; and wherein the processor is configured to perform the computational operation optionally using one of the processor cores using instructions or in the at least one buffer memory using the calculation circuit.
2. The processor as recited in claim 1, wherein the execution signal is an instruction and/or a write access to one of the data elements and/or a read access to the result.
3. The processor as recited in claim 1, wherein a decision whether the computational operation is performed using one of the processor cores or in the at least one buffer memory is based on an expected reusability degree of the one or more data elements and/or of the result.
4. The processor as recited in claim 3, wherein the processor is configured to select the at least one buffer memory to perform the computational operation if the expected reusability degree lies within a predetermined range.
5. The processor as recited in claim 1, wherein: the at least one buffer memory includes multiple first buffer memories and a second buffer memory, each of the first buffer memories being assigned to one of the processor cores or to a subset of the processor cores, and the second buffer memory being assigned to all processor cores; and the processor being configured to, if the computational operation is to be performed in the at least one buffer memory, perform the computational operation optionally in one of the first buffer memories or in the second buffer memory.
6. The processor as recited in claim 5, wherein the processor is configured to make a decision whether the computational operation is to be performed using one of the first buffer memories or in the second buffer memory based on an expected reusability degree of the one or more data elements and/or of the result.
7. The processor as recited in claim 6, wherein the processor is configured to select the one of the first buffer memories to perform the computational operation if the expected reusability degree lies above a predetermined first threshold, and/or to select the second buffer memory to perform the computational operation if the expected reusability degree lies below the predetermined first threshold or is equal to the predetermined first threshold.
8. The processor as recited in claim 1, wherein the processor is configured to determine, if the computational operation is to be performed in the at least one buffer memory, whether the one or more data elements is in the at least one buffer memory, and if the one or the multiple data elements is not stored in the at least one buffer memory, to load or store the one or more data elements into or in the at least one buffer memory.
9. The processor as recited in claim 1, wherein the processor is configured to, if the computational operation is to be performed in the at least one buffer memory, write the one or more data elements to predetermined memory addresses or to memory addresses, determined by initializable registers, in the at least one buffer memory.
10. A processing unit, comprising: a processor configured to perform a predetermined computational operation in which one or more data elements is used to determine a result, the processor including: one or more processor cores, at least one buffer memory, wherein the processor is connectable to a main memory, and being configured to access the main memory if the main memory is connected, each processor core being configured to execute instructions, wherein the at least one buffer memory includes a calculation circuit configured to execute the computational operation in response to an execution signal if the one or the multiple data elements is stored in the buffer memory, the result being stored in the at least one buffer memory, and wherein the processor is configured to perform the computational operation optionally using one of the processor cores using instructions or in the at least one buffer memory using the calculation circuit of the at least one buffer memory; and the main memory connected to the processor, the main memory including a calculation circuit, which is configured to perform the computational operation in response to an execution signal if the one or more data elements is stored in the main memory, the result being stored in the main memory; wherein the processor is configured to optionally induce the main memory to perform the computational operation using the calculation circuit of the main memory.
11. The processing unit as recited in claim 10, wherein the processor is configured to make a decision whether the main memory is to be induced to perform the computational operation based on an expected reusability degree of the one or more data elements and/or of the result.
12. The processing unit as recited in claim 11, wherein the processor is configured to induce the main memory to perform the computational operation if the reusability degree lies below a predetermined second threshold.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0036]
[0037]
[0038]
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0039]
[0040] A schema of a first buffer memory 10 is shown in an enlarged view on the left in
[0041] First buffer memory 10 has a calculation circuit 14 (or buffer memory calculation circuit, or first buffer memory calculation circuit). Calculation circuit 14 is designed to perform a certain computational operation, such as the above-mentioned MAC operation, for data elements stored in memory array 12 of first buffer memory 10, and to store the result of the computational operation as a data element in memory array 12. In the illustrated example, the computational operation, or calculation circuit 14 which implements it, determines a result from two operands that are stored as data elements 31, 32 in memory array 12; the result is stored as a data element 34 in memory array 12. In the example of the MAC operation, the end result of the summation is stored as the result, and no buffer storing/loading of intermediate results of a running total or similar operation takes place. The computational operation is performed by calculation circuit 14 of first buffer memory 10 without any involvement of processor cores 4. A transfer of the data elements into one of the processor cores and a transfer of the result back to the first buffer memory are therefore omitted, which means that delays caused by the data transfer can be avoided. The computational operation is performed by the calculation circuit in response to a corresponding instruction or, more generally, in response to an execution signal.
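The behavior described in this paragraph can be illustrated with a minimal software sketch. It is not the circuit itself. The class name `BufferMemory`, the method names, the use of integer lists as data elements, and the reading of the MAC operation as a sum of element-wise products are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class BufferMemory:
    """Toy model of a buffer memory with a built-in calculation circuit.

    Hypothetical model: each data element is a list of integers held at an
    integer address in the memory array.
    """

    cells: dict = field(default_factory=dict)  # address -> data element

    def write(self, address, value):
        """Write access to a data element in the memory array."""
        self.cells[address] = value

    def read(self, address):
        """Read access to a data element in the memory array."""
        return self.cells[address]

    def execute_mac(self, a_addr, b_addr, result_addr):
        """Model of the execution signal that triggers the calculation circuit.

        The operation runs only if both operands are already stored in this
        buffer memory. Only the end result of the summation is written back;
        no intermediate running total is stored in the memory array.
        """
        for addr in (a_addr, b_addr):
            if addr not in self.cells:
                raise KeyError(f"operand at address {addr} is not in the buffer")
        total = sum(x * y for x, y in zip(self.cells[a_addr], self.cells[b_addr]))
        self.cells[result_addr] = total


# Usage: operands and result stay in the buffer; no processor core is modeled.
buf = BufferMemory()
buf.write(0, [1, 2, 3])
buf.write(1, [4, 5, 6])
buf.execute_mac(0, 1, 2)
print(buf.read(2))  # 1*4 + 2*5 + 3*6 = 32
```

In this sketch the caller never receives the operands, which mirrors the omitted transfer of data elements to a processor core and of the result back to the buffer memory.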
[0042]
[0043] Processor 2 includes a plurality of subassemblies 3 (two, by way of example, but the number may also differ), which may be designed according to the embodiment illustrated in
[0044] Processor 2 additionally includes a second buffer memory 20 (shown in an enlarged view in
[0045] Second buffer memory 20 (e.g., a level 2 cache) has a design similar to that of first buffer memory 10, and thus includes a memory array 22 (a system of memory cells, e.g., SRAM cells), an addressing circuit 26 by which data elements 31, 32, 34 stored in memory array 22 are able to be addressed, so that (read and write) access to them is possible, and a memory latch 28, which briefly stores the data elements or their bits while memory accesses to memory array 22 take place, so that they can be read out with the aid of the memory latch or the content of the memory latch can be transmitted to the memory cells.
[0046] Second buffer memory 20 includes a calculation circuit 24 (second buffer memory calculation circuit). Calculation circuit 24 is designed to perform the particular computational operation, such as the above-mentioned MAC operation, for data elements stored in memory array 22 of second buffer memory 20 and to store the result of the computational operation as a data element in memory array 22. In the illustrated example, the computational operation, or calculation circuit 24 which implements it, determines a result from two operands that are stored as data elements 31, 32 in memory array 22, the result being stored as a data element 34 in memory array 22. The computational operation is carried out by calculation circuit 24 of second buffer memory 20 without any involvement of processor cores 4 (or of the first buffer memory). A transfer of the data elements to one of the processor cores, which would additionally run via first buffer memory 10, is therefore omitted.
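Claim 8 adds that, before an in-buffer operation, the processor determines whether the data elements are present in the buffer memory and loads them if they are not. A rough sketch of that check follows. The function names, the dictionary-based memories, and the list-valued data elements are hypothetical and serve only to illustrate the sequence of steps.

```python
def ensure_resident(buffer, main_memory, addresses):
    """Load every operand that is missing from the buffer out of main memory.

    Returns the list of addresses that had to be loaded. Both memories are
    modeled as plain dictionaries mapping address -> data element.
    """
    loaded = []
    for addr in addresses:
        if addr not in buffer:
            buffer[addr] = main_memory[addr]
            loaded.append(addr)
    return loaded


def mac_in_buffer(buffer, main_memory, a_addr, b_addr, result_addr):
    """Make the operands resident, then compute the MAC inside the buffer.

    The result is stored in the buffer itself, as in the paragraph above.
    """
    ensure_resident(buffer, main_memory, (a_addr, b_addr))
    buffer[result_addr] = sum(x * y for x, y in zip(buffer[a_addr], buffer[b_addr]))
    return buffer[result_addr]


# Usage: one operand is already buffered, the other is fetched first.
main_memory = {0: [1, 2], 1: [3, 4]}
second_buffer = {0: [1, 2]}
print(mac_in_buffer(second_buffer, main_memory, 0, 1, 2))  # 1*3 + 2*4 = 11
```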
[0047]
[0048] Like first and/or second buffer memory 10, 20, main memory 6 preferably includes a calculation circuit 44 (main memory calculation circuit), which is designed to execute the particular computational operation for data elements stored in main memory 6 and to store the result of the computational operation as a data element in main memory 6.
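With calculation circuits available in the first buffer memories, the second buffer memory, and the main memory, claims 7 and 12 tie the choice of location to an expected reusability degree. The following sketch shows one way those two thresholds could be combined. It assumes that the reusability degree is a number and that the second threshold does not exceed the first; neither is stated in the text above. The names `Target` and `select_target` are hypothetical, and the option of executing on a processor core (claims 3 and 4) is left out.

```python
from enum import Enum


class Target(Enum):
    FIRST_BUFFER = "first buffer memory"
    SECOND_BUFFER = "second buffer memory"
    MAIN_MEMORY = "main memory"


def select_target(reusability, first_threshold, second_threshold):
    """Pick the memory whose calculation circuit should run the operation.

    Assumed ordering: second_threshold <= first_threshold.
    - above the first threshold: a first buffer memory (claim 7)
    - below the second threshold: main memory (claim 12)
    - otherwise (at or below the first threshold): second buffer memory (claim 7)
    """
    if second_threshold > first_threshold:
        raise ValueError("sketch assumes second_threshold <= first_threshold")
    if reusability > first_threshold:
        return Target.FIRST_BUFFER
    if reusability < second_threshold:
        return Target.MAIN_MEMORY
    return Target.SECOND_BUFFER


# Usage with illustrative threshold values only.
for degree in (0.9, 0.7, 0.1):
    print(degree, select_target(degree, first_threshold=0.7, second_threshold=0.2).value)
```

The intuition behind this ordering is that data expected to be reused often stays close to the processor cores, while data with little expected reuse is processed where it already resides.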