METHOD FOR END-OF-COMPUTATION FLAG GENERATION IN A PULSE GENERATION CIRCUIT FOR AN IN-MEMORY COMPUTING SYSTEM
20250355563 · 2025-11-20
CPC classification
G06F3/0659 (PHYSICS)
G06N3/049 (PHYSICS)
G06F7/62 (PHYSICS)
Abstract
The present invention proposes a novel integrated circuit architecture for in-memory computing matrix-vector multipliers in which the computational latency scales with the magnitude of the incoming neuron activations, rather than being fixed. The main contribution of the present invention is that the proposed circuit is self-aware of the computational latency. At the end of the generated data pulses, the number of which is proportional to the magnitude of the incoming neuron activations, the circuit generates an end-of-computation flag such that the computing circuit can shorten the processing time of matrix-vector multiplications. The present invention can be integrated with any kind of analogue readout circuit, and the proposed circuit can be integrated with any kind of memory element.
Claims
1. A method of computing for an in-memory computing system, the method comprising:
a set of in-memory computing counters of an in-memory computing driver receiving a set of input neuron data sets through a set of input data lines;
the in-memory computing driver setting an end-of-computation signal to a first end-of-computation signal value once an in-memory computing clock signal changes from a first signal value to a second signal value;
the set of in-memory computing counters registering the set of input neuron data sets, a respective in-memory computing counter registering a respective input neuron data set received through a respective input data line during a respective time window;
the set of in-memory computing counters initialising a set of internal counters with the set of input neuron data sets, a respective internal counter being initialised with the respective input neuron data set;
setting the value of a respective flag signal of the respective in-memory computing counter to a first flag value if the value of the respective internal counter deviates from a first counter value, and increasing or decreasing the value of the respective internal counter during an adjustment cycle until the value of the respective internal counter equals the first counter value, and generating a respective set of signal pulses during the adjustment cycle to be fed to a memory array, the number of signal pulses generated during the adjustment cycle by the respective in-memory computing counter being proportional to the magnitude of the respective input neuron data set received by the respective in-memory computing counter during the respective time window;
setting the value of the respective flag signal of the respective in-memory computing counter to a second flag value if the value of the respective internal counter equals the first counter value; and
the in-memory computing driver setting the end-of-computation signal to a second end-of-computation signal value if the values of the flag signals of all the in-memory computing counters of the in-memory computing driver equal the second flag value.
2. The method according to claim 1, wherein a signal pulse is generated every time the value of the respective internal counter is increased or decreased.
3. The method according to claim 1, wherein the first counter value equals a signal low value, the first flag value equals a signal high value, the second flag value is a signal low value, and wherein the internal counters operate as down-counters decreasing the value of the internal counters by one at a frequency of a clock signal if the value of the respective internal counter is decreased during the adjustment cycle.
4. The method according to claim 1, wherein the first end-of-computation signal value is a signal high value, and the second end-of-computation signal value is a signal low value, and/or the end-of-computation signal is set to the first end-of-computation signal value as soon as the in-memory computing clock signal changes from a signal low value to a signal high value, or vice versa, and the end-of-computation signal is set to the second end-of-computation signal value if the values of the flag signals of all the in-memory computing counters of the in-memory computing driver equal the second flag value but only upon the in-memory computing clock signal changing from a signal low value to a signal high value, or vice versa.
5. The method according to claim 1, wherein the set of internal counters are initialised after a given delay from the registration of the set of input neuron data sets.
6. The method according to claim 1, wherein the change of the end-of-computation signal to the first end-of-computation signal value is indicative of a beginning of a signal computation cycle, and the change of the end-of-computation signal to the second end-of-computation signal value is indicative of an end of the signal computation cycle.
7. The method according to claim 1, wherein the method further comprises the step of feeding the end-of-computation signal to a memory array.
8. The method according to claim 1, wherein the end-of-computation signal is generated by an end-of-computation circuit comprising an arrangement of logic OR gates such that the second end-of-computation signal value is obtained as soon as the in-memory computing clock signal changes from the second signal value to the first signal value, and the values of the flag signals of all the in-memory computing counters of the in-memory computing driver equal the second flag value.
9. The method according to claim 1, wherein the respective in-memory computing counter comprises a respective input register for registering the respective input neuron data set, the respective internal counter, a respective reset operator for the respective internal counter, configured to receive a latch signal and the respective input neuron data set as an input data set, a respective pulse generator for generating the latch signal, and a respective flag controller for generating the respective flag signal and the respective set of signal pulses.
10. The method according to claim 9, wherein the respective internal counter comprises an individual flip-flop circuit for each bit position of the respective input neuron data set such that a respective individual flip-flop circuit is arranged to output a single bit value of the respective flip-flop circuit.
11. The method according to claim 10, wherein the respective flag controller comprises an arrangement of logic gates and is configured to receive as inputs the single bit values of the respective flip-flop circuit or their inverted values and output the respective flag signal, the respective set of signal pulses, and a clock signal to be fed to the respective internal counter.
12. The method according to claim 9, wherein the respective reset operator comprises an individual multiplexer circuit for each bit position of the respective input neuron data set to feed an individual bit to a respective individual flip-flop circuit of the counter.
13. The method according to claim 1, wherein the duty cycle of the in-memory computing clock signal is greater than 50%.
14. A computer program product comprising instructions for implementing the steps of the method according to claim 1 when loaded and run on an electronic device.
15. A computing device for an in-memory computing memory, the computing device comprising a set of in-memory computing counters, the computing device being configured to perform operations comprising:
receive by the set of in-memory computing counters a set of input neuron data sets through a set of input data lines;
set an end-of-computation signal to a first end-of-computation signal value once an in-memory computing clock signal changes from a first signal value to a second signal value;
register by the set of in-memory computing counters the set of input neuron data sets, a respective in-memory computing counter registering a respective input neuron data set received through a respective input data line during a respective time window;
initialise by the set of in-memory computing counters a set of internal counters with the set of input neuron data sets, a respective internal counter being initialised with the respective input neuron data set;
set the value of a respective flag signal of the respective in-memory computing counter to a first flag value if the value of the respective internal counter deviates from a first counter value, and increase or decrease the value of the respective internal counter during an adjustment cycle until the value of the respective internal counter equals the first counter value, and generate a respective set of signal pulses during the adjustment cycle to be fed to a memory array, the number of signal pulses generated during the adjustment cycle by the respective in-memory computing counter being proportional to the magnitude of the respective input neuron data set received by the respective in-memory computing counter during the respective time window;
set the value of the respective flag signal of the respective in-memory computing counter to a second flag value if the value of the respective internal counter equals the first counter value; and
set the end-of-computation signal to a second end-of-computation signal value if the values of the flag signals of all the in-memory computing counters of the in-memory computing driver equal the second flag value.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] Other features and advantages of the invention will become apparent from the following description of a non-limiting example embodiment, with reference to the appended drawings, in which:
[0011]
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
DETAILED DESCRIPTION OF AN EMBODIMENT OF THE INVENTION
[0018] An embodiment of the present invention will now be described in detail with reference to the attached figures. Identical or corresponding functional and structural elements which appear in different drawings are assigned the same reference signs. It is to be noted that the use of the words "first" and "second" does not imply any particular order or hierarchy unless such order or hierarchy is explicitly or implicitly made clear in the context. In the present description, signal value 0 represents a signal low value, or logic zero, while signal value 1 represents a signal high value, or logic one. In other words, signal value 0 may be considered to be a first or second signal value, while signal value 1 may be considered to be a second or first signal value. Similarly, flag value 0 may be considered to be a first flag value or a second flag value, while flag value 1 may be considered to be a second flag value or a first flag value. Furthermore, counter value 0 is in the following also referred to as a first counter value.
[0019]
[0020] Time windows T.sub.1, T.sub.2, etc. define a computational or counting cycle for the memory array 5. According to the present invention, the length of these time windows is dynamically adjusted based on the maximum number of pulses transferred in a given time window. According to prior art solutions, the length of these time windows is fixed. For instance, the length of these time windows in the solution disclosed in U.S. Pat. No. 11,322,195B2 is 7. According to the present invention, for example:
[0021] In the time window T.sub.1, X.sub.0=4 outputs 4 pulses, X.sub.1=0 outputs no pulses, X.sub.2=1 outputs 1 pulse, X.sub.3=2 outputs 2 pulses, and X.sub.N=1 outputs 1 pulse.
[0022] In the time window T.sub.2, X.sub.0=0 outputs no pulses, X.sub.1=2 outputs 2 pulses, X.sub.2=1 outputs 1 pulse, X.sub.3=0 outputs no pulses, and X.sub.N=1 outputs 1 pulse.
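The window-length arithmetic of the example above can be illustrated with a short Python sketch. It is only an idealised illustration: the names `windows`, `FIXED_WINDOW` and `dynamic_window_length` are hypothetical, the window length is expressed in pulse slots on the assumption that the fixed length of 7 is counted in the same unit, and any minimum window duration imposed by the clocking is ignored.

```python
# Input neuron values per time window, taken from the example above,
# in the order X_0, X_1, X_2, X_3, X_N.
windows = {
    "T1": [4, 0, 1, 2, 1],
    "T2": [0, 2, 1, 0, 1],
}

# Assumption: the fixed prior-art window length of 7 is counted in pulse slots.
FIXED_WINDOW = 7


def dynamic_window_length(inputs):
    """Return the window length, in pulse slots, set by the largest pulse count."""
    return max(inputs)


dynamic_lengths = {name: dynamic_window_length(x) for name, x in windows.items()}
total_dynamic = sum(dynamic_lengths.values())
total_fixed = FIXED_WINDOW * len(windows)

print(dynamic_lengths)             # {'T1': 4, 'T2': 2}
print(total_dynamic, total_fixed)  # 6 14
```

Under these assumptions the two example windows together occupy 6 pulse slots instead of 14.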
[0023] Determining the window duration (T.sub.1, T.sub.2, . . . ), which is the period of the end-of-computation (EOC) signal, is the main novelty of the present invention. As described in
[0026] As is shown in
[0027]
[0028] In this example, the EOC signal goes to the 0 state, when all the N+1 IMC counters reach the 0 state, and it stays in the 0 state until the next rising edge of f.sub.IMC. The EOC signal is generated by combining all N+1 FLG.sub.CNT(N) signals and the f.sub.IMC signal by OR gating. The EOC signal is shared between all the N+1 IMC counters 9 ensuring all the N+1 IMC counters are synchronised. If any IMC counter is still down counting, such that the falling edge of the FLG.sub.CNT(N) signal is not generated (at least one FLG.sub.CNT(N)=1), the output of O.sub.1 gate as shown in
[0029]
[0030] The input neurons (X.sub.0, X.sub.1, . . . , X.sub.N) are in this example registered in REG[2:0] at the rising edge of the EOC signal. After a short delay, LAT.sub.CNT, which is a short pulse, is generated. LAT.sub.CNT is used to initialise the counter 16 according to the input neuron values.
[0031] If X[2:0] is non-zero such that REG[2:0] is also non-zero, then at least one bit of CNT[2:0] will be initialised as 1, thereby setting the output of the NAND gate N.sub.1, which is FLG.sub.CNT, to 1. The first AND gate A.sub.1 then passes the inverted f.sub.IMC signal to CKC, which is the counter clock. In this case, the CKC signal keeps down-counting the initialised flip-flops 23 until CNT[2:0] reaches the 0 state. Since FLG.sub.CNT is 1, EN.sub.WL outputs pulses while the counter is down-counting. When CNT[2:0] reaches 0, the output of the NAND gate N.sub.1 (FLG.sub.CNT) becomes 0 and thus both CKC and EN.sub.WL are gated, i.e. held at 0, ensuring the counter is inactive.
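The gating behaviour described above can be sketched as a small combinational model in Python. This is a hedged illustration rather than the disclosed gate netlist: the function name `flag_controller` is hypothetical, the NAND gate N.sub.1 is assumed to receive the inverted counter bits (so that its output is 1 whenever any bit is 1), and EN.sub.WL is assumed to be FLG.sub.CNT gated with f.sub.IMC, which the description does not state explicitly.

```python
def flag_controller(cnt_bits, f_imc):
    """Model of the flag controller for one IMC counter.

    cnt_bits: iterable of counter bits CNT[2:0], each 0 or 1.
    f_imc:    current level of the in-memory computing clock, 0 or 1.
    Returns (flg_cnt, ckc, en_wl).
    """
    inverted = [1 - b for b in cnt_bits]
    # NAND gate N1 on the inverted bits (assumption): 1 if any counter bit is 1.
    flg_cnt = 0 if all(inverted) else 1
    # AND gate A1: passes the inverted clock to CKC while the flag is high.
    ckc = flg_cnt & (1 - f_imc)
    # Assumption: word-line enable pulses follow f_IMC while the flag is high.
    en_wl = flg_cnt & f_imc
    return flg_cnt, ckc, en_wl


print(flag_controller((0, 1, 1), 1))  # (1, 0, 1): active counter, clock high
print(flag_controller((0, 1, 1), 0))  # (1, 1, 0): active counter, clock low
print(flag_controller((0, 0, 0), 1))  # (0, 0, 0): counter at 0, outputs gated
```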
[0032] For example, if X[2:0]=3, then:
[0033] REG[2:0] is registered as 011 and thus the counter 16 is initialised as 011 at the LAT.sub.CNT pulse.
[0034] The FLG.sub.CNT is 1 and thus every falling edge of the f.sub.IMC signal activates the counter 16 to count down.
[0035] Since it requires three pulses to count CNT[2:0] down to 0, EN.sub.WL outputs 3 pulses which are synchronised with f.sub.IMC.
[0036] When the counter 16 reaches 0, the FLG.sub.CNT becomes 0 and EN.sub.WL generates no pulses.
[0037] When the rising edge of the EOC signal comes in, the same operating cycles begin.
[0038] The magnitude of X[2:0] determines how many cycles are required to bring the FLG.sub.CNT signal to 0, and thus the EOC signal to 0, seamlessly realising the magnitude-aware computation latency.
[0039] For example, if X[2:0]=0, then:
[0040] REG[2:0] is registered as 000 and thus the counter 16 is initialised as 000 at the LAT.sub.CNT pulse.
[0041] The FLG.sub.CNT signal stays 0 and thus the EN.sub.WL signal generates no pulses, seamlessly supporting the input sparsity management without requiring any additional circuitry.
[0042] When the rising edge of the EOC signal comes in, the same operating cycles begin.
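The two worked examples can be reproduced with a behavioural Python sketch of one time window. It is a simplified model under stated assumptions, not the disclosed circuit: the function name `simulate_window` and the `max_cycles` guard are hypothetical, each active counter is assumed to emit one EN.sub.WL pulse during the high phase of f.sub.IMC and to decrement at the falling edge, and the EOC signal is modelled as the OR of f.sub.IMC and all FLG.sub.CNT signals, sampled in the low phase of f.sub.IMC. In this model a window with all-zero inputs still lasts one clock cycle.

```python
def simulate_window(inputs, max_cycles=16):
    """Simulate one time window of a driver with len(inputs) IMC counters.

    Returns (pulses, cycles): the number of EN_WL pulses per counter and the
    number of f_IMC cycles until the EOC signal falls to 0.
    """
    counters = list(inputs)          # LAT_CNT pulse: load REG into CNT
    pulses = [0] * len(counters)
    for cycle in range(1, max_cycles + 1):
        # High phase of f_IMC: counters with FLG_CNT = 1 emit one pulse.
        for i, cnt in enumerate(counters):
            if cnt != 0:
                pulses[i] += 1
        # Falling edge of f_IMC: CKC decrements every active counter.
        counters = [cnt - 1 if cnt != 0 else 0 for cnt in counters]
        # Low phase of f_IMC: EOC = f_IMC OR (any FLG_CNT), with f_IMC = 0.
        eoc = int(any(cnt != 0 for cnt in counters))
        if eoc == 0:
            return pulses, cycle
    raise RuntimeError("EOC did not fall within max_cycles")


print(simulate_window([3]))              # ([3], 3)
print(simulate_window([0]))              # ([0], 1)
print(simulate_window([4, 0, 1, 2, 1]))  # ([4, 0, 1, 2, 1], 4)
```

The last call uses the input values of time window T.sub.1 and shows that, in this model, the slowest counter alone determines when the EOC signal falls.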
[0043] The flow chart of
[0044]
[0045]
[0046] To summarise the above teachings, one aspect of the present invention proposes a novel integrated circuit architecture for in-memory computing matrix-vector multipliers in which the computational latency scales with the magnitude of the incoming neuron activations, rather than being fixed. The main contribution of the present invention is that the proposed circuit is self-aware of the computational latency. At the end of the generated data pulses, the number of which is proportional to the magnitude of the incoming neuron activations, the circuit generates an end-of-computation flag such that the computing circuit can shorten the processing time of matrix-vector multiplications. The present invention can be integrated with any kind of analogue readout circuit, such as an oscillator-based analogue-to-digital converter (ADC) or a successive approximation register (SAR) ADC. The proposed circuit can be integrated with any kind of memory element, such as static random-access memory (SRAM), memristors, etc.
[0047] It is to be noted that the above-described method may be modified in many ways. For instance, instead of operating as a down-counter, the counter 16 may operate as an up-counter counting up to a given threshold value. In this case, at step 48 the CKC signal would be used to up-count the counter until the given threshold value is reached. Furthermore, instead of an action being triggered at a rising or falling edge, the action could be triggered at a falling or rising edge, respectively. Moreover, a different arrangement of logic gates may be used depending on how the signals are arranged.
[0048] The method steps described above may be carried out by suitable circuits or circuitry when the process is implemented in hardware or using hardware for individual steps. However, the method or at least some of the method steps may also or instead be implemented in software. Thus, at least some of the method steps can be considered as computer-implemented steps. The terms "circuits" and "circuitry" refer to physical electronic components or modules (e.g., hardware), and any software and/or firmware (code) that may configure the hardware, be executed by the hardware, and/or otherwise be associated with the hardware. The circuits may thus be operable (i.e., configured) to carry out, or may comprise means for carrying out, the required method steps as described above.
[0049] While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive, the invention being not limited to the disclosed embodiment. Other embodiments and variants are understood, and can be achieved by those skilled in the art when carrying out the claimed invention, based on a study of the drawings, the disclosure and the appended claims. Further embodiments may be obtained by combining any of the teachings above.
[0050] In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used. Any reference signs in the claims should not be construed as limiting the scope of the invention.