Control plane organization for flexible digital data plane
10802743 · 2020-10-13
Assignee
Inventors
- Francky Catthoor (Temse, BE)
- Praveen Raghavan (Los Gatos, CA, US)
- Daniele Garbin (Leuven, BE)
- Dimitrios Rodopoulos (Leuven, BE)
- Odysseas Zografos (Leuven, BE)
CPC classification
G06F3/0604
PHYSICS
G06F3/0646
PHYSICS
G11C17/165
PHYSICS
G11C7/1006
PHYSICS
International classification
G06F12/00
PHYSICS
G11C7/10
PHYSICS
G11C13/00
PHYSICS
Abstract
A control plane for controlling transfer of data to a data plane is disclosed. In one aspect, the control plane comprises memory cells for storing a digitally coded parameter value and having a data input electrode, a data output electrode and a control electrode, n data input terminals that receive a data input value and apply it to the data input electrode of an associated memory cell, and n data output terminals coupled to a data output electrode of an associated memory cell. The control plane further comprises a first delay line having delay elements and arranged for receiving a stream of control bit values, and a second delay line having delay elements and arranged for receiving a signal for enabling the control bit values in the first delay line, whereby data is transferred in a controlled and synchronized fashion to an output electrode.
Claims
1. A control plane for controlling transfer of data to a data plane, the control plane comprising: a number n of at least two memory cells for each storing a digitally coded parameter value, each memory cell (D_j) having a data input electrode, a data output electrode and a control electrode, wherein n is a natural number, n≥2, and j is a natural number; n data input terminals for each receiving a data input value and applying the data input value to the data input electrode of an associated memory cell (D_j) among the memory cells to which the data input terminals are coupled, and n data output terminals, each coupled to a data output electrode of an associated memory cell (D_j), wherein the control plane furthermore comprises a first delay line comprising n or n-1 first delay elements, the first delay line being arranged for receiving a stream of control bit values, each first delay element controlling, based on a respective current control bit value in the stream of control bit values, by means of an access control device, the transfer of data received by a memory cell (D_j) via an associated data input electrode to an associated data output electrode, thereby combining the data input value with the digitally coded parameter value stored in the memory cell (D_j), and a second delay line comprising n or n-1 second delay elements, the second delay line being arranged for receiving an enabling signal for enabling the control bit values of the stream of control bit values in the first delay line to be passed to the control electrode of an associated memory cell (D_j) such that, when a data input value appears at the data input electrode of the memory cell (D_j), the data input value is conditionally transferred in a controlled and synchronized fashion to the associated data output electrode, depending on the stream of control bit values and the enabling signal.
2. The control plane according to claim 1, wherein corresponding elements of the first delay line and the second delay line have pairwise matched delays.
3. The control plane according to claim 1, wherein the first delay line has a first delay, and the second delay line has a second delay, matched to a fixed ratio, the value of the second delay being substantially twice the value of the first delay.
4. The control plane according to claim 1, wherein the memory cells (D_j) are implemented as thin film transistor (TFT) devices.
5. The control plane according to claim 1, wherein the at least two data output terminals are connected to a common node, the at least two data output terminals each receiving a combination of the value stored in the memory cell (D_j) and the data input value applied to this memory cell, the data input value being a real number representing a voltage or a current.
6. The control plane according to claim 1, further comprising an output wire for delivering to an output node a sum of products of the values stored in the memory cells (D_j) and the data input values applied to the corresponding memory cells (D_j), each data input value being a real number representing a voltage or a current.
7. The control plane according to claim 1, wherein the synchronization in the first and/or second delay lines is achieved by means of a global clock signal.
8. The control plane according to claim 1, wherein the first and/or second delay lines include a wave-pipelined, sequentially steered shared control line that takes care of the synchronization between the two.
9. The control plane according to claim 1, wherein the memory cells (D_j) comprise low-leakage devices.
10. The control plane according to claim 9, wherein the memory cells (D_j) are implemented as TFT devices in the back-end-of-line (BEOL).
11. The control plane according to claim 10, wherein the storing of the digitally coded parameter value is done by any of: weighted geometrical coding, current scaling, transistor threshold voltage scaling or accumulation period scaling.
12. The control plane according to claim 1, wherein the memory cells (D_j) are implemented in 3D BEOL stacking technology.
13. The control plane according to claim 1, wherein the enabling signal has a single pulse.
14. A neural network or neuromorphic computing platform making use of a control plane for controlling transfer of data to a data plane, the control plane comprising: a number n of at least two memory cells for each storing a digitally coded parameter value, each memory cell (D_j) having a data input electrode, a data output electrode and a control electrode, wherein n is a natural number, n≥2, and j is a natural number; n data input terminals for each receiving a data input value and applying the data input value to the data input electrode of an associated memory cell (D_j) among the memory cells to which the data input terminals are coupled, and n data output terminals, each coupled to a data output electrode of an associated memory cell (D_j), wherein the control plane furthermore comprises a first delay line comprising n or n-1 first delay elements, the first delay line being arranged for receiving a stream of control bit values, each first delay element controlling, based on a respective current control bit value in the stream of control bit values, by means of an access control device, the transfer of data received by a memory cell (D_j) via an associated data input electrode to an associated data output electrode, thereby combining the data input value with the digitally coded parameter value stored in the memory cell (D_j), and a second delay line comprising n or n-1 second delay elements, the second delay line being arranged for receiving an enabling signal for enabling the control bit values of the stream of control bit values in the first delay line to be passed to the control electrode of an associated memory cell (D_j) such that, when a data input value appears at the data input electrode of the memory cell (D_j), the data input value is conditionally transferred in a controlled and synchronized fashion to the associated data output electrode, depending on the stream of control bit values and the enabling signal.
15. A method for machine learning making use of a control plane for controlling transfer of data to a data plane, the control plane comprising: a number n of at least two memory cells for each storing a digitally coded parameter value, each memory cell (D_j) having a data input electrode, a data output electrode and a control electrode, wherein n is a natural number, n≥2, and j is a natural number; n data input terminals for each receiving a data input value and applying the data input value to the data input electrode of an associated memory cell (D_j) among the memory cells to which the data input terminals are coupled, and n data output terminals, each coupled to a data output electrode of an associated memory cell (D_j), wherein the control plane furthermore comprises a first delay line comprising n or n-1 first delay elements, the first delay line being arranged for receiving a stream of control bit values, each first delay element controlling, based on a respective current control bit value in the stream of control bit values, by means of an access control device, the transfer of data received by a memory cell (D_j) via an associated data input electrode to an associated data output electrode, thereby combining the data input value with the digitally coded parameter value stored in the memory cell (D_j), and a second delay line comprising n or n-1 second delay elements, the second delay line being arranged for receiving an enabling signal for enabling the control bit values of the stream of control bit values in the first delay line to be passed to the control electrode of an associated memory cell (D_j) such that, when a data input value appears at the data input electrode of the memory cell (D_j), the data input value is conditionally transferred in a controlled and synchronized fashion to the associated data output electrode, depending on the stream of control bit values and the enabling signal.
16. The control plane according to claim 2, wherein the first delay line has a first delay, and the second delay line has a second delay, matched to a fixed ratio, the value of the second delay being substantially twice the value of the first delay.
17. The control plane according to claim 16, wherein the memory cells (D_j) are implemented as TFT devices.
18. The control plane according to claim 16, wherein the at least two data output terminals are connected to a common node, the at least two data output terminals each receiving a combination of the value stored in the memory cell (D_j) and the data input value applied to this memory cell, the data input value being a real number representing a voltage or a current.
19. The control plane according to claim 18, further comprising an output wire for delivering to an output node a sum of products of the values stored in the memory cells (D_j) and the data input values applied to the corresponding memory cells (D_j), each data input value being a real number representing a voltage or a current.
20. The control plane according to claim 19, wherein the synchronization in the first and/or second delay lines is achieved by means of a global clock signal.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The disclosed technology will now be described further, by way of example, with reference to the accompanying drawings, wherein like reference numerals refer to like elements in the various figures.
DETAILED DESCRIPTION OF CERTAIN ILLUSTRATIVE EMBODIMENTS
(17) The disclosed technology will be described with respect to particular embodiments and with reference to certain drawings but the disclosed technology is not limited thereto.
(18) In this text, the main illustrations will come from the machine learning and neuromorphic domains, but the disclosed technology is not limited to these and can also be implemented in other domains, such as, e.g., digital multimedia processing (e.g., digital filtering).
(19) The terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequence, either temporally, spatially, in ranking or in any other manner. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the disclosed technology described herein are capable of operation in other sequences than described or illustrated herein.
(20) It is to be noticed that the term comprising, used in the claims, should not be interpreted as being restricted to the means listed thereafter; it does not exclude other elements or steps. It is thus to be interpreted as specifying the presence of the stated features, integers, steps or components as referred to, but does not preclude the presence or addition of one or more other features, integers, steps or components, or groups thereof. Thus, the scope of the expression a device comprising means A and B should not be limited to devices consisting only of components A and B. It means that with respect to the disclosed technology, the only relevant components of the device are A and B.
(21) Reference throughout this specification to one embodiment or an embodiment means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosed technology. Thus, appearances of the phrases in one embodiment or in an embodiment in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.
(22) Similarly it should be appreciated that in the description of exemplary embodiments of the disclosed technology, various features of the disclosed technology are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the disclosed technology.
(23) Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the disclosed technology, and form different embodiments, as would be understood by those in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
(24) It should be noted that the use of particular terminology when describing certain features or aspects of the disclosed technology should not be taken to imply that the terminology is being re-defined herein to be restricted to include any specific characteristics of the features or aspects of the disclosed technology with which that terminology is associated.
(25) In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the disclosed technology may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
(26) In the context of the disclosed technology, the data plane is the conceptual realization of transfer and processing of data between at least one data input and at least one data output. The detailed architectural and circuit implementation is irrelevant for the scope of this patent, and can include, among others, datapaths, FIFO buffers, switches such as crossbar switches, etc.
(27) The control plane is the conceptual realization of the way the data plane is controlled. Here, however, we do focus also on the detailed architectural and circuit implementation (i.e., the control structure), which will be disclosed in more detail hereinafter.
(28) As an illustrative example, machine learning is dealt with hereinafter, without being intended to be limiting for the disclosed technology. Another important domain, but also not intended to be limiting, is neuromorphic engineering, for example spiking neural networks.
(29) Neuromorphic engineering tries to emulate the structure and function of the nervous system, to better understand the brain and/or to design more intelligent machines. Hereto, the similarity between semiconductor physics and electrophysiology is relied on, and the electrical characteristics of neurons and synapses are mimicked in CMOS. In neuromorphic computing, neurons and synapses behave as distributed processors and storage devices.
(30) Current neuromorphic platforms focus mostly on the local array and they nearly always use a 2-dimensional array structure for that, connecting an input and output layer of neurons (1-dimensional). In the state-of-the-art these local array organizations are not sufficiently optimized for energy and cost. They use costly non-volatile memory technologies that incur a large non-recurring engineering (NRE) fabrication cost. Moreover, they are not truly scalable towards a 3D integration because multiple memory layers require costly fabrication options again.
(31) In contrast, in embodiments of the disclosed technology high-impedance devices, hence devices with low leakage current, e.g., below 10^-8 A, are used. When combined with a control device, these devices have a storage facility on their gate where charges can be stored. These devices can for example be microelectronic transistor devices. In particular embodiments, advantage can be taken of cheaply fabricated TFT technologies, integrated in the BEOL. The parametric weight storage for both the local array and the inter-array communication network is obtained by storing a charge on the gate of a TFT device, which is isolated from leaking away by a second TFT device. The latter is enabled by the near-zero leakage, e.g., a leakage current below 10^-8 A, of the TFT devices. In embodiments of the disclosed technology, the charge can be coded in a binary way similarly to the geometrically factorized parameters, in an incremental way, or in any other suitable way. It can also be accumulated across all the active parameters connected to an output neuron and read out by a single shared sense amplifier. These optimizations enable additional cost and energy advantages compared to STT-MRAM/PCM solutions (where accumulation across the synapses of a neuron is not possible) or VMCO or PCMO/ORAM solutions (where the number of binary levels is very reduced). The control of this network organization can be achieved by a wave-pipeline-based shared delay line to further reduce the cost overhead. The area of a single TFT device is expected to be scalable to 30-45 nm, which is not extremely dense. But because the technology is very cheap and the devices are integrated between the metal layers of the BEOL stack, this is not seen as a real issue for many applications, especially when they are cost- rather than performance-oriented. In addition, these devices are relatively slow (tens to hundreds of ns).
However, because parameter storage dominates in neuromorphic platforms, time multiplexing has a relatively high overhead and hence a smaller benefit than in traditional microprocessor platforms. And the sample rate of practical neural algorithm applications does not require speeds beyond μs (microsecond) periods. So in practice that disadvantage, too, is not really an issue in most applications. Only the neurons have to be integrated in the bottom-layer FEOL. All the synapses are stacked on top in a potentially huge monolithic 3D BEOL stack. Should the number of BEOL metal layers need to become too high for the size of the overall neuromorphic platform, TSV technology allows further stacking of dies in a so-called 3D SoC.
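To see why near-zero leakage makes semi-permanent gate storage plausible, a back-of-envelope retention estimate helps. The sketch below uses illustrative assumed values (1 fF gate capacitance, 100 mV tolerable droop, 10^-15 A leakage through the isolating device); none of these figures is prescribed by the disclosure.

```python
def retention_time_s(c_gate_f, delta_v, i_leak_a):
    """Time for the stored gate charge to droop by delta_v volts when the
    isolating TFT leaks i_leak_a amperes: t = C * dV / I."""
    return c_gate_f * delta_v / i_leak_a

# Illustrative assumed values: 1 fF gate, 100 mV allowed droop,
# 1e-15 A leakage through the isolating device.
t = retention_time_s(1e-15, 0.1, 1e-15)   # ~0.1 s between refreshes
```

Even with these conservative numbers the state survives far longer than a packet of data transfers, which is what makes the "load only the parameters that change" policy pay off.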
(32) The overall organization of a data plane template is shown in
(33) Any subset of the template of
(34) The remainder of this description is focused on embodiments of the disclosed technology implemented for the local arrays. The disclosed technology, however, is not limited thereto, and is also applicable for instance for the inter-array communication network.
(35) A more detailed view of the internal organization of such local arrays 40 is provided in
(36) The disclosed technology provides a control plane for controlling transfer of data to a data plane. For clarifying the basics of the disclosed technology, abstraction is made from whatever happens in the data plane. A schematic overview of such control plane 100 is given in
(37) The control plane 100 comprises: a first number n of at least two memory cells D_(j-1), D_j, D_(j+1) for each storing a digitally coded parameter value, which can be a single-bit or multi-bit value. Each memory cell D_(j-1), D_j, D_(j+1) has a data input electrode 101, a data output electrode 102 and a control electrode 103; at least two data input terminals 104 for each receiving a data input value and applying it to the data input electrode 101 of one of the memory cells D_(j-1), D_j, D_(j+1) to which the data input terminal 104 is coupled, and at least two data output terminals 105, each coupled to a data output electrode 102 of one of the memory cells D_(j-1), D_j, D_(j+1).
(38) In the following, the data input terminals 104 and the data output terminals 105 which are connected to a particular memory cell D_(j-1), D_j, D_(j+1) are called the input and output terminals associated with that memory cell, respectively.
(39) The control plane 100 also generally comprises a first delay line 81 comprising n or n-1 first delay elements 106_(j-1), 106_j, 106_(j+1). The first delay line 81 is arranged for receiving a stream of control bit values. The control plane 100 is configured such that each first delay element 106_(j-1), 106_j, 106_(j+1) controls, based on its current control bit value and by means of an access control device 108_(j-1), 108_j, 108_(j+1), the transfer of data received by a memory cell D_(j-1), D_j, D_(j+1), e.g., from a data plane 120, via its associated data input terminal 104 to its associated data output terminal 105 and as such, e.g., back to the data plane 120, once combined with the digitally coded parameter value stored in this memory cell. In particular embodiments, the signal applied to the first delay line 81 may be adapted such that the very first delay element 106_(j-1) can be left out, as illustrated in
(40) The control plane 100 also generally comprises a second delay line 80 comprising n or n-1 second delay elements 107_(j-1), 107_j, 107_(j+1). The second delay line 80 is arranged for receiving an enabling signal for enabling the control bit values of the stream of control bit values in the first delay line 81 to be passed to the control electrode 103 of the associated memory cells D_(j-1), D_j, D_(j+1) such that, when data appears at the data input terminal 104 associated with a memory cell, it is transferred in a controlled and synchronized fashion to the associated data output terminal 105, depending on the stream of control bit values and the enabling signal. Similar to the first delay line 81, in particular embodiments the second delay line 80 does not need the very first delay element 107_(j-1) in the series, as again illustrated in
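The interplay of the two delay lines can be sketched behaviorally. The model below is an illustrative assumption, not the patented circuit: control bits advance one element per clock in the first line, while the enable pulse advances one element per two clocks in the second line (the 2:1 delay ratio of claim 3), so the pulse meets the j-th control bit exactly at element j and latches it onto that cell's control electrode.

```python
def run_control_plane(control_bits):
    """Toy timing model: bit k sits at first-line element j at clock
    t = k + j; the enable pulse reaches second-line element j at t = 2*j.
    At that instant element j latches the co-located bit, i.e. bit j."""
    n = len(control_bits)
    gates = [None] * n  # values latched on the control electrodes
    for t in range(2 * n):
        for j in range(n):
            if t == 2 * j:          # enable pulse arrives at element j
                k = t - j           # index of the bit at element j now
                if 0 <= k < n:
                    gates[j] = control_bits[k]
    return gates
```

With this model every cell ends up holding its own bit: `run_control_plane([1, 0, 1, 1])` returns `[1, 0, 1, 1]`, i.e. a single serial stream plus a single enable pulse suffices to program all n control electrodes.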
(41) In one embodiment, the disclosed technology relates to a control plane controlling the parameter update of a local array 40 as shown in
(42) The first delay line structure 81 allows achieving a very low area cost because it restricts the number of parallel control wires in the control plane that would typically be required for obtaining full connectivity of the control distribution. This way, an area saving is realized in the control plane, as a trade-off against the time required to ripple all values through the system. The reduced wire overhead according to embodiments of the disclosed technology also restricts the capacitive load on the control signals and hence the dynamic energy of the control plane used for controlling the data plane. It comes at the cost of an increased latency to wait for all the parameters to be updated sequentially. But that is acceptable for the typical neural algorithms, both for the feed-forward (in the data plane) and recurrent algorithms proposed in the literature, or for other applications wherein the parameters W_j do not need to be updated frequently. Sometimes parameters have to be updated only once a day, sometimes faster. For instance, running the first delay line 81 with a few hundred elements in the line at a clock rate of 100 MHz may be sufficient if the sample rate is 10 kHz. The delay (or latency) of the delay line has to be selected such that the updating speed specification is met. Depending on how many parameter bits have to be controlled, a number P of parallel delay lines then has to be provided in the overall control plane organization. At least one delay line per row of memory cells is provided, but the disclosed technology is not limited thereto, and in particular embodiments multiple delay lines may be provided per row of cells.
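The latency budget quoted above can be checked with simple arithmetic. The sketch below assumes the quoted figures (a line of about 300 elements, a 100 MHz clock, a 10 kHz sample rate) and a hypothetical ceiling rule for sizing the number P of parallel lines; both helper names are illustrative, not from the disclosure.

```python
def update_latency_s(n_elements, f_clk_hz):
    """Time to ripple a full stream of control bits through one delay line."""
    return n_elements / f_clk_hz

def parallel_lines_needed(param_bits, n_elements):
    """Hypothetical sizing rule: parallel lines P needed when each line
    serves n_elements cells per pass (ceiling division)."""
    return -(-param_bits // n_elements)

latency = update_latency_s(300, 100e6)   # 3 us to update 300 parameters
sample_period = 1 / 10e3                 # 100 us between input samples
fits = latency < sample_period           # comfortably within the budget
```

At these numbers a full sequential update takes 3 µs against a 100 µs sample period, which is why the serial ripple is acceptable despite its latency.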
(43) According to embodiments of the disclosed technology, a geometrical coding of the parameters W_j can be implemented. This is further illustrated in
(44)
(45) In principle, the above geometrical parameter coding approach can also be beneficial to reduce the number of devices needed when charge-based functionality is exploited, as in the case of low-leakage TFT devices. In that case the width of these devices or the threshold voltage V_th can be coded in geometrical powers (e.g., binary, as illustrated above). The charge accumulated on these TFT devices is then proportional to the geometric code, and so the same effect is realized as described for the resistive devices above. Hence, the proposed geometric coding of the parameter can be used in the inter-array communication network and also at the level of the local array, in the data plane. This is the case both for resistive and for charge-based devices.
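As a small illustration of such geometric (binary-power) coding, assume device i is drawn with a width (or drive strength) proportional to 2^i times some unit; the stored bits then weight the accumulated charge as a binary number. The unit scale and little-endian bit order are assumptions for the sketch.

```python
def geometric_parameter(bits, unit=1.0):
    """Charge (or conductance) contributed by binary-scaled devices:
    device i contributes bits[i] * 2**i * unit (bits little-endian)."""
    return sum(b * (2 ** i) * unit for i, b in enumerate(bits))
```

For example, `geometric_parameter([1, 1, 0, 1])` evaluates to 11, i.e. four binary-scaled devices suffice to represent 16 distinct parameter levels, instead of needing up to 15 unit devices.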
(46) It is an advantage of embodiments of the disclosed technology that the control plane organization can be implemented at least partly in a back-end-of-line (BEOL) fabric with TFT technology, for instance ultra-low-leakage TFT technology, for instance having a leakage value below 10^-8 A for each switch. The use of ultra-low-leakage TFT devices implies that only parameter updates which change the data value have to be effectively loaded. All the other parameter values stay semi-permanently stored on the gates of the ultra-low-leakage TFT devices.
(47) It is therefore advantageous if not only the data busses are isolated at their boundaries by nearly non-leaking TFT switches, e.g., TFT switches having a leakage current of 10^-13 A or below, but the control lines are isolated as well. That will allow putting on the order of 10^13 switches, with about 1 W overall leakage, as needed to approach brain-size communication networks. It is to be noted that such a switch typically contains many individual devices (depending on the circuit diagram used), so the specification on the individual devices of the switch is much lower, e.g., on the order of 10^-15 A and below. When a number of data bits share the same control line, a single control line can also be shared in the netlist for the local array. If this control line then has an isolation switch 52 at the entry point of the first delay line 81, where the information of the next control state is sent/driven, this isolation switch 52, e.g., a TFT isolation switch, can make sure that the control line keeps its state (nearly without leaking) as long as that position of the 3-way data switch D_j (e.g., transistor) should be maintained. In practice, many data values are transferred across the 3-way switch D_j in that position, sequentially in time, before it has to be changed. That avoids wasting unnecessary dynamic energy on the control lines, as the line does not leak and keeps its value. The data values can for instance be transferred at a few hundred MHz in a packet of N values, and for that entire packet the 3-way switch D_j remains in the same control state. After this packet has passed, it can be that the 3-way switch D_j is not used for some time, and then everything is simply maintained and the control state is still not modified. Also when the control state for the next data packet maintains the same path, the control line does not need to be updated.
Only when a new data packet has to be transferred through another path must the control of the 3-way switch D_j be updated and some dynamic energy be spent.
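The energy argument above boils down to: control lines pay dynamic energy only on path changes, not per transferred packet. A minimal sketch of that accounting, with hypothetical route labels:

```python
def control_updates(route_per_packet):
    """Count control-line updates for a sequence of packet routes: the
    near-leakless line holds its state, so only a change of path costs
    dynamic energy."""
    updates, current = 0, None
    for route in route_per_packet:
        if route != current:     # new path -> drive the control line
            updates += 1
            current = route
    return updates
```

Six packets over routes `['A', 'A', 'A', 'B', 'B', 'A']` cost only three control updates; dynamic control energy scales with path changes rather than with traffic volume.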
(48) In a particular embodiment, the TFT devices are advantageously implemented as Indium-Gallium-Zinc-Oxide (IGZO) or ITZO devices, which exhibit extremely low leakage, e.g., below 10^-15 A per device, leading to well below 10^-9 A for the entire switch, further reducing the global power and energy cost functions. The term IGZO encompasses all realizable varieties of the compound In_xGa_yZn_zO_w in terms of the values of the indices x, y, z and w, for example In_2Ga_2ZnO.
(49) The disclosed technology also relates to a method for synchronizing the stream of control bit values with the proper insertion rate of new weight values corresponding to the rate at which new data values are introduced in the memory cells D.sub.j. This is also illustrated in
(50) The proof of why this operates correctly is shown by induction in
(51) The combination of this more detailed control plane circuit implementation with the neuromorphic data plane of the local arrays is illustrated in
(52) In particular embodiments, the method may use so-called wave pipelining to realize the first and second delay lines. In that case no clock signal is required. Wave-pipelining implements pipelining in logic without the use of external latches or registers. It provides a method for significantly reducing clock loads and the associated area, power and latency while retaining the external functionality and timing of a synchronous circuit.
(53) In alternative embodiments it is also possible to utilize clock signals to achieve a fully synchronous implementation, as shown in
(54) In yet other embodiments, the control plane approach can also utilize local decoders to reduce the number of parameter control bits that have to be injected into the delay lines. If an n-to-2^n decoder, e.g., a 2-to-4 decoder as in
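The saving from such a local decoder can be made concrete: injecting n select bits and decoding them locally into 2^n one-hot control lines divides the delay-line traffic by a factor of 2^n/n. A minimal sketch (the little-endian bit order is an assumption):

```python
def one_hot_decode(sel_bits):
    """n-to-2**n decoder: n injected select bits (little-endian) raise
    exactly one of the 2**n local control lines."""
    index = sum(b << i for i, b in enumerate(sel_bits))
    return [1 if j == index else 0 for j in range(2 ** len(sel_bits))]
```

A 2-to-4 decode of `[0, 1]` raises line 2, i.e. `[0, 0, 1, 0]`: two bits in the delay line steer four local control lines.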
(55) In still other embodiments, the control plane structure is also usable for controlling the data plane of the inter-array communication network. A basic 3-way switch for this network is shown in
(56) These 3-way switches as illustrated in
(57) The control bits required to steer the switches 131 are illustrated in
(58) In one embodiment the data plane of the communication network is implemented in a 3D layer structure, for example 3D integrated BEOL, a 3D package structure or a combination thereof. A 3D layer can be implemented with TFT, e.g., IGZO. Doing so strongly improves the scalability.
(59) The time-division multiplexing is preferably organized according to a Local Parallel Global Sequential scheme.
(60) In one embodiment the distributed loop buffer concept as described in EP1958059, which was initially intended for conventional instruction-set processor programming, is advantageously reused. This is a very energy-efficient solution for realizing the look-up table storing the (instruction) control bits for the potentially huge number of devices D_j to be controlled.
(61) For the neuromorphic synapse control, however, the distributed loop buffer concept should be reused in a re-projected form. For instance, in the illustration of
(62) One solution according to embodiments of the disclosed technology indeed allows meeting the above-mentioned objectives. The proposed solution allows for scaling by adapting the number of parallel delay lines P. The flexibility is created by the potential to load any control bit sequence into the data plane. Further, by implementing the control plane at least in part in a BEOL fabric with TFT devices, the scalability and in particular the leakage energy-efficiency of the proposed solution are improved even more. The realisation of devices in BEOL allows directly reducing the vertical wire length in a substantial way (because one does not have to go back and forth to the FEOL layer for all devices in the delay lines), and the horizontal wire length is also reduced because a significant number of devices can be removed from the FEOL layer; the overall area then shrinks, with a resulting average wire-length reduction as an added advantage. So, as a result, the specific trade-offs between the main design objectives change, in particular area, energy and performance. This BEOL TFT device implementation can advantageously be applied in this context because the control bit values can be expected to be stable for long periods of time, so they do not have to switch at the most advanced clock rates, which otherwise would only have been feasible with the strongly speed-optimized FEOL devices.
(63) While the disclosed technology has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. The foregoing description details certain embodiments of the disclosed technology. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the disclosed technology may be practiced in many ways. The present innovations are not limited to the disclosed embodiments.
(64) Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing such technology, from a study of the drawings, the disclosure and the appended claims. In the claims, the word comprising does not exclude other elements or steps, and the indefinite article a or an does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.