DEVICE, SENSOR NODE, ACCESS CONTROLLER, DATA TRANSFER METHOD, AND PROCESSING METHOD IN MICROCONTROLLER
20220157361 · 2022-05-19
Assignee
Inventors
- Masanori Natsui (Sendai-shi, JP)
- Daisuke Suzuki (Sendai-shi, JP)
- Akira Tamakoshi (Sendai-shi, JP)
- Takahiro Hanyu (Sendai-shi, JP)
- Tetsuo ENDOH (Sendai-shi, JP)
- Hideo OHNO (Sendai-shi, JP)
CPC classification
G06F17/142 (PHYSICS)
G06F12/06 (PHYSICS)
G06F12/00 (PHYSICS)
G11C5/14 (PHYSICS)
International classification
Abstract
The present invention provides a low-power, high-performance device that can be applied to sensor nodes, as well as a sensor node using the device, an access controller, a data transfer method, and a processing method in a microcontroller. The device has: an MRAM; a nonvolatile CPU configured to include a nonvolatile memory; a nonvolatile FPGA-ACC configured to include a nonvolatile memory and execute a part of the operations of the nonvolatile CPU; and a power-gating control unit that controls power supply to each memory cell in the MRAM, the nonvolatile CPU, and the nonvolatile FPGA-ACC. The device is further provided with an access controller that controls accesses to the MRAM by reading data in advance and backing up the data when data are to be read from the MRAM.
Claims
1. A device, comprising: an MRAM configured to include multiple memory cells separated into multiple regions including selection transistors and MTJs; a nonvolatile CPU configured to include a nonvolatile memory; a nonvolatile FPGA-ACC configured to include a nonvolatile memory and execute a part of operations on the nonvolatile CPU; and a power-gating controller configured to control power supply to each memory cell in the MRAM, the nonvolatile CPU, and the nonvolatile FPGA-ACC.
2. The device according to claim 1, wherein the nonvolatile CPU is configured to transmit data to the MRAM, allowing the nonvolatile FPGA-ACC to read the data from the MRAM; and the nonvolatile FPGA-ACC is configured to transmit data to the MRAM, allowing the nonvolatile CPU to read the data from the MRAM.
3. The device according to claim 1, wherein the nonvolatile FPGA-ACC comprises a nonvolatile memory function, which stores configuration data.
4. The device according to claim 1, wherein each memory cell in the MRAM is configured with 2T-2MTJ including two selection transistors and two MTJs.
5. The device according to claim 1, wherein each of the plurality of regions in the MRAM is configured with divided blocks and power-gating is performed for each of the divided blocks.
6. The device according to claim 1, wherein the nonvolatile FPGA-ACC comprises: a tile configured to perform a part of operations on the nonvolatile CPU instead of the nonvolatile CPU; and a DSP (Digital Signal Processor) configured to perform a part of the operations on the nonvolatile CPU instead of the nonvolatile CPU faster than the tile.
7. The device according to claim 1, further comprising: an access controller configured to control an access to the MRAM by reading and storing data in advance and backing up the data when the data are to be read from the MRAM.
8. The device according to claim 7, wherein the access controller receives a data-reading instruction from the nonvolatile CPU and determines whether or not the data have been read from the MRAM in advance, and if the data have been already read, the access controller transmits the stored data to the nonvolatile CPU.
9. The device according to claim 7, wherein the access controller comprises: an address-storing register configured to receive an input about an MRAM address that is a reading destination, the address-storing register storing the address; a multiplexer configured to output multiple destinations of the MRAM stored in the address-storing register to the MRAM for reading; multiple data-storing registers configured to store data read from the MRAM; and a comparator configured to receive a reading instruction together with a specified reading destination and to compare an address related to the specified reading destination with a reading address stored in the address-storing register, wherein the access controller is configured to receive a reading instruction together with a specified reading destination and to output data already read and stored in any one of the plurality of data-storing registers in response to the reading instruction when the comparator determines the data have been read from the MRAM in advance.
10. The access controller according to claim 9, further comprising: a prefetch address generator connected to the multiplexer, wherein the prefetch address generator generates an address including a reading address destination stored in the address-storing register.
11. The device according to claim 1, being usable when an average power is no more than 100 μW.
12. The device according to claim 1, wherein an intermittent operation interval can be set to no more than 100 ms.
13. A sensor node, configured to include the device according to claim 1.
14. An access controller, comprising: an address-storing register configured to receive an input about an MRAM address that is a reading destination, the address-storing register storing the address; a multiplexer configured to output multiple destinations of the MRAM stored in the address-storing register to the MRAM for reading; multiple data-storing registers configured to store data read from the MRAM; and a comparator configured to receive a reading instruction together with a specified reading destination and to compare an address related to the specified reading destination with a reading address stored in the address-storing register, wherein the access controller is configured to receive a reading instruction together with a specified reading destination and to output data already read and stored in any one of the data-storing registers in response to the reading instruction when the comparator determines the data have been read from the MRAM in advance.
15. The access controller according to claim 14, further comprising: a prefetch address generator connected to the multiplexer, wherein the prefetch address generator is configured to generate an address including a reading address destination stored in the address-storing register.
16. A data transfer method between a CPU and an MRAM via an access controller, comprising: the access controller receiving a data-reading instruction from the CPU together with a reading address; the access controller reading multiple address data including the reading address from the MRAM in advance; the access controller receiving a data-reading instruction from the CPU together with a next reading address; and the access controller determining whether or not the data have been already read and responding to the reading instruction, if the reading instruction is for an already-read data, using the data read in advance without performing data-reading from the MRAM.
17. A processing method in a microcontroller comprising a nonvolatile CPU, an MRAM, and a nonvolatile FPGA-ACC as a reconfigurable computing module, wherein the MRAM is configured with a region to store data transmitted between the nonvolatile CPU and the nonvolatile FPGA-ACC, and the method comprises: the MRAM writing data into the region, the data being input from the nonvolatile CPU to the nonvolatile FPGA-ACC; the nonvolatile CPU passing, to the nonvolatile FPGA-ACC, a signal indicating completion of preparation for the writing and the start of a calculation; the nonvolatile FPGA-ACC starting an operation by using the data written in the region; and the nonvolatile CPU receiving, through the region, an operation result computed by the nonvolatile FPGA-ACC.
18. The processing method in the microcontroller according to claim 17, wherein the nonvolatile CPU and the nonvolatile FPGA-ACC are subject to power-gating control.
19. The processing method in the microcontroller according to claim 18, wherein the power-gating control does not include data-store or restore operations.
20. The processing method in the microcontroller according to claim 17, wherein computing by the nonvolatile FPGA-ACC relates to any one of processing of a Laplacian filter, a DCT (Discrete Cosine Transform), an FIR (Finite Impulse Response) filter, and an FFT (Fast Fourier Transform).
21. The processing method in the microcontroller according to claim 17, wherein the nonvolatile CPU and a nonvolatile FPGA-based accelerator process sequential processing in parallel.
Description
DESCRIPTION OF EMBODIMENTS
[0092] Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. Variations and modifications related to the description of the embodiment can be made without departing from the scope of the present invention.
[0094] In device 1 according to the present invention, specifically, in the MCU (Microcontroller Unit), NV-CPU 3 transmits data to STT-MRAM 2, allowing NV-FPGA 4 to read the data from STT-MRAM 2; and NV-FPGA 4 transmits data to STT-MRAM 2, allowing NV-CPU 3 to read the data from STT-MRAM 2. That is, the following operations are performed: STT-MRAM 2 stores the results computed by NV-CPU 3; using the results stored in STT-MRAM 2, NV-FPGA 4 further performs computing and returns the results to STT-MRAM 2; and NV-CPU 3 receives from STT-MRAM 2 the results computed by NV-FPGA 4.
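The CPU-to-accelerator round trip through STT-MRAM 2 (also the subject of claim 17) can be pictured as a shared buffer plus two flags. The following is a minimal Python sketch, not the chip's actual protocol: `SharedRegion`, the `start`/`done` flags, and the `fir3` kernel are hypothetical names chosen for illustration, and the MRAM region is modelled as plain Python lists.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SharedRegion:
    """Hypothetical model of the MRAM region used for NV-CPU <-> NV-FPGA exchange."""
    inputs: List[int] = field(default_factory=list)
    outputs: List[int] = field(default_factory=list)
    start: bool = False  # set by the CPU once the input data are in place
    done: bool = False   # set by the accelerator once results are written back


def cpu_submit(region: SharedRegion, data: List[int]) -> None:
    region.inputs = list(data)  # CPU writes its intermediate results into the region
    region.done = False
    region.start = True         # then signals that preparation is complete


def fpga_run(region: SharedRegion, kernel: Callable[[List[int]], List[int]]) -> None:
    if not region.start:
        return                  # nothing to do until the CPU has signalled
    region.outputs = kernel(region.inputs)  # compute from the data in the region
    region.start = False
    region.done = True


def cpu_collect(region: SharedRegion) -> List[int]:
    if not region.done:
        raise RuntimeError("accelerator has not finished yet")
    return list(region.outputs)  # CPU reads the result back from the same region


def fir3(xs: List[int]) -> List[int]:
    """Hypothetical kernel: y[n] = x[n] + x[n-1] + x[n-2] with zero initial state."""
    padded = [0, 0] + list(xs)
    return [padded[i] + padded[i + 1] + padded[i + 2] for i in range(len(xs))]
```

For example, submitting `[1, 2, 3, 4]`, running `fir3`, and collecting returns `[1, 3, 6, 9]`. Because both sides only touch the shared region, either one can be power-gated between steps without losing the exchanged data, which is the point of placing the region in nonvolatile MRAM.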
[0095] The embodiment of the present invention can provide a microcomputer suitable for a sensor node and the like that achieves both high performance (for example, an operating frequency of about 200 to 300 MHz) and low power consumption (for example, no more than 100 μW). For low power consumption, a nonvolatile memory other than an MRAM may have a certain effect because it can reduce the standby power. However, when a nonvolatile memory other than an MRAM is used, high-speed data writing and reading cannot be achieved.
[0096] Thus, to achieve high performance at an operating frequency of several hundred MHz, the embodiment of the present invention employs an MRAM, preferably STT-MRAM 2, in the MCU of device 1 as the region that stores data related to computing by the CPU and the FPGA.
[0097] Device 1 may be referred to as a nonvolatile microcomputer chip, a nonvolatile microcomputer, or a nonvolatile microcontroller unit.
[0098] STT-MRAM 2 is configured to include multiple memory cells, each including selection transistors and MTJs (Magnetic Tunnel Junctions), separated into multiple regions. Preferably, STT-MRAM 2 is configured with multiple sub-array blocks, and each block has a switch to turn ON/OFF the power from a power supply unit (not shown in the figure). Power gating controller 5 allows STT-MRAM 2 to be power-gated block by block. Here, a block division is one of the separated blocks in the multiple regions of the MRAM; in
[0099] In NV-CPU 3, the memories installed in the module are all nonvolatile. NV-CPU 3 has a switch to turn ON/OFF the power for the whole module from a power supply unit (not shown in the figure). Because NV-CPU 3 is constituted of nonvolatile memories, there is no need to back up or write back data in the CPU when the switch is turned ON/OFF (that is, whenever it is power-gated), and power-gating control can be performed without that overhead. Since no data are backed up or written back, no power is consumed for those operations. This is especially effective in a device that intermittently executes a certain number of operations and enters a standby state between processes, such as an IoT sensor node, because no data backup or write-back is required in power-gating control.
[0100] NV-FPGA 4 is configured to include a nonvolatile memory. Each tile in NV-FPGA 4 has a power-gating switch that turns ON/OFF the power from a power supply unit (not shown in the figure). In addition, when a DSP is installed in NV-FPGA 4, the DSP has its own switch to turn ON/OFF the power supplied to it from a power supply unit (not shown in the figure). Because NV-FPGA 4 includes a nonvolatile memory, there is no need to back up or write back data in the FPGA when the switches are turned ON/OFF (that is, whenever it is power-gated), nor to save or rewrite the configuration data, so no power is consumed for these operations. A conventional device with a volatile FPGA, rather than a nonvolatile one, must back up and rewrite both the data and the configuration data whenever it is power-gated. In the embodiment of the present invention, the FPGA is nonvolatile, so neither is required.
[0101] Power gating controller 5 controls the power supply to each MRAM sub-array in STT-MRAM 2, to NV-CPU 3, and to each tile and the DSP in NV-FPGA 4, and supplies power only to the modules designated as operating.
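The per-domain control described in [0101] can be modelled as one switch per power domain, with only the designated domains switched on. This is a minimal Python sketch under stated assumptions: the class name `PowerGatingController` and the domain labels (for example `mram_sub0` or `tile0`) are hypothetical, and the number of tiles and DSPs in any usage is illustrative only.

```python
from typing import Dict, Iterable, List


class PowerGatingController:
    """Sketch: one power switch per domain (MRAM sub-array, CPU, FPGA tile, DSP)."""

    def __init__(self, domains: Iterable[str]) -> None:
        self.switch: Dict[str, bool] = {d: False for d in domains}  # all OFF initially

    def apply(self, active: Iterable[str]) -> None:
        """Close the switches of the designated domains and open all others."""
        wanted = set(active)
        unknown = wanted - set(self.switch)
        if unknown:
            raise KeyError(f"unknown domains: {sorted(unknown)}")
        for d in self.switch:
            # No save/restore step: nonvolatile state survives with the switch open.
            self.switch[d] = d in wanted

    def powered(self) -> List[str]:
        return sorted(d for d, on in self.switch.items() if on)
```

Because every domain is nonvolatile, `apply` needs no backup or restore sequence before changing a switch; a volatile design would have to insert those steps around every transition.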
[0102] Here, in STT-MRAM 2, each memory cell is preferably configured as a 2T-2MTJ cell including two selection transistors and two MTJs. STT-MRAM cell types include 1T-1MTJ, 2T-2MTJ, and 4T-4MTJ. The 4T-4MTJ cell is not preferable because it requires peripheral circuitry for power-gating, which itself consumes power. The 1T-1MTJ and 2T-2MTJ cells, on the other hand, are suitable because their cell configurations themselves provide a power-gating function; to enhance performance, the 2T-2MTJ is more preferable in terms of the number of bits. A 2T-2MTJ cell provided with WL, BL, /BL, SL, and /SL may be used; however, a 2T-2MTJ cell provided with WL, BL, /BL, and SL, in which SL and /SL are shared, is most preferable because it reduces the layout size. Further, the writing pulse width can be adjusted according to the writing characteristics of the MTJ, which allows the writing current to be suppressed and optimized.
[0103] A concept of device 1 according to an embodiment of the present invention will be described.
[0104] Thus, this embodiment of the present invention can be implemented by using nonvolatile memories for all modules in device 1. A typical IoT sensor node intermittently executes a certain number of operations and enters a standby state between processes. Conventional CMOS-based architectures use volatile internal memories, which require data transfer between internal and external memories to back up data before the power is turned off. The embodiment does not require this backup. In addition, a device (MCU) in which only nonvolatile memories are used for all modules does not require external memories and does not need to transfer the stored data. Therefore, the power-gating technique can be applied effectively at a fine-grained level and can actively cut wasteful power consumption. Because NV-FPGA 4 includes a nonvolatile memory, there is no need to back up or write back either the data or the configuration data in the FPGA whenever it is power-gated. This is especially effective in a device that intermittently executes a certain number of operations and enters a standby state between processes, such as an IoT sensor node, because no data backup or write-back is required in power-gating control.
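The trade-off in [0104] can be made concrete with a simple energy model. Over one standby period, a volatile design must either keep its memories powered (paying retention leakage) or copy its state to external memory, power off, and restore on wake-up; a nonvolatile design can simply open its power switch. The Python sketch below is illustrative only: the function names are hypothetical, the model ignores switch leakage and wake-up energy, and any numbers passed in are placeholders rather than measurements of the chip.

```python
def volatile_sleep_energy(t_sleep_s: float, p_retention_w: float,
                          e_backup_j: float, e_restore_j: float) -> float:
    """Cheapest choice for a volatile design over one standby period:
    keep the memories powered (retention leakage) or back up to external
    memory, power off, and restore on wake-up."""
    stay_powered = p_retention_w * t_sleep_s
    backup_and_off = e_backup_j + e_restore_j
    return min(stay_powered, backup_and_off)


def nonvolatile_sleep_energy(_t_sleep_s: float) -> float:
    """MTJ-based modules keep their state with the power switch open, so
    neither term applies. Assumption: switch leakage and wake-up energy are ignored."""
    return 0.0


def break_even_s(p_retention_w: float, e_backup_j: float, e_restore_j: float) -> float:
    """Standby length beyond which a volatile design prefers backup-and-power-off."""
    return (e_backup_j + e_restore_j) / p_retention_w
```

With placeholder values of 10 μW retention power and 3 nJ + 2 nJ for backup and restore, the break-even standby time is 0.5 ms. Below it the volatile design stays powered, and above it the design pays the fixed transfer cost. In either case it spends energy that the nonvolatile design does not.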
[0106] Thus, the sequential processing time between operation intervals can be reduced, enabling further power savings. Here, the FPGA is built by forming MTJs on CMOS, which yields a nonvolatile FPGA with ultra-low power consumption.
[0107] As described above, a concept of the present invention is to realize a microcomputer suitable for a sensor node and the like that achieves both high performance (for example, an operating frequency of about 200 to 300 MHz) and low power consumption (for example, no more than 100 μW). For low power consumption, a nonvolatile memory other than an MRAM may have a certain effect because it can reduce the standby power. However, when a nonvolatile memory other than an MRAM is used, it is impossible to achieve both high-speed data writing and reading and computing performance at an operating frequency of several hundred MHz. When an MRAM is used, on the other hand, high-speed performance through fast writing and reading and low power consumption through nonvolatility can be realized simultaneously. This is highly effective for a microcomputer configured, as in an embodiment of the present invention, with an NV-CPU, an NV-FPGA, and an MRAM memory.
[0108] Thus, to realize a microcomputer with both the high performance and the low power consumption required for a sensor node, implementing an MRAM as the memory of a microcomputer equipped with a CPU and an FPGA that demand high computing performance is highly effective.
[0109] Next, a sensor node using the device in
[0110] Sensor node 110 includes: a sensor element 111 to measure various physical quantities; an MCU 112 to process data from sensor element 111 into information; a communications unit 113 to output the information processed by MCU 112 and various control data to the outside; and a power supply 114 to convert natural energy, artificial vibrations, and the like into power and store it.
[0111] MCU 112, a device with a configuration shown in
[0112] Next, a concrete configuration of device 1 will be explained.
[0114] Ydec is arranged on both the left and right sides of the left-side array and the right-side array because the operation to read memory data differs between the left and the right. The outer Ydec supplies a constant current to the memory cell being read and generates a voltage signal corresponding to the cell state (that is, the resistance state of the MTJ). The voltage signal is amplified by a sense amplifier (SA) attached to the inner Ydec, and the data are extracted from the memory cell being read. In the diagram, a solid arrow represents a control signal and a dotted arrow represents a data signal.
[0115] STT-MRAM 11 is not provided with a switch for power-gating because there is no power supply line in each memory cell.
[0116] As shown in the right side of
[0118] As described with reference to
[0120] Thus, access controller 14 includes: an address-storing register (reg) provided at the input side of STT-MRAM 11, which receives an input about an address in STT-MRAM 11 that is a reading destination and stores the address; a multiplexer (Multiplexer) that reads multiple destinations of STT-MRAM 11 specified in the address-storing register (reg); multiple data-storing registers (reg 0, reg 1) that store data read from STT-MRAM 11; and a comparator (cmp) that receives a reading instruction together with a specified reading destination and compares its address with the reading destination address stored in the address-storing register (reg).
[0121] The data address to be used is passed from CPU 12 to access controller (Accelerator) 14 via HADDR; at that time, the address is stored in the left side register (reg) in the Accelerator of
[0122] When 16-bit instructions stored at consecutive memory addresses are executed consecutively, the above-described processing captures data for four instructions into reg 0 and reg 1 at one time. The address passed from CPU 12 via HADDR is then compared with the address stored in reg, and if the above-described conditions are satisfied, the data that were captured in reg 0 and reg 1 in advance and that correspond to the address specified in HADDR are selected by the computing unit labeled "Output control" in the figure and output to HRDATA in response to the reading instruction from CPU 12. In this case, the data requested by CPU 12 via HADDR are supplied not from MRAM 11 but from reg 0 or reg 1; therefore, they are returned not at the transfer speed possible between the Accelerator and the MRAM (for example, 50 MHz) but at the transfer speed possible between CPU 12 and the Accelerator (for example, 200 MHz).
[0123] When 32-bit instructions stored at consecutive memory addresses are executed consecutively, the same processing captures data for two instructions into reg 0 and reg 1 at one time, and the processing proceeds as described above. In this case, the data are returned at 100 MHz.
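The 200 MHz and 100 MHz figures in [0122] and [0123] follow from simple arithmetic: one MRAM access fills reg 0 and reg 1, and every other instruction in that block is then served at CPU speed. A minimal Python sketch, assuming purely sequential code with no branches and a 64-bit prefetch block (inferred from "data for four 16-bit instructions"); `effective_fetch_mhz` is a hypothetical helper, not a name from the source.

```python
def effective_fetch_mhz(f_cpu_mhz: float, f_mram_mhz: float,
                        fetch_bits: int, instr_bits: int) -> float:
    """Upper bound on the instruction delivery rate for purely sequential code.

    One MRAM access returns `fetch_bits` (reg 0 + reg 1), so the MRAM can feed
    at most f_mram * (fetch_bits // instr_bits) instructions per microsecond,
    and the CPU-side transfer rate caps the result.
    """
    instrs_per_fetch = fetch_bits // instr_bits
    return min(f_cpu_mhz, f_mram_mhz * instrs_per_fetch)
```

With a 50 MHz MRAM and a 200 MHz CPU, 16-bit instructions give min(200, 50 × 4) = 200 MHz and 32-bit instructions give min(200, 50 × 2) = 100 MHz, matching the text. Branches break the sequential assumption, so real programs fall somewhere below this bound.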
[0124] Accesses from CPU 12 to MRAM 11 are multiplexed, and the read data are temporarily saved in registers (reg 0, reg 1). If accesses to the same memory address are repeated, the data stored in the registers are reused instead of the memory data.
[0125] In
[0127] Thus, speed enhancement is achieved in a so-called best-effort manner, and the degree of actual performance improvement depends on the program being executed. The scheme still works in programs where conditional branching frequently causes random memory accesses; however, it is most effective for sequential processing, such as that of an MCU in a sensor node application. In such processing, memory accesses are also regular, so the access controller functions very effectively. As a result, effects similar to those of a cache can be obtained without significantly increasing the area or power overhead.
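The behavior described in [0120] to [0127] can be approximated by a small behavioral model. On a miss, one MRAM access fills both data registers. On a hit, when the comparator matches the stored block address, the data come from the registers. This Python sketch is not the patented circuit. It assumes 32-bit data registers, a prefetch of the aligned 8-byte block containing the requested address (the actual multiplexer and prefetch address generator may choose addresses differently), and a byte-addressed toy memory. `AccessController` and `replay` are hypothetical names.

```python
from typing import Iterable, Optional, Tuple

WORD_BYTES = 4                  # assumption: reg 0 and reg 1 each hold one 32-bit word
BLOCK_BYTES = 2 * WORD_BYTES    # one MRAM access fills reg 0 and reg 1 together


class AccessController:
    """Behavioral sketch of the prefetching access controller (not the actual RTL)."""

    def __init__(self, mram: bytes) -> None:
        self.mram = mram
        self.block_base: Optional[int] = None  # plays the role of the address register "reg"
        self.buffer = b""                       # contents of reg 0 and reg 1
        self.mram_reads = 0
        self.hits = 0

    def read(self, addr: int, size: int) -> int:
        base = addr - addr % BLOCK_BYTES
        if addr + size > base + BLOCK_BYTES:
            raise ValueError("access straddles a prefetch block")
        if base == self.block_base:
            self.hits += 1                      # comparator match: serve from reg 0/reg 1
        else:
            self.buffer = self.mram[base:base + BLOCK_BYTES]  # one slow MRAM access
            self.block_base = base
            self.mram_reads += 1
        offset = addr - base
        return int.from_bytes(self.buffer[offset:offset + size], "little")


def replay(addrs: Iterable[int], size: int,
           mram: bytes = bytes(range(64))) -> Tuple[int, int]:
    """Run an address trace and return (MRAM reads, register hits)."""
    ac = AccessController(mram)
    for a in addrs:
        ac.read(a, size)
    return ac.mram_reads, ac.hits
```

Sixteen sequential 16-bit fetches (`replay(range(0, 32, 2), 2)`) cost only 4 MRAM accesses, with 12 hits. A branch-heavy trace that alternates between distant blocks, such as `[0, 40, 2, 42, 4, 44]`, misses every time. This mirrors the observation in [0127] that regular, sequential access patterns benefit most.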
[0128] This access controller will now be described in more detail. As an example, a test chip for a nonvolatile VLSI processor has been fabricated using a 40 nm MOS/MTJ process. It can be designed using an automatic design flow and a cell library for an MTJ-based NV-LIM LSI. The area overhead due to the introduction of the accelerator circuit is estimated at 13.6% on the basis of the gate count of each block. Note that the blocks, which in this case were designed separately to evaluate the overhead, can be integrated and laid out as one circuit block, in which case the area overhead is expected to be even smaller.
[0131] Low-performance MRAM (LP-MRAM): reading/writing at 50 MHz.
[0132] Middle-performance MRAM (MP-MRAM): reading/writing at 100 MHz.
[0133] High-performance MRAM (HP-MRAM): reading/writing at 200 MHz.
[0134] Note that all the MRAMs are designed in the same manner.
[0136] Here, performance with and without the accelerator circuit is compared. The evaluation covers area, power consumption, and processing performance for three types of systems using the MRAMs of different performance described above, a system with a cache, and a system with the accelerator circuit.
TABLE 1
                         Conventional Example                                Present Example
Performance              w/ LP-MRAM  w/ MP-MRAM  w/ HP-MRAM  w/ cache      w/ LP-MRAM
Area ratio               1.0         1.0         1.0         2.36          1.03
Voltage (V)              1.1         1.1         1.1         1.1           1.1
Frequency (MHz)          50          100         200         50/200        50/100/200
Peak perf. (MIPS)        49.56       99.12       198.24      198.24        198.24
Power (mW)               2.014       2.702       3.524       2.487         2.170
Peak efficiency ratio    1           1.49        2.29        3.24          3.71
Temperature range (°C)   0-100       30-100      70-100      0-100         0-100
[0137] As shown in TABLE 1, in the implementations with a middle- or high-performance MRAM, the MRAM consumes more power, which limits their efficiency and narrows the temperature range in which operation is ensured. In the implementation with a cache, higher performance can be expected; however, the area overhead becomes very large. In the implementation with the accelerator circuit, on the other hand, the accelerator unit can be embedded with a small area overhead, and the operating frequency of the CPU can be increased without changing the performance requirements for the MRAM. As a result, area efficiency can be improved. Thus, compared with the conventional implementation using a high-performance MRAM, the performance efficiency (MIPS/mW, relative to the LP-MRAM baseline) improves from 2.29 times to 3.71 times, and reading and writing operations are ensured over a wide temperature range.
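The "Peak efficiency ratio" row of TABLE 1 is peak MIPS divided by power (MIPS/mW), normalized to the LP-MRAM system. Recomputing it from the table's own MIPS and mW rows is a quick consistency check. In this Python sketch, the dictionary keys are labels chosen for this sketch rather than names from the source.

```python
from typing import Dict, Tuple

# Peak performance (MIPS) and power (mW), copied from TABLE 1.
TABLE1: Dict[str, Tuple[float, float]] = {
    "LP-MRAM": (49.56, 2.014),
    "MP-MRAM": (99.12, 2.702),
    "HP-MRAM": (198.24, 3.524),
    "cache": (198.24, 2.487),
    "LP-MRAM + access controller": (198.24, 2.170),
}


def efficiency_ratios(table: Dict[str, Tuple[float, float]],
                      baseline: str = "LP-MRAM") -> Dict[str, float]:
    """MIPS/mW of each system divided by that of the baseline, rounded to 2 places."""
    base_mips, base_mw = table[baseline]
    base_eff = base_mips / base_mw
    return {name: round((mips / mw) / base_eff, 2) for name, (mips, mw) in table.items()}
```

The recomputed ratios are 1.0, 1.49, 2.29, 3.24, and 3.71, matching the table. The 2.29 value corresponds to the HP-MRAM column and 3.24 to the cache column.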
[0138] The performance of the access controller varies depending on the program to be executed; however, a benchmark using several sample programs showed that the access controller enables operation at more than about 100 MHz even for filter operations involving relatively large-capacity memory accesses, and that it is very effective in programs with relatively few memory accesses or branches.
[0139] Next, the NV-CPU will be explained. All the flip-flops used for the NV-CPU are MTJ-based nonvolatile flip-flops. Since they are nonvolatile, there is no need to back up data for power-gating.
[0142] In
[0143] The FF circuit can be divided into three main units: a master unit, a slave unit, and a nonvolatile memory unit. The master unit captures an input signal DATA when the clock is “0” and passes the signal to the slave unit when the clock is “1”. The slave unit captures the DATA from the master unit and outputs it to “Q” and “QB” when the clock is “1”; it does nothing when the clock is “0”. Together, the master unit and the slave unit work as a conventional master-slave D flip-flop.
[0144] The nonvolatile memory unit includes two MTJ elements that store one bit in a complementary manner, and a writing circuit that generates the current to write data into the MTJ elements. Depending on the control signal LB or WB, the nonvolatile memory unit either writes the data captured in the slave unit into the nonvolatile memory or reads the data from the nonvolatile memory into the slave unit.
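A behavioral model helps show why this flip-flop survives power-gating. The master and slave act as an ordinary D flip-flop, and the complementary MTJ pair keeps the bit while the CMOS latches are unpowered. This Python sketch is an abstraction, not a circuit description. The source does not say which of LB and WB triggers writing or loading, so the mapping used here (WB writes and LB loads) is an assumption. The class and method names are hypothetical.

```python
class NonvolatileDFF:
    """Behavioral sketch of an MTJ-backed master-slave D flip-flop."""

    def __init__(self) -> None:
        self.master = 0
        self.slave = 0
        self.mtj_pair = (0, 1)  # two MTJs hold one bit complementarily: (b, not b)
        self.powered = True

    def clock(self, clk: int, data: int) -> None:
        if not self.powered:
            return
        if clk == 0:
            self.master = data         # master captures DATA while CLK = 0
        else:
            self.slave = self.master   # slave takes the master's value while CLK = 1

    @property
    def q(self) -> int:
        return self.slave

    @property
    def qb(self) -> int:
        return 1 - self.slave

    def store(self) -> None:
        """Write the slave's bit into the MTJ pair (assumed to be the WB operation)."""
        self.mtj_pair = (self.slave, 1 - self.slave)

    def power_off(self) -> None:
        self.powered = False
        self.master = self.slave = 0   # CMOS latches lose their state (modelled as 0)

    def power_on(self) -> None:
        """Restore the slave from the MTJ pair (assumed to be the LB operation)."""
        bit, bit_b = self.mtj_pair
        if bit == bit_b:
            raise ValueError("MTJ pair must be complementary")
        self.powered = True
        self.slave = self.master = bit
```

Clocking in a 1, calling `store()`, and then cycling the power gives `q == 1` again. No external backup memory is involved, because the bit never leaves the flip-flop.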
[0145] Next, an NV-FPGA will be explained in detail.
[0147] A routing switch, the basic component of the above-described CB and SB, is a circuit that turns a path transistor ON/OFF on the basis of memory data. The memory data are stored in an area-efficient MTJ-based latch. The path transistor is implemented using an NMOS switch. The controller is used to perform power-gating at the block level, so that each function block is turned off optimally.
[0148] The switch block (SB) and the connection block (CB) are both configured to include a basic component referred to as a routing switch, which includes a nonvolatile storage area.
[0149] Note that the tile includes a decoder and a driver, with which the tile can be configured as a reconfigurable computing module after fabrication.
[0150] The configurable logic block (CLB) is configured to include a basic component referred to as a logic element, which is constituted of a nonvolatile Lookup Table circuit (nonvolatile LUT circuit) and a nonvolatile flip-flop (nonvolatile FF), both having a nonvolatile memory function.
[0152] A truth table for an arbitrary 6-input logical function is stored in 64 pairs of series-connected MTJ devices in the MTJ configuration array, such as $(R_0, R_{64})$, $(R_1, R_{65})$, ..., $(R_{63}, R_{127})$. Logical function information is written into the MTJ devices by activating a word line (WL) and a bit line (BL), almost the same as a writing operation in a conventional magnetic RAM (MRAM). $BL_0$ and $BL_2$ are shared between the MTJ configuration array and a programmable calibration resistor. A writing access transistor Mwc is shared among the 64 MTJ pairs in the MTJ configuration array. The logical operation of the LUT circuit is completely different from the reading operation of the MRAM because neither the BL nor the WL is used in the operation.
[0153] When EN is set high and both an NMOS selector and an NMOS reference tree are activated by the complementary logic inputs X, currents $I_F$ and $I_{REF}$ pass through the corresponding MTJ pair in the MTJ configuration array and through the programmable calibration resistor, respectively. The sense amplifier senses the difference between $I_F$ and $I_{REF}$ and generates complementary full-swing outputs (Z, Z′).
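The LUT evaluation in [0152] and [0153] can be sketched as follows. A 6-bit input selects one of 64 series MTJ pairs, the pair's resistance sets $I_F$, and comparing $I_F$ with $I_{REF}$ yields (Z, Z′). This Python sketch abstracts away the NMOS selector tree and the sense amplifier. The resistances, the read voltage, the input bit ordering, and all function names are hypothetical illustration values.

```python
from typing import Callable, List, Sequence, Tuple

R_P, R_AP = 5e3, 10e3   # hypothetical parallel / antiparallel MTJ resistances (ohms)
V_READ = 0.1            # hypothetical read bias across the selected pair (volts)
# Reference current placed midway between the two possible pair currents.
I_REF = 0.5 * (V_READ / (2 * R_P) + V_READ / (2 * R_AP))


def program_lut(func: Callable[..., int]) -> List[Tuple[float, float]]:
    """Store a 6-input truth table as 64 series MTJ pairs: Y=1 -> (R_P, R_P), Y=0 -> (R_AP, R_AP)."""
    pairs = []
    for x in range(64):
        bits = [(x >> i) & 1 for i in range(6)]   # assumption: bit i of x is input i
        pairs.append((R_P, R_P) if func(*bits) else (R_AP, R_AP))
    return pairs


def evaluate(pairs: Sequence[Tuple[float, float]], bits: Sequence[int]) -> Tuple[int, int]:
    """Select the pair addressed by the inputs and compare I_F with I_REF."""
    x = sum(b << i for i, b in enumerate(bits))
    r_a, r_b = pairs[x]
    i_f = V_READ / (r_a + r_b)   # current through the selected series pair
    z = 1 if i_f > I_REF else 0  # low resistance (more current) reads as 1
    return z, 1 - z              # complementary outputs (Z, Z')
```

For example, programming a 6-input parity function and evaluating all 64 input combinations reproduces the parity of each input vector. Unlike an MRAM read, no word line or bit line is driven during evaluation.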
[0154] To ensure a sufficient sensing margin, the series/parallel-connected MTJ devices in the MTJ configuration array and the programmable calibration resistor are configured as follows. First, in the MTJ configuration array, a pair storing Y = 0 is configured as $(R_{AP}, R_{AP})$, and a pair storing Y = 1 is configured as $(R_P, R_P)$. When the resistance of an MTJ device follows a Gaussian distribution $\mathcal{N}(R, \sigma_R^2)$ (where $R$ is the mean and $\sigma_R$ the standard deviation), the total resistance of two series-connected MTJ devices follows $\mathcal{N}(2R, 2\sigma_R^2)$. Because the mean doubles while the standard deviation grows only by a factor of $\sqrt{2}$, the relative spread of the resistance distribution narrows, which helps avoid an overlap between the $(R_P, R_P)$ and $(R_{AP}, R_{AP})$ states.
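The margin gain can be written out under two assumptions: the two series MTJs vary independently (as the $\mathcal{N}(2R, 2\sigma_R^2)$ result already requires), and the $R_P$ and $R_{AP}$ states share the same $\sigma_R$. Here $R_P$ and $R_{AP}$ denote the mean resistances of the two states. This is a sketch of the separation argument, not a full margin analysis:

```latex
\begin{align*}
R_1, R_2 &\sim \mathcal{N}(R, \sigma_R^2)\ \text{(independent)}
  \;\Rightarrow\; R_1 + R_2 \sim \mathcal{N}\!\left(2R,\, 2\sigma_R^2\right), \\
\frac{2R_{AP} - 2R_P}{\sqrt{2}\,\sigma_R} &= \sqrt{2}\,\frac{R_{AP} - R_P}{\sigma_R}.
\end{align*}
```

Measured in units of their own spread, the two pair states are therefore $\sqrt{2}$ times farther apart than the two states of a single MTJ.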
[0155] Next, in the programmable calibration resistor, the total resistance is adjusted so that $I_{REF}$ lies midway between $I(R_P, R_P)$ and $I(R_{AP}, R_{AP})$. Because each MTJ device has two resistance values, four MTJ devices ($R_{r0}$, $R_{r1}$, $R_{r2}$, $R_{r3}$) provide 16 different reference resistance values. The total resistance can thus be adjusted to follow fluctuations in the $I_F$ current level caused by process variation. Note that $R_P$ represents the low resistance and $R_{AP}$ the high resistance of the MTJ device.
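Calibration amounts to choosing, from the 16 available settings, the one whose $I_{REF}$ is closest to the midpoint between the two pair currents. The source does not specify how the four calibration MTJs are connected or sized. This Python sketch assumes, purely for illustration, that they are in series and binary-weighted, with $R_{AP} = 2R_P$ for each device, so that all 16 settings give distinct totals. The values and function names are hypothetical.

```python
from typing import Tuple

V_READ = 0.1        # hypothetical read bias (V)
BASE_OHM = 600.0    # hypothetical: calibration MTJ k has R_P = BASE_OHM * 2**k, R_AP = 2 * R_P


def reference_resistance(code: int) -> float:
    """Series sum of four calibration MTJs; bit k of `code` = 1 puts MTJ k in the AP state."""
    total = 0.0
    for k in range(4):
        r_p = BASE_OHM * 2 ** k
        total += 2 * r_p if (code >> k) & 1 else r_p
    return total


def calibrate(r_p: float, r_ap: float) -> Tuple[int, float]:
    """Pick the code whose I_REF is closest to the midpoint of I(R_P,R_P) and I(R_AP,R_AP)."""
    i_one = V_READ / (2 * r_p)    # stored Y = 1 -> (R_P, R_P)
    i_zero = V_READ / (2 * r_ap)  # stored Y = 0 -> (R_AP, R_AP)
    target = 0.5 * (i_one + i_zero)
    best = min(range(16), key=lambda c: abs(V_READ / reference_resistance(c) - target))
    return best, V_READ / reference_resistance(best)
```

With LUT devices of 5 kΩ and 10 kΩ, the sketch selects code 7. If process variation shifts them to 5.5 kΩ and 11 kΩ, it selects code 9, and the new $I_{REF}$ still falls between the two pair currents.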
[0161] In the nonvolatile FF circuit shown in
[0162] Preferably, a DSP (Digital Signal Processor) is incorporated. Using a DSP enables even relatively large volumes of computation. The DSP is also provided with a power switch (PS) and a controller to turn the power ON/OFF, enabling each tile to be power-gated. As in the tile, the switch block and the connection block in the DSP are configured to include a basic component, referred to as a routing switch, which includes a nonvolatile storage area.
[0163] As described above, each basic component in the NV-FPGA includes a nonvolatile memory, which stores the configuration data. In addition, the nonvolatile memory also stores the state of each flip-flop. Therefore, there is no need to back up data to an external nonvolatile memory immediately before turning off the power or to write the data back after turning the power on again, so the power can easily be turned ON/OFF. By writing a given computation into the nonvolatile FPGA in advance and turning on the power on an as-needed basis, computing can start immediately and CPU processing can be accelerated. In addition, turning off the power during periods of non-use avoids wasteful power consumption.
[0164] In the circuit configuration of a DSP core in
when (SEL [0], SEL [1])=(0, 0), OUT=A×B;
when (SEL [0], SEL [1])=(0, 1), OUT=A×B+C;
when (SEL [0], SEL [1])=(1, 0), OUT=A+B; and
(SEL [0], SEL [1])=(1, 1) is not used.
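The mode table above maps directly onto a small dispatch function. This Python sketch models only the arithmetic selected by (SEL[0], SEL[1]). Python integers are unbounded, whereas the real DSP has finite operand widths. The `fir_output` helper is a hypothetical usage example that chains multiply-accumulate operations.

```python
from typing import Sequence


def dsp_core(sel0: int, sel1: int, a: int, b: int, c: int = 0) -> int:
    """Function selected by (SEL[0], SEL[1]) as listed above."""
    mode = (sel0, sel1)
    if mode == (0, 0):
        return a * b          # multiply
    if mode == (0, 1):
        return a * b + c      # multiply-accumulate
    if mode == (1, 0):
        return a + b          # add
    raise ValueError("(SEL[0], SEL[1]) = (1, 1) is not used")


def fir_output(coeffs: Sequence[int], samples: Sequence[int]) -> int:
    """Hypothetical use: one FIR output computed as a chain of MAC operations."""
    acc = 0
    for h, x in zip(coeffs, samples):
        acc = dsp_core(0, 1, h, x, acc)
    return acc
```

For coefficients `[1, 2, 3]` and samples `[4, 5, 6]`, the MAC chain produces 4 + 10 + 18 = 32. The multiply-accumulate mode is what makes the DSP suited to the FIR and DCT workloads evaluated later.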
[0165] Here, any circuit configuration can be used and other configurations are possible.
[0166] Thus, NV-FPGA 4 preferably has one or more tiles to perform a part of operations on CPU 3 and a DSP (Digital Signal Processor) to perform a part of operations on CPU 3 faster than the tile because they can satisfy both low power consumption and high performance as shown in
Implementation Examples
[0167] Next, implementation examples will be explained. TABLE 2 provides specifications for chips actually fabricated.
TABLE 2
MOS Tech. Node              40-nm (LVT, SVT, HVT)
MTJ Tech. Node              39-nm perpendicular (electrically determined size)
Supply Voltage              1.1-1.3 V (core), 1.8/3.3 V (peripherals)
MRAM Capacity               64 kB (4 kB sub-array × 16)
MRAM Sub-Array Structure    2T-2MTJ cell, 256 cols. × 64 rows × 2
FPGA Capacity               1,176 six-input LUTs, 7 DSPs
Transistor Count            4.8M
MTJ Count                   1.5M
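The MRAM entries in TABLE 2 can be cross-checked against each other. This Python sketch assumes that each 2T-2MTJ cell stores exactly one bit, that the "× 2" factor refers to the left-side and right-side arrays of a sub-array, and that kB means 1024 bytes. None of these is stated explicitly in the table.

```python
BITS_PER_KB = 1024 * 8  # assumption: kB means 1024 bytes


def subarray_bits(cols: int, rows: int, arrays: int) -> int:
    # Assumption: one 2T-2MTJ cell (two complementary MTJs) stores exactly one bit.
    return cols * rows * arrays


sub_bits = subarray_bits(256, 64, 2)   # 256 cols x 64 rows x 2, from TABLE 2
sub_kb = sub_bits / BITS_PER_KB        # capacity of one sub-array
total_kb = 16 * sub_kb                 # 16 sub-arrays
```

Under these assumptions, one sub-array holds 32,768 bits = 4 kB, and 16 sub-arrays give the 64 kB listed in the table.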
[0172] Without power-gating, the average power consumption stays high, at about 1000 μW, regardless of the intermittent operation interval. With power-gating, on the other hand, the longer the intermittent operation interval, the lower the average power consumption, and power-gating reduces power consumption significantly. Further, with both power-gating and acceleration processing by the FPGA, the average power consumption likewise decreases as the interval grows, and for the same intermittent operation interval it is far lower than without the FPGA.
[0173] When the intermittent operation interval is 50 msec, the average power consumption is 100 μW with power-gating alone and 47.14 μW with both power-gating and the FPGA, a reduction in power consumption of 54% compared with the case using power-gating without the FPGA.
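The trend described in [0172] and [0173] follows from a first-order duty-cycle model in which each intermittent-operation interval contains one task followed by an idle period. The model, the class name `Scenario`, and all parameter values below are hypothetical; they reproduce the qualitative shape of the reported results, not the measured values.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    p_active_uw: float  # power while the task runs [uW]
    t_active_ms: float  # task duration per interval [ms]
    p_idle_uw: float    # power between tasks [uW]

    def average_power_uw(self, interval_ms: float) -> float:
        """Time-weighted average power over one intermittent-operation interval."""
        if interval_ms < self.t_active_ms:
            raise ValueError("interval is shorter than the task itself")
        idle_ms = interval_ms - self.t_active_ms
        energy = self.p_active_uw * self.t_active_ms + self.p_idle_uw * idle_ms
        return energy / interval_ms


# Hypothetical parameters, NOT measured values from the fabricated chip.
no_pg = Scenario(p_active_uw=1000.0, t_active_ms=2.0, p_idle_uw=1000.0)   # no gating: idle stays high
pg = Scenario(p_active_uw=1000.0, t_active_ms=2.0, p_idle_uw=5.0)         # gated: small residual power
pg_fpga = Scenario(p_active_uw=1200.0, t_active_ms=0.5, p_idle_uw=5.0)    # accelerator: shorter task

for t in (10.0, 50.0, 200.0):
    print(t, [round(s.average_power_uw(t), 2) for s in (no_pg, pg, pg_fpga)])
```

With these made-up parameters, the case without power-gating stays at 1000 μW for every interval, while both power-gated cases fall as the interval grows, and the accelerated case stays lowest because the active period is shorter.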
[0174] The results shown in
[0176] Thus,
[0177] These results indicate that, in a device fabricated according to the present invention in which the MRAM, the NV-CPU, and the NV-FPGA are configured with MTJ-based nonvolatile memory, wasteful power consumption in inactive units can be cut by the power-gating technique: because the data stored in the memory cells of the MRAM, the NV-CPU, and the NV-FPGA need not be backed up, the power-gating controller can simply stop the power supply to each inactive module among the MRAM, the NV-CPU, and the NV-FPGA. In addition, the reconfigurable computing module in the FPGA executes various signal processing at high speed, and the access controller enables efficient data transfer between the NV-CPU and the MRAM, allowing the whole system to operate at high speed. It has thus been shown that a low-power, high-performance microcomputer can be provided as such a device.
[0179] TABLE 3 lists, for processing by a Laplacian filter, a DCT (Discrete Cosine Transform), an FIR (Finite Impulse Response) filter, and an FFT (Fast Fourier Transform), the numbers of tiles, DSPs, LUTs, and FFs used, the maximum operating frequency, and the power during 200-MHz operation.
TABLE 3
Configured application | Tiles | DSPs | LUTs | FFs | Max frequency [MHz] | Power @ 200 MHz operation [mW]
Laplacian filter | 63 | 1 | 501 | 325 | 228 | 3.21
DCT | 71 | 2 | 566 | 297 | 253 | 3.50
FIR filter | 94 | 1 | 752 | 356 | 205 | 4.57
FFT | 38 | 2 | 302 | 199 | 236 | 1.94
(Tiles, DSPs, LUTs, and FFs give the device utilization.)
[0180] The maximum operating frequency exceeds 200 MHz for every kind of operation, and for the DCT it exceeds 250 MHz. The power at 200 MHz decreases in the order FIR filter, DCT, Laplacian filter, FFT. As for the resources used by each operation, the usage count increases in the order DSPs, tiles, FFs, LUTs.
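The observations in [0180] can be checked directly against TABLE 3, and the LUT and DSP counts can be related to the FPGA capacity listed in TABLE 2. The sketch below only re-enters the published numbers; the `Row` type and the utilization percentages are an added illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    tiles: int
    dsps: int
    luts: int
    ffs: int
    fmax_mhz: int
    power_mw: float  # power during 200-MHz operation


TABLE3 = {
    "Laplacian filter": Row(63, 1, 501, 325, 228, 3.21),
    "DCT": Row(71, 2, 566, 297, 253, 3.50),
    "FIR filter": Row(94, 1, 752, 356, 205, 4.57),
    "FFT": Row(38, 2, 302, 199, 236, 1.94),
}
TOTAL_LUTS, TOTAL_DSPS = 1176, 7  # FPGA capacity from TABLE 2

# Every configuration runs above 200 MHz; the DCT above 250 MHz.
assert all(r.fmax_mhz > 200 for r in TABLE3.values())
assert TABLE3["DCT"].fmax_mhz > 250

# Resource usage increases DSPs < tiles < FFs < LUTs in every row.
assert all(r.dsps < r.tiles < r.ffs < r.luts for r in TABLE3.values())

# Applications ordered by power at 200 MHz, highest first.
by_power = sorted(TABLE3, key=lambda name: TABLE3[name].power_mw, reverse=True)
print(by_power)  # ['FIR filter', 'DCT', 'Laplacian filter', 'FFT']

# Utilization relative to the fabricated FPGA fabric.
for name, r in TABLE3.items():
    print(f"{name}: {r.luts / TOTAL_LUTS:.1%} of LUTs, {r.dsps}/{TOTAL_DSPS} DSPs")
```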
[0181] Next, the present example is compared with conventional examples.
[0182] Although the present description uses the abbreviations “NV-CPU” and “NV-FPGA”, they stand for “nonvolatile CPU” and “nonvolatile FPGA”, respectively. In addition, the term “memory cell”, as used for the NV-CPU, the NV-FPGA, and the MRAM, may also be read as “storage area”. The NV-FPGA is an FPGA-ACC shown in
[0183] The concepts of the embodiments of the present invention are as follows.
[0184] First, a device includes:
[0185] an MRAM configured to include multiple memory cells separated into multiple regions including selection transistors and MTJs;
[0186] a nonvolatile CPU configured to include a nonvolatile memory;
[0187] a nonvolatile FPGA-ACC configured to include a nonvolatile memory and execute a part of operations on the nonvolatile CPU; and
[0188] a power-gating controller that controls power supply to each memory cell in the MRAM, the nonvolatile CPU, and the nonvolatile FPGA-ACC.
[0189] This allows the device to be configured as an FPGA-based accelerator in which the nonvolatile FPGA-ACC and the nonvolatile CPU perform computations separately and the data related to those computations are stored in the MRAM. First, since both the FPGA and the CPU are nonvolatile, neither the data nor the FPGA configuration data need to be backed up or rewritten whenever the power-gating controller performs power-gating, and the data in the CPU likewise need not be backed up or rewritten (see
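Why nonvolatility matters for power-gating can be stated as a break-even condition: gating a block saves its leakage power during the idle time but costs the energy needed to save and restore its state. The sketch below assumes, as an idealization of the text, that the nonvolatile blocks need no backup or restore energy at all; the function name and all energy and leakage figures are hypothetical.

```python
def break_even_idle_us(backup_nj: float, restore_nj: float, leakage_uw: float) -> float:
    """Shortest idle time [us] for which power-gating a block saves energy.

    Gating removes `leakage_uw` of leakage for the idle time but costs
    backup_nj + restore_nj of state save/restore energy.
    Units: 1 nJ / 1 uW = 1 ms = 1000 us.
    """
    if leakage_uw <= 0:
        raise ValueError("leakage power must be positive")
    return (backup_nj + restore_nj) / leakage_uw * 1000.0


# Hypothetical figures for illustration only (not from the patent).
volatile = break_even_idle_us(backup_nj=50.0, restore_nj=50.0, leakage_uw=100.0)
nonvolatile = break_even_idle_us(backup_nj=0.0, restore_nj=0.0, leakage_uw=100.0)
print(volatile, nonvolatile)  # 1000.0 0.0
```

With these numbers a volatile block must stay idle for at least 1 ms before gating pays off, whereas the idealized nonvolatile block benefits from gating for any idle period, which is what makes fine-grained power-gating practical.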
[0190] Second, the above-described device further includes an access controller that controls accesses to the MRAM by reading data in advance and holding the data when data are to be read from the MRAM. Such an access controller receives a data-reading instruction from the nonvolatile CPU and determines whether the requested data have already been read from the MRAM in advance; if so, the access controller transmits the held data to the nonvolatile CPU. This enables multiplexed access from the CPU to the MRAM and allows the read data to be held temporarily in the accelerator. When the same memory address is accessed repeatedly, the temporarily held data are reused instead of being read again from the MRAM (see
[0191] Such a configuration of an accelerator can be embodied as shown in
[0192] The access controller includes:
[0193] an address-storing register that receives, as an input, the MRAM address to be read and stores that address;
[0194] a multiplexer that outputs to the MRAM, for reading, multiple MRAM addresses based on the address stored in the address-storing register;
[0195] multiple data-storing registers that store data read from the MRAM in response to an input from the multiplexer; and
[0196] a comparator that receives a reading instruction together with a specified reading destination and compares the address of the specified reading destination with the reading address stored in the address-storing register, wherein, upon receiving a reading instruction together with a specified reading destination, the access controller outputs the data already read and stored in one of the data-storing registers in response to the reading instruction when the comparator determines that the data have been read from the MRAM in advance.
[0197] In addition, the access controller further includes a prefetch address generator connected to the multiplexer, wherein
[0198] the prefetch address generator generates addresses that include the reading address stored in the address-storing register.
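The access-controller structure in [0192] to [0198] can be modeled behaviorally as follows. This Python sketch is hypothetical: the class and method names, word addressing, the window size `depth`, and the policy of refilling the whole window on a miss are assumptions chosen for illustration, not details from the text.

```python
from typing import List, Optional, Sequence


class AccessController:
    """Behavioral sketch of the MRAM access controller (assumed details).

    Assumed: word addressing, a prefetch window of `depth` consecutive
    addresses starting at the requested address, and one data-storing
    register per address in that window.
    """

    def __init__(self, mram: Sequence[int], depth: int = 4) -> None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.mram = mram
        self.depth = depth
        self.addr_reg: Optional[int] = None  # address-storing register
        self.data_regs: List[int] = []       # data-storing registers
        self.mram_accesses = 0               # multiplexed MRAM read operations

    def _prefetch_addresses(self, addr: int) -> List[int]:
        # Prefetch address generator: a window that includes the stored address.
        return list(range(addr, min(addr + self.depth, len(self.mram))))

    def _compare(self, addr: int) -> Optional[int]:
        # Comparator: offset of addr within the prefetched window, or None on a miss.
        if self.addr_reg is None:
            return None
        offset = addr - self.addr_reg
        return offset if 0 <= offset < len(self.data_regs) else None

    def read(self, addr: int) -> int:
        if not 0 <= addr < len(self.mram):
            raise IndexError(f"address {addr} outside the MRAM")
        offset = self._compare(addr)
        if offset is None:
            # Miss: latch the address, then read every window address
            # (presented by the multiplexer) into the data-storing registers.
            self.addr_reg = addr
            self.data_regs = [self.mram[a] for a in self._prefetch_addresses(addr)]
            self.mram_accesses += 1
            offset = 0
        return self.data_regs[offset]


ctrl = AccessController(mram=list(range(100, 116)), depth=4)
values = [ctrl.read(a) for a in (0, 1, 2, 3, 4, 4, 5)]
print(values, ctrl.mram_accesses)  # [100, 101, 102, 103, 104, 104, 105] 2
```

In the example, seven reads cost only two multiplexed MRAM accesses, because the comparator finds the other five requested addresses, including the repeated address 4, in the data-storing registers.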
[0199] Third, a data transfer method between a CPU and an MRAM via an access controller includes:
[0200] the access controller receiving a data-reading instruction from the CPU together with a reading address;
[0201] the access controller reading multiple address data including the reading address from the MRAM in advance;
[0202] the access controller receiving a data-reading instruction from the CPU together with a next reading address; and
[0203] the access controller determining whether the requested data have already been read and, if the reading instruction is for already-read data, responding to the reading instruction with the data read in advance, without reading from the MRAM again.
[0204] That is, the method can perform a prefetch data request, data storage in the access controller, and a fetch operation simultaneously, and can also perform successive fetch operations back to back (see
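The benefit of overlapping prefetch and fetch operations can be estimated with a simple cycle-count model. The model below is an assumption, not the timing of the fabricated chip: each MRAM access returns `depth` consecutive words after `read_latency` cycles, the CPU consumes one buffered word per cycle, and at most one prefetch request is outstanding. The function name `fetch_cycles` is hypothetical.

```python
import math


def fetch_cycles(n_words: int, depth: int, read_latency: int, overlap: bool) -> int:
    """Cycles for the CPU to fetch n_words consecutive words (assumed timing model).

    With `overlap`, the request for the next window is issued as soon as the
    current window arrives, so the prefetch runs while the CPU is still
    consuming buffered data; otherwise it waits until the window is consumed.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    issue = 0  # cycle at which the current window is requested
    done = 0   # cycle at which the previous window was fully consumed
    for k in range(math.ceil(n_words / depth)):
        size = min(depth, n_words - k * depth)
        arrive = issue + read_latency
        start = max(arrive, done)  # CPU cannot consume before data arrive
        done = start + size        # one word per cycle
        issue = arrive if overlap else done
    return done


print(fetch_cycles(8, 1, 4, False),  # word-by-word reads
      fetch_cycles(8, 4, 4, False),  # 4-word prefetch, no overlap
      fetch_cycles(8, 4, 4, True))   # 4-word prefetch, overlapped
```

With 8 words, a 4-cycle read latency, and a 4-word window, the model gives 40 cycles for word-by-word reads, 16 for prefetching without overlap, and 12 with overlap.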
[0205] Fourth, a processing method in a microcontroller including a nonvolatile CPU, an MRAM, and a nonvolatile FPGA-ACC as a reconfigurable computing module, wherein
[0206] the MRAM is configured with a region to store data transmitted between the nonvolatile CPU and the nonvolatile FPGA-ACC,
[0207] the method comprises:
[0208] the MRAM writing, into the region, data to be input from the nonvolatile CPU to the nonvolatile FPGA-ACC;
[0209] the nonvolatile CPU passing, to the nonvolatile FPGA-ACC, a signal indicating that the writing is complete and that the calculation is to start;
[0210] the nonvolatile FPGA-ACC starting an operation using the data written in the region; and
[0211] the nonvolatile CPU receiving, through the region, the operation result computed by the nonvolatile FPGA-ACC.
[0212] This method enables efficient processing in the microcontroller because the memory addresses of the data required for processing do not need to be passed among the nonvolatile CPU, the nonvolatile FPGA-ACC, and the MRAM.
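The handshake of [0207] to [0211] can be sketched as a shared-mailbox protocol. In this Python model, the region layout (`IN_BASE`, `IN_LEN`, `OUT_BASE`), the class `SharedRegion`, and the use of `threading.Event` for the start signal and for a completion signal back to the CPU are hypothetical; `sum` stands in for whatever kernel the FPGA-ACC is configured with. The point illustrated is that, because both sides know the fixed region in advance, the start signal carries no memory address.

```python
import threading
from typing import Callable, List

# Hypothetical fixed layout of the shared MRAM region, known to both sides.
IN_BASE, IN_LEN, OUT_BASE = 0, 8, 8


class SharedRegion:
    def __init__(self, size: int) -> None:
        self.words: List[int] = [0] * size
        self.start = threading.Event()  # CPU -> FPGA-ACC: input written, start
        self.done = threading.Event()   # FPGA-ACC -> CPU: result written (assumed)


def fpga_acc(region: SharedRegion, kernel: Callable[[List[int]], int]) -> None:
    region.start.wait()                                # wait for the start signal
    data = region.words[IN_BASE:IN_BASE + IN_LEN]      # operate on data in the region
    region.words[OUT_BASE] = kernel(data)              # write the result back
    region.done.set()


def cpu_offload(region: SharedRegion, inputs: List[int]) -> int:
    if len(inputs) != IN_LEN:
        raise ValueError(f"expected {IN_LEN} input words")
    region.words[IN_BASE:IN_BASE + IN_LEN] = inputs    # write inputs into the region
    region.start.set()                                 # no address is passed
    region.done.wait()
    return region.words[OUT_BASE]                      # read the result from the region


region = SharedRegion(size=16)
acc = threading.Thread(target=fpga_acc, args=(region, sum))
acc.start()
result = cpu_offload(region, list(range(8)))
acc.join()
print(result)  # 28
```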
[0213] In the above-described processing method, in particular, power-gating control is preferably performed for the nonvolatile CPU and the nonvolatile FPGA-ACC. Under power-gating control, power is supplied only during computing and not during inactive intervals. Because the CPU and the FPGA-ACC are nonvolatile, no data need to be backed up or restored when the power is turned on or off. Power-gating can therefore reduce the average power consumption, and the longer the intermittent operation interval, the less power is consumed (see the result “with PG & FPGA-ACC” in
[0214] In the above-described processing method, in particular, the computing performed by the nonvolatile FPGA-ACC is preferably one of the following: processing by a Laplacian filter, a DCT (Discrete Cosine Transform), an FIR (Finite Impulse Response) filter, or an FFT (Fast Fourier Transform). When the processing method is used in a microcontroller serving as an IoT sensor node, in particular, as explained with reference to
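As an illustration of how one of these kernels might map onto the DSP core of [0164], the following sketch computes a direct-form FIR filter using only the multiply-accumulate mode (OUT = A×B + C). The mapping, the function names `mac` and `fir`, and the zero-initial-state convention are assumptions; the text does not describe how the filters are mapped onto tiles and DSPs.

```python
from typing import List, Sequence


def mac(a: int, b: int, c: int) -> int:
    """DSP multiply-accumulate mode: OUT = A x B + C."""
    return a * b + c


def fir(samples: Sequence[int], taps: Sequence[int]) -> List[int]:
    """Direct-form FIR filter y[n] = sum_k h[k] * x[n - k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc = mac(h, samples[n - k], acc)  # one MAC per tap
        out.append(acc)
    return out


print(fir([1, 2, 3, 4], [1, 1]))  # [1, 3, 5, 7]
```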
[0215] In the above-described processing method, in particular, the nonvolatile CPU and the nonvolatile FPGA-ACC preferably perform sequential processing in parallel. Performing the sequential processing in parallel on the nonvolatile CPU and the nonvolatile FPGA-ACC achieves both high computing performance and low power consumption, which makes this method well suited to an IoT sensor node.
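The gain from running the nonvolatile CPU and the nonvolatile FPGA-ACC in parallel can be approximated as a two-stage pipeline. The function names and stage times below are hypothetical, and the model assumes a buffer between the stages large enough that neither stage stalls the other except through data dependence.

```python
def sequential_cycles(n_items: int, t_cpu: int, t_acc: int) -> int:
    """Each item passes through the CPU stage, then the FPGA-ACC stage, one at a time."""
    return n_items * (t_cpu + t_acc)


def pipelined_cycles(n_items: int, t_cpu: int, t_acc: int) -> int:
    """Two-stage pipeline: the CPU works on item i+1 while the FPGA-ACC processes item i."""
    if n_items == 0:
        return 0
    # First item fills the pipeline; afterwards the slower stage sets the pace.
    return t_cpu + t_acc + (n_items - 1) * max(t_cpu, t_acc)


print(sequential_cycles(10, 3, 5), pipelined_cycles(10, 3, 5))  # 80 53
```

For 10 items with 3-cycle CPU and 5-cycle accelerator stages, the sequential schedule takes 80 cycles and the pipelined one 53, limited by the slower stage.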
REFERENCE SIGNS LIST
[0216] 1: device
[0217] 2, 11: STT-MRAM (MRAM)
[0218] 3, 12: NV-CPU
[0219] 4, 13: NV-FPGA
[0220] 5: power-gating controller
[0221] 6, 14: access controller
[0222] 7: bus
[0223] 100: system
[0224] 110: sensor node
[0225] 120: gateway (GW)
[0226] 130: communications network
[0227] 140: cloud system