ELECTRONIC CIRCUIT AND DEVICE FOR COMPUTATION
20260057197 · 2026-02-26
Inventors
Cpc classification
International classification
Abstract
An electronic circuit is provided. The electronic circuit includes a plurality of digital-to-analog converters (DACs) configured to generate a plurality of analog input signals. The electronic circuit includes a computation matrix coupled to the plurality of DACs and configured to receive the plurality of analog input signals from the plurality of DACs. The computation matrix comprises a plurality of computation nodes. Each computation node comprises: a bias circuit configured to generate a positive bias current and a negative bias current based on an analog input signal among the plurality of analog input signals, and a computation circuit configured to generate a computation result current based on the positive bias current, the negative bias current, and a digital weight signal. An electronic device and a method are also provided.
Claims
1. An electronic circuit comprising: a plurality of digital-to-analog converters (DACs) configured to generate a plurality of analog input signals; and a computation matrix coupled to the plurality of DACs and configured to receive the plurality of analog input signals from the plurality of DACs, wherein the computation matrix comprises a plurality of computation nodes, wherein each computation node comprises a bias circuit configured to generate a positive bias current and a negative bias current based on an analog input signal among the plurality of analog input signals, and a computation circuit configured to generate a computation result current based on the positive bias current, the negative bias current, and a digital weight signal.
2. The electronic circuit of claim 1, wherein, for each computation node, the bias circuit comprises a current mirror circuit.
3. The electronic circuit of claim 2, wherein, for each computation node, the current mirror circuit comprises: an input transistor configured to receive the analog input signal; a p-type metal-oxide-semiconductor (PMOS) output transistor configured to generate the positive bias current; and an n-type metal-oxide-semiconductor (NMOS) output transistor configured to generate the negative bias current.
4. The electronic circuit of claim 1, wherein, for each computation node, the computation circuit comprises a plurality of groups of weighting transistors, an output node, and a control circuit, wherein each group of weighting transistors comprises at least one p-type metal-oxide-semiconductor (PMOS) transistor, at least one n-type metal-oxide-semiconductor (NMOS) output transistor, a first switch configured to couple the at least one PMOS transistor to the output node, and a second switch configured to couple the at least one NMOS transistor to the output node, and wherein the control circuit is configured to control the first switch and the second switch based on the digital weight signal.
5. The electronic circuit of claim 4, wherein, for each computation node, the computation result current of that computation node is a superposition of i) a current at the output node of that computation node, and ii) a computation result current generated by an adjacent computation node in the computation matrix.
6. The electronic circuit of claim 4, wherein, for each computation node, the plurality of groups of weighting transistors correspond to a plurality of multipliers.
7. The electronic circuit of claim 6, wherein each group of weighting transistors has at least one of: a number of PMOS transistors corresponding to a multiplier of that group and a number of NMOS transistors corresponding to the multiplier of that group, a width of the PMOS transistors corresponding to the multiplier of that group and a width of the NMOS transistors corresponding to the multiplier of that group, or a length of the PMOS transistors corresponding to the multiplier of that group and a length of the NMOS transistors corresponding to the multiplier of that group.
8. The electronic circuit of claim 6, wherein, for each computation node, the plurality of multipliers are powers of 2.
9. The electronic circuit of claim 1, wherein the computation matrix comprises a plurality of rows and a plurality of columns, and wherein the plurality of DACs are respectively coupled to the plurality of rows.
10. The electronic circuit of claim 9, further comprising a plurality of analog-to-digital converters (ADCs) coupled to the plurality of columns and configured to receive a plurality of analog output signals from the computation matrix and convert the plurality of analog output signals to a plurality of digital output signals.
11. The electronic circuit of claim 1, wherein each DAC comprises a plurality of groups of converting transistors, an output node, and a control circuit configured to receive an input digital signal, wherein each group of converting transistors comprises at least one p-type metal-oxide-semiconductor (PMOS) transistor, at least one n-type metal-oxide-semiconductor (NMOS) output transistor, a first switch configured to couple the at least one PMOS transistor to the output node, and a second switch configured to couple the at least one NMOS transistor to the output node, and wherein the control circuit is configured to control the first switch and the second switch based on the input digital signal.
12. The electronic circuit of claim 1, wherein the computation matrix is a first computation matrix, and wherein the electronic circuit further comprises: one or more second computation matrices; and a coupling circuit connected between the first computation matrix and the one or more second computation matrices.
13. The electronic circuit of claim 12, wherein the coupling circuit comprises a plurality of diodes.
14. The electronic circuit of claim 1, further comprising: at least one demultiplexer configured to receive digital input data; and a plurality of first-in-first-out (FIFO) circuits coupled between the at least one demultiplexer and the plurality of DACs.
15. The electronic circuit of claim 10, further comprising: at least one multiplexer configured to generate digital output data; and a plurality of first-in-first-out (FIFO) circuits coupled between the at least one multiplexer and the plurality of ADCs.
16. An electronic device comprising: a receiver port configured to receive digital input data; a transmitter port configured to transmit digital output data; and a computation circuit configured to receive the digital input data from the receiver port and provide the digital output data to the transmitter port, wherein the computation circuit comprises: a plurality of digital-to-analog converters (DACs) configured to generate a plurality of analog input signals based on the digital input data; a computation matrix coupled to the plurality of DACs and configured to receive the plurality of analog input signals from the plurality of DACs and generate a plurality of analog output signals, wherein the computation matrix comprises a plurality of computation nodes; and a plurality of analog-to-digital converters (ADCs) coupled to the computation matrix and configured to receive the plurality of analog output signals from the computation matrix and convert the plurality of analog output signals to a plurality of digital output signals; wherein each computation node comprises a bias circuit configured to generate a positive bias current and a negative bias current based on an analog input signal among the plurality of analog input signals, and a computation circuit configured to generate a computation result current based on the positive bias current, the negative bias current, and a digital weight signal.
17. The electronic device of claim 16, wherein, for each computation node, the computation circuit comprises a plurality of groups of weighting transistors, an output node, and a control circuit, wherein each group of weighting transistors comprises at least one p-type metal-oxide-semiconductor (PMOS) transistor, at least one n-type metal-oxide-semiconductor (NMOS) output transistor, a first switch configured to couple the at least one PMOS transistor to the output node, and a second switch configured to couple the at least one NMOS transistor to the output node, and wherein the control circuit is configured to control the first switch and the second switch based on the digital weight signal.
18. The electronic device of claim 17, wherein, for each computation node, the computation result current of that computation node is a superposition of i) a current at the output node of that computation node, and ii) a computation result current generated by an adjacent computation node in the computation matrix.
19. The electronic device of claim 17, wherein, for each computation node, the plurality of groups of weighting transistors correspond to a plurality of multipliers.
20. A method comprising: receiving, by a computation matrix and from a plurality of digital-to-analog converters (DACs), a plurality of analog input signals, wherein the computation matrix comprises a plurality of computation nodes; at each computation node, generating a positive bias current and a negative bias current based on an analog input signal among the plurality of analog input signals; and generating a computation result current based on the positive bias current, the negative bias current, and a digital weight signal.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0019]
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029] Figures are not drawn to scale. Like reference numbers refer to like components.
DETAILED DESCRIPTION
[0030] The performance of an AI application can be affected by the application's capacity to process a large amount of data and make complex computations. For example, in machine learning applications where neural networks are configured to learn from training data and make inferences (e.g., predictions), the ability of a neural network to process a large amount of training data and the depth (e.g., number of layers) of the neural network can affect the accuracy of its predictions. To accommodate the increasing needs for AI applications with high computation capacity, AI accelerator circuits are provided with architectures designed for conducting complex computations. In a typical AI accelerator architecture, computation is performed in multiplication-accumulation (MAC) circuitry.
[0031] Some AI accelerator architectures have MAC circuitry that performs computation in the digital domain. In these architectures, the data is represented as digital code, such as binary bits at different voltage levels, and the multiplication and accumulation operations are performed using digital multipliers and adders made of logic gates. A disadvantage of these architectures is the large number of transistors that constitute the logic gates, which consume a large circuit area. Another disadvantage of these architectures is the high power consumption often associated with the operations of the logic gates, e.g., during the transition between logic levels. These disadvantages can limit the potential for these architectures to accommodate the complex and high-volume data in AI applications.
[0032] Implementations of this disclosure advantageously improve the power consumption and the circuit size of AI accelerators. As described in detail below, the MAC circuitry according to some implementations operates in the analog domain based on current scaling and superposition, which, compared with digital-domain computation, significantly reduces the power consumption and circuit area that would otherwise result from the large number of logic gates. With one or more features of the described circuits, implementations of this disclosure improve the computing capacity of AI accelerators and hence allow for more efficient and more accurate AI applications.
[0033]
[0034] As illustrated, system 100 includes one or more central processing units (CPUs) 101, one or more memories 103, and one or more AI accelerators 102, which are communicatively coupled to each other. When performing an AI application, CPU 101 can configure AI accelerators 102 to execute computation tasks according to the application. Accordingly, AI accelerators 102 can access data from memory 103 and perform computations, such as MAC operations, based on the data, and output results to memory 103 and/or CPU 101.
[0035] Each of AI accelerators 102 can be implemented on a standalone electronic device, such as a circuit board with one or more semiconductor IC chips. Alternatively, multiple AI accelerators 102 can be implemented on a single electronic device or as a single IC chip. Accelerators 102 can be located within the same facility (e.g., server room of a data center) as CPU 101 and memory 103, or can be remotely connected to CPU 101 and memory 103 via network connections.
[0036]
[0037] Electronic device 200 includes one or more receiver (RX) ports and transmitter (TX) ports for exchanging data and control signals with other devices. For example, as illustrated, electronic device 200 includes three pairs of RX and TX ports 231-233. Ports 231 can serve as serializer/deserializer (SerDes) input/output (IO) ports for communications over board-to-board links, such as the communication links between instances of AI accelerators 102 in system 100. Ports 232 can be configured to exchange data with a CPU and/or storage circuits, such as CPU 101 and memory 103 of system 100 of
[0038] Electronic device 200 also includes AI accelerator circuit 210 and controller 220. While controller 220 is illustrated in
[0039]
[0040] Electronic circuit 300 receives digital input data 301, performs computation in the analog domain, and outputs digital output data 330. The computation (e.g., MAC computation) takes place primarily in computation matrix 306, which has a plurality of rows and columns intersecting at a plurality of computation nodes 307, illustrated by the symbol . The rows of computation matrix 306 are respectively coupled to a plurality of DACs 304, which are configured to convert digital input data 301 to analog input signals 305 respectively input to the rows of computation matrix 306. In some implementations, digital input data 301 undergoes de-multiplexing by one or more de-multiplexers (de-MUXes) 302 before being input to DACs 304. For example, de-MUXes 302 can convert a stream of digital input data 301 into a plurality of streams of parallel data and store the parallel data in a plurality of FIFO circuits 303. The output of each of FIFO circuits 303 is then input to a respective DAC 304 for conversion to an analog input signal 305 in a corresponding row.
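The de-multiplexing and buffering step of paragraph [0040] can be illustrated with a minimal sketch. The round-robin distribution order, the function name `demux_to_fifos`, and the example word values are assumptions made for illustration; the disclosure does not specify how de-MUXes 302 assign words to FIFO circuits 303.

```python
from collections import deque


def demux_to_fifos(stream, n_rows):
    """Split one serial stream of digital words into n_rows parallel FIFOs.

    Assumption: words are distributed round-robin, one FIFO per matrix row.
    """
    fifos = [deque() for _ in range(n_rows)]
    for index, word in enumerate(stream):
        fifos[index % n_rows].append(word)
    return fifos


# Hypothetical stream of six words feeding a three-row matrix.
fifos = demux_to_fifos([1, 2, 3, 4, 5, 6], 3)
# Each FIFO hands its oldest word to the DAC of its row.
first_words = [fifo.popleft() for fifo in fifos]
print(first_words)
```

In this sketch each FIFO releases its oldest word first, so the first set of row inputs consists of the first three words of the stream.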
[0041] Each computation node 307 is configured to perform a multiplication operation. The multiplication takes the current amplitude of analog input signal 305 in the corresponding row as a multiplicand, and takes a digital weight signal at a corresponding column as a multiplier. The multiplication results are output by computation nodes 307 as analog currents, with the current amplitudes representing the value of the multiplication results.
[0042] The multiplication results in each column are then added in accumulation operations. The accumulation operations are implemented by superposition of currents output by computation nodes 307 in the same column. Accordingly, at the output of computation matrix 306 (illustrated as the bottom of the matrix), the current output by each column is a superposition of currents output by all of computation nodes 307 of that column, which is illustrated by the symbol . The output currents at all columns constitute analog output signals 320.
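The row-input, column-output behavior described in paragraphs [0041] and [0042] can be sketched numerically as follows. This is an idealized illustration only: the function name `column_currents`, the example values, and the use of plain Python lists are assumptions, and real currents would be subject to device mismatch and noise.

```python
def column_currents(row_inputs, weights):
    """Idealized model of the computation matrix.

    row_inputs[r]: current amplitude of the analog input signal on row r.
    weights[r][c]: value of the digital weight signal at row r, column c.
    Returns one output current per column.
    """
    n_rows = len(weights)
    n_cols = len(weights[0])
    outputs = []
    for c in range(n_cols):
        # Each node multiplies its row input by its weight; the column
        # output is the superposition (sum) of the node currents.
        total = sum(row_inputs[r] * weights[r][c] for r in range(n_rows))
        outputs.append(total)
    return outputs


# Hypothetical 2x2 matrix with weight values 0, 1, 2, and 3.
currents = column_currents([1.5, 0.5], [[0, 1], [2, 3]])
print(currents)
```

With these assumed inputs, the first column yields 1.5×0 + 0.5×2 = 1.0 unit and the second column yields 1.5×1 + 0.5×3 = 3.0 units, which is an ordinary vector-matrix product.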
[0043] At the output, the columns of computation matrix 306 are respectively coupled to a plurality of ADCs 321, which are configured to respectively convert analog output signals 320 to digital output signals 330. In some implementations, digital outputs from ADCs 321 are respectively buffered in a plurality of FIFO circuits 322 and undergo multiplexing by one or more multiplexers (MUXes) 302 to become digital output signals 330.
[0044]
[0045] Using computation node 307a as an example, computation node 307a is configured to perform multiplication using current I.sub.1 as a multiplicand and using digital weight signal WGT_A as a multiplier. As described above, current I.sub.1 can correspond to an instance of analog input signal 305, which is output by an instance of DAC 304 based on digital input data 301 provided by an AI application. Additionally, digital weight signal WGT_A can also be provided by the AI application. For example, in the training phase of an AI application, digital weight signal WGT_A can be a parameter that affects the influence of a corresponding neuron (e.g., a node of the neural network) in the overall learning process. Digital weight signal WGT_A is represented in the form of one or more binary bits. For example, as illustrated, an AI application can specify that WGT_A is a two-bit signal that equals 2'b00. With the inputs provided by the AI application, computation node 307a is configured to obtain the product of a) digital input data corresponding to analog input signal 305m, which equals the amplitude of current I.sub.1, and b) digital weight signal WGT_A, which equals 2'b00. Similarly, for computation nodes 307b-307d, the AI application can specify that WGT_B, WGT_C, and WGT_D are two-bit signals that equal 2'b01, 2'b10, and 2'b11, respectively. With these inputs, computation nodes 307b-307d are likewise configured to perform multiplications similar to that performed by computation node 307a. It is noted that the two-bit digital weight signals WGT_A to WGT_D are merely provided as examples. Other implementations can have digital weight signals with different numbers of bits and/or different logic values.
[0046] In the multiplications performed by computation nodes 307a-307d, the logic values of digital weight signals WGT_A to WGT_D are used as operands without being converted to analog signals first. More details about the multiplications are provided below with reference to
[0047] After performing a multiplication, computation node 307a generates current i.sub.00, whose amplitude represents the product of a) and b) described above. Current i.sub.00 flows to connecting node A and is superposed on current i.sub.x output by an upstream computation node in computation matrix 306. In the context of computation matrix 306 where analog output signals 320 are output to ADCs 321 at the bottom of the matrix, an upstream computation node is a computation node that is disposed farther away from the bottom of computation matrix 306, while a downstream computation node is a computation node that is disposed closer to the bottom of computation matrix 306. For example, computation node 307a is an upstream computation node of computation node 307c, and computation node 307b is an upstream computation node of computation node 307d.
[0048] The superposition of currents i.sub.00 and i.sub.x can be equivalent to the accumulation of two analog signals, which results in current i.sub.a as the output of computation node 307a. In other words, after performing a multiplication operation using data provided by the AI application, computation node 307a performs an accumulation operation using the product i.sub.00 of the multiplication and an output i.sub.x of its upstream node. Similar to the superposition of currents i.sub.00 and i.sub.x at connecting node A, currents i.sub.10 and i.sub.a are superposed at connecting node C to become output current i.sub.c of computation node 307c. Likewise, currents i.sub.01 and i.sub.y are superposed at connecting node B to become output current i.sub.b of computation node 307b, and currents i.sub.11 and i.sub.b are superposed at connecting node D to become output current i.sub.d of computation node 307d. Connecting nodes A-D can be implemented as wire nodes where, according to Kirchhoff's current law, the sum of current inflow equals the sum of current outflow.
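The node-by-node accumulation along one column can be sketched as a running sum, where each connecting node adds the local product current to the current arriving from upstream. The function name `accumulate_column` and the numeric values are hypothetical and chosen only to illustrate Kirchhoff's current law at ideal wire nodes.

```python
def accumulate_column(product_currents, upstream_current=0.0):
    """Running superposition along one column, from top to bottom.

    product_currents: product current of each node in the column, listed
        from the most upstream node to the most downstream node.
    upstream_current: current entering the first node from above.
    Returns the output current of each node in the same order.
    """
    outputs = []
    running = upstream_current
    for product in product_currents:
        # Ideal wire node: outflow equals the sum of the two inflows.
        running = running + product
        outputs.append(running)
    return outputs


# Hypothetical values: upstream current 0.25, product currents 0.0 and 1.0.
i_a, i_c = accumulate_column([0.0, 1.0], upstream_current=0.25)
print(i_a, i_c)
```

Here `i_a` stands for the output of the upper node and `i_c` for the output of the node below it, so `i_c` already contains the contribution of every node above it in the column.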
[0049]
[0050] Using computation node 407a as an example, computation node 407a is configured to perform multiplication using current I.sub.1 as a multiplicand and using digital weight signal WGT_A (which equals 2'b00 in this example) as a multiplier, as described above with reference to
[0051] To perform the multiplication, computation node 407a has bias circuit 410, controller 412A, and multiplication circuit 413A. Controller 412A and multiplication circuit 413A can be collectively referred to as a computation circuit. Controller 412A can be similar to at least a portion of controller 220 illustrated in
[0052] Bias circuit 410 receives current I.sub.1 of an analog input signal (e.g., analog input signal 305m of
[0053] Multiplication circuit 413A is configured to receive bias currents pbias_0 and nbias_0 at inputs P and N, respectively. Multiplication circuit 413A is also configured to receive, from controller 412A, positive switch signal SWP.sub.a and negative switch signal SWN.sub.a, which are digital signals that control one or more switches of multiplication circuit 413A to perform the multiplication operation and output current i.sub.00.
[0054] Controller 412A is configured to receive digital weight signal WGT_A from register circuit 411A, which can be synchronized with controller 412A to output a digital code (e.g., 2'b00 in the example of
[0055] Computation nodes 407b-407d are configured to operate similarly to computation node 407a. For example, computation node 407b includes multiplication circuit 413B configured to receive bias currents pbias_0 and nbias_0 from bias circuit 410. Likewise, computation nodes 407c and 407d include multiplication circuits 413C and 413D, respectively, which are configured to receive bias currents pbias_1 and nbias_1 from bias circuit 420. In a more general scenario, multiple computation nodes in the same row of a computation matrix can receive bias currents from the same bias circuit. Alternatively or additionally, at least two computation nodes in the same row of a computation matrix can receive bias currents from multiple bias circuits, even though the amplitudes of bias currents are the same across the multiple bias circuits.
[0056] Similar to the configuration of multiplication circuit 413A, multiplication circuit 413B is configured to receive positive switch signal SWP.sub.b and negative switch signal SWN.sub.b from controller 412B, which is configured to receive digital weight signal WGT_B from register circuit 411B. Multiplication circuit 413C is configured to receive positive switch signal SWP.sub.c and negative switch signal SWN.sub.c from controller 412C, which is configured to receive digital weight signal WGT_C from register circuit 411C. Multiplication circuit 413D is configured to receive positive switch signal SWP.sub.d and negative switch signal SWN.sub.d from controller 412D, which is configured to receive digital weight signal WGT_D from register circuit 411D.
[0057] Similar to the superposition of i.sub.00 and i.sub.x at connecting node A, current i.sub.01 output by multiplication circuit 413B is superposed with current i.sub.y to become result current i.sub.b. Likewise, current i.sub.10 output by multiplication circuit 413C is superposed with current i.sub.a to become result current i.sub.c, and current i.sub.11 output by multiplication circuit 413D is superposed with current i.sub.b to become result current i.sub.d.
[0058]
[0059] Current mirror circuit 500 includes transistors P.sub.A, P.sub.B, and N.sub.B, which can be metal-oxide-semiconductor field-effect transistors (MOSFETs). As illustrated, transistors P.sub.A and P.sub.B are PMOS transistors, whereas transistor N.sub.B is an NMOS transistor. Transistors P.sub.A and N.sub.B are diode-connected, e.g., with their respective gate terminals coupled to their respective drain terminals. The gate terminal of transistor P.sub.A is coupled to the gate terminal of transistor P.sub.B, and the drain terminal of transistor P.sub.B is coupled to the drain and gate terminals of transistor N.sub.B.
[0060] Current mirror circuit 500 receives input current I.sub.K as a reference current at the drain terminal of transistor P.sub.A. Current mirror circuit 500 also provides positive bias current pbias at the gate terminal of transistor P.sub.B (which is coupled to the gate terminal of transistor P.sub.A) and provides negative bias current nbias at the gate terminal of transistor N.sub.B (which is coupled to the drain terminal of transistor P.sub.B). When transistors P.sub.A, P.sub.B, and N.sub.B are fabricated to have the same dimensions, positive bias current pbias can mirror the amplitude and direction of input current I.sub.K, and negative bias current nbias can mirror the amplitude of input current I.sub.K but flow in an opposite direction. By changing the dimensions of transistors P.sub.A, P.sub.B, and N.sub.B, it is possible to scale (e.g., increase or decrease) the amplitudes of bias currents pbias and nbias. Furthermore, because transistor P.sub.A is diode-connected, transistor P.sub.A can block a sink current that flows in a direction opposite to that of input current I.sub.K, thereby operating as a rectified linear unit (ReLU).
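The behavior attributed to current mirror circuit 500 can be summarized with an idealized behavioral sketch. The function name `bias_currents`, the single `scale` parameter standing in for the ratio of transistor dimensions, and the sign convention (a negative input represents a sink current) are assumptions for illustration, not a device-level model.

```python
def bias_currents(input_current, scale=1.0):
    """Idealized behavior of the bias circuit.

    input_current: reference current I_K; a negative value represents a
        sink current flowing opposite to the intended direction.
    scale: assumed ratio set by transistor dimensions (1.0 = identical).
    Returns (pbias, nbias).
    """
    # The diode-connected input transistor blocks reverse current (ReLU).
    rectified = max(input_current, 0.0)
    pbias = scale * rectified
    # Same amplitude as pbias, opposite direction.
    nbias = 0.0 - pbias
    return pbias, nbias


print(bias_currents(2.0))
print(bias_currents(-1.0))
print(bias_currents(1.0, scale=2.0))
```

Under these assumptions, a blocked sink current yields zero for both bias currents, so every multiplication circuit fed by that bias circuit also outputs zero.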
[0061]
[0062] Multiplication circuit 600 includes a plurality of groups of weighting transistors. A first group includes PMOS transistor P.sub.0 and NMOS transistor N.sub.0, whose drain terminals are respectively coupled to switches SP.sub.0 and SN.sub.0 and whose source terminals are respectively coupled to a high voltage supply (e.g., VDD) and a low voltage supply (e.g., ground). A second group includes PMOS transistor P.sub.1 and NMOS transistor N.sub.1, whose drain terminals are respectively coupled to switches SP.sub.1 and SN.sub.1 and whose source terminals are respectively coupled to the high voltage supply and the low voltage supply. Likewise, an (n+1)-th group (n is a positive integer) includes PMOS transistor P.sub.n and NMOS transistor N.sub.n, whose drain terminals are respectively coupled to switches SP.sub.n and SN.sub.n and whose source terminals are respectively coupled to the high voltage supply and the low voltage supply. Multiplication circuit 600 also includes output node 610 coupled to the nodes between switches SP.sub.i and SN.sub.i (i=0, 1, 2, ..., n). Accordingly, in each group and depending on the On/Off status of the switches in that group, the PMOS and NMOS transistors can provide paths for currents to flow through the transistors to output node 610.
[0063] The currents flowing from all paths to output node 610 together become output current I.sub.out of multiplication circuit 600. For example, when multiplication circuit 600 is instantiated as multiplication circuit 413A of
[0064] The groups of PMOS and NMOS transistors each can be associated with a different multiplier. The multiplier m.sub.i of the (i+1)-th group represents a ratio of the current amplitude of the (i+1)-th path divided by the amplitude of a bias current. Using the first group (i=0) as an example, when SP.sub.0 is switched on, the current flowing through PMOS transistor P.sub.0 to output node 610 equals positive bias current pbias multiplied by the multiplier m.sub.0. Assuming positive bias current pbias mirrors current I.sub.1 of
[0065] The association between a group of PMOS and NMOS transistors and the multiplier of that group can be obtained by varying the characteristics of transistors for different groups. One way of varying the characteristics for different groups is to vary the dimensions of the transistors. For example, in order for group A to have twice the multiplier of group B, the PMOS and NMOS transistors of group A can each be made approximately twice the width of those of group B, or can be made approximately half the length of those of group B.
[0066] The association between a group of PMOS and NMOS transistors and the multiplier of that group can also be obtained by having different numbers of transistors for different groups. For example, in some implementations, instead of having only one PMOS transistor and one NMOS transistor in each group, some groups can have more than one PMOS transistor and more than one NMOS transistor. Accordingly, assuming all the PMOS transistors in the same group are coupled in series and all the NMOS transistors in the same group are also coupled in series, group A can have half the number of PMOS transistors and half the number of NMOS transistors of group B in order to have twice the multiplier of group B. Alternatively or additionally, assuming all the PMOS transistors in the same group are coupled in parallel and all the NMOS transistors in the same group are also coupled in parallel, group C can have twice the number of PMOS transistors and twice the number of NMOS transistors of group D in order to have twice the multiplier of group D.
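The sizing rules of paragraphs [0065] and [0066] can be summarized with a first-order sketch in which the current delivered by a group scales with transistor width divided by length. The function name `relative_multiplier` is hypothetical, and treating a series stack as one longer device and a parallel bank as one wider device is a simplifying assumption that ignores second-order effects such as the body effect and short-channel behavior.

```python
def relative_multiplier(width=1.0, length=1.0, n_parallel=1, n_series=1):
    """First-order relative current of a group of identical transistors.

    Assumptions: current scales with width/length; n_parallel identical
    devices act like one device n_parallel times wider; n_series identical
    devices sharing a gate act like one device n_series times longer.
    """
    effective_width = width * n_parallel
    effective_length = length * n_series
    return effective_width / effective_length


# Twice the width, or half the length, doubles the multiplier.
print(relative_multiplier(width=2.0), relative_multiplier(length=0.5))
# Half as many series devices, or twice as many parallel devices,
# also doubles the multiplier relative to the other group.
print(relative_multiplier(n_series=1) / relative_multiplier(n_series=2))
print(relative_multiplier(n_parallel=2) / relative_multiplier(n_parallel=1))
```

All four comparisons evaluate to a ratio of 2 under this model, matching the examples given for groups A through D.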
[0067] In addition to varying the dimensions of transistors and varying the number of transistors, there are other ways of associating a group of transistors with a multiplier. For example, some implementations can have groups of transistors that differ both in dimension and in number. One of ordinary skill in the art reading this disclosure would have readily understood the other approaches that are within the spirit of this disclosure and yet omitted from description.
[0068] In some implementations, the multipliers can be, e.g., powers of two. For example, PMOS transistor P.sub.0 and NMOS transistor N.sub.0 in the first group can correspond to a multiplier of m.sub.0=2.sup.0=1, PMOS transistor P.sub.1 and NMOS transistor N.sub.1 in the second group can correspond to a multiplier of m.sub.1=2.sup.1=2, PMOS transistor P.sub.2 and NMOS transistor N.sub.2 in the third group can correspond to a multiplier of m.sub.2=2.sup.2=4, and PMOS transistor P.sub.n and NMOS transistor N.sub.n in the (n+1)-th group can correspond to a multiplier of m.sub.n=2.sup.n. Some other implementations can have different multipliers for the groups of PMOS and NMOS transistors.
[0069] Multiplication circuit 600 performs multiplications by switching switches SP.sub.i and SN.sub.i in each group according to switch signals SWP and SWN. Each of switch signals SWP and SWN can have (n+1) bits, with each bit controlling a corresponding switch. For example, SWP[0] and SWN[0] can respectively control switches SP.sub.0 and SN.sub.0, SWP[1] and SWN[1] can respectively control switches SP.sub.1 and SN.sub.1, and SWP[n] and SWN[n] can respectively control switches SP.sub.n and SN.sub.n.
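The switching scheme of paragraph [0069], combined with power-of-two multipliers, can be sketched as follows. The function name `multiplication_output`, the list-of-bits representation of SWP and SWN (index i controls switches SP.sub.i and SN.sub.i), and the example currents are assumptions made for illustration, and all transistors and switches are treated as ideal.

```python
def multiplication_output(pbias, nbias, swp, swn, multipliers=None):
    """Idealized output current of the multiplication circuit.

    pbias, nbias: positive and negative bias currents.
    swp[i], swn[i]: 1 if switch SP_i or SN_i is on, otherwise 0.
    multipliers[i]: multiplier of group i; defaults to powers of two.
    """
    if multipliers is None:
        multipliers = [2 ** i for i in range(len(swp))]
    i_out = 0.0
    for m, p_on, n_on in zip(multipliers, swp, swn):
        if p_on:
            i_out += m * pbias  # scaled positive current reaches the output
        if n_on:
            i_out += m * nbias  # scaled negative current reaches the output
    return i_out


# Hypothetical bias currents of +1.5 and -1.5 units, two groups.
print(multiplication_output(1.5, -1.5, swp=[1, 1], swn=[0, 0]))
print(multiplication_output(1.5, -1.5, swp=[0, 0], swn=[0, 1]))
print(multiplication_output(1.5, -1.5, swp=[0, 0], swn=[0, 0]))
```

The first call corresponds to multiplying by 3 (groups 1 and 2 both on), the second to multiplying by -2, and the third to multiplying by 0.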
[0070] Switch signals SWP and SWN are generated by a controller based on the value represented by the digital weight signals, such as digital weight signals WGT_A to WGT_D in implementations illustrated in
[0071] Similarly, in the example of computation node 407b of
[0072] Similarly, in the example of computation node 407c of
[0073] In addition to natural binary code and Gray code illustrated above, the digital weight signals provided to a multiplication circuit can be coded based on other coding schemes, such as 2's complement, cyclic code, Hamming code, or other error correction coding schemes. Depending on the coding scheme, the controller in the multiplication circuit can determine the value behind the coded digital weight signal and output switch signals SWP and SWN accordingly. When the controller determines that the decimal value behind a digital weight signal is negative, the controller can use switch signals SWP and SWN to switch on one or more switches SN.sub.i to have negative currents flow to output node 610, thereby realizing a subtraction operation when output current I.sub.out is then superposed at a connecting node. In some implementations, the controller can be configured to switch off all switches SP.sub.i and only control switches SN.sub.i. These implementations can be used to realize the functions of a ReLU, a Gaussian error linear unit (GeLU), or a sigmoid linear unit (SiLU), which are commonly used in neural networks.
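One possible way for a controller to handle a signed weight is sketched below, assuming the weight is coded in two's complement and the groups have power-of-two multipliers. The function names `decode_twos_complement` and `switch_signals`, the string representation of the weight, and the rule of routing the magnitude to SWP for non-negative values and to SWN for negative values are assumptions for illustration; the disclosure leaves the controller's internal mapping open.

```python
def decode_twos_complement(bits):
    """Decode a two's complement bit string (MSB first), e.g. '110' -> -2."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


def switch_signals(bits, n_groups):
    """Return (SWP, SWN) as lists of bits, index i controlling group i.

    Assumption: the binary magnitude selects the groups; the sign selects
    whether the PMOS-side or NMOS-side switches are turned on.
    """
    value = decode_twos_complement(bits)
    magnitude = abs(value)
    if magnitude >= (1 << n_groups):
        raise ValueError("magnitude does not fit in the available groups")
    pattern = [(magnitude >> i) & 1 for i in range(n_groups)]
    zeros = [0] * n_groups
    if value >= 0:
        return pattern, zeros
    return zeros, pattern


print(switch_signals("011", 2))  # weight +3
print(switch_signals("110", 2))  # weight -2
```

With this mapping, a weight of -2 turns on only the NMOS-side switch of the second group, so a current of twice the negative bias current is superposed at the connecting node, which subtracts from the running column sum.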
[0074] As described above with reference to
[0075] In some implementations, the architecture of multiplication circuit 600 can be similarly used to implement DAC circuits of AI accelerators, such as DACs 304 of electronic circuit 300. Example implementations are described below with reference to
[0076]
[0077] DAC circuit 700 can be configured to convert digital input data 701 to current I.sub.out, which is output at output node 705. The conversion can be achieved by setting bias currents pbias and nbias at constant levels (e.g., +1 unit and -1 unit, respectively) and controlling switches SP.sub.j and SN.sub.j to turn on or off based on digital input data 701. For example, when digital input data 701 is 4'b0010, controller 730 can determine (e.g., based on a truth table stored therein) that the corresponding analog current should have a magnitude of +2 units. Accordingly, controller 730 can generate switch signals SWP and SWN to turn on only switch SP.sub.1 such that current I.sub.out equals 2×(the amplitude of pbias) = 2 units. Likewise, when digital input data 701 is 4'b0111, controller 730 can determine that the corresponding analog current should have a magnitude of +7 units. Accordingly, controller 730 can generate switch signals SWP and SWN to turn on only switches SP.sub.0, SP.sub.1, and SP.sub.2 such that current I.sub.out equals (1+2+4)×(the amplitude of pbias) = 7 units.
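The two conversions in the preceding paragraph can be reproduced with a short behavioral sketch in Python. It assumes an unsigned natural binary input code in which bit j closes switch SP.sub.j, and a branch weight of 2^j for branch j. The truth table of controller 730 is not modeled, and the function name dac_convert is hypothetical.

```python
from typing import List, Tuple


def dac_convert(code: int, width: int = 4,
                pbias: float = 1.0) -> Tuple[List[int], float]:
    """Return (indices of closed SP switches, output current in units).

    Assumption: unsigned natural binary input; bit j of `code` closes SP_j,
    and branch j contributes 2**j times pbias to the output node.
    """
    if not 0 <= code < (1 << width):
        raise ValueError("code does not fit in the given width")
    closed = [j for j in range(width) if (code >> j) & 1]
    i_out = sum((1 << j) * pbias for j in closed)
    return closed, float(i_out)


# Digital input 4'b0010: only SP_1 closes, I_out = 2 units.
print(dac_convert(0b0010))  # ([1], 2.0)
# Digital input 4'b0111: SP_0, SP_1, SP_2 close, I_out = 7 units.
print(dac_convert(0b0111))  # ([0, 1, 2], 7.0)
```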
[0078] Using DAC circuit 700 or similar DAC circuits in the MAC architecture of the above-described implementations can have many advantages. In addition to reduced circuit complexity and power consumption, the similarities between the DAC circuit architecture and the multiplication circuit architecture can increase the reusability and portability of circuit designs. For example, after expending efforts to design the multiplication circuit under various constraints (e.g., power supply, timing, temperature, or size), a circuit designer can conveniently transfer and reuse a large portion of the designed multiplication circuit when designing the DAC circuits, thereby reducing design cost. The increased reusability and portability can also streamline the fabrication process during the manufacture of an AI accelerator chip, thereby reducing manufacturing cost.
[0079] Electronic circuits according to one or more implementations described above can be conveniently expanded to increase the computation capacity. Example implementations are described below with reference to
[0080]
[0081] In some implementations, electronic circuit 800 has one or more coupling circuits configured to couple consecutive computation matrices in the cascade. The coupling circuits can include, e.g., one or more diodes that allow currents to flow unidirectionally from one computation matrix to another, thereby implementing ReLU functionality.
[0082] Using computation matrix 806_1 as an example, computation matrix 806_1 is similar to computation matrix 306 in receiving a plurality of analog input signals respectively from a plurality of DACs. Unlike computation matrix 306, however, the analog output signals generated by computation matrix 806_1 are provided to the next matrix, computation matrix 806_2, as analog input signals. Likewise, the analog output signals of each subsequent computation matrix are provided to the following computation matrix in the cascade, until the last computation matrix 806_N, whose analog output signals are provided to the ADCs.
[0083] The cascade structure described above with reference to
[0084] In the cascade structures described above, the plurality of computation matrices can correspond to a plurality of layers in multi-layer neural networks, such as a multi-layer deep neural network (DNN). When implemented according to the described cascade structures, multi-layer neural networks can perform complex computations with reduced latency and reduced power consumption because there is no memory or other storage circuitry needed between consecutive layers.
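As an illustration of how cascaded computation matrices can map to network layers, the following Python sketch models each matrix as a multiply-accumulate stage and each diode-based coupling circuit as an ideal rectifier. The row and column arrangement (one input per row, currents summed per column), the ideal diode model, and the names mac_layer, diode_coupling, and cascade are assumptions made for illustration only.

```python
from typing import List, Sequence


def mac_layer(inputs: Sequence[float],
              weights: Sequence[Sequence[float]]) -> List[float]:
    """One computation matrix: weights[r][c] is the weight of the node at
    row r, column c; node currents in a column are summed (superposed)."""
    if len(inputs) != len(weights):
        raise ValueError("one input per matrix row is required")
    n_cols = len(weights[0])
    return [sum(inputs[r] * weights[r][c] for r in range(len(inputs)))
            for c in range(n_cols)]


def diode_coupling(currents: Sequence[float]) -> List[float]:
    """Ideal diode: passes positive current and blocks negative current,
    which behaves like a ReLU between consecutive matrices."""
    return [max(0.0, i) for i in currents]


def cascade(inputs: Sequence[float],
            layers: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Pass signals through the matrices in order, with no storage between."""
    signal = list(inputs)
    for k, weights in enumerate(layers):
        signal = mac_layer(signal, weights)
        if k < len(layers) - 1:  # coupling only between consecutive matrices
            signal = diode_coupling(signal)
    return signal


layers = [
    [[1, -1], [2, 0]],  # first matrix: 2 inputs, 2 outputs
    [[3], [1]],         # second matrix: 2 inputs, 1 output
]
print(cascade([1.0, 2.0], layers))  # [15.0]
```

In the example, the first matrix produces column currents of 5 and -1 units, the coupling stage blocks the negative current, and the second matrix outputs 5×3 + 0×1 = 15 units.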
[0085] With the cascade structures described above, an AI accelerator according to some implementations can have great flexibility in addressing computation tasks with different data formats, computation complexities, data speeds, and/or circuit sizes. Moreover, because the computation matrices in the cascade can have the same or similar circuitry, the reusability and portability of circuit designs can be improved, which in turn can lead to reduced circuit size, complexity, and design and manufacturing costs.
[0086]
[0087] At 902, method 900 involves receiving, by a computation matrix and from a plurality of DACs, a plurality of analog input signals. The computation matrix includes a plurality of computation nodes, such as computation node 307 of computation matrix 306 illustrated in
[0088] At 904, method 900 involves, at each computation node, generating a positive bias current and a negative bias current based on an analog input signal among the plurality of analog input signals. The generation of the positive bias current and the negative bias current can utilize a current mirror circuit, such as current mirror circuit 500 illustrated in
[0089] At 906, method 900 involves, at each computation node, generating a computation result current based on the positive bias current, the negative bias current, and a digital weight signal. The generation of the computation result current can utilize a computation circuit, such as those illustrated in
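The flow of operations 902, 904, and 906 can be summarized in a behavioral Python sketch. It assumes an ideal unity-gain current mirror, signed integer weights that have already been decoded from the digital weight signal, and one analog input shared by each row of computation nodes. The names mirror, node_result, and run_method_900 are hypothetical.

```python
from typing import List, Sequence, Tuple


def mirror(analog_input: float) -> Tuple[float, float]:
    """Operation 904: derive positive and negative bias currents.
    Assumes an ideal mirror with unity gain."""
    return analog_input, -analog_input


def node_result(pbias: float, nbias: float, weight: int) -> float:
    """Operation 906: scale the bias current selected by the weight sign
    by the weight magnitude (assumed already decoded to a signed integer)."""
    bias = pbias if weight >= 0 else nbias
    return abs(weight) * bias


def run_method_900(analog_inputs: Sequence[float],
                   weights: Sequence[Sequence[int]]) -> List[List[float]]:
    """Operation 902 onward: each row of nodes receives one analog input."""
    results = []
    for x, row in zip(analog_inputs, weights):
        pbias, nbias = mirror(x)
        results.append([node_result(pbias, nbias, w) for w in row])
    return results


print(run_method_900([2.0, 0.5], [[3, -1], [-2, 4]]))
# [[6.0, -2.0], [-1.0, 2.0]]
```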
[0090] While this specification includes many specific implementation details, these should not be construed as limitations on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular implementations. Certain features that are described in this specification in the context of separate implementations can also be implemented, in combination, in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations, separately, or in any suitable sub-combination. Moreover, although previously described features may be described as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can, in some cases, be excised from the combination, and the claimed combination may be directed to a sub-combination or variation of a sub-combination.
[0091] Particular implementations of the subject matter have been described. Other implementations, alterations, and permutations of the described implementations are within the scope of the following claims as will be apparent to those skilled in the art. While operations are depicted in the drawings or claims in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed (some operations may be considered optional), to achieve desirable results. In certain circumstances, multitasking or parallel processing (or a combination of multitasking and parallel processing) may be advantageous and performed as deemed appropriate.
[0092] Moreover, the separation or integration of various system modules and components in the previously described implementations should not be understood as requiring such separation or integration in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
[0093] Accordingly, the previously described example implementations do not define or constrain the present disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of the present disclosure.