THIN-FILM LARGE-AREA CLASSIFIER
20190147329 · 2019-05-16
Inventors
CPC classification
H03M3/406
ELECTRICITY
G06F3/002
PHYSICS
International classification
G06F3/00
PHYSICS
Abstract
A classifier system implementing an equivalent deep neural network (DNN) includes a weight block, classification block, row selector, and sensor array coupled with the weight block, classification block and row selector. The sensor array includes row lines, column lines, a data integration line, an integration start line, and multiple sensor cells corresponding to respective neurons in an input layer of the equivalent DNN. The sensor cells share a common terminal connected to the data integration line, the row lines are controlled by the row selector, and the column lines receive respective weight values from the weight block. The classification block includes a first integrator receiving a signal generated on the data integration line when the integration start line is selected, and a first thresholding unit receiving a signal from the first integrator. The first thresholding unit is coupled to second integrators and second thresholding units arranged in a two-dimensional matrix.
Claims
1. A classifier system implementing an equivalent deep neural network (DNN), the classifier system comprising: a weight block; a classification block; a row selector; and a sensor array operatively coupled with the weight block, the classification block and the row selector, the sensor array comprising a plurality of sensor cells corresponding to respective neurons in an input layer of the equivalent DNN, the sensor cells arranged in a two-dimensional matrix configuration, row lines, column lines, a data integration line and an integration start line; wherein the classifier system is configured such that the plurality of sensor cells share a common terminal connected to the data integration line, the row lines are controlled by the row selector, and the column lines receive respective weight values from the weight block, the weight values received from the weight block corresponding to connection weights in the equivalent DNN; and wherein the classification block includes a first integrator receiving a signal generated on the data integration line when the integration start line is selected, and a first thresholding unit receiving a signal from the first integrator, the first thresholding unit operatively coupled to an array of second integrators and second thresholding units arranged in a two-dimensional matrix configuration, the array of second integrators and second thresholding units including row lines, controlled by the row selector, and column lines receiving respective weight values from the weight block.
2. The system of claim 1, wherein the sensor array comprises a plurality of storage capacitors, a first plurality of select transistors and a second plurality of select transistors, and wherein a gate terminal of each of the first plurality of select transistors in each row is controlled by a corresponding one of the row lines connected to the row selector, a first source/drain terminal of each of the first plurality of select transistors in each column receives a weight value from a corresponding one of the column lines connected to the weight block, a second source/drain terminal of each of the first plurality of select transistors is connected to a corresponding one of the storage capacitors and to a first source/drain terminal of each of the second plurality of select transistors, a gate terminal of each of the second plurality of select transistors is connected to the integration start line, a second source/drain terminal of each of the second plurality of select transistors is connected to a first terminal of each of a corresponding one of the plurality of sensor cells, and a second terminal of each of the plurality of sensor cells is connected to the data integration line.
3. The system of claim 2, wherein the column lines used in the sensor array are the same as the column lines used in the classification block.
4. The system of claim 1, wherein the array of the second integrators and second thresholding units in the classification block is further described in that the second integrators in the first row of the array receive signals from the first thresholding unit and receive weight values from the column lines, the second integrators in a given row except the first row receive signals from the second thresholding units in the previous row and receive weight values from the column lines, the second thresholding units in a given row receive signals from the second integrators in the same row, the second thresholding units in a given row except the last row provide signal to the second integrators in the next row, and the second thresholding units in the last row provide signal to the output circuit.
5. The system of claim 1, wherein an integration time of the classification block is configured to be shorter than resistance-capacitance (RC) delays associated with sensor resistance values and storage capacitance values.
6. The system of claim 1, wherein voltages loaded by the weight block onto the column lines in the sensor array are associated with the connection weights of the input layer of the equivalent DNN.
7. The system of claim 1, wherein voltages loaded by the weight block onto the column lines in the classification block are associated with weights of at least one of a hidden layer of the equivalent DNN and the output layer of the equivalent DNN.
8. The system of claim 1, wherein the integration start line is triggered after each full scan of the sensor array.
9. The system of claim 1, wherein the sensor array is scanned H1 times for each classification, where H1 is an integer indicative of a number of neurons in a first hidden layer of the equivalent DNN.
10. The system of claim 1, wherein the two-dimensional array of second integrators in the classification block is scanned sequentially following a scanning of the sensor array.
11. The system of claim 2, wherein each of at least a subset of at least one of the first and second select transistors comprises a thin-film transistor.
12. The system of claim 2, wherein each of at least a subset of at least one of the first and second select transistors comprises a heterojunction field effect transistor comprised of hydrogenated amorphous silicon contacts on single-crystalline or poly-crystalline silicon.
13. The system of claim 12, wherein the heterojunction field effect transistors are formed on at least one of a large-area substrate and a flexible substrate.
14. The system of claim 1, wherein each of at least a subset of sensor cells comprises a two-terminal hydrogenated amorphous silicon photoconductor.
15. The system of claim 1, wherein the weight block comprises a cross-bar array of programmable resistive elements.
16. The system of claim 1, wherein the thresholding units are configured to implement a sigmoid activation function.
17. The system of claim 1, wherein at least one of the first and second integrators in the classification block comprises an operational amplifier and a capacitor, the capacitor connected between an input and an output of the operational amplifier.
18. The system of claim 17, wherein the operational amplifier comprises a single transistor connected in a common-source configuration.
19. A method of classifying data acquired from a sensor array, the method comprising: providing a classifier system implementing an equivalent deep neural network (DNN), the classifier system including a weight block, a classification block, a row selector, and the sensor array operatively coupled with the weight block, the classification block and the row selector, the sensor array comprising row lines, column lines, a data integration line, an integration start line and a plurality of sensor cells corresponding to respective neurons in an input layer of the equivalent DNN, the sensor cells arranged in a two-dimensional matrix configuration, the classification block comprising at least a first integrator coupled to a first thresholding unit, a classification array arranged in a two dimensional matrix of second integrators coupled to corresponding second thresholding units, row lines and column lines; resetting the second integrators in the classification block; resetting a first integrator in the classification block; scanning the sensor array sequentially by selecting a given one of the row lines of the classification array and loading prescribed voltages provided by the column lines onto storage capacitors included in the respective sensor cells coupled to the given one of the row lines, the prescribed voltages corresponding to connection weights associated with the neurons in the input layer of the equivalent DNN; generating a weighted sum of sensor readings obtained from scanning the sensor array by connecting the sensor array to the first integrator; thresholding the weighted sum of the sensor readings by applying an activation function using the first thresholding unit; multiplying the thresholded weighted sum of the sensor readings with the prescribed voltages corresponding to connection weights provided by the column lines using the second integrators in the selected given one of the row lines of the classification array, the prescribed voltages 
corresponding to connection weights between neurons in a first hidden layer and neurons in one of a next hidden layer and an output layer of the equivalent DNN; generating a cumulative weighted sum of the thresholded weighted sums of the sensor readings by repeating the steps of resetting a first integrator, sequentially scanning the sensor array, generating the weighted sum of sensor readings, thresholding the weighted sum of the sensor readings and multiplying the thresholded weighted sum of the sensor readings for H1 cycles using the second integrators in each column of the selected given one of the row lines of the classification array, H1 being a number of neurons in the first hidden layer of the equivalent DNN; and generating an output by applying the cumulative weighted sum to the second thresholding units in the corresponding columns of the selected given one of the row lines of the classification array, or generating corresponding inputs for the second integrators in a next subsequent row of the classification array.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] The following drawings are presented by way of example only and without limitation, wherein like reference numerals (when used) indicate corresponding elements throughout the several views, and wherein:
[0031] It is to be appreciated that elements in the figures are illustrated for simplicity and clarity. Common but well-understood elements that may be useful or necessary in a commercially feasible embodiment may not be shown in order to facilitate a less hindered view of the illustrated embodiments.
DETAILED DESCRIPTION
[0032] Principles of the present disclosure will be described herein in the context of illustrative classification blocks or modules and related circuitry for use in a classifier system based on a deep neural network (DNN) for enhancing the performance of the system. It is to be appreciated, however, that the specific systems, circuits and/or methods illustratively shown and described herein are to be considered exemplary as opposed to limiting. Moreover, it will become apparent to those skilled in the art given the teachings herein that numerous modifications can be made to the embodiments shown that are within the scope of the appended claims. That is, no limitations with respect to the embodiments shown and described herein are intended or should be inferred.
[0034] An overall DNN 200 can be modeled as a plurality of nodes, 202, 204 and 206, organized into hierarchical layers, namely, an input layer 208, one or more hidden layers 210, and an output layer 212, as shown in
[0035] As previously stated, neural networks and other sophisticated classifier systems, such as, for example, support vector machines and machine learning algorithms, may be implemented in complementary metal-oxide-semiconductor (CMOS) technology using very large scale integration (VLSI) devices. Such classifier systems may include, for example, amplifiers, bias generators, derivative generators, multipliers, etc. However, such implementations are not feasible for thin-film transistors (TFTs) due to their low performance and the lack (or infeasibility) of complementary transistors, among other disadvantages. Furthermore, such implementations are too complex for large-area and flexible electronics. Consequently, implementations using TFTs have thus far been restricted to linear decision boundaries or a combination of linear decision boundaries.
[0036] Aspects according to one or more embodiments of the invention provide a novel thin-film classifier architecture which achieves superior performance compared to conventional thin-film classifiers, and which is particularly well-suited for neural network and other sophisticated classifier applications. Embodiments of the invention exploit the higher performance of thin-film heterojunction field-effect transistor (HJFET) devices compared to conventional TFTs such as amorphous silicon (Si) devices. While training is still performed offline, the thin-film classifier according to aspects of the invention can enable complex decision boundaries and accurate classification based on deep neural networks (there is no fundamental limit on the number of hidden layers used in the novel implementation). Some possible applications include, but are not limited to, image classification (e.g., hand-written text), classification of bio-sensory data (e.g., epileptic vs. normal from the brain surface) and structural health monitoring (e.g., using pressure sensors).
[0038] Each of the sensor cells 304 in the sensor array 302 includes a first transistor, M1, which may be a field-effect transistor (FET), and a storage capacitor, C.sub.S, coupled with M1. Specifically, a first source/drain of transistor M1 in the sensor cell 304 is connected with a corresponding column line, such as Col 1 (which is shared between all the N sensor cells in that column), a second source/drain of M1 is connected with a first terminal of the storage capacitor C.sub.S, a second terminal of C.sub.S is connected with a first voltage supply, which may be VSS or ground, and a gate of M1 is connected with a corresponding row line, such as Row 1 (which is shared between all the M sensor cells in that row). The sensor cell 304 further includes a second transistor, M2, and a sensor device which is represented, at least in part, by a variable resistor, R. A first source/drain of transistor M2 is connected to the first terminal of the storage capacitor C.sub.S and the second source/drain of transistor M1, a second source/drain of M2 is connected to the first terminal of the variable resistor R (i.e., the sensor device), a gate of M2 is connected to the integration start line, Row N+1 (which is shared between all the MN sensor cells), and a second terminal of the variable resistor R (the sensor device) is connected to the data integration line 312 (which is shared between all the MN sensor cells). In one or more preferred embodiments, the transistors M1 and M2 used in the sensor array, as well as the transistors used in the other blocks of the classifier system, particularly the classification block, are heterojunction field-effect transistors.
[0039] As previously stated, the variable resistor R represents at least a portion of a sensor device. In one exemplary application wherein the classifier system 300 is used for image classification, the sensor device is a two-terminal element comprised of a photoconductive material such as hydrogenated amorphous silicon or a chalcogenide. When an image is projected over the sensor array, the sensor devices in different pixels are exposed to different illumination levels and therefore exhibit different R values; the higher the illumination level, the higher the photoconduction and therefore the lower the R value. In the case of monochrome (e.g., grayscale) images, the same sensor devices may be used in all pixels. In the case of color images, different sensors (or the same sensors with different color filters) may be used for each of the principal colors (e.g., red, green and blue); for instance, in each of three adjacent pixels.
[0040] In another exemplary application wherein the classifier system 300 is used for the classification of mechanical strain distribution patterns over a surface, the sensor device is preferably a two-terminal element comprised of a piezo-resistive material. The spatial distribution of mechanical strain results in different strain levels applied to different sensors thus resulting in different R values in the pixels. In another example where the classifier system 300 is used for the classification of thermal distribution patterns over a surface, the sensor device is preferably a two-terminal element comprised of a thermo-electric material. The spatial distribution of temperature results in different temperature levels and thus different R values in different pixels.
[0041] In one or more other embodiments, each of at least a subset of the sensor devices may include additional terminals (not explicitly shown in the equivalent circuit of
[0042] In another exemplary application, wherein thermal distribution and strain distribution data are both necessary for classification, the sensor array may include both thermo-electric and piezo-resistive sensors. For instance, of every two adjacent pixels, one may contain a thermo-electric sensor and the other may contain a piezo-resistive sensor. It will be appreciated by those skilled in the art that the different sensors used in the array may not be of equal numbers. For instance, a smaller number of temperature sensors than strain sensors (e.g., one in every four adjacent pixels) may be used in one or more of the following situations: thermal distribution data are less important to decision making than the strain distribution data; available temperature sensors produce a stronger response (because of higher sensitivity and/or stronger input signal) than the available strain sensors; the thermal gradients are smaller than the strain gradients in the medium of interest.
[0043] The number of pixels in the sensor array corresponds to the number of neurons, m, in the input layer of an equivalent neural network (as denoted in the exemplary neural network of
[0044] If the number of neurons in any of the hidden layers or the output layer of the equivalent neural network is larger than the number of columns M in the sensor array, the classifier system may include additional columns; for example, a total of D columns, where D is the largest number of neurons in any of the hidden layers and the output layer of the equivalent neural network. For example, if M=10, H1=15 (H1 representing the number of neurons in the first hidden layer), H2=12 (H2 representing the number of neurons in the second hidden layer) and K=8 (K representing the number of neurons in the output layer), then D=15 and 5 column lines are required in addition to the 10 column lines used for the sensor array. In another example, where M=10, H1=8, H2=5 and K=5, no additional column lines are required.
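The column-count arithmetic above reduces to taking the maximum layer width D and subtracting the M columns already present in the sensor array. A minimal sketch (the function name and signature are chosen for illustration):

```python
def extra_columns(M, hidden_sizes, K):
    """Column lines needed beyond the M sensor-array columns.
    D is the largest number of neurons in any hidden layer or
    the output layer of the equivalent neural network."""
    D = max(hidden_sizes + [K])
    return max(D - M, 0)

# Both worked examples from the text:
assert extra_columns(10, [15, 12], 8) == 5  # D = 15, so 5 extra columns
assert extra_columns(10, [8, 5], 5) == 0    # D = 8 <= M, no extras needed
```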
[0045] In the exemplary classifier system 300 of
[0046] During a data acquisition operation of the sensor array, e.g., while an image is being projected over an image sensor array, all M2 transistors are disabled (i.e., turned off) by assertion of an appropriate voltage (e.g., a logical low) on the integration start line, row N+1, and the M1 transistors are enabled (i.e., turned on) row-by-row, by assertion of an appropriate voltage (e.g., a logical high) on row lines 1 through N, sequentially (i.e., one row at a time). With transistor M1 turned on, a prescribed voltage placed by the weight block 310 on the corresponding column line is passed through M1 and applied across the storage capacitor C.sub.S, which retains the applied voltage even after the transistor M1 is disabled (i.e., turned off); once M1 is turned off, the first terminal of the storage capacitor C.sub.S effectively becomes a floating node. Next, the M2 transistors are enabled (while the M1 transistors are disabled) for data transfer to the classification block 306 via the data integration line 312. This procedure is repeated, sequentially, H1 times, where H1 is the number of neurons in the first hidden layer of the equivalent neural network.
[0047] More specifically, referring to sensor cell 304, when row 1 is selected, the M1 transistor in cell 304 is turned on, and the voltage placed on Col 1 is stored across C.sub.S in cell 304. Since the M1 transistors in the rest of the cells in the first row are also selected, the voltages placed on Col 2 through M are also stored across the C.sub.S capacitors in the corresponding cells in the first row. Next, row 1 is deselected, new voltages are placed on the column lines by the weight block 310, row 2 is selected, and the voltages placed on the column lines are stored across the cells in the second row. Once the entire sensor array 302 has been scanned in this manner, i.e., by sequential selection of rows 1 through N, and loading the voltages placed on columns 1 through M, a total of m=MN voltages have been loaded from the weight block 310 across the C.sub.S capacitors.
[0048] As will be discussed further below, the voltages provided by the column lines and stored across the storage capacitors are proportional (linearly, to the first order) to the weights associated with the connections between the neurons in the input layer and the neurons in the first hidden layer of the equivalent neural network. Therefore, the column line voltages may be referred to as weight voltages. The m=MN weight voltages loaded on the sensor array 302 after a single scan of the array correspond to the weights associated with the connections between the input layer and a single neuron in the first hidden layer of the equivalent neural network. Since there are H1 neurons in the first hidden layer of the equivalent neural network, the sensor array is scanned H1 times to load all the MNH1 weights associated with the connections between the m=MN neurons in the input layer and the H1 neurons in the first hidden layer of the equivalent neural network. Once each scan is finished, the integration start line, row N+1, is selected to turn on the M2 transistors (while the M1 transistors are off) and transfer the weighted sum of the acquired sensor data to the classification block 306 for processing, via the data integration line 312, as will be discussed below. Therefore, during each data acquisition cycle, rows 1 through N+1 are scanned sequentially, H1 times.
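The H1 sequential scans and integrations described above are numerically equivalent to a vector-matrix product between the flattened sensor readings and the input-layer weight matrix. The NumPy sketch below uses random stand-in values for the sensor responses and weight voltages, and ideal (noise-free, first-order linear) behavior; the dimensions are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, H1 = 4, 3, 5             # rows, columns, first-hidden-layer neurons
x = rng.random((N, M))         # sensor responses (e.g., photoconductances)
W1 = rng.random((H1, N, M))    # input-to-hidden weights, loaded as voltages

# Emulate the H1 sequential scans: each scan loads one hidden neuron's
# weights onto the storage capacitors row by row, then the data
# integration line sums weight * sensor response over all M*N cells.
z = np.empty(H1)
for h in range(H1):
    loaded = np.empty((N, M))
    for row in range(N):       # rows 1..N selected one at a time
        loaded[row] = W1[h, row]   # column voltages stored on C_S
    z[h] = np.sum(loaded * x)  # integration start line selected

# The analog procedure computes the same pre-activations as an
# ordinary matrix product of the flattened m = M*N inputs.
assert np.allclose(z, W1.reshape(H1, -1) @ x.ravel())
```

This equivalence is why a single integrator suffices: the H1 hidden-neuron pre-activations are produced serially, one per scan, rather than in parallel.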
[0049] It will be appreciated that, in some embodiments, the M2 transistors may be replaced with a single transistor connected between the sensor array 302 and the classification block 306 (e.g., transistor M2.sub.eq illustrated in the equivalent circuit of
[0050] As known in the art, appropriate high and low logic levels depend on the type and parameters of the transistors being used. For example, for an n-channel conventional thin-film transistor (which is, in principle, a metal-oxide field-effect transistor, or MOSFET), the logical high is chosen sufficiently above the threshold voltage of the transistor and therefore is typically a sufficiently large positive voltage, e.g., V.sub.DD; whereas the logical low is chosen sufficiently below the threshold voltage, and therefore is typically close to zero (e.g., ground) or negative. For an n-channel HJFET, the logical high is chosen sufficiently above the pinch-off voltage, e.g., close to zero; whereas the logical low is chosen sufficiently below the pinch-off voltage and therefore is typically a sufficiently large negative voltage, e.g., V.sub.SS. Note, the pinch-off voltage of an n-channel HJFET has a negative value, whereas the pinch-off voltage of a p-channel HJFET has a positive value. When n-channel MOS transistors or n-channel HJFETs are used, row lines are selected by asserting logical high voltages and deselected by applying logical low voltages. Conversely, when p-channel transistors are used, row lines are selected by asserting logical low voltages and deselected by applying logical high voltages. The exemplary embodiments described herein use n-channel transistors; however, it will become apparent to those skilled in the art given the teachings herein that these embodiments can be similarly implemented using p-channel transistors.
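The selection-voltage conventions above can be summarized in a small lookup table. The specific voltage values (5 V and −5 V) are assumed for illustration only; as stated, the actual levels depend on the threshold or pinch-off voltage of the devices used.

```python
def row_voltage(device, select):
    """Select/deselect levels for n-channel devices (illustrative values).
    Per the description, p-channel devices invert the selection polarity:
    rows are selected with logical low and deselected with logical high."""
    levels = {
        "mosfet": (5.0, 0.0),   # logical high well above V_T; low near ground
        "hjfet": (0.0, -5.0),   # logical high near 0 V; low below pinch-off
    }
    high, low = levels[device]
    return high if select else low

assert row_voltage("hjfet", True) == 0.0    # n-HJFET selected near 0 V
assert row_voltage("mosfet", False) == 0.0  # n-MOSFET deselected near ground
```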
[0051] Due to the sequential operation of the sensor array as described above, transistors with high switching speeds are desired to reduce the data acquisition time. In addition, transistors with sufficiently low off-currents are required to avoid substantial discharge of the C.sub.S storage capacitors during the scan of the sensor array. Moreover, transistors with low operation voltages are desired to reduce the system-level power consumption. The requirements described above are also applicable to the operation of the classification block, which will be described in further detail herein below. Conventional thin-film transistors used in large-area electronics and displays may not (at least not satisfactorily) meet one or more of the above requirements. On the other hand, high-performance conventional transistors used in modern semiconductor chips are generally not compatible with large-area and flexible substrates. HJFET devices, however, are compatible with large-area and flexible substrates, and surpass the device performance of conventional thin-film transistors. Therefore, in one or more preferred embodiments, HJFET devices are used for the implementation of the sensor array 302 and the classification block 306. HJFET devices may also be used in the implementation of the row selector 308 and/or the weight block 310, in one or more embodiments.
[0052] In a preferred embodiment, the weights are provided to the sensor array 302 as weight voltages, V.sub.ij (where 1≤i≤M and 1≤j≤N; i and j are integers), by the weight block 310, as illustrated in
[0053] By way of example only and without limitation,
[0054] The inverting input of the first amplifier A1 is adapted to receive a weighted sum of the sensor data via the data integration line from the sensor array (e.g., 302 in
[0055] An output of the thresholding unit 401 is coupled with an output block 402. The output block 402, in one or more embodiments, includes a plurality of column circuits, each of the column circuits comprising an integrator and corresponding access and processing circuitry. Specifically, each column circuit includes a transistor M5, a transistor M6, and a transistor M7. Each column circuit further includes a second amplifier, A2, an optional pair of clamping diodes, D3 and D4, and a second thresholding unit 404. Each column circuit is coupled with an output column circuit comprising a transistor M8, and an indicator element, which in one or more embodiments may be a light-emitting diode (LED). In the illustrated circuit, the LED brightness is proportional to the M8 current, which may be an approximately quadratic function of the output voltage of the thresholding unit 404; i.e., the LED brightness being proportional to (1−V.sub.out,th/V.sub.p,8).sup.2, where V.sub.out,th is the output voltage of the thresholding unit, and V.sub.p,8 is the pinch-off voltage of M8.
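The quadratic brightness relation can be sketched as follows; the pinch-off value of −2 V for M8 is an assumed example, not a value from this disclosure.

```python
def led_brightness(v_out_th, v_p8=-2.0):
    """Relative LED brightness, proportional to the M8 drain current
    in saturation: (1 - V_out,th / V_p,8)^2. v_p8 = -2.0 V is an
    assumed pinch-off voltage for an n-channel HJFET."""
    return (1.0 - v_out_th / v_p8) ** 2

# With the gate at the pinch-off voltage, M8 is off and the LED is dark;
# brightness grows quadratically as the gate drive becomes more positive.
assert led_brightness(-2.0) == 0.0
assert led_brightness(1.0) > led_brightness(0.0)
```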
[0056] A first source/drain of transistor M5 is coupled with a corresponding column line, a second source/drain of M5 is coupled with an inverting input of the amplifier A2, and a gate of M5 is coupled with the output of thresholding unit 401. A non-inverting input of amplifier A2 is connected to VSS. An integration capacitor, C.sub.int,1, is connected between the inverting input of A2 and an output of A2, thus forming an integrator circuit. A first source/drain of transistor M6 is coupled with the output of the amplifier A2, a second source/drain of M6 is connected to VSS, and a gate of M6 is adapted to receive a control signal from the reset row, row RST, which is connected to the gates of M6 transistors of all columns. A first source/drain of transistor M7 is coupled with the output of amplifier A2, a second source/drain of M7 is coupled with an input of the thresholding unit 404, and a gate of M7 is adapted to receive a control signal from row N+3, which is connected to the gates of the M7 transistors in all columns. Diodes D3 and D4 are connected in series between V.sub.H and V.sub.L, in a manner consistent with diodes D1 and D2, and serve as a means of limiting the output voltage of the amplifier A2 (i.e., the output voltage of the integrator) between approximately V.sub.L and V.sub.H. A first source/drain of transistor M8 is coupled with a cathode of the LED, an anode of the LED is coupled with VDD, a second source/drain of M8 is connected to VSS, and a gate of M8 is coupled with an output of the thresholding unit 404.
[0057] Each column circuit in the output block 402 corresponds to a neuron in the output layer of the equivalent neural network. The brightness of, for instance, LED 1 in column 1, is an indication of the certainty with which an input to the classifier system (e.g., an image) is being classified by the classifier system under class 1; i.e., the higher the brightness level of the LED 1, the higher the certainty that the input belongs to class 1, as determined by the classifier system. Similarly, the higher the brightness of the LED 2, the higher the certainty that the input belongs to class 2, and so on for the rest of the columns, 3 through K. In other embodiments, the output voltages of the thresholding units 404 may be routed to analog-to-digital (A/D) converters for readout. Other forms of indicator or readout circuitry known in the art may also be used.
[0058] The classification block 400 may be operated as follows, in conjunction with the sensor array to which it is connected. Rows 1 through N are selected sequentially to scan the sensor array, followed by row N+1 to transfer the weighted sums of the sensor data to the integrator circuit formed by C.sub.int and A1, via the data integration line; and finally row N+2 is selected to transfer the output of the integrator to the thresholding unit 401, where the activation function is applied to the integrator output and the result is transferred to the output block 402. This procedure is repeated H times for each data acquisition, where H is the number of neurons in the hidden layer of the equivalent neural network. (Note, the equivalent neural network corresponding to the classification block 400 has only one hidden layer, which may be treated as the first hidden layer; i.e., H=H1).
[0059] Prior to each sensor data transfer, the integrator output is reset by M3 (which may be controlled by row N). The module described above (C.sub.int, A1, M3, M4 and thresholding unit 401) corresponds to a single neuron in the hidden layer of the equivalent neural network. Since this module is used H times (sequentially) per sensor data acquisition, it is implemented only once in hardware. This is in contrast to the output block 402, where the number of columns K corresponds to the number of neurons in the output layer of the equivalent neural network. The weight voltages placed on the column lines are updated by the weight block prior to each selection of row N+2 (in the same manner as described earlier for the sensor array), so that appropriate weights are applied to the output voltages received from the thresholding unit 401 and integrated by the C.sub.int,1/A2 pair to generate a weighted sum associated with the neurons in the output layer.
[0060] During each sensor data acquisition, H output voltages (corresponding to H neurons in the hidden layer of the equivalent neural network) are received sequentially from the thresholding unit 401 and applied to the gates of the M5 transistors. The weight voltages loaded on the column lines 1 through K (where K is the number of neurons in the output layer of the equivalent neural network) with the receipt of the h-th output voltage from the thresholding unit 401 (where 1≤h≤H) correspond to the weights associated with the connections between the h-th neuron in the hidden layer and the neurons 1 through K in the output layer of the equivalent neural network. Since H output voltages are received, H×K weights are loaded by the column lines and the associated weighted sums are generated by the corresponding column integrators. The outputs of the column integrators are reset by selecting a reset row, row RST (which turns on the M6 transistors) prior to the receipt of the first output voltage from the thresholding unit 401; namely, prior to the first selection of row N+2, in each data acquisition. That is, row RST is selected once per data acquisition, not H times. As a result, the output of the column integrators in the output block is the sum of H weighted sums generated from the H output voltages received from the thresholding unit 401.
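The time-multiplexed sequence described in paragraphs [0058]-[0060] can be summarized with a short behavioral sketch. This is a Python model of the equivalent network's arithmetic only, not a simulation of the disclosed circuit; the array names w1 and w2 and the activation f are illustrative assumptions.

```python
import numpy as np

def classify_single_hidden(sensor, w1, w2, f=np.tanh):
    """Behavioral sketch: one shared integrator/thresholder module computes
    the H hidden-neuron outputs sequentially, while the K output-column
    integrators accumulate the H weighted contributions between resets."""
    x = sensor.ravel()               # rows 1..N scanned: input-layer neurons
    H, K = w1.shape[0], w2.shape[1]
    acc = np.zeros(K)                # column integrators, reset once (row RST)
    for h in range(H):               # repeated H times per data acquisition
        z_h = f(w1[h] @ x)           # rows N+1/N+2: integrate, then threshold
        acc += w2[h] * z_h           # columns 1..K weighted for neuron h
    return f(acc)                    # row N+3: output thresholding units
```

Here w1[h] stands for the weights loaded on the sensor-array columns during the h-th scan, and w2[h] for the weights placed on the output-block columns before each selection of row N+2.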
[0061] In some embodiments, the output columns may further include additional transistors, M9 (not shown) connected in series with integrating capacitors C.sub.int,1, (i.e., either between the input of A2 and C.sub.int,1, or between the output of A2 and C.sub.int,1), with the gates of the M9 transistors connected to row N+2. Therefore, when row N+2 is selected, the M9 transistors are turned on, and the column integrators function the same as described above. When row N+2 is deselected, the M9 transistors are turned off, thus suppressing potential discharge of C.sub.int,1 via the input and/or the output terminals of A2. Finally, row N+3 is selected to transfer the outputs of the column integrators to the thresholding units 404 (where the activation function is applied) and generate an output.
[0062] With reference now to
[0063] In this exemplary embodiment, the hidden block 501 is implemented in a manner consistent with the output block 502, except that in the output block 502, the M10 transistors are enabled simultaneously by a shared row line, row N+H2+3 (in the same manner that, in the output block 402 shown in
[0064] The classification block 500 may be operated as follows, in conjunction with the sensor array to which it is connected. Rows 1 through N are selected sequentially to scan the sensor array, followed by rows N+1 and N+2. This is repeated H1 times, where H1 is the number of neurons in the first hidden layer of the equivalent neural network. Subsequently, rows N+3 through N+H2+3 are selected sequentially, followed by row N+H2+3 to generate the output. The number of columns in the hidden block 501 corresponds to the number of neurons, H2, in the second hidden layer of the equivalent neural network; and the number of columns in the output block 502 corresponds to the number of neurons, K, in the output layer of the equivalent neural network.
[0065] It will be appreciated that, in some embodiments, parts of the sensor array, the hidden block and/or the output block may be chosen to remain unused for a given classification task; e.g., a classifier system comprised of M×N sensors may be used to implement an equivalent neural network with fewer than M×N neurons in its input layer; or a hidden block with H2 columns may be used to implement an equivalent neural network with fewer than H2 neurons in its second hidden layer. It will also be appreciated that the classifier system may include further components and/or control signals (not shown) to improve the performance and/or add desired functionality, using techniques known in the art. For instance, when an amplifier or a thresholding unit is not in use, its power supply lines (VDD and VSS) may be disconnected to eliminate standby power consumption in the amplifier or the thresholding unit.
[0066]
[0067] Next, in step 604, the sensor inputs are multiplied by their respective weights, summed up and thresholded to calculate an output of the second neuron 605 in the first hidden layer, which is then multiplied by its respective weights and fed into the second hidden layer. The same process is performed for the rest of the neurons in the first hidden layer. Since this process is repeated H1 times, only one integrator (e.g., C.sub.INT/A1 in
[0068] The values fed to the neurons in the second hidden layer during each iteration are summed up with the previous values fed to them (using the column integrators C.sub.INT,1/A2 in the hidden block 501 shown in
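The two-hidden-layer iteration described in paragraphs [0064] and [0067]-[0068] can likewise be sketched behaviorally. As before, this models only the equivalent network's arithmetic; the array names w1, w2, w3 and the activation f are illustrative assumptions, not part of the disclosed circuit.

```python
import numpy as np

def classify_two_hidden(sensor, w1, w2, w3, f=np.tanh):
    """Behavioral sketch: the single first-hidden-layer module is reused H1
    times, and the hidden block's H2 column integrators accumulate the
    weighted contributions before the second hidden layer is thresholded."""
    x = sensor.ravel()
    H1, H2 = w1.shape[0], w2.shape[1]
    acc2 = np.zeros(H2)              # hidden-block column integrators
    for h in range(H1):              # one shared module, H1 iterations
        z_h = f(w1[h] @ x)           # output of first-hidden-layer neuron h
        acc2 += w2[h] * z_h          # summed with previously fed values
    z2 = f(acc2)                     # second hidden layer thresholded
    return f(z2 @ w3)                # output block: K output-layer neurons
```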
[0069] With reference now to
[0070] According to an alternative embodiment of the invention as shown in
[0071] In one or more embodiments in which three (or more) hidden layers are employed, the illustrative methodology conceptually depicted in
[0072] Next, in step 904, the outputs from the neurons in the second hidden layer multiplied by their respective weights (i.e., weights associated with the connections illustrated between neurons in the second hidden layer and the second neuron 905 in the third hidden layer) are summed up and thresholded to calculate an output of the second neuron 905 in the third hidden layer, which is then multiplied by its respective weights (i.e., weights associated with the connections illustrated between the second neuron 905 in the third hidden layer and the neurons in the output layer) and fed into the output layer. This process is continued for H3 iterations, where H3 is the number of neurons in the third hidden layer. It is to be understood that other approaches, timings and combinations may be similarly employed, in accordance with other embodiments of the invention.
[0073] In the block diagram of the exemplary classifier system 300 shown in
[0074]
[0075] More particularly, the sensor block 1002 is connected to a first source/drain of the transistor M2.sub.eq, a second source/drain of M2.sub.eq is connected to an inverting (−) input of the amplifier A1, and a gate of M2.sub.eq is adapted for receiving a row N+1 control signal. The integration capacitor C.sub.int is connected between the inverting input and an output of the amplifier A1 in a negative feedback configuration, thus forming an integrator circuit. A non-inverting (+) input of the amplifier A1 is connected to a voltage source, which in this embodiment is VSS, and the output of A1 generates an output, V.sub.OUT, of the circuit 1000.
[0076] The sensor block 1002 represents a plurality of sensor cells organized in an M×N array such as that in the sensor block 300, shown in
[0077] When M2.sub.eq is turned on by selecting row N+1 (at time t=0), a current, i.sub.C.sub.int, flows into the integration capacitor, given by the sum of the discharge currents of the storage capacitors through the respective sensor resistances:

i.sub.C.sub.int(t) = Σ.sub.i,j (V.sub.0,ij/R.sub.ij) e^(−t/(C.sub.S R.sub.ij)),

where V.sub.0,ij is the voltage across the storage capacitor C.sub.S in a given sensor cell prior to the selection of row N+1 (at t=0) as previously stated. Using the expression above, the output voltage V.sub.OUT can be determined as a function of time as follows:

V.sub.OUT(t) = −(C.sub.S/C.sub.int) Σ.sub.i,j V.sub.0,ij (1 − e^(−t/(C.sub.S R.sub.ij))).

Based on the assumption that integration time t<<C.sub.S R.sub.ij (the integration time t is the time period during which row N+1 is HIGH and therefore M2.sub.eq is on, and C.sub.S R.sub.ij is the resistance-capacitance (RC) time constant associated with the storage capacitance value of C.sub.S and the sensor resistance value of R.sub.ij), the output voltage can be determined as a function of the integration time as follows:

V.sub.OUT(t) ≈ −(t/C.sub.int) Σ.sub.i,j V.sub.0,ij/R.sub.ij.
In analogy with the illustrative neural network of
[0078] When the sensor array 1002 is scanned, the respective weight voltages V.sub.0,ij are stored in the storage capacitors C.sub.S for all cells in rows i and columns j. When the scan is completed, integration is started by selecting row N+1 to calculate the output of a given neuron in the following (i.e., downstream) layer. The integration time t, as well as values for the storage capacitor C.sub.S and resistor R.sub.ij, are preferably chosen such that the assumption t<<C.sub.S R.sub.ij holds for all i, j. Prior to the start of each integration, a control signal conveyed on row N can be used to reset the integrator output to an appropriate voltage such as ground or VSS, e.g., by activating the transistor M3 shown in
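The small-integration-time assumption stated above can be checked numerically. The following sketch compares the exact exponential-discharge contribution of each cell with its linear approximation; all component values are illustrative assumptions, not taken from the disclosure.

```python
import numpy as np

# Illustrative component values (assumptions, not from the disclosure)
C_int = 10e-12        # integration capacitor, farads
C_s   = 1e-12         # storage capacitor C_S, farads
t     = 1e-6          # integration time, seconds (t << C_s * R_ij everywhere)

rng = np.random.default_rng(0)
R  = rng.uniform(1e9, 1e10, size=(4, 4))     # sensor resistances R_ij, ohms
V0 = rng.uniform(-0.5, 0.5, size=(4, 4))     # stored weight voltages V_0,ij

# Per-cell contribution from the exact exponential discharge of C_S into the
# integrator's virtual ground, and its small-t (linear) approximation
exact  = -(C_s / C_int) * V0 * (1 - np.exp(-t / (C_s * R)))
approx = -(t / C_int) * V0 / R

print(float(np.sum(exact)), float(np.sum(approx)))
```

With t/(C_s·R_ij) on the order of 10^-4 to 10^-3 here, the per-cell relative error of the linear approximation stays well below 0.1%.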
[0079] By way of example only and without limitation,
[0080] The sensor cell 1102, like the sensor cells in the sensor array 1002 shown in
[0081] In this example, a simple common-source amplifier circuit is used for implementing the integration amplifier 1104 (consistent with amplifier A1 in
[0082] Simulation results were obtained for the exemplary sensor data integration circuit 1100 using an analog electronic circuit simulator (e.g., HSPICE, a registered trademark of Synopsys, Inc.).
[0083] With reference to
[0084] With reference to
[0085]
[0086] The transistor M1 in the circuit 1400 is preferably biased in a linear regime. A voltage, V.sub.C, on the corresponding column line COL j is supplied as an input to the circuit 1400 by the weight block. In one or more embodiments, the control signal V.sub.IN is generated as an output of a thresholder (e.g., 401 in
where I.sub.DSS is the saturation current of the HJFET at zero gate-to-source bias, and V.sub.p is the pinch-off voltage of the HJFET. With reference to the term
the plus (+) and minus (−) signs are applicable to V.sub.C>0 and V.sub.C<0, respectively. For conventional thin-film transistors, a similar equation may be obtained by replacing the term
with the term μC.sub.iW/L, and replacing V.sub.p with V.sub.T, where μ is the charge-carrier effective mobility, W is the channel width, L is the channel length, C.sub.i is the gate dielectric capacitance per unit area, and V.sub.T is the threshold voltage of the transistor.
[0087] Even though the output of circuit 1400 does not implement an exact analog multiplication, this circuit may be used in one or more embodiments to perform a classification task. This is because, as known in the art, one or more of the following apply to deep neural networks: (i) high precision multiplication is not necessarily required for providing accurate classification results; (ii) in an appropriately-trained neural network, particularly with regularization, the resulting weights are generally small, and/or the abnormally large weights may be reduced without substantial reduction in classification accuracy; and (iii) the training algorithm may be chosen such that the weights obtained from training are smaller than a desired value (bound). Therefore, with sufficiently small (in absolute value) V.sub.C, the
term may become negligible or have negligible effect on the classification results. However, it is also possible to eliminate the
term, for example by using the exemplary circuit 1500 shown in
[0088] More particularly, circuit 1500 includes a first transistor, M1, which in this embodiment is an HJFET, having a first source/drain connectable to a corresponding column line, COL j, where j is an integer, a second source/drain connected with an inverting (−) input of an amplifier, A1, which in this embodiment is an operational amplifier, and a gate adapted to receive an input signal, V.sub.IN. The circuit 1500 further includes a second transistor, M2, which in this embodiment is an HJFET having its gate terminal connected to its source/drain terminal. Specifically, a first source/drain and a gate of transistor M2 are connected with the inverting input of the amplifier A1, and a second source/drain of M2 is connected to the column line COL j. The circuit 1500 includes first and second resistors, R1 and R2, respectively, connected in a voltage divider arrangement with a non-inverting (+) input of the amplifier A1. Specifically, a first terminal of the resistor R1 is connected with the corresponding column line COL j, a second terminal of R1 is connected to a first terminal of resistor R2 and to the non-inverting input of the amplifier A1, and a second terminal of R2 is connected to ground (e.g., zero volts). The circuit 1500 further includes an integration capacitor, C.sub.int, connected between an output of the amplifier A1 and the inverting input of the amplifier. The output of the amplifier A1 generates an output voltage, V.sub.OUT, of the circuit 1500.
[0089] In one or more embodiments, the resistors R1 and R2 are the same value, and therefore a voltage, V.sub.C, on the corresponding column line COL j supplied as an input to the circuit 1500 is divided by two and applied to the non-inverting input of the amplifier A1; i.e., the non-inverting (+) input of amplifier A1 is biased at V.sub.C/2, and as a result the inverting (−) input of amplifier A1 is also virtually biased at V.sub.C/2. Therefore, the first source/drain terminals of M1 and M2 receive V.sub.C (via COL j) and the second source/drain terminals of M1 and M2 receive V.sub.C/2 (via the inverting terminal of A1); and the gate of M1 receives the input V.sub.IN while the gate of M2 receives V.sub.C/2 (via the inverting terminal of A1). Since the current flowing into C.sub.int is given by the algebraic sum of the currents flowing through M1 and M2, it is straightforward to find that the square term
is eliminated in the algebraic sum, and the output voltage V.sub.OUT may be expressed as follows:
Therefore V.sub.OUT may be expressed accurately as w.sub.j V.sub.IN, where
is the weight of the associated connection in the equivalent neural network, corresponding to column j. It will be appreciated by those skilled in the art that other circuits, generally known in the art as analog multipliers, may be similarly employed to generate an output proportional to V.sub.IN V.sub.C and used for implementing the column integrators in the hidden blocks and/or the output blocks of the disclosed classification system.
[0090] The circuits 1400 and 1500 may optionally include an integration control transistor, e.g., M.sub.int (not shown), in series with C.sub.int; e.g., a first source/drain of M.sub.int may be connected to the inverting (−) input of the amplifier A1, a second source/drain of M.sub.int may be connected to a first terminal of C.sub.int, and a second terminal of C.sub.int may be connected to the output of the amplifier A1. The gate of M.sub.int may be controlled by a control signal that enables M.sub.int (and thus allows the charge or discharge of C.sub.int) only during a desired integration period. For instance, if the circuit 1400 or 1500 is utilized in the hidden block 501 of the classification block 500 shown in
[0091] Turning now to an implementation of a weighting function, there are various techniques known in the art which are suitable for use with embodiments of the invention. By way of example only and without limitation,
[0092] The weighting circuit 1600 includes a weighting array 1602 connected with a row selection block or circuit 1604. The weighting array 1602, in this embodiment, includes a plurality of memory elements 1606 arranged in a cross-bar configuration. Each of the memory elements may be implemented using, for example, resistive memory, phase-change memory, conductive bridge memory, and a selecting element, such as, for example, a diode or an ovonic threshold switch (OTS). In one or more other embodiments, a thin-film transistor (TFT), such as, for example, a heterojunction field-effect transistor (HJFET) with programmable threshold voltage (shifted by applying bias), may be used as the memory element. The memory elements may be programmable as illustrated in this embodiment, or read-only in other embodiments where the PROG j columns and the WRITE row may be omitted. Some embodiments may include stacked arrays of memory elements.
[0093] In the illustrative embodiment shown in
[0094] During the write period of the weighting circuit 1600 (i.e., prior to sensor data acquisition and classification operations of the classifier system), the memory elements are programmed to values representing the weights in an equivalent neural network, by deselecting the READ line and selecting the WRITE line, and applying appropriate voltages on the row and column lines. Thus, each memory element 1606 in the array 1602 may be individually selected and programmed to a desired value using a unique pair of a corresponding row and program line associated therewith, as known in the art with respect to cross-bar memory configurations.
[0095] During the read period of the weighting circuit 1600 (i.e., during the sensor data acquisition and classification operations of the classifier system), the WRITE line is deselected, the READ line is selected, and the values stored in a desired row of the memory elements are provided to the column lines COLj by selecting the desired row line. It will be appreciated that during the read operation of the weighting circuit, the row lines may be selected using the same or different voltage values, V.sub.bias, compared to that during the write operation. In one example where the weighting circuit 1600 is used to implement at least a portion of the weight block 310 shown in
[0096] Each column j in the memory array 1602 is selectively coupled with an output circuit configured to generate an output signal, V.sub.colj, at a corresponding output, COLj, of the weighting circuit 1600. Specifically, the output circuit in each column j includes an amplifier, A1, which may be an operational amplifier, connected with a first transistor, M1. The inverting (−) input of the amplifier A1 is connected to an output of the amplifier. A non-inverting (+) input of the amplifier A1 is coupled with a first source/drain of the transistor M1, a second source/drain of M1 is connected with the corresponding program line, PROGj, in the array 1602, and a gate of M1 is adapted to receive a control signal, READ, supplied by the row selection block 1604. A reference resistor, R.sub.ref, is connected between the non-inverting input of the amplifier A1 and V.sub.REF. The output of the amplifier A1 generates an output signal, V.sub.COLj, at the output COLj of column j.
[0097] The voltage V.sub.colj generated at the output COLj of a given column j can be closely determined using the following expression:

V.sub.colj ≈ (V.sub.BIAS R.sub.ref + V.sub.REF R.sub.ij)/(R.sub.ref + R.sub.ij).

When V.sub.BIAS is set equal to −V.sub.REF, where V.sub.REF<0, the output voltage V.sub.colj of column j can be closely determined as a function of the resistances R.sub.ij and R.sub.ref as follows:

V.sub.colj ≈ V.sub.REF (R.sub.ij − R.sub.ref)/(R.sub.ij + R.sub.ref).
Here, the resistances R.sub.ij and R.sub.ref function essentially as a voltage divider in determining the voltage contribution to the total output voltage. In this regard, only the memory elements in a given row are selected at a time.
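Treating the selected memory element R.sub.ij and the reference resistor R.sub.ref as a simple divider, the column read-out can be modeled with a short sketch. The bias values and resistances below are assumptions chosen only to illustrate the divider relationship, not values from the disclosure.

```python
import numpy as np

# Illustrative values only; V_BIAS, V_REF and the resistances are assumptions.
V_BIAS, V_REF, R_ref = 1.0, -1.0, 1.0e6

R = np.array([[5e5, 1e6, 2e6],
              [1e6, 5e5, 4e6]])     # programmed memory resistances R_ij, ohms

def read_row(i):
    # The selected row line drives V_BIAS through R_ij into the column node,
    # while R_ref ties the node toward V_REF; amplifier A1 buffers the result.
    return V_REF + (V_BIAS - V_REF) * R_ref / (R_ref + R[i])

weights = read_row(0)               # one weight voltage per column line COL j
print(weights)
```

With V_BIAS = −V_REF, a memory resistance below R_ref yields a positive column voltage and one above R_ref yields a negative column voltage, so both weight polarities are representable.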
[0098]
[0099] By way of example only and without limitation, an illustrative implementation of a thresholding circuit will now be described, with reference to
[0100] The inverter circuit 1802 includes a first transistor, M1, which may be an n-channel HJFET having a first source/drain coupled with a first terminal of a first resistor, RS1, a second source/drain coupled with a first terminal of a second resistor, RD1, and a gate adapted to receive the input signal V.sub.IN. A second terminal of resistor RS1 is connected to a first voltage source, VSS1, and a second terminal of resistor RD1 is connected to a first supply voltage, VDD1. The junction between the second source/drain of transistor M1 and resistor RD1 forms the output of the inverter circuit 1802. The buffer circuit 1804 includes a first transistor, MU, which may be an n-channel HJFET having a first source/drain connected to a first terminal of a third resistor, RD2, and a gate adapted to receive the input signal V.sub.IN. A second terminal of resistor RD2 is connected to a second supply voltage, VDD2. The buffer circuit 1804 further includes a second transistor, MD, which may be an n-channel HJFET having a first source/drain connected to a first terminal of a fourth resistor, RS2, a second source/drain connected to a second source/drain of transistor MU, and a gate coupled with the output of the inverter circuit 1802 at the second source/drain of transistor M1. A second terminal of resistor RS2 is connected to a second voltage source, VSS2. The junction between transistors MU and MD forms the output of the buffer circuit 1804 for generating the output signal V.sub.OUT. In some embodiments, one or more of the passive resistors (RD1, RD2, RS1, and RS2) may be replaced by active loads. For example, an n-channel HJFET with its gate terminal connected to its source terminal can function as an active load.
[0101] As is known in the art, a buffer circuit comprised of a pair of complementary pull-up and pull-down transistors may be used to approximate a sigmoid or a similar activation function, with adequately symmetric output values with respect to negative and positive input voltages. Other complementary circuits have also been used in the art. However, complementary transistors are not available in many thin-film transistor technologies, and when available (e.g. in the case of HJFET), fabricating both n-channel and p-channel transistors increases the number of process steps and generally results in higher fabrication costs that may be prohibitive in many applications in large-area and flexible electronics. In the thresholding circuit 1800, the two transistors MU and MD in the buffer circuit 1804 function as pull-up and pull-down devices, respectively, even though both transistors have the same channel type (e.g., n-channel) in this embodiment. This is because while MU and MD are not complementary, they receive complementary inputs at their gates, i.e. the input signal V.sub.IN and the inverse (complementary form) of V.sub.IN generated by the inverter circuit 1802 are fed to the buffer circuit 1804. As a result, the thresholding circuit 1800 can approximate an activation function with sufficiently symmetric output characteristics for positive and negative input voltages.
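Functionally, the pseudo-complementary circuit described above approximates a symmetric, saturating activation. A behavioral (not transistor-level) model is sketched below; the gain and saturation level are assumed parameters, not values derived from the circuit.

```python
import numpy as np

def thresholding_unit(v_in, gain=20.0, v_sat=1.0):
    # Behavioral model of the inverter-plus-buffer thresholder: a steep,
    # roughly sigmoidal transition around 0 V that saturates symmetrically
    # at the supply-limited output levels (+/- v_sat). Both parameters are
    # illustrative assumptions.
    return v_sat * np.tanh(gain * v_in)
```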
[0102]
[0103] It will be appreciated that various other activation functions known in the art, of the so-called "S" shape similar to the sigmoid function, may also be approximated using the circuit 1802. A rectified linear unit (ReLU) may be implemented using a diode (and if required, a series resistor). Given that the gate/channel interface of an HJFET forms a p-n heterojunction, in some embodiments, a diode may be implemented by using the gate terminal of an HJFET as the first terminal of the diode, and using the drain and/or the source terminal of the HJFET as the second terminal of the diode.
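The diode-based ReLU mentioned above can likewise be modeled behaviorally. The sketch below uses an ideal-diode assumption; v_on is a hypothetical turn-on voltage, zero for an ideal diode and a few hundred millivolts for a real p-n junction.

```python
def relu_diode(v_in, v_on=0.0):
    # Ideal-diode model: the diode conducts only when v_in exceeds its
    # turn-on voltage v_on, yielding a rectified-linear (ReLU-like)
    # transfer characteristic.
    return max(0.0, v_in - v_on)
```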
[0104] Given the discussion thus far, it will be appreciated that, in general terms, an exemplary classifier system for implementing an equivalent deep neural network includes a weight block, a classification block, a row selector and a sensor array operatively coupled with the weight block, the classification block and the row selector. The sensor array comprises a plurality of sensor cells corresponding to respective neurons in an input layer of the equivalent DNN, the sensor cells being arranged in a two-dimensional matrix configuration, row lines, column lines, a data integration line and an integration start line.
[0105] The classifier system is configured such that the sensor cells share a common terminal connected to the data integration line, the row lines are controlled by the row selector, and the column lines receive respective weight values from the weight block, the weight values received from the weight block corresponding to connection weights in the equivalent DNN. The classification block includes a first integrator receiving a signal generated on the data integration line when the integration start line is selected, and a first thresholding unit receiving a signal from the first integrator, the first thresholding unit operatively coupled to an array of second integrators and second thresholding units arranged in a two-dimensional matrix configuration. The array of second integrators and second thresholding units include row lines, controlled by the row selector, and column lines receiving respective weight values from the weight block.
[0106] Given the discussion thus far, it will be further appreciated that a method of classifying data acquired from a sensor array includes: providing a classifier system implementing an equivalent deep neural network (DNN), the classifier system including a weight block, a classification block, a row selector, and the sensor array operatively coupled with the weight block, the classification block and the row selector, the sensor array comprising row lines, column lines, a data integration line, an integration start line and a plurality of sensor cells corresponding to respective neurons in an input layer of the equivalent DNN, the sensor cells arranged in a two-dimensional matrix configuration, the classification block comprising at least a first integrator coupled to a first thresholding unit, a classification array arranged in a two dimensional matrix of second integrators coupled to corresponding second thresholding units, row lines and column lines; resetting the second integrators in the classification block; resetting a first integrator in the classification block; scanning the sensor array sequentially by selecting a given one of the row lines of the classification array and loading prescribed voltages provided by the column lines onto storage capacitors included in the respective sensor cells coupled to the given one of the row lines, the prescribed voltages corresponding to connection weights associated with the neurons in the input layer of the equivalent DNN; generating a weighted sum of sensor readings obtained from scanning the sensor array by connecting the sensor array to the first integrator; thresholding the weighted sum of the sensor readings by applying an activation function using the first thresholding unit; multiplying the thresholded weighted sum of the sensor readings with the prescribed voltages corresponding to connection weights provided by the column lines using the second integrators in the selected given one of the row lines of the 
classification array, the prescribed voltages corresponding to connection weights between neurons in a first hidden layer and neurons in one of a next hidden layer and an output layer of the equivalent DNN; generating a cumulative weighted sum of the thresholded weighted sums of the sensor readings by repeating the steps of resetting a first integrator, sequentially scanning the sensor array, generating the weighted sum of sensor readings, thresholding the weighted sum of the sensor readings and multiplying the thresholded weighted sum of the sensor readings for H1 cycles using the second integrators in each column of the selected given one of the row lines of the classification array, H1 being a number of neurons in the first hidden layer of the equivalent DNN; and generating an output by applying the cumulative weighted sum to the second thresholding units in the corresponding columns of the selected given one of the row lines of the classification array, or generating corresponding inputs for the second integrators in a next subsequent row of the classification array.
[0107] At least a portion of the apparatus, methods and/or system described above may be implemented in an integrated circuit. In forming integrated circuits, identical dies are typically fabricated in a repeated pattern on a surface of a semiconductor wafer. Each die includes a device described herein, and may include other structures and/or circuits. The individual dies are cut or diced from the wafer, then packaged as an integrated circuit. One skilled in the art would know how to dice wafers and package die to produce integrated circuits.
[0108] Those skilled in the art will appreciate that the exemplary structures discussed above can be distributed in raw form (i.e., a single wafer having multiple unpackaged chips), as bare dies, in packaged form, or incorporated as parts of intermediate products or end products that benefit from having classification circuitry formed in accordance with one or more of the exemplary embodiments.
[0109] The illustrations of embodiments described herein are intended to provide a general understanding of the various embodiments, and they are not intended to serve as a complete description of all the elements and/or features of apparatus, methods and systems that might make use of the circuits and techniques described herein. Many other embodiments will become apparent to those skilled in the art given the teachings herein; other embodiments are utilized and derived therefrom, such that structural and logical substitutions and changes can be made without departing from the scope of this disclosure. It should also be noted that, in some alternative implementations, some of the steps of exemplary methods described herein may occur out of the order described or noted in the figures (where shown). For example, two steps described or shown in succession may, in fact, be executed substantially concurrently, or certain steps may sometimes be executed in the reverse order, depending upon the functionality involved. The drawings are also merely representational and are not drawn to scale. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
[0110] Embodiments are referred to herein, individually and/or collectively, by the term "embodiment" merely for convenience and without intending to limit the scope of this application to any single embodiment or inventive concept if more than one is, in fact, shown. Thus, although specific embodiments have been illustrated and described herein, it should be understood that an arrangement achieving the same purpose can be substituted for the specific embodiment(s) shown; that is, this disclosure is intended to cover any and all adaptations or variations of various embodiments. Combinations of the above embodiments, and other embodiments not specifically described herein, will become apparent to those of skill in the art given the teachings herein.
[0111] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used herein, the singular forms "a," "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof. Terms such as "above" and "below," where used, are intended to indicate positioning of elements or structures relative to each other as opposed to absolute elevation.
[0112] The corresponding structures, materials, acts, and equivalents of any means or step-plus-function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the various embodiments has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the forms disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit thereof. The embodiments were chosen and described in order to best explain principles and practical applications, and to enable others of ordinary skill in the art to understand the various embodiments with various modifications as are suited to the particular use contemplated.
[0113] The abstract is provided to comply with 37 C.F.R. § 1.72(b), which requires an abstract that will allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the appended claims reflect, the claimed subject matter may lie in less than all features of a single embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as separately claimed subject matter.
[0114] Given the teachings provided herein, one of ordinary skill in the art will be able to contemplate other implementations and applications of the techniques and disclosed embodiments. Although illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that illustrative embodiments are not limited to those precise embodiments, and that various other changes and modifications are made therein by one skilled in the art without departing from the scope of the appended claims.