COMPUTE-IN-MEMORY MACRO DEVICE AND ELECTRONIC DEVICE
20220366947 · 2022-11-17
CPC classification
G11C7/1063
PHYSICS
G06F15/7821
PHYSICS
Y02D10/00
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G11C5/025
PHYSICS
G11C7/1006
PHYSICS
G11C7/1012
PHYSICS
Abstract
A compute-in-memory (CIM) macro device and an electronic device are proposed. The CIM macro device includes a CIM cell array including multiple CIM cells. First data is divided into at least two bit groups including a first bit group which is the most significant bits of the first data and a second bit group which is the least significant bits of the first data, and the bit groups are respectively loaded in CIM cells of different columns of the CIM cell array. The electronic device includes at least one CIM macro and at least one processing circuit. The processing circuit is configured to receive and perform operations on parallel outputs respectively corresponding to the columns of the CIM cell array, where the parallel outputs include multiple correspondences, and where each of the correspondences includes most significant bits of an output activation and least significant bits of the output activation.
Claims
1. A compute-in-memory (CIM) macro device comprising: a CIM cell array comprising a plurality of CIM cells, wherein first data is divided into at least two bit groups comprising a first bit group which is the most significant bits of the first data and a second bit group which is the least significant bits of the first data, and the at least two bit groups are respectively loaded in CIM cells of different columns of the CIM cell array.
2. The CIM macro device according to claim 1, wherein there is second data input to the CIM cells of the different columns of the CIM cell array after the first data is loaded in the CIM cells of the different columns, and wherein the first data is a weight and the second data is an input activation.
3. The CIM macro device according to claim 1, wherein there is second data input to the CIM cells of the different columns of the CIM cell array after the first data is loaded in the CIM cells of the different columns, and wherein the first data is an input activation and the second data is a weight.
4. The CIM macro device according to claim 1, wherein the first bit group of the first data is loaded in one CIM cell of an odd column of the CIM cell array and the second bit group of the first data is loaded in one CIM cell of an even column of the CIM cell array.
5. The CIM macro device according to claim 1, wherein the number of bits of each of the at least two bit groups is determined based on the number of bits per CIM cell.
6. The CIM macro device according to claim 1, wherein each of the at least two bit groups comprises a part of bits of the first data, and the at least two bit groups are respectively loaded in different CIM cells belonging to different groups of columns of the CIM cell array, and wherein the different groups of columns are grouped based on a common difference between every two adjacent columns of each group of columns.
7. The CIM macro device according to claim 1, wherein each of the at least two bit groups comprises a part of bits of the first data, and each bit group of the at least two bit groups is loaded in a CIM cell of a group of columns which comprises consecutive columns.
8. The CIM macro device according to claim 1, wherein a portion of the CIM cell array belongs to a first CIM macro and another portion of the CIM cell array belongs to a second CIM macro.
9. An electronic device comprising: at least one compute-in-memory (CIM) macro, wherein each of the at least one CIM macro comprises a CIM cell array comprising a plurality of CIM cells, and wherein first data is divided into at least two bit groups comprising a first bit group which is the most significant bits of the first data and a second bit group which is the least significant bits of the first data, the at least two bit groups of the first data are respectively loaded in CIM cells of different columns of the CIM cell array, and second data is input to the CIM cells of the different columns of the CIM cell array after the first data is loaded in the CIM cells of the different columns; and at least one processing circuit, configured to receive and perform operations on a plurality of parallel outputs respectively corresponding to the columns of the CIM cell array, wherein the parallel outputs comprise a plurality of correspondences, wherein each of the correspondences comprises most significant bits of an output activation and least significant bits of the output activation.
10. The electronic device according to claim 9, wherein the CIM cells of the different columns of the CIM cell array are used for a convolution operation of the first data and the second data, and wherein one of the first data and the second data is a weight and the other one is an input activation.
11. The electronic device according to claim 9, wherein the most significant bits and the least significant bits of each of the weights are alternately loaded in odd columns and even columns of the columns of the CIM cells, and wherein for each of the correspondences, the at least one processing circuit shifts the most significant bits of the output activation and adds the shifted most significant bits of the output activation to the least significant bits of the output activation.
12. The electronic device according to claim 9, wherein the weights are unsigned numbers or signed numbers.
13. The electronic device according to claim 9, wherein the number of bits of the first bit group and the number of bits of the second bit group are determined based on the number of bits per CIM cell.
14. The electronic device according to claim 9, wherein each of the at least two bit groups comprises a part of bits of the first data, and the at least two bit groups are respectively loaded in different CIM cells belonging to different groups of columns of the CIM cell array, and wherein the different groups of columns are grouped based on a common difference between every two adjacent columns of each group of columns, and wherein for each of the correspondences, the at least one processing circuit shifts the most significant bits of the output activation and adds the shifted most significant bits of the output activation to the least significant bits of the output activation.
15. The electronic device according to claim 9, wherein each of the at least two bit groups comprises a part of bits of the first data, and each bit group of the at least two bit groups is loaded in a CIM cell of a group of columns which comprises consecutive columns, and wherein for each of the correspondences, the at least one processing circuit shifts the most significant bits of the output activation and adds the shifted most significant bits of the output activation to the least significant bits of the output activation.
16. The electronic device according to claim 9, wherein a portion of the CIM cell array belongs to a first CIM macro and another portion of the CIM cell array belongs to a second CIM macro.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the disclosure and, together with the description, serve to explain the principles of the disclosure.
[0013] To make the above features and advantages of the application more comprehensible, several embodiments accompanied with drawings are described in detail as follows.
DESCRIPTION OF THE EMBODIMENTS
[0015] An image input may be computed through a neural network model to acquire more features. An input feature map includes multiple input activations, and multiple input feature maps are also called input channels. Multiple weights applied to an input feature map are regarded as a filter. By a convolution operation applied to the input channels with the filters, an output feature map including multiple output activations, called an output channel, is generated. Referring to
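The convolution described above can be illustrated with a minimal sketch (not part of the patent text; the function and variable names are my own, and the example uses one input channel and one filter plane for brevity): each output activation is the sum, over all input channels, of the element-wise products between a filter plane and the corresponding window of the input feature map.

```python
# Illustrative sketch: compute one output activation of an output channel
# as the sum over input channels of a 2-D convolution window.

def output_activation(input_maps, filt, row, col):
    """Compute one output activation at position (row, col).

    input_maps: list of 2-D lists, one per input channel
    filt:       list of 2-D lists, one kernel plane per input channel
    """
    acc = 0
    for channel, kernel in zip(input_maps, filt):
        kh, kw = len(kernel), len(kernel[0])
        for i in range(kh):
            for j in range(kw):
                acc += channel[row + i][col + j] * kernel[i][j]
    return acc

# One 4x4 input channel and one 2x2 filter plane (made-up values).
x = [[[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]]
w = [[[1, 0],
      [0, 1]]]
print(output_activation(x, w, 0, 0))  # 1*1 + 6*1 = 7
```

Sliding the window over all valid positions of each input map would produce the full output feature map (output channel).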
[0016] To address the above issue, some embodiments of the disclosure will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments of the application are shown. Indeed, various embodiments of the disclosure may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout.
[0018] Referring to
[0019] In one example, second data is input to the CIM cells of the different columns of the CIM cell array after the first data is loaded in the CIM cells of the different columns, where the first data is a weight and the second data is an input activation. In another example, second data is input to the CIM cells of the different columns of the CIM cell array after the first data is loaded in the CIM cells of the different columns, where the first data is an input activation and the second data is a weight.
[0020] In the present exemplary embodiment, the first bit group of the first data is loaded in one CIM cell of an odd column of the CIM cell array (e.g., one of the columns C201, C203, . . . C263) and the second bit group of the first data is loaded in one CIM cell of an even column of the CIM cell array (e.g., one of the columns C202, C204, . . . C264). As an example, an 8-bit unsigned weight may be split into the first bit group and the second bit group, where the most significant bits may be loaded in an odd column, and the least significant bits may be loaded in an even column adjacent to the aforesaid odd column. As another example, an 8-bit signed weight may be split into the first bit group and the second bit group, where the most significant bits may include a sign bit and may be loaded in an odd column, and the least significant bits may be loaded in an even column adjacent to the aforesaid odd column. For illustrative purposes, W1M(k,3:0) and W1L(k,3:0) respectively denote the signed most significant bits and the unsigned least significant bits of an 8-bit weight, where k=1, 2, . . . , 256. The rest can be deduced accordingly. As illustrated in
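The 4/4 split of an 8-bit weight can be sketched as follows (an illustration, not the patent's implementation; the helper names are my own): the most significant bit group carries the sign for signed weights, while the least significant bit group is always unsigned, matching the W1M(k,3:0)/W1L(k,3:0) notation above.

```python
# Illustrative sketch: split an 8-bit weight into a signed MSB group
# (destined for an odd column) and an unsigned LSB group (even column).

def split_weight(w, signed=True):
    """Return (msb_group, lsb_group) of an 8-bit weight."""
    if signed:
        assert -128 <= w <= 127
    else:
        assert 0 <= w <= 255
    msb = w >> 4       # arithmetic right shift preserves the sign bit
    lsb = w & 0xF      # least-significant nibble, always unsigned
    return msb, lsb

def join_weight(msb, lsb):
    """Inverse operation: shift the MSB group left and add the LSB group."""
    return (msb << 4) + lsb

# Round-trip check over a few signed weights (made-up values).
for w in (-128, -37, -1, 0, 77, 127):
    m, l = split_weight(w)
    assert join_weight(m, l) == w
```

Because Python's `>>` on negative integers is an arithmetic shift, the signed MSB group recombines correctly with the unsigned LSB group via shift-and-add.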
[0021] In the present exemplary embodiment, the processing circuit 250 is configured to receive and perform operations on multiple parallel outputs respectively corresponding to the columns C201-C264 of the CIM cell array. The parallel outputs include multiple correspondences, where each of the correspondences includes most significant bits of an output activation and least significant bits of the output activation. For example, a correspondence of the column C201 is a summation of W1M(1,3:0) to W1M(256,3:0), and a correspondence of the column C202 is a summation of W1L(1,3:0) to W1L(256,3:0). Each CIM cell of the CIM cell array is used for a convolution operation of the first data and the second data, where one of the first data and the second data is a weight and the other one is an input activation.
[0022] Herein, the parallel outputs include 32 correspondences, where each of the correspondences includes most significant bits of an output activation and least significant bits of the output activation. For illustrative purposes, parallel outputs O1M and O1L, respectively representing the most significant bits and the least significant bits of an output activation O1, are considered as one correspondence. For each of the correspondences, the processing circuit 250 shifts the most significant bits of the output activation and adds the shifted most significant bits of the output activation to the least significant bits of the output activation, where the output activation may be represented as Oi=OiM&lt;&lt;4+OiL, i=1, 2, . . . , 32. In a case where the weights are signed, for each of the correspondences, the processing circuit 250 shifts the most significant bits of the output activation, including a sign bit, and adds the shifted most significant bits of the output activation, including the shifted sign bit, to the least significant bits of the output activation.
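The shift-and-add recombination Oi=OiM&lt;&lt;4+OiL can be checked numerically with a small sketch (illustrative only; the weight and input values are made up): one column accumulates products of the signed MSB groups with the inputs, the adjacent column accumulates products of the unsigned LSB groups, and recombining the two partial outputs recovers the full-precision dot product.

```python
# Illustrative sketch: verify that the per-column partial sums recombine
# into the full-precision result via Oi = (OiM << 4) + OiL.

weights = [-37, 102, -5, 64]   # signed 8-bit weights (made-up values)
inputs  = [3, 1, 2, 4]         # input activations (made-up values)

# Odd-column output: products of the signed MSB groups with the inputs.
o_msb = sum((w >> 4) * x for w, x in zip(weights, inputs))
# Even-column output: products of the unsigned LSB groups with the inputs.
o_lsb = sum((w & 0xF) * x for w, x in zip(weights, inputs))

# Shift-and-add performed by the processing circuit.
o = (o_msb << 4) + o_lsb
print(o)  # 237, equal to the direct dot product below
assert o == sum(w * x for w, x in zip(weights, inputs))
```

Because the MSB partial sum is signed, the left shift propagates the sign correctly, so the same recombination works for both signed and unsigned weights.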
[0023] In another exemplary embodiment, bits of each of the weights may be split into more than two bit groups. In such a case, the number of bits of each of the bit groups may be determined based on the number of bits per CIM cell, each of the bit groups includes a part of the bits of the first data, and the bit groups are respectively loaded in different CIM cells belonging to different groups of columns of the CIM cell array, where the different groups of columns are grouped based on a common difference between every two adjacent columns of each group of columns. As an example, for a 10-bit weight to be stored in a CIM cell array with a bit-width of 4 bits, the 10-bit weight may be split into three bit groups: a first bit group may contain the most significant 4 bits, a second bit group may contain the least significant 4 bits, and an intermediate bit group may contain the remaining 2 bits. The number of bits of each bit group of the first data may not exceed the bit-width of the CIM cell. As another example, for a 12-bit weight, the bits may be evenly split into three bit groups: the first bit group, the intermediate bit group, and the second bit group may each contain 4 bits. Note that in both examples, the first bit group may be loaded into an m-th column of CIM cells among the columns C201-C264, the intermediate bit group may be loaded into an (m+1)-th column of CIM cells among the columns C201-C264, and the second bit group may be loaded into an (m+2)-th column of CIM cells among the columns C201-C264, where m∈N. In this example, the common difference between every two neighbouring columns in each group of columns is three. The processing circuit 250 may receive and perform operations on multiple parallel outputs respectively corresponding to the columns of the CIM cell array in a similar fashion to the previous exemplary embodiment.
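The multi-group split can be sketched generically (an illustration under my own naming; the 4+2+4 and 4+4+4 group widths follow the examples above): a weight is peeled into bit groups of the given widths, most significant group first, and the inverse shift-and-add reassembles it.

```python
# Illustrative sketch: split an unsigned weight into bit groups of the
# given widths (most-significant group first) and reassemble it.

def split_bits(w, group_widths):
    """Split unsigned w into bit groups, most-significant group first."""
    groups = []
    for width in reversed(group_widths):        # peel from the LSB end
        groups.append(w & ((1 << width) - 1))
        w >>= width
    assert w == 0, "weight wider than the combined group widths"
    return groups[::-1]                          # restore MSB-first order

def join_bits(groups, group_widths):
    """Inverse: shift each group into place and accumulate."""
    w = 0
    for g, width in zip(groups, group_widths):
        w = (w << width) | g
    return w

# 10-bit weight split as 4 + 2 + 4 (first, intermediate, second bit group).
w10 = 0b1011_01_1100
assert split_bits(w10, [4, 2, 4]) == [0b1011, 0b01, 0b1100]

# 12-bit weight split evenly into three 4-bit groups.
w12 = 0xABC
assert split_bits(w12, [4, 4, 4]) == [0xA, 0xB, 0xC]
assert join_bits(split_bits(w12, [4, 4, 4]), [4, 4, 4]) == w12
```

Each resulting group fits within the 4-bit cell bit-width, and the three groups map onto the m-th, (m+1)-th, and (m+2)-th columns as described above.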
[0025] Referring to
[0026] Similarly, the processing circuit 350 is configured to receive and perform operations on multiple parallel outputs respectively corresponding to the columns C301-C364 of the CIM cell array. The parallel outputs include multiple correspondences, where each of the correspondences includes most significant bits of an output activation from the first group of columns C301-C332 and least significant bits of the output activation from the second group of columns C333-C364. For illustrative purposes, parallel outputs O1M and O1L, respectively representing the most significant bits and the least significant bits of an output activation O1, are considered as one correspondence. For each of the correspondences, the processing circuit 350 shifts the most significant bits of the output activation and adds the shifted most significant bits of the output activation to the least significant bits of the output activation, where the output activation may be represented as Oi=OiM&lt;&lt;4+OiL, i=1, 2, . . . , 32. In a case where the weights are signed, for each of the correspondences, the processing circuit 350 shifts the most significant bits of the output activation, including a sign bit, and adds the shifted most significant bits of the output activation, including the shifted sign bit, to the least significant bits of the output activation.
[0028] Referring to
[0029] The most significant bits and the least significant bits of each of the weights are respectively loaded in the first CIM macro 410A and the second CIM macro 410B. As an example, an 8-bit unsigned weight can be split into two bit groups respectively stored in the two CIM macros 410A and 410B, where the most significant bits may be loaded in a column of the first CIM macro 410A and the least significant bits may be loaded in a corresponding column of the second CIM macro 410B. As another example, an 8-bit signed weight can be split into two bit groups respectively stored in the two CIM macros 410A and 410B, where the most significant bits may include a sign bit and may be loaded in a column of the first CIM macro 410A, and the least significant bits may be loaded in a corresponding column of the second CIM macro 410B. For illustrative purposes, the column C401A of the first CIM macro 410A and the column C401B of the second CIM macro 410B are loaded with the most significant bits and the least significant bits of a same weight. The rest can be deduced accordingly.
[0030] The processing circuit 450 is configured to receive and perform operations on multiple parallel outputs respectively corresponding to the first CIM macro 410A and the second CIM macro 410B. The parallel outputs include 64 correspondences, where each of the correspondences includes most significant bits of an output activation from the first CIM macro 410A and least significant bits of the output activation from the second CIM macro 410B. For illustrative purposes, parallel outputs O1M and O1L, respectively representing the most significant bits and the least significant bits of an output activation O1, are considered as one correspondence. For each of the correspondences, the processing circuit 450 shifts the most significant bits of the output activation and adds the shifted most significant bits of the output activation to the least significant bits of the output activation, where the output activation may be represented as Oi=OiM&lt;&lt;4+OiL, i=1, 2, . . . , 64. In a case where the weights are signed, for each of the correspondences, the processing circuit 450 shifts the most significant bits of the output activation, including a sign bit, and adds the shifted most significant bits of the output activation, including the shifted sign bit, to the least significant bits of the output activation.
[0031] In view of the aforementioned descriptions, the proposed technique allows the bit-width of the computation to be expanded without changing the physical width of the macro, thereby accommodating different bit-width requirements of computation.
[0032] No element, act, or instruction used in the detailed description of disclosed embodiments of the present application should be construed as absolutely critical or essential to the present disclosure unless explicitly described as such. Also, as used herein, each of the indefinite articles “a” and “an” could include more than one item. If only one item is intended, the terms “a single” or similar languages would be used. Furthermore, the terms “any of” followed by a listing of a plurality of items and/or a plurality of categories of items, as used herein, are intended to include “any of”, “any combination of”, “any multiple of”, and/or “any combination of multiples of” the items and/or the categories of items, individually or in conjunction with other items and/or other categories of items. Further, as used herein, the term “set” is intended to include any number of items, including zero. Further, as used herein, the term “number” is intended to include any number, including zero.
[0033] It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the disclosed embodiments without departing from the scope or spirit of the disclosure. In view of the foregoing, it is intended that the disclosure cover modifications and variations of this disclosure provided they fall within the scope of the following claims and their equivalents.