DEVICE FOR COMPUTING AN INNER PRODUCT OF VECTORS

Abstract

A device for computing an inner product of vectors includes a vector data arranger, a vector data pre-accumulator, a number converter, and a post-accumulator. The vector data arranger stores a first vector and sequentially outputs a plurality of vector data based on the first vector. The vector data pre-accumulator stores a second vector, receives each of the vector data, and pre-accumulates the second vector, so as to generate a plurality accumulation results. The number converter and the post-accumulator receive and process all the accumulation results corresponding to each of the vector data to generate an inner product value. The present invention implements a lookup table with the vector data pre-accumulator and the number converter to increase calculation speed and reduce power consumption.

Claims

1. A device for computing an inner product of vectors comprising: a vector data arranger configured to store a first vector for computing an inner product of vectors, wherein the first vector includes sub-vectors, the vector data arranger is configured to sequentially output a plurality of vector data, and each of the plurality of vector data includes at least one identical bit of each of the sub-vectors; a vector data pre-accumulator including word lines that are arranged in parallel and coupled to the vector data arranger, wherein the vector data pre-accumulator is configured to store a second vector for computing an inner product of vectors, the word lines are configured to receive each of the plurality of vector data, each of the plurality of vector data enables the word line, and an enabled the word line pre-accumulates the second vector to generate accumulation results; a number converter coupled to the vector data pre-accumulator and configured to receive, shift and add the accumulation results corresponding to each of the plurality of vector data to obtain a total data value in number format; and a post-accumulator coupled to the number converter and configured to receive, shift, and accumulate the total data values corresponding to the plurality of vector data, thereby generating an inner product value.

2. The device for computing an inner product of vectors according to claim 1, wherein the vector data pre-accumulator further comprises memory cells and bit lines arranged in parallel, the second vector includes data word vectors, each of the word lines is coupled to the bit lines through the memory cell, the memory cells respectively corresponding to the word lines are respectively configured to store the data word vectors, and the vector data pre-accumulator is configured to accumulate the data word vectors corresponding to the bit lines corresponding to an enabled the word line, thereby generating the accumulation results respectively corresponding to the bit lines.

3. The device for computing an inner product of vectors according to claim 1, wherein the number converter is a redundant to 2's complement (RTC) converter and the number format is 2's complement format.

4. The device for computing an inner product of vectors according to claim 3, wherein the post-accumulator is configured to shift and accumulate the total data values corresponding to the plurality of vector data based on an equation of P=Σ.sub.j=0.sup.N−1T.sub.j.Math.2.sup.j, thereby generating the inner product value, P represents the inner product value, N represents total number of the plurality of vector data, and T.sub.j represents the total data value corresponding to a j-th vector datum of the plurality of vector data.

5. The device for computing an inner product of vectors according to claim 1, wherein the vector data pre-accumulator is a computing-in-memory architecture.

6. The device for computing an inner product of vectors according to claim 1, wherein the data word vectors include logic “1” or logic “0”.

7. The device for computing an inner product of vectors according to claim 6, wherein each of the accumulation results generated by the vector data pre-accumulator is total number of a corresponding the logic “1”.

8. The device for computing an inner product of vectors according to claim 1, wherein the number converter and the post-accumulator are integrated into a carry-save adder.

9. A device for computing an inner product of vectors comprising: a vector data arranger configured to store a first vector for computing an inner product of vectors, wherein the first vector includes sub-vectors, the vector data arranger is configured to sequentially output a plurality of vector data, and each of the plurality of vector data includes at least one identical bit of each of the sub-vectors; a vector data pre-accumulator including word lines that are arranged in parallel and coupled to the vector data arranger, wherein the vector data pre-accumulator is configured to store a second vector for computing an inner product of vectors, the word lines are configured to receive each of the plurality of vector data, each of the plurality of vector data enables the word line, and an enabled the word line pre-accumulates the second vector to generate accumulation results; a post-accumulator coupled to the vector data pre-accumulator and configured to receive, shift, and accumulate the accumulation results corresponding to the plurality of vector data, thereby obtaining accumulation data values in redundant format; and a number converter coupled to the post-accumulator and configured to receive, shift, and add the accumulation data values, thereby obtaining an inner product value in number format.

10. The device for computing an inner product of vectors according to claim 9, wherein the vector data pre-accumulator further comprises memory cells and bit lines arranged in parallel, the second vector includes data word vectors, each of the word lines is coupled to the bit lines through the memory cell, the memory cells respectively corresponding to the word lines are respectively configured to store the data word vectors, and the vector data pre-accumulator is configured to accumulate the data word vectors corresponding to the bit lines corresponding to an enabled the word line, thereby generating the accumulation results respectively corresponding to the bit lines.

11. The device for computing an inner product of vectors according to claim 9, wherein the number converter is a redundant to 2's complement (RTC) converter and the number format is 2's complement format.

12. The device for computing an inner product of vectors according to claim 11, wherein the number converter is configured to shift and add the accumulation data values based on an equation of P=Σ.sub.j=0.sup.N+M−2 AD.sub.j.Math.2.sup.j, thereby generating the inner product value, P represents the inner product value, N represents total number of the plurality of vector data, AD.sub.j represents a j-th accumulation data value of the accumulation data values in redundant format, and M represents total number of the accumulation results corresponding to each of the plurality of vector data.

13. The device for computing an inner product of vectors according to claim 9, wherein the vector data pre-accumulator is a computing-in-memory architecture.

14. The device for computing an inner product of vectors according to claim 9, wherein the data word vectors include logic “1” or logic “0”.

15. The device for computing an inner product of vectors according to claim 14, wherein each of the accumulation results generated by the vector data pre-accumulator is total number of a corresponding the logic “1”.

16. The device for computing an inner product of vectors according to claim 9, wherein the number converter and the post-accumulator are integrated into a carry-save adder.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0025] FIG. 1 is a schematic diagram illustrating a conventional hardware architecture using distributed arithmetic;

[0026] FIG. 2 is a schematic diagram illustrating a device for computing an inner product of vectors according to a first embodiment of the present invention; and

[0027] FIG. 3 is a schematic diagram illustrating a device for computing an inner product of vectors according to a second embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0028] Reference will now be made in detail to embodiments illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts. In the drawings, the shape and thickness may be exaggerated for clarity and convenience. This description will be directed in particular to elements forming part of, or cooperating more directly with, methods and apparatus in accordance with the present disclosure. It is to be understood that elements not specifically shown or described may take various forms well known to those skilled in the art. Many alternatives and modifications will be apparent to those skilled in the art, once informed by the present disclosure.

[0029] Unless otherwise specified, some conditional sentences or words, such as “can”, “could”, “might”, or “may”, usually attempt to express that the embodiment in the present invention has, but it can also be interpreted as a feature, element, or step that may not be needed. In other embodiments, these features, elements, or steps may not be required.

[0030] Certain terms are used throughout the description and the claims to refer to particular components. One skilled in the art appreciates that a component may be referred to as different names. This disclosure does not intend to distinguish between components that differ in name but not in function. In the description and in the claims, the term “comprise” is used in an open-ended fashion, and thus should be interpreted to mean “include, but not limited to.” The phrases “be coupled to,” “couples to,” and “coupling to” are intended to compass any indirect or direct connection. Accordingly, if this disclosure mentioned that a first device is coupled with a second device, it means that the first device may be directly or indirectly connected to the second device through electrical connections, wireless communications, optical communications, or other signal connections with/without other intermediate devices or connection means.

[0031] Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment.

[0032] FIG. 2 is a schematic diagram illustrating a device for computing an inner product of vectors according to a first embodiment of the present invention. Referring to FIG. 2, the first embodiment of the device for computing an inner product of vectors of the present invention is introduced as follows. The device 20 for computing an inner product of vectors includes a vector data arranger 201, a vector data pre-accumulator 202, a number converter 203, and a post-accumulator 204. The vector data arranger 201 is configured to store a first vector for computing an inner product of vectors. The first vector includes sub-vectors. The total number of the sub-vectors is K. Each sub-vector has N bits. The vector data arranger 201 is configured to sequentially output a plurality of vector data, wherein each of the plurality of vector data includes at least one identical bit or one identical byte of each of the sub-vectors. For example, the first vector includes three sub-vectors. Each sub-vector includes three bits or three bytes. The vector data arranger 201 is configured to sequentially output three vector data. Assume that the first sub-vector, the second sub-vector, and the third sub-vector are respectively [000], [010], and [100]. The first vector datum includes the first bit of each sub-vector, namely [000]. The second vector datum includes the second bit of each sub-vector, namely [010]. The third vector datum includes the third bit of each sub-vector, namely [001]. In the first embodiment, the total number of the vector data is N/B. B is the bit-width for selecting the same bit data of each sub-vector that forms the vector datum. Each vector datum has K bits. N and K are natural numbers. The vector data pre-accumulator 202 includes word lines 2021 that are arranged in parallel. The number of the word lines 2021 is K. All the word lines 2021 are coupled to the vector data arranger 201. The vector data pre-accumulator 202 is configured to store a second vector for computing an inner product of vectors. All the word lines 2021 are configured to receive each of the plurality of vector data. Each of the plurality of vector data enables the word line 2021. The enabled word line 2021 pre-accumulates the second vector to generate accumulation results R. The number converter 203 is coupled to the vector data pre-accumulator 202 and configured to receive, shift and add the accumulation results R corresponding to each of the plurality of vector data to obtain a total data value T in number format. The post-accumulator 204 is coupled to the number converter 203 and configured to receive, shift, and accumulate the total data values T corresponding to the plurality of vector data, thereby generating an inner product value P. For example, the number converter 203 may be a redundant to 2's complement (RTC) converter and the number format may be redundant to 2's complement format. The post-accumulator 204 may be configured to shift and accumulate the total data values T corresponding to the plurality of vector data based on an equation of P=Σ.sub.j=0.sup.N−1T.sub.j.Math.2.sub.j, thereby generating the inner product value. P represents the inner product value, and T.sub.j represents the total data value T corresponding to a j-th vector datum of the plurality of vector data. Besides, the number converter 203 and the post-accumulator 204 may be integrated into a carry-save adder, thereby reducing calculation delay and implementation cost.

[0033] In some embodiments of the present invention, the vector data pre-accumulator 202 may further include bit lines 2022 arranged in parallel and a memory array 2023. The memory array 2023 includes memory cells. The second vector includes data word vectors h1, h2, . . . , and hk. For example, the vector data pre-accumulator 202 may be a computing-in-memory architecture. The number of the bit lines 2022 is M. Each of the word lines 2021 is coupled to all the bit lines 2022 through the memory cell. The memory cells respectively corresponding to the word lines 2021 are respectively configured to store the data word vectors h1, h2, . . . , and hk. For example, the word lines 2021 from top to bottom are respectively used as a first word line, a second word line, . . . , and a K-th word line. The memory cells coupled to the first word line are configured to store the data word vector h1. The memory cells coupled to the second word line are configured to store the data word vector h2. The memory cells coupled to the K-th word line are configured to store the data word vector hk. In the conventional technology, the memory array enables one word line one time. However, the vector data pre-accumulator 202 can enable the word lines 2021 one time. The vector data pre-accumulator 202 is configured to accumulate the data word vectors h1, h2, . . . , and hk corresponding to all the bit lines 2022 corresponding to the enabled word line 2021, thereby generating all the accumulation results R respectively corresponding to all the bit lines 2022. In the first embodiment, the data word vectors h1, h2, . . . , and hk include logic “0” or logic “1”. The total number of the data word vectors h1, h2, . . . , and hk is K. Each of the data word vectors h1, h2, . . . , and hk has M bits. The total number of all the accumulation results R corresponding to each of the vector data is M. M is a natural number. Each accumulation result R has a length of log.sub.2(K+1) bits. In an embodiment of the present invention, each of the accumulation results R generated by the vector data pre-accumulator 202 is the total number of a corresponding the logic “1”, but the present invention is not limited thereto. As a result, the device for computing an inner product of vectors sense word lines 2021 and bit lines 2022 and implement a look-up table memory with the vector data pre-accumulator 202 and the number converter 203. The memory size of the vector data pre-accumulator 202 linearly increase with the length of the vector. Thus, the device for computing an inner product of vectors applies to computing an inner product of long vectors, greatly reduces computation amount, increases computation speed, and decreases power consumption.

[0034] Assume that N is equal to 3 and K is equal to 4. The vector data arranger 201 sequentially outputs the first vector datum, the second vector datum, and the third vector datum. When the vector data arranger 201 outputs the first vector datum, j is equal to 0. When the vector data arranger 201 outputs the second vector datum, j is equal to 1. When the vector data arranger 201 outputs the third vector datum, j is equal to 2. The accumulation results R may be the first accumulation results, the second accumulation results, or the third accumulation results.

[0035] When the first vector datum is [0001], the vector data pre-accumulator 202 receives the first vector datum and pre-accumulates the data word vectors h1, h2, h3, and h4 based on the first vector datum, thereby generating the first accumulation results. The first accumulation results are equivalent to h1. The number converter 203 receives, shifts, and adds the first accumulation results to obtain T.sub.0. When the second vector datum is [0011], the vector data pre-accumulator 202 receives the second vector datum and pre-accumulates the data word vectors h1, h2, h3, and h4 based on the second vector datum, thereby generating the second accumulation results. The second accumulation results are equivalent to h1+h2. The number converter 203 receives, shifts, and adds the second accumulation results to obtain T.sub.1. When the third vector datum is [1111], the vector data pre-accumulator 202 receives the third vector datum and pre-accumulates the data word vectors h1, h2, h3, and h4 based on the third vector datum, thereby generating the third accumulation results. The third accumulation results are equivalent to h1+h2+h3+h4. The number converter 203 receives, shifts, and adds the third accumulation results to obtain T.sub.2. Finally, the post-accumulator 204 receives, shifts, and accumulates T.sub.0, T.sub.1, and T.sub.2 to generate the inner product value P based on an equation of P=Σ.sub.j=0.sup.N−1T.sub.j.Math.2.sup.j.

[0036] FIG. 3 is a schematic diagram illustrating a device for computing an inner product of vectors according to a second embodiment of the present invention. Referring to FIG. 3, the second embodiment of the device for computing an inner product of vectors of the present invention is introduced as follows. The device 30 for computing an inner product of vectors includes a vector data arranger 301, a vector data pre-accumulator 302, a post-accumulator 303, and a number converter 304. The vector data arranger 301 is configured to store a first vector for computing an inner product of vectors. The first vector includes sub-vectors. The total number of the sub-vectors is K. Each sub-vector has N bits. The vector data arranger 301 is configured to sequentially output a plurality of vector data, wherein each of the plurality of vector data includes at least one identical bit or one identical byte of each of the sub-vectors. For example, the first vector includes three sub-vectors. Each sub-vector includes three bits or three bytes. The vector data arranger 201 is configured to sequentially output three vector data. Assume that the first sub-vector, the second sub-vector, and the third sub-vector are respectively [000], [010], and [100]. The first vector datum includes the first bit of each sub-vector, namely [000]. The second vector datum includes the second bit of each sub-vector, namely [010]. The third vector datum includes the third bit of each sub-vector, namely [001]. In the second embodiment, the total number of the vector data is N/B. B is the bit-width for selecting the same bit data of each sub-vector that forms the vector datum. Each vector datum has K bits. N and K are natural numbers. The vector data pre-accumulator 302 includes word lines 3021 that are arranged in parallel. The number of the word lines 3021 is K. All the word lines 3021 are coupled to the vector data arranger 301. The vector data pre-accumulator 302 is configured to store a second vector for computing an inner product of vectors. All the word lines 3021 are configured to receive each of the plurality of vector data. Each of the plurality of vector data enables the word line 3021. The enabled word line 3021 pre-accumulates the second vector to generate accumulation results R. The post-accumulator 303 is coupled to the vector data pre-accumulator 302 and configured to receive, shift, and accumulate the accumulation results R corresponding to the plurality of vector data, thereby obtaining accumulation data values AD in redundant format. The number converter 304 is coupled to the post-accumulator 303 and configured to receive, shift, and add the accumulation data values AD, thereby obtaining an inner product value P in number format. For example, the number converter 304 is a redundant to 2's complement (RTC) converter and the number format is 2's complement format. The number converter 304 is configured to shift and add the accumulation data values AD based on an equation of P=Σ.sub.j=0.sup.N+M−2AD.sub.j.Math.2.sup.j, thereby generating the inner product value P. AD.sub.j represents a j-th accumulation data value of the accumulation data values AD in redundant format, and M represents the total number of the accumulation results R corresponding to each of the plurality of vector data. Since the accumulation results R are accumulated in redundant format and then converted into a value in 2's complement format, the calculation speed and relative power consumption of the number converter 304 can be reduced, thereby increasing the operation speed of hardware. In addition, the number converter 304 and the post-accumulator 303 are integrated into a carry-save adder, thereby reducing calculation delay and implementation cost.

[0037] In some embodiments of the present invention, the vector data pre-accumulator 302 may further include bit lines 3022 arranged in parallel and a memory array 3023. The memory array 3023 includes memory cells. The second vector includes data word vectors h1, h2, . . . , and hk. For example, the vector data pre-accumulator 302 may be a computing-in-memory architecture. The number of the bit lines 3022 is M. Each of the word lines 3021 is coupled to all the bit lines 3022 through the memory cell. The memory cells respectively corresponding to the word lines 3021 are respectively configured to store the data word vectors h1, h2, . . . , and hk. For example, the word lines 3021 from top to bottom are respectively used as a first word line, a second word line, . . . , and a K-th word line. The memory cells coupled to the first word line are configured to store the data word vector h1. The memory cells coupled to the second word line are configured to store the data word vector h2. The memory cells coupled to the K-th word line are configured to store the data word vector hk. In the conventional technology, the memory array enables one word line one time. Like the first embodiment, the vector data pre-accumulator 302 can enable the word lines 3021 one time. The vector data pre-accumulator 302 is configured to accumulate the data word vectors h1, h2, . . . , and hk corresponding to all the bit lines 3022 corresponding to the enabled word line 3021, thereby generating all the accumulation results R respectively corresponding to all the bit lines 3022. In the second embodiment, the data word vectors h1, h2, . . . , and hk include logic “0” or logic “1”. The total number of the data word vectors h1, h2, . . . , and hk is K. Each of the data word vectors h1, h2, . . . , and hk has M bits. The total number of all the accumulation results R corresponding to each of the vector data is M. M is a natural number. Each accumulation result R has a length of log.sub.2(K+1) bits. In an embodiment of the present invention, each of the accumulation results R generated by the vector data pre-accumulator 302 is the total number of a corresponding the logic “1”, but the present invention is not limited thereto. As a result, the device for computing an inner product of vectors sense word lines 3021 and bit lines 3022 and implement a look-up table memory with the vector data pre-accumulator 302 and the number converter 304. The memory size of the vector data pre-accumulator 302 linearly increase with the length of the vector. Thus, the device for computing an inner product of vectors applies to computing an inner product of long vectors, greatly reduces computation amount, increases computation speed, and decreases power consumption.

[0038] Assume that N is equal to 3, K is equal to 4, and M is equal to 3. The vector data arranger 301 sequentially outputs the first vector datum, the second vector datum, and the third vector datum. Assume that h1 is [001], h2 is [010], h3 is [011], and h4 is [100].

[0039] When the first vector datum is [0001], the vector data pre-accumulator 302 receives the first vector datum and pre-accumulates the data word vectors h1, h2, h3, and h4 based on the first vector datum, thereby generating the first accumulation results R. The first accumulation results R are equivalent to h1. When the second vector datum is [0011], the vector data pre-accumulator 302 receives the second vector datum and pre-accumulates the data word vectors h1, h2, h3, and h4 based on the second vector datum, thereby generating the second accumulation results R. The second accumulation results R are equivalent to h1+h2, namely [011]. When the third vector datum is [1111], the vector data pre-accumulator 302 receives the third vector datum and pre-accumulates the data word vectors h1, h2, h3, and h4 based on the third vector datum, thereby generating the third accumulation results R. The third accumulation results R are equivalent to h1+h2+h3+h4, namely [022]. The post-accumulator 303 receives, shifts, and accumulates the first accumulation results R, the second accumulation results R, and the third accumulation results R to obtain total data values AD.sub.0, AD.sub.1, AD.sub.2, AD.sub.3, and AD.sub.4 in number format. As shown in formula (2), AD.sub.0 is 1, AD.sub.1 is 1, AD.sub.2 is 3, AD.sub.3 is 2, and AD.sub.4 is 0. Finally, the number converter 304 shifts and adds the total data values AD.sub.0, AD.sub.1, AD.sub.2, AD.sub.3, and AD.sub.4 based on an equation of P=Σ.sub.j=0.sup.N+M−2 AD.sub.j.Math.2.sup.j, thereby generating the inner product value P.

[−001]+[−0110]+[02200]=[02311] (2)

[0040] According to the embodiments provided above, the device for computing an inner product of vectors sense word lines and bit lines and implement a look-up table memory with the vector data pre-accumulator 202 and the number converter. The memory size of the vector data pre-accumulator linearly increase with the length of the vector. Thus, the device for computing an inner product of vectors applies to computing an inner product of long vectors, greatly reduces computation amount, increases computation speed, and decreases power consumption.

[0041] The embodiments described above are only to exemplify the present invention but not to limit the scope of the present invention. Therefore, any equivalent modification or variation according to the shapes, structures, features, or spirit disclosed by the present invention is to be also included within the scope of the present invention.

DEVICE FOR COMPUTING AN INNER PRODUCT OF VECTORS

Inventors

Cpc classification

Classification Explorer

G06F2207/4814

PHYSICS

Classification Explorer

G06F5/01

PHYSICS

Classification Explorer

G06F7/5443

PHYSICS

Classification Explorer

G06F17/16

PHYSICS

Classification Explorer

Y02D10/00

GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS

Classification Explorer

G06F7/5275

PHYSICS

Classification Explorer

G06F7/50

PHYSICS

International classification

Classification Explorer

G06F17/16

PHYSICS

Classification Explorer

G06F5/01

PHYSICS

Classification Explorer

G06F7/50

PHYSICS

Classification Explorer

G06F7/527

PHYSICS

Classification Explorer

G06F7/544

PHYSICS

Abstract

Claims

Description