ARITHMETIC CIRCUITRY, MEMORY SYSTEM, AND METHOD OF CONTROLLING NON-VOLATILE MEMORY

20260029990 ยท 2026-01-29

Assignee

Inventors

Cpc classification

International classification

Abstract

Arithmetic circuitry according one embodiment performs a first arithmetic operation by AND operations and XOR operations. The first arithmetic operation corresponds to p multiplications (p is an integer of 2 or more) to be performed in series. The p multiplications are respectively represented by p order-3 tensors each receiving two elements of a Galois field as inputs and outputting one element as a result of multiplication of the two elements. The AND operations calculate AND values of a plurality of elements used in the p multiplications. The XOR operations are based on a contracted tensor obtained by contraction of an order-3p tensor obtained by a direct product of the p order-3 tensors and the AND values.

Claims

1. Arithmetic circuitry comprising: circuitry configured to perform a first arithmetic operation by AND operations and XOR operations, the first arithmetic operation corresponding to p multiplications to be performed in series, p being an integer of 2 or more, the p multiplications being respectively represented by p order-3 tensors each receiving two elements of a Galois field as inputs and outputting one element as a result of multiplication of the two elements, the AND operations calculating AND values of a plurality of elements used in the p multiplications, the XOR operations being based on a contracted tensor obtained by contraction of an order-3p tensor obtained by a direct product of the p order-3 tensors and the AND values.

2. The arithmetic circuitry according to claim 1, wherein the plurality of elements includes one or more first elements corresponding to a 2.sup.u-th power of an element of the Galois field, u being an integer of 1 or more, and the contracted tensor is obtained by contraction of a direct product of the order-3p tensor and one or more order-2 tensors representing a 2.sup.u-th power corresponding to the one or more first elements.

3. The arithmetic circuitry according to claim 2, wherein the p order-3 tensors are each obtained by using a companion matrix determined in accordance with the Galois field, and the one or more order-2 tensors are obtained by using the p order-3 tensors.

4. The arithmetic circuitry according to claim 1, wherein the contracted tensor is obtained by contraction of the order-3p tensor to one or more sets, in which an output from an order-3 tensor serves as an input to another order-3 tensor, from among multiple sets each including two order-3 tensors selected from the p order-3 tensors, the contraction being performed with respect to an index corresponding to the output and an index regarding the input.

5. The arithmetic circuitry according to claim 1, wherein the contracted tensor is represented by a matrix obtained by arranging one-dimensional vectors each being a vector in which a plurality of indices corresponding to a plurality of inputs is combined, the one-dimensional vectors being arranged such that a number of the one-dimensional vectors are identical to a number of elements of an index corresponding to one output.

6. A memory system comprising: a non-volatile memory in which data having been encoded in an error correction code is stored; and a memory controller including the arithmetic circuitry according to claim 1, the memory controller being configured to calculate syndromes being elements of a Galois field by using a received word read from the non-volatile memory, perform the first arithmetic operation by using the arithmetic circuitry with some of the syndromes as the plurality of elements, calculate an error position by using an error locator polynomial having a coefficient including a result of the first arithmetic operation, and correct an error at the error position calculated.

7. A method of controlling a non-volatile memory, the method comprising: storing, in the non-volatile memory, data having been encoded in an error correction code; reading, as a received word, the data from the non-volatile memory; calculating syndromes being elements of a Galois field by using a received word read from the non-volatile memory; performing the first arithmetic operation by using the arithmetic circuitry according to claim 1 with some of the syndromes as the plurality of elements; calculating an error position by using an error locator polynomial having a coefficient including a result of the first arithmetic operation; and correcting an error at the error position calculated.

8. The method according to claim 7, wherein the plurality of elements includes one or more first elements corresponding to a 2.sup.u-th power of an element of the Galois field, u being an integer of 1 or more, and the contracted tensor is obtained by contraction of a direct product of the order-3p tensor and one or more order-2 tensors representing a 2.sup.u-th power corresponding to the one or more first elements.

9. The method according to claim 8, wherein the p order-3 tensors are each obtained by using a companion matrix determined in accordance with the Galois field, and the one or more order-2 tensors are obtained by using the p order-3 tensors.

10. The method according to claim 7, wherein the contracted tensor is obtained by contraction of the order-3p tensor to one or more sets, in which an output from an order-3 tensor serves as an input to another order-3 tensor, from among multiple sets each including two order-3 tensors selected from the p order-3 tensors, the contraction being performed with respect to an index corresponding to the output and an index regarding the input.

11. The method according to claim 7, wherein the contracted tensor is represented by a matrix obtained by arranging one-dimensional vectors each being a vector in which a plurality of indices corresponding to a plurality of inputs is combined, the one-dimensional vectors being arranged such that a number of the one-dimensional vectors are identical to a number of elements of an index corresponding to one output.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] FIG. 1 is a block diagram of a memory system according to an embodiment;

[0006] FIG. 2 is a block diagram of a decoder according to the embodiment;

[0007] FIG. 3 is a diagram illustrating an outline of a calculation procedure of a syndrome and an error locator polynomial;

[0008] FIG. 4 is a diagram illustrating an example of the relationship between the syndrome and the error locator polynomial;

[0009] FIG. 5 is a diagram illustrating examples of vector representation of elements of a Galois field GF(2.sup.4);

[0010] FIG. 6 is a diagram for describing an example of a companion matrix;

[0011] FIG. 7 is a diagram illustrating an example of a method of obtaining a tensor T.sup.i;

[0012] FIG. 8 is a diagram illustrating a specific example of a method of obtaining a tensor T.sup.0;

[0013] FIG. 9 is a diagram illustrating an example of a method of obtaining a tensor S(1);

[0014] FIG. 10 is a diagram illustrating a specific example of a method of obtaining the tensor S(1);

[0015] FIG. 11 is a diagram illustrating a configuration example of an arithmetic unit;

[0016] FIG. 12 is a diagram illustrating a circuit configuration example in which the arithmetic unit of FIG. 11 is specified;

[0017] FIG. 13 is a diagram of a matrix in which six tensors each represented by a one-dimensional vector are arranged;

[0018] FIG. 14 is a diagram illustrating examples of diagram representation of tensors;

[0019] FIG. 15 is a diagram illustrating an example of diagram representation corresponding to direct product;

[0020] FIG. 16 is a diagram illustrating an example of diagram representation corresponding to contraction;

[0021] FIG. 17 is a diagram illustrating an example of diagram representation corresponding to contraction;

[0022] FIG. 18 is a diagram illustrating the relationship between an arithmetic unit in a comparative example and an arithmetic unit in the embodiment;

[0023] FIG. 19 is a diagram illustrating a circuit configuration example according to the embodiment, in which an arithmetic unit is specified;

[0024] FIG. 20 is a diagram illustrating an example of a flattened matrix;

[0025] FIG. 21 is a table for describing examples of reductions in the number of steps;

[0026] FIG. 22 is a diagram illustrating configuration examples of arithmetic units;

[0027] FIG. 23 is a diagram illustrating configuration examples of arithmetic units;

[0028] FIG. 24 is a diagram illustrating an example of an arithmetic unit;

[0029] FIG. 25 is a diagram illustrating a configuration example of an arithmetic unit that calculates an inverse element;

[0030] FIG. 26 is a diagram illustrating a configuration example of an arithmetic unit;

[0031] FIG. 27 is a diagram illustrating a configuration example of an arithmetic unit;

[0032] FIG. 28 is a diagram illustrating a configuration example of an arithmetic unit; and

[0033] FIG. 29 is a flowchart illustrating an example of decoding processing.

DETAILED DESCRIPTION

[0034] According to an embodiment, arithmetic circuitry includes circuitry that is configured to perform a first arithmetic operation by AND operations and XOR operations. The first arithmetic operation corresponds to p multiplications (p is an integer of 2 or more) to be performed in series. The p multiplications are respectively represented by p order-3 tensors each receiving two elements of a Galois field as inputs and outputting one element as a result of multiplication of the two elements. The AND operation calculates an AND value of a plurality of elements used in the p multiplications. The XOR operation is based on a contracted tensor obtained by contraction of an order-3p tensor obtained by a direct product of the p order-3 tensors and the AND value.

[0035] Preferred embodiments of arithmetic circuitry according to the invention will be described in detail below with reference to the accompanying drawings.

[0036] Hereinafter, a memory system including arithmetic circuitry that performs the multiplication of a Galois field at the time of decoding of an error correction code will be described as an example. A configuration using the arithmetic circuitry is not limited to this example, and any system (apparatus or device) may be used. For example, the arithmetic circuitry described below can be also applied to a memory system that performs the multiplication of a Galois field at the time of calculation of an error position, a system that performs the multiplication of a Galois field at the time of cipher processing, and the like.

[0037] A memory system according to the present embodiment will be described in detail with reference to the drawings. FIG. 1 is a block diagram illustrating a schematic configuration example of the memory system according to the present embodiment. As illustrated in FIG. 1, a memory system 1 includes a memory controller 10 and a non-volatile memory 20. The memory system 1 can be connected to a host 30, and FIG. 1 illustrates the memory system 1 in connection with the host 30. The host 30 may be an electronic device, such as a personal computer or a mobile terminal.

[0038] The non-volatile memory 20 is a non-volatile memory that stores data in a non-volatile manner and is, for example, a NAND flash memory (hereinafter, simply referred to as a NAND memory). Although a case where a NAND memory is used as the non-volatile memory 20 will be exemplified in the following description, a storage device other than the NAND memory, such as a three-dimensional structure flash memory, a resistive random access memory (ReRAM), or a ferroelectric random access memory (FeRAM), can be used as the non-volatile memory 20. In addition, the non-volatile memory 20 is not necessarily a semiconductor memory, and thus the present embodiment can be also applied to various types of storage media other than the semiconductor memory.

[0039] The memory system 1 may be any type of memory system including the non-volatile memory 20, such as a so-called solid state drive (SSD) or a memory card in which the memory controller 10 and the non-volatile memory 20 are configured as a single package.

[0040] The memory controller 10 controls writing to the non-volatile memory 20 in accordance with a write request from the host 30. In addition, the memory controller 10 controls reading from the non-volatile memory 20 in accordance with a read request from the host 30. The memory controller 10 is, for example, a semiconductor integrated circuit configured as a system on a chip (SoC). The memory controller 10 includes a host interface (host I/F) 15, a memory interface (memory I/F) 13, a control unit 11, an encoding/decoding unit (CODEC) 14, and a data buffer 12. The host I/F 15, the memory I/F 13, the control unit 11, the encoding/decoding unit 14, and the data buffer 12 are mutually connected through an internal bus 16. Part or the entirety of the operation of each constituent element of the memory controller 10 described below may be implemented by execution of firmware by a central processing unit (CPU) or may be implemented by hardware.

[0041] The host I/F 15 serves as a circuit that performs processing pursuant to an interface standard with the host 30 and outputs a command, user data to be written, or the like received from the host 30 to the internal bus 16. In addition, the host I/F 15 transmits user data restored after reading from the non-volatile memory 20, a response from the control unit 11, or the like to the host 30.

[0042] The memory I/F 13 serves as a circuit that performs write processing to the non-volatile memory 20 based on an instruction from the control unit 11. In addition, the memory I/F 13 performs read processing from the non-volatile memory 20 in accordance with an instruction from the control unit 11.

[0043] The control unit 11 controls the constituent components of the memory system 1, integrally. In a case where a command is received from the host 30 through the host I/F 15, the control unit 11 performs control pursuant to the command. For example, in accordance with a command from the host 30, the control unit 11 instructs the memory I/F 13 to write user data and parity to the non-volatile memory 20. In addition, in accordance with a command from the host 30, the control unit 11 instructs the memory I/F 13 to read user data and parity from the non-volatile memory 20.

[0044] In addition, in a case where a write request is received from the host 30, the control unit 11 determines a storage area (memory area) on the non-volatile memory 20 for the user data accumulated in the data buffer 12. That is, the control unit 11 manages a write destination for the user data. The correspondence between the logical address of user data received from the host 30 and the physical address indicating the storage area in which the user data is stored on the non-volatile memory 20 is stored as an address conversion table.

[0045] In addition, in a case where a read request is received from the host 30, the control unit 11 converts the logical address designated by the read request into a physical address by using the above-described address conversion table and instructs the memory I/F 13 to perform reading from the physical address.

[0046] In general, in a NAND memory, writing and reading are performed in data units called pages and erasing is performed in data units called blocks. In the present embodiment, a plurality of memory cells connected to the same word line is referred to as a memory cell group. In a case where each memory cell is a single level cell (SLC), one memory cell group corresponds to one page. In a case where each memory cell is multiple level cell (MLC), one memory cell group corresponds to multiple pages. Note that, in the present description, examples of the MLC include a triple level cell (TLC) and a quad level cell (QLC). Each memory cell is connected to a word line and is also connected to a bit line. Therefore, each memory cell can be identified by an address for identifying the word line and an address for identifying the bit line.

[0047] The data buffer 12 temporarily stores the user data received from the host 30 by the memory controller 10 until the user data is stored into the non-volatile memory 20. In addition, the data buffer 12 temporarily stores the user data read from the non-volatile memory 20 until the user data is transmitted to the host 30. For the data buffer 12, a general-purpose memory, such as a static random access memory (SRAM) or a dynamic random access memory (DRAM), can be used. Note that the data buffer 12 may be mounted outside the memory controller 10 instead of being built in the memory controller 10.

[0048] The user data transmitted from the host 30 is transferred to the internal bus 16 and then is temporarily stored in the data buffer 12. The encoding/decoding unit 14 encodes the user data to be stored in the non-volatile memory 20 to generate a code word. In addition, the encoding/decoding unit 14 decodes the received word read from the non-volatile memory 20 to restore the user data. Therefore, the encoding/decoding unit 14 includes an encoder 17 and a decoder 18. Note that data to be encoded by the encoding/decoding unit 14 may include, for example, control data for use inside the memory controller 10 in addition to the user data.

[0049] Next, write processing in the present embodiment will be described. At the time of writing to the non-volatile memory 20, the control unit 11 instructs the encoder 17 to encode user data. At that time, the control unit 11 determines a storage location (storage address) for a code word in the non-volatile memory 20 and also instructs the memory I/F 13 on the determined storage location.

[0050] The encoder 17 encodes the user data on the data buffer 12 to generate a code word in accordance with on the instruction from the control unit 11. Examples of an encoding method include: an encoding method using an algebraic code, such as a Bose-Chaudhuri-Hocquenghem (BCH) code or a Reed-Solomon (RS) code, and an encoding method (e.g., a product code) using the BCH code and the RS code as component codes in a row direction and a column direction. The memory I/F 13 performs control to store the code word in the storage location on the non-volatile memory 20 instructed from the control unit 11. Hereinafter, a case where a BCH code is used for correcting an error of t bits or less (t is an integer of 2 or more) will be described as an example.

[0051] Next, processing at the time of reading from the non-volatile memory 20 in the present embodiment will be described. At the time of reading from the non-volatile memory 20, the control unit 11 designates an address on the non-volatile memory 20 and instructs the memory I/F 13 to perform reading. In addition, the control unit 11 instructs the decoder 18 to start decoding. In accordance with the instruction from the control unit 11 the memory I/F 13 reads a received word from the designated address of the non-volatile memory 20 and inputs the read received word to the decoder 18. The decoder 18 decodes the received word read from the non-volatile memory 20.

[0052] The decoder 18 decodes the received word read from the non-volatile memory 20. The decoder 18 performs calculation of an error locator polynomial by using, for example, the Peterson Gorenstein Zierler (PGZ) algorithm. The PGZ algorithm is a method of solving, by matrix calculation, a system of equations between coefficients of an error locator polynomial and the syndrome of a received word.

[0053] FIG. 2 is a block diagram illustrating a configuration example of the decoder 18 according to the present embodiment. As illustrated in FIG. 2, the decoder 18 includes a syndrome calculation unit 101, an error locator polynomial calculation unit 102, an error position calculation unit 103, and a bit flipping unit 104.

[0054] The syndrome calculation unit 101 calculates a syndrome by using a received word (read sequence) read from the non-volatile memory 20. The syndrome calculation unit 101 may calculate a syndrome based on any conventionally available method. In a case where the values of all syndromes are zero, it can be determined that the received word has no error and thus the decoder 18 can end the decoding processing without performing the subsequent processing.

[0055] The error locator polynomial calculation unit 102 calculates an error locator polynomial based on the PGZ algorithm by using the syndrome. Some of the coefficients of such an error locator polynomial are calculated by performing the addition of syndromes and the multiplication of syndromes.

[0056] FIG. 3 is a diagram illustrating an outline of a calculation procedure of a syndrome and an error locator polynomial with a BCH code. A read sequence r.sub.0, r.sub.1, r.sub.2, . . . , r.sub.n1 having a code length of n bits is read as received words from the non-volatile memory 20. The syndrome calculation unit 101 receives the received words as inputs to calculate syndromes S.sub.1, S.sub.3, . . . , and S.sub.2t1. The error locator polynomial calculation unit 102 calculates coefficients .sub.0, .sub.1, . . . , .sub.t1, and .sub.t of a t-th-degree error locator polynomial from the syndromes. As illustrated in FIG. 3, the syndromes and the coefficients calculated by using the syndromes are elements of a Galois field.

[0057] FIG. 4 is a diagram illustrating an example of the relationship between a syndrome and an error locator polynomial. Note that FIG. 4 illustrates an example of first-degree to fourth-degree error locator polynomials (first-degree polynomial to fourth-degree polynomial) in a case where a BCH code is used for correcting an error of 4 bits or less (t=4). |M.sub.2|, |M.sub.3|, |M.sub.4|, and |M.sub.5| included in any of the second-degree polynomial to the fourth-degree polynomial are each calculated by the corresponding formula illustrated in the lower part of FIG. 4.

[0058] The first-degree to fourth-degree polynomials are formulas for error position calculation in a case where the numbers of errors are one to four, respectively. In the example of FIG. 4, the numbers of multipliers required for calculating coefficients of the first-degree polynomial to the fourth-degree polynomial are, for example, 0, 1, 5, and 21, respectively, except for the second power that can be implemented by simple calculation described with reference to FIGS. 9 and 10. Thus, as the number of error-correctable bits t increases, the number of multipliers for coefficient calculation of an error locator polynomial increases.

[0059] Therefore, a technique using an optimized arithmetic circuitry has been proposed. Such an optimized arithmetic circuitry is capable of performing shared computation of at least some of multiplication of syndromes used in coefficient calculation of an error locator polynomial. This enables suppression of an increase in the circuit size for performing the multiplication of a Galois field. This technique corresponds to a technique for commonly calculating a single multiplication that is included in common in plural types of multiplication. In one example, arithmetic circuitry is configured to calculate the multiplication S.sub.1S.sub.3 that is included in common in each of S.sub.1.sup.2S.sub.3, S.sub.1S.sub.3, S.sub.1.sup.4S.sub.3, and S.sub.1S.sub.3.sup.2, which are four types of multiplication.

[0060] Meanwhile, the coefficient calculation of an error locator polynomial may include multiple times of multiplication to be performed in series (hereinafter, referred to as multiple steps of multiplication). Multiple times of multiplication to be performed in series refer to multiplication in which an output from a certain multiplication included in the multiple times of multiplication serves as an input to another multiplication. Referring to FIG. 4, multiplications 301 to 308 each correspond to an arithmetic operation that can be interpreted as including multiple steps of multiplication. The multiplication 301 can be interpreted as a calculation including two steps of multiplication as in S.sub.1.sup.5S.sub.3=S.sub.1S.sub.1.sup.4S.sub.3. An increase in the number of steps of multiplication may cause an increase in calculation time. For this reason, desirably, multiple steps of multiplication are calculated more efficiently.

[0061] Therefore, an arithmetic unit 110 in the present embodiment is configured to efficiently perform an arithmetic operation including multiple steps of multiplication. Hereinafter, an example in which the arithmetic unit 110 is configured to calculate the multiplication 304 that is the seventh power of the syndrome S.sub.1 will be mainly described. The arithmetic unit 110 may be configured to calculate another multiplication (for example, any of the multiplications 301 to 303 and the multiplications 305 to 308) or may be configured to calculate two or more multiplications (for example, any two or more of the multiplications 301 to 308).

[0062] Referring back to FIG. 2, the arithmetic unit 110 (an example of arithmetic circuitry) will be further described. As illustrated in FIG. 2, the error locator polynomial calculation unit 102 includes the arithmetic unit 110 that performs an arithmetic operation of elements of a Galois field including the multiplication of elements of the Galois field. The arithmetic unit 110 includes an AND calculation unit 111 and an XOR calculation unit 112.

[0063] The AND calculation unit 111 performs AND operations to perform the multiplication of elements of the Galois field. The XOR calculation unit 112 performs XOR operations to perform the multiplication of elements of the Galois field. Details of the AND calculation unit 111 and the XOR calculation unit 112 will be described later.

[0064] Note that the arithmetic unit 110 serves as a constituent unit that performs at least part of the multiplication of syndromes required for coefficient calculation of an error locator polynomial. Any multiplication of syndromes that the arithmetic unit 110 does not perform is calculated by, for example, the error locator polynomial calculation unit 102. In this case, the error locator polynomial calculation unit 102 may perform coefficient calculation (including the multiplication of syndromes) based on any conventionally available method.

[0065] The error position calculation unit 103 calculates an error position by using the error locator polynomial calculated by the error locator polynomial calculation unit 102. Although processing for calculating an error position (search processing) may be implemented by any method, for example, Chien search can be used. The Chien search is a method of sequentially substituting a value into an error locator polynomial and searching for an error position based on the value at which the error locator polynomial has an output value of 0.

[0066] The bit flipping unit 104 inverts the bit at the error position calculated by the search processing (bit flipping) to perform error correction.

[0067] Next, details of the arithmetic unit 110 (AND calculation unit 111 and XOR calculation unit 112) will be described. The arithmetic unit 110 is configured based on the following procedure, for example. [0068] (P1) Determine m defining the number of elements 2.sup.m of a Galois field and a primitive polynomial p(x) with a degree of m. Note that, instead of such a primitive polynomial, an irreducible polynomial may be used. [0069] (P2) Obtain a companion matrix corresponding to the primitive polynomial p(x). [0070] (P3) Obtain a plurality of tensors to be used in the multiplication of elements of the Galois field, by using the companion matrix. When an element of the Galois field is represented by an m-dimensional vector, an element that is an output obtained by multiplying two elements is also represented by an m-dimensional vector. A tensor is obtained for each component of an element as an output represented by an m-dimensional vector. Thus, m tensors, each of which is a function that receives two vectors as inputs and outputs one value (one component of a vector) as an output, are obtained in total. Hereinafter, a tensor defined for the i-th component of an m-dimensional vector is represented as T.sup.i (i is an integer satisfying 0im1). The tensor T.sup.i may be referred to as an order-2 tensor because the number of input vectors is two. In addition, the entirety of m order-2 tensors is characterized by three indices and thus may be referred to as an order-3 tensor. [0071] (P4) Obtain an order-2 tensor S(u) representing the 2.sup.u-th power of an element of the Galois field (u is an integer of one or more). [0072] (P5) For an arithmetic operation including multiple steps of multiplication to be performed by the arithmetic unit 110, rewrite a plurality of sets of XOR operations segmented by AND operations to a single set of XOR operations. Specifically, an arithmetic operation by the arithmetic unit 110 is represented by AND operations of m-dimensional vectors and XOR operations represented by a tensor obtained by direct product and contraction. [0073] (P6) Configure the arithmetic unit 110 to perform XOR operations pursuant to the tensor.

[0074] Each of the above procedures will be further described below.

[0075] Regarding (P1), the definition of a Galois field will be described. The Galois field is determined by m{1, 2, 3 . . . } defining the number of elements and a primitive polynomial p(x) with a degree of m. The Galois field has the following characteristics. [0076] The Galois field has 2.sup.m elements consisting of one zero element 0 and (2.sup.m1) non-zero elements. [0077] A non-zero element can be represented by the power of a primitive element a that is the root of the primitive polynomial p(x) as in the following Formula (1).

[00001] GF ( 2 m ) := { 0 , 0 ( = 1 ) , 1 , 2 , ... , 2 m - 2 } , p ( x = ) = 0 ( 1 )

[0078] In a BCH code having a code length n=2.sup.m1 (m is an integer of 2 or more), a Galois field GF(2.sup.m) having 2.sup.m elements may be used. The Galois field GF(2.sup.m) has an element that can be represented by an m-bit vector (m-dimensional vector).

[0079] An element a GF(2.sup.m) of the Galois field GF(2.sup.m) can be represented by a polynomial on GF(2) having a degree of (m1) with respect to the primitive element a as in the following Formula (2). Note that i is an integer satisfying 0im1.

[00002] a = a 0 0 + a 1 1 + a 2 2 + .Math. + a m - 1 m - 1 , a i { 0 , 1 } = GF ( 2 ) ( 2 )

[0080] Therefore, the element a can be represented by an m-dimensional vector on GF(2) having the coefficients of a polynomial on GF(2) as components, as shown in the following Formula (3).

[00003] a .fwdarw. = ( a 0 , a 1 , a 2 , ... , a m - 1 ) ( 3 )

[0081] For example, a Galois field GF(2.sup.10) used in a BCH code having a code length of n=210-1 has an element that can be represented by a 10-bit vector. In addition, a Galois field GF(2.sup.4) used in a BCH code having a code length of n=24-1 has an element that can be represented by a 4-bit vector. FIG. 5 is a diagram illustrating examples of vector representation of elements of the Galois field GF(2.sup.4) in which the primitive polynomial is represented by p(x)=x.sup.4+x+1 with m=4.

[0082] An element 0 is represented by a polynomial 0.sup.0+0.sup.1+0.sup.2+0.sup.3 and is represented by a four-dimensional vector (0, 0, 0, 0) having the coefficients of the polynomial as components. Since a primitive element a satisfies a primitive polynomial p()=.sup.4++1=0, in other words, satisfies .sup.4=+1, an element .sup.4 can be transformed into 1+. Therefore, the element .sup.4 is represented by a polynomial 1.sup.0+1.sup.1+0.sup.2+0.sup.3 and is represented by a four-dimensional vector (1, 1, 0, 0) having the coefficients of the polynomial as components.

[0083] In addition, the addition of elements of the Galois field is represented by the XOR of two vectors for each bit. In the present embodiment, the multiplication of elements of the Galois field is performed by circuitry as an arithmetic circuitry in which AND operations and XOR operations are in combination.

[0084] In (P1), m defining the number of elements of a Galois field to be subjected to an arithmetic operation is determined. The value of m may be determined in any manner and thus may be determined in accordance with, for example, an encoding method to be applied or a type of memory to be applied. In a case where the memory system 1 uses a BCH code having a code length of 1000 bits, m is determined to be 10 such that the number of elements (2.sup.10=1024>1000) larger than the code length is included. In the Galois field, one or more primitive polynomials are defined for each value of m. Among the primitive polynomials, one primitive polynomial to be used is determined.

[0085] Next, (P2) will be described. Upon determining a primitive polynomial p(x), a matrix called a companion matrix is determined. FIG. 6 is a diagram for describing an example of the companion matrix.

[0086] The multiplication of an element a and a primitive element of the Galois field can be represented by the multiplication of a companion matrix C and a vector representing the element a. The multiplication with the primitive element a can be divided into an arithmetic operation corresponding to right shift and an arithmetic operation corresponding to feedback (FB). The circuit in FIG. 6 illustrates an example of a circuit corresponding to such right shift and feedback. The feedback can be interpreted as processing for a term that overflows due to the right shift. The companion matrix C can be also divided into a column corresponding to the right shift (first column to (m1)-th column) and a column corresponding to the feedback (m-th column).

[0087] The multiplication of a in the Galois field GF(2.sup.4) in which the primitive polynomial is represented by p(x)=x.sup.4+x+1 can be represented by the multiplication of the companion matrix C and the element a as shown in the formula in the lower part of FIG. 6. In the procedure (P2), the companion matrix corresponding to the determined primitive polynomial is obtained in this manner.

[0088] Next, (P3) will be described. In (P3), m tensors T.sup.i for use in the multiplication of elements of the Galois field are obtained by using the companion matrix. When m=10, ten tensors T.sup.0 to T.sup.9 are obtained.

[0089] FIG. 7 is a diagram illustrating an example of a method of obtaining the tensor T.sup.i. As illustrated in FIG. 7, the tensor T.sup.i is obtained by arranging the respective i-th row vectors of matrices C(0) to C(m1). Note that a matrix C(p) means the p-th power of the companion matrix C(p is an integer satisfying 0pm1).

[0090] FIG. 7 also illustrates an example of a formula indicating the relationship between the tensor T.sup.i and the multiplication of elements a and b of the Galois field. (ab).sub.i represents the i-th bit of a m-bit vector indicating a result of the multiplication of the elements a and b. As illustrated in FIG. 7, (ab).sub.i can be represented by the multiplication of a vector indicating the element a, a vector indicating the element b, and the tensor T.sup.i. In addition, (ab).sub.i can be represented in the form of the sum of AND operations of the j-th component a.sub.j of a (j is an integer satisfying 0jm1) and the k-th component bx of b (k is an integer satisfying 0km1).

[0091] As illustrated in FIG. 7, the tensor T.sup.i for use in the multiplication of the elements a and b can be represented by a combination of a.sub.jb.sub.k being an AND operation and XOR operations (T.sub.jk.sup.i) corresponding to the sum of results of the AND operations. Note that the tensor T.sub.jk.sup.i represents a tensor corresponding to an XOR operation. The subscripts j and k of the tensor T.sub.jk.sup.i correspond, respectively, to the indices of the elements a and b, which are vectors as inputs to the XOR operation (input vectors). The superscript i of the tensor T.sub.jk.sup.i corresponds to the index of ab, which is a vector as an output from the XOR operation (output vector). Note that the tensor T.sub.jk.sup.i may be referred to as a circular convolution tensor. The tensor T.sub.jk.sup.i (circular convolution tensor) corresponds to an order-3 tensor that receives two elements of the Galois field as inputs (for example, the elements a and b) and outputs one element as a result of multiplication of the two elements (for example, an element ab).

[0092] Hereinafter, similarly, the subscript and superscript of a tensor corresponding to an arithmetic operation may be represented, respectively, by the index of the input vector to the arithmetic operation and the index of the output vector from the arithmetic operation. For example, in a case where the element a, which is an m-dimensional vector, has an index j and an element a.sup.2, which is an m-dimensional vector, has an index i, a tensor S(1) representing the second power of the element a can be represented as S.sub.j.sup.i(1). Note that a vector that is an order-1 tensor has a component represented by a subscript. For example, the i-th component of the element a, which is an m-dimensional vector, is represented by a.sub.i.

[0093] FIG. 8 is a diagram illustrating a specific example of a method of obtaining a tensor T.sup.0 in the case of m=4 and the primitive polynomial p(x)=x.sup.4+x+1. First, matrices C(0) to C(3) are obtained from the companion matrix C. The tensor T.sup.0 is obtained by arranging the respective zeroth rows of the matrices C(0) to C(3). In a case of using the tensor T.sup.0, (ab).sub.0 is represented as a.sub.0b.sub.0+a.sub.1b.sub.3+a.sub.2b.sub.2+a.sub.3b.sub.1. Similarly, tensors T.sup.1, T.sup.2, and T.sup.3 can be obtained.

[0094] Next, (P4) will be described. The tensor S(1) representing the second power of an element of the Galois field can be obtained from the tensor T.sup.i. FIG. 9 is a diagram illustrating an example of a method of obtaining the tensor S(1). As illustrated in FIG. 9, the tensor S(1) is obtained by setting the main diagonal of the tensor T.sup.i as the i-th row vector.

[0095] FIG. 9 also illustrates an example of a formula indicating the relationship between the tensor S(1) and the second power of the element a of the Galois field (multiplication of the element a and the element a). As illustrated in FIG. 9, the second power (aa) of the element a can be represented by the multiplication of a vector indicating the element a and the tensor S(1).

[0096] FIG. 10 is a diagram illustrating a specific example of a method of obtaining the tensor S(1) in the case of m=4 and the primitive polynomial p(x)=x.sup.4+x+1. The tensor S(1) is obtained by arranging respective diagonal components of the tensors T.sup.0 to T.sup.3. In a case of using the tensor S(1), a.sup.2(=aa) is calculated as a vector (a.sub.0+a.sub.2, a.sub.2, a.sub.1+a.sub.3, a.sub.3).

[0097] The reason why only the diagonal components are extracted is that the components symmetrical with respect to the main diagonal are canceled to result in zero. For example, (aa).sub.0 corresponding to the tensor T.sup.0 is a.sub.0a.sub.0+a.sub.2a.sub.2+a.sub.3a.sub.1+a.sub.1a.sub.3. In GF(2), the second power a.sup.2 of the element a is equal to a, and the addition (a+a) of the same element a is zero. For example, a.sub.0a.sub.0 and a.sub.2a.sub.2 are a.sub.0 and a.sub.2, respectively. In addition, a.sub.3a.sub.1+a.sub.1a.sub.3 is zero because of the addition of the same element. Therefore, a.sub.0a.sub.0+a.sub.2a.sub.2+a.sub.3a.sub.1+a.sub.1a.sub.3 is represented as a.sub.0+a.sub.2.

[0098] The order-2 tensor S(u) representing the 2.sup.u-th power of an element of the Galois field in a case where u is 2 or more can be obtained by raising S(1) obtained as described above to the u-th power. For example, an order-2 tensor S(2) representing the fourth power of an element of the Galois field is obtained by the calculation S(1)S(1).

[0099] Next, (P5) will be described. Hereinafter, a configuration example of an arithmetic unit that calculates the seventh power of a of the Galois field (a.sup.7) will be described.

[0100] FIG. 11 is a diagram illustrating a configuration example of an arithmetic unit 500 that calculates the seventh power of a of the Galois field (a.sup.7). The arithmetic unit 500 includes XOR calculation units 501 and 502 and multiplication units 503 and 504.

[0101] The XOR calculation unit 501 performs XOR operations corresponding to the second power of the Galois field. As described above, an order-2 tensor representing the second power of an element of the Galois field is represented by S(1). The XOR calculation unit 501 performs XOR operations corresponding to the tensor S(1) to calculate a.sup.2 that is the second power of the input element a.

[0102] The XOR calculation unit 502 performs XOR operations corresponding to the fourth power of the Galois field. As described above, an order-2 tensor representing the fourth power of an element of the Galois field is represented by S(2). The XOR calculation unit 502 performs XOR operations corresponding to the tensor S(2) to calculate a.sup.4 that is the fourth power of the input element a.

[0103] The multiplication unit 503 performs multiplication of the input element a and a.sup.2 output from the XOR calculation unit 501 to output a.sup.3 resulting from the multiplication.

[0104] The multiplication unit 504 performs multiplication of a.sup.3 output from the multiplication unit 503 and a.sup.4 output from the XOR calculation unit 502 to output a.sup.7 resulting from the multiplication.

[0105] FIG. 12 is a diagram illustrating a circuit configuration example of the arithmetic unit 500 in which the constituent elements are specified as a logic circuit. Unlike that of the present embodiment, as a configuration example, two XOR calculation units are separate by an AND calculation unit. Note that FIG. 12 illustrates a configuration example of the arithmetic unit 500 with m=10.

[0106] As described above, the multiplication of elements of the Galois field is represented by a combination of AND operations and XOR operations. For example, the multiplication unit 503 that performs the multiplication of the element a and the element a.sup.2 is represented by an AND calculation unit based on an element a.sub.j and an element (a.sup.2).sub. and an XOR calculation unit corresponding to a tensor T.sub.j.sup.. The multiplication unit 504 that performs the multiplication of the element a.sup.3 and the element a.sup.4 is represented by an AND calculation unit based on an element (a.sup.3) u and an element (a.sup.4), and an XOR calculation unit corresponding to a tensor T.sub..sup.i. Note that, in the example of FIG. 12, i, j, k, l, , , and are each an integer of 0 or more and 9 or less.

[0107] Here, the computational complexity (calculation time) of the arithmetic unit 500 in a comparative example as in FIG. 12 will be described. First, the transformation of representation of a tensor for use in an arithmetic operation will be described.

[0108] In general, the tensor T.sub.jk.sup.i can be represented by transformation to a one-dimensional vector with a plurality of indices j and k of input vectors in combination. The processing of transforming a tensor into a one-dimensional vector is referred to as flatten. In addition, a matrix obtained by flatten is referred to as a flattened matrix. The flattened matrix corresponds to a matrix in which one-dimensional vectors, in which a plurality of indices corresponding to a plurality of inputs is combined, are arranged and the number of one-dimensional vectors is identical to the number of elements of an index corresponding to one output. Hereinafter, a case where a Galois field GF(26) is used in which the primitive polynomial is represented by p(x)=x.sup.6+x+1 with m=6 will be described as an example.

[0109] The multiplication of an element a and an element b is represented by using an order-3 tensor T.sub.jk.sup.i as in the following Formula (4). The zeroth element (i=0) in an output vector is calculated by an order-2 tensor T.sub.jk.sup.0 represented by the following Formula (5).

[00004] ( ab ) i = .Math. j , k T jk i a j b k ( 4 ) T jk 0 = ( 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 ) ( 5 )

[0110] The tensor T.sub.jk.sup.0 in Formula (5) can be transformed into a one-dimensional vector (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0) by coupling each row. The component at the j-th row and the k-th column of T.sub.jk.sup.0 corresponds to the component at the q-th column of the one-dimensional vector (q=j+6k: q is an integer satisfying 0q35). In a similar procedure, the other tensors T.sub.jk.sup.1 to T.sub.jk.sup.5 can be each transformed into a one-dimensional vector.

[0111] FIG. 13 is a diagram illustrating an example of a matrix in which six tensors T.sub.jk.sup.0 to T.sub.jk.sup.5 each represented by a one-dimensional vector are arranged (order-2 tensor). A vector 1301 is a one-dimensional vector corresponding to the tensor T.sub.jk.sup.0.

[0112] A component having a value of 1 indicates that a.sub.jb.sub.k, which is the value of the corresponding AND operation, is used in XOR operations. Hereinafter, use in XOR operations may be referred to as contribution to XOR operations. In the example of Formula (5), the zeroth element in an output vector (ab).sub.0 is represented by (ab).sub.0=a.sub.0b.sub.0+a.sub.5b.sub.1+a.sub.4b.sub.2+a.sub.3b.sub.3+a.sub.2b.sub.4+a.sub.1b.sub.5. In this example, the number of terms for use in XOR operations (the number of terms contributing to XOR operations) is six.

[0113] In the present embodiment, as illustrated in the lower part of FIG. 13, based on a tree structure having terms as leaf nodes, XOR operations are performed in a tournament style. That is, first, three XOR operations are performed: a.sub.0b.sub.0+a.sub.5b.sub.1, a.sub.4b.sub.2+a.sub.3b.sub.3, and a.sub.2b.sub.4+a.sub.1b.sub.5. The respective results from the three XOR operations are defined as arithmetic results R01, R02, and R03. Next, an XOR operation is performed to the two arithmetic results R01 and R02 to output an arithmetic result R11. Finally, an XOR operation is performed to the arithmetic results R11 and R03 to output (ab).sub.0 as an arithmetic result.

[0114] In this example, the number of XOR operations to be performed in series for obtaining the final arithmetic result is three. Hereinafter, the number thereof is referred to as the number of stages of XOR operations (critical path). The number of stages ns of XOR operations is calculated based on ns=ceiling(log(the number of terms of XOR)). Ceiling(x) is a ceiling function that outputs a minimum integer of x or more. In the example of FIG. 13, the number of stages ns is as follows: ceiling(log(6))=3.

[0115] The computational complexity of the arithmetic unit 500 can be evaluated by such a number of steps as described above. The multiplication units 503 and 504 that perform multiplication account for most of the computational complexity of the arithmetic unit 500, and thus the computational complexity of the multiplication units 503 and 504 will be described.

[0116] The number of stages of the AND calculation unit in the multiplication unit 503 is one. With m=10, the number of terms in XOR operations of the XOR calculation unit ranges from 10 to 28. Because of ceiling(log(10))=4 and ceiling(log(28))=5, the number of stages of the XOR calculation unit in the multiplication unit 503 ranges from four to five. The number of stages of the entire multiplication unit 503 ranges from five to six. The number of stages of arithmetic operations of the multiplication unit 504 also ranges from five to six the same as the multiplication unit 503.

[0117] In the arithmetic unit 500 in the comparative example, the multiplication based on the multiplication unit 503 and the multiplication based on the multiplication unit 504 are performed in series. Therefore, the number of stages of arithmetic operations of the entire arithmetic unit 500 is 10 to 12 (two stages belonging to the AND calculation units and 8 to 10 stages belonging to the XOR calculation units). Since the multiplications to be performed in series are included, an increase in the number of stages of arithmetic operations may cause an increase in the calculation time of arithmetic operations of the entire arithmetic unit 500.

[0118] In the present embodiment, such multiple steps of multiplication as included in the arithmetic unit 500 are replaced with a single step of multiplication. Thus, the number of steps of entire arithmetic operation is suppressed, enabling higher-speed performance of multiple times of multiplication of the Galois field. [0119] (P5) will be further described. In the present embodiment, by a procedure of tensor direct product and contraction, multiple steps of multiplication are replaced with a single step of multiplication. Terminology used for describing replacement of multiplication is described below.

[0120] In the present embodiment, a tensor T is defined as follows. [0121] With a plurality of input vectors as inputs, transformation is provided such that linearity is established for each input (multilinearity). The linearity for each input is, for example, represented as follows.

[00005] T ( a + b , c ) = T ( a , c ) + T ( b , c ) T ( a , b + c ) = T ( a , b ) + T ( a , c ) [0122] The number of vectors related to transformation is called order. [0123] The index of an input vector is arranged as a subscript and the index of an output vector is arranged as a superscript.

[0124] Moreover, hereinafter, an arithmetic operation of the Galois field may be represented as a network with a diagrammatically represented tensor (including a vector and a matrix). FIG. 14 is a diagram illustrating examples of diagram representation of tensors.

[0125] A vector a has a component a.sub.i that is identified by a single index i. Therefore, the diagram representation of the vector a includes: a symbol a that is denoted inside a circle and represents a vector, and an undirected edge that outputs from the circle. The index i is added to the edge.

[0126] A companion matrix C that is an order-2 tensor has a component C.sub.j.sup.i identified by two indices i and j. The diagram representation of the companion matrix C includes: a symbol C that is denoted inside a rectangle and represents the companion matrix inside, an edge that is denoted with an index j and is input to the rectangle, and an edge that is denoted with an index i and is output from the rectangle.

[0127] A circular convolution tensor T that is an order-3 tensor has a component T.sub.jk.sup.i identified by three indices i, j, and k. The diagram representation of the circular convolution tensor T includes: a symbol T that is denoted inside a rectangle and represents the circular convolution tensor inside, two edges that are denoted with indices j and k and are input to the rectangle, and an edge that is denoted with an index i and is output from the rectangle.

[0128] Next, tensor direct product will be described. The tensor direct product represents an operation to obtain a higher-order tensor having the product of respective components of two tensors. For example, a direct product of an order-3 tensor having A.sub.jk.sup.i as a component and an order-2 tensor having B.sub.m.sup.l as a component corresponds to an operation to obtain an order-5 tensor having C.sub.jkm.sup.il=A.sub.jk.sup.iB.sub.m.sup.l as a component. FIG. 15 is a diagram illustrating an example of diagram representation corresponding to this direct product.

[0129] Next, tensor contraction will be described. The tensor contraction represents an operation in which two different indices are selected and then are replaced with the same index to one tensor and then the sum is taken over the same index. For example, an operation in which two indices l and k are selected and then are replaced with the same index K to an order-5 tensor having C.sub.jkm.sup.il=A.sub.jkB.sub.m.sup.l as a component and then the sum is taken over the index K to obtain an order-3 tensor D.sub.jm.sup.i illustrated in the following Formula (6) corresponds to contraction. FIG. 16 is a diagram illustrating an example of diagram representation corresponding to this contraction.

[00006] D jm i = .Math. = 0 m - 1 C j m i = .Math. = 0 m - 1 A j i B m ( 6 )

[0130] Here, the relationship between the multiplication of matrices (order-2 tensors) and the direct product and contraction will be described. A matrix S(2) obtained by multiplying two matrices (order-2 tensors) S(1) and S(1) is represented by the following Formula (7).

[00007] S ( 2 ) = .Math. = 0 m - 1 S i ( 1 ) S j ( 1 ) ( 7 )

[0131] The matrix S(2) corresponds to a tensor (matrix) obtained by taking a direct product of the two matrices S(1) and S(1) and performing contraction by an index with an edge having its both endpoints in a region to be under a single operation. For example, an index K for linking the index 1 of an output from the first matrix S.sub.j.sup.l(1) and the index k of an input to the second matrix S.sub.k.sup.i(1) together is used for contracting the two matrices S(1) to a matrix S.sub.j.sup.i(2). FIG. 17 is a diagram illustrating an example of diagram representation corresponding to this contraction.

[0132] Next, an example in which direct product and contraction are applied to the arithmetic unit 500, in the comparative example, including two steps of multiplication to configure the arithmetic unit 110 in the embodiment will be described. FIG. 18 is a diagram illustrating the relationship between the arithmetic unit 500 and the arithmetic unit 110.

[0133] The arithmetic unit 500 illustrated in the left part of FIG. 18 has a configuration corresponding to, for example, a configuration in which the arithmetic operation of the arithmetic unit 500 in FIG. 12 is diagrammatically represented.

[0134] The arithmetic unit 500 is replaced with the arithmetic unit 110 illustrated in the right part of FIG. 18 due to direct product and contraction. The arithmetic unit 110 corresponds to a tensor P(111) representing the seventh power of an element a. Note that the P of the tensor P(111) represents power. 111 corresponds to the binary representation of 7 of the seventh power. The tensor P(111) is represented by the following Formula (8) by using the indices j, k, and l of inputs and the index i of an output.

[00008] P jkl i ( 111 ) = .Math. , , T i T j S k ( 1 ) S l ( 2 ) ( 8 )

[0135] FIG. 19 is a diagram illustrating a circuit configuration example according to the embodiment, in which the arithmetic unit 110 is specified. Note that FIG. 19 illustrates a configuration example of the arithmetic unit 110 with m=10. The AND calculation unit 111 performs AND operations of elements a.sub.j, a.sub.k, and a.sub.l. The XOR calculation unit 112 performs XOR operations corresponding to the tensor P(111).

[0136] Next, (P6) will be described. In (P6), the arithmetic unit 110 is configured (designed) to perform XOR operations pursuant to the tensor P(111). The configuration of the arithmetic unit 110 may be performed by any conventionally available method. For example, a method using a tool or the like that performs circuit design in a hardware description language at register transfer level (RTL) can be applied.

[0137] In (P6), the flatten of the tensor P(111) may be performed. FIG. 20 is a diagram illustrating an example of a flattened matrix obtained by flattening the tensor P(111). FIG. 20 illustrates an example of a flattened matrix of the tensor P(111) with m=6.

[0138] Since the tensor P(111) has three indices (j, k, l) for input and a single index (i) for output, a flattened matrix with simple flattening is a matrix of 6 rows and 63 columns. Meanwhile, three input vectors to the tensor P(111) are based on the same element a. For this reason, the flattened matrix can be simplified.

[0139] In a case where the indices (j, k, l) for input as a set are (2, 1, 1), a value for use in XOR operations is a.sub.2a.sub.1a.sub.1. As described above, in GF(2), the second power a.sup.2 of the element a is equal to a. Therefore, a.sub.2a.sub.1a.sub.1 is equal to a.sub.1a.sub.2. In addition, the contribution of a.sub.2a.sub.1a.sub.1 to XOR operations is identical to the contribution of a.sub.1a.sub.2. Similarly, regarding a set in which the value of any one index of (j, k, l) is 2 and the values of the other indices are 1 and a set in which the values of any two indices of (j, k, l) are 2 and the value of the other index is 1, the contribution of each set to XOR operations is identical to the contribution of a.sub.1a.sub.2. Thus, terms the same in terms of the contribution to XOR operations are collected into one and, in the case of even numbers of contributions, the contribution is brought into zero, so that the flattened matrix can be simplified.

[0140] FIG. 20 illustrates an example of a flattened matrix obtained by such simplification. Note that the flattened matrix in FIG. 20 corresponds to (a.sup.7).sub.0 to (a.sup.7).sub.5 as the output vector of the tensor P(111) represented as in the following Formula (9).

[00009] ( a 7 ) 0 = a 0 + a 1 + a 3 + a 4 + a 0 a 1 + a 0 a 2 + .Math. + a 2 a 3 a 4 + a 2 a 4 a 5 .Math. ( a 7 ) 5 = a 1 + a 2 + a 3 + a 4 + a 0 a 3 + a 0 a 4 + .Math. + a 2 a 4 a 5 + a 3 a 4 a 5 } ( 9 )

[0141] Note that at least some of the above procedures (P1) to (P6) may be implemented by one or more processing units. Such processing units are implemented by one or more processors. The processing unit may be implemented by causing a processor, such as a central processing unit (CPU) or a graphics processing unit (GPU), to execute a program, namely, by software. The processing unit may be implemented by a processor such as a dedicated integrated circuit (IC), namely, by hardware. The processing unit may be implemented by software and hardware in combination. At least some of the procedures (P1) to (P6) may be implemented as the function of a tool used for the configuration of the arithmetic unit 110.

[0142] The computational complexity of the arithmetic unit 110 in the present embodiment will be described. The number of stages of the AND calculation unit 111 is two. With m=10, the number of terms in XOR operations by the XOR calculation unit 112 ranges from 79 to 92. Because of ceiling(log(79))=7 and ceiling(log(92))=7, the number of stages of the XOR calculation unit 112 is seven. The number of stages of the entire arithmetic unit 110 is nine.

[0143] Thus, in the present embodiment, the arithmetic unit 110 is configured such that a plurality of XOR calculation units is made compact together (in the example of FIG. 12, the respective XOR calculation units included in the multiplication units 503 and 504 and additionally the XOR calculation units 501 and 502). Thus, for example, in comparison to the arithmetic unit 500, of which the number of stages ranges from 10 to 12, in the comparative example, the number of stages of arithmetic operation of the arithmetic unit 110 can be suppressed to nine. That is, multiple times of multiplication of the Galois field can be performed at higher speed.

[0144] In the above, the comparison has been made in the number of stages with m=10. Even in any case other than the case of m=10, the number of stages of XOR operations can be reduced by applying the present embodiment. FIG. 21 is a table for describing examples of reductions in the number of stages with m=8 to 14.

[0145] FIG. 21 illustrates examples of the code length of the BCH code, examples of the primitive polynomial, examples of the maximum number of stages of XOR in the comparative example, examples of the maximum number of stages of XOR in the present embodiment, and examples of the increase rate of the number of transistors. Regarding the maximum number of stages of XOR in the comparative example, an estimate without an increase in the number of stages due to the XOR calculation unit 501 corresponding to the second power of the Galois field is described on the left and an estimate to which the number of stages of the XOR calculation unit 501 is added is described on the right. Note that FIG. 21 illustrates an example of the number of stages of arithmetic operation for calculation of a.sup.7 of the Galois field. In addition, the maximum number of stages of XOR corresponds to, for example, the number of stages of the XOR calculation unit 112 in the arithmetic unit 110.

[0146] As described above, the number of elements 2.sup.m of the Galois field is defined in accordance with the code length n=2.sup.m1 of the BCH code. With m ranging from 8 to 14, the maximum number of stages of XOR in the comparative example is 10 minimum and 17 maximum. In contrast to this, in the present embodiment, the maximum number of stages of XOR ranges from 6 to 8, and thus, regardless of the values of m, the number of steps can be made smaller in the present embodiment than in the comparative example. A larger m causes an increase in the increase rate of circuit size (the number of transistors) due to replacement of multiplication (direct product and contraction). However, for example, in a case where an increase is made in the input/output bit width of the decoder 18 in order to enhance the entire decoder 18 in speed, the circuit size for which the arithmetic unit 110 (error locator polynomial calculation unit 102) accounts in the decoder 18 is relatively small. For this reason, the influence of an increase in the circuit size of the arithmetic unit 110 can be brought into an allowable range.

[0147] The configuration example of the arithmetic unit 110 that calculates a.sup.7 of the Galois field has been described above. This arithmetic unit 110 is configured based on direct product and contraction to two order-3 tensors as an example. The number of order-3 tensors to be subjected to direct product and contraction is not limited to two and thus may be three or more.

[0148] That is, the arithmetic unit 110 can be configured such that AND operations (AND calculation unit 111) and XOR operations (XOR calculation unit 112) perform an arithmetic operation (first arithmetic operation) corresponding to p multiplications to be performed in series that are represented, respectively, by p order-3 tensors (p is an integer of 2 or more). The AND operations correspond to an arithmetic operation that calculate the AND values of a plurality of elements for use in the p multiplications. The XOR operations correspond to an arithmetic operation based on a tensor obtained by contraction of an order-3p tensor obtained by a direct product of the p order-3 tensors (hereinafter, referred to as a contracted tensor) and an AND values resulting from the AND operations.

[0149] The contracted tensor can be interpreted as being obtained as follows. That is, the contracted tensor corresponds to a tensor obtained by contraction of an order-3p tensor to one or more sets in which an output from one order-3 tensor serves as an input to the other order-3 tensor from among plural sets each including two order-3 tensors selected from p order-3 tensors, with respect to an index corresponding to the output and an index regarding the input.

[0150] In addition, as an example, the arithmetic unit 110 that calculates a.sup.7 of the Galois field includes, as elements for use in p multiplications (two multiplications), one or more elements (first element) corresponding to the 2.sup.u-th power of an element of the Galois field. The arithmetic unit 110 uses, for multiplication, an element a.sup.2 corresponding to the second power of an element a (u=1) and an element a.sup.4 corresponding to the fourth power of the element a (u=2). In such a case, the arithmetic unit 110 is configured to use a tensor (contracted tensor) resulting from contraction of a direct product of an order-3p tensor and one or more order-2 tensors representing the 2.sup.u-th power.

[0151] An arithmetic operation to be performed by the arithmetic unit 110 is not limited to the seventh power of an element of the Galois field. Hereinafter, a configuration example of the arithmetic unit 110 that performs an arithmetic operation different from the seventh power will be described.

[0152] FIG. 22 is a diagram illustrating a configuration example of the arithmetic unit 110 that calculates a.sup.6b of the Galois field. Note that an example of the arithmetic unit 500 before tensor direct product and contraction are performed is illustrated in the upper part of FIG. 22.

[0153] a.sup.6b for which the arithmetic unit 110 in FIG. 22 performs an arithmetic operation is represented by the following Formula (10). In addition, a tensor T.sub.jk.sup.i (a.sup.6b) in Formula (10) is represented by the following Formula (11). The tensor T.sub.jkl.sup.i(a.sup.6b) is an order-4 tensor for calculating a.sup.6b.

[00010] ( a 6 b ) i = .Math. j , k , l T jkl i ( a 6 b ) a j a k b i ( 10 ) T jkl i ( a 6 b ) = .Math. , , T l i T S k ( 2 ) S j ( 1 ) ( 11 )

[0154] FIG. 23 is a diagram illustrating a configuration example of the arithmetic unit 110 that calculates abc of the Galois field. Note that an example of the arithmetic unit 500 before tensor direct product and contraction are performed is illustrated in the upper part of FIG. 23.

[0155] abc for which the arithmetic unit 110 in FIG. 23 performs an arithmetic operation is represented by the following Formula (12). In addition, a tensor T.sub.jkl.sup.i(abc) in the Formula (12) is represented by the following Formula (13). The tensor T.sub.jkl.sup.i(abc) is an order-4 tensor for calculating abc.

[00011] ( abc ) i = .Math. j , k , l T jkl i ( abc ) a j b k c l ( 12 ) T jkl i ( abc ) = .Math. T l i T jk ( 13 )

[0156] As an example, the arithmetic unit 500 in the comparative example illustrated in FIG. 18, 22, or 23 includes two steps of multiplication corresponding to two tensors T. Although an arithmetic operation corresponding to two tensors T is included, even without performing direct product and tensor contraction to these tensors, a single step of multiplication may be provided.

[0157] FIG. 24 is a diagram illustrating an example of an arithmetic unit 600 having such a configuration as described above. The arithmetic operation that the arithmetic unit 600 performs is represented by the following Formula (14).

[00012] ( ( ab ) i 1 ( bc ) i 2 ) = ( .Math. j 1 , j 2 T j 1 j 2 i 1 a j 1 b j 2 .Math. j 2 , j 3 T j 2 j 3 i 2 b j 2 c j 3 ) ( 14 )

[0158] In the arithmetic unit 600, an arithmetic result from one tensor T is not used in the arithmetic operation of the other tensor T. For this reason, the arithmetic operation based on the two tensors T is not required to be performed in series. Thus, the arithmetic unit 600 can be configured such that the number of steps of multiplication is one. Such an arithmetic unit 600 does not need to perform such replacement of multiplication due to direct product and contraction as in the present embodiment.

[0159] Hereinafter, another example of an arithmetic unit to which such replacement of multiplication due to direct product and contraction as in the present embodiment can be applied will be described.

[0160] FIG. 25 is a diagram illustrating a configuration example of the arithmetic unit 500 that calculates an inverse element of an element a of the Galois field. Such an arithmetic unit 500 includes multiple times of multiplication to be performed in series (hereinafter, referred to as multiple steps of multiplication). Therefore, the transformation to a single step of multiplication in the present embodiment can be applied.

[0161] FIG. 26 is a diagram illustrating a configuration example of the arithmetic unit 500 that calculates a.sup.3b.sup.2 and a.sup.3b.sup.4 of the Galois field. FIGS. 27 and 28 are each a diagram illustrating a configuration example of the arithmetic unit 500 that calculates a.sup.3b.sup.6 of the Galois field. The arithmetic unit 500 in each of FIGS. 26 to 28 includes three tensors T. Meanwhile, the arithmetic unit 500 in each of FIGS. 26 and 28 includes two steps of multiplication, and the arithmetic unit 500 in FIG. 27 includes three steps of multiplication. Thus, the number of included tensors T and the number of steps of multiplication do not necessarily match with each other. Even in any case, provided that the number of steps of multiplication is two or more, the replacement of multiplication in the present embodiment can be applied.

[0162] In FIG. 26, all the tensors included in the arithmetic unit 500 may be brought into a single AND calculation unit and a single XOR calculation unit due to direct product and contraction. Alternatively, either a set of the left tensor and the upper right tensor or a set of the left tensor and the lower right tensor may be brought into a single AND calculation unit and a single XOR calculation unit due to direct product and contraction. In FIG. 27, all the tensors included in the arithmetic unit 500 may be brought into a single AND calculation unit and a single XOR calculation unit due to direct product and contraction. Alternatively, either a set of the left tensor and the middle tensor or a set of the middle tensor and the right tensor may be brought into a single AND calculation unit and a single XOR calculation unit due to direct product and contraction. In FIG. 28, all the tensors included in the arithmetic unit 500 may be brought into a single AND calculation unit and a single XOR calculation unit due to direct product and contraction. Alternatively, either a set of the upper left tensor and the right tensor or a set of the lower left tensor and the right tensor may be brought into a single AND calculation unit and a single XOR calculation unit due to direct product and contraction.

[0163] Note that the configuration in FIG. 28 can be interpreted as an improved version of the configuration in FIG. 27 such that a smaller number of steps of multiplication are performed in a tournament style. As illustrated in FIG. 28, the number of steps of multiplication can be configured to be ceiling(log(the number of tensors T)). That is, similarly to the case of an XOR operation, adoption of a tournament style enables an arithmetic operation to be performed at higher speed.

[0164] Next, a procedure of decoding processing in the memory system 1 will be described. FIG. 29 is a flowchart illustrating an example of decoding processing in the present embodiment.

[0165] The control unit 11 reads an error correction code from the non-volatile memory 20 and obtains a received word (step S101). In addition, the control unit 11 instructs the decoder 18 to start decoding.

[0166] The syndrome calculation unit 101 of the decoder 18 calculates a syndrome from the received word (step S102). The decoder 18 determines whether or not the values of all the calculated syndromes are zero (step S103).

[0167] In a case where all the syndromes are zero (step S103: Yes), it can be determined that no error exists in the received word, and thus the decoder 18 ends the decoding processing. In a case where all the syndromes are not zero (step S103: No), the error locator polynomial calculation unit 102 calculates an error locator polynomial by using the syndromes in accordance with the PGZ algorithm (step S104). In this case, for at least some of the coefficients of the error locator polynomial, the arithmetic unit 110 calculates the multiplication of syndromes.

[0168] The error position calculation unit 103 searches for an error position based on the calculated error locator polynomial (step S105). The bit flipping unit 104 corrects an error by inverting the bit at the error position obtained by the searching (by bit flipping) (step S106) and then ends the decoding processing.

[0169] As described above, according to the present embodiment, multiple times of multiplication of the Galois field can be performed at higher speed.

[0170] While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; moreover, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.