DEEP LEARNING NETWORK DEVICE, MEMORY ACCESS METHOD AND NON-VOLATILE STORAGE MEDIUM

20220414458 · 2022-12-29

    Abstract

    A memory access method used when training a deep learning network is disclosed. When the weightings of the paths between the current layer and the previous layer are updated, the differential terms generated while updating the weightings between the next layer and the current layer are reused to reduce the number of memory accesses. Because the memory access method greatly reduces the number of memory accesses, training time and power consumption are reduced, and the lifetimes of the battery and the memory of the deep learning network device are prolonged. In particular, when battery power is limited, the deep learning network device can run longer.

    Claims

    1. A memory access method, which is used when training a deep learning network, wherein the deep learning network is a neural network or a convolution neural network, the neural network or a full connection layer of the convolution neural network comprises an input layer, L hidden layers and an output layer, and the memory access method comprises: updating weightings of paths between the output layer and a L.sup.th hidden layer of the L hidden layers, and storing differential terms of all nodes of the output layer in a memory; updating weightings of paths between the L.sup.th hidden layer and a (L−1).sup.th hidden layer of the L hidden layers based on the differential terms of the all nodes of the output layer stored in the memory, and storing differential terms of all nodes of the L.sup.th hidden layer in the memory; updating weightings of paths between j.sup.th hidden layer of the L hidden layers and a (j−1).sup.th hidden layer of the L hidden layers based on differential terms of all nodes of a (j+1).sup.th hidden layer of the L hidden layers stored in the memory, and storing differential terms of all nodes of the j.sup.th hidden layer in the memory, wherein j is an integer from 2 to (L−1); and updating weightings of paths between the input layer and a 1.sup.st hidden layer of the L hidden layers based on differential terms of all nodes of a 2.sup.nd hidden layer of the L hidden layers stored in the memory.

    2. The memory access method of claim 1, wherein the deep learning network is a convolution neural network, and a transfer learning is used to only train the full connection layer of the convolution neural network.

    3. The memory access method of claim 1, wherein the differential term of the node O.sub.x of the output layer is expressed as:
    Δ.sub.O.sub.x=(out.sub.O.sub.x−Y.sub.O.sub.x)D(Act_fn).sub.O.sub.x; wherein Y.sub.O.sub.x is a target value of the node O.sub.x of the output layer, D(Act_fn).sub.O.sub.x is a derivative function of an activation function of the node O.sub.x of the output layer.

    4. The memory access method of claim 3, wherein the differential term of the node H.sub.Li of the L.sup.th hidden layer is expressed as:
    Δ.sub.H.sub.Li=(Σ.sub.i=1.sup.n[Δ.sub.O.sub.iw.sub.xi]); wherein n is a number of the all nodes of the output layer, w.sub.xi is a weighting of a path between the node H.sub.Li of the L.sup.th hidden layer corresponding to a weighting w.sub.x and the node O.sub.i of the output layer, and Δ.sub.O.sub.i is the differential term of the node O.sub.i of the output layer.

    5. The memory access method of claim 4, wherein the differential term of the node H.sub.ji of the j.sup.th hidden layer is expressed as:
    Δ.sub.H.sub.ji=(Σ.sub.i=1.sup.n′[Δ.sub.H.sub.(j+1)iw.sub.x′i]); wherein n′ is a number of the all nodes of the j.sup.th hidden layer, w.sub.x′i is a weighting of a path between the node H.sub.ji of the j.sup.th hidden layer corresponding to a weighting w.sub.x, and the node H.sub.(j+1)i of the (j+1).sup.th hidden layer, and Δ.sub.H.sub.(j+1)i is the differential term of the node H.sub.(j+1)i of the (j+1).sup.th hidden layer.

    6. The memory access method of claim 5, wherein when updating all the weightings of the paths between the j.sup.th hidden layer and the (j−1).sup.th hidden layer, an access number of accessing the memory is M.sub.Lj=(2N.sub.H(j+1)+2)N.sub.HjN.sub.H(j−1), wherein the N.sub.Hj is a number of the all nodes of the j.sup.th hidden layer, N.sub.H(j−1) is a number of the all nodes of the (j−1).sup.th hidden layer, and the N.sub.H(j+1) is a number of the all nodes of the (j+1).sup.th hidden layer.

    7. A deep learning network device, implemented by a computer device with a software, or implemented by a hardware circuit, which is characterized by being configured to execute the memory access method of claim 1 when training the deep learning network.

    8. The deep learning network device of claim 7, further comprising: a communication unit, used to communicate with an external electronic device; wherein only when the communication unit is unable to communicate with the external electronic device, the memory access method is executed when training the deep learning network.

    9. The deep learning network device of claim 7, wherein the deep learning network device is an edge computing device, an IoT sensor or a sensor for monitoring.

    10. The deep learning network device of claim 7, wherein the deep learning network is a convolution neural network, and a transfer learning is used to only train the full connection layer of the convolution neural network.

    11. The deep learning network device of claim 7, wherein the differential term of the node O.sub.x of the output layer is expressed as:
    Δ.sub.O.sub.x=(out.sub.O.sub.x−Y.sub.O.sub.x)D(Act_fn).sub.O.sub.x; wherein Y.sub.O.sub.x is a target value of the node O.sub.x of the output layer, D(Act_fn).sub.O.sub.x is a derivative function of an activation function of the node O.sub.x of the output layer.

    12. The deep learning network device of claim 11, wherein the differential term of the node H.sub.Li of the L.sup.th hidden layer is expressed as:
    Δ.sub.H.sub.Li=(Σ.sub.i=1.sup.n[Δ.sub.O.sub.iw.sub.xi]); wherein n is a number of the all nodes of the output layer, w.sub.xi is a weighting of a path between the node H.sub.Li of the L.sup.th hidden layer corresponding to a weighting w.sub.x and the node O.sub.i of the output layer, and Δ.sub.O.sub.i is the differential term of the node O.sub.i of the output layer.

    13. The deep learning network device of claim 12, wherein the differential term of the node H.sub.ji of the j.sup.th hidden layer is expressed as:
    Δ.sub.H.sub.ji=(Σ.sub.i=1.sup.n′[Δ.sub.H.sub.(j+1)iw.sub.x′i]); wherein n′ is a number of the all nodes of the j.sup.th hidden layer, w.sub.x′i is a weighting of a path between the node H.sub.ji of the j.sup.th hidden layer corresponding to a weighting w.sub.x, and the node H.sub.(j+1)i of the (j+1).sup.th hidden layer, and Δ.sub.H.sub.(j+1)i is the differential term of the node H.sub.(j+1)i of the (j+1).sup.th hidden layer.

    14. The deep learning network device of claim 13, wherein when updating all the weightings of the paths between the j.sup.th hidden layer and the (j−1).sup.th hidden layer, an access number of accessing the memory is M.sub.Lj=(2N.sub.H(j+1)+2)N.sub.HjN.sub.H(j−1), wherein the N.sub.Hj is a number of the all nodes of the j.sup.th hidden layer, N.sub.H(j−1) is a number of the all nodes of the (j−1).sup.th hidden layer, and the N.sub.H(j+1) is a number of the all nodes of the (j+1).sup.th hidden layer.

    15. A non-volatile storage medium, for storing program codes of the memory access method of claim 1.

    16. The non-volatile storage medium of claim 15, wherein the deep learning network is a convolution neural network, and a transfer learning is used to only train the full connection layer of the convolution neural network.

    17. The non-volatile storage medium of claim 15, wherein the differential term of the node O.sub.x of the output layer is expressed as:
    Δ.sub.O.sub.x=(out.sub.O.sub.x−Y.sub.O.sub.x)D(Act_fn).sub.O.sub.x; wherein Y.sub.O.sub.x is a target value of the node O.sub.x of the output layer, D(Act_fn).sub.O.sub.x is a derivative function of an activation function of the node O.sub.x of the output layer.

    18. The non-volatile storage medium of claim 17, wherein the differential term of the node H.sub.Li of the L.sup.th hidden layer is expressed as:
    Δ.sub.H.sub.Li=(Σ.sub.i=1.sup.n[Δ.sub.O.sub.iw.sub.xi]); wherein n is a number of the all nodes of the output layer, w.sub.xi is a weighting of a path between the node H.sub.Li of the L.sup.th hidden layer corresponding to a weighting w.sub.x and the node O.sub.i of the output layer, and Δ.sub.O.sub.i is the differential term of the node O.sub.i of the output layer.

    19. The non-volatile storage medium of claim 18, wherein the differential term of the node H.sub.ji of the j.sup.th hidden layer is expressed as:
    Δ.sub.H.sub.ji=(Σ.sub.i=1.sup.n′[Δ.sub.H.sub.(j+1)iw.sub.x′i]); wherein n′ is a number of the all nodes of the j.sup.th hidden layer, w.sub.x′i is a weighting of a path between the node H.sub.ji of the j.sup.th hidden layer corresponding to a weighting w.sub.x, and the node H.sub.(j+1)i of the (j+1).sup.th hidden layer, and Δ.sub.H.sub.(j+1)i is the differential term of the node H.sub.(j+1)i of the (j+1).sup.th hidden layer.

    20. The non-volatile storage medium of claim 19, wherein when updating all the weightings of the paths between the j.sup.th hidden layer and the (j−1).sup.th hidden layer, an access number of accessing the memory is M.sub.Lj=(2N.sub.H(j+1)+2)N.sub.HjN.sub.H(j−1), wherein the N.sub.Hj is a number of the all nodes of the j.sup.th hidden layer, N.sub.H(j−1) is a number of the all nodes of the (j−1).sup.th hidden layer, and the N.sub.H(j+1) is a number of the all nodes of the (j+1).sup.th hidden layer.

    Description

    BRIEF DESCRIPTIONS OF DRAWINGS

    [0022] The accompanying drawings are included to provide a further understanding of the present disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

    [0023] FIG. 1 is a schematic diagram showing a neural network or a full connection layer, which comprises two hidden layers.

    [0024] FIG. 2 is a schematic diagram showing the relation between the nodes of the output layer and the nodes of the last hidden layer in the neural network or the full connection layer.

    [0025] FIG. 3 is a block diagram of a deep learning network device according to a first embodiment of the present disclosure.

    [0026] FIG. 4 is a block diagram of a deep learning network device according to a second embodiment of the present disclosure.

    [0027] FIG. 5 is a flow chart of a memory access method used in a deep learning network device during training according to an embodiment of the present disclosure.

    DETAILS OF EMBODIMENTS

    [0028] Reference will now be made in detail to the exemplary embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.

    [0029] To reduce the number of memory accesses required to train a neural network or the full connection layer of a convolution neural network, the embodiments of the present disclosure provide a memory access method used when training a deep learning network, and a deep learning network device that uses the memory access method during training. Because the number of memory accesses is greatly reduced, training time and power consumption are reduced, and the lifetimes of the battery and the memory of the deep learning network device are prolonged.

    [0030] First, refer to FIG. 3, which is a block diagram of a deep learning network device according to a first embodiment of the present disclosure. The deep learning network device 3 is mainly implemented by a computer device and software. The deep learning network device 3 comprises a graphic processing unit 31, a processing unit 32, a memory 33, a direct memory access unit 34 and a communication unit 35. The processing unit 32 is electrically connected to the graphic processing unit 31, the memory 33 and the communication unit 35, and the direct memory access unit 34 is electrically connected to the graphic processing unit 31 and the memory 33.

    [0031] In one implementation, the graphic processing unit 31 performs the calculations of determination and training of the deep learning network under the control of the processing unit 32, and can directly access the memory 33 through the direct memory access unit 34. In another implementation, the direct memory access unit 34 is omitted; the graphic processing unit 31 still performs the calculations of determination and training of the deep learning network under the control of the processing unit 32, but must access the memory 33 through the processing unit 32. In yet another implementation, the processing unit 32 itself performs the calculations of determination and training of the deep learning network, and both the direct memory access unit 34 and the graphic processing unit 31 can be omitted.

    [0032] The communication unit 35 is used to communicate with an external electronic device, such as a cloud computing device. When the communication unit 35 can communicate with the external electronic device, the training of the deep learning network can be performed by the external electronic device. When the communication unit 35 cannot communicate with the external electronic device, the training of the deep learning network is carried out by the deep learning network device 3 itself. For example, a natural disaster may disconnect the network while the deep learning network device 3 is a rescue aerial camera with limited battery capacity, which must be trained regularly or irregularly to interpret rescue images accurately. In the embodiment of the present disclosure, the training of the deep learning network may train only the neural network or the full connection layer. For example, in the case of transfer learning, only the full connection layer is trained; in another case, the entire convolution neural network may be trained (including the feature filter matrices, etc.), and the present disclosure is not limited thereto.

    [0033] Next, refer to FIG. 4, which is a block diagram of a deep learning network device according to a second embodiment of the present disclosure. Unlike the first embodiment, the deep learning network device 4 is mainly implemented by pure hardware circuits (for example, but not limited to, a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)). The deep learning network device 4 comprises a deep learning network circuit 41, a control unit 42, a memory 43 and a communication unit 44, wherein the control unit 42 is electrically connected to the deep learning network circuit 41, the memory 43 and the communication unit 44. The deep learning network circuit 41 performs the calculations of determination and training of the deep learning network, and accesses the memory 43 through the control unit 42.

    [0034] The communication unit 44 is used to communicate with an external electronic device, such as a cloud computing device. When the communication unit 44 can communicate with the external electronic device, the training of the deep learning network can be performed by the external electronic device; when the communication unit 44 cannot communicate with the external electronic device, the training of the deep learning network is performed by the deep learning network device 4. In the embodiment of the present disclosure, the training of the deep learning network may refer only to the training of the neural network or the full connection layer (in the case of transfer learning), or it may include the training of the entire convolution neural network (including the training of the feature filter matrices, etc.), and the present disclosure is not limited thereto. In addition, the deep learning network device 3 or 4 can be an edge computing device, an IoT sensor or a sensor for monitoring, and the present disclosure is not limited thereto.
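
    The fallback rule described for the communication units 35 and 44 can be sketched as follows. This is a minimal illustration only; the function name `select_training_site` and the string labels are hypothetical and do not appear in the disclosure.

```python
def select_training_site(can_reach_external_device: bool) -> str:
    """Return where training runs under the fallback rule described above.

    Hypothetical helper: "external" stands for the external electronic
    device (e.g. a cloud computing device), "local" for the deep learning
    network device itself.
    """
    if can_reach_external_device:
        # Link is up: training can be offloaded to the external device.
        return "external"
    # Link is down: train on-device, where the memory access method applies.
    return "local"
```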

    [0035] The deep learning network device 3 or 4 trains the neural network or the full connection layer by starting from the output layer and updating the weightings layer by layer toward the input layer (that is, by using the back propagation method). To reduce the number of accesses to the memory 33 or 43, when the deep learning network device 3 or 4 updates the weightings of the paths between the current layer and the previous layer, the differential term of each node of the current layer is stored in the memory 33 or 43. For example, when updating the weightings of the paths between the output layer and the last hidden layer, the differential term of each node of the output layer is stored in the memory 33 or 43, and when updating the weightings of the paths between the third and second hidden layers, the differential term of each node of the third hidden layer is stored in the memory 33 or 43. In this way, when updating the weightings of the paths between the current layer and the previous layer, the stored differential terms of the next layer can be reused to reduce the number of accesses to the memory 33 or 43. For example, when updating the weightings of the paths between the second hidden layer and the first hidden layer, the differential terms of the nodes of the third hidden layer (or of the nodes of the output layer, if there are only two hidden layers) can be used.

    [0036] The differential term of the node O.sub.x of the output layer can be defined as:


    Δ.sub.O.sub.x=(out.sub.O.sub.x−Y.sub.O.sub.x)D(Act_fn).sub.O.sub.x,  EQUATION(7).
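
    As a minimal sketch of EQUATION (7), the following Python function computes the differential term of every output node once so that it can be stored and reused. The sigmoid activation (whose derivative is out(1−out)) and the names `output_differential_terms`, `outputs` and `targets` are assumptions made for illustration; the disclosure does not fix the activation function.

```python
from typing import List


def output_differential_terms(outputs: List[float], targets: List[float]) -> List[float]:
    """Delta_Ox = (out_Ox - Y_Ox) * D(Act_fn)_Ox for every output node O_x.

    Assumes a sigmoid activation, so D(Act_fn)_Ox = out_Ox * (1 - out_Ox).
    """
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    return [(out - y) * out * (1.0 - out) for out, y in zip(outputs, targets)]


# Two output nodes with outputs 0.5 and 0.8 and target values 1.0 and 0.0.
delta_O = output_differential_terms([0.5, 0.8], [1.0, 0.0])
```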

    By using EQUATION (7), the EQUATION (6) can be written as:

    $$\frac{\partial L}{\partial w_x}=\sum_{i=1}^{n}\left[\Delta_{O_i}w_{xi}\right]D(\mathrm{Act\_fn})_{H_p}\,O_{H_q},\qquad\text{EQUATION (8)}$$

    wherein w.sub.xi is a weighting of a path between the node H.sub.p of the last hidden layer corresponding to the weighting w.sub.x and the node O.sub.i of the output layer. By using the stored differential terms of the nodes O.sub.i of the output layer, when the weightings of the paths between the last hidden layer and the previous layer (the second-to-last hidden layer, or the input layer if there is only one hidden layer) are updated, calculating the derivative ∂L/∂w.sub.x of the loss function L with respect to w.sub.x requires (2N.sub.OL+2) memory accesses to obtain the needed values: N.sub.OL differential terms Δ.sub.O.sub.i, N.sub.OL weightings w.sub.xi, the derivative value D(Act_fn).sub.H.sub.p and the output value O.sub.H.sub.q. Taking FIG. 1 as an example, updating all the weightings of the paths between the 2.sup.nd hidden layer and the 1.sup.st hidden layer requires a total of M.sub.LL=(2N.sub.OL+2)N.sub.L1N.sub.L2 memory accesses. Compared to the related art, a total of N.sub.OLN.sub.L1N.sub.L2 memory accesses are saved.
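
    The count M.sub.LL=(2N.sub.OL+2)N.sub.L1N.sub.L2 can be checked with a small instrumented sketch of EQUATION (8). The `CountingMemory` class, the key names and the layer sizes (N.sub.OL=3, N.sub.L2=4, N.sub.L1=5) are hypothetical, and the sketch assumes that N.sub.L1 and N.sub.L2 denote the numbers of nodes of the 1.sup.st and 2.sup.nd hidden layers of FIG. 1; all stored values are placeholders.

```python
from typing import Dict, Hashable


class CountingMemory:
    """Hypothetical memory model in which every read is counted."""

    def __init__(self, values: Dict[Hashable, float]) -> None:
        self._values = dict(values)
        self.reads = 0

    def read(self, key: Hashable) -> float:
        self.reads += 1
        return self._values[key]


def grad_equation_8(mem: CountingMemory, p: int, q: int, n_out: int) -> float:
    """dL/dw_x for the path from node H_q (1st hidden) to node H_p (2nd hidden)."""
    total = 0.0
    for i in range(n_out):
        # Two reads per output node: the stored Delta_Oi and the weighting w_xi.
        total += mem.read(("delta_O", i)) * mem.read(("w", p, i))
    # Two more reads: D(Act_fn)_Hp and the output value O_Hq.
    return total * mem.read(("d_act", p)) * mem.read(("out", q))


N_OL, N_L2, N_L1 = 3, 4, 5  # hypothetical layer sizes
values: Dict[Hashable, float] = {}
for i in range(N_OL):
    values[("delta_O", i)] = 0.1 * (i + 1)   # placeholder differential terms
    for p in range(N_L2):
        values[("w", p, i)] = 0.5            # placeholder weightings
for p in range(N_L2):
    values[("d_act", p)] = 0.25              # placeholder derivative values
for q in range(N_L1):
    values[("out", q)] = 1.0                 # placeholder output values

mem = CountingMemory(values)
grads = [[grad_equation_8(mem, p, q, N_OL) for q in range(N_L1)] for p in range(N_L2)]
total_reads = mem.reads
expected_reads = (2 * N_OL + 2) * N_L1 * N_L2
saved_vs_related_art = N_OL * N_L1 * N_L2  # saving stated in the text
```

    With these hypothetical sizes the sketch performs 160 reads, which equals (2·3+2)·5·4, and the saving stated in the text corresponds to 60 reads.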

    [0037] If there are L hidden layers, when updating the weightings of the paths between the L.sup.th hidden layer and the (L−1).sup.th hidden layer, the differential terms of all the nodes of the L.sup.th hidden layer are stored in the memory. Each of the differential terms of all the nodes of the L.sup.th hidden layer can be expressed as:


    Δ.sub.H.sub.Li=(Σ.sub.i=1.sup.n[Δ.sub.O.sub.iw.sub.xi]),  EQUATION(9);

    wherein w.sub.xi is a weighting of a path between the node H.sub.Li of the L.sup.th hidden layer corresponding to a weighting w.sub.x and the node O.sub.i of the output layer. Therefore, when updating the weighting w.sub.x of the path between the (L−1).sup.th hidden layer and the (L−2).sup.th hidden layer, the derivative ∂L/∂w.sub.x of the loss function L with respect to w.sub.x can be expressed as:

    $$\frac{\partial L}{\partial w_x}=\sum_{i=1}^{k}\left[\Delta_{H_{Li}}w_{xi}\right]D(\mathrm{Act\_fn})_{H_{(L-1)p}}\,O_{H_{(L-2)q}},\qquad\text{EQUATION (10)}$$

    wherein D(Act_fn).sub.H.sub.(L−1)p is the derivative function of the activation function of the node H.sub.(L−1)p of the (L−1).sup.th hidden layer, k is the number of nodes of the L.sup.th hidden layer, and O.sub.H.sub.(L−2)q is the output value of the node H.sub.(L−2)q corresponding to the weighting w.sub.x (i.e. the node H.sub.(L−2)q of the (L−2).sup.th hidden layer connected to the node H.sub.(L−1)p of the (L−1).sup.th hidden layer). By using the stored differential terms of all the nodes of the L.sup.th hidden layer, calculating the derivative ∂L/∂w.sub.x of the loss function L with respect to w.sub.x when updating the weighting w.sub.x of the path between the (L−1).sup.th hidden layer and the (L−2).sup.th hidden layer requires (2N.sub.HL+2) memory accesses to obtain the needed values, wherein N.sub.HL is the number of nodes of the L.sup.th hidden layer. When updating all the weightings of the paths between the (L−1).sup.th hidden layer and the (L−2).sup.th hidden layer, the memory needs to be accessed M.sub.L(L−1)=(2N.sub.HL+2)N.sub.H(L−1)N.sub.H(L−2) times, wherein N.sub.H(L−1) is the number of nodes of the (L−1).sup.th hidden layer, and N.sub.H(L−2) is the number of nodes of the (L−2).sup.th hidden layer. Compared to the related art, a total of 3N.sub.OLN.sub.HLN.sub.H(L−1)N.sub.H(L−2) memory accesses are saved during this update.
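
    The reuse behind EQUATION (9) can be sketched as below: the differential terms of the L.sup.th hidden layer are formed once from the stored output-layer terms and then kept for the next update. The function name, the row-per-hidden-node weight layout and the numeric values are assumptions made for illustration; the sum is transcribed from EQUATION (9) as written.

```python
from typing import List


def hidden_differential_terms(delta_next: List[float],
                              weights: List[List[float]]) -> List[float]:
    """Equation (9): Delta_H[p] = sum_i delta_next[i] * weights[p][i].

    weights[p][i] is the weighting of the path from node p of the current
    hidden layer to node i of the next layer (assumed layout).
    """
    for row in weights:
        if len(row) != len(delta_next):
            raise ValueError("each weight row needs one entry per next-layer node")
    return [sum(d * w for d, w in zip(delta_next, row)) for row in weights]


delta_O = [0.5, -1.0]                                 # stored output-layer terms
w_out = [[1.0, 2.0], [0.0, 4.0], [-2.0, 0.5]]         # 3 hidden nodes x 2 output nodes
delta_HL = hidden_differential_terms(delta_O, w_out)  # stored for the next update
```

    The same function can be applied to EQUATION (11) by passing the stored terms of the (j+1).sup.th hidden layer as `delta_next`.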

    [0038] According to the above descriptions, when updating the weightings of the paths between the j.sup.th hidden layer and the (j−1).sup.th hidden layer, the differential terms of all the nodes of the j.sup.th hidden layer are stored in the memory. Each of the differential terms of all the nodes of the j.sup.th hidden layer can be expressed as:


    Δ.sub.H.sub.ji=(Σ.sub.i=1.sup.n′[Δ.sub.H.sub.(j+1)iw.sub.xi]),  EQUATION(11);

    wherein n′ is the number of nodes of the j.sup.th hidden layer, and w.sub.xi is a weighting of a path between the node H.sub.ji of the j.sup.th hidden layer corresponding to a weighting w.sub.x and the node H.sub.(j+1)i of the (j+1).sup.th hidden layer. Therefore, when updating the weighting w.sub.x of the path between the (j−1).sup.th hidden layer and the (j−2).sup.th hidden layer, the derivative ∂L/∂w.sub.x of the loss function L with respect to w.sub.x can be expressed as:

    $$\frac{\partial L}{\partial w_x}=\sum_{i=1}^{k'}\left[\Delta_{H_{ji}}w_{xi}\right]D(\mathrm{Act\_fn})_{H_{(j-1)p}}\,O_{H_{(j-2)q}},\qquad\text{EQUATION (12)}$$

    wherein D(Act_fn).sub.H.sub.(j−1)p is the derivative function of the activation function of the node H.sub.(j−1)p of the (j−1).sup.th hidden layer, k′ is the number of nodes of the j.sup.th hidden layer, and O.sub.H.sub.(j−2)q is the output value of the node H.sub.(j−2)q corresponding to the weighting w.sub.x (i.e. the node H.sub.(j−2)q of the (j−2).sup.th hidden layer connected to the node H.sub.(j−1)p of the (j−1).sup.th hidden layer). By using the stored differential terms of all the nodes of the j.sup.th hidden layer, calculating the derivative ∂L/∂w.sub.x of the loss function L with respect to w.sub.x when updating the weighting w.sub.x of the path between the (j−1).sup.th hidden layer and the (j−2).sup.th hidden layer requires (2N.sub.Hj+2) memory accesses to obtain the needed values, wherein N.sub.Hj is the number of nodes of the j.sup.th hidden layer. When updating all the weightings of the paths between the (j−1).sup.th hidden layer and the (j−2).sup.th hidden layer, the memory needs to be accessed M.sub.L(j−1)=(2N.sub.Hj+2)N.sub.H(j−1)N.sub.H(j−2) times, wherein N.sub.H(j−1) is the number of nodes of the (j−1).sup.th hidden layer, and N.sub.H(j−2) is the number of nodes of the (j−2).sup.th hidden layer.
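
    The access count M.sub.L(j−1)=(2N.sub.Hj+2)N.sub.H(j−1)N.sub.H(j−2) is simple arithmetic, sketched here for hypothetical layer sizes. The function name, the parameter names and the example sizes are illustrative assumptions.

```python
def accesses_for_update(n_upper: int, n_current: int, n_lower: int) -> int:
    """Memory accesses to update all weightings between two adjacent layers.

    n_upper   -- nodes of the layer whose differential terms are stored (N_Hj)
    n_current -- nodes of the layer on the output side of the paths (N_H(j-1))
    n_lower   -- nodes of the layer on the input side of the paths (N_H(j-2))
    """
    if min(n_upper, n_current, n_lower) < 1:
        raise ValueError("every layer needs at least one node")
    per_weighting = 2 * n_upper + 2   # per-weighting factor (2N_Hj + 2)
    return per_weighting * n_current * n_lower


# Hypothetical sizes: N_Hj = 16, N_H(j-1) = 32, N_H(j-2) = 64.
m = accesses_for_update(16, 32, 64)
```

    For these sizes the result is 34·32·64 = 69632 accesses.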

    [0039] When updating the weighting w.sub.x of the path between the 1.sup.st hidden layer and the input layer, the derivative ∂L/∂w.sub.x of the loss function L with respect to w.sub.x can be expressed as:

    $$\frac{\partial L}{\partial w_x}=\sum_{i=1}^{k''}\left[\Delta_{H_{2i}}w_{xi}\right]D(\mathrm{Act\_fn})_{H_{1p}}\,O_{I_q},\qquad\text{EQUATION (13)}$$

    wherein D(Act_fn).sub.H.sub.1p is the derivative function of the activation function of the node H.sub.1p of the 1.sup.st hidden layer, k″ is the number of nodes of the 2.sup.nd hidden layer, and O.sub.I.sub.q is the output value of the node I.sub.q corresponding to the weighting w.sub.x (i.e. the node I.sub.q of the input layer connected to the node H.sub.1p of the 1.sup.st hidden layer). By using the stored differential terms of all the nodes of the 2.sup.nd hidden layer, calculating the derivative ∂L/∂w.sub.x of the loss function L with respect to w.sub.x when updating the weighting w.sub.x of the path between the 1.sup.st hidden layer and the input layer requires (2N.sub.H2+2) memory accesses to obtain the needed values, wherein N.sub.H2 is the number of nodes of the 2.sup.nd hidden layer. When updating all the weightings of the paths between the 1.sup.st hidden layer and the input layer, the memory needs to be accessed M.sub.L1=(2N.sub.H2+2)N.sub.H1N.sub.IL times, wherein N.sub.H1 is the number of nodes of the 1.sup.st hidden layer, and N.sub.IL is the number of nodes of the input layer.

    [0040] Note that when updating the weightings of the paths between the first hidden layer and the input layer, the differential terms of the first hidden layer will not be used later, so there is no need to access the memory to store them. In addition, the above-mentioned memory access method requires additional memory space to record the differential terms Δ.sub.O.sub.i and Δ.sub.H.sub.ji, but the increase is small: storage is added for only (N.sub.OL+N.sub.HL+ . . . +N.sub.H2) differential terms.
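
    The additional storage can be tallied by counting one differential term per node of the output layer and of each hidden layer from the L.sup.th down to the 2.sup.nd, since the terms of the 1.sup.st hidden layer are not stored. The sketch below follows that reading; the function name and the example network are hypothetical.

```python
from typing import List


def extra_terms_stored(hidden_sizes: List[int], n_output: int) -> int:
    """Number of differential terms kept in memory during one backward pass.

    hidden_sizes[0] is the 1st hidden layer, hidden_sizes[-1] the L-th.
    The 1st hidden layer is skipped because its terms are never reused.
    """
    return n_output + sum(hidden_sizes[1:])


# Hypothetical network: hidden layers of 64, 32 and 16 nodes, 10 output nodes.
extra = extra_terms_stored([64, 32, 16], 10)
```

    For this hypothetical network, 10+16+32 = 58 differential terms are stored.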

    [0041] Further, please refer to FIG. 5. The neural network or the full connection layer is composed of an input layer, L hidden layers and an output layer; therefore, steps S5_1 to S5_(L+1) are executed. At step S5_1, the weightings of the paths between the output layer and the L.sup.th hidden layer of the L hidden layers are updated, and the differential terms of all nodes of the output layer are stored in the memory. Then, at step S5_2, the weightings of the paths between the L.sup.th hidden layer and the (L−1).sup.th hidden layer are updated, and the differential terms of all nodes of the L.sup.th hidden layer are stored in the memory; during this update, the memory is accessed and the differential terms of all nodes of the output layer are used. Next, at step S5_3, the weightings of the paths between the (L−1).sup.th hidden layer and the (L−2).sup.th hidden layer are updated, and the differential terms of all nodes of the (L−1).sup.th hidden layer are stored in the memory; during this update, the memory is accessed and the differential terms of all nodes of the L.sup.th hidden layer are used. Steps S5_4 to S5_L proceed in a similar manner. Last, at step S5_(L+1), the weightings of the paths between the input layer and the 1.sup.st hidden layer are updated; during this update, the memory is accessed and the differential terms of all nodes of the 2.sup.nd hidden layer are used. In addition, an embodiment of the present disclosure also provides a non-volatile storage medium for storing program codes of the above-mentioned memory access method.
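
    The order of steps S5_1 to S5_(L+1) can be sketched as a schedule that lists, for each step, which paths are updated, which stored differential terms are read and which are newly stored. The function names, the tuple layout and the layer labels are hypothetical; the sketch models only the ordering of FIG. 5, not the arithmetic of the updates.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str, Optional[str], Optional[str]]


def layer_name(index: int, num_hidden: int) -> str:
    """Name layer 0 as input, 1..L as hidden layers, L+1 as output."""
    if index == 0:
        return "input"
    if index == num_hidden + 1:
        return "output"
    return f"hidden {index}"


def backward_schedule(num_hidden: int) -> List[Step]:
    """List (step, updated paths, terms read, terms stored) for S5_1..S5_(L+1)."""
    if num_hidden < 1:
        raise ValueError("at least one hidden layer is required")
    steps: List[Step] = []
    for s in range(1, num_hidden + 2):
        upper = num_hidden + 2 - s          # layer on the output side of the paths
        lower = upper - 1                   # layer on the input side of the paths
        paths = f"{layer_name(upper, num_hidden)} <-> {layer_name(lower, num_hidden)}"
        # Terms stored by the previous step are read; the first step has none.
        read = layer_name(upper + 1, num_hidden) if s > 1 else None
        # The last step stores nothing: its terms would never be reused.
        store = layer_name(upper, num_hidden) if s <= num_hidden else None
        steps.append((f"S5_{s}", paths, read, store))
    return steps


schedule = backward_schedule(3)  # hypothetical network with L = 3 hidden layers
```

    For L = 3, the schedule has four steps; the first reads no stored terms and the last stores none, matching the note that the differential terms of the 1.sup.st hidden layer are not needed later.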

    [0042] In summary, the embodiments of the present disclosure provide a memory access method used when training a deep learning network, and a deep learning network device that uses the memory access method during training. Because the memory access method greatly reduces the number of memory accesses, training time and power consumption are reduced, and the lifetimes of the battery and the memory of the deep learning network device are prolonged. In particular, when battery power is limited, the deep learning network device can run longer.

    [0043] The above descriptions represent merely exemplary embodiments of the present disclosure and are not intended to limit the scope of the present disclosure. Various equivalent changes, alterations or modifications based on the claims of the present disclosure are all regarded as falling within the scope of the present disclosure.