Inference calculation for neural networks with protection against memory errors

Abstract

A method for operating a hardware platform for the inference calculation of a layered neural network. In the method: a first portion of input data which are required for the inference calculation of a first layer of the neural network and redundancy information relating to the input data are read in from an external working memory into an internal working memory of a computing unit; the integrity of the input data is checked based on the redundancy information; in response to the input data being identified as error-free, the computing unit carries out at least part of the first-layer inference calculation for the input data to obtain a work result; redundancy information for the work result is determined, based on which the integrity of the work result can be verified; and the work result and the redundancy information are written to the external working memory.

Claims

1. A method for operating a hardware platform for an inference calculation of a layered neural network, the hardware platform including a computing unit with an internal working memory and an external working memory arranged outside the computing unit and connected to the computing unit, the method comprising the following steps: reading in a first portion of input data which are required for the inference calculation of a first layer of the neural network and redundancy information relating to said input data from the external working memory into the internal working memory of the computing unit; checking integrity of the input data based on the redundancy information; in response to the input data being identified as error-free, carrying out, by the computing unit, at least part of the first-layer inference calculation for the input data to obtain a work result; determining redundancy information for the work result, based on which an integrity of the work result can be verified; and writing the work result and the redundancy information to the external working memory, wherein the work result includes outputs from neurons of the first layer which each contain a weighted sum of inputs of the neurons of the first layer, wherein the inference calculation includes convolution of data with a plurality of convolutional kernels, and wherein the redundancy information is determined by convolving the data with a control kernel which is a sum of the convolutional kernels.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 shows an exemplary embodiment of the method 100, according to the present invention.

(2) FIGS. 2A and 2B show a breakdown of the inference calculation 130 without nonlinearity 132 (FIG. 2A) and with nonlinearity 132 (FIG. 2B), according to example embodiments of the present invention.

(3) FIG. 3 shows an exemplary embodiment of the hardware platform 1, according to the present invention.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

(4) FIG. 1 is a schematic sequence diagram of an exemplary embodiment of the method 100. According to step 105, those data types which are specifically the most important for the orientation of an at least partially self-driving vehicle in road traffic may be provided as the input data 11. FIG. 3 explains in greater detail the hardware platform 1 which is operated by the method 100.

(5) In step 110, input data 11 which are required for the inference calculation of a first layer of the neural network together with associated redundancy information 11a are read in from the external working memory 3. In step 120, the integrity of the input data 11 is checked on the basis of the redundancy information 11a. If said check is positive (truth value 1), i.e., the input data 11 have been identified as error-free, in step 130 the computing unit 2 carries out at least part of the first-layer inference calculation for the input data 11 in order to obtain a work result 12.

(6) In step 140, redundancy information 12a for the work result 12 is determined, on the basis of which the integrity of the work result 12 can be verified. Optionally, in step 145, it is additionally verified on the basis of said redundancy information 12a whether the work result 12 has been correctly calculated. If this is the case (truth value 1), in step 150 the work result 12 together with the redundancy information 12a is written to the external working memory 3.
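
Steps 110 to 150 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the external working memory 3 is modeled as a Python dict, the redundancy information is a CRC-32 checksum (chosen here only for brevity; the description derives it from a control kernel instead), and the names `make_redundancy`, `process_portion` and `layer_fn` are hypothetical.

```python
import struct
import zlib


def make_redundancy(values):
    """Return a CRC-32 over the packed values (stand-in redundancy information)."""
    return zlib.crc32(struct.pack(f"{len(values)}d", *values))


def process_portion(external_memory, in_key, out_key, layer_fn):
    """Run steps 110-150 for one portion; return True if the result was written."""
    # Step 110: read input data 11 and redundancy information 11a.
    data, redundancy = external_memory[in_key]
    # Step 120: integrity check on the input data.
    if make_redundancy(data) != redundancy:
        return False  # error handling (blocks 180/185) is outside this sketch
    # Step 130: at least part of the layer's inference calculation.
    result = layer_fn(data)
    # Step 140: redundancy information 12a for the work result 12.
    result_redundancy = make_redundancy(result)
    # Step 150: write work result and redundancy information back.
    external_memory[out_key] = (result, result_redundancy)
    return True


# Hypothetical usage: a "layer" that doubles every input value.
inputs = [1.0, 2.0, 3.0]
memory = {"in": (inputs, make_redundancy(inputs))}
ok = process_portion(memory, "in", "out", lambda xs: [2.0 * x for x in xs])

# Simulated memory error: the stored data no longer match their checksum.
memory["bad"] = ([1.0, 2.5, 3.0], make_redundancy(inputs))
rejected = not process_portion(memory, "bad", "out2", lambda xs: xs)
```

In this sketch the corrupted portion is rejected before any calculation takes place, so no work result derived from faulty input data reaches the external memory.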

(7) In the example shown in FIG. 1, processing of the neural network as a whole is organized such that it is checked in step 160 whether all the input data which are required for the first-layer inference calculation have already been processed. If this is not yet the case (truth value 0), the method branches back to step 110 and the next portion of input data 11 of said first layer is read in from the external working memory 3. If, however, the complete first layer has already been processed (truth value 1), the method switches over in step 170 to a second-layer inference calculation. In other words, the method branches back again to step 110 in order to read in portions of input data 11 from the external working memory 3. However, these are then input data 11 of the second layer. Once all the layers have been processed, the work result 12* of the neural network as a whole is output. Said work result 12* can be processed to yield a control signal 9 in step 220. According to step 230, said control signal 9 can then be used to control a vehicle 50, and/or a classification system 60, and/or a system 70 for the quality control of mass-produced products, and/or a system 80 for medical imaging, and/or an access control system 90.
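
The control flow of steps 160 and 170 amounts to two nested loops: an inner loop over the portions of one layer and an outer loop over the layers. The sketch below assumes, purely for illustration, that each layer acts element-wise so that portions can be processed independently; the integrity checks are omitted to keep the focus on the loop structure, and `run_network` and `portion_size` are hypothetical names.

```python
def run_network(layers, first_input, portion_size):
    """Process each layer portion by portion, buffering results between layers."""
    external = list(first_input)          # stands in for external working memory 3
    for layer_fn in layers:               # step 170: move on to the next layer
        next_external = []
        start = 0
        while start < len(external):      # step 160: portions still pending?
            portion = external[start:start + portion_size]   # step 110
            next_external.extend(layer_fn(portion))          # steps 130 to 150
            start += portion_size
        external = next_external          # work results become the next input data
    return external                       # work result 12* of the whole network


# Hypothetical two-layer network: add one, then double.
layers = [
    lambda portion: [x + 1 for x in portion],
    lambda portion: [2 * x for x in portion],
]
final_result = run_network(layers, [0, 1, 2, 3, 4], portion_size=2)
```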

(8) According to block 131, the first-layer inference calculation 130 may in particular comprise convolving data 13 with a plurality of convolutional kernels. According to block 132, said inference calculation 130 may also comprise at least one nonlinear calculation step in the inputs of the neurons. If said nonlinear calculation step according to block 133 is carried out only at a subsequent point in time, at which the work result 12 is again located in the internal working memory 2a of the computing unit 2 after having been read back in from the external working memory 3, this saves computing time and memory space. In this case, the work result 12 can be saved in the external working memory 3 in a state which is not yet contaminated by nonlinearity. The same redundancy information 12a may then be used both for the check 145 for correct calculation of the work result 12 and for the subsequent check 120 for correct storage and correct data transfer.

(9) According to block 141, the redundancy information 12a can be determined by convolution with a control kernel which is a sum of the stated convolutional kernels.
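
The control-kernel check relies on the linearity of convolution: convolving the data with the sum of the kernels gives the same output as summing the outputs of the individual convolutions. A minimal one-dimensional sketch follows. The helper names `conv1d_valid`, `control_kernel` and `check_outputs` and the tolerance are hypothetical, and integer data are used so that the comparison is exact.

```python
def conv1d_valid(data, kernel):
    """'Valid' sliding-window correlation, as commonly used in CNN layers."""
    n, k = len(data), len(kernel)
    return [sum(data[i + j] * kernel[j] for j in range(k)) for i in range(n - k + 1)]


def control_kernel(kernels):
    """Element-wise sum of equally sized kernels."""
    return [sum(ws) for ws in zip(*kernels)]


def check_outputs(outputs, control_output, tol=1e-9):
    """True if, at every position, the sum over all kernel outputs matches the control."""
    return all(
        abs(sum(column) - c) <= tol
        for column, c in zip(zip(*outputs), control_output)
    )


data = [1, 2, 3, 4, 5]
kernels = [[1, 0, -1], [2, 1, 0], [0, 1, 1]]

outputs = [conv1d_valid(data, k) for k in kernels]        # work result 12
control = conv1d_valid(data, control_kernel(kernels))     # redundancy information 12a

consistent = check_outputs(outputs, control)

# A single corrupted value in one output channel breaks the equality.
corrupted = [list(o) for o in outputs]
corrupted[1][0] += 1
detected = not check_outputs(corrupted, control)
```

The control output costs one additional convolution regardless of how many kernels the layer has, which is why it is cheap relative to recomputing every channel.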

(10) If an error is identified in one of the checks 120 or 145 (truth value 0 in the respective check), according to block 180 the error in the input data 11, or in the work result 12, can be corrected on the basis of the respective redundancy information 11a, 12a. Alternatively, the input data 11, or the work result 12, can be recalculated according to block 185.

(11) In the example shown in FIG. 1, an error counter for a memory area of the external working memory 3 or a hardware component which comes into consideration as the cause of the error can then additionally be incremented according to block 190. In step 195, it can then be checked whether said error counter has exceeded a specified threshold value. If this is the case (truth value 1), the memory area or the hardware component can be identified as defective according to block 200. According to block 210, the hardware platform 1 can then be reconfigured such that a standby memory area or a standby hardware component is respectively used for further calculations.
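
The bookkeeping of blocks 190 to 210 can be sketched as a per-resource counter with a threshold. The class name `ErrorMonitor`, the threshold value and the mapping to standby resources are illustrative assumptions; the description does not prescribe them.

```python
from collections import Counter


class ErrorMonitor:
    """Counts errors per memory area or component and swaps in standbys."""

    def __init__(self, threshold, standby):
        self.threshold = threshold
        self.standby = dict(standby)      # monitored resource -> its standby resource
        self.counts = Counter()
        self.active = {name: name for name in standby}  # resource currently in use

    def report_error(self, name):
        """Block 190: increment; step 195: compare; blocks 200/210: replace."""
        self.counts[name] += 1
        if self.counts[name] > self.threshold and self.active[name] == name:
            self.active[name] = self.standby[name]
            return True   # resource identified as defective and replaced
        return False


# Hypothetical memory area "bank0" with a spare; the third error exceeds the threshold.
monitor = ErrorMonitor(threshold=2, standby={"bank0": "bank0_spare"})
events = [monitor.report_error("bank0") for _ in range(3)]
```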

(12) If the input data 11, or the work result 12, have been corrected according to block 180 or recalculated according to block 185, the originally intended inference calculation 130, or the originally intended storage 150 in the external working memory 3, can be resumed.

(13) FIGS. 2A and 2B show two examples of how the inference calculation 130 can be organized.

(14) FIG. 2A shows a simple example in which the inference calculation 130, 130a comprises only convolutions 131 of the input data 11 to yield a work result 12. As explained above, the calculations which lead to the work result 12 are then linear in the inputs supplied to the neurons. The same redundancy information 12a may then be used both for the check 145 for correct calculation 130 and for the subsequent check 120 for correct saving and reading in.

(15) In this example, the data 13 which are convolved in block 131 are identical to the input data 11. The work result 12 corresponds to the complete result of the inference calculation 130. When the next layer is processed, said work result 12 is read back in from the external working memory 3 as input data 11, and the redundancy information 12a stored with the work result 12 is the redundancy information 11a with which these input data 11 are verified. The next convolution 131 gives rise to the next work result 12.

(16) FIG. 2B shows a further example in which the inference calculation 130, 130b contains a nonlinearity 132. In contrast with FIG. 2A, the data 13 which are supplied to the convolution 131 are now no longer identical to the input data 11, but are instead obtained from these input data 11 by application of the nonlinearity 132. The convolution 131 itself, however, again contains only linear operations, such that the redundancy information 12a can be put to dual use in a manner similar to FIG. 2A.

(17) In contrast with FIG. 2A, the inference calculation 130, 130b is not yet complete at the point in time at which the work result 12 is saved in the external working memory 3. Instead, it is taken to completion at a subsequent point in time when convolution 131 for the next layer is pending. In the intervening period, a linear intermediate product of the nonlinear inference calculation 130, 130b is therefore present in the external working memory 3.
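
The deferred nonlinearity can be sketched as follows. The linear convolution outputs are stored together with the control output; after read-back, the same control output verifies storage and transfer before the nonlinearity is applied. ReLU is assumed as the nonlinearity purely for illustration, and all function names are hypothetical.

```python
def conv1d_valid(data, kernel):
    """'Valid' sliding-window correlation of a 1-D signal with a kernel."""
    k = len(kernel)
    return [sum(data[i + j] * kernel[j] for j in range(k))
            for i in range(len(data) - k + 1)]


def relu(values):
    """Example nonlinearity 132 (an assumption; any nonlinearity could stand here)."""
    return [max(0, v) for v in values]


def store_linear_result(data, kernels):
    """Convolution 131 plus control output; no nonlinearity applied yet."""
    outputs = [conv1d_valid(data, k) for k in kernels]
    control = conv1d_valid(data, [sum(ws) for ws in zip(*kernels)])
    return outputs, control


def load_and_activate(outputs, control):
    """Check 120 on read-back, then nonlinearity 132 to form the data 13."""
    sums = [sum(column) for column in zip(*outputs)]
    if sums != control:
        raise ValueError("integrity check failed")
    return [relu(o) for o in outputs]


stored_outputs, stored_control = store_linear_result([3, -1, 2, 0], [[1, -1], [-2, 1]])
activated = load_and_activate(stored_outputs, stored_control)

# After the nonlinearity, the channel sums no longer match the control output,
# which is why the check has to run on the linear intermediate product.
sums_after = [sum(column) for column in zip(*activated)]
mismatch_after_relu = sums_after != stored_control
```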

(18) FIG. 3 shows an exemplary embodiment of the hardware platform 1. A computing unit 2 with an internal working memory 2a is linked via a communication link 4 to an external working memory 3. Work results 12 are buffered in said external working memory 3 so that they can subsequently be read back in as input data 11 for the inference calculation of new layers. The external working memory 3 and the communication link 4 are susceptible to transient errors 18 which can be identified and corrected using the method 100.

CPC classification

G06F11/141, G06F11/1476, G06N3/045

International classification

G06F11/00, G06F11/14
