Data Processing Method and Apparatus, and Related Device
20230162008 · 2023-05-25
Inventors
- Yongzhong Wang (Hangzhong, CN)
- Fuchun Wei (Hangzhou, CN)
- Wei Zhang (Shenzhen, CN)
- Xiaoxin Xu (Hangzhou, CN)
CPC classification
G06V10/77 (PHYSICS)
Abstract
A data processing method includes obtaining first data and second data, where the first data and the second data are adjacent sequence data, and a sequence of the first data is prior to a sequence of the second data; padding third data between the first data and the second data according to a preset rule to obtain fourth data, where the third data isolates the first data from the second data; and completing data processing on the fourth data using a convolutional neural network.
Claims
1. A method implemented by a computing device, the method comprising: obtaining first data and second data, wherein the first data and the second data are adjacent sequence data, and wherein a first sequence of the first data is prior to a second sequence of the second data; padding third data between the first data and the second data according to a preset rule to obtain fourth data, wherein the third data isolates the first data from the second data; and completing data processing on the fourth data using a convolutional neural network.
2. The method of claim 1, wherein each of the first data and the second data is one of image data, voice data, or a text sequence.
3. The method of claim 2, wherein padding the third data between the first data and the second data according to the preset rule to obtain the fourth data comprises padding the third data of h1 rows and c1 columns between a last column of the first data and a first column of the second data to obtain the fourth data, wherein the first data comprises h1 rows and w1 columns, the second data comprises h2 rows and w2 columns, and the fourth data comprises h1+2p1 rows and w1+c1+w2+2p1 columns, wherein values of h1 and h2 are the same, and wherein p1 is a padding size corresponding to a network layer to which the fourth data is to be input.
4. The method of claim 2, wherein padding the third data between the first data and the second data according to the preset rule to obtain the fourth data comprises padding the third data of r1 rows and w1 columns between a last row of the first data and a first row of the second data to obtain the fourth data, wherein the first data comprises h1 rows and w1 columns, the second data comprises h2 rows and w2 columns, and the fourth data comprises h1+r1+h2+2p1 rows and w1+2p1 columns, wherein values of w1 and w2 are the same, and wherein p1 is a padding size corresponding to a network layer to which the fourth data is to be input.
5. The method of claim 3, wherein padding the third data of h1 rows and c1 columns between the last column of the first data and the first column of the second data to obtain the fourth data comprises: determining a first column quantity c1 of the third data based on a second column quantity w1 of the first data and network parameters of the network layer, wherein the network parameters comprise a size k1 of a convolutional kernel or a pooling kernel, the padding size p1, and a stride size s1; obtaining a row quantity h1 of the third data; and padding the third data of h1 rows and c1 columns between the last column of the first data and the first column of the second data to obtain the fourth data.
6. The method of claim 5, wherein determining the first column quantity c1 of the third data based on the second column quantity w1 of the first data and the network parameters of the network layer comprises: determining, based on the second column quantity w1 of the first data, the size k1, the padding size p1, and the stride size s1, a third column quantity wo1 of fifth data output after the first data is input to the network layer; determining, based on the second column quantity w1, the padding size p1, the stride size s1, and the third column quantity wo1, a distance Δw between a first center of a last operation on the first data and a second center of a first operation on the second data in a horizontal direction when the convolutional kernel or the pooling kernel processes spliced data after the spliced data is obtained by padding sixth data of h1 rows and p1 columns between the last column of the first data and the first column of the second data; and determining the first column quantity c1 based on the padding size p1, the stride size s1, and the distance Δw.
7. The method of claim 4, wherein padding the third data of r1 rows and w1 columns between the last row of the first data and the first row of the second data to obtain the fourth data comprises: determining a first row quantity r1 of the third data based on a second row quantity h1 of the first data and network parameters of the network layer, wherein the network parameters comprise a size k1 of a convolutional kernel or a pooling kernel, the padding size p1, and a stride size s1; obtaining a column quantity w1 of the first data; and padding the third data of r1 rows and w1 columns between the last row of the first data and the first row of the second data to obtain the fourth data.
8. The method of claim 7, wherein determining the first row quantity r1 of the third data based on the second row quantity h1 of the first data and the network parameters of the network layer comprises: determining, based on the second row quantity h1 of the first data, the size k1, the padding size p1, and the stride size s1, a third row quantity ho1 of fifth data output after the first data is input to the network layer; determining, based on the second row quantity h1, the padding size p1, the stride size s1, and the third row quantity ho1, a distance Δh between a first center of a last operation on the first data and a second center of a first operation on the second data in a vertical direction when the convolutional kernel or the pooling kernel processes spliced data after the spliced data is obtained by padding sixth data of p1 rows and w1 columns between the last row of the first data and the first row of the second data; and determining the first row quantity r1 based on the padding size p1, the stride size s1, and the distance Δh.
9. The method of claim 7, wherein completing the data processing on the fourth data using the convolutional neural network comprises: inputting the fourth data into the network layer for processing to obtain sixth data, wherein the sixth data comprises seventh data, eighth data, and interference data, wherein the seventh data is obtained after the network layer processes the first data, wherein the eighth data is obtained after the network layer processes the second data, and wherein the interference data is ninth data between the last row of the seventh data and the first row of the eighth data; determining a third row quantity r2 of the interference data; deleting the interference data of r2 rows; determining a fourth row quantity r3 of tenth data padded between the last row of the seventh data and the first row of the eighth data; padding the tenth data of r3 rows between the last row of the seventh data and the first row of the eighth data to obtain eleventh data; and completing data processing on the eleventh data using the convolutional neural network.
10. A computing device comprising: a memory configured to store instructions; and a processor coupled to the memory and configured to execute the instructions to cause the computing device to: obtain first data and second data, wherein the first data and the second data are adjacent sequence data, and a first sequence of the first data is prior to a second sequence of the second data; pad third data between the first data and the second data according to a preset rule to obtain fourth data, wherein the third data isolates the first data from the second data; and complete data processing on the fourth data using a convolutional neural network.
11. The computing device of claim 10, wherein each of the first data and the second data is one of image data, voice data, or a text sequence.
12. The computing device of claim 10, wherein when padding the third data, the processor is further configured to execute the instructions to cause the computing device to pad the third data of h1 rows and c1 columns between a last column of the first data and a first column of the second data to obtain the fourth data, wherein the first data comprises h1 rows and w1 columns, the second data comprises h2 rows and w2 columns, the fourth data comprises h1+2p1 rows and w1+c1+w2+2p1 columns, and wherein values of h1 and h2 are the same, and p1 is a padding size corresponding to a network layer to which the fourth data is to be input.
13. The computing device of claim 10, wherein when padding the third data, the processor is further configured to execute the instructions to cause the computing device to pad the third data of r1 rows and w1 columns between a last row of the first data and a first row of the second data to obtain the fourth data, wherein the first data comprises h1 rows and w1 columns, the second data comprises h2 rows and w2 columns, and the fourth data comprises h1+r1+h2+2p1 rows and w1+2p1 columns, and wherein values of w1 and w2 are the same, and p1 is a padding size corresponding to a network layer to which the fourth data is to be input.
14. The computing device of claim 12, wherein when padding the third data of h1 rows and c1 columns between the last column of the first data and the first column of the second data, the processor is further configured to execute the instructions to cause the computing device to: determine a first column quantity c1 of the third data based on a second column quantity w1 of the first data and network parameters of the network layer, wherein the network parameters comprise a size k1 of a convolutional kernel or a pooling kernel, the padding size p1, and a stride size s1; obtain a row quantity h1 of the third data; and pad the third data of h1 rows and c1 columns between the last column of the first data and the first column of the second data to obtain the fourth data.
15. The computing device of claim 14, wherein when determining the first column quantity c1 of the third data, the processor is further configured to execute the instructions to cause the computing device to: determine, based on the second column quantity w1 of the first data, the size k1, the padding size p1, and the stride size s1, a third column quantity wo1 of fifth data output after the first data is input to the network layer; determine, based on the second column quantity w1, the padding size p1, the stride size s1, and the third column quantity wo1, a distance Δw between a first center of a last operation on the first data and a second center of a first operation on the second data in a horizontal direction when the convolutional kernel or the pooling kernel processes spliced data after the spliced data is obtained by padding sixth data of h1 rows and p1 columns between the last column of the first data and the first column of the second data; and determine the first column quantity c1 based on the padding size p1, the stride size s1, and the distance Δw.
16. The computing device of claim 13, wherein when padding the third data of r1 rows and w1 columns between the last row of the first data and the first row of the second data, the processor is further configured to execute the instructions to cause the computing device to: determine a first row quantity r1 of the third data based on a second row quantity h1 of the first data and network parameters of the network layer, wherein the network parameters comprise a size k1 of a convolutional kernel or a pooling kernel, the padding size p1, and a stride size s1; obtain a column quantity w1 of the first data; and pad the third data of r1 rows and w1 columns between the last row of the first data and the first row of the second data to obtain the fourth data.
17. The computing device of claim 16, wherein to determine the first row quantity r1 of the third data based on the second row quantity h1 of the first data and the network parameters of the network layer, the processor is further configured to execute the instructions to cause the computing device to: determine, based on the second row quantity h1 of the first data, the size k1, the padding size p1, and the stride size s1, a third row quantity ho1 of fifth data output after the first data is input to the network layer; determine, based on the second row quantity h1, the padding size p1, the stride size s1, and the third row quantity ho1, a distance Δh between a first center of a last operation on the first data and a second center of a first operation on the second data in a vertical direction when the convolutional kernel or the pooling kernel processes spliced data after the spliced data is obtained by padding sixth data of p1 rows and w1 columns between the last row of the first data and the first row of the second data; and determine the first row quantity r1 based on the padding size p1, the stride size s1, and the distance Δh.
18. The computing device of claim 16, wherein when completing the data processing on the fourth data using the convolutional neural network, the processor is further configured to execute the instructions to cause the computing device to: input the fourth data into the network layer for processing to obtain sixth data, wherein the sixth data comprises seventh data, eighth data, and interference data, and wherein the seventh data is obtained after the network layer processes the first data, the eighth data is obtained after the network layer processes the second data, and the interference data is ninth data between the last row of the seventh data and the first row of the eighth data; determine a third row quantity r2 of the interference data; delete the interference data of r2 rows; determine a fourth row quantity r3 of tenth data padded between the last row of the seventh data and the first row of the eighth data; pad the tenth data of r3 rows between the last row of the seventh data and the first row of the eighth data to obtain eleventh data; and complete the data processing on the eleventh data using the convolutional neural network.
19. A computer program product comprising computer-executable instructions stored on a non-transitory computer-readable storage medium, the computer-executable instructions when executed by a processor of a computing device, cause the computing device to: obtain first data and second data, wherein the first data and the second data are adjacent sequence data, and a first sequence of the first data is prior to a second sequence of the second data; pad third data between the first data and the second data according to a preset rule to obtain fourth data, wherein the third data isolates the first data from the second data; and complete data processing on the fourth data using a convolutional neural network.
20. The computer program product of claim 19, wherein each of the first data and the second data is one of image data, voice data, or a text sequence.
Description
DESCRIPTION OF EMBODIMENTS
[0037] A data processing apparatus and method provided in the present disclosure are described in detail below with reference to the accompanying drawings.
[0038] A convolutional neural network (CNN) is a deep learning model that is typically used to analyze data such as images.
[0039] In the fields of image processing and natural language processing, a plurality of channels of data usually need to be processed simultaneously. In an example of image processing, a video surveillance system may need to simultaneously process images captured by a plurality of surveillance devices.
[0040] An embodiment of the present disclosure provides a data processing method. Two or more images are adaptively spliced to obtain a large image, and then the spliced image is input to a convolutional neural network for forward inference, to fully utilize a bandwidth and a computing capability of an intelligent processing apparatus, and improve data processing efficiency of the intelligent processing apparatus.
[0041] It should be noted that in a process of processing an image by the convolutional neural network, output results of a convolutional layer and a pooling layer are both referred to as feature maps. The image is input to the convolutional neural network in the form of a pixel matrix. For example, if the image is a grayscale image, a two-dimensional pixel matrix is input, or if the image is a color image, a three-dimensional pixel matrix (or referred to as a tensor) is input. After a pixel matrix corresponding to one image is input to a first convolutional layer of the convolutional neural network, a feature map corresponding to the image is obtained through convolutional processing. Therefore, a pixel matrix corresponding to an image and a feature map corresponding to the image are both in matrix form. In this embodiment of the present disclosure, for ease of description, a pixel matrix input to the first convolutional layer of the convolutional neural network is also referred to as a feature map, and each value in the feature map is referred to as an element.
[0042] When feature maps corresponding to images are spliced, horizontal splicing or vertical splicing may be performed.
[0043] In this embodiment of the present disclosure, an example in which the input data is images and two images are horizontally spliced before being input to a convolutional layer is used to describe the data processing method.
[0044] S501: Obtain first data and second data.
[0045] The first data is a first feature map corresponding to a first image, and the second data is a second feature map corresponding to a second image. The first data and the second data are two pieces of data of adjacent sequences to be spliced. The first data includes elements of h1 rows and w1 columns, and the second data includes elements of h2 rows and w2 columns. When the first data and the second data are horizontally spliced, the values of h1 and h2 are the same.
[0046] S502: Pad third data between the first data and the second data according to a preset rule to obtain fourth data.
[0047] The third data isolates the first data from the second data, and the fourth data includes the first data, the second data, and the third data. A method for padding the third data between the first feature map and the second feature map to obtain a third feature map is described in detail using an example in which the first feature map and the second feature map are horizontally spliced, and elements of the first feature map are located on a left side of elements of the second feature map after splicing.
[0048] After obtaining the first feature map and the second feature map, an intelligent processing apparatus determines, based on a column quantity w1 of the first feature map and convolutional parameters of a first convolutional layer to which the first feature map and the second feature map are to be input, a column quantity of the third data to be padded between the first feature map and the second feature map. The convolutional parameters include a convolutional kernel size corresponding to a convolutional layer, a padding size corresponding to the convolutional layer, a stride size of a convolutional kernel, and a dilation rate. For example, when the convolutional kernel is 3×3, the convolutional kernel size is 3.
[0049] With reference to (formula 1), the intelligent processing apparatus can determine, based on the convolutional parameters of the first convolutional layer to which the first feature map and the second feature map are to be input and the column quantity w1 of the first feature map, a column quantity wo1 of fifth data output after the first feature map is separately input to the first convolutional layer for a convolutional operation:
wo1=ceil((w1+2p1−d1*(k1−1))/s1) (Formula 1)
[0050] In formula 1, the ceil operation returns the minimum integer that is greater than or equal to the specified expression, p1 is the padding size corresponding to the first convolutional layer, k1 is the convolutional kernel size corresponding to the first convolutional layer, s1 is the stride size of a convolutional kernel corresponding to the first convolutional layer, and d1 is the dilation rate. For example, if the column quantity w1 of the first feature map is 4, the padding size p1 is 2, the convolutional kernel size k1 is 5, the stride size s1 is 1, and the dilation rate d1 is 1, the column quantity wo1 of the output feature map is 4.
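For illustration only, the following minimal Python sketch implements formula 1 as reconstructed above and checks it against the worked example (the function name conv_output_size is ours, not from the disclosure):

```python
import math

def conv_output_size(w, k, p, s, d=1):
    # Formula 1 (as reconstructed): wo = ceil((w + 2p - d*(k - 1)) / s)
    return math.ceil((w + 2 * p - d * (k - 1)) / s)

# Worked example from the text: w1=4, k1=5, p1=2, s1=1, d1=1 -> wo1 = 4
assert conv_output_size(4, 5, 2, 1, 1) == 4
```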
[0051] After obtaining, according to the foregoing (formula 1), the column quantity wo1 of the output feature map produced when the first feature map is separately input to the first convolutional layer for the convolutional operation, the intelligent processing apparatus may determine a distance Δw based on the column quantity w1 of the first feature map, the padding size p1 of the first convolutional layer, the column quantity wo1 of the output feature map, and the stride size s1 of the convolutional kernel. The distance Δw is the distance, in the first horizontal movement of the convolutional kernel, between the center of the last operation on the first feature map by the convolutional kernel and the center of the first operation on the second feature map by the convolutional kernel in the following process: elements of p1 columns are padded between the first feature map and the second feature map to obtain a spliced feature map, the spliced feature map is input to the first convolutional layer, and the convolutional kernel performs a convolutional operation on the spliced feature map. Herein, the spliced feature map indicates a feature map obtained after the elements of p1 columns are padded between the first feature map and the second feature map to splice them; in addition, elements of p1 columns are padded on each of a left side and a right side of the spliced feature map, and elements of p1 rows are padded on each of an upper side and a lower side of the spliced feature map. The intelligent processing apparatus calculates the distance Δw using the following (formula 2):
Δw=w1+p1−wo1*s1 (Formula 2)
[0052] After the distance Δw is obtained, to ensure that the distance between the center of the last operation on the first feature map by the convolutional kernel and the center of the first operation on the second feature map by the convolutional kernel in the first horizontal movement of the convolutional kernel is an integer multiple of the stride size s1 of the convolutional kernel, the intelligent processing apparatus calculates, according to (formula 3) based on the distance Δw, the padding size p1, and the stride size s1, the column quantity c1 of the third data to be finally padded between the first feature map and the second feature map:
c1=p1+ceil(Δw/s1)*s1−Δw (Formula 3)
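As a hedged illustration, the following sketch chains formulas 1 to 3 (formula 3 as reconstructed from the alignment condition above; the function name isolation_cols is ours):

```python
import math

def isolation_cols(w1, k1, p1, s1, d1=1):
    # Formula 1: output width when the first feature map is processed alone.
    wo1 = math.ceil((w1 + 2 * p1 - d1 * (k1 - 1)) / s1)
    # Formula 2: horizontal center distance when p1 columns are padded
    # between the two feature maps.
    dw = w1 + p1 - wo1 * s1
    # Formula 3 (reconstructed): smallest c1 >= p1 that makes the distance
    # an integer multiple of the stride s1.
    return p1 + math.ceil(dw / s1) * s1 - dw

print(isolation_cols(4, 5, 2, 1))  # 2 for the worked example above
```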
[0053] After the column quantity c1 of the third data is obtained through calculation, when the first feature map and the second feature map are spliced, elements of h1 rows and c1 columns (that is, the third data) are padded between the last column of the first feature map and the first column of the second feature map, the elements of p1 columns are padded on each of the left side and the right side of the spliced feature map, and the elements of p1 rows are padded on each of the upper side and the lower side of the spliced feature map, to obtain the third feature map (that is, the fourth data). In other words, after the third data is padded between the first feature map and the second feature map to obtain the spliced feature map, values of the row quantity and the column quantity of the elements that are padded on the upper side, the lower side, the left side, and the right side of the spliced feature map are the padding size p1 of the first convolutional layer. The third feature map includes elements of h1+2p1 rows and w1+c1+w2+2p1 columns, where values of the padded elements are 0s.
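The construction in S502 can be pictured with a short NumPy sketch (illustrative only; splice_horizontal is our name, and zero-valued padding elements are assumed, as stated above):

```python
import numpy as np

def splice_horizontal(first, second, c1, p1):
    # Pad the third data (h1 rows, c1 zero columns) between the last column
    # of the first feature map and the first column of the second one ...
    gap = np.zeros((first.shape[0], c1))
    spliced = np.hstack([first, gap, second])
    # ... then pad p1 zero rows/columns on all four sides, yielding the
    # fourth data of (h1 + 2*p1) rows and (w1 + c1 + w2 + 2*p1) columns.
    return np.pad(spliced, p1)

first, second = np.ones((4, 4)), 2 * np.ones((4, 4))
print(splice_horizontal(first, second, c1=2, p1=2).shape)  # (8, 14)
```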
[0055] S503: Complete data processing on the fourth data using a convolutional neural network.
[0056] After the elements of c1 columns are padded between the first feature map and the second feature map, and the first feature map and the second feature map are spliced to obtain the third feature map, the third feature map is input to the first convolutional layer for convolutional processing to obtain sixth data (that is, a fourth feature map). The fourth feature map includes seventh data (that is, a fifth feature map), eighth data (that is, a sixth feature map), and interference data. The fifth feature map is a feature map obtained after the first feature map is separately input to the first convolutional layer for convolutional processing. The sixth feature map is a feature map obtained after the second feature map is separately input to the first convolutional layer for convolutional processing. That is, the fifth feature map is a feature extracted from the first feature map, and the sixth feature map is a feature extracted from the second feature map. The interference data is elements between the last column of the fifth feature map and the first column of the sixth feature map.
[0057] Splicing feature maps corresponding to two images into one feature map including more elements, and inputting the one feature map obtained after splicing to a convolutional neural network model for processing can avoid separately performing operations such as vectorization and fractal on each image in a process of simultaneously processing a plurality of images by the convolutional neural network. This improves efficiency of data processing performed by the intelligent processing apparatus using the convolutional neural network, and fully utilizes a bandwidth of the intelligent processing apparatus.
[0058] In addition, if no element used for isolation is padded between the first feature map and the second feature map, and the first feature map and the second feature map are directly spliced to obtain a spliced feature map, then in an output feature map obtained by performing a convolutional operation on the spliced feature map at a convolutional layer, a receptive field corresponding to one element may include elements of both feature maps.
[0059] In this embodiment of the present disclosure, by padding elements used to isolate two feature maps between the two feature maps, a result obtained after a spliced feature map passes through a convolutional layer of the convolutional neural network includes a result obtained when each feature map is separately processed. This avoids the following problem: after two feature maps are directly spliced for convolutional processing, receptive fields corresponding to some elements in the output feature map include elements of the two feature maps, and consequently final detection and recognition results of the convolutional neural network are inaccurate.
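The isolation effect can be demonstrated with a small self-contained experiment (illustrative only: a naive sliding-window correlation stands in for the convolutional layer, and 4×4 maps, a 3×3 all-ones kernel, padding 1, stride 1, and dilation 1 are assumed, for which formulas 1 to 3 as reconstructed above give c1=1):

```python
import numpy as np

def conv2d(x, k):
    # Naive "valid" sliding-window correlation with stride 1.
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
a, b = rng.random((4, 4)), rng.random((4, 4))
k = np.ones((3, 3))

# Reference: each feature map processed separately with padding 1.
sep = np.hstack([conv2d(np.pad(a, 1), k), conv2d(np.pad(b, 1), k)])

# Direct splicing: output columns near the seam mix elements of both maps,
# so they generally differ from the separate results.
direct = conv2d(np.pad(np.hstack([a, b]), 1), k)

# Splicing with one isolation column (c1=1): after the single interference
# column is discarded, the separate results are reproduced exactly.
iso = conv2d(np.pad(np.hstack([a, np.zeros((4, 1)), b]), 1), k)
assert np.allclose(np.hstack([iso[:, :4], iso[:, 5:]]), sep)
```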
[0060] Because the convolutional neural network includes a plurality of convolutional layers and a plurality of pooling layers, after the third feature map obtained by splicing is input to the first convolutional layer for convolutional processing to obtain the fourth feature map, the fourth feature map is further input to a next network layer (a convolutional layer or a pooling layer) of the convolutional neural network for convolutional processing or pooling processing. The fourth feature map includes one or more columns of interference elements used to isolate the fifth feature map from the sixth feature map, and values of the interference elements are not all 0s. Therefore, before the fourth feature map is input to the next network layer of the convolutional neural network, a column quantity c2 of the interference elements needs to be determined, and the interference elements of the c2 columns need to be deleted from the fourth feature map. Then, a column quantity c3 of ninth data that needs to be padded between the last column of the fifth feature map and the first column of the sixth feature map is calculated based on the convolutional parameters or pooling parameters of the next network layer with reference to the foregoing (formula 1) to (formula 3). In addition, the ninth data is padded between the last column of the fifth feature map and the first column of the sixth feature map to obtain tenth data (that is, a seventh feature map), and finally the seventh feature map is input to the next network layer.
[0061] The intelligent processing apparatus determines, based on the column quantity of the third feature map and the convolutional parameters of the first convolutional layer, a column quantity wo2 of the fourth feature map output by the first convolutional layer, and then determines, based on the column quantities of the output feature maps produced when the first feature map and the second feature map are separately input to the first convolutional layer for convolutional operations, a column quantity c2 of elements used for isolation in the fourth feature map. Specifically, the intelligent processing apparatus determines the column quantity wo2 of the fourth feature map according to (formula 4):
wo2=ceil((w1+c1+w2+2p1−d1*(k1−1))/s1) (Formula 4)
[0062] Herein, w2 is the column quantity of the second feature map. After the column quantity of the fourth feature map is obtained, the column quantity c2 of the interference elements in the fourth feature map is determined according to (formula 5):
c2=wo2−wo1−wo3 (Formula 5)
[0063] Herein, wo1 is the column quantity of the output feature map produced when the first feature map is separately input to the first convolutional layer for the convolutional operation, and wo3 is the column quantity of the output feature map produced when the second feature map is separately input to the first convolutional layer for the convolutional operation.
[0064] After the column quantity c2 of the interference elements is determined, the elements of c2 columns are deleted starting from the (wo1+1)th column of the fourth feature map. Then, according to the same method as in S502, the column quantity c3 of the ninth data to be padded between the last column of the fifth feature map and the first column of the sixth feature map is calculated, and the ninth data of c3 columns is padded between the last column of the fifth feature map and the first column of the sixth feature map to obtain a spliced feature map. In addition, elements of p2 rows are padded on each of an upper side and a lower side of the spliced feature map, and elements of p2 columns are padded on each of a left side and a right side of the spliced feature map, to obtain the tenth data (that is, the seventh feature map), and the seventh feature map is input to a next network layer of the convolutional neural network. Herein, p2 is a padding size corresponding to the next network layer of the convolutional neural network.
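A compact sketch of this inter-layer step (illustrative; relayer is our name, and c2 and c3 are assumed to have been computed per formulas 1 to 5):

```python
import numpy as np

def relayer(fourth, wo1, c2, c3, p2):
    # Split off the fifth and sixth feature maps, discarding the c2
    # interference columns that start at column index wo1.
    fifth, sixth = fourth[:, :wo1], fourth[:, wo1 + c2:]
    # Re-pad c3 zero columns between them, then add the border padding p2
    # expected by the next convolutional or pooling layer.
    gap = np.zeros((fourth.shape[0], c3))
    return np.pad(np.hstack([fifth, gap, sixth]), p2)
```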
[0066] In a possible implementation, after the value of the column quantity c2 of the interference elements and the value of the column quantity c3 of the elements to be padded between the fifth feature map and the sixth feature map are determined, a column quantity of elements that need to be added or deleted between the fifth feature map and the sixth feature map may be determined based on the value of c2 and the value of c3.
[0067] If the value of c2 is the same as the value of c3, all values of the interference elements are replaced with 0s. Then, elements of p2 columns are padded on each of a left side and a right side of the fourth feature map, and elements of p2 rows are padded on each of an upper side and a lower side of the fourth feature map, to obtain the seventh feature map, and the seventh feature map is input to the next network layer of the convolutional neural network. Herein, p2 is a padding size corresponding to the next network layer of the convolutional neural network. If the value of c3 is less than the value of c2, elements of (c2−c3) columns in the interference elements are deleted starting from the (wo1+1)th column of the fourth feature map, the elements of the remaining c3 columns in the interference elements are retained, and the values of the retained elements of c3 columns are replaced with 0s. Then, the fourth feature map is padded with elements of p2 columns on each of the left side and the right side and elements of p2 rows on each of the upper side and the lower side, to obtain the seventh feature map, and the seventh feature map is input to the next network layer of the convolutional neural network. If the value of c3 is greater than the value of c2, elements of (c3−c2) columns whose values are 0s are added between the fifth feature map and the sixth feature map, and the values of the interference elements of c2 columns are replaced with 0s. Then, the fourth feature map is padded in the same manner to obtain the seventh feature map, and the seventh feature map is input to the next network layer of the convolutional neural network.
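The three branches can be summarized in a short sketch (illustrative; adjust_gap is our name, and the border padding of p2 rows and columns is applied separately, as described above):

```python
import numpy as np

def adjust_gap(fourth, wo1, c2, c3):
    # Replace the values of the c2 interference columns with 0s.
    fourth[:, wo1:wo1 + c2] = 0.0
    if c3 < c2:
        # Delete (c2 - c3) of the zeroed columns.
        keep = np.r_[0:wo1 + c3, wo1 + c2:fourth.shape[1]]
        return fourth[:, keep]
    if c3 > c2:
        # Add (c3 - c2) zero columns between the fifth and sixth maps.
        extra = np.zeros((fourth.shape[0], c3 - c2))
        return np.hstack([fourth[:, :wo1 + c2], extra, fourth[:, wo1 + c2:]])
    return fourth
```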
[0068] The foregoing describes how to determine the column quantity of the third data using the example in which the first feature map and the second feature map are horizontally spliced and, in the third feature map, the first feature map is located on the left side of the second feature map. If, in the third feature map, the second feature map is located on the left side of the first feature map, when the column quantity c1 of the third data is calculated, the column quantity in the foregoing (formula 1) and (formula 2) is replaced with the column quantity w2 of the second feature map; that is, in (formula 1) and (formula 2), the column quantity of the feature map processed first by the convolutional kernel is used for calculation.
[0069] It should be understood that, during splicing of the first feature map and the second feature map, the first feature map and the second feature map may alternatively be vertically spliced. When the first feature map and the second feature map are vertically spliced, the intelligent processing apparatus replaces the column quantity in each of the foregoing formulas with the row quantity of the corresponding feature map. Specifically, the intelligent processing apparatus first determines a row quantity r1 of third data based on a row quantity h1 of the first feature map and network parameters of a first network layer, and then pads the third data of r1 rows and w1 columns whose element values are 0s between the last row of the first feature map and the first row of the second feature map to obtain a spliced feature map. Then, the intelligent processing apparatus pads elements of p1 columns on each of a left side and a right side of the spliced feature map, and pads elements of p1 rows on each of an upper side and a lower side of the spliced feature map, to obtain a third feature map. The third feature map includes elements of h1+r1+h2+2p1 rows and w1+2p1 columns.
[0070] When the row quantity r1 of the third data is determined, the intelligent processing apparatus first determines, based on the row quantity h1 of the first feature map and a convolutional kernel size k1, a padding size p1, and a stride size s1 that correspond to the first convolutional layer, a row quantity ho1 of fifth data output after the first feature map is input to the first convolutional layer; then determines, based on the row quantity h1, the padding size p1, the stride size s1, and the row quantity ho1 of the fifth data, a distance Δh between a center of the last operation on the first feature map and a center of the first operation on the second feature map in a vertical direction when a convolutional kernel of the first convolutional layer processes spliced data after the spliced data is obtained by padding data of p1 rows and w1 columns between the last row of the first feature map and the first row of the second feature map; and finally determines the row quantity r1 of the third data based on the padding size p1, the stride size s1, and the distance Δh. That is, the method for calculating the row quantity of elements to be padded between the first feature map and the second feature map when they are vertically spliced is the same as the method for calculating the column quantity of elements to be padded when they are horizontally spliced, except that the column quantity of the first feature map in the foregoing (formula 1) to (formula 3) is replaced with the row quantity of the first feature map. For example, when the row quantity ho1 of the output feature map produced when the first feature map is separately input to the first convolutional layer for a convolutional operation is determined according to (formula 1), the column quantity w1 of the first feature map is replaced with the row quantity h1 of the first feature map.
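Under the same reconstruction of formulas 1 to 3, the vertical case is the horizontal computation with rows substituted for columns (illustrative sketch; isolation_rows is our name):

```python
import math

def isolation_rows(h1, k1, p1, s1, d1=1):
    # Formulas 1 to 3 with the row quantity h1 in place of the column
    # quantity w1, yielding the row quantity r1 of the third data.
    ho1 = math.ceil((h1 + 2 * p1 - d1 * (k1 - 1)) / s1)
    dh = h1 + p1 - ho1 * s1
    return p1 + math.ceil(dh / s1) * s1 - dh

print(isolation_rows(4, 5, 2, 1))  # 2, mirroring the horizontal example
```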
[0071] After obtaining the third feature map, the intelligent processing apparatus inputs the third feature map to the first convolutional layer to perform convolutional processing to obtain sixth data (that is, a fourth feature map). The fourth feature map includes seventh data (that is, a fifth feature map), eighth data (that is, a sixth feature map), and interference data. The fifth feature map is a feature map obtained after the first feature map is separately input to the first convolutional layer for convolutional processing. The sixth feature map is a feature map obtained after the second feature map is separately input to the first convolutional layer for convolutional processing. That is, the fifth feature map is a feature extracted from the first feature map, and the sixth feature map is a feature extracted from the second feature map. The interference data is elements between the last row of the fifth feature map and the first row of the sixth feature map.
[0072] Before the fourth feature map is input to a next network layer, the intelligent processing apparatus determines a row quantity ho2 of the fourth feature map based on the row quantity h1+r1+h2+2p1 of the third feature map and the network parameters of the first network layer; determines a row quantity r2 of the interference data based on the row quantity ho2 of the fourth feature map, a row quantity of the fifth feature map, and a row quantity of the sixth feature map, and deletes the interference data of r2 rows; determines a row quantity r3 of ninth data to be padded between the last row of the fifth feature map and the first row of the sixth feature map, and pads the ninth data of r3 rows between the last row of the fifth feature map and the first row of the sixth feature map, to obtain tenth data (that is, a seventh feature map); and finally inputs the seventh feature map to the next network layer.
[0073] It should be understood that the method for calculating the row quantity of the interference elements between the fifth feature map and the sixth feature map when the first feature map and the second feature map are vertically spliced is the same as the method for calculating the column quantity of the interference elements when the first feature map and the second feature map are horizontally spliced, except that the column quantity of each feature map in the foregoing (formula 4) and (formula 5) is replaced with the row quantity of the corresponding feature map. For example, when the row quantity ho2 of the fourth feature map output after the third feature map is input to the first convolutional layer for a convolutional operation is determined according to (formula 4), the column quantity w1 of the first feature map is replaced with the row quantity h1 of the first feature map, the column quantity w2 of the second feature map is replaced with the row quantity h2 of the second feature map, and the column quantity c1 is replaced with r1.
[0074] It should be understood that, when the first feature map is a pixel matrix corresponding to the first image, and the second feature map is a pixel matrix corresponding to the second image, that is, when the first feature map and the second feature map are input to the first convolutional layer, it is only necessary to determine a column quantity or a row quantity of elements to be padded between the first feature map and the second feature map.
[0075] In this embodiment of the present disclosure, when the row quantities of the first feature map and the second feature map are the same, and the column quantities of the first feature map and the second feature map are also the same, a fully connected layer may be replaced with a convolutional layer. A convolutional kernel size of the convolutional layer used to replace the fully connected layer is the size of a single feature map, a stride size of the convolutional kernel in a horizontal direction is equal to the column quantity of a single feature map, and a stride size of the convolutional kernel in a vertical direction is equal to the row quantity of a single feature map. Therefore, when a feature map output by a convolutional layer or a pooling layer needs to be input to a fully connected layer, only the interference elements of the feature map output by the last convolutional layer or the last pooling layer need to be determined, and after the interference elements are deleted, the different feature maps are directly spliced and input to the fully connected layer, with no need to pad elements used for isolation between the different feature maps. For example, when the fourth feature map is a feature map output by the last convolutional layer, only the column quantity c2 of the interference elements between the fifth feature map and the sixth feature map needs to be determined, and the interference elements of c2 columns are deleted from the fourth feature map. Then, the fifth feature map and the sixth feature map are directly spliced to obtain a seventh feature map, and the seventh feature map may be input to a fully connected layer.
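A minimal sketch of this replacement (illustrative; fc_as_conv is our name): a kernel the size of a single feature map, moved with strides equal to the map's dimensions, covers exactly one spliced-in map per step, so each map is reduced independently, as a per-map fully connected layer would do.

```python
import numpy as np

def fc_as_conv(spliced, kernel, map_h, map_w):
    # Kernel size equals a single feature map; strides equal the map's
    # row and column quantities, so each window covers exactly one map.
    oh, ow = spliced.shape[0] // map_h, spliced.shape[1] // map_w
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            win = spliced[i * map_h:(i + 1) * map_h, j * map_w:(j + 1) * map_w]
            out[i, j] = np.sum(win * kernel)
    return out

a, b = np.ones((4, 4)), 2 * np.ones((4, 4))
kernel = np.full((4, 4), 0.25)
# Interference already deleted; maps directly spliced side by side:
print(fc_as_conv(np.hstack([a, b]), kernel, 4, 4))  # [[4. 8.]]
```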
[0076] The foregoing uses an example in which the first feature map and the second feature map are input to a convolutional layer to describe how to determine the column quantity of elements to be padded between the first feature map and the second feature map, how to splice the first feature map and the second feature map, and how to determine the column quantity of elements to be added or deleted between the fifth feature map and the sixth feature map after the fourth feature map is obtained and before the fourth feature map is input to a next convolutional layer or pooling layer. If the first feature map and the second feature map are input to a pooling layer, the intelligent processing apparatus needs to obtain pooling parameters of the pooling layer to which the first feature map and the second feature map are to be input, and replaces the convolutional parameters in (formula 1) to (formula 5) with the pooling parameters for calculation. For example, the convolutional kernel size is replaced with a pooling kernel size of the pooling layer, the padding size of the convolutional layer is replaced with a padding size of the pooling layer, and the stride size of the convolutional kernel is replaced with a stride size of a pooling kernel of the pooling layer. When the pooling parameters are used for calculation, the value of the dilation rate d1 in the foregoing (formula 1) is set to 1.
[0077] The foregoing embodiment uses an example in which two feature maps are spliced to describe the data processing method provided in this embodiment of the present disclosure. It should be understood that the splicing method may be further applied to splicing of more than two feature maps. When a quantity of feature maps that need to be spliced is greater than or equal to 3, two feature maps may be spliced to obtain a new feature map according to the splicing method for two feature maps, and then the new feature map and a next feature map are spliced according to the same method, until all feature maps are spliced into one feature map. When the quantity of feature maps that need to be spliced is an even number greater than 2, half of the feature maps may alternatively be horizontally spliced according to the foregoing method to obtain a new feature map, the other half of the feature maps are horizontally spliced according to the same method to obtain another new feature map, and then the two new feature maps are vertically spliced to obtain a final feature map.
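An iterative pairwise splice can be sketched as follows (illustrative; equal-sized maps are assumed so the same isolation width c1 applies at every seam, and the border padding is applied once afterward, as in S502):

```python
import numpy as np

def splice_all_horizontal(maps, c1):
    # Splice equal-height feature maps left to right, padding c1 isolation
    # columns between each adjacent pair.
    out = maps[0]
    for m in maps[1:]:
        gap = np.zeros((out.shape[0], c1))
        out = np.hstack([out, gap, m])
    return out

maps = [np.full((4, 4), v) for v in (1.0, 2.0, 3.0)]
print(splice_all_horizontal(maps, c1=2).shape)  # (4, 16)
```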
[0078] In the foregoing embodiment, an example in which both the first data and the second data are image data is used to describe the data processing method provided in this embodiment of the present disclosure. It should be understood that the foregoing method may be further applied to processing of voice data or text sequences. For example, when the first data and the second data each are a segment of voice data, after receiving the first data and the second data, the intelligent processing apparatus first separately converts the voice data into text sequences, then converts each word in a text sequence into a word vector using a word embedding algorithm, and forms one matrix from the word vectors corresponding to the words in the segment of voice data according to a preset rule. The matrix is in the same form as the first feature map. Therefore, the intelligent processing apparatus may convert the two segments of voice data into matrices, splice the matrices corresponding to the two segments according to the same method as the foregoing method for feature maps, and input the spliced matrix to the convolutional neural network for processing.
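A toy sketch of the text branch (illustrative only; the embedding table and its 4-dimensional vectors are hypothetical stand-ins for a trained word-embedding model):

```python
import numpy as np

# Hypothetical word embeddings; a real system would use a trained model.
embedding = {"open": np.array([0.1, 0.0, 0.3, 0.2]),
             "the":  np.array([0.0, 0.2, 0.1, 0.0]),
             "door": np.array([0.4, 0.1, 0.0, 0.3])}

def text_to_matrix(words):
    # One row per word: the resulting matrix has the same form as a feature
    # map and can be spliced exactly like the image feature maps above.
    return np.stack([embedding[w] for w in words])

print(text_to_matrix(["open", "the", "door"]).shape)  # (3, 4)
```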
[0079] It should be noted that, the foregoing data processing method may be further applied to processing of different types of data. The first data may be any one of image data, voice data, or a text sequence, and the second data may be any one of image data, voice data, or a text sequence. This is not specifically limited in embodiments of the present disclosure.
[0080] It should be noted that, for brevity, the foregoing method embodiment is described as a series of action combinations. However, a person skilled in the art should understand that the present disclosure is not limited by the described action sequence, and that the related actions are not necessarily mandatory for the present disclosure.
[0081] Other appropriate step combinations that can be figured out by a person skilled in the art based on the content described above also fall within the protection scope of the present disclosure. In addition, a person skilled in the art should also understand that all embodiments described in this specification are preferred embodiments, and the related actions are not necessarily mandatory to the present disclosure.
[0082] The foregoing describes in detail the data processing method provided in embodiments of the present disclosure. The following describes a data processing apparatus and a computing device provided in embodiments of the present disclosure.
[0083] The data processing apparatus 100 includes an obtaining unit 101, a padding unit 102, and a processing unit 103.
[0084] The obtaining unit 101 is configured to obtain first data and second data, where the first data is any one of image data, voice data, or a text sequence, and the second data is any one of image data, voice data, or a text sequence. The first data and the second data are data that needs to be spliced in adjacent sequences. During splicing, a sequence of the first data is prior to a sequence of the second data. In other words, in a sequence obtained after splicing is completed, the first data is processed before the second data.
[0085] The padding unit 102 is configured to pad third data between the first data and the second data to obtain fourth data, where the third data isolates the first data from the second data.
[0086] The processing unit 103 is configured to complete data processing on the fourth data using a convolutional neural network. Because the convolutional neural network includes a plurality of convolutional layers and a plurality of pooling layers, processing the fourth data using the convolutional neural network means that the fourth data is input to a first network layer for processing. The first network layer may be a convolutional layer or may be a pooling layer.
[0087] It should be understood that the data processing apparatus 100 in this embodiment of the present disclosure may be implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof. Alternatively, when the foregoing data processing method is implemented using software, the data processing apparatus 100 and the modules thereof may be software modules.
[0088] It should be understood that, when the third data is padded between the first data and the second data to splice the first data and the second data into the fourth data, the first data and the second data may be horizontally or vertically spliced. For the manner of splicing the first data and the second data and the method for determining the row quantity and column quantity of the third data during splicing, refer to the specific descriptions in the foregoing method embodiment. Details are not described herein again.
[0089] In a possible implementation, the data processing apparatus 100 further includes a deletion unit 104.
[0090] The deletion unit 104 is configured to determine a column quantity or a row quantity of the interference data, and delete the interference data. For the method for determining the column quantity or the row quantity of the interference data by the deletion unit 104, refer to the method for determining the interference column quantity c2 and the interference row quantity r2 by the intelligent processing apparatus in the foregoing method embodiment. Details are not described herein again.
[0091] The padding unit 102 is further configured to: after the interference data is deleted, determine a column quantity or a row quantity of ninth data to be padded between the seventh data and the eighth data, and pad the ninth data between the seventh data and the eighth data to obtain tenth data.
[0092] The processing unit 103 is further configured to complete data processing on the tenth data using the convolutional neural network.
[0093] Specifically, for a data processing operation implemented by the foregoing data processing apparatus 100, refer to related operations of the intelligent processing apparatus in the foregoing method embodiment. Details are not described herein again.
[0094] The computing device 200 includes a processor 210, a communication interface 220, a memory 230, and a bus 240.
[0095] The computing device 200 obtains first data and second data, pads, according to a preset rule, third data used to isolate the first data from the second data between the first data and the second data, to obtain fourth data, and then completes processing on the fourth data using a convolutional neural network. The first data and the second data are data to be spliced together, and a sequence of the first data is prior to a sequence of the second data during splicing. The first data is any one of image data, voice data, or a text sequence, and the second data is any one of image data, voice data, or a text sequence.
[0096] In this embodiment of the present disclosure, the processor 210 may have a plurality of specific implementation forms. For example, the processor 210 may be any one or a combination of a plurality of processors such as a CPU, a GPU, a TPU, or an NPU, and the processor 210 may be a single-core processor or a multi-core processor. The processor 210 may alternatively include a combination of a CPU (or a GPU, a TPU, or an NPU) and a hardware chip. The hardware chip may be an ASIC, a PLD, or a combination thereof. The PLD may be a CPLD, an FPGA, a GAL, or any combination thereof. Alternatively, the processor 210 may be implemented independently using a logic device with embedded processing logic, for example, an FPGA or a digital signal processor (DSP).
[0097] The communication interface 220 may be a wired interface or a wireless interface, and is configured to communicate with another module or device, for example, to receive a video or an image sent by a surveillance device.
[0098] The memory 230 may be a non-volatile memory, for example, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The memory 230 may alternatively be a volatile memory, which may be a random access memory (RAM) that serves as an external cache.
[0099] The memory 230 may alternatively be configured to store instructions and data, so that the processor 210 invokes the instructions stored in the memory 230 to implement the operations performed by the processing unit 103 or the operations performed by the intelligent processing apparatus in the method embodiment. Further, the computing device 200 may include more or fewer components than those described above, or may have some components configured differently.
[0100] The bus 240 may be classified into an address bus, a data bus, a control bus, and the like. For ease of representation, the bus 240 is described as a single bus, but this does not mean that there is only one bus or only one type of bus.
[0101] Optionally, the computing device 200 may further include an input/output interface 250. The input/output interface 250 is connected to an input/output device, and is configured to receive input information and output an operation result.
[0102] It should be understood that the computing device 200 in this embodiment of the present disclosure may correspond to the data processing apparatus 100 in the foregoing embodiment, and may perform operations performed by the intelligent processing apparatus in the foregoing method embodiment. Details are not described herein again.
[0103] An embodiment of the present disclosure further provides a non-transitory computer storage medium. The computer storage medium stores instructions, and when the instructions are run on a processor, the method steps in the foregoing method embodiment may be implemented. For specific implementation of performing the foregoing method steps by the processor of the computer storage medium, refer to the specific operations of the foregoing method embodiment. Details are not described herein again.
[0104] In the foregoing embodiments, the description of each embodiment has respective focuses. For a part that is not described in detail in an embodiment, refer to related descriptions in other embodiments.
[0105] All or some of the foregoing embodiments may be implemented using software, hardware, firmware, or any combination thereof. When software is used to implement embodiments, the foregoing embodiments may be implemented completely or partially in a form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on the computer, the procedures or functions according to embodiments of the present disclosure are all or partially generated. The computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable apparatuses. The computer instructions may be stored in a computer-readable storage medium or may be transmitted from a computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or magnetic tape), an optical medium, or a semiconductor medium, and the semiconductor medium may be a solid state disk.
[0106] The foregoing descriptions are merely specific implementations of the present disclosure. Based on the specific implementations provided in the present disclosure, a person skilled in the art can figure out variations or replacements, which shall all fall within the protection scope of the present disclosure.