Patent classifications
G06N3/063
Grouped convolution using point-to-point connected channel convolution engines
A processor system comprises a plurality of processing elements. Each processing element includes a corresponding convolution processor unit configured to perform a portion of a groupwise convolution. The corresponding convolution processor unit determines multiplication results by multiplying each data element of a portion of data elements in a convolution data matrix with a corresponding data element in a corresponding groupwise convolution weight matrix. The portion of data elements in the convolution data matrix that are multiplied belong to different channels and different groups. For each specific channel of the different channels, the corresponding convolution processor unit sums together at least some of the multiplication results belonging to the same specific channel to determine a corresponding channel convolution result data element. The processing elements sum together a portion of the channel convolution result data elements from a group of different convolution processor units to determine a groupwise convolution result data element.
Tensor dropout using a mask having a different ordering than the tensor
A method for selectively dropping out feature elements from a tensor in a neural network is disclosed. The method includes receiving a first tensor from a first layer of a neural network. The first tensor includes multiple feature elements arranged in a first order. A compressed mask for the first tensor is obtained. The compressed mask includes single-bit mask elements respectively corresponding to the multiple feature elements of the first tensor and has a second order that is different than the first order of their corresponding feature elements in the first tensor. Feature elements from the first tensor are selectively dropped out based on the compressed mask to form a second tensor which is propagated to a second layer of the neural network.
Tensor dropout using a mask having a different ordering than the tensor
A method for selectively dropping out feature elements from a tensor in a neural network is disclosed. The method includes receiving a first tensor from a first layer of a neural network. The first tensor includes multiple feature elements arranged in a first order. A compressed mask for the first tensor is obtained. The compressed mask includes single-bit mask elements respectively corresponding to the multiple feature elements of the first tensor and has a second order that is different than the first order of their corresponding feature elements in the first tensor. Feature elements from the first tensor are selectively dropped out based on the compressed mask to form a second tensor which is propagated to a second layer of the neural network.
Computing apparatus using convolutional neural network and method of operating the same
An apparatus and a method use a convolutional neural network (CNN) including a plurality of convolution layers in the field of artificial intelligence (AI) systems and applications thereof. A computing apparatus using a CNN including a plurality of convolution layers includes a memory storing one or more instructions; and one or more processors configured to execute the one or more instructions stored in the memory to obtain input data; identify a filter for performing a convolution operation with respect to the input data, on one of the plurality of convolution layers; identify a plurality of sub-filters corresponding to different filtering regions within the filter; provide a plurality of feature maps based on the plurality of sub-filters; and obtain output data, based on the plurality of feature maps.
System and method for convolutional layer structure for neural networks
An electronic device, method, and computer readable medium for 3D association of detected objects are provided. The electronic device includes a memory and at least one processor coupled to the memory. The at least one processor configured to convolve an input to a neural network with a basis kernel to generate a convolution result, scale the convolution result by a scalar to create a scaled convolution result, and combine the scaled convolution result with one or more of a plurality of scaled convolution results to generate an output feature map.
Method and system for processing neural network
The present disclosure provides a neural network processing system that comprises a multi-core processing module composed of a plurality of core processing modules and for executing vector multiplication and addition operations in a neural network operation, an on-chip storage medium, an on-chip address index module, and an ALU module for executing a non-linear operation not completable by the multi-core processing module according to input data acquired from the multi-core processing module or the on-chip storage medium, wherein the plurality of core processing modules share an on-chip storage medium and an ALU module, or the plurality of core processing modules have an independent on-chip storage medium and an ALU module. The present disclosure improves an operating speed of the neural network processing system, such that performance of the neural network processing system is higher and more efficient.
Method and system for processing neural network
The present disclosure provides a neural network processing system that comprises a multi-core processing module composed of a plurality of core processing modules and for executing vector multiplication and addition operations in a neural network operation, an on-chip storage medium, an on-chip address index module, and an ALU module for executing a non-linear operation not completable by the multi-core processing module according to input data acquired from the multi-core processing module or the on-chip storage medium, wherein the plurality of core processing modules share an on-chip storage medium and an ALU module, or the plurality of core processing modules have an independent on-chip storage medium and an ALU module. The present disclosure improves an operating speed of the neural network processing system, such that performance of the neural network processing system is higher and more efficient.
System and method for compressing activation data
A method for adapting a trained neural network is provided. Input data is input to the trained neural network and a plurality of filters are applied to generate a plurality of channels of activation data. Differences between corresponding activation values in the plurality of channels of activation data are calculated and an order of the plurality of channels is determined based on the calculated differences. The neural network is adapted so that it will output channels of activation data in the determined order. The ordering of the channels of activation data is subsequently used to compress activation data values by taking advantage of a correlation between activation data values in adjacent channels.
Processing apparatus and electronic device including the same
Provided are processing and an electronic device including the same. The processing apparatus includes a bit cell line comprising bit cells connected in series, a mirror circuit unit configured to generate a mirror current by replicating a current flowing through the bit cell line at a ratio, a charge charging unit configured to charge a voltage corresponding to the mirror current as the mirror current replicated by the mirror circuit unit is applied, and a voltage measuring unit configured to output a value corresponding to a multiply-accumulate (MAC) operation of weights and inputs applied to the bit cell line, based on the voltage charged by the charge charging unit.
Processing apparatus and electronic device including the same
Provided are processing and an electronic device including the same. The processing apparatus includes a bit cell line comprising bit cells connected in series, a mirror circuit unit configured to generate a mirror current by replicating a current flowing through the bit cell line at a ratio, a charge charging unit configured to charge a voltage corresponding to the mirror current as the mirror current replicated by the mirror circuit unit is applied, and a voltage measuring unit configured to output a value corresponding to a multiply-accumulate (MAC) operation of weights and inputs applied to the bit cell line, based on the voltage charged by the charge charging unit.