Patent classifications
G06N3/063
Neural network training mechanism
- Gokcen Cilingir,
- Elmoustapha Ould-Ahmed-Vall,
- Rajkishore Barik,
- Kevin Nealis,
- Xiaoming Chen,
- Justin E. Gottschlich,
- Prasoonkumar Surti,
- Chandrasekaran Sakthivel,
- Abhishek Appu,
- John C. Weast,
- Sara S. Baghsorkhi,
- Barnan Das,
- Narayan Biswal,
- Stanley J. Baran,
- Nilesh V. Shah,
- Archie Sharma,
- Mayuresh M. Varerkar
An apparatus to facilitate neural network (NN) training is disclosed. The apparatus includes training logic to receive one or more network constraints and train the NN by automatically determining an optimal network layout and parameters based on the network constraints.
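As a rough illustration of constraint-driven layout selection (the patent does not publish its search algorithm, so every name and heuristic below is hypothetical), this sketch enumerates candidate layer widths and depths and keeps the highest-capacity layout that satisfies a parameter-budget constraint:

```python
# Hypothetical sketch of constraint-driven layout search: enumerate candidate
# dense-layer layouts and keep the highest-capacity one under the constraint.
from itertools import product

def parameter_count(layout, n_in=784, n_out=10):
    """Total weight count of a dense network with the given hidden widths."""
    sizes = [n_in] + list(layout) + [n_out]
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

def search_layout(max_params, depths=(1, 2, 3), widths=(32, 64, 128, 256)):
    """Return the layout with the most parameters that fits the budget."""
    best, best_count = None, -1
    for depth in depths:
        for layout in product(widths, repeat=depth):
            p = parameter_count(layout)
            if p <= max_params and p > best_count:
                best, best_count = layout, p
            # a real trainer would score candidates on validation data instead
    return best

print(search_layout(max_params=200_000))
```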
Method and device for optimizing neural network
The embodiments of this application provide a method and device for optimizing a neural network. The method includes: binarizing and bit-packing the input data of a convolution layer along the channel direction to obtain compressed input data; binarizing and bit-packing each convolution kernel of the convolution layer along the channel direction to obtain a corresponding compressed convolution kernel; dividing the compressed input data, sequentially in convolution-computation order, into blocks of the same size as each compressed convolution kernel, wherein the data input to one convolution computation forms a data block; and performing a convolution computation on each block of the compressed input data and each compressed convolution kernel in sequence to obtain convolution result data, from which the multiple output data of the convolution layer are obtained.
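The abstract describes the standard binary-convolution trick of packing channel sign bits so the inner product reduces to XNOR plus popcount. A minimal NumPy sketch of that identity, with all shapes and helper names assumed rather than taken from the patent:

```python
import numpy as np

def binarize_pack(x):
    """x: (..., C) real values; pack sign bits along the channel axis."""
    bits = (x >= 0).astype(np.uint8)      # +1 -> bit 1, -1 -> bit 0
    return np.packbits(bits, axis=-1)     # (..., ceil(C/8)) uint8

def binary_dot(pa, pb, n):
    """Inner product of two {-1,+1} vectors from their packed bit forms:
    dot = n - 2 * popcount(pa XOR pb), since XOR counts sign mismatches."""
    mismatches = np.unpackbits(np.bitwise_xor(pa, pb), axis=-1)[..., :n].sum()
    return n - 2 * int(mismatches)

C = 64
rng = np.random.default_rng(0)
x, w = rng.standard_normal(C), rng.standard_normal(C)   # one channel column
ref = int(np.where(x >= 0, 1.0, -1.0) @ np.where(w >= 0, 1.0, -1.0))
assert binary_dot(binarize_pack(x), binarize_pack(w), C) == ref
print(ref)
```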
Neuromorphic event-driven neural computing architecture in a scalable neural network
An event-driven neural network including a plurality of interconnected core circuits is provided. Each core circuit includes an electronic synapse array that has multiple digital synapses interconnecting a plurality of digital electronic neurons. A synapse interconnects an axon of a pre-synaptic neuron with a dendrite of a post-synaptic neuron. A neuron integrates input spikes and generates a spike event in response to the integrated input spikes exceeding a threshold. Each core circuit also has a scheduler that receives a spike event and delivers the spike event to a selected axon in the synapse array based on a schedule for deterministic event delivery.
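A toy model of one such core, with the synapse crossbar, integrate-and-fire neurons, and scheduled event delivery reduced to a few lines; the structure is a plausible reading of the abstract, not the patented circuit:

```python
# Hypothetical sketch of one event-driven core: a binary synapse crossbar
# connects axons to neurons, neurons integrate delivered spikes, and a
# scheduler holds events until their delivery step for deterministic timing.
import numpy as np
from collections import deque

class Core:
    def __init__(self, n_axons, n_neurons, threshold=3.0, seed=0):
        rng = np.random.default_rng(seed)
        self.synapses = rng.integers(0, 2, (n_axons, n_neurons))  # crossbar
        self.potential = np.zeros(n_neurons)
        self.threshold = threshold
        self.schedule = deque()              # queued (axon, delay) events

    def receive(self, axon, delay=1):
        self.schedule.append((axon, delay))

    def step(self):
        """Deliver due events, integrate, fire, reset; return spiking neurons."""
        pending = deque()
        while self.schedule:
            axon, delay = self.schedule.popleft()
            if delay <= 1:
                self.potential += self.synapses[axon]   # integrate one axon row
            else:
                pending.append((axon, delay - 1))       # not due yet
        self.schedule = pending
        fired = self.potential >= self.threshold
        self.potential[fired] = 0.0                     # reset after spiking
        return np.flatnonzero(fired)

core = Core(n_axons=8, n_neurons=4)
for a in (0, 1, 2, 5):
    core.receive(a)
print(core.step())   # indices of neurons that spiked this tick
```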
Electronic apparatus and method for optimizing trained model
An electronic apparatus is provided. The electronic apparatus includes: a memory storing a trained model including a plurality of layers; and a processor configured to initialize a parameter matrix and a plurality of split variables of the trained model, compute a new parameter matrix having a block-diagonal form by minimizing an objective function that includes a loss function for the trained model, a weight-decay regularization term, and a split regularization term defined by the parameter matrix and the plurality of split variables, vertically split the plurality of layers into groups based on the computed split variables, and reconstruct the trained model using the computed new parameter matrix as the parameters of the vertically split layers.
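One plausible form of the objective sketched in the abstract (the exact formulation is not given here, so the penalty below is an assumption): task loss plus weight decay plus a term penalizing weights that cross group boundaries, which drives the matrix toward block-diagonal form so the layer can be split vertically:

```python
# Assumed sketch of the split objective: loss + weight decay + a regularizer
# on the weight mass outside the block-diagonal implied by the split variables.
import numpy as np

def split_objective(W, loss, row_group, col_group, wd=1e-4, gamma=1e-2):
    """W: (m, n) layer weights; row_group/col_group: group id per row/col."""
    weight_decay = wd * np.sum(W ** 2)
    off_block = row_group[:, None] != col_group[None, :]   # cross-group mask
    split_penalty = gamma * np.sum((W * off_block) ** 2)   # push toward zero
    return loss + weight_decay + split_penalty

m, n, groups = 6, 6, 2
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))
row_g = np.repeat(np.arange(groups), m // groups)   # rows 0-2 -> group 0, ...
col_g = np.repeat(np.arange(groups), n // groups)
print(split_objective(W, loss=0.0, row_group=row_g, col_group=col_g))
```

Once the cross-group weights are driven to zero, each block can be cut out as an independent smaller layer, which is what reconstructing the model from the block-diagonal matrix amounts to.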
Distributed processing architecture
Embodiments of the present disclosure include techniques for processing neural networks. Various forms of parallelism may be implemented using a topology that combines sequences of processors. In one embodiment, the present disclosure includes a computer system comprising a plurality of processor groups, the processor groups each comprising a plurality of processors. A plurality of network switches are coupled to subsets of the plurality of processor groups. A subset of the processors in the processor groups may be configurable to form sequences, and the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations. Various alternative configurations for creating Hamiltonian cycles are disclosed to support data parallelism, pipeline parallelism, layer parallelism, or combinations thereof.
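A Hamiltonian cycle over processors is simply a ring that visits each processor exactly once; the sketch below (a hypothetical helper, not the patented switch configuration) flattens processor groups into such a ring so each pipeline stage feeds the next:

```python
# Hypothetical sketch: form one ring (a Hamiltonian cycle) across processor
# groups so activations flow forward hop by hop and the cycle closes.
def build_ring(groups):
    """groups: list of lists of processor ids; returns (src, dst) hops."""
    order = [p for g in groups for p in g]   # visit every processor once
    return [(order[i], order[(i + 1) % len(order)]) for i in range(len(order))]

groups = [[0, 1], [2, 3], [4, 5]]            # three groups, two processors each
for src, dst in build_ring(groups):
    print(f"stage on processor {src} feeds stage on processor {dst}")
```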
Grouped convolution using point-to-point connected channel convolution engines
A processor system comprises a plurality of processing elements. Each processing element includes a corresponding convolution processor unit configured to perform a portion of a groupwise convolution. The corresponding convolution processor unit determines multiplication results by multiplying each data element of a portion of data elements in a convolution data matrix with a corresponding data element in a corresponding groupwise convolution weight matrix. The portion of data elements in the convolution data matrix that are multiplied belong to different channels and different groups. For each specific channel of the different channels, the corresponding convolution processor unit sums together at least some of the multiplication results belonging to the same specific channel to determine a corresponding channel convolution result data element. The processing elements sum together a portion of the channel convolution result data elements from a group of different convolution processor units to determine a groupwise convolution result data element.
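The two-stage reduction in the abstract, a per-channel multiply-accumulate over the kernel window followed by a within-group sum of the channel results, can be sketched in NumPy as follows (shapes and names are assumptions, not taken from the claims):

```python
# Assumed-shape sketch of the groupwise convolution reduction: stage 1 sums
# products per channel, stage 2 sums channel results within each group.
import numpy as np

def groupwise_conv_element(window, weights, group_size):
    """window, weights: (C, kH, kW); returns one output element per group."""
    channel_results = (window * weights).sum(axis=(1, 2))   # stage 1: per channel
    C = channel_results.shape[0]
    grouped = channel_results.reshape(C // group_size, group_size)
    return grouped.sum(axis=1)                              # stage 2: per group

rng = np.random.default_rng(0)
C, kH, kW, group_size = 8, 3, 3, 4
window = rng.standard_normal((C, kH, kW))    # one receptive field of the input
weights = rng.standard_normal((C, kH, kW))   # matching groupwise kernel slice
print(groupwise_conv_element(window, weights, group_size))  # two group outputs
```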