Patent classifications
G06N3/0495
METHODS AND APPARATUS FOR COMMUNICATING VECTOR DATA
A method of communicating time-correlated vector data within a network includes reading, by a transmitting node, a first vector data including a plurality of elements; selecting, by the transmitting node, a subset of elements of the plurality of elements based on a criterion; and sending, by the transmitting node, the subset of elements to a receiving node. The receiving node receives the subset of elements and estimates a plurality of elements not included in the subset of elements based on a previously received subset of elements that was based on a second vector data. The first vector data and the second vector data are part of a time series of vectors.
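The select-send-estimate loop described above can be sketched as follows. This is a minimal illustration, not the claimed method: the abstract names neither the selection criterion nor the estimator, so the largest-magnitude rule in `select_top_k` and the hold-last-value `Receiver` are assumptions, and both names are hypothetical.

```python
def select_top_k(vector, k):
    """Transmitter side: keep the k largest-magnitude elements as {index: value}.

    The magnitude criterion is an assumption; the abstract only says "a criterion".
    """
    ranked = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    return {i: vector[i] for i in sorted(ranked[:k])}


class Receiver:
    """Receiver side: fill in elements that were not sent from earlier subsets."""

    def __init__(self, length):
        self.estimate = [0.0] * length

    def receive(self, subset):
        # Overwrite the elements that arrived; the rest keep their last known value.
        for index, value in subset.items():
            self.estimate[index] = value
        return list(self.estimate)


rx = Receiver(4)
first = rx.receive(select_top_k([0.9, 0.1, -0.8, 0.2], k=2))
second = rx.receive(select_top_k([0.1, 0.7, -0.8, 0.2], k=2))
print(first)   # [0.9, 0.0, -0.8, 0.0]
print(second)  # [0.9, 0.7, -0.8, 0.0]
```

In the second step, element 0 is not transmitted, so the receiver reuses the value it received with the earlier vector of the time series.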
METHOD AND SYSTEM FOR ON-DEVICE INFERENCE IN A DEEP NEURAL NETWORK (DNN)
The disclosure relates to a method and system for on-device inference in a deep neural network (DNN). The method comprises: determining whether one or more layers of the DNN satisfy one of a first, a second and a third condition, the one or more layers including one or more convolution layers and one or more resampling layers; performing the on-device inference based on the determination, wherein performing the on-device inference comprises at least one of: optimizing the one or more convolution layers in the one or more parallel branches based on the one or more layers of the DNN satisfying the first condition, optimizing the at least one of the resampling layers based on the one or more layers of the DNN satisfying the second condition, and modifying operation of the at least one of the resampling layers based on the one or more layers of the DNN satisfying the third condition.
Method and data processing system for lossy image or video encoding, transmission and decoding
A method for lossy image or video encoding, transmission and decoding, the method comprising the steps of: receiving an input image at a first computer system; encoding the input image using a first trained neural network to produce a latent representation; performing a quantization process on the latent representation to produce a quantized latent; entropy encoding the quantized latent using a probability distribution, wherein the probability distribution is defined using a tensor network; transmitting the entropy encoded quantized latent to a second computer system; entropy decoding the entropy encoded quantized latent using the probability distribution to retrieve the quantized latent; and decoding the quantized latent using a second trained neural network to produce an output image, wherein the output image is an approximation of the input image.
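The quantization and entropy-coding steps can be illustrated with a small sketch. It assumes rounding as the quantization process and a fixed per-symbol probability table standing in for the tensor-network distribution, which is not modelled here; `quantize_latent`, `ideal_code_length_bits` and the table `pmf` are hypothetical names and values.

```python
import math


def quantize_latent(latent):
    """Round each latent value to the nearest integer (assumed quantizer)."""
    return [round(x) for x in latent]


def ideal_code_length_bits(symbols, pmf):
    """Bits an ideal entropy coder would spend: sum of -log2 p(symbol)."""
    return sum(-math.log2(pmf[s]) for s in symbols)


# Hypothetical symbol probabilities; the patent defines them with a tensor network.
pmf = {-1: 0.25, 0: 0.5, 1: 0.25}

quantized = quantize_latent([0.2, -0.9, 1.4, 0.1])
print(quantized)                               # [0, -1, 1, 0]
print(ideal_code_length_bits(quantized, pmf))  # 6.0
```

Encoder and decoder must share the same distribution, which is why the abstract uses it for both entropy encoding and entropy decoding.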
PARALLEL METHOD AND DEVICE FOR CONVOLUTION COMPUTATION AND DATA LOADING OF NEURAL NETWORK ACCELERATOR
Disclosed are a parallel method and device for convolution computation and data loading of a neural network accelerator. The method needs two input feature maps and two convolution kernel cache blocks, and sequentially stores the input feature maps and 64 convolution kernels into cache sub-blocks according to a loading length, so as to execute convolution computation and simultaneously load data of a next group of 64 convolution kernels.
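The overlap of computation and loading is a form of double (ping-pong) buffering, sketched below under simplifying assumptions: the two kernel cache blocks are modelled as a two-slot list, a dot product stands in for the convolution, and the load that hardware would perform concurrently is executed sequentially here. The names `run_layer` and `dot` are hypothetical, and the example uses groups of 2 kernels rather than 64 to stay readable.

```python
def dot(feature_map, kernel):
    """Stand-in for a convolution: a plain dot product of equal-length lists."""
    return sum(f * k for f, k in zip(feature_map, kernel))


def run_layer(kernel_groups, feature_map, convolve=dot):
    """Compute on one kernel cache block while the other receives the next group."""
    cache = [kernel_groups[0], None]  # prime the first cache block
    outputs = []
    for step in range(len(kernel_groups)):
        active = step % 2
        standby = 1 - active
        # Hardware would issue this load in parallel with the computation below.
        if step + 1 < len(kernel_groups):
            cache[standby] = kernel_groups[step + 1]
        outputs.extend(convolve(feature_map, kernel) for kernel in cache[active])
    return outputs


groups = [[[1, 0], [0, 1]], [[1, 1], [2, 2]]]
print(run_layer(groups, [3, 4]))  # [3, 4, 7, 14]
```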
METHOD AND SYSTEM FOR SPLITTING AND BIT-WIDTH ASSIGNMENT OF DEEP LEARNING MODELS FOR INFERENCE ON DISTRIBUTED SYSTEMS
System and method for splitting a trained neural network into a first neural network for execution on a first device and a second neural network for execution on a second device. The splitting is performed to optimize, within an accuracy constraint, an overall latency of: the execution of the first neural network on the first device to generate a feature map output based on input data, transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device.
Low-Power Fast-Response Machine Learning Variable Image Compression
Computing devices, such as mobile computing devices, have access to one or more image sensors that can capture images and video with multiple subjects. Some of these subjects may vary in priority for various tasks. It may be desired to increase or decrease the compression on each subject in order to more efficiently store the image data. Low-power, fast-response machine learning logic can be configured to allow for the generation of a plurality of inference data. Inference data can be associated with the type, motion and/or priority of the subjects as desired. This inference data can be utilized along with other subject data to generate one or more variable compression regions within the image data. The image data can be subsequently processed to compress different areas of the image based on a desired application. The variably compressed image can reduce file sizes and allow for more efficient storage and processing.
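A rough sketch of region-dependent compression, assuming the inference step has already produced rectangular regions with a priority label. Coarser value quantization stands in for stronger compression; `compress_by_priority`, the region format and the step sizes are all hypothetical.

```python
def compress_by_priority(image, regions, steps, default_step=64):
    """Quantize pixel values with a step chosen per region.

    regions: list of (row_start, row_end, col_start, col_end, priority), end-exclusive.
    steps:   priority label -> quantization step (smaller step keeps more detail).
    """
    # Background pixels get the coarsest treatment.
    out = [[(p // default_step) * default_step for p in row] for row in image]
    for r0, r1, c0, c1, priority in regions:
        step = steps[priority]
        for r in range(r0, r1):
            for c in range(c0, c1):
                out[r][c] = (image[r][c] // step) * step
    return out


image = [[10, 70, 130], [200, 37, 99]]
regions = [(0, 1, 0, 2, "high")]  # e.g. a high-priority subject in the top-left
result = compress_by_priority(image, regions, steps={"high": 1, "low": 32})
print(result)  # [[10, 70, 128], [192, 0, 64]]
```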
Computing device and method
A computing device, comprising: a computing module, comprising one or more computing units; and a control module, comprising a computing control unit, and used for controlling shutdown of the computing unit of the computing module according to a determining condition. Also provided is a computing method. The computing device and method have the advantages of low power consumption and high flexibility, and can be combined with the upgrading mode of software, thereby further increasing the computing speed, reducing the computing amount, and reducing the computing power consumption of an accelerator.
Training sparse networks with discrete weight values
Some embodiments provide a method for training a machine-trained (MT) network. The method propagates multiple inputs through the MT network to generate an output for each of the inputs. Each of the inputs is associated with an expected output, the MT network uses multiple network parameters to process the inputs, and each network parameter of a set of the network parameters is defined during training as a probability distribution across a discrete set of possible values for the network parameter. The method calculates a value of a loss function for the MT network that includes (i) a first term that measures network error based on the expected outputs compared to the generated outputs and (ii) a second term that penalizes divergence of the probability distribution for each network parameter in the set of network parameters from a predefined probability distribution for the network parameter.
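The two-term loss can be sketched as follows. The abstract does not name the error measure or the divergence, so mean squared error and Kullback-Leibler divergence are assumptions here, as are the function names and the `penalty` weight.

```python
import math


def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions over the same set of values."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def training_loss(outputs, expected, weight_dists, prior, penalty=0.1):
    """First term: mean squared error. Second term: divergence from the predefined prior."""
    error = sum((o - e) ** 2 for o, e in zip(outputs, expected)) / len(outputs)
    divergence = sum(kl_divergence(q, prior) for q in weight_dists)
    return error + penalty * divergence


# Each weight is a distribution over the discrete values {-1, 0, 1}.
prior = [0.25, 0.5, 0.25]    # predefined distribution; favours 0, i.e. sparsity
at_prior = [[0.25, 0.5, 0.25]]
peaked = [[0.9, 0.05, 0.05]]  # a weight that has nearly committed to -1

print(training_loss([1.0, 0.0], [1.0, 1.0], at_prior, prior))  # 0.5
print(training_loss([1.0, 0.0], [1.0, 1.0], peaked, prior) > 0.5)  # True
```

A weight whose distribution matches the prior adds no penalty, while a weight that drifts away from it pays in proportion to the divergence.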
AUTOMATED DESIGN OF ARCHITECTURES OF ARTIFICIAL NEURAL NETWORKS
A method and apparatus of a device for determining a reduced space neural network architecture are described. In an exemplary embodiment, the device receives a full space neural network architecture, wherein the full space architecture includes a first plurality of nodes and a set of weights. The device may further transform the set of weights. In addition, the device may also reduce the first plurality of nodes using the transformed set of weights to create a second plurality of nodes. Furthermore, the device can create the reduced space neural network architecture using the second plurality of nodes.
REINFORCEMENT LEARNING DEVICE AND OPERATION METHOD THEREOF
A reinforcement learning device includes a computation circuit configured to perform an operation between a weight matrix and an input activation vector and to apply an activation function to an output of the operation to generate an output activation vector. The computation circuit quantizes the input activation vector when a quantization delay time has elapsed since the beginning of a learning operation and does not quantize the input activation vector otherwise.
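The delayed-quantization rule can be sketched in software, although the abstract describes a hardware circuit. The uniform quantizer with a fixed step, the ReLU activation and the names `quantize` and `forward` are assumptions made for illustration.

```python
def quantize(vector, step=0.25):
    """Snap each value to the nearest multiple of step (assumed uniform quantizer)."""
    return [round(x / step) * step for x in vector]


def forward(weights, activations, elapsed, quantization_delay):
    """Matrix-vector product followed by ReLU, quantizing inputs only after the delay."""
    if elapsed >= quantization_delay:
        activations = quantize(activations)
    pre_activation = [sum(w * a for w, a in zip(row, activations)) for row in weights]
    return [max(0.0, z) for z in pre_activation]


identity = [[1.0, 0.0], [0.0, 1.0]]
print(forward(identity, [0.3, -0.6], elapsed=5, quantization_delay=10))   # [0.3, 0.0]
print(forward(identity, [0.3, -0.6], elapsed=10, quantization_delay=10))  # [0.25, 0.0]
```

Early in the learning operation the full-precision activations are used; once the delay has elapsed, the same inputs are quantized before the multiplication.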