G06F17/153

Method and system for convolution

Method and system relating generally to convolution is disclosed. In such a method, an image patch is selected from input data for a first channel of a plurality of input channels of an input layer. The selected image patch is transformed to obtain a transformed image patch. The transformed image patch is stored. Stored is a plurality of predetermined transformed filter kernels. A stored transformed filter kernel of the plurality of stored predetermined transformed filter kernels is element-wise multiplied by multipliers with the stored transformed image patch for a second channel of the plurality of input channels different from the first channel to obtain a product. The product is inverse transformed to obtain a filtered patch for the image patch.

Method and device for optimizing neural network

The embodiments of this application provide a method and device for optimizing neural network. The method includes: binarizing and bit-packing input data of a convolution layer along a channel direction, and obtaining compressed input data; binarizing and bit-packing respectively each convolution kernel of the convolution layer along the channel direction, and obtaining each corresponding compressed convolution kernel; dividing the compressed input data sequentially in a convolutional computation order into blocks of the compressed input data with the same size of each compressed convolution kernel, wherein the data input to one time convolutional computation form a data block; and, taking a convolutional computation on each block of the compressed input data and each compressed convolution kernel sequentially, obtaining each convolutional result data, and obtaining multiple output data of the convolution layer according to each convolutional result data.

Grouped convolution using point-to-point connected channel convolution engines

A processor system comprises a plurality of processing elements. Each processing element includes a corresponding convolution processor unit configured to perform a portion of a groupwise convolution. The corresponding convolution processor unit determines multiplication results by multiplying each data element of a portion of data elements in a convolution data matrix with a corresponding data element in a corresponding groupwise convolution weight matrix. The portion of data elements in the convolution data matrix that are multiplied belong to different channels and different groups. For each specific channel of the different channels, the corresponding convolution processor unit sums together at least some of the multiplication results belonging to the same specific channel to determine a corresponding channel convolution result data element. The processing elements sum together a portion of the channel convolution result data elements from a group of different convolution processor units to determine a groupwise convolution result data element.

UNSUPERVISED STATISTICAL METHOD FOR MULTIVARIATE IDENTIFICATION OF ATYPICAL SENSORS

A method for identifying atypical sensors measuring characteristics of individuals. Curves of characteristic of individuals are collected, the curves being measured by each sensor. For a given sensor, a reference curve is processed to calculate a dissimilarity index between the reference curve and each of the other curves of the sensor and the dissimilarity processing is iteratively repeated for each curve resulting from the same sensor to obtain the dissimilarity index for each curve. The dissimilarity processing is repeated for the other sensors to obtain a table of dissimilarity indices. An atypicality index is calculated for each individual from a multivariate statistical processing of the tables. Atypical individuals and atypical sensors are identified.

Physics simulation on machine-learning accelerated hardware platforms
11550971 · 2023-01-10 · ·

At least one machine-accessible storage medium that provides instructions that, when executed by a machine, will cause the machine to perform operations. The operations comprise configuring a simulated environment to be representative of a physical device based, at least in part, on an initial description of the physical device that described structural parameters of the physical device. The operations further comprise performing a physics simulation with an artificial intelligence (“AI”) accelerator. The AI accelerator includes a matrix multiply unit for computing convolution operations via a plurality of multiply-accumulate units. The operations further comprise computing a field response in response of the physical device in response to an excitation source within the simulated environment when performing the physics simulation. The field response is computed, at least in part, with the convolution operations to perform spatial differencing.

Image processing using registration by localized cross correlation (LXCOR)

Aligning multiple 3D images of an object can be difficult when the representative datasets (images) are large. An exemplary aspect of this technology teaches a technique to subdivide the images and use the alignments between the subdivided images to determine the alignment between the complete datasets.

Energy-efficient memory systems and methods

Described herein are systems and methods that increase the utilization and performance of computational resources, such as storage space and computation time, thereby, reducing computational cost. Various embodiments of the invention provide for a hardware structure that allows both streaming of source data that eliminates redundant data transfer and allows for in-memory computations that eliminate requirements for data transfer to and from intermediate storage. In certain embodiments, computational cost is reduced by using a hardware structure that enables mathematical operations, such as element-wise matrix multiplications employed by convolutional neural networks, to be performed automatically and efficiently.

HIGH THROUGHPUT MATRIX PROCESSOR WITH SUPPORT FOR CONCURRENTLY PROCESSING MULTIPLE MATRICES

A system comprises a data input vector unit, a weight input vector unit, and a plurality of calculation units. The data input vector unit is configured to concurrently receive elements of different rows of a first and second data matrix. The weight input vector unit is configured to receive a combined weight vector and at least in part concurrently provide obtained weight elements of a first and second weight matrix to a corresponding first and second group of calculation units. At least one calculation unit of each group of the first and second group of calculation units is configured to multiply elements from the data input vector unit with corresponding elements of the corresponding weight matrix from the weight input vector unit and sum together multiplication results of the corresponding calculation unit to at least in part determine a corresponding element in a first or second convolution result matrix.

Human-machine-interface system comprising a convolutional neural network hardware accelerator

A human-machine-interface system comprising: register-file-memory, configured to store input-data; a first-processing-element-slice, a second-processing-element-slice, and a controller. Each of the processing-slices comprise: a register configured to store register-data; and a processing-element configured to apply an arithmetic and logic operation on the register-data in order to provide convolution-output-data. The controller is configured to: load input-data from the register-file-memory into the first-register as the first-register-data; and load: (i) input-data from the register-file-memory, or (ii) the first-register-data from the first-register, into the second-register as the second-register-data.

Electronic device and control method thereof

Disclosed is an electronic device. The An electronic device including a storage, and a processor configured to perform convolution processing on target data and kernel data based on stride information that indicates an interval at which the kernel data is applied to the target data stored in the storage, in which the processor is further configured to divide the target data into a plurality of pieces of sub-data based on first stride information, perform the convolution processing on the plurality of pieces of sub-data and a plurality of pieces of sub-kernel data respectively corresponding to the plurality of pieces of sub-data based on second stride information that is different from the first stride information, and combine a plurality of processing results, the plurality of pieces of sub-kernel data are obtained by dividing the kernel data based on the first stride information, and the second stride information indicates that the interval at which the kernel data is applied to the target data is 1.