G06F7/78

MATRIX-BASED INTRA PREDICTION USING UPSAMPLING
20230132613 · 2023-05-04

Devices, systems, and methods for digital video coding, including matrix-based intra prediction methods for video coding, are described. In a representative aspect, a method for video processing includes performing a conversion between a current video block of a video and a bitstream representation of the current video block using a matrix-based intra prediction (MIP) mode. In this mode, a prediction block of the current video block is determined by performing, on previously coded samples of the video, a boundary downsampling operation, followed by a matrix-vector multiplication operation, followed by an upsampling operation. The upsampling operation is performed on the samples obtained from the matrix-vector multiplication operation, in both a vertical direction and a horizontal direction and in a fixed order.
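
The three stages named in the abstract can be illustrated with a small toy model. The sketch below is not the claimed method or any codec's normative MIP process: the function names, the averaging downsampler, the linear interpolation, and the choice of horizontal-then-vertical as the fixed order are all assumptions made for illustration.

```python
def downsample(samples, out_len):
    """Boundary downsampling (assumed here: average equal-sized groups)."""
    group = len(samples) // out_len
    return [sum(samples[i * group:(i + 1) * group]) / group
            for i in range(out_len)]


def interpolate(anchor, values, factor):
    """Linearly interpolate from a boundary anchor through each value."""
    out, prev = [], anchor
    for v in values:
        out.extend(prev + (v - prev) * (i + 1) / factor for i in range(factor))
        prev = v
    return out


def mip_predict(top, left, matrix, red_size, factor):
    """Toy MIP: downsample the boundary, multiply by a matrix, upsample.

    top, left: reconstructed neighbouring samples, each red_size * factor long.
    matrix:    red_size**2 rows, each with 2 * b weights (hypothetical values).
    """
    b = len(matrix[0]) // 2
    boundary = downsample(top, b) + downsample(left, b)

    # Matrix-vector multiplication yields a reduced red_size x red_size block.
    flat = [sum(m * x for m, x in zip(row, boundary)) for row in matrix]
    reduced = [flat[r * red_size:(r + 1) * red_size] for r in range(red_size)]

    # Fixed order (assumed): horizontal upsampling first ...
    wide = [interpolate(left[(r + 1) * factor - 1], reduced[r], factor)
            for r in range(red_size)]

    # ... then vertical upsampling, column by column.
    n = red_size * factor
    cols = [interpolate(top[c], [wide[r][c] for r in range(red_size)], factor)
            for c in range(n)]
    return [[cols[c][y] for c in range(n)] for y in range(n)]
```

Calling `mip_predict` with a flat boundary and a matrix whose rows each sum to 1 returns a flat block, which is a quick sanity check that the three stages are wired together consistently.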

Digital Sample Rate Conversion
20170371840 · 2017-12-28

Methods, structures and computer program products for digital sample rate conversion are presented. An input digital sample stream at a first frequency is converted to an output sample stream at a second frequency. A sample rate conversion circuit is provided with an enhanced transposed Farrow structure that enables an optimised trade-off between noise levels and computational complexity. Each output sample is derived by convolution of a continuous-time interpolation kernel with a continuous-time step function representing the input sample stream. In a sample rate conversion structure, there is a trade-off between quality and computational complexity. The quality is defined as the ratio between the (wanted) signal power and the (unwanted) noise power. The computational complexity may be defined as the average number of arithmetic operations required to generate one output sample. A higher computational complexity will generally lead to higher power consumption and a larger footprint.
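
The idea of convolving a continuous-time kernel with a step function built from the input samples can be shown with the simplest possible kernel. The sketch below is an illustration only and does not implement the enhanced transposed Farrow structure: it assumes a unit-area box kernel one output period wide, and the name `resample_box` is hypothetical.

```python
def resample_box(x, fs_in, fs_out):
    """Resample x from rate fs_in to rate fs_out with a box kernel.

    The input is treated as a continuous-time step function: sample n is
    held over [n / fs_in, (n + 1) / fs_in). Each output sample is that
    step function averaged over one output period, which equals its
    convolution with a unit-area box kernel evaluated at the output
    instant.
    """
    t_in = 1.0 / fs_in
    t_out = 1.0 / fs_out
    n_out = int(len(x) * fs_out / fs_in)
    out = []
    for m in range(n_out):
        start, end = m * t_out, (m + 1) * t_out
        acc = 0.0
        for n, value in enumerate(x):
            # Length of the overlap between input step n and the window.
            overlap = min(end, (n + 1) * t_in) - max(start, n * t_in)
            if overlap > 0:
                acc += value * overlap
        out.append(acc / t_out)
    return out
```

This version visits every input sample for every output sample, so its cost per output sample grows with the input length. It is meant to make the convolution visible, not to be efficient; reducing that per-output cost at a given noise level is the trade-off the abstract describes.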

BROADCASTING MODE OF PLANAR ENGINE FOR NEURAL PROCESSOR
20230206051 · 2023-06-29

Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations on input data with one or more kernels to generate outputs. A planar engine circuit is coupled to the neural engine circuits and can be configured to operate in multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operations on two tensors of different sizes and ranks. The planar engine circuit may perform a broadcasting operation that duplicates one or more values across one or more channels so that the smaller tensor matches the size of the larger tensor.
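
Broadcasting followed by an elementwise operation can be sketched in software as follows. This is a minimal illustration on nested lists laid out as [channel][row][column]. It covers only duplication across the channel dimension, and the function names are hypothetical rather than taken from the described hardware.

```python
import operator


def broadcast_channels(small, channels):
    """Duplicate a single-channel [1][H][W] tensor across `channels` channels."""
    if len(small) != 1:
        raise ValueError("expected a tensor with exactly one channel")
    return [[row[:] for row in small[0]] for _ in range(channels)]


def elementwise(a, b, op=operator.add):
    """Combine two [C][H][W] tensors element by element.

    If one operand has a single channel and the other has several, the
    single-channel operand is broadcast first so that both sizes match.
    """
    if len(a) == 1 and len(b) > 1:
        a = broadcast_channels(a, len(b))
    elif len(b) == 1 and len(a) > 1:
        b = broadcast_channels(b, len(a))
    if len(a) != len(b):
        raise ValueError("channel counts cannot be matched by broadcasting")
    return [[[op(x, y) for x, y in zip(row_a, row_b)]
             for row_a, row_b in zip(plane_a, plane_b)]
            for plane_a, plane_b in zip(a, b)]
```

For example, adding a one-channel bias plane to a two-channel tensor applies the same bias values to every channel.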

COST-AWARE SECURE OUTSOURCING
20170364327 · 2017-12-21

Systems and methods of the present invention provide for one or more server computers communicatively coupled to a network and configured to receive a request to execute a computational task, the request including a transformed input used to execute the task. A client computer transforms the original input into the transformed input using an affine mapping, such that the transformed input is a one-to-one equivalent of the original input (which the server computer cannot infer), and according to a user selection that limits the computational complexity of the mapping to fit resource constraints on the client. The server may then execute the computational task, transmit the result to the client so that the client can apply an inverse affine mapping, and receive a response verifying that the computational task result is complete and valid.
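
The transform, outsource, invert, and verify flow can be illustrated with a linear system A x = b. The sketch below is an assumption-laden toy, not the patented scheme: it substitutes x = D z + r with a random diagonal D and offset r (a cheap affine mapping), assumes A is invertible, and makes no claim about how much this particular disguise hides from the server. All function names are hypothetical.

```python
from fractions import Fraction
import random


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def server_solve(A, b):
    """Server side: Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def client_disguise(A, b, rng):
    """Client side: substitute x = D z + r, giving (A D) z = b - A r."""
    n = len(A)
    d = [Fraction(rng.randint(1, 9)) for _ in range(n)]   # nonzero scaling
    r = [Fraction(rng.randint(-9, 9)) for _ in range(n)]  # offset
    A2 = [[A[i][j] * d[j] for j in range(n)] for i in range(n)]
    Ar = matvec(A, r)
    b2 = [b[i] - Ar[i] for i in range(n)]
    return A2, b2, (d, r)


def client_recover(z, key):
    """Client side: map the server's answer back to x = D z + r."""
    d, r = key
    return [di * zi + ri for di, zi, ri in zip(d, z, r)]


def client_verify(A, b, x):
    """Client side: check the recovered result against the original task."""
    return matvec(A, x) == list(b)
```

A diagonal D keeps the client's work at O(n²) multiplications, which is one way a mapping's cost might be bounded to suit a constrained client. A denser mapping would hide more structure at a higher client cost.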

Transposed convolution using systolic array

In one example, a neural network accelerator can execute a set of instructions to: load a first weight data element from a memory into a systolic array, the first weight data element having first coordinates; extract, from the instructions, information indicating a first subset of input data elements to be obtained from the memory, the first subset being based on a stride of a transposed convolution operation and second coordinates of the first weight data element in a rotated array of weight data elements; based on the information, obtain the first subset of input data elements from the memory; load the first subset of input data elements into the systolic array; and control the systolic array to perform first computations based on the first weight data element and the first subset of input data elements to generate output data elements of an array of output data elements.
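
For reference, the arithmetic of a transposed convolution can be written as a plain software loop that handles one weight element at a time. This sketch assumes a single channel, no padding, and a scatter-add formulation. It does not model the systolic array, the instruction format, or the rotated-weight addressing described above, and the function name is hypothetical.

```python
def transposed_conv2d(inp, weights, stride):
    """Single-channel 2-D transposed convolution without padding.

    Output size along each axis is (input_size - 1) * stride + kernel_size.
    The outer loops visit one weight element at a time; that weight is
    multiplied with every input element, and each product is accumulated
    at an output position determined by the stride and the weight's
    coordinates.
    """
    h, w = len(inp), len(inp[0])
    kh, kw = len(weights), len(weights[0])
    out_h = (h - 1) * stride + kh
    out_w = (w - 1) * stride + kw
    out = [[0] * out_w for _ in range(out_h)]
    for ki in range(kh):
        for kj in range(kw):
            weight = weights[ki][kj]
            for i in range(h):
                for j in range(w):
                    out[i * stride + ki][j * stride + kj] += inp[i][j] * weight
    return out
```

With a stride of 2 and a 2×2 kernel, each weight element writes to a disjoint set of output positions. With a stride of 1 the contributions overlap and are summed.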
