G06F7/78

Broadcasting mode of planar engine for neural processor

Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits and can be configured in multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operations on two tensors of different sizes and ranks. The planar engine circuit may perform a broadcasting operation that duplicates one or more values across one or more channels so that the smaller tensor matches the size of the larger tensor.
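The broadcasting behavior described here resembles NumPy-style broadcasting; a minimal sketch (the tensor shapes and the addition are illustrative assumptions, not the patent's hardware):

```python
import numpy as np

# A "larger" rank-3 tensor: 2 channels of 3x4 values.
large = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

# A "smaller" tensor with a single channel; its values are duplicated
# (broadcast) across both channels before the elementwise combination.
small = np.ones((1, 3, 4), dtype=np.float32)

# Elementwise combination of tensors of different sizes: the size-1
# channel dimension of `small` is expanded to match `large`.
out = large + small
print(out.shape)  # (2, 3, 4)
```

The size-1 dimension is conceptually replicated rather than copied in memory, which is the same economy the broadcasting operation aims for.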

CALCULATOR AND CALCULATION METHOD
20230065733 · 2023-03-02

A calculator includes: registers, each including sub-registers that hold pieces of data for use in an operation; an operator that executes operations on the pieces of data in parallel; and a memory configured to hold a first vector and second vectors to be compared with the first vector. Each second vector is divided into sub-vectors, and sub-vector groups, each including the corresponding sub-vectors of the second vectors, are arranged in the memory in units of sub-vector groups. Three processes are repeatedly executed: a first process of transferring one of the sub-vectors of the first vector to sub-registers of a first register among the registers; a second process of transferring the sub-vector group of the second vectors corresponding to the transferred sub-vector of the first vector, held in the memory, to sub-registers of a second register; and a third process of calculating and integrating the numbers of mismatches between bit values of the held sub-vectors.
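The three repeated processes amount to a blocked Hamming-distance computation. A software sketch, with the register/memory layout abstracted to lists of integer sub-vectors (the function name and data layout are illustrative assumptions):

```python
def mismatch_counts(first, seconds):
    """first: list of sub-vectors (ints); seconds: list of second
    vectors, each a list of sub-vectors the same length as `first`."""
    totals = [0] * len(seconds)
    for i, sub_a in enumerate(first):        # first process: load one sub-vector
        group = [vec[i] for vec in seconds]  # second process: load the matching group
        for j, sub_b in enumerate(group):    # third process: count and integrate
            totals[j] += bin(sub_a ^ sub_b).count("1")  # bit mismatches via XOR + popcount
    return totals

first = [0b1010, 0b1100]
seconds = [[0b1010, 0b1100], [0b0101, 0b0011]]
print(mismatch_counts(first, seconds))  # [0, 8]
```

Loading one sub-vector group per step keeps the working set small, which mirrors the point of transferring sub-vector groups into registers rather than whole second vectors.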

GENERIC IMAGE RESIZER USING MATRIX MULTIPLIER ACCELERATOR
20220327180 · 2022-10-13

Techniques for resizing data include receiving input data values for resizing; placing a first number of data values from a first line of the input data values in a first portion of a first vector; placing the first number of data values from a second line of the input data values in a second portion of the first vector; receiving a first matrix of weights, wherein each weight of the first matrix corresponds to an amount of weight to apply to a data value for a point on a first line of a set of resized data; multiplying the first vector and the first matrix of weights to determine data values for the first line of the set of resized data; and outputting the set of resized data.
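A small sketch of one such vector-matrix step, assuming simple linear-interpolation weights (the specific weights and two-values-per-line packing are illustrative, not taken from the patent):

```python
import numpy as np

line0 = np.array([10.0, 20.0])        # values from a first input line
line1 = np.array([30.0, 40.0])        # values from a second input line
vec = np.concatenate([line0, line1])  # first vector: [10, 20, 30, 40]

# Each column holds the weights that produce one resized output value;
# here the two outputs are the midpoints of line0 and line1.
weights = np.array([
    [0.5, 0.0],
    [0.5, 0.0],
    [0.0, 0.5],
    [0.0, 0.5],
])

resized = vec @ weights
print(resized)  # [15. 35.]
```

Packing two input lines into one vector lets a single matrix multiply produce an output line, which is what makes the operation a good fit for a matrix multiplier accelerator.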

Vector bit transpose

A method to transpose source data in a processor in response to a vector bit transpose instruction includes specifying, in respective fields of the vector bit transpose instruction, a source register containing the source data and a destination register to store transposed data. The method also includes executing the vector bit transpose instruction by interpreting N×N bits of the source data as a two-dimensional array having N rows and N columns, creating transposed source data by transposing the bits by reversing a row index and a column index for each bit, and storing the transposed source data in the destination register.
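A software model of the transpose step: interpret N×N bits of a source word as an N×N bit array and reverse the row and column index of each bit. The bit-numbering convention (bit 0 = row 0, column 0) is an assumption; hardware may order bits differently.

```python
def bit_transpose(src: int, n: int) -> int:
    """Treat the low n*n bits of src as an n x n bit array and transpose it."""
    dst = 0
    for r in range(n):
        for c in range(n):
            bit = (src >> (r * n + c)) & 1
            dst |= bit << (c * n + r)  # reversed row and column index
    return dst

# A 2x2 example: rows [1 0] and [1 1] become rows [1 1] and [0 1].
print(bin(bit_transpose(0b1101, 2)))  # 0b1011
```

Since transposing is its own inverse, applying the operation twice returns the original source data, which is a handy sanity check for any implementation.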

Methods and apparatus for power efficient design of forward error correction for optical communication systems

Consistent with a further aspect of the present disclosure, previously encoded data is stored in a memory, and an encoder accesses both input data and previously encoded data to generate new encoded data, or a new codeword. Each codeword is stored in a row of the memory, and with each newly generated codeword, each previously stored codeword is shifted to an adjacent row of the memory. In one example, the memory is delineated as a plurality of blocks including rows and columns of bits. When generating a new codeword, randomly selected columns of bits are read from randomly selected blocks of the memory and supplied to the encoder. In this manner, the number of times the memory is accessed is reduced, and power consumption is reduced.
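A toy model of the codeword memory described above, with the memory geometry and the XOR-based "encoder" as stand-in assumptions (real FEC encoding is far more involved):

```python
import random
from collections import deque

ROWS, COLS = 4, 8            # assumed memory geometry
memory = deque(maxlen=ROWS)  # row 0 holds the newest codeword

def store(codeword):
    # Appending at the front models each previously stored codeword
    # shifting to the adjacent row as a new codeword arrives.
    memory.appendleft(codeword)

def encode(input_bits):
    # Fold a randomly selected column of previously encoded bits from
    # each stored row into the new codeword, rather than reading every
    # stored bit, reducing the number of memory accesses.
    out = list(input_bits)
    for row in memory:
        col = random.randrange(COLS)  # randomly selected column
        out[col] ^= row[col]
    return out

cw = encode([1, 0, 1, 1, 0, 0, 1, 0])
store(cw)
print(cw)
```

The power saving in the disclosure comes from touching only the randomly selected columns and blocks per new codeword instead of rereading the whole memory.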
