G06F7/768

Transposing in a matrix-vector processor

A circuit for transposing a matrix comprising reversal circuitry configured, for each of one or more diagonals of the matrix, to receive elements of the matrix in a first vector and generate a second vector that includes the elements of the matrix in an order that is a reverse of an order of the elements of the matrix in the first vector, and rotation circuitry configured, for each of the one or more diagonals of the matrix, to determine a number of positions by which to rotate the elements of the matrix in the second vector, receive the second vector of elements of the matrix, and generate a third vector that includes the elements of the matrix in the second vector in an order that is a rotation of the elements of the matrix in the second vector by the determined number of positions.

Processor memory optimization method and apparatus for deep learning training tasks

The present application discloses a processor video memory optimization method and apparatus for deep learning training tasks, and relates to the technical field of artificial intelligence. In the method, by determining an optimal path for transferring a computing result, the computing result of a first computing unit is transferred to a second computing unit by using the optimal path. Thus, occupying the video memory is avoided, and meanwhile, a problem of low utilization rate of the computing unit of a GPU caused by video memory swaps is avoided, so that training speed of most tasks is hardly reduced.

SEMICONDUCTOR DEVICE AND DATA PROCESSING SYSTEM SELECTIVELY OPERATING AS ONE OF A BIG ENDIAN OR LITTLE ENDIAN SYSTEM
20170345401 · 2017-11-30 ·

The present invention is to provide a semiconductor device that can correctly switch endians on the outside even if the endian of a parallel interface is not recognized on the outside. The semiconductor device includes a switching circuit and a first register. The switching circuit switches between whether a parallel interface with the outside is to be used as a big endian or a little endian. A first register holds control data of the switching circuit. The switching circuit regards the parallel interface as the little endian when first predetermined control information, that is unchanged in the values of specific bit positions even if its high-order and low-order bit positions are transposed, is supplied to the first register, and regards the parallel interface as the big endian when second predetermined control information, that is unchanged in the values of specific bit positions even if its high-order and low-order bit positions are transposed, is supplied to the first register. Whatever the endian setting status, the control information can be correctly inputted without being influenced by the endian setting status.

Semiconductor device and data processing system selectively operating as one of a big endian or little endian system

The present invention is to provide a semiconductor device that can correctly switch endians on the outside even if the endian of a parallel interface is not recognized on the outside. The semiconductor device includes a switching circuit and a first register. The switching circuit switches between whether a parallel interface with the outside is to be used as a big endian or a little endian. A first register holds control data of the switching circuit. The switching circuit regards the parallel interface as the little endian when first predetermined control information, that is unchanged in the values of specific bit positions even if its high-order and low-order bit positions are transposed, is supplied to the first register, and regards the parallel interface as the big endian when second predetermined control information, that is unchanged in the values of specific bit positions even if its high-order and low-order bit positions are transposed, is supplied to the first register. Whatever the endian setting status, the control information can be correctly inputted without being influenced by the endian setting status.

ACCESS NETWORK FOR ADDRESS MAPPING IN NON-VOLATILE MEMORIES
20170177470 · 2017-06-22 ·

Systems and methods for determining a physical block address (PBA) of a non-volatile memory (NVM) to enable a data access of a corresponding logical block address (LBA) are described. One such method includes generating a first physical block address (PBA) candidate from a LBA using a first function; generating a second physical block address (PBA) candidate from the LBA using a second function; and selecting either the first PBA candidate or the second PBA candidate for the data access based on information related to a background swap of data stored at the first PBA candidate and a background swap of data stored at the second PBA candidate.

DATA PROCESSING APPARATUS AND METHOD

A data processing apparatus comprises a processing circuit and instruction decoder. A bitfield manipulation instruction controls the processing apparatus to generate at least one result data element from corresponding first and second source data elements. Each result data element includes a portion corresponding to a bitfield of the corresponding first source data element. Bits of the result data element that are more significant than the inserted bitfield have a prefix value that is selected, based on a control value specified by the instruction, as one of a first prefix value having a zero value, a second prefix value having the value of a portion of the corresponding second source data element, and a third prefix value corresponding to a sign extension of the bitfield of the first source data element.

METHOD AND SYSTEM FOR REMOVING NOISE FROM DIGITAL DATA
20250053377 · 2025-02-13 ·

A method includes obtaining N data elements; arranging the N data elements into a list; carrying out a series of N sorting cycles which alternate between a first sorting cycle and a second sorting cycle, wherein: in each first sorting cycle, numerical values of pairs of data elements in a first set of adjacent positions are compared, and the positions of the data elements in each pair are selectively swapped based on a result of the respective comparison; in each second sorting cycle, the numerical values of pairs of data elements in a second set of adjacent positions are compared, and the positions of the data elements in each pair are selectively swapped based on a result of the respective comparison; and the N data elements are arranged into an ordered list which is ordered in order of their numerical values; either: (i) outputting the data element, or the numerical value of the data element, in position (N+1)/2 if N is an odd number; or outputting an average of the numerical values of the data elements in positions N/2 and (N+1)/2 if N is an even number; or (ii) outputting an average of the numerical values of a group of data elements centred on a mid-point of the ordered list; and using the output as the median value of the N data elements.

Compiler optimizations for vector instructions

An optimizing compiler includes a vector optimization mechanism that optimizes vector instructions by eliminating one or more vector element reverse operations. The compiler can generate code that includes multiple vector element reverse operations that are inserted by the compiler to account for a mismatch between the endian bias of the instruction and the endian preference indicated by the programmer or programming environment. The compiler then analyzes the code and reduces the number of vector element reverse operations to improve the run-time performance of the code.

Apparatus and method including an instruction for performing a logical operation on a repeating data value generated based on data size and control parameter portions specified by the instruction

A data processing apparatus comprises a processing circuit and instruction decoder. A bitfield manipulation instruction controls the processing apparatus to generate at least one result data element from corresponding first and second source data elements. Each result data element includes a portion corresponding to a bitfield of the corresponding first source data element. Bits of the result data element that are more significant than the inserted bitfield have a prefix value that is selected, based on a control value specified by the instruction, as one of a first prefix value having a zero value, a second prefix value having the value of a portion of the corresponding second source data element, and a third prefix value corresponding to a sign extension of the bitfield of the first source data element. Bitwise logical instructions are also described.

Compiler optimizations for vector instructions

An optimizing compiler includes a vector optimization mechanism that optimizes vector instructions by eliminating one or more vector element reverse operations. The compiler can generate code that includes multiple vector element reverse operations that are inserted by the compiler to account for a mismatch between the endian bias of the instruction and the endian preference indicated by the programmer or programming environment. The compiler then analyzes the code and reduces the number of vector element reverse operations to improve the run-time performance of the code.