Patent classifications
G06F7/49915
OPERATION OF AN ACCUMULATION PROCESS IN A DATA PROCESSING APPARATUS
Herein described is a method of operating an accumulation process in a data processing apparatus. The accumulation process comprises a plurality of accumulations which output a respective plurality of accumulated values, each based on a stored value and a computed value generated by a data processing operation. The method comprises storing a first accumulated value, the first accumulated value being one of said plurality of accumulated values, into a first storage device comprising a plurality of single-bit storage elements; determining that a predetermined trigger has been satisfied with respect to the accumulation process; and in response to the determining, storing at least a portion of a second accumulated value, the second accumulated value being one of said plurality of accumulated values, into a second storage device.
TRACKING STREAMING ENGINE VECTOR PREDICATES TO CONTROL PROCESSOR EXECUTION
In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
Highly integrated scalable, flexible DSP megamodule architecture
Disclosed embodiments include a data processing apparatus having a processing core, a memory, and a streaming engine. The streaming engine is configured to receive a plurality of data elements stored in the memory and to provide the plurality of data elements as a data stream to the processing core, and includes an address generator to generate addresses corresponding to locations in the memory, a buffer to store the data elements received from the locations in the memory corresponding to the generated addresses, and an output to supply the data elements received from the memory to the processing core as the data stream.
FLOATING POINT TO FIXED POINT CONVERSION USING EXPONENT OFFSET
A binary logic circuit converts a number in floating point format having an exponent E, an exponent bias B=2.sup.ew-1−1, and a significand comprising a mantissa M of mw bits into a fixed point format with an integer width of iw bits and a fractional width of fw bits. The circuit includes an offset unit configured to offset the exponent of the floating point number by an offset value equal to (iw−/−s.sub.y) to generate a shift value s.sub.v of sw bits given by s.sub.v=(B−E)+(iw−1−s.sub.y), the offset value being equal to a maximum amount by which the significand can be left-shifted before overflow occurs in the fixed point format; a right-shifter operable to receive a significand input comprising a formatted set of bits derived from the significand, the shifter being configured to right-shift the input by a number of bits equal to the value represented by k least significant bits of the shift value to generate an output result, where bitwidth[min(2.sup.ew-1−1, iw−1−s.sub.y)+min(2.sup.ew-1−2, fw)]≤k≤sw, where s.sub.y=1 for a signed floating point number and s.sub.y=0 for an unsigned floating point number.
Method and apparatus for vector sorting
A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values are sorted in an order indicated by the vector sort instruction, and storing the sorted vector in a storage location.
ARGMAX USE FOR MACHINE LEARNING
A processing device is provided which comprises memory and a processor. The processor is configured to receive an array of floating point numbers each having a plurality of bits used to represent a probability value. For each floating point number, the processor is configured to replace values in a portion of the bits used to represent the probability value with index values to represent an index corresponding to a location of a corresponding floating point number in the memory. The processor is also configured to process the floating point numbers using SIMD instructions to execute one of an argmax operation and an argmin operation.
TWO ADDRESS TRANSLATIONS FROM A SINGLE TABLE LOOK-ASIDE BUFFER READ
A streaming engine employed in a digital data processor specifies a fixed read only data stream. An address generator produces virtual addresses of data elements. An address translation unit converts these virtual addresses to physical addresses by comparing the most significant bits of a next address N with the virtual address bits of each entry in an address translation table. Upon a match, the translated address is the physical address bits of the matching entry and the least significant bits of address N. The address translation unit can generate two translated addresses. If the most significant bits of address N+1 match those of address N, the same physical address bits are used for translation of address N+1. The sequential nature of the data stream increases the probability that consecutive addresses match the same address translation entry and can use this technique.
KERNEL COEFFICIENT QUANTIZATION
Apparatuses, systems, and techniques to optimize memory usage when performing matrix operations. In at least one embodiment, a matrix is optimized to limit memory and storage requirements while minimizing loss of precision for a sum of the members of the matrix.
Tracking streaming engine vector predicates to control processor execution
In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
PROCESSOR AND SYSTEM TO MANIPULATE FLOATING POINT AND INTEGER VALUES IN COMPUTATIONS
Systems and techniques to convert data value types. In at least one embodiment, data value types are converted by adjusting data floating point numbers to identify integer values and adjusting integer values to identify floating point numbers without applying mathematical operations with respect to conversions.