Patent classifications
G06F7/785
Writing beyond a pointer
Data processing apparatuses and methods of data processing are disclosed wherein a processing element maintains a buffer in memory in support of the data processing it performs. A write pointer indicates the current write location in the buffer. A cache holds copies of the data subject to the data processing operations, and allocations into the cache from memory and write-backs from the cache to memory are performed in cache-line units of data. When the processing element performs a data write to the buffer at a location determined by the write pointer, the processing element updates the write pointer in an update direction corresponding to the progression direction of data writes in the buffer, and further locations in the progression direction between the location indicated by the write pointer and a boundary location are signalled to be written with a predetermined value.
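The mechanism above can be sketched in a few lines of Python. This is an illustrative software model only, not the claimed hardware: the cache-line size, the fill value, and all names are assumptions, and the "fill to the boundary" step models the hardware signalling the rest of the cache line as the predetermined value so the line need not be fetched from memory first.

```python
CACHE_LINE = 8       # locations per cache line (assumed size)
PREDETERMINED = 0    # the predetermined fill value (assumed)

def write_to_buffer(buffer, write_ptr, value):
    """Write `value` at `write_ptr`, advance the pointer in the progression
    direction, and fill the remaining locations up to the cache-line
    boundary with the predetermined value."""
    buffer[write_ptr] = value
    # boundary location: end of the cache line containing the written location
    boundary = (write_ptr // CACHE_LINE + 1) * CACHE_LINE
    new_ptr = write_ptr + 1                  # update direction = forward
    for loc in range(new_ptr, min(boundary, len(buffer))):
        buffer[loc] = PREDETERMINED          # signalled, not read from memory
    return new_ptr
```

In a real cache this avoids a read-for-ownership of the line: since everything beyond the pointer is known to hold the predetermined value, the line can be allocated in the cache without fetching stale data from memory.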
k-Selection Using Parallel Processing
In one embodiment, a method includes accessing a query vector; accessing object vectors; determining input distances, each corresponding to a distance between the query vector and one of the object vectors; accessing thread queues; accessing a warp queue; for each of the input distance values: selecting one of the thread queues, when the input distance value is less than the greatest distance value stored in the selected thread queue, inserting the input distance value into the selected thread queue and ejecting the greatest distance value stored in that thread queue, and when the greatest distance value stored in any of the thread queues is less than the greatest distance value stored in the warp queue, merging the thread queues with the warp queue; identifying the objects represented by the object vectors corresponding to the distance values stored in the warp queue; and providing the search results for presentation.
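A sequential Python sketch of this two-level k-selection follows. It is a simplification, not the claimed parallel method: the queue counts and lengths are assumed, the "threads" are simulated round-robin, and the insertion/merge conditions are condensed (a value is retained only if it could still be among the k smallest, and a full thread queue triggers a merge into the warp queue).

```python
import heapq

def k_select(distances, k, num_thread_queues=4, thread_queue_len=2):
    """Return the k smallest distances via per-thread queues that are
    periodically merged into a shared warp queue."""
    thread_queues = [[] for _ in range(num_thread_queues)]
    warp_queue = []  # max-heap via negation; holds the k smallest seen so far

    def greatest(q):
        return -q[0] if q else float("inf")

    def merge():
        # Fold every thread queue into the warp queue, keeping the k smallest
        # and ejecting the greatest values.
        for tq in thread_queues:
            while tq:
                d = -heapq.heappop(tq)
                if len(warp_queue) < k:
                    heapq.heappush(warp_queue, -d)
                elif d < greatest(warp_queue):
                    heapq.heapreplace(warp_queue, -d)

    for i, d in enumerate(distances):
        tq = thread_queues[i % num_thread_queues]  # "select" a thread queue
        # keep d only if it could still rank among the k smallest
        if len(warp_queue) < k or d < greatest(warp_queue):
            heapq.heappush(tq, -d)
        if len(tq) >= thread_queue_len:            # thread queue full: merge
            merge()
    merge()                                        # drain remaining candidates
    return sorted(-x for x in warp_queue)
```

The point of the two levels is bandwidth: most inserts touch only a small private thread queue, and the (more expensive) merge into the shared warp queue happens rarely.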
HARDWARE-CONFIGURABLE LOGIC UNIT AND MICROCONTROLLER HAVING SUCH A HARDWARE-CONFIGURABLE LOGIC UNIT
A hardware-configurable logic unit having a plurality of coarse-grain hardware elements and having a control element, the control element being set up to allow a configuration of the coarse-grain hardware elements to be modified.
Multidimensional partitioned storage array and method utilizing input shifters to allow multiple entire columns or rows to be accessed in a single clock cycle
A multidimensional storage array (SA) system includes storage elements (SEs) arranged in storage array partitions, a plurality of input shifters, and a plurality of output shifters. A respective input shifter and a respective output shifter are associated with each partition. The SEs are arranged into rows and columns and each store particular bit(s) of a data word. Each of the input shifters applies a positional shift to a data word that is then loaded into the associated partition. Each of the output shifters unloads a loaded data word, reverses the positional shift of the unloaded data word, and provides the data word to a requesting device, such as a decoder. The loaded data words are exposed so that multiple row- or column-addressed data words may be unloaded from the SA simultaneously in a single clock cycle. Multiple column- or row-addressed data word segments may be physically diagonally arranged within each storage array partition.
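The diagonal arrangement can be modelled in Python with the classic skewed-storage trick. This is a toy model under assumed dimensions, not the patented hardware: row r is rotated by r before loading, so every row and every column lands in N distinct partitions and can be accessed with one read per partition, i.e. in a single "cycle".

```python
N = 4  # number of partitions (assumed)

class SkewedArray:
    def __init__(self):
        # partitions[p][addr] models one storage partition
        self.partitions = [[None] * N for _ in range(N)]

    def load_row(self, r, row):
        shifted = row[r:] + row[:r]      # input shifter: rotate left by r
        for p in range(N):
            self.partitions[p][r] = shifted[p]

    def read_row(self, r):
        # one access per partition: a whole row in one cycle
        word = [self.partitions[p][r] for p in range(N)]
        return word[-r:] + word[:-r]     # output shifter: rotate back

    def read_col(self, c):
        # element (r, c) lives in partition (c - r) % N at address r, so a
        # column also hits every partition exactly once: one cycle as well
        return [self.partitions[(c - r) % N][r] for r in range(N)]
```

Without the skew, all elements of a column would sit in the same partition and a column read would take N cycles; the input/output shifters pay for single-cycle access in both orientations.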
Matrix transposing circuit
The disclosure provides a matrix transposing circuit for outputting a transposed N×N matrix. The matrix transposing circuit includes: an input register array arranged as an m×N array; a memory having b storage blocks; and an output register array arranged as an N×m array. N, m, n and b are integers that are powers of two, N is divisible by both m and n, and N = n·m·b. The matrix is divided into multiple m×n sub-matrices to form a Y matrix. Each of the sub-matrices is correspondingly stored to one of the b storage blocks. The input register array has a first shifting direction to receive entry data and a second shifting direction to output data to the b storage blocks. The output register array has a first shifting direction to read data from the b storage blocks and a second shifting direction to output the transposed matrix.
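The core mechanism is that data shifts in along one direction and shifts out along the perpendicular direction. A minimal Python sketch of that idea (omitting the sub-matrix blocking and the b storage blocks, and with all names assumed):

```python
def shift_register_transpose(matrix):
    """Model of a shift-register transpose: rows are shifted in along the
    first direction; reading out along the second (perpendicular) direction
    yields the transpose."""
    n = len(matrix)
    regs = [[] for _ in range(n)]  # one shift register chain per column
    # first shifting direction: push each incoming row across the registers
    for row in matrix:
        for c, value in enumerate(row):
            regs[c].append(value)
    # second shifting direction: shift values out column-wise
    return [[regs[c][r] for r in range(n)] for c in range(n)]
```

In the disclosed circuit the same two-direction trick is applied per m×n sub-matrix, with the b storage blocks buffering sub-matrices between the input and output register arrays so the full N×N transpose streams through limited register resources.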
SPATIAL ARCHITECTURE FOR ATTENTION MECHANISMS
The present specification discloses a computing device architecture and method for executing computationally intensive attention mechanisms, as commonly utilized in transformer models for artificial intelligence. In an embodiment, the device includes a row of interconnected processing elements (PEs), each linked to dedicated coefficient memory units (CRAMs) and controlled by a central controller. By employing a diagonal-offset and transposition technique for matrix coefficients, the architecture enables efficient execution of generalized matrix-vector multiplication (GEMV) operations.
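One plausible reading of the diagonal-offset technique can be sketched in Python. This is an interpretive model, not the disclosed design: each PE j is assumed to accumulate output element y[j], with its CRAM holding row j's coefficients in diagonal-offset order so that on each cycle every PE consumes a different element of the circulating input vector (no broadcast conflict).

```python
def gemv_diagonal(A, x):
    """Model of y = A @ x on a row of n PEs with diagonal-offset coefficients.
    At cycle t, PE j reads coefficient A[j][(j + t) % n] from its CRAM and
    multiplies it by input element x[(j + t) % n]."""
    n = len(A)
    # CRAM for PE j: row j stored in diagonal-offset order (assumed layout)
    cram = [[A[j][(j + t) % n] for t in range(n)] for j in range(n)]
    acc = [0] * n
    for t in range(n):          # n cycles
        for j in range(n):      # all PEs operate in parallel in hardware
            acc[j] += cram[j][t] * x[(j + t) % n]
    return acc
```

The offset means that at any cycle the n PEs touch n distinct input elements, so the input vector can rotate through the PE row one position per cycle instead of being broadcast to every PE at once.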