Patent classifications
G06F9/355
PARALLEL PROCESSOR, ADDRESS GENERATOR OF PARALLEL PROCESSOR, AND ELECTRONIC DEVICE INCLUDING PARALLEL PROCESSOR
Disclosed is a parallel processor. The parallel processor includes a processing element array including a plurality of processing elements arranged in rows and columns, a row memory group including row memories corresponding to rows of the processing elements, a column memory group including column memories corresponding to columns of the processing elements, and a controller to generate a first address and a second address, to send the first address to the row memory group, and to send the second address to the column memory group. The controller supports convolution operations having mutually different forms, by changing a scheme of generating the first address.
Vector index registers
Disclosed herein are vector index registers in vector processors that each store multiple addresses for accessing multiple positions in vectors. It is known to use scalar index registers in vector processors to access multiple positions of vectors by changing the scalar index registers in vector operations. By using a vector indexing register for indexing positions of one or more operand vectors, the scalar index register can be replaced and at least the continual changing of the scalar index register can be avoided.
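The contrast drawn in this abstract can be illustrated with a minimal Python sketch. It is not the patented hardware: the function names, the list-based "registers", and the update counter are all assumptions made for illustration. The first function models a scalar index register that is rewritten before every element access; the second models a vector index register that holds all positions at once.

```python
def add_with_scalar_index(a, b, positions):
    """Baseline (illustrative): one scalar index register, changed per access."""
    result = []
    index_updates = 0
    for p in positions:
        scalar_index = p      # the scalar index register is rewritten here
        index_updates += 1
        result.append(a[scalar_index] + b[scalar_index])
    return result, index_updates


def add_with_vector_index(a, b, vector_index_register):
    """Illustrative vector index register: positions are stored once and
    consumed in order, so no index register is rewritten inside the loop."""
    return [a[i] + b[i] for i in vector_index_register]
```

Both functions return the same sums for the same positions; the difference the sketch highlights is only where the positions live and how often an index register is updated.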
SCALING SYSTEM AND CALCULATION METHOD
A scaling system ranges over a first base in which a first calculating apparatus having first and second worker nodes operates, and a second base in which a storage apparatus connected to the first base by a network is set. The storage apparatus includes first and second network ports, and first and second volumes accessed by the first and second worker nodes, respectively. A second calculating apparatus is further set in the first base. If the transfer rate of the first calculating apparatus or the transfer rate of the first network port exceeds a predetermined threshold while the first and second volumes are in communication with the first calculating apparatus through the first network port, the second worker node is moved to the second calculating apparatus and operates as a third worker node, and the second volume communicates with the third worker node through the second network port.
MESSAGE BASED MULTI-PROCESSOR SYSTEM AND METHOD OF OPERATING THE SAME
The present application discloses a message based multi-processor system (1) comprising a message exchange network (R,L) and a plurality of processor clusters (Ci,j) capable of mutually exchanging messages via the message exchange network. A processor cluster (Ci,j) comprises one or more processor cluster elements (PCE), and a message generator (MG). The message based multi-processor system (1) is configured as a neural network processor system having a plurality of neural network processing layers (e.g. NL1, ..., NL5), each being assigned one or more of the processor clusters with their associated processor cluster elements being neural network processing elements therein. The message generator (MG) of a processor cluster (Ci,j) (associated with a neural network processing layer) comprises a logic module (MGL) and an associated message generator control storage space (MGM), wherein the logic module of a message generator in response to an activation signal (Sact([X,Y])) of a processor cluster element is configured to selectively generate and transmit a message for each of a set of destination processor clusters in accordance with respective message generation control data (CD1, CD2, CD3) for said destination processor clusters stored in the message generator control storage space (MGM).
MULTI-DIMENSIONAL FFT COMPUTATION PIPELINED HARDWARE ARCHITECTURE USING RADIX-3 AND RADIX-2² BUTTERFLIES
A Radix-3 butterfly circuit includes a first FIFO input configured to couple to a first FIFO. The circuit includes a first adder and first subtractor coupled to the first FIFO input, and a second FIFO input configured to couple to a second FIFO. The circuit includes a second adder and second subtractor coupled to the second FIFO input, and an input terminal coupled to the first adder and first subtractor. The circuit includes a first scaler coupled to the second adder and a first multiplexer, and a second scaler coupled to a third adder and second multiplexer. The circuit includes a third scaler coupled to a third subtractor and third multiplexer. An output of the first multiplexer is coupled to a complex multiplier. An output of the second multiplexer is coupled to a second FIFO output. An output of the third multiplexer is coupled to a first FIFO output.
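For readers unfamiliar with the arithmetic behind a radix-3 butterfly, the following Python sketch computes a 3-point DFT using one add, one subtract, and two constant scalings (by 1/2 and by sqrt(3)/2). It shows only the standard mathematics; the FIFOs, multiplexers, and complex multiplier of the claimed circuit are not modeled, and the function names are assumptions.

```python
import cmath
import math


def radix3_butterfly(x0, x1, x2):
    """3-point DFT built from a sum, a difference, and two scalings."""
    s = x1 + x2                            # adder
    d = x1 - x2                            # subtractor
    half = 0.5 * s                         # scale by 1/2
    rot = 1j * (math.sqrt(3) / 2) * d      # scale by sqrt(3)/2, rotate by 90 degrees
    return x0 + s, x0 - half - rot, x0 - half + rot


def dft3(x):
    """Reference 3-point DFT from the definition, used only as a cross-check."""
    w = cmath.exp(-2j * cmath.pi / 3)
    return [sum(x[n] * w ** (k * n) for n in range(3)) for k in range(3)]
```

The butterfly form follows from the twiddle factor exp(-2πj/3) = -1/2 - j·sqrt(3)/2, which lets both non-trivial outputs share the same sum and difference terms.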
Address generation method, related apparatus, and storage medium
A system parses a very long instruction word (VLIW) to obtain an execution parameter. The system obtains a first sliding window width count, a first sliding window height count, a first feature map width count, and a first feature map height count that correspond to first target data. In accordance with a determination that the first sliding window width count falls within the sliding window width range, the first sliding window height count falls within the sliding window height range, the first feature map width count falls within the feature map width range, and the first feature map height count falls within the feature map height range, the system determines an offset of the first target data. The system also obtains a starting address of the first target data, and adds the starting address to the offset to obtain a first target address of the first target data.
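The range check followed by "starting address plus offset" can be sketched in Python as below. The abstract does not state how the offset is derived from the four counts, so the row-major formula, the `row_pitch` and `stride` parameters, the `limits` dictionary, and the choice to return `None` when a count is out of range are all assumptions made for illustration.

```python
def target_address(start, win_w, win_h, fm_w, fm_h, limits, row_pitch, stride=1):
    """Return start + offset if all four counts are in range, else None.

    limits maps each count name to a range object (hypothetical layout).
    """
    in_range = (
        win_w in limits["win_w"]
        and win_h in limits["win_h"]
        and fm_w in limits["fm_w"]
        and fm_h in limits["fm_h"]
    )
    if not in_range:
        return None  # assumed behavior; the abstract does not cover this case

    # Assumed row-major layout: the feature map counts select the window
    # origin, the sliding window counts select the element inside the window.
    row = fm_h * stride + win_h
    col = fm_w * stride + win_w
    offset = row * row_pitch + col
    return start + offset
```

For example, with `start=0x1000`, `row_pitch=8`, window counts `(1, 2)` and feature map counts `(3, 0)`, the assumed formula gives row 2, column 4, and an offset of 20.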
LOOK-UP TABLE INITIALIZE
A digital data processor includes an instruction memory storing instructions specifying a data processing operation and a data operand field, an instruction decoder coupled to the instruction memory for recalling instructions from the instruction memory and determining the operation and the data operand, and an operational unit coupled to a data register file and to an instruction decoder to perform a data processing operation upon an operand corresponding to an instruction decoded by the instruction decoder and storing results of the data processing operation. The operational unit is configured to perform a table write in response to a look up table initialization instruction by duplicating at least one data element from a source data register to create duplicated data elements, and writing the duplicated data elements to a specified location in a specified number of at least one table and a corresponding location in at least one other table.
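The table write described here can be pictured with a small Python sketch. It models tables as lists and the source register as a list of elements; the function name, argument names, and the in-memory layout are assumptions, and the real instruction encoding is not represented.

```python
def lut_init(tables, source_register, element, location, num_tables):
    """Duplicate one source element and write it to the same location
    in the first num_tables tables (illustrative model only)."""
    value = source_register[element]
    duplicated = [value] * num_tables          # the duplicated data elements
    for table, v in zip(tables[:num_tables], duplicated):
        table[location] = v                    # same location in each table
    return tables
```

Calling `lut_init` on four zero-filled tables with `num_tables=2` would fill the chosen location in the first two tables and leave the remaining two untouched.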
Hierarchical workload allocation in a storage system
A method for hierarchical workload allocation in a storage system is disclosed. The method may include determining to reallocate a compute workload of a current compute core of the storage system, wherein the current compute core is responsible for executing a workload allocation unit that comprises one or more first type shards; and reallocating the compute workload by (a) maintaining the responsibility of the current compute core for executing the workload allocation unit, and (b) reallocating at least one first type shard of the one or more first type shards to a new workload allocation unit that is allocated to a new compute core of new compute cores.
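Steps (a) and (b) can be sketched in Python with two dictionaries: one mapping workload allocation units to their shards, one mapping units to the core that executes them. The data structures and all names are hypothetical; the sketch shows only the bookkeeping, not how the storage system decides when to reallocate.

```python
def reallocate(unit_shards, unit_owner, unit_id, shards_to_move, new_unit_id, new_core):
    """Move some shards to a new unit on a new core; the old unit keeps its core."""
    moving = set(shards_to_move)
    # (b) remove the selected shards from the existing unit ...
    unit_shards[unit_id] = [s for s in unit_shards[unit_id] if s not in moving]
    # ... and place them in a new unit allocated to the new compute core.
    unit_shards[new_unit_id] = list(shards_to_move)
    unit_owner[new_unit_id] = new_core
    # (a) unit_owner[unit_id] is deliberately left unchanged.
    return unit_shards, unit_owner
```

The point of the two-level mapping is that only shard membership changes for the existing unit, while its assignment to the current compute core stays fixed.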