Patent classifications
G06F9/3885
DYNAMIC WORKLOAD DISTRIBUTION FOR DATA PROCESSING
A computer-implemented method, according to one embodiment, includes receiving a data process that includes a plurality of sub-processes. A unique subset of the sub-processes is assigned to each of a managing thread and at least one other thread. Moreover, performance characteristics of each of the threads are evaluated while the respective subsets of sub-processes are being performed, and a determination is made as to whether the performance characteristics of each of the threads are substantially equal to those of each of the other threads. In response to determining that the performance characteristics of the threads are not substantially equal, the subsets of the sub-processes are dynamically adjusted such that the performance characteristics of the threads become more equal. Moreover, the adjusted subsets of the sub-processes are reassigned to the managing thread and the at least one other thread.
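A minimal Python sketch of the rebalancing idea described above. The abstract does not specify an adjustment algorithm, so the greedy move policy, function names, and data shapes here are illustrative assumptions: each thread owns a subset of sub-processes, and sub-processes are shifted from the slowest thread to the fastest until loads are close.

```python
def rebalance(assignments, cost, max_moves=100):
    """Greedy rebalancing sketch (illustrative policy; the patent does
    not specify one). assignments maps a thread name to the list of
    sub-process ids it owns; cost maps a sub-process id to its
    measured execution time."""
    def load(thread):
        return sum(cost[s] for s in assignments[thread])

    for _ in range(max_moves):
        slow = max(assignments, key=load)
        fast = min(assignments, key=load)
        gap = load(slow) - load(fast)
        if len(assignments[slow]) <= 1:
            break
        # Moving the cheapest sub-process helps only if its cost is
        # strictly smaller than the current load gap.
        candidate = min(assignments[slow], key=lambda s: cost[s])
        if cost[candidate] >= gap:
            break
        assignments[slow].remove(candidate)
        assignments[fast].append(candidate)
    return assignments
```

After rebalancing, the adjusted subsets would be reassigned to the managing thread and the other thread(s), as the abstract describes.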
Apparatus and method of a scalable and reconfigurable fast Fourier transform
A novel design for a conflict-free address generation mechanism is provided for reading data from Block RAM (BRAM) into a Fast Fourier Transform (FFT) module and writing the processed data back to the BRAM. Also presented is a novel way of reducing the memory footprint by reducing the twiddle factor table size, taking advantage of the symmetry property of twiddle factors. Further, additional architecture-specific optimizations are provided, involving deeply pipelined butterfly modules and BRAM accesses, parallel butterfly modules for a single FFT block, and a parallel FFT lane implementation.
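The twiddle-factor symmetry can be illustrated with a small Python sketch. The patent's exact compression scheme is not given in the abstract; this example shows one well-known instance of the idea: a radix-2 FFT needs W_N^k for k in [0, N/2), but storing only the first quadrant k in [0, N/4) halves the table, since W_N^(N/4 + m) = -i * W_N^m.

```python
import cmath

def twiddle_table(n):
    """Store only the first quadrant of twiddle factors
    W_N^k = exp(-2*pi*i*k/N), for k in [0, N/4)."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 4)]

def twiddle(table, n, k):
    """Recover W_N^k for any k in [0, N/2) from the quarter-size
    table, using W_N^(N/4 + m) = -i * W_N^m."""
    quarter = n // 4
    if k < quarter:
        return table[k]
    return -1j * table[k - quarter]
```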
Parallel processing apparatus of controlling node activation timing, node activation method of controlling node activation timing, and non-transitory computer-readable storage medium for storing program of controlling node activation timing
A parallel processing apparatus includes: a first node including a storage unit that stores a program, the first node being activated when the program loaded from the storage unit is executed; a second node activated when the program loaded from the storage unit of the first node is executed; and a control unit configured to execute a setting process for setting a state where the program may be loaded to each of the first node and the second node, wherein the control unit starts the setting process on the second node after a predetermined time elapses since start of the setting process on the first node, and wherein the predetermined time is set such that an activation completion timing of the first node is aligned with a completion timing of the setting process on the second node.
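The timing alignment reduces to simple arithmetic, sketched below in Python. The parameter names are assumptions; the constraint is that the delay plus the second node's setting time equals the first node's setting time plus its activation time, so both events complete together.

```python
def stagger_delay(first_setup, first_boot, second_setup):
    """Delay after which the control unit should start the setting
    process on the second node, so that the first node's activation
    completion (first_setup + first_boot) coincides with the second
    node's setting completion (delay + second_setup). Illustrative
    arithmetic; parameter names are assumptions."""
    return max(0, first_setup + first_boot - second_setup)
```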
DATA PROCESSING SYSTEM, DATA TRANSFER DEVICE, AND CONTEXT SWITCHING METHOD
A processing section executes processes concerning a plurality of applications in a time division manner. A CSDMA engine detects a switching timing of an application to be executed in the processing section. When detecting the switching timing, the CSDMA engine saves a context of the application that is being executed in the processing section to a main memory, and installs a context of an application to be subsequently executed from the main memory to the processing section, without going through a process by the software managing the plurality of applications.
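The save/restore behaviour of such an engine can be sketched in a few lines of Python. Class and field names are illustrative, and the real engine operates on hardware register state via DMA rather than dictionaries; the sketch only shows the sequencing: save the running context to main memory, then install the next one.

```python
class CSDMAEngine:
    """Minimal sketch of the described context-switch behaviour:
    on a detected switch, save the running application's context to
    main memory and install the next application's context, without
    involving the managing software. Names are illustrative."""
    def __init__(self):
        self.main_memory = {}        # saved contexts, keyed by app id
        self.processing_context = None
        self.running_app = None

    def switch(self, next_app):
        if self.running_app is not None:
            self.main_memory[self.running_app] = self.processing_context
        self.processing_context = self.main_memory.get(next_app, {})
        self.running_app = next_app
```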
PERFORMING TESTING UTILIZING STAGGERED CLOCKS
During functional/normal operation of an integrated circuit including multiple independent processing elements, a selected independent processing element is taken offline and the functionality of the selected independent processing element is then tested while the remaining independent processing elements continue functional operation. To minimize voltage drops resulting from current fluctuations produced by the testing of the processing element, clocks used to synchronize operations within each partition of a processing element are staggered. This varies the toggle rate within each partition of the processing element during the testing, thereby reducing the resulting voltage drop. This may also improve test quality within an automated test equipment (ATE) environment.
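Why staggering helps can be shown with a toy Python model. With aligned clocks every partition toggles on the same time step, so the peak simultaneous-edge count (a proxy for peak current) equals the partition count; staggered phases spread the edges out. The numbers and the edge-counting model are illustrative, not taken from the patent.

```python
def peak_toggles(num_partitions, period, phases, horizon):
    """Count, per discrete time step, how many partitions see a
    clock edge, and return the worst case over the horizon. A proxy
    for the peak current transient the abstract describes."""
    peak = 0
    for t in range(horizon):
        edges = sum(1 for p in range(num_partitions)
                    if (t - phases[p]) % period == 0)
        peak = max(peak, edges)
    return peak
```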
PARALLEL CROSS VALIDATION IN COLLABORATIVE MACHINE LEARNING
A computer-implemented method, a computer program product, and a computer system for parallel cross validation in collaborative machine learning. A server groups local models into groups. In each group, each local device uses its local data to validate accuracies of the local models and sends a validation result to a group leader or the server. The group leader or the server selects groups whose variances of the accuracies are not below a predetermined variance threshold. In each selected group, the group leader or the server compares an accuracy of each local model with an average value of the accuracies and randomly selects one or more local models whose accuracies do not exceed a predetermined accuracy threshold. The server obtains weight parameters of selected local models and updates the global model based on the weight parameters.
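The two selection steps can be sketched in Python. The grouping, the leader/server roles, and the random choice among under-threshold models are simplified away; the function names and thresholds are illustrative assumptions, not the patent's terminology.

```python
from statistics import mean, pvariance

def select_groups(groups, var_threshold):
    """Keep groups whose accuracy variance is not below the
    threshold, per the described group-selection step."""
    return {g: accs for g, accs in groups.items()
            if pvariance(accs) >= var_threshold}

def candidates(accs, acc_threshold):
    """Indices of models whose accuracy does not exceed the
    threshold and falls below the group average (the pool from which
    the abstract says one or more models are randomly selected)."""
    avg = mean(accs)
    return [i for i, a in enumerate(accs) if a <= acc_threshold and a < avg]
```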
Micro-processor circuit and method of performing neural network operation
A micro-processor circuit and a method of performing neural network operation are provided. The micro-processor circuit is suitable for performing neural network operation. The micro-processor circuit includes a parameter generation module, a compute module and a truncation logic. The parameter generation module receives in parallel a plurality of input parameters and a plurality of weight parameters of the neural network operation. The parameter generation module generates in parallel a plurality of sub-output parameters according to the input parameters and the weight parameters. The compute module receives in parallel the sub-output parameters. The compute module sums the sub-output parameters to generate a summed parameter. The truncation logic receives the summed parameter. The truncation logic performs a truncation operation based on the summed parameter to generate a plurality of output parameters of the neural network operation.
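A fixed-point Python sketch of the described pipeline: the parallel products stand in for the sub-output parameters, their sum for the summed parameter, and a right shift plus clamp for the truncation operation. The bit widths are assumptions; the abstract does not fix them.

```python
def neuron_output(inputs, weights, frac_bits=8, out_bits=8):
    """Multiply inputs by weights (sub-output parameters), sum them
    (summed parameter), then truncate the sum to the output width.
    Bit widths are illustrative assumptions."""
    sub_outputs = [x * w for x, w in zip(inputs, weights)]
    summed = sum(sub_outputs)
    # Truncation: drop fractional bits, then clamp to the output range.
    truncated = summed >> frac_bits
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, truncated))
```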
INSTRUCTION GENERATING METHOD, ARITHMETIC PROCESSING DEVICE, AND INSTRUCTION GENERATING DEVICE
With respect to a method of generating an instruction to be executed by an arithmetic processing device including first blocks, each of the first blocks including execution sections, the method includes generating, by at least one processor, at least one data transfer instruction that causes the arithmetic processing device to perform at least one of first data transfers, second data transfers, third data transfers, or fourth data transfers. Transfer sources of the first data transfers are execution sections, transfer destinations of the first data transfers are execution sections, transfer sources of the second data transfers are first blocks, transfer destinations of the second data transfers are first blocks, transfer sources of the third data transfers are first blocks, transfer destinations of the third data transfers are execution sections, transfer sources of the fourth data transfers are execution sections, and transfer destinations of the fourth data transfers are first blocks.
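The four transfer kinds enumerated above reduce to the four (source, destination) pairings of the two endpoint types, which a short Python sketch makes explicit. The names are illustrative, not the patent's.

```python
from enum import Enum

class Endpoint(Enum):
    EXECUTION_SECTION = "execution section"
    BLOCK = "first block"

# The four described data transfers, as (source, destination) pairs.
TRANSFER_KINDS = [
    (Endpoint.EXECUTION_SECTION, Endpoint.EXECUTION_SECTION),  # first
    (Endpoint.BLOCK, Endpoint.BLOCK),                          # second
    (Endpoint.BLOCK, Endpoint.EXECUTION_SECTION),              # third
    (Endpoint.EXECUTION_SECTION, Endpoint.BLOCK),              # fourth
]
```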
Communication in a computer having multiple processors
A computer comprising a plurality of processors, each of which is configured to perform operations on data during a compute phase for the computer and, following a pre-compiled synchronisation barrier, to exchange data with at least one other of the processors during an exchange phase for the computer, wherein each of the processors in the computer is indexed and the data exchange operations carried out by each processor in the exchange phase depend upon its index value.
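A toy Python round of the compute/barrier/exchange pattern. The choice of exchange peer from the processor's index (here, a ring neighbour) is an illustrative policy, not the patent's; the point is only that each processor's exchange is a function of its index.

```python
def bsp_step(values):
    """One compute/exchange round: every processor computes, all
    results are ready at the (implicit) barrier, then processor i
    receives from the peer its index selects: (i + 1) mod n."""
    n = len(values)
    computed = [v * v for v in values]                 # compute phase
    # ...synchronisation barrier (implicit: all results ready)...
    return [computed[(i + 1) % n] for i in range(n)]   # exchange phase
```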
Sampling-based preview mode for a data intake and query system
Systems and methods are described for providing a user interface through which a user can program operation of a data processing pipeline by specifying a graph of nodes that transform data and interconnections that designate routing of data between individual nodes within the graph. In response to a user request, a preview mode can be activated that causes the data processing pipeline to retrieve data from at least one source specified by the graph, transform the data according to the nodes of the graph, sample the transformed data, and display the sampling of the transformed data for at least one node, without writing the transformed data to at least one destination specified by the graph.
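A minimal Python sketch of the preview mode: pull from the source, apply the graph's transforms, and return a sample rather than writing to the destination. The linear transform list (standing in for the node graph), the function names, and the sampling policy are illustrative assumptions.

```python
import random

def preview(source, transforms, sample_size, seed=0):
    """Preview-mode sketch: retrieve records from a source, apply
    each transform in order, and return a random sample instead of
    writing the results to any destination."""
    records = list(source)
    for fn in transforms:
        records = [fn(r) for r in records]
    rng = random.Random(seed)   # seeded for reproducible previews
    return rng.sample(records, min(sample_size, len(records)))
```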