Patent classifications
G06F9/3851
Method for migrating CPU state from an inoperable core to a spare core
An apparatus is disclosed in which the apparatus may include a plurality of cores, including a first core, a second core and a third core, and circuitry coupled to the first core. The first core may be configured to process a plurality of instructions. The circuitry may be may be configured to detect that the first core stopped committing a subset of the plurality of instructions, and to send an indication to the second core that the first core stopped committing the subset. The second core may be configured to disable the first core from further processing instructions of the subset responsive to receiving the indication, and to copy data from the first core to a third core responsive to disabling the first core. The third core may be configured to resume processing the subset dependent upon the data.
JOB SCHEDULER TEST PROGRAM, JOB SCHEDULER TEST METHOD, AND INFORMATION PROCESSING APPARATUS
A non-transitory computer-readable storage medium storing therein a job scheduler test program that causes a computer to execute a process includes: determining whether or not a state of every thread of a test-target job scheduler is a standby state; and changing a time of a system referenced when the thread executes a process to a time that is put forward in a case where the state of every thread is the standby state.
Control registers to store thread identifiers for threaded loop execution in a self-scheduling reconfigurable computing fabric
Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array. A representative configurable circuit includes a configurable computation circuit and a configuration memory having a first, instruction memory storing a plurality of data path configuration instructions to configure a data path of the configurable computation circuit; and a second, instruction and instruction index memory storing a plurality of spoke instructions and data path configuration instruction indices for selection of a master synchronous input, a current data path configuration instruction, and a next data path configuration instruction for a next configurable computation circuit.
Channel-parallel compression with random memory access
A data compressor a zero-value remover, a zero bit mask generator, a non-zero values packer, and a row-pointer generator. The zero-value remover receives 2.sup.N bit streams of values and outputs 2.sup.N non-zero-value bit streams having zero values removed from each respective bit stream. The zero bit mask generator receives the 2.sup.N bit streams of values and generates a zero bit mask for a predetermined number of values of each bit stream in which each zero bit mask indicates a location of a zero value in the predetermined number of values corresponding to the zero bit mask. The non-zero values packer receives the 2.sup.N non-zero-value bit streams and forms a group of packed non-zero values. The row-pointer generator that generates a row-pointer for each group of packed non-zero values.
Method and apparatus for stateless parallel processing of tasks and workflows
In a method for parallel processing of a data stream, a processing task is received to process the data stream that includes a plurality of segments. A split operation is performed on the data stream to split the plurality of segments into N sub-streams. Each of the N sub-streams includes one or more segments of the plurality of segments. The N is a positive integer. N sub-processing tasks are performed on the N sub-streams to generate N processed sub-streams. A merge operation is performed on the N processed sub-streams based on a merge buffer to generate a merged output data stream. The merge buffer includes an output iFIFO buffer and N sub-output iFIFO buffers coupled to the output iFIFO buffer. The merged output data stream is identical to an output data stream that is generated when the processing task is applied directly to the data stream without the split operation.
Thread group scheduling for graphics processing
Embodiments are generally directed to thread group scheduling for graphics processing. An embodiment of an apparatus includes a plurality of processors including a plurality of graphics processors to process data; a memory; and one or more caches for storage of data for the plurality of graphics processors, wherein the one or more processors are to schedule a plurality of groups of threads for processing by the plurality of graphics processors, the scheduling of the plurality of groups of threads including the plurality of processors to apply a bias for scheduling the plurality of groups of threads according to a cache locality for the one or more caches.
THREAD SYNCHRONIZATION ACROSS MEMORY SYNCHRONIZATION DOMAINS
Various embodiments include a parallel processing computer system that provides multiple memory synchronization domains in a single parallel processor to reduce unneeded synchronization operations. During execution, one execution kernel may synchronize with one or more other execution kernels by processing outstanding memory references. The parallel processor tracks memory references for each domain to each portion of local and remote memory. During synchronization, the processor synchronizes the memory references for a specific domain while refraining from synchronizing memory references for other domains. As a result, synchronization operations between kernels complete in a reduced amount of time relative to prior approaches.
Inferring future value for speculative branch resolution
Aspects of the invention include includes determining a first instruction in a processing pipeline, wherein the first instruction includes a compare instruction, determining a second instruction in the processing pipeline, wherein the second instruction includes a conditional branch instruction relying on the compare instruction, determining a predicted result of the compare instruction, and completing the conditional branch instruction using the predicted result prior to executing the compare instruction.
METHOD AND APPARATUS TO SORT A VECTOR FOR A BITONIC SORTING ALGORITHM
A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.
EFFICIENT INTER-THREAD COMMUNICATION BETWEEN HARDWARE PROCESSING THREADS OF A HARDWARE MULTITHREADED PROCESSOR BY SELECTIVE ALIASING OF REGISTER BLOCKS
A hardware multithreaded processor including a register file, a thread controller, and aliasing circuitry. The thread controller is configured to assign each of multiple hardware processing threads to a corresponding one of multiple register block sets in which each register block set includes at least two of multiple register blocks and in which each register block includes at least two registers. The aliasing circuitry is programmable to redirect a reference provided by a first hardware processing thread to a register of a register block assigned to a second hardware processing thread. The reference may be a register number in an instruction issued by the first hardware processing thread. The register number is converted by the aliasing circuitry to a register file address locating a register of the register block assigned to the second hardware processing thread. The aliasing circuitry may include a programmable register for one or more threads.