Patent classifications
G06F9/3885
Intra-shard parallelization of data stream processing using virtual shards
A data stream may include a plurality of records that are ordered, and the plurality of records may be assigned to a processing shard. A first set of virtual shards may be formed, the first set of virtual shards having a first quantity of virtual shards that perform parallel processing operations on behalf of the processing shard. First records of the plurality of records may be processed using the first set of virtual shards. The first quantity of virtual shards may be modified, based at least in part on an observed record age, to a second quantity of virtual shards that perform parallel processing operations on behalf of the processing shard. A second set of virtual shards may be formed having the second quantity of virtual shards. Second records of the plurality of records may be processed using the second set of virtual shards.
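The mechanism described above can be sketched in a few lines: records are hash-partitioned across a set of virtual shards so per-key ordering is preserved, and the number of virtual shards is grown or shrunk when the observed record age (processing lag) drifts outside a target band. This is an illustrative sketch only; the function names, thresholds, and doubling/halving policy are assumptions, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Record:
    key: str
    payload: str

def assign_virtual_shard(record: Record, num_virtual_shards: int) -> int:
    # Hash-partition by key so records sharing a key stay ordered
    # within a single virtual shard.
    return hash(record.key) % num_virtual_shards

def scale_virtual_shards(current: int, observed_record_age_s: float,
                         high_water_s: float = 30.0,
                         low_water_s: float = 5.0,
                         max_shards: int = 16) -> int:
    # Grow the virtual-shard set when records are aging (the shard is
    # falling behind); shrink it when the shard comfortably keeps up.
    if observed_record_age_s > high_water_s and current < max_shards:
        return current * 2
    if observed_record_age_s < low_water_s and current > 1:
        return max(1, current // 2)
    return current
```

For example, a shard running 4 virtual shards with a 45-second observed record age would scale up to 8; the same shard with a 1-second age would scale down to 2.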
Parallel processing apparatus, storage medium, and job management method
A parallel processing apparatus includes a plurality of compute nodes and a job management device that allocates computational resources of the compute nodes to jobs. The job management device includes circuitry configured to determine a resource search time range based on the respective scheduled execution time periods of a plurality of jobs, including a job being executed and a job waiting for execution, and to search, by backfill scheduling, for free computational resources of the plurality of compute nodes within the resource search time range to be allocated to a waiting job that is the processing target.
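A minimal backfill-style sketch of the search described above, under assumed simplifications (integer time steps, per-node busy intervals): the search window is bounded by the scheduled execution periods of running and queued jobs, and the scheduler looks for the earliest time inside that window at which enough nodes are simultaneously free for the target job's duration. All names and the unit-step search are illustrative, not the patented method.

```python
def resource_search_range(jobs):
    # jobs: list of (start, end) scheduled execution periods for
    # running and waiting jobs; the window spans all of them.
    return min(s for s, _ in jobs), max(e for _, e in jobs)

def find_backfill_slot(busy, need_nodes, duration, window):
    # busy: per-node list of (start, end) reserved intervals.
    # Returns the earliest start in the window where at least
    # need_nodes nodes are free for the full duration, else None.
    lo, hi = window
    t = lo
    while t + duration <= hi:
        free = sum(
            1 for node in busy
            if all(t + duration <= s or t >= e for s, e in node)
        )
        if free >= need_nodes:
            return t
        t += 1  # advance in unit time steps for simplicity
    return None
```

With three nodes busy as `[[(0, 5)], [(0, 2)], []]`, a 2-node job of duration 3 inside window `(0, 10)` can be backfilled at t = 2, once the second node's reservation ends.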
Memory-network processor with programmable optimizations
Various embodiments are disclosed of a multiprocessor system with processing elements optimized for high performance and low power dissipation and an associated method of programming the processing elements. Each processing element may comprise a fetch unit, a plurality of address generator units, and a plurality of pipelined datapaths. The fetch unit may be configured to receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields. First and second address generator units may generate, based on different fields of the multi-part instruction, addresses from which to retrieve first and second data for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction. The execution units may perform operations using a single pipeline or multiple pipelines based on third and fourth fields of the multi-part instruction.
Application and integration of a GPU server system
A graphics processing unit (GPU) server having a GPU host head with one or more host graphics processing units (GPUs). The GPU server further has a GPU system with a plurality of system GPUs that are separate from the host GPUs, and that are configured to rapidly accelerate creation of images for output to a display device. The GPU server also has a mounting assembly that integrates the GPU host head and the GPU system into a single GPU server unit. The GPU host head is independently movable relative to the GPU system.
Executing multiple programs simultaneously on a processor core
Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block.
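The temporal allocation idea above, in which each context is guaranteed access to its share of a resource on a clock-cycle basis, can be illustrated with a simple weighted round-robin. This is a software sketch of the scheduling policy only; the function name and weight scheme are assumptions, not the processor's actual hardware arbitration.

```python
def temporal_schedule(allocations, num_cycles):
    # allocations: {context: weight}. Each cycle grants the shared
    # resource to one context, in proportion to its weight, so every
    # context with a nonzero weight is guaranteed cycles and cannot
    # be starved by competing contexts.
    order = [ctx for ctx, w in allocations.items() for _ in range(w)]
    return [order[c % len(order)] for c in range(num_cycles)]
```

For instance, two contexts with a 2:1 allocation share six cycles as A, A, B, A, A, B.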
Apparatus and method for multi-adapter encoding
An apparatus and method for multi-adapter and/or multi-pass encoding on dual graphics processors. For example, one embodiment of a processor comprises: a central processor integrated on a first die, the central processor comprising a plurality of cores to execute instructions and process data; a first graphics processor integrated on the first die, the first graphics processor comprising media processing circuitry to perform one or more preliminary lookahead operations on video content to generate lookahead statistics; an interconnect to couple the first graphics processor to a lookahead buffer, the first graphics processor to transmit the lookahead statistics over the interconnect to the lookahead buffer; wherein the lookahead statistics are to be used by a second graphics processor to encode the video content to generate encoded video.
Upgrade management method and scheduling node, and storage system
A storage system and method for performing upgrade management are provided. A plurality of nodes in the storage system are divided into groups. A scheduling node obtains a constraint condition of each group. The constraint condition comprises a maximum quantity of nodes in the corresponding group that are allowed to be upgraded in parallel. Nodes in a first batch for upgrading in parallel are selected from the groups based on the constraint condition of each group. An upgrade instruction is sent to the nodes in the first batch.
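The batch-selection step above can be sketched as a greedy pass over the groups: the first batch takes at most each group's parallel-upgrade limit from that group. The data shapes and names here are illustrative assumptions, not the patented scheduling logic.

```python
def select_first_batch(groups, max_parallel):
    # groups: {group_name: [node, ...]}
    # max_parallel: {group_name: max nodes upgradable in parallel},
    # i.e. each group's constraint condition.
    batch = []
    for name, nodes in groups.items():
        batch.extend(nodes[: max_parallel[name]])
    return batch
```

For example, with group g1 holding three nodes (limit 2) and group g2 holding two nodes (limit 1), the first batch upgrades two nodes from g1 and one from g2 in parallel.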
GRAPHICS PROCESSING
There is disclosed an instruction that can be included in a graphics processor shader program to be executed by a group of execution threads. When executed, the instruction causes a group of execution lanes to enter an 'active' (e.g. SIMD) execution state in which active-state processing operations can be performed using the group of plural execution lanes together. The processing operations can then be performed using the execution lanes in the active state together, after which the execution lanes are allowed or caused to return to their prior execution state.
Arbitrating throttling recommendations for a systolic array
Throttling recommendations for a systolic array may be arbitrated. Throttling recommendations may be received at an arbiter for a systolic array from different sources, such as one or more monitors implemented in an integrated circuit along with the systolic array or sources external to the integrated circuit with the systolic array. A strongest throttling recommendation may be selected. The rate at which data enters the systolic array may be modified according to the strongest throttling recommendation.
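The arbitration described above reduces to picking the most restrictive of the incoming recommendations and scaling the array's input rate by it. In this hedged sketch, a recommendation is modeled as a stall fraction (0.0 = no throttling); that representation, and both function names, are assumptions for illustration.

```python
def arbitrate(recommendations):
    # recommendations: stall fractions from monitors on the integrated
    # circuit and from external sources. The strongest recommendation
    # is the largest stall fraction; no recommendations means no
    # throttling.
    return max(recommendations, default=0.0)

def effective_input_rate(base_rate, recommendations):
    # Data enters the systolic array at the base rate reduced by the
    # strongest recommended throttle.
    return base_rate * (1.0 - arbitrate(recommendations))
```

With recommendations of 10%, 50%, and 25%, the arbiter selects 50%, halving the rate at which data enters the array.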
Systems and methods for quantum processor topology
Topologies for analog computing systems may include cells of qubits which may implement a tripartite graph and cross substantially orthogonally. Qubits may have an H-shape or an l-shape, qubits may change direction within a cell. Topologies may be comprised of two or more different sub-topologies. Qubits may be communicatively coupled to non-adjacent cells by long-range couplers. Long-range couplers may change direction within a cell. A cell may have two or more different type of long-range couplers. A cell may have shifted qubits, more than one type of inter-cell couplers, more than one type of intra-cell couplers and long-range couplers.