Patent classifications
G06F2009/3883
Buffer checker for task processing fault detection
A graphics processing system for operation with a data store, comprising: one or more processing units for processing tasks; a check unit operable to form a signature which is characteristic of an output from processing a task on a processing unit; and a fault detection unit operable to compare signatures formed at the check unit; wherein the graphics processing system is operable to process each task first and second times at the one or more processing units so as to, respectively, generate first and second processed outputs, the graphics processing system being configured to: write out the first processed output to the data store; read back the first processed output from the data store and form at the check unit a first signature which is characteristic of the first processed output as read back from the data store; form at the check unit a second signature which is characteristic of the second processed output; compare the first and second signatures at the fault detection unit; and raise a fault signal if the first and second signatures do not match.
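The write-out, read-back, and compare sequence in this abstract can be illustrated with a minimal software sketch. It is not the claimed hardware: the `DataStore` class, the `process_with_check` helper, and the use of CRC-32 as the signature are all assumptions made for illustration, since the abstract does not name a signature function.

```python
import zlib


def signature(data: bytes) -> int:
    """Form a signature characteristic of an output (CRC-32 is an assumption)."""
    return zlib.crc32(data)


class DataStore:
    """Hypothetical stand-in for the data store the outputs are written to."""

    def __init__(self):
        self._buffers = {}

    def write(self, key, data: bytes) -> None:
        self._buffers[key] = bytes(data)

    def read(self, key) -> bytes:
        return self._buffers[key]


def process_with_check(task, process, store, key="out") -> bool:
    """Process a task twice and return True if a fault signal should be raised."""
    # First pass: write the output to the data store, then read it back
    # so the signature also covers the path through the store.
    store.write(key, process(task))
    first_signature = signature(store.read(key))

    # Second pass: form the signature directly on the processed output.
    second_signature = signature(process(task))

    # Fault detection: the signatures must match.
    return first_signature != second_signature


class FaultyStore(DataStore):
    """Hypothetical store that flips one bit on read, to exercise the check."""

    def read(self, key) -> bytes:
        data = bytearray(super().read(key))
        data[0] ^= 0x01
        return bytes(data)


def double(task: bytes) -> bytes:
    return bytes((2 * x) % 256 for x in task)


fault_clean = process_with_check(b"\x01\x02\x03", double, DataStore())
fault_corrupt = process_with_check(b"\x01\x02\x03", double, FaultyStore())
```

Because the first signature is formed on the data as read back, a corruption introduced in the store is caught even when both processing passes produce identical outputs.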
Grouping of Pauli observables using Bell measurements
The illustrative embodiments provide a method, system, and computer program product. In an embodiment, a method includes receiving a set of Pauli observables and initializing a measurement basis, the measurement basis comprising a set of Pauli bases equivalent to a number of qubits of a quantum processor. The method includes creating a list of a set of Bell basis candidates, each of the set of Bell basis candidates configured to measure at least one of the set of Pauli observables. The method further includes selecting a Bell basis candidate from the set of Bell basis candidates and reconfiguring the measurement basis to replace a subset of the set of Pauli bases with the selected Bell basis candidate.
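One way to picture the steps in this abstract is the greedy sketch below. It is an illustration under stated assumptions, not the patented method: observables are written as Pauli strings such as "XXI", a Bell basis candidate is taken to be a pair of qubits, and the selection rule (pick the pair that makes the most observables measurable) is assumed, since the abstract does not say how a candidate is selected. The function names are hypothetical. The sketch relies on the fact that a Bell measurement on two qubits yields the values of XX, YY, and ZZ on that pair.

```python
from itertools import combinations


def measurable(obs, basis, pairs):
    """Return True if the Pauli string `obs` can be read from one measurement.

    basis: per-qubit single-qubit Pauli basis, e.g. ["Z", "Z", "Z"]
    pairs: qubit pairs (i, j) measured in the Bell basis instead
    """
    paired = set()
    for i, j in pairs:
        paired.update((i, j))
        # A Bell measurement gives II, XX, YY and ZZ on the pair, nothing else.
        if obs[i] != obs[j]:
            return False
    # Remaining qubits: the observable must be I or match the qubit's basis.
    return all(obs[q] in ("I", basis[q]) for q in range(len(obs)) if q not in paired)


def count_measurable(observables, basis, pairs):
    return sum(measurable(o, basis, pairs) for o in observables)


def add_one_bell_pair(observables, basis, pairs=frozenset()):
    """Replace two single-qubit bases with one Bell pair if that helps."""
    used = {q for p in pairs for q in p}
    # Candidates: free qubit pairs on which some observable acts as XX, YY or ZZ.
    candidates = [
        (i, j)
        for i, j in combinations(range(len(basis)), 2)
        if i not in used and j not in used
        and any(o[i] == o[j] != "I" for o in observables)
    ]
    if not candidates:
        return pairs
    # Assumed selection rule: maximize the number of measurable observables.
    best = max(candidates, key=lambda p: count_measurable(observables, basis, pairs | {p}))
    if count_measurable(observables, basis, pairs | {best}) <= count_measurable(observables, basis, pairs):
        return pairs
    return pairs | {best}


observables = ["XXI", "YYI", "ZZI", "IIZ"]
basis = ["Z", "Z", "Z"]  # initial measurement basis, one Pauli basis per qubit
before = count_measurable(observables, basis, frozenset())
pairs = add_one_bell_pair(observables, basis)
after = count_measurable(observables, basis, pairs)
```

In this small example the all-Z basis measures only two of the four observables, while replacing the bases on qubits 0 and 1 with a Bell pair covers all four in a single measurement setting.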
GRAPHICS PROCESSING UNIT SYSTEMS FOR PERFORMING DATA ANALYTICS OPERATIONS IN DATA SCIENCE
Systems and methods are provided for efficiently performing processing intensive operations, such as those involving large volumes of data, that enable accelerated processing time of these operations. In at least one embodiment, a system includes a graphics processor unit (GPU) including a memory and a plurality of cores. The plurality of cores perform a plurality of data analytics operations on a respectively allocated portion of a dataset, each of the plurality of cores using only the memory to store data input for each of the plurality of data analytics operations performed by the plurality of cores. The data storage for the plurality of data analytics operations performed by the plurality of cores is also provided solely by the memory.
LOW POWER AND LOW LATENCY GPU COPROCESSOR FOR PERSISTENT COMPUTING
Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
DISTRIBUTING POWER SHARED BETWEEN AN ACCELERATED PROCESSING UNIT AND A DISCRETE GRAPHICS PROCESSING UNIT
An integrated coprocessor such as an accelerated processing unit (APU) generates commands for execution on a discrete coprocessor such as a discrete graphics processing unit (dGPU). Power distribution circuitry selectively provides power to the APU and the dGPU based on characteristics of workloads executing on the APU and the dGPU and based on a platform power limit that is shared by the APU and the dGPU. In some cases, the power distribution circuitry determines a first power provided to the APU and a second power provided to the dGPU. The power distribution circuitry increases the second power provided to the dGPU in response to a sum of the first and second powers being less than the platform power limit. In some cases, the power distribution circuitry modifies the power provided to the APU, the dGPU, or both in response to changes in temperatures measured by a set of sensors.
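The headroom rule in this abstract (raise the dGPU power when the sum of the two powers is below the shared platform limit) can be sketched as follows. The function name `rebalance` and the `dgpu_max_w` cap are hypothetical additions for illustration; the abstract does not describe a per-device cap or how much of the headroom is granted.

```python
def rebalance(apu_w, dgpu_w, platform_limit_w, dgpu_max_w):
    """Return (apu_w, dgpu_w) after giving unused platform power to the dGPU.

    All values are in watts. dgpu_max_w is an assumed per-device ceiling.
    """
    headroom = platform_limit_w - (apu_w + dgpu_w)
    if headroom > 0:
        # Sum of the two powers is below the shared limit: raise dGPU power.
        dgpu_w = min(dgpu_w + headroom, dgpu_max_w)
    return apu_w, dgpu_w


with_headroom = rebalance(15, 60, 80, 100)  # 5 W unused, all of it goes to the dGPU
capped = rebalance(15, 60, 100, 80)         # 25 W unused, but the dGPU is capped at 80 W
no_headroom = rebalance(15, 60, 70, 80)     # already over the limit, nothing is added
```

This sketch only covers the increase case; the workload-based and temperature-based adjustments mentioned in the abstract would need additional inputs.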
METHOD AND APPARATUS FOR EFFICIENTLY MANAGING OFFLOAD WORK BETWEEN PROCESSING UNITS
Apparatus and method for selectively saving and restoring execution state components in an inter-core work offload environment. For example, one embodiment of a processor comprises: a plurality of cores; an interconnect coupling the plurality of cores; and offload circuitry to transfer work from a first core of the plurality of cores to a second core of the plurality of cores without operating system (OS) intervention, wherein the second core is to reach a first execution state upon completing the offload work and to store results in a first memory location or register; the second core comprising: a decoder to decode a first instruction comprising at least one operand to identify one or more components of the first execution state; and execution circuitry to execute the first instruction to save the one or more components of the first execution state to a specified region in memory.
CORE-TO-CORE START "OFFLOAD" INSTRUCTION(S)
Embodiments involving core-to-core offload are detailed herein. For example, a processor core is described comprising performance monitoring circuitry to monitor performance of the core, an offload phase tracker to maintain status information about at least an availability of a second core to act as a helper core for the first core, decode circuitry to decode an instruction having fields for at least an opcode to indicate that a start task offload operation is to be performed, and execution circuitry to execute the decoded instruction to cause a transmission of an offload start request to at least the second core, the offload start request including one or more of: an identifier of the first core, a location where the second core can find the task to perform, an identifier of the second core, an instruction pointer from the code that the task is a proper subset of, a requesting core state, and a requesting core state location.
Method, electronic device and computer program product for dual-processor storage system
In accordance with certain techniques, at a first processor of a dual-processor storage system, a change in an initial logical unit corresponding to a storage area in a physical storage device of the storage system is detected. Based on the change in the initial logical unit, a plurality of update operations to be performed on a mapped logical unit driver mapping a plurality of initial logical units including the initial logical unit to a plurality of mapped logical units are determined. An indication of the plurality of update operations is sent to a second processor of the storage system, to cause the second processor to perform the plurality of update operations on a peer mapped logical unit driver associated with the mapped logical unit driver.