G06F2009/3883

Graphics processing unit systems for performing data analytics operations in data science

Systems and methods are provided for efficiently performing processing-intensive operations, such as those involving large volumes of data, so that these operations complete in less time. In at least one embodiment, a system includes a graphics processing unit (GPU) that includes a memory and a plurality of cores. The cores perform a plurality of data analytics operations, each on its allocated portion of a dataset, and each core uses only the GPU memory to store the input data for those operations. Data storage for the data analytics operations performed by the cores is likewise provided solely by that memory.
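The abstract does not say which analytics operations the cores run. As a rough sketch, assuming a mean as the operation and modeling each core as an ordinary Python function (this is not GPU code), the allocate-a-slice-then-combine pattern might look like this; `partition`, `core_partial`, and `parallel_mean` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def partition(data: Sequence[float], num_cores: int) -> List[Sequence[float]]:
    """Split `data` into `num_cores` contiguous slices whose sizes differ by at most 1."""
    base, extra = divmod(len(data), num_cores)
    slices: List[Sequence[float]] = []
    start = 0
    for core in range(num_cores):
        size = base + (1 if core < extra else 0)  # the first `extra` cores take one more item
        slices.append(data[start:start + size])
        start += size
    return slices


def core_partial(chunk: Sequence[float]) -> Tuple[float, int]:
    """Work done by one (hypothetical) core: reduce only its own slice to (sum, count)."""
    return sum(chunk), len(chunk)


def parallel_mean(data: Sequence[float], num_cores: int) -> float:
    """Mean built from per-core partial results; cores are modeled sequentially here.

    Assumes `data` is non-empty.
    """
    partials = [core_partial(chunk) for chunk in partition(data, num_cores)]
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count
```

Because each core touches only its own slice and returns a small partial result, the combine step is cheap; on a real GPU the slices would stay resident in device memory, which this model does not attempt to capture.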

Systems and methods for simultaneous control of safety-critical and non-safety-critical processes in automation systems using master-minion functionality
11487265 · 2022-11-01

A control system is provided for controlling safety-critical processes, non-safety-critical processes, and/or installation components. The control system includes at least one control unit configured to control non-safety-critical processes and/or non-safety-critical installation components, at least one safety control unit for controlling safety-critical processes and/or safety-critical installation components, and at least one input/output unit connected to the control unit via an internal input/output bus. The control system is configured to act as a communication master, as a communication minion, or as both within a pool of other devices connected via a field bus; to that end, it includes a master communication coupler and a minion communication coupler. The control system is modularly configurable. At least the safety control unit includes subunits with master functionality and subunits with minion functionality.
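A minimal configuration model, assuming each coupler can be represented as a simple on/off flag, shows how one control system can take the master role, the minion role, or both on the field bus. The class and field names are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field
from enum import Flag
from typing import List


class Role(Flag):
    MASTER = 1
    MINION = 2


@dataclass
class Subunit:
    name: str
    role: Role


@dataclass
class SafetyControlUnit:
    subunits: List[Subunit] = field(default_factory=list)

    def roles(self) -> Role:
        """Union of the roles provided by this unit's subunits."""
        combined = Role(0)
        for sub in self.subunits:
            combined |= sub.role
        return combined


@dataclass
class ControlSystem:
    master_coupler: bool  # hypothetical: master communication coupler fitted/enabled
    minion_coupler: bool  # hypothetical: minion communication coupler fitted/enabled
    safety_unit: SafetyControlUnit

    def bus_role(self) -> Role:
        """Role in the field-bus pool: master, minion, or both, depending on the couplers."""
        role = Role(0)
        if self.master_coupler:
            role |= Role.MASTER
        if self.minion_coupler:
            role |= Role.MINION
        return role
```

Using a `Flag` lets "both roles" be a single value (`Role.MASTER | Role.MINION`) rather than a separate third state.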

Low power and low latency GPU coprocessor for persistent computing

Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit that can self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and an intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port, which allows more than one instruction to be co-issued per clock cycle.
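The message-driven scheduling step can be modeled roughly as follows, assuming each host message carries a sub-task kind and an integer payload. `CoprocessorModel`, `host_send`, and `poll` are hypothetical, and the sketch ignores the SIMD lanes and the split VGPR file entirely.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Message = Tuple[str, int]  # hypothetical message format: (sub-task kind, payload)


class CoprocessorModel:
    def __init__(self, handlers: Dict[str, Callable[[int], int]]) -> None:
        self.queue: Deque[Message] = deque()  # queue the host writes messages into
        self.handlers = handlers              # sub-task procedures keyed by message kind
        self.results: List[int] = []

    def host_send(self, kind: str, payload: int) -> None:
        """Host side: enqueue a message targeting the coprocessor."""
        self.queue.append((kind, payload))

    def poll(self) -> int:
        """Coprocessor side: schedule one sub-task per detected message.

        Returns the number of sub-tasks scheduled in this pass.
        """
        scheduled = 0
        while self.queue:
            kind, payload = self.queue.popleft()
            self.results.append(self.handlers[kind](payload))
            scheduled += 1
        return scheduled
```

The host never calls a sub-task directly; it only enqueues messages, and the coprocessor decides when to run the matching procedure.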

Buffer Checker for Task Processing Fault Detection
20220350643 · 2022-11-03

A graphics processing system for operation with a data store, comprising: one or more processing units for processing tasks; a check unit operable to form a signature which is characteristic of an output from processing a task on a processing unit; and a fault detection unit operable to compare signatures formed at the check unit; wherein the graphics processing system is operable to process each task first and second times at the one or more processing units so as to, respectively, generate first and second processed outputs, the graphics processing system being configured to: write out the first processed output to the data store; read back the first processed output from the data store and form at the check unit a first signature which is characteristic of the first processed output as read back from the data store; form at the check unit a second signature which is characteristic of the second processed output; compare the first and second signatures at the fault detection unit; and raise a fault signal if the first and second signatures do not match.
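The write-out, read-back, and compare sequence can be sketched as below. CRC-32 (via `zlib.crc32`) is an assumption standing in for the unspecified signature, a dict stands in for the data store, and the optional `corrupt` hook is a hypothetical way to simulate a fault on the store path.

```python
import zlib
from typing import Callable, Dict, Optional


class FaultDetected(Exception):
    """Raised when the signatures of the two processed outputs disagree."""


def signature(data: bytes) -> int:
    # Assumption: CRC-32 stands in for whatever signature the check unit forms.
    return zlib.crc32(data)


def run_checked(
    task: Callable[[], bytes],
    store: Dict[str, bytes],
    key: str,
    corrupt: Optional[Callable[[bytes], bytes]] = None,
) -> bytes:
    """Process `task` twice, checking the first output after a round trip through `store`."""
    first = task()
    # Write out the first output; `corrupt` optionally simulates a data-store fault.
    store[key] = corrupt(first) if corrupt else first
    # Read it back and sign what actually came back from the store.
    sig_first = signature(store[key])
    # Second processing of the same task, signed directly.
    sig_second = signature(task())
    if sig_first != sig_second:
        raise FaultDetected(f"signature mismatch for {key!r}")
    return store[key]
```

Signing the first output only after it has been read back means the check also covers faults introduced on the write and read paths, not just in the processing units.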

Accelerator Interface Mechanism for Data Processing System

A method and apparatus are provided for processing accelerator instructions in a data processing apparatus, where a block of one or more accelerator instructions can be executed either on a host processor or on an accelerator device. When an instruction that references a first virtual address is executed on the host processor, it is issued to an instruction queue of the host processor and executed by the host processor; execution includes translating, by translation hardware of the host processor, the first virtual address to a first physical address. When an instruction that references the first virtual address is executed on the accelerator device, the translation hardware translates the first virtual address to a second physical address, and the instruction is sent to the accelerator device referencing the second physical address. An accelerator task may be initiated by writing configuration data to an accelerator job queue.
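The central point, that one virtual address resolves to different physical addresses depending on where the instruction runs, can be illustrated with a toy translation model. The separate per-target page tables, the 4 KiB page size, and all names here are assumptions; the abstract does not say how the second physical address is derived.

```python
from typing import Dict, List

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class TranslationHardware:
    """Hypothetical model: the same virtual page maps to a different physical page per target."""

    def __init__(self, host_table: Dict[int, int], accel_table: Dict[int, int]) -> None:
        self.tables = {"host": host_table, "accel": accel_table}

    def translate(self, vaddr: int, target: str) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        return self.tables[target][vpn] * PAGE_SIZE + offset


def dispatch(vaddr: int, on_accelerator: bool, mmu: TranslationHardware,
             host_queue: List[int], accel_queue: List[int]) -> int:
    """Route one instruction; the queues record only the translated address in this model."""
    if on_accelerator:
        paddr = mmu.translate(vaddr, "accel")  # second physical address
        accel_queue.append(paddr)              # instruction sent to the accelerator
    else:
        paddr = mmu.translate(vaddr, "host")   # first physical address
        host_queue.append(paddr)               # instruction issued to the host queue
    return paddr
```

Both paths reuse the host's translation hardware, so the accelerator never needs its own view of the virtual address space.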

Framework to provide time bound execution of co-processor commands

When a main processor issues a command to a co-processor, the command includes a timeout value. As the co-processor attempts to execute the command, it is determined whether the attempt is taking longer than the timeout value permits. If the timeout is exceeded, responsive action is taken, such as generating a command-timeout failure message. Either the co-processor that receives the command or a watchdog timer separate from the co-processor may receive the timeout value and determine whether a timeout condition has occurred. Co-processor hang conditions that arise while the co-processor is executing a command for the main processor may also be detected.
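Assuming a simulated tick counter in place of a real clock, the per-command timeout check could be sketched like this; `Command`, `step_costs`, and the message strings are hypothetical. Passing an endless iterable as `step_costs` is one way to simulate a hung co-processor.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Command:
    name: str
    timeout_ticks: int         # timeout value carried in the command
    step_costs: Iterable[int]  # hypothetical: simulated cost of each execution step


def execute_with_timeout(cmd: Command) -> str:
    """Run the command step by step, failing as soon as elapsed time exceeds the timeout."""
    elapsed = 0
    for cost in cmd.step_costs:
        elapsed += cost
        if elapsed > cmd.timeout_ticks:
            # Responsive action: report a command-timeout failure instead of waiting forever.
            return f"{cmd.name}: COMMAND_TIMEOUT after {elapsed} ticks"
    return f"{cmd.name}: OK in {elapsed} ticks"
```

Here the co-processor checks its own elapsed time; the watchdog variant in the abstract would perform the same comparison from a separate agent.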

Method and apparatus for efficiently managing offload work between processing units
11321144 · 2022-05-03

Apparatus and method for selectively saving and restoring execution state components in an inter-core work offload environment. For example, one embodiment of a processor comprises: a plurality of cores; an interconnect coupling the plurality of cores; and offload circuitry to transfer work from a first core of the plurality of cores to a second core of the plurality of cores without operating system (OS) intervention, wherein the second core is to reach a first execution state upon completing the offload work and to store results in a first memory location or register; the second core comprising: a decoder to decode a first instruction comprising at least one operand to identify one or more components of the first execution state; and execution circuitry to execute the first instruction to save the one or more components of the first execution state to a specified region in memory.
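Selective saving can be pictured as a bitmask operand choosing which execution state components get written to a memory region. The component names, mask values, and functions below are hypothetical; the abstract only says the operand identifies one or more components.

```python
import copy
from enum import IntFlag
from typing import Any, Dict


class StateComponent(IntFlag):
    """Hypothetical state components selectable by the save instruction's operand."""
    GPR = 1
    FLAGS = 2
    VECTOR = 4
    IP = 8


def save_state(state: Dict[str, Any], mask: StateComponent,
               memory: Dict[int, Dict[str, Any]], region: int) -> None:
    """Save only the components selected by `mask` into `memory[region]`."""
    memory[region] = {
        comp.name: copy.deepcopy(state[comp.name])
        for comp in StateComponent
        if comp in mask
    }


def restore_state(state: Dict[str, Any], memory: Dict[int, Dict[str, Any]], region: int) -> None:
    """Restore whichever components were saved in `memory[region]`; others are left untouched."""
    state.update(copy.deepcopy(memory[region]))
```

Saving only the selected components keeps the memory traffic proportional to what the offloaded work actually changed.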

Core-to-core start “offload” instruction(s)
11182208 · 2021-11-23

Embodiments involving core-to-core offload are detailed herein. For example, a processor core is described that comprises: performance monitoring circuitry to monitor performance of the core; an offload phase tracker to maintain status information about at least the availability of a second core to act as a helper core for the first core; decode circuitry to decode an instruction having fields for at least an opcode indicating that a start-task-offload operation is to be performed; and execution circuitry to execute the decoded instruction to cause transmission of an offload start request to at least the second core. The offload start request includes one or more of: an identifier of the first core, a location where the second core can find the task to perform, an identifier of the second core, an instruction pointer from the code of which the task is a proper subset, a requesting core state, and a requesting core state location.
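The request fields listed above map naturally onto a record type. The sketch below assumes that only the requesting core ID and task location are mandatory and that the offload phase tracker simply hands out the lowest-numbered available helper; both choices, and all names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass(frozen=True)
class OffloadStartRequest:
    """Fields named in the abstract; which ones are populated is an assumption here."""
    requester_core_id: int
    task_location: int
    helper_core_id: Optional[int] = None
    instruction_pointer: Optional[int] = None
    requester_state: Optional[bytes] = None
    requester_state_location: Optional[int] = None


class OffloadPhaseTracker:
    """Tracks which cores are currently available to act as helper cores."""

    def __init__(self, available: Set[int]) -> None:
        self.available = set(available)

    def claim_helper(self) -> Optional[int]:
        if not self.available:
            return None
        helper = min(self.available)  # assumption: lowest-numbered free core is chosen
        self.available.remove(helper)
        return helper


def start_offload(core_id: int, task_location: int, tracker: OffloadPhaseTracker,
                  send: Callable[[OffloadStartRequest], None]) -> bool:
    """Model of executing the start-offload instruction on `core_id`."""
    helper = tracker.claim_helper()
    if helper is None:
        return False  # assumption: with no helper free, the task stays on the requesting core
    send(OffloadStartRequest(core_id, task_location, helper_core_id=helper))
    return True
```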

Synchronization of concurrent computation engines

Integrated circuit devices and methods for synchronizing execution of program code for multiple concurrently operating execution engines of the integrated circuit devices are provided. In some cases, one execution engine of an integrated circuit device may be dependent on the operation of another execution engine of the integrated circuit device. To synchronize the execution engines around the dependency, a first execution engine may execute an instruction to set a value in a register while a second execution engine may execute an instruction to wait for a condition associated with the register value.
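The set-register and wait-on-condition handshake behaves much like a condition variable. The sketch below uses Python's `threading.Condition`, with two threads standing in for the execution engines (an assumption: the hardware uses instructions, not OS threads); `SyncRegister` and `run_two_engines` are hypothetical names.

```python
import threading
from typing import Callable, List, Optional


class SyncRegister:
    """Shared register: one engine sets a value, another waits on a condition over it."""

    def __init__(self) -> None:
        self._value = 0
        self._cond = threading.Condition()

    def set(self, value: int) -> None:
        with self._cond:
            self._value = value
            self._cond.notify_all()  # wake any engine waiting on this register

    def wait_until(self, predicate: Callable[[int], bool],
                   timeout: Optional[float] = None) -> bool:
        """Block until predicate(register value) holds; returns False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: predicate(self._value), timeout)


def run_two_engines() -> List[int]:
    reg = SyncRegister()
    buffer: List[int] = []
    result: List[int] = []

    def engine_a() -> None:  # producer: writes its output, then sets the register
        buffer.extend([1, 2, 3])
        reg.set(1)

    def engine_b() -> None:  # dependent engine: waits until the register reaches 1
        if reg.wait_until(lambda v: v >= 1, timeout=5.0):
            result.append(sum(buffer))

    b = threading.Thread(target=engine_b)
    a = threading.Thread(target=engine_a)
    b.start()
    a.start()
    a.join()
    b.join()
    return result
```

Because `wait_for` checks the predicate before sleeping, the dependent engine proceeds correctly whether it starts waiting before or after the producer sets the register.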
