G06F9/3877

CROSS PLATFORM AND PLATFORM AGNOSTIC ACCELERATOR REMOTING SERVICE

Disclosed are various examples of providing cross platform accelerator remoting between complex instruction set computer (CISC) components and reduced instruction set computer (RISC) components of a computing environment. An accelerator remoting server receives accelerator instructions executable by a locally installed accelerator device and provides the accelerator instructions to the accelerator device. The accelerator remoting server transmits accelerator results to an accelerator remoting client to complete the cross platform or platform agnostic accelerator remoting.

ADJUSTING STORE GATHER WINDOW DURATION IN A DATA PROCESSING SYSTEM SUPPORTING SIMULTANEOUS MULTITHREADING
20220405125 · 2022-12-22

In at least some embodiments, a store-type operation is received and buffered within a store queue entry of a store queue associated with a cache memory of a processor core capable of executing multiple simultaneous hardware threads. A thread identifier indicating a particular hardware thread among the multiple hardware threads that issued the store-type operation is recorded. An indication of whether the store queue entry is a most recently allocated store queue entry for buffering store-type operations of the hardware thread is also maintained. While the indication indicates the store queue entry is a most recently allocated store queue entry for buffering store-type operations of the particular hardware thread, the store queue extends a duration of a store gathering window applicable to the store queue entry. For example, the duration may be extended by decreasing a rate at which the store gathering window applicable to the store queue entry ends.
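
The window-extension idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the names `StoreQueueEntry`, `allocate`, `tick`, and `cycles_until_closed`, the counter-based window, and the `slow_factor` parameter are all hypothetical. It models "decreasing a rate at which the store gathering window ends" by letting the most recently allocated entry of a thread count down only once every `slow_factor` cycles.

```python
from dataclasses import dataclass


@dataclass
class StoreQueueEntry:
    thread_id: int           # hardware thread that issued the store
    latest_for_thread: bool  # most recently allocated entry for that thread?
    window_remaining: int    # gather window budget (arbitrary units, assumed)


def allocate(queue, thread_id, window=8):
    """Allocate a new entry and clear the 'latest' flag on older entries
    of the same thread (hypothetical bookkeeping)."""
    for entry in queue:
        if entry.thread_id == thread_id:
            entry.latest_for_thread = False
    new_entry = StoreQueueEntry(thread_id, True, window)
    queue.append(new_entry)
    return new_entry


def tick(entry, cycle, slow_factor=4):
    """Advance the gather window of one entry by one cycle.

    An ordinary entry loses one unit per cycle. An entry that is still the
    latest for its thread loses one unit only every `slow_factor` cycles,
    which extends its gather window.
    """
    if entry.window_remaining == 0:
        return
    if entry.latest_for_thread and cycle % slow_factor != 0:
        return
    entry.window_remaining -= 1


def cycles_until_closed(entry, slow_factor=4):
    """Count the cycles until the entry's gather window ends."""
    cycle = 0
    while entry.window_remaining > 0:
        cycle += 1
        tick(entry, cycle, slow_factor)
    return cycle
```

In this model an entry that remains the latest for its thread keeps gathering for `slow_factor` times as long as one that has been superseded by a newer allocation from the same thread.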

Apparatus and method for scalable qubit addressing
11531922 · 2022-12-20

An apparatus and method for scalable qubit addressing. For example, one embodiment of a processor comprises: a decoder comprising quantum instruction decode circuitry to decode quantum instructions to generate quantum microoperations (uops) and non-quantum decode circuitry to decode non-quantum instructions to generate non-quantum uops; execution circuitry comprising: an address generation unit (AGU) to generate a system memory address responsive to execution of one or more of the non-quantum uops; and quantum index generation circuitry to generate quantum index values responsive to execution of one or more of the quantum uops, each quantum index value uniquely identifying a quantum bit (qubit) in a quantum processor; wherein to generate a first quantum index value for a first quantum uop, the quantum index generation circuitry is to read the first quantum index value from a first architectural register identified by the first quantum uop.

Reduced bandwidth tessellation factors

A graphics pipeline reduces the number of tessellation factors written to and read from a graphics memory. A hull shader stage of the graphics pipeline detects whether at least a threshold percentage of the tessellation factors for a thread group of patches are the same and, in some embodiments, whether at least the threshold percentage of the tessellation factors for a thread group of patches have a same value that either indicates that the plurality of patches are to be culled or that the plurality of patches are to be passed to a tessellator stage of the graphics pipeline. In response to detecting that at least the threshold percentage of the tessellation factors for the thread group are the same (or, additionally, that at least the threshold percentage of the tessellation factors have a value that either indicates that the plurality of patches are to be culled or that the plurality of patches are to be passed to a tessellator stage of the graphics pipeline), the hull shader stage bypasses writing at least a subset of the tessellation factors for the thread group of patches to the graphics memory, thus reducing bandwidth and increasing efficiency of the graphics pipeline.
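
The threshold check described above might look roughly like the following. This is a hedged sketch: the function name `plan_tess_factor_writes`, the default threshold of 0.75, the use of one scalar factor per patch, and the choice to write only the patches that differ from the shared value are assumptions made for illustration, since the abstract only says that "at least a subset" of the writes is bypassed.

```python
from collections import Counter


def plan_tess_factor_writes(factors, threshold=0.75):
    """Decide which tessellation factors of a thread group must be written.

    factors   -- one (assumed scalar) tessellation factor per patch
    threshold -- fraction of patches that must share a value (assumed)

    Returns (shared_value, writes). If enough patches share a value,
    shared_value is that value and writes lists only the (index, factor)
    pairs that differ from it. Otherwise shared_value is None and every
    factor is written.
    """
    if not factors:
        return None, []
    value, count = Counter(factors).most_common(1)[0]
    if count / len(factors) >= threshold:
        exceptions = [(i, f) for i, f in enumerate(factors) if f != value]
        return value, exceptions
    return None, list(enumerate(factors))
```

For a group in which seven of eight patches carry a factor of 0.0 (cull), only the one differing factor would be written in this model; the memory traffic saved grows with the group size.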

Supplemental AI processing in memory

Apparatuses and methods can be related to supplementing AI processing in memory. An accelerator and/or a host can perform AI processing. Some of the operations comprising the AI processing can be performed by a memory device instead of by an accelerator and/or a host. The memory device can perform AI processing in conjunction with the host and/or accelerator.

System and method to dynamically and automatically sharing resources of coprocessor AI accelerators
11521042 · 2022-12-06

A system and method for dynamically and automatically sharing resources of a coprocessor AI accelerator based on workload changes during training and inference of a plurality of neural networks. The method comprises the steps of receiving a plurality of requests from each neural network and high-performance computing applications (HPCs) through a dynamic adaptive scheduler module. The dynamic adaptive scheduler module morphs the received requests into threads, dimensions and memory sizes. The method then receives the morphed requests from the dynamic adaptive scheduler module through client units. Each of the neural network applications is mapped to at least one of the client units on graphics processing unit (GPU) hosts. The method then receives the morphed requests from the plurality of client units through a plurality of server units. Further, the method receives the morphed requests from the plurality of server units through one or more coprocessors.

Apparatus and method for distributed database query cancellation based upon single node query execution analysis

A master database module is on a master computer node. Slave database modules are on slave computer nodes connected to the master computer node via a network. A distributed database includes executable code executed by processors on the master computer node and the slave computer nodes to receive a distributed database query at the master computer node. A query execution plan is prepared at the master computer node. The query execution plan is deployed on the slave computer nodes. The query execution plan is executed on the slave computer nodes. The slave computer nodes each perform a single node query execution analysis to selectively produce a query cancellation command. The query cancellation command is propagated to the master computer node and the slave computer nodes. The query execution plan is cancelled on the master computer node and the slave computer nodes.
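
The control flow in this abstract (deploy, execute, analyze on each node, propagate the cancellation) can be sketched as a toy in-process model. Everything concrete here is an assumption: the `Node` class, the `run_and_maybe_cancel` function, and in particular the cancellation rule (a node scanning more rows than a limit), which merely stands in for whatever single node query execution analysis a real system would perform.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    running: set = field(default_factory=set)  # ids of queries executing here

    def analyze(self, query_id, rows_scanned, row_limit):
        """Single node analysis. The row-limit rule is a hypothetical
        placeholder for the real analysis."""
        return query_id in self.running and rows_scanned > row_limit

    def cancel(self, query_id):
        self.running.discard(query_id)


def run_and_maybe_cancel(master, slaves, query_id, rows_by_node, row_limit):
    """Deploy a query, let each slave analyze it, and propagate a cancel.

    Returns the name of the slave that produced the cancellation command,
    or None if the query was allowed to keep running.
    """
    everyone = [master, *slaves]
    for node in everyone:
        node.running.add(query_id)  # plan deployed and executing
    for slave in slaves:
        if slave.analyze(query_id, rows_by_node[slave.name], row_limit):
            for node in everyone:   # propagate to master and all slaves
                node.cancel(query_id)
            return slave.name
    return None
```

The point of the model is that the decision is made locally on one slave, while the effect is global: once any slave issues the command, the query stops on the master and on every slave.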

PROCESSING SYSTEM, RELATED INTEGRATED CIRCUIT, DEVICE AND METHOD
20220382695 · 2022-12-01

In an embodiment, a processing system comprises a microprocessor programmable via software instructions, a memory controller configured to be coupled to a memory, a communication system coupling the microprocessor to the memory controller, a cryptographic co-processor and a first communication interface. The processing system also comprises first and second configurable DMA channels. In a first configuration, the first DMA channel is configured to transfer data from the memory to the cryptographic co-processor, and the second DMA channel is configured to transfer the encrypted data via two loops from the cryptographic co-processor to the first communication interface. In a second configuration, the second DMA channel is configured to transfer received data via two loops from the first communication interface to the cryptographic co-processor, and the first DMA channel is configured to transfer the decrypted data from the cryptographic co-processor to the memory.

DRIVE ENHANCED J/ZZ OPERATION FOR SUPERCONDUCTING QUBITS
20220383169 · 2022-12-01

Systems, devices, computer-implemented methods, and/or computer program products that facilitate dynamic control of ZZ interactions for quantum computing devices. In one example, a quantum device can comprise a biasing component that is operatively coupled to first and second qubits via respective first and second drive lines. The biasing component can facilitate dynamic control of ZZ interactions between the first and second qubits using continuous wave (CW) tones applied via the respective first and second drive lines.

System and method for efficient multi-GPU rendering of geometry by generating information in one rendering phase for use in another rendering phase

A method for graphics processing includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics between the GPUs based on screen regions, each GPU having a corresponding division of the responsibility which is known to the GPUs. The method includes determining a Z-value for a piece of geometry during a pre-pass phase of rendering at a first GPU for an image, wherein the piece of geometry overlaps a first screen region for which the first GPU has a division of responsibility. The method includes comparing the Z-value against a Z-buffer value for the piece of geometry. The method includes generating information including a result of comparing the Z-value against the Z-buffer value for use by the GPU when rendering the piece of geometry during a full render phase of rendering.
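
The two-phase flow above can be sketched for a single GPU as follows. This is an illustrative model only: the function names `z_prepass` and `full_render`, the representation of a piece of geometry as an `(id, pixels, z)` tuple with one Z-value per piece, and the convention that a smaller Z is closer to the viewer are all assumptions not taken from the source.

```python
def z_prepass(z_buffer, region, pieces):
    """Pre-pass on one GPU: compare each piece's Z-value with the Z-buffer.

    z_buffer -- dict mapping pixel -> nearest Z seen so far (smaller = closer)
    region   -- set of pixels this GPU is responsible for
    pieces   -- list of (piece_id, pixels, z) tuples (assumed layout)

    Returns a dict piece_id -> bool recording whether the piece passed the
    Z comparison on at least one owned pixel at the time it was tested.
    Pieces that do not overlap the region get no entry.
    """
    info = {}
    for piece_id, pixels, z in pieces:
        owned = [p for p in pixels if p in region]
        if not owned:
            continue
        passed = False
        for p in owned:
            if z < z_buffer.get(p, float("inf")):
                z_buffer[p] = z
                passed = True
        info[piece_id] = passed
    return info


def full_render(pieces, info):
    """Full render phase: shade only pieces the pre-pass marked as passing."""
    return [piece_id for piece_id, _, _ in pieces if info.get(piece_id, False)]
```

Carrying the comparison result forward lets the full render phase skip pieces that were already known to fail the Z test, or that fall outside this GPU's screen region, without repeating the test.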