G06F13/1652

Graphics processing device
20220027293 · 2022-01-27 ·

Disclosed is a graphics processing device including a main SoC, a performance-enhancing SoC, and an external circuit that is set outside any of the two SoCs. The main SoC includes: a first graphics processing unit (GPU) dividing to-be-processed data into a first input part and a second input part, and processing the first output part to generate first output data; and a first transceiver circuit forwarding the second input part to the performance-enhancing SoC via the external circuit, and then receiving second output data via the external circuit and forwarding it. The performance-enhancing SoC includes: a second transceiver circuit receiving the second input part via the external circuit and outputting the second output data to the main SoC via the external circuit; and a second GPU receiving the second input part from the second transceiver circuit and processing this part to provide the second output data for the second transceiver.

TECHNOLOGIES FOR COORDINATING DISAGGREGATED ACCELERATOR DEVICE RESOURCES

A compute device to manage workflow to disaggregated computing resources is provided. The compute device comprises a compute engine receive a workload processing request, the workload processing request defined by at least one request parameter, determine at least one accelerator device capable of processing a workload in accordance with the at least one request parameter, transmit a workload to the at least one accelerator device, receive a work product produced by the at least one accelerator device from the workload, and provide the work product to an application.

Host Connected Computer Network
20220019550 · 2022-01-20 ·

A processor comprises a plurality of processing units on an integrated circuit interconnected by an exchange. The exchange has a group of exchange paths extending between first and second portions of the integrated circuit. Each group has a first exchange block in the first portion and a second exchange block in the second portion. The processor has a first external interface in the first portion a second external interface in the second portion and a routing bus which routes packets between the external interfaces and the exchange blocks. The first external interface exchanges packets between the integrated circuit and a host. The second interface exchanges packets between the integrated circuit and another integrated circuit. Errors may be trapped when packets are wrongly addressed. A network of such processors is also provided.

Techniques for an efficient fabric attached memory

Fabric Attached Memory (FAM) provides a pool of memory that can be accessed by one or more processors, such as a graphics processing unit(s) (GPU)(s), over a network fabric. In one instance, a technique is disclosed for using imperfect processors as memory controllers to allow memory, which is local to the imperfect processors, to be accessed by other processors as fabric attached memory. In another instance, memory address compaction is used within the fabric elements to fully utilize the available memory space.

Cache-inhibited write operations

A data processing system includes multiple processing units coupled to a system interconnect including a broadcast address interconnect and a data interconnect. The processing unit includes a processor core that executes memory access instructions and a cache memory, coupled to the processor core, which is configured to store data for access by the processor core. The processing unit is configured to broadcast, on the address interconnect, a cache-inhibited write request and write data for a destination device coupled to the system interconnect. In various embodiments, the initial cache-inhibited request and the write data can be communicated in the same or different requests on the address interconnect.

Optimized use of processor memory for I/O operations
11176063 · 2021-11-16 · ·

A system may include a plurality processing cores for processing I/O operations and at least one interconnect component for communicatively coupling one or more external components to the plurality of processing cores. The at least one interconnect component may be directly physically connected to each of the plurality of processing cores. The interconnect component may route I/O operations to one of the processing cores based on a memory range of the I/O operation. An I/O communication including an I/O operation may be received at the interconnect component. The memory address range of the I/O operation may be determined. A processing core corresponding to the determined memory address range of the I/O operation may be determined, for example, by accessing a data structure that maps address ranges to processing cores. An I/O communication including the I/O operation may be sent from the interconnect component to the determined processing core.

UTILIZING COHERENTLY ATTACHED INTERFACES IN A NETWORK STACK FRAMEWORK

Embodiments for implementing an enhanced network stack framework in a computing environment. A plurality of network buffers coherently attached between one or more applications and a network interface may be shared while bypassing one or more drivers and an operating systems using an application buffer, a circular buffer and a queuing and pooling operation.

MECHANISM TO ACCELERATE GRAPHICS WORKLOADS IN A MULTI-CORE COMPUTING ARCHITECTURE
20210342968 · 2021-11-04 · ·

A processing apparatus is described. The apparatus includes a plurality of processing cores, including a first processing core and a second processing core a first field programmable gate array (FPGA) coupled to the first processing core to accelerate execution of graphics workloads processed at the first processing core and a second FPGA coupled to the second processing core to accelerate execution of workloads processed at the second processing core.

IMPROVING REMOTE TRAFFIC PERFORMANCE ON CLUSTER-AWARE PROCESSORS
20230289303 · 2023-09-14 · ·

Embodiments are directed to improving remote traffic performance on cluster-aware processors. An embodiment of a system includes at least one processor package comprising a plurality of processor ports and a plurality of system agents; and a memory device to store platform initialization firmware to cause the processing system to: determine first locations of the plurality of processor ports in the at least one processor package; determine second locations of the plurality of system agents in the at least one processor package; associate each of the processor ports with a set of the plurality of system agents based on the determined first and second locations; and program the plurality of system agents with the associated processor port for the respective system agent.

Multiprocessor system with improved secondary interconnection network

Embodiments of a multiprocessor system are disclosed that may include a plurality of processors interspersed with a plurality of data memory routers, a plurality of bus interface units, a bus control circuit, and a processor interface circuit. The data memory routers may be coupled together to form a primary interconnection network. The bus interface units and the bus control circuit may be coupled together in a daisy-chain fashion to form a secondary interconnection network. Each of the bus interface units may be configured to read or write data or instructions to a respective one of the plurality of data memory routers and a respective processor. The bus control circuit coupled with the processor interface circuit may be configured to function as a bidirectional bridge between the primary and secondary networks. The bus control circuit may also couple to other interface circuits and arbitrate their access to the secondary network.