G06F15/1735

Improving performance of multi-processor computer systems

Embodiments of the invention may improve the performance of multi-processor systems in processing information received via a network. For example, some embodiments may enable configuration of a system such that information received via a network may be distributed among multiple processors for efficient processing. A user (e.g., system administrator) may select from among multiple configuration options, each configuration option being associated with a particular mode of processing information received via a network. By selecting a configuration option, the user may specify how information received via the network is processed to capitalize on the system's characteristics, such as by aligning processors on the system with certain NICs. As such, the processor(s) aligned with a NIC may perform networking-related tasks associated with information received by that NIC. If initial alignment causes one or more processors to become over-burdened, processing tasks may be dynamically re-distributed to other processors so as to achieve a more even distribution of the overall processing burden across the system.
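The alignment and re-distribution described above can be sketched as follows. This is a minimal illustrative model, not the patented implementation: the round-robin alignment policy, the `threshold` parameter, and the one-unit-at-a-time rebalancing step are all assumptions.

```python
# Hypothetical sketch: align NICs with processors, then re-distribute
# work when one processor becomes over-burdened.

def align(nics, processors):
    """Round-robin alignment of NICs to processors (illustrative policy)."""
    return {nic: processors[i % len(processors)] for i, nic in enumerate(nics)}

def rebalance(load, threshold):
    """Move work units off over-burdened processors onto the least-loaded one
    until no processor exceeds the threshold (or no better target exists)."""
    load = dict(load)
    for cpu in list(load):
        while load[cpu] > threshold:
            target = min(load, key=load.get)
            if target == cpu:
                break  # everyone is equally loaded; nothing to gain
            load[cpu] -= 1
            load[target] += 1
    return load

alignment = align(["nic0", "nic1", "nic2"], ["cpu0", "cpu1"])
balanced = rebalance({"cpu0": 8, "cpu1": 2}, threshold=6)
```

The total work is conserved by `rebalance`; only its placement changes, mirroring the abstract's "more even distribution of the overall processing burden."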

Spatial distribution in a 3D data processing unit
11709790 · 2023-07-25 ·

The embodiments herein describe a 3D SmartNIC that spatially distributes compute, storage, or network functions in three dimensions using a plurality of layers. That is, unlike current SmartNICs, which perform acceleration functions in two dimensions, a 3D SmartNIC can distribute these functions across multiple stacked layers, where each layer can communicate directly or indirectly with the other layers.

TECHNOLOGIES FOR DYNAMIC ACCELERATOR SELECTION
20230050698 · 2023-02-16 ·

Technologies for dynamic accelerator selection include a compute sled. The compute sled includes a network interface controller to communicate with a remote accelerator of an accelerator sled over a network, where the network interface controller includes a local accelerator and a compute engine. The compute engine is to obtain network telemetry data indicative of a level of bandwidth saturation of the network. The compute engine is also to determine whether to accelerate a function managed by the compute sled. The compute engine is further to determine, in response to a determination to accelerate the function, whether to offload the function to the remote accelerator of the accelerator sled based on the telemetry data. The compute engine is also to assign, in response to a determination not to offload the function to the remote accelerator, the function to the local accelerator of the network interface controller.
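The selection logic in this abstract reduces to a small decision function. A minimal sketch, assuming a single saturation threshold as the offload criterion (the actual telemetry processing and threshold are not specified in the abstract):

```python
def select_accelerator(should_accelerate, bandwidth_saturation, saturation_limit=0.8):
    """Decide where a function runs: 'none', 'remote', or 'local'.

    Offload to the remote accelerator only when network bandwidth is not
    saturated; otherwise fall back to the local accelerator on the NIC.
    The 0.8 limit is an illustrative assumption.
    """
    if not should_accelerate:
        return "none"
    if bandwidth_saturation < saturation_limit:
        return "remote"
    return "local"
```

For example, at 50% saturation the function would be offloaded remotely, while at 90% it would be assigned to the NIC's local accelerator.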

APPARATUS AND METHOD FOR DESCRIPTOR HANDLING AND COMPUTER-READABLE MEDIUM
20220360650 · 2022-11-10 ·

Apparatuses and methods for command and response descriptor handling are provided. In an example, a method for descriptor handling can include instantiating a command descriptor of a command regarding a packet at a first layer of a protocol stack by a microcontroller of a node. The method can also include passing a command pointer to the command descriptor to integrated circuits of the node from the microcontroller of the node. The method can further include looking up, by the integrated circuits, the command descriptor. The method can additionally include processing the command by the integrated circuits.
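The descriptor flow above can be sketched as follows. This is an illustrative model only: the shared `descriptor_table`, the address value, and the descriptor fields are assumptions standing in for shared descriptor memory and a hardware pointer.

```python
# Hypothetical sketch: a microcontroller creates a command descriptor and
# hands only a pointer (modeled here as a table key) to the integrated
# circuits, which look the descriptor up and process the command.

descriptor_table = {}  # stands in for shared descriptor memory

def instantiate_command(addr, command, packet_id):
    """Microcontroller side: create the descriptor, return the pointer."""
    descriptor_table[addr] = {"command": command, "packet": packet_id}
    return addr

def process_command(pointer):
    """Integrated-circuit side: look up the descriptor and process it."""
    desc = descriptor_table[pointer]
    return f"{desc['command']}:{desc['packet']}"

ptr = instantiate_command(0x1000, "SEND", 42)
result = process_command(ptr)
```

Passing only the pointer keeps the microcontroller-to-hardware handoff small while the descriptor itself stays in shared memory.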

Overlay layer for network of processor cores

Methods and systems related to the efficient execution of complex computations by a multicore processor and the movement of data among the various processing cores in the multicore processor are disclosed. A multicore processor stack for the multicore processor can include a computation layer, for conducting computations using the processing cores in the multicore processor, with executable instructions for processing pipelines in the processing cores. The multicore processor stack can also include a network-on-chip layer, for connecting the processing cores in the multicore processor, with executable instructions for routers and network interface units in the multicore processor. The computation layer and the network-on-chip layer can be logically isolated by a network-on-chip overlay layer.

Flow processing offload using virtual port identifiers

Some embodiments of the invention provide a method for providing flow processing offload (FPO) for a host computer at a physical network interface card (pNIC) connected to the host computer. A set of compute nodes executing on the host computer are each associated with a set of interfaces that are each assigned a locally-unique virtual port identifier (VPID) by a flow processing and action generator. The pNIC includes a set of interfaces that are assigned physical port identifiers (PPIDs) by the pNIC. The method includes receiving a data message at an interface of the pNIC and matching the data message to a stored flow entry that specifies a destination using a VPID. The method also includes identifying, using the VPID, a PPID as a destination of the received data message by performing a lookup in a mapping table storing a set of VPIDs and a corresponding set of PPIDs and forwarding the data message to an interface of the pNIC associated with the identified PPID.
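The VPID-to-PPID lookup described above can be sketched with two tables. A minimal illustration under stated assumptions: flow keys, identifier names, and the "return None on a miss" slow-path convention are all hypothetical.

```python
# Hypothetical sketch: flow table entries name destinations by VPID; the
# pNIC resolves a VPID to a PPID via a mapping table before forwarding.

vpid_to_ppid = {"vpid-7": "ppid-1", "vpid-9": "ppid-2"}  # mapping table

def forward(flow_table, message):
    """Match a data message to a stored flow entry and resolve its PPID.

    Returns the PPID of the pNIC interface to forward to, or None when no
    flow entry matches (the slow path, handled by the flow processing and
    action generator in the abstract).
    """
    entry = flow_table.get(message["flow"])
    if entry is None:
        return None
    return vpid_to_ppid[entry["dest_vpid"]]  # VPID -> PPID lookup

flow_table = {("10.0.0.1", "10.0.0.2"): {"dest_vpid": "vpid-7"}}
out = forward(flow_table, {"flow": ("10.0.0.1", "10.0.0.2")})
```

The indirection through VPIDs lets the flow rules stay stable even if the pNIC reassigns physical port identifiers.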

COMPUTER SYSTEM AND COMPUTER
20170371395 · 2017-12-28 ·

A computer system comprising a plurality of computers, each of which includes at least one processor chip that includes a plurality of processor cores. The at least one processor chip constructs a plurality of regions, each built from at least one processor core, and each of the plurality of processor cores carries out calculation processing for executing a predetermined program as well as inter-core communication processing, which is communication between the plurality of processor cores. The computer system comprises: a regulation module which controls a voltage and a frequency supplied to each of the plurality of regions; and a determination module which determines a power mode of each of the plurality of regions and outputs an instruction to the regulation module.

Hardware-Assisted Memory Disaggregation with Recovery from Network Failures Using Non-Volatile Memory
20230205649 · 2023-06-29 ·

Techniques for implementing hardware-assisted memory disaggregation with recovery from network failures/problems are provided. In one set of embodiments, a hardware controller of a computer system can maintain a copy of a “remote memory” of the computer system (i.e., a section of the physical memory address space of the computer system that maps to a portion of the physical system memory of a remote computer system) in a local backup memory. The backup memory may be implemented using a non-volatile memory that is slower, but also less expensive, than conventional dynamic random-access memory (DRAM). Then, if the hardware controller is unable to retrieve data in the remote memory from the remote computer system within a specified time window due to, e.g., a network failure or other problem, the hardware controller can retrieve the data from the backup memory, thereby avoiding a hardware error condition (and potential application/system crash).
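The fallback path can be sketched as follows. A minimal model under stated assumptions: the write-through mirroring into the backup, the `network_ok` flag standing in for the timeout check, and all names are illustrative.

```python
# Hypothetical sketch: writes to remote memory are mirrored into a local
# non-volatile backup; reads fall back to the backup when the remote copy
# cannot be fetched in time, avoiding a hardware error condition.

backup_memory = {}  # local (slower, cheaper) non-volatile copy

def write_remote(addr, value, remote):
    remote[addr] = value
    backup_memory[addr] = value  # controller keeps the backup in sync

def read_remote(addr, remote, network_ok=True):
    if network_ok:
        return remote[addr]       # normal path: fetch over the network
    return backup_memory[addr]    # fallback: serve from the NVM backup

remote = {}
write_remote(0x40, "payload", remote)
ok = read_remote(0x40, remote, network_ok=True)
fallback = read_remote(0x40, remote, network_ok=False)
```

Both paths return the same data; the backup trades latency for availability, which matches the abstract's choice of slower but cheaper non-volatile memory.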

INFORMATION PROCESSING APPARATUS AND MAINTENANCE SYSTEM
20170357612 · 2017-12-14 ·

A disclosed information processing apparatus includes a memory and a processor coupled to the memory. The processor is configured to detect that a first apparatus is connected to a first network port; upon detecting the connection, change the network settings of the information processing apparatus to first network settings for the first network port; and, also upon detecting the connection, switch transmission paths in the information processing apparatus to enable the first apparatus to communicate using the first network port.

PROCESSOR FOR PROCESSING EXTERNAL SERVICE REQUESTS USING A SYMMETRICAL NETWORK INTERFACE
20220309026 · 2022-09-29 ·

A computer processor comprises: a fetch unit that reads and writes instructions and data; a register file including a plurality of registers, which includes a general-purpose register and a special register; an arithmetic logic unit that performs computational processing; a decoder/controller that interprets the instructions and generates a control signal; a symmetric network interface that includes a master interface and a slave interface and that is connected to an on-chip network; and a service controller that receives a service request from an external source through the on-chip network and the symmetric network interface, that communicates with the decoder/controller to send and receive a state of the service request and a state of the computer processor, and that copies an instruction address of a subroutine for executing the service request to a program counter to perform an operation of a designated code when the decoder/controller determines execution of the service request.