Patent classifications
G06F9/5044
Efficient inter-chip interconnect topology for distributed parallel deep learning
The present disclosure provides a system comprising: a first group of computing nodes and a second group of computing nodes, wherein the first and second groups are neighboring devices and each of the first and second groups comprising: a set of computing nodes A-D, and a set of intra-group interconnects, wherein the set of intra-group interconnects communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C; and a set of inter-group interconnects, wherein the set of inter-group interconnects communicatively couple computing node A of the first group with computing node A of the second group, computing node B of the first group with computing node B of the second group, computing node C of the first group with computing node C of the second group, and computing node D of the first group with computing node D of the second group.
Allocation and placement of resources for network computation
Techniques for operating a computing system to perform neural network operations are disclosed. In one example, a method comprises receiving a neural network model, determining a sequence of neural network operations based on data dependency in the neural network model, and determining a set of instructions to map the sequence of neural network operations to the processing resources of the neural network processor. The method further comprises determining, based on a set of memory access operations included in the set of instructions, a first set of memory references associated with a first location of an external memory to store the input data and a second set of memory references associated with a second location of the external memory to store the output data, and generating an instruction file including the set of instructions, the first set of memory references and the second set of memory references.
Scheduling processing of machine learning tasks on heterogeneous compute circuits
Scheduling work of a machine learning application includes instantiating kernel objects by a computer processor in response to input of kernel definitions. Each kernel object is of a kernel type indicating a compute circuit. The computer processor generates a graph in a memory. Each node represents a task and specifies an assignment of the task to one or more of the kernel objects, and each edge represents a data dependency. Task queues are created in the memory and assigned to queue tasks represented by the nodes. Kernel objects are assigned to the task queues, and the tasks are enqueued by threads executing the kernel objects, based on assignments of the kernel objects to the task queues and assignments of the tasks to the kernel objects. Tasks are dequeued by the threads, and the compute circuits are activated to initiate processing of the dequeued tasks.
Neural network hardware acceleration with stochastic adaptive resource allocation
A digital circuit for accelerating computations of an artificial neural network model includes a pairs selection unit that selects different subsets of pairs of input vector values and corresponding weight vector values to be processed simultaneously at each time step; a sorting unit that simultaneously processes a vector of input-weight pairs wherein pair values whose estimated product is small are routed with a high probability to small multipliers, and pair values whose estimated product is greater are routed with a high probability to large multipliers that support larger input and output values; and a core unit that includes a plurality of multiplier units and a plurality of adder units that accumulate output results of the plurality of multiplier units into one or more output values that are stored back into the memory, where the plurality of multiplier units include the small multipliers and the large multipliers.
Computational operations in enclave computing environments
Methods and systems for performing a computational operation on a server host are provided. Exemplary methods include: receiving an encrypted service request from a client host, the client host encrypting a service request to produce the encrypted service request using a shared secret, the service request specifying the computational operation; decrypting, in a secure enclave, the encrypted service request using the shared secret to produce a decrypted service request, the secure enclave preventing other software running on the server host from accessing the shared secret and other data stored in a memory space; performing the computational operation, in the secure enclave, using the decrypted service request to generate a service result; encrypting, in the secure enclave, the service result using the shared secret to create an encrypted service result; and providing the encrypted service result to the client host, the client host decrypting the encrypted service result.
Quantum computing service supporting multiple quantum computing technologies
A quantum computing service includes connections to multiple quantum hardware providers that are configured to execute quantum circuits using quantum computers based on different quantum technologies. The quantum computing service enables a customer to define a quantum algorithm/circuit in an intermediate representation and select from any of a plurality of supported quantum computing technologies to be used to execute the quantum algorithm/quantum circuit.
System and method for unified infrastructure architecture
An information handling system for instantiating a composed information handling includes hardware computing resources. The hardware computing resources includes a compute resource set that includes computing resources including a processor and a memory, and a hardware resource set including resources distinct from the compute resource set. The information also includes a hardware system control processor adapted to present a portion of the hardware resource set to a compute resource set of the composed information handling system as bare metal resources.
Datapath load distribution for a RIC
To provide a low latency near RT RIC, some embodiments separate the RIC's functions into several different components that operate on different machines (e.g., execute on VMs or Pods) operating on the same host computer or different host computers. Some embodiments also provide high speed interfaces between these machines. Some or all of these interfaces operate in non-blocking, lockless manner in order to ensure that critical near RT RIC operations (e.g., datapath processes) are not delayed due to multiple requests causing one or more components to stall. In addition, each of these RIC components also has an internal architecture that is designed to operate in a non-blocking manner so that no one process of a component can block the operation of another process of the component. All of these low latency features allow the near RT RIC to serve as a high speed IO between the E2 nodes and the xApps.
RECOMMENDATIONS FOR SCHEDULING JOBS ON DISTRIBUTED COMPUTING DEVICES
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for scheduling operations represented as a computational graph on a distributed computing network. A method includes: receiving data representing operations to be executed in order to perform a job on a plurality of hardware accelerators of a plurality of different accelerator types; generating, for the job and from at least the data representing the operations, features that represent a predicted performance for the job on hardware accelerators of the plurality of different accelerator types; generating, from the features, a respective predicted performance metric for the job for each of the plurality of different accelerator types according to a performance objective function; and providing, to a scheduling system, one or more recommendations for scheduling the job on one or more recommended types of hardware accelerators.
Executing a Quantum Logic Circuit on Multiple Processing Nodes
In a general aspect, a quantum logic circuit is executed on multiple processing nodes in a computing system that includes quantum computing resources. In some aspects, methods of operating the computing system may include obtaining a computer program that includes a quantum logic circuit. The methods may include obtaining hardware resource metadata specifying properties of processing nodes in the computing system. The processing nodes include at least a subset of the quantum computing resources, and the hardware resource metadata includes error rate information and availability information for the respective processing nodes. The methods may include generating execution tasks configured to execute the quantum logic circuit on the processing nodes based on the hardware resource metadata; dispatching the execution tasks to the processing nodes; receiving output data generated by the processing nodes; and producing an output of the computer program based on the output data.