G06F2015/765

GPU networking using an integrated command processor

Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.

Digitally coordinated dynamically adaptable clock and voltage supply apparatus and method

An apparatus and method is described that digitally coordinates dynamically adaptable clock and voltage supply to significantly reduce the energy consumed by a processor without impacting its performance or latency. A signal is generated that indicates a long latency operation. This signal is used to reduce power supply voltage and frequency of the adaptable clock. An early resume indicator is generated a few nanoseconds before normal operations are about to resume. This early resume signal is used to power up the power-downed voltage regulator, and/or can increase frequency and/or supply voltage back to normal level before normal processor operations are about to resume.

GPU NETWORKING USING AN INTEGRATED COMMAND PROCESSOR

Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.

Dynamically erectable computer system
09772971 · 2017-09-26 ·

A fault-tolerant computer system architecture includes two types of operating domains: a conventional first domain (DID) that processes data and instructions, and a novel second domain (MM domain) which includes mentor processors for mentoring the DID according to “meta information” which includes but is not limited to data, algorithms and protective rule sets. The term “mentoring” (as defined herein below) refers to, among other things, applying and using meta information to enforce rule sets and/or dynamically erecting abstractions and virtualizations by which resources in the DID are shuffled around for, inter alia, efficiency and fault correction. Meta Mentor processors create systems and sub-systems by means of fault tolerant mentor switches that route signals to and from hardware and software entities. The systems and sub-systems created are distinct sub-architectures and unique configurations that may be operated as separately or concurrently as defined by the executing processes.

MEMORY-BASED DISTRIBUTED PROCESSOR ARCHITECTURE
20210365334 · 2021-11-25 · ·

Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.

MEMORY-BASED DISTRIBUTED PROCESSOR ARCHITECTURE
20220156161 · 2022-05-19 · ·

Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.

Memory-based distributed processor architecture
11301340 · 2022-04-12 · ·

Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.

Memory-based distributed processor architecture
11269743 · 2022-03-08 · ·

Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.

Memory-based distributed processor architecture
11126511 · 2021-09-21 · ·

Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.

Memory-based distributed processor architecture
11023336 · 2021-06-01 · ·

Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.