Patent classifications
G06F15/17362
Writing to Contiguous Memory Addresses in a Network on a Chip Architecture
Devices, systems and methods are provided for writing, by a plurality of computing resources, to contiguous memory addresses of memory that supports random access, without having to specify actual write addresses of the memory.
Network Topology of Hierarchical Ring with Recursive Shortcuts
An interconnection system comprising a plurality of nodes, each comprising at least two ports, and a plurality of links configured to interconnect ports among the nodes to form a hierarchical multi-level ring topology, wherein the ring topology comprises a plurality of levels of rings including a base ring and at least two hierarchical shortcut rings, and wherein each node connected to a higher-level shortcut ring is also connected to all lower-level rings including the base ring.
Inter-Cluster Data Communication Network for a Dynamic Shared Communication Platform
The disclosure relates to a data communication network connecting a plurality of computation clusters. The data communication network is arranged for receiving via N data input ports, N>1, input signals from first clusters of the plurality and for outputting output signals to second clusters of the plurality via M data output ports, M>1. The communication network includes a segmented bus network for interconnecting clusters of the plurality and a controller arranged for concurrently activating up to P parallel data busses of the segmented bus network, thereby forming bidirectional parallel interconnections between P of the N inputs, P<N, and P of the M outputs, P<M, via paths of connected and activated segments of the segmented bus network. The segments are linked by segmentation switches. The N data input ports and the M data output ports are connected via stubs to a subset of the segmentation switches.
Network topology of hierarchical ring with recursive shortcuts
An interconnection system comprising a plurality of nodes, each comprising at least two ports, and a plurality of links configured to interconnect ports among the nodes to form a hierarchical multi-level ring topology, wherein the ring topology comprises a plurality of levels of rings including a base ring and at least two hierarchical shortcut rings, and wherein each node connected to a higher-level shortcut ring is also connected to all lower-level rings including the base ring.
OPCODE COUNTING FOR PERFORMANCE MEASUREMENT
Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.
Software based application specific integrated circuit
A processing device is provided. A cluster includes a plurality of groups of processing elements. A multi-word device is connected to the processing elements within the groups. Each processing element in a particular group is in communication with all other processing elements within the particular group, and only one of the processing elements within other groups in the cluster. Each processing element is limited to operations in which input bits can be processed and an output obtained without reference to other bits. The multi-word device is configured to cooperate with at least two other processing elements to perform processing that requires reference to other bits to obtain a result.
PROGRAMMABLE DEVICE, HEIRARCHICAL PARALLEL MACHINES, AND METHODS FOR PROVIDING STATE INFORMATION
Programmable devices, hierarchical parallel machines and methods for providing state information are described. In one such programmable device, programmable elements are provided. The programmable elements are configured to implement one or more finite state machines. The programmable elements are configured to receive an N-digit input and provide a M-digit output as a function of the N-digit input. The M-digit output includes state information from less than all of the programmable elements. Other programmable devices, hierarchical parallel machines and methods are also disclosed.
PROGRAMMABLE DEVICE, HIERARCHICAL PARALLEL MACHINES, AND METHODS FOR PROVIDING STATE INFORMATION
Programmable devices, hierarchical parallel machines and methods for providing state information are described. In one such programmable device, programmable elements are provided. The programmable elements are configured to implement one or more finite state machines. The programmable elements are configured to receive an N-digit input and provide a M-digit output as a function of the N-digit input. The M-digit output includes state information from less than all of the programmable elements. Other programmable devices, hierarchical parallel machines and methods are also disclosed.
Simultaneous Multi-Processor Apparatus Applicable to Acheiving Exascale Performance for Algorithms and Program Systems
Apparatus adapted for exascale computers are disclosed. The apparatus includes, but is not limited to at least one of: a system, data processor chip (DPC), Landing module (LM), chips including LM, anticipator chips, simultaneous multi-processor (SMP) cores, SMP channel (SMPC) cores, channels, bundles of channels, printed circuit boards (PCB) including bundles, floating point adders, accumulation managers, QUAD Link Anticipating Memory (QUADLAM), communication networks extended by coupling links of QUADLAM, log 2 calculators, exp2 calculators, log ALU, Non-Linear Accelerator (NLA), and stairways. Methods of algorithm and program development, verification and debugging are also disclosed. Collectively, embodiments of these elements disclose a class of supercomputers that obsolete Amdahl's Law, providing cabinets of petaflop performance and systems that may meet or exceed an exaflop of performance for Block LU Decomposition (Linpack).
Performing Read Operations in Network on a Chip Architecture
Systems and methods to be used by a processing element from among multiple computing resources of a computing system, where communication between the computing resources is carried out based on network on a chip architecture, to send first data from memory registers of the processing element and second data from memory of the computing system to a destination processing element from among the multiple computing resources, by sending the first data to a memory controller of the memory along with a single appended-read command.