Patent classifications
G06F15/17337
Synchronization in multi-chip systems
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining, for each pair of adjacent chips in a plurality of chips connected in a series-ring arrangement of a semiconductor device, a corresponding loop latency for round trip data transmissions between the pair of chips. Identifying, from among the loop latencies, a maximum loop latency. Determining a ring latency for a data transmission originating from a chip of the plurality chips to be transmitted around the series-ring arrangement and back to the chip. Comparing half of the maximum loop latency to one N-th of the ring latency, where N is the number of chips in the plurality of chips, and storing the greater value as an inter-chip latency of the semiconductor device, the inter-chip latency representing an operational characteristic of the semiconductor device.
CONFIGURABLE MEMORY ARCHITECTURE FOR COMPUTER PROCESSING SYSTEMS
An integrated circuit (IC) includes a memory manager having a plurality of memory ports, each configured to communicate with a corresponding floating memory block. The IC includes a first interconnect for a first domain, wherein the first interconnect has a first set of fixed ports configured to communicate with memory blocks dedicated to the first domain and a first set of floating ports configured to communicate with the memory manager, and a second interconnect for a second domain, wherein the second interconnect has a second set of fixed ports configured to communicate with memory blocks dedicated to the second domain and a second set of floating ports configured to communicate with the memory manager. The memory manager is configured to allocate a first portion of the memory ports to the first set of floating ports and a second portion of the memory ports to the second set of floating ports.
Disjoint array computer
A hierarchical array computer architecture comprised of a master computer connected to a plurality of node computers wherein each node has a memory segment. A high speed connection scheme between the master computer and the nodes allows the master computer or individual nodes conditional access to the node memory segments. The resulting architecture creates an array computer with a large distributed memory in which each memory segment of the distributed memory has an associated computing element; the entire array being housed in a blade server type enclosure. The array computer created with this architecture provides a linear increase of processing speed corresponding to the number of nodes.
Systems and methods for implementing an intelligence processing computing architecture
A system and method for automated data propagation and automated data processing within an integrated circuit includes an intelligence processing integrated circuit comprising at least one intelligence processing pipeline, wherein the at least one intelligence processing pipeline includes: a main data buffer that stores input data; a plurality of distinct intelligence processing tiles, wherein each distinct intelligence processing tile includes a computing circuit and a local data buffer; a token-based governance module, the token-based governance module implementing: a first token-based control data structure; a second token-based control data structure, wherein the first token-based control data structure and the second-token based control data operate in cooperation to control an automated flow of the input data and/or an automated processing of the input data through the at least one intelligence processing pipeline.
Single-chip multi-processor communication
A heterogeneous multi-core integrated circuit comprising two or more processors, at least one of the processors being a general purpose CPU and at least one of the processors being a specialized hardware processing engine, the processors being connected by a processor local bus on the integrated circuit, wherein the general purpose CPU is configured to generate a first instruction for an atomic operation to be performed by a second processor, different from the general purpose CPU, the first instruction comprising an address of the second processor and a first command indicating a first action to be executed by the second processor, and transmit the first instruction to the second processor over the processor local bus. The first command may include the first action, or may be a descriptor of the first action or a pointer to where the first action may be found in a memory.
Dual mode interconnect
Examples herein describe techniques for communicating between data processing engines in an array of data processing engines. In one embodiment, the array is a 2D array where each of the DPEs includes one or more cores. In addition to the cores, the data processing engines can include streaming interconnects which transmit streaming data using two different modes: circuit switching and packet switching. Circuit switching establishes reserved point-to-point communication paths between endpoints in the interconnect which routes data in a deterministic manner. Packet switching, in contrast, transmits streaming data that includes headers for routing data within the interconnect in a non-deterministic manner. In one embodiment, the streaming interconnects can have one or more ports configured to perform circuit switching and one or more ports configured to perform packet switching.
Instruction set
The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising one or more computer executable instruction which, when executed, implements: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.
Streaming fabric interface
An interface for coupling an agent to a fabric supports a load/store interconnect protocol and includes a header channel implemented on a first subset of a plurality of physical lanes, the first subset of lanes including first lanes to carry a header of a packet based on the interconnect protocol and second lanes to carry metadata for the header. The interface additionally includes a data channel implemented on a separate second subset of the plurality of physical lanes, the second subset of lanes including third lanes to carry a payload of the packet and fourth lanes to carry metadata for the payload.
Routing in a Network of Processors
A processor in a network has a plurality of processing units arranged on a chip. An on-chip interconnect enables data to be exchanged between the processing units. A plurality of external interfaces are configured to communicate data off chip in the form of packets, each packet having a destination address identifying a destination of the packet. The external interfaces are connected to respective additional connected processors. A routing bus routes packets between the processing units and the external interfaces. A routing register defines a routing domain for the processor, the routing domain comprising one or more of the additional processor, and at least a subset of further additional processors of the network, wherein the additional processors of the subset are directly or indirectly connected to the processor. The routing domain can be modified by changing the contents of the routing register as a sliding window domain.
DUAL MODE INTERCONNECT
Examples herein describe techniques for communicating between data processing engines in an array of data processing engines. In one embodiment, the array is a 2D array where each of the DPEs includes one or more cores. In addition to the cores, the data processing engines can include streaming interconnects which transmit streaming data using two different modes: circuit switching and packet switching. Circuit switching establishes reserved point-to-point communication paths between endpoints in the interconnect which routes data in a deterministic manner. Packet switching, in contrast, transmits streaming data that includes headers for routing data within the interconnect in a non-deterministic manner. In one embodiment, the streaming interconnects can have one or more ports configured to perform circuit switching and one or more ports configured to perform packet switching.