G06F15/7825

Tile-based result buffering in memory-compute systems

A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. A first tile in a first node can include a processor with a processor output and a first register network configured to receive information from the processor output and information from one or more of the multiple other tiles in the first node. In response to an output instruction and a delay instruction, the register network can provide an output signal to one of the multiple other tiles in the first node. Based on the output instruction, the output signal can include one or the other of the information from the processor output and the information from one or more of the multiple other tiles in the first node. A timing characteristic of the output signal can depend on the delay instruction.

OPTIMIZING NOC PERFORMANCE USING CROSSBARS
20230176736 · 2023-06-08 ·

A system including an array of processing elements, a plurality of periphery crossbars and a plurality of storage components is described. The array of processing elements is interconnected in a grid via a network on an integrated circuit. The periphery crossbars are connected to a plurality of edges of the array of processing elements. The storage components are connected to the periphery crossbars.

Relocatable FPGA Modules

A logic block can be relocated without recompilation from a first area to a second area on a field-programmable gate array (FPGA) if the pattern of fabric tiles in the second area is the same as the pattern of fabric tiles in the first area, and if the two areas have the same dimensions. The design system runs synthesis, placement, and routing on a partition of a design at a first location, exports that partition to a persistent on-disk database, imports one or multiple copies of the partition into a larger design, and moves one or more of the copies from the first area to a target area in the larger design. The compatibility of the second area may be identified based on fabric tile signatures of the first area and the second area

Configurable network-on-chip for a programmable device

An example programmable integrated circuit (IC) includes a processor, a plurality of endpoint circuits, a network-on-chip (NoC) having NoC master units (NMUs), NoC slave units (NSUs), NoC programmable switches (NPSs), a plurality of registers, and a NoC programming interface (NPI). The processor is coupled to the NPI and is configured to program the NPSs by loading an image to the registers through the NPI for providing physical channels between NMUs to the NSUs and providing data paths between the plurality of endpoint circuits.

Network on layer enabled architectures
11264361 · 2022-03-01 · ·

The technology relates to a system on chip (SoC). The SoC may include a network on layer including one or more routers and an application specific integrated circuit (ASIC) layer bonded to the network layer, the ASIC layer including one or more components. In some instances, the network layer and the ASIC layer each include an active surface and a second surface opposite the active surface. The active surface of the ASIC layer and the second surface of the network may each include one or more contacts, and the network layer may be bonded to the ASIC layer via bonds formed between the one or more contacts on the second surface of the network layer and the one or more contacts on the active surface of the ASIC layer.

Bit string lookup data structure
11263010 · 2022-03-01 · ·

Systems, apparatuses, and methods related to bit string operations using a computing tile are described. An example apparatus includes computing device (or “tile”) that includes a processing unit and a memory resource configured as a cache for the processing unit. A data structure can be coupled to the computing device. The data structure can be configured to receive a bit string that represents a result of an arithmetic operation, a logical operation, or both and store the bit string that represents the result of the arithmetic operation, the logical operation, or both. The bit string can be formatted in a format different than a floating-point format.

RECONFIGURABLE MICROPROCESSOR HARDWARE ARCHITECTURE
20170300333 · 2017-10-19 ·

A reconfigurable, multi-core processor includes a plurality of memory blocks and programmable elements, including units for processing, memory interface, and on-chip cognitive data routing, all interconnected by a self-routing cognitive on-chip network. In embodiments, the processing units perform intrinsic operations in any order, and the self-routing network forms interconnections that allow the sequence of operations to be varied and both synchronous and asynchronous data to be transmitted as needed. A method for programming the processor includes partitioning an application into modules, determining whether the modules execute in series, program-driven parallel, or data-driven parallel, determining the data flow required between the modules, assigning hardware resources as needed, and automatically generating machine code for each module. In embodiments, a Time Field is added to the instruction format for all programming units that specifies the number of clock cycles for which only one instruction fetch and decode will be performed.

NoC Interconnect with Linearly-Tunable QoS Guarantees for Real-time Isolation

Disclosed is method for operating an interposer that includes assigning a binary port weight to a plurality of input ports of the interposer. The sum of all of the port weights is less than or equal to a number of traversals available to the interposer in a cycle. A traversal counter is set zero at the beginning of each cycle. The output of the traversal counter is a binary number of m bits. A mask is generated when a bit of the traversal counter transitions from a zero to a one. The mask is generated having the m−k+1 bit of the mask equal to one and all other bits of the mask equal to zero. Data is transmitted from each port when both the binary port weight and the mask have a one in the same bit position.

METHODS AND APPARATUS FOR PROCESSING IN A NETWORK ON CHIP (NOC)
20170295111 · 2017-10-12 ·

Methods and apparatus of delegating instructions or data from a CU to an NOC node in a network on chip (NOC) is disclosed. The NOC node executes the delegated instructions or processes the delegated data. An NOC controller (NCC), which is operatively coupled to the CU and the NOC node, facilitates delegating the instructions or data from the CU to the NOC node.

Link delay based routing apparatus for a network-on-chip

A router of a network-on-chip receives delay information associated with a plurality of links of the network-on-chip. The router determines at least one link of a data path based on the delay information.