Patent classifications
G06F15/17325
Instruction set
The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising one or more computer executable instructions which, when executed, implement: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.
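The time-deterministic exchange above can be sketched as a small simulation: packets carry no destination header, and delivery is correct only because the send and switch actions follow a precomputed schedule. All class and variable names here are hypothetical, not taken from the patent.

```python
# Minimal sketch of time-deterministic exchange: packets carry no
# destination identifier; correctness comes purely from the schedule of
# transmit and switch times. All names are hypothetical.

class Fabric:
    """Switching fabric: holds whatever datum each tile has put on its wires."""
    def __init__(self):
        self.wires = {}          # tile_id -> datum currently on its wires

    def put(self, tile_id, datum):
        self.wires[tile_id] = datum

    def get(self, tile_id):
        return self.wires.get(tile_id)

class Tile:
    def __init__(self, tile_id, fabric, schedule):
        self.tile_id = tile_id
        self.fabric = fabric
        # schedule: time -> ("send", datum) or ("switch", source_tile_id)
        self.schedule = schedule
        self.received = []

    def tick(self, t):
        action = self.schedule.get(t)
        if action is None:
            return
        kind, arg = action
        if kind == "send":       # send function: no destination in the packet
            self.fabric.put(self.tile_id, arg)
        elif kind == "switch":   # switch control: connect wires to a source
            self.received.append(self.fabric.get(arg))

fabric = Fabric()
a = Tile("A", fabric, {0: ("send", "payload-1")})
b = Tile("B", fabric, {1: ("switch", "A")})   # predetermined receive time
for t in range(2):
    a.tick(t); b.tick(t)
print(b.received)   # ['payload-1']
```

If B's switch time did not match A's transmit time, the payload would simply be missed, which is why both times are fixed at compile time.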
Persistent kernel for graphics processing unit direct memory access network packet processing
A graphics processing unit may, in accordance with a kernel, determine that at least a first packet is written to a memory buffer of the graphics processing unit by a network interface card via a direct memory access, process the at least the first packet in accordance with the kernel, and provide a first notification to a central processing unit that the at least the first packet is processed in accordance with the kernel. The graphics processing unit may further determine that at least a second packet is written to the memory buffer by the network interface card via the direct memory access, process the at least the second packet in accordance with the kernel, where the kernel comprises a persistent kernel, and provide a second notification to the central processing unit that the at least the second packet is processed in accordance with the kernel.
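The persistent-kernel pattern described above can be modelled with a single long-lived worker that polls a shared buffer rather than being launched once per packet. This is a hypothetical host-side simulation only; the actual mechanism is a GPU kernel fed by NIC DMA writes.

```python
# Sketch of the persistent-kernel pattern: one long-lived worker (standing
# in for the GPU kernel) polls a shared buffer that a producer (standing in
# for the NIC's DMA writes) fills, and notifies the host after each batch.

import queue, threading

dma_buffer = queue.Queue()      # stands in for the GPU memory buffer
notifications = queue.Queue()   # stands in for notifications to the CPU

def persistent_kernel():
    # Launched once; keeps running instead of one launch per packet.
    while True:
        packet = dma_buffer.get()        # "determine a packet is written"
        if packet is None:               # shutdown sentinel
            break
        processed = packet.upper()       # "process in accordance with the kernel"
        notifications.put(processed)     # "provide a notification to the CPU"

worker = threading.Thread(target=persistent_kernel)
worker.start()
for pkt in ["first", "second"]:          # NIC DMA-writes two packets
    dma_buffer.put(pkt)
dma_buffer.put(None)                     # shutdown sentinel
worker.join()
results = [notifications.get() for _ in range(2)]
print(results)   # ['FIRST', 'SECOND']
```

The point of persistence is that the second packet reuses the same running kernel, avoiding a fresh launch for every arrival.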
Networked computer with multiple embedded rings
A computer comprising a plurality of interconnected processing nodes arranged in multiple stacked layers forming a multi-face prism is provided. Each face of the prism comprises multiple stacked pairs of nodes. Said nodes are connected by at least two intralayer links. Each node is connected to a corresponding node in an adjacent pair by an interlayer link. The corresponding nodes are connected by respective interlayer links to form respective edges. Each pair forms part of a layer, each layer comprising multiple nodes, each node connected to its neighbouring nodes in the layer by at least one of the intralayer links to form a ring. Data is transmitted around paths formed by respective sets of nodes and links, each path having a first portion between first and second endmost layers, and a second portion provided between the second and first endmost layers and comprising one of the edges.
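The link structure above can be sketched by construction: each layer is a ring, and corresponding nodes in adjacent layers are joined by vertical edges. The node counts used here are illustrative and not taken from the patent figures.

```python
# Sketch of the stacked-ring topology: `layers` layers, each a ring of
# `nodes_per_layer` nodes, with interlayer links joining corresponding
# nodes in adjacent layers. Counts and naming are illustrative.

def build_prism(layers, nodes_per_layer):
    links = set()
    for l in range(layers):
        for n in range(nodes_per_layer):
            # intralayer link: each node to its neighbour in the same layer,
            # closing the layer into a ring
            links.add(frozenset({(l, n), (l, (n + 1) % nodes_per_layer)}))
            # interlayer link: corresponding nodes in adjacent layers,
            # forming the prism's vertical edges
            if l + 1 < layers:
                links.add(frozenset({(l, n), (l + 1, n)}))
    return links

links = build_prism(layers=4, nodes_per_layer=6)
# 6 ring links in each of 4 layers, plus 6 vertical links between each of
# the 3 adjacent layer pairs
print(len(links))   # 4*6 + 3*6 = 42
```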
Method of time delivery in a computing system and system thereof
There is provided a technique of time delivery in a computing system comprising a system call interface (SCI) located in a kernel space and operatively connected to a time client located in a user space. The technique comprises: using a time agent component located in the user space to measure data indicative of delay in a system time delivery and to derive therefrom a system time delivery error TE_S2C; using TE_S2C to enable correction of system time; and sending by the SCI the corrected system time in response to a “Read Clock RT” (RCRT) call received from the time client. The method can further comprise: measuring data indicative of delays in the system time delivery for RCRT calls with different priorities; and in response to a system time request received from the time client, providing the time client with system time corrected per TE_S2C corresponding to the recognized priority thereof.
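A simplified model of the correction is: measure the round-trip cost of a clock read, attribute part of it to delivery delay, and add that delay back when answering the RCRT call. The half-round-trip attribution and all function names below are assumptions for illustration, not the patent's actual estimator.

```python
# Sketch of delivery-error correction: the agent measures how long a
# "Read Clock RT" round trip takes, attributes half of it to the one-way
# delivery delay TE_S2C, and the corrected time adds that delay back.
# Hypothetical model; symmetric in/out delay is an assumption.

import time

def measure_te_s2c(read_clock, samples=5):
    # time agent: estimate one-way delivery delay of a clock read
    delays = []
    for _ in range(samples):
        t0 = time.perf_counter()
        read_clock()
        t1 = time.perf_counter()
        delays.append((t1 - t0) / 2.0)   # assume symmetric delay
    return min(delays)                    # least-noise estimate

def rcrt(read_clock, te_s2c):
    # SCI answering a Read Clock RT call with corrected system time
    return read_clock() + te_s2c

te = measure_te_s2c(time.time)
corrected = rcrt(time.time, te)
print(te >= 0.0)   # True
```

The per-priority variant in the abstract would keep one such estimate per RCRT priority instead of a single value.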
NEURAL PROCESSING DEVICE AND METHOD FOR SYNCHRONIZATION THEREOF
A neural processing device is provided. The neural processing device comprises a plurality of neural processors, a shared memory shared by the plurality of neural processors, a plurality of semaphore memories, and a global interconnection. The plurality of neural processors generates a plurality of L3 sync targets, respectively. Each semaphore memory is associated with a respective one of the plurality of neural processors, and the plurality of semaphore memories receive and store the plurality of L3 sync targets, respectively. Synchronization of the plurality of neural processors is performed according to the plurality of L3 sync targets. The global interconnection connects the plurality of neural processors with the shared memory, and comprises an L3 sync channel through which an L3 synchronization signal corresponding to at least one L3 sync target is transmitted.
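The target-based scheme above can be sketched as counters: each processor publishes a sync target into its own semaphore memory, signals on the sync channel increment a counter, and the group is synchronized when every counter meets its target. The counter model and all names are hypothetical.

```python
# Sketch of target-based synchronization: each processor stores an L3 sync
# target in its own semaphore memory; signals arriving on the L3 sync
# channel increment a counter, and synchronization completes once every
# counter reaches its target. Hypothetical shared-memory model.

class SemaphoreMemory:
    def __init__(self, target):
        self.target = target     # stored L3 sync target
        self.count = 0           # signals observed on the L3 sync channel

    def signal(self):
        self.count += 1

    def reached(self):
        return self.count >= self.target

sems = [SemaphoreMemory(target=t) for t in (2, 1, 3)]  # one per processor
for sem in sems:
    for _ in range(sem.target):      # signals travel over the sync channel
        sem.signal()
print(all(sem.reached() for sem in sems))   # True
```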
SYNCHRONOUS DISPLAY BLINKING
Various example embodiments described herein relate to a method for synchronizing liquid crystal display (LCD) screens. In some examples, the method includes establishing, by a first device comprising a processor, a master/slave relationship with one or more other devices; determining, by the first device, a frequency associated with turning on a first LCD screen on the first device; and sending, by the first device, a signal to each of the one or more other devices, wherein the signal comprises an instruction to turn on an LCD screen on each receiving device at a same time as the first LCD screen.
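The master/slave exchange above reduces to a single broadcast message carrying a shared turn-on time. The message fields and function names below are invented for illustration.

```python
# Sketch of the master/slave blink protocol: the master determines a blink
# frequency for its own screen and broadcasts an instruction so every
# receiving device turns its LCD on at the same moment. Fields invented.

def make_blink_signal(frequency_hz, turn_on_at):
    # sent by the master (first device) to each slave device
    return {"cmd": "turn_on_lcd", "freq_hz": frequency_hz, "at": turn_on_at}

def handle_signal(signal, now):
    # slave: obey the shared turn-on time rather than any local schedule
    return now >= signal["at"]

sig = make_blink_signal(frequency_hz=2.0, turn_on_at=100.0)
print(handle_signal(sig, now=100.0))   # True: all screens switch together
```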
Control Barrier Network for Reconfigurable Data Processors
A processing system comprises a control bus and a plurality of logic units. The control bus is configurable by configuration data to form signal routes in a control barrier network coupled to processing units in an array of processing units. The plurality of logic units has inputs and outputs connected to the control bus and to the array of processing units. A logic unit in the plurality of logic units is operatively coupled to a processing unit in the array of processing units and is configurable by the configuration data to consume source tokens and a status signal from the processing unit on the inputs and to produce barrier tokens and an enable signal on the outputs based on the source tokens and the status signal on the inputs.
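The behaviour of one logic unit can be sketched as a pure function of its inputs: it fires, producing a barrier token and an enable signal, only when every source token has arrived and the processing unit's status signal is ready. This is purely illustrative; in the patent the unit's behaviour is set by configuration data.

```python
# Sketch of one logic unit in the control barrier network: it consumes
# source tokens and a status signal from its processing unit, and produces
# a barrier token plus an enable signal only when all inputs are satisfied.

def logic_unit(source_tokens, status_ready):
    # fire when every producer has delivered a token and status is ready
    if all(source_tokens) and status_ready:
        return {"barrier_token": 1, "enable": True}
    return {"barrier_token": 0, "enable": False}

print(logic_unit([1, 1, 1], status_ready=True))    # fires
print(logic_unit([1, 0, 1], status_ready=True))    # waits for missing token
```

Chaining such units through configurable signal routes is what lets the control bus express barriers across the array.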
Clock recovery using between-interval timing error estimation
Disclosed clock recovery modules provide improved performance with only limited complexity and power requirements. In one illustrative embodiment, a clock recovery method includes: oversampling a receive signal to obtain mid-symbol interval (MSI) samples and between-symbol interval (BSI) samples; processing at least the MSI samples to obtain symbol decisions; filtering the symbol decisions to obtain BSI targets; determining a timing error based on a difference between the BSI samples and the BSI targets; and deriving from the timing error a clock signal for said oversampling.
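The listed steps can be sketched with a 2x-oversampled stream: mid-symbol samples yield decisions, each between-symbol target is filtered from the adjacent decisions, and the timing error is the gap between the actual BSI sample and its target. The simple averaging filter here is an assumed channel model, not the patent's filter.

```python
# Sketch of between-interval timing-error estimation: with 2x oversampling,
# MSI samples give symbol decisions, each BSI target is modelled as the
# average of the two adjacent decisions, and the timing error is the
# difference between the actual BSI sample and that target.

def timing_errors(samples):
    msi = samples[0::2]                      # mid-symbol interval samples
    bsi = samples[1::2]                      # between-symbol interval samples
    decisions = [1.0 if s >= 0 else -1.0 for s in msi]   # slicer decisions
    errors = []
    for k, b in enumerate(bsi):
        if k + 1 < len(decisions):
            target = (decisions[k] + decisions[k + 1]) / 2.0  # BSI target
            errors.append(b - target)        # drives the recovered clock
    return errors

# perfectly timed +1/-1 stream: every BSI sample sits on its target
print(timing_errors([1.0, 0.0, -1.0, 0.0, 1.0]))   # [0.0, 0.0]
```

A sampling-phase offset would push the BSI samples off their targets, and the resulting nonzero errors are what steer the derived clock back into alignment.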
Communication Between Host and Accelerator Over Network
A host system compiles a set of local programs which are provided over a network to a plurality of subsystems. By defining the synchronisation activity on the host, and then providing that information to the subsystems, the host can service a large number of subsystems. The defined synchronisation activity includes defining the synchronisation groups between which synchronisation barriers occur and the points during program execution at which data exchange with the host occurs. Defining synchronisation activity between the subsystems allows a large number of subsystems to be connected whilst minimising the required exchanges with the host.
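The compile-ahead idea can be sketched as follows: the host turns one global schedule of barriers and host-exchange points into a per-subsystem local program, so at run time no per-barrier negotiation with the host is needed. The program format and group names here are hypothetical.

```python
# Sketch of host-defined synchronisation: the host compiles, for each
# subsystem, a local program that already encodes which sync group each
# barrier belongs to and where host data exchange happens.

def compile_local_programs(subsystems, schedule):
    # schedule: ordered list of ("barrier", group) / ("host_exchange",) steps
    programs = {}
    for s in subsystems:
        steps = []
        for step in schedule:
            if step[0] == "barrier" and s in step[1]:
                steps.append(("sync", tuple(sorted(step[1]))))
            elif step[0] == "host_exchange":
                steps.append(("exchange_with_host",))
        programs[s] = steps
    return programs

progs = compile_local_programs(
    ["acc0", "acc1", "acc2"],
    [("barrier", {"acc0", "acc1"}),       # sync group excludes acc2
     ("host_exchange",),
     ("barrier", {"acc0", "acc1", "acc2"})])
print(progs["acc2"])
# [('exchange_with_host',), ('sync', ('acc0', 'acc1', 'acc2'))]
```

Note that acc2's program simply omits the barrier for the group it is not in; the host never needs to arbitrate that barrier at run time.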
Extended Sync Network
An apparatus is provided for converting the form in which a synchronisation request for a barrier synchronisation is provided. The synchronisation request is provided from a first synchronisation circuitry to a second synchronisation circuitry by asserting one of a set of separate signals that may each correspond to a bit in a register or a signal on a wire. The second synchronisation circuitry provides for the packetisation of the sync request by sending a packet comprising the sync request over a network to be received at a further subsystem.
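The conversion can be sketched in two halves: the first circuitry asserts one dedicated signal (modelled as a register bit) and the second circuitry packetises each asserted signal into a sync-request packet for the network. The bit layout and packet fields are invented for illustration.

```python
# Sketch of sync-request conversion: the first synchronisation circuitry
# asserts one of a set of separate signals (one register bit each); the
# second circuitry packetises each asserted signal into a sync-request
# packet to be sent over the network. Field layout is hypothetical.

def assert_sync(register, zone):
    # first synchronisation circuitry: one separate signal per zone
    return register | (1 << zone)

def packetise(register):
    # second synchronisation circuitry: wrap each asserted bit in a packet
    packets = []
    zone = 0
    while register:
        if register & 1:
            packets.append({"type": "sync_request", "zone": zone})
        register >>= 1
        zone += 1
    return packets

reg = assert_sync(0, zone=2)
print(packetise(reg))   # [{'type': 'sync_request', 'zone': 2}]
```

The packet form lets the request cross a network to a further subsystem, which the raw per-wire signal cannot do.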