Patent classifications
G06F8/45
Cascading of Graph Streaming Processors
Methods, systems, and apparatuses for graph stream processing are disclosed. One apparatus includes a cascade of graph streaming processors, wherein each of the graph streaming processors includes a processor array and a graph streaming processor scheduler. The cascade of graph streaming processors further includes a plurality of shared command buffers, wherein each shared command buffer includes a buffer address, a write pointer, and a read pointer. For each of the plurality of shared command buffers, a first graph streaming processor writes commands to the shared command buffer as indicated by the write pointer, and a second graph streaming processor reads commands from the shared command buffer as indicated by the read pointer, wherein at least one graph streaming processor scheduler operates to manage the write pointer and the read pointer to avoid overwriting unused commands of the shared command buffer.
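Functionally, each shared command buffer is a ring buffer whose producer and consumer are different graph streaming processors, and the scheduler's job is to keep the write pointer from lapping commands that have not yet been read. A minimal Python sketch of that pointer discipline (the class name and fixed capacity are illustrative assumptions, not details from the abstract):

    class SharedCommandBuffer:
        """Illustrative ring buffer: a producer GSP writes at the write
        pointer, a consumer GSP reads at the read pointer, and writes are
        refused while they would overwrite unread commands."""

        def __init__(self, capacity):
            self.buffer = [None] * capacity
            self.write_ptr = 0   # next slot the producer will fill
            self.read_ptr = 0    # next slot the consumer will drain
            self.count = 0       # commands written but not yet read

        def write(self, command):
            if self.count == len(self.buffer):
                return False     # full: writing would clobber unread commands
            self.buffer[self.write_ptr] = command
            self.write_ptr = (self.write_ptr + 1) % len(self.buffer)
            self.count += 1
            return True

        def read(self):
            if self.count == 0:
                return None      # empty: nothing for the consumer yet
            command = self.buffer[self.read_ptr]
            self.read_ptr = (self.read_ptr + 1) % len(self.buffer)
            self.count -= 1
            return command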
METHOD FOR EXECUTING COMPUTATION, COMPUTING DEVICE, COMPUTING SYSTEM, AND STORAGE MEDIUM
A method for executing computation, a computing device, a computing system, and a storage medium are provided. The method includes: confirming, via a compiler, whether a kernel function to be compiled contains a call instruction related to a thread block modification request; in response to confirming that the call instruction is present, determining the corresponding program segment associated with the call instruction; configuring the required thread block and thread-local registers for the corresponding program segment; and inserting a control instruction into the corresponding program segment so that the thread block configured for the segment executes the relevant computation of the segment while unconfigured thread blocks do not. The disclosure can improve overall performance, make coding and maintenance easier, and reduce the error rate of the code.
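The effect of the inserted control instruction can be pictured as a guard on the block index: configured thread blocks run the program segment with their own thread-local storage, and every other block skips it. A hedged Python sketch of that behavior (the sequential loop and all names are illustrative; real execution would be parallel on the device):

    def compiled_kernel(num_blocks, threads_per_block, configured_blocks):
        # Models the compiler-inserted control instruction: only blocks
        # configured for this program segment execute its computation.
        results = {}
        for block_id in range(num_blocks):
            if block_id not in configured_blocks:
                continue  # unconfigured block: segment is not executed
            local_regs = [0] * threads_per_block  # thread-local registers for the segment
            for tid in range(threads_per_block):
                local_regs[tid] = block_id * threads_per_block + tid  # segment body
            results[block_id] = sum(local_regs)
        return results

    # Only blocks 0 and 2 were configured for the segment:
    print(compiled_kernel(num_blocks=4, threads_per_block=8, configured_blocks={0, 2}))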
Performance estimation-based resource allocation for reconfigurable architectures
The technology disclosed relates to allocating available physical compute units (PCUs) and/or physical memory units (PMUs) of a reconfigurable data processor to operation units of an operation unit graph for execution thereof. In particular, it relates to selecting, for evaluation, an intermediate stage compute processing time between lower and upper search bounds of a generic stage compute processing time, determining a pipeline number of the PCUs and/or the PMUs required to process the operation unit graph, and iteratively, initializing new lower and upper search bounds of the generic stage compute processing time and selecting, for evaluation in a next iteration, a new intermediate stage compute processing time taking into account whether the pipeline number of the PCUs and/or the PMUs produced for a prior intermediate stage compute processing time in a previous iteration is lower or higher than the available PCUs and/or PMUs.
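The iterative bound-tightening amounts to a binary search over the generic stage compute processing time: a longer stage time needs fewer units, so the bounds move according to whether the last estimate fit the available hardware. A sketch under stated assumptions (the operation unit graph is reduced to a list of per-operation compute times, and units_required is a toy stand-in for the patent's performance estimator):

    import math

    def units_required(op_times, stage_time):
        # Toy estimator: an operation slower than the stage time must be
        # split across ceil(time / stage_time) pipeline units.
        return sum(math.ceil(t / stage_time) for t in op_times)

    def search_stage_time(op_times, available_units, lower, upper, tol=1e-3):
        # Iteratively evaluate an intermediate stage compute time between
        # the bounds, then re-initialize the bounds based on whether the
        # pipeline number of units exceeded what is available.
        while upper - lower > tol:
            mid = (lower + upper) / 2.0
            if units_required(op_times, mid) > available_units:
                lower = mid   # needs too many units: allow longer stages
            else:
                upper = mid   # fits: try a shorter (faster) stage time
        return upper

    print(search_stage_time([4.0, 2.5, 7.0], available_units=8, lower=0.1, upper=10.0))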
PARALLEL PROCESSING ARCHITECTURE USING SPECULATIVE ENCODING
Techniques for program execution in a parallel processing architecture using speculative encoding are disclosed. A two-dimensional array of compute elements is accessed, where each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the array of compute elements is provided on a cycle-by-cycle basis. The control is enabled by a stream of wide, variable length, control words generated by the compiler. Two or more operations are coalesced into a control word, where the control word includes a branch decision and operations associated with the branch decision. The coalesced control word includes speculatively encoded operations for at least two possible branch paths. The at least two possible branch paths generate independent side effects. Operations associated with the branch decision that are not indicated by the branch decision are suppressed.
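The coalescing can be modeled as a control word that carries a branch condition together with the operations of both possible paths; at execution time, the operations of the path not indicated by the decision are suppressed, so their side effects never occur. A hedged Python sketch (ControlWord and the lambda-based operations are illustrative, not the patent's encoding):

    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class ControlWord:
        # A branch decision coalesced with the speculatively encoded
        # operations of both possible branch paths.
        condition: Callable[[dict], bool]
        taken_ops: List[Callable[[dict], None]] = field(default_factory=list)
        not_taken_ops: List[Callable[[dict], None]] = field(default_factory=list)

    def execute(word, state):
        # The branch decision selects one encoded path; the other path's
        # operations are suppressed and produce no side effects.
        path = word.taken_ops if word.condition(state) else word.not_taken_ops
        for op in path:
            op(state)
        return state

    word = ControlWord(
        condition=lambda s: s["x"] > 0,
        taken_ops=[lambda s: s.update(y=1)],       # side effect of the taken path
        not_taken_ops=[lambda s: s.update(y=-1)],  # suppressed when x > 0
    )
    print(execute(word, {"x": 5}))  # {'x': 5, 'y': 1}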
METHOD OF DETERMINING PROCESSING BLOCK TO BE OPTIMIZED AND INFORMATION PROCESSING APPARATUS
A non-transitory computer-readable recording medium stores a program for causing a computer to execute a process. The process includes extracting, from an optimization report created when the software is compiled, an optimization method and an optimization non-applicable condition indicating the reason the optimization method is not applicable; determining an index value of optimization application easiness for each of a plurality of processing blocks included in the software, based on the optimization method and the optimization non-applicable condition; and determining, based on the index value, the optimization target processing block to be optimized among the plurality of processing blocks.
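One way to picture the index value is as a score per processing block, raised by each applicable optimization method and lowered by each non-applicable condition found in the report; the block with the best score becomes the optimization target. The report layout and scoring rule below are assumptions made for illustration, not the patent's actual index:

    def pick_optimization_target(report_entries):
        # Each assumed entry: {"block": ..., "method": ...,
        #                      "non_applicable_reason": ... or None}
        scores = {}
        for entry in report_entries:
            block = entry["block"]
            delta = -1 if entry.get("non_applicable_reason") else 1
            scores[block] = scores.get(block, 0) + delta
        return max(scores, key=scores.get)  # easiest block to optimize

    report = [
        {"block": "loop_12", "method": "vectorize", "non_applicable_reason": None},
        {"block": "loop_12", "method": "unroll", "non_applicable_reason": None},
        {"block": "loop_47", "method": "vectorize",
         "non_applicable_reason": "loop-carried dependency"},
    ]
    print(pick_optimization_target(report))  # loop_12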
COMPILING METHOD, COMPILING DEVICE, EXECUTION METHOD, COMPUTER-READABLE STORAGE MEDIUM AND COMPUTER DEVICE
A Flutter-based compiling method, a compiling device, an executing method, a computer-readable storage medium, and a computer device are provided. The Flutter-based compiling method includes: receiving configuration content; and, in response to the configuration content, compiling and generating an executable file, where the executable file includes at least two of a Native component, a Flutter Native component, and a Flutter dynamic component, and is configured to generate a routing table at runtime that enables the Native component, the Flutter Native component, and the Flutter dynamic component to communicate with one another through the routing table.
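The routing table acts as the single point of contact between component kinds: each component registers its entry points under a route name at startup, and all cross-component calls go through the table. A Python sketch of that pattern (Router and the route names are hypothetical, and a real Flutter implementation would be written in Dart):

    class Router:
        """Illustrative routing table: components register handlers under
        route names, then reach each other only through the table, so the
        Native, Flutter Native, and Flutter dynamic components stay decoupled."""

        def __init__(self):
            self.routes = {}

        def register(self, route, handler):
            self.routes[route] = handler        # built while the executable runs

        def dispatch(self, route, payload):
            return self.routes[route](payload)  # cross-component call via the table

    # e.g. the Native side registers a page, another component calls it:
    router = Router()
    router.register("native/profile", lambda p: f"profile for {p['user']}")
    print(router.dispatch("native/profile", {"user": "alice"}))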
USING HARDWARE-ACCELERATED INSTRUCTIONS
A computer-implemented method of implementing a computation using a hardware-accelerated instruction of a processor system by solving a constraint satisfaction problem. A solution to the constraint satisfaction problem represents a possible invocation of the hardware-accelerated instruction in the computation. The constraint satisfaction problem assigns nodes of a data flow graph of the computation to nodes of a data flow graph of the instruction. The constraint satisfaction problem comprises constraints enforcing that the assigned nodes of the computation data flow graph have equivalent data flow to the instruction data flow graph, and constraints restricting which nodes of the computation data flow graph can be assigned to the inputs of the hardware-accelerated instruction, with restrictions being imposed by the hardware-accelerated instruction and/or its programming interface.
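Solving the problem amounts to searching for an assignment of instruction-graph nodes to computation-graph nodes that preserves opcodes and operand edges while honoring the input restrictions. A small backtracking sketch (the dict-based graph encoding and the "input" opcode convention are assumptions made for illustration):

    def find_invocation(comp_graph, instr_graph, allowed_inputs):
        # Assumed graph encoding: node -> (opcode, tuple of operand nodes).
        # Instruction inputs use the opcode "input" and may only bind to
        # computation nodes permitted by `allowed_inputs`.

        def consistent(assign):
            for inode, cnode in assign.items():
                iop, iargs = instr_graph[inode]
                if iop == "input":
                    if cnode not in allowed_inputs:
                        return False  # restriction imposed by the instruction/interface
                    continue
                cop, cargs = comp_graph[cnode]
                if iop != cop or len(iargs) != len(cargs):
                    return False      # opcode and arity must match
                for ia, ca in zip(iargs, cargs):
                    if ia in assign and assign[ia] != ca:
                        return False  # operand edges must carry equivalent data flow
            return True

        def backtrack(assign, remaining):
            if not remaining:
                return dict(assign)   # complete assignment = one possible invocation
            inode = remaining[0]
            for cnode in comp_graph:
                assign[inode] = cnode
                if consistent(assign):
                    found = backtrack(assign, remaining[1:])
                    if found:
                        return found
                del assign[inode]
            return None

        return backtrack({}, list(instr_graph))

    comp_graph = {
        "x": ("load", ()), "y": ("load", ()),
        "z": ("add", ("x", "y")), "w": ("mul", ("z", "x")),
    }
    instr_graph = {"a": ("input", ()), "b": ("input", ()), "s": ("add", ("a", "b"))}
    print(find_invocation(comp_graph, instr_graph, allowed_inputs={"x", "y"}))
    # -> {'a': 'x', 'b': 'y', 's': 'z'}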
MEMORY-BASED DISTRIBUTED PROCESSOR ARCHITECTURE
Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.
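The layout can be summed up as one dedicated memory bank per processor subunit plus two bus families: subunit-to-bank and subunit-to-subunit. A minimal Python sketch of that topology (the class names and the simple chain wiring are illustrative assumptions):

    class ProcessorSubunit:
        def __init__(self, bank_size):
            self.bank = [0] * bank_size  # first bus plurality: subunit <-> its dedicated bank
            self.peers = []              # second bus plurality: subunit <-> other subunits

    class DistributedProcessor:
        def __init__(self, num_subunits, bank_size):
            self.subunits = [ProcessorSubunit(bank_size) for _ in range(num_subunits)]
            # Wire neighboring subunits together (illustrative chain topology).
            for a, b in zip(self.subunits, self.subunits[1:]):
                a.peers.append(b)
                b.peers.append(a)

    dp = DistributedProcessor(num_subunits=4, bank_size=1024)
    print(len(dp.subunits[1].peers))  # 2: an interior subunit sees both neighbors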
Merging Buffer Access Operations of a Compute Graph
A method for merging buffers and associated operations includes receiving a compute graph for a reconfigurable dataflow computing system and conducting a buffer allocation and merging process responsive to determining that a first operation specified by a first operation node is a memory indexing operation and that the first operation node is a producer for exactly one consuming node that specifies a second operation. The buffer allocation and merging process may include replacing the first operation node and the consuming node with a merged buffer node within the graph responsive to determining that the first operation and the second operation can be merged into a merged indexing operation and that the resource cost of the merged node is less than the sum of the resource costs of separate buffer nodes. A corresponding system and computer readable medium are also disclosed herein.
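The merging rule reduces to a local test on each producer node: it must be a memory-indexing operation with exactly one consumer, the two operations must fuse into a legal merged indexing operation, and the merged node must be strictly cheaper than the pair it replaces. A hedged sketch of that test (the edge map and the cost/merged_cost callables stand in for the system's compute graph and resource-cost model):

    def find_merge_candidates(consumers_of, is_indexing, cost, merged_cost):
        # consumers_of: node -> list of consuming nodes.
        # merged_cost(a, b): resource cost of the fused node, or None if the
        # two operations cannot be merged into one indexing operation.
        merges = []
        for node, consumers in consumers_of.items():
            if not (is_indexing(node) and len(consumers) == 1):
                continue  # rule applies only to single-consumer indexing ops
            consumer = consumers[0]
            fused = merged_cost(node, consumer)
            if fused is not None and fused < cost(node) + cost(consumer):
                merges.append((node, consumer))  # replace both with one merged buffer node
        return merges

    # Toy example: a transpose feeding exactly one reshape merges because
    # fusing saves a buffer; the reshape feeding a matmul does not fuse.
    consumers_of = {"transpose_0": ["reshape_1"], "reshape_1": ["matmul_2"], "matmul_2": []}
    print(find_merge_candidates(
        consumers_of,
        is_indexing=lambda n: n.startswith(("transpose", "reshape")),
        cost=lambda n: 1,
        merged_cost=lambda a, b: 1 if b.startswith(("transpose", "reshape")) else None,
    ))  # [('transpose_0', 'reshape_1')]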