G06F15/825

Computer architecture with fixed program dataflow elements and stream processor

A hardware accelerator for computers combines a stand-alone, high-speed, fixed program dataflow functional element with a stream processor, the latter of which may autonomously access memory in predefined access patterns after receiving simple stream instructions and provide them to the dataflow functional element. The result is a compact, high-speed processor that may exploit fixed program dataflow functional elements.

EXECUTION ENGINE FOR EXECUTING SINGLE ASSIGNMENT PROGRAMS WITH AFFINE DEPENDENCIES
20210286756 · 2021-09-16 ·

The execution engine is a new organization for a digital data processing apparatus, suitable for highly parallel execution of structured fine-grain parallel computations. The execution engine includes a memory for storing data and a domain flow program, a controller for requesting the domain flow program from the memory, and further for translating the program into programming information, a processor fabric for processing the domain flow programming information and a crossbar for sending tokens and the programming information to the processor fabric.

MASSIVELY PARALLEL HIERARCHICAL CONTROL SYSTEM AND METHOD
20210263885 · 2021-08-26 ·

The present disclosure relates to an electronic control system for controlling controllable elements of an external component. The system makes use of a state translator subsystem (“STS”) for receiving a state command from an external subsystem. The STS may have at least one module for processing the state command and generating operational commands over a first plurality of channels in parallel form, for controlling the elements of the external component. A programmable calibration command translation layer subsystem (“PCCTL”) is used and configured to receive and use the operational commands to generate granular level commands for controlling the elements, and to transmit the granular level commands over a second plurality of channels in parallel form. A subsystem is coupled between the PCCTL and the elements, and configured to receive the granular level commands from the PCCTL and to use the granular level commands to generate final output commands. The final output commands are applied in parallel, over a third plurality of channels, to the elements.

PROGRAMMABLE DEVICE, HIERARCHICAL PARALLEL MACHINES, AND METHODS FOR PROVIDING STATE INFORMATION
20210255911 · 2021-08-19 ·

Programmable devices, hierarchical parallel machines and methods for providing state information are described. In one such programmable device, programmable elements are provided. The programmable elements are configured to implement one or more finite state machines. The programmable elements are configured to receive an N-digit input and provide a M-digit output as a function of the N-digit input. The M-digit output includes state information from less than all of the programmable elements. Other programmable devices, hierarchical parallel machines and methods are also disclosed.

Systems and methods for stream-dataflow acceleration wherein a delay is implemented so as to equalize arrival times of data packets at a destination functional unit

A dataflow accelerator including a control/command core, a scratchpad and a coarse grain reconfigurable array (CGRA) according to an exemplary embodiment is disclosed. The scratchpad may include a write controller to transmit data to an input vector port interface and to receive data from the input vector port interface. The CGRA may receive data from the input vector port interface and includes a plurality of interconnects and a plurality of functional units.

RESERVOIR COMPUTING DATA FLOW PROCESSOR
20210263884 · 2021-08-26 · ·

A reservoir computing data flow processor includes a plurality of reservoir units to be units constituting a reservoir. The reservoir is able to be reconfigured by changing a connection relationship between the reservoir units. Each of the reservoir units is an operation unit block configured to execute a predetermined operation. The operation unit block includes a first adder configured to perform an addition operation on at least two inputs, a nonlinear operator configured to apply a nonlinear function to an output from the first adder or a result of multiplying the output by a predetermined coefficient, and a second adder configured to perform an addition operation on at least two inputs including an output from the nonlinear operator or a result of multiplying the output by a predetermined coefficient.

Programmable device, hierarchical parallel machines, and methods for providing state information
11003515 · 2021-05-11 · ·

Programmable devices, hierarchical parallel machines and methods for providing state information are described. In one such programmable device, programmable elements are provided. The programmable elements are configured to implement one or more finite state machines. The programmable elements are configured to receive an N-digit input and provide a M-digit output as a function of the N-digit input. The M-digit output includes state information from less than all of the programmable elements. Other programmable devices, hierarchical parallel machines and methods are also disclosed.

Execution engine for executing single assignment programs with affine dependencies

The execution engine is a new organization for a digital data processing apparatus, suitable for highly parallel execution of structured fine-grain parallel computations. The execution engine includes a memory for storing data and a domain flow program, a controller for requesting the domain flow program from the memory, and further for translating the program into programming information, a processor fabric for processing the domain flow programming information and a crossbar for sending tokens and the programming information to the processor fabric.

DATA STRUCTURE DESCRIPTORS FOR DEEP LEARNING ACCELERATION

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.

Massively parallel hierarchical control system and method

An electronic control system is disclosed for controlling individually controllable elements of an external component. In one embodiment the system may include a state translator subsystem for receiving a state command from an external subsystem. The state translator subsystem may have at least one module for processing the state command and generating operational commands for controlling the elements to achieve a desired state or condition. A programmable calibration command translation layer (PCCTL) subsystem may be included which receives and uses the operational commands to generate granular level commands for controlling the elements. A feedback control layer subsystem may be included which applies the granular level commands to the elements, and further modifies the granular level commands as needed to control the elements in closed loop fashion.