G06F2015/768

Acceleration System and Dynamic Configuration Method Thereof
20230237008 · 2023-07-27 · ·

An acceleration system includes a plurality of modules. Each of the plurality of modules includes at least one central processing unit, at least one graphics processing unit, at least one field programmable gate array, or at least one application specific integrated circuit. At least one of the plurality of modules includes at least another of the plurality of modules such that the acceleration system is structural and nested.

ISSUING INSTRUCTIONS ON A VECTOR PROCESSOR
20230214352 · 2023-07-06 ·

The present disclosure relates to a mechanism for issuing instructions in a processor (e.g., a vector processor) implemented as an overlay on programmable hardware (e.g., a field programmable gate array (FPGA) device). Implementations described herein include features for optimizing resource availability on programmable hardware units and enabling superscalar execution when coupled with a temporal single-instruction multiple data (SIMD). Systems described herein involve an issue component of a processor controller (e.g., a vector processor controller) that enables fast and efficient instruction issue while verifying that structural and data hazards between instructions have been resolved.

Pipeline including separate hardware data paths for different instruction types

A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the first processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.

TECHNOLOGIES FOR HARDWARE MICROSERVICES ACCELERATED IN XPU

Methods, apparatus, and software and for hardware microservices accelerated in other processing units (XPUs). The apparatus may be a platform including a System on Chip (SOC) and an XPU, such as a Field Programmable Gate Array (FPGA). The FPGA is configured to implement one or more Hardware (HW) accelerator functions associated with HW microservices. Execution of microservices is split between a software front-end that executes on the SOC and a hardware backend comprising the HW accelerator functions. The software front-end offloads a portion of a microservice and/or associated workload to the HW microservice backend implemented by the accelerator functions. An XPU or FPGA proxy is used to provide the microservice front-ends with shared access to HW accelerator functions, and schedules/multiplexes access to the HW accelerator functions using, e.g., telemetry data generated by the microservice front-ends and/or the HW accelerator functions. The platform may be an infrastructure processing unit (IPU) configured to accelerate infrastructure operations.

FPGA coprocessor with sparsity and density modules for execution of low and high parallelism portions of graph traversals

An FPGA-based graph data processing method is provided for executing graph traversals on a graph having characteristics of a small-world network by using a first processor being a CPU and a second processor that is a FPGA and is in communicative connection with the first processor, wherein the first processor sends graph data to be traversed to the second processor, and obtains result data of the graph traversals from the second processor for result output after the second processor has completed the graph traversals of the graph data by executing level traversals, and the second processor comprises a sparsity processing module and a density processing module, the sparsity processing module operates in a beginning stage and/or an ending stage of the graph traversals, and the density processing module with a higher degree of parallelism than the sparsity processing module operates in the intermediate stage of the graph traversals.

RECONFIGURABLE PROCESSOR AND OPERATION METHOD THEREFOR

Provided are a reconfigurable processor and a method of operating the same, the reconfigurable processor including: a configurable memory configured to receive a task execution instruction from a control processor; and a plurality of reconfigurable arrays, each configured to receive configuration information from the configurable memory, wherein each of the plurality of reconfigurable arrays simultaneously executes a task based on the configuration information.

RECONFIGURABLE MICROPROCESSOR HARDWARE ARCHITECTURE
20170300333 · 2017-10-19 ·

A reconfigurable, multi-core processor includes a plurality of memory blocks and programmable elements, including units for processing, memory interface, and on-chip cognitive data routing, all interconnected by a self-routing cognitive on-chip network. In embodiments, the processing units perform intrinsic operations in any order, and the self-routing network forms interconnections that allow the sequence of operations to be varied and both synchronous and asynchronous data to be transmitted as needed. A method for programming the processor includes partitioning an application into modules, determining whether the modules execute in series, program-driven parallel, or data-driven parallel, determining the data flow required between the modules, assigning hardware resources as needed, and automatically generating machine code for each module. In embodiments, a Time Field is added to the instruction format for all programming units that specifies the number of clock cycles for which only one instruction fetch and decode will be performed.

Fault-tolerant scalable modular quantum computer architecture with an enhanced control of multi-mode couplings between trapped ion qubits

A modular quantum computer architecture is developed with a hierarchy of interactions that can scale to very large numbers of qubits. Local entangling quantum gates between qubit memories within a single modular register are accomplished using natural interactions between the qubits, and entanglement between separate modular registers is completed via a probabilistic photonic interface between qubits in different registers, even over large distances. This architecture is suitable for the implementation of complex quantum circuits utilizing the flexible connectivity provided by a reconfigurable photonic interconnect network. The subject architecture is made fault-tolerant which is a prerequisite for scalability. An optimal quantum control of multimode couplings between qubits is accomplished via individual addressing the qubits with segmented optical pulses to suppress crosstalk in each register, thus enabling high-fidelity gates that can be scaled to larger qubit registers for quantum computation and simulation.

Issuing instructions on a vector processor

The present disclosure relates to a mechanism for issuing instructions in a processor (e.g., a vector processor) implemented as an overlay on programmable hardware (e.g., a field programmable gate array (FPGA) device). Implementations described herein include features for optimizing resource availability on programmable hardware units and enabling superscalar execution when coupled with a temporal single-instruction multiple data (SIMD). Systems described herein involve an issue component of a processor controller (e.g., a vector processor controller) that enables fast and efficient instruction issue while verifying that structural and data hazards between instructions have been resolved.

HETEROGENEOUS ACCELERATOR FOR HIGHLY EFFICIENT LEARNING SYSTEMS
20220138132 · 2022-05-05 ·

An apparatus may include a heterogeneous computing environment that may be controlled, at least in part, by a task scheduler in which the heterogeneous computing environment may include a processing unit having fixed logical circuits configured to execute instructions; a reprogrammable processing unit having reprogrammable logical circuits configured to execute instructions that include instructions to control processing-in-memory functionality; and a stack of high-bandwidth memory dies in which each may be configured to store data and to provide processing-in-memory functionality controllable by the reprogrammable processing unit such that the reprogrammable processing unit is at least partially stacked with the high-bandwidth memory dies. The task scheduler may be configured to schedule computational tasks between the processing unit, and the reprogrammable processing unit.