Patent classifications
G06F15/7875
Circular reconfiguration for a reconfigurable parallel processor using a plurality of chained memory ports
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of reconfigurable units that may include a plurality of processing elements (PEs) and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each of the plurality of reconfigurable units may comprise a configuration buffer and a reconfiguration counter. The processor may further comprise a sequencer coupled to the configuration buffer of each of the plurality of reconfigurable units and configured to distribute a plurality of configurations to the plurality of reconfigurable units for the plurality of PEs and the plurality of MPs to execute a sequence of instructions.
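The abstract does not say how the reconfiguration counter is used. The sketch below is one possible reading. It assumes the sequencer pushes (configuration, repeat) pairs into each unit's configuration buffer, and that the counter tracks how many times the current configuration has run before the unit advances. The names `ReconfigurableUnit`, `Sequencer`, and `repeat` are hypothetical and do not come from the patent.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReconfigurableUnit:
    """Hypothetical PE or MP holding a configuration buffer and a reconfiguration counter."""
    name: str
    buffer: deque = field(default_factory=deque)  # queued (configuration, repeat) pairs
    counter: int = 0  # times the current configuration has executed so far

    def step(self) -> Optional[str]:
        """Execute the head configuration once; advance after it has run `repeat` times."""
        if not self.buffer:
            return None
        config, repeat = self.buffer[0]
        self.counter += 1
        if self.counter == repeat:  # assumed rule for when the unit reconfigures
            self.buffer.popleft()
            self.counter = 0
        return config


class Sequencer:
    """Distributes configurations into the buffer of every coupled unit."""

    def __init__(self, units: list[ReconfigurableUnit]) -> None:
        self.units = units

    def distribute(self, configs: dict[str, list[tuple[str, int]]]) -> None:
        for unit in self.units:
            unit.buffer.extend(configs.get(unit.name, []))


pe, mp = ReconfigurableUnit("pe0"), ReconfigurableUnit("mp0")
Sequencer([pe, mp]).distribute({"pe0": [("add", 2), ("mul", 1)], "mp0": [("load", 3)]})
print([pe.step() for _ in range(4)])  # ['add', 'add', 'mul', None]
```

Because the counter is local to each unit, PEs and MPs can step through their buffers independently once the sequencer has filled them.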
Shared memory access for a reconfigurable parallel processor with a plurality of chained memory ports
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) each having a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a common area in the memory unit.
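The following is an illustration only. It assumes an MP's address calculation unit receives a base address for the common area, a per-thread stride, the thread count, and the area size, all of which are hypothetical parameters. Under those assumptions, it could emit one address per thread so that every thread of a PE lands inside the same shared region.

```python
def shared_addresses(base: int, stride: int, num_threads: int, region_size: int) -> list:
    """Generate one address per thread inside a common (shared) memory area.

    Hypothetical scheme: thread t accesses base + t * stride, and every
    address must fall within [base, base + region_size).
    """
    addrs = [base + t * stride for t in range(num_threads)]
    if any(a >= base + region_size for a in addrs):
        raise ValueError("thread addresses exceed the common area")
    return addrs


print(shared_addresses(base=0x1000, stride=4, num_threads=8, region_size=64))
# [4096, 4100, 4104, 4108, 4112, 4116, 4120, 4124]
```

With a stride of 0, every thread reads the same word, which is one way a broadcast from the common area could be expressed.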
Private memory access for a reconfigurable parallel processor using a plurality of chained memory ports
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each PE may have a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a different memory bank in the memory unit.
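The sketch below assumes a word-interleaved memory unit with `num_banks` banks, where global word address `a` lives in bank `a % num_banks`. This layout and the function names are hypothetical. Thread `t` is steered to bank `t`, so threads that use the same private offset never collide on a bank.

```python
def private_addresses(offset: int, num_threads: int, num_banks: int) -> list:
    """Map each thread's private word `offset` to a global address in its own bank.

    Assumes word-interleaved banks: thread t uses bank t, so its global
    address is offset * num_banks + t.
    """
    if num_threads > num_banks:
        raise ValueError("each thread needs its own bank")
    return [offset * num_banks + t for t in range(num_threads)]


def bank_of(addr: int, num_banks: int) -> int:
    """Bank that holds a global word address under word interleaving."""
    return addr % num_banks


addrs = private_addresses(offset=5, num_threads=4, num_banks=8)
print(addrs)                           # [40, 41, 42, 43]
print([bank_of(a, 8) for a in addrs])  # [0, 1, 2, 3]
```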
Reconfigurable Parallel Processing
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
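This sketch shows the data flow described above in a deliberately simplified form. It assumes each PE configuration is a single element-wise function executed by all PEs, and that the gasket memory simply holds the previous configuration's results so they become the next configuration's inputs. Both assumptions are illustrative, not taken from the patent.

```python
from typing import Callable, List

Config = Callable[[int], int]  # hypothetical: one PE configuration = one element-wise op


def run_configurations(inputs: List[int], configs: List[Config]) -> List[int]:
    """Sequence PE configurations, passing results through a gasket memory."""
    gasket: List[int] = list(inputs)  # gasket memory seeded with the initial data
    for config in configs:  # sequencer issues one configuration at a time
        # each PE applies the current configuration to its element, and the
        # results are stored in the gasket memory for the next configuration
        gasket = [config(x) for x in gasket]
    return gasket


print(run_configurations([1, 2, 3], [lambda x: x + 1, lambda x: x * 10]))
# [20, 30, 40]
```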
Apparatus and Method for Per-Virtual Machine Concurrent Performance Monitoring
Apparatus and method for concurrent performance monitoring. For example, one embodiment of an apparatus comprises: compute hardware logic comprising parallel execution resources to concurrently execute a number of workloads; virtualization hardware logic to allocate the parallel execution resources between a number of virtual machines, each virtual machine to execute a workload on its allocated portion of the execution resources concurrently with workloads executed by one or more other virtual machines executed on corresponding other allocated portions of the execution resources; and programmable performance monitoring circuitry to be dynamically partitioned based on the number of virtual machines and the portion of the execution resources allocated to each virtual machine, the programmable performance monitoring circuitry to differentiate between performance monitoring data of different virtual machines based on one or more unique identifiers associated with each of the allocated portions of execution resources.
Apparatus and Method for Concurrent Performance Monitoring per Compute Hardware Context
Apparatus and method for concurrent performance monitoring. For example, one embodiment of an apparatus comprises: compute hardware logic to concurrently process a number of workloads, the compute hardware logic to be subdivided into a plurality of compute hardware contexts based on the number of workloads; and programmable performance monitoring circuitry to be dynamically partitioned to perform parallel performance monitoring operations to monitor performance of each of the plurality of compute hardware contexts while the workloads are concurrently processed, the programmable performance monitoring circuitry to differentiate between performance monitoring data of different compute hardware contexts based on a unique identifier associated with each of the compute hardware contexts.
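This is a minimal sketch of differentiating monitoring data by context. It assumes each monitored event arrives tagged with the identifier of the compute hardware context that produced it. The event names and the `aggregate_events` helper are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def aggregate_events(events: list[tuple[int, str]]) -> dict[int, Counter]:
    """Accumulate event counts separately for each compute hardware context.

    Each event is (context_id, event_name); the context_id is the unique
    identifier that keeps one context's data apart from another's.
    """
    per_context: dict[int, Counter] = defaultdict(Counter)
    for ctx_id, event in events:
        per_context[ctx_id][event] += 1
    return dict(per_context)


events = [(0, "cache_miss"), (1, "instr"), (0, "instr"), (0, "cache_miss")]
print(aggregate_events(events))
# {0: Counter({'cache_miss': 2, 'instr': 1}), 1: Counter({'instr': 1})}
```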
Reconfigurable parallel processing
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
Automatic system service resource management for virtualizing low-latency workloads that are input/output intensive
An approach for providing low-latency and fast response times required by virtual machine (VM) telecommunication (telco) workloads is disclosed. In one embodiment, a kernel scheduler identifies hypervisor system contexts that are outside of traditional virtual central processing unit (CPU) contexts and servicing telco workloads. The scheduler grants each such context exclusive affinity to a respective physical CPU, up to a configurable maximum number of such grants. The scheduler also increases the CPU reservations in a system resource pool for the hypervisor by one CPU, indicating that there is one fewer physical CPU on which VMs may be placed or moved. The exclusive affinities and CPU reservations may be adjusted dynamically as, e.g., new latency-sensitive VMs are powered on whose virtual CPUs need exclusive affinity to physical CPUs, or as contexts no longer relevant to telco workloads are demoted so that they no longer hold exclusive affinity to physical CPUs.
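The grant-and-reserve bookkeeping described above can be sketched as a simplified model. This is not the disclosed scheduler. The class name, `max_grants`, the context names, and the choice of the lowest-numbered free physical CPU are all assumptions.

```python
from __future__ import annotations

from typing import Optional


class ExclusiveAffinityManager:
    """Toy model of granting hypervisor contexts exclusive physical CPUs."""

    def __init__(self, num_pcpus: int, max_grants: int) -> None:
        self.free_pcpus = set(range(num_pcpus))
        self.max_grants = max_grants  # configurable cap on exclusive grants
        self.grants: dict[str, int] = {}  # context -> exclusively owned pCPU
        self.reserved_cpus = 0  # CPU reservation in the hypervisor's resource pool

    def grant(self, context: str) -> Optional[int]:
        """Give `context` its own pCPU, up to the configured maximum."""
        if context in self.grants:
            return self.grants[context]
        if len(self.grants) >= self.max_grants or not self.free_pcpus:
            return None
        pcpu = min(self.free_pcpus)
        self.free_pcpus.remove(pcpu)
        self.grants[context] = pcpu
        self.reserved_cpus += 1  # one fewer pCPU on which VMs may be placed
        return pcpu

    def demote(self, context: str) -> None:
        """Drop exclusive affinity for a context no longer serving telco work."""
        pcpu = self.grants.pop(context, None)
        if pcpu is not None:
            self.free_pcpus.add(pcpu)
            self.reserved_cpus -= 1

    def placeable_cpus(self) -> int:
        """Physical CPUs still available for placing or moving VMs."""
        return len(self.free_pcpus)


mgr = ExclusiveAffinityManager(num_pcpus=8, max_grants=2)
print(mgr.grant("ctx-a"), mgr.grant("ctx-b"), mgr.grant("ctx-c"))  # 0 1 None
print(mgr.reserved_cpus, mgr.placeable_cpus())  # 2 6
```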
Method and apparatus for a configurable hardware accelerator
A method and apparatus for a process accelerator (PA) using configurable hardware accelerators are provided. The PA can include a plurality of processing elements (PEs). The PEs of the PA can be used to accelerate a process and/or one or more threads. PEs can include PE local memory, which can reduce energy consumption because of its close physical proximity to the PE. The PEs can be daisy-chain connected, or a DMA mode can be used to write the result of one PE directly into the PE local memory of another PE for further processing.
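This minimal sketch of the daisy-chain arrangement assumes each PE applies a single stage function to its local memory and writes the results directly into the next PE's local memory, as a DMA-style transfer would. The `PE` class and the stage functions are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PE:
    stage: Callable[[int], int]  # hypothetical per-PE operation
    local_memory: List[int] = field(default_factory=list)
    next_pe: Optional[PE] = None

    def run(self) -> List[int]:
        """Process local memory, then forward results to the next PE's local memory."""
        results = [self.stage(x) for x in self.local_memory]
        self.local_memory.clear()
        if self.next_pe is not None:
            self.next_pe.local_memory.extend(results)  # direct write into the next PE
            return self.next_pe.run()
        return results


last = PE(stage=lambda x: x - 1)
first = PE(stage=lambda x: x * 2, local_memory=[1, 2, 3], next_pe=last)
print(first.run())  # [1, 3, 5]
```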
Processor chip, dongle device, and operation method
A processor chip includes a logic circuit. The logic circuit is configured to be coupled to an electronic device. A configuration of the logic circuit corresponds to a plurality of candidate configurations. The configuration of the logic circuit is switched among the candidate configurations, and the electronic device associates with the processor chip to implement a function corresponding to the configuration of the logic circuit. When the configuration of the logic circuit is a first configuration and the electronic device executes a first driver program, the function is a first network-connection function. When the configuration of the logic circuit is a second configuration and the electronic device executes a second driver program, the function is a second network-connection function different from the first network-connection function.
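The pairing rule in the abstract can be modeled as a lookup keyed on the (logic-circuit configuration, driver program) pair. The configuration, driver, and function names below are placeholders, because the abstract does not name them.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical mapping: (logic-circuit configuration, driver program) -> function
FUNCTIONS = {
    ("config_1", "driver_1"): "network_function_1",
    ("config_2", "driver_2"): "network_function_2",
}


class ProcessorChip:
    def __init__(self, candidates: list[str]) -> None:
        self.candidates = candidates
        self.configuration = candidates[0]

    def switch(self, configuration: str) -> None:
        """Switch the logic circuit among its candidate configurations."""
        if configuration not in self.candidates:
            raise ValueError(f"unknown configuration: {configuration}")
        self.configuration = configuration

    def function_with(self, driver: str) -> Optional[str]:
        """Function realized when the device runs `driver` with the current configuration."""
        return FUNCTIONS.get((self.configuration, driver))


chip = ProcessorChip(["config_1", "config_2"])
print(chip.function_with("driver_1"))  # network_function_1
chip.switch("config_2")
print(chip.function_with("driver_2"))  # network_function_2
```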