Patent classifications
G06F9/4862
GENERATING HARDWARE PROFILING INFORMATION FOR MULTI-THREADED ACCELERATORS
Processors executing machine learning workloads run many data movement tasks in parallel. Collecting fine-grained performance information about the hardware execution of these tasks, without the extraction itself affecting workload performance, is not trivial. To address this challenge, hardware profiling circuitry is integrated within the data movement engine to generate accurate task-level hardware profiling information, including timestamps, stall counts, a byte count, and cycle counts for individual data movement tasks. When the data movement engine executes a post action for a data movement task, a log data entry for that task can be written to memory at an address that corresponds to the data movement task and the context. Derived metrics, such as effective bandwidth, can be computed from the log data entries, facilitating diagnostics and system-level tuning.
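As an illustration only, the per-task log entry and one derived metric might be modeled in software as follows. This is a minimal sketch and not the circuitry described in the abstract. The field names, the 32-byte entry size, the address layout indexed by context and task, and the reading of the stall count as a number of stalled cycles are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed size of one log entry in memory; the abstract does not specify it.
ENTRY_SIZE_BYTES = 32


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical software view of one task-level log data entry."""
    start_timestamp: int  # timestamp when the task started
    end_timestamp: int    # timestamp when the post action executed
    stall_count: int      # assumed here to count stalled cycles
    byte_count: int       # bytes moved by the task
    cycle_count: int      # cycles spent on the task


def entry_address(base: int, context_id: int, task_id: int,
                  tasks_per_context: int) -> int:
    """Map a (context, task) pair to its own log slot (assumed layout)."""
    slot = context_id * tasks_per_context + task_id
    return base + slot * ENTRY_SIZE_BYTES


def effective_bandwidth(entry: LogEntry, clock_hz: int) -> float:
    """Bytes per second actually achieved by the task."""
    if entry.cycle_count == 0:
        return 0.0
    return entry.byte_count * clock_hz / entry.cycle_count


def stall_fraction(entry: LogEntry) -> float:
    """Share of the task's cycles spent stalled, under the assumption above."""
    if entry.cycle_count == 0:
        return 0.0
    return entry.stall_count / entry.cycle_count


if __name__ == "__main__":
    entry = LogEntry(start_timestamp=100, end_timestamp=1100,
                     stall_count=250, byte_count=4096, cycle_count=1000)
    print(hex(entry_address(0x1000, context_id=2, task_id=3,
                            tasks_per_context=8)))
    print(effective_bandwidth(entry, clock_hz=1_000_000_000))
    print(stall_fraction(entry))
```

Giving each (context, task) pair a fixed slot means entries from tasks running in parallel never overwrite one another, so they can be read back after the workload completes.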
Dynamic provisioning of portions of a data processing array for spatial and temporal sharing
Dynamic provisioning of portions of a data processing array includes receiving, from an executing application, a context request. The context request specifies a requested task to be performed by a data processing array. A configuration for the data processing array is selected from a plurality of configurations for the data processing array. The selected configuration conforms with the context request and is capable of performing the requested task. A determination is made whether the selected configuration is implementable in the data processing array based, at least in part, on a space requirement of the selected configuration and a current status of the data processing array. The selected configuration is selectively implemented in the data processing array based on the determination.
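The select, check, and implement sequence can be sketched as below. This is a hedged illustration, not the claimed method. The `Configuration` and `ArrayStatus` types, the measurement of space in array columns, and the tie-break that prefers the smallest matching configuration are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Configuration:
    """Hypothetical array configuration."""
    name: str
    tasks: frozenset      # tasks this configuration can perform
    columns_needed: int   # assumed space requirement, in array columns


@dataclass
class ArrayStatus:
    """Hypothetical current status of the data processing array."""
    total_columns: int
    columns_in_use: int = 0

    @property
    def free_columns(self) -> int:
        return self.total_columns - self.columns_in_use


def select_configuration(task: str,
                         configs: Sequence[Configuration]
                         ) -> Optional[Configuration]:
    """Pick a configuration able to perform the requested task.

    Preferring the smallest footprint is an assumption of this sketch.
    """
    candidates = [c for c in configs if task in c.tasks]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.columns_needed)


def is_implementable(config: Configuration, status: ArrayStatus) -> bool:
    """Compare the space requirement with the array's current status."""
    return config.columns_needed <= status.free_columns


def provision(task: str, configs: Sequence[Configuration],
              status: ArrayStatus) -> Optional[Configuration]:
    """Implement the selected configuration only if it currently fits."""
    config = select_configuration(task, configs)
    if config is None or not is_implementable(config, status):
        return None
    status.columns_in_use += config.columns_needed
    return config


if __name__ == "__main__":
    configs = [
        Configuration("conv_wide", frozenset({"conv"}), 4),
        Configuration("conv_narrow", frozenset({"conv"}), 2),
        Configuration("gemm", frozenset({"gemm"}), 3),
    ]
    status = ArrayStatus(total_columns=4)
    print(provision("conv", configs, status))  # fits in the empty array
    print(provision("gemm", configs, status))  # does not fit in what remains
```

Returning `None` when the configuration does not fit leaves the policy to the caller, which could queue the request, retry later, or try another configuration.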