Patent classifications
G06F15/7875
Reconfigurable parallel processing with a temporary data storage coupled to a plurality of processing elements (PES) to store a PE execution result to be used by a PE during a next PE configuration
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
ENABLING LATE-BINDING OF SECURITY FEATURES VIA CONFIGURATION SECURITY CONTROLLER FOR ACCELERATOR DEVICES
An apparatus to facilitate enabling late-binding of security features via configuration security controller for accelerator devices is disclosed. The apparatus includes a security controller to initialize as part of a secure boot and attestation chain of trust; receive configuration data for portions of the security controller, the portions comprising components of the security controller capable of re-programming; verify and validate the configuration data to as originating from a secure and trusted source; and responsive to successful verification and validation of the configuration data, re-program the portions of the security controller based on the configuration data.
Technology For Dynamically Tuning Processor Features
A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state. In response to the usefulness state denoting the confirmed bad state, the DTU circuitry automatically disables the microarchitectural feature for the selected address for execution windows after the second execution window. Other embodiments are described and claimed.
TRANSPARENT NETWORK ACCESS CONTROL FOR SPATIAL ACCELERATOR DEVICE MULTI-TENANCY
An apparatus to facilitate transparent network access controls for spatial accelerator device multi-tenancy is disclosed. The apparatus includes a secure device manager (SDM) to: establish a network-on-chip (NoC) communication path in the apparatus, the NoC communication path comprising a plurality of NoC nodes for ingress and egress of communications on the NoC communication path; for each NoC node of the NoC communication path, configure a programmable register of the NoC node to indicate a node group that the NoC node is assigned, the node group corresponding to a persona configured on the apparatus; determine whether a prefix of received data at the NoC node matches the node group indicated by the programmable register of the NoC; and responsive to determining that the prefix does not match the node group, discard the data from the NoC node.
ENABLING SECURE COMMUNICATION VIA ATTESTATION OF MULTI-TENANT CONFIGURATION ON ACCELERATOR DEVICES
An apparatus to facilitate enabling secure communication via attestation of multi-tenant configuration on accelerator devices is disclosed. The apparatus includes a processor to: verify a base bitstream of an accelerator device, the base bitstream published by a cloud service provider (CSP); verify partial reconfiguration (PR) boundary setups and PR isolation of an accelerator device, the PR boundary setups and PR isolation published by the CSP; generate PR bitstream to fit within at least one PR region of the PR boundary setups of the accelerator device; inspect accelerator device attestation received from a secure device manager (SDM) of the accelerator device; and responsive to successful inspection of the accelerator device attestation, provide the PR bitstream to the CSP for PR reconfiguration of the accelerator device.
ENABLING SECURE STATE-CLEAN DURING CONFIGURATION OF PARTIAL RECONFIGURATION BITSTREAMS ON FPGA
An apparatus to facilitate enabling secure state-clean during configuration of partial reconfiguration bitstreams on accelerator devices is disclosed. The apparatus includes a security engine to receive an incoming partial reconfiguration (PR) bitstream corresponding to a new PR persona to configure a region of the apparatus; perform, as part of a PR configuration sequence for the new PR persona, a first clear operation to clear previously-set persona configuration bits in the region; perform, as part of the PR configuration sequence subsequent to the first clear operation, a set operation to set new persona configuration bits in the region; and perform, as part of the PR configuration sequence, a second clear operation to clear memory blocks of the region that became unfrozen subsequent to the set operation, the second clear operation performed using a persona-dependent mask corresponding to the new PR persona.
SCALABLE RUNTIME VALIDATION FOR ON-DEVICE DESIGN RULE CHECKS
An apparatus to facilitate scalable runtime validation for on-device design rule checks is disclosed. The apparatus includes a memory to store a contention set, one or more multiplexors, and a validator communicably coupled to the memory. In one implementation, the validator is to: receive design rule information for the one or more multiplexers, the design rule information referencing the contention set; analyze, using the design rule information, a user bitstream against the contention set at a programming time of the apparatus, the user bitstream for programming the one or more multiplexors; and provide an error indication responsive to identifying a match between the user bitstream and the contention set.
ENABLING STATELESS ACCELERATOR DESIGNS SHARED ACROSS MUTUALLY-DISTRUSTFUL TENANTS
An apparatus to facilitate enabling stateless accelerator designs shared across mutually-distrustful tenants is disclosed. The apparatus includes a fully-homomorphic encryption (FHE)-capable compute kernel. The FHE-capable compute kernel is to establish a secure session with a trusted environment executing on a host device communicably coupled to the apparatus; generate, as part of establishing the secure session, per-tenant FHE keys for each tenant utilizing the FHE-capable compute kernel, the per-tenant FHE keys utilized to encrypt tenant data provided to the FHE-capable compute kernel; process tenant data that is in an FHE-encrypted format encrypted with a per-tenant FHE key of the per-tenant FHE keys; and store the tenant data that is in the FHE-encrypted format encrypted with the per-tenant FHE key of the per-tenant FHE keys.
BROADCAST REMOTE SEALING FOR SCALABLE TRUSTED EXECUTION ENVIRONMENT PROVISIONING
An apparatus to facilitate broadcast remote sealing for scalable trusted execution environment provisioning is disclosed. The apparatus includes one or more processors to: request a group status report to confirm a status of a group of trusted execution platforms from a cloud service provider (CSP) providing scalable runtime validation for on-device design rule checks; validate, by a tenant, a minimum trusted computing base (TCB) declared with the group status report; determine, based on validation of the minimum TCB, whether a set of group members of the group of trusted execution platforms satisfies security requirements of the tenant; responsive to the set of group members satisfying the security requirement, utilize a group public key to encrypt a workload of the tenant; and send the encrypted workload to the CSP for storage by the CSP and subsequent execution by an execution platform of the group using a private group key.
RUNTIME FAULT DETECTION, FAULT LOCATION, AND CIRCUIT RECOVERY IN AN ACCELERATOR DEVICE
An apparatus to facilitate runtime fault detection, fault location, and circuit recovery in an accelerator device is disclosed. In one implementation, the accelerator device comprises a sensor network comprising a plurality of sensors; a secure device manager (SDM); and a sensor aggregator communicably coupled to the sensor network and the SDM. In one implementation, the sensor aggregator can receive sensor data from the sensor network; analyze the sensor data to detect a fault condition; determine a spatial location of the fault condition based on the sensor data; and generate an event for the SDM to cause the SDM to mitigate the fault condition.