Patent classifications
G06F9/30127
Hardware for supporting OS driven load anticipation based on variable sized load units
A processor scheduling structure, a method and an integrated circuit are provided. In accordance with at least one embodiment, the processor scheduling structure comprises a processor circuit and an operating system task aware caching (OTC) controller circuit coupled to the processor circuit. The OTC controller circuit comprises a load request timer, a load sequence queue (LSQ), and a request arbiter. The timer and the LSQ are coupled to and provide inputs to the request arbiter. The processor circuit comprises an internal memory and a processor core. The OTC controller circuit is configured to schedule processor tasks for the processor circuit in accordance with both priority-based scheduling, using the LSQ, and time-triggered scheduling, using the load request timer.
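The interplay between the load request timer, the LSQ, and the request arbiter can be illustrated with a small software model. This is only a sketch: the abstract does not say how the arbiter resolves a conflict between the two inputs, so the rule used here (a due timer request wins over queued requests) is an assumption, as are the names `RequestArbiter`, `enqueue`, `arm_timer`, and `grant`.

```python
import heapq


class RequestArbiter:
    """Toy model of an arbiter fed by a priority queue (LSQ) and a timer."""

    def __init__(self):
        self._lsq = []    # heap of (priority, seq, unit); lower number = more urgent
        self._timed = []  # heap of (fire_time, seq, unit)
        self._seq = 0     # tie-breaker that preserves insertion order

    def enqueue(self, priority, unit):
        """Priority-based path: place a load request in the LSQ."""
        heapq.heappush(self._lsq, (priority, self._seq, unit))
        self._seq += 1

    def arm_timer(self, fire_time, unit):
        """Time-triggered path: request a load at or after fire_time."""
        heapq.heappush(self._timed, (fire_time, self._seq, unit))
        self._seq += 1

    def grant(self, now):
        """Pick the next load request, or None if nothing is pending.

        Assumption: a timer request that is due takes precedence over the LSQ.
        """
        if self._timed and self._timed[0][0] <= now:
            return heapq.heappop(self._timed)[2]
        if self._lsq:
            return heapq.heappop(self._lsq)[2]
        return None
```

With one timed request armed for time 5 and two queued requests, `grant(0)` returns the more urgent queued request, `grant(5)` returns the timed request, and a later call returns the remaining queued request.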
COMPLIANCE AWARE SERVICE REGISTRY AND LOAD BALANCING
Techniques facilitating compliance aware service registry and load balancing are provided. A system can comprise a memory that stores computer executable components and a processor that executes computer executable components stored in the memory. The computer executable components can comprise an extraction component that removes a service from a load balancer ring based on a first determination by a verification component that a compliance state of the service is a non-compliant state. Further, the computer executable components can comprise an insertion component that adds the service to the load balancer ring based on a second determination by the verification component that, after a defined amount of time, the compliance state of the service is a compliant state.
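The extraction and insertion behavior described above might be modeled as follows. This is a minimal sketch, not the claimed system: the `Ring` class, its round-robin `pick` method, and the `reconcile` helper are hypothetical names, and the verification result is passed in as a plain boolean rather than computed by a verification component.

```python
from dataclasses import dataclass, field


@dataclass
class Ring:
    """Hypothetical load balancer ring: an ordered list of service names."""

    members: list = field(default_factory=list)
    _next: int = 0  # round-robin cursor (an assumption; the abstract names no policy)

    def insert(self, service):
        """Add a service to the ring if it is not already present."""
        if service not in self.members:
            self.members.append(service)

    def extract(self, service):
        """Remove a service from the ring and reset the cursor."""
        if service in self.members:
            self.members.remove(service)
            self._next = 0

    def pick(self):
        """Return the next service in round-robin order."""
        if not self.members:
            raise LookupError("no services in ring")
        service = self.members[self._next % len(self.members)]
        self._next += 1
        return service


def reconcile(ring, service, is_compliant):
    """Apply one verification result to the ring.

    A non-compliant service is extracted; a service found compliant on a
    later check is inserted again.
    """
    if is_compliant:
        ring.insert(service)
    else:
        ring.extract(service)
```

In this model, calling `reconcile(ring, "b", False)` takes service `b` out of rotation, and a later `reconcile(ring, "b", True)`, made after the defined waiting period, puts it back.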
Tightly coupled processor arrays using coarse grained reconfigurable architecture with iteration level commits
An apparatus and method for supporting simultaneous multiple iterations (SMI) and iteration level commits (ILC) in a coarse grained reconfigurable architecture (CGRA). The apparatus includes: hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. The PEs, LSU and control unit are configured to commit instructions, and save and restore context at loop iteration boundaries. In doing so, the apparatus tracks and buffers state of in-flight iterations, and detects conditions that prevent an iteration from completing. In support of ILC functions, the LSU is iteration aware and includes: iteration-interleaved load-store queue (LSQ) banks; a Bloom Filter for filtering instructions; and a load coalescing buffer.
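For readers unfamiliar with the Bloom filter mentioned among the LSU components, the generic data structure can be sketched as below. The abstract does not say what the LSU stores in the filter or how it is sized, so the keys (plain integers standing in for addresses), the bit width, and the hash construction are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit vector stored in a single Python integer

    def _positions(self, key):
        """Derive num_hashes bit positions from the key."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        """Record the key by setting each of its bit positions."""
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(key))
```

A negative answer from `might_contain` is always reliable, which is what makes the structure useful as a cheap first-pass filter before a more expensive exact lookup.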
Compliance aware service registry and load balancing
Techniques facilitating compliance aware service registry and load balancing are provided. A system can comprise a memory that stores computer executable components and a processor that executes computer executable components stored in the memory. The computer executable components can comprise an extraction component that removes a service from a load balancer ring based on a first determination by a verification component that a compliance state of the service is a non-compliant state. Further, the computer executable components can comprise an insertion component that adds the service to the load balancer ring based on a second determination by the verification component that, after a defined amount of time, the compliance state of the service is a compliant state.
Dynamic wave pairing
This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.
Optimize bound information accesses in buffer protection
A method, system and apparatus for providing bound information accesses in buffer protection, including providing one-to-one mapping between a general-purpose register and bound information in a BI (bound information) register, saving loaded bound information in the BI register for future use, providing integrity of the bound information in the BI register that is maintained along program execution, and providing a pro-active load of the bound information with one-bit extra control on load instruction of the BI register.
Data loading
A data loading circuit and method are provided. The circuit is configured to load data for a feature map calculated by a neural network into a calculation circuit, wherein the size of the convolution kernel of the neural network is K*K data, and a window corresponding to the convolution kernel slides with a step size of S in the feature map, where K and S are positive integers and S<K, the circuit comprising: two data loaders comprising a first data loader and a second data loader; and a controller configured to: control the first data loader to be in a data outputting mode and control the second data loader to be in a data reading mode, when the window slides within K consecutive rows of the feature map.
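The condition S<K means that successive K-row bands of the feature map overlap, which the following sketch makes concrete. It models only the window geometry, not the two-loader circuit or its controller; `row_bands` and `shared_rows` are hypothetical helper names, and rows are numbered from 0.

```python
def row_bands(height, k, s):
    """Return (first_row, last_row) for each K-row band the window visits.

    height: number of rows in the feature map
    k: kernel size (the window covers k consecutive rows)
    s: vertical step size, with 0 < s < k
    """
    if not (0 < s < k <= height):
        raise ValueError("require 0 < s < k <= height")
    return [(top, top + k - 1) for top in range(0, height - k + 1, s)]


def shared_rows(k, s):
    """Number of rows that two consecutive bands have in common."""
    return k - s
```

For a 7-row feature map with K=3 and S=2, `row_bands(7, 3, 2)` gives `[(0, 2), (2, 4), (4, 6)]`, and each band shares `shared_rows(3, 2)`, that is one row, with the band before it.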
Asynchronous flush and restore of distributed history buffer
Techniques are disclosed for performing a flush and restore of a history buffer (HB) in a processing unit. One technique includes identifying one or more entries of the HB to restore to a register file in the processing unit. For each of the one or more HB entries, a determination is made whether to send the HB entry to the register file via a first restore bus or via a second restore bus, different from the first restore bus, based on contents of the HB entry. Each of the one or more HB entries is then sent to the register file via one of the first restore bus or the second restore bus, based on the determination.
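The per-entry bus decision can be pictured with the sketch below. The abstract states only that the choice depends on the contents of the HB entry; the specific criterion used here (a `wide` flag on each entry) is invented purely for illustration, as are the `HBEntry` fields and the `route` function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HBEntry:
    """Hypothetical history buffer entry selected for restore."""

    reg: int     # destination register in the register file
    value: int   # data to restore
    wide: bool   # assumed content flag that drives the bus choice


def route(entries):
    """Split entries between two restore buses based on their contents.

    Assumption: entries flagged wide use the second bus and all others use
    the first. The source does not specify the actual criterion.
    """
    first_bus, second_bus = [], []
    for entry in entries:
        (second_bus if entry.wide else first_bus).append(entry)
    return first_bus, second_bus
```

Each entry lands on exactly one bus, and the order of entries within a bus follows the order in which they were identified for restore.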
SYNCHRONIZING MULTIPLE PROCESSING SYSTEMS
Systems and methods are directed to a device comprising a product status synchronization device. Examples of the product status synchronization device comprise a plurality of input ports and an encoder input, each of the input ports coupled to one of a plurality of registers and configured to receive a signal indicative of a status assigned to a product by one of a plurality of process stations, and to register the status assigned to the product into the one of the plurality of registers coupled to the input port, wherein one or more of the plurality of process stations are located at different distances from a predetermined location along a processing line and are configured to asynchronously process one or more products moving through the processing line.
ASYNCHRONOUS FLUSH AND RESTORE OF DISTRIBUTED HISTORY BUFFER
Techniques are disclosed for performing a flush and restore of a history buffer (HB) in a processing unit. One technique includes identifying one or more entries of the HB to restore to a register file in the processing unit. For each of the one or more HB entries, a determination is made whether to send the HB entry to the register file via a first restore bus or via a second restore bus, different from the first restore bus, based on contents of the HB entry. Each of the one or more HB entries is then sent to the register file via one of the first restore bus or the second restore bus, based on the determination.