Patent classifications
G06F9/30079
DYNAMIC LOAD BALANCING OF OPERATIONS FOR REAL-TIME DEEP LEARNING ANALYTICS
Apparatuses, systems, and techniques to balance processing load between a plurality of hardware accelerators. In at least one embodiment, operations performed on batches of frames of a video (e.g., as part of a video analytics pipeline) are distributed by a load balancer between a first hardware accelerator and a second hardware accelerator.
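The abstract does not say how the load balancer chooses an accelerator. One common policy is an earliest-finish-time greedy heuristic, sketched below. The per-frame cost estimates (`ms_per_frame`), the accelerator names, and the assumption that all batches are ready at time zero are illustrative assumptions, not details from the source.

```python
from dataclasses import dataclass, field


@dataclass
class Accelerator:
    name: str
    ms_per_frame: float            # assumed per-frame processing cost estimate
    busy_until_ms: float = 0.0     # time at which already-queued work finishes
    assigned: list = field(default_factory=list)


def balance(batches, accels):
    """Send each (batch_id, num_frames) batch to the accelerator that would finish it first.

    Earliest-finish-time greedy heuristic; ties go to the first accelerator listed.
    """
    for batch_id, num_frames in batches:
        best = min(accels, key=lambda a: a.busy_until_ms + num_frames * a.ms_per_frame)
        best.busy_until_ms += num_frames * best.ms_per_frame
        best.assigned.append(batch_id)
    return {a.name: a.assigned for a in accels}


# Usage: a fast accelerator and a slower one share four 4-frame batches.
gpu = Accelerator("gpu", ms_per_frame=1.0)
dla = Accelerator("dla", ms_per_frame=3.0)
plan = balance([(0, 4), (1, 4), (2, 4), (3, 4)], [gpu, dla])
```

The slower accelerator only receives a batch once the faster one's queue is long enough that offloading finishes sooner.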
Continuous run-time validation of program execution: a practical approach
Trustworthy systems require that code be validated as genuine. Most systems implement this requirement prior to execution by matching a cryptographic hash of the binary file against a reference hash value, leaving the code vulnerable to run-time compromises, such as code injection, return and jump-oriented programming, and illegal linking of the code to compromised library functions. The Run-time Execution Validator (REV) validates, as the program executes, the control flow path and instructions executed along the control flow path. REV uses a signature cache integrated into the processor pipeline to perform live validation of executions, at basic block boundaries, and ensures that changes to the program state are not made by the instructions within a basic block until the control flow path into the basic block and the instructions within the basic block are both validated.
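The commit-after-validation rule can be illustrated in software. In this minimal sketch, a reference table keyed by the control-flow edge `(predecessor, block)` stores a SHA-256 signature of the block's instructions, and a block's buffered writes are committed only if both the edge and the instructions match. The table layout, the hash choice, and the `Validator` class are assumptions for illustration; REV itself uses a hardware signature cache in the pipeline.

```python
import hashlib


def signature(instrs):
    """Hash the instruction bytes of one basic block (SHA-256 is an assumed choice)."""
    return hashlib.sha256(b"".join(instrs)).hexdigest()


class Validator:
    def __init__(self, reference):
        # reference: {(predecessor_addr, block_addr): expected_signature}
        self.reference = reference
        self.committed = {}  # architectural state changes that passed validation

    def execute_block(self, pred, addr, instrs, pending_writes):
        """Commit buffered writes only if the edge into the block and its instructions validate."""
        expected = self.reference.get((pred, addr))
        if expected is None or expected != signature(instrs):
            return False  # unknown edge or tampered instructions: discard the writes
        self.committed.update(pending_writes)
        return True


# Usage: one legitimate edge 0x100 -> 0x200.
ref = {(0x100, 0x200): signature([b"\x01", b"\x02"])}
v = Validator(ref)
```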
High performance merge sort with scalable parallelization and full-throughput reduction
Disclosed herein is a novel multi-way merge network, referred to herein as a Hybrid Comparison Look Ahead Merge (HCLAM), which incurs significantly less resource consumption as scaled to handle larger problems. In addition, a parallelization scheme is disclosed, referred to herein as Parallelization by Radix Pre-sorter (PRaP), which enables an increase in streaming throughput of the merge network. Furthermore, a high-performance reduction scheme is disclosed to achieve full throughput.
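The idea behind radix pre-sorting can be sketched in software: splitting every sorted input run by the top bits of the key yields independent buckets, each of which a separate merge unit could process, and concatenating the merged buckets in radix order gives a fully sorted result. This sketch uses Python's `heapq.merge` as the k-way merger in place of HCLAM, and assumes non-negative integer keys below `2**key_bits`; it is not the hardware design from the source.

```python
import heapq


def radix_partitioned_merge(runs, bits=1, key_bits=8):
    """Merge sorted runs by first partitioning each run on the top `bits` key bits.

    Assumes integer keys in [0, 2**key_bits). Each bucket is merged independently,
    which is what would allow parallel merge networks in hardware.
    """
    shift = key_bits - bits
    # buckets[b][i] holds the elements of run i whose top bits equal b (still sorted).
    buckets = [[[] for _ in runs] for _ in range(1 << bits)]
    for i, run in enumerate(runs):
        for x in run:
            buckets[x >> shift][i].append(x)
    out = []
    for bucket_runs in buckets:  # buckets are in ascending key order
        out.extend(heapq.merge(*bucket_runs))
    return out


runs = [[1, 130, 200], [5, 129, 255], [0, 64]]
merged = radix_partitioned_merge(runs)
```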
Method and apparatus for minimally intrusive instruction pointer-aware processing resource activity profiling
Systems and methods for minimally intrusive instruction pointer-aware processing resource activity profiling are disclosed. In one embodiment, a graphics processor includes a grouping of processing resources and control logic that is associated with the grouping of processing resources. The control logic is configured to sample a state of at least one processing resource of the grouping of processing resources and to determine activity data from the state with the activity data including at least one of stalls and reason counts for stalling activity, instruction types, pipeline utilization, thread utilization, and shader activity.
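The activity data described above can be derived from periodic state samples. This minimal sketch aggregates samples into per-instruction-pointer stall-reason counts, instruction-type counts, and a utilization fraction. The sample format (`ip`, `state`, `reason`, `instr` fields) is a hypothetical representation, not one specified by the source.

```python
from collections import Counter


def summarize(samples):
    """Aggregate sampled processing-resource states into activity data.

    Each sample is a dict with 'ip' and 'state' ('active' or 'stalled'), plus
    'reason' for stalled samples or 'instr' for active ones (assumed format).
    """
    stalls_by_ip = Counter((s["ip"], s["reason"]) for s in samples if s["state"] == "stalled")
    instr_types = Counter(s["instr"] for s in samples if s["state"] == "active")
    utilization = sum(s["state"] == "active" for s in samples) / len(samples)
    return stalls_by_ip, instr_types, utilization


samples = [
    {"ip": 0x10, "state": "active", "instr": "alu"},
    {"ip": 0x14, "state": "stalled", "reason": "memory"},
    {"ip": 0x14, "state": "stalled", "reason": "memory"},
    {"ip": 0x18, "state": "active", "instr": "send"},
]
stalls, types, util = summarize(samples)
```

Keying stalls by instruction pointer lets a profiler attribute stall cycles back to specific shader instructions.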
HARDWARE COHERENCE FOR MEMORY CONTROLLER
A system includes a non-coherent component; a coherent, non-caching component; a coherent, caching component; and a level two (L2) cache subsystem coupled to the non-coherent component, the coherent, non-caching component, and the coherent, caching component. The L2 cache subsystem includes an L2 cache; a shadow level one (L1) main cache; a shadow L1 victim cache; and an L2 controller. The L2 controller is configured to receive and process a first transaction from the non-coherent component; receive and process a second transaction from the coherent, non-caching component; and receive and process a third transaction from the coherent, caching component.
METHODS, SYSTEMS, AND COMPUTER READABLE MEDIA FOR ON-DEMAND, ON-DEVICE COMPILING AND USE OF PROGRAMMABLE PIPELINE DEVICE PROFILES
A method for on-demand, on-device compiling and use of programmable pipeline device profiles includes storing, on a network test or visibility device, programmable pipeline device source code and a plurality of different programmable pipeline device profile definitions containing parameters for implementing different programmable pipeline device profile variations. The method further includes implementing, on the network test or visibility device, a compiler that receives the programmable pipeline device source code and one of the profile definitions as input and that produces as output a programmable pipeline device profile including compiled object code for configuring a programmable pipeline device to implement a network test or network visibility function. The method further includes invoking the compiler to compile, using one of the profile definitions, the programmable pipeline device source code into a programmable pipeline device profile for implementing a network test or visibility function and loading the profile on the network test or visibility device to configure the programmable pipeline device for implementing the network test or network visibility function.
Merging data for write allocate
A method includes receiving, by a level two (L2) controller, a write request for an address that is not allocated as a cache line in an L2 cache. The write request specifies write data. The method also includes generating, by the L2 controller, a read request for the address; reserving, by the L2 controller, an entry in a register file for read data returned in response to the read request; updating, by the L2 controller, a data field of the entry with the write data; updating, by the L2 controller, an enable field of the entry associated with the write data; and receiving, by the L2 controller, the read data and merging the read data into the data field of the entry.
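The merge step amounts to a byte-enable mask: bytes already supplied by the write keep their values, and only the remaining bytes are filled from the returned read data. The sketch below assumes an 8-byte line and per-byte enable flags purely for illustration; the class and method names are hypothetical.

```python
LINE_BYTES = 8  # assumed line size for illustration


class RegisterFileEntry:
    """One reserved entry: a data field plus a per-byte enable field."""

    def __init__(self, addr):
        self.addr = addr
        self.data = bytearray(LINE_BYTES)
        self.enable = [False] * LINE_BYTES

    def apply_write(self, offset, write_data):
        """Store write data and mark those bytes as enabled (owned by the write)."""
        for i, b in enumerate(write_data):
            self.data[offset + i] = b
            self.enable[offset + i] = True

    def merge_read(self, read_data):
        """Fill only the bytes not covered by the write with the returned read data."""
        for i in range(LINE_BYTES):
            if not self.enable[i]:
                self.data[i] = read_data[i]
        return bytes(self.data)


entry = RegisterFileEntry(0x40)
entry.apply_write(2, b"\xaa\xbb")          # write arrives before the read data
merged = entry.merge_read(bytes(range(8)))  # read data returns later
```

Because the write is recorded first, the later read response cannot overwrite newer data.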
Managing out-of-order retirement of instructions
Retiring instructions out-of-order includes: receiving processor instructions comprising two or more and fewer than all processor instructions generated based on a program, where the processor instructions include a first instruction and a second instruction such that the first instruction precedes the second instruction in a program order of the program; receiving a start instruction that immediately precedes the processor instructions and indicates that the processor instructions are to be retired out-of-order; receiving a stop instruction that immediately succeeds the processor instructions and indicates a stop to out-of-order instruction retirement; and, in response to completing execution of the second instruction before completing execution of the first instruction, retiring the second instruction before retiring the first instruction.
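A cycle-level sketch of the retirement rule: outside a start/stop region, instructions retire in program order; inside a region, a completed instruction may retire ahead of an older, still-executing instruction from the same region. Representing the start/stop bracketing as a region id on each instruction, and the completion cycles themselves, are modeling assumptions.

```python
def retirement_order(instrs):
    """Return instruction names in retirement order.

    instrs: list of (name, completion_cycle, region_id_or_None) in program order.
    An instruction retires once complete, unless an older incomplete instruction
    blocks it; only instructions in the blocker's out-of-order region may bypass it.
    """
    pending = list(instrs)
    retired = []
    cycle = 0
    while pending:
        blocked, block_region = False, None
        still_pending = []
        for k, (name, done, region) in enumerate(pending):
            same_region = region is not None and region == block_region
            if done <= cycle and (not blocked or same_region):
                retired.append(name)
            elif done > cycle and not blocked:
                blocked, block_region = True, region  # oldest incomplete instruction
                still_pending.append(pending[k])
            elif done > cycle and same_region:
                still_pending.append(pending[k])  # incomplete, but does not stop the region
            else:
                still_pending.extend(pending[k:])  # stuck behind an older instruction
                break
        pending = still_pending
        cycle += 1
    return retired


# i1 and i2 sit between the start and stop instructions (region "r").
order = retirement_order([("i0", 1, None), ("i1", 5, "r"), ("i2", 2, "r"), ("i3", 3, None)])
```

Here `i2` finishes before `i1` and retires first, while `i3`, which follows the stop instruction, still waits for `i1`.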
PREFETCH MECHANISM FOR A CACHE STRUCTURE
An apparatus and method are provided, the apparatus comprising a processor pipeline to execute instructions; a cache structure to store information for reference by the processor pipeline when executing said instructions; and prefetch circuitry to issue prefetch requests to the cache structure to cause the cache structure to prefetch information into the cache structure in anticipation of a demand request for that information being issued to the cache structure by the processor pipeline. The processor pipeline is arranged to issue a trigger to the prefetch circuitry on detection of a given event that will result in a reduced level of demand requests being issued by the processor pipeline, and the prefetch circuitry is configured to control issuing of prefetch requests in dependence on reception of the trigger.
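The abstract leaves open how the prefetcher reacts to the trigger. One plausible policy, sketched here as an assumption, is to raise the prefetch degree for a window of cycles so the spare cache bandwidth left by fewer demand requests is used for prefetching. The next-line pattern, the degrees, and the window length are all illustrative.

```python
class Prefetcher:
    """Next-line prefetcher whose degree is boosted after a pipeline trigger (assumed policy)."""

    def __init__(self, normal_degree=2, boosted_degree=8):
        self.normal_degree = normal_degree
        self.boosted_degree = boosted_degree
        self.boost_remaining = 0  # cycles left in the boosted window

    def on_trigger(self, window_cycles):
        """Pipeline signals an event that will reduce demand requests for ~window_cycles."""
        self.boost_remaining = window_cycles

    def issue(self, last_demand_line):
        """Return the line addresses to prefetch this cycle."""
        if self.boost_remaining > 0:
            degree = self.boosted_degree
            self.boost_remaining -= 1
        else:
            degree = self.normal_degree
        return [last_demand_line + i for i in range(1, degree + 1)]


p = Prefetcher()
before = p.issue(100)
p.on_trigger(1)
during = p.issue(100)
after = p.issue(100)
```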
METHODS, SYSTEMS, AND APPARATUSES FOR A SCALABLE RESERVATION STATION IMPLEMENTING A SINGLE UNIFIED SPECULATION STATE PROPAGATION AND EXECUTION WAKEUP MATRIX CIRCUIT IN A PROCESSOR
Systems, methods, and apparatuses relating to a scalable reservation station circuit implementing a single unified speculation state propagation and execution wakeup matrix in a processor are described. In one embodiment, a hardware processor core includes a decoder circuit to decode one or more instructions into a first micro-operation to load data from a data cache, a second micro-operation dependent on the first micro-operation, and a third micro-operation dependent on the second micro-operation; an execution circuit to execute the first micro-operation, the second micro-operation, and the third micro-operation; and a reservation station circuit comprising a load speculation tracker circuit and coupled between the decoder circuit and the execution circuit, the load speculation tracker circuit to, for a reservation station entry of the third micro-operation, track progress of the first micro-operation in the data cache to generate a cancellation indication for the third micro-operation in response to a miss of the data in the data cache for the first micro-operation, wherein the load speculation tracker circuit is to begin to track the progress of the first micro-operation in the data cache in response to a dispatch of the first micro-operation into the data cache.
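The reason the third micro-operation can be cancelled directly is that load dependence is propagated through the dependency chain at dispatch. This sketch models that propagation with Python sets: each entry records every load it transitively depends on, so a miss on the first load identifies all affected entries in one lookup. The set-based representation stands in for the hardware matrix and is an assumption, as are the micro-operation names.

```python
def load_dependence(uops):
    """Propagate speculation state: map each uop to the set of loads it transitively depends on.

    uops: list of (name, is_load, source_names) in program order.
    """
    is_load, deps = {}, {}
    for name, load, sources in uops:
        mask = set()
        for s in sources:
            mask |= deps[s]      # inherit the source's load dependences
            if is_load[s]:
                mask.add(s)      # depend directly on a source that is a load
        deps[name] = mask
        is_load[name] = load
    return deps


def cancellations(deps, missed_load):
    """Entries to cancel when `missed_load` misses in the data cache."""
    return {u for u, m in deps.items() if missed_load in m}


deps = load_dependence([
    ("ld", True, []),        # first micro-operation: load from the data cache
    ("add", False, ["ld"]),  # second depends on the first
    ("mul", False, ["add"]), # third depends on the second
    ("sub", False, []),      # independent
])
```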