Patent classifications
G06F9/45525
Techniques for graphics processing unit profiling using binary instrumentation
Techniques and apparatus for profiling graphics processing unit (GPU) processes using binary instrumentation are described. In one embodiment, for example, an apparatus may include at least one memory comprising instructions and a processor coupled to the at least one memory. The processor may execute the instructions to implement a profiling process to profile a graphics processing unit (GPU) application being executed via a GPU, the profiling process to perform an instrumentation phase to determine an operating process being executed via the GPU and to generate instrumented binary code for the operating process, perform an execution phase to collect profiling data for a command of the operating process, and perform a completion phase for a profiling application executed via the processor to read the profiling data. Other embodiments are described.
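Below is a minimal Python sketch of the three-phase flow this abstract describes (instrumentation, execution, completion). Every name in it (GpuProcess, Profiler, the timing probes) is invented for illustration; it is not the patented apparatus or any real GPU instrumentation API.

```python
# Illustrative sketch only: a toy three-phase profiler mirroring the abstract's
# instrumentation/execution/completion flow. All names here are hypothetical,
# not the patented apparatus or any real GPU API.
import time
from collections import defaultdict

class GpuProcess:
    """Stand-in for an operating process executing on the GPU."""
    def __init__(self, name, commands):
        self.name = name
        self.commands = commands  # list of (command_name, fn)

class Profiler:
    def __init__(self):
        self.records = defaultdict(list)

    def instrumentation_phase(self, process):
        """Wrap each command of the process with timing probes
        (stands in for generating instrumented binary code)."""
        instrumented = []
        for cmd_name, fn in process.commands:
            def wrapped(fn=fn, cmd_name=cmd_name):
                start = time.perf_counter()
                result = fn()
                self.records[cmd_name].append(time.perf_counter() - start)
                return result
            instrumented.append((cmd_name, wrapped))
        process.commands = instrumented

    def execution_phase(self, process):
        """Run the instrumented commands; profiling data accumulates per command."""
        for _, fn in process.commands:
            fn()

    def completion_phase(self):
        """Let the host-side profiling application read the collected data."""
        return {cmd: sum(ts) / len(ts) for cmd, ts in self.records.items()}

proc = GpuProcess("kernel", [("fill", lambda: [0] * 10000),
                             ("sum", lambda: sum(range(10000)))])
prof = Profiler()
prof.instrumentation_phase(proc)
prof.execution_phase(proc)
print(prof.completion_phase())
```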
Technologies for translation cache management in binary translation systems
Technologies for optimized binary translation include a computing device that determines a cost-benefit metric associated with each translated code block of a translation cache. The cost-benefit metric is indicative of the translation cost and performance benefit associated with the translated code block. The translation cost may be determined by measuring the translation time of the translated code block, and the cost-benefit metric may be calculated using a weighted cost-benefit function based on an expected workload of the computing device. In response to determining to free space in the translation cache, the computing device determines whether to discard each translated code block as a function of the cost-benefit metric; for example, it may increment an iteration count and skip a translated code block if the iteration count modulo that block's cost-benefit metric is non-zero. Other embodiments are described and claimed.
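The modulo-based skip rule is concrete enough to sketch. The following toy cache scores each block by benefit per unit of translation cost and, when space is needed, keeps a block for as long as its pass count modulo its metric is non-zero; the scoring formula and all names are assumptions, not the claimed method.

```python
# Minimal sketch, not the patented method: a translation cache that evicts
# low cost-benefit blocks first via the iteration-count-modulo-metric test.
class TranslatedBlock:
    def __init__(self, addr, translation_time, cycles_saved, size):
        self.addr = addr
        self.size = size
        # Higher metric = more worth keeping (benefit per unit of re-translation cost).
        self.metric = max(1, round(cycles_saved / translation_time))
        self.iterations = 0  # eviction passes survived

class TranslationCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}

    def used(self):
        return sum(b.size for b in self.blocks.values())

    def free_space(self, needed):
        """Evict blocks until `needed` bytes fit; a block is skipped (kept)
        while its pass count modulo its cost-benefit metric is non-zero."""
        while self.used() + needed > self.capacity and self.blocks:
            for addr, b in list(self.blocks.items()):
                b.iterations += 1
                if b.iterations % b.metric != 0:
                    continue  # valuable block: survives this pass
                del self.blocks[addr]
                if self.used() + needed <= self.capacity:
                    return

cache = TranslationCache(capacity=100)
cache.blocks = {i: TranslatedBlock(i, translation_time=1.0, cycles_saved=s, size=25)
                for i, s in enumerate([2, 8, 3, 9])}
cache.free_space(needed=50)
print(sorted(cache.blocks))  # [1, 3]: the high-benefit blocks survive longer
```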
KERNEL FUSION FOR MACHINE LEARNING
Apparatuses, systems, and techniques are presented to compile code. In at least one embodiment, one or more compilers compile one or more already-compiled portions of code together with one or more intermediate representations of one or more other portions of code.
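The abstract is deliberately broad, but the title names kernel fusion, which the toy example below illustrates: two elementwise "kernels" are fused into one loop so the intermediate array is never materialized. Real ML compilers perform this on an intermediate representation, not on Python functions.

```python
# Toy illustration of kernel fusion, not the patented compiler technique.
def scale(xs, a):          # kernel 1: y = a * x
    return [a * x for x in xs]

def add_bias(xs, b):       # kernel 2: y = x + b
    return [x + b for x in xs]

def fused_scale_add_bias(xs, a, b):
    # One traversal, no temporary list between the two ops.
    return [a * x + b for x in xs]

data = [1.0, 2.0, 3.0]
assert add_bias(scale(data, 2.0), 0.5) == fused_scale_add_bias(data, 2.0, 0.5)
print(fused_scale_add_bias(data, 2.0, 0.5))
```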
COMPUTER ARCHITECTURE BASED ON PROGRAM/WORKLOAD PROFILING
Disclosed herein are system, method, and computer program product embodiments for determining an appropriate FPGA for a particular computer program. An embodiment operates by a central processing unit's counter identifying a plurality of workload properties in processing a computer program, wherein the central processing unit is part of a first computer architecture. The central processing unit then sends the workload properties to a controller trained to identify a field-programmable gate array (FPGA) module based on the plurality of workload properties. The central processing unit thereafter receives a recommended FPGA module from the controller and implements the recommended FPGA module in a second computer architecture for processing the computer program, whereby the second computer architecture is able to execute the computer program more efficiently than the first computer architecture.
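A hedged sketch of the recommendation step: a stand-in "controller" maps CPU counter readings (workload properties) to an FPGA module by nearest-neighbor matching against reference profiles. The property names and module names are invented; the patent's controller could be any trained model.

```python
# Invented reference data: (branch_miss_rate, mem_bound_fraction, fp_intensity)
# mapped to a hypothetical FPGA module name.
WORKLOAD_PROFILES = {
    (0.02, 0.1, 0.9): "dense-matrix-accelerator",
    (0.15, 0.7, 0.1): "streaming-memory-engine",
    (0.08, 0.3, 0.5): "general-dataflow-fabric",
}

def recommend_fpga(properties):
    """Return the FPGA module whose reference profile is closest (squared L2)."""
    def dist(ref):
        return sum((p - r) ** 2 for p, r in zip(properties, ref))
    return WORKLOAD_PROFILES[min(WORKLOAD_PROFILES, key=dist)]

# Counters observed while the first architecture runs the program:
observed = (0.03, 0.15, 0.8)
print(recommend_fpga(observed))  # -> dense-matrix-accelerator
```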
Branch optimization during loading
The present disclosure provides a method, computer system and computer program product for branch optimization. According to the method, execution possibilities of instruction blocks corresponding to at least one branch in a program can be determined. Then, the instruction blocks can be loaded according to the execution possibilities.
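As a sketch of the core idea, assuming per-block profile data already exists, a loader can simply order instruction blocks by their execution possibilities so the likely-taken path is loaded first. Block names and probabilities below are invented for illustration.

```python
# Minimal sketch under assumed profile data, not the claimed method.
def load_order(blocks):
    """blocks: {block_name: execution_possibility in [0, 1]}
    Returns block names, most likely to execute first."""
    return sorted(blocks, key=blocks.get, reverse=True)

branch_blocks = {
    "if_taken_body": 0.92,    # hot path of the branch
    "else_body": 0.08,        # cold path, loaded last (or lazily)
    "loop_preheader": 0.99,
}
for name in load_order(branch_blocks):
    print("loading", name)
```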
SNAPSHOT FOR COMPILER OPTIMIZATION
A method performed by a computing system includes, with a compiler, compiling source code to create an application for execution. The method further includes, after the compiling has started, recording a snapshot of compilation configurations, the compilation configurations including information obtained by the compiler during the compiling. The method further includes storing the snapshot in a predefined format and, after storing the snapshot, loading the snapshot by configuring the compiler based on the compilation configurations in the snapshot.
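The record/store/reload cycle can be sketched in a few lines. The Compiler class, its configuration fields, and the use of JSON as the "predefined format" are all assumptions made for illustration.

```python
# Sketch only: snapshot compilation configurations mid-compile, persist them,
# and configure a fresh compiler instance from the stored snapshot.
import json, os, tempfile

class Compiler:
    def __init__(self):
        self.config = {}

    def compile(self, source):
        # Information the compiler obtains while compiling:
        self.config["opt_level"] = 2
        self.config["inlined"] = [f for f in ("helper", "hot_loop") if f in source]
        return f"<binary from {len(source)} source bytes>"

    def save_snapshot(self, path):
        with open(path, "w") as f:
            json.dump(self.config, f)      # the predefined format

    def load_snapshot(self, path):
        with open(path) as f:
            self.config = json.load(f)     # configure compiler from snapshot

snap = os.path.join(tempfile.gettempdir(), "compile_snap.json")
c1 = Compiler()
c1.compile("def hot_loop(): pass")
c1.save_snapshot(snap)

c2 = Compiler()
c2.load_snapshot(snap)                     # reuse what c1 learned
print(c2.config)
```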
Runtime GPU/CPU selection
A method, computer program product, and system include a processor(s) obtaining, during runtime, from a compiler, two versions of a data parallel loop for an operation. The host computing system includes a CPU, and a GPU is accessible to the host. The processor(s) online profiles the two versions by asynchronously executing the first version, in a profile mode, with the GPU and executing the second version, in the profile mode, with the CPU. The processor(s) generates execution times for the first version and the second version. The processor(s) stores the execution times and performance data in a storage, where the performance data comprises a size of the data parallel loop for the operation. The processor(s) updates a regression model(s) to predict performance numbers for a process of an unknown loop size. The processor(s) executes the operation with the CPU or the GPU based on the performance data.
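A hedged sketch of that selection loop: profile both versions, store (loop size, time) observations, fit a least-squares line per device, and pick the predicted-faster device for new sizes. The timings and "devices" are simulated; real code would launch actual CPU/GPU kernels asynchronously.

```python
# Simulated profiling data and a simple linear regression per device.
def fit_line(points):
    """Ordinary least squares for y = a*x + b over [(x, y), ...]."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

# Stored profiling data: (loop size, measured execution time).
cpu_obs = [(1_000, 0.9), (10_000, 9.1), (50_000, 45.0)]   # ~linear in size
gpu_obs = [(1_000, 5.0), (10_000, 5.4), (50_000, 7.0)]    # launch overhead dominates

cpu_model, gpu_model = fit_line(cpu_obs), fit_line(gpu_obs)

def choose_device(size):
    """Predict both runtimes for an unseen loop size and pick the faster device."""
    predict = lambda m: m[0] * size + m[1]
    return "CPU" if predict(cpu_model) < predict(gpu_model) else "GPU"

print(choose_device(2_000), choose_device(200_000))  # small -> CPU, large -> GPU
```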
HARDWARE OFFLOAD SUPPORT FOR AN OPERATING SYSTEM OFFLOAD INTERFACE USING OPERATION CODE VERIFICATION
A method may include receiving, by a privileged component executed by a processing device, bytecode of a packet processing component from an unprivileged component executed by the processing device; analyzing, by the privileged component, the bytecode of the packet processing component to identify whether the bytecode comprises a first command that returns a redirect; analyzing, by the privileged component, the bytecode of the packet processing component to identify whether the bytecode comprises a second command that returns a runtime computed value; and, responsive to determining that the bytecode comprises the first command or the second command, setting a redirect flag maintained by the privileged component.
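An illustrative sketch of the verification step: a privileged component scans the bytecode for an opcode that returns a redirect and for one that returns a runtime-computed value, and sets the flag if either appears. The opcode names are invented; this kind of analysis is reminiscent of eBPF-style packet-program verification, but the sketch is not any real verifier.

```python
# Invented opcode vocabulary for the sketch.
REDIRECT_OPS = {"RETURN_REDIRECT"}
RUNTIME_VALUE_OPS = {"RETURN_REG"}   # return value computed at runtime

def analyze(bytecode):
    """bytecode: list of (opcode, operand) tuples from the unprivileged component.
    Returns the redirect flag the privileged component would maintain."""
    redirect_flag = False
    for opcode, _ in bytecode:
        if opcode in REDIRECT_OPS or opcode in RUNTIME_VALUE_OPS:
            redirect_flag = True
    return redirect_flag

program = [("LOAD_PKT", 0), ("CMP", 80), ("RETURN_REDIRECT", 3)]
print(analyze(program))  # True: offload hardware must support redirects
```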
MANAGING PERFORMANCE OPTIMIZATION OF APPLICATIONS IN AN INFORMATION HANDLING SYSTEM (IHS)
Embodiments of systems and methods for managing performance optimization of applications executed by an Information Handling System (IHS) are described. In an illustrative, non-limiting embodiment, a method may include: identifying, by an IHS, a first application; assigning a first score to the first application based upon: (i) a user's presence state, (ii) a foreground or background application state, (iii) a power adaptor state, and (iv) a hardware utilization state, detected during execution of the first application; identifying, by the IHS, a second application; assigning a second score to the second application based upon: (i) another user's presence state, (ii) another foreground or background application state, (iii) another power adaptor state, and (iv) another hardware utilization state, detected during execution of the second application; and prioritizing performance optimization of the first application over the second application in response to the first score being greater than the second score.
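A minimal sketch of the scoring, with invented weights: derive each application's score from the four states named in the abstract and optimize the higher-scoring application first.

```python
# Weights are assumptions for illustration, not the claimed scoring scheme.
def score(user_present, foreground, on_ac_power, hw_utilization):
    s = 0.0
    s += 2.0 if user_present else 0.0     # user is actively present
    s += 3.0 if foreground else 0.0       # foreground app matters more
    s += 1.0 if on_ac_power else 0.0      # AC power: headroom for optimization
    s += 4.0 * hw_utilization             # utilization in [0, 1]
    return s

apps = {
    "video_editor": score(True, True, True, 0.9),
    "backup_agent": score(False, False, True, 0.4),
}
priority = sorted(apps, key=apps.get, reverse=True)
print(priority)  # video_editor is optimized first
```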
RANKING SERVICE IMPLEMENTATIONS FOR A SERVICE INTERFACE
Techniques for ranking service implementations for a service interface are disclosed. Each module that includes a service implementation may be referred to as a service provider module. The ranking of the service implementations, for the particular service interface, may be based on modular information. Modular information includes information associated with module dependencies and/or service dependencies corresponding to one or more of a candidate set of service provider modules. Additionally or alternatively, the ranking of the service implementations, for the particular service interface, may be based on statically-available information and/or dynamically-available information associated with one or more of a candidate set of service implementations.
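A hedged sketch of such a ranking: score candidate service provider modules for one service interface from "modular information" (here, just dependency counts) plus one dynamic signal (observed latency). The scoring function and module names are assumptions; the claims cover many possible ranking signals.

```python
# Invented candidates: (module, #module deps, #service deps, avg latency ms).
candidates = [
    ("fast-json-provider",   2, 1, 3.5),
    ("legacy-xml-provider",  7, 4, 9.0),
    ("minimal-provider",     1, 0, 5.0),
]

def rank(cands):
    # Fewer dependencies and lower latency rank higher (lower key = better).
    return sorted(cands, key=lambda c: (c[1] + c[2]) * 2 + c[3])

for module, *_ in rank(candidates):
    print(module)  # minimal-provider, fast-json-provider, legacy-xml-provider
```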