G06F8/4441

Eager and optimistic evaluation of promises

The invention relates to a method for generating executable code from application source code. The method includes determining a programmatic expression using the application source code, determining a first value for the programmatic expression, and compiling the programmatic expression into a first optimized code portion using the first value, an assumption, and an expression scope. The method further includes executing the application source code, determining that the programmatic expression is modified outside of the expression scope, invalidating the assumption, and de-optimizing the first optimized code portion.
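The compile, invalidate, and de-optimize cycle described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the names `Assumption`, `SpeculativeExpr`, and `write_outside_scope` are hypothetical, and the "optimized code portion" is stood in for by a precomputed value guarded by the assumption.

```python
class Assumption:
    """Hypothetical flag recording whether a speculative assumption still holds."""

    def __init__(self):
        self.valid = True

    def invalidate(self):
        self.valid = False


class SpeculativeExpr:
    """Evaluates `env[name] * 2`, speculating that env[name] does not change."""

    def __init__(self, env, name):
        self.env = env
        self.name = name
        self.assumption = Assumption()
        # "Optimized" form: the result is folded using the first observed value.
        self.cached = env[name] * 2

    def evaluate(self):
        if self.assumption.valid:
            return self.cached  # fast path, valid only while the assumption holds
        return self.env[self.name] * 2  # de-optimized path: re-read the variable


def write_outside_scope(expr, value):
    """Models a modification made outside the expression scope."""
    expr.env[expr.name] = value
    expr.assumption.invalidate()  # forces later evaluations onto the slow path


env = {"x": 3}
expr = SpeculativeExpr(env, "x")
first = expr.evaluate()        # served from the folded value
write_outside_scope(expr, 5)
second = expr.evaluate()       # recomputed after de-optimization
```

In this sketch the guard is checked on every evaluation; a real runtime would more likely patch or discard the compiled code when the assumption is invalidated.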

MEMOIZING MACHINE-LEARNING PRE-PROCESSING AND FEATURE ENGINEERING
20230185552 · 2023-06-15 ·

A method creates a memo table of keys and values. Each key is an element of an input array which is an input of a machine-learning pre-processing pipeline, and each value is an output of the pipeline. The method measures (1) a hit rate H to the memo table, (2) an average time T_table to look up the table, (3) an average time T_pipeline to execute the pipeline, and (4) a threshold T_elements on a number of elements of the input array. The method looks up the value in the table by using an element of the input array as a key when T_pipeline × H > T_table and the number of elements in the input array is less than T_elements. The method calls the pipeline in place of the lookup for all of the remaining elements in the input array when the value is not in the table.
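The lookup condition and the fallback on a miss can be sketched as follows. This is an illustrative reading of the abstract, not the claimed implementation: `should_use_table` and `process` are hypothetical names, the table is assumed to be a plain dictionary, and the pipeline is assumed to be a function applied per element.

```python
def should_use_table(hit_rate, t_table, t_pipeline, n_elements, t_elements):
    """True when the expected time saved by hits exceeds the lookup cost
    and the input array is shorter than the element threshold."""
    return t_pipeline * hit_rate > t_table and n_elements < t_elements


def process(inputs, table, pipeline, use_table):
    """Returns pipeline outputs for `inputs`, consulting the memo table first."""
    if not use_table:
        return [pipeline(x) for x in inputs]
    outputs = []
    for i, x in enumerate(inputs):
        if x in table:
            outputs.append(table[x])
        else:
            # On the first miss, stop looking up and run the pipeline
            # on this element and all remaining elements.
            outputs.extend(pipeline(y) for y in inputs[i:])
            break
    return outputs


table = {1: 10, 2: 20}
result = process([1, 2, 3, 1], table, lambda x: x * 10, use_table=True)
```

Here the last element `1` is in the table but is still sent through the pipeline, because the miss on `3` switches the rest of the array away from lookups.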

STATIC BLOCK FREQUENCY PREDICTION IN IRREDUCIBLE LOOPS WITHIN COMPUTER CODE
20230185551 · 2023-06-15 ·

A block frequency of a block in an irreducible loop in computer code is statically determined. The statically determining includes splitting an incoming block mass among multiple loop headers of the irreducible loop to provide an initial mass for the block. A bottom-up traversal and a top-down traversal of a plurality of loops of the computer code including the irreducible loop are iteratively performed to update a mass of the block. The iteratively performing commences with propagating the initial mass of the block to one or more blocks of one or more loops of the plurality of loops and continues with propagating and updating masses of select blocks of the plurality of loops until a predefined point is reached providing a resulting mass for the block. The block frequency of the block is determined using the resulting mass and is to be used in processing associated with the computer code.

Framework for user-directed profile-driven optimizations
11675574 · 2023-06-13 ·

A method for using profiling to obtain application-specific, preferred parameter values for an application is disclosed. First, a parameter for which to obtain an application-specific value is identified. Code is then augmented for application-specific profiling of the parameter. The parameter is profiled and profile data is collected. The profile data is then analyzed to determine the application's preferred parameter value for the profile parameter.

COMPILING MODELS FOR DEDICATED HARDWARE

The subject technology receives a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations, some of the operations being executable on multiple processors of the target platform. The subject technology further sorts the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors. The subject technology determines, based at least in part on a cost of transferring the operations between the multiple processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations. Further, for each layer of the NN model, the subject technology includes an annotation to indicate the processor assigned for each of the operations.
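One way to picture a cost-minimizing assignment is a dynamic program over an ordered list of operations. The abstract does not say which algorithm is used, so the sketch below is a substitute technique under simplifying assumptions: operations form a linear chain, each has a hypothetical per-processor execution cost, and a fixed hypothetical `transfer_cost` is paid whenever consecutive operations run on different processors.

```python
def assign_processors(exec_cost, transfer_cost):
    """exec_cost: one dict per operation, mapping processor name -> cost.
    Returns (total_cost, [processor chosen for each operation])."""
    # best[p] = cheapest (cost, path) for the prefix ending with the last op on p
    best = {p: (c, [p]) for p, c in exec_cost[0].items()}
    for costs in exec_cost[1:]:
        nxt = {}
        for p, c in costs.items():
            candidates = [
                (prev_cost + (0 if q == p else transfer_cost) + c, path + [p])
                for q, (prev_cost, path) in best.items()
            ]
            nxt[p] = min(candidates, key=lambda t: t[0])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Hypothetical costs: the middle operation can only run on the CPU.
ops = [{"cpu": 5, "gpu": 1}, {"cpu": 2}, {"cpu": 4, "gpu": 1}]
cheap_transfer = assign_processors(ops, transfer_cost=2)
costly_transfer = assign_processors(ops, transfer_cost=10)
```

With a cheap transfer the plan hops between processors; with an expensive one it stays on the CPU throughout. The returned list plays the role of the per-operation annotation mentioned in the abstract.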

Asynchronous parallel processing of log data
09830368 · 2017-11-28 ·

Techniques to process machine generated log data are disclosed. In various embodiments, a parser definition associated with a set of log data is determined. The parser definition is compiled to create an instance of a parser to parse the set of log data. The parser has a hierarchical structure comprising a plurality of hierarchically related nodes, each of at least a subset of said nodes having associated therewith one or more actors each configured to parse data associated with that node. At least a portion of the set of log data is sent to the parser instance prior to compilation of said parser instance being completed. A first node of the parser instance is configured to receive and parse log data associated with the first node even if compilation of the parser definition has not been completed with respect to a second node of said parser instance.

PROPAGATING REDUCED-PRECISION ON COMPUTATION GRAPHS
20220365763 · 2022-11-17 ·

Methods, systems, and apparatus for propagating reduced-precision on computation graphs are described. In one aspect, a method includes receiving data specifying a directed graph that includes operators for a program. The operators include first operators that each represent a numerical operation performed on numerical values having a first level of precision and second operators that each represent a numerical operation performed on numerical values having a second level of precision. One or more downstream operators are identified for a first operator. A determination is made whether each downstream operator represents a numerical operation that is performed on input values having the second level of precision. Whenever each downstream operator represents a numerical operation that is performed on input values having the second level of precision, a precision of numerical values output by the operation represented by the first operator is adjusted to the second level of precision.
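The adjustment rule in this abstract can be sketched on a small graph. This is only an illustration under stated assumptions: the graph is a dictionary from each operator to its downstream operators, precisions are plain strings, and `adjust_output_precision` and the operator names are hypothetical.

```python
def adjust_output_precision(downstream, input_precision, output_precision, low="bf16"):
    """Lowers an operator's output precision to `low` when every downstream
    operator consumes its inputs at `low`. Returns a new mapping."""
    adjusted = dict(output_precision)
    for op, consumers in downstream.items():
        # Operators with no consumers are left unchanged.
        if consumers and all(input_precision[c] == low for c in consumers):
            adjusted[op] = low
    return adjusted


downstream = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
input_precision = {"a": "fp32", "b": "bf16", "c": "bf16", "d": "fp32"}
output_precision = {op: "fp32" for op in downstream}
adjusted = adjust_output_precision(downstream, input_precision, output_precision)
```

Operator `a` is lowered because both of its consumers take reduced-precision inputs, while `b` keeps full precision because its consumer `d` does not.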

Compiler Global Memory Access Optimization In Code Regions Using Most Appropriate Base Pointer Registers
20170337142 · 2017-11-23 ·

A processing device includes a target processor instruction memory to store a plurality of target processor instructions that include a plurality of global memory access instructions. The processing device further includes a compiler to communicate with the target processor instruction memory, the compiler including: a global variable candidate detection module to identify a global memory access instruction within a set of code regions that use a set of global variable candidates to access a global memory, and a memory access optimization module to modify the global memory access instruction, wherein the modified global memory access instruction utilizes an unused base pointer register of a set of unused base pointer register candidates within the set of code regions, a global variable from the set of global variable candidates to be used as a base address, and an offset relative to the base address to access the global memory.

Systems and methods for energy proportional scheduling

A compilation system using an energy model based on a set of generic and practical hardware and software parameters is presented. The model can represent the major trends in energy consumption spanning potential hardware configurations using only parameters available at compilation time. Experimental verification indicates that the model is nimble yet sufficiently precise, allowing efficient selection of one or more parameters of a target computing system so as to minimize power/energy consumption of a program while achieving other performance related goals. A voltage and/or frequency optimization and selection is presented which can determine an efficient dynamic hardware configuration schedule at compilation time. In various embodiments, the configuration schedule is chosen based on its predicted effect on energy consumption. A concurrency throttling technique based on the energy model can exploit the power-gating features exposed by the target computing system to increase the energy efficiency of programs.

APPARATUS AND METHOD TO COMPILE A VARIADIC TEMPLATE FUNCTION
20170329585 · 2017-11-16 ·

An apparatus duplicates a process code of a variadic template function that has a variable number of parameters in a source code, in association with each of the actual arguments in an actual-argument list corresponding to a variadic parameter defined by a variadic operator that packs the variable number of parameters of the variadic template function. The apparatus substitutes another parameter in each duplicated process code with a prepared parameter that accepts the actual argument associated with that duplicated process code. The apparatus first inserts, into a recursive call part in a process code of the variadic template function, a first duplicated process code that is associated with an actual argument at a head of the actual-argument list, and then repeatedly inserts, into a recursive call part in the previously inserted duplicated process code, a next duplicated process code associated with a subsequent actual argument.