Patent classifications
G06F8/4434
METHOD AND SYSTEM FOR SOFTWARE ENHANCEMENT AND MANAGEMENT
A software enhancement and management system (E&M System) can include two ways to decompose existing software such that new functionality can be added: functional decomposition and time-affecting linear pathway (TALP) decomposition. Functional decomposition captures the inputs and outputs of the existing software's functions and attaches the new algorithmic constructs presented as other functions that receive the outputs of the existing software's functions. TALP decomposition allows for the generation of time-prediction polynomials that approximate time-complexity functions, speedup, and automatic dynamic loop-unrolling-based parallelization for each TALP.
Information processing apparatus, computer-readable recording medium storing compiling program, and compiling method
An information processing apparatus includes a processor configured to: for each of a plurality of loops, acquire loop information including a number of variables, a number of registers, a number of memory commands for inputting and outputting a value of the variable between the register and a main storage device, and a number of arithmetic commands for the value of the variable stored in the register, which are used in the loop; for each combination of the loops that is a candidate for loop fusion, calculate the number of variables, the number of registers, the number of memory commands, and the number of arithmetic commands that correspond to that combination; determine, based on the calculated numbers, a combination among the candidates to which the loop fusion is to be applied; and execute the loop fusion on the determined combination.
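The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the per-loop counts are combined or which criterion picks the winner, so the `LoopInfo` type, the rule that shared variables save one register and one memory command each, and the "fewest memory commands within a register budget" criterion are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class LoopInfo:
    """Per-loop information (hypothetical layout, not from the patent)."""
    name: str
    variables: frozenset  # variables used in the loop
    registers: int        # registers the loop needs
    memory_cmds: int      # loads/stores between registers and main storage
    arith_cmds: int       # arithmetic commands on register values


def fuse_info(a: LoopInfo, b: LoopInfo) -> LoopInfo:
    """Estimate the counts for the fusion of two loops.

    Assumption: each variable shared by both loops needs only one
    register and saves one memory command after fusion.
    """
    shared = a.variables & b.variables
    return LoopInfo(
        name=f"{a.name}+{b.name}",
        variables=a.variables | b.variables,
        registers=a.registers + b.registers - len(shared),
        memory_cmds=a.memory_cmds + b.memory_cmds - len(shared),
        arith_cmds=a.arith_cmds + b.arith_cmds,
    )


def choose_fusion(loops, register_budget):
    """Return the fused candidate with the fewest memory commands
    among those that fit in the register budget, or None."""
    best = None
    for a, b in combinations(loops, 2):
        fused = fuse_info(a, b)
        if fused.registers > register_budget:
            continue  # fusing would spill registers; skip this candidate
        if best is None or fused.memory_cmds < best.memory_cmds:
            best = fused
    return best
```

For example, with three loops where only the first two share a variable and the budget is four registers, `choose_fusion` would select the first pair, because the other pairs exceed the budget.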
MONITORING STACK MEMORY USAGE TO OPTIMIZE PROGRAMS
A computer system determines stack usage. An intercept function is executed to store a stack marker in a stack, wherein the intercept function is invoked when a program enters or exits each function of a plurality of functions of the program. A plurality of stack markers are identified in the stack and a memory address is determined for each stack marker during execution of the program to obtain a plurality of memory addresses. The plurality of memory addresses are analyzed to identify a particular memory address associated with a greatest stack depth. A stack usage of the program is determined based on the greatest stack depth. Embodiments of the present invention further include a method and program product for determining stack usage in substantially the same manner described above.
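The marker-based measurement can be illustrated with a small simulation. Python cannot observe real stack addresses, so the sketch below models the stack pointer as an integer on a downward-growing stack; the `StackMonitor` class, its method names, and the frame sizes are hypothetical and only mirror the steps in the abstract (intercept on entry and exit, collect marker addresses, find the greatest depth).

```python
class StackMonitor:
    """Simulated stack with an intercept hook (illustrative only)."""

    def __init__(self, base_address: int):
        self.base = base_address      # address of the empty stack
        self.sp = base_address        # simulated stack pointer
        self.marker_addresses = []    # one address per stack marker

    def intercept(self) -> None:
        """Record a marker at the current stack pointer."""
        self.marker_addresses.append(self.sp)

    def enter(self, frame_size: int) -> None:
        """Function entry: push a frame, then run the intercept."""
        self.sp -= frame_size         # assumption: stack grows downward
        self.intercept()

    def exit(self, frame_size: int) -> None:
        """Function exit: run the intercept, then pop the frame."""
        self.intercept()
        self.sp += frame_size

    def stack_usage(self) -> int:
        """Bytes between the base and the deepest marker address."""
        deepest = min(self.marker_addresses)
        return self.base - deepest
```

A call sequence such as `enter(32)`, `enter(64)`, `exit(64)`, `exit(32)` on a monitor with base `0x1000` leaves the deepest marker 96 bytes below the base, which is the reported stack usage.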
ELECTRONIC DEVICE AND METHOD FOR MANAGING MEMORY OF ELECTRONIC DEVICE
According to an embodiment, an electronic device includes at least one processor and a memory configured to store instructions that can be executed by the processor, wherein the processor may be configured to: monitor information about the storage space of the memory and usage histories of a plurality of objects executed by the processor; determine, among the plurality of objects, a target object whose compile scheme is to be changed, based on at least one of the information and the usage histories; and increase the free storage space of the memory by changing the compile scheme of the target object.
Computation modification by amplification of stencil including stencil points
In a sequence of major computational steps or in an iterative computation, a stencil amplifier can increase the number of data elements accessed from one or more data structures in a single major step or iteration, thereby decreasing the total number of computations and/or communication operations in the overall sequence or the iterative computation. Stencil amplification, which can be optimized according to a specified parameter such as compile time, run time, code size, etc., can improve the performance of a computing system executing the sequence or the iterative computation in terms of run time, memory load, energy consumption, etc. The stencil amplifier typically determines boundaries, to avoid erroneously accessing data elements not present in the one or more data structures.
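One simple way to picture stencil amplification is a linear 1D stencil: two iterations of a 3-point stencil equal one iteration of a wider 5-point stencil whose weights are the self-convolution of the original weights. The sketch below assumes this linear, fixed-weight case; the function names are hypothetical and the patent's own method may differ. Exact fractions are used so the two results can be compared without rounding error.

```python
from fractions import Fraction


def step(u, weights):
    """Apply one stencil iteration to the interior of u.

    Points closer to either end than the stencil radius are left
    unchanged, so no element outside u is ever accessed.
    """
    r = len(weights) // 2
    out = list(u)
    for i in range(r, len(u) - r):
        out[i] = sum(w * u[i - r + k] for k, w in enumerate(weights))
    return out


def amplify(weights):
    """Combine two applications of the stencil into one wider stencil
    by convolving the weights with themselves."""
    wide = [Fraction(0)] * (2 * len(weights) - 1)
    for i, a in enumerate(weights):
        for j, b in enumerate(weights):
            wide[i + j] += a * b
    return wide


w3 = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
w5 = amplify(w3)  # 5-point stencil covering two iterations at once
```

Away from the boundaries, `step(u, w5)` matches `step(step(u, w3), w3)`. The two points next to each boundary still need separate handling, which is where the boundary determination mentioned in the abstract comes in.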
METHOD TO AVOID MEMORY BANK CONFLICTS AND PIPELINE CONFLICTS IN TENSOR MEMORY LAYOUT
A method for optimizing a layout of a tensor memory defines at least one hard constraint for allocating a plurality of input/output (I/O) vectors for reading and writing data for a task in the tensor memory. The at least one hard constraint is applied to determine one or more potential conflicts between the plurality of I/O vectors. One or more soft constraints aimed at mitigating the one or more potential conflicts between the I/O vectors may also be generated. The at least one hard constraint is applied in a maximum satisfiability (MaxSAT) solver. The one or more soft constraints may also be applied in the MaxSAT solver. The MaxSAT solver determines locations of the data in the tensor memory. The starting addresses of the input data to be read and of output data to be written by each of the I/O vectors are updated in the tensor memory.
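The split between hard and soft constraints can be illustrated on a toy bank-assignment problem. The sketch below is not a MaxSAT solver: it substitutes brute-force enumeration, which is only practical for a handful of vectors, and the constraint form (pairs of I/O vectors that must, or should preferably, sit in different banks) is an assumption made for illustration.

```python
from itertools import product


def place_vectors(vectors, num_banks, hard_pairs, soft_pairs):
    """Assign each I/O vector to a memory bank.

    hard_pairs: pairs that MUST be in different banks (violations are rejected).
    soft_pairs: pairs that SHOULD be in different banks (satisfied count is maximized).
    Returns (assignment, number_of_soft_constraints_satisfied),
    or (None, -1) when the hard constraints cannot be met.
    """
    best, best_score = None, -1
    for banks in product(range(num_banks), repeat=len(vectors)):
        assign = dict(zip(vectors, banks))
        # Hard constraints: any violation disqualifies the layout.
        if any(assign[a] == assign[b] for a, b in hard_pairs):
            continue
        # Soft constraints: count how many are satisfied.
        score = sum(assign[a] != assign[b] for a, b in soft_pairs)
        if score > best_score:
            best, best_score = assign, score
    return best, best_score
```

With two inputs that must not collide and one output that should avoid both, two banks allow only one soft constraint to hold, while three banks satisfy both. A real MaxSAT solver reaches the same kind of answer without enumerating every layout.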
Applications for hardware accelerators in computing systems
An example method of implementing an application for a hardware accelerator having a programmable device coupled to memory is disclosed. The method includes compiling source code of the application to generate logical circuit descriptions of kernel circuits; determining resource availability in a dynamic region of programmable logic of the programmable device, the dynamic region exclusive of a static region of the programmable logic programmed with a host interface configured to interface a computing system having the hardware accelerator; determining resource utilization by the kernel circuits in the dynamic region; determining fitting solutions of the kernel circuits within the dynamic region, each of the fitting solutions defining connectivity of the kernel circuits to banks of the memory; adding a memory subsystem to the application based on a selected fitting solution of the fitting solutions; and generating a kernel image configured to program the dynamic region to implement the kernel circuits and the memory subsystem.
Compiler-optimized context switching with compiler-inserted data table for in-use register identification at a preferred preemption point
Compiler-optimized context switching may include receiving an instruction indicating a preferred preemption point comprising an instruction address; storing the preferred preemption point in a data structure; determining, based on the data structure, that the preferred preemption point has been reached by a first thread; determining that preemption of the first thread for a second thread has been requested; and performing a context switch to the second thread.
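The control flow in this abstract can be sketched as a small simulation. The table layout (instruction address mapped to the registers in use at that point, as suggested by the title), the program-counter trace, and the function name are all hypothetical; a real implementation would live in the scheduler and compiler runtime rather than in Python.

```python
def find_context_switch(pc_trace, preemption_table, request_step):
    """Find where a requested preemption is actually honored.

    pc_trace:         instruction addresses executed by the first thread.
    preemption_table: dict mapping each preferred preemption point
                      (an address) to the registers in use there.
    request_step:     index in pc_trace at which preemption is requested.

    Returns (address, in_use_registers) for the first preferred
    preemption point reached at or after the request, else None.
    """
    for step, pc in enumerate(pc_trace):
        requested = step >= request_step
        if requested and pc in preemption_table:
            # Only the registers listed in the table need to be saved.
            return pc, preemption_table[pc]
    return None
```

If the request arrives between two preferred points, the thread keeps running until the next one, and only the registers recorded for that point are saved during the switch.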
Compiling application with multiple function implementations for garbage collection
Functions of an application may include multiple implementations that have corresponding behaviors but perform different garbage collection-related activities such that the different implementations may be executed during different garbage collection phases to reduce overall garbage collection overhead during application execution.
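A minimal sketch of the idea, assuming a two-phase collector and a field-store operation: both implementations store the value, but only the one used during the marking phase pays for a write barrier. The `Heap` class, the phase names, and the barrier (appending to a remembered list) are hypothetical stand-ins, not details from the patent.

```python
class Heap:
    """Tiny stand-in for a managed heap (illustrative only)."""

    def __init__(self):
        self.phase = "idle"    # current garbage collection phase
        self.remembered = []   # values recorded by the write barrier


def store_idle(heap, obj, field, value):
    """Implementation for phases that need no barrier: a plain store."""
    obj[field] = value


def store_marking(heap, obj, field, value):
    """Implementation for the marking phase: same visible behavior,
    plus a write barrier so the collector sees the new reference."""
    heap.remembered.append(value)
    obj[field] = value


# One implementation per phase; the phases here are assumptions.
STORE_IMPLS = {"idle": store_idle, "marking": store_marking}


def store(heap, obj, field, value):
    """Dispatch to the implementation that matches the current phase."""
    STORE_IMPLS[heap.phase](heap, obj, field, value)
```

Selecting the implementation by phase keeps barrier overhead out of the common path, which is the overhead reduction the abstract refers to.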
Reshape and broadcast optimizations to avoid unnecessary data movement
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for transforming patterns of operations on tensors in a computational graph to reduce the memory burden incurred when reshape operations are performed, in particular when deployed to hardware platforms that have vector instructions or vector memory requiring alignment of operands.