G06F8/4442

METHODS AND SYSTEMS FOR NESTED STREAM PREFETCHING FOR GENERAL PURPOSE CENTRAL PROCESSING UNITS

A method and hardware system remove the overhead caused by stream handling instructions in nested loops. Where code contains inner loops nested in outer loops, a compiler pass identifies qualified nested streams and generates ISA-specific instructions for transferring stream information, which links an inner loop stream with an outer loop stream, to hardware components of a co-designed prefetcher. The hardware components include a frontend able to decode and execute instructions for a stream linking information transfer mechanism, a stream engine unit with a streams configuration table (SCT) having a field that allows a subordinate stream to stay pending for values from its master stream, and a stream prefetch manager with buffers for storing values of current elements of a master stream and with a nested streams control unit for reconfiguring and iterating the streams.
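The master/subordinate relationship described above can be pictured in software. The sketch below is only an illustration of the access pattern, not the hardware mechanism: the names `master_stream`, `subordinate_stream`, and `nested_stream_indices`, and the choice of (base, length) pairs as master elements, are assumptions made for the example.

```python
from typing import Iterator, List, Tuple


def master_stream(descriptors: List[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    """Outer-loop (master) stream: yields one (base, length) element per outer iteration."""
    for base, length in descriptors:
        yield base, length


def subordinate_stream(base: int, length: int) -> Iterator[int]:
    """Inner-loop (subordinate) stream, reconfigured from the master's current element."""
    return iter(range(base, base + length))


def nested_stream_indices(descriptors: List[Tuple[int, int]]) -> List[int]:
    """Return the order in which inner-loop element indices would be visited."""
    order: List[int] = []
    for base, length in master_stream(descriptors):
        # The subordinate stream cannot start until the master value is known,
        # which mirrors the "pending" state mentioned in the abstract.
        order.extend(subordinate_stream(base, length))
    return order


if __name__ == "__main__":
    print(nested_stream_indices([(10, 2), (40, 0), (20, 3)]))
```

In this model the inner loop contains no stream setup code of its own; each reconfiguration is driven by the next master element.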

Dynamic update of the number of architected registers assigned to software threads using spill counts

A computer system includes a processor, main memory, and controller. The processor includes a plurality of hardware threads configured to execute a plurality of software threads. The main memory includes a first register table configured to contain a current set of architected registers for the currently running software threads. The controller is configured to change a first number of the architected registers assigned to a given one of the software threads to a second number of architected registers when a result of monitoring current usage of the registers by the software threads indicates that the change will improve performance of the computer system. The processor includes a second register table configured to contain a subset of the architected registers and a mapping table for each software thread indicating whether the architected registers referenced by the corresponding software thread are located in the first register table or the second register table.
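One possible reading of the controller's decision is sketched below. The abstract does not state the policy, so everything here is an assumption: the function name `rebalance`, the `step` and `min_regs` parameters, and the rule of moving registers from the thread with the fewest spills to the thread with the most.

```python
from typing import Dict


def rebalance(assigned: Dict[str, int], spills: Dict[str, int],
              step: int = 4, min_regs: int = 8) -> Dict[str, int]:
    """Move up to `step` registers from the least-spilling thread to the most-spilling one."""
    donor = min(spills, key=spills.get)   # thread with the fewest spills
    taker = max(spills, key=spills.get)   # thread with the most spills
    updated = dict(assigned)
    if spills[taker] == spills[donor]:
        return updated                    # no imbalance, nothing to change
    # Never shrink the donor below the assumed minimum register count.
    movable = min(step, updated[donor] - min_regs)
    if movable > 0:
        updated[donor] -= movable
        updated[taker] += movable
    return updated


if __name__ == "__main__":
    print(rebalance({"t0": 32, "t1": 32}, {"t0": 0, "t1": 120}))
```

The total number of registers is conserved, so the sketch only redistributes the existing set among threads.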

Methods and systems for program optimization utilizing intelligent space exploration

Embodiments for program optimization are provided. A program is compiled with respect to a performance result utilizing a set of parameters. Information associated with the compiling of the program is collected. The collected information is external to the performance result. The set of parameters is changed based on the collected information.
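A feedback loop of this kind might look like the following sketch. It is a toy under stated assumptions: `toy_compile` stands in for a real compiler, the only parameter is a hypothetical `unroll` factor, and the collected information is a made-up spill count that is separate from the measured runtime.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, int]


def toy_compile(params: Params) -> Tuple[float, Dict[str, int]]:
    """Stand-in compiler: returns (runtime, side information about the compile)."""
    unroll = params["unroll"]
    spills = max(0, unroll - 4) * 3          # pretend large unroll factors spill
    runtime = 100.0 / unroll + 2.0 * spills  # the performance result
    return runtime, {"spills": spills}


def explore(params: Params,
            compile_fn: Callable[[Params], Tuple[float, Dict[str, int]]],
            rounds: int = 5) -> Tuple[Params, float]:
    """Adjust parameters using the collected information, not only the runtime."""
    current = dict(params)
    best_params, best_runtime = dict(params), float("inf")
    for _ in range(rounds):
        runtime, info = compile_fn(current)
        if runtime < best_runtime:
            best_params, best_runtime = dict(current), runtime
        if info["spills"] > 0:
            current["unroll"] = max(1, current["unroll"] // 2)  # back off
        else:
            current["unroll"] *= 2                               # push further
    return best_params, best_runtime


if __name__ == "__main__":
    print(explore({"unroll": 1}, toy_compile))
```

The point of the sketch is that the next parameter set is chosen from the spill count, a signal the runtime alone would not reveal.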

Firmware publication of multiple binary images

Multiple binary images stored in the firmware of an electronic device are written to the device's configuration tables during booting of the device, where one of the binary images is a manager binary. During booting, the manager binary is saved to the file system of the operating system such that it automatically executes upon completion of booting. The manager binary then saves the other binary images to the OS file system, such that they also execute automatically.

COMPILER SUB EXPRESSION DIRECTED ACYCLIC GRAPH (DAG) REMAT FOR REGISTER PRESSURE

The present disclosure relates to devices and methods for transforming program source code using a rematerialization operation. The devices and methods may identify at least one hot spot with high register pressure in a program source code for an application and identify a plurality of live variables within the at least one hot spot. The devices and methods may group the plurality of live variables by a basic block that contains a definition or single use of the plurality of live variables. The devices and methods may build a directed acyclic graph (DAG) for each basic block that has a grouped plurality of live variables. The devices and methods may save the DAG as a candidate instruction to move in the program source code and may generate transformed program source code for the application by moving the candidate instruction.

INSTRUCTION PREFETCH MECHANISM

An apparatus to facilitate data prefetching is disclosed. The apparatus includes a cache, one or more execution units (EUs) to execute program code, prefetch logic to maintain tracking information of memory instructions in the program code that trigger a cache miss, and compiler logic to receive the tracking information, insert one or more prefetch instructions in updated program code to prefetch data from a memory for execution of one or more of the memory instructions that triggered a cache miss, and download the updated program code for execution by the one or more EUs.
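The compiler-side step can be sketched as a small rewrite pass. This is an illustration only: the instruction encoding as `(op, operand)` tuples, the `miss_counts` map keyed by instruction index, the `threshold` parameter, and the function name `insert_prefetches` are all assumptions, not details from the disclosure.

```python
from typing import Dict, List, Tuple

Instruction = Tuple[str, str]


def insert_prefetches(program: List[Instruction],
                      miss_counts: Dict[int, int],
                      threshold: int = 1) -> List[Instruction]:
    """Return updated code with a prefetch ahead of each load that was tracked as missing."""
    updated: List[Instruction] = []
    for pc, (op, operand) in enumerate(program):
        if op == "load" and miss_counts.get(pc, 0) >= threshold:
            # Placed immediately before the load for simplicity; a real pass
            # would hoist it far enough ahead to hide memory latency.
            updated.append(("prefetch", operand))
        updated.append((op, operand))
    return updated


if __name__ == "__main__":
    code = [("load", "A[i]"), ("add", "r1"), ("load", "B[i]")]
    print(insert_prefetches(code, {2: 7}))
```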

Information processing apparatus and information processing method
11080030 · 2021-08-03

An information processing apparatus includes: a memory; and a processor coupled to the memory and configured to: receive input of an input program in which a plurality of statements is written in a loop; generate a counting program for causing a computing machinery to execute a process of counting the number of cache misses and the number of cache hits that are expected when the loop is executed for each of pairs of the statements by rewriting the input program; and split the loop into a plurality of loops based on the number of cache misses and the number of cache hits counted in the process.
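The final splitting step could be sketched as follows. The abstract does not say how the counts drive the split, so the rule used here is an assumption: two statements stay in the same loop when their pair shows more hits than misses. The names `split_loop` and `pair_stats` are hypothetical, and the sketch does not check whether the split respects data dependences.

```python
from typing import Dict, List, Tuple


def split_loop(statements: List[str],
               pair_stats: Dict[Tuple[str, str], Tuple[int, int]]) -> List[List[str]]:
    """Group statements into loops; pair_stats maps (a, b) to (misses, hits)."""
    parent = {s: s for s in statements}

    def find(s: str) -> str:
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), (misses, hits) in pair_stats.items():
        if hits > misses:                 # assumed rule: cache-friendly pairs stay together
            parent[find(b)] = find(a)

    groups: Dict[str, List[str]] = {}
    for s in statements:                  # preserves the original statement order
        groups.setdefault(find(s), []).append(s)
    return list(groups.values())


if __name__ == "__main__":
    stats = {("S1", "S2"): (10, 90), ("S1", "S3"): (80, 20), ("S2", "S3"): (70, 30)}
    print(split_loop(["S1", "S2", "S3"], stats))
```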

Cache efficient reading of result values in a column store database
11119742 · 2021-09-14

A system for cache efficient reading of column values in a database is provided. In some aspects, the system performs operations including pre-fetching, asynchronously and in response to a request for data in a column store database system, a plurality of first values associated with the requested data. The request may identify a row of the column store database system associated with the requested data. The plurality of first values may be located in the row. The operations may further include storing the plurality of first values in a cache memory. The operations may further include pre-fetching, asynchronously and based on the plurality of first values, a plurality of second values. The operations may further include storing the plurality of second values in the cache memory. The operations may further include reading, in response to the storing of the plurality of second values, the requested data from the cache memory.

SYSTEMS AND METHODS FOR DATA PROCESSING

A method for data processing is provided. The method may include: preprocessing initial data to obtain preprocessed data; storing the preprocessed data; receiving a data request made through an application, the data request including information relating to a storage path of contents that are requested; in response to the data request, determining, by a nearby proxy of a first proxy cluster in a first region, whether the contents requested in the data request are cached locally; and in response to a determination that the contents are cached locally, providing, by the nearby proxy, the contents to the application; or in response to a determination that the contents are not cached locally, acquiring, by the nearby proxy, the contents based on the information relating to the storage path of the contents; and providing, by the nearby proxy, the contents to the application.
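The proxy's hit-or-fetch branch can be sketched in a few lines. The class name `NearbyProxy`, the use of a dictionary as the backing store, and the decision to cache contents after fetching them are assumptions for illustration; the abstract only states that the proxy checks its local cache and otherwise acquires the contents via the storage path.

```python
from typing import Dict, Tuple


class NearbyProxy:
    """Toy proxy that serves contents from a local cache or from a backing store."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self.store = store                 # stands in for wherever the storage path points
        self.cache: Dict[str, bytes] = {}

    def get(self, storage_path: str) -> Tuple[bytes, bool]:
        """Return (contents, served_from_cache) for the requested storage path."""
        if storage_path in self.cache:
            return self.cache[storage_path], True
        contents = self.store[storage_path]   # acquire based on the storage path
        self.cache[storage_path] = contents   # assumption: keep a local copy for later requests
        return contents, False


if __name__ == "__main__":
    proxy = NearbyProxy({"/data/a.bin": b"abc"})
    print(proxy.get("/data/a.bin"))
    print(proxy.get("/data/a.bin"))
```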

METHODS AND SYSTEMS FOR PROGRAM OPTIMIZATION UTILIZING INTELLIGENT SPACE EXPLORATION

Embodiments for program optimization are provided. A program is compiled with respect to a performance result utilizing a set of parameters. Information associated with the compiling of the program is collected. The collected information is external to the performance result. The set of parameters is changed based on the collected information.