G06F8/445

Systems and methods of auditing server parameters in a telephony network
11593090 · 2023-02-28 · ·

The current disclosure relates to a system and method for auditing application parameters in a communication network. In particular, the method includes retrieving a set of golden parameter master lists from an auditing database, where each golden parameter master list corresponds to an application of the telecommunications network, and where each golden parameter master list comprises a set of parameter values for the respective application. The method also includes selecting a list of one or more applications for auditing, and for each application in the list: selecting a corresponding golden parameter master list from the set of golden parameter master lists and comparing each of the parameter values of the golden parameter master list to a corresponding set of parameter values of the application to generate a set of discrepancies.

AUTOMATIC TUNING OF A HETEROGENEOUS COMPUTING SYSTEM
20230236947 · 2023-07-27 ·

The present invention provides a method of configuring program parameters during run-time of a computing program for computation in a heterogeneous computing system. A compile program is processed in an autotuning system to optimize the parameters of an application for processing in a heterogeneous system comprising, for example CPU and GPU cores.

Applications for hardware accelerators in computing systems
11561779 · 2023-01-24 · ·

An example method of implementing an application for a hardware accelerator having a programmable device coupled to memory is disclosed. The method includes compiling source code of the application to generate logical circuit descriptions of kernel circuits; determining resource availability in a dynamic region of programmable logic of the programmable device, the dynamic region exclusive of a static region of the programmable logic programmed with a host interface configured to interface a computing system having the hardware accelerator; determining resource utilization by the kernel circuits in the dynamic region; determining fitting solutions of the kernel circuits within the dynamic region, each of the fitting solutions defining connectivity of the kernel circuits to banks of the memory; adding a memory subsystem to the application based on a selected fitting solution of the fitting solutions; and generating a kernel image configured to program the dynamic region to implement the kernel circuits and the memory subsystem.

Methods and systems for distributing instructions amongst multiple processing units in a multistage processing pipeline
11693664 · 2023-07-04 · ·

Methods and systems for distributing instructions amongst processing units in a processing pipeline are disclosed. A method includes compiling a set of instructions for a stage of a multistage programmable processing pipeline in which the stage of the multistage programmable processing pipeline includes multiple processing units configured to processes instructions in parallel, wherein compiling the set of instructions includes, identifying first and second subsets of instructions within the set of instructions that can be executed independent of each other, assigning the first subset of instructions to a first processing unit of the stage, assigning the second subset of instructions to a second processing unit of the stage, and executing the first and second subsets of instructions in parallel at the first and second processing units, respectively.

METHODS AND SYSTEMS FOR DISTRIBUTING INSTRUCTIONS AMONGST MULTIPLE PROCESSING UNITS IN A MULTISTAGE PROCESSING PIPELINE
20230004395 · 2023-01-05 ·

Methods and systems for distributing instructions amongst processing units in a processing pipeline are disclosed. A method includes compiling a set of instructions for a stage of a multistage programmable processing pipeline in which the stage of the multistage programmable processing pipeline includes multiple processing units configured to processes instructions in parallel, wherein compiling the set of instructions includes, identifying first and second subsets of instructions within the set of instructions that can be executed independent of each other, assigning the first subset of instructions to a first processing unit of the stage, assigning the second subset of instructions to a second processing unit of the stage, and executing the first and second subsets of instructions in parallel at the first and second processing units, respectively.

SYNCHRONIZATION BARRIER

Apparatuses, systems, and techniques to implement a barrier operation. In at least one embodiment, a memory barrier operation causes accesses to memory by a plurality of groups of threads to occur in an order indicated by the memory barrier operation.

NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM AND COMPILATION METHOD
20220405110 · 2022-12-22 · ·

The present disclosure relates to a non-transitory computer-readable recording medium storing a complier that causes a computer to execute a process. The process includes generating a program. The program includes a first code that compares a first execution time from a start to an end of a loop processing when the loop processing is executed with a fixed-length SIMD instruction, with a second execution time from the start to the end of the loop processing when the loop processing is executed with a variable-length SIMD instruction, and a second code that executes the loop processing with the variable length SIMD instruction when a result of the comparison reveals that the first execution time is longer than the second execution time.

PARALLEL PROCESSING ARCHITECTURE FOR ATOMIC OPERATIONS
20220374286 · 2022-11-24 · ·

Techniques for task processing in a parallel processing architecture for atomic operations are disclosed. A two-dimensional array of compute elements is accessed, where each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the array of compute elements is provided on a cycle-by-cycle basis. The control is enabled by a stream of wide control words generated by the compiler. At least one of the control words involves an operation requiring at least one additional operation. A bit of the control word is set, where the bit indicates a multicycle operation. The control word is executed, on at least one compute element within the array of compute elements, based on the bit. The multicycle operation comprises a read-modify-write operation.

DETERMINING MEMORY REQUIREMENTS FOR LARGE-SCALE ML APPLICATIONS TO FACILITATE EXECUTION IN GPU-EMBEDDED CLOUD CONTAINERS

We disclose a system that executes an inferential model in VRAM that is embedded in a set of graphics-processing units (GPUs). The system obtains execution parameters for the inferential model specifying: a number of signals, a number of training vectors, a number of observations and a desired data precision. It also obtains one or more formulae for computing memory usage for the inferential model based on the execution parameters. Next, the system uses the one or more formulae and the execution parameters to compute an estimated memory footprint for the inferential model. The system uses the estimated memory footprint to determine a required number of GPUs to execute the inferential model, and generates code for executing the inferential model in parallel while efficiently using available memory in the required number of GPUs. Finally, the system uses the generated code to execute the inferential model in the set of GPUs.

Circuitry with adaptive memory assistance capabilities
11500674 · 2022-11-15 · ·

A system for running one or more applications is provided. Each application may require memory services that can be accelerated using configurable memory assistance circuits associated with different levels of a memory hierarchy. Integrated circuit design tools may be used to generate configuration data for programming the configurable memory assistance circuits. During compile time, the design tools may identify memory service patterns in a source code, match the identified memory service patterns to corresponding templates, parameterize the matching templates, and then synthesize the parameterized templates to produce the configuration data. During run time, a memory assistance scheduler may map the memory services required by each application to available memory assistance circuits in the system. The mapped memory assistance circuits are programmed by the configuration data to provide the desired memory service capability.