Patent classifications
G06F8/445
Program Generation Apparatus and Parallel Arithmetic Device
A program for causing a parallel arithmetic device that includes a plurality of arithmetic groups to execute parallel arithmetic is input. The program includes information defining each of the following: application arithmetic, which constitutes predetermined processing; redundant arithmetic, which duplicates the application arithmetic and is assigned to a surplus core or cores in a diagnosis-target arithmetic group; and diagnostic arithmetic, which compares the results of the same redundant arithmetic produced by two or more diagnosis-target arithmetic groups and is assigned to surplus cores in an arithmetic group for diagnosis. A surplus core is a core to which no application arithmetic is assigned.
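The division of labor the abstract describes — the same redundant arithmetic run on surplus cores of two or more diagnosis-target groups, with a comparison (the diagnostic arithmetic) on a surplus core of another group — can be sketched in Python. The names `diagnose`, `healthy`, and `faulty` are illustrative, not from the patent:

```python
def diagnose(groups, redundant_op, args):
    """Run the same redundant arithmetic on a surplus core of each
    diagnosis-target group, then compare the results (the diagnostic
    arithmetic). A mismatch indicates a fault in one of the groups."""
    results = [group(redundant_op, args) for group in groups]
    return all(r == results[0] for r in results), results

# Each "group" stands in for a surplus core executing the redundant op.
healthy = lambda op, args: op(*args)
faulty = lambda op, args: op(*args) + 1   # injected fault for the demo

ok, _ = diagnose([healthy, healthy], lambda a, b: a + b, (2, 3))
bad, _ = diagnose([healthy, faulty], lambda a, b: a + b, (2, 3))
```

Because the diagnostic comparison itself runs on surplus cores, the diagnosis adds no load to the cores carrying the application arithmetic.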
Information processing method and computer-readable recording medium having stored therein optimization program
An information processing method executed by a computer includes: executing a target program to acquire the number of executions of each of a plurality of program codes; selecting, based on the acquired numbers of executions, a combination of program codes related to a plurality of assignment statements from among the program codes related to assignment statements with higher numbers of executions; when the target program is changed, executing the changed target program to calculate an execution accuracy and an operation time such that parallel processing using a SIMD operation function is executed for each of the program codes related to the plurality of assignment statements included in the selected combination; and searching for a combination whose calculated execution accuracy and operation time satisfy a predetermined condition.
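The search loop the abstract outlines — rank assignment statements by execution count, SIMD-ize candidate combinations, re-measure, and stop when accuracy and time meet a condition — can be sketched as follows. The thresholds and the toy `evaluate` cost model are assumptions for illustration:

```python
from itertools import combinations

def search_simd_combination(exec_counts, evaluate, top_n=3,
                            max_time=1.0, min_accuracy=0.99):
    """Pick the most-executed assignment statements, then search
    combinations whose (accuracy, time) after applying SIMD parallel
    processing satisfies the predetermined condition."""
    hottest = sorted(exec_counts, key=exec_counts.get, reverse=True)[:top_n]
    for size in range(len(hottest), 0, -1):
        for combo in combinations(hottest, size):
            accuracy, time = evaluate(combo)   # re-run the changed program
            if accuracy >= min_accuracy and time <= max_time:
                return combo
    return None

# Toy measurement: vectorizing more statements is faster; statement "d"
# would hurt numerical accuracy if vectorized.
counts = {"a": 900, "b": 800, "c": 20, "d": 5}
def evaluate(combo):
    time = 1.5 - 0.3 * len(combo)
    accuracy = 1.0 - (0.02 if "d" in combo else 0.0)
    return accuracy, time

best = search_simd_combination(counts, evaluate)
```

Searching larger combinations first mirrors the goal of vectorizing as many hot assignment statements as the accuracy/time condition allows.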
GPU WAVE-TO-WAVE OPTIMIZATION
This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for GPU wave-to-wave optimization. A graphics processor may execute a shader program for a first wave associated with a draw call or a compute kernel. The graphics processor may identify at least one first indication for the first wave associated with the draw call or the compute kernel. The graphics processor may store the at least one first indication for the first wave to a memory location. The graphics processor may execute the shader program for at least one second wave associated with the draw call or the compute kernel. Execution of the shader program for the at least one second wave may be based on that wave reading the memory location to retrieve the at least one first indication.
Language agnostic pipeline packager for machine learning
Computer-implemented methods and corresponding systems for packaging source code associated with a pipeline into an executable are provided. The methods include parsing a text string that is a textual representation of a pipeline, automatically adding one or more operators to the pipeline, generating source code for the pipeline, and packaging the source code into an executable for an external system. The pipeline includes a plurality of operators authored by a user in multiple programming languages to specify a plurality of operations. The automatically added operators include a first operator for persisting output data or metadata associated with a state of a trained model and/or a second operator for generating a monitoring metric for the trained model. The executable may be an executable file, an application, an artifact, or a program that is language agnostic and can be executed in an external system using any programming language.
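The augmentation step can be sketched in a few lines. The pipe-separated text format and the operator names `persist_state` and `emit_metrics` are assumptions for illustration, not the patent's serialization:

```python
def augment_pipeline(pipeline_text):
    """Parse a textual pipeline ("op1 | op2 | ...") and automatically
    add a state-persisting operator and a monitoring operator, yielding
    the operator list a code generator would package into an executable."""
    operators = [op.strip() for op in pipeline_text.split("|")]
    if "persist_state" not in operators:
        operators.append("persist_state")   # persist trained-model state/metadata
    if "emit_metrics" not in operators:
        operators.append("emit_metrics")    # monitoring metric for the model
    return operators

pipeline = augment_pipeline("load | train")
```

Adding the two operators automatically means a user who authored only the core operators still gets model persistence and monitoring in the packaged executable.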
Rescheduling JIT compilation based on jobs of parallel distributed computing framework
A computer-implemented method is provided for rescheduling compilation among four compilation levels, level 1 through level 4, on a parallel distributed computing framework running processes for a plurality of jobs of a virtual machine. The method bypasses a program-analysis overhead, which includes measuring compiled-method execution time, by identifying the completed compilation levels of a Just-In-Time compilation. The method finds a repetition of the same process among the processes for the plurality of jobs of the virtual machine from profiles, by comparing their main class names, virtual machine parameters, and Jar file types. The method applies a compilation schedule to the same process the next time it runs, based on a result of checking the compilation-level transition, by (i) compiling at level 1 at least some methods of the process, responsive to the virtual machine finishing without compiling those methods at level 4 after compiling at least some of them at a level between level 1 and level 4, and (ii) compiling at level 4 at least a subset of the methods earlier than originally scheduled, responsive to that subset of level-4-compiled methods being invoked infrequently, below a threshold amount.
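The two rescheduling rules can be sketched against a per-method profile from a previous run. The profile fields and the `same_process` key are illustrative assumptions; the abstract specifies only what is compared (main class name, VM parameters, Jar file type) and the two level transitions:

```python
def same_process(a, b):
    """Repetition check: same main class name, VM parameters, Jar type."""
    keys = ("main_class", "vm_params", "jar_type")
    return all(a[k] == b[k] for k in keys)

def reschedule(profile):
    """Choose a compilation level per method for the next run of the
    same process, based on how far each method got last time."""
    schedule = {}
    for method, info in profile.items():
        if 1 < info["final_level"] < 4:
            # (i) VM finished without ever promoting this method to level 4,
            # so the intermediate compilation was wasted: stay at level 1.
            schedule[method] = {"level": 1}
        elif info["final_level"] == 4 and info["invocations"] < info["threshold"]:
            # (ii) reached level 4 but was invoked below the threshold:
            # compile at level 4 earlier than originally scheduled.
            schedule[method] = {"level": 4, "early": True}
        else:
            schedule[method] = {"level": info["final_level"]}
    return schedule

profile = {
    "m1": {"final_level": 3, "invocations": 100, "threshold": 5},
    "m2": {"final_level": 4, "invocations": 2, "threshold": 5},
}
plan = reschedule(profile)
```

Because the schedule is derived from the prior run's completed levels, the next run skips the execution-time measurement that tiered JITs normally use to decide promotions.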
Locate neural network performance hot spots
Embodiments for locating performance hot spots include collecting sample data having instruction addresses, the sample data being for a neural network model, and determining which instructions at the instruction addresses are performance hot spots. A listing file is used to map the instructions of the sample data that are performance hot spots to locations in a lower-level intermediate representation. A mapping file, generated from compiling the neural network model, is used to map the locations of the lower-level intermediate representation that are performance hot spots to operations in one or more higher-level representations, one or more of those operations corresponding to the performance hot spots.
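The two-step mapping — sampled instruction addresses to lower-level IR locations via the listing file, then IR locations to higher-level operations via the mapping file — can be sketched with plain dictionaries standing in for the two files (the addresses and names below are made up):

```python
from collections import Counter

def locate_hot_ops(samples, listing, mapping, top=1):
    """Count samples per instruction address, map the hottest addresses
    to lower-level IR locations (listing file), then map those locations
    to higher-level operations (mapping file)."""
    counts = Counter(samples)
    hot_addrs = [addr for addr, _ in counts.most_common(top)]
    ir_locs = [listing[addr] for addr in hot_addrs]
    return [mapping[loc] for loc in ir_locs]

samples = [0x100, 0x100, 0x100, 0x200]            # sampled instruction addresses
listing = {0x100: "ir:matmul_loop", 0x200: "ir:bias_add"}
mapping = {"ir:matmul_loop": "MatMul", "ir:bias_add": "Add"}
hot = locate_hot_ops(samples, listing, mapping)
```

The indirection through the IR is what lets a low-level profile be reported in terms of the model's own operations rather than raw addresses.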
SYSTEMS AND METHODS TO MANAGE SUB-CHART DEPENDENCIES WITH DIRECTED ACYCLIC GRAPHS
According to some embodiments, methods and systems may include a package manager chart file repository storing charts associated with a container orchestration system. A package manager platform, coupled to the package manager chart file repository, may access a first parent chart from the package manager chart file repository and determine that the first parent chart includes a dependency manifest. The package manager platform may then construct a Directed Acyclic Graph (“DAG”) based on the dependency manifest. Container orchestration system objects, including those associated with sub-charts of the first parent chart, may then be deployed in accordance with a topological ordering of the DAG.
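Deploying in a topological ordering of the DAG can be sketched with Kahn's algorithm over a dependency manifest; the manifest shape (chart name mapped to the sub-charts it depends on) is an assumption for illustration:

```python
from collections import deque

def deploy_order(manifest):
    """Topologically order charts so every sub-chart deploys before any
    chart that depends on it (Kahn's algorithm over the DAG)."""
    charts = set(manifest) | {d for deps in manifest.values() for d in deps}
    indegree = {c: len(manifest.get(c, [])) for c in charts}
    queue = deque(sorted(c for c, deg in indegree.items() if deg == 0))
    order = []
    while queue:
        chart = queue.popleft()
        order.append(chart)
        for dependent, deps in manifest.items():
            if chart in deps:
                indegree[dependent] -= 1
                if indegree[dependent] == 0:
                    queue.append(dependent)
    if len(order) != len(charts):
        raise ValueError("cycle detected: dependency manifest is not a DAG")
    return order

# A parent chart depending on two sub-charts, one of which depends on the other.
manifest = {"parent": ["db", "cache"], "db": [], "cache": ["db"]}
order = deploy_order(manifest)
```

The cycle check matters here: a manifest with circular sub-chart dependencies has no valid deployment order, and failing fast is preferable to deploying objects whose dependencies never arrive.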
Concept for Evaluating Hardware Tracing Records
Various examples relate to methods, computer programs, non-transitory computer-readable media, apparatuses, devices, computer systems, and a system for evaluating one or more hardware tracing records related to a hardware tracing operation, or for processing a piece of software. A method for evaluating one or more hardware tracing records related to a hardware tracing operation comprises: obtaining a hardware tracing record comprising custom information and a memory address within a deterministic distance of the instruction having triggered the hardware tracing record; identifying, based on that memory address, a binary module containing the instruction; determining whether a pre-defined identifier is stored at a pre-defined memory address range, relative to that memory address, in the binary module; and processing information on the hardware tracing record if the pre-defined identifier is stored at that pre-defined memory address range.
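The filtering check — recover the triggering instruction from the traced address, then test for a pre-defined identifier at a fixed offset from it — can be sketched over a module image in memory. The distance, identifier bytes, and offset below are assumed values, not the patent's:

```python
DETERMINISTIC_DISTANCE = 4   # traced address lies 4 bytes past the instruction
MAGIC = b"\xCA\xFE"          # pre-defined identifier (assumed value)
MAGIC_OFFSET = -8            # pre-defined range relative to the instruction

def should_process(module_bytes, module_base, trace_addr):
    """Locate the triggering instruction from the traced address, then
    check whether the pre-defined identifier sits at the pre-defined
    offset in the binary module; only then process the record."""
    instr_addr = trace_addr - DETERMINISTIC_DISTANCE
    off = instr_addr - module_base + MAGIC_OFFSET
    return module_bytes[off:off + len(MAGIC)] == MAGIC

# Module image with the identifier planted 8 bytes before the instruction
# at module offset 16 (i.e., at offset 8).
module = b"\x00" * 8 + MAGIC + b"\x00" * 20
hit = should_process(module, 0x1000, 0x1014)    # instruction at 0x1010
miss = should_process(module, 0x1000, 0x1024)   # instruction at 0x1020
```

Because the distance between the traced address and the instruction is deterministic, the identifier check needs no disassembly or symbol lookup, only fixed-offset byte comparison.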
Data processing systems
There is provided a data processing system comprising a host processor and a processing resource operable to perform processing operations for applications executing on the host processor by executing commands within an appropriate command stream. The host processor is configured to generate a command stream layout indicating a sequence of commands for the command stream that is then provided to the processing resource. Some commands require sensor data. The processing resource is configured to process the sensor data into command stream data for inclusion into the command stream in order to populate the command stream for execution.
PARALLEL PROCESSING ARCHITECTURE WITH SPLIT CONTROL WORD CACHES
Techniques for a parallel processing architecture with split control word caches are disclosed. A two-dimensional array of compute elements is accessed. Each compute element is known to a compiler and is coupled to its neighboring compute elements. A first control word cache is coupled to the array. The first control word cache enables loading control words to a first array portion. A second control word cache is coupled to the array. The second control word cache enables loading control words to a second array portion. The control words are split between the first and the second control word caches. The splitting is based on the constituency of the first and the second array portions. Instructions are executed within the array. Instructions executed within the first array portion use control words loaded from the first cache. Instructions executed within the second array portion use control words loaded from the second cache.
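The routing of control words between the two caches by array portion can be sketched as a minimal model; the column-based split criterion and the class shape are assumptions, since the abstract says only that the split follows the constituency of the two portions:

```python
class SplitControlWordCaches:
    """Two control word caches feed two portions of a 2-D compute array.
    A control word is loaded into the cache for the portion that contains
    its target compute element."""

    def __init__(self, split_col):
        self.split_col = split_col      # columns < split_col form portion 0
        self.caches = ([], [])          # first / second control word cache

    def load(self, control_word, row, col):
        idx = 0 if col < self.split_col else 1
        self.caches[idx].append((row, col, control_word))
        return idx                      # which cache served the load

# A 4-column array split down the middle.
cwc = SplitControlWordCaches(split_col=2)
placements = [cwc.load(f"cw{c}", row=0, col=c) for c in range(4)]
```

Splitting the control word stream this way lets each half of the array fetch its control words from a nearby cache instead of contending for one shared structure.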