Patent classifications
G06F8/4432
ENERGY EFFICIENT MICROPROCESSOR WITH INDEX SELECTED HARDWARE ARCHITECTURE
An SoC maintains the full flexibility of a general-purpose microprocessor while providing energy efficiency similar to an ASIC by implementing software-controlled virtual hardware architectures that enable the SoC to function as a virtual ASIC. The SoC comprises a plurality of “Stella” Reconfigurable Multiprocessors (SRMs) supported by a Network-on-a-Chip that provides efficient data transfer during program execution. A hierarchy of programmable switches interconnects the programmable elements of each of the SRMs at different levels to form their virtual architectures. Arithmetic, data flow, and interconnect operations are also rendered programmable. An “architecture index” points to a storage location where pre-determined hardware architectures are stored and extracted during program execution. The programmed architectures are able to mimic ASIC properties such as variable computation types, bit resolutions, data flows, and the number and proportions of compute and data-flow operations and their sizes. Once established, each architecture remains in place as long as needed.
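The index-selection mechanism can be illustrated with a minimal sketch: an integer index selects one of several pre-determined architecture configurations from a store, and the selected configuration stays in effect until a new index is supplied. The class name, fields, and stored entries below are invented for illustration and are not from the patent.

```python
# Hypothetical sketch of an "architecture index" selecting a stored
# virtual-hardware configuration; names and fields are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualArchitecture:
    compute_type: str    # e.g. "int8-mac" or "fp16-add"
    bit_resolution: int  # operand width in bits
    dataflow: str        # e.g. "systolic" or "streaming"

# Pre-determined architectures stored at indexed locations.
ARCHITECTURE_STORE = {
    0: VirtualArchitecture("int8-mac", 8, "systolic"),
    1: VirtualArchitecture("fp16-add", 16, "streaming"),
}

def configure(index: int) -> VirtualArchitecture:
    """Extract the architecture the index points to; it remains in
    effect until a new index is supplied."""
    return ARCHITECTURE_STORE[index]
```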
Information processing apparatus, recording medium for information processing program, and information processing method
An information processing apparatus includes a computation processing device that includes a memory and a processor coupled to the memory, and a storage device that stores a program, wherein the processor is configured to: store, in the memory, a first storage area for first data that is assigned to a computation target by a data definition for the computation target written in the program, and a second storage area for second data that is assigned to the computation target instead of the first data; simplify the program when the data definition for the computation target is omitted; output the second data by executing the simplified program; and perform the computation by using the output second data.
COMPILER CONFIGURABLE TO GENERATE INSTRUCTIONS EXECUTABLE BY DIFFERENT DEEP LEARNING ACCELERATORS FROM A DESCRIPTION OF AN ARTIFICIAL NEURAL NETWORK
Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, an integrated circuit device may be configured to execute instructions with matrix operands and configured with random access memory (RAM). A compiler can convert a description of an artificial neural network into a generic result of compilation according to a specification of a generic Deep Learning Accelerator and then map the generic result of compilation into a platform-specific result according to a specification of a specific hardware platform of Deep Learning Accelerators. The platform-specific result can be stored into the RAM of the integrated circuit device to enable the integrated circuit device to autonomously perform the computation of the artificial neural network in generating an output in response to an input to the artificial neural network.
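The two-stage flow described above can be sketched as a generic compile step followed by a per-platform mapping step. The instruction tuples, opcode names, and platform table below are assumptions made for illustration, not the actual accelerator specification.

```python
# Hedged sketch: compile an ANN description to a generic instruction
# list, then map that generic result per hardware platform.

def compile_generic(ann_description):
    # Stage 1: target the generic Deep Learning Accelerator spec.
    return [("matmul", layer["in"], layer["out"]) for layer in ann_description]

# Invented platform-specific opcode tables.
PLATFORM_SPECS = {
    "dla_v1": {"matmul": "MM32"},
    "dla_v2": {"matmul": "MM64"},
}

def map_to_platform(generic, platform):
    # Stage 2: rewrite each generic instruction per the platform spec.
    spec = PLATFORM_SPECS[platform]
    return [(spec[op], a, b) for op, a, b in generic]

# A toy two-layer network description.
ann = [{"in": 784, "out": 128}, {"in": 128, "out": 10}]
result = map_to_platform(compile_generic(ann), "dla_v2")
```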
Method and apparatus for balancing binary instruction burstization and chaining
A method for grouping computer instructions includes receiving a set of computer instructions, grouping the set of computer instructions by register dependencies, identifying a plurality of single-definition-use flow (SDF) bundles based on a burstization criterion and a chaining criterion, and, based on the SDF bundles, transforming the set of computer instructions. The transformation may include splitting one of the set of computer instructions and setting a burst parameter for the one of the set of computer instructions. The transformation may include grouping a plurality of the set of computer instructions and replacing a pair of register file accesses with a pair of temporary register accesses.
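The dependency-grouping step can be sketched as follows: a producer/consumer pair is a candidate for chaining (and for replacing the intervening register-file accesses with temporary register accesses) when the producer's destination register is defined exactly once and consumed by exactly one later instruction. The instruction encoding here is invented; the actual burstization and chaining criteria are not specified in the abstract.

```python
# Illustrative sketch of finding single-definition, single-use
# producer/consumer pairs in a linear instruction sequence.

def find_sdf_pairs(instructions):
    """instructions: list of (dest_reg, [src_regs]) tuples, in program order."""
    defs, uses = {}, {}
    for i, (dest, srcs) in enumerate(instructions):
        defs.setdefault(dest, []).append(i)
        for s in srcs:
            uses.setdefault(s, []).append(i)
    pairs = []
    for reg, d in defs.items():
        # One definition, one use: the value can flow through a
        # temporary register instead of the register file.
        if len(d) == 1 and len(uses.get(reg, [])) == 1:
            pairs.append((d[0], uses[reg][0]))  # producer -> sole consumer
    return pairs

# r1 and r2 are each defined once and used once; r3 is never used.
prog = [("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r2"])]
```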
PROGRAM REWRITE DEVICE, STORAGE MEDIUM, AND PROGRAM REWRITE METHOD
A program rewrite method executed by a computer, the method includes rewriting a program to output a first output group by performing operations for a first variable among a plurality of variables with a plurality of data types; rewriting the program to output a second output group by performing operations for a second variable among the plurality of variables with a plurality of data types; identifying, from the first output group and the second output group, a third output group that satisfies a predetermined criterion as a result of executing the rewritten programs; determining a data type that corresponds to the third output group as a use data type; and outputting a program in which the use data type is set for each of the plurality of variables.
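The selection idea can be sketched as: execute the computation once per candidate data type and keep the narrowest type whose output still satisfies a predetermined criterion, here an error bound against the double-precision result. The criterion, the candidate types, and the rounding helper are illustrative assumptions.

```python
# Hedged sketch: pick a variable's data type by comparing rewritten-
# program outputs against an error tolerance.
import struct

def as_f32(x):
    """Round a binary64 Python float to binary32 precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

def run(values, use_f32):
    """Sum the values, optionally rounding every step to float32."""
    total = 0.0
    for v in values:
        total = as_f32(total + as_f32(v)) if use_f32 else total + v
    return total

def choose_dtype(values, tolerance):
    reference = run(values, use_f32=False)
    # Prefer the narrower type when its output satisfies the criterion.
    if abs(run(values, use_f32=True) - reference) <= tolerance:
        return "float32"
    return "float64"
```

Catastrophic cancellation (as in the second test below) is the typical case where the narrower type fails the criterion and the wider type is kept.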
VLIW power management
VLIW-directed power management is described. In accordance with described techniques, a program is compiled to generate instructions for execution by a very long instruction word machine. During the compiling, power configurations for the very long instruction word machine to execute the instructions are determined, and fields of the instructions are populated with the power configurations. In one or more implementations, an instruction that includes a power configuration for the very long instruction word machine and operations for execution by the very long instruction word machine is obtained. A power setting of the very long instruction word machine is adjusted based on the power configuration of the instruction, and the operations of the instruction are executed by the very long instruction word machine.
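The instruction-field mechanism can be sketched as bit packing: the compiler writes a power configuration into a spare field of the instruction word, and the machine reads it back at issue time. The field width and position below are assumptions, not the patented format.

```python
# Illustrative encoding only: a hypothetical 4-bit power field at
# bits 60..63 of a 64-bit very long instruction word.

POWER_FIELD_SHIFT = 60
POWER_FIELD_MASK = 0xF

def encode(operation_bits: int, power_level: int) -> int:
    """Compile time: populate the power field alongside the operations."""
    return ((power_level & POWER_FIELD_MASK) << POWER_FIELD_SHIFT) | operation_bits

def decode_power(word: int) -> int:
    """Issue time: read the field so the machine can adjust its power setting."""
    return (word >> POWER_FIELD_SHIFT) & POWER_FIELD_MASK
```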
Propagating reduced-precision on computation graphs
Methods, systems, and apparatus for propagating reduced-precision on computation graphs are described. In one aspect, a method includes receiving data specifying a directed graph that includes operators for a program. The operators include first operators that each represent a numerical operation performed on numerical values having a first level of precision and second operators that each represent a numerical operation performed on numerical values having a second level of precision. One or more downstream operators are identified for a first operator. A determination is made whether each downstream operator represents a numerical operation that is performed on input values having the second level of precision. Whenever each downstream operator represents a numerical operation that is performed on input values having the second level of precision, a precision of numerical values output by the operation represented by the first operator is adjusted to the second level of precision.
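The downstream check described above can be sketched as a fixed-point pass over the graph: a node's output precision is lowered only when every consumer already operates at the lower precision, and lowering one node may enable lowering its own producers. The graph representation and precision labels are invented for illustration.

```python
# Minimal sketch of precision propagation on a directed operator graph.

def propagate(precisions, edges):
    """precisions: node -> "high" | "low" output precision.
    edges: node -> list of downstream consumer nodes.
    Returns the adjusted output precision per node."""
    out = dict(precisions)
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for node, consumers in edges.items():
            # Lower the output only when every downstream operator
            # consumes values at the lower precision.
            if (out[node] == "high" and consumers
                    and all(out[c] == "low" for c in consumers)):
                out[node] = "low"
                changed = True
    return out
```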
Methods and systems for program optimization utilizing intelligent space exploration
Embodiments for program optimization are provided. A program is compiled with respect to a performance result utilizing a set of parameters. Information associated with the compiling of the program is collected. The collected information is external to the performance result. The set of parameters is changed based on the collected information.
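One way to read this is as a feedback loop in which the parameter change is driven by side information from the compile rather than by the performance result itself. The stand-in compiler model, the spill count, and the adjustment rule below are all hypothetical.

```python
# Hedged sketch of parameter exploration guided by collected compile
# information (here an invented register-spill count) rather than the
# performance result (runtime).

def compile_with(params):
    # Stand-in compiler model: unrolling past 4 starts causing spills.
    spills = max(0, params["unroll"] - 4)
    runtime = 100 - params["unroll"] * 10 + spills * 25
    return {"runtime": runtime, "spills": spills}

def explore(steps=8):
    params = {"unroll": 1}
    for _ in range(steps):
        info = compile_with(params)
        # The decision uses the collected information (spills),
        # which is external to the performance result (runtime).
        if info["spills"] == 0:
            params["unroll"] += 1
        else:
            params["unroll"] -= 1
            break
    return params
```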