Power Optimization In An Artificial Intelligence Processor
20210081019 · 2021-03-18
Inventors
CPC classification
G06F1/28
PHYSICS
Y02D10/00
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
International classification
G06F1/28
PHYSICS
Abstract
In one embodiment, the present disclosure includes a method of reducing power in an artificial intelligence processor. For each cycle, over a plurality of cycles, an AI model is translated into operations executable on an artificial intelligence processor. The translating is based on power parameters that correspond to power consumption and performance of the artificial intelligence processor. The AI processor is configured with the executable operations, and input activation data sets are processed. Accordingly, result sets, power consumption data, and performance data are generated and stored over the plurality of cycles. The method further includes training an AI algorithm using the stored parameters, the power consumption data, and the performance data. A trained AI algorithm outputs a plurality of optimized parameters to reduce power consumption of the AI processor. The AI model is then translated into optimized executable operations based on the plurality of optimized parameters.
Claims
1. A non-transitory computer-readable storage medium having stored thereon computer executable instructions, which when executed by a compiler operating on at least one computer processor, cause the at least one computer processor to be operable for: receiving a set of parameters produced using at least power consumption data and performance data generated by executing a first plurality of operations on an artificial intelligence (AI) processor over a first time period; and translating a model for the AI processor into a second plurality of operations by using at least a subset of the parameters from the set controlling utilization of resources in the AI processor over a second time period during execution of the second operations on the AI processor, wherein a power consumed for executing the second operations on the AI processor over the second time period is less than another power consumed for executing the first operations on the AI processor over the first time period, the power consumed over the second time period maintained below a threshold level.
2. The computer-readable storage medium of claim 1, wherein the at least one computer processor is further operable for training an AI algorithm using at least the power consumption data and the performance data to produce the set of parameters.
3. The computer-readable storage medium of claim 2, wherein the trained AI algorithm comprises a plurality of elements representing values and mathematical operations associated with the second operations.
4. The computer-readable storage medium of claim 1, wherein the compiler comprises a plurality of stages, and the at least one computer processor is further operable for: reconfiguring the model into another model for the AI processor; and translating the other model into the second operations.
5. The computer-readable storage medium of claim 4, wherein the at least one computer processor is further operable for: reconfiguring the model into the other model by selectively mapping a plurality of elements of the model modeling other resources in the AI processor utilized over the first time period into a plurality of elements of the other model modeling the resources utilized over the second time period using at least the subset of the parameters to reduce an amount of power consumed by the AI processor over the second time period.
6. The computer-readable storage medium of claim 4, wherein the at least one computer processor is further operable for: translating the other model into the second operations by selectively scheduling the second operations for execution on the AI processor over the second time period using at least the subset of the parameters.
7. The computer-readable storage medium of claim 4, wherein the at least one computer processor is further operable for: translating the other model into the second operations by scheduling one or more no operation (NOOP) operations using at least the subset of the parameters to reduce an amount of power consumed by the AI processor over the second time period.
8. The computer-readable storage medium of claim 4, wherein the at least one computer processor is further operable for: reconfiguring the model into the other model by converting convolutions into matrix multiplications performed by a first portion of the resources and into vector operations performed by a second portion of the resources, resulting in fewer matrix multiplications, more vector operations, and the power consumed over the second time period remaining below the threshold level.
9. The computer-readable storage medium of claim 4, wherein the at least one computer processor is further operable for: reconfiguring the model into the other model by selectively controlling modifications of the model using at least the subset of the parameters, the modifications reducing an amount of power consumed at the AI processor over the second time period.
10. The computer-readable storage medium of claim 1, wherein the at least one computer processor is further operable for: resolving dependencies in the model to determine a plurality of operations for parallel execution on the AI processor; and including the plurality of operations into the other model.
11. The computer-readable storage medium of claim 1, wherein the at least one computer processor is further operable for: dynamically configuring the resources for performing the second operations over the second time period, based on at least the subset of the parameters.
12. The computer-readable storage medium of claim 1, wherein the at least one computer processor is further operable for: determining data paths to be utilized in the AI processor during the execution of the second operations based on at least the subset of the parameters.
13. The computer-readable storage medium of claim 1, wherein the at least one computer processor is further operable for: determining sub-circuits in a plurality of circuits to be utilized in the AI processor during the execution of the second operations based on at least the subset of the parameters.
14. The computer-readable storage medium of claim 1, wherein the at least one computer processor is further operable for: dynamically turning on and off the resources at predetermined times during the execution of the second operations on the AI processor, based on at least the subset of the parameters.
15. A system comprising: an artificial intelligence (AI) processor comprising a plurality of circuits for inferring a solution to an AI problem defined by a model for the AI processor by executing a plurality of operations to perform convolutions; and a dynamic compiler having means for: receiving a set of parameters produced using at least power consumption data and performance data generated by executing a first plurality of operations on the AI processor over a first time period, and translating a model for the AI processor into a second plurality of operations by using at least a subset of the parameters from the set controlling utilization of resources in the AI processor over a second time period during execution of the second operations on the AI processor, wherein a power consumed for executing the second operations on the AI processor over the second time period is less than another power consumed for executing the first operations on the AI processor over the first time period, the power consumed over the second time period maintained below a threshold level.
16. The system of claim 15, wherein the model comprises a tensor flow model for the AI processor.
17. The system of claim 15, wherein the AI processor comprises a matrix multiplier circuit, a memory circuit, a vector processor circuit, and a switch matrix circuit.
18. A method comprising: receiving, at a compiler operating on a computer, a set of parameters produced using at least power consumption data and performance data generated by executing a first plurality of operations on an artificial intelligence (AI) processor over a first time period; and translating, in the compiler, a model for the AI processor into a second plurality of operations by using at least a subset of the parameters from the set controlling utilization of resources in the AI processor over a second time period during execution of the second operations on the AI processor, wherein a power consumed for executing the second operations on the AI processor over the second time period is less than another power consumed for executing the first operations on the AI processor over the first time period, the power consumed over the second time period maintained below a threshold level.
19. The method of claim 18, wherein an AI algorithm is trained using at least the power consumption data and the performance data to produce the set of parameters, the trained AI algorithm comprises a plurality of elements representing values and mathematical operations associated with the second operations.
20. The method of claim 18, further comprising: reconfiguring the model into the other model by selectively mapping a plurality of elements of the model modeling other resources in the AI processor utilized over the first time period into a plurality of elements of the other model modeling the resources utilized over the second time period using at least the subset of the parameters to reduce an amount of power consumed by the AI processor over the second time period; and translating the other model into the second operations.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0026] In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of the present disclosure. Such examples and details are not to be construed as unduly limiting the elements of the claims or the claimed subject matter as a whole. It will be evident to one skilled in the art, based on the language of the different claims, that the claimed subject matter may include some or all of the features in these examples, alone or in combination, and may further include modifications and equivalents of the features and techniques described herein.
[0028] Advantageously, compiler 103 may be configured with parameters to modify the translation process to optimize power and performance. For example, translating the AI model into the executable operations may be based on a plurality of parameters. The parameters may modify various steps of the translation process that impact power and performance. Translating an AI model into executable operations may entail a wide range of choices that impact power and performance (e.g., what operations can be performed in parallel, the structure of the optimized AI model itself, the number of matrix multiplications versus the number of vector operations, how the model is mapped to the particular operations, and the scheduling of such operations in the AI processor). Power parameters may be used to modify a wide range of decisions made by compiler 103, thereby changing the performance and power achieved by AI processor 110. Power and performance are observable results stemming from different values of parameters and different combinations of such values, for example. Accordingly, embodiments of the disclosure may advantageously gather and store power and performance data, and the parameter values that generated them, and train an AI algorithm 141 to optimize the parameters such that power and performance are optimized.
[0029] For example, an AI model 102 may be received and translated by compiler 103 to produce executable operations using one set of power parameters. The AI processor may then be configured with the executable operations. Once configured, a plurality of input activation data sets may be processed in the AI processor 110 to generate result sets, power consumption data, and performance data.
[0030] The power parameters may be used as features input to an AI algorithm. Accordingly, a corpus of power parameters, power consumption data, and performance data over many execution cycles (e.g., where activations are processed and results produced) may be stored at 142 (e.g., in a data repository on computer system 101) to form a training set, for example. The training set may be used to train AI algorithm 141. The trained AI algorithm, in turn, may output a plurality of optimized parameters to reduce power consumption of the artificial intelligence processor, for example. The algorithm may thus learn parameter values that optimize power consumption and performance based on the corpus 142 of past power and performance results for different parameters, for example. Accordingly, AI model 102 may be translated again into a new set of executable operations based on the optimized parameters from AI algorithm 141. When the new executable operations are used to configure AI processor 110, input activations may be processed more efficiently.
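The closed loop described above (compile with candidate parameters, execute, record power and performance, then select optimized parameters from the stored corpus) can be sketched as follows. This is a toy illustration only: `run_on_processor`, the cost model inside it, and the parameter names are all hypothetical, and a simple budget-constrained search stands in for the trained AI algorithm of the disclosure.

```python
def run_on_processor(executable_ops, params):
    """Stand-in for executing operations on the AI processor.

    A real system would measure power and latency from hardware; here a
    toy model treats fewer data paths as cheaper but slower, purely to
    make the tradeoff concrete."""
    work = len(executable_ops)
    power = work * params["data_paths"] * 1.0   # more paths -> more power
    latency = work / params["data_paths"]       # more paths -> lower latency
    return power, latency

def optimize(executable_ops, candidates, power_budget):
    """Build a corpus of (params, power, latency) records over several
    cycles, then return the fastest configuration whose measured power
    stays under the budget -- a stand-in for training on the corpus."""
    corpus = []
    for params in candidates:
        power, latency = run_on_processor(executable_ops, params)
        corpus.append((params, power, latency))
    feasible = [rec for rec in corpus if rec[1] <= power_budget]
    return min(feasible, key=lambda rec: rec[2])[0]

best = optimize(["op"] * 8, [{"data_paths": d} for d in (1, 2, 4, 8)],
                power_budget=20)
# best selects 2 data paths: 4 and 8 paths exceed the budget, and 2 paths
# execute faster than 1.
```

In a full system the exhaustive candidate sweep would be replaced by a learned model (AI algorithm 141) generalizing from the stored corpus rather than re-measuring every configuration.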
[0034] As another example, translating model 302 into model 303 may include converting convolutions into operations supported by circuit blocks of the AI processor. An example of matrix convolution of a kernel of weights in an AI model and an array of input values is illustrated in the drawings.
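One common way to lower a convolution onto matrix-multiplier hardware is the im2col transformation, sketched below. The disclosure does not specify this particular lowering; it is offered as an illustrative instance of converting a convolution of a weight kernel with an input array into a single matrix multiplication.

```python
import numpy as np

def conv2d_as_matmul(image, kernel):
    """Lower a 2-D convolution (cross-correlation form, as in most AI
    frameworks) to one matrix multiplication via im2col."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    # Each row of `cols` is one flattened receptive field of the input,
    # so the whole convolution becomes cols @ flattened_kernel.
    cols = np.array([image[i:i + kh, j:j + kw].ravel()
                     for i in range(oh) for j in range(ow)])
    return (cols @ kernel.ravel()).reshape(oh, ow)

img = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((2, 2))
out = conv2d_as_matmul(img, k)   # 3x3 result; out[0, 0] == 0+1+4+5 == 10
```

A compiler choosing between this matmul lowering and an equivalent sequence of vector operations is making exactly the kind of power/performance decision the parameters are described as controlling.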
[0036] In various embodiments, an AI processor may comprise a variety of hardware resources, the utilization of which may involve a power/performance tradeoff. For example, in one embodiment, an AI processor may comprise numerous data streams (or data paths). Utilizing more data paths may improve performance, but increase power consumption. Conversely, utilizing fewer data paths may reduce performance, but decrease power consumption. Accordingly, one or more parameters may control the number of data paths used in the AI processor. Additionally, in one embodiment, various processing circuit blocks in the AI processor may comprise multiple subblocks (e.g., computational elements). For example, a memory, vector processor, switch matrix, or matrix multiplier may comprise numerous subblocks (e.g., tiles or cells). If more subblocks are used, performance may improve, but more power is consumed. Conversely, if fewer subblocks are used, performance may be reduced, but less power is consumed. Accordingly, one or more parameters may control how many subblocks in each of the different circuit blocks of an AI processor are used at various times during processing.
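The data-path and subblock tradeoff above can be captured as a small resource configuration with a toy cost model. All field names and the linear cost formulas are illustrative assumptions, not taken from the disclosure; a real estimate would come from measured hardware characteristics.

```python
from dataclasses import dataclass

@dataclass
class ResourceConfig:
    """Hypothetical per-phase resource settings a parameter set might select."""
    data_paths: int     # enabled data streams
    memory_tiles: int   # enabled memory subblocks
    vector_lanes: int   # enabled vector-processor lanes

def estimate(cfg, num_ops):
    """Toy model of the tradeoff: enabling more resources raises
    throughput (and thus lowers cycle count) and power in tandem."""
    throughput = cfg.data_paths * cfg.vector_lanes          # ops per cycle
    power = 0.5 * (cfg.data_paths + cfg.memory_tiles + cfg.vector_lanes)
    cycles = num_ops / throughput
    return power, cycles

high = ResourceConfig(data_paths=8, memory_tiles=16, vector_lanes=4)
low = ResourceConfig(data_paths=2, memory_tiles=4, vector_lanes=4)
# estimate(high, 64) draws more power but finishes in fewer cycles than
# estimate(low, 64).
```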
[0037] Furthermore, selective scheduling may be used to throttle power and performance up or down based on the parameters. For example, a first schedule of the executable operations may correspond to a first power consumption. However, changing the parameters may change various aspects of the scheduling, which may impact power and performance. Accordingly, a second schedule of the executable operations may correspond to a different power consumption. As but one example, stage 701 may selectively schedule one or more no operation (NOOP) operations based on the parameters to control the amount of power consumed during execution. This technique is illustrated in the drawings.
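The NOOP-insertion technique can be sketched in a few lines: padding the schedule with NOOPs spreads the same work over more cycles, lowering average power draw at the cost of latency. The function and parameter names are illustrative; in the disclosure the insertion rate would be driven by the learned parameters.

```python
def schedule_with_noops(ops, noops_per_op):
    """Interleave NOOPs after each executable operation.

    A higher `noops_per_op` (hypothetically set from the power
    parameters) throttles power down; zero restores full throughput."""
    schedule = []
    for op in ops:
        schedule.append(op)
        schedule.extend(["NOOP"] * noops_per_op)
    return schedule

s = schedule_with_noops(["matmul", "add"], noops_per_op=2)
# s == ["matmul", "NOOP", "NOOP", "add", "NOOP", "NOOP"]
```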
[0040] The above description illustrates various embodiments of the present disclosure along with examples of how aspects of the particular embodiments may be implemented. The above examples should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of the particular embodiments as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents may be employed without departing from the scope of the present disclosure as defined by the claims.