Analysis system and method for reducing the control flow divergence in the Graphics Processing Units (GPUs)

20170330303 ยท 2017-11-16

    Inventors

    Cpc classification

    International classification

    Abstract

    The invention discloses an analysis system and method for reducing control flow divergence in the Graphics Processing Units (GPUs). A computing unit is used to count the number of branch, number of cycle, and to calculate at least one direction ratio. A profiler is used to determine whether the code having the optimized control flow structure and the specialized branch or not. The optimization decision unit can determine which transform pattern can be used to transform the sub-control flow structure.

    Claims

    1. An analysis system for reducing a flow control divergence in a GPU, said GPU comprises a computing unit and a hardware performance profiling support unit. the computing unit is used to count a number of at least one branch, a number of at least one cycle, and to calculate at least one direction ratio of a counting program code, and to send an information to a hardware performance profiling support unit, said analysis system comprises: a compiling unit, said compiling unit loads said counting program code; a profiler unit, said profiler unit is connected to said compiling unit, said profiler unit receiving said number of branch, said number of cycle, and said direction ratio transmitted by said hardware performance profiling support unit, said profiler unit determining whether said counting program code having an optimized sub-flow control structure and an optimized specific branch instruction in accordance with said number of branch, said number of cycle, and said direction ratio; and an optimization unit, said optimization unit being connected to said profiler unit; wherein, when said profiler unit determines whether said counting program code having said optimized sub-flow control structure and said specific branch instruction or not, said profiler unit calculating said branch ratio of said each branch and a number of a specific branch instruction, an optimization unit determining to use a transform pattern to transform said sub-flow control structure in accordance with said branch ratio and said number of said specific branch instruction.

    2. The analysis system according to claim 1, wherein said profiler unit further comprises a control flow profiler unit, said control flow profiler unit is connected to a compiling unit, said control flow profiler unit determines whether a counting program code having said sub-flow control structure, and said specific branch instruction or not.

    3. The analysis system according to claim 1, wherein said profiler unit further comprises an execution profiling unit, said execution profiling unit is connected to a compiling unit, said execution profiling unit determines an execution efficiency of counting program code in accordance with said number of branch, said number of cycle, and said direction ratio.

    4. The analysis system according to claim 1, wherein said profiler unit further comprises a branch data profiler unit, said branch data profiler unit calculates said branch ratio, and said number of said specific branch instruction for each branch.

    5. The analysis system according to claim 1, wherein said direction ratio is a ratio of a number of execution branch and a number of un-execution branch.

    6. The analysis system according to claim 1, wherein said optimization unit further comprises an optimization decision unit, said optimization decision unit determines which transform pattern is used for transforming said sub-control flow structure in accordance with said branch ratio and said number of said specific branch instruction.

    7. The analysis system according to claim 6, wherein said optimization unit further comprises a transform pattern unit, said transform pattern unit transforms said sub-flow control structure in accordance with said transform pattern determined by said optimization decision unit.

    8. An analysis method for reducing a flow control divergence in a GPU, comprising: loading and executing a counting program code on a GPU, said GPU counting a number of at least one branch, a number of at least one cycle, and calculating at least one direction ratio of said counting program code; using a compiling unit to load said counting program code and producing a control flow diagram; using a profiler unit to determine whether said counting program code having a sub-flow control structure or not, wherein said sub-flow control structure is optimized; determining said optimized sub-flow control structure, said profiler unit determining whether said counting program code having a specific branch instruction or not; when said profiler unit determining whether said counting program code having a specific branch instruction or not, said compiling unit compiling said counting program code to form a specific flow control structure; when the profiler unit determining whether said counting program code having a specific branch instruction or not, said profiler unit calculating said branch ratio, and said number of said specific branch instruction for each branch; said optimization unit determining a conversion mode according to said branch ratio and said number of said specific branch instruction, and said optimization unit determining said conversion mode to convert said sub-flow control structure.

    9. The analysis method according to claim 8, wherein said profiler further comprises a control flow profiler unit, said control flow profiler unit is connected to a compiling unit, said control flow profiler unit determines whether said counting program code having said sub-flow control structure, and said specific branch instruction or not.

    10. The analysis method according to claim 8, wherein profiler unit further comprises an execution profiling unit, said execution profiling unit is connected to a compiling unit, said execution profiling unit determines an execution efficiency of said counting program code in accordance with said number of branch, said number of cycle, and said direction ratio.

    11. The analysis method according to claim 8, wherein profiler unit further comprises a branch data profiler unit, said branch data profiler unit calculates said branch ratio and said number of said specific branch instruction for each branch.

    12. The analysis method according to claim 8, wherein said optimization unit further comprises an optimization decision unit, said optimization unit determines which transform pattern to transform said sub-control flow structure in accordance with said branch ratio and said number of said specific branch instruction.

    13. The analysis method according to claim 8, wherein said optimization unit further comprises a transform pattern unit, said transform pattern unit determines said conversion mode to convert said sub-flow control structure in accordance with said conversion mode determined by said optimization decision unit.

    14. The analysis method according to claim 8, wherein when said profiler unit determines no optimized sub-flow control structure, then a process is terminated.

    15. The analysis method according to claim 8, wherein said profiler determines said counting program code does not have said specific branch instruction, said compiling unit uses a common prediction method to compile said counting program code.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0025] The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:

    [0026] FIG. 1A illustrates an example for the divergent flow control structure and the execution thread scheduling diagram.

    [0027] FIG. 1B illustrates an example for adopting the conventional method for reducing the flow control divergence and the thread scheduling diagram.

    [0028] FIG. 2 illustrates the structure of analysis system provided by present invention for reducing the control flow divergence diagram.

    [0029] FIG. 3A illustrates the direction ratio of the program code provided by present invention.

    [0030] FIG. 3B illustrates the table of the direction ratio calculated by the program code specified in FIG. 3A.

    [0031] FIG. 4 illustrates the flow diagram of the analysis method provided by present invention for reducing the control flow divergence.

    DESCRIPTION OF THE PREFERRED EMBODIMENT

    [0032] The detailed features and advantages of prevent invention will be described in the following execution method in detail. The technical content of present invention will be understood and executed by the person skilled in the art. According to the content disclosed in this paper, the purpose and advantages of present invention can be understood easily by the person skilled in the art.

    [0033] Please refer to FIG. 2 and FIG. 3A. FIG. 2 illustrates the structure of analysis system 200 provided by the present invention for reducing the control flow divergence. FIG. 3A illustrates the direction ratio of the program code provided by present invention.

    [0034] As shown in FIG. 2, the analysis system 200 is suitable for the GPU 210. The GPU 210 comprises a computing unit 212 and a hardware performance profiling support unit 214. The computing unit 212 is used to count the number of at least one branch, the number of at least one cycle, and to calculate at least one direction ratio R.sub.d of counting program code. The computing unit 212 sends the produced number of branch, the number of cycle, and the direction ratio R.sub.d to the hardware performance profiling support unit 214.

    [0035] Still as shown in FIG. 2, the analysis system 200 for reducing the control flow divergence comprises a compiling unit 230, a profiler unit 232, and an optimization unit 234. The compiling unit is used to load the counting program code sent from GPU 210. The profiler unit 232 is connected to the compiling unit 230. The profiler unit 232 comprises a control flow profiler unit 2322, an execution profiling unit 2324 and a branch data profiler unit 2326. The control flow profiler unit 2322 is connected to the compiling unit 230. The control flow profiler unit 2322 is used to determine whether the counting program code having the sub-flow control structure and the specific branch instruction or not.

    [0036] As shown in FIG. 2, the execution performance profiling unit 2324 is connected to the compiling unit 230. The execution performance profiling unit 2324 receives the number of branch, the number of cycle, and the direction ratio sent from the hardware performance profiling support unit 214, and determines the execution performance efficiency of counting program code in accordance with the number of branch, the number of cycle and the direction ratio in the counting program code. The direction ratio R.sub.d is the ratio of the number of the execution branch and the number of the un-execution branch.

    [0037] Please refer to FIG. 3A and FIG. 3B. FIG. 3B illustrates the table of the direction ratio calculated by the counting program code specified in FIG. 3A. When the system executes the counting program code, the computing unit 212 starts to conduct the computation, and the execution number of each execution thread is shown in FIG. 3B. The execution number of Block A is 2, the execution number of the Block B is 10, and the direction ratio R.sub.d is 0.2 estimated by the computing unit 212.

    [0038] As shown in FIG. 2, the branch data profiler unit 2326 is used to determine whether the counting program code having the sub-flow control structure and the specific branch instruction or not, and to calculate the branch ratio R.sub.b and the number of the specific branch instruction for each branch.

    [0039] As shown in FIG. 2, the optimization unit 234 comprises an optimization decision unit 2342 and a transform pattern unit 2344. The optimization decision unit 2342 can determine which transform pattern can be used to transform the sub-control flow structure in accordance with the branch ratio and the number of the specific branch instruction. The branch ratio R.sub.b is the ratio of branch in the counting program code. The transform pattern unit 2344 transforms the sub-flow control structure in accordance with the transform pattern determined by the optimization decision unit 2342.

    [0040] Please refer to FIG. 4. FIG. 4 illustrates the flow diagram of analysis method provided by present invention for reducing the control flow divergence. The analysis method provided by the present invention is suitable to be used in the GPU 210 specified in FIG. 2. As shown in FIG. 4, the steps comprise:

    [0041] Step S401 of FIG. 4: Loading and executing a counting program code on the GPU 210 (as example shown in FIG. 3A), the GPU 2102 counts the number of branch, the number of cycle, and the direction ratio R.sub.d of counting program code.

    [0042] Step S402 of FIG. 4: Using a compiling unit 230 to load the counting program code and produce the control flow diagram.

    [0043] Step S403 of FIG. 4: Using a profiler unit 232 to determine whether the counting program code having a sub-flow control structure or not, wherein, the sub-flow control structure can be optimized.

    [0044] Step S404 of FIG. 4: Upon determining the optimized sub-flow control structure, the profiler unit 232 is used to determine whether the counting program code having a specific branch instruction or not.

    [0045] Step S405 of FIG. 4: When the profiler unit 232 is used to determine whether the counting program code having a specific branch instruction or not, the compiling unit 230 is used to compile the counting program code to form a specific flow control structure.

    [0046] Step S406 of FIG. 4: When the profiler 232 is used to determine whether the counting program code having a specific branch instruction or not, the profiler unit 232 is used to calculate the branch ratio, and the number of the specific branch instruction for each branch.

    [0047] Step S408 of FIG. 4: According to the branch ratio and the number of the specific branch instruction, the optimization unit 234 determines the use of a conversion mode to convert the sub-flow control structure.

    [0048] In Step S403 of FIG. 4, when the profiler unit 232 determines there is no optimized sub-flow control structure, then, the process will be terminated, and go to Step S4032.

    [0049] In Step S404 of FIG. 4, when the profiler unit 232 determines the counting program code does not have the specific branch instruction, the compiling unit 230 uses a common prediction method to compile the counting program code, and go to Step S4052. Then, in Step S4062: Calculating the execution rate from the analysis sample, and predicting the branch ratio R.sub.b of each path.

    [0050] Summarized from the abovementioned embodiments, through the analysis system and method provided by present invention, it is able to determine the flow structure and branch which will slow down the GPU speed, and conduct the suitable transformation for the structure and the branch immediately in accordance with the hardware counting information and the software determining and transforming the process, in order to increase the processing speed.

    [0051] It is understood that various other modifications will be apparent to and can be readily made by those skilled in the art without departing from the scope and spirit of this invention. Accordingly, it is not intended that the scope of the claims appended hereto be limited to the description as set forth herein, but rather that the claims be construed as encompassing all the features of patentable novelty that reside in the present invention, including all features that would be treated as equivalents thereof by those skilled in the art to which this invention pertains.