Computing system, and driving method and compiling method thereof
10866817 · 2020-12-15
Assignee
Inventors
- Seung-won LEE (Hwaseong-si, KR)
- Chae-Seok Im (Yongin-si, KR)
- Seok-hwan Jo (Suwon-si, KR)
- Suk-jin Kim (Seoul, KR)
Cpc classification
G06F9/44521
PHYSICS
Y02D10/00
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
International classification
G06F9/50
PHYSICS
G06F9/30
PHYSICS
Abstract
A computing system is disclosed. The computing system according to one embodiment of the present disclosure comprises: a memory device for storing an application program; a processor for executing a loader for loading data of the application program into a memory space allocated for execution of the application program; a local memory having a width corresponding to the size of a register of the processor; and a constant memory having a width smaller than that of the local memory, wherein, according to the size of constant data included in the application program, the processor loads the constant data into one of the local memory and the constant memory.
Claims
1. A computing system, comprising: a memory device for storing an application program; a processor for executing a loader for loading data of the application program into a memory space allocated for execution of the application; a local memory having a width corresponding to a size of a register of the processor; and a constant memory having a width smaller than that of the local memory, wherein, according to a size of constant data included in the application program, the processor loads the constant data into one of the local memory and the constant memory, and wherein, among instructions included in the application program, the processor substitutes a general load instruction of loading the constant data, which is loaded into the constant memory, within a threshold into a target register, with a constant load instruction of loading from the constant memory.
2. The computing system of claim 1, wherein, when a numerical value of the constant data is within the threshold, the processor loads the constant data into the constant memory, and when the numerical value of the constant data is above the threshold, and the processor loads the constant data into the local memory.
3. The computing system of claim 2, wherein the threshold is the largest size that can be expressed by a bit stream that is narrower by 1 bit than the width of the constant memory.
4. The computing system of claim 3, wherein the processor performs calculation according to instructions included in the application program, and when performing the calculation for loading data from the constant memory, inserts a leading-zero to allow the data of the constant memory to correspond to the size of the register of the processor.
5. The computing system of claim 4, wherein, when performing the calculation for loading data from the constant memory, and when a most significant bit (MSB) at an accessed location is 1, the processor loads data of the local memory based on the loaded data.
6. The computing system of claim 1, wherein the local memory and the constant memory are configured as one of static random access memory (SRAM), scratch pad memory (SPM) and tightly coupled memory (TCM).
7. A driving method of a computing system, comprising: comparing a numerical value of constant data constructing an application program with a threshold; among instructions included in the application program, substituting a general load instruction of loading the constant data into a target register, with a constant load instruction of loading from a constant memory; loading constant data having the numerical value within the threshold as a result of a comparison to the constant memory; and loading the rest data other than the constant data into a local memory.
8. The driving method of claim 7, further comprising: writing in the constant memory an offset that indicates a location that stores the rest data in the local memory.
9. The driving method of claim 8, wherein the writing in the constant memory further comprises setting a flag to distinguish the offset and the constant data.
10. The driving method of claim 9, wherein, when width of the local memory corresponds to a size of a register of a processor of the computing system, when a width of the constant memory is smaller than a width of the local memory, and when performing calculation for loading data from the constant memory, the method further comprises inserting a leading-zero to allow data of the constant memory to correspond to the size of the register of the processor.
11. The driving method of claim 10, wherein, when performing the calculation for loading data from the constant memory and when the flag set to the loaded data is 1, the method further comprises determining the loaded data to be offset and loading data from the local memory based on the offset.
Description
DESCRIPTION OF DRAWING
(1) The above and/or other aspects and advantages of the present invention will become apparent and more readily appreciated from the following detailed description, taken in conjunction with the accompanying drawings, in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
BEST MODE
(14) Hereinbelow, preferred exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the following description of the present disclosure, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present disclosure rather unclear. In addition, terms used herein are defined in consideration of functions of the present disclosure, and may be varied according to practices of the users or operators. Therefore, definition should be interpreted based on the entire description of the present disclosure.
(15)
(16) Referring to
(17) The memory device 30 may store application programs. Specifically, the memory device 30 may store application programs to be executed. Among the application programs stored in a sub-storage such as auxiliary memory device, the memory device 30 may load an application program that is designated to be executed.
(18) The memory device 30 may be configured as various types of memory. Although
(19) The processor 110 executes the loader. The loader is a program that performs allocation, linking, relocation and loading. Of these, loading is the function most closely associated with the loader. Loading is the operation by which the processor 110 reads an operand (an object of calculation) from an auxiliary memory device or an external memory device and writes it to a register. The loader may be a service program that drives the computing system 100 and that is included in an operating system for executing the application programs.
(20) The local memory 120 and the constant memory 130 are located on an upper level of the memory device 30. The local memory 120 and the constant memory 130 are distinct from each other. The local memory 120 may be a physically separated component from the constant memory 130 or may represent two independent regions which are divided from each other in one memory.
(21) The local memory 120 may have a width corresponding to the size of the register of the processor 110. Specifically, the local memory 120 may be configured on the basis of a unit that corresponds to the word. For example, the local memory 120 has a width of 32 bits.
(22) In contrast, the constant memory 130 has a different width from that of the local memory 120. In other words, the constant memory 130 may be configured on the basis of a unit that is smaller than the word. For example, the constant memory 130 has a width of 8 bits.
(23) The processor 110 may execute a loader to load the data included in an application program stored in the DRAM 30 into the local memory 120 or the constant memory 130. Specifically, the processor 110 may load the constant data into one of the local memory 120 and the constant memory 130 according to the size of the constant data included in the application program. In this example, the size of the constant data refers to the constant value indicated by the constant data, rather than the amount of data. As a constant value increases, the number of bits needed to represent the constant increases. For example, every variable declared as an integer occupies 32 bits of data, but the constant values those data indicate, such as 0 and 128, differ in size.
(24) The processor 110 may compare a numerical value of the constant data with a threshold to load the constant data into either the local memory 120 or the constant memory 130. Specifically, the processor 110 may load the constant data into the constant memory 130 when its numerical value is within the threshold, and into the local memory 120 when its numerical value is above the threshold.
(25) In this example, comparison with the threshold may be implemented in various manners. For example, the loader may perform the comparison based on a difference between 127 and a numerical value of the constant data. Alternatively, the loader may perform the comparison based on whether a binary number expressing a numerical value is within 7 digits or not.
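The two comparison manners mentioned above can be sketched as follows. This is a minimal illustration rather than the loader of the disclosure; the function names are hypothetical, the threshold of 127 is taken from the example embodiment, and negative constants are assumed to fall outside the threshold.

```python
THRESHOLD = 127  # example value from the text (8-bit constant memory)


def within_threshold_by_difference(value):
    """Compare using the difference between 127 and the constant value."""
    # Assumption: negative constants are treated as outside the threshold.
    return value >= 0 and (THRESHOLD - value) >= 0


def within_threshold_by_bit_length(value):
    """Compare by checking that the binary representation fits in 7 digits."""
    return value >= 0 and value.bit_length() <= 7


# Both manners are expected to agree for every non-negative constant.
for v in (0, 1, 127, 128, 4096):
    assert within_threshold_by_difference(v) == within_threshold_by_bit_length(v)
```

Either check may be used by a loader; the bit-length form avoids a subtraction, while the difference form generalizes to thresholds that are not of the form 2^n - 1.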
(26) Meanwhile, the threshold is determined according to the width of the constant memory 130. When the constant memory 130 has a width of 1 bit, the threshold is 0. According to one embodiment of the present disclosure, the threshold is the largest number that can be expressed by a bit stream that is narrower by 1 bit than the width of the constant memory 130. In the embodiment described above, the constant memory 130 has a width of 8 bits, and therefore the threshold is 127, which is the largest number that can be expressed by 7 bits. In this example, the remaining 1 bit of the constant memory 130 is the place for a flag, which will be explained below. For the constant data within 127, only the lower 7 bits are significant. In other words, the constant memory 130 may store only the lower 7 bits of the constant data within the threshold.
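The relation between the width of the constant memory and the threshold can be worked through as in the following sketch. The function names are hypothetical and introduced only for illustration; one bit of the width is assumed to be reserved for the flag, as in the embodiment.

```python
def threshold_for_width(width_bits):
    """Largest value expressible in (width_bits - 1) bits; one bit is the flag."""
    if width_bits < 1:
        raise ValueError("constant memory width must be at least 1 bit")
    return (1 << (width_bits - 1)) - 1


def significant_bits(value, width_bits):
    """Keep only the lower (width_bits - 1) bits of a constant."""
    # For width 8 the threshold 127 (0b1111111) doubles as the 7-bit mask.
    return value & threshold_for_width(width_bits)


print(threshold_for_width(8))  # 8-bit constant memory
print(threshold_for_width(1))  # 1-bit constant memory
```

For a constant within the threshold, masking to the lower bits leaves the value unchanged, which is why only the lower 7 bits need to be stored.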
(27) Among the instructions included in an application program, the processor 110 may substitute the general load instruction of loading the constant data within the threshold, with the constant load instruction of loading from the constant memory 130. Specifically, the processor 110 may search the instructions of an application program for a general load instruction of loading the constant data within the threshold. Further, the processor 110 may confirm whether addresses of the data loaded by the general load instruction are addresses of the constant data having a size within the threshold. Addresses of the constant data may be recognized during a process of loading the application program to DRAM 30. The processor 110 may substitute the general load instruction of calling data from the local memory 120 with the constant load instruction of calling from the constant memory 130. Based on the above, when calculation required by the constant data within the threshold is performed, the processor 110 may access the constant memory 130.
(28) The processor 110 performs the application program. The processor 110 may perform calculation according to the instruction included in the application program. Further, when the processor 110 performs calculation for loading data from the constant memory 130, the leading-zero may be inserted into the data of the constant memory 130 so as to correspond to the size of the register. Further explanation will be provided below by referring to
(29) When calculation for loading the data from the constant memory 130 is performed, the processor 110 may load data from the local memory 120 based on the loaded data according to a most significant bit of the accessed position. Further explanation will be provided below by referring to
(30) The local memory 120 and the constant memory 130 may be configured as one of static random access memory (SRAM), scratch pad memory (SPM), or tightly coupled memory (TCM).
(31) The computing system 100 according to the embodiment described above is expected to provide an enhanced processing speed in executing an application program, because the computing system 100 loads small-valued constant data, including 0, into a separate constant memory.
(32)
(33) Referring to
(34) The code region 410 may include instructions (or execution code) of a program. The code region 410 may also be called the text region. The processor may execute an application program by reading this region.
(35) The data region 420 includes information which should be secured for the lifetime of a program, such as global data, static data, text string, constant, or the like.
(36) The heap region 430 is space for dynamic memory allocation. In other words, this space is dynamically allocated by a programmer. In contrast, the stack regions 440, 450 store automatic/temporary variables created when a function is called. Specifically, the stack regions 440, 450 may be composed of a region 450 where arguments of functions and environment variables are stored, and a region 440 that grows and shrinks and that holds local variables, parameters, return addresses or the like.
(37) According to one embodiment of the present disclosure, the constant data included in the data region 420 may be loaded to the constant memory. Because the constant memory is narrow and holds small-valued constants, including 0, it enhances the read speed of the processor.
(38)
(39) Referring to
(40) The most significant bit (MSB) is the flag 131 for distinguishing the data of the corresponding line. Specifically, when a bit value of the flag is 0, data of the corresponding line is constant data. Further, when a bit value of the flag is 1, data of the corresponding line is offset.
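The line layout described above, with the MSB as the flag and the remaining 7 bits as data, can be sketched as follows. The constant and function names are hypothetical, and an 8-bit line is assumed as in the example embodiment.

```python
FLAG_BIT = 0x80   # most significant bit of an 8-bit line
DATA_MASK = 0x7F  # lower 7 bits hold the constant data or the offset


def make_constant_line(value):
    """Encode a constant within the threshold; the flag bit stays 0."""
    if not 0 <= value <= DATA_MASK:
        raise ValueError("constant does not fit in 7 bits")
    return value


def make_offset_line(offset):
    """Encode an offset into the local memory; the flag bit is set to 1."""
    if not 0 <= offset <= DATA_MASK:
        raise ValueError("offset does not fit in 7 bits")
    return FLAG_BIT | offset


def decode_line(line):
    """Return ("constant" or "offset", 7-bit payload) for an 8-bit line."""
    kind = "offset" if line & FLAG_BIT else "constant"
    return kind, line & DATA_MASK
```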
(41) An example of processing when a bit value of the flag 131 is 0 will be explained by referring to
(42)
(43)
(44) However, since the called constant data is 7 bits long, it may be expanded to 32 bits in order to be loaded into the 32-bit-long register. Specifically, 25 zero (0) bits are inserted in front of the 7-bit-long constant data. These zero bits inserted at the front end are called leading-zeros.
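The leading-zero insertion can be illustrated with a short sketch that renders the register contents as a bit string. The names are hypothetical; the 7-bit data length and 32-bit register size are those of the example embodiment.

```python
REGISTER_BITS = 32
DATA_BITS = 7


def zero_extend(data):
    """Return the 32-bit register image of 7-bit constant data as a bit string."""
    data &= (1 << DATA_BITS) - 1  # keep only the 7 data bits
    # Pad on the left with zeros up to the register size.
    return format(data, "0{}b".format(REGISTER_BITS))


bits = zero_extend(0b1010101)
print(bits)
print(len(bits) - DATA_BITS, "leading zero bits inserted")
```

Because the inserted bits are all zero, the numerical value held in the register equals the value stored in the constant memory.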
(45)
(46) Referring to
(47) To prepare for the circumstance mentioned above, the constant memory 130 may store an offset. Specifically, the constant memory 130 may store an offset pointing to a specific address of the local memory 120. Further, the constant memory 130 may store a bit value of 1 in the flag 131 to identify the offset.
(48) Referring to the example of
(49) As exception handling, the CPU performs the general load calculation again using the offset. Specifically, the CPU may recognize the numerical value written in the 7-bit-long data region 720 of the loaded Line 2, and call for the data of the local memory 120 located at the offset. In the illustration of
(50) Meanwhile, a starting point where the offset is counted in the local memory 120 may not match the first address of the local memory 120. The offset may be a number indicating a point that is counted from a preset location of the local memory 120.
(51) Meanwhile, the capacity of the local memory 120 may correspond to the length of the offset of the constant memory 130. For example, the offset length of the constant memory 130 is 7 bits. The largest offset that can be expressed by 7 bits is 2^7 - 1, that is, 127. Accordingly, the capacity of the local memory 120 is 2^7*32 bits, that is, a width of 32 bits and 128 lines addressed by offsets 0 to 127. However, the present disclosure is not limited hereto. The local memory 120 may have a larger capacity as long as the offset can point to correct data from a preset location.
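The capacity arithmetic can be checked with the following short calculation. It assumes, for illustration only, that offset 0 is also a usable position, so that a 7-bit offset distinguishes 2^7 positions of which the largest is 127.

```python
OFFSET_BITS = 7        # offset length in the constant memory
LOCAL_WIDTH_BITS = 32  # width of one local-memory line

max_offset = (1 << OFFSET_BITS) - 1   # largest value in 7 bits
addressable_lines = 1 << OFFSET_BITS  # offsets 0 .. max_offset
capacity_bits = addressable_lines * LOCAL_WIDTH_BITS

print(max_offset, addressable_lines, capacity_bits)
```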
(52)
(53) Referring to
(54) The constant data within the threshold may be loaded to the constant memory, at S820. Specifically, the driving method includes loading the constant data having the numerical value within the threshold to a separate constant memory.
(55) The rest of the data may be loaded to the local memory, at S830. Specifically, the driving method includes loading into the local memory the data other than the constant data loaded to the constant memory, that is, the constant data exceeding the threshold and the remaining general data.
(56) Meanwhile, the driving method described above may further include writing, in the constant memory, an offset indicating the location in the local memory at which the rest of the data is stored. In this example, the writing may further include setting a flag for distinguishing the offset from the constant data. For example, when the constant data is loaded, the MSB may be set to 0, and when the offset is written, the MSB may be set to 1. The CPU may then recognize whether the written data is constant data or an offset, based on the value of the flag.
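The loading steps, together with the writing of the offset and the flag, can be sketched as follows. This is a simplified model under stated assumptions, not the loader of the disclosure: the function name is hypothetical, memories are modeled as Python lists, every data item is assumed to receive one constant-memory line in order, and offsets are counted from the first line of the local memory.

```python
def load_data(values, width_bits=8):
    """Split data between a constant memory and a local memory.

    Returns (constant_memory, local_memory) as lists of integers.
    """
    data_bits = width_bits - 1
    threshold = (1 << data_bits) - 1  # 127 for an 8-bit constant memory
    flag = 1 << data_bits             # MSB of a constant-memory line

    constant_memory = []
    local_memory = []
    for value in values:
        if 0 <= value <= threshold:
            # Constant within the threshold: store it with flag bit 0.
            constant_memory.append(value)
        else:
            # Rest data: store it in the local memory and write its offset
            # into the constant memory with flag bit 1.
            offset = len(local_memory)
            if offset > threshold:
                raise OverflowError("offset does not fit in the data bits")
            local_memory.append(value)
            constant_memory.append(flag | offset)
    return constant_memory, local_memory


constant_memory, local_memory = load_data([0, 5, 1000, 127, 128])
print(constant_memory, local_memory)
```

In this model a line of the constant memory with its MSB set carries the position of the corresponding word in the local memory rather than a constant.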
(57) The driving method according to the embodiment described above may enhance performance of the computing system by loading the constant data having high load frequency to a separate memory region.
(58)
(59) The operations in
(60) Next, the general load instruction, [LOAD], of loading the checked address may be substituted with a constant load instruction, [C_LOAD], at S920. In this example, LOAD is a common assembly-language instruction used for calling the data at an address and loading it to the register. When calculation is performed by the LOAD instruction, the CPU may access the local memory. C_LOAD is a modified assembly-language instruction that directs the access path for calling the data at the address toward the constant memory. The operation at S920 is a process of reconfiguring an application program so that the constant data within the threshold can be loaded to the constant memory.
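The substitution at S920 can be sketched as a pass over the instructions. The representation of an instruction as an (opcode, target register, address) tuple and the function name are assumptions made for this illustration; only the opcode names LOAD and C_LOAD come from the description.

```python
def substitute_constant_loads(instructions, constant_addresses):
    """Replace LOAD with C_LOAD where the address holds a within-threshold constant.

    instructions: list of (opcode, target_register, address) tuples.
    constant_addresses: set of addresses checked to hold small constants.
    """
    rewritten = []
    for opcode, target, address in instructions:
        if opcode == "LOAD" and address in constant_addresses:
            opcode = "C_LOAD"  # redirect the access path to the constant memory
        rewritten.append((opcode, target, address))
    return rewritten


program = [
    ("LOAD", "r0", 0x100),   # address of a small constant
    ("LOAD", "r1", 0x104),   # address of general data
    ("STORE", "r0", 0x200),  # other instructions are left untouched
]
print(substitute_constant_loads(program, {0x100}))
```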
(61) Meanwhile, the width of the constant memory in which the constant data within the threshold is written may be narrower than the width of the local memory, which corresponds to the size of the register of the processor. In this case, the driving method described above may further include inserting leading-zeros so that the data of the constant memory corresponds to the size of the register of the processor when calculation for loading data from the constant memory is performed according to the C_LOAD instruction.
(62) Further, exception handling may be performed according to the flag set as described above. Specifically, when the calculation for loading data from the constant memory is performed and when the flag set for the loaded data is 1, the operating method described above may determine that the loaded data is offset and load data from the local memory based on the offset.
(63)
(64) Referring to
(65) It is then determined whether the constant memory includes data at searched address or not, at S1020. In other words, it may be determined whether a hit, in which the constant data is present on the constant memory, or a miss, in which the constant data is not present on the constant memory, is generated.
(66) When the data is missed at S1020:N, the lower-level DRAM may be accessed to load data corresponding to the address, at S1030.
(67) When the data is hit at S1020:Y, it may be determined whether a bit value of the flag at the accessed location is 1 or not, at S1040. Specifically, the processor checks, based on the address, whether the MSB of the accessed line of the constant memory is 1 or not.
(68) When the flag value is 0 at S1040:N, leading-zeros may be inserted in front of the data in the data region before the data is loaded to the register.
(69) When the flag value is 1 at S1040:Y, exception handling may be performed, at S1050. The processor may access the local memory based on the loaded offset, and load the general data at the offset location once again, at S1060.
(70) The loading method according to the embodiment described above is capable of efficient loading of data even when the computing system is configured with two separate memories having different widths from each other.
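The loading flow from S1020 to S1060 can be sketched as a single function. This is an illustrative model only: the function name and parameters are hypothetical, the constant memory and DRAM are modeled as dictionaries keyed by address, the local memory as a list, and 8-bit lines with an MSB flag are assumed.

```python
FLAG_BIT = 0x80
DATA_MASK = 0x7F


def constant_load(address, constant_memory, local_memory, dram, local_base=0):
    """Model of a C_LOAD: return the value that would reach the register."""
    # S1020: hit or miss in the constant memory.
    if address not in constant_memory:
        # S1030: on a miss, fetch the data from the lower-level DRAM.
        return dram[address]

    line = constant_memory[address]
    # S1040: check the flag (MSB) of the accessed line.
    if line & FLAG_BIT == 0:
        # Flag 0: constant data; a Python int is already zero-extended.
        return line & DATA_MASK

    # S1050/S1060: flag 1 means the payload is an offset into the local memory,
    # counted from a preset location (local_base, assumed 0 here).
    offset = line & DATA_MASK
    return local_memory[local_base + offset]


constant_memory = {0: 5, 1: FLAG_BIT | 2}
local_memory = [10, 20, 300000]
dram = {9: 77}
print(constant_load(0, constant_memory, local_memory, dram))
print(constant_load(1, constant_memory, local_memory, dram))
print(constant_load(9, constant_memory, local_memory, dram))
```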
(71) Meanwhile, the operating method and loading method of
(72)
(73) Referring to
(74) According to one embodiment of the present disclosure, the compiler 1110 may aggregate the constant data within the threshold from the application program code. Specifically, the compiler 1110 may determine whether a numerical value of the constant data from the application program code is within the threshold or not. Further, the compiler 1110 may assemble constant data determined to have a numerical value within the threshold into a constant data section of the binary file. As illustrated in
(75) As described above, because the compiler 1110 aggregates the frequently used constant data, including zero (0), into one section at compile time, the same effect as if a dedicated constant memory were installed can be obtained. When the locality principle for effective operation of the cache is applied, further enhancement of performance can be expected.
(76)
(77) Referring to
(78) The constant data determined as having the numerical value within the threshold may be assembled into the constant data section of the binary file, at S1220. Specifically, the compiling method may aggregate the constant data within the threshold to one section of the binary file for executing of an application program. Any independent section distinguished from the other data may be used as the one section.
(79) The compiling method described above may further include assembling the rest data except for the constant data within the threshold among the application program code, to a general data section of the binary file. For example, the general data section may be the region other than the constant data section among the data regions 420 of
(80) Further, according to application program code, the compiling method described above may further include generating constant load instruction, [C_LOAD], for loading data of the constant data section or generating general load instruction, [LOAD], for loading data of a general data section. In other words, the compiler may compile the constant load instruction, [C_LOAD], for loading the constant data from the constant data section, rather than the general load instruction, [LOAD], when converting the program code into the assembly language of calculation for loading the constant data within the threshold. Accordingly, in the stage of executing the application program, the load calculation may be performed for calling data from the distinguished sections from each other according to [LOAD] and [C_LOAD] instructions.
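The sectioning and instruction generation performed at compile time can be sketched as follows. The function name, the (name, value) representation of constants and the tuple form of the emitted instructions are assumptions for illustration; the threshold of 127 is that of the example embodiment.

```python
def compile_constants(constants, threshold=127):
    """Assign each constant to a section and emit the matching load instruction.

    constants: list of (symbol_name, value) pairs found in the program code.
    Returns (constant_section, general_section, code).
    """
    constant_section = []
    general_section = []
    code = []
    for name, value in constants:
        if 0 <= value <= threshold:
            # Within the threshold: constant data section, loaded by C_LOAD.
            constant_section.append((name, value))
            code.append(("C_LOAD", name))
        else:
            # Rest data: general data section, loaded by LOAD.
            general_section.append((name, value))
            code.append(("LOAD", name))
    return constant_section, general_section, code


sections = compile_constants([("zero", 0), ("mask", 0xFFFF), ("step", 4)])
print(sections)
```

Because the choice between [C_LOAD] and [LOAD] is made per constant at compile time in this sketch, no instruction substitution is needed at load time.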
(81) By sectioning the storage region, the compiling method described above may reduce the time required for the load calculation that accesses small-valued constants, including 0, when executing an application program, thus facilitating enhancement of the overall speed of the computing system.
(82) The compiling method described above may be also performed in the computing system including the configuration of
(83) Further, the compiling method described above may be configured as at least one execution program to perform the compiling method described above, and such execution program may be stored in computer readable recording medium.
(84) Accordingly, each block of the present disclosure may be implemented as computer readable code on a computer readable recording medium. The computer readable recording medium may be any device that can store data to be read by a computer system.
(85) For example, the computer readable recording medium may be ROM, RAM, CD-ROMs, magnetic tape, floppy disk, optical disk, optical data storing device and image display apparatus such as television including the storing devices described above. Further, the computer readable code may be performed as computer data signals of carrier wave.
(86) The foregoing exemplary embodiments and advantages are merely exemplary and are not to be construed as limiting the exemplary embodiments. The present teaching can be readily applied to other types of apparatuses. Also, the description of the exemplary embodiments of the present inventive concept is intended to be illustrative, and not to limit the scope of the claims.