SYSTEM AND METHOD FOR DATA GENERATOR DRIVEN BUS CLOCK VOTING
20170346656 · 2017-11-30
Inventors
- AJAY NAWANDHAR (BANGALORE, IN)
- Pavan Kumar (Bangalore, IN)
- KIRAN KUMAR MALIPEDDI (SECUNDERABAD, IN)
- CHANDRASEKHAR REDDY RAMREDDY GARI (HYDERABAD, IN)
Cpc classification
Y02D10/00
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
International classification
Abstract
Various embodiments of methods and systems for data generator driven bus clock voting are disclosed. An exemplary embodiment defines a first timing domain within a system on a chip to comprise a data generating component and a bus that includes a memory management unit. The bus serves to communicatively couple the data generating component to a memory component, such as a DDR. A second timing domain within the system on a chip comprises the memory component. With such a configuration, the embodiment may leverage the clock speed of the data generating component to set a clock speed for components in the first timing domain and, in doing so, the clock speed of the memory management unit is dictated by the first timing domain.
Claims
1. A method for data generator driven bus clock voting, the method comprising: defining a first timing domain within a system on a chip to comprise a data generating component and a bus, wherein the bus comprises a memory management unit and is communicatively coupled to the data generating component and a memory component; and defining a second timing domain within the system on a chip to comprise the memory component; wherein a clock speed of the data generating component determines a clock speed setting for the first timing domain and a clock speed of the memory management unit is dictated by the first timing domain.
2. The method of claim 1, further comprising: monitoring the clock speed of the data generating component; recognizing a change in the clock speed of the data generating component; and adjusting the clock speed of the memory management unit based on the recognized change.
3. The method of claim 2, wherein the recognized change in the clock speed of the data generating component is the result of a thermal energy generation trigger.
4. The method of claim 2, wherein the recognized change in the clock speed of the data generating component is the result of a workload scheduling queue trigger.
5. The method of claim 2, wherein the recognized change in the clock speed of the data generating component is the result of a user setting trigger.
6. The method of claim 1, wherein the data generating component is one of a graphics processing unit, a digital signal processor and a central processing unit.
7. A system for data generator driven bus clock voting, the system comprising: means for defining a first timing domain within a system on a chip to comprise a data generating component and a bus, wherein the bus comprises a memory management unit and is communicatively coupled to the data generating component and a memory component; and means for defining a second timing domain within the system on a chip to comprise the memory component; wherein a clock speed of the data generating component determines a clock speed setting for the first timing domain and a clock speed of the memory management unit is dictated by the first timing domain.
8. The system of claim 7, further comprising: means for monitoring the clock speed of the data generating component; means for recognizing a change in the clock speed of the data generating component; and means for adjusting the clock speed of the memory management unit based on the recognized change.
9. The system of claim 8, wherein the recognized change in the clock speed of the data generating component is the result of a thermal energy generation trigger.
10. The system of claim 8, wherein the recognized change in the clock speed of the data generating component is the result of a workload scheduling queue trigger.
11. The system of claim 8, wherein the recognized change in the clock speed of the data generating component is the result of a user setting trigger.
12. The system of claim 7, wherein the data generating component is one of a graphics processing unit, a digital signal processor and a central processing unit.
13. The system of claim 7, wherein the system on a chip is comprised within a portable computing device.
14. A system for data generator driven bus clock voting, the system comprising: a dynamic current and voltage module operable to: define a first timing domain within a system on a chip to comprise a data generating component and a bus, wherein the bus comprises a memory management unit and is communicatively coupled to the data generating component and a memory component; and define a second timing domain within the system on a chip to comprise the memory component; wherein a clock speed of the data generating component determines a clock speed setting for the first timing domain and a clock speed of the memory management unit is dictated by the first timing domain.
15. The system of claim 14, wherein the dynamic current and voltage module is further operable to: monitor the clock speed of the data generating component; recognize a change in the clock speed of the data generating component; and adjust the clock speed of the memory management unit based on the recognized change.
16. The system of claim 15, wherein the recognized change in the clock speed of the data generating component is the result of a thermal energy generation trigger.
17. The system of claim 15, wherein the recognized change in the clock speed of the data generating component is the result of a workload scheduling queue trigger.
18. The system of claim 15, wherein the recognized change in the clock speed of the data generating component is the result of a user setting trigger.
19. The system of claim 14, wherein the data generating component is one of a graphics processing unit, a digital signal processor and a central processing unit.
20. The system of claim 14, wherein the system on a chip is comprised within a portable computing device.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] In the drawings, like reference numerals refer to like parts throughout the various views unless otherwise indicated. For reference numerals with letter character designations such as “102A” or “102B”, the letter character designations may differentiate two like parts or elements present in the same figure. Letter character designations for reference numerals may be omitted when a reference numeral is intended to encompass all parts having the same reference numeral in all figures.
DETAILED DESCRIPTION
[0011] The word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect described herein as “exemplary” is not necessarily to be construed as exclusive, preferred or advantageous over other aspects.
[0012] In this description, the term “application” may also include files having executable content, such as object code, scripts, byte code, markup language files, and patches. In addition, an “application,” as referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
[0013] In this description, reference to double data rate (“DDR”) memory components will be understood to envision any of a broader class of volatile random access memory (“RAM”) used for data storage and will not limit the scope of the solutions disclosed herein to configurations or arrangements that include a specific type or generation of RAM.
[0014] As used in this description, the terms “component,” “database,” “module,” “system,” “generator,” “engine,” “controller,” and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device may be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
[0015] In this description, the terms “central processing unit (“CPU”),” “digital signal processor (“DSP”),” “graphics processing unit (“GPU”),” and “chip” are used interchangeably. Moreover, a CPU, DSP, GPU or chip may be comprised of one or more distinct processing components generally referred to herein as “core(s).”
[0016] In this description, the terms “engine,” “processing engine,” “master processing engine,” “master component,” “data generator” and the like are used to refer to any component within a system on a chip (“SoC”) that generates transaction requests to closely coupled memory devices and/or to components of a memory subsystem via a bus. As such, a master component may refer to, but is not limited to, a CPU, DSP, GPU, modem, controller, display, camera, etc. Depending on its particular function and needs, a master component comprised within an embodiment of the solution may dictate the clock frequency for a master component time domain that includes a bus.
[0017] In this description, the terms “memory management unit,” “MMU,” “translation lookaside buffer,” and “TLB” are used interchangeably to refer to a component associated with a bus and having all transaction requests from a data generator passed through it for the purpose of performing translations of virtual memory addresses in a cache to physical memory addresses in the DDR.
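The translation role described above can be pictured with a minimal sketch of a TLB sitting in front of a page table. The 4 KiB page size, the dictionary-based page table, the class name `Tlb`, and the unbounded entry store with no eviction are all illustrative assumptions; the sketch is not the MMU/TLB of the disclosure, only a generic picture of caching virtual-to-physical translations.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed 4 KiB pages; illustrative only


@dataclass
class Tlb:
    """Hypothetical TLB caching virtual-page -> physical-frame translations."""

    page_table: dict[int, int]  # full mapping: virtual page number -> physical frame number
    entries: dict[int, int] = field(default_factory=dict)  # cached translations (no eviction)
    hits: int = 0
    misses: int = 0

    def translate(self, vaddr: int) -> int:
        """Return the physical address for a virtual address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
        else:
            # Miss: consult the page table (KeyError if unmapped) and fill the TLB.
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset


tlb = Tlb(page_table={0: 7, 1: 3})
print(tlb.translate(0x10))      # 28688 (miss: frame 7, offset 16)
print(tlb.translate(0x20))      # 28704 (hit: frame 7, offset 32)
print(tlb.translate(4096 + 5))  # 12293 (miss: frame 3, offset 5)
print(tlb.hits, tlb.misses)     # 1 2
```

Only the first access to each page pays for the page-table lookup; later accesses to the same page are served from the cached entry.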
[0018] In this description, the terms “bus,” “bus interconnect,” “advanced extensible interface (“AXI”)” and the like are used interchangeably and refer to a collection of wires through which data is transmitted from a data generator to a memory component or other device located on or off the SoC. It will be understood that a bus consists of two parts, an address bus and a data bus, where the data bus transfers actual data and the address bus transfers information specifying the location of the data in a memory component. The term “width” or “bus width” refers to an amount of data, i.e. a “chunk size,” that may be transmitted per cycle through a given bus. For example, a 16-byte bus may transmit 16 bytes of data per cycle, whereas a 32-byte bus may transmit 32 bytes of data per cycle. Moreover, “bus speed” refers to the number of times a chunk of data may be transmitted through a given bus each second, and a “bus cycle” or “cycle” refers to transmission of one chunk of data through a given bus. The “bandwidth” of a bus is therefore its width multiplied by its bus speed. In embodiments of the solution, the bus speed for a given bus may be driven or dictated by the clock frequency of a data generator associated with the bus.
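Because a bus moves one chunk of its width per cycle and completes “bus speed” cycles per second, its peak data rate is the product of the two. The sketch works through that arithmetic; the function names and the 200 MHz figure are hypothetical, and sustained throughput on a real interconnect is usually lower than this idealized peak.

```python
def bus_bandwidth(width_bytes: int, bus_speed_hz: float) -> float:
    """Peak data rate in bytes/s: one chunk of width_bytes moves per bus cycle."""
    return width_bytes * bus_speed_hz


def min_bus_speed(demand_bytes_per_s: float, width_bytes: int) -> float:
    """Lowest bus speed (cycles/s) that can carry demand_bytes_per_s at this width."""
    return demand_bytes_per_s / width_bytes


# Hypothetical numbers: 16-byte and 32-byte buses clocked at 200 MHz.
print(bus_bandwidth(16, 200e6))  # 3200000000.0 bytes/s (3.2 GB/s)
print(bus_bandwidth(32, 200e6))  # 6400000000.0 bytes/s (6.4 GB/s)
print(min_bus_speed(3.2e9, 16))  # 200000000.0 cycles/s
```

The second function inverts the first, which is the kind of calculation a clock vote implies: for a fixed width, a given bandwidth demand translates into a minimum bus clock.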
[0019] In this description, the term “portable computing device” (“PCD”) is used to describe any device operating on a limited capacity power supply, such as a battery. Although battery operated PCDs have been in use for decades, technological advances in rechargeable batteries coupled with the advent of third generation (“3G”) and fourth generation (“4G”) wireless technology have enabled numerous PCDs with multiple capabilities. Therefore, a PCD may be a cellular telephone, a satellite telephone, a pager, a PDA, a smartphone, a navigation device, a smartbook or reader, a media player, a combination of the aforementioned devices, a laptop computer with a wireless connection, among others.
[0020] In current systems and methods, a bus interconnect's dynamic clock and voltage scaling (“DCVS”) power scheme may be determined by a memory controller or, by extension, a memory device managed by the memory controller. Consequently, the bus may run at a relatively slow clock frequency even when a data generator associated with the bus is running at a relatively high DCVS rate and generating a high volume of transaction requests. When a data generator generates a high volume of transaction requests while its associated bus runs at a relatively low bus speed, a bubble in the performance of the overall SoC may occur.
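The performance bubble described above is a rate mismatch: if the bus drains requests more slowly than the data generator issues them, the pending queue grows every step. A toy fixed-rate model makes this concrete; the function name and per-step rates are hypothetical, and the model ignores burstiness, caching, and memory latency.

```python
def backlog_over_time(issued_per_step: int, served_per_step: int, steps: int) -> list[int]:
    """Queue depth after each step when requests arrive and drain at fixed rates."""
    backlog = 0
    history = []
    for _ in range(steps):
        # New requests join the queue; the bus drains at most served_per_step of them.
        backlog = max(0, backlog + issued_per_step - served_per_step)
        history.append(backlog)
    return history


# Hypothetical rates: the data generator issues 8 requests per step.
print(backlog_over_time(8, 5, 4))   # slow bus: [3, 6, 9, 12]
print(backlog_over_time(8, 10, 4))  # fast bus: [0, 0, 0, 0]
```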
[0021] As an example of the loading scheme mentioned above, in the prior art when a GPU workload is low (thereby dictating a Supply Voltage Supervisor (“SVS”) or a nominal (“NOM”) mode for the GPU) while the memory load is high (thereby dictating a “Turbo mode” for the memory), the bus interconnect between them may be running at a high clock speed, causing it to consume extra power. Conversely, in the prior art when a GPU workload is high (thereby dictating a Turbo mode for the GPU) while the memory load is low (thereby dictating an SVS or NOM mode for the memory), the bus interconnect between them may be running at a low clock speed that negatively impacts system performance.
[0022] To mitigate or alleviate the shortcomings of the prior art arrangements, embodiments of the solution provide for a DCVS voting scheme for a bus interconnect that is driven by a data generator master component. To do this, embodiments recognize when a given data generator, like a GPU, has a low workload and is running a FIFO at a low clock speed. In such a scenario, an embodiment of the solution may run the bus interconnect clock at a frequency consistent with that of the GPU, because back pressure in the system is likely to be minimal while the GPU is sending relatively few transaction requests to the DDR memory device. Advantageously, power consumption by the bus interconnect may be minimized when the GPU demands little transaction bandwidth.
[0023] Conversely, when the GPU experiences a high workload and its FIFO is running at a high clock speed, embodiments of the solution may respond to the resulting bandwidth demand because the bus interconnect is already being driven at a clock speed dictated by that of the GPU, thereby improving overall performance of the PCD.
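The difference between the prior-art arrangement and the generator-driven scheme reduces to which side of the bus supplies the clock vote. A minimal sketch, assuming a three-level ordering SVS < NOM < TURBO and hypothetical function names; both functions take the same two inputs (one of which each ignores) so the two policies can be compared side by side on the GPU/DDR load scenarios discussed for the prior art.

```python
from enum import IntEnum


class Mode(IntEnum):
    """Assumed ordering of DCVS performance levels, lowest to highest."""
    SVS = 1
    NOM = 2
    TURBO = 3


def memory_driven_vote(generator: Mode, memory: Mode) -> Mode:
    """Prior-art style: the bus clock level follows the memory side."""
    return memory


def generator_driven_vote(generator: Mode, memory: Mode) -> Mode:
    """Generator-driven style: the bus clock level follows the data generator."""
    return generator


# The two prior-art scenarios, with SVS standing in for "SVS or NOM".
scenarios = {
    "GPU low, DDR high": (Mode.SVS, Mode.TURBO),
    "GPU high, DDR low": (Mode.TURBO, Mode.SVS),
}
for name, (gpu, ddr) in scenarios.items():
    print(f"{name}: memory-driven bus={memory_driven_vote(gpu, ddr).name}, "
          f"generator-driven bus={generator_driven_vote(gpu, ddr).name}")
```

In the first scenario the generator-driven vote avoids running the bus at TURBO for a lightly loaded GPU; in the second it keeps the bus at TURBO for a heavily loaded GPU even though the DDR is at SVS.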
[0024] A further advantage of embodiments of the solution is that, given that most of the data typically requested by a master component is already cached in a memory management unit (“MMU”) running on the AXI clock, the transaction requests of the master component may be satisfied from the MMU. This avoids the buildup of a transaction request queue even when the master component and the bus interconnect are running at a relatively higher speed than the DDR memory device.
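The benefit described above can be estimated with a simple bottleneck model that takes the description's premise at face value: requests that hit in the MMU are served at the bus clock, and only misses wait on the DDR. The function name, the steady-state bound, and the rates (in arbitrary requests-per-unit-time) are assumptions for illustration, not figures from the disclosure.

```python
def sustainable_request_rate(bus_rate: float, ddr_rate: float, mmu_hit_ratio: float) -> float:
    """Highest steady request rate the path can absorb without an ever-growing queue.

    Every request passes through the bus/MMU, so the rate cannot exceed bus_rate.
    Only MMU misses continue to the DDR, so the rate times the miss ratio cannot
    exceed ddr_rate.
    """
    if not 0.0 <= mmu_hit_ratio <= 1.0:
        raise ValueError("mmu_hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - mmu_hit_ratio
    if miss_ratio == 0.0:
        return bus_rate  # every request is served from the MMU
    return min(bus_rate, ddr_rate / miss_ratio)


# Hypothetical rates: bus clocked by the master component at 1000, DDR in a low power mode at 300.
print(sustainable_request_rate(1000, 300, 0.75))  # bus-bound: 1000
print(sustainable_request_rate(1000, 300, 0.50))  # DDR-bound: 600.0
```

With a high hit ratio the slow DDR stops being the bottleneck, so running the bus at the master component's clock pays off; with a low hit ratio the DDR clock caps the sustainable rate regardless of the bus clock.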
[0025] Thus, the novel DCVS voting scheme encompassed by embodiments of the solution may be determined by a core of a multi-core processor in order to match the core's faster or slower bandwidth requirements. In this way, when a core needs data faster, the bus interconnect may run faster, and when the core generates requests more slowly, the bus interconnect may be slowed commensurately. Moreover, because much of the data typically requested by a master component is cached in the MMU (which runs on the bus clock), in embodiments of the solution that data may be returned to the master component quickly even if the memory component speed is slower (e.g., GPU in turbo mode while DDR is in SVS mode).
[0027] In general, the memory subsystem 112 comprises, inter alia, a memory controller 215, dedicated caches and FIFOs for master components, MMU/TLB 116, and a DDR memory 115 (collectively depicted in the
[0028] As illustrated in
[0029] As depicted in
[0030] As further illustrated in
[0032] The CPU 110 may also be coupled to one or more internal, on-chip thermal sensors 157A as well as one or more external, off-chip thermal sensors 157B. The on-chip thermal sensors 157A may comprise one or more proportional to absolute temperature (“PTAT”) temperature sensors that are based on vertical PNP structure and are usually dedicated to complementary metal oxide semiconductor (“CMOS”) very large-scale integration (“VLSI”) circuits. The off-chip thermal sensors 157B may comprise one or more thermistors. The thermal sensors 157 may produce a voltage drop that is converted to digital signals with an analog-to-digital converter (“ADC”) controller (not shown). However, other types of thermal sensors 157 may be employed.
[0033] The touch screen display 132, the video port 138, the USB port 142, the camera 148, the first stereo speaker 154, the second stereo speaker 156, the microphone 160, the FM antenna 164, the stereo headphones 166, the RF switch 170, the RF antenna 172, the keypad 174, the mono headset 176, the vibrator 178, thermal sensors 157B, the PMIC 180 and the power supply 188 are external to the on-chip system 102. It will be understood, however, that one or more of these devices depicted as external to the on-chip system 102 in the exemplary embodiment of a PCD 100 in
[0034] In a particular aspect, one or more of the method steps described herein may be implemented by executable instructions and parameters stored in the memory subsystem 112 or that form the DCVS module 26 and/or the clocks 27, 28 (see
[0036] The transactions emanating from the master component 201 are marshaled by the memory controller 215. However, when the master component 201 clock speed exceeds the DDR 115 clock speed, embodiments of a bus clock voting (“BCV”) solution may leverage data stored in the MMU 116 to satisfy the transaction requests, thereby avoiding a backlog in the transaction queue. Advantageously, because embodiments of a BCV solution set the memory interconnect clock 29 at the speed dictated by the master component time domain (which is dictated by the master component clock 27), instead of the clock speed associated with the memory component time domain, requests that can be satisfied from the MMU 116 are filled at a fast rate even when the DDR 115 is subject to a low power mode. Additionally, when the memory component time domain is set by the DCVS module 26 (per the memory clock 28) to a high performance mode, such as a turbo mode, and the master component time domain is set by the DCVS module 26 (per the master component clock 27) to a low performance mode, such as an SVS or NOM mode, embodiments of the solution enable the bus interconnect 206 to avoid unnecessary power consumption when the master component requires low bandwidth. This is because the master component 201 and the bus interconnect 206 are associated with the same time domain (i.e., the master component 201 and the bus interconnect 206 run at the same frequency associated with the low power mode).
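The two time domains can be modeled as groups of components that share one clock setting, so that changing the master component domain's clock carries the bus interconnect and its MMU along with it while the DDR keeps its own clock. The class, the helper `clock_of`, the component labels (borrowed from the reference numerals), and the clock values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TimingDomain:
    """A group of components that share one clock setting."""
    name: str
    clock_hz: int
    members: set[str] = field(default_factory=set)


def clock_of(component: str, domains: list[TimingDomain]) -> int:
    """A component runs at the clock of whichever domain contains it."""
    for domain in domains:
        if component in domain.members:
            return domain.clock_hz
    raise KeyError(f"{component} is not in any timing domain")


# Hypothetical assignment: master component, bus interconnect and MMU share the first domain.
master = TimingDomain("master component", 600_000_000, {"master_201", "bus_206", "mmu_116"})
memory = TimingDomain("memory component", 200_000_000, {"ddr_115"})
domains = [master, memory]

# A vote lowering the master component clock moves the whole first domain.
master.clock_hz = 300_000_000
print(clock_of("bus_206", domains), clock_of("mmu_116", domains), clock_of("ddr_115", domains))
# 300000000 300000000 200000000
```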
[0038] Next, at block 310, the clock speed of the data generator may be monitored and governed by a DCVS module 26 according to the setting of the master component clock 27. The memory interconnect 206 clock 29 may be set to the same speed as the data generator 201 because both components are associated with the same timing domain defined at block 305. At decision block 315, if the clock speed of the data generator changes (per the instructions of the DCVS module 26 working with the master component clock 27), the “yes” branch is followed to block 320 and the clock speed of the memory interconnect 206 is also adjusted to match that of the adjusted data generator clock. The method 300 then loops back to block 310 and monitoring continues. If the clock speed of the data generator remains unchanged at a given set point, the “no” branch is followed from decision block 315 back to block 310 and monitoring continues. In this way, embodiments of the solution seek to set the memory interconnect clock speed in view of the associated data generator clock speed: when the DCVS module 26 adjusts the clock speed of the data generator, the clock speed of the memory interconnect is adjusted to match. As such, when the processing speed of the data generator is high, the memory interconnect clock speed is also high to provide the needed bandwidth even though the clock speed of the memory component time domain may be relatively slower.
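The loop of blocks 310 through 320 can be sketched as follows, assuming the data generator clock readings arrive as an iterable (a stand-in for notifications from the DCVS module 26) and the interconnect clock is set through a caller-supplied callback. Both interfaces and the function name are hypothetical; a real implementation would run indefinitely and program clock hardware rather than collect values in a list.

```python
from typing import Callable, Iterable, Optional


def follow_generator_clock(generator_clock_mhz: Iterable[int],
                           set_interconnect_clock: Callable[[int], None]) -> None:
    """Keep the interconnect clock equal to the data generator clock."""
    current: Optional[int] = None
    for reading in generator_clock_mhz:       # block 310: monitor the generator clock
        # The first reading always differs from None, which performs the initial
        # alignment of the interconnect clock to the generator clock.
        if reading != current:                # block 315: did the clock change?
            set_interconnect_clock(reading)   # block 320: match the interconnect to it
            current = reading
        # "no" branch: nothing to adjust, keep monitoring


applied: list[int] = []
follow_generator_clock([300, 300, 600, 600, 300], applied.append)
print(applied)  # [300, 600, 300]
```

Only changes in the generator clock trigger an interconnect update, so repeated readings at the same set point cost nothing beyond the comparison.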
[0039] Certain steps in the processes or process flows described in this specification naturally precede others for the invention to function as described. However, the invention is not limited to the order of the steps described if such order or sequence does not alter the functionality of the invention. That is, it is recognized that some steps may be performed before, after, or in parallel (substantially simultaneously) with other steps without departing from the scope and spirit of the invention. In some instances, certain steps may be omitted or not performed without departing from the invention. Further, words such as “thereafter”, “then”, “next”, etc. are not intended to limit the order of the steps. These words are simply used to guide the reader through the description of the exemplary method.
[0040] Additionally, one of ordinary skill in programming is able to write computer code or identify appropriate hardware and/or circuits to implement the disclosed invention without difficulty based on the flow charts and associated description in this specification, for example. Therefore, disclosure of a particular set of program code instructions or detailed hardware devices or software instruction and data structures is not considered necessary for an adequate understanding of how to make and use the invention. The inventive functionality of the claimed computer implemented processes is explained in more detail in the above description and in conjunction with the drawings, which may illustrate various process flows.
[0041] In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code on a computer-readable device. Computer-readable devices include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that may be accessed by a computer. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer.
[0042] Therefore, although selected aspects have been illustrated and described in detail, it will be understood that various substitutions and alterations may be made therein without departing from the spirit and scope of the present invention, as defined by the following claims.