TRANSACTION DISPATCHER FOR MEMORY MANAGEMENT UNIT
20190087351 · 2019-03-21
Inventors
- Sadayan Ghows Ghani Sadayan Ebramsah Mo Abdul (Austin, TX, US)
- Piyush Patel (Cary, NC, US)
- Michael Trombley (Cary, NC, US)
- Rakesh Anigundi (San Diego, CA, US)
- Jason Norman (Cary, NC, US)
- Aaron Seyfried (Raleigh, NC, US)
CPC classification
G06F12/1027
PHYSICS
G06F12/1081
PHYSICS
G06F2212/7201
PHYSICS
International classification
G06F12/1027
PHYSICS
Abstract
According to various aspects, a memory management unit (MMU) having multiple parallel translation machines may collect transactions in an incoming transaction stream and select appropriate transactions to dispatch to the parallel translation machines. For example, the MMU may include a dispatcher that can identify different transactions that belong to the same transaction set (e.g., have the same address translation) and dispatch one transaction from each transaction set to an individual translation machine. As such, the dispatcher may be used to ensure that multiple parallel translation machines do not perform identical memory translations, as other transactions that share the same address translation may obtain the translation results from a translation lookaside buffer.
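The set-based dispatch summarized in the abstract can be illustrated with a short sketch. The page size, function names, and representation of a transaction as a virtual address below are illustrative assumptions for exposition, not elements of the disclosure.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # illustrative 4 KiB page size (assumption, not specified by the disclosure)

def group_into_sets(transactions):
    """Group transactions into sets that share one virtual-to-physical
    translation, here approximated as sharing the same virtual page."""
    sets = OrderedDict()
    for va in transactions:
        sets.setdefault(va >> PAGE_SHIFT, []).append(va)
    return sets

def dispatch(transactions):
    """Send one representative transaction per set to a translation
    machine; the remaining set members later reuse its TLB entry."""
    dispatched, deferred = [], []
    for txns in group_into_sets(transactions).values():
        dispatched.append(txns[0])   # one address translation per set
        deferred.extend(txns[1:])    # served from the TLB afterwards
    return dispatched, deferred
```

In this sketch, two accesses to the same 4 KiB page form one set, so only the first is sent for translation; the second waits for the shared result.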
Claims
1. A method for dispatching memory transactions, comprising: receiving a transaction stream comprising multiple memory transactions at a dispatcher coupled to a memory translation unit configured to perform multiple memory address translations in parallel; identifying, among the multiple memory transactions in the transaction stream, one or more transaction sets, wherein the one or more transaction sets each include one or more memory transactions that share a memory address translation; and dispatching, to the memory translation unit, one memory transaction for translation per transaction set such that the memory translation unit is configured to perform one memory address translation per transaction set.
2. The method recited in claim 1, wherein the one or more transaction sets include at least one transaction set in which one or more memory transactions that are not sent to the memory translation unit for translation are configured to use the memory address translation associated with the one memory transaction dispatched for translation.
3. The method recited in claim 2, wherein a translation lookaside buffer is configured to make the memory address translation associated with the dispatched memory transaction available to the one or more memory transactions that are not sent to the memory translation unit for translation.
4. The method recited in claim 3, wherein the memory address translation is a virtual-to-physical memory address translation.
5. The method recited in claim 1, wherein the memory translation unit includes multiple translation machines configured to perform the multiple memory address translations in parallel.
6. The method recited in claim 5, wherein the multiple translation machines are each configured to perform one memory address translation at a time.
7. The method recited in claim 1, wherein identifying the one or more transaction sets comprises: determining that the multiple memory transactions in the transaction stream include at least a first memory transaction and a second memory transaction configured to access the same memory address region; and grouping the first memory transaction with the second memory transaction.
8. The method recited in claim 1, wherein the dispatcher and the memory translation unit are integrated into a memory management unit.
9. An apparatus for processing memory transactions, comprising: a memory translation unit configured to perform multiple memory address translations in parallel; and a dispatcher coupled to the memory translation unit, wherein the dispatcher is configured to receive a transaction stream comprising multiple memory transactions, identify, among the multiple memory transactions in the transaction stream, one or more transaction sets that each include one or more memory transactions that share a memory address translation, and dispatch, to the memory translation unit, one memory transaction for translation per transaction set such that the memory translation unit is configured to perform one memory address translation per transaction set.
10. The apparatus recited in claim 9, wherein the one or more transaction sets include at least one transaction set in which one or more memory transactions that are not sent to the memory translation unit for translation are configured to use the memory address translation associated with the one memory transaction dispatched for translation.
11. The apparatus recited in claim 10, wherein a translation lookaside buffer is configured to make the memory address translation associated with the dispatched memory transaction available to the one or more memory transactions that are not sent to the memory translation unit for translation.
12. The apparatus recited in claim 11, wherein the memory address translation is a virtual-to-physical memory address translation.
13. The apparatus recited in claim 9, wherein the memory translation unit includes multiple translation machines configured to perform the multiple memory address translations in parallel.
14. The apparatus recited in claim 13, wherein the multiple translation machines are each configured to perform one memory address translation at a time.
15. The apparatus recited in claim 9, wherein the dispatcher is further configured to: determine that the multiple memory transactions in the transaction stream include at least a first memory transaction and a second memory transaction configured to access the same memory address region; and group the first memory transaction with the second memory transaction.
16. The apparatus recited in claim 9, wherein the dispatcher and the memory translation unit are integrated into a memory management unit.
17. An apparatus, comprising: means for receiving a transaction stream comprising multiple memory transactions; means for identifying, among the multiple memory transactions in the transaction stream, one or more transaction sets, wherein the one or more transaction sets each include one or more memory transactions that share a memory address translation; and means for dispatching, to a memory translation unit configured to perform multiple memory address translations in parallel, one memory transaction for translation per transaction set such that the memory translation unit is configured to perform one memory address translation per transaction set.
18. The apparatus recited in claim 17, wherein the one or more transaction sets include at least one transaction set in which one or more memory transactions that are not sent to the memory translation unit for translation are configured to use the memory address translation associated with the one memory transaction dispatched for translation.
19. The apparatus recited in claim 18, wherein a translation lookaside buffer is configured to make the memory address translation associated with the dispatched memory transaction available to the one or more memory transactions that are not sent to the memory translation unit for translation.
20. The apparatus recited in claim 19, wherein the memory address translation is a virtual-to-physical memory address translation.
21. The apparatus recited in claim 17, wherein the memory translation unit includes multiple translation machines configured to perform the multiple memory address translations in parallel.
22. The apparatus recited in claim 21, wherein the multiple translation machines are each configured to perform one memory address translation at a time.
23. The apparatus recited in claim 17, wherein the means for identifying comprises: means for determining that the multiple memory transactions in the transaction stream include at least a first memory transaction and a second memory transaction configured to access the same memory address region; and means for grouping the first memory transaction with the second memory transaction.
24. A non-transitory computer-readable storage medium having computer-executable instructions recorded thereon, the computer-executable instructions configured to cause one or more processors to: receive a transaction stream comprising multiple memory transactions; identify, among the multiple memory transactions in the transaction stream, one or more transaction sets, wherein the one or more transaction sets each include one or more memory transactions that share a memory address translation; and dispatch, to a memory translation unit configured to perform multiple memory address translations in parallel, one memory transaction for translation per transaction set such that the memory translation unit is configured to perform one memory address translation per transaction set.
25. The non-transitory computer-readable storage medium recited in claim 24, wherein the one or more transaction sets include at least one transaction set in which one or more memory transactions that are not sent to the memory translation unit for translation are configured to use the memory address translation associated with the one memory transaction dispatched for translation.
26. The non-transitory computer-readable storage medium recited in claim 25, wherein a translation lookaside buffer is configured to make the memory address translation associated with the dispatched memory transaction available to the one or more memory transactions that are not sent to the memory translation unit for translation.
27. The non-transitory computer-readable storage medium recited in claim 26, wherein the memory address translation is a virtual-to-physical memory address translation.
28. The non-transitory computer-readable storage medium recited in claim 24, wherein the memory translation unit includes multiple translation machines configured to perform the multiple memory address translations in parallel.
29. The non-transitory computer-readable storage medium recited in claim 28, wherein the multiple translation machines are each configured to perform one memory address translation at a time.
30. The non-transitory computer-readable storage medium recited in claim 24, wherein the computer-executable instructions are further configured to cause one or more processors to: determine that the multiple memory transactions in the transaction stream include at least a first memory transaction and a second memory transaction configured to access the same memory address region; and group the first memory transaction with the second memory transaction.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] A more complete appreciation of the various aspects and embodiments described herein and many attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings which are presented solely for illustration and not limitation, and in which:
DETAILED DESCRIPTION
[0021] Various aspects and embodiments are disclosed in the following description and related drawings to show specific examples relating to exemplary aspects and embodiments. Alternate aspects and embodiments will be apparent to those skilled in the pertinent art upon reading this disclosure, and may be constructed and practiced without departing from the scope or spirit of the disclosure. Additionally, well-known elements will not be described in detail or may be omitted so as to not obscure the relevant details of the aspects and embodiments disclosed herein.
[0022] The word "exemplary" is used herein to mean serving as an example, instance, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. Likewise, the term "embodiments" does not require that all embodiments include the discussed feature, advantage, or mode of operation.
[0023] The terminology used herein describes particular embodiments only and should not be construed to limit any embodiments disclosed herein. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Those skilled in the art will further understand that the terms "comprises," "comprising," "includes," and/or "including," as used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[0024] Further, various aspects and/or embodiments may be described in terms of sequences of actions to be performed by, for example, elements of a computing device. Those skilled in the art will recognize that various actions described herein can be performed by specific circuits (e.g., an application specific integrated circuit (ASIC)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of non-transitory computer-readable medium having stored thereon a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects described herein may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the aspects described herein, the corresponding form of any such aspects may be described herein as, for example, logic configured to and/or other structural components configured to perform the described action.
[0025] Before discussing exemplary apparatuses and methods for increasing hardware utilization, increasing throughput, and decreasing bus bandwidth requirements in a memory management unit (MMU) having multiple parallel translation machines as disclosed herein, a conventional computing system providing virtual-to-physical memory address translation is described. In this regard,
[0026] As seen in
[0027] As noted above, the computing system 100 also includes the CPU 104 having integrated therein the CPU MMU 102. The CPU MMU 102 may provide address translation services for CPU memory access requests (not shown) of the CPU MMU 102 in much the same manner that the SMMU 106 provides address translation services to the upstream devices 108, 110, and 112. After performing virtual-to-physical memory address translation of a CPU memory access request, the CPU MMU 102 may access the memory 132 and/or the slave device 134 via the system interconnect 136. In particular, a master port (M) 150 of the CPU 104 communicates with a slave port (S) 152 of the system interconnect 136. The system interconnect 136 then communicates via the master ports (M) 142 and 144 with the slave ports (S) 146 and 148, respectively, of the memory 132 and the slave device 134.
[0028] To improve performance, an MMU, such as the CPU MMU 102 and/or the SMMU 106, may provide a translation cache (not shown) for storing previously generated virtual-to-physical memory address translation mappings. However, in the case of an MMU that has multiple parallel translation machines, the performance benefits achieved through use of the translation cache (also referred to as a translation lookaside buffer, or TLB) may be reduced in scenarios in which the MMU processes transactions in an incoming transaction stream in order of arrival.
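The in-order scenario described above can be made concrete with a small sketch: when transactions are issued to the parallel machines strictly in arrival order, two transactions to the same page that both miss the TLB each start their own page-table walk. The page size and names below are assumptions for illustration only.

```python
PAGE_SHIFT = 12  # illustrative 4 KiB page size (assumption)

def naive_parallel_dispatch(stream, num_machines, tlb):
    """Issue transactions to the translation machines strictly in
    arrival order. Because there is no duplicate check, two TLB-missing
    transactions to the same page both perform the same walk."""
    walks = []
    for va in stream[:num_machines]:  # fill the machines in arrival order
        page = va >> PAGE_SHIFT
        if page not in tlb:           # TLB miss: this machine starts a walk
            walks.append(page)        # note: no check for an in-flight walk
    return walks

# Two back-to-back accesses to page 5 both miss and both walk:
# naive_parallel_dispatch([0x5000, 0x5008], 2, set()) -> [5, 5]
```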
[0029] In this regard,
[0030] However, because the MMU 200 shown in
[0031] Accordingly, to avoid the wasted hardware resources and bus bandwidth from having the parallel translation machines 222, 224 perform identical translations,
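One way a dispatcher might avoid those duplicate translations can be sketched as follows. The page size, names, and the dummy TLB fill are illustrative assumptions rather than the claimed implementation.

```python
PAGE_SHIFT = 12  # illustrative 4 KiB page size (assumption)

def set_aware_dispatch(stream, tlb):
    """Dispatch at most one translation per unique page: the first
    transaction of each set walks the page tables, and later members
    of the set reuse the resulting TLB entry instead."""
    walks, reuse = [], []
    pending = set()                  # pages with a walk already dispatched
    for va in stream:
        page = va >> PAGE_SHIFT
        if page in tlb or page in pending:
            reuse.append(va)         # translation cached or already in flight
        else:
            pending.add(page)
            walks.append(va)         # one page-table walk per transaction set
    for page in pending:
        tlb[page] = page             # simulate the TLB fill (dummy mapping)
    return walks, reuse
```

Compared with strict in-order issue, the second access to a shared page here costs no translation machine: it is deferred and served from the TLB once the representative walk completes.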
[0032] In various embodiments, with reference to block 430 in
[0033] According to various aspects,
[0034] According to various aspects,
[0035] According to various embodiments, the CPU 610 can also access the display controller 640 over the system bus 620 to control information sent to a display 670. The display controller 640 can include a memory controller 642 and memory 644 to store data to be sent to the display 670 in response to communications with the CPU 610. The display controller 640 sends information to the display 670 to be displayed via a video processor 660, which processes the information to be displayed into a format suitable for the display 670. The display 670 can include any suitable display type, including but not limited to a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an LED display, a touchscreen display, a virtual-reality headset, and/or any other suitable display.
[0036] Those skilled in the art will appreciate that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
[0037] Further, those skilled in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted to depart from the scope of the various aspects and embodiments described herein.
[0038] The various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or other such configurations).
[0039] The methods, sequences, and/or algorithms described in connection with the aspects disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of non-transitory computer-readable medium known in the art. An exemplary non-transitory computer-readable medium may be coupled to the processor such that the processor can read information from, and write information to, the non-transitory computer-readable medium. In the alternative, the non-transitory computer-readable medium may be integral to the processor. The processor and the non-transitory computer-readable medium may reside in an ASIC. The ASIC may reside in an IoT device. In the alternative, the processor and the non-transitory computer-readable medium may be discrete components in a user terminal.
[0040] In one or more exemplary aspects, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a non-transitory computer-readable medium. Computer-readable media may include storage media and/or communication media, including any non-transitory medium that may facilitate transferring a computer program from one place to another. Storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of a medium. The terms "disk" and "disc," which may be used interchangeably herein, include CD, laser disc, optical disc, DVD, floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[0041] While the foregoing disclosure shows illustrative aspects and embodiments, those skilled in the art will appreciate that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. Furthermore, in accordance with the various illustrative aspects and embodiments described herein, those skilled in the art will appreciate that the functions, steps, and/or actions in any methods described above and/or recited in any method claims appended hereto need not be performed in any particular order. Further still, to the extent that any elements are described above or recited in the appended claims in a singular form, those skilled in the art will appreciate that singular form(s) contemplate the plural as well unless limitation to the singular form(s) is explicitly stated.