Patent classifications
G06F2212/301
Microcontroller for memory management unit
One embodiment of the present invention includes a microcontroller coupled to a memory management unit (MMU). The MMU is coupled to a page table included in a physical memory, and the microcontroller is configured to perform one or more virtual memory operations associated with the physical memory and the page table. In operation, the microcontroller receives a page fault generated by the MMU in response to an invalid memory access via a virtual memory address. To remedy such a page fault, the microcontroller performs actions to map the virtual memory address to an appropriate location in the physical memory. By contrast, in prior-art systems, a fault handler would typically remedy the page fault. Advantageously, because the microcontroller executes these tasks locally with respect to the MMU and the physical memory, latency associated with remedying page faults may be decreased. Consequently, overall system performance may be increased.
Multi-granular cache coherence
Technologies are generally described for methods and systems effective to maintain coherence in a multi-core processor on a die. In an example, a method for processing a request for a particular block in a particular region may include analyzing, by a first processor, a first cache to determine whether there is a block indicator in the first cache associated with the particular block. The method may further include when the first processor determines that the block indicator is not present in the first cache, analyzing, by the first processor, the first cache to determine whether there is a region indicator associated with the particular region. The method may further include when the first processor determines that the region indicator is not present in the first cache, the method further includes sending, by the first processor, the request to the directory in the tile.
Arithmetic processing device, information processing device, and control method of arithmetic processing device
An arithmetic processing device which connects to a main memory, the arithmetic processor includes a cache memory which stores data, an arithmetic unit which performs an arithmetic operation for data stored in the cache memory, a first control device which controls the cache memory and outputs a first request which reads the data stored in the main memory, and a second control device which is connected to the main memory and transmits a plurality of second requests which are divided the first request output from the first control device, receives data corresponding to the plurality of second requests which is transmitted from the main memory and sends each of the data to the first control device.
Coprocessor operation bundling
In an embodiment, a processor includes a buffer in an interface unit. The buffer may be used to accumulate coprocessor instructions to be transmitted to a coprocessor. In an embodiment, the processor issues the coprocessor instructions to the buffer when ready to be issued to the coprocessor. The interface unit may accumulate the coprocessor instructions in the buffer, generating a bundle of instructions. The bundle may be closed based on various predetermined conditions and then the bundle may be transmitted to the coprocessor. If a sequence of coprocessor instructions appears consecutively in a program, the rate at which the instructions are provided to the coprocessor (on average) at least matches the rate at which the coprocessor consumes the instructions, in an embodiment.
Victim cache that supports draining write-miss entries
A caching system including a first sub-cache and a second sub-cache in parallel with the first sub-cache, wherein the second sub-cache includes a set of cache lines, line type bits configured to store an indication that a corresponding cache line of the set of cache lines is configured to store write-miss data, and an eviction controller configured to flush stored write-miss data based on the line type bits.
Write merging on stores with different tags
Techniques for caching data are provided that include receiving, by a caching system, a write memory command for a memory address, the write memory command associated with a first color tag, determining, by a first sub-cache of the caching system, that the memory address is not cached in the first sub-cache, determining, by second sub-cache of the caching system, that the memory address is not cached in the second sub-cache, storing first data associated with the first write memory command in a cache line of the second sub-cache, storing the first color tag in the second sub-cache, receiving a second write memory command for the cache line, the write memory command associated with a second color tag, merging the second color tag with the first color tag, storing the merged color tag, and evicting the cache line based on the merged color tag.
Managing low-level instructions and core interactions in multi-core processors
Managing the messages associated with memory pages stored in a main memory includes: receiving a message from outside the pipeline, and providing at least one low-level instruction to the pipeline for performing an operation indicated by the received message. Executing instructions in the pipeline includes: executing a series of low-level instructions in the pipeline, where the series of low-level instructions includes a first (second) set of low-level instructions converted from a first (second) high-level instruction. The second high-level instruction occurs after the first high-level instruction within a series of high-level instructions, and delaying insertion of the low-level instruction provided for performing the operation into an insertion position within the series of low-level instructions, where the delaying causes the insertion position to be between a final low-level instruction converted from the first high-level instruction and an initial low-level instruction converted from the second high-level instruction.
BLOCK DEVICE INTERFACE USING NON-VOLATILE PINNED MEMORY
A method comprising: receiving, at a block device interface, an instruction to write data, the instruction comprising a memory location of the data; copying the data to pinned memory; performing, by a vector processor, one or more invertible transforms on the data; and writing the data from the pinned memory to one or more storage devices asynchronously; wherein the pinned memory of the data corresponds to a location in pinned memory, the pinned memory being accessible by the vector processor and one or more other processors.
Coprocessor Operation Bundling
In an embodiment, a processor includes a buffer in an interface unit. The buffer may be used to accumulate coprocessor instructions to be transmitted to a coprocessor. In an embodiment, the processor issues the coprocessor instructions to the buffer when ready to be issued to the coprocessor. The interface unit may accumulate the coprocessor instructions in the buffer, generating a bundle of instructions. The bundle may be closed based on various predetermined conditions and then the bundle may be transmitted to the coprocessor. If a sequence of coprocessor instructions appears consecutively in a program, the rate at which the instructions are provided to the coprocessor (on average) at least matches the rate at which the coprocessor consumes the instructions, in an embodiment.
PREFETCH KILL AND REVIVAL IN AN INSTRUCTION CACHE
A system comprises a processor including a CPU core, first and second memory caches, and a memory controller subsystem. The memory controller subsystem speculatively determines a hit or miss condition of a virtual address in the first memory cache and speculatively translates the virtual address to a physical address. Associated with the hit or miss condition and the physical address, the memory controller subsystem configures a status to a valid state. Responsive to receipt of a first indication from the CPU core that no program instructions associated with the virtual address are needed, the memory controller subsystem reconfigures the status to an invalid state and, responsive to receipt of a second indication from the CPU core that a program instruction associated with the virtual address is needed, the memory controller subsystem reconfigures the status back to a valid state.