Patent classifications
G06F2212/6028
Method and an apparatus for pre-fetching and processing work for processor cores in a network processor
A method, and a system embodying the method, for pre-fetching and processing work for processor cores in a network processor are disclosed. The method comprises: requesting pre-fetch work by a requestor; determining that the work may be pre-fetched for the requestor; searching for the work to pre-fetch; and pre-fetching the found work into one of one or more pre-fetch work-slots associated with the requestor.
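The four claimed steps (request, eligibility check, search, placement into a work-slot) can be sketched as below. This is only an illustrative model under stated assumptions: the `Requestor` class, the two-slot default, the `accepts` filter, and the use of a Python `deque` as the work queue are all hypothetical and are not taken from the patent.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Requestor:
    """A processor core with a fixed number of pre-fetch work-slots (hypothetical model)."""
    name: str
    num_slots: int = 2
    slots: list = field(default_factory=list)

    def can_prefetch(self) -> bool:
        # Eligibility check: work may be pre-fetched only while a work-slot is free.
        return len(self.slots) < self.num_slots


def prefetch_work(requestor, work_queue, accepts=lambda work: True):
    """Handle one pre-fetch request: check eligibility, search, then fill a slot."""
    if not requestor.can_prefetch():
        return None  # every work-slot is occupied, so the request is not serviced
    # Search the queue for the first work item this requestor is allowed to take.
    for i, work in enumerate(work_queue):
        if accepts(work):
            del work_queue[i]             # remove it so no other core gets it
            requestor.slots.append(work)  # park it in a pre-fetch work-slot
            return work
    return None  # nothing suitable was found
```

The `accepts` callback stands in for whatever rule decides which work a given core may take (for example, group membership); it is an assumption of this sketch.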
In-Memory/Register Vector Radix Sort
Methods, systems and computer program products for accelerating sorting of data are provided herein. A computer-implemented method includes retrieving a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements, scattering the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises using one or more vector instructions, forming a bin cache line in a corresponding one of the plurality of bins, wherein the bin cache line comprises a group of the plurality of elements which were scattered to the corresponding one of the plurality of bins, writing the bin cache line from the corresponding one of the plurality of bins to a memory, and loading the bin cache line from the memory to the input buffer.
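A scalar sketch of the cache-line-buffered scatter described above, written as a least-significant-digit radix sort, might look like the following. It assumes 16 elements per cache line (64-byte lines, 4-byte keys), 4-bit digits, and non-negative keys below 2^16; the per-element loop stands in for the vector scatter instructions, which plain Python cannot express.

```python
LINE = 16  # assumed elements per cache line (64-byte line, 4-byte keys)


def radix_sort(keys, bits=4, key_bits=16):
    """LSD radix sort that forms full 'bin cache lines' before writing them out."""
    radix = 1 << bits
    mask = radix - 1
    data = list(keys)
    for shift in range(0, key_bits, bits):
        staging = [[] for _ in range(radix)]  # per-bin partially filled cache line
        memory = [[] for _ in range(radix)]   # per-bin output region in memory
        # Retrieve the input buffer one cache line at a time.
        for start in range(0, len(data), LINE):
            line = data[start:start + LINE]
            # Scatter each element of the line into its bin (vectorized in the patent).
            for x in line:
                b = (x >> shift) & mask
                staging[b].append(x)
                if len(staging[b]) == LINE:       # a full bin cache line has formed
                    memory[b].extend(staging[b])  # write it from the bin to memory
                    staging[b].clear()
        # Flush partially filled bin lines, then load the bins back as the next input.
        data = []
        for b in range(radix):
            memory[b].extend(staging[b])
            data.extend(memory[b])
    return data
```

Each pass keeps elements in arrival order within a bin, so the passes are stable and the final result is fully sorted.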
CACHE MANAGEMENT OPERATIONS USING STREAMING ENGINE
A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache management operation is performed on a cache in the memory using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.
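The two operating modes can be illustrated with a small software model. Everything here is an assumption for illustration: the nested count/stride address pattern for the first mode, the 64-byte line size, and the operation names `writeback`, `invalidate`, and `writeback_invalidate` for the second mode are hypothetical and not taken from the abstract.

```python
from itertools import product


def stream_addresses(base, elem_size, counts, strides):
    """Mode 1: yield byte addresses of a multi-dimensional data stream.

    counts[i] and strides[i] (stride in elements) describe loop level i,
    innermost level first.
    """
    # product() varies its last argument fastest, so pass the levels outermost-first.
    levels = [range(c) for c in reversed(counts)]
    outer_first_strides = list(reversed(strides))
    for idx in product(*levels):
        offset = sum(i * s for i, s in zip(idx, outer_first_strides))
        yield base + offset * elem_size


def block_line_addresses(start, length, line_size=64):
    """Mode 2: cache-line-aligned addresses covering [start, start + length)."""
    if length <= 0:
        return []
    first = start - start % line_size
    return list(range(first, start + length, line_size))


def block_cache_op(cache, start, length, op, line_size=64):
    """Apply a block cache-management op; cache maps line address -> dirty flag."""
    written_back = []
    for line in block_line_addresses(start, length, line_size):
        if line not in cache:
            continue
        if cache[line] and op in ("writeback", "writeback_invalidate"):
            written_back.append(line)  # dirty line is written back to memory
            cache[line] = False
        if op in ("invalidate", "writeback_invalidate"):
            del cache[line]
    return written_back
```

For example, `stream_addresses(0x1000, 4, [3, 2], [1, 8])` walks three consecutive 4-byte elements, then jumps 8 elements ahead for the second row.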
Command interface and pre-fetch architecture
A data storage system includes a memory including a plurality of memory cells; and control logic configured to receive a first data string and determine a data type of the first data string. If the first data string is a combination command, the control logic obtains a plurality of sub-commands based on the first data string. Meanwhile, the control logic receives a second data string, determines that it represents an address, and decodes the address. While decoding the address or otherwise processing the second data string, the control logic performs a system operation specified by one of the sub-commands. The control logic also performs a memory operation, specified by another of the sub-commands, on one or more of the plurality of memory cells in accordance with the decoded address.
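The overlap of a system operation with address decoding can be sketched with a thread pool. The opcode table, the `clear_status` system operation, and the 8-bit page field in the address layout are all hypothetical values chosen for illustration; the abstract does not specify any encodings.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical opcode table: combination command -> (system sub-command, memory sub-command).
COMBINATION_COMMANDS = {
    0xA1: ("clear_status", "read"),
    0xA2: ("clear_status", "write"),
}


class Device:
    def __init__(self):
        self.cells = {}        # (block, page) -> stored value
        self.status = "error"  # stand-in for a status register

    def system_op(self, name):
        # The illustrative system operation just clears the status register.
        if name == "clear_status":
            self.status = "ready"


def decode_address(raw):
    """Hypothetical address layout: upper bits select the block, low 8 bits the page."""
    return raw >> 8, raw & 0xFF


def execute(device, first, second, data=None):
    if first not in COMBINATION_COMMANDS:
        raise ValueError(f"0x{first:02X} is not a combination command")
    system_cmd, memory_cmd = COMBINATION_COMMANDS[first]
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(device.system_op, system_cmd)  # system op runs...
        address = decode_address(second)                      # ...while the address decodes
        pending.result()                                      # wait for the system op
    if memory_cmd == "write":
        device.cells[address] = data
        return None
    return device.cells.get(address)
```

The memory operation is issued only after both the decode and the system operation finish, which mirrors the ordering the abstract describes.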
METHOD AND APPARATUS WITH ACCELERATOR PROCESSING
An accelerator includes processing elements configured to perform an operation associated with an instruction received from a host processor; hierarchical memories configured to be accessible by any one or more of the processing elements; and sub-cores configured to prefetch data associated with the operation into a memory at a corresponding level of the hierarchical memories.
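One common way a prefetching sub-core can keep processing elements busy is double buffering within one memory level. The sketch below assumes that technique (it is not stated in the abstract) and runs sequentially, whereas in hardware the prefetch and the computation would overlap. `dram_tiles` and `compute` are hypothetical names.

```python
def run_with_prefetch(dram_tiles, compute):
    """Process tiles with a double-buffered on-chip memory level.

    A sub-core stages tile i + 1 into the free buffer while the processing
    elements work on tile i (sequential here; concurrent in hardware).
    Returns the per-tile results and the number of prefetches issued.
    """
    if not dram_tiles:
        return [], 0
    buffers = [list(dram_tiles[0]), None]  # initial prefetch of tile 0
    prefetches = 1
    results = []
    for i in range(len(dram_tiles)):
        if i + 1 < len(dram_tiles):
            buffers[(i + 1) % 2] = list(dram_tiles[i + 1])  # sub-core prefetch
            prefetches += 1
        results.append(compute(buffers[i % 2]))             # PEs use staged data
    return results, prefetches
```

The prefetch always targets the buffer the processing elements are not reading, so staging the next tile never overwrites data still in use.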
Software assisted data address prefetch
Three new software instructions assist a processor in performing indirect prefetching and in managing a next-to-prefetch address list. The software instructions populate hardware register locations according to a hardware register description, which is a data structure of at least seven fields. Multiple instances of the data structure, each corresponding to one of multiple threads running concurrently, together form an indirect-prefetch-tracker table. The indirect-prefetch-tracker table helps the processor efficiently perform indirect prefetching from random (not necessarily contiguous) memory locations, and it reduces the processor-core real estate dedicated to controlling and managing data prefetch and load operations.
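Indirect prefetching targets access patterns such as `A[B[j]]`, where the address to fetch depends on a value loaded from an index array. The sketch below models one tracker-table entry. The abstract does not list the seven fields, so the field set here (thread id, index array, data base, element size, next position, distance, enabled flag) is entirely hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndirectPrefetchEntry:
    # Hypothetical field set; the abstract only says there are at least seven fields.
    thread_id: int
    index_array: list   # stands in for the index array B in memory
    data_base: int      # base byte address of the target array A
    elem_size: int      # bytes per element of A
    next_pos: int = 0   # next position in B not yet prefetched
    distance: int = 4   # how far ahead of the consumer to run
    enabled: bool = True


def next_prefetch_addresses(entry, consumer_pos):
    """Return addresses of A[B[j]] for every j up to consumer_pos + distance."""
    if not entry.enabled:
        return []
    limit = min(consumer_pos + entry.distance, len(entry.index_array))
    addrs = []
    while entry.next_pos < limit:
        idx = entry.index_array[entry.next_pos]
        addrs.append(entry.data_base + idx * entry.elem_size)
        entry.next_pos += 1
    return addrs
```

A per-thread table can then be a mapping such as `{entry.thread_id: entry}`, with each thread advancing only its own entry.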
Host-Assisted Memory-Side Prefetcher
Methods, apparatuses, and techniques related to a host-assisted memory-side prefetcher are described herein. In general, prefetchers monitor the pattern of memory-address requests by a host device and use the pattern information to determine or predict future memory-address requests and fetch data associated with those predicted requests into a faster memory. In many cases, prefetchers that can make predictions with high performance use appreciable processing and computing resources, power, and cooling. Generally, however, producing a prefetching configuration that the prefetcher uses involves more resources than making predictions. The described host-assisted memory-side prefetcher uses the greater computing resources of the host device to produce at least an updated prefetching configuration. The memory-side prefetcher uses the prefetching configuration to predict the data to prefetch into the faster memory, which allows a higher-performance prefetcher to be implemented in the memory device with a reduced resource burden on the memory device.
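The division of labor, with the host doing the expensive analysis and the memory device doing the cheap prediction, can be sketched as follows. The stride-histogram analysis and the `{"stride", "degree"}` configuration format are assumptions chosen for brevity; the abstract does not say what the prefetching configuration contains.

```python
from collections import Counter


def host_build_config(trace, degree=2):
    """Host side: derive a prefetching configuration from an address trace."""
    deltas = [b - a for a, b in zip(trace, trace[1:])]
    if not deltas:
        return {"stride": 0, "degree": 0}
    stride, _ = Counter(deltas).most_common(1)[0]  # most frequent address delta
    return {"stride": stride, "degree": degree}


class MemorySidePrefetcher:
    """Memory side: cheap prediction using the host-supplied configuration."""

    def __init__(self):
        self.config = {"stride": 0, "degree": 0}

    def update_config(self, config):
        self.config = dict(config)

    def on_access(self, addr):
        stride, degree = self.config["stride"], self.config["degree"]
        if stride == 0:
            return []
        # Predict the next `degree` addresses along the learned stride.
        return [addr + stride * k for k in range(1, degree + 1)]
```

Only `on_access` runs on every request, so the memory device never performs the trace analysis itself.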
Hint model updating using automated browsing clusters
Embodiments seek to improve prefetch hinting by using automated browsing clusters to generate and update hinting models used for machine-generated hints. For example, hinting machines can include browsing clusters that autonomously fetch web pages in response to update triggers (e.g., client web page requests, scheduled web crawling, etc.) and generate timing and/or other hinting-related feedback relating to which resources were used to load the fetched web pages. The hinting machines can use the hinting feedback to generate and/or update hinting models, which can be used for machine-generation of hints. Some embodiments can provide preliminary hinting functionality in response to client hinting requests, for example, when hinting models for a requested page are insufficient (e.g., unavailable, outdated, etc.). For example, without having a sufficient hinting model in place, the hinting machine can fetch the page to generate preliminary hinting feedback, which it can use to machine-generate preliminary hints.
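A toy version of the feedback-to-model step might keep resources that appear in at least a given fraction of automated page loads, ordered by how early they are needed. The feedback format (resource URL mapped to first-request time in milliseconds), the 0.5 threshold, and the function names are hypothetical; the abstract does not describe the model.

```python
from collections import defaultdict


def update_hint_model(feedback, min_fraction=0.5):
    """feedback: list of loads; each maps resource URL -> first-request time (ms)."""
    counts = defaultdict(int)
    times = defaultdict(list)
    for load in feedback:
        for url, t in load.items():
            counts[url] += 1
            times[url].append(t)
    n = len(feedback)
    keep = [u for u in counts if counts[u] / n >= min_fraction]
    # Resources needed earlier in the page load are hinted first.
    return sorted(keep, key=lambda u: sum(times[u]) / len(times[u]))


def hints_for(page, models, fetch_page):
    """Return hints, falling back to a preliminary model from a single fetch."""
    model = models.get(page)
    if model is None:  # no usable model yet: generate preliminary hints
        model = update_hint_model([fetch_page(page)])
    return model
```

Here `fetch_page` stands in for a browsing-cluster fetch that reports timing feedback for one page load.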
Process data caching through iterative feedback
Systems and methods for improved process caching through iterative feedback are disclosed. In embodiments, a computer implemented method comprises retrieving updated metadata of a process to be executed, wherein the updated metadata includes information regarding cache misses from a prior execution of the process; automatically modifying a setting of a data stream control register based on the updated metadata; automatically setting a hint at a data cache block touch module; performing an initial execution of the process after the steps of retrieving the updated metadata, automatically modifying the setting of the data stream control register, and automatically setting the hint at the data cache block touch module; and modifying the updated metadata of the process after the execution of the process based on cache miss statistical data gathered during the execution of the process, to produce newly updated metadata.
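The feedback loop (read metadata, configure, execute, refresh metadata) can be sketched as below. The policy is an assumption: prefetch depth grows by one step per 10% miss rate, capped at 7, and touch hints go to the two addresses that missed most. The `dscr_depth` and `touch_hints` keys are hypothetical names for the data stream control register setting and the data cache block touch hints.

```python
def plan_from_metadata(meta, max_depth=7):
    """Turn prior cache-miss statistics into settings (hypothetical policy)."""
    # Prefetch depth grows with the miss rate: one step per 10% of accesses missed.
    depth = min(max_depth, (10 * meta["misses"]) // max(meta["accesses"], 1))
    # Touch hints for the two addresses that missed most often.
    hot = sorted(meta["miss_addrs"], key=meta["miss_addrs"].get, reverse=True)[:2]
    return {"dscr_depth": depth, "touch_hints": hot}


def run_iteration(meta, execute):
    """One feedback round: configure, execute, and return refreshed metadata."""
    plan = plan_from_metadata(meta)
    stats = execute(plan)  # run the process with the settings applied
    new_meta = {
        "accesses": stats["accesses"],
        "misses": stats["misses"],
        "miss_addrs": dict(stats["miss_addrs"]),
    }
    return new_meta, plan
```

Calling `run_iteration` repeatedly, each time with the metadata returned by the previous call, gives the iterative refinement the abstract describes.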
Stacked memory dice for combined access operations
Methods, systems, and devices for stacked memory dice and combined access operations are described. A device may include multiple memory dice. One die may be configured as a master, and another may be configured as a slave. The master may communicate with a host device. A slave may be coupled with the master but not the host device. The device may include a first die (e.g., master) and a second die (e.g., slave). The first die may be coupled with a host device and configured to output a set of data in response to a read command. The first die may supply a first subset of the data and obtain a second subset of the data from the second die. In some cases, the first die may select, based on a data rate, a modulation scheme (e.g., PAM4, NRZ, etc.) and output the data using the selected modulation scheme.
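The master/slave read path and the rate-based choice of modulation can be modeled in a few lines. The 16 Gbps switch-over threshold is a hypothetical value; the bits-per-symbol figures (1 for NRZ, 2 for PAM4) are standard properties of those schemes.

```python
PAM4_THRESHOLD_GBPS = 16  # hypothetical data rate at which the master switches to PAM4
BITS_PER_SYMBOL = {"NRZ": 1, "PAM4": 2}


class SlaveDie:
    """Coupled to the master die only; never talks to the host directly."""

    def __init__(self, data):
        self.data = data  # address -> bytes held on this die

    def read(self, addr):
        return self.data[addr]


class MasterDie:
    """The only die wired to the host; assembles the full read response."""

    def __init__(self, data, slave):
        self.data = data
        self.slave = slave

    def read(self, addr, data_rate_gbps):
        first = self.data[addr]         # first subset, supplied locally
        second = self.slave.read(addr)  # second subset, obtained from the slave
        payload = first + second
        scheme = "PAM4" if data_rate_gbps >= PAM4_THRESHOLD_GBPS else "NRZ"
        symbols = len(payload) * 8 // BITS_PER_SYMBOL[scheme]
        return scheme, payload, symbols
```

Because PAM4 carries two bits per symbol, the same 4-byte response needs 16 symbols instead of the 32 that NRZ would need.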