Patent classifications
G06F2212/6026
SYSTEMS AND METHODS FOR ADAPTIVE HYBRID HARDWARE PRE-FETCH
An apparatus includes a processor core and a memory hierarchy. The memory hierarchy includes main memory and one or more caches between the main memory and the processor core. A plurality of hardware pre-fetchers are coupled to the memory hierarchy and a pre-fetch control circuit is coupled to the plurality of hardware pre-fetchers. The pre-fetch control circuit is configured to compare changes in one or more cache performance metrics over two or more sampling intervals and control operation of the plurality of hardware pre-fetchers in response to a change in one or more performance metrics between at least a first sampling interval and a second sampling interval.
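The control policy lends itself to a short sketch. Below is a minimal illustration of the interval-to-interval comparison, assuming an invented PrefetcherKind enum, a single miss-rate metric, and made-up thresholds; the abstract covers arbitrary metrics and any number of pre-fetchers.

```cpp
#include <array>
#include <cstdio>

enum PrefetcherKind { STRIDE, STREAM, NEXT_LINE, NUM_PREFETCHERS };

struct PrefetchControl {
    std::array<bool, NUM_PREFETCHERS> enabled{true, true, true};
    double prev_miss_rate = 0.0;

    // Called once per sampling interval with the cache miss rate
    // measured during that interval.
    void on_interval_end(double miss_rate, PrefetcherKind under_test) {
        double delta = miss_rate - prev_miss_rate;
        // If the metric degraded between intervals while this
        // prefetcher was active, throttle it; if it improved, keep
        // (or re-enable) it. Thresholds here are illustrative.
        if (delta > 0.01)       enabled[under_test] = false;
        else if (delta < -0.01) enabled[under_test] = true;
        prev_miss_rate = miss_rate;
    }
};

int main() {
    PrefetchControl ctl;
    ctl.on_interval_end(0.12, STRIDE);  // first sampling interval
    ctl.on_interval_end(0.15, STRIDE);  // miss rate rose: disable stride
    std::printf("stride enabled: %d\n", ctl.enabled[STRIDE]);
}
```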
Streaming engine with early and late address and loop count registers to track architectural state
A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces the addresses of data elements. A stream head register stores the data elements next to be supplied to functional units for use as operands. For each of the nested loops, the streaming engine stores an early address and an early loop count for the data elements next to be fetched, and a late address and a late loop count for the data element in the stream head register.
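The early/late bookkeeping can be sketched as a queue of fetch-time snapshots, assuming two loop levels and invented names (ElemState, StreamEngine): elements in flight between the fetch side and the stream head carry the state needed to recover the late values.

```cpp
#include <cstdint>
#include <queue>

constexpr int LEVELS = 2;

// Snapshot of the address generator at the moment an element is fetched.
struct ElemState {
    uint64_t addr;
    uint32_t cnt[LEVELS];
};

// "Early" state: where the fetch side of the stream currently is.
// "Late"  state: the snapshot belonging to the element now sitting in
// the stream head register, recovered as elements drain through.
struct StreamEngine {
    ElemState early{};                // next element to be fetched
    ElemState late{};                 // element now in the stream head
    std::queue<ElemState> in_flight;  // fetched but not yet at the head

    void fetch_next(uint64_t elem_size, uint32_t inner_trip) {
        in_flight.push(early);        // remember this element's state
        early.addr += elem_size;
        if (++early.cnt[0] == inner_trip) {  // inner loop wrapped
            early.cnt[0] = 0;
            ++early.cnt[1];
        }
    }

    void advance_head() {             // an element reaches the head
        late = in_flight.front();
        in_flight.pop();
    }
};

int main() {
    StreamEngine se;
    se.fetch_next(4, 8);   // fetch two 4-byte elements of a stream
    se.fetch_next(4, 8);   // whose inner loop trip count is 8
    se.advance_head();     // first element reaches the stream head
}
```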
METHOD AND APPARATUS FOR DISTRIBUTED AND COOPERATIVE COMPUTATION IN ARTIFICIAL NEURAL NETWORKS
An apparatus and method are described for distributed and cooperative computation in artificial neural networks. For example, one embodiment of an apparatus comprises: an input/output (I/O) interface; a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each of the plurality of processing units to process at least a portion of the data for the input neurons and synaptic weights to generate partial results; and an interconnect communicatively coupling the plurality of processing units, each of the processing units to share the partial results with one or more other processing units over the interconnect, the other processing units using the partial results to generate additional partial results or final results. The processing units may share data including input neurons and weights over the shared input bus.
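The partial-result exchange reduces to a split dot product. In this sketch the interconnect and shared input bus are modeled as plain function calls, and all names and sizes are illustrative.

```cpp
#include <cstdio>
#include <vector>

// One "processing unit": computes a partial dot product over its
// slice of the input neurons and their synaptic weights.
double partial_dot(const std::vector<double>& neurons,
                   const std::vector<double>& weights,
                   size_t begin, size_t end) {
    double acc = 0.0;
    for (size_t i = begin; i < end; ++i) acc += neurons[i] * weights[i];
    return acc;
}

int main() {
    std::vector<double> neurons{1, 2, 3, 4};
    std::vector<double> weights{0.5, 0.25, 0.1, 0.9};
    // Two units, each handling half the inputs; sharing the partials
    // over the interconnect is modeled as returning them to the caller.
    double p0 = partial_dot(neurons, weights, 0, 2);
    double p1 = partial_dot(neurons, weights, 2, 4);
    std::printf("output neuron = %f\n", p0 + p1);  // combine partials
}
```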
Resource usage prediction for cluster provisioning
A system for provisioning resources includes a processor and a memory. The processor is configured to receive a time series of past usage data. The past usage data comprises process usage data and instance usage data. The processor is further configured to determine upcoming usage data based at least in part on the time series of past usage data, and to provision a computing system according to the upcoming usage data.
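A minimal sketch of the predict-then-provision flow, using simple linear extrapolation as a stand-in for whatever predictor the system actually uses; the headroom rule and values are illustrative.

```cpp
#include <cstdio>
#include <vector>

// Stand-in predictor: extrapolate the most recent trend one step ahead.
double predict_next(const std::vector<double>& usage) {
    size_t n = usage.size();
    if (n < 2) return usage.empty() ? 0.0 : usage.back();
    double trend = usage[n - 1] - usage[n - 2];
    return usage[n - 1] + trend;
}

int main() {
    std::vector<double> instance_hours{40, 44, 50, 55};  // past usage
    double upcoming = predict_next(instance_hours);      // 60 expected
    int instances = static_cast<int>(upcoming / 10.0) + 1;  // headroom
    std::printf("provision %d instances for %.1f hours\n",
                instances, upcoming);
}
```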
Method, apparatus, and system for memory bandwidth aware data prefetching
An apparatus, method, and system for memory bandwidth aware data prefetching are presented. The method may comprise monitoring a number of request responses received in an interval at a current prefetch request generation rate, comparing the number of request responses received in the interval to at least a first threshold, and adjusting the current prefetch request generation rate to an updated prefetch request generation rate by selecting the updated prefetch request generation rate from a plurality of prefetch request generation rates, based on the comparison. The request responses may be NACK or RETRY responses. The method may further comprise either retaining the current prefetch request generation rate or selecting a maximum prefetch request generation rate as the updated prefetch request generation rate in response to an indication that prefetching is accurate.
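The rate-selection step might look like the following sketch, with an invented rate table and made-up thresholds (the abstract gives no concrete values); NACK/RETRY counts drive the choice, and an accuracy indication overrides it.

```cpp
#include <algorithm>
#include <array>
#include <cstdio>

// Plurality of prefetch request generation rates (illustrative values).
constexpr std::array<int, 4> kRates{1, 2, 4, 8};

int select_rate(int rate_idx, int nacks_in_interval, bool accurate) {
    if (accurate)                      // accurate prefetching: select
        return kRates.size() - 1;      // the maximum rate
    if (nacks_in_interval > 16)        // heavy pushback: throttle down
        return std::max(rate_idx - 1, 0);
    if (nacks_in_interval < 4)         // light pushback: ramp up
        return std::min(rate_idx + 1, (int)kRates.size() - 1);
    return rate_idx;                   // otherwise retain current rate
}

int main() {
    int idx = 2;                       // start at 4 requests per window
    idx = select_rate(idx, 20, false); // 20 NACKs this interval
    std::printf("updated rate: %d requests/window\n", kRates[idx]);
}
```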
Machine learning to improve caching efficiency in a storage system
A system and method improve caching efficiency in a data storage system by performing machine learning processes on metadata relating to extents of data blocks, rather than on the individual blocks themselves. Thus, once the storage devices are divided into extents, various metadata regarding access to the blocks within each extent are aggregated, and per-extent features are extracted. These features are used to train a data regression model that is subsequently used to infer the most likely “hotness” value for each extent at a future time. These predicted values, which may be further classified as, e.g., “hot”, “warm”, and “cold” using thresholds, are used to implement the cache replacement policy. Embodiments scale to large and multi-layered caches and may avoid common caching problems such as thrashing by adjusting the extent size. Policy goal functions may be optimized by dynamically adjusting the classification thresholds.
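The inference and classification step can be sketched as follows, with an invented linear model standing in for the trained regression and illustrative feature names, weights, and thresholds.

```cpp
#include <cstdio>

struct ExtentFeatures {     // aggregated over all blocks in the extent
    double reads_per_hour;
    double writes_per_hour;
    double recency;         // hours since last access
};

// Stand-in for the trained regression model: a fixed linear combination.
double predict_hotness(const ExtentFeatures& f) {
    return 0.6 * f.reads_per_hour + 0.3 * f.writes_per_hour - 0.2 * f.recency;
}

// Thresholds are the tunable knobs the abstract says can be adjusted
// dynamically to optimize the policy goal function.
const char* classify(double hotness) {
    if (hotness > 10.0) return "hot";
    if (hotness > 2.0)  return "warm";
    return "cold";
}

int main() {
    ExtentFeatures ext{25.0, 4.0, 1.5};
    double h = predict_hotness(ext);
    std::printf("hotness %.1f -> %s\n", h, classify(h));
}
```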
Selectively utilizing a read page cache mode in a memory subsystem
A method is described which includes receiving, by a memory subsystem, a memory command targeted at a memory array; determining, by the memory subsystem, whether the memory command is a high priority memory command; and determining whether the memory subsystem is processing any non-high-priority memory commands. The memory subsystem enables a read page cache mode for processing the memory command in response to determining that (1) the memory command is a high priority memory command and (2) the memory subsystem is not processing any non-high-priority memory commands. Thereafter, the memory subsystem processes the memory command using the read page cache mode.
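The enabling condition reduces to a two-part predicate; this sketch models it directly with illustrative types.

```cpp
#include <cstdio>

struct MemCommand { bool high_priority; };

bool should_use_read_page_cache(const MemCommand& cmd,
                                int non_high_priority_in_flight) {
    // (1) the command is high priority, and
    // (2) no non-high-priority commands are being processed.
    return cmd.high_priority && non_high_priority_in_flight == 0;
}

int main() {
    MemCommand cmd{true};
    std::printf("read page cache mode: %s\n",
                should_use_read_page_cache(cmd, 0) ? "enabled" : "disabled");
}
```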
RANGE PREFETCH INSTRUCTION
In response to an instruction decoder decoding a range prefetch instruction specifying first and second address-range-specifying parameters and a stride parameter, prefetch circuitry controls, depending on the first and second address-range-specifying parameters and the stride parameter, prefetching of data from a plurality of specified ranges of addresses into at least one cache. A start address and size of each specified range depend on the first and second address-range-specifying parameters. The stride parameter specifies an offset between the start addresses of successive specified ranges. Use of the range prefetch instruction helps to improve programmability and to strike a better balance between prefetch coverage and the circuit area of the prefetch circuitry.
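The semantics can be approximated in software. This sketch issues a GCC/Clang __builtin_prefetch hint per cache line, with a base/size pair standing in for the two address-range-specifying parameters and an assumed explicit range count.

```cpp
#include <cstddef>

// base/size play the role of the address-range-specifying parameters;
// stride offsets the start of each successive range; the 64-byte line
// size and the explicit range count are assumptions of this sketch.
void range_prefetch(const char* base, size_t range_size,
                    ptrdiff_t stride, int num_ranges) {
    for (int r = 0; r < num_ranges; ++r) {
        const char* start = base + r * stride;
        for (size_t off = 0; off < range_size; off += 64)
            __builtin_prefetch(start + off);  // GCC/Clang prefetch hint
    }
}

int main() {
    static char buf[1 << 16];
    range_prefetch(buf, 256, 4096, 4);  // four 256-byte ranges, 4 KiB apart
}
```

On hardware implementing the actual instruction, a single decoded range prefetch instruction would take the place of this entire software loop.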
Self-consistent structures for secure transmission and temporary storage of sensitive data
Implementations provide self-consistent, temporary, secure storage of information. An example system includes short-term memory storing a plurality of key records and a cache storing a plurality of data records. The key records and data records are locatable using participant identifiers. Each key record includes a nonce and each data record includes an encrypted portion. The key records are deleted periodically. The system also includes memory storing instructions that cause the system to receive query parameters that include first participant identifiers and to obtain a first nonce. The first nonce is associated with the first participant identifiers in the short-term memory. The instructions also cause the system to obtain data records associated with the first participant identifiers in the cache, to build an encryption key using the nonce and the first participant identifiers, and to decrypt the encrypted portion of the obtained data records using the encryption key.
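A toy sketch of the key-derivation and decryption flow; std::hash and XOR stand in for a real KDF and cipher and offer no actual security. The point is the data flow: the nonce plus the participant identifiers yield the key, so deleting the nonce record makes the stored data unrecoverable.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>

// Stand-in KDF: hash the participant identifiers, mix in the nonce.
uint64_t build_key(uint64_t nonce, const std::string& participant_ids) {
    return std::hash<std::string>{}(participant_ids) ^ nonce;
}

// Toy stream cipher (XOR is its own inverse, so this both encrypts
// and decrypts). A real implementation would use an actual cipher.
std::string decrypt(const std::string& encrypted, uint64_t key) {
    std::string out = encrypted;
    for (size_t i = 0; i < out.size(); ++i)
        out[i] ^= static_cast<char>(key >> (8 * (i % 8)));
    return out;
}

int main() {
    uint64_t key = build_key(/*nonce=*/0x1234, "alice|bob");
    std::string ct = decrypt("hello", key);           // encrypt
    std::printf("%s\n", decrypt(ct, key).c_str());    // round-trips
}
```

Once the key record holding the nonce is periodically deleted, build_key can no longer be reproduced, which is what makes the storage temporary.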
LINK STACK BASED INSTRUCTION PREFETCH AUGMENTATION
A computer-implemented method of performing link stack based prefetch augmentation using sequential prefetching includes observing a call instruction in a program being executed and pushing a return address onto a link stack for use in processing the next instruction. A stream of instructions is prefetched starting from the cached line address of the next instruction and is stored in an instruction cache.
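The call-time behavior can be sketched as follows, assuming a 64-byte line, a four-line stream depth, and an invented prefetch_line stub for the fill request.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr uint64_t kLineBytes = 64;
constexpr int kStreamDepth = 4;        // lines prefetched per call

std::vector<uint64_t> link_stack;

// Stub for the fill request the hardware would send to the I-cache.
void prefetch_line(uint64_t line_addr) {
    std::printf("prefetch i-cache line 0x%llx\n",
                (unsigned long long)line_addr);
}

void on_call(uint64_t call_pc, uint64_t insn_bytes) {
    uint64_t return_addr = call_pc + insn_bytes;
    link_stack.push_back(return_addr);          // for the eventual return
    uint64_t line = return_addr & ~(kLineBytes - 1);
    for (int i = 0; i < kStreamDepth; ++i)      // sequential stream from
        prefetch_line(line + i * kLineBytes);   // the return address line
}

int main() { on_call(0x4000, 4); }
```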