SYSTEMS AND METHODS FOR READING AND WRITING SPARSE DATA IN A NEURAL NETWORK ACCELERATOR
Disclosed herein are a system, a method, and a device for reading and writing sparse data in a neural network accelerator. A mask identifying byte positions within a data word having non-zero values in memory can be accessed. Each bit of the mask can have a first value or a second value: the first value indicates that a byte of the data word corresponds to a non-zero byte value, and the second value indicates that the byte of the data word corresponds to a zero byte value. The data word can be modified so that non-zero byte values are stored at an end of a first side of the data word in the memory, and any zero byte values are stored in the remainder of the data word. The modified data word can be written to the memory via at least a first slice of a plurality of slices that is configured to access the first side of the data word in the memory.
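A minimal sketch of the byte-packing step in Python, assuming 8-byte words and packing toward the low (first) end; the names `pack_word` and `unpack_word` and the bit-per-byte mask layout are illustrative, not taken from the patent:

```python
def pack_word(word: bytes) -> tuple[bytes, int]:
    """Pack the non-zero bytes of a data word toward the low end.

    Returns the packed word (zero-padded back to the original length)
    and a mask whose bit i is 1 when byte i of the original word was
    non-zero (the "first value") and 0 when it was zero (the "second
    value").
    """
    mask = 0
    nonzero = bytearray()
    for i, b in enumerate(word):
        if b != 0:
            mask |= 1 << i
            nonzero.append(b)
    return bytes(nonzero).ljust(len(word), b"\x00"), mask


def unpack_word(packed: bytes, mask: int, length: int) -> bytes:
    """Reverse pack_word: scatter the packed bytes back to the positions
    marked by the mask, filling everything else with zeros."""
    out = bytearray(length)
    src = 0
    for i in range(length):
        if mask & (1 << i):
            out[i] = packed[src]
            src += 1
    return bytes(out)


word = bytes([0, 7, 0, 0, 3, 0, 0, 9])
packed, mask = pack_word(word)
assert packed == bytes([7, 3, 9, 0, 0, 0, 0, 0])
assert mask == 0b10010010
assert unpack_word(packed, mask, len(word)) == word
```

Because every non-zero byte lands on one side of the word, a slice that covers only that side can service the write, which is the point of the slice-based memory access in the abstract.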
Policy-based system interface for a real-time autonomous system
- Joydeep Ray
- Ben Ashbaugh
- Prasoonkumar Surti
- Pradeep Ramani
- Rama Harihara
- Jerin C. Justin
- Jing Huang
- Xiaoming Cui
- Timothy B. Costa
- Ting Gong
- Elmoustapha Ould-Ahmed-Vall
- Kumar Balasubramanian
- Anil Thomas
- Oguz H. Elibol
- Jayaram Bobba
- Guozhong Zhuang
- Bhavani Subramanian
- Gokce Keskin
- Chandrasekaran Sakthivel
- Rajesh Poornachandran
Embodiments are generally directed to compression in machine learning and deep learning processing. An embodiment of an apparatus for compression of untyped data includes a graphics processing unit (GPU) including a data compression pipeline, the data compression pipeline including a data port coupled with one or more shader cores, wherein the data port is to allow transfer of untyped data without format conversion, and a 3D compression/decompression unit to provide for compression of untyped data to be stored to a memory subsystem and decompression of untyped data from the memory subsystem.
Interfacing with systems, for processing data samples, and related systems, methods and apparatuses
Disclosed examples include an apparatus. The apparatus may include first interfaces, second interfaces, a bus interface, and a buffer interface. The first interfaces may be to communicate at first data widths via first interconnects for operable coupling with data sample sources. The second interfaces may be to communicate at second data widths via second interconnects for operable coupling with data sinks. The buffer interface may be to communicate with a system to process data samples sampled using different sampling rates according to processing frame durations. The buffer interface may include an uplink channel handler and a downlink channel handler. The uplink channel handler may be to receive data samples from the first interfaces at the first data widths and provide the data samples to the bus interface at third data widths. The downlink channel handler may be to receive processed data samples from the bus interface at the third data widths and provide the processed data samples to the second interfaces at the second data widths. The bus interface may be to communicate at a third data width via a third interconnect for operable coupling with an allocated memory region utilized by the system to process data samples.
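As a rough illustration of the width conversion the channel handlers perform, here is a Python sketch that gathers 8-bit samples into 32-bit bus words on the uplink and splits them back on the downlink; the specific widths, the zero padding of the final word, and the little-endian packing are all assumptions for the example, not details from the disclosure:

```python
import struct


def uplink_repack(samples: bytes) -> list[int]:
    """Gather 8-bit samples (a stand-in first data width) into 32-bit
    bus words (a stand-in third data width), zero-padding the tail so
    the last word is complete."""
    padded = samples + b"\x00" * (-len(samples) % 4)
    return [struct.unpack_from("<I", padded, i)[0]
            for i in range(0, len(padded), 4)]


def downlink_repack(bus_words: list[int]) -> bytes:
    """Split 32-bit bus words back into a byte stream for the second
    interfaces."""
    return b"".join(struct.pack("<I", w) for w in bus_words)


samples = bytes([1, 2, 3, 4, 5])
assert downlink_repack(uplink_repack(samples))[:len(samples)] == samples
```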
COMPRESSION AWARE PREFETCH
Methods, devices, and systems for prefetching data. First data is loaded from a first memory location. The first data is cached in a cache memory. Other data is prefetched to the cache memory based on a compression of the first data and a compression of the other data. In some implementations, the compression of the first data and the compression of the other data are determined based on metadata associated with the first data and metadata associated with the other data. In some implementations, the other data is prefetched to the cache memory based on a total of a compressed size of the first data and a compressed size of the other data being less than a threshold size. In some implementations, the other data is not prefetched to the cache memory based on the other data being uncompressed.
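The prefetch decision itself is simple enough to sketch in Python; the metadata field names and the byte threshold below are assumptions for illustration:

```python
def should_prefetch(first_meta: dict, other_meta: dict,
                    threshold: int = 128) -> bool:
    """Decide whether to prefetch 'other' data alongside already-loaded
    'first' data, mirroring the two rules in the abstract: uncompressed
    candidates are skipped, and compressed candidates are prefetched only
    when the combined compressed sizes stay under a threshold size."""
    if not other_meta["compressed"]:
        return False
    total = first_meta["compressed_size"] + other_meta["compressed_size"]
    return total < threshold


first = {"compressed": True, "compressed_size": 48}
other = {"compressed": True, "compressed_size": 64}
assert should_prefetch(first, other)              # 48 + 64 = 112 < 128
assert not should_prefetch(first, {"compressed": False})
```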
Methods and apparatus to utilize non-volatile memory for computer system boot
Methods, apparatus, systems and articles of manufacture are disclosed to utilize non-volatile memory for computer system boot. An example processor platform includes a non-volatile memory coupled to a processing unit via a bus, and a microcontroller to: configure the processing unit to store, on the non-volatile memory, a heap and a stack for execution of boot code, and configure the processing unit to execute the boot code stored on the non-volatile memory.
High performance space efficient distributed storage
High performance space efficient distributed storage is disclosed. For example, a distributed storage volume (DSV) is deployed on a plurality of hosts, with a first host storing a local cache, and a storage controller executing on a processor of the first host receives a request to store a first file. The first file is stored to the local cache. The DSV is queried to determine whether a second file that is a copy of the first file is stored in the DSV. In response to determining that the DSV lacks the second file, the first file is transferred from the local cache to the DSV and then replicated to a second host of the plurality of hosts. In response to determining that the second file resides in the DSV, a reference to the second file is stored in the DSV and then replicated to the second host.
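A toy sketch of the store-or-reference flow, using a content hash to decide whether a copy already exists; the abstract does not say how copies are detected, so the hashing, the class name, and the elided replication step are all assumptions:

```python
import hashlib


class DistributedVolume:
    """Toy stand-in for the DSV: content hash -> stored bytes, plus a
    reference count per object."""

    def __init__(self):
        self.objects = {}
        self.refs = {}

    def store_file(self, local_cache: dict, name: str) -> str:
        """Store a file from the local cache, deduplicating by content.

        Returns "stored" when the bytes were transferred into the volume
        and "referenced" when only a reference to an existing copy was
        recorded. Replication to the second host is elided here.
        """
        data = local_cache[name]
        key = hashlib.sha256(data).hexdigest()
        if key in self.objects:            # a copy already resides in the DSV
            self.refs[key] += 1
            return "referenced"
        self.objects[key] = data           # transfer from the local cache
        self.refs[key] = 1
        return "stored"


dsv = DistributedVolume()
cache = {"a.bin": b"payload", "b.bin": b"payload"}
assert dsv.store_file(cache, "a.bin") == "stored"
assert dsv.store_file(cache, "b.bin") == "referenced"
```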
Technologies for assigning workloads to balance multiple resource allocation objectives
Technologies for allocating resources of managed nodes to workloads to balance multiple resource allocation objectives include an orchestrator server to receive resource allocation objective data indicative of multiple resource allocation objectives to be satisfied. The orchestrator server is additionally to determine an initial assignment of a set of workloads among the managed nodes and receive telemetry data from the managed nodes. The orchestrator server is further to determine, as a function of the telemetry data and the resource allocation objective data, an adjustment to the assignment of the workloads to increase an achievement of at least one of the resource allocation objectives without decreasing an achievement of another of the resource allocation objectives, and apply the adjustments to the assignments of the workloads among the managed nodes as the workloads are performed. Other embodiments are also described and claimed.
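The acceptance rule for an adjustment reads like a Pareto-improvement check, which is easy to express directly; the objective names and the higher-is-better scoring in this Python sketch are assumptions:

```python
def is_acceptable_adjustment(before: dict, after: dict) -> bool:
    """Accept an adjustment only if it increases achievement of at least
    one resource allocation objective without decreasing achievement of
    any other -- the condition stated in the abstract."""
    no_regression = all(after[k] >= before[k] for k in before)
    some_gain = any(after[k] > before[k] for k in before)
    return no_regression and some_gain


before = {"throughput": 0.70, "power_efficiency": 0.60}
better = {"throughput": 0.74, "power_efficiency": 0.60}
tradeoff = {"throughput": 0.80, "power_efficiency": 0.55}
assert is_acceptable_adjustment(before, better)
assert not is_acceptable_adjustment(before, tradeoff)
```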
Compressed cache using dynamically stacked roaring bitmaps
A method for compressing data in a local cache of a web server is described. A local cache compression engine accesses values in the local cache and determines a cardinality of the values of the local cache. The local cache compression engine determines a compression rate of a compression algorithm based on the cardinality of the values of the local cache. The compression algorithm is applied to the local cache based on the compression rate to generate a compressed local cache.
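One plausible reading of the cardinality-to-rate mapping, sketched in Python; the linear policy and the 1-9 level scale are stand-ins for whatever mapping and roaring-bitmap-backed compressor the patent actually uses:

```python
def choose_compression_level(values: list, max_level: int = 9) -> int:
    """Map the cardinality of cached values to a compression level:
    low cardinality (many repeated values) compresses well, so it earns
    a higher level; high cardinality gets a cheap level."""
    if not values:
        return 1
    cardinality_ratio = len(set(values)) / len(values)   # in (0, 1]
    return max(1, round(max_level * (1.0 - cardinality_ratio)))


repetitive = ["GET /a"] * 90 + ["GET /b"] * 10    # cardinality ratio 0.02
diverse = [f"GET /{i}" for i in range(100)]       # cardinality ratio 1.0
assert choose_compression_level(repetitive) == 9
assert choose_compression_level(diverse) == 1
```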
Resource-aware compression
Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, a light compressor, and a heavy compressor. The decision on which compressor to use for compressing cache lines is made based on the availability of resources such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To limit that latency penalty, the proposed design takes advantage of the heavy compressor for effectively reducing memory bandwidth in high-bandwidth memory (HBM) interfaces only as long as it does not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.
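A software analogue of the decision-making unit, assuming utilization ratios as the resource signals; zlib levels and the 0.75 threshold are stand-ins for the hardware compressors and whatever policy the design uses:

```python
import zlib


def compress_line(line: bytes, bandwidth_util: float,
                  cache_util: float) -> bytes:
    """Pick a compressor tier from resource availability: under memory
    bandwidth or cache capacity pressure, spend more compression effort
    (heavy); otherwise keep decompression latency low (light)."""
    under_pressure = bandwidth_util > 0.75 or cache_util > 0.75
    level = 9 if under_pressure else 1      # heavy vs. light compressor
    return zlib.compress(line, level)


line = bytes(64)                            # a highly compressible cache line
light = compress_line(line, bandwidth_util=0.3, cache_util=0.4)
heavy = compress_line(line, bandwidth_util=0.9, cache_util=0.4)
assert zlib.decompress(light) == zlib.decompress(heavy) == line
```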
TECHNOLOGY FOR EARLY ABORT OF COMPRESSION ACCELERATION
An integrated circuit includes a compression accelerator to process a request from software to compress source data into an output file. The compression accelerator includes early-abort circuitry to provide for early abort of compression operations. In particular, the compression accelerator uses a predetermined sample size to compute an estimated size for a portion of the output file. The sample size specifies how much of the source data is to be analyzed before computing the estimated size. The compression accelerator also determines whether the estimated size reflects an acceptable amount of compression, based on a predetermined early-abort threshold. The compression accelerator aborts the request if the estimated size does not reflect the acceptable amount of compression. The compression accelerator may complete the request if the estimated size reflects the acceptable amount of compression. Other embodiments are described and claimed.
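The sample-then-estimate flow maps neatly onto software; in this Python sketch the 4 KiB sample size, the 0.9 abort ratio, and zlib itself are all stand-ins for the accelerator's predetermined parameters and compression engine:

```python
import os
import zlib

SAMPLE_SIZE = 4096     # stand-in for the predetermined sample size
ABORT_RATIO = 0.90     # stand-in for the predetermined early-abort threshold


def try_compress(source: bytes) -> bytes | None:
    """Compress a fixed-size sample of the source data, estimate the
    compression ratio from it, and abort the request when the estimate
    does not reflect an acceptable amount of compression.

    Returns the compressed output, or None when the request is aborted.
    """
    sample = source[:SAMPLE_SIZE]
    estimated_ratio = len(zlib.compress(sample)) / max(1, len(sample))
    if estimated_ratio > ABORT_RATIO:       # compressing is not paying off
        return None
    return zlib.compress(source)


assert try_compress(b"abc" * 10_000) is not None   # repetitive data: completes
assert try_compress(os.urandom(20_000)) is None    # random data: early abort
```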