Patent classifications
G06F9/544
Asynchronous data movement pipeline
Apparatuses, systems, and techniques to parallelize operations in one or more programs with data copies from global memory to shared memory in each of the one or more programs. In at least one embodiment, a program performs operations on shared data, then asynchronously copies that data to shared memory, and continues performing additional operations in parallel with the copy until an indicator provided by an application programming interface (API) that facilitates parallel computing, such as CUDA, informs the program that the data has been copied to shared memory.
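A rough host-side analogy of this overlap can be sketched in Python. It assumes a worker thread stands in for the asynchronous global-to-shared-memory copy and `Future.result()` stands in for the completion indicator. In actual CUDA code that role is played by primitives such as `cuda::memcpy_async` together with a barrier or pipeline. The names `copy_to_shared` and `run_pipeline` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_to_shared(global_mem, shared_mem):
    # Hypothetical stand-in for the asynchronous global -> shared memory copy.
    shared_mem[:] = global_mem
    return len(shared_mem)


def run_pipeline(global_mem):
    shared_mem = [0] * len(global_mem)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Issue the copy without waiting for it to complete.
        pending = pool.submit(copy_to_shared, global_mem, shared_mem)
        # Work that does not touch shared_mem overlaps with the copy.
        independent = sum(x * x for x in range(10))
        # Wait for the completion indicator before using the copied data.
        pending.result()
        dependent = sum(shared_mem)
    return independent, dependent


print(run_pipeline([1, 2, 3]))  # (285, 6)
```

Only the work that reads the copied data has to wait; everything issued before `pending.result()` runs concurrently with the copy.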
Configurable caching policy for transferring data via shared memory
Techniques are disclosed for transferring a message between a sender agent and a receiver agent via a shared memory having a main memory and a cache. Feedback data indicative of a number of read messages in the shared memory is generated by the receiver agent. The feedback data is sent from the receiver agent to the sender agent. A number of unread messages in the shared memory is estimated by the sender agent based on the number of read messages. A threshold for implementing a caching policy is set by the sender agent based on the feedback data. The message is designated as cacheable if the number of unread messages is less than the threshold and as non-cacheable if the number of unread messages is greater than the threshold. The message is written to the shared memory based on the designation.
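The sender-side decision can be sketched as follows. The abstract does not say how the threshold is derived from the feedback, so this sketch assumes a hypothetical rule: the threshold is the number of messages the receiver consumed since its previous report, floored at 1. It also treats the unspecified equal-to-threshold case as non-cacheable. The class `Sender` and the list used as shared memory are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    sent: int = 0           # messages written to shared memory so far
    read_reported: int = 0  # read count from the latest receiver feedback
    threshold: int = 1

    def on_feedback(self, read_count: int) -> None:
        # Hypothetical rule: threshold = messages consumed since last report.
        consumed = read_count - self.read_reported
        self.read_reported = read_count
        self.threshold = max(1, consumed)

    def unread_estimate(self) -> int:
        # Estimated unread messages = written minus reported as read.
        return self.sent - self.read_reported

    def write(self, message: str, shared: list) -> bool:
        # Cacheable only while the estimated backlog is below the threshold.
        cacheable = self.unread_estimate() < self.threshold
        shared.append((message, cacheable))
        self.sent += 1
        return cacheable
```

In a real system the flag would select a cached or a cache-bypassing store; here it is simply recorded next to the message.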
COMPUTATION AND STORAGE OF OBJECT IDENTITY HASH VALUES
Techniques for computing and storing object identity hash values are disclosed. In some embodiments, a runtime system generates a value, such as a nonce, that is unique to a particular allocation region within memory. The runtime system may mix the value with one or more seed values that are associated with one or more respective objects stored in the allocation region. The runtime system may obtain object identifiers for the respective objects by applying a hash function to the result of mixing the seed value with at least the value associated with the allocation region. Conditioning operations may also be applied before, during or after the mixing operations to make the values appear more random. The nonce value may be changed from time to time, such as when memory is recycled in the allocation region, to reduce the risk of hash collisions.
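A minimal sketch of the idea, assuming the per-object seed is something like the object's offset within its region. The abstract does not name a mixing or hash function, so this sketch swaps in XOR for mixing and the SplitMix64 finalizer for conditioning and hashing. Because that finalizer is a bijection on 64-bit values, distinct seeds within one region always receive distinct hashes here. `AllocationRegion` and its methods are hypothetical names.

```python
import secrets

MASK64 = (1 << 64) - 1


def mix64(x):
    # SplitMix64 finalizer: conditions the bits and serves as the hash.
    z = (x + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


class AllocationRegion:
    def __init__(self, nonce=None):
        # Per-region nonce; random unless supplied (e.g. for testing).
        self.nonce = secrets.randbits(64) if nonce is None else nonce

    def identity_hash(self, seed):
        # Mix the per-object seed with the region nonce, then hash.
        return mix64((seed ^ self.nonce) & MASK64)

    def recycle(self, nonce=None):
        # Memory in the region is being reused: switch to a fresh nonce.
        self.nonce = secrets.randbits(64) if nonce is None else nonce
```

After `recycle()`, an object that lands at the same offset as a previous one gets a different identity hash, which is the collision-reduction effect the abstract describes.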
Data compression for having one direct connection between host and port of storage system via internal fabric interface
Data compression is performed on a storage system for which one or more host systems have direct access to data on the storage system. The storage system may compress the data for one or more logical storage units (LSUs) having data stored thereon, and may update compression metadata associated with the LSUs and/or their data portions to reflect that the data is compressed. In response to a read request for a data portion received from a host application executing on the host system, the compression metadata for the data portion may be accessed. If the compression metadata indicates that the data portion is compressed, the compression metadata may be further analyzed to determine how to decompress the data portion. The data portion may then be retrieved and decompressed, and the decompressed data returned to the requesting application.
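The read path can be sketched in Python, assuming `zlib` as the codec and a per-portion metadata dictionary with hypothetical `compressed` and `codec` fields. The abstract specifies neither the metadata layout nor the compression algorithm; `StorageSystem` is an illustrative name.

```python
import zlib


class StorageSystem:
    def __init__(self):
        self.data = {}      # (lsu, portion) -> bytes as stored
        self.metadata = {}  # (lsu, portion) -> compression metadata

    def write(self, lsu, portion, payload: bytes):
        self.data[(lsu, portion)] = payload
        self.metadata[(lsu, portion)] = {"compressed": False}

    def compress_lsu(self, lsu):
        # Compress every portion of one LSU and record that in the metadata.
        for key in [k for k in self.data if k[0] == lsu]:
            if not self.metadata[key]["compressed"]:
                self.data[key] = zlib.compress(self.data[key])
                self.metadata[key] = {"compressed": True, "codec": "zlib"}

    def read(self, lsu, portion) -> bytes:
        key = (lsu, portion)
        meta = self.metadata[key]
        stored = self.data[key]
        if not meta["compressed"]:
            return stored
        # The metadata says how the portion was compressed.
        if meta["codec"] == "zlib":
            return zlib.decompress(stored)
        raise ValueError(f"unknown codec {meta['codec']!r}")
```

The host only ever sees decompressed bytes; whether a portion is stored compressed is decided entirely from its metadata.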
Database backup performance
An indication that a database backup process is starting is received. A plurality of buffer manipulator threads are marked as active. A determination is made whether any of the buffer manipulator threads marked as active in the database backup process has become idle. In response to determining that at least one buffer manipulator thread has become idle, the idle thread is reassigned to the largest tablespace in the database that has not yet completed the database backup process, that is, the incomplete tablespace with the largest storage capacity.
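The reassignment rule can be sketched as follows, assuming each tablespace records its storage capacity and whether its backup has finished. `Tablespace` and `reassign_idle` are hypothetical names, and how threads detect that they are idle is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Tablespace:
    name: str
    capacity: int       # storage capacity, e.g. in GB
    done: bool = False  # has this tablespace finished backing up?
    workers: list = field(default_factory=list)


def reassign_idle(thread_id, tablespaces):
    # Candidates are tablespaces whose backup has not completed.
    pending = [t for t in tablespaces if not t.done]
    if not pending:
        return None  # nothing left to help with
    # Send the idle thread to the largest remaining tablespace.
    target = max(pending, key=lambda t: t.capacity)
    target.workers.append(thread_id)
    return target.name
```

Directing spare threads at the largest unfinished tablespace targets the work most likely to determine when the whole backup completes.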
Automatic out-of-bound access prevention in GPU kernels executed in a managed environment
Techniques are provided for an automated method of adding out-of-bound access prevention in GPU kernels executed in a managed environment. In an embodiment, a system of computers compiles a GPU kernel code function that includes one or more array references that are memory address dependent. The system of computers compiles the kernel code function by generating a rewritten GPU kernel code module that includes, within its function signature, a respective array size parameter for each of the one or more array references included in the GPU kernel code function. The system of computers further compiles the kernel code function by adding bounding protection instructions to one or more potential out-of-bound access instructions in the rewritten GPU kernel code module. The potential out-of-bound access instructions comprise instructions that reference the respective array size parameter of each of the one or more array references. Afterwards, the rewritten GPU kernel code module is loaded in a virtual machine. Loading the rewritten GPU kernel code module in the virtual machine comprises modifying a host application to automatically transmit one or more input array size values from the host application. The one or more input array size values are referenced by the one or more potential out-of-bound access instructions.
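The effect of the rewrite can be illustrated with a Python stand-in, where each call to the kernel function models one GPU thread and the launcher's loop models a grid that may be larger than the arrays. This only mimics the transformation at the source level; `add_kernel`, `add_kernel_guarded`, and `launch_guarded` are hypothetical names, and no real GPU compiler or virtual machine is involved.

```python
def add_kernel(tid, a, b, out):
    # Original kernel: no bounds checks, unsafe if the grid exceeds the arrays.
    out[tid] = a[tid] + b[tid]


def add_kernel_guarded(tid, a, a_size, b, b_size, out, out_size):
    # Rewritten kernel: one size parameter per array reference, and the
    # potentially out-of-bound accesses are guarded by checks against them.
    if 0 <= tid < a_size and 0 <= tid < b_size and 0 <= tid < out_size:
        out[tid] = a[tid] + b[tid]


def launch_guarded(grid_size, a, b, out):
    # Modified host side: array sizes are passed automatically.
    for tid in range(grid_size):  # sequential stand-in for parallel threads
        add_kernel_guarded(tid, a, len(a), b, len(b), out, len(out))
```

In Python the unguarded kernel fails with `IndexError` once `tid` passes the array length; on a GPU the same access could touch memory outside the array, which is what the added checks prevent.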
Shared memory management
Techniques for memory management may include: allocating, from an allocation pool, buffers for logs used by a plurality of processing cores; recording messages in the logs for the processing cores; responsive to filling a first buffer included in a first log used to record messages for a first processing core of the plurality of processing cores, allocating a second buffer of the allocation pool for the first log; adding the second buffer to a first list of buffers for the first log; and adding the first buffer, which is included in the first list, to the allocation pool, so that the first buffer is then included both in the first list of buffers for the first log and in the allocation pool. The buffers may be included in a distributed global memory portion of the same computing module as the processing cores.
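A small model of the buffer rotation, assuming each core's log is a list of fixed-capacity buffers. The abstract does not say what happens when a buffer that is still listed in a log is handed out again; this sketch assumes it is removed from its old log and cleared at that point, which discards the oldest messages. `Buffer`, `LogManager`, and their methods are hypothetical.

```python
from collections import deque


class Buffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.messages = []
        self.owner = None  # core whose log list currently contains this buffer

    def full(self):
        return len(self.messages) >= self.capacity


class LogManager:
    def __init__(self, num_buffers, capacity, num_cores):
        self.pool = deque(Buffer(capacity) for _ in range(num_buffers))
        self.logs = {core: [] for core in range(num_cores)}
        for core in range(num_cores):
            self._allocate(core)

    def _allocate(self, core):
        buf = self.pool.popleft()  # raises IndexError if the pool is empty
        if buf.owner is not None:
            # Assumed reclaim step: the reused buffer leaves its previous log.
            self.logs[buf.owner].remove(buf)
        buf.messages.clear()
        buf.owner = core
        self.logs[core].append(buf)

    def record(self, core, message):
        current = self.logs[core][-1]
        if current.full():
            self._allocate(core)       # second buffer joins the log's list
            self.pool.append(current)  # full buffer returns to the pool but
                                       # stays readable in the log until reused
            current = self.logs[core][-1]
        current.messages.append(message)

    def messages(self, core):
        return [m for buf in self.logs[core] for m in buf.messages]
```

Keeping the full buffer in both places lets its messages stay readable until the pool actually needs the memory again.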
Efficient data read operation
In response to receiving a read metadata request from the host system for a data portion, the storage system may determine a status and location of the data portion, including determining whether the data portion is in a cache of the storage system. If the data portion is in the cache, the storage system may send a response that includes the data portion itself along with the status and location of the data portion. If the data portion is not in the cache, the storage system may send a response to the read metadata request that includes the status and location of the data portion, but not the data portion itself. The host system may be configured to determine whether the data portion has been returned with the metadata response, and if so, refrain from sending a separate data request, for example, to retrieve the data portion from cache.
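The host-side shortcut can be sketched as follows. The status strings, the `backend` and `cache` dictionaries, and the names `MetadataResponse`, `Storage`, and `host_read` are hypothetical; `data_requests` only counts how many separate data requests the host had to send.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataResponse:
    status: str
    location: str
    data: Optional[bytes] = None  # present only on a cache hit


class Storage:
    def __init__(self, backend, cache):
        self.backend = backend  # portion id -> (location, bytes)
        self.cache = cache      # portion id -> bytes
        self.data_requests = 0

    def read_metadata(self, pid):
        location, _ = self.backend[pid]
        if pid in self.cache:
            # Cache hit: piggyback the data on the metadata response.
            return MetadataResponse("cached", location, self.cache[pid])
        return MetadataResponse("not_cached", location)

    def read_data(self, pid):
        self.data_requests += 1
        return self.backend[pid][1]


def host_read(storage, pid):
    resp = storage.read_metadata(pid)
    if resp.data is not None:
        return resp.data  # already have the data: skip the second request
    return storage.read_data(pid)
```

On a cache hit the host completes the read in one round trip instead of two.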
DEEP LEARNING-BASED CHANNEL BUFFER COMPRESSION
A method and system are provided. The method includes performing channel estimation on a reference signal (RS), compressing, with a neural network, the channel estimation of the RS, decompressing, with the neural network, the compressed channel estimation, and interpolating the decompressed channel estimation.
Systems and methods for sharing memory pointers across multiple processes
A disclosed method may include (1) mapping a block of shared memory to a plurality of processes running on a computing device, (2) determining, for a process within the plurality of processes, a local pointer that references a specific portion of the block of shared memory from a shared memory pointer that is shared across the plurality of processes by (A) identifying, within the shared memory pointer, a block number assigned to the block of shared memory and (B) identifying, within the shared memory pointer, an offset that corresponds to the specific portion of the block of shared memory relative to the process, and then (3) performing an operation on the specific portion of the block of shared memory based at least in part on the local pointer. Various other systems, methods, and computer-readable media are also disclosed.
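One way to realize such a pointer is sketched below. It assumes a 64-bit value with the block number in the upper 32 bits and the offset in the lower 32 bits, and it assumes each process keeps its own table recording where each block is mapped (base address and size). The bit split, the table, and the function names are illustrative assumptions, not taken from the abstract.

```python
OFFSET_BITS = 32
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def make_shared_ptr(block, offset):
    # The same value is valid in every process that maps the block.
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in the offset field")
    return (block << OFFSET_BITS) | offset


def to_local_ptr(shared_ptr, mappings):
    # mappings: block number -> (base address, size) in *this* process.
    block = shared_ptr >> OFFSET_BITS
    offset = shared_ptr & OFFSET_MASK
    base, size = mappings[block]
    if offset >= size:
        raise ValueError("offset outside the mapped block")
    return base + offset
```

Because only the block number and offset are stored, the shared pointer stays valid even when each process maps the block at a different address; the per-process table performs the final translation.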