Patent classifications
G06F12/023
Message Object Traversal In High-Performance Network Messaging Architecture
A communications system implements instructions including maintaining a message object that includes an array of entries. Each entry of the array includes a field identifier, a data type, and a next entry pointer. The next entry pointers and a head pointer establish a linked list of entries. The instructions include, in response to a request to add a new entry to the message object, calculating an index based on a field identifier of the new entry and determining whether the entry at the calculated index within the array of entries is active. The instructions include, if the entry is inactive, writing a data type, field identifier, and data value of the new entry to the calculated index, and inserting the new entry into the linked list. The instructions include, if the entry is already active, selectively expanding the size of the array and repeating the calculating and determining.
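To make the claimed structure concrete, here is a minimal Python sketch, assuming a simple modulo hash of the field identifier and head-insertion into the linked list; the class and method names are illustrative, not from the patent.

```python
# Hypothetical model: an array of entries addressed by hashing the field
# identifier, with active entries also chained into a linked list.

class Entry:
    __slots__ = ("field_id", "data_type", "value", "next_index", "active")
    def __init__(self):
        self.active = False
        self.next_index = None  # next-entry pointer for the linked list

class MessageObject:
    def __init__(self, capacity=8):
        self.entries = [Entry() for _ in range(capacity)]
        self.head = None  # head pointer of the linked list of active entries

    def _index(self, field_id):
        # Calculate an index from the field identifier (simple modulo hash).
        return field_id % len(self.entries)

    def add(self, field_id, data_type, value):
        while True:
            i = self._index(field_id)
            if not self.entries[i].active:
                e = self.entries[i]
                e.field_id, e.data_type, e.value = field_id, data_type, value
                e.active = True
                # Insert the new entry at the head of the linked list.
                e.next_index, self.head = self.head, i
                return i
            # Slot already active: expand the array, then repeat the
            # calculating and determining.
            self._grow()

    def _grow(self):
        old = [e for e in self.entries if e.active]
        self.entries = [Entry() for _ in range(2 * len(self.entries))]
        self.head = None
        for e in old:  # re-insert existing entries at their new indices
            self.add(e.field_id, e.data_type, e.value)

m = MessageObject()
m.add(field_id=3, data_type="int32", value=42)
```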
DETERMINISTIC MEMORY FOR TENSOR STREAMING PROCESSORS
Embodiments are directed to a deterministic streaming system with one or more deterministic streaming processors each having an array of processing elements and a first deterministic memory coupled to the processing elements. The deterministic streaming system further includes a second deterministic memory with multiple data banks having a global memory address space, and a controller. The controller initiates retrieval of first data from the data banks of the second deterministic memory as a first plurality of streams, each stream of the first plurality of streams streaming toward a respective group of processing elements of the array of processing elements. The controller further initiates writing of second data to the data banks of the second deterministic memory as a second plurality of streams, each stream of the second plurality of streams streaming from the respective group of processing elements toward a respective data bank of the second deterministic memory.
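A toy software model of that dataflow, assuming one stream per processing-element group and a fixed bank-to-group mapping (both assumptions; the patent describes a hardware fabric):

```python
class Controller:
    """Toy model of the streaming controller: fixed bank-to-group routing
    with no arbitration, so transfer order (and hence timing) is
    deterministic."""

    def __init__(self, banks, pe_groups):
        self.banks = banks          # data banks of the second (global) memory
        self.pe_groups = pe_groups  # groups of processing elements in the array

    def read_streams(self):
        # First plurality of streams: each data bank streams toward its
        # respective group of processing elements.
        return [(bank, group) for bank, group in zip(self.banks, self.pe_groups)]

    def write_streams(self, outputs):
        # Second plurality of streams: each group streams results back
        # toward its respective data bank.
        for bank, data in zip(self.banks, outputs):
            bank.extend(data)
```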
Methods and arrangements to manage memory in cascaded neural networks
Logic may reduce the size of runtime memory for deep neural network inference computations. Logic may determine, for two or more stages of a neural network, a count of shared memory block allocations that concurrently exist during execution of the two or more stages. Logic may compare the counts of shared block allocations to determine a maximum count. Logic may determine a size for each of the shared block allocations in that maximum count, to accommodate the data to store in a shared memory during execution of the two or more stages of the cascaded neural network, which may also reduce inference computation time. Logic may determine a batch size per stage of the two or more stages of a cascaded neural network based on a lack of interdependencies between the input data.
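A hedged sketch of that sizing logic in Python; the per-stage input format (a mapping from each live buffer to its required bytes) is an assumption made for illustration:

```python
def plan_shared_blocks(stages):
    """stages: one dict per stage, mapping buffer name -> required bytes
    for the buffers live during that stage's execution."""
    # Maximum count of concurrently live shared block allocations.
    max_count = max(len(live) for live in stages)

    # Size each shared block so the k-th largest demand of any stage fits
    # in the k-th block: sort each stage's demands descending and take the
    # per-position maximum.
    sizes = [0] * max_count
    for live in stages:
        for k, need in enumerate(sorted(live.values(), reverse=True)):
            sizes[k] = max(sizes[k], need)
    return max_count, sizes

# Example: three cascaded stages that share at most two blocks at once.
count, sizes = plan_shared_blocks([
    {"in": 1024, "act0": 4096},
    {"act0": 4096, "act1": 2048},
    {"act1": 2048},
])
# count == 2, sizes == [4096, 2048]
```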
Encoded inline capabilities
Disclosed embodiments relate to encoded inline capabilities. In one example, a system includes a trusted execution environment (TEE) to partition an address space within a memory into a plurality of compartments, each associated with code to execute a function, and to assign a message object in a heap to each compartment. The TEE receives a request from a first compartment to send a message block to a specified destination compartment, and responds by authenticating the request, generating a corresponding encoded capability, conveying the encoded capability to the destination compartment, and scheduling the destination compartment to respond to the request. Subsequently, the TEE responds to a check-capability request from the destination compartment by checking the encoded capability and, when the check passes, providing a memory address to access the message block, and otherwise generating a fault. Each compartment is isolated from the other compartments.
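As a software analogy only (real hardware encodes capabilities differently), the flow can be sketched with an HMAC standing in for the authentication and encoding steps; all names here are hypothetical:

```python
import hmac, hashlib, os

class TEE:
    def __init__(self):
        self._key = os.urandom(32)   # secret held only by the TEE
        self._blocks = {}            # block id -> memory address

    def grant(self, block_id, address, dest_compartment):
        # Encode a capability bound to both the message block and the
        # destination compartment; the tag plays the authentication role.
        self._blocks[block_id] = address
        msg = f"{block_id}:{dest_compartment}".encode()
        tag = hmac.new(self._key, msg, hashlib.sha256).digest()
        return (block_id, dest_compartment, tag)  # conveyed to destination

    def check_capability(self, cap, caller):
        block_id, dest, tag = cap
        msg = f"{block_id}:{dest}".encode()
        ok = hmac.compare_digest(
            tag, hmac.new(self._key, msg, hashlib.sha256).digest())
        if ok and caller == dest:
            return self._blocks[block_id]  # check passes: release address
        raise PermissionError("capability check failed")  # fault otherwise
```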
DYNAMIC ASSIGNMENT OF DOWN SAMPLING INTERVALS FOR DATA STREAM PROCESSING
- Joydeep Ray,
- Ben Ashbaugh,
- Prasoonkumar Surti,
- Pradeep Ramani,
- Rama Harihara,
- Jerin C. Justin,
- Jing Huang,
- Xiaoming Cui,
- Timothy B. Costa,
- Ting Gong,
- Elmoustapha Ould-Ahmed-Vall,
- Kumar Balasubramanian,
- Anil Thomas,
- Oguz H. Elibol,
- Jayaram Bobba,
- Guozhong Zhuang,
- Bhavani Subramanian,
- Gokce Keskin,
- Chandrasekaran Sakthivel,
- Rajesh Poornachandran
Embodiments are generally directed to compression in machine learning and deep learning processing. An embodiment of an apparatus for compression of untyped data includes a graphical processing unit (GPU) including a data compression pipeline, the data compression pipeline including a data port coupled with one or more shader cores, wherein the data port is to allow transfer of untyped data without format conversion, and a 3D compression/decompression unit to provide for compression of untyped data to be stored to a memory subsystem and decompression of untyped data from the memory subsystem.
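A rough software model of that datapath, with zlib standing in for the 3D compression/decompression unit and a dict standing in for the memory subsystem (both pure illustration, not the hardware design):

```python
import zlib

class DataPort:
    """Models the data port: untyped (raw byte) payloads pass through with
    no format conversion, compressed on store and decompressed on load."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> compressed bytes

    def store_untyped(self, addr, payload: bytes):
        # Bytes in from the shader cores, compressed out to memory.
        self.memory[addr] = zlib.compress(payload)

    def load_untyped(self, addr) -> bytes:
        # Decompressed on the way back; payload is byte-identical.
        return zlib.decompress(self.memory[addr])
```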
SYSTEMS AND METHODS FOR READING AND WRITING SPARSE DATA IN A NEURAL NETWORK ACCELERATOR
Disclosed herein are a system, a method, and a device for reading and writing sparse data in a neural network accelerator. A mask identifying byte positions within a data word having non-zero values in memory can be accessed. Each bit of the mask can have a first value or a second value, the first value indicating that a byte of the data word has a non-zero value, the second value indicating that the byte has a zero value. The data word can be modified so that non-zero byte values are stored at an end of a first side of the data word in the memory, and any zero byte values are stored in the remainder of the data word. The modified data word can be written to the memory via at least a first slice of a plurality of slices that is configured to access the first side of the data word in the memory.
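A minimal sketch of the mask-driven compaction, assuming a little-endian bit order and a 16-byte default word width (both assumptions; the patent does not fix them):

```python
def compact_word(word: bytes):
    """Return (mask, modified_word): bit i of mask is 1 where byte i of the
    original word was non-zero; the modified word packs all non-zero bytes
    at one side, with zero bytes in the remainder."""
    mask = 0
    packed = bytearray()
    for i, b in enumerate(word):
        if b != 0:
            mask |= 1 << i
            packed.append(b)
    packed.extend(b"\x00" * (len(word) - len(packed)))
    return mask, bytes(packed)

def expand_word(mask: int, packed: bytes, width: int = 16):
    # Inverse: scatter the packed non-zero bytes back to the byte
    # positions the mask records.
    out = bytearray(width)
    it = iter(packed)
    for i in range(width):
        if mask & (1 << i):
            out[i] = next(it)
    return bytes(out)

mask, w = compact_word(bytes([0, 7, 0, 9]))
assert expand_word(mask, w, 4) == bytes([0, 7, 0, 9])
```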
Adaptive user defined health indication
Methods, systems, and devices for adaptive user defined health indications are described. A host device may be configured to dynamically indicate adaptive health flags for monitoring health and wear information for a memory device. The host device may indicate, to a memory device, a first index. The first index may correspond to a first level of wear of a set of multiple indexed levels of wear for the memory device. The memory device may determine that a metric of the memory device satisfies the first level of wear and indicate, to the host device, that the first level of wear is satisfied. The host device may receive the indication that the first level of wear is satisfied and indicate, to the memory device, a second level of wear of the set of indexed levels of wear that is different than the first level of wear.
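The host/device exchange can be sketched as follows; the indexed thresholds and the wear metric are placeholders, not values from the patent:

```python
WEAR_LEVELS = {1: 0.25, 2: 0.50, 3: 0.75}  # index -> fraction of rated wear

class MemoryDevice:
    def __init__(self):
        self.watch_index = None
        self.wear = 0.0                    # current wear metric

    def set_watch(self, index):            # host indicates an index
        self.watch_index = index

    def on_wear_update(self, wear, notify_host):
        self.wear = wear
        if self.watch_index and wear >= WEAR_LEVELS[self.watch_index]:
            notify_host(self.watch_index)  # level satisfied: tell the host

class Host:
    def __init__(self, device):
        self.device = device
        device.set_watch(1)                # start by watching level 1

    def on_level_satisfied(self, index):
        # Level reached: indicate a different indexed level of wear.
        if index + 1 in WEAR_LEVELS:
            self.device.set_watch(index + 1)

dev = MemoryDevice()
host = Host(dev)
dev.on_wear_update(0.30, host.on_level_satisfied)  # satisfies level 1, watch moves to 2
```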
SYSTEMS, METHODS, AND DEVICES FOR UTILIZATION AWARE MEMORY ALLOCATION
A method may include receiving, from a process, a memory allocation request for a memory system comprising a first channel having a first channel utilization and a second channel having a second channel utilization, selecting, based on the first channel utilization and the second channel utilization, the first channel, and allocating, to the process, a page of memory from the first channel. The selecting may include selecting the first channel based on a balanced random policy. The selecting may include generating a ticket based on a random number and a number of free pages, comparing the ticket to a number of free pages of the first channel, and selecting the first channel based on the comparing. The selecting may include selecting the first channel based on a least used channel policy.
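The balanced random policy lends itself to a short sketch: draw a ticket in the range of the total free pages, then walk the channels so each is selected with probability proportional to its free pages. The least-used alternative is shown for contrast; the function names and input shape are assumptions.

```python
import random

def pick_channel(free_pages):
    """free_pages: list of free-page counts, one per channel."""
    ticket = random.randrange(sum(free_pages))  # ticket from a random number
    for channel, free in enumerate(free_pages):
        if ticket < free:        # compare the ticket to this channel's free pages
            return channel
        ticket -= free

def pick_least_used(free_pages):
    # Least used channel policy: prefer the channel with the most free pages.
    return max(range(len(free_pages)), key=lambda c: free_pages[c])

# A channel with twice the free pages is picked about twice as often.
```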
Policy-based system interface for a real-time autonomous system
Booting a secondary operating system kernel with reclaimed primary kernel memory
Methods that boot a secondary operating system (O/S) kernel with reclaimed primary kernel memory are disclosed herein. One method includes booting, via a processor performing a boot algorithm, a secondary kernel for an O/S in response to a primary kernel for the O/S going offline, in which the secondary kernel is configured to be loaded to a reserved memory area. The method further includes reclaiming memory space from the primary kernel for use in booting the secondary kernel in response to a determination that the reserved memory area includes insufficient memory space for completing the boot algorithm. Also disclosed herein are apparatus, systems, and computer program products that can include, perform, and/or implement the methods for providing a secondary kernel that includes a reserved area in memory.
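A control-flow sketch of that boot path, with the sizes and the reclaim helper as hypothetical stand-ins:

```python
def boot_secondary_kernel(reserved_free, boot_need, reclaim_from_primary):
    """reserved_free: bytes available in the reserved memory area.
    boot_need: bytes the boot algorithm requires.
    reclaim_from_primary(n): frees and returns up to n bytes of
    primary-kernel memory (safe here because the primary kernel has
    gone offline)."""
    if reserved_free < boot_need:
        # Reserved area is insufficient: reclaim the shortfall from the
        # primary kernel before completing the boot algorithm.
        reclaimed = reclaim_from_primary(boot_need - reserved_free)
        if reserved_free + reclaimed < boot_need:
            raise MemoryError("cannot satisfy secondary kernel boot")
        reserved_free += reclaimed
    return "secondary kernel booted"
```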