G06F12/0813

DETERMINISTIC MEMORY FOR TENSOR STREAMING PROCESSORS
20230024670 · 2023-01-26

Embodiments are directed to a deterministic streaming system with one or more deterministic streaming processors each having an array of processing elements and a first deterministic memory coupled to the processing elements. The deterministic streaming system further includes a second deterministic memory with multiple data banks having a global memory address space, and a controller. The controller initiates retrieval of first data from the data banks of the second deterministic memory as a first plurality of streams, each stream of the first plurality of streams streaming toward a respective group of processing elements of the array of processing elements. The controller further initiates writing of second data to the data banks of the second deterministic memory as a second plurality of streams, each stream of the second plurality of streams streaming from the respective group of processing elements toward a respective data bank of the second deterministic memory.

LOW LATENCY INTER-CHIP COMMUNICATION MECHANISM IN A MULTI-CHIP PROCESSING SYSTEM

Systems and methods of multi-chip processing with low latency and reduced congestion. In a multi-chip processing system, each chip includes a plurality of clusters arranged in a mesh design. A respective interconnect controller is disposed at the end of each column. The column is linked to a corresponding remote column in the other chip. A shared cache controller in the column is paired with a corresponding cache controller in the remote column, and the pair of cache controllers is configured to control data caching for the same set of main memory locations. Communications between cross-chip cache controllers are performed within linked columns of clusters via the column-specific inter-chip interconnect controllers.

Method for transferring packets of a communication protocol

A method for transferring packets of a communication protocol via a memory-based interface between two processing units. The method includes providing, in each of the processing units, a send area including a read index section, a write index section, and a send buffer, and a receive area including a read index section, a write index section, and a receive buffer. Each processing unit repeats as sending steps: reading a read index from the receive area; writing at least one send packet into the send buffer from a starting write address to an ending write address, the ending write address maximally corresponding to a buffer address assigned to the read index that was read; and writing a changed write index into the send area.
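
The sending steps above resemble a single-producer ring buffer. The following is a minimal Python sketch of that idea, not the claimed method: the `Area` class, the `send` and `receive` functions, the use of slot indexes instead of byte addresses, and the convention of keeping one slot empty to tell a full buffer from an empty one are all assumptions made for illustration. It also assumes the peer acknowledges consumed packets by updating the read index held in the sender's receive area.

```python
class Area:
    """Hypothetical shared-memory area: a read index, a write index, a ring buffer."""

    def __init__(self, size):
        self.read_index = 0
        self.write_index = 0
        self.buffer = [None] * size


def send(send_area, receive_area, packets):
    """Write as many packets as fit and return how many were written."""
    size = len(send_area.buffer)
    read_index = receive_area.read_index   # step 1: read the read index
    write_index = send_area.write_index
    written = 0
    for packet in packets:                 # step 2: write packets into the send buffer
        next_index = (write_index + 1) % size
        if next_index == read_index:       # never write past the reader's position
            break
        send_area.buffer[write_index] = packet
        write_index = next_index
        written += 1
    send_area.write_index = write_index    # step 3: publish the changed write index
    return written


def receive(send_area, receive_area):
    """Peer side (assumed): drain published packets, then acknowledge them."""
    size = len(send_area.buffer)
    read_index = receive_area.read_index
    packets = []
    while read_index != send_area.write_index:
        packets.append(send_area.buffer[read_index])
        read_index = (read_index + 1) % size
    receive_area.read_index = read_index   # acknowledge by advancing the read index
    return packets


tx, rx = Area(4), Area(4)
first = send(tx, rx, ["a", "b", "c", "d", "e"])  # only part of the batch fits
got = receive(tx, rx)
second = send(tx, rx, ["d", "e"])                # space was freed by the reader
```

Because the writer only ever changes the write index and the reader only ever changes the read index, neither side needs a lock in this sketch; a real shared-memory implementation would additionally need memory barriers, which are omitted here.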

COMPOSABLE INFRASTRUCTURE ENABLED BY HETEROGENEOUS ARCHITECTURE, DELIVERED BY CXL BASED CACHED SWITCH SOC

Described herein are systems, methods, and products utilizing a cache coherent switch on chip. The cache coherent switch on chip may utilize the Compute Express Link (CXL) open interconnect standard and allow for multi-host access and the sharing of resources. The cache coherent switch on chip provides for resource sharing between components independently of a system processor, removing the system processor as a bottleneck. The cache coherent switch on chip may further allow for cache coherency between various different components. Thus, for example, memories, accelerators, and/or other components within the disclosed systems may each maintain caches, and the systems and techniques described herein allow for cache coherency between the different components of the system with minimal latency.

SYSTEM AND METHOD FOR OPTIMIZING CACHED MEMORY COMPRISING VARYING DEGREES OF SLA AND CRG

A system and method for optimizing cached memory comprising varying degrees of Service Level Agreements (SLA) and Consistency Requirement Grades (CRG). The system receives one or more requests from one or more client devices to store information in a cache memory, and determines degrees of SLA and CRG in the information received via the requests or system configurations. Further, the system stores the information once, as a master record, in a cache layer of the cache memory, based on the determined degrees of SLA and CRG. Furthermore, the system stores grades of entries of the information, each referencing the master record, in different layers of the cache memory. Each of the grades of entries has a different Time-To-Live (TTL). Thereafter, the system outputs the information stored in the master record to the client devices, based on the SLA and consistency requirements.
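
One way to picture the master record and its graded entries is the Python sketch below. It is illustrative only: the `GradedCache` class, its method names, the grade labels `"strict"` and `"relaxed"`, and the idea that a stricter consistency grade maps to a shorter TTL are assumptions, not details taken from the abstract. The clock is injectable so that expiry can be demonstrated without sleeping.

```python
import time


class GradedCache:
    """Hypothetical cache: one master record per key, many graded TTL entries."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._master = {}    # key -> value, stored exactly once
        self._entries = {}   # (grade, key) -> expiry time; refers to master by key

    def put(self, key, value, ttl_by_grade):
        """Store the master record once and one expiring entry per grade."""
        self._master[key] = value
        now = self._clock()
        for grade, ttl in ttl_by_grade.items():
            self._entries[(grade, key)] = now + ttl

    def get(self, key, grade):
        """Return the master value if this grade's entry is still live, else None."""
        expiry = self._entries.get((grade, key))
        if expiry is None or self._clock() >= expiry:
            return None
        return self._master[key]


# Usage with a fake clock (a one-element list that the lambda reads).
now = [0.0]
cache = GradedCache(clock=lambda: now[0])
cache.put("user:1", {"name": "Ada"}, {"strict": 1.0, "relaxed": 10.0})

fresh = cache.get("user:1", "strict")     # live for both grades at t=0
now[0] = 5.0
strict_later = cache.get("user:1", "strict")    # strict entry has expired
relaxed_later = cache.get("user:1", "relaxed")  # relaxed entry is still live
```

The value is held only in the master table; the graded entries carry nothing but an expiry time, so serving clients with different consistency needs does not duplicate the payload.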

Distributed key caching for encrypted keys

Methods, systems, and devices for distributed caching of encrypted encryption keys are described. Some multi-tenant database systems may support encryption of data records. To efficiently handle multiple encryption keys across multiple application servers, the database system may store the encryption keys in a distributed cache accessible by each of the application servers. To securely cache the encryption keys, the database system may encrypt (e.g., wrap) each data encryption key (DEK) using a second encryption key (e.g., a key encryption key (KEK)). The database system may store the DEKs and KEKs in separate caches to further protect the encryption keys. For example, while the encrypted DEKs may be stored in the distributed cache, the KEKs may be stored locally on application servers. The database system may further support “bring your own key” (BYOK) functionality, where a user may upload a tenant secret or tenant-specific encryption key to the database.

ARITHMETIC PROCESSOR AND METHOD FOR OPERATING ARITHMETIC PROCESSOR
20230010353 · 2023-01-12

An arithmetic processor including a plurality of core groups each including a plurality of cores and a cache unit, and a plurality of home agents each including a tag directory and a store command queue. The store command queue enters each received store request into an entry queue in order of reception, and the cache unit stores the data of the store request in a data RAM. The store command queue sets a data ownership acquisition flag of the store request to valid when obtaining a data ownership of the store request, and issues a top-of-queue notification to the cache control unit when the flag of the top-of-queue entry is valid. In response to the top-of-queue notification, the cache unit updates a cache tag to the modified state and issues a store request completion notification.
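
The ordering behavior described above can be modeled in software as follows. This is a rough Python sketch under stated assumptions, not the hardware design: the `StoreCommandQueue` class, its method names, the callback standing in for the top-of-queue notification, and the choice to remove an entry as soon as it has been notified are all hypothetical.

```python
from collections import deque


class StoreCommandQueue:
    """Hypothetical model: in-order queue gated by a per-entry ownership flag."""

    def __init__(self, on_top_of_queue):
        self._entries = deque()                  # entries in order of reception
        self._on_top_of_queue = on_top_of_queue  # stands in for the notification

    def enqueue(self, request_id):
        """Enter a received store request; ownership is not yet acquired."""
        self._entries.append({"id": request_id, "owned": False})

    def ownership_acquired(self, request_id):
        """Set the ownership flag to valid, then notify for any ready head entries."""
        for entry in self._entries:
            if entry["id"] == request_id:
                entry["owned"] = True
                break
        # Only the head of the queue may be notified, so completion order
        # always matches reception order.
        while self._entries and self._entries[0]["owned"]:
            head = self._entries.popleft()
            self._on_top_of_queue(head["id"])


completed = []
queue = StoreCommandQueue(on_top_of_queue=completed.append)
for request_id in (1, 2, 3):
    queue.enqueue(request_id)

queue.ownership_acquired(2)   # not at the head yet, so nothing is notified
after_second = list(completed)
queue.ownership_acquired(1)   # head becomes ready; request 2 follows immediately
after_first = list(completed)
queue.ownership_acquired(3)
```

In this model, ownership may be acquired out of order, but notifications are issued strictly in order of reception, which is the property the abstract attributes to the top-of-queue check.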