G06F2212/1008

DMA engine that generates an address-less memory descriptor that does not include a memory address for communicating with integrated circuit device
11513986 · 2022-11-29

To improve data throughput and data transfer rate, a contiguous block of host memory can be allocated for data transfers between the host system and an integrated circuit device such as a peripheral component. By using a contiguous block of memory that acts as a circular buffer, the memory address field of memory descriptors can be eliminated because the host system only needs to inform the data movement engine of the length of each data transfer. The data movement engine can maintain pointers to keep track of the memory address in the host memory to read from and write to. After each data transfer, the relevant pointer can be incremented by a value corresponding to the length indicated in the memory descriptor for the transfer. As such, it is not necessary for the host system to provide the data movement engine with the memory address of each transfer.
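
The pointer bookkeeping described in this abstract can be illustrated with a minimal Python sketch. The class name `AddresslessDmaEngine`, its fields, and the wrap-around by modulo are assumptions made for illustration; the abstract only states that descriptors carry a length and that the engine increments its own pointers.

```python
class AddresslessDmaEngine:
    """Toy model: descriptors carry only a length; the engine tracks addresses."""

    def __init__(self, base_addr: int, buffer_size: int):
        self.base_addr = base_addr      # start of the contiguous host block (assumed)
        self.buffer_size = buffer_size  # size of the circular buffer in bytes
        self.read_offset = 0            # engine-maintained read pointer
        self.write_offset = 0           # engine-maintained write pointer

    def _advance(self, offset: int, length: int) -> int:
        # Assumption: the pointer wraps at the end of the buffer.
        if not 0 < length <= self.buffer_size:
            raise ValueError("transfer length must fit in the buffer")
        return (offset + length) % self.buffer_size

    def read(self, length: int) -> int:
        """Handle a read descriptor; return the host address the transfer starts at."""
        addr = self.base_addr + self.read_offset
        self.read_offset = self._advance(self.read_offset, length)
        return addr

    def write(self, length: int) -> int:
        """Handle a write descriptor; return the host address the transfer starts at."""
        addr = self.base_addr + self.write_offset
        self.write_offset = self._advance(self.write_offset, length)
        return addr
```

In this sketch the host only ever passes `length`; the starting address of each transfer is derived entirely from state held by the engine.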

Technologies for switching network traffic in a data center

Technologies for switching network traffic include a network switch. The network switch includes one or more processors and communication circuitry coupled to the one or more processors. The communication circuitry is capable of switching network traffic of multiple link layer protocols. Additionally, the network switch includes one or more memory devices storing instructions that, when executed, cause the network switch to receive, with the communication circuitry through an optical connection, network traffic to be forwarded, and determine a link layer protocol of the received network traffic. The instructions additionally cause the network switch to forward the network traffic as a function of the determined link layer protocol. Other embodiments are also described and claimed.

PHYSICALLY DISTRIBUTED CONTROL PLANE FIREWALLS WITH UNIFIED SOFTWARE VIEW

Various embodiments include techniques for processing transactions via a computer system interconnect with a distributed firewall. The distributed firewall includes separate firewalls for various initiators of transactions and separate firewalls for various targets of those transactions. As a result, transactions proceed, for example, along the shortest path from the initiator to the target, rather than being routed through a centralized firewall. In addition, firewall transactions, for example, may be remapped such that initiators address the initiator firewalls and target firewalls via a unified address space, without having to maintain separate base addresses for each initiator firewall and target firewall. As a result, application programs, for example, can execute transactions with increased performance on a computer system as compared to prior approaches.
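
One way to picture the unified address space is a small Python sketch that translates a unified firewall address into the physical base address of an individual firewall. The fixed window size, the class name `UnifiedFirewallMap`, and the index-times-stride layout are all hypothetical; the abstract does not specify how the remapping is done.

```python
FIREWALL_WINDOW = 0x1000  # assumed size of each firewall's register window


class UnifiedFirewallMap:
    """Hypothetical remapper from one unified space to per-firewall base addresses."""

    def __init__(self, unified_base, physical_bases):
        self.unified_base = unified_base
        self.physical_bases = list(physical_bases)  # one entry per firewall

    def remap(self, unified_addr):
        # Split the unified offset into a firewall index and a register offset.
        offset = unified_addr - self.unified_base
        index, reg = divmod(offset, FIREWALL_WINDOW)
        if offset < 0 or index >= len(self.physical_bases):
            raise ValueError("address outside the unified firewall space")
        return self.physical_bases[index] + reg
```

With such a layout, an initiator needs to know only `unified_base`, not the separate base address of every initiator and target firewall.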

PROCESSOR CORE SIMULATOR INCLUDING TRACE-BASED COHERENT CACHE DRIVEN MEMORY TRAFFIC GENERATOR
20230056423 · 2023-02-23

A core simulator includes one or more simulated processors, a trace-based traffic generator, and a simulated memory subsystem. Each simulated processor includes a core element and at least one lower-level cache excluded from the core element. The trace-based traffic generator includes a plurality of modeled caches that model the at least one lower-level cache without modeling the core element. The trace-based traffic generator is configured to receive at least one workload trace and based on the workload trace simulate actual memory traffic to be processed by the simulated memory subsystem. The simulated memory subsystem is shared between the at least one simulated processor and the trace-based traffic generator. The trace-based traffic generator performs a data exchange with the memory subsystem based on the at least one workload trace. The data exchange impacts a measured performance of the at least one simulated processor.

DETERMINISTIC MEMORY ALLOCATION FOR REAL-TIME APPLICATIONS
20220365764 · 2022-11-17

Deterministic memory allocation for real-time applications. In an embodiment, bitcode is scanned to detect calls by a memory allocation function to a dummy function. Each call uses parameters comprising an identifier of a memory pool and a size of a data type to be stored in the memory pool. For each detected call, an allocation record, comprising the parameters, is generated. Then, a header file is generated based on the allocation records. The header file may comprise a definition of bucket(s) and a definition of memory pools. Each definition of a memory pool may identify at least one bucket.
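
The step from allocation records to a generated header might look like the following Python sketch. It skips the bitcode scan and starts from already-collected records; the function name `generate_header`, the macro names, and the rule of one bucket per distinct size are assumptions for illustration only.

```python
def generate_header(records):
    """Build header text from (pool_id, type_size) allocation records.

    Assumption: one bucket per distinct size; each pool lists the buckets it uses.
    """
    records = list(records)
    sizes = sorted({size for _, size in records})
    bucket_of = {size: i for i, size in enumerate(sizes)}

    pools = {}
    for pool_id, size in records:
        pools.setdefault(pool_id, set()).add(bucket_of[size])

    lines = ["#pragma once"]
    for size, index in bucket_of.items():
        lines.append(f"#define BUCKET_{index}_BLOCK_SIZE {size}")
    for pool_id in sorted(pools):
        ids = ", ".join(str(b) for b in sorted(pools[pool_id]))
        lines.append(f"#define POOL_{pool_id}_BUCKETS {{ {ids} }}")
    return "\n".join(lines) + "\n"
```

Because the header is produced ahead of time, the pool and bucket layout is fixed before the real-time application runs.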

Throttling Schemes in Multicore Microprocessors
20220365879 · 2022-11-17

An electronic device includes a cache, a processing cluster having one or more processors, and prefetch throttling circuitry that determines a congestion level of the processing cluster based on an extent to which the data retrieval requests sent from the processors to the cache are not satisfied by the cache. Congestion criteria require that the congestion level of the cluster is above a cluster congestion threshold. In accordance with a determination that the congestion level of the cluster satisfies the congestion criteria, the prefetch throttling circuit causes one of the processors to limit prefetch requests to the cache to prefetch requests of at least a threshold quality. In accordance with a determination that the congestion level of the cluster does not satisfy the congestion criteria, the prefetch throttling circuit forgoes causing the processors to limit prefetch requests to the cache to prefetch requests of at least the threshold quality.
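
The throttling decision can be sketched in Python as follows. Modeling the congestion level as a miss ratio, representing prefetches as `(address, quality)` pairs, and the two default threshold values are assumptions; the abstract fixes only the structure of the decision.

```python
def congestion_level(requests, misses):
    """Assumed metric: fraction of data requests not satisfied by the cache."""
    return misses / requests if requests else 0.0


def filter_prefetches(prefetches, requests, misses,
                      cluster_threshold=0.5, quality_threshold=0.8):
    """Return the prefetches allowed through to the cache.

    prefetches: list of (address, quality) pairs; the thresholds are illustrative.
    """
    if congestion_level(requests, misses) > cluster_threshold:
        # Congested: keep only prefetches of at least the threshold quality.
        return [p for p in prefetches if p[1] >= quality_threshold]
    # Not congested: forgo limiting prefetch requests.
    return list(prefetches)
```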

Recovery of validity data for a data storage system

The subject technology provides for recovering a validity table for a data storage system. A set of logical addresses in a mapping table is partitioned into subsets of logical addresses. Each of the subsets of logical addresses is assigned to respective processor cores in the data storage system. Each of the processor cores is configured to check each logical address of the assigned subset of logical addresses in the mapping table for a valid physical address mapped to the logical address, for each valid physical address mapped to a logical address of the assigned subset of logical addresses, increment a validity count in a local validity table associated with a blockset of the non-volatile memory corresponding to the valid physical address, and update validity counts in a global validity table associated with respective blocksets of the non-volatile memory with the validity counts in the local validity table.
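
A minimal Python sketch of the recovery flow is given below. The cores are simulated sequentially, the strided partitioning of logical addresses, the `None` marker for an unmapped entry, and the `BLOCKSET_SIZE` value are assumptions for illustration.

```python
from collections import Counter

BLOCKSET_SIZE = 4  # assumed number of physical addresses per blockset


def recover_validity(mapping_table, num_cores):
    """Rebuild per-blockset validity counts from a logical-to-physical table.

    mapping_table[lba] is a physical address, or None when no valid mapping exists.
    """
    global_table = Counter()
    for core in range(num_cores):           # each iteration stands in for one core
        local_table = Counter()
        # Assumed partitioning: core k checks logical addresses k, k + n, k + 2n, ...
        for lba in range(core, len(mapping_table), num_cores):
            physical = mapping_table[lba]
            if physical is not None:
                local_table[physical // BLOCKSET_SIZE] += 1
        global_table.update(local_table)    # merge local counts into the global table
    return dict(global_table)
```

Keeping a local table per core means the cores only touch shared state once, at the merge step.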

Pseudo-first in, first out (FIFO) tag line replacement

A method is provided that includes searching tags in a tag group comprised in a tagged memory system for an available tag line during a clock cycle, wherein the tagged memory system includes a plurality of tag lines having respective tags and wherein the tags are divided into a plurality of non-overlapping tag groups, and searching tags in a next tag group of the plurality of tag groups for an available tag line during a next clock cycle when the searching in the tag group does not find an available tag line.
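
The group-per-cycle search can be sketched in Python as below. The function name, the use of a list of in-use flags, and the assumption that the number of tag lines is a multiple of the group size are illustrative choices, not details from the abstract.

```python
def find_available_line(in_use, group_size, start_group=0):
    """Search one tag group per clock cycle for a free tag line.

    in_use[i] is True when tag line i is occupied. Returns (line, cycles_used),
    or (None, cycles_used) when every group was searched without success.
    Assumes len(in_use) is a multiple of group_size.
    """
    num_groups = len(in_use) // group_size
    for cycle in range(num_groups):
        group = (start_group + cycle) % num_groups
        base = group * group_size
        for line in range(base, base + group_size):
            if not in_use[line]:
                return line, cycle + 1
        # Nothing free in this group: the next group is searched next cycle.
    return None, num_groups
```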

Vector prefetching for computing systems
11500779 · 2022-11-15

Described is a computing system for vector prefetching which includes a hierarchical memory including multiple caches, a missing address storage unit (MASU) associated with each cache which stores prefetch requests suffering a cache miss, a prefetcher which sends prefetch requests towards the hierarchical memory, and a vector prefetch unit. The vector prefetch unit determines existence of at least one of a relationship between a cache block associated with the prefetch request and cache blocks associated with one or more entries in a MASU, or a relationship between cache blocks associated with different entries in a MASU, and sends a vector prefetch request based on related prefetch requests including indicators indicating a starting cache block and a number of related cache blocks to a higher memory level to obtain data associated with each cache block. The hierarchical memory stores the data received in at least one response message from the higher memory level if available.
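
As a rough Python sketch, related missing cache blocks can be folded into vector requests made of a starting block and a block count. Treating "related" as "adjacent block numbers" is an assumption; the abstract does not define the relationship, and `coalesce_blocks` is a hypothetical name.

```python
def coalesce_blocks(missing_blocks):
    """Merge adjacent missing cache-block numbers into (start_block, count) vectors.

    Assumption: two prefetch requests are related when their blocks are consecutive.
    """
    vectors = []
    for block in sorted(set(missing_blocks)):
        if vectors and block == vectors[-1][0] + vectors[-1][1]:
            start, count = vectors[-1]
            vectors[-1] = (start, count + 1)   # extend the current run
        else:
            vectors.append((block, 1))         # start a new run
    return vectors
```

Each resulting pair corresponds to one vector prefetch request sent to the higher memory level instead of one request per block.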

GENERATIONAL PHYSICAL ADDRESS PROXIES

Each PIPT L2 cache entry is uniquely identified by a set index and a way and holds a generational identifier (GENID). The L2 detects a miss of a physical memory line address (PMLA). An L2 set index is obtained from the PMLA. The L2 picks a way for replacement, increments the GENID held in the entry in the picked way of the selected set, and forms a physical address proxy (PAP) for the PMLA with the obtained set index and the picked way. The PAP uniquely identifies the picked L2 entry. The L2 forms a generational PAP (GPAP) for the PMLA with the PAP and the incremented GENID. A load/store unit makes available the GPAP as a proxy of the PMLA for comparisons with GPAPs of other PMLAs, rather than making comparisons of the PMLA itself with the other PMLAs, to determine whether the PMLA matches the other PMLAs.
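
The formation of a PAP and GPAP on an L2 miss can be sketched in Python as follows. The bit widths, the round-robin way selection, the wrap-around of the GENID, and the placement of the GENID above the PAP bits are assumptions chosen to keep the example small.

```python
SET_BITS = 4    # assumed: 16 sets
WAY_BITS = 2    # assumed: 4 ways
GENID_BITS = 2  # assumed width of the generational identifier
NUM_SETS = 1 << SET_BITS
NUM_WAYS = 1 << WAY_BITS


def make_pap(set_index, way):
    """A PAP is the set index and way, which together identify one L2 entry."""
    return (set_index << WAY_BITS) | way


def make_gpap(pap, genid):
    """A GPAP combines the PAP with the entry's GENID (layout is an assumption)."""
    return (genid << (SET_BITS + WAY_BITS)) | pap


class L2Cache:
    def __init__(self):
        self.genid = [[0] * NUM_WAYS for _ in range(NUM_SETS)]
        self.next_way = [0] * NUM_SETS  # assumed round-robin replacement

    def fill_on_miss(self, pmla):
        """Handle a miss of a physical memory line address; return its GPAP."""
        set_index = pmla & (NUM_SETS - 1)
        way = self.next_way[set_index]
        self.next_way[set_index] = (way + 1) % NUM_WAYS
        # Increment the GENID held in the picked entry (wraps at its width).
        genid = (self.genid[set_index][way] + 1) % (1 << GENID_BITS)
        self.genid[set_index][way] = genid
        return make_gpap(make_pap(set_index, way), genid)
```

When the same entry is later reused for a different line, the PAP repeats but the GPAP differs, which is what lets GPAP comparisons stand in for full PMLA comparisons in this sketch.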