G06F2212/1048

Scalable network-on-package for connecting chiplet-based designs

A network-on-package (NoPK) for connecting a plurality of chiplets may include a plurality of interface bridges configured to convert a plurality of protocols used by the plurality of chiplets into a common protocol, a routing network configured to route traffic between the plurality of interface bridges using the common protocol, and a controller configured to program the plurality of interface bridges and the routing network based on types of the plurality of chiplets connected to the NoPK. The NoPK may provide a scalable connection for any number of chiplets from different ecosystems using different communication protocols.
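
A minimal behavioral sketch in C of the bridge-and-controller idea, assuming hypothetical names (nopk_packet_t, program_bridge, to_common) and an illustrative set of chiplet protocols; the patent describes hardware, so this only shows the data flow, not an implementation:

    #include <stdio.h>

    /* Hypothetical common-protocol packet used inside the NoPK fabric. */
    typedef struct { int src_port; int dst_port; unsigned payload; } nopk_packet_t;

    /* Chiplet-side protocols the bridges might translate (illustrative only). */
    typedef enum { PROTO_AXI, PROTO_CXL, PROTO_UCIE } chiplet_proto_t;

    /* An interface bridge: programmed by the controller with the protocol
     * of the chiplet attached to it; converts to the common packet format. */
    typedef struct { chiplet_proto_t proto; int port; } bridge_t;

    /* Controller step: program each bridge from the detected chiplet type. */
    static void program_bridge(bridge_t *b, chiplet_proto_t detected, int port) {
        b->proto = detected;
        b->port  = port;
    }

    /* Bridge step: wrap a chiplet transaction into a common-protocol packet. */
    static nopk_packet_t to_common(const bridge_t *b, int dst, unsigned data) {
        nopk_packet_t p = { b->port, dst, data };  /* header normalization */
        return p;
    }

    int main(void) {
        bridge_t b0;
        program_bridge(&b0, PROTO_AXI, 0);          /* chiplet 0 speaks AXI */
        nopk_packet_t p = to_common(&b0, 3, 0xABu); /* route to bridge/port 3 */
        printf("packet %d -> %d : 0x%X\n", p.src_port, p.dst_port, p.payload);
        return 0;
    }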

SCALABLE ADDRESS DECODING SCHEME FOR CXL TYPE-2 DEVICES WITH PROGRAMMABLE INTERLEAVE GRANULARITY
20230086222 · 2023-03-23

Methods and apparatus relating to a scalable address decoding scheme for Compute Express Link™ or CXL™ Type-2 devices with programmable interleave granularity are described. In an embodiment, configurator logic circuitry determines an interleave granularity and an address range size for a plurality of devices coupled to a socket of a processor. A single System Address Decoder (SAD) rule for two or more of the plurality of devices coupled to the socket of the processor is stored in memory. A memory access transaction directed at a first device from the plurality of devices is routed to the first device in accordance with the SAD rule. Other embodiments are also disclosed and claimed.
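
The decoding itself reduces to modular arithmetic over the programmed granularity. A sketch in C, with a hypothetical sad_rule_t layout and illustrative constants:

    #include <stdint.h>
    #include <stdio.h>

    /* One SAD rule covering N interleaved devices (illustrative layout). */
    typedef struct {
        uint64_t base;         /* start of the interleaved address range */
        uint64_t range_size;   /* total size covered by the rule         */
        uint64_t granularity;  /* programmable interleave granularity    */
        unsigned num_devices;  /* devices sharing this single rule       */
    } sad_rule_t;

    /* Route an address to a device index under the rule, or -1 on a miss. */
    static int decode(const sad_rule_t *r, uint64_t addr) {
        if (addr < r->base || addr >= r->base + r->range_size)
            return -1;                                 /* outside the rule */
        uint64_t off = addr - r->base;
        return (int)((off / r->granularity) % r->num_devices);
    }

    int main(void) {
        /* 4 devices, 256 B interleave granularity, 1 MiB range. */
        sad_rule_t r = { 0x100000000ull, 1u << 20, 256, 4 };
        printf("device = %d\n", decode(&r, 0x100000000ull + 3 * 256 + 7)); /* 3 */
        return 0;
    }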

Data processing network with flow compaction for streaming data transfer

An improved protocol is provided for data transfer between a request node and a home node of a data processing network in which a number of devices are coupled via an interconnect fabric; the protocol minimizes the number of response messages transported through the fabric. When congestion is detected in the interconnect fabric, the home node sends a combined response to a write request from a request node. The response is delayed until a data buffer is available at the home node and the home node has completed an associated coherence action. When the request node receives a combined response, the data to be written and the acknowledgment are coalesced in the data message.
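
A sketch in C of the coalescing idea, with hypothetical message and condition names; the actual flit formats and trigger conditions belong to the protocol and are not shown here:

    #include <stdbool.h>
    #include <stdio.h>

    /* Coalesced write-data message: payload plus the completion ack that
     * would otherwise travel as a separate response flit (illustrative). */
    typedef struct {
        unsigned txn_id;
        unsigned data;
        bool     comp_ack;   /* acknowledgment folded into the data message */
    } wr_data_msg_t;

    /* Home-node side: emit one combined response only when both conditions
     * hold, instead of separate buffer-grant and coherence-done messages. */
    static bool send_combined_response(bool buffer_free, bool coherence_done) {
        return buffer_free && coherence_done;  /* delay until both are true */
    }

    int main(void) {
        if (send_combined_response(true, true)) {
            /* Request node received the combined response: coalesce. */
            wr_data_msg_t m = { 42, 0xDEADBEEF, true };
            printf("txn %u: data+ack in one message (ack=%d)\n",
                   m.txn_id, m.comp_ack);
        }
        return 0;
    }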

Scalable Cache Coherency Protocol

A scalable cache coherency protocol for a system including a plurality of coherent agents coupled to one or more memory controllers is described. The memory controller may implement a precise directory for cache blocks from the memory to which the memory controller is coupled. Multiple requests to a cache block may be outstanding, and snoops and completions for requests may include an expected cache state at the receiving agent, as indicated by the directory in the memory controller when the request was processed, to allow the receiving agent to detect race conditions. In an embodiment, the cache states may include a primary shared and a secondary shared state. The primary shared state may apply to a coherent agent that bears responsibility for transmitting a copy of the cache block to a requesting agent. In an embodiment, at least two types of snoops may be supported: snoop forward and snoop back.
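
A sketch in C of the expected-state check, using illustrative state names rather than the patent's encodings:

    #include <stdio.h>

    /* Cache states including the primary/secondary shared split described
     * above (names illustrative). */
    typedef enum {
        ST_INVALID, ST_SHARED_SECONDARY, ST_SHARED_PRIMARY, ST_EXCLUSIVE
    } cstate_t;

    /* A snoop carries the state the directory expected the receiver to be
     * in when the request was processed; a mismatch flags a race. */
    static int snoop_detects_race(cstate_t expected, cstate_t actual) {
        return expected != actual;   /* receiver resolves the ordering */
    }

    int main(void) {
        /* Directory thought this agent held primary-shared, but a racing
         * request already downgraded it to secondary-shared. */
        if (snoop_detects_race(ST_SHARED_PRIMARY, ST_SHARED_SECONDARY))
            printf("race detected: defer or replay per protocol rules\n");
        return 0;
    }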

CACHE COHERENT SYSTEM IMPLEMENTING VICTIM BUFFERS
20230079078 · 2023-03-16

In accordance with various aspects of the invention, a recall transaction is issued if a tag filter entry needs to be freed up for an incoming transaction. Directory entries chosen for a recall transaction are pushed into a fully associative structure called a victim buffer. If this structure becomes full, an entry inside the victim buffer is selected for the recall.
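
A sketch in C of the victim-buffer overflow path, with a made-up depth and a stand-in selection policy (slot 0) where the real design would apply its own:

    #include <stdio.h>
    #include <string.h>

    #define VB_ENTRIES 4   /* small fully associative victim buffer (demo) */

    typedef struct { unsigned tag; int valid; } vb_entry_t;
    static vb_entry_t vb[VB_ENTRIES];

    /* Push a directory entry chosen for recall into the victim buffer.
     * If the buffer is full, pick a resident entry (here: slot 0, standing
     * in for whatever selection policy the design uses) and recall it. */
    static void victimize(unsigned tag) {
        for (int i = 0; i < VB_ENTRIES; i++) {
            if (!vb[i].valid) { vb[i].tag = tag; vb[i].valid = 1; return; }
        }
        printf("victim buffer full: issuing recall for tag 0x%X\n", vb[0].tag);
        vb[0].valid = 0;          /* entry leaves once its recall completes */
        vb[0].tag = tag;
        vb[0].valid = 1;
    }

    int main(void) {
        memset(vb, 0, sizeof vb);
        for (unsigned t = 0; t < 6; t++)   /* overflow triggers recalls */
            victimize(0x100 + t);
        return 0;
    }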

Cache architectures with address delay registers for memory devices
11481330 · 2022-10-25

Methods, systems, and devices for cache architectures for memory devices are described. For example, a memory device may include a main array having a first set of memory cells, a cache having a second set of memory cells, and a cache delay register configured to store an indication of cache addresses associated with recently performed access operations. In some examples, the cache delay register may be operated as a first-in-first-out (FIFO) register of cache addresses, where a cache address associated with a performed access operation may be added to the beginning of the FIFO register, and a cache address at the end of the FIFO register may be purged. Information associated with access operations on the main array may be maintained in the cache, and accessed directly (e.g., without another access of the main array), at least as long as the cache address is present in the cache delay register.
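
A sketch in C of the cache delay register as a FIFO of cache addresses, with an illustrative depth:

    #include <stdio.h>

    #define DELAY_DEPTH 4   /* depth of the cache delay register (demo) */

    static unsigned fifo[DELAY_DEPTH];
    static int count = 0;

    /* Record a cache address after an access: push at the front; when the
     * register is full, the address at the end is purged. */
    static void record_access(unsigned cache_addr) {
        for (int i = (count < DELAY_DEPTH ? count : DELAY_DEPTH - 1); i > 0; i--)
            fifo[i] = fifo[i - 1];
        fifo[0] = cache_addr;
        if (count < DELAY_DEPTH) count++;
    }

    /* Serve from the cache directly as long as the address is still present. */
    static int in_delay_register(unsigned cache_addr) {
        for (int i = 0; i < count; i++)
            if (fifo[i] == cache_addr) return 1;
        return 0;
    }

    int main(void) {
        for (unsigned a = 0; a < 5; a++) record_access(a);  /* addr 0 purged */
        printf("addr 4 cached: %d, addr 0 cached: %d\n",
               in_delay_register(4), in_delay_register(0)); /* 1, 0 */
        return 0;
    }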

MAINTAINING AN ACTIVE TRACK DATA STRUCTURE TO DETERMINE ACTIVE TRACKS IN CACHE TO PROCESS

Provided is a computer program product for managing, in a cache, tracks from a storage. An active track data structure indicates tracks in the cache that have an active status. The active bit in the cache control block for a track is set to indicate active when the track is indicated as active in the active track data structure. When the cache control block is processed, a determination is made from the cache control block whether the track is active or inactive, which determines the processing for the cache control block.
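
A sketch in C pairing a one-bit-per-track active structure with the active bit in a cache control block; the field names are hypothetical:

    #include <stdint.h>
    #include <stdio.h>

    /* Active track data structure: one bit per track (up to 64 here). */
    static uint64_t active_tracks;

    /* Cache control block with the per-track active bit mirrored from
     * the active track data structure (fields illustrative). */
    typedef struct { unsigned track; int active; } cache_ctrl_blk_t;

    static void mark_active(cache_ctrl_blk_t *ccb) {
        active_tracks |= 1ull << ccb->track;
        ccb->active = 1;              /* set from the data structure */
    }

    /* When processing the control block, decide handling from the bit. */
    static void process(const cache_ctrl_blk_t *ccb) {
        printf("track %u is %s\n", ccb->track,
               ccb->active ? "active" : "inactive");
    }

    int main(void) {
        cache_ctrl_blk_t ccb = { 7, 0 };
        mark_active(&ccb);
        process(&ccb);                /* "track 7 is active" */
        return 0;
    }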

PREVENTING UNAUTHORIZED TRANSLATED ACCESS USING ADDRESS SIGNING
20230070125 · 2023-03-09

A host may use address translation to convert virtual addresses to physical addresses for endpoints, which may then submit memory access requests for physical addresses. The host may incorporate the physical address and a signature of the physical address generated using a private key into a translated address field of a response to a translation request. An endpoint may treat the combination as a translated address by storing it in an entry of a translation cache, and accessing the entry for inclusion in a memory access request. When verifying a memory access request, the host may regenerate the signature of the translated address using the private key and compare the result to the signature included in the request. The memory access request may be verified when the compared values match, and the memory access may be performed using the translated address.
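
A sketch in C of the sign-then-verify flow; the keyed hash below is a toy stand-in for a real MAC and is not cryptographically secure:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy keyed hash standing in for a real keyed MAC (demo quality). */
    static uint64_t sign(uint64_t phys_addr, uint64_t private_key) {
        uint64_t x = phys_addr ^ private_key;
        x *= 0x9E3779B97F4A7C15ull;      /* mixing step, illustrative */
        return x ^ (x >> 29);
    }

    /* Host side: build the translated-address field = address + signature. */
    typedef struct { uint64_t addr; uint64_t sig; } xlated_t;

    static xlated_t translate(uint64_t phys, uint64_t key) {
        xlated_t t = { phys, sign(phys, key) };
        return t;
    }

    /* Host side on a memory access request: recompute and compare. */
    static int verify(const xlated_t *t, uint64_t key) {
        return sign(t->addr, key) == t->sig;
    }

    int main(void) {
        uint64_t key = 0x1234;
        xlated_t t = translate(0x8000F000ull, key);  /* cached by endpoint */
        t.addr += 0x40;                              /* tampered request... */
        printf("verified: %d\n", verify(&t, key));   /* ...fails: 0 */
        return 0;
    }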

Hardware-implemented universal floating-point instruction set architecture for computing directly with human-readable decimal character sequence floating-point representation operands
11635957 · 2023-04-25

A universal floating-point Instruction Set Architecture (ISA) compute engine implemented entirely in hardware. The ISA compute engine computes directly with human-readable decimal character sequence floating-point representation operands without first having to explicitly perform a conversion-to-binary-format process in software. A fully pipelined convertToBinaryFromDecimalCharacter hardware operator logic circuit converts one or more human-readable decimal character sequence floating-point representations to IEEE 754-2008 binary floating-point representations every clock cycle. Following computations by at least one hardware floating-point operator, a convertToDecimalCharacterFromBinary hardware conversion circuit converts the result back to a human-readable decimal character sequence floating-point representation.
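
The data flow can be mimicked in software, with strtod and snprintf standing in for the convertToBinaryFromDecimalCharacter and convertToDecimalCharacterFromBinary hardware circuits; the patent performs these conversions in pipelined logic rather than in software:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        const char *op_a = "3.14159";    /* human-readable decimal operands */
        const char *op_b = "2.71828";

        double a = strtod(op_a, NULL);   /* decimal chars -> IEEE 754 binary */
        double b = strtod(op_b, NULL);

        double sum = a + b;              /* the hardware floating-point op   */

        char result[32];                 /* binary -> decimal character seq. */
        snprintf(result, sizeof result, "%.5f", sum);
        printf("%s + %s = %s\n", op_a, op_b, result);
        return 0;
    }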

Cache memory architecture and management

Aspects of the present disclosure relate to data cache management. In embodiments, a storage array's memory is provisioned with cache memory, wherein the cache memory includes one or more sets of distinctly sized cache slots. Additionally, a logical storage volume (LSV) is established with at least one logical block address (LBA) group. Further, at least one of the LSV's LBA groups is associated with two or more distinctly sized cache slots based on an input/output (IO) workload received by the storage array.
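
A sketch in C of selecting a slot pool for an LBA group from observed IO size; the sizes and policy are illustrative, not the patent's:

    #include <stdio.h>

    /* Distinctly sized cache slots the array's memory is provisioned
     * with (sizes illustrative). */
    static const unsigned slot_sizes[] = { 8 * 1024, 64 * 1024, 128 * 1024 };

    /* Pick a slot pool for an LBA group from the observed IO size, so a
     * group doing small random IO does not consume large slots. */
    static unsigned slot_for_io(unsigned io_bytes) {
        for (unsigned i = 0; i < sizeof slot_sizes / sizeof slot_sizes[0]; i++)
            if (io_bytes <= slot_sizes[i]) return slot_sizes[i];
        return slot_sizes[2];   /* largest pool as the fallback */
    }

    int main(void) {
        printf("4 KiB IO -> %u-byte slot\n", slot_for_io(4 * 1024));
        printf("96 KiB IO -> %u-byte slot\n", slot_for_io(96 * 1024));
        return 0;
    }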