G06F12/0692

Technologies for providing shared memory for accelerator sleds

Technologies for providing shared memory for accelerator sleds includes an accelerator sled to receive, with a memory controller, a memory access request from an accelerator device to access a region of memory. The request is to identify the region of memory with a logical address. Additionally, the accelerator sled is to determine from a map of logical addresses and associated physical address, the physical address associated with the region of memory. In addition, the accelerator sled is to route the memory access request to a memory device associated with the determined physical address.

DATA DEDUPLICATION LATENCY REDUCTION

Aspects of the present disclosure relate to reducing the latency of data deduplication. In embodiments, an input/output (IO) workload received by a storage array is monitored. Further, at least one IO write operation in the IO workload is identified. A space-efficient probabilistic data structure is used to determine if a director board is associated with the IO write. Additionally, the IO write operation is processed based on the determination.

Multi-uplink device enumeration and management

A device includes a plurality of ports and a plurality of capability registers that correspond to a respective one of the plurality of ports. The device is to connect to one or more processors of a host device through the plurality of ports, and each of the plurality of ports comprises a respective protocol stack to support a respective link between the corresponding port and the host device according to a particular interconnect protocol. Each of the plurality of capability registers comprises a respective set of fields for use in configuration of the link between its corresponding port and one of the one or more processors of the host device. The fields include a field to indicate an association between the port and a particular processor, a field to indicate a port identifier for the port, and a field to indicate a total number of ports of the device.

Systems and methods for implementing a four-dimensional superblock
11693773 · 2023-07-04 · ·

A solid state drive (SSD) is presented herein that includes a plurality of memory dies communicatively arranged in a plurality of communication channels such that each respective memory die is associated with a respective one communication channel of the plurality of communication channels, each respective memory die comprises one or more die regions, and each of the one or more die regions comprises a plurality of physical blocks configured to store data. The SSD further includes a memory controller communicatively coupled to the plurality of memory dies. The memory controller is configured to, upon a first power up of the SSD, determine a parameter of the SSD and for each of the one or more die regions, associate, based on the parameter, a number of physical blocks of the plurality of physical blocks with a block region of a plurality of block regions.

Storage system with interconnected solid state disks
11573895 · 2023-02-07 · ·

An embodiment of a semiconductor package apparatus may include technology to provide a first interface between a first storage device and a host device, and provide a second interface directly between the first storage device and a second storage device. Other embodiments are disclosed and claimed.

Deep learning approach to mitigate the cold-start problem in textual items recommendations
11687612 · 2023-06-27 · ·

A method for mitigating cold starts in recommendations includes receiving a request that identifies a requested page and identifying a content vector of the requested page. The content vector is generated based on providing text of the requested page to a neural network text encoder. The method further includes selecting, based on the content vector, a link to a cold start page that does not satisfy a threshold level of interaction data. The selected link is ranked above a second link to a warm page that does satisfy the threshold level of the interaction data. The method further includes presenting the requested page with the selected link.

Data deduplication latency reduction

Aspects of the present disclosure relate to reducing the latency of data deduplication. In embodiments, an input/output (IO) workload received by a storage array is monitored. Further, at least one IO write operation in the IO workload is identified. A space-efficient probabilistic data structure is used to determine if a director board is associated with the IO write. Additionally, the IO write operation is processed based on the determination.

Neural network processing element incorporating compute and local memory elements

A novel and useful neural network (NN) processing core adapted to implement artificial neural networks (ANNs) and incorporating processing circuits having compute and local memory elements. The NN processor is constructed from self-contained computational units organized in a hierarchical architecture. The homogeneity enables simpler management and control of similar computational units, aggregated in multiple levels of hierarchy. Computational units are designed with minimal overhead as possible, where additional features and capabilities are aggregated at higher levels in the hierarchy. On-chip memory provides storage for content inherently required for basic operation at a particular hierarchy and is coupled with the computational resources in an optimal ratio. Lean control provides just enough signaling to manage only the operations required at a particular hierarchical level. Dynamic resource assignment agility is provided which can be adjusted as required depending on resource availability and capacity of the device.

Neural network processor incorporating inter-device connectivity

A novel and useful neural network (NN) processing core incorporating inter-device connectivity and adapted to implement artificial neural networks (ANNs). A chip-to-chip interface spreads a given ANN model across multiple devices in a seamless manner. The NN processor is constructed from self-contained computational units organized in a hierarchical architecture. The homogeneity enables simpler management and control of similar computational units, aggregated in multiple levels of hierarchy. Computational units are designed with minimal overhead as possible, where additional features and capabilities are aggregated at higher levels in the hierarchy. On-chip memory provides storage for content inherently required for basic operation at a particular hierarchy and is coupled with the computational resources in an optimal ratio. Lean control provides just enough signaling to manage only the operations required at a particular hierarchical level. Dynamic resource assignment agility is provided which can be adjusted as required depending on resource availability and capacity of the device.

CLOUD-BASED SCALE-UP SYSTEM COMPOSITION

Technologies for composing a managed node with multiple processors on multiple compute sleds to cooperatively execute a workload include a memory, one or more processors connected to the memory, and an accelerator. The accelerator further includes a coherence logic unit that is configured to receive a node configuration request to execute a workload. The node configuration request identifies the compute sled and a second compute sled to be included in a managed node. The coherence logic unit is further configured to modify a portion of local working data associated with the workload on the compute sled in the memory with the one or more processors of the compute sled, determine coherence data indicative of the modification made by the one or more processors of the compute sled to the local working data in the memory, and send the coherence data to the second compute sled of the managed node.