Patent classifications
G06F12/0692
Technolgies for millimeter wave rack interconnects
Racks and rack pods to support a plurality of sleds are disclosed herein. Switches for use in the rack pods are also disclosed herein. A rack comprises a plurality of sleds and a plurality of electromagnetic waveguides. The plurality of sleds are vertically spaced from one another. The plurality of electromagnetic waveguides communicate data signals between the plurality of sleds.
Configurable and programmable sliding window based memory access in a neural network processor
A novel and useful neural network (NN) processing core adapted to implement artificial neural networks (ANNs) and incorporating configurable and programmable sliding window based memory access. The memory mapping and allocation scheme trades off random and full access in favor of high parallelism and static mapping to a subset of the overall address space. The NN processor is constructed from self-contained computational units organized in a hierarchical architecture. The homogeneity enables simpler management and control of similar computational units, aggregated in multiple levels of hierarchy. Computational units are designed with minimal overhead as possible, where additional features and capabilities are aggregated at higher levels in the hierarchy. On-chip memory provides storage for content inherently required for basic operation at a particular hierarchy and is coupled with the computational resources in an optimal ratio. Lean control provides just enough signaling to manage only the operations required at a particular hierarchical level. Dynamic resource assignment agility is provided which can be adjusted as required depending on resource availability and capacity of the device.
Multi-value mapping for object store
A method for mapping an object store may include storing a data entry within a mapping page for an object in the object store, wherein the data entry may include a key and a value, and the value may include an address for the object in the object store. The method may further include storing multiple data entries within the mapping page for multiple corresponding objects in the object store, wherein each data entry may include a key and one or more values for a corresponding object in the object store, and each value may include an address for the corresponding object in the object store. The data entries may be part of a mapping data structure which may include nodes, and each node may be stored within a mapping page.
TECHNOLOGIES FOR COORDINATING DISAGGREGATED ACCELERATOR DEVICE RESOURCES
A compute device to manage workflow to disaggregated computing resources is provided. The compute device comprises a compute engine receive a workload processing request, the workload processing request defined by at least one request parameter, determine at least one accelerator device capable of processing a workload in accordance with the at least one request parameter, transmit a workload to the at least one accelerator device, receive a work product produced by the at least one accelerator device from the workload, and provide the work product to an application.
Method and apparatus for uniform memory access in a storage cluster
The present application presents a Uniform Memory Access (UMA) network including a cluster of UMA nodes each having at least one UMA memory unit and a server local to the at least one UMA memory unit. A respective UMA memory unit in a respective UMA node comprises persistent memory; non-persistent memory, a node control device operatively coupled to the persistent memory and the non-persistent memory, a local interface for interfacing with the local server in the respective UMA node, and a network interface for interfacing with the UMA network. The node control device is configured to translate between a local unified memory access (UMA) address space accessible by applications running on the local server and a global UMA address space that is mapped to a physical UMA address space. The physical UMA address space includes physical address spaces associated with different UMA nodes in the cluster of UMA nodes. Thus, a server in the UMA network can access the physical address spaces at other UMA nodes without going through the servers in the other UMA nodes.
Neural network processor incorporating multi-level hierarchical aggregated computing and memory elements
A novel and useful neural network (NN) processing core adapted to implement artificial neural networks (ANNs). The NN processor is constructed from self-contained computational units organized in a hierarchical architecture. The homogeneity enables simpler management and control of similar computational units, aggregated in multiple levels of hierarchy. Computational units are designed with minimal overhead as possible, where additional features and capabilities are aggregated at higher levels in the hierarchy. On-chip memory provides storage for content inherently required for basic operation at a particular hierarchy and is coupled with the computational resources in an optimal ratio. Lean control provides just enough signaling to manage only the operations required at a particular hierarchical level. Dynamic resource assignment agility is provided which can be adjusted as required depending on resource availability and capacity of the device.
TECHNOLOGIES FOR COORDINATING DISAGGREGATED ACCELERATOR DEVICE RESOURCES
A compute device to manage workflow to disaggregated computing resources is provided. The compute device comprises a compute engine receive a workload processing request, the workload processing request defined by at least one request parameter, determine at least one accelerator device capable of processing a workload in accordance with the at least one request parameter, transmit a workload to the at least one accelerator device, receive a work product produced by the at least one accelerator device from the workload, and provide the work product to an application.
Technologies for accelerator interface
Technologies for an accelerator interface over Ethernet are disclosed. In the illustrative embodiment, a network interface controller of a compute device may receive a data packet. If the network interface controller determines that the data packet should be pre-processed (e.g., decrypted) with a remote accelerator device, the network interface controller may encapsulate the data packet in an encapsulating network packet and send the encapsulating network packet to a remote accelerator device on a remote compute device. The remote accelerator device may pre-process the data packet (e.g., decrypt the data packet) and send it back to the network interface controller. The network interface controller may then send the pre-processed packet to a processor of the compute device.
TECHNOLOGIES FOR PROVIDING MANIFEST-BASED ASSET REPRESENTATION
Technologies for generating manifest data for a sled include a sled to generate manifest data indicative of one or more characteristics of the sled (e.g., hardware resources, firmware resources, a configuration of the sled, or a health of sled components). The sled is also to associate an identifier with the manifest data. The identifier uniquely identifies the sled from other sleds. Additionally, the sled is to send the manifest data and the associated identifier to a server. The sled may also detect a change in the hardware resources, firmware resources, the configuration, or component health of the sled. The sled may also generate an update of the manifest data based on the detected change, where the update specifies the detected change in the hardware resources, firmware resources, the configuration, or component health of the sled. The sled may also send the update of the manifest data to the server.
Data driven quantization optimization of weights and input data in an artificial neural network
A novel and useful system and method of data driven quantization optimization of weights and input data in an artificial neural network (ANN). The system reduces quantization implications (i.e. error) in a limited resource system by employing the information available in the data actually observed by the system. Data counters in the layers of the network observe the data input thereto. The distribution of the data is used to determine an optimum quantization scheme to apply to the weights, input data, or both. The mechanism is sensitive to the data observed at the input layer of the network. As a result, the network auto-tunes to optimize the instance specific representation of the network. The network becomes customized (i.e. specialized) to the inputs it observes and better fits itself to the subset of the sample space that is applicable to its actual data flow. As a result, nominal process noise is reduced and detection accuracy improves. In addition, the mechanism enables the reduction of the representation space and further reduces the memory (and energy thereof) needed to represent the network properties.