Patent classifications
G06F2212/251
De-centralized load-balancing at processors
A mechanism is described for facilitating localized load-balancing for processors in computing devices. A method of embodiments, as described herein, includes facilitating hosting, at a processor of a computing device, a local load-balancing mechanism. The method may further include monitoring balancing of loads at the processor and serving as a local scheduler to maintain de-centralized load-balancing at the processor and between the processor and one or more other processors.
Weight prefetch for in-memory neural network execution
The present disclosure is directed to systems and methods of bit-serial, in-memory execution of at least an nth layer of a multi-layer neural network in a first on-chip processor memory circuitry portion contemporaneous with prefetching and storing layer weights associated with the (n+1)st layer of the multi-layer neural network in a second on-chip processor memory circuitry portion. The storage of layer weights in on-chip processor memory circuitry beneficially decreases the time required to transfer the layer weights upon execution of the (n+1)st layer of the multi-layer neural network by the first on-chip processor memory circuitry portion. In addition, the on-chip processor memory circuitry may include a third on-chip processor memory circuitry portion used to store intermediate and/or final input/output values associated with one or more layers included in the multi-layer neural network.
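The overlap of execution and prefetch described in this abstract can be loosely illustrated in software. The Python sketch below is only an analogy for the scheduling idea: it starts fetching the weights for layer n+1 before running layer n. The functions `load_weights`, `run_layer`, and `run_network`, and the scalar "weights", are hypothetical stand-ins and say nothing about the on-chip memory circuitry or bit-serial execution the disclosure covers.

```python
from concurrent.futures import ThreadPoolExecutor


def load_weights(layer_index):
    """Hypothetical stand-in for fetching one layer's weights from slow memory."""
    return layer_index + 1  # a single scalar "weight", for illustration only


def run_layer(activation, weight):
    """Hypothetical stand-in for executing one layer on the current activation."""
    return activation * weight


def run_network(num_layers, network_input):
    activation = network_input
    with ThreadPoolExecutor(max_workers=1) as prefetcher:
        # Fetch the first layer's weights up front.
        pending = prefetcher.submit(load_weights, 0)
        for n in range(num_layers):
            current_weights = pending.result()  # weights for layer n are ready
            if n + 1 < num_layers:
                # Start fetching layer n+1 weights before executing layer n,
                # so the fetch overlaps with the computation below.
                pending = prefetcher.submit(load_weights, n + 1)
            activation = run_layer(activation, current_weights)
    return activation


result = run_network(3, 2)
```

In this sketch the two buffers are simply `current_weights` and the in-flight `pending` future, which play the roles of the first and second memory portions.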
System and Method for Improving a Processing System
A system and corresponding method improve a processing system. The system comprises a first learning system coupled to a system controller. The first learning system identifies variations for altering processing of a processing system to meet at least one goal. The system controller applies the variations identified to the processing system. The system further comprises a second learning system coupled to the system controller. The second learning system determines respective effects of the variations identified and applied. The first learning system converges on a given variation of the variations based on the respective effects determined. The given variation enables the at least one goal to be met, improving the processing system, such as by increasing throughput, reducing latency, reducing power consumption, reducing temperature, etc.
SUPPLY CHAIN MONITORING
A computer-implemented method includes receiving values for different keys from different systems in a supply chain and determining a difference between a value for a first key from a first system and a value for a second key from a second system and comparing the difference to an alert threshold. An alert is issued when the difference exceeds the alert threshold.
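As a rough illustration of the comparison in this abstract, the Python sketch below takes values reported by two systems and issues an alert when their difference exceeds a threshold. The function name `check_discrepancy`, the system and key names, and the use of an absolute difference are assumptions made for the example; the abstract does not specify them.

```python
def check_discrepancy(first_value, second_value, alert_threshold):
    """Return an alert message if the values differ by more than the threshold, else None."""
    # Assumption: the "difference" is taken as an absolute difference.
    difference = abs(first_value - second_value)
    if difference > alert_threshold:
        return f"ALERT: difference {difference} exceeds threshold {alert_threshold}"
    return None


# Hypothetical values for different keys reported by different systems.
reported = {
    ("warehouse", "units_shipped"): 500,
    ("retailer", "units_received"): 470,
}

alert = check_discrepancy(
    reported[("warehouse", "units_shipped")],
    reported[("retailer", "units_received")],
    alert_threshold=25,
)
```

Here `alert` holds a message because the two values differ by 30, which is above the threshold of 25.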
Processor with split read
An apparatus includes a processor and split-read control circuitry (SRCC). The processor is to issue a set of one or more split-read requests for loading one or more data values to one or more respective local registers of the processor. The SRCC is to receive the set of one or more split-read requests, to read the one or more data values on behalf of the processor, and to write the data values into the one or more respective local registers. The processor and the SRCC are to coordinate a status of the split-read requests via a split-read-status indication.
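The request and status handshake in this abstract can be modeled in a few lines of Python. The sketch below is a toy simulation, not the patented circuitry: the class `SplitReadController`, its method names, the set used as the split-read-status indication, and the one-outstanding-request-per-register restriction are all assumptions made for illustration.

```python
class SplitReadController:
    """Toy model of split-read control circuitry (SRCC)."""

    def __init__(self, memory, registers):
        self.memory = memory        # address -> value
        self.registers = registers  # the processor's local registers
        self.queue = []             # outstanding (address, register) requests
        self.pending = set()        # status indication: registers awaiting data

    def request(self, address, register):
        # Processor side: issue a split-read request and continue without waiting.
        if register in self.pending:
            # Assumption: at most one outstanding request per register.
            raise ValueError("register already has an outstanding split read")
        self.queue.append((address, register))
        self.pending.add(register)

    def step(self):
        # SRCC side: complete the oldest outstanding read, if any.
        if self.queue:
            address, register = self.queue.pop(0)
            self.registers[register] = self.memory[address]
            self.pending.discard(register)

    def ready(self, register):
        # Processor side: check the status before using the register.
        return register not in self.pending


memory = {0x10: 7, 0x14: 9}
registers = [0, 0, 0, 0]
srcc = SplitReadController(memory, registers)
srcc.request(0x10, 1)
srcc.request(0x14, 2)
ready_before = srcc.ready(1)  # no read has completed yet
srcc.step()
srcc.step()
```

After both steps, registers 1 and 2 hold the values read from addresses 0x10 and 0x14.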
GRAPHICS PROCESSING UNIT PROCESSING AND CACHING IMPROVEMENTS
Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a state of multiple intellectual property (IP) cores that have access to a common cache via a central fabric is observed. Responsive to the observed state being indicative of performance of a standalone workload by a first IP core of the multiple IP cores, the common cache is treated as a local cache of the first IP core by powering off the central fabric and causing the first IP core to access the common cache via a low power access path between the first IP core and the common cache that is outside of the central fabric.
SYSTEMS AND METHODS FOR LEADER NODE ELECTION IN CLUSTER SERVER CONFIGURATIONS
Disclosed are systems and methods for leader node election, comprising a cluster system including a plurality of nodes, a node registry, wherein nodes are configured to transmit registration requests to the node registry, receive node data in response, and to determine a leader node based on the earliest registered node, and wherein the leader node is configured to dynamically allocate data slots between the plurality of nodes, and each of the nodes is configured to store data associated with allocated data slots in an in-memory least recently used component and data associated with all of the data slots in a persistent storage component.
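The election rule in this abstract (the earliest registered node becomes leader, and the leader allocates data slots) can be sketched in Python as follows. This is a minimal single-process sketch under stated assumptions: the `NodeRegistry` class, the use of a sequence counter in place of timestamps, and the round-robin slot policy in `allocate_slots` are all hypothetical and are not taken from the disclosure.

```python
import itertools


class NodeRegistry:
    """Hypothetical in-process registry that records registration order."""

    def __init__(self):
        self._counter = itertools.count()
        self._registered = {}  # node_id -> registration sequence number

    def register(self, node_id):
        # Re-registering keeps the node's original sequence number.
        if node_id not in self._registered:
            self._registered[node_id] = next(self._counter)
        # Respond with a copy of the current node data.
        return dict(self._registered)


def elect_leader(node_data):
    """Pick the node with the smallest (earliest) registration sequence number."""
    return min(node_data, key=node_data.get)


def allocate_slots(node_ids, num_slots):
    """Assign slot numbers to nodes round-robin (one possible policy)."""
    nodes = sorted(node_ids)
    allocation = {node: [] for node in nodes}
    for slot in range(num_slots):
        allocation[nodes[slot % len(nodes)]].append(slot)
    return allocation


registry = NodeRegistry()
for name in ["node-b", "node-a", "node-c"]:
    node_data = registry.register(name)

leader = elect_leader(node_data)      # "node-b" registered first
slots = allocate_slots(node_data, 6)  # work the leader would perform
```

Because every node derives the leader from the same registry data, all nodes reach the same answer without exchanging votes.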
Algorithmic TCAM with compressed key encoding
A ternary content addressable memory is provided comprising: a memory device that includes a plurality of memory address locations; hash logic operative to determine a hash value, based upon a ternary key, wherein the determined hash value corresponds to a memory address location of the memory device; an encoder operable to convert the ternary key to a binary bit representation; wherein converting includes determining binary mapping bits based upon number and positions of ternary non-X (don't care) value bits of the ternary key; wherein converting further includes determining a different binary data bit to correspond to each different ternary non-X value bit of the ternary key; and memory controller logic to cause the memory device to store the binary bit representation at the memory address location that corresponds to the determined hash value.
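One simple way to picture the encoding in this abstract is to split a ternary key into mapping bits that mark the non-X positions and data bits that carry the values at those positions. The Python sketch below does this with a plain position bitmask, which is an assumption: the claimed mapping bits may use a more compact scheme. The function names, the `zlib.crc32` hash, and the 16-entry `memory` list are likewise hypothetical choices for illustration.

```python
import zlib


def encode_ternary_key(key):
    """Encode a ternary key over {'0', '1', 'X'} as (mapping_bits, data_bits)."""
    if set(key) - set("01X"):
        raise ValueError("key must contain only '0', '1', or 'X'")
    # Mapping bits: '1' marks a non-X position, '0' marks a don't-care.
    mapping_bits = "".join("0" if c == "X" else "1" for c in key)
    # Data bits: one bit per non-X position, in order.
    data_bits = "".join(c for c in key if c != "X")
    return mapping_bits, data_bits


def decode_ternary_key(mapping_bits, data_bits):
    """Rebuild the ternary key from its mapping and data bits."""
    data = iter(data_bits)
    return "".join(next(data) if m == "1" else "X" for m in mapping_bits)


def store(memory, key):
    """Store the encoded key at the address selected by hashing the ternary key."""
    address = zlib.crc32(key.encode("ascii")) % len(memory)
    memory[address] = encode_ternary_key(key)
    return address


memory = [None] * 16  # hypothetical memory device with 16 address locations
address = store(memory, "1X0X")
encoded = memory[address]
```

For the key `1X0X`, the mapping bits are `1010` and the data bits are `10`, so only two data bits are stored for a four-position key.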
STASH BALANCING IN MODEL PARALLELISM
Embodiments of the present disclosure include systems and methods for training neural networks. In one embodiment, data for an artificial intelligence model is processed in a first plurality of stages and in a second plurality of stages. The first and second pluralities of stages form a pipeline. One or more of the first plurality of stages uses at least one memory associated with a corresponding one or more of the second plurality of stages to balance memory across the pipeline.
Enhanced storage device memory architecture for machine learning
Embodiments of storage device architecture for processing data using machine learning are disclosed. In some embodiments, the storage device includes a separate I/O core and a neural network core. The storage device can create a copy of data streams in which the data is stored, and the neural network core can process the copy of the data streams in a neural network while the I/O core can perform read or write functions on the data streams.