Patent classifications
G06F12/0875
System and method for optimizing DRAM bus switching using LLC
The present disclosure relates to a system and method for optimizing switching of a DRAM bus using LLC. An embodiment of the disclosure includes sending a first type request from a first type queue to the second memory via the memory bus if a direction setting of the memory bus is in a first direction corresponding to the first type request, decrementing a current direction credit count by a first type transaction decrement value, if the decremented current direction credit count is greater than zero, sending another first type request to the second memory via the memory bus and decrementing the current direction credit count again by the first type transaction decrement value, and if the decremented current direction credit count is zero, switching the direction setting of the memory bus to a second direction and resetting the current direction credit count to a second type initial value.
System and method for optimizing DRAM bus switching using LLC
The present disclosure relates to a system and method for optimizing switching of a DRAM bus using LLC. An embodiment of the disclosure includes sending a first type request from a first type queue to the second memory via the memory bus if a direction setting of the memory bus is in a first direction corresponding to the first type request, decrementing a current direction credit count by a first type transaction decrement value, if the decremented current direction credit count is greater than zero, sending another first type request to the second memory via the memory bus and decrementing the current direction credit count again by the first type transaction decrement value, and if the decremented current direction credit count is zero, switching the direction setting of the memory bus to a second direction and resetting the current direction credit count to a second type initial value.
Method and apparatus for adapting feature data in a convolutional neural network
A method and an apparatus for adapting feature data in a convolutional neural network. The method includes selecting a plurality of consecutive layers; determining an expected number of subdata blocks and a layout position, width and height of each subdata block in an output feature data of a last layer; determining, for each current layer, a layout position, width, and height of each subdata block of an input feature data for the current layer according to the layout position, width, and height of each subdata block of the output feature data for the current layer; determining an actual position of each subdata block of the input feature data for a first layer in the input feature data for the first layer; and obtaining the expected number of subdata blocks of the input feature data for the first layer according to the actual position, width and height of each subdata block of the input feature data for the first layer.
Method and apparatus for adapting feature data in a convolutional neural network
A method and an apparatus for adapting feature data in a convolutional neural network. The method includes selecting a plurality of consecutive layers; determining an expected number of subdata blocks and a layout position, width and height of each subdata block in an output feature data of a last layer; determining, for each current layer, a layout position, width, and height of each subdata block of an input feature data for the current layer according to the layout position, width, and height of each subdata block of the output feature data for the current layer; determining an actual position of each subdata block of the input feature data for a first layer in the input feature data for the first layer; and obtaining the expected number of subdata blocks of the input feature data for the first layer according to the actual position, width and height of each subdata block of the input feature data for the first layer.
Streaming engine with early and late address and loop count registers to track architectural state
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. The streaming engine stores an early address of next to be fetched data elements and a late address of a data element in the stream head register for each of the nested loops. The streaming engine stores an early loop counts of next to be fetched data elements and a late loop counts of a data element in the stream head register for each of the nested loops.
Streaming engine with early and late address and loop count registers to track architectural state
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. The streaming engine stores an early address of next to be fetched data elements and a late address of a data element in the stream head register for each of the nested loops. The streaming engine stores an early loop counts of next to be fetched data elements and a late loop counts of a data element in the stream head register for each of the nested loops.
Feature map and weight selection method and accelerating device
The present disclosure provides a processing device including: a coarse-grained pruning unit configured to perform coarse-grained pruning on a weight of a neural network to obtain a pruned weight, an operation unit configured to train the neural network according to the pruned weight. The coarse-grained pruning unit is specifically configured to select M weights from the weights of the neural network through a sliding window, and when the M weights meet a preset condition, all or part of the M weights may be set to 0. The processing device can reduce the memory access while reducing the amount of computation, thereby obtaining an acceleration ratio and reducing energy consumption.
Feature map and weight selection method and accelerating device
The present disclosure provides a processing device including: a coarse-grained pruning unit configured to perform coarse-grained pruning on a weight of a neural network to obtain a pruned weight, an operation unit configured to train the neural network according to the pruned weight. The coarse-grained pruning unit is specifically configured to select M weights from the weights of the neural network through a sliding window, and when the M weights meet a preset condition, all or part of the M weights may be set to 0. The processing device can reduce the memory access while reducing the amount of computation, thereby obtaining an acceleration ratio and reducing energy consumption.
DETECTING INFINITE LOOPS IN A PROGRAMMABLE ATOMIC TRANSACTION
Disclosed in some examples are systems, methods, devices, and machine-readable mediums to detect and terminate programmable atomic transactions that are stuck in an infinite loop. In order to detect and terminate these transactions, the programmable atomic unit may use an instruction counter that increments each time an instruction is executed during execution of a programmable atomic transaction. If the instruction counter meets or exceeds a threshold instruction execution limit without reaching the termination instruction, the programmable atomic transaction may be terminated, all resources used (e.g., memory locks) may be freed, and a response may be sent to a calling processor.
Memory system
A memory system includes: a first memory module including first volatile memories; a second memory module including second volatile memories, non-volatile memories and a module controller; a memory controller controlling the first and second memory modules through second and third control buses, respectively; and a switch array electrically coupling the second and third control buses, wherein the module controller controls the switch array to electrically couple the second and third control buses in a backup operation for backing up data of the first volatile memories to the non-volatile memories, wherein the first and second memory modules include one or more first memory stacks and one or more second memory stacks, respectively, wherein the first volatile memories are stacked in the first memory stacks, and wherein the second volatile memories, the non-volatile memories and the module controller are stacked in the second memory stacks.