Patent classifications
G06F13/28
DUAL-PORT NON-VOLATILE DUAL IN-LINE MEMORY MODULES
According to an example, a dual-port non-volatile dual in-line memory module (NVDIMM) includes a first port to provide a central processing unit (CPU) with access to universal memory of the dual-port NVDIMM and a second port to provide an external NVDIMM manager circuit with access to the universal memory of the dual-port NVDIMM. Accordingly, a media controller of the dual-port NVDIMM may store data received from the CPU through the first port in the universal memory, control dual-port settings received from the CPU, and transmit the stored data to the NVDIMM manager circuit through the second port of the dual-port NVDIMM.
Tensor data distribution using grid direct-memory access (DMA) controller
In one embodiment, a method for tensor data distribution using a direct-memory access agent includes generating, by a first controller, source addresses indicating locations in a source memory where portions of a source tensor are stored. A second controller may generate destination addresses indicating locations in a destination memory where portions of a destination tensor are to be stored. The direct-memory access agent receives a source address generated by the first controller and a destination address generated by the second controller and determines a burst size. The direct-memory access agent may issue a read request comprising the source address and the burst size to read tensor data from the source memory and may store the tensor data into an alignment buffer. The direct-memory access agent then issues a write request comprising the destination address and the burst size to write data from the alignment buffer into the destination memory.
Tensor data distribution using grid direct-memory access (DMA) controller
In one embodiment, a method for tensor data distribution using a direct-memory access agent includes generating, by a first controller, source addresses indicating locations in a source memory where portions of a source tensor are stored. A second controller may generate destination addresses indicating locations in a destination memory where portions of a destination tensor are to be stored. The direct-memory access agent receives a source address generated by the first controller and a destination address generated by the second controller and determines a burst size. The direct-memory access agent may issue a read request comprising the source address and the burst size to read tensor data from the source memory and may store the tensor data into an alignment buffer. The direct-memory access agent then issues a write request comprising the destination address and the burst size to write data from the alignment buffer into the destination memory.
Computing device and method
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
Computing device and method
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
NEURAL NETWORK COMPUTE TILE
A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
NEURAL NETWORK COMPUTE TILE
A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
SYNCHRONIZED DATA CHAINING USING ON-CHIP CACHE
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating, by an image sensor of a computing device, frame data comprising sub-frames of image pixel data. A first resource of the system-on-chip provides the frame data to a second resource of the system-on-chip. The frame data is provided to the second resource using a first data path included in the system-on-chip. The first resource provides a token to the second resource using a second data path included in the system-on-chip. A processor of the system-on-chip, uses the token to synchronize production of sub-frames of image pixel data provided by the first resource to the second resource and to synchronize consumption of the sub-frames of image pixel data received by the second resource from the elastic memory buffer.
SYNCHRONIZED DATA CHAINING USING ON-CHIP CACHE
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating, by an image sensor of a computing device, frame data comprising sub-frames of image pixel data. A first resource of the system-on-chip provides the frame data to a second resource of the system-on-chip. The frame data is provided to the second resource using a first data path included in the system-on-chip. The first resource provides a token to the second resource using a second data path included in the system-on-chip. A processor of the system-on-chip, uses the token to synchronize production of sub-frames of image pixel data provided by the first resource to the second resource and to synchronize consumption of the sub-frames of image pixel data received by the second resource from the elastic memory buffer.
AUTOMATIC READ CONTROL SYSTEM BASED ON A HARDWARE ACCELERATED SPI AND AUTOMATIC READ CONTROL METHOD
Disclosed is a hardware acceleration based automatic read control system and method for a serial peripheral interface (SPI). The automatic read control system includes an SPI module, an advanced peripheral bus (APB) module, an interrupt generation module, a direct memory access (DMA) controller, a state schedule control module, a register group module, a count signal generation module, a transmitted data buffer and a received data buffer; the state schedule control module, the register group module and the count signal generation module form a state machine system; and the state schedule control module controls automatic timed batch read of sensor data of the SPI according to configuration information of the register group module and counting and timing information of the count signal generation module.