Patent classifications
G06F15/786
SCALABLE ACCELERATION OF REENTRANT COMPUTE OPERATIONS
Examples herein describe techniques for performing parallel processing using a plurality of processing elements (PEs) and a controller for data that has data dependencies. For example, a calculation may require an entire row or column to be summed, or to determine its mean. The PEs can be assigned different chunks of a data set (e.g., a tensor set, a column, or a row) for processing. The PEs can use one or more tokens to inform the controller when they are done with partial processing of their data chunks. The controller can then gather the partial results and determine an intermediate value for the data set. The controller can then distribute this intermediate value to the PEs which then re-process their respective data chunks using the intermediate value to generate final results.
METHOD AND SYSTEM FOR PROCESSING NEURAL NETWORK
The present disclosure provides a neural network processing system that comprises a multi-core processing module composed of a plurality of core processing modules and for executing vector multiplication and addition operations in a neural network operation, an on-chip storage medium, an on-chip address index module, and an ALU module for executing a non-linear operation not completable by the multi-core processing module according to input data acquired from the multi-core processing module or the on-chip storage medium, wherein the plurality of core processing modules share an on-chip storage medium and an ALU module, or the plurality of core processing modules have an independent on-chip storage medium and an ALU module. The present disclosure improves an operating speed of the neural network processing system, such that performance of the neural network processing system is higher and more efficient.
ELECTRONIC CIRCUIT
Provided is an electronic circuit that makes a changed FPGA usable without changing firmware. The electronic circuit includes: a processor; a first field-programmable gate array (FPGA); a first generation circuit; and a second generation circuit. The first FPGA executes configuration according to first bitstream data before the processor is activated. The first generation circuit outputs a first reset signal that resets the processor to the processor based on a first completion signal indicating that the configuration is completed. The second generation circuit outputs a second completion signal indicating that the configuration is completed to the processor based on a control signal output from the processor.
Distributed processor memory chip with multi-port processor subunits
In some embodiments, an integrated circuit may include a substrate and a memory array disposed on the substrate, where the memory array includes a plurality of discrete memory banks. The integrated circuit may also include a processing array disposed on the substrate, where the processing array includes a plurality of processor subunits, each one of the plurality of processor subunits being associated with one or more discrete memory banks among the plurality of discrete memory banks. The integrated circuit may also include a controller configured to implement at least one security measure with respect to an operation of the integrated circuit and take one or more remedial actions if the at least one security measure is triggered.
Apparatuses, methods, and systems for package on package memory refresh and self-refresh rate management
Methods, systems, and apparatuses relating to package on package memory refresh and self-refresh rate management are described. In one embodiment, an apparatus includes a processor die, a dynamic memory die mounted to and overlapping the processor die, a first thermal sensor of the processor die disposed adjacent to a first hot spot from a first type of workload and a second thermal sensor of the processor die disposed adjacent to a second hot spot from a second type of workload, and a hardware control circuit of the processor die to cause a refresh of a capacitor of the dynamic memory die when either of an output of the first thermal sensor exceeds a first threshold value and an output of the second thermal sensor exceeds a second threshold value.
Scalable acceleration of reentrant compute operations
Examples herein describe techniques for performing parallel processing using a plurality of processing elements (PEs) and a controller for data that has data dependencies. For example, a calculation may require an entire row or column to be summed, or to determine its mean. The PEs can be assigned different chunks of a data set (e.g., a tensor set, a column, or a row) for processing. The PEs can use one or more tokens to inform the controller when they are done with partial processing of their data chunks. The controller can then gather the partial results and determine an intermediate value for the data set. The controller can then distribute this intermediate value to the PEs which then re-process their respective data chunks using the intermediate value to generate final results.
APPARATUSES, METHODS, AND SYSTEMS FOR PACKAGE ON PACKAGE MEMORY REFRESH AND SELF-REFRESH RATE MANAGEMENT
Methods, systems, and apparatuses relating to package on package memory refresh and self-refresh rate management are described. In one embodiment, an apparatus includes a processor die, a dynamic memory die mounted to and overlapping the processor die, a first thermal sensor of the processor die disposed adjacent to a first hot spot from a first type of workload and a second thermal sensor of the processor die disposed adjacent to a second hot spot from a second type of workload, and a hardware control circuit of the processor die to cause a refresh of a capacitor of the dynamic memory die when either of an output of the first thermal sensor exceeds a first threshold value and an output of the second thermal sensor exceeds a second threshold value.
Apparatuses, methods, and systems for package on package memory refresh and self-refresh rate management
Methods, systems, and apparatuses relating to package on package memory refresh and self-refresh rate management are described. In one embodiment, an apparatus includes a processor die, a dynamic memory die mounted to and overlapping the processor die, a first thermal sensor of the processor die disposed adjacent to a first hot spot from a first type of workload and a second thermal sensor of the processor die disposed adjacent to a second hot spot from a second type of workload, and a hardware control circuit of the processor die to cause a refresh of a capacitor of the dynamic memory die when either of an output of the first thermal sensor exceeds a first threshold value and an output of the second thermal sensor exceeds a second threshold value.
In-memory zero value detection
In some embodiments, an integrated circuit may include a substrate and a memory array disposed on the substrate, where the memory array includes a plurality of discrete memory banks. The integrated circuit may also include a processing array disposed on the substrate, where the processing array includes a plurality of processor subunits, each one of the plurality of processor subunits being associated with one or more discrete memory banks among the plurality of discrete memory banks. The integrated circuit may also include a controller configured to implement at least one security measure with respect to an operation of the integrated circuit and take one or more remedial actions if the at least one security measure is triggered.
Electronic circuit that renders a changed FPGA usable without changing firmware
Provided is an electronic circuit that makes a changed FPGA usable without changing firmware. The electronic circuit includes: a processor; a first field-programmable gate array (FPGA); a first generation circuit; and a second generation circuit. The first FPGA executes configuration according to first bitstream data before the processor is activated. The first generation circuit outputs a first reset signal that resets the processor to the processor based on a first completion signal indicating that the configuration is completed. The second generation circuit outputs a second completion signal indicating that the configuration is completed to the processor based on a control signal output from the processor.