Patent classifications
G06F9/30083
Cache flush method and apparatus
Various embodiments of a method and apparatus for flushing a cache are disclosed. In a system, a cache memory is accessible by an execution circuit. The execution circuit executes instructions and may utilize data and/or instructions stored in the cache. A flush circuit is also coupled to the cache. Responsive to execution of a power down instruction by the execution circuit, the flush circuit performs a cache flush. If a control state is asserted in a control register, the flush circuit generates a dummy event upon completing the cache flush. Responsive to generating the dummy event, a processor core that includes the execution circuit is inhibited from being powered down.
Distributing power shared between an accelerated processing unit and a discrete graphics processing unit
An integrated coprocessor such as an accelerated processing unit (APU) generates commands for execution on a discrete coprocessor such as a discrete graphics processing unit (dGPU). Power distribution circuitry selectively provides power to the APU and the dGPU based on characteristics of workloads executing on the APU and the dGPU and based on a platform power limit that is shared by the APU and the dGPU. In some cases, the power distribution circuitry determines a first power provided to the APU and a second power provided to the dGPU. The power distribution circuitry increases the second power provided to the dGPU in response to a sum of the first and second powers being less than the platform power limit. In some cases, the power distribution circuitry modifies the power provided to the APU, the dGPU, or both in response to changes in temperatures measured by a set of sensors.
PROACTIVE DI/DT VOLTAGE DROOP MITIGATION
Embodiments include a method comprising identifying, by an instruction scheduler of a processor core, a first high power instruction in an instruction stream to be executed by an execution unit of the processor core. A pre-charge signal is asserted indicating that the first high power instruction is scheduled for execution. Subsequent to the pre-charge signal being asserted, a voltage boost signal is asserted to cause a supply voltage for the execution unit to be increased. A busy signal indicating that the first high power instruction is executing is received from the execution unit. Based at least in part on the busy signal being asserted, de-asserting the voltage boost signal. More specific embodiments include decreasing the supply voltage for the execution unit subsequent to the de-asserting the voltage boost signal. More Further embodiments include delaying asserting the voltage boost signal based on a start delay time.
Systems and methods to skip inconsequential matrix operations
Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results; scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instructions as per the opcode.
Memory-network processor with programmable optimizations
Various embodiments are disclosed of a multiprocessor system with processing elements optimized for high performance and low power dissipation and an associated method of programming the processing elements. Each processing element may comprise a fetch unit and a plurality of address generator units and a plurality of pipelined datapaths. The fetch unit may be configured to receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields. First and second address generator units may generate, based on different fields of the multi-part instruction, addresses from which to retrieve first and second data for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction. The execution units may perform operations using a single pipeline or multiple pipelines based on third and fourth fields of the multi-part instruction.
Memory-Network Processor with Programmable Optimizations
Various embodiments are disclosed of a multiprocessor system with processing elements optimized for high performance and low power dissipation and an associated method of programming the processing elements. Each processing element may comprise a fetch unit and a plurality of address generator units and a plurality of pipelined datapaths. The fetch unit may be configured to receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields. A first address generator unit may be configured to perform an arithmetic operation dependent upon a first field of the plurality of fields. A second address generator unit may be configured to generate at least one address of a plurality of addresses, wherein each address is dependent upon a respective field of the plurality of fields. A parallel assembly language may be used to control the plurality of address generator units and the plurality of pipelined datapaths.
METHOD AND APPARATUS FOR IMPLEMENTING POWER MODES IN MICROCONTROLLERS USING POWER PROFILES
A method and apparatus for implementing power modes in microcontrollers (MCUs) using power profiles. In one embodiment of the method, a central processing unit (CPU) of the MCU executes a first instruction for calling a subroutine stored in a memory of the MCU, wherein the first instruction comprises a first parameter to be passed to the subroutine. Thereafter the CPU writes a first value to a first special function register (SFR) of the MCU in response to executing the first instruction, wherein the first value is related to the first parameter. The MCU operates in a first power mode in response to the CPU writing the first value to the first SFR. The CPU also executes a second instruction for calling the subroutine, wherein the second instruction comprises a second parameter to be passed to the subroutine. In response the CPU writes a second value to a second SFR of the MCU in response to executing the second instruction, wherein the second value is related to the second parameter. The MCU operates in a second power mode in response to the CPU writing the second value to the second SFR. The MCU consumes more power operating in the first power mode than it does when operating in the second power mode.
Adaptive metadata refreshing
Techniques are described for managing the optimized refreshing of metadata associated with online and live systems. In some implementations, a set of metadata modules associated with one or more entities are identified, the metadata modules defining metadata associated with a particular data model for the associated entities. A request to initiate a refreshing of the metadata for a subset of the set of metadata modules is identified. Each metadata module from the subset of the set of metadata modules is prioritized into a prioritization order. A determination is made as to whether two or more idle database connections are available. In response to determining that two or more idle database connections are available, a concurrent refresh of the subset of the set of metadata modules is initialized in the prioritization order.
NEURAL NETWORK PROCESSOR AND NEURAL NETWORK COMPUTATION METHOD
The present disclosure provides a neural network processor and neural network computation method that deploy a memory and a cache to perform a neural network computation, where the memory may be configured to store data and instructions of the neural network computation, the cache may be connected to the memory via a memory bus, thereby, the actual compute ability of hardware may be fully utilized, the cost and power consumption overhead may be reduced, parallelism of the network may be fully utilized, and the efficiency of the neural network computation may be improved.
DATA SCREENING DEVICE AND METHOD
The present disclosure provides a data screening device and method, which employ a storage unit and a register unit, and are capable of performing operations on data of different storage structures and different sizes efficiently.