Patent classifications
G06F9/268
Scheduler for AMP architecture using a closed loop performance and thermal controller
Systems and methods are disclosed for scheduling threads on an asymmetric multiprocessing system having multiple core types. Each core type can run at a plurality of selectable dynamic voltage and frequency scaling (DVFS) states. Threads from a plurality of processes can be grouped into thread groups. Execution metrics are accumulated for threads of a thread group and fed into a plurality of tunable controllers. A closed loop performance control (CLPC) system determines a control effort for the thread group and maps the control effort to a recommended core type and DVFS state. A closed loop thermal and power management system can limit the control effort determined by the CLPC for a thread group, and limit the power, core type, and DVFS states for the system. Metrics for workloads offloaded to co-processors can be tracked and integrated into metrics for the offloading thread group.
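As a rough illustration of the control loop described in this abstract, the following Python sketch feeds a thread group's utilization into a toy proportional-integral controller, caps the resulting control effort with a thermal limit, and maps it to a core type and DVFS state. The gains, the 0.5 threshold between core types, the frequency tables, and all names are assumptions made for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass

# Hypothetical DVFS tables in MHz; not taken from the abstract.
E_CORE_STATES = [600, 1000, 1400]
P_CORE_STATES = [1200, 1800, 2400, 3000]


def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    return max(low, min(high, value))


@dataclass
class ThreadGroupController:
    """Toy PI controller that turns utilization into a control effort in [0, 1]."""
    kp: float = 0.5                  # assumed proportional gain
    ki: float = 0.1                  # assumed integral gain
    target_utilization: float = 0.8  # assumed set point
    integral: float = 0.0

    def update(self, utilization: float) -> float:
        error = utilization - self.target_utilization
        # Accumulate the integral term, clamped to avoid windup.
        self.integral = clamp(self.integral + self.ki * error)
        return clamp(self.kp * error + self.integral)


def recommend(effort: float, thermal_cap: float = 1.0):
    """Map a thermally capped control effort to (core type, frequency in MHz)."""
    effort = min(effort, thermal_cap)
    if effort < 0.5:
        core, states, fraction = "E", E_CORE_STATES, effort / 0.5
    else:
        core, states, fraction = "P", P_CORE_STATES, (effort - 0.5) / 0.5
    index = min(len(states) - 1, int(fraction * len(states)))
    return core, states[index]
```

In this sketch the thermal controller acts only by lowering `thermal_cap`, so a group that would otherwise be steered to a performance core can be held on an efficiency core while the system is hot.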
Apparatus and method for secure, efficient microcode patching
An apparatus and method for efficient microcode patching. For example, one embodiment of an apparatus comprises: a package comprising one or more integrated circuit dies, the one or more integrated circuit dies comprising: a plurality of cores; and a security controller coupled to the plurality of cores, a first core of the plurality of cores comprising: a decoder to decode a microcode patching instruction, the microcode patching instruction comprising an operand to be used to identify an address; and execution circuitry to execute the microcode patching instruction, wherein responsive to the microcode patching instruction, the execution circuitry and/or security controller are to: retrieve a microcode patch from a location in memory based on the address, validate the microcode patch, apply the microcode patch to update or replace microcode associated with the one or more integrated circuit dies, and transmit the microcode patch to a persistent storage device; wherein the microcode patch is to be subsequently retrieved from the persistent storage device by one or more external security controllers of one or more external integrated circuit dies, the one or more external security controllers to cause the microcode patch to be applied to update or replace microcode associated with the one or more external integrated circuit dies.
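The retrieve, validate, apply, and persist sequence in this abstract can be sketched in Python as follows. This is a simplified model under stated assumptions: a SHA-256 digest comparison stands in for whatever validation the hardware actually performs, memory and persistent storage are plain dictionaries, and the names `Die`, `patch_instruction`, and `external_controller_sync` are hypothetical. Revalidation by the external controller is also an assumption, not something the abstract states.

```python
import hashlib


def validate(patch: bytes, expected_digest: str) -> bool:
    # Assumption: a digest check stands in for the real validation step.
    return hashlib.sha256(patch).hexdigest() == expected_digest


class Die:
    """Hypothetical integrated circuit die holding one microcode image."""

    def __init__(self, name: str):
        self.name = name
        self.microcode = b""

    def apply(self, patch: bytes) -> None:
        self.microcode = patch


def patch_instruction(memory, address, expected_digest, local_die, persistent_store):
    """Model of the patching instruction: retrieve, validate, apply, persist."""
    patch = memory[address]                  # retrieve from the operand's address
    if not validate(patch, expected_digest):
        return False                         # reject an invalid patch
    local_die.apply(patch)                   # update local microcode
    persistent_store["patch"] = (patch, expected_digest)  # persist for other dies
    return True


def external_controller_sync(die, persistent_store):
    """Model of an external security controller picking up the stored patch."""
    entry = persistent_store.get("patch")
    if entry is None:
        return False
    patch, digest = entry
    if not validate(patch, digest):          # assumed revalidation
        return False
    die.apply(patch)
    return True
```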
Closed loop performance controller work interval instance propagation
Systems and methods are disclosed for scheduling threads on an asymmetric multiprocessing system having multiple core types. Each core type can run at a plurality of selectable dynamic voltage and frequency scaling (DVFS) states. Threads from a plurality of processes can be grouped into thread groups. Execution metrics are accumulated for threads of a thread group and fed into a plurality of tunable controllers. A closed loop performance control (CLPC) system determines a control effort for the thread group and maps the control effort to a recommended core type and DVFS state. A closed loop thermal and power management system can limit the control effort determined by the CLPC for a thread group, and limit the power, core type, and DVFS states for the system. Metrics for workloads offloaded to co-processors can be tracked and integrated into metrics for the offloading thread group.
LIVE FIRMWARE UPDATE SWITCHOVER
A method includes receiving, by a microcontroller, a live firmware update (LFU) command from an external host; and downloading, by the microcontroller, an image of a new version of firmware responsive to the LFU command. During a first time period, the method includes initializing only variables contained in the new version that are not contained in an old version of firmware. During a second time period, the method includes updating one or more of an interrupt vector table, a function pointer, and/or a stack pointer responsive to the new version. The second time period begins responsive to completing initialization of the variables.
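The two time periods described in this abstract can be modeled with a short Python sketch. It assumes firmware images are dictionaries with `variables`, `vector_table`, and `version` keys, and that the running device state holds a `ram` dictionary; these structures and the function name are hypothetical and only illustrate the ordering of the steps.

```python
def live_firmware_update(state: dict, old_image: dict, new_image: dict) -> dict:
    """Switch a running device from old_image to new_image without a reset."""
    # First time period: initialize only the variables that the new version
    # introduces. Variables shared with the old version keep their live values.
    for name, initial_value in new_image["variables"].items():
        if name not in old_image["variables"]:
            state["ram"][name] = initial_value

    # Second time period: starts only once initialization above has finished.
    # Redirect interrupt handling (and, in real firmware, function pointers
    # and the stack pointer) to the new version.
    state["vector_table"] = dict(new_image["vector_table"])
    state["version"] = new_image["version"]
    return state
```

Because existing variables are left untouched in the first period, a counter or filter state accumulated by the old firmware carries over to the new one.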
APPARATUS AND METHOD FOR SECURE, EFFICIENT MICROCODE PATCHING
An apparatus and method for efficient microcode patching. For example, one embodiment of an apparatus comprises: a package comprising one or more integrated circuit dies, the one or more integrated circuit dies comprising: a plurality of cores; and a security controller coupled to the plurality of cores, a first core of the plurality of cores comprising: a decoder to decode a microcode patching instruction, the microcode patching instruction comprising an operand to be used to identify an address; and execution circuitry to execute the microcode patching instruction, wherein responsive to the microcode patching instruction, the execution circuitry and/or security controller are to: retrieve a microcode patch from a location in memory based on the address, validate the microcode patch, apply the microcode patch to update or replace microcode associated with the one or more integrated circuit dies, and transmit the microcode patch to a persistent storage device; wherein the microcode patch is to be subsequently retrieved from the persistent storage device by one or more external security controllers of one or more external integrated circuit dies, the one or more external security controllers to cause the microcode patch to be applied to update or replace microcode associated with the one or more external integrated circuit dies.
Hardware processor and method for loading a microcode patch from cache into patch memory and reloading an overwritten micro-operation
Hardware processors and methods for extended microcode patching through on-die and off-die secure storage are described. In one embodiment, the additional storage resources used for storing micro-operations are section(s) of a cache that are unused at runtime and/or unused by a configuration of a processor. For example, the additional storage resources may be a section of a cache that is used to store context information from a core when the core is transitioned to a power state that shuts off voltage to the core. Non-limiting examples of such sections are one or more sections for storage of context information for a transition of a thread to idle or off, storage of context information for a transition of a core for a multiple core processor to idle or off, or storage of coherency information for a transition of a cache coherency circuit (e.g., cache box (CBo)) to idle or off.
Live firmware update switchover
A method includes receiving, by a microcontroller, a live firmware update (LFU) command from an external host; and downloading, by the microcontroller, an image of a new version of firmware responsive to the LFU command. During a first time period, the method includes initializing only variables contained in the new version that are not contained in an old version of firmware. During a second time period, the method includes updating one or more of an interrupt vector table, a function pointer, and/or a stack pointer responsive to the new version. The second time period begins responsive to completing initialization of the variables.
Systems and methods of parallel and distributed processing of datasets for model approximation
A system including: at least one processor; and at least one memory having stored thereon computer program code that, when executed by the at least one processor, controls the system to: receive a data model identification and a dataset; in response to determining that the data model does not contain a hierarchical structure, perform expectation propagation on the dataset to approximate the data model with a hierarchical structure; divide the dataset into a plurality of channels; for each of the plurality of channels: divide the data into a plurality of microbatches; process each microbatch of the plurality of microbatches through parallel iterators; and process the output of the parallel iterators through single-instruction multiple-data (SIMD) layers; and asynchronously merge results of the SIMD layers.
Controlling apparatus for industrial products
The controlling apparatus for an industrial product of this disclosure has a pair of microcomputers, each of which has a CPU and a memory and runs the same control program and the same diagnostic program sequence in parallel and simultaneously. After the CPU of one microcomputer writes the calculated result of the diagnostic program sequence to a predetermined area of the storage area for monitoring values, that CPU sends the same result to the other microcomputer (the receiving microcomputer). The CPU of the receiving microcomputer then diagnoses whether the received result is identical to its own calculated result.
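The cross-check between the two microcomputers can be illustrated with a small Python model. The diagnostic calculation (a 16-bit sum of squares), the `fault` flag used to simulate a defective unit, and the class and method names are assumptions for illustration only.

```python
class Microcomputer:
    """Hypothetical model of one of the two redundant microcomputers."""

    def __init__(self, name: str, fault: bool = False):
        self.name = name
        self.fault = fault         # simulate a hardware fault in this unit
        self.monitor_value = None  # stands in for the monitoring-value area

    def run_diagnostic(self, inputs):
        # Assumed diagnostic sequence: 16-bit sum of squares of the inputs.
        result = sum(x * x for x in inputs) & 0xFFFF
        if self.fault:
            result ^= 1            # a faulty unit computes a wrong value
        self.monitor_value = result
        return result              # this value is sent to the peer

    def check_peer(self, received: int) -> bool:
        # The receiving CPU compares the peer's result with its own.
        return received == self.monitor_value
```

A mismatch reported by `check_peer` indicates that one of the two units has diverged, although this simple comparison cannot tell which one.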
Accelerator controller for inserting template microcode instructions into a microcode buffer to accelerate matrix operations
A method for a controller to execute a program comprising a sequence of functions on an accelerator with a pipelined architecture comprising a microcode buffer. The method comprises executing a function of the program as a sequence of operations, wherein the sequence of operations is represented by a sequence of templates. For a template, the method determines whether the template is non-colliding with previously inserted templates in the microcode buffer, whether data in local memory will be referenced before all previously inserted templates have taken effect, and whether registers will be referenced before all previously inserted templates in the microcode buffer have taken effect. When it is determined that the template fits, that resources are available, that local data memory accesses will not collide, and that register accesses will not collide, the method creates a sequence of microcode instructions in the template and inserts the template into the microcode buffer.
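The checks that precede template insertion can be sketched in Python as below. The model is deliberately reduced: the buffer is a list of per-cycle resource sets, a template is a dictionary with `ops` (cycle offset and resource), `reads` (registers read), and `writes` (register and the offset at which the write lands), and local memory hazards are omitted. All of these structures and names are hypothetical.

```python
class MicrocodeBuffer:
    """Toy microcode buffer: one set of busy resources per pipeline cycle."""

    def __init__(self, depth: int):
        self.slots = [set() for _ in range(depth)]
        self.reg_ready = {}  # register -> cycle at which its pending write lands

    def fits(self, template: dict, start: int) -> bool:
        last = start + max(offset for offset, _ in template["ops"])
        return last < len(self.slots)

    def collides(self, template: dict, start: int) -> bool:
        # A collision occurs when a resource is already claimed in that cycle.
        return any(res in self.slots[start + off] for off, res in template["ops"])

    def register_hazard(self, template: dict, start: int) -> bool:
        # A hazard occurs when a register is read before an earlier write lands.
        return any(self.reg_ready.get(reg, -1) >= start for reg in template["reads"])

    def try_insert(self, template: dict, start: int) -> bool:
        if not self.fits(template, start):
            return False
        if self.collides(template, start) or self.register_hazard(template, start):
            return False
        for offset, resource in template["ops"]:
            self.slots[start + offset].add(resource)
        for reg, offset in template["writes"].items():
            self.reg_ready[reg] = start + offset
        return True
```

When `try_insert` returns `False`, a controller in this model would retry the template at a later start cycle.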