Patent classifications
G06F9/268
Register read/write ordering
Apparatus and methods are disclosed for controlling execution of register access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of register access instructions in an instruction block. In one example of the disclosed technology, a method of operating a processor includes selecting a register access instruction of a plurality of instructions to execute based at least in part on dependencies encoded within a previous block of instructions and on stored data indicating which of the register write instructions have executed for the previous block, and executing the selected instruction. In some examples, one or more of a write mask, a read mask, a register write vector register, or a counter are used to determine register read/write dependencies. Based on the encoded dependencies and the masked write vector, the next instruction block can issue when its register dependencies are available.
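The mask-based dependency check described in this abstract can be illustrated with plain integer bitmasks. This is a minimal sketch, not the patented hardware: the function names (`pending_writes`, `can_issue_read`, `block_ready`) and the convention that bit i stands for register i are assumptions made for illustration.

```python
def pending_writes(prev_write_mask: int, write_vector: int) -> int:
    """Registers the previous block is encoded to write but has not written yet.

    prev_write_mask: bit i set if the previous block writes register i (assumed encoding).
    write_vector:    bit i set once the write to register i has executed.
    """
    return prev_write_mask & ~write_vector


def can_issue_read(reg: int, prev_write_mask: int, write_vector: int) -> bool:
    """A read of `reg` may issue when no earlier write to it is outstanding."""
    return ((pending_writes(prev_write_mask, write_vector) >> reg) & 1) == 0


def block_ready(read_mask: int, prev_write_mask: int, write_vector: int) -> bool:
    """All registers named in read_mask are available to the next block."""
    return (read_mask & pending_writes(prev_write_mask, write_vector)) == 0
```

With a previous block that writes registers 1 and 2 and has so far written only register 1, a read of register 1 can issue while a read of register 2 must wait.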
HARDWARE PROCESSORS AND METHODS FOR EXTENDED MICROCODE PATCHING
Hardware processors and methods for extended microcode patching through on-die and off-die secure storage are described. In one embodiment, the additional storage resources used for storing micro-operations are section(s) of a cache that are unused at runtime and/or unused by a configuration of a processor. For example, the additional storage resources may be a section of a cache that is used to store context information from a core when the core is transitioned to a power state that shuts off voltage to the core. Non-limiting examples of such sections are one or more sections for storage of context information for a transition of a thread to idle or off, storage of context information for a transition of a core for a multiple core processor to idle or off, or storage of coherency information for a transition of a cache coherency circuit (e.g., cache box (CBo)) to idle or off.
SYSTEMS AND METHODS OF PARALLEL AND DISTRIBUTED PROCESSING
A system including: at least one processor; and at least one memory having stored thereon computer program code that, when executed by the at least one processor, controls the system to: receive a data model identification and a dataset; in response to determining that the data model does not contain a hierarchical structure, perform expectation propagation on the dataset to approximate the data model with a hierarchical structure; divide the dataset into a plurality of channels; for each of the plurality of channels: divide the data into a plurality of microbatches; process each microbatch of the plurality of microbatches through parallel iterators; and process the output of the parallel iterators through single-instruction multiple-data (SIMD) layers; and asynchronously merge results of the SIMD layers.
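The channel and microbatch structure in this abstract can be sketched as follows. This is an illustrative sketch only: the expectation propagation step is omitted, a sum of squares stands in for the parallel iterators and SIMD layers, and the names `split`, `process_microbatch`, and `run` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def split(items, parts):
    """Split a list into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_microbatch(batch):
    # Stand-in for the parallel iterators and SIMD layers (assumption).
    return sum(x * x for x in batch)


def run(dataset, n_channels=2, n_microbatches=3):
    """Divide into channels, then microbatches, and merge results as they finish."""
    total = 0
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(process_microbatch, microbatch)
            for channel in split(dataset, n_channels)
            for microbatch in split(channel, n_microbatches)
        ]
        # Merge in completion order rather than submission order.
        for future in as_completed(futures):
            total += future.result()
    return total
```

Because the stand-in merge is a commutative sum, the result does not depend on the order in which microbatches complete; a real merge step would need the same property to be combined asynchronously.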
INFORMATION PROCESSING SYSTEM AND RELAY DEVICE
An information processing system includes a first information processing device, a second information processing device, and a relay device connected to the first and second information processing devices over different buses. The first information processing device updates firmware of the power control microcomputer and transmits, to the power control microcomputer, a reactivation instruction signal after the firmware is updated. The power control microcomputer: executes reactivation of the power control microcomputer when the reactivation instruction signal is received from the first information processing device; determines whether or not the executed reactivation is reactivation that is executed immediately after the firmware update; and supplies the operation voltage to the first information processing device when the power control microcomputer determines that the executed reactivation is reactivation immediately after the firmware update.
Debug support for block-based processor
Systems and methods are disclosed for supporting debugging of programs in block-based processor architectures. In one example of the disclosed technology, a processor includes a block-based processor core for executing an instruction block comprising an instruction header and a plurality of instructions. The block-based processor core includes execution control logic and core state access logic. The execution control logic can be configured to schedule respective instructions of the plurality of instructions for execution in a dynamic order during a default execution mode and to schedule the respective instructions for execution in a static order during a debug mode. The core state access logic can be configured to read intermediate states of the block-based processor core and to provide the intermediate states outside of the block-based processor core during the debug mode.
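The difference between the two scheduling modes in this abstract can be shown with a toy model. This is a sketch under stated assumptions: each instruction carries a hypothetical `ready_cycle` (the cycle at which its operands arrive), the dynamic order is taken to be readiness order, and the static order is taken to be program order. The abstract itself does not define either order in these terms.

```python
def issue_order(instructions, debug_mode=False):
    """Return instruction names in the order they would be scheduled.

    instructions: list of (name, ready_cycle) pairs in program order.
    debug_mode:   True selects the static (program) order,
                  False selects the dynamic (readiness) order.
    """
    if debug_mode:
        return [name for name, _ in instructions]
    # sorted() is stable, so instructions that become ready in the same
    # cycle keep their program order.
    return [name for name, _ in sorted(instructions, key=lambda item: item[1])]
```

A fixed order in debug mode makes the intermediate core state reproducible from run to run, which is what allows it to be read out and compared.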
Block-based processor including topology and control registers to indicate resource sharing and size of logical processor
Systems, apparatuses, and methods related to a block-based processor core topology register are disclosed. In one example of the disclosed technology, a processor can include a plurality of block-based processor cores for executing a program including a plurality of instruction blocks. A respective block-based processor core can include a sharable resource and a programmable composition topology register. The programmable composition topology register can be used to assign a group of the physical processor cores that share the sharable resource.
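One way to picture a programmable composition topology register is as a bitmask over the physical cores. This is a minimal sketch assuming a one-bit-per-core encoding; the class name, method names, and encoding are hypothetical and are not taken from the patent.

```python
class CompositionTopologyRegister:
    """Bit i set means physical core i belongs to the group sharing the resource."""

    def __init__(self, num_cores: int):
        self.num_cores = num_cores
        self.value = 0

    def write(self, mask: int) -> None:
        """Program the register; reject bits beyond the last physical core."""
        if mask < 0 or mask >> self.num_cores:
            raise ValueError("mask names a core that does not exist")
        self.value = mask

    def sharing_cores(self) -> list:
        """Indices of the cores currently assigned to the sharing group."""
        return [i for i in range(self.num_cores) if (self.value >> i) & 1]
```

Writing `0b0101` to a four-core register, for example, assigns cores 0 and 2 to the group.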
Systems and methods of parallel and distributed processing of datasets for model approximation
A system including: at least one processor; and at least one memory having stored thereon computer program code that, when executed by the at least one processor, controls the system to: receive a data model identification and a dataset; in response to determining that the data model does not contain a hierarchical structure, perform expectation propagation on the dataset to approximate the data model with a hierarchical structure; divide the dataset into a plurality of channels; for each of the plurality of channels: divide the data into a plurality of microbatches; process each microbatch of the plurality of microbatches through parallel iterators; and process the output of the parallel iterators through single-instruction multiple-data (SIMD) layers; and asynchronously merge results of the SIMD layers.
Prefetching instruction blocks
Technology related to prefetching instruction blocks is disclosed. In one example of the disclosed technology, a processor comprises a block-based processor core for executing a program comprising a plurality of instruction blocks. The block-based processor core can include prefetch logic and a local buffer. The prefetch logic can be configured to receive a reference to a predicted instruction block and to determine a mapping of the predicted instruction block to one or more lines. The local buffer can be configured to selectively store portions of the predicted instruction block and to provide the stored portions of the predicted instruction block when control of the program passes along a predicted execution path to the predicted instruction block.
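The mapping from a predicted instruction block to lines can be sketched as simple address arithmetic. This sketch assumes byte addresses, a 64-byte line, and a block of at least one byte; `lines_for_block` and `LocalBuffer` are hypothetical names, and the buffer here stores every mapped line, whereas the abstract describes storing portions selectively.

```python
def lines_for_block(address: int, size: int, line_size: int = 64) -> list:
    """Indices of the lines that a block of `size` bytes at `address` touches."""
    first = address // line_size
    last = (address + size - 1) // line_size
    return list(range(first, last + 1))


class LocalBuffer:
    """Holds prefetched lines keyed by line index."""

    def __init__(self):
        self.lines = {}

    def prefetch(self, address, size, memory, line_size=64):
        """Copy every line of the predicted block from `memory` into the buffer."""
        for line in lines_for_block(address, size, line_size):
            start = line * line_size
            self.lines[line] = memory[start:start + line_size]

    def fetch(self, line):
        """Return the stored line, or None if it was never prefetched."""
        return self.lines.get(line)
```

A 100-byte block starting at address 100 spans bytes 100 through 199 and therefore maps to lines 1, 2, and 3.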
Initiating instruction block execution using a register access instruction
Apparatus and methods are disclosed for initiating instruction block execution using a register access instruction (e.g., a register Read instruction). In some examples of the disclosed technology, a block-based computing system can include a plurality of processor cores configured to execute at least one instruction block. The at least one instruction block encodes a data-flow instruction set architecture (ISA). The ISA includes a first plurality of instructions and a second plurality of instructions. One or more of the first plurality of instructions specify at least a first target instruction without specifying a data source operand. One or more of the second plurality of instructions specify at least a second target instruction and a data source operand that specifies a register.
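The two instruction kinds in this abstract can be modeled with a toy data-flow interpreter. This is an illustrative sketch, not the disclosed ISA: the `Instr` type, the two opcodes `"read"` and `"add"`, and the rule that an add fires once two operands have arrived are assumptions. A `"read"` names a register source and its targets; an `"add"` names only targets and receives its operands from other instructions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Instr:
    op: str                          # "read" or "add" (hypothetical opcodes)
    targets: list                    # indices of instructions that receive the result
    src_reg: Optional[int] = None    # only "read" specifies a register source
    operands: list = field(default_factory=list)


def run_block(block, regs):
    """Execute a tiny data-flow block; returns results keyed by instruction index."""
    results = {}
    # Register reads need no incoming operands, so they initiate execution.
    ready = [i for i, ins in enumerate(block) if ins.op == "read"]
    while ready:
        i = ready.pop(0)
        ins = block[i]
        value = regs[ins.src_reg] if ins.op == "read" else sum(ins.operands)
        results[i] = value
        for t in ins.targets:
            block[t].operands.append(value)
            if len(block[t].operands) == 2:  # an "add" fires with both operands
                ready.append(t)
    return results
```

In this model nothing in the block can execute until a read delivers a register value to its target, which mirrors the idea of initiating block execution with a register access instruction.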