Patent classifications
G06F9/268
Scheduler for AMP architecture using a closed loop performance controller and deferred inter-processor interrupts
Systems and methods are disclosed for scheduling threads on a processor that has at least two different core types, such as an asymmetric multiprocessing (AMP) system. Each core type can run at a plurality of selectable dynamic voltage and frequency scaling (DVFS) states. Threads from a plurality of processes can be grouped into thread groups. Execution metrics are accumulated for threads of a thread group and fed into a plurality of tunable controllers for the thread group. A closed loop performance control (CLPC) system determines a control effort for the thread group and maps the control effort to a recommended core type and DVFS state. A closed loop thermal and power management system can limit the control effort determined by the CLPC for a thread group, and limit the power, core type, and DVFS states for the system. Deferred interrupts can be used to increase performance.
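As a rough illustration of the control loop described in this abstract, the sketch below uses a single proportional-integral controller to turn a utilization error into a control effort, caps that effort with a limit standing in for thermal and power management, and maps the result to a core type and DVFS state. The gains, the frequency table, the 0.5 split between core types, and all names (`PIController`, `map_effort`, `DVFS_STATES`) are assumptions made for illustration; the abstract does not specify them, and it describes several tunable controllers rather than one.

```python
from dataclasses import dataclass

# Hypothetical DVFS tables (MHz) for two core types; values are illustrative only.
DVFS_STATES = {
    "efficiency": [600, 900, 1200],
    "performance": [1400, 1900, 2400],
}


@dataclass
class PIController:
    """Toy proportional-integral controller; gains are assumed, not from the source."""
    kp: float = 0.5
    ki: float = 0.1
    integral: float = 0.0

    def update(self, target: float, measured: float) -> float:
        error = target - measured
        self.integral += error
        effort = self.kp * error + self.ki * self.integral
        return min(1.0, max(0.0, effort))  # clamp control effort to [0, 1]


def map_effort(effort: float, effort_limit: float = 1.0):
    """Map a control effort to a (core type, frequency) recommendation.

    effort_limit stands in for the cap imposed by thermal/power management.
    """
    effort = min(effort, effort_limit)
    # Assumed policy: lower half -> efficiency cores, upper half -> performance cores.
    core = "efficiency" if effort < 0.5 else "performance"
    states = DVFS_STATES[core]
    # Position within the chosen half, rescaled to 0..1.
    local = effort * 2 if core == "efficiency" else (effort - 0.5) * 2
    index = min(len(states) - 1, int(local * len(states)))
    return core, states[index]
```

For example, `map_effort(1.0)` recommends the highest performance state in the table, while `map_effort(1.0, effort_limit=0.4)` is held to an efficiency core because the limit caps the effort before the mapping is applied.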
Apparatus, a method and a computer program for video coding and decoding
A method comprising: encoding a first picture on a first scalability layer and on a lowest temporal sub-layer; encoding a second picture on a second scalability layer and on the lowest temporal sub-layer, wherein the first picture and the second picture represent the same time instant; encoding one or more first syntax elements, associated with the first picture, with a value indicating that a picture type of the first picture is other than a step-wise temporal sub-layer access (STSA) picture; encoding one or more second syntax elements, associated with the second picture, with a value indicating that a picture type of the second picture is a step-wise temporal sub-layer access picture; and encoding at least a third picture on the second scalability layer and on a temporal sub-layer higher than the lowest temporal sub-layer.
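The arrangement of pictures in this method can be written down as a small data model. The sketch below only records the layer, temporal sub-layer, time instant, and picture type of each picture and checks that three pictures follow the described pattern; it performs no actual encoding. The names `Picture`, `PictureType`, and `matches_described_pattern` are hypothetical, and picture types are reduced to "STSA" versus "other" for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class PictureType(Enum):
    STSA = "stsa"    # step-wise temporal sub-layer access picture
    OTHER = "other"  # any picture type other than STSA


@dataclass(frozen=True)
class Picture:
    layer: int        # scalability layer index
    temporal_id: int  # 0 denotes the lowest temporal sub-layer
    time: int         # time instant the picture represents
    ptype: PictureType


def matches_described_pattern(first: Picture, second: Picture, third: Picture) -> bool:
    """Check the layer / sub-layer / picture-type pattern from the abstract."""
    return (
        first.temporal_id == 0
        and second.temporal_id == 0
        and first.layer != second.layer
        and first.time == second.time
        and first.ptype is not PictureType.STSA
        and second.ptype is PictureType.STSA
        and third.layer == second.layer
        and third.temporal_id > 0
    )
```

A base-layer picture of type `OTHER` and an enhancement-layer STSA picture at the same time instant, followed by an enhancement-layer picture with `temporal_id=1`, satisfies the check.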
Broadcast channel architectures for block-based processors
Apparatus and methods are disclosed for computer processors that are based on a hybrid dataflow execution model. In particular embodiments, a processor core in a block-based processor comprises: one or more functional units configured to perform functions using one or more operands; an instruction window comprising buffers configured to store individual instructions for execution by the processor core, the instruction window including one or more operand buffers for an individual instruction configured to store operand values; a control unit configured to execute the instructions in the instruction window and control operation of the one or more functional units; and a broadcast value store comprising a plurality of buffers dedicated to storing broadcast values, each buffer of the broadcast value store being associated with a respective broadcast channel from among a plurality of available broadcast channels.
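The broadcast value store can be pictured as one dedicated buffer per broadcast channel. The following software model is a minimal sketch under that reading: a producer writes a value to a channel and any number of consumers read it back. The class name, the method names, and the valid-bit bookkeeping are assumptions for illustration and say nothing about the actual hardware design.

```python
class BroadcastValueStore:
    """Toy model: one buffer per broadcast channel, plus a valid flag."""

    def __init__(self, num_channels: int):
        self._buffers = [None] * num_channels
        self._valid = [False] * num_channels

    def write(self, channel: int, value) -> None:
        """A producing instruction broadcasts a value on a channel."""
        self._buffers[channel] = value
        self._valid[channel] = True

    def read(self, channel: int):
        """A consuming instruction reads the value; many readers are allowed."""
        if not self._valid[channel]:
            raise LookupError(f"no value broadcast on channel {channel} yet")
        return self._buffers[channel]

    def is_ready(self, channel: int) -> bool:
        return self._valid[channel]
```

Because the value stays in the channel's buffer after a read, several later instructions can consume the same operand without the producer naming each of them as an explicit target.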
Multimodal targets in a block-based processor
Apparatus and methods are disclosed for decoding targets from an instruction and transmitting data to those targets in accordance with a current instruction. Multimodal target hardware is used in conjunction with one or more of the routers so as to route data to an appropriate target. The data can be one or more operands or a predicate and the targets can include operand buffers, broadcast channels, and general registers. In this way, operands, for example, can be directed for use with multiple subsequent instructions, and there are multiple modes for distributing the operands to the multiple instructions.
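To make the idea of multiple target modes concrete, the sketch below routes a produced value to targets that are operand buffers, broadcast channels, or general registers. The target encoding, the `Target` and `CoreState` types, and the `route` function are hypothetical; the abstract does not describe how targets are encoded or how the routers are built.

```python
from dataclasses import dataclass, field
from enum import Enum


class TargetKind(Enum):
    OPERAND_BUFFER = "operand_buffer"
    BROADCAST_CHANNEL = "broadcast_channel"
    GENERAL_REGISTER = "general_register"


@dataclass(frozen=True)
class Target:
    kind: TargetKind
    index: int  # which buffer, channel, or register (assumed flat numbering)


@dataclass
class CoreState:
    operand_buffers: dict = field(default_factory=dict)
    broadcast_channels: dict = field(default_factory=dict)
    registers: dict = field(default_factory=dict)


def route(state: CoreState, value, targets) -> None:
    """Deliver one produced value (operand or predicate) to every decoded target."""
    destinations = {
        TargetKind.OPERAND_BUFFER: state.operand_buffers,
        TargetKind.BROADCAST_CHANNEL: state.broadcast_channels,
        TargetKind.GENERAL_REGISTER: state.registers,
    }
    for target in targets:
        destinations[target.kind][target.index] = value
```

In this model a single result can reach one consumer directly through an operand buffer and several later consumers through a broadcast channel in the same step.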
Scheduler for AMP architecture with closed loop performance controller
Systems and methods are disclosed for scheduling threads on a processor that has at least two different core types, such as an asymmetric multiprocessing (AMP) system. Each core type can run at a plurality of selectable dynamic voltage and frequency scaling (DVFS) states. Threads from a plurality of processes can be grouped into thread groups. Execution metrics are accumulated for threads of a thread group and fed into a plurality of tunable controllers for the thread group. A closed loop performance control (CLPC) system determines a control effort for the thread group and maps the control effort to a recommended core type and DVFS state. A closed loop thermal and power management system can limit the control effort determined by the CLPC for a thread group, and limit the power, core type, and DVFS states for the system. Deferred interrupts can be used to increase performance.
Implicit program order
Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that generates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes decoding the instruction block, generating data indicating a relative order for executing the memory access instructions in the instruction block, and scheduling operation of a portion of the instruction block based at least in part on the relative order data. In some examples, a store vector data register can store the generated relative ordering data for use in subsequent instances of the instruction block.
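One way to read this abstract is that the decoder numbers the loads and stores of a block in program order and records which of those numbers belong to stores, so that later memory operations can wait on earlier stores. The sketch below models that reading in software. The opcode strings, the bit-vector layout, and the rule that a memory operation waits only for earlier stores are assumptions for illustration, not details taken from the source.

```python
def relative_order(block):
    """Map instruction index -> sequence number for loads/stores, in program order."""
    order = {}
    for idx, op in enumerate(block):
        if op in ("load", "store"):
            order[idx] = len(order)
    return order


def store_vector(block, order):
    """Bit i is set when the memory op with sequence number i is a store."""
    vec = 0
    for idx, seq in order.items():
        if block[idx] == "store":
            vec |= 1 << seq
    return vec


def can_issue(seq, vec, completed):
    """A memory op may issue once every earlier store has completed.

    completed is a bit mask of sequence numbers whose memory op has finished.
    """
    earlier_stores = vec & ((1 << seq) - 1)
    return earlier_stores & ~completed == 0


# Hypothetical six-instruction block.
block = ["add", "store", "load", "mul", "store", "load"]
order = relative_order(block)      # {1: 0, 2: 1, 4: 2, 5: 3}
vec = store_vector(block, order)   # 0b0101
```

Since the vector depends only on the decoded block, it could be computed once and reused when the same block runs again, which is the role the abstract gives to the store vector data register.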
Hint instruction for managing transactional aborts in transactional memory computing environments
When executed, a transaction-hint instruction specifies a transaction-count-to-completion (CTC) value for a transaction. The CTC value indicates how far a transaction is from completion. The CTC may be a number of instructions to completion or an amount of time to completion. The CTC value is adjusted as the transaction progresses. When a disruptive event associated with inducing transactional aborts, such as an interrupt or a conflicting memory access, is identified while processing the transaction, processing of the disruptive event is deferred if the adjusted CTC value satisfies deferral criteria. If the adjusted CTC value does not satisfy deferral criteria, the transaction is aborted and the disruptive event is processed.
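The deferral decision described here can be sketched in a few lines. In the model below the CTC value counts instructions remaining, it is decremented as instructions retire, and a disruptive event is deferred when the remaining count is at or below a threshold. The `Transaction` class, its method names, and the "at or below a threshold" form of the deferral criteria are assumptions; the abstract leaves the criteria unspecified.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    ctc: int                                      # instructions left, set by the hint
    deferred: list = field(default_factory=list)  # events postponed until completion
    aborted: bool = False

    def retire(self, n: int = 1) -> None:
        """Adjust the CTC value as the transaction progresses."""
        self.ctc = max(0, self.ctc - n)

    def on_disruptive_event(self, event: str, threshold: int) -> str:
        """Defer the event if the transaction is close to completion, else abort."""
        if self.ctc <= threshold:
            self.deferred.append(event)
            return "deferred"
        self.aborted = True
        return "aborted"
```

A transaction hinted with `ctc=10` that has retired 8 instructions would defer an interrupt under a threshold of 4, whereas the same interrupt arriving before any instruction retires would abort it.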