G06F11/3656

IMPLICIT PROGRAM ORDER

Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that generates a relative ordering of memory access instruction in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes decoding an instruction block encoding a plurality of memory access instructions and generating data indicating a relative order for executing the memory access instructions in the instruction block and scheduling operation of a portion of the instruction block based at least in part on the relative order data. In some examples, a store vector data register can store the generated relative ordering data for use in subsequent instances of the instruction block.

BROADCAST CHANNEL ARCHITECTURES FOR BLOCK-BASED PROCESSORS

Apparatus and methods are disclosed for example computer processors that are based on a hybrid dataflow execution model. In particular embodiments, a processor core in a block-based processor comprises: one or more functional units configured to perform functions using one or more operands; an instruction window comprising buffers configured to store individual instructions for execution by the processor core, the instruction window including one or more operand buffers for an individual instruction configured to store operand values; a control unit configured to execute the instructions in the instruction window and control operation of the one or more functional units; and a broadcast value store comprising a plurality of buffers dedicated to storing broadcast values, each buffer of the broadcast value store being associated with a respective broadcast channel from among a plurality of available broadcast channels.

MEMORY SYNCHRONIZATION IN BLOCK-BASED PROCESSORS

Apparatus and methods are disclosed for performing memory operations instructions in a block-based processor architecture. In certain examples of the disclosed technology, a block-based processor core coupled to memory includes a control unit configured to issue one or more memory operations encoded in an instruction block allocated to the core and to commit the core when execution of the instruction block is complete, a memory store queue configured to cache one or more operands for the one or more memory operations, where a result of performing the memory operations is not architecturally visible unless the instruction block is committed by the control unit, and a memory interface configured to store the cached operands in the memory responsive to the instruction block committing. In some examples, the block-based processor core supports memory synchronization using load linked and store conditional instructions.

GENERATION AND USE OF BLOCK BRANCH METADATA

Apparatus and methods are disclosed for generating and using block branch metadata in block-based processor architectures. In one example of the disclosed technology, a block-based processor is configured to dynamically generate metadata representing control flow, exit points, and control flow probabilities for an instruction block while decoding and executing the block. The metadata can be used with subsequent invocations of the instruction block for branch and memory dependence predictions. In some examples, an incomplete portion of a control flow representation is generated for a number of predicated instructions and stored in a memory or storage device for enhancing prediction and prefetch for subsequent invocations of an instruction block.

PREDICATED READ INSTRUCTIONS

Apparatus and methods are disclosed for example computer processors that are based on a hybrid dataflow execution model. Embodiments of the disclosed technology use read instructions to retrieve a value from a specified register in the register file of the processor architecture and send the value for use by one or more targets (e.g., other instructions in the instruction block). The read instruction may be predicated such that the instruction is only executed when a predicate condition is satisfied. In some examples of the disclosed technology, a compiler for such processors performs an analysis of the source and/or object code being compiled in order to determine whether operation(s) along conditional paths can be executed before or concurrently with determination of a condition on which the conditional operation(s) depend, thus improving processor efficiency.

Cache debug system for programmable circuits
09594655 · 2017-03-14 · ·

An integrated circuit may be provided with system-on-chip circuitry including system-on-chip interconnects and a microprocessor unit subsystem. The subsystem may include microprocessor cores that execute instructions stored in memory. Cache may be used to cache data for the microprocessor cores. A memory coherency control unit may be used to maintain memory coherency during operation of the microprocessor unit subsystem. The memory coherency control unit may be coupled to the system-on-chip interconnects by a bus. A command translator may be interposed in the bus. The command translator may have a slave interface that communicates with the interconnects and a master interface that communicates with the memory coherency control unit. The integrated circuit may have programmable circuitry that is programmed to implement a debug master coupled to the interconnects. During debug operations, the command translator may translate commands from the debug master.

SCAN TOPOLOGY DISCOVERY IN TARGET SYSTEMS
20170059653 · 2017-03-02 ·

Topology discovery of a target system having a plurality of components coupled with a scan topology may be performed by driving a low logic value on the data input signal and a data output signal of the scan topology. An input data value and an output data value for each of the plurality of components is sampled and recorded. A low logic value is then scanned through the scan path and recorded at each component. The scan topology may be determined based on the recorded data values and the recorded scan values.

ALTERNATE SIGNALING MECHANISM USING CLOCK AND DATA
20170059654 · 2017-03-02 ·

Control events may be signaled to a target system having a plurality of components coupled to a scan path by using the clock and data signals of the scan path. While the clock signal is held a high logic level, two or more edge transitions are detected on the data signal. The number of edge transitions on the data signal is counted while the clock signal is held at the high logic state. A control event is determined based on the counted number of edge transitions on the data signal after the clock signal transitions to the low logic state.

APPARATUS INCLUDING CORE AND CLOCK GATING CIRCUIT AND METHOD OF OPERATING SAME

A device may include a first core, a master clock core, and a clock gating circuit. The master clock core may generate a master clock signal. The clock gating circuit may clock gate the master clock signal in response to a stall signal from the first core.

OPTIMIZED JTAG INTERFACE
20170059652 · 2017-03-02 ·

An optimized JTAG interface is used to access JTAG Tap Domains within an integrated circuit. The interface requires fewer pins than the conventional JTAG interface and is thus more applicable than conventional JTAG interfaces on an integrated circuit where the availability of pins is limited. The interface may be used for a variety of serial communication operations such as, but not limited to, serial communication related integrated circuit test, emulation, debug, and/or trace operations.