Patent classifications
G06F9/3016
ATOMIC ADD WITH CARRY INSTRUCTION
Processing circuitry performs processing operations specified by program instructions. An instruction decoder decodes an atomic-add-with-carry instruction AAD-DC to control the processing circuitry to perform an atomic operation of an add of an addend operand value and a data value stored in a memory to generate a result value stored in the memory and a carry value indicative of whether or not the add generated a carry out.
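The behaviour described in this abstract can be modelled in a few lines of Python. This is only a rough sketch: the 64-bit word width, the `Memory` class, and the use of a software lock to stand in for hardware atomicity are all assumptions made for illustration, not details taken from the patent.

```python
import threading

WORD_BITS = 64  # assumed operand width; the abstract does not fix one
WORD_MASK = (1 << WORD_BITS) - 1


class Memory:
    """Toy word-addressed memory (hypothetical); a lock stands in for hardware atomicity."""

    def __init__(self):
        self._words = {}
        self._lock = threading.Lock()

    def store(self, addr, value):
        with self._lock:
            self._words[addr] = value & WORD_MASK

    def load(self, addr):
        with self._lock:
            return self._words.get(addr, 0)

    def atomic_add_with_carry(self, addr, addend):
        """Add addend to the word at addr as one indivisible step.

        The truncated sum is written back to memory; the return value is
        the carry out (1 if the sum did not fit in WORD_BITS, else 0).
        """
        with self._lock:
            total = self._words.get(addr, 0) + (addend & WORD_MASK)
            self._words[addr] = total & WORD_MASK
            return total >> WORD_BITS


mem = Memory()
mem.store(0, WORD_MASK)                   # largest representable word
carry = mem.atomic_add_with_carry(0, 1)   # wraps around to zero
print(carry, mem.load(0))
```

Returning the carry separately is what would let software chain several such adds into a multi-word addition, with each word updated atomically.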
IN-MEMORY COMPUTATIONAL DEVICE WITH BIT LINE PROCESSORS
A computing device includes bit line processors, multiplexers and a decoder. Each bit line processor includes a bit line of memory cells and each cell stores one bit of a data word. A column of bit line processors stores the bits of the data word. Each multiplexer connects a bit line processor in a first row of bit line processors to a bit line processor in a second row of bit line processors. The decoder activates at least two word lines of the bit line processor of the first row and a word line in the bit line processor in the second row and enables a bit line voltage associated with a result of a logical operation performed by the bit line processor in the first row to be written into the cell in the bit line processor in the second row.
HYBRID BLOCK-BASED PROCESSOR AND CUSTOM FUNCTION BLOCKS
Apparatus and methods are disclosed for implementing block-based processors having custom function blocks, including field-programmable gate array (FPGA) implementations. In some examples of the disclosed technology, a dynamically configurable scheduler is configured to issue at least one block-based processor instruction. A custom function block is configured to receive input operands for the instruction and generate ready state data indicating completion of a computation performed for the instruction by the respective custom function block.
Operand size control
A data processing system is provided with processing circuitry as well as a bank of 64-bit registers. An instruction decoder decodes arithmetic instructions and logical instructions specifying arithmetic operations and logical operations to be performed upon operands stored within the 64-bit registers. The instruction decoder is responsive to an operand size field SF within the arithmetic instructions and the logical instructions specifying whether the operands are 64-bit operands or 32-bit operands, where either all of the operands are 64-bit operands or all of the operands are 32-bit operands. If a switch is made to a lower exception level, then a check is made as to whether or not a register being used was previously subject to a 64-bit write. If such a 64-bit write had previously taken place, then the upper 32 bits are flushed so as to avoid data leakage from the higher exception level.
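The two mechanisms in this abstract, the operand size field and the flush on dropping to a lower exception level, might be sketched as follows. The `RegisterFile` class and its method names are hypothetical. Two simplifications are assumptions rather than statements from the abstract: a 32-bit write is taken to zero the upper half of the register, and every flagged register is flushed at the switch instead of being checked when it is next used.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


class RegisterFile:
    """Hypothetical bank of 64-bit registers with a per-register 64-bit-write flag."""

    def __init__(self, count=4):
        self.values = [0] * count
        self.wrote64 = [False] * count

    def write(self, rd, value, sf):
        """sf=1 selects a 64-bit write, sf=0 a 32-bit write."""
        if sf:
            self.values[rd] = value & MASK64
            self.wrote64[rd] = True
        else:
            # Assumption: a 32-bit write zeroes the upper 32 bits.
            self.values[rd] = value & MASK32
            self.wrote64[rd] = False

    def add(self, rd, rn, rm, sf):
        """All operands share one size, chosen by the single sf field."""
        mask = MASK64 if sf else MASK32
        total = (self.values[rn] & mask) + (self.values[rm] & mask)
        self.write(rd, total & mask, sf)

    def enter_lower_exception_level(self):
        """Flush the upper 32 bits of every register that saw a 64-bit write."""
        for rd, flagged in enumerate(self.wrote64):
            if flagged:
                self.values[rd] &= MASK32
                self.wrote64[rd] = False


regs = RegisterFile()
regs.write(0, 0x1122_3344_5566_7788, sf=1)  # 64-bit value held at the higher level
regs.enter_lower_exception_level()          # upper half is cleared before lower-level code runs
print(hex(regs.values[0]))
```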
Instruction and logic to provide stride-based vector load-op functionality with mask duplication
Instructions and logic provide vector load-op and/or store-op with stride functionality. In some embodiments, responsive to an instruction specifying a set of loads, a second operation, a destination register, an operand register, a memory address, and a stride length, execution units read values in a mask register, wherein fields in the mask register correspond to stride-length multiples from the memory address to data elements in memory. A first mask value indicates that the element has not been loaded from memory, and a second value indicates that the element does not need to be loaded or has already been loaded. For each field having the first value, the data element is loaded from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. Then the second operation is performed using corresponding data in the destination and operand registers to generate results. The instruction may be restarted after faults.
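A minimal Python sketch of the masked, strided load followed by the second operation is given below. The function name, the list-based model of registers and memory, and the encoding of the first mask value as 1 and the second as 0 are assumptions for illustration only.

```python
NEEDS_LOAD = 1  # assumed encoding of the "first value"
DONE = 0        # assumed encoding of the "second value"


def strided_load_op(memory, base, stride, mask, dest, operand, op):
    """Load masked elements at base + i * stride, then apply op element-wise.

    mask and dest are updated in place. Because each mask field is cleared
    only after its element has been loaded, re-running the function after a
    fault would skip the elements that were already fetched.
    """
    for i, field in enumerate(mask):
        if field == NEEDS_LOAD:
            dest[i] = memory[base + i * stride]  # a fault here leaves mask[i] set
            mask[i] = DONE
    # Second operation on corresponding destination and operand elements.
    return [op(d, s) for d, s in zip(dest, operand)]


memory = list(range(100))   # memory[k] == k keeps the example easy to follow
mask = [1, 0, 1, 1]         # element 1 does not need to be loaded
dest = [0, 99, 0, 0]
operand = [1, 1, 1, 1]
result = strided_load_op(memory, 10, 3, mask, dest, operand, lambda a, b: a + b)
print(dest, result, mask)
```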
Conditional execution support for ISA instructions using prefixes
In one embodiment, a processor includes an instruction decoder to receive a first instruction having a prefix and an opcode and to generate a second instruction executable based on a condition determined from the prefix, and an execution unit to conditionally execute the second instruction based on that condition.
PROCESSOR WITH INSTRUCTION LOOKAHEAD ISSUE LOGIC
A processor having an instruction cache for storing a plurality of instructions is provided. The processor further includes annotation logic configured to determine a lookahead distance associated with an instruction and annotate the instruction cache with the lookahead distance. The lookahead distance may correspond to a number of instructions that separates an instruction that references a register from the most recent register definition. The lookahead distance may indicate the shortest distance to a later instruction that references a register that this instruction defines.
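The second reading of the lookahead distance, the shortest distance from a defining instruction to a later instruction that references the defined register, can be computed as sketched below. The tuple representation of instructions is hypothetical, and stopping the search when the register is redefined is an assumption that the abstract does not state.

```python
def lookahead_distances(program):
    """Return, per instruction, the distance to the first later use of its result.

    program is a list of (dest, sources) tuples, where dest is a register
    name or None and sources is a tuple of register names. The entry is
    None when no later instruction reads the defined register.
    """
    distances = []
    for i, (dest, _) in enumerate(program):
        distance = None
        if dest is not None:
            for j in range(i + 1, len(program)):
                later_dest, later_sources = program[j]
                if dest in later_sources:
                    distance = j - i
                    break
                if later_dest == dest:
                    break  # assumption: a redefinition ends the search
        distances.append(distance)
    return distances


program = [
    ("r1", ("r0",)),       # r1 is first read two instructions later
    ("r2", ("r0",)),       # r2 is read by the next instruction
    ("r3", ("r1", "r2")),
    ("r1", ("r3",)),       # redefines r1; nothing reads it afterwards
]
print(lookahead_distances(program))  # [2, 1, 1, None]
```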
PROCESSOR WITH MEMORY CONTROLLER INCLUDING DYNAMICALLY PROGRAMMABLE FUNCTIONAL UNIT
A processor including a memory controller for interfacing an external memory and a programmable functional unit (PFU). The PFU is programmed by a PFU program to modify operation of the memory controller, in which the PFU includes programmable logic elements and programmable interconnectors. For example, the PFU is programmed by the PFU program to add a function or otherwise to modify an existing function of the memory controller to enhance its functionality during operation of the processor. In this manner, the functionality and/or operation of the memory controller is not fixed once the processor is manufactured, but instead the memory controller may be modified after manufacture to improve efficiency and/or enhance performance of the processor, such as when executing a corresponding process.
CONTROLLING THE NUMBER OF POWERED VECTOR LANES VIA A REGISTER FIELD
The vector data path is divided into smaller vector lanes. A register such as a memory mapped control register stores a vector lane number (VLX) indicating the number of vector lanes to be powered. A decoder converts this VLX into a vector lane control word, each bit controlling the ON or OFF state of the corresponding vector lane. This number of contiguous least significant vector lanes are powered. In the preferred embodiment the stored data VLX indicates that 2^VLX contiguous least significant vector lanes are to be powered. Thus the number of vector lanes powered is limited to an integral power of 2. This manner of coding produces a very compact controlling bit field while obtaining substantially all the power saving advantage of individually controlling the power of all vector lanes.
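The decoding step of the preferred embodiment amounts to setting the 2^VLX least significant bits of the control word. A short Python sketch follows; the function name and the `total_lanes` bounds check are assumptions added for illustration.

```python
def lane_control_word(vlx, total_lanes):
    """Decode VLX into a control word with one bit per vector lane.

    Bit i set means lane i is powered. The 2**vlx least significant lanes
    are switched on, so the powered count is always a power of two.
    """
    powered = 1 << vlx  # 2**vlx lanes
    if vlx < 0 or powered > total_lanes:
        raise ValueError("VLX selects more lanes than the data path has")
    return (1 << powered) - 1  # the 'powered' low-order bits set


for vlx in range(4):
    print(vlx, format(lane_control_word(vlx, 8), "08b"))
```

With eight lanes, a 2-bit VLX field covers the configurations 1, 2, 4 and 8 lanes, whereas a mask with one bit per lane would need eight bits. That is the compactness the abstract refers to.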
STREAMING ENGINE WITH STREAM METADATA SAVING FOR CONTEXT SWITCHING
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the stream engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state, discarding stored stream data. A return from interrupt changes a frozen stream to an active state.
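One way to picture the nested-loop address generation and the saving and restoring of stream metadata is the Python sketch below. The class, the `(count, stride)` loop description, and the choice of loop counters as the saved metadata are assumptions for illustration; the abstract does not say what the metadata contains.

```python
class StreamAddressGenerator:
    """Hypothetical address generator for a stream defined by nested loops."""

    def __init__(self, base, loops):
        self.base = base
        self.loops = list(loops)           # (iteration count, byte stride), innermost first
        self.counters = [0] * len(self.loops)
        self.done = any(count <= 0 for count, _ in self.loops)

    def next_address(self):
        if self.done:
            raise RuntimeError("stream exhausted")
        offset = sum(c * stride for c, (_, stride) in zip(self.counters, self.loops))
        # Advance the counters like an odometer, innermost loop first.
        for level, (count, _) in enumerate(self.loops):
            self.counters[level] += 1
            if self.counters[level] < count:
                break
            self.counters[level] = 0
        else:
            self.done = True               # every loop wrapped: the stream is finished
        return self.base + offset

    def save(self):
        """Model of a stream store: capture the position in the stream."""
        return (list(self.counters), self.done)

    def restore(self, metadata):
        """Model of a stream restore: resume from a saved position."""
        counters, done = metadata
        self.counters = list(counters)
        self.done = done


gen = StreamAddressGenerator(0x100, [(2, 4), (3, 16)])
first_two = [gen.next_address(), gen.next_address()]
saved = gen.save()            # e.g. before a context switch
gen.next_address()            # progress that is later abandoned
gen.restore(saved)            # resume where the stream was saved
print([hex(a) for a in first_two], hex(gen.next_address()))
```

Only the loop position is saved here, not the buffered data, which is in line with the abstract's statement that an interrupt discards stored stream data; on resumption the data would be fetched again from the restored addresses.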