Patent classifications
G06F9/3557
CAPABILITY-GENERATING ADDRESS CALCULATING INSTRUCTION
An apparatus has processing circuitry, an instruction decoder, and capability registers, each capability register to store a capability comprising a pointer and constraint metadata for constraining valid use of the pointer/capability. In response to a capability-generating address calculating instruction specifying an offset value, a reference capability register is selected as one of a program counter capability register and a further capability register. A result capability is generated for which the pointer of the result capability indicates a window address identifying a selected window within an address space, the selected window being offset from a reference window by a number of windows determined based on the offset value of the capability-generating address calculating instruction. The reference window comprises the window comprising an address indicated by the pointer of the reference capability register.
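The window arithmetic described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the 4 KiB window size, the dictionary layout of a capability, and the choice to copy the constraint metadata unchanged from the reference capability are all assumptions made for the example.

```python
def window_address(ref_pointer: int, offset: int, window_bits: int = 12) -> int:
    """Return the base address of the window `offset` windows away from the
    window that contains `ref_pointer`.

    window_bits=12 (4 KiB windows) is an assumed value for illustration.
    """
    window_size = 1 << window_bits
    # The reference window is the window containing the reference pointer.
    reference_window = ref_pointer & ~(window_size - 1)
    return reference_window + offset * window_size


def generate_capability(ref_cap: dict, offset: int) -> dict:
    """Build a result capability from a reference capability.

    Assumption: the constraint metadata is copied unchanged and only the
    pointer is replaced by the computed window address.
    """
    result = dict(ref_cap)
    result["pointer"] = window_address(ref_cap["pointer"], offset)
    return result


# Hypothetical program counter capability used as the reference.
pcc = {"pointer": 0x4010_2345, "base": 0x4000_0000, "limit": 0x5000_0000, "perms": "rx"}
cap = generate_capability(pcc, 3)
```

Here the reference pointer lies in the window starting at 0x40102000, so an offset of 3 windows gives a result pointer of 0x40105000.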
Hardware-implemented universal floating-point instruction set architecture for computing directly with human-readable decimal character sequence floating-point representation operands
A universal floating-point Instruction Set Architecture (ISA) compute engine implemented entirely in hardware. The ISA compute engine computes directly with human-readable decimal character sequence floating-point representation operands without first having to explicitly perform a conversion-to-binary-format process in software. A fully pipelined convertToBinaryFromDecimalCharacter hardware operator logic circuit converts one or more human-readable decimal character sequence floating-point representations to IEEE 754-2008 binary floating-point representations every clock cycle. Following computations by at least one hardware floating-point operator, a convertToDecimalCharacterFromBinary hardware conversion circuit converts the result back to a human-readable decimal character sequence floating-point representation.
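As a software analogy only, the convert, compute, convert-back flow can be sketched in a few lines. The function name is hypothetical, and Python's built-in `float` and `repr` stand in for the hardware conversion circuits; the abstract describes doing these steps in hardware without a software conversion pass.

```python
def compute_on_decimal_strings(a: str, b: str) -> str:
    """Multiply two decimal character sequences and return a decimal string."""
    x = float(a)          # decimal characters -> binary64
    y = float(b)          # decimal characters -> binary64
    product = x * y       # binary floating-point operator
    return repr(product)  # binary64 -> shortest round-tripping decimal string


result = compute_on_decimal_strings("1.5", "2")
```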
Generation and use of memory access instruction order encodings
Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
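One way to picture the masked store vector is as a pair of bit vectors indexed by a per-instruction ordering identifier. The sketch below rests on assumptions not stated in the abstract: each memory instruction carries a hypothetical identifier `lsid`, bit `i` of the store mask is set when instruction `i` is a store, and an instruction may issue once every earlier store has executed.

```python
def can_issue(lsid: int, store_mask: int, store_vector: int) -> bool:
    """Return True when no earlier store is still pending.

    store_mask:   bit i set -> memory instruction i is a store (assumed encoding)
    store_vector: bit i set -> memory instruction i has executed
    """
    earlier = (1 << lsid) - 1                        # bits for identifiers below lsid
    pending = store_mask & earlier & ~store_vector   # earlier stores not yet executed
    return pending == 0


def next_to_issue(ready_lsids, store_mask: int, store_vector: int):
    """Pick the lowest-numbered ready instruction that has not executed
    and whose earlier stores are all complete; None if nothing can issue."""
    for lsid in sorted(ready_lsids):
        already_done = (store_vector >> lsid) & 1
        if not already_done and can_issue(lsid, store_mask, store_vector):
            return lsid
    return None


# Instructions 0 and 2 are stores; only instruction 0 has executed so far.
mask, vector = 0b0101, 0b0001
choice = next_to_issue([3, 1], mask, vector)
```

With these values, instruction 1 can issue because the only earlier store (instruction 0) has executed, while instruction 3 must wait for the store at instruction 2.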
IMPLEMENTING A RECEIVED ADD PROGRAM COUNTER IMMEDIATE SHIFT (ADDPCIS) INSTRUCTION USING A MICRO-CODED OR CRACKED SEQUENCE
A computer program product for implementing a received add program counter immediate shift (ADDPCIS) instruction using a micro-coded or cracked sequence is provided. The computer program product includes a computer readable storage medium having program instructions embodied therewith. The program instructions are readable and executable by a processing circuit to cause the processing circuit to recognize register operand and integer terms associated with the ADDPCIS instruction, set a value of a target register associated with the ADDPCIS instruction in accordance with the integer term summed with another term by obtaining a next instruction address (NIA), moving an architecturally defined register file from a first temporary register to a general purpose register and adding a shifted immediate constant to a value stored in a second temporary register.
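The arithmetic result of such a sequence can be sketched as two micro-operations. The details here are assumptions for illustration: 4-byte instructions, a 16-bit signed immediate shifted left by 16 bits, and 64-bit wraparound. The function and variable names are hypothetical, and the sketch collapses the register moves described in the abstract into a single temporary.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def addpcis_cracked(cia: int, d: int) -> int:
    """Emulate a cracked sequence: obtain the next instruction address,
    then add the shifted immediate to it."""
    # Micro-op 1: obtain the next instruction address (NIA) in a temporary.
    temp = (cia + 4) & MASK64
    # Micro-op 2: add the sign-extended immediate, shifted left by 16 bits.
    return (temp + (sign_extend(d, 16) << 16)) & MASK64


target = addpcis_cracked(0x1000, 1)
```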
PROGRAM COUNTER (PC)-RELATIVE LOAD AND STORE ADDRESSING
Load store addressing can include a processor, which fuses two consecutive instructions determined to be prefix instructions and treats the two instructions as a single fused instruction. The prefix instruction of the fused instruction is auto-finished at dispatch time in an issue unit of the processor. A suffix instruction of the fused instruction and its fields and the prefix instruction's fields are issued from an issue queue of the issue unit, wherein an opcode of the suffix instruction is issued to a load store unit of the processor, and fields of the fused instruction are issued to an execution unit of the processor. The execution unit forms operands of the suffix instruction, at least one operand formed based on a current instruction address of the single fused instruction. The load store unit executes the suffix instruction using the operands formed by the execution unit.
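The division of work between the execution unit and the load store unit might look like the sketch below. The field names (`imm_hi`, `imm_lo`), the 16-bit split, the unsigned displacement, and the dictionary-based memory are all hypothetical choices made for the example.

```python
def fuse(prefix: dict, suffix: dict, cia: int) -> dict:
    """Treat a prefix/suffix pair as one fused instruction.

    Assumption: the prefix supplies the high bits of the displacement and
    the suffix supplies the low 16 bits.
    """
    displacement = (prefix["imm_hi"] << 16) | suffix["imm_lo"]
    return {"opcode": suffix["opcode"], "cia": cia, "displacement": displacement}


def execution_unit_form_address(fused: dict) -> int:
    """Form the operand from the current instruction address (PC-relative)."""
    return fused["cia"] + fused["displacement"]


def load_store_unit_execute(fused: dict, address: int, memory: dict, value=None):
    """Execute the suffix opcode using the operand formed by the execution unit."""
    if fused["opcode"] == "load":
        return memory[address]
    if fused["opcode"] == "store":
        memory[address] = value
        return None
    raise ValueError("unsupported opcode in this sketch")


memory = {0x13345: 99}
fused = fuse({"imm_hi": 0x1}, {"opcode": "load", "imm_lo": 0x2345}, cia=0x1000)
address = execution_unit_form_address(fused)
loaded = load_store_unit_execute(fused, address, memory)
```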
METHOD FOR PATCHING CHIP AND CHIP
An embodiment of the present application discloses a method for patching a chip and a chip. The chip includes a first program, and the method includes: when a function that needs to be replaced in the first program is run, executing an interrupt service routine according to a pre-stored correspondence relationship between an address of the function that needs to be replaced and an interrupt instruction, where the interrupt service routine is a service routine scheduled by an interrupt instruction corresponding to the function that needs to be replaced, and a return address of the interrupt service routine is an address of a patch function of the function that needs to be replaced; and running the patch function according to the address of the patch function, to perform patch processing on the first program.
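The redirection can be modelled in software as two lookups. Everything named here is hypothetical: the addresses, the interrupt number, and the two tables. A real chip would trigger the interrupt in hardware when the replaced function's address is reached; this sketch only shows the control flow, assuming the interrupt service routine's return address is set to the patch function.

```python
def original_add(a, b):
    return a - b  # stand-in for a defective function in the first program


def patched_add(a, b):
    return a + b  # stand-in for the patch function


FUNCTIONS = {0x100: original_add, 0x900: patched_add}

# Pre-stored correspondence: address of the function to replace -> interrupt number.
INTERRUPT_FOR_ADDRESS = {0x100: 7}

# Return address configured for each interrupt service routine.
ISR_RETURN_ADDRESS = {7: 0x900}


def interrupt_service_routine(interrupt_number: int) -> int:
    """Return the address at which execution resumes after the interrupt."""
    return ISR_RETURN_ADDRESS[interrupt_number]


def call(address: int, *args):
    """Run the function at `address`, diverting to the patch if one is registered."""
    if address in INTERRUPT_FOR_ADDRESS:
        address = interrupt_service_routine(INTERRUPT_FOR_ADDRESS[address])
    return FUNCTIONS[address](*args)


result = call(0x100, 2, 3)
```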
Apparatus and method for generating intermediate layer values in parallel
A memory apparatus and an operation method thereof are provided. The memory apparatus includes a mode configuration register, a system memory array, a pointer and an arithmetic circuit including logic operation units. The mode configuration register stores weight matrix information and a base address. The system memory array stores feature values in a feature map from the base address according to the weight matrix information. The pointer stores the base address and a weight matrix size to provide pointer information. The arithmetic circuit reads the feature values sequentially or in parallel according to the pointer information. The arithmetic circuit arranges, in parallel, weight coefficients of a selected weight matrix and the corresponding feature values in each of the corresponding logic operation units according to the weight matrix information, and causes the logic operation units to perform computing operations in parallel to output intermediate layer feature values to an external processing unit.
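A sequential software model of this data flow is sketched below. The abstract does not say how feature values pair with weight coefficients, so the sliding-window multiply-accumulate, the row-major layout, and all function names are assumptions. Each inner sum corresponds to work that the logic operation units would perform in parallel in hardware.

```python
def read_window(memory, base, width, row, col, k):
    """Read a k-by-k block of feature values stored row-major from `base`."""
    return [memory[base + (row + r) * width + (col + c)]
            for r in range(k) for c in range(k)]


def intermediate_layer(memory, base, width, height, weights):
    """Multiply-accumulate the weight matrix against each window position."""
    k = len(weights)  # weight matrix size, as held in the pointer information
    flat_weights = [w for weight_row in weights for w in weight_row]
    output = []
    for row in range(height - k + 1):
        output_row = []
        for col in range(width - k + 1):
            features = read_window(memory, base, width, row, col, k)
            output_row.append(sum(w * f for w, f in zip(flat_weights, features)))
        output.append(output_row)
    return output


# A 3x3 feature map holding 1..9, stored from base address 2.
memory = [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
values = intermediate_layer(memory, base=2, width=3, height=3, weights=[[1, 0], [0, 1]])
```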
SYSTEM AND METHOD FOR ADDRESSING DATA IN MEMORY
A digital signal processor having a CPU with a program counter register and, optionally, an event context stack pointer register for saving and restoring the event handler context when a higher priority event preempts a lower priority event handler. The CPU is configured to use a minimized set of addressing modes that includes using the event context stack pointer register and program counter register to compute an address for storing data in memory. The CPU may also eliminate pre-decrement, pre-increment and post-decrement addressing and rely only on post-increment addressing.
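A toy model of a reduced addressing-mode set might look like this. The class, the register names, and the word-addressed memory are hypothetical. The sketch only shows that post-increment and PC-relative addressing can be the sole built-in modes, with any other pointer adjustment done by an explicit add beforehand.

```python
class Cpu:
    """Toy CPU with only post-increment and PC-relative addressing."""

    def __init__(self, memory):
        self.mem = memory            # word-addressed memory (assumed)
        self.regs = {"ecsp": 0}      # event context stack pointer (hypothetical name)
        self.pc = 0                  # program counter

    def load_post_inc(self, reg, step=1):
        """Use the register as the address, then advance it by `step`."""
        value = self.mem[self.regs[reg]]
        self.regs[reg] += step
        return value

    def store_post_inc(self, reg, value, step=1):
        """Store at the register's address, then advance it by `step`."""
        self.mem[self.regs[reg]] = value
        self.regs[reg] += step

    def load_pc_relative(self, offset):
        """Address computed from the program counter plus an offset."""
        return self.mem[self.pc + offset]


cpu = Cpu([10, 20, 30, 40, 0, 0])
cpu.regs["a0"] = 1
first = cpu.load_post_inc("a0")        # reads mem[1], a0 becomes 2
cpu.regs["ecsp"] = 4
cpu.store_post_inc("ecsp", 77)         # saves context word at mem[4]
cpu.pc = 1
literal = cpu.load_pc_relative(2)      # reads mem[3]
```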
Delivering immediate values by using program counter (PC)-relative load instructions to fetch literal data in processor-based devices
Delivering immediate values by using program counter (PC)-relative load instructions to fetch literal data in processor-based devices is disclosed. In this regard, a processing element (PE) of a processor-based device provides an execution pipeline circuit that comprises an instruction processing portion and a data access portion. Using a literal data access logic circuit, the PE detects a PC-relative load instruction within a fetch window that includes multiple fetched instructions. The PE determines that the PC-relative load instruction can be serviced using literal data that is available to the instruction processing portion of the execution pipeline circuit (e.g., located within the fetch window containing the PC-relative load instruction, or stored in a literal pool buffer). The PE then retrieves the literal data within the instruction processing portion of the execution pipeline circuit, and executes the PC-relative load instruction using the literal data.
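The decision the literal data access logic makes can be sketched as a three-way lookup. The 4-byte word size, the list representation of the fetch window, the dictionary used as a literal pool buffer, and the function name are all assumptions for illustration.

```python
def service_pc_relative_load(pc, offset, window_base, fetch_window, literal_pool):
    """Return (value, source) for a PC-relative load.

    fetch_window: list of 4-byte words fetched starting at window_base (assumed)
    literal_pool: dict mapping address -> literal, standing in for the buffer
    """
    target = pc + offset
    index, remainder = divmod(target - window_base, 4)
    # Case 1: the literal sits inside the fetch window that holds the load.
    if remainder == 0 and 0 <= index < len(fetch_window):
        return fetch_window[index], "fetch_window"
    # Case 2: the literal was captured earlier in the literal pool buffer.
    if target in literal_pool:
        return literal_pool[target], "literal_pool"
    # Case 3: fall back to the data access portion of the pipeline.
    return None, "data_access"


window = [0xA, 0xB, 0xC, 0xDEADBEEF]
pool = {0x200: 7}
hit = service_pc_relative_load(0x104, 8, 0x100, window, pool)
```

In this example the load at 0x104 targets 0x10C, which is the fourth word of the fetch window, so the literal is available without a data access.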
Fully pipelined hardware operator logic circuit for converting human-readable decimal character sequence floating-point representations to IEEE 754-2008 binary floating-point format representations
A fully pipelined convertToBinaryFromDecimalCharacter hardware operator logic circuit configured to convert one or more human-readable decimal character sequence floating-point representations to IEEE 754-2008 binary floating-point representations every clock cycle. The circuit converts decimal character sequence floating-point representations up to 28 decimal digits in length to IEEE 754 binary64, binary32, or binary16 floating-point format representations.
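For reference, the bit patterns such a converter targets can be inspected in software with Python's `struct` module. The helper below is a hypothetical illustration, not the circuit's algorithm. It converts through binary64 first, so for binary32 and binary16 it may round twice in rare cases, which a direct converter would avoid.

```python
import struct

# struct format codes: 'e' = binary16, 'f' = binary32, 'd' = binary64
_CODES = {
    "binary16": (">e", ">H"),
    "binary32": (">f", ">I"),
    "binary64": (">d", ">Q"),
}


def decimal_to_bits(text: str, fmt: str) -> int:
    """Return the IEEE 754 bit pattern for a decimal character sequence."""
    pack_code, unpack_code = _CODES[fmt]
    packed = struct.pack(pack_code, float(text))  # goes through binary64 first
    return struct.unpack(unpack_code, packed)[0]


half = decimal_to_bits("1.5", "binary16")
single = decimal_to_bits("1.5", "binary32")
double = decimal_to_bits("1.5", "binary64")
```

For the input "1.5" the three patterns are 0x3E00, 0x3FC00000 and 0x3FF8000000000000.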