Patent classifications
G06F9/30098
ADVANCED PROCESSOR ARCHITECTURE
The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises: 1) looking up operand sources in a Register Positioning Table and setting operand input references of the instruction to be issued accordingly; 2) checking for an Execution Unit (EXU) available for receiving a new instruction; and 3) issuing the instruction to the available Execution Unit and enter a reference of the result register addressed by the instruction to be issued to the Execution Unit into the Register Positioning Table (RPT).
TECHNIQUES FOR METADATA PROCESSING
Techniques are described for metadata processing that can be used to encode an arbitrary number of security policies for code running on a processor. Metadata may be added to every word in the system and a metadata processing unit may be used that works in parallel with data flow to enforce an arbitrary set of policies. In one aspect, the metadata may be characterized as unbounded and software programmable to be applicable to a wide range of metadata processing policies. Techniques and policies have a wide range of uses including, for example, safety, security, and synchronization. Additionally, described are aspects and techniques in connection with metadata processing in an embodiment based on the RISC-V architecture.
Method and tensor traversal engine for strided memory access during execution of neural networks
A tensor traversal engine in a processor system comprising a source memory component and a destination memory component, the tensor traversal engine comprising: a control signal register storing a control signal for a strided data transfer operation from the source memory component to the destination memory component, the control signal comprising an initial source address, an initial destination address, a first source stride length in a first dimension, and a first source stride count in the first dimension; a source address register communicatively coupled to the control signal register; a destination address register communicatively coupled to the control signal register; a first source stride counter communicatively coupled to the control signal register; and control logic communicatively coupled to the control signal register, the source address register, and the first source stride counter.
METHOD AND APPARATUS TO SORT A VECTOR FOR A BITONIC SORTING ALGORITHM
A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.
Accumulating data values and storing in first and second storage devices
Herein described is a method of operating an accumulation process in a data processing apparatus. The accumulation process comprises a plurality of accumulations which output a respective plurality of accumulated values, each based on a stored value and a computed value generated by a data processing operation. The method comprises storing a first accumulated value, the first accumulated value being one of said plurality of accumulated values, into a first storage device comprising a plurality of single-bit storage elements; determining that a predetermined trigger has been satisfied with respect to the accumulation process; and in response to the determining, storing at least a portion of a second accumulated value, the second accumulated value being one of said plurality of accumulated values, into a second storage device.
Apparatuses and methods for approximating nonlinear function
The present disclosure relates to a method and an apparatus for approximating non-linear function. In some embodiments, an exemplary processing unit includes: one or more registers for storing a lookup table (LUT) and one or more operation elements communicatively coupled with the one or more registers. The LUT includes a control state and a plurality of data entries. The one or more operation elements are configured to: receive an input operand; select one or more bits from the input operand; select a data entry from the plurality of data entries using the one or more bits; and determine an approximation value of a non-linear activation function for the input operand using the data entry.
Moving entries between multiple levels of a branch predictor based on a performance loss resulting from fewer than a pre-set number of instructions being stored in an instruction cache register
An instruction processing device and an instruction processing method are provided. The instruction processing device includes: a first-level branch target buffer, configured to store entries of a first plurality of branch instructions; a second-level branch target buffer, configured to store entries of a second plurality of branch instructions, wherein the entries in the first-level branch target buffer are accessed faster than the entries in the second-level branch target buffer; an instruction fetch unit coupled to the first-level branch target buffer and the second-level branch target buffer, the instruction fetch unit including circuitry configured to add, for a first branch instruction, one or more entries corresponding to the first branch instruction into the first-level branch target buffer when the one or more entries corresponding to the first branch instruction are identified in the second-level branch target buffer; and an execution unit including circuitry configured to execute the first branch instruction.
ISA accessible physical unclonable function
Techniques for encrypting data using a key generated by a physical unclonable function (PUF) are described. An apparatus according to the present disclosure may include decoder circuitry to decode an instruction and generate a decoded instruction. The decoded instruction includes operands and an opcode. The opcode indicates that execution circuitry is to encrypt data using a key generated by a PUF. The apparatus may further include execution circuitry to execute the decoded instruction according to the opcode to encrypt the data to generate encrypted data using the key generated by the PUF.
FUNCTION EXECUTION IN SYSTEM MANAGEMENT MODES
In some examples, executable code causes a processor to execute a first function and enter a system management mode responsive to receipt of a first system management interrupt during the execution of the first function. The executable code causes the processor to pause execution of the first function, save a state of the processor in the system management mode, and exit the system management mode. The executable code causes the processor to execute a second function identified by the first system management interrupt and receive a second system management interrupt to enter the system management mode. The executable code causes the processor to restore the state of the processor in the system management mode and exit the system management mode. The executable code causes the processor to resume execution of the first function after exiting the system management mode.
COMMANDED JTAG TEST ACCESS PORT OPERATIONS
The disclosure describes a novel method and apparatus for improving the operation of a TAP architecture in a device through the use of Command signal inputs to the TAP architecture. In response to a Command signal input, the TAP architecture can perform streamlined and uninterrupted Update, Capture and Shift operation cycles to a target circuit in the device or streamlined and uninterrupted capture and shift operation cycles to a target circuit in the device. The Command signals can be input to the TAP architecture via the devices dedicated TMS or TDI inputs or via a separate CMD input to the device.