Patent classifications
G06F9/38
Instruction packing scheme for VLIW CPU architecture
A processor is provided and includes a core that is configured to perform a decode operation on a multi-instruction packet comprising multiple instructions. The decode operation includes receiving the multi-instruction packet that includes first and second instructions. The first instruction includes a primary portion at a fixed first location and a secondary portion. The second instruction includes a primary portion at a fixed second location between the primary portion of the first instruction and the secondary portion of the first instruction. An operational code portion of the primary portion of each of the first and second instructions is accessed and decoded. An instruction packet including the primary and secondary portions of the first instruction is created, and a second instruction packet including the primary portion of the second instruction is created. The first and second instructions packets are dispatched to respective first and second functional units.
Apparatus and method for inhibiting instruction manipulation
An apparatus and method are provided for inhibiting instruction manipulation. The apparatus has execution circuitry for performing data processing operations in response to a sequence of instructions from an instruction set, and decoder circuitry for decoding each instruction in the sequence in order to generate control signals for the execution circuitry. Each instruction comprises a plurality of instruction bits, and the decoder circuitry is arranged to perform a decode operation on each instruction to determine from the value of each instruction bit, and knowledge of the instruction set, the control signals to be issued to the execution circuitry in response to that instruction. An input path to the decoder circuitry comprises a set of wires over which the instruction bits of each instruction are provided. Scrambling circuitry is used to perform a scrambling function on each instruction using a secret scrambling key, such that the wire within the set of wires over which any given instruction bit is provided to the decoder circuitry is dependent on the secret scrambling key. The decode operation performed by the decoder circuitry is then adapted to incorporate a descrambling function using the secret scrambling key to reverse the effect of the scrambling function. As a result, independent of which wire any given instruction bit is provided on, the decode operation is arranged when decoding a given instruction to correctly interpret each instruction bit of that given instruction, based on knowledge of the instruction set, in order to determine from the value of each instruction bit the control signals to be issued to the execution circuitry in response to that given instruction.
Offloading for gradient computation
Techniques for selectively offloading data that is computed by a first processing unit during training of an artificial neural network onto memory associated with a second processing unit and transferring the data back to the first processing unit when the data is needed for further processing are described herein. For example, the first processing unit may compute activations for operations associated with forward propagation. During the forward propagation, one or more of the activations may be transferred to a second processing unit for storage. Then, during backpropagation for the artificial neural network, the activations may be transferred back to the first processing unit as needed to compute gradients.
TECHNIQUES FOR HYBRID COMPUTER THREAD CREATION AND MANAGEMENT
A technique for operating a computer system to support an application, a first application server environment, and a second application server environment includes intercepting a work request relating to the application issued to the first application server environment prior to execution of the work request. A thread adapted for execution in the first application server environment is created. A context is attached to the thread that non-disruptively modifies the thread into a hybrid thread that is additionally suitable for execution in the second application server environment. The hybrid thread is returned to the first application server environment.
OPERATION OF A DUAL INSTRUCTION PIPE VIRUS CO-PROCESSOR
Circuits and methods are provided for detecting, identifying and/or removing undesired content. According to one embodiment, a method for performing content scanning of content objects is provided. A content object that is to be scanned is stored by a general purpose processor to a system memory of the general purpose processor. Content scanning parameters associated with the content object are set up by the general purpose processor. Instructions from a signature memory of a co-processor that is coupled to the general purpose processor are read by the co-processor based on the content scanning parameters. The instructions contain op-codes of a first instruction type and op-codes of a second instruction type. Those of the instructions containing op-codes of the first instruction type are assigned by the co-processor to a first instruction pipe of multiple instruction pipes of the co-processor for execution. An instruction of the assigned instructions containing op-codes of the first instruction type is executed by the first instruction pipe including accessing a portion of the content object from the system memory.
INDIRECT BRANCH PREDICTION
Methods and indirect branch predictor logic units to predict the target addresses of indirect branch instructions. The method comprises storing in a table predicted target addresses for indirect branch instructions indexed by a combination of the indirect path history for previous indirect branch instructions and the taken/not-taken history for previous conditional branch instructions. When a new indirect branch instruction is received for prediction, the indirect path history and the taken/not-taken history are combined to generate an index for the indirect branch instruction. The generated index is then used to identify a predicted target address in the table. If the identified predicted target address is valid, then the target address of the indirect branch instruction is predicted to be the predicted target address.
APPARATUS AND METHOD FOR PROPAGATING CONDITIONALLY EVALUATED VALUES IN SIMD/VECTOR EXECUTION USING AN INPUT MASK REGISTER
An apparatus and method for propagating conditionally evaluated values are disclosed. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.
ENGINE ARCHITECTURE FOR PROCESSING FINITE AUTOMATA
An engine architecture for processing finite automata includes a hyper non-deterministic automata (HNA) processor specialized for non-deterministic finite automata (NFA) processing. The HNA processor includes a plurality of super-clusters and an HNA scheduler. Each super-cluster includes a plurality of clusters. Each cluster of the plurality of clusters includes a plurality of HNA processing units (HPUs). A corresponding plurality of HPUs of a corresponding plurality of clusters of at least one selected super-cluster is available as a resource pool of HPUs to the HNA scheduler for assignment of at least one HNA instruction to enable acceleration of a match of at least one regular expression pattern in an input stream received from a network.
ADVANCED PROCESSOR ARCHITECTURE
The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises: 1) looking up operand sources in a Register Positioning Table and setting operand input references of the instruction to be issued accordingly; 2) checking for an Execution Unit (EXU) available for receiving a new instruction; and 3) issuing the instruction to the available Execution Unit and enter a reference of the result register addressed by the instruction to be issued to the Execution Unit into the Register Positioning Table (RPT).
SEQUENTIAL MONITORING AND MANAGEMENT OF CODE SEGMENTS FOR RUN-TIME PARALLELIZATION
A processor includes an instruction pipeline and control circuitry. The instruction pipeline is configured to process instructions of program code. The control circuitry is configured to monitor the processed instructions at run-time, to construct an invocation data structure comprising multiple entries, wherein each entry (i) specifies an initial instruction that is a target of a branch instruction, (ii) specifies a portion of the program code that follows one or more possible flow-control traces beginning from the initial instruction, and (iii) specifies, for each possible flow-control trace specified in the entry, a next entry that is to be processed following processing of that possible flow-control trace, and to configure the instruction pipeline to process segments of the program code, by continually traversing the entries of the invocation data structure.