Patent classifications
G06F9/30192
HARDWARE SUPPORT FOR DYNAMIC DATA TYPES AND OPERATORS
A decoder circuit may be configured to receive an instruction which includes a plurality of data bits and decode a first subset of the plurality of data bits. A transcode circuit may be configured to determine if the received instruction is to be modified and, in response to a determination that the received instruction is to be modified, modify a second subset of the plurality of data bits.
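The decode/transcode split described above can be sketched in software as follows. This is only an illustration: the 16-bit instruction width, the field positions, and the `DYNAMIC_FLAG` bit used to decide whether an instruction is modified are all hypothetical, since the abstract does not specify them.

```python
# Hypothetical 16-bit layout (not taken from the abstract):
OPCODE_MASK = 0xFF00   # "first subset": bits decoded by the decoder
TYPE_MASK = 0x000F     # "second subset": bits the transcoder may rewrite
DYNAMIC_FLAG = 0x0080  # assumed marker meaning "this instruction is to be modified"


def decode(instr: int) -> int:
    """Decode the first subset of bits (here, the opcode)."""
    return (instr & OPCODE_MASK) >> 8


def transcode(instr: int, current_type: int) -> int:
    """Rewrite the second subset of bits if the instruction is flagged."""
    if instr & DYNAMIC_FLAG:
        return (instr & ~TYPE_MASK) | (current_type & TYPE_MASK)
    return instr


patched = transcode(0x1280, current_type=0x5)
print(hex(patched), hex(decode(patched)))
```

Unflagged instructions pass through unchanged, so the decoder sees the same opcode bits either way.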
Apparatus and method for propagating conditionally evaluated values in SIMD/vector execution using an input mask register
An apparatus and method for propagating conditionally evaluated values are disclosed. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.
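A minimal software sketch of the method above, assuming the mask is given as a list of booleans indexed by bit position. The abstract does not say what is stored for false values that precede the first true value, so the `leading_fill` parameter is a hypothetical placeholder for that case.

```python
def propagate_positions(mask, leading_fill=None):
    """Return the output register contents for an input mask.

    True at position p           -> p
    False after a true value     -> (position of last true value) + vector length
    False before any true value  -> leading_fill (assumption, not in the abstract)
    """
    vlen = len(mask)
    out = []
    last_true = None
    for pos, bit in enumerate(mask):
        if bit:
            last_true = pos
            out.append(pos)
        elif last_true is not None:
            out.append(last_true + vlen)
        else:
            out.append(leading_fill)
    return out


print(propagate_positions([False, True, False, False, True, False, True, False]))
# → [None, 1, 9, 9, 4, 12, 6, 14]
```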
System and methods for expandably wide processor instructions
Expandably wide operations are disclosed in which operands wider than the data path between a processor and memory are used in executing instructions. The expandably wide operands reduce the influence of the characteristics of the associated processor in the design of functional units performing calculations, including the width of the register file, the processor clock rate, the exception subsystem of the processor, and the sequence of operations in loading and use of the operand in a wide cache memory.
PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO PARTITION A SOURCE PACKED DATA INTO LANES
A processor includes a decode unit to decode an instruction that is to indicate a source packed data that is to include a plurality of adjoining data elements, a number of data elements, and a destination. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the instruction, is to store a result packed data in the destination. The result packed data is to have a plurality of lanes that are each to store a different non-overlapping set of the indicated number of adjoining data elements aligned with a least significant end of the respective lane. The different non-overlapping sets of the indicated number of the adjoining data elements in adjoining lanes of the result packed data are to be separated from one another by at least one most significant data element position of the less significant lane.
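The lane partitioning described above can be sketched on Python lists, with index 0 standing for the least significant element. The function name, the `lane_elems` and `num_lanes` parameters, and the zero fill for the unused upper positions of each lane are assumptions made for illustration; the abstract does not state what those positions hold.

```python
def partition_into_lanes(src, n, lane_elems, num_lanes, fill=0):
    """Place consecutive groups of n source elements at the low end of each lane."""
    if not 0 < n < lane_elems:
        # At least one most significant position per lane must stay unused.
        raise ValueError("n must be positive and smaller than the lane size")
    if len(src) < n * num_lanes:
        raise ValueError("source has too few elements")
    result = []
    for lane in range(num_lanes):
        group = list(src[lane * n:(lane + 1) * n])
        result.extend(group + [fill] * (lane_elems - n))
    return result


print(partition_into_lanes([1, 2, 3, 4, 5, 6], n=3, lane_elems=4, num_lanes=2))
# → [1, 2, 3, 0, 4, 5, 6, 0]
```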
Post-silicon configurable instruction behavior based on input operands
A system and method for controlling post-silicon configurable instruction behavior are provided. For example, the method includes receiving data related to a compute circuit. The method also includes detecting a data pattern in the data. The method further includes determining that the data pattern is a special case that the compute circuit may handle improperly. The method also includes selecting a value from a post-silicon configurable data set based on the detected data. Further, the method includes changing a behavior of the compute circuit to produce a different output result based on the value selected from the post-silicon configurable data set.
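As a rough software analogy for the steps above, the sketch below models a divide unit whose special-case result is read from an updatable table. The pattern name, the `config` dictionary standing in for the post-silicon configurable data set, and the choice of integer division as the compute circuit are all hypothetical.

```python
def detect_pattern(a, b):
    """Detect an input pattern the compute unit may handle improperly."""
    if b == 0:
        return "div_by_zero"
    return None


def compute(a, b, config):
    """Integer divide, with special-case behavior selected from config."""
    pattern = detect_pattern(a, b)
    if pattern is not None and pattern in config:
        # Behavior changes according to the configurable value.
        return config[pattern]
    return a // b


config = {"div_by_zero": 0}
print(compute(7, 2, config), compute(7, 0, config))
config["div_by_zero"] = -1  # reconfigured after manufacture in this analogy
print(compute(7, 0, config))
```

Changing the table entry alters the result for the special case without touching the default division path.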
PROCESSING MIXED-SCALAR-VECTOR INSTRUCTIONS
Processing circuitry supports overlapped execution of vector instructions when at least one beat of a first vector instruction is performed in parallel with at least one beat of a second vector instruction. The processing circuitry also supports mixed-scalar-vector instructions for which one of a destination register and one or more source registers is a vector register and another is a scalar register. In a sequence including first and subsequent mixed-scalar-vector instructions, instances of relaxed execution which can potentially lead to uncertain and incorrect results are permitted by the processing circuitry when the instructions are separated by fewer than a predetermined number of intervening instructions. In practice, the situations that lead to uncertain results are very rare, so providing relatively expensive dependency checking circuitry to eliminate such cases is not justified.
Optimizing data processing using dynamic schemas
A computer system accesses feed data belonging to a particular feed, and executes a dynamic server statement to create a relational dataset with data type fields from the feed data in an in-memory table of the server. The dynamic server statement is stored within metadata associated with the feed. The computer system also applies a second dynamic server statement to the data feed which applies one or more data processing conditions indicated in the metadata. The second dynamic server statement is also stored within the metadata associated with the feed. Upon determining that one or more feed data rows match the data processing conditions, the computer system places feed data row information about the matching data rows into an alert table that includes references to a regional blob table with blob data, thereby triggering an alert.
Enhanced Macroscalar predicate operations
Systems, apparatuses, and methods are disclosed for utilizing enhanced Macroscalar predicate operations, which take enhanced predicate operands that designate the element width and which elements are to be processed. The element width and the number of elements per vector are determined at run-time rather than being defined in the architectural definition of the instruction. This enables additional parallelism when processing smaller-sized data. The instruction performs the requested operation on the elements specified by the enhanced control predicate, assuming an element-width also specified by the enhanced control predicate, and returns the result as an enhanced predicate of the same element width.
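One way to picture a predicate that carries its own element width is the sketch below. The 128-bit vector size, the `EnhancedPredicate` class, the greater-than comparison chosen as the "requested operation", and the convention that inactive elements yield false are all assumptions for illustration, not details from the abstract.

```python
from dataclasses import dataclass

VECTOR_BITS = 128  # hypothetical vector register size


@dataclass(frozen=True)
class EnhancedPredicate:
    width_bits: int  # element width, chosen at run time
    active: tuple    # one flag per element

    def __post_init__(self):
        if self.width_bits <= 0 or VECTOR_BITS % self.width_bits:
            raise ValueError("element width must divide the vector size")
        if len(self.active) != VECTOR_BITS // self.width_bits:
            raise ValueError("one flag per element is required")


def compare_gt(control, xs, ys):
    """Compare only active elements; return a predicate of the same width."""
    n = VECTOR_BITS // control.width_bits
    if len(xs) != n or len(ys) != n:
        raise ValueError("operand length must match the element count")
    flags = tuple(a and x > y for a, x, y in zip(control.active, xs, ys))
    return EnhancedPredicate(control.width_bits, flags)


control = EnhancedPredicate(32, (True, True, False, True))
print(compare_gt(control, [5, 1, 9, 7], [3, 2, 1, 7]))
```

With a 64-bit width the same function processes two elements per vector, and with an 8-bit width it processes sixteen, which is the run-time flexibility the abstract refers to.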
Method and Computing System for Handling Instruction Execution Using Affine Register File on Graphic Processing Unit
The present invention provides an affine engine design for the microarchitecture of a graphics processing unit (GPU), in which operand type detection is performed, and physical scalar, affine, or vector registers and corresponding ALUs are then allocated for instruction execution so as to improve performance and save energy. At run time, affine and uniform instructions are executed by the affine engine, while general vector instructions are executed by a vector engine. Because affine/uniform instruction execution can be dispatched to the affine engine, the vector engine can enter a power-saving state, reducing the energy consumption of the GPU.
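Operand type detection of this kind can be sketched as a classifier over per-lane values. The definitions used here are assumptions, since the abstract does not define the terms: an operand is treated as uniform (scalar) when every lane holds the same value, affine when lane i holds base + i * stride for a nonzero stride, and vector otherwise.

```python
def classify_operand(lane_values):
    """Classify per-lane values as 'scalar', 'affine', or 'vector'."""
    if len(lane_values) < 2:
        return "scalar"
    stride = lane_values[1] - lane_values[0]
    constant_stride = all(
        lane_values[i + 1] - lane_values[i] == stride
        for i in range(len(lane_values) - 1)
    )
    if not constant_stride:
        return "vector"
    return "scalar" if stride == 0 else "affine"


print(classify_operand([7, 7, 7, 7]))     # scalar (uniform)
print(classify_operand([16, 20, 24, 28])) # affine: base 16, stride 4
print(classify_operand([3, 1, 4, 1]))     # vector
```

Under these assumptions, a scalar or affine operand can be stored as one or two numbers instead of a full vector, which is where the register and energy savings would come from.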
Vector friendly instruction format and execution thereof
- Robert C. Valentine
- Jesus Corbal San Adrian
- Roger Espasa Sans
- Robert D. Cavin
- Bret L. Toll
- Santiago Galan Duran
- Jeffrey G. Wiedemeier
- Sridhar Samudrala
- Milind Baburao Girkar
- Edward Thomas Grochowski
- Jonathan Cannon Hall
- Dennis R. Bradford
- Elmoustapha Ould-Ahmed-Vall
- James C Abel
- Mark Charney
- Seth Abraham
- Suleyman Sair
- Andrew Thomas Forsyth
- Lisa Wu
- Charles Yount
A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.