Patent classifications
G06F9/30105
LOGIC CIRCUITRY
A logic circuitry package for a replaceable print apparatus component comprises an interface to communicate with a print apparatus logic circuit, and at least one logic circuit. The logic circuit may be configured to identify, from a command stream received from the print apparatus, parameters including a class parameter, and/or identify, from the command stream, a read request, and output, via the interface, a count value in response to a read request, the count value based on identified received parameters.
CONTROLLING THE NUMBER OF POWERED VECTOR LANES VIA A REGISTER FIELD
The vector data path is divided into smaller vector lanes. A register, such as a memory-mapped control register, stores a vector lane number (VLX) indicating the number of vector lanes to be powered. A decoder converts this VLX into a vector lane control word, each bit controlling the ON or OFF state of the corresponding vector lane. The indicated number of contiguous least significant vector lanes is powered. In the preferred embodiment, the stored VLX indicates that 2^VLX contiguous least significant vector lanes are to be powered. Thus the number of vector lanes powered is limited to an integral power of 2. This manner of coding produces a very compact controlling bit field while obtaining substantially all the power saving advantage of individually controlling the power of all vector lanes.
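The decoding step of the preferred embodiment can be illustrated with a minimal sketch. The function name `decode_vlx`, the default of eight lanes, and the convention that bit 0 controls the least significant lane are assumptions made for illustration; the abstract does not specify them.

```python
def decode_vlx(vlx: int, total_lanes: int = 8) -> int:
    """Return a lane control word with the 2**vlx least significant bits set.

    Bit i of the result is 1 when lane i is powered (assumed convention).
    """
    if vlx < 0:
        raise ValueError("VLX must be non-negative")
    powered = 1 << vlx  # number of lanes to power: 2**VLX
    if powered > total_lanes:
        raise ValueError("VLX requests more lanes than exist")
    return (1 << powered) - 1  # contiguous least significant ones


if __name__ == "__main__":
    for vlx in range(4):
        print(vlx, format(decode_vlx(vlx), "08b"))
```

With eight lanes, a two-bit VLX field already covers 1, 2, 4, and 8 powered lanes, which is the compactness the abstract refers to.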
Apparatus and method for complex by complex conjugate multiplication
An apparatus and method for multiplying packed real and imaginary components of complex numbers are described. A processor embodiment includes: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction. The execution circuitry includes: multiplier circuitry to select real and imaginary data elements in the first source register and the second source register, multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products; adder circuitry to add a first subset of the plurality of imaginary products and subtract a second subset of the plurality of imaginary products to generate a first temporary result, and to add a third subset of the plurality of imaginary products and subtract a fourth subset of the plurality of imaginary products to generate a second temporary result; and accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result, combine the second temporary result with second data from the destination register to generate a second final result, and store the first final result and second final result back in the destination register.
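The arithmetic behind the imaginary products can be sketched as follows. For a = ar + i·ai and b = br + i·bi, the imaginary part of a·conj(b) is ai·br − ar·bi, which matches the add and subtract pattern described above. The sketch assumes an interleaved [re, im, re, im, ...] layout and one accumulator per complex pair; the abstract does not state how products are grouped into subsets, so this per-pair grouping and the name `conj_mul_imag_accumulate` are illustrative assumptions.

```python
def conj_mul_imag_accumulate(src1, src2, dest):
    """Accumulate Im(a * conj(b)) for each packed complex pair.

    src1, src2: interleaved lists [re0, im0, re1, im1, ...] (assumed layout).
    dest: one accumulator value per complex pair.
    Returns the updated accumulator values as a new list.
    """
    if len(src1) != len(src2) or len(src1) != 2 * len(dest):
        raise ValueError("operand lengths do not match")
    out = []
    for k, acc in enumerate(dest):
        ar, ai = src1[2 * k], src1[2 * k + 1]
        br, bi = src2[2 * k], src2[2 * k + 1]
        # Imaginary products: ai*br is added, ar*bi is subtracted.
        temp = ai * br - ar * bi
        out.append(acc + temp)  # combine with the destination data
    return out


if __name__ == "__main__":
    print(conj_mul_imag_accumulate([1, 2, 3, 4], [5, 6, 7, 8], [10, 20]))
```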
Systems, Apparatuses, And Methods For Fused Multiply Add
Embodiments of systems, apparatuses, and methods for fused multiply add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operands are of a first size that differs from a second size of packed data elements of the third packed data source operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, an addition of the results of these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position of the destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element size divided by N.
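A minimal sketch of the per-position computation follows, assuming for example a 32-bit full-sized element and N = 8, so that M = 4. The function name `packed_fma` and the use of plain Python integers (with no overflow or saturation behavior modeled) are assumptions for illustration only.

```python
def packed_fma(src1, src2, src3, m):
    """For each element of src3, add the sum of m products from src1 and src2.

    src1, src2: lists of narrow (N-sized) elements, m per position of src3.
    src3: list of full-sized elements; m = full size / N.
    Returns the list of destination elements.
    """
    if len(src1) != len(src2) or len(src1) != m * len(src3):
        raise ValueError("operand lengths do not match")
    result = []
    for pos, acc in enumerate(src3):
        base = pos * m
        # Multiply the m narrow element pairs that map to this position.
        products = sum(src1[base + j] * src2[base + j] for j in range(m))
        result.append(acc + products)
    return result


if __name__ == "__main__":
    print(packed_fma([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, [100, 200], m=4))
```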
Gather-op instruction to duplicate a mask and perform an operation on vector elements gathered via tracked offset-based gathering
Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying a gather and a second operation, a destination register, an operand register, and a memory address, execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates that the element has not been gathered from memory, and a second value indicates that the element does not need to be gathered or has already been gathered. For each field having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
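The mask-tracked gather followed by the second operation can be sketched as below. The encoding of the first and second mask values as 1 and 0, the list-based model of memory and registers, and the function name `gather_then_op` are assumptions made for illustration; the sketch works on a copy of the mask so that the caller's mask is left unchanged.

```python
NOT_GATHERED = 1  # assumed encoding of the "first value"
DONE = 0          # assumed encoding of the "second value"


def gather_then_op(memory, base, indices, mask, dest, operand, op):
    """Gather masked elements into dest, then apply op element-wise.

    Returns (results, final_mask). Inputs are not modified.
    """
    mask = list(mask)  # work on a duplicate of the mask
    dest = list(dest)
    for i, field in enumerate(mask):
        if field == NOT_GATHERED:
            dest[i] = memory[base + indices[i]]  # offset-based gather
            mask[i] = DONE                       # record progress
    # Every field now holds the second value, so the second op may run.
    results = [op(d, o) for d, o in zip(dest, operand)]
    return results, mask


if __name__ == "__main__":
    mem = list(range(100, 120))
    print(gather_then_op(mem, 2, [0, 3, 5, 7], [1, 0, 1, 1],
                         [0, -1, 0, 0], [1, 1, 1, 1], lambda a, b: a + b))
```

Because progress is recorded per field, a gather interrupted partway (for example by a fault) could resume without re-reading elements whose mask field already holds the second value.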
Pipelined cascaded digital signal processing structures and methods
Circuitry operating under a floating-point mode or a fixed-point mode includes a first circuit accepting a first data input and generating a first data output. The first circuit includes a first arithmetic element accepting the first data input, a plurality of pipeline registers disposed in connection with the first arithmetic element, and a cascade register that outputs the first data output. The circuitry further includes a second circuit accepting a second data input and generating a second data output. The second circuit is cascaded to the first circuit such that the first data output is connected to the second data input via the cascade register. The cascade register is selectively bypassed when the first circuit is operated under the fixed-point mode.
System and Method for Contextual Vectorization of Instructions at Runtime
Methods and apparatuses relating to processors that contextually optimize instructions at runtime are disclosed. In one embodiment, a processor includes a fetch circuit to fetch an instruction from an instruction storage, a format of the instruction including an opcode, a first source operand identifier, and a second source operand identifier, wherein the instruction storage includes a sequence of sub-optimal instructions preceded by a start-of-sequence instruction and followed by an end-of-sequence instruction. The disclosed processor further includes a decode circuit to decode the instruction, to detect the start-of-sequence instruction and the end-of-sequence instruction, to buffer the sequence of sub-optimal instructions therebetween, to access a lookup table to identify one or more optimized instructions to substitute for one or more of the sequence of sub-optimal instructions, and to select either the decoded instruction or the sequence of one or more optimized instructions to dispatch to an execution circuit.
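The buffer-and-substitute behavior of the decode circuit can be modeled in software as a rough sketch. The marker names `SEQ_START` and `SEQ_END`, the string representation of instructions, and the dictionary used as the lookup table are all hypothetical; the abstract does not define the encoding of any of them.

```python
def dispatch_stream(instructions, lookup):
    """Return the instruction list that would be dispatched to execution.

    Instructions between SEQ_START and SEQ_END (hypothetical markers) are
    buffered and replaced when the lookup table has a matching entry;
    otherwise the buffered instructions are dispatched unchanged.
    """
    dispatched = []
    buffer = None  # None means "not inside a marked sequence"
    for ins in instructions:
        if ins == "SEQ_START":
            buffer = []
        elif ins == "SEQ_END" and buffer is not None:
            dispatched.extend(lookup.get(tuple(buffer), buffer))
            buffer = None
        elif buffer is not None:
            buffer.append(ins)
        else:
            dispatched.append(ins)
    if buffer is not None:  # unterminated sequence: dispatch as-is
        dispatched.extend(buffer)
    return dispatched


if __name__ == "__main__":
    table = {("add a0", "add a1", "add a2", "add a3"): ["vadd a0-a3"]}
    stream = ["mov", "SEQ_START", "add a0", "add a1", "add a2", "add a3",
              "SEQ_END", "ret"]
    print(dispatch_stream(stream, table))
```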
Register files for storing data operated on by instructions of multiple widths
A processor core includes even and odd execution slices each having a register file. The slices are each configured to perform operations specified in a first set of instructions on data from its respective register file, and together configured to perform operations specified in a second set of instructions on data stored across both register files. During utilization, the processor receives a first instruction of the first set specifying an operation, a target register, and a source register. Next, a second instruction upon which content of the source register depends is identified as being of the second set. In response, the first instruction is dispatched to the even slice. In accordance with the operation specified in the first instruction, the even slice uses content of the source register in its register file to produce a result. Copies of the result are written to the target register in both register files.
Logic circuitry package accessible for a time period duration while disregarding inter-integrated circuitry traffic
In an example, a logic circuitry package has a first address and comprises a first logic circuit. In some examples, the first address is an I2C address for the first logic circuit, and the package is configured such that, in response to a first command indicative of a task and a first time period sent to the first address, the first logic circuit is to, for a duration of the time period, perform a task, and disregard I2C traffic sent to the first address.
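The timed disregard behavior can be sketched as a small state model. The class name `LogicCircuit`, the dictionary command format, and the explicit `now` timestamp passed to `receive` are assumptions for illustration; real I2C signaling and the nature of the task itself are not modeled.

```python
class LogicCircuit:
    """Model of a circuit that ignores its address while a timed task runs."""

    def __init__(self, address):
        self.address = address
        self.busy_until = 0.0  # time before which traffic is disregarded
        self.handled = []      # commands that were actually acted on

    def receive(self, address, command, now):
        """Return True if the command was handled, False if disregarded."""
        if address != self.address:
            return False
        if now < self.busy_until:
            return False  # within the time period: disregard traffic
        if command.get("type") == "timed_task":
            # The first command carries the task and the time period.
            self.busy_until = now + command["duration"]
        self.handled.append(command)
        return True


if __name__ == "__main__":
    circuit = LogicCircuit(0x30)
    print(circuit.receive(0x30, {"type": "timed_task", "duration": 5.0}, 0.0))
    print(circuit.receive(0x30, {"type": "read"}, 2.0))
    print(circuit.receive(0x30, {"type": "read"}, 5.0))
```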