G06F7/38

Processing with compact arithmetic processing element
09792088 · 2017-10-17 · ·

A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).

Loading values from a value vector into subregisters of a single instruction multiple data register

A method and apparatus for efficiently processing data in various formats in a single instruction multiple data (“SIMD”) architecture is presented. Specifically, a method to unpack a fixed-width bit values in a bit stream to a fixed width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte packed values in a byte stream in a SIMD architecture is presented. A method to decompress a run length encoded compressed bit-vector in a SIMD architecture is presented. A method to return the offset of each bit set to one in a bit-vector in a SIMD architecture is presented. A method to fetch bits from a bit-vector at specified offsets relative to a base in a SIMD architecture is presented. A method to compare values stored in two SIMD registers is presented.

Anticipated prefetching for a parent core in a multi-core chip

Embodiments relate to prefetching data on a chip having a scout core and a parent core coupled to the scout core. The method includes determining that a program executed by the parent core requires content stored in a location remote from the parent core. The method includes sending a fetch table address determined by the parent core to the scout core. The method includes accessing a fetch table that is indicated by the fetch table address by the scout core. The fetch table indicates how many of pieces of content are to be fetched by the scout core and a location of the pieces of content. The method includes based on the fetch table indicating, fetching the pieces of content by the scout core. The method includes returning the fetched pieces of content to the parent core.

Method and circuit for integrating a programmable matrix in the field of reconfigurable logic gates employing a non-lineal system and an efficient programmable rewiring

The present invention relates to the field of reconfigurable computing also known as dynamic computing and, more particularly, to reconfigurable architectures logic gates and programmable wiring connections between them and the input interfaces and output interfaces. There is growing interest in developing new hardware architectures to complement or replace existing static architectures, and recently, there has been a theoretical direction to explore the richness of nonlinear dynamical systems to implement reconfigurable hardware (dynamic). The present invention is to use a nonlinear to emulate different logic gates dynamic system that are the basis of general-purpose computing, and after obtaining the logic gates, integrate these elements into a programmable device by the user, ie for create a field programmable array of reconfigurable logic gates.

Floating point instruction with selectable comparison attributes

An instruction to perform a comparison of a first value and a second value is executed. Based on a control of the instruction, a compare function to be performed is determined. The compare function is one of a plurality of compare functions configured for the instruction, and the compare function has a plurality of options for comparison. A compare option based on the first value and the second value is selected from the plurality of options defined for the compare function, and used to compare the first value and the second value. A result of the comparison is then placed in a select location, the result to be used in processing within a computing environment.

PROCESSING WITH COMPACT ARITHMETIC PROCESSING ELEMENT
20230168861 · 2023-06-01 ·

A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).

PROCESSING WITH COMPACT ARITHMETIC PROCESSING ELEMENT
20230168861 · 2023-06-01 ·

A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).

Multiplier pipelining optimization with a bit folding correction
09778910 · 2017-10-03 · ·

One embodiment provides a system. The system includes a register to store an operand; a multiplier; and optimizer logic to initiate a square/multiply stage to operate on the operand, initiate a reduction stage prior to completion of the square/multiply stage, and determine whether a carry propagation has occurred.

Techniques for handling high voltage circuitry in an integrated circuit
09755647 · 2017-09-05 · ·

An integrated circuit formed using a semiconductor substrate may include a logic circuit and a switch circuit, whereby the logic circuit operates at a first power supply voltage and the switch circuit operates at a second power supply voltage that is greater than the first power supply voltage. The logic circuit may be formed within a first triple well structure within the semiconductor substrate and is supplied with a first bias voltage. The switch circuit may be formed within a second triple well structure that is electrically isolated from the first triple well structure within the semiconductor substrate and is supplied with a second bias voltage. The switch circuit may receive a control signal that controls the first bias voltage and the second power supply voltage to turn off a transistor in the logic circuit during a programming operation of the integrated circuit.

Method and apparatus for performing a shift and exclusive or operation in a single instruction

Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.