G06F7/38

Managing potentially invalid results during runahead

Embodiments related to managing potentially invalid results generated/obtained by a microprocessor during runahead are provided. In one example, a method for operating a microprocessor includes causing the microprocessor to enter runahead upon detection of a runahead event. The example method also includes, during runahead, determining that an operation associated with an instruction referencing a storage location would produce a potentially invalid result based on a value of an architectural poison bit associated with the storage location and performing a different operation in response.
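
The poison-bit check described above can be illustrated with a minimal sketch. The `Register` class, the `execute_add` helper, and the choice of "different operation" (marking the destination poisoned instead of computing) are assumptions made for illustration; the abstract does not say which alternative operation is performed.

```python
from dataclasses import dataclass


@dataclass
class Register:
    value: int = 0
    poisoned: bool = False  # hypothetical stand-in for the poison bit


def execute_add(regs, dst, src_a, src_b, in_runahead):
    """Add two registers; return True if the sum was actually computed."""
    a, b = regs[src_a], regs[src_b]
    if in_runahead and (a.poisoned or b.poisoned):
        # The result would be potentially invalid, so perform a different
        # operation: here (an assumption) we only poison the destination.
        regs[dst].poisoned = True
        return False
    regs[dst].value = a.value + b.value
    regs[dst].poisoned = False
    return True


regs = {name: Register() for name in ("r1", "r2", "r3")}
regs["r1"].value = 5
regs["r2"].poisoned = True  # e.g. the target of a load that missed
computed = execute_add(regs, "r3", "r1", "r2", in_runahead=True)
```

In this sketch the poison propagates to `r3`, so later instructions that read `r3` during runahead would be skipped in the same way.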

Register files for storing data operated on by instructions of multiple widths

A processor core includes even and odd execution slices, each having a register file. Each slice is configured to perform operations specified in a first set of instructions on data from its respective register file, and together the slices are configured to perform operations specified in a second set of instructions on data stored across both register files. During utilization, the processor receives a first instruction of the first set specifying an operation, a target register, and a source register. Next, a second instruction upon which content of the source register depends is identified as being of the second set. In response, the first instruction is dispatched to the even slice. In accordance with the operation specified in the first instruction, the even slice uses content of the source register in its register file to produce a result. Copies of the result are written to the target register in both register files.

Multiply and accumulate calculation device, neuromorphic device, and method for using multiply and accumulate calculation device
11429348 · 2022-08-30

A multiply and accumulate calculation device includes a multiple calculation unit and an accumulate calculation unit. The multiple calculation unit includes a plurality of multiple calculation elements, which are variable resistance elements, and at least one reference element. The accumulate calculation unit includes an output detector configured to detect a total value of at least the outputs from the plurality of multiple calculation elements. Each of the plurality of multiple calculation elements is a magnetoresistance effect element including a magnetization free layer having a magnetic domain wall, a magnetization fixed layer in which a magnetization direction is fixed, and a nonmagnetic layer sandwiched between the magnetization free layer and the magnetization fixed layer. The reference element is a reference magnetoresistance effect element having a magnetization free layer that does not have the magnetic domain wall.

Detecting and selecting two processing modules to execute code having a set of parallel executable parts

The execution of an executable code by a set of processing modules is provided, wherein the executable code is executed by at least one first processing module of the set of processing modules, wherein said executable code comprises a set of parallel executable parts, wherein each parallel executable part of the executable code comprises at least two parallel executable steps, and wherein said executing comprises: detecting by the at least one first processing module a parallel executable part of the set of parallel executable parts of the executable code to be executed; selecting by the at least one first processing module at least two second processing modules of the set of processing modules; and commanding by the at least one first processing module the selected at least two second processing modules to perform the at least two parallel executable steps of the detected parallel executable part of the executable code.
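
The detect, select, and command steps above can be loosely sketched with a thread pool standing in for the second processing modules. The `run_program` function, the representation of a parallel executable part as a list of callables, and the pool size are all assumptions for illustration; a thread pool does not model how real processing modules are selected.

```python
from concurrent.futures import ThreadPoolExecutor


def run_program(parts, module_count=4):
    """Run serial steps directly and parallel parts on worker modules.

    Each item of `parts` is either a callable (a serial step) or a list of
    at least two callables (a parallel executable part).
    """
    results = []
    with ThreadPoolExecutor(max_workers=module_count) as modules:
        for part in parts:
            if isinstance(part, list):
                # Detected a parallel part: command workers to run its steps.
                futures = [modules.submit(step) for step in part]
                results.append([f.result() for f in futures])
            else:
                # Serial step: the first module executes it itself.
                results.append(part())
    return results


program = [lambda: "setup", [lambda: 1 + 1, lambda: 2 * 3], lambda: "done"]
outcome = run_program(program)
```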

Filter for interpolated signals
09819330 · 2017-11-14

A digital filter for filtering an input signal to form an output signal contains a coefficient multiplier and a moving-average filter. The coefficient multiplier is configured to multiply values of the input signal by coefficients of the filter to form an intermediate signal. The moving-average filter is configured to generate the output signal as a moving average of the intermediate signal.
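
The two stages can be sketched as follows. The function name `filter_interpolated`, the cyclic reuse of the coefficient list, and the treatment of samples before the start of the signal as zero are assumptions for illustration and are not taken from the abstract.

```python
def filter_interpolated(samples, coefficients, window):
    """Multiply inputs by coefficients, then take a moving average."""
    # Stage 1: coefficient multiplier. Coefficients are applied cyclically
    # (an assumption) to form the intermediate signal.
    intermediate = [
        x * coefficients[i % len(coefficients)] for i, x in enumerate(samples)
    ]

    # Stage 2: moving-average filter over the last `window` intermediate
    # values, maintained as a running sum. History before the first sample
    # is treated as zero.
    output = []
    running = 0
    for i, value in enumerate(intermediate):
        running += value
        if i >= window:
            running -= intermediate[i - window]
        output.append(running / window)
    return output


filtered = filter_interpolated([1, 2, 3, 4], coefficients=[1, 2], window=2)
# filtered == [0.5, 2.5, 3.5, 5.5]
```

Keeping a running sum means each output sample costs one addition and one subtraction regardless of the window length.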

Techniques for scheduling operations at an instruction pipeline
09817667 · 2017-11-14

A dispatch stage of a processor core dispatches designated operations (e.g. load/store operations) to a temporary queue when the resources to execute the designated operations are not available. Once the resources become available to execute an operation at the temporary queue, the operation is transferred to a scheduler queue where it can be picked for execution. By dispatching the designated operations to the temporary queue, other operations behind the designated operations in a program order are made available for dispatch to the scheduler queue, thereby improving instruction throughput at the processor core.
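
A minimal sketch of this dispatch policy follows. The `dispatch` and `release` functions, the modelling of resources as a simple count of free load/store slots, and the operation names are hypothetical and only illustrate the queueing behaviour.

```python
from collections import deque


def dispatch(ops, resources_free):
    """Dispatch ops in program order.

    `ops` is a list of (name, is_designated) pairs. Designated operations
    (e.g. loads/stores) need one free resource each; when none is free they
    are parked in the temporary queue so later ops can still be dispatched.
    """
    scheduler_queue = []
    temporary_queue = deque()
    for name, is_designated in ops:
        if is_designated and resources_free == 0:
            temporary_queue.append(name)
            continue
        if is_designated:
            resources_free -= 1
        scheduler_queue.append(name)
    return scheduler_queue, temporary_queue


def release(scheduler_queue, temporary_queue, freed):
    """Move parked ops to the scheduler queue as resources become free."""
    for _ in range(min(freed, len(temporary_queue))):
        scheduler_queue.append(temporary_queue.popleft())


ops = [("load A", True), ("load B", True), ("add", False), ("mul", False)]
sched, temp = dispatch(ops, resources_free=1)
# "load B" is parked, so "add" and "mul" are not blocked behind it.
```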

Low-power processor with support for multiple precision modes

Multiple data wordlengths may be supported by a processor through a single data path and/or a single set of registers. For example, the processor may support 16-bit wordlengths and 24-bit wordlengths through a single datapath. For supported data wordlengths that are less than the wordlength of the registers and datapath, the data may be left-aligned within the registers and datapath. The left alignment of data may allow saturation detection in the processor to be performed by examining the same saturation point regardless of the wordlength of the data being operated on. A special saturation mode of the processor may set the lower bits to zero when a configuration register or instruction-bit is set and saturation is detected.
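
The effect of left alignment on saturation can be sketched as below, assuming a 24-bit signed register and datapath. The names `left_align` and `saturating_add`, and the `clear_low_bits` flag standing in for the configuration register or instruction bit, are hypothetical.

```python
REG_BITS = 24                          # assumed register/datapath width
REG_MAX = (1 << (REG_BITS - 1)) - 1    # 0x7FFFFF
REG_MIN = -(1 << (REG_BITS - 1))       # -0x800000


def left_align(value, width):
    """Place a `width`-bit value in the upper bits of the register."""
    return value << (REG_BITS - width)


def saturating_add(a, b, width, clear_low_bits=True):
    """Add two left-aligned values, saturating at the register limits.

    The saturation point is the same for every wordlength because the data
    is left-aligned. In the special mode, the unused lower bits are cleared
    when saturation is detected.
    """
    total = a + b
    if REG_MIN <= total <= REG_MAX:
        return total
    clamped = REG_MAX if total > REG_MAX else REG_MIN
    if clear_low_bits:
        low_mask = (1 << (REG_BITS - width)) - 1
        clamped &= ~low_mask
    return clamped


a = left_align(0x7000, 16)
b = left_align(0x2000, 16)
result = saturating_add(a, b, width=16)
# result == 0x7FFF00, i.e. the 16-bit maximum 0x7FFF left-aligned
```

Without the special mode the saturated value would be 0x7FFFFF, which has non-zero bits below the 16-bit word.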

Arithmetic operation in a data processing system

An arithmetic operation in a data processing unit, preferably by iterative digit accumulations, is proposed. An approximate result of the arithmetic operation is computed iteratively. Concurrently, at least two supplementary values of the approximate result of the arithmetic operation are computed, and the final result is selected from among the approximate result and the at least two supplementary values, depending on the results of the last iteration step.

Causing an interrupt based on event count

Some implementations provide techniques and arrangements for causing an interrupt in a processor in response to an occurrence of a number of events. A first event counter counts the occurrences of a type of event within the processor and outputs a signal to activate a second event counter in response to reaching a first predefined count. The second event counter counts the occurrences of the type of event within the processor and causes an interrupt of the processor in response to reaching a second predefined count.
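
The chained counters can be modelled with a short sketch. The class name `CascadedEventCounter` and the decision to stop advancing the first counter once the second is active are assumptions; the abstract does not specify what the first counter does after reaching its count.

```python
class CascadedEventCounter:
    """Two chained counters: the first arms the second, which interrupts."""

    def __init__(self, first_count, second_count):
        self.first_count = first_count
        self.second_count = second_count
        self.first = 0
        self.second = 0
        self.second_active = False
        self.interrupt_raised = False

    def on_event(self):
        """Record one occurrence of the monitored event type."""
        if not self.second_active:
            self.first += 1
            if self.first >= self.first_count:
                # First predefined count reached: activate second counter.
                self.second_active = True
            return
        self.second += 1
        if self.second >= self.second_count:
            # Second predefined count reached: cause the interrupt.
            self.interrupt_raised = True


counter = CascadedEventCounter(first_count=3, second_count=2)
for _ in range(4):
    counter.on_event()
raised_after_four = counter.interrupt_raised   # False
counter.on_event()
raised_after_five = counter.interrupt_raised   # True
```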

Functional unit having tree structure to support vector sorting algorithm and other algorithms

An apparatus is described having a functional unit of an instruction execution pipeline. The functional unit has a plurality of compare-and-exchange circuits coupled to network circuitry to implement a vector sorting tree for a vector sorting instruction. Each of the compare-and-exchange circuits has a respective comparison circuit that compares a pair of inputs. Each of the compare-and-exchange circuits has a same-sided first output for presenting the higher of the two inputs and a same-sided second output for presenting the lower of the two inputs. The comparison circuit also supports the functional unit's execution of a prefix min and/or prefix add instruction.
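
How a single compare-and-exchange primitive can serve both sorting and a prefix min can be sketched in software. The odd-even transposition layout used in `sort_vector` is a swapped-in, well-known sorting network chosen for brevity; it is not the tree structure of the described functional unit, and all function names are hypothetical.

```python
def compare_exchange(a, b):
    """Return (higher, lower) of the two inputs, always in that order."""
    return (a, b) if a >= b else (b, a)


def sort_vector(values):
    """Sort ascending using only compare-and-exchange steps.

    Uses an odd-even transposition network: n stages that alternate
    between even-indexed and odd-indexed neighbouring pairs.
    """
    v = list(values)
    n = len(v)
    for stage in range(n):
        for i in range(stage % 2, n - 1, 2):
            higher, lower = compare_exchange(v[i], v[i + 1])
            v[i], v[i + 1] = lower, higher
    return v


def prefix_min(values):
    """Running minimum, reusing the 'lower' output of compare_exchange."""
    out = []
    current = None
    for x in values:
        if current is None:
            current = x
        else:
            _, current = compare_exchange(current, x)
        out.append(current)
    return out


sorted_vec = sort_vector([7, 3, 9, 1])       # [1, 3, 7, 9]
running_min = prefix_min([5, 7, 3, 4, 1])    # [5, 5, 3, 3, 1]
```

Because every step has a fixed position in the network, the comparisons within one stage are independent of each other, which is what makes this style of sorting suitable for parallel hardware.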