G06F9/325

PROGRAM FLOW PREDICTION FOR LOOPS

20230130323 · 2023-04-27 ·

Vijay CHAVAN

Instruction processing circuitry comprises fetch circuitry to fetch instructions for execution; instruction decoder circuitry to decode fetched instructions; execution circuitry to execute decoded instructions; and program flow prediction circuitry to predict a next instruction to be fetched; in which the instruction decoder circuitry is configured to decode a loop control instruction in respect of a given program loop and to derive information from the loop control instruction for use by the program flow prediction circuitry to predict program flow for one or more iterations of the given program loop.

Hardware-implemented universal floating-point instruction set architecture for computing directly with human-readable decimal character sequence floating-point representation operands

11635957 · 2023-04-25 ·

Jerry D. Harthcock

A universal floating-point Instruction Set Architecture (ISA) compute engine implemented entirely in hardware. The ISA compute engine computes directly with human-readable decimal character sequence floating-point representation operands without first having to explicitly perform a conversion-to-binary-format process in software. A fully pipelined convertToBinaryFromDecimalCharacter hardware operator logic circuit converts one or more human-readable decimal character sequence floating-point representations to IEEE 754-2008 binary floating-point representations every clock cycle. Following computations by at least one hardware floating-point operator, a convertToDecimalCharacterFromBinary hardware conversion circuit converts the result back to a human-readable decimal character sequence floating-point representation.

Programmable vision accelerator

11630800 · 2023-04-18 ·

Nvidia Corporation

In one embodiment of the present invention, a programmable vision accelerator enables applications to collapse multi-dimensional loops into one-dimensional loops. In general, configurable components included in the programmable vision accelerator work together to facilitate such loop collapsing. The configurable components include multi-dimensional address generators, vector units, and load/store units. Each multi-dimensional address generator generates a different address pattern. Each address pattern represents an overall addressing sequence associated with an object accessed within the collapsed loop. The vector units and the load/store units provide execution functionality typically associated with multi-dimensional loops based on the address pattern. Advantageously, collapsing multi-dimensional loops in a flexible manner dramatically reduces the overhead associated with implementing a wide range of computer vision algorithms. Consequently, the overall performance of many computer vision applications may be optimized.
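
As a rough software illustration of loop collapsing (not the accelerator's actual hardware or programming interface), the sketch below drives a single flat loop and derives each address from per-dimension extents and strides. The names `address_pattern`, `nested_addresses`, `extents`, and `strides` are assumptions made for this example only.

```python
from itertools import product


def address_pattern(base, extents, strides):
    """Yield one address per iteration of a single collapsed loop.

    extents: iteration count per dimension, outermost first (hypothetical name)
    strides: address step per dimension, in the same order (hypothetical name)
    """
    total = 1
    for n in extents:
        total *= n
    for flat in range(total):  # the one-dimensional, collapsed loop
        addr, rem = base, flat
        # Recover each dimension's index from the flat counter, innermost first.
        for n, s in zip(reversed(extents), reversed(strides)):
            rem, idx = divmod(rem, n)
            addr += idx * s
        yield addr


def nested_addresses(base, extents, strides):
    """Reference version: the same sequence from explicit nested iteration."""
    for idx in product(*(range(n) for n in extents)):
        yield base + sum(i * s for i, s in zip(idx, strides))


# A 2 x 3 tile with a row stride of 10 and an element stride of 1.
print(list(address_pattern(100, (2, 3), (10, 1))))
```

The flat loop visits the same addresses, in the same order, as the nested reference version, which is the property a collapsed loop needs to preserve.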

SYSTEMS AND METHODS FOR DISTRIBUTED DECISION-MAKING AND SCHEDULING

20230060546 · 2023-03-02 ·

Brian Van Matre

An embodiment of the disclosed invention is a computer-implemented method for performing automated decision-making, which includes operating one or more loops of sequential steps that receive data from the environment or from another source, interpret the data, decide on a course of action, and then execute the course of action. During the operation of the one or more loops, the method includes a self-monitor function that detects and corrects errors. Another embodiment is a loop architecture for performing automated decision-making that includes an API, three support modules, a receive module, an interpret module, a decide module, an execute module, and an orchestration layer. Another embodiment is a method for implementing a loop architecture to perform a task, wherein the method includes implementing handlers to perform the receive, interpret, decide, and execute functions, and implementing a topology definition.

SPECULATIVE RESOLUTION OF LAST BRANCH-ON-COUNT AT FETCH

20230063079 · 2023-03-02 ·

A computer processor includes an instruction fetch unit (IFU) and an instruction pipeline configured to dispatch a plurality of branch-to-count (BCNT) instructions. The IFU is configured to execute an instruction loop for fetching a targeted number of BCNT instructions from the instruction pipeline and to monitor a loop counter that counts the number of BCNT instructions actually fetched from the instruction pipeline in response to executing the instruction loop. The IFU resolves a final BCNT instruction included in the instruction loop in response to the number of fetched BCNT instructions reaching a target loop count value.

Control of branch prediction for zero-overhead loop

11663007 · 2023-05-30 ·

Arm Limited

In response to decoding a zero-overhead loop control instruction of an instruction set architecture, processing circuitry sets at least one loop control parameter for controlling execution of one or more iterations of a program loop body of a zero-overhead loop. Based on the at least one loop control parameter, loop control circuitry controls execution of the one or more iterations of the program loop body of the zero-overhead loop, the program loop body excluding the zero-overhead loop control instruction. Branch prediction disabling circuitry detects whether the processing circuitry is executing the program loop body of the zero-overhead loop associated with the zero-overhead loop control instruction, and dependent on detecting that the processing circuitry is executing the program loop body of the zero-overhead loop, disables branch prediction circuitry. This reduces power consumption during a zero-overhead loop when the branch prediction circuitry is unlikely to provide a benefit.
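
The behaviour described above can be mimicked with a toy interpreter. This is a minimal sketch under stated assumptions, not Arm's design: the `("LOOP", body_len, count)` tuple encoding, the function `run`, and the idea of counting "predictor lookups" per fetched instruction are all invented for illustration, and nested loops are not modelled.

```python
def run(program):
    """Execute a toy program and count branch-predictor lookups.

    program: list of tuples. ("LOOP", body_len, count) is a hypothetical
    zero-overhead loop control instruction covering the next body_len
    instructions; any other tuple is an ordinary instruction.
    Returns (trace of executed ordinary ops, number of predictor lookups).
    """
    pc = 0
    loop = None  # (start, end, remaining) loop control parameters
    trace = []
    predictor_lookups = 0
    while pc < len(program):
        op = program[pc]
        in_body = loop is not None and loop[0] <= pc <= loop[1]
        if not in_body:
            predictor_lookups += 1  # predictor is consulted only outside the body
        if op[0] == "LOOP":
            _, body_len, count = op
            if count > 0:
                loop = (pc + 1, pc + body_len, count)
                pc += 1
            else:
                pc += 1 + body_len  # zero iterations: skip the body entirely
            continue
        trace.append(op[0])
        if in_body and pc == loop[1]:
            # Loop control logic, not a branch instruction, redirects the fetch.
            start, end, remaining = loop
            remaining -= 1
            if remaining > 0:
                loop = (start, end, remaining)
                pc = start
                continue
            loop = None
        pc += 1
    return trace, predictor_lookups


trace, lookups = run([("A",), ("LOOP", 2, 3), ("B",), ("C",), ("D",)])
print(trace, lookups)
```

In this model the six body instructions execute without any predictor lookup, so only the instructions outside the loop body consult the predictor.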

LOOP UNROLLING PROCESSING APPARATUS, METHOD, AND PROGRAM

20230161590 · 2023-05-25 ·

NEC Corporation

Yoshiyuki Ohno

The generation unit 4 generates arithmetic expressions. Here, N denotes the number of iterations of the loop processing, L denotes a designated lower limit of the unroll stage number, M denotes a designated upper limit of the unroll stage number, Q denotes the quotient obtained by dividing N by L, and R denotes the remainder obtained by dividing N by L. The arithmetic expressions include an arithmetic expression that applies when R−Q*(M−L)>0 is not satisfied and that represents, in order: executing, with the unroll stage number M, loop processing whose number of iterations is the quotient obtained by dividing R by (M−L); then, when the remainder obtained by dividing R by (M−L) is other than 0, executing one loop with the sum of that remainder and L as the unroll stage number; and then executing loop processing with the unroll stage number L.
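
One way to read the arithmetic in this abstract is as a plan of (unroll stage number, loop count) steps that together cover all N iterations. The sketch below is an interpretation, not the patented method: the function name `unroll_plan` is hypothetical, it assumes 0 < L < M, and the count of trailing L-stage loops is derived here from the requirement that the steps sum to N rather than taken from the abstract.

```python
def unroll_plan(n, l, m):
    """Return [(unroll_stage_number, loop_count), ...] covering n iterations.

    Assumes 0 < l < m. Returns None when R - Q*(M - L) > 0, a case the
    abstract does not describe.
    """
    if not 0 < l < m:
        raise ValueError("expected 0 < l < m")
    q, r = divmod(n, l)  # Q and R from the abstract
    step = m - l
    if r - q * step > 0:
        return None
    full, rem = divmod(r, step)
    plan = []
    used = full  # number of L-sized chunks already absorbed
    if full:
        plan.append((m, full))  # loops unrolled M times
    if rem:
        plan.append((rem + l, 1))  # one loop unrolled (remainder + L) times
        used += 1
    if q - used:
        plan.append((l, q - used))  # remaining loops unrolled L times
    return plan


# N = 23, L = 4, M = 6 gives Q = 5, R = 3.
print(unroll_plan(23, 4, 6))
```

For N = 23, L = 4, M = 6 the plan is one loop at stage 6, one at stage 5, and three at stage 4, which totals 6 + 5 + 12 = 23 iterations with every stage number between L and M.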

Livelock recovery circuit for detecting illegal repetition of an instruction and transitioning to a known state

11467840 · 2022-10-11 ·

Imagination Technologies Limited

Livelock recovery circuits configured to detect livelock in a processor, and cause the processor to transition to a known safe state when livelock is detected. The livelock recovery circuits include detection logic configured to detect that the processor is in livelock when the processor has illegally repeated an instruction; and transition logic configured to cause the processor to transition to a safe state when livelock has been detected by the detection logic.
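
The detect-then-transition idea can be sketched in software as a small monitor. This is a toy model only: the abstract does not say what makes a repetition illegal, so the consecutive-repeat threshold `max_repeats`, the class name `LivelockMonitor`, and the state labels are assumptions made for this example.

```python
class LivelockMonitor:
    """Toy model: flags livelock when one instruction address repeats too often."""

    def __init__(self, max_repeats):
        self.max_repeats = max_repeats  # assumed limit on consecutive repeats
        self.last_pc = None
        self.repeats = 0
        self.state = "running"

    def observe(self, pc):
        """Record one executed instruction address and return the current state."""
        if self.state == "safe":
            return self.state  # already recovered to the known safe state
        # Detection logic: count consecutive repeats of the same address.
        if pc == self.last_pc:
            self.repeats += 1
        else:
            self.last_pc, self.repeats = pc, 0
        # Transition logic: move to the safe state once the limit is exceeded.
        if self.repeats > self.max_repeats:
            self.state = "safe"
        return self.state


monitor = LivelockMonitor(max_repeats=2)
print([monitor.observe(pc) for pc in [0, 4, 8, 8, 8, 8, 12]])
```

Separating the repeat counter from the state change mirrors the split between detection logic and transition logic described in the abstract.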

CONTROL OF BRANCH PREDICTION FOR ZERO-OVERHEAD LOOP

20230108825 · 2023-04-06 ·

Program flow prediction for loops

11650822 · 2023-05-16 ·

Arm Limited

Vijay CHAVAN
