G06F9/3005

Reduction mode of planar engine in neural processor

Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations on input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured in multiple modes. In a reduction mode, the planar engine circuit may process values arranged in one or more dimensions of input data to generate a reduced value. The reduced values across multiple input data may be accumulated. The planar engine circuit may program a filter circuit as a reduction tree to gradually reduce the data into a reduced value. The reduction operation reduces the size of one or more dimensions of a tensor.
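
The reduction-tree idea can be illustrated in software. The following is a minimal sketch, not the circuit described in the abstract: the function names `tree_reduce` and `reduce_rows` are hypothetical, and the pairwise, level-by-level combination is one assumed way to model a reduction tree that shrinks one dimension of a 2-D input.

```python
from operator import add
from typing import Callable, List, Sequence


def tree_reduce(values: Sequence[float], op: Callable[[float, float], float]) -> float:
    """Combine values pairwise, level by level, like a binary reduction tree."""
    if not values:
        raise ValueError("cannot reduce an empty sequence")
    level = list(values)
    while len(level) > 1:
        # Each tree level combines neighbouring pairs.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An unpaired trailing element passes through to the next level.
            nxt.append(level[-1])
        level = nxt
    return level[0]


def reduce_rows(matrix: Sequence[Sequence[float]],
                op: Callable[[float, float], float]) -> List[float]:
    """Reduce the last dimension of a 2-D input: shape (H, W) becomes (H,)."""
    return [tree_reduce(row, op) for row in matrix]


if __name__ == "__main__":
    print(tree_reduce([1, 2, 3, 4, 5], add))
    print(reduce_rows([[1, 2, 3], [4, 5, 6]], max))
```

Any associative operation (sum, max, min) can be passed as `op`; accumulating reduced values across several inputs would amount to applying `op` again to the per-input results.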

Automatically mapping binary executable files to source code by a software modernization system

Techniques are described for enabling a software modernization system to automatically map binary executable files and other runtime artifacts (e.g., application binaries, Java ARchive (JAR) files, .NET Dynamic Link Library (DLL) files, process identifiers, etc.) to source code associated with the binary executable files, e.g., as part of modernization processes aimed at migrating users' applications to a cloud service provider's infrastructure. A software modernization service of a cloud provider network provides discovery agents and other tools that are capable of creating an inventory of users' software applications and collecting profile data about the software applications. Various techniques are described for automatically identifying the source code associated with software applications identified by a discovery agent in a user's computing environment, thereby improving the efficiency of various software modernization analyses and other modernization processes.

SYSTEMS AND METHODS FOR REDUCING CONGESTION ON NETWORK-ON-CHIP

Systems or methods of the present disclosure may provide a programmable logic device including a network-on-chip (NoC) to facilitate data transfer between one or more main intellectual property components (main IP) and one or more secondary intellectual property components (secondary IP). To reduce or prevent excessive congestion on the NoC, the NoC may include one or more traffic throttlers that may receive feedback from a data buffer, a main bridge, or both and adjust data injection rate based on the feedback. Additionally, the NoC may include a data mapper to enable data transfer to be remapped from a first destination to a second destination if congestion is detected at the first destination.
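
The feedback loop between a buffer and a traffic throttler can be sketched as follows. This is an illustrative model only: the class `TrafficThrottler`, the function `remap_destination`, the watermark thresholds, and the halve-on-congestion, increment-on-relief policy are all assumptions and are not taken from the disclosure.

```python
from typing import Dict, Set


class TrafficThrottler:
    """Hypothetical throttler that adjusts an injection rate from buffer feedback."""

    def __init__(self, max_rate: int = 8, min_rate: int = 1,
                 high_watermark: float = 0.75, low_watermark: float = 0.25):
        self.max_rate = max_rate
        self.min_rate = min_rate
        self.high_watermark = high_watermark
        self.low_watermark = low_watermark
        self.rate = max_rate  # start at full injection rate

    def update(self, buffer_occupancy: float) -> int:
        """Take buffer occupancy in [0, 1] and return the new injection rate."""
        if buffer_occupancy >= self.high_watermark:
            # Assumed policy: back off quickly when the buffer is nearly full.
            self.rate = max(self.min_rate, self.rate // 2)
        elif buffer_occupancy <= self.low_watermark:
            # Assumed policy: recover slowly once the buffer has drained.
            self.rate = min(self.max_rate, self.rate + 1)
        return self.rate


def remap_destination(dest: str, congested: Set[str], alternates: Dict[str, str]) -> str:
    """Return an alternate destination if dest is congested and an alternate is free."""
    if dest in congested:
        alt = alternates.get(dest)
        if alt is not None and alt not in congested:
            return alt
    return dest


if __name__ == "__main__":
    throttler = TrafficThrottler()
    for occupancy in (0.9, 0.9, 0.1, 0.5):
        print(occupancy, throttler.update(occupancy))
    print(remap_destination("ddr0", {"ddr0"}, {"ddr0": "ddr1"}))
```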

System and method for instruction mapping in an out-of-order processor
11531549 · 2022-12-20

A system and corresponding method map instructions in an out-of-order (OoO) processor. The system comprises a mapper, integer snapshot circuitry, and floating-point (FP) snapshot circuitry. The mapper maps instructions by mapping integer and FP architectural registers (ARs) of the instructions to integer and FP physical registers of the OoO processor, respectively. The mapper records, via at least one FP present indicator, the presence of FP ARs used as destinations in the instructions. The mapper periodically copies the integer mapper state to the integer snapshot circuitry and intermittently copies, based on the at least one FP present indicator, the FP mapper state to the FP snapshot circuitry. Copies of the integer and FP mapper state in the integer and FP snapshot circuitry, respectively, improve performance for instruction unwinding caused, for example, by an exception or a branch/jump mispredict. By copying the FP mapper state only intermittently, the power efficiency of the OoO processor is improved.
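
The snapshot policy can be sketched in software as below. This is a simplified model under stated assumptions: the class `Mapper`, the fixed `snapshot_interval`, the sequential physical-register allocation, and clearing the FP present indicator after each FP snapshot are hypothetical choices made for illustration, not details from the abstract.

```python
from typing import Dict, Iterable, List


class Mapper:
    """Toy register mapper with periodic integer and conditional FP snapshots."""

    def __init__(self, snapshot_interval: int = 4):
        self.int_map: Dict[str, int] = {}
        self.fp_map: Dict[str, int] = {}
        self.fp_present = False          # set when an FP AR is used as a destination
        self.int_snapshots: List[Dict[str, int]] = []
        self.fp_snapshots: List[Dict[str, int]] = []
        self.interval = snapshot_interval
        self.count = 0
        self._next_int = 0
        self._next_fp = 0

    def map_instruction(self, int_dests: Iterable[str] = (),
                        fp_dests: Iterable[str] = ()) -> None:
        for ar in int_dests:
            self.int_map[ar] = self._next_int   # assumed: allocate registers in order
            self._next_int += 1
        for ar in fp_dests:
            self.fp_map[ar] = self._next_fp
            self._next_fp += 1
            self.fp_present = True
        self.count += 1
        if self.count % self.interval == 0:
            # The integer state is copied on every snapshot point.
            self.int_snapshots.append(dict(self.int_map))
            # The FP state is copied only if an FP destination was seen.
            if self.fp_present:
                self.fp_snapshots.append(dict(self.fp_map))
                self.fp_present = False


if __name__ == "__main__":
    m = Mapper(snapshot_interval=2)
    m.map_instruction(["x1"])
    m.map_instruction(["x2"])
    m.map_instruction(["x3"], ["f0"])
    m.map_instruction(["x4"])
    print(len(m.int_snapshots), len(m.fp_snapshots))
```

In this model, integer-only code never triggers an FP copy, which is the source of the power saving the abstract describes.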

Caching override indicators for statistically biased branches to selectively override a global branch predictor

A data processing apparatus is provided that includes global-history prediction circuitry that provides a prediction of an outcome of a given control flow instruction based on a result of execution of one or more previous control flow instructions. Correction circuitry provides a corrected prediction of the global-history prediction circuitry in respect of the given control flow instruction, and cache circuitry, separate from the correction circuitry, stores the corrected prediction in respect of the given control flow instruction.
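
A software analogy of a cached override on top of a global-history predictor might look like the sketch below. The gshare-style predictor is a well-known textbook scheme swapped in for illustration; the abstract does not say which global-history predictor is used, and the names `GsharePredictor`, `predict_with_override`, and the dictionary used as the override cache are assumptions.

```python
from typing import Dict


class GsharePredictor:
    """Textbook gshare predictor: PC XOR global history indexes 2-bit counters."""

    def __init__(self, bits: int = 4):
        self.mask = (1 << bits) - 1
        self.history = 0
        self.counters = [1] * (1 << bits)  # 1 = weakly not-taken

    def _index(self, pc: int) -> int:
        return (pc ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def predict_with_override(predictor: GsharePredictor,
                          override_cache: Dict[int, bool], pc: int) -> bool:
    """Use a cached corrected prediction for this PC if present, else the predictor."""
    if pc in override_cache:
        return override_cache[pc]
    return predictor.predict(pc)


if __name__ == "__main__":
    p = GsharePredictor()
    print(predict_with_override(p, {}, 0x40))
    print(predict_with_override(p, {0x40: True}, 0x40))
```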

Systems and methods for customization of workflow design

Disclosed here are systems and methods that allow users, upon detecting errors within a running workflow, to either 1) pause the workflow and directly correct its design before resuming the workflow, or 2) pause the workflow, correct the erroneous action within the workflow, resume running the workflow, and afterwards apply the corrections to the design of the workflow. The disclosure comprises functionality that pauses a single workflow and other relevant workflows as soon as the error is detected and while it is corrected. The disclosed systems and methods improve communication technology between the networks and servers of separate parties relevant to and/or dependent on successful execution of other workflows.

Serverless workflow enablement and execution platform

The present disclosure provides computing systems and methods that optimize the execution of workflows that include computational tasks (e.g., which may take the form of functions or containers). In general, the proposed systems and methods can be referred to as or embodied within a serverless workflow enablement and execution platform (also referred to herein as a workflow management system). The serverless workflow platform can facilitate performance of a large-scale computational workflow. In particular, the serverless workflow platform can facilitate performance of serverless workflows that are executed on serverless execution platforms.

Method for processing event data flow and computing device

The present disclosure provides a method for processing an event data flow and a computing device. The method includes: reading a plurality of pieces of event data with a first duration sequentially from the event data flow; with respect to each piece of event data with the first duration, analyzing the event data to acquire time-difference information about each event within the first duration; and generating an image frame presenting a change in movement within the first duration in accordance with the time-difference information about each event within the first duration.
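
The three steps of the method (read a slice of events, derive per-event time differences, render a frame) can be sketched as follows. This is a minimal illustration under assumptions: events are modelled as hypothetical `(x, y, timestamp)` tuples, the time difference is measured from the start of the slice, and mapping it linearly to a pixel intensity in 1..255 is an invented encoding, not the one claimed.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, int, int]  # (x, y, timestamp), an assumed event layout


def events_to_frame(events: Sequence[Event], t_start: int, duration: int,
                    width: int, height: int) -> List[List[int]]:
    """Render one frame from the events inside [t_start, t_start + duration)."""
    frame = [[0] * width for _ in range(height)]
    for x, y, t in events:
        if t_start <= t < t_start + duration:
            dt = t - t_start  # time difference of this event within the slice
            # Assumed encoding: later events appear brighter; 0 means no event.
            # If several events hit one pixel, the last one processed wins.
            frame[y][x] = 1 + int(254 * dt / duration)
    return frame


def frames_from_stream(events: Sequence[Event], duration: int,
                       width: int, height: int) -> List[List[List[int]]]:
    """Split the event stream into consecutive slices and render each one."""
    if not events:
        return []
    t = min(e[2] for e in events)
    t_end = max(e[2] for e in events)
    frames = []
    while t <= t_end:
        frames.append(events_to_frame(events, t, duration, width, height))
        t += duration
    return frames


if __name__ == "__main__":
    stream = [(0, 0, 0), (1, 0, 5), (1, 1, 12)]
    for frame in frames_from_stream(stream, duration=10, width=2, height=2):
        print(frame)
```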

INSERTING A PROXY READ INSTRUCTION IN AN INSTRUCTION PIPELINE IN A PROCESSOR
20220365780 · 2022-11-17

Inserting a proxy read instruction in an instruction pipeline in a processor is disclosed. A scheduler circuit is configured to recognize when a produced value generated by execution of a producer instruction in the instruction pipeline will not be available through a data forwarding path to be consumed for processing of a subsequent consumer instruction. In this case, the scheduler circuit is configured to insert a proxy read instruction in the instruction pipeline to cause execution of an operation to generate the same produced value as was generated by previous execution of the producer instruction in the instruction pipeline. Thus, the produced value will remain available in the instruction pipeline to again be available through a data forwarding path to an earlier stage of the instruction pipeline to be consumed by a consumer instruction, which may avoid a pipeline stall.
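
The scheduling decision can be approximated with a toy timing model. Everything here is an assumption made for illustration: the function name `schedule_proxy_reads`, the notion of a fixed `forward_window` (the number of cycles a produced value stays reachable on the forwarding path), and the rule that a proxy read issued at the end of the window re-produces the value and restarts the window.

```python
from typing import List


def schedule_proxy_reads(producer_cycle: int, consumer_cycle: int,
                         forward_window: int) -> List[int]:
    """Return the cycles at which proxy reads would be inserted (toy model).

    The value is assumed forwardable from producer_cycle up to and including
    producer_cycle + forward_window. Each proxy read re-generates the same
    value and extends availability by another forward_window cycles.
    """
    if forward_window <= 0:
        raise ValueError("forward_window must be positive")
    proxies: List[int] = []
    available_until = producer_cycle + forward_window
    while consumer_cycle > available_until:
        # The consumer would miss the forwarding path, so insert a proxy read.
        proxies.append(available_until)
        available_until += forward_window
    return proxies


if __name__ == "__main__":
    print(schedule_proxy_reads(0, 2, 3))  # consumer is inside the window
    print(schedule_proxy_reads(0, 7, 3))  # consumer is late, proxies needed
```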

Spatial and temporal merging of remote atomic operations

Disclosed embodiments relate to spatial and temporal merging of remote atomic operations (RAOs). In one example, a system includes an RAO instruction queue stored in a memory and having entries grouped by destination cache line, each entry to enqueue an RAO instruction including an opcode, a destination identifier, and source data; and optimization circuitry to receive an incoming RAO instruction and scan the RAO instruction queue to detect a matching enqueued RAO instruction identifying a same destination cache line as the incoming RAO instruction. The optimization circuitry is further to, responsive to no matching enqueued RAO instruction being detected, enqueue the incoming RAO instruction; and, responsive to a matching enqueued RAO instruction being detected, determine whether the incoming and matching RAO instructions have a same opcode to non-overlapping cache line elements, and, if so, spatially combine the incoming and matching RAO instructions by enqueuing both RAO instructions in a same group of cache line queue entries at different offsets.
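
The enqueue decision described above can be sketched as a small software model. The names `RAO`, `RAOQueue`, and `CACHE_LINE_BYTES`, the 64-byte line size, and the returned status strings are hypothetical. The abstract does not say what happens when a match has a different opcode or overlapping elements, so the fallback here (enqueue without combining) is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List

CACHE_LINE_BYTES = 64  # assumed cache line size


@dataclass
class RAO:
    opcode: str
    address: int   # destination identifier, modelled as a byte address
    size: int      # element size in bytes
    data: int      # source data

    @property
    def line(self) -> int:
        return self.address // CACHE_LINE_BYTES

    @property
    def offset(self) -> int:
        return self.address % CACHE_LINE_BYTES


def overlaps(a: RAO, b: RAO) -> bool:
    """True if the two operations touch overlapping bytes of a cache line."""
    return a.offset < b.offset + b.size and b.offset < a.offset + a.size


class RAOQueue:
    """Queue whose entries are grouped by destination cache line."""

    def __init__(self) -> None:
        self.groups: DefaultDict[int, List[RAO]] = defaultdict(list)

    def enqueue(self, rao: RAO) -> str:
        group = self.groups[rao.line]
        if not group:
            # No enqueued instruction targets this cache line.
            group.append(rao)
            return "enqueued"
        if all(m.opcode == rao.opcode and not overlaps(m, rao) for m in group):
            # Same opcode, disjoint elements: share the group at another offset.
            group.append(rao)
            return "spatially_combined"
        # Assumed fallback, not specified in the abstract.
        group.append(rao)
        return "enqueued_uncombined"


if __name__ == "__main__":
    q = RAOQueue()
    print(q.enqueue(RAO("add", 0, 4, 1)))
    print(q.enqueue(RAO("add", 8, 4, 2)))
    print(q.enqueue(RAO("add", 64, 4, 3)))
    print(q.enqueue(RAO("add", 2, 4, 5)))
```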