G06F9/324

Filtering based on instruction execution characteristics for assessing program performance

Identifying computer program execution characteristics for determine relevance of pattern instruction executions to determine characteristics of a computer program. Filters are utilized to determine which subsequent occurrences of execution of at least one computer instruction are relevant to a counter based on execution characteristics of the at least one computer instruction where the counter counts the subsequent occurrences of execution of at least one computer instruction following prior executions of the same at least one computer instruction.

Apparatus and method for interpreting permissions associated with a capability
11023237 · 2021-06-01 · ·

An apparatus and method are provided for interpreting permissions associated with a capability. The apparatus has processing circuitry for executing instructions in order to perform operations, and a capability storage element accessible to the processing circuitry and arranged to store a capability used to constrain at least one operation performed by the processing circuitry when executing the instructions. The capability identifies a plurality N of default permissions whose state, in accordance with a default interpretation, is determined from N permission flags provided in the capability. In accordance with the default interpretation, each permission flag is associated with one of the default permissions. The processing circuitry is then arranged to analyse the capability in accordance with an alternative interpretation, in order to derive, from logical combinations of the N permission flags, state for an enlarged set of permissions, where the enlarged set comprises at least N+1 permissions. This provides a mechanism for encoding additional permissions into capabilities without increasing the number of permission flags required, whilst still retaining desirable behaviour.

DATA STRUCTURE DESCRIPTORS FOR DEEP LEARNING ACCELERATION

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.

Dataflow Triggered Tasks for Accelerated Deep Learning

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element receives a particular wavelet comprising a particular virtual channel specifier and a particular data element. Instructions are read from the memory of the compute element based at least in part on the particular virtual channel specifier. The particular data element is used as an input operand to execute at least one of the instructions.

FILTERING BASED ON INSTRUCTION EXECUTION CHARACTERISTICS FOR ASSESSING PROGRAM PERFORMANCE

Identifying computer program execution characteristics for determine relevance of pattern instruction executions to determine characteristics of a computer program. Filters are utilized to determine which subsequent occurrences of execution of at least one computer instruction are relevant to a counter based on execution characteristics of the at least one computer instruction where the counter counts the subsequent occurrences of execution of at least one computer instruction following prior executions of the same at least one computer instruction.

INSTRUCTION PROCESSING METHOD AND APPARATUS
20210089306 · 2021-03-25 ·

Embodiments of the present disclosure provide an instruction processing apparatus, comprising a count buffer; an instruction fetching circuitry configured to store address-related information of a jump instruction into the count buffer and to determine an information identifier corresponding to the address-related information, wherein the address-related information comprises address information of the jump instruction in a storage device; and an instruction executing circuitry configured to acquire the address-related information of the jump instruction from the count buffer according to the information identifier and execute the jump instruction according to the address-related information.

Distance based branch prediction and detection of potential call and potential return instructions

Examples of techniques for distance-based branch prediction are disclosed. In one example implementation according to aspects of the present disclosure, a computer-implemented method includes: determining, by a processing system, a potential return instruction address (IA) by determining whether a relationship is satisfied between a first target IA and a first branch IA; storing a second branch IA as a return when a target IA of a second branch matches a potential return IA for the second branch; and applying the potential return IA for the second branch as a predicted target IA of a predicted branch IA stored as a return.

Allocation of memory by mapping registers referenced by different instances of a task to individual logical memories

Methods of memory allocation in which registers referenced by different groups of instances of the same task are mapped to individual logical memories. Other example methods describe the mapping of registers referenced by a task to different banks within a single logical memory and in various examples this mapping may take into consideration which bank is likely to be the dominant bank for the particular task and the allocation for one or more other tasks.

Unified logic for aliased processor instructions
10846089 · 2020-11-24 · ·

A binary logic circuit for manipulating an input binary string includes a first stage of a first group of multiplexers arranged to select respective portions of an input binary string and configured to receive a respective first control. A second stage is included in which a plurality of a second group of multiplexers is arranged to select respective portions of the input binary string and configured to receive a respective second control signal. The control signals are provided such that each multiplexer of a second group is configured to select a respective second portion of the first binary string. Control circuitry is configured to generate the first and second control signals such that two or more of the first groups and/or two or more of the second groups of multiplexers are independently controllable.

WAVELET REPRESENTATION FOR ACCELERATED DEEP LEARNING

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element with dedicated storage and a routing element. Each router enables communication with nearest neighbors in a 2D mesh. The communication is via wavelets in accordance with a representation comprising an index specifier, a virtual channel specifier, a task specifier, a data element specifier, and an optional control/data specifier. The virtual channel specifier and the task specifier are associated with one or more instructions. The index specifier and the data element are optionally associated with operands of the one or more instructions.