Patent classifications
G06F9/324
AN APPARATUS AND METHOD FOR MANAGING A CAPABILITY DOMAIN
An apparatus and method are provided for managing a capability domain. The apparatus has processing circuitry for executing instructions, the processing circuitry when in a default state being arranged to operate in a capability domain comprising capabilities used to constrain operations performed by the processing circuitry when executing the instructions. A program counter capability storage element is also provided to store a program counter capability used by the processing circuitry to determine a program counter value. The program counter capability is arranged to identify a capability state for the processing circuitry. The processing circuitry is then arranged, when the capability state indicates the default state, to operate in the capability domain. However, when the capability state indicates the executive state, the processing circuitry is arranged to operate in a manner less constrained than when in the default state so as to allow modification of the capability domain. This provides a simple and effective mechanism for selectively allowing the apparatus to modify the capability domain.
CALL-PATH-INDEXED BRANCH PREDICTION STRUCTURE
An apparatus comprises a branch predictor to predict an outcome of a branch instruction to be processed by processing circuitry, based on branch prediction state information stored in one or more branch prediction structures, and a call path tracking element to store call path tracking information, where in response to a procedure calling instruction processed by the processing circuitry, the call path tracking information is updated depending on at least a portion of an address associated with the procedure calling instruction. The one or more branch prediction structures include at least one call-path-indexed branch prediction structure which is indexed based on an index value which depends on a portion of the call path tracking information stored in the call path tracking element.
Apparatus and method for managing bounded pointers
An apparatus and method are provided for managing bounded pointers. The apparatus has processing circuitry to execute a sequence of instructions, and a plurality of storage elements accessible to the processing circuitry, for storage of bounded pointers and non-bounded pointers. Each bounded pointer has explicit range information associated therewith indicative of an allowable range of memory addresses when using the bounded pointer. A current range check storage element is then used to store a current range check state for the processing circuitry. When the current range check state indicates a default state, the processing circuitry is responsive to execution of a memory access instruction identifying a pointer to be used to identify a memory address, to perform a range check operation to determine whether access to that memory address is permitted. In particular, when the memory access instruction identifies as the pointer one of the bounded pointers, the range check operation is performed with reference to the explicit range information associated with that bounded pointer. If instead the memory access instruction identifies a non-bounded pointer, the range check operation is performed with reference to default range information defined for the processing circuitry. On detection of at least one event, the current range check state is set to an executive state. When in the executive state, the processing circuitry is responsive to execution of a memory access instruction to disable the range check operation when the identified pointer is a non-bounded pointer. This provides an efficient, but controlled, mechanism for enabling the set of bounded pointers available to the processing circuitry to be altered.
HANDLING MODIFICATIONS TO PERMITTED PROGRAM COUNTER RANGES IN A DATA PROCESSING APPARATUS
An apparatus and method of operating an apparatus are disclosed. The apparatus has a program counter permitted range storage element defining a permitted range of program counter values for the sequence of instructions it executes. Branch prediction circuitry predicts target instruction addresses for branch instructions. In response to a program counter modifying event, a program counter speculative range storage element is updated corresponding to each speculatively executed instruction after a branch instruction. Program counter permitted range verification circuitry is responsive to resolution of a modification of the program counter permitted range indication resulting from the program counter modifying event to determine whether the speculatively executed program counter range satisfies the permitted range of program counter values. A branch mis-prediction mechanism may support the response of the apparatus if the permitted range of program counter values is violated.
ALLOCATION OF MEMORY BY MAPPING REGISTERS REFERENCED BY DIFFERENT INSTANCES OF A TASK TO INDIVIDUAL LOGICAL MEMORIES
Methods of memory allocation in which registers referenced by different groups of instances of the same task are mapped to individual logical memories. Other example methods describe the mapping of registers referenced by a task to different banks within a single logical memory and in various examples this mapping may take into consideration which bank is likely to be the dominant bank for the particular task and the allocation for one or more other tasks.
DEBUGGING OF PREFIXED CODE
A debugging capability that enables the efficient debugging of code that has prefixes, referred to herein as prefixed code. To debug application code, in which the application code includes a prefixed instruction to be modified by a prefix, a trap is provided. The trap is configured to report a presence of the prefix, but to otherwise perform the trap functions absent the prefix; i.e., the prefix is otherwise ignored in the processing of the trap.
Dataflow Triggered Tasks for Accelerated Deep Learning
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element receives a particular wavelet comprising a particular virtual channel specifier and a particular data element. Instructions are read from the memory of the compute element based at least in part on the particular virtual channel specifier. The particular data element is used as an input operand to execute at least one of the instructions.
WAVELET REPRESENTATION FOR ACCELERATED DEEP LEARNING
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element with dedicated storage and a routing element. Each router enables communication with nearest neighbors in a 2D mesh. The communication is via wavelets in accordance with a representation comprising an index specifier, a virtual channel specifier, a task specifier, a data element specifier, and an optional control/data specifier. The virtual channel specifier and the task specifier are associated with one or more instructions. The index specifier and the data element are optionally associated with operands of the one or more instructions.
DATA STRUCTURE DESCRIPTORS FOR DEEP LEARNING ACCELERATION
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
Debugging of prefixed code
A debugging capability that enables the efficient debugging of code that has prefixes, referred to herein as prefixed code. To debug application code, in which the application code includes a prefixed instruction to be modified by a prefix, a trap is provided. The trap is configured to report a presence of the prefix, but to otherwise perform the trap functions absent the prefix; i.e., the prefix is otherwise ignored in the processing of the trap.