Patent classifications
G06F9/324
Apparatus and method for managing a capability domain
An apparatus and method are provided for managing a capability domain. The apparatus has processing circuitry for executing instructions, the processing circuitry when in a default state being arranged to operate in a capability domain comprising capabilities used to constrain operations performed by the processing circuitry when executing the instructions. A program counter capability storage element is also provided to store a program counter capability used by the processing circuitry to determine a program counter value. The program counter capability is arranged to identify a capability state for the processing circuitry. The processing circuitry is then arranged, when the capability state indicates the default state, to operate in the capability domain. However, when the capability state indicates the executive state, the processing circuitry is arranged to operate in a manner less constrained than when in the default state so as to allow modification of the capability domain. This provides a simple and effective mechanism for selectively allowing the apparatus to modify the capability domain.
Data structure descriptors for deep learning acceleration
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
Data structure descriptors for deep learning acceleration
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
Non-transitory computer-readable medium, analysis device, and analysis method
The present disclosure relates to a non-transitory computer-readable recording medium storing an analysis program that causes a computer to execute a process. The process includes sampling an instruction address of one of instructions included in a program during execution of the program, identifying a first function that includes the sampled instruction address in an address range, rewriting mark information associated with the identified first function, identifying first information corresponding to the instruction address of the first function among a plurality of first information based on the rewritten mark information, identifying second information corresponding to the instruction address of the first function among a plurality of second information based on the rewritten mark information, storing the first information and the second information in a memory, and analyzing performance of the program based on the first information and the second information stored in the memory.
DATA STRUCTURE DESCRIPTORS FOR DEEP LEARNING ACCELERATION
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
Allocation of memory
Methods of memory allocation map registers referenced by different groups of instances of the same task to individual logical memories. Other example methods describe the mapping of registers referenced by a task to different banks within a single logical memory and in various examples this mapping may take into consideration which bank is likely to be the dominant bank for the particular task and the allocation for one or more other tasks.
Managing prefetch requests based on stream information for previously recognized streams
Managing prefetch requests associated with memory access requests includes storing stream information associated with multiple streams. At least one stream was recognized based on an initial subset of memory access requests within a previously performed set of related memory access requests and is associated with stream information that includes stream matching information and stream length information. After the previously performed set has ended, a matching memory access request is identified that matches with a corresponding matched stream based at least in part on stream matching information within stream information associated with the matched stream. In response to identifying the matching memory access request, the managing determines whether or not to perform a prefetch request for data at an address related to a data address in the matching memory access request based at least in part on stream length information within the stream information associated with the matched stream.
Increasing address space layout randomization entropy via page remapping and rotations
Various embodiments include methods and devices for generating a memory map configured to map virtual addresses of pages to physical addresses, in which pages of a same size are grouped into regions. The embodiments may include adding a first entry for a first additional page to a first region in the memory map, shifting virtual addresses of the first region to accommodate a shift of virtual addresses of the first region allocated for code by a sub-page granular shift amount, mapping shifted virtual addresses of the first entry for the first additional page to physical address mapped to a first lowest shifted virtually addressed page of the first region, and shifting the virtual addresses of the first region allocated for code by a sub-page granular shift amount, in which the virtual addresses of the first region allocated for code partially shift into the first entry for the first additional page.
Branch prediction structure indexed based on return address popped from a call-return stack
An apparatus comprises a branch predictor to predict an outcome of a branch instruction to be processed by processing circuitry, based on branch prediction state information stored in one or more branch prediction structures, and a call path tracking element to store call path tracking information, where in response to a procedure calling instruction processed by the processing circuitry, the call path tracking information is updated depending on at least a portion of an address associated with the procedure calling instruction. The one or more branch prediction structures include at least one call-path-indexed branch prediction structure which is indexed based on an index value which depends on a portion of the call path tracking information stored in the call path tracking element.
ACCELERATING PROCESSOR BASED ARTIFICIAL NEURAL NETWORK COMPUTATION
An apparatus employed in a processing device comprises a processor configured to process data of a predefined data structure. A memory fetch device is coupled to the processor and is configured to determine addresses of the packed data for the processor. The packed data is stored on a memory device that is coupled to the processor. The memory fetch device is further configured to provide output data based on the addresses of the packed data to the processor, where the output data is configured according to the predefine data structure.