Patent classifications
G06F9/324
NON-TRANSITORY COMPUTER-READABLE MEDIUM, ANALYSIS DEVICE, AND ANALYSIS METHOD
The present disclosure relates to a non-transitory computer-readable recording medium storing an analysis program that causes a computer to execute a process. The process includes sampling an instruction address of one of instructions included in a program during execution of the program, identifying a first function that includes the sampled instruction address in an address range, rewriting mark information associated with the identified first function, identifying first information corresponding to the instruction address of the first function among a plurality of first information based on the rewritten mark information, identifying second information corresponding to the instruction address of the first function among a plurality of second information based on the rewritten mark information, storing the first information and the second information in a memory, and analyzing performance of the program based on the first information and the second information stored in the memory.
TRACING CIRCUIT, SEMICONDUCTOR DEVICE, TRACER, AND TRACING SYSTEM
A tracing circuit is integrated in a semiconductor device along with a microprocessor including an m-bit program counter, and externally outputs a tracing clock along with an n-bit tracing data (where 2≤n≤m). The tracing circuit, when the program counter remains unchanged, synchronously with the tracing clock sets the tracing data to a first output value; when the program counter is incremented, synchronously with the tracing clock sets the tracing data to a second output value; and when the program counter is loaded, synchronously with the tracing clock sets the tracing data to a third output value, and then suspends the state machine in the microprocessor and split-outputs, as the tracing data, the branch destination address or interrupt destination address loaded in the program counter.
STATELESS AND LOW-OVERHEAD DOMAIN ISOLATION USING CRYPTOGRAPHIC COMPUTING
Technologies provide domain isolation using encoded pointers to data and code. A system may be configured for decoding an encoded pointer to obtain a linear address of an encrypted code block of a first software component in memory. The first software component shares a linear address space of the memory with a plurality of software components. A processor uses the linear address to access the encrypted code block, determines a relative position of the encrypted code block within a memory slot of the linear address space, and decrypts the encrypted code block to generate a decrypted code block using a code key and a code tweak. The code tweak includes a relative position of the encrypted code block in the address space and domain metadata that uniquely identifies the software component. In some scenarios, the software component may be position independent code and may be relocatable to different address spaces.
CIRCUIT ENABLING DEVICE, NON-TRANSITORY COMPUTER READABLE MEDIUM, AND USER-SPECIFIC CIRCUIT
A circuit enabling device includes a processor configured to, when multiple predetermined setting values are written in a register in predetermined order, enable a module corresponding to the multiple predetermined setting values and the predetermined order, the multiple predetermined setting values being determined in advance for each of a multiple modules, the register being used to enable the multiple modules individually, the multiple modules being included in a user-specific circuit, the user-specific circuit including a general-purpose circuit and a user circuit, the user circuit including the multiple modules.
DISTANCE-BASED BRANCH PREDICTION AND DETECTION
Examples of techniques for distance-based branch prediction are disclosed. In one example implementation according to aspects of the present disclosure, a computer-implemented method includes: determining, by a processing system, a potential return instruction address (IA) by determining whether a relationship is satisfied between a first target IA and a first branch IA; storing a second branch IA as a return when a target IA of a second branch matches a potential return IA for the second branch; and applying the potential return IA for the second branch as a predicted target IA of a predicted branch IA stored as a return
Compound instruction set architecture for a neural inference chip
A device for controlling neural inference processor cores is provided, including a compound instruction set architecture. The device comprises an instruction memory, which comprises a plurality of instructions for controlling a neural inference processor core. Each of the plurality of instructions comprises a control operation. The device further comprises a program counter. The device further comprises at least one loop counter register. The device is adapted to execute the plurality of instructions. Executing the plurality of instructions comprises: reading an instruction from the instruction memory based on a value of the program counter; updating the at least one loop counter register according to the control operation of the instruction; and updating the program counter according to the control operation of the instruction and a value of the at least one loop counter register.
INSTRUCTION PREFETCHING
A data processing apparatus has prefetch circuitry for prefetching instructions from a data store into an instruction queue. Branch prediction circuitry is provided for predicting outcomes of branch instructions and the prefetch circuitry may prefetch instructions subsequent to the branch based on the predicted outcome. Instruction identifying circuitry identifies whether a given instruction prefetched from the data store is a predetermined type of program flow altering instruction and if so then controls the prefetch circuitry to halt prefetching of subsequent instructions into the instruction queue.
MANAGING PREFETCH REQUESTS BASED ON STREAM INFORMATION FOR PREVIOUSLY RECOGNIZED STREAMS
Managing prefetch requests associated with memory access requests includes storing stream information associated with multiple streams. At least one stream was recognized based on an initial subset of memory access requests within a previously performed set of related memory access requests and is associated with stream information that includes stream matching information and stream length information. After the previously performed set has ended, a matching memory access request is identified that matches with a corresponding matched stream based at least in part on stream matching information within stream information associated with the matched stream. In response to identifying the matching memory access request, the managing determines whether or not to perform a prefetch request for data at an address related to a data address in the matching memory access request based at least in part on stream length information within the stream information associated with the matched stream.
METHOD AND SYSTEM FOR EXECUTING NEW INSTRUCTIONS
A method for executing new instructions is provided. The method is used in a processor and includes: receiving an instruction; when the received instruction is an unknown instruction, the processor executes the following steps through a conversion program: determining whether the received instruction is a new instruction; and converting the received instruction into at least one old instruction when the received instruction is a new instruction; and simulating the execution of the received instruction by executing the at least one old instruction.
TASK SYNCHRONIZATION FOR ACCELERATED DEEP LEARNING
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element conditionally selects for task initiation a previously received wavelet specifying a particular one of the virtual channels. The conditional selecting excludes the previously received wavelet for selection until at least block/unblock state maintained for the particular virtual channel is in an unblock state. The compute element executes block/unblock instructions to modify the block/unblock state.