Patent classifications
G06F9/322
METHODS AND APPARATUS TO INSERT PROFILING INSTRUCTIONS INTO A GRAPHICS PROCESSING UNIT KERNEL
Embodiments are disclosed for inserting profiling instructions into graphics processing unit (GPU) kernels. An example apparatus includes instructions, and at least one processor to execute the instructions to determine whether a GPU supports modification of entry point addresses, detect a first entry point address and a second entry point address of an original GPU kernel, create a corresponding instrumented GPU kernel from the original GPU kernel based on the determination by inserting at least one of first profiling initialization instructions or first jump instructions at the first entry point address of the instrumented GPU kernel, inserting at least one of second profiling initialization instructions or second jump instructions at the second entry point address of the instrumented GPU kernel, and inserting profiling measurement instructions into the instrumented GPU kernel.
Apparatus and method for controlling execution of instructions
An apparatus has processing circuitry to execute a sequence of instructions, an integer storage element to store an integer value for access by the processing circuitry, and a capability storage element for storing a capability for access by the processing circuitry. A capability usage storage is then used to store capability usage information. The processing circuitry is responsive to execution of at least one instruction in the sequence of instructions to generate, in dependence on the capability usage information, a result to be stored in a destination storage element, when the capability usage information identifies a capability state, the result is generated as a capability, and the capability storage element is selected as the destination storage element, and when the capability usage information identifies a non-capability state, the result is generated as an integer value, and the integer storage element is selected as the destination storage element.
SIGNAL HANDLING BETWEEN PROGRAMS ASSOCIATED WITH DIFFERENT ADDRESSING MODES
Techniques for signal handling between programs associated with different addressing modes in a computer system are described herein. An aspect includes, based on a signal occurring during execution of a first program in a first runtime environment, wherein the first program and the first runtime environment are associated with a first addressing mode, invoking a first signal exit routine associated with the first addressing mode. Another aspect includes allocating a signal information area (SIA) by the first signal exit routine. Another aspect includes calling a second signal exit routine associated with a second addressing mode that is different from the first addressing mode with an address of the SIA. Another aspect includes allocating a mirror SIA by the second signal exit routine. Another aspect includes handling the signal, and resuming execution based on the handling of the signal.
FETCH STAGE HANDLING OF INDIRECT JUMPS IN A PROCESSOR PIPELINE
Systems and methods are disclosed for fetch stage handling of indirect jumps in a processor pipeline. For example, a method includes detecting a sequence of instructions fetched by a processor core, wherein the sequence of instructions includes a first instruction, with a result that depends on an immediate field of the first instruction and a program counter value, followed by a second instruction that is an indirect jump instruction; responsive to detection of the sequence of instructions, preventing an indirect jump target predictor circuit from generating a target address prediction for the second instruction; and, responsive to detection of the sequence of instructions, determining a target address for the second instruction before the first instruction is issued to an execution stage of a pipeline of the processor core.
Methods and apparatus to insert profiling instructions into a graphics processing unit kernel
Embodiments are disclosed for inserting profiling instructions into graphics processing unit (GPU) kernels. An example apparatus includes an entry point detector to detect a first entry point address and a second entry point address of an original GPU kernel, the first entry point address including a first entry point instruction, the second entry point address including a second entry point instruction. An instruction inserter is to create a corresponding instrumented GPU kernel from the original GPU kernel by inserting first profiling initialization instructions at a first address of the instrumented GPU kernel, the instruction inserter to insert profiling measurement instructions into the instrumented GPU kernel. An entry point adjuster is to adjust a list of entry points of the instrumented GPU kernel to replace the first entry point address with the first address and the second entry point address with the second address.
Arithmetic processing apparatus and control method for arithmetic processing apparatus
An apparatus includes an instruction issuer that issues an instruction; and a cache including a cache data memory and a cache tag including cache entries, and a cache controller configured to perform cache-hit judgement, in response to a memory-access instruction issued from the instruction issuer, based on an address of the memory-access instruction and configured to issue a memory-access request to a memory in a case where the cache-hit judgement is a cache miss, wherein the cache controller registers, when issuing the memory-access request, data obtained by the memory-access request in the cache data memory, and registers provisional registration information of a provisional registration state indicating that cache registration is performed by execution of a speculative memory-access instruction in the cache tag, and judges as a speculative entry cache miss and issues the memory-access request.
Method and device for updating a program
A method updating a program in a flash memory includes executing a first image of the program while an address space of the program is imaged onto the memory blocks, which are operated in a single-level mode; copying part of the first image from a range within the address space, which is imaged onto one of the blocks, into a backup block; setting the one of the blocks to a multi-level mode; while the address range is imaged onto the backup block, programming the one of the blocks with part of the second image besides for the part of the first image; switching the address range back to the block while the block remains in the multi-level mode; as long as the second image is incomplete, repeating the copying, programming, and switching with further parts of the second image; and subsequently executing the second image instead of the first image.
Method of enforcing control flow integrity in a monolithic binary using static analysis
Method of enforcing control flow integrity (CFI) for a monolithic binary using static analysis by: marking evaluated functions as core functions by a chosen heuristic or empirically; generating a binary call graph; merging all function nodes of core functions as a node of highest privilege (set 0); merging all leaf functions in one node without privilege (set n); merging all nodes without privilege that reach functions of privilege i and setting the merged node privilege to i+1; checking if there is a node without privilege besides a trivial function; in a positive case, returning to merging all nodes without privilege and setting the merged node privilege to i+1; and in a negative case, setting the privilege of trivial functions as i+2.
Branch predictor
An apparatus comprises processing circuitry to perform data processing in response to instructions fetched from an instruction cache, an instruction prefetcher to speculatively prefetch instructions into the instruction cache, and a branch predictor having at least one branch prediction structure to store branch prediction data for predicting at least one branch property of an instruction fetched for processing by the processing circuitry. On prefetching of a given instruction into the instruction cache by the instruction prefetcher, the branch predictor is configured to perform a prefetch-triggered update of the branch prediction data based on information derived from the given instruction prefetched by the instruction prefetcher. This can help to improve performance, especially for workloads with a high branch density and large branch re-reference interval.
METHODS AND APPARATUS TO INSERT PROFILING INSTRUCTIONS INTO A GRAPHICS PROCESSING UNIT KERNEL
Embodiments are disclosed for inserting profiling instructions into graphics processing unit (GPU) kernels. An example apparatus includes instructions, and at least one processor to execute the instructions to determine whether a GPU supports modification of entry point addresses, detect a first entry point address and a second entry point address of an original GPU kernel, create a corresponding instrumented GPU kernel from the original GPU kernel based on the determination by inserting at least one of first profiling initialization instructions or first jump instructions at the first entry point address of the instrumented GPU kernel, inserting at least one of second profiling initialization instructions or second jump instructions at the second entry point address of the instrumented GPU kernel, and inserting profiling measurement instructions into the instrumented GPU kernel.