G06F9/3865

DYNAMIC INSTRUMENTATION VIA USER-LEVEL MECHANISMS
20220214886 · 2022-07-07

In one embodiment, a method includes accessing a loaded but paused source process executable and disassembling the source process executable to identify a system call to be instrumented and an adjacent relocatable instruction. Instrumenting the system call includes building a trampoline for the system call that includes a check flag instruction at or near an entry point to the trampoline and two areas of the trampoline that are selectively executed according to results of the check flag instruction. Building a first area of the trampoline includes providing instructions to execute a relocated copy of the adjacent relocatable instruction and return flow to an address immediately following the adjacent relocatable instruction. Building a second area of the trampoline includes providing instructions to invoke at least one handler associated with executing a relocated copy of the system call and return flow to an address immediately following the system call.
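The abstract describes a trampoline whose entry-point flag check selects one of two areas, each with a different return address. The following is a minimal Python model of that control flow only. It does not disassemble or patch real machine code. The abstract does not say which flag value selects which area, so this sketch assumes that a cleared flag selects the first area and a set flag selects the second. The class `Trampoline`, its method `enter`, and the field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]              # toy register/state file
Action = Callable[[State], None]    # models the effect of one instruction or handler


@dataclass
class Trampoline:
    """Toy model of the trampoline's control flow (not real machine code)."""
    relocated_adjacent: Action      # relocated copy of the adjacent relocatable instruction
    relocated_syscall: Action       # relocated copy of the system call
    after_adjacent: int             # address immediately following the adjacent instruction
    after_syscall: int              # address immediately following the system call
    handlers: List[Action] = field(default_factory=list)

    def enter(self, state: State, flag: bool) -> int:
        """Run the trampoline and return the address where execution resumes."""
        # Check-flag step at the entry point (assumed: False -> first area).
        if not flag:
            # First area: run the relocated adjacent instruction, then return
            # to the address right after the original adjacent instruction.
            self.relocated_adjacent(state)
            return self.after_adjacent
        # Second area: invoke the handler(s), run the relocated system call,
        # then return to the address right after the original system call.
        for handler in self.handlers:
            handler(state)
        self.relocated_syscall(state)
        return self.after_syscall
```

Keeping the two areas as separate code paths behind a single flag check mirrors the abstract's structure: the same patched site can either pass through unchanged or route through the instrumentation handlers.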

Apparatuses, methods, and systems to precisely monitor memory store accesses

Systems, methods, and apparatuses relating to circuitry to precisely monitor memory store accesses are described. In one embodiment, a system includes a memory; a hardware processor core comprising a decoder to decode an instruction into a decoded instruction, an execution circuit to execute the decoded instruction to produce a resultant, a store buffer, and a retirement circuit to retire the instruction when a store request for the resultant from the execution circuit is queued into the store buffer for storage into the memory; and a performance monitoring circuit. The performance monitoring circuit is to mark the retired instruction for monitoring of post-retirement performance information, covering the interval between the store request being queued in the store buffer and being stored in the memory; enable insertion of a store fence after the retired instruction, which causes previous store requests to complete within the memory; and, on detecting completion of the store request for the instruction in the memory, store the post-retirement performance information in storage of the performance monitoring circuit.
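The key idea is that a store instruction retires as soon as its request is queued, so its remaining latency is only visible after retirement. The cycle-level Python model below illustrates this. It assumes that the store buffer drains one entry per cycle, and it models the fence as draining every queued store before continuing. The class `StoreMonitorModel` and its fields are hypothetical names, and the model does not describe any real performance-monitoring interface.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class StoreMonitorModel:
    """Toy model: stores retire when queued; marked stores record post-retirement latency."""
    cycle: int = 0
    # Each entry: (address, value, marked?, cycle at which it was queued/retired)
    store_buffer: Deque[Tuple[int, int, bool, int]] = field(default_factory=deque)
    memory: Dict[int, int] = field(default_factory=dict)
    pmu_records: List[Dict[str, int]] = field(default_factory=list)

    def tick(self) -> None:
        """Advance one cycle and drain the oldest queued store into memory."""
        self.cycle += 1
        if self.store_buffer:
            addr, value, marked, queued_at = self.store_buffer.popleft()
            self.memory[addr] = value
            if marked:
                # Completion detected: save the post-retirement latency.
                self.pmu_records.append(
                    {"addr": addr, "post_retirement_cycles": self.cycle - queued_at})

    def retire_store(self, addr: int, value: int, marked: bool = False) -> None:
        # The instruction counts as retired once its store request is queued.
        self.store_buffer.append((addr, value, marked, self.cycle))
        if marked:
            self.store_fence()

    def store_fence(self) -> None:
        # Fence: every earlier store request must complete in memory.
        while self.store_buffer:
            self.tick()
```

If three unmarked stores are queued ahead of a marked one in the same cycle, the marked store in this model reports 4 post-retirement cycles, because the three older stores drain first.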

SYSTEM AND METHOD FOR HANDLING FLOATING POINT HARDWARE EXCEPTION

A method includes receiving input data at a floating point arithmetic operating unit, wherein the floating point arithmetic operating unit is configured to perform a floating point arithmetic operation on the input data to generate an output result. The method also includes determining whether the output result will cause a floating point hardware exception in response to the floating point arithmetic operation on the input data. The method further includes converting a value of the output result to a modified value in response to determining that the output result will cause the floating point hardware exception, wherein the modified value eliminates the floating point hardware exception in response to the floating point arithmetic operation on the input data.
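This output-side approach can be approximated in software for multiplication overflow. Python does not trap on float overflow; an overflowing product simply becomes infinite. The sketch therefore checks the result before returning it, which is a stand-in for the hardware detecting the condition before the exception is raised. The replacement policy, saturating to the largest finite magnitude with the correct sign, is an assumption for illustration, and `multiply_without_overflow` is a hypothetical name.

```python
import math
import sys


def multiply_without_overflow(a: float, b: float) -> float:
    """Multiply, replacing an overflowing result with a finite modified value."""
    result = a * b
    # Finite operands producing an infinite product indicate overflow.
    if math.isinf(result) and math.isfinite(a) and math.isfinite(b):
        # Hypothetical policy: saturate to the largest finite double, keeping the sign.
        return math.copysign(sys.float_info.max, result)
    return result
```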

SYSTEM AND METHOD FOR HANDLING FLOATING POINT HARDWARE EXCEPTION

A method includes receiving input data at a floating point arithmetic operating unit, wherein the floating point arithmetic operating unit is configured to perform a floating point arithmetic operation on the input data. The method includes determining, prior to performing the floating point arithmetic operation, whether the received input data is a qNaN (quiet not-a-number) or an sNaN (signaling not-a-number). The method also includes converting a value of the received input data to a modified value prior to performing the floating point arithmetic operation if the received input data is either a qNaN or an sNaN, wherein the converting eliminates the special handling that the floating point arithmetic operation would otherwise require for a qNaN or sNaN input.
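A qNaN and an sNaN differ only in one bit. The sketch below classifies IEEE 754 binary64 bit patterns before an operation and replaces any NaN operand with a modified value. It works on raw 64-bit integers so that an sNaN pattern is not quietly altered by a float conversion. It assumes the common convention that a set most significant fraction bit marks a quiet NaN. The replacement value (+0.0) and the function names are hypothetical.

```python
import struct

EXPONENT_MASK = 0x7FF << 52        # binary64 exponent field
FRACTION_MASK = (1 << 52) - 1      # binary64 trailing significand field
QUIET_BIT = 1 << 51                # most significant fraction bit (set => quiet NaN)
REPLACEMENT_BITS = 0x0000000000000000  # hypothetical modified value: +0.0


def classify_nan(bits: int) -> str:
    """Return 'qnan', 'snan', or 'not-nan' for a 64-bit pattern."""
    is_nan = (bits & EXPONENT_MASK) == EXPONENT_MASK and (bits & FRACTION_MASK) != 0
    if not is_nan:
        return "not-nan"
    return "qnan" if bits & QUIET_BIT else "snan"


def sanitize_operand(bits: int) -> int:
    """Replace a qNaN or sNaN operand before the operation runs."""
    return REPLACEMENT_BITS if classify_nan(bits) != "not-nan" else bits


def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit pattern as a Python float."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

An all-ones exponent with a zero fraction is infinity, not a NaN, so `classify_nan(0x7FF0000000000000)` falls through to `"not-nan"`.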

SYSTEM AND METHOD FOR HANDLING FLOATING POINT HARDWARE EXCEPTION

A method includes receiving first input data and second input data at a floating point arithmetic operating unit, wherein the first input data and the second input data are operands of a floating point arithmetic operation, and wherein the floating point arithmetic operating unit is configured to perform the floating point arithmetic operation on the first input data and the second input data. The method further includes determining, prior to performing the floating point arithmetic operation, whether the first input data is a qNaN (quiet not-a-number) or an sNaN (signaling not-a-number). If the first input data is either a qNaN or an sNaN, a value of the first input data is converted to a modified value prior to performing the floating point arithmetic operation, wherein the converting eliminates the special handling that the floating point arithmetic operation would otherwise require for a qNaN or sNaN first operand.

SYSTEM AND METHOD FOR HANDLING FLOATING POINT HARDWARE EXCEPTION

A method includes receiving input data at a floating point arithmetic operating unit, wherein the floating point arithmetic operating unit is configured to perform a floating point arithmetic operation on the input data. The method also includes determining, prior to performing the floating point arithmetic operation, whether the received input data is positive infinity or negative infinity. The method further includes converting a value of the received input data to a modified value prior to performing the floating point arithmetic operation if the received input data is positive infinity or negative infinity.
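Infinite inputs matter because, for example, subtracting infinity from infinity is an invalid operation in IEEE 754 and produces a NaN. The sketch below converts infinite inputs before the operation runs. It assumes the modified value is the largest finite double with the same sign; the abstract does not state which value is used. Both function names are hypothetical.

```python
import math
import sys


def clamp_infinity(x: float) -> float:
    """Convert +/-infinity to a finite modified value before an operation."""
    if math.isinf(x):
        # Hypothetical choice of modified value: largest finite magnitude, same sign.
        return math.copysign(sys.float_info.max, x)
    return x


def subtract_with_clamped_inputs(a: float, b: float) -> float:
    """Subtract after replacing any infinite operand."""
    return clamp_infinity(a) - clamp_infinity(b)
```

With unmodified inputs, `float("inf") - float("inf")` evaluates to NaN. With clamped inputs, the subtraction is between two equal finite values and yields 0.0.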

Multiple multithreaded processors with shared data cache

A multi-core processor configured to improve processing performance in certain computing contexts is provided. The multi-core processor includes multiple processing cores that implement barrel threading to execute multiple instruction threads in parallel while minimizing the effect of an idle instruction or thread on processor performance. The cores can also share a common data cache, which minimizes the need for expensive and complex mechanisms to mitigate inter-cache coherency issues. Barrel threading can minimize the latency impact associated with a shared data cache. In some examples, the multi-core processor can also include a serial processor configured to execute single-threaded code that may not perform satisfactorily in a processing environment that employs barrel threading.
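The latency-hiding argument can be illustrated with a toy cycle model of one barrel-threaded core. The model assumes that each cycle the core rotates through threads round-robin and issues from the first thread that is not still waiting on its previous instruction, so a stalled thread does not occupy the issue slot. The shared data cache is not modeled. The instruction names and latencies are illustrative assumptions, and `barrel_run` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Issue = Optional[Tuple[int, str]]  # (thread id, instruction) or None for an empty slot


def barrel_run(threads: List[List[str]], latency: Dict[str, int]) -> List[Tuple[int, Issue]]:
    """Return a cycle-by-cycle issue trace for one barrel-threaded core (toy model)."""
    n = len(threads)
    pcs = [0] * n        # next instruction index per thread
    ready_at = [0] * n   # first cycle at which each thread may issue again
    pointer = 0          # round-robin starting thread
    cycle = 0
    trace: List[Tuple[int, Issue]] = []
    while any(pcs[t] < len(threads[t]) for t in range(n)):
        issued: Issue = None
        for k in range(n):
            t = (pointer + k) % n
            # Skip threads that are finished or still waiting on a prior instruction.
            if pcs[t] < len(threads[t]) and ready_at[t] <= cycle:
                op = threads[t][pcs[t]]
                pcs[t] += 1
                ready_at[t] = cycle + latency.get(op, 1)  # default latency: 1 cycle
                issued = (t, op)
                pointer = (t + 1) % n
                break
        trace.append((cycle, issued))
        cycle += 1
    return trace
```

With a single thread `["load", "add"]` and a 3-cycle load, cycles 1 and 2 issue nothing. Adding a second thread `["add", "add"]` fills those two slots, so every cycle in the trace issues an instruction.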

Dynamic instrumentation via user-level mechanisms
11288075 · 2022-03-29

In one embodiment, a method includes accessing a loaded but paused source process executable and disassembling the source process executable to identify a system call to be instrumented and an adjacent relocatable instruction. Instrumenting the system call includes building a trampoline for the system call that includes a check flag instruction at or near an entry point to the trampoline and two areas of the trampoline that are selectively executed according to results of the check flag instruction. Building a first area of the trampoline includes providing instructions to execute a relocated copy of the adjacent relocatable instruction and return flow to an address immediately following the adjacent relocatable instruction. Building a second area of the trampoline includes providing instructions to invoke at least one handler associated with executing a relocated copy of the system call and return flow to an address immediately following the system call.

System and method for handling floating point hardware exception

A method includes receiving input data at a floating point (FP) arithmetic operating unit configured to perform an FP arithmetic operation on the input data. The method further includes determining, prior to performing the FP arithmetic operation, whether the received input data will generate an FP hardware exception in response to the FP arithmetic operation. The method also includes converting a value of the received input data to a modified value in response to determining that the received input data will generate the FP hardware exception, wherein the converting eliminates generation of the FP hardware exception in response to the FP arithmetic operation on the input data.
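Square root is a simple example of an input that is known in advance to cause an exception: the square root of a negative number is an invalid operation. The sketch below performs the check before the operation and converts the input. The modified value (0.0) is a hypothetical choice, and `guarded_sqrt` is a hypothetical name. Python's `math.sqrt` raises `ValueError` for negative inputs, and that exception stands in for the FP hardware exception here.

```python
import math


def guarded_sqrt(x: float) -> float:
    """Square root with a pre-operation check that avoids the invalid-operation case."""
    # Determine before the operation whether this input would raise an exception.
    if x < 0.0:          # note: -0.0 < 0.0 is False, and sqrt(-0.0) is valid
        x = 0.0          # hypothetical modified value
    return math.sqrt(x)
```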

INTERRUPTIBLE AND RESTARTABLE MATRIX MULTIPLICATION INSTRUCTIONS, PROCESSORS, METHODS, AND SYSTEMS

A processor of one aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, a second memory location of a second source matrix, and a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. In response to the matrix multiplication instruction, the execution unit is to multiply a portion of the first and second source matrices prior to an interruption and to store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate the amount of progress, in multiplying the first and second source matrices and storing the corresponding result data to the third memory location, that was completed prior to the interruption.
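The sketch below shows how a completion progress indicator lets an interrupted matrix multiplication resume without redoing finished work. It assumes the indicator is simply the count of result elements already computed in row-major order, and it models the interruption as a per-call element budget. The abstract does not specify the indicator's encoding. `matmul_resumable` and its parameters are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def matmul_resumable(a: Matrix, b: Matrix, c: Matrix,
                     progress: int = 0, budget: Optional[int] = None) -> int:
    """Compute c = a x b one result element at a time, resuming from `progress`.

    `progress` models the completion progress indicator: how many result
    elements (row-major order) are already stored in `c`. `budget` models an
    interruption after that many elements; None means run to completion.
    Returns the updated progress indicator.
    """
    rows, cols, inner = len(a), len(b[0]), len(b)
    total = rows * cols
    done_this_call = 0
    while progress < total:
        if budget is not None and done_this_call >= budget:
            return progress  # "interrupted": report how far we got
        i, j = divmod(progress, cols)
        c[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
        progress += 1
        done_this_call += 1
    return progress
```

For example, with `a = [[1, 2], [3, 4]]` and `b = [[5, 6], [7, 8]]`, a first call with `budget=3` returns 3. A second call that passes that value back in completes the last element, and `c` becomes `[[19, 22], [43, 50]]`.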