G06F11/348

APPARATUSES, METHODS, AND SYSTEMS TO PRECISELY MONITOR MEMORY STORE ACCESSES
20230176870 · 2023-06-08 ·

Systems, methods, and apparatuses relating to circuitry to precisely monitor memory store accesses are described. In one embodiment, a system includes a memory, a hardware processor core comprising a decoder to decode an instruction into a decoded instruction, an execution circuit to execute the decoded instruction to produce a resultant, a store buffer, and a retirement circuit to retire the instruction when a store request for the resultant from the execution circuit is queued into the store buffer for storage into the memory, and a performance monitoring circuit to mark the retired instruction for monitoring of post-retirement performance information between being queued in the store buffer and being stored in the memory, enable a store fence after the retired instruction to be inserted that causes previous store requests to complete within the memory, and on detection of completion of the store request for the instruction in the memory, store the post-retirement performance information in storage of the performance monitoring circuit.

Embedding stall and event trace profiling data in the timing stream

An electronic tracing process includes packing both stall (215) and reason (219) data into a single high priority timing information stream. An integrated circuit includes an electronic processor (110), and a tracing circuit (120) operable to pack both stall and events data into a single timing information stream. Other circuits, processes and systems are also disclosed.

Performance monitoring in a data processing apparatus capable of executing instructions at a plurality of privilege levels
09798646 · 2017-10-24 · ·

A data processing apparatus has processing circuitry which can execute instructions at one of several privilege levels. A plurality of performance monitoring circuits are included. In response to an instruction executed at a first privilege level, first configuration data can be set for controlling performance monitoring by a first subset of performance monitoring circuits. A disable control flag can be set in response to an instruction executed at a second privilege level higher than the first privilege level. If the disable control flag has a predetermined value then performance monitoring control circuitry disables performance monitoring by the first subset of performance monitoring circuits while the processing circuitry is executing instructions at the second privilege level.

Instruction and logic for tracking fetch performance bottlenecks
11256506 · 2022-02-22 · ·

A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.

Monitoring microprocessor interface information for a preset service using an address based filter

Embodiments of the present invention, which relate to the field of electronic technologies, provide a monitoring method, a monitoring apparatus, and an electronic device, which can accurately locate an error point in MPI information delivered by a system chip. The apparatus may include: an address filter, a read/write controller connected to the address filter, and a memory connected to the read/write controller, where the address filter is configured to acquire multiple pieces of MPI information, and obtain, by filtering the multiple pieces of MPI information, first MPI information corresponding to a first service that is preset; the read/write controller is configured to write, into the memory according to a time sequence of receiving the first MPI information, the first MPI information that is obtained by the address filter by filtering; and the memory is configured to store the first MPI information written by the read/write controller.

Technology for dynamically tuning processor features

A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state. In response to the usefulness state denoting the confirmed bad state, the DTU circuitry automatically disables the microarchitectural feature for the selected address for execution windows after the second execution window. Other embodiments are described and claimed.

METHOD FOR PERFORMANCE MONITORING USING A REDUNDANCY TRACKING REGISTER

Embodiments include a system for performance monitoring, the system includes a processor configured to perform a method. The method includes detecting, by a redundancy register, a change to a counter value corresponding to one of a plurality of hardware counters, wherein the redundancy register comprises a plurality of memory locations; storing, in each of the plurality of memory locations, a value indicating a change was detected for the counter value corresponding to the plurality of hardware counters, wherein each of the plurality of hardware counters map to one of the plurality of memory locations; performing read operation on a subset of the hardware counters, wherein members of the subset of the hardware counters are determined based upon the value indicating that the change was detected for the counter value corresponding to the plurality of hardware counters; and resetting the value stored in all the memory locations to a default value.

Methods and systems for analyzing and improving performance of computer codes
09753731 · 2017-09-05 · ·

Methods and systems for analyzing and improving performance of computer codes. In some embodiments, a method comprises executing, via one or more processors, program code; collecting, via the one or more processors, one or more hardware dependent metrics for the program code; identifying an execution anomaly based on the one or more hardware dependent metrics, wherein the execution anomaly is present when executing the program code; and designing a modification of the program code via the one or more processors, wherein the modification addresses the execution anomaly. In some other embodiments, a method comprises collecting one or more hardware independent metrics for program code; receiving one or more characteristics of a computing device; and estimating, based on the one or more hardware independent metrics and the one or more characteristics, a duration for execution of the program code on the computing device.

Information processing apparatus and method of collecting performance analysis data

An information processing apparatus includes a packet preprocessing unit configured to generate a packet process request when a packet is received; a CPU core configured to process the packet in response to the packet process request; a hardware element configured to generate a message including information identifying a predetermined event, in response to the predetermined event occurring in accordance with the processing of the packet, the hardware element being provided in the CPU core; and a message recording unit configured to record the message generated by the hardware element together with a count value of a timer.

APPARATUS AND METHOD FOR PAUSING PROCESSOR TRACE FOR EFFICIENT ANALYSIS
20220308980 · 2022-09-29 ·

Processor trace systems and methods are described. For example, one embodiment comprises executing instrumented code by a compiler, the instrumented code including at least one call to un-instrumented code. The compiler can determine the at least one call to un-instrumented code is a next call to be executed. A resume tracing instruction can be inserted into the instrumented code prior to the at least one call to the un-instrumented code. The resume tracing instruction can be executed to selectively add processor tracing to the at least one call to the un-instrumented code, and the at least one call to the un-instrumented code can be executed.