Patent classifications
G06F9/322
Verified Stack Trace Generation And Accelerated Stack-Based Analysis With Shadow Stacks
A verified stack trace can be generated by utilizing information contained in a shadow stack, such as a hardware protected duplicate stack implemented for malware prevention and computer security. The shadow stack contains return addresses which are obtainable without requiring an unwinding of the traditional call stack. As such, triaging based on return address information can be performed more quickly and more efficiently, and with a reduced utilization of processing resources. Additionally, the generation of a verified stack trace can be performed, with such a verified stack trace containing return addresses that are known to be correct and not corrupted. The return addresses can either be read from the traditional call stack, or derived therefrom, and then verified by comparison to corresponding return addresses from the shadow stack, or they can be read directly from the shadow stack.
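As an illustrative sketch only (not the patented implementation), the comparison step described above can be modeled in Python. The function names and the representation of both stacks as lists of integer return addresses, ordered oldest call first, are assumptions made for this example.

```python
from typing import List, Optional


def verified_stack_trace(call_stack_returns: List[int],
                         shadow_stack: List[int]) -> Optional[List[int]]:
    """Return the trace only if every return address matches the shadow stack.

    Both lists are hypothetical models: return addresses already extracted
    from the traditional call stack, and the trusted copies held in the
    shadow stack, in the same order.
    """
    if len(call_stack_returns) != len(shadow_stack):
        return None  # depth mismatch: the call stack cannot be trusted
    for candidate, trusted in zip(call_stack_returns, shadow_stack):
        if candidate != trusted:
            return None  # corrupted or tampered return address
    return list(call_stack_returns)


def trace_from_shadow_stack(shadow_stack: List[int]) -> List[int]:
    """Alternative path: read addresses directly, with no unwinding.

    Returns the addresses most recent call first.
    """
    return list(reversed(shadow_stack))
```

The second function reflects the faster path mentioned in the abstract, where return addresses are taken straight from the shadow stack instead of being recovered by unwinding.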
Register restoring branch instruction
There is provided an apparatus that includes processing circuitry for performing processing operations specified by program instructions and a target register that stores a target program address. A value register stores a data value. There is also provided an architectural register and an instruction decoder that decodes the program instructions to generate control signals to control the processing circuitry to perform the processing operations. The instruction decoder includes branch instruction decoding circuitry that decodes a register restoring branch instruction to cause the processing circuitry to determine whether the target program address and the data value are valid. If the target program address and the data value are both valid then the processing circuitry is caused to branch to the target program address and update the architectural register to store the data value. Otherwise an error action is taken.
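The behavior of the register restoring branch instruction can be sketched as a small software model. The abstract does not say how validity is determined, so the `valid` flag on each value, and the names `TaggedValue`, `Core`, and `InvalidRestoreError`, are hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass


class InvalidRestoreError(Exception):
    """Models the 'error action' taken when either value is invalid."""


@dataclass
class TaggedValue:
    value: int
    valid: bool  # assumption: validity is modeled as a simple flag


@dataclass
class Core:
    pc: int = 0        # program counter
    arch_reg: int = 0  # the architectural register to be restored


def register_restoring_branch(core: Core, target: TaggedValue,
                              data: TaggedValue) -> None:
    """Branch and restore the register only if both inputs are valid."""
    if target.valid and data.valid:
        core.pc = target.value       # branch to the target program address
        core.arch_reg = data.value   # update the architectural register
    else:
        raise InvalidRestoreError("target address or data value is invalid")
```

In this model the branch and the register update happen together or not at all, which matches the either/or behavior the abstract describes.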
Apparatus and method for an early page predictor for a memory paging subsystem
An apparatus and method for early page address prediction. For example, one embodiment of a processor comprises: an instruction fetch circuit to fetch a load instruction; a decoder to decode the load instruction; execution circuitry to execute the load instruction to perform a load operation, the execution circuitry including an address generation unit (AGU) to generate an effective address to be used for the load operation; and early page prediction (EPP) circuitry to use one or more attributes associated with the load instruction to predict a physical page address for the load instruction simultaneously with the AGU generating the effective address and/or prior to generation of the effective address.
Call stack sampling
Apparatuses and methods of their operation are disclosed. A call stack is maintained which comprises subroutine information relating to subroutines which have been called during data processing operations and have not yet returned. A stack pointer is indicative of an extremity of the call stack associated with a most recently called subroutine which has been called during the data processing operations and has not yet returned. Call stack sampling can be carried out with reference to the stack pointer. A tide mark pointer is maintained, which is indicative of a value which the stack pointer had when the call stack sampling procedure was last completed. The call stack sampling procedure comprises retrieving subroutine information from the call stack indicated between the value of the tide mark pointer and the current value of the stack pointer. More efficient call stack sampling is thereby supported, in that only modifications to the call stack need be sampled.
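A minimal sketch of tide mark sampling follows, assuming the call stack is modeled as a Python list that grows upward, with the stack pointer equal to its length. The class name and the rule that lowers the tide mark when a subroutine returns below it are assumptions of this example; the abstract itself only states that the tide mark records the stack pointer value at the last completed sample.

```python
from typing import List


class SampledCallStack:
    def __init__(self) -> None:
        self.frames: List[str] = []  # subroutine information, oldest first
        self.tide_mark = 0           # stack pointer value at the last sample

    @property
    def stack_pointer(self) -> int:
        return len(self.frames)

    def call(self, name: str) -> None:
        self.frames.append(name)

    def ret(self) -> None:
        self.frames.pop()
        # Assumption: frames below the tide mark that have returned are no
        # longer covered by the previous sample, so the mark is lowered.
        self.tide_mark = min(self.tide_mark, self.stack_pointer)

    def sample(self) -> List[str]:
        """Retrieve only the entries between the tide mark and the stack pointer."""
        new_frames = self.frames[self.tide_mark:self.stack_pointer]
        self.tide_mark = self.stack_pointer
        return new_frames
```

Each call to `sample` returns only what has changed since the previous sample, which is the efficiency gain the abstract points to.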
Computation engine with extract instructions to minimize memory access
In an embodiment, a computation engine may offload work from a processor (e.g. a CPU) and efficiently perform computations such as those used in LSTM and other workloads at high performance. In an embodiment, the computation engine may perform computations on input vectors from input memories in the computation engine, and may accumulate results in an output memory within the computation engine. The input memories may be loaded with initial vector data from memory, incurring the memory latency that may be associated with reading the operands. Compute instructions may be performed on the operands, generating results in an output memory. One or more extract instructions may be supported to move data from the output memory to the input memory, permitting additional computation on the data in the output memory without moving the results to main memory.
METHODS AND SYSTEMS TO TRACK KERNEL CALLS USING A DISASSEMBLER
This disclosure, and the exemplary embodiments described herein, provide methods and systems to trace/verify kernel calls of interest operatively associated with an operating system platform of a device. According to an exemplary embodiment, the mount/unmount kernel call associated with a Linux operating system platform is traced/verified to initiate an incremental backup of a memory of a device during the execution of the mount/unmount kernel call.
Instruction memory
Provided are systems and methods for implementing a memory for an integrated circuit device. In various examples, the integrated circuit can operate the memory as a FIFO, where each address in the FIFO is directly addressable. The integrated circuit can include a first register for storing a head pointer and a second register for storing a tail pointer. When new data is written to the memory, the data can be written starting at the tail pointer location, without the tail pointer being modified. The tail pointer can be incremented using write transactions received from external to the integrated circuit.
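The split between writing data and advancing the tail pointer can be sketched as follows. The class and method names are hypothetical, the memory is modeled as a fixed-size Python list, and overflow checking is omitted for brevity; none of these details come from the abstract.

```python
from typing import List, Optional


class InstructionFifo:
    def __init__(self, size: int) -> None:
        self.mem: List[Optional[str]] = [None] * size
        self.head = 0  # first register: next entry to be consumed
        self.tail = 0  # second register: end of the committed entries

    def write_data(self, words: List[str]) -> None:
        """Write starting at the tail location WITHOUT moving the tail."""
        for i, word in enumerate(words):
            self.mem[(self.tail + i) % len(self.mem)] = word

    def increment_tail(self, count: int) -> None:
        """Separate external write transaction that commits `count` entries."""
        self.tail = (self.tail + count) % len(self.mem)

    def read(self) -> Optional[str]:
        """Consume one committed entry, or return None if none are committed."""
        if self.head == self.tail:
            return None
        word = self.mem[self.head]
        self.head = (self.head + 1) % len(self.mem)
        return word
```

Because the tail moves only on an explicit increment, a reader in this model never sees entries that have been written but not yet committed.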
PROCESSOR INCLUDING DEBUG UNIT AND DEBUG SYSTEM
The present disclosure discloses a debug unit, comprising: a write register configured to store kernel write data written by a kernel of a processor, wherein the processor is communicatively coupled to a debugger configured to read the kernel write data, wherein the kernel write data is associated with a kernel write flag bit to indicate data validity of the kernel write data; and a control unit including circuitry configured to control access to the write register by the kernel of the processor and the debugger based on data validity indicated by the kernel write flag bit. The present disclosure further discloses a corresponding processor including the debug unit, a corresponding debugger communicatively coupled to the processor, and a corresponding debug system including the processor coupled to the debugger.
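One plausible access policy for the write register is sketched below. The abstract says only that access is controlled based on the validity indicated by the flag bit; the specific rule used here (the kernel may write only when the flag is clear, and a debugger read clears the flag) and all names are assumptions of this example.

```python
from typing import Optional


class WriteRegister:
    def __init__(self) -> None:
        self.data = 0
        self.valid = False  # models the kernel write flag bit

    def kernel_write(self, value: int) -> bool:
        """Kernel side: refuse to overwrite data the debugger has not read."""
        if self.valid:
            return False
        self.data = value
        self.valid = True
        return True

    def debugger_read(self) -> Optional[int]:
        """Debugger side: return data only when it is marked valid."""
        if not self.valid:
            return None
        self.valid = False  # assumed: reading consumes the data
        return self.data
```

Under this assumed policy the flag bit acts as a one-entry handshake between the processor kernel and the debugger.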
METHODS AND APPARATUS TO INSERT PROFILING INSTRUCTIONS INTO A GRAPHICS PROCESSING UNIT KERNEL
Embodiments are disclosed for inserting profiling instructions into graphics processing unit (GPU) kernels. An example apparatus includes an entry point detector to detect a first entry point address and a second entry point address of an original GPU kernel, the first entry point address including a first entry point instruction, the second entry point address including a second entry point instruction. An instruction inserter is to create a corresponding instrumented GPU kernel from the original GPU kernel by inserting first profiling initialization instructions at a first address of the instrumented GPU kernel, the instruction inserter to insert profiling measurement instructions into the instrumented GPU kernel. An entry point adjuster is to adjust a list of entry points of the instrumented GPU kernel to replace the first entry point address with the first address and the second entry point address with the second address.
INPUT/OUTPUT DATA TRANSFORMATIONS WHEN EMULATING NON-TRACED CODE WITH A RECORDED EXECUTION OF TRACED CODE
Transforming input data to enable execution of second executable code using trace data gathered during execution of first executable code. A trace of an execution of the first code is accessed. The trace stores data of an input that was consumed by first executable instructions of the first code. It is determined that the stored data of the input is usable as an input to second executable instructions of the second code. A difference in size/format of the stored data as used by the first instructions, compared to an input size/format expected by the second executable instructions, is identified. Based on the identified difference, a data transformation is determined that would enable the second instructions to consume the stored data. Execution of the second instructions is emulated using the stored data, including projecting the data transformation to enable the second instructions to consume the stored data.