Patent classifications
G06F9/3005
Latent modification instruction for substituting functionality of instructions during transactional execution
An instruction stream includes a transactional code region. The transactional code region includes a latent modification instruction (LMI), a next sequential instruction (NSI) following the LMI, and a set of target instructions following the NSI in program order. Each target instruction has an associated function, and the LMI at least partially specifies a substitute function for the associated function. A processor executes the LMI, the NSI, and at least one of the target instructions, employing the substitute function at least partially specified by the LMI. The LMI, the NSI, and the target instructions may be executed by the processor in sequential program order or out of order.
Complex query optimization
Disclosed is a computer-implemented method and related system to improve the efficiency of querying remote databases. The method includes receiving, from a host, a query, wherein the query is configured to retrieve a set of data from a remote database. The method also includes generating an access plan, the access plan comprising a plurality of nodes, wherein each node of the plurality of nodes includes a command. The method further includes determining capabilities of the remote database. The method includes executing the query and returning the set of data to the host.
DATA COMMUNICATION INTERFACE FOR PROCESSING DATA IN LOW POWER SYSTEMS
Improvements over existing data collection interfaces disclosed herein include, among other things, additional logic blocks (and associated timers, state machines, and registers) to off-load data collection and data processing prior to waking a microprocessor from a sleep mode. For example, an improved data collection interface collects a predetermined number of sensor values from a sensor while keeping a single communication session with the sensor active over a pin of the interface. The microprocessor remains in the sleep mode for the entire duration of the single communication session. The data collection interface can reduce the likelihood of false starts of the microprocessor by using the logic blocks to verify that the data meet preconditions prior to interrupting the microprocessor. The data collection interface can reduce the overall power consumption of a chip in which the microprocessor is integrated by a factor of at least about 2× (i.e., a 50% reduction in power consumption).
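The collect-then-verify behavior described above can be illustrated with a minimal software sketch. It is not the disclosed hardware: the function name `collect_and_maybe_wake`, the `precondition` callable, and the `wake` callback are hypothetical stand-ins for the logic blocks and the interrupt line.

```python
from typing import Callable, Iterable, List


def collect_and_maybe_wake(
    sensor: Iterable[int],
    count: int,
    precondition: Callable[[List[int]], bool],
    wake: Callable[[List[int]], None],
) -> bool:
    """Collect `count` samples in one session; wake only if the precondition holds.

    All names here are illustrative assumptions, not taken from the abstract.
    """
    samples: List[int] = []
    # One uninterrupted pass over the sensor models the single communication
    # session during which the microprocessor stays asleep.
    for value in sensor:
        samples.append(value)
        if len(samples) == count:
            break
    # Verify the data before interrupting, to avoid a false start.
    if len(samples) == count and precondition(samples):
        wake(samples)
        return True
    return False  # the microprocessor is never woken
```

A caller would pass a threshold check such as `lambda s: max(s) > 5` as the precondition, so that uninteresting batches are dropped without a wake-up.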
Non-serialized push instruction for pushing a message payload from a sending thread to a receiving thread
In at least some embodiments, a processor core executes a sending thread including a first push instruction and a second push instruction subsequent to the first push instruction in a program order. Each of the first and second push instructions requests that a respective message payload be pushed to a mailbox of a receiving thread. In response to executing the first and second push instructions, the processor core transmits respective first and second co-processor requests to a switch in the data processing system via an interconnect fabric of the data processing system. The processor core transmits the second co-processor request to the switch without regard to acceptance of the first co-processor request by the switch.
Method and apparatus for efficient execution of nested branches on a graphics processor unit
An apparatus and method for executing nested control flow instructions on a graphics processing unit (GPU). For example, one embodiment of a processor comprises: an execution unit having a plurality of channels to execute control flow instructions including fused control flow instructions comprising two or more consecutive control flow instructions fused into a single fused control flow instruction; and a branch unit to process the control flow instructions and to maintain a global counter indicating a nesting level of the control flow instructions, wherein to process a fused control flow instruction, the branch unit is to store a value N in a stack indicating a number of control flow instructions fused into the fused control flow instruction, the branch unit to subsequently read the value N from the stack upon execution of the fused control flow instruction and decrement the global counter by a value of N responsive to execution of the fused control flow instruction.
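The counter-and-stack bookkeeping in this abstract can be sketched in a few lines. This is one possible reading, assuming that entering a fused instruction raises the nesting level by N and that its later execution lowers it by the same N; the class and method names are hypothetical.

```python
class BranchUnit:
    """Toy model of the nesting bookkeeping; names are illustrative only."""

    def __init__(self) -> None:
        self.nesting_level = 0  # the global counter from the abstract
        self.fused_counts = []  # stack holding N for each open instruction

    def enter(self, n: int = 1) -> None:
        # Assumption: entering n fused control flow instructions pushes n
        # and raises the nesting level by n.
        self.fused_counts.append(n)
        self.nesting_level += n

    def exit_fused(self) -> int:
        # Read N back from the stack and decrement the global counter by N.
        n = self.fused_counts.pop()
        self.nesting_level -= n
        return n
```

With one ordinary branch and one instruction fusing three branches open, the nesting level is 4; a single `exit_fused()` call brings it back to 1 in one step rather than three.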
BLOCKING INSTRUCTION FETCHING IN A COMPUTER PROCESSOR
Blocking instruction fetching in a computer processor includes: receiving a non-branching instruction to be executed by the computer processor; determining whether executing the non-branching instruction will cause a flush; and responsive to determining that executing the non-branching instruction will cause a flush, disabling instruction fetching for the computer processor for a time, including recoding the instruction such that the recoded instruction will be interpreted by an instruction fetch unit as an unconditional branch instruction.
INDEPENDENT VECTOR ELEMENT ORDER AND MEMORY BYTE ORDER CONTROLS
Techniques are disclosed for managing vector element ordering. One technique includes receiving an assembler command from a source file, wherein the assembler command indicates a vector element order for one or more subsequent machine instructions in the source file. The technique includes determining whether the vector element order comprises a big-endian (BE) order or a little-endian (LE) order. If the vector element order comprises a BE order, the technique includes assembling one or more subsequent machine instructions and placing the machine instructions in a BE section of a file. If the vector element order comprises an LE order, the technique includes assembling one or more subsequent machine instructions and placing the machine instructions in an LE section of the file.
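The section-routing step can be sketched as a small pass over source lines. The directive spelling `.vec_order be|le`, the function name, and the default order are all assumptions made for illustration; the abstract does not name the assembler command, and no actual encoding of instructions is attempted here.

```python
def assemble_by_vector_order(lines, default_order="BE"):
    """Route instruction lines into a BE or LE section.

    `.vec_order` is a hypothetical directive; `default_order` is an
    assumption for instructions that precede any directive.
    """
    sections = {"BE": [], "LE": []}
    order = default_order
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(".vec_order"):
            parts = line.split()
            if len(parts) != 2 or parts[1].upper() not in sections:
                raise ValueError(f"bad directive: {line!r}")
            order = parts[1].upper()  # applies to subsequent instructions
        else:
            sections[order].append(line)  # "assemble" into the current section
    return sections
```

Keeping the current order as state that each directive overwrites mirrors the abstract's wording that the command governs the subsequent machine instructions.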
BRANCHING OPERATION FOR NEURAL PROCESSOR CIRCUIT
A neural processor circuit includes neural engines for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural processor circuit also includes a data processor circuit that is coupled to one or more of the neural engines. The data processor circuit receives the output data from the neural engine and generates a branching command from the output data. The neural processor circuit further includes a task manager that is coupled to the data processor circuit. The task manager receives the branching command from the data processor circuit. The task manager enqueues one of two or more segment branches according to the received branching command. The two or more segment branches are subsequent to a pre-branch task segment that includes a pre-branch task. The task manager transmits a task from the selected one of the segment branches to the data processor circuit to perform the task.
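The task manager's branch selection can be sketched as a queue operation. This is a simplified software analogy, assuming the branching command is a key that selects one list of tasks; `enqueue_branch` and the task labels are hypothetical.

```python
from collections import deque


def enqueue_branch(queue, branches, branching_command):
    """Append the tasks of the branch selected by `branching_command`.

    `branches` maps a command value to a list of task labels; this mapping
    is an illustrative assumption, not a format given in the abstract.
    """
    selected = branches[branching_command]
    queue.extend(selected)  # only the chosen segment branch is enqueued
    return selected


# Usage: the pre-branch task is already queued; command 1 picks the second branch.
task_queue = deque(["pre_branch_task"])
segment_branches = {0: ["a1", "a2"], 1: ["b1"]}
enqueue_branch(task_queue, segment_branches, 1)
```

The tasks of the branch that was not selected never enter the queue, which is the point of deciding from the output data at run time.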
Enhanced processing for communication workflows using machine-learning techniques
The present disclosure generally relates to evaluating communication workflows comprised of tasks using machine-learning techniques. More particularly, the present disclosure relates to systems and methods for generating a prediction of a task outcome of a communication workflow, generating a recommendation of one or more tasks to add to a partial communication workflow to complete the communication workflow, and generating a vector representation of a communication workflow.
METHOD FOR CONTROL-FLOW INTEGRITY PROTECTION, APPARATUS, DEVICE AND STORAGE MEDIUM
Embodiments of the present disclosure provide a method for control-flow integrity protection, including: changing preset bits of all legal target addresses of a current indirect branch instruction in a control flow of a program to be protected so that they are the same; and rewriting preset bits of a current target address of the current indirect branch instruction to be the same as the preset bits of the legal target addresses, so that the program to be protected terminates when the current target address is tampered with. By making the preset bits of all the legal target addresses of the current indirect branch instruction the same and rewriting the preset bits of the current target address to be consistent with the preset bits of the legal target addresses, traditional label comparison is replaced by the preset bit overlap operation, reducing performance overhead and improving attack defense efficiency.
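The bit-rewriting idea can be illustrated with a toy model. The mask and pattern values, the function names, and the `ControlFlowViolation` exception are hypothetical; the dictionary `code_at` stands in for memory in which only legal targets hold valid code, and the failed lookup models the fault that would terminate a real program.

```python
PRESET_MASK = 0xF   # hypothetical: the low four bits are the preset bits
PRESET_VALUE = 0x8  # hypothetical: every legal target has these bits set to 0x8


class ControlFlowViolation(Exception):
    """Stands in for the program terminating on a bad branch target."""


def force_preset_bits(addr: int) -> int:
    # Overwrite the preset bits; a legal target is left unchanged.
    return (addr & ~PRESET_MASK) | PRESET_VALUE


def indirect_branch(target: int, code_at: dict):
    rewritten = force_preset_bits(target)
    if rewritten not in code_at:
        # Models landing on an address that holds no valid code.
        raise ControlFlowViolation(hex(rewritten))
    return code_at[rewritten]()
```

In this sketch a legal target such as 0x1008 passes through unchanged, while a tampered target such as 0x3004 is rewritten to 0x3008, which holds no code, so the branch fails instead of running attacker-chosen instructions.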