G06F9/30087

Pausing execution of a first machine code instruction with injection of a second machine code instruction in a processor
11635966 · 2023-04-25

Aspects of the present disclosure provide a processor having: an execution unit configured to execute machine code instructions, at least one of the machine code instructions requiring multiple cycles for its execution; instruction memory holding instructions for execution, wherein the execution unit is configured to access the memory to fetch instructions for execution; an instruction injection mechanism configured to inject an instruction into the execution pipeline during execution of the at least one machine code instruction fetched from the memory; the execution unit configured to pause execution of the at least one machine code instruction, to execute the injected instruction to termination, to detect termination of the injected instruction and to automatically recommence execution of the at least one machine code instruction on detection of termination of the injected instruction.
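
The pause, inject, and resume behaviour described above can be sketched as a small cycle-level model. This is an illustration only, not the patented mechanism: the `Instr` class, the `run` function, and the idea of keying injections by cycle number are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    cycles: int      # cycles the instruction needs in total
    done: int = 0    # cycles executed so far

    @property
    def finished(self) -> bool:
        return self.done >= self.cycles


def run(program, injections):
    """Run `program`; `injections` maps a cycle number to an Instr to inject (hypothetical)."""
    pending = dict(injections)
    trace, cycle = [], 0
    for instr in program:
        while not instr.finished:
            injected = pending.pop(cycle, None)
            if injected is not None:
                # Pause the fetched instruction and run the injected one to termination.
                while not injected.finished:
                    injected.done += 1
                    trace.append(injected.name)
                    cycle += 1
                # Termination detected: loop back and resume the paused instruction.
                continue
            instr.done += 1
            trace.append(instr.name)
            cycle += 1
    return trace


print(run([Instr("mul", 4), Instr("add", 1)], {2: Instr("dbg", 2)}))
# ['mul', 'mul', 'dbg', 'dbg', 'mul', 'mul', 'add']
```

The multi-cycle `mul` keeps its progress counter while `dbg` runs, so it resumes from cycle 2 of its own execution rather than restarting.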

Unified memory management for a multiple processor system

Various multi-processor unified memory management systems and methods are detailed herein. In embodiments detailed herein, inter-chip memory management modules may be executed by processors that are in communication via an inter-chip link. A flat memory map may be used across the multiple processors of the system. Each inter-chip memory management module may analyze memory transactions. If the memory transaction is directed to a portion of the flat memory map managed by another processor, the memory-transaction may be translated to a non-memory mapped transaction and transmitted via an inter-chip communication link.
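
A minimal sketch of the routing decision, assuming a two-processor system. The `Region` and `InterChipMemoryManager` names, the message dictionary layout, and the list standing in for the inter-chip link are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    owner: int   # id of the processor that manages this part of the flat map
    base: int
    size: int


class InterChipMemoryManager:
    def __init__(self, local_id, regions):
        self.local_id = local_id
        self.regions = regions
        self.local_memory = {}   # offset -> value for the locally managed region
        self.link_queue = []     # stands in for the inter-chip communication link

    def write(self, addr, value):
        region = next(
            (r for r in self.regions if r.base <= addr < r.base + r.size), None
        )
        if region is None:
            raise ValueError(f"address {addr:#x} is not in the flat memory map")
        offset = addr - region.base
        if region.owner == self.local_id:
            self.local_memory[offset] = value
            return "local"
        # Managed by another processor: translate the memory transaction into a
        # non-memory-mapped message and send it over the link.
        self.link_queue.append(
            {"dest": region.owner, "op": "write", "offset": offset, "value": value}
        )
        return "forwarded"
```

For example, with regions `Region(0, 0x0000, 0x1000)` and `Region(1, 0x1000, 0x1000)`, a manager running on processor 0 handles a write to `0x10` locally and forwards a write to `0x1004` as a message addressed to processor 1.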

Emulator synchronization subsystem with enhanced slave mode

Embodiments described herein include an emulator system having a synchronization subsystem comprising devices, organized in a logical hierarchy, controlling synchronization of a system clock and system components during emulation execution. The devices of the logical hierarchy communicate bi-directionally, communicating status indicators upwards and execution instructions downwards. A TCI is designated “master TCI” and others are designated “slave TCIs.” The master TCI asserts a RDY status that propagates upwards to a root node for a number of cycles. The slave TCIs execute in “infinite run” and continually assert the RDY status upwards to the root device regardless of the cycle count. The root node detects each RDY status and propagates downwards a GO instruction to the master TCI and the slave TCIs. In this way, the TCIs execute until the master TCI de-asserts RDY status. As a result, only the master TCI is manipulated to, for example, start or stop emulation or perform iterative execution.
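
The RDY/GO handshake can be illustrated with a toy model, assuming the hierarchy is flattened to a single root. The `TCI` class and `run_emulation` function are hypothetical names; a cycle budget of `None` is used here to model "infinite run".

```python
class TCI:
    def __init__(self, name, cycles=None):
        self.name = name
        self.remaining = cycles   # None models a slave in "infinite run"
        self.executed = 0

    @property
    def rdy(self):
        # Slaves always assert RDY; the master asserts it while cycles remain.
        return self.remaining is None or self.remaining > 0

    def go(self):
        self.executed += 1
        if self.remaining is not None:
            self.remaining -= 1


def run_emulation(tcis):
    """Root node: issue GO to every TCI for as long as every TCI asserts RDY."""
    if all(t.remaining is None for t in tcis):
        raise ValueError("need a master TCI with a finite cycle count")
    steps = 0
    while all(t.rdy for t in tcis):
        for t in tcis:
            t.go()
        steps += 1
    return steps
```

With one master created as `TCI("master", cycles=3)` and any number of slaves, `run_emulation` returns 3: execution stops as soon as the master de-asserts RDY, without the slaves being reconfigured.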

METHOD AND APPARATUS FOR PERFORMING REDUCTION OPERATIONS ON A PLURALITY OF ASSOCIATED DATA ELEMENT VALUES

Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. Execution includes identifying data element values that are associated with one another based on the indices, performing one or more reduction operations on the associated data element values based on the identification, and storing results of the one or more reduction operations in the output register.
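
A software sketch of an index-driven reduction, with registers modelled as Python lists. Two assumptions are made that the abstract does not state: elements sharing the same index value are treated as associated, and each group's result is written to the output slot named by that index. The function name `indexed_reduce` is hypothetical.

```python
import operator


def indexed_reduce(values, indices, op=operator.add, identity=0):
    """Reduce elements of `values` that share an index; unused output slots keep `identity`."""
    if len(values) != len(indices):
        raise ValueError("values and indices must have the same number of elements")
    out = [identity] * len(values)
    seen = set()
    for value, idx in zip(values, indices):
        if not 0 <= idx < len(out):
            raise IndexError(f"index {idx} is outside the output register")
        # The first element of a group initialises the slot; later ones are reduced into it.
        out[idx] = op(out[idx], value) if idx in seen else value
        seen.add(idx)
    return out


print(indexed_reduce([1, 2, 3, 4], [0, 1, 0, 1]))   # [4, 6, 0, 0]
```

Passing `op=max` or another binary function gives a different reduction over the same grouping.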

Messaging for a hardware acceleration system

The present disclosure relates to a messaging method for a hardware acceleration system. The method includes determining exchange message types to be exchanged with a hardware accelerator in accordance with an application performed by the hardware acceleration system. The exchange message types indicate a number of variables, and a type of the variables, of the messages. The method also includes selecting schemas from a schema database. Each message type schema indicates a precision representation of the variables of messages associated with the schema. The selected schemas correspond to the determined exchange message types. Further, the method includes configuring a serial interface of the hardware accelerator in accordance with the selected schemas, to enable a message exchange including the messages.
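
One way to picture schema selection is a lookup from message type to a binary layout. In this sketch the schema database contents, the `(variable type, count)` key, and the `SerialInterface` class are all invented for illustration; Python's standard `struct` format strings stand in for the precision representation.

```python
import struct

# Hypothetical schema database: (variable type, number of variables) -> struct layout.
SCHEMA_DB = {
    ("float", 2): "<2e",   # two half-precision floats
    ("float", 3): "<3f",   # three single-precision floats
    ("int", 4): "<4h",     # four 16-bit signed integers
}


def select_schemas(message_types):
    """Pick the schema matching each message type the application exchanges."""
    return {mt: SCHEMA_DB[mt] for mt in message_types}


class SerialInterface:
    """Stand-in for the accelerator's serial interface, configured with the selected schemas."""

    def __init__(self, schemas):
        self.schemas = schemas

    def encode(self, message_type, values):
        return struct.pack(self.schemas[message_type], *values)

    def decode(self, message_type, payload):
        return struct.unpack(self.schemas[message_type], payload)
```

An application that only exchanges `("float", 2)` messages would configure the interface with that single schema, and each message would then occupy four bytes on the link.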

Creation of Message Serializer for Event Streaming Platform
20230060957 · 2023-03-02

Processing logic may determine that an application is to produce one or more records to an event streaming platform. Processing logic may determine a data structure to contain content to be stored to the event streaming platform. Processing logic may automatically generate a serializer in view of the data structure during development of the application. During runtime, the application may use the serializer to serialize the content contained in the data structure and store the content to the one or more records of the event streaming platform.
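
A rough sketch of deriving a serializer from a data structure, assuming the structure is a Python dataclass and the record format is JSON. Neither choice comes from the abstract, and `make_serializer` and `OrderPlaced` are hypothetical names. Sending the bytes to an event streaming platform is left out.

```python
import json
from dataclasses import dataclass, fields


def make_serializer(record_type):
    """Generate a serializer from the fields declared on `record_type`."""
    names = [f.name for f in fields(record_type)]

    def serialize(record):
        return json.dumps({n: getattr(record, n) for n in names}).encode("utf-8")

    return serialize


@dataclass
class OrderPlaced:
    order_id: int
    sku: str


# Generated once, ahead of use; called at runtime for each record to be produced.
serialize_order = make_serializer(OrderPlaced)
```

Calling `serialize_order(OrderPlaced(7, "A1"))` yields UTF-8 JSON bytes containing both fields, which the application would then hand to its producer client.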

INSTRUCTION EXECUTION METHOD AND INSTRUCTION EXECUTION DEVICE
20230161594 · 2023-05-25

An instruction configuration and execution method includes the following steps. A target instruction is received through an instruction cache. The target instruction is decoded by an instruction translator. It is determined whether the target instruction has the authority to read or write the model specific register in an unprivileged state. It is determined whether the model specific register index of the target instruction corresponds to a specific model specific register, so as to order the microprocessor to perform an instruction serialization operation.

PRIORITIZATION OF THREADS IN A SIMULTANEOUS MULTITHREADING PROCESSOR CORE

A first instruction for processing by a processor core is received. Whether the instruction is a larx is determined. Responsive to determining the instruction is a larx, whether a cacheline associated with the larx is locked is determined. Responsive to determining the cacheline associated with the larx is not locked, the cacheline associated with the larx is locked and a counter associated with a first thread of the processor core is started. The first thread is processing the first instruction.
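
The decision sequence reads naturally as a short state machine. The sketch below is an assumption-laden model: the `Core` class, its return strings, and the `tick` method are hypothetical, and the abstract does not say how the counter value is later used for prioritization, so that part is omitted.

```python
class Core:
    def __init__(self):
        self.locked = {}     # cacheline -> id of the thread holding the lock
        self.counters = {}   # thread id -> cycles elapsed since its lock was taken

    def receive(self, thread, opcode, cacheline=None):
        if opcode != "larx":
            return "not-larx"
        if cacheline in self.locked:
            return "already-locked"
        # Cacheline is free: lock it and start the counter for the issuing thread.
        self.locked[cacheline] = thread
        self.counters[thread] = 0
        return "locked"

    def tick(self):
        """Advance every running counter by one cycle."""
        for thread in self.counters:
            self.counters[thread] += 1
```

A second thread issuing a larx to the same cacheline gets `"already-locked"` and no counter is started for it.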

Vector table load instruction with address generation field to access table offset value

A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage, and pass vector instructions in the instruction stream to the vector coprocessor core. The vector coprocessor core includes a register file, a plurality of execution units, and a table lookup unit. The register file includes a plurality of registers. The execution units are arranged in parallel to process a plurality of data values. The execution units are coupled to the register file. The table lookup unit is coupled to the register file in parallel with the execution units. The table lookup unit is configured to retrieve table values from one or more lookup tables stored in memory by executing table lookup vector instructions in a table lookup loop.

System and methods for tag-based synchronization of tasks for machine learning operations
11604683 · 2023-03-14

A new approach is described for supporting tag-based synchronization among different tasks of a machine learning (ML) operation. When a first task tagged with a set tag indicating that one or more subsequent tasks need to be synchronized with it is received at an instruction streaming engine, the engine saves the set tag in a tag table and transmits instructions of the first task to a set of processing tiles for execution. When a second task having an instruction sync tag indicating that it needs to be synchronized with one or more prior tasks is received at the engine, the engine matches the instruction sync tag with the set tags in the tag table to identify prior tasks that the second task depends on. The engine holds instructions of the second task until these matching prior tasks have been completed and then releases the instructions to the processing tiles for execution.
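
The hold-and-release behaviour can be sketched with a tag table that tracks which prior tasks are still outstanding for each set tag. The class layout, method names, and the use of task names as identifiers are assumptions for illustration; processing tiles are reduced to a `dispatched` list.

```python
class InstructionStreamingEngine:
    def __init__(self):
        self.tag_table = {}    # set tag -> names of tagged tasks not yet completed
        self.held = []         # (task, sync_tag) pairs waiting on prior tasks
        self.dispatched = []   # order in which tasks are released to the tiles

    def _blocked(self, task, sync_tag):
        # A task never waits on itself, even if it carries the same tag twice.
        return bool(self.tag_table.get(sync_tag, set()) - {task})

    def receive(self, task, set_tag=None, sync_tag=None):
        if set_tag is not None:
            self.tag_table.setdefault(set_tag, set()).add(task)
        if sync_tag is not None and self._blocked(task, sync_tag):
            self.held.append((task, sync_tag))
        else:
            self.dispatched.append(task)

    def complete(self, task):
        """Mark `task` done and release any held task whose prior tasks have all finished."""
        for outstanding in self.tag_table.values():
            outstanding.discard(task)
        still_held = []
        for held_task, sync_tag in self.held:
            if self._blocked(held_task, sync_tag):
                still_held.append((held_task, sync_tag))
            else:
                self.dispatched.append(held_task)
        self.held = still_held
```

If `load` arrives with set tag `t0` and `matmul` arrives with sync tag `t0`, only `load` is dispatched at first; `matmul` is released once `complete("load")` is called.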