G06F9/30181

Non-cached loads and stores in a system having a multi-threaded, self-scheduling processor
11579888 · 2023-02-14 · ·

Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute instructions; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In a representative embodiment, the processor core is further adapted to execute a non-cached load instruction to designate a general purpose register rather than a data cache for storage of data received from a memory circuit. The core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, and to generate one or more work descriptor data packets to another circuit for execution of corresponding execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.

Highly Scalable Quantum Control
20230042521 · 2023-02-09 ·

A system comprising a quantum control data exchange circuit that enables a large, variable number of pulse generation circuits to exchange data within the coherence time of a plurality of quantum elements to enable feedback-based quantum control of a large, variable number of quantum elements.

Configurable delay insertion in compiled instructions
11556342 · 2023-01-17 · ·

Techniques are disclosed for utilizing configurable delays in an instruction stream. A set of instructions to be executed on a set of engines are generated. The set of engines are distributed between a set of hardware elements. A set of configurable delays are inserted into the set of instructions. Each of the set of configurable delays includes an adjustable delay amount that delays an execution of the set of instructions on the set of engines. The adjustable delay amount is adjustable by a runtime application that facilitates the execution of the set of instructions on the set of engines. The runtime application is configured to determine a runtime condition associated with the execution of the set of instructions on the set of engines and to adjust the set of configurable delays based on the runtime condition.

ACCESSING PHYSICAL MEMORY FROM A CPU OR PROCESSING ELEMENT IN A HIGH PERFOMANCE MANNER
20180004671 · 2018-01-04 ·

A method and apparatus is described herein for accessing a physical memory location referenced by a physical address with a processor. The processor fetches/receives instructions with references to virtual memory addresses and/or references to physical addresses. Translation logic translates the virtual memory addresses to physical addresses and provides the physical addresses to a common interface. Physical addressing logic decodes references to physical addresses and provides the physical addresses to a common interface based on a memory type stored by the physical addressing logic.

Computing device and method

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

Virtualized Multicore Systems With Extended Instruction Heterogeneity
20230237009 · 2023-07-27 ·

A system on a chip may include a plurality of data plane processor cores sharing a common instruction set architecture. At least one of the data plane processor cores is specialized to perform a particular function via extensions to the otherwise common instruction set architecture. Such systems on a chip may have reduced physical complexity, cost, and time-to-market, and may provide improvements in core utilization and reductions in system power consumption.

Optimized branching using safe static keys
11567774 · 2023-01-31 · ·

Systems and methods for managing optimized branching in executable instructions are disclosed. In one implementation, a processing device may identify, in a sequence of executable instructions, a branching instruction associated with a safe static key, the branching instruction specifying a first target location. The processing device may determine whether a value of the safe static key is initialized. Responsive to determining that the value of the safe static key is initialized, the processing device may further replace the branching instruction with an unconditional branching instruction specifying the first target location. Responsive to determining that the value of the safe static key is uninitialized, the processing device may replace the branching instruction with a conditional branching instruction specifying the first target location.

Dynamic generation of logic for computing systems

Some embodiments provide a non-transitory machine-readable medium that stores a program. The program observes a parameter associated with a computing system. Upon receiving a change associated with the parameter, the program further determines a routine definition from a set of routine definitions associated with the parameter. Each routine definition in the set of routine definitions specifies a set of instructions associated with a particular parameter associated with the computing system. The program also executes the set of instructions specified in the determined routine definition.

ZERO OPERAND INSTRUCTION CONVERSION FOR ACCELERATING SPARSE COMPUTATIONS IN A CENTRAL PROCESSING UNIT PIPELINE
20230024089 · 2023-01-26 ·

A processing device includes a zero detection circuit to determine that an operand of a first instruction is zero and instruction conversion logic coupled with the zero detection circuit to, in response to the zero detection circuit determining that the operand is zero, convert the first instruction to a register move instruction executable by the processing device.

Inactivating basic blocks of program code to prevent code reuse attacks

An approach is provided that, after receiving a request to execute a computer program, determines an active set of metadata that corresponds to the requested computer program and then loads basic blocks of the requested computer program into memory. One of the loaded basic blocks is a starting block of the requested computer program. The memory also stores basic blocks corresponding to some previously loaded computer programs. The approach also inactivates basic blocks that are currently stored in the memory, with the inactivated basic blocks being identified based on a comparison of the active set of metadata to the sets of metadata that corresponding to the basic blocks of previously loaded computer programs. After inactivating some basic blocks, the approach executes the starting block of the requested computer program.