Patent classifications
G06F9/3889
HYBRID BLOCK-BASED PROCESSOR AND CUSTOM FUNCTION BLOCKS
Apparatus and methods are disclosed for implementing block-based processors having custom function blocks, including field-programmable gate array (FPGA) implementations. In some examples of the disclosed technology, a dynamically configurable scheduler is configured to issue at least one block-based processor instruction. A custom function block is configured to receive input operands for the instruction and generate ready state data indicating completion of a computation performed for the instruction by the respective custom function block.
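The ready-state handshake described in the abstract above can be sketched in software; the class and field names below are illustrative assumptions, not taken from the patent.

```python
class CustomFunctionBlock:
    """Minimal model of a custom function block: it receives input
    operands, computes, and asserts ready-state data on completion."""

    def __init__(self, fn):
        self.fn = fn
        self.ready = False   # ready-state data: is the computation done?
        self.result = None

    def issue(self, *operands):
        # Receive input operands, perform the computation for the
        # instruction, then assert ready state for the scheduler.
        self.result = self.fn(*operands)
        self.ready = True
        return self.result

# Example: a popcount block issued one instruction.
popcount = CustomFunctionBlock(lambda x: bin(x).count("1"))
print(popcount.issue(0b1011), popcount.ready)  # 3 True
```

In an FPGA implementation the ready flag would be a hardware signal polled by the dynamically configurable scheduler; the software model only mirrors the interface.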
VLIW processor including a state register for inter-slot data transfer and extended bits operations
A very long instruction word (VLIW) processor that performs efficient processing, including extended-bit operations, is provided. The VLIW processor includes an instruction control unit, a register file unit, and an instruction execution unit. The instruction execution unit includes a plurality of slots, and a state register arranged between the second slot and the third slot to transfer N-bit data between the second and third slots. The VLIW processor stores data output from the third slot into the state register and reuses that data, and thus achieves efficient processing of extended-bit operations, such as processing performed in response to instructions commonly used in image processing, image recognition, and other processing, while preventing scaling up of the circuit.
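The role of a state register between two narrow slots can be illustrated with a software sketch. The slot names and the carry-passing scheme below are assumptions for illustration only: a 2N-bit addition is split across two N-bit slots, with the state register carrying the intermediate between them.

```python
N = 16
MASK = (1 << N) - 1

state_register = 0  # N-bit data transferred between slot 3 and slot 2

def slot3_add_low(a, b):
    """Slot 3: add the low N bits; store the carry in the state register."""
    global state_register
    total = (a & MASK) + (b & MASK)
    state_register = total >> N          # carry out, via the state register
    return total & MASK

def slot2_add_high(a, b):
    """Slot 2: add the high N bits plus the carry from the state register."""
    return ((a >> N) + (b >> N) + state_register) & MASK

a, b = 0x1234F000, 0x00012000
low = slot3_add_low(a, b)
high = slot2_add_high(a, b)
print(hex((high << N) | low))  # 0x12361000
```

The point of the arrangement is that neither slot's datapath is widened; only the small state register is added, matching the abstract's claim of avoiding circuit scale-up.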
Anticipated prefetching for a parent core in a multi-core chip
Embodiments relate to prefetching data on a chip having a scout core and a parent core coupled to the scout core. The method includes determining that a program executed by the parent core requires content stored in a location remote from the parent core. The method includes sending a fetch table address determined by the parent core to the scout core. The method includes accessing, by the scout core, a fetch table indicated by the fetch table address. The fetch table indicates how many pieces of content are to be fetched by the scout core and the location of those pieces of content. The method includes fetching the pieces of content by the scout core based on the fetch table. The method includes returning the fetched pieces of content to the parent core.
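The scout-core flow above can be sketched as a table lookup followed by a bounded fetch loop; the `FetchTable` structure and function names are illustrative assumptions, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class FetchTable:
    count: int            # how many pieces of content to fetch
    locations: list       # remote location of each piece

def scout_prefetch(tables, table_addr, remote_memory):
    """Scout core: look up the fetch table at the address sent by the
    parent core, fetch each listed piece of content from its remote
    location, and return the pieces to the parent core."""
    table = tables[table_addr]
    return [remote_memory[loc] for loc in table.locations[:table.count]]

# The parent core sends only table_addr; the scout does the fetching.
tables = {0x10: FetchTable(count=2, locations=[0x100, 0x104])}
remote = {0x100: b"line0", 0x104: b"line1"}
print(scout_prefetch(tables, 0x10, remote))  # [b'line0', b'line1']
```

Sending one table address rather than every individual location keeps the parent-to-scout message small, which is the benefit the abstract implies.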
Sync network
Redundancy is provided in a sync network to protect the sync network against faults, such as broken cables. The gateway comprises a sync propagation module configured to provide redundant sync requests that are sent along different pathways in the sync network, towards different masters. If a fault occurs at a point in one of the paths, the gateway will still receive a sync acknowledgment returned along the other path. Furthermore, propagating redundant sync requests across different paths allows faults in the wiring to be detected.
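A minimal sketch of the redundant-propagation idea, under the assumption that each path either returns an acknowledgment or silently fails; the function and path names are illustrative.

```python
def propagate_sync(paths, send_request):
    """Send redundant sync requests along every configured path toward
    different masters. The sync completes if an acknowledgment returns
    on any one path; a path whose acknowledgment never returns is
    reported as a suspected wiring fault."""
    acks = {path: send_request(path) for path in paths}
    if not any(acks.values()):
        raise RuntimeError("no sync acknowledgment received on any path")
    return [path for path, ok in acks.items() if not ok]  # faulty paths

# One broken cable on path "B": sync still completes via path "A",
# and the missing ack on "B" flags the fault.
health = {"A": True, "B": False}
print(propagate_sync(["A", "B"], lambda p: health[p]))  # ['B']
```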
Instruction dispatch for superscalar processors
The present disclosure relates to instruction dispatch mechanisms for superscalar processors having a plurality of functional units for executing operations simultaneously. Each particular functional unit of the plurality of functional units may be configured to output a capability vector indicating a set of operations that the particular functional unit is currently available to perform. As instructions are received in an issue queue, the functional unit to execute each instruction is selected by comparing the capabilities required by the instruction to the asserted capabilities of each of the functional units. A functional unit may reset or de-assert a particular capability while performing an operation and then re-assert the capability when the instruction is completed. A result of the operation may be stored in a skid buffer for at least as long as the chain execution time in order to avoid resource hazards at a write port of the vector register file.
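The capability-vector match can be sketched as a bitwise comparison; the bit assignments and function name below are assumptions for illustration, not the patent's encoding.

```python
# Example operation-class bits; a real design would define many more.
MUL, ADD, DIV = 0b001, 0b010, 0b100

def select_unit(required, capability_vectors):
    """Pick the first functional unit whose currently asserted
    capability vector covers every capability the instruction needs."""
    for unit, caps in enumerate(capability_vectors):
        if caps & required == required:
            return unit
    return None  # no unit asserts the needed capabilities: hold in issue queue

# Unit 0 asserts MUL+ADD; unit 1 asserts only ADD (e.g. it de-asserted
# DIV while a long-latency divide is in flight).
units = [MUL | ADD, ADD]
print(select_unit(MUL, units))   # 0
print(select_unit(DIV, units))   # None
```

De-asserting a bit while busy and re-asserting it on completion makes the same comparison double as a busy check, which is the appeal of the scheme.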
Instruction Writing Method and Apparatus, and Network Device
An instruction writing method, apparatus, and network device are provided to reduce the storage space required of a microcode processor. The method includes obtaining, by a first device, first indication information, where the first indication information instructs the first device to enable a first service function, and writing, by the first device, a first microcode instruction set corresponding to the first service function into an unused storage space of a target microcode processor in a network processor, where a size of the unused storage space is greater than or equal to a size of the first microcode instruction set.
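The fit check at the heart of the method can be sketched as follows; the class, its fields, and the first-fit selection of a target processor are illustrative assumptions, not details from the patent.

```python
class MicrocodeProcessor:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = []          # microcode words already written

    def unused(self):
        # Size of the unused storage space, in instruction words.
        return self.capacity - len(self.store)

def write_service_function(processors, instruction_set):
    """Write the microcode instruction set for a service function into
    the first processor whose unused storage space can hold it (unused
    size >= set size); return None if no processor has room."""
    for proc in processors:
        if proc.unused() >= len(instruction_set):
            proc.store.extend(instruction_set)
            return proc
    return None

procs = [MicrocodeProcessor(capacity=4), MicrocodeProcessor(capacity=16)]
procs[0].store = [1, 2, 3]                 # only one word free here
print(write_service_function(procs, [10, 11, 12, 13]) is procs[1])  # True
```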
Reconfigurable Multi-Thread Processor for Simultaneous Operations on Split Instructions and Operands
A superscalar processor has a thread mode of operation supporting multiple instruction execution threads, each of which executes full-data-path-wide instructions, and a micro-thread mode of operation in which each thread supports two micro-threads that independently execute instructions. One executed instruction sets the micro-thread mode, and another executed instruction sets the thread mode.
Techniques for scheduling operations at an instruction pipeline
A dispatch stage of a processor core dispatches designated operations (e.g. load/store operations) to a temporary queue when the resources to execute the designated operations are not available. Once the resources become available to execute an operation at the temporary queue, the operation is transferred to a scheduler queue where it can be picked for execution. By dispatching the designated operations to the temporary queue, other operations behind the designated operations in a program order are made available for dispatch to the scheduler queue, thereby improving instruction throughput at the processor core.
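The dispatch policy above can be sketched with two queues; the op encoding and the resource-availability callback are illustrative assumptions.

```python
from collections import deque

def dispatch(ops, lsu_free, scheduler_queue, temp_queue):
    """Dispatch ops in program order. A load/store op whose execution
    resources are busy is parked in a temporary queue rather than
    stalling dispatch, so later ops still reach the scheduler queue."""
    for op in ops:
        if op in ("load", "store") and not lsu_free():
            temp_queue.append(op)        # defer the designated op
        else:
            scheduler_queue.append(op)   # eligible to be picked

sched, temp = deque(), deque()
dispatch(["load", "add", "store", "mul"], lambda: False, sched, temp)
print(list(sched), list(temp))  # ['add', 'mul'] ['load', 'store']

# Once resources free up, deferred ops transfer to the scheduler queue:
while temp:
    sched.append(temp.popleft())
print(list(sched))  # ['add', 'mul', 'load', 'store']
```

Without the temporary queue, `add` and `mul` would wait behind the blocked `load`, which is exactly the throughput loss the abstract describes avoiding.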
Method and Computing System for Handling Instruction Execution Using Affine Register File on Graphic Processing Unit
The present invention provides an affine engine design for the microarchitecture of the graphics processing unit, in which operand type detection is performed, and then physical scalar, affine, or vector registers and the corresponding ALUs are allocated to perform instruction execution with maximum performance improvement and energy saving. At runtime, affine and uniform instructions are executed by the affine engine, while general vector instructions are executed by a vector engine. Because affine/uniform instruction execution is dispatched to the affine engine, the vector engine can enter a power-saving state, reducing the energy consumption of the GPU.
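Operand type detection can be sketched over per-lane values: uniform means all lanes hold the same value, affine means the lanes form a constant stride, and anything else is a general vector. The classification rules below are a common convention assumed for illustration, not the patent's exact test.

```python
def detect_operand_type(operands):
    """Classify an instruction by its per-lane operand values: 'uniform',
    'affine' (base + constant stride across lanes), or 'vector'."""
    def classify(vals):
        if all(v == vals[0] for v in vals):
            return "uniform"
        strides = {b - a for a, b in zip(vals, vals[1:])}
        return "affine" if len(strides) == 1 else "vector"
    kinds = {classify(vals) for vals in operands}
    # Any general-vector operand forces the vector engine; otherwise the
    # instruction can be dispatched to the affine engine.
    if "vector" in kinds:
        return "vector"
    return "affine" if "affine" in kinds else "uniform"

print(detect_operand_type([[4, 4, 4, 4]]))   # uniform
print(detect_operand_type([[0, 2, 4, 6]]))   # affine
print(detect_operand_type([[3, 1, 4, 1]]))   # vector
```

A uniform or affine operand needs only a base value (plus a stride) instead of one register per lane, which is where the register and energy savings come from.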
Snapshot transmission from storage array to cloud using multi-path input-output
A processing device is configured to communicate over a network with a storage system comprising a plurality of storage devices. The device comprises a multi-path input-output (MPIO) driver configured to control delivery of input-output (IO) operations from the device to the storage system over selected ones of a plurality of paths through the network. The paths are associated with respective initiator-target pairs, and each of a plurality of targets of the initiator-target pairs comprises a corresponding port of the storage system. The MPIO driver is further configured to create a plurality of IO operation threads, to use a given IO operation thread to retrieve a given IO operation from an IO queue, to attempt to perform the given IO operation on a given target of the plurality of targets, and to return the given IO operation to the IO queue upon a failure to perform the given IO operation.
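The per-thread retrieve/attempt/requeue loop can be sketched with a shared queue; the single-threaded driver loop and bounded retry count below are simplifications assumed for illustration.

```python
import queue

def io_worker(io_queue, attempt_io, max_attempts=4):
    """One IO-operation thread: retrieve an IO operation from the IO
    queue, attempt it on a selected target, and return it to the queue
    if the attempt fails (e.g. the target port is unreachable)."""
    done = []
    for _ in range(max_attempts):
        try:
            io_op = io_queue.get_nowait()
        except queue.Empty:
            break
        if attempt_io(io_op):
            done.append(io_op)
        else:
            io_queue.put(io_op)   # failed attempt: requeue for retry
    return done

q = queue.Queue()
q.put("write-snapshot-chunk-0")
attempts = iter([False, True])    # first target fails, retry succeeds
print(io_worker(q, lambda op: next(attempts)))  # ['write-snapshot-chunk-0']
```

Requeueing on failure lets a later attempt (by this or another IO thread) select a different initiator-target path, which is what makes the MPIO retry resilient to a single bad port.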