G06F9/3828

ADVANCED PROCESSOR ARCHITECTURE
20180004530 · 2018-01-04

The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises: 1) looking up operand sources in a Register Positioning Table (RPT) and setting operand input references of the instruction to be issued accordingly; 2) checking for an Execution Unit (EXU) available for receiving a new instruction; and 3) issuing the instruction to the available Execution Unit and entering, into the Register Positioning Table, a reference to the Execution Unit for the result register addressed by the instruction to be issued.
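
The three steps above can be sketched as a toy software model. This is only an illustration under stated assumptions, not the patented design: the names `Instruction`, `ExecutionUnit`, and `Issuer` are hypothetical, the RPT is modeled as a plain dictionary from register name to the identifier of the producing EXU, and operands without an RPT entry are assumed to come from a register file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Instruction:
    opcode: str
    dest: str            # result register addressed by the instruction
    sources: List[str]   # operand registers


@dataclass
class ExecutionUnit:
    ident: int
    instr: Optional[Instruction] = None
    operand_refs: List[Tuple[str, object]] = field(default_factory=list)


class Issuer:
    """Hypothetical issue logic built around a Register Positioning Table."""

    def __init__(self, num_exus: int) -> None:
        self.exus = [ExecutionUnit(i) for i in range(num_exus)]
        # ASSUMPTION: the RPT maps a register name to the id of the EXU
        # that will produce that register's most recent value.
        self.rpt: Dict[str, int] = {}

    def issue(self, instr: Instruction) -> Optional[int]:
        # Step 1: look up operand sources in the RPT and set operand references.
        refs: List[Tuple[str, object]] = [
            ("exu", self.rpt[src]) if src in self.rpt else ("regfile", src)
            for src in instr.sources
        ]
        # Step 2: check for an EXU available to receive a new instruction.
        free = next((e for e in self.exus if e.instr is None), None)
        if free is None:
            return None  # no EXU available; the caller would stall
        # Step 3: issue the instruction and record in the RPT where the
        # result register will be produced.
        free.instr = instr
        free.operand_refs = refs
        self.rpt[instr.dest] = free.ident
        return free.ident


issuer = Issuer(num_exus=2)
first = issuer.issue(Instruction("add", "r3", ["r1", "r2"]))
second = issuer.issue(Instruction("mul", "r4", ["r3", "r1"]))
```

In this sketch the second instruction reads `r3` directly from the EXU that holds the first instruction, because the RPT entry for `r3` was written when the first instruction was issued.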

DUAL PIPELINE PARALLEL SYSTOLIC ARRAY

A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.

OPERATOR REGISTRATION METHOD AND APPARATUS FOR DEEP LEARNING FRAMEWORK, DEVICE AND STORAGE MEDIUM

The present disclosure provides an operator registration method and apparatus for a deep learning framework, a device and a storage medium, relates to the field of computer technologies, and specifically to the field of artificial intelligence such as deep learning. The operator registration method for a deep learning framework includes: receiving registration information provided by a user for registering operators with the deep learning framework, the registration information including: a custom calculation function, the custom calculation function being written in a manner irrelevant to the deep learning framework; building operator meta-information in the deep learning framework based on the registration information; and constructing a to-be-registered operator within the deep learning framework based on the operator meta-information, and registering the to-be-registered operator in a global operator table within the deep learning framework. The present disclosure can simplify an operator registration process.
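
The registration flow described above can be sketched in a few lines. This is a minimal sketch under assumptions, not the framework's actual API: `OpMeta`, `Operator`, `register_operator`, and `GLOBAL_OPERATOR_TABLE` are hypothetical names, and the meta-information is reduced to a name, the input names, and the compute function.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class OpMeta:
    """Hypothetical operator meta-information built from registration info."""
    name: str
    inputs: List[str]
    compute: Callable


@dataclass
class Operator:
    """Hypothetical framework-side operator wrapping the meta-information."""
    meta: OpMeta

    def __call__(self, *args):
        return self.meta.compute(*args)


# ASSUMPTION: the global operator table is a plain name -> operator mapping.
GLOBAL_OPERATOR_TABLE: Dict[str, Operator] = {}


def register_operator(name: str, compute: Callable) -> Operator:
    # Build meta-information from the user's registration information.
    inputs = list(inspect.signature(compute).parameters)
    meta = OpMeta(name=name, inputs=inputs, compute=compute)
    # Construct the to-be-registered operator and add it to the global table.
    op = Operator(meta)
    GLOBAL_OPERATOR_TABLE[name] = op
    return op


# A custom calculation function written without any framework types.
def relu(xs):
    return [max(0.0, x) for x in xs]


register_operator("custom_relu", relu)
result = GLOBAL_OPERATOR_TABLE["custom_relu"]([-1.0, 2.0])
```

The point of the sketch is that `relu` never imports or references framework types; everything framework-specific is derived inside `register_operator`.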

UNIFIED AUTOMATION OF APPLICATION DEVELOPMENT

Unified automation of application development and delivery is provided. An automation pipeline execution coordinator may define a pipeline specification that includes actions to be performed, a triggering event definition and specification for determining execution context. The coordinator may concurrently detect triggering events for multiple pipelines matching the pipeline specification, and responsive to the detecting, determine execution contexts for the pipelines. The coordinator may then execute the multiple pipelines, where execution may proceed independently for pipelines with differing execution contexts. For pipelines sharing an execution context, execution of various actions of the respective pipelines may be coordinated. Execution context may be determined according to the specification for determining execution context, which may include an overridable default specification that determines context by locations of source data related to the triggering event. Pipeline specifications may be defined using pipeline specification templates and input from users obtained via various user interfaces.
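
One way to picture the overridable default for execution context is a grouping function keyed on the location of the source data. The sketch below is an assumption-laden illustration: the event shape (a dictionary with a `source_path` key) and the names `default_context` and `group_by_context` are hypothetical, and the default context is taken to be the directory containing the source file.

```python
import posixpath
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]


def default_context(event: Event) -> str:
    # ASSUMPTION: the default context is the directory of the source data.
    return posixpath.dirname(event["source_path"])


def group_by_context(
    events: List[Event],
    context_of: Callable[[Event], str] = default_context,
) -> Dict[str, List[Event]]:
    """Group triggering events so pipelines sharing a context can be coordinated."""
    groups: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        groups[context_of(event)].append(event)
    return dict(groups)


events = [
    {"source_path": "svc/a/main.py"},
    {"source_path": "svc/a/util.py"},
    {"source_path": "svc/b/x.py"},
]
by_directory = group_by_context(events)
# Overriding the default places every pipeline in one shared context.
single_context = group_by_context(events, context_of=lambda e: "global")
```

Pipelines in different groups could then run independently, while pipelines within one group would have their actions coordinated.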

APPARATUS AND METHODS EMPLOYING A SHARED READ PORT REGISTER FILE
20230034072 · 2023-02-02

In some implementations, a processor includes a plurality of parallel instruction pipes and a register file that includes at least one shared read port configured to be shared across multiple pipes of the plurality of parallel instruction pipes. Control logic controls multiple parallel instruction pipes to read from the at least one shared read port. In certain examples, the at least one shared register file read port is coupled as a single read port for one of the parallel instruction pipes and as a shared register file read port for a plurality of other parallel instruction pipes.

Distributed cluster training method and apparatus
11636379 · 2023-04-25

A distributed cluster training method and an apparatus thereof are provided. The method includes: reading a sample set that includes at least one piece of sample data; before receiving a collection instruction, substituting the sample data and current weights into a target model training function for iterative training to obtain a first gradient, the collection instruction being issued by a scheduling server when a cluster system environment meets a threshold condition; sending the first gradient to an aggregation server if the collection instruction is received, wherein the aggregation server collects each first gradient and calculates second weights; and receiving the second weights sent by the aggregation server to update the current weights. The present disclosure reduces the amount of network communication and the impact on switches, and prevents use of the entire cluster from being affected.
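
The training round described above can be sketched as a small simulation. Everything concrete here is an assumption made for illustration: the model is a one-parameter linear fit, the threshold condition is modeled as network utilization below a limit, and the aggregation server is assumed to average the first gradients and take one gradient step to obtain the second weights. The names `Worker`, `AggregationServer`, `should_collect`, and `run_round` are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def gradient(weight: float, samples: Sequence[Sample]) -> float:
    """Mean squared-error gradient for the toy model y = weight * x."""
    return sum(2.0 * (weight * x - y) * x for x, y in samples) / len(samples)


class Worker:
    def __init__(self, samples: Sequence[Sample], weight: float, lr: float) -> None:
        self.samples = list(samples)
        self.weight = weight
        self.lr = lr
        self.first_gradient = 0.0

    def train_step(self) -> None:
        # Local iterative training before any collection instruction arrives.
        self.first_gradient = gradient(self.weight, self.samples)
        self.weight -= self.lr * self.first_gradient


class AggregationServer:
    def __init__(self, weight: float, lr: float) -> None:
        self.weight = weight
        self.lr = lr

    def aggregate(self, gradients: List[float]) -> float:
        # ASSUMPTION: second weights come from one step on the averaged gradient.
        self.weight -= self.lr * sum(gradients) / len(gradients)
        return self.weight


def should_collect(network_utilization: float, threshold: float = 0.5) -> bool:
    # ASSUMPTION: the scheduling server's threshold condition is low utilization.
    return network_utilization < threshold


def run_round(workers: List[Worker], server: AggregationServer,
              network_utilization: float) -> bool:
    for worker in workers:
        worker.train_step()
    if not should_collect(network_utilization):
        return False  # no collection instruction; workers keep training locally
    second_weight = server.aggregate([w.first_gradient for w in workers])
    for worker in workers:
        worker.weight = second_weight  # update current weights
    return True
```

When the network is busy, `run_round` returns `False` and no gradients cross the network, which is the communication saving the abstract refers to.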

SIMD operand permutation with selection from among multiple registers

Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
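
As a rough software analogy for the shift and fill operation mentioned above, each lane can be modeled as picking its operand either from the first register of a higher-numbered lane or, once the shift runs past the last lane, from the second register. The exact hardware semantics are not given in the abstract, so the function below, including its name `shift_and_fill` and its lane indexing, is an assumption for illustration only.

```python
from typing import List, Sequence


def shift_and_fill(first_reg: Sequence[int], second_reg: Sequence[int],
                   shift: int) -> List[int]:
    """Per-lane operand selection across two registers (assumed semantics).

    Element i of each sequence stands for the value held for lane i.
    """
    lanes = len(first_reg)
    if len(second_reg) != lanes:
        raise ValueError("both registers must cover the same number of lanes")
    if not 0 <= shift <= lanes:
        raise ValueError("shift must be between 0 and the lane count")
    out = []
    for lane in range(lanes):
        src = lane + shift
        if src < lanes:
            out.append(first_reg[src])           # value from another lane's first register
        else:
            out.append(second_reg[src - lanes])  # fill from the second register
    return out


window = shift_and_fill([0, 1, 2, 3], [4, 5, 6, 7], 1)
```

Treating the two registers as adjacent pixel spans, varying `shift` slides a lane-wide window across them, which is one way to read the abstract's remark about storing an arbitrary portion of a graphics frame.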

OPERATION OF A MULTI-SLICE PROCESSOR IMPLEMENTING DEPENDENCY ACCUMULATION INSTRUCTION SEQUENCING

Operation of a multi-slice processor that includes a plurality of execution slices. Operation of such a multi-slice processor includes: receiving a first instruction indicating a first target register; receiving a second instruction indicating the first target register as a source operand; responsive to the second instruction indicating the first target register as a source operand, updating a dependent count corresponding to the first instruction; and issuing, in dependence upon the dependent count for the first instruction being greater than a dependent count for another instruction, the first instruction to an execution slice of the plurality of execution slices.
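
The dependent-count bookkeeping can be sketched as a small issue queue. This is a simplified model under assumptions: `Instr` and `IssueQueue` are hypothetical names, an instruction is treated as ready when no older pending instruction targets one of its sources, and issuing an instruction is treated as completing it immediately.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Instr:
    name: str
    target: str
    sources: List[str]
    dependent_count: int = 0


class IssueQueue:
    def __init__(self) -> None:
        self.pending: List[Instr] = []

    def receive(self, instr: Instr) -> None:
        # For each source operand, bump the dependent count of the most
        # recent pending instruction that targets that register.
        for src in instr.sources:
            for producer in reversed(self.pending):
                if producer.target == src:
                    producer.dependent_count += 1
                    break
        self.pending.append(instr)

    def _ready(self) -> List[Instr]:
        # ASSUMPTION: ready means no older pending instruction produces a source.
        ready = []
        for idx, instr in enumerate(self.pending):
            older_targets = {p.target for p in self.pending[:idx]}
            if not older_targets.intersection(instr.sources):
                ready.append(instr)
        return ready

    def issue_next(self) -> Optional[Instr]:
        ready = self._ready()
        if not ready:
            return None
        # Prefer the instruction with the greatest dependent count;
        # ties go to the oldest instruction.
        best = max(ready, key=lambda i: i.dependent_count)
        self.pending.remove(best)
        return best


queue = IssueQueue()
b = Instr("B", "r2", [])
a = Instr("A", "r1", [])
for instr in (b, a, Instr("C", "r3", ["r1"]), Instr("D", "r4", ["r1"])):
    queue.receive(instr)
first_issued = queue.issue_next()
```

Here `A` is issued ahead of the older `B` because two later instructions wait on its target register, so issuing it first unblocks more work.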

TRANSMITTING DATA BETWEEN EXECUTION SLICES OF A MULTI-SLICE PROCESSOR

Methods and apparatus for transmitting data between execution slices of a multi-slice processor including receiving, by an execution slice, a broadcast message comprising an instruction tag (ITAG) for a producer instruction, a latency, and a source identifier; determining that an issue queue in the execution slice comprises an ITAG for a consumer instruction, wherein the consumer instruction depends on result data from the producer instruction; calculating a cycle countdown using the latency and the source identifier; determining that the cycle countdown has expired; and in response to determining that the cycle countdown has expired, reading the result data from the producer instruction.
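
The receive-and-countdown sequence can be modeled in software as follows. The abstract does not say how the latency and source identifier combine, so this sketch assumes the countdown is the producer latency plus a per-source transit delay looked up from a table. `Broadcast`, `ExecutionSlice`, and `transit_cycles` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Broadcast:
    itag: int        # ITAG of the producer instruction
    latency: int     # cycles until the producer's result exists
    source_id: int   # identifies the slice that sent the message


class ExecutionSlice:
    def __init__(self, transit_cycles: Dict[int, int]) -> None:
        # ASSUMPTION: extra cycles needed for data to arrive from each source.
        self.transit_cycles = transit_cycles
        # Consumer ITAG -> producer ITAG it depends on.
        self.issue_queue: Dict[int, int] = {}
        # Producer ITAG -> cycles remaining before its result can be read.
        self.countdowns: Dict[int, int] = {}
        self.results_read: List[int] = []

    def receive(self, msg: Broadcast) -> None:
        # Only start a countdown if some queued consumer depends on the producer.
        if msg.itag not in self.issue_queue.values():
            return
        delay = self.transit_cycles.get(msg.source_id, 0)
        self.countdowns[msg.itag] = msg.latency + delay

    def tick(self) -> None:
        # Advance one cycle; read result data once a countdown expires.
        for itag in list(self.countdowns):
            self.countdowns[itag] -= 1
            if self.countdowns[itag] <= 0:
                del self.countdowns[itag]
                self.results_read.append(itag)


slice0 = ExecutionSlice(transit_cycles={1: 2})
slice0.issue_queue[20] = 10            # consumer 20 waits on producer 10
slice0.receive(Broadcast(itag=10, latency=3, source_id=1))
slice0.receive(Broadcast(itag=99, latency=1, source_id=1))  # no consumer; ignored
```

With these example values the countdown for producer 10 starts at five cycles, so the result is read on the fifth call to `tick`.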