G06F9/30192

Instruction and logic for processing text strings

Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.
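
The abstract says only that a result is stored for each pair of corresponding data elements. A minimal software sketch of that idea follows. It assumes that "comparison" means byte equality and that the stored result is a bitmask with one bit per element position. Both are illustrative assumptions, not the claimed instruction's encoding.

```python
def compare_strings(a: bytes, b: bytes) -> int:
    """Compare two strings element by element (assumed: byte equality).

    Bit i of the returned mask is 1 when a[i] == b[i]. Positions beyond the
    shorter string are ignored in this sketch.
    """
    mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            mask |= 1 << i
    return mask
```

For example, `compare_strings(b"hello", b"help!")` sets only the low three bits, because the strings agree at positions 0 to 2.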

Dataflow Triggered Tasks for Accelerated Deep Learning

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element receives a particular wavelet comprising a particular virtual channel specifier and a particular data element. Instructions are read from the memory of the compute element based at least in part on the particular virtual channel specifier. The particular data element is used as an input operand to execute at least one of the instructions.
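
The dispatch step, in which a wavelet's virtual channel specifier selects which instructions to read from the compute element's memory and its data element becomes the input operand, can be sketched as follows. The `Wavelet` and `ComputeElement` classes, the dictionary used as "memory", and the two example instructions are hypothetical stand-ins, not the patent's data structures.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical instruction: takes the input operand and the CE's registers.
Instruction = Callable[[float, dict], None]

@dataclass
class Wavelet:
    vc: int      # virtual channel specifier
    data: float  # data element carried by the wavelet

@dataclass
class ComputeElement:
    # Instructions stored in memory, indexed by virtual channel (assumed layout).
    memory: dict[int, list[Instruction]]
    regs: dict = field(default_factory=lambda: {"acc": 0.0})

    def receive(self, w: Wavelet) -> None:
        # Read instructions based on the virtual channel specifier, then execute
        # them with the wavelet's data element as the input operand.
        for instr in self.memory.get(w.vc, []):
            instr(w.data, self.regs)

def accumulate(x: float, regs: dict) -> None:
    regs["acc"] += x

def scale_and_accumulate(x: float, regs: dict) -> None:
    regs["acc"] += 2.0 * x

ce = ComputeElement(memory={0: [accumulate], 1: [scale_and_accumulate]})
ce.receive(Wavelet(vc=0, data=1.5))
ce.receive(Wavelet(vc=1, data=3.0))
print(ce.regs["acc"])  # 7.5
```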

Processing Core with Meta Data Actuated Conditional Graph Execution
20210042118 · 2021-02-11

A processing core for the efficient execution of a directed graph is disclosed. The processing core includes a memory and first and second data tiles stored in the memory. The first and second data tiles include first and second sets of data elements, respectively, stored contiguously in the memory. The processing core also includes metadata stored relationally with the first data tile in the memory, as well as an execution engine, a control unit, and an instruction. Execution of the instruction uses the execution engine, a first data element from the first set of data elements, and a second data element from the second set of data elements. The control unit conditions execution of the instruction on the metadata: a standard execution of the instruction generates a standard output, and a conditional execution of the instruction generates a conditionally executed output.
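
A minimal sketch of metadata-conditioned execution follows. The abstract does not say what the metadata encodes. This example assumes a hypothetical `all_zero` flag on the first tile: when the flag is set, the multiply is skipped and a known result is emitted, which is the conditionally executed output; otherwise the standard element-wise multiply runs.

```python
from dataclasses import dataclass

@dataclass
class Tile:
    data: list[float]       # data elements stored contiguously
    all_zero: bool = False  # hypothetical metadata stored with the tile

def multiply_tiles(a: Tile, b: Tile) -> list[float]:
    # Conditional execution: the metadata (assumed accurate) says tile a is
    # all zeros, so skip the multiplies and emit zeros directly.
    if a.all_zero:
        return [0.0] * len(a.data)
    # Standard execution: element-wise multiply of corresponding elements.
    return [x * y for x, y in zip(a.data, b.data)]
```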

Task activating for accelerated deep learning

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by virtual channel specifiers in each wavelet and routing configuration information in each router. Execution of an activate instruction or completion of a fabric vector operation activates one of the virtual channels. A virtual channel is selected from a pool comprising previously activated virtual channels and virtual channels associated with previously received wavelets. A task corresponding to the selected virtual channel is activated by executing instructions corresponding to the selected virtual channel.
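
The selection step draws on a pool of virtual channels that come from two sources: channels activated explicitly (by an activate instruction or the completion of a fabric vector operation) and channels of previously received wavelets. It can be sketched as a small scheduler. The abstract does not specify a priority policy, so this sketch assumes that explicitly activated channels are served first, in FIFO order. The task table is a hypothetical mapping from channel to callable.

```python
from collections import deque
from typing import Callable, Optional

class TaskScheduler:
    def __init__(self, tasks: dict[int, Callable[[], None]]):
        self.tasks = tasks          # virtual channel -> task (hypothetical)
        self.activated = deque()    # VCs activated by instruction / fabric vector completion
        self.received = deque()     # VCs of wavelets received but not yet handled

    def activate(self, vc: int) -> None:
        self.activated.append(vc)

    def on_wavelet(self, vc: int) -> None:
        self.received.append(vc)

    def step(self) -> Optional[int]:
        # Select one VC from the combined pool (assumed policy: activated first).
        if self.activated:
            vc = self.activated.popleft()
        elif self.received:
            vc = self.received.popleft()
        else:
            return None
        self.tasks[vc]()  # activate the task by running its instructions
        return vc
```

Usage: after `on_wavelet(1)` and then `activate(0)`, the first `step()` runs the task for channel 0, and the next runs the task for channel 1.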

Simulation program, method, and device

A simulation method, performed by a computer, simulates operations by a plurality of cores based on resource access operation descriptions for those cores. The method includes the steps of: extracting a resource access operation description for at least one core of the plurality of cores by executing a simulation of that core; and, when that core and a second core of the plurality of cores have a specific relation in their execution processing, generating a resource access operation description for the second core from the description for the first core by reflecting the difference between the address of the resource accessed by the first core and the address of the resource accessed by the second core.
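
The derivation step can be sketched under one simple assumption: the "specific relation" is that both cores run the same code on resources at a constant address offset, so the second core's trace is obtained by shifting every address in the first core's trace. The `Access` record and `derive_trace` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Access:
    op: str    # e.g. "load" or "store"
    addr: int  # address of the accessed resource

def derive_trace(trace_a: list[Access], base_a: int, base_b: int) -> list[Access]:
    """Build core B's access description from core A's simulated one,
    reflecting the address difference between the two cores' resources."""
    delta = base_b - base_a
    return [Access(acc.op, acc.addr + delta) for acc in trace_a]
```

This avoids simulating the second core at all; the trade-off is that it is only valid when the assumed relation between the cores actually holds.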

OPTIMIZED COMPUTE HARDWARE FOR MACHINE LEARNING OPERATIONS

A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.
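
The per-lane arithmetic can be illustrated in software. In this sketch, the first operand's values are packed as signed integers of width `bits` into one 32-bit lane, so the number of parallel multiplies is `32 // bits` (4 for 8-bit values, 8 for 4-bit values). The second operand's values are given as a plain list and may be wider. All products are summed into a single accumulator. The function names and the two's-complement packing are assumptions for illustration, not the hardware's encoding.

```python
def pack_lane(values: list[int], bits: int) -> int:
    """Pack signed integers of width `bits` into one 32-bit lane (two's complement)."""
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * bits)
    return word

def unpack_lane(word: int, bits: int) -> list[int]:
    """Split a 32-bit lane into 32 // bits sign-extended elements."""
    mask = (1 << bits) - 1
    out = []
    for i in range(32 // bits):
        v = (word >> (i * bits)) & mask
        if v & (1 << (bits - 1)):  # sign bit set: sign-extend
            v -= 1 << bits
        out.append(v)
    return out

def lane_dot_accumulate(acc: int, a_word: int, a_bits: int, b_values: list[int]) -> int:
    """One lane: 32 // a_bits multiplies, then one accumulate into acc."""
    a_values = unpack_lane(a_word, a_bits)
    if len(b_values) != len(a_values):
        raise ValueError("second operand must supply one value per packed element")
    return acc + sum(x * y for x, y in zip(a_values, b_values))
```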

SORT AND MERGE INSTRUCTION FOR A GENERAL-PURPOSE PROCESSOR

A Sort Lists instruction is provided to perform a sort and/or a merge operation. The instruction is an architected machine instruction of an instruction set architecture and is executed by a general-purpose processor of the computing environment. The executing includes sorting a plurality of input lists to obtain one or more sorted output lists, which are output.
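
The abstract does not describe the instruction's operand format. As a software analogy only, the sort-then-merge behavior can be expressed with Python's standard library: each input list is sorted, and the sorted lists are then merged into one sorted output list. (The instruction may produce more than one output list; this sketch produces one.)

```python
import heapq

def sort_lists(input_lists: list[list[int]]) -> list[int]:
    """Sort each input list, then merge the sorted lists into one output list."""
    return list(heapq.merge(*(sorted(lst) for lst in input_lists)))
```

`heapq.merge` consumes already-sorted inputs lazily, which matches the "merge" half of the operation; the `sorted` calls cover the "sort" half.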

DATA PROCESSING METHOD AND APPARATUS, AND RELATED PRODUCT

The present disclosure provides a data processing method, an apparatus, and a related product. The product includes a control module comprising an instruction caching unit, an instruction processing unit, and a storage queue unit. The instruction caching unit is configured to store computation instructions associated with an artificial neural network operation; the instruction processing unit is configured to parse the computation instructions to obtain a plurality of operation instructions; and the storage queue unit is configured to store an instruction queue containing a plurality of operation instructions or computation instructions to be executed in queue order. With this method, the present disclosure can improve the operating efficiency of related products when performing neural network model operations.
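
The three units can be sketched as a small pipeline: cache computation instructions, parse each into operation instructions, and drain the queue in FIFO order. The decomposition table (for example, `"fc"` expanding into load, matmul, and store steps) and the class name are hypothetical; the disclosure does not list its operation instructions.

```python
from collections import deque

# Hypothetical decomposition: computation instruction -> operation instructions.
DECOMPOSE = {
    "conv_relu": ["load_input", "load_weights", "conv", "relu", "store_output"],
    "fc": ["load_input", "load_weights", "matmul", "store_output"],
}

class ControlModule:
    def __init__(self) -> None:
        self.instruction_cache: list[str] = []  # instruction caching unit
        self.queue: deque[str] = deque()        # storage queue unit (FIFO)

    def cache(self, computation_instruction: str) -> None:
        self.instruction_cache.append(computation_instruction)

    def parse_all(self) -> None:
        # Instruction processing unit: expand each cached computation
        # instruction into operation instructions and enqueue them in order.
        for ci in self.instruction_cache:
            self.queue.extend(DECOMPOSE[ci])
        self.instruction_cache.clear()

    def drain(self) -> list[str]:
        # Execute (here: just record) queued instructions in queue order.
        executed = []
        while self.queue:
            executed.append(self.queue.popleft())
        return executed
```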

COMPOSABLE NEURAL NETWORK KERNELS

A technique for manipulating a generic tensor is provided. The technique includes receiving a first request to perform a first operation on a generic tensor descriptor associated with the generic tensor, responsive to the first request, performing the first operation on the generic tensor descriptor, receiving a second request to perform a second operation on generic tensor raw data associated with the generic tensor, and responsive to the second request, performing the second operation on the generic tensor raw data.
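
The split between operations on the descriptor and operations on the raw data can be illustrated with a stride-based layout. In this sketch, a transpose is a descriptor-only operation: it permutes lengths and strides without moving any data. Scaling is a raw-data operation that touches every element. The choice of these two operations and all class names are illustrative assumptions; the abstract does not name specific operations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TensorDescriptor:
    lengths: tuple[int, ...]
    strides: tuple[int, ...]

    def transpose(self, perm: tuple[int, ...]) -> "TensorDescriptor":
        # Descriptor operation: reorder dimensions; raw data is untouched.
        return TensorDescriptor(tuple(self.lengths[p] for p in perm),
                                tuple(self.strides[p] for p in perm))

    def offset(self, idx: tuple[int, ...]) -> int:
        return sum(i * s for i, s in zip(idx, self.strides))

@dataclass
class GenericTensor:
    desc: TensorDescriptor
    data: list[float]  # raw data, possibly shared between tensors

    def get(self, idx: tuple[int, ...]) -> float:
        return self.data[self.desc.offset(idx)]

    def scale(self, k: float) -> None:
        # Raw-data operation: modify every element in place.
        for i in range(len(self.data)):
            self.data[i] *= k
```

Because the transposed tensor shares the same raw data list, a later raw-data operation on either tensor is visible through both descriptors.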