G06F9/3885

Frame parser executing subsets of instructions in parallel for processing a frame header

An integrated circuit (IC) may include a set of instruction list engines (ILEs) that execute in parallel, where each ILE stores a subset of a set of instructions for processing a header of a frame, and where each ILE generates an ILE result based on executing the subset of the set of instructions. The IC may include a circuit to determine a result of parsing the header of the frame based on merging ILE results generated by the set of ILEs.
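As a loose illustration of the idea (engine names, instruction format, and header bytes below are invented, not taken from the patent), each instruction list engine could apply its own subset of header-matching instructions and a merge step could combine the per-engine results into one parse result:

```python
# Illustrative sketch: each "instruction list engine" (ILE) runs its own subset
# of match instructions against the frame header; a merge step combines the
# per-ILE results into the final parse result.

HEADER = bytes([0x45, 0x00, 0x00, 0x3C, 0x11, 0x06])  # example frame header bytes

def ile_run(instructions, header):
    """Run one ILE's instruction subset; each instruction checks a byte at an offset."""
    result = {}
    for name, offset, mask, expected in instructions:
        result[name] = (header[offset] & mask) == expected
    return result

# Hypothetical split of the instruction set across two ILEs executing in parallel.
ile0 = [("ipv4", 0, 0xF0, 0x40), ("ihl_ok", 0, 0x0F, 0x05)]
ile1 = [("proto_udp", 4, 0xFF, 0x11), ("proto_tcp", 5, 0xFF, 0x06)]

def merge(*ile_results):
    """Merge-circuit analogue: union of the per-ILE results."""
    merged = {}
    for r in ile_results:
        merged.update(r)
    return merged

parse_result = merge(ile_run(ile0, HEADER), ile_run(ile1, HEADER))
print(parse_result)
```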

Processing pipeline with first and second processing modes having different performance or energy consumption characteristics

An apparatus 2 has a processing pipeline 4 supporting at least a first processing mode and a second processing mode with different energy consumption or performance characteristics. A storage structure 22, 30, 36, 50, 40, 64, 44 is accessible in both the first and second processing modes. When the second processing mode is selected, control circuitry 70 triggers a subset 102 of the entries of the storage structure to be placed in a power saving state.
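A minimal behavioral sketch of this mechanism (entry counts, the active fraction, and names are assumptions for illustration): a storage structure whose entries can be individually power-gated, and control logic that keeps only a subset of entries powered when the second, energy-saving mode is selected.

```python
# Behavioral sketch: a storage structure with per-entry power gating, and mode
# control that places a subset of entries in a power-saving state.

class StorageStructure:
    def __init__(self, num_entries):
        self.active = [True] * num_entries  # True = powered, False = power-saving state

    def set_mode(self, mode, active_fraction=0.25):
        """In the 'second' (power-saving) mode, keep only a leading subset powered."""
        keep = len(self.active) if mode == "first" else int(len(self.active) * active_fraction)
        self.active = [i < keep for i in range(len(self.active))]

    def usable_entries(self):
        return sum(self.active)

structure = StorageStructure(64)
structure.set_mode("second")
print(structure.usable_entries())  # 16 of 64 entries remain powered in the second mode
```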

Method and apparatus to efficiently process and execute Artificial Intelligence operations
11580371 · 2023-02-14

A method, apparatus, and system are discussed to efficiently process and execute Artificial Intelligence operations. An integrated circuit has a tailored architecture to process and execute Artificial Intelligence operations, including computations for a neural network having weights with a sparse value. The integrated circuit contains at least a scheduler, one or more arithmetic logic units, and one or more random access memories configured to cooperate with each other to process and execute these computations for the neural network having weights with the sparse value.
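A hedged sketch of the general idea only (the weight values and function names are invented): a scheduler that issues work to the arithmetic units only for non-zero weights, so multiplications against the sparse (zero) values are skipped.

```python
# Sketch: the "scheduler" emits work items only for non-zero weights; the "ALU"
# performs the multiply-accumulate over those scheduled items.

weights = [0.0, 0.5, 0.0, 0.0, -1.25, 0.0, 2.0, 0.0]   # sparse weight vector
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

def schedule_nonzero(weights):
    """Scheduler analogue: emit (index, weight) work items for non-zero weights only."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]

def alu_mac(work_items, activations):
    """ALU analogue: multiply-accumulate over the scheduled work items."""
    return sum(w * activations[i] for i, w in work_items)

work = schedule_nonzero(weights)
print(len(work), "of", len(weights), "multiplications issued:", alu_mac(work, activations))
```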

SPARSE MATRIX OPERATIONS FOR DEEP LEARNING

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallelizing matrix operations. One of the methods includes implementing a neural network on a parallel processing device, the neural network comprising at least one sparse neural network layer configured to receive an input matrix and to perform matrix multiplication between the input matrix and a sparse weight matrix to generate an output matrix having M rows. The method comprises: for each of the M rows of the output matrix, determining a plurality of tiles that each include one or more elements from the row; assigning each tile of each row to a respective one of a plurality of thread blocks of the parallel processing device; and computing, for each tile, respective values for each element in the tile using the respective thread block to which the tile was assigned.
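The tiling scheme can be illustrated with a small Python/NumPy sketch (a real implementation would run on a GPU; the "thread block" here is just the inner loop body, and the matrix shapes and tile width are made up):

```python
# Sketch: split each output row into tiles; a "thread block" computes every
# element of its tile using only the non-zero columns of the sparse weight row.

import numpy as np

def sparse_layer(inp, sparse_w, tile_width=2):
    """Compute out = sparse_w @ inp, one tile of each output row per 'thread block'."""
    M, _ = sparse_w.shape
    _, N = inp.shape
    out = np.zeros((M, N))
    for row in range(M):                          # each of the M output rows
        nz = np.nonzero(sparse_w[row])[0]         # non-zero columns of this weight row
        for start in range(0, N, tile_width):     # tiles of the output row
            cols = slice(start, min(start + tile_width, N))
            # the "thread block" assigned to this tile computes all its elements
            out[row, cols] = sparse_w[row, nz] @ inp[nz, cols]
    return out

w = np.array([[0.0, 1.0, 0.0], [2.0, 0.0, 3.0]])
x = np.arange(12, dtype=float).reshape(3, 4)
print(np.allclose(sparse_layer(w, x), w @ x))     # True
```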

OPERATION OF A DUAL INSTRUCTION PIPE VIRUS CO-PROCESSOR
20180004945 · 2018-01-04

Circuits and methods are provided for detecting, identifying and/or removing undesired content. According to one embodiment, a method for performing content scanning of content objects is provided. A content object that is to be scanned is stored by a general-purpose processor to a system memory of the general-purpose processor. Content scanning parameters associated with the content object are set up by the general-purpose processor. Instructions from a signature memory of a co-processor that is coupled to the general-purpose processor are read by the co-processor based on the content scanning parameters. The instructions contain op-codes of a first instruction type and op-codes of a second instruction type. Those of the instructions containing op-codes of the first instruction type are assigned by the co-processor to a first instruction pipe of multiple instruction pipes of the co-processor for execution. An instruction of the assigned instructions containing op-codes of the first instruction type is executed by the first instruction pipe, including accessing a portion of the content object from the system memory.
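The dispatch step can be sketched roughly as follows (the op-code names and instruction list are hypothetical, not the patent's signature format): instructions read from signature memory are routed to one of two pipes based on their op-code type.

```python
# Sketch: route each instruction to pipe 1 or pipe 2 according to op-code type.

TYPE1_OPCODES = {"CMP_BYTES", "LOAD_OBJ"}     # e.g. instructions that access the content object
TYPE2_OPCODES = {"CRC", "HASH"}               # e.g. checksum/hash instructions

signature_memory = [("LOAD_OBJ", 0x100), ("CMP_BYTES", b"EICAR"), ("CRC", 0x1F), ("HASH", 0x2A)]

def dispatch(instructions):
    """Assign each instruction to pipe 1 or pipe 2 based on its op-code type."""
    pipe1, pipe2 = [], []
    for opcode, operand in instructions:
        (pipe1 if opcode in TYPE1_OPCODES else pipe2).append((opcode, operand))
    return pipe1, pipe2

pipe1, pipe2 = dispatch(signature_memory)
print("pipe1:", [op for op, _ in pipe1])
print("pipe2:", [op for op, _ in pipe2])
```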

ENGINE ARCHITECTURE FOR PROCESSING FINITE AUTOMATA

An engine architecture for processing finite automata includes a hyper non-deterministic automata (HNA) processor specialized for non-deterministic finite automata (NFA) processing. The HNA processor includes a plurality of super-clusters and an HNA scheduler. Each super-cluster includes a plurality of clusters. Each cluster of the plurality of clusters includes a plurality of HNA processing units (HPUs). A corresponding plurality of HPUs of a corresponding plurality of clusters of at least one selected super-cluster is available as a resource pool of HPUs to the HNA scheduler for assignment of at least one HNA instruction to enable acceleration of a match of at least one regular expression pattern in an input stream received from a network.
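A rough sketch of the scheduling idea (cluster sizes, HPU identifiers, and the use of Python's `re` module as a stand-in for an NFA engine are all assumptions): the HPUs of every cluster in a selected super-cluster form one pool, and the scheduler hands each HNA instruction, here a pattern-match job, to a free HPU.

```python
# Sketch: a pool of HPUs drawn from the clusters of a super-cluster; the
# scheduler assigns each pattern-match job to the next free HPU in the pool.

import re
from collections import deque

# A "super-cluster" of 2 clusters x 2 HPUs = a pool of 4 HPUs.
hpu_pool = deque([("cluster0", 0), ("cluster0", 1), ("cluster1", 0), ("cluster1", 1)])

def hna_schedule(jobs):
    """Assign each (pattern, payload) job to the next free HPU in the pool."""
    results = []
    for pattern, payload in jobs:
        hpu = hpu_pool.popleft()                    # take a free HPU from the pool
        match = re.search(pattern, payload)         # the HPU runs the NFA for this pattern
        results.append((hpu, pattern, bool(match)))
        hpu_pool.append(hpu)                        # the HPU returns to the pool
    return results

jobs = [(r"GET /admin", "GET /admin HTTP/1.1"), (r"\x90{8,}", "payload without nop sled")]
for hpu, pattern, hit in hna_schedule(jobs):
    print(hpu, pattern, "match" if hit else "no match")
```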

ADVANCED PROCESSOR ARCHITECTURE
20180004530 · 2018-01-04

The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises: 1) looking up operand sources in a Register Positioning Table (RPT) and setting operand input references of the instruction to be issued accordingly; 2) checking for an Execution Unit (EXU) available for receiving a new instruction; and 3) issuing the instruction to the available Execution Unit and entering into the RPT a reference of the result register addressed by the issued instruction to that Execution Unit.
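The three issue steps can be modelled very simply (register names, the number of Execution Units, and the fall-back to a register file are assumptions made for the sketch, not details from the patent):

```python
# Simplified model of the three issue steps: resolve operands through the
# Register Positioning Table, find a free Execution Unit, issue, and re-map
# the destination register to that Execution Unit in the RPT.

register_positioning_table = {}      # architectural register -> EXU producing its value
exu_busy = [False, False, False]     # availability of three Execution Units

def issue(instruction):
    """instruction = (dest_reg, src_regs). Returns the EXU the instruction was issued to."""
    dest, srcs = instruction
    # 1) look up operand sources in the RPT (fall back to the register file if absent)
    operand_refs = {s: register_positioning_table.get(s, "regfile") for s in srcs}
    # 2) check for an EXU available to receive a new instruction
    exu = next(i for i, busy in enumerate(exu_busy) if not busy)
    exu_busy[exu] = True
    # 3) issue, and record that the result register is now produced by this EXU
    register_positioning_table[dest] = f"EXU{exu}"
    return exu, operand_refs

print(issue(("r3", ["r1", "r2"])))   # operands come from the register file
print(issue(("r4", ["r3"])))         # r3 is now sourced from EXU0 via the RPT
```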

CONVOLUTIONAL NEURAL NETWORK ON PROGRAMMABLE TWO DIMENSIONAL IMAGE PROCESSOR

A method is described that includes executing a convolutional neural network layer on an image processor having an array of execution lanes and a two-dimensional shift register. The executing of the convolutional neural network layer includes loading a plane of image data of a three-dimensional block of image data into the two-dimensional shift register. The executing also includes performing a two-dimensional convolution of the plane of image data with an array of coefficient values by sequentially: concurrently multiplying within the execution lanes respective pixel and coefficient values to produce an array of partial products; concurrently summing within the execution lanes the partial products, with respective accumulations of partial products being kept within the two-dimensional shift register for different stencils within the image data; and effecting alignment of values for the two-dimensional convolution within the execution lanes by shifting content within the two-dimensional shift register array.
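A loose software analogue of this sequence (not the image-processor hardware; plane size and coefficients are made up): each "execution lane" owns one output pixel and holds its own accumulator, and shifting a window over the image plane stands in for the two-dimensional shift register moving data past the lanes.

```python
# Sketch: per-coefficient multiply, per-lane accumulate, with the shifted view
# of the plane playing the role of the two-dimensional shift register.

import numpy as np

def conv2d_shift(plane, coeffs):
    """2D convolution of one image plane with a small coefficient array."""
    kh, kw = coeffs.shape
    out_h, out_w = plane.shape[0] - kh + 1, plane.shape[1] - kw + 1
    acc = np.zeros((out_h, out_w))                        # per-lane accumulators
    for dy in range(kh):
        for dx in range(kw):
            shifted = plane[dy:dy + out_h, dx:dx + out_w] # "shift register" alignment
            acc += coeffs[dy, dx] * shifted               # concurrent multiply-accumulate
    return acc

plane = np.arange(25, dtype=float).reshape(5, 5)
coeffs = np.array([[0.0, 1.0], [1.0, 0.0]])
print(conv2d_shift(plane, coeffs))
```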

Virtualized Multicore Systems With Extended Instruction Heterogeneity
20230237009 · 2023-07-27

A system on a chip may include a plurality of data plane processor cores sharing a common instruction set architecture. At least one of the data plane processor cores is specialized to perform a particular function via extensions to the otherwise common instruction set architecture. Such systems on a chip may have reduced physical complexity, cost, and time-to-market, and may provide improvements in core utilization and reductions in system power consumption.
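A small illustrative model of routing work across such cores (the core list, extension names, and selection policy are hypothetical): every core runs the common instruction set, and work needing a particular extension is steered to the core specialized with that extension.

```python
# Sketch: pick a core by required ISA extension, falling back to a baseline core.

cores = [
    {"id": 0, "extensions": set()},            # baseline data-plane core
    {"id": 1, "extensions": set()},            # baseline data-plane core
    {"id": 2, "extensions": {"crypto"}},       # specialized via a crypto extension
    {"id": 3, "extensions": {"packet_dma"}},   # specialized via a DMA extension
]

def pick_core(required_extension=None):
    """Route work to a core supporting the needed extension, else to any core."""
    for core in cores:
        if required_extension is None or required_extension in core["extensions"]:
            return core["id"]
    return cores[0]["id"]     # no specialized core found: use the common ISA

print(pick_core("crypto"))    # 2
print(pick_core())            # 0
```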

PROCESSING INGESTED DATA TO IDENTIFY ANOMALIES

Systems and methods are described for processing ingested data in an asynchronous manner as the data is being ingested to detect potential anomalies. For example, one or more streaming data processors can convert data as the data is ingested into a comparable data structure, determine whether the comparable data structure should be assigned to an existing data pattern or a new data pattern, and optionally update a characteristic of the data pattern to which the comparable data structure is assigned. The streaming data processor(s) can perform these operations automatically in real-time or in periodic batches. Once one or more comparable data structures have been assigned to one or more data patterns, the streaming data processor(s) can analyze the comparable data structures assigned to a particular data pattern to determine whether any of the comparable data structures appear to be anomalous.
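Under simplifying assumptions, the flow can be sketched as follows (the "comparable data structure" is reduced here to a token-shape signature of each log line, and a line is flagged as anomalous while its pattern is still rare; the threshold, regex, and sample lines are invented):

```python
# Sketch: convert each ingested line to a comparable signature, assign it to an
# existing or new pattern, update the pattern's count, and flag rare patterns.

import re
from collections import defaultdict

patterns = defaultdict(int)                 # pattern signature -> occurrence count

def to_signature(line):
    """Convert raw text into a comparable structure by masking numeric/hex tokens."""
    return re.sub(r"\b(?:0x[0-9a-f]+|\d+)\b", "<num>", line.lower())

def ingest(line, rare_threshold=1):
    """Assign the line to an existing or new pattern; flag it while the pattern is rare."""
    sig = to_signature(line)
    patterns[sig] += 1                      # update the pattern's characteristic (count)
    return patterns[sig] <= rare_threshold  # anomalous while the pattern is still rare

stream = [
    "user 1001 logged in from 10.0.0.5",
    "user 1002 logged in from 10.0.0.6",
    "user 1003 logged in from 10.0.0.7",
    "kernel panic at 0xDEADBEEF",
]
for line in stream:
    print("ANOMALY" if ingest(line) else "normal ", "|", line)
```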