G06F9/3867

SYSTEMS AND METHODS FOR AI INFERENCE PLATFORM

System and method for using and managing artificial intelligence (AI) inference platform (AIP) and/or model orchestrators according to certain embodiments. For example, a method includes receiving sensor data via a data interface of a model orchestrator, the model orchestrator including an indication of a model pipeline, the model pipeline including a plurality of models; loading the plurality of models according to the model pipeline; applying the model pipeline to the received sensor data; receiving a model output from the model pipeline via a model interface of the model orchestrator; and generating an insight based at least in part on the model output.

EFFICIENT PROCESSING OF NESTED LOOPS FOR COMPUTING DEVICE WITH MULTIPLE CONFIGURABLE PROCESSING ELEMENTS USING MULTIPLE SPOKE COUNTS
20230051544 · 2023-02-16 ·

Disclosed in some examples, are methods, systems, devices, and machine-readable mediums which provide for more efficient CGRA execution by assigning different initiation intervals to different PEs executing a same code base. The initiation intervals may be a multiple of each other and the PE with the lowest initiation interval may be used to execute instructions of the code that is to be executed at a greater frequency than other instructions than other instructions that may be assigned to PEs with higher initiation intervals.

Frame parser executing subsets of instructions in parallel for processing a frame header

An integrated circuit (IC) may include a set of instruction list engines (ILEs) that execute in parallel, where each ILE stores a subset of a set of instructions for processing a header of a frame, and where each ILE generates an ILE result based on executing the subset of the set of instructions. The IC may include a circuit to determine a result of parsing the header of the frame based on merging ILE results generated by the set of ILEs.

Processing pipeline with first and second processing modes having different performance or energy consumption characteristics

An apparatus 2 has a processing pipeline 4 supporting at least a first processing mode and a second processing mode with different energy consumption or performance characteristics. A storage structure 22, 30, 36, 50, 40, 64, 44 is accessible in both the first and second processing modes. When the second processing mode is selected, control circuitry 70 triggers a subset 102 of the entries of the storage structure to be placed in a power saving state.

Programmable instruction buffering for accumulating a burst of instructions

A processing system 2 includes a processing pipeline 12, 14, 16, 18, 28 which includes fetch circuitry 12 for fetching instructions to be executed from a memory 6, 8. Buffer control circuitry 34 is responsive to a programmable trigger, such as explicit hint instructions delimiting an instruction burst, or predetermined configuration data specifying parameters of a burst together with a synchronising instruction, to trigger the buffer control circuitry to stall a stallable portion of the processing pipeline (e.g. issue circuitry 16), to accumulate within one or more buffers 30, 32 fetched instructions starting from a predetermined starting instruction, and, when those instructions have been accumulated, to restart the stallable portion of the pipeline.

METHOD AND APPARATUS FOR VECTOR SORTING USING VECTOR PERMUTATION LOGIC
20230037321 · 2023-02-09 ·

A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic comprised in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction and storing the control input vector in a storage location.

OFFLOADING PROCESSING TASKS TO DECOUPLED ACCELERATORS FOR INCREASING PERFORMANCE IN A SYSTEM ON A CHIP

In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Writeback Hazard Elimination
20230011446 · 2023-01-12 ·

A processor includes a processing pipeline, a plurality of result-storage elements, and writeback logic. The processing pipeline is configured to process program operations and to write, to a result storage, up to a predefined maximal number of results of the processed program operations per clock cycle. The result-storage elements are configured to store respective ones of the results. The writeback logic is configured to (i) detect a writeback conflict event in which the processing pipeline produces simultaneous results that exceed the predefined maximal number of results, for writing to the result storage, in a same clock cycle, (ii) in response to detecting the writeback conflict event, to temporarily store at least a given result, from among the simultaneous results, in a given result-storage element, and (iii) to subsequently write the temporarily-stored given result from the given result-storage element to the result storage.

OPERATION OF A DUAL INSTRUCTION PIPE VIRUS CO-PROCESSOR
20180004945 · 2018-01-04 · ·

Circuits and methods are provided for detecting, identifying and/or removing undesired content. According to one embodiment, a method for performing content scanning of content objects is provided. A content object that is to be scanned is stored by a general purpose processor to a system memory of the general purpose processor. Content scanning parameters associated with the content object are set up by the general purpose processor. Instructions from a signature memory of a co-processor that is coupled to the general purpose processor are read by the co-processor based on the content scanning parameters. The instructions contain op-codes of a first instruction type and op-codes of a second instruction type. Those of the instructions containing op-codes of the first instruction type are assigned by the co-processor to a first instruction pipe of multiple instruction pipes of the co-processor for execution. An instruction of the assigned instructions containing op-codes of the first instruction type is executed by the first instruction pipe including accessing a portion of the content object from the system memory.

TECHNIQUES FOR METADATA PROCESSING
20180011708 · 2018-01-11 ·

Techniques are described for metadata processing that can be used to encode an arbitrary number of security policies for code running on a processor. Metadata may be added to every word in the system and a metadata processing unit may be used that works in parallel with data flow to enforce an arbitrary set of policies. In one aspect, the metadata may be characterized as unbounded and software programmable to be applicable to a wide range of metadata processing policies. Techniques and policies have a wide range of uses including, for example, safety, security, and synchronization. Additionally, described are aspects and techniques in connection with metadata processing in an embodiment based on the RISC-V architecture.