G06F9/3885

Processor with instruction iteration

A processor includes a plurality of execution units. At least one of the execution units is configured to repeatedly execute a first instruction based on a first field of the first instruction indicating that the first instruction is to be iteratively executed.
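As an illustrative sketch (field and opcode names are assumptions, not from the patent), the repeat-field behavior can be modeled as an execution loop that iterates an instruction when its first field says so:

```python
# Hypothetical model of an execution unit that re-executes an instruction
# when a "repeat" field in its encoding indicates iterative execution.
# The fields (opcode, repeat, count, operand) are illustrative only.

def execute(instruction, state):
    """Execute one instruction, iterating it `count` times when the
    repeat field indicates iterative execution."""
    iterations = instruction["count"] if instruction["repeat"] else 1
    for _ in range(iterations):
        if instruction["opcode"] == "add":
            state["acc"] += instruction["operand"]
    return state

state = execute({"opcode": "add", "repeat": True, "count": 4, "operand": 2},
                {"acc": 0})
print(state["acc"])  # 8
```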

Method and apparatus for stateless parallel processing of tasks and workflows
11714654 · 2023-08-01

In a method for parallel processing of a data stream, a processing task is received to process the data stream that includes a plurality of segments. A split operation is performed on the data stream to split the plurality of segments into N sub-streams. Each of the N sub-streams includes one or more segments of the plurality of segments. N is a positive integer. N sub-processing tasks are performed on the N sub-streams to generate N processed sub-streams. A merge operation is performed on the N processed sub-streams based on a merge buffer to generate a merged output data stream. The merge buffer includes an output iFIFO buffer and N sub-output iFIFO buffers coupled to the output iFIFO buffer. The merged output data stream is identical to an output data stream that is generated when the processing task is applied directly to the data stream without the split operation.
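A minimal sketch of the split/process/merge equivalence, assuming a simple round-robin split and using plain FIFO queues as stand-ins for the iFIFO buffers (the per-segment task `process` is illustrative):

```python
# Illustrative: split a segmented stream into N sub-streams, run the
# sub-processing task on each, then merge through per-sub-stream FIFOs
# so the result matches applying the task directly to the stream.
from collections import deque

def process(segment):              # stand-in for the per-segment task
    return segment.upper()

def split_process_merge(segments, n):
    # Split: round-robin segments into N sub-streams, remembering order.
    subs = [deque() for _ in range(n)]
    order = []
    for i, seg in enumerate(segments):
        subs[i % n].append(process(seg))   # sub-processing task
        order.append(i % n)
    # Merge: the output FIFO drains the sub-output FIFOs in arrival order.
    return [subs[k].popleft() for k in order]

segments = ["a", "b", "c", "d", "e"]
# Identical to processing the stream directly, as the abstract states.
assert split_process_merge(segments, 3) == [process(s) for s in segments]
```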

Thread group scheduling for graphics processing

Embodiments are generally directed to thread group scheduling for graphics processing. An embodiment of an apparatus includes a plurality of processors including a plurality of graphics processors to process data; a memory; and one or more caches for storage of data for the plurality of graphics processors, wherein the plurality of processors are to schedule a plurality of groups of threads for processing by the plurality of graphics processors, the scheduling of the plurality of groups of threads including the plurality of processors to apply a bias for scheduling the plurality of groups of threads according to a cache locality for the one or more caches.
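A hedged sketch of locality-biased scheduling. The scoring model (cache contents as address sets, overlap as the locality measure) is an assumption; the patent's point is only that placement is biased toward cache locality:

```python
# Illustrative: place each thread group on the processor whose cache
# already holds the most of the group's working set.

def schedule(groups, caches):
    """groups: {group: set(addresses)}; caches: {proc: set(addresses)}."""
    placement = {}
    for group, working_set in groups.items():
        # Bias: pick the processor with maximum cache overlap.
        proc = max(caches, key=lambda p: len(caches[p] & working_set))
        placement[group] = proc
        caches[proc] |= working_set   # the group's data becomes resident
    return placement

caches = {"gpu0": {1, 2, 3}, "gpu1": {7, 8}}
print(schedule({"g0": {2, 3, 4}, "g1": {8, 9}}, caches))
# {'g0': 'gpu0', 'g1': 'gpu1'}
```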

PARALLEL DECISION SYSTEM AND METHOD FOR DISTRIBUTED DATA PROCESSING
20230229449 · 2023-07-20

The present disclosure provides a parallel decision system and method for distributed data processing. The system includes: an initial logical node generation assembly, a logical node traversal assembly, a predetermined configuration cost computation assembly, and a parallel decision assembly. The initial logical node generation assembly is configured to receive task configuration data input by a user to generate an initial logical node topology for the distributed data processing system. The logical node traversal assembly is configured to traverse the initial logical node topology to obtain a predetermined configuration in the initial logical node topology. The predetermined configuration cost computation assembly is configured to compute a transmission cost of each predetermined configuration and a cost sum. The parallel decision assembly is configured to, based on the result of the cost computation for each predetermined configuration, reduce initial logical nodes and their connection edges into a combined initial logical node.
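A minimal sketch of the cost-based decision step, under an assumed cost model (transmission cost as bytes over bandwidth; the configuration names are illustrative):

```python
# Illustrative: compute the transmission cost of each predetermined
# configuration and the cost sum, then decide on the cheapest one.

def transmission_cost(edge_bytes, bandwidth):
    return edge_bytes / bandwidth

def decide(configurations, bandwidth=10.0):
    """configurations: {name: [edge payload sizes in bytes]}."""
    totals = {name: sum(transmission_cost(b, bandwidth) for b in edges)
              for name, edges in configurations.items()}
    return min(totals, key=totals.get), totals

best, totals = decide({"split-by-row": [100, 40], "split-by-col": [30, 20]})
print(best)  # split-by-col
```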

Signal-processing apparatus including a second processor that, after receiving an instruction from a first processor, independently controls a second data processing unit without further instruction from the first processor

A signal-processing apparatus includes an instruction-parallel processor, a first data-parallel processor, a second data-parallel processor, and, as dedicated hardware, a motion detection unit, a de-blocking filtering unit, and a variable-length coding/decoding unit. With this structure, during signal processing of an image compression and decompression algorithm needing a large amount of processing, the load is distributed between software and hardware, so that the signal-processing apparatus can realize high processing capability and flexibility.

Collision-free hashing for accessing cryptographic computing metadata and for cache expansion

Embodiments are directed to collision-free hashing for accessing cryptographic computing metadata and for cache expansion. An embodiment of an apparatus includes one or more processors to: receive a physical address; compute a set of hash functions using a set of different indexes corresponding to the set of hash functions, wherein the hash functions combine additions, bit-level reordering, bit-linear mixing, and wide substitutions, and differ in the bit-linear mixing; access a plurality of cache units utilizing the set of hash functions; read different sets of the plurality of cache units in parallel, where a set of the different sets is obtained from each cache unit of the plurality of cache units; and responsive to the physical address being located in one of the different sets, return the cache line data of the set corresponding to the cache unit having the physical address.
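A toy behavioral model of the multi-unit lookup. The hash pipeline is heavily simplified (a multiply/shift/xor stand-in for the patent's add/reorder/mix/substitute stages); only the structure is meant to match: each unit has its own hash differing in a mixing constant, and all units are probed for the address:

```python
# Illustrative multi-bank cache: each cache unit is indexed by its own
# hash of the physical address; all units are read "in parallel" and a
# tag match returns the cached line.

def make_hash(mix_constant, buckets):
    # Stand-in for the hash pipeline; units differ only in mixing.
    return lambda addr: ((addr * mix_constant) ^ (addr >> 7)) % buckets

class MultiBankCache:
    def __init__(self, n_units=4, buckets=64):
        self.hashes = [make_hash(2 * i + 3, buckets) for i in range(n_units)]
        self.units = [dict() for _ in range(n_units)]  # bucket -> (addr, line)

    def fill(self, unit, addr, line):
        self.units[unit][self.hashes[unit](addr)] = (addr, line)

    def lookup(self, addr):
        for h, unit in zip(self.hashes, self.units):  # probe every unit
            entry = unit.get(h(addr))
            if entry is not None and entry[0] == addr:  # tag match
                return entry[1]
        return None

cache = MultiBankCache()
cache.fill(2, 0x1234, b"line-data")
print(cache.lookup(0x1234))  # b'line-data'
```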

Method for Scheduling Feature Services with a Distributed Data Flow Service Framework
20230020772 · 2023-01-19

The present disclosure generally relates to dataflow applications. In aspects, a system is disclosed for scheduling execution of feature services within a distributed data flow service (DDFS) framework. Further, the DDFS framework includes a main system-on-chip (SoC), at least one sensing service, and a plurality of feature services. Each of the plurality of feature services includes a common pattern with an algorithm for processing the input data, a feature for encapsulating the algorithm into a generic wrapper rendering the algorithm compatible with other algorithms, a feature interface for encapsulating a feature output into a generic interface allowing generic communication with other feature services, and a configuration file including a scheduling policy to execute the feature services. For each of the plurality of feature services, processor(s) schedule the execution of a given feature service using the scheduling policy and execute it on the standard and/or accelerator cores.
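A sketch of the common feature-service pattern. The class name, the periodic scheduling policy, and the output envelope are all assumptions standing in for the generic wrapper, generic interface, and configuration file described above:

```python
# Illustrative: an algorithm wrapped into a generic feature with a
# generic output interface, run according to a per-service policy.

class Feature:
    def __init__(self, name, algorithm, policy):
        self.name = name
        self.algorithm = algorithm   # the wrapped processing algorithm
        self.policy = policy         # e.g. {"period_ms": 50}

    def run(self, input_data):
        # Generic interface: every feature emits the same envelope.
        return {"feature": self.name, "output": self.algorithm(input_data)}

def schedule(features, tick_ms, input_data):
    """Run each feature whose configured period divides the current tick."""
    return [f.run(input_data) for f in features
            if tick_ms % f.policy["period_ms"] == 0]

features = [Feature("edges", lambda x: x * 2, {"period_ms": 50}),
            Feature("lanes", lambda x: x + 1, {"period_ms": 100})]
print([r["feature"] for r in schedule(features, 100, 3)])  # ['edges', 'lanes']
```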

Machine learning model with conditional execution of multiple processing tasks

A method includes receiving input data at a trained machine learning model that includes a common part and task-specific parts, receiving an execution instruction that identifies one or more processing tasks to be performed, processing the input data using the common part of the trained machine learning model to generate intermediate data, and processing the intermediate data using one or more of the task-specific parts of the trained machine learning model based on the execution instruction to generate one or more outputs.
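A minimal, framework-free sketch of the conditional execution flow (the trunk and heads are trivial stand-ins for trained model parts):

```python
# Illustrative: a common part produces intermediate data once, and only
# the task-specific parts named by the execution instruction run on it.

def common_part(x):
    return [v * 2 for v in x]           # shared trunk (stand-in)

task_parts = {                           # task-specific heads (stand-ins)
    "sum": sum,
    "max": max,
}

def run_model(x, execution_instruction):
    intermediate = common_part(x)        # computed once for all tasks
    return {task: task_parts[task](intermediate)
            for task in execution_instruction}

print(run_model([1, 2, 3], ["sum"]))         # {'sum': 12}
print(run_model([1, 2, 3], ["sum", "max"]))  # {'sum': 12, 'max': 6}
```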

Execution circuits using discardable state
11550651 · 2023-01-10

There is provided execution circuitry. Storage circuitry retains a stored state of the execution circuitry. Operation receiving circuitry receives, from issue circuitry, an operation signal corresponding to an operation to be performed that accesses the stored state of the execution circuitry from the storage circuitry. Functional circuitry seeks to perform the operation in response to the operation signal by accessing the stored state of the execution circuitry from the storage circuitry. Delete request receiving circuitry receives a deletion signal and in response to the deletion signal, deletes the stored state of the execution circuitry from the storage circuitry. State loss indicating circuitry responds to the operation signal when the stored state of the execution circuitry is not present and is required for the operation by indicating an error. In addition, there is provided a data processing apparatus comprising issue circuitry to issue an operation to execution circuitry. The execution circuitry stores a stored state that is accessed during performance of the operation and error detecting circuitry detects an indication of an error from the execution circuitry that the stored state is required for performance of the operation and that the stored state has been deleted.
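A behavioral sketch of the discardable-state protocol. Using a Python exception as the error indication is an assumption; the class and method names are illustrative mappings of the circuitry described above:

```python
# Illustrative: an execution unit whose stored state can be deleted on
# request, and which indicates an error when an operation needs state
# that is no longer present.

class StateLost(Exception):
    """Raised when an operation requires stored state that was deleted."""

class ExecutionUnit:
    def __init__(self, state):
        self._state = state            # storage circuitry

    def delete_state(self):            # delete-request receiving circuitry
        self._state = None

    def perform(self, op):             # functional circuitry
        if self._state is None:
            raise StateLost()          # state-loss indication
        return op(self._state)

unit = ExecutionUnit({"vreg": 5})
print(unit.perform(lambda s: s["vreg"] + 1))  # 6
unit.delete_state()
try:
    unit.perform(lambda s: s["vreg"])
except StateLost:
    print("error: state deleted")      # issue-side error detection
```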

Sharing preprocessing, computations, and hardware resources between multiple neural networks
11551095 · 2023-01-10

A method for training a Neural-Network (NN), the method includes receiving a plurality of NN training tasks, each training task including (i) a respective preprocessing phase that preprocesses data to be provided as input data to the NN, and (ii) a respective computation phase that trains the NN using the preprocessed data. The plurality of NN training tasks is executed, including: (a) a commonality is identified between the input data required by computation phases of two or more of the training tasks, and (b) in response to identifying the commonality, one or more preprocessing phases are executed that produce the input data jointly for the two or more training tasks.