G06F9/38

Transaction-enabling systems and methods for customer notification regarding facility provisioning and allocation of resources

The present disclosure describes transaction-enabling systems and methods. A system can include a facility including a core task including a customer relevant output and a controller. The controller may include a facility description circuit to interpret a plurality of historical facility parameter values and corresponding facility outcome values and a facility prediction circuit to operate an adaptive learning system, wherein the adaptive learning system is configured to train a facility production predictor in response to the historical facility parameter values and the corresponding outcome values. The facility description circuit also interprets a plurality of present state facility parameter values, wherein the trained facility production predictor determines a customer contact indicator in response to the plurality of present state facility parameter values and a customer notification circuit provides a notification to a customer in response.

ARITHMETIC PROCESSING DEVICE AND ARITHMETIC PROCESSING METHOD
20230010536 · 2023-01-12 · ·

An arithmetic processing device includes: a memory; and a processor coupled to the memory and configured to: execute a plurality of data processes each of which is divided into a plurality of pipeline stages in parallel at different timings; measure a processing time of each of the plurality of pipeline stages; and set a priority of the plurality of pipeline stages in a descending order of the measured processing time.

Writeback Hazard Elimination
20230011446 · 2023-01-12 ·

A processor includes a processing pipeline, a plurality of result-storage elements, and writeback logic. The processing pipeline is configured to process program operations and to write, to a result storage, up to a predefined maximal number of results of the processed program operations per clock cycle. The result-storage elements are configured to store respective ones of the results. The writeback logic is configured to (i) detect a writeback conflict event in which the processing pipeline produces simultaneous results that exceed the predefined maximal number of results, for writing to the result storage, in a same clock cycle, (ii) in response to detecting the writeback conflict event, to temporarily store at least a given result, from among the simultaneous results, in a given result-storage element, and (iii) to subsequently write the temporarily-stored given result from the given result-storage element to the result storage.

Hardware accelerated dynamic work creation on a graphics processing unit

A processor core is configured to execute a parent task that is described by a data structure stored in a memory. A coprocessor is configured to dispatch a child task to the at least one processor core in response to the coprocessor receiving a request from the parent task concurrently with the parent task executing on the at least one processor core. In some cases, the parent task registers the child task in a task pool and the child task is a future task that is configured to monitor a completion object and enqueue another task associated with the future task in response to detecting the completion object. The future task is configured to self-enqueue by adding a continuation future task to a continuation queue for subsequent execution in response to the future task failing to detect the completion object.

AUTOMATIC COMPUTE KERNEL GENERATION

Apparatuses, systems, and techniques to receive, by a processor of a computer system, one or more operations for a kernel; automatically generate, by the processor, one or more operators that perform the one or more operations on elements of one or more input data structures; and automatically generate, by the processor, the kernel that comprises the one or more operators.

COMPUTING DEVICE WITH ONE OR MORE HARDWARE ACCELERATORS DIRECTLY COUPLED WITH CLUSTER OF PROCESSORS
20230037780 · 2023-02-09 ·

A computing device having a tightly attached or closely attached hardware accelerator directly coupled with one or more processors for efficient uses of the hardware accelerator for executing specific functions are described. According to an embodiment, the hardware accelerator is instantiated inside the main processor unit and interfaces to a load-store unit (LS) using virtual addresses. The hardware accelerator instantiated inside the main processing unit (e.g., core) is referred to as a tightly attached hardware accelerator. In an alternative embodiment, the hardware accelerator is instantiated inside a cluster of processor cores. The hardware accelerator that is instantiated inside the cluster of processor cores but not inside a specific processor core is referred to as a closely attached hardware accelerator.

Built-in self-test for a programmable vision accelerator of a system on a chip

In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

PMEM cache RDMA security

Techniques are described for providing one or more clients with direct access to cached data blocks within a persistent memory cache on a storage server. In an embodiment, a storage server maintains a persistent memory cache comprising a plurality of cache lines, each of which represent an allocation unit of block-based storage. The storage server maintains an RDMA table that include a plurality of table entries, each of which maps a respective client to one or more cache lines and a remote access key. An RDMA access request to access a particular cache line is received from a storage server client. The storage server identifies access credentials for the client and determines whether the client has permission to perform the RDMA access on the particular cache line. Upon determining that the client has permissions, the cache line is accessed from the persistent memory cache and sent to the storage server client.

SHARED UNIT INSTRUCTION EXECUTION

A data processing apparatus comprises receiver circuitry for receiving instructions from each of a plurality of requester devices. Processing circuitry executes the instructions associated with each of a subset of the requester devices at a time and arbitration circuitry determines the subset of the requester devices and causes the instructions associated with each of the subset of the requester devices to be executed next. In response to the receiver circuitry receiving an instruction of a predetermined type from one of the requester devices outside the subset of requester devices, the arbitration circuitry causes the instruction of the predetermined type to be executed next.

Iterating group sum of multiple accumulate operations
11593114 · 2023-02-28 · ·

Methods, systems and apparatuses for performing walk operations of single instruction, multiple data (SIMD) instructions are disclosed. One method includes initiating, by a scheduler, a SIMD thread, where the scheduler is operative to schedule the SIMD thread. The method further includes fetching a plurality of instructions for the SIMD thread. The method further includes determining, by a thread arbiter, at least one instruction that is a walk instruction, where the walk instruction iterates a block of instructions for a subset of channels of the SIMD thread, where the walk instruction includes a walk size, and where the walk size is a number of channels in the subset of channels of the SIMD thread that are processed in a walk iteration in association with the walk instruction. The method further includes executing the walk instruction based on the walk size.