G06F12/0842

PARALLEL PROCESSOR OPTIMIZED FOR MACHINE LEARNING
20230008138 · 2023-01-12 ·

A parallel processor system for machine learning includes an arithmetic logic unit (ALU) array containing several ALUs and a controller that provides instructions for the ALUs. The system includes a direct memory access (DMA) block containing multiple DMA engines that access an external memory to retrieve data. An input-stream buffer decouples the DMA block from the ALU array and provides aligning and reordering of the retrieved data. The DMA engines operate in parallel and include rasterization logic capable of performing a three-dimensional (3-D) rasterization.
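
In essence, 3-D rasterization in a DMA engine generates the sequence of linear memory addresses for a walk over a three-dimensional block of data. A minimal Python sketch of that address generation, with illustrative names and a simple per-axis stride model not taken from the patent:

```python
def raster_3d(base, dims, strides):
    """Yield linear addresses for a 3-D walk over (x, y, z).

    dims    -- (x_count, y_count, z_count) extents of the block
    strides -- (x_stride, y_stride, z_stride) address step per axis
    """
    xs, ys, zs = dims
    sx, sy, sz = strides
    for z in range(zs):
        for y in range(ys):
            for x in range(xs):
                yield base + x * sx + y * sy + z * sz
```

With contiguous strides of (1, 2, 4) over a 2x2x2 block, the walk visits addresses 0 through 7 in order; non-unit strides model tiled or padded layouts.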

METHOD OF SCHEDULING CACHE BUDGET IN MULTI-CORE PROCESSING DEVICE AND MULTI-CORE PROCESSING DEVICE PERFORMING THE SAME

A method is provided. The method includes: receiving a plurality of characteristic information associated with a plurality of tasks allocated to a plurality of processor cores; monitoring a task execution environment while the plurality of processor cores perform the plurality of tasks based on at least one operating condition; and allocating a plurality of cache areas of at least one cache memory to the plurality of processor cores based on the plurality of characteristic information and the task execution environment. Sizes of the plurality of cache areas are set differently for the plurality of processor cores.
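
The allocation step — different cache-area sizes per core, driven by per-task characteristic information — might look like the following Python sketch. Here `demands` stands in for the characteristic information (for example, per-task miss rates), and the proportional policy with a one-way floor is an illustrative choice, not the patent's method:

```python
def allocate_cache_areas(total_ways, demands):
    """Split a set-associative cache (counted in ways) across cores
    in proportion to each core's demand, giving every core at least
    one way. Returns a list of way counts, one per core."""
    n = len(demands)
    assert total_ways >= n, "need at least one way per core"
    ways = [1] * n
    spare = total_ways - n
    total = sum(demands)
    # provisional proportional shares of the spare ways
    shares = [spare * d / total for d in demands]
    ways = [w + int(s) for w, s in zip(ways, shares)]
    # hand out any leftover ways to the largest fractional parts
    rem = total_ways - sum(ways)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                   reverse=True)
    for i in order[:rem]:
        ways[i] += 1
    return ways
```

For an 8-way cache and demands of 3 and 1, this yields 6 ways for the first core and 2 for the second; re-running it as the monitored task environment changes would give the dynamic re-budgeting the abstract describes.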

Techniques for increasing the isolation of workloads within a multiprocessor instance
11693708 · 2023-07-04 ·

In various embodiments, an isolation application determines processor assignment(s) based on a performance cost estimate. The performance cost estimate is associated with an estimated level of cache interference arising from executing a set of workloads on a set of processors. Subsequently, the isolation application configures at least one processor included in the set of processors to execute at least a portion of a first workload that is included in the set of workloads based on the processor assignment(s). Advantageously, because the isolation application generates the processor assignment(s) based on the performance cost estimate, the isolation application can reduce interference in a non-uniform memory access (NUMA) microprocessor instance.
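
One simple way to realize "assignments based on a performance cost estimate" is a greedy placement that, for each workload, picks the processor set (e.g., NUMA node) where the estimated pairwise cache interference with already-placed workloads is smallest. This is a hedged sketch with invented names, not the patent's algorithm:

```python
from itertools import combinations

def estimate_cost(assignment, interference):
    """Sum pairwise interference for workloads sharing a node."""
    cost = 0.0
    for node_workloads in assignment.values():
        for a, b in combinations(node_workloads, 2):
            cost += interference[a][b]
    return cost

def assign_workloads(workloads, nodes, interference):
    """Greedy: place each workload on the node where it adds the
    least estimated cache interference."""
    assignment = {n: [] for n in nodes}
    for w in workloads:
        best = min(nodes,
                   key=lambda n: sum(interference[w][o]
                                     for o in assignment[n]))
        assignment[best].append(w)
    return assignment
```

Given two heavily interfering workloads and a third neutral one, the greedy pass separates the heavy pair across nodes, which is exactly the isolation effect the abstract claims.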

Semiconductor device, control system, and control method of semiconductor device

A semiconductor device includes first and second CPUs, first and second SPUs for controlling a snoop operation, a controller supporting ASIL D of a functional safety standard, and a memory. The controller permits the snoop operation in the first and second SPUs when a software lock-step is not performed, and prohibits it when the software lock-step is performed. The first CPU executes first software for the software lock-step and writes its execution result in a first area of the memory. The second CPU executes second software for the software lock-step and writes its execution result in a second area of the memory. The execution result written in the first area is compared with the execution result written in the second area.
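
Reduced to its essence, the software lock-step runs the same computation twice, writes each result to its own memory area, and compares them. The Python sketch below is purely illustrative: in hardware the two runs execute on two physical CPUs with snooping prohibited so that neither run can observe the other's cached state:

```python
def software_lockstep(func, inputs, memory, area1, area2):
    """Run `func` twice (standing in for the two CPUs' redundant
    executions), store each result in its own area of `memory`,
    and report whether the results agree."""
    memory[area1] = func(*inputs)   # first CPU's execution
    memory[area2] = func(*inputs)   # second CPU's execution
    return memory[area1] == memory[area2]
```

A mismatch between the two areas would signal a fault, which is the safety property (ASIL D) the lock-step is there to provide.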

COORDINATING DYNAMIC POWER SCALING OF AGENTS BASED ON POWER CORRELATIONS OF AGENT INSTRUCTIONS
20220413578 · 2022-12-29 ·

Coordinating dynamic power scaling of agents based on power correlations of agent instructions is disclosed. A global power controller determines a first local power quantifier of a first agent executing an agent instruction of a task of a workload. The global power controller stores a correlation between the first agent executing the agent instruction and a second local power quantifier corresponding to a second agent. The global power controller subsequently determines that the first agent is executing or will execute the agent instruction. The global power controller accesses the correlation associated with the first agent executing the agent instruction and sends to the second agent a proposed power level based on the correlation.
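
The controller's bookkeeping can be pictured as a table keyed by (agent, instruction) that maps to the power quantifiers observed at correlated agents. The class and method names below are invented for illustration; the abstract does not specify data structures:

```python
class GlobalPowerController:
    """Stores power correlations and proposes power levels from them."""

    def __init__(self):
        # (agent, instruction) -> {correlated_agent: power_quantifier}
        self.correlations = {}

    def record(self, agent, instruction, other_agent, power_quantifier):
        """Store a correlation between `agent` executing `instruction`
        and the power quantifier observed at `other_agent`."""
        key = (agent, instruction)
        self.correlations.setdefault(key, {})[other_agent] = power_quantifier

    def propose(self, agent, instruction):
        """When `agent` is executing or will execute `instruction`,
        return proposed power levels for the correlated agents."""
        return self.correlations.get((agent, instruction), {})
```

On a later occurrence of the same instruction at the first agent, `propose` supplies the stored quantifier as the power level to send to the second agent, matching the flow in the abstract.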

FROZEN TIME CACHE FOR MULTI-HOST READ OPERATIONS
20220405206 · 2022-12-22 ·

Aspects of a storage device including a memory and a controller are provided. The controller may receive a prefetch request to retrieve data for a host having a promoted stream. The controller may access a frozen time table indicating hosts for which data has been prefetched and frozen times associated with the host and other hosts. The controller can determine whether the host has a higher priority over other hosts included in the frozen time table based on corresponding frozen times and data access parameters associated with the host. The controller may determine to prefetch the data for the host in response to the prefetch request when the host has a higher priority than the other hosts. The controller can receive a host read command associated with the promoted stream from the host and provide the prefetched data to the host in response to the host read command.
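
One plausible reading of the priority check is that an older frozen time, weighted by a per-host data-access parameter, means higher priority; a host absent from the table has nothing prefetched yet and wins by default. This scoring rule is an assumption for illustration, not the patent's formula:

```python
def should_prefetch(host, frozen_times, access_weight):
    """Decide whether to service a prefetch request for `host`.

    frozen_times  -- host -> frozen time (lower = older = higher priority)
    access_weight -- host -> data-access parameter scaling the score
    """
    if host not in frozen_times:
        return True  # nothing frozen for this host yet
    score = frozen_times[host] * access_weight.get(host, 1.0)
    others = [frozen_times[h] * access_weight.get(h, 1.0)
              for h in frozen_times if h != host]
    return all(score < s for s in others)
```

Under this rule the controller prefetches only for the strictly highest-priority host, then serves the subsequent host read command from the prefetched data.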

APPARATUSES, SYSTEMS, AND METHODS FOR CONFIGURING COMBINED PRIVATE AND SHARED CACHE LEVELS IN A PROCESSOR-BASED SYSTEM

Apparatuses, systems, and methods for configuring combined private and shared cache levels in a processor-based system. The processor-based system includes a processor with a plurality of processing cores, each including execution circuits coupled to respective cache(s) and a configurable combined private and shared cache, from which the execution circuits may receive instructions and data on which to perform operations. The shared cache portion of each configurable combined private and shared cache can be treated as an independently-assignable portion of the overall shared cache, which effectively comprises the shared cache portions of all of the processing cores. Each independently-assignable portion of the overall shared cache can be associated with a particular client running on the processor, as an example. This approach can provide greater granularity in partitioning the shared cache between particular clients running on the processor.
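
The partitioning model can be sketched as follows: each core's cache is split into a private portion and a shared slice, and each shared slice is an independently-assignable unit that can be bound to a client. The class below is an illustrative model (names and the way-based split are assumptions), not the claimed hardware:

```python
class CombinedCache:
    """Model of per-core caches split into private and shared portions.

    The shared portions of all cores form one pool; each core's slice
    can be assigned independently to a client running on the processor.
    """

    def __init__(self, cores, ways_per_core, private_ways):
        self.private = {c: private_ways for c in range(cores)}
        self.shared_slices = {c: ways_per_core - private_ways
                              for c in range(cores)}
        self.owner = {c: None for c in range(cores)}  # slice -> client

    def assign_slice(self, core, client):
        """Bind core's shared slice to `client` if it is unassigned."""
        if self.owner[core] is None:
            self.owner[core] = client
            return True
        return False

    def shared_capacity(self, client):
        """Total shared ways currently assigned to `client`."""
        return sum(self.shared_slices[c]
                   for c, o in self.owner.items() if o == client)
```

Assigning slices core-by-core is what yields the finer partitioning granularity the abstract highlights: a client's shared capacity grows in per-core slice increments rather than all-or-nothing.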