Patent classifications
G06F2209/507
Dynamic update of the number of architected registers assigned to software threads using spill counts
A computer system includes a processor, main memory, and controller. The processor includes a plurality of hardware threads configured to execute a plurality of software threads. The main memory includes a first register table configured to contain a current set of architected registers for the currently running software threads. The controller is configured to change a first number of the architected registers assigned to a given one of the software threads to a second number of architected registers when a result of monitoring current usage of the registers by the software threads indicates that the change will improve performance of the computer system. The processor includes a second register table configured to contain a subset of the architected registers and a mapping table for each software thread indicating whether the architected registers referenced by the corresponding software thread are located in the first register table or the second register table.
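The abstract does not say how the controller decides that a change "will improve performance", so the following is only a minimal sketch under assumptions: spill counts (taken from the title) are the usage signal, and registers move from a thread with no spills to the thread that spills most. The names `rebalance`, `step`, and `min_regs` are hypothetical and do not come from the patent.

```python
from typing import Dict


def rebalance(
    assigned: Dict[str, int],
    spills: Dict[str, int],
    step: int = 4,       # hypothetical: registers moved per adjustment
    min_regs: int = 8,   # hypothetical: floor on registers per thread
) -> Dict[str, int]:
    """Return a new register assignment after one adjustment round.

    Assumed policy (not stated in the abstract): the thread with the
    highest spill count receives `step` registers from the largest
    non-spilling thread that can give them up without dropping below
    `min_regs`.
    """
    new_assignment = dict(assigned)
    needy = max(spills, key=spills.get)
    if spills[needy] == 0:
        return new_assignment  # nobody is spilling, so nothing to change

    donors = [
        thread
        for thread in assigned
        if thread != needy
        and spills[thread] == 0
        and assigned[thread] - step >= min_regs
    ]
    if not donors:
        return new_assignment  # no thread can safely give up registers

    donor = max(donors, key=lambda thread: assigned[thread])
    new_assignment[donor] -= step
    new_assignment[needy] += step
    return new_assignment
```

For example, with two threads holding 32 registers each, where only thread `a` is spilling, one round moves four registers from `b` to `a`. The total number of assigned registers stays constant.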
System, Apparatus And Method For Providing Power Monitoring Isolation In A Processor
In one embodiment, a processor comprises a plurality of cores and a controller. Each of the plurality of cores may include: an execution circuit, a power measurement circuit to measure power consumption of the core and a first register to store a power-related context identifier to identify a process to be executed on the core. The controller may include: a plurality of energy status registers each associated with a power-related context identifier and to store energy consumption information of a process; and a control circuit coupled to the plurality of energy status registers, where the control circuit is to enable each of a plurality of processes to independently monitor the energy consumption information of the process and prevent each of the plurality of processes from monitoring the energy consumption information of other ones of the plurality of processes. Other embodiments are described and claimed.
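The isolation rule in this abstract can be modeled in software as a rough sketch: energy is accumulated per power-related context identifier, and a read succeeds only when the caller's identifier matches the one being read. The class and method names below are hypothetical, and the real mechanism is hardware registers and a control circuit, not Python objects.

```python
from typing import Dict


class EnergyStatusRegisters:
    """Toy model of per-context energy status registers (hypothetical API)."""

    def __init__(self) -> None:
        # Maps a power-related context identifier to accumulated energy.
        self._energy: Dict[int, float] = {}

    def record(self, context_id: int, joules: float) -> None:
        """Add a core's measured energy to the running total for a context."""
        self._energy[context_id] = self._energy.get(context_id, 0.0) + joules

    def read(self, caller_context_id: int, target_context_id: int) -> float:
        """Return energy for a context, but only to the process that owns it."""
        if caller_context_id != target_context_id:
            raise PermissionError("a process may only read its own energy data")
        return self._energy.get(target_context_id, 0.0)
```

Each process can monitor its own consumption independently, while an attempt to read another context's register is rejected.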
FPGA search in a cloud compute node
Implementations described herein identify and exploit opportunities for offloading search-time and/or index-time operations to programmed offloading hardware accelerators (POHAs). An event-based data intake and query system is implemented in an enterprise core that is in communication with the POHAs over network interfaces. The system receives search requests associated with search-time operations classified into off-loadable operations and non-off-loadable operations. Non-off-loadable operations are distributed to local processing resources, and off-loadable operations are distributed to the POHAs for offloaded processing. The system can post-process both the locally processed and offload-processed results to generate search results responsive to at least some of the received search requests.
Task execution in a SIMD processing unit with parallel groups of processing lanes
A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
AI Accelerator Virtualization
An AI (Artificial Intelligence) processor for Neural Network (NN) processing shared by multiple users is disclosed. The AI processor comprises a Multiplier Unit (MXU), a Scalar Computing Unit (SCU), a unified buffer coupled to the MXU and the SCU to store data, and control circuitry coupled to the CCU and the unified buffer. The MXU comprises a plurality of Processing Elements (PEs) responsible for computing matrix multiplications. The SCU, coupled to the output of the MXU, is responsible for computing the activation function. The control circuitry is configured to perform space-division and time-division NN processing for a plurality of users. At a given time instance, at least one of the MXU and the SCU is shared by two or more users, and at least one user is using a part of the MXU while another user is using a part of the SCU.
Resource allocation in a parallel processing system
A method of resource allocation in a parallel processing system is described. The method comprises receiving a request to allocate resources to a task, where the request identifies only the amount of resources required to execute the next chunk of the task, and, when that amount is available, allocating it to the task.
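A minimal sketch of chunk-by-chunk allocation follows, assuming a single pool of interchangeable resource units. The abstract does not specify a data model, so `Task`, `ChunkAllocator`, and the per-chunk demand list are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    chunk_demands: List[int]  # hypothetical: units needed by each chunk, in order
    next_chunk: int = 0

    def next_demand(self) -> Optional[int]:
        """Resources needed for the next chunk only, or None when finished."""
        if self.next_chunk < len(self.chunk_demands):
            return self.chunk_demands[self.next_chunk]
        return None


class ChunkAllocator:
    def __init__(self, capacity: int) -> None:
        self.free = capacity

    def request(self, task: Task) -> bool:
        """Grant the next chunk's demand if it is currently available."""
        demand = task.next_demand()
        if demand is None or demand > self.free:
            return False
        self.free -= demand
        return True

    def release(self, task: Task) -> None:
        """Return the current chunk's resources and advance to the next chunk."""
        self.free += task.chunk_demands[task.next_chunk]
        task.next_chunk += 1
```

Because a request covers only the next chunk, a task whose later chunks need more resources does not reserve that larger amount up front. Other tasks can use the pool in the meantime.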
Apparatus, method, and system for ensuring quality of service for multi-threading processor cores
A simultaneous multi-threading (SMT) processor core capable of thread-based biasing with respect to execution resources. The SMT processor includes priority controller circuitry to determine a thread priority value for each of a plurality of threads to be executed by the SMT processor core and to generate a priority vector comprising the thread priority value of each of the plurality of threads. The SMT processor further includes thread selector circuitry to make execution cycle assignments of a pipeline by assigning to each of the plurality of threads a portion of the pipeline's execution cycles based on each thread's priority value in the priority vector. The thread selector circuitry is further to select, from the plurality of threads, tasks to be processed by the pipeline based on the execution cycle assignments.
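The abstract says each thread receives a portion of the pipeline's execution cycles based on its priority value, but it does not give a formula. The sketch below assumes a simple proportional split over a fixed window of cycles, with any cycles left over from rounding handed to the highest-priority threads. The function name `assign_cycles` and the `window` parameter are hypothetical.

```python
from typing import Dict


def assign_cycles(priority_vector: Dict[str, int], window: int) -> Dict[str, int]:
    """Split `window` execution cycles among threads in proportion to priority.

    Assumed policy (not from the abstract): integer proportional shares,
    then one extra cycle each to the highest-priority threads until the
    whole window is used.
    """
    total = sum(priority_vector.values())
    if total <= 0:
        raise ValueError("at least one thread needs a positive priority")

    # Integer share of the window for each thread, rounded down.
    shares = {
        thread: priority * window // total
        for thread, priority in priority_vector.items()
    }

    # Rounding down can leave fewer than len(priority_vector) cycles unassigned.
    leftover = window - sum(shares.values())
    by_priority = sorted(priority_vector, key=priority_vector.get, reverse=True)
    for thread in by_priority[:leftover]:
        shares[thread] += 1
    return shares
```

A thread selector could then pick work for each cycle according to these per-thread budgets. How the budgets translate into an ordering of cycles is left open here.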