Patent classifications
G06F9/3856
Extended memory architecture
Systems, apparatuses, and methods related to extended memory communication subsystems for performing extended memory operations are described. An example apparatus can include a plurality of computing devices. Each of the computing devices can include a processing unit configured to perform an operation on a block of data, and a memory array configured as a cache for each respective processing unit. The example apparatus can further include a first communication subsystem coupled to a host and to each of the plurality of communication subsystems. The example apparatus can further include a plurality of second communication subsystems coupled to each of the plurality of computing devices. Each of the plurality of computing devices can be configured to receive a request from the host, send a command to execute at least a portion of the operation, and receive a result of performing the operation from the at least one hardware accelerator.
ADAPTIVE RESOURCE PROVISIONING FOR A MULTI-TENANT DISTRIBUTED EVENT DATA STORE
Systems and methods for adaptively provisioning a distributed event data store of a multi-tenant architecture are provided. According to one embodiment, a managed security service provider (MSSP) maintains a distributed event data store on behalf of each tenant of the MSSP. For each tenant, the MSSP periodically determines a provisioning status for a current active partition of the distributed event data store of the tenant. When the determination indicates that an under-provisioning condition exists, the MSSP dynamically increases, by a first adjustment ratio, the number of resource provision units (RPUs) to be used for a new partition to be added to the partitions for the tenant. When the determination indicates that an over-provisioning condition exists, the MSSP dynamically decreases, by a second adjustment ratio, the number of RPUs to be used for subsequent partitions added to the partitions for the tenant.
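The adjustment step described above can be sketched as follows. This is a minimal illustration, not the patented method: the function name, the status strings, the default ratios, and the multiplicative reading of "adjustment ratio" are all assumptions, and how the provisioning status is determined is left outside the sketch.

```python
def next_partition_rpus(current_rpus, status, up_ratio=0.5, down_ratio=0.25, min_rpus=1):
    """Return the RPU count to use for the next partition of a tenant.

    status is a hypothetical label: "under", "over", or anything else for
    "adequately provisioned". The ratios are treated as multiplicative
    factors, which is an assumption; the abstract does not define them.
    """
    if status == "under":
        # Under-provisioned: grow the next partition by the first ratio.
        return max(min_rpus, round(current_rpus * (1 + up_ratio)))
    if status == "over":
        # Over-provisioned: shrink subsequent partitions by the second ratio.
        return max(min_rpus, round(current_rpus * (1 - down_ratio)))
    return current_rpus
```

With the assumed defaults, a tenant at 8 RPUs would move to 12 RPUs after an under-provisioning determination and to 6 RPUs after an over-provisioning determination.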
DYNAMIC, LOW-LATENCY, DEPENDENCY-AWARE SCHEDULING ON SIMD-LIKE DEVICES FOR PROCESSING OF RECURRING AND NON-RECURRING EXECUTIONS OF TIME-SERIES DATA
An apparatus for parallel processing includes a memory and one or more processors, at least one of which operates a single instruction, multiple data (SIMD) model, and each of which is coupled to the memory. The processors are configured to process data samples associated with one or multiple chains or graphs of data processors, which chains or graphs describe processing steps to be executed repeatedly on data samples that are a subset of temporally ordered samples. The processors are additionally configured to dynamically schedule one or multiple sets of the samples associated with the one or multiple chains or graphs of data processors to reduce latency of processing of the data samples associated with a single chain or graph of data processors or different chains and graphs of data processors.
IMPLEMENTATION METHOD AND SYSTEM OF RISC_V VECTOR INSTRUCTION SET VSETVLI INSTRUCTION
The invention relates to the technical field of CPUs, and in particular to a method and system for implementing the vsetvli instruction of the RISC_V vector instruction set. When the CPU executes out of order, the rename module allocates vectag[n:0] information and determines whether each instruction is vsetvli. If the instruction is vsetvli, vectag is incremented by 1; if it is a non-vsetvli instruction, vectag remains unchanged. The instruction is then sent to the execution unit: the vsetvli instruction is distributed to the csr module, and the corresponding other vector instructions are distributed to the vpu module. The execution efficiency of non-vsetvli{i} vector instructions of the present invention is high. Data is selected by mask, which reduces power consumption, execution cycles, and latency, and the invention has strong market application prospects.
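The tagging and dispatch rule in this abstract can be modeled in a few lines. The sketch below is an assumption-laden software model, not the hardware design: the function name, the queue lists, the 3-bit tag width, and the wrap-around on overflow are hypothetical choices made for illustration.

```python
def tag_and_dispatch(mnemonics, tag_bits=3):
    """Model the rename-stage vectag rule for a stream of vector mnemonics.

    A vsetvli bumps the tag and goes to the (modeled) csr queue; any other
    vector instruction keeps the current tag and goes to the vpu queue.
    The tag width and wrap-around behavior are assumptions.
    """
    mask = (1 << tag_bits) - 1
    vectag = 0
    csr_queue, vpu_queue = [], []
    for op in mnemonics:
        if op == "vsetvli":
            vectag = (vectag + 1) & mask  # vectag + 1 on every vsetvli
            csr_queue.append((op, vectag))
        else:
            vpu_queue.append((op, vectag))  # tag unchanged
    return csr_queue, vpu_queue
```

In this model, each vector instruction carries the tag of the most recent vsetvli, so a vpu-side consumer could match an instruction to the configuration it depends on.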
System and method of VLIW instruction processing using reduced-width VLIW processor
Very long instruction word (VLIW) instruction processing using a reduced-width processor is disclosed. In a particular embodiment, a VLIW processor includes a control circuit configured to receive a VLIW packet that includes a first number of instructions and to distribute the instructions to a second number of instruction execution paths. The first number is greater than the second number. The VLIW processor also includes physical registers configured to store results of executing the instructions and a register renaming circuit that is coupled to the control circuit.
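One simple way to picture a packet that is wider than the machine is to issue it in several groups. The sketch below assumes in-order grouping over consecutive cycles; the abstract does not state the distribution policy, and the function name and list representation are hypothetical.

```python
def distribute(packet, num_paths):
    """Split a VLIW packet into issue groups of at most num_paths instructions.

    Assumes the packet holds more instructions than there are execution
    paths, and that instructions are issued in packet order. Both the
    ordering and the per-cycle grouping are illustrative assumptions.
    """
    if num_paths <= 0:
        raise ValueError("num_paths must be positive")
    return [packet[i:i + num_paths] for i in range(0, len(packet), num_paths)]
```

Under these assumptions a four-instruction packet on a two-path processor occupies two issue groups. The results of the first group would need to be held until the whole packet completes, which is plausibly where the physical registers and register renaming circuit mentioned in the abstract come in.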
Noisy instructions for side-channel attack mitigation
Described herein are systems and methods using noisy instructions for side-channel attack mitigation. For example, some methods include fetching an instruction from a memory into a processor pipeline of a processor core that is configured to execute instructions using an architectural state of the processor core; generating a random number; fissioning the instruction into a set of micro-operations that includes one or more micro-operations that perform the instruction and the random number of noisy micro-operations, wherein each of the noisy micro-operations does not affect the architectural state; executing the set of micro-operations using one or more execution units of the processor pipeline; and, retiring, responsive to completion of execution of the set of micro-operations, the instruction.
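The fission step can be illustrated with a toy software model. Everything named here is hypothetical: the tuple encoding of micro-operations, the `max_noise` bound, and the dictionary standing in for architectural state. The real mechanism operates in a processor pipeline, not in software.

```python
import random

def fission(instruction, rng, max_noise=3):
    """Split one instruction into a real micro-op plus a random number of noisy ones."""
    noise_count = rng.randint(0, max_noise)  # the generated random number
    return [("real", instruction)] + [("noise", None)] * noise_count

def execute(uops, state):
    """Execute micro-ops; only the real one may change the architectural state."""
    scratch = 0
    for kind, payload in uops:
        if kind == "real":
            dest, value = payload
            state[dest] = value
        else:
            # Noisy micro-op: does work on scratch data only, so the
            # architectural state is unaffected.
            scratch ^= 0x5A
    return state
```

Calling `execute(fission(("r1", 5), random.Random()), {})` leaves the state as `{"r1": 5}` regardless of how many noisy micro-operations were inserted; only the amount of work performed varies from run to run.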
Storing multiple instructions in a single reordering buffer entry
Embodiments of the present disclosure provide an instruction processing apparatus, comprising an instruction decoding circuitry configured to decode a set of instructions; a buffer comprising one or more buffer entries associated with the set of instructions, wherein the one or more buffer entries are configured to store information corresponding to at least one instruction of the set of instructions decoded by the instruction decoding circuitry; and an instruction executing circuitry configured to execute the at least one instruction, wherein a buffer entry storing the information corresponding to the at least one instruction is updated to indicate that the at least one instruction has been executed to enable retiring the set of instructions after the set of instructions have been executed.
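A buffer entry that tracks several instructions at once might look like the sketch below. The class name, the per-instruction completion flags, and the method names are assumptions for illustration; the abstract only says that the entry is updated as instructions execute so that the set can be retired together.

```python
from dataclasses import dataclass, field

@dataclass
class RobEntry:
    """One reorder-buffer entry holding a set of decoded instructions (hypothetical layout)."""
    instructions: list
    done: list = field(default_factory=list)

    def __post_init__(self):
        # One completion flag per instruction stored in this entry.
        self.done = [False] * len(self.instructions)

    def mark_executed(self, index):
        """Record that the instruction at the given slot has been executed."""
        self.done[index] = True

    def can_retire(self):
        """The whole set retires only after every instruction has executed."""
        return all(self.done)
```

An entry created with two instructions reports `can_retire()` as false until `mark_executed` has been called for both slots.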
Systems, methods, and devices for queue availability monitoring
A method may include determining, with a queue availability module, that an entry is available in a queue, asserting a bit in a register based on determining that an entry is available in the queue, determining, with a processor, that the bit is asserted, and processing, with the processor, the entry in the queue based on determining that the bit is asserted. The method may further include storing the register in a tightly coupled memory associated with the processor. The method may further include storing the queue in the tightly coupled memory. The method may further include determining, with the queue availability module, that an entry is available in a second queue, and asserting a second bit in the register based on determining that an entry is available in the second queue. The method may further include finding the first bit in the register using a find first instruction.
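The register-plus-find-first pattern described above can be sketched in software. The class and method names are hypothetical, the deques stand in for hardware queues, and the choice of the lowest set bit as the "first" bit is an assumption.

```python
from collections import deque

def find_first_set(register):
    """Return the index of the lowest set bit, or -1 if no bit is set."""
    if register == 0:
        return -1
    return (register & -register).bit_length() - 1

class QueueMonitor:
    """Models a queue availability module with one register bit per queue."""

    def __init__(self, num_queues):
        self.queues = [deque() for _ in range(num_queues)]
        self.register = 0

    def push(self, queue_index, entry):
        self.queues[queue_index].append(entry)
        self.register |= 1 << queue_index  # assert the bit: entry available

    def service_one(self):
        """Process one entry from the first queue whose bit is asserted."""
        queue_index = find_first_set(self.register)
        if queue_index < 0:
            return None
        entry = self.queues[queue_index].popleft()
        if not self.queues[queue_index]:
            self.register &= ~(1 << queue_index)  # queue drained: clear the bit
        return queue_index, entry
```

The point of the design is that the processor checks a single register instead of polling every queue: one find-first operation identifies a queue with pending work.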
DYNAMIC GRAPHICAL PROCESSING UNIT REGISTER ALLOCATION
Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.
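The spill policy described above, which evicts registers belonging to the frames at the bottom of the stack first, can be sketched as follows. The function name and the `(name, register_count)` frame representation are assumptions; prefetching of fills and the hardware control unit itself are not modeled.

```python
def frames_to_spill(frames, free_regs, requested):
    """Choose stack frames to spill so that a request for registers can be met.

    frames is a list of (name, register_count) pairs ordered from the bottom
    of the stack upward. Bottom frames are spilled first because they are the
    least likely to be used soon. Returns the spilled frame names and the
    resulting free register count.
    """
    spilled = []
    for name, regs in frames:
        if free_regs >= requested:
            break
        spilled.append(name)
        free_regs += regs
    return spilled, free_regs
```

For frames of 8, 4, and 16 registers with 2 registers free, a request for 12 registers spills the two bottom frames and leaves 14 registers free in this model.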