G06F9/3867

Analytics, Algorithm Architecture, and Data Processing System and Method
20250231718 · 2025-07-17 ·

A system and method employing a distributed hardware architecture, either independently or in cooperation with an attendant data structure, in connection with various data processing strategies and data analytics implementations are disclosed. A compute node may be implemented independent of a host compute system to manage and to execute data processing operations. Additionally, a unique algorithm architecture and processing system and method are also disclosed. Different types of nodes may be implemented in connection with the various data processing strategies and data analytics implementations, either independently or in cooperation with an attendant data structure.

CACHE SUPPORT FOR INDIRECT LOADS AND INDIRECT STORES IN GRAPH APPLICATIONS

Techniques are disclosed for operating on an indirect memory access instruction that accesses a memory location via at least one indirect address. A pipeline processes the instruction, and a memory operation engine generates a first access to the at least one indirect address and a second access to a target address determined by the at least one indirect address. A cache memory used with the pipeline and the memory operation engine caches pointers. In response to a cache hit when executing the indirect memory access instruction, operations dereference a pointer to obtain the at least one indirect address, leave a cache bit unset, and return data for the instruction without storing the data in the cache memory; in response to a cache miss, operations set the cache bit, obtain and store a cache line for the missed pointer, and return data without storing the data in the cache memory.
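
The hit/miss behavior described above can be illustrated with a minimal sketch, assuming a flat `memory` dict standing in for DRAM and a dict-based pointer cache; names such as `PointerCache` and `indirect_load` are illustrative, not taken from the patent.

```python
class PointerCache:
    """Caches pointers (indirect addresses), never the loaded data."""
    def __init__(self):
        self.lines = {}         # pointer address -> cached pointer value
        self.cache_bit = False  # set on a miss, left unset on a hit

    def indirect_load(self, memory, ptr_addr):
        if ptr_addr in self.lines:          # cache hit
            self.cache_bit = False          # leave the cache bit unset
            target = self.lines[ptr_addr]   # dereference the cached pointer
        else:                               # cache miss
            self.cache_bit = True           # set the cache bit
            target = memory[ptr_addr]       # first access: fetch the pointer
            self.lines[ptr_addr] = target   # store a line for the missed pointer
        return memory[target]               # second access: data is returned
                                            # but never stored in the cache

memory = {0x10: 0x80, 0x80: 42}             # memory[0x10] points at 0x80
cache = PointerCache()
print(cache.indirect_load(memory, 0x10))    # miss: prints 42, caches the pointer
print(cache.indirect_load(memory, 0x10))    # hit: prints 42, no new line stored
```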

Technology For Optimizing Memory-To-Register Operations

An apparatus comprises decoder circuitry to decode an instruction that includes an opcode to indicate a protected load operation, a source field for source memory address information, and a destination field to identify a destination register. The apparatus also comprises memory to store an allocate load-protect (LP) data structure with an entry for the identified destination register. The entry comprises an instruction pointer (IP) field and a status field. The apparatus also comprises load elision circuitry to (a) use the allocate LP data structure to determine whether the identified destination register has active status for the IP; (b) in response to determining that the identified destination register has active status for the IP, cause the instruction to be elided; and (c) in response to determining that the identified destination register does not have active status for the IP, cause the instruction to be executed. Other embodiments are described and claimed.
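
A hypothetical sketch of the load-elision decision follows, assuming an LP table keyed by destination register whose entries hold an IP and a status flag; the policy of activating an entry when the load executes is an assumption for illustration, not drawn from the claims.

```python
lp_table = {}  # destination register -> {"ip": int, "status": str}

def protected_load(ip, src_addr, dest_reg, memory, regs):
    """Elide the load if dest_reg already has active status for this IP."""
    entry = lp_table.get(dest_reg)
    if entry and entry["ip"] == ip and entry["status"] == "active":
        return "elided"                  # register already holds this load
    regs[dest_reg] = memory[src_addr]    # otherwise execute the load
    lp_table[dest_reg] = {"ip": ip, "status": "active"}
    return "executed"

memory, regs = {0x100: 7}, {}
print(protected_load(0x400, 0x100, "r1", memory, regs))  # executed
print(protected_load(0x400, 0x100, "r1", memory, regs))  # elided
```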

ROUTING INSTRUCTIONS IN A MICROPROCESSOR

A computer system, processor, programming instructions, and/or method for balancing the workload of processing pipelines. The system includes an execution slice comprising at least two processing pipelines having one or more execution units for processing instructions, wherein at least a first processing pipeline and a second processing pipeline are capable of executing a first instruction type, and an instruction decode unit for decoding instructions to determine which of the first processing pipeline or the second processing pipeline is to execute the first instruction type. The processor is configured to calculate at least one of a workload group consisting of the first processing pipeline workload, the second processing pipeline workload, and combinations thereof, and to select the first processing pipeline or the second processing pipeline to execute the first instruction type based upon at least one of the workload group.
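
A minimal sketch of the routing decision, assuming each pipeline's workload is simply the number of instructions queued to it; both the workload metric and the tiebreak are illustrative assumptions.

```python
def route(instruction, pipe0, pipe1):
    """Send an instruction that both pipelines can execute to the
    less-loaded pipeline (queue depth stands in for workload)."""
    target = pipe0 if len(pipe0) <= len(pipe1) else pipe1
    target.append(instruction)
    return target

pipe0, pipe1 = [], ["mul r3, r4"]
route("add r1, r2", pipe0, pipe1)  # pipe0 is emptier, so it gets the add
print(pipe0, pipe1)                # ['add r1, r2'] ['mul r3, r4']
```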

ADAPTIVE HYBRID POLLING BASED ON OUTSTANDING INPUT/OUTPUT (I/O) DETERMINATION
20220414035 · 2022-12-29 ·

An adaptive hybrid polling technique combines an interrupt mode with a polling mode, and is based on outstanding input/output (OIO) determination to improve I/O performance and to save processor cycles. The OIO includes two types of I/O commands: (1) I/O commands submitted to storage devices for processing, and (2) I/O commands completed by the storage devices but not yet acknowledged by host software. The adaptive hybrid polling technique involves two phases to determine when to poll based on current OIO commands. In the first phase, a determination is made whether there is an adequate number of the first type of OIO commands to prepare for polling. In the second phase, a determination is made whether there is an adequate number of the second type of OIO commands to activate polling.
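
A hedged sketch of the two-phase decision, assuming counters for the two OIO types; the threshold values and the names `PREP_THRESHOLD` and `POLL_THRESHOLD` are assumptions for illustration.

```python
PREP_THRESHOLD = 4   # type-1 OIO: submitted, not yet completed
POLL_THRESHOLD = 2   # type-2 OIO: completed, not yet acknowledged

def decide_mode(submitted_oio, completed_oio):
    if submitted_oio < PREP_THRESHOLD:
        return "interrupt"   # phase 1: too little work to justify polling
    if completed_oio < POLL_THRESHOLD:
        return "prepare"     # phase 1 passed; not yet enough completions
    return "poll"            # phase 2: enough completions, activate polling

print(decide_mode(1, 0))   # interrupt
print(decide_mode(6, 1))   # prepare
print(decide_mode(6, 3))   # poll
```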

Scheduling tasks using swap flags

A method of activating scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state and checking, by an instruction controller, whether a swap flag is set in the decoded instruction. If the swap flag in the decoded instruction is set, a scheduler is triggered to de-activate the scheduled task by changing it from the active state to a non-active state.
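
A minimal sketch, assuming an instruction is modeled as a dict carrying a "swap" flag and the scheduler is a dict of task states; this flat encoding is an assumption for illustration.

```python
def step(task, instruction, scheduler):
    decoded = instruction                # decode (trivial in this model)
    if decoded.get("swap"):              # instruction controller checks the flag
        scheduler[task] = "inactive"     # scheduler de-activates the task
    return scheduler[task]

scheduler = {"taskA": "active"}
step("taskA", {"op": "add", "swap": False}, scheduler)
print(scheduler["taskA"])                # active: flag not set
step("taskA", {"op": "store", "swap": True}, scheduler)
print(scheduler["taskA"])                # inactive: task swapped out
```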

DECOUPLED ACCESS-EXECUTE PROCESSING
20220391214 · 2022-12-08 ·

An apparatus comprises first instruction execution circuitry, second instruction execution circuitry, and a decoupled access buffer. Instructions of an ordered sequence of instructions are issued to one of the first and second instruction execution circuitry for execution in dependence on whether the instruction has a first type label or a second type label. An instruction with the first type label is an access-related instruction which determines at least one characteristic of a load operation to retrieve a data value from a memory address. Instruction execution by the first instruction execution circuitry of instructions having the first type label is prioritised over instruction execution by the second instruction execution circuitry of instructions having the second type label. Data values retrieved from memory as a result of execution of the first type instructions are stored in the decoupled access buffer.
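
A rough sketch of the issue policy, assuming each instruction carries a type label, loads are the only access-type work modeled, and a FIFO stands in for the decoupled access buffer; the sort-based prioritisation is an illustrative simplification of issuing access instructions ahead of execute instructions.

```python
from collections import deque

def run(instructions, memory):
    dab = deque()   # decoupled access buffer
    results = []
    # Access-labelled instructions are prioritised over execute-labelled ones.
    for instr in sorted(instructions, key=lambda i: i["label"] != "access"):
        if instr["label"] == "access":
            dab.append(memory[instr["addr"]])            # load into the buffer
        else:
            results.append(instr["op"](dab.popleft()))   # consume from buffer
    return results

memory = {0x10: 5, 0x14: 9}
program = [
    {"label": "execute", "op": lambda v: v + 1},
    {"label": "access", "addr": 0x10},
    {"label": "access", "addr": 0x14},
    {"label": "execute", "op": lambda v: v * 2},
]
print(run(program, memory))   # [6, 18]
```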

Flushing of instructions based upon a finish ratio and/or moving a flush point in a processor

Processing data in an information handling system is disclosed, including: in response to an event that triggers a flushing operation, calculating a finish ratio, wherein the finish ratio is the ratio of the number of finished operations to the number of at least one of the group consisting of in-flight instructions, instructions pending in a processor pipeline, instructions issued to an issue queue, and instructions being processed in a processor execution unit; comparing the calculated finish ratio to a threshold; and, if the finish ratio is greater than the threshold, not performing the flushing operation. Moving the flush point is also disclosed.
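
A minimal sketch of the flush decision, assuming the ratio is taken against in-flight instructions; the threshold value is an illustrative assumption.

```python
FLUSH_THRESHOLD = 0.75

def should_flush(finished_ops, in_flight):
    finish_ratio = finished_ops / in_flight
    # If most of the in-flight work has already finished, skip the flush.
    return finish_ratio <= FLUSH_THRESHOLD

print(should_flush(9, 10))   # False: ratio 0.9 > threshold, do not flush
print(should_flush(3, 10))   # True: ratio 0.3, perform the flush
```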

Pipeline flattener with conditional triggers

A semiconductor device comprising a processor having a pipelined architecture and a pipeline flattener, and a method for operating a pipeline flattener in a semiconductor device, are provided. The processor comprises a pipeline having a plurality of pipeline stages and a plurality of pipeline registers coupled between the pipeline stages. The pipeline flattener comprises a plurality of trigger registers for storing a trigger, wherein the trigger registers are coupled between the pipeline stages.
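
A loose software model of the flattener, assuming the trigger registers simply shift a per-instruction trigger alongside the data through each stage so that both emerge aligned; the condition that sets the trigger (a negative value) is an assumed example.

```python
STAGES = 3

def run_pipeline(stream):
    data_regs = [None] * STAGES     # pipeline registers between stages
    trig_regs = [False] * STAGES    # trigger registers between stages
    for value in stream + [None] * STAGES:   # extra cycles drain the pipe
        out, out_trig = data_regs[-1], trig_regs[-1]
        # Shift both register files one stage down the pipeline in lockstep.
        data_regs = [value] + data_regs[:-1]
        trig_regs = [value is not None and value < 0] + trig_regs[:-1]
        if out is not None:
            print(f"retired {out}, trigger={out_trig}")

run_pipeline([4, -1, 7])   # the trigger fires only for the negative value
```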

Pipeline set selection based on duty cycle estimation

A computer-implemented system is described for assigning executable jobs to pipeline sets, where the jobs may be network-based computer jobs. The assigning includes generating a weight for each pipeline set of multiple pipeline sets to obtain multiple weights. Generating a weight includes obtaining duty cycle metrics for pipeline software threads in the pipeline set. The duty cycle metrics include a measure of the amount of time that a corresponding pipeline thread is executing and actively processing data. Generating the weight further includes determining the weight for the pipeline set based at least in part on the duty cycle metrics. The assigning further includes assigning a job request to a target pipeline set selected from the pipeline sets according to a weighted random algorithm, wherein the weighted random algorithm uses the weights.
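
A hedged sketch of the selection step, assuming a pipeline set's weight is the average idle fraction of its threads, so that sets with lower duty cycles receive proportionally more jobs; the exact weight formula is an assumption for illustration.

```python
import random

def weight(duty_cycles):
    """duty_cycles: fraction of time each thread is actively processing."""
    idle = [1.0 - d for d in duty_cycles]
    return sum(idle) / len(idle)

def assign(job, pipeline_sets):
    """Pick a target set via a weighted random draw over the weights."""
    weights = [weight(dc) for dc in pipeline_sets.values()]
    return random.choices(list(pipeline_sets), weights=weights, k=1)[0]

sets = {"setA": [0.9, 0.8], "setB": [0.2, 0.3]}  # setB is mostly idle
print(assign("job-1", sets))                      # usually prints "setB"
```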