G06F9/3802

Unified register file for supporting speculative architectural states
11467839 · 2022-10-11

A method for supporting architecture speculation in an out-of-order processor is disclosed. The method comprises fetching two threads into the processor, wherein a first thread executes in a speculative state and a second thread executes in a non-speculative state. The method also comprises enabling a speculative scope for an execution of the first thread and a non-speculative scope for an execution of the second thread in an architecture of the processor, wherein the speculative scope and the non-speculative scope can both be fetched into the architecture and be present concurrently.

Livelock recovery circuit for detecting illegal repetition of an instruction and transitioning to a known state

Livelock recovery circuits are configured to detect livelock in a processor and to cause the processor to transition to a known safe state when livelock is detected. The livelock recovery circuits include detection logic configured to detect that the processor is in livelock when the processor has illegally repeated an instruction, and transition logic configured to cause the processor to transition to a safe state when livelock has been detected by the detection logic.
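The detection and transition logic described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not say how "illegal repetition" is decided, so the repeat threshold `max_repeats`, the `safe_pc` address, and the class name `LivelockMonitor` are all hypothetical.

```python
class LivelockMonitor:
    """Toy model: flags livelock when one instruction address repeats too often."""

    def __init__(self, max_repeats=3, safe_pc=0x0):
        self.max_repeats = max_repeats  # hypothetical threshold for "illegal" repetition
        self.safe_pc = safe_pc          # hypothetical address of the known safe state
        self.last_pc = None
        self.repeats = 0

    def observe(self, pc):
        """Record one executed instruction address and return the next PC to use."""
        # Detection logic: count consecutive repeats of the same address.
        if pc == self.last_pc:
            self.repeats += 1
        else:
            self.last_pc = pc
            self.repeats = 0
        # Transition logic: once the threshold is reached, redirect to the safe state.
        if self.repeats >= self.max_repeats:
            self.last_pc = None
            self.repeats = 0
            return self.safe_pc
        return pc
```

A real circuit would distinguish legal repetition (for example, an intended loop) from illegal repetition; the fixed counter here stands in for that decision.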

SMALL FILE RESTORE PERFORMANCE IN A DEDUPLICATION FILE SYSTEM
20230114100 · 2023-04-13

Embodiments of a small file restore process in a deduplication file system, wherein restoration requires issuing a read request within an I/O request to the file system. The process places the small files in a prefetch queue such that a combined size of the small files meets or exceeds a size of the prefetch queue as defined by a prefetch horizon. A queue processor issues a read request for the first file in the queue, scans the prefetch queue to find a read request for a file at the prefetch horizon, and prefetches the file at the prefetch horizon. The prefetch queue essentially constitutes a hint from the client that a read I/O is imminent, for purposes of filling the read-ahead cache and avoiding the need to issue a blocking I/O operation.
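One way to picture the queue scan is the sketch below. It assumes the prefetch horizon is a byte distance measured over the files queued after the one being read; the function name `plan_restore` and the `(name, size)` tuples are illustrative and not taken from the application.

```python
def plan_restore(files, horizon):
    """Pair each read with the queued file that sits at the prefetch horizon.

    files:   list of (name, size_bytes) in queue order
    horizon: assumed prefetch horizon, in bytes ahead of the current read
    Returns a list of (file_to_read, file_to_prefetch_or_None).
    """
    plan = []
    for i, (name, _size) in enumerate(files):
        ahead = 0
        target = None
        # Scan the rest of the queue until the cumulative size reaches the horizon.
        for later_name, later_size in files[i + 1:]:
            ahead += later_size
            if ahead >= horizon:
                target = later_name
                break
        plan.append((name, target))
    return plan
```

With four 4-byte files and an 8-byte horizon, reading the first file triggers a prefetch of the third, so the read-ahead cache is already warm when the read of the third file arrives.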

Noisy instructions for side-channel attack mitigation
11604873 · 2023-03-14

Described herein are systems and methods using noisy instructions for side-channel attack mitigation. For example, some methods include fetching an instruction from a memory into a processor pipeline of a processor core that is configured to execute instructions using an architectural state of the processor core; generating a random number; fissioning the instruction into a set of micro-operations that includes one or more micro-operations that perform the instruction and the random number of noisy micro-operations, wherein each of the noisy micro-operations does not affect the architectural state; executing the set of micro-operations using one or more execution units of the processor pipeline; and, retiring, responsive to completion of execution of the set of micro-operations, the instruction.
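The fission step can be sketched in a few lines. This is a toy model, not the patented pipeline: an "instruction" is assumed to be a `(register, value)` write, and the `max_noise` bound and the function names are hypothetical.

```python
import random


def fission(instruction, rng, max_noise=4):
    """Split one instruction into its real micro-op plus a random number of noisy ones."""
    noise_count = rng.randint(0, max_noise)  # the random number drawn per instruction
    return [("real", instruction)] + [("noisy", None)] * noise_count


def execute(uops, state):
    """Run micro-ops against the architectural state (a dict of register values)."""
    for kind, payload in uops:
        if kind == "real":
            reg, value = payload
            state[reg] = value
        # Noisy micro-ops occupy an execution slot but never touch the state.
    return state
```

Because the number of micro-ops varies from run to run while the architectural result stays the same, the timing and power profile of the instruction becomes harder to correlate with its operands.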

Methods and apparatus for thread-based scheduling in multicore neural networks
11625592 · 2023-04-11

Systems, apparatus, and methods for thread-based scheduling within a multicore processor. Neural networking uses a network of connected nodes (aka neurons) to loosely model the neuro-biological functionality found in the human brain. Various embodiments of the present disclosure use thread dependency graph analysis to decouple scheduling across many distributed cores. Rather than using thread dependency graphs to generate a sequential ordering for a centralized scheduler, the individual thread dependencies define a count value for each thread at compile-time. Threads and their thread dependency counts are distributed to each core at run-time. Thereafter, each core can dynamically determine which threads to execute based on fulfilled thread dependencies without requiring a centralized scheduler.
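The count-based scheme can be sketched as follows. The `Core` class, the `dependents` map, and the `simulate` driver are hypothetical names; the driver only steps the cores in turn for demonstration and is not a scheduler in the patent's sense, since each core decides locally which thread is ready.

```python
class Core:
    """Holds the threads assigned to one core and their outstanding dependency counts."""

    def __init__(self, counts):
        self.counts = dict(counts)  # thread -> number of unfulfilled dependencies

    def notify(self, finished, dependents):
        """A thread finished somewhere: decrement the counts of local dependents."""
        for child in dependents.get(finished, []):
            if child in self.counts:
                self.counts[child] -= 1

    def run_one(self):
        """Run one local thread whose count has reached zero, if any."""
        ready = sorted(t for t, c in self.counts.items() if c == 0)
        if not ready:
            return None
        thread = ready[0]
        del self.counts[thread]
        return thread


def simulate(cores, dependents):
    """Demo driver: step each core in turn and broadcast completions."""
    order = []
    progressed = True
    while progressed:
        progressed = False
        for core in cores:
            thread = core.run_one()
            if thread is not None:
                order.append(thread)
                for other in cores:
                    other.notify(thread, dependents)
                progressed = True
    return order
```

For example, if thread B depends on A, and C depends on both, then A starts with count 0, B with 1, and C with 2. No core needs a global ordering; it only watches its own counts.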

Apparatus and method for managing prefetch transactions

An apparatus and method are provided for managing prefetch transactions. The apparatus has an interconnect for providing communication paths between elements coupled to the interconnect. The elements coupled to the interconnect comprise at least a requester element to initiate transactions, and a plurality of completer elements each of which is arranged to respond to a transaction received by that completer element. Congestion tracking circuitry maintains, in association with the requester element, a congestion indication for each of a plurality of routes through the interconnect used to propagate transactions initiated by that requester element. Each route comprises one or more communication paths, and the route employed to propagate a given transaction is dependent on a target completer element for that transaction. Prefetch throttling circuitry then identifies, in response to an indication of a given prefetch transaction that the requester element wishes to initiate, the target completer element amongst the plurality of completer elements to which that given prefetch transaction would be issued. It then determines whether to issue the given prefetch transaction in dependence on the congestion indication for the route that has been determined.
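The throttling decision can be modeled roughly as below. The abstract does not specify how congestion is measured or compared, so the integer congestion level, the `threshold`, and the names `PrefetchThrottle`, `report`, and `should_issue` are assumptions made for illustration.

```python
class PrefetchThrottle:
    """Toy model of per-route congestion tracking for one requester element."""

    def __init__(self, route_of_completer, threshold):
        # Maps each completer element to the route its transactions take.
        self.route_of_completer = route_of_completer
        self.threshold = threshold  # hypothetical congestion limit
        self.congestion = {route: 0 for route in route_of_completer.values()}

    def report(self, route, level):
        """Congestion tracking: record the latest indication for a route."""
        self.congestion[route] = level

    def should_issue(self, completer):
        """Prefetch throttling: issue only if the route to the target is not congested."""
        route = self.route_of_completer[completer]
        return self.congestion[route] < self.threshold
```

The point of tracking congestion per route rather than globally is that a prefetch to a lightly loaded completer can still proceed while prefetches over a congested route are held back.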

Methods, systems and apparatus to reduce memory latency when fetching pixel kernels
11620726 · 2023-04-04

Methods, systems, apparatus, and articles of manufacture to reduce memory latency when fetching pixel kernels are disclosed. An example apparatus includes first interface circuitry to receive a first request from a hardware accelerator at a first time including first coordinates of a first pixel disposed in a first image block, second interface circuitry to receive a second request including second coordinates from the hardware accelerator at a second time after the first time, and kernel retriever circuitry to, in response to the second request, determine whether the first image block is in cache storage based on a mapping of the second coordinates to a block tag, and, in response to determining that the first image block is in the cache storage, access, in parallel, two or more memory devices associated with the cache storage to transfer a plurality of image blocks including the first image block to the hardware accelerator.
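The mapping from pixel coordinates to a block tag might look like the sketch below. The 8×8 block size, the tuple-shaped tag, and the function names are assumptions; the abstract does not define the actual tag format.

```python
def block_tag(x, y, block_w=8, block_h=8):
    """Map pixel coordinates to the tag of the image block that contains them."""
    return (x // block_w, y // block_h)


def is_cached(cached_tags, x, y):
    """Check whether the block holding pixel (x, y) is already in cache storage."""
    return block_tag(x, y) in cached_tags
```

Under these assumptions, pixels (3, 5) and (7, 7) share tag (0, 0), so a second request for a nearby pixel hits the block fetched for the first.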

Systems and methods for improving cache efficiency and utilization

Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.

METHODS AND APPARATUS FOR THREAD-BASED SCHEDULING IN MULTICORE NEURAL NETWORKS
20230153596 · 2023-05-18

Systems, apparatus, and methods for thread-based scheduling within a multicore processor. Neural networking uses a network of connected nodes (aka neurons) to loosely model the neuro-biological functionality found in the human brain. Various embodiments of the present disclosure use thread dependency graph analysis to decouple scheduling across many distributed cores. Rather than using thread dependency graphs to generate a sequential ordering for a centralized scheduler, the individual thread dependencies define a count value for each thread at compile-time. Threads and their thread dependency counts are distributed to each core at run-time. Thereafter, each core can dynamically determine which threads to execute based on fulfilled thread dependencies without requiring a centralized scheduler.

METHODS FOR DYNAMIC INSTRUCTION SIMPLIFICATION BASED ON REGISTER VALUE LOCALITY

Methods and devices are provided for dynamically simplifying processor instructions. A method includes receiving, at a computing device, processor instructions and determining, by the computing device, whether instruction simplification is enabled for an instruction being processed. The method further includes determining, by the computing device, from an instruction simplification table whether the instruction is capable of being simplified, and scheduling, by the computing device, a simplified instruction based on the determination from the instruction simplification table. A device includes a processor and a non-transient computer readable memory having stored thereon instructions which, when executed by the processor, configure the device to execute the methods disclosed herein.
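A table-driven simplification pass could be sketched as follows. The table contents, the four-field instruction tuples, and the `known_values` map of registers whose values are currently known are all hypothetical; the abstract states only that a simplification table is consulted.

```python
# Hypothetical table: (opcode, known value of the second source) -> simpler form.
SIMPLIFICATION_TABLE = {
    ("mul", 1): "mov",   # x * 1 == x
    ("add", 0): "mov",   # x + 0 == x
    ("mul", 0): "zero",  # x * 0 == 0
}


def schedule(instr, known_values, enabled=True):
    """Return the instruction to schedule, simplified when the table allows it."""
    if not enabled:
        return instr
    op, dst, src1, src2 = instr
    # Look up the opcode together with the known value of the second source, if any.
    rule = SIMPLIFICATION_TABLE.get((op, known_values.get(src2)))
    if rule == "mov":
        return ("mov", dst, src1)
    if rule == "zero":
        return ("movi", dst, 0)
    return instr
```

For instance, `("mul", "r1", "r2", "r3")` becomes `("mov", "r1", "r2")` when `r3` is known to hold 1, replacing a multi-cycle multiply with a register move.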