G06F9/30079

On-circuit data activity monitoring for a systolic array
11442890 · 2022-09-13

On-circuit data activity monitoring may be performed for a systolic array. A current data activity measurement, reflecting changes in the input data to be processed at the systolic array, may be determined and compared with a prior data activity measurement. Based on the comparison, a throttling recommendation may be provided to a management component, which determines whether to act on the recommendation.
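A minimal sketch of the monitor-then-recommend loop, assuming that "data activity" is measured as the average number of bits that toggle between consecutive input words (a proxy for switching power) and that a 50% rise or fall relative to the prior measurement triggers a recommendation. The metric, the `rise_ratio` threshold, and all names here are hypothetical; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def toggle_count(prev: int, curr: int) -> int:
    """Number of bits that flipped between two consecutive input words."""
    return bin(prev ^ curr).count("1")


def data_activity(window: Sequence[int]) -> float:
    """Hypothetical metric: average bit toggles per input transition in a window."""
    if len(window) < 2:
        return 0.0
    toggles = sum(toggle_count(a, b) for a, b in zip(window, window[1:]))
    return toggles / (len(window) - 1)


@dataclass
class ActivityMonitor:
    rise_ratio: float = 1.5          # assumed threshold for a "significant" change
    prior: Optional[float] = None    # previous window's measurement

    def observe(self, window: Sequence[int]) -> Optional[str]:
        """Measure the current window, compare with the prior one, maybe recommend."""
        current = data_activity(window)
        recommendation = None
        # With no prior (or an idle prior) there is no basis for comparison.
        if self.prior is not None and self.prior > 0:
            if current >= self.prior * self.rise_ratio:
                recommendation = "throttle"
            elif current * self.rise_ratio <= self.prior:
                recommendation = "unthrottle"
        self.prior = current
        return recommendation


def management_component(recommendation: Optional[str], allow_throttle: bool) -> bool:
    """The manager, not the monitor, decides whether to perform the recommendation."""
    return recommendation == "throttle" and allow_throttle
```

Keeping the decision in `management_component` mirrors the split in the abstract: the on-circuit monitor only recommends, and a separate policy (for example, one that knows about power budgets) decides.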

MAC Processing Pipelines, Circuitry to Configure Same, and Methods of Operating Same

An integrated circuit comprising a plurality of MAC processors, interconnected into a linear pipeline and configurable to process input data, wherein each MAC processor includes (A) a multiplier circuit and (B) an accumulator circuit; and (C) a plurality of rotate input data paths, wherein each rotate input data path couples two sequential MAC processors of the linear pipeline, connecting an input of the multiplier circuit of the first MAC processor of the pair to an input of the multiplier circuit of the immediately following MAC processor, and wherein each rotate input data path is configurable to provide rotate input data from the first MAC processor of the pair to the immediately following MAC processor, thereby forming a serial circular path via the plurality of rotate input data paths.
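One way to see why a circular rotate path is useful is a matrix-vector product: each processor holds one element of the input vector, multiplies it by the matching weight in its own row, and then passes the element to the next processor, with the last processor feeding the first. After N rotations every processor has seen every input element. This is a behavioral sketch under that assumed use case; the per-processor weight rows and all names are hypothetical, not taken from the patent.

```python
from typing import List


class MACProcessor:
    """One MAC processor: a multiplier feeding an accumulator."""

    def __init__(self, weights: List[float]) -> None:
        self.weights = weights   # hypothetical: one weight per input-data index
        self.acc = 0.0           # accumulator circuit
        self.data = 0.0          # value currently on the multiplier's rotate input
        self.index = 0           # which input element `data` came from

    def step(self) -> None:
        self.acc += self.weights[self.index] * self.data


def rotate_pipeline_matvec(matrix: List[List[float]], x: List[float]) -> List[float]:
    n = len(x)
    procs = [MACProcessor(row) for row in matrix]
    for i, p in enumerate(procs):        # load one input element per processor
        p.data, p.index = x[i], i
    for _ in range(n):
        for p in procs:
            p.step()
        # Rotate: processor i passes its input to processor i + 1, and the last
        # processor feeds the first (moved[-1]), closing the circular path.
        moved = [(p.data, p.index) for p in procs]
        for i, p in enumerate(procs):
            p.data, p.index = moved[i - 1]
    return [p.acc for p in procs]
```

For example, `rotate_pipeline_matvec([[1, 2], [3, 4]], [5, 6])` yields `[17.0, 39.0]`, the same result as multiplying the matrix by the vector directly.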

Cost Effective Storage Management
20220269601 · 2022-08-25

Cost-effective storage management, including: identifying, by a remote storage consumer, one or more portions of one or more source objects stored at remote storage resources; issuing, by the remote storage consumer, a command to the remote storage resources configured to cause the remote storage resources to create a new object comprising the one or more portions of the one or more source objects; and updating, at the remote storage consumer, a mapping data structure to reference the new object.
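The three steps can be sketched with an in-memory stand-in for the remote store. The assumption here is that the remote side exposes a server-side "compose from byte ranges" command, so the consumer sends only metadata instead of downloading and re-uploading the data. `RemoteStorage.compose`, `StorageConsumer.compact`, and the `(object, offset, length)` mapping layout are hypothetical names, not a real cloud API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Portion = Tuple[str, int, int]  # (source object name, offset, length)


@dataclass
class RemoteStorage:
    """Stand-in for remote storage resources (hypothetical API)."""
    objects: Dict[str, bytes] = field(default_factory=dict)

    def compose(self, new_name: str, portions: List[Portion]) -> None:
        # The copy happens on the remote side; the consumer sends only the list.
        self.objects[new_name] = b"".join(
            self.objects[src][off:off + length] for src, off, length in portions
        )


@dataclass
class StorageConsumer:
    remote: RemoteStorage
    mapping: Dict[str, Portion] = field(default_factory=dict)  # logical key -> location

    def compact(self, new_name: str) -> None:
        # 1. Identify the portions of source objects that are still referenced.
        keys = sorted(self.mapping)
        portions = [self.mapping[k] for k in keys]
        # 2. Command the remote side to build a new object from those portions.
        self.remote.compose(new_name, portions)
        # 3. Update the mapping data structure to reference the new object.
        offset = 0
        for key, (_, _, length) in zip(keys, portions):
            self.mapping[key] = (new_name, offset, length)
            offset += length

    def read(self, key: str) -> bytes:
        name, off, length = self.mapping[key]
        return self.remote.objects[name][off:off + length]
```

Once the mapping points at the new object, the mostly unreferenced source objects can be deleted, which is presumably where the cost saving comes from.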

Distributed user mode processing
11461137 · 2022-10-04

A first processing unit, such as a graphics processing unit (GPU), includes pipelines that execute commands and a scheduler that schedules one or more first commands for execution by one or more of the pipelines. The one or more first commands are received from a user mode driver in a second processing unit, such as a central processing unit (CPU). The scheduler schedules one or more second commands for execution in response to completing execution of the one or more first commands, without notifying the second processing unit. In some cases, the first processing unit includes a direct memory access (DMA) engine that writes blocks of information from the first processing unit to a memory. The one or more second commands program the DMA engine to write a block of information that includes results generated by executing the one or more first commands.
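A minimal sketch of the chaining idea, assuming each submitted command can carry a list of follow-on commands that the GPU-side scheduler enqueues itself when the parent finishes, so no round trip through the CPU is needed. `Command.then`, `GPU.dma_write`, and the dictionaries standing in for GPU results and system memory are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class Command:
    name: str
    run: Callable[["GPU"], None]
    then: List["Command"] = field(default_factory=list)  # second (follow-on) commands


@dataclass
class GPU:
    results: Dict[str, object] = field(default_factory=dict)  # on-GPU results
    memory: Dict[str, object] = field(default_factory=dict)   # destination memory
    queue: Deque[Command] = field(default_factory=deque)
    trace: List[str] = field(default_factory=list)            # execution order

    def submit(self, cmd: Command) -> None:
        """Entry point used by the user mode driver on the CPU."""
        self.queue.append(cmd)

    def dma_write(self, key: str) -> None:
        """DMA engine: write a block of results from the GPU to memory."""
        self.memory[key] = self.results[key]

    def run_scheduler(self) -> None:
        while self.queue:
            cmd = self.queue.popleft()
            cmd.run(self)
            self.trace.append(cmd.name)
            # On completion, schedule the second commands locally instead of
            # notifying the CPU and waiting for it to submit them.
            self.queue.extend(cmd.then)
```

In use, the driver would submit a compute command whose `then` list holds a command that calls `dma_write`, and the results reach memory after a single submission.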

Method and apparatus for efficient programmable instructions in computer systems
11422812 · 2022-08-23

Systems, apparatuses, and methods are disclosed for implementing, as part of a processor pipeline, a reprogrammable execution unit capable of executing specialized instructions. A processor includes one or more reprogrammable execution units that can be programmed to execute different types of customized instructions. When the processor loads a program for execution, the processor loads a bitfile associated with the program. The processor programs a reprogrammable execution unit with the bitfile so that the reprogrammable execution unit is capable of executing specialized instructions associated with the program. During execution, a dispatch unit dispatches the specialized instructions to the reprogrammable execution unit for execution. The results of other instructions, such as integer and floating point instructions, are available immediately to instructions executing on the reprogrammable execution unit because the reprogrammable execution unit shares the processor registers with the integer and floating point execution units.
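The flow can be sketched behaviorally: loading a program also loads its "bitfile" (modeled here, purely as an assumption, as a table from opcode to function), the dispatch unit routes each opcode to whichever unit implements it, and both units read and write one shared register file. The instruction format, register names, and the `absdiff` opcode are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, str, str, str]               # (opcode, dest, src1, src2)
Bitfile = Dict[str, Callable[[int, int], int]]  # hypothetical: opcode -> behavior


class Processor:
    def __init__(self) -> None:
        # Shared register file used by every execution unit.
        self.regs: Dict[str, int] = {f"r{i}": 0 for i in range(8)}
        self.int_unit: Bitfile = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
        self.reprog_unit: Bitfile = {}
        self.program: List[Instr] = []

    def load_program(self, program: List[Instr], bitfile: Bitfile) -> None:
        # Loading the program also programs the reprogrammable unit.
        self.reprog_unit = dict(bitfile)
        self.program = program

    def run(self) -> None:
        for op, dst, s1, s2 in self.program:
            # Dispatch unit: route the opcode to the unit that implements it.
            unit = self.int_unit if op in self.int_unit else self.reprog_unit
            if op not in unit:
                raise ValueError(f"no execution unit implements {op!r}")
            # Shared registers: an integer result is visible to the next
            # specialized instruction with no copy between units.
            self.regs[dst] = unit[op](self.regs[s1], self.regs[s2])
```

For instance, a program `[("mul", "r3", "r1", "r2"), ("absdiff", "r4", "r3", "r1")]` loaded with the bitfile `{"absdiff": lambda a, b: abs(a - b)}` feeds the integer product straight into the custom instruction.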

COMPUTING CHIP, HASHRATE BOARD AND DATA PROCESSING APPARATUS
20220276868 · 2022-09-01

This disclosure relates to a computing chip, a hashrate board, and a data processing apparatus. The computing chip includes a plurality of operation stages arranged in a pipeline configuration. Each operation stage includes: a first combinational logic circuit occupying a plurality of first cell points adjacent to each other, at least a portion of the first cell points being located in a first incomplete column; one or more second combinational logic circuits each occupying one or more second cell points, at least a portion of the second cell points being located in a second incomplete column; and a plurality of registers each occupying a plurality of third cell points, at least a portion of the third cell points being located in the first incomplete column or the second incomplete column. The first cell points, the second cell points, and the third cell points occupy equal areas on the computing chip.

PERFORMING SPECULATIVE ADDRESS TRANSLATION IN PROCESSOR-BASED DEVICES

Performing speculative address translation in processor-based devices is disclosed herein. In one exemplary embodiment, a processor-based device provides a processing element (PE) that defines a speculative translation instruction, such as an enqueue instruction for offloading operations to a peripheral device. The speculative translation instruction references a plurality of bytes including one or more virtual memory addresses. After receiving the speculative translation instruction, an instruction decode stage of an execution pipeline circuit of the PE transmits a request for address translation of each virtual memory address to a memory management unit (MMU) of the PE. The MMU then performs speculative address translation of each virtual memory address into a corresponding translated memory address. In some embodiments, any address translation errors encountered are raised to an appropriate exception level, and may be raised synchronously or asynchronously with respect to an operation performed when the speculative translation instruction is executed.
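A behavioral sketch of the timing, assuming 4 KiB pages and a flat page table: translation is requested at decode, before the instruction executes, and a translation fault is stored rather than raised immediately. At execute time the fault is either raised synchronously or queued for later (asynchronous) reporting. The `sync_faults` switch, the `deferred` list, and all class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

PAGE = 4096  # assumed page size in bytes


class TranslationFault(Exception):
    pass


@dataclass
class MMU:
    page_table: Dict[int, int]  # virtual page number -> physical page number

    def translate(self, va: int) -> int:
        vpn, off = divmod(va, PAGE)
        if vpn not in self.page_table:
            raise TranslationFault(f"no mapping for {va:#x}")
        return self.page_table[vpn] * PAGE + off


@dataclass
class PE:
    mmu: MMU
    sync_faults: bool = True                       # raise at execute vs. report later
    deferred: List[TranslationFault] = field(default_factory=list)
    sent: List[int] = field(default_factory=list)  # translated addresses for the device

    def decode(self, vas: List[int]) -> List[Union[int, TranslationFault]]:
        """Decode stage: request translations as soon as the instruction is seen."""
        out: List[Union[int, TranslationFault]] = []
        for va in vas:
            try:
                out.append(self.mmu.translate(va))
            except TranslationFault as fault:
                out.append(fault)  # speculative: remember the fault, do not raise yet
        return out

    def execute(self, translations: List[Union[int, TranslationFault]]) -> None:
        for t in translations:
            if isinstance(t, TranslationFault):
                if self.sync_faults:
                    raise t                # synchronous with the enqueue operation
                self.deferred.append(t)    # reported asynchronously
            else:
                self.sent.append(t)
```

Starting translation at decode overlaps MMU latency with the rest of the pipeline, so the enqueue does not stall at execute waiting for page-table walks.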

Computing resource allocation with subgraph isomorphism
11461143 · 2022-10-04

A computing system is provided, including a processor configured to generate a directed weighted graph indicating a plurality of functions configured to be executed on a plurality of communicatively connected processing devices. For each of a plurality of pairs of the functions, the processor may determine a shortest path between the pair of functions. The processor may generate a second graph indicating the plurality of pairs of functions connected by the shortest paths. The processor may receive a pipeline directed acyclic graph (DAG) specifying a data pipeline of a plurality of processing stages. The processor may determine a subgraph isomorphism between the pipeline DAG and the second graph. The processor may convey, to one or more processing devices of the plurality of processing devices, instructions to execute the plurality of processing stages as specified by the subgraph isomorphism.
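The pipeline of steps can be sketched in plain Python under several stated assumptions. Shortest paths are computed with Floyd-Warshall. The "second graph" links every ordered pair of functions that has a finite shortest path and weights the link by that distance. The matching step is a backtracking search for an injective stage-to-function map in which every pipeline edge lands on an edge of the second graph, which is strictly a subgraph monomorphism (extra edges are allowed). Among valid maps it keeps the one with the lowest total path cost. The cost objective and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

INF = float("inf")
Edge = Tuple[str, str]


def all_pairs_shortest(nodes: List[str], edges: Dict[Edge, float]) -> Dict[Edge, float]:
    """Floyd-Warshall over the directed weighted graph of functions."""
    dist = {(u, v): 0.0 if u == v else edges.get((u, v), INF) for u in nodes for v in nodes}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist


def second_graph(dist: Dict[Edge, float]) -> Dict[Edge, float]:
    """Link each ordered pair of distinct functions that has a finite shortest path."""
    return {(u, v): d for (u, v), d in dist.items() if u != v and d < INF}


def match_pipeline(stages: List[str], deps: List[Edge], funcs: List[str],
                   graph: Dict[Edge, float]) -> Optional[Dict[str, str]]:
    best: Optional[Dict[str, str]] = None
    best_cost = INF

    def extend(i: int, assign: Dict[str, str], cost: float) -> None:
        nonlocal best, best_cost
        if cost >= best_cost:
            return  # prune: cannot beat the best complete mapping
        if i == len(stages):
            best, best_cost = dict(assign), cost
            return
        stage = stages[i]
        for f in funcs:
            if f in assign.values():
                continue  # injective: each function hosts at most one stage
            assign[stage] = f
            extra, ok = 0.0, True
            for a, b in deps:
                # Check each pipeline edge once, when its second endpoint is placed.
                if stage in (a, b) and a in assign and b in assign:
                    edge = (assign[a], assign[b])
                    if edge not in graph:
                        ok = False
                        break
                    extra += graph[edge]
            if ok:
                extend(i + 1, assign, cost + extra)
            del assign[stage]

    extend(0, {}, 0.0)
    return best
```

Backtracking is exponential in the worst case; it is used here only because it is short and exact for small pipelines. A production system would likely use a dedicated matcher such as VF2.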

CACHE STRUCTURE AND UTILIZATION

Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory. The apparatus provides for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and, upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data. The overfetching is based at least in part on a current overfetch boundary, such that the prefetched data extends to the current overfetch boundary.
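The miss path can be sketched as follows, assuming 64-byte lines and an overfetch boundary expressed as an aligned byte multiple. On a miss the cache fetches the requested line and then every following line up to the next multiple of the current boundary. How the boundary is adjusted dynamically is not described in the abstract, so the sketch simply exposes it as a mutable attribute. There is no eviction, and all names are hypothetical.

```python
from typing import List, Set

LINE = 64  # bytes per cache line (assumption)


class OverfetchCache:
    def __init__(self, boundary: int = 256) -> None:
        if boundary % LINE:
            raise ValueError("boundary must be a whole number of lines")
        self.boundary = boundary        # current overfetch boundary, in bytes
        self.lines: Set[int] = set()    # base addresses of resident lines
        self.fetched: List[int] = []    # line addresses requested from memory

    def read(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fetch the line and overfetch."""
        line = addr - addr % LINE
        if line in self.lines:
            return True
        # Fetch the requested line plus every following line up to the next
        # multiple of the current overfetch boundary.
        end = line - line % self.boundary + self.boundary
        for a in range(line, end, LINE):
            if a not in self.lines:
                self.lines.add(a)
                self.fetched.append(a)
        return False
```

With a 256-byte boundary, a miss at address `0x104` brings in the lines at `0x100`, `0x140`, `0x180`, and `0x1C0`, so a later read of `0x1C0` hits. Lowering `boundary` to 128 makes subsequent misses fetch at most two lines.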

SYSTEM AND METHODS TO PROVIDE HIERARCHICAL OPEN SECTORING AND VARIABLE SECTOR SIZE FOR CACHE OPERATIONS

Graphics processors of the present design provide hierarchical open sectors and variable cache sizes for cache operations. In one embodiment, a graphics processor comprises a cache memory having a hierarchical open sector design including a first hierarchy of upper and lower regions with each region including a second hierarchy of sectors. A cache controller is configured to initially open a first sector of the lower region, to receive a memory request that does not match an address in the first sector, and to open a second sector of the lower region.
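A behavioral sketch of the two-level hierarchy. Assume that each region (lower, upper) holds a fixed number of sectors, that a sector covers one aligned address range whose size is chosen per cache instance (one reading of "variable sector size"), and that sectors are opened in the lower region first. A request that matches no open sector opens a new one. The policy for when every sector is already open is not given in the abstract, so closing the oldest lower sector is a hypothetical choice, as are all names.

```python
from typing import Dict, List, Tuple


class Sector:
    def __init__(self, base: int, size: int) -> None:
        self.base, self.size = base, size

    def covers(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


class HierarchicalSectorCache:
    def __init__(self, sectors_per_region: int = 2, sector_size: int = 1024) -> None:
        self.per_region = sectors_per_region
        self.sector_size = sector_size  # chosen per cache instance (assumption)
        # First hierarchy: lower and upper regions. Second: sectors in each.
        self.regions: Dict[str, List[Sector]] = {"lower": [], "upper": []}

    def access(self, addr: int) -> Tuple[str, int]:
        """Return (region, sector index) serving `addr`, opening a sector if needed."""
        for name, sectors in self.regions.items():
            for i, sector in enumerate(sectors):
                if sector.covers(addr):
                    return name, i  # matches an already-open sector
        return self._open(addr)

    def _open(self, addr: int) -> Tuple[str, int]:
        new = Sector(addr - addr % self.sector_size, self.sector_size)
        # Fill the lower region first, then the upper region.
        for name in ("lower", "upper"):
            if len(self.regions[name]) < self.per_region:
                self.regions[name].append(new)
                return name, len(self.regions[name]) - 1
        # Every sector is open: close the oldest lower sector (hypothetical policy).
        self.regions["lower"].pop(0)
        self.regions["lower"].append(new)
        return "lower", self.per_region - 1
```

In the abstract's scenario, the first access opens sector 0 of the lower region, and a later request outside that sector's range opens sector 1 of the lower region.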