G06F15/7839

ASIC POWER CONTROL

A logic power network provided in an application-specific integrated circuit (ASIC). The ASIC includes a central processor. The ASIC also includes at least one intellectual property (IP) core operatively connected with the central processor and having a set of electrical components provided therein. The ASIC also includes a network-on-chip (NOC) operatively connected with the central processor and the at least one IP core. The ASIC also includes a logic power network operatively connected with the central processor, the at least one IP core and the set of electrical components therein, and the NOC. The logic power network is adapted to control power of the at least one IP core and the set of electrical components provided in the at least one IP Core individually and separately.

POST-MANUFACTURE LATCH TIMING CONTROL BLOCKS IN PIPELINED PROCESSORS

An apparatus includes a series of pipeline stages that have logic components connected to supply output data to latch components, timing correction blocks connected to the latch components, and a memory component connected to supply a correction pattern to the timing correction blocks. The timing correction blocks have a buffer connected to a multiplexor. The correction pattern controls whether the multiplexor receives an adjusted clock signal through the buffer to control whether the timing correction blocks supply an unadjusted clock signal or the adjusted clock signal to the latch components.

Tensor partitioning and partition access order

A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.

SYSTEMS AND METHODS FOR IMPROVING CACHE EFFICIENCY AND UTILIZATION

Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.

Lossless Tiling In Convolution Networks - Backward Pass

Disclosed is a data processing system to receive a processing graph of an application. A compile time logic is configured to modify the processing graph and generate a modified processing graph. The modified processing graph is configured to apply a post-padding tiling after applying a cumulative input padding that confines padding to an input. The cumulative input padding pads the input into a padded input. The post-padding tiling tiles the padded input into a set of pre-padded input tiles with a same tile size, tiles intermediate representation of the input into a set of intermediate tiles with a same tile size, and tiles output representation of the input into a set of non-overlapping output tiles with a same tile size. Runtime logic is configured with the compile time logic to execute the modified processing graph to execute the application.

TECHNIQUES FOR CONFIGURING PARALLEL PROCESSORS FOR DIFFERENT APPLICATION DOMAINS

In various embodiments, a parallel processor includes a parallel processor module implemented within a first die and a memory system module implemented within a second die. The memory system module is coupled to the parallel processor module via an on-package link. The parallel processor module includes multiple processor cores and multiple cache memories. The memory system module includes a memory controller for accessing a DRAM. Advantageously, the performance of the parallel processor module can be effectively tailored for memory bandwidth demands that typify one or more application domains via the memory system module.

RECONFIGURABLE REDUCED INSTRUCTION SET COMPUTER PROCESSOR ARCHITECTURE WITH FRACTURED CORES

Systems and methods for reconfiguring a reduced instruction set computer processor architecture are disclosed. Exemplary implementations may: provide a primary processing core consisting of a RISC processor; provide a node wrapper associated with each of the plurality of secondary cores, the node wrapper comprising access memory associates with each secondary core, and a load/unload matrix associated with each secondary core; operate the architecture in a manner in which, for at least one core, data is read from and written to the at least cache memory in a control-centric mode; the secondary cores are selectively partitioned to operate in a streaming mode wherein data streams out of the corresponding secondary core into the main memory and other ones of the plurality of secondary cores.

CACHE STRUCTURE AND UTILIZATION

Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data is to be prefetched extending to the current overfetch boundary.

SYSTEM AND METHODS TO PROVIDE HIERARCHICAL OPEN SECTORING AND VARIABLE SECTOR SIZE FOR CACHE OPERATIONS

Graphics processors of the present design provide hierarchical open sectors and variable cache sizes for cache operations. In one embodiment, a graphics processor comprises a cache memory having a hierarchical open sector design including a first hierarchy of upper and lower regions with each region including a second hierarchy of sectors. A cache controller is configured to initially open a first sector of the lower region, to receive a memory request that does not match an address in the first sector, and to open a second sector of the lower region.

SYSTEMS AND METHODS FOR IMPROVING CACHE EFFICIENCY AND UTILIZATION

Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.