Patent classifications
G06F2015/763
Acceleration System and Dynamic Configuration Method Thereof
An acceleration system includes a plurality of modules. Each of the plurality of modules includes at least one central processing unit, at least one graphics processing unit, at least one field programmable gate array, or at least one application specific integrated circuit. At least one of the plurality of modules includes at least another of the plurality of modules such that the acceleration system is structural and nested.
Arbitrating throttling recommendations for a systolic array
Throttling recommendations for a systolic array may be arbitrated. Throttling recommendations may be received at an arbiter for a systolic array from different sources, such as one or more monitors implemented in an integrated circuit along with the systolic array or sources external to the integrated circuit with the systolic array. A strongest throttling recommendation may be selected. The rate at which data enters the systolic array may be modified according to the strongest throttling recommendation.
Application specific integrated circuit accelerators
An application specific integrated circuit (ASIC) chip includes: a systolic array of cells; and multiple controllable bus lines configured to convey data among the systolic array of cells, in which the systolic array of cells is arranged in multiple tiles, each tile of the multiple tiles including 1) a corresponding sub array of cells of the systolic array of cells, 2) a corresponding subset of controllable bus lines of the multiple controllable bus lines, and 3) memory coupled to the subarray of cells.
Network on layer enabled architectures
The technology relates to a system on chip (SoC). The SoC may include a network on layer including one or more routers and an application specific integrated circuit (ASIC) layer bonded to the network layer, the ASIC layer including one or more components. In some instances, the network layer and the ASIC layer each include an active surface and a second surface opposite the active surface. The active surface of the ASIC layer and the second surface of the network may each include one or more contacts, and the network layer may be bonded to the ASIC layer via bonds formed between the one or more contacts on the second surface of the network layer and the one or more contacts on the active surface of the ASIC layer.
Tensor Partitioning and Partition Access Order
A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.
Lossless tiling in convolution networks—read-modify-write in backward pass
Disclosed is a data processing system which includes compile time logic configured to section a graph into a sequence of subgraphs, the sequence of subgraphs including at least a first subgraph. The compile time logic configures the first subgraph to generate a plurality of output tiles of an output tensor. A runtime logic configured with the compile time logic is to execute the sequence of subgraphs to generate, at the output of the first subgraph, the plurality of output tiles of the output tensor, and write the plurality of output tiles in a memory in an overlapping configuration. In an example, an overlapping region between any two neighboring output tiles of the plurality of output tiles comprises a summation of a corresponding region of a first neighboring output tile and a corresponding region of a second neighboring output tile.
METHODS AND APPARATUS TO ESTIMATE CARDINALITY OF USERS REPRESENTED ACROSS MULTIPLE BLOOM FILTER ARRAYS
Methods and apparatus to estimate cardinality of users represented across multiple bloom filter arrays are disclosed. Examples includes processor circuitry to execute and/or instantiate instructions to generate a first composite Bloom filter array based on first and second Bloom filter arrays. The processor circuitry is to generate a final composite Bloom filter array based on the first composite Bloom filter array and a third Bloom filter array. Different ones of the first, second, and third Bloom filter arrays representative of different sets of users who accessed media. The first, second, and third Bloom filter arrays including differential privacy noise. The processor circuitry to estimate a cardinality of a union of the first, second, and third Bloom filter arrays based on the final composite Bloom filter array.
Tensor partitioning and partition access order
A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.
Multi-headed multi-buffer for buffering data for processing
An integrated circuit includes a plurality of configurable units, each configurable unit having two or more corresponding sections. The plurality of configurable units is arranged in a serial arrangement to form a chain of sections of the configurable units. A data bus is connected to the plurality of configurable units which communicates data at a clock rate. The chain of sections is to receive and write a series of tensors at the clock rate at a first end section of the chain of sections, and sequentially propagate the series of tensors through individual sections within the chain of sections at the clock rate. The chain of sections is to output the series of tensors at a second end section of the chain of sections. The chain of sections is to also output the series of tensors at an intermediate section of the chain of sections.
Methods and apparatus to estimate cardinality of users represented across multiple bloom filter arrays
Methods and apparatus to estimate cardinality of users represented across multiple bloom filter arrays are disclosed. Examples includes processor circuitry to execute and/or instantiate instructions to generate a first composite Bloom filter array based on first and second Bloom filter arrays. The processor circuitry is to generate a final composite Bloom filter array based on the first composite Bloom filter array and a third Bloom filter array. Different ones of the first, second, and third Bloom filter arrays representative of different sets of users who accessed media. The first, second, and third Bloom filter arrays including differential privacy noise. The processor circuitry to estimate a cardinality of a union of the first, second, and third Bloom filter arrays based on the final composite Bloom filter array.