Patent classifications
G06F15/8023
Fuseload architecture for system-on-chip reconfiguration and repurposing
Methods, systems, and devices that support fuseload architectures for system-on-chip (SoC) reconfiguration and repurposing are described. Trim data may be loaded from fuses to registers on a die based on a fuse header. For example, a set of registers coupled with a set of fuses on the die may be identified, where the set of fuses may store trim data to be copied to the registers as part of a fuseload procedure. In such cases, one or more fuse headers may be identified within the trim data, and each fuse header may correspond to a fuse group that includes a subset of fuses. Based on one or more subfields within a fuse header, a mapping between fuse addresses and register addresses may be determined, and the trim data from each fuse group may be copied into a set of registers based on the mapping.
Defect repair for a reconfigurable data processor for homogeneous subarrays
A device architecture includes a spatially reconfigurable array of processors, such as configurable units of a CGRA, having spare homogenous subarrays, and a parameter store on the device which stores parameters that tag one or more elements as unusable. Configuration data is distributed using a statically reconfigurable bus system, to implement the pattern of placement of configuration data, in dependence on the tagged elements. As a result, a spatially reconfigurable array having unusable elements can be repaired.
High bandwidth memory system with distributed request broadcasting masters
A system comprises a processor and a plurality of memory units. The processor is coupled to each of the plurality of memory units by a plurality of network connections. The processor includes a plurality of processing elements arranged in a two-dimensional array and a corresponding two-dimensional communication network communicatively connecting each of the plurality of processing elements to other processing elements on same axes of the two-dimensional array. Each processing element that is located along a diagonal of the two-dimensional array is configured as a request broadcasting master for a respective group of processing elements located along a same axis of the two-dimensional array.
Method for vectorizing heapsort using horizontal aggregation SIMD instructions
Techniques are provided for vectorizing Heapsort. A K-heap is used as the underlying data structure for indexing values being sorted. The K-heap is vectorized by storing values in a contiguous memory array containing a beginning-most side and end-most side. The vectorized Heapsort utilizes horizontal aggregation SIMD instructions for comparisons, shuffling, and moving data. Thus, the number of comparisons required in order to find the maximum or minimum key value within a single node of the K-heap is reduced resulting in faster retrieval operations.
Systems and methods for virtually partitioning a machine perception and dense algorithm integrated circuit
Systems and methods for virtually partitioning an integrated circuit may include identifying dimensional attributes of a target input dataset and selecting a data partitioning scheme from a plurality of distinct data partitioning schemes for the target input dataset based on the dimensional attributes of the target dataset and architectural attributes of an integrated circuit. The methods described herein may also include disintegrating the target dataset into a plurality of distinct subsets of data based on the selected data partitioning scheme and identifying a virtual processing core partitioning scheme from a plurality of distinct processing core partitioning schemes for an architecture of the integrated circuit based on the disintegration of the target input dataset. Additionally, the architecture of the integrated circuit may be virtually partitioned into a plurality of distinct partitions of processing cores and each of the plurality of distinct subsets of data may be mapped to one of the plurality of distinct partitions of processing cores.
Apparatuses and methods for map reduce
The present disclosure relates to a method and an apparatus for map reduce. In some embodiments, an exemplary processing unit includes: a 2-dimensional (2D) processing element (PE) array comprising a plurality of PEs, each PE comprising a first input and a second input, the first inputs of the PEs in a linear array in a first dimension of the PE array being connected in series and the second inputs of the PEs in a linear array in a second dimension of the PE array being connected in parallel, each PE being configured to perform an operation on data from the first input or second input; and a plurality of reduce tree units, each reduce tree unit being coupled with the PEs in a linear array in the first dimension or the second dimension of the PE array and configured to perform a first reduction operation.
SYSTEMS AND METHODS FOR VIRTUALLY PARTITIONING A MACHINE PERCEPTION AND DENSE ALGORITHM INTEGRATED CIRCUIT
Systems and methods for virtually partitioning an integrated circuit may include identifying dimensional attributes of a target input dataset and selecting a data partitioning scheme from a plurality of distinct data partitioning schemes for the target input dataset based on the dimensional attributes of the target dataset and architectural attributes of an integrated circuit. The methods described herein may also include disintegrating the target dataset into a plurality of distinct subsets of data based on the selected data partitioning scheme and identifying a virtual processing core partitioning scheme from a plurality of distinct processing core partitioning schemes for an architecture of the integrated circuit based on the disintegration of the target input dataset. Additionally, the architecture of the integrated circuit may be virtually partitioned into a plurality of distinct partitions of processing cores and each of the plurality of distinct subsets of data may be mapped to one of the plurality of distinct partitions of processing cores.
Boot image file having a global partition for data processing engines of a programmable device
Some examples described herein relate to a boot image file. In an example, a design system includes a processor and a memory, storing instruction code, coupled to the processor. The processor is configured to execute the instruction code to compile an application to generate a boot image file. The boot image file is capable of being loaded onto and executed by a programmable device that comprises data processing engines (DPEs). The boot image file has a format comprising a platform loader and manager (PLM) and partitions. The PLM comprises code capable of being executed by a controller of the programmable device to load the partitions onto the programmable device. Each of the partitions comprises a bitstream, executable code, data, or a combination thereof. The partitions collectively include a single global partition that comprises DPE partitions that are capable of being loaded onto one or more of the DPEs.
VECTOR COMPUTATIONAL UNIT
A microprocessor system comprises a computational array and a vector computational unit. The computational array includes a plurality of computation units. The vector computational unit is in communication with the computational array and includes a plurality of processing elements. The processing elements are configured to receive output data elements from the computational array and process in parallel the received output data elements.
System for cross-routed communication between functional units of multiple processing units
A data processing system comprising a plurality of processing units. Each processing unit comprises a set of plural functional units and an internal communications network that routes communications between the functional units in a particular sequence order of the functional units. Each processing unit is connected to at least one other processing unit via a communications bridge that has at least two connections, a first connection that routes communications between a first pair of network nodes of the pair of processing units, and a separate, second connection that routes communications between a second, different pair of network nodes of the pair of processing units. Each connected pair of network nodes comprises network nodes having different positions in the internal communications network sequence order of the network nodes and/or network nodes associated with functional units of different types.