Patent classifications
G06F7/498
COMPUTE IN/NEAR MEMORY (CIM) CIRCUIT ARCHITECTURE FOR UNIFIED MATRIX-MATRIX AND MATRIX-VECTOR COMPUTATIONS
A memory circuit includes a number (X) of multiply-accumulate (MAC) circuits that are dynamically configurable. The MAC circuits can either compute an output based on computations of X elements of the input vector with the weight vector, or to compute the output based on computations of a single element of the input vector with the weight vector, with each element having a one bit or multibit length. A first memory can hold the input vector having a width of X elements and a second memory can store the weight vector. The MAC circuits include a MAC array on chip with the first memory.
SWITCHED CAPACITOR VECTOR-MATRIX MULTIPLIER
Methods and apparatuses enable a general-purpose low power analog vector-matrix multiplier. A switched capacitor matrix multiplier may comprise a plurality of successive approximate registers (SAR) operating in parallel, each SAR having a SAR digital output; and a plurality of Analog Multiply-and-Accumulate (MAC) units for multiplying and accumulating and scaling bit-wise products of a digital weight matrix with a digital input vector, wherein each MAC unit is connected in series to a SAR of the plurality of SARs.
Mechanism to perform single precision floating point extended math operations
A processor to facilitate execution of a single-precision floating point operation on an operand is disclosed. The processor includes one or more execution units, each having a plurality of floating point units to execute one or more instructions to perform the single-precision floating point operation on the operand, including performing a floating point operation on an exponent component of the operand; and performing a floating point operation on a mantissa component of the operand, comprising dividing the mantissa component into a first sub-component and a second sub-component, determining a result of the floating point operation for the first sub-component and determining a result of the floating point operation for the second sub-component, and returning a result of the floating point operation.
CONNECTIVITY IN COARSE GRAINED RECONFIGURABLE ARCHITECTURE
A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. The tiles can be arranged in an array or grid and can be communicatively coupled. In an example, the tiles can be arranged in a one-dimensional array and each tile can be coupled to its respective adjacent neighbor tiles using a direct bus coupling. Each tile can be further coupled to at least one non-adjacent neighbor tile that is one tile, or device space, away using a passthrough bus. The passthrough bus can extend through intervening tiles.
BIGNUM ADDITION AND/OR SUBTRACTION WITH CARRY PROPAGATION
A processing unit includes a plurality of adders and a plurality of carry bit generation circuits. The plurality of adders add first and second X bit binary portion values of a first Y bit binary value and a second Y bit binary value. Y is a multiple of X. The plurality of adders further generate first carry bits. The plurality of carry bit generation circuits is coupled to the plurality of adders, respectively, and receive the first carry bits. The plurality of carry bit generation circuits generate second carry bits based on the first carry bits. The plurality of adders use the second carry bits to add the first and second X bit binary portions of the first and second Y bit binary values, respectively.
Data computing system
The present disclosure provides a data computing system. The data computing system comprises: a memory, a processor and an accelerator, wherein the memory is communicatively coupled to the processor and configured to store data to be computed and a computed result, the data being written by the processor; the processor is communicatively coupled to the accelerator and configured to control the accelerator; and the accelerator is communicatively coupled to the memory and configured to access the memory according to pre-configured control information, implement a computing process to produce the computed result and write the computed result back to the memory. The present disclosure also provides an accelerator and a method performed by an accelerator of a data computing system. The present disclosure can improve the execution efficiency of the processor and reduce the computing overhead of the processor.
Configurable processor with in-package look-up table
A configurable processor comprises a memory die and a logic die. The memory die comprises a programmable memory array for storing a look-up table (LUT) for a mathematical function, while the logic die comprises an arithmetic logic circuit (ALC) for performing at least an arithmetic operation on selected data from the LUT, wherein said mathematical function includes more operation than the arithmetic operations performable by the ALC. Complex mathematical functions can be implemented and configured.
Fractional pointer lookup table
Systems, apparatuses, and methods for implementing a fractional pointer lookup table are disclosed. A system includes a fractional pointer lookup table and control logic coupled to the table. The control logic performs an access to the table with a numerator and a denominator, wherein the numerator and the denominator are integers. The control logic receives a result of the lookup, wherein the result is either a rounded-up value of a quotient of the numerator and denominator or a rounded-down value of the quotient. In one embodiment, the control logic provides a fractional pointer to the table with each access and receives a fractional pointer limit from the table. The control logic initializes the fractional pointer to zero, increments the fractional pointer after each access to the table, and resets the fractional pointer to zero when the fractional pointer reaches the fractional pointer limit.
APPROXIMATE QUANTILE ESTIMATION
Apparatuses, corresponding methods, and instructions for data processing for the generation of an approximate quantile are provided. A sequence of data values is received, wherein the data values span a range of values and in a plurality of counters each counter is configured to have a correspondence to a subrange within the range of values. For each data value received a corresponding counter of the plurality of counters is determined and an update is applied to the corresponding counter of the plurality of counters. Approximate quantile determination is performed to generate at least one approximate quantile value for the sequence of data values received in dependence on respective counts of the plurality of counters.
APPROXIMATE QUANTILE ESTIMATION
Apparatuses, corresponding methods, and instructions for data processing for the generation of an approximate quantile are provided. A sequence of data values is received, wherein the data values span a range of values and in a plurality of counters each counter is configured to have a correspondence to a subrange within the range of values. For each data value received a corresponding counter of the plurality of counters is determined and an update is applied to the corresponding counter of the plurality of counters. Approximate quantile determination is performed to generate at least one approximate quantile value for the sequence of data values received in dependence on respective counts of the plurality of counters.