Patent classifications
G06F7/49947
SYSTOLIC ARRAY CELLS WITH OUTPUT POST-PROCESSING
This specification relates to systolic arrays of hardware processing units. In one aspect, a matrix multiplication unit includes multiple cells arranged in a systolic array. Each cell includes multiplication circuitry configured to determine a product of elements of input matrices. Each cell includes an accumulator configured to determine an accumulated value by accumulating a sum of the products output by the multiplication circuitry. Each cell also includes a post-processing component configured to determine a post-processed value by performing one or more post-processing operations on the accumulated value.
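The per-cell behavior described above (multiply, accumulate, then post-process) can be sketched in software. This is a minimal illustration, not the claimed hardware: the names `Cell`, `relu`, and `matmul_with_cells` are hypothetical, ReLU is only one assumed choice of post-processing operation, and the skewed data movement between neighboring cells of a real systolic array is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List


def relu(x: float) -> float:
    """One possible post-processing operation (an assumption, not from the abstract)."""
    return x if x > 0.0 else 0.0


@dataclass
class Cell:
    """Hypothetical model of one cell: multiplier, accumulator, post-processor."""
    post_process: Callable[[float], float] = relu
    acc: float = 0.0

    def step(self, a: float, b: float) -> None:
        # Multiplication circuitry feeds the accumulator.
        self.acc += a * b

    def output(self) -> float:
        # Post-processing is applied to the accumulated value inside the cell.
        return self.post_process(self.acc)


def matmul_with_cells(A: List[List[float]], B: List[List[float]],
                      post_process: Callable[[float], float] = relu) -> List[List[float]]:
    """Compute post_process(A @ B) with one cell per output element."""
    n, k, m = len(A), len(B), len(B[0])
    cells = [[Cell(post_process) for _ in range(m)] for _ in range(n)]
    for t in range(k):              # one time step per shared-dimension index
        for i in range(n):
            for j in range(m):
                cells[i][j].step(A[i][t], B[t][j])
    return [[cells[i][j].output() for j in range(m)] for i in range(n)]
```

Calling `matmul_with_cells([[1, 2], [3, 4]], [[1, -1], [0, -1]])` yields the matrix product with negative entries clamped to zero by the assumed ReLU step.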
SYSTEM AND METHOD OF GENERATING QUANTUM UNITARY NOISE USING SILICON BASED QUANTUM DOT ARRAYS
A system and method of generating quantum unitary noise using silicon-based quantum dot arrays is disclosed. Unitary noise is derived from a probability of detecting a particle within a quantum dot array structure comprising position-based charge qubits with two time-independent basis states |0> and |1>. A two-level electron tunneling device such as an interface device, qubit or other quantum structure is used to generate quantum noise. The electron tunneling device includes a reservoir of particles, a quantum dot, and a barrier that is used to control tunneling between the reservoir and the quantum dot. A detector circuit connected to the device outputs a digital stream corresponding to the probability of a particle being detected. Controlling the bias applied to the barrier controls the probability of detection. Thus, the probability density function (PDF) of the output unitary noise can be controlled to correspond to a desired probability. The unitary noise can be used in stochastic rounding by controlling the bias applied to the barrier in accordance with a remainder of numbers to be rounded.
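The stochastic rounding use mentioned at the end of the abstract can be illustrated in software. In this sketch a pseudo-random draw stands in for the detector's output bit, and the fractional remainder plays the role of the detection probability set by the barrier bias. The function name `stochastic_round` and the use of Python's `random.Random` are assumptions for illustration only.

```python
import math
import random


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round x up with probability equal to its fractional remainder."""
    floor = math.floor(x)
    remainder = x - floor  # in [0, 1); stands in for the detection probability
    # The Bernoulli draw below is a software stand-in for the detector bit.
    detected = rng.random() < remainder
    return floor + (1 if detected else 0)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [stochastic_round(2.25, rng) for _ in range(20000)]
    # The sample mean approaches 2.25, so the rounding is unbiased on average.
    print(sum(samples) / len(samples))
```

Unlike round-to-nearest, this scheme has zero expected error, which is why the remainder is used as the probability of rounding up.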
BIT-WIDTH OPTIMIZATION METHOD FOR PERFORMING FLOATING POINT TO FIXED POINT CONVERSION
Provided is a bit-width optimization method for performing floating point to fixed point conversion (FFC) by at least one processor. The bit-width optimization method includes receiving a first floating-point value which represents a minimum value among floating-point values to be converted, receiving a second floating-point value which represents a maximum value among the floating-point values to be converted, receiving a maximum permissible error rate for performing FFC, calculating a minimum bit width of fixed-point notation which satisfies the maximum permissible error rate on the basis of the first floating-point value, the second floating-point value, and the maximum permissible error rate, and calculating a scale factor for FFC on the basis of the second floating-point value and the calculated minimum bit width.
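The abstract does not give the formulas, so the following is only one plausible reading. It assumes unsigned fixed-point, positive inputs with `0 < vmin <= vmax`, a scale factor of `(2**bits - 1) / vmax`, and an error rate defined as the worst-case rounding error (half a quantization step) relative to `vmin`. The function name `min_bit_width` and all of these definitions are assumptions.

```python
from typing import Tuple


def min_bit_width(vmin: float, vmax: float, max_error_rate: float,
                  max_bits: int = 64) -> Tuple[int, float]:
    """Return (bits, scale) for the smallest bit width meeting the error bound.

    Assumed error model: half a quantization step divided by vmin.
    """
    if not (0 < vmin <= vmax) or max_error_rate <= 0:
        raise ValueError("expected 0 < vmin <= vmax and max_error_rate > 0")
    for bits in range(1, max_bits + 1):
        scale = (2 ** bits - 1) / vmax          # maps vmax to the largest code
        worst_rel_error = (0.5 / scale) / vmin  # half-step error at the smallest value
        if worst_rel_error <= max_error_rate:
            return bits, scale
    raise ValueError("no bit width up to max_bits satisfies the error rate")


if __name__ == "__main__":
    bits, scale = min_bit_width(1.0, 100.0, 0.01)
    print(bits, scale)
```

Under this error model, values in the range 1.0 to 100.0 with a 1% permissible error rate need 13 bits, since 12 bits give only 4095 levels and at least 5000 are required.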
HIGH-PRECISION ANCHORED-IMPLICIT PROCESSING
An apparatus includes a processing circuit and a storage device. The processing circuit is configured to perform one or more processing operations in response to one or more instructions to generate an anchored-data element. The storage device is configured to store the anchored-data element. A format of the anchored-data element includes an identification item, an overlap item, and a data item. The data item is configured to hold a data value of the anchored-data element. The identification item indicates an anchor value for the data value or one or more special values.
Encoding method and device, decoding method and device, and storage medium
Provided are an encoding method and device, a decoding method and device, and a storage medium. The encoding method comprises: encoding an initial to-be-encoded bit sequence with a low density parity check (LDPC) code having a code rate R₁, to obtain an encoded first bit sequence, where 0 ≤ R₁ ≤ 1; linearly combining at least two bit sequence segments in the first bit sequence to obtain a second bit sequence; and cascading the first bit sequence and the second bit sequence to obtain a target bit sequence having a code rate R₂, where 0 ≤ R₂ ≤ R₁ ≤ 1.
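The combine-and-cascade steps can be sketched as follows. The sketch takes an already-encoded first bit sequence as input and does not implement LDPC encoding itself. It assumes equal-length segments and uses bitwise XOR as the linear combination over GF(2); the names `extend_codeword` and `code_rate` are hypothetical.

```python
from typing import List


def extend_codeword(first_bits: List[int], num_segments: int) -> List[int]:
    """XOR equal-length segments of first_bits and append the result."""
    if num_segments < 2 or len(first_bits) % num_segments != 0:
        raise ValueError("need at least two equal-length segments")
    seg_len = len(first_bits) // num_segments
    segments = [first_bits[i * seg_len:(i + 1) * seg_len]
                for i in range(num_segments)]
    # Linear combination over GF(2): column-wise sum modulo 2 (i.e. XOR).
    second_bits = [sum(column) % 2 for column in zip(*segments)]
    # Cascade: the target sequence is the first sequence followed by the second.
    return first_bits + second_bits


def code_rate(info_bits: int, coded_bits: int) -> float:
    """Code rate = information bits / transmitted bits."""
    return info_bits / coded_bits


if __name__ == "__main__":
    first = [1, 0, 1, 1, 0, 1, 1, 0]   # pretend output of a rate-1/2 code, k = 4
    target = extend_codeword(first, 2)
    print(target, code_rate(4, len(first)), code_rate(4, len(target)))
```

Because the cascade only appends bits, the target sequence is never shorter than the first sequence, which is why the target rate cannot exceed the first rate.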
Low latency floating-point division operations
Methods and systems for division operations are described. A processor can initialize an estimated quotient of a dividend and a divisor separately from a floating-point unit (FPU) pipeline. The processor can implement the FPU pipeline to execute a refinement process that can include at least a first iteration of operations and a second iteration of operations. The refinement process can include, in the first iteration of operations, generating a first unnormalized floating-point value using the initialized estimated quotient. The refinement process can include, in the second iteration of operations, generating a second unnormalized floating-point value using the first unnormalized floating-point value. The processor can determine a final quotient based on the second unnormalized floating-point value.
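The estimate-then-refine structure can be illustrated with a generic residual-correction scheme. This is an assumed stand-in, not the patented method: it does not model unnormalized floating-point values or the FPU pipeline, it handles only positive divisors, and the names `approx_reciprocal` and `divide` are hypothetical.

```python
import math


def approx_reciprocal(b: float) -> float:
    """Crude reciprocal seed for b > 0, accurate to within roughly 6%."""
    m, e = math.frexp(b)                  # b = m * 2**e with m in [0.5, 1)
    seed = 48.0 / 17.0 - (32.0 / 17.0) * m  # linear approximation of 1/m
    return math.ldexp(seed, -e)


def divide(a: float, b: float) -> float:
    """Approximate a / b with an initial estimate and two refinement iterations."""
    if b <= 0:
        raise ValueError("this sketch only handles positive divisors")
    r = approx_reciprocal(b)
    q = a * r                             # initial estimated quotient
    for _ in range(2):                    # first and second refinement iterations
        residual = a - b * q              # how far b*q is from the dividend
        q = q + residual * r              # correct the quotient by the scaled residual
    return q


if __name__ == "__main__":
    print(divide(7.0, 3.0), 7.0 / 3.0)
```

Each iteration multiplies the relative error by the seed's relative error, so two iterations cube it. With a seed error near 6%, the result is accurate to a few parts in ten thousand.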
PROCESSING CORE WITH DATA ASSOCIATIVE ADAPTIVE ROUNDING
Processing cores with data associative adaptive rounding and associated methods are disclosed herein. One disclosed processing core comprises an arithmetic logic unit cluster configured to generate a value for a unit of directed graph data using input directed graph data, a comparator coupled to a threshold register and a data register, a core controller configured to load a threshold value into the threshold register when the value for the unit of directed graph data is loaded into the data register, and a rounding circuit. The rounding circuit is configured to receive the value for the unit of directed graph data from the arithmetic logic unit cluster and conditionally round the value for the unit of directed graph data based on a comparator output from the comparator.
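The comparator-gated rounding path might look like the sketch below. The abstract does not say what the comparison is or what the value is rounded to, so this assumes the comparator tests the value's magnitude against the threshold and that rounding means forcing the value to zero. Both choices, and the names `comparator` and `conditional_round`, are assumptions.

```python
def comparator(value: float, threshold: float) -> bool:
    """Assumed comparison: is the value's magnitude below the threshold?"""
    return abs(value) < threshold


def conditional_round(value: float, threshold: float) -> float:
    """Round the value to zero only when the comparator output is true."""
    if comparator(value, threshold):
        return 0.0      # assumed rounding target
    return value        # otherwise the value passes through unchanged


if __name__ == "__main__":
    values = [0.03, -0.5, 0.09, 2.0]
    print([conditional_round(v, threshold=0.1) for v in values])
```

In the described core the threshold is loaded alongside each value, so a different threshold could be supplied per unit of data rather than the single constant used in this example.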
Programmable Device Implementing Fixed and Floating Point Functionality in a Mixed Architecture
Configurable specialized processing blocks, such as DSP blocks, are described that implement fixed and floating-point functionality in a single mixed architecture on a programmable device. The described architecture reduces the need to construct floating-point functions outside the configurable specialized processing block, thereby minimizing hardware cost and area. The disclosed architecture also introduces pipelining into the DSP block in order to ensure the floating-point multiplication and addition functions remain in synchronicity, thereby increasing the maximum frequency at which the DSP block can operate. Moreover, the disclosed architecture includes logic circuitry to support floating-point exception handling.
Repurposed hexadecimal floating point data path
A method includes dividing a fraction of a floating point result into a first portion and a second portion. The method includes outputting a first normalizer result based on the first portion during a first clock cycle. The method includes storing a first segment of the first portion during the first clock cycle. The method includes outputting a first rounder result based on the first normalizer result during the first clock cycle. The method includes outputting a second normalizer result based on the second portion during a second clock cycle. The method includes outputting a second rounder result based on the second normalizer result and the first segment during the second clock cycle.
NEURAL NETWORK SECURITY
Herein is disclosed a neural network controller, configured to implement a neural network, the neural network including: a first layer; one or more second layers; and a third layer; wherein each layer of the first layer, the one or more second layers, and the third layer includes one or more nodes; wherein at least one node of the one or more second layers is configured to provide an output value at a first level of precision; wherein the neural network controller is configured to implement a precision reduction function to reduce an output value of at least one node of the third layer to a second level of precision; and wherein the second level of precision is less precise than the first level of precision.
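A precision reduction function of the kind described could be as simple as rounding the final layer's outputs to a fixed number of decimal places while earlier layers keep full floating-point precision. The sketch below assumes decimal rounding as the reduction; the name `reduce_precision` and the two-decimal default are illustrative choices, not taken from the source.

```python
from typing import List


def reduce_precision(outputs: List[float], decimals: int = 2) -> List[float]:
    """Reduce output-layer values to a coarser precision by decimal rounding."""
    return [round(v, decimals) for v in outputs]


if __name__ == "__main__":
    # Hypothetical full-precision values produced by the last layer.
    full_precision = [0.123456, 0.876544]
    print(reduce_precision(full_precision))
```

Coarser outputs reveal less about the network's internal state to anyone observing its responses, which is one reason a security-oriented design might apply the reduction only at the final layer.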