G06F7/49947

CIRCUITRY AND METHOD
20230005209 · 2023-01-05 ·

Circuitry comprises ray tracing circuitry comprising a plurality of floating-point circuitries to perform floating-point processing operations to detect intersection between a virtual ray defined by a ray direction and a test region, the floating-point circuitries operating to a given precision to generate an output floating-point value comprising a significand and an exponent; in which at least some of the plurality of floating-point circuitries are configured to round using a predetermined directed rounding mode any denormal floating-point value generated by operation of that circuitry so as to output normal values, a denormal floating-point value being a floating-point value in which the significand comprises one or more leading zeroes.

ELECTRONIC DEVICE AND CONTROLLING METHOD OF ELECTRONIC DEVICE

An electronic device and a controlling method of an electronic device are provided. An electronic device recursively determines a plurality of layers of a neural network model. Weight data of first model information is recursively quantized to obtain a second neural network model. The recursive quantization begins with the weight data and determines an iteration count of a recursion. The recursion operates on error data, quantized weight data, scale data and quantized error data to obtain the iteration count. A first bit-width of the weight data is reduced to a second bit-width of the quantized weight data. The recursion may be performed on a per-layer basis. The weight data may be formulated in a floating-point format and the quantized weight data may be formulated in a fixed point format with an integer number of bits.

Method and apparatus for generating fixed-point quantized neural network

A method of generating a fixed-point quantized neural network includes analyzing a statistical distribution for each channel of floating-point parameter values of feature maps and a kernel for each channel from data of a pre-trained floating-point neural network, determining a fixed-point expression of each of the parameters for each channel statistically covering a distribution range of the floating-point parameter values based on the statistical distribution for each channel, determining fractional lengths of a bias and a weight for each channel among the parameters of the fixed-point expression for each channel based on a result of performing a convolution operation, and generating a fixed-point quantized neural network in which the bias and the weight for each channel have the determined fractional lengths.

DUAL EXPONENT BOUNDING BOX FLOATING-POINT PROCESSOR

Apparatus and methods are disclosed for performing matrix operations, including operations suited to neural network and other machine learning accelerators and applications, using dual exponent formats. Disclosed matrix formats include single exponent bounding box floating-point (SE-BBFP) and dual exponent bounding box floating-point (DE-BBFP) formats. Shared exponents for each element are determined for each element based on whether the element is used as a row of matrix tile or a column of a matrix file, for example, for a dot product operation. Computing systems suitable for employing such neural networks include computers having general-purpose processors, neural network accelerators, or reconfigure both logic devices, such as Field programmable gate arrays (FPGA). Certain techniques disclosed herein can provide improved system performance while reducing memory and network bandwidth used.

TININESS DETECTION
20230035159 · 2023-02-02 ·

An apparatus comprises floating-point processing circuitry to perform a floating-point operation with rounding to generate a floating-point result value; and tininess detection circuitry to detect a tininess status indicating whether an outcome of the floating-point operation is tiny. A tiny outcome corresponds to a non-zero number with a magnitude smaller than a minimum non-zero magnitude representable as a normal floating-point number in a floating-point format to be used for the floating-point result value. The tininess detection circuitry comprises hardware circuit logic configured to support both before rounding tininess detection and after rounding tininess detection for detecting the tininess status.

COMPRESSION TECHNIQUES FOR VERTICES OF GRAPHIC MODELS
20230090310 · 2023-03-23 ·

Methods for lossy and lossless pre-processing of image data. In one embodiment, a method for lossy pre-processing image data, where the method may include, at a computing device: receiving the image data, where the image data includes a model having a mesh, the mesh includes vertices defining a surface, the vertices including attribute vectors, and the attribute vectors including values. The method also including quantizing the values of the attribute vectors to produce modified values, where a precision of the modified values is determined based on a largest power determined using a largest exponent of the values, encoding pairs of the modified values into two corresponding units of information. The method also including, for each pair of the pairs of the modified values, serially storing the two corresponding units of information as a data stream into a buffer, and compressing the data stream in the buffer.

MODEL TRAINING METHOD AND APPARATUS FOR FEDERATED LEARNING, DEVICE, AND STORAGE MEDIUM

A model training method and apparatus for federated learning, a device and a storage medium are provided, which belong to the technical field of machining learning. The method includes: generating an i.sup.th scalar operator based on a (t-1).sup.th round of training data and a t.sup.th round of training data (201); transmitting an i.sup.th fusion operator to a next node device based on the i.sup.th scalar operator (202); determining an i.sup.th second-order gradient descent direction of an i.sup.th sub-model based on an acquired second-order gradient scalar, an i.sup.th model parameter and an i.sup.thfirst-order gradient; and updating the i.sup.th sub-model based on the i.sup.th second-order gradient descent direction to obtain a model parameter of the i.sup.th sub-model during a (t+1).sup.th round of iterative training.

METHOD AND APPARATUS FOR GENERATING FIXED-POINT QUANTIZED NEURAL NETWORK

A method of generating a fixed-point quantized neural network includes analyzing a statistical distribution for each channel of floating-point parameter values of feature maps and a kernel for each channel from data of a pre-trained floating-point neural network, determining a fixed-point expression of each of the parameters for each channel statistically covering a distribution range of the floating-point parameter values based on the statistical distribution for each channel, determining fractional lengths of a bias and a weight for each channel among the parameters of the fixed-point expression for each channel based on a result of performing a convolution operation, and generating a fixed-point quantized neural network in which the bias and the weight for each channel have the determined fractional lengths.

Vector convert hexadecimal floating point to scaled decimal instruction

An instruction to perform converting and scaling operations is provided. Execution of the instruction includes converting an input value in one format to provide a converted result in another format. The converted result is scaled to provide a scaled result. A result obtained from the scaled result is placed in a selected location. Further, an instruction to perform scaling and converting operations is provided. Execution of the instruction includes scaling an input value in one format to provide a scaled result and converting the scaled result from the one format to provide a converted result in another format. A result obtained from the converted result is placed in a selected location.

Processing core with data associative adaptive rounding
11645041 · 2023-05-09 · ·

Processing cores with data associative adaptive rounding and associated methods are disclosed herein. One disclosed processing core comprises an arithmetic logic unit cluster configured to generate a value for a unit of directed graph data using input directed graph data, a comparator coupled to a threshold register and a data register, a core controller configured to load a threshold value into the threshold register when the value for the unit of directed graph data is loaded into the data register, and a rounding circuit. The rounding circuit is configured to receive the value for the unit of directed graph data from the arithmetic logic unit cluster and conditionally round the value for the unit of directed graph data based on a comparator output from the comparator.