G06F9/30025

Stacked transistors with different gate lengths in different device strata

Disclosed herein are stacked transistors with different gate lengths in different device strata, as well as related methods and devices. In some embodiments, an integrated circuit structure may include stacked strata of transistors, with two different device strata having different gate lengths.

SYSTEM AND METHOD FOR INT9 QUANTIZATION
20230096994 · 2023-03-30 ·

A method of converting data stored in a memory from a first format to a second format is disclosed. The method includes extending the number of bits in the data stored in a double data rate (DDR) memory by one bit to form extended data. The method further includes determining whether the data stored in the DDR memory is signed or unsigned. Responsive to determining that the data is signed, a sign value is added to the most significant bit of the extended data and the data is copied to the lower order bits of the extended data. Responsive to determining that the data is unsigned, the data is copied to the lower order bits of the extended data and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
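
The extension step described above can be illustrated with a minimal Python sketch. It assumes the source data is 8 bits wide (so the extended data is 9 bits) and that the sign value is the copied sign bit of the original data; the function names `to_int9` and `from_int9` are hypothetical and do not come from the disclosure.

```python
def to_int9(raw: int, signed: bool) -> int:
    """Extend an 8-bit pattern to a 9-bit pattern (assumed widths)."""
    if not 0 <= raw <= 0xFF:
        raise ValueError("raw must be an 8-bit pattern")
    low = raw & 0xFF                 # data copied to the lower order bits
    if signed:
        sign = (low >> 7) & 1        # sign value taken from the original data
        return (sign << 8) | low     # sign placed in the new most significant bit
    return low                       # unsigned: most significant bit stays zero


def from_int9(bits: int) -> int:
    """Interpret a 9-bit pattern as a two's-complement integer."""
    bits &= 0x1FF
    return bits - 0x200 if bits & 0x100 else bits


# The same 8-bit pattern 0x80 maps to different values depending on signedness.
print(from_int9(to_int9(0x80, signed=True)))
print(from_int9(to_int9(0x80, signed=False)))
```

Under these assumptions, one 9-bit signed representation covers both the signed and the unsigned 8-bit ranges, so later arithmetic does not need to track which kind of input it received.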

ISA ACCESSIBLE PHYSICAL UNCLONABLE FUNCTION

Techniques for encrypting data using a key generated by a physical unclonable function (PUF) or a virtual PUF key are described. An apparatus according to the present disclosure may include decoder circuitry to decode an instance of a single instruction having a field for an opcode to indicate that execution circuitry is to at least encrypt secret information from an input data structure with either a PUF-generated encryption key or a virtual PUF key, bind the wrapped secret information to an identified target, update the input data structure, generate a MAC over the updated data structure, store the MAC in the input data structure to generate a wrapped output data structure, and store the wrapped output data structure having the encrypted secret information and an indication of the target.

APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR STRUCTURED-SPARSE TILE MATRIX FMA

Systems, methods, and apparatuses relating to sparsity-based FMA. In some examples, an instance of a single FMA instruction has one or more fields for an opcode, one or more fields to identify a source/destination matrix operand, one or more fields to identify a first plurality of source matrix operands, and one or more fields to identify a second plurality of matrix operands, wherein the opcode is to indicate that execution circuitry is to select a proper subset of data elements from the first plurality of source matrix operands based on sparsity controls from a first matrix operand of the second plurality of matrix operands and perform an FMA.

Systems and methods for performing 16-bit floating-point matrix dot product instructions

Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify an M by N destination matrix, a first source identifier to identify an M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
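
The per-element flow described above can be sketched in Python as follows. This is only an illustration under stated assumptions: nibbles are treated as unsigned 4-bit values, accumulation saturates to the signed 32-bit range, and matrices are plain nested lists. The abstract does not specify nibble signedness or the saturation bounds, and the names `nibbles`, `saturate_int32`, and `tile_nibble_dot` are hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def nibbles(dword: int) -> list[int]:
    """Split a 32-bit doubleword into eight 4-bit nibbles (assumed unsigned)."""
    return [(dword >> (4 * i)) & 0xF for i in range(8)]


def saturate_int32(x: int) -> int:
    """Clamp to the signed 32-bit range (assumed saturation bounds)."""
    return max(INT32_MIN, min(INT32_MAX, x))


def tile_nibble_dot(c, a, b):
    """Return C + A.B where each doubleword pair contributes eight nibble products."""
    M, K, N = len(a), len(b), len(b[0])
    out = [row[:] for row in c]
    for m in range(M):
        for n in range(N):
            acc = out[m][n]              # previous contents of the destination element
            for k in range(K):           # the flow runs K times per element (m, n)
                products = [x * y for x, y in zip(nibbles(a[m][k]), nibbles(b[k][n]))]
                acc = saturate_int32(acc + sum(products))
            out[m][n] = acc
    return out


# 1x2 times 2x1 example: (1*3 + 2*4) + (3*2) added to a zero accumulator.
print(tile_nibble_dot([[0]], [[0x21, 0x3]], [[0x43], [0x2]]))
```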

PERFORMING COMPARISON OPERATIONS USING VECTOR FLOATING POINT VALUES
20220350606 · 2022-11-03 ·

A method and processing module for performing a particular comparison operation using floating point values. The floating point values are received in a scalar format. The received floating point values are promoted to a vector format, wherein the received floating point values are used as a first component of the vector floating point values. A second component of one or more of the vector floating point values is set to a non-zero, finite value. The particular comparison operation is performed using the vector floating point values to determine a vector result having first and second components. A scalar result of the particular comparison operation is determined, wherein the magnitude of the scalar result is given by the magnitude of the first component of the vector result, and wherein if the first component of the vector result is non-zero then the sign of the scalar result equals the sign of the first component of the vector result, and wherein if the first component of the vector result is zero and if the second component of the vector result is non-zero then the sign of the scalar result equals the sign of the second component of the vector result. The scalar result of the particular comparison operation is outputted.
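
The final step, deriving the scalar result from the two components of the vector result, can be sketched as below. This covers only the sign-selection rule stated above, not the comparison operation itself. The function name `scalar_result` is hypothetical, and the behavior when both components are zero is an assumption, since the abstract does not describe that case.

```python
import math


def scalar_result(first: float, second: float) -> float:
    """Combine the two components of a vector result into a scalar result."""
    magnitude = abs(first)                       # magnitude comes from the first component
    if first != 0.0:
        return math.copysign(magnitude, first)   # sign of the first component
    if second != 0.0:
        return math.copysign(magnitude, second)  # zero first component: sign of the second
    return first                                 # assumption: both zero, leave unchanged


# A zero first component picks up the sign of the second component.
print(scalar_result(0.0, -2.0))
print(scalar_result(-0.0, 3.0))
```

The point of the second component in this sketch is that it carries a usable sign when the first component is a zero, which is why it is set to a non-zero, finite value before the operation.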

Computing device and method

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

Calculation method and related product

The present disclosure provides a computing method that is applied to a computing device. The computing device includes: a memory, a register unit, and a matrix computing unit. The method includes the following steps: controlling, by the computing device, the matrix computing unit to obtain a first operation instruction, where the first operation instruction includes a matrix reading instruction for a matrix required for executing the instruction; controlling, by the computing device, an operating unit to send a reading command to the memory according to the matrix reading instruction; and controlling, by the computing device, the operating unit to read a matrix corresponding to the matrix reading instruction in a batch reading manner, and executing the first operation instruction on the matrix. The technical solutions in the present disclosure have the advantages of fast computing speed and high efficiency.

Hardware-implemented universal floating-point instruction set architecture for computing directly with human-readable decimal character sequence floating-point representation operands
11635957 · 2023-04-25 ·

A universal floating-point Instruction Set Architecture (ISA) compute engine implemented entirely in hardware. The ISA compute engine computes directly with human-readable decimal character sequence floating-point representation operands without first having to explicitly perform a conversion-to-binary-format process in software. A fully pipelined convertToBinaryFromDecimalCharacter hardware operator logic circuit converts one or more human-readable decimal character sequence floating-point representations to IEEE 754-2008 binary floating-point representations every clock cycle. Following computations by at least one hardware floating-point operator, a convertToDecimalCharacterFromBinary hardware conversion circuit converts the result back to a human-readable decimal character sequence floating-point representation.

CONVERTER FOR CONVERTING DATA TYPE, CHIP, ELECTRONIC DEVICE, AND METHOD THEREFOR
20220326947 · 2022-10-13 ·

The present disclosure relates to a converter for data type conversion, a method for data type conversion, an integrated circuit chip, and a calculation apparatus, where the calculation apparatus may be included in a combined processing apparatus, where the combined processing apparatus may further include a general interconnection interface and other processing apparatus. The calculation apparatus interacts with other processing apparatus to jointly complete calculation operations specified by users. The combined processing apparatus may further include a storage apparatus. The storage apparatus is respectively connected to the calculation apparatus and other processing apparatus, and the storage apparatus is used for storing data of the calculation apparatus and other processing apparatus. A solution of the present disclosure may be widely applied to various data type conversion applications.