Patent classifications
G06F9/30025
CONVERSION INSTRUCTIONS
Techniques for data type conversion via instruction are described. An exemplary instruction is to include fields for an opcode, an identification of a source operand, and an identification of a destination operand, wherein the opcode is to indicate that processing circuitry is to convert the odd 16-bit floating point values of the identified source operand into 32-bit floating point values and store the 32-bit floating point values in data element positions of the identified destination operand.
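The odd-element conversion described above matches bfloat16-widening instructions such as VCVTNEOBF162PS: because bfloat16 is simply the upper half of an IEEE-754 single, widening is a 16-bit left shift of the element's bits. A minimal sketch (the function name and list-based interface are illustrative, not from the patent):

```python
import struct

def bf16_odd_to_fp32(halfwords):
    """Widen the odd-indexed 16-bit values, interpreted as bfloat16,
    to float32 by placing their bits in the upper half of the word."""
    out = []
    for h in halfwords[1::2]:              # odd data element positions
        bits = (h & 0xFFFF) << 16          # bf16 is the high half of an fp32
        out.append(struct.unpack('<f', struct.pack('<I', bits))[0])
    return out

# 0x3F80 and 0x4000 are bfloat16 encodings of 1.0 and 2.0
print(bf16_odd_to_fp32([0x0000, 0x3F80, 0x0000, 0x4000]))  # [1.0, 2.0]
```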
Data Processing Method and Apparatus
The present application discloses a data processing method and apparatus. A specific embodiment of the method includes: preprocessing received input data to be processed; obtaining a storage address of configuration parameters for the input data based on the result of the preprocessing and on a result obtained by linearly fitting an activation function, the configuration parameters being preset according to curve characteristics of the activation function; acquiring the configuration parameters from the storage address; and processing the result of the preprocessing using the configuration parameters and a preset circuit structure to obtain a processing result. With such an embodiment, the input data can be processed using the configuration parameters and the preset circuit structure, with no dedicated circuit implementing the activation function, which simplifies the circuit structure; multiple types of activation function can also be supported, thereby improving flexibility.
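The lookup-plus-linear-fit scheme can be sketched in software: the "storage address" becomes a table index derived from the input, and each entry holds a preset (slope, intercept) pair fitted to one segment of the activation curve, so evaluation needs only a shared multiply-add. All names, the segment count, and the fitting range here are illustrative assumptions:

```python
import math

# Hypothetical piecewise-linear approximation of an activation function.
SEGMENTS, LO, HI = 8, -4.0, 4.0

def build_table(f):
    """Preset one (slope, intercept) configuration pair per segment."""
    step = (HI - LO) / SEGMENTS
    table = []
    for i in range(SEGMENTS):
        x0, x1 = LO + i * step, LO + (i + 1) * step
        slope = (f(x1) - f(x0)) / step
        table.append((slope, f(x0) - slope * x0))
    return table

def pwl_eval(table, x):
    x = min(max(x, LO), HI - 1e-9)                  # clamp to fitted range
    idx = int((x - LO) * SEGMENTS / (HI - LO))      # "storage address"
    slope, intercept = table[idx]
    return slope * x + intercept                    # shared multiply-add

sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
table = build_table(sigmoid)
print(abs(pwl_eval(table, 0.3) - sigmoid(0.3)) < 0.05)  # True
```

Swapping in a different activation function only changes the table contents, not the evaluation circuit, which is the flexibility the abstract claims.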
SYSTEM AND METHOD FOR DATA COMPATIBILITY ACROSS HETEROGENEOUS MACHINE ARCHITECTURES
A method includes loading a data element from at least one memory into at least one internal register. The method also includes converting the data element from a network standardized format to a device native format. The method further includes performing an operation on the data element. The method also includes de-converting the data element from the device native format to the network standardized format. In addition, the method includes storing the data element in the at least one memory.
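A software sketch of the load / convert / operate / de-convert / store sequence, assuming the network standardized format is big-endian (conventional network byte order) and the device native format is the host's own integer representation; the helper names are illustrative:

```python
import struct

def to_native(word_bytes):
    """Convert a 32-bit word from the network standardized (big-endian)
    format to a native integer."""
    return struct.unpack('>I', word_bytes)[0]

def to_network(value):
    """De-convert a native integer back to the network format."""
    return struct.pack('>I', value & 0xFFFFFFFF)

# load -> convert -> operate -> de-convert -> store
word = to_native(b'\x00\x00\x01\x00')   # 256 in network byte order
word += 1                               # operate in the native format
print(to_network(word))                 # b'\x00\x00\x01\x01'
```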
Method and device for dynamically adjusting decimal point positions in neural network computations
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, a conversion unit, and a storage unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to the one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented as fixed-point data, thereby improving the processing speed and efficiency of training operations.
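The fixed-point representation with an adjustable decimal (binary) point can be sketched as follows. The policy for choosing the point position from the data's dynamic range is an illustrative assumption, not the patent's exact method:

```python
import math

def to_fixed(x, frac_bits):
    """Quantize a float to a fixed-point integer whose binary point
    sits frac_bits from the right."""
    return round(x * (1 << frac_bits))

def from_fixed(q, frac_bits):
    return q / (1 << frac_bits)

def choose_frac_bits(values, total_bits=16):
    """Pick the point position so the largest magnitude still fits;
    re-running this between iterations gives the dynamic adjustment."""
    max_abs = max(abs(v) for v in values)
    int_bits = max(1, math.ceil(math.log2(max_abs + 1.0)) + 1)  # +1 sign bit
    return total_bits - int_bits

vals = [0.5, -1.25, 3.0]
fb = choose_frac_bits(vals)
print([from_fixed(to_fixed(v, fb), fb) for v in vals])  # [0.5, -1.25, 3.0]
```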
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING
- Himanshu Kaul ,
- Mark A. Anders ,
- Sanu K. Mathew ,
- Anbang Yao ,
- Joydeep Ray ,
- Ping T. Tang ,
- Michael S. Strickland ,
- Xiaoming Chen ,
- Tatiana Shpeisman ,
- Abhishek R. Appu ,
- Altug Koker ,
- Kamal Sinha ,
- Balaji Vembu ,
- Nicolas C. Galoppo Von Borries ,
- Eriko Nurvitadhi ,
- Rajkishore Barik ,
- Tsung-Han Lin ,
- Vasanth Ranganathan ,
- Sanjeev Jahagirdar
One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on a first input including weight data and a second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, the quantized intermediate data in the determined floating-point format.
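Quantizing intermediate results to a reduced floating-point format can be sketched as follows. The rounding helper implements standard round-to-nearest-even truncation of a float32 to bfloat16; the distribution-based format choice is a hypothetical stand-in for the patent's policy, not its actual criterion:

```python
import struct

def quantize_bf16(x):
    """Round a float32 value to the nearest bfloat16 (round-to-nearest-
    even on the 16 dropped mantissa bits) and widen it back to float."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack('<f', struct.pack('<I', bits))[0]

def choose_format(values):
    """Hypothetical policy: a wide dynamic range in the output
    distribution favors bfloat16's 8-bit exponent over fp16's 5 bits."""
    mags = [abs(v) for v in values if v]
    return 'bf16' if max(mags) / min(mags) > 2.0 ** 30 else 'fp16'

acts = [1.5, 1.0009765625, -3.25]
print([quantize_bf16(v) for v in acts])  # [1.5, 1.0, -3.25]
print(choose_format(acts))               # fp16
```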
Systems and methods for combining low-mantissa units to achieve and exceed FP64 emulation of matrix multiplication
The present disclosure relates to an apparatus that includes decoding circuitry that decodes a single instruction. The single instruction includes an identifier of a first source operand, an identifier of a second source operand, an identifier of a destination, and an opcode indicating that execution circuitry is to multiply data from the identified first source operand by data from the identified second source operand and store a result in the identified destination. Additionally, the apparatus includes execution circuitry to execute the single decoded instruction to calculate a dot product by computing a plurality of products from data elements of the identified first and second source operands, using values less precise than the identified first and second source operands, summing the calculated products, and storing the summed products in the identified destination.
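The low-mantissa emulation idea can be illustrated with a Dekker-style split: each double is written exactly as a sum of two narrower parts, the cross products are computed at the lower effective precision, and the partial products are summed to recover (nearly) the full-precision product. This is a sketch of the general technique, not the claimed implementation:

```python
def split(x, s=27):
    """Dekker split: represent x exactly as hi + lo, where hi keeps only
    the top 53 - s mantissa bits, so products of hi/lo parts are exact."""
    c = (2.0 ** s + 1.0) * x
    hi = c - (c - x)
    return hi, x - hi

def emulated_mul(x, y):
    """Sum the four narrower partial products to approximate the
    full-precision product x * y."""
    xh, xl = split(x)
    yh, yl = split(y)
    return xh * yh + (xh * yl + xl * yh) + xl * yl

a, b = 1.0 / 3.0, 3.141592653589793
print(abs(emulated_mul(a, b) - a * b) <= 1e-15)  # True
```

In a matrix-multiply unit, the same decomposition lets several low-mantissa multipliers cooperate on one high-precision product, which is the combination the title refers to.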
SEMICONDUCTOR DEVICE
When conversion between floating-point data and integer data is performed in software, the load on the CPU becomes heavy. A semiconductor device includes a memory, a bus coupled to the memory, a bus master coupled to the bus, and a conversion arithmetic circuit coupled to the bus. The conversion arithmetic circuit includes a floating-point adder-subtracter, an integer adder-subtracter, and a shift operator. The semiconductor device converts floating-point data to integer data, or integer data to floating-point data, without employing a floating-point multiplier or divider.
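Float-to-integer conversion without a multiplier or divider reduces to extracting the exponent field and shifting the significand, which is the role of the shift operator in the claimed circuit. A bit-level sketch for single precision (truncation toward zero; names are illustrative):

```python
import struct

def f32_to_int(x):
    """Truncating float32-to-integer conversion using only field
    extraction, shifts, and add/subtract -- no multiply or divide."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    sign = -1 if bits >> 31 else 1
    exp = ((bits >> 23) & 0xFF) - 127           # unbias the exponent
    if exp < 0:
        return 0                                # |x| < 1 truncates to 0
    mant = (bits & 0x7FFFFF) | 0x800000         # restore the implicit 1
    shift = exp - 23                            # align the binary point
    val = mant << shift if shift >= 0 else mant >> -shift
    return sign * val

print([f32_to_int(v) for v in (5.75, -12.0, 0.25)])  # [5, -12, 0]
```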
Shift significand of decimal floating point data
A decimal floating point finite number in a decimal floating point format is composed from the number in a different format. A decimal floating point format includes fields to hold information relating to the sign, exponent and significand of the decimal floating point finite number. Other decimal floating point data, including infinities and NaNs (not a number), are also composed. Decimal floating point data are also decomposed from the decimal floating point format to a different format. For composition and decomposition, one or more instructions may be employed, including a shift significand instruction.
APPARATUS AND METHOD FOR SUPPORTING A CONVERSION INSTRUCTION
A data processing system 2 includes instruction decoder circuitry 12 responsive to a conversion instruction FCVTJS to convert a double precision floating point number into a 32-bit integer number. Right shifting circuitry 28 performs a right shift upon at least part of the input number and left shifting circuitry 32 performs a left shift of at least part of the input number. Selection circuitry 38 serves to select one of the right shifted number and the left shifted number as a selected shifted number which forms at least part of the output number which is generated.
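FCVTJS implements the JavaScript-style double-to-int32 conversion: truncate toward zero, keep the low 32 bits, and reinterpret them as signed. A behavioral sketch (the shift-and-select hardware path in the abstract is not modeled; NaN and infinity map to zero, as in the JavaScript semantics):

```python
import math

def fcvtjs(x):
    """Behavioral model of a JavaScript-style double-to-int32 convert:
    truncate toward zero, keep the low 32 bits, reinterpret as signed."""
    if math.isnan(x) or math.isinf(x):
        return 0                       # JS ToInt32 maps these to zero
    t = int(x) & 0xFFFFFFFF            # truncate, then take low 32 bits
    return t - 0x100000000 if t >= 0x80000000 else t

print([fcvtjs(v) for v in (3.7, -3.7, 2.0 ** 32 + 5.0)])  # [3, -3, 5]
```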