G06F7/4915

Block floating point computations using shared exponents

A system for block floating point computation in a neural network receives a plurality of floating point numbers. An exponent value for an exponent portion of each floating point number of the plurality of floating point numbers is identified, and mantissa portions of the floating point numbers are grouped. A shared exponent value of the grouped mantissa portions is selected according to the identified exponent values and then removed from the grouped mantissa portions to define multi-tiered shared exponent block floating point numbers. One or more dot product operations are performed on the grouped mantissa portions of the multi-tiered shared exponent block floating point numbers to obtain individual results. The individual results are shifted to generate a final dot product value, which is used to implement the neural network. The shared exponent block floating point computations reduce processing time while limiting the loss in system accuracy.
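The shared-exponent idea in this abstract can be illustrated with a minimal Python sketch. It assumes a single tier only (one shared exponent per vector, not the multi-tiered scheme the abstract describes), and the names `to_block`, `block_dot`, and `mantissa_bits` are hypothetical, chosen for illustration.

```python
import math

def to_block(values, mantissa_bits=8):
    """Quantize floats to integer mantissas that share one exponent.

    Assumption: the shared exponent is the largest exponent in the group.
    """
    # math.frexp(v) returns (m, e) with v == m * 2**e and 0.5 <= |m| < 1.
    exponents = [math.frexp(v)[1] for v in values if v != 0.0]
    shared = max(exponents) if exponents else 0
    # Remove the shared exponent and keep roughly mantissa_bits of precision.
    scale = 2.0 ** (mantissa_bits - shared)
    mantissas = [int(round(v * scale)) for v in values]
    # Each value is approximately mantissa * 2**(shared - mantissa_bits).
    return mantissas, shared - mantissa_bits

def block_dot(a, b, mantissa_bits=8):
    """Dot product computed on integer mantissas, then shifted once."""
    ma, ea = to_block(a, mantissa_bits)
    mb, eb = to_block(b, mantissa_bits)
    acc = sum(x * y for x, y in zip(ma, mb))  # integer-only arithmetic
    return math.ldexp(acc, ea + eb)           # apply both shared exponents

print(block_dot([1.0, 0.5, -0.25], [2.0, 4.0, 8.0]))
```

Inputs that are exact powers of two survive the quantization unchanged; other inputs are rounded, so the result is an approximation whose error shrinks as `mantissa_bits` grows.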

Parallel decimal multiplication hardware with a 3x generator

A method to produce a final product from a multiplicand and a multiplier is provided. The method is executed by a parallel decimal multiplication hardware architecture, which includes a 3X generator, at least one additional generator, a multiplier recoder, a partial product tree, and a decimal adder. The 3X generator, the at least one additional generator, and the multiplier recoder generate decimal partial products from the multiplicand and the multiplier. The partial product tree executes a reduction of the decimal partial products to produce two corresponding partial product accumulations. The decimal adder adds the two corresponding partial product accumulations of the decimal partial products to produce the final product.
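The data flow in this abstract can be modeled in software with a minimal Python sketch. Several details are assumptions, not taken from the abstract: the multiplier is recoded into signed digits in the range -5 to 5, the additional generators produce 2X, 4X, and 5X, and Python integers stand in for BCD hardware. The names `recode_multiplier` and `decimal_multiply` are hypothetical.

```python
def recode_multiplier(multiplier):
    """Recode a non-negative multiplier into signed decimal digits in [-5, 5].

    Assumed scheme: a digit above 5 becomes (digit - 10) with a carry
    into the next position. Digits are returned least significant first.
    """
    digits = []
    carry = 0
    while multiplier > 0 or carry:
        d = multiplier % 10 + carry
        multiplier //= 10
        if d > 5:
            d -= 10
            carry = 1
        else:
            carry = 0
        digits.append(d)
    return digits

def decimal_multiply(multiplicand, multiplier):
    x = multiplicand
    # Multiple generators: 3X is built directly, the others are assumed.
    multiples = {0: 0, 1: x, 2: x + x, 3: x + x + x}
    multiples[4] = multiples[2] + multiples[2]
    multiples[5] = multiples[2] + multiples[3]
    # One partial product per recoded digit, weighted by its decimal position.
    partials = []
    for pos, d in enumerate(recode_multiplier(multiplier)):
        pp = multiples[abs(d)] * 10 ** pos
        partials.append(-pp if d < 0 else pp)
    # Stand-in for the partial product tree: reduce to two accumulations.
    acc_a = sum(partials[0::2])
    acc_b = sum(partials[1::2])
    # Stand-in for the final decimal adder.
    return acc_a + acc_b

print(decimal_multiply(1234, 987))
```

With signed digits limited to magnitude 5, only the multiples 1X through 5X are ever selected, which is one plausible reason a dedicated 3X generator is useful: 3X is the one small multiple that cannot be formed by doubling alone.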

DECIMAL LOAD IMMEDIATE INSTRUCTION

An instruction generates a value for use in processing within a computing environment. The instruction obtains a sign control associated with the instruction, and shifts an input value of the instruction in a specified direction by a selected amount to provide a result. The result is placed in a first designated location in a register, and the sign, which is based on the sign control, is placed in a second designated location of the register. The result and the sign provide a signed value to be used in processing within the computing environment.
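As a rough illustration of the behavior described, the following Python sketch shifts an immediate value left by a number of decimal digits and places the digits and a sign code in separate fields of a register image. The packed-decimal layout (one digit per 4-bit nibble, sign in the lowest nibble, `0xC` for plus and `0xD` for minus) and the name `decimal_load_immediate` are assumptions for illustration; the abstract does not specify the encoding or the field positions.

```python
def decimal_load_immediate(immediate, shift, negative):
    """Return a packed-decimal register image as an integer.

    immediate: non-negative decimal input value
    shift:     number of decimal digit positions to shift left (assumed direction)
    negative:  sign control; True selects the minus sign code
    """
    # Shift the input value by the selected number of decimal digits.
    value = immediate * 10 ** shift
    # Reading the decimal string as hex puts one decimal digit in each nibble.
    digits_field = int(str(value), 16)
    # Assumed sign codes: 0xC for plus, 0xD for minus.
    sign_field = 0xD if negative else 0xC
    # Digits occupy the upper nibbles; the sign occupies the lowest nibble.
    return (digits_field << 4) | sign_field

print(hex(decimal_load_immediate(12, 2, True)))
```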

BLOCK FLOATING POINT COMPUTATIONS USING SHARED EXPONENTS
20190347072 · 2019-11-14

A system for block floating point computation in a neural network receives a plurality of floating point numbers. An exponent value for an exponent portion of each floating point number of the plurality of floating point numbers is identified, and mantissa portions of the floating point numbers are grouped. A shared exponent value of the grouped mantissa portions is selected according to the identified exponent values and then removed from the grouped mantissa portions to define multi-tiered shared exponent block floating point numbers. One or more dot product operations are performed on the grouped mantissa portions of the multi-tiered shared exponent block floating point numbers to obtain individual results. The individual results are shifted to generate a final dot product value, which is used to implement the neural network. The shared exponent block floating point computations reduce processing time while limiting the loss in system accuracy.

Data computing system

The present disclosure provides a data computing system. The data computing system comprises: a memory, a processor and an accelerator, wherein the memory is communicatively coupled to the processor and configured to store data to be computed and a computed result, the data being written by the processor; the processor is communicatively coupled to the accelerator and configured to control the accelerator; and the accelerator is communicatively coupled to the memory and configured to access the memory according to pre-configured control information, implement a computing process to produce the computed result and write the computed result back to the memory. The present disclosure also provides an accelerator and a method performed by an accelerator of a data computing system. The present disclosure can improve the execution efficiency of the processor and reduce the computing overhead of the processor.
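The division of labor described here can be mimicked with a toy Python model. Everything in it is an assumption made for illustration: the memory is a plain list, the control information is a `(src, length, dst)` tuple, and the computing process is a simple sum. The names `Accelerator`, `configure`, `run`, and `run_demo` are hypothetical.

```python
class Accelerator:
    """Toy accelerator that accesses shared memory per pre-configured control info."""

    def __init__(self, memory):
        self.memory = memory
        self.control = None

    def configure(self, src, length, dst):
        # The processor sets this once; it does not touch the data path afterwards.
        self.control = (src, length, dst)

    def run(self):
        src, length, dst = self.control
        # Read the operands directly from memory, compute, and write the result back.
        self.memory[dst] = sum(self.memory[src:src + length])

def run_demo():
    memory = [0] * 16
    memory[0:4] = [1, 2, 3, 4]      # processor writes the data to be computed
    accelerator = Accelerator(memory)
    accelerator.configure(src=0, length=4, dst=8)
    accelerator.run()               # processor only triggers the accelerator
    return memory[8]                # computed result, read back from memory

print(run_demo())
```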

Decimal load immediate instruction

An instruction generates a value for use in processing within a computing environment. The instruction obtains a sign control associated with the instruction, and shifts an input value of the instruction in a specified direction by a selected amount to provide a result. The result is placed in a first designated location in a register, and the sign, which is based on the sign control, is placed in a second designated location of the register. The result and the sign provide a signed value to be used in processing within the computing environment.

PARALLEL DECIMAL MULTIPLICATION HARDWARE WITH A 3X GENERATOR

A method to produce a final product from a multiplicand and a multiplier is provided. The method is executed by a parallel decimal multiplication hardware architecture, which includes a 3X generator, at least one additional generator, a multiplier recoder, a partial product tree, and a decimal adder. The 3X generator, the at least one additional generator, and the multiplier recoder generate decimal partial products from the multiplicand and the multiplier. The partial product tree executes a reduction of the decimal partial products to produce two corresponding partial product accumulations. The decimal adder adds the two corresponding partial product accumulations of the decimal partial products to produce the final product.

Decimal multiply and shift instruction

An instruction to perform a multiply and shift operation is executed. The executing includes multiplying a first value and a second value obtained by the instruction to obtain a product. The product is shifted in a specified direction by a user-defined selected amount to provide a result, and the result is placed in a selected location. The result is to be used in processing within the computing environment.
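The operation can be sketched in a few lines of Python. The sketch assumes that shifts are counted in decimal digit positions and that a right shift truncates toward zero; the abstract specifies neither, and the name `multiply_and_shift` is hypothetical.

```python
def multiply_and_shift(first, second, shift, direction="right"):
    """Multiply two integers, then shift the product by `shift` decimal digits."""
    product = first * second
    if direction == "left":
        # A left shift appends zeros in the low-order digit positions.
        return product * 10 ** shift
    # A right shift drops low-order digits, truncating toward zero (assumed).
    sign = -1 if product < 0 else 1
    return sign * (abs(product) // 10 ** shift)

print(multiply_and_shift(125, 48, 2))
```

A right shift after the multiply is a common way to rescale fixed-point decimal products, for example to bring a product of two values with two fractional digits each back to two fractional digits.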

Parallel decimal multiplication hardware with a 3X generator

A method to produce a final product from a multiplicand and a multiplier is provided. The method is executed by a parallel decimal multiplication hardware architecture, which includes a 3X generator, at least one additional generator, a multiplier recoder, a partial product tree, and a decimal adder. The 3X generator, the at least one additional generator, and the multiplier recoder generate decimal partial products from the multiplicand and the multiplier. The partial product tree executes a reduction of the decimal partial products to produce two corresponding partial product accumulations. The decimal adder adds the two corresponding partial product accumulations of the decimal partial products to produce the final product.