G06F7/491

SIGN-EFFICIENT ADDITION AND SUBTRACTION FOR STREAMINGCOMPUTATIONS IN CRYPTOGRAPHIC ENGINES
20230042366 · 2023-02-09 ·

Aspects of the present disclosure involve techniques and cryptographic processors configured to perform the techniques that include sign-efficient addition and subtraction operations that use Montgomery reduction and are capable of facilitating fast streaming operations. The techniques involve receiving a first number and a second number, where the first number and second number are within a target interval, and performing a modular operation to obtain a third number, the third number being within the same target interval and representing a sum or a difference of a rescaled first number and a rescaled second number, and wherein the modular operation includes a Montgomery reduction.

Computing device and method

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

Computing device and method

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

DATA PROCESSING METHOD IMPLEMENTED AT EDGE SWITCH, ELECTRONIC DEVICE, AND PROGRAM PRODUCT
20230236795 · 2023-07-27 ·

Embodiments of the present disclosure provide a data processing method implemented at an edge switch, an electronic device, and a program product. For example, a data processing method implemented at an edge switch is provided. The method includes receiving at least two data packets for floating-point arithmetic operations from at least one source device. In addition, the method may include acquiring corresponding floating-point numerical sequences respectively from the at least two data packets; and acquiring a floating-point arithmetic method from at least one data packet of the at least two data packets to determine a floating-point arithmetic result of the corresponding floating-point numerical sequences. The method may further include sending the floating-point arithmetic result to a target device indicated by the at least one data packet of the at least two data packets.

COMBINED DIVIDE/SQUARE ROOT PROCESSING CIRCUITRY AND METHOD
20230017462 · 2023-01-19 ·

An apparatus comprises combined divide/square root processing circuitry to perform, in response to a divide instruction, a given radix-64 iteration of a radix-64 divide operation, and in response to a square root instruction, a given radix-64 iteration of a radix-64 square root operation; in which: the combined divide/square root processing circuitry comprises shared circuitry to generate at least one output value for the given radix-64 iteration on a same data path used for both the radix-64 divide operation and the radix-64 square root operation.

ON-THE-FLY CONVERSION
20230018056 · 2023-01-19 ·

A data processing apparatus that converts a plurality of signed digits representing an input value in redundant representation comprises receiver circuitry to receive, at each of a plurality of iterations, a signed digit from the plurality of signed digits, and previous intermediate data from a previous iteration. Concatenation circuitry performs a concatenation of bits corresponding to the signed digit and bits of the previous intermediate data to produce updated intermediate data. Output circuitry provides the updated intermediate data as previous intermediate data of a next iteration. The previous intermediate data comprises S3[i] in non-redundant representation, which is at least part of the input value multiplied by 3 in non-redundant representation.

Computing device and method

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

Computing device and method

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

Prepare for shorter precision (round for reround) mode in a decimal floating-point instruction

An instruction is executed in round-for-reround mode wherein the permissible resultant value that is closest to and no greater in magnitude than the infinitely precise result is selected. If the selected value is not exact and the units digit of the selected value is either 0 or 5, then the digit is incremented by one and the selected value is delivered. In all other cases, the selected value is delivered.

Prepare for shorter precision (round for reround) mode in a decimal floating-point instruction

An instruction is executed in round-for-reround mode wherein the permissible resultant value that is closest to and no greater in magnitude than the infinitely precise result is selected. If the selected value is not exact and the units digit of the selected value is either 0 or 5, then the digit is incremented by one and the selected value is delivered. In all other cases, the selected value is delivered.