Patent classifications
G06F7/552
RING-LWR-BASED QUANTUM-RESISTANT SIGNATURE METHOD AND SYSTEM THEREOF
According to an embodiment of the present disclosure, a ring-learning with rounding (LWR)-based quantum-resistant signature method includes: a key generation step of receiving a security parameter, and outputting a signature key and a verification key, via an operation on a ring defined by a cyclotomic equation including three terms; a signature value output step of outputting a signature value based on the output signature key; and a signature verification step of calculating an operation value based on the output verification key and signature value, and verifying a signature based on a result of comparing the output signature value with the calculated operation value.
Hybrid accumulation method in multiply-accumulate for machine learning
Methods for performing mixed-mode Multiply-Accumulate (MAC) functions in an integrated circuit (IC) are disclosed. By performing part of the MAC operation spatially and in parallel, and part of it temporally and serially, the number of MAC operations can be programmed in the serial/temporal MAC segment as a multiple of the parallel/spatial MAC segment. Such a trait provides a degree of flexibility in programming the mixed-mode MAC function. A Programmable-Hybrid-Accumulation (PHA) method performs the accumulation function of the MAC IC by transforming the accumulation signal to a hybrid accumulation signal. The hybrid accumulation signal comprises a Most-Significant-Portion (MSP) and a Least-Significant-Portion (LSP), wherein the portions of the hybrid accumulation signal can be programmed in accordance with cost-performance objectives of an end application. Transforming the accumulated signal to a hybrid signal, and utilizing the PHA method, keeps the signal magnitudes bounded, which prevents signal overflow while accumulation cycles proceed. Arranging a mixed-signal MAC in accordance with the PHA method can, among other benefits, help to limit the peak-to-peak analog signal swings, which improves performance attributes such as lower current consumption, faster speed, lower power supply voltage, and a wider signal accumulation range before power supply operating head-room conditions are breached.
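The bounded-magnitude idea in the abstract above can be illustrated with a purely digital analogy. This is a minimal sketch, not the disclosed mixed-signal circuit: the function names `hybrid_accumulate` and `combine`, the `lsp_bits` parameter, and the restriction to non-negative integer products are all assumptions made for illustration.

```python
def hybrid_accumulate(products, lsp_bits):
    """Accumulate products as an (MSP, LSP) pair.

    Hypothetical digital analogy: the LSP never holds more than
    lsp_bits bits after each cycle, and any excess is carried into
    the MSP. Assumes non-negative integer products.
    """
    lsp_limit = 1 << lsp_bits
    msp = 0
    lsp = 0
    for product in products:
        lsp += product
        msp += lsp // lsp_limit   # carry the overflow into the MSP
        lsp %= lsp_limit          # keep the LSP bounded
    return msp, lsp


def combine(msp, lsp, lsp_bits):
    """Reassemble the full accumulated value from its two portions."""
    return (msp << lsp_bits) + lsp


# Four multiply steps, accumulated with a 4-bit LSP (illustrative values).
products = [a * b for a, b in zip([3, 7, 12, 5], [9, 4, 11, 6])]
msp, lsp = hybrid_accumulate(products, lsp_bits=4)
print(msp, lsp, combine(msp, lsp, 4))
```

Choosing a smaller `lsp_bits` keeps the low portion in a narrower range at the cost of more frequent carries, which loosely mirrors the programmable split between the two portions.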
INVERSE ELEMENT OPERATION APPARATUS AND COMPUTER READABLE MEDIUM
An acceptance unit (110) accepts an element $a$. A preliminary operation unit (120) calculates $t_1 = a_0^2$, $t_2 = a_2^2$, $t_3 = a_0 a_1$, $t_4 = a_1 a_2$, and $t_7 = (a_0 + a_1)(a_1 - a_2)$, using $a_0$, $a_1$, and $a_2$. An inverse element operation unit (130) calculates $b_0 = a_0^2 - a_1 a_2 v$, $b_1 = a_2^2 v - a_0 a_1$, and $b_2 = a_1^2 - a_0 a_2$, using $t_1$, $t_2$, $t_3$, $t_4$, and $t_7$. An output unit (140) generates and outputs an inverse element $a^{-1}$, using $b_0$, $b_1$, and $b_2$.
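The computation in the abstract above can be sketched as follows. The abstract does not name the field, so this sketch assumes the element lives in a cubic extension of a prime field with modulus p, where the basis element x satisfies x^3 = v; it also assumes that b_2 is obtained as t_7 - t_3 + t_4 (which follows from expanding t_7) and that the final division uses the norm a_0 b_0 + v(a_1 b_2 + a_2 b_1). The function names and the sample values p = 7, v = 2 are hypothetical.

```python
def inverse_cubic(a, v, p):
    """Invert a = a0 + a1*x + a2*x^2, assuming x^3 = v over the integers mod p."""
    a0, a1, a2 = a
    # Preliminary products named as in the abstract.
    t1 = a0 * a0 % p
    t2 = a2 * a2 % p
    t3 = a0 * a1 % p
    t4 = a1 * a2 % p
    t7 = (a0 + a1) * (a1 - a2) % p
    # b0, b1, b2 expressed through the preliminary products.
    b0 = (t1 - v * t4) % p
    b1 = (v * t2 - t3) % p
    b2 = (t7 - t3 + t4) % p        # equals a1^2 - a0*a2
    # Assumed final step: divide by the norm of a.
    norm = (a0 * b0 + v * (a1 * b2 + a2 * b1)) % p
    norm_inv = pow(norm, -1, p)    # modular inverse, Python 3.8+
    return tuple(b * norm_inv % p for b in (b0, b1, b2))


def multiply_cubic(x, y, v, p):
    """Multiply two elements in the same assumed representation."""
    x0, x1, x2 = x
    y0, y1, y2 = y
    c0 = (x0 * y0 + v * (x1 * y2 + x2 * y1)) % p
    c1 = (x0 * y1 + x1 * y0 + v * x2 * y2) % p
    c2 = (x0 * y2 + x1 * y1 + x2 * y0) % p
    return (c0, c1, c2)


# Illustrative parameters: 2 is not a cube mod 7, so x^3 - 2 is irreducible.
P, V = 7, 2
element = (3, 5, 1)
inverse = inverse_cubic(element, V, P)
print(multiply_cubic(element, inverse, V, P))
```

Under these assumptions the product of an element and its computed inverse is the identity (1, 0, 0), and only five multiplications of coefficients are needed before the multiplications by v.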
BFLOAT16 SQUARE ROOT AND/OR RECIPROCAL SQUARE ROOT INSTRUCTIONS
Techniques for performing square root or reciprocal square root calculations on BF16 data elements in response to an instruction are described. An example of an instruction is one that includes fields for an opcode, an identification of a location of a packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a calculation of a square root value of a BF16 data element in that position and store a result of each square root into a corresponding data element position of the packed data destination operand.
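The per-element behavior described above can be emulated in software. This is a minimal sketch and not the instruction's actual semantics: it assumes a BF16 value is the upper 16 bits of an IEEE 754 single-precision value, handles only non-negative finite inputs, and rounds to nearest even after computing the root in double precision, so its rounding may not match hardware in every case. The names `bf16_to_float`, `float_to_bf16`, and `packed_bf16_sqrt` are hypothetical.

```python
import math
import struct


def bf16_to_float(bits):
    """Widen a 16-bit BF16 pattern to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def float_to_bf16(value):
    """Narrow a float to a BF16 pattern using round-to-nearest-even.

    Assumes a finite value; NaN and infinity handling is omitted.
    """
    bits32 = struct.unpack("<I", struct.pack("<f", value))[0]
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def packed_bf16_sqrt(source):
    """Apply a square root at each data element position of a packed source."""
    return [float_to_bf16(math.sqrt(bf16_to_float(element))) for element in source]


# Packed source holding 4.0, 9.0, and 1.0 as BF16 bit patterns.
packed_source = [0x4080, 0x4110, 0x3F80]
packed_destination = packed_bf16_sqrt(packed_source)
print([hex(element) for element in packed_destination])
```

Each result lands in the same position it was read from, mirroring the source-to-destination element mapping in the abstract.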
Digital approximate multipliers for machine learning and artificial intelligence applications
Digital approximate multipliers (aMULT) utilizing interpolative apparatuses, circuits, and methods are described in this disclosure. The disclosed aMULT interpolative methods can be arranged or programmed to operate asynchronously and/or synchronously. For applications where less precision is acceptable, fewer interpolations can yield less precise multiplication results, while such approximate multiplication can be computed faster and at lower power consumption. Conversely, for applications where higher precision is required, more interpolations can generate more precise multiplication results. As such, by utilizing the disclosed aMULT method, the resolution and precision objectives of an approximate multiplication function can be pre-programmed or adjusted in real time, which enables trading off power consumption against multiplication speed, in addition to enabling the optimization of an approximate multiplier's die size and cost in accordance with cost-performance objectives.
Verification of Hardware Designs to Implement Floating Point Power Functions
A method of exhaustively verifying a property of a hardware design to implement a floating point power function. The method includes formally verifying that the hardware design is recurrent over sets of β input exponents, wherein β is an integer that is a multiple of the reciprocal of the exponent of the power function; and, for each recurrent input range of the hardware design, exhaustively simulating the hardware design over a simulation range to verify that the property is true over the simulation range, wherein the simulation range comprises only β input exponents.
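The recurrence idea can be illustrated with the square root, whose exponent is 1/2, so β = 2: scaling the input by 2^2 scales the result by exactly 2. The sketch below is a software illustration only. It uses `math.sqrt` as a stand-in for a hardware design, a hypothetical 6-bit mantissa grid, and a sampled check where the abstract calls for a formal proof; it assumes `math.sqrt` is correctly rounded, as it is on typical IEEE 754 platforms. All names are assumptions.

```python
import math

BETA = 2            # reciprocal of the exponent 1/2 of the square root
MANTISSA_BITS = 6   # hypothetical small format so the loops stay short


def mantissas():
    """All mantissas in [1, 2) representable with MANTISSA_BITS fraction bits."""
    return [1 + k / (1 << MANTISSA_BITS) for k in range(1 << MANTISSA_BITS)]


def is_recurrent(func, output_scale, exponents):
    """Check func(x * 2**BETA) == output_scale * func(x) on sampled inputs.

    Stand-in for the formal recurrence proof described in the abstract.
    """
    for exponent in exponents:
        for mantissa in mantissas():
            x = math.ldexp(mantissa, exponent)
            if func(math.ldexp(x, BETA)) != output_scale * func(x):
                return False
    return True


def holds_on_base_range(func, prop):
    """Exhaustively simulate only BETA input exponents (0 and 1 here)."""
    return all(
        prop(func(math.ldexp(mantissa, exponent)))
        for exponent in range(BETA)
        for mantissa in mantissas()
    )


recurrent = is_recurrent(math.sqrt, 2.0, range(-10, 10))
in_range = holds_on_base_range(math.sqrt, lambda y: 1.0 <= y < 2.0)
print(recurrent, in_range)
```

Once recurrence is established, a property checked on the two base exponents carries over to every other exponent by rescaling, which is why the simulation range can stay small.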
Cubic root of a galois field element
A method includes receiving a first element of a Galois Field of order $q^m$, where $q$ is a prime number and $m$ is a positive integer. The first element is raised to a predetermined power so as to form a second element $z$, wherein the predetermined power is a function of $q^m$ and an integer $p$, where $p$ is a prime number which divides $q^m - 1$. The second element $z$ is raised to a $p$-th power to form a third element. If the third element equals the first element, the second element multiplied by a $p$-th root of unity raised to a respective power selected from a set of integers between 0 and $p - 1$ is output as at least one root of the first element.
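The steps above can be sketched for the simplest case of a prime field (m = 1). The abstract only says the predetermined power is a function of the field order and p; this sketch assumes one possible choice, the inverse of p modulo (q - 1)/p, which requires that p does not divide (q - 1)/p. The function name `pth_roots` and the sample values q = 13, p = 3 are hypothetical.

```python
def pth_roots(a, q, p):
    """Return all p-th roots of a nonzero a in the integers mod prime q.

    Assumes p is prime, p divides q - 1, and gcd(p, (q - 1) // p) == 1.
    Returns an empty list when a has no p-th root.
    """
    n = (q - 1) // p
    exponent = pow(p, -1, n)        # assumed "predetermined power"
    z = pow(a, exponent, q)         # second element
    if pow(z, p, q) != a % q:       # third element must equal the first
        return []
    # A p-th root of unity: any g**n that differs from 1 has order p.
    omega = next(w for w in (pow(g, n, q) for g in range(2, q)) if w != 1)
    return sorted(z * pow(omega, k, q) % q for k in range(p))


# Cube roots in the field of 13 elements (illustrative values).
print(pth_roots(8, 13, 3))
print(pth_roots(2, 13, 3))
```

With this exponent choice, raising z to the p-th power gives back the input exactly when the input is a p-th power, so the comparison in the abstract doubles as a test for whether any root exists.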
Evaluating Polynomials in Hardware Logic
An accurate implementation of a polynomial using floating-point or other rounded arithmetic can be generated using a plurality of hardware logic components which each implement an input polynomial such that the zeros in the input polynomial can be determined correctly. The number of different hardware logic components that are used can be reduced by analysing the set of input polynomials and from it generating a set of polynomial components, where each polynomial in the set of input polynomials which is not also in the set of polynomial components, can be generated from a single one of the polynomial components.