G06F7/764

Fused multiply add operations using bit masks

Systems and methods of performing fused multiply add (FMA) operations are provided. In one embodiment, the length of the adder used by the FMA operation is less than 3*N, where N is the number of bits in the mantissa term of a floating point number. A mask may be used to perform the addition portion of the FMA operation using the adder. A second mask may be used to denormalize the result of the addition portion of the FMA operation if an underflow occurs.

Processing non-power-of-two work unit in neural processor circuit
12327177 · 2025-06-10

A neural processor includes one or more neural engine circuits for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural engine circuits process the input data having a power-of-two (P2) shape. The neural processor circuit also includes a data processor circuit. The data processor circuit fetches source data having a non-power-of-two (NP2) shape. The source data may correspond to data of a machine learning model. The data processor circuit also reshapes the source data to generate reshaped source data with the P2 shape. The data processor circuit further sends the reshaped source data to the one or more neural engine circuits as the input data for performing convolution operations. In some cases, the data processor circuit may also perform padding on the source data before the source data is reshaped to the P2 shape.
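The abstract does not say how the reshape is carried out, so the following is only a minimal sketch under stated assumptions: the source data is a 2-D list, it is flattened in row-major order, padded with a fill value, and then re-laid out with a power-of-two width and a power-of-two number of rows. The names `next_power_of_two` and `reshape_to_p2` and the `fill` parameter are hypothetical and do not come from the patent.

```python
def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def reshape_to_p2(rows, p2_width, fill=0):
    """Flatten an NP2-shaped 2-D list, pad it, and re-lay it out in a P2 shape.

    Assumption: row-major flattening and trailing padding. The abstract only
    states that padding may happen before the reshape, not how.
    """
    if p2_width < 1 or p2_width & (p2_width - 1):
        raise ValueError("p2_width must be a power of two")
    flat = [value for row in rows for value in row]
    # Rows needed to hold all elements, rounded up to a power of two.
    needed_rows = -(-len(flat) // p2_width)  # ceiling division
    p2_rows = next_power_of_two(max(1, needed_rows))
    # Pad so the data exactly fills the P2-shaped work unit.
    flat += [fill] * (p2_rows * p2_width - len(flat))
    return [flat[i * p2_width:(i + 1) * p2_width] for i in range(p2_rows)]


# A 3x5 (non-power-of-two) source becomes a 4x4 work unit with one pad value.
source = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
reshaped = reshape_to_p2(source, p2_width=4)
```

In this sketch the padding and the reshape are folded into one function for brevity; a real data processor circuit would presumably also track the padded elements so they can be ignored in the output.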

TECHNIQUE FOR GENERATING AN OUTPUT VALUE REPRESENTING A SHIFTED INPUT VALUE
20250190176 · 2025-06-12

A technique is provided for performing a computation equivalent to applying a shift to an input value to generate an output value. Mask generation circuitry is used to generate an N-bit mask in dependence on a provided shift amount indication. N is a number of possible bit positions that a given bit of the input value may be located within the output value after the shift is performed. The mask generation circuitry performs N independent logical operations on bits forming the shift amount indication, each logical operation producing a mask bit value for a corresponding bit position of the N-bit mask, and the N logical operations being arranged such that, for any given shift amount indication, only one bit position in the generated N-bit mask will have its mask bit value indicating a set state. Output value generation circuitry is used to apply the N-bit mask to the given bit of the input value in order to determine a corresponding location of the given bit within the output value, and to determine a location within the output value of each other bit of the input value in dependence on the corresponding location of the given bit.
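The mask-based shift described above can be modelled in software roughly as follows. This is an illustrative sketch, not the claimed circuit: it assumes the shift amount indication is a plain binary number, that each mask bit is an AND over the shift bits or their complements (a decoder), and that the "given bit" is bit 0 of the input. The function names `one_hot_mask` and `shift_left_via_mask` are hypothetical.

```python
def one_hot_mask(shift_bits, n):
    """Build an n-bit one-hot mask from the shift bits (LSB first).

    Each mask bit is computed by its own AND of shift bits or their
    complements, so the n operations do not depend on one another.
    """
    mask = []
    for pos in range(n):
        bit = 1
        for j, s in enumerate(shift_bits):
            wanted = (pos >> j) & 1
            bit &= s if wanted else 1 - s
        mask.append(bit)
    return mask


def shift_left_via_mask(value, shift, n, width):
    """Left-shift `value` by `shift` (0 <= shift < n) using a one-hot mask."""
    if not 0 <= shift < n:
        raise ValueError("shift must be in range(n)")
    k = max(1, (n - 1).bit_length())
    shift_bits = [(shift >> j) & 1 for j in range(k)]
    mask = one_hot_mask(shift_bits, n)
    out = 0
    for pos, m in enumerate(mask):
        # Only the single set mask bit contributes; it fixes where bit 0 of
        # the input lands, and every other bit follows at the same offset.
        out |= (value << pos) * m
    return out & ((1 << width) - 1)


result = shift_left_via_mask(0b1011, shift=3, n=8, width=16)
```

Because exactly one mask bit is set, the OR over all candidate positions selects a single shifted copy of the input, which is why the result matches an ordinary shift.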

PROCESSING NON-POWER-OF-TWO WORK UNIT IN NEURAL PROCESSOR CIRCUIT
20250278617 · 2025-09-04

A neural processor includes one or more neural engine circuits for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural engine circuits process the input data having a power-of-two (P2) shape. The neural processor circuit also includes a data processor circuit. The data processor circuit fetches source data having a non-power-of-two (NP2) shape. The source data may correspond to data of a machine learning model. The data processor circuit also reshapes the source data to generate reshaped source data with the P2 shape. The data processor circuit further sends the reshaped source data to the one or more neural engine circuits as the input data for performing convolution operations. In some cases, the data processor circuit may also perform padding on the source data before the source data is reshaped to the P2 shape.

Character recognition model training method and apparatus, character recognition method and apparatus, device and storage medium

The present disclosure provides a character recognition model training method and apparatus, a character recognition method and apparatus, a device and a medium, relating to the technical field of artificial intelligence, and specifically to the technical fields of deep learning, image processing and computer vision, which can be applied to scenarios such as character detection and recognition technology. The specific implementing solution is: partitioning an untagged training sample into at least two sub-sample images; dividing the at least two sub-sample images into a first training set and a second training set; where the first training set includes a first sub-sample image with a visible attribute, and the second training set includes a second sub-sample image with an invisible attribute; performing self-supervised training on a to-be-trained encoder by taking the second training set as a tag of the first training set, to obtain a target encoder.

Reducing power consumption in integrated circuits

Techniques for replacing input values being loaded into a computational circuit are described. Small input values such as denormal numbers can be replaced with alternative values such as zeros to reduce switching activity in the computational circuit, and thus reduce power consumption. In applications such as most neural networks, the impact on the prediction results when replacing small numbers with zeros can be negligible. In applications where high precision computations may be desirable, the input values can be loaded into the computation circuit without modification.
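As a rough software analogue of the replacement step, the sketch below flushes subnormal (denormal) single-precision inputs to zero unless a high-precision mode is requested. The float32 format, the function names `is_subnormal_f32` and `load_input`, and the `high_precision` flag are assumptions made for illustration; the abstract does not specify a number format or an interface.

```python
import struct


def is_subnormal_f32(x: float) -> bool:
    """Return True if x, encoded as IEEE 754 binary32, is a subnormal number."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    # Subnormals have an all-zero exponent field and a non-zero mantissa.
    return exponent == 0 and mantissa != 0


def load_input(x: float, high_precision: bool = False) -> float:
    """Value that would be loaded into the (hypothetical) computational circuit.

    Subnormal inputs are replaced with zero to cut switching activity,
    unless high_precision asks for the value to pass through unmodified.
    """
    if not high_precision and is_subnormal_f32(x):
        return 0.0
    return x


flushed = load_input(1e-40)                          # subnormal in float32
kept = load_input(1e-40, high_precision=True)        # passed through
normal = load_input(1.0)                             # normal values untouched
```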

Device and methods for functional descriptor-based DMA controller

A microcontroller may include a DMA controller, a pattern matching circuit and a memory. The DMA controller may read a first descriptor word in the memory at a location addressed by a first descriptor pointer, and may move an input word from a location in the memory addressed by a source payload pointer to a location in the memory addressed by a destination payload pointer. The pattern matching circuit may perform a pattern matching operation based on the input word and one or more register values. The first descriptor pointer may be modified based on the results of the pattern matching circuit and may generate a second descriptor pointer value.

Security Device
20260003573 · 2026-01-01

According to various embodiments, a security device is provided comprising a modular reducer configured to perform a modulo reduction by a modulus of each binary number of a sequence of binary numbers forming a data word, wherein each binary number consists of n bits, by one or more first iterations comprising, in reaction to a first detector of the security device detecting that the most significant bit (MSB) of the binary number is set, changing the binary number by deleting its MSB and adding the difference between 2^(n-1) and the modulus to the binary number, followed by one or more second iterations comprising, in reaction to a second detector of the security device detecting that the MSB of the sum of the binary number with the difference between 2^(n-1) and the modulus is set, setting the binary number to that sum, wherein the MSB of the sum is deleted.
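The two kinds of iteration can be sketched in software as below. This is a hedged model, not the patented hardware: it assumes the exponent in the abstract is n-1 (so the MSB of an n-bit number has weight 2^(n-1)), that the modulus is at most 2^(n-1), and that the iterations simply repeat until their condition no longer holds. The function name `reduce_mod` is hypothetical.

```python
def reduce_mod(x: int, modulus: int, n: int) -> int:
    """Reduce an n-bit number modulo `modulus` using only MSB tests and adds."""
    msb = 1 << (n - 1)
    if not 0 < modulus <= msb:
        raise ValueError("this sketch assumes 0 < modulus <= 2**(n-1)")
    if not 0 <= x < (1 << n):
        raise ValueError("x must fit in n bits")
    delta = msb - modulus  # difference between 2^(n-1) and the modulus

    # First iterations: while the MSB is set, delete it and add delta.
    # Deleting the MSB subtracts 2^(n-1), so the net effect is x - modulus.
    while x & msb:
        x = (x ^ msb) + delta

    # Second iterations: the MSB of x + delta is set exactly when
    # x >= modulus; in that case keep the sum with its MSB deleted.
    while True:
        s = x + delta
        if not s & msb:
            break
        x = s ^ msb
    return x


reduced = reduce_mod(250, modulus=97, n=8)
```

Neither loop uses a comparison against the modulus or a subtraction: the MSB of a single addition result carries the decision. Starting from 250 with modulus 97, the first iterations give 153 and then 56, after which the second-iteration test fails and 56 is returned.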

DYNAMIC ELEMENT MATCHING ENCODER PROVIDING A QUASI-CONSTANT NUMBER OF TRANSITIONS AS A FUNCTION OF A CONTROL WORD

A dynamic element matching (DEM) encoder system configured to convert an N-bit control word into a pattern of 1-bit values. The DEM encoder system includes a plurality of bypassable switching blocks comprising an encoder input configured to receive the N-bit control word and a plurality of control outputs configured to provide a plurality of intermediate control values based on the N-bit control word, wherein the plurality of bypassable switching blocks are connected in a series; and a plurality of DEM encoders configured to receive the plurality of intermediate control values and generate a plurality of encoder output values based on the plurality of intermediate control values, wherein each encoder output value is a respective 1-bit value of the pattern of 1-bit values.

Method for calculating a transition from a Boolean masking to an arithmetic masking
12578927 · 2026-03-17

A method is provided for re-masking from a Boolean mask to an arithmetic mask with a modulus (2^m * p), in which m is an integer greater than or equal to zero, and p has at least one prime divisor unequal to 2, so that a carry is generated. The carry is masked or balanced to protect it against intrusion attacks.