G06F7/575

BIT STRING ACCUMULATION
20220156046 · 2022-05-19 ·

Systems, apparatuses, and methods related to bit string accumulation are described. A method for bit string accumulation can include performing an iteration of a recursive operation using a first bit string and a second bit string and modifying a quantity of bits of a result of the iteration of the recursive operation, wherein the modified quantity of bits is less than a threshold quantity of bits. The method can further include writing a first value comprising the modified bits indicative of the result of the iteration of the recursive operation to a first register and writing a second value indicative of the factor corresponding to the result of the iteration of the recursive operation to a second register.

METHOD OF ACCELERATING SIMULTANEOUS LOCALIZATION AND MAPPING (SLAM) AND DEVICE USING SAME

A preconditioned conjugate gradient (PCG) solver, embedded in an electronic device to perform a simultaneous localization and mapping (SLAM) operation, includes an image database, a factor graph database, and a back-end processor, wherein the back-end processor is configured to receive an image from the image database to perform re-localization, receive, from the factor graph database, data for calculating six degrees of freedom (DoF)-related components, construct a matrix including the six degrees of freedom-related components based on the received data, and load and rearrange the matrix and a vector, to perform calculation on each block of each row of the matrix and the vector, then output first data, and shift second data to a location of the first data.

ACCELERATED MATHEMATICAL ENGINE
20230305808 · 2023-09-28 ·

Various embodiments of the disclosure relate to an accelerated mathematical engine. In certain embodiments, the accelerated mathematical engine is applied to image processing such that convolution of an image is accelerated by using a two-dimensional matrix processor comprising sub-circuits that include an ALU, output register and shadow register. This architecture supports a clocked, two-dimensional architecture in which image data and weights are multiplied in a synchronized manner to allow a large number of mathematical operations to be performed in parallel.

ACCELERATED MATHEMATICAL ENGINE
20230305808 · 2023-09-28 ·

Various embodiments of the disclosure relate to an accelerated mathematical engine. In certain embodiments, the accelerated mathematical engine is applied to image processing such that convolution of an image is accelerated by using a two-dimensional matrix processor comprising sub-circuits that include an ALU, output register and shadow register. This architecture supports a clocked, two-dimensional architecture in which image data and weights are multiplied in a synchronized manner to allow a large number of mathematical operations to be performed in parallel.

Scheduling tasks using work fullness counter

A method of activating scheduling instructions within a parallel processing unit includes checking if an ALU targeted by a decoded instruction is full by checking a value of an ALU work fullness counter stored in the instruction controller and associated with the targeted ALU. If the targeted ALU is not full, the decoded instruction is sent to the targeted ALU for execution and the ALU work fullness counter associated with the targeted ALU is updated. If, however, the targeted ALU is full, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state. When an ALU changes from being full to not being full, the scheduler is triggered to re-activate an oldest scheduled task waiting for the ALU by removing the oldest scheduled task from the non-active state.

Scheduling tasks using work fullness counter

A method of activating scheduling instructions within a parallel processing unit includes checking if an ALU targeted by a decoded instruction is full by checking a value of an ALU work fullness counter stored in the instruction controller and associated with the targeted ALU. If the targeted ALU is not full, the decoded instruction is sent to the targeted ALU for execution and the ALU work fullness counter associated with the targeted ALU is updated. If, however, the targeted ALU is full, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state. When an ALU changes from being full to not being full, the scheduler is triggered to re-activate an oldest scheduled task waiting for the ALU by removing the oldest scheduled task from the non-active state.

Execution unit for calculations with masked data

According to one embodiment, an execution unit is described, which includes a mask generation circuit configured to generate a mask by multiplying a mask generation vector by blocks of codewords of a plurality of cyclic codes, a masking circuit configured to mask data to be processed by means of the mask, and an arithmetic logic unit configured to process the masked data by means of additions and rotations.

Execution unit for calculations with masked data

According to one embodiment, an execution unit is described, which includes a mask generation circuit configured to generate a mask by multiplying a mask generation vector by blocks of codewords of a plurality of cyclic codes, a masking circuit configured to mask data to be processed by means of the mask, and an arithmetic logic unit configured to process the masked data by means of additions and rotations.

BYPASSING ZERO-VALUE MULTIPLICATIONS IN A HARDWARE MULTIPLIER
20210349694 · 2021-11-11 ·

A device (e.g., integrated circuit chip) includes a first operand register, a second operand register, a multiplication unit, and a hardware logic component. The first operand register is configured to store a first operand value. The second operand register is configured to store a second operand value. The multiplication unit is configured to at least multiply the first operand value with the second operand value. The hardware logic component is configured to detect whether a zero value is provided and in response to a detection that the zero value is being provided: cause an update of at least the first operand register to be disabled, and cause a result of a multiplication of the first operand value with the second operand value to be a zero-value result.

BYPASSING ZERO-VALUE MULTIPLICATIONS IN A HARDWARE MULTIPLIER
20210349694 · 2021-11-11 ·

A device (e.g., integrated circuit chip) includes a first operand register, a second operand register, a multiplication unit, and a hardware logic component. The first operand register is configured to store a first operand value. The second operand register is configured to store a second operand value. The multiplication unit is configured to at least multiply the first operand value with the second operand value. The hardware logic component is configured to detect whether a zero value is provided and in response to a detection that the zero value is being provided: cause an update of at least the first operand register to be disabled, and cause a result of a multiplication of the first operand value with the second operand value to be a zero-value result.