IPIQ

G06F2207/5442

Systolic similarity estimation with two-dimensional sliding window

11086817 · 2021-08-10

Intel Corporation

Dan Pritsker

A systolic array implemented in circuitry of an integrated circuit includes a processing element array having processing elements arranged in a vertical direction and a horizontal direction; first loaders communicatively coupled to the processing element array to load samples $A_{m,n}$ from at least one external memory to the processing element array; and second loaders communicatively coupled to the processing element array to load samples $B_{k,l}$ from the at least one external memory to the processing element array. Each row of the samples $A_{m,n}$ is loaded, one row at a time, to a single processing element along the horizontal direction, and each row of the samples $B_{k,l}$ is loaded, one row at a time, to a single processing element along the vertical direction. Pairing the samples $A_{m,n}$ and $B_{k,l}$ in the horizontal and vertical directions enables data reuse, which reduces bandwidth usage of the external memory.
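The abstract does not state which similarity metric the array computes. Assuming sum of absolute differences (SAD), the metric used by the related entries in this list, a minimal software reference model of the two-dimensional sliding-window search looks like the sketch below. For every placement (i, j) of the template B inside the image A, it computes the sum over k, l of |A[i+k][j+l] - B[k][l]|. It does not model the loaders, the processing-element pairing, or the data reuse the patent describes, and the function name `sliding_window_sad` is hypothetical.

```python
def sliding_window_sad(a, b):
    """Reference model (assumed SAD metric): score every K x L window of a against template b.

    a is an M x N list of lists, b is a K x L list of lists.
    Returns scores[i][j] = sum over k, l of |a[i + k][j + l] - b[k][l]|; lower means more similar.
    """
    m, n = len(a), len(a[0])
    k_rows, l_cols = len(b), len(b[0])
    out_rows, out_cols = m - k_rows + 1, n - l_cols + 1
    if out_rows <= 0 or out_cols <= 0:
        raise ValueError("template is larger than the search image")
    scores = [[0] * out_cols for _ in range(out_rows)]
    for i in range(out_rows):            # vertical window position
        for j in range(out_cols):        # horizontal window position
            total = 0
            for k in range(k_rows):
                row_a, row_b = a[i + k], b[k]
                for l in range(l_cols):
                    total += abs(row_a[j + l] - row_b[l])
            scores[i][j] = total
    return scores
```

For example, `sliding_window_sad([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[5, 6], [8, 9]])` returns `[[16, 12], [4, 0]]`; the 0 marks the exact match at window (1, 1). The systolic arrangement in the abstract targets the same kind of result while reusing loaded samples to cut external-memory bandwidth.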

Systems, apparatuses, and methods for performing a double blocked sum of absolute differences

10303471 · 2019-05-28

Intel Corporation

Described are embodiments of systems, apparatuses, and methods for performing, in a computer processor, a vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed SAD instruction. The instruction includes a destination vector register operand, first and second source operands, an immediate, and an opcode.
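The abstract specifies only the instruction's operand format, not its exact computation. As a hypothetical, simplified illustration of how an immediate can steer a blocked SAD, the sketch below splits two 16-byte operands into four 4-byte blocks. Each 2-bit field of the 8-bit immediate selects which block of the second source is paired with the corresponding block of the first source, and each pair yields one SAD. This selection scheme and the name `block_sad_with_select` are assumptions for illustration, not the patented instruction's semantics.

```python
def block_sad_with_select(src1, src2, imm8):
    """Hypothetical blocked SAD: imm8 picks which 4-byte block of src2 pairs with each block of src1.

    src1 and src2 are lists of 16 unsigned bytes; returns four SADs, one per block of src1.
    """
    if len(src1) != 16 or len(src2) != 16:
        raise ValueError("expected two 16-byte operands")
    if any(not 0 <= v <= 0xFF for v in list(src1) + list(src2)):
        raise ValueError("operands must hold unsigned bytes")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    blocks2 = [src2[4 * q:4 * q + 4] for q in range(4)]
    sads = []
    for q in range(4):
        sel = (imm8 >> (2 * q)) & 0b11      # 2-bit field q of the immediate
        block1 = src1[4 * q:4 * q + 4]
        sads.append(sum(abs(x - y) for x, y in zip(block1, blocks2[sel])))
    return sads
```

With `imm8 = 0xE4` (fields 0, 1, 2, 3) each block is paired with its own counterpart; with `imm8 = 0` every block of the first source is compared against block 0 of the second.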

Systolic Similarity Estimation with Two-Dimensional Sliding Window

20190095384 · 2019-03-28

Dan Pritsker

Faster and more efficient different precision sum of absolute differences for dynamically configurable block searches for motion estimation

09788011 · 2017-10-10

Texas Instruments Incorporated

This invention is a digital signal processor that forms plural sums of absolute differences (SAD) in a single operation. An operational unit performing a sum of absolute differences operation comprises two sets of rows, each set containing a plurality of rows and each row producing a SAD output. Plural absolute difference units receive corresponding packed candidate pixel data and packed reference pixel data. A row summer sums the outputs of the absolute difference units in its row. The candidate pixels are offset relative to the reference pixels by one pixel for each succeeding row in a set of rows. The two sets of rows operate on opposite halves of the candidate pixels packed within an instruction-specified operand. The SAD operations can be performed on differing data widths by employing carry chain control in the absolute difference units and the row summers.
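The offset-row arrangement can be modeled in a few lines. In this minimal sketch, row r of a set compares the reference pixels with the candidate pixels shifted by r positions, and that row's summer adds its absolute differences, so one pass yields one SAD per candidate offset. Treating each half of the packed candidate operand as an independent candidate window for the two row sets is an assumption, as are the function names; packing and carry-chain control are not modeled.

```python
def offset_row_sads(reference, candidate, num_rows):
    """One SAD per row; row r compares reference[i] with candidate[i + r]."""
    width = len(reference)
    if num_rows < 1 or len(candidate) < width + num_rows - 1:
        raise ValueError("candidate too short for the requested number of rows")
    sads = []
    for r in range(num_rows):
        # Absolute difference units of row r feed that row's summer.
        sads.append(sum(abs(candidate[i + r] - reference[i]) for i in range(width)))
    return sads


def two_set_sads(reference, candidate_operand, num_rows):
    """Assumed model: the two row sets process opposite halves of the candidate operand."""
    half = len(candidate_operand) // 2
    low, high = candidate_operand[:half], candidate_operand[half:]
    return offset_row_sads(reference, low, num_rows), offset_row_sads(reference, high, num_rows)
```

With reference `[1, 2, 3, 4]` and candidate `[1, 2, 3, 4, 5, 6]`, three rows give `[0, 4, 8]`, so the best match is at offset 0.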

Systems, apparatuses, and methods for performing a double blocked sum of absolute differences

20170242694 · 2017-08-24

Apparatus and method for performing absolute difference operation

09678716 · 2017-06-13

Arm Limited

An apparatus comprises processing circuitry for performing an absolute difference operation that generates an absolute difference value in response to a first operand and a second operand. The processing circuitry supports variable data element sizes for data elements of the first and second operands and the absolute difference value. Each data element of the absolute difference value represents an absolute difference between corresponding data elements of the first and second operands. The processing circuitry has an adding stage for performing at least one addition to generate at least one intermediate value and an inverting stage for inverting selected bits of each intermediate value. Control circuitry generates control information, based on the current data element size and on status information generated in the adding stage, that identifies the bits to be inverted in the inverting stage so as to convert each intermediate value into a corresponding portion of the absolute difference value.
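One well-known way to obtain |a - b| from an addition followed by selective inversion relies on the identity that, for n-bit unsigned a and b, a + ~b equals a - b - 1 modulo 2^n. When a >= b, the carry-in-1 sum a + ~b + 1 is already a - b; when a < b, inverting every bit of a + ~b gives b - a. The carry out of a + ~b + 1 is the status information that selects between the two cases. The sketch below applies this per element for a configurable element size. It illustrates the identity under these assumptions rather than the circuitry claimed in the patent, and the function names are hypothetical.

```python
def abs_diff_element(a: int, b: int, bits: int) -> int:
    """|a - b| for unsigned bits-wide a, b using an add plus selective inversion."""
    mask = (1 << bits) - 1
    not_b = ~b & mask
    t0 = (a + not_b) & mask          # intermediate with carry-in 0: a - b - 1 (mod 2**bits)
    s1 = a + not_b + 1               # sum with carry-in 1: a - b + 2**bits
    carry = s1 >> bits               # status: 1 exactly when a >= b
    t1 = s1 & mask                   # a - b (mod 2**bits)
    # a >= b: t1 is the answer; a < b: inverting all bits of t0 yields b - a.
    return t1 if carry else (~t0 & mask)


def abs_diff_packed(x: int, y: int, width: int, esize: int) -> int:
    """Element-wise |x - y| over a width-bit register split into esize-bit elements."""
    if width % esize:
        raise ValueError("element size must divide the register width")
    mask = (1 << esize) - 1
    out = 0
    for lane in range(width // esize):
        shift = lane * esize
        a = (x >> shift) & mask
        b = (y >> shift) & mask
        out |= abs_diff_element(a, b, esize) << shift
    return out
```

The same 32-bit operands give different results as the element size changes: `abs_diff_packed(0x00FF0501, 0xFF000203, 32, 8)` returns `0xFFFF0302`, while an element size of 16 returns `0xFE0102FE`.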

Faster and more efficient different precision sum of absolute differences for dynamically configurable block searches for motion estimation

20170150175 · 2017-05-25

Near optimal configurable adder tree for arbitrary shaped 2D block sum of absolute differences (SAD) calculation engine

09658829 · 2017-05-23

Intel Corporation

Embodiments of a near optimal configurable adder tree for arbitrary shaped 2D block sum of absolute differences (SAD) calculation engine are generally described herein. Other embodiments may be described and claimed. In some embodiments, a configurable two-dimensional adder tree architecture for computing a sum of absolute differences (SAD) for various block sizes up to 16 by 16 comprises a first stage of one-dimensional adder trees and a second stage of one-dimensional adder trees, wherein each one-dimensional adder tree comprises an input routing network, a plurality of adder units, and an output routing network.
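The two-stage structure can be illustrated with a small behavioral model. Assuming a 16 x 16 region and block dimensions that divide 16, the first stage runs one one-dimensional adder tree per row segment of width `block_w`, and the second stage runs one tree over `block_h` of those row sums for each block. The input and output routing networks are reduced here to list slicing, and the function names are hypothetical.

```python
def adder_tree(values):
    """1D adder tree: add adjacent pairs level by level until one total remains."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])    # an odd leftover passes through to the next level
        level = nxt
    return level[0]


def block_sads(cur, ref, block_w, block_h, size=16):
    """SAD of every block_w x block_h block tiling a size x size region."""
    if size % block_w or size % block_h:
        raise ValueError("block dimensions must divide the region size")
    ad = [[abs(cur[r][c] - ref[r][c]) for c in range(size)] for r in range(size)]
    # Stage 1: one 1D tree per row segment of width block_w.
    row_sums = [[adder_tree(ad[r][c:c + block_w]) for c in range(0, size, block_w)]
                for r in range(size)]
    # Stage 2: one 1D tree per block, over block_h consecutive row sums.
    blocks_x = size // block_w
    return [[adder_tree(row_sums[r][bx] for r in range(by, by + block_h))
             for bx in range(blocks_x)]
            for by in range(0, size, block_h)]
```

Reconfiguring for another block shape changes only how the stage inputs are grouped, which is the role the routing networks play in the abstract.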

Systems, apparatuses, and methods for performing a horizontal add or subtract in response to a single instruction

09619226 · 2017-04-11

Intel Corporation

Embodiments of systems, apparatuses, and methods for performing, in a computer processor, a vector packed horizontal add or subtract of packed data elements in response to a single vector packed horizontal add or subtract instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.
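The abstract does not define which elements are combined. Assuming the common pairwise interpretation, in which each result element is the sum or difference of two adjacent elements of the single source, a minimal sketch follows. Results wrap modulo 2^bits and are returned as unsigned values; the element width, the `op` parameter, and the function name are assumptions.

```python
def horizontal_add_sub(src, op="add", bits=16):
    """Pairwise horizontal add or subtract of adjacent source elements (assumed semantics)."""
    if len(src) % 2:
        raise ValueError("source must hold an even number of elements")
    if op not in ("add", "sub"):
        raise ValueError("op must be 'add' or 'sub'")
    mask = (1 << bits) - 1
    out = []
    for i in range(0, len(src), 2):
        value = src[i] + src[i + 1] if op == "add" else src[i] - src[i + 1]
        out.append(value & mask)      # wrap to the element width
    return out
```

For example, `horizontal_add_sub([1, 2, 3, 4])` returns `[3, 7]`, and `horizontal_add_sub([5, 2, 9, 4], op="sub")` returns `[3, 5]`.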

Floating point execution unit for calculating packed sum of absolute differences

09594556 · 2017-03-14

International Business Machines Corporation

A circuit arrangement and program product provide support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.
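Software cannot show the hardware reuse, but it can state the supported operation precisely. Assuming a 64-bit operand holding eight unsigned bytes, a fixed point packed SAD sums the absolute differences of corresponding bytes, so the result is at most 8 x 255 = 2040. The sketch models only these semantics, not the floating point adders the patent reuses, and the byte layout and function name are assumptions.

```python
def packed_sad_u8(x: int, y: int) -> int:
    """Sum of absolute differences of the eight unsigned bytes packed in 64-bit x and y."""
    total = 0
    for lane in range(8):
        a = (x >> (8 * lane)) & 0xFF    # byte lane of the first operand
        b = (y >> (8 * lane)) & 0xFF    # matching byte lane of the second operand
        total += abs(a - b)
    return total
```

For example, `packed_sad_u8(0x0102030405060708, 0x0807060504030201)` returns 32 (7 + 5 + 3 + 1 + 1 + 3 + 5 + 7).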

Patent classifications

G06F2207/5442