IPIQ

G06F9/3818

Method and apparatus for vector based finite impulse response (FIR) filtering

11681526 · 2023-06-20 ·

Texas Instmments Incorporated

A method is provided that includes performing, by a processor in response to a vector finite impulse response (VFIR) filter instruction, generating of a plurality of filter outputs using a plurality of coefficients and a plurality of sequential data elements, the plurality of coefficients specified by a coefficient operand of the VFIR filter instruction and the plurality of sequential data elements specified by a data operand of the VFIR filter instruction, and storing the filter outputs in a storage location specified by the VFIR filter instruction.

Protecting against out-of-bounds buffer references

11681806 · 2023-06-20 ·

International Business Machines Corporation

In an approach to protecting against out-of-bounds buffer references, an apparatus comprises one or more processor cores and a bounds-checking functional unit in each processor core configured to manage bounds information for one or more memory buffers. When a buffer is allocated, an address range of the buffer is stored. When a pointer is assigned an address within the address range of the buffer, the address range of the buffer is associated with the pointer. When the pointer is used to compute an address for an operation, whether the address for the operation is within the address range associated with the pointer is determined. If the address is not within the address range associated with the pointer, signaling that an error has occurred.

INSTRUCTION DECODE CLUSTER OFFLINING

20230185572 · 2023-06-15 ·

Intel Corporation

An embodiment of an integrated circuit may comprise a core and an instruction decoder communicatively coupled to the core to decode one or more instructions for execution by the core, where the instruction decoder includes two or more decode clusters in a parallel arrangement, and circuitry to offline a decode cluster of the two or more decode clusters. Other embodiments are disclosed and claimed.

Tracking streaming engine vector predicates to control processor execution

11507520 · 2022-11-22 ·

Texas Instruments Incorporated

In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.

Object-oriented memory client

11675623 · 2023-06-13 ·

Marvell Asia Pte, Ltd.

Nathan Chrisman

A hardware client and corresponding method employ an object-oriented memory device. The hardware client generates an object-oriented message associated with an object of an object class. The object class includes at least one data member and at least one method. The hardware client transmits the object-oriented message generated to the object-oriented memory device via a hardware communications interface. The hardware communications interface couples the hardware client to the object-oriented memory device. The object is instantiated or to-be instantiated in at least one physical memory of the object-oriented memory device according to the object class. The at least one method enables the object-oriented memory device to access the at least one data member for the hardware client.

Systems, apparatuses, and methods for fused multiply add

11507369 · 2022-11-22 ·

Intel Corporation

Embodiments of systems, apparatuses, and methods for fused multiple add. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of a M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, add of results from these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element divided by N.

GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT

20220365901 · 2022-11-17 ·

Intel Corporation

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.

Method and apparatus for permuting streamed data elements

11669463 · 2023-06-06 ·

Texas Instruments Incorporated

A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.

Hardware channel-parallel data compression/decompression

11671111 · 2023-06-06 ·

Samsung Electronics Co., Ltd.

A multichannel data packer includes a plurality of two-input multiplexers and a controller. The plurality of two-input multiplexers is arranged in 2.sup.N rows and N columns in which N is an integer greater than 1. Each input of a multiplexer in a first column receives a respective bit stream of 2.sup.N channels of bit streams. Each respective bit stream includes a bit-stream length based on data in the bit stream. The multiplexers in a last column output 2.sup.N channels of packed bit streams each having a same bit-stream length. The controller controls the plurality of multiplexers so that the multiplexers in the last column output the 2.sup.N channels of bit streams that each has the same bit-stream length.

Method and apparatus for implied bit handling in floating point multiplication

11500631 · 2022-11-15 ·

Texas Instruments Incorporated

A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein determination of values of implied bits of leading bit encoded mantissas of the floating point numbers is performed in parallel with multiplication of the encoded mantissas, and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.

Patent classifications

G06F9/3818