Patent classifications
G06F9/3818
Method and apparatus for vector based finite impulse response (FIR) filtering
A method is provided that includes performing, by a processor in response to a vector finite impulse response (VFIR) filter instruction, generation of a plurality of filter outputs using a plurality of coefficients and a plurality of sequential data elements, where the coefficients are specified by a coefficient operand of the VFIR filter instruction and the sequential data elements are specified by a data operand of the VFIR filter instruction, and storing the filter outputs in a storage location specified by the VFIR filter instruction.
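As a rough illustration of what one VFIR instruction computes, the Python sketch below produces several FIR outputs from one coefficient vector and a run of sequential data elements. The function name `vfir`, the `num_outputs` parameter, and the indexing convention (output i is the sum of coeffs[k] * data[i + k]) are assumptions for illustration; the abstract does not specify coefficient ordering, element widths, or rounding.

```python
from typing import List, Sequence


def vfir(coeffs: Sequence[int], data: Sequence[int], num_outputs: int) -> List[int]:
    """Hypothetical model: compute num_outputs FIR outputs from sequential data.

    Output i is sum(coeffs[k] * data[i + k]) over all taps k, so the data
    operand must hold num_outputs + len(coeffs) - 1 elements.
    """
    taps = len(coeffs)
    if len(data) < num_outputs + taps - 1:
        raise ValueError("data operand is too short for the requested outputs")
    outputs = []
    for i in range(num_outputs):
        # Each output applies the same coefficients to a window shifted by one element.
        acc = 0
        for k in range(taps):
            acc += coeffs[k] * data[i + k]
        outputs.append(acc)
    return outputs
```

For example, `vfir([1, -1], [1, 2, 4, 8], 3)` applies a first-difference filter and returns `[-1, -2, -4]`. Passing the coefficients in reverse order gives the convolution-style form instead.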
Protecting against out-of-bounds buffer references
In an approach to protecting against out-of-bounds buffer references, an apparatus comprises one or more processor cores and, in each processor core, a bounds-checking functional unit configured to manage bounds information for one or more memory buffers. When a buffer is allocated, the address range of the buffer is stored. When a pointer is assigned an address within the address range of the buffer, that address range is associated with the pointer. When the pointer is used to compute an address for an operation, it is determined whether the address for the operation is within the address range associated with the pointer. If it is not, an error is signaled.
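The four steps (record on allocation, associate on assignment, check on address computation, signal on violation) can be sketched as a small software model. The class and method names, the half-open `[base, limit)` ranges, identifying pointers by string names, and leaving a pointer unchecked when its address lies in no known buffer are all assumptions made for illustration, not details from the abstract.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # half-open address range [base, limit)


class BoundsError(Exception):
    """Signals that a computed address fell outside the pointer's buffer."""


class BoundsCheckUnit:
    """Hypothetical software model of a per-core bounds-checking unit."""

    def __init__(self) -> None:
        self.buffers: List[Range] = []
        # pointer name -> (current address, associated buffer range or None)
        self.pointers: Dict[str, Tuple[int, Optional[Range]]] = {}

    def on_allocate(self, base: int, size: int) -> Range:
        # Step 1: record the address range of a newly allocated buffer.
        rng = (base, base + size)
        self.buffers.append(rng)
        return rng

    def on_assign(self, ptr: str, address: int) -> None:
        # Step 2: associate the pointer with the buffer whose range holds the address.
        bounds = next((r for r in self.buffers if r[0] <= address < r[1]), None)
        self.pointers[ptr] = (address, bounds)

    def compute_address(self, ptr: str, offset: int) -> int:
        # Step 3: check the address computed from the pointer against its bounds.
        address, bounds = self.pointers[ptr]
        target = address + offset
        if bounds is not None and not (bounds[0] <= target < bounds[1]):
            # Step 4: signal an error for the out-of-bounds access.
            raise BoundsError(f"{ptr}{offset:+d} -> {target:#x} outside [{bounds[0]:#x}, {bounds[1]:#x})")
        return target
```

With a 16-byte buffer at `0x100` and pointer `p` assigned `0x104`, `compute_address("p", 11)` returns `0x10f`, while an offset of 12 reaches the limit `0x110` and raises `BoundsError`.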
INSTRUCTION DECODE CLUSTER OFFLINING
An embodiment of an integrated circuit may comprise a core and an instruction decoder communicatively coupled to the core to decode one or more instructions for execution by the core, where the instruction decoder includes two or more decode clusters in a parallel arrangement, and circuitry to offline a decode cluster of the two or more decode clusters. Other embodiments are disclosed and claimed.
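The abstract does not say how work is distributed among decode clusters, so the sketch below assumes a simple round-robin steering policy purely to show the effect of offlining: an offlined cluster is skipped and decoding continues on the remaining clusters. The class `DecodeUnit`, the round-robin policy, and the rule that the last online cluster cannot be offlined are hypothetical.

```python
from typing import List


class DecodeUnit:
    """Hypothetical model: parallel decode clusters with per-cluster offlining."""

    def __init__(self, num_clusters: int) -> None:
        if num_clusters < 2:
            raise ValueError("the decoder has two or more clusters")
        self.online = [True] * num_clusters
        self._next = 0  # round-robin pointer (assumed steering policy)

    def offline(self, cluster: int) -> None:
        # Keep at least one cluster online so the core can still decode.
        if self.online[cluster] and sum(self.online) == 1:
            raise RuntimeError("cannot offline the last online cluster")
        self.online[cluster] = False

    def steer(self, num_blocks: int) -> List[int]:
        """Return the cluster chosen for each of num_blocks instruction blocks."""
        n = len(self.online)
        chosen = []
        for _ in range(num_blocks):
            # Skip offlined clusters; at least one is always online.
            while not self.online[self._next]:
                self._next = (self._next + 1) % n
            chosen.append(self._next)
            self._next = (self._next + 1) % n
        return chosen
```

With three clusters, four blocks are steered to clusters `[0, 1, 2, 0]`; after `offline(1)`, the same request yields `[0, 2, 0, 2]`.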
Tracking streaming engine vector predicates to control processor execution
In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
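A minimal sketch of this loop structure, assuming a hypothetical streaming engine that keeps delivering fixed-width vectors after the data runs out, with every lane of those trailing vectors marked invalid. The names `stream_vectors` and `predicated_sum` and the summing loop body are illustrative only. What the sketch shows is that loop control comes from the vector predicate rather than from a separately maintained iteration count.

```python
from typing import Iterator, List, Sequence, Tuple


def stream_vectors(data: Sequence[int], width: int) -> Iterator[Tuple[List[int], List[bool]]]:
    """Hypothetical streaming engine: yields (vector, predicate) pairs indefinitely.

    Lanes past the end of the data are zero-filled and marked invalid.
    """
    pos = 0
    while True:
        vector, predicate = [], []
        for lane in range(width):
            valid = pos + lane < len(data)
            vector.append(data[pos + lane] if valid else 0)
            predicate.append(valid)
        yield vector, predicate
        pos += width


def predicated_sum(data: Sequence[int], width: int = 4) -> int:
    stream = stream_vectors(data, width)
    total = 0
    while True:
        vector, predicate = next(stream)  # current data vector and its predicate
        if not any(predicate):
            break  # no valid elements: exit the loop
        # At least one valid element: process the valid lanes, then repeat.
        total += sum(v for v, ok in zip(vector, predicate) if ok)
    return total
```

A partially filled final vector (for example, 5 elements with width 4) is still processed, because its predicate has at least one valid lane; the loop exits only on the next, fully invalid vector.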
Object-oriented memory client
A hardware client and corresponding method employ an object-oriented memory device. The hardware client generates an object-oriented message associated with an object of an object class, where the object class includes at least one data member and at least one method. The hardware client transmits the generated object-oriented message to the object-oriented memory device via a hardware communications interface that couples the hardware client to the object-oriented memory device. The object is, or is to be, instantiated in at least one physical memory of the object-oriented memory device according to the object class. The at least one method enables the object-oriented memory device to access the at least one data member on behalf of the hardware client.
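To make the message flow concrete, here is a small software model in which a client sends a message naming an object and one of its class's methods, and the memory device runs that method against the object's data members on the client's behalf. `OOMessage`, `CounterClass`, `OOMemoryDevice`, and `HardwareClient` are hypothetical names, and a direct method call stands in for the hardware communications interface.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class OOMessage:
    """Hypothetical object-oriented message: target object, method, arguments."""
    object_id: int
    method: str
    args: Tuple[Any, ...] = ()


class CounterClass:
    """Hypothetical object class with one data member and two methods."""

    @staticmethod
    def new() -> Dict[str, Any]:
        return {"count": 0}  # the data member

    @staticmethod
    def increment(members: Dict[str, Any], amount: int) -> int:
        members["count"] += amount
        return members["count"]

    @staticmethod
    def read(members: Dict[str, Any]) -> int:
        return members["count"]


class OOMemoryDevice:
    def __init__(self) -> None:
        # object_id -> (object class, data members held in device memory)
        self._objects: Dict[int, Tuple[Any, Dict[str, Any]]] = {}

    def instantiate(self, object_id: int, cls: Any) -> None:
        self._objects[object_id] = (cls, cls.new())

    def handle(self, msg: OOMessage) -> Any:
        # The device, not the client, executes the method on the data members.
        cls, members = self._objects[msg.object_id]
        return getattr(cls, msg.method)(members, *msg.args)


class HardwareClient:
    def __init__(self, device: OOMemoryDevice) -> None:
        self._link = device  # stands in for the hardware communications interface

    def send(self, msg: OOMessage) -> Any:
        return self._link.handle(msg)
```

The design point being illustrated is that the client never touches `members` directly; it only names a method, and the device performs the data-member access.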
Systems, apparatuses, and methods for fused multiply add
Embodiments of systems, apparatuses, and methods for fused multiply add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for first, second, and third packed data source operands, wherein the packed data elements of the first and second packed data source operands are of a first size that differs from a second size of the packed data elements of the third packed data source operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand: a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source; an addition of the results of these multiplications to the full-sized packed data element at that packed data element position of the third packed data source; and storage of the addition result in the packed data element position of the destination operand corresponding to that packed data element position of the third packed data source, where M is equal to the full-sized packed data element size divided by N.
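The per-position arithmetic can be sketched with plain integers. With 32-bit accumulators in the third source and 8-bit elements in the first two, M = 32 / 8 = 4, so each destination element receives its accumulator plus four products. The function name and parameters are hypothetical, element values are ordinary Python integers, and overflow, saturation, and signedness handling are not modeled.

```python
from typing import List, Sequence


def mixed_size_fma(src1: Sequence[int], src2: Sequence[int], src3: Sequence[int],
                   n_bits: int, full_bits: int) -> List[int]:
    """Hypothetical model of the mixed-size packed multiply-add.

    src1 and src2 hold N-bit elements; src3 holds full-size accumulators.
    Destination element i = src3[i] + the sum of the M products that line up
    with position i, where M = full_bits // n_bits.
    """
    if full_bits % n_bits != 0:
        raise ValueError("full-size element must be a multiple of N")
    m = full_bits // n_bits
    if not (len(src1) == len(src2) == m * len(src3)):
        raise ValueError("sources 1 and 2 must hold M elements per accumulator")
    dest = []
    for i, acc in enumerate(src3):
        group = range(i * m, (i + 1) * m)  # the M narrow elements under position i
        products = sum(src1[j] * src2[j] for j in group)
        dest.append(acc + products)  # overflow/saturation is not modeled
    return dest
```

For example, `mixed_size_fma([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, [100, 200], 8, 32)` returns `[110, 226]`.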
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT
Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating-point operations, and the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format, with a multiplier that multiplies the second and third source operands while an accumulator adds the first source operand to the output of the multiplier.
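The arithmetic of a BF16 dot-product-accumulate can be sketched in Python. Inputs are rounded to bfloat16 (the upper 16 bits of an IEEE-754 float32, rounded to nearest even), multiplied, and added to an accumulator. Accumulating in float32 and rounding after every step are assumptions; the abstract does not state the accumulator precision or rounding mode. NaN inputs are not handled.

```python
import struct
from typing import Sequence


def to_f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to bfloat16: keep the top 16 bits of the float32 pattern, nearest-even.

    NaN inputs are not handled specially in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits = ((bits >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_dot_accumulate(acc: float, a: Sequence[float], b: Sequence[float]) -> float:
    """acc + sum(a[i] * b[i]) with BF16 inputs and an assumed float32 accumulator."""
    total = to_f32(acc)
    for x, y in zip(a, b):
        # Exact in double precision: two 8-bit significands give a 16-bit product.
        product = to_bf16(x) * to_bf16(y)
        total = to_f32(total + product)
    return total
```

Because bfloat16 keeps only 7 stored fraction bits, `to_bf16(1 + 2**-10)` rounds back to `1.0`, while `bf16_dot_accumulate(1.0, [1.5, 2.0], [2.0, 0.25])` is exact and returns `4.5`.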
Method and apparatus for permuting streamed data elements
A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.
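One way to picture the permute network is as a per-lane selection: each vector lane picks which streamed element it receives. The `lane_map` control format, the `fill` value for lanes with no source, and the deinterleave pattern (even-indexed elements to the lower half of the vector, odd-indexed to the upper half, as might be used for interleaved real/imaginary data) are illustrative assumptions, not the patent's control encoding.

```python
from typing import List, Optional, Sequence


def permute(elements: Sequence[int], lane_map: Sequence[Optional[int]], fill: int = 0) -> List[int]:
    """Place streamed elements into vector lanes.

    lane_map[lane] is the index of the streamed element routed to that lane,
    or None to fill the lane with `fill`.
    """
    return [elements[src] if src is not None else fill for src in lane_map]


def deinterleave_map(width: int) -> List[int]:
    """Even-indexed elements to the lower half of the vector, odd to the upper half."""
    if width % 2:
        raise ValueError("width must be even")
    half = width // 2
    return [2 * i for i in range(half)] + [2 * i + 1 for i in range(half)]
```

For example, `permute([10, 11, 12, 13], deinterleave_map(4))` returns `[10, 12, 11, 13]`.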
Hardware channel-parallel data compression/decompression
A multichannel data packer includes a plurality of two-input multiplexers and a controller. The plurality of two-input multiplexers is arranged in 2^N rows and N columns, in which N is an integer greater than 1. Each input of a multiplexer in a first column receives a respective bit stream of 2^N channels of bit streams. Each respective bit stream has a bit-stream length based on the data in the bit stream. The multiplexers in a last column output 2^N channels of packed bit streams, each having the same bit-stream length. The controller controls the plurality of multiplexers so that the multiplexers in the last column output the 2^N channels of bit streams that each have the same bit-stream length.
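The abstract specifies the packer's contract (2^N variable-length bit streams in, 2^N equal-length packed streams out) but not the controller's multiplexer schedule. The sketch below therefore models only that input/output behavior, not the N-column multiplexer network itself: it concatenates the streams in channel order, zero-pads to a multiple of 2^N bits, and splits the result evenly. The function name, the concatenation order, and the zero padding are assumptions.

```python
from typing import List


def pack_channels(streams: List[str], n: int) -> List[str]:
    """Functional model: 2**n variable-length bit strings -> 2**n equal-length strings."""
    if n < 2:
        raise ValueError("N must be an integer greater than 1")
    channels = 1 << n
    if len(streams) != channels:
        raise ValueError(f"expected {channels} bit streams")
    joined = "".join(streams)
    per_channel = -(-len(joined) // channels)  # ceiling division
    joined = joined.ljust(per_channel * channels, "0")  # assumed zero padding
    return [joined[i * per_channel:(i + 1) * per_channel] for i in range(channels)]
```

For N = 2, the streams `["1", "101", "", "11"]` (6 bits in total) pack into four 2-bit channels: `["11", "01", "11", "00"]`.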
Method and apparatus for implied bit handling in floating point multiplication
A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein determination of values of implied bits of leading bit encoded mantissas of the floating point numbers is performed in parallel with multiplication of the encoded mantissas, and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.
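Why the implied bits can be resolved in parallel is clearest at the integer level. Writing each significand as i·2^p + f, where i is the implied bit and f the p stored fraction bits, the product expands to i_a·i_b·2^(2p) + (i_a·f_b + i_b·f_a)·2^p + f_a·f_b. The f_a·f_b term does not depend on the implied bits, so a multiplier can start on it while the implied bits are still being determined, and the remaining terms are added once they are known. The sketch assumes an IEEE-754-style rule (the implied bit is 1 for a nonzero exponent field and 0 for zero or subnormal values); the patent's leading-bit encoding may differ, and Python of course evaluates the two paths one after the other.

```python
def implied_bit(exponent_field: int) -> int:
    """Assumed IEEE-754-style rule: normal numbers carry an implied leading 1."""
    return 1 if exponent_field != 0 else 0


def significand_product(exp_a: int, frac_a: int, exp_b: int, frac_b: int, p: int) -> int:
    """Product of two significands (i * 2**p + frac), each with p fraction bits."""
    # Path 1: multiply the stored fraction bits; needs no implied-bit information.
    partial = frac_a * frac_b
    # Path 2 (concurrent in hardware): determine implied bits from the exponent fields.
    ia, ib = implied_bit(exp_a), implied_bit(exp_b)
    # Combine: add the terms contributed by the implied bits.
    correction = ((ia * ib) << (2 * p)) + ((ia * frac_b + ib * frac_a) << p)
    return partial + correction
```

With p = 3, a normal operand with fraction 5 (significand 13) times a subnormal operand with fraction 3 (significand 3) gives 15 from the fraction path plus a correction of 24, for the full product 39.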