G06F9/3013

SAVING AND RESTORING REGISTERS
20230069266 · 2023-03-02 ·

There is provided a data processing apparatus comprising a plurality of registers, each of the registers having data bits to store data and metadata bits to store metadata. Each of the registers is adapted to operate in a metadata mode in which the metadata bits and the data bits are valid, and a data mode in which the data bits are valid and the metadata bits are invalid. Mode bit storage circuitry indicates whether each of the registers is in the data mode or the metadata mode. Execution circuitry is responsive to a memory operation that is a store operation on one or more given registers.

Programmable vision accelerator

In one embodiment of the present invention, a programmable vision accelerator enables applications to collapse multi-dimensional loops into one dimensional loops. In general, configurable components included in the programmable vision accelerator work together to facilitate such loop collapsing. The configurable elements include multi-dimensional address generators, vector units, and load/store units. Each multi-dimensional address generator generates a different address pattern. Each address pattern represents an overall addressing sequence associated with an object accessed within the collapsed loop. The vector units and the load store units provide execution functionality typically associated with multi-dimensional loops based on the address pattern. Advantageously, collapsing multi-dimensional loops in a flexible manner dramatically reduces the overhead associated with implementing a wide range of computer vision algorithms. Consequently, the overall performance of many computer vision applications may be optimized.

Circular buffer accessing device, system and method

A device includes a circular buffer, which, in operation, is organized into a plurality of subsets of buffers, and control circuitry coupled to the circular buffer. The control circuitry, in operation, receives a memory load command to load a set of data into the circular buffer. The memory load command has an offset parameter indicating a data offset and a subset parameter indicating a subset of the plurality of subsets into which the circular buffer is organized. The control circuitry responds to the command by identifying a set of buffer addresses of the circular buffer based on a value of the offset parameter and a value of the subset parameter, and loading the set of data into the circular buffer using the identified set of buffer addresses.

Speculation in memory

The present disclosure is related to performing speculation in, for example, a memory device or a computing system that includes a memory device. Speculation can be used to identify data that is accessed together or to predict data that will be accessed with greater frequency. The identified data can be organized to improve efficiency in providing access to the data.

SYSTEM AND METHOD TO CONTROL THE NUMBER OF ACTIVE VECTOR LANES IN A PROCESSOR
20230161587 · 2023-05-25 ·

In one disclosed embodiment, a processor includes a first execution unit and a second execution unit, a register file, and a data path including a plurality of lanes. The data path and the register file are arranged so that writing to the register file by the first execution unit and by the second execution unit is allowed over the data path, reading from the register file by the first execution unit is allowed over the data path, and reading from the register file by the second execution unit is not allowed over the data path. The processor also includes a power control circuit configured to, when a transfer of data between the register file and either of the first and second execution units uses less than all of the lanes, power down the lanes of the data path not used for the transfer of the data.

Vector table load instruction with address generation field to access table offset value

A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage, and pass vector instructions in the instruction stream to the vector coprocessor core. The vector coprocessor core includes a register file, a plurality of execution units, and a table lookup unit. The register file includes a plurality of registers. The execution units are arranged in parallel to process a plurality of data values. The execution units are coupled to the register file. The table lookup unit is coupled to the register file in parallel with the execution units. The table lookup unit is configured to retrieve table values from one or more lookup tables stored in memory by executing table lookup vector instructions in a table lookup loop.

Vector bit transpose

A method to transpose source data in a processor in response to a vector bit transpose instruction includes specifying, in respective fields of the vector bit transpose instruction, a source register containing the source data and a destination register to store transposed data. The method also includes executing the vector bit transpose instruction by interpreting N×N bits of the source data as a two-dimensional array having N rows and N columns, creating transposed source data by transposing the bits by reversing a row index and a column index for each bit, and storing the transposed source data in the destination register.

Processor comprising a double multiplication and double addition operator actuable by an instruction with three operand references
11604646 · 2023-03-14 · ·

A method of processing data by a processor, the method comprising the steps of: receiving, by the processor, an instruction including an operator code associated with three register references designating registers configured to contain pairs of multiplication operands, an addition operand, and a result register configured to receive an operator result, the operator code designating an operator configured to compute products of the pairs of multiplication operands and add the products with the addition operand; decoding the instruction by an instruction decoder of the processor, to determine the operator to be executed, and the registers containing the operands to be supplied to the operator and the result of the operator; actuating the operator by an arithmetic circuit of the processor, consuming the operands in the registers designated by the register references; and storing the result of the operator in the designated result register.

Vector floating-point classification

A method to classify source data in a processor in response to a vector floating-point classification instruction includes specifying, in respective fields of the vector floating-point classification instruction, a source register containing the source data and a destination register to store classification indications for the source data. The source register includes a plurality of lanes that each contains a floating-point value and the destination register includes a plurality of lanes corresponding to the lanes of the source register. The method further includes executing the vector floating-point classification instruction by, for each lane in the source register, classifying the floating-point value in the lane to identify a type of the floating-point value, and storing a value indicative of the identified type in the corresponding lane of the destination register.

System and method to implement masked vector instructions

A processor includes a register file comprising a length register, a vector register file comprising a plurality of vector registers, a mask register file comprising a plurality of mask registers, and a vector instruction execution circuit to execute a masked vector instruction comprising a first length register identifier representing the length register, a first vector register identifier representing a first vector register of the vector register file, and a first mask register identifier representing a first mask register of the mask register file, wherein the length register is to store a length value representing a number of operations to be applied to data elements stored in the first vector register, the first mask register is to store a plurality of mask bits, and a first mask bit of the plurality of mask bits determines whether a corresponding first one of the operations causes an effect.