G06F9/30149

METHOD AND APPARATUS TO SORT A VECTOR FOR A BITONIC SORTING ALGORITHM
20230229448 · 2023-07-20 ·

A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.

STATEFUL MICROCODE BRANCHING

Stateful microbranch instructions, including: generating, based on an instruction, a first one or more microinstructions including a stateful microbranch instruction, wherein the stateful microbranch instruction includes: an address of a next instruction after the instruction; a branch target address; one or more microcode attributes; and executing the first one or more microinstructions.

INSTRUCTION PACKING SCHEME FOR VLIW CPU ARCHITECTURE

A processor is provided and includes a core that is configured to perform a decode operation on a multi-instruction packet comprising multiple instructions. The decode operation includes receiving the multi-instruction packet that includes first and second instructions. The first instruction includes a primary portion at a fixed first location and a secondary portion. The second instruction includes a primary portion at a fixed second location between the primary portion of the first instruction and the secondary portion of the first instruction. An operational code portion of the primary portion of each of the first and second instructions is accessed and decoded. An instruction packet including the primary and secondary portions of the first instruction is created, and a second instruction packet including the primary portion of the second instruction is created. The first and second instructions packets are dispatched to respective first and second functional units.

Data processing apparatus having streaming engine with read and read/advance operand coding
11693660 · 2023-07-04 · ·

A streaming engine employed in a digital signal processor specified a fixed data stream. Once started the data stream is read only and cannot be written. Once fetched, the data stream is stored in a first-in-first-out buffer for presentation to functional units in the fixed order. Data use by the functional unit is controlled using the input operand fields of the corresponding instruction. A read only operand coding supplies the data an input of the functional unit. A read/advance operand coding supplies the data and also advances the stream to the next sequential data elements. The read only operand coding permits reuse of data without requiring a register of the register file for temporary storage.

Monolithic vector processor configured to operate on variable length vectors using a vector length register

A computer processor comprising a vector unit is disclosed. The vector unit may comprise a vector register file comprising at least one register to hold a varying number of elements. The vector unit may further comprise a vector length register file comprising at least one register to specify the number of operations of a vector instruction to be performed on the varying number of elements in the at least one register of the vector register file. The computer processor may be implemented as a monolithic integrated circuit.

Systems and methods for controlling machine operations within a multi-dimensional memory space
11526357 · 2022-12-13 · ·

Systems and methods for controlling machine operations are provided. A number of data entries are organized into a stack. Each data entry includes a type, a flag, a length, and a value or pointer entry. For each data entry in the stack, the type of data is determined from the type entry, the presence of an address or value is determined by the respective flag entry, and a length of the address or value is determined from the respective length entry. The data to be utilized or an address for the same at a particular electronic storage area is provided at the respective value or pointer entry, which may be specified by a space definition pushed onto the stack.

Stacked transistors with different gate lengths in different device strata

Disclosed herein are stacked transistors with different gate lengths in different device strata, as well as related methods and devices. In some embodiments, an integrated circuit structure may include stacked strata of transistors, with two different device strata having different gate lengths.

Method and apparatus for executing vector instructions with merging behavior

A processor includes a register file and control logic that detects multiple different sets of sequential zero bits of a register in the register file, wherein each of the multiple different sets has a bit length that corresponds to a partial instruction width and operates at a first partial instruction width or a second partial instruction width with the register file depending on number of sets of zero bits detected in the register. In certain examples, the control logic causes operating at first instruction width that avoids merging of a first bit length of data in the register and operating at the second instruction width that avoids merging of a second bit length of data in the register. In some examples, a register rename map table incudes multiple zero bits that identify the detected multiple different sets of bits of sequential zeros.

APPARATUS AND METHODS EMPLOYING A SHARED READ PORT REGISTER FILE
20230034072 · 2023-02-02 ·

In some implementations, a processor includes a plurality of parallel instruction pipes, a register file includes at least one shared read port configured to be shared across multiple pipes of the plurality of parallel instruction pipes. Control logic controls multiple parallel instruction pipes to read from the at least one shared read port. In certain examples, the at least one shared register file read port is coupled as a single read port for one of the parallel instruction pipes and as a shared register file read port for a plurality of other parallel instruction pipes.

Method and device for managing operation of a computing unit capable of operating with instructions of different sizes

An integrated circuit comprises a processing unit configured for booting up with a set of boot instructions, then for determining the size of the instructions of an application programme and potentially rebooting on its own initiative, while being reconfigured, in order for it to execute the instructions of the application program. Only one boot memory is needed as a consequence.