G06F9/30138

Data Processing Method and Device, and Storage Medium
20230084523 · 2023-03-16 ·

A data processing method and device, and a storage medium are provided. A processor of the data processing device comprises an index register group. Said method comprises: obtaining a first index value of each of at least one index register according to instruction codes, and determining the at least one index register according to the first index value, the instruction codes being generated by a compiler, and the at least one index register being at least one register in the index register group; and acquiring a first content stored in each of the at least one index register, and determining a first vector register according to the first content; and executing the instruction codes by accessing the first vector register.

Processor for avoiding reduced performance using instruction metadata to determine not to maintain a mapping of a logical register to a physical register in a first level register file

A processor includes a first level register file, second level register file, and register file mapper. The first and second level register files are comprised of physical registers, with the first level register file more efficiently accessed relative to the second level register file. The register file mapper is coupled with the first and second level register files. The register file mapper comprises a mapping structure and register file mapper controller. The mapping structure hosts mappings between logical registers and physical registers of the first level register file. The register file mapper controller determines whether to map a destination logical register of an instruction to a physical register in the first level register file. The register file mapper controller also determines, based on metadata associated with the instruction, whether to write data associated with the destination logical register to one of the physical registers of the second level register file.

Virtualized multicore systems with extended instruction heterogeneity
11630798 · 2023-04-18 · ·

A system on a chip may include a plurality of data plane processor cores sharing a common instruction set architecture. At least one of the data plane processor cores is specialized to perform a particular function via extensions to the otherwise common instruction set architecture. Such systems on a chip may have reduced physical complexity, cost, and time-to-market, and may provide improvements in core utilization and reductions in system power consumption.

Virtualization of process address space identifiers for scalable virtualization of input/output devices

Implementations of the disclosure provide a processing device comprising an address translation circuit to intercept a work request from an I/O device. The work request comprises a first ASID to map to a work queue. A second ASID of a host is allocated for the first ASID based on the work queue. The second ASID is allocated to at least one of: an ASID register for a dedicated work queue (DWQ) or an ASID translation table for a shared work queue (SWQ). Responsive to receiving a work submission from the SVM client to the I/O device, the first ASID of the application container is translated to the second ASID of the host machine for submission to the I/O device using at least one of: the ASID register for the DWQ or the ASID translation table for the SWQ based on the work queue associated with the I/O device.

Method of using multidimensional blockification to optimize computer program and device thereof

Disclosed embodiments relate to a method and device for optimizing compilation of source code. The proposed method receives a first intermediate representation code of a source code and analyses each basic block instruction of the plurality of basic block instructions contained in the first intermediate representation code for blockification. In order to blockify the identical instructions, the one or more groups of basic block instructions are assessed for eligibility of blockification. Upon determining as eligible, the group of basic block instructions are blockified using one of one dimensional SIMD vectorization and two-dimensional SIMD vectorization. The method further generates a second intermediate representation of the source code which is translated to executable target code with more efficient processing capacity.

EXTENSION OF REGISTER FILES FOR LOCAL PROCESSING OF DATA IN COMPUTING ENVIRONMENTS
20170371662 · 2017-12-28 ·

A mechanism is described for facilitating extension of register files in computing environments. A method of embodiments, as described herein, includes facilitating, inside an extended register file, performance of one or more tasks relating to an instruction, where the one or more tasks are performed by an extension mechanism being hosted inside the extended register file of a computing device.

Providing code sections for matrix of arithmetic logic units in a processor
11687346 · 2023-06-27 · ·

The present invention relates to a processor having a trace cache and a plurality of ALUs arranged in a matrix, comprising an analyser unit located between the trace cache and the ALUs, wherein the analyser unit analyses the code in the trace cache, detects loops, transforms the code, and issues to the ALUs sections of the code combined to blocks for joint execution for a plurality of clock cycles.

Generation and use of memory access instruction order encodings

Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instruction in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.

Apparatus and method for handling registers in pipeline processing
09841957 · 2017-12-12 · ·

An apparatus stores a program including a description of loop processing of iterating a plurality of instructions, and rearranges an execution sequence of the plurality of instructions in the program such that the loop processing is pipelined by software pipeline. The apparatus inserts an instruction to use a register for single instruction multiple data (SIMD) extension instruction, into the description of the loop processing in the program.

COMPILER MANAGED MEMORY FOR IMAGE PROCESSOR

A method is described. The method includes repeatedly loading a next sheet of image data from a first location of a memory into a two dimensional shift register array. The memory is locally coupled to the two-dimensional shift register array and an execution lane array having a smaller dimension than the two-dimensional shift register array along at least one array axis. The loaded next sheet of image data keeps within an image area of the two-dimensional shift register array. The method also includes repeatedly determining output values for the next sheet of image data through execution of program code instructions along respective lanes of the execution lane array, wherein, a stencil size used in determining the output values encompasses only pixels that reside within the two-dimensional shift register array. The method also includes repeatedly moving a next sheet of image data to be fully loaded into the two dimensional shift register array from a second location of the memory to the first location of the memory.