G06F9/462

Mobile device virtualization solution based on bare-metal hypervisor with optimal resource usage and power consumption
20200110459 · 2020-04-09

The invention provides multiple secure virtualized environments operating in parallel with optimal resource usage, power consumption and performance. The invention provides a method whereby virtual machines (VMs) have direct access to the computing system's hardware without adding traditional virtualization layers, while the hypervisor maintains hardware-enforced isolation between VMs, preventing cross-contamination. Additionally, some of the VMs can be deactivated and reactivated dynamically as needed, conserving the computing system's resources. As a result, the invention provides the isolation and security of a bare-metal hypervisor without the limitations that make such hypervisors impractical, inefficient and inconvenient for mobile devices with limited CPU and battery capacity.
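The dynamic deactivation described above can be sketched as a toy resource model (VM names, shares, and the resource accounting are illustrative assumptions, not the patented mechanism):

```python
# Toy model: deactivating an idle VM releases its CPU share while its state
# is retained, so it can be reactivated later. All names/shares are assumed.

class Hypervisor:
    def __init__(self, total_cpu):
        self.total_cpu = total_cpu
        self.vms = {}   # name -> {"cpu": share, "active": bool}

    def create_vm(self, name, cpu):
        self.vms[name] = {"cpu": cpu, "active": True}

    def deactivate(self, name):
        # deactivated VM keeps its state but releases its CPU share
        self.vms[name]["active"] = False

    def reactivate(self, name):
        self.vms[name]["active"] = True

    def cpu_in_use(self):
        return sum(v["cpu"] for v in self.vms.values() if v["active"])

hv = Hypervisor(total_cpu=8)
hv.create_vm("personal", cpu=4)
hv.create_vm("work", cpu=4)
assert hv.cpu_in_use() == 8
hv.deactivate("work")          # idle environment released on demand
assert hv.cpu_in_use() == 4
hv.reactivate("work")
assert hv.cpu_in_use() == 8
```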

Apparatus and method for controlling instruction execution behaviour
10613865 · 2020-04-07

An apparatus and method are provided for controlling instruction execution behaviour. The apparatus includes a set of data registers for storing data values, and a set of bounded pointer storage elements, where each bounded pointer storage element stores a pointer with associated range information indicating the allowable range of addresses when using that pointer. A control storage element stores a current instruction context, which influences the behaviour of at least one instruction executed by the processing circuitry. That instruction specifies a pointer reference for a required pointer, where the pointer reference lies within at least a first subset of values (in one embodiment the behaviour is influenced irrespective of the value of the required pointer). In particular, when the current instruction context identifies a default state, the processing circuitry uses the pointer reference to identify one of the data registers, whose stored data value forms the required pointer. However, when the current instruction context identifies a bounded pointer state, the processing circuitry instead uses the pointer reference to identify one of the bounded pointer storage elements, whose stored pointer forms the required pointer. This allows a single instruction set to serve both bounded-pointer-aware code and bounded-pointer-unaware code without significantly increasing the pressure on instruction set encoding space.
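The context-dependent dispatch can be illustrated with a small simulation (register file sizes, names, and the range check are assumptions; the abstract does not specify an encoding):

```python
# Sketch of context-dependent pointer-reference resolution: the same pointer
# reference names a data register in the default state and a bounded pointer
# storage element in the bounded pointer state. Illustrative only.

DEFAULT, BOUNDED = "default", "bounded"

class Machine:
    def __init__(self):
        self.data_regs = [0] * 16          # ordinary data registers
        # bounded pointer storage elements: (pointer, lower bound, upper bound)
        self.cap_regs = [(0, 0, 0)] * 16
        self.context = DEFAULT             # current instruction context

    def resolve_pointer(self, ref):
        """Resolve a pointer reference according to the current context."""
        if self.context == DEFAULT:
            # default state: the stored data value forms the required pointer
            return self.data_regs[ref]
        # bounded pointer state: same reference selects a bounded pointer,
        # checked against its associated range information
        ptr, lo, hi = self.cap_regs[ref]
        if not (lo <= ptr < hi):
            raise MemoryError("pointer outside allowable range")
        return ptr

m = Machine()
m.data_regs[3] = 0x1000
assert m.resolve_pointer(3) == 0x1000      # default: plain data register
m.context = BOUNDED
m.cap_regs[3] = (0x2000, 0x2000, 0x3000)
assert m.resolve_pointer(3) == 0x2000      # bounded: checked pointer element
```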

Synchronization in a Multi-Tile Processing Arrangement
20200089499 · 2020-03-19

A processing system comprising multiple tiles and an interconnect between the tiles. The interconnect is used to communicate between a group of some or all of the tiles according to a bulk synchronous parallel scheme, whereby each tile in the group performs an on-tile compute phase followed by an inter-tile exchange phase, with the exchange phase held back until all tiles in the group have completed the compute phase. Each tile in the group has a local exit state upon completion of the compute phase. The instruction set comprises a synchronization instruction, executed by each tile upon completion of its compute phase, to signal a sync request to logic in the interconnect. In response to receiving the sync request from all the tiles in the group, the logic releases the next exchange phase and also makes available an aggregated state of all the tiles in the group.
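As a rough software analogy (threads standing in for tiles, a barrier standing in for the interconnect logic; all names are illustrative), the compute/exchange phasing and exit-state aggregation might look like:

```python
# Software analogy of the BSP sync: the barrier's release plays the role of
# the interconnect logic releasing the exchange phase, and local exit states
# are AND-aggregated across the group once every "tile" has signalled sync.
import threading

class TileGroup:
    def __init__(self, n):
        self.exit_states = [None] * n
        self.aggregate = None
        # barrier action runs once all tiles in the group have synced
        self.barrier = threading.Barrier(n, action=self._aggregate)

    def _aggregate(self):
        # interconnect logic: aggregate all local exit states (here: AND)
        self.aggregate = all(self.exit_states)

    def run_tile(self, idx, compute):
        self.exit_states[idx] = compute()   # on-tile compute phase
        self.barrier.wait()                 # sync instruction: request + wait
        # exchange phase would begin here; the aggregate is now visible

group = TileGroup(4)
threads = [threading.Thread(target=group.run_tile,
                            args=(i, lambda i=i: i != 2))  # tile 2 exits False
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
assert group.aggregate is False   # aggregated exit state of the whole group
```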

Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer
10585670 · 2020-03-10

A processor architecture includes a register file hierarchy to implement virtual registers that provide a larger set of registers than those directly supported by an instruction set architecture to facilitate multiple copies of the same architecture register for different processing threads, where the register file hierarchy includes a plurality of hierarchy levels. The processor architecture further includes a plurality of execution units coupled to the register file hierarchy.
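A toy model of such a hierarchy (the number of levels, capacities, and spill policy are assumptions) maps each (thread, architectural register) pair onto a larger pool of storage, so different threads keep private copies of the same architectural register:

```python
# Two-level sketch of a register file hierarchy: a small fast level backed by
# a larger level. Each (thread, arch register) pair is a distinct virtual
# register, giving more registers than the ISA exposes directly.

class RegisterFileHierarchy:
    def __init__(self, fast_capacity=4):
        self.fast = {}                 # (thread, reg) -> value, small level
        self.backing = {}              # larger, slower level
        self.fast_capacity = fast_capacity

    def write(self, tid, reg, value):
        key = (tid, reg)
        if key not in self.fast and len(self.fast) >= self.fast_capacity:
            # fast level full: spill an entry down (policy is assumed)
            old_key, old_val = self.fast.popitem()
            self.backing[old_key] = old_val
        self.fast[key] = value

    def read(self, tid, reg):
        key = (tid, reg)
        return self.fast[key] if key in self.fast else self.backing[key]

rf = RegisterFileHierarchy(fast_capacity=2)
rf.write(0, 5, 111)       # thread 0's copy of architectural register r5
rf.write(1, 5, 222)       # thread 1's private copy of the same register
rf.write(2, 5, 333)       # exceeds the fast level; one entry spills down
assert rf.read(0, 5) == 111
assert rf.read(1, 5) == 222   # per-thread copies do not interfere
assert rf.read(2, 5) == 333
```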

Dynamically allocating storage elements to provide registers for processing thread groups

A technique is provided for processing thread groups, each thread group having associated program code comprising a plurality of regions that each require access to an associated plurality of registers providing operand values for the instructions of that region. Capacity management circuitry is arranged, for a thread group having a region of the associated program code that is ready to be executed, to perform an operand setup process to reserve sufficient storage elements within an operand staging unit to provide the associated plurality of registers, and to cause the operand value for any input register to be preloaded into a reserved storage element allocated for that input register, an input register being a register whose operand value is required before the region can be executed. Scheduling circuitry selects for processing a thread group for which the operand setup process has been performed in respect of the region to be executed.
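The operand setup process can be sketched as follows (capacity, register names, and the ready/schedule split are illustrative assumptions): reserve storage for a region's registers, preload the input registers, and only then let the scheduler pick the thread group.

```python
# Sketch of the capacity-management flow: for each region, reserve enough
# storage elements in the operand staging unit for its registers, preload
# input-register values, and only then mark the region ready to schedule.

class OperandStagingUnit:
    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = {}   # register name -> value (None until produced)

    def setup_region(self, registers, input_values):
        """Reserve storage for a region's registers; preload the inputs."""
        if len(self.reserved) + len(registers) > self.capacity:
            return False                    # not enough free elements yet
        for r in registers:
            # an input register's operand must be present before execution
            self.reserved[r] = input_values.get(r)
        return True

osu = OperandStagingUnit(capacity=8)
ready = osu.setup_region(registers=["r0", "r1", "r2"],
                         input_values={"r0": 42})   # r0 is an input register
assert ready and osu.reserved["r0"] == 42
assert osu.reserved["r1"] is None   # non-input registers filled at run time
```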

HIERARCHICAL GENERAL REGISTER FILE (GRF) FOR EXECUTION BLOCK

In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively coupled to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.

Application restore time from cloud gateway optimization using storlets

A method, computer system, and a computer program product for designing and executing at least one storlet is provided. The present invention may include receiving a plurality of restore operations based on a plurality of data. The present invention may also include identifying a plurality of blocks corresponding to the received plurality of restore operations from the plurality of data. The present invention may then include identifying a plurality of grain packs corresponding to the identified plurality of blocks. The present invention may further include generating a plurality of grain pack index identifications corresponding to the identified plurality of grain packs. The present invention may also include generating at least one storlet based on the generated plurality of grain pack index identifications. The present invention may then include returning a plurality of consolidated objects by executing the generated storlet.
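The block-to-grain-pack consolidation can be sketched in a few lines (the grain-pack size, naming, and object shape are illustrative assumptions, not the patented format):

```python
# Sketch of the storlet-style restore flow: map the requested blocks to the
# grain packs holding them, derive grain-pack index identifications, and
# return one consolidated object per grain pack instead of per-block reads.

GRAIN_PACK_SIZE = 4   # blocks per grain pack (assumed)

def consolidate_restore(requested_blocks):
    packs = {}
    for b in sorted(set(requested_blocks)):
        pack_id = b // GRAIN_PACK_SIZE      # grain pack index identification
        packs.setdefault(pack_id, []).append(b)
    # one consolidated object per grain pack, covering all its hit blocks
    return [{"pack": pid, "blocks": blks}
            for pid, blks in sorted(packs.items())]

objs = consolidate_restore([0, 1, 5, 6, 7, 13])
assert [o["pack"] for o in objs] == [0, 1, 3]   # three consolidated objects
assert objs[1]["blocks"] == [5, 6, 7]
```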

MEMORY MODULE
20190391840 · 2019-12-26

When a request to save the task context information is received due to an interrupt, the access control circuit writes to the first storage unit the context information transmitted in one cycle from the CPU through the first bus, a context number identifying that context information, and a link context number identifying the context information transmitted from the CPU prior to the interrupt. After writing to the first storage unit, the access control circuit transfers the data, including the context information and the link context number stored in the first storage unit, to the second storage unit over a plurality of cycles through the internal bus (second bus), in association with the context number stored in the first storage unit.
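The two-phase transfer can be modelled roughly as follows (word counts, chunk size, and field names are assumptions; the abstract does not give bus widths):

```python
# Sketch of the two-phase context save: the CPU writes the context, its
# number, and a link number to the first storage unit in one cycle; the
# access-control logic then streams it to the second storage unit over the
# internal bus in several cycles, keyed by the context number.

class ContextMemory:
    def __init__(self, chunk=4):
        self.first = None     # fast first storage unit (one-cycle write)
        self.second = {}      # second storage, keyed by context number
        self.chunk = chunk    # words moved per internal-bus cycle (assumed)

    def write_first(self, ctx_words, ctx_num, link_num):
        self.first = {"ctx": ctx_words, "num": ctx_num, "link": link_num}

    def transfer_to_second(self):
        """Move context + link number to the second unit in chunks."""
        rec = self.first
        words = rec["ctx"] + [rec["link"]]
        stored, cycles = [], 0
        for i in range(0, len(words), self.chunk):
            stored.extend(words[i:i + self.chunk])   # one internal-bus cycle
            cycles += 1
        self.second[rec["num"]] = stored
        return cycles

cm = ContextMemory(chunk=4)
cm.write_first(ctx_words=list(range(10)), ctx_num=7, link_num=3)
assert cm.transfer_to_second() == 3   # 11 words / 4 per cycle -> 3 cycles
assert cm.second[7][-1] == 3          # link context number travels with it
```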

Loading apparatus and method for convolution with stride or dilation of 2
11915338 · 2024-02-27

The disclosed technology generally relates to a graphics processing unit (GPU). In one aspect, a GPU includes a general purpose register (GPR) having registers, an arithmetic logic unit (ALU) configured to read pixels of an image independently of a shared memory, and a level 1 (L1) cache storing the pixels read by the ALU. The ALU can implement pixel mapping by fetching a quad of pixels, which includes pixels of first, second, third, and fourth pixel types, from the L1 cache, grouping the pixels of the different pixel types of the quad into four groups based on pixel type, and, for each group, separating the pixels included in the group into three regions that each have a set of pixels. The pixels for each group can then be loaded into the registers corresponding to the three regions.
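The quad-fetch-and-group step can be illustrated with a small pixel-mapping sketch (the region split and register layout are simplified away; group names are assumptions):

```python
# Sketch of the quad-based pixel mapping: fetch 2x2 quads and group the four
# pixel positions by type (top-left, top-right, bottom-left, bottom-right),
# which de-interleaves the image into the sample sets a stride-2 convolution
# needs. Register/region assignment is omitted for brevity.

def group_quads(image):
    """image: 2D list with even height/width; returns the four type groups."""
    groups = {"tl": [], "tr": [], "bl": [], "br": []}
    for y in range(0, len(image), 2):
        for x in range(0, len(image[0]), 2):
            # one quad = four neighbouring pixels of the four pixel types
            groups["tl"].append(image[y][x])
            groups["tr"].append(image[y][x + 1])
            groups["bl"].append(image[y + 1][x])
            groups["br"].append(image[y + 1][x + 1])
    return groups

img = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11],
       [12, 13, 14, 15]]
g = group_quads(img)
assert g["tl"] == [0, 2, 8, 10]   # exactly the stride-2 sample positions
assert g["br"] == [5, 7, 13, 15]
```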