G06F9/5044

Hybrid memory in a dynamically power gated hardware accelerator

In an embodiment, a local memory dedicated to one or more hardware accelerators in a system may include at least two portions: a volatile portion and a non-volatile portion. Data that is reused from iteration to iteration of the hardware accelerator (e.g., constants, instruction words, etc.) may be stored in the non-volatile portion. Data that varies from iteration to iteration may be stored in the volatile portion. Both the local memory and the hardware accelerators may be powered down between iterations, saving power. The non-volatile portion need only be initialized at the first iteration, which shortens the time the hardware accelerators and the local memory must remain powered up in subsequent iterations, since the reused data need not be reloaded.
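The power-saving effect of the split can be sketched as a small simulation, assuming illustrative names throughout: the non-volatile portion survives power gating, so reused data is loaded once, while per-iteration data is reloaded into the volatile portion each time.

```python
# Hypothetical model of the hybrid local memory: the non-volatile portion
# retains its contents across power-down, the volatile portion does not.
class HybridLocalMemory:
    def __init__(self):
        self.nonvolatile = {}   # survives power gating
        self.volatile = {}      # lost on power-down
        self.loads = 0          # words written after power-up

    def power_down(self):
        self.volatile.clear()   # only the volatile portion is lost

    def run_iteration(self, constants, inputs):
        # Reused data need only be initialized on the first iteration.
        if not self.nonvolatile:
            self.nonvolatile.update(constants)
            self.loads += len(constants)
        # Per-iteration data must always be reloaded.
        self.volatile.update(inputs)
        self.loads += len(inputs)
        self.power_down()       # memory and accelerator gated between iterations

mem = HybridLocalMemory()
mem.run_iteration({"k0": 3, "k1": 5}, {"x": 1})   # first iteration: 3 loads
mem.run_iteration({"k0": 3, "k1": 5}, {"x": 2})   # later iterations: 1 load each
print(mem.loads)  # 4
```

With a conventional all-volatile memory, every iteration would reload all three words, so the reload count grows with the amount of reused data rather than staying flat after the first iteration.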

ACCELERATED RESOURCE DISTRIBUTION IN A UNIFIED ENDPOINT MANAGEMENT SYSTEM

Systems and methods presented herein provide examples for distributing resources in a UEM system. In one example, the UEM system can receive a request to check out a user device enrolled in the UEM system. The request can include a profile identifier (“ID”) of a user profile making the request and attributes of the user device. The UEM system can create a hash of group IDs associated with the profile ID. The UEM system can create a device context that includes the device attributes and the hash. The UEM system can then determine whether the device context matches a resource context. Resource contexts can identify a set of UEM resources associated with a device context. Where a match is found, the UEM system can provide the corresponding resources to the user device.
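The check-out flow can be sketched as follows, with the hashing choice, field names, and resources all illustrative assumptions rather than the claimed implementation:

```python
import hashlib

def group_hash(group_ids):
    # Order-insensitive hash over the group IDs associated with the profile ID
    joined = ",".join(sorted(group_ids))
    return hashlib.sha256(joined.encode()).hexdigest()

def make_device_context(device_attrs, group_ids):
    # A device context pairs the device attributes with the group-ID hash
    return (frozenset(device_attrs.items()), group_hash(group_ids))

# Resource contexts map a device context to a set of UEM resources
resource_contexts = {
    make_device_context({"os": "android", "model": "m1"}, ["sales", "emea"]):
        {"vpn-profile", "email-app"},
}

def check_out(device_attrs, group_ids):
    ctx = make_device_context(device_attrs, group_ids)
    return resource_contexts.get(ctx)  # resources delivered on a match

print(check_out({"os": "android", "model": "m1"}, ["emea", "sales"]))
```

Hashing a sorted list of group IDs makes the lookup insensitive to group ordering, so two requests from profiles with the same group membership and device attributes resolve to the same resource context.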

Parallel runtime execution on multiple processors
11544075 · 2023-01-03

A method and an apparatus are described that schedule a plurality of executables in a schedule queue for concurrent execution in one or more physical compute devices, such as CPUs or GPUs. One or more executables are compiled online from a source having an existing executable for a type of physical compute device different from the one or more physical compute devices. Dependency relations among elements corresponding to the scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized to execute an executable in a GPU of the physical compute devices is initialized for execution in a CPU of the physical compute devices if the GPU is busy with graphics processing threads. Sources and existing executables for an API function are stored in an API library to execute a plurality of executables in a plurality of physical compute devices, including the existing executables and executables compiled online from the sources.
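A minimal sketch of the scheduling idea, under illustrative names and data shapes: executables wait in a queue until their dependencies complete, and work targeted at the GPU falls back to a CPU when the GPU is busy with graphics threads.

```python
def schedule(queue, deps, done, gpu_busy):
    """Pick the next executable whose dependency relations are satisfied
    and assign it a device, falling back from GPU to CPU when needed."""
    for exe, preferred in queue:
        if deps.get(exe, set()) <= done:          # all dependencies completed
            device = preferred
            if preferred == "gpu" and gpu_busy:   # GPU tied up with graphics work
                device = "cpu"
            return exe, device
    return None

queue = [("b", "gpu"), ("a", "cpu")]
deps = {"b": {"a"}}   # executable b depends on a

print(schedule(queue, deps, done=set(), gpu_busy=True))   # ('a', 'cpu')
print(schedule(queue, deps, done={"a"}, gpu_busy=True))   # ('b', 'cpu')
```

In the second call, `b` is eligible but its preferred GPU is busy, so it is redirected to a CPU, mirroring the fallback behavior described in the abstract.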

Classification of synthetic data tasks and orchestration of resource allocation

Various techniques are described for classifying synthetic data tasks and orchestrating a resource allocation between groups of eligible resources for processing the synthetic data tasks. Received synthetic data tasks can be classified by identifying a task category and a corresponding group of eligible resources (e.g., processors) for processing synthetic data tasks in that category. For example, synthetic data tasks can include generation of source assets, ingestion of source assets, identification of variation parameters, variation of variation parameters, and creation of synthetic data. Certain categories of synthetic data tasks can be classified for processing with a particular group of eligible resources. For example, tasks that ingest synthetic data assets can be classified for processing on a CPU only, while tasks that create synthetic data assets can be classified for processing on a GPU only. The synthetic data tasks can be queued and routed for processing by an eligible resource.
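The classification step can be sketched as a category-to-resources mapping; the category names follow the abstract, while the mapping and routing policy are illustrative assumptions:

```python
# Each task category maps to its group of eligible resources
ELIGIBLE = {
    "ingest_source_assets": {"cpu"},           # CPU-only category
    "create_synthetic_data": {"gpu"},          # GPU-only category
    "identify_variation_parameters": {"cpu", "gpu"},
}

def classify(task):
    # Identify the task category and its group of eligible resources
    return ELIGIBLE[task["category"]]

def route(tasks):
    # Queue each task for processing by an eligible resource
    queues = {"cpu": [], "gpu": []}
    for task in tasks:
        eligible = classify(task)
        queues[sorted(eligible)[0]].append(task["id"])  # pick any eligible resource
    return queues

print(route([{"id": 1, "category": "ingest_source_assets"},
             {"id": 2, "category": "create_synthetic_data"}]))
# {'cpu': [1], 'gpu': [2]}
```

Categories with more than one eligible resource leave room for a load-balancing policy in place of the first-sorted choice used here.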

Apparatus and method for store pairing with reduced hardware requirements

An apparatus and method for pairing store operations. For example, one embodiment of a processor comprises: a grouping eligibility checker to evaluate a plurality of store instructions based on a set of grouping rules to determine whether two or more of the plurality of store instructions are eligible for grouping; and a dispatcher to simultaneously dispatch a first group of store instructions of the plurality of store instructions determined to be eligible for grouping by the grouping eligibility checker.
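An illustrative model of the grouping eligibility checker follows; the specific rules (equal width, contiguous addresses) and fields are assumptions for the sketch, not the claimed rule set:

```python
from collections import namedtuple

Store = namedtuple("Store", "addr size")

def eligible_pair(a, b):
    # Example grouping rules: same width and contiguous addresses
    return a.size == b.size and b.addr == a.addr + a.size

def pair_stores(stores):
    """Walk the store queue and group eligible pairs for simultaneous dispatch."""
    groups, i = [], 0
    while i < len(stores):
        if i + 1 < len(stores) and eligible_pair(stores[i], stores[i + 1]):
            groups.append((stores[i], stores[i + 1]))  # dispatched together
            i += 2
        else:
            groups.append((stores[i],))                # dispatched alone
            i += 1
    return groups

q = [Store(0x100, 4), Store(0x104, 4), Store(0x200, 8)]
print([len(g) for g in pair_stores(q)])  # [2, 1]
```

Checking only adjacent entries keeps the eligibility logic cheap, which matches the "reduced hardware requirements" emphasis of the title.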

Recommendations for scheduling jobs on distributed computing devices

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for scheduling operations represented as a computational graph on a distributed computing network. A method includes: receiving data representing operations to be executed in order to perform a job on a plurality of hardware accelerators of a plurality of different accelerator types; generating, for the job and from at least the data representing the operations, features that represent a predicted performance for the job on hardware accelerators of the plurality of different accelerator types; generating, from the features, a respective predicted performance metric for the job for each of the plurality of different accelerator types according to a performance objective function; and providing, to a scheduling system, one or more recommendations for scheduling the job on one or more recommended types of hardware accelerators.
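The recommendation step can be sketched end to end, with the feature set, throughput numbers, and roofline-style objective all being illustrative stand-ins rather than the claimed model:

```python
def job_features(ops):
    # Features representing predicted performance, e.g. total FLOPs and bytes
    return {"flops": sum(o["flops"] for o in ops),
            "bytes": sum(o["bytes"] for o in ops)}

# Per-type peak throughput (flops/s, bytes/s) -- hypothetical numbers
ACCEL_TYPES = {"tpu-v4": (275e12, 1.2e12), "gpu-a100": (312e12, 2.0e12)}

def predicted_metric(features, peak_flops, peak_bw):
    # Performance objective: estimated runtime under a simple roofline model
    return max(features["flops"] / peak_flops, features["bytes"] / peak_bw)

def recommend(ops, k=1):
    # Score each accelerator type and recommend the best-scoring ones
    f = job_features(ops)
    scored = {t: predicted_metric(f, *peaks) for t, peaks in ACCEL_TYPES.items()}
    return sorted(scored, key=scored.get)[:k]   # lowest predicted runtime first

ops = [{"flops": 1e15, "bytes": 4e12}]
print(recommend(ops))  # ['gpu-a100']
```

For this job the compute term dominates on both types, so the type with higher peak FLOPs wins; a more memory-bound job could flip the recommendation toward the higher-bandwidth type.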

Systems and methods for IT management of distributed computing resources on a peer-to-peer network
11546168 · 2023-01-03

Systems and methods for managing distributed computing resources including blockchain-based management of serverless computing and edge computing. Distributed computing resources are managed on a peer-to-peer network, and serverless functions are hosted on a distributed IT infrastructure. Developers for the serverless functions and providers of distributed IT infrastructure utilize a blockchain-based IT marketplace platform to make transactions relating to computing resource consumption.

Task scheduling for machine-learning workloads
11544113 · 2023-01-03

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, are described for scheduling tasks of ML workloads. A system receives requests to perform the workloads and determines, based on the requests, resource requirements to perform the workloads. The system includes multiple hosts, and each host includes multiple accelerators. The system determines a quantity of hosts assigned to execute tasks of a workload based on the resource requirements and the accelerators of each host. For each host in the quantity of hosts, the system generates a task specification based on a memory access topology of the host. The specification specifies the tasks to be executed at the host using resources of the host, including the multiple accelerators. The system provides the task specifications to the hosts and performs the workloads when each host executes the assigned tasks specified in its task specification.
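The host-assignment step can be sketched as a ceiling division over the accelerator requirement, with the NUMA-style topology field and all names being illustrative placeholders:

```python
import math

def hosts_needed(required_accels, accels_per_host):
    # Quantity of hosts assigned to execute tasks of the workload
    return math.ceil(required_accels / accels_per_host)

def task_specs(required_accels, hosts):
    # Generate one task specification per assigned host, derived from
    # that host's memory access topology
    n = hosts_needed(required_accels, hosts[0]["accels"])
    specs = []
    for host in hosts[:n]:
        specs.append({"host": host["name"],
                      "numa_node": host["numa_node"],   # topology placeholder
                      "accelerators": host["accels"]})
    return specs

hosts = [{"name": f"h{i}", "accels": 4, "numa_node": i % 2} for i in range(8)]
specs = task_specs(required_accels=10, hosts=hosts)
print(len(specs))  # 3
```

Ten required accelerators at four per host round up to three assigned hosts; each host then receives a specification tailored to its own topology rather than a generic one.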

Multi-Core Processor, Multi-Core Processor Processing Method, and Related Device
20220414052 · 2022-12-29

A multi-core processor includes a primary processor core and a secondary processor core coupled to the primary processor core. The primary processor core has a first instruction space, and the secondary processor core has a second instruction space. The primary processor core is configured to execute a first code segment in a target program and to send an address of a second code segment in the target program to the secondary processor core through a configuration interface of the secondary processor core. The first code segment is compatible with the first instruction space, and the second code segment is compatible with the second instruction space.
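The dispatch path can be modeled as follows, with the classes and configuration interface being assumptions for the sketch: the primary core runs segments matching its own instruction space and hands the other segment's address to the secondary core.

```python
class SecondaryCore:
    def __init__(self, instruction_space):
        self.instruction_space = instruction_space
        self.config_interface = None   # receives the code-segment address

    def configure(self, address):
        self.config_interface = address

class PrimaryCore:
    def __init__(self, instruction_space, secondary):
        self.instruction_space = instruction_space
        self.secondary = secondary
        self.executed = []

    def run(self, program):
        for segment in program:
            if segment["isa"] == self.instruction_space:
                self.executed.append(segment["addr"])       # compatible: run locally
            elif segment["isa"] == self.secondary.instruction_space:
                self.secondary.configure(segment["addr"])   # hand off the address

secondary = SecondaryCore("isa-B")
primary = PrimaryCore("isa-A", secondary)
primary.run([{"isa": "isa-A", "addr": 0x1000},
             {"isa": "isa-B", "addr": 0x2000}])
print(hex(secondary.config_interface))  # 0x2000
```

Passing only the segment's address through the configuration interface keeps the cores decoupled: the secondary core fetches and decodes its own segment in its own instruction space.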

5G-NR MULTI-CELL SOFTWARE FRAMEWORK
20220413928 · 2022-12-29

Apparatuses, systems, and techniques to perform multi-cell physical layer (PHY) processing in a fifth generation (5G) new radio (NR) network. In at least one embodiment, a PHY library implementing a PHY pipeline groups multi-user and/or multi-cell 5G-NR PHY operations for parallel execution as a result of one or more function calls to an application programming interface provided by said PHY library.
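The grouping behavior can be sketched as follows; the function and field names are illustrative, not the library's actual API. API calls enqueue per-cell PHY work, and the library groups operations of the same kind across cells so they can be launched as one parallel batch.

```python
from collections import defaultdict

class PhyPipeline:
    def __init__(self):
        self.pending = []

    def submit(self, cell_id, op):
        # An application's function call into the PHY library
        self.pending.append((cell_id, op))

    def group_for_parallel_execution(self):
        # Group identical PHY operations across users/cells into one batch
        batches = defaultdict(list)
        for cell_id, op in self.pending:
            batches[op].append(cell_id)
        return dict(batches)

phy = PhyPipeline()
for cell in range(3):
    phy.submit(cell, "ldpc_decode")
    phy.submit(cell, "channel_estimation")

print(phy.group_for_parallel_execution())
# {'ldpc_decode': [0, 1, 2], 'channel_estimation': [0, 1, 2]}
```

Batching the same operation across cells is what makes parallel hardware worthwhile here: one wide launch per operation kind replaces many small per-cell launches.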