Patent classifications
G06F2209/509
AUTOMATIC COMPUTE KERNEL GENERATION
Apparatuses, systems, and techniques to receive, by a processor of a computer system, one or more operations for a kernel; automatically generate, by the processor, one or more operators that perform the one or more operations on elements of one or more input data structures; and automatically generate, by the processor, the kernel that comprises the one or more operators.
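The generation step described above can be illustrated as a minimal sketch: per-element operators are built from scalar operations, then composed into a single kernel. All names (`make_operator`, `generate_kernel`) are illustrative, not from the patent.

```python
# Hypothetical sketch: compose per-element operators into one generated kernel.

def make_operator(op):
    """Wrap a scalar operation so it applies to one element of a data structure."""
    return lambda x: op(x)

def generate_kernel(operators):
    """Return a kernel that applies each operator, in order, to every element."""
    def kernel(data):
        out = []
        for elem in data:
            for op in operators:
                elem = op(elem)
            out.append(elem)
        return out
    return kernel

square = make_operator(lambda x: x * x)
add_one = make_operator(lambda x: x + 1)
kernel = generate_kernel([square, add_one])
print(kernel([1, 2, 3]))  # [2, 5, 10]
```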
Non-disruptive firmware upgrade of symmetric hardware accelerator systems
In a symmetric hardware accelerator system, an initial hardware accelerator is selected for a firmware upgrade. The initial and other hardware accelerators handle workloads that have been balanced across the hardware accelerators. Workloads are rebalanced by directing workloads having low CPU utilization to the initial hardware accelerator. The workloads of the initial hardware accelerator are then failed back to the CPU. While the CPU is handling those workloads, the firmware of the initial hardware accelerator is upgraded.
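The rebalance, CPU-fallback, and upgrade sequence can be sketched as below. The `Accelerator` class, the 0.2 utilization cutoff, and all field names are assumptions made for illustration, not details from the patent.

```python
# Illustrative sketch of the rebalance -> CPU-fallback -> firmware-upgrade flow.

class Accelerator:
    def __init__(self, name):
        self.name = name
        self.workloads = []
        self.firmware = "v1"

def upgrade_non_disruptively(accelerators, cpu_queue, new_fw):
    initial = accelerators[0]  # select the initial accelerator for upgrade
    # Rebalance: direct low-CPU-utilization workloads to the initial accelerator.
    for acc in accelerators[1:]:
        low = [w for w in acc.workloads if w["cpu_util"] < 0.2]
        for w in low:
            acc.workloads.remove(w)
            initial.workloads.append(w)
    # CPU fallback: move the initial accelerator's workloads to the CPU.
    cpu_queue.extend(initial.workloads)
    initial.workloads = []
    # Upgrade firmware while the CPU handles those workloads.
    initial.firmware = new_fw

accs = [Accelerator("a0"), Accelerator("a1")]
accs[1].workloads = [{"id": 1, "cpu_util": 0.1}, {"id": 2, "cpu_util": 0.9}]
cpu = []
upgrade_non_disruptively(accs, cpu, "v2")
print(accs[0].firmware, len(cpu))  # v2 1
```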
DEDICATED HARDWARE SYSTEM FOR SOLVING PARTIAL DIFFERENTIAL EQUATIONS
Embodiments relate to a computing system for solving differential equations. The system is configured to receive problem packages corresponding to problems to be solved, each comprising at least a differential equation and a domain, and to select a solver of a plurality of solvers, based upon availability of each of the plurality of solvers. Each solver comprises a coordinator that partitions the domain of the problem into a plurality of sub-domains, and assigns each of the plurality of sub-domains to a differential equation accelerator (DEA) of a plurality of DEAs. Each DEA comprises at least two memory units, and processes the sub-domain data over a plurality of time-steps by passing the sub-domain data through a selected systolic array from one memory unit, and storing the processed sub-domain data in the other memory unit, and vice versa.
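The two-memory-unit ("ping-pong") time-stepping scheme can be sketched as follows, with a simple 1-D heat-equation stencil standing in for the selected systolic array. The stencil and all names are illustrative assumptions, not the patent's hardware.

```python
# Minimal sketch of double-buffered time-stepping between two memory units.

def step(u, alpha=0.1):
    """One explicit finite-difference time-step over the sub-domain data."""
    v = u[:]
    for i in range(1, len(u) - 1):
        v[i] = u[i] + alpha * (u[i-1] - 2 * u[i] + u[i+1])
    return v

def solve(initial, timesteps):
    mem_a, mem_b = initial[:], [0.0] * len(initial)  # the two memory units
    for _ in range(timesteps):
        mem_b = step(mem_a)          # read from one unit, write the other...
        mem_a, mem_b = mem_b, mem_a  # ...then swap roles ("and vice versa")
    return mem_a
```

One step on `[0.0, 1.0, 0.0]` diffuses the central value to `0.8` while the boundary entries stay fixed.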
DATA PROCESSING METHOD AND APPARATUS, DISTRIBUTED DATA FLOW PROGRAMMING FRAMEWORK, AND RELATED COMPONENT
A data processing method, a data processing apparatus, a distributed data flow programming framework, an electronic device, and a storage medium are provided. The data processing method includes: dividing a data processing task into a plurality of data processing subtasks (S101); determining, on a Field Programmable Gate Array (FPGA) accelerator side, a target FPGA acceleration board corresponding to each of the data processing subtasks (S102); and sending data to be computed to the target FPGA acceleration boards, each of which executes its corresponding data processing subtask to obtain a data processing result (S103). The method avoids the physical limitation that host interfaces impose on the number of FPGA acceleration boards on the FPGA accelerator side, thereby improving data processing efficiency.
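Steps S101 to S103 can be sketched as a small dispatch loop; the boards are simulated with callables, and all helper names are assumptions made for illustration.

```python
# Hypothetical sketch of S101 (divide), S102 (select board), S103 (dispatch).

def divide_task(data, n):
    """S101: divide the data processing task into roughly n subtasks."""
    k = max(1, len(data) // n)
    return [data[i:i + k] for i in range(0, len(data), k)]

def select_board(boards, subtask_index):
    """S102: determine the target FPGA acceleration board for a subtask."""
    return boards[subtask_index % len(boards)]

def process(boards, data, n):
    """S103: send each subtask's data to its board and gather the results."""
    results = []
    for i, chunk in enumerate(divide_task(data, n)):
        board = select_board(boards, i)
        results.extend(board(chunk))  # the board executes its subtask
    return results

double = lambda chunk: [x * 2 for x in chunk]
print(process([double, double], [1, 2, 3, 4], 2))  # [2, 4, 6, 8]
```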
TECHNICAL INJECTION SYSTEM FOR INJECTING A RETRAINED MACHINE LEARNING MODEL
A technical injection system for injecting a retrained machine learning model is provided, including: a. a first computing unit comprising a first storage medium, wherein the first computing unit is configured to provide and preprocess the retrained machine learning model, and wherein the retrained machine learning model is stored in the first storage medium; b. a second computing unit comprising a second storage medium and an injection interface, wherein the injection interface is configured to inject, at runtime, at least one relevant part of the preprocessed retrained machine learning model from the first storage medium of the first computing unit into the second storage medium of the second computing unit.
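The preprocess-then-inject flow can be sketched with the "model" as a plain dict of named parameters. Parameter names and the key-filtering form of preprocessing are illustrative assumptions, not the patent's mechanism.

```python
# Sketch: first unit preprocesses the retrained model, second unit's
# injection interface overwrites the matching parameters at runtime.

def preprocess(model, relevant_keys):
    """First computing unit: keep only the relevant part of the retrained model."""
    return {k: v for k, v in model.items() if k in relevant_keys}

def inject(target_storage, part):
    """Second computing unit's injection interface: overwrite matching parameters."""
    target_storage.update(part)
    return target_storage

retrained = {"head.w": [0.9], "head.b": [0.1], "backbone.w": [5.0]}
deployed = {"head.w": [0.0], "head.b": [0.0], "backbone.w": [5.0]}
inject(deployed, preprocess(retrained, {"head.w", "head.b"}))
print(deployed["head.w"])  # [0.9]
```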
DISTRIBUTED ACCELERATOR
Systems, methods, and devices are described for coordinating a distributed accelerator. A command that includes instructions for performing a task is received. One or more sub-tasks of the task are determined to generate a set of sub-tasks. For each sub-task of the set, an accelerator slice of a plurality of accelerator slices of the distributed accelerator is allocated, and sub-task instructions for performing the sub-task are determined. The sub-task instructions are transmitted to the allocated accelerator slice for each sub-task. Each allocated accelerator slice is configured to generate a corresponding response indicating that it has completed its respective sub-task. In a further example aspect, the corresponding responses are received from each allocated accelerator slice and a coordinated response indicative of those responses is generated.
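The fan-out and coordinated-response pattern can be sketched as below; slices are simulated with callables, and the round-robin allocation and all names are assumptions for illustration.

```python
# Illustrative coordinator: split a command into sub-tasks, send each to an
# accelerator slice, and fold the slice responses into one coordinated reply.

def coordinate(command, slices):
    subtasks = command["payload"]            # one sub-task per payload item
    responses = []
    for i, sub in enumerate(subtasks):
        slice_fn = slices[i % len(slices)]   # allocate an accelerator slice
        responses.append(slice_fn(sub))      # slice completes its sub-task
    # Coordinated response indicative of the corresponding responses.
    return {"task": command["task"], "results": responses}

inc = lambda x: x + 1
reply = coordinate({"task": "inc-all", "payload": [1, 2, 3]}, [inc, inc])
print(reply)  # {'task': 'inc-all', 'results': [2, 3, 4]}
```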
MODEL COORDINATION METHOD AND APPARATUS
A model coordination method for a first device is provided. The first device stores at least one model segment, each configured to realize a part of the functions of a preset model. The method includes: determining a first model segment from the at least one model segment stored in the first device, wherein, when the first model segment is executed by the first device and a second model segment is executed by a second device, some or all of the functions of the preset model are realized. The second model segment is one of at least one model segment stored in the second device, and the at least one model segment stored in the second device is configured to realize a part of the functions of the preset model. A model coordination apparatus is also provided.
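The segment-coordination idea can be sketched with function composition standing in for segment execution across the two devices; the segment names and the doubling/adding functions are illustrative assumptions.

```python
# Sketch: each device holds a model segment; executing both in sequence
# realizes the full preset model.

first_device = {"seg_a": lambda x: x * 2}    # realizes part of the functions
second_device = {"seg_b": lambda x: x + 3}   # realizes the remaining part

def coordinate_segments(x):
    """Run the first device's segment, then the second device's segment."""
    y = first_device["seg_a"](x)      # first model segment on the first device
    return second_device["seg_b"](y)  # second model segment completes the model

print(coordinate_segments(5))  # 13
```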
On-demand access to compute resources
Disclosed are systems, methods, and computer-readable media for controlling and managing the identification and provisioning of resources within an on-demand center, as well as the transfer of workload to the provisioned resources. One aspect involves creating a virtual private cluster within the on-demand center for the particular workload from a local environment. A method of managing resources between a local compute environment and an on-demand environment includes: detecting an event associated with the local compute environment; based on the detected event, identifying information about the local environment; establishing communication with an on-demand compute environment and transmitting the information about the local environment to it; provisioning resources within the on-demand compute environment to substantially duplicate the local environment; and transferring workload from the local environment to the on-demand compute environment. The event can be a threshold or a triggering event within or outside of the local environment.
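The event-detect, provision, and transfer steps can be sketched as a single management routine; the load threshold, the dict fields, and all names are assumptions made for illustration.

```python
# Minimal sketch of the event -> provision -> transfer flow between a local
# compute environment and an on-demand center.

def manage(local, on_demand, threshold=0.8):
    if local["load"] < threshold:                        # detect the event
        return False
    info = {"nodes": local["nodes"], "os": local["os"]}  # identify local info
    on_demand["provisioned"] = dict(info)   # substantially duplicate locally
    on_demand["workload"] = local.pop("workload", [])    # transfer workload
    return True

local = {"load": 0.95, "nodes": 4, "os": "linux", "workload": ["job1"]}
center = {}
print(manage(local, center))  # True
```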
Data management method and apparatus, and server
A data management method includes receiving, by a management server, a first request, and determining, based on an identifier of a first user in the first request, whether a shadow tenant bucket associated with that identifier exists. If the shadow tenant bucket exists, an acceleration engine image (AEI) that the first user requests to register is stored in it. A shadow tenant bucket is used to store AEIs of a specified user, and each shadow tenant bucket is in a one-to-one correspondence with a user.
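The per-user bucket lookup can be sketched as below; representing a bucket as a list keyed by user identifier, and creating the bucket when it does not yet exist, are illustrative assumptions.

```python
# Sketch of the shadow-tenant-bucket lookup: one bucket per user identifier,
# each storing that user's acceleration engine images (AEIs).

buckets = {}  # user identifier -> shadow tenant bucket (list of AEIs)

def register_aei(user_id, aei):
    bucket = buckets.get(user_id)       # does this user's bucket exist?
    if bucket is None:
        bucket = buckets[user_id] = []  # one-to-one: create it for this user
    bucket.append(aei)                  # store the AEI the user registered
    return bucket

register_aei("user-1", "aei-v1")
register_aei("user-1", "aei-v2")
print(buckets["user-1"])  # ['aei-v1', 'aei-v2']
```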
Techniques to perform a fast Fourier transform
Apparatuses, systems, and techniques to perform a fast Fourier transform operation. In at least one embodiment, a fast Fourier transform operation is performed based on one or more parameters, wherein the one or more parameters indicate information about one or more operands of the fast Fourier transform.
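A parameterized FFT can be sketched with a textbook radix-2 recursion, where a parameter dict describes the operand (here, just its length) in the spirit of the abstract; the parameter shape and all names are illustrative assumptions.

```python
# Illustrative radix-2 FFT driven by a parameter dict describing its operand.
import cmath

def fft(x, params):
    n = params["length"]  # parameter indicating information about the operand
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], {"length": n // 2})
    odd = fft(x[1::2], {"length": n // 2})
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

spectrum = fft([1, 1, 1, 1], {"length": 4})
print(round(abs(spectrum[0])))  # 4 (DC bin of a constant signal)
```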