G06F9/3877

SYSTEMS, METHODS, AND APPARATUS FOR ASSOCIATING COMPUTATIONAL DEVICE FUNCTIONS WITH COMPUTE ENGINES
20230052076 · 2023-02-16

A method may include creating an association identifier based on an association between a computational device function and a compute engine of a computational device, and invoking an execute command to perform an execution of the computational device function using the compute engine, wherein the execute command uses the association identifier. The compute engine may be a first compute engine, and the association may be further between the computational device function and a second compute engine of the computational device. The execute command may perform an execution of the computational device function using the second compute engine. The execution of the computational device function using the first compute engine and the execution of the computational device function using the second compute engine may overlap. The execute command may include the association identifier. Creating the association identifier may include invoking a create association command.
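
The flow in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `ComputationalDevice`, its methods `create_association` and `execute`, the engine names, and the `checksum` function are all hypothetical and do not come from the publication. Threads stand in for compute engines so that the two executions may overlap.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count


class ComputationalDevice:
    """Toy model of a device; every name here is hypothetical."""

    def __init__(self, engines):
        self.engines = list(engines)
        self._associations = {}      # association id -> (function, engines)
        self._next_id = count(1)

    def create_association(self, function, engines):
        """Stand-in for a 'create association' command; returns an identifier."""
        for engine in engines:
            if engine not in self.engines:
                raise ValueError(f"unknown compute engine: {engine}")
        assoc_id = next(self._next_id)
        self._associations[assoc_id] = (function, tuple(engines))
        return assoc_id

    def execute(self, assoc_id, *args):
        """Stand-in for an 'execute' command that carries the identifier."""
        function, engines = self._associations[assoc_id]
        # One worker per associated engine, so the executions may overlap.
        with ThreadPoolExecutor(max_workers=len(engines)) as pool:
            futures = {e: pool.submit(function, e, *args) for e in engines}
            return {e: f.result() for e, f in futures.items()}


def checksum(engine, data):
    # Hypothetical device function; the engine argument is unused here.
    return sum(data) % 256


device = ComputationalDevice(["engine0", "engine1"])
assoc = device.create_association(checksum, ["engine0", "engine1"])
results = device.execute(assoc, [1, 2, 3])
```

The caller only passes the identifier to `execute`; which engines run the function is resolved from the stored association.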

System and Method for Distributed Data Processing
20230052131 · 2023-02-16

A distributed data processing system includes a processing center or algorithm persistence system (“APS”), a series of remote caching nodes in electronic communication with the APS, and a series of remote computing or processing nodes in electronic communication with the remote caching nodes. Each remote caching node is mounted to a top surface of a mobile vehicle and includes a data transmitter/receiver (transceiver), computer hardware and software to operate the caching node, and memory to transmit or transfer data from the APS to the remote processing nodes. The remote processing nodes include a series of electricity generating solar panels, a series of electronic data processing chips, electronic data memory, an electronic data transmitter/receiver (transceiver), and a motion sensor. The electronic data processing chips are preferably tensor processing units (TPUs); a TPU is an AI accelerator application-specific integrated circuit (ASIC) developed specifically for neural network machine learning.

Convolutional layer acceleration unit, embedded system having the same, and method for operating the embedded system

Disclosed herein are a convolutional layer acceleration unit, an embedded system having the convolutional layer acceleration unit, and a method for operating the embedded system. The method for operating an embedded system, the embedded system performing an accelerated processing capability programmed using a Lightweight Intelligent Software Framework (LISF), includes initializing and configuring, by a parallelization managing function entity (FE), entities present in resources for performing mathematical operations in parallel, and processing in parallel, by an acceleration managing FE, the mathematical operations using the configured entities.

Broadside random access memory for low cycle memory access and additional functions

A computational system includes one or more processors. Each processor has multiple registers, as well as attached memory to hold instructions. The processor is coupled to one or more broadside interfaces. A broadside interface allows the processor to load or store an entire widget state in a single clock cycle of the processor. The broadside interface also allows the processor to move and store 32 bytes of information into RAM in less than four to five clock cycles of the processor, while the processor concurrently performs one or more mathematical operations on the information.

Determine whether to perform action on computing device based on analysis of endorsement information of a security co-processor

Examples disclosed herein relate to a computing device that includes a central processing unit, a management controller separate from the central processing unit, and a security co-processor. The management controller is powered using an auxiliary power rail that provides power to the management controller while the computing device is in an auxiliary power state. The security co-processor includes device unique data. The management controller receives the device unique data and stores a representation at a secure location. At a later time, the management controller receives endorsement information from an expected location of the security co-processor. The management controller determines whether to perform an action on the computing device based on an analysis of the endorsement information and the stored representation of the device unique data.
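
A minimal sketch of the store-then-compare flow described above, in Python. The abstract does not say what the stored representation is or how the analysis is performed; this sketch assumes a SHA-256 digest and a constant-time equality check. The class and method names are hypothetical.

```python
import hashlib
import hmac


class ManagementController:
    """Toy model; names and the hashing scheme are assumptions."""

    def __init__(self):
        self._stored = None  # representation of the device unique data

    def enroll(self, device_unique_data: bytes) -> None:
        # Store a representation (here, a digest) rather than the raw data.
        self._stored = hashlib.sha256(device_unique_data).digest()

    def should_perform_action(self, endorsement_info: bytes) -> bool:
        # Decide by comparing later endorsement information with the
        # stored representation.
        if self._stored is None:
            return False
        candidate = hashlib.sha256(endorsement_info).digest()
        return hmac.compare_digest(candidate, self._stored)


controller = ManagementController()
controller.enroll(b"device-unique-data")
matches = controller.should_perform_action(b"device-unique-data")
mismatch = controller.should_perform_action(b"some-other-device")
```

Here `matches` is `True` and `mismatch` is `False`; what action follows from either outcome is left open, as in the abstract.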

Systems and methods for performing horizontal tile operations

Disclosed embodiments relate to systems and methods for performing instructions specifying horizontal tile operations. In one example, a processor includes fetch circuitry to fetch an instruction specifying a horizontal tile operation, a location of a M by N source matrix comprising K groups of elements, and locations of K destinations, wherein each of the K groups of elements comprises the same number of elements, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction by generating K results, each result being generated by performing the specified horizontal tile operation across every element of a corresponding group of the K groups, and writing each generated result to a corresponding location of the K specified destination locations.
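
The semantics of such an instruction can be modeled in software. The following Python sketch assumes that the K groups are equal-sized, contiguous runs of the tile's elements in row-major order (so K equal to the number of rows gives one group per row); the abstract does not fix the grouping, and the function name `horizontal_tile_op` is hypothetical.

```python
from functools import reduce
import operator


def horizontal_tile_op(tile, op, k):
    """Reduce each of k equal-sized groups of an M-by-N tile with op.

    Grouping is an assumption: contiguous runs in row-major order.
    Returns a list of k results, one per destination.
    """
    flat = [x for row in tile for x in row]
    if k <= 0 or not flat or len(flat) % k != 0:
        raise ValueError("tile must split into k equal, non-empty groups")
    size = len(flat) // k
    return [reduce(op, flat[i * size:(i + 1) * size]) for i in range(k)]


tile = [[1, 2, 3],
        [4, 5, 6]]
row_sums = horizontal_tile_op(tile, operator.add, 2)   # one group per row
row_maxes = horizontal_tile_op(tile, max, 2)
pair_sums = horizontal_tile_op(tile, operator.add, 3)  # three groups of two
```

With this tile, `row_sums` is `[6, 15]`, `row_maxes` is `[3, 6]`, and `pair_sums` is `[3, 7, 11]`.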

Machine-learning training service for synthetic data

Various embodiments, methods and systems for implementing a distributed computing system machine-learning training service are provided. Initially a machine learning model is accessed. A plurality of synthetic data assets are accessed, where a synthetic data asset is associated with asset-variation parameters that are programmable for machine-learning. The machine learning model is retrained using the plurality of synthetic data assets. The machine-learning training service is further configured for executing real-time calls to generate an on-the-fly-generated synthetic data asset such that the on-the-fly-generated synthetic data asset is rendered in real-time to preclude pre-rendering and storing the on-the-fly-generated synthetic data asset. The machine-learning training service further supports hybrid-based machine learning training, where the machine learning model is trained based on a combination of the plurality of synthetic data assets, a plurality of non-synthetic data assets, and synthetic data asset metadata associated with the plurality of synthetic data assets.

System having a hybrid threading processor, a hybrid threading fabric having configurable computing elements, and a hybrid interconnection network
11579887 · 2023-02-14

Representative apparatus, method, and system embodiments are disclosed for configurable computing. In a representative embodiment, a system includes an interconnection network, a processor, a host interface, and a configurable circuit cluster. The configurable circuit cluster may include a plurality of configurable circuits arranged in an array; an asynchronous packet network and a synchronous network coupled to each configurable circuit of the array; and a memory interface circuit and a dispatch interface circuit coupled to the asynchronous packet network and to the interconnection network. Each configurable circuit includes instruction or configuration memories for selection of a current data path configuration, a master synchronous network input, and a data path configuration for a next configurable circuit.

Platform independent GPU profiles for more efficient utilization of GPU resources

Disclosed are various examples for platform independent graphics processing unit (GPU) profiles for more efficient utilization of GPU resources. A virtual machine configuration can be identified to include a platform independent graphics computing requirement. Hosts can be identified as available in a computing environment based on the platform independent graphics computing requirement. The virtual machine can be placed on a host based on a consideration of host priority.
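
The placement step can be illustrated with a short Python sketch. It assumes the platform independent requirement is expressed as an amount of GPU memory and that a lower priority number means a more preferred host; neither detail is stated in the abstract, and `Host` and `place_vm` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_gpu_memory_mb: int
    priority: int  # assumption: lower number = more preferred


def place_vm(required_gpu_memory_mb, hosts):
    """Return the preferred host that satisfies the requirement, or None."""
    # Identify hosts that can meet the vendor-neutral requirement.
    candidates = [h for h in hosts
                  if h.free_gpu_memory_mb >= required_gpu_memory_mb]
    if not candidates:
        return None
    # Choose among the available hosts by priority.
    return min(candidates, key=lambda h: h.priority)


hosts = [
    Host("host-a", free_gpu_memory_mb=4096, priority=2),
    Host("host-b", free_gpu_memory_mb=8192, priority=3),
    Host("host-c", free_gpu_memory_mb=2048, priority=1),
]
chosen = place_vm(4096, hosts)
```

In this example `chosen` is `host-a`: `host-c` has the best priority but too little free GPU memory.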

PROGRAMMABLE CONTROLLER

A method is disclosed for a controller to execute a program comprising a sequence of functions on an accelerator with a pipelined architecture comprising a microcode buffer. The method comprises executing a function of the program as a sequence of operations, wherein the sequence of operations is represented by a sequence of templates; determining whether the template is non-colliding with previously inserted templates in the microcode buffer; determining whether data in local memory will be referenced before all previously inserted templates have taken effect; determining whether registers will be referenced before all previously inserted templates in the microcode buffer have taken effect; and, when it is determined that the template fits, that resources are available, that local data memory accesses will not collide, and that register accesses will not collide, creating a sequence of microcode instructions in the template and inserting the template into the microcode buffer.
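
The checks listed in this abstract can be sketched as a small Python model. The data layout is an assumption made for illustration: a template maps cycle offsets to the microcode fields it occupies, and the buffer tracks which fields are taken in each cycle along with registers and local memory locations whose pending writes have not yet taken effect. The names `Template` and `MicrocodeBuffer` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    fields: dict                              # cycle offset -> set of fields used
    reads_registers: frozenset = frozenset()
    reads_memory: frozenset = frozenset()


@dataclass
class MicrocodeBuffer:
    slots: dict = field(default_factory=dict)            # cycle -> occupied fields
    pending_registers: set = field(default_factory=set)  # writes not yet in effect
    pending_memory: set = field(default_factory=set)

    def can_insert(self, t, start):
        # Collision: a field is already occupied in a cycle the template needs.
        collides = any(self.slots.get(start + off, set()) & used
                       for off, used in t.fields.items())
        # Hazards: the template reads something a pending write will change.
        reg_hazard = bool(self.pending_registers & t.reads_registers)
        mem_hazard = bool(self.pending_memory & t.reads_memory)
        return not (collides or reg_hazard or mem_hazard)

    def insert(self, t, start):
        if not self.can_insert(t, start):
            return False
        for off, used in t.fields.items():
            self.slots.setdefault(start + off, set()).update(used)
        return True


buf = MicrocodeBuffer()
load = Template({0: {"mem_port"}, 1: {"alu"}})
add = Template({0: {"alu"}})
first = buf.insert(load, start=0)
second = buf.insert(add, start=1)   # "alu" is already taken in cycle 1
third = buf.insert(add, start=2)
```

Here `first` and `third` succeed while `second` is rejected. A real controller would also retire pending register and memory entries as templates take effect, which this sketch omits.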