Patent classifications
G06F9/38
SYSTEMS, METHODS, AND APPARATUS FOR ASSOCIATING COMPUTATIONAL DEVICE FUNCTIONS WITH COMPUTE ENGINES
A method may include creating an association identifier based on an association between a computational device function and a compute engine of a computational device, and invoking an execute command to perform an execution of the computational device function using the compute engine, wherein the execute command uses the association identifier. The compute engine may be a first compute engine, and the association may be further between the computational device function and a second compute engine of the computational device. The execute command may perform an execution of the computational device function using the second compute engine. The execution of the computational device function using the first compute engine and the execution of the computational device function using the second compute engine may overlap. The execute command may include the association identifier. The creating the association identifier may include invoking a create association command.
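The create-association and execute flow described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `ComputationalDevice` class, the engine names, and the `checksum` function are all hypothetical, and a thread pool stands in for engines whose executions may overlap.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class ComputationalDevice:
    """Toy model of a device with named compute engines (hypothetical)."""

    def __init__(self, engines):
        self.engines = set(engines)
        self._associations = {}            # association id -> (function, engines)
        self._next_id = itertools.count(1)

    def create_association(self, function, engines):
        """Bind a function to one or more engines and return an identifier."""
        engines = tuple(engines)
        unknown = set(engines) - self.engines
        if unknown:
            raise ValueError(f"unknown engines: {sorted(unknown)}")
        assoc_id = next(self._next_id)
        self._associations[assoc_id] = (function, engines)
        return assoc_id

    def execute(self, assoc_id, *args):
        """Run the associated function on every associated engine.

        The per-engine executions are submitted together, so they may overlap.
        """
        function, engines = self._associations[assoc_id]
        with ThreadPoolExecutor(max_workers=len(engines)) as pool:
            futures = {e: pool.submit(function, e, *args) for e in engines}
            return {e: f.result() for e, f in futures.items()}


def checksum(engine, data):
    # Hypothetical device function; the engine name is unused here.
    return sum(data) % 256


device = ComputationalDevice(["cpu0", "fpga0"])
assoc = device.create_association(checksum, ["cpu0", "fpga0"])
results = device.execute(assoc, [1, 2, 3])
```

The execute call carries only the association identifier, so the caller does not have to name the function or the engines again.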
SYSTEMS AND METHODS FOR AI INFERENCE PLATFORM
Systems and methods for using and managing an artificial intelligence (AI) inference platform (AIP) and/or model orchestrators, according to certain embodiments. For example, a method includes receiving sensor data via a data interface of a model orchestrator, the model orchestrator including an indication of a model pipeline, the model pipeline including a plurality of models; loading the plurality of models according to the model pipeline; applying the model pipeline to the received sensor data; receiving a model output from the model pipeline via a model interface of the model orchestrator; and generating an insight based at least in part on the model output.
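The sequence of steps in this abstract (receive data, load models, apply the pipeline, generate an insight) might look roughly like the sketch below. All names (`ModelOrchestrator`, `registry`, `generate_insight`, the two toy models, and the threshold) are assumptions for illustration; real models would be loaded from storage rather than looked up in a dictionary.

```python
class ModelOrchestrator:
    """Hypothetical orchestrator holding an indication of a model pipeline."""

    def __init__(self, pipeline_spec, registry):
        self.pipeline_spec = list(pipeline_spec)  # ordered model names
        self.registry = registry                  # name -> callable model
        self.models = []

    def load_models(self):
        # Load the models in the order given by the pipeline.
        self.models = [self.registry[name] for name in self.pipeline_spec]

    def apply(self, sensor_data):
        # Feed each model's output into the next model in the pipeline.
        output = sensor_data
        for model in self.models:
            output = model(output)
        return output


def generate_insight(model_output, threshold=3.0):
    # Assumed insight rule: flag readings whose mean exceeds a threshold.
    return "high" if model_output > threshold else "normal"


registry = {
    "drop_negative": lambda xs: [x for x in xs if x >= 0],
    "mean": lambda xs: sum(xs) / len(xs),
}

orchestrator = ModelOrchestrator(["drop_negative", "mean"], registry)
orchestrator.load_models()
model_output = orchestrator.apply([4, -1, 6])
insight = generate_insight(model_output)
```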
NON-BLOCKING EXTERNAL DEVICE CALL
Devices and techniques for non-blocking external device calls are described herein. Specifically, when a processor receives an instruction with a no-return indication from a thread for a device, the processor can increase a counter corresponding to the thread based on the no-return indication. The processor can then continue execution of the thread without waiting for a return value from the device. When a return value is received for the instruction, the processor can decrement the counter. While the counter is not zero, the processor prevents the thread from completing (exiting).
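The counter mechanism above can be modeled in a few lines. This is a software sketch of behavior the abstract attributes to a processor; `ThreadContext`, `Processor`, and the list used as a device queue are hypothetical stand-ins.

```python
class ThreadContext:
    """Per-thread state tracked by the processor (hypothetical)."""

    def __init__(self):
        self.pending = 0  # outstanding no-return device calls


class Processor:
    def __init__(self):
        self.device_queue = []

    def issue_no_return(self, thread, request):
        # Increase the thread's counter and continue without waiting.
        thread.pending += 1
        self.device_queue.append((thread, request))

    def on_return(self, thread, value):
        # A return value arrived for an earlier call: decrement the counter.
        thread.pending -= 1

    def can_exit(self, thread):
        # The thread may complete only when no calls are outstanding.
        return thread.pending == 0


cpu = Processor()
thread = ThreadContext()
cpu.issue_no_return(thread, "write block 1")
cpu.issue_no_return(thread, "write block 2")
blocked_after_issue = not cpu.can_exit(thread)

cpu.on_return(thread, "ok")
blocked_after_one_return = not cpu.can_exit(thread)

cpu.on_return(thread, "ok")
free_after_all_returns = cpu.can_exit(thread)
```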
AUTOMATED SYNTHESIS OF REFERENCE POLICIES FOR RUNTIME MICROSERVICE PROTECTION
A method, apparatus and computer program product for automated security policy synthesis and use in a container environment. In this approach, a binary analysis of a program associated with a container image is carried out within a binary analysis platform. During the binary analysis, the program is micro-executed directly inside the analysis platform to generate a graph that summarizes the program's expected interactions within the run-time container environment. The expected interactions are identified by analysis of one or more system calls and their arguments found during micro-executing the program. Once the graph is created, a security policy is then automatically synthesized from the graph and instantiated into the container environment. The policy embeds at least one system call argument. During run-time monitoring of an event sequence associated with the program executing in the container environment, an action is taken when the event sequence is determined to violate the security policy.
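As a rough sketch of the synthesize-then-monitor idea, the code below builds a transition graph from observed system calls and their arguments, then checks a run-time event sequence against it. The trace format, the `START` marker, and both function names are assumptions; here a hand-written trace stands in for the output of micro-execution, which is not modeled.

```python
START = "START"  # hypothetical marker for the beginning of a sequence


def synthesize_policy(traces):
    """Build a graph of allowed transitions between (syscall, argument) events."""
    policy = {}
    for trace in traces:
        prev = START
        for event in trace:
            policy.setdefault(prev, set()).add(event)
            prev = event
    return policy


def first_violation(policy, events):
    """Return the index of the first event the policy does not allow, or None."""
    prev = START
    for index, event in enumerate(events):
        if event not in policy.get(prev, set()):
            return index
        prev = event
    return None


# Stand-in for interactions discovered during analysis of the program.
expected = [[
    ("open", "/etc/app.conf"),
    ("read", "/etc/app.conf"),
    ("close", "/etc/app.conf"),
]]
policy = synthesize_policy(expected)

ok_run = first_violation(policy, expected[0])
bad_run = first_violation(policy, [("open", "/etc/shadow")])
```

Because each node embeds the system call argument, opening an unexpected path is flagged even though `open` itself is an allowed call.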
SIMD DATA PATH ORGANIZATION TO INCREASE PROCESSING THROUGHPUT IN A SYSTEM ON A CHIP
In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
REDUCED MEMORY WRITE REQUIREMENTS IN A SYSTEM ON A CHIP USING AUTOMATIC STORE PREDICATION
In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
CALL AND RETURN INSTRUCTIONS FOR CONFIGURABLE REGISTER CONTEXT SAVE AND RESTORE
Systems, devices, circuitries, and methods are disclosed for identifying, within a call instruction, context registers to be stored prior to a jump to another subroutine. In one example, a method includes receiving, while executing a first subroutine, a call instruction that includes a first opcode and identifies a first target address, wherein the first target address stores instructions for performing a second subroutine. A first set of context registers identified by the call instruction is determined, and the content of the first set of context registers is stored in first memory allocated for context storage for the first subroutine prior to executing the instruction stored at the first target address.
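A small simulation can illustrate a call instruction that names the registers to save. The `Cpu` class, the register names, and the matching `ret` behavior are assumptions made for this sketch; the abstract itself only describes the store performed on the call.

```python
class Cpu:
    """Toy CPU with eight registers and a context save area (hypothetical)."""

    def __init__(self):
        self.regs = {f"r{i}": 0 for i in range(8)}
        self.pc = 0
        self.context_stack = []  # memory allocated for context storage

    def call(self, target, save_regs):
        # Store only the registers identified by the call instruction,
        # then jump to the target address.
        saved = {name: self.regs[name] for name in save_regs}
        self.context_stack.append((self.pc + 1, saved))
        self.pc = target

    def ret(self):
        # Restore the saved registers and resume after the call site.
        return_pc, saved = self.context_stack.pop()
        self.regs.update(saved)
        self.pc = return_pc


cpu = Cpu()
cpu.regs["r1"], cpu.regs["r2"] = 10, 20
cpu.pc = 4

cpu.call(target=100, save_regs=["r1"])
pc_in_callee = cpu.pc
cpu.regs["r1"] = 99  # the callee clobbers a saved register
cpu.regs["r2"] = 77  # and one that was not saved
cpu.ret()
```

After the return, `r1` holds its pre-call value while `r2` keeps the callee's value, which shows the saving is selective rather than a full context switch.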
System and Method for Distributed Data Processing
A distributed data processing system includes a processing center or algorithm persistence system (“APS”), a series of remote caching nodes in electronic communication with the APS, and a series of remote computing or processing nodes in electronic communication with the remote caching nodes. Each remote caching node is mounted to a top surface of a mobile vehicle and includes a data transmitter/receiver (transceiver), computer hardware and software to operate the caching node, and memory to transmit or transfer data from the APS to the remote processing nodes. The remote processing nodes include a series of electricity generating solar panels, a series of electronic data processing chips, electronic data memory, an electronic data transmitter/receiver (transceiver), and a motion sensor. The electronic data processing chips are preferably tensor processing units (TPUs), which are AI accelerator application-specific integrated circuits (ASICs) developed specifically for neural network machine learning.
PACKING CONDITIONAL BRANCH OPERATIONS
Disclosed in some examples are systems, methods, devices, and machine-readable mediums which use improved dynamic programming algorithms to pack conditional branch instructions. Conditional code branches may be modeled as directed acyclic graphs (DAGs) which have a topological ordering. These DAGs may be used to construct a dynamic programming table to find a partial mapping of one path onto the other path using dynamic programming algorithms.
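To give a feel for the dynamic programming table, the sketch below maps one branch path onto the other using a longest-common-subsequence table. This is a simplification and an assumption, not the disclosed algorithm: each path is a straight-line instruction list (a trivial DAG already in topological order), and two instructions are treated as mappable when their opcodes are equal.

```python
def map_paths(then_path, else_path):
    """Return index pairs (i, j) mapping then_path[i] onto else_path[j]."""
    n, m = len(then_path), len(else_path)
    # table[i][j] = size of the best mapping between then_path[i:] and else_path[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if then_path[i] == else_path[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover one optimal partial mapping.
    mapping = []
    i = j = 0
    while i < n and j < m:
        if then_path[i] == else_path[j]:
            mapping.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return mapping


then_path = ["load", "add", "store"]
else_path = ["load", "mul", "store"]
mapping = map_paths(then_path, else_path)
```

The mapping is partial: `add` and `mul` have no counterpart and would need separate handling when the two paths are packed.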
EFFICIENT PROCESSING OF NESTED LOOPS FOR COMPUTING DEVICE WITH MULTIPLE CONFIGURABLE PROCESSING ELEMENTS USING MULTIPLE SPOKE COUNTS
Disclosed in some examples are methods, systems, devices, and machine-readable mediums which provide for more efficient CGRA execution by assigning different initiation intervals to different PEs executing a same code base. The initiation intervals may be a multiple of each other, and the PE with the lowest initiation interval may be used to execute instructions of the code that are to be executed at a greater frequency than other instructions, which may be assigned to PEs with higher initiation intervals.
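The assignment idea can be sketched for a two-level loop nest. Everything here is illustrative: the PE names, the interval values, the execution counts, and the one-instruction-per-PE assignment are assumptions chosen to keep the example small.

```python
def intervals_are_multiples(pe_intervals):
    """Check that every initiation interval is a multiple of the smallest one."""
    smallest = min(pe_intervals.values())
    return all(ii % smallest == 0 for ii in pe_intervals.values())


def assign_instructions(instr_counts, pe_intervals):
    """Give the most frequently executed instructions to the lowest-interval PEs."""
    pes = sorted(pe_intervals, key=pe_intervals.get)
    instrs = sorted(instr_counts, key=instr_counts.get, reverse=True)
    return dict(zip(instrs, pes))


def fire_cycles(interval, horizon):
    """Cycles on which a PE with the given initiation interval starts work."""
    return list(range(0, horizon, interval))


# Inner loop body runs 16 times; the outer loop increment runs 4 times.
instr_counts = {"inner_mac": 16, "outer_inc": 4}
pe_intervals = {"pe0": 1, "pe1": 4}

valid = intervals_are_multiples(pe_intervals)
assignment = assign_instructions(instr_counts, pe_intervals)
outer_schedule = fire_cycles(pe_intervals[assignment["outer_inc"]], 16)
```

With these numbers the outer-loop PE starts once for every four inner-loop starts, matching the 4:1 ratio between the two execution counts.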