Patent classifications
G06F9/3812
METHOD AND DEVICE FOR PROVIDING A VECTOR STREAM INSTRUCTION SET ARCHITECTURE EXTENSION FOR A CPU
A method and device for providing a vector stream instruction set architecture extension for a CPU. In one aspect, there is provided a vector stream engine unit comprising: a first fast memory storage for temporarily storing data of vector data streams from a memory for loading into a vector register file; a second fast memory storage for temporarily storing data of the vector data streams from the vector register file for loading into the memory; a prefetcher configured to prefetch data of the vector data streams from the memory into the first fast storage memory, and prefetch data of the vector data streams from the vector register file into the second fast storage memory; and a stream configuration table (SCT) storing stream information for prefetching data from the vector data streams.
Method and apparatus for configuring a reduced instruction set computer processor architecture to execute a fully homomorphic encryption algorithm
Systems and methods for configuring a reduced instruction set computer processor architecture to execute fully homomorphic encryption (FHE) logic gates as a streaming topology. The method includes parsing sequential FHE logic gate code, transforming the FHE logic gate code into a set of code modules that each have in input and an output that is a function of the input and which do not pass control to other functions, creating a node wrapper around each code module, configuring at least one of the primary processing cores to implement the logic element equivalents of each element in a manner which operates in a streaming mode wherein data streams out of corresponding arithmetic logic units into the main memory and other ones of the plurality arithmetic logic units.
Accelerating AI training by an all-reduce process with compression over a distributed system
According to various embodiments, methods and systems are provided to accelerate artificial intelligence (AI) model training with advanced interconnect communication technologies and systematic zero-value compression over a distributed training system. According to an exemplary method, during each iteration of a Scatter-Reduce process performed on a cluster of processors arranged in a logical ring to train a neural network model, a processor receives a compressed data block from a prior processor in the logical ring, performs an operation on the received compressed data block and a compressed data block generated on the processor to obtain a calculated data block, and sends the calculated data block to a following processor in the logical ring. A compressed data block calculated from corresponding data blocks from the processors can be identified on each processor and distributed to each other processor and decompressed therein for use in the AI model training.
Method and Apparatus for Configuring a Reduced Instruction Set Computer Processor Architecture to Execute a Fully Homomorphic Encryption Algorithm
Systems and methods for configuring a reduced instruction set computer processor architecture to execute fully homomorphic encryption (FHE) logic gates as a streaming topology. The method includes parsing sequential FHE logic gate code, transforming the FHE logic gate code into a set of code modules that each have in input and an output that is a function of the input and which do not pass control to other functions, creating a node wrapper around each code module, configuring at least one of the primary processing cores to implement the logic element equivalents of each element in a manner which operates in a streaming mode wherein data streams out of corresponding arithmetic logic units into the main memory and other ones of the plurality arithmetic logic units.
SCALABLE TOGGLE POINT CONTROL CIRCUITRY FOR A CLUSTERED DECODE PIPELINE
Systems, methods, and apparatuses relating to circuitry to implement toggle point insertion for a clustered decode pipeline are described. In one example, a hardware processor core includes a first decode cluster comprising a plurality of decoder circuits, a second decode cluster comprising a plurality of decoder circuits, and a toggle point control circuit to toggle between sending instructions requested for decoding between the first decode cluster and the second decode cluster, wherein the toggle point control circuit is to: determine a location in an instruction stream as a candidate toggle point to switch the sending of the instructions requested for decoding between the first decode cluster and the second decode cluster, track a number of times a characteristic of multiple previous decodes of the instruction stream is present for the location, and cause insertion of a toggle point at the location, based on the number of times, to switch the sending of the instructions requested for decoding between the first decode cluster and the second decode cluster.
Information-Unit Based Scaling of an Ordered Event Stream
Scaling an ordered event stream (OES) based on an information-unit (IU) metric is disclosed. The IU metric can correspond to an amount of computing resources that can be consumed to access information embodied in event data of an event of the OES. In this regard, the amount of computing resources to access the data of the stream event itself can be distinct from an amount of computing resources employed to access information embodied in the data. As such, where an external application, e.g., a reader, a writer, etc., can connect to an OES data storage system, enabling the OES to be scaled in response to burdening of computing resources accessing event information, rather than merely event data, can aid in preservation of an ordering of events accessed from the OES.
METHODS AND APPARATUS FOR CONTEXT SWITCHING
Aspects of the present disclosure relate to apparatus comprising execution circuitry comprising at least one execution unit to execute program instructions, and control circuitry. The control circuitry receives a stream of processing instructions, and issues each received instruction to one of said at least one execution unit. Responsive to determining that a first type of context switch is to be performed from an initial context to a new context, issuing continues until a pre-emption point in the stream of processing instructions is reached. Responsive to reaching the pre-emption point, state information is stored, and the new context is switched to. Responsive to determining that a context switch is to be performed to return from the new context to the initial context, the processing status is restored from the state information, and the stream of processing instructions is continued.
Creation of Message Serializer for Event Streaming Platform
Processing logic may determine that an application is to produce one or more records to an event streaming platform. Processing logic may determine a data structure to contain content to be stored to the event streaming platform. Processing logic may automatically generate a serializer in view of the data structure during development of the application. During runtime, the application may use the serializer to serialize the content contained in the data structure and store the content to the one or more records of the event streaming platform.
Systems and methods for optimizing authentication branch instructions
Systems, apparatuses, and methods for efficient handling of subroutine epilogues. When an indirect control transfer instruction corresponding to a procedure return for a subroutine is identified, the return address and a signature are retrieved from one or more of a return address stack and the memory stack. An authenticator generates a signature based on at least a portion of the retrieved return address. While the signature is being generated, instruction processing speculatively continues. No instructions are permitted to commit yet. The generated signature is later compared to a copy of the signature generated earlier during the corresponding procedure call. A mismatch causes an exception.
Instruction generation process multiplexing method and device
Aspects of reusing neural network instructions are described herein. The aspects may include a computing device configured to calculate a hash value of a neural network layer based on the layer information thereof. A determination unit may be configured to determine whether the hash value exists in a hash table. If the hash value is included in the hash table, one or more neural network instructions that correspond to the hash value may be reused.