Patent classifications
G06F9/3555
AUTOMATIC COMPUTE KERNEL GENERATION
Apparatuses, systems, and techniques to receive, by a processor of a computer system, one or more operations for a kernel; automatically generate, by the processor, one or more operators that perform the one or more operations on elements of one or more input data structures; and automatically generate, by the processor, the kernel that comprises the one or more operators.
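A minimal behavioral sketch of the flow this abstract describes, assuming elementwise operations over NumPy arrays; the operation names and the generate_kernel helper are illustrative and not taken from the patent.

```python
import numpy as np

# Illustrative mapping from operation names to elementwise operators.
OPERATORS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "relu": lambda a: np.maximum(a, 0),
}

def generate_kernel(operations):
    """Compose a kernel from a list of operation names.

    Each operation is turned into an operator acting on elements of the
    input data structures, and the generated kernel chains those operators.
    """
    ops = [OPERATORS[name] for name in operations]

    def kernel(*inputs):
        result = ops[0](*inputs)
        for op in ops[1:]:
            result = op(result)
        return result

    return kernel

# Usage: generate a kernel that adds two arrays and applies ReLU.
k = generate_kernel(["add", "relu"])
print(k(np.array([-1.0, 2.0]), np.array([0.5, -3.0])))  # [0. 0.]
```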
Zero knowledge proof hardware accelerator and the method thereof
A hardware accelerator for accelerating the zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK) protocol by reducing the computation time of the cryptographic verification is disclosed. The accelerator includes a zk-SNARK engine having one or more processing units running in parallel. Each processing unit can include one or more multiply-accumulate (MAC) units, one or more fast Fourier transform (FFT) units, and one or more elliptic curve processor (ECP) units. The one or more ECP units are configured to reduce the bit-length of a scalar d_i in an ECP algorithm used for generating a proof, so that the cryptographic verification requires less computation power.
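The abstract does not say how the scalar's bit-length is reduced; one common technique is to reduce the scalar modulo the group order before scalar multiplication, which shortens the double-and-add loop. The sketch below illustrates only that idea, using addition modulo a prime as a stand-in for elliptic curve point addition; all names and parameters are illustrative.

```python
# Double-and-add scalar multiplication over a generic additive group; the
# group operation here is addition modulo a prime, standing in for elliptic
# curve point addition.  All parameters are illustrative.

p = 2**61 - 1          # order of the toy group (an EC group order in practice)

def add(a, b):
    return (a + b) % p

def double(a):
    return (2 * a) % p

def scalar_mul(d, P):
    """Left-to-right double-and-add; work grows with d.bit_length()."""
    R = 0
    for bit in bin(d)[2:]:
        R = double(R)
        if bit == "1":
            R = add(R, P)
    return R

def reduce_scalar(d, n=p):
    """d*P == (d mod n)*P when n is the group order, so reducing the
    scalar first shortens the double-and-add loop."""
    return d % n

d = 123456789 * p + 42             # an oversized scalar
P = 7
assert scalar_mul(d, P) == scalar_mul(reduce_scalar(d), P)
print(d.bit_length(), "->", reduce_scalar(d).bit_length())  # e.g. 88 -> 6 bits
```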
STOCHASTIC HYPERDIMENSIONAL ARITHMETIC COMPUTING
Stochastic hyperdimensional arithmetic computing is provided. Hyperdimensional computing (HDC) is a neurally inspired computation model based on the observation that the human brain operates on high-dimensional representations of data, called hypervectors. Although HDC is powerful for reasoning over and associating abstract information, it is weak at extracting features from complex data. Consequently, most existing HDC solutions rely on expensive pre-processing algorithms for feature extraction. This disclosure proposes StocHD, a novel end-to-end hyperdimensional system that supports accurate, efficient, and robust learning over raw data. StocHD expands HDC functionality into general computation by mathematically defining stochastic arithmetic over HDC hypervectors. This enables an entire learning application, including the feature extractor, to be processed using the HDC data representation, enabling uniform, efficient, robust, and highly parallel computation. The disclosure further provides a novel, fully digital and scalable processing-in-memory (PIM) architecture that exploits the memory-centric nature of HDC to support extensively parallel computation.
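The abstract does not define StocHD's stochastic arithmetic itself; the sketch below only shows the basic HDC primitives such a system builds on (random bipolar hypervectors, binding by elementwise multiplication, bundling by majority), with illustrative names and dimensionality.

```python
import numpy as np

D = 10_000                         # hypervector dimensionality
rng = np.random.default_rng(0)

def random_hv():
    """Random bipolar hypervector in {-1, +1}^D."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Binding (associates two hypervectors): elementwise multiplication."""
    return a * b

def bundle(*hvs):
    """Bundling (superposition): elementwise majority via the sign of the sum."""
    s = np.sum(hvs, axis=0)
    return np.where(s >= 0, 1, -1)

def similarity(a, b):
    """Normalized dot product; near 0 for unrelated hypervectors."""
    return a @ b / D

# Encode a toy record {feature_i: value_i} by binding feature and value
# hypervectors and bundling the pairs -- the style of representation an
# end-to-end HDC pipeline operates on.
features = {name: random_hv() for name in ("f0", "f1", "f2")}
values = {v: random_hv() for v in (0, 1)}
record = bundle(bind(features["f0"], values[1]),
                bind(features["f1"], values[0]),
                bind(features["f2"], values[1]))

# Querying: unbinding with f0 yields something similar to the hypervector for 1.
query = bind(record, features["f0"])
print(similarity(query, values[1]), similarity(query, values[0]))
```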
COMPUTING ARCHITECTURE
The computing architecture comprises an off-chip memory, an on-chip cache unit, a prefetching unit, a global scheduler, a transmitting unit, a pre-recombination network, a post-recombination network, a main computing array, a write-back cache unit, a data dependence controller, and an auxiliary computing array. The architecture prefetches data tiles into the on-chip cache and performs the computation tile by tile; during tile processing, a tile exchange network recombines the data structure, and a data dependence module handles any data dependences that may exist between different tiles. The architecture increases data utilization and data processing flexibility, thereby reducing cache misses and memory bandwidth pressure.
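A software analogy of the prefetch-and-compute-by-tiles idea, using a double-buffered loop; the tile size, the compute step (a tiled matrix product), and the helper names are illustrative stand-ins for the hardware units the abstract enumerates.

```python
import numpy as np

TILE = 64                                    # illustrative tile size

def tiles(matrix, tile=TILE):
    """Yield row tiles, emulating prefetching data tiles into an on-chip cache."""
    for i in range(0, matrix.shape[0], tile):
        yield matrix[i:i + tile]

def process(a, b):
    """Compute over a tile; a tiled matmul stands in for the main computing array."""
    return a @ b

def tiled_compute(A, B):
    """Double-buffered loop: while one tile is being processed, the next
    one would already be prefetched (here the generator plays that role)."""
    out = []
    stream = tiles(A)
    current = next(stream, None)
    while current is not None:
        nxt = next(stream, None)             # "prefetch" the next tile
        out.append(process(current, B))      # compute on the current tile
        current = nxt
    return np.vstack(out)

A = np.random.rand(256, 128)
B = np.random.rand(128, 32)
assert np.allclose(tiled_compute(A, B), A @ B)
```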
METHOD OF EXECUTING OPERATION, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
A method of executing an operation in deep learning training, an electronic device, and a computer-readable storage medium, which relate to the field of artificial intelligence, in particular deep learning. The method of executing an operation in deep learning training includes: acquiring an instruction for the operation, the operation including a plurality of vector operations; determining, for each vector operation of the plurality of vector operations, two source operand vectors for a comparison; and executing the vector operation on the two source operand vectors using an instruction format for the vector operation, so as to obtain an operation result including a destination operand vector.
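The abstract leaves the comparison unspecified; a plausible reading is an elementwise compare-and-select (e.g., maximum) of the two source operand vectors written to the destination operand vector, sketched behaviorally below with illustrative names.

```python
import numpy as np

def vector_compare_select(src1, src2, op="max"):
    """Behavioral model of one vector operation: compare the two source
    operand vectors elementwise and write the selected elements to the
    destination operand vector."""
    if op == "max":
        return np.where(src1 >= src2, src1, src2)
    if op == "min":
        return np.where(src1 <= src2, src1, src2)
    raise ValueError(op)

def execute(instruction):
    """Execute an instruction made up of a plurality of vector operations."""
    return [vector_compare_select(a, b, op) for op, a, b in instruction]

# An "instruction" with two vector operations, each on two source operand vectors.
inst = [("max", np.array([1.0, 5.0, -2.0]), np.array([3.0, 4.0, 0.0])),
        ("min", np.array([1.0, 5.0, -2.0]), np.array([3.0, 4.0, 0.0]))]
print(execute(inst))   # [array([3., 5., 0.]), array([ 1.,  4., -2.])]
```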
Method and Device for Detecting Dysfunction of Vehicle Embedded Computer
The present disclosure concerns a method to train, on a computing device, a machine-learning model adapted to determine a dysfunction of a monitored vehicle electronic control unit (ECU) or vehicle embedded computer. In aspects, the computing device stores, in a memory, historical data from a plurality of ECUs having a dysfunction. The historical data may include usage values over a period of time of at least one ECU resource by applications running on the ECUs. Further, the computing device may process the historical data to obtain two-dimensional training files. In implementations, each usage value may be linked with a specific application in a first dimension and a specific time in a second dimension. Still further, the computing device may train a machine-learning model with the training files.
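A sketch of turning per-ECU usage logs into the two-dimensional training files the abstract describes (applications along one dimension, time along the other) and fitting an off-the-shelf classifier; the data layout, application names, and model choice are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

APPS = ["nav", "media", "diag"]          # applications running on the ECU
STEPS = 24                               # time steps in the observation window

def to_training_file(usage_log):
    """Build a 2-D array: rows = applications, columns = time steps,
    values = resource usage (e.g. CPU %) of that application at that time."""
    grid = np.zeros((len(APPS), STEPS))
    for app, t, value in usage_log:
        grid[APPS.index(app), t] = value
    return grid

# Illustrative historical data: (usage_log, had_dysfunction) per ECU.
rng = np.random.default_rng(1)
logs = [([(a, t, rng.uniform(0, 100)) for a in APPS for t in range(STEPS)],
         int(rng.random() < 0.3))
        for _ in range(200)]

X = np.stack([to_training_file(log).ravel() for log, _ in logs])
y = np.array([label for _, label in logs])

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(model.predict(X[:3]))
```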
Ordered event stream event retention
Retention of events of an ordered event stream is disclosed. Expiration of events stored in a segment of an ordered event stream (OES) can be desirable: new events are added to the head of an OES segment, and pruning events from its tail can be valuable. Processing applications can register a processing scheme for a segment, e.g., at-least-once processing, exactly-once processing, etc., and can generate checkpoints indicating how far they have advanced in processing events of the segment. The ordered event stream can determine a cut point indicating a progress point before which events of an OES can be marked as ready for expiration. However, events that are marked for expiration can still be retained to allow processing based on a checkpoint, e.g., expiration of an event can be refused until there is an assurance the event was read by the processing application.
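A minimal sketch of the cut-point rule described above: an event becomes eligible for expiration only once every registered reader's checkpoint has advanced past it, so the slowest checkpoint bounds what may be pruned. The segment and checkpoint representations are illustrative.

```python
def cut_point(checkpoints):
    """Events strictly before this position may be marked ready for expiration:
    the slowest registered reader's checkpoint bounds what can be pruned."""
    return min(checkpoints.values()) if checkpoints else 0

def expire(segment, checkpoints):
    """Retain any event a reader might still need; prune the rest from the tail."""
    cut = cut_point(checkpoints)
    retained = [e for e in segment if e["pos"] >= cut]
    expired = [e for e in segment if e["pos"] < cut]
    return retained, expired

segment = [{"pos": p, "data": f"event-{p}"} for p in range(10)]
checkpoints = {"app-A": 7, "app-B": 4}      # per-application processing progress
retained, expired = expire(segment, checkpoints)
print([e["pos"] for e in expired])          # [0, 1, 2, 3] -- safe to prune
print([e["pos"] for e in retained])         # [4, ..., 9] -- still retained
```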
Autoscaling in an elastic cloud service
Techniques described herein can optimize usage of computing resources in a data system. Dynamic throttling can be performed locally on a computing resource in the foreground and autoscaling can be performed in a centralized fashion in the background. Dynamic throttling can lower the load without overshooting while minimizing oscillation and reducing the throttle quickly. Autoscaling may involve scaling in or out the number of computing resources in a cluster as well as scaling up or down the type of computing resources to handle different types of situations.
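A sketch of the split the abstract describes: a fast, local throttle that backs off under load without overshooting and releases quickly, alongside a slower background autoscaler that changes cluster size. The thresholds and update rules below are illustrative assumptions, not values from the patent.

```python
def update_throttle(throttle, load, target=0.8):
    """Local, foreground throttling: back off gently when over target to
    avoid overshoot and oscillation, release quickly when load drops."""
    if load > target:
        return min(1.0, throttle + 0.05)     # small steps up -> no overshoot
    return max(0.0, throttle - 0.20)         # large steps down -> quick release

def autoscale(num_nodes, avg_load, low=0.3, high=0.85):
    """Background autoscaling: scale the cluster out or in on sustained load."""
    if avg_load > high:
        return num_nodes + 1                 # scale out
    if avg_load < low and num_nodes > 1:
        return num_nodes - 1                 # scale in
    return num_nodes

throttle, nodes = 0.0, 4
for load in (0.95, 0.9, 0.7, 0.2, 0.2):
    throttle = update_throttle(throttle, load)
    nodes = autoscale(nodes, load)
    print(f"load={load:.2f} throttle={throttle:.2f} nodes={nodes}")
```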
Initialization of parameters for machine-learned transformer neural network architectures
An online system trains a transformer architecture with an initialization method that allows the transformer architecture to be trained without normalization layers or learning rate warmup, resulting in significant improvements in computational efficiency for transformer architectures. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates a set of attention representations by applying a key matrix to the input key, a query matrix to the input query, and a value matrix to the input value to generate an output, and then applying an output matrix to the output to generate the set of attention representations. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix by a factor that is inverse to the number of encoders or the number of decoders.
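A sketch of the initialization idea: initialize an attention block's key, query, value, and output projections normally, then rescale the value and output matrices by a factor that shrinks with the number of layers. The exact form of the factor (1/num_layers below), the dimensions, and the single-head attention model are assumptions for illustration.

```python
import numpy as np

def init_attention_block(d_model, num_layers, rng):
    """Initialize one attention block's projections, then rescale the value
    and output matrices by a factor inversely related to the layer count
    (1/num_layers here; the exact factor is an assumption)."""
    std = d_model ** -0.5
    W_k = rng.normal(0, std, (d_model, d_model))
    W_q = rng.normal(0, std, (d_model, d_model))
    W_v = rng.normal(0, std, (d_model, d_model)) / num_layers
    W_o = rng.normal(0, std, (d_model, d_model)) / num_layers
    return W_k, W_q, W_v, W_o

def attention(x, W_k, W_q, W_v, W_o):
    """Single-head self-attention in the style the abstract describes:
    key/query/value projections produce an output, then the output matrix
    produces the attention representations."""
    K, Q, V = x @ W_k, x @ W_q, x @ W_v
    scores = Q @ K.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ V) @ W_o

rng = np.random.default_rng(0)
blocks = [init_attention_block(64, num_layers=12, rng=rng) for _ in range(12)]
x = rng.normal(size=(8, 64))
for W in blocks:                     # residual stream stays well-scaled at init
    x = x + attention(x, *W)
print(x.std())
```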
Vector convert hexadecimal floating point to scaled decimal instruction
An instruction to perform converting and scaling operations is provided. Execution of the instruction includes converting an input value in one format to provide a converted result in another format. The converted result is scaled to provide a scaled result. A result obtained from the scaled result is placed in a selected location. Further, an instruction to perform scaling and converting operations is provided. Execution of the instruction includes scaling an input value in one format to provide a scaled result and converting the scaled result from the one format to provide a converted result in another format. A result obtained from the converted result is placed in a selected location.
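A behavioral sketch of the two operations described above: convert a floating-point input to decimal and scale it by a power of ten (convert-then-scale), and the companion scale-then-convert. Python's Decimal stands in for the machine's hexadecimal floating-point and scaled decimal formats, and the rounding mode is an illustrative assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

def convert_then_scale(value: float, scale: int) -> int:
    """Convert a floating-point input to decimal, scale by 10**scale, and
    return the integer scaled-decimal result (the 'selected location' in
    the abstract would be a vector register element)."""
    converted = Decimal(value)                       # convert: float -> decimal
    scaled = converted.scaleb(scale)                 # scale by a power of ten
    return int(scaled.to_integral_value(rounding=ROUND_HALF_UP))

def scale_then_convert(value: Decimal, scale: int) -> float:
    """The companion operation: scale first in the source format, then
    convert the scaled result to the target format."""
    return float(value.scaleb(scale))

print(convert_then_scale(3.14159, 4))             # 31416
print(scale_then_convert(Decimal("31416"), -4))   # 3.1416
```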