Patent classifications
G06F9/463
DEVICES AND METHODS FOR PARALLELIZED RECURSIVE BLOCK DECODING
A decoder for determining an estimate of a vector of information symbols carried by a signal received through a transmission channel represented by a channel matrix is provided. The decoder includes a block division unit configured to divide the vector of information symbols into two or more sub-vectors, each sub-vector being associated with a block level, and two or more processors configured to determine, in parallel, candidate sub-vectors and to store the candidate sub-vectors in a first stack. Each processor is configured to determine at least one candidate sub-vector by applying a symbol estimation algorithm and to store each candidate sub-vector together with a decoding metric and the block level associated with that candidate sub-vector, the decoding metric being lower than or equal to a decoding metric threshold. A processor among the two or more processors is configured to determine at least one candidate vector from the candidate sub-vectors stored in the first stack, the candidate vector being associated with a cumulated decoding metric, and to update the decoding metric threshold from the cumulated decoding metric.
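A toy sketch of the scheme described above: the information vector is split into per-level sub-vectors, worker "processors" fill a shared first stack with candidate sub-vectors whose metric stays under a threshold, and full candidate vectors assembled from the stack tighten that threshold. The BPSK alphabet, the squared-distance metric, and all names are illustrative assumptions, not the patented algorithm.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

ALPHABET = (-1.0, 1.0)          # toy symbol constellation

def candidates_for_level(level, received_block, threshold):
    """Enumerate sub-vectors for one block level (stands in for a
    symbol-estimation algorithm such as sphere decoding)."""
    out = []
    for sub in product(ALPHABET, repeat=len(received_block)):
        metric = sum((r - s) ** 2 for r, s in zip(received_block, sub))
        if metric <= threshold:                 # keep only promising candidates
            out.append((metric, level, sub))
    return out

def decode(received, block_size=2, threshold=4.0):
    blocks = [received[i:i + block_size] for i in range(0, len(received), block_size)]
    first_stack = []
    with ThreadPoolExecutor() as pool:          # the "two or more processors"
        for cands in pool.map(candidates_for_level, range(len(blocks)), blocks,
                              [threshold] * len(blocks)):
            first_stack.extend(cands)
    best, best_metric = None, threshold * len(blocks)
    # Combine one candidate per level into full vectors; the cumulated
    # metric of each full vector updates the decoding-metric threshold.
    per_level = [[c for c in first_stack if c[1] == lvl] for lvl in range(len(blocks))]
    for combo in product(*per_level):
        cumulated = sum(m for m, _, _ in combo)
        if cumulated < best_metric:
            best_metric = cumulated             # tightened threshold
            best = tuple(s for _, _, sub in combo for s in sub)
    return best, best_metric
```

For example, `decode([0.9, -1.1, 1.2, 0.8])` recovers the symbol vector `(1, -1, 1, 1)` with a small cumulated metric.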
Pre-instruction scheduling rematerialization for register pressure reduction
Examples are disclosed herein that relate to performing rematerialization operation(s) on program source code prior to instruction scheduling. In one example, a method includes, prior to performing instruction scheduling on program source code and for each basic block of the program source code: determining a register pressure at a boundary of the basic block; determining whether the register pressure at the boundary is greater than a target register pressure; based on the register pressure at the boundary being greater than the target register pressure, identifying one or more candidate instructions in the basic block suitable for rematerialization to reduce the register pressure at the boundary; and performing a rematerialization operation on at least one of the one or more candidate instructions to reduce the register pressure at the boundary to be less than the target register pressure.
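A simplified sketch of such a pre-scheduling rematerialization pass. The toy IR, the liveness model, and the "cheap to recompute" test are illustrative assumptions: values live across the block boundary that can be recomputed from constants are dropped from the live set (freeing a register) and re-emitted at their uses.

```python
CHEAP_OPS = {"const"}                      # trivially recomputable instructions

def register_pressure(live_out):
    """Toy model: pressure at a boundary = number of live values."""
    return len(live_out)

def rematerialize(block, live_out, target_pressure):
    """block: list of (dest, op, operands); live_out: values live at the
    block boundary. Returns (instructions to re-emit in successors,
    new live-out set) after reducing pressure toward the target."""
    defs = {dest: (op, operands) for dest, op, operands in block}
    remat, live = [], set(live_out)
    for value in sorted(live_out):
        if register_pressure(live) <= target_pressure:
            break                           # pressure already at target
        op_operands = defs.get(value)
        if op_operands and op_operands[0] in CHEAP_OPS:
            live.discard(value)             # no longer occupies a register
            remat.append((value, *op_operands))
    return remat, live
```

With a block defining two constants and one sum, a target pressure of 1 causes both constants to be rematerialized while the sum stays live.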
Partial order procedure planning device, partial order procedure planning method and partial order procedure planning program
A partial order procedure planning device 10 is provided with: a first generation unit 11 which generates a first condition of an order relationship that is removable, under a predetermined restriction, from among the order relationships between operations in a serial procedure in which a plurality of operations, which transition the state of a state element from an initial state to a target state, are arranged in series; a second generation unit 12 which generates a second condition of an order relationship that is required to satisfy a transient requirement the state element must meet while the state is transitioned from the initial state to the target state; and a determination unit 13 which determines, as the order relationship to be deleted from the serial procedure, an order relationship which satisfies the generated first condition but does not satisfy the generated second condition.
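An illustrative sketch of that deletion rule: from a serial procedure, build the set of order relations removable under a restriction (first condition) and the set that must be kept for a transient requirement (second condition), then delete the relations in the first set but not the second. The "operations on disjoint state elements are reorderable" restriction is an assumption chosen for the example.

```python
def order_relations(serial):
    """All (earlier, later) pairs implied by a serial procedure."""
    return {(serial[i], serial[j])
            for i in range(len(serial)) for j in range(i + 1, len(serial))}

def plan_partial_order(serial, touches, transient_pairs):
    """serial: operations in order; touches: op -> state elements it writes;
    transient_pairs: order relations a transient requirement forces to keep."""
    relations = order_relations(serial)
    # First condition: relations between operations on disjoint state elements
    # are removable under the restriction.
    first = {(a, b) for a, b in relations if not (touches[a] & touches[b])}
    # Second condition: relations required by the transient requirement.
    second = set(transient_pairs)
    to_delete = first - second          # first condition holds, second does not
    return relations - to_delete
```

For a procedure `stop_app -> patch_db -> start_app` where the app must stay down while the database is patched, only the `patch_db -> start_app` ordering is deleted, leaving the patch free to run any time before the restart.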
Artificial intelligence chip and instruction execution method for artificial intelligence chip
Embodiments of the present disclosure disclose an artificial intelligence chip and an instruction execution method for an artificial intelligence chip. A specific embodiment of the artificial intelligence chip includes: an instruction memory, a data memory, at least one general execution unit, and at least one dedicated execution unit. The instruction memory is configured to: receive a kernel code including at least one code block. The general execution unit is configured to: receive the code block, lock the dedicated execution unit associated with the received code block, and send an instruction in the received code block to the locked dedicated execution unit. The dedicated execution unit is configured to: execute the received instruction, and store an execution result in the data memory. The data memory is configured to: store the execution result sent by the dedicated execution unit.
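A minimal concurrency sketch of that dispatch scheme: a general execution unit locks the dedicated execution unit its code block targets, streams the block's instructions to it, and the dedicated unit writes results into shared data memory. The unit names and the toy instruction format are illustrative assumptions.

```python
import threading

class DedicatedUnit:
    """A dedicated execution unit writing results to shared data memory."""
    def __init__(self, name, data_memory):
        self.name, self.lock, self.mem = name, threading.Lock(), data_memory

    def execute(self, instr):
        op, dst, a, b = instr                     # toy ISA: (op, dst, a, b)
        self.mem[dst] = a + b if op == "add" else a * b

def run_code_block(block, units):
    """A general execution unit: lock the associated dedicated unit,
    then send it the block's instructions one by one."""
    unit = units[block["unit"]]
    with unit.lock:                               # lock the dedicated unit
        for instr in block["instructions"]:
            unit.execute(instr)                   # send instruction

def run_kernel(code_blocks, units):
    threads = [threading.Thread(target=run_code_block, args=(b, units))
               for b in code_blocks]              # one general unit per block
    for t in threads: t.start()
    for t in threads: t.join()
```

Two code blocks targeting different dedicated units then execute concurrently, while blocks targeting the same unit serialize on its lock.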
PROCESSING PIPELINE WITH ZERO LOOP OVERHEAD
Techniques are disclosed for reducing or eliminating loop overhead caused by function calls in processors that form part of a pipeline architecture. The processors in the pipeline process data blocks in an iterative fashion, with each processor in the pipeline completing one of several iterations associated with a processing loop for a commonly-executed function. The described techniques leverage the use of message passing for pipelined processors to enable an upstream processor to signal to a downstream processor when processing has been completed, and thus a data block is ready for further processing in accordance with the next loop processing iteration. The described techniques facilitate a zero loop overhead architecture, enable continuous data block processing, and allow the processing pipeline to function indefinitely within the main body of the processing loop associated with the commonly-executed function where efficiency is greatest.
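A sketch of the message-passing handoff described above: each pipeline stage performs one iteration of the commonly-executed loop body on a data block, then signals the next stage that the block is ready, so no stage spends cycles on loop bookkeeping. Queues stand in for the inter-processor message channels; the sentinel-based shutdown is an assumption for the example.

```python
import queue
import threading

def stage(body, inbox, outbox):
    """One pipelined processor: wait for the upstream ready message,
    run one loop iteration, signal downstream."""
    while True:
        block = inbox.get()          # upstream signals: block is ready
        if block is None:            # shutdown sentinel
            outbox.put(None)
            return
        outbox.put(body(block))      # signal downstream after processing

def run_pipeline(blocks, body, iterations):
    channels = [queue.Queue() for _ in range(iterations + 1)]
    workers = [threading.Thread(target=stage,
                                args=(body, channels[i], channels[i + 1]))
               for i in range(iterations)]      # one processor per iteration
    for w in workers:
        w.start()
    for b in blocks:                 # blocks stream in continuously
        channels[0].put(b)
    channels[0].put(None)
    results = []
    while (out := channels[-1].get()) is not None:
        results.append(out)
    for w in workers:
        w.join()
    return results
```

With three stages each doubling its input, every block passes through three loop iterations without any stage executing loop-control overhead between blocks.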
Deriving component statistics for a stream enabled application
A technique for generating component usage statistics involves associating components with blocks of a stream-enabled application. When the stream-enabled application is executed, block requests may be logged by Block ID in a log. The frequency of component use may be estimated by analyzing the block request log together with the block associations.
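A small sketch of that estimation step: block requests logged by Block ID are joined with the block-to-component associations to estimate how often each component was used. The data shapes are assumptions for the example.

```python
from collections import Counter

def component_usage(block_log, block_to_components):
    """block_log: iterable of requested Block IDs, in request order.
    block_to_components: Block ID -> set of component names.
    Returns an estimated use count per component."""
    usage = Counter()
    for block_id in block_log:
        for component in block_to_components.get(block_id, ()):
            usage[component] += 1          # each request counts toward its components
    return usage
```

A block requested twice contributes two uses to every component associated with it, which is why the result is an estimate rather than an exact measurement of component invocations.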
Initialization of Parameters for Machine-Learned Transformer Neural Network Architectures
An online system trains a transformer architecture by an initialization method which allows the transformer architecture to be trained without normalization layers or learning rate warmup, resulting in significant improvements in computational efficiency for transformer architectures. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates the set of attention representations by applying a key matrix to the input key, a query matrix to the input query, and a value matrix to the input value to generate an output, and applying an output matrix to the output to generate the set of attention representations. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix with a factor that is inverse to the number of encoders or the number of decoders.
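A NumPy sketch of the attention block and the initialization scaling described above: key, query, and value projections plus an output projection, with the value and output matrices scaled down by a factor inverse to the layer count. The base Gaussian initialization, the softmax details, and the exact form of the scaling factor are illustrative assumptions.

```python
import numpy as np

def init_attention(d_model, num_layers, rng):
    """Initialize one attention block; W_v and W_o are scaled by a factor
    inverse to the number of encoder (or decoder) layers."""
    W_k = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    W_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    scale = 1.0 / num_layers          # factor inverse to the layer count
    W_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) * scale
    W_o = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) * scale
    return W_k, W_q, W_v, W_o

def attention(x_q, x_k, x_v, W_k, W_q, W_v, W_o):
    """Scaled dot-product attention with a final output projection."""
    k, q, v = x_k @ W_k, x_q @ W_q, x_v @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)   # softmax over keys
    return (weights @ v) @ W_o        # output matrix applied last
```

Shrinking only the value and output paths damps how much each residual branch perturbs the stream at initialization, which is the intuition behind training stably without normalization layers or warmup.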
Method and apparatus for execution of neural network
The present disclosure relates to methods and apparatuses for execution of a neural network. An exemplary method can be implemented by a processing unit. The processing unit can include a command parser configured to dispatch commands and computing tasks and at least one core communicatively coupled with the command parser and configured to process the dispatched computing task. Each core can include a convolution unit, a pooling unit, at least one operation unit and a sequencer communicatively coupled with the convolution unit, the pooling unit, and the at least one operation unit and configured to distribute instructions of the dispatched computing task to the convolution unit, the pooling unit, and the at least one operation unit for execution. The method can include: reading, by the convolution unit, data from a local memory of the at least one operation unit; performing, by the convolution unit, a convolution operation on the data to generate a feature map; and performing, by the pooling unit, a pooling operation on the feature map.
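A toy NumPy sketch of that core execution path: the convolution unit reads data from the operation unit's local memory, produces a feature map, and the pooling unit downsamples it. Modeling the units as plain functions, the "valid" convolution, and the 2x2 max pool are illustrative choices.

```python
import numpy as np

def convolution_unit(local_memory, key, kernel):
    """Read data from local memory and produce a feature map
    via a 'valid' 2-D convolution."""
    data = local_memory[key]                      # read from local memory
    kh, kw = kernel.shape
    h, w = data.shape[0] - kh + 1, data.shape[1] - kw + 1
    fmap = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            fmap[i, j] = np.sum(data[i:i + kh, j:j + kw] * kernel)
    return fmap

def pooling_unit(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))
```

On a 4x4 input with an all-ones 2x2 kernel, the convolution yields a 3x3 feature map of window sums, and the pooling unit reduces it to the maximum of its top-left 2x2 region.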
Method and Apparatus of Providing a Function as a Service (FAAS) Deployment of an Application
A method and an apparatus (80) of providing a function as a service (FaaS) deployment of an application are disclosed. A deployment unit is generated (30, 44, 508) per group of application blocks, where said deployment unit comprises said group of application blocks and an implementation of function invocation for functions being accessed by groups of application blocks. Function invocations of the group of application blocks are constrained or bound (604, 610, 612) to libraries of supporting implementations. Deployment units are provided (32, 48, 510), together with the function invocations attached to said libraries, to a lifecycle manager of a FaaS platform, whereby the FaaS platform implements the FaaS deployment of said application, the performance targets of which are related to the groups of application blocks. This disclosure enables a developer to adjust the performance of an application without having to change the logic of application implementations.
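An illustrative sketch of the packaging step described above: one deployment unit is generated per group of application blocks, the function invocations made by the group are bound to libraries of supporting implementations, and the units are handed to a FaaS platform's lifecycle manager. All names and data shapes are assumptions for the example.

```python
def build_deployment_units(groups, invoked_functions, function_library):
    """groups: group name -> list of application blocks.
    invoked_functions: group name -> functions the group's blocks call.
    function_library: function name -> supporting library implementation."""
    units = []
    for name, blocks in groups.items():
        bindings = {fn: function_library[fn]       # bind invocation to library
                    for fn in invoked_functions.get(name, ())}
        units.append({"group": name, "blocks": blocks, "bindings": bindings})
    return units

def deploy(units, lifecycle_manager):
    """Hand each deployment unit, with its attached bindings,
    to the platform's lifecycle manager."""
    for unit in units:
        lifecycle_manager(unit)
```

Because performance-relevant choices live in the per-group library bindings, swapping a binding retunes a group without touching the application logic inside its blocks.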
Method and process of creating qualifiable parameter data item (PDI) to define the function of a power system controller
A method and system of designing control logic for an avionics system, the method and system including: receiving a functional requirement defining a desired control logic for a desired control system; designing, by a user in a user interface (UI) of a toolset, the desired control logic comprising an arrangement of predefined library blocks to enable the functional requirement in the desired control system; and generating, by the toolset, a data file representative of the desired control logic to enable the functional requirement during run-time operation in the avionics system.
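A minimal sketch of the toolset's generation step: a user's arrangement of predefined library blocks is validated against the block library and serialized into a data file (here JSON) that a controller could load at run time. The block vocabulary and the file schema are illustrative assumptions, not a qualified PDI format.

```python
import json

LIBRARY_BLOCKS = {"AND", "OR", "NOT", "TIMER"}     # predefined library blocks

def generate_pdi(arrangement):
    """arrangement: list of {"id", "type", "inputs"} dicts wired by id.
    Returns the data file contents representing the control logic."""
    for block in arrangement:
        if block["type"] not in LIBRARY_BLOCKS:    # only library blocks allowed
            raise ValueError(f"unknown library block: {block['type']}")
    return json.dumps({"version": 1, "logic": arrangement}, indent=2)
```

Restricting the arrangement to predefined, already-verified library blocks is what lets the generated data file, rather than new control code, carry the function of the controller.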