Patent classifications
G06F2209/484
Enhancing processing performance of a DNN module by bandwidth control of fabric interface
An exemplary computing environment having a DNN module can maintain one or more bandwidth throttling mechanisms. Illustratively, a first throttling mechanism can specify the number of cycles to wait between transactions on a cooperating fabric component (e.g., a data bus). Illustratively, a second throttling mechanism can be a transaction count limiter that operatively sets a threshold on the number of transactions to be processed during a given transaction sequence and limits the number of transactions in flight so that it does not exceed the set threshold. In an illustrative operation, executing these two exemplary calculated throttling parameters limits both the average bandwidth usage and the peak bandwidth usage. Operatively, with this fabric bandwidth control, the processing units of the DNN are optimized to process data across each transaction cycle, resulting in enhanced processing performance and lower power consumption.
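A minimal sketch of the two throttling mechanisms described above: a minimum cycle gap between fabric transactions and a cap on transactions in flight. All names and values are illustrative, not taken from the patent.

```python
class FabricThrottle:
    """Illustrative fabric-interface throttle (not the patented design)."""

    def __init__(self, wait_cycles: int, max_in_flight: int):
        self.wait_cycles = wait_cycles      # cycles to wait between transactions
        self.max_in_flight = max_in_flight  # transaction count limiter threshold
        self.last_issue_cycle = -wait_cycles
        self.in_flight = 0

    def can_issue(self, cycle: int) -> bool:
        gap_ok = cycle - self.last_issue_cycle >= self.wait_cycles
        count_ok = self.in_flight < self.max_in_flight
        return gap_ok and count_ok

    def issue(self, cycle: int) -> None:
        assert self.can_issue(cycle)
        self.last_issue_cycle = cycle
        self.in_flight += 1

    def complete(self) -> None:
        self.in_flight -= 1                 # a transaction left the fabric
```

Together, the cycle gap bounds average bandwidth and the in-flight cap bounds peak bandwidth, which is the effect the abstract attributes to the two calculated throttling parameters.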
SYSTEM AND METHOD FOR BATCH EVALUATION PROGRAMS
A batching module prepares a plurality of blocked expressions for batch evaluation. The plurality of blocked expressions comprises a plurality of expressions in a blocked state. The batching module divides the plurality of blocked expressions into one or more partitions. For each particular partition of the one or more partitions, a single batch processing call is dispatched to an application server to perform a batch evaluation.
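As a rough sketch of this partition-and-dispatch pattern, assuming a fixed partition size and a stand-in `evaluate_batch` callable for the application-server call (both hypothetical):

```python
from typing import Callable, List, Sequence

def dispatch_batches(
    blocked_expressions: Sequence[object],
    partition_size: int,
    evaluate_batch: Callable[[List[object]], List[object]],
) -> List[object]:
    results: List[object] = []
    # One batch processing call per partition, instead of one call
    # per blocked expression.
    for start in range(0, len(blocked_expressions), partition_size):
        partition = list(blocked_expressions[start:start + partition_size])
        results.extend(evaluate_batch(partition))
    return results
```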
Systems and methods for data processing
A method for data processing is provided. The method may include: preprocessing initial data to obtain preprocessed data; storing the preprocessed data; receiving a data request made through an application, the data request including information relating to a storage path of the requested contents; in response to the data request, determining, by a nearby proxy of a first proxy cluster in a first region, whether the contents requested in the data request are cached locally; and, in response to a determination that the contents are cached locally, providing, by the nearby proxy, the contents to the application; or, in response to a determination that the contents are not cached locally, acquiring, by the nearby proxy, the contents based on the information relating to the storage path of the contents, and providing, by the nearby proxy, the contents to the application.
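A minimal sketch of the nearby-proxy flow, assuming a dict-backed local cache and a hypothetical `fetch_from_storage` callable that resolves the storage path carried in the request:

```python
class NearbyProxy:
    """Illustrative cache-or-fetch proxy; names are assumptions."""

    def __init__(self, fetch_from_storage):
        self.cache = {}                       # locally cached contents
        self.fetch_from_storage = fetch_from_storage

    def handle_request(self, storage_path: str) -> bytes:
        # Contents cached locally: provide them to the application.
        if storage_path in self.cache:
            return self.cache[storage_path]
        # Otherwise acquire them via the storage path, then cache them.
        contents = self.fetch_from_storage(storage_path)
        self.cache[storage_path] = contents
        return contents
```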
PROCESSOR FOR PERFORMING A PREDETERMINED COMPUTATIONAL OPERATION, AND PROCESSING UNIT
A processor for performing a predetermined computational operation in which one or multiple data elements are used to determine a result. The processor includes one or more processor cores and at least one buffer memory that is connectable to a main memory and, when the main memory is connected, is designed to access it. Each processor core is designed to execute instructions. The at least one buffer memory includes a calculation circuit which is designed to perform the computational operation in response to an execution signal if the one or multiple data elements are stored in the buffer memory, the result being stored in the buffer memory. The processor is designed to perform the computational operation either on one of the processor cores with the aid of the instructions or in the at least one buffer memory using the respective calculation circuit.
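A conceptual sketch of that dispatch choice, with invented names: run the operation in the buffer memory's calculation circuit when the operands are already resident there, otherwise fall back to ordinary instruction execution on a core.

```python
class BufferMemory:
    """Buffer memory with an attached calculation circuit (illustrative)."""

    def __init__(self):
        self.data = {}  # data elements currently resident in the buffer

    def holds(self, names) -> bool:
        return all(n in self.data for n in names)

    def compute(self, op, names):
        # Calculation circuit: operate on resident elements in response
        # to an execution signal; keep the result in the buffer memory.
        result = op(*(self.data[n] for n in names))
        self.data["result"] = result
        return result

def run_operation(op, operand_names, buf: BufferMemory, core_execute):
    if buf.holds(operand_names):
        return buf.compute(op, operand_names)  # in-buffer execution
    return core_execute(op, operand_names)     # core executes instructions
```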
Low-cost task specific device scheduling system
A low-cost, task-specific device system for scheduling tasks that are performed by one or more devices is described. After a preceding task is performed, the performance of a successive task is delayed for a task-specific recharge interval associated with the preceding task. The successive task is performed only after the task-specific recharge interval has expired.
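A hedged sketch of the recharge-interval rule: after a preceding task runs, the successive task waits out an interval keyed to the preceding task. The task names and interval values below are invented for illustration.

```python
import time

# Task-specific recharge intervals in seconds (illustrative values).
RECHARGE_INTERVAL = {"sample": 0.5, "transmit": 2.0}

def run_sequence(tasks):
    """tasks: list of (name, action) pairs run in order."""
    previous = None
    for name, action in tasks:
        if previous is not None:
            # Delay is keyed to the *preceding* task, not the current one.
            time.sleep(RECHARGE_INTERVAL.get(previous, 0.0))
        action()
        previous = name

run_sequence([("sample", lambda: print("sampled")),
              ("transmit", lambda: print("transmitted"))])
```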
Scheduling heterogeneous execution on heterogeneous hardware
The subject technology determines input parameters and an output format of algorithms for a particular functionality provided by an electronic device. The subject technology determines an order of the algorithms for performing the particular functionality based on temporal dependencies of the algorithms, and the input parameters and the output format of the algorithms. The subject technology generates a graph based on the order of the algorithms, the graph comprising a set of nodes corresponding to the algorithms, each node indicating a particular processor of the electronic device for executing an algorithm. Further, the subject technology executes the particular functionality based on performing a traversal of the graph, the traversal comprising a topological traversal of the set of nodes and the traversal being based on a score indicating whether selection of a particular node for execution over another node enables a greater number of processors to be utilized at a time.
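One way to picture this, as a loose sketch: a DAG of algorithm stages, each annotated with its target processor, traversed topologically with ties broken by a score that prefers the node whose selection keeps more processors busy. The simple idle-processor score below is a stand-in for the patent's scoring, and the sketch does not model processors becoming free again.

```python
from collections import defaultdict

def schedule(nodes, edges, processor_of):
    """nodes: ids; edges: (u, v) pairs meaning u must run before v;
    processor_of: node id -> processor name."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    ready = [n for n in nodes if indeg[n] == 0]
    busy = set()   # processors already claimed (never released here)
    order = []
    while ready:
        # Score: prefer a ready node whose processor is still idle.
        ready.sort(key=lambda n: processor_of[n] not in busy, reverse=True)
        n = ready.pop(0)
        busy.add(processor_of[n])
        order.append(n)
        for v in succ[n]:          # topological bookkeeping
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order
```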
Flexible hardware for high throughput vector dequantization with dynamic vector length and codebook size
The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the memory data management of the NN/DNN. Using vector quantization of neuron weight values, the processing of data by neurons can be optimized to reduce the number of operations as well as memory utilization, enhancing the overall performance of the NN/DNN. Operatively, one or more contiguous segments of weight values can be converted into one or more vectors of arbitrary length, and each of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary vector quantization lookup table and retrieved by exemplary fast weight lookup hardware on the fly at run time, as part of an exemplary data processing function of the NN, in an inline de-quantization operation to obtain the needed neuron weight values.
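A tiny sketch of the lookup-table de-quantization step, with an invented codebook: weight segments are stored as indexes, and each index is expanded back into its weight vector on the fly.

```python
# Illustrative codebook of weight vectors (values invented).
codebook = [
    [0.10, -0.20, 0.05],   # index 0
    [0.00,  0.30, -0.10],  # index 1
]

# Indexes stored in place of raw weight segments.
quantized_weights = [1, 0, 0, 1]

def dequantize(indexes, table):
    weights = []
    for i in indexes:
        weights.extend(table[i])   # inline expansion of one vector
    return weights

print(dequantize(quantized_weights, codebook))
```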
Prioritization and prediction of jobs using cognitive rules engine
A method, computer system, and computer program product for data pipeline prioritization are provided. Embodiments may include receiving, by a cognitive rules engine, one or more data pipelines. Embodiments may then include analyzing, using a computational method of the cognitive rules engine, the one or more data pipelines. Embodiments may lastly include prioritizing the one or more data pipelines based on a result of the computational method of the cognitive rules engine.
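A loose sketch of the receive/analyze/prioritize flow. The slack-based rule below merely stands in for the cognitive rules engine's computational method; fields and values are invented.

```python
def prioritize(pipelines, score):
    # Analyze each pipeline with the scoring rule, highest priority first.
    return sorted(pipelines, key=score, reverse=True)

pipelines = [
    {"name": "etl-a", "deadline_hours": 2, "est_runtime_hours": 1.0},
    {"name": "etl-b", "deadline_hours": 8, "est_runtime_hours": 0.5},
]

# Example rule: less slack (deadline minus estimated runtime) runs first.
ordered = prioritize(
    pipelines,
    score=lambda p: -(p["deadline_hours"] - p["est_runtime_hours"]),
)
print([p["name"] for p in ordered])   # etl-a before etl-b
```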
Neural network processor using compression and decompression of activation data to reduce memory bandwidth utilization
A deep neural network (“DNN”) module compresses and decompresses neuron-generated activation data to reduce the utilization of memory bus bandwidth. The compression unit receives an uncompressed chunk of data generated by a neuron in the DNN module. The compression unit generates a mask portion and a data portion of a compressed output chunk. The mask portion encodes the presence and location of the zero and non-zero bytes in the uncompressed chunk of data. The data portion stores truncated non-zero bytes from the uncompressed chunk of data. A decompression unit receives a compressed chunk of data from memory in the DNN processor or memory of an application host. The decompression unit decompresses the compressed chunk of data using the mask portion and the data portion.
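A simplified sketch of the mask/data scheme for one small chunk: a bitmask records which bytes are non-zero, and the data portion keeps only those bytes (the byte truncation the abstract mentions is omitted here). Chunk size is an assumption.

```python
def compress(chunk: bytes):
    mask = 0
    data = bytearray()
    for i, b in enumerate(chunk):
        if b != 0:
            mask |= 1 << i      # encode presence/location of non-zero byte
            data.append(b)      # data portion: non-zero bytes only
    return mask, bytes(data)

def decompress(mask: int, data: bytes, length: int) -> bytes:
    out = bytearray(length)     # zero bytes restored implicitly
    it = iter(data)
    for i in range(length):
        if (mask >> i) & 1:
            out[i] = next(it)
    return bytes(out)

chunk = bytes([0, 7, 0, 0, 3, 0, 0, 9])
mask, data = compress(chunk)
assert decompress(mask, data, len(chunk)) == chunk
```

Activation data is typically sparse after a ReLU, which is why masking out zero bytes can save substantial memory bus bandwidth.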
Dynamically partitioning workload in a deep neural network module to reduce power consumption
A deep neural network (DNN) module is disclosed that can dynamically partition neuron workload to reduce power consumption. The DNN module includes neurons and a group partitioner and scheduler unit. The group partitioner and scheduler unit divides a workload for the neurons into partitions in order to maximize the number of neurons that can simultaneously process the workload. The group partitioner and scheduler unit then assigns a group of neurons to each of the partitions. The groups of neurons in the DNN module process the workload in their assigned partition to generate a partial output value. The neurons in each group can then sum their partial output values to generate a final output value for the workload. The neurons can be powered down once the groups of neurons have completed processing their assigned workload to reduce power consumption.
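As a rough sketch of the partition-and-sum pattern, with the neuron workload reduced to a plain list of numbers for simplicity:

```python
def partition(workload, num_groups):
    # Round-robin split so as many groups as possible receive work.
    parts = [workload[i::num_groups] for i in range(num_groups)]
    return [p for p in parts if p]   # drop empty partitions

def process(workload, num_groups):
    # Each group of neurons produces a partial output value...
    partials = [sum(p) for p in partition(workload, num_groups)]
    # ...and the partials are summed into the final output value.
    # (Groups that finish could be powered down to save power.)
    return sum(partials)

print(process([1, 2, 3, 4, 5, 6, 7], num_groups=3))
```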