Patent classifications
G06F2205/00
SIGNED DIVISION IN MEMORY
Examples of the present disclosure provide apparatuses and methods for performing signed division operations. An apparatus can include a first group of memory cells coupled to a sense line and to a number of first access lines. The apparatus can include a second group of memory cells coupled to the sense line and to a number of second access lines. The apparatus can include a controller configured to operate sensing circuitry to divide a signed dividend element stored in the first group of memory cells by a signed divisor element stored in the second group of memory cells by performing a number of operations.
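The abstract describes iterative signed division carried out by sensing circuitry on memory cells. As a rough software analogue (not the patented circuit), the sketch below divides signed operands by separating the result sign from the magnitudes and repeatedly subtracting, mirroring the kind of iterative operation sequence such in-memory logic performs; the function name and structure are illustrative assumptions.

```python
def signed_divide(dividend, divisor):
    """Illustrative sketch: signed division via sign separation and
    repeated subtraction of magnitudes (quotient truncates toward zero)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    negative = (dividend < 0) != (divisor < 0)   # result sign from operand signs
    a, b = abs(dividend), abs(divisor)
    quotient = 0
    while a >= b:          # iterate, as a bit-serial engine might loop
        a -= b
        quotient += 1
    return -quotient if negative else quotient
```

For example, `signed_divide(-17, 5)` yields `-3`, matching truncated signed division.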
Method and device for noise suppression in a data processing arrangement
Methods and systems are provided for noise suppression in data, particularly data comprising video and/or audio data. An input adjustment, based on a corresponding input adjustment value, may be applied to received input data comprising video and/or audio data, and an output adjustment, based on a corresponding output adjustment value, may be applied to output data corresponding to previously processed received input data. The input adjustment value may be recalculated based on the outcome of applying the input adjustment to the received data. When the input adjustment value changes, the change may be applied to subsequently received input data and, at the same time and based on that change, a corresponding change may be applied to at least some of the data corresponding to previously processed received input data.
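The coupled input/output adjustment can be sketched numerically. The sketch below is one of many possible interpretations, not the claimed method: the input gain is recalculated from the outcome of applying it (here, a peak-normalization rule, an illustrative assumption), and when the gain changes, a compensating change is applied to already-emitted output blocks.

```python
def process(blocks, target=1.0):
    """Illustrative sketch: normalize each input block toward a target peak,
    and when the input gain changes, compensate previously emitted output."""
    in_gain = 1.0
    out = []
    for block in blocks:
        adjusted = [x * in_gain for x in block]   # apply input adjustment
        processed = adjusted                      # placeholder for real filtering
        out.append(processed)
        peak = max(abs(x) for x in adjusted) or 1.0
        new_gain = in_gain * target / peak        # recalculate from the outcome
        if new_gain != in_gain:
            delta = new_gain / in_gain
            out = [[y * delta for y in b] for b in out]  # adjust prior output too
            in_gain = new_gain
    return out
```

After the first block sets the gain, subsequent blocks and the prior output end up consistently scaled, e.g. `process([[2.0], [2.0]])` returns `[[1.0], [1.0]]`.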
Hardware accelerated machine learning
A machine learning hardware accelerator architecture and associated techniques are disclosed. The architecture features multiple memory banks of very wide SRAM that may be concurrently accessed by a large number of parallel operational units. Each operational unit supports an instruction set specific to machine learning, including optimizations for performing tensor operations and convolutions. Optimized addressing, an optimized shift reader and variations on a multicast network that permutes and copies data and associates with an operational unit that support those operations are also disclosed.
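The bank-per-unit organization can be illustrated in miniature. In the sketch below (an illustrative assumption, not the disclosed architecture), each "operational unit" reads its own memory bank and computes a dot product, a core tensor primitive; a real accelerator would run the units concurrently in hardware rather than in a Python loop.

```python
# Hypothetical wide-SRAM banks, one per operational unit.
banks = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]
weights = [
    [0.5, 0.5, 0.5, 0.5],
    [1.0, 0.0, 1.0, 0.0],
]

def op_unit(row, w):
    """One operational unit: multiply-accumulate over its bank's row."""
    return sum(x * c for x, c in zip(row, w))

# In hardware the units access their banks concurrently; here we iterate.
results = [op_unit(b, w) for b, w in zip(banks, weights)]
```

With the sample data above, `results` is `[5.0, 12.0]`: each unit produces one accumulated value without contending for another unit's bank.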
System, method, and recording medium for mirroring matrices for batched cholesky decomposition on a graphic processing unit
A batched Cholesky decomposition method, system, and non-transitory computer readable medium for a Graphics Processing Unit (GPU) include mirroring matrices to form paired matrices and solving the paired matrices simultaneously.
System, method, and recording medium for mirroring matrices for batched Cholesky decomposition on a graphic processing unit
A batched Cholesky decomposition method, system, and non-transitory computer readable medium for a Graphics Processing Unit (GPU) include mirroring a second problem matrix of a second problem to a first problem matrix of a first problem as paired matrices, shifting the second problem matrix by N+1, and combining the first problem matrix and the mirrored second problem matrix into one (N+1)×N matrix, where the first problem's shared memory is accessed at regular intervals, the second problem's shared memory is contiguous, and the GPU performs batched dense Cholesky decomposition on the combined matrix to accelerate the Cholesky decomposition.
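The mirrored-pair layout can be sketched in plain Python. The packing rule below is an assumption for illustration, not the patented scheme: the first problem's lower triangle is stored in place, the second problem's strict lower triangle is mirrored above the diagonal, and its diagonal lands in the extra row, so two N×N symmetric problems fit exactly in one (N+1)×N buffer that a batched kernel could sweep in a single pass.

```python
def pack_pair(A, B):
    """Pack two N x N symmetric problems into one (N+1) x N buffer.

    Hypothetical layout (illustrative only):
      - A's lower triangle stored in place: M[i][j] for i >= j
      - B's strict lower triangle mirrored above the diagonal: M[j][i]
      - B's diagonal in the extra row: M[N][i]
    Each triangle has N*(N+1)/2 entries, so the buffer is filled exactly.
    """
    n = len(A)
    M = [[0.0] * n for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1):
            M[i][j] = A[i][j]          # first problem, lower triangle
            if i != j:
                M[j][i] = B[i][j]      # second problem, mirrored
        M[n][i] = B[i][i]              # second problem's diagonal
    return M
```

For N = 2 with lower triangles A = [[4, 0], [2, 9]] and B = [[1, 0], [3, 16]], `pack_pair` returns `[[4, 3], [2, 9], [1, 16]]`, with both problems recoverable from the one buffer.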
Compare and swap functionality for key-value and object stores
Embodiments for providing compare and swap (CAS) functionality to key-value storage to allow multi-threaded applications to share storage devices and synchronize multiple concurrent threads or processes. A key-value application programming interface (API) is modified to include a CAS API in addition to the standard Put and Get APIs. The CAS function uses a key, expected old value, and new value to compare and swap an existing key value only if its current value equals the expected old value. Hash values of the key value and expected old value may be used by the CAS function to improve performance and reduce bandwidth.
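A minimal, in-process sketch of the CAS extension the abstract describes; the class and method names are illustrative assumptions, and the hash variant shows how a digest comparison could stand in for sending the full expected value.

```python
import hashlib
import threading

class KVStore:
    """Illustrative in-memory key-value store with Put, Get, and CAS."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def cas(self, key, expected_old, new):
        """Swap in `new` only if the current value equals `expected_old`."""
        with self._lock:
            if self._data.get(key) == expected_old:
                self._data[key] = new
                return True
            return False

    def cas_hash(self, key, expected_old_digest, new):
        """Hypothetical variant: compare a SHA-256 digest of the current value
        instead of the full value, reducing data the caller must send."""
        with self._lock:
            cur = self._data.get(key)
            if cur is not None and hashlib.sha256(cur).hexdigest() == expected_old_digest:
                self._data[key] = new
                return True
            return False
```

A CAS that fails (because another thread changed the value first) simply returns False, letting the caller re-read and retry, which is how concurrent writers synchronize without a separate lock service.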