Patent classifications
G06F12/0207
PERFORMING MULTIPLE POINT TABLE LOOKUPS IN A SINGLE CYCLE IN A SYSTEM ON CHIP
In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Intelligent post-packaging repair
Techniques are provided for storing a row address of a defective row of memory cells to a bank of non-volatile storage elements (e.g., fuses or anti-fuses). After a memory device has been packaged, one or more rows of memory cells may become defective. In order to repair (e.g., replace) the rows, a post-package repair (PPR) operation may occur to replace the defective row with a redundant row of the memory array. To replace the defective row with a redundant row, an address of the defective row may be stored (e.g., mapped) to an available bank of non-volatile storage elements that is associated with a redundant row. Based on the bank of non-volatile storage elements the address of the defective row, subsequent access operations may utilize the redundant row and not the defective row.
Implicit integrity for cryptographic computing
In one embodiment, a processor includes a memory hierarchy and a core coupled to the memory hierarchy. The memory hierarchy stores encrypted data, and the core includes circuitry to access the encrypted data stored in the memory hierarchy, decrypt the encrypted data to yield decrypted data, perform an entropy test on the decrypted data, and update a processor state based on a result of the entropy test. The entropy test may include determining a number of data entities in the decrypted data whose values are equal to one another, determining a number of adjacent data entities in the decrypted data whose values are equal to one another, determining a number of data entities in the decrypted data whose values are equal to at least one special value from a set of special values, or determining a sum of n highest data entity value frequencies.
Method and device for optimizing neural network
The embodiments of this application provide a method and device for optimizing neural network. The method includes: binarizing and bit-packing input data of a convolution layer along a channel direction, and obtaining compressed input data; binarizing and bit-packing respectively each convolution kernel of the convolution layer along the channel direction, and obtaining each corresponding compressed convolution kernel; dividing the compressed input data sequentially in a convolutional computation order into blocks of the compressed input data with the same size of each compressed convolution kernel, wherein the data input to one time convolutional computation form a data block; and, taking a convolutional computation on each block of the compressed input data and each compressed convolution kernel sequentially, obtaining each convolutional result data, and obtaining multiple output data of the convolution layer according to each convolutional result data.
Classification using cascaded spatial voting grids
A method can include identifying a first key value of a first cell of a first grid of grids of cells to which a first feature maps, embedding the first grid into each cell of a second grid, identifying a second key value of a second cell of the second grid to which a second feature maps, the second key value representative of the first and second key values, comparing the identified key value to the key values of a memory, in response to determining the identified key value is in the memory, and providing data indicating a class associated with the identified key value in the memory.
PERFORMING LOAD AND STORE OPERATIONS OF 2D ARRAYS IN A SINGLE CYCLE IN A SYSTEM ON A CHIP
In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Systems and methods in a graphics environment for providing shared virtual memory addressing support for a host system
Systems and methods for providing shared virtual memory addressing support for a host system are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations. A memory management unit (MMU) is coupled to the processing resources. The MMU to support a first virtual address size for managing allocation of non-shared virtual memory and to support a second virtual address size for managing allocation of shared virtual memory that is shared between the graphics processor and a host.
Technologies to address individual bits in memory
Technologies for addressing individual bits in memory include a device having a memory that includes partitions that each have tiles, in which each tile stores an individual bit. The device also includes circuitry to receive a request to access (e.g., read or write) a sequence of bits in a partition. The request specifies a logical row or column address. A corresponding tile is determined from the logical row or column address and for each bit in the sequence. The corresponding tile is accessed to read or write the bit therein.
Processing method and device
The application provides a processing method and device. Weights and input neurons are quantized respectively, and a weight dictionary, a weight codebook, a neuron dictionary, and a neuron codebook are determined. A computational codebook is determined according to the weight codebook and the neuron codebook. Meanwhile, according to the application, the computational codebook is determined according to two types of quantized data, and the two types of quantized data are combined, which facilitates data processing.
Memory compression hashing mechanism
An apparatus to facilitate memory data compression is disclosed. The apparatus includes a memory and having a plurality of banks to store main data and metadata associated with the main data and a memory management unit (MMU) coupled to the plurality of banks to perform a hash function to compute indices into virtual address locations in memory for the main data and the metadata and adjust the metadata virtual address locations to store each adjusted metadata virtual address location in a bank storing the associated main data.