Patent classifications
G06F2212/16
Processor architecture
A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
Programmable Multi-Level Data Access Address Generator
A programmable address generator has an iteration variable generator for generation of an ordered set of iteration variables, which are re-ordered by an iteration variable selection fabric, which delivers the re-ordered iteration variables to one or more address generators. A configurator receives an instruction containing fields which provide configuration constants to the address generator, iteration variable selection fabric, and address generators. After configuration, the address generators provide addresses coupled to a memory. In one example of the invention, the address generators generate an input address, a coefficient address, and an output address for performing convolutional neural network inferences.
Programmable Multi-Level Data Access Address Generator
A programmable address generator has an iteration variable generator for generation of an ordered set of iteration variables, which are re-ordered by an iteration variable selection fabric, which delivers the re-ordered iteration variables to one or more address generators. A configurator receives an instruction containing fields which provide configuration constants to the address generator, iteration variable selection fabric, and address generators. After configuration, the address generators provide addresses coupled to a memory. In one example of the invention, the address generators generate an input address, a coefficient address, and an output address for performing convolutional neural network inferences.
Processor architecture
A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
System and methods of a self-tuning cache sizing system in a cache partitioning system
The disclosure provides a technique for reducing cache misses to a cache of a computer system. The technique includes deallocating memory pages of the cache from one process and allocating those memory pages to another process based on cache misses of each process during a given time period. Repeating the technique leads the total number of cache misses to the cache to gradually decrease to an optimum or near optimum level. The repetition of the technique leads to a dynamic and flexible apportionment of cache memory pages to processes running within the computer system.
Memory design for a processor
A processor having a functional slice architecture is divided into a plurality of functional units (tiles) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
SYSTEM AND METHODS OF A SELF-TUNING CACHE SIZING SYSTEM IN A CACHE PARTITIONING SYSTEM
The disclosure provides a technique for reducing cache misses to a cache of a computer system. The technique includes deallocating memory pages of the cache from one process and allocating those memory pages to another process based on cache misses of each process during a given time period. Repeating the technique leads the total number of cache misses to the cache to gradually decrease to an optimum or near optimum level. The repetition of the technique leads to a dynamic and flexible apportionment of cache memory pages to processes running within the computer system.
Burning control system and method
A burning control system includes a capturing unit, a storing unit, a controlling unit, a processing unit, and a displaying unit. The capturing unit acquires product barcode information and burned barcode information relating to printed circuit board assembly (PCBA). The storing unit can store a program file corresponding to the burned barcode information. The controlling unit obtains product barcode information and the burned barcode information and outputs the burned barcode information to the processing unit. The processing unit writes the program file to a storage chip of the PCBA board. The displaying unit can display one of several prompts for a user's information. A burning control method is also provided.
BURNING CONTROL SYSTEM AND METHOD
A burning control system includes a capturing unit, a storing unit, a controlling unit, a processing unit, and a displaying unit. The capturing unit acquires product barcode information and burned barcode information relating to printed circuit board assembly (PCBA). The storing unit can store a program file corresponding to the burned barcode information. The controlling unit obtains product barcode information and the burned barcode information and outputs the burned barcode information to the processing unit. The processing unit writes the program file to a storage chip of the PCBA board. The displaying unit can display one of several prompts for a user's information. A burning control method is also provided.
PROCESSOR ARCHITECTURE
A processor having a functional slice architecture is divided into a plurality of functional units (tiles) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows are configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.