Patent classifications
G06F9/3895
Neural network training mechanism
- Gokcen Cilingir
- Elmoustapha Ould-Ahmed-Vall
- Rajkishore Barik
- Kevin Nealis
- Xiaoming Chen
- Justin E. Gottschlich
- Prasoonkumar Surti
- Chandrasekaran Sakthivel
- Abhishek Appu
- John C. Weast
- Sara S. Baghsorkhi
- Barnan Das
- Narayan Biswal
- Stanley J. Baran
- Nilesh V. Shah
- Archie Sharma
- Mayuresh M. Varerkar
An apparatus to facilitate neural network (NN) training is disclosed. The apparatus includes training logic to receive one or more network constraints and to train the NN by automatically determining the best network layout and parameters based on those constraints.
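A minimal sketch of the constraint-driven idea, assuming a simple enumeration over candidate layouts; the names `NetworkConstraints`, `count_params`, and `search_layout`, as well as the parameter-budget scoring, are illustrative assumptions and are not taken from the patent.

```python
# Illustrative sketch of constraint-driven layout search; the scoring scheme
# (largest layout within a parameter budget) is an assumption, not the patent's.
from dataclasses import dataclass
from itertools import product

@dataclass
class NetworkConstraints:
    max_params: int       # parameter budget
    max_layers: int       # depth budget

def count_params(layout, in_dim=32, out_dim=10):
    """Rough fully-connected parameter count for a layout of hidden widths."""
    dims = [in_dim, *layout, out_dim]
    return sum(a * b + b for a, b in zip(dims, dims[1:]))

def search_layout(constraints, widths=(16, 32, 64)):
    """Enumerate candidate layouts and keep the largest one within budget."""
    best = None
    for depth in range(1, constraints.max_layers + 1):
        for layout in product(widths, repeat=depth):
            p = count_params(layout)
            if p <= constraints.max_params and (best is None or p > best[1]):
                best = (layout, p)
    return best

if __name__ == "__main__":
    layout, params = search_layout(NetworkConstraints(max_params=5000, max_layers=3))
    print(f"selected layout {layout} with {params} parameters")
```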
Processor with instruction iteration
A processor includes a plurality of execution units. At least one of the execution units is configured to repeatedly execute a first instruction based on a first field of the first instruction indicating that the first instruction is to be iteratively executed.
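A toy model of the iteration idea, assuming a made-up 16-bit encoding with an iterate bit and a repeat count; the field layout and the `INC` opcode are hypothetical, since the abstract does not specify an encoding.

```python
# Toy model of an execution unit that re-executes an instruction when its
# iteration field is set; field names and widths are illustrative only.
def decode(word):
    """Unpack a 16-bit toy instruction: [iterate:1][count:7][opcode:8]."""
    return {
        "iterate": (word >> 15) & 0x1,
        "count":   (word >> 8) & 0x7F,
        "opcode":  word & 0xFF,
    }

def execute(instr, state):
    # Repeat the operation `count` times when the iterate bit is set,
    # otherwise execute it once.
    repeats = instr["count"] if instr["iterate"] else 1
    for _ in range(repeats):
        if instr["opcode"] == 0x01:          # hypothetical INC opcode
            state["acc"] += 1
    return state

if __name__ == "__main__":
    word = (1 << 15) | (4 << 8) | 0x01       # iterate=1, count=4, INC
    print(execute(decode(word), {"acc": 0})) # {'acc': 4}
```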
Memory-network processor with programmable optimizations
Various embodiments are disclosed of a multiprocessor system with processing elements optimized for high performance and low power dissipation, along with an associated method of programming the processing elements. Each processing element may comprise a fetch unit, a plurality of address generator units, and a plurality of pipelined datapaths. The fetch unit may be configured to receive a multi-part instruction that includes a plurality of fields. First and second address generator units may generate, based on different fields of the multi-part instruction, addresses from which to retrieve first and second data for use by an execution unit for that multi-part instruction or a subsequent one. The execution unit may perform operations using a single pipeline or multiple pipelines based on third and fourth fields of the multi-part instruction.
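A sketch of how two address generators might feed an execution unit from separate fields of one multi-part instruction; the base/stride address formula, the field names, and the flat `memory` list are assumptions for illustration.

```python
# Sketch of a multi-part instruction driving two address generators and an
# execution pipeline; the field layout and memory model are assumptions.
memory = list(range(100))                    # toy data memory

def agu(base, stride, step):
    """Address generator: base + stride * step."""
    return base + stride * step

def issue(multi_part, step):
    # Fields A and B select operand addresses; a third field selects the op.
    addr_a = agu(multi_part["base_a"], multi_part["stride_a"], step)
    addr_b = agu(multi_part["base_b"], multi_part["stride_b"], step)
    a, b = memory[addr_a], memory[addr_b]
    return a + b if multi_part["op"] == "add" else a * b

instr = {"base_a": 0, "stride_a": 1, "base_b": 50, "stride_b": 2, "op": "add"}
print([issue(instr, s) for s in range(4)])   # [50, 53, 56, 59]
```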
Neural network accelerator with parameters resident on chip
One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations; and a second memory bank for storing parameters used in performing computations, the second memory bank configured to hold enough of the neural network's parameters on the computing unit to keep latency below a specified level and throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply-accumulate ("MAC") operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the computations being performed by the MAC operator.
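A toy model of the dataflow described above, under the assumption that the resident parameter bank can be modeled as a list held inside the MAC cell; the class and function names (`MacCell`, `traverse`) are invented for illustration.

```python
# Toy model of the accelerator's dataflow: parameters live in a local bank,
# a traversal unit streams activations onto a bus feeding a MAC. Names are
# illustrative; the patent does not specify this interface.
class MacCell:
    def __init__(self, parameters):
        self.parameters = parameters         # "second memory bank": resident weights
        self.acc = 0.0

    def mac(self, activation, index):
        # Multiply-accumulate one activation against its resident parameter.
        self.acc += activation * self.parameters[index]
        return self.acc

def traverse(activations, cell):
    """Traversal unit: put each activation on the bus in order."""
    for i, act in enumerate(activations):
        cell.mac(act, i)
    return cell.acc

print(traverse([1.0, 2.0, 3.0], MacCell([0.5, 0.25, 0.125])))  # 1.375
```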
Dynamic re-evaluation of parameters for non-volatile memory using microcontroller
A non-volatile memory apparatus and a corresponding method of operation are provided. The apparatus includes non-volatile memory cells in an integrated circuit device, along with a microcontroller in communication with the non-volatile memory cells. The microcontroller is configured to receive a memory operation command and, in response, determine a condition value for one of a plurality of conditions associated with the command, as well as whether that condition is dynamic. In parallel, the microcontroller determines and outputs an output value using the condition value. The microcontroller then determines whether the condition has changed. If the condition is dynamic and has changed, the microcontroller determines an updated condition value and, in parallel, compares the original and updated condition values and determines and outputs an updated output value using the updated condition value and the comparison.
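A sequential sketch of the re-evaluation loop (the hardware does some of these steps in parallel); `lookup_output` and the doubled-value computation are stand-in placeholders, not the patent's actual condition table.

```python
# Sketch of the re-evaluation loop: compute an output from a condition value,
# then recompute if the condition is dynamic and has changed between reads.
def lookup_output(condition_value):
    """Stand-in for the microcontroller's value-to-output computation."""
    return condition_value * 2

def handle_command(read_condition, is_dynamic):
    value = read_condition()                 # initial condition sample
    output = lookup_output(value)            # computed in parallel on hardware
    if is_dynamic:
        updated = read_condition()           # re-sample the dynamic condition
        if updated != value:
            output = lookup_output(updated)  # re-evaluate with the new value
    return output

samples = iter([10, 12])                     # condition drifts between reads
print(handle_command(lambda: next(samples), is_dynamic=True))  # 24
```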
Digital filter with programmable impulse response for direct amplitude modulation at radio frequency
A digital filter according to the disclosure includes a processing circuit having a memory and a number of parallel processing circuits. The parallel processing circuits perform convolution operations based on input data and function data accessed from the memory. The filter further includes a serializer for serializing data received from the parallel processing circuits. A clock generator circuit provides a first clock signal to the processing circuit and a second clock signal to the serializer; the frequency of the second clock signal is greater than that of the first.
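A sketch of the parallel-convolve-then-serialize structure, assuming a polyphase-style split where each lane computes every Nth output sample; the two-lane split and the function names are assumptions, and the faster serializer clock is modeled only as interleaving order.

```python
# Sketch of parallel convolution lanes feeding a serializer; all names are
# illustrative, and clocking is modeled only as output interleaving.
def convolve_lane(samples, taps, lane, lanes):
    """One parallel processing circuit: filter every `lanes`-th output."""
    out = []
    for n in range(lane, len(samples), lanes):
        acc = sum(taps[k] * samples[n - k] for k in range(len(taps)) if n - k >= 0)
        out.append(acc)
    return out

def serialize(lane_outputs):
    """Serializer: interleave lane outputs back into one fast stream."""
    stream = []
    for group in zip(*lane_outputs):
        stream.extend(group)
    return stream

samples = [1, 2, 3, 4, 5, 6, 7, 8]
taps = [0.5, 0.5]                            # programmable impulse response
lanes = [convolve_lane(samples, taps, l, 2) for l in range(2)]
print(serialize(lanes))                      # [0.5, 1.5, 2.5, 3.5, ...]
```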
Mechanism for reducing coherence directory controller overhead for near-memory compute elements
A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command directed to multiple memory banks in parallel, the PimPF accelerates processing of the command by maintaining a cache-coherence directory that is separate from the existing system-level directories in the coherence directory controller. This directory is organized by address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight one that accelerates PIM loads into registers, and a heavyweight one that accelerates both PIM loads into registers and PIM stores into memory.
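A sketch of an address-signature directory for broadcast PIM commands; the signature format (the bank field masked out of the address, so one entry covers the same row in every bank) is an assumption, as are the `PimPF` class and its methods.

```python
# Sketch of an address-signature directory for broadcast PIM commands; the
# signature format (bank bits dropped from the address) is an assumption.
class PimPF:
    def __init__(self):
        self.directory = {}                  # signature -> set of caches to probe

    @staticmethod
    def signature(addr, bank_bits=4):
        # A broadcast PIM command touches the same row in every bank, so the
        # signature drops the bank field from the address.
        return addr >> bank_bits

    def record(self, addr, cache_id):
        """Track a cache holding a line covered by this signature."""
        self.directory.setdefault(self.signature(addr), set()).add(cache_id)

    def probes_needed(self, addr):
        """Caches that must be probed before the PIM command proceeds."""
        return self.directory.get(self.signature(addr), set())

f = PimPF()
f.record(0x123, cache_id=0)                  # bank 3 of row 0x12
print(f.probes_needed(0x127))                # {0} -- same row, different bank
```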
Multi-multidimensional computer architecture for big data applications
A data processing apparatus is provided comprising a front-end interface electronically coupled to a main processor. The front-end interface is configured to receive data stored in a repository, in particular external storage and/or a network; determine whether the data is single-access data or multiple-access data by analyzing an access parameter designating the data; route the multiple-access data for processing by the main processor; and route the single-access data for pre-processing by the front-end interface, with the results of the pre-processing routed to the main processor.
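A sketch of the routing decision, assuming the access parameter can be read as a reuse-count hint and that pre-processing is a simple local reduction; the field name `access_count_hint` and the reduction step are hypothetical placeholders.

```python
# Sketch of the front-end routing decision; the access-parameter field and
# the pre-processing step are hypothetical placeholders.
def front_end(block, main_processor):
    if block["access_count_hint"] > 1:       # multiple-access data
        main_processor(block["payload"])     # route untouched to the main core
    else:                                    # single-access data
        reduced = sum(block["payload"])      # pre-process (e.g., reduce) locally
        main_processor(reduced)              # forward only the result

results = []
front_end({"access_count_hint": 1, "payload": [1, 2, 3]}, results.append)
front_end({"access_count_hint": 5, "payload": [4, 5]}, results.append)
print(results)                               # [6, [4, 5]]
```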
Method and apparatus for processing macro instruction
A processor includes at least one operator and at least one macro instruction processing unit configured to share the at least one operator. The macro instruction processing unit executes a macro instruction on input data by using the at least one operator to produce result data, and controls the at least one operator to perform the operations included in the macro instruction. The macro instruction processing unit comprises a scheduler, configured to manage schedules of the at least one operator and to output input data and a control signal to the at least one operator, and a controller, configured to control the scheduler to execute the macro instruction and to receive the result data from the scheduler.
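A sketch of a macro instruction expanded into micro-operations that are dispatched onto shared operators; the `madd` macro, the `MACROS` table, and the `"prev"` result-forwarding convention are all invented for illustration.

```python
# Sketch of a macro instruction expanded by a controller and dispatched by a
# scheduler onto shared operators; the macro table is illustrative only.
OPERATORS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

MACROS = {
    # hypothetical macro: multiply-accumulate expressed as two micro-ops
    "madd": [("mul", 0, 1), ("add", "prev", 2)],
}

def run_macro(name, inputs):
    scheduler_result = None
    for op, *args in MACROS[name]:           # controller drives the schedule
        operands = [scheduler_result if a == "prev" else inputs[a] for a in args]
        scheduler_result = OPERATORS[op](*operands)  # dispatch to shared operator
    return scheduler_result                  # result data returned to controller

print(run_macro("madd", [2, 3, 4]))          # (2 * 3) + 4 = 10
```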