Patent classifications
G06F9/3013
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING
- Himanshu Kaul ,
- Mark A. Anders ,
- Sanu K. Mathew ,
- Anbang Yao ,
- Joydeep Ray ,
- Ping T. Tang ,
- Michael S. Strickland ,
- Xiaoming Chen ,
- Tatiana Shpeisman ,
- Abhishek R. Appu ,
- Altug Koker ,
- Kamal Sinha ,
- Balaji Vembu ,
- Nicolas C. Galoppo Von Borries ,
- Eriko Nurvitadhi ,
- Rajkishore Barik ,
- Tsung-Han Lin ,
- Vasanth Ranganathan ,
- Sanjeev Jahagirdar
One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.
DEEP VISION PROCESSOR
Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
Compound instruction set architecture for a neural inference chip
A device for controlling neural inference processor cores is provided, including a compound instruction set architecture. The device comprises an instruction memory, which comprises a plurality of instructions for controlling a neural inference processor core. Each of the plurality of instructions comprises a control operation. The device further comprises a program counter. The device further comprises at least one loop counter register. The device is adapted to execute the plurality of instructions. Executing the plurality of instructions comprises: reading an instruction from the instruction memory based on a value of the program counter; updating the at least one loop counter register according to the control operation of the instruction; and updating the program counter according to the control operation of the instruction and a value of the at least one loop counter register.
Method and device for identifying type of variable in binary
A method for identifying a type of a variable within a binary performed on a computing device is provided. The method comprises, identifying a variable from disassembly code of a binary, and determining a type of the variable based on an instruction of the disassembly code, associated with the variable.
DATA PROCESSING METHOD AND DEVICE
Provided is a data processing method including the operations of storing, in a register, a first immediate portion included in a first instruction, from among the first immediate portion and a second immediate portion that constitute an immediate value, which is an operand; determining the immediate value by catenating the second immediate portion included in a second instruction with the stored first immediate portion; and performing an operation by using a value indicated by the second instruction and the determined immediate value.
APPARATUS AND METHOD FOR SEGMENTING A DATA STREAM OF A PHYSICAL LAYER
The invention introduces an apparatus for segmenting a data stream, installed in a physical layer, to include a host interface, a data register and a boundary detector. The data register is arranged to operably store data received from the host side through the host interface. The boundary detector is arranged to operably detect the content of a data register. When the data register includes a special symbol, the boundary detector outputs a starting address that the special symbol is stored in the data register to an offset register to update a value stored in the offset register, thereby enabling a stream splitter to divide data bits of the data register according to the updated value of the offset register.
Thread context preservation in a multithreading computer system
According to one aspect, a computer-implemented method for thread context preservation in a configuration including a core configurable between a single thread (ST) mode and a multithreading (MT) mode is provided. The ST mode addresses a primary thread, and the MT mode addresses the primary thread and one or more secondary threads on shared resources of the core. Based on determining, by the core in the MT mode, that MT is to be disabled, switching from the MT mode to the ST mode is performed, where the primary thread of the MT mode is maintained as the primary thread of the ST mode. A thread context including program accessible register values and program counter values of the one or more secondary threads is made inaccessible to programs. Based on the switching, any one of clearing the program accessible register values or retaining the program accessible register values is performed.
INDIRECT CHAINING OF COMMAND BUFFERS
Systems, apparatuses, and methods for enabling indirect chaining of command buffers are disclosed. A system includes at least first and second processors and a memory. The first processor generates a plurality of command buffers and stores the plurality of command buffers in the memory. The first processor also generates and stores, in the memory, a table with entries specifying addresses of the plurality of command buffers and an order in which to process the command buffers. The first processor conveys an indirect buffer packet to the second processor, where the indirect buffer packet specifies a location and a size of the table in the memory. The second processor retrieves an initial entry from the table, processes a first command buffer at the address specified in the initial entry, and then returns to the table for the next entry upon completing processing of the first command buffer.
CONTROLLING THE NUMBER OF POWERED VECTOR LANES VIA A REGISTER FIELD
The vector data path is divided into smaller vector lanes. A register such as a memory mapped control register stores a vector lane number (VLX) indicating the number of vector lanes to be powered. A decoder converts this VLX into a vector lane control word, each bit controlling the ON of OFF state of the corresponding vector lane. This number of contiguous least significant vector lanes are powered. In the preferred embodiment the stored data VLX indicates that 2.sup.VLX contiguous least significant vector lanes are to be powered. Thus the number of vector lanes powered is limited to an integral power of 2. This manner of coding produces a very compact controlling bit field while obtaining substantially all the power saving advantage of individually controlling the power of all vector lanes.
Thread transition management
A system and process for managing thread execution includes providing two data register sets coupled to a processor and using, by the processor, the two register sets as first-level registers for thread execution. A portion of main memory or cache memory is assigned as second-level registers where the second-level registers serve as registers of at least one of the two data register sets for executing the threads. Data for the threads may be moved between the first-level registers and second-level registers for different modes of thread processing.