Patent classifications
G06F9/30181
Method and device for dynamically adjusting decimal point positions in neural network computations
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
Low-latency register error correction
To implement low-latency register error correction a register may be read as part of an instruction when that instruction is the currently executing instruction in a processor. A correctable error in data produced from reading the register can be detected. In response to detecting the correctable error, the currently executing instruction in the processor can be changed into a register update instruction that is executed to overwrite the data in the register with corrected data. Then, the original (e.g., unchanged) instruction can be rescheduled.
Sharing instruction encoding space between a coprocessor and auxiliary execution circuitry
Data processing apparatuses, methods of data processing, and non-transitory computer-readable media on which computer-readable code is stored defining logical configurations of processing devices are disclosed. In an apparatus, fetch circuitry retrieves a sequence of instructions and execution circuitry performs data processing operations with respect to data values in a set of registers. An auxiliary execution circuitry interface and a coprocessor interface to provide a connection to a coprocessor outside the apparatus are provided. Decoding circuitry generates control signals in dependence on the instructions of the sequence of instructions, wherein the decoding circuitry is responsive to instructions in a first subset of an instruction encoding space to generate the control signals to control the execution circuitry to perform the first data processing operations, and the decoding circuitry is responsive to instructions in a second subset of the instruction encoding space to generate the control signals in dependence on a configuration condition: to generate the control signals for the auxiliary execution circuitry interface when the configuration condition has a first state; and to generate the control signals for the coprocessor interface when the configuration condition has a second state.
Assigning operational codes to lists of values of control signals selected from a processor design based on end-user software
End-user software is used to select lists of values of control signals from a predetermined design of a processor, and a unique value of an opcode is assigned to each selected list of values of control signals. The assignments, of opcode values to lists of values of control signals, are used to create a new processor design customized for the end-user software, followed by synthesis, place and route, and netlist generation based on the new processor design, followed by configuring an FPGA based on the netlist, followed by execution of the end-user software in customized processor implemented by the FPGA. Different end-user software may be used as input to generate different assignments, of opcode values to lists of control signal values, followed by generation of different netlists. The different netlists may be used at different times, to reconfigure the same FPGA, to execute different end-user software optimally at different times.
HARDWARE SUPPORT FOR DYNAMIC DATA TYPES AND OPERATORS
A decoder circuit may be configured to receive an instruction which includes a plurality of data bits and decode a first subset of the plurality of data bits. A transcode circuit may be configured to determine if the received instruction is to be modified and, in response to a determination that the received instruction is to be modified, modify a second subset of the plurality of data bits.
HYBRID BLOCK-BASED PROCESSOR AND CUSTOM FUNCTION BLOCKS
Apparatus and methods are disclosed for implementing block-based processors having custom function blocks, including field-programmable gate array (FPGA) implementations. In some examples of the disclosed technology, a dynamically configurable scheduler is configured to issue at least one block-based processor instruction. A custom function block is configured to receive input operands for the instruction and generate ready state data indicating completion of a computation performed for the instruction by the respective custom function block.
PROCESSOR WITH MEMORY CONTROLLER INCLUDING DYNAMICALLY PROGRAMMABLE FUNCTIONAL UNIT
A processor including a memory controller for interfacing an external memory and a programmable functional unit (PFU). The PFU is programmed by a PFU program to modify operation of the memory controller, in which the PFU includes programmable logic elements and programmable interconnectors. For example, the PFU is programmed by the PFU program to add a function or otherwise to modify an existing function of the memory controller enhance its functionality during operation of the processor. In this manner, the functionality and/or operation of the memory controller is not fixed once the processor is manufactured, but instead the memory controller may be modified after manufacture to improve efficiency and/or enhance performance of the processor, such as when executing a corresponding process.
Out-of-order processor that avoids deadlock in processing queues by designating a most favored instruction
An instruction sequencing unit in an out-of-order (OOO) processor includes a Most Favored Instruction (MFI) mechanism that designates an instruction as an MFI. The processing queues in the processor identify when they contain the MFI, and assures processing the MFI. The MFI remains the MFI until it is completed or is flushed, and which time the MFI mechanism selects the next MFI.
INFORMATION PROCESSING APPARATUS AND CONVERSION METHOD
An information processing apparatus sets, in a second program: a second array where an occurrence pattern indicating whether elements are subjected to computation is a repetition of a pattern for every power-of-two number of elements; a second mask array generated by adding masks indicating that corresponding elements are not subjected to the computation to a first mask array so that the second mask array includes as many masks as the number of elements included in a second pattern; and a second instruction string providing an instruction for the computation of elements corresponding to masks indicating that corresponding elements are subjected to the computation, among the elements set in the second array. Each mask in the second mask array to be applied to an element in the second array is specified by a bitwise logical AND using a value indicating the position of the element in the second array.
RECONFIGURABLE MICROPROCESSOR HARDWARE ARCHITECTURE
A reconfigurable, multi-core processor includes a plurality of memory blocks and programmable elements, including units for processing, memory interface, and on-chip cognitive data routing, all interconnected by a self-routing cognitive on-chip network. In embodiments, the processing units perform intrinsic operations in any order, and the self-routing network forms interconnections that allow the sequence of operations to be varied and both synchronous and asynchronous data to be transmitted as needed. A method for programming the processor includes partitioning an application into modules, determining whether the modules execute in series, program-driven parallel, or data-driven parallel, determining the data flow required between the modules, assigning hardware resources as needed, and automatically generating machine code for each module. In embodiments, a Time Field is added to the instruction format for all programming units that specifies the number of clock cycles for which only one instruction fetch and decode will be performed.