Patent classifications
G06F9/30181
Computing device and method
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
METHODS FOR DYNAMIC INSTRUCTION SIMPLIFICATION BASED ON REGISTER VALUE LOCALITY
There is provided methods and devices for dynamically simplifying processor instructions. A method includes receiving, at a computing device, processor instructions and determining, by the computing device, if instruction simplification is enabled for an instruction being processed. The method further includes determining, by the computing device, from an instruction simplification table if the instruction is capable of being simplified and scheduling, by the computing device, a simplified instruction based on the determination from the instruction simplification table. A device includes a processor, and a non-transient computer readable memory having stored thereon instructions which when executed by the processor configure the device to execute the methods disclosed herein.
Processor authentication method
The disclosure includes a method of authenticating a processor that includes an arithmetic and logic unit. At least one decoded operand of at least a portion of a to-be-executed opcode is received on a first terminal of the arithmetic and logic unit. A signed instruction is received on a second terminal of the arithmetic and logic unit. The signed instruction combines a decoded instruction of the to-be-executed opcode and at least one previously-executed opcode.
Apparatuses, methods, and systems for instructions to multiply floating-point values of about one
Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about one are described. In one embodiment, a hardware processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having a first field that identifies a first floating-point number, a second field that identifies a second floating-point number, and a third field that indicates an about one threshold; and an execution circuit to execute the decoded single instruction to: cause a first comparison of an exponent of the first floating-point number to the about one threshold, cause a second comparison of an exponent of the second floating-point number to the about one threshold, provide as a resultant of the single instruction a value of the first floating-point number one when both the first comparison indicates the exponent of the first floating-point number does not exceed the about one threshold and the second comparison indicates the exponent of the second floating-point number does not exceed the about one threshold, provide as the resultant of the single instruction the second floating-point number when the first comparison indicates the exponent of the first floating-point number does not exceed the about one threshold, and provide as the resultant of the single instruction a product of a multiplication of the first floating-point number and the second floating-point number when the first comparison indicates the exponent of the first floating-point number exceeds the about one threshold or and the second comparison indicates the exponent of the second floating-point number exceeds the about one threshold.
Processors, methods, systems, and instructions to generate sequences of integers in numerical order that differ by a constant stride
A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four non-negative integers in numerical order with all integers in consecutive positions differing by a constant stride of at least two. In an aspect, storing the result including the sequence of the at least four integers is performed without calculating the at least four integers using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed.
COALESCING ADJACENT GATHER/SCATTER OPERATIONS
According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
ELECTRONIC APPARATUS AND METHOD FOR REDUCING NUMBER OF COMMANDS
An electronic apparatus and a method for reducing the number of commands are provided. The electronic apparatus includes a central processor and a co-processor. The central processor generates a plurality of original register setting commands to set at least one bit of at least one register of the co-processor. The original register setting commands include a plurality of first original register setting commands, and a plurality of setting targets of the first original register setting commands have address continuity. The central processor merges the first original register setting commands to generate at least one merged register setting command. The central processor transmits the at least one merged register setting command to the co-processor.
Function virtualization facility for function query of a processor
Selected installed function of a multi-function instruction is hidden such that even though a processor is capable of performing the hidden installed function, the availability of the hidden function is hidden such that responsive to the multi-function instruction querying the availability of functions, only functions not hidden are reported as installed.
Out-of-order block-based processors and instruction schedulers using ready state data indexed by instruction position identifiers
Apparatus and methods are disclosed for implementing block-based processors including field programmable gate-array implementations. In one example of the disclosed technology, a block-based processor includes an instruction decoder configured to generate decoded ready dependencies for a transactional block of instructions, where each of the instructions is associated with a different instruction identifier encoded in the transactional block. The processor further includes an instruction scheduler configured to issue an instruction from a set of instructions of the transactional block of instructions. The instruction is issued based on determining that decoded ready state dependencies for an instruction are satisfied. The determining includes accessing storage with the decoded ready dependencies indexed with a respective instruction identifier that is encoded in the transactional block of instructions.
Data processing method and apparatus, and related product
The present disclosure provides a data processing method and an apparatus and a related product. The products include a control module including an instruction caching unit, an instruction processing unit, and a storage queue unit. The instruction caching unit is configured to store computation instructions associated with an artificial neural network operation; the instruction processing unit is configured to parse the computation instructions to obtain a plurality of operation instructions; and the storage queue unit is configured to store an instruction queue, where the instruction queue includes a plurality of operation instructions or computation instructions to be executed in the sequence of the queue. By utilizing the above-mentioned method, the present disclosure can improve the operation efficiency of related products when performing operations of a neural network model.