Patent classifications
G06F9/30192
System Call Management in a User-Mode, Multi-Threaded, Self-Scheduling Processor
Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
Apparatus and method for controlling a change in instruction set
An apparatus and method are provided for controlling a change in instruction set. The apparatus has processing circuitry arranged to operate in a capability domain comprising capabilities used to constrain operations performed by the processing circuitry. A program counter capability storage element is used to store a program counter capability used by the processing circuitry to determine a program counter value. The processing circuitry is arranged to employ a capability-based operation to change the instruction set. In response to execution of at least one type of instruction to load an identified capability into the program counter capability storage element, the processing circuitry is arranged to invoke the capability-based operation in order to perform a capability check operation in respect of the identified capability, and to cause the instruction set to be identified by an instruction set identifier field from the identified capability provided the capability check operation is passed.
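The capability-based instruction set change can be illustrated with a minimal sketch. The names Capability, Core, and branch_to_capability, the two fields checked (a validity tag and an execute permission), and the choice to leave state unchanged when the check fails are all assumptions made for illustration; the abstract does not say what the capability check tests or how a failure is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    valid: bool       # hypothetical validity tag
    executable: bool  # hypothetical execute permission
    address: int      # value from which the program counter is derived
    isa_id: int       # instruction set identifier field


class Core:
    """Toy model of processing circuitry with a program counter capability."""

    def __init__(self, pcc: Capability):
        self.pcc = pcc
        self.isa_id = pcc.isa_id

    def branch_to_capability(self, cap: Capability) -> bool:
        """Load cap into the program counter capability if the check passes."""
        # Assumed check: the capability must be valid and executable.
        if not (cap.valid and cap.executable):
            return False  # assumed failure behaviour: state is left unchanged
        self.pcc = cap
        # The instruction set is taken from the capability's identifier field.
        self.isa_id = cap.isa_id
        return True
```

In this sketch the instruction set changes only as a side effect of a successful capability load, so there is no separate way to switch instruction sets without passing the check.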
CONTROL WAVELET FOR ACCELERATED DEEP LEARNING
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. A compute element receives a wavelet. If a control specifier of the wavelet is a first value, then instructions are read from the memory of the compute element in accordance with an index specifier of the wavelet. If the control specifier is a second value, then instructions are read from the memory of the compute element in accordance with a virtual channel specifier of the wavelet. Then the compute element initiates execution of the instructions.
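The instruction selection described above can be sketched as follows. The Wavelet class, the CONTROL and DATA encodings, and the two lookup tables are hypothetical; the abstract only states that one control specifier value selects instructions by the index specifier and another selects them by the virtual channel specifier.

```python
from dataclasses import dataclass
from typing import Dict, List

CONTROL = 1  # hypothetical encoding of the "first value"
DATA = 0     # hypothetical encoding of the "second value"


@dataclass
class Wavelet:
    control: int
    index: int
    virtual_channel: int


def select_instructions(
    wavelet: Wavelet,
    by_index: Dict[int, List[str]],
    by_channel: Dict[int, List[str]],
) -> List[str]:
    """Pick the instructions a compute element would read for a wavelet."""
    if wavelet.control == CONTROL:
        # First value: read according to the index specifier.
        return by_index[wavelet.index]
    # Second value: read according to the virtual channel specifier.
    return by_channel[wavelet.virtual_channel]
```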
DATA PROCESSING METHOD AND APPARATUS, AND RELATED PRODUCT
The present disclosure provides a data processing method, an apparatus, and a related product. The product includes a control module comprising an instruction caching unit, an instruction processing unit, and a storage queue unit. The instruction caching unit is configured to store computation instructions associated with an artificial neural network operation; the instruction processing unit is configured to parse the computation instructions to obtain a plurality of operation instructions; and the storage queue unit is configured to store an instruction queue, where the instruction queue includes a plurality of operation instructions or computation instructions to be executed in queue order. With this method, the present disclosure can improve the operating efficiency of related products when they perform neural network model operations.
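The flow from cached computation instructions to an ordered instruction queue can be sketched as follows. The semicolon-separated instruction text and the function names parse and build_queue are hypothetical; the abstract does not define an instruction encoding.

```python
from collections import deque
from typing import Deque, Iterable, List


def parse(computation_instruction: str) -> List[str]:
    """Split one computation instruction into its operation instructions.

    The semicolon-separated text format is an assumption for illustration.
    """
    return [op.strip() for op in computation_instruction.split(";") if op.strip()]


def build_queue(cached_instructions: Iterable[str]) -> Deque[str]:
    """Parse cached computation instructions into a first-in, first-out queue."""
    queue: Deque[str] = deque()
    for instruction in cached_instructions:
        queue.extend(parse(instruction))
    return queue
```

Operation instructions would then be taken from the front with queue.popleft(), which preserves the order in which they were parsed.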
Method of allocating a virtual register stack in a stack machine
A method of allocating a virtual register stack (10) of a processing unit in a stack machine is provided. The method comprises allocating a given number of topmost elements (11) of the virtual register stack (10) in a physical register file (17) of the stack machine and allocating subsequent elements of the virtual register stack (10) in a hierarchical register cache (13) of the stack machine.
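A minimal sketch of the split described above, assuming the virtual register stack is modelled as a Python list whose last element is the top of the stack. The function name and the list representation are illustrative only, and the hierarchy inside the register cache is not modelled.

```python
from typing import List, Tuple


def allocate(stack: List[int], num_physical: int) -> Tuple[List[int], List[int]]:
    """Split a virtual register stack between register file and register cache.

    stack[-1] is the top of the stack. The topmost num_physical elements go to
    the physical register file and the remaining elements go to the cache.
    """
    if num_physical < 0:
        raise ValueError("num_physical must be non-negative")
    split = max(len(stack) - num_physical, 0)
    register_file = stack[split:]
    register_cache = stack[:split]
    return register_file, register_cache
```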
Systems, apparatuses, and methods for cumulative product
Systems, methods, and apparatuses for executing an instruction are described. In some embodiments, the instruction includes at least an opcode, a field for a packed data source operand, and a field for a packed data destination operand. When executed, the instruction causes, for each data element position of the packed data source operand, the value stored in that position to be multiplied by all values stored in preceding data element positions of the packed data source operand, and the result of the multiplication to be stored into the corresponding data element position of the packed data destination operand.
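The per-element behaviour described above is an inclusive prefix product. A minimal scalar sketch, modelling the packed operands as Python lists (the function name is illustrative, and no vector hardware is modelled):

```python
from typing import List


def cumulative_product(src: List[int]) -> List[int]:
    """Return dst where dst[i] is the product of src[0] through src[i]."""
    dst = []
    running = 1
    for value in src:
        running *= value     # fold in the element at this position
        dst.append(running)  # store into the matching destination position
    return dst
```

For a source of [1, 2, 3, 4] the destination is [1, 2, 6, 24].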
Processor, information processing apparatus, and processing method for converting a field of an instruction
A predetermined field of a fetched instruction is extended to secure an instruction type and an operand length. An instruction conversion table stores an extension field longer than the predetermined field in association with a bit pattern of the predetermined field of an instruction. An extension field acquisition unit acquires the extension field by referring to the instruction conversion table, with a bit pattern of the predetermined field of the fetched instruction. An instruction decoder performs a decoding process on a new instruction including the extension field in place of the predetermined field of the fetched instruction.
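The table lookup can be sketched with plain bit manipulation. The 16-bit instruction width, the 4-bit predetermined field in the top bits, the 8-bit extension field, and the table contents are all hypothetical values chosen for illustration.

```python
FIELD_BITS = 4   # assumed width of the predetermined field
REST_BITS = 12   # assumed width of the remaining instruction bits

# Hypothetical conversion table: field bit pattern -> longer extension field.
CONVERSION_TABLE = {
    0b0001: 0b1000_0001,
    0b0010: 0b1000_0010,
}


def extend_instruction(instruction: int) -> int:
    """Replace the predetermined field with its extension field."""
    field = (instruction >> REST_BITS) & ((1 << FIELD_BITS) - 1)
    rest = instruction & ((1 << REST_BITS) - 1)
    extension = CONVERSION_TABLE[field]  # look up by the field's bit pattern
    # The new, longer instruction is what the decoder would see.
    return (extension << REST_BITS) | rest
```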
Graphics processing systems for determining blending operations
A sequence of instructions is included in a graphics processing shader program for controlling the way in which blending is implemented. The sequence of instructions includes a blend instruction which determines whether blending for a processing item is to be performed by fixed-function blending hardware or by executing a blend shader routine. If blend shading is to be performed, a sequence of instructions for setting up and performing blend shading is executed. If fixed-function blending is to be performed, an execution thread initiates fixed-function blending in response to the blend instruction, and skips over the sequence of instructions for setting up and performing blend shading.
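The control flow around the blend instruction can be sketched as a toy interpreter. The instruction names, the program layout, and the fixed length of the blend shader sequence are hypothetical; the sketch shows only that the fixed-function path skips the blend shading instructions.

```python
from typing import List

# Hypothetical shader program: a blend instruction followed by the
# blend shading sequence, then the rest of the program.
PROGRAM = ["BLEND", "SETUP_BLEND_SHADER", "RUN_BLEND_SHADER", "END"]
BLEND_SHADER_LENGTH = 2  # instructions to skip on the fixed-function path


def execute(use_blend_shader: bool) -> List[str]:
    """Return the trace of steps taken by one execution thread."""
    trace = []
    pc = 0
    while pc < len(PROGRAM):
        op = PROGRAM[pc]
        if op == "BLEND" and not use_blend_shader:
            # Hand off to fixed-function hardware and skip blend shading.
            trace.append("FIXED_FUNCTION_BLEND_STARTED")
            pc += 1 + BLEND_SHADER_LENGTH
            continue
        trace.append(op)
        pc += 1
    return trace
```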
Systems and Methods for Dynamic Server Control based on Estimated Script Complexity
A computer system includes processor hardware and memory hardware storing instructions for execution by the processor hardware. The instructions include, in response to receiving a first script from a user device, compiling the first script, generating an image representation of the compiled first script, and determining an estimated runtime of the first script using a machine learning algorithm. The instructions include transmitting the estimated runtime for display on a display of the user device, categorizing the estimated runtime, and transmitting the first script to a queue based on the categorization. The instructions include, in response to the first script reaching a front of the queue, executing the first script on a server of the plurality of servers that corresponds to the queue. The instructions include, in response to the first script being executed, transforming the display of the user device according to instructions of the first script.
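The categorize-and-queue step can be sketched as follows. The category names and the threshold values are hypothetical, and the machine learning estimator is omitted: the estimated runtime is simply passed in as a number.

```python
from collections import deque
from typing import Deque, Dict

# Hypothetical category boundaries in seconds; the source specifies none.
SHORT_LIMIT = 1.0
LONG_LIMIT = 60.0


def categorize(estimated_runtime: float) -> str:
    """Map an estimated runtime to a queue category."""
    if estimated_runtime < SHORT_LIMIT:
        return "short"
    if estimated_runtime < LONG_LIMIT:
        return "medium"
    return "long"


def enqueue(script: str, estimated_runtime: float,
            queues: Dict[str, Deque[str]]) -> str:
    """Append the script to the queue that matches its category."""
    category = categorize(estimated_runtime)
    queues[category].append(script)
    return category
```

Each queue would correspond to one server, which executes the script at the front of its queue.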
APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS TO MULTIPLY FLOATING-POINT VALUES OF ABOUT ZERO
Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about zero are described. In one embodiment, a hardware processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having a first field that identifies a first floating-point number, a second field that identifies a second floating-point number, and a third field that indicates an about zero threshold; and an execution circuit to execute the decoded single instruction to: cause a first comparison of an exponent of the first floating-point number to the about zero threshold, cause a second comparison of an exponent of the second floating-point number to the about zero threshold, provide as a resultant of the single instruction a value of zero when the first comparison indicates the exponent of the first floating-point number does not exceed the about zero threshold, provide as the resultant of the single instruction the value of zero when the second comparison indicates the exponent of the second floating-point number does not exceed the about zero threshold, and provide as the resultant of the single instruction a product of a multiplication of the first floating-point number and the second floating-point number when the first comparison indicates the exponent of the first floating-point number exceeds the about zero threshold and the second comparison indicates the exponent of the second floating-point number exceeds the about zero threshold.
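The instruction's behaviour can be sketched in software, assuming IEEE 754 binary64 operands and assuming the threshold is compared against the raw (biased) 11-bit exponent field. Both choices, and the function names, are assumptions; the abstract does not fix a floating-point format or an exponent encoding.

```python
import struct


def biased_exponent(x: float) -> int:
    """Return the 11-bit biased exponent field of a binary64 value."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> 52) & 0x7FF


def multiply_about_zero(a: float, b: float, threshold: int) -> float:
    """Return 0.0 if either exponent does not exceed threshold, else a * b."""
    if biased_exponent(a) <= threshold:
        return 0.0
    if biased_exponent(b) <= threshold:
        return 0.0
    return a * b
```

With a threshold of 1013 (the biased exponent of 2**-10), multiplying 1.0 by 2**-20 yields zero because the second operand's biased exponent is 1003, while 3.0 times 2.0 yields 6.0.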