Patent classifications
G06F9/30181
LOW-LATENCY REGISTER ERROR CORRECTION
Devices and techniques for low-latency register error correction are described herein. A register is read as part of an instruction when that instruction is the currently executing instruction in a processor. A correctable error in data produced from reading the register can be detected. In response to detecting the correctable error, the currently executing instruction in the processor can be changed into a register update instruction that is executed to overwrite the data in the register with corrected data. Then, the original (e.g., unchanged) instruction can be rescheduled.
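The correct-then-replay flow in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name an error-correcting code, so the toy register file keeps three copies of each value and uses a bitwise majority vote as a stand-in, and the names RegFile and run are hypothetical.

```python
from collections import deque

class RegFile:
    """Toy register file. Three copies per register stand in for real ECC (assumption)."""
    def __init__(self, n):
        self.copies = [[0, 0, 0] for _ in range(n)]

    def write(self, r, value):
        self.copies[r] = [value, value, value]

    def read(self, r):
        a, b, c = self.copies[r]
        voted = (a & b) | (a & c) | (b & c)  # bitwise majority of the three copies
        error = not (a == b == c)            # a mismatch means a correctable error
        return voted, error

def run(program, regs):
    """program: list of (dst, src_a, src_b) add instructions."""
    queue = deque(program)
    trace = []
    while queue:
        dst, a, b = queue.popleft()
        (va, err_a), (vb, err_b) = regs.read(a), regs.read(b)
        if err_a or err_b:
            # The current slot becomes a register update that overwrites
            # the bad register with the corrected data...
            bad, fixed = (a, va) if err_a else (b, vb)
            regs.write(bad, fixed)
            trace.append(f"update r{bad}")
            # ...and the original, unchanged instruction is rescheduled.
            queue.appendleft((dst, a, b))
            continue
        regs.write(dst, va + vb)
        trace.append(f"add r{dst}")
    return trace

regs = RegFile(4)
regs.write(1, 5)
regs.write(2, 7)
regs.copies[1][0] ^= 0b100  # inject a single-copy bit flip into r1
trace = run([(3, 1, 2)], regs)
```

In this sketch the add only ever sees clean operands, because the replayed instruction rereads the register after it has been overwritten with corrected data.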
COPY A SUBSET OF STATUS FLAGS FROM A CONTROL AND STATUS REGISTER TO A FLAGS REGISTER
Techniques for copying a subset of status flags from a control and status register to a flags register in response to an instruction are described. An exemplary instruction includes a field for an opcode, the opcode to indicate execution circuitry is to copy from a first register a saturation flag value, an overflow value, and a carry value to a second register into one or more instructions of a different instruction set.
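As a rough illustration of copying only the saturation, overflow, and carry values between two registers, the following sketch treats both registers as integers. The bit positions in CSR_BITS and FLAGS_BITS are hypothetical; the abstract does not give a layout for either register.

```python
# Hypothetical bit layouts (assumptions, not taken from the abstract).
CSR_BITS = {"saturation": 5, "overflow": 3, "carry": 0}    # source register
FLAGS_BITS = {"saturation": 2, "overflow": 1, "carry": 0}  # destination register

def copy_status_flags(csr, flags):
    """Copy the three selected flag bits from csr into flags; other bits are untouched."""
    for name, src_bit in CSR_BITS.items():
        dst_bit = FLAGS_BITS[name]
        bit = (csr >> src_bit) & 1
        flags = (flags & ~(1 << dst_bit)) | (bit << dst_bit)
    return flags
```

Bits of the source register outside the selected subset never reach the destination, and destination bits outside the three target positions keep their previous values.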
Method performed by a microcontroller for managing a NOP instruction and corresponding microcontroller
Disclosed herein is a method for managing NOP instructions in a microcontroller, the method comprising duplicating all jump instructions causing a NOP instruction to form a new instruction set; inserting an internal NOP instruction into each of the jump instructions; when a jump instruction is executed, executing a subsequent instruction of the new instruction set; and executing the internal NOP instruction when an execution of the subsequent instruction is skipped.
Inline data inspection for workload simplification
A method, computer readable medium, and processor are described herein for inline data inspection by using a decoder to decode a load instruction, including a signal to cause a circuit in a processor to indicate whether data loaded by the load instruction exceeds a threshold value. Moreover, an indication of whether the data loaded by the load instruction exceeds the threshold value may be stored.
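A software analogue of a load that also reports a threshold comparison might look like the sketch below. The function name, the list-based memory, and the predicate list used to store the indication are all assumptions for illustration; the abstract describes a circuit, not this interface.

```python
def load_with_inspection(memory, address, threshold, predicates=None):
    """Load memory[address] and report whether the value exceeds threshold."""
    value = memory[address]
    exceeds = value > threshold
    if predicates is not None:
        predicates.append(exceeds)  # optionally store the indication for later use
    return value, exceeds

memory = [0.0, 0.5, 2.0]
stored = []
results = [load_with_inspection(memory, a, 1.0, stored) for a in range(3)]
```

A consumer could then skip further work on any element whose stored indication is False, which is one way the comparison could simplify a workload.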
Computing device and method
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. A storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
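To make the fixed-point representation concrete, here is a minimal sketch of converting values to fixed point and multiplying them with integer arithmetic only. The 8 fractional bits and the function names are assumptions; the abstract does not specify a format or how the conversion unit works.

```python
FRAC_BITS = 8  # hypothetical number of fractional bits

def to_fixed(x, frac_bits=FRAC_BITS):
    """Scale a float by 2**frac_bits and round to an integer."""
    return round(x * (1 << frac_bits))

def from_fixed(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point values; the shift restores the scale."""
    return (a * b) >> frac_bits

product = fixed_mul(to_fixed(1.5), to_fixed(2.25))
```

The multiply itself is a plain integer multiply and shift, which is typically cheaper in hardware than a floating-point multiply, at the cost of reduced range and precision.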
SOFTWARE INSTRUCTION SET UPDATE OF MEMORY DIE USING PAGE BUFFERS
Disclosed in some examples are methods, systems, devices, memory controllers, memory dies, memory devices, and machine-readable mediums that allow for efficient updating of software instructions of the memory die. In some examples, the controller of the memory device may cause the software instructions of one or more memory dies to be updated by causing the page buffers of the one or more memory dies to be loaded with updated software instructions and subsequently issuing a command to the memory die to update the software instructions from the page buffer.
Coalescing adjacent gather/scatter operations
According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read a contiguous first and second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and the second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
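The coalescing idea can be sketched for interleaved pairs in memory: each index triggers one contiguous read of two adjacent elements, and the two elements land in corresponding entries of two separate destinations. The function name and the list-based memory are assumptions for illustration only.

```python
def coalesced_gather_pairs(memory, indices):
    """For each index, read two adjacent elements at once and split them
    across two destination vectors at the same entry position."""
    dst_first, dst_second = [], []
    for i in indices:
        first, second = memory[i:i + 2]  # one contiguous read covers both elements
        dst_first.append(first)
        dst_second.append(second)
    return dst_first, dst_second

memory = [10, 11, 20, 21, 30, 31]  # e.g. interleaved (x, y) pairs
xs, ys = coalesced_gather_pairs(memory, [4, 0, 2])
```

Without coalescing, the same result would take two separate gathers over the same indices, each touching memory that the other also reads.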
Apparatus and method for controlling complex multiply-accumulate circuitry
An apparatus and method for performing multiply-accumulate (MAC) operations on complex numbers to generate real results. For example, one embodiment of a processor comprises: a decoder to decode instructions including multiply-accumulate instructions; first and second source registers to store a first plurality of complex values and a second plurality of complex values, respectively, each complex value comprising a real value and an imaginary value; multiply-accumulate (MAC) execution circuitry coupled to the first and second source registers comprising multiplier circuitry, adder circuitry, and accumulator circuitry; mode selection circuitry to select between at least two execution modes for the MAC execution circuitry including a first mode in which the MAC execution circuitry is to perform complex multiply-accumulate operations using real and imaginary values from the first plurality of complex values and the second plurality of complex values and a second mode in which the MAC execution circuitry is to replace one or more of the real or imaginary values from the first and second plurality of complex values with one or more real or imaginary values specified in a set of scalar complex numbers or with zeroes.
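The two execution modes can be approximated in software as follows. This is a loose sketch under stated assumptions: the mode flag zero_imag_of_b is hypothetical and shows only one possible replacement (zeroing the imaginary parts of the second operand), whereas the abstract also allows values taken from a set of scalar complex numbers.

```python
def complex_mac(acc, a_vals, b_vals, zero_imag_of_b=False):
    """Accumulate the sum of a*b over paired complex values.

    With zero_imag_of_b set, each b has its imaginary part replaced by zero
    before the multiply (an illustrative stand-in for the second mode).
    """
    for a, b in zip(a_vals, b_vals):
        if zero_imag_of_b:
            b = complex(b.real, 0.0)
        acc += a * b
    return acc

a_vals = [1 + 2j, 3 - 1j]
b_vals = [2 + 1j, 1 + 1j]
full = complex_mac(0j, a_vals, b_vals)
replaced = complex_mac(0j, a_vals, b_vals, zero_imag_of_b=True)
```

The same multiplier, adder, and accumulator path serves both calls; only the operand selection in front of the multiplier changes, which mirrors the role the abstract gives to the mode selection circuitry.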
CO-SCHEDULED LOADS IN A DATA PROCESSING APPARATUS
A data processing apparatus and method of operating such is disclosed. Issue circuitry buffers operations prior to execution until operands are available in a set of registers. A first and a second load operation are identified in the issue circuitry, when both are dependent on a common operand, and when the common operand is available in the set of registers. Load circuitry has a first address generation unit to generate a first address for the first load operation and a second address generation unit to generate a second address for the second load operation. An address comparison unit compares the first address and the second address. The load circuitry is arranged to cause a merged lookup to be performed in local temporary storage, when the address comparison unit determines that the first and the second address differ by less than a predetermined address range characteristic of the local temporary storage.
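The merge decision can be sketched as below. The value of ADDRESS_RANGE, the Cache class, and its lookup counter are assumptions for illustration; in the abstract the range is a characteristic of the local temporary storage and the comparison is done by dedicated circuitry.

```python
ADDRESS_RANGE = 64  # hypothetical range, e.g. a cache line size in bytes

class Cache:
    """Toy local storage that counts how many lookups it serves."""
    def __init__(self, backing):
        self.backing = backing
        self.lookups = 0

    def lookup(self, start, length):
        self.lookups += 1
        return self.backing[start:start + length]

def co_scheduled_loads(cache, base, offset1, offset2):
    """Issue two loads that share the operand `base`, merging when close."""
    addr1 = base + offset1  # first address generation unit
    addr2 = base + offset2  # second address generation unit
    if abs(addr1 - addr2) < ADDRESS_RANGE:
        low = min(addr1, addr2)
        window = cache.lookup(low, abs(addr1 - addr2) + 1)  # one merged lookup
        return window[addr1 - low], window[addr2 - low]
    # Too far apart: fall back to two independent lookups.
    return cache.lookup(addr1, 1)[0], cache.lookup(addr2, 1)[0]

cache = Cache(list(range(256)))
near = co_scheduled_loads(cache, 100, 0, 8)
lookups_after_near = cache.lookup and cache.lookups
far = co_scheduled_loads(cache, 0, 0, 200)
```

In this sketch the nearby pair costs a single lookup while the distant pair costs two, which is the saving the merged lookup is meant to capture.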
TRANSFORMATION OF DATA FROM LEGACY ARCHITECTURE TO UPDATED ARCHITECTURE
One or more systems, computer-implemented methods and/or computer program products to facilitate a process to transform original operational data into updated operational data are provided. A system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a transformation component that can transform original operational data of a first architecture into updated operational data employable at a second architecture, wherein the second architecture is an updated architecture relative to the first architecture. In one or more embodiments, the transformation component further can employ machine learning to match one or more data elements of the original operational data to one or more aspects of the second architecture.