G06F9/30192

METHOD AND APPARATUS FOR RECTIFYING WEAK MEMORY ORDERING PROBLEM
20230289187 · 2023-09-14

This application relates to the field of computer technologies and discloses example methods and apparatuses for rectifying a weak memory ordering problem. An example method includes: determining a read/write instruction set in to-be-repaired code; classifying instructions in the read/write instruction set to determine a target instruction; and inserting a memory barrier instruction between the target instruction and the read/write instruction that precedes it. The read/write instruction set includes the read instructions and/or write instructions in the to-be-repaired code, and each instruction in the read/write instruction set is used for memory access.
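The repair flow described in the abstract can be sketched as a small code-rewriting pass. The mnemonics (`ldr`, `str`, `dmb`) and the store-followed-by-load classification rule below are illustrative assumptions, not details taken from the application:

```python
# Minimal sketch of the described flow: scan to-be-repaired code for
# memory-access instructions, classify each to decide whether it is a
# "target" (here, hypothetically: a load that follows a store), and insert
# a memory barrier between the target and the previous read/write instruction.
BARRIER = "dmb"  # placeholder barrier mnemonic; the abstract does not fix one

def is_memory_access(instr):
    # the read/write instruction set: instructions used for memory access
    return instr.split()[0] in {"ldr", "str"}

def rectify(code):
    repaired = []
    prev_access = None
    for instr in code:
        if is_memory_access(instr):
            # classification step: a read following a write is the target here
            if prev_access and prev_access.startswith("str") and instr.startswith("ldr"):
                repaired.append(BARRIER)
            prev_access = instr
        repaired.append(instr)
    return repaired

code = ["str r0, [r1]", "ldr r2, [r3]", "add r2, r2, #1"]
print(rectify(code))  # a barrier lands between the store and the load
```

The barrier placement rule is deliberately simplistic; the application's classification step is what decides which accesses actually need ordering.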

Instruction and logic for processing text strings

Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, the execution resources store a result of a comparison between each data element of a first operand and the corresponding data element of a second operand, the operands corresponding to a first and a second text string, respectively.
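The per-element comparison result can be modelled in a few lines. This is only a conceptual sketch of one flag per data-element pair, not the semantics of any actual SIMD string-compare instruction:

```python
def compare_strings(a, b):
    """Compare each data element of a first operand against the
    corresponding element of a second operand (both text strings),
    storing one boolean result per element pair."""
    return [x == y for x, y in zip(a, b)]

print(compare_strings("hello", "heLlo"))  # → [True, True, False, True, True]
```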

METHOD TO OPTIMIZE STORAGE PARTITION REDISCOVERY

Disclosed is a storage management method comprising: sending, by a user device manager running at a user space of an operating system, a first request for partition table data to a block device; receiving, by the user device manager, first partition data of the block device; sending, by the user device manager, a second request for partition data of the block device to a kernel of the operating system; receiving, by the user device manager, second partition data from the kernel, wherein the second partition data is associated with the block device and cached by the kernel; determining whether the first partition data and the second partition data are identical; and in response to determining that the first partition data is different from the second partition data, performing a device discovery operation on the block device.
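The optimization is to skip the expensive rediscovery when the kernel's cached partition data still matches what is on the device. A minimal sketch, with the data sources passed in as callables (the function names and partition-tuple format are assumptions):

```python
def needs_rediscovery(first_partition_data, second_partition_data):
    # First data is read directly from the block device; second is the
    # kernel-cached copy. A mismatch means the cached view is stale.
    return first_partition_data != second_partition_data

def check_device(read_from_device, read_from_kernel_cache, rediscover):
    on_disk = read_from_device()       # first request: partition table on disk
    cached = read_from_kernel_cache()  # second request: kernel's cached view
    if needs_rediscovery(on_disk, cached):
        rediscover()                   # full discovery only on mismatch
        return True
    return False

# Example: cached size differs from on-disk size, so rediscovery runs.
events = []
check_device(lambda: [("sda1", 1024)],
             lambda: [("sda1", 2048)],
             lambda: events.append("rediscover"))
print(events)  # → ['rediscover']
```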

DATA STRUCTURE DESCRIPTORS FOR DEEP LEARNING ACCELERATION

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
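The descriptor scheme above can be sketched as plain data types: a descriptor names the operand's shape class, and 4D or circular-buffer operands chase a second, extended descriptor for their extra parameters. The field names and register-file layout here are illustrative assumptions:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class VectorKind(Enum):
    FABRIC = "fabric"          # operand streamed over the fabric
    MEM_1D = "1d"              # one-dimensional memory vector
    MEM_4D = "4d"              # four-dimensional memory vector
    MEM_CIRCULAR = "circular"  # circular-buffer memory vector

@dataclass
class DataStructureDescriptor:
    kind: VectorKind
    base: int = 0
    length: int = 0
    extended_reg: Optional[int] = None  # index of an extended descriptor register

@dataclass
class ExtendedDescriptor:
    dims: Tuple[int, ...] = ()     # extents of a four-dimensional vector
    strides: Tuple[int, ...] = ()  # per-dimension strides

def resolve_operand(dsr_file, xdsr_file, reg):
    """Fetch an operand's descriptor from the data structure register file;
    follow the extended register when 4D or circular-buffer parameters
    are required."""
    d = dsr_file[reg]
    x = xdsr_file[d.extended_reg] if d.extended_reg is not None else None
    return d, x
```

A fabric-vector operand resolves to a bare descriptor, while a 4D operand additionally yields its extents and strides from the extended register.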

SYSTEMS AND METHODS FOR DYNAMIC SERVER CONTROL BASED ON ESTIMATED SCRIPT COMPLEXITY

A computer system includes processor hardware and memory hardware storing instructions for execution by the processor hardware. The instructions include, in response to receiving a first script from a user device, compiling the first script, generating an image representation of the compiled first script, and determining an estimated runtime of the first script using a machine learning algorithm. The instructions include transmitting the estimated runtime for display on a display of the user device, categorizing the estimated runtime, and transmitting the first script to a queue based on the categorization. The instructions include, in response to the first script reaching a front of the queue, executing the first script on a server of a plurality of servers that corresponds to the queue. The instructions include, in response to the first script being executed, transforming the display of the user device according to instructions of the first script.
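The categorize-then-route step can be sketched as follows. The thresholds, queue names, and server names are hypothetical; the claims only require that the estimated runtime is categorized and the script is sent to the queue (and thus the server) matching its category:

```python
def categorize_runtime(estimated_seconds):
    # hypothetical cut-offs for short/medium/long scripts
    if estimated_seconds < 10:
        return "short"
    if estimated_seconds < 300:
        return "medium"
    return "long"

# each queue corresponds to a server of the plurality of servers
QUEUE_TO_SERVER = {"short": "server-a", "medium": "server-b", "long": "server-c"}

def route_script(script, estimate_runtime):
    est = estimate_runtime(script)   # e.g. an ML model over the compiled image
    queue = categorize_runtime(est)
    return queue, QUEUE_TO_SERVER[queue]

print(route_script("report.py", lambda s: 42))  # → ('medium', 'server-b')
```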

OPTIMIZED COMPUTE HARDWARE FOR MACHINE LEARNING OPERATIONS

Described herein is a graphics processor including a processing resource, the processing resource including a multiplier configured to multiply inputs associated with an instruction at one of a first plurality of bit widths, an adder configured to add a product output from the multiplier to an accumulator value at one of a second plurality of bit widths, and circuitry to select a first bit width of the first plurality of bit widths for the multiplier and a second bit width of the second plurality of bit widths for the adder.
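The key idea is that the multiply width and the accumulate width are selected independently. A minimal sketch that models the two width selections with integer masking (the function and its behavior at the width boundaries are assumptions for illustration):

```python
def multiply_add(a, b, acc, mul_bits, add_bits):
    """Multiply the inputs at one selected bit width, then accumulate at
    another; truncation at each stage models the narrower datapaths."""
    mul_mask = (1 << mul_bits) - 1
    add_mask = (1 << add_bits) - 1
    product = ((a & mul_mask) * (b & mul_mask)) & add_mask
    return (product + (acc & add_mask)) & add_mask

# 8-bit multiply feeding a 16-bit accumulate
print(multiply_add(200, 3, 100, 8, 16))  # → 700
```

In real hardware the selection circuitry would steer the operands through differently sized multiplier and adder datapaths rather than masking a wide integer, but the arithmetic effect is the same.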

Microthreading for accelerated deep learning

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of compute elements and routers performs flow-based computations on wavelets of data. Some instructions are performed in iterations, such as one iteration per element of a fabric vector or FIFO. When sources for an iteration of an instruction are unavailable, and/or there is insufficient space to store results of the iteration, indicators associated with operands of the instruction are checked to determine whether other work can be performed. In some scenarios, other work cannot be performed and processing stalls. Alternatively, information about the instruction is saved, the other work is performed, and sometime after the sources become available and/or sufficient space to store the results becomes available, the iteration is performed using the saved information.
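The scheduling decision in the abstract (run the iteration, switch to other work after saving state, or stall) can be sketched as a single step function. The state representation and return strings are illustrative assumptions:

```python
def schedule_step(instr, sources_ready, space_free, other_work, saved):
    """One microthreading step: run the instruction's iteration when its
    sources are available and there is space for its results; otherwise
    save the iteration's state and switch to other work; stall if there
    is no other work to perform."""
    if sources_ready and space_free:
        return f"executed {instr}"
    if other_work:
        saved.append(instr)            # save iteration state for later resume
        return f"executed {other_work.pop(0)}"
    return "stall"                     # nothing else to do: processing stalls
```

Once the blocked iteration's sources (and result space) become available, the saved state lets the iteration resume where it left off.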

MATRIX MULTIPLICATION AND ACCUMULATION OPERATIONS ON COMPRESSED MATRICES

Apparatuses, systems, and techniques to perform an operation to indicate one or more non-zero values within one or more matrices of data; to perform an API to compress one or more matrices of data; to perform a matrix multiply accumulate (MMA) operation on two or more matrices of data, wherein at least one of the two or more matrices contain compressed data; and/or to perform an API to decompress one or more matrices of data. In at least one embodiment, one or more circuits are configured to receive and compile one or more instructions to perform computational operations for a sparse matrix multiplication.
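The compress / decompress / sparse-MMA pipeline shared by this family of abstracts can be sketched in plain Python. The (column, value) row format below is a simple stand-in for a real structured-sparsity encoding, and the function names are assumptions:

```python
def compress(matrix):
    """Indicate and keep only the non-zero values of each row,
    as (column, value) pairs."""
    return [[(j, v) for j, v in enumerate(row) if v != 0] for row in matrix]

def decompress(compressed, width):
    """Expand compressed rows back to dense rows of the given width."""
    out = []
    for row in compressed:
        dense = [0] * width
        for j, v in row:
            dense[j] = v
        out.append(dense)
    return out

def sparse_mma(compressed_a, b, c):
    """Matrix multiply-accumulate C += A @ B where A stays compressed:
    only A's stored non-zeros contribute to the dot products."""
    n = len(b[0])
    out = [row[:] for row in c]
    for i, row in enumerate(compressed_a):
        for k, v in row:
            for j in range(n):
                out[i][j] += v * b[k][j]
    return out

a = [[1, 0], [0, 2]]
ca = compress(a)
print(sparse_mma(ca, [[1, 2], [3, 4]], [[0, 0], [0, 0]]))  # → [[1, 2], [6, 8]]
```

Skipping the zero entries of A is where the saving comes from: the inner loop runs once per stored non-zero rather than once per element.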

APPLICATION PROGRAMMING INTERFACE TO COMPRESS DATA

Apparatuses, systems, and techniques to perform an operation to indicate one or more non-zero values within one or more matrices of data; to perform an API to compress one or more matrices of data; to perform a matrix multiply accumulate (MMA) operation on two or more matrices of data, wherein at least one of the two or more matrices contain compressed data; and/or to perform an API to decompress one or more matrices of data. In at least one embodiment, one or more circuits are configured to receive and compile one or more instructions to perform computational operations for a sparse matrix multiplication.

PERFORMING MATRIX VALUE INDICATION

Apparatuses, systems, and techniques to perform an operation to indicate one or more non-zero values within one or more matrices of data; to perform an API to compress one or more matrices of data; to perform a matrix multiply accumulate (MMA) operation on two or more matrices of data, wherein at least one of the two or more matrices contain compressed data; and/or to perform an API to decompress one or more matrices of data. In at least one embodiment, one or more circuits are configured to receive and compile one or more instructions to perform computational operations for a sparse matrix multiplication.