Patent classifications
G06F9/30094
Blackbox matching engine
A method and apparatus are disclosed for enhancing operable functionality of input source code files from a software program by identifying a first code snippet and a first library function which generate similar outputs from a shared input by parsing each and every line of code in a candidate code snippet to generate a templatized code snippet data structure for the first code snippet, and then testing the templatized code snippet data structure against extracted library function information to check for similarity of outputs between the first code snippet and the first library function in response to a shared input so that the developer is presented with a library function recommendation which includes the first code snippet, the first library function, and instructions for replacing the first code snippet with the first library function.
Apparatus and method for sparse training acceleration in neural networks
A computing device, comprising: a computing module, comprising one or more computing units; and a control module, comprising a computing control unit, and used for controlling shutdown of the computing unit of the computing module according to a determining condition. Also provided is a computing method. The computing device and method have the advantages of low power consumption and high flexibility, and can be combined with the upgrading mode of software, thereby further increasing the computing speed, reducing the computing amount, and reducing the computing power consumption of an accelerator.
Computing device and method
A computing device, comprising: a computing module, comprising one or more computing units; and a control module, comprising a computing control unit, and used for controlling shutdown of the computing unit of the computing module according to a determining condition. Also provided is a computing method. The computing device and method have the advantages of low power consumption and high flexibility, and can be combined with the upgrading mode of software, thereby further increasing the computing speed, reducing the computing amount, and reducing the computing power consumption of an accelerator.
Systems and methods for consistent feature flag evaluation
Described herein is a computer implemented method. The method comprises executing an application defining a feature flag, the execution of the application being associated with a user identifier. The method further comprises determining if version data associated with the feature flag and user identifier is stored in a local data store. In response determining that the version data associated with the feature flag and user identifier is stored in the local data store an evaluation request is generated that includes the version data and the user identifier. The evaluation request is then communicated to a feature flag evaluation service.
Compute unit having independent data paths
One embodiment provides for a general-purpose graphics processing unit comprising a streaming multiprocessor having a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The streaming multiprocessor comprises multiple processing blocks including multiple processing cores. The processing cores include independent integer and floating-point data paths that are configurable to concurrently execute multiple independent instructions. A memory is coupled with the multiple processing blocks.
Transactional compare-and-discard instruction
An apparatus comprising: processing circuitry to process threads of data processing; and transactional memory support circuitry to support execution of a transaction within a thread processed by the processing circuitry. In response to a transactional compare-and-discard instruction executed within a given transaction, specifying a target address and a compare value, the processing circuitry loads a target data value from a memory location corresponding to the target address; sets at least one condition status indication depending on a result of comparing the target data value and the compare value; and discards the target data value without adding the target address to a working set of addresses tracked for the given transaction. This is useful for enabling thread level speculation to be implemented on a transactional memory architecture.
ADDRESS GENERATION METHOD, RELATED APPARATUS, AND STORAGE MEDIUM
A system parses a very long instruction word (VLIW) to obtain an execution parameter. The system obtains a first sliding window width count, a first sliding window height count, a first feature map width count, and a first feature map height count that correspond to first target data. In accordance with a determination that the first sliding window width count falls within the sliding window width range, the first sliding window height count falls within the sliding window height range, (the first feature map width count falls within the feature map width range, and the first feature map height count falls within the feature map height range, the system determines an offset of the first target data. The system also obtains a starting address of the first target data, and adds the starting address to the offset to obtain a first target address of the first target data.
PROCESSOR AND NON-TRANSITORY COMPUTER-READABLE MEDIUM
A processor includes a first register, a second register configured to store status information related to the first register, and a detection circuit configured to detect an exception in an instruction in which the first register is specified in an operand, based on the status information stored in the second register, wherein the status information has a first flag indicating whether the first register has been used as a write destination register before the execution of the instruction and a second flag indicating whether the first register has been used as a read source register before the execution of the instruction, and the detection circuit detects the exception when the first flag indicates that the first register has been used as the write destination register and the second flag indicates that the first register has not been used as the read source register.
Mixed inference using low and high precision
One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread (SIMT) architecture, the general-purpose graphics compute unit to simultaneously execute the first instruction and the second instruction, wherein the integer operation corresponds to a memory address calculation.
Enhanced techniques for traversing ray tracing acceleration structures
Enhanced techniques applicable to a ray tracing hardware accelerator for traversing a hierarchical acceleration structure are disclosed. For example, traversal efficiency is improved by combining programmable traversals based on ray operations with per-node static configurations that modify traversal behavior. The per-node static configurations enable creators of acceleration data structures to optimize for potential traversals without necessarily requiring detailed information about ray characteristics and ray operations used when traversing the acceleration structure. Moreover, by providing for selective exclusion of certain nodes using per-node static configurations, less memory is needed to express an acceleration structure that includes, for example, different geometric levels of details corresponding to a single object.