Patent classifications
G06F9/3889
MECHANISM TO PROVIDE RELIABLE RECEIPT OF EVENT MESSAGES
Devices and techniques for providing receipts for event messages in a processor are described herein. A system includes multiple memory-compute nodes coupled to one another over a scale fabric; a set of registers; and an event manager hardware circuitry to: receive an event message corresponding to an event, and the event associated with an event mode; track a counter value representing a number of received event messages related to the event, the counter value stored in the set of registers; compare the number of received event messages to a trigger value; and in response to the number of received event messages equaling the trigger value: use an atomic operation to reset the counter value in the set of registers while maintaining the event mode; and alert a thread of the event.
Data Processing Method and Interaction System
A data processing method applied to a programmable chip, includes a logic classification unit (LCU) that obtains, based on first data received through a data bus, at least first target classified data (TCD) and second TCD, and sends the first and second TCD to a corresponding first arithmetic logic unit (ALU) and a corresponding second ALU based on a preset mapping relationship. The LCU classifies target execution information obtained through preprocessing an entry by a ternary content addressable memory (TCAM) and service data, so that an instruction memory determines first and second information, and sends the first and second information to the corresponding first and second ALUs. The first and second ALUs respectively send, through the data bus, data obtained through performing calculation based on the first TCD and the first information and data obtained through performing calculation based on the second TCD and the second information.
Artificial Intelligence (AI) System for Learning Spatial Patterns in Sparse Distributed Representations (SDRs) and Associated Methods
Introduced here is an artificial intelligence system designed for machine learning. The system may be based on a neuromorphic computational model that learns spatial patterns in inputs using data structures called Sparse Distributed Representations (SDRs) to represent the inputs. Moreover, the system can generate signatures for these SDRs, and these signatures may be used to create definitions of classes or subclasses for classification purposes.
DECOUPLED ACCESS-EXECUTE PROCESSING AND PREFETCHING CONTROL
Apparatuses and methods are provided, relating to the control of data processing in devices which comprise both decoupled access-execute processing circuitry and prefetch circuitry. Control of the access portion of the decoupled access-execute processing circuitry may be dependent on a performance metric of the prefetch circuitry. Alternatively or in addition, control of the prefetch circuitry may be dependent on a performance metric of the access portion.
SPARSE MATRIX CALCULATIONS UTILIZING TIGHTLY COUPLED MEMORY AND GATHER/SCATTER ENGINE
A processor for sparse matrix calculation includes an on-chip memory, a cache, a gather/scatter engine, and a core. The on-chip memory stores a first matrix or vector, and the cache stores a compressed sparse second matrix data structure. The compressed sparse second matrix data structure includes a value array including non-zero element values of the sparse second matrix, where each entry includes a given number of element values; and a column index array where each entry includes the given number of offsets matching the value array. The gather/scatter engine gathers element values of the first matrix or vector using the column index array of the sparse second matrix. In a hybrid horizontal/vertical implementation, the gather/scatter engine gathers sets of element values from sets of rows and from different sub-banks within the same rows based on the column index array of the sparse matrix.
VECTOR COMPUTATIONAL UNIT
A microprocessor system comprises a computational array and a vector computational unit. The computational array includes a plurality of computation units. The vector computational unit is in communication with the computational array and includes a plurality of processing elements. The processing elements are configured to receive output data elements from the computational array and process in parallel the received output data elements.
Neural processing accelerator
A system for calculating. A scratch memory is connected to a plurality of configurable processing elements by a communication fabric including a plurality of configurable nodes. The scratch memory sends out a plurality of streams of data words. Each data word is either a configuration word used to set the configuration of a node or of a processing element, or a data word carrying an operand or a result of a calculation. Each processing element performs operations according to its current configuration and returns the results to the communication fabric, which conveys them back to the scratch memory.
Wavefront selection and execution
Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.
Providing code sections for matrix of arithmetic logic units in a processor
The present invention relates to a processor having a trace cache and a plurality of ALUs arranged in a matrix, comprising an analyser unit located between the trace cache and the ALUs, wherein the analyser unit analyses the code in the trace cache, detects loops, transforms the code, and issues to the ALUs sections of the code combined to blocks for joint execution for a plurality of clock cycles.
Out-of-order block-based processors and instruction schedulers using ready state data indexed by instruction position identifiers
Apparatus and methods are disclosed for implementing block-based processors including field programmable gate-array implementations. In one example of the disclosed technology, a block-based processor includes an instruction decoder configured to generate decoded ready dependencies for a transactional block of instructions, where each of the instructions is associated with a different instruction identifier encoded in the transactional block. The processor further includes an instruction scheduler configured to issue an instruction from a set of instructions of the transactional block of instructions. The instruction is issued based on determining that decoded ready state dependencies for an instruction are satisfied. The determining includes accessing storage with the decoded ready dependencies indexed with a respective instruction identifier that is encoded in the transactional block of instructions.