G06F9/30076

Accelerator for dense and sparse matrix computations

A method of increasing computer hardware efficiency of a matrix computation. The method comprises receiving at a computer processing device, digital signals encoding one or more operations of the matrix computation, each operation including one or more operands. The method further comprises, responsive to determining, by a sparse data check device of the computer processing machine, that an operation of the matrix computation includes all dense operands, forwarding the operation to a dense computation device of the computer processing machine configured to perform the operation of the matrix computation based on the dense operands. The method further comprises, responsive to determining, by the sparse data check device, that an operation of the matrix computation includes one or more sparse operands, forwarding the operation to a sparse computation device configured to perform the operation of the matrix computation.

Marking current context data to control a context-data-dependent processing operation to save current or default context data to a data location

A data processing system includes processing circuitry for executing context-data-dependent program instructions which are decoded by decoder circuitry. Such context-data-dependent program instructions perform processing which is dependent upon currently existing context data. As an example, the context-data-dependent program instructions may be floating point instructions and the context data may be rounding mode information. The decoder circuitry supports a context save instruction which saves context data when it is marked as having been used and saves default context data when the current context data is marked as not having been used. The decoder circuitry further supports a context restore instruction which restores context data when the current context data is marked as having been used and permits the current context data to continue for future use when it is marked as currently unused.

Machine learning for computing enabled systems and/or devices
11699295 · 2023-07-11 ·

Aspects of the disclosure generally relate to computing enabled systems and/or devices and may be generally directed to machine learning for computing enabled systems and/or devices. In some aspects, the system captures one or more digital pictures, receives one or more instruction sets, and learns correlations between the captured pictures and the received instruction sets.

Generating a vector predicate summary
11550574 · 2023-01-10 · ·

Apparatuses and methods of operating such apparatuses are disclosed. Vector processing circuitry performs data processing in multiple parallel processing lanes, wherein the data processing is performed in a subset of the multiple parallel processing lanes determined by bit values of a vector predicate which are set. Predicate monitoring circuitry is responsive to the vector predicate to generate a predicate summary value in dependence on the bit values of the vector predicate. A first value of the predicate summary value indicates that a sparse condition is true for the vector predicate, the sparse condition being true when the bit values of the vector predicate comprise a set bit corresponding to a vector element at a higher index immediately followed by a non-set bit corresponding to a vector element at a lower index. A second value of the predicate summary value indicates that the sparse condition is not true for the vector predicate. Improved predicate controlled vector processing is thus supported.

Execution circuits using discardable state
11550651 · 2023-01-10 · ·

There is provided execution circuitry. Storage circuitry retains a stored state of the execution circuitry. Operation receiving circuitry receives, from issue circuitry, an operation signal corresponding to an operation to be performed that accesses the stored state of the execution circuitry from the storage circuitry. Functional circuitry seeks to perform the operation in response to the operation signal by accessing the stored state of the execution circuitry from the storage circuitry. Delete request receiving circuitry receives a deletion signal and in response to the deletion signal, deletes the stored state of the execution circuitry from the storage circuitry. State loss indicating circuitry responds to the operation signal when the stored state of the execution circuitry is not present and is required for the operation by indicating an error. In addition, there is provided a data processing apparatus comprising issue circuitry to issue an operation to execution circuitry. The execution circuitry stores a stored state that is accessed during performance of the operation and error detecting circuitry detects an indication of an error from the execution circuitry that the stored state is required for performance of the operation and that the stored state has been deleted.

Systems, methods, and apparatuses for heterogeneous computing

Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.

Implementing a type restriction that restricts to a non-polymorphic layout type or a maximum value

A type restriction contextually modifies an existing type descriptor. The type restriction is imposed on a data structure to restrict the values that are assumable by the data structure. The type restriction does not cancel or otherwise override the effect of the existing type descriptor on the data structure. Rather the type restriction may declare that a value of the data structure's type is forbidden for the data structure. Additionally or alternatively, the type restriction may declare that an element count allowable for a data structure's type is forbidden for the data structure. Type restriction allows optionality (where only a singleton value for a data structure is allowed), empty sets (where no value for a data structure is allowed), and multiplicity (where only a limited element count for a data structure) to be injected into a code set independent of data type. Type restriction allows certain optimizations to be performed.

RATE LIMITING COMMANDS FOR SHARED WORK QUEUES
20230004503 · 2023-01-05 · ·

A memory management unit of a processor may receive a command associated with a process. The command may specify an operation to be performed by another device. The memory management unit may determine a counter value associated with a shared work queue of the another device, an indication the shared work queue to be specified by the command. The memory management unit may determine whether to accept or reject the command based on the counter value and a threshold for the process.

APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR A HARDWARE ASSISTED HETEROGENEOUS INSTRUCTION SET ARCHITECTURE DISPATCHER

Systems, methods, and apparatuses to support instructions for a hardware assisted heterogeneous instruction set architecture dispatcher are described. In one embodiment, a hardware processor includes a plurality of processor cores comprising a first type of processor core that supports a first instruction set architecture and a second type of processor core that supports a second different instruction set architecture, a decoder circuit of a processor core of the plurality of processor cores to decode a single instruction into a decoded single instruction, the single instruction including a field that identifies a requested core type and an opcode that indicates an execution circuit of the processor core is to: read a register to determine a core type of the processor core, cause the processor core to enter a first mode, that only permits execution of the first instruction set architecture by the processor core, when the requested core type and the core type of the processor core are the first type, cause the processor core to enter a second mode, that only permits execution of the second different instruction set architecture by the processor core, when the requested core type and the core type of the processor core are the second type, cause the processor core to enter a third mode, that only permits execution of the first instruction set architecture by the processor core, when the requested core type is the second type and the core type of the processor core is the first type, and cause the processor core to enter a fourth mode, that only permits execution of the second different instruction set architecture by the processor core, when the requested core type is the first type and the core type of the processor core is the second type, and the execution circuit of the processor core to execute the decoded single instruction according to the opcode.

SYSTEM, APPARATUS AND METHODS FOR PERFORMANT READ AND WRITE OF PROCESSOR STATE INFORMATION RESPONSIVE TO LIST INSTRUCTIONS

In one embodiment, a processor includes: a front end circuit to fetch and decode a read list instruction, the read list instruction to cause storage to a memory of a software-provided list of processor state information; and an execution circuit coupled to the front end circuit. The execution circuit, in response to the decoded read list instruction, is to read the processor state information stored in the processor and store each datum of the processor state information into an entry of a data table in the memory. Other embodiments are described and claimed.