G06F9/3818

METHODS AND APPARATUS FOR THREAD-BASED SCHEDULING IN MULTICORE NEURAL NETWORKS
20230153596 · 2023-05-18 · ·

Systems, apparatus, and methods for thread-based scheduling within a multicore processor. Neural networking uses a network of connected nodes (aka neurons) to loosely model the neuro-biological functionality found in the human brain. Various embodiments of the present disclosure use thread dependency graphs analysis to decouple scheduling across many distributed cores. Rather than using thread dependency graphs to generate a sequential ordering for a centralized scheduler, the individual thread dependencies define a count value for each thread at compile-time. Threads and their thread dependency count are distributed to each core at run-time. Thereafter, each core can dynamically determine which threads to execute based on fulfilled thread dependencies without requiring a centralized scheduler.

Processor authentication method

The disclosure includes a method of authenticating a processor that includes an arithmetic and logic unit. At least one decoded operand of at least a portion of a to-be-executed opcode is received on a first terminal of the arithmetic and logic unit. A signed instruction is received on a second terminal of the arithmetic and logic unit. The signed instruction combines a decoded instruction of the to-be-executed opcode and at least one previously-executed opcode.

CONVERSION INSTRUCTIONS

Techniques for data type conversion are described. An example uses an instruction that is to include fields for an opcode, an identification of source operand location, and an identification of destination operand location, wherein the opcode is to indicate instruction processing circuitry is to convert a 16-bit floating-point value from the identified source operand location into a 32-bit floating point value and store that 32-bit floating point value in one or more data element positions of the identified destination operand.

CONVERSION INSTRUCTIONS

Techniques for data type conversion via instruction are described. An exemplary instruction is to include fields for an opcode, an identification of a source operand, and an identification of destination operand, wherein the opcode is to indicate instruction processing circuitry is to convert odd 16-bit floating point values from the identified source operand into 32-bit floating point values and store the 32-bit floating point values in data element positions of the identified destination operand.

Out-of-order block-based processors and instruction schedulers using ready state data indexed by instruction position identifiers

Apparatus and methods are disclosed for implementing block-based processors including field programmable gate-array implementations. In one example of the disclosed technology, a block-based processor includes an instruction decoder configured to generate decoded ready dependencies for a transactional block of instructions, where each of the instructions is associated with a different instruction identifier encoded in the transactional block. The processor further includes an instruction scheduler configured to issue an instruction from a set of instructions of the transactional block of instructions. The instruction is issued based on determining that decoded ready state dependencies for an instruction are satisfied. The determining includes accessing storage with the decoded ready dependencies indexed with a respective instruction identifier that is encoded in the transactional block of instructions.

GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT

Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.

DEVICE, METHOD AND SYSTEM TO PROVIDE A PREDICTED VALUE WITH A SEQUENCE OF MICRO-OPERATIONS

Techniques and mechanisms for efficiently making value prediction information available for use by in a processor. In an embodiment, the instruction execution is to include a loading of some data to a first location (e.g., a first register). A decoder of the processor accesses reference information which indicates that the execution is to comprise multiple micro-operations (μops) including a LoadCheck μop and a Move μop. The LoadCheck μop loads a first value to the first location, and checks whether the loaded first value is the same as a previously-determined second value which represents a prediction of what the first value would be. The Move μop moves the second value to the first location. In another embodiment, the Move μop is scheduled for execution out-of-order with respect to the LoadCheck μop, resulting in an early availability of the second value for access in a register file by another μop.

CIRCUITRY AND METHODS FOR IMPLEMENTING CAPABILITY-BASED COMPARTMENT SWITCHES WITH DESCRIPTORS
20230195360 · 2023-06-22 ·

Systems, methods, and apparatuses for implementing capability-based compartment switches with descriptors are described. In certain examples, a hardware processor core comprises a capability management circuit to check a capability for a memory access request, the capability comprising an address field and a bounds field that is to indicate a lower bound and an upper bound of an address range to which the capability authorizes access; a decoder circuit to decode a single instruction into a decoded single instruction, the single instruction comprising one or more fields to indicate a first compartment descriptor that identifies a first capability to a first state element in a first compartment of memory and a second capability to a second state element in the first compartment of the memory, and an opcode to indicate that an execution circuit is to load the first capability from the first compartment descriptor of the memory into a first register to enable the capability management circuit to determine whether a first bounds field of the first capability authorizes an access to the first state element in the first compartment of the memory, and load the second capability from the first compartment descriptor of the memory into a second register to enable the capability management circuit to determine that a second bounds field of the second capability authorizes an access to the second state element in the first compartment of the memory; and the execution circuit to execute the decoded single instruction according to the opcode.

REGISTER FILE VIRTUALIZATION : APPLICATIONS AND METHODS

Methods and apparatus relating to register file virtualization techniques are described. In an embodiment, a register file includes a plurality of register file cells. Each of the register file cells includes a register file entry and a shadow buffer. Logic circuitry causes storage of input data to the shadow buffer, while data stored in the register file entry is accessible to perform one or more operations. Other embodiments are also disclosed and claimed.

HAZARD GENERATING FOR SPECULATIVE CORES IN A MICROPROCESSOR
20230195981 · 2023-06-22 ·

A system, mechanism, tool, programming product, processor, and/or method for generating a hazard in a processor includes: identifying one or more cache lines to invalidate in a second level memory of a processing core in the processor; invalidating, in response to identifying one or more cache lines to invalidate in the second level cache, the one or more identified cache lines in the second level memory; and invalidating, in response to invalidating the one or more identified cache lines in the second level memory, the corresponding one or more cache lines in a first level memory. In an aspect the hazard generating mechanism is triggered, preferably on demand, and includes in an approach searching for cache lines in the second level memory that are also in the first level memory.