G06F9/30178

Cryptographic isolation of memory compartments in a computing environment

Technologies disclosed herein provide cryptographic computing. An example method comprises executing a first instruction of a first software entity to receive a first input operand indicating a first key associated with a first memory compartment of a plurality of memory compartments stored in a first memory unit, and execute a cryptographic algorithm in a core of a processor to compute first encrypted contents based at least in part on the first key. Subsequent to computing the first encrypted contents in the core, the first encrypted contents are stored at a memory location in the first memory compartment of the first memory unit. More specific embodiments include, prior to storing the first encrypted contents at the memory location in the first memory compartment and subsequent to computing the first encrypted contents in the core, moving the first encrypted contents into a level one (L1) cache outside a boundary of the core.
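
The flow is easiest to see end to end. Below is a minimal Python sketch of the idea — per-compartment keys held by the core, with encryption applied before data crosses the core boundary. The `Core` class, the key table, and the XOR keystream (standing in for a real block cipher) are illustrative assumptions, not the disclosed design.

```python
import hashlib
import secrets

def keystream(key: bytes, tweak: int, length: int) -> bytes:
    """Derive a pseudo-keystream from the key and an address tweak (stand-in cipher)."""
    out = b""
    block = 0
    while len(out) < length:
        out += hashlib.sha256(key + tweak.to_bytes(8, "big") + block.to_bytes(4, "big")).digest()
        block += 1
    return out[:length]

class Core:
    """Models a core that encrypts data with a compartment's key before any
    plaintext crosses the core boundary into cache or memory."""

    def __init__(self):
        self.compartment_keys = {}            # compartment id -> key

    def create_compartment(self, cid: int):
        self.compartment_keys[cid] = secrets.token_bytes(32)

    def store(self, memory: dict, cid: int, addr: int, data: bytes):
        key = self.compartment_keys[cid]      # first input operand selects the key
        ct = bytes(a ^ b for a, b in zip(data, keystream(key, addr, len(data))))
        memory[(cid, addr)] = ct              # only ciphertext leaves the core

    def load(self, memory: dict, cid: int, addr: int) -> bytes:
        key = self.compartment_keys[cid]
        ct = memory[(cid, addr)]
        return bytes(a ^ b for a, b in zip(ct, keystream(key, addr, len(ct))))

memory = {}
core = Core()
core.create_compartment(1)
core.store(memory, 1, 0x1000, b"secret payload")
assert core.load(memory, 1, 0x1000) == b"secret payload"
```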

Memory write log storage processors, methods, systems, and instructions

A processor of an aspect includes a decode unit to decode an instruction. The instruction is to indicate destination memory address information. An execution unit is coupled with the decode unit. The execution unit, in response to the decode of the instruction, is to store memory addresses, for at least all initial writes to corresponding data items that are to occur after the instruction in original program order, to a memory address log. A start of the memory address log is to correspond to the destination memory address information. Other processors, methods, systems, and instructions are also disclosed.
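
As a rough illustration of the logging behavior, here is a minimal Python sketch: a hypothetical `start_write_log` models the disclosed instruction, and only the first write to each data item (byte-granular here, for simplicity) appends its address to the log region. All names and encodings are assumptions, not the patented design.

```python
class Memory:
    def __init__(self, size: int):
        self.data = bytearray(size)
        self.log_base = None     # set by the hypothetical START_WRITE_LOG instruction
        self.log_next = None
        self.logged = set()      # data items already recorded (only initial writes log)

    def start_write_log(self, dest_addr: int):
        """Models the disclosed instruction: subsequent initial writes are logged,
        starting at the destination memory address given by the operand."""
        self.log_base = self.log_next = dest_addr
        self.logged.clear()

    def write8(self, addr: int, value: int):
        if self.log_base is not None and addr not in self.logged:
            self.logged.add(addr)
            # append the written address to the log region (written directly,
            # so log entries themselves are not recursively logged)
            self.data[self.log_next:self.log_next + 8] = addr.to_bytes(8, "little")
            self.log_next += 8
        self.data[addr] = value & 0xFF

mem = Memory(4096)
mem.start_write_log(dest_addr=2048)
mem.write8(10, 0xAA)
mem.write8(10, 0xBB)    # second write to the same item: not logged again
mem.write8(20, 0xCC)
addrs = [int.from_bytes(mem.data[2048 + i*8: 2056 + i*8], "little") for i in range(2)]
assert addrs == [10, 20]
```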

Microprocessor pipeline circuitry to support cryptographic computing

In one embodiment, a processor of a cryptographic computing system includes data cache units storing encrypted data and circuitry coupled to the data cache units. The circuitry accesses a sequence of cryptographic-based instructions to execute based on the encrypted data, decrypts the encrypted data based on a first pointer value, executes each cryptographic-based instruction using the decrypted data, encrypts the result of that execution based on a second pointer value, and stores the encrypted result in the data cache units. In some embodiments, the circuitry generates, for each cryptographic-based instruction, at least one encryption-based microoperation and at least one non-encryption-based microoperation, and schedules both for execution based on the timing of the encryption-based microoperation.
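
A minimal Python sketch of the decrypt/compute/encrypt split may help. `ptr_key` is a hypothetical stand-in for deriving key material from a pointer value, and the three commented steps correspond to the encryption-based and non-encryption-based microoperations; real hardware would additionally space them according to the crypto unit's latency.

```python
import hashlib

def ptr_key(pointer: int) -> int:
    """Hypothetical stand-in: derive a one-byte key from a pointer value."""
    return hashlib.sha256(pointer.to_bytes(8, "big")).digest()[0]

def crypto_add_one(cache: dict, src_ptr: int, dst_ptr: int):
    # encryption-based microoperation: decrypt, keyed by the source pointer
    plain = cache[src_ptr] ^ ptr_key(src_ptr)
    # non-encryption-based microoperation: the actual ALU work
    result = (plain + 1) & 0xFF
    # encryption-based microoperation: re-encrypt, keyed by the destination pointer
    cache[dst_ptr] = result ^ ptr_key(dst_ptr)

cache = {0x1000: 41 ^ ptr_key(0x1000)}    # ciphertext resident in the data cache
crypto_add_one(cache, 0x1000, 0x2000)
assert cache[0x2000] ^ ptr_key(0x2000) == 42
```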

Management of keys for use in cryptographic computing

A method comprising executing, by a core of a processor, a first instruction requesting access to a parameter associated with data for storage in a main memory coupled to the processor, the first instruction including a reference to the parameter, a reference to a wrapping key, and a reference to an encrypted encryption key, wherein execution of the first instruction comprises decrypting the encrypted encryption key using the wrapping key to generate a decrypted encryption key; requesting transfer of the data between the main memory and the processor core; and performing a cryptographic operation on the parameter using the decrypted encryption key.
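
The unwrap-then-use flow can be sketched in a few lines of Python. The XOR-with-derived-pad "wrap" is a stand-in for a real key-wrapping cipher, and `wrap_key`, `unwrap_key`, and `encrypt_param` are hypothetical names, not the disclosed instruction semantics.

```python
import hashlib
import secrets

def pad(key: bytes, label: bytes, length: int) -> bytes:
    return hashlib.sha256(key + label).digest()[:length]   # inputs up to 32 bytes

def wrap_key(wrapping_key: bytes, data_key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data_key, pad(wrapping_key, b"wrap", len(data_key))))

unwrap_key = wrap_key    # the XOR wrap is its own inverse

def encrypt_param(data_key: bytes, param: bytes) -> bytes:
    # stand-in for the cryptographic operation on data moving to/from main memory
    return bytes(a ^ b for a, b in zip(param, pad(data_key, b"data", len(param))))

wrapping_key = secrets.token_bytes(16)       # held by hardware, never exposed to software
data_key = secrets.token_bytes(16)
wrapped = wrap_key(wrapping_key, data_key)   # software only ever handles the wrapped form

# "execution of the first instruction": unwrap the key, then perform the crypto op
recovered = unwrap_key(wrapping_key, wrapped)
assert recovered == data_key
ciphertext = encrypt_param(recovered, b"cache line bytes")
assert encrypt_param(recovered, ciphertext) == b"cache line bytes"
```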

Thread group scheduling for graphics processing

Embodiments are generally directed to thread group scheduling for graphics processing. An embodiment of an apparatus includes a plurality of processors, including a plurality of graphics processors to process data; a memory; and one or more caches for storage of data for the plurality of graphics processors. The plurality of processors are to schedule a plurality of groups of threads for processing by the plurality of graphics processors, applying a bias that schedules the groups of threads according to cache locality for the one or more caches.
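
As one plausible reading of the locality bias, here is a minimal Python sketch: each thread group is placed on the graphics processor whose cache already holds the most of the group's working set, with a weight trading locality against load. The scoring heuristic and data shapes are assumptions for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class GPU:
    name: str
    cached: set = field(default_factory=set)   # addresses resident in this GPU's cache
    load: int = 0                              # groups already scheduled here

def schedule(groups, gpus, locality_bias=4.0):
    placement = {}
    for gid, working_set in groups:
        def score(g):
            hits = len(working_set & g.cached)
            return locality_bias * hits - g.load   # bias toward cache locality
        best = max(gpus, key=score)
        best.load += 1
        best.cached |= working_set                 # group's data becomes resident
        placement[gid] = best.name
    return placement

gpus = [GPU("gpu0", cached={1, 2, 3}), GPU("gpu1", cached={7, 8})]
groups = [("tg0", {1, 2}), ("tg1", {7}), ("tg2", {2, 3})]
print(schedule(groups, gpus))   # tg0 and tg2 land on gpu0, tg1 on gpu1
```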

HIGHLY PARALLEL PROCESSING ARCHITECTURE WITH SHALLOW PIPELINE
20220075627 · 2022-03-10 ·

Techniques for task processing using a highly parallel processing architecture with a shallow pipeline are disclosed. A two-dimensional array of compute elements is accessed. Each compute element within the array is known to a compiler and is coupled to its neighboring compute elements within the array. Control for the array of compute elements is provided on a cycle-by-cycle basis, enabled by a stream of wide, variable-length microcode control words generated by the compiler. Relevant portions of each control word are stored within a cache associated with the array of compute elements, and the control words are decompressed cycle-by-cycle out of the cache over multiple cycles. A compiled task is executed on the array of compute elements based on the decompressing. Simultaneous execution of two or more potential compiled task outcomes is provided.
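
One way such variable-length control words might be decompressed cycle-by-cycle is sketched below in Python, assuming (purely for illustration) a delta encoding in which each word lists only the compute elements whose controls change that cycle.

```python
def decompress_stream(compressed_stream, num_elements):
    """Expand variable-length control words into full-width control state,
    one word per cycle, applied against the previously driven state."""
    state = [0] * num_elements
    for word in compressed_stream:       # one variable-length word per cycle
        for element, control in word:    # only changed (element, control) pairs stored
            state[element] = control
        yield list(state)                # full control word driven this cycle

stream = [
    [(0, 0b01), (3, 0b10)],   # cycle 0: set controls for compute elements 0 and 3
    [(3, 0b11)],              # cycle 1: only element 3 changes
    [],                       # cycle 2: all controls held
]
for cycle, controls in enumerate(decompress_stream(stream, num_elements=4)):
    print(cycle, controls)
```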

METHOD AND SYSTEM FOR ACCELERATING AI TRAINING WITH ADVANCED INTERCONNECT TECHNOLOGIES
20210318878 · 2021-10-14 ·

According to various embodiments, methods and systems are provided to accelerate artificial intelligence (AI) model training with advanced interconnect communication technologies and systematic zero-value compression over a distributed training system. According to an exemplary method, during each iteration of a Scatter-Reduce process performed on a cluster of processors arranged in a logical ring to train a neural network model, a processor receives a compressed data block from the prior processor in the logical ring, performs an operation on the received compressed data block and a compressed data block generated on the processor to obtain a calculated data block, and sends the calculated data block to the following processor in the logical ring. A compressed data block calculated from the corresponding data blocks of all processors can then be identified on each processor, distributed to every other processor, and decompressed there for use in the AI model training.
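
A minimal Python sketch of one compressed Scatter-Reduce pass on a logical ring follows. The `{index: value}` map is an illustrative stand-in for the zero-value compression, and the step/block indexing follows the standard ring-allreduce schedule rather than anything specific to this disclosure.

```python
def compress(block):
    return {i: v for i, v in enumerate(block) if v != 0.0}   # keep nonzeros only

def decompress(sparse, n):
    out = [0.0] * n
    for i, v in sparse.items():
        out[i] = v
    return out

def add_compressed(a, b):
    out = dict(a)
    for i, v in b.items():
        out[i] = out.get(i, 0.0) + v
    return out

def ring_scatter_reduce(grads):
    """grads[rank][block]; after p-1 steps, rank r holds the fully reduced
    block (r + 1) % p in compressed form."""
    p = len(grads)
    blocks = [[compress(b) for b in proc] for proc in grads]
    for step in range(p - 1):
        # snapshot the sends so all ranks logically transmit at once
        sends = [(r, (r - step) % p, blocks[r][(r - step) % p]) for r in range(p)]
        for r, b, payload in sends:
            nxt = (r + 1) % p
            blocks[nxt][b] = add_compressed(blocks[nxt][b], payload)
    return blocks

grads = [[[1.0, 0.0], [0.0, 2.0]],     # rank 0: block 0, block 1
         [[3.0, 0.0], [0.0, 4.0]]]     # rank 1: block 0, block 1
out = ring_scatter_reduce(grads)
assert decompress(out[0][1], 2) == [0.0, 6.0]   # rank 0 owns reduced block 1
assert decompress(out[1][0], 2) == [4.0, 0.0]   # rank 1 owns reduced block 0
```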

Enhanced protection of processors from a buffer overflow attack
11119769 · 2021-09-14 ·

A method for changing a processor instruction randomly, covertly, and uniquely, such that the reverse process can restore it faithfully to its original form. Because it is virtually impossible for a malicious user to know how the bits are changed, the user is prevented from using a buffer overflow attack to write code with the same processor instruction changes into the processor's memory with the goal of taking control of the processor. The changes are reversed just before an instruction is executed, reverting it to its original value; malicious code placed in memory is therefore randomly altered, so that when it is executed by the processor it produces chaotic, random behavior that will not allow control of the processor to be compromised, eventually producing a processing error that causes the processor either to shut down the software process where the code exists so that it can reload, or to reset.
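
The reversal property is the crux, so here is a minimal Python sketch under the assumption that the transformation is an address-tweaked XOR with a per-boot secret (the abstract leaves the actual transformation unspecified): legitimate code round-trips exactly, while injected bytes that skipped the forward transform are scrambled by the fetch-time reversal.

```python
import hashlib
import secrets

BOOT_KEY = secrets.token_bytes(16)   # random each boot; unknown to the attacker

def xform(instr: bytes, addr: int) -> bytes:
    """Address-tweaked XOR stand-in for the covert transformation (self-inverse)."""
    pad = hashlib.sha256(BOOT_KEY + addr.to_bytes(8, "big")).digest()[:len(instr)]
    return bytes(a ^ b for a, b in zip(instr, pad))

imem = {}
imem[0x100] = xform(b"\x90\x90\x48\x31", 0x100)    # loader stores transformed code

# the fetch path reverses the transform: legitimate code executes unchanged
assert xform(imem[0x100], 0x100) == b"\x90\x90\x48\x31"

# an overflow writes raw shellcode bytes, bypassing the forward transform...
imem[0x200] = b"\xcc\xcc\xcc\xcc"
# ...so the fetch-time reversal turns it into unpredictable garbage
assert xform(imem[0x200], 0x200) != b"\xcc\xcc\xcc\xcc"
```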

Illegal instruction exception handling

Techniques are disclosed relating to the handling of exceptions generated by illegal instructions in a processor. In an embodiment, a processor may be configured to fetch instructions defined according to an instruction set architecture (ISA). The ISA may include a set of uncompressed instructions and a set of compressed instructions. The processor may further be configured to, upon detecting a given one of the set of compressed instructions, cause a copy of the given compressed instruction to be saved and convert the given compressed instruction to a corresponding given uncompressed instruction. The processor may also be configured to detect that the given uncompressed instruction is illegal and was converted from the given compressed instruction and, based at least in part on these detections, cause an illegal instruction exception to be generated using the copy of the given compressed instruction.
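
A minimal Python sketch of this exception path, with toy encodings: the compressed form is saved before conversion, and when the converted instruction proves illegal, the exception reports the saved compressed copy. The lookup tables and encodings are assumptions for illustration.

```python
COMPRESSED = {0x4501: 0x00500513}          # toy map: 16-bit form -> 32-bit form

class IllegalInstruction(Exception):
    def __init__(self, reported_encoding: int):
        super().__init__(f"illegal instruction {reported_encoding:#x}")

def decode(raw: int, legal: set):
    saved_copy = None
    if raw in COMPRESSED:                  # detected a compressed instruction
        saved_copy = raw                   # keep a copy of the compressed form
        raw = COMPRESSED[raw]              # convert to the uncompressed form
    if raw not in legal:
        # report the compressed copy if the illegal op came from a conversion
        raise IllegalInstruction(saved_copy if saved_copy is not None else raw)
    return raw

legal_ops = set()                          # nothing legal: force the exception
try:
    decode(0x4501, legal_ops)
except IllegalInstruction as e:
    print(e)                               # reports 0x4501, the compressed form
```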

APPARATUS AND METHOD FOR INHIBITING INSTRUCTION MANIPULATION

An apparatus and method are provided for inhibiting instruction manipulation. The apparatus has execution circuitry for performing data processing operations in response to a sequence of instructions from an instruction set, and decoder circuitry for decoding each instruction in the sequence in order to generate control signals for the execution circuitry. Each instruction comprises a plurality of instruction bits, and the decoder circuitry is arranged to perform a decode operation on each instruction to determine, from the value of each instruction bit and knowledge of the instruction set, the control signals to be issued to the execution circuitry in response to that instruction. An input path to the decoder circuitry comprises a set of wires over which the instruction bits of each instruction are provided. Scrambling circuitry is used to perform a scrambling function on each instruction using a secret scrambling key, such that the wire within the set of wires over which any given instruction bit is provided to the decoder circuitry is dependent on the secret scrambling key. The decode operation performed by the decoder circuitry is then adapted to incorporate a descrambling function using the secret scrambling key to reverse the effect of the scrambling function. As a result, independent of which wire any given instruction bit arrives on, the decode operation correctly interprets each instruction bit of a given instruction, based on knowledge of the instruction set, and determines from those bits the control signals to be issued to the execution circuitry in response to that instruction.
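
In software terms, the scrambling amounts to a key-dependent permutation of the instruction-bit wires and its inverse in the decoder. The Python sketch below assumes, for illustration only, that the permutation is derived by seeding a PRNG with the secret key; the disclosure leaves the scrambling function abstract.

```python
import random

def wire_permutation(key: int, width: int):
    """Derive a key-dependent permutation of the instruction-bit wires (sketch)."""
    rng = random.Random(key)
    perm = list(range(width))
    rng.shuffle(perm)
    return perm

def scramble(instr: int, perm) -> int:
    """Bit i of the instruction drives wire perm[i]."""
    out = 0
    for i, wire in enumerate(perm):
        out |= ((instr >> i) & 1) << wire
    return out

def descramble(scrambled: int, perm) -> int:
    """The decoder reverses the routing before interpreting the bits."""
    out = 0
    for i, wire in enumerate(perm):
        out |= ((scrambled >> wire) & 1) << i
    return out

perm = wire_permutation(key=0xC0FFEE, width=32)
instr = 0x00500513
assert descramble(scramble(instr, perm), perm) == instr   # correct key: exact recovery
wrong = wire_permutation(key=0xBADF00D, width=32)
print(hex(descramble(scramble(instr, perm), wrong)))      # garbage without the right key
```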