G06F9/30098

INLINE DATA INSPECTION FOR WORKLOAD SIMPLIFICATION

A method, computer readable medium, and processor are described herein for inline data inspection by using a decoder to decode a load instruction, including a signal to cause a circuit in a processor to indicate whether data loaded by a load instruction exceeds a threshold value. Moreover, an indication of whether data loaded by a load instruction exceeds a threshold value may be stored.

Techniques for memory access in a reduced power state

Various embodiments are generally directed to techniques for memory access by a computer in a reduced power state, such as during video playback or connected standby. Some embodiments are particularly directed to disabling one or more memory channels during a reduced power state by mapping memory usages during the reduced power state to one of a plurality of memory channels. In one embodiment, for example, one or more low-power mode blocks in a set of functional blocks of a computer may be identified. In some such embodiments, the computer may include a processor, a memory, and first and second memory channels to communicatively couple the processor with the second memory. In many embodiments, usage of the one or more low-power mode blocks in the set of functional blocks may be mapped to a first address range associated with the first memory channel.

Memory access collision management on a shared wordline

A processing device in a memory sub-system sends a program command to the memory device to cause the memory device to initiate a program operation on a corresponding wordline and sub-block of a memory array of the memory device. The processing device further receives a request to perform a read operation on data stored on the wordline and sub-block of the memory array, sends a suspend command to the memory device to cause the memory device to suspend the program operation, reads data corresponding to the read operation from a page cache of the memory device, and sends a resume command to the memory device to cause the memory device to resume the program operation.

STREAMING ADDRESS GENERATION

A digital signal processor having at least one streaming address generator, each with dedicated hardware, for generating addresses for writing multi-dimensional streaming data that comprises a plurality of elements. Each at least one streaming address generator is configured to generate a plurality of offsets to address the streaming data, and each of the plurality of offsets corresponds to a respective one of the plurality of elements. The address of each of the plurality of elements is the respective one of the plurality of offsets combined with a base address.

Method and apparatus to process SHA-2 secure hashing algorithm

A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.

Arithmetic processing device having multicore ring bus structure with turn-back bus for handling register file push/pull requests
11550576 · 2023-01-10 · ·

An arithmetic processing device includes arithmetic processing units, each having a calculator unit; a scheduler that controls a push instruction to write data to a register file in one of the arithmetic processing units and a pull instruction to read data from the register file; a pull request bus to which the scheduler outputs a pull request and which is connected to the arithmetic processing units; a push request bus to which the scheduler outputs a push request and which is connected to the arithmetic processing units; and a pull data bus that inputs, into the scheduler, pull data read from the register file in response to the pull request. Each of the arithmetic processing units includes a pull data turn-back bus that propagates pull data read from its register file to the pull data bus.

Method and tensor traversal engine for strided memory access during execution of neural networks

A tensor traversal engine in a processor system comprising a source memory component and a destination memory component, the tensor traversal engine comprising: a control signal register storing a control signal for a strided data transfer operation from the source memory component to the destination memory component, the control signal comprising an initial source address, an initial destination address, a first source stride length in a first dimension, and a first source stride count in the first dimension; a source address register communicatively coupled to the control signal register; a destination address register communicatively coupled to the control signal register; a first source stride counter communicatively coupled to the control signal register; and control logic communicatively coupled to the control signal register, the source address register, and the first source stride counter.

Mechanism for interrupting and resuming execution on an unprotected pipeline processor

Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, beginning execution of the first instruction, receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher priority task than the first instruction, storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline, copying the register state from the capture queue to a memory, determining that the one or more second instructions have been executed, copying the register state from the memory to the one or more registers of the capture queue, and restoring the register state to the instruction execution pipeline from the capture queue.

Dynamic size of static SLC cache
11693700 · 2023-07-04 · ·

Apparatus and methods are disclosed, including using a memory controller to track a maximum logical saturation over the lifespan of the memory device, where logical saturation is the percentage of capacity of the memory device written with data. A portion of a pool of memory cells of the memory device is reallocated from single level cell (SLC) static cache to SLC dynamic cache storage based at least in part on a value of the maximum logical saturation, the reallocating including writing at least one electrical state to a register, in some examples.

Fast memory mapped IO support by register switch
11693722 · 2023-07-04 · ·

The technology disclosed herein enhances a fault-based communication channel between a virtual machine and a hypervisor. An example method may include: configuring, by a hypervisor, a first memory location to generate one or more faults when accessed by a virtual machine process, wherein the first memory location is mapped to a device and a second memory location is mapped to memory; detecting, by the hypervisor, a fault caused by a first execution of an instruction of the virtual machine process, wherein the instruction comprises a reference to a register comprising the first memory location; responsive to the detecting the fault, the hypervisor performing a computing task for the virtual machine process and updating the register to comprise the second memory location; and initiating, by the hypervisor, a second execution of the instruction of the virtual machine process, wherein the second execution of the instruction accesses the second memory location.