Patent classifications
G06F9/3824
Non-volatile memory based processors and dataflow techniques
A monolithic integrated circuit (IC) including one or more compute circuitry, one or more non-volatile memory circuits, one or more communication channels and one or more communication interface. The one or more communication channels can communicatively couple the one or more compute circuitry, the one or more non-volatile memory circuits and the one or more communication interface together. The one or more communication interfaces can communicatively couple one or more circuits of the monolithic integrated circuit to one or more circuits external to the monolithic integrated circuit.
Digitally coordinated dynamically adaptable clock and voltage supply apparatus and method
An apparatus and method is described that digitally coordinates dynamically adaptable clock and voltage supply to significantly reduce the energy consumed by a processor without impacting its performance or latency. A signal is generated that indicates a long latency operation. This signal is used to reduce power supply voltage and frequency of the adaptable clock. An early resume indicator is generated a few nanoseconds before normal operations are about to resume. This early resume signal is used to power up the power-downed voltage regulator, and/or can increase frequency and/or supply voltage back to normal level before normal processor operations are about to resume.
APPARATUS AND METHOD FOR HANDLING MEMORY LOAD REQUESTS
When load requests are generated to support data processing operations, the load requests are buffered in pending load buffer circuitry prior to being carried out. Coalescing circuitry determines for a first load request whether a set of one or more subsequent load requests buffered in the pending load buffer circuitry satisfies an address proximity condition. The address proximity condition is satisfied when all data items identified by the set of one or more subsequent load requests are comprised within a series of data items which will be retrieved from the memory system in response to the first load request. When the address proximity condition is satisfied, forwarding of the set of one or more subsequent load requests is suppressed.
Register File Protection
An integrated circuit chip can provide protection with registers of a register file. A processor can be part of general or security-oriented (e.g., root-of-trust (RoT)) circuitry. In described implementations, the processor includes multiple register blocks for storing multiple register values. The processor also includes multiple integrity blocks for storing multiple integrity codes. A respective integrity block is associated with a respective register block. The respective integrity block can store a respective integrity code that is derived from a respective register value that is stored in the respective register block. The integrity code can enable detection or correction of one or more corrupted bits in the register value. An integrity controller of the processor can monitor the register value regularly or in response to an access by an execution unit. The controller can take a protective action if corruption is detected. This enables information protection to extend to processor execution units.
METHOD FOR SECURE, SIMPLE, AND FAST SPECULATIVE EXECUTION
A method of verifying authenticity of a speculative load instruction is disclosed which includes receiving a new speculative source-destination pair (PAIR), wherein the source represents a speculative load instruction and the destination represents an associated destination virtual memory location holding data to be loaded onto a register with execution of the source, checking the PAIR against one or more memory tables associated with non-speculative source-destination pairs, if the PAIR exists in the one or more memory tables, then executing the instruction associated with the source of the PAIR, if the PAIR does not exist, then i) waiting until the speculation of the source instruction has cleared as being non-speculative, ii) updating the one or more memory tables, and iii) executing the instruction associated with the source, and if the speculation of the source instruction of the PAIR does not clear as non-speculative, then the source is nullified.
Combining load or store instructions
Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.
Widening memory access to an aligned address for unaligned memory operations
Unaligned atomic memory operations on a processor using a load-store instruction set architecture (ISA) that requires aligned accesses are performed by widening the memory access to an aligned address by the next larger power of two (e.g., 4-byte access is widened to 8 bytes, and 8-byte access is widened to 16 bytes). Data processing operations supported by the load-store ISA including shift, rotate, and bitfield manipulation are utilized to modify only the bytes in the original unaligned address so that the atomic memory operations are aligned to the widened access address. The aligned atomic memory operations using the widened accesses avoid the faulting exceptions associated with unaligned access for most 4-byte and 8-byte accesses. Exception handling is performed in cases in which memory access spans a 16-byte boundary.
INSERTING A PROXY READ INSTRUCTION IN AN INSTRUCTION PIPELINE IN A PROCESSOR
Inserting a proxy read instruction in an instruction pipeline in a processor is disclosed. A scheduler circuit is configured to recognize when a produced value generated by execution of a producer instruction in the instruction pipeline will not be available through a data forwarding path to be consumed for processing of a subsequent consumer instruction. In this case, the scheduling circuit is configured to insert a proxy read instruction in the instruction pipeline to cause execution of an operation to generate the same produced value as was generated by previous execution of producer instruction in the instruction pipeline. Thus, the produced value will remain available in the instruction pipeline to again be available through a data forwarding path to an earlier stage of the instruction pipeline to be consumed by a consumer instruction, which may avoid a pipeline stall.
Method and apparatus for virtualizing the micro-op cache
Systems, apparatuses, and methods for virtualizing a micro-operation cache are disclosed. A processor includes at least a micro-operation cache, a conventional cache subsystem, a decode unit, and control logic. The decode unit decodes instructions into micro-operations which are then stored in the micro-operation cache. The micro-operation cache has limited capacity for storing micro-operations. When new micro-operations are decoded from pending instructions, existing micro-operations are evicted from the micro-operation cache to make room for the new micro-operations. Rather than being discarded, micro-operations evicted from the micro-operation cache are stored in the conventional cache subsystem. This prevents the original instruction from having to be decoded again on subsequent executions. When the control logic determines that micro-operations for one or more fetched instructions are stored in either the micro-operation cache or the conventional cache subsystem, the control logic causes the decode unit to transition to a reduced-power state.
Processor device for executing SIMD instructions
In a processor device according to the present invention, a memory access unit reads data to be processed from an external memory and writes the data to a first register group that a plurality of processors does not access among a plurality of register groups. A control unit sequentially makes each of the plurality of processors implement a same instruction, in parallel with changing an address of a register group that stores the data to be processed. A scheduler, based on specified scenario information, specifies an instruction to be implemented and a register group to be accessed for the plurality of processors, and specifies a register group to be written to among the plurality of register groups and data to be processed that is to be written for the memory access unit.