Patent classifications
G06F9/312
Semiconductor device and systems using the same
A semiconductor device capable of suppressing performance degradation and systems using the same are provided. The semiconductor device includes a plurality of processors CPU1 and CPU2, a scheduling device 10 (ID1) connected to the processors CPU1 and CPU2 for controlling the processors CPU1 and CPU2 to execute a plurality of tasks in real time, memories 17 and 18 accessed by the processors CPU1 and CPU2 to store data by executing the tasks, and access monitor circuits 15 for monitoring accesses to the memories by the processors CPU1 and CPU2. When an access to the memory is detected by the access monitor circuit 15, the data stored in the memory 18 is transferred based on the destination information of the data stored in the memory 18.
Atomic-copy-XOR instruction for replacing data in a first cacheline with data from a second cacheline
Apparatus and associated methods for implementing atomic instructions for copy-XOR of data. An atomic-copy-xor instruction is defined having a first operand comprising an address of a first cacheline and a second operand comprising an address of a second cacheline. The atomic-copy-xor instruction, which may be included in an instruction set architecture (ISA) of a processor, performs a bitwise XOR operation on copies of data retrieved from the first cacheline and second cacheline to generate an XOR result, and replaces the data in the first cacheline with a copy of data from the second cacheline when the XOR result is non-zero. In addition to implementation using a processor core, the atomic-copy-xor instruction may be implemented using various offloading schemes under which the processor core executing the atomic-copy-xor instruction offloads operations to other components in the processor or system in which the processor is implemented, including offloading operations to a last level cache (LLC) engine, a memory controller, or a DIMM controller.
Processing of plural-register-load instruction
An apparatus comprises processing circuitry to issue load operations to load data from memory. In response to a plural-register-load instruction specifying at least two destination registers to be loaded with data from respective target addresses, the processing circuitry permits issuing of separate load operations corresponding to the plural-register-load instruction. Load tracking circuitry maintains tracking information for one or more issued load operations. When the plural-register-load instruction is subject to an atomicity requirement and the plurality of load operations are issued separately, the load tracking circuitry detects, based on the tracking information, whether a loss-of-atomicity condition has occurred for the load operations corresponding to the plural-register-load instruction, and requests re-processing of the plural-register-load instruction when the loss-of-atomicity condition is detected.
Mechanism for mitigating information leak via cache side channels during speculative execution
A processor includes a first core and a second core to execute computer instructions. Each of the cores includes its own private memory cache and speculative load queue. The speculative load queue stores cachelines for the computer instructions and data when the core is operating in a speculative state with respect to a process or thread. The processor includes a state tracking buffer having a state field to store a speculative exclusive ownership state for each cacheline in the speculative load queue when present therein.
Handling of single-copy-atomic load/store instruction with a memory access request shared by micro-operations
In response to a single-copy-atomic load/store instruction for requesting an atomic transfer of a target block of data between the memory system and the registers, where the target block has a given size greater than a maximum data size supported for a single load/store micro-operation by a load/store data path, instruction decoding circuitry maps the single-copy-atomic load/store instruction to two or more mapped load/store micro-operations each for requesting transfer of a respective portion of the target block of data. In response to the mapped load/store micro-operations, load/store circuitry triggers issuing of a shared memory access request to the memory system to request the atomic transfer of the target block of data of said given size to or from the memory system, and triggers separate transfers of respective portions of the target block of data over the load/store data path.
Writing prefetched data into intra-core caches of cores identified by prefetching instructions
A method for accessing a memory of a multi-core system, a related apparatus, a system, and a storage medium involve obtaining data from a system memory according to a prefetch instruction, and sending a message to a core that carries the to-be-accessed data. Each segment of data is stored in an intra-core cache based on the prefetch instruction.
Extended pointer register for configuring execution of a store and pack instruction and a load and unpack instruction
A microprocessor system includes a processing circuit and a memory operably coupled to the processing circuit and configured to receive input data according to a pack and store operation and output the data according to a load and unpack operation. The processing circuit comprises a hardware extension configured to: configure a variable number of bits per data element during a pack and store operation; store a concatenation of a plurality of data elements with a reduced number of bits; extract a plurality of data elements with a reduced number of bits during a load and unpacking operation; and recreate a plurality of data elements with an increased number of bits per data element representative of the data elements prior to the pack and store operation.
Look-up table write
A digital data processor includes an instruction memory storing instructions each specifying a data processing operation and at least one data operand field, an instruction decoder coupled to the instruction memory for sequentially recalling instructions from the instruction memory and determining the data processing operation and the at least one data operand, and at least one operational unit coupled to a data register file and to the instruction decoder to perform a data processing operation upon at least one operand corresponding to an instruction decoded by the instruction decoder and storing results of the data processing operation. The at least one operational unit is configured to perform a table write in response to a look up table write instruction by writing at least one data element from a source data register to a specified location in a specified number of at least one table.
Streaming engine with cache-like stream data storage and lifetime tracking
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer constructed like a cache. The stream buffer cache includes plural cache lines, each includes tag bits, at least one valid bit and data bits. Cache lines are allocated to store newly fetched stream data. Cache lines are deallocated upon consumption of the data by a central processing unit core functional unit. Instructions preferably include operand fields with a first subset of codings corresponding to registers, a stream read only operand coding and a stream read and advance operand coding.
Streaming engine with error detection, correction and restart
Disclosed embodiments relate to a streaming engine employed in, for example, a digital signal processor. A fixed data stream sequence including plural nested loops is specified by a control register. The streaming engine includes an address generator producing addresses of data elements and a steam head register storing data elements next to be supplied as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer. Parity bits are formed upon storage of data in the stream buffer which are stored with the corresponding data. Upon transfer to the stream head register a second parity is calculated and compared with the stored parity. The streaming engine signals a parity fault if the parities do not match. The streaming engine preferably restarts fetching the data stream at the data element generating a parity fault.