Patent classifications
G06F2212/452
MICROPROCESSOR THAT PREVENTS SAME ADDRESS LOAD-LOAD ORDERING VIOLATIONS USING PHYSICAL ADDRESS PROXIES
A L2 set associative cache that is inclusive of an L1 cache. Each entry of a load queue holds a load physical address proxy (PAP) for a load physical memory line address (PMLA) rather than the load PMLA itself. The load PAP comprises the set index and the way that uniquely identifies the L2 entry that holds a memory line specified by the load PMLA. Each load queue entry indicates whether the load instruction has completed execution. The microprocessor removes a memory line at a removal PMLA from an L2 entry and forms a removal PAP as a proxy for the removal PMLA. The removal PAP comprises a set index and a way that uniquely identifies the removed entry. The microprocessor snoops the load queue with the removal PAP to determine whether the removal PAP matches one or more load PAPs in one or more load queue entries associated with one or more load instructions that have completed execution and, if so, signals an abort request.
Streaming engine with deferred exception reporting
This invention is a streaming engine employed in a digital signal processor. A fixed data stream sequence is specified by a control register. The streaming engine fetches stream data ahead of use by a central processing unit and stores it in a stream buffer. Upon occurrence of a fault reading data from memory, the streaming engine identifies the data element triggering the fault preferably storing this address in a fault address register. The streaming engine defers signaling the fault to the central processing unit until this data element is used as an operand. If the data element is never used by the central processing unit, the streaming engine never signals the fault. The streaming engine preferably stores data identifying the fault in a fault source register. The fault address register and the fault source register are preferably extended control registers accessible only via a debugger.
Pointer dereferencing within memory sub-system
Various embodiments described herein provide for a memory sub-system read operation or a memory sub-system write operation that can be requested by a host system and involves performing a multi-level (e.g., two-level) pointer dereferencing internally within the memory sub-system. Such embodiments can at least reduce the number of read operations that a host system sends to a memory sub-system to perform a multi-level dereferencing operation.
CACHING BASED ON BRANCH INSTRUCTIONS IN A PROCESSOR
In an embodiment, a processor may include an execution circuit to execute a plurality of instructions, a cache, and a decode circuit. The decode circuit may be to: detect a branch instruction in a program, the branch instruction to cause execution to follow either a first path or a second path in the program; and in response to a determination that the branch instruction is a hard to predict (HTP) branch, cause first and second sets of instructions to be stored in the cache, where the first set of instructions is included in the first path, and where the second set of instructions is included in the second path. Other embodiments are described and claimed.
Wireless Control Device and Methods Thereof
A wireless control device includes a power source, one or more sensors, one or more switches, a wireless transceiver circuit, an antenna connected to the wireless transceiver circuit, and a processor communicably coupled to the power source, the one or more sensors, the one or more switches, and the wireless transceiver circuit. The processor receives a data from the one or more sensors or the one or more switches, determines a pre-defined action associated with the data that identifies one or more external devices and one or more tasks, and transmits one or more control signals via the wireless transceiver circuit and the antenna that instruct the identified external device(s) to perform the identified task(s).
Memory controllers including examples of calculating hamming distances for neural network and data center applications
Examples of systems and method described herein provide for the processing of image codes (e.g., a binary embedding) at a memory controller with various memory devices. Such images codes may generated by various endpoint computing devices, such as Internet of Things (IoT) computing devices, Such devices can generate a Hamming processing request, having an image code of the image, to compare that representation of the image to other images (e.g., in an image dataset) to identify a match or a set of neural network results. Advantageously, examples described herein may be used in neural networks to facilitate the processing of datasets, so as to increase the rate and amount of processing of such datasets. For example, comparisons of image codes can be performed “closer” to the memory devices, e.g., at the memory controller coupled to memory devices.
METHOD AND APPARATUS FOR IMPLIED BIT HANDLING IN FLOATING POINT MULTIPLICATION
A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein determination of values of implied bits of leading bit encoded mantissas of the floating point numbers is performed in parallel with multiplication of the encoded mantissas, and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.
Streaming engine with flexible streaming engine template supporting differing number of nested loops with corresponding loop counts and loop offsets
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.
APPARATUSES, METHODS, AND SYSTEMS TO PRECISELY MONITOR MEMORY STORE ACCESSES
Systems, methods, and apparatuses relating to circuitry to precisely monitor memory store accesses are described. In one embodiment, a system includes a memory, a hardware processor core comprising a decoder to decode an instruction into a decoded instruction, an execution circuit to execute the decoded instruction to produce a resultant, a store buffer, and a retirement circuit to retire the instruction when a store request for the resultant from the execution circuit is queued into the store buffer for storage into the memory, and a performance monitoring circuit to mark the retired instruction for monitoring of post-retirement performance information between being queued in the store buffer and being stored in the memory, enable a store fence after the retired instruction to be inserted that causes previous store requests to complete within the memory, and on detection of completion of the store request for the instruction in the memory, store the post-retirement performance information in storage of the performance monitoring circuit.
Store prefetches for dependent loads in a processor
An information handling system, method, and processor that detects a store instruction for data in a processor where the store instruction is a reliable indicator of a future load for the data; in response to detecting the store instruction, sends a prefetch request to memory for an entire cache line containing the data referenced in the store instruction, and preferably only the single cache line containing the data; and receives, in response to the prefetch request, the entire cache line containing the data referenced in the store instruction.