IPIQ

G06F9/38

SPLIT CONTROL STACK AND DATA STACK PLATFORM

20180004531 · 2018-01-04 ·

Microsoft Technology Licensing, Llc

In one example, a method includes allocating separate portions of memory for a control stack and a data stack. The method also includes, upon detecting a call instruction, storing a first return address in the control stack and a second return address in the data stack; and upon detecting a return instruction, popping the first return address from the control stack and the second return address from the data stack and raising an exception if the two return addresses do not match. Otherwise, the return instruction returns the first return address. Additionally, the method includes executing an exception handler in response to the return instruction detecting an exception, wherein the exception handler is to pop one or more return addresses from the control stack until the return address on a top of the control stack matches the return address on a top of the data stack.

POST-RETIRE SCHEME FOR TRACKING TENTATIVE ACCESSES DURING TRANSACTIONAL EXECUTION

20180011748 · 2018-01-11 ·

A method and apparatus for post-retire transaction access tracking is herein described. Load and store buffers are capable of storing senior entries. In the load buffer a first access is scheduled based on a load buffer entry. Tracking information associated with the load is stored in a filter field in the load buffer entry. Upon retirement, the load buffer entry is marked as a senior load entry. A scheduler schedules a post-retire access to update transaction tracking information, if the filter field does not represent that the tracking information has already been updated during a pendency of the transaction. Before evicting a line in a cache, the load buffer is snooped to ensure no load accessed the line to be evicted.

METHOD FOR EXECUTING MULTITHREADED INSTRUCTIONS GROUPED INTO BLOCKS

20180011738 · 2018-01-11 ·

Mohammad Abdallah

A method for executing multithreaded instructions grouped into blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein the instructions of the instruction blocks are interleaved with multiple threads; scheduling the instructions of the instruction block to execute in accordance with the multiple threads; and tracking execution of the multiple threads to enforce fairness in an execution pipeline.

Providing load address predictions using address prediction tables based on load path history in processor-based systems

11709679 · 2023-07-25 ·

Qualcomm Incorporated

Aspects disclosed in the detailed description include providing load address predictions using address prediction tables based on load path history in processor-based systems. In one aspect, a load address prediction engine provides a load address prediction table containing multiple load address prediction table entries. Each load address prediction table entry includes a predictor tag field and a memory address field for a load instruction. The load address prediction engine generates a table index and a predictor tag based on an identifier and a load path history for a detected load instruction. The table index is used to look up a corresponding load address prediction table entry. If the predictor tag matches the predictor tag field of the load address prediction table entry corresponding to the table index, the memory address field of the load address prediction table entry is provided as a predicted memory address for the load instruction.

Streaming engine with multi dimensional circular addressing selectable at each dimension

11709779 · 2023-07-25 ·

Texas Instmments Incorporated

Joseph Zbiciak

A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.

Implementing 128-bit SIMD operations on a 64-bit datapath

11709674 · 2023-07-25 ·

Marvell Asia Pte, Ltd.

A method of implementing a processor architecture and corresponding system includes operands of a first size and a datapath of a second size. The second size is different from the first size. Given a first array of registers and a second array of registers, each register of the first and second arrays being of the second size, selecting a first register and corresponding second register from the first array and the second array, respectively, to perform operations of the first size. This allows a user, who is interfacing with the hardware processor through software, to provide data of the datapath bit-width instead of the register bit-width. Advantageously, the user is agnostic to the size of the registers.

Computing device and method

11709672 · 2023-07-25 ·

SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

Computing System and controller thereof

20180011710 · 2018-01-11 ·

Computing system and controller thereof are disclosed for ensuring the correct logical relationship between multiple instructions during their parallel execution. The computing system comprises: a plurality of functional modules each performing a respective function in response to an instruction for the given functional module; and a controller for determining whether or not to send an instruction to a corresponding functional module according to dependency relationship between the plurality of instructions.

TECHNIQUES FOR METADATA PROCESSING

20180011708 · 2018-01-11 ·

Andre' DeHon

Techniques are described for metadata processing that can be used to encode an arbitrary number of security policies for code running on a processor. Metadata may be added to every word in the system and a metadata processing unit may be used that works in parallel with data flow to enforce an arbitrary set of policies. In one aspect, the metadata may be characterized as unbounded and software programmable to be applicable to a wide range of metadata processing policies. Techniques and policies have a wide range of uses including, for example, safety, security, and synchronization. Additionally, described are aspects and techniques in connection with metadata processing in an embodiment based on the RISC-V architecture.

Sparsity uniformity enforcement for multicore processor

11709662 · 2023-07-25 ·

Tenstorrent Inc.

Methods and systems relating to the field of parallel computing are disclosed herein. The methods and systems disclosed include approaches for sparsity uniformity enforcement for a set of computational nodes which are used to execute a complex computation. A disclosed method includes determining a sparsity distribution in a set of operand data, and generating, using a compiler, a set of instructions for executing, using the set of operand data and a set of processing cores, a complex computation. Alternatively, the method includes altering the operand data. The method also includes distributing the set of operand data to the set of processing cores for use in executing the complex computation in accordance with the set of instructions. Either the altering is conducted to, or the compiler is programmed to, balance the sparsity distribution among the set of processing cores.

Patent classifications

G06F9/38