G06F9/30072

Enhanced Macroscalar predicate operations
09817663 · 2017-11-14

Systems, apparatuses and methods for utilizing enhanced Macroscalar predicate operations, which take enhanced predicate operands that designate both the element width and the elements to be processed. The element width and the number of elements per vector are determined at run-time rather than being defined in the architectural definition of the instruction. This enables additional parallelism when processing smaller-sized data. The instruction performs the requested operation on the elements specified by the enhanced control predicate, assuming an element width also specified by the enhanced control predicate, and returns the result as an enhanced predicate of the same element width.
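The idea of a predicate that carries its own element width can be illustrated with a minimal Python sketch. The 16-byte vector size, the EnhancedPredicate class and the pred_nonzero operation are all assumptions made for illustration; the abstract does not name any specific operation or encoding.

```python
from dataclasses import dataclass

VECTOR_BYTES = 16  # assumed vector register size; not stated in the abstract


@dataclass(frozen=True)
class EnhancedPredicate:
    """Hypothetical model: element width plus one active flag per element."""
    element_width: int  # bytes per element, chosen at run time
    active: tuple       # one bool per element

    def __post_init__(self):
        # The number of elements per vector follows from the run-time width.
        if len(self.active) != VECTOR_BYTES // self.element_width:
            raise ValueError("active flags must match elements per vector")


def pred_nonzero(vector: bytes, control: EnhancedPredicate) -> EnhancedPredicate:
    """Hypothetical operation: flag the active elements whose value is nonzero.

    The result is an enhanced predicate of the same element width.
    """
    w = control.element_width
    flags = []
    for i, is_active in enumerate(control.active):
        value = int.from_bytes(vector[i * w:(i + 1) * w], "little")
        flags.append(is_active and value != 0)
    return EnhancedPredicate(w, tuple(flags))
```

With a 4-byte width the same 16 bytes are treated as 4 elements; with a 2-byte width they are treated as 8, which is where the extra parallelism on smaller data comes from.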

Method and apparatus for efficient execution of nested branches on a graphics processor unit

An apparatus and method for executing nested control flow instructions on a graphics processing unit (GPU). For example, one embodiment of a processor comprises: an execution unit having a plurality of channels to execute control flow instructions including fused control flow instructions comprising two or more consecutive control flow instructions fused into a single fused control flow instruction; and a branch unit to process the control flow instructions and to maintain a global counter indicating a nesting level of the control flow instructions, wherein to process a fused control flow instruction, the branch unit is to store a value N in a stack indicating a number of control flow instructions fused into the fused control flow instruction, the branch unit to subsequently read the value N from the stack upon execution of the fused control flow instruction and decrement the global counter by a value of N responsive to execution of the fused control flow instruction.
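The counter-and-stack bookkeeping described above can be sketched in Python. This is a simplified model, not the patented design: the class and method names are hypothetical, and incrementing the counter by N on entry is an assumption, since the abstract only describes storing N and the later decrement.

```python
class BranchUnit:
    """Hypothetical model of the nesting counter and the stack of fused counts."""

    def __init__(self):
        self.nesting_level = 0  # global counter of the current nesting depth
        self.fused_counts = []  # stack holding N for each open (fused) region

    def enter(self, n=1):
        # Assumption: entering n nested levels raises the counter by n.
        self.nesting_level += n
        # Store N so the matching fused instruction can undo all n levels.
        self.fused_counts.append(n)

    def exit_fused(self):
        # Read N back from the stack and decrement the counter by N in one step.
        n = self.fused_counts.pop()
        self.nesting_level -= n
        return n
```

A fused instruction standing for three consecutive control flow instructions then unwinds three nesting levels with a single stack read, instead of three separate instructions.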

In-Memory/Register Vector Radix Sort
20170262211 · 2017-09-14

Methods, systems and computer program products for accelerating sorting of data are provided herein. A computer-implemented method includes: retrieving a plurality of cache lines of data from an input buffer, wherein each cache line comprises a plurality of elements; scattering the plurality of elements of each retrieved cache line into a plurality of bins, wherein said scattering comprises using one or more vector instructions; forming a bin cache line in a corresponding one of the plurality of bins, wherein the bin cache line comprises a group of the plurality of elements which were scattered to the corresponding one of the plurality of bins; writing the bin cache line from the corresponding one of the plurality of bins to a memory; and loading the bin cache line from the memory to the input buffer.
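The retrieve, scatter, form, write and reload steps can be sketched as a least-significant-digit radix sort in Python. This is only an illustration under stated assumptions: LINE_ELEMS, the function names and the 2-bit radix are hypothetical, and a scalar loop stands in for the vector scatter instructions the abstract refers to.

```python
LINE_ELEMS = 4  # assumed number of elements per cache line


def scatter_pass(input_lines, shift, radix_bits=2):
    """One pass: scatter elements into bins, flushing full bin cache lines."""
    num_bins = 1 << radix_bits
    mask = num_bins - 1
    staging = [[] for _ in range(num_bins)]  # bin cache lines being formed
    memory = [[] for _ in range(num_bins)]   # bin cache lines written out
    for line in input_lines:
        for value in line:  # scalar stand-in for a vector scatter
            b = (value >> shift) & mask
            staging[b].append(value)
            if len(staging[b]) == LINE_ELEMS:
                memory[b].append(staging[b])  # write the full bin cache line
                staging[b] = []
    for b in range(num_bins):
        if staging[b]:
            memory[b].append(staging[b])      # flush partially filled lines
    # Load the bin cache lines back, in bin order, as the next input buffer.
    return [line for bin_lines in memory for line in bin_lines]


def radix_sort(values, key_bits=8, radix_bits=2):
    """Sort non-negative integers smaller than 2**key_bits."""
    lines = [values[i:i + LINE_ELEMS] for i in range(0, len(values), LINE_ELEMS)]
    for shift in range(0, key_bits, radix_bits):
        lines = scatter_pass(lines, shift, radix_bits)
    return [v for line in lines for v in line]
```

Because elements are appended to each bin in input order, every pass is stable, which is what lets the repeated passes produce a fully sorted result.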

Techniques For Metadata Processing
20220043654 · 2022-02-10

Techniques are described for metadata processing that can be used to encode an arbitrary number of security policies for code running on a processor. Metadata may be added to every word in the system and a metadata processing unit may be used that works in parallel with data flow to enforce an arbitrary set of policies. In one aspect, the metadata may be characterized as unbounded and software programmable to be applicable to a wide range of metadata processing policies. Techniques and policies have a wide range of uses including, for example, safety, security, and synchronization. Additionally, described are aspects and techniques in connection with metadata processing in an embodiment based on the RISC-V architecture.

SYSTEMS AND METHODS OF TELEMETRY DIAGNOSTICS
20220237021 · 2022-07-28

Systems and methods are provided for executing a workflow based on a received alert notification, wherein the workflow includes one or more tasks to be executed by a workflow processor. The workflow is validated when it is determined that each task of the workflow is executable without failure. A job may be generated based on the validated workflow, and a state object in a state engine may be generated to be used by the job for processing by the workflow processor. Each task of the state object may be iterated to complete the workflow, and data may be transmitted in response to the alert notification based on the completed workflow.

Conditional execution specification of instructions using conditional extension slots in the same execute packet in a VLIW processor

In one embodiment, a system includes a memory and a processor core. The processor core includes functional units and an instruction decode unit configured to determine whether an execute packet of instructions received by the processor core includes a first instruction that is designated for execution by a first functional unit of the functional units and a second instruction that is a condition code extension instruction that includes a plurality of sets of condition code bits, wherein each set of condition code bits corresponds to a different one of the functional units, and wherein the sets of condition code bits include a first set of condition code bits that corresponds to the first functional unit. When the execute packet includes the first and second instructions, the first functional unit is configured to execute the first instruction conditionally based upon the first set of condition code bits in the second instruction.
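How a condition code extension slot might gate the other instructions in the same execute packet can be sketched in Python. The encoding here is entirely hypothetical: the unit names, the "cc_ext" opcode, and the rule that condition code 0 means unconditional while any other value names a predicate register that must be nonzero are assumptions, not details from the abstract.

```python
def run_packet(packet, predicates):
    """Return the functional units whose instructions execute.

    packet: list of dicts; a normal instruction has "op" and "unit", and a
    hypothetical extension instruction has "op" == "cc_ext" plus a "cc"
    mapping from unit name to that unit's condition code bits.
    predicates: mapping from predicate register number to its value.
    """
    cc_ext = next((i for i in packet if i["op"] == "cc_ext"), None)
    executed = []
    for instr in packet:
        if instr["op"] == "cc_ext":
            continue  # the extension slot only carries condition bits
        unit = instr["unit"]
        # Assumed encoding: 0 = unconditional, else a predicate register index.
        cc = cc_ext["cc"].get(unit, 0) if cc_ext else 0
        if cc == 0 or predicates[cc] != 0:
            executed.append(unit)
    return executed
```

Each functional unit looks up only its own set of condition code bits, so one extension instruction can make several instructions in the packet conditional at once.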

LOOP EXECUTION IN A RECONFIGURABLE COMPUTE FABRIC
20220206804 · 2022-06-30

Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.

Controlling the number of powered vector lanes via a register field

The vector data path is divided into smaller vector lanes. A register such as a memory mapped control register stores a vector lane number (VLX) indicating the number of vector lanes to be powered. A decoder converts this VLX into a vector lane control word, each bit controlling the ON or OFF state of the corresponding vector lane. That number of contiguous least significant vector lanes is powered. In the preferred embodiment the stored data VLX indicates that 2^VLX contiguous least significant vector lanes are to be powered. Thus the number of vector lanes powered is limited to an integral power of 2. This manner of coding produces a very compact controlling bit field while obtaining substantially all the power saving advantage of individually controlling the power of all vector lanes.
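The decoding from the stored VLX field to a per-lane control word can be sketched in Python. The function name and the default of 8 total lanes are assumptions for illustration; only the power-of-two rule and the one-bit-per-lane control word come from the abstract.

```python
def lane_control_word(vlx, total_lanes=8):
    """Decode VLX into a control word with one bit per vector lane.

    The 2**vlx least significant lanes are powered (bit = 1, ON).
    total_lanes is an assumed hardware parameter.
    """
    if vlx < 0:
        raise ValueError("VLX must be non-negative")
    powered = 1 << vlx  # number of lanes to power: 2**VLX
    if powered > total_lanes:
        raise ValueError("VLX requests more lanes than exist")
    return (1 << powered) - 1  # contiguous least significant bits set


# VLX = 2 powers the 4 least significant of 8 lanes.
print(format(lane_control_word(2), "08b"))  # prints 00001111
```

With 8 lanes the field needs only 2 bits (VLX from 0 to 3) instead of 8 individual enable bits, which is the compactness the abstract points to.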

METHOD AND APPARATUS FOR COMPARING PREDICTED LOAD VALUE WITH MASKED LOAD VALUE

A digital processor, method, and a non-transitory computer readable storage medium are described, and include a load pipeline operative to access a data content and convert the data content into a load result. The digital processor also includes a value prediction check circuit that is operative to access a speculative content, determine a predicted value from the speculative content, and determine a masked value by masking the data content with a data mask. The masked value is compared to the predicted value, and an action associated with the load result is commanded based upon the comparing of the masked value and the predicted value.
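The comparison step can be illustrated with a short Python sketch. The function name and the "commit" and "replay" action labels are hypothetical; the abstract only says that an action associated with the load result is commanded based on the comparison.

```python
def check_prediction(data_content, data_mask, predicted_value):
    """Mask the loaded data and compare it with the predicted value.

    Returns a hypothetical action name: "commit" when the prediction
    matches the masked value, "replay" otherwise.
    """
    masked_value = data_content & data_mask
    return "commit" if masked_value == predicted_value else "replay"
```

For instance, a byte load from a word holding 0xDEADBEEF would use the mask 0xFF, so a predicted value of 0xEF matches while the same prediction fails against the 16-bit mask 0xFFFF.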

Data processing
11354126 · 2022-06-07

A data processing apparatus comprises vector processing circuitry to selectively apply vector processing operations defined by vector processing instructions to generate one or more data elements of a data vector comprising a plurality of data elements at respective data element positions of the data vector, according to the state of respective predicate flags associated with the positions of the data vector; and generator circuitry to generate instruction sample data indicative of processing activities of the vector processing circuitry for selected ones of the vector processing instructions, the instruction sample data indicating at least the state of the predicate flags at execution of the selected vector processing instructions.
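Predicated execution combined with sampling of the predicate state can be sketched in Python. Everything named here is an assumption: the predicated_add operation, the Sampler class, and the policy of selecting every Nth instruction for sampling are illustrative choices the abstract does not specify.

```python
def predicated_add(dest, a, b, predicate):
    """Write a[i] + b[i] only where the predicate flag is set; keep dest otherwise."""
    return [x + y if p else d for d, x, y, p in zip(dest, a, b, predicate)]


class Sampler:
    """Hypothetical sample generator: records every `interval`-th instruction."""

    def __init__(self, interval):
        self.interval = interval
        self.count = 0
        self.samples = []

    def observe(self, opcode, predicate):
        self.count += 1
        if self.count % self.interval == 0:
            # The sample captures the predicate flags at execution time.
            self.samples.append({
                "opcode": opcode,
                "predicate": tuple(predicate),
                "active_elements": sum(predicate),
            })
```

Recording the flags alongside the opcode lets a profiler see how many element positions a sampled vector instruction actually processed.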