G06F9/3875

DEVICE FOR PROCESSING HOMOMORPHICALLY ENCRYPTED DATA

There is provided a device for processing homomorphically encrypted data. The device includes: inter-line butterfly array blocks, each inter-line butterfly array block including inter-line modulus butterfly units, each inter-line modulus butterfly unit being configured to perform a modulus butterfly operation based on a received computation pair of data points corresponding to a pair of input data points in the same row of a matrix of input data points; intra-line butterfly array blocks, each intra-line butterfly array block including intra-line modulus butterfly units, each intra-line modulus butterfly unit being configured to perform a modulus butterfly operation based on a received computation pair of data points corresponding to a pair of input data points in the same column of the matrix of input data points; and a clock counter communicatively coupled to each inter-line butterfly array block and each intra-line butterfly array block, and configured to output a counter signal for controlling each inter-line butterfly array block and each intra-line butterfly array block to operate with a single-cycle initiation interval. The matrix of input data points includes columns of input data points, whereby parallel input data points derived from the homomorphically encrypted data are arranged into the columns of input data points. Furthermore, the inter-line butterfly array blocks and the intra-line butterfly array blocks are arranged in series to form a pipeline for processing the matrix of input data points.
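
The core operation each butterfly unit performs can be sketched as a radix-2 Cooley-Tukey modular butterfly, as used in number-theoretic transforms for homomorphic encryption. This is a minimal sketch, assuming the standard `(a + w·b, a − w·b) mod q` form; the function name and radix are illustrative, not taken from the abstract.

```python
def mod_butterfly(a, b, w, q):
    # Radix-2 modulus butterfly: combine a pair of data points using
    # twiddle factor w, producing (a + w*b) mod q and (a - w*b) mod q.
    t = (w * b) % q
    return (a + t) % q, (a - t) % q

def butterfly_column_pair(col_a, col_b, w, q):
    # An intra-line array block would apply the butterfly element-wise to a
    # pair of columns of the input matrix; inter-line blocks pair up rows.
    return [mod_butterfly(a, b, w, q) for a, b in zip(col_a, col_b)]
```

Chaining such stages back-to-back, one pair of points per unit per cycle, matches the series arrangement of array blocks into a single-cycle-initiation-interval pipeline.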

Metadata predictor

Embodiments of a metadata predictor are described. An index pipeline generates indices in an index buffer, and the indices are used to read out a memory device. A prediction cache is populated with metadata of instructions read from the memory device. A prediction pipeline generates a prediction using the metadata of the instructions from the prediction cache, the populating of the prediction cache with the metadata being performed asynchronously to the operation of the prediction pipeline.
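
The decoupling between the index/fill side and the prediction side can be sketched as a producer-consumer structure. This is a minimal sketch under assumed names; the buffer depth, cache policy, and interfaces are illustrative only.

```python
from collections import deque

class MetadataPredictor:
    def __init__(self, memory):
        self.memory = memory          # models the metadata memory device
        self.index_buffer = deque()   # filled by the index pipeline
        self.prediction_cache = {}    # filled asynchronously from memory

    def index_step(self, index):
        # index pipeline: generate an index into the index buffer
        self.index_buffer.append(index)

    def fill_step(self):
        # asynchronous fill: use a buffered index to read out the memory
        # device and populate the prediction cache
        if self.index_buffer:
            idx = self.index_buffer.popleft()
            self.prediction_cache[idx] = self.memory[idx]

    def predict(self, idx):
        # prediction pipeline: consume cached metadata if it has arrived
        return self.prediction_cache.get(idx)
```

Because `fill_step` and `predict` run independently, the prediction pipeline never waits on the memory read; it simply sees a cache miss until the fill catches up.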

Graphics processing

Disclosed is a method of handling thread termination events within a graphics processor when a group of plural execution lanes is executing in a co-operative state. When a group of lanes is in the co-operative state, and the graphics processor encounters an event indicating that a subset of one or more execution threads associated with the group of execution lanes should be terminated, it is determined whether a condition to immediately terminate the subset of one or more execution threads is met. When the condition is not met, the group of execution lanes continues executing in the co-operative state, but a record is stored to track that the threads in the subset should subsequently be terminated.
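
The deferred-termination record can be sketched as follows. This is a minimal sketch: the immediate-termination condition is passed in as a flag because the abstract does not specify it, and all names are assumptions.

```python
class LaneGroup:
    def __init__(self, lanes):
        self.active = set(lanes)
        self.pending = set()   # record of threads to terminate later

    def on_termination_event(self, subset, condition_met):
        # condition_met models the (unspecified) immediate-termination test
        if condition_met:
            self.active -= set(subset)        # terminate right away
        else:
            # defer: the group keeps executing co-operatively, but the
            # subset is recorded for later termination
            self.pending |= set(subset)

    def end_cooperative_state(self):
        # once co-operation ends, retire the recorded threads
        self.active -= self.pending
        self.pending.clear()
```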

Configurable pipeline based on error detection mode in a data processing system
20190065207 · 2019-02-28

A method includes providing a data processor having an instruction pipeline, where the instruction pipeline has a plurality of instruction pipeline stages, and where the plurality of instruction pipeline stages includes a first instruction pipeline stage and a second instruction pipeline stage. The method further includes providing a data processor instruction that causes the data processor to perform a first set of computational operations during execution of the data processor instruction, performing the first set of computational operations in the first instruction pipeline stage if the data processor instruction is being executed and a first mode has been selected, and performing the first set of computational operations in the second instruction pipeline stage if the data processor instruction is being executed and a second mode has been selected.
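
The mode-dependent placement of the operations can be sketched as building a different stage order per mode. This is a minimal sketch: the stand-in computation, mode names, and two-stage pipeline are assumptions for illustration.

```python
def build_pipeline(mode):
    compute = lambda x: x * 2 + 1    # stand-in for the first set of operations
    passthrough = lambda x: x
    if mode == "first":
        # first mode selected: operations occur in the first pipeline stage
        return [compute, passthrough]
    # second mode selected: the same operations occur in the second stage
    return [passthrough, compute]

def run(pipeline, value):
    for stage in pipeline:
        value = stage(value)
    return value
```

Either placement produces the same architectural result for the instruction; only the stage in which the work happens changes, which is what lets the pipeline reconfigure itself per error-detection mode.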

Dynamically foldable and unfoldable instruction fetch pipeline

A dynamically-foldable instruction fetch pipeline receives a first fetch request that includes a fetch virtual address, and includes first, second, and third sub-pipelines. The first sub-pipeline includes a translation lookaside buffer (TLB) that translates the fetch virtual address into a fetch physical address. The second includes a tag random access memory (RAM) of a physically-indexed, physically-tagged set-associative instruction cache; the tag RAM receives a set index that selects a set of tag RAM tags for comparison with a tag portion of the fetch physical address to determine the correct way of the instruction cache. The third includes a data RAM of the instruction cache that receives the set index and a way number, which together specify the data RAM entry from which to fetch an instruction block. When a control signal indicates a folded mode, the sub-pipelines operate in a parallel manner. When the control signal indicates an unfolded mode, the sub-pipelines operate in a sequential manner.
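
The folded/unfolded trade-off can be sketched as a cycle-count model of the three sub-pipelines. This is a minimal sketch, assuming a direct set-index/tag split of the physical address and toy RAM structures; in folded mode all three lookups land in one cycle, in unfolded mode they take one cycle each.

```python
def fetch(vaddr, tlb, tag_ram, data_ram, folded):
    # Folded mode: sub-pipelines operate in parallel (one cycle total).
    # Unfolded mode: they operate sequentially (one cycle per sub-pipeline).
    cycles = 1 if folded else 3
    paddr = tlb[vaddr]                    # sub-pipeline 1: TLB translation
    num_sets = len(tag_ram)
    set_index, tag = paddr % num_sets, paddr // num_sets
    way = tag_ram[set_index].index(tag)   # sub-pipeline 2: tag compare -> way
    block = data_ram[set_index][way]      # sub-pipeline 3: data RAM read
    return block, cycles
```

Folding trades speculative parallel access (and the energy of reading structures that may not be needed) for latency; unfolding saves power at the cost of fetch latency.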

SYSTEMS AND METHODS FOR AUTONOMOUS MONITORING IN END-TO-END ORCHESTRATION

Various systems and methods for autonomously monitoring intent-driven end-to-end (E2E) orchestration are described herein. An orchestration system is configured to: receive, at the orchestration system, an intent-based service level objective (SLO) for execution of a plurality of tasks; generate a common context that relates the SLO to the execution of the plurality of tasks; select a plurality of monitors to monitor the execution of the plurality of tasks, the plurality of monitors to log a plurality of key performance indicators; generate a domain context for the plurality of tasks; configure an analytics system with the plurality of monitors and the plurality of key performance indicators correlated by the domain context; deploy the plurality of monitors to collect telemetry; monitor the execution of the plurality of tasks using the telemetry from the plurality of monitors; and perform a responsive action based on the telemetry.
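
The final steps, checking logged KPIs against the intent-based SLO and triggering a responsive action, can be sketched as a simple evaluation. This is a minimal sketch; the KPI names, SLO shape, and action label are illustrative assumptions.

```python
def evaluate_slo(slo_limits, telemetry):
    # slo_limits: {kpi_name: maximum allowed value} derived from the intent
    # telemetry:  {kpi_name: logged value} collected by the deployed monitors
    violations = {k: v for k, v in telemetry.items()
                  if k in slo_limits and v > slo_limits[k]}
    # perform a responsive action only when some KPI breaches its SLO limit
    action = "remediate" if violations else "none"
    return action, violations
```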

STREAM PROCESSOR WITH OVERLAPPING EXECUTION

Systems, apparatuses, and methods for implementing a stream processor with overlapping execution are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of execution pipelines. The processing throughput of the parallel processing unit is increased by overlapping execution of multi-pass instructions with single-pass instructions without increasing the instruction issue rate. A first plurality of operands of a first vector instruction are read from a shared vector register file in a single clock cycle and stored in temporary storage. The first plurality of operands are accessed and utilized to initiate multiple passes on individual vector elements on a first execution pipeline in subsequent clock cycles. A second plurality of operands are read from the shared vector register file during the subsequent clock cycles to initiate execution of one or more second vector instructions on a second execution pipeline.
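
The overlap can be sketched as an issue schedule: the multi-pass instruction's operands are read once into temporary storage, which frees the register-file read ports for the second pipeline in the following cycles. This is a minimal sketch with assumed names and a toy timeline.

```python
def issue_schedule(multi_pass_operands, num_passes, single_pass_instrs):
    # cycle 0: read all operands of the multi-pass instruction from the
    # shared vector register file and latch them into temporary storage
    temp = list(multi_pass_operands)
    schedule = []
    for cycle in range(num_passes):
        # pipeline 0 works each pass from the buffered operands (no VRF read)
        pipe0 = ("pipe0_pass", cycle, temp)
        # the freed VRF ports let pipeline 1 read operands for, and issue,
        # a single-pass instruction in the very same cycle
        instr = single_pass_instrs[cycle] if cycle < len(single_pass_instrs) else None
        schedule.append((pipe0, ("pipe1_issue", instr)))
    return schedule
```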

On-the-fly conversion during load/store operations in a vector processor

Systems and methods for performing on-the-fly format conversion on data vectors during load/store operations are described herein. In one embodiment, a method for loading a data vector from a memory into a vector unit comprises reading a plurality of samples from the memory, wherein the plurality of samples are packed in the memory. The method also comprises unpacking the samples to obtain a plurality of unpacked samples, performing format conversion on the unpacked samples in parallel, and sending at least a portion of the format-converted samples to the vector unit.
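
The load path, read packed samples, unpack them, and format-convert them on the way to the vector unit, can be sketched as below. This is a minimal sketch assuming little-endian int16 samples and Q15-to-float conversion; the actual packed format and conversion are not specified in the abstract.

```python
import struct

def load_and_convert(packed_bytes, count):
    # unpack `count` int16 samples that were packed contiguously in memory
    samples = struct.unpack("<%dh" % count, packed_bytes)
    # format conversion applied across the unpacked samples
    # (Q15 fixed-point -> float scaling is an assumption for illustration)
    return [s / 32768.0 for s in samples]
```

In hardware this conversion happens in the load/store datapath itself, so the vector unit only ever sees data in its working format and no separate conversion instructions are needed.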

METHOD AND APPARATUS FOR EFFICIENT SCHEDULING FOR ASYMMETRICAL EXECUTION UNITS
20180232237 · 2018-08-16

A method and system perform instruction scheduling in an out-of-order microprocessor pipeline. The method and system select a first set of instructions to dispatch from a scheduler to an execution module, wherein the execution module comprises two types of execution units. The first type of execution unit executes both a first and a second type of instruction, and the second type of execution unit executes only the second type. Next, the method selects a second set of instructions to dispatch, which is a subset of the first set and comprises only instructions of the second type. The method determines a third set of instructions, which comprises the instructions of the first set not selected into the second set. Further, the method dispatches the second set for execution using the second type of execution unit and dispatches the third set for execution using the first type of execution unit.
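
The set selection can be sketched directly from the description: route as many second-type instructions as the specialized units can take, and send the remainder (including first-type instructions) to the general-purpose units. This is a minimal sketch; the capacity bound and encodings are illustrative assumptions.

```python
def schedule(first_set, type2_capacity):
    # first_set: instruction types (1 or 2) already selected for dispatch
    # second set: only second-type instructions, bounded by type-2 unit slots
    second = [i for i, t in enumerate(first_set) if t == 2][:type2_capacity]
    # third set: instructions of the first set not selected into the second
    third = [i for i in range(len(first_set)) if i not in second]
    # dispatch: second set -> type-2 units, third set -> type-1 units
    return {"type2_units": second, "type1_units": third}
```

Keeping the general-purpose (first-type) units as the fallback means no selected instruction ever stalls solely because the specialized units are busy.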