Patent classifications
G06F9/30192
Processors, methods, systems, and instructions to partition a source packed data into lanes
A processor includes a decode unit to decode an instruction that is to indicate a source packed data that is to include a plurality of adjoining data elements, a number of adjoining data elements, and a destination. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the instruction, is to store a result packed data in the destination. The result packed data is to have a plurality of lanes that are each to store a different non-overlapping set of the indicated number of adjoining data elements aligned with a least significant end of the respective lane. The different non-overlapping sets of the indicated number of the adjoining data elements in adjoining lanes of the result packed data are to be separated from one another by at least one most significant data element position of the less significant lane of the adjoining lanes.
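The lane layout described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `partition_into_lanes`, its parameters, and the zero fill for the unused upper positions of each lane are all hypothetical, and element lists are written least significant element first.

```python
def partition_into_lanes(source, count, lane_width, num_lanes, fill=0):
    """Model the result packed data as a flat list, least significant first.

    Each lane receives a different, non-overlapping run of `count` adjoining
    source elements at its least significant end. The remaining upper
    positions of the lane hold `fill` (an assumption; the abstract does not
    say what those positions contain).
    """
    # At least one most significant position per lane must stay free so that
    # the groups in adjoining lanes are separated from one another.
    if not 0 < count < lane_width:
        raise ValueError("count must be positive and smaller than lane_width")
    if count * num_lanes > len(source):
        raise ValueError("source does not hold enough elements")

    result = []
    for lane in range(num_lanes):
        group = list(source[lane * count:(lane + 1) * count])
        result.extend(group + [fill] * (lane_width - count))
    return result


# Six adjoining elements split into two 4-element lanes, three per lane.
print(partition_into_lanes([1, 2, 3, 4, 5, 6], count=3, lane_width=4, num_lanes=2))
# prints [1, 2, 3, 0, 4, 5, 6, 0]
```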
Static Identifications in Object-based Memory Access
A computer system having an address system of a first predetermined width in which each address of the first predetermined width in the address system includes a first portion identifying an object and a second portion identifying an offset relative to the object, where a static identifier for the first portion is predetermined to identify an address space having a second predetermined width that is smaller than the first predetermined width, or a space of kernel objects.
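A minimal sketch of the address split follows. The abstract gives no concrete widths or identifier values, so the 128-bit address, the 64-bit offset, and the static identifier value `0` are assumptions chosen purely for illustration, as are all of the names.

```python
ADDRESS_BITS = 128           # assumed "first predetermined width"
OFFSET_BITS = 64             # assumed width of the offset portion
OFFSET_MASK = (1 << OFFSET_BITS) - 1
STATIC_NARROW_SPACE_ID = 0   # hypothetical static identifier for the narrower address space


def split_address(address):
    """Return (object_id, offset) for an address of the assumed full width."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in the assumed address width")
    return address >> OFFSET_BITS, address & OFFSET_MASK


def is_narrow_space(address):
    """True when the object portion carries the assumed static identifier."""
    object_id, _ = split_address(address)
    return object_id == STATIC_NARROW_SPACE_ID
```

Under these assumptions, any value that fits in 64 bits has an object portion of zero and is therefore treated as an address in the narrower space.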
PROVISIONING SECURE/ENCRYPTED VIRTUAL MACHINES IN A CLOUD INFRASTRUCTURE
A method, system and apparatus for provisioning a computation into a trusted execution environment, including verifying the trusted execution environment, generating integrity information of the computation, generating sealed data, sending information of the computation, the sealed data, and integrity information to the trusted execution environment, confirming the sealed data, and verifying integrity of the computation information from the integrity information and the computation information.
TASK SYNCHRONIZATION FOR ACCELERATED DEEP LEARNING
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element conditionally selects for task initiation a previously received wavelet specifying a particular one of the virtual channels. The conditional selecting excludes the previously received wavelet for selection until at least block/unblock state maintained for the particular virtual channel is in an unblock state. The compute element executes block/unblock instructions to modify the block/unblock state.
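The conditional selection described here can be modeled in software as follows. This is a rough sketch, not the actual processing element: the class and method names are hypothetical, and the lowest-channel-first scan is an assumed policy that the abstract does not specify.

```python
from collections import deque


class ComputeElement:
    """Toy model: one queue of received wavelets per virtual channel."""

    def __init__(self, num_channels):
        self.queues = [deque() for _ in range(num_channels)]
        self.blocked = [False] * num_channels  # block/unblock state per channel

    def receive(self, channel, payload):
        self.queues[channel].append(payload)

    def block(self, channel):
        self.blocked[channel] = True

    def unblock(self, channel):
        self.blocked[channel] = False

    def select_task(self):
        """Pick a previously received wavelet whose channel is unblocked.

        Wavelets on blocked channels stay queued and are excluded from
        selection until the channel is unblocked.
        """
        for channel, queue in enumerate(self.queues):
            if queue and not self.blocked[channel]:
                return channel, queue.popleft()
        return None
```

In this model, a wavelet that arrives on a blocked channel is not lost. It simply waits until a later `unblock` call makes it eligible for task initiation.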
Apparatus and method for managing a capability domain
An apparatus and method are provided for managing a capability domain. The apparatus has processing circuitry for executing instructions, the processing circuitry when in a default state being arranged to operate in a capability domain comprising capabilities used to constrain operations performed by the processing circuitry when executing the instructions. A program counter capability storage element is also provided to store a program counter capability used by the processing circuitry to determine a program counter value. The program counter capability is arranged to identify a capability state for the processing circuitry. The processing circuitry is then arranged, when the capability state indicates the default state, to operate in the capability domain. However, when the capability state indicates the executive state, the processing circuitry is arranged to operate in a manner less constrained than when in the default state so as to allow modification of the capability domain. This provides a simple and effective mechanism for selectively allowing the apparatus to modify the capability domain.
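As a loose software analogy, the state carried by the program counter capability can gate changes to the capability domain. Every name below is hypothetical, and representing the domain as a plain set of labels is an assumption made only to keep the sketch short.

```python
from dataclasses import dataclass
from enum import Enum


class CapabilityState(Enum):
    DEFAULT = "default"
    EXECUTIVE = "executive"


@dataclass
class ProgramCounterCapability:
    address: int              # used to derive the program counter value
    state: CapabilityState    # capability state identified by this capability


class Processor:
    def __init__(self, pcc, domain):
        self.pcc = pcc
        self.domain = set(domain)  # capabilities that constrain operations

    def add_capability(self, capability):
        # Only the less constrained executive state may modify the domain.
        if self.pcc.state is not CapabilityState.EXECUTIVE:
            raise PermissionError("capability domain cannot be modified in the default state")
        self.domain.add(capability)
```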
Vector friendly instruction format and execution thereof
- Robert C. Valentine
- Jesus Corbal San Adrian
- Roger Espasa Sans
- Robert D. Cavin
- Bret L. Toll
- Santiago Galan Duran
- Jeffrey G. Wiedemeier
- Sridhar Samudrala
- Milind Baburao Girkar
- Edward Thomas Grochowski
- Jonathan Cannon Hall
- Dennis R. Bradford
- Elmoustapha Ould-Ahmed-Vall
- James C Abel
- Mark Charney
- Seth Abraham
- Suleyman Sair
- Andrew Thomas Forsyth
- Lisa Wu
- Charles Yount
A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
Detecting and preventing exploits of software vulnerability using instruction tags
A secure processor, comprising a logic execution unit configured to process data based on instructions; a communication interface unit, configured to transfer the instructions and the data, and metadata tags accompanying respective instructions and data; a metadata processing unit, configured to enforce specific restrictions with respect to at least execution of instructions, access to resources, and manipulation of data, selectively dependent on the received metadata tags; and a control transfer processing unit, configured to validate a branch instruction execution and an entry point instruction of each control transfer, selectively dependent on the respective metadata tags.
Drift detection in static processes
In many industrial settings, a process is repeated many times, for instance to transform physical inputs into physical outputs. To detect a situation involving such a process in which errors are likely to occur, information about the process may be collected to determine time-varying feature vectors. Then, a drift value may be determined by comparing feature vectors corresponding with different time periods. When the drift value crosses a designated drift threshold, a predicted outcome value may be determined by applying a prediction model. Sensitivity values may be determined for different features, and elements of the process may then be updated based at least in part on the sensitivity values.
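The comparison step can be sketched as below. The abstract does not say how feature vectors from different time periods are compared, so this example assumes the Euclidean distance between the per-feature means of two windows and treats "crossing" the threshold as strictly exceeding it. The function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Per-feature mean of a window of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def drift_value(reference_window, recent_window):
    """Euclidean distance between the window means (an assumed metric)."""
    ref = mean_vector(reference_window)
    cur = mean_vector(recent_window)
    return math.sqrt(sum((c - r) ** 2 for r, c in zip(ref, cur)))


def has_drifted(reference_window, recent_window, threshold):
    """True when the drift value exceeds the designated threshold."""
    return drift_value(reference_window, recent_window) > threshold
```

When `has_drifted` returns true, the abstract's next steps (applying a prediction model and computing per-feature sensitivity values) would follow. They are left out here because the source gives no detail on either.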
Optimized compute hardware for machine learning operations
A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.
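One way to read the lane arithmetic is that a 32-bit lane holds 32 divided by the element bit length values, and that quotient sets the number of parallel multiplies feeding a single accumulate. The sketch below models that reading only. It is an interpretation rather than the claimed hardware, and the names and the list-based operand representation are hypothetical.

```python
LANE_BITS = 32


def multiplies_per_lane(element_bits):
    """Number of elements of the given bit length that fit in one 32-bit lane."""
    if element_bits <= 0 or LANE_BITS % element_bits:
        raise ValueError("element bit length must evenly divide the lane width")
    return LANE_BITS // element_bits


def lane_dot_accumulate(a_elems, b_elems, a_bits, accumulator=0):
    """Multiply element pairs for one lane and add the products to the accumulator.

    `a_bits` is the bit length of the smallest-sized values of the first
    operand. It fixes how many multiplies this lane performs.
    """
    n = multiplies_per_lane(a_bits)
    if len(a_elems) != n or len(b_elems) != n:
        raise ValueError("each operand must supply one value per multiply")
    return accumulator + sum(x * y for x, y in zip(a_elems, b_elems))
```

With 8-bit elements in the first operand, this model performs four multiplies per lane, and with 4-bit elements it performs eight.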
COMPUTER-READABLE RECORDING MEDIUM STORING CONVERSION PROGRAM AND CONVERSION METHOD
A recording medium stores a program for causing a computer to execute a process including: incrementing a counter when a CISC instruction is converted into a RISC instruction; updating conversion timing of a register used for conversion with a value of the counter; recording a difference before and after update of conversion timing as an interval; selecting a register number from among register numbers that have a same interval condition, and updating the use register number with a selected register number; saving data for a register indicated by the updated use register number and generating an instruction for reading data of a memory operand; when a CISC instruction before conversion does not include the use register number, generating an instruction for reading data of a memory operand without restoring and saving data for a register indicated by the use register number; and generating the RISC instruction equivalent to the CISC instruction.