Patent classifications
G06F9/30181
Marking current context data to control a context-data-dependent processing operation to save current or default context data to a data location
A data processing system includes processing circuitry for executing context-data-dependent program instructions which are decoded by decoder circuitry. Such context-data-dependent program instructions perform processing which is dependent upon currently existing context data. As an example, the context-data-dependent program instructions may be floating point instructions and the context data may be rounding mode information. The decoder circuitry supports a context save instruction which saves context data when it is marked as having been used and saves default context data when the current context data is marked as not having been used. The decoder circuitry further supports a context restore instruction which restores context data when the current context data is marked as having been used and permits the current context data to continue for future use when it is marked as currently unused.
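The save and restore behaviour described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the `Context` class, the `used` flag, the `fp_op` helper and the default rounding mode are hypothetical names chosen for illustration, not details taken from the patent.

```python
from dataclasses import dataclass

# Assumed default context value (hypothetical; the abstract only says "default context data").
DEFAULT_ROUNDING = "round-to-nearest"


@dataclass
class Context:
    rounding_mode: str = DEFAULT_ROUNDING
    used: bool = False  # models the "marked as having been used" state


def fp_op(ctx, mode=None):
    """Stand-in for a context-data-dependent instruction: it marks the context as used."""
    if mode is not None:
        ctx.rounding_mode = mode
    ctx.used = True


def context_save(ctx):
    """Return the data a context save would write to the data location."""
    # Used context: save the current data. Unused context: save the default data.
    return ctx.rounding_mode if ctx.used else DEFAULT_ROUNDING


def context_restore(ctx, saved):
    """Restore saved data only when the current context is marked as used."""
    if ctx.used:
        ctx.rounding_mode = saved
    # Otherwise the current context data is left in place for future use.
```

In this model a context that was never touched by `fp_op` is saved as the default and is left alone on restore, which is the case where the abstract avoids handling stale context data.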
Method for control-flow integrity protection, apparatus, device and storage medium
Embodiments of the present disclosure provide a method for control-flow integrity protection, including: changing preset bits of all legal target addresses of a current indirect branch instruction in a control flow of a program to be protected so that they are the same; and rewriting preset bits of a current target address of the current indirect branch instruction to match the preset bits of the legal target addresses, so that the program to be protected terminates when the current target address has been tampered with. By making the preset bits of all the legal target addresses of the current indirect branch instruction the same and rewriting the preset bits of the current target address to be consistent with them, traditional label comparison is replaced by the preset bit overlap operation, reducing performance overhead and improving attack defense efficiency.
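A rough software model of the preset-bit idea follows. It is a sketch only: the mask `TAG_MASK`, the pattern `TAG_VALUE`, the `ControlFlowFault` exception and the dictionary standing in for mapped code are all assumptions made for illustration. In particular, the dictionary lookup models the fact that a rewritten tampered address no longer points at valid code; it is not meant as a label comparison.

```python
TAG_MASK = 0xFF0   # hypothetical choice of which address bits are the "preset bits"
TAG_VALUE = 0xA50  # hypothetical pattern shared by all legal targets of one branch


class ControlFlowFault(Exception):
    """Models the protected program terminating."""


def place_target(addr):
    """Give a legal target address the shared preset-bit pattern."""
    return (addr & ~TAG_MASK) | TAG_VALUE


def rewrite_target(target):
    """Overwrite the preset bits of the current target before branching."""
    return (target & ~TAG_MASK) | TAG_VALUE


def indirect_branch(target, code):
    """Branch to the rewritten target; `code` maps valid entry addresses to handlers."""
    rewritten = rewrite_target(target)
    if rewritten not in code:
        # A tampered address is forced onto the pattern and lands outside valid code.
        raise ControlFlowFault(hex(rewritten))
    return code[rewritten]
```

A legitimate target is unchanged by `rewrite_target` because it already carries the pattern, so the only per-branch cost in this model is a mask and an OR.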
Computing device and method
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
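To make the fixed-point representation mentioned in this abstract concrete, here is a minimal sketch of fixed-point conversion and arithmetic. The number of fractional bits (`FRAC_BITS = 8`) and all function names are assumptions for illustration; the abstract does not specify a format.

```python
FRAC_BITS = 8  # assumed number of fractional bits (hypothetical)
SCALE = 1 << FRAC_BITS


def to_fixed(x):
    """Convert a float to a fixed-point integer with FRAC_BITS fractional bits."""
    return int(round(x * SCALE))


def to_float(q):
    """Convert a fixed-point integer back to a float."""
    return q / SCALE


def fixed_mul(a, b):
    """Multiply two fixed-point values; the raw product carries 2*FRAC_BITS fractional bits."""
    return (a * b) >> FRAC_BITS


def fixed_dot(xs, ws):
    """Dot product with a wide integer accumulator and a single rescale at the end."""
    acc = sum(x * w for x, w in zip(xs, ws))
    return acc >> FRAC_BITS
```

For example, `to_float(fixed_mul(to_fixed(1.5), to_fixed(0.25)))` evaluates to `0.375` using only integer multiplication and a shift.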
Inline data inspection for workload simplification
A method, computer readable medium, and processor are described herein for inline data inspection by using a decoder to decode a load instruction, including a signal to cause a circuit in a processor to indicate whether data loaded by a load instruction exceeds a threshold value. Moreover, an indication of whether data loaded by a load instruction exceeds a threshold value may be stored.
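The following sketch models a load that reports, alongside the loaded value, whether that value exceeds a threshold, and shows one way a consumer might use the indication to skip work. The function names, the list standing in for memory, and the assumption that the inspected data is non-negative (so that a threshold of zero identifies values contributing nothing to a product) are all illustrative and not taken from the patent.

```python
def load_with_inspection(memory, addr, threshold):
    """Model of a load that also returns an 'exceeds threshold' indication."""
    value = memory[addr]
    return value, value > threshold


def sparse_dot(activations, weights, threshold=0):
    """Dot product that skips the multiply when the loaded activation does not exceed the threshold.

    Assumes non-negative activations, so skipped terms are zero when threshold is 0.
    """
    total = 0
    skipped = 0
    for i in range(len(activations)):
        value, exceeds = load_with_inspection(activations, i, threshold)
        if not exceeds:
            skipped += 1  # work avoided thanks to the stored indication
            continue
        total += value * weights[i]
    return total, skipped
```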
Method and device for providing a vector stream instruction set architecture extension for a CPU
A method and device for providing a vector stream instruction set architecture extension for a CPU are disclosed. In one aspect, there is provided a vector stream engine unit comprising: a first fast memory storage for temporarily storing data of vector data streams from a memory for loading into a vector register file; a second fast memory storage for temporarily storing data of the vector data streams from the vector register file for loading into the memory; a prefetcher configured to prefetch data of the vector data streams from the memory into the first fast memory storage, and prefetch data of the vector data streams from the vector register file into the second fast memory storage; and a stream configuration table (SCT) storing stream information for prefetching data from the vector data streams.
Systems, methods, and apparatuses for heterogeneous computing
- Rajesh M. Sankaran
- Gilbert Neiger
- Narayan Ranganathan
- Stephen R. Van Doren
- Joseph Nuzman
- Niall D. McDonnell
- Michael A. O'Hanlon
- Lokpraveen B. Mosur
- Tracy Garrett Drysdale
- Eriko Nurvitadhi
- Asit K. Mishra
- Ganesh Venkatesh
- Deborah T. Marr
- Nicholas P. Carter
- Jonathan D. Pearce
- Edward T. Grochowski
- Richard J. Greco
- Robert Valentine
- Jesus Corbal
- Thomas D. Fletcher
- Dennis R. Bradford
- Dwight P. Manley
- Mark J. Charney
- Jeffrey J. Cook
- Paul Caprioli
- Koichi Yamada
- Kent D. Glossop
- David B. Sheffield
Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
Co-scheduled loads in a data processing apparatus
A data processing apparatus and a method of operating it are disclosed. Issue circuitry buffers operations prior to execution until operands are available in a set of registers. A first and a second load operation are identified in the issue circuitry when both are dependent on a common operand and the common operand is available in the set of registers. Load circuitry has a first address generation unit to generate a first address for the first load operation and a second address generation unit to generate a second address for the second load operation. An address comparison unit compares the first address and the second address. The load circuitry is arranged to cause a merged lookup to be performed in local temporary storage when the address comparison unit determines that the first and the second address differ by less than a predetermined address range characteristic of the local temporary storage.
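The address comparison step can be sketched as follows. The range value `ADDRESS_RANGE = 64` (chosen here to resemble a cache line size), the function names, and the representation of each load as a base operand plus an offset are assumptions for illustration only.

```python
ADDRESS_RANGE = 64  # assumed range characteristic of the local temporary storage (hypothetical)


def can_merge(addr1, addr2, address_range=ADDRESS_RANGE):
    """True when the two addresses differ by less than the predetermined range."""
    return abs(addr1 - addr2) < address_range


def issue_load_pair(base, offset1, offset2):
    """Generate both addresses from the common operand and count the lookups needed."""
    addr1 = base + offset1  # first address generation unit
    addr2 = base + offset2  # second address generation unit
    lookups = 1 if can_merge(addr1, addr2) else 2
    return addr1, addr2, lookups
```

In this model, two loads off the same base register with offsets 0 and 8 share a single lookup, while offsets 0 and 128 fall outside the range and need two.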
Memory-network processor with programmable optimizations
Various embodiments are disclosed of a multiprocessor system with processing elements optimized for high performance and low power dissipation and an associated method of programming the processing elements. Each processing element may comprise a fetch unit, a plurality of address generator units, and a plurality of pipelined datapaths. The fetch unit may be configured to receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields. First and second address generator units may generate, based on different fields of the multi-part instruction, addresses from which to retrieve first and second data for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction. The execution units may perform operations using a single pipeline or multiple pipelines based on third and fourth fields of the multi-part instruction.
Apparatus and method for store pairing with reduced hardware requirements
An apparatus and method for pairing store operations. For example, one embodiment of a processor comprises: a grouping eligibility checker to evaluate a plurality of store instructions based on a set of grouping rules to determine whether two or more of the plurality of store instructions are eligible for grouping; and a dispatcher to simultaneously dispatch a first group of store instructions of the plurality of store instructions determined to be eligible for grouping by the grouping eligibility checker.
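A toy model of an eligibility checker and dispatcher is sketched below. The abstract does not state the grouping rules, so the rules used here (equal sizes, contiguous addresses, and a combined width of at most 16 bytes) are purely hypothetical, as are the `Store` type and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    addr: int
    size: int  # bytes


def eligible(a, b, max_bytes=16):
    """Hypothetical grouping rules: same size, contiguous, and within max_bytes combined."""
    return (
        a.size == b.size
        and b.addr == a.addr + a.size
        and a.size + b.size <= max_bytes
    )


def dispatch_groups(stores):
    """Walk stores in program order, dispatching eligible neighbours together."""
    groups = []
    i = 0
    while i < len(stores):
        if i + 1 < len(stores) and eligible(stores[i], stores[i + 1]):
            groups.append((stores[i], stores[i + 1]))  # dispatched simultaneously
            i += 2
        else:
            groups.append((stores[i],))  # dispatched alone
            i += 1
    return groups
```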
Execution elision of intermediate instruction by processor
A method for operation of a processor core is provided. First instruction data is consulted to determine whether a second instruction has execution data that matches the first instruction data. The first instruction data is from a first instruction. In response to determining that the second instruction has execution data that matches the first instruction data, prior data is copied into the second instruction. The first instruction depends on the prior data. After receiving an availability indication of the prior data, both the first instruction and the second instruction are woken for execution, without requiring execution of the first instruction before waking of the second instruction. The second instruction is executed by using the prior data as a skip of the first instruction. A computer system and a processor core configured to operate according to the method are also disclosed herein.