Patent classifications
G06F9/30181
DATA PROCESSING APPARATUS, CHIP, AND DATA PROCESSING METHOD
Disclosed are a data processing apparatus, a chip, and a data processing method. The data processing apparatus includes a plurality of processing cores having a preset execution sequence, the plurality of processing cores including a head processing core and at least one other processing core. The head processing core is configured to send an instruction, and to receive and execute a program obtained according to the instruction. Each of the other processing cores is configured to receive and execute a program sent by the previous processing core in the preset execution sequence.
Virtualized multicore systems with extended instruction heterogeneity
A system on a chip may include a plurality of data plane processor cores sharing a common instruction set architecture. At least one of the data plane processor cores is specialized to perform a particular function via extensions to the otherwise common instruction set architecture. Such systems on a chip may have reduced physical complexity, cost, and time-to-market, and may provide improvements in core utilization and reductions in system power consumption.
Reducing a number of commands transmitted to a co-processor by merging register-setting commands having address continuity
An electronic apparatus and a method for reducing the number of commands are provided. The electronic apparatus includes a central processor and a co-processor. The central processor generates a plurality of original register setting commands to set at least one bit of at least one register of the co-processor. The original register setting commands include a plurality of first original register setting commands, and a plurality of setting targets of the first original register setting commands have address continuity. The central processor merges the first original register setting commands to generate at least one merged register setting command. The central processor transmits the at least one merged register setting command to the co-processor.
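The merging step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `(address, value)` command shape, the `merge_contiguous` name, and the assumption that "address continuity" means addresses increasing by exactly 1 are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical command shape: (register_address, value_to_write).
Command = Tuple[int, int]
# Hypothetical merged command: (start_address, values for consecutive addresses).
Merged = Tuple[int, List[int]]


def merge_contiguous(commands: List[Command]) -> List[Merged]:
    """Merge runs of commands whose target addresses are consecutive.

    Assumption: "address continuity" means each address is exactly one
    greater than the previous one. Real hardware may use another stride.
    """
    merged: List[Merged] = []
    for addr, value in commands:
        # Extend the current run if this address directly follows it.
        if merged and merged[-1][0] + len(merged[-1][1]) == addr:
            merged[-1][1].append(value)
        else:
            # Otherwise start a new run at this address.
            merged.append((addr, [value]))
    return merged
```

With this sketch, four commands targeting addresses 0x10, 0x11, 0x12, and 0x20 would collapse into two merged commands, one per contiguous run, so fewer commands would need to be transmitted to the co-processor.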
INACTIVATING BASIC BLOCKS OF PROGRAM CODE TO PREVENT CODE REUSE ATTACKS
An approach is provided that, after receiving a request to execute a computer program, determines an active set of metadata that corresponds to the requested computer program and then loads basic blocks of the requested computer program into memory. One of the loaded basic blocks is a starting block of the requested computer program. The memory also stores basic blocks corresponding to some previously loaded computer programs. The approach also inactivates basic blocks that are currently stored in the memory, with the inactivated basic blocks being identified based on a comparison of the active set of metadata to the sets of metadata that correspond to the basic blocks of previously loaded computer programs. After inactivating some basic blocks, the approach executes the starting block of the requested computer program.
Family of lossy sparse load SIMD instructions
Systems, apparatuses, and methods for implementing a family of lossy sparse load single instruction, multiple data (SIMD) instructions are disclosed. A lossy sparse load unit (LSLU) loads a plurality of values from one or more input vector operands and determines how many non-zero values are included in the one or more input vector operands of a given instruction. If the one or more input vector operands have fewer than a threshold number of non-zero values, then the LSLU causes an instruction for processing the one or more input vector operands to be skipped. In this case, processing the one or more input vector operands with the instruction is deemed to be redundant. If the one or more input vector operands have greater than or equal to the threshold number of non-zero values, then the LSLU causes the instruction for processing the input vector operand(s) to be executed.
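The skip-or-execute decision in this abstract can be modeled in software with a short sketch. This is only an illustration under assumptions: the function name `lossy_sparse_execute`, the use of `None` to signal a skipped instruction, and the choice to count non-zero values across all operands together (the abstract does not say whether the count is per operand or combined) are all hypothetical.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def lossy_sparse_execute(
    operands: Sequence[Sequence[float]],
    threshold: int,
    op: Callable[..., T],
) -> Optional[T]:
    """Run `op` on the operands only if they hold enough non-zero values.

    Assumption: non-zero values are counted across all operands combined.
    Returns None when the instruction is skipped as redundant.
    """
    nonzero = sum(1 for vec in operands for x in vec if x != 0)
    if nonzero < threshold:
        # Too sparse: processing is deemed redundant and is skipped.
        return None
    # Dense enough: execute the processing instruction.
    return op(*operands)
```

The scheme is "lossy" because an operand with a few non-zero values below the threshold is treated as if it contributed nothing, trading a small amount of accuracy for skipped work.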
Computing device and method
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
System and method for securely debugging across multiple execution contexts
A system and method for a virtual processor base/virtual execution context arrangement. The disclosed arrangement utilizes chiplets comprising core logic and defined instruction sets. The chiplets are adapted to operate in conjunction with one or more active execution contexts to enable the execution of particular processes. In particular, the defined instruction sets include instructions for processor debugging. The system and method support the compartmentalization of such debugging instructions so as to provide enhanced processor and process security.
INSTRUCTION EXECUTION METHOD AND INSTRUCTION EXECUTION DEVICE
An instruction configuration and execution method includes the following steps. A target instruction is received through an instruction cache. The target instruction is decoded by an instruction translator. It is determined whether the target instruction has the authority to read or write the model specific register in an unprivileged state. It is determined whether the model specific register index of the specific instruction corresponds to a specific model specific register, so as to order the microprocessor to perform an instruction serialization operation.
Extended memory neuromorphic component
Systems, apparatuses, and methods related to an extended memory neuromorphic component for performing operations in memory are described. An example apparatus can include a plurality of computing devices. Each of the computing devices can include a processing unit and a memory array. The example apparatus can further include a communication subsystem coupled to at least one of the plurality of computing devices and to a neuromorphic component. At least one of the plurality of computing devices can receive a request from a host to perform an operation, receive an indication of data to be accessed in a memory device to perform the operation, and send an indication to the neuromorphic component to monitor the data to be accessed. The neuromorphic component can intercept data and determine that a portion of the data should be flagged.
Framework integration for instance-attachable accelerator
- Sudipta Sengupta
- Poorna Chand Srinivas Perumalla
- Jalaja Kurubarahalli
- Samuel Oshin
- Cory Pruce
- Jun Wu
- Eftiquar Shaikh
- Pragya Agarwal
- David Thomas
- Karan Kothari
- Daniel Evans
- Umang Wadhwa
- Mark Klunder
- Rahul Sharma
- Zdravko Pantic
- Dominic Rajeev Divakaruni
- Andrea Olgiati
- Leo Dirac
- Nafea Bshara
- Bratin Saha
- Matthew Wood
- Swaminathan Sivasubramanian
- Rajankumar Singh
Techniques for partitioning data flow operations between execution on a compute instance and an attached accelerator instance are described. A set of operations supported by the accelerator is obtained. A set of operations associated with the data flow is obtained. An operation in the set of operations associated with the data flow is identified based on the set of operations supported by the accelerator. The accelerator executes the identified operation.
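The partitioning idea can be sketched as a set-membership check. This is a simplified illustration, not the method claimed: the function name `partition_operations`, the use of plain strings as operation names, and the rule that every accelerator-supported operation is offloaded are assumptions made here.

```python
from typing import Iterable, List, Tuple


def partition_operations(
    flow_ops: Iterable[str],
    accelerator_ops: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Split data flow operations into (accelerator, compute instance) lists.

    Assumption: an operation is offloaded whenever the accelerator supports
    it. A real framework would likely also weigh transfer cost and graph
    structure.
    """
    supported = set(accelerator_ops)
    on_accelerator: List[str] = []
    on_instance: List[str] = []
    for op in flow_ops:
        # Preserve the original data flow order within each partition.
        (on_accelerator if op in supported else on_instance).append(op)
    return on_accelerator, on_instance
```

For a hypothetical data flow of `conv2d`, `print`, and `matmul` with an accelerator supporting `conv2d` and `matmul`, the sketch would route the two supported operations to the accelerator and leave `print` on the compute instance.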