Patent classifications
G06F9/30098
Fast perfect issue of dependent instructions in a distributed issue queue system
Embodiments are described for fast perfect issue of dependent instructions in a distributed issue queue system. Producer information of a producer instruction is inserted in a lookup entry in a lookup table, the lookup entry being allocated to a register. It is determined that the register corresponding to the lookup entry is a source for a dependent instruction. Responsive to storing the dependent instruction in an issue queue, the producer information is stored in a back-to-back entry of a back-to-back wakeup table, the back-to-back entry corresponding to the dependent instruction. The producer instruction is issued, which causes the producer information of the producer instruction to be sent to the back-to-back wakeup table. It is determined that there is a match between the producer information and the back-to-back entry for the dependent instruction, and the dependent instruction is caused to issue based on the match.
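The sequence in this abstract can be sketched as a small software model. This is only an illustrative sketch, not the claimed hardware: the class `IssueModel`, its method names, and the use of string tags as "producer information" are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class IssueModel:
    """Toy model of lookup-table-driven back-to-back wakeup (all names hypothetical)."""
    lookup: dict = field(default_factory=dict)     # register -> producer tag
    b2b_table: dict = field(default_factory=dict)  # dependent name -> awaited producer tag
    issued: list = field(default_factory=list)     # issue order, for inspection

    def dispatch_producer(self, tag, dest_reg):
        # Insert producer information in the lookup entry allocated to the register.
        self.lookup[dest_reg] = tag

    def dispatch_dependent(self, name, src_reg):
        # If the source register has a lookup entry, copy the producer
        # information into the back-to-back entry for this dependent.
        if src_reg in self.lookup:
            self.b2b_table[name] = self.lookup[src_reg]

    def issue_producer(self, tag):
        # Issuing the producer sends its tag to the wakeup table; every
        # dependent whose entry matches is woken and issued right after it.
        self.issued.append(tag)
        woken = [n for n, t in self.b2b_table.items() if t == tag]
        for n in woken:
            del self.b2b_table[n]
            self.issued.append(n)
        return woken


model = IssueModel()
model.dispatch_producer("P0", "r3")
model.dispatch_dependent("D0", "r3")   # r3 is produced by P0
model.dispatch_dependent("D1", "r7")   # r7 has no tracked producer
print(model.issue_producer("P0"))      # ['D0']
```

The model keeps the two tables separate, as the abstract does: the lookup table is indexed by register, while the wakeup table is indexed by dependent instruction.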
Vector processing unit
A vector processing unit is described that includes processor units, each of which includes multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.
System log collection method
The present invention provides a method for collecting system logs, applied to an intelligent device running an Android system. A daemon process for log collection is provided and is started when the Android system of the device starts; an application process for log processing is provided; and an external storage device is provided for connection to the intelligent device. The method comprises the following steps: the daemon process collects logs of the application framework layer and of the Linux kernel, and saves the logs in a first storage path of the Android system; the application process creates a second storage path in the external storage device after identifying the connected external storage device; and the application process obtains the logs from the first storage path and saves them in the second storage path.
System and method for formal fault propagation analysis
A system and method are provided for formulating a sequential equivalency problem for fault (non)propagation with minimal circuit logic duplication by leveraging information about the location and nature of a fault. The system and method further apply formal checking to safety diagnoses and efficiently model simple and complex transient faults.
Input/output (I/O) memory management unit (IOMMU) multi-core interference mitigation
A multicore processing environment (MCPE) is disclosed. In embodiments, the MCPE includes multiple processing cores hosting multiple user applications configured for simultaneous execution. The cores and user applications share system resources including main memory and input/output (I/O) domains, each I/O domain including multiple I/O devices capable of requesting inbound access to main memory through an I/O memory management unit (IOMMU). For example, the IOMMU cache associates a unique cache tag with each I/O device based on device identifiers or settings determined by the system registers, caching the configuration data for each I/O device under the appropriate cache tag. When an I/O device requests main memory access, the IOMMU cache refers to the appropriate configuration data under the corresponding unique cache tag. This prevents contention in the IOMMU cache caused by one device evicting the cache entry of another, minimizing interference channels by reducing the need for main memory access.
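The effect of per-device cache tags can be illustrated with a minimal software sketch. It assumes a one-to-one mapping from device identifier to tag and a hypothetical `load_from_memory` callback standing in for the slow main-memory path; neither detail comes from the abstract.

```python
class TaggedConfigCache:
    """Toy configuration cache with one tag per I/O device (all names hypothetical)."""

    def __init__(self, load_from_memory):
        self._load = load_from_memory  # slow path: device_id -> configuration data
        self._entries = {}             # cache tag -> cached configuration data
        self.memory_reads = 0          # counts slow-path accesses

    def tag_for(self, device_id):
        # Assumption: the tag is derived one-to-one from the device identifier.
        return f"tag-{device_id}"

    def lookup(self, device_id):
        tag = self.tag_for(device_id)
        if tag not in self._entries:
            # Miss: fetch the configuration from main memory exactly once.
            self.memory_reads += 1
            self._entries[tag] = self._load(device_id)
        return self._entries[tag]


configs = {"nic0": {"domain": 1}, "dma1": {"domain": 2}}
cache = TaggedConfigCache(configs.__getitem__)
for dev in ["nic0", "dma1", "nic0", "dma1"]:
    cache.lookup(dev)
print(cache.memory_reads)  # 2
```

Because each device owns its own tag, interleaved requests from two devices never evict each other, so the slow path is taken only on each device's first request.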
Pipeline flattener with conditional triggers
A semiconductor device comprising a processor having a pipelined architecture and a pipeline flattener and a method for operating a pipeline flattener in a semiconductor device are provided. The processor comprises a pipeline having a plurality of pipeline stages and a plurality of pipeline registers that are coupled between the pipeline stages. The pipeline flattener comprises a plurality of trigger registers for storing a trigger, wherein the trigger registers are coupled between the pipeline stages.
Thread commencement and completion using work descriptor packets in a system having a self-scheduling processor and a hybrid threading fabric
Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
System and method for implementing trusted execution environment on PCI device
A system and method for providing trusted execution environments use a peripheral component interconnect (PCI) device of a computer system to receive and process commands to create and manage a trusted execution environment for a software process running in the computer system. The trusted execution environment created in the PCI device is then used to execute operations for the software process.
Operator registration method and apparatus for deep learning framework, device and storage medium
The present disclosure provides an operator registration method and apparatus for a deep learning framework, a device and a storage medium, and relates to the field of computer technologies, specifically to the field of artificial intelligence such as deep learning. The operator registration method for a deep learning framework includes: receiving registration information provided by a user for registering operators with the deep learning framework, the registration information including a custom calculation function written in a manner independent of the deep learning framework; building operator meta-information in the deep learning framework based on the registration information; and constructing a to-be-registered operator within the deep learning framework based on the operator meta-information, and registering the to-be-registered operator in a global operator table within the deep learning framework. The present disclosure can simplify the operator registration process.
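The three steps of the method (receive registration information, build meta-information, register in a global table) can be sketched as follows. This is a minimal sketch, not the API of any real deep learning framework: `GLOBAL_OPERATOR_TABLE`, `build_meta`, `Operator`, and `register_operator` are hypothetical names, and the meta-information is reduced to a name, input names, and the compute function.

```python
import inspect

GLOBAL_OPERATOR_TABLE = {}  # hypothetical global operator table: name -> Operator


def custom_relu(xs):
    # Custom calculation function: plain Python, no framework types involved.
    return [max(0.0, x) for x in xs]


def build_meta(registration_info):
    # Build operator meta-information from the user's registration information.
    fn = registration_info["compute"]
    return {
        "name": registration_info["name"],
        "inputs": list(inspect.signature(fn).parameters),
        "compute": fn,
    }


class Operator:
    """Framework-side wrapper constructed from the meta-information."""

    def __init__(self, meta):
        self.meta = meta

    def __call__(self, *args):
        return self.meta["compute"](*args)


def register_operator(registration_info):
    meta = build_meta(registration_info)
    op = Operator(meta)                      # the to-be-registered operator
    GLOBAL_OPERATOR_TABLE[meta["name"]] = op
    return op


register_operator({"name": "relu", "compute": custom_relu})
print(GLOBAL_OPERATOR_TABLE["relu"]([-1.0, 2.0]))  # [0.0, 2.0]
```

The user-facing part is only the dictionary passed to `register_operator`; everything framework-specific is derived from it, which is the simplification the abstract describes.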
Combining load or store instructions
Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.
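The combining condition can be sketched as a small check over two memory accesses. This is an illustration under stated assumptions, not the patented pipeline logic: the `Access` tuple, the `try_combine` function, the default 16-byte bus width and 64-byte cache line, and the rule that stores merge only when exactly adjacent are all choices made for this sketch.

```python
from typing import NamedTuple, Optional


class Access(NamedTuple):
    op: str      # "load" or "store"
    base: str    # base register name
    offset: int  # byte offset from the base register
    size: int    # number of bytes accessed


def try_combine(a: Access, b: Access, base_addr: int,
                bus_width: int = 16, line_size: int = 64) -> Optional[Access]:
    """Return one equivalent wider access, or None if the pair cannot be merged."""
    if a.op != b.op or a.base != b.base:
        return None  # must be the same kind of access off a common base register
    start = min(a.offset, b.offset)
    end = max(a.offset + a.size, b.offset + b.size)
    span = end - start
    if span > bus_width:
        return None  # merged access would not fit on the bus
    # Assumption: a merged store must not write gap bytes, so only
    # exactly adjacent stores are merged in this sketch.
    if a.op == "store" and span != a.size + b.size:
        return None
    first_byte = base_addr + start
    last_byte = base_addr + end - 1
    if first_byte // line_size != last_byte // line_size:
        return None  # merged access would straddle two cache lines
    return Access(a.op, a.base, start, span)


lo = Access("load", "r1", 0, 4)
hi = Access("load", "r1", 4, 4)
print(try_combine(lo, hi, base_addr=0x1000))  # one 8-byte load at offset 0
print(try_combine(lo, hi, base_addr=0x103C))  # None: crosses a cache line
```

In this sketch the base address is passed in explicitly; a real pipeline would have to establish the cache-line condition by other means, which the abstract does not detail.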