Patent classifications
G06F9/30079
Branch stall elimination in pipelined microprocessors
A system can include a microprocessor having a prefetch queue including a plurality of slots configured to store program counter values (PCVs) and instructions, a pipeline configured to receive instructions from the prefetch queue, and a select circuit coupled to the prefetch queue. The select circuit may selectively freeze a first slot of the plurality of slots and selectively output a frozen PCV and a frozen instruction from the first slot while frozen. The microprocessor can include write logic coupled to the prefetch queue and a comparator circuit coupled to the prefetch queue and the select circuit. The write logic may load data into unfrozen slots of the prefetch queue. The comparator circuit may compare a target PCV with the frozen PCV to determine a match. The select circuit indicates, to the pipeline, whether the frozen instruction is valid based on the comparing.
Methods and systems for processing vehicle sensor data across multiple digital signal processing cores virtually arranged in segments based on a type of sensor
Example embodiments relate to scheduling and processing sensor data across multiple digital signal processing (DSP) cores. A system may include DSP cores that are virtually arranged into a first segment and a second segment, each with DSP cores arranged in a linear order. The system may process sensor data using the DSP cores and a processing pipeline. DSP cores from the first segment are configured to initiate processing a first stage sequentially based on the linear order until all DSP cores are processing portions in parallel, collate outputs to produce a first stage output, and provide a first signal to the second segment based on producing the first stage output. Responsive to receiving the first signal, DSP cores from the second segment are configured to initiate processing the second stage sequentially based on the linear order until all DSP cores are processing portions in parallel, collate outputs to produce a second stage output, and provide a second signal based on producing the second stage output. The system may provide instructions based on a combination of the first stage output and the second stage output.
CALCULATION METHOD AND RELATED PRODUCT
The present disclosure provides a computing method that is applied to a computing device. The computing device includes: a memory, a register unit, and a matrix computing unit. The method includes the following steps: controlling, by the computing device, the matrix computing unit to obtain a first operation instruction, where the first operation instruction includes a matrix reading instruction for a matrix required for executing the instruction; controlling, by the computing device, an operating unit to send a reading command to the memory according to the matrix reading instruction; and controlling, by the computing device, the operating unit to read a matrix corresponding to the matrix reading instruction in a batch reading manner, and executing the first operation instruction on the matrix. The technical solutions in the present disclosure have the advantages of fast computing speed and high efficiency.
Data science platform
Disclosed herein is a data science platform that is built with a specific focus on monitoring and analyzing the operation of industrial assets, such as trucking assets, rail assets, construction assets, mining assets, wind assets, thermal assets, oil-and-gas assets, and manufacturing assets, among other possibilities. The disclosed data science platform is configured to carry out operations including (i) ingesting asset-related data from various different data sources and storing it for downstream use, (ii) transforming the ingested asset-related data into a desired formatting structure and then storing it for downstream use, (iii) evaluating the asset-related data to derive insights about an asset's operation that may be of interest to a platform user, which may involve data science models that have been specifically designed to analyze asset-related data in order to gain a deeper understanding of an asset's operation, and (iv) presenting derived insights and other asset-related data to platform users.
INSTRUCTION BASED CONTROL OF MEMORY ATTRIBUTES
Embodiments described herein provide techniques to facilitate instruction-based control of memory attributes. One embodiment provides a graphics processor comprising a processing resource, a memory device, a cache coupled with the processing resources and the memory, and circuitry to process a memory access message received from the processing resource. The memory access message enables access to data of the memory device. To process the memory access message, the circuitry is configured to determine one or more cache attributes that indicate whether the data should be read from or stored the cache. The cache attributes may be provided by the memory access message or stored in state data associated with the data to be accessed by the access message.
PIPELINE ARBITRATION
A method includes receiving, by a first stage in a pipeline, a first transaction from a previous stage in pipeline; in response to first transaction comprising a high priority transaction, processing high priority transaction by sending high priority transaction to a buffer; receiving a second transaction from previous stage; in response to second transaction comprising a low priority transaction, processing low priority transaction by monitoring a full signal from buffer while sending low priority transaction to buffer; in response to full signal asserted and no high priority transaction being available from previous stage, pausing processing of low priority transaction; in response to full signal asserted and a high priority transaction being available from previous stage, stopping processing of low priority transaction and processing high priority transaction; and in response to full signal being de-asserted, processing low priority transaction by sending low priority transaction to buffer.
Cache size change
A method includes determining, by a level one (L1) controller, to change a size of a L1 main cache; servicing, by the L1 controller, pending read requests and pending write requests from a central processing unit (CPU) core; stalling, by the L1 controller, new read requests and new write requests from the CPU core; writing back and invalidating, by the L1 controller, the L1 main cache. The method also includes receiving, by a level two (L2) controller, an indication that the L1 main cache has been invalidated and, in response, flushing a pipeline of the L2 controller; in response to the pipeline being flushed, stalling, by the L2 controller, requests received from any master; reinitializing, by the L2 controller, a shadow L1 main cache. Reinitializing includes clearing previous contents of the shadow L1 main cache and changing the size of the shadow L1 main cache.
NPU IMPLEMENTED FOR FUSION-ARTIFICIAL NEURAL NETWORK TO PROCESS HETEROGENEOUS DATA PROVIDED BY HETEROGENEOUS SENSORS
A neural processing unit (NPU) includes a controller including a scheduler, the controller configured to receive from a compiler a machine code of an artificial neural network (ANN) including a fusion ANN, the machine code including data locality information of the fusion ANN, and receive heterogeneous sensor data from a plurality of sensors corresponding to the fusion ANN; at least one processing element configured to perform fusion operations of the fusion ANN including a convolution operation and at least one special function operation; a special function unit (SFU) configured to perform a special function operation of the fusion ANN; and an on-chip memory configured to store operation data of the fusion ANN, wherein the schedular is configured to control the at least one processing element and the on-chip memory such that all operations of the fusion ANN are processed in a predetermined sequence according to the data locality information.
BLOCKCHAIN MACHINE COMPUTE ACCELERATION ENGINE WITH OUT-OF-ORDER SUPPORT
Embodiments herein describe a hardware accelerator for a blockchain node. The hardware accelerator is used to perform a validation operation to validate one or more transactions before those transactions are committed to a ledger of a blockchain. The embodiments herein describe an out-of-order validation scheme where a collector is used to collect validated transactions out of order. Thus, if a validation pipeline has finished validating a later transaction before another validation pipeline has finished validating an earlier transaction, the pipeline can nonetheless send its results to the collector and retrieve another transaction from a scheduler. In this manner, the downtime for the validation pipelines is reduced or eliminated.
Shadow caches for level 2 cache controller
An apparatus including a CPU core and a L1 cache subsystem coupled to the CPU core. The L1 cache subsystem includes a L1 main cache, a L1 victim cache, and a L1 controller. The apparatus includes a L2 cache subsystem coupled to the L1 cache subsystem. The L2 cache subsystem includes a L2 main cache, a shadow L1 main cache, a shadow L1 victim cache, and a L2 controller. The L2 controller receives an indication from the L1 controller that a cache line A is being relocated from the L1 main cache to the L1 victim cache; in response to the indication, update the shadow L1 main cache to reflect that the cache line A is no longer located in the L1 main cache; and in response to the indication, update the shadow L1 victim cache to reflect that the cache line A is located in the L1 victim cache.