G06F9/3879

VOICE PROCESSING SYSTEM AND METHOD, ELECTRONIC DEVICE AND READABLE STORAGE MEDIUM

The present application discloses a voice processing system and method, an electronic device and a readable storage medium, relating to the field of voice processing technologies. The voice processing system includes a neural-network processing unit (NPU) and a RISC-V processor, wherein the RISC-V processor includes predefined NPU instructions and is configured to send the NPU instructions to the NPU to cause the NPU to perform the corresponding neural network computation. The NPU includes a memory unit and a computing unit; the memory unit includes a plurality of storage groups, and the computing unit is configured to execute one of main computation, special computation, auxiliary computation and complex instruction set computing (CISC) control according to the received NPU instructions.
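As a rough illustration of the dispatch described above, the C++ sketch below models an NPU computing unit decoding an instruction word into one of the four computation classes. The two-bit field layout and operand encoding are invented for illustration; the abstract does not specify the instruction format.

    #include <cstdint>
    #include <cstdio>

    // Hypothetical NPU instruction layout (invented for illustration):
    // bits [1:0] select the computation class, the remaining bits carry an operand.
    enum class NpuOp : uint32_t { Main = 0, Special = 1, Auxiliary = 2, CiscControl = 3 };

    struct NpuInstruction {
        uint32_t word;
        NpuOp op() const { return static_cast<NpuOp>(word & 0x3u); }
        uint32_t operand() const { return word >> 2; }
    };

    // The computing unit dispatches on the decoded class.
    void execute(const NpuInstruction& insn) {
        switch (insn.op()) {
            case NpuOp::Main:        std::printf("main compute, operand=%u\n", insn.operand()); break;
            case NpuOp::Special:     std::printf("special compute, operand=%u\n", insn.operand()); break;
            case NpuOp::Auxiliary:   std::printf("auxiliary compute, operand=%u\n", insn.operand()); break;
            case NpuOp::CiscControl: std::printf("CISC control, operand=%u\n", insn.operand()); break;
        }
    }

    int main() {
        // The RISC-V side "sends" instructions; here modeled as plain function calls.
        execute({0x00000010u | 0});  // decodes to main computation
        execute({0x00000020u | 3});  // decodes to CISC control
    }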

SUPPORTING INSTRUCTION SET ARCHITECTURE COMPONENTS ACROSS RELEASES
20220100527 · 2022-03-31

Various embodiments of the present technology generally relate to methods and systems for providing a flexible, updatable, and backward compatible programmable logic controller (“PLC”) and instruction set library. The instruction set library in the PLC can be updated without downtime of the PLC or the machines controlled by the PLC. The instruction set library is decoupled from the PLC firmware and bound to it via an API, so that instructions in the executable code are bound to the firmware at runtime and updates to the instruction set library can happen between scans of the executable without requiring the firmware to be updated. Further, the instruction set library may be partitioned to limit updates, and to limit the portion of the complete instruction set library stored on the PLC to only those instructions the PLC actually uses.
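A minimal C++ sketch of the late-binding idea, assuming a function-pointer table as the library representation and an atomic pointer swap between scans; the names and structure are invented, not taken from the patent.

    #include <atomic>
    #include <cstdio>
    #include <map>
    #include <string>

    // Hypothetical instruction entry: name -> implementation.
    using InstructionFn = void (*)();
    using InstructionTable = std::map<std::string, InstructionFn>;

    // The "firmware" holds only a pointer to the current table; the table
    // itself plays the role of the decoupled instruction set library.
    std::atomic<const InstructionTable*> g_library{nullptr};

    void add_v1() { std::printf("ADD v1\n"); }
    void add_v2() { std::printf("ADD v2 (updated between scans)\n"); }

    // One scan cycle: resolve instructions through the API, never statically.
    void run_scan() {
        const InstructionTable* lib = g_library.load();
        lib->at("ADD")();  // late binding at scan time
    }

    int main() {
        static const InstructionTable v1{{"ADD", &add_v1}};
        static const InstructionTable v2{{"ADD", &add_v2}};
        g_library.store(&v1);
        run_scan();
        g_library.store(&v2);  // library update lands between scans, no firmware change
        run_scan();
    }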

Coprocessor Context Priority

A system may include a plurality of processors and a coprocessor. A plurality of coprocessor context priority registers corresponding to a plurality of contexts supported by the coprocessor may be included. The plurality of processors may use the plurality of contexts, and may program the coprocessor context priority register corresponding to a context with a value specifying a priority of the context relative to other contexts. An arbiter may arbitrate among instructions issued by the plurality of processors based on the priorities in the plurality of coprocessor context priority registers. In one embodiment, real-time threads may be assigned higher priorities than bulk processing tasks, improving bandwidth allocated to the real-time threads as compared to the bulk tasks.
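The arbitration scheme might look like the following C++ sketch, where each context owns an invented priority register and a pending-instruction queue, and the arbiter issues from the highest-priority non-empty context.

    #include <array>
    #include <cstdint>
    #include <cstdio>
    #include <deque>
    #include <optional>

    constexpr int kContexts = 4;

    // One priority register per coprocessor context (values are illustrative).
    std::array<uint8_t, kContexts> priority_reg{};
    std::array<std::deque<uint32_t>, kContexts> pending;  // issued instructions per context

    // Arbiter: pick an instruction from the highest-priority non-empty context.
    std::optional<uint32_t> arbitrate() {
        int best = -1;
        for (int c = 0; c < kContexts; ++c)
            if (!pending[c].empty() && (best < 0 || priority_reg[c] > priority_reg[best]))
                best = c;
        if (best < 0) return std::nullopt;
        uint32_t insn = pending[best].front();
        pending[best].pop_front();
        return insn;
    }

    int main() {
        priority_reg = {1, 7, 3, 7};  // context 1 and 3: real-time; others: bulk
        pending[0].push_back(0xA0);   // bulk work
        pending[1].push_back(0xB0);   // real-time work, issued first
        while (auto insn = arbitrate()) std::printf("issue 0x%X\n", *insn);
    }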

VIRTUAL MACHINE FOR VIRTUALIZING GRAPHICS FUNCTIONS

A host computer for emulating a target system includes a host memory, a CPU, and a host GPU. The host memory is configured to store a library of graphics functions and a VM. The VM includes a section of emulated memory storing target code configured to execute on the target system. The CPU is configured to execute the VM to emulate the target system. The VM is configured to execute the target code and intercept a graphics function call in the target code. The VM is further configured to redirect the graphics function call to a corresponding graphics function in the library of graphics functions stored in the host memory. The host GPU is configured to execute the corresponding graphics function to determine at least one feature configured to be rendered on a display coupled to the host GPU.
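A simplified C++ model of the interception path, with hypothetical target addresses and host-side stand-ins for the GPU-backed graphics functions.

    #include <cstdint>
    #include <cstdio>
    #include <unordered_map>

    // Hypothetical addresses of graphics entry points in the emulated target,
    // mapped to host-side implementations backed by the host GPU.
    using HostGraphicsFn = void (*)(uint32_t arg);

    void host_draw_triangle(uint32_t arg) { std::printf("host GPU: draw_triangle(%u)\n", arg); }
    void host_clear(uint32_t arg)         { std::printf("host GPU: clear(0x%X)\n", arg); }

    std::unordered_map<uint32_t, HostGraphicsFn> g_intercepts = {
        {0x80001000u, &host_draw_triangle},
        {0x80002000u, &host_clear},
    };

    // Called by the VM whenever target code branches to a function address.
    // Returns true if the call was intercepted and serviced on the host.
    bool maybe_intercept_call(uint32_t target_pc, uint32_t arg) {
        auto it = g_intercepts.find(target_pc);
        if (it == g_intercepts.end()) return false;  // let the VM emulate it
        it->second(arg);                             // redirect to the host library
        return true;
    }

    int main() {
        maybe_intercept_call(0x80001000u, 42);  // intercepted: runs on the host GPU
        maybe_intercept_call(0x80003000u, 0);   // unknown address: falls back to emulation
    }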

FREQUENCY SCALING FOR PER-CORE ACCELERATOR ASSIGNMENTS

Methods for frequency scaling for per-core accelerator assignments and associated apparatus. A processor includes a CPU (central processing unit) having multiple cores that can be selectively configured to support frequency scaling and instruction extensions. Under this approach, some cores can be configured to support a selective set of AVX instructions (such as AVX3/5G-ISA instructions) and/or AMX instructions, while other cores are configured to not support these AVX/AMX instructions. In one aspect, the selective AVX/AMX instructions are implemented in one or more ISA extension units that are separate from the main processor core (or otherwise comprise a separate block of circuitry in a processor core) and that can be selectively enabled or disabled. This enables cores having the separate unit(s) disabled to consume less power and/or operate at higher frequencies, while the selective AVX/AMX instructions remain supported on other cores. These capabilities enhance performance and provide flexibility to handle a variety of applications requiring advanced AVX/AMX instructions to support accelerated workloads.
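The scheduling trade-off can be sketched in C++ as below; the per-core frequency values and the greedy core-selection policy are invented for illustration.

    #include <cstdio>
    #include <vector>

    // Illustrative per-core configuration: cores with the ISA extension unit
    // disabled can clock higher; AVX/AMX work must go to cores that keep it on.
    struct Core {
        int  id;
        bool avx_amx_enabled;
        int  max_freq_mhz;  // invented numbers, just to show the trade-off
    };

    struct Task { const char* name; bool needs_avx_amx; };

    // Greedy policy: highest-frequency core that satisfies the task's ISA needs.
    const Core* pick_core(const std::vector<Core>& cores, const Task& t) {
        const Core* best = nullptr;
        for (const Core& c : cores) {
            if (t.needs_avx_amx && !c.avx_amx_enabled) continue;
            if (!best || c.max_freq_mhz > best->max_freq_mhz) best = &c;
        }
        return best;
    }

    int main() {
        std::vector<Core> cores = {
            {0, true, 3200},  {1, true, 3200},   // extension unit enabled, lower fmax
            {2, false, 3800}, {3, false, 3800},  // unit disabled, higher fmax
        };
        for (Task t : {Task{"5G signal chain", true}, Task{"web server", false}}) {
            const Core* c = pick_core(cores, t);
            std::printf("%s -> core %d @ %d MHz\n", t.name, c->id, c->max_freq_mhz);
        }
    }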

Programmable Fabric-Based Instruction Set Architecture for a Processor

A semiconductor device may include a programmable fabric and a processor. The processor may utilize one or more extension architectures. At least one of these extension architectures may be used to integrate and/or embed the programmable fabric into the processor as part of the processor. Specifically, a buffer of the extension architecture may be used to load data into, and store data from, the programmable fabric.
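A toy C++ model of the buffer-mediated hand-off, with the fabric's configured function stood in by a fixed arithmetic operation; the buffer size and layout are invented.

    #include <array>
    #include <cstdint>
    #include <cstdio>

    // Hypothetical shared buffer between the processor pipeline and the
    // embedded programmable fabric: the fabric consumes inputs and produces
    // results here.
    struct FabricBuffer {
        std::array<uint32_t, 8> in{};
        std::array<uint32_t, 8> out{};
    };

    // Stand-in for the fabric's configured function (here: multiply by 3).
    void fabric_execute(FabricBuffer& buf) {
        for (size_t i = 0; i < buf.in.size(); ++i) buf.out[i] = buf.in[i] * 3;
    }

    int main() {
        FabricBuffer buf;
        for (uint32_t i = 0; i < buf.in.size(); ++i) buf.in[i] = i;  // load data into the fabric
        fabric_execute(buf);                                         // fabric runs its function
        for (uint32_t v : buf.out) std::printf("%u ", v);            // store data from the fabric
        std::printf("\n");
    }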

ACCELERATING NETWORK SECURITY MONITORING

Generally discussed herein are systems, devices, and methods for network security monitoring (NSM). A hardware queue manager (HQM) may include an input interface to receive first data from at least a first worker thread, queue duplication circuitry to generate a copy of at least a portion of the first data to create first copied data, and an output interface to (a) provide the first copied data to a second worker thread, and/or (b) provide at least a portion of the first data to a third worker thread.
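In C++, the duplication path might be modeled as below; the queue types and the worker-thread roles follow the abstract loosely and are otherwise invented.

    #include <cstdio>
    #include <deque>
    #include <string>

    // Minimal model of the hardware queue manager's duplication path:
    // input from the first worker thread is copied, the copy goes to a
    // monitoring worker while the original continues down the pipeline.
    struct HardwareQueueManager {
        std::deque<std::string> to_monitor;   // second worker thread (copied data)
        std::deque<std::string> to_pipeline;  // third worker thread (original data)

        void enqueue(const std::string& data) {
            to_monitor.push_back(data);   // duplicated portion for NSM inspection
            to_pipeline.push_back(data);  // original data continues processing
        }
    };

    int main() {
        HardwareQueueManager hqm;
        hqm.enqueue("packet-1");
        hqm.enqueue("packet-2");
        std::printf("monitor sees %zu items, pipeline sees %zu items\n",
                    hqm.to_monitor.size(), hqm.to_pipeline.size());
    }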

ARCHITECTURE TO SUPPORT TANH AND SIGMOID OPERATIONS FOR INFERENCE ACCELERATION IN MACHINE LEARNING
20210248497 · 2021-08-12

A processing unit to support inference acceleration for machine learning (ML) comprises an inline post-processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation. The inline post-processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block, instead of streaming the data from an on-chip memory (OCM); to perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables; and to stream the post-processing result of the per-element tanh and/or sigmoid operation back to the OCM after the operation is complete.
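The lookup-table approach can be illustrated with the following C++ sketch; the table size, input range, and nearest-entry indexing are assumptions, since the abstract does not specify the table format or interpolation scheme.

    #include <array>
    #include <cmath>
    #include <cstdio>

    // Illustrative 1024-entry tanh lookup table over [-4, 4]; the real
    // hardware tables are not described at this level of detail.
    constexpr int kEntries = 1024;
    constexpr float kLo = -4.0f, kHi = 4.0f;

    std::array<float, kEntries> build_tanh_lut() {
        std::array<float, kEntries> lut{};
        for (int i = 0; i < kEntries; ++i) {
            float x = kLo + (kHi - kLo) * i / (kEntries - 1);
            lut[i] = std::tanh(x);
        }
        return lut;
    }

    // Per-element application, as done on register outputs of the processing block.
    float lut_tanh(const std::array<float, kEntries>& lut, float x) {
        if (x <= kLo) return -1.0f;  // saturate outside the table range
        if (x >= kHi) return 1.0f;
        int idx = static_cast<int>((x - kLo) / (kHi - kLo) * (kEntries - 1));
        return lut[idx];
    }

    int main() {
        auto lut = build_tanh_lut();
        for (float x : {-2.0f, 0.0f, 1.5f})
            std::printf("tanh(%.1f) ~ %.4f (exact %.4f)\n", x, lut_tanh(lut, x), std::tanh(x));
    }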

ARCHITECTURE TO SUPPORT COLOR SCHEME-BASED SYNCHRONIZATION FOR MACHINE LEARNING
20210240521 · 2021-08-05

A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles, each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs, and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, waiting for the current task at each processing tile to finish before starting a new one.
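The finish-before-start synchronization is essentially a barrier across tiles, sketched below with C++20 std::barrier and threads standing in for processing tiles; the task names are invented.

    #include <barrier>  // C++20
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Each "tile" runs its task instructions, then all tiles synchronize at a
    // barrier before the next set of task instructions begins.
    int main() {
        constexpr int kTiles = 4;
        std::barrier sync(kTiles);
        std::vector<std::thread> tiles;
        for (int t = 0; t < kTiles; ++t) {
            tiles.emplace_back([&, t] {
                std::printf("tile %d: task A done\n", t);
                sync.arrive_and_wait();  // wait for every tile to finish task A
                std::printf("tile %d: task B starts\n", t);
            });
        }
        for (auto& th : tiles) th.join();
    }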

Hardware accelerator with locally stored macros

Provided are techniques for a hardware accelerator with locally stored macros. A plurality of macros are stored in a lookup memory of a hardware accelerator. In response to receiving an operation code, the operation code is mapped to one or more macros of the plurality of macros, wherein each of the one or more macros includes micro-instructions. Each of the micro-instructions of the one or more macros is routed to a function block of a plurality of function blocks. Each of the micro-instructions is processed with the plurality of function blocks. Data from the processing of each of the micro-instructions is stored in an accelerator memory of the hardware accelerator. The data is moved from the accelerator memory to a host memory.
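A compact C++ sketch of that expansion path: an invented lookup table maps operation codes to macros of micro-instructions, which are routed to stand-in function blocks whose results accumulate in a modeled accelerator memory before moving to host memory.

    #include <cstdio>
    #include <map>
    #include <vector>

    // Hypothetical micro-instruction: target function block plus an immediate.
    struct MicroInstr { int function_block; int operand; };
    using Macro = std::vector<MicroInstr>;

    // "Lookup memory": operation code -> locally stored macro.
    const std::map<int, Macro> g_macro_lut = {
        {0x01, {{0, 10}, {1, 20}}},       // opcode 0x01 expands to two micro-instructions
        {0x02, {{2, 5}, {0, 7}, {1, 9}}}  // opcode 0x02 expands to three
    };

    // Results of each micro-instruction land in the accelerator memory.
    std::vector<int> accelerator_memory;

    void run_function_block(const MicroInstr& mi) {
        std::printf("block %d processes operand %d\n", mi.function_block, mi.operand);
        accelerator_memory.push_back(mi.operand * 2);  // stand-in computation
    }

    int main() {
        for (const MicroInstr& mi : g_macro_lut.at(0x01))  // map opcode to its macro
            run_function_block(mi);                        // route micro-instructions to blocks
        std::vector<int> host_memory = accelerator_memory; // move data to host memory
        std::printf("host received %zu results\n", host_memory.size());
    }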