IPIQ

G06F9/3895

PARALLEL PROCESSOR, ADDRESS GENERATOR OF PARALLEL PROCESSOR, AND ELECTRONIC DEVICE INCLUDING PARALLEL PROCESSOR

20220164192 · 2022-05-26 ·

Disclosed is a parallel processor. The parallel processor includes a processing element array including a plurality of processing elements arranged in rows and columns, a row memory group including row memories corresponding to rows of the processing elements, a column memory group including column memories corresponding to columns of the processing elements, and a controller to generate a first address and a second address, to send the first address to the row memory group, and to send the second address to the column memory group. The controller supports convolution operations having mutually different forms, by changing a scheme of generating the first address.

Address interleaving for machine learning

11734608 · 2023-08-22 ·

Marvell Asia Pte Ltd

A system includes a memory, an inference engine, and a master. The memory is configured to store data. The inference engine is configured to receive the data and to perform one or more computation tasks of a machine learning (ML) operation associated with the data. The master is configured to interleave an address associated with a memory access transaction for accessing the memory. The master is further configured to provide content associated with the access to the inference engine.
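The abstract does not say how the address is interleaved. As a rough illustration only, the sketch below assumes a common low-order scheme in which consecutive cache-line-sized blocks rotate across memory banks. The function name `interleave_address` and the parameters `num_banks` and `line_bytes` are hypothetical and do not come from the patent.

```python
def interleave_address(addr: int, num_banks: int = 4, line_bytes: int = 64) -> tuple[int, int]:
    """Map a flat byte address to (bank, offset inside that bank).

    Assumed scheme (not taken from the patent): consecutive blocks of
    `line_bytes` bytes are assigned to banks in round-robin order.
    """
    line = addr // line_bytes          # index of the block holding this address
    bank = line % num_banks            # round-robin bank selection
    # Blocks already placed in this bank, plus the byte position in the block.
    offset = (line // num_banks) * line_bytes + addr % line_bytes
    return bank, offset


if __name__ == "__main__":
    for a in (0, 64, 70, 256):
        print(a, interleave_address(a))
```

Under these assumptions, sequential accesses land on different banks, so several transactions can be in flight at once.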

Programmable coarse grained and sparse matrix compute hardware with advanced scheduling

11727527 · 2023-08-15 ·

Intel Corporation

One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex compute operation.

Neural network accelerator with parameters resident on chip

11727259 · 2023-08-15 ·

Google LLC

One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations, the second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs computations associated with at least one element of a data array, the one or more computations performed by the MAC operator.

POINT TO POINT CONNECTED PROCESSING ELEMENTS WITH DATA JOINER COMPONENTS

20230252263 · 2023-08-10 ·

A system comprises a first processing element, a second processing element, a point-to-point connection between the first processing element and the second processing element, and a communication bus connecting together at least the first processing element and the second processing element. The first processing element includes a first matrix computing unit and the second processing element includes a second matrix computing unit. The point-to-point connection is configured to provide at least a result of the first processing element to a data joiner component of the second processing element configured to join at least the provided result of the first processing element with a result of the second matrix computing unit.

MECHANISM FOR REDUCING COHERENCE DIRECTORY CONTROLLER OVERHEAD FOR NEAR-MEMORY COMPUTE ELEMENTS

20230244496 · 2023-08-03 ·

A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.

MEMORY-BASED DISTRIBUTED PROCESSOR ARCHITECTURE

20210365334 · 2021-11-25 ·

NeuroBlade, Ltd.

Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.

METHODS AND SYSTEMS FOR MULTI-DIMENSIONAL AGGREGATION USING COMPOSITION

20210342367 · 2021-11-04 ·

Multi-dimensional aggregation using user interface workflow composition is described. Data for a computer implemented process is in a set of related data objects in a data store with each object in the set of related data objects representing an entity modelled in the process. A number of levels for a multi-dimensional aggregation associated with a request is determined where each level of the multi-dimensional aggregation represents a different dimension of data values to be aggregated. Data is aggregated at the levels of aggregation based on the relationships between parent objects and children objects. The data for a final level of aggregation is output to a user interface. The final result includes multiple dimensions of data.

Neural network unit that manages power consumption based on memory accesses per period

11216720 · 2022-01-04 ·

Shanghai Zhaoxin Semiconductor Co., Ltd.

G. Glenn Henry

An apparatus includes a first memory, processing units that access the first memory, a counter that, for each period of a sequence of periods, holds an indication of accesses to the first memory during the period, and control logic that, for each period of the sequence of periods, monitors the indication to determine whether it exceeds a threshold and, if so, stalls the processing units from accessing the first memory for a remaining portion of the period.
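The counter-and-stall behavior in this abstract can be sketched in software. The model below is a simplification with hypothetical names (`AccessThrottle`, `start_period`, `request_access`). It assumes the counter increments once per granted access and that the stall begins only after the count exceeds the threshold, which is one literal reading of the abstract.

```python
class AccessThrottle:
    """Toy model of per-period memory access throttling (names are illustrative)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.count = 0        # accesses observed in the current period
        self.stalled = False  # True once the count has exceeded the threshold

    def start_period(self) -> None:
        """Begin a new period: clear the counter and lift any stall."""
        self.count = 0
        self.stalled = False

    def request_access(self) -> bool:
        """Return True if the access is granted, False if the unit is stalled."""
        if self.stalled:
            return False
        self.count += 1
        if self.count > self.threshold:
            # Stall all further accesses for the rest of this period.
            self.stalled = True
        return True


if __name__ == "__main__":
    throttle = AccessThrottle(threshold=2)
    print([throttle.request_access() for _ in range(5)])
    throttle.start_period()
    print(throttle.request_access())
```

Capping accesses per period bounds the memory's activity, and so its power draw, over any window of that length.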

Hardware accelerator for convolutional neural networks and method of operation thereof

11775313 · 2023-10-03 ·

Purdue Research Foundation

An accelerator for processing of a convolutional neural network (CNN) includes a compute core having a plurality of compute units. Each compute unit includes a first memory cache configured to store at least one vector in a map trace, a second memory cache configured to store at least one vector in a kernel trace, and a plurality of vector multiply-accumulate units (vMACs) connected to the first and second memory caches. Each vMAC includes a plurality of multiply-accumulate units (MACs). Each MAC includes a multiplier unit configured to multiply a first word of the at least one vector in the map trace by a second word of the at least one vector in the kernel trace to produce an intermediate product, and an adder unit that adds the intermediate product to a third word to generate a sum of the intermediate product and the third word.
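The arithmetic of a single MAC, as described in the abstract, is a multiply followed by an add. The sketch below shows that step and one possible way a vMAC might apply it lane by lane. The function names `mac` and `vmac` and the element-wise lane arrangement are assumptions for illustration, not details from the patent.

```python
def mac(first_word: int, second_word: int, third_word: int) -> int:
    """One multiply-accumulate step: (first_word * second_word) + third_word."""
    intermediate_product = first_word * second_word  # multiplier unit
    return intermediate_product + third_word         # adder unit


def vmac(map_vec: list[int], kernel_vec: list[int], acc_vec: list[int]) -> list[int]:
    """Apply `mac` element-wise, one lane per MAC (assumed arrangement)."""
    return [mac(m, k, a) for m, k, a in zip(map_vec, kernel_vec, acc_vec)]


if __name__ == "__main__":
    print(mac(3, 4, 5))
    print(vmac([1, 2], [3, 4], [10, 20]))
```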
